<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
"http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
<!ENTITY % local SYSTEM "local.ent">
<!ENTITY % entities SYSTEM "entities.ent">
<!ENTITY % idcommon SYSTEM "common/common.ent">
<refentry id="zebrasrv">
<productname>zebra</productname>
<productnumber>&version;</productnumber>
<refentrytitle>zebrasrv</refentrytitle>
<manvolnum>8</manvolnum>
<refname>zebrasrv</refname>
<refpurpose>Zebra Server</refpurpose>
<refsect1><title>DESCRIPTION</title>
<para>Zebra is a high-performance, general-purpose structured text indexing
and retrieval engine. It reads structured records in a variety of input
formats (e.g. email, &acro.xml;, &acro.marc;) and allows access to them through exact
boolean search expressions and relevance-ranked free-text queries.
<command>zebrasrv</command> is the &acro.z3950; and &acro.sru; frontend
server for the <command>Zebra</command> search engine and indexer.
On Unix you can run the <command>zebrasrv</command>
server from the command line and put it
in the background. It may also operate under the inet daemon.
On WIN32 you can run the server as a console application or
<title>OPTIONS</title>
The options for <command>zebrasrv</command> are the same
as those for &yaz;'s <command>yaz-ztest</command>.
Option <literal>-c</literal> specifies a Zebra configuration
file; if omitted, <filename>zebra.cfg</filename> is read.
<refsect1 id="protocol-support">
<title>&acro.z3950; Protocol Support and Behavior</title>
<refsect2 id="zebrasrv-initialization">
<title>&acro.z3950; Initialization</title>
During initialization, the server will negotiate to version 3 of the
&acro.z3950; protocol, and the option bits for Search, Present, Scan,
NamedResultSets, and concurrentOperations will be set, if requested by
the client. The maximum PDU size is negotiated down to a maximum of
<refsect2 id="zebrasrv-search">
<title>&acro.z3950; Search</title>
The supported query types are 1 and 101. All operators are currently
supported, with the restriction that only proximity units of type "word"
are supported for the proximity operator.
Queries can be arbitrarily complex.
Named result sets are supported, and result sets can be used as operands
Searches may span multiple databases.
The server has full support for piggy-backed retrieval (see
also the following section).
<refsect2 id="zebrasrv-present">
<title>&acro.z3950; Present</title>
The present facility is supported in a standard fashion. The requested
record syntax is matched against the ones supported by the profile of
each record retrieved. If no record syntax is given, &acro.sutrs; is the
default. The requested element set name, again, is matched against any
provided by the relevant record profiles.
<refsect2 id="zebrasrv-scan">
<title>&acro.z3950; Scan</title>
The attribute combinations provided with the termListAndStartPoint are
processed in the same way as operands in a query (see above).
Currently, only the term and the globalOccurrences are returned with
the termInfo structure.
<refsect2 id="zebrasrv-sort">
<title>&acro.z3950; Sort</title>
&acro.z3950; specifies three different types of sort criteria.
Of these, Zebra supports the attribute specification type, in which
case the use attribute specifies the "Sort register".
Sort registers are created for those fields that are of type "sort" in
the default.idx file.
The corresponding character mapping file in default.idx specifies the
ordinal of each character used in the actual sort.
&acro.z3950; allows the client to specify sorting on one or more input
result sets and one output result set.
Zebra supports sorting on one result set only, which may or may not
be the same as the output result set.
<refsect2 id="zebrasrv-close">
<title>&acro.z3950; Close</title>
If a Close PDU is received, the server will respond with a Close PDU
with reason=FINISHED, no matter which protocol version was negotiated
during initialization. If the protocol version is 3 or more, the
server will generate a Close PDU under certain circumstances,
including a session timeout (60 minutes by default), and certain kinds of
protocol errors. Once a Close PDU has been sent, the protocol
association is considered broken, and the transport connection will be
closed immediately upon receipt of further data, or following a short
<refsect2 id="zebrasrv-explain">
<title>&acro.z3950; Explain</title>
Zebra maintains a "classic"
<ulink url="&url.z39.50.explain;">&acro.z3950; Explain</ulink> database
This database is called <literal>IR-Explain-1</literal> and can be
searched using the attribute set <literal>exp-1</literal>.
The records in the explain database are of type
<literal>grs.sgml</literal>.
The root element for the Explain grs.sgml records is
<literal>explain</literal>, thus
<filename>explain.abs</filename> is used for indexing.
Zebra <emphasis>must</emphasis> be able to locate
<filename>explain.abs</filename> in order to index the Explain
records properly. Zebra will work without it, but the information
will not be searchable.
<refsect1 id="zebrasrv-sru">
<title>The &acro.sru; Server</title>
In addition to &acro.z3950;, Zebra supports the more recent and
web-friendly IR protocol <ulink url="&url.sru;">&acro.sru;</ulink>.
&acro.sru; can be carried over &acro.soap; or a &acro.rest;-like protocol
that uses HTTP &acro.get; or &acro.post; to request search responses. The request
itself is made of parameters such as
<literal>query</literal>,
<literal>startRecord</literal>,
<literal>maximumRecords</literal>
<literal>recordSchema</literal>;
the response is an &acro.xml; document containing hit-count, result-set
records, diagnostics, etc. &acro.sru; can be thought of as a re-casting
of &acro.z3950; semantics in web-friendly terms; or as a standardisation
of the ad-hoc query parameters used by search engines such as Google
and AltaVista; or as a superset of A9's OpenSearch (which it
Zebra supports &acro.z3950;, &acro.sru; &acro.get;, &acro.sru; &acro.post; and &acro.sru; &acro.soap; (&acro.srw;)
on the same port, recognising which protocol is used by each incoming
request and handling it accordingly. This is achieved through
the use of Deep Magic; civilians are warned not to stand too close.
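For the curious, the general idea of telling the protocols apart on one port can be sketched in Python. This is purely illustrative and is not how Zebra or &yaz; actually implements it: the <literal>guess_protocol</literal> function is a hypothetical name, and the sketch assumes only that HTTP requests begin with an ASCII method name while &acro.z3950; APDUs are BER-encoded and begin with a context-specific, constructed tag byte.

```python
# Hypothetical sketch of protocol sniffing on a shared port.
# Not taken from Zebra or YAZ; names and heuristics are assumptions.

HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ")


def guess_protocol(first_bytes):
    """Guess the protocol from the first bytes a client sends."""
    # HTTP (and therefore SRU GET/POST/SOAP) starts with an ASCII method name.
    if first_bytes.startswith(HTTP_METHODS):
        return "http"
    # A BER-encoded Z39.50 APDU starts with a context-specific,
    # constructed tag: the top three bits of the first byte are 101.
    if first_bytes and (first_bytes[0] & 0xE0) == 0xA0:
        return "z3950"
    return "unknown"


print(guess_protocol(b"GET /Default?version=1.1 HTTP/1.1\r\n"))
print(guess_protocol(b"\xb4\x52"))
```

A real server would then hand the connection to the matching decoder; the sketch only shows the decision itself.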
<refsect2 id="zebrasrv-sru-run">
<title>Running zebrasrv as an &acro.sru; Server</title>
Because Zebra supports all protocols on one port, it would
seem to follow that the &acro.sru; server is run in the same way as
the &acro.z3950; server, as described above. This is true, but only in
an uninterestingly vacuous way: a Zebra server run in this manner
will indeed recognise and accept &acro.sru; requests; but since it
doesn't know how to handle the &acro.cql; queries that these protocols
use, all it can do is send failure responses.
It is possible to cheat, by having &acro.sru; search Zebra with
a &acro.pqf; query instead of &acro.cql;, using the
<literal>x-pquery</literal>
<literal>query</literal>.
<emphasis role="strong">non-standard extension</emphasis>
<emphasis role="strong">very naughty</emphasis>
thing to do, but it does give you a way to see Zebra serving &acro.sru;
"right out of the box". If you start your favourite Zebra
server in the usual way, on port 9999, then you can send your web
http://localhost:9999/Default?version=1.1
&operation=searchRetrieve
&x-pquery=mineral
&maximumRecords=1
This will display the &acro.xml;-formatted &acro.sru; response that includes the
first record in the result-set found by the query
<literal>mineral</literal>. (For clarity, the &acro.sru; URL is shown
here broken across lines, but the lines should be joined together
to make a single-line URL for the browser to submit.)
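The joining step can be sketched in Python. This is only an illustration: the <literal>join_url_lines</literal> helper is a hypothetical name, not part of Zebra or &yaz;, and the fragments are just the URL lines shown above.

```python
def join_url_lines(lines):
    """Join a URL that was broken across lines for display into one line."""
    # Strip surrounding whitespace from each fragment, then concatenate
    # without adding any separator.
    return "".join(line.strip() for line in lines)


fragments = [
    "http://localhost:9999/Default?version=1.1",
    "&operation=searchRetrieve",
    "&x-pquery=mineral",
    "&maximumRecords=1",
]
url = join_url_lines(fragments)
print(url)
```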
In order to turn on Zebra's support for &acro.cql; queries, it's necessary
to have the &yaz; generic front-end (which Zebra uses) translate them
into the &acro.z3950; Type-1 query format that is used internally. And
to do this, the generic front-end's own configuration file must be
used. See <xref linkend="gfs-config"/>;
the salient point for &acro.sru; support is that
<command>zebrasrv</command>
must be started with the
<literal>-f frontendConfigFile</literal>
option rather than the
<literal>-c zebraConfigFile</literal>
and that the front-end configuration file must include both a
reference to the Zebra configuration file and the &acro.cql;-to-&acro.pqf;
translator configuration file.
A minimal front-end configuration file that does this would read as
<config>zebra.cfg</config>
<cql2rpn>../../tab/pqf.properties</cql2rpn>
<literal>&lt;config&gt;</literal>
element contains the name of the Zebra configuration file that was
previously specified by the
<literal>-c</literal>
command-line argument, and the
<literal>&lt;cql2rpn&gt;</literal>
element contains the name of the &acro.cql; properties file specifying how
various &acro.cql; indexes, relations, etc. are translated into Type-1
A Zebra server running with such a configuration can then be
queried using proper, conformant &acro.sru; URLs with &acro.cql; queries:
http://localhost:9999/Default?version=1.1
&operation=searchRetrieve
&query=title=utah and description=epicent*
&maximumRecords=1
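The URL above is shown unencoded for readability. A client that builds it programmatically would normally percent-encode the &acro.cql; query, as in this minimal Python sketch. The <literal>sru_search_url</literal> helper is a hypothetical name, and the base URL simply repeats the one used in the example.

```python
from urllib.parse import quote, urlencode


def sru_search_url(base, cql, maximum_records=1):
    """Build an SRU 1.1 searchRetrieve URL carrying a CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql,
        "maximumRecords": str(maximum_records),
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base + "?" + urlencode(params, quote_via=quote)


url = sru_search_url(
    "http://localhost:9999/Default",
    "title=utah and description=epicent*",
)
print(url)
```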
<refsect1 id="zebrasrv-sru-support">
<title>&acro.sru; Protocol Support and Behavior</title>
Zebra running as an &acro.sru; server supports &acro.sru; version 1.1, including
&acro.cql; version 1.1. In particular, it provides support for the
following elements of the protocol.
<refsect2 id="zebrasrvr-search-and-retrieval">
<title>&acro.sru; Search and Retrieval</title>
<ulink url="&url.sru.searchretrieve;">&acro.sru; searchRetrieve</ulink>
One of the great strengths of &acro.sru; is that it mandates a standard
query language, &acro.cql;, and that all conforming implementations can
therefore be trusted to correctly interpret the same queries. It
is with some shame, then, that we admit that Zebra also supports
an additional query language, our own Prefix Query Format
(<ulink url="&url.yaz.pqf;">&acro.pqf;</ulink>).
A &acro.pqf; query is submitted by using the extension parameter
<literal>x-pquery</literal>,
<literal>query</literal>
parameter must be omitted, which makes the request invalid &acro.sru;.
Please feel free to use this facility within your own
applications; but be aware that it is not only non-standard &acro.sru;
but not even syntactically valid, since it omits the mandatory
<literal>query</literal> parameter.
<refsect2 id="zebrasrv-sru-scan">
<title>&acro.sru; Scan</title>
Zebra supports <ulink url="&url.sru.scan;">&acro.sru; scan</ulink>
Scanning using &acro.cql; syntax is the default, where the
standard <literal>scanClause</literal> parameter is used.
mutant form of &acro.sru; scan is supported, using
the non-standard <literal>x-pScanClause</literal> parameter in
place of the standard <literal>scanClause</literal> to scan on a
&acro.pqf; query clause.
<refsect2 id="zebrasrv-sru-explain">
<title>&acro.sru; Explain</title>
Zebra supports <ulink url="&url.sru.explain;">&acro.sru; explain</ulink>.
The ZeeRex record explaining a database may be requested either
with a fully fledged &acro.sru; request (with
<literal>operation</literal>=<literal>explain</literal>
and version-number specified)
or with a simple HTTP &acro.get; at the server's basename.
The ZeeRex record returned in response is the one embedded
in the &yaz; Frontend Server configuration file that is described in
<xref linkend="gfs-config"/>.
Unfortunately, the data found in the
&acro.cql;-to-&acro.pqf; text file must be added by hand to the explain
section of the &yaz; Frontend Server configuration file to be able
to provide a suitable explain record.
Too bad, but this is all extremely
new alpha stuff, and a lot of work has yet to be done.
There is no linkage whatsoever between the &acro.z3950; explain model
and the &acro.sru; explain response (at least, none is implemented
in Zebra). Zebra does not provide a means of obtaining
the ZeeRex record using &acro.z3950;.
<refsect2 id="zebrasrv-non-sru-ops">
<title>Other &acro.sru; operations</title>
In the &acro.z3950; protocol, Initialization, Present, Sort and Close
are separate operations. In &acro.sru;, however, these operations do not
&acro.sru; has no explicit initialization handshake phase, but
commences immediately with searching, scanning and explain
Neither does &acro.sru; have a close operation, since the protocol is
stateless and each request is self-contained. (It is true that
multiple &acro.sru; request/response pairs may be implemented as
multiple HTTP request/response pairs over a single persistent
TCP/IP connection; but the closure of that connection is not a
protocol-level operation.)
Retrieval in &acro.sru; is part of the
<literal>searchRetrieve</literal> operation, in which a search
is submitted and the response includes a subset of the records
in the result set. There is no direct analogue of &acro.z3950;'s
Present operation which requests records from an established
result set. In &acro.sru;, this is achieved by sending a subsequent
<literal>searchRetrieve</literal> request with the query
<literal>cql.resultSetId=</literal><emphasis>id</emphasis> where
<emphasis>id</emphasis> is the identifier of the previously
generated result-set.
Sorting in &acro.sru; is done within the
<literal>searchRetrieve</literal> operation: in v1.1, by an
explicit <literal>sortKeys</literal> parameter, but the forthcoming
v1.2 or v2.0 will most likely use an extension of the query
language, <ulink url="&url.cql.sorting;">&acro.cql; sorting</ulink>.
It can be seen, then, that while Zebra operating as an &acro.sru; server
does not provide the same set of operations as when operating as a
&acro.z3950; server, it does provide equivalent functionality.
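As an illustration of the Present-like retrieval described above, the following Python sketch builds a follow-up <literal>searchRetrieve</literal> request against an existing result set. The <literal>result_set_url</literal> helper, the result-set identifier <literal>1</literal> and the record window are all assumptions made for the example; a real client would take the identifier from an earlier response.

```python
from urllib.parse import quote, urlencode


def result_set_url(base, result_set_id, start_record, maximum_records):
    """Build a searchRetrieve URL that re-uses an existing result set."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        # The CQL query names the earlier result set instead of
        # repeating the original search.
        "query": "cql.resultSetId=" + result_set_id,
        "startRecord": str(start_record),
        "maximumRecords": str(maximum_records),
    }
    return base + "?" + urlencode(params, quote_via=quote)


# Hypothetical values: result set "1", records 11 to 20.
url = result_set_url("http://localhost:9999/Default", "1", 11, 10)
print(url)
```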
<refsect1 id="zebrasrv-sru-examples">
<title>&acro.sru; Examples</title>
Surf into <literal>http://localhost:9999</literal>
to get an explain response, or use
http://localhost:9999/?version=1.1&operation=explain
See the number of hits for a query
http://localhost:9999/?version=1.1&operation=searchRetrieve
&query=text=(plant%20and%20soil)
Fetch records 5-7 in Dublin Core format
http://localhost:9999/?version=1.1&operation=searchRetrieve
&query=text=(plant%20and%20soil)
&startRecord=5&maximumRecords=3&recordSchema=dc
You can even search with &acro.pqf; queries, using the <emphasis>extended naughty
parameter</emphasis> <literal>x-pquery</literal>
http://localhost:9999/?version=1.1&operation=searchRetrieve
&x-pquery=@attr%201=text%20@and%20plant%20soil
Or scan indexes using the <emphasis>extended extremely naughty
parameter</emphasis> <literal>x-pScanClause</literal>
http://localhost:9999/?version=1.1&operation=scan
&x-pScanClause=@attr%201=text%20something
<emphasis>Don't do this in production code!</emphasis>
But it's a great, fast debugging aid.
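To read the hit count from a <literal>searchRetrieve</literal> response in a script rather than a browser, something like the following Python sketch may help. The sample response is hand-written for illustration and is not output captured from Zebra; the <literal>hit_count</literal> helper is a hypothetical name, and the sketch assumes the &acro.sru; 1.1 response namespace <literal>http://www.loc.gov/zing/srw/</literal>.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for SRU 1.1 responses.
SRW_NS = "http://www.loc.gov/zing/srw/"

# Hand-written sample, not real Zebra output.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>42</zs:numberOfRecords>
</zs:searchRetrieveResponse>"""


def hit_count(response_xml):
    """Return numberOfRecords from an SRU response, or None if absent."""
    root = ET.fromstring(response_xml)
    node = root.find("{%s}numberOfRecords" % SRW_NS)
    if node is None or node.text is None:
        return None
    return int(node.text)


print(hit_count(SAMPLE_RESPONSE))
```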
<refsect1 id="gfs-config"><title>&yaz; server virtual hosts</title>
<refsect1><title>SEE ALSO</title>
<refentrytitle>zebraidx</refentrytitle>
<manvolnum>1</manvolnum>
<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document: "zebra.xml"
sgml-local-catalogs: nil
sgml-namecase-general:t