-<chapter id="zebrasrv">
- <!-- $Id: server.xml,v 1.26 2006-09-03 21:37:27 adam Exp $ -->
- <title>The Z39.50 Server</title>
-
- <sect1 id="zebrasrv-running">
- <title>Running the Z39.50 Server (zebrasrv)</title>
-
- <!--
- FIXME - We need to be consistent here, zebraidx had the options at the
- end, and lots of explaining text before them. Same for zebrasvr! -H
- FIXME - At least we need a small intro, what is zebrasvr, and how it
- can be run (inetd, nt service, stand-alone program, daemon...) -H
- -->
-
- <!-- re-write by MC, using the newly created input files for the
- zebrasrv manpage -->
-
-
- <sect2 id="zebrasrv-description"><title>Description</title>
- <para>Zebra is a high-performance, general-purpose structured text indexing
- and retrieval engine. It reads structured records in a variety of input
- formats (e.g. email, XML, MARC) and allows access to them through exact
- boolean search expressions and relevance-ranked free-text queries.
- </para>
- <para>
- <command>zebrasrv</command> is the Z39.50 and <ulink url="http://www.loc.gov/standards/sru/srw/">SRW</ulink>/U frontend
- server for the <command>Zebra</command> indexer.
- </para>
- <para>
- On Unix, <command>zebrasrv</command> can be run from the command
- line, either in the foreground or in the background, or it can be
- started by the Internet superserver, <literal>inetd</literal>.
- On WIN32, the server can be run as a console application or
- as a WIN32 service.
- </para>
- </sect2>
-
- <sect2 id="zebrasrv-synopsis">
- <title>Synopsis</title>
- &zebrasrv-synopsis;
- </sect2>
-
- <sect2 id="zebrasrv-options">
- <title>Options</title>
-
- <para>
- The options for <command>zebrasrv</command> are the same
- as those for the YAZ test server, <command>yaz-ztest</command>.
- Option <literal>-c</literal> specifies a Zebra configuration
- file; if it is omitted, <filename>zebra.cfg</filename> is read.
- </para>
-
- &zebrasrv-options;
- </sect2>
-
- <sect2 id="zebrasrv-files"><title>Files</title>
- <para>
- <filename>zebra.cfg</filename>
- </para>
- </sect2>
- <sect2 id="zebrasrv-see-also"><title>See Also</title>
- <para>
- <citerefentry>
- <refentrytitle>zebraidx</refentrytitle>
- <manvolnum>1</manvolnum>
- </citerefentry>,
- <citerefentry>
- <refentrytitle>yaz-ztest</refentrytitle>
- <manvolnum>8</manvolnum>
- </citerefentry>
- </para>
- <para>
- The Zebra software is Copyright Index Data
- <ulink url="http://www.indexdata.dk"/>
- and distributed under the
- GPLv2 license.
- </para>
- </sect2>
-
- <!--
- <para>
- <emphasis remap="bf">Syntax</emphasis>
-
- <screen>
- zebrasrv [options] [listener-address ...]
- </screen>
-
- </para>
-
- <para>
- <emphasis remap="bf">Options</emphasis>
- <variablelist>
-
- <varlistentry>
- <term>-a <replaceable>APDU file</replaceable></term>
- <listitem>
- <para>
- Specify a file for dumping PDUs (for diagnostic purposes).
- The special name "-" sends output to <literal>stderr</literal>.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>-c <replaceable>config-file</replaceable></term>
- <listitem>
- <para>
- Read configuration information from
- <replaceable>config-file</replaceable>.
- The default configuration is <literal>./zebra.cfg</literal>.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>-S</term>
- <listitem>
- <para>
- Don't fork on connection requests. This can be useful for
- symbolic-level debugging. The server can only accept a single
- connection in this mode.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>-z</term>
- <listitem>
- <para>
- Use the Z39.50 protocol. Currently the only protocol supported.
- The option is retained for historical reasons, and for future
- extensions.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>-l <replaceable>logfile</replaceable></term>
- <listitem>
- <para>
- Specify an output file for the diagnostic messages.
- The default is to write this information to <literal>stderr</literal>.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>-v <replaceable>log-level</replaceable></term>
- <listitem>
- <para>
- The log level. Use a comma-separated list of members of the set
- {fatal,debug,warn,log,all,none}.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>-u <replaceable>username</replaceable></term>
- <listitem>
- <para>
- Set user ID. Sets the real UID of the server process to that of the
- given <replaceable>username</replaceable>.
- It's useful if you aren't comfortable with having the
- server run as root, but you need to start it as such to bind a
- privileged port.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>-w <replaceable>working-directory</replaceable></term>
- <listitem>
- <para>
- Change working directory.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>-i</term>
- <listitem>
- <para>
- Run under the Internet superserver, <literal>inetd</literal>.
- Make sure you use the logfile option <literal>-l</literal> in
- conjunction with this mode and specify the <literal>-l</literal>
- option before any other options.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>-t <replaceable>timeout</replaceable></term>
- <listitem>
- <para>
- Set the idle session timeout (default 60 minutes).
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>-k <replaceable>kilobytes</replaceable></term>
- <listitem>
- <para>
- Set the (approximate) maximum size of
- present response messages. Default is 1024 KB (1 MB).
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
- </para>
- -->
- </sect1>
-
-
- <sect1 id="protocol-support">
- <title>Z39.50 Protocol Support and Behavior</title>
-
- <sect2 id="zebrasrv-initialization">
- <title>Initialization</title>
-
- <para>
- During initialization, the server will negotiate to version 3 of the
- Z39.50 protocol, and the option bits for Search, Present, Scan,
- NamedResultSets, and concurrentOperations will be set, if requested by
- the client. The maximum PDU size is negotiated down to a maximum of
- 1 MB by default.
- </para>
-
- </sect2>
-
- <sect2 id="search">
- <title>Search</title>
-
- <!--
- FIXME - Need to explain the string tag stuff before people get bogged
- down with all these attribute numbers. Perhaps in its own
- chapter? -H
- -->
-
- <para>
- The supported query types are 1 and 101. All operators are
- supported, with the restriction that the proximity operator
- supports only proximity units of type "word".
- Queries can be arbitrarily complex.
- Named result sets are supported, and result sets can be used as operands
- without limitations.
- Searches may span multiple databases.
- </para>
-
- <para>
- The server has full support for piggy-backed retrieval (see
- also the following section).
- </para>
-
- </sect2>
-
- <sect2 id="zebrasrv-present">
- <title>Present</title>
- <para>
- The present facility is supported in a standard fashion. The requested
- record syntax is matched against the ones supported by the profile of
- each record retrieved. If no record syntax is given, SUTRS is the
- default. The requested element set name, again, is matched against any
- provided by the relevant record profiles.
- </para>
- </sect2>
- <sect2 id="zebrasrv-scan">
- <title>Scan</title>
- <para>
- The attribute combinations provided with the termListAndStartPoint are
- processed in the same way as operands in a query (see above).
- Currently, only the term and the globalOccurrences are returned with
- the termInfo structure.
- </para>
- </sect2>
- <sect2 id="zebrasrv-sort">
- <title>Sort</title>
-
- <para>
- Z39.50 specifies three different types of sort criteria.
- Of these, Zebra supports the attribute specification type, in which
- the use attribute specifies the "sort register".
- Sort registers are created for those fields that are of type "sort" in
- the <filename>default.idx</filename> file.
- The corresponding character mapping file in
- <filename>default.idx</filename> specifies the
- ordinal of each character used in the actual sort.
- </para>
-
- <para>
- Z39.50 allows the client to specify sorting on one or more input
- result sets and one output result set.
- Zebra supports sorting on one result set only, which may or may not
- be the same as the output result set.
- </para>
- </sect2>
- <sect2 id="zebrasrv-close">
- <title>Close</title>
- <para>
- If a Close PDU is received, the server will respond with a Close PDU
- with reason=FINISHED, no matter which protocol version was negotiated
- during initialization. If the protocol version is 3 or more, the
- server will generate a Close PDU under certain circumstances,
- including a session timeout (60 minutes by default), and certain kinds of
- protocol errors. Once a Close PDU has been sent, the protocol
- association is considered broken, and the transport connection will be
- closed immediately upon receipt of further data, or following a short
- timeout.
- </para>
- </sect2>
-
- <sect2 id="zebrasrv-explain">
- <title>Explain</title>
- <para>
- Zebra maintains a "classic"
- <ulink url="&url.z39.50.explain;">Explain</ulink> database
- on the side.
- This database is called <literal>IR-Explain-1</literal> and can be
- searched using the attribute set <literal>exp-1</literal>.
- </para>
- <para>
- The records in the explain database are of type
- <literal>grs.sgml</literal>.
- The root element for the Explain grs.sgml records is
- <literal>explain</literal>, thus
- <filename>explain.abs</filename> is used for indexing.
- </para>
- <note>
- <para>
- Zebra <emphasis>must</emphasis> be able to locate
- <filename>explain.abs</filename> in order to index the Explain
- records properly. Zebra will work without it, but the information
- will not be searchable.
- </para>
- </note>
- </sect2>
- </sect1>
-</chapter>
-
-
-<chapter id="zebrasrv-sru">
- <title>The SRU/SRW Server</title>
- <para>
- In addition to Z39.50, Zebra supports the more recent and
- web-friendly IR protocol SRU, described at
- <ulink url="http://www.loc.gov/sru"/>.
- SRU is "Search/Retrieve via URL", a simple, REST-like protocol
- that uses HTTP GET to request search responses. The request
- itself is made of parameters such as
- <literal>query</literal>,
- <literal>startRecord</literal>,
- <literal>maximumRecords</literal>
- and
- <literal>recordSchema</literal>;
- the response is an XML document containing hit-count, result-set
- records, diagnostics, etc. SRU can be thought of as a re-casting
- of Z39.50 semantics in web-friendly terms; or as a standardisation
- of the ad-hoc query parameters used by search engines such as Google
- and AltaVista; or as a superset of A9's OpenSearch (which it
- predates).
- </para>
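- <para>
- As an illustration, an abridged SRU 1.1 response to a
- <literal>searchRetrieve</literal> request with
- <literal>startRecord=1</literal>, <literal>maximumRecords=1</literal>
- and <literal>recordSchema=dc</literal> has roughly the following
- shape. This sketches the general structure only. It assumes the SRU 1.1
- namespace <literal>http://www.loc.gov/zing/srw/</literal>, and the
- comments stand in for real values. Check the exact elements a Zebra
- server emits against its actual output and the SRU specification.
- </para>
- <screen><![CDATA[
- <zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
-  <zs:version>1.1</zs:version>
-  <!-- total number of hits for the query -->
-  <zs:numberOfRecords><!-- hit count --></zs:numberOfRecords>
-  <zs:records>
-   <zs:record>
-    <zs:recordSchema>dc</zs:recordSchema>
-    <zs:recordPacking>xml</zs:recordPacking>
-    <zs:recordData>
-     <!-- the record itself, in the requested recordSchema -->
-    </zs:recordData>
-    <zs:recordPosition>1</zs:recordPosition>
-   </zs:record>
-  </zs:records>
-  <!-- present only if more hits remain after this page -->
-  <zs:nextRecordPosition>2</zs:nextRecordPosition>
- </zs:searchRetrieveResponse>
-]]></screen>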
- <para>
- Zebra further supports SRW, described at
- <ulink url="http://www.loc.gov/srw"/>.
- SRW is the "Search/Retrieve Web Service", a SOAP-based alternative
- implementation of the abstract protocol that SRU implements as HTTP
- GET requests. In SRW, requests are encoded as XML documents which
- are posted to the server. The responses are identical to those
- returned by SRU servers, except that they are wrapped in several
- layers of SOAP envelope.
- </para>
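- <para>
- For comparison, the SOAP document below sketches an SRW request that
- carries the same kind of parameters as an SRU URL. It assumes SRW 1.1,
- the SOAP 1.1 envelope namespace and the SRW namespace
- <literal>http://www.loc.gov/zing/srw/</literal>. The query and
- parameter values are arbitrary examples.
- </para>
- <screen><![CDATA[
- <SOAP-ENV:Envelope
-   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
-  <SOAP-ENV:Body>
-   <!-- the SRU parameters become child elements of the request -->
-   <SRW:searchRetrieveRequest xmlns:SRW="http://www.loc.gov/zing/srw/">
-    <SRW:version>1.1</SRW:version>
-    <SRW:query>title=utah</SRW:query>
-    <SRW:startRecord>1</SRW:startRecord>
-    <SRW:maximumRecords>1</SRW:maximumRecords>
-    <SRW:recordSchema>dc</SRW:recordSchema>
-   </SRW:searchRetrieveRequest>
-  </SOAP-ENV:Body>
- </SOAP-ENV:Envelope>
-]]></screen>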
- <para>
- Zebra supports all three protocols - Z39.50, SRU and SRW - on the
- same port, recognising which protocol each incoming request uses
- and handling it accordingly. This is achieved through
- the use of Deep Magic; civilians are warned not to stand too close.
- </para>
- <para>
- From here on, "SRU" is used to indicate both the SRU and SRW
- protocols, as they are identical except for the transport used for
- the protocol packets and Zebra's support for them is equivalent.
- </para>
-
- <sect1 id="zebrasrv-sru-run">
- <title>Running the SRU Server (zebrasrv)</title>
- <para>
- Because Zebra supports all three protocols on one port, it would
- seem to follow that the SRU server is run in the same way as
- the Z39.50 server, as described above. This is true, but only in
- an uninterestingly vacuous way: a Zebra server run in this manner
- will indeed recognise and accept SRU requests; but since it
- doesn't know how to handle the CQL queries that these protocols
- use, all it can do is send failure responses.
- </para>
- <note>
- <para>
- It is possible to cheat, by having SRU search Zebra with
- a PQF query instead of CQL, using the
- <literal>x-pquery</literal>
- parameter instead of
- <literal>query</literal>.
- This is a
- <emphasis role="strong">non-standard extension</emphasis>
- of SRU, and a
- <emphasis role="strong">very naughty</emphasis>
- thing to do, but it does give you a way to see Zebra serving SRU
- "right out of the box". If you start your favourite Zebra
- server in the usual way, on port 9999, then you can send your web
- browser to:
- </para>
- <screen><![CDATA[
- http://localhost:9999/Default?version=1.1
- &operation=searchRetrieve
- &x-pquery=mineral
- &startRecord=1
- &maximumRecords=1
-]]></screen>
- <para>
- This will display the XML-formatted SRU response that includes the
- first record in the result-set found by the query
- <literal>mineral</literal>. (For clarity, the SRU URL is shown
- here broken across lines, but the lines should be joined together
- to make a single-line URL for the browser to submit.)
- </para>
- </note>
- <para>
- In order to turn on Zebra's support for CQL queries, it's necessary
- to have the YAZ generic front-end (which Zebra uses) translate them
- into the Z39.50 Type-1 query format that is used internally. And
- to do this, the generic front-end's own configuration file must be
- used. This file is described
- <link linkend="gfs-config">elsewhere</link>;
- the salient point for SRU support is that
- <command>zebrasrv</command>
- must be started with the
- <literal>-f frontendConfigFile</literal>
- option rather than the
- <literal>-c zebraConfigFile</literal>
- option,
- and that the front-end configuration file must include both a
- reference to the Zebra configuration file and the CQL-to-PQF
- translator configuration file.
- </para>
- <para>
- A minimal front-end configuration file that does this would read as
- follows:
- </para>
- <screen><![CDATA[
- <yazgfs>
- <server>
- <config>zebra.cfg</config>
- <cql2rpn>../../tab/pqf.properties</cql2rpn>
- </server>
- </yazgfs>
-]]></screen>
- <para>
- The
- <literal>&lt;config&gt;</literal>
- element contains the name of the Zebra configuration file that was
- previously specified by the
- <literal>-c</literal>
- command-line argument, and the
- <literal>&lt;cql2rpn&gt;</literal>
- element contains the name of the CQL properties file specifying how
- various CQL indexes, relations, etc. are translated into Type-1
- queries.
- </para>
- <para>
- A Zebra server running with such a configuration can then be
- queried using proper, conformant SRU URLs with CQL queries:
- </para>
- <screen><![CDATA[
- http://localhost:9999/Default?version=1.1
- &operation=searchRetrieve
- &query=title=utah and description=epicent*
- &startRecord=1
- &maximumRecords=1
-]]></screen>
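- <para>
- The minimal front-end file above does not say where the server
- listens. The address can also be declared in the same file, using the
- YAZ front-end's <literal>&lt;listen&gt;</literal> element, which
- <literal>&lt;server&gt;</literal> references through its
- <literal>listenref</literal> attribute. The sketch below shows this.
- The identifiers <literal>public1</literal> and <literal>server1</literal>
- are arbitrary. Consult the front-end configuration documentation for
- the full set of elements.
- </para>
- <screen><![CDATA[
- <yazgfs>
-  <!-- listen for Z39.50, SRU and SRW on TCP port 9999, all interfaces -->
-  <listen id="public1">tcp:@:9999</listen>
-  <server id="server1" listenref="public1">
-   <config>zebra.cfg</config>
-   <cql2rpn>../../tab/pqf.properties</cql2rpn>
-  </server>
- </yazgfs>
-]]></screen>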
- </sect1>
-
- <sect1 id="zebrasrv-sru-support">
- <title>SRU and SRW Protocol Support and Behavior</title>
- <para>
- Zebra running as an SRU server supports SRU version 1.1, including
- CQL version 1.1. In particular, it provides support for the
- following elements of the protocol.
- </para>
-
- <sect2 id="zebrasrvr-search-and-retrieval">
- <title>Search and Retrieval</title>
- <para>
- Zebra fully supports SRU's core
- <literal>searchRetrieve</literal>
- operation, as described at
- <ulink url="http://www.loc.gov/standards/sru/sru-spec.html"/>.
- </para>
- <para>
- One of the great strengths of SRU is that it mandates a standard
- query language, CQL, and that all conforming implementations can
- therefore be trusted to correctly interpret the same queries. It
- is with some shame, then, that we admit that Zebra also supports
- an additional query language, our own Prefix Query Format (PQF,
- <ulink url="http://indexdata.com/yaz/doc/tools.tkl#PQF"/>).
- A PQF query is submitted by using the extension parameter
- <literal>x-pquery</literal>,
- in which case the
- <literal>query</literal>
- parameter must be omitted, which makes the request invalid SRU.
- Please don't do this.
- </para>
- </sect2>
-
- <sect2 id="zebrasrv-scan">
- <title>Scan</title>
- <para>
- Zebra supports SRU's
- <literal>scan</literal>
- operation, as described at
- <ulink url="http://www.loc.gov/standards/sru/scan/"/>.
- Scanning using CQL syntax is the default, where the
- standard <literal>scanClause</literal> parameter is used.
- </para>
- <para>
- In addition, a
- mutant form of SRU scan is supported, using
- the non-standard <literal>x-pScanClause</literal> parameter in
- place of the standard <literal>scanClause</literal> to scan on a
- PQF query clause.
- </para>
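- <para>
- In either case the reply presumably takes the form of an ordinary SRU
- scan response. The abridged sketch below shows its general shape,
- assuming the SRU 1.1 namespace, with comments standing in for real
- values.
- </para>
- <screen><![CDATA[
- <zs:scanResponse xmlns:zs="http://www.loc.gov/zing/srw/">
-  <zs:version>1.1</zs:version>
-  <zs:terms>
-   <zs:term>
-    <zs:value><!-- an index term --></zs:value>
-    <zs:numberOfRecords><!-- records containing it --></zs:numberOfRecords>
-   </zs:term>
-   <!-- one zs:term element per term returned -->
-  </zs:terms>
- </zs:scanResponse>
-]]></screen>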
- </sect2>
-
- <sect2 id="zebrasrv-explain">
- <title>Explain</title>
- <para>
- Zebra fully supports SRU's core
- <literal>explain</literal>
- operation, as described at
- <ulink url="http://www.loc.gov/standards/sru/explain/index.html"/>.
- </para>
- <para>
- The ZeeRex record explaining a database may be requested either
- with a fully fledged SRU request (with
- <literal>operation</literal>=<literal>explain</literal>
- and version-number specified)
- or with a simple HTTP GET at the server's basename.
- The ZeeRex record returned in response is the one embedded
- in the YAZ Frontend Server configuration file that is described in the
- <link linkend="gfs-config">Virtual Hosts</link> documentation.
- </para>
- <para>
- Unfortunately, the data found in the
- CQL-to-PQF text file must be copied by hand into the explain
- section of the YAZ Frontend Server configuration file to be able
- to provide a suitable explain record.
- Too bad, but this is all very new, alpha-quality code, and a lot of
- work has yet to be done.
- </para>
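- <para>
- Here is a hypothetical illustration of that hand-copying. A CQL-to-PQF
- mapping such as <literal>index.dc.title = 1=4</literal> in
- <filename>pqf.properties</filename> would need a matching
- <literal>index</literal> entry in a ZeeRex
- <literal>&lt;explain&gt;</literal> section inside the
- <literal>&lt;server&gt;</literal> element. The sketch below assumes the
- ZeeRex 2.0 namespace and the Dublin Core context set identifier. The
- host, port, database and index names are examples only.
- </para>
- <screen><![CDATA[
- <server id="server1">
-  <config>zebra.cfg</config>
-  <cql2rpn>pqf.properties</cql2rpn>
-  <explain xmlns="http://explain.z3950.org/dtd/2.0/">
-   <serverInfo protocol="SRU">
-    <host>localhost</host>
-    <port>9999</port>
-    <database>Default</database>
-   </serverInfo>
-   <indexInfo>
-    <set name="dc" identifier="info:srw/cql-context-set/1/dc-v1.1"/>
-    <!-- hand-copied from pqf.properties: index.dc.title = 1=4 -->
-    <index>
-     <title>title</title>
-     <map><name set="dc">title</name></map>
-    </index>
-   </indexInfo>
-  </explain>
- </server>
-]]></screen>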
- <para>
- There is no linkage whatsoever between the Z39.50 explain model
- and the SRU/SRW explain response (at least, none is implemented
- in Zebra). Zebra provides no means of obtaining the ZeeRex record
- through Z39.50.
- </para>
- </sect2>
-
- <sect2 id="zebrasrv-sru-examples">
- <title>Some SRU Examples</title>
- <para>
- Surf into <literal>http://localhost:9999</literal>
- to get an explain response, or use
- <screen><![CDATA[
- http://localhost:9999/?version=1.1&operation=explain
- ]]></screen>
- </para>
- <para>
- See the number of hits for a query
- <screen><![CDATA[
- http://localhost:9999/?version=1.1&operation=searchRetrieve
- &query=text=(plant%20and%20soil)
- ]]></screen>
- </para>
- <para>
- Fetch records 5-7 in Dublin Core format
- <screen><![CDATA[
- http://localhost:9999/?version=1.1&operation=searchRetrieve
- &query=text=(plant%20and%20soil)
- &startRecord=5&maximumRecords=3&recordSchema=dc
- ]]></screen>
- </para>
- <para>
- You can even search using PQF queries, with the <emphasis>extended naughty
- verb</emphasis> <literal>x-pquery</literal>
- <screen><![CDATA[
- http://localhost:9999/?version=1.1&operation=searchRetrieve
- &x-pquery=@attr%201=text%20@and%20plant%20soil
- ]]></screen>
- </para>
- <para>
- Or scan indexes using the <emphasis>extended extremely naughty
- verb</emphasis> <literal>x-pScanClause</literal>
- <screen><![CDATA[
- http://localhost:9999/?version=1.1&operation=scan
- &x-pScanClause=@attr%201=text%20something
- ]]></screen>
- <emphasis>Don't do this in production code!</emphasis>
- But it's a great fast debugging aid.
- </para>
- </sect2>
-
- <sect2 id="zebrasrv-non-sru-ops">
- <title>Initialization, Present, Sort, Close</title>
- <para>
- In the Z39.50 protocol, Initialization, Present, Sort and Close
- are separate operations. In SRU, however, these operations do not
- exist.
- </para>
- <itemizedlist>
- <listitem>
- <para>
- SRU has no explicit initialization handshake phase, but
- commences immediately with searching, scanning and explain
- operations.
- </para>
- </listitem>
- <listitem>
- <para>
- Neither does SRU have a close operation, since the protocol is
- stateless and each request is self-contained. (It is true that
- multiple SRU request/response pairs may be implemented as
- multiple HTTP request/response pairs over a single persistent
- TCP/IP connection; but the closure of that connection is not a
- protocol-level operation.)
- </para>
- </listitem>
- <listitem>
- <para>
- Retrieval in SRU is part of the
- <literal>searchRetrieve</literal> operation, in which a search
- is submitted and the response includes a subset of the records
- in the result set. There is no direct analogue of Z39.50's
- Present operation which requests records from an established
- result set. In SRU, this is achieved by sending a subsequent
- <literal>searchRetrieve</literal> request with the query
- <literal>cql.resultSetId=</literal><emphasis>id</emphasis> where
- <emphasis>id</emphasis> is the identifier of the previously
- generated result-set.
- </para>
- </listitem>
- <listitem>
- <para>
- Sorting in SRU is done within the
- <literal>searchRetrieve</literal> operation. In v1.1 this is done
- with an explicit <literal>sortKeys</literal> parameter, but the
- forthcoming v1.2 or v2.0 will most likely use an extension of the
- query language, CQL, for sorting (see
- <ulink url="http://zing.z3950.org/cql/sorting.html"/>).
- </listitem>
- </itemizedlist>
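- <para>
- For the result-set reuse described above, the identifier to put in
- <literal>cql.resultSetId</literal> comes from the
- <literal>resultSetId</literal> element of an earlier
- <literal>searchRetrieve</literal> response, if the server returns one.
- The abridged sketch below shows where that element sits in an SRU 1.1
- response, with comments standing in for real values.
- </para>
- <screen><![CDATA[
- <zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
-  <zs:version>1.1</zs:version>
-  <zs:numberOfRecords><!-- hit count --></zs:numberOfRecords>
-  <!-- reuse as: query=cql.resultSetId="..." -->
-  <zs:resultSetId><!-- result-set identifier --></zs:resultSetId>
-  <!-- how long, in seconds, the server intends to keep the set -->
-  <zs:resultSetIdleTime><!-- seconds --></zs:resultSetIdleTime>
-  <!-- zs:records etc. as usual -->
- </zs:searchRetrieveResponse>
-]]></screen>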
- <para>
- It can be seen, then, that while Zebra operating as an SRU server
- does not provide the same set of operations as when operating as a
- Z39.50 server, it does provide equivalent functionality.
- </para>
- </sect2>
- </sect1>
-</chapter>
-
- <!-- Keep this comment at the end of the file
- Local variables:
- mode: sgml
- sgml-omittag:t
- sgml-shorttag:t
- sgml-minimize-attributes:nil
- sgml-always-quote-attributes:t
- sgml-indent-step:1
- sgml-indent-data:t
- sgml-parent-document: "zebra.xml"
- sgml-local-catalogs: nil
- sgml-namecase-general:t
- End:
- -->