--- /dev/null
+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
+ "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
+[
+ <!ENTITY % local SYSTEM "local.ent">
+ %local;
+ <!ENTITY % entities SYSTEM "entities.ent">
+ %entities;
+ <!ENTITY % common SYSTEM "common/common.ent">
+ %common;
+]>
+ <!-- $Id: zebrasrv.xml,v 1.1 2006-09-05 12:01:31 adam Exp $ -->
+<refentry id="zebrasrv">
+ <refentryinfo>
+ <productname>ZEBRA</productname>
+ <productnumber>&version;</productnumber>
+ </refentryinfo>
+
+ <refmeta>
+ <refentrytitle>zebrasrv</refentrytitle>
+ <manvolnum>8</manvolnum>
+ </refmeta>
+
+ <refnamediv>
+ <refname>zebrasrv</refname>
+ <refpurpose>Zebra Server</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ &zebrasrv-synopsis;
+ </refsynopsisdiv>
+ <refsect1><title>DESCRIPTION</title>
+ <para>Zebra is a high-performance, general-purpose structured text indexing
+ and retrieval engine. It reads structured records in a variety of input
+ formats (eg. email, XML, MARC) and allows access to them through exact
+ boolean search expressions and relevance-ranked free-text queries.
+ </para>
+ <para>
+ <command>zebrasrv</command> is the Z39.50 and SRU frontend
+ server for the <command>Zebra</command> search engine and indexer.
+ </para>
+ <para>
+ On Unix you can run the <command>zebrasrv</command>
+ server from the command line - and put it
+ in the background. It may also operate under the inet daemon.
+ On WIN32 you can run the server as a console application or
+ as a WIN32 Service.
+ </para>
+ </refsect1>
+ <refsect1>
+ <title>OPTIONS</title>
+
+ <para>
+ The options for <command>zebrasrv</command> are the same
+ as those for YAZ' <command>yaz-ztest</command>.
+ Option <literal>-c</literal> specifies a Zebra configuration
+ file - if omitted <filename>zebra.cfg</filename> is read.
+ </para>
+
+ &zebrasrv-options;
+ </refsect1>
+
+ <refsect1 id="protocol-support">
+ <title>Z39.50 Protocol Support and Behavior</title>
+
+ <refsect2 id="zebrasrv-initialization">
+ <title>Z39.50 Initialization</title>
+
+ <para>
+ During initialization, the server will negotiate to version 3 of the
+ Z39.50 protocol, and the option bits for Search, Present, Scan,
+ NamedResultSets, and concurrentOperations will be set, if requested by
+ the client. The maximum PDU size is negotiated down to a maximum of
+ 1 MB by default.
+ </para>
+
+ </refsect2>
+
+ <refsect2 id="zebrasrv-search">
+ <title>Z39.50 Search</title>
+
+ <para>
+ The supported query type are 1 and 101. All operators are currently
+ supported with the restriction that only proximity units of type "word"
+ are supported for the proximity operator.
+ Queries can be arbitrarily complex.
+ Named result sets are supported, and result sets can be used as operands
+ without limitations.
+ Searches may span multiple databases.
+ </para>
+
+ <para>
+ The server has full support for piggy-backed retrieval (see
+ also the following section).
+ </para>
+
+ </refsect2>
+
+ <refsect2 id="zebrasrv-present">
+ <title>Z39.50 Present</title>
+ <para>
+ The present facility is supported in a standard fashion. The requested
+ record syntax is matched against the ones supported by the profile of
+ each record retrieved. If no record syntax is given, SUTRS is the
+ default. The requested element set name, again, is matched against any
+ provided by the relevant record profiles.
+ </para>
+ </refsect2>
+ <refsect2 id="zebrasrv-scan">
+ <title>Z39.50 Scan</title>
+ <para>
+ The attribute combinations provided with the termListAndStartPoint are
+ processed in the same way as operands in a query (see above).
+ Currently, only the term and the globalOccurrences are returned with
+ the termInfo structure.
+ </para>
+ </refsect2>
+ <refsect2 id="zebrasrv-sort">
+ <title>Z39.50 Sort</title>
+
+ <para>
+ Z39.50 specifies three different types of sort criteria.
+ Of these Zebra supports the attribute specification type in which
+ case the use attribute specifies the "Sort register".
+ Sort registers are created for those fields that are of type "sort" in
+ the default.idx file.
+ The corresponding character mapping file in default.idx specifies the
+ ordinal of each character used in the actual sort.
+ </para>
+
+ <para>
+ Z39.50 allows the client to specify sorting on one or more input
+ result sets and one output result set.
+ Zebra supports sorting on one result set only which may or may not
+ be the same as the output result set.
+ </para>
+ </refsect2>
+ <refsect2 id="zebrasrv-close">
+ <title>Z39.50 Close</title>
+ <para>
+ If a Close PDU is received, the server will respond with a Close PDU
+ with reason=FINISHED, no matter which protocol version was negotiated
+ during initialization. If the protocol version is 3 or more, the
+ server will generate a Close PDU under certain circumstances,
+ including a session timeout (60 minutes by default), and certain kinds of
+ protocol errors. Once a Close PDU has been sent, the protocol
+ association is considered broken, and the transport connection will be
+ closed immediately upon receipt of further data, or following a short
+ timeout.
+ </para>
+ </refsect2>
+
+ <refsect2 id="zebrasrv-explain">
+ <title>Z39.50 Explain</title>
+ <para>
+ Zebra maintains a "classic"
+ <ulink url="&url.z39.50.explain;">Z39.50 Explain</ulink> database
+ on the side.
+ This database is called <literal>IR-Explain-1</literal> and can be
+ searched using the attribute set <literal>exp-1</literal>.
+ </para>
+ <para>
+ The records in the explain database are of type
+ <literal>grs.sgml</literal>.
+ The root element for the Explain grs.sgml records is
+ <literal>explain</literal>, thus
+ <filename>explain.abs</filename> is used for indexing.
+ </para>
+ <note>
+ <para>
+ Zebra <emphasis>must</emphasis> be able to locate
+ <filename>explain.abs</filename> in order to index the Explain
+ records properly. Zebra will work without it but the information
+ will not be searchable.
+ </para>
+ </note>
+ </refsect2>
+ </refsect1>
+ <refsect1 id="zebrasrv-sru">
+ <title>The SRU Server</title>
+ <para>
+ In addition to Z39.50, Zebra supports the more recent and
+ web-friendly IR protocol <ulink url="&url.sru;">SRU</ulink>.
+ SRU can be carried over SOAP or a REST-like protocol
+ that uses HTTP GET or POST to request search responses. The request
+ itself is made of parameters such as
+ <literal>query</literal>,
+ <literal>startRecord</literal>,
+ <literal>maximumRecords</literal>
+ and
+ <literal>recordSchema</literal>;
+ the response is an XML document containing hit-count, result-set
+ records, diagnostics, etc. SRU can be thought of as a re-casting
+ of Z39.50 semantics in web-friendly terms; or as a standardisation
+ of the ad-hoc query parameters used by search engines such as Google
+ and AltaVista; or as a superset of A9's OpenSearch (which it
+ predates).
+ </para>
+ <para>
+ Zebra supports Z39.50, SRU GET, SRU POST, SRU SOAP (SRW)
+ - on the same port, recognising what protocol is used by each incoming
+ requests and handling them accordingly. This is a achieved through
+ the use of Deep Magic; civilians are warned not to stand too close.
+ </para>
+ <refsect2 id="zebrasrv-sru-run">
+ <title>Running zebrasrv as an SRU Server</title>
+ <para>
+ Because Zebra supports all protocols on one port, it would
+ seem to follow that the SRU server is run in the same way as
+ the Z39.50 server, as described above. This is true, but only in
+ an uninterestingly vacuous way: a Zebra server run in this manner
+ will indeed recognise and accept SRU requests; but since it
+ doesn't know how to handle the CQL queries that these protocols
+ use, all it can do is send failure responses.
+ </para>
+ <note>
+ <para>
+ It is possible to cheat, by having SRU search Zebra with
+ a PQF query instead of CQL, using the
+ <literal>x-pquery</literal>
+ parameter instead of
+ <literal>query</literal>.
+ This is a
+ <emphasis role="strong">non-standard extension</emphasis>
+ of CQL, and a
+ <emphasis role="strong">very naughty</emphasis>
+ thing to do, but it does give you a way to see Zebra serving SRU
+ ``right out of the box''. If you start your favourite Zebra
+ server in the usual way, on port 9999, then you can send your web
+ browser to:
+ </para>
+ <screen>
+ http://localhost:9999/Default?version=1.1
+ &operation=searchRetrieve
+ &x-pquery=mineral
+ &startRecord=1
+ &maximumRecords=1
+ </screen>
+ <para>
+ This will display the XML-formatted SRU response that includes the
+ first record in the result-set found by the query
+ <literal>mineral</literal>. (For clarity, the SRU URL is shown
+ here broken across lines, but the lines should be joined to gether
+ to make single-line URL for the browser to submit.)
+ </para>
+ </note>
+ <para>
+ In order to turn on Zebra's support for CQL queries, it's necessary
+ to have the YAZ generic front-end (which Zebra uses) translate them
+ into the Z39.50 Type-1 query format that is used internally. And
+ to do this, the generic front-end's own configuration file must be
+ used. See <xref linkend="gfs-config"/>;
+ the salient point for SRU support is that
+ <command>zebrasrv</command>
+ must be started with the
+ <literal>-f frontendConfigFile</literal>
+ option rather than the
+ <literal>-c zebraConfigFile</literal>
+ option,
+ and that the front-end configuration file must include both a
+ reference to the Zebra configuration file and the CQL-to-PQF
+ translator configuration file.
+ </para>
+ <para>
+ A minimal front-end configuration file that does this would read as
+ follows:
+ </para>
+ <screen>
+<![CDATA[
+ <yazgfs>
+ <server>
+ <config>zebra.cfg</config>
+ <cql2rpn>../../tab/pqf.properties</cql2rpn>
+ </server>
+ </yazgfs>
+]]></screen>
+ <para>
+ The
+ <literal><config></literal>
+ element contains the name of the Zebra configuration file that was
+ previously specified by the
+ <literal>-c</literal>
+ command-line argument, and the
+ <literal><cql2rpn></literal>
+ element contains the name of the CQL properties file specifying how
+ various CQL indexes, relations, etc. are translated into Type-1
+ queries.
+ </para>
+ <para>
+ A zebra server running with such a configuration can then be
+ queried using proper, conformant SRU URLs with CQL queries:
+ </para>
+ <screen>
+ http://localhost:9999/Default?version=1.1
+ &operation=searchRetrieve
+ &query=title=utah and description=epicent*
+ &startRecord=1
+ &maximumRecords=1
+ </screen>
+ </refsect2>
+ </refsect1>
+ <refsect1 id="zebrasrv-sru-support">
+ <title>SRU Protocol Support and Behavior</title>
+ <para>
+ Zebra running as an SRU server supports SRU version 1.1, including
+ CQL version 1.1. In particular, it provides support for the
+ following elements of the protocol.
+ </para>
+
+ <refsect2 id="zebrasrvr-search-and-retrieval">
+ <title>SRU Search and Retrieval</title>
+ <para>
+ Zebra supports the
+ <ulink url="&url.sru.searchretrieve;">SRU searchRetrieve</ulink>
+ operation.
+ </para>
+ <para>
+ One of the great strengths of SRU is that it mandates a standard
+ query language, CQL, and that all conforming implementations can
+ therefore be trusted to correctly interpret the same queries. It
+ is with some shame, then, that we admit that Zebra also supports
+ an additional query language, our own Prefix Query Format
+ (<ulink url="&url.yaz.pqf;">PQF</ulink>).
+ A PQF query is submitted by using the extension parameter
+ <literal>x-pquery</literal>,
+ in which case the
+ <literal>query</literal>
+ parameter must be omitted, which makes the request not valid SRU.
+ Please don't do this.
+ </para>
+ </refsect2>
+
+ <refsect2 id="zebrasrv-sru-scan">
+ <title>SRU Scan</title>
+ <para>
+ Zebra supports <ulink url="&url.sru.scan;">SRU scan</ulink>
+ operation.
+ Scanning using CQL syntax is the default, where the
+ standard <literal>scanClause</literal> parameter is used.
+ </para>
+ <para>
+ In addition, a
+ mutant form of SRU scan is supported, using
+ the non-standard <literal>x-pScanClause</literal> parameter in
+ place of the standard <literal>scanClause</literal> to scan on a
+ PQF query clause.
+ </para>
+ </refsect2>
+
+ <refsect2 id="zebrasrv-sru-explain">
+ <title>SRU Explain</title>
+ <para>
+ Zebra supports <ulink url="&url.sru.explain;">SRU explain</ulink>.
+ </para>
+ <para>
+ The ZeeRex record explaining a database may be requested either
+ with a fully fledged SRU request (with
+ <literal>operation</literal>=<literal>explain</literal>
+ and version-number specified)
+ or with a simple HTTP GET at the server's basename.
+ The ZeeRex record returned in response is the one embedded
+ in the YAZ Frontend Server configuration file that is described in the
+ <xref linkend="gfs-config"/>.
+ </para>
+ <para>
+ Unfortunately, the data found in the
+ CQL-to-PQF text file must be added by hand-craft into the explain
+ section of the YAZ Frontend Server configuration file to be able
+ to provide a suitable explain record.
+ Too bad, but this is all extreme
+ new alpha stuff, and a lot of work has yet to be done ..
+ </para>
+ <para>
+ There is no linkeage whatsoever between the Z39.50 explain model
+ and the SRU explain response (well, at least not implemented
+ in Zebra, that is ..). Zebra does not provide a means using
+ Z39.50 to obtain the ZeeRex record.
+ </para>
+ </refsect2>
+
+ <refsect2 id="zebrasrv-non-sru-ops">
+ <title>Other SRU operations</title>
+ <para>
+ In the Z39.50 protocol, Initialization, Present, Sort and Close
+ are separate operations. In SRU, however, these operations do not
+ exist.
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ SRU has no explicit initialization handshake phase, but
+ commences immediately with searching, scanning and explain
+ operations.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Neither does SRU have a close operation, since the protocol is
+ stateless and each request is self-contained. (It is true that
+ multiple SRU request/response pairs may be implemented as
+ multiple HTTP request/response pairs over a single persistent
+ TCP/IP connection; but the closure of that connection is not a
+ protocol-level operation.)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Retrieval in SRU is part of the
+ <literal>searchRetrieve</literal> operation, in which a search
+ is submitted and the response includes a subset of the records
+ in the result set. There is no direct analogue of Z39.50's
+ Present operation which requests records from an established
+ result set. In SRU, this is achieved by sending a subsequent
+ <literal>searchRetrieve</literal> request with the query
+ <literal>cql.resultSetId=</literal><emphasis>id</emphasis> where
+ <emphasis>id</emphasis> is the identifier of the previously
+ generated result-set.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Sorting in CQL is done within the
+ <literal>searchRetrieve</literal> operation - in v1.1, by an
+ explicit <literal>sort</literal> parameter, but the forthcoming
+ v1.2 or v2.0 will most likely use an extension of the query
+ language, <ulink url="&url.cql.sorting;">CQL sorting</ulink>.
+ </para>
+ </listitem>
+ </itemizedlist>
+ <para>
+ It can be seen, then, that while Zebra operating as an SRU server
+ does not provide the same set of operations as when operating as a
+ Z39.50 server, it does provide equivalent functionality.
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1 id="zebrasrv-sru-examples">
+ <title>SRU Examples</title>
+ <para>
+ Surf into <literal>http://localhost:9999</literal>
+ to get an explain response, or use
+ <screen><![CDATA[
+ http://localhost:9999/?version=1.1&operation=explain
+ ]]></screen>
+ </para>
+ <para>
+ See number of hits for a query
+ <screen><![CDATA[
+ http://localhost:9999/?version=1.1&operation=searchRetrieve
+ &query=text=(plant%20and%20soil)
+ ]]></screen>
+ </para>
+ <para>
+ Fetch record 5-7 in Dublin Core format
+ <screen><![CDATA[
+ http://localhost:9999/?version=1.1&operation=searchRetrieve
+ &query=text=(plant%20and%20soil)
+ &startRecord=5&maximumRecords=2&recordSchema=dc
+ ]]></screen>
+ </para>
+ <para>
+ Even search using PQF queries using the <emphasis>extended naughty
+ verb</emphasis> <literal>x-pquery</literal>
+ <screen><![CDATA[
+ http://localhost:9999/?version=1.1&operation=searchRetrieve
+ &x-pquery=@attr%201=text%20@and%20plant%20soil
+ ]]></screen>
+ </para>
+ <para>
+ Or scan indexes using the <emphasis>extended extremely naughty
+ verb</emphasis> <literal>x-pScanClause</literal>
+ <screen><![CDATA[
+ http://localhost:9999/?version=1.1&operation=scan
+ &x-pScanClause=@attr%201=text%20something
+ ]]></screen>
+ <emphasis>Don't do this in production code!</emphasis>
+ But it's a great fast debugging aid.
+ </para>
+
+ </refsect1>
+
+ <refsect1 id="gfs-config"><title>YAZ server virtual hosts</title>
+ &zebrasrv-virtual;
+ </refsect1>
+
+ <refsect1><title>SEE ALSO</title>
+ <para>
+ <citerefentry>
+ <refentrytitle>zebraidx</refentrytitle>
+ <manvolnum>1</manvolnum>
+ </citerefentry>
+ </para>
+</refsect1>
+</refentry>
+
+ <!-- Keep this comment at the end of the file
+ Local variables:
+ mode: sgml
+ sgml-omittag:t
+ sgml-shorttag:t
+ sgml-minimize-attributes:nil
+ sgml-always-quote-attributes:t
+ sgml-indent-step:1
+ sgml-indent-data:t
+ sgml-parent-document: "zebra.xml"
+ sgml-local-catalogs: nil
+ sgml-namecase-general:t
+ End:
+ -->