doc/server.xml

   1 <chapter id="server">
   2  <!-- $Id: server.xml,v 1.22 2006-06-07 13:17:48 marc Exp $ -->
   3  <title>The Z39.50 Server</title>
   4
   5  <sect1 id="zebrasrv">
   6   <title>Running the Z39.50 Server (zebrasrv)</title>
   7
   8   <!--
   9    FIXME - We need to be consistent here, zebraidx had the options at the
  10            end, and lots of explaining text before them. Same for zebrasvr! -H
  11    FIXME - At least we need a small intro, what is zebrasvr, and how it
  12            can be run (inetd, nt service, stand-alone program, daemon...) -H
  13   -->
  14
  15   <!-- re-write by MC, using the newly created input files for the
  16    zebrasrv manpage -->
  17
  18
  19  <sect2><title>Description</title>
  20     <para>Zebra is a high-performance, general-purpose structured text indexing
  21    and retrieval engine. It reads structured records in a variety of input
  22    formats (e.g. email, XML, MARC) and allows access to them through exact
  23    boolean search expressions and relevance-ranked free-text queries.
  24    </para>
  25    <para>
  26     <command>zebrasrv</command> is the Z39.50 and  <ulink url="http://www.loc.gov/standards/sru/srw/">SRW</ulink>/U frontend
  27     server for the <command>Zebra</command> indexer.
  28    </para>
  29    <para>
  30     On Unix you can run the <command>zebrasrv</command>
  31     server from the command line - and put it
  32     in the background. It may also operate under the inet daemon.
  33     On WIN32 you can run the server as a console application or
  34     as a WIN32 Service.
  35    </para>
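        <para>
         As a rough sketch of stand-alone operation on Unix, the server
         might be started in the background as shown here. The listener
         address <literal>tcp:@:9999</literal> and the log file name
         <filename>zebrasrv.log</filename> are illustrative assumptions,
         not values prescribed by this manual:
        </para>
        <screen>
         zebrasrv -c zebra.cfg -l zebrasrv.log tcp:@:9999 &amp;
        </screen>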
  36   </sect2>
  37
  38  <sect2>
  39    <title>Synopsis</title>
  40     &zebrasrv-synopsis;
  41  </sect2>
  42
  43  <sect2>
  44    <title>Options</title>
  45
  46    <para>
  47     The options for <command>zebrasrv</command> are the same
  48     as those for YAZ' <command>yaz-ztest</command>.
  49     Option <literal>-c</literal> specifies a Zebra configuration
  50     file - if omitted <filename>zebra.cfg</filename> is read.
  51    </para>
  52
  53   &zebrasrv-options;
  54   </sect2>
  55
  56   <sect2><title>Files</title>
  57    <para>
  58     <filename>zebra.cfg</filename>
  59    </para>
  60   </sect2>
  61   <sect2><title>See Also</title>
  62    <para>
  63     <citerefentry>
  64      <refentrytitle>zebraidx</refentrytitle>
  65      <manvolnum>1</manvolnum>
  66     </citerefentry>,
  67     <citerefentry>
  68      <refentrytitle>yaz-ztest</refentrytitle>
  69      <manvolnum>8</manvolnum>
  70     </citerefentry>
  71    </para>
  72    <para>
  73     The Zebra software is Copyright <command>Index Data</command>
  74     <filename>http://www.indexdata.dk</filename>
  75     and distributed under the
  76     GPLv2 license.
  77    </para>
  78   </sect2>
  79
  80   <!--
  81   <para>
  82    <emphasis remap="bf">Syntax</emphasis>
  83
  84    <screen>
  85     zebrasrv [options] [listener-address ...]
  86    </screen>
  87
  88   </para>
  89
  90   <para>
  91    <emphasis remap="bf">Options</emphasis>
  92    <variablelist>
  93
  94     <varlistentry>
  95      <term>-a <replaceable>APDU file</replaceable></term>
  96      <listitem>
  97       <para>
  98        Specify a file for dumping PDUs (for diagnostic purposes).
  99        The special name "-" sends output to <literal>stderr</literal>.
 100       </para>
 101      </listitem>
 102     </varlistentry>
 103     <varlistentry>
 104      <term>-c <replaceable>config-file</replaceable></term>
 105      <listitem>
 106       <para>
 107        Read configuration information from
 108        <replaceable>config-file</replaceable>.
 109        The default configuration is <literal>./zebra.cfg</literal>.
 110       </para>
 111      </listitem>
 112     </varlistentry>
 113     <varlistentry>
 114      <term>-S</term>
 115      <listitem>
 116       <para>
 117        Don't fork on connection requests. This can be useful for
 118        symbolic-level debugging. The server can only accept a single
 119        connection in this mode.
 120       </para>
 121      </listitem>
 122     </varlistentry>
 123     <varlistentry>
 124      <term>-z</term>
 125      <listitem>
 126       <para>
 127        Use the Z39.50 protocol. Currently the only protocol supported.
 128        The option is retained for historical reasons, and for future
 129        extensions.
 130       </para>
 131      </listitem>
 132     </varlistentry>
 133     <varlistentry>
 134      <term>-l <replaceable>logfile</replaceable></term>
 135      <listitem>
 136       <para>
 137        Specify an output file for the diagnostic messages.
 138        The default is to write this information to <literal>stderr</literal>.
 139       </para>
 140      </listitem>
 141     </varlistentry>
 142     <varlistentry>
 143      <term>-v <replaceable>log-level</replaceable></term>
 144      <listitem>
 145       <para>
 146        The log level. Use a comma-separated list of members of the set
 147        {fatal,debug,warn,log,all,none}.
 148       </para>
 149      </listitem>
 150     </varlistentry>
 151     <varlistentry>
 152      <term>-u <replaceable>username</replaceable></term>
 153      <listitem>
 154       <para>
 155        Set user ID. Sets the real UID of the server process to that of the
 156        given <replaceable>username</replaceable>.
 157        It's useful if you aren't comfortable with having the
 158        server run as root, but you need to start it as such to bind a
 159        privileged port.
 160       </para>
 161      </listitem>
 162     </varlistentry>
 163     <varlistentry>
 164      <term>-w <replaceable>working-directory</replaceable></term>
 165      <listitem>
 166       <para>
 167        Change working directory.
 168       </para>
 169      </listitem>
 170     </varlistentry>
 171     <varlistentry>
 172      <term>-i</term>
 173      <listitem>
 174       <para>
 175        Run under the Internet superserver, <literal>inetd</literal>.
 176        Make sure you use the logfile option <literal>-l</literal> in
 177        conjunction with this mode and specify the <literal>-l</literal>
 178        option before any other options.
 179       </para>
 180      </listitem>
 181     </varlistentry>
 182     <varlistentry>
 183      <term>-t <replaceable>timeout</replaceable></term>
 184      <listitem>
 185       <para>
 186        Set the idle session timeout (default 60 minutes).
 187       </para>
 188      </listitem>
 189     </varlistentry>
 190     <varlistentry>
 191      <term>-k <replaceable>kilobytes</replaceable></term>
 192      <listitem>
 193       <para>
 194        Set the (approximate) maximum size of
 195        present response messages. Default is 1024 KB (1 MB).
 196       </para>
 197      </listitem>
 198     </varlistentry>
 199    </variablelist>
 200   </para>
 201   -->
 202  </sect1>
 203
 204
 205  <sect1 id="protocol-support">
 206   <title>Z39.50 Protocol Support and Behavior</title>
 207
 208   <sect2>
 209    <title>Initialization</title>
 210
 211    <para>
 212     During initialization, the server will negotiate to version 3 of the
 213     Z39.50 protocol, and the option bits for Search, Present, Scan,
 214     NamedResultSets, and concurrentOperations will be set, if requested by
 215     the client. The maximum PDU size is negotiated down to a maximum of
 216     1 MB by default.
 217    </para>
 218
 219   </sect2>
 220
 221   <sect2 id="search">
 222    <title>Search</title>
 223
 224    <!--
 225     FIXME - Need to explain the string tag stuff before people get bogged
 226             down with all these attribute numbers. Perhaps in its own
 227             chapter? -H
 228    -->
 229
 230    <para>
 231     The supported query types are 1 and 101. All operators are currently
 232     supported with the restriction that only proximity units of type "word"
 233     are supported for the proximity operator.
 234     Queries can be arbitrarily complex.
 235     Named result sets are supported, and result sets can be used as operands
 236     without limitations.
 237     Searches may span multiple databases.
 238    </para>
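        <para>
         As an illustration, assuming the PQF notation used by YAZ, a
         proximity search and a query that uses an earlier result set as
         an operand might look as follows in
         <command>yaz-client</command>. In the <literal>@prox</literal>
         parameters, the final <literal>2</literal> selects the
         "word" unit. The search terms are arbitrary, and the result
         set name <literal>1</literal> assumes that
         <command>yaz-client</command> is naming result sets
         numerically, which is believed to be its default:
        </para>
        <screen>
         Z> f @prox 0 1 1 2 k 2 information retrieval
         Z> f @and @set 1 @attr 1=4 utah
        </screen>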
 239
 240    <para>
 241     The server has full support for piggy-backed retrieval (see
 242     also the following section).
 243    </para>
 244
 245    <para>
 246     <emphasis>Use</emphasis> attributes are interpreted according to the
 247     attribute sets which have been loaded in the
 248     <literal>zebra.cfg</literal> file, and are matched against specific
 249     fields as specified in the <literal>.abs</literal> file which
 250     describes the profile of the records which have been loaded.
 251     If no Use attribute is provided, a default of Bib-1 Any is assumed.
 252    </para>
 253
 254    <para>
 255     If a <emphasis>Structure</emphasis> attribute of
 256     <emphasis>Phrase</emphasis> is used in conjunction with a
 257     <emphasis>Completeness</emphasis> attribute of
 258     <emphasis>Complete (Sub)field</emphasis>, the term is matched
 259     against the contents of the phrase (long word) register, if one
 260     exists for the given <emphasis>Use</emphasis> attribute.
 261     A phrase register is created for those fields in the
 262     <literal>.abs</literal> file that contain a
 263     <literal>p</literal>-specifier.
 264     <!-- ### whatever the hell _that_ is -->
 265    </para>
 266
 267    <para>
 268     If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
 269     used in conjunction with <emphasis>Incomplete Field</emphasis> - the
 270     default value for <emphasis>Completeness</emphasis>, the
 271     search is directed against the normal word registers, but if the term
 272     contains multiple words, the term will only match if all of the words
 273     are found immediately adjacent, and in the given order.
 274     The word search is performed on those fields that are indexed as
 275     type <literal>w</literal> in the <literal>.abs</literal> file.
 276    </para>
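        <para>
         Assuming the standard Bib-1 numbering (Structure is attribute
         type 4 with Phrase=1; Completeness is attribute type 6 with
         Incomplete Subfield=1 and Complete Field=3), the two phrase
         cases described above could be sketched in PQF as follows.
         The first query would be directed against the phrase register,
         the second against the word register:
        </para>
        <screen>
         @attr 1=4 @attr 4=1 @attr 6=3 "information retrieval"
         @attr 1=4 @attr 4=1 @attr 6=1 "information retrieval"
        </screen>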
 277
 278    <para>
 279     If the <emphasis>Structure</emphasis> attribute is
 280     <emphasis>Word List</emphasis>,
 281     <emphasis>Free-form Text</emphasis>, or
 282     <emphasis>Document Text</emphasis>, the term is treated as a
 283     natural-language, relevance-ranked query.
 284     This search type uses the word register, i.e. those fields
 285     that are indexed as type <literal>w</literal> in the
 286     <literal>.abs</literal> file.
 287    </para>
 288
 289    <para>
 290     If the <emphasis>Structure</emphasis> attribute is
 291     <emphasis>Numeric String</emphasis> the term is treated as an integer.
 292     The search is performed on those fields that are indexed
 293     as type <literal>n</literal> in the <literal>.abs</literal> file.
 294    </para>
 295
 296    <para>
 297     If the <emphasis>Structure</emphasis> attribute is
 298     <emphasis>URx</emphasis> the term is treated as a URX (URL) entity.
 299     The search is performed on those fields that are indexed as type
 300     <literal>u</literal> in the <literal>.abs</literal> file.
 301    </para>
 302
 303    <para>
 304     If the <emphasis>Structure</emphasis> attribute is
 305     <emphasis>Local Number</emphasis> the term is treated as a
 306     native Zebra Record Identifier.
 307    </para>
 308
 309    <para>
 310     If the <emphasis>Relation</emphasis> attribute is
 311     <emphasis>Equals</emphasis> (default), the term is matched
 312     in a normal fashion (modulo truncation and processing of
 313     individual words, if required).
 314     If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
 315     <emphasis>Less Than or Equal</emphasis>,
 316     <emphasis>Greater Than</emphasis>, or <emphasis>Greater Than or
 317      Equal</emphasis>, the term is assumed to be numerical, and a
 318     standard regular expression is constructed to match the given
 319     expression.
 320     If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
 321     the standard natural-language query processor is invoked.
 322    </para>
 323
 324    <para>
 325     For the <emphasis>Truncation</emphasis> attribute,
 326     <emphasis>No Truncation</emphasis> is the default.
 327     <emphasis>Left Truncation</emphasis> is not supported.
 328     <emphasis>Process # in search term</emphasis> is supported, as is
 329     <emphasis>Regxp-1</emphasis>.
 330     <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
 331     search. As a default, a single error (deletion, insertion,
 332     replacement) is accepted when terms are matched against the register
 333     contents.
 334    </para>
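        <para>
         For example, assuming the Bib-1 value 101 for
         <emphasis>Process # in search term</emphasis> on the
         Truncation attribute (type 5), a title search in which
         <literal>#</literal> stands for the truncated part of the
         term might be written as in this sketch:
        </para>
        <screen>
         @attr 1=4 @attr 5=101 "inform#"
        </screen>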
 335
 336    <sect3>
 337     <title>Regular expressions</title>
 338
 339     <para>
 340      Each term in a query is interpreted as a regular expression if
 341      the truncation value is either <emphasis>Regxp-1</emphasis> (102)
 342      or <emphasis>Regxp-2</emphasis> (103).
 343      Both query types follow the same syntax with the operands:
 344      <variablelist>
 345
 346       <varlistentry>
 347        <term>x</term>
 348        <listitem>
 349         <para>
 350          Matches the character <emphasis>x</emphasis>.
 351         </para>
 352        </listitem>
 353       </varlistentry>
 354       <varlistentry>
 355        <term>.</term>
 356        <listitem>
 357         <para>
 358          Matches any character.
 359         </para>
 360        </listitem>
 361       </varlistentry>
 362       <varlistentry>
 363        <term><literal>[</literal>..<literal>]</literal></term>
 364        <listitem>
 365         <para>
 366          Matches the set of characters specified;
 367          such as <literal>[abc]</literal> or <literal>[a-c]</literal>.
 368         </para>
 369        </listitem>
 370       </varlistentry>
 371      </variablelist>
 372      and the operators:
 373      <variablelist>
 374
 375       <varlistentry>
 376        <term>x*</term>
 377        <listitem>
 378         <para>
 379          Matches <emphasis>x</emphasis> zero or more times. Priority: high.
 380         </para>
 381        </listitem>
 382       </varlistentry>
 383       <varlistentry>
 384        <term>x+</term>
 385        <listitem>
 386         <para>
 387          Matches <emphasis>x</emphasis> one or more times. Priority: high.
 388         </para>
 389        </listitem>
 390       </varlistentry>
 391       <varlistentry>
 392        <term>x?</term>
 393        <listitem>
 394         <para>
 395          Matches <emphasis>x</emphasis> zero or one time. Priority: high.
 396         </para>
 397        </listitem>
 398       </varlistentry>
 399       <varlistentry>
 400        <term>xy</term>
 401        <listitem>
 402         <para>
 403          Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
 404          Priority: medium.
 405         </para>
 406        </listitem>
 407       </varlistentry>
 408       <varlistentry>
 409        <term>x|y</term>
 410        <listitem>
 411         <para>
 412          Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
 413          Priority: low.
 414         </para>
 415        </listitem>
 416       </varlistentry>
 417      </variablelist>
 418      The order of evaluation may be changed by using parentheses.
 419     </para>
 420
 421     <para>
 422      If the first character of the <emphasis>Regxp-2</emphasis> query
 423      is a plus character (<literal>+</literal>) it marks the
 424      beginning of a section with non-standard specifiers.
 425      The next plus character marks the end of the section.
 426      Currently Zebra only supports one specifier, the error tolerance,
 427      which consists of one digit.
 428     </para>
 429
 430     <para>
 431      Since the plus operator is normally a suffix operator the addition to
 432      the query syntax doesn't violate the syntax for standard regular
 433      expressions.
 434     </para>
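         <para>
          Following the description above, a
          <emphasis>Regxp-2</emphasis> title search that tolerates up
          to two errors might be written as sketched here. The
          tolerance value and the search term are chosen only for
          illustration:
         </para>
         <screen>
          @attr 1=4 @attr 5=103 "+2+information"
         </screen>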
 435
 436    </sect3>
 437
 438    <sect3>
 439     <title>Query examples</title>
 440
 441     <para>
 442      Phrase search for <emphasis>information retrieval</emphasis> in
 443      the title-register:
 444      <screen>
 445       @attr 1=4 "information retrieval"
 446      </screen>
 447     </para>
 448
 449     <para>
 450      Ranked search for the same thing:
 451      <screen>
 452       @attr 1=4 @attr 2=102 "Information retrieval"
 453      </screen>
 454     </para>
 455
 456     <para>
 457      Phrase search with a regular expression:
 458      <screen>
 459       @attr 1=4 @attr 5=102 "informat.* retrieval"
 460      </screen>
 461     </para>
 462
 463     <para>
 464      Ranked search with a regular expression:
 465      <screen>
 466       @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
 467      </screen>
 468     </para>
 469
 470     <para>
 471      In the GILS schema (<literal>gils.abs</literal>), the
 472      west-bounding-coordinate is indexed as type <literal>n</literal>,
 473      and is therefore searched by specifying
 474      <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
 475      To match all those records with west-bounding-coordinate greater
 476      than -114 we use the following query:
 477      <screen>
 478       @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
 479      </screen>
 480     </para>
 481    </sect3>
 482    </sect2>
 483
 484   <sect2>
 485    <title>Present</title>
 486    <para>
 487     The present facility is supported in a standard fashion. The requested
 488     record syntax is matched against the ones supported by the profile of
 489     each record retrieved. If no record syntax is given, SUTRS is the
 490     default. The requested element set name, again, is matched against any
 491     provided by the relevant record profiles.
 492    </para>
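        <para>
         A <command>yaz-client</command> session along the following
         lines could be used to request a particular record syntax and
         element set name. Whether <literal>xml</literal> and the
         element set <literal>F</literal> are actually available
         depends on the profile of the records, so both are
         assumptions here:
        </para>
        <screen>
         Z> f @attr 1=4 utah
         Z> format xml
         Z> elements F
         Z> show 1+3
        </screen>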
 493   </sect2>
 494   <sect2>
 495    <title>Scan</title>
 496    <para>
 497     The attribute combinations provided with the termListAndStartPoint are
 498     processed in the same way as operands in a query (see above).
 499     Currently, only the term and the globalOccurrences are returned with
 500     the termInfo structure.
 501    </para>
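        <para>
         As a sketch, a scan of the title register starting near an
         arbitrary example term could be issued from
         <command>yaz-client</command> like this:
        </para>
        <screen>
         Z> scan @attr 1=4 inform
        </screen>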
 502   </sect2>
 503   <sect2>
 504    <title>Sort</title>
 505
 506    <para>
 507     Z39.50 specifies three different types of sort criteria.
 508     Of these, Zebra supports the attribute specification type, in which
 509     case the Use attribute specifies the "Sort register".
 510     Sort registers are created for those fields that are of type "sort" in
 511     the default.idx file.
 512     The corresponding character mapping file in default.idx specifies the
 513     ordinal of each character used in the actual sort.
 514    </para>
 515
 516    <para>
 517     Z39.50 allows the client to specify sorting on one or more input
 518     result sets and one output result set.
 519     Zebra supports sorting on one result set only, which may or may not
 520     be the same as the output result set.
 521    </para>
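        <para>
         The following <command>yaz-client</command> session is a
         tentative sketch of an attribute-based sort. It assumes that
         a sort register exists for Use attribute 4 (title) and that
         the client's <literal>sort</literal> command accepts a
         <literal>type=value</literal> sort key followed by a flag for
         ascending order (written here as the XML entity for the
         less-than character):
        </para>
        <screen>
         Z> f @attr 1=4 utah
         Z> sort 1=4 &lt;
         Z> show 1+3
        </screen>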
 522   </sect2>
 523   <sect2>
 524    <title>Close</title>
 525    <para>
 526     If a Close PDU is received, the server will respond with a Close PDU
 527     with reason=FINISHED, no matter which protocol version was negotiated
 528     during initialization. If the protocol version is 3 or more, the
 529     server will generate a Close PDU under certain circumstances,
 530     including a session timeout (60 minutes by default), and certain kinds of
 531     protocol errors. Once a Close PDU has been sent, the protocol
 532     association is considered broken, and the transport connection will be
 533     closed immediately upon receipt of further data, or following a short
 534     timeout.
 535    </para>
 536   </sect2>
 537
 538    <sect2>
 539     <title>Explain</title>
 540     <para>
 541      Zebra maintains a "classic"
 542      <ulink url="&url.z39.50.explain;">Explain</ulink> database
 543      on the side.
 544      This database is called <literal>IR-Explain-1</literal> and can be
 545      searched using the attribute set <literal>exp-1</literal>.
 546     </para>
 547     <para>
 548      The records in the explain database are of type
 549      <literal>grs.sgml</literal> and can be retrieved as
 550      <literal>SUTRS</literal>, <literal>XML</literal>,
 551      <literal>GRS-1</literal> and  <literal>ASN.1</literal> Explain.
 552     </para>
 553     <para>
 554      Classic Explain only defines retrieval of Explain information
 555      via ASN.1. Practically no Z39.50 clients support this. Fortunately
 556      they don't have to - since Zebra allows retrieval of this information
 557      in the other formats.
 558     </para>
 559     <para>
 560      The root element for the Explain grs.sgml records is
 561      <literal>explain</literal>, thus
 562      <filename>explain.abs</filename> is used for indexing.
 563     </para>
 564     <note>
 565      <para>
 566       Zebra <emphasis>must</emphasis> be able to locate
 567       <filename>explain.abs</filename> in order to index the Explain
 568       records properly. Zebra will work without it but the information
 569       will not be searchable.
 570      </para>
 571     </note>
 572     <para>
 573      The following Explain categories are supported:
 574      <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
 575      <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
 576     </para>
 577     <para>
 578      The following Explain search attributes are supported:
 579      <literal>ExplainCategory</literal> (@attr 1=1),
 580      <literal>DatabaseName</literal> (@attr 1=3),
 581      <literal>DateAdded</literal> (@attr 1=9),
 582      <literal>DateChanged</literal> (@attr 1=10).
 583      See <filename>tab/explain.att</filename> for more information.
 584     </para>
 585
 586     <sect3>
 587      <title>Example searches with yaz-client</title>
 588
 589
 590      <para>
 591       List supported categories to find out which explain commands are
 592       supported:
 593       <screen>
 594        Z> base IR-Explain-1
 595        Z> f @attr exp1 1=1 categorylist
 596        Z> form sutrs
 597        Z> show 1+2
 598       </screen>
 599      </para>
 600
 601      <para>
 602       Get target info, that is, investigate which databases exist at
 603       this server endpoint:
 604       <screen>
 605        Z> base IR-Explain-1
 606        Z> f @attr exp1 1=1 targetinfo
 607        Z> form xml
 608        Z> show 1+1
 609        Z> form grs-1
 610        Z> show 1+1
 611        Z> form sutrs
 612        Z> show 1+1
 613       </screen>
 614      </para>
 615
 616      <para>
 617       List all supported databases, the number of hits
 618       is the number of databases found, which most commonly are the
 619       following two:
 620       the <literal>Default</literal> and the
 621       <literal>IR-Explain-1</literal> databases.
 622       <screen>
 623        Z> base IR-Explain-1
 624        Z> f @attr exp1 1=1 databaseinfo
 625        Z> form sutrs
 626        Z> show 1+2
 627       </screen>
 628      </para>
 629
 630      <para>
 631       Get database info record for database <literal>Default</literal>.
 632       <screen>
 633        Z> base IR-Explain-1
 634        Z> f @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
 635       </screen>
 636       Identical query with explicitly specified attribute set:
 637       <screen>
 638        Z> base IR-Explain-1
 639        Z> f @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
 640       </screen>
 641      </para>
 642
 643      <para>
 644       Get attribute details record for database
 645       <literal>Default</literal>.
 646       This query is very useful to study the internal Zebra indexes.
 647       If records have been indexed using the <literal>alvis</literal>
 648       XSLT filter, the string representation names of the known indexes can be
 649       found.
 650       <screen>
 651        Z> base IR-Explain-1
 652        Z> f @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
 653       </screen>
 654       Identical query with explicitly specified attribute set:
 655       <screen>
 656        Z> base IR-Explain-1
 657        Z> f @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
 658       </screen>
 659      </para>
 660
 661     </sect3>
 662    </sect2>
 663  </sect1>
 664 </chapter>
 665
 666
 667 <chapter id="server-sru">
 668  <title>The SRU/SRW Server</title>
 669  <para>
 670   In addition to Z39.50, Zebra supports the more recent and
 671   web-friendly IR protocol SRU, described at
 672   <ulink url="http://www.loc.gov/sru"/>.
 673   SRU is ``Search/Retrieve via URL'', a simple, REST-like protocol
 674   that uses HTTP GET to request search responses.  The request
 675   itself is made of parameters such as
 676   <literal>query</literal>,
 677   <literal>startRecord</literal>,
 678   <literal>maximumRecords</literal>
 679   and
 680   <literal>recordSchema</literal>;
 681   the response is an XML document containing hit-count, result-set
 682   records, diagnostics, etc.  SRU can be thought of as a re-casting
 683   of Z39.50 semantics in web-friendly terms; or as a standardisation
 684   of the ad-hoc query parameters used by search engines such as Google
 685   and AltaVista; or as a superset of A9's OpenSearch (which it
 686   predates).
 687  </para>
 688  <para>
 689   Zebra further supports SRW, described at
 690   <ulink url="http://www.loc.gov/srw"/>.
 691   SRW is the ``Search/Retrieve Web Service'', a SOAP-based alternative
 692   implementation of the abstract protocol that SRU implements as HTTP
 693   GET requests.  In SRW, requests are encoded as XML documents which
 694   are posted to the server.  The responses are identical to those
 695   returned by SRU servers, except that they are wrapped in several
 696   layers of SOAP envelope.
 697  </para>
 698  <para>
 699   Zebra supports all three protocols - Z39.50, SRU and SRW - on the
 700   same port, recognising what protocol is used by each incoming
 701   request and handling it accordingly.  This is achieved through
 702   the use of Deep Magic; civilians are warned not to stand too close.
 703  </para>
 704  <para>
 705   From here on, ``SRU'' is used to indicate both the SRU and SRW
 706   protocols, as they are identical except for the transport used for
 707   the protocol packets and Zebra's support for them is equivalent.
 708  </para>
 709
 710  <sect1 id="server-sru-run">
 711   <title>Running the SRU Server (zebrasrv)</title>
 712   <para>
 713    Because Zebra supports all three protocols on one port, it would
 714    seem to follow that the SRU server is run in the same way as
 715    the Z39.50 server, as described above.  This is true, but only in
 716    an uninterestingly vacuous way: a Zebra server run in this manner
 717    will indeed recognise and accept SRU requests; but since it
 718    doesn't know how to handle the CQL queries that these protocols
 719    use, all it can do is send failure responses.
 720   </para>
 721   <note>
 722    <para>
 723     It is possible to cheat, by having SRU search Zebra with
 724     a PQF query instead of CQL, using the
 725     <literal>x-pquery</literal>
 726     parameter instead of
 727     <literal>query</literal>.
 728     This is a
 729     <emphasis role="strong">non-standard extension</emphasis>
 730     of SRU, and a
 731     <emphasis role="strong">very naughty</emphasis>
 732     thing to do, but it does give you a way to see Zebra serving SRU
 733     ``right out of the box''.  If you start your favourite Zebra
 734     server in the usual way, on port 9999, then you can send your web
 735     browser to:
 736   </para>
 737   <screen>
 738         http://localhost:9999/Default?version=1.1
 739                 &amp;operation=searchRetrieve
 740                 &amp;x-pquery=mineral
 741                 &amp;startRecord=1
 742                 &amp;maximumRecords=1
 743   </screen>
 744   <para>
 745     This will display the XML-formatted SRU response that includes the
 746     first record in the result-set found by the query
 747     <literal>mineral</literal>.  (For clarity, the SRU URL is shown
 748     here broken across lines, but the lines should be joined together
 749     to make a single-line URL for the browser to submit.)
 750    </para>
 751   </note>
 752   <para>
 753    In order to turn on Zebra's support for CQL queries, it's necessary
 754    to have the YAZ generic front-end (which Zebra uses) translate them
 755    into the Z39.50 Type-1 query format that is used internally.  And
 756    to do this, the generic front-end's own configuration file must be
 757    used.  This file is described
 758    <link linkend="gfs-config">elsewhere</link>;
 759    the salient point for SRU support is that
 760    <command>zebrasrv</command>
 761    must be started with the
 762    <literal>-f&nbsp;frontendConfigFile</literal>
 763    option rather than the
 764    <literal>-c&nbsp;zebraConfigFile</literal>
 765    option,
 766    and that the front-end configuration file must include both a
 767    reference to the Zebra configuration file and the CQL-to-PQF
 768    translator configuration file.
 769   </para>
 770   <para>
 771    A minimal front-end configuration file that does this would read as
 772    follows:
 773   </para>
 774   <screen><![CDATA[
 775         <yazgfs>
 776           <server>
 777             <config>zebra.cfg</config>
 778             <cql2rpn>../../tab/pqf.properties</cql2rpn>
 779           </server>
 780         </yazgfs>
 781 ]]></screen>
 782   <para>
 783    The
 784    <literal>&lt;config&gt;</literal>
 785    element contains the name of the Zebra configuration file that was
 786    previously specified by the
 787    <literal>-c</literal>
 788    command-line argument, and the
 789    <literal>&lt;cql2rpn&gt;</literal>
 790    element contains the name of the CQL properties file specifying how
 791    various CQL indexes, relations, etc. are translated into Type-1
 792    queries.
 793   </para>
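       <para>
        To give an idea of what the CQL properties file contains, the
        following lines are an illustrative fragment in the style of
        the <filename>pqf.properties</filename> file distributed with
        YAZ. Each line maps a CQL index or relation to a Type-1
        attribute. The exact entries should be checked against the
        file actually installed:
       </para>
       <screen>
        index.cql.serverChoice = 1=1016
        index.dc.title         = 1=4
        relation.eq            = 2=3
       </screen>
       <para>
        Assuming the front-end configuration is saved under a
        hypothetical name such as <filename>yazgfs.xml</filename>,
        the server could then be started with:
       </para>
       <screen>
        zebrasrv -f yazgfs.xml
       </screen>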
 794   <para>
 795    A Zebra server running with such a configuration can then be
 796    queried using proper, conformant SRU URLs with CQL queries:
 797   </para>
 798   <screen>
 799         http://localhost:9999/Default?version=1.1
 800                 &amp;operation=searchRetrieve
 801                 &amp;query=title=utah and description=epicent*
 802                 &amp;startRecord=1
 803                 &amp;maximumRecords=1
 804   </screen>
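       <para>
        As with the earlier URL, the lines must be joined before
        submission. A client less forgiving than a web browser will
        also expect the spaces and the inner <literal>=</literal>
        characters of the CQL query to be percent-encoded. A joined,
        encoded form of the same request would look roughly like this:
       </para>
       <screen>
        http://localhost:9999/Default?version=1.1&amp;operation=searchRetrieve&amp;query=title%3Dutah%20and%20description%3Depicent*&amp;startRecord=1&amp;maximumRecords=1
       </screen>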
 805  </sect1>
 806
 807  <sect1 id="server-sru-support">
 808   <title>SRU and SRW Protocol Support and Behavior</title>
 809   <para>
 810    Zebra running as an SRU server supports SRU version 1.1, including
 811    CQL version 1.1.  In particular, it provides support for the
 812    following elements of the protocol.
 813   </para>
 814
 815   <sect2>
 816    <title>Search and Retrieval</title>
 817    <para>
 818     Zebra fully supports SRU's core
 819     <literal>searchRetrieve</literal>
 820     operation, as described at
 821     <ulink url="http://www.loc.gov/standards/sru/sru-spec.html"/>.
 822    </para>
 823    <para>
 824     One of the great strengths of SRU is that it mandates a standard
 825     query language, CQL, and that all conforming implementations can
 826     therefore be trusted to correctly interpret the same queries.  It
 827     is with some shame, then, that we admit that Zebra also supports
 828     an additional query language, our own Prefix Query Format (PQF,
 829     <ulink url="http://indexdata.com/yaz/doc/tools.tkl#PQF"/>).
 830     A PQF query is submitted by using the extension parameter
 831     <literal>x-pquery</literal>,
 832     in which case the
 833     <literal>query</literal>
 834     parameter must be omitted, which makes the request not valid SRU.
 835     Please don't do this.
 836    </para>
 837   </sect2>
 838
 839   <sect2>
 840    <title>Scan</title>
 841    <para>
 842     Zebra does <emphasis>not</emphasis> support SRU's
 843     <literal>scan</literal>
 844     operation, as described at
    <ulink url="http://www.loc.gov/standards/sru/scan/"/>.
 846    </para>
 847    <para>
 848     This is a rather embarrassing surprise as the pieces are all
 849     there: Z39.50 scan is supported, and SRU scan requests are
 850     recognised and diagnosed.  To add further to the embarrassment, a
 851     mutant form of SRU scan <emphasis>is</emphasis> supported, using
 852     the non-standard <literal>x-pScanClause</literal> parameter in
 853     place of the standard <literal>scanClause</literal> to scan on a
 854     PQF query clause.
 855    </para>
 856   </sect2>
 857
 858   <sect2>
 859    <title>Explain</title>
 860    <para>
 861     Zebra fully supports SRU's core
 862     <literal>explain</literal>
 863     operation, as described at
    <ulink url="http://www.loc.gov/standards/sru/explain/index.html"/>.
 865    </para>
 866    <para>
 867     The ZeeRex record explaining a database may be requested either
 868     with a fully fledged SRU request (with
 869     <literal>operation</literal>=<literal>explain</literal>
 870     and version-number specified)
 871     or with a simple HTTP GET at the server's basename.
 872     The ZeeRex record returned in response is the one embedded
 873     in the YAZ Frontend Server configuration file that is described in the
 874     <link linkend="gfs-config">Virtual Hosts</link> documentation.
 875    </para>
 876     <para>
     Unfortunately, the data found in the
     CQL-to-PQF text file must be added by hand to the explain
     section of the YAZ Frontend Server configuration file in order
     to provide a suitable explain record.
     This is all very new, alpha-quality functionality, and a lot of
     work remains to be done.
 883     </para>
 884     <para>
     There is no linkage whatsoever between the Z39.50 explain model
     and the SRU/SRW explain response (at least, none is implemented
     in Zebra).  Zebra does not provide a means of obtaining the
     ZeeRex record using Z39.50.
 889      </para>
 890   </sect2>
 891
 892   <sect2>
 893    <title>Some SRU Examples</title>
 894     <para>
     Point a browser at <literal>http://localhost:9999</literal>
     to get an explain response, or use the explicit request:
 897      <screen><![CDATA[
 898       http://localhost:9999/?version=1.1&operation=explain
 899      ]]></screen>
 900     </para>
 901     <para>
     See the number of hits for a query:
 903      <screen><![CDATA[
 904        http://localhost:9999/?version=1.1&operation=searchRetrieve
 905                        &query=text=(plant%20and%20soil)
 906      ]]></screen>
 907     </para>
 908     <para>
     Fetch records 5-7 in Dublin Core format:
 910      <screen><![CDATA[
 911        http://localhost:9999/?version=1.1&operation=searchRetrieve
 912                        &query=text=(plant%20and%20soil)
                       &startRecord=5&maximumRecords=3&recordSchema=dc
 914      ]]></screen>
 915     </para>
 916     <para>
     You can even search using PQF queries, via the <emphasis>naughty
     extension parameter</emphasis> <literal>x-pquery</literal>:
 919      <screen><![CDATA[
 920       http://localhost:9999/?version=1.1&operation=searchRetrieve
 921                        &x-pquery=@attr%201=text%20@and%20plant%20soil
 922      ]]></screen>
 923     </para>
 924     <para>
     Or scan indexes using the <emphasis>extremely naughty extension
     parameter</emphasis> <literal>x-pScanClause</literal>:
 927      <screen><![CDATA[
 928       http://localhost:9999/?version=1.1&operation=scan
 929                        &x-pScanClause=@attr%201=text%20something
 930      ]]></screen>
 931      <emphasis>Don't do this in production code!</emphasis>
 932      But it's a great fast debugging aid.
 933     </para>
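    <para>
     To read the hit count from a program rather than by eye, a
     client can parse the XML response.  The following is a minimal
     sketch, assuming Python 3 and a response that uses the SRU 1.1
     namespace <literal>http://www.loc.gov/zing/srw/</literal> with a
     <literal>numberOfRecords</literal> element.  The sample response
     is hand-written for illustration, not captured from a Zebra
     server, and <literal>hit_count</literal> is a hypothetical
     helper name.
    </para>
    <screen><![CDATA[
```python
import xml.etree.ElementTree as ET

# Namespace assumed for SRU/SRW 1.1 response elements.
SRW_NS = {"zs": "http://www.loc.gov/zing/srw/"}


def hit_count(response_xml):
    """Return <zs:numberOfRecords> as an int, or None if it is absent."""
    root = ET.fromstring(response_xml)
    node = root.find("zs:numberOfRecords", SRW_NS)
    return int(node.text) if node is not None else None


# Hand-written sample, not output captured from a real server.
SAMPLE = """<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>42</zs:numberOfRecords>
</zs:searchRetrieveResponse>"""

print(hit_count(SAMPLE))
```
]]></screen>
    <para>
     Returning <literal>None</literal> when the element is missing
     lets the caller distinguish a diagnostic-only response from a
     search that matched zero records.
    </para>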
 934   </sect2>
 935
 936   <sect2>
 937    <title>Initialization, Present, Sort, Close</title>
 938    <para>
 939     In the Z39.50 protocol, Initialization, Present, Sort and Close
 940     are separate operations.  In SRU, however, these operations do not
 941     exist.
 942    </para>
 943    <itemizedlist>
 944     <listitem>
 945      <para>
 946       SRU has no explicit initialization handshake phase, but
 947       commences immediately with searching, scanning and explain
 948       operations.
 949      </para>
 950     </listitem>
 951     <listitem>
 952      <para>
 953       Neither does SRU have a close operation, since the protocol is
 954       stateless and each request is self-contained.  (It is true that
 955       multiple SRU request/response pairs may be implemented as
 956       multiple HTTP request/response pairs over a single persistent
 957       TCP/IP connection; but the closure of that connection is not a
 958       protocol-level operation.)
 959      </para>
 960     </listitem>
 961     <listitem>
 962      <para>
 963       Retrieval in SRU is part of the
 964       <literal>searchRetrieve</literal> operation, in which a search
 965       is submitted and the response includes a subset of the records
 966       in the result set.  There is no direct analogue of Z39.50's
 967       Present operation which requests records from an established
 968       result set.  In SRU, this is achieved by sending a subsequent
 969       <literal>searchRetrieve</literal> request with the query
 970       <literal>cql.resultSetId=</literal><emphasis>id</emphasis> where
 971       <emphasis>id</emphasis> is the identifier of the previously
 972       generated result-set.
 973      </para>
 974     </listitem>
 975     <listitem>
 976      <para>
      Sorting in SRU is done within the
      <literal>searchRetrieve</literal> operation - in v1.1, by an
      explicit <literal>sortKeys</literal> parameter, but the forthcoming
      v1.2 or v2.0 will most likely use an extension of the query
      language, CQL, for sorting: see
      <ulink url="http://zing.z3950.org/cql/sorting.html"/>.
 983      </para>
 984     </listitem>
 985    </itemizedlist>
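   <para>
    The Present-style retrieval described in the list above can be
    sketched as follows.  This is a minimal illustration, assuming
    Python 3; <literal>present_url</literal> and the identifier
    <literal>rs1</literal> are hypothetical, and a real identifier
    would have to be taken from an earlier
    <literal>searchRetrieve</literal> response.  How long a result
    set remains available on the server is not covered here.
   </para>
   <screen><![CDATA[
```python
from urllib.parse import urlencode, quote


def present_url(base, result_set_id, start, count, version="1.1"):
    """Build a searchRetrieve URL that re-reads an existing result set."""
    params = [
        ("version", version),
        ("operation", "searchRetrieve"),
        # The CQL query names the result set instead of repeating the search.
        ("query", "cql.resultSetId=" + result_set_id),
        ("startRecord", str(start)),
        ("maximumRecords", str(count)),
    ]
    return base + "?" + urlencode(params, quote_via=quote)


# "rs1" is a made-up identifier used only for illustration.
next_page = present_url("http://localhost:9999/", "rs1", 11, 10)
print(next_page)
```
]]></screen>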
 986    <para>
 987     It can be seen, then, that while Zebra operating as an SRU server
 988     does not provide the same set of operations as when operating as a
 989     Z39.50 server, it does provide equivalent functionality.
 990    </para>
 991   </sect2>
 992  </sect1>
 993 </chapter>
 994
 995  <!-- Keep this comment at the end of the file
 996  Local variables:
 997  mode: sgml
 998  sgml-omittag:t
 999  sgml-shorttag:t
1000  sgml-minimize-attributes:nil
1001  sgml-always-quote-attributes:t
1002  sgml-indent-step:1
1003  sgml-indent-data:t
1004  sgml-parent-document: "zebra.xml"
1005  sgml-local-catalogs: nil
1006  sgml-namecase-general:t
1007  End:
1008  -->