   1 <chapter id="examples">
   2  <!-- $Id: examples.xml,v 1.8 2002-10-10 14:27:18 heikki Exp $ -->
   3  <title>Example Configurations</title>
   4
   5  <sect1>
   6   <title>Overview</title>
   7
   8   <para>
   9    <literal>zebraidx</literal> and <literal>zebrasrv</literal> are both
  10    driven by a master configuration file, which may refer to other
  11    subsidiary configuration files.  By default, they try to use
  12    <filename>zebra.cfg</filename> in the working directory as the
  13    master file; but this can be changed using the <literal>-c</literal>
  14    option to specify an alternative master configuration file.
  15   </para>
  16   <para>
  17    The master configuration file tells Zebra:
  18    <itemizedlist>
  19
  20     <listitem>
  21      <para>
  22       Where to find subsidiary configuration files, including
  23       <literal>default.idx</literal>
  24       which specifies the default indexing rules.
  25      </para>
  26     </listitem>
  27
  28     <listitem>
  29      <para>
  30       What attribute sets to recognise in searches.
  31      </para>
  32     </listitem>
  33
  34     <listitem>
  35      <para>
  36       Policy details such as what record type to expect, what
  37       low-level indexing algorithm to use, how to identify potential
  38       duplicate records, etc.
  39      </para>
  40     </listitem>
  41
  42    </itemizedlist>
  43   </para>
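       <para>
        As a rough sketch of how these three kinds of information might
        appear in a master file, consider the fragment below.  The
        <literal>profilePath</literal> and <literal>recordType</literal>
        values are the ones used in Example 1; the
        <literal>attset</literal> and <literal>recordId</literal> lines
        are illustrative assumptions, and none of the examples in this
        chapter depend on them.
        <screen>
         # Where to find subsidiary configuration files
         profilePath: .:../../tab:../../../yaz/tab
         # Attribute set to recognise in searches (assumed value)
         attset: bib1.att
         # Policy: what record type to expect
         recordType: grs.sgml
         # Policy: how records are identified on update (assumed value)
         recordId: file
        </screen>
       </para>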
  44   <para>
  45    Now let's see what goes in the <literal>zebra.cfg</literal> file
  46    for some example configurations.
  47   </para>
  48  </sect1>
  49
  50  <sect1 id="example1">
  51   <title>Example 1: XML Indexing And Searching</title>
  52
  53   <para>
  54    This example shows how Zebra can be used with absolutely minimal
  55    configuration to index a body of
  56    <ulink url="http://www.w3.org/xml/###">XML</ulink>
  57    documents, and search them using
  58    <ulink url="http://www.w3.org/xpath/###">XPath</ulink>
  59    expressions to specify access points.
  60   </para>
  61   <para>
  62    Go to the <literal>examples/dinosauricon</literal> subdirectory
  63    of the distribution archive.
  64    There you will find a <literal>records</literal> subdirectory,
  65    which contains some raw XML data to be added to the database: in
  66    this case, a single file, <literal>genera.xml</literal>,
  67    which contains information about all the known dinosaur genera as of
  68    August 2002.
  69   </para>
  70   <para>
  71    Now we need to create the Zebra database, which we do with the
  72    Zebra indexer, <literal>zebraidx</literal>, which is
  73    driven by the <literal>zebra.cfg</literal> configuration file.
  74    For our purposes, we don't need any
  75    special behaviour - we can use the defaults - so we start with a
  76    minimal file that just tells <literal>zebraidx</literal> where to
  77    find the default indexing rules, and how to parse the records:
  78    <screen>
  79     profilePath: .:../../tab:../../../yaz/tab
  80     recordType: grs.sgml
  81    </screen>
  82   </para>
  83   <para>
  84    That's all you need for a minimal Zebra configuration.  Now you can
  85    roll the XML records into the database and build the indexes:
  86    <screen>
  87     zebraidx update records
  88    </screen>
  89   </para>
  90   <para>
  91    Now start the server.  Like the indexer, its behaviour is
  92    controlled by the
  93    <literal>zebra.cfg</literal> file; and like the indexer, it works
  94    just fine with this minimal configuration.
  95    <screen>
  96     zebrasrv
  97    </screen>
  98    By default, the server listens on TCP port 9999, although
  99    this can easily be changed - see
 100    <xref linkend="zebrasrv"/>.
 101   </para>
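       <para>
        As a minimal sketch of changing the port: the server accepts a
        listener address on its command line, written in the same form
        that the client uses to connect.  Port 2100 here is an arbitrary
        choice made for illustration, and the two commands are assumed
        to run in separate terminals, since the server stays in the
        foreground.
        <screen>
         $ zebrasrv tcp:@:2100
         $ yaz-client tcp:@:2100
        </screen>
       </para>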
 102   <para>
 103    Now you can use the Z39.50 client program of your choice to execute
 104    XPath-based boolean queries and fetch the XML records that satisfy
 105    them:
 106    <screen>
 107     $ yaz-client tcp:@:9999
 108     Connecting...Ok.
 109     Z&gt; find @attr 1=/GENUS/MEANING @and lizard earthquakes
 110     Number of hits: 1
 111     Z&gt; format xml
 112     Z&gt; show 1
 113     &lt;GENUS name="Sauroposeidon" type="with"&gt;
 114      &lt;MEANING&gt;lizard Poseidon &lt;LOW&gt;(Greek god of, among other things, earthquakes)&lt;/LOW&gt;&lt;/MEANING&gt;
 115      &lt;SPECIES name="proteles"&gt;
 116       &lt;AUTHOR type="vide" name="Franklin" year="2000"&gt;&lt;/AUTHOR&gt;
 117       &lt;AUTHOR name="Wedel, Cifelli, Sanders"&gt;&lt;/AUTHOR&gt;
 118      &lt;/SPECIES&gt;
 119      &lt;PLACE name="Oklahoma"&gt;&lt;/PLACE&gt;
 120      &lt;TIME value="Albian"&gt;&lt;/TIME&gt;
 121      &lt;LENGTH value="30" q="1"&gt;&lt;/LENGTH&gt;
 122      &lt;REMAINS content="rib, cervical vertebrae"&gt;&lt;/REMAINS&gt;
 123      &lt;ESSAY&gt;
 124       &lt;P&gt; This new &lt;NOMEN name="Brachiosaurus"&gt;&lt;/NOMEN&gt;-like &lt;LINK content="dinosaur"&gt;&lt;/LINK&gt;
 125       was perhaps the tallest. With its head raised, it stood 60 feet (nearly
 126       20 m) tall. &lt;/P&gt;
 127      &lt;/ESSAY&gt;
 128
 129       &lt;idzebra xmlns="http://www.indexdata.dk/zebra/"&gt;
 130         &lt;size&gt;593&lt;/size&gt;
 131         &lt;localnumber&gt;891&lt;/localnumber&gt;
 132         &lt;filename&gt;records/genera.xml&lt;/filename&gt;
 133       &lt;/idzebra&gt;
 134     &lt;/GENUS&gt;
 135    </screen>
 136   </para>
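       <para>
        The query above is written in prefix notation: the
        <literal>@attr 1=/GENUS/MEANING</literal> part names the XPath
        access point, and the boolean operator comes before its two
        operands.  As a sketch, other queries against the same access
        point might look like the following; the search terms are simply
        reused from the session above, and hit counts are omitted
        because they depend on the data.
        <screen>
         Z&gt; find @attr 1=/GENUS/MEANING lizard
         Z&gt; find @attr 1=/GENUS/MEANING @or lizard earthquakes
         Z&gt; find @attr 1=/GENUS/MEANING @not lizard earthquakes
        </screen>
       </para>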
 137   <para>
 138    Now wasn't that easy?
 139   </para>
 140  </sect1>
 141
 142  <sect1 id="example2">
 143   <title>Example 2: Supporting Z39.50 Searches</title>
 144
 145   <para>
 146    You may have noticed as <literal>zebraidx</literal> was building
 147    the database that it issued a warning, which we ignored at the
 148    time:
 149    <screen>
 150     $ zebraidx update records
 151     00:45:46-08/10: ../../index/zebraidx(5016) [warn] records/genera.xml:0 Couldn't open GENUS.abs [No such file or directory]
 152    </screen>
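        The warning appears because the <literal>grs.sgml</literal>
        filter looks for an abstract syntax file named after the
        record's root element, here <literal>GENUS.abs</literal>, and
        no such file exists yet.  Indexing still succeeds without it,
        which is why the XPath-based searches in Example 1 worked.
 153    <!-- FIXME ### This needs more text -->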
 154   </para>
 155  </sect1>
 156 </chapter>
 157
 158 <!--
 159
 160    <listitem>
 161     <para>
 162      The master configuration file, <literal>zebra.cfg</literal>,
 163      which is as short and simple as it can be:
 164      <screen>
 165         # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.8 2002-10-10 14:27:18 heikki Exp $
 166         # Bare-bones master configuration file for Zebra
 167         profilePath: .:../../tab:../../../yaz/tab
 168      </screen>
 169      Apart from the comments, which are ignored, all this specifies is
 170      that the server should recognise the attribute set described in
 171      the file called
 172      <literal>bib1.att</literal>.
 173      ### What is an attribute set?
 174     </para>
 175    </listitem>
 176
 177    <listitem>
 178     <para>
 179      The BIB-1 attribute set configuration file,
 180      <literal>bib1.att</literal>, which is also as short as possible:
 181      <screen>
 182         # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.8 2002-10-10 14:27:18 heikki Exp $
 183         # Bare-bones BIB-1 attribute set file for Zebra
 184         reference Bib-1
 185      </screen>
 186      Apart from the comments, all this specifies is that the reference of
 187      the attribute set described by this file is
 188      <literal>Bib-1</literal>, a name recognised by the system as
 189      referring to a well-known opaque identifier that is transmitted
 190      by clients as part of their searches.
 191      ### Yeuch!  Surely we can say that better!
 192     </para>
 193     <para>
 194      ### Can't we somehow say this trivial thing in the main
 195      configuration file?
 196     </para>
 197    </listitem>
 198 -->
 199
 200 <!--
 201         The simplest hello-world example could go like this:
 202
 203         Index the document
 204
 205         <book>
 206            <title>The art of motorcycle maintenance</title>
 207            <subject scheme="Dewey">zen</subject>
 208         </book>
 209
 210         And search it like
 211
 212         f @attr 1=/book/title motorcycle
 213
 214         f @attr 1=/book/subject[@scheme=Dewey] zen
 215
 216         If you suddenly decide you want broader interop, you can add
 217         an abs file (more or less like this):
 218
 219         attset bib1.att
 220         tagset tagsetg.tag
 221
 222         elm (2,1)       title   title
 223         elm (2,21)      subject  subject
 224 -->
 225
 226 <!--
 227 How to include images:
 228
 229         <mediaobject>
 230           <imageobject>
 231             <imagedata fileref="system.eps" format="eps"/>
 232           </imageobject>
 233           <imageobject>
 234             <imagedata fileref="system.gif" format="gif"/>
 235           </imageobject>
 236           <textobject>
 237             <phrase>The Multi-Lingual Search System Architecture</phrase>
 238           </textobject>
 239           <caption>
 240             <para>
 241               <emphasis role="strong">
 242                 The Multi-Lingual Search System Architecture.
 243               </emphasis>
 244             </para>
 245             <para>Network connections across local area networks are
 246                 represented by straight lines, and those over the
 247                 internet by jagged lines.</para>
 248           </caption>
 249         </mediaobject>
 250
 251 Where the three <*object> thingies inside the top-level <mediaobject>
 252 are decreasingly preferred versions to include depending on what the
 253 rendering engine can handle.  I generated the EPS version of the image
 254 by exporting a line-drawing done in TGIF, then converted that to the
 255 GIF using a shell-script called "epstogif" which used an appallingly
 256 baroque sequence of conversions, which I would prefer not to pollute
 257 the Zebra build environment with:
 258
 259         #!/bin/sh
 260
 261         # Yes, what follows is stupidly convoluted, but I can't find a
 262         # more straightforward path from the EPS generated by tgif's
 263         # "Print" command into a browser-friendly format.
 264
 265         file=`echo "$1" | sed 's/\.eps$//'`
 266         ps2pdf "$1" "$file".pdf
 267         pdftopbm "$file".pdf "$file"
 268         pnmscale 0.50 < "$file"-000001.pbm | pnmcrop | ppmtogif
 269         rm -f "$file".pdf "$file"-000001.pbm
 270
 271 -->
 272
 273  <!-- Keep this comment at the end of the file
 274  Local variables:
 275  mode: sgml
 276  sgml-omittag:t
 277  sgml-shorttag:t
 278  sgml-minimize-attributes:nil
 279  sgml-always-quote-attributes:t
 280  sgml-indent-step:1
 281  sgml-indent-data:t
 282  sgml-parent-document: "zebra.xml"
 283  sgml-local-catalogs: nil
 284  sgml-namecase-general:t
 285  End:
 286  -->