X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fexamples.xml;h=f2af44421d6e1a8f9c3908e45df5e493b3153b61;hb=518c06f68ffac6658aa792da45282a165b32ca95;hp=1a08eae6a443535a4ee5c688630b52e4104af324;hpb=8ad5e21914fe3a09f6241a06b25fd7e1bbc1d73e;p=idzebra-moved-to-github.git

diff --git a/doc/examples.xml b/doc/examples.xml
index 1a08eae..f2af444 100644
--- a/doc/examples.xml
+++ b/doc/examples.xml
@@ -1,5 +1,5 @@
 <chapter id="examples">
- <!-- $Id: examples.xml,v 1.1 2002-08-29 01:16:12 mike Exp $ -->
+ <!-- $Id: examples.xml,v 1.8 2002-10-10 14:27:18 heikki Exp $ -->
  <title>Example Configurations</title>
 
  <sect1>
@@ -19,80 +19,167 @@
 
     <listitem>
      <para>
-      Where to find the default indexing rules (### default.idx)
+      Where to find subsidiary configuration files, including
+      <literal>default.idx</literal>
+      which specifies the default indexing rules.
      </para>
     </listitem>
 
     <listitem>
      <para>
-      ### Something to do with explain.abs?!
+      What attribute sets to recognise in searches.
      </para>
     </listitem>
 
     <listitem>
      <para>
-      ### Where to find other configuration files, e.g. searches using
-      BIB-1 attributes require a bib1.att configuration file (even if
-      the access point is actually an XPath expression).  These are
-      searched for in the working directory unless otherwise
-      specified.
+      Policy details such as what record type to expect, what
+      low-level indexing algorithm to use, how to identify potential
+      duplicate records, etc.
      </para>
     </listitem>
 
    </itemizedlist>
   </para>
+  <para>
+   Now let's see what goes in the <literal>zebra.cfg</literal> file
+   for some example configurations.
+  </para>
  </sect1>
 
- <sect1>
-  <title>First Example: Minimal Configuration</title>
+ <sect1 id="example1">
+  <title>Example 1: XML Indexing And Searching</title>
 
   <para>
-   This example shows how Zebra can be used, with absolutely minimal
-   configuration, to index a body of XML documents, and search them
-   using XPath expressions to specify access points.
+   This example shows how Zebra can be used with absolutely minimal
+   configuration to index a body of
+   <ulink url="http://www.w3.org/xml/###">XML</ulink>
+   documents, and search them using
+   <ulink url="http://www.w3.org/xpath/###">XPath</ulink>
+   expressions to specify access points.
   </para>
   <para>
-   Go to the
-   <literal>zebra/examples/dinosauricon</literal>
-   directory.  There you will find three significant files:
+   Go to the <literal>examples/dinosauricon</literal> subdirectory
+   of the distribution archive.
+   There you will find a <literal>records</literal> subdirectory,
+   which contains some raw XML data to be added to the database: in
+   this case, a single file, <literal>genera.xml</literal>,
+   which contains information about all the known dinosaur genera as of
+   August 2002.
   </para>
+  <para>
+   Now we need to create the Zebra database.  We do this with the
+   Zebra indexer, <literal>zebraidx</literal>, which is
+   driven by the <literal>zebra.cfg</literal> configuration file.
+   For our purposes, we don't need any
+   special behaviour - we can use the defaults - so we start with a
+   minimal file that just tells <literal>zebraidx</literal> where to
+   find the default indexing rules, and how to parse the records:
+   <screen>
+    profilePath: .:../../tab:../../../yaz/tab
+    recordType: grs.sgml
+   </screen>
+  </para>
+  <para>
+   That's all you need for a minimal Zebra configuration.  Now you can
+   roll the XML records into the database and build the indexes:
+   <screen>
+    zebraidx update records
+   </screen>
+  </para>
+  <para>
+   Now start the server.  Like the indexer, its behaviour is
+   controlled by the
+   <literal>zebra.cfg</literal> file; and like the indexer, it works
+   just fine with this minimal configuration.
+   <screen>
+	zebrasrv
+   </screen>
+   By default, the server listens on TCP port 9999, although
+   this can easily be changed - see 
+   <xref linkend="zebrasrv"/>.
+  </para>
+  <para>
+   Now you can use the Z39.50 client program of your choice to execute
+   XPath-based boolean queries and fetch the XML records that satisfy
+   them:
+   <screen>
+    $ yaz-client tcp:@:9999
+    Connecting...Ok.
+    Z&gt; find @attr 1=/GENUS/MEANING @and lizard earthquakes
+    Number of hits: 1
+    Z&gt; format xml
+    Z&gt; show 1
+    &lt;GENUS name="Sauroposeidon" type="with"&gt;
+     &lt;MEANING&gt;lizard Poseidon &lt;LOW&gt;(Greek god of, among other things, earthquakes)&lt;/LOW&gt;&lt;/MEANING&gt;
+     &lt;SPECIES name="proteles"&gt;
+      &lt;AUTHOR type="vide" name="Franklin" year="2000"&gt;&lt;/AUTHOR&gt;
+      &lt;AUTHOR name="Wedel, Cifelli, Sanders"&gt;&lt;/AUTHOR&gt;
+     &lt;/SPECIES&gt;
+     &lt;PLACE name="Oklahoma"&gt;&lt;/PLACE&gt;
+     &lt;TIME value="Albian"&gt;&lt;/TIME&gt;
+     &lt;LENGTH value="30" q="1"&gt;&lt;/LENGTH&gt;
+     &lt;REMAINS content="rib, cervical vertebrae"&gt;&lt;/REMAINS&gt;
+     &lt;ESSAY&gt;
+      &lt;P&gt; This new &lt;NOMEN name="Brachiosaurus"&gt;&lt;/NOMEN&gt;-like &lt;LINK content="dinosaur"&gt;&lt;/LINK&gt;
+      was perhaps the tallest. With its head raised, it stood 60 feet (nearly
+      20 m) tall. &lt;/P&gt;
+     &lt;/ESSAY&gt;
 
-  <itemizedlist>
-   <listitem>
-    <para>
-     The <literal>records</literal> subdirectory, which contains the
-     raw XML data to be added to the database: in this case, just one
-     file, <literal>genera.xml</literal>, which contains information
-     about all the known dinosaur genera as of October 2000.
-     <!-- ### Get more recent data -->
-    </para>
-   </listitem>
+      &lt;idzebra xmlns="http://www.indexdata.dk/zebra/"&gt;
+	&lt;size&gt;593&lt;/size&gt;
+	&lt;localnumber&gt;891&lt;/localnumber&gt;
+	&lt;filename&gt;records/genera.xml&lt;/filename&gt;
+      &lt;/idzebra&gt;
+    &lt;/GENUS&gt;
+   </screen>
+  </para>
+  <para>
+   Now wasn't that easy?
+  </para>
+ </sect1>
+
+ <sect1 id="example2">
+  <title>Example 2: Supporting Z39.50 Searches</title>
+
+  <para>
+   You may have noticed as <literal>zebraidx</literal> was building
+   the database that it issued a warning, which we ignored at the
+   time:
+   <screen>
+    $ zebraidx update records
+    00:45:46-08/10: ../../index/zebraidx(5016) [warn] records/genera.xml:0 Couldn't open GENUS.abs [No such file or directory]
+   </screen>
+   <!-- FIXME ### This needs more text -->
+  </para>
+ </sect1>
+</chapter>
+
+<!--
 
    <listitem>
     <para>
      The master configuration file, <literal>zebra.cfg</literal>,
      which is as short and simple as it can be:
-     <!-- ### Keep this up to date -->
      <screen>
-	# $Header: /home/cvsroot/idis/doc/examples.xml,v 1.1 2002-08-29 01:16:12 mike Exp $
+	# $Header: /home/cvsroot/idis/doc/examples.xml,v 1.8 2002-10-10 14:27:18 heikki Exp $
 	# Bare-bones master configuration file for Zebra
-	attset: bib1.att
+	profilePath: .:../../tab:../../../yaz/tab
      </screen>
      Apart from the comments, which are ignored, all this specifies is
      that the server should recognise the attribute set described in
      the file called
      <literal>bib1.att</literal>.
+     ### What is an attribute set?
     </para>
-    <!-- ### What is an attribute set? -->
    </listitem>
 
    <listitem>
     <para>
      The BIB-1 attribute set configuration file,
      <literal>bib1.att</literal>, which is also as short as possible:
-     <!-- ### Keep this up to date -->
      <screen>
-	# $Header: /home/cvsroot/idis/doc/examples.xml,v 1.1 2002-08-29 01:16:12 mike Exp $
+	# $Header: /home/cvsroot/idis/doc/examples.xml,v 1.8 2002-10-10 14:27:18 heikki Exp $
 	# Bare-bones BIB-1 attribute set file for Zebra
 	reference Bib-1
      </screen>
@@ -101,44 +188,87 @@
      <literal>Bib-1</literal>, a name recognised by the system as
      referring to a well-known opaque identifier that is transmitted
      by clients as part of their searches.
-     <!-- ### Yeuch!  Surely we can say that better! -->
+     ### Yeuch!  Surely we can say that better!
     </para>
     <para>
      ### Can't we somehow say this trivial thing in the main
      configuration file?
     </para>
    </listitem>
-  </itemizedlist>
+-->
 
-  <para>
-   That's all you need for a minimal Zebra configuration.  Now you can
-   roll the XML records into the database and build the indexes:
-   <screen>
-	zebraidx -t grs.sgml update records
-   </screen>
-   <!-- ### What does "grs.sgml" actually mean? -->
-   and start the server which, by default listens on port 9999:
-   <screen>
-	zebrasrv
-   </screen>
-  </para>
-  <para>
-   Now you can use the Z39.50 client program of your choice to execute
-   XPath-based boolean queries and fetch the XML records that satisfy
-   them:
-   <screen>
-	Z&gt; open tcp:@:9999
-	Connecting...Ok.
-	Z&gt; find @attr 1=/GENUS/MEANING @or vertebra jaw
-	Number of hits: 2
-	Z&gt; format xml
-	Z&gt; show 1
-	&lt;GENUS name="Anurognathus" type="with" xmlns:idzebra="http://www.indexdata.dk/zebra/"&gt;&lt;SPECIES name="ammoni"&gt;&lt;AUTHOR name="Doederline" year="1923"&gt;&lt;/AUTHOR&gt;&lt;/SPECIES&gt;&lt;MEANING&gt;tailless&lt;I&gt;or&lt;/I&gt;anuran&lt;LOW&gt;(frog)&lt;/LOW&gt;jaw&lt;/MEANING&gt;&lt;TIME value="Tithonian" section="late"&gt;&lt;/TIME&gt;&lt;PLACE name="Germany"&gt;&lt;/PLACE&gt;&lt;LENGTH wingspan="1" value=".5"&gt;&lt;/LENGTH&gt;&lt;idzebra:size&gt;304&lt;/idzebra:size&gt;&lt;idzebra:localnumber&gt;70&lt;/idzebra:localnumber&gt;&lt;idzebra:filename&gt;records/genera.xml&lt;/idzebra:filename&gt;&lt;/GENUS&gt;
-   </screen>
-  </para>
- </sect1>
+<!--
+	The simplest hello-world example could go like this:
+	
+	Index the document
+	
+	<book>
+	   <title>The art of motorcycle maintenance</title>
+	   <subject scheme="Dewey">zen</subject>
+	</book>
+	
+	And search it like
+	
+	f @attr 1=/book/title motorcycle
+	
+	f @attr 1=/book/subject[@scheme=Dewey] zen
+	
+	If you suddenly decide you want broader interop, you can add
+	an abs file (more or less like this):
+	
+	attset bib1.att
+	tagset tagsetg.tag
+	
+	elm (2,1)       title   title
+	elm (2,21)      subject  subject
+-->
 
-</chapter>
+<!--
+How to include images:
+
+	<mediaobject>
+	  <imageobject>
+	    <imagedata fileref="system.eps" format="eps"/>
+	  </imageobject>
+	  <imageobject>
+	    <imagedata fileref="system.gif" format="gif"/>
+	  </imageobject>
+	  <textobject>
+	    <phrase>The Multi-Lingual Search System Architecture</phrase>
+	  </textobject>
+	  <caption>
+	    <para>
+	      <emphasis role="strong">
+		The Multi-Lingual Search System Architecture.
+	      </emphasis>
+	      </para>
+	      <para>
+		Network connections across local area networks are represented by
+		straight lines, and those over the internet by jagged lines.</para>
+	  </caption>
+	</mediaobject>
+
+Where the three <*object> thingies inside the top-level <mediaobject>
+are decreasingly preferred versions to include depending on what the
+rendering engine can handle.  I generated the EPS version of the image
+by exporting a line-drawing done in TGIF, then converted that to the
+GIF using a shell-script called "epstogif" which used an appallingly
+baroque sequence of conversions, which I would prefer not to pollute
+the Zebra build environment with:
+
+	#!/bin/sh
+
+	# Yes, what follows is stupidly convoluted, but I can't find a
+	# more straightforward path from the EPS generated by tgif's
+	# "Print" command into a browser-friendly format.
+
+	file=`echo "$1" | sed 's/\.eps$//'`
+	ps2pdf "$1" "$file".pdf
+	pdftopbm "$file".pdf "$file"
+	pnmscale 0.50 < "$file"-000001.pbm | pnmcrop | ppmtogif
+	rm -f "$file".pdf "$file"-000001.pbm
+
+-->
 
  <!-- Keep this comment at the end of the file
  Local variables: