Start work on ICU based regexp searches

diff --git a/doc/administration.xml b/doc/administration.xml

index fd1e8f4..762ba7d 100644
--- a/doc/administration.xml
+++ b/doc/administration.xml
@@ -1,127 +1,19 @@
-<chapter id="quick-start">
- <title>Quick Start </title>
- 
- <para>
-  In this section, we will test the system by indexing a small set of sample
-  GILS records that are included with the software distribution. Go to the
-  <literal>test/gils</literal> subdirectory of the distribution archive.
-  There you will find a configuration
-  file named <literal>zebra.cfg</literal> with the following contents:
-  
-  <screen>
-   # Where are the YAZ tables located.
-   profilePath: ../../../yaz/tab ../../tab
-   
-   # Files that describe the attribute sets supported.
-   attset: bib1.att
-   attset: gils.att
-  </screen>
- </para>
- 
- <para>
-  Now, edit the file and set <literal>profilePath</literal> to the path of the
-  YAZ profile tables (sub directory <literal>tab</literal> of the YAZ
-  distribution archive).
- </para>
- 
- <para>
-  The 48 test records are located in the sub directory
-  <literal>records</literal>. To index these, type:
-  
-  <screen>
-   $ ../../index/zebraidx -t grs.sgml update records
-  </screen>
- </para>
- 
- <para>
-  In the command above the option <literal>-t</literal> specified the record
-  type &mdash; in this case <literal>grs.sgml</literal>.
-  The word <literal>update</literal> followed
-  by a directory root updates all files below that directory node.
- </para>
- 
- <para>
-  If your indexing command was successful, you are now ready to
-  fire up a server. To start a server on port 2100, type:
-  
-  <screen>
-   $ ../../index/zebrasrv tcp:@:2100
-  </screen>
-  
- </para>
-
- <para>
-  The Zebra index that you have just created has a single database
-  named <literal>Default</literal>.
-  The database contains records structured according to
-  the GILS profile, and the server will
-  return records in either either USMARC, GRS-1, or SUTRS depending
-  on what your client asks for.
- </para>
- 
- <para>
-  To test the server, you can use any Z39.50 client (1992 or later).
-  For instance, you can use the demo client that comes with YAZ: Just
-  cd to the <literal>client</literal> subdirectory of the YAZ distribution
-  and type:
- </para>
- <para>
-  <screen>
-   $ ./yaz-client tcp:localhost:2100
-  </screen>
- </para>
- 
- <para>
-  When the client has connected, you can type:
- </para>
- 
-<para>
-  
-  <screen>
-   Z&#62; find surficial
-   Z&#62; show 1
-  </screen>
- </para>
- 
- <para>
-  The default retrieval syntax for the client is USMARC. To try other
-  formats for the same record, try:
- </para>
- <para>
-  <screen>
-   Z&#62;format sutrs
-   Z&#62;show 1
-   Z&#62;format grs-1
-   Z&#62;show 1
-   Z&#62;elements B
-   Z&#62;show 1
-  </screen>
- </para>
- 
- <note>
-  <para>You may notice that more fields are returned when your
-   client requests SUTRS or GRS-1 records. When retrieving GILS records,
-   this is normal - not all of the GILS data elements have mappings in
-   the USMARC record format.
-  </para>
- </note>
- <para>
-  If you've made it this far, there's a good chance that
-  you've got through the compilation OK.
- </para>
- 
-</chapter>
-
  <chapter id="administration">
- <title>Administrating Zebra</title>
- 
+ <title>Administrating &zebra;</title>
+ <!-- ### It's a bit daft that this chapter (which describes half of
+          the configuration-file formats) is separated from
+          "recordmodel-grs.xml" (which describes the other half) by the
+          instructions on running zebraidx and zebrasrv.  Some careful
+          re-ordering is required here.
+ -->
+
   <para>
-  Unlike many simpler retrieval systems, Zebra supports safe, incremental
+  Unlike many simpler retrieval systems, &zebra; supports safe, incremental
    updates to an existing index.
   </para>
   
   <para>
-  Normally, when Zebra modifies the index it reads a number of records
+  Normally, when &zebra; modifies the index it reads a number of records
    that you specify.
    Depending on your specifications and on the contents of each record
    one of the following events takes place for each record:
@@ -132,8 +24,8 @@
      <listitem>
       <para>
        The record is indexed as if it never occurred before.
-      Either the Zebra system doesn't know how to identify the record or
-      Zebra can identify the record but didn't find it to be already indexed.
+      Either the &zebra; system doesn't know how to identify the record or
+      &zebra; can identify the record but didn't find it to be already indexed.
       </para>
      </listitem>
     </varlistentry>
@@ -141,9 +33,9 @@
      <term>Modify</term>
      <listitem>
       <para>
-      The record has already been indexed. In this case
-      either the contents of the record or the location (file) of the record
-      indicates that it has been indexed before.
+      The record has already been indexed.
+      In this case either the contents of the record or the location
+      (file) of the record indicates that it has been indexed before.
       </para>
      </listitem>
     </varlistentry>
@@ -160,22 +52,23 @@
   </para>
   
   <para>
-  Please note that in both the modify- and delete- case the Zebra
-  indexer must be able to generate a unique key that identifies the record in
-  question (more on this below).
+  Please note that in both the modify- and delete- case the &zebra;
+  indexer must be able to generate a unique key that identifies the record 
+  in question (more on this below).
   </para>
   
   <para>
-  To administrate the Zebra retrieval system, you run the
+  To administrate the &zebra; retrieval system, you run the
    <literal>zebraidx</literal> program.
    This program supports a number of options which are preceded by a dash,
    and a few commands (not preceded by dash).
  </para>
   
   <para>
-  Both the Zebra administrative tool and the Z39.50 server share a
-  set of index files and a global configuration file. The
-  name of the configuration file defaults to <literal>zebra.cfg</literal>.
+  Both the &zebra; administrative tool and the &acro.z3950; server share a
+  set of index files and a global configuration file.
+  The name of the configuration file defaults to
+  <literal>zebra.cfg</literal>.
    The configuration file includes specifications on how to index
    various kinds of records and where the other configuration files
    are located. <literal>zebrasrv</literal> and <literal>zebraidx</literal>
@@ -191,7 +84,7 @@
     Indexing is a per-record process, in which either insert/modify/delete
     will occur. Before a record is indexed search keys are extracted from
    whatever might be the layout of the original record (sgml, html, text, etc.).
-   The Zebra system currently supports two fundamantal types of records:
+   The &zebra; system currently supports two fundamental types of records:
     structured and simple text.
     To specify a particular extraction process, use either the
     command line option <literal>-t</literal> or specify a
@@ -200,19 +93,19 @@
    
   </sect1>
   
- <sect1 id="configuration-file">
-  <title>The Zebra Configuration File</title>
+ <sect1 id="zebra-cfg">
+  <title>The &zebra; Configuration File</title>
    
    <para>
-   The Zebra configuration file, read by <literal>zebraidx</literal> and
+   The &zebra; configuration file, read by <literal>zebraidx</literal> and
     <literal>zebrasrv</literal> defaults to <literal>zebra.cfg</literal>
     unless specified by <literal>-c</literal> option.
    </para>
    
    <para>
     You can edit the configuration file with a normal text editor.
-   parameter names and values are seperated by colons in the file. Lines
-   starting with a hash sign (<literal>&num;</literal>) are
+   Parameter names and values are separated by colons in the file. Lines
+   starting with a hash sign (<literal>#</literal>) are
     treated as comments.
    </para>
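
Review note: the line format described in this hunk (name and value separated by a colon, hash-sign lines treated as comments) can be illustrated with a minimal Python sketch. This is not how zebraidx or zebrasrv read the file; the function name `parse_zebra_cfg` is hypothetical, and splitting on the first colon only is an assumption made so that values which themselves contain colons survive intact. The sample values are taken from the `zebra.cfg` example later in this diff.

```python
def parse_zebra_cfg(text):
    """Return (name, value) pairs from zebra.cfg-style text.

    Blank lines and lines starting with '#' are skipped. A list of
    pairs is returned rather than a dict because a setting such as
    attset may appear more than once.
    """
    settings = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: only the first colon separates name from value.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"missing colon in line: {raw!r}")
        settings.append((name.strip(), value.strip()))
    return settings


sample = """\
# Files that describe the attribute sets supported.
profilePath: /usr/local/idzebra/tab
attset: bib1.att
simple.recordType: text
simple.database: textbase
"""
settings = parse_zebra_cfg(sample)
```

Group-qualified names such as `simple.recordType` are kept whole here, because option names like `passwd.c` also contain a dot and the sketch does not try to tell the two apart.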
    
@@ -233,7 +126,7 @@
     In the configuration file, the group name is placed before the option
     name itself, separated by a dot (.). For instance, to set the record type
     for group <literal>public</literal> to <literal>grs.sgml</literal>
-   (the SGML-like format for structured records) you would write:
+   (the &acro.sgml;-like format for structured records) you would write:
    </para>
    
    <para>
@@ -258,13 +151,18 @@
     explained further in the following sections.
    </para>
    
+  <!--
+   FIXME - Didn't Adam make something to have multiple databases in multiple dirs...
+  -->
+  
    <para>
     <variablelist>
      
      <varlistentry>
       <term>
        <emphasis>group</emphasis>
-      .recordType&lsqb;<emphasis>.name</emphasis>&rsqb;
+      .recordType[<emphasis>.name</emphasis>]:
+      <replaceable>type</replaceable>
       </term>
       <listitem>
        <para>
@@ -282,70 +180,76 @@
       </listitem>
      </varlistentry>
      <varlistentry>
-     <term><emphasis>group</emphasis>.recordId</term>
+     <term><emphasis>group</emphasis>.recordId: 
+     <replaceable>record-id-spec</replaceable></term>
       <listitem>
        <para>
         Specifies how the records are to be identified when updated. See
-       section <xref linkend="locating-records"/>.
+       <xref linkend="locating-records"/>.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
-     <term><emphasis>group</emphasis>.database</term>
+     <term><emphasis>group</emphasis>.database:
+     <replaceable>database</replaceable></term>
       <listitem>
        <para>
-       Specifies the Z39.50 database name.
+       Specifies the &acro.z3950; database name.
+       <!-- FIXME - now we can have multiple databases in one server. -H -->
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
-     <term><emphasis>group</emphasis>.storeKeys</term>
+     <term><emphasis>group</emphasis>.storeKeys:
+     <replaceable>boolean</replaceable></term>
       <listitem>
        <para>
         Specifies whether key information should be saved for a given
         group of records. If you plan to update/delete this type of
         records later this should be specified as 1; otherwise it
-       should be 0 (default), to save register space. See section
-       <xref linkend="file-ids"/>.
+       should be 0 (default), to save register space.
+       <!-- ### this is the first mention of "register" -->
+       See <xref linkend="file-ids"/>.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
-     <term><emphasis>group</emphasis>.storeData</term>
+     <term><emphasis>group</emphasis>.storeData:
+      <replaceable>boolean</replaceable></term>
       <listitem>
        <para>
         Specifies whether the records should be stored internally
-       in the Zebra system files.
+       in the &zebra; system files.
         If you want to maintain the raw records yourself,
         this option should be false (0).
-       If you want Zebra to take care of the records for you, it
+       If you want &zebra; to take care of the records for you, it
        should be true (1).
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
-     <term>register</term>
+     <!-- ### probably a better place to define "register" -->
+     <term>register: <replaceable>register-location</replaceable></term>
       <listitem>
        <para>
-       Specifies the location of the various register files that Zebra uses
-       to represent your databases. See section
-       <xref linkend="register-location"/>.
+       Specifies the location of the various register files that &zebra; uses
+       to represent your databases.
+       See <xref linkend="register-location"/>.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
-     <term>shadow</term>
+     <term>shadow: <replaceable>register-location</replaceable></term>
       <listitem>
        <para>
-       Enables the <emphasis>safe update</emphasis> facility of Zebra, and
+       Enables the <emphasis>safe update</emphasis> facility of &zebra;, and
         tells the system where to place the required, temporary files.
-       See section
-       <xref linkend="shadow-registers"/>.
+       See <xref linkend="shadow-registers"/>.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
-     <term>lockDir</term>
+     <term>lockDir: <replaceable>directory</replaceable></term>
       <listitem>
        <para>
         Directory in which various lock files are stored.
@@ -353,16 +257,16 @@
       </listitem>
      </varlistentry>
      <varlistentry>
-     <term>keyTmpDir</term>
+     <term>keyTmpDir: <replaceable>directory</replaceable></term>
       <listitem>
        <para>
-       Directory in which temporary files used during zebraidx' update
+       Directory in which temporary files used during zebraidx's update
         phase are stored. 
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
-     <term>setTmpDir</term>
+     <term>setTmpDir: <replaceable>directory</replaceable></term>
       <listitem>
        <para>
         Specifies the directory that the server uses for temporary result sets.
@@ -371,35 +275,235 @@
       </listitem>
      </varlistentry>
      <varlistentry>
-     <term>profilePath</term>
+     <term>profilePath: <replaceable>path</replaceable></term>
+     <listitem>
+      <para>
+       Specifies a path of profile specification files. 
+       The path is composed of one or more directories separated by
+       colon. Similar to <literal>PATH</literal> for UNIX systems.
+      </para>
+     </listitem>
+    </varlistentry>
+
+     <varlistentry>
+      <term>modulePath: <replaceable>path</replaceable></term>
+      <listitem>
+       <para>
+       Specifies a path of record filter modules.
+       The path is composed of one or more directories separated by
+       colon. Similar to <literal>PATH</literal> for UNIX systems.
+       The 'make install' procedure typically puts modules in
+       <filename>/usr/local/lib/idzebra-2.0/modules</filename>.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term>index: <replaceable>filename</replaceable></term>
+      <listitem>
+       <para>
+       Defines the filename which holds field structure
+       definitions. If omitted, the file <filename>default.idx</filename>
+       is read.
+       Refer to <xref linkend="default-idx-file"/> for
+       more information.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term>sortmax: <replaceable>integer</replaceable></term>
+      <listitem>
+       <para>
+    Specifies the maximum number of records that will be sorted
+    in a result set.  If the result set contains more than 
+    <replaceable>integer</replaceable> records, records after the
+    limit will not be sorted.  If omitted, the default value is
+    1,000.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term>staticrank: <replaceable>integer</replaceable></term>
+      <listitem>
+       <para>
+       Specifies whether static ranking is enabled (1) or
+       disabled (0). If omitted, it is disabled, corresponding
+       to a value of 0.
+       Refer to <xref linkend="administration-ranking-static"/>.
+       </para>
+      </listitem>
+     </varlistentry>
+
+
+     <varlistentry>
+      <term>estimatehits: <replaceable>integer</replaceable></term>
+      <listitem>
+       <para>
+       Controls whether &zebra; should calculate approximate hit counts and
+       at which hit count it is to be enabled.
+       A value of 0 disables approximate hit counts.
+       For a positive value approximate hit count is enabled
+       if it is known to be larger than <replaceable>integer</replaceable>.
+       </para>
+       <para>
+       Approximate hit counts can also be triggered by a particular
+       attribute in a query.
+       Refer to <xref linkend="querymodel-zebra-global-attr-limit"/>.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry>
+     <term>attset: <replaceable>filename</replaceable></term>
+     <listitem>
+      <para>
+       Specifies the filename(s) of attribute set files for use in
+       searching. In many configurations <filename>bib1.att</filename>
+       is used, but that is not required. If Classic Explain
+       attributes are to be used for searching,
+       <filename>explain.att</filename> must be given.
+       The path to att-files in general can be given using 
+       <literal>profilePath</literal> setting.
+       See also <xref linkend="attset-files"/>.
+      </para>
+     </listitem>
+    </varlistentry>
+    <varlistentry>
+     <term>memMax: <replaceable>size</replaceable></term>
+     <listitem>
+      <para>
+       Specifies <replaceable>size</replaceable> of internal memory
+       to use for the zebraidx program.
+       The amount is given in megabytes - default is 4 (4 MB).
+       The more memory, the faster large updates happen, up to about
+       half the free memory available on the computer.
+      </para>
+     </listitem>
+    </varlistentry>
+    <varlistentry>
+     <term>tempfiles: <replaceable>Yes/Auto/No</replaceable></term>
+     <listitem>
+      <para>
+       Tells zebra if it should use temporary files when indexing. The
+       default is Auto, in which case zebra uses temporary files only
+       if it would need more than <replaceable>memMax</replaceable>
+       megabytes of memory. This should be good for most uses.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term>root: <replaceable>dir</replaceable></term>
+     <listitem>
+      <para>
+       Specifies a directory base for &zebra;. All relative paths
+       given (in profilePath, register, shadow) are based on this
+       directory. This setting is useful if your &zebra; server
+       is running in a different directory from where
+       <literal>zebra.cfg</literal> is located.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term>passwd: <replaceable>file</replaceable></term>
       <listitem>
        <para>
-       Specifies the location of profile specification files.
+       Specifies a file with description of user accounts for &zebra;.
+       The format is similar to that known to Apache's htpasswd files
+       and UNIX' passwd files. Non-empty lines not beginning with
+       # are considered account lines. There is one account per line.
+       A line consists of fields separated by a single colon character.
+       First field is username, second is password.
        </para>
       </listitem>
      </varlistentry>
+
      <varlistentry>
-     <term>attset</term>
+     <term>passwd.c: <replaceable>file</replaceable></term>
       <listitem>
        <para>
-       Specifies the filename(s) of attribute set files for use in
-       searching. At least the Bib-1 set should be loaded
-       (<literal>bib1.att</literal>).
-       The <literal>profilePath</literal> setting is used to look for
-       the specified files.
-       See section <xref linkend="attset-files"/>
+       Specifies a file with description of user accounts for &zebra;.
+       File format is similar to that used by the passwd directive except
+       that the passwords are encrypted. Use Apache's htpasswd or similar
+       for maintenance.
        </para>
       </listitem>
      </varlistentry>
+
      <varlistentry>
-     <term>memMax</term>
+     <term>perm.<replaceable>user</replaceable>:
+     <replaceable>permstring</replaceable></term>
       <listitem>
        <para>
-       Specifies size of internal memory to use for the zebraidx program. The
-       amount is given in megabytes - default is 4 (4 MB).
+       Specifies permissions (privileges) for a user that is allowed
+       to access &zebra; via the passwd system. There are two kinds
+       of permissions currently: read (r) and write (w). By default
+       users not listed in a permission directive are given the read
+       privilege. To specify permissions for a user with no
+       username, or &acro.z3950; anonymous style use
+       <literal>anonymous</literal>. The permstring consists of
+       a sequence of characters. Include character <literal>w</literal>
+       for write/update access, <literal>r</literal> for read access and
+       <literal>a</literal> to allow anonymous access through this account.
        </para>
       </listitem>
      </varlistentry>
+
+    <varlistentry>
+      <term>dbaccess: <replaceable>accessfile</replaceable></term>
+      <listitem>
+        <para>
+         Names a file which lists database subscriptions for individual users.
+         The access file should consist of lines of the form
+          <literal>username: dbnames</literal>, where dbnames is a list of
+          database names, separated by '+'. No whitespace is allowed in the
+          database list.
+       </para>
+      </listitem>
+    </varlistentry>
+
+    <varlistentry>
+      <term>encoding: <replaceable>charsetname</replaceable></term>
+      <listitem>
+        <para>
+         Tells &zebra; to interpret the terms in Z39.50 queries as
+         having been encoded using the specified character
+         encoding.  The default is <literal>ISO-8859-1</literal>; one
+         useful alternative is <literal>UTF-8</literal>.
+       </para>
+      </listitem>
+    </varlistentry>
+
+    <varlistentry>
+      <term>storeKeys: <replaceable>value</replaceable></term>
+      <listitem>
+        <para>
+          Specifies whether &zebra; keeps a copy of indexed keys.
+          Use a value of 1 to enable; 0 to disable. If storeKeys setting is
+          omitted, it is enabled. Enabled storeKeys
+          are required for updating and deleting records.  Disable
+          storeKeys only to save space when you plan to index data once.
+       </para>
+      </listitem>
+    </varlistentry>
+
+    <varlistentry>
+      <term>storeData: <replaceable>value</replaceable></term>
+      <listitem>
+        <para>
+          Specifies whether &zebra; keeps a copy of indexed records.
+          Use a value of 1 to enable; 0 to disable. If storeData setting is
+          omitted, it is enabled. A storeData setting of 0 (disabled) makes
+          &zebra; fetch records from the original location in the file
+          system using filename, file offset and file length. For the
+          DOM and ALVIS filter, the storeData setting is ignored.
+       </para>
+      </listitem>
+    </varlistentry>
+
     </variablelist>
    </para>
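
Review note: the `passwd` and `dbaccess` file layouts added in this hunk are simple enough to sketch in Python. The function names and the sample user and database names below are hypothetical; only the field layout (username and password separated by a colon, database names joined by '+') comes from the text above. Skipping blank lines in the access file is an assumption, since the text describes comment handling for the passwd file only.

```python
def parse_passwd(text):
    """Map username -> password from plain passwd-style text.

    Non-empty lines not beginning with '#' are account lines; the
    first colon-separated field is the username, the second the
    password. Any further fields are ignored in this sketch.
    """
    accounts = {}
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue
        fields = raw.strip().split(":")
        if len(fields) < 2:
            raise ValueError(f"expected username:password, got {raw!r}")
        accounts[fields[0]] = fields[1]
    return accounts


def parse_dbaccess(text):
    """Map username -> list of database names from an access file."""
    access = {}
    for raw in text.splitlines():
        if not raw.strip():
            continue  # assumption: blank lines carry no meaning
        user, sep, dbnames = raw.partition(":")
        if not sep:
            raise ValueError(f"expected 'username: dbnames', got {raw!r}")
        # Database names are joined by '+', with no whitespace inside.
        access[user.strip()] = dbnames.strip().split("+")
    return access


# Hypothetical sample data, not taken from any real installation.
accounts = parse_passwd("# demo accounts\nadmin:secret\nguest:guest\n")
access = parse_dbaccess("admin: Default+textbase\nguest: Default\n")
```

The encrypted `passwd.c` variant is left out on purpose, since the text only says it is maintained with Apache's htpasswd or a similar tool.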
    
@@ -409,23 +513,24 @@
    <title>Locating Records</title>
    
    <para>
-   The default behaviour of the Zebra system is to reference the
+   The default behavior of the &zebra; system is to reference the
     records from their original location, i.e. where they were found when you
-   ran <literal>zebraidx</literal>.
+   run <literal>zebraidx</literal>.
     That is, when a client wishes to retrieve a record
     following a search operation, the files are accessed from the place
     where you originally put them - if you remove the files (without
-   running <literal>zebraidx</literal> again, the client
-   will receive a diagnostic message.
+   running <literal>zebraidx</literal> again), the server will return
+   diagnostic number 14 (``System error in presenting records'') to
+   the client.
    </para>
    
    <para>
     If your input files are not permanent - for example if you retrieve
     your records from an outside source, or if they were temporarily
     mounted on a CD-ROM drive,
-   you may want Zebra to make an internal copy of them. To do this,
+   you may want &zebra; to make an internal copy of them. To do this,
     you specify 1 (true) in the <literal>storeData</literal> setting. When
-   the Z39.50 server retrieves the records they will be read from the
+   the &acro.z3950; server retrieves the records they will be read from the
     internal file structures of the system.
    </para>
    
@@ -454,14 +559,14 @@
    <para>
     Consider a system in which you have a group of text files called
     <literal>simple</literal>.
-   That group of records should belong to a Z39.50 database called
+   That group of records should belong to a &acro.z3950; database called
     <literal>textbase</literal>.
     The following <literal>zebra.cfg</literal> file will suffice:
    </para>
    <para>
     
     <screen>
-    profilePath: /usr/local/yaz
+    profilePath: /usr/local/idzebra/tab
      attset: bib1.att
      simple.recordType: text
      simple.database: textbase
@@ -509,7 +614,7 @@
     disk space than simpler indexing methods, but it makes it easier for
     you to keep the index in sync with a frequently changing set of data.
     If you combine this system with the <emphasis>safe update</emphasis>
-   facility (see below), you never have to take your server offline for
+   facility (see below), you never have to take your server off-line for
     maintenance or register updating purposes.
    </para>
    
@@ -517,11 +622,15 @@
     To enable indexing with pathname IDs, you must specify
     <literal>file</literal> as the value of <literal>recordId</literal>
     in the configuration file. In addition, you should set
-   <literal>storeKeys</literal> to <literal>1</literal>, since the Zebra
+   <literal>storeKeys</literal> to <literal>1</literal>, since the &zebra;
     indexer must save additional information about the contents of each record
-   in order to modify the indices correctly at a later time.
+   in order to modify the indexes correctly at a later time.
    </para>
    
+   <!--
+    FIXME - There must be a simpler way to do this with Adams string tags -H
+     -->
+
    <para>
     For example, to update records of group <literal>esdd</literal>
     located below
@@ -543,7 +652,7 @@
    <note>
     <para>You cannot start out with a group of records with simple
      indexing (no record IDs as in the previous section) and then later
-    enable file record Ids. Zebra must know from the first time that you
+    enable file record IDs. &zebra; must know from the first time that you
      index the group that
      the files should be indexed with file record IDs.
     </para>
@@ -557,18 +666,19 @@
     and then run <literal>zebraidx</literal> with the
     <literal>update</literal> command.
    </para>
+  <!-- ### what happens if a file contains multiple records? -->
  </sect1>
   
   <sect1 id="generic-ids">
    <title>Indexing with General Record IDs</title>
    
    <para>
-   When using this method you construct an (almost) arbritrary, internal
+   When using this method you construct an (almost) arbitrary, internal
     record key based on the contents of the record itself and other system
     information. If you have a group of records that explicitly associates
     an ID with each record, this method is convenient. For example, the
     record format may contain a title or an ID number that is unique within the group.
-   In either case you specify the Z39.50 attribute set and use-attribute
+   In either case you specify the &acro.z3950; attribute set and use-attribute
     location in which this information is stored, and the system looks at
     that field to determine the identity of the record.
    </para>
@@ -653,9 +763,9 @@
    </para>
    
    <para>
-   For instance, the sample GILS records that come with the Zebra
+   For instance, the sample GILS records that come with the &zebra;
     distribution contain a unique ID in the data tagged Control-Identifier.
-   The data is mapped to the Bib-1 use attribute Identifier-standard
+   The data is mapped to the &acro.bib1; use attribute Identifier-standard
     (code 1007). To use this field as a record id, specify
     <literal>(bib1,Identifier-standard)</literal> as the value of the
     <literal>recordId</literal> in the configuration file.
@@ -672,7 +782,7 @@
    </para>
    
    <para>
-   (see section <xref linkend="data-model"/>
+   (see <xref linkend="grs"/>
      for details of how the mapping between elements of your records and
      searchable attributes is established).
    </para>
@@ -707,7 +817,7 @@
     <literal>zebraidx</literal>. If you wish to store these, possibly large,
     files somewhere else, you must add the <literal>register</literal>
     entry to the <literal>zebra.cfg</literal> file.
-   Furthermore, the Zebra system allows its file
+   Furthermore, the &zebra; system allows its file
     structures to span multiple file systems, which is useful for
     managing very large databases. 
    </para>
@@ -716,40 +826,43 @@
     The value of the <literal>register</literal> setting is a sequence
     of tokens. Each token takes the form:
     
-   <screen>
-    <emphasis>dir</emphasis><literal>:</literal><emphasis>size</emphasis>. 
-   </screen>
+   <emphasis>dir</emphasis><literal>:</literal><emphasis>size</emphasis> 
     
     The <emphasis>dir</emphasis> specifies a directory in which index files
     will be stored and the <emphasis>size</emphasis> specifies the maximum
-   size of all files in that directory. The Zebra indexer system fills
+   size of all files in that directory. The &zebra; indexer system fills
     each directory in the order specified and uses the next specified
     directories as needed.
     The <emphasis>size</emphasis> is an integer followed by a qualifier
-   code, <literal>M</literal> for megabytes,
+   code, 
+   <literal>b</literal> for bytes,
     <literal>k</literal> for kilobytes,
+   <literal>M</literal> for megabytes, or
+   <literal>G</literal> for gigabytes.
+   A negative value disables the size check; the unit is still required,
+   so use <literal>-1b</literal>.
    </para>
    
    <para>
-   For instance, if you have allocated two disks for your register, and
+   For instance, if you have allocated three disks for your register, and
     the first disk is mounted
-   on <literal>/d1</literal> and has 200 Mb of free space and the
-   second, mounted on <literal>/d2</literal> has 300 Mb, you could
-   put this entry in your configuration file:
+   on <literal>/d1</literal> and has 2 GB of free space, the
+   second, mounted on <literal>/d2</literal>, has 3.6 GB, and the third,
+   which has more space than you care to worry about, is mounted on
+   <literal>/d3</literal>, you could put this entry in your configuration file:
     
     <screen>
-    register: /d1:200M /d2:300M
+    register: /d1:2G /d2:3600M /d3:-1b
     </screen>
-   
    </para>
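The token format described above can be sketched in Python. This is only an illustration, not &zebra;'s own parser: the function name `parse_register` is hypothetical, and the 1024-based multipliers are an assumption, since the text names the qualifier codes but not their exact byte values.

```python
import re

# Assumed multipliers (1024-based); the documentation does not state them.
UNIT_BYTES = {"b": 1, "k": 1024, "M": 1024**2, "G": 1024**3}


def parse_register(value):
    """Split a register value into (directory, max_bytes) pairs.

    max_bytes is None when the size is negative, i.e. checking is disabled.
    """
    entries = []
    for token in value.split():
        # Split on the last colon so the directory part stays intact.
        directory, _, size = token.rpartition(":")
        match = re.fullmatch(r"(-?\d+)([bkMG])", size)
        if not directory or match is None:
            raise ValueError(f"malformed register token: {token!r}")
        number, unit = int(match.group(1)), match.group(2)
        limit = None if number < 0 else number * UNIT_BYTES[unit]
        entries.append((directory, limit))
    return entries


print(parse_register("/d1:2G /d2:3600M /d3:-1b"))
```

A token without a unit, such as `/d3:-1`, is rejected by this sketch, mirroring the note that the unit is required even for negative sizes.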
    
    <para>
-   Note that Zebra does not verify that the amount of space specified is
+   Note that &zebra; does not verify that the amount of space specified is
     actually available on the directory (file system) specified - it is
     your responsibility to ensure that enough space is available, and that
     other applications do not attempt to use the free space. In a large
     production system, it is recommended that you allocate one or more
-   filesystem exclusively to the Zebra register files.
+   file system exclusively to the &zebra; register files.
    </para>
    
   </sect1>
@@ -757,13 +870,13 @@
   <sect1 id="shadow-registers">
    <title>Safe Updating - Using Shadow Registers</title>
    
-  <sect2>
+  <sect2 id="shadow-registers-description">
     <title>Description</title>
     
     <para>
-    The Zebra server supports <emphasis>updating</emphasis> of the index
+    The &zebra; server supports <emphasis>updating</emphasis> of the index
      structures. That is, you can add, modify, or remove records from
-    databases managed by Zebra without rebuilding the entire index.
+    databases managed by &zebra; without rebuilding the entire index.
      Since this process involves modifying structured files with various
      references between blocks of data in the files, the update process
      is inherently sensitive to system crashes, or to process interruptions:
@@ -778,7 +891,7 @@
     
     <para>
      You can solve these problems by enabling the shadow register system in
-    Zebra.
+    &zebra;.
      During the updating procedure, <literal>zebraidx</literal> will temporarily
      write changes to the involved files in a set of "shadow
      files", without modifying the files that are accessed by the
@@ -811,7 +924,7 @@
     
    </sect2>
    
-  <sect2>
+  <sect2 id="shadow-registers-how-to-use">
     <title>How to Use Shadow Register Files</title>
     
     <para>
@@ -821,7 +934,7 @@
      <literal>zebra.cfg</literal> file.
      The syntax of the <literal>shadow</literal> entry is exactly the
      same as for the <literal>register</literal> entry
-    (see section <xref linkend="register-location"/>).
+    (see <xref linkend="register-location"/>).
       The location of the shadow area should be
       <emphasis>different</emphasis> from the location of the main register
       area (if you have specified one - remember that if you provide no
@@ -843,7 +956,6 @@
      
      <screen>
       register: /d1:500M
-     
       shadow: /scratch1:100M /scratch2:200M
      </screen>
      
@@ -855,14 +967,13 @@
      In order to make changes to the system take effect for the
      users, you'll have to submit a "commit" command after a
      (sequence of) update operation(s).
-    You can ask the indexer to commit the changes immediately
-    after the update operation:
     </para>
     
     <para>
      
      <screen>
-     $ zebraidx update /d1/records update /d2/more-records commit
+     $ zebraidx update /d1/records 
+     $ zebraidx commit
      </screen>
      
     </para>
@@ -874,7 +985,7 @@
     <para>
      
      <screen>
-     $ zebraidx -g books update /d1/records update /d2/more-records
+     $ zebraidx -g books update /d1/records  /d2/more-records
       $ zebraidx -g fun update /d3/fun-records
       $ zebraidx commit
      </screen>
@@ -922,2552 +1033,927 @@
    </sect2>
    
   </sect1>
- 
-</chapter>
-
-<chapter id="zebraidx">
- <title>Running the Maintenance Interface (zebraidx)</title>
- 
- <para>
-  The following is a complete reference to the command line interface to
-  the <literal>zebraidx</literal> application.
- </para>
- 
- <para>
-  Syntax
-  
-  <screen>
-   $ zebraidx &lsqb;options&rsqb; command &lsqb;directory&rsqb; ...
-  </screen>
-  
-  Options:
-  <variablelist>
-   
-   <varlistentry>
-    <term>-t <replaceable>type</replaceable></term>
-    <listitem>
-     <para>
-      Update all files as <replaceable>type</replaceable>. Currently, the
-      types supported are <literal>text</literal> and
-      <literal>grs</literal><replaceable>.subtype</replaceable>.
-      If no <replaceable>subtype</replaceable> is provided for the GRS
-      (General Record Structure) type, the canonical input format
-      is assumed (see section <xref linkend="local-representation"/>).
-       Generally, it is probably advisable to specify the record types
-       in the <literal>zebra.cfg</literal> file (see section
-       <xref linkend="record-types"/>), to avoid confusion at
-        subsequent updates.
-     </para>
-    </listitem>
-   </varlistentry>
-   <varlistentry>
-    <term>-c <replaceable>config-file</replaceable></term>
-    <listitem>
-     <para>
-      Read the configuration file
-      <replaceable>config-file</replaceable> instead of
-      <literal>zebra.cfg</literal>.
-     </para>
-    </listitem>
-   </varlistentry>
-   <varlistentry>
-    <term>-g <replaceable>group</replaceable></term>
-    <listitem>
-     <para>
-      Update the files according to the group
-      settings for <replaceable>group</replaceable> (see section
-      <xref linkend="configuration-file"/>).
-     </para>
-    </listitem>
-   </varlistentry>
-   <varlistentry>
-    <term>-d <replaceable>database</replaceable></term>
-    <listitem>
-     <para>
-      The records located should be associated with the database name
-      <replaceable>database</replaceable> for access through the Z39.50 server.
-     </para>
-    </listitem>
-   </varlistentry>
-   <varlistentry>
-    <term>-m <replaceable>mbytes</replaceable></term>
-    <listitem>
-     <para>
-      Use <replaceable>mbytes</replaceable> of megabytes before flushing
-      keys to background storage. This setting affects performance when
-      updating large databases.
-     </para>
-    </listitem>
-   </varlistentry>
-   <varlistentry>
-    <term>-n</term>
-    <listitem>
-     <para>
-      Disable the use of shadow registers for this operation
-      (see section <xref linkend="shadow-registers"/>).
-     </para>
-    </listitem>
-   </varlistentry>
-   <varlistentry>
-    <term>-s</term>
-    <listitem>
-     <para>
-      Show analysis of the indexing process. The maintenance
-      program works in a read-only mode and doesn't change the state
-      of the index. This options is very useful when you wish to test a
-      new profile.
-     </para>
-    </listitem>
-   </varlistentry>
-   <varlistentry>
-    <term>-V</term>
-    <listitem>
-     <para>
-      Show Zebra version.
-     </para>
-    </listitem>
-   </varlistentry>
-   <varlistentry>
-    <term>-v <replaceable>level</replaceable></term>
-    <listitem>
-     <para>
-      Set the log level to <replaceable>level</replaceable>.
-      <replaceable>level</replaceable> should be one of
-      <literal>none</literal>, <literal>debug</literal>, and
-      <literal>all</literal>.
-     </para>
-    </listitem>
-   </varlistentry>
-  </variablelist>
- </para>
- 
- <para>
-  Commands
-  <variablelist>
-   
-   <varlistentry>
-    <term>update <replaceable>directory</replaceable></term>
-    <listitem>
-     <para>
-      Update the register with the files contained in
-      <replaceable>directory</replaceable>.
-      If no directory is provided, a list of files is read from
-      <literal>stdin</literal>.
-      See section <xref linkend="administration"/>.
-     </para>
-    </listitem>
-   </varlistentry>
-   <varlistentry>
-    <term>delete <replaceable>directory</replaceable></term>
-    <listitem>
-     <para>
-      Remove the records corresponding to the files found under
-      <replaceable>directory</replaceable> from the register.
-     </para>
-    </listitem>
-   </varlistentry>
-   <varlistentry>
-    <term>commit</term>
-    <listitem>
-     <para>
-      Write the changes resulting from the last <literal>update</literal>
-      commands to the register. This command is only available if the use of
-      shadow register files is enabled (see section
-      <xref linkend="shadow-registers"/>).
-     </para>
-    </listitem>
-   </varlistentry>
-  </variablelist>
- </para>
  
  
-</chapter>
  
  
-<chapter id="server">
- <title>The Z39.50 Server</title>
+ <sect1 id="administration-ranking">
+  <title>Relevance Ranking and Sorting of Result Sets</title>
  
  
- <sect1 id="zebrasrv">
-  <title>Running the Z39.50 Server (zebrasrv)</title>
+  <sect2 id="administration-overview">
+   <title>Overview</title>
+   <para>
+    The default ordering of a result set is left up to the server,
+    which inside &zebra; means sorting in ascending document ID order. 
+    This is not always the order in which humans want to browse the sometimes
+    quite large hit sets. Ranking and sorting come to the rescue.
+   </para>
  
  
-  <para>
-   <emphasis remap="bf">Syntax</emphasis>
+   <para> 
+    In cases where a good presentation ordering can be computed at
+    indexing time, we can use a fixed <literal>static ranking</literal>
+    scheme, which is provided for the <literal>alvis</literal>
+    indexing filter. This defines a fixed ordering of hit lists,
+    independently of the query issued. 
+   </para>
  
  
-   <screen>
-    zebrasrv &lsqb;options&rsqb; &lsqb;listener-address ...&rsqb;
-   </screen>
-
-  </para>
-
-  <para>
-   <emphasis remap="bf">Options</emphasis>
-   <variablelist>
-
-    <varlistentry>
-     <term>-a <replaceable>APDU file</replaceable></term>
-     <listitem>
-      <para>
-       Specify a file for dumping PDUs (for diagnostic purposes).
-       The special name "-" sends output to <literal>stderr</literal>.
-      </para>
-     </listitem>
-    </varlistentry>
-    <varlistentry>
-     <term>-c <replaceable>config-file</replaceable></term>
-     <listitem>
-      <para>
-       Read configuration information from
-       <replaceable>config-file</replaceable>.
-       The default configuration is <literal>./zebra.cfg</literal>.
-      </para>
-     </listitem>
-    </varlistentry>
-    <varlistentry>
-     <term>-S</term>
-     <listitem>
-      <para>
-       Don't fork on connection requests. This can be useful for
-       symbolic-level debugging. The server can only accept a single
-       connection in this mode.
-      </para>
-     </listitem>
-    </varlistentry>
-    <varlistentry>
-     <term>-s</term>
-     <listitem>
-      <para>
-       Use the SR protocol.
-      </para>
-     </listitem>
-    </varlistentry>
-    <varlistentry>
-     <term>-z</term>
-     <listitem>
-      <para>
-       Use the Z39.50 protocol (default). These two options complement
-       eachother. You can use both multiple times on the same command
-       line, between listener-specifications (see below). This way, you
-       can set up the server to listen for connections in both protocols
-       concurrently, on different local ports.
-      </para>
-     </listitem>
-    </varlistentry>
-    <varlistentry>
-     <term>-l <replaceable>logfile</replaceable></term>
-     <listitem>
-      <para>
-       Specify an output file for the diagnostic messages.
-       The default is to write this information to <literal>stderr</literal>.
-      </para>
-     </listitem>
-    </varlistentry>
-    <varlistentry>
-     <term>-v <replaceable>log-level</replaceable></term>
-     <listitem>
-      <para>
-       The log level. Use a comma-separated list of members of the set
-       &lcub;fatal,debug,warn,log,all,none&rcub;.
-      </para>
-     </listitem>
-    </varlistentry>
-    <varlistentry>
-     <term>-u <replaceable>username</replaceable></term>
-     <listitem>
-      <para>
-       Set user ID. Sets the real UID of the server process to that of the
-       given <replaceable>username</replaceable>.
-       It's useful if you aren't comfortable with having the
-       server run as root, but you need to start it as such to bind a
-       privileged port.
-      </para>
-     </listitem>
-    </varlistentry>
-    <varlistentry>
-     <term>-w <replaceable>working-directory</replaceable></term>
-     <listitem>
-      <para>
-       Change working directory.
-      </para>
-     </listitem>
-    </varlistentry>
-    <varlistentry>
-     <term>-i</term>
-     <listitem>
-      <para>
-       Run under the Internet superserver, <literal>inetd</literal>.
-       Make sure you use the logfile option <literal>-l</literal> in
-       conjunction with this mode and specify the <literal>-l</literal>
-       option before any other options.
-      </para>
-     </listitem>
-    </varlistentry>
-    <varlistentry>
-     <term>-t <replaceable>timeout</replaceable></term>
-     <listitem>
-      <para>
-       Set the idle session timeout (default 60 minutes).
-      </para>
-     </listitem>
-    </varlistentry>
-    <varlistentry>
-     <term>-k <replaceable>kilobytes</replaceable></term>
-     <listitem>
-      <para>
-       Set the (approximate) maximum size of
-       present response messages. Default is 1024 Kb (1 Mb).
-      </para>
-     </listitem>
-    </varlistentry>
-   </variablelist>
-  </para>
-
-  <para>
-   A <replaceable>listener-address</replaceable> consists of a transport
-   mode followed by a colon (:) followed by a listener address.
-   The transport mode is either <literal>ssl</literal> or
-   <literal>tcp</literal>.
-  </para>
-
-  <para>
-   For TCP, an address has the form
-  </para>
-
-  <para>
-
-   <screen>
-    hostname | IP-number &lsqb;: portnumber&rsqb;
-   </screen>
-
-  </para>
-
-  <para>
-   The port number defaults to 210 (standard Z39.50 port).
-  </para>
-
-  <para>
-   Examples
-  </para>
-
-  <para>
-
-   <screen>
-    tcp:dranet.dra.com
-
-    ssl:secure.lib.com:3000
-   </screen>
-
-  </para>
-
-  <para>
-   In both cases, the special hostname "@" is mapped to
-   the address INADDR_ANY, which causes the server to listen on any local
-   interface. To start the server listening on the registered port for
-   Z39.50, and to drop root privileges once the ports are bound, execute
-   the server like this (from a root shell):
-  </para>
-
-  <para>
-
-   <screen>
-    zebrasrv -u daemon tcp:@
-   </screen>
-
-  </para>
-
-  <para>
-   You can replace <literal>daemon</literal> with another user, eg.
-   your own account, or a dedicated IR server account.
-  </para>
-
-  <para>
-   The default behavior for <literal>zebrasrv</literal> is to establish
-   a single TCP/IP listener, for the Z39.50 protocol, on port 9999.
-  </para>
-
- </sect1>
-
- <sect1 id="protocol-support">
-  <title>Z39.50 Protocol Support and Behavior</title>
-
-  <sect2>
-   <title>Initialization</title>
+   <para>
+    There are cases, however, where relevance of hit set documents is
+    highly dependent on the query processed.
+    Simply put, <literal>dynamic relevance ranking</literal> 
+    sorts a set of retrieved records such that those most likely to be
+    relevant to your request are retrieved first. 
+    Internally, &zebra; retrieves all documents that satisfy your
+    query, and re-orders the hit list to arrange them based on
+    a measurement of similarity between your query and the content of
+    each record. 
+   </para>
  
     <para>
-    During initialization, the server will negotiate to version 3 of the
-    Z39.50 protocol, and the option bits for Search, Present, Scan,
-    NamedResultSets, and concurrentOperations will be set, if requested by
-    the client. The maximum PDU size is negotiated down to a maximum of
-    1Mb by default.
+    Finally, there are situations where hit sets of documents should be
+    <literal>sorted</literal> during query time according to the
+    lexicographical ordering of certain sort indexes created at
+    indexing time.
     </para>
-
    </sect2>
  
-  <sect2 id="search">
-   <title>Search</title>
-
-   <para>
-    The supported query type are 1 and 101. All operators are currently
-    supported with the restriction that only proximity units of type "word"
-    are supported for the proximity operator.
-    Queries can be arbitrarily complex.
-    Named result sets are supported, and result sets can be used as operands
-    without limitations.
-    Searches may span multiple databases.
-   </para>
-
-   <para>
-    The server has full support for piggy-backed present requests (see
-    also the following section).
-   </para>
-
-   <para>
-    <emphasis>Use</emphasis> attributes are interpreted according to the
-    attribute sets which have been loaded in the
-    <literal>zebra.cfg</literal> file, and are matched against specific
-    fields as specified in the <literal>.abs</literal> file which
-    describes the profile of the records which have been loaded.
-    If no Use attribute is provided, a default of Bib-1 Any is assumed.
-   </para>
-
-   <para>
-    If a <emphasis>Structure</emphasis> attribute of
-    <emphasis>Phrase</emphasis> is used in conjunction with a
-    <emphasis>Completeness</emphasis> attribute of
-    <emphasis>Complete (Sub)field</emphasis>, the term is matched
-    against the contents of the phrase (long word) register, if one
-    exists for the given <emphasis>Use</emphasis> attribute.
-    A phrase register is created for those fields in the
-    <literal>.abs</literal> file that contains a
-    <literal>p</literal>-specifier.
-   </para>
-
-   <para>
-    If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
-    used in conjunction with <emphasis>Incomplete Field</emphasis> - the
-    default value for <emphasis>Completeness</emphasis>, the
-    search is directed against the normal word registers, but if the term
-    contains multiple words, the term will only match if all of the words
-    are found immediately adjacent, and in the given order.
-    The word search is performed on those fields that are indexed as
-    type <literal>w</literal> in the <literal>.abs</literal> file.
-   </para>
  
  
+ <sect2 id="administration-ranking-static">
+  <title>Static Ranking</title>
+  
     <para>
-    If the <emphasis>Structure</emphasis> attribute is
-    <emphasis>Word List</emphasis>,
-    <emphasis>Free-form Text</emphasis>, or
-    <emphasis>Document Text</emphasis>, the term is treated as a
-    natural-language, relevance-ranked query.
-    This search type uses the word register, i.e. those fields
-    that are indexed as type <literal>w</literal> in the
-    <literal>.abs</literal> file.
+    &zebra; internally uses inverted indexes to look up term frequencies
+    in documents. Multiple queries from different indexes can be
+    combined by the binary boolean operations <literal>AND</literal>, 
+    <literal>OR</literal> and/or <literal>NOT</literal> (which
+    is in fact a binary <literal>AND NOT</literal> operation). 
+    To ensure fast query execution
+    speed, all indexes have to be sorted in the same order.
     </para>
-
     <para>
-    If the <emphasis>Structure</emphasis> attribute is
-    <emphasis>Numeric String</emphasis> the term is treated as an integer.
-    The search is performed on those fields that are indexed
-    as type <literal>n</literal> in the <literal>.abs</literal> file.
+    The indexes are normally sorted according to document 
+    <literal>ID</literal> in
+    ascending order, and any query which does not invoke a special
+    re-ranking function will therefore retrieve the result set in
+    document 
+    <literal>ID</literal>
+    order.
     </para>
-
     <para>
-    If the <emphasis>Structure</emphasis> attribute is
-    <emphasis>URx</emphasis> the term is treated as a URX (URL) entity.
-    The search is performed on those fields that are indexed as type
-    <literal>u</literal> in the <literal>.abs</literal> file.
+    If one defines the 
+    <screen>
+    staticrank: 1 
+    </screen> 
+    directive in the main core &zebra; configuration file, the internal document
+    keys used for ordering are augmented by a preceding integer, which
+    contains the static rank of a given document, and the index lists
+    are ordered 
+    first by ascending static rank,
+    then by ascending document <literal>ID</literal>.
+    Zero
+    is the ``best'' rank, as it occurs at the
+    beginning of the list; higher numbers represent worse scores.
+   </para>
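The resulting order of an index list can be illustrated with a short Python sketch. The `(static_rank, doc_id)` tuples and the function name are hypothetical stand-ins for &zebra;'s internal document keys, which are not shown here.

```python
# Hypothetical postings for one term: (static_rank, doc_id) pairs.
postings = [(2, 17), (0, 42), (1, 5), (0, 8), (2, 3)]


def static_rank_order(entries):
    """Order entries by ascending static rank, then ascending document ID."""
    return sorted(entries, key=lambda entry: (entry[0], entry[1]))


ordered = static_rank_order(postings)
print(ordered)
```

Documents with static rank 0 come first, and documents sharing a rank appear in ascending document ID order.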
+   <para>
+    The experimental <literal>alvis</literal> filter provides a
+    directive to fetch static rank information out of the indexed &acro.xml;
+    records, so that <emphasis>all</emphasis> hit sets are ordered
+    by <emphasis>ascending</emphasis> static
+    rank and, for documents with the same static rank,
+    by <emphasis>ascending</emphasis> document <literal>ID</literal>.
+    See <xref linkend="record-model-alvisxslt"/> for the gory details.
+   </para>
+    </sect2>
+
+
+ <sect2 id="administration-ranking-dynamic">
+  <title>Dynamic Ranking</title>
+   <para>
+    In order to fiddle with the static rank order, it is necessary to
+    invoke additional re-ranking/re-ordering using dynamic
+    ranking or score functions. These functions return positive
+    integer scores, where <emphasis>highest</emphasis> score is 
+    ``best'';
+    hit sets are sorted according to <emphasis>descending</emphasis> 
+    scores (in contrast
+    to the index lists which are sorted according to
+    ascending rank number and document ID).
+   </para>
+   <para>
+    Dynamic ranking is enabled by a directive like one of the
+    following in the &zebra; configuration file (use only one of these at a time):
+    <screen> 
+    rank: rank-1        # default TF-IDF like
+    rank: rank-static   # dummy do-nothing
+    </screen>
     </para>
-
+ 
     <para>
-    If the <emphasis>Structure</emphasis> attribute is
-    <emphasis>Local Number</emphasis> the term is treated as
-    native Zebra Record Identifier.
+    Dynamic ranking is done at query time rather than
+    indexing time (this is why we
+    call it ``dynamic ranking'' in the first place).
+    It is invoked by adding
+    the &acro.bib1; relation attribute with
+    value ``relevance'' to the &acro.pqf; query (that is,
+    <literal>@attr&nbsp;2=102</literal>, see also  
+    <ulink url="&url.z39.50;bib1.html">
+     The &acro.bib1; Attribute Set Semantics</ulink>, also in 
+      <ulink url="&url.z39.50.attset.bib1;">HTML</ulink>). 
+    To find all articles with the word <literal>Eoraptor</literal> in
+    the title, and present them relevance ranked, issue the &acro.pqf; query:
+    <screen>
+     @attr 2=102 @attr 1=4 Eoraptor
+    </screen>
     </para>
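As a rough illustration of the two points above, the following Python sketch prefixes a query with the relevance attribute and orders hits by descending score. Both helper names and the `(doc_id, score)` pairs are hypothetical; the scoring itself happens inside &zebra; and is not modelled here.

```python
# Bib-1 relation attribute with value "relevance", as given in the text.
RELEVANCE_ATTR = "@attr 2=102"


def relevance_ranked(pqf_query):
    """Prefix a PQF query string with the relevance relation attribute."""
    return f"{RELEVANCE_ATTR} {pqf_query.strip()}"


def order_hits(scored_hits):
    """Sort hypothetical (doc_id, score) pairs so the highest score comes first."""
    return sorted(scored_hits, key=lambda hit: hit[1], reverse=True)


print(relevance_ranked("@attr 1=4 Eoraptor"))
print(order_hits([(8, 120), (42, 731), (5, 64), (17, 3)]))
```

The sort direction is the opposite of the index lists: a higher score is better, so it sorts earlier.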
  
-   <para>
-    If the <emphasis>Relation</emphasis> attribute is
-    <emphasis>Equals</emphasis> (default), the term is matched
-    in a normal fashion (modulo truncation and processing of
-    individual words, if required).
-    If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
-    <emphasis>Less Than or Equal</emphasis>,
-    <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
-     Equal</emphasis>, the term is assumed to be numerical, and a
-    standard regular expression is constructed to match the given
-    expression.
-    If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
-    the standard natural-language query processor is invoked.
-   </para>
+    <sect3 id="administration-ranking-dynamic-rank1">
+     <title>Dynamically ranking using &acro.pqf; queries with the 'rank-1' 
+      algorithm</title>
  
     <para>
  
     <para>
-    For the <emphasis>Truncation</emphasis> attribute,
-    <emphasis>No Truncation</emphasis> is the default.
-    <emphasis>Left Truncation</emphasis> is not supported.
-    <emphasis>Process &num;</emphasis> is supported, as is
-    <emphasis>Regxp-1</emphasis>.
-    <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
-    search. As a default, a single error (deletion, insertion, 
-    replacement) is accepted when terms are matched against the register
-    contents.
-   </para>
-
-   <sect3>
-    <title>Regular expressions</title>
-    
-    <para>
-     Each term in a query is interpreted as a regular expression if
-     the truncation value is either <emphasis>Regxp-1</emphasis> (102)
-     or <emphasis>Regxp-2</emphasis> (103).
-     Both query types follow the same syntax with the operands:
+     The default <literal>rank-1</literal> ranking module implements a 
+     TF/IDF (Term Frequency over Inverse Document Frequency) like
+     algorithm. In contrast to the usual definition of TF/IDF
+     algorithms, which only considers searching in one full-text
+     index, this one works on multiple indexes at the same time.
+     More precisely, 
+     &zebra; does boolean queries and searches in specific addressed
+     indexes (there are inverted indexes pointing from terms in the
+     dictionary to documents and term positions inside documents). 
+     It works like this:
       <variablelist>
       <variablelist>
-
        <varlistentry>
        <varlistentry>
-       <term>x</term>
+       <term>Query Components</term>
         <listitem>
          <para>
         <listitem>
          <para>
-         Matches the character <emphasis>x</emphasis>.
+         First, the boolean query is dismantled into its principal components,
+         i.e. atomic queries where one term is looked up in one index.
+         For example, the query
+         <screen>
+        @attr 2=102 @and @attr 1=1010 Utah @attr 1=1018 Springer
+         </screen>
+         is a boolean AND between the atomic parts
+         <screen>
+       @attr 2=102 @attr 1=1010 Utah
+         </screen>
+          and
+         <screen>
+       @attr 2=102 @attr 1=1018 Springer
+         </screen>
+         each of which is processed separately.
          </para>
         </listitem>
        </varlistentry>
          </para>
         </listitem>
        </varlistentry>
+
        <varlistentry>
        <varlistentry>
-       <term>.</term>
+       <term>Atomic hit lists</term>
         <listitem>
          <para>
         <listitem>
          <para>
-         Matches any character.
+         Second, for each atomic query, the hit list of documents is
+         computed.
          </para>
          </para>
-       </listitem>
-      </varlistentry>
-      <varlistentry>
-       <term><literal>[</literal>..<literal>]</literal></term>
-       <listitem>
          <para>
          <para>
-         Matches the set of characters specified;
-         such as <literal>[abc]</literal> or <literal>[a-c]</literal>.
+         In this example, two hit lists, one for each of the indexes
+         <literal>@attr 1=1010</literal> and
+         <literal>@attr 1=1018</literal>, are computed.
          </para>
         </listitem>
        </varlistentry>
          </para>
         </listitem>
        </varlistentry>
-     </variablelist>
-     and the operators:
-     <variablelist>
-      
+
        <varlistentry>
        <varlistentry>
-       <term>x*</term>
+       <term>Atomic scores</term>
         <listitem>
          <para>
         <listitem>
          <para>
-         Matches <emphasis>x</emphasis> zero or more times. Priority: high.
+         Third, each document in the hit list is assigned a score
+         (<emphasis>if</emphasis> ranking
+         is enabled and requested in the query) using a TF/IDF scheme.
+        </para>
+        <para>
+         In this example, both atomic parts of the query carry the
+         <literal>@attr 2=102</literal> relevance attribute, so both
+         contribute to the relevance ranking.
          </para>
          </para>
-       </listitem>
-      </varlistentry>
-      <varlistentry>
-       <term>x+</term>
-       <listitem>
          <para>
          <para>
-         Matches <emphasis>x</emphasis> one or more times. Priority: high.
+         It is possible to apply dynamic ranking on only parts of the
+         &acro.pqf; query: 
+         <screen>
+          @and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer
+         </screen>
+         searches for all documents which have the term 'Utah' in the
+         body of text, and which have the term 'Springer' in the publisher
+         field, and sorts them in the order of the relevance ranking made on
+         the body-of-text index only.
          </para>
         </listitem>
        </varlistentry>
          </para>
         </listitem>
        </varlistentry>
+
        <varlistentry>
        <varlistentry>
-       <term>x?</term>
+       <term>Hit list merging</term>
         <listitem>
          <para>
         <listitem>
          <para>
-         Matches <emphasis>x</emphasis> once or twice. Priority: high.
+         Fourth, the atomic hit lists are merged according to the boolean
+         conditions to a final hit list of documents to be returned.
+        </para>
+        <para>
+        This step is always performed, regardless of whether
+        dynamic ranking is enabled.
          </para>
         </listitem>
        </varlistentry>
          </para>
         </listitem>
        </varlistentry>
+
        <varlistentry>
        <varlistentry>
-       <term>xy</term>
+       <term>Document score computation</term>
         <listitem>
          <para>
         <listitem>
          <para>
-         Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
-         Priority: medium.
+         Fifth, the total score of a document is computed as a linear
+         combination of the atomic scores of the atomic hit lists.
          </para>
          </para>
+        <para>
+         Ranking weights may be used to pass a value to a ranking
+         algorithm, using the non-standard &acro.bib1; attribute type 9.
+         This allows one branch of a query to use one value while
+         another branch uses a different one.  For example, we can search
+         for <literal>utah</literal> in the
+         <literal>@attr 1=4</literal> index with weight 30, and for
+         <literal>city</literal> in the <literal>@attr 1=1010</literal> index with weight 20:
+         <screen>
+         @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 @attr 1=1010 city
+         </screen>
+        </para>
+        <para>
+         The default weight is 34, a rough approximation of
+         sqrt(1000) (about 31.6), as the &acro.z3950; standard prescribes that the top score
+         is 1000 and the bottom score is 0, encoded in integers.
+        </para>
+        <warning>
+         <para>
+          The ranking-weight feature is experimental. It may change in future
+          releases of &zebra;.
+         </para>
+        </warning>
         </listitem>
        </varlistentry>
         </listitem>
        </varlistentry>
+
        <varlistentry>
        <varlistentry>
-       <term>x&verbar;y</term>
+       <term>Re-sorting of hit list</term>
         <listitem>
          <para>
         <listitem>
          <para>
-         Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
-         Priority: low.
+         Finally, the final hit list is re-ordered according to scores.
          </para>
         </listitem>
        </varlistentry>
       </variablelist>
          </para>
         </listitem>
        </varlistentry>
       </variablelist>
-     The order of evaluation may be changed by using parentheses.
-    </para>
+ 
+
+<!--
+Still need to describe the exact TF/IDF formula. Here's the info, need -->
+<!--to extract it in human readable form .. MC
+
+static int calc (void *set_handle, zint sysno, zint staticrank,
+                 int *stop_flag)
+{
+    int i, lo, divisor, score = 0;
+    struct rank_set_info *si = (struct rank_set_info *) set_handle;
+
+    if (!si->no_rank_entries)
+        return -1;   /* ranking not enabled for any terms */
+
+    for (i = 0; i < si->no_entries; i++)
+    {
+        yaz_log(log_level, "calc: i=%d rank_flag=%d lo=%d",
+                i, si->entries[i].rank_flag, si->entries[i].local_occur);
+        if (si->entries[i].rank_flag && (lo = si->entries[i].local_occur))
+            score += (8+log2_int (lo)) * si->entries[i].global_inv *
+                si->entries[i].rank_weight;
+    }
+    divisor = si->no_rank_entries * (8+log2_int (si->last_pos/si->no_entries));
+    score = score / divisor;
+    yaz_log(log_level, "calc sysno=" ZINT_FORMAT " score=%d", sysno, score);
+    if (score > 1000)
+        score = 1000;
+    /* reset the counts for the next term */
+    for (i = 0; i < si->no_entries; i++)
+        si->entries[i].local_occur = 0;
+    return score;
+}
+
+
+where lo = si->entries[i].local_occur is the local document's term-within-index frequency, si->entries[i].global_inv represents the IDF part (computed in static void *begin()), and
+si->entries[i].rank_weight is the weight assigned per index (default 34, or set in the @attr 9=xyz magic)
+
+Finally, the IDF part is computed as:
+
+static void *begin (struct zebra_register *reg,
+                    void *class_handle, RSET rset, NMEM nmem,
+                    TERMID *terms, int numterms)
+{
+    struct rank_set_info *si =
+        (struct rank_set_info *) nmem_malloc (nmem,sizeof(*si));
+    int i;
+
+    yaz_log(log_level, "rank-1 begin");
+    si->no_entries = numterms;
+    si->no_rank_entries = 0;
+    si->nmem=nmem;
+    si->entries = (struct rank_term_info *)
+        nmem_malloc (si->nmem, sizeof(*si->entries)*numterms);
+    for (i = 0; i < numterms; i++)
+    {
+        zint g = rset_count(terms[i]->rset);
+        yaz_log(log_level, "i=%d flags=%s '%s'", i,
+                terms[i]->flags, terms[i]->name );
+        if  (!strncmp (terms[i]->flags, "rank,", 5))
+        {
+            const char *cp = strstr(terms[i]->flags+4, ",w=");
+            si->entries[i].rank_flag = 1;
+            if (cp)
+                si->entries[i].rank_weight = atoi (cp+3);
+            else
+              si->entries[i].rank_weight = 34; /* sqrroot of 1000 */
+            yaz_log(log_level, " i=%d weight=%d g="ZINT_FORMAT, i,
+                     si->entries[i].rank_weight, g);
+            (si->no_rank_entries)++;
+        }
+        else
+            si->entries[i].rank_flag = 0;
+        si->entries[i].local_occur = 0;  /* FIXME */
+        si->entries[i].global_occur = g;
+        si->entries[i].global_inv = 32 - log2_int (g);
+        yaz_log(log_level, " global_inv = %d g = " ZINT_FORMAT,
+                (int) (32-log2_int (g)), g);
+        si->entries[i].term = terms[i];
+        si->entries[i].term_index=i;
+        terms[i]->rankpriv = &(si->entries[i]);
+    }
+    return si;
+}
+
+
+where g = rset_count(terms[i]->rset) is the count of all documents in this specific index hit list, and the IDF part then is
+
+ si->entries[i].global_inv = 32 - log2_int (g);
+   -->
+
+   </para>
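   <para>
    The scoring arithmetic of <literal>rank-1</literal> can be sketched in
    Python as follows. This is only a transcription of the
    <literal>calc()</literal> and <literal>begin()</literal> routines quoted in
    the source comment above, not code shipped with &zebra;. Two points are
    assumptions: the helper <literal>log2_int</literal> is taken to return the
    floor of the base-2 logarithm (and 0 for values below 1), and
    <literal>last_pos</literal> is treated as a plain input because its
    bookkeeping is not shown there.
   </para>

```python
def log2_int(g):
    """Floor of log2(g) for g >= 1, otherwise 0 (assumed behaviour of the C helper)."""
    return g.bit_length() - 1 if g > 0 else 0


def global_inv(hit_count):
    """IDF part: terms with shorter hit lists get larger values."""
    return 32 - log2_int(hit_count)


def rank1_score(entries, last_pos):
    """Score one document.

    entries  -- one dict per query term with the keys rank_flag, local_occur,
                global_occur and rank_weight (names follow the C struct)
    last_pos -- position counter used for normalisation (assumed input)
    """
    ranked = [e for e in entries if e["rank_flag"]]
    if not ranked:
        return -1  # ranking not requested for any term

    total = 0
    for e in ranked:
        lo = e["local_occur"]
        if lo:  # the term occurs in this document
            total += (8 + log2_int(lo)) * global_inv(e["global_occur"]) * e["rank_weight"]

    divisor = len(ranked) * (8 + log2_int(last_pos // len(entries)))
    return min(total // divisor, 1000)  # scores are capped at 1000


example = [{"rank_flag": True, "local_occur": 4, "global_occur": 1024, "rank_weight": 34}]
print(rank1_score(example, last_pos=256))
```

   <para>
    With the sample entry the sketch prints 467: the TF part is 8 + 2 = 10,
    the IDF part is 32 - 10 = 22, the default weight is 34, and the product
    7480 is divided by 8 + 8 = 16 using integer division.
   </para>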
+
  
      <para>
  
      <para>
-     If the first character of the <emphasis>Regxp-2</emphasis> query
-     is a plus character (<literal>+</literal>) it marks the
-     beginning of a section with non-standard specifiers.
-     The next plus character marks the end of the section.
-     Currently Zebra only supports one specifier, the error tolerance,
-     which consists one digit. 
+    The <literal>rank-1</literal> algorithm
+    does not use the static rank 
+    information in the list keys, and will produce the same ordering
+    with or without static ranking enabled.
      </para>
      </para>
+ 
  
  
+    <!--
+    <sect3 id="administration-ranking-dynamic-rank1">
+     <title>Dynamically ranking &acro.pqf; queries with the 'rank-static' 
+      algorithm</title>
      <para>
      <para>
-     Since the plus operator is normally a suffix operator the addition to
-     the query syntax doesn't violate the syntax for standard regular
-     expressions.
-    </para>
+    The dummy <literal>rank-static</literal> reranking/scoring
+    function returns just 
+    <literal>score = max int - staticrank</literal>
+    in order to preserve the static ordering of hit sets that would
+    have been produced had it not been invoked.
+    Obviously, to combine static and dynamic ranking usefully,
+    it is necessary
+    to make a new ranking 
+    function; this is left
+    as an exercise for the reader. 
+   </para>
+    </sect3>
+    -->
+ 
+   <warning>
+     <para>
+      <literal>Dynamic ranking</literal> is not compatible
+      with <literal>estimated hit sizes</literal>, as all documents in
+      a hit set must be accessed to compute the correct placing in a
+      ranking sorted list. Therefore the relation attribute setting
+      <literal>@attr&nbsp;2=102</literal> clashes with 
+      <literal>@attr&nbsp;9=integer</literal>. 
+     </para>
+   </warning>  
+
+   <!--
+    we might want to add ranking like this:
+    UNPUBLISHED:
+    Simple BM25 Extension to Multiple Weighted Fields
+    Stephen Robertson, Hugo Zaragoza and Michael Taylor
+    Microsoft Research
+    ser@microsoft.com
+    hugoz@microsoft.com
+    mitaylor2microsoft.com
+   -->
+
+    </sect3>
+
+    <sect3 id="administration-ranking-dynamic-cql">
+     <title>Dynamically ranking &acro.cql; queries</title>
+     <para>
+      Dynamic ranking can be enabled during server-side &acro.cql;
+      query expansion by adding <literal>@attr&nbsp;2=102</literal>
+      chunks to the &acro.cql; config file. For example,
+      <screen>
+       relationModifier.relevant               = 2=102
+      </screen>
+      invokes dynamic ranking each time a &acro.cql; query of the form 
+      <screen>
+       Z> querytype cql
+       Z> f alvis.text =/relevant house
+      </screen>
+      is issued. Dynamic ranking can also be automatically used on
+      specific &acro.cql; indexes by (for example) setting
+      <screen>
+       index.alvis.text                        = 1=text 2=102
+      </screen>
+      which then invokes dynamic ranking each time a &acro.cql; query of the form 
+      <screen>
+       Z> querytype cql
+       Z> f alvis.text = house
+      </screen>
+      is issued.
+     </para>
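     <para>
      The effect of these two kinds of configuration lines can be illustrated
      with a toy expansion in Python. This is a simplified sketch, not the
      conversion that the server actually performs: the function names are
      made up for this example, the line
      <literal>index.alvis.text = 1=text</literal> is an assumed starting
      configuration, and a real &acro.cql; to &acro.pqf; conversion handles
      relations, structure and truncation as well, so its output may contain
      further attributes.
     </para>

```python
def parse_cql_config(text):
    """Read 'key = value' lines into a dict, skipping blanks and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split at the first '=' only
        config[key.strip()] = value.strip()
    return config


def to_pqf_attrs(spec):
    """Turn '1=text 2=102' into '@attr 1=text @attr 2=102'."""
    return " ".join("@attr " + pair for pair in spec.split())


def expand(config, index, term, modifiers=()):
    """Expand one CQL clause 'index = term' into a PQF string (toy version)."""
    parts = [to_pqf_attrs(config["index." + index])]
    for name in modifiers:
        parts.append(to_pqf_attrs(config["relationModifier." + name]))
    parts.append(term)
    return " ".join(parts)


CONFIG = """
index.alvis.text          = 1=text
relationModifier.relevant = 2=102
"""

config = parse_cql_config(CONFIG)
# Corresponds to the CQL query: alvis.text =/relevant house
print(expand(config, "alvis.text", "house", modifiers=["relevant"]))
```

     <para>
      The sketch prints <literal>@attr 1=text @attr 2=102 house</literal>.
      Changing the index line to <literal>1=text 2=102</literal> yields the
      same string without any relation modifier, which mirrors the second
      configuration style described above.
     </para>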
+     
+    </sect3>
  
  
-   </sect3>
+    </sect2>
  
  
-   <sect3>
-    <title>Query examples</title>
  
  
-    <para>
-     Phrase search for <emphasis>information retrieval</emphasis> in
-     the title-register:
+ <sect2 id="administration-ranking-sorting">
+  <title>Sorting</title>
+   <para>
+     &zebra; sorts efficiently using special sorting indexes
+     (type=<literal>s</literal>), so each sortable index must be known
+     at indexing time and specified in the configuration of record
+     indexing.  For example, to enable sorting according to the &acro.bib1;
+     <literal>Date/time-added-to-db</literal> field, one could add the line
       <screen>
       <screen>
-      @attr 1=4 "information retrieval"
+        xelm /*/@created               Date/time-added-to-db:s
       </screen>
       </screen>
-    </para>
-
-    <para>
-     Ranked search for the same thing:
+     to any <literal>.abs</literal> record-indexing configuration file.
+     Similarly, one could add an indexing element of the form
+     <screen><![CDATA[       
+      <z:index name="date-modified" type="s">
+       <xsl:value-of select="some/xpath"/>
+      </z:index>
+      ]]></screen>
+     to any <literal>alvis</literal>-filter indexing stylesheet.
+     </para>
+     <para>
+      Sorting can be specified at search time using a query term
+      carrying the non-standard
+      &acro.bib1; attribute-type <literal>7</literal>.  This removes the
+      need to send a &acro.z3950; <literal>Sort Request</literal>
+      separately, and can dramatically improve latency when the client
+      and server are on separate networks.
+      The sorting part of the query is separate from the rest of the
+      query - the actual search specification - and must be combined
+      with it using OR.
+     </para>
+     <para>
+      A sorting subquery needs two attributes: an index (such as a
+      &acro.bib1; type-1 attribute) specifying which index to sort on, and a
+      type-7 attribute whose value is <literal>1</literal> for
+      ascending sorting, or <literal>2</literal> for descending.  The
+      term associated with the sorting attribute is the priority of
+      the sort key, where <literal>0</literal> specifies the primary
+      sort key, <literal>1</literal> the secondary sort key, and so
+      on.
+     </para>
+    <para>For example, a search for water, sorted by title (ascending),
+    is expressed by the &acro.pqf; query
       <screen>
       <screen>
-      @attr 1=4 @attr 2=102 "Information retrieval"
+     @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
       </screen>
       </screen>
-    </para>
-
-    <para>
-     Phrase search with a regular expression:
+      whereas a search for water, sorted by title ascending,
+     then date descending, would be
       <screen>
       <screen>
-      @attr 1=4 @attr 5=102 "informat.* retrieval"
+     @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
       </screen>
      </para>
       </screen>
      </para>
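     <para>
      Composing such queries by hand is error-prone once several sort keys are
      involved. The following Python helper is a small sketch of the rule
      described above (one <literal>@or</literal> per sort key, attribute
      type 7 for the direction, and the term giving the priority). The
      function name and argument layout are made up for this example.
     </para>

```python
def with_sort(search_pqf, sort_keys):
    """Append sort subqueries to a PQF search.

    sort_keys -- list of (use_attribute, direction) pairs in priority order,
                 where direction is "asc" (7=1) or "desc" (7=2)
    """
    flags = {"asc": 1, "desc": 2}
    query = search_pqf
    for priority, (use_attr, direction) in enumerate(sort_keys):
        # The term of a sort subquery is its priority: 0 is the primary key.
        sort_part = "@attr 7=%d @attr 1=%s %d" % (flags[direction], use_attr, priority)
        # Each sort subquery is combined with everything before it using OR.
        query = "@or %s %s" % (query, sort_part)
    return query


print(with_sort("@attr 1=1016 water", [(4, "asc")]))
print(with_sort("@attr 1=1016 water", [(4, "asc"), (30, "desc")]))
```

     <para>
      The two calls reproduce the two example queries given above.
     </para>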
-
      <para>
      <para>
-     Ranked search with a regular expression:
-     <screen>
-      @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
-     </screen>
-    </para>
+     Notice the fundamental differences between <literal>dynamic
+     ranking</literal> and <literal>sorting</literal>: there can be
+     only one ranking function defined and configured; but multiple
+     sorting indexes can be specified dynamically at search
+     time. Ranking does not need to use specific indexes, so
+     dynamic ranking can be enabled and disabled without
+     re-indexing; whereas, sorting indexes need to be
+     defined before indexing.
+     </para>
  
  
-    <para>
-     In the GILS schema (<literal>gils.abs</literal>), the
-     west-bounding-coordinate is indexed as type <literal>n</literal>,
-     and is therefore searched by specifying
-     <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
-     To match all those records with west-bounding-coordinate greater
-     than -114 we use the following query:
-     <screen>
-      @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
-     </screen> 
-    </para>
-   </sect3>
-  </sect2>
+ </sect2>
  
  
-  <sect2>
-   <title>Present</title>
-   <para>
-    The present facility is supported in a standard fashion. The requested
-    record syntax is matched against the ones supported by the profile of
-    each record retrieved. If no record syntax is given, SUTRS is the
-    default. The requested element set name, again, is matched against any
-    provided by the relevant record profiles.
-   </para>
-  </sect2>
-  <sect2>
-   <title>Scan</title>
-   <para>
-    The attribute combinations provided with the termListAndStartPoint are
-    processed in the same way as operands in a query (see above).
-    Currently, only the term and the globalOccurrences are returned with
-    the termInfo structure.
-   </para>
-  </sect2>
-  <sect2>
-   <title>Sort</title>
  
  
-   <para>
-    Z39.50 specifies three diffent types of sort criterias.
-    Of these Zebra supports the attribute specification type in which
-    case the use attribute specifies the "Sort register".
-    Sort registers are created for those fields that are of type "sort" in
-    the default.idx file. 
-    The corresponding character mapping file in default.idx specifies the
-    ordinal of each character used in the actual sort.
-   </para>
+ </sect1>
  
  
-   <para>
-    Z39.50 allows the client to specify sorting on one or more input
-    result sets and one output result set.
-    Zebra supports sorting on one result set only which may or may not
-    be the same as the output result set.
+ <sect1 id="administration-extended-services">
+  <title>Extended Services: Remote Insert, Update and Delete</title>
+  
+   <note>
+    <para>
+     Extended services are only supported when accessing the &zebra;
+     server using the <ulink url="&url.z39.50;">&acro.z3950;</ulink>
+     protocol. The <ulink url="&url.sru;">&acro.sru;</ulink> protocol does
+     not support extended services.
+    </para>
+   </note>
+   
+  <para>
+    The extended services are not enabled by default in &zebra;, because
+    they modify the system. &zebra; can be configured
+    to allow anybody to
+    search, and to allow updates only for a particular admin user,
+    in the main &zebra; configuration file <filename>zebra.cfg</filename>.
+    For user <literal>admin</literal>, you could use:
+    <screen>
+     perm.anonymous: r
+     perm.admin: rw
+     passwd: passwordfile
+    </screen>
+    In the password file
+    <filename>passwordfile</filename>, specify users and
+    encrypted passwords as colon-separated strings.
+    Use a tool like <filename>htpasswd</filename> 
+    to maintain the encrypted passwords. 
+    <screen> 
+     admin:secret
+    </screen>
+    It is essential to configure  &zebra; to store records internally, 
+    and to support
+    modifications and deletion of records:
+    <screen>
+     storeData: 1
+     storeKeys: 1
+    </screen>
+    The general record type should be set to any record filter which
+    is able to parse &acro.xml; records. You may use either of the two
+    declarations (but not both simultaneously!):
+    <screen>    
+     recordType: dom.filter_dom_conf.xml
+     # recordType: grs.xml
+    </screen>
+    Note the difference from the specific instructions
+    <screen>    
+     recordType.xml: dom.filter_dom_conf.xml
+     # recordType.xml: grs.xml
+    </screen> 
+    which only work when indexing XML files from the filesystem using
+    the <literal>*.xml</literal> naming convention.
     </para>
     </para>
-  </sect2>
-  <sect2>
-   <title>Close</title>
     <para>
     <para>
-    If a Close PDU is received, the server will respond with a Close PDU
-    with reason=FINISHED, no matter which protocol version was negotiated
-    during initialization. If the protocol version is 3 or more, the
-    server will generate a Close PDU under certain circumstances,
-    including a session timeout (60 minutes by default), and certain kinds of
-    protocol errors. Once a Close PDU has been sent, the protocol
-    association is considered broken, and the transport connection will be
-    closed immediately upon receipt of further data, or following a short
-    timeout.
+    To enable transaction-safe shadow indexing,
+    which is especially important for this kind of operation, set
+    <screen>
+     shadow: directoryname: size (e.g. 1000M)
+    </screen>
+     See <xref linkend="zebra-cfg"/> for additional information on
+     these configuration options.
     </para>
     </para>
-  </sect2>
- </sect1>
-</chapter>
+   <note>
+    <para>
+     It is not possible to carry information about record types or
+     similar to &zebra; when using extended services, due to
+     limitations of the <ulink url="&url.z39.50;">&acro.z3950;</ulink>
+     protocol. Therefore, indexing filters cannot be chosen on a
+     per-record basis. One and only one general &acro.xml; indexing filter
+     must be defined.  
+     <!-- but because it is represented as an OID, we would need some
+     form of proprietary mapping scheme between record type strings and
+     OIDs. -->
+     <!--
+     However, as a minimum, it would be extremely useful to enable
+     people to use &acro.marc21;, assuming grs.marcxml.marc21 as a record
+     type.  
+     -->
+    </para>
+   </note>
  
  
-<chapter id="record-model">
- <title>The Record Model</title>
  
  
- <para>
-  The Zebra system is designed to support a wide range of data management
-  applications. The system can be configured to handle virtually any
-  kind of structured data. Each record in the system is associated with
-  a <emphasis>record schema</emphasis> which lends context to the data
-  elements of the record.
-  Any number of record schema can coexist in the system.
-  Although it may be wise to use only a single schema within
-  one database, the system poses no such restrictions.
- </para>
+   <sect2 id="administration-extended-services-z3950">
+    <title>Extended services in the &acro.z3950; protocol</title>
  
  
- <para>
-  The record model described in this chapter applies to the fundamental,
-  structured
-  record type <literal>grs</literal> as introduced in
-  section <xref linkend="record-types"/>.
- </para>
+    <para>
+     The <ulink url="&url.z39.50;">&acro.z3950;</ulink> standard allows
+     servers to accept special binary <emphasis>extended services</emphasis>
+     protocol packages, which may be used to insert, update and delete
+     records on servers. These packages carry control and update
+     information to the servers, encoded in seven package fields:
+    </para>
  
  
- <para>
-  Records pass through three different states during processing in the
-  system.
- </para>
-
- <para>
-
-  <itemizedlist>
-   <listitem>
+    <table id="administration-extended-services-z3950-table" frame="top">
+     <title>Extended services &acro.z3950; Package Fields</title>
+      <tgroup cols="3">
+       <thead>
+       <row>
+         <entry>Parameter</entry>
+         <entry>Value</entry>
+         <entry>Notes</entry>
+        </row>
+      </thead>
+       <tbody>
+        <row>
+         <entry><literal>type</literal></entry>
+         <entry><literal>'update'</literal></entry>
+         <entry>Must be set to trigger extended services</entry>
+        </row>
+        <row>
+         <entry><literal>action</literal></entry>
+         <entry><literal>string</literal></entry>
+        <entry>
+         Extended service action type with 
+         one of four possible values: <literal>recordInsert</literal>,
+         <literal>recordReplace</literal>,
+         <literal>recordDelete</literal>,
+         and <literal>specialUpdate</literal>
+        </entry>
+        </row>
+        <row>
+         <entry><literal>record</literal></entry>
+         <entry><literal>&acro.xml; string</literal></entry>
+         <entry>An &acro.xml; formatted string containing the record</entry>
+        </row>
+       <row>
+       <entry><literal>syntax</literal></entry>
+       <entry><literal>'xml'</literal></entry>
+       <entry>XML/SUTRS/MARC. GRS-1 not supported.
+        The default filter (record type) as given by recordType in
+        zebra.cfg is used to parse the record.</entry>
+       </row>
+        <row>
+         <entry><literal>recordIdOpaque</literal></entry>
+         <entry><literal>string</literal></entry>
+         <entry>
+         Optional client-supplied, opaque record
+         identifier used during insert operations.
+        </entry>
+        </row>
+        <row>
+         <entry><literal>recordIdNumber</literal></entry>
+         <entry><literal>positive number</literal></entry>
+         <entry>&zebra;'s internal system number,
+         not allowed for  <literal>recordInsert</literal> or 
+         <literal>specialUpdate</literal> actions which result in fresh
+         record inserts.
+        </entry>
+        </row>
+        <row>
+         <entry><literal>databaseName</literal></entry>
+         <entry><literal>database identifier</literal></entry>
+        <entry>
+         The name of the database to which the extended services should be 
+         applied.
+        </entry>
+        </row>
+      </tbody>
+      </tgroup>
+     </table>
+
+
+   <para>
+    The <literal>action</literal> parameter can be any of 
+    <literal>recordInsert</literal> (will fail if the record already exists),
+    <literal>recordReplace</literal> (will fail if the record does not exist),
+    <literal>recordDelete</literal> (will fail if the record does not
+       exist), and
+    <literal>specialUpdate</literal> (will insert or update the record
+       as needed; record deletion is not possible).
+   </para>
  
      <para>
  
      <para>
-     When records are accessed by the system, they are represented
-     in their local, or native format. This might be SGML or HTML files,
-     News or Mail archives, MARC records. If the system doesn't already
-     know how to read the type of data you need to store, you can set up an
-     input filter by preparing conversion rules based on regular
-     expressions and possibly augmented by a flexible scripting language
-     (Tcl).
-     The input filter produces as output an internal representation:
+     During all actions, the
+     usual rules for internal record ID generation apply, unless an
+     optional <literal>recordIdNumber</literal> &zebra; internal ID or a
+    <literal>recordIdOpaque</literal> string identifier is assigned. 
+     The default ID generation is
+     configured using the <literal>recordId:</literal> directive in
+     <filename>zebra.cfg</filename>.  
+     See <xref linkend="zebra-cfg"/>.   
+    </para>
  
  
+   <para>
+    Setting of the <literal>recordIdNumber</literal> parameter, 
+    which must be an existing &zebra; internal system ID number, is not
+    allowed during any  <literal>recordInsert</literal> or 
+     <literal>specialUpdate</literal> action resulting in fresh record
+    inserts.
      </para>
      </para>
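    <para>
     A client can catch some mistakes before a package is sent. The Python
     function below is a hypothetical sanity check that encodes only the rules
     stated in the table and paragraphs above. It is not part of &zebra; or of
     any client library, and it treats a package as a plain dictionary. The
     <literal>specialUpdate</literal> case is left unchecked, because only the
     server knows whether such an update results in a fresh insert.
    </para>

```python
ACTIONS = ("recordInsert", "recordReplace", "recordDelete", "specialUpdate")


def check_package(fields):
    """Return a list of problems found in an extended-services package (a dict)."""
    problems = []
    if fields.get("type") != "update":
        problems.append("type must be 'update'")
    action = fields.get("action")
    if action not in ACTIONS:
        problems.append("action must be one of: " + ", ".join(ACTIONS))
    number = fields.get("recordIdNumber")
    if number is not None:
        if not (isinstance(number, int) and number > 0):
            problems.append("recordIdNumber must be a positive number")
        if action == "recordInsert":
            problems.append("recordIdNumber is not allowed with recordInsert")
    return problems


package = {
    "type": "update",
    "action": "recordInsert",
    "syntax": "xml",
    "recordIdNumber": 42,
    "databaseName": "Default",
}
print(check_package(package))
```

    <para>
     For the sample package the function reports that
     <literal>recordIdNumber</literal> is not allowed with
     <literal>recordInsert</literal>. Dropping that field, or switching the
     action to <literal>recordReplace</literal>, yields an empty list.
    </para>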
-   </listitem>
-   <listitem>
  
      <para>
  
      <para>
-     When records are processed by the system, they are represented
-     in a tree-structure, constructed by tagged data elements hanging off a
-     root node. The tagged elements may contain data or yet more tagged
-     elements in a recursive structure. The system performs various
-     actions on this tree structure (indexing, element selection, schema
-     mapping, etc.),
-
+     When retrieving existing
+     records indexed with &acro.grs1; indexing filters, the &zebra; internal 
+     ID number is returned in the field
+    <literal>/*/id:idzebra/localnumber</literal> in the namespace
+    <literal>xmlns:id="http://www.indexdata.dk/zebra/"</literal>,
+    where it can be picked up for later record updates or deletes. 
      </para>
      </para>
-   </listitem>
-   <listitem>
-
+ 
      <para>
      <para>
-     Before transmitting records to the client, they are first
-     converted from the internal structure to a form suitable for exchange
-     over the network - according to the Z39.50 standard.
+     A new element set for retrieval of internal record
+     data has been added, which can be used to access minimal records
+     containing only the <literal>recordIdNumber</literal> &zebra;
+     internal ID, or the <literal>recordIdOpaque</literal> string
+     identifier. This works for any indexing filter used.
+     See <xref linkend="special-retrieval"/>.
      </para>
      </para>
-   </listitem>
-
-  </itemizedlist>
-
- </para>
-
- <sect1 id="local-representation">
-  <title>Local Representation</title>
-
-  <para>
-   As mentioned earlier, Zebra places few restrictions on the type of
-   data that you can index and manage. Generally, whatever the form of
-   the data, it is parsed by an input filter specific to that format, and
-   turned into an internal structure that Zebra knows how to handle. This
-   process takes place whenever the record is accessed - for indexing and
-   retrieval.
-  </para>
-
-  <para>
-   The RecordType parameter in the <literal>zebra.cfg</literal> file, or
-   the <literal>-t</literal> option to the indexer tells Zebra how to
-   process input records.
-   Two basic types of processing are available - raw text and structured
-   data. Raw text is just that, and it is selected by providing the
-   argument <emphasis>text</emphasis> to Zebra. Structured records are
-   all handled internally using the basic mechanisms described in the
-   subsequent sections.
-   Zebra can read structured records in many different formats.
-   How this is done is governed by additional parameters after the
-   "grs" keyboard, separated by "." characters.
-  </para>
-
-  <para>
-   Three basic subtypes to the <emphasis>grs</emphasis> type are
-   currently available:
-  </para>
-
-  <para>
-   <variablelist>
-    <varlistentry>
-     <term>grs.sgml</term>
-     <listitem>
-      <para>
-       This is the canonical input format &mdash;
-       described below. It is a simple SGML-like syntax.
-      </para>
-     </listitem>
-    </varlistentry>
-    <varlistentry>
-     <term>grs.regx.<emphasis>filter</emphasis></term>
-     <listitem>
-      <para>
-       This enables a user-supplied input
-       filter. The mechanisms of these filters are described below.
-      </para>
-     </listitem>
-    </varlistentry>
-    <varlistentry>
-     <term>grs.marc.<emphasis>abstract syntax</emphasis></term>
-     <listitem>
-      <para>
-       This allows Zebra to read
-       records in the ISO2709 (MARC) encoding standard. In this case, the
-       last paramemeter <emphasis>abstract syntax</emphasis> names the
-       <literal>.abs</literal> file (see below)
-       which describes the specific MARC structure of the input record as
-       well as the indexing rules.
-      </para>
-     </listitem>
-    </varlistentry>
-   </variablelist>
-  </para>
-
-  <sect2>
-   <title>Canonical Input Format</title>
  
     <para>
  
     <para>
-    Although input data can take any form, it is sometimes useful to
-    describe the record processing capabilities of the system in terms of
-    a single, canonical input format that gives access to the full
-    spectrum of structure and flexibility in the system. In Zebra, this
-    canonical format is an "SGML-like" syntax.
-   </para>
+     The <literal>recordIdOpaque</literal> string parameter
+     is a client-supplied, opaque record
+     identifier, which may be used in
+     insert, update and delete operations. The
+     client software is responsible for assigning these to
+     records. This identifier will
+     replace &zebra;'s own automatic identifier generation with a unique
+     mapping from <literal>recordIdOpaque</literal> to the 
+     &zebra; internal <literal>recordIdNumber</literal>.
+     <emphasis>The opaque <literal>recordIdOpaque</literal> string
+     identifiers
+      are not visible in retrieval records, nor are they
+      searchable, so the value of this parameter is
+      questionable. It serves mostly as a convenient mapping from
+      application domain string identifiers to &zebra; internal IDs.
+     </emphasis> 
+    </para>
+   </sect2>
+
+   
+ <sect2 id="administration-extended-services-yaz-client">
+  <title>Extended services from yaz-client</title>
  
     <para>
  
     <para>
-    To use the canonical format specify <literal>grs.sgml</literal> as
-    the record type.
+    We can now start a yaz-client admin session and create a database:
+   <screen>
+    <![CDATA[
+     $ yaz-client localhost:9999 -u admin/secret
+     Z> adm-create
+     ]]>
+   </screen>
+    Now that the <literal>Default</literal> database has been created,
+    we can insert an &acro.xml; file (esdd0006.grs
+    from example/gils/records) and index it:
+   <screen>  
+    <![CDATA[
+     Z> update insert id1234 esdd0006.grs
+     ]]>
+   </screen>
+    The 3rd parameter - <literal>id1234</literal> here -
+      is the  <literal>recordIdOpaque</literal> package field.
     </para>
     </para>
-
     <para>
     <para>
-    Consider a record describing an information resource (such a record is
-    sometimes known as a <emphasis>locator record</emphasis>).
-    It might contain a field describing the distributor of the
-    information resource, which might in turn be partitioned into
-    various fields providing details about the distributor, like this:
+    Actually, we should have a way to specify "no opaque record id" for
+    yaz-client's update command. We'll fix that.
     </para>
     </para>
-
     <para>
     <para>
-
+    The newly inserted record can be searched as usual:
      <screen>
      <screen>
-     &#60;Distributor&#62;
-     &#60;Name&#62; USGS/WRD &#60;/Name&#62;
-     &#60;Organization&#62; USGS/WRD &#60;/Organization&#62;
-     &#60;Street-Address&#62;
-     U.S. GEOLOGICAL SURVEY, 505 MARQUETTE, NW
-     &#60;/Street-Address&#62;
-     &#60;City&#62; ALBUQUERQUE &#60;/City&#62;
-     &#60;State&#62; NM &#60;/State&#62;
-     &#60;Zip-Code&#62; 87102 &#60;/Zip-Code&#62;
-     &#60;Country&#62; USA &#60;/Country&#62;
-     &#60;Telephone&#62; (505) 766-5560 &#60;/Telephone&#62;
-     &#60;/Distributor&#62;
+    <![CDATA[
+     Z> f utah
+     Sent searchRequest.
+     Received SearchResponse.
+     Search was a success.
+     Number of hits: 1, setno 1
+     SearchResult-1: term=utah cnt=1
+     records returned: 0
+     Elapsed: 0.014179
+     ]]>
      </screen>
      </screen>
-
     </para>
     </para>
-
-   <note>
-   <para>
-    The indentation used above is used to illustrate how Zebra
-     interprets the markup. The indentation, in itself, has no
-     significance to the parser for the canonical input format, which
-     discards superfluous whitespace.
-    </para>
-   </note>
     <para>
     <para>
-    The keywords surrounded by &lt;...&gt; are
-    <emphasis>tags</emphasis>, while the sections of text
-    in between are the <emphasis>data elements</emphasis>.
-    A data element is characterized by its location in the tree
-    that is made up by the nested elements.
-    Each element is terminated by a closing tag - beginning
-    with <literal>&#60;</literal>/, and containing the same symbolic
-    tag-name as the corresponding opening tag.
-    The general closing tag - <literal>&#60;</literal>&gt;/ -
-    terminates the element started by the last opening tag. The
-    structuring of elements is significant.
-    The element <emphasis>Telephone</emphasis>,
-    for instance, may be indexed and presented to the client differently,
-    depending on whether it appears inside the
-    <emphasis>Distributor</emphasis> element, or some other,
-    structured data element such a <emphasis>Supplier</emphasis> element.
-   </para>
-
-   <sect3>
-    <title>Record Root</title>
-
-    <para>
-     The first tag in a record describes the root node of the tree that
-     makes up the total record. In the canonical input format, the root tag
-     should contain the name of the schema that lends context to the
-     elements of the record (see section
-     <xref linkend="internal-representation"/>).
-      The following is a GILS record that
-      contains only a single element (strictly speaking, that makes it an
-      illegal GILS record, since the GILS profile includes several mandatory
-      elements - Zebra does not validate the contents of a record against
-      the Z39.50 profile, however - it merely attempts to match up elements
-      of a local representation with the given schema):
-    </para>
-
-    <para>
-
-     <screen>
-      &#60;gils&#62;
-      &#60;title&#62;Zen and the Art of Motorcycle Maintenance&#60;/title&#62;
-      &#60;/gils&#62;
-     </screen>
-
-    </para>
-
-   </sect3>
-
-   <sect3>
-    <title>Variants</title>
-
-    <para>
-     Zebra allows you to provide individual data elements in a number of
-     <emphasis>variant forms</emphasis>. Examples of variant forms are
-     textual data elements which might appear in different languages, and
-     images which may appear in different formats or layouts.
-     The variant system in Zebra is essentially a representation of
-     the variant mechanism of Z39.50-1995.
-    </para>
-
-    <para>
-     The following is an example of a title element which occurs in two
-     different languages.
-    </para>
-
-    <para>
-
-     <screen>
-      &#60;title&#62;
-      &#60;var lang lang "eng"&#62;
-      Zen and the Art of Motorcycle Maintenance&#60;/&#62;
-      &#60;var lang lang "dan"&#62;
-      Zen og Kunsten at Vedligeholde en Motorcykel&#60;/&#62;
-      &#60;/title&#62;
-     </screen>
-
-    </para>
-
-    <para>
-     The syntax of the <emphasis>variant element</emphasis> is
-     <literal>&lt;var class type value&gt;</literal>.
-     The available values for the <emphasis>class</emphasis> and
-     <emphasis>type</emphasis> fields are given by the variant set
-     that is associated with the current schema
-     (see section <xref linkend="variant-set"/>).
-    </para>
-
-    <para>
-     Variant elements are terminated by the general end-tag &#60;/&#62;, by
-     the variant end-tag &#60;/var&#62;, by the appearance of another variant
-     tag with the same <emphasis>class</emphasis> and
-     <emphasis>value</emphasis> settings, or by the
-     appearance of another, normal tag. In other words, the end-tags for
-     the variants used in the example above could have been saved.
-    </para>
-
-    <para>
-     Variant elements can be nested. The element
-    </para>
-
-    <para>
-
-     <screen>
-      &#60;title&#62;
-      &#60;var lang lang "eng"&#62;&#60;var body iana "text/plain"&#62;
-      Zen and the Art of Motorcycle Maintenance
-      &#60;/title&#62;
-     </screen>
-
-    </para>
-
-    <para>
-     Associates two variant components to the variant list for the title
-     element.
-    </para>
-
-    <para>
-     Given the nesting rules described above, we could write
-    </para>
-
-    <para>
-
-     <screen>
-      &#60;title&#62;
-      &#60;var body iana "text/plain&#62;
-      &#60;var lang lang "eng"&#62;
-      Zen and the Art of Motorcycle Maintenance
-      &#60;var lang lang "dan"&#62;
-      Zen og Kunsten at Vedligeholde en Motorcykel
-      &#60;/title&#62;
+     Let's delete the beast, using the same 
+     <literal>recordIdOpaque</literal> string parameter:
+    <screen>
+    <![CDATA[
+     Z> update delete id1234
+     No last record (update ignored)
+     Z> update delete 1 esdd0006.grs
+     Got extended services response
+     Status: done
+     Elapsed: 0.072441
+     Z> f utah
+     Sent searchRequest.
+     Received SearchResponse.
+     Search was a success.
+     Number of hits: 0, setno 2
+     SearchResult-1: term=utah cnt=0
+     records returned: 0
+     Elapsed: 0.013610
+     ]]>
       </screen>
       </screen>
-
      </para>
      </para>
-
      <para>
      <para>
-     The title element above comes in two variants. Both have the IANA body
-     type "text/plain", but one is in English, and the other in
-     Danish. The client, using the element selection mechanism of Z39.50,
-     can retrieve information about the available variant forms of data
-     elements, or it can select specific variants based on the requirements
-     of the end-user.
-    </para>
-
-   </sect3>
-
-  </sect2>
-
-  <sect2>
-   <title>Input Filters</title>
-
-   <para>
-    In order to handle general input formats, Zebra allows the
-    operator to define filters which read individual records in their
-    native format and produce an internal representation that the system
-    can work with.
-   </para>
-
-   <para>
-    Input filters are ASCII files, generally with the suffix
-    <literal>.flt</literal>.
-    The system looks for the files in the directories given in the
-    <emphasis>profilePath</emphasis> setting in the
-    <literal>zebra.cfg</literal> files.
-    The record type for the filter is
-    <literal>grs.regx.</literal><emphasis>filter-filename</emphasis>
-    (fundamental type <literal>grs</literal>, file read
-    type <literal>regx</literal>, argument
-    <emphasis>filter-filename</emphasis>).
-   </para>
-   
-   <para>
-    Generally, an input filter consists of a sequence of rules, where each
-    rule consists of a sequence of expressions, followed by an action. The
-    expressions are evaluated against the contents of the input record,
-    and the actions normally contribute to the generation of an internal
-    representation of the record.
-   </para>
-   
-   <para>
-    An expression can be either of the following:
-   </para>
-
-   <para>
-    <variablelist>
-
-     <varlistentry>
-      <term>INIT</term>
-      <listitem>
-       <para>
-        The action associated with this expression is evaluated
-        exactly once in the lifetime of the application, before any records
-        are read. It can be used in conjunction with an action that
-        initializes tables or other resources that are used in the processing
-        of input records.
-       </para>
-      </listitem>
-     </varlistentry>
-     <varlistentry>
-      <term>BEGIN</term>
-      <listitem>
-       <para>
-        Matches the beginning of the record. It can be used to
-        initialize variables, etc. Typically, the
-        <emphasis>BEGIN</emphasis> rule is also used
-        to establish the root node of the record.
-       </para>
-      </listitem>
-     </varlistentry>
-     <varlistentry>
-      <term>END</term>
-      <listitem>
-       <para>
-        Matches the end of the record - when all of the contents
-        of the record has been processed.
-       </para>
-      </listitem>
-     </varlistentry>
-     <varlistentry>
-      <term>/pattern/</term>
-      <listitem>
-       <para>
-        Matches a string of characters from the input record.
-       </para>
-      </listitem>
-     </varlistentry>
-     <varlistentry>
-      <term>BODY</term>
-      <listitem>
-       <para>
-        This keyword may only be used between two patterns.
-        It matches everything between (not including) those patterns.
-       </para>
-      </listitem>
-     </varlistentry>
-     <varlistentry>
-      <term>FINISH</term>
-      <listitem>
-       <para>
-        The expression asssociated with this pattern is evaluated
-        once, before the application terminates. It can be used to release
-        system resources - typically ones allocated in the
-        <emphasis>INIT</emphasis> step.
-       </para>
-      </listitem>
-     </varlistentry>
-    </variablelist>
-   </para>
-
-   <para>
-    An action is surrounded by curly braces (&lcub;...&rcub;), and
-    consists of a sequence of statements. Statements may be separated
-    by newlines or semicolons (;).
-    Within actions, the strings that matched the expressions
-    immediately preceding the action can be referred to as
-    &dollar;0, &dollar;1, &dollar;2, etc.
-   </para>
-
-   <para>
-    The available statements are:
-   </para>
-
-   <para>
-    <variablelist>
-
-     <varlistentry>
-      <term>begin <emphasis>type &lsqb;parameter ... &rsqb;</emphasis></term>
-      <listitem>
-       <para>
-        Begin a new
-        data element. The type is one of the following:
-        <variablelist>
-
-         <varlistentry>
-          <term>record</term>
-          <listitem>
-           <para>
-            Begin a new record. The followingparameter should be the
-            name of the schema that describes the structure of the record, eg.
-            <literal>gils</literal> or <literal>wais</literal> (see below).
-            The <literal>begin record</literal> call should precede
-            any other use of the <emphasis>begin</emphasis> statement.
-           </para>
-          </listitem>
-         </varlistentry>
-         <varlistentry>
-          <term>element</term>
-          <listitem>
-           <para>
-            Begin a new tagged element. The parameter is the
-            name of the tag. If the tag is not matched anywhere in the tagsets
-            referenced by the current schema, it is treated as a local string
-            tag.
-           </para>
-          </listitem>
-         </varlistentry>
-         <varlistentry>
-          <term>variant</term>
-          <listitem>
-           <para>
-            Begin a new node in a variant tree. The parameters are
-            <emphasis>class type value</emphasis>.
-           </para>
-          </listitem>
-         </varlistentry>
-        </variablelist>
-       </para>
-      </listitem>
-     </varlistentry>
-     <varlistentry>
-      <term>data</term>
-      <listitem>
-       <para>
-        Create a data element. The concatenated arguments make
-        up the value of the data element.
-        The option <literal>-text</literal> signals that
-        the layout (whitespace) of the data should be retained for
-        transmission.
-        The option <literal>-element</literal>
-        <emphasis>tag</emphasis> wraps the data up in
-        the <emphasis>tag</emphasis>.
-        The use of the <literal>-element</literal> option is equivalent to
-        preceding the command with a <emphasis>begin
-         element</emphasis> command, and following
-        it with the <emphasis>end</emphasis> command.
-       </para>
-      </listitem>
-     </varlistentry>
-     <varlistentry>
-      <term>end <emphasis>&lsqb;type&rsqb;</emphasis></term>
-      <listitem>
-       <para>
-        Close a tagged element. If no parameter is given,
-        the last element on the stack is terminated.
-        The first parameter, if any, is a type name, similar
-        to the <emphasis>begin</emphasis> statement.
-        For the <emphasis>element</emphasis> type, a tag
-        name can be provided to terminate a specific tag.
-       </para>
-      </listitem>
-     </varlistentry>
-    </variablelist>
-   </para>
-
-   <para>
-    The following input filter reads a Usenet news file, producing a
-    record in the WAIS schema. Note that the body of a news posting is
-    separated from the list of headers by a blank line (or rather a
-    sequence of two newline characters.
-   </para>
-
-   <para>
-
+    If shadow register is enabled in your
+    <filename>zebra.cfg</filename>,
+    you must run the adm-commit command
      <screen>
      <screen>
-     BEGIN                { begin record wais }
-
-     /^From:/ BODY /$/    { data -element name $1 }
-     /^Subject:/ BODY /$/ { data -element title $1 }
-     /^Date:/ BODY /$/    { data -element lastModified $1 }
-     /\n\n/ BODY END      {
-         begin element bodyOfDisplay
-         begin variant body iana "text/plain"
-         data -text $1
-         end record
-       }
+    <![CDATA[
+     Z> adm-commit
+     ]]>
      </screen>
      </screen>
-
+     after each update session in order to write your changes from the
+     shadow to the live register space.
     </para>
     </para>
+ </sect2>
  
  
-   <para>
-    If Zebra is compiled with support for Tcl (Tool Command Language)
-    enabled, the statements described above are supplemented with a complete
-    scripting environment, including control structures (conditional
-    expressions and loop constructs), and powerful string manipulation
-    mechanisms for modifying the elements of a record. Tcl is a popular
-    scripting environment, with several tutorials available both online
-    and in hardcopy.
-   </para>
+  
+ <sect2 id="administration-extended-services-yaz-php">
+  <title>Extended services from yaz-php</title>
  
     <para>
  
     <para>
-    <emphasis>NOTE: Tcl support is not currently available, but will be
-     included with one of the next alpha or beta releases.</emphasis>
-   </para>
+    Extended services are also available from the &yaz; &acro.php; client layer. An
+    example of a &yaz;-&acro.php; extended service transaction is given here:
+    <screen>
+    <![CDATA[
+     $record = '<record><title>A fine specimen of a record</title></record>';
+
+     $options = array('action' => 'recordInsert',
+                      'syntax' => 'xml',
+                      'record' => $record,
+                      'databaseName' => 'mydatabase'
+                     );
+
+     yaz_es($yaz, 'update', $options);
+     yaz_es($yaz, 'commit', array());
+     yaz_wait();
+
+     if ($error = yaz_error($yaz))
+       echo "$error";
+     ]]>
+    </screen>  
+    </para>
+    </sect2>
  
  
-   <para>
-    <emphasis>NOTE: Variant support is not currently available in the input
-     filter, but will be included with one of the next alpha or beta
-     releases.</emphasis>
-   </para>
+   <sect2 id="administration-extended-services-debugging">
+    <title>Extended services debugging guide</title>
+    <para>
+     When debugging ES over PHP we recommend the following order of tests:
+    </para>
  
  
-  </sect2>
+    <itemizedlist>
+     <listitem>
+      <para>
+       Make sure you have a valid record on your filesystem, which you can
+       index directly with the zebraidx command.
+       Do it exactly as you planned, using one of the GRS-1 filters,
+       or the DOMXML filter. 
+       When this works, proceed.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Check that your server setup is OK before you write a single
+       line of PHP using ES.
+       Take the same record from the file system and send it as an ES request via
+       <literal>yaz-client</literal> as described in
+       <xref linkend="administration-extended-services-yaz-client"/>,
+       and
+       remember the <literal>-a</literal> option which tells you what
+       goes over the wire! Notice also the section on permissions:
+       try 
+       <screen>
+        perm.anonymous: rw
+       </screen>
+       in <literal>zebra.cfg</literal> to make sure you do not run into
+       permission problems (but never expose such an insecure setup on the
+       internet). Then, make sure to set the general
+       <literal>recordType</literal> instruction, pointing correctly
+       to the GRS-1 filters,
+       or the DOMXML filters.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       If you insist on using the <literal>sysno</literal> in the 
+       <literal>recordIdNumber</literal> setting, 
+       please make sure you do only updates and deletes. Zebra's internal 
+       system number is not allowed for
+       <literal>recordInsert</literal> or 
+       <literal>specialUpdate</literal> actions 
+       which result in fresh record inserts.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       If <literal>shadow register</literal> is enabled in your 
+       <literal>zebra.cfg</literal>, you must remember to run the 
+       <screen>
+        Z> adm-commit
+       </screen>
+       command as well.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       If this works, then proceed to do the same thing in your PHP script.
+      </para>
+     </listitem>
+    </itemizedlist>
  
  
- </sect1>
  
  
- <sect1 id="internal-representation">
-  <title>Internal Representation</title>
+   </sect2>
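The `sysno` rule from the checklist above can be sketched as a client-side pre-flight check. This is an illustration only: the function `check_update_options` is hypothetical and not part of YAZ or Zebra, and the option keys simply mirror the names used in the text (`action`, `recordIdNumber`). A client cannot know in advance whether a `specialUpdate` will end up inserting a fresh record, so that case is reported as a caution rather than a definite error.

```python
# Hypothetical pre-flight check; not part of YAZ or Zebra.
# Actions named in the text as possibly resulting in fresh record inserts.
INSERTING_ACTIONS = {"recordInsert", "specialUpdate"}


def check_update_options(options):
    """Return a list of warning strings for an ES update option dictionary."""
    warnings = []
    action = options.get("action")
    has_sysno = options.get("recordIdNumber") is not None
    if has_sysno and action in INSERTING_ACTIONS:
        if action == "recordInsert":
            warnings.append("recordIdNumber is not allowed with recordInsert")
        else:
            # specialUpdate only fails if it results in a fresh insert.
            warnings.append(
                "recordIdNumber is not allowed if specialUpdate "
                "results in a fresh record insert"
            )
    return warnings


if __name__ == "__main__":
    for message in check_update_options(
        {"action": "recordInsert", "recordIdNumber": 42}
    ):
        print(message)
```

Running such a check before sending the request keeps this class of mistake out of the server round trip while debugging.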
  
  
-  <para>
-   When records are manipulated by the system, they're represented in a
-   tree-structure, with data elements at the leaf nodes, and tags or
-   variant components at the non-leaf nodes. The root-node identifies the
-   schema that lends context to the tagging and structuring of the
-   record. Imagine a simple record, consisting of a 'title' element and
-   an 'author' element:
-  </para>
+ </sect1>
  
  
-  <para>
+</chapter>
  
  
-   <screen>
-    TITLE     "Zen and the Art of Motorcycle Maintenance"
-    ROOT 
-    AUTHOR    "Robert Pirsig"
-   </screen>
-
-  </para>
-
-  <para>
-   A slightly more complex record would have the author element consist
-   of two elements, a surname and a first name:
-  </para>
-
-  <para>
-
-   <screen>
-    TITLE     "Zen and the Art of Motorcycle Maintenance"
-    ROOT  
-    FIRST-NAME "Robert"
-    AUTHOR
-    SURNAME    "Pirsig"
-   </screen>
-
-  </para>
-
-  <para>
-   The root of the record will refer to the record schema that describes
-   the structuring of this particular record. The schema defines the
-   element tags (TITLE, FIRST-NAME, etc.) that may occur in the record, as
-   well as the structuring (SURNAME should appear below AUTHOR, etc.). In
-   addition, the schema establishes element set names that are used by
-   the client to request a subset of the elements of a given record. The
-   schema may also establish rules for converting the record to a
-   different schema, by stating, for each element, a mapping to a
-   different tag path.
-  </para>
-
-  <sect2>
-   <title>Tagged Elements</title>
-
-   <para>
-    A data element is characterized by its tag, and its position in the
-    structure of the record. For instance, while the tag "telephone
-    number" may be used different places in a record, we may need to
-    distinguish between these occurrences, both for searching and
-    presentation purposes. For instance, while the phone numbers for the
-    "customer" and the "service provider" are both
-    representatives for the same type of resource (a telephone number), it
-    is essential that they be kept separate. The record schema provides
-    the structure of the record, and names each data element (defined by
-    the sequence of tags - the tag path - by which the element can be
-    reached from the root of the record).
-   </para>
-
-  </sect2>
-
-  <sect2>
-   <title>Variants</title>
-
-   <para>
-    The children of a tag node may be either more tag nodes, a data node
-    (possibly accompanied by tag nodes),
-    or a tree of variant nodes. The children of  variant nodes are either
-    more variant nodes or a data node (possibly accompanied by more
-    variant nodes). Each leaf node, which is normally a
-    data node, corresponds to a <emphasis>variant form</emphasis> of the
-    tagged element identified by the tag which parents the variant tree.
-    The following title element occurs in two different languages:
-   </para>
-
-   <para>
-
-    <screen>
-     VARIANT LANG=ENG  "War and Peace"
-     TITLE
-     VARIANT LANG=DAN  "Krig og Fred"
-    </screen>
-
-   </para>
-
-   <para>
-    Which of the two elements are transmitted to the client by the server
-    depends on the specifications provided by the client, if any.
-   </para>
-
-   <para>
-    In practice, each variant node is associated with a triple of class,
-    type, value, corresponding to the variant mechanism of Z39.50.
-   </para>
-
-  </sect2>
-
-  <sect2>
-   <title>Data Elements</title>
-
-   <para>
-    Data nodes have no children (they are always leaf nodes in the record
-    tree).
-   </para>
-
-   <note>
-    <para>
-     Documentation needs extension here about types of nodes - numerical,
-     textual, etc., plus the various types of inclusion notes.
-    </para>
-   </note>
-   
-  </sect2>
-
- </sect1>
-
- <sect1 id="data-model">
-  <title>Configuring Your Data Model</title>
-
-  <para>
-   The following sections describe the configuration files that govern
-   the internal management of data records. The system searches for the files
-   in the directories specified by the <emphasis>profilePath</emphasis>
-   setting in the <literal>zebra.cfg</literal> file.
-  </para>
-
-  <sect2>
-   <title>The Abstract Syntax</title>
-
-   <para>
-    The abstract syntax definition (also known as an Abstract Record
-    Structure, or ARS) is the focal point of the
-    record schema description. For a given schema, the ABS file may state any
-    or all of the following:
-   </para>
-
-   <para>
-
-    <itemizedlist>
-     <listitem>
-
-      <para>
-       The object identifier of the Z39.50 schema associated
-       with the ARS, so that it can be referred to by the client.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
-       The attribute set (which can possibly be a compound of multiple
-       sets) which applies in the profile. This is used when indexing and
-       searching the records belonging to the given profile.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
-       The Tag set (again, this can consist of several different sets).
-       This is used when reading the records from a file, to recognize the
-       different tags, and when transmitting the record to the client -
-       mapping the tags to their numerical representation, if they are
-       known.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
-       The variant set which is used in the profile. This provides a
-       vocabulary for specifying the <emphasis>forms</emphasis> of data that appear inside
-       the records.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
-       Element set names, which are a shorthand way for the client to
-       ask for a subset of the data elements contained in a record. Element
-       set names, in the retrieval module, are mapped to <emphasis>element
-        specifications</emphasis>, which contain information equivalent to the
-       <emphasis>Espec-1</emphasis> syntax of Z39.50.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
-       Map tables, which may specify mappings to
-       <emphasis>other</emphasis> database profiles, if desired.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
-       Possibly, a set of rules describing the mapping of elements to a
-       MARC representation.
-
-      </para>
-     </listitem>
-
-     <listitem>      
-      <para>
-       A list of element descriptions (this is the actual ARS of the
-       schema, in Z39.50 terms), which lists the ways in which the various
-       tags can be used and organized hierarchically.
-      </para>
-     </listitem>
-
-    </itemizedlist>
-
-   </para>
-
-   <para>
-    Several of the entries above simply refer to other files, which
-    describe the given objects.
-   </para>
-
-  </sect2>
-
-  <sect2>
-   <title>The Configuration Files</title>
-
-   <para>
-    This section describes the syntax and use of the various tables which
-    are used by the retrieval module.
-   </para>
-
-   <para>
-    The number of different file types may appear daunting at first, but
-    each type corresponds fairly clearly to a single aspect of the Z39.50
-    retrieval facilities. Further, the average database administrator,
-    who is simply reusing an existing profile for which tables already
-    exist, shouldn't have to worry too much about the contents of these tables.
-   </para>
-
-   <para>
-    Generally, the files are simple ASCII files, which can be maintained
-    using any text editor. Blank lines, and lines beginning with a (&num;) are
-    ignored. Any characters on a line followed by a (&num;) are also ignored.
-    All other lines contain <emphasis>directives</emphasis>, which provide
-    some setting or value to the system.
-    Generally, settings are characterized by a single
-    keyword, identifying the setting, followed by a number of parameters.
-    Some settings are repeatable (r), while others may occur only once in a
-    file. Some settings are optional (o), whicle others again are
-    mandatory (m).
-   </para>
-   
-  </sect2>
-  
-  <sect2>
-   <title>The Abstract Syntax (.abs) Files</title>
-   
-   <para>
-    The name of this file type is slightly misleading in Z39.50 terms,
-    since, apart from the actual abstract syntax of the profile, it also
-    includes most of the other definitions that go into a database
-    profile.
-   </para>
-   
-   <para>
-    When a record in the canonical, SGML-like format is read from a file
-    or from the database, the first tag of the file should reference the
-    profile that governs the layout of the record. If the first tag of the
-    record is, say, <literal>&lt;gils&gt;</literal>, the system will look
-    for the profile definition in the file <literal>gils.abs</literal>.
-    Profile definitions are cached, so they only have to be read once
-    during the lifespan of the current process. 
-   </para>
-
-   <para>
-    When writing your own input filters, the
-    <emphasis>record-begin</emphasis> command
-    introduces the profile, and should always be called first thing when
-    introducing a new record.
-   </para>
-   
-   <para>
-    The file may contain the following directives:
-   </para>
-
-   <para>
-    <variablelist>
-
-     <varlistentry>
-      <term>name <emphasis>symbolic-name</emphasis></term>
-      <listitem>
-       <para>
-        (m) This provides a shorthand name or
-        description for the profile. Mostly useful for diagnostic purposes.
-       </para>
-      </listitem>
-     </varlistentry>
-     <varlistentry>
-      <term>reference <emphasis>OID-name</emphasis></term>
-      <listitem>
-       <para>
-        (m) The reference name of the OID for the profile.
-        The reference names can be found in the <emphasis>util</emphasis>
-        module of <emphasis>YAZ</emphasis>.
-       </para>
-      </listitem>
-     </varlistentry>
-     <varlistentry>
-      <term>attset <emphasis>filename</emphasis></term>
-      <listitem>
-       <para>
-        (m) The attribute set that is used for
-        indexing and searching records belonging to this profile.
-       </para>
-      </listitem>
-     </varlistentry>
-     <varlistentry>
-      <term>tagset <emphasis>filename</emphasis></term>
-      <listitem>
-       <para>
-        (o) The tag set (if any) that describes
-        the fields of the records.
-       </para>
-      </listitem>
-     </varlistentry>
-     <varlistentry>
-      <term>varset <emphasis>filename</emphasis></term>
-      <listitem>
-       <para>
-        (o) The variant set used in the profile.
-       </para>
-      </listitem>
-     </varlistentry>
-     <varlistentry>
-      <term>maptab <emphasis>filename</emphasis></term>
-      <listitem>
-       <para>
-        (o,r) This points to a
-        conversion table that might be used if the client asks for the record
-        in a different schema from the native one.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>marc <emphasis>filename</emphasis></term>
-      <listitem>
-       <para>
-        (o) Points to a file containing parameters
-        for representing the record contents in the ISO2709 syntax. Read the
-        description of the MARC representation facility below.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>esetname <emphasis>name filename</emphasis></term>
-      <listitem>
-       <para>
-        (o,r) Associates the
-        given element set name with an element selection file. If an (@) is
-        given in place of the filename, this corresponds to a null mapping for
-        the given element set name.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>any <emphasis>tags</emphasis></term>
-      <listitem>
-       <para>
-        (o) This directive specifies a list of attributes
-        which should be appended to the attribute list given for each
-        element. The effect is to make every single element in the abstract
-        syntax searchable by way of the given attributes. This directive
-        provides an efficient way of supporting free-text searching across all
-        elements. However, it does increase the size of the index
-        significantly. The attributes can be qualified with a structure, as in
-        the <emphasis>elm</emphasis> directive below.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>elm <emphasis>path name attributes</emphasis></term>
-      <listitem>
-       <para>
-        (o,r) Adds an element to the abstract record syntax of the schema.
-        The <emphasis>path</emphasis> follows the
-        syntax which is suggested by the Z39.50 document - that is, a sequence
-        of tags separated by slashes (/). Each tag is given as a
-        comma-separated pair of tag type and tag value surrounded by parentheses.
-        The <emphasis>name</emphasis> is the name of the element, and
-        the <emphasis>attributes</emphasis>
-        specifies which attributes to use when indexing the element in a
-        comma-separated list.
-        A ! in place of the attribute name is equivalent to
-        specifying an attribute name identical to the element name.
-        A - in place of the attribute name
-        specifies that no indexing is to take place for the given element.
-        The attributes can be qualified with <emphasis>field
-         types</emphasis> to specify which
-        character set should govern the indexing procedure for that field.
-        The same data element may be indexed into several different
-        fields, using different character set definitions.
-        See the section <xref linkend="field-structure-and-character-sets"/>.
-         The default field type is "w" for <emphasis>word</emphasis>.
-       </para>
-      </listitem></varlistentry>
-    </variablelist>
-   </para>
-
-   <note>
-   <para>
-     The mechanism for controlling indexing is not adequate for
-     complex databases, and will probably be moved into a separate
-     configuration table eventually.
-    </para>
-   </note>
-   
-   <para>
-    The following is an excerpt from the abstract syntax file for the GILS
-    profile.
-   </para>
-
-   <para>
-
-    <screen>
-     name gils
-     reference GILS-schema
-     attset gils.att
-     tagset gils.tag
-     varset var1.var
-
-     maptab gils-usmarc.map
-
-     # Element set names
-
-     esetname VARIANT gils-variant.est  # for WAIS-compliance
-     esetname B gils-b.est
-     esetname G gils-g.est
-     esetname F @
-
-     elm (1,10)              rank                        -
-     elm (1,12)              url                         -
-     elm (1,14)              localControlNumber     Local-number
-     elm (1,16)              dateOfLastModification Date/time-last-modified
-     elm (2,1)               title                       w:!,p:!
-     elm (4,1)               controlIdentifier      Identifier-standard
-     elm (2,6)               abstract               Abstract
-     elm (4,51)              purpose                     !
-     elm (4,52)              originator                  - 
-     elm (4,53)              accessConstraints           !
-     elm (4,54)              useConstraints              !
-     elm (4,70)              availability                -
-     elm (4,70)/(4,90)       distributor                 -
-     elm (4,70)/(4,90)/(2,7) distributorName             !
-     elm (4,70)/(4,90)/(2,10) distributorOrganization    !
-     elm (4,70)/(4,90)/(4,2) distributorStreetAddress    !
-     elm (4,70)/(4,90)/(4,3) distributorCity             !
-    </screen>
-
-   </para>
-
-  </sect2>
-
-  <sect2 id="attset-files">
-   <title>The Attribute Set (.att) Files</title>
-
-   <para>
-    This file type describes the <emphasis>Use</emphasis> elements of
-    an attribute set. 
-    It contains the following directives. 
-   </para>
-   
-   <para>
-    <variablelist>
-     <varlistentry>
-      <term>name <emphasis>symbolic-name</emphasis></term>
-      <listitem>
-       <para>
-        (m) This provides a shorthand name or
-        description for the attribute set.
-        Mostly useful for diagnostic purposes.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>reference <emphasis>OID-name</emphasis></term>
-      <listitem>
-       <para>
-        (m) The reference name of the OID for
-        the attribute set.
-        The reference names can be found in the <emphasis>util</emphasis>
-        module of <emphasis>YAZ</emphasis>.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>include <emphasis>filename</emphasis></term>
-      <listitem>
-       <para>
-        (o,r) This directive is used to
-        include another attribute set as a part of the current one. This is
-        used when a new attribute set is defined as an extension to another
-        set. For instance, many new attribute sets are defined as extensions
-        to the <emphasis>bib-1</emphasis> set.
-        This is an important feature of the retrieval
-        system of Z39.50, as it ensures the highest possible level of
-        interoperability, as those access points of your database which are
-        derived from the external set (say, bib-1) can be used even by clients
-        who are unaware of the new set.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>att
-       <emphasis>att-value att-name &lsqb;local-value&rsqb;</emphasis></term>
-      <listitem>
-       <para>
-        (o,r) This
-        repeatable directive introduces a new attribute to the set. The
-        attribute value is stored in the index (unless a
-        <emphasis>local-value</emphasis> is
-        given, in which case this is stored). The name is used to refer to the
-        attribute from the <emphasis>abstract syntax</emphasis>. 
-       </para>
-      </listitem></varlistentry>
-    </variablelist>
-   </para>
-
-   <para>
-    This is an excerpt from the GILS attribute set definition.
-    Notice how the file describing the <emphasis>bib-1</emphasis>
-    attribute set is referenced.
-   </para>
-
-   <para>
-
-    <screen>
-     name gils
-     reference GILS-attset
-     include bib1.att
-
-     att 2001          distributorName
-     att 2002          indextermsControlled
-     att 2003          purpose
-     att 2004          accessConstraints
-     att 2005          useConstraints
-    </screen>
-
-   </para>
-
-  </sect2>
-
-  <sect2>
-   <title>The Tag Set (.tag) Files</title>
-
-   <para>
-    This file type defines the tagset of the profile, possibly by
-    referencing other tag sets (most tag sets, for instance, will include
-    tagsetG and tagsetM from the Z39.50 specification). The file may
-    contain the following directives.
-   </para>
-
-   <para>
-    <variablelist>
-
-     <varlistentry>
-      <term>name <emphasis>symbolic-name</emphasis></term>
-      <listitem>
-       <para>
-        (m) This provides a shorthand name or
-        description for the tag set. Mostly useful for diagnostic purposes.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>reference <emphasis>OID-name</emphasis></term>
-      <listitem>
-       <para>
-        (o) The reference name of the OID for the tag set.
-        The reference names can be found in the <emphasis>util</emphasis>
-        module of <emphasis>YAZ</emphasis>.
-        The directive is optional, since not all tag sets
-        are registered outside of their schema.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>type <emphasis>integer</emphasis></term>
-      <listitem>
-       <para>
-        (m) The type number of the tagset within the schema
-        profile (note: this specification really should belong to the .abs
-        file. This will be fixed in a future release).
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>include <emphasis>filename</emphasis></term>
-      <listitem>
-       <para>
-        (o,r) This directive is used
-        to include the definitions of other tag sets into the current one.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>tag <emphasis>number names type</emphasis></term>
-      <listitem>
-       <para>
-        (o,r) Introduces a new tag to the set.
-        The <emphasis>number</emphasis> is the tag number as used
-        in the protocol (there is currently no mechanism for
-        specifying string tags at this point, but this would be quick
-        work to add).
-        The <emphasis>names</emphasis> parameter is a list of names
-        by which the tag should be recognized in the input file format.
-        The names should be separated by slashes (/).
-        The <emphasis>type</emphasis> is the recommended datatype of
-        the tag.
-        It should be one of the following:
-
-        <itemizedlist>
-         <listitem>
-          <para>
-           structured
-          </para>
-         </listitem>
-
-         <listitem>
-          <para>
-           string
-          </para>
-         </listitem>
-
-         <listitem>
-          <para>
-           numeric
-          </para>
-         </listitem>
-
-         <listitem>
-          <para>
-           bool
-          </para>
-         </listitem>
-
-         <listitem>
-          <para>
-           oid
-          </para>
-         </listitem>
-
-         <listitem>
-          <para>
-           generalizedtime
-          </para>
-         </listitem>
-
-         <listitem>
-          <para>
-           intunit
-          </para>
-         </listitem>
-
-         <listitem>
-          <para>
-           int
-          </para>
-         </listitem>
-
-         <listitem>
-          <para>
-           octetstring
-          </para>
-         </listitem>
-
-         <listitem>
-          <para>
-           null
-          </para>
-         </listitem>
-
-        </itemizedlist>
-
-       </para>
-      </listitem></varlistentry>
-    </variablelist>
-   </para>
-
-   <para>
-    The following is an excerpt from the TagsetG definition file.
-   </para>
-
-   <para>
-    <screen>
-     name tagsetg
-     reference TagsetG
-     type 2
-
-     tag       1       title           string
-     tag       2       author          string
-     tag       3       publicationPlace string
-     tag       4       publicationDate string
-     tag       5       documentId      string
-     tag       6       abstract        string
-     tag       7       name            string
-     tag       8       date            generalizedtime
-     tag       9       bodyOfDisplay   string
-     tag       10      organization    string
-    </screen>
-   </para>
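-
-   <para>
-    A locally defined tag set might look roughly like the following
-    sketch. The set name, the type number, the included file name, and
-    the tags themselves are hypothetical. The optional
-    <literal>reference</literal> directive is left out, and the second
-    <literal>tag</literal> line shows two alternative names separated by
-    a slash.
-   </para>
-
-   <para>
-    <screen>
-     name localtags
-     type 3
-     include tagsetg.tag
-
-     tag       1       shelfMark         string
-     tag       2       keeper/custodian  string
-     tag       3       holdings          structured
-    </screen>
-   </para>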
-
-  </sect2>
-
-  <sect2 id="variant-set">
-   <title>The Variant Set (.var) Files</title>
-
-   <para>
-    The variant set file is a straightforward representation of the
-    variant set definitions associated with the protocol. At present, only
-    the <emphasis>Variant-1</emphasis> set is known.
-   </para>
-
-   <para>
-    These are the directives allowed in the file.
-   </para>
-
-   <para>
-    <variablelist>
-
-     <varlistentry>
-      <term>name <emphasis>symbolic-name</emphasis></term>
-      <listitem>
-       <para>
-        (m) This provides a shorthand name or
-        description for the variant set. Mostly useful for diagnostic purposes.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>reference <emphasis>OID-name</emphasis></term>
-      <listitem>
-       <para>
-        (o) The reference name of the OID for
-        the variant set, if one is required. The reference names can be found
-        in the <emphasis>util</emphasis> module of <emphasis>YAZ</emphasis>.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>class <emphasis>integer class-name</emphasis></term>
-      <listitem>
-       <para>
-        (m,r) Introduces a new
-        class to the variant set.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>type <emphasis>integer type-name datatype</emphasis></term>
-      <listitem>
-       <para>
-        (m,r) Adds a
-        new type to the current class (the one introduced by the most recent
-        <emphasis>class</emphasis> directive).
-        The type names belong to the same name space as the one used
-        in the tag set definition file.
-       </para>
-      </listitem></varlistentry>
-    </variablelist>
-   </para>
-
-   <para>
-    The following is an excerpt from the file describing the variant set
-    <emphasis>Variant-1</emphasis>.
-   </para>
-
-   <para>
-
-    <screen>
-     name variant-1
-     reference Variant-1
-
-     class 1 variantId
-
-     type      1       variantId               octetstring
-
-     class 2 body
-
-     type      1       iana                    string
-     type      2       z39.50                  string
-     type      3       other                   string
-    </screen>
-
-   </para>
-
-  </sect2>
-
-  <sect2>
-   <title>The Element Set (.est) Files</title>
-
-   <para>
-    The element set specification files describe a selection of a subset
-    of the elements of a database record. The element selection mechanism
-    is equivalent to the one supplied by the <emphasis>Espec-1</emphasis>
-    syntax of the Z39.50 specification.
-    In fact, the internal representation of an element set
-    specification is identical to the <emphasis>Espec-1</emphasis> structure,
-    and we'll refer you to the description of that structure for most of
-    the detailed semantics of the directives below.
-   </para>
-
-   <note>
-    <para>
-     Not all of the Espec-1 functionality has been implemented yet.
-     The fields that are mentioned below all work as expected, unless
-     otherwise noted.
-    </para>
-   </note>
-   
-   <para>
-    The directives available in the element set file are as follows:
-   </para>
-
-   <para>
-    <variablelist>
-     <varlistentry>
-      <term>defaultVariantSetId <emphasis>OID-name</emphasis></term>
-      <listitem>
-       <para>
-        (o) If variants are used in
-        the following, this should provide the name of the variantset used
-        (it's not currently possible to specify a different set in the
-        individual variant request). In almost all cases (certainly all
-        profiles known to us), the name
-        <literal>Variant-1</literal> should be given here.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>defaultVariantRequest <emphasis>variant-request</emphasis></term>
-      <listitem>
-       <para>
-        (o) This directive
-        provides a default variant request for
-        use when the individual element requests (see below) do not contain a
-        variant request. Variant requests consist of a blank-separated list of
-        variant components. A variant component is a comma-separated,
-        parenthesized triple of variant class, type, and value (the two former
-        values being represented as integers). The value can currently only be
-        entered as a string (this will change to depend on the definition of
-        the variant in question). The special value (@) is interpreted as a
-        null value, however.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>simpleElement
-       <emphasis>path &lsqb;'variant' variant-request&rsqb;</emphasis></term>
-      <listitem>
-       <para>
-        (o,r) This corresponds to a simple element request
-        in <emphasis>Espec-1</emphasis>.
-        The path consists of a sequence of tag-selectors, where each of
-        these can consist of either:
-       </para>
-
-       <para>
-        <itemizedlist>
-         <listitem>
-          <para>
-           A simple tag, consisting of a comma-separated type-value pair in
-           parentheses, possibly followed by a colon (:) followed by an
-           occurrences-specification (see below). The tag-value can be a number
-           or a string. If the first character is an apostrophe ('), this
-           forces the value to be interpreted as a string, even if it
-           appears to be numerical.
-          </para>
-         </listitem>
-
-         <listitem>
-          <para>
-           A WildThing, represented as a question mark (?), possibly
-           followed by a colon (:) followed by an occurrences
-           specification (see below).
-          </para>
-         </listitem>
-
-         <listitem>
-          <para>
-           A WildPath, represented as an asterisk (*). Note that the last
-           element of the path should not be a WildPath (WildPaths don't
-           work in this version).
-          </para>
-         </listitem>
-
-        </itemizedlist>
-
-       </para>
-
-       <para>
-        The occurrences-specification can be either the string
-        <literal>all</literal>, the string <literal>last</literal>, or
-        an explicit value-range. The value-range is represented as
-        an integer (the starting point), possibly followed by a
-        plus (+) and a second integer (the number of elements, default
-        being one).
-       </para>
-
-       <para>
-        The variant-request has the same syntax as the defaultVariantRequest
-        above. Note that it may sometimes be useful to give an empty variant
-        request, simply to disable the default for a specific set of fields
-        (we aren't certain if this is proper <emphasis>Espec-1</emphasis>,
-        but it works in this implementation).
-       </para>
-      </listitem></varlistentry>
-    </variablelist>
-   </para>
-
-   <para>
-    The following is an example of an element specification belonging to
-    the GILS profile.
-   </para>
-
-   <para>
-
-    <screen>
-     simpleelement (1,10)
-     simpleelement (1,12)
-     simpleelement (2,1)
-     simpleelement (1,14)
-     simpleelement (4,1)
-     simpleelement (4,52)
-    </screen>
-
-   </para>
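-
-   <para>
-    The optional directives and the occurrences-specification can be
-    combined roughly as in the sketch below. It is illustrative only:
-    the tag paths are borrowed from the GILS excerpts in this chapter,
-    the variant triples are hypothetical, and the placement of the
-    <literal>variant</literal> keyword is assumed from the
-    <literal>simpleElement</literal> syntax given above.
-   </para>
-
-   <para>
-    <screen>
-     defaultVariantSetId Variant-1
-     defaultVariantRequest (1,1,@)
-
-     simpleelement (2,1)
-     simpleelement (4,70):all
-     simpleelement (2,6):1+2
-     simpleelement (4,1) variant (2,1,@)
-    </screen>
-   </para>
-
-   <para>
-    Here <literal>:all</literal> requests every occurrence of the element,
-    and <literal>:1+2</literal> requests two occurrences starting at the
-    first.
-   </para>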
-
-  </sect2>
-
-  <sect2 id="schema-mapping">
-   <title>The Schema Mapping (.map) Files</title>
-
-   <para>
-    Sometimes, the client might want to receive a database record in
-    a schema that differs from the native schema of the record. For
-    instance, a client might only know how to process WAIS records, while
-    the database record is represented in a more specific schema, such as
-    GILS. In this module, a mapping of data to one of the MARC formats is
-    also thought of as a schema mapping (mapping the elements of the
-    record into fields consistent with the given MARC specification, prior
-    to actually converting the data to the ISO2709). This use of the
-    object identifier for USMARC as a schema identifier represents an
-    overloading of the OID which might not be entirely proper. However,
-    it represents the dual role of schema and record syntax which
-    is assumed by the MARC family in Z39.50.
-   </para>
-
-   <para>
-    <emphasis>NOTE: The schema-mapping functions are so far limited to a
-     straightforward mapping of elements. This should be extended with
-     mechanisms for conversions of the element contents, and conditional
-     mappings of elements based on the record contents.</emphasis>
-   </para>
-
-   <para>
-    These are the directives of the schema mapping file format:
-   </para>
-
-   <para>
-    <variablelist>
-
-     <varlistentry>
-      <term>targetName <emphasis>name</emphasis></term>
-      <listitem>
-       <para>
-        (m) A symbolic name for the target schema
-        of the table. Useful mostly for diagnostic purposes.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>targetRef <emphasis>OID-name</emphasis></term>
-      <listitem>
-       <para>
-        (m) An OID name for the target schema.
-        This is used, for instance, by a server receiving a request to present
-        a record in a different schema from the native one.
-        The name, again, is found in the <emphasis>oid</emphasis>
-        module of <emphasis>YAZ</emphasis>.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>map <emphasis>element-name target-path</emphasis></term>
-      <listitem>
-       <para>
-        (o,r) Adds
-        an element mapping rule to the table.
-       </para>
-      </listitem></varlistentry>
-    </variablelist>
-   </para>
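-
-   <para>
-    A mapping table might be laid out as in the following sketch. Note
-    the assumptions: the OID name <literal>USmarc</literal>, the element
-    names, and the target paths are placeholders, and the target path is
-    assumed to use the same tag path notation as the
-    <literal>elm</literal> directive of the .abs files. Consult the
-    mapping files shipped with the distribution for the actual syntax.
-   </para>
-
-   <para>
-    <screen>
-     targetName usmarc
-     targetRef USmarc
-
-     map title       (3,245)/(3,a)
-     map abstract    (3,520)/(3,a)
-    </screen>
-   </para>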
-
-  </sect2>
-
-  <sect2>
-   <title>The MARC (ISO2709) Representation (.mar) Files</title>
-
-   <para>
-    This file provides rules for representing a record in the ISO2709
-    format. The rules pertain mostly to the values of the constant-length
-    header of the record.
-   </para>
-
-   <para>
-    <emphasis>NOTE: This will be described better. We're in the process of
-     re-evaluating and most likely changing the way that MARC records are
-     handled by the system.</emphasis>
-   </para>
-
-  </sect2>
-
-  <sect2 id="field-structure-and-character-sets">
-   <title>Field Structure and Character Sets
-   </title>
-
-   <para>
-    In order to provide a flexible approach to national character set
-    handling, Zebra allows the administrator to configure the
-    system to handle any 8-bit character set &mdash; including sets that
-    require multi-octet diacritics or other multi-octet characters. The
-    definition of a character set includes a specification of the
-    permissible values, their sort order (this affects the display in the
-    SCAN function), and relationships between upper- and lowercase
-    characters. Finally, the definition includes the specification of
-    space characters for the set.
-   </para>
-
-   <para>
-    The operator can define different character sets for different fields,
-    typical examples being standard text fields, numerical fields, and
-    special-purpose fields such as WWW-style linkages (URx).
-   </para>
-
-   <para>
-    The field types, and hence character sets, are associated with data
-    elements by the .abs files (see above).
-    The file <literal>default.idx</literal>
-    provides the association between field type codes (as used in the .abs
-    files) and the character map files (with the .chr suffix). The format
-    of the .idx file is as follows:
-   </para>
-
-   <para>
-    <variablelist>
-
-     <varlistentry>
-      <term>index <emphasis>field type code</emphasis></term>
-      <listitem>
-       <para>
-        This directive introduces a new search index code.
-        The argument is a one-character code to be used in the
-        .abs files to select this particular index type. An index, roughly,
-        corresponds to a particular structure attribute during search. Refer
-        to section <xref linkend="search"/>.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>sort <emphasis>field code type</emphasis></term>
-      <listitem>
-       <para>
-        This directive introduces a 
-        sort index. The argument is a one-character code to be used in the
-        .abs file to select this particular index type. The corresponding
-        use attribute must be used in the sort request to refer to this
-        particular sort index. The corresponding character map (see below)
-        is used in the sort process.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>completeness <emphasis>boolean</emphasis></term>
-      <listitem>
-       <para>
-        This directive enables or disables complete field indexing.
-        The value of the <emphasis>boolean</emphasis> should be 0
-        (disable) or 1. If completeness is enabled, the index entry will
-        contain the complete contents of the field (up to a limit), with words
-        (non-space characters) separated by single space characters
-        (normalized to " " on display). When completeness is
-        disabled, each word is indexed as a separate entry. Complete subfield
-        indexing is most useful for fields which are typically browsed (eg.
-        titles, authors, or subjects), or instances where a match on a
-        complete subfield is essential (eg. exact title searching). For fields
-        where completeness is disabled, the search engine will interpret a
-        search containing space characters as a word proximity search.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>charmap <emphasis>filename</emphasis></term>
-      <listitem>
-       <para>
-        This is the filename of the character
-        map to be used for this field type.
-       </para>
-      </listitem></varlistentry>
-    </variablelist>
-   </para>
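-
-   <para>
-    As a sketch, an .idx file defining two search indexes and one sort
-    index could look as follows. The codes <literal>w</literal> and
-    <literal>p</literal> are the ones used in the GILS .abs excerpt
-    above; the sort code <literal>s</literal> and the file name
-    <literal>string.chr</literal> are assumptions made for illustration.
-    The sketch also assumes that <literal>completeness</literal> and
-    <literal>charmap</literal> apply to the most recently introduced
-    <literal>index</literal> or <literal>sort</literal> entry.
-   </para>
-
-   <para>
-    <screen>
-     # Word index: each word is a separate entry
-     index w
-     completeness 0
-     charmap string.chr
-
-     # Phrase index: the complete field is one entry
-     index p
-     completeness 1
-     charmap string.chr
-
-     # Sort index
-     sort s
-     completeness 1
-     charmap string.chr
-    </screen>
-   </para>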
-
-   <para>
-    The contents of the character map files are structured as follows:
-   </para>
-
-   <para>
-    <variablelist>
-
-     <varlistentry>
-      <term>lowercase <emphasis>value-set</emphasis></term>
-      <listitem>
-       <para>
-        This directive introduces the basic value set of the field type.
-        The format is an ordered list (without spaces) of the
-        characters which may occur in "words" of the given type.
-        The order of the entries in the list determines the
-        sort order of the index. In addition to single characters, the
-        following combinations are legal:
-       </para>
-
-       <para>
-
-        <itemizedlist>
-         <listitem>
-          <para>
-           Backslashes may be used to introduce three-digit octal, or
-           two-digit hex representations of single characters
-           (preceded by <literal>x</literal>).
-           In addition, the combinations
-           \\, \\r, \\n, \\t, \\s (space &mdash; remember that real
-           space-characters may not occur in the value definition), and
-           \\ are recognised, with their usual interpretation.
-          </para>
-         </listitem>
-
-         <listitem>
-          <para>
-           Curly braces &lcub;&rcub; may be used to enclose ranges of single
-           characters (possibly using the escape convention described in the
-           preceding point), eg. &lcub;a-z&rcub; to introduce the
-           standard range of ASCII characters.
-           Note that the interpretation of such a range depends on
-           the concrete representation in your local, physical character set.
-          </para>
-         </listitem>
-
-         <listitem>
-          <para>
-           Parentheses () may be used to enclose multi-byte characters -
-           eg. diacritics or special national combinations (eg. Spanish
-           "ll"). When found in the input stream (or a search term),
-           these characters are viewed and sorted as a single character, with a
-           sorting value depending on the position of the group in the value
-           statement.
-          </para>
-         </listitem>
-
-        </itemizedlist>
-
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>uppercase <emphasis>value-set</emphasis></term>
-      <listitem>
-       <para>
-        This directive introduces the
-        upper-case equivalents to the value set (if any). The number and
-        order of the entries in the list should be the same as in the
-        <literal>lowercase</literal> directive.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>space <emphasis>value-set</emphasis></term>
-      <listitem>
-       <para>
-        This directive introduces the characters
-        which separate words in the input stream. Depending on the
-        completeness mode of the field in question, these characters either
-        terminate an index entry, or delimit individual "words" in
-        the input stream. The order of the elements is not significant &mdash;
-        otherwise the representation is the same as for the
-        <literal>uppercase</literal> and <literal>lowercase</literal>
-        directives.
-       </para>
-      </listitem></varlistentry>
-     <varlistentry>
-      <term>map <emphasis>value-set</emphasis>
-       <emphasis>target</emphasis></term>
-      <listitem>
-       <para>
-        This directive introduces a
-        mapping between each of the members of the value-set on the left to
-        the character on the right. The character on the right must occur in
-        the value set (the <literal>lowercase</literal> directive) of
-        the character set, but
-        it may be a parenthesis-enclosed multi-octet character. This directive
-        may be used to map diacritics to their base characters, or to map
-        HTML-style character-representations to their natural form, etc.
-       </para>
-      </listitem></varlistentry>
-    </variablelist>
-   </para>
-
-  </sect2>
-
- </sect1>
-
- <sect1 id="formats">
-  <title>Exchange Formats</title>
-
-  <para>
-   Converting records from the internal structure to an exchange format
-   is largely an automatic process. Currently, the following exchange
-   formats are supported:
-  </para>
-
-  <para>
-   <itemizedlist>
-    <listitem>
-     <para>
-      GRS-1. The internal representation is based on GRS-1, so the
-      conversion here is straightforward. The system will create
-      applied variant and supported variant lists as required, if a record
-      contains variant information.
-     </para>
-    </listitem>
-
-    <listitem>
-     <para>
-      SUTRS. Again, the mapping is fairly straightforward. Indentation
-      is used to show the hierarchical structure of the record. All
-      "GRS" type records support both the GRS-1 and SUTRS
-      representations.
-     </para>
-    </listitem>
-
-    <listitem>
-     <para>
-      ISO2709-based formats (USMARC, etc.). Only records with a
-      two-level structure (corresponding to fields and subfields) can be
-      directly mapped to ISO2709. For records with a different structuring
-      (e.g., GILS), the representation in a structure like USMARC involves a
-      schema-mapping (see section <xref linkend="schema-mapping"/>), to an
-      "implied" USMARC schema (implied,
-      because there is no formal schema which specifies the use of the
-      USMARC fields outside of ISO2709). The resultant, two-level record is
-      then mapped directly from the internal representation to ISO2709. See
-      the GILS schema definition files for a detailed example of this
-      approach.
-     </para>
-    </listitem>
-
-    <listitem>
-     <para>
-      Explain. This representation is only available for records
-      belonging to the Explain schema.
-     </para>
-    </listitem>
-
-    <listitem>
-     <para>
-      Summary. This ASN.1-based structure is only available for records
-      belonging to the Summary schema, or schemas which provide a mapping
-      to this schema (see the description of the schema mapping facility
-      above).
-     </para>
-    </listitem>
-
-    <listitem>
-     <para>
-      SOIF. Support for this syntax is experimental, and is currently
-      keyed to a private Index Data OID (1.2.840.10003.5.1000.81.2). All
-      abstract syntaxes can be mapped to the SOIF format, although nested
-      elements are represented by concatenation of the tag names at each
-      level.
-     </para>
-    </listitem>
-
-   </itemizedlist>
-
-  </para>
-
- </sect1>
-
-</chapter>
   <!-- Keep this comment at the end of the file
   Local variables:
   mode: sgml
   <!-- Keep this comment at the end of the file
   Local variables:
   mode: sgml