- <para>
- The default <literal>rank-1</literal> ranking module implements a
- TF/IDF (Term Frequecy over Inverse Document Frequency) like
- algorithm. In contrast to the usual defintion of TF/IDF
- algorithms, which only considers searching in one full-text
- index, this one works on multiple indexes at the same time.
- More precisely,
- &zebra; does boolean queries and searches in specific addressed
- indexes (there are inverted indexes pointing from terms in the
- dictionary to documents and term positions inside documents).
- It works like this:
- <variablelist>
- <varlistentry>
- <term>Query Components</term>
- <listitem>
- <para>
- First, the boolean query is dismantled into it's principal components,
- i.e. atomic queries where one term is looked up in one index.
- For example, the query
- <screen>
- @attr 2=102 @and @attr 1=1010 Utah @attr 1=1018 Springer
- </screen>
- is a boolean AND between the atomic parts
- <screen>
- @attr 2=102 @attr 1=1010 Utah
- </screen>
- and
- <screen>
- @attr 2=102 @attr 1=1018 Springer
- </screen>
- which gets processed each for itself.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Atomic hit lists</term>
- <listitem>
- <para>
- Second, for each atomic query, the hit list of documents is
- computed.
- </para>
- <para>
- In this example, two hit lists for each index
- <literal>@attr 1=1010</literal> and
- <literal>@attr 1=1018</literal> are computed.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Atomic scores</term>
- <listitem>
- <para>
- Third, each document in the hit list is assigned a score (_if_ ranking
- is enabled and requested in the query) using a TF/IDF scheme.
- </para>
- <para>
- In this example, both atomic parts of the query assign the magic
- <literal>@attr 2=102</literal> relevance attribute, and are
- to be used in the relevance ranking functions.
- </para>
- <para>
- It is possible to apply dynamic ranking on only parts of the
- PQF query:
- <screen>
- @and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer
- </screen>
- searches for all documents which have the term 'Utah' on the
- body of text, and which have the term 'Springer' in the publisher
- field, and sort them in the order of the relevance ranking made on
- the body-of-text index only.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Hit list merging</term>
- <listitem>
- <para>
- Fourth, the atomic hit lists are merged according to the boolean
- conditions to a final hit list of documents to be returned.
- </para>
- <para>
- This step is always performed, independently of the fact that
- dynamic ranking is enabled or not.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Document score computation</term>
- <listitem>
- <para>
- Fifth, the total score of a document is computed as a linear
- combination of the atomic scores of the atomic hit lists
- </para>
- <para>
- Ranking weights may be used to pass a value to a ranking
- algorithm, using the non-standard BIB-1 attribute type 9.
- This allows one branch of a query to use one value while
- another branch uses a different one. For example, we can search
- for <literal>utah</literal> in the
- <literal>@attr 1=4</literal> index with weight 30, as
- well as in the <literal>@attr 1=1010</literal> index with weight 20:
- <screen>
- @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 @attr 1=1010 city
- </screen>
- </para>
- <para>
- The default weight is
- sqrt(1000) ~ 34 , as the Z39.50 standard prescribes that the top score
- is 1000 and the bottom score is 0, encoded in integers.
- </para>
- <warning>
- <para>
- The ranking-weight feature is experimental. It may change in future
- releases of zebra.
- </para>
- </warning>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Re-sorting of hit list</term>
- <listitem>
- <para>
- Finally, the final hit list is re-ordered according to scores.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
-
-
-<!--
-Still need to describe the exact TF/IDF formula. Here's the info, need -->
-<!--to extract it in human readable form .. MC
-
-static int calc (void *set_handle, zint sysno, zint staticrank,
- int *stop_flag)
-{
- int i, lo, divisor, score = 0;
- struct rank_set_info *si = (struct rank_set_info *) set_handle;
-
- if (!si->no_rank_entries)
- return -1; /* ranking not enabled for any terms */
-
- for (i = 0; i < si->no_entries; i++)
- {
- yaz_log(log_level, "calc: i=%d rank_flag=%d lo=%d",
- i, si->entries[i].rank_flag, si->entries[i].local_occur);
- if (si->entries[i].rank_flag && (lo = si->entries[i].local_occur))
- score += (8+log2_int (lo)) * si->entries[i].global_inv *
- si->entries[i].rank_weight;
- }
- divisor = si->no_rank_entries * (8+log2_int (si->last_pos/si->no_entries));
- score = score / divisor;
- yaz_log(log_level, "calc sysno=" ZINT_FORMAT " score=%d", sysno, score);
- if (score > 1000)
- score = 1000;
- /* reset the counts for the next term */
- for (i = 0; i < si->no_entries; i++)
- si->entries[i].local_occur = 0;
- return score;
-}
-
-
-where lo = si->entries[i].local_occur is the local documents term-within-index frequency, si->entries[i].global_inv represents the IDF part (computed in static void *begin()), and
-si->entries[i].rank_weight is the weight assigner per index (default 34, or set in the @attr 9=xyz magic)
-
-Finally, the IDF part is computed as:
-
-static void *begin (struct zebra_register *reg,
- void *class_handle, RSET rset, NMEM nmem,
- TERMID *terms, int numterms)
-{
- struct rank_set_info *si =
- (struct rank_set_info *) nmem_malloc (nmem,sizeof(*si));
- int i;
-
- yaz_log(log_level, "rank-1 begin");
- si->no_entries = numterms;
- si->no_rank_entries = 0;
- si->nmem=nmem;
- si->entries = (struct rank_term_info *)
- nmem_malloc (si->nmem, sizeof(*si->entries)*numterms);
- for (i = 0; i < numterms; i++)
- {
- zint g = rset_count(terms[i]->rset);
- yaz_log(log_level, "i=%d flags=%s '%s'", i,
- terms[i]->flags, terms[i]->name );
- if (!strncmp (terms[i]->flags, "rank,", 5))
- {
- const char *cp = strstr(terms[i]->flags+4, ",w=");
- si->entries[i].rank_flag = 1;
- if (cp)
- si->entries[i].rank_weight = atoi (cp+3);
- else
- si->entries[i].rank_weight = 34; /* sqrroot of 1000 */
- yaz_log(log_level, " i=%d weight=%d g="ZINT_FORMAT, i,
- si->entries[i].rank_weight, g);
- (si->no_rank_entries)++;
- }
- else
- si->entries[i].rank_flag = 0;
- si->entries[i].local_occur = 0; /* FIXME */
- si->entries[i].global_occur = g;
- si->entries[i].global_inv = 32 - log2_int (g);
- yaz_log(log_level, " global_inv = %d g = " ZINT_FORMAT,
- (int) (32-log2_int (g)), g);
- si->entries[i].term = terms[i];
- si->entries[i].term_index=i;
- terms[i]->rankpriv = &(si->entries[i]);
- }
- return si;
-}
-
-
-where g = rset_count(terms[i]->rset) is the count of all documents in this specific index hit list, and the IDF part then is
-
- si->entries[i].global_inv = 32 - log2_int (g);
- -->
+ <para>
+ The &zebra; server supports <emphasis>updating</emphasis> of the index
+ structures. That is, you can add, modify, or remove records from
+ databases managed by &zebra; without rebuilding the entire index.
+ Since this process involves modifying structured files with
+ cross-references between blocks of data, the update process is
+ inherently sensitive to system crashes and process interruptions:
+ anything but a successfully completed update will leave the
+ register files in an unknown state, and you will essentially have
+ no recourse but to re-index everything or to restore the register
+ files from a backup medium.
+ Further, while the update process is active, users cannot be
+ allowed to access the system, as the contents of the register files
+ may change unpredictably.
+ </para>
+
+ <para>
+ You can solve these problems by enabling the shadow register system in
+ &zebra;.
+ During the updating procedure, <literal>zebraidx</literal> will temporarily
+ write changes to the involved files in a set of "shadow
+ files", without modifying the files that are accessed by the
+ active server processes. If the update procedure is interrupted by a
+ system crash or a signal, you simply repeat the procedure - the
+ register files have not been changed or damaged, and the partially
+ written shadow files are automatically deleted before the new updating
+ procedure commences.
+ </para>
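+
+ <para>
+ For example, if an update run is interrupted, you simply issue
+ the same command again (the record path shown here is
+ illustrative):
+ <screen>
+ $ zebraidx update /d1/records     (interrupted by a crash)
+ $ zebraidx update /d1/records     (just run it again)
+ </screen>
+ </para>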
+
+ <para>
+ At the end of the updating procedure (or in a separate operation, if
+ you so desire), the system enters a "commit mode". First,
+ any active server processes are forced to access those blocks that
+ have been changed from the shadow files rather than from the main
+ register files; the unmodified blocks are still accessed at their
+ normal location (the shadow files are not a complete copy of the
+ register files - they only contain those parts that have actually been
+ modified). If the commit process is interrupted at any point, the
+ server processes will continue to access the shadow files until you
+ can repeat the commit procedure and complete the writing of data to
+ the main register files. You can perform
+ multiple update operations to the registers before you commit the
+ changes to the system files, or you can execute the commit operation
+ at the end of each update operation. When the commit phase has
+ completed successfully, any running server processes are instructed to
+ switch their operations to the new, operational register, and the
+ temporary shadow files are deleted.
+ </para>
+
+ </sect2>
+
+ <sect2 id="shadow-registers-how-to-use">
+ <title>How to Use Shadow Register Files</title>
+
+ <para>
+ The first step is to allocate space on your system for the shadow
+ files.
+ You do this by adding a <literal>shadow</literal> entry to the
+ <literal>zebra.cfg</literal> file.
+ The syntax of the <literal>shadow</literal> entry is exactly the
+ same as for the <literal>register</literal> entry
+ (see <xref linkend="register-location"/>).
+ The location of the shadow area should be
+ <emphasis>different</emphasis> from the location of the main register
+ area (if you have specified one - remember that if you provide no
+ <literal>register</literal> setting, the default register area is the
+ working directory of the server and indexing processes).
+ </para>
+
+ <para>
+ The following excerpt from a <literal>zebra.cfg</literal> file shows
+ one example of a setup that configures both the main register
+ location and the shadow file area.
+ Note that two directories or partitions have been set aside
+ for the shadow file area. You can specify any number of directories
+ for each of the file areas, but remember that there should be no
+ overlaps between the directories used for the main registers and the
+ shadow files, respectively.
+ </para>
+ <para>
+
+ <screen>
+ register: /d1:500M
+ shadow: /scratch1:100M /scratch2:200M
+ </screen>
+
+ </para>
+
+ <para>
+ When shadow files are enabled, an extra command is available at the
+ <literal>zebraidx</literal> command line.
+ In order to make changes to the system take effect for the
+ users, you'll have to submit a "commit" command after an
+ update operation (or a sequence of update operations).
+ </para>
+
+ <para>
+
+ <screen>
+ $ zebraidx update /d1/records
+ $ zebraidx commit
+ </screen>
+
+ </para>
+
+ <para>
+ Or you can execute multiple updates before committing the changes:
+ </para>
+
+ <para>
+
+ <screen>
+ $ zebraidx -g books update /d1/records /d2/more-records
+ $ zebraidx -g fun update /d3/fun-records
+ $ zebraidx commit
+ </screen>
+
+ </para>
+
+ <para>
+ If one of the update operations above had been interrupted, the commit
+ operation on the last line would fail: <literal>zebraidx</literal>
+ will not let you commit changes that would destroy the running register.
+ You'll have to rerun all of the update operations since your last
+ commit operation, before you can commit the new changes.
+ </para>
+
+ <para>
+ Similarly, if the commit operation fails, <literal>zebraidx</literal>
+ will not let you start a new update operation before you have
+ successfully repeated the commit operation.
+ The server processes will keep accessing the shadow files rather
+ than the (possibly damaged) blocks of the main register files
+ until the commit operation has successfully completed.
+ </para>
+
+ <para>
+ You should be aware that update operations may take slightly longer
+ when the shadow register system is enabled, since more file access
+ operations are involved. Further, while the disk space required for
+ the shadow register data is modest for a small update operation, you
+ may prefer to disable the system if you are adding a very large number
+ of records to an already very large database (we use the terms
+ <emphasis>large</emphasis> and <emphasis>modest</emphasis>
+ very loosely here, since every application will have a
+ different perception of size).
+ To update the system without the use of the shadow files,
+ simply run <literal>zebraidx</literal> with the <literal>-n</literal>
+ option (note that you do not have to execute the
+ <emphasis>commit</emphasis> command of <literal>zebraidx</literal>
+ when you temporarily disable the use of the shadow registers in
+ this fashion).
+ Note also that, just as when the shadow registers are not enabled,
+ server processes will be barred from accessing the main register
+ while the update procedure takes place.
+ </para>
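+
+ <para>
+ For example, to bypass the shadow registers for a single large
+ load (the record path is illustrative):
+ <screen>
+ $ zebraidx -n update /d1/many-records
+ </screen>
+ No <literal>commit</literal> operation is needed afterwards.
+ </para>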
+
+ </sect2>
+
+ </sect1>
+
+
+ <sect1 id="administration-ranking">
+ <title>Relevance Ranking and Sorting of Result Sets</title>
+
+ <sect2 id="administration-overview">
+ <title>Overview</title>
+ <para>
+ The default ordering of a result set is left up to the server,
+ which inside &zebra; means sorting in ascending document ID order.
+ This is rarely the order in which a human wants to browse a
+ sometimes quite large hit set. Ranking and sorting come to the
+ rescue.
+ </para>
+
+ <para>
+ In cases where a good presentation ordering can be computed at
+ indexing time, we can use a fixed <literal>static ranking</literal>
+ scheme, which is provided for the <literal>alvis</literal>
+ indexing filter. This defines a fixed ordering of hit lists,
+ independently of the query issued.
+ </para>
+
+ <para>
+ There are cases, however, where the relevance of the documents in
+ a hit set depends strongly on the query being processed.
+ Simply put, <literal>dynamic relevance ranking</literal>
+ sorts a set of retrieved records such that those most likely to be
+ relevant to your request are retrieved first.
+ Internally, &zebra; retrieves all documents that satisfy your
+ query, and re-orders the hit list to arrange them based on
+ a measurement of similarity between your query and the content of
+ each record.
+ </para>
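+
+ <para>
+ In PQF queries, dynamic ranking is requested with the relevance
+ attribute <literal>@attr 2=102</literal>; for example (the index
+ and search term here are illustrative):
+ <screen>
+ @attr 2=102 @attr 1=4 utah
+ </screen>
+ </para>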
+
+ <para>
+ Finally, there are situations where hit sets of documents should be
+ <literal>sorted</literal> during query time according to the
+ lexicographical ordering of certain sort indexes created at
+ indexing time.
+ </para>
+ </sect2>