X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Fadministration.xml;h=b95db6619112ccf3c6a809d3822e27eec6e4b30b;hp=beba1663bf6b30735b2b021cecfb45a5ccace814;hb=HEAD;hpb=37dc985516f52f34fc8434cc8beb982bb0c8988f diff --git a/doc/administration.xml b/doc/administration.xml index beba166..b95db66 100644 --- a/doc/administration.xml +++ b/doc/administration.xml @@ -1,1452 +1,1475 @@ - - - Administrating Zebra - + + Administrating &zebra; + - - Unlike many simpler retrieval systems, Zebra supports safe, incremental - updates to an existing index. - - - - Normally, when Zebra modifies the index it reads a number of records - that you specify. - Depending on your specifications and on the contents of each record - one the following events take place for each record: - - - - Insert - - - The record is indexed as if it never occurred before. - Either the Zebra system doesn't know how to identify the record or - Zebra can identify the record but didn't find it to be already indexed. - - - - - Modify - - - The record has already been indexed. - In this case either the contents of the record or the location - (file) of the record indicates that it has been indexed before. - - - - - Delete - - - The record is deleted from the index. As in the - update-case it must be able to identify the record. - - - - - - - - Please note that in both the modify- and delete- case the Zebra - indexer must be able to generate a unique key that identifies the record - in question (more on this below). - - - - To administrate the Zebra retrieval system, you run the - zebraidx program. - This program supports a number of options which are preceded by a dash, - and a few commands (not preceded by dash). - - - - Both the Zebra administrative tool and the Z39.50 server share a - set of index files and a global configuration file. - The name of the configuration file defaults to - zebra.cfg. 
- The configuration file includes specifications on how to index - various kinds of records and where the other configuration files - are located. zebrasrv and zebraidx - must be run in the directory where the - configuration file lives unless you indicate the location of the - configuration file by option -c. - - - - Record Types - - - Indexing is a per-record process, in which either insert/modify/delete - will occur. Before a record is indexed search keys are extracted from - whatever might be the layout the original record (sgml,html,text, etc..). - The Zebra system currently supports two fundamental types of records: - structured and simple text. - To specify a particular extraction process, use either the - command line option -t or specify a - recordType setting in the configuration file. - - - - - - The Zebra Configuration File - - - The Zebra configuration file, read by zebraidx and - zebrasrv defaults to zebra.cfg - unless specified by -c option. - - - - You can edit the configuration file with a normal text editor. - parameter names and values are separated by colons in the file. Lines - starting with a hash sign (#) are - treated as comments. - - - - If you manage different sets of records that share common - characteristics, you can organize the configuration settings for each - type into "groups". - When zebraidx is run and you wish to address a - given group you specify the group name with the -g - option. - In this case settings that have the group name as their prefix - will be used by zebraidx. - If no -g option is specified, the settings - without prefix are used. - - - - In the configuration file, the group name is placed before the option - name itself, separated by a dot (.). 
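The settings format just described — `name: value` pairs, `#` comment lines, and group-prefixed names that fall back to the unprefixed setting when no group-specific value exists — can be sketched in a few lines of Python. This is purely an illustration of the lookup behaviour the text describes, not Zebra's own parser; the helper names are invented for the example.

```python
def parse_cfg(text):
    """Parse zebra.cfg-style text: 'name: value' lines, '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        name, _, value = line.partition(":")
        settings[name.strip()] = value.strip()
    return settings


def lookup(settings, name, group=None):
    """A group-prefixed setting wins; otherwise fall back to the plain name."""
    if group and f"{group}.{name}" in settings:
        return settings[f"{group}.{name}"]
    return settings.get(name)


cfg = parse_cfg("""
# example configuration
recordType: text
public.recordType: grs.sgml
""")
print(lookup(cfg, "recordType", group="public"))  # grs.sgml
print(lookup(cfg, "recordType"))                  # text
```

A group with no specific setting (say, run with `-g esdd`) would resolve `recordType` to the unprefixed default, `text`.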
For instance, to set the record type - for group public to grs.sgml - (the SGML-like format for structured records) you would write: - - - - - public.recordType: grs.sgml - - - - - To set the default value of the record type to text - write: - - - - - recordType: text - - - - The available configuration settings are summarized below. They will be - explained further in the following sections. + Unlike many simpler retrieval systems, &zebra; supports safe, incremental + updates to an existing index. - - - + + Normally, when &zebra; modifies the index it reads a number of records + that you specify. + Depending on your specifications and on the contents of each record + one the following events take place for each record: - - - - group - .recordType[.name]: - type - - - - Specifies how records with the file extension - name should be handled by the indexer. - This option may also be specified as a command line option - (-t). Note that if you do not specify a - name, the setting applies to all files. - In general, the record type specifier consists of the elements (each - element separated by dot), fundamental-type, - file-read-type and arguments. Currently, two - fundamental types exist, text and - grs. - - - - - group.recordId: - record-id-spec - - - Specifies how the records are to be identified when updated. See - . - - - - - group.database: - database - - - Specifies the Z39.50 database name. - - - - - - group.storeKeys: - boolean - - - Specifies whether key information should be saved for a given - group of records. If you plan to update/delete this type of - records later this should be specified as 1; otherwise it - should be 0 (default), to save register space. - - See . - - - - - group.storeData: - boolean - - - Specifies whether the records should be stored internally - in the Zebra system files. - If you want to maintain the raw records yourself, - this option should be false (0). 
- If you want Zebra to take care of the records for you, it - should be true(1). - - - - - - register: register-location - - - Specifies the location of the various register files that Zebra uses - to represent your databases. - See . - - - - - shadow: register-location - - - Enables the safe update facility of Zebra, and - tells the system where to place the required, temporary files. - See . - - - - - lockDir: directory - - - Directory in which various lock files are stored. - - - - - keyTmpDir: directory - - - Directory in which temporary files used during zebraidx's update - phase are stored. - - - - - setTmpDir: directory - - - Specifies the directory that the server uses for temporary result sets. - If not specified /tmp will be used. - - - - - profilePath: path - - - Specifies a path of profile specification files. - The path is composed of one or more directories separated by - colon. Similar to PATH for UNIX systems. - - - - - attset: filename - - - Specifies the filename(s) of attribute set files for use in - searching. At least the Bib-1 set should be loaded - (bib1.att). - The profilePath setting is used to look for - the specified files. - See - - - - - memMax: size - - - Specifies size of internal memory - to use for the zebraidx program. - The amount is given in megabytes - default is 4 (4 MB). - The more memory, the faster large updates happen, up to about - half the free memory available on the computer. - - - - - tempfiles: Yes/Auto/No - - - Tells zebra if it should use temporary files when indexing. The - default is Auto, in which case zebra uses temporary files only - if it would need more that memMax - megabytes of memory. This should be good for most uses. - - - - root: dir + Insert - Specifies a directory base for Zebra. All relative paths - given (in profilePath, register, shadow) are based on this - directory. This setting is useful if your Zebra server - is running in a different directory from where - zebra.cfg is located. 
+ The record is indexed as if it never occurred before. + Either the &zebra; system doesn't know how to identify the record or + &zebra; can identify the record but didn't find it to be already indexed. - - passwd: file + Modify - Specifies a file with description of user accounts for Zebra. - The format is similar to that known to Apache's htpasswd files - and UNIX' passwd files. Non-empty lines not beginning with - # are considered account lines. There is one account per-line. - A line consists of fields separate by a single colon character. - First field is username, second is password. + The record has already been indexed. + In this case either the contents of the record or the location + (file) of the record indicates that it has been indexed before. - - passwd.c: file + Delete - Specifies a file with description of user accounts for Zebra. - File format is similar to that used by the passwd directive except - that the password are encrypted. Use Apache's htpasswd or similar - for maintenance. + The record is deleted from the index. As in the + update-case it must be able to identify the record. - - - perm.user: - permstring - - - Specifies permissions (priviledge) for a user that are allowed - to access Zebra via the passwd system. There are two kinds - of permissions currently: read (r) and write(w). By default - users not listed in a permission directive are given the read - privilege. To specify permissions for a user with no - username, or Z39.50 anonymous style use - anonymous. The permstring consists of - a sequence of characters. Include character w - for write/update access, r for read access. - - - - - - dbaccess accessfile - - - Names a file which lists database subscriptions for individual users. - The access file should consists of lines of the form username: - dbnames, where dbnames is a list of database names, seprated by - '+'. No whitespace is allowed in the database list. 
- - - - - - - - - Locating Records - - - The default behavior of the Zebra system is to reference the - records from their original location, i.e. where they were found when you - run zebraidx. - That is, when a client wishes to retrieve a record - following a search operation, the files are accessed from the place - where you originally put them - if you remove the files (without - running zebraidx again, the server will return - diagnostic number 14 (``System error in presenting records'') to - the client. - - - - If your input files are not permanent - for example if you retrieve - your records from an outside source, or if they were temporarily - mounted on a CD-ROM drive, - you may want Zebra to make an internal copy of them. To do this, - you specify 1 (true) in the storeData setting. When - the Z39.50 server retrieves the records they will be read from the - internal file structures of the system. - - - - - - Indexing with no Record IDs (Simple Indexing) - - - If you have a set of records that are not expected to change over time - you may can build your database without record IDs. - This indexing method uses less space than the other methods and - is simple to use. - - - - To use this method, you simply omit the recordId entry - for the group of files that you index. To add a set of records you use - zebraidx with the update command. The - update command will always add all of the records that it - encounters to the index - whether they have already been indexed or - not. If the set of indexed files change, you should delete all of the - index files, and build a new index from scratch. - - - - Consider a system in which you have a group of text files called - simple. - That group of records should belong to a Z39.50 database called - textbase. 
- The following zebra.cfg file will suffice: - - - - - profilePath: /usr/local/idzebra/tab - attset: bib1.att - simple.recordType: text - simple.database: textbase - - - - - Since the existing records in an index can not be addressed by their - IDs, it is impossible to delete or modify records when using this method. - - - - - - Indexing with File Record IDs - - - If you have a set of files that regularly change over time: Old files - are deleted, new ones are added, or existing files are modified, you - can benefit from using the file ID - indexing methodology. - Examples of this type of database might include an index of WWW - resources, or a USENET news spool area. - Briefly speaking, the file key methodology uses the directory paths - of the individual records as a unique identifier for each record. - To perform indexing of a directory with file keys, again, you specify - the top-level directory after the update command. - The command will recursively traverse the directories and compare - each one with whatever have been indexed before in that same directory. - If a file is new (not in the previous version of the directory) it - is inserted into the registers; if a file was already indexed and - it has been modified since the last update, the index is also - modified; if a file has been removed since the last - visit, it is deleted from the index. - - - - The resulting system is easy to administrate. To delete a record you - simply have to delete the corresponding file (say, with the - rm command). And to add records you create new - files (or directories with files). For your changes to take effect - in the register you must run zebraidx update with - the same directory root again. This mode of operation requires more - disk space than simpler indexing methods, but it makes it easier for - you to keep the index in sync with a frequently changing set of data. 
- If you combine this system with the safe update - facility (see below), you never have to take your server off-line for - maintenance or register updating purposes. - - - To enable indexing with pathname IDs, you must specify - file as the value of recordId - in the configuration file. In addition, you should set - storeKeys to 1, since the Zebra - indexer must save additional information about the contents of each record - in order to modify the indexes correctly at a later time. + Please note that in both the modify- and delete- case the &zebra; + indexer must be able to generate a unique key that identifies the record + in question (more on this below). - - - For example, to update records of group esdd - located below - /data1/records/ you should type: - - $ zebraidx -g esdd update /data1/records - + To administrate the &zebra; retrieval system, you run the + zebraidx program. + This program supports a number of options which are preceded by a dash, + and a few commands (not preceded by dash). - + - The corresponding configuration file includes: - - esdd.recordId: file - esdd.recordType: grs.sgml - esdd.storeKeys: 1 - + Both the &zebra; administrative tool and the &acro.z3950; server share a + set of index files and a global configuration file. + The name of the configuration file defaults to + zebra.cfg. + The configuration file includes specifications on how to index + various kinds of records and where the other configuration files + are located. zebrasrv and zebraidx + must be run in the directory where the + configuration file lives unless you indicate the location of the + configuration file by option -c. - - - You cannot start out with a group of records with simple - indexing (no record IDs as in the previous section) and then later - enable file record Ids. Zebra must know from the first time that you - index the group that - the files should be indexed with file record IDs. 
+ + + Record Types + + + Indexing is a per-record process, in which either insert/modify/delete + will occur. Before a record is indexed search keys are extracted from + whatever might be the layout the original record (sgml,html,text, etc..). + The &zebra; system currently supports two fundamental types of records: + structured and simple text. + To specify a particular extraction process, use either the + command line option -t or specify a + recordType setting in the configuration file. - - - - You cannot explicitly delete records when using this method (using the - delete command to zebraidx. Instead - you have to delete the files from the file system (or move them to a - different location) - and then run zebraidx with the - update command. - - - - - - Indexing with General Record IDs - - - When using this method you construct an (almost) arbitrary, internal - record key based on the contents of the record itself and other system - information. If you have a group of records that explicitly associates - an ID with each record, this method is convenient. For example, the - record format may contain a title or a ID-number - unique within the group. - In either case you specify the Z39.50 attribute set and use-attribute - location in which this information is stored, and the system looks at - that field to determine the identity of the record. - - - - As before, the record ID is defined by the recordId - setting in the configuration file. The value of the record ID specification - consists of one or more tokens separated by whitespace. The resulting - ID is represented in the index by concatenating the tokens and - separating them by ASCII value (1). - - - - There are three kinds of tokens: - - - - Internal record info - - - The token refers to a key that is - extracted from the record. The syntax of this token is - ( set , - use ), - where set is the - attribute set name use is the - name or value of the attribute. 
- - - - - System variable - - - The system variables are preceded by - - - $ - - and immediately followed by the system variable name, which - may one of - - - - group - - - Group name. - - - - - database - - - Current database specified. - - - - - type - - - Record type. - - - - - - - - - Constant string - - - A string used as part of the ID — surrounded - by single- or double quotes. - - - - - - - - For instance, the sample GILS records that come with the Zebra - distribution contain a unique ID in the data tagged Control-Identifier. - The data is mapped to the Bib-1 use attribute Identifier-standard - (code 1007). To use this field as a record id, specify - (bib1,Identifier-standard) as the value of the - recordId in the configuration file. - If you have other record types that uses the same field for a - different purpose, you might add the record type - (or group or database name) to the record id of the gils - records as well, to prevent matches with other types of records. - In this case the recordId might be set like this: - - - gils.recordId: $type (bib1,Identifier-standard) - - - - - - (see - for details of how the mapping between elements of your records and - searchable attributes is established). - - - - As for the file record ID case described in the previous section, - updating your system is simply a matter of running - zebraidx - with the update command. However, the update with general - keys is considerably slower than with file record IDs, since all files - visited must be (re)read to discover their IDs. - - - - As you might expect, when using the general record IDs - method, you can only add or modify existing records with the - update command. - If you wish to delete records, you must use the, - delete command, with a directory as a parameter. - This will remove all records that match the files below that root - directory. 
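As noted above, the resulting record ID is represented in the index by concatenating the resolved tokens with ASCII value 1 as the separator. A minimal sketch of that convention, with hypothetical token values (only the separator rule comes from the text):

```python
def make_record_id(tokens):
    """Concatenate resolved recordId tokens, separated by ASCII value 1."""
    return "\x01".join(tokens)


# e.g. recordId: $type (bib1,Identifier-standard)
# -> the record type plus the record's Identifier-standard value
# (the identifier below is an invented example)
rid = make_record_id(["grs.sgml", "US-GILS-12345"])
print(repr(rid))  # 'grs.sgml\x01US-GILS-12345'
```

Prefixing the `$type` token, as in the GILS example above, keeps records of different types from colliding even when they carry the same identifier value.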
- - - - - - Register Location - - - Normally, the index files that form dictionaries, inverted - files, record info, etc., are stored in the directory where you run - zebraidx. If you wish to store these, possibly large, - files somewhere else, you must add the register - entry to the zebra.cfg file. - Furthermore, the Zebra system allows its file - structures to span multiple file systems, which is useful for - managing very large databases. - - - - The value of the register setting is a sequence - of tokens. Each token takes the form: - - - dir:size. - - - The dir specifies a directory in which index files - will be stored and the size specifies the maximum - size of all files in that directory. The Zebra indexer system fills - each directory in the order specified and use the next specified - directories as needed. - The size is an integer followed by a qualifier - code, - b for bytes, - k for kilobytes. - M for megabytes, - G for gigabytes. - - - - For instance, if you have allocated two disks for your register, and - the first disk is mounted - on /d1 and has 2GB of free space and the - second, mounted on /d2 has 3.6 GB, you could - put this entry in your configuration file: - - - register: /d1:2G /d2:3600M - - - - - - Note that Zebra does not verify that the amount of space specified is - actually available on the directory (file system) specified - it is - your responsibility to ensure that enough space is available, and that - other applications do not attempt to use the free space. In a large - production system, it is recommended that you allocate one or more - file system exclusively to the Zebra register files. - - - - - - Safe Updating - Using Shadow Registers - - - Description - + + + + + The &zebra; Configuration File + + + The &zebra; configuration file, read by zebraidx and + zebrasrv defaults to zebra.cfg + unless specified by -c option. + + - The Zebra server supports updating of the index - structures. 
That is, you can add, modify, or remove records from - databases managed by Zebra without rebuilding the entire index. - Since this process involves modifying structured files with various - references between blocks of data in the files, the update process - is inherently sensitive to system crashes, or to process interruptions: - Anything but a successfully completed update process will leave the - register files in an unknown state, and you will essentially have no - recourse but to re-index everything, or to restore the register files - from a backup medium. - Further, while the update process is active, users cannot be - allowed to access the system, as the contents of the register files - may change unpredictably. + You can edit the configuration file with a normal text editor. + parameter names and values are separated by colons in the file. Lines + starting with a hash sign (#) are + treated as comments. - + - You can solve these problems by enabling the shadow register system in - Zebra. - During the updating procedure, zebraidx will temporarily - write changes to the involved files in a set of "shadow - files", without modifying the files that are accessed by the - active server processes. If the update procedure is interrupted by a - system crash or a signal, you simply repeat the procedure - the - register files have not been changed or damaged, and the partially - written shadow files are automatically deleted before the new updating - procedure commences. + If you manage different sets of records that share common + characteristics, you can organize the configuration settings for each + type into "groups". + When zebraidx is run and you wish to address a + given group you specify the group name with the -g + option. + In this case settings that have the group name as their prefix + will be used by zebraidx. + If no -g option is specified, the settings + without prefix are used. 
- + - At the end of the updating procedure (or in a separate operation, if - you so desire), the system enters a "commit mode". First, - any active server processes are forced to access those blocks that - have been changed from the shadow files rather than from the main - register files; the unmodified blocks are still accessed at their - normal location (the shadow files are not a complete copy of the - register files - they only contain those parts that have actually been - modified). If the commit process is interrupted at any point during the - commit process, the server processes will continue to access the - shadow files until you can repeat the commit procedure and complete - the writing of data to the main register files. You can perform - multiple update operations to the registers before you commit the - changes to the system files, or you can execute the commit operation - at the end of each update operation. When the commit phase has - completed successfully, any running server processes are instructed to - switch their operations to the new, operational register, and the - temporary shadow files are deleted. + In the configuration file, the group name is placed before the option + name itself, separated by a dot (.). For instance, to set the record type + for group public to grs.sgml + (the &acro.sgml;-like format for structured records) you would write: - - - - - How to Use Shadow Register Files - + - The first step is to allocate space on your system for the shadow - files. - You do this by adding a shadow entry to the - zebra.cfg file. - The syntax of the shadow entry is exactly the - same as for the register entry - (see ). - The location of the shadow area should be - different from the location of the main register - area (if you have specified one - remember that if you provide no - register setting, the default register area is the - working directory of the server and indexing processes). 
+ + public.recordType: grs.sgml + - + - The following excerpt from a zebra.cfg file shows - one example of a setup that configures both the main register - location and the shadow file area. - Note that two directories or partitions have been set aside - for the shadow file area. You can specify any number of directories - for each of the file areas, but remember that there should be no - overlaps between the directories used for the main registers and the - shadow files, respectively. + To set the default value of the record type to text + write: + - - register: /d1:500M - shadow: /scratch1:100M /scratch2:200M + recordType: text - - + - When shadow files are enabled, an extra command is available at the - zebraidx command line. - In order to make changes to the system take effect for the - users, you'll have to submit a "commit" command after a - (sequence of) update operation(s). + The available configuration settings are summarized below. They will be + explained further in the following sections. - + + + - + + + + + group + .recordType[.name]: + type + + + + Specifies how records with the file extension + name should be handled by the indexer. + This option may also be specified as a command line option + (-t). Note that if you do not specify a + name, the setting applies to all files. + In general, the record type specifier consists of the elements (each + element separated by dot), fundamental-type, + file-read-type and arguments. Currently, two + fundamental types exist, text and + grs. + + + + + group.recordId: + record-id-spec + + + Specifies how the records are to be identified when updated. See + . + + + + + group.database: + database + + + Specifies the &acro.z3950; database name. + + + + + + group.storeKeys: + boolean + + + Specifies whether key information should be saved for a given + group of records. If you plan to update/delete this type of + records later this should be specified as 1; otherwise it + should be 0 (default), to save register space. 
+ + See . + + + + + group.storeData: + boolean + + + Specifies whether the records should be stored internally + in the &zebra; system files. + If you want to maintain the raw records yourself, + this option should be false (0). + If you want &zebra; to take care of the records for you, it + should be true(1). + + + + + + register: register-location + + + Specifies the location of the various register files that &zebra; uses + to represent your databases. + See . + + + + + shadow: register-location + + + Enables the safe update facility of &zebra;, and + tells the system where to place the required, temporary files. + See . + + + + + lockDir: directory + + + Directory in which various lock files are stored. + + + + + keyTmpDir: directory + + + Directory in which temporary files used during zebraidx's update + phase are stored. + + + + + setTmpDir: directory + + + Specifies the directory that the server uses for temporary result sets. + If not specified /tmp will be used. + + + + + profilePath: path + + + Specifies a path of profile specification files. + The path is composed of one or more directories separated by + colon. Similar to PATH for UNIX systems. + + + + + + modulePath: path + + + Specifies a path of record filter modules. + The path is composed of one or more directories separated by + colon. Similar to PATH for UNIX systems. + The 'make install' procedure typically puts modules in + /usr/local/lib/idzebra-2.0/modules. + + + + + + index: filename + + + Defines the filename which holds fields structure + definitions. If omitted, the file default.idx + is read. + Refer to for + more information. + + + + + + sortmax: integer + + + Specifies the maximum number of records that will be sorted + in a result set. If the result set contains more than + integer records, records after the + limit will not be sorted. If omitted, the default value is + 1,000. + + + + + + staticrank: integer + + + Enables whether static ranking is to be enabled (1) or + disabled (0). 
If omitted, it is disabled - corresponding + to a value of 0. + Refer to . + + + + + + + estimatehits: integer + + + Controls whether &zebra; should calculate approximate hit counts and + at which hit count it is to be enabled. + A value of 0 disables approximate hit counts. + For a positive value approximate hit count is enabled + if it is known to be larger than integer. + + + Approximate hit counts can also be triggered by a particular + attribute in a query. + Refer to . + + + + + + attset: filename + + + Specifies the filename(s) of attribute set files for use in + searching. In many configurations bib1.att + is used, but that is not required. If Classic Explain + attributes is to be used for searching, + explain.att must be given. + The path to att-files in general can be given using + profilePath setting. + See also . + + + + + memMax: size + + + Specifies size of internal memory + to use for the zebraidx program. + The amount is given in megabytes - default is 4 (4 MB). + The more memory, the faster large updates happen, up to about + half the free memory available on the computer. + + + + + tempfiles: Yes/Auto/No + + + Tells zebra if it should use temporary files when indexing. The + default is Auto, in which case zebra uses temporary files only + if it would need more that memMax + megabytes of memory. This should be good for most uses. + + + + + + root: dir + + + Specifies a directory base for &zebra;. All relative paths + given (in profilePath, register, shadow) are based on this + directory. This setting is useful if your &zebra; server + is running in a different directory from where + zebra.cfg is located. + + + + + + passwd: file + + + Specifies a file with description of user accounts for &zebra;. + The format is similar to that known to Apache's htpasswd files + and UNIX' passwd files. Non-empty lines not beginning with + # are considered account lines. There is one account per-line. 
+ A line consists of fields separate by a single colon character. + First field is username, second is password. + + + + + + passwd.c: file + + + Specifies a file with description of user accounts for &zebra;. + File format is similar to that used by the passwd directive except + that the password are encrypted. Use Apache's htpasswd or similar + for maintenance. + + + + + + perm.user: + permstring + + + Specifies permissions (privilege) for a user that are allowed + to access &zebra; via the passwd system. There are two kinds + of permissions currently: read (r) and write(w). By default + users not listed in a permission directive are given the read + privilege. To specify permissions for a user with no + username, or &acro.z3950; anonymous style use + anonymous. The permstring consists of + a sequence of characters. Include character w + for write/update access, r for read access and + a to allow anonymous access through this account. + + + + + + dbaccess: accessfile + + + Names a file which lists database subscriptions for individual users. + The access file should consists of lines of the form + username: dbnames, where dbnames is a list of + database names, separated by '+'. No whitespace is allowed in the + database list. + + + + + + encoding: charsetname + + + Tells &zebra; to interpret the terms in Z39.50 queries as + having been encoded using the specified character + encoding. The default is ISO-8859-1; one + useful alternative is UTF-8. + + + + + + storeKeys: value + + + Specifies whether &zebra; keeps a copy of indexed keys. + Use a value of 1 to enable; 0 to disable. If storeKeys setting is + omitted, it is enabled. Enabled storeKeys + are required for updating and deleting records. Disable only + storeKeys to save space and only plan to index data once. + + + + + + storeData: value + + + Specifies whether &zebra; keeps a copy of indexed records. + Use a value of 1 to enable; 0 to disable. If storeData setting is + omitted, it is enabled. 
A storeData setting of 0 (disabled) makes
&zebra; fetch records from their original location in the file
system, using filename, file offset and file length. For the
DOM and ALVIS filters, the storeData setting is ignored.

Locating Records

The default behavior of the &zebra; system is to reference the
records from their original location, i.e. where they were found when you
run zebraidx.
That is, when a client wishes to retrieve a record
following a search operation, the files are accessed from the place
where you originally put them - if you remove the files (without
running zebraidx again), the server will return
diagnostic number 14 (``System error in presenting records'') to
the client.

If your input files are not permanent - for example if you retrieve
your records from an outside source, or if they were temporarily
mounted on a CD-ROM drive -
you may want &zebra; to make an internal copy of them. To do this,
you specify 1 (true) in the storeData setting. When
the &acro.z3950; server retrieves the records they will be read from the
internal file structures of the system.

Indexing with no Record IDs (Simple Indexing)

If you have a set of records that are not expected to change over time,
you can build your database without record IDs.
This indexing method uses less space than the other methods and
is simple to use.

To use this method, you simply omit the recordId entry
for the group of files that you index. To add a set of records you use
zebraidx with the update command. The
update command will always add all of the records that it
encounters to the index - whether they have already been indexed or
not. If the set of indexed files changes, you should delete all of the
index files and build a new index from scratch.

Consider a system in which you have a group of text files called
simple.
That group of records should belong to a &acro.z3950; database called
textbase.
The following zebra.cfg file will suffice:

profilePath: /usr/local/idzebra/tab
attset: bib1.att
simple.recordType: text
simple.database: textbase

Since the existing records in an index cannot be addressed by their
IDs, it is impossible to delete or modify records when using this method.

Indexing with File Record IDs

If you have a set of files that regularly change over time - old files
are deleted, new ones are added, or existing files are modified - you
can benefit from using the file ID
indexing methodology.
Examples of this type of database might include an index of WWW
resources, or a USENET news spool area.
Briefly speaking, the file key methodology uses the directory paths
of the individual records as a unique identifier for each record.
To perform indexing of a directory with file keys, again, you specify
the top-level directory after the update command.
The command will recursively traverse the directories and compare
each one with whatever has been indexed before in that same directory.
If a file is new (not in the previous version of the directory) it
is inserted into the registers; if a file was already indexed and
it has been modified since the last update, the index is also
modified; if a file has been removed since the last
visit, it is deleted from the index.

The resulting system is easy to administrate. To delete a record you
simply have to delete the corresponding file (say, with the
rm command). And to add records you create new
files (or directories with files). For your changes to take effect
in the register you must run zebraidx update with
the same directory root again.
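Under file record IDs, record maintenance reduces to ordinary file operations followed by a re-run of the indexer. A hypothetical session (the paths and group name are invented for illustration) might look like:

```
rm /data1/records/obsolete.xml          # delete a record
cp ~/incoming/fresh.xml /data1/records/ # add a record
zebraidx -g esdd update /data1/records  # re-sync the register
```

Only the changed files are re-processed; untouched files in the directory tree are left alone.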
This mode of operation requires more
disk space than simpler indexing methods, but it makes it easier for
you to keep the index in sync with a frequently changing set of data.
If you combine this system with the safe update
facility (see below), you never have to take your server off-line for
maintenance or register updating purposes.

To enable indexing with pathname IDs, you must specify
file as the value of recordId
in the configuration file. In addition, you should set
storeKeys to 1, since the &zebra;
indexer must save additional information about the contents of each record
in order to modify the indexes correctly at a later time.

For example, to update records of group esdd
located below
/data1/records/ you should type:

$ zebraidx -g esdd update /data1/records

The corresponding configuration file includes:

esdd.recordId: file
esdd.recordType: grs.sgml
esdd.storeKeys: 1

You cannot start out with a group of records with simple
indexing (no record IDs as in the previous section) and then later
enable file record IDs. &zebra; must know from the first time that you
index the group that
the files should be indexed with file record IDs.

You cannot explicitly delete records when using this method (using the
delete command of zebraidx). Instead
you have to delete the files from the file system (or move them to a
different location)
and then run zebraidx with the
update command.
Indexing with General Record IDs

When using this method you construct an (almost) arbitrary, internal
record key based on the contents of the record itself and other system
information. If you have a group of records that explicitly associates
an ID with each record, this method is convenient. For example, the
record format may contain a title or an ID-number, unique within the group.
+ In either case you specify the &acro.z3950; attribute set and use-attribute + location in which this information is stored, and the system looks at + that field to determine the identity of the record. - - - - + + As before, the record ID is defined by the recordId + setting in the configuration file. The value of the record ID specification + consists of one or more tokens separated by whitespace. The resulting + ID is represented in the index by concatenating the tokens and + separating them by ASCII value (1). + - - Relevance Ranking and Sorting of Result Sets + + There are three kinds of tokens: + + + + Internal record info + + + The token refers to a key that is + extracted from the record. The syntax of this token is + ( set , + use ), + where set is the + attribute set name use is the + name or value of the attribute. + + + + + System variable + + + The system variables are preceded by + + + $ + + and immediately followed by the system variable name, which + may one of + + + + group + + + Group name. + + + + + database + + + Current database specified. + + + + + type + + + Record type. + + + + + + + + + Constant string + + + A string used as part of the ID — surrounded + by single- or double quotes. + + + + + - - Overview - The default ordering of a result set is left up to the server, - which inside Zebra means sorting in ascending document ID order. - This is not always the order humans want to browse the sometimes - quite large hit sets. Ranking and sorting comes to the rescue. + For instance, the sample GILS records that come with the &zebra; + distribution contain a unique ID in the data tagged Control-Identifier. + The data is mapped to the &acro.bib1; use attribute Identifier-standard + (code 1007). To use this field as a record id, specify + (bib1,Identifier-standard) as the value of the + recordId in the configuration file. 
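The token concatenation described above can be sketched in a few lines of Python; the token values below are invented for illustration, and the real implementation is internal to &zebra;:

```python
# Sketch: build an internal record ID from already-resolved recordId
# tokens, joined by the ASCII 0x01 separator (as described above).
def make_record_id(tokens):
    return "\x01".join(tokens)

# e.g. recordId: $type (bib1,Identifier-standard)
# with a hypothetical record type and identifier value:
rid = make_record_id(["grs.sgml", "UNEP-GRID-0042"])
```

The separator byte cannot occur in normal textual data, which is what makes the concatenated ID unambiguous.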
If you have other record types that use the same field for a
different purpose, you might add the record type
(or group or database name) to the record ID of the gils
records as well, to prevent matches with other types of records.
In this case the recordId might be set like this:

gils.recordId: $type (bib1,Identifier-standard)

(see
for details of how the mapping between elements of your records and
searchable attributes is established).

As for the file record ID case described in the previous section,
updating your system is simply a matter of running
zebraidx
with the update command. However, updating with general
keys is considerably slower than with file record IDs, since all files
visited must be (re)read to discover their IDs.

As you might expect, when using the general record IDs
method, you can only add or modify existing records with the
update command.
If you wish to delete records, you must use the
delete command, with a directory as a parameter.
This will remove all records that match the files below that root
directory.

Register Location

Normally, the index files that form dictionaries, inverted
files, record info, etc., are stored in the directory where you run
zebraidx. If you wish to store these, possibly large,
files somewhere else, you must add the register
entry to the zebra.cfg file.
Furthermore, the &zebra; system allows its file
structures to span multiple file systems, which is useful for
managing very large databases.

The value of the register setting is a sequence
of tokens. Each token takes the form:

dir:size

The dir specifies a directory in which index files
will be stored and the size specifies the maximum
size of all files in that directory. The &zebra; indexer system fills
each directory in the order specified and uses the next specified
directories as needed.
The size is an integer followed by a qualifier
code:
b for bytes,
k for kilobytes,
M for megabytes,
G for gigabytes.
Specifying a negative value disables the checking (it still needs the unit;
use -1b).
+ - If one defines the + For instance, if you have allocated three disks for your register, and + the first disk is mounted + on /d1 and has 2GB of free space, the + second, mounted on /d2 has 3.6 GB, and the third, + on which you have more space than you bother to worry about, mounted on + /d3 you could put this entry in your configuration file: + - staticrank: 1 - - directive in the main core Zebra configuration file, the internal document - keys used for ordering are augmented by a preceding integer, which - contains the static rank of a given document, and the index lists - are ordered - first by ascending static rank, - then by ascending document ID. - Zero - is the ``best'' rank, as it occurs at the - beginning of the list; higher numbers represent worse scores. + register: /d1:2G /d2:3600M /d3:-1b + + - The experimental alvis filter provides a - directive to fetch static rank information out of the indexed XML - records, thus making all hit sets ordered - after ascending static - rank, and for those doc's which have the same static rank, ordered - after ascending doc ID. - See for the gory details. + Note that &zebra; does not verify that the amount of space specified is + actually available on the directory (file system) specified - it is + your responsibility to ensure that enough space is available, and that + other applications do not attempt to use the free space. In a large + production system, it is recommended that you allocate one or more + file system exclusively to the &zebra; register files. - + - - Dynamic Ranking - - In order to fiddle with the static rank order, it is necessary to - invoke additional re-ranking/re-ordering using dynamic - ranking or score functions. These functions return positive - integer scores, where highest score is - ``best''; - hit sets are sorted according to descending - scores (in contrary - to the index lists which are sorted according to - ascending rank number and document ID). 
- - - Dynamic ranking is enabled by a directive like one of the - following in the zebra configuration file (use only one of these a time!): - - rank: rank-1 # default TDF-IDF like - rank: rank-static # dummy do-nothing - - - - - Dynamic ranking is done at query time rather than - indexing time (this is why we - call it ``dynamic ranking'' in the first place ...) - It is invoked by adding - the Bib-1 relation attribute with - value ``relevance'' to the PQF query (that is, - @attr 2=102, see also - - The BIB-1 Attribute Set Semantics, also in - HTML). - To find all articles with the word Eoraptor in - the title, and present them relevance ranked, issue the PQF query: - - @attr 2=102 @attr 1=4 Eoraptor - - + + Safe Updating - Using Shadow Registers - - Dynamically ranking using PQF queries with the 'rank-1' - algorithm + + Description - - The default rank-1 ranking module implements a - TF/IDF (Term Frequecy over Inverse Document Frequency) like - algorithm. In contrast to the usual defintion of TF/IDF - algorithms, which only considers searching in one full-text - index, this one works on multiple indexes at the same time. - More precisely, - Zebra does boolean queries and searches in specific addressed - indexes (there are inverted indexes pointing from terms in the - dictionary to documents and term positions inside documents). - It works like this: - - - Query Components - - - First, the boolean query is dismantled into it's principal components, - i.e. atomic queries where one term is looked up in one index. - For example, the query - - @attr 2=102 @and @attr 1=1010 Utah @attr 1=1018 Springer - - is a boolean AND between the atomic parts - - @attr 2=102 @attr 1=1010 Utah - - and - - @attr 2=102 @attr 1=1018 Springer - - which gets processed each for itself. - - - - - - Atomic hit lists - - - Second, for each atomic query, the hit list of documents is - computed. - - - In this example, two hit lists for each index - @attr 1=1010 and - @attr 1=1018 are computed. 
- - - - - - Atomic scores - - - Third, each document in the hit list is assigned a score (_if_ ranking - is enabled and requested in the query) using a TF/IDF scheme. - - - In this example, both atomic parts of the query assign the magic - @attr 2=102 relevance attribute, and are - to be used in the relevance ranking functions. - - - It is possible to apply dynamic ranking on only parts of the - PQF query: - - @and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer - - searches for all documents which have the term 'Utah' on the - body of text, and which have the term 'Springer' in the publisher - field, and sort them in the order of the relevance ranking made on - the body-of-text index only. - - - - - - Hit list merging - - - Fourth, the atomic hit lists are merged according to the boolean - conditions to a final hit list of documents to be returned. - - - This step is always performed, independently of the fact that - dynamic ranking is enabled or not. - - - - - - Document score computation - - - Fifth, the total score of a document is computed as a linear - combination of the atomic scores of the atomic hit lists - - - Ranking weights may be used to pass a value to a ranking - algorithm, using the non-standard BIB-1 attribute type 9. - This allows one branch of a query to use one value while - another branch uses a different one. For example, we can search - for utah in the - @attr 1=4 index with weight 30, as - well as in the @attr 1=1010 index with weight 20: - - @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 @attr 1=1010 city - - - - The default weight is - sqrt(1000) ~ 34 , as the Z39.50 standard prescribes that the top score - is 1000 and the bottom score is 0, encoded in integers. - - - - The ranking-weight feature is experimental. It may change in future - releases of zebra. - - - - - - - Re-sorting of hit list - - - Finally, the final hit list is re-ordered according to scores. 
- - - - - - - - + + The &zebra; server supports updating of the index + structures. That is, you can add, modify, or remove records from + databases managed by &zebra; without rebuilding the entire index. + Since this process involves modifying structured files with various + references between blocks of data in the files, the update process + is inherently sensitive to system crashes, or to process interruptions: + Anything but a successfully completed update process will leave the + register files in an unknown state, and you will essentially have no + recourse but to re-index everything, or to restore the register files + from a backup medium. + Further, while the update process is active, users cannot be + allowed to access the system, as the contents of the register files + may change unpredictably. + + + + You can solve these problems by enabling the shadow register system in + &zebra;. + During the updating procedure, zebraidx will temporarily + write changes to the involved files in a set of "shadow + files", without modifying the files that are accessed by the + active server processes. If the update procedure is interrupted by a + system crash or a signal, you simply repeat the procedure - the + register files have not been changed or damaged, and the partially + written shadow files are automatically deleted before the new updating + procedure commences. + + + + At the end of the updating procedure (or in a separate operation, if + you so desire), the system enters a "commit mode". First, + any active server processes are forced to access those blocks that + have been changed from the shadow files rather than from the main + register files; the unmodified blocks are still accessed at their + normal location (the shadow files are not a complete copy of the + register files - they only contain those parts that have actually been + modified). 
If the commit process is interrupted at any point during the + commit process, the server processes will continue to access the + shadow files until you can repeat the commit procedure and complete + the writing of data to the main register files. You can perform + multiple update operations to the registers before you commit the + changes to the system files, or you can execute the commit operation + at the end of each update operation. When the commit phase has + completed successfully, any running server processes are instructed to + switch their operations to the new, operational register, and the + temporary shadow files are deleted. + + + + + + How to Use Shadow Register Files + + + The first step is to allocate space on your system for the shadow + files. + You do this by adding a shadow entry to the + zebra.cfg file. + The syntax of the shadow entry is exactly the + same as for the register entry + (see ). + The location of the shadow area should be + different from the location of the main register + area (if you have specified one - remember that if you provide no + register setting, the default register area is the + working directory of the server and indexing processes). + + + + The following excerpt from a zebra.cfg file shows + one example of a setup that configures both the main register + location and the shadow file area. + Note that two directories or partitions have been set aside + for the shadow file area. You can specify any number of directories + for each of the file areas, but remember that there should be no + overlaps between the directories used for the main registers and the + shadow files, respectively. + + + + + register: /d1:500M + shadow: /scratch1:100M /scratch2:200M + + + + + + When shadow files are enabled, an extra command is available at the + zebraidx command line. + In order to make changes to the system take effect for the + users, you'll have to submit a "commit" command after a + (sequence of) update operation(s). 
$ zebraidx update /d1/records
$ zebraidx commit

Or you can execute multiple updates before committing the changes:

$ zebraidx -g books update /d1/records /d2/more-records
$ zebraidx -g fun update /d3/fun-records
$ zebraidx commit

If one of the update operations above had been interrupted, the commit
operation on the last line would fail: zebraidx
will not let you commit changes that would destroy the running register.
You'll have to rerun all of the update operations since your last
commit operation, before you can commit the new changes.

Similarly, if the commit operation fails, zebraidx
will not let you start a new update operation before you have
successfully repeated the commit operation.
The server processes will keep accessing the shadow files rather
than the (possibly damaged) blocks of the main register files
until the commit operation has successfully completed.

You should be aware that update operations may take slightly longer
when the shadow register system is enabled, since more file access
operations are involved. Further, while the disk space required for
the shadow register data is modest for a small update operation, you
may prefer to disable the system if you are adding a very large number
of records to an already very large database (we use the terms
large and modest
very loosely here, since every application will have a
different perception of size).
To update the system without the use of the shadow files,
simply run zebraidx with the -n
option (note that you do not have to execute the
commit command of zebraidx
when you temporarily disable the use of the shadow registers in
this fashion).
Note also that, just as when the shadow registers are not enabled,
server processes will be barred from accessing the main register
while the update procedure takes place.
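The update-then-commit cycle lends itself to scripting. A minimal sketch (group and directory names are taken from the examples above) that only commits when every update succeeds:

```
#!/bin/sh
# Run e.g. nightly; commit only if all updates succeed.
zebraidx -g books update /d1/records /d2/more-records &&
zebraidx -g fun update /d3/fun-records &&
zebraidx commit
```

If any update fails, the commit step is skipped, the main register stays untouched, and the partially written shadow files are cleaned up automatically the next time the procedure runs.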
Relevance Ranking and Sorting of Result Sets

Overview

The default ordering of a result set is left up to the server,
which inside &zebra; means sorting in ascending document ID order.
This is not always the order in which humans want to browse the
sometimes quite large hit sets. Ranking and sorting come to the rescue.

In cases where a good presentation ordering can be computed at
indexing time, we can use a fixed static ranking
scheme, which is provided for the alvis
indexing filter. This defines a fixed ordering of hit lists,
independently of the query issued.

There are cases, however, where relevance of hit set documents is
highly dependent on the query processed.
Simply put, dynamic relevance ranking
sorts a set of retrieved records such that those most likely to be
relevant to your request are retrieved first.
Internally, &zebra; retrieves all documents that satisfy your
query, and re-orders the hit list to arrange them based on
a measurement of similarity between your query and the content of
each record.

Finally, there are situations where hit sets of documents should be
sorted during query time according to the
lexicographical ordering of certain sort indexes created at
indexing time.

Static Ranking

&zebra; internally uses inverted indexes to look up term frequencies
in documents. Multiple queries from different indexes can be
combined by the binary boolean operations AND,
OR and/or NOT (which
is in fact a binary AND NOT operation).
To ensure fast query execution
speed, all indexes have to be sorted in the same order.
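The requirement that all indexes be sorted in the same order is what makes boolean merging cheap; the classic linear-time intersection of two docID-sorted posting lists can be sketched as follows (a generic illustration, not &zebra;'s actual code):

```python
# Sketch: AND-merge of two document-ID-sorted posting lists,
# as done conceptually by inverted-index engines.
def and_merge(a, b):
    i = j = 0
    hits = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            hits.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return hits
```

Because both lists advance monotonically, each posting is touched at most once; if the two indexes were sorted differently, this single pass would be impossible.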
The indexes are normally sorted according to document
ID in
ascending order, and any query which does not invoke a special
re-ranking function will therefore retrieve the result set in
document
ID
order.

If one defines the

staticrank: 1

directive in the main core &zebra; configuration file, the internal document
keys used for ordering are augmented by a preceding integer, which
contains the static rank of a given document, and the index lists
are ordered
first by ascending static rank,
then by ascending document ID.
Zero
is the ``best'' rank, as it occurs at the
beginning of the list; higher numbers represent worse scores.

The experimental alvis filter provides a
directive to fetch static rank information out of the indexed &acro.xml;
records, thus making all hit sets ordered
by ascending static
rank, with documents that have the same static rank ordered
by ascending document ID.
See for the gory details.

Dynamic Ranking

In order to fiddle with the static rank order, it is necessary to
invoke additional re-ranking/re-ordering using dynamic
ranking or score functions. These functions return positive
integer scores, where the highest score is
``best'';
hit sets are sorted according to descending
scores (in contrast
to the index lists, which are sorted according to
ascending rank number and document ID).

Dynamic ranking is enabled by a directive like one of the
following in the zebra configuration file (use only one of these at a time!):

rank: rank-1 # default TF-IDF like
rank: rank-static # dummy do-nothing

Dynamic ranking is done at query time rather than
indexing time (this is why we
call it ``dynamic ranking'' in the first place ...)
It is invoked by adding
the &acro.bib1; relation attribute with
value ``relevance'' to the &acro.pqf; query (that is,
@attr 2=102, see also

The &acro.bib1; Attribute Set Semantics, also in
HTML).
To find all articles with the word Eoraptor in
the title, and present them relevance ranked, issue the &acro.pqf; query:

@attr 2=102 @attr 1=4 Eoraptor

The default rank-1 ranking module implements a
TF/IDF (Term Frequency over Inverse Document Frequency) like
algorithm. In contrast to the usual definition of TF/IDF
algorithms, which only consider searching in one full-text
index, this one works on multiple indexes at the same time.
More precisely,
&zebra; does boolean queries and searches in specific addressed
indexes (there are inverted indexes pointing from terms in the
dictionary to documents and term positions inside documents).
It works like this:

Query Components

First, the boolean query is dismantled into its principal components,
i.e. atomic queries where one term is looked up in one index.
For example, the query

@attr 2=102 @and @attr 1=1010 Utah @attr 1=1018 Springer

is a boolean AND between the atomic parts

@attr 2=102 @attr 1=1010 Utah

and

@attr 2=102 @attr 1=1018 Springer

each of which is processed by itself.

Atomic hit lists

Second, for each atomic query, the hit list of documents is
computed.

In this example, two hit lists for each index
@attr 1=1010 and
@attr 1=1018 are computed.

Atomic scores

Third, each document in the hit list is assigned a score (if ranking
is enabled and requested in the query) using a TF/IDF scheme.

In this example, both atomic parts of the query assign the magic
@attr 2=102 relevance attribute, and are
to be used in the relevance ranking functions.
+ + + It is possible to apply dynamic ranking on only parts of the + &acro.pqf; query: + + @and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer + + searches for all documents which have the term 'Utah' on the + body of text, and which have the term 'Springer' in the publisher + field, and sort them in the order of the relevance ranking made on + the body-of-text index only. + + + + + + Hit list merging + + + Fourth, the atomic hit lists are merged according to the boolean + conditions to a final hit list of documents to be returned. + + + This step is always performed, independently of the fact that + dynamic ranking is enabled or not. + + + + + + Document score computation + + + Fifth, the total score of a document is computed as a linear + combination of the atomic scores of the atomic hit lists + + + Ranking weights may be used to pass a value to a ranking + algorithm, using the non-standard &acro.bib1; attribute type 9. + This allows one branch of a query to use one value while + another branch uses a different one. For example, we can search + for utah in the + @attr 1=4 index with weight 30, as + well as in the @attr 1=1010 index with weight 20: + + @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 @attr 1=1010 city + + + + The default weight is + sqrt(1000) ~ 34 , as the &acro.z3950; standard prescribes that the top score + is 1000 and the bottom score is 0, encoded in integers. + + + + The ranking-weight feature is experimental. It may change in future + releases of zebra. + + + + + + + Re-sorting of hit list + + + Finally, the final hit list is re-ordered according to scores. + + + + + - - + + + The rank-1 algorithm + does not use the static rank + information in the list keys, and will produce the same ordering + with or without static ranking enabled. + + + + + + + + Dynamic ranking is not compatible + with estimated hit sizes, as all documents in + a hit set must be accessed to compute the correct placing in a + ranking sorted list. 
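The five steps above can be sketched end-to-end in a few lines; the scoring formula here is a generic TF/IDF-style stand-in, not &zebra;'s exact rank-1 computation, and the postings data is invented:

```python
import math

# Sketch: score one atomic hit list with a generic TF/IDF formula,
# then linearly combine atomic scores (weights play the role of the
# non-standard attribute-type 9 ranking weights described above).
def atomic_scores(postings, n_docs):
    # postings: {docid: term frequency} for one term in one index
    idf = math.log(n_docs / len(postings))
    return {doc: tf * idf for doc, tf in postings.items()}

def combine(hit_list, score_maps, weights):
    # re-sort the merged hit list by the weighted sum of atomic scores
    return sorted(
        hit_list,
        key=lambda d: sum(w * s.get(d, 0.0)
                          for w, s in zip(weights, score_maps)),
        reverse=True,
    )

s1 = atomic_scores({1: 3, 2: 1}, n_docs=10)   # e.g. @attr 1=4 term
s2 = atomic_scores({2: 2, 3: 1}, n_docs=10)   # e.g. @attr 1=1010 term
ranked = combine([1, 2, 3], [s1, s2], weights=[10, 40])
```

Raising the weight of the second atomic query promotes documents that match it, which is exactly the effect the attribute-type 9 mechanism is described as having.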
Therefore the use attribute setting
+ @attr 2=102 clashes with
+ @attr 9=integer.
+ 
+ 
+ 
+ 
- Dynamically ranking CQL queries
+ Dynamically ranking &acro.cql; queries
 
- Dynamic ranking can be enabled during sever side CQL
+ Dynamic ranking can be enabled during server side &acro.cql;
query expansion by adding @attr 2=102
- chunks to the CQL config file. For example
+ chunks to the &acro.cql; config file. For example
 
relationModifier.relevant = 2=102
 
- invokes dynamic ranking each time a CQL query of the form
+ invokes dynamic ranking each time a &acro.cql; query of the form
 
Z> querytype cql
Z> f alvis.text =/relevant house
 
is issued. Dynamic ranking can also be automatically used on
- specific CQL indexes by (for example) setting
+ specific &acro.cql; indexes by (for example) setting
 
index.alvis.text = 1=text 2=102
 
- which then invokes dynamic ranking each time a CQL query of the form
+ which then invokes dynamic ranking each time a &acro.cql; query of the form
 
Z> querytype cql
Z> f alvis.text = house
 
is issued.
 
- 
+ 
 
- 
+ 
 
- Sorting
- 
- Zebra sorts efficiently using special sorting indexes
+ 
+ Sorting
+ 
+ &zebra; sorts efficiently using special sorting indexes
(type=s), so each sortable index must be
known at indexing time, specified in the configuration of record
- indexing. For example, to enable sorting according to the BIB-1
+ indexing. For example, to enable sorting according to the &acro.bib1;
Date/time-added-to-db field, one could add the line
 
- xelm /*/@created Date/time-added-to-db:s
+ xelm /*/@created Date/time-added-to-db:s
 
to any .abs record-indexing configuration file.
Similarly, one could add an indexing element of the form
- 
- 
- 
+ 
+ 
]]>
to any alvis-filter indexing stylesheet.
- 
- 
- Indexing can be specified at searching time using a query term
- carrying the non-standard
- BIB-1 attribute-type 7. This removes the
- need to send a Z39.50 Sort Request
- separately, and can dramatically improve latency when the client
- and server are on separate networks.
- The sorting part of the query is separate from the rest of the
- query - the actual search specification - and must be combined
- with it using OR.
- 
- A sorting subquery needs two attributes: an index (such as a
- BIB-1 type-1 attribute) specifying which index to sort on, and a
- type-7 attribute whose value is be 1 for
- ascending sorting, or 2 for descending. The
- term associated with the sorting attribute is the priority of
- the sort key, where 0 specifies the primary
- sort key, 1 the secondary sort key, and so
- on.
- 
+ 
+ 
+ Indexing can be specified at searching time using a query term
+ carrying the non-standard
+ &acro.bib1; attribute-type 7. This removes the
+ need to send a &acro.z3950; Sort Request
+ separately, and can dramatically improve latency when the client
+ and server are on separate networks.
+ The sorting part of the query is separate from the rest of the
+ query - the actual search specification - and must be combined
+ with it using OR.
+ 
+ 
+ A sorting subquery needs two attributes: an index (such as a
+ &acro.bib1; type-1 attribute) specifying which index to sort on, and a
+ type-7 attribute whose value is 1 for
+ ascending sorting, or 2 for descending. The
+ term associated with the sorting attribute is the priority of
+ the sort key, where 0 specifies the primary
+ sort key, 1 the secondary sort key, and so
+ on.
+ For example, a search for water, sort by title (ascending), - is expressed by the PQF query + is expressed by the &acro.pqf; query - @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 + @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 - whereas a search for water, sort by title ascending, + whereas a search for water, sort by title ascending, then date descending would be - @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1 + @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1 Notice the fundamental differences between dynamic - ranking and sorting: there can be + ranking and sorting: there can be only one ranking function defined and configured; but multiple sorting indexes can be specified dynamically at search time. Ranking does not need to use specific indexes, so dynamic ranking can be enabled and disabled without re-indexing; whereas, sorting indexes need to be defined before indexing. - + + + - + - + + Extended Services: Remote Insert, Update and Delete - - Extended Services: Remote Insert, Update and Delete - - Extended services are only supported when accessing the Zebra - server using the Z39.50 - protocol. The SRU protocol does + Extended services are only supported when accessing the &zebra; + server using the &acro.z3950; + protocol. The &acro.sru; protocol does not support extended services. - - + + The extended services are not enabled by default in zebra - due to the - fact that they modify the system. Zebra can be configured + fact that they modify the system. &zebra; can be configured to allow anybody to search, and to allow only updates for a particular admin user in the main zebra configuration file zebra.cfg. @@ -1456,15 +1479,15 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci perm.admin: rw passwd: passwordfile - And in the password file + And in the password file passwordfile, you have to specify users and - encrypted passwords as colon separated strings. 
- Use a tool like htpasswd
- to maintain the encrypted passwords.
- 
+ encrypted passwords as colon separated strings.
+ Use a tool like htpasswd
+ to maintain the encrypted passwords.
+ 
admin:secret
 
- It is essential to configure Zebra to store records internally,
+ It is essential to configure &zebra; to store records internally,
and to support modifications and deletion of records:
 
@@ -1472,300 +1495,374 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci
storeData: 1
storeKeys: 1
 
The general record type should be set to any record filter which
- is able to parse XML records, you may use any of the two
+ is able to parse &acro.xml; records; you may use either of the two
declarations (but not both simultaneously!)
 
- 
- recordType: grs.xml
- # recordType: alvis.filter_alvis_config.xml
+ 
+ recordType: dom.filter_dom_conf.xml
+ # recordType: grs.xml
+ 
+ Notice the difference from the specific instructions
+ 
+ recordType.xml: dom.filter_dom_conf.xml
+ # recordType.xml: grs.xml
+ 
+ which only work when indexing XML files from the filesystem using
+ the *.xml naming convention.
+ 
To enable transaction-safe shadow indexing,
which is extra important for this kind of operation, set
 
shadow: directoryname: size (e.g. 1000M)
 
+ See for additional information on
+ these configuration options.
+ 
It is not possible to carry information about record types or
- similar to Zebra when using extended services, due to
- limitations of the Z39.50
+ similar to &zebra; when using extended services, due to
+ limitations of the &acro.z3950;
protocol. Therefore, indexing filters cannot be chosen on a
- per-record basis. One and only one general XML indexing filter
- must be defined.
+ per-record basis. One and only one general &acro.xml; indexing filter
+ must be defined.
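Putting these pieces together, a minimal zebra.cfg sketch enabling extended services could look like the following. This is illustrative only; the password file name, the shadow directory name and size, and the filter configuration file are placeholders to adapt to your installation:

```
perm.anonymous: r
perm.admin: rw
passwd: passwordfile
storeData: 1
storeKeys: 1
recordType: dom.filter_dom_conf.xml
shadow: shadowdir:1000M
```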
- Extended services in the Z39.50 protocol + Extended services in the &acro.z3950; protocol - The Z39.50 standard allows + The &acro.z3950; standard allows servers to accept special binary extended services protocol packages, which may be used to insert, update and delete records into servers. These carry control and update - information to the servers, which are encoded in seven package fields: + information to the servers, which are encoded in seven package fields: - Extended services Z39.50 Package Fields - - + Extended services &acro.z3950; Package Fields + + - Parameter - Value - Notes - + Parameter + Value + Notes + - - - type - 'update' - Must be set to trigger extended services - - - action - string + + + type + 'update' + Must be set to trigger extended services + + + action + string - Extended service action type with + Extended service action type with one of four possible values: recordInsert, recordReplace, recordDelete, and specialUpdate - - - record - XML string - An XML formatted string containing the record - - - syntax - 'xml' - Only XML record syntax is supported - - - recordIdOpaque - string - - Optional client-supplied, opaque record + + + record + &acro.xml; string + An &acro.xml; formatted string containing the record + + + syntax + 'xml' + XML/SUTRS/MARC. GRS-1 not supported. + The default filter (record type) as given by recordType in + zebra.cfg is used to parse the record. + + + recordIdOpaque + string + + Optional client-supplied, opaque record identifier used under insert operations. - - - recordIdNumber - positive number - Zebra's internal system number, only for update - actions. + + + recordIdNumber + positive number + &zebra;'s internal system number, + not allowed for recordInsert or + specialUpdate actions which result in fresh + record inserts. 
- - - databaseName - database identifier + + + databaseName + database identifier - The name of the database to which the extended services should be + The name of the database to which the extended services should be applied. - + - -
+ 
+ 
- 
- The action parameter can be any of
- recordInsert (will fail if the record already exists),
- recordReplace (will fail if the record does not exist),
- recordDelete (will fail if the record does not
- exist), and
- specialUpdate (will insert or update the record
- as needed).
- 
+ 
+ The action parameter can be any of
+ recordInsert (will fail if the record already exists),
+ recordReplace (will fail if the record does not exist),
+ recordDelete (will fail if the record does not
+ exist), and
+ specialUpdate (will insert or update the record
+ as needed; record deletion is not possible with this action).
+ 
 
- During a recordInsert action, the
+ During all actions, the
usual rules for internal record ID generation apply, unless an
- optional recordIdNumber Zebra internal ID or a
- recordIdOpaque string identifier is assigned.
+ optional recordIdNumber &zebra; internal ID or a
+ recordIdOpaque string identifier is assigned.
The default ID generation is configured
using the recordId: directive in
- zebra.cfg.
+ zebra.cfg.
+ See .
 
- 
- The actions recordReplace or
- recordDelete need specification of the additional
- recordIdNumber parameter, which must be an
- existing Zebra internal system ID number, or the optional
- recordIdOpaque string parameter.
+ 
+ Setting of the recordIdNumber parameter,
+ which must be an existing &zebra; internal system ID number, is not
+ allowed during any recordInsert or
+ specialUpdate action resulting in fresh record
+ inserts.
 
When retrieving existing
- records indexed with GRS indexing filters, the Zebra internal
+ records indexed with &acro.grs1; indexing filters, the &zebra; internal
ID number is returned in the field
- /*/id:idzebra/localnumber in the namespace
- xmlns:id="http://www.indexdata.dk/zebra/",
- where it can be picked up for later record updates or deletes.
+ /*/id:idzebra/localnumber in the namespace
+ xmlns:id="http://www.indexdata.dk/zebra/",
+ where it can be picked up for later record updates or deletes.
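For illustration, a retrieved record might then carry the internal ID in a structure shaped like this. This is a schematic example only: the surrounding element name and the ID value are invented, and the exact layout depends on the record and the indexing filter used; only the /*/id:idzebra/localnumber path and the namespace come from the text above:

```
<myrecord xmlns:id="http://www.indexdata.dk/zebra/">
   ...
   <id:idzebra>
      <id:localnumber>17</id:localnumber>
   </id:idzebra>
</myrecord>
```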
+ 
- Records indexed with the alvis filter
- have similar means to discover the internal Zebra ID.
+ A new element set for retrieval of internal record
+ data has been added, which can be used to access minimal records
+ containing only the recordIdNumber &zebra;
+ internal ID, or the recordIdOpaque string
+ identifier. This works for any indexing filter used.
+ See .
 
- 
- 
+ 
+ 
The recordIdOpaque string
parameter is a client-supplied, opaque record
- identifier, which may be used under
+ identifier, which may be used under
insert, update and delete operations. The
client software is responsible for assigning these to
records. This identifier will
replace &zebra;'s own automagic identifier generation with a unique
- mapping from recordIdOpaque to the
- Zebra internal recordIdNumber.
+ mapping from recordIdOpaque to the
+ &zebra; internal recordIdNumber.
The opaque recordIdOpaque string
- identifiers
+ identifiers
are not visible in retrieval records, nor are they searchable, so the
value of this parameter is questionable. It serves mostly as a
convenient mapping from
- application domain string identifiers to Zebra internal ID's.
+ application domain string identifiers to &zebra; internal ID's.
+ 
- 
- 
- Extended services from yaz-client
- 
- We can now start a yaz-client admin session and create a database:
- 
- adm-create
- ]]>
- 
- Now the Default database was created,
- we can insert an XML file (esdd0006.grs
- from example/gils/records) and index it:
- 
- update insert id1234 esdd0006.grs
- ]]>
- 
- The 3rd parameter - id1234 here -
- is the recordIdOpaque package field.
- 
- 
- Actually, we should have a way to specify "no opaque record id" for
- yaz-client's update command.. We'll fix that.
- 
- 
- The newly inserted record can be searched as usual:
- 
- f utah
- Sent searchRequest.
- Received SearchResponse.
- Search was a success.
- Number of hits: 1, setno 1
- SearchResult-1: term=utah cnt=1
- records returned: 0
- Elapsed: 0.014179
- ]]>
- 
- 
- 
- Let's delete the beast, using the same
+ 
+ Extended services from yaz-client
+ 
+ 
+ We can now start a yaz-client admin session and create a database:
+ 
+ adm-create
+ ]]>
+ 
+ Now that the Default database has been created,
+ we can insert an &acro.xml; file (esdd0006.grs
+ from example/gils/records) and index it:
+ 
+ update insert id1234 esdd0006.grs
+ ]]>
+ 
+ The 3rd parameter - id1234 here -
+ is the recordIdOpaque package field.
+ 
+ 
+ Actually, we should have a way to specify "no opaque record id" for
+ yaz-client's update command. We'll fix that.
+ 
+ 
+ The newly inserted record can be searched as usual:
+ 
+ f utah
+ Sent searchRequest.
+ Received SearchResponse.
+ Search was a success.
+ Number of hits: 1, setno 1
+ SearchResult-1: term=utah cnt=1
+ records returned: 0
+ Elapsed: 0.014179
+ ]]>
+ 
+ 
+ 
+ Let's delete the beast, using the same
recordIdOpaque string parameter:
- 
- update delete id1234
- No last record (update ignored)
- Z> update delete 1 esdd0006.grs
- Got extended services response
- Status: done
- Elapsed: 0.072441
- Z> f utah
- Sent searchRequest.
- Received SearchResponse.
- Search was a success.
- Number of hits: 0, setno 2
- SearchResult-1: term=utah cnt=0
- records returned: 0
- Elapsed: 0.013610
- ]]>
+ 
+ update delete id1234
+ No last record (update ignored)
+ Z> update delete 1 esdd0006.grs
+ Got extended services response
+ Status: done
+ Elapsed: 0.072441
+ Z> f utah
+ Sent searchRequest.
+ Received SearchResponse.
+ Search was a success.
+ Number of hits: 0, setno 2
+ SearchResult-1: term=utah cnt=0
+ records returned: 0
+ Elapsed: 0.013610
+ ]]>
 
- If shadow register is enabled in your
- zebra.cfg,
- you must run the adm-commit command
- 
- adm-commit
- ]]>
- 
+ If shadow register is enabled in your
+ zebra.cfg,
+ you must run the adm-commit command
+ 
+ adm-commit
+ ]]>
+ 
after each update session in order to write your changes from the shadow
to the live register space.
 
- 
- 
- Extended services from yaz-php
- 
- Extended services are also available from the YAZ PHP client layer. An
- example of an YAZ-PHP extended service transaction is given here:
- 
- A fine specimen of a record';
- 
- $options = array('action' => 'recordInsert',
- 'syntax' => 'xml',
- 'record' => $record,
- 'databaseName' => 'mydatabase'
- );
- 
- yaz_es($yaz, 'update', $options);
- yaz_es($yaz, 'commit', array());
- yaz_wait();
- 
- if ($error = yaz_error($yaz))
- echo "$error";
- ]]>
- 
- 
- 
+ 
+ Extended services from yaz-php
+ 
+ Extended services are also available from the &yaz; &acro.php; client layer. An
+ example of a &yaz;-&acro.php; extended service transaction is given here:
+ 
+ A fine specimen of a record';
+ 
+ $options = array('action' => 'recordInsert',
+ 'syntax' => 'xml',
+ 'record' => $record,
+ 'databaseName' => 'mydatabase'
+ );
+ 
+ yaz_es($yaz, 'update', $options);
+ yaz_es($yaz, 'commit', array());
+ yaz_wait();
+ 
+ if ($error = yaz_error($yaz))
+ echo "$error";
+ ]]>
+ 
+ 
+ 
+ 
+ Extended services debugging guide
+ 
+ When debugging ES over PHP we recommend the following order of tests:
+ 
+ 
+ 
+ 
+ Make sure you have a nice record on your filesystem, which you can
+ index from the filesystem by use of the zebraidx command.
+ Do it exactly as you planned, using one of the GRS-1 filters,
+ or the DOMXML filter.
+ When this works, proceed.
+ 
+ 
+ 
+ 
+ Check that your server setup is OK before you even write a single
+ line of PHP using ES.
+ Take the same record from the file system, and send it as an ES via
+ yaz-client as described in
+ ,
+ and
+ remember the -a option which tells you what
+ goes over the wire! Notice also the section on permissions:
+ try
+ 
+ perm.anonymous: rw
+ 
+ in zebra.cfg to make sure you do not run into
+ permission problems (but never expose such an insecure setup on the
+ internet!). Then, make sure to set the general
+ recordType instruction, pointing correctly
+ to the GRS-1 filters,
+ or the DOMXML filters.
+ 
+ 
+ 
+ 
+ If you insist on using the sysno in the
+ recordIdNumber setting,
+ please make sure you do only updates and deletes.
&zebra;'s internal
+ system number is not allowed for
+ recordInsert or
+ specialUpdate actions
+ which result in fresh record inserts.
+ 
+ 
+ 
+ 
+ If shadow register is enabled in your
+ zebra.cfg, you must remember to run the
+ 
+ Z> adm-commit
+ 
+ command as well.
+ 
+ 
+ 
+ 
+ If this works, then proceed to do the same thing in your PHP script.
+ 
+ 
+ 
+ 
+ 
+ 
+ 
- -
+