X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fadministration.xml;h=829ef7591505428863477952c443e459617edb76;hb=656a766f96dd92939c3604a7bf88f2355d040fc8;hp=cd0572e9f661132d6a3ec5b36588f553bf0c89fd;hpb=7b149568c78c6a73915e282ef10edc95171e66fd;p=idzebra-moved-to-github.git

diff --git a/doc/administration.xml b/doc/administration.xml
index cd0572e..829ef75 100644
--- a/doc/administration.xml
+++ b/doc/administration.xml
@@ -1,5 +1,5 @@
 <chapter id="administration">
- <!-- $Id: administration.xml,v 1.26 2006-03-04 21:07:57 marc Exp $ -->
+ <!-- $Id: administration.xml,v 1.48 2007-01-17 13:31:36 marc Exp $ -->
  <title>Administrating Zebra</title>
  <!-- ### It's a bit daft that this chapter (which describes half of
           the configuration-file formats) is separated from
@@ -94,7 +94,7 @@
   
  </sect1>
  
- <sect1 id="configuration-file">
+ <sect1 id="zebra-cfg">
   <title>The Zebra Configuration File</title>
   
   <para>
@@ -281,20 +281,67 @@
       <para>
        Specifies a path of profile specification files. 
        The path is composed of one or more directories separated by
-       colon. Similar to PATH for UNIX systems.
+       colon. Similar to <literal>PATH</literal> for UNIX systems.
       </para>
      </listitem>
     </varlistentry>
+
+     <varlistentry>
+      <term>modulePath: <replaceable>path</replaceable></term>
+      <listitem>
+       <para>
+	Specifies a path of record filter modules.
+	The path is composed of one or more directories separated by
+	colon. Similar to <literal>PATH</literal> for UNIX systems.
+	The 'make install' procedure typically puts modules in
+	<filename>/usr/local/lib/idzebra-2.0/modules</filename>.
+       </para>
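+       <para>
+	As a sketch, assuming the default install location named above
+	(adjust the directory to match your own installation):
+	<screen>
+	modulePath: /usr/local/lib/idzebra-2.0/modules
+	</screen>
+       </para>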
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term>staticrank: <replaceable>integer</replaceable></term>
+      <listitem>
+       <para>
+	Specifies whether static ranking is enabled (1) or
+	disabled (0). If omitted, static ranking is disabled,
+	corresponding to a value of 0.
+	Refer to <xref linkend="administration-ranking-static"/>.
+       </para>
+      </listitem>
+     </varlistentry>
+
+
+     <varlistentry>
+      <term>estimatehits: <replaceable>integer</replaceable></term>
+      <listitem>
+       <para>
+	Controls whether Zebra should calculate approximate hit counts and
+	at which hit count approximation is to be enabled.
+	A value of 0 disables approximate hit counts.
+	For a positive value, approximate hit counts are used
+	when the hit count is known to be larger than <replaceable>integer</replaceable>.
+       </para>
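+       <para>
+	For illustration only (the threshold 1000 is an arbitrary,
+	assumed value, not a recommended setting), the following line
+	asks Zebra to report approximate hit counts once a result set is
+	known to exceed 1000 hits:
+	<screen>
+	estimatehits: 1000
+	</screen>
+       </para>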
+       <para>
+	Approximate hit counts can also be triggered by a particular
+	attribute in a query.
+	Refer to <xref linkend="querymodel-zebra-global-attr-limit"/>.
+       </para>
+      </listitem>
+     </varlistentry>
+
     <varlistentry>
      <term>attset: <replaceable>filename</replaceable></term>
      <listitem>
       <para>
-       Specifies the filename(s) of attribute set files for use in
-       searching. At least the Bib-1 set should be loaded
-       (<literal>bib1.att</literal>).
-       The <literal>profilePath</literal> setting is used to look for
-       the specified files.
-       See <xref linkend="attset-files"/>
+	Specifies the filename(s) of attribute set files for use in
+	searching. In many configurations <filename>bib1.att</filename>
+	is used, but that is not required. If Classic Explain
+	attributes are to be used for searching,
+	<filename>explain.att</filename> must be given.
+	The path to att-files in general can be given using the
+	<literal>profilePath</literal> setting.
+	See also <xref linkend="attset-files"/>.
       </para>
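+      <para>
+	A minimal sketch of the two cases described above, written here
+	as one directive per file and assuming both files can be found
+	through <literal>profilePath</literal>:
+	<screen>
+	attset: bib1.att
+	attset: explain.att
+	</screen>
+      </para>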
      </listitem>
     </varlistentry>
@@ -356,7 +403,7 @@
        Specifies a file with description of user accounts for Zebra.
        File format is similar to that used by the passwd directive except
        that the password are encrypted. Use Apache's htpasswd or similar
-       for maintenanace.
+       for maintenance.
       </para>
      </listitem>
     </varlistentry>
@@ -370,11 +417,12 @@
        to access Zebra via the passwd system. There are two kinds
        of permissions currently: read (r) and write(w). By default
        users not listed in a permission directive are given the read
-       priviledge. To specify permissions for a user with no
+       privilege. To specify permissions for a user with no
        username, or Z39.50 anonymous style use
 	<literal>anonymous</literal>. The permstring consists of
        a sequence of characters. Include character <literal>w</literal>
-       for write/update access, <literal>r</literal> for read access.
+       for write/update access, <literal>r</literal> for read access and
+       <literal>a</literal> to allow anonymous access through this account.
       </para>
      </listitem>
     </varlistentry>
@@ -402,7 +450,7 @@
   <para>
    The default behavior of the Zebra system is to reference the
    records from their original location, i.e. where they were found when you
-   ran <literal>zebraidx</literal>.
+   run <literal>zebraidx</literal>.
    That is, when a client wishes to retrieve a record
    following a search operation, the files are accessed from the place
    where you originally put them - if you remove the files (without
@@ -669,7 +717,7 @@
   </para>
   
   <para>
-   (see <xref linkend="record-model-grs"/>
+   (see <xref linkend="grs"/>
     for details of how the mapping between elements of your records and
     searchable attributes is established).
   </para>
@@ -757,7 +805,7 @@
  <sect1 id="shadow-registers">
   <title>Safe Updating - Using Shadow Registers</title>
   
-  <sect2>
+  <sect2 id="shadow-registers-description">
    <title>Description</title>
    
    <para>
@@ -811,7 +859,7 @@
    
   </sect2>
   
-  <sect2>
+  <sect2 id="shadow-registers-how-to-use">
    <title>How to Use Shadow Register Files</title>
    
    <para>
@@ -923,7 +971,48 @@
 
 
  <sect1 id="administration-ranking">
-  <title>Static and Dynamic Ranking</title>
+  <title>Relevance Ranking and Sorting of Result Sets</title>
+
+  <sect2 id="administration-overview">
+   <title>Overview</title>
+   <para>
+    The default ordering of a result set is left up to the server,
+    which inside Zebra means sorting in ascending document ID order. 
+    This is not always the order in which humans want to browse hit sets,
+    which are sometimes quite large. Ranking and sorting come to the rescue.
+   </para>
+
+   <para> 
+    In cases where a good presentation ordering can be computed at
+    indexing time, we can use a fixed <literal>static ranking</literal>
+    scheme, which is provided for the <literal>alvis</literal>
+    indexing filter. This defines a fixed ordering of hit lists,
+    independently of the query issued. 
+   </para>
+
+   <para>
+    There are cases, however, where relevance of hit set documents is
+    highly dependent on the query processed.
+    Simply put, <literal>dynamic relevance ranking</literal> 
+    sorts a set of retrieved records such that those most likely to be
+    relevant to your request are retrieved first. 
+    Internally, Zebra retrieves all documents that satisfy your
+    query, and re-orders the hit list to arrange them based on
+    a measurement of similarity between your query and the content of
+    each record. 
+   </para>
+
+   <para>
+    Finally, there are situations where hit sets of documents should be
+    <literal>sorted</literal> during query time according to the
+    lexicographical ordering of certain sort indexes created at
+    indexing time.
+   </para>
+  </sect2>
+
+
+ <sect2 id="administration-ranking-static">
+  <title>Static Ranking</title>
   
    <para>
     Zebra uses internally inverted indexes to look up term occurencies
@@ -948,104 +1037,672 @@
     <screen>
     staticrank: 1 
     </screen> 
-    directive in the main core Zebra config file, the internal document
-    keys used for ordering are augmented by a preceeding integer, which
+    directive in the main core Zebra configuration file, the internal document
+    keys used for ordering are augmented by a preceding integer, which
     contains the static rank of a given document, and the index lists
     are ordered 
     first by ascending static rank,
     then by ascending document <literal>ID</literal>.
-   </para>
-   <para>
-    This implies that the default rank <literal>0</literal> 
-    is the best rank at the
-    beginning of the list, and <literal>max int</literal> 
-    is the worst static rank.
+    Zero
+    is the ``best'' rank, as it occurs at the
+    beginning of the list; higher numbers represent worse scores.
    </para>
    <para>
     The experimental <literal>alvis</literal> filter provides a
     directive to fetch static rank information out of the indexed XML
-    records, thus making <emphasis>all</emphasis> hit sets orderd
+    records, thus making <emphasis>all</emphasis> hit sets ordered
     after <emphasis>ascending</emphasis> static
     rank, and for those doc's which have the same static rank, ordered
     after <emphasis>ascending</emphasis> doc <literal>ID</literal>.
-    See <xref linkend="record-model-alvisxslt"/> for the glory details.
+    See <xref linkend="record-model-alvisxslt"/> for the gory details.
    </para>
+    </sect2>
+
+
+ <sect2 id="administration-ranking-dynamic">
+  <title>Dynamic Ranking</title>
    <para>
-    If one wants to do a little fiddeling with the static rank order,
-    one has to invoke additional re-ranking/re-ordering using dynamic 
-    reranking or score functions. These functions return positive
-    interger scores, where <emphasis>highest</emphasis> score is 
-    <emphasis>best</emphasis>, which means that the
-    hit sets will be sorted according to
-    <emphasis>decending</emphasis> 
+    In order to fiddle with the static rank order, it is necessary to
+    invoke additional re-ranking/re-ordering using dynamic
+    ranking or score functions. These functions return positive
+    integer scores, where <emphasis>highest</emphasis> score is 
+    ``best'';
+    hit sets are sorted according to <emphasis>descending</emphasis> 
     scores (in contrary
     to the index lists which are sorted according to
-    <emphasis>ascending</emphasis> rank  number and document ID).
+    ascending rank number and document ID).
    </para>
-   <!--
    <para>
-    Those are defined in the zebra C source files 
-    <screen>     
-    "rank-1" : zebra/index/rank1.c  
-               default TF/IDF like zebra dynamic ranking
-    "rank-static" : zebra/index/rankstatic.c
-               do-nothing dummy static ranking (this is just to prove
-               that the static rank can be used in dynamic ranking functions)  
-     "zvrank" : zebra/index/zvrank.c
-               many different dynamic TF/IDF ranking functions 
-    </screen> 
+    Dynamic ranking is enabled by a directive like one of the
+    following in the Zebra configuration file (use only one of these at a time!):
+    <screen> 
+    rank: rank-1        # default, TF/IDF-like
+    rank: rank-static   # dummy do-nothing
+    </screen>
    </para>
-   -->
+ 
    <para>
-    Those are in the zebra config file enabled by a directive like (use
-    only one of these a time!):
-    <screen> 
-    rank: rank-1        # default
-    rank: rank-static   # dummy 
-    rank: zvrank        # TDF-IDF like
+    Dynamic ranking is done at query time rather than
+    indexing time (this is why we
+    call it ``dynamic ranking'' in the first place).
+    It is invoked by adding
+    the Bib-1 relation attribute with
+    value ``relevance'' to the PQF query (that is,
+    <literal>@attr&nbsp;2=102</literal>, see also  
+    <ulink url="&url.z39.50;bib1.html">
+     The BIB-1 Attribute Set Semantics</ulink>, also in 
+      <ulink url="&url.z39.50.attset.bib1;">HTML</ulink>). 
+    To find all articles with the word <literal>Eoraptor</literal> in
+    the title, and present them relevance ranked, issue the PQF query:
+    <screen>
+     @attr 2=102 @attr 1=4 Eoraptor
     </screen>
-    Notice that the <literal>rank-1</literal> and
-    <literal>zvrank</literal> do not use the static rank 
-    information in the list keys, and will produce the same ordering
-    with our without static ranking enabled.
    </para>
+
+    <sect3 id="administration-ranking-dynamic-rank1">
+     <title>Dynamically ranking using PQF queries with the 'rank-1' 
+      algorithm</title>
+
    <para>
+     The default <literal>rank-1</literal> ranking module implements a 
+     TF/IDF (Term Frequency over Inverse Document Frequency) like
+     algorithm. In contrast to the usual definition of TF/IDF
+     algorithms, which only considers searching in one full-text
+     index, this one works on multiple indexes at the same time.
+     More precisely, 
+     Zebra does boolean queries and searches in specific addressed
+     indexes (there are inverted indexes pointing from terms in the
+     dictionary to documents and term positions inside documents). 
+     It works like this:
+     <variablelist>
+      <varlistentry>
+       <term>Query Components</term>
+       <listitem>
+        <para>
+         First, the boolean query is dismantled into its principal components,
+         i.e. atomic queries where one term is looked up in one index.
+         For example, the query
+         <screen>
+        @attr 2=102 @and @attr 1=1010 Utah @attr 1=1018 Springer
+         </screen>
+         is a boolean AND between the atomic parts
+         <screen>
+       @attr 2=102 @attr 1=1010 Utah
+         </screen>
+          and
+         <screen>
+       @attr 2=102 @attr 1=1018 Springer
+         </screen>
+         each of which is processed separately.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Atomic hit lists</term>
+       <listitem>
+        <para>
+         Second, for each atomic query, the hit list of documents is
+         computed.
+        </para>
+        <para>
+         In this example, two hit lists are computed, one for each of the indexes
+         <literal>@attr 1=1010</literal> and
+         <literal>@attr 1=1018</literal>.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Atomic scores</term>
+       <listitem>
+        <para>
+         Third, each document in the hit list is assigned a score (<emphasis>if</emphasis> ranking
+         is enabled and requested in the query) using a TF/IDF scheme.
+        </para>
+        <para>
+         In this example, both atomic parts of the query carry the
+         <literal>@attr 2=102</literal> relevance attribute, and so both are
+         used in the relevance ranking functions. 
+        </para>
+        <para>
+         It is possible to apply dynamic ranking on only parts of the
+         PQF query: 
+         <screen>
+          @and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer
+         </screen>
+         searches for all documents which have the term 'Utah' in the
+         body of text, and which have the term 'Springer' in the publisher
+         field, and sorts them in the order of the relevance ranking made on
+         the body-of-text index only. 
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Hit list merging</term>
+       <listitem>
+        <para>
+         Fourth, the atomic hit lists are merged according to the boolean
+         conditions to a final hit list of documents to be returned.
+        </para>
+        <para>
+        This step is always performed, regardless of whether
+        dynamic ranking is enabled.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Document score computation</term>
+       <listitem>
+        <para>
+         Fifth, the total score of a document is computed as a linear
+         combination of the atomic scores of the atomic hit lists.
+        </para>
+        <para>
+         Ranking weights may be used to pass a value to a ranking
+         algorithm, using the non-standard BIB-1 attribute type 9.
+         This allows one branch of a query to use one value while
+         another branch uses a different one.  For example, we can search
+         for <literal>utah</literal> in the 
+         <literal>@attr 1=4</literal> index with weight 30, and for
+         <literal>city</literal> in the <literal>@attr 1=1010</literal> index with weight 20:
+         <screen>
+         @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 @attr 1=1010 city
+         </screen>
+        </para>
+        <para>
+         The default weight is 34, a rough approximation of
+         sqrt(1000) (about 31.6), as the Z39.50 standard prescribes that the top score
+         is 1000 and the bottom score is 0, encoded in integers.
+        </para>
+        <warning>
+         <para>
+          The ranking-weight feature is experimental. It may change in future
+          releases of Zebra.
+         </para>
+        </warning>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Re-sorting of hit list</term>
+       <listitem>
+        <para>
+         Finally, the final hit list is re-ordered according to scores.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+ 
+
+<!--
+Still need to describe the exact TF/IDF formula. Here's the info, need -->
+<!--to extract it in human readable form .. MC
+
+static int calc (void *set_handle, zint sysno, zint staticrank,
+                 int *stop_flag)
+{
+    int i, lo, divisor, score = 0;
+    struct rank_set_info *si = (struct rank_set_info *) set_handle;
+
+    if (!si->no_rank_entries)
+        return -1;   /* ranking not enabled for any terms */
+
+    for (i = 0; i < si->no_entries; i++)
+    {
+        yaz_log(log_level, "calc: i=%d rank_flag=%d lo=%d",
+                i, si->entries[i].rank_flag, si->entries[i].local_occur);
+        if (si->entries[i].rank_flag && (lo = si->entries[i].local_occur))
+            score += (8+log2_int (lo)) * si->entries[i].global_inv *
+                si->entries[i].rank_weight;
+    }
+    divisor = si->no_rank_entries * (8+log2_int (si->last_pos/si->no_entries));
+    score = score / divisor;
+    yaz_log(log_level, "calc sysno=" ZINT_FORMAT " score=%d", sysno, score);
+    if (score > 1000)
+        score = 1000;
+    /* reset the counts for the next term */
+    for (i = 0; i < si->no_entries; i++)
+        si->entries[i].local_occur = 0;
+    return score;
+}
+
+
+where lo = si->entries[i].local_occur is the local documents term-within-index frequency, si->entries[i].global_inv represents the IDF part (computed in static void *begin()), and
+si->entries[i].rank_weight is the weight assigner per index (default 34, or set in the @attr 9=xyz magic)
+
+Finally, the IDF part is computed as:
+
+static void *begin (struct zebra_register *reg,
+                    void *class_handle, RSET rset, NMEM nmem,
+                    TERMID *terms, int numterms)
+{
+    struct rank_set_info *si =
+        (struct rank_set_info *) nmem_malloc (nmem,sizeof(*si));
+    int i;
+
+    yaz_log(log_level, "rank-1 begin");
+    si->no_entries = numterms;
+    si->no_rank_entries = 0;
+    si->nmem=nmem;
+    si->entries = (struct rank_term_info *)
+        nmem_malloc (si->nmem, sizeof(*si->entries)*numterms);
+    for (i = 0; i < numterms; i++)
+    {
+        zint g = rset_count(terms[i]->rset);
+        yaz_log(log_level, "i=%d flags=%s '%s'", i,
+                terms[i]->flags, terms[i]->name );
+        if  (!strncmp (terms[i]->flags, "rank,", 5))
+        {
+            const char *cp = strstr(terms[i]->flags+4, ",w=");
+            si->entries[i].rank_flag = 1;
+            if (cp)
+                si->entries[i].rank_weight = atoi (cp+3);
+            else
+              si->entries[i].rank_weight = 34; /* sqrroot of 1000 */
+            yaz_log(log_level, " i=%d weight=%d g="ZINT_FORMAT, i,
+                     si->entries[i].rank_weight, g);
+            (si->no_rank_entries)++;
+        }
+        else
+            si->entries[i].rank_flag = 0;
+        si->entries[i].local_occur = 0;  /* FIXME */
+        si->entries[i].global_occur = g;
+        si->entries[i].global_inv = 32 - log2_int (g);
+        yaz_log(log_level, " global_inv = %d g = " ZINT_FORMAT,
+                (int) (32-log2_int (g)), g);
+        si->entries[i].term = terms[i];
+        si->entries[i].term_index=i;
+        terms[i]->rankpriv = &(si->entries[i]);
+    }
+    return si;
+}
+
+
+where g = rset_count(terms[i]->rset) is the count of all documents in this specific index hit list, and the IDF part then is
+
+ si->entries[i].global_inv = 32 - log2_int (g);
+   -->
+
+   </para>
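+    <para>
+     Pending a fuller description, the score computation can be
+     sketched as follows. This is only a reading of the
+     <literal>rank-1</literal> source at the time of writing, not a
+     specification; the symbol names are shorthand introduced here,
+     and the code works with integer values throughout:
+     <screen>
+     sum     = sum, over ranked terms i occurring in the document, of
+               (8 + log2(lo(i))) * (32 - log2(g(i))) * w(i)
+     divisor = R * (8 + log2(last_pos / T))
+     score   = min(1000, sum / divisor)
+     </screen>
+     Here <literal>lo(i)</literal> is the number of occurrences of
+     term i in the document within the addressed index,
+     <literal>g(i)</literal> is the size of the hit list for that term,
+     <literal>w(i)</literal> is its ranking weight (34 unless set with
+     <literal>@attr 9</literal>), <literal>R</literal> is the number
+     of terms for which ranking was requested, and
+     <literal>T</literal> is the total number of query terms.
+     <literal>last_pos</literal> is a counter kept by the ranking
+     module; its exact meaning is not spelled out here. If ranking
+     was requested for no term at all, no score is computed.
+    </para>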
+
+
+    <para>
+    The <literal>rank-1</literal> algorithm
+    does not use the static rank 
+    information in the list keys, and will produce the same ordering
+    with or without static ranking enabled.
+    </para>
+ 
+
+    <!--
+    <sect3 id="administration-ranking-dynamic-rank1">
+     <title>Dynamically ranking PQF queries with the 'rank-static' 
+      algorithm</title>
+    <para>
     The dummy <literal>rank-static</literal> reranking/scoring
     function returns just 
     <literal>score = max int - staticrank</literal>
-    in order to preserve the ordering of hit sets with and without it's
-    call.
-     Obviously, to combine static and dynamic ranking usefully, one wants
+    in order to preserve the static ordering of hit sets that would
+    have been produced had it not been invoked.
+    Obviously, to combine static and dynamic ranking usefully,
+    it is necessary
     to make a new ranking 
-    function, which is left
+    function; this is left
     as an exercise for the reader. 
    </para>
-   
+    </sect3>
+    -->
+ 
+   <warning>
+     <para>
+      <literal>Dynamic ranking</literal> is not compatible
+      with <literal>estimated hit sizes</literal>, as all documents in
+      a hit set must be accessed to compute the correct placing in a
+      rank-sorted list. Therefore the relation attribute setting
+      <literal>@attr&nbsp;2=102</literal> clashes with 
+      <literal>@attr&nbsp;9=integer</literal>. 
+     </para>
+   </warning>  
+
+   <!--
+    we might want to add ranking like this:
+    UNPUBLISHED:
+    Simple BM25 Extension to Multiple Weighted Fields
+    Stephen Robertson, Hugo Zaragoza and Michael Taylor
+    Microsoft Research
+    ser@microsoft.com
+    hugoz@microsoft.com
+    mitaylor2microsoft.com
+   -->
+
+    </sect3>
+
+    <sect3 id="administration-ranking-dynamic-cql">
+     <title>Dynamically ranking CQL queries</title>
+     <para>
+      Dynamic ranking can be enabled during server-side CQL
+      query expansion by adding <literal>@attr&nbsp;2=102</literal>
+      chunks to the CQL config file. For example
+      <screen>
+       relationModifier.relevant		= 2=102
+      </screen>
+      invokes dynamic ranking each time a CQL query of the form 
+      <screen>
+       Z> querytype cql
+       Z> f alvis.text =/relevant house
+      </screen>
+      is issued. Dynamic ranking can also be automatically used on
+      specific CQL indexes by (for example) setting
+      <screen>
+       index.alvis.text                        = 1=text 2=102
+      </screen>
+      which then invokes dynamic ranking each time a CQL query of the form 
+      <screen>
+       Z> querytype cql
+       Z> f alvis.text = house
+      </screen>
+      is issued.
+     </para>
+     
+    </sect3>
+
+    </sect2>
+
+
+ <sect2 id="administration-ranking-sorting">
+  <title>Sorting</title>
+   <para>
+     Zebra sorts efficiently using special sorting indexes
+     (type=<literal>s</literal>), so each sortable index must be known
+     at indexing time, specified in the configuration of record
+     indexing.  For example, to enable sorting according to the BIB-1
+     <literal>Date/time-added-to-db</literal> field, one could add the line
+     <screen>
+        xelm /*/@created               Date/time-added-to-db:s
+     </screen>
+     to any <literal>.abs</literal> record-indexing configuration file.
+     Similarly, one could add an indexing element of the form
+     <screen><![CDATA[       
+      <z:index name="date-modified" type="s">
+       <xsl:value-of select="some/xpath"/>
+      </z:index>
+      ]]></screen>
+     to any <literal>alvis</literal>-filter indexing stylesheet.
+     </para>
+     <para>
+      Sorting can be specified at search time using a query term
+      carrying the non-standard
+      BIB-1 attribute-type <literal>7</literal>.  This removes the
+      need to send a Z39.50 <literal>Sort Request</literal>
+      separately, and can dramatically improve latency when the client
+      and server are on separate networks.
+      The sorting part of the query is separate from the rest of the
+      query - the actual search specification - and must be combined
+      with it using OR.
+     </para>
+     <para>
+      A sorting subquery needs two attributes: an index (such as a
+      BIB-1 type-1 attribute) specifying which index to sort on, and a
+      type-7 attribute whose value is <literal>1</literal> for
+      ascending sorting, or <literal>2</literal> for descending.  The
+      term associated with the sorting attribute is the priority of
+      the sort key, where <literal>0</literal> specifies the primary
+      sort key, <literal>1</literal> the secondary sort key, and so
+      on.
+     </para>
+    <para>For example, a search for water, sort by title (ascending),
+    is expressed by the PQF query
+     <screen>
+     @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
+     </screen>
+      whereas a search for water, sort by title ascending, 
+     then date descending would be
+     <screen>
+     @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
+     </screen>
+    </para>
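+    <para>
+     Following the same pattern, and assuming as in the example above
+     that <literal>@attr 1=30</literal> addresses a sortable date
+     index, a search for water sorted by date (descending) alone
+     could be sketched as
+     <screen>
+     @or @attr 1=1016 water @attr 7=2 @attr 1=30 0
+     </screen>
+     where the term <literal>0</literal> marks the date as the
+     primary sort key.
+    </para>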
+    <para>
+     Notice the fundamental differences between <literal>dynamic
+     ranking</literal> and <literal>sorting</literal>: there can be
+     only one ranking function defined and configured; but multiple
+     sorting indexes can be specified dynamically at search
+     time. Ranking does not need to use specific indexes, so
+     dynamic ranking can be enabled and disabled without
+     re-indexing; whereas, sorting indexes need to be
+     defined before indexing.
+     </para>
+
+ </sect2>
+
+
  </sect1>
 
  <sect1 id="administration-extended-services">
   <title>Extended Services: Remote Insert, Update and Delete</title>
   
+   <note>
+    <para>
+     Extended services are only supported when accessing the Zebra
+     server using the <ulink url="&url.z39.50;">Z39.50</ulink>
+     protocol. The <ulink url="&url.sru;">SRU</ulink> protocol does
+     not support extended services.
+    </para>
+   </note>
+   
   <para>
     The extended services are not enabled by default in zebra - due to the
-    fact that they modify the system.
-    In order to allow anybody to update, use
-    <screen>
-    perm.anonymous: rw
-    </screen>
+    fact that they modify the system. Zebra can be configured
+    to allow anybody to
+    search, and to allow updates only for a particular admin user
     in the main zebra configuration file <filename>zebra.cfg</filename>.
-    Or, even better, allow only updates for a particular admin user. For
-    user <literal>admin</literal>, you could use:
+    For user <literal>admin</literal>, you could use:
     <screen>
+     perm.anonymous: r
      perm.admin: rw
      passwd: passwordfile
     </screen>
-    And in <filename>passwordfile</filename>, specify users and
-    passwords as colon seperated strings:
+    In the password file
+    <filename>passwordfile</filename>, specify users and
+    encrypted passwords as colon-separated strings.
+    Use a tool like <filename>htpasswd</filename> 
+    to maintain the encrypted passwords. 
     <screen> 
      admin:secret
-    </screen> 
+    </screen>
+    It is essential to configure Zebra to store records internally, 
+    and to support
+    modifications and deletion of records:
+    <screen>
+     storeData: 1
+     storeKeys: 1
+    </screen>
+    The general record type should be set to any record filter which
+    is able to parse XML records. You may use either of the two
+    declarations below (but not both simultaneously!):
+    <screen>    
+     recordType: grs.xml
+     # recordType: alvis.filter_alvis_config.xml
+    </screen>
+    To enable transaction safe shadow indexing,
+    which is extra important for this kind of operation, set
+    <screen>
+     shadow: directoryname: size (e.g. 1000M)
+    </screen>
+     See <xref linkend="zebra-cfg"/> for additional information on
+     these configuration options.
    </para>
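+   <para>
+    Put together, the relevant part of <filename>zebra.cfg</filename>
+    might look like the sketch below. The shadow directory
+    <filename>/var/lib/zebra/shadow</filename> and its size are
+    assumed placeholder values; substitute your own:
+    <screen>
+     perm.anonymous: r
+     perm.admin: rw
+     passwd: passwordfile
+     storeData: 1
+     storeKeys: 1
+     recordType: grs.xml
+     shadow: /var/lib/zebra/shadow:1000M
+    </screen>
+   </para>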
+   <note>
+    <para>
+     It is not possible to carry information about record types or
+     similar to Zebra when using extended services, due to
+     limitations of the <ulink url="&url.z39.50;">Z39.50</ulink>
+     protocol. Therefore, indexing filters can not be chosen on a
+     per-record basis. One and only one general XML indexing filter
+     must be defined.  
+     <!-- but because it is represented as an OID, we would need some
+     form of proprietary mapping scheme between record type strings and
+     OIDs. -->
+     <!--
+     However, as a minimum, it would be extremely useful to enable
+     people to use MARC21, assuming grs.marcxml.marc21 as a record
+     type.  
+     -->
+    </para>
+   </note>
+
+
+   <sect2 id="administration-extended-services-z3950">
+    <title>Extended services in the Z39.50 protocol</title>
+
+    <para>
+     The <ulink url="&url.z39.50;">Z39.50</ulink> standard allows
+     servers to accept special binary <emphasis>extended services</emphasis>
+     protocol packages, which may be used to insert, update and delete
+     records on servers. These carry control and update
+     information to the servers, which is encoded in seven package fields: 
+    </para>
+
+    <table id="administration-extended-services-z3950-table" frame="top">
+     <title>Extended services Z39.50 Package Fields</title>
+      <tgroup cols="3">
+       <thead>
+       <row>
+         <entry>Parameter</entry>
+         <entry>Value</entry>
+         <entry>Notes</entry>
+        </row>
+      </thead>
+       <tbody>
+        <row>
+         <entry><literal>type</literal></entry>
+         <entry><literal>'update'</literal></entry>
+         <entry>Must be set to trigger extended services</entry>
+        </row>
+        <row>
+         <entry><literal>action</literal></entry>
+         <entry><literal>string</literal></entry>
+        <entry>
+         Extended service action type with 
+         one of four possible values: <literal>recordInsert</literal>,
+         <literal>recordReplace</literal>,
+         <literal>recordDelete</literal>,
+         and <literal>specialUpdate</literal>
+        </entry>
+        </row>
+        <row>
+         <entry><literal>record</literal></entry>
+         <entry><literal>XML string</literal></entry>
+         <entry>An XML formatted string containing the record</entry>
+        </row>
+        <row>
+         <entry><literal>syntax</literal></entry>
+         <entry><literal>'xml'</literal></entry>
+         <entry>Only XML record syntax is supported</entry>
+        </row>
+        <row>
+         <entry><literal>recordIdOpaque</literal></entry>
+         <entry><literal>string</literal></entry>
+         <entry>
+         Optional client-supplied, opaque record
+         identifier used during insert operations.
+        </entry>
+        </row>
+        <row>
+         <entry><literal>recordIdNumber</literal></entry>
+         <entry><literal>positive number</literal></entry>
+         <entry>Zebra's internal system number,
+         not allowed for  <literal>recordInsert</literal> or 
+         <literal>specialUpdate</literal> actions which result in fresh
+         record inserts.
+        </entry>
+        </row>
+        <row>
+         <entry><literal>databaseName</literal></entry>
+         <entry><literal>database identifier</literal></entry>
+        <entry>
+         The name of the database to which the extended services should be 
+         applied.
+        </entry>
+        </row>
+      </tbody>
+      </tgroup>
+     </table>
+
+
+   <para>
+    The <literal>action</literal> parameter can be any of 
+    <literal>recordInsert</literal> (will fail if the record already exists),
+    <literal>recordReplace</literal> (will fail if the record does not exist),
+    <literal>recordDelete</literal> (will fail if the record does not
+       exist), and
+    <literal>specialUpdate</literal> (will insert or update the record
+       as needed; record deletion is not possible).
+   </para>
+
+    <para>
+     During all actions, the
+     usual rules for internal record ID generation apply, unless an
+     optional <literal>recordIdNumber</literal> Zebra internal ID or a
+    <literal>recordIdOpaque</literal> string identifier is assigned. 
+     The default ID generation is
+     configured using the <literal>recordId:</literal> setting in
+     <filename>zebra.cfg</filename>.
+     See <xref linkend="zebra-cfg"/>.
+    </para>
+
+   <para>
+    Setting the <literal>recordIdNumber</literal> parameter, 
+    which must be an existing Zebra internal system ID number, is not
+    allowed during any <literal>recordInsert</literal> or 
+    <literal>specialUpdate</literal> action resulting in fresh record
+    inserts.
+    </para>
+
+    <para>
+     When retrieving existing
+     records indexed with GRS indexing filters, the Zebra internal 
+     ID number is returned in the field
+    <literal>/*/id:idzebra/localnumber</literal> in the namespace
+    <literal>xmlns:id="http://www.indexdata.dk/zebra/"</literal>,
+    where it can be picked up for later record updates or deletes. 
+    </para>
+ 
+    <para>
+     A new element set for retrieval of internal record
+     data has been added, which can be used to access minimal records
+     containing only the <literal>recordIdNumber</literal> Zebra
+     internal ID, or the <literal>recordIdOpaque</literal> string
+     identifier. This works for any indexing filter used.
+     See <xref linkend="special-retrieval"/>.
+    </para>
+
+   <para>
+     The <literal>recordIdOpaque</literal> string parameter
+     is a client-supplied, opaque record
+     identifier, which may be used in
+     insert, update and delete operations. The
+     client software is responsible for assigning these to
+     records. This identifier
+     replaces Zebra's own automatic identifier generation with a unique
+     mapping from <literal>recordIdOpaque</literal> to the 
+     Zebra internal <literal>recordIdNumber</literal>.
+     <emphasis>The opaque <literal>recordIdOpaque</literal> string
+     identifiers
+      are not visible in retrieval records, nor are they
+      searchable, so the value of this parameter is
+      questionable. It serves mostly as a convenient mapping from
+      application domain string identifiers to Zebra internal IDs.
+     </emphasis> 
+    </para>
+   </sect2>
+
+   
+ <sect2 id="administration-extended-services-yaz-client">
+  <title>Extended services from yaz-client</title>
+
    <para>
     We can now start a yaz-client admin session and create a database:
    <screen>
@@ -1059,14 +1716,11 @@
     from example/gils/records) and index it:
    <screen>  
     <![CDATA[
-     Z> update insert 1 esdd0006.grs
+     Z> update insert id1234 esdd0006.grs
      ]]>
    </screen>
-    The 3rd parameter - <literal>1</literal> here -
-      is the opaque record ID from <literal>Ext update</literal>.
-      It a record ID that <emphasis>we</emphasis> assign to the record
-    in question. If we do not 
-    assign one, the usual rules for match apply (recordId: from zebra.cfg).
+    The 3rd parameter - <literal>id1234</literal> here -
+      is the <literal>recordIdOpaque</literal> package field.
    </para>
    <para>
     Actually, we should have a way to specify "no opaque record id" for
@@ -1088,10 +1742,11 @@
     </screen>
    </para>
    <para>
-    Let's delete the beast:
+     Let's delete the beast, using the same 
+     <literal>recordIdOpaque</literal> string parameter:
     <screen>
     <![CDATA[
-     Z> update delete 1
+     Z> update delete id1234
      No last record (update ignored)
      Z> update delete 1 esdd0006.grs
      Got extended services response
@@ -1120,8 +1775,14 @@
      after each update session in order write your changes from the
      shadow to the life register space.
    </para>
+ </sect2>
+
+  
+ <sect2 id="administration-extended-services-yaz-php">
+  <title>Extended services from yaz-php</title>
+
    <para>
-    Extended services are also available from the YAZ client layer. An
+    Extended services are also available from the YAZ PHP client layer. An
     example of an YAZ-PHP extended service transaction is given here:
     <screen>
     <![CDATA[
@@ -1141,94 +1802,10 @@
        echo "$error";
      ]]>
     </screen>  
-   </para>
- </sect1>
-
-
-  <sect1 id="gfs-config">
-   <title>YAZ Frontend Virtual Hosts</title>
-    <para>
-     <command>zebrasrv</command> uses the YAZ server frontend and does
-     support multiple virtual servers behind multiple listening sockets.
     </para>
-    &zebrasrv-virtual;
- 
-   <para>
-    Section "Virtual Hosts" in the YAZ manual.
-    <filename>http://www.indexdata.dk/yaz/doc/server.vhosts.tkl</filename>
-   </para>
- </sect1>
-
-
-  <sect1 id="administration-cql-to-pqf">
-   <title>Server Side CQL to PQF Query Translation</title>
-   <para>
-    Using the
-    <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
-      YAZ Frontend Virtual
-    Hosts option, one can configure
-    the YAZ Frontend CQL-to-PQF
-    converter, specifying the interpretation of various 
-    <ulink url="http://www.loc.gov/standards/sru/cql/">CQL</ulink>
-    indexes, relations, etc. in terms of Type-1 query attributes.
-    <!-- The  yaz-client config file -->  
-   </para>
-   <para>
-    For example, using server-side CQL-to-PQF conversion, one might
-    query a zebra server like this:
-    <screen>
-    <![CDATA[
-     yaz-client localhost:9999
-     Z> querytype cql
-     Z> find text=(plant and soil)
-     ]]>
-    </screen>
-     and - if properly configured - even static relevance ranking can
-     be performed using CQL query syntax:
-    <screen>
-    <![CDATA[
-     Z> find text = /relevant (plant and soil)
-     ]]>
-     </screen>
-   </para>
-
-   <para>
-    By the way, the same configuration can be used to 
-    search using client-side CQL-to-PQF conversion:
-    (the only difference is <literal>querytype cql2rpn</literal> 
-    instead of 
-    <literal>querytype cql</literal>, and the call specifying a local
-    conversion file)
-    <screen>
-    <![CDATA[
-     yaz-client -q local/cql2pqf.txt localhost:9999
-     Z> querytype cql2rpn
-     Z> find text=(plant and soil)
-     ]]>
-     </screen>
-   </para>
-
-   <para>
-    Exhaustive information can be found in the
-    Section "Specification of CQL to RPN mappings" in the YAZ manual.
-    <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
-     http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
-   and shall therefore not be repeated here.
-   </para> 
-  <!-- 
-  <para>
-    See 
-      <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
-      http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
-    for the Maintenance Agency's work-in-progress mapping of Dublin Core
-    indexes to Attribute Architecture (util, XD and BIB-2)
-    attributes.
-   </para>
-   -->
+    </sect2>
  </sect1>
 
-
- 
 </chapter>
 
  <!-- Keep this comment at the end of the file