Rolling mods to Marc's new ranking prose. (Check in early, check in

author Mike Taylor <mike@indexdata.com>

Tue, 2 May 2006 12:23:02 +0000 (12:23 +0000)

committer Mike Taylor <mike@indexdata.com>

Tue, 2 May 2006 12:23:02 +0000 (12:23 +0000)
diff --git a/doc/administration.xml b/doc/administration.xml

index 773edfd..1dd6a22 100644
--- a/doc/administration.xml
+++ b/doc/administration.xml
@@ -1,5 +1,5 @@
  <chapter id="administration">
- <!-- $Id: administration.xml,v 1.31 2006-05-01 13:07:40 marc Exp $ -->
+ <!-- $Id: administration.xml,v 1.32 2006-05-02 12:23:02 mike Exp $ -->
   <title>Administrating Zebra</title>
   <!-- ### It's a bit daft that this chapter (which describes half of
            the configuration-file formats) is separated from
@@ -925,6 +925,8 @@
   <sect1 id="administration-ranking">
    <title>Relevance Ranking and Sorting of Result Sets</title>
  
+  <sect2>
+   <title>Overview</title>
     <para>
      The default ordering of a result set is left up to the server,
      which inside Zebra means sorting in ascending document ID order. 
@@ -933,7 +935,7 @@
     </para>
  
     <para> 
-    In case a good presentation ordering can be computed at
+    In cases where a good presentation ordering can be computed at
      indexing time, we can use a fixed <literal>static ranking</literal>
      scheme, which is provided for the <literal>alvis</literal>
      indexing filter. This defines a fixed ordering of hit lists,
@@ -944,12 +946,12 @@
      There are cases, however, where relevance of hit set documents is
      highly dependent on the query processed.
      Simply put, <literal>dynamic relevance ranking</literal> 
-    sortes a set of retrieved 
+    sorts a set of retrieved 
      records such
      that those most likely to be relevant to your request are
      retrieved first. 
-    Internally, Zebra  retrieves all documents ID's that satisfy your
-    search query, and re-orders the hit list to arrange them based on
+    Internally, Zebra retrieves all documents that satisfy your
+    query, and re-orders the hit list to arrange them based on
      a measurement of similarity between your query and the content of
      each record. 
     </para>
@@ -960,7 +962,7 @@
      lexicographical ordering of certain sort indexes created at
      indexing time.
     </para>
-
+  </sect2>
  
  
   <sect2 id="administration-ranking-static">
@@ -995,12 +997,9 @@
      are ordered 
      first by ascending static rank,
      then by ascending document <literal>ID</literal>.
-   </para>
-   <para>
-    This implies that the default rank <literal>0</literal> 
-    is the best rank at the
-    beginning of the list, and <literal>max int</literal> 
-    is the worst static rank.
+    Zero
+    is the ``best'' rank, as it occurs at the
+    beginning of the list; higher numbers represent worse scores.
     </para>
     <para>
      The experimental <literal>alvis</literal> filter provides a
@@ -1009,7 +1008,7 @@
      after <emphasis>ascending</emphasis> static
      rank, and for those doc's which have the same static rank, ordered
      after <emphasis>ascending</emphasis> doc <literal>ID</literal>.
-    See <xref linkend="record-model-alvisxslt"/> for the glory details.
+    See <xref linkend="record-model-alvisxslt"/> for the gory details.
     </para>
      </sect2>
  
@@ -1017,20 +1016,20 @@
   <sect2 id="administration-ranking-dynamic">
    <title>Dynamic Ranking</title>
     <para>
-    If one wants to do a little fiddeling with the static rank order,
-    one has to invoke additional re-ranking/re-ordering using dynamic 
-    reranking or score functions. These functions return positive
-    interger scores, where <emphasis>highest</emphasis> score is 
-    <emphasis>best</emphasis>, which means that the
-    hit sets will be sorted according to
+    In order to fiddle with the static rank order, it is necessary to
+    invoke additional re-ranking/re-ordering using dynamic
+    ranking or score functions. These functions return positive
+    integer scores, where <emphasis>highest</emphasis> score is 
+    ``best'';
+    hit sets are sorted according to
      <emphasis>decending</emphasis> 
      scores (in contrary
      to the index lists which are sorted according to
-    <emphasis>ascending</emphasis> rank  number and document ID).
+    ascending rank number and document ID).
     </para>
     <para>
-    Those are in the zebra config file enabled by a directive like (use
-    only one of these a time!):
+    Dynamic ranking is enabled by a directive like one of the
+    following in the zebra config file (use only one of these a time!):
      <screen> 
      rank: rank-1        # default TDF-IDF like
      rank: rank-static   # dummy do-nothing
@@ -1039,33 +1038,36 @@
      Notice that the <literal>rank-1</literal> and
      <literal>zvrank</literal> do not use the static rank 
      information in the list keys, and will produce the same ordering
-    with our without static ranking enabled.
+    with or without static ranking enabled.
     </para>
     <para>
      The dummy <literal>rank-static</literal> reranking/scoring
      function returns just 
      <literal>score = max int - staticrank</literal>
-    in order to preserve the ordering of hit sets with and without it's
-    call.
-     Obviously, to combine static and dynamic ranking usefully, one wants
+    in order to preserve the static ordering of hit sets that would
+    have been produced had it not been invoked.
+    Obviously, to combine static and dynamic ranking usefully,
+    it is necessary
      to make a new ranking 
-    function, which is left
+    function; this is left
      as an exercise for the reader. 
     </para>
  
  
     <para>
-    Invoking dynamic ranking is done in query time (this is why we
-    call it 'dynamic ranking' in the first place ..). One has to add
+    Dynamic ranking is done at query time rather than
+    indexing time (this is why we
+    call it ``dynamic ranking'' in the first place ...)
+    It is invoked by adding
      the Bib-1 relation attribute with
-    value "relevance" to the PQF query (that is, <literal>@attr
-    2=102</literal>, see also  
+    value ``relevance'' to the PQF query (that is,
+    <literal>@attr&nbsp;2=102</literal>, see also  
      <ulink url="ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt">
       The BIB-1 Attribute Set Semantics</ulink>). 
-    To find all articles with the word 'Eoraptor' in
-    the title, and present them relevance ranked, one issues the PQF query:
+    To find all articles with the word <literal>Eoraptor</literal> in
+    the title, and present them relevance ranked, issue the PQF query:
      <screen>
-     Z> f @attr 2=102 @attr 1=4 Eoraptor
+     @attr 2=102 @attr 1=4 Eoraptor
      </screen>
     </para>
   
@@ -1080,8 +1082,8 @@
        with <literal>estimated hit sizes</literal>, as all documents in
        a hit set must be acessed to compute the correct placing in a
        ranking sorted list. Therefore the use attribute setting
-      <literal>@attr 2=102</literal> clashes with 
-      <literal>@attr 9=</literal>. 
+      <literal>@attr&nbsp;2=102</literal> clashes with 
+      <literal>@attr&nbsp;9=integer</literal>. 
       </para>
     </warning>  
  