diff --git a/doc/architecture.xml b/doc/architecture.xml
index 37afaee..b6fe7cf 100644
--- a/doc/architecture.xml
+++ b/doc/architecture.xml
@@ -1,11 +1,10 @@
-
+
 Overview of Zebra Architecture
-
-
+
Local Representation - + As mentioned earlier, Zebra places few restrictions on the type of data that you can index and manage. Generally, whatever the form of @@ -30,62 +29,9 @@ "grs" keyword, separated by "." characters. --> - - - - Indexing and Retrieval Workflow - - - Records pass through three different states during processing in the - system. - - - - - - - - - When records are accessed by the system, they are represented - in their local, or native format. This might be SGML or HTML files, - News or Mail archives, MARC records. If the system doesn't already - know how to read the type of data you need to store, you can set up an - input filter by preparing conversion rules based on regular - expressions and possibly augmented by a flexible scripting language - (Tcl). - The input filter produces as output an internal representation, - a tree structure. +
- - - - - - When records are processed by the system, they are represented - in a tree-structure, constructed by tagged data elements hanging off a - root node. The tagged elements may contain data or yet more tagged - elements in a recursive structure. The system performs various - actions on this tree structure (indexing, element selection, schema - mapping, etc.), - - - - - - - Before transmitting records to the client, they are first - converted from the internal structure to a form suitable for exchange - over the network - according to the Z39.50 standard. - - - - - - -
- - - +
Main Components The Zebra system is designed to support a wide range of data management @@ -99,68 +45,121 @@ The Zebra indexer and information retrieval server consists of the - following main applications: the zebraidx - indexing maintenance utility, and the zebrasrv - information query and retireval server. Both are using some of the + following main applications: the zebraidx + indexing maintenance utility, and the zebrasrv + information query and retrieval server. Both are using some of the same main components, which are presented here. - This virtual package installs all the necessary packages to start + The virtual Debian package idzebra-2.0 + installs all the necessary packages to start working with Zebra - including utility programs, development libraries, - documentation and modules. - idzebra1.4 + documentation and modules. - - Core Zebra Module Containing Common Functionality +
Core Zebra Libraries Containing Common Functionality
- - loads external filter modules used for presenting
- the records in a search response.
- - executes search requests in PQF/RPN, which are handed over from
- the YAZ server frontend API
- - calls resorting/reranking algorithms on the hit sets
- - returns - possibly ranked - result sets, hit
- numbers, and similar internal data to the YAZ server backend API.
-
+ The core Zebra module is the meat of the zebraidx
+ indexing maintenance utility, and the zebrasrv
+ information query and retrieval server binaries. In short, the core
+ libraries are responsible for
+
+ Dynamic Loading
+
+ of external filter modules, in case the application is
+ not compiled statically. These filter modules define indexing,
+ search and retrieval capabilities of the various input formats.
+
+ Index Maintenance
+
+ Zebra maintains Term Dictionaries and ISAM index
+ entries in inverted index structures kept on disk. These are
+ optimized for fast insert, update and delete, as well as good
+ search performance.
+
+ Search Evaluation
+
+ by execution of search requests expressed in PQF/RPN
+ data structures, which are handed over from
+ the YAZ server frontend API. Search evaluation includes
+ construction of hit lists according to boolean combinations
+ of simpler searches. Fast performance is achieved by careful
+ use of index structures, and by evaluating specific index hit
+ lists in the correct order.
+
+ Ranking and Sorting
+
+ components call resorting/re-ranking algorithms on the hit
+ sets. These might also be pre-sorted, not only using the
+ assigned document IDs, but also using assigned static rank
+ information.
+
+ Record Presentation
+
+ returns - possibly ranked - result sets, hit
+ numbers, and similar internal data to the YAZ server backend API
+ for shipping to the client. Each individual filter module
+ implements its own specific presentation formats.
+ + + + + - This package contains all run-time libraries for Zebra. - libidzebra1.4 - This package includes documentation for Zebra in PDF and HTML. - idzebra1.4-doc - This package includes common essential Zebra configuration files - idzebra1.4-common + The Debian package libidzebra-2.0 + contains all run-time libraries for Zebra, the + documentation in PDF and HTML is found in + idzebra-2.0-doc, and + idzebra-2.0-common + includes common essential Zebra configuration files. - +
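The Search Evaluation step described above - combining sorted inverted-index hit lists with boolean operators - can be illustrated with a small sketch. This is a toy in-memory Python model, not Zebra's C implementation (Zebra keeps its term dictionaries and ISAM posting lists on disk); the documents and tokenization are invented for illustration.

```python
# Toy model of boolean search evaluation over an inverted index.
# Not Zebra code: everything here lives in memory.

def build_index(docs):
    """docs: {docid: text}. Returns term -> ascending docid list."""
    index = {}
    for docid, text in sorted(docs.items()):
        for term in set(text.lower().split()):
            index.setdefault(term, []).append(docid)
    return index

def intersect(a, b):
    """AND: merge-intersect two ascending docid lists in one pass."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union(a, b):
    """OR: merge two ascending docid lists, dropping duplicates."""
    return sorted(set(a) | set(b))

docs = {1: "plant and soil", 2: "soil science", 3: "plant biology"}
idx = build_index(docs)
print(intersect(idx["plant"], idx["soil"]))  # -> [1]
```

Because every posting list is kept in the same (docid-ascending) order, AND and OR reduce to single merge passes - the property the section above attributes to Zebra's index structures.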
- +
Zebra Indexer
- the core Zebra indexer which
- - loads external filter modules used for indexing data records of
- different type.
- - creates, updates and drops databases and indexes
+ The zebraidx
+ indexing maintenance utility
+ loads external filter modules used for indexing data records of
+ different types, and creates, updates and drops databases and
+ indexes
according to the rules defined in the filter modules.
- This package contains Zebra utilities such as the zebraidx indexer
- utility and the zebrasrv server.
- idzebra1.4-utils
+ The Debian package idzebra-2.0-utils contains
+ the zebraidx utility.
- +
Zebra Searcher/Retriever
- the core Zebra searcher/retriever which
+ This is the executable which runs the Z39.50/SRU/SRW server and
+ glues together the core libraries and the filter modules into one
+ complete information retrieval server application.
- This package contains Zebra utilities such as the zebraidx indexer
- utility and the zebrasrv server, and their associated man pages.
- idzebra1.4-utils
+ The Debian package idzebra-2.0-utils contains
+ the zebrasrv utility.
- +
YAZ Server Frontend The YAZ server frontend is @@ -170,488 +169,358 @@ In addition to Z39.50 requests, the YAZ server frontend acts - as HTTP server, honouring - SRW SOAP requests, and SRU REST requests. Moreover, it can - translate inco ming CQL queries to PQF/RPN queries, if + as HTTP server, honoring + SRU SOAP + requests, and + SRU REST + requests. Moreover, it can + translate incoming + CQL + queries to + PQF + queries, if correctly configured. - YAZ is a toolkit that allows you to develop software using the - ANSI Z39.50/ISO23950 standard for information retrieval. - SRW/ SRU - libyazthread.so - libyaz.so - libyaz + YAZ + is an Open Source + toolkit that allows you to develop software using the + ANSI Z39.50/ISO23950 standard for information retrieval. + It is packaged in the Debian packages + yaz and libyaz. - +
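The CQL-to-PQF translation mentioned above is driven by a mapping file whose entries (in the style `index.cql.all = 1=text`, as shown in the CQL-to-PQF notes elsewhere in this chapter) tie CQL indexes to Type-1/RPN attributes. The following sketch is a toy parser and single-term translator, not YAZ's actual converter; the config text and index names are illustrative only.

```python
# Toy CQL-index -> PQF-attribute mapping, in the spirit of the
# cql2pqf.txt mapping file. Not the YAZ implementation.

def parse_mapping(config_text):
    """Parse lines like 'index.cql.all = 1=text' into
    {'cql.all': '1=text'}. '#' lines and blanks are skipped."""
    mapping = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, attrs = line.partition("=")
        # key looks like "index.cql.all"; keep the "set.index" part
        _, _, name = key.strip().partition(".")
        mapping[name] = attrs.strip()
    return mapping

def cql_term_to_pqf(mapping, index, term):
    """Render a single-term CQL query against one index as PQF."""
    return "@attr %s %s" % (mapping[index], term)

cfg = "index.cql.all = 1=text\nindex.cql.title = 1=4"
m = parse_mapping(cfg)
print(cql_term_to_pqf(m, "cql.all", "soil"))  # -> @attr 1=text soil
```

A real converter also handles relations, relation modifiers, position, structure and truncation attributes; this sketch only shows the index-to-use-attribute step.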
- +
Record Models and Filter Modules - all filter modules which do indexing and record display filtering: -This virtual package contains all base IDZebra filter modules. EMPTY ??? - libidzebra1.4-modules + The hard work of knowing what to index, + how to do it, and which + part of the records to send in a search/retrieve response is + implemented in + various filter modules. It is their responsibility to define the + exact indexing and record display filtering rules. + + + The virtual Debian package + libidzebra-2.0-modules installs all base filter + modules. - + +
TEXT Record Model and Filter Module - Plain ASCII text filter - + Plain ASCII text filter. TODO: add information here. - +
- +
GRS Record Model and Filter Modules
-
- grs.danbib GRS filters of various kind (*.abs files)
- IDZebra filter grs.danbib (DBC DanBib records)
- This package includes grs.danbib filter which parses DanBib records.
- DanBib is the Danish Union Catalogue hosted by DBC
- (Danish Bibliographic Centre).
- libidzebra1.4-mod-grs-danbib
-
- grs.marc
- grs.marcxml
- This package includes the grs.marc and grs.marcxml filters that allow
- IDZebra to read MARC records based on ISO2709.
- libidzebra1.4-mod-grs-marc
-
- grs.regx
- grs.tcl GRS TCL scriptable filter
- This package includes the grs.regx and grs.tcl filters.
- libidzebra1.4-mod-grs-regx
-
- grs.sgml
- libidzebra1.4-mod-grs-sgml not packaged yet ??
-
- grs.xml
- This package includes the grs.xml filter which uses Expat to
- parse records in XML and turn them into IDZebra's internal grs node.
- libidzebra1.4-mod-grs-xml
+ The GRS filter modules described in
+
+ are all based on the Z39.50 specifications, and it is absolutely
+ mandatory to have the reference pages on BIB-1 attribute sets at
+ hand when configuring GRS filters. The GRS filters come in
+ different flavors, and a short introduction is needed here.
+ GRS filters of various kinds have also been called ABS filters due
+ to the *.abs configuration file suffix.
+
+ The grs.marc and
+ grs.marcxml filters are suited to parse and
+ index binary and XML versions of traditional library MARC records
+ based on the ISO2709 standard. The Debian package for both
+ filters is
+ libidzebra-2.0-mod-grs-marc.
+
+ GRS TCL scriptable filters for extensive user configuration come
+ in two flavors: a regular expression filter
+ grs.regx using TCL regular expressions, and
+ a general scriptable TCL filter called
+ grs.tcl; both are included in the
+ libidzebra-2.0-mod-grs-regx Debian package.
+
+ A general purpose SGML filter is called
+ grs.sgml.
This filter is not yet packaged,
+ but planned to be in the
+ libidzebra-2.0-mod-grs-sgml Debian package.
+
+ The Debian package
+ libidzebra-2.0-mod-grs-xml includes the
+ grs.xml filter which uses Expat to
+ parse records in XML and turn them into IDZebra's internal GRS node
+ trees. Also have a look at the Alvis XML/XSLT filter described in
+ the next section.
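The idea behind a regular-expression input filter like grs.regx - turning flat text records into a tree of tagged elements - can be sketched briefly. Zebra's real filter is configured with TCL regular expressions; the Python sketch below, with an invented `Field: value` record layout, only illustrates the concept.

```python
import re

# Sketch of a regex-driven input filter in the spirit of grs.regx.
# The "Field: value" record layout is invented for illustration.

RULE = re.compile(r"^(\w+):\s*(.*)$")

def filter_record(text):
    """Return a list of (tag, data) pairs - a flat stand-in for
    Zebra's internal tree of tagged elements."""
    tree = []
    for line in text.splitlines():
        m = RULE.match(line)
        if m:
            tree.append((m.group(1).lower(), m.group(2)))
    return tree

record = "Title: Soil science\nAuthor: N.N."
print(filter_record(record))
# -> [('title', 'Soil science'), ('author', 'N.N.')]
```

In the real filters the rules live in configuration files, so adding support for a new flat format means writing expressions, not C code.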
- +
ALVIS Record Model and Filter Module
-
- alvis Experimental Alvis XSLT filter
- mod-alvis.so
- libidzebra1.4-mod-alvis
+ The Alvis filter for XML files is an XSLT based input
+ filter.
+ It indexes element and attribute content of any XML format
+ using full XPATH support, a feature which the standard Zebra
+ GRS SGML and XML filters lacked. The indexed documents are
+ parsed into a standard XML DOM tree, which restricts record size
+ according to the availability of memory.
+
+ The Alvis filter
+ uses XSLT display stylesheets, which let
+ the Zebra DB administrator associate multiple different views with
+ the same XML document type. These views are chosen on-the-fly at
+ search time.
+
+ In addition, the Alvis filter configuration is not bound to the
+ arcane BIB-1 Z39.50 library catalogue indexing traditions and
+ folklore, and is therefore easier to understand.
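The path-driven indexing that the Alvis filter provides can be sketched with Python's standard library. This is not the Alvis module: Alvis gets full XPath via XSLT, while `xml.etree`'s `iterfind` only supports a small XPath subset, and the document and index names here are invented.

```python
import xml.etree.ElementTree as ET

# Sketch of path-driven XML indexing: parse a record into a DOM-like
# tree, then pull out element content per named index.

def index_by_path(xml_text, paths):
    """paths: {indexname: elementtree path}. Returns
    indexname -> list of matching element texts."""
    root = ET.fromstring(xml_text)
    return {name: [e.text for e in root.iterfind(path)]
            for name, path in paths.items()}

doc = "<record><title>Soil</title><author><name>N.N.</name></author></record>"
print(index_by_path(doc, {"title": "title", "author": "author/name"}))
# -> {'title': ['Soil'], 'author': ['N.N.']}
```

The point mirrored here is that the whole record is first parsed into a tree (hence the memory bound on record size), after which any element or attribute can be exposed to any index.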
- a) CQL set prefixes are specified using the correct CQL/ SRW/U - prefixes for the required index sets, or user-invented prefixes for - special index sets. An index set in CQL is roughly speaking equivalent to a - namespace specifier in XML. +
- b) The default index set to be used if none explicitely mentioned - c) Index mapping definitions of the form +
+ Indexing and Retrieval Workflow - index.cql.all = 1=text + + Records pass through three different states during processing in the + system. + - which means that the index "all" from the set "cql" is mapped on the - bib-1 RPN query "@attr 1=text" (where "text" is some existing index - in zebra, see indexing stylesheet) + - d) Relation mapping from CQL relations to bib-1 RPN "@attr 2= " stuff + + + + + When records are accessed by the system, they are represented + in their local, or native format. This might be SGML or HTML files, + News or Mail archives, MARC records. If the system doesn't already + know how to read the type of data you need to store, you can set up an + input filter by preparing conversion rules based on regular + expressions and possibly augmented by a flexible scripting language + (Tcl). + The input filter produces as output an internal representation, + a tree structure. - e) Relation modifier mapping from CQL relations to bib-1 RPN "@attr - 2= " stuff + + + - f) Position attributes + + When records are processed by the system, they are represented + in a tree-structure, constructed by tagged data elements hanging off a + root node. The tagged elements may contain data or yet more tagged + elements in a recursive structure. The system performs various + actions on this tree structure (indexing, element selection, schema + mapping, etc.), - g) structure attributes + + + - h) truncation attributes + + Before transmitting records to the client, they are first + converted from the internal structure to a form suitable for exchange + over the network - according to the Z39.50 standard. + + - See - http://www.indexdata.com/yaz/doc/tools.tkl#tools.cql.map for config - file details. + + +
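The three record states above - native local format, internal tagged tree, exchange form for the network - can be sketched as a tiny pipeline. The mini-formats and field names below are invented purely for illustration; they stand in for real input filters and Z39.50 record encodings.

```python
# Sketch of the three record states: native form in, internal tagged
# tree for processing, exchange form out. Formats are invented.

def to_internal(native):
    """State 1 -> 2: an 'input filter' turning a flat native record
    into a list of (tag, data) pairs (a stand-in for the tree)."""
    return [tuple(part.split("=", 1)) for part in native.split(";")]

def to_exchange(tree):
    """State 2 -> 3: encode the internal tree for transmission."""
    return "\n".join("%s %s" % (tag, data) for tag, data in tree)

native = "title=Soil;author=N.N."
tree = to_internal(native)       # indexing etc. would operate here
print(to_exchange(tree))
```

Indexing, element selection and schema mapping all happen on the middle representation, which is why the same native record can be served in several exchange formats.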
+
Retrieval of Zebra internal record data
+
+ Starting with Zebra version 2.0.5 or newer, it is
+ possible to use a special element set which has the prefix
+ zebra::.
+
-
- Static and Dynamic Ranking
- Zebra internally uses inverted indexes to look up term occurrences
- in documents. Multiple queries from different indexes can be
- combined by the binary boolean operations AND, OR and/or NOT (which
- is in fact a binary AND NOT operation). To ensure fast query execution
- speed, all indexes have to be sorted in the same order.
-
- The indexes are normally sorted according to document ID in
- ascending order, and any query which does not invoke a special
- re-ranking function will therefore retrieve the result set in document ID
- order.
-
- If one defines the
-
- staticrank: 1
-
- directive in the main core Zebra config file, the internal document
- keys used for ordering are augmented by a preceding integer, which
- contains the static rank of a given document, and the index lists
- are ordered
- - first by ascending static rank
- - then by ascending document ID.
-
- This implies that the default rank "0" is the best rank at the
- beginning of the list, and "max int" is the worst static rank.
-
- The "alvis" and the experimental "xslt" filters provide a
- directive to fetch static rank information out of the indexed XML
- records, thus making _all_ hit sets ordered by ascending static
- rank, and for those docs which have the same static rank, ordered
- by ascending doc ID.
- If one wants to do a little fiddling with the static rank order,
- one has to invoke additional re-ranking/re-ordering using dynamic
- reranking or score functions.
These functions return positive
- integer scores, where _highest_ score is best, which means that the
- hit sets will be sorted according to _descending_ scores (in contrast
- to the index lists, which are sorted according to _ascending_ rank
- number and document ID)
-
- Those are defined in the zebra C source files
-
- "rank-1" : zebra/index/rank1.c
- default TF/IDF like zebra dynamic ranking
- "rank-static" : zebra/index/rankstatic.c
- do-nothing dummy static ranking (this is just to prove
- that the static rank can be used in dynamic ranking functions)
- "zvrank" : zebra/index/zvrank.c
- many different dynamic TF/IDF ranking functions
-
- These are enabled in the zebra config file by a directive like:
-
- rank: rank-static
-
- Notice that "rank-1" and "zvrank" do not use the static rank
- information in the list keys, and will produce the same ordering
- with or without static ranking enabled.
-
- The dummy "rank-static" reranking/scoring function returns just
- score = max int - staticrank
- in order to preserve the ordering of hit sets with and without its
- call.
-
- Obviously, one wants to make a new ranking function which combines
- static and dynamic ranking, which is left as an exercise for the
- reader .. (Wray, this is yours ...)
-
+ Using this element set will, regardless of record type, return
+ Zebra's internal index structure/data for a record.
+ In particular, the regular record filters are not invoked when
+ these are in use.
+ This can in some cases make retrieval faster than regular
+ retrieval operations (for MARC, XML, etc.).
- - + + Special Retrieval Elements + + + + Element Set + Description + Syntax + + + + + zebra::meta::sysno + Get Zebra record system ID + XML and SUTRS + + + zebra::data + Get raw record + all + + + zebra::meta + Get Zebra record internal metadata + XML and SUTRS + + + zebra::index + Get all indexed keys for record + XML and SUTRS + + + + zebra::index::f + + + Get indexed keys for field f for record + + XML and SUTRS + + + + zebra::index::f:t + + + Get indexed keys for field f + and type t for record + + XML and SUTRS + + + +
- yazserver frontend config file - - db/yazserver.xml - - Setup of listening ports, and virtual zebra servers. - Note path to server-side CQL-to-PQF config file, and to - SRW explain config section. - - The path is relative to the directory where zebra.init is placed - and is started up. The other pathes are relative to , - which in this case is the same. - - see: http://www.indexdata.com/yaz/doc/server.vhosts.tkl - + For example, to fetch the raw binary record data stored in the + zebra internal storage, or on the filesystem, the following + commands can be issued: + + Z> f @attr 1=title my + Z> format xml + Z> elements zebra::data + Z> s 1+1 + Z> format sutrs + Z> s 1+1 + Z> format usmarc + Z> s 1+1 + + + + The special + zebra::data element set name is + defined for any record syntax, but will always fetch + the raw record data in exactly the original form. No record syntax + specific transformations will be applied to the raw record data. - - Z39.50 searching: - - search like this (using client-side CQL-to-PQF conversion): - - yaz-client -q db/cql2pqf.txt localhost:9999 - > format xml - > querytype cql2rpn - > f text=(plant and soil) - > s 1 - > elements dc - > s 1 - > elements index - > s 1 - > elements alvis - > s 1 - > elements snippet - > s 1 - - - search like this (using server-side CQL-to-PQF conversion): - (the only difference is "querytype cql" instead of - "querytype cql2rpn" and the call without specifying a local - conversion file) - - yaz-client localhost:9999 - > format xml - > querytype cql - > f text=(plant and soil) - > s 1 - > elements dc - > s 1 - > elements index - > s 1 - > elements alvis - > s 1 - > elements snippet - > s 1 - - NEW: static relevance ranking - see examples in alvis2index.xsl - - > f text = /relevant (plant and soil) - > elem dc - > s 1 - - > f title = /relevant a - > elem dc - > s 1 - - - - SRW/U searching - Surf into http://localhost:9999 - - firefox http://localhost:9999 - - gives you an explain record. 
Unfortunately, the data found in the - CQL-to-PQF text file must be added by hand-craft into the explain - section of the yazserver.xml file. Too bad, but this is all extreme - new alpha stuff, and a lot of work has yet to be done .. - - Searching via SRU: surf into the URL (lines broken here - concat on - URL line) - - - see number of hits: - http://localhost:9999/?version=1.1&operation=searchRetrieve - &query=text=(plant%20and%20soil) - - - - fetch record 5-7 in DC format - http://localhost:9999/?version=1.1&operation=searchRetrieve - &query=text=(plant%20and%20soil) - &startRecord=5&maximumRecords=2&recordSchema=dc - - - - even search using PQF queries using the extended verb "x-pquery", - which is special to YAZ/Zebra - - http://localhost:9999/?version=1.1&operation=searchRetrieve - &x-pquery=@attr%201=text%20@and%20plant%20soil - - More info: read the fine manuals at http://www.loc.gov/z3950/agency/zing/srw/ -278,280d299 - Search via SRW: - read the fine manual at - http://www.loc.gov/z3950/agency/zing/srw/ - - -and so on. The list of available indexes is found in db/cql2pqf.txt - - -7) How do you add to the index attributes of any other type than "w"? -I mean, in the context of making CQL queries. Let's say I want a date -attribute in there, so that one could do date > 20050101 in CQL. - -Currently for example 'date-modified' is of type 'w'. - -The 2-seconds-of-though solution: - - in alvis2index.sl: - - - - - -But here's the catch...doesn't the use of the 'd' type require -structure type 'date' (@attr 4=5) in PQF? But then...how does that -reflect in the CQL->RPN/PQF mapping - does it really work if I just -change the type of an element in alvis2index.sl? I would think not...? 
- - - - - Kimmo - - -Either do: - - f @attr 4=5 @attr 1=date-modified 20050713 - -or do - - -Either do: - - f @attr 4=5 @attr 1=date-modified 20050713 - -or do - -querytype cql - - f date-modified=20050713 - - f date-modified=20050713 - - Search ERROR 121 4 1+0 RPN: @attrset Bib-1 @attr 5=100 @attr 6=1 @attr 3=3 @att -r 4=1 @attr 2=3 @attr "1=date-modified" 20050713 - - - - f date-modified eq 20050713 - -Search OK 23 3 1+0 RPN: @attrset Bib-1 @attr 5=100 @attr 6=1 @attr 3=3 @attr 4=5 - @attr 2=3 @attr "1=date-modified" 20050713 - - + Also, Zebra internal metadata about the record can be accessed: + + Z> f @attr 1=title my + Z> format xml + Z> elements zebra::meta::sysno + Z> s 1+1 + + displays in XML record syntax only internal + record system number, whereas + + Z> f @attr 1=title my + Z> format xml + Z> elements zebra::meta + Z> s 1+1 + + displays all available metadata on the record. These include sytem + number, database name, indexed filename, filter used for indexing, + score and static ranking information and finally bytesize of record. - -E) EXTENDED SERVICE LIFE UPDATES - -The extended services are not enabled by default in zebra - due to the -fact that they modify the system. - -In order to allow anybody to update, use -perm.anonymous: rw -in zebra.cfg. - -Or, even better, allow only updates for a particular admin user. For -user 'admin', you could use: -perm.admin: rw -passwd: passwordfile - -And in passwordfile, specify users and passwords .. -admin:secret - -We can now start a yaz-client admin session and create a database: - -$ yaz-client localhost:9999 -u admin/secret -Authentication set to Open (admin/secret) -Connecting...OK. -Sent initrequest. -Connection accepted by v3 target. 
-ID : 81 -Name : Zebra Information Server/GFS/YAZ -Version: Zebra 1.4.0/1.63/2.1.9 -Options: search present delSet triggerResourceCtrl scan sort -extendedServices namedResultSets -Elapsed: 0.007046 -Z> adm-create -Admin request -Got extended services response -Status: done -Elapsed: 0.045009 -: -Now Default was created.. We can now insert an XML file (esdd0006.grs -from example/gils/records) and index it: - -Z> update insert 1 esdd0006.grs -Got extended services response -Status: done -Elapsed: 0.438016 - -The 3rd parameter.. 1 here .. is the opaque record id from Ext update. -It a record ID that _we_ assign to the record in question. If we do not -assign one the usual rules for match apply (recordId: from zebra.cfg). - -Actually, we should have a way to specify "no opaque record id" for -yaz-client's update command.. We'll fix that. - -Elapsed: 0.438016 -Z> f utah -Sent searchRequest. -Received SearchResponse. -Search was a success. -Number of hits: 1, setno 1 -SearchResult-1: term=utah cnt=1 -records returned: 0 -Elapsed: 0.014179 - -Let's delete the beast: -Z> update delete 1 -No last record (update ignored) -Z> update delete 1 esdd0006.grs -Got extended services response -Status: done -Elapsed: 0.072441 -Z> f utah -Sent searchRequest. -Received SearchResponse. -Search was a success. -Number of hits: 0, setno 2 -SearchResult-1: term=utah cnt=0 -records returned: 0 -Elapsed: 0.013610 - -If shadow register is enabled you must run the adm-commit command in -order write your changes.. - + Sometimes, it is very hard to figure out what exactly has been + indexed how and in which indexes. Using the indexing stylesheet of + the Alvis filter, one can at least see which portion of the record + went into which index, but a similar aid does not exist for all + other indexing filters. - - - -
---> + + The special + zebra::index element set names are provided to + access information on per record indexed fields. For example, the + queries + + Z> f @attr 1=title my + Z> format sutrs + Z> elements zebra::index + Z> s 1+1 + + will display all indexed tokens from all indexed fields of the + first record, and it will display in SUTRS + record syntax, whereas + + Z> f @attr 1=title my + Z> format xml + Z> elements zebra::index::title + Z> s 1+1 + Z> elements zebra::index::title:p + Z> s 1+1 + + displays in XML record syntax only the content + of the zebra string index title, or + even only the type p phrase indexed part of it. + + + + Trying to access numeric Bib-1 use + attributes or trying to access non-existent zebra intern string + access points will result in a Diagnostic 25: Specified element set + 'name not valid for specified database. + + +