X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Frecordmodel-domxml.xml;h=391b45338c15ce9a10e1e6e02b43b8f3e0526462;hp=bb5b300578cbfe00549ae91bfa5757ba2419d80b;hb=250de4ed23a44f5eb3552db317eef0d0fbe3265c;hpb=c99c50f588fb803362a47a933c988360ab1cd98c

diff --git a/doc/recordmodel-domxml.xml b/doc/recordmodel-domxml.xml
index bb5b300..391b453 100644
--- a/doc/recordmodel-domxml.xml
+++ b/doc/recordmodel-domxml.xml
@@ -1,45 +1,44 @@
 <chapter id="record-model-domxml">
-  <!-- $Id: recordmodel-domxml.xml,v 1.9 2007-02-22 15:44:19 marc Exp $ -->
-  <title>&dom; &xml; Record Model and Filter Module</title>
-
+  <title>&acro.dom; &acro.xml; Record Model and Filter Module</title>
+  
   <para>
    The record model described in this chapter applies to the fundamental,
-   structured &xml;
-   record type <literal>&dom;</literal>, introduced in
-   <xref linkend="componentmodulesdom"/>. The &dom; &xml; record model
-   is experimental, and it's inner workings might change in future
+   structured &acro.xml;
+   record type <literal>&acro.dom;</literal>, introduced in
+   <xref linkend="componentmodulesdom"/>. The &acro.dom; &acro.xml; record model
+   is experimental, and its inner workings might change in future
    releases of the &zebra; Information Server.
   </para>
 
   
   
   <section id="record-model-domxml-filter">
-   <title>&dom; Record Filter Architecture</title>
+   <title>&acro.dom; Record Filter Architecture</title>
 
      <para>
-      The &dom; &xml; filter uses a standard &dom; &xml; structure as
+      The &acro.dom; &acro.xml; filter uses a standard &acro.dom; &acro.xml; structure as
       internal data model, and can therefore parse, index, and display 
-      any &xml; document type. It is well suited to work on 
-      standardized &xml;-based formats such as Dublin Core, MODS, METS,
+      any &acro.xml; document type. It is well suited to work on 
+      standardized &acro.xml;-based formats such as Dublin Core, MODS, METS,
       MARCXML, OAI-PMH, RSS, and performs equally  well on any other
-      non-standard &xml; format. 
+      non-standard &acro.xml; format. 
     </para>
     <para>
-      A parser for binary &marc; records based on the ISO2709 library
+      A parser for binary &acro.marc; records based on the ISO2709 library
       standard is provided, it transforms these to the internal
-      &marcxml; &dom; representation. Other binary document parsers
+      &acro.marcxml; &acro.dom; representation. Other binary document parsers
       are planned to follow.  
     </para>
 
     <para>
-      The &dom; filter architecture consists of four
+      The &acro.dom; filter architecture consists of four
       different pipelines, each being a chain of arbitrarily many successive
-      &xslt; transformations of the internal &dom; &xml;
+      &acro.xslt; transformations of the internal &acro.dom; &acro.xml;
       representations of documents.
     </para>
 
     <figure id="record-model-domxml-architecture-fig">
-      <title>&dom; &xml; filter architecture</title>
+      <title>&acro.dom; &acro.xml; filter architecture</title>
       <mediaobject>
        <imageobject>
          <imagedata fileref="domfilter.pdf" format="PDF" scale="50"/>
@@ -50,7 +49,7 @@
         <textobject>
         <!-- Fall back if none of the images can be used -->
         <phrase>
-          [Here there should be a diagram showing the &dom; &xml;
+          [Here there should be a diagram showing the &acro.dom; &acro.xml;
            filter architecture, but is seems that your
            tool chain has not been able to include the diagram in this
            document.]
@@ -61,7 +60,7 @@
 
 
     <table id="record-model-domxml-architecture-table" frame="top">
-      <title>&dom; &xml; filter pipelines overview</title>
+      <title>&acro.dom; &acro.xml; filter pipelines overview</title>
       <tgroup cols="5">
        <thead>
         <row>
@@ -78,26 +77,26 @@
          <entry><literal>input</literal></entry>
          <entry>first</entry>
          <entry>input parsing and initial
-          transformations to common &xml; format</entry>
-         <entry>Input raw &xml; record buffers, &xml;  streams and 
-                binary &marc; buffers</entry>
-         <entry>Common &xml; &dom;</entry>
+          transformations to common &acro.xml; format</entry>
+         <entry>Input raw &acro.xml; record buffers, &acro.xml;  streams and 
+                binary &acro.marc; buffers</entry>
+         <entry>Common &acro.xml; &acro.dom;</entry>
         </row>
         <row>
          <entry><literal>extract</literal></entry>
          <entry>second</entry>
          <entry>indexing term extraction
           transformations</entry>
-         <entry>Common &xml; &dom;</entry>
-         <entry>Indexing &xml; &dom;</entry>
+         <entry>Common &acro.xml; &acro.dom;</entry>
+         <entry>Indexing &acro.xml; &acro.dom;</entry>
         </row>
         <row>
          <entry><literal>store</literal></entry>
          <entry>second</entry>
          <entry> transformations before internal document
           storage</entry>
-         <entry>Common &xml; &dom;</entry>
-         <entry>Storage &xml; &dom;</entry>
+         <entry>Common &acro.xml; &acro.dom;</entry>
+         <entry>Storage &acro.xml; &acro.dom;</entry>
         </row>
         <row>
          <entry><literal>retrieve</literal></entry>
@@ -105,40 +104,40 @@
          <entry>multiple document retrieve transformations from
           storage to different output
           formats are possible</entry>
-         <entry>Storage &xml; &dom;</entry>
-         <entry>Output &xml; syntax in requested formats</entry>
+         <entry>Storage &acro.xml; &acro.dom;</entry>
+         <entry>Output &acro.xml; syntax in requested formats</entry>
         </row>
        </tbody>
       </tgroup>
      </table>
 
     <para>
-      The &dom; &xml; filter pipelines use &xslt; (and if  supported on
-      your platform, even &exslt;), it brings thus full &xpath;
+      The &acro.dom; &acro.xml; filter pipelines use &acro.xslt; (and if  supported on
+      your platform, even &acro.exslt;), it brings thus full &acro.xpath;
       support to the indexing, storage and display rules of not only
-      &xml; documents, but also binary &marc; records.
+      &acro.xml; documents, but also binary &acro.marc; records.
     </para>
    </section>
 
 
    <section id="record-model-domxml-pipeline">
-    <title>&dom; &xml; filter pipeline configuration</title>   
+    <title>&acro.dom; &acro.xml; filter pipeline configuration</title>   
 
    <para>
-    The experimental, loadable  &dom; &xml;/&xslt; filter module
+    The experimental, loadable  &acro.dom; &acro.xml;/&acro.xslt; filter module
    <literal>mod-dom.so</literal> 
     is invoked by the <filename>zebra.cfg</filename> configuration statement
     <screen>
      recordtype.xml: dom.db/filter_dom_conf.xml
     </screen>
-    In this example the &dom; &xml; filter is configured to work 
+    In this example the &acro.dom; &acro.xml; filter is configured to work 
     on all data files with suffix 
     <filename>*.xml</filename>, where the configuration file is found in the
     path <filename>db/filter_dom_conf.xml</filename>.
    </para>
 
-   <para>The &dom; &xslt; filter configuration file must be
-    valid &xml;. It might look like this:
+   <para>The &acro.dom; &acro.xslt; filter configuration file must be
+    valid &acro.xml;. It might look like this:
     <screen> 
     <![CDATA[
     <?xml version="1.0" encoding="UTF8"?>
@@ -147,7 +146,7 @@
         <xmlreader level="1"/>
         <!-- <marc inputcharset="marc-8"/> -->
       </input>
-      <extrac>
+      <extract>
          <xslt stylesheet="common2index.xsl"/>
       </extract>
       <store>
@@ -164,9 +163,9 @@
     </screen>
    </para>
    <para>
-     The root &xml; element <literal>&lt;dom&gt;</literal> and all other &dom;
-     &xml; filter elements are residing in the namespace 
-     <literal>xmlns="http://indexdata.dk/zebra-2.0"</literal>.
+     The root &acro.xml; element <literal>&lt;dom&gt;</literal> and all other &acro.dom;
+     &acro.xml; filter elements are residing in the namespace 
+     <literal>xmlns="http://indexdata.com/zebra-2.0"</literal>.
    </para>
    <para>
     All pipeline definition elements - i.e. the
@@ -180,7 +179,7 @@
    <para>
     All pipeline definition elements may contain zero or more 
     <literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
-    &xslt; transformation instructions, which are performed
+    &acro.xslt; transformation instructions, which are performed
     sequentially from top to bottom.
     The paths in the <literal>stylesheet</literal> attributes
     are relative to zebras working directory, or absolute to the file
@@ -192,22 +191,22 @@
     <title>Input pipeline</title>   
    <para>
     The <literal>&lt;input&gt;</literal> pipeline definition element
-    may contain either one &xml; Reader definition
+    may contain either one &acro.xml; Reader definition
     <literal><![CDATA[<xmlreader level="1"/>]]></literal>, used to split
-    an &xml; collection input stream into individual &xml; &dom;
+    an &acro.xml; collection input stream into individual &acro.xml; &acro.dom;
     documents at the prescribed element level, 
-    or one &marc; binary
+    or one &acro.marc; binary
     parsing instruction
     <literal><![CDATA[<marc inputcharset="marc-8"/>]]></literal>, which defines
-    a conversion to &marcxml; format &dom; trees. The allowed values
+    a conversion to &acro.marcxml; format &acro.dom; trees. The allowed values
     of the <literal>inputcharset</literal> attribute depend on your
     local <productname>iconv</productname> set-up.
    </para>
    <para>
-    Both input parsers deliver individual &dom; &xml; documents to the
+    Both input parsers deliver individual &acro.dom; &acro.xml; documents to the
     following chain of zero or more  
     <literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
-    &xslt; transformations. At the end of this pipeline, the documents
+    &acro.xslt; transformations. At the end of this pipeline, the documents
     are in the common format, used to feed both the 
      <literal>&lt;extract&gt;</literal> and 
      <literal>&lt;store&gt;</literal> pipelines.
@@ -218,11 +217,11 @@
     <title>Extract pipeline</title>   
      <para>
        The <literal>&lt;extract&gt;</literal> pipeline takes documents
-       from any common &dom; &xml; format to the &zebra; specific
-        indexing &dom; &xml; format.
+       from any common &acro.dom; &acro.xml; format to the &zebra; specific
+        indexing &acro.dom; &acro.xml; format.
        It may consist of zero ore more 
        <literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
-       &xslt; transformations, and the outcome is handled to the
+       &acro.xslt; transformations, and the outcome is handled to the
        &zebra; core to drive the process of building the inverted
        indexes. See
        <xref linkend="record-model-domxml-canonical-index"/> for
@@ -233,11 +232,11 @@
    <section id="record-model-domxml-pipeline-store">
     <title>Store pipeline</title>   
        The <literal>&lt;store&gt;</literal> pipeline takes documents
-       from any common &dom;  &xml; format to the &zebra; specific
-        storage &dom;  &xml; format.
+       from any common &acro.dom;  &acro.xml; format to the &zebra; specific
+        storage &acro.dom;  &acro.xml; format.
        It may consist of zero ore more 
        <literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
-       &xslt; transformations, and the outcome is handled to the
+       &acro.xslt; transformations, and the outcome is handled to the
        &zebra; core for deposition into the internal storage system.
     </section>
 
@@ -248,9 +247,9 @@
       <literal>&lt;retrieve&gt;</literal> pipeline definitions, each
       of them again consisting of zero or more
       <literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
-       &xslt; transformations. These are used for document
-      presentation after search, and take the internal storage &dom;
-      &xml; to the requested output formats during record present
+       &acro.xslt; transformations. These are used for document
+      presentation after search, and take the internal storage &acro.dom;
+      &acro.xml; to the requested output formats during record present
       requests.  
     </para>
     <para>
@@ -259,9 +258,9 @@
      are distinguished by their unique <literal>name</literal>
      attributes, these are the literal <literal>schema</literal> or 
      <literal>element set</literal> names used in 
-      <ulink url="http://www.loc.gov/standards/sru/srw/">&srw;</ulink>,
-      <ulink url="&url.sru;">&sru;</ulink> and
-      &z3950; protocol queries.
+      <ulink url="http://www.loc.gov/standards/sru/srw/">&acro.srw;</ulink>,
+      <ulink url="&url.sru;">&acro.sru;</ulink> and
+      &acro.z3950; protocol queries.
    </para>
    </section>
 
@@ -270,23 +269,23 @@
     <title>Canonical Indexing Format</title>
 
     <para>
-     &dom; &xml; indexing comes in two flavors: pure
-     processing-instruction governed plain &xml; documents, and - very
-     similar to the Alvis filter indexing format - &xml; documents
-     containing &xml; <literal>&lt;record&gt;</literal> and
+     &acro.dom; &acro.xml; indexing comes in two flavors: pure
+     processing-instruction governed plain &acro.xml; documents, and - very
+     similar to the Alvis filter indexing format - &acro.xml; documents
+     containing &acro.xml; <literal>&lt;record&gt;</literal> and
      <literal>&lt;index&gt;</literal> instructions from the magic
-     namespace <literal>xmlns:z="http://indexdata.dk/zebra-2.0"</literal>.
+     namespace <literal>xmlns:z="http://indexdata.com/zebra-2.0"</literal>.
     </para>
 
    <section id="record-model-domxml-canonical-index-pi">
     <title>Processing-instruction governed indexing format</title>
  
       <para>The output of the processing instruction driven 
-      indexing &xslt; stylesheets must contain
+      indexing &acro.xslt; stylesheets must contain
       processing instructions named 
        <literal>zebra-2.0</literal>. 
-      The output of the &xslt; indexing transformation is then
-      parsed using &dom; methods, and the contained instructions are
+      The output of the &acro.xslt; indexing transformation is then
+      parsed using &acro.dom; methods, and the contained instructions are
       performed on the <emphasis>elements and their
       subtrees directly following the processing instructions</emphasis>.
       </para>
@@ -314,11 +313,11 @@
    <section id="record-model-domxml-canonical-index-element">
     <title>Magic element governed indexing format</title>
    
-    <para>The output of the indexing &xslt; stylesheets must contain
+    <para>The output of the indexing &acro.xslt; stylesheets must contain
     certain elements in the magic 
-     <literal>xmlns:z="http://indexdata.dk/zebra-2.0"</literal>
-    namespace. The output of the &xslt; indexing transformation is then
-    parsed using &dom; methods, and the contained instructions are
+     <literal>xmlns:z="http://indexdata.com/zebra-2.0"</literal>
+    namespace. The output of the &acro.xslt; indexing transformation is then
+    parsed using &acro.dom; methods, and the contained instructions are
     performed on the <emphasis>magic elements and their
     subtrees</emphasis>.
     </para>
@@ -355,7 +354,7 @@
          processing instructions named
          <literal>zebra-2.0</literal> or
          elements contained in the namespace
-         <literal>xmlns:z="http://indexdata.dk/zebra-2.0"</literal>.
+         <literal>xmlns:z="http://indexdata.com/zebra-2.0"</literal>.
          </para>
        </listitem>
        <listitem>
@@ -365,13 +364,66 @@
          </para>
        </listitem>
        <listitem>
-         <para>The unique <literal>record</literal> instruction
-	    may have additional attributes <literal>id</literal> and
-            <literal>rank</literal>, where the value of the opaque ID
-            may be any string not containing the whitespace character 
-            <literal>' '</literal>, and the rank value must be a
+         <para>
+            The unique <literal>record</literal> instruction
+	    may have additional attributes <literal>id</literal>,
+            <literal>rank</literal> and <literal>type</literal>.
+            Attribute <literal>id</literal> is the value of the opaque ID
+            and may be any string not containing the whitespace character 
+            <literal>' '</literal>.
+            The <literal>rank</literal> attribute value must be a
             non-negative integer. See 
-            <xref linkend="administration-ranking"/>
+            <xref linkend="administration-ranking"/> .
+            The <literal>type</literal> attribute specifies how the record
+            is to be treated. The following values may be given for 
+            <literal>type</literal>:
+            <variablelist>
+             <varlistentry>
+              <term><literal>insert</literal></term>
+              <listitem>
+               <para>
+                The record is inserted. If the record already exists, it is
+                skipped (i.e. not replaced).
+               </para>
+              </listitem>
+             </varlistentry>
+             <varlistentry>
+              <term><literal>replace</literal></term>
+              <listitem>
+               <para>
+                The record is replaced. If the record does not already exist,
+                it is skipped (i.e. not inserted).
+               </para>
+              </listitem>
+             </varlistentry>
+             <varlistentry>
+              <term><literal>delete</literal></term>
+              <listitem>
+               <para>
+                The record is deleted. If the record does not already exist,
+                it is skipped (i.e. nothing is deleted).
+               </para>
+              </listitem>
+             </varlistentry>
+             <varlistentry>
+              <term><literal>update</literal></term>
+              <listitem>
+               <para>
+                The record is inserted or replaced depending on whether the
+                record exists or not. This is the default behavior but may
+                be effectively changed by "outside" the scope of the DOM
+                filter by zebraidx commands or extended services updates.
+               </para>
+              </listitem>
+             </varlistentry>
+            </variablelist>
+          Note that the value of <literal>type</literal> is only used to
+          determine the action if and only if the Zebra indexer is running
+          in "update" mode (i.e zebraidx update) or if the specialUpdate
+          action of the
+          <link linkend="administration-extended-services-z3950">Extended
+          Service Update</link> is used.
+          For this reason a specialUpdate may end up deleting records!
          </para>
        </listitem>
        <listitem>
@@ -400,18 +452,30 @@
          <xref linkend="fields-and-charsets"/> for details.
          </para>
        </listitem>
+       <listitem>
+         <para> 
+         &acro.dom; input documents which are not resulting in both one
+         unique valid 
+         <literal>record</literal> instruction and one or more valid 
+         <literal>index</literal> instructions can not be searched and
+         found. Therefore,
+         invalid document processing is aborted, and any content of
+         the <literal>&lt;extract&gt;</literal> and 
+         <literal>&lt;store&gt;</literal> pipelines is discarded.
+          A warning is issued in the logs. 
+         </para>
+       </listitem>
       </itemizedlist>
     </para>
-
     
     <para>The examples work as follows: 
-     From the original &xml; file 
-     <literal>marc-one.xml</literal> (or from the &xml; record &dom; of the
+     From the original &acro.xml; file 
+     <literal>marc-one.xml</literal> (or from the &acro.xml; record &acro.dom; of the
      same form coming from an <literal>&lt;input&gt;</literal>
      pipeline),
      the indexing
      pipeline <literal>&lt;extract&gt;</literal>
-     produces an indexing &xml; record, which is defined by
+     produces an indexing &acro.xml; record, which is defined by
      the <literal>record</literal> instruction
      &zebra; uses the content of 
      <literal>z:id="11224466"</literal> 
@@ -447,8 +511,8 @@
      inserted in the named indexes.
     </para>
     <para>
-     Finally, this example configuration can be queried using &pqf;
-     queries, either transported by &z3950;, (here using a yaz-client) 
+     Finally, this example configuration can be queried using &acro.pqf;
+     queries, either transported by &acro.z3950;, (here using a yaz-client) 
      <screen>
       <![CDATA[
       Z> open localhost:9999
@@ -468,27 +532,27 @@
      or the proprietary
      extensions <literal>x-pquery</literal> and
      <literal>x-pScanClause</literal> to
-     &sru;, and &srw;
+     &acro.sru;, and &acro.srw;
      <screen>
       <![CDATA[
       http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@attr 1=title program
       http://localhost:9999/?version=1.1&operation=scan&x-pScanClause=@attr 1=title ""
       ]]>
      </screen>
-     See <xref linkend="zebrasrv-sru"/> for more information on &sru;/&srw;
+     See <xref linkend="zebrasrv-sru"/> for more information on &acro.sru;/&acro.srw;
      configuration, and <xref linkend="gfs-config"/> or the &yaz;
-     <ulink url="&url.yaz.cql;">&cql; section</ulink>
+     <ulink url="&url.yaz.cql;">&acro.cql; section</ulink>
      for the details or the &yaz; frontend server.
     </para>
     <para>
      Notice that there are no <filename>*.abs</filename>,
-     <filename>*.est</filename>, <filename>*.map</filename>, or other &grs1;
+     <filename>*.est</filename>, <filename>*.map</filename>, or other &acro.grs1;
      filter configuration files involves in this process, and that the
      literal index names are used during search and retrieval.
     </para>
     <para>
      In case that we want to support the usual
-     <literal>bib-1</literal> &z3950; numeric access points, it is a
+     <literal>bib-1</literal> &acro.z3950; numeric access points, it is a
      good idea to choose string index names defined in the default
      configuration file <filename>tab/bib1.att</filename>, see   
      <xref linkend="attset-files"/>
@@ -501,15 +565,15 @@
 
 
   <section id="record-model-domxml-conf">
-   <title>&dom; Record Model Configuration</title>
+   <title>&acro.dom; Record Model Configuration</title>
 
 
   <section id="record-model-domxml-index">
-   <title>&dom; Indexing Configuration</title>
+   <title>&acro.dom; Indexing Configuration</title>
     <para>
      As mentioned above, there can be only one indexing pipeline,
      and configuration of the indexing process is a synonym
-     of writing an &xslt; stylesheet which produces &xml; output containing the
+     of writing an &acro.xslt; stylesheet which produces &acro.xml; output containing the
      magic processing instructions or elements discussed in  
      <xref linkend="record-model-domxml-canonical-index"/>. 
      Obviously, there are million of different ways to accomplish this
@@ -519,32 +583,32 @@
     <para>
      Stylesheets can be written in the <emphasis>pull</emphasis> or
      the <emphasis>push</emphasis> style: <emphasis>pull</emphasis>
-     means that the output &xml; structure is taken as starting point of
-     the internal structure of the &xslt; stylesheet, and portions of
-     the input &xml; are <emphasis>pulled</emphasis> out and inserted
-     into the right spots of the output &xml; structure. 
+     means that the output &acro.xml; structure is taken as starting point of
+     the internal structure of the &acro.xslt; stylesheet, and portions of
+     the input &acro.xml; are <emphasis>pulled</emphasis> out and inserted
+     into the right spots of the output &acro.xml; structure. 
      On the other
-     side, <emphasis>push</emphasis> &xslt; stylesheets are recursively
+     side, <emphasis>push</emphasis> &acro.xslt; stylesheets are recursively
      calling their template definitions, a process which is commanded
-     by the input &xml; structure, and is triggered to produce 
-     some output &xml;
+     by the input &acro.xml; structure, and is triggered to produce 
+     some output &acro.xml;
      whenever some special conditions in the input stylesheets are
      met. The <emphasis>pull</emphasis> type is well-suited for input
-     &xml; with strong and well-defined structure and semantics, like the
-     following &oai; indexing example, whereas the
+     &acro.xml; with strong and well-defined structure and semantics, like the
+     following &acro.oai; indexing example, whereas the
      <emphasis>push</emphasis> type might be the only possible way to
-     sort out deeply recursive input &xml; formats.
+     sort out deeply recursive input &acro.xml; formats.
     </para>
     <para> 
      A <emphasis>pull</emphasis> stylesheet example used to index
-     &oai; harvested records could use some of the following template
+     &acro.oai; harvested records could use some of the following template
      definitions:
      <screen>
       <![CDATA[
       <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
-       xmlns:z="http://indexdata.dk/zebra-2.0"
-       xmlns:oai="http://www.openarchives.org/&oai;/2.0/" 
-       xmlns:oai_dc="http://www.openarchives.org/&oai;/2.0/oai_dc/" 
+       xmlns:z="http://indexdata.com/zebra-2.0"
+       xmlns:oai="http://www.openarchives.org/&acro.oai;/2.0/" 
+       xmlns:oai_dc="http://www.openarchives.org/&acro.oai;/2.0/oai_dc/" 
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        version="1.0">
 
@@ -569,7 +633,7 @@
 
          <!-- OAI indexing templates -->
          <xsl:template match="oai:record/oai:header/oai:identifier">
-          <z:index name="oai_identifier;0">
+          <z:index name="oai_identifier:0">
            <xsl:value-of select="."/>
           </z:index>    
          </xsl:template>
@@ -589,20 +653,120 @@
       ]]>
      </screen>
     </para>
+  </section>
+
+
+  <section id="record-model-domxml-index-marc">
+   <title>&acro.dom; Indexing &acro.marcxml;</title>
+    <para>
+      The &acro.dom; filter allows indexing of both binary &acro.marc; records
+      and &acro.marcxml; records, depending on its configuration.
+      A typical &acro.marcxml; record might look like this:
+      <screen>  
+      <![CDATA[
+      <record xmlns="http://www.loc.gov/MARC21/slim">
+       <rank>42</rank>
+       <leader>00366nam  22001698a 4500</leader>
+       <controlfield tag="001">   11224466   </controlfield>
+       <controlfield tag="003">DLC  </controlfield>
+       <controlfield tag="005">00000000000000.0  </controlfield>
+       <controlfield tag="008">910710c19910701nju           00010 eng    </controlfield>
+       <datafield tag="010" ind1=" " ind2=" ">
+         <subfield code="a">   11224466 </subfield>
+       </datafield>
+       <datafield tag="040" ind1=" " ind2=" ">
+         <subfield code="a">DLC</subfield>
+         <subfield code="c">DLC</subfield>
+       </datafield>
+       <datafield tag="050" ind1="0" ind2="0">
+         <subfield code="a">123-xyz</subfield>
+       </datafield>
+       <datafield tag="100" ind1="1" ind2="0">
+         <subfield code="a">Jack Collins</subfield>
+       </datafield>
+       <datafield tag="245" ind1="1" ind2="0">
+         <subfield code="a">How to program a computer</subfield>
+       </datafield>
+       <datafield tag="260" ind1="1" ind2=" ">
+         <subfield code="a">Penguin</subfield>
+       </datafield>
+       <datafield tag="263" ind1=" " ind2=" ">
+         <subfield code="a">8710</subfield>
+       </datafield>
+       <datafield tag="300" ind1=" " ind2=" ">
+         <subfield code="a">p. cm.</subfield>
+       </datafield>
+      </record>
+      ]]>
+      </screen>
+    </para>
+
     <para>
-     Notice also,
-     that the names and types of the indexes can be defined in the
-     indexing &xslt; stylesheet <emphasis>dynamically according to
-     content in the original &xml; records</emphasis>, which has
+      It is easily possible to make string manipulation in the &acro.dom;
+      filter. For example, if you want to drop some leading articles
+      in the indexing of sort fields, you might want to pick out the 
+      &acro.marcxml; indicator attributes to chop of leading substrings. If
+      the above &acro.xml; example would have an indicator
+      <literal>ind2="8"</literal> in the title field 
+      <literal>245</literal>, i.e.
+      <screen>  
+      <![CDATA[
+       <datafield tag="245" ind1="1" ind2="8">
+         <subfield code="a">How to program a computer</subfield>
+       </datafield>
+      ]]>
+      </screen>
+      one could write a template taking into account this information
+      to chop the first <literal>8</literal> characters from the
+      sorting index <literal>title:s</literal> like this:
+      <screen>  
+      <![CDATA[
+      <xsl:template match="m:datafield[@tag='245']">
+        <xsl:variable name="chop">
+          <xsl:choose>
+            <xsl:when test="not(number(@ind2))">0</xsl:when>
+            <xsl:otherwise><xsl:value-of select="number(@ind2)"/></xsl:otherwise>
+          </xsl:choose>
+        </xsl:variable>  
+
+        <z:index name="title:w title:p any:w">
+           <xsl:value-of select="m:subfield[@code='a']"/>
+        </z:index>
+
+        <z:index name="title:s">
+          <xsl:value-of select="substring(m:subfield[@code='a'], $chop)"/>
+        </z:index>
+
+      </xsl:template> 
+      ]]>
+      </screen>
+      The output of the above &acro.marcxml; and &acro.xslt; excerpt would then be:
+      <screen>  
+      <![CDATA[
+        <z:index name="title:w title:p any:w">How to program a computer</z:index>
+        <z:index name="title:s">program a computer</z:index>
+      ]]>
+      </screen>
+      and the record would be sorted in the title index under 'P', not 'H'.
+    </para>
+  </section>
+
+
+  <section id="record-model-domxml-index-wizzard">
+   <title>&acro.dom; Indexing Wizardry</title>
+    <para>
+     The names and types of the indexes can be defined in the
+     indexing &acro.xslt; stylesheet <emphasis>dynamically according to
+     content in the original &acro.xml; records</emphasis>, which has
      opportunities for great power and wizardry as well as grande
      disaster.  
     </para>
     <para>
      The following excerpt of a <emphasis>push</emphasis> stylesheet
      <emphasis>might</emphasis> 
-     be a good idea according to your strict control of the &xml;
+     be a good idea according to your strict control of the &acro.xml;
      input format (due to rigorous checking against well-defined and
-     tight RelaxNG or &xml; Schema's, for example):
+     tight RelaxNG or &acro.xml; Schema's, for example):
      <screen>
       <![CDATA[
       <xsl:template name="element-name-indexes">     
@@ -613,11 +777,11 @@
       ]]>
      </screen>
      This template creates indexes which have the name of the working 
-     node of any input  &xml; file, and assigns a '1' to the index.
+     node of any input  &acro.xml; file, and assigns a '1' to the index.
      The example query 
      <literal>find @attr 1=xyz 1</literal> 
      finds all files which contain at least one
-     <literal>xyz</literal> &xml; element. In case you can not control
+     <literal>xyz</literal> &acro.xml; element. In case you can not control
      which element names the input files contain, you might ask for
      disaster and bad karma using this technique.
     </para>
@@ -645,25 +809,44 @@
       ]]>
      </screen>
      Don't be tempted to play too smart tricks with the power of
-     &xslt;, the above example will create zillions of
+     &acro.xslt;, the above example will create zillions of
      indexes with unpredictable names, resulting in severe &zebra;
      index pollution..
     </para>
   </section>
 
+  <section id="record-model-domxml-debug">
+   <title>Debuggig &acro.dom; Filter Configurations</title>
+   <para>
+    It can be very hard to debug a &acro.dom; filter setup due to the many
+    successive &acro.marc; syntax translations, &acro.xml; stream splitting and 
+    &acro.xslt; transformations involved. As an aid, you have always the
+    power of the <literal>-s</literal> command line switch to the 
+    <literal>zebraidz</literal> indexing command at your hand: 
+    <screen>
+     zebraidx -s -c zebra.cfg update some_record_stream.xml
+    </screen>
+    This command line simulates indexing and dumps a lot of debug
+    information in the logs, telling exactly which transformations
+    have been applied, how the documents look like after each
+    transformation, and which record ids and terms are send to the indexer.
+   </para>
+  </section>
+
+  <!--
   <section id="record-model-domxml-elementset">
-   <title>&dom; Exchange Formats</title>
+   <title>&acro.dom; Exchange Formats</title>
    <para>
      An exchange format can be anything which can be the outcome of an
-     &xslt; transformation, as far as the stylesheet is registered in
-     the main &dom; &xslt; filter configuration file, see
+     &acro.xslt; transformation, as far as the stylesheet is registered in
+     the main &acro.dom; &acro.xslt; filter configuration file, see
      <xref linkend="record-model-domxml-filter"/>.
-     In principle anything that can be expressed in  &xml;, HTML, and
+     In principle anything that can be expressed in  &acro.xml;, HTML, and
      TEXT can be the output of a <literal>schema</literal> or 
     <literal>element set</literal> directive during search, as long as
      the information comes from the 
-     <emphasis>original input record &xml; &dom; tree</emphasis>
-     (and not the transformed and <emphasis>indexed</emphasis> &xml;!!).
+     <emphasis>original input record &acro.xml; &acro.dom; tree</emphasis>
+     (and not the transformed and <emphasis>indexed</emphasis> &acro.xml;!!).
     </para>
     <para>
      In addition, internal administrative information from the &zebra;
@@ -672,10 +855,10 @@
      <screen>
       <![CDATA[  
       <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
-       xmlns:z="http://indexdata.dk/zebra/xslt/1"
+       xmlns:z="http://indexdata.com/zebra-2.0"
        version="1.0">
 
-       <!-- register internal zebra parameters -->       
+       <!- - register internal zebra parameters - ->       
        <xsl:param name="id" select="''"/>
        <xsl:param name="filename" select="''"/>
        <xsl:param name="score" select="''"/>
@@ -683,7 +866,7 @@
            
        <xsl:output indent="yes" method="xml" version="1.0" encoding="UTF-8"/>
 
-       <!-- use then for display of internal information -->
+       <!- - use then for display of internal information - ->
        <xsl:template match="/">
          <z:zebra>
            <id><xsl:value-of select="$id"/></id>
@@ -699,18 +882,19 @@
     </para>
 
   </section>
+  -->
 
   <!--
   <section id="record-model-domxml-example">
-   <title>&dom; Filter &oai; Indexing Example</title>
+   <title>&acro.dom; Filter &acro.oai; Indexing Example</title>
    <para>
-     The source code tarball contains a working &dom; filter example in
+     The source code tarball contains a working &acro.dom; filter example in
      the directory <filename>examples/dom-oai/</filename>, which
      should get you started.  
     </para>
     <para>
-     More example data can be harvested from any &oai; compliant server,
-     see details at the  &oai; 
+     More example data can be harvested from any &acro.oai; compliant server,
+     see details at the  &acro.oai; 
      <ulink url="http://www.openarchives.org/">
       http://www.openarchives.org/</ulink> web site, and the community
       links at