X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=bed7a2e3d153c420b414c4ce30ad8ad4a91fd327;hb=0002d3ccff37e5598553683e95714ca5711f05e8;hp=58c39c62f9537a5b14c5ec56627e2aae7e347ffe;hpb=6d074f35cdc58c223a2f0e4c7ee9d9be5d47ddfb;p=idzebra-moved-to-github.git

diff --git a/doc/querymodel.xml b/doc/querymodel.xml
index 58c39c6..bed7a2e 100644
--- a/doc/querymodel.xml
+++ b/doc/querymodel.xml
@@ -1,43 +1,85 @@
  <chapter id="querymodel">
-  <!-- $Id: querymodel.xml,v 1.4 2006-06-14 13:44:15 adam Exp $ -->
+  <!-- $Id: querymodel.xml,v 1.6 2006-06-15 13:41:49 marc Exp $ -->
   <title>Query Model</title>
   
   <sect1 id="querymodel-overview">
    <title>Query Model Overview</title>
    
-   <para>
-    Zebra is born as a networking Information Retrieval engine adhering
-    to the international standards 
-    <ulink url="&url.z39.50;">Z39.50</ulink> and
-    <ulink url="&url.sru;">SRU</ulink>,
-    and implement the query model defined there.
-    Unfortunately, the Z39.50 query model has only defined a binary
-    encoded representation, which is used as transport packaging in
-    the Z39.50 protocol layer. This representation is not human
-    readable, nor defines any convenient way to specify queries. 
-   </para>
+
+   <sect2 id="querymodel-query-languages">
+    <title>Query Languages</title>
+ 
+    <para>
+     Zebra was born as a networked Information Retrieval engine adhering
+     to the international standards
+     <ulink url="&url.z39.50;">Z39.50</ulink> and
+     <ulink url="&url.sru;">SRU</ulink>,
+     and implements the query model defined there.
+     Unfortunately, the Z39.50 query model defines only a binary
+     encoded representation, which is used as transport packaging in
+     the Z39.50 protocol layer. This representation is not human
+     readable, nor does it define any convenient way to specify queries.
+    </para>
    <!-- tell about RPN - include link to YAZ 
         url.yaz.pqf -->
+
+   <sect3 id="querymodel-query-languages-pqf">
+    <title>Prefix Query Format (PQF)</title>
+
    <para>
-    Therefore, Index Data has defined a textual representation of the
-    RPN query: <literal>Prefix Query Format</literal>, short
-    <literal>PQF</literal>, which then has been adopted by other
-    parties developing Z39.50 software. It is also often referred to as
-    <literal>Prefix Query Notation</literal>, or in short 
-    <literal>PQN</literal>, and is thoroughly explained in       
-    <xref linkend="querymodel-pqf"/>. 
-   </para>
+     Index Data has defined a textual representation, the
+     <literal>Prefix Query Format</literal>, or
+     <literal>PQF</literal> for short, which has since been adopted by other
+     parties developing Z39.50 software. It is also often referred to as
+     <literal>Prefix Query Notation</literal>, or
+     <literal>PQN</literal> for short, and is thoroughly explained in
+     <xref linkend="querymodel-pqf"/>. 
+    </para>
+   </sect3>    
+
 
    <!-- PQF/RPN is natively supported. CQL is NOT . So we need a map -->
+   <sect3 id="querymodel-query-languages-cql">
+    <title>Common Query Language (CQL)</title>
    <para>
-    In addition, Zebra can be configured to understand and map the 
-    <literal>Common Query Language</literal>
-    (<ulink url="&url.cql;">CQL</ulink>)
-    to PQF. See an introduction on the mapping to the internal query
-    representation in  
-    <xref linkend="querymodel-cql-to-pqf"/>.
-   </para>
-  </sect1>
+     In addition, Zebra can be configured to understand and map the 
+     <literal>Common Query Language</literal>
+     (<ulink url="&url.cql;">CQL</ulink>)
+     to PQF. See an introduction on the mapping to the internal query
+     representation in  
+     <xref linkend="querymodel-cql-to-pqf"/>.
+    </para>
+   </sect3>    
+ 
+   </sect2>
+
+   <sect2 id="querymodel-query-types">
+    <title>Query Types</title>
+    <para>
+    </para>
+
+    <sect3 id="querymodel-query-type-explain">
+     <title>Explain Queries</title>
+     <para>
+     </para>
+    </sect3>
+
+    <sect3 id="querymodel-query-type-search">
+     <title>Search Queries</title>
+     <para>
+     </para>
+    </sect3>
+
+    <sect3 id="querymodel-query-type-scan">
+     <title>Scan Queries</title>
+     <para>
+     </para>
+    </sect3>
+
+   </sect2>
+
+ </sect1>
+
   
   <sect1 id="querymodel-pqf">
    <title>Prefix Query Format structure and syntax</title>
@@ -72,7 +114,7 @@
       <note>
        The Zebra internal query procesing is modeled after 
        the <literal>Bib1</literal> attribute set, and the non-use
-       attributes type 2-9 are hard-wired in. It is therefore essential
+       attributes type 2-6 are hard-wired in. It is therefore essential
        to be familiar with <xref linkend="querymodel-bib1"/>. 
       </note>
      </para>
@@ -548,10 +590,33 @@
     
     
    <sect3 id="querymodel-bib1-use">
-     <title>Use Attributes (type = 1)</title>
+     <title>Use Attributes (type 1)</title>
     </sect3>
 
     <para>
+     A use attribute specifies an access point for any atomic query.
+     These access points are highly dependent on the attribute set used
+     in the query, and are user configurable using the following
+     default configuration files:
+     <filename>tab/bib1.att</filename>,
+     <filename>tab/dan1.att</filename>,
+     <filename>tab/explain.att</filename>, and
+     <filename>tab/gils.att</filename>.
+     New attribute sets can be added by adding new 
+     <filename>tab/*.att</filename> configuration files, which need to
+     be sourced in the main configuration <filename>zebra.cfg</filename>.
+     </para>
+
+    <para>
+     In addition, Zebra allows access to
+     <emphasis>internal index names</emphasis> and <emphasis>dynamic
+     XPath</emphasis> as use attributes.
+     See <xref linkend="querymodel-use-string"/> and
+     <xref linkend="querymodel-use-xpath"/> for
+     alternative access to the Zebra internal index names and XPath queries.
+    </para> 
+
+    <para>
      Phrase search for <emphasis>information retrieval</emphasis> in
      the title-register:
      <screen>
@@ -561,23 +626,94 @@
 
     
     <sect3 id="querymodel-bib1-relation">
-     <title>Relation Attributes (type = 2)</title>
-    </sect3>
-    <para>
-     Supported operations: = (default, of omitted), &lt; &gt; &lt;=, &gt;= .
-     Unsupported: Not equal.
+     <title>Relation Attributes (type 2)</title>
      
-     The following relation attributes are also supported: relevance (102).
-     <!-- always-matches (103) not supported for all indexes -->
+     <para>
+      Relation attributes describe the relationship of the access
+      point (left side 
+      of the relation) to the search term as qualified by the attributes (right
+      side of the relation), e.g., Date-publication &lt;= 1975.
+      </para>
 
-     All operations are based on a lexicographical ordering, 
-     <emphasis>expect</emphasis> in the case for the
-     following structure attributes: numeric(109).
+     <table id="querymodel-bib1-relation-table">
+      <caption>Relation Attributes (type 2)</caption>
+      <thead>
+        <tr>
+         <td>Relation</td>
+         <td>Value</td>
+         <td>Notes</td>
+        </tr>
+       </thead>
+       <tbody>
+        <tr>
+         <td> Less than</td>
+         <td>1</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Less than or equal</td>
+         <td>2</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Equal</td>
+         <td>3</td>
+         <td>default</td>
+        </tr>
+        <tr>
+         <td>Greater or equal</td>
+         <td>4</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Greater than</td>
+         <td>5</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Not equal</td>
+         <td>6</td>
+         <td>unsupported</td>
+        </tr>
+        <tr>
+         <td>Phonetic</td>
+         <td>100</td>
+         <td>unsupported</td>
+        </tr>
+        <tr>
+         <td>Stem</td>
+         <td>101</td>
+         <td>unsupported</td>
+        </tr>
+        <tr>
+         <td>Relevance</td>
+         <td>102</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>AlwaysMatches</td>
+         <td>103</td>
+         <td>supported</td>
+        </tr>
+       </tbody>
+     </table>
 
-    
-    </para>
-    
+     <para>
+      The relation attribute 
+      <literal>relevance (102)</literal> is supported, see
+      <xref linkend="administration-ranking"/> for full information.
+      <!-- always-matches (103) not supported for all indexes -->
+     </para>
+     
     <para>
+     All ordering operations are based on a lexicographical ordering, 
+     <emphasis>except</emphasis> when the
+     structure attribute <literal>numeric (109)</literal> is used. In
+     this case, ordering is numerical. See 
+      <xref linkend="querymodel-bib1-structure"/>.
+    </para>
+
+     <para>
      Ranked search for <emphasis>information retrieval</emphasis> in
      the title-register
      (see <xref linkend="administration-ranking"/> for the glory details):
@@ -585,22 +721,172 @@
       Z> find @attr 1=4 @attr 2=102 "information retrieval"
      </screen>
     </para>
-    
+    </sect3>
+
     <sect3 id="querymodel-bib1-position">
-     <title>Position Attributes (type = 3)</title>
+     <title>Position Attributes (type 3)</title>
+ 
      <para>
-      Only value of (any position(3) is supported. first in field(1),
-      and first in subfield(2) are unsupported but using them
-      does not trigger an error.
+      The position attribute specifies the location of the search term
+      within the field or subfield in which it appears.
+     </para>
+
+     <table id="querymodel-bib1-position-table">
+      <caption>Position Attributes (type 3)</caption>
+      <thead>
+        <tr>
+         <td>Position</td>
+         <td>Value</td>
+         <td>Notes</td>
+        </tr>
+       </thead>
+       <tbody>
+        <tr>
+         <td>First in field </td>
+         <td>1</td>
+         <td>unsupported</td>
+        </tr>
+        <tr>
+         <td>First in subfield</td>
+         <td>2</td>
+         <td>unsupported</td>
+        </tr>
+        <tr>
+         <td>Any position in field</td>
+         <td>3</td>
+         <td>default</td>
+        </tr>
+       </tbody>
+     </table>
+ 
+    <para>
+      The position attribute values <literal>first in field (1)</literal>
+      and <literal>first in subfield (2)</literal> are unsupported.
+      Using them does not trigger an error, but silently defaults to
+      <literal>any position in field (3)</literal>.
       <!-- It should -->
+      </para>
     </sect3>
     
     <sect3 id="querymodel-bib1-structure">
-     <title>Structure Attributes (type = 4)</title>
-     <!-- See tab/default.idx -->
+     <title>Structure Attributes (type 4)</title>
+   
+     <para>
+      The structure attribute specifies the type of search
+      term. This causes the search to be mapped on
+      different Zebra internal indexes, which must have been defined
+      at index time. 
+     </para>
+
+     <para> 
+      The possible values of the  
+      <literal>structure attribute (type 4)</literal> can be defined
+      using the configuration file
+      <filename>tab/default.idx</filename>.
+      The default configuration is summarized in this table.
+     </para>
+
+     <table id="querymodel-bib1-structure-table">
+      <caption>Structure Attributes (type 4)</caption>
+      <thead>
+        <tr>
+         <td>Structure</td>
+         <td>Value</td>
+         <td>Notes</td>
+        </tr>
+       </thead>
+       <tbody>
+        <tr>
+         <td>Phrase </td>
+         <td>1</td>
+         <td>default</td>
+        </tr>
+        <tr>
+         <td>Word</td>
+         <td>2</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Key</td>
+         <td>3</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Year</td>
+         <td>4</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Date (normalized)</td>
+         <td>5</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Word list</td>
+         <td>6</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Date (un-normalized)</td>
+         <td>100</td>
+         <td>unsupported</td>
+        </tr>
+        <tr>
+         <td>Name (normalized) </td>
+         <td>101</td>
+         <td>unsupported</td>
+        </tr>
+        <tr>
+         <td>Name (un-normalized) </td>
+         <td>102</td>
+         <td>unsupported</td>
+        </tr>
+        <tr>
+         <td>Structure</td>
+         <td>103</td>
+         <td>unsupported</td>
+        </tr>
+        <tr>
+         <td>Urx</td>
+         <td>104</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Free-form-text</td>
+         <td>105</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Document-text</td>
+         <td>106</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Local-number</td>
+         <td>107</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>String</td>
+         <td>108</td>
+         <td>unsupported</td>
+        </tr>
+        <tr>
+         <td>Numeric string</td>
+         <td>109</td>
+         <td>supported</td>
+        </tr>
+       </tbody>
+     </table>
     </sect3>
     
     <para>
+     The structure attribute value <literal>local-number
+      (107)</literal>
+     is supported, and always maps to the Zebra internal document ID.
+     </para>
+
+    <para>
      For example, in
      the GILS schema (<literal>gils.abs</literal>), the
      west-bounding-coordinate is indexed as type <literal>n</literal>,
@@ -615,16 +901,86 @@
 
     <sect3 id="querymodel-bib1-truncation">
      <title>Truncation Attributes (type = 5)</title>
+
+     <para>
+      The truncation attribute specifies whether variations of one or
+      more characters are allowed between search term and hit terms, or
+      not. Using non-default truncation attributes will broaden the
+      document hit set of a search query.
+     </para>
+
+     <table id="querymodel-bib1-truncation-table">
+      <caption>Truncation Attributes (type 5)</caption>
+      <thead>
+        <tr>
+         <td>Truncation</td>
+         <td>Value</td>
+         <td>Notes</td>
+        </tr>
+       </thead>
+       <tbody>
+        <tr>
+         <td>Right truncation </td>
+         <td>1</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Left truncation</td>
+         <td>2</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Left and right truncation</td>
+         <td>3</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>Do not truncate</td>
+         <td>100</td>
+         <td>default</td>
+        </tr>
+        <tr>
+         <td>Process # in search term</td>
+         <td>101</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>RegExpr-1 </td>
+         <td>102</td>
+         <td>supported</td>
+        </tr>
+        <tr>
+         <td>RegExpr-2</td>
+         <td>103</td>
+         <td>supported</td>
+        </tr>
+       </tbody>
+     </table>
+
+     <para>
+      Truncation attribute value 
+      <literal>Process # in search term (101)</literal> is a
+      poor-man's regular expression search. It maps
+      each <literal>#</literal> to <literal>.*</literal>, and
+      then performs a <literal>Regexp-1 (102)</literal> regular
+      expression search.
+     </para>
+     <para>
+      Truncation attribute value 
+       <literal>Regexp-1 (102)</literal> is a normal regular
+      expression search.
+     </para>
      <para>
-      Supported are: No truncation(100) which is the default,
-      Right trunation(1), Left truncation(2),
-      Left&amp;Right truncation(3), 
-      Process <literal>#</literal> in term(100) which maps
-      each # to <literal>.*</literal>,
-      Regexp-1(102) normal regular, Regexp-2(103) (regular with fuzzy),
+       Truncation attribute value 
+      <literal>Regexp-2 (103) </literal> is a Zebra specific extension
+      which allows <emphasis>fuzzy</emphasis> matches. One single
+      error in spelling of search terms is allowed, i.e., a document
+      is hit if it includes a term which can be mapped to the used
+      search term by one character substitution, addition, deletion or
+      change of position.
+      </para>  
       <!--
       Special 104, 105, 106 are deprecated and will be removed! -->
-      
     </sect3>
     
     <sect3 id="querymodel-bib1-completeness">
@@ -637,6 +993,7 @@
       register type w.
       complete subfield(2) and complete field(3) both triggers
       search field type p.
+     </para>
     </sect3>
    </sect2>
     
@@ -653,34 +1010,40 @@
       <caption>Zebra Search Attribute Extentions</caption>
        <thead>
         <tr>
-         <td><emphasis>Name and Type</emphasis></td>
+         <td>Name</td>
+         <td>Value</td>
          <td>Operation</td>
          <td>Zebra version</td>
         </tr>
       </thead>
        <tbody>
         <tr>
-         <td><emphasis>Embedded Sort (type 7)</emphasis></td>
+         <td>Embedded Sort</td>
+         <td>7</td>
          <td>search</td>
          <td>1.1</td>
         </tr>
         <tr>
-         <td><emphasis>Term Set (type 8)</emphasis></td>
+         <td>Term Set</td>
+         <td>8</td>
          <td>search</td>
          <td>1.1</td>
         </tr>
         <tr>
-         <td><emphasis>Rank weight  (type 9)</emphasis></td>
+         <td>Rank Weight</td>
+         <td>9</td>
          <td>search</td>
          <td>1.1</td>
         </tr>
         <tr>
-         <td><emphasis>Approx Limit (type 9)</emphasis></td>
+         <td>Approx Limit</td>
+         <td>9</td>
          <td>search</td>
          <td>1.4</td>
         </tr>
         <tr>
-         <td><emphasis>Term Reference (type 10)</emphasis></td>
+         <td>Term Reference</td>
+         <td>10</td>
          <td>search</td>
          <td>1.4</td>
         </tr>
