doc/querymodel.xml

   1  <chapter id="querymodel">
   2   <!-- $Id: querymodel.xml,v 1.3 2006-06-14 12:20:06 marc Exp $ -->
   3   <title>Query Model</title>
   4
   5   <sect1 id="querymodel-overview">
   6    <title>Query Model Overview</title>
   7
   8    <para>
   Zebra was designed as a networked Information Retrieval engine adhering
   to the international standards
   <ulink url="&url.z39.50;">Z39.50</ulink> and
   <ulink url="&url.sru;">SRU</ulink>,
   and implements the query model defined there.
   Unfortunately, the Z39.50 query model defines only a binary
   encoded representation, which is used as transport packaging in
   the Z39.50 protocol layer. This representation is not human
   readable, nor does it define any convenient way to specify queries.
  18    </para>
  19    <para>
   Therefore, Index Data has defined a textual representation, the
   <literal>Prefix Query Format</literal>, or
   <literal>PQF</literal> for short, which has since been adopted by other
   parties developing Z39.50 software. It is also often referred to as
   <literal>Prefix Query Notation</literal>, or
   <literal>PQN</literal> for short, and is thoroughly explained in
   <xref linkend="querymodel-pqf"/>.
  27    </para>
  28
  29    <para>
  30     In addition, Zebra can be configured to understand and map the
  31     <literal>Common Query Language</literal>
  32     (<ulink url="&url.cql;">CQL</ulink>)
   to PQF. For an introduction to the mapping to the internal query
   representation, see
  35     <xref linkend="querymodel-cql-to-pqf"/>.
  36    </para>
  37   </sect1>
  38
  39   <sect1 id="querymodel-pqf">
  40    <title>Prefix Query Format structure and syntax</title>
  41    <para>
   The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
   is documented in the YAZ manual, and will not be
   repeated here. During search, this textual PQF representation
   is always mapped to the equivalent Zebra internal
   query parse tree.
  47    </para>
  48
  49    <sect2 id="querymodel-pqf-tree">
  50     <title>PQF tree structure</title>
  51     <para>
    The PQF parse tree, or the equivalent textual representation,
    may start with one specification of the
    <emphasis>attribute set</emphasis> used. It is followed by a query
    tree, which
    consists of <emphasis>atomic query parts</emphasis>, possibly
    paired by <emphasis>boolean binary operators</emphasis>, and
    finally <emphasis>recursively combined</emphasis> into
    complex query trees.
  60     </para>
  61
  62     <sect3 id="querymodel-attribute-sets">
  63      <title>Attribute sets</title>
  64      <para>
     Attribute sets define the exact meaning and semantics of the queries
     issued. Zebra comes with some predefined attribute set
     definitions; others can easily be defined and added to the
     configuration.
  69       <note>
      The Zebra internal query processing is modeled after
  71        the <literal>Bib1</literal> attribute set, and the non-use
  72        attributes type 2-9 are hard-wired in. It is therefore essential
  73        to be familiar with <xref linkend="querymodel-bib1"/>.
  74       </note>
  75      </para>
  76
  77      <table id="querymodel-attribute-sets-table">
  78       <caption>Attribute sets predefined in Zebra</caption>
  79        <!--
  80        <thead>
  81        <tr><td>one</td><td>two</td></tr>
  82       </thead>
  83        -->
  84        <tbody>
  85         <tr>
  86          <td><emphasis>exp-1</emphasis></td>
  87          <td><literal>Explain</literal> attribute set</td>
  88          <td>Special attribute set used on the special automagic
  89           <literal>IR-Explain-1</literal> database to gain information on
         server capabilities, database names, and database
         semantics.</td>
  92         </tr>
  93         <tr>
  94          <td><emphasis>bib-1</emphasis></td>
  95          <td><literal>Bib1</literal> attribute set</td>
  96          <td>Standard PQF query language attribute set which defines the
  97           semantics of Z39.50 searching. In addition, all of the
  98           non-use attributes (type 2-9) define the Zebra internal query
         processing.</td>
 100         </tr>
 101         <tr>
 102          <td><emphasis>gils</emphasis></td>
 103          <td><literal>GILS</literal> attribute set</td>
        <td>Extension to the <literal>Bib1</literal> attribute set.</td>
 105         </tr>
 106        </tbody>
 107      </table>
 108     </sect3>
 109
 110     <sect3 id="querymodel-boolean-operators">
 111      <title>Boolean operators</title>
 112      <para>
 113       A pair of subquery trees, or of atomic queries, is combined
 114       using the standard boolean operators into new query trees.
 115      </para>
 116
 117      <table id="querymodel-boolean-operators-table">
 118       <caption>Boolean operators</caption>
 119        <!--
 120        <thead>
 121        <tr><td>one</td><td>two</td></tr>
 122       </thead>
 123        -->
 124        <tbody>
       <tr><td><emphasis>@and</emphasis></td>
        <td>binary <literal>AND</literal> operator</td>
        <td>Set intersection of the hit sets of two atomic queries</td>
       </tr>
       <tr><td><emphasis>@or</emphasis></td>
        <td>binary <literal>OR</literal> operator</td>
        <td>Set union of the hit sets of two atomic queries</td>
       </tr>
       <tr><td><emphasis>@not</emphasis></td>
        <td>binary <literal>AND NOT</literal> operator</td>
        <td>Set difference of the hit sets of two atomic queries</td>
       </tr>
       <tr><td><emphasis>@prox</emphasis></td>
        <td>binary <literal>PROXIMITY</literal> operator</td>
        <td>Set intersection of the hit sets of two atomic queries. In
         addition, all documents which do not satisfy the requested
         query term proximity are purged from the intersection set.
         The result is usually a proper subset of that of the AND
         operation.</td>
       </tr>
 145        </tbody>
 146      </table>
 147
 148      <para>
 149       For example, we can combine the terms
 150       <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
 151       into different searches in the default index of the default
 152       attribute set as follows.
 153       Querying for the union of all documents containing the
 154       terms <emphasis>information</emphasis> OR
 155       <emphasis>retrieval</emphasis>:
 156       <screen>
 157        Z> find @or information retrieval
 158       </screen>
 159      </para>
 160      <para>
      Querying for the intersection of all documents containing the
      terms <emphasis>information</emphasis> AND
      <emphasis>retrieval</emphasis>.
      The hit set is a subset of that of the corresponding
      OR query:
 166       <screen>
 167        Z> find @and information retrieval
 168       </screen>
 169      </para>
 170      <para>
      Querying for the intersection of all documents containing the
      terms <emphasis>information</emphasis> AND
      <emphasis>retrieval</emphasis>, taking proximity into account.
      The hit set is a subset of that of the corresponding
      AND query:
 176       <screen>
 177        Z> find @prox information retrieval
 178       </screen>
 179      </para>
 180      <para>
      Querying for the intersection of all documents containing the
      terms <emphasis>information</emphasis> AND
      <emphasis>retrieval</emphasis>, in the same order and near each
      other as described in the term list.
      The hit set is a subset of that of the corresponding
      PROXIMITY query:
 187       <screen>
 188        Z> find "information retrieval"
 189       </screen>
 190      </para>
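     <para>
      The set semantics of the four operators can be illustrated with a
      minimal Python sketch. It is not Zebra code: hit sets are modeled as
      plain sets of document ids, the sample data is invented, and the
      proximity rule (a hypothetical <literal>distance</literal> parameter
      measured in word positions) is an assumption made only for this
      illustration.
     </para>
     <programlisting>
```python
# Illustrative model only, not Zebra internals: a hit set is a Python set
# of document ids, and positions map a document id to word offsets.

def hits_or(a, b):
    """@or: set union of two hit sets."""
    return a.union(b)

def hits_and(a, b):
    """@and: set intersection of two hit sets."""
    return a.intersection(b)

def hits_not(a, b):
    """@not: binary AND NOT, documents in a that are not in b."""
    return a.difference(b)

def hits_prox(a, b, pos_a, pos_b, distance=1):
    """@prox: the intersection, purged of documents where no pair of
    occurrences lies within `distance` word positions (assumed rule)."""
    return {
        doc for doc in a.intersection(b)
        if any(distance >= abs(i - j) for i in pos_a[doc] for j in pos_b[doc])
    }

# Hypothetical sample data: term -> documents, and document -> positions.
information = {1, 2, 3}
retrieval = {2, 3, 4}
information_pos = {1: [0], 2: [0], 3: [5]}
retrieval_pos = {2: [1], 3: [0], 4: [2]}

print(sorted(hits_or(information, retrieval)))
print(sorted(hits_and(information, retrieval)))
print(sorted(hits_not(information, retrieval)))
print(sorted(hits_prox(information, retrieval, information_pos, retrieval_pos)))
```
     </programlisting>
     <para>
      With this sample data the proximity hits are a subset of the AND
      hits, which in turn are a subset of the OR hits, mirroring the
      subset relations stated for the queries above.
     </para>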
 191     </sect3>
 192
 193
 194     <sect3 id="querymodel-atomic-queries">
 195      <title>Atomic queries</title>
 196      <para>
      Atomic queries are the query parts which work on one access point
      only. These consist of an <literal>attribute list</literal>
      followed by a <literal>single term</literal> or a
      <literal>quoted term list</literal>.
 201      </para>
 202      <para>
      Unsupplied non-use attributes of type 2-9 are either inherited from
      higher nodes in the query tree, or set to Zebra's default values.
 205       See <xref linkend="querymodel-bib1"/> for details.
 206      </para>
 207
 208      <table id="querymodel-atomic-queries-table">
 209       <caption>Atomic queries</caption>
 210        <!--
 211        <thead>
 212        <tr><td>one</td><td>two</td></tr>
 213       </thead>
 214        -->
 215        <tbody>
 216         <tr><td><emphasis>attribute list</emphasis></td>
 217          <td>List of <literal>orthogonal</literal> attributes</td>
        <td>Any of the orthogonal attribute types may be omitted;
         omitted attributes are inherited from higher query tree nodes or,
         if not inherited, are set to the default Zebra configuration values.
        </td>
 222         </tr>
 223         <tr><td><emphasis>term</emphasis></td>
 224          <td>single <literal>term</literal>
 225           or <literal>quoted term list</literal>   </td>
 226          <td>Here the search terms or list of search terms is added
 227           to the query</td>
 228         </tr>
 229        </tbody>
 230      </table>
 231      <para>
      Querying for the term <emphasis>information</emphasis> in the
      default index using the default attribute set, the server choice
      of access point/index, and the default non-use attributes:
 235       <screen>
 236        Z> find "information"
 237       </screen>
 238      </para>
 239      <para>
 240       Equivalent query fully specified:
 241       <screen>
 242        Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
 243       </screen>
 244      </para>
 245
 246      <para>
 247       Finding all documents which have empty titles. Notice that the
 248       empty term must be quoted, but is otherwise legal.
 249       <screen>
 250        Z> find @attr 1=4 ""
 251       </screen>
 252      </para>
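     <para>
      As a rough illustration of how an atomic query is assembled from an
      attribute list and a term, the following Python sketch builds PQF
      strings like the ones above. The helper names
      <literal>pqf_term</literal> and <literal>pqf_atomic</literal> are
      hypothetical and not part of Zebra or YAZ; terms that themselves
      contain double quotes are not handled.
     </para>
     <programlisting>
```python
def pqf_term(term):
    """Quote a term when it is empty or contains whitespace.

    A single word is passed through unquoted; the empty term must be
    quoted to be legal.
    """
    if term == "" or any(ch.isspace() for ch in term):
        return '"' + term + '"'
    return term


def pqf_atomic(term, attrs=None, attrset=None):
    """Build one atomic PQF query (hypothetical helper).

    attrs maps an attribute type (1 = use, 2 = relation, ...) to its
    value; omitted types are left for the server to inherit or default.
    """
    parts = []
    if attrset:
        parts.append("@attrset " + attrset)
    for attr_type, value in sorted((attrs or {}).items()):
        parts.append("@attr {}={}".format(attr_type, value))
    parts.append(pqf_term(term))
    return " ".join(parts)


# Empty title search and a fully specified query, as in the examples above.
print(pqf_atomic("", {1: 4}))
print(pqf_atomic("information",
                 {1: 1017, 2: 3, 3: 3, 4: 1, 5: 100, 6: 1},
                 attrset="bib-1"))
```
     </programlisting>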
 253
 254     </sect3>
 255
 256     <sect3 id="querymodel-use-string">
 257      <title>Zebra's special use attribute type 1 of form 'string'</title>
 258      <para>
      The numeric <literal>use (type 1)</literal> attribute is usually
      referred to from a given
      attribute set. In addition, Zebra lets you use
      <emphasis>any internal index
       name defined in your configuration</emphasis>
      as use attribute value. This is a great feature for
      debugging, and for cases where you do
      not need the complexity of defined use attribute values. It is
      the preferred way of accessing Zebra indexes directly.
 268      </para>
 269      <para>
      Finding all documents which have the term list "information
      retrieval" in a Zebra index, using its internal full string name:
 272       <screen>
 273        Z> find @attr 1=sometext "information retrieval"
 274       </screen>
 275      </para>
 276      <para>
      Searching the bib-1 use attribute 54 using its string name:
 278       <screen>
 279        Z> find @attr 1=Code-language eng
 280       </screen>
 281      </para>
 282      <para>
      Searching in any silly string index works, provided it is defined
      in your indexing rules and can be parsed by the PQF parser.
      This is definitely not the recommended use of
      this facility, as it might confuse your users with some very
      unexpected results.
 288       <screen>
 289        Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
 290       </screen>
 291      </para>
 292      <para>
 293       See <xref linkend="querymodel-bib1-mapping"/> for details, and
 294       <xref linkend="server-sru"/>
      for the SRU PQF query extension using string names as a fast
 296       debugging facility.
 297      </para>
 298     </sect3>
 299
 300     <sect3 id="querymodel-use-xpath">
 301      <title>Zebra's special use attribute type 1 of form 'XPath'
 302       for GRS filters</title>
 303      <para>
      As we have seen above, it is possible (albeit seldom a great
      idea) to emulate
      <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
      search by defining <literal>use (type 1)</literal>
      <emphasis>string</emphasis> attributes which in appearance
      <emphasis>resemble XPath queries</emphasis>. There are two
      problems with this approach: first, the XPath look-alike has to
      be defined at indexing time, so no new undefined
      XPath queries can be entered at search time; and second, it might
      confuse users very much that an XPath-like index name in fact
      gets populated from a possibly entirely different XML element
      than the one it pretends to access.
 316      </para>
 317      <para>
 318       When using the <literal>GRS Record Model</literal>
 319       (see  <xref linkend="record-model-grs"/>), we have the
      possibility to embed <emphasis>live</emphasis>
 321       XPath expressions
 322       in the PQF queries, which are here called
 323       <literal>use (type 1)</literal> <emphasis>xpath</emphasis>
 324       attributes. You must enable the
 325       <literal>xpath enable</literal> directive in your
 326       <literal>.abs</literal> config files.
 327      </para>
 328      <note>
 329       Only a <emphasis>very</emphasis> restricted subset of the
 330       <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
 331       standard is supported as the GRS record model is simpler than
 332       a full XML DOM structure. See the following examples for
 333       possibilities.
 334      </note>
 335      <para>
      Finding all documents which have the term "content"
      inside a text node found in a specific XML DOM
      <emphasis>subtree</emphasis>, whose starting element is
      addressed by XPath:
 340       <screen>
 341        Z> find @attr 1=/root content
 342        Z> find @attr 1=/root/first content
 343       </screen>
 344       <emphasis>Notice that the
 345        XPath must be absolute, i.e., must start with '/', and that the
       XPath <literal>descendant-or-self</literal> axis followed by a
 347        text node selection <literal>text()</literal> is implicitly
 348        appended to the stated XPath.
 349       </emphasis>
 350       It follows that the above searches are interpreted as:
 351       <screen>
 352        Z> find @attr 1=/root//text() content
 353        Z> find @attr 1=/root/first//text() content
 354       </screen>
 355      </para>
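     <para>
      The implicit rewriting described above can be mimicked with a small
      Python sketch. The function name
      <literal>interpreted_xpath</literal> is hypothetical; the sketch
      only reproduces the two stated rules (the XPath must be absolute,
      and <literal>//text()</literal> is appended) and says nothing about
      how Zebra implements them internally.
     </para>
     <programlisting>
```python
def interpreted_xpath(xpath):
    """Return the XPath as it is interpreted for a use (type 1) xpath
    attribute, following the two rules stated in the text."""
    # Rule 1: the XPath must be absolute, i.e. start with '/'.
    if not xpath.startswith("/"):
        raise ValueError("XPath use attribute must be absolute: " + xpath)
    # Rule 2: descendant-or-self plus text() is implicitly appended.
    return xpath + "//text()"


for path in ("/root", "/root/first"):
    print("find @attr 1={} content".format(interpreted_xpath(path)))
```
     </programlisting>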
 356
 357      <para>
      The addressing XPath can be filtered by a predicate working on exact
      string values in
      attributes (in the XML sense). The following query returns all documents
      which have the term "english" contained in one of the text subnodes of
      the subtree defined by the XPath
      <literal>/record/title[@lang='en']</literal>:
 364       <screen>
 365        Z> find @attr 1=/record/title[@lang='en'] english
 366       </screen>
 367      </para>
 368
 369      <para>
      Combining numeric indexes, boolean expressions,
      and XPath-based searches is possible:
 372       <screen>
 373        Z> find @attr 1=/record/title @and foo bar
 374        Z> find @and @attr 1=/record/title foo @attr 1=4 bar
 375       </screen>
 376      </para>
 377      <para>
 378       Escaping PQF keywords and other non-parseable XPath constructs
 379       with <literal>'{ }'</literal> to prevent syntax errors:
 380       <screen>
 381        Z> find @attr {1=/root/first[@attr='danish']} content
 382        Z> find @attr {1=/root/second[@attr='danish lake']}
 383        Z> find @attr {1=/root/third[@attr='dansk s\xc3\xb8']}
 384       </screen>
 385      </para>
 386      <warning>
      It is worth mentioning that these dynamically performed XPath
      queries are a performance bottleneck, as no optimized
      specialized indexes can be used. Therefore, avoid the use of
      this facility when speed is essential and the database content
      size is medium to large.
 392      </warning>
 393     </sect3>
 394
 395    </sect2>
 396
 397    <sect2 id="querymodel-exp1">
 398     <title>Explain Attribute Set</title>
 399     <para>
     The Z39.50 standard defines the
     <ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
     <literal>exp-1</literal>, which is used to discover information
     about a server's search semantics and functional capabilities.
     Zebra exposes a "classic"
     Explain database under the base name <literal>IR-Explain-1</literal>, which
     is populated with system internal information.
 407     </para>
 408    <para>
 409      The attribute-set <literal>exp-1</literal> consists of a single
 410      <literal>Use (type 1)</literal> attribute.
 411     </para>
 412     <para>
 413      In addition, the non-Use
 414      <literal>bib-1</literal> attributes, that is, the types
 415      <literal>Relation</literal>, <literal>Position</literal>,
 416      <literal>Structure</literal>, <literal>Truncation</literal>,
 417      and <literal>Completeness</literal> are imported from
 418      the <literal>bib-1</literal> attribute set, and may be used
 419      within any explain query.
 420     </para>
 421
 422     <sect3 id="querymodel-exp1-use">
 423     <title>Use Attributes (type = 1)</title>
 424      <para>
      The following Explain search attributes are supported:
      <literal>ExplainCategory</literal> (@attr 1=1),
      <literal>DatabaseName</literal> (@attr 1=3),
      <literal>DateAdded</literal> (@attr 1=9),
      <literal>DateChanged</literal> (@attr 1=10).
 430      </para>
 431      <para>
 432       A search in the use attribute  <literal>ExplainCategory</literal>
 433       supports only these predefined values:
 434       <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
 435       <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
 436      </para>
 437      <para>
 438       See <filename>tab/explain.att</filename> and the
 439       <ulink url="&url.z39.50;">Z39.50</ulink> standard
 440       for more information.
 441      </para>
 442     </sect3>
 443
 444     <sect3>
 445      <title>Explain searches with yaz-client</title>
 446      <para>
      Classic Explain only defines retrieval of Explain information
      via ASN.1. Practically no Z39.50 clients support this. Fortunately
      they don't have to, as Zebra allows retrieval of this information
      in other formats:
 451       <literal>SUTRS</literal>, <literal>XML</literal>,
 452       <literal>GRS-1</literal> and  <literal>ASN.1</literal> Explain.
 453      </para>
 454
 455      <para>
 456       List supported categories to find out which explain commands are
 457       supported:
 458       <screen>
 459        Z> base IR-Explain-1
 460        Z> find @attr exp1 1=1 categorylist
 461        Z> form sutrs
 462        Z> show 1+2
 463       </screen>
 464      </para>
 465
 466      <para>
 467       Get target info, that is, investigate which databases exist at
 468       this server endpoint:
 469       <screen>
 470        Z> base IR-Explain-1
 471        Z> find @attr exp1 1=1 targetinfo
 472        Z> form xml
 473        Z> show 1+1
 474        Z> form grs-1
 475        Z> show 1+1
 476        Z> form sutrs
 477        Z> show 1+1
 478       </screen>
 479      </para>
 480
 481      <para>
      List all supported databases. The number of hits
      is the number of databases found, which most commonly are the
      following two:
 485       the <literal>Default</literal> and the
 486       <literal>IR-Explain-1</literal> databases.
 487       <screen>
 488        Z> base IR-Explain-1
 489        Z> find @attr exp1 1=1 databaseinfo
 490        Z> form sutrs
 491        Z> show 1+2
 492       </screen>
 493      </para>
 494
 495      <para>
 496       Get database info record for database <literal>Default</literal>.
 497       <screen>
 498        Z> base IR-Explain-1
 499        Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
 500       </screen>
 501       Identical query with explicitly specified attribute set:
 502       <screen>
 503        Z> base IR-Explain-1
 504        Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
 505       </screen>
 506      </para>
 507
 508      <para>
 509       Get attribute details record for database
 510       <literal>Default</literal>.
 511       This query is very useful to study the internal Zebra indexes.
 512       If records have been indexed using the <literal>alvis</literal>
 513       XSLT filter, the string representation names of the known indexes can be
 514       found.
 515       <screen>
 516        Z> base IR-Explain-1
 517        Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
 518       </screen>
 519       Identical query with explicitly specified attribute set:
 520       <screen>
 521        Z> base IR-Explain-1
 522        Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
 523       </screen>
 524      </para>
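     <para>
      The Explain queries above all follow one pattern, which the
      following Python sketch makes explicit. The helper
      <literal>explain_query</literal> is hypothetical; the category names
      are the ones listed for <literal>ExplainCategory</literal>, and
      lowercasing them simply mirrors the spelling used in the examples
      above.
     </para>
     <programlisting>
```python
# Category values listed for the ExplainCategory use attribute.
EXPLAIN_CATEGORIES = ("CategoryList", "TargetInfo",
                      "DatabaseInfo", "AttributeDetails")


def explain_query(category, database=None):
    """Build a PQF query for the IR-Explain-1 database (hypothetical helper).

    category is searched in ExplainCategory (@attr exp1 1=1); if a
    database name is given, it is @and'ed in via DatabaseName
    (@attr exp1 1=3).
    """
    allowed = [name.lower() for name in EXPLAIN_CATEGORIES]
    if category.lower() not in allowed:
        raise ValueError("unsupported Explain category: " + category)
    query = "@attr exp1 1=1 " + category.lower()
    if database is not None:
        query = "@and {} @attr exp1 1=3 {}".format(query, database)
    return query


print(explain_query("CategoryList"))
print(explain_query("AttributeDetails", "Default"))
```
     </programlisting>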
 525     </sect3>
 526
 527    </sect2>
 528
 529    <sect2 id="querymodel-bib1">
 530     <title>Bib1 Attribute Set</title>
 531     <para>
 532      Something about querying to be written ..
 533     </para>
 534     <para>
     Most of the information contained in this section is an excerpt of
     the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
      SEMANTICS</literal>,
     found at <ulink url="&url.z39.50.attset.bib1.1995;">The BIB-1
      Attribute Set Semantics</ulink> from 1995, also available in an updated
     <ulink url="&url.z39.50.attset.bib1;">Bib-1
      Attribute Set</ulink>
     version from 2003. Index Data is not the copyright holder of this
     information.
 544     </para>
 545
 546
 547    <sect3 id="querymodel-bib1-use">
 548      <title>Use Attributes (type = 1)</title>
 549     </sect3>
 550
 551     <para>
 552      Phrase search for <emphasis>information retrieval</emphasis> in
 553      the title-register:
 554      <screen>
 555       Z> find @attr 1=4 "information retrieval"
 556      </screen>
 557     </para>
 558
 559
 560     <sect3 id="querymodel-bib1-relation">
 561      <title>Relation Attributes (type = 2)</title>
 562     </sect3>
 563     <para>
 564     </para>
 565
 566     <para>
 567      Ranked search for <emphasis>information retrieval</emphasis> in
 568      the title-register
     (see <xref linkend="administration-ranking"/> for the gory details):
 570      <screen>
 571       Z> find @attr 1=4 @attr 2=102 "information retrieval"
 572      </screen>
 573     </para>
 574
 575     <sect3 id="querymodel-bib1-position">
 576      <title>Position Attributes (type = 3)</title>
 577     </sect3>
 578
 579     <sect3 id="querymodel-bib1-structure">
 580      <title>Structure Attributes (type = 4)</title>
 581     </sect3>
 582
 583
 584     <para>
 585      For example, in
 586      the GILS schema (<literal>gils.abs</literal>), the
 587      west-bounding-coordinate is indexed as type <literal>n</literal>,
 588      and is therefore searched by specifying
 589      <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
 590      To match all those records with west-bounding-coordinate greater
 591      than -114 we use the following query:
 592      <screen>
 593       Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
 594      </screen>
 595     </para>
 596
 597     <sect3 id="querymodel-bib1-truncation">
 598      <title>Truncation Attributes (type = 5)</title>
 599     </sect3>
 600
 601     <sect3 id="querymodel-bib1-completeness">
 602     <title>Completeness Attributes (type = 6)</title>
 603     </sect3>
 604    </sect2>
 605
 606
 607    <sect2 id="querymodel-zebra-attr-search">
    <title>Zebra specific Search Extensions to all Attribute Sets</title>
 609     <para>
     Zebra extends the Bib1 attribute types, and these extensions are
     recognized regardless of the attribute
     set used in a <literal>search</literal> operation query.
 613     </para>
 614
 615      <table id="querymodel-zebra-attr-search-table">
      <caption>Zebra Search Attribute Extensions</caption>
 617        <thead>
 618         <tr>
 619          <td><emphasis>Name and Type</emphasis></td>
 620          <td>Operation</td>
 621          <td>Zebra version</td>
 622         </tr>
 623       </thead>
 624        <tbody>
 625         <tr>
 626          <td><emphasis>Embedded Sort (type 7)</emphasis></td>
 627          <td>search</td>
 628          <td>1.1</td>
 629         </tr>
 630         <tr>
 631          <td><emphasis>Term Set (type 8)</emphasis></td>
 632          <td>search</td>
 633          <td>1.1</td>
 634         </tr>
 635         <tr>
 636          <td><emphasis>Rank weight  (type 9)</emphasis></td>
 637          <td>search</td>
 638          <td>1.1</td>
 639         </tr>
 640         <tr>
 641          <td><emphasis>Approx Limit (type 9)</emphasis></td>
 642          <td>search</td>
 643          <td>1.4</td>
 644         </tr>
 645         <tr>
 646          <td><emphasis>Term Reference (type 10)</emphasis></td>
 647          <td>search</td>
 648          <td>1.4</td>
 649         </tr>
 650        </tbody>
 651       </table>
 652
 653     <sect3 id="querymodel-zebra-attr-sorting">
     <title>Zebra Extension Embedded Sort Attribute (type 7)</title>
 655     </sect3>
 656     <para>
     The embedded sort is a way to specify sorting within a query, thus
     removing the need to send a Sort Request separately. It is
     faster and does not require clients to deal with the Sort
     Facility.
 661     </para>
 662     <para>
     The possible values of the <literal>type 7</literal> attribute are
     <literal>1</literal> (ascending) and
     <literal>2</literal> (descending).
     The attributes+term (APT) node is separate from the
     rest and must be <literal>@or</literal>'ed.
     The term associated with the APT is the sorting level as an integer,
     where <literal>0</literal> means primary sort,
     <literal>1</literal> means secondary sort, and so forth.
 671      See also <xref linkend="administration-ranking"/>.
 672     </para>
 673     <para>
     For example, searching for water, sorted by title (ascending):
 675      <screen>
 676       Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
 677      </screen>
 678     </para>
 679     <para>
     Or, searching for water, sorted by title ascending, then date descending:
 681      <screen>
 682       Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
 683      </screen>
 684     </para>
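    <para>
     The nesting of the <literal>@or</literal> operators in the two
     examples above follows a regular pattern, sketched here in Python.
     The helper <literal>with_embedded_sort</literal> and the constant
     names are hypothetical; the sketch only generates the PQF text and
     assumes the base query is already valid PQF.
    </para>
    <programlisting>
```python
ASCENDING = 1   # value of attribute type 7 for ascending order
DESCENDING = 2  # value of attribute type 7 for descending order


def with_embedded_sort(query, sort_keys):
    """Wrap a PQF query with embedded sort APTs (hypothetical helper).

    sort_keys is a list of (use_attribute, direction) pairs in priority
    order. Each key becomes its own APT, @or'ed with everything built so
    far, and its term is the sort level: 0 primary, 1 secondary, ...
    """
    for level, (use_attr, direction) in enumerate(sort_keys):
        query = "@or {} @attr 7={} @attr 1={} {}".format(
            query, direction, use_attr, level)
    return query


base = "@attr 1=1016 water"
print(with_embedded_sort(base, [(4, ASCENDING)]))
print(with_embedded_sort(base, [(4, ASCENDING), (30, DESCENDING)]))
```
    </programlisting>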
 685
 686     <sect3 id="querymodel-zebra-attr-estimation">
     <title>Zebra Extension Term Set Attribute (type 8)</title>
 688     </sect3>
 689     <para>
     The Term Set feature is a facility that allows a search to store
     hitting terms in a "pseudo" result set, thus combining a search (as
     usual) with a scan-like facility. It requires a client that can handle
     named result sets, since the search generates two result sets. The
     value of attribute 8 is the name of a result set (string). The terms in
     the named term set are returned as SUTRS records.
 696     </para>
 697     <para>
     For example, searching for u in title, right truncated, and
     storing the result in a term set named 'aset':
 700      <screen>
 701       Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
 702      </screen>
 703     </para>
 704     <warning>
     The model has one serious flaw: we don't know the size of the term
     set. This feature is experimental. Do not use it in production code.
 707     </warning>
 708
 709     <sect3 id="querymodel-zebra-attr-weight">
     <title>Zebra Extension Rank Weight Attribute (type 9)</title>
 711     </sect3>
 712     <para>
     Rank weight is a way to pass a value to a ranking algorithm, so
     that one APT has one value while another has a different one.
 715      See also <xref linkend="administration-ranking"/>.
 716     </para>
 717     <para>
     For example, searching for utah in title with weight 30 as well
     as any with weight 20:
 720      <screen>
 721       Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
 722      </screen>
 723     </para>
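    <para>
     The structure of the weighted query above, one APT per weight joined
     by <literal>@or</literal> under relation 102, can be sketched in
     Python as follows. The helper <literal>weighted_or</literal> is
     hypothetical and only produces the PQF text; how the weights
     influence scoring depends on the ranking algorithm configured.
    </para>
    <programlisting>
```python
def weighted_or(term, weights):
    """Build a ranked PQF query with one weighted APT per entry
    (hypothetical helper).

    weights is a list of (use_attribute, weight) pairs; a use attribute
    of None leaves the access point unspecified for that APT.
    """
    if not weights:
        raise ValueError("at least one (use_attribute, weight) pair is needed")
    apts = []
    for use_attr, weight in weights:
        apt = "@attr 9={}".format(weight)
        if use_attr is not None:
            apt += " @attr 1={}".format(use_attr)
        apts.append(apt + " " + term)
    # Join the APTs pairwise with the binary @or operator.
    query = apts[0]
    for apt in apts[1:]:
        query = "@or {} {}".format(query, apt)
    # Relation 102 requests ranked search for the whole tree.
    return "@attr 2=102 " + query


print(weighted_or("utah", [(4, 30), (None, 20)]))
```
    </programlisting>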
 724
 725     <sect3 id="querymodel-zebra-attr-limit">
     <title>Zebra Extension Approximative Limit Attribute (type 9)</title>
 727     </sect3>
 728     <para>
     Newer Zebra versions normally estimate the hit count for every APT
     (leaf) in the query tree. These hit counts are returned as part of
     the searchResult-1 facility in the binary encoded Z39.50 search
     response packages.
 733     </para>
 734     <para>
     By setting a limit for the APT we can make Zebra switch to an
     approximate hit count when a certain hit count limit is
     reached. A value of zero means exact hit count.
 738     </para>
 739     <para>
     For example, we might be interested in the exact hit count for a, but
     for b we allow hit count estimates for 1000 and higher:
 742      <screen>
 743       Z> find @and a @attr 9=1000 b
 744      </screen>
 745     </para>
 746     <note>
     The estimated hit count facility makes searches faster, as
     large hit lists only need to be processed partially.
 749     </note>
 750     <warning>
     This facility clashes with rank weight, because there all
     documents in the hit lists need to be examined for scoring and
     re-sorting.
     It is an experimental
     extension. Do not use in production code.
 756     </warning>
 757
 758     <sect3 id="querymodel-zebra-attr-termref">
     <title>Zebra Extension Term Reference Attribute (type 10)</title>
 760     </sect3>
 761     <para>
     Zebra supports the searchResult-1 facility. If attribute 10 is
     given, it specifies a subqueryId value returned as part of the
     search result. It is a way for a client to name an APT part of a
     query.
 766     </para>
 767     <!--
 768     <para>
 769      <screen>
 770      </screen>
 771     </para>
 772     -->
 773     <warning>
 774      Experimental. Do not use in production code.
 775     </warning>
 776
 777
 778    </sect2>
 779
 780
 781    <sect2 id="querymodel-zebra-attr-scan">
    <title>Zebra specific Scan Extensions to all Attribute Sets</title>
 783     <para>
     Zebra extends the Bib1 attribute types, and these extensions are
     recognized regardless of the attribute
     set used in a <literal>scan</literal> operation query.
 787     </para>
 788      <table id="querymodel-zebra-attr-scan-table">
      <caption>Zebra Scan Attribute Extensions</caption>
 790        <thead>
 791         <tr>
 792          <td><emphasis>Name and Type</emphasis></td>
 793          <td>Operation</td>
 794          <td>Zebra version</td>
 795         </tr>
 796       </thead>
 797        <tbody>
 798         <tr>
 799          <td><emphasis>Result Set Narrow (type 8)</emphasis></td>
 800          <td>scan</td>
 801          <td>1.3</td>
 802         </tr>
 803         <tr>
 804          <td><emphasis>Approximative Limit (type 9)</emphasis></td>
 805          <td>scan</td>
 806          <td>1.4</td>
 807         </tr>
 808        </tbody>
 809       </table>
 810
 811     <sect3 id="querymodel-zebra-attr-xyz">
 812      <title>Zebra Extension Result Set Narrow (type 8)</title>
 813     </sect3>
 814     <para>
 815      If attribute 8 is given for scan, its value is the name of a
 816      result set. Each hit count in the scan response is @and'ed with
 817      the given result set.
 818     </para>
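          <para>
           A sketch of how this might be used from
           <literal>yaz-client</literal>, assuming that a title index
           (<literal>@attr 1=4</literal>) exists and that the preceding
           <literal>find</literal> created a result set named
           <literal>2</literal>. The set name depends on the client's
           numbering, and the terms are purely illustrative:
           <screen>
            Z> find @attr 1=4 amadeus
            Z> scan @attr 1=4 @attr 8=2 mozart
           </screen>
           The second command scans the title index around
           <literal>mozart</literal>, with each hit count combined with
           result set <literal>2</literal>.
          </para>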
 819     <!--
 820     <para>
 821      <screen>
 822      </screen>
 823     </para>
 824     -->
 825     <warning>
 826      Experimental and buggy. Definitely not to be used in production code.
 827     </warning>
 828
 829     <sect3 id="querymodel-zebra-attr-xyz">
 830      <title>Zebra Extension Approximative Limit (type 9)</title>
 831     </sect3>
 832     <para>
 833      As for search, the approximative limit enables approximate
 834      hit counts in scan responses.
 835     </para>
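          <para>
           By analogy with the search usage of attribute 9, a scan using
           this attribute might look as follows. This is a sketch only:
           the limit value <literal>1000</literal>, the title index and
           the scan term are assumptions chosen for illustration.
           <screen>
            Z> scan @attr 1=4 @attr 9=1000 mozart
           </screen>
          </para>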
 836     <!--
 837     <para>
 838      <screen>
 839      </screen>
 840     </para>
 841     -->
 842     <warning>
 843      Experimental. Do not use in production code.
 844     </warning>
 845
 846
 847    </sect2>
 848
 849
 850    <sect2 id="querymodel-bib1-mapping">
 851     <title>Mapping from Bib1 Attributes to Zebra internal
 852      register indexes</title>
 853     <para>
 854      TO-DO
 855      </para>
 856
 857
 858      <!-- see in util/zebramap.c
 859       int zebra_maps_attr
 860
 861   if (completeness_value == 2 || completeness_value == 3)
 862         *complete_flag = 1;
 863     else
 864         *complete_flag = 0;
 865     *reg_id = 0;
 866
 867     *sort_flag =(sort_relation_value > 0) ? 1 : 0;
 868     *search_type = "phrase";
 869     strcpy(rank_type, "void");
 870     if (relation_value == 102)
 871     {
 872         if (weight_value == -1)
 873             weight_value = 34;
 874         sprintf(rank_type, "rank,w=%d,u=%d", weight_value, use_value);
 875     }
 876     if (relation_value == 103)
 877     {
 878         *search_type = "always";
 879         *reg_id = 'w';
 880         return 0;
 881     }
 882     if (*complete_flag)
 883         *reg_id = 'p';
 884     else
 885         *reg_id = 'w';
 886     switch (structure_value)
 887     {
 888     case 6:   /* word list */
 889         *search_type = "and-list";
 890         break;
 891     case 105: /* free-form-text */
 892         *search_type = "or-list";
 893         break;
 894     case 106: /* document-text */
 895         *search_type = "or-list";
 896         break;
 897     case -1:
 898     case 1:   /* phrase */
 899     case 2:   /* word */
 900     case 108: /* string */
 901         *search_type = "phrase";
 902         break;
 903    case 107: /* local-number */
 904         *search_type = "local";
 905         *reg_id = 0;
 906         break;
 907     case 109: /* numeric string */
 908         *reg_id = 'n';
 909         *search_type = "numeric";
 910         break;
 911     case 104: /* urx */
 912         *reg_id = 'u';
 913         *search_type = "phrase";
 914         break;
 915     case 3:   /* key */
 916         *reg_id = '0';
 917         *search_type = "phrase";
 918         break;
 919     case 4:  /* year */
 920         *reg_id = 'y';
 921         *search_type = "phrase";
 922         break;
 923     case 5:  /* date */
 924         *reg_id = 'd';
 925         *search_type = "phrase";
 926         break;
 927     default:
 928         return -1;
 929     }
 930     return 0;
 931
 932      -->
 933
 934
 935     <para>
 936      <emphasis>Use</emphasis> attributes are interpreted according to the
 937      attribute sets which have been loaded in the
 938     <literal>zebra.cfg</literal> file, and are matched against specific
 939      fields as specified in the <literal>.abs</literal> file which
 940      describes the profile of the records which have been loaded.
 941      If no Use attribute is provided, a default of Bib-1 Any is assumed.
 942     </para>
 943
 944     <para>
 945      If a <emphasis>Structure</emphasis> attribute of
 946      <emphasis>Phrase</emphasis> is used in conjunction with a
 947      <emphasis>Completeness</emphasis> attribute of
 948      <emphasis>Complete (Sub)field</emphasis>, the term is matched
 949      against the contents of the phrase (long word) register, if one
 950      exists for the given <emphasis>Use</emphasis> attribute.
 951      A phrase register is created for those fields in the
 952      <literal>.abs</literal> file that contain a
 953      <literal>p</literal>-specifier.
 954      <!-- ### whatever the hell _that_ is -->
 955     </para>
 956
 957     <para>
 958      If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
 959      used in conjunction with <emphasis>Incomplete Field</emphasis> (the
 960      default value for <emphasis>Completeness</emphasis>), the
 961      search is directed against the normal word registers. If the term
 962      contains multiple words, it only matches if all of the words
 963      are found immediately adjacent, and in the given order.
 964      The word search is performed on those fields that are indexed as
 965      type <literal>w</literal> in the <literal>.abs</literal> file.
 966     </para>
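
         <para>
          The two phrase cases can be sketched in PQF as follows, using
          the standard Bib-1 numbering (Structure is type 4, Completeness
          is type 6). The title index (<literal>@attr 1=4</literal>) and
          the search term are assumptions made for illustration, and the
          first query only works if a phrase register exists for that
          index:
          <screen>
           Z> find @attr 1=4 @attr 4=1 @attr 6=3 "information retrieval"
           Z> find @attr 1=4 @attr 4=1 @attr 6=1 "information retrieval"
          </screen>
          The first query asks for <emphasis>Complete Field</emphasis>
          and is therefore matched against the phrase register. The
          second asks for <emphasis>Incomplete Subfield</emphasis> and is
          matched against the word register, requiring the two words to
          be adjacent and in the given order.
         </para>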
 967
 968     <para>
 969      If the <emphasis>Structure</emphasis> attribute is
 970      <emphasis>Word List</emphasis>,
 971      <emphasis>Free-form Text</emphasis>, or
 972      <emphasis>Document Text</emphasis>, the term is treated as a
 973      natural-language, relevance-ranked query.
 974      This search type uses the word register, i.e. those fields
 975      that are indexed as type <literal>w</literal> in the
 976      <literal>.abs</literal> file.
 977     </para>
 978
 979     <para>
 980      If the <emphasis>Structure</emphasis> attribute is
 981      <emphasis>Numeric String</emphasis>, the term is treated as an integer.
 982      The search is performed on those fields that are indexed
 983      as type <literal>n</literal> in the <literal>.abs</literal> file.
 984     </para>
 985
 986     <para>
 987      If the <emphasis>Structure</emphasis> attribute is
 988      <emphasis>URx</emphasis>, the term is treated as a URX (URL) entity.
 989      The search is performed on those fields that are indexed as type
 990      <literal>u</literal> in the <literal>.abs</literal> file.
 991     </para>
 992
 993     <para>
 994      If the <emphasis>Structure</emphasis> attribute is
 995      <emphasis>Local Number</emphasis>, the term is treated as a
 996      native Zebra Record Identifier.
 997     </para>
 998
 999     <para>
1000      If the <emphasis>Relation</emphasis> attribute is
1001      <emphasis>Equals</emphasis> (default), the term is matched
1002      in a normal fashion (modulo truncation and processing of
1003      individual words, if required).
1004      If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
1005      <emphasis>Less Than or Equal</emphasis>,
1006      <emphasis>Greater Than</emphasis>, or <emphasis>Greater Than or
1007       Equal</emphasis>, the term is assumed to be numerical, and a
1008      standard regular expression is constructed to match the given
1009      expression.
1010      If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
1011      the standard natural-language query processor is invoked.
1012     </para>
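
         <para>
          As an illustration of the <emphasis>Relation</emphasis>
          attribute (type 2), the following sketch uses the standard
          Bib-1 values 4 (<emphasis>Greater Than or Equal</emphasis>)
          and 102 (<emphasis>Relevance</emphasis>). It assumes that a
          numeric index exists for the Bib-1 Use attribute 31
          (date of publication) and a word index for Use attribute 4
          (title). Both indexes and the search terms are hypothetical:
          <screen>
           Z> find @attr 1=31 @attr 2=4 @attr 4=109 1990
           Z> find @attr 1=4 @attr 2=102 "information retrieval"
          </screen>
         </para>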
1013
1014     <para>
1015      For the <emphasis>Truncation</emphasis> attribute,
1016      <emphasis>No Truncation</emphasis> is the default.
1017      <emphasis>Left Truncation</emphasis> is not supported.
1018      <emphasis>Process # in search term</emphasis> is supported, as is
1019      <emphasis>Regxp-1</emphasis>.
1020      <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
1021      search. By default, a single error (deletion, insertion,
1022      replacement) is accepted when terms are matched against the register
1023      contents.
1024     </para>
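
         <para>
          The supported truncation values can be sketched as follows,
          using the standard Bib-1 numbering (101 for
          <emphasis>Process # in search term</emphasis>, 103 for
          <emphasis>Regxp-2</emphasis>). The title index and the terms
          are assumptions made for illustration. In the second query the
          term is deliberately misspelled, so it could only match
          <literal>information</literal> through the single-error
          tolerance:
          <screen>
           Z> find @attr 1=4 @attr 5=101 "comput#"
           Z> find @attr 1=4 @attr 5=103 "informaton"
          </screen>
         </para>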
1025    </sect2>
1026
1027    <sect2  id="querymodel-regular">
1028     <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
1029
1030     <para>
1031      Each term in a query is interpreted as a regular expression if
1032      the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
1033      or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
1034      Both query types follow the same syntax with the operands:
1035     </para>
1036
1037      <table id="querymodel-regular-operands-table">
1038       <caption>Regular Expression Operands</caption>
1039        <!--
1040        <thead>
1041        <tr><td>one</td><td>two</td></tr>
1042       </thead>
1043        -->
1044        <tbody>
1045         <tr>
1046          <td><emphasis>x</emphasis></td>
1047          <td>Matches the character <emphasis>x</emphasis>.</td>
1048         </tr>
1049         <tr>
1050          <td><emphasis>.</emphasis></td>
1051          <td>Matches any character.</td>
1052         </tr>
1053         <tr>
1054          <td><emphasis>[ .. ]</emphasis></td>
1055          <td>Matches the set of characters specified;
1056          such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
1057         </tr>
1058        </tbody>
1059       </table>
1060
1061     <para>
1062      The above operands can be combined with the following operators:
1063     </para>
1064
1065
1066      <table id="querymodel-regular-operators-table">
1067       <caption>Regular Expression Operators</caption>
1068        <!--
1069        <thead>
1070        <tr><td>one</td><td>two</td></tr>
1071       </thead>
1072        -->
1073        <tbody>
1074         <tr>
1075          <td><emphasis>x*</emphasis></td>
1076          <td>Matches <emphasis>x</emphasis> zero or more times.
1077           Priority: high.</td>
1078         </tr>
1079         <tr>
1080          <td><emphasis>x+</emphasis></td>
1081          <td>Matches <emphasis>x</emphasis> one or more times.
1082           Priority: high.</td>
1083         </tr>
1084         <tr>
1085          <td><emphasis>x?</emphasis></td>
1086          <td> Matches <emphasis>x</emphasis> zero times or once.
1087           Priority: high.</td>
1088         </tr>
1089         <tr>
1090          <td><emphasis>xy</emphasis></td>
1091          <td> Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
1092          Priority: medium.</td>
1093         </tr>
1094         <tr>
1095          <td><emphasis>x|y</emphasis></td>
1096          <td> Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
1097          Priority: low.</td>
1098         </tr>
1099         <tr>
1100          <td><emphasis>( )</emphasis></td>
1101          <td>The order of evaluation may be changed by using parentheses.</td>
1102         </tr>
1103        </tbody>
1104       </table>
1105
1106     <para>
1107      If the first character of the <emphasis>Regxp-2</emphasis> query
1108      is a plus character (<literal>+</literal>), it marks the
1109      beginning of a section with non-standard specifiers.
1110      The next plus character marks the end of the section.
1111      Currently Zebra only supports one specifier, the error tolerance,
1112      which consists of one digit.
1113     </para>
1114
1115     <para>
1116      Since the plus operator is normally a suffix operator, this addition
1117      to the query syntax does not conflict with the syntax of standard
1118      regular expressions.
1119     </para>
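
         <para>
          Following the description of the specifier section, a
          <emphasis>Regxp-2</emphasis> query that raises the error
          tolerance to two might be written as shown here. This is a
          sketch derived from that description only. The title index
          and the misspelled term are assumptions made for illustration:
          <screen>
           Z> find @attr 1=4 @attr 5=103 "+2+informaton"
          </screen>
         </para>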
1120
1121     <para>
1122      For example, a phrase search with regular expressions in
1123      the title-register is performed like this:
1124      <screen>
1125       Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
1126      </screen>
1127     </para>
1128
1129     <para>
1130      Combinations with other attributes are possible. For example, a
1131      ranked search with a regular expression
1132      (see <xref linkend="administration-ranking"/> for the gory details):
1133      <screen>
1134       Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
1135      </screen>
1136     </para>
1137    </sect2>
1138
1139
1140    <!--
1141    <para>
1142     The RecordType parameter in the <literal>zebra.cfg</literal> file, or
1143     the <literal>-t</literal> option to the indexer tells Zebra how to
1144     process input records.
1145     Two basic types of processing are available - raw text and structured
1146     data. Raw text is just that, and it is selected by providing the
1147     argument <emphasis>text</emphasis> to Zebra. Structured records are
1148     all handled internally using the basic mechanisms described in the
1149     subsequent sections.
1150     Zebra can read structured records in many different formats.
1151    </para>
1152    -->
1153   </sect1>
1154
1155
1156   <sect1 id="querymodel-cql-to-pqf">
1157    <title>Server Side CQL to PQF Query Translation</title>
1158    <para>
1159     Using the
1160     <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
1161       YAZ Frontend Virtual
1162     Hosts option, one can configure
1163     the YAZ Frontend CQL-to-PQF
1164     converter, specifying the interpretation of various
1165     <ulink url="&url.cql;">CQL</ulink>
1166     indexes, relations, etc. in terms of Type-1 query attributes.
1167     <!-- The  yaz-client config file -->
1168    </para>
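        <para>
         A hypothetical excerpt of such a mapping file is sketched here
         to show the general shape of the entries. The selection of
         indexes and the attribute values are assumptions made for
         illustration, and must be adapted to the indexes actually
         defined in the Zebra configuration:
         <screen>
          index.cql.serverChoice    = 1=1016
          index.dc.title            = 1=4
          relation.eq               = 2=3
          relationModifier.relevant = 2=102
         </screen>
         Each entry maps a CQL index, relation or relation modifier to
         one or more Type-1 attributes, written as
         <literal>type=value</literal>.
        </para>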
1169    <para>
1170     For example, using server-side CQL-to-PQF conversion, one might
1171     query a Zebra server like this:
1172     <screen>
1173     <![CDATA[
1174      yaz-client localhost:9999
1175      Z> querytype cql
1176      Z> find text=(plant and soil)
1177      ]]>
1178     </screen>
1179      and, if properly configured, even static relevance ranking can
1180      be performed using CQL query syntax:
1181     <screen>
1182     <![CDATA[
1183      Z> find text = /relevant (plant and soil)
1184      ]]>
1185      </screen>
1186    </para>
1187
1188    <para>
1189     The same configuration can also be used to
1190     search using client-side CQL-to-PQF conversion.
1191     The only differences are <literal>querytype cql2rpn</literal>
1192     instead of
1193     <literal>querytype cql</literal>, and the <literal>-q</literal>
1194     option, which specifies a local conversion file:
1195     <screen>
1196     <![CDATA[
1197      yaz-client -q local/cql2pqf.txt localhost:9999
1198      Z> querytype cql2rpn
1199      Z> find text=(plant and soil)
1200      ]]>
1201      </screen>
1202    </para>
1203
1204    <para>
1205     Exhaustive information can be found in the
1206     section "Specification of CQL to RPN mappings" in the YAZ manual,
1207     <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
1208      http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
1209    and is therefore not repeated here.
1210    </para>
1211   <!--
1212   <para>
1213     See
1214       <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
1215       http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
1216     for the Maintenance Agency's work-in-progress mapping of Dublin Core
1217     indexes to Attribute Architecture (util, XD and BIB-2)
1218     attributes.
1219    </para>
1220    -->
1221  </sect1>
1222
1223
1224
1225 </chapter>
1226
1227  <!-- Keep this comment at the end of the file
1228  Local variables:
1229  mode: sgml
1230  sgml-omittag:t
1231  sgml-shorttag:t
1232  sgml-minimize-attributes:nil
1233  sgml-always-quote-attributes:t
1234  sgml-indent-step:1
1235  sgml-indent-data:t
1236  sgml-parent-document: "zebra.xml"
1237  sgml-local-catalogs: nil
1238  sgml-namecase-general:t
1239  End:
1240  -->