<chapter id="querymodel">
 <!-- $Id: querymodel.xml,v 1.2 2006-06-13 13:45:08 marc Exp $ -->
 <title>Query Model</title>

  <sect1 id="querymodel-overview">
   <title>Query Model Overview</title>

   <para>
    Zebra was born as a networked Information Retrieval engine adhering
    to the international standards
    <ulink url="&url.z39.50;">Z39.50</ulink> and
    <ulink url="&url.sru;">SRU</ulink>,
    and implements the query model defined there.
    Unfortunately, the Z39.50 query model only defines a binary
    encoded representation, which is used as transport packaging in
    the Z39.50 protocol layer. This representation is not human
    readable, nor does it define any convenient way to specify queries.
   </para>
   <para>
    Therefore, Index Data has defined a textual representation in the
    <literal>Prefix Query Format</literal>, or
    <literal>PQF</literal> for short, which has since been adopted by other
    parties developing Z39.50 software. It is also often referred to as
    <literal>Prefix Query Notation</literal>, or
    <literal>PQN</literal> for short, and is thoroughly explained in
    <xref linkend="querymodel-pqf"/>.
   </para>

   <para>
    In addition, Zebra can be configured to understand and map the
    <literal>Common Query Language</literal>
    (<ulink url="&url.cql;">CQL</ulink>)
    to PQF. An introduction to the mapping to the internal query
    representation is given in
    <xref linkend="querymodel-cql-to-pqf"/>.
   </para>
  </sect1>

  <sect1 id="querymodel-pqf">
   <title>Prefix Query Format structure and syntax</title>
   <para>
    The
    <ulink url="&url.yaz.pqf;">PQF
    grammar</ulink> is documented in the YAZ manual, and shall not be
    repeated here.
    During search, this textual PQF representation
    is always mapped to the equivalent Zebra internal
    query parse tree.
   </para>

   <sect2 id="querymodel-pqf-tree">
    <title>PQF tree structure</title>
   <para>
    The PQF parse tree - or the equivalent textual representation -
    may start with one specification of the
    <emphasis>attribute set</emphasis> used. It is followed by a query
    tree, which
    consists of <emphasis>atomic query parts</emphasis>, possibly
    paired by <emphasis>boolean binary operators</emphasis>, and
    finally <emphasis>recursively combined</emphasis> into
    complex query trees.
   </para>
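   <para>
    To make the tree structure concrete, the following Python sketch parses a
    small subset of PQF into nested tuples: an optional
    <literal>@attrset</literal>, the operators <literal>@and</literal>,
    <literal>@or</literal> and <literal>@not</literal>,
    <literal>@attr type=value</literal> pairs, and plain or double-quoted
    terms. The function <function>parse_pqf</function> and its tuple layout
    are hypothetical illustrations, not the internal representation used by
    Zebra or YAZ; <literal>@prox</literal>, per-attribute attribute sets and
    escape sequences are deliberately left out.
   </para>
   <programlisting><![CDATA[
```python
import shlex

BOOLEAN_OPS = {"@and", "@or", "@not"}


def parse_pqf(query):
    """Parse a small PQF subset into (attrset, tree). Hypothetical helper."""
    # shlex keeps a double-quoted term list together and strips the quotes,
    # so '"information retrieval"' becomes a single token.
    tokens = shlex.split(query)
    pos = 0
    attrset = None
    if tokens and tokens[0] == "@attrset":
        attrset = tokens[1]
        pos = 2

    def node(pos):
        # Returns (subtree, next_position).
        tok = tokens[pos]
        if tok in BOOLEAN_OPS:
            # Prefix notation: operator first, then two operand subtrees.
            left, pos = node(pos + 1)
            right, pos = node(pos)
            return (tok[1:], left, right), pos
        attrs = []
        while tokens[pos] == "@attr":
            attr_type, value = tokens[pos + 1].split("=", 1)
            attrs.append((int(attr_type), value))
            pos += 2
        # What remains is the single term or quoted term list.
        return ("term", tuple(attrs), tokens[pos]), pos + 1

    tree, pos = node(pos)
    if pos != len(tokens):
        raise ValueError("trailing tokens: %r" % tokens[pos:])
    return attrset, tree
```
]]></programlisting>
   <para>
    For instance, <literal>parse_pqf('@or information retrieval')</literal>
    returns <literal>(None, ('or', ('term', (), 'information'), ('term', (),
    'retrieval')))</literal>; the recursion in <function>node</function>
    mirrors the recursive combination of subtrees described above.
   </para>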

   <sect3 id="querymodel-attribute-sets">
    <title>Attribute sets</title>
    <para>
      Attribute sets define the exact meaning and semantics of queries
      issued. Zebra comes with some predefined attribute set
      definitions; others can easily be defined and added to the
      configuration.
      <note>
      The Zebra internal query processing is modeled after
      the <literal>Bib1</literal> attribute set, and the non-use
      attributes type 2-9 are hard-wired in. It is therefore essential
      to be familiar with <xref linkend="querymodel-bib1"/>.
    </note>
   </para>

   <table id="querymodel-attribute-sets-table">
    <caption>Attribute sets predefined in Zebra</caption>
     <!--
     <thead>
      <tr><td>one</td><td>two</td></tr>
     </thead>
     -->
     <tbody>
      <tr>
       <td><emphasis>exp-1</emphasis></td>
       <td><literal>Explain</literal> attribute set</td>
       <td>Special attribute set used on the special automagic
       <literal>IR-Explain-1</literal> database to gain information on
       server capabilities, database names, and database
       semantics.</td>
      </tr>
      <tr>
       <td><emphasis>bib-1</emphasis></td>
       <td><literal>Bib1</literal> attribute set</td>
       <td>Standard PQF query language attribute set which defines the
       semantics of Z39.50 searching. In addition, all of the
       non-use attributes (type 2-9) define the Zebra internal query
       processing.</td>
      </tr>
      <tr>
       <td><emphasis>gils</emphasis></td>
       <td><literal>GILS</literal> attribute set</td>
       <td>Extension to the <literal>Bib1</literal> attribute set.</td>
      </tr>
     </tbody>
   </table>
   </sect3>

   <sect3 id="querymodel-boolean-operators">
    <title>Boolean operators</title>
    <para>
      A pair of subquery trees, or of atomic queries, is combined
      using the standard boolean operators into new query trees.
    </para>

   <table id="querymodel-boolean-operators-table">
    <caption>Boolean operators</caption>
     <!--
     <thead>
      <tr><td>one</td><td>two</td></tr>
     </thead>
     -->
     <tbody>
      <tr><td><emphasis>@and</emphasis></td>
          <td>binary <literal>AND</literal> operator</td>
          <td>Set intersection of the hit sets of two atomic queries</td>
      </tr>
      <tr><td><emphasis>@or</emphasis></td>
          <td>binary <literal>OR</literal> operator</td>
          <td>Set union of the hit sets of two atomic queries</td>
      </tr>
      <tr><td><emphasis>@not</emphasis></td>
          <td>binary <literal>AND NOT</literal> operator</td>
          <td>Set difference of the hit sets of two atomic queries: the
              hits of the first operand that are not hits of the
              second</td>
      </tr>
      <tr><td><emphasis>@prox</emphasis></td>
          <td>binary <literal>PROXIMITY</literal> operator</td>
          <td>Set intersection of the hit sets of two atomic queries. In
              addition, the intersection set is purged of all
              documents which do not satisfy the requested query
              term proximity. Usually a proper subset of the AND
              operation.</td>
      </tr>
     </tbody>
   </table>
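   <para>
    The set semantics of <literal>@and</literal>, <literal>@or</literal> and
    <literal>@not</literal> can be illustrated with Python sets of record
    identifiers. The hit sets below are invented sample data and
    <function>evaluate</function> is a hypothetical helper;
    <literal>@prox</literal> is not shown because it needs term positions,
    not just record identifiers.
   </para>
   <programlisting><![CDATA[
```python
# Invented hit sets: record identifiers found for each atomic query.
hits = {
    "information": {1, 2, 3, 5},
    "retrieval": {2, 3, 4},
}


def evaluate(op, left, right):
    """Combine two hit sets the way the PQF boolean operators do."""
    if op == "@and":
        return left & right   # set intersection
    if op == "@or":
        return left | right   # set union
    if op == "@not":
        return left - right   # AND NOT: in left, but not in right
    raise ValueError("unsupported operator: " + op)


a, b = hits["information"], hits["retrieval"]
print(sorted(evaluate("@and", a, b)))  # [2, 3]
print(sorted(evaluate("@or", a, b)))   # [1, 2, 3, 4, 5]
print(sorted(evaluate("@not", a, b)))  # [1, 5]
```
]]></programlisting>
   <para>
    Unlike the other two operators, <literal>@not</literal> is not
    symmetric: swapping the operands above yields only record 4.
   </para>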

   <para>
      For example, we can combine the terms
      <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
      into different searches in the default index of the default
      attribute set as follows.
      Querying for the union of all documents containing the
      terms <emphasis>information</emphasis> OR
      <emphasis>retrieval</emphasis>:
     <screen>
       @or information retrieval
     </screen>
   </para>
   <para>
      Querying for the intersection of all documents containing the
      terms <emphasis>information</emphasis> AND
      <emphasis>retrieval</emphasis>.
      The hit set is a subset of the corresponding
      OR query:
     <screen>
       @and information retrieval
     </screen>
   </para>
   <para>
      Querying for the intersection of all documents containing the
      terms <emphasis>information</emphasis> AND
      <emphasis>retrieval</emphasis>, taking proximity into account.
      The hit set is a subset of the corresponding
      AND query. The proximity operator takes six parameters before its
      two operands (exclusion, distance, ordered, relation, which-code and
      unit-code, see the PQF grammar in the YAZ manual); here they
      request that <emphasis>retrieval</emphasis> follows
      <emphasis>information</emphasis> at a distance of at most
      3 words:
     <screen>
       @prox 0 3 1 2 k 2 information retrieval
     </screen>
   </para>
   <para>
      Querying for the intersection of all documents containing the
      terms <emphasis>information</emphasis> AND
      <emphasis>retrieval</emphasis>, in the same order and next to each
      other as given in the term list.
      The hit set is a subset of the corresponding
      PROXIMITY query:
    <screen>
       "information retrieval"
     </screen>
   </para>
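   <para>
    The difference between <literal>@and</literal>, a proximity condition and
    a phrase can be illustrated with the word positions of the two terms
    inside a single record. The hypothetical Python function
    <function>within_distance</function> below sketches an ordered proximity
    test with the relation <emphasis>less than or equal</emphasis> and the
    unit <emphasis>word</emphasis>; a phrase then behaves like an ordered test
    with distance 1. This illustrates the concept only and is not Zebra's
    implementation.
   </para>
   <programlisting><![CDATA[
```python
def within_distance(positions_a, positions_b, distance, ordered=True):
    """True if an occurrence of term b lies within `distance` words of term a.

    positions_a / positions_b are 0-based word positions of the two terms
    in one record. With ordered=True, b must follow a; otherwise either
    order is accepted.
    """
    for pa in positions_a:
        for pb in positions_b:
            gap = pb - pa
            if not ordered:
                gap = abs(gap)
            if 0 < gap <= distance:
                return True
    return False


# "information storage and retrieval": information=0, retrieval=3
print(within_distance([0], [3], distance=3))  # True  (proximity hit)
print(within_distance([0], [3], distance=1))  # False (not a phrase)
# "information retrieval": information=0, retrieval=1
print(within_distance([0], [1], distance=1))  # True  (phrase hit)
```
]]></programlisting>
   <para>
    A record that passes <literal>@and</literal> but fails this test is
    exactly the kind of record that proximity purges from the hit set.
   </para>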
  </sect3>


   <sect3 id="querymodel-atomic-queries">
    <title>Atomic queries</title>
    <para>
      Atomic queries are the query parts which work on one access point
      only. These consist of <literal>an attribute list</literal>
      followed by a <literal>single term</literal> or a
      <literal>quoted term list</literal>.
    </para>
    <para>
      Non-use attributes of type 2-9 that are not supplied are either
      inherited from higher nodes in the query tree, or are set to Zebra's
      default values.
      See <xref linkend="querymodel-bib1"/> for details.
    </para>

   <table id="querymodel-atomic-queries-table">
    <caption>Atomic queries</caption>
     <!--
     <thead>
      <tr><td>one</td><td>two</td></tr>
     </thead>
     -->
     <tbody>
      <tr><td><emphasis>attribute list</emphasis></td>
          <td>List of <literal>orthogonal</literal> attributes</td>
          <td>Any of the orthogonal attribute types may be omitted;
          these are inherited from higher query tree nodes, or if not
          inherited, are set to the default Zebra configuration values.
       </td>
      </tr>
      <tr><td><emphasis>term</emphasis></td>
          <td>single <literal>term</literal>
        or <literal>quoted term list</literal></td>
          <td>The search term, or list of search terms, added
          to the query</td>
      </tr>
     </tbody>
   </table>
   <para>
      Querying for the term <emphasis>information</emphasis> in the
      default index using the default attribute set, the server choice
      of access point/index, and the default non-use attributes:
    <screen>
       "information"
     </screen>
   </para>
   <para>
    The equivalent fully specified query:
      <screen>
       @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
      </screen>
   </para>
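   <para>
    The inheritance rule can be sketched as a merge of three layers: Zebra's
    defaults, the attributes inherited from a higher node of the query tree,
    and the attributes given on the atomic query itself, with the most
    specific layer winning. The default values below are copied from the
    fully specified query above; the functions
    <function>effective_attributes</function> and <function>to_pqf</function>
    are hypothetical, and the real defaults depend on the Zebra
    configuration.
   </para>
   <programlisting><![CDATA[
```python
# Defaults copied from the fully specified example query:
# @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1
ZEBRA_DEFAULTS = {1: "1017", 2: "3", 3: "3", 4: "1", 5: "100", 6: "1"}


def effective_attributes(own, inherited=None, defaults=ZEBRA_DEFAULTS):
    """Merge {type: value} dicts; later (more specific) layers win."""
    merged = dict(defaults)
    merged.update(inherited or {})
    merged.update(own)
    return merged


def to_pqf(attrs, term):
    """Render an atomic query with its attributes in type order."""
    parts = ["@attr %d=%s" % (t, v) for t, v in sorted(attrs.items())]
    return " ".join(parts + ['"%s"' % term])


print(to_pqf(effective_attributes({}), "information"))
# @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
print(to_pqf(effective_attributes({1: "4"}, inherited={5: "1"}), "information"))
# @attr 1=4 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=1 @attr 6=1 "information"
```
]]></programlisting>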

   <para>
    Finding all documents which have empty titles. Notice that the
    empty term must be quoted, but is otherwise legal.
      <screen>
       @attr 1=4 ""
      </screen>
   </para>

  </sect3>

    <sect3 id="querymodel-use-string">
     <title>Zebra's special use attribute of type 'string'</title>
     <para>
      The numeric <literal>use (type 1)</literal> attribute is usually
      referred to from a given
      attribute set. In addition, Zebra lets you use
      <emphasis>any internal index
      name defined in your configuration</emphasis>
      as the use attribute value. This is a great feature for
      debugging, and when you do
      not need the complexity of defined use attribute values. It is
      the preferred way of accessing Zebra indexes directly.
     </para>
     <para>
      Finding all documents which have the term list "information
      retrieval" in a Zebra index, using its internal full string name:
      <screen>
       @attr 1=sometext "information retrieval"
      </screen>
   </para>
     <para>
      Searching the bib-1 use attribute 54 using its string name:
      <screen>
       @attr 1=Code-language eng
      </screen>
   </para>
     <para>
      Searching in any silly string index - if it is defined in your
      indexing rules and can be parsed by the PQF parser.
      This is definitely not the recommended use of
      this facility, as it might confuse your users with some very
      unexpected results.
      <screen>
       @attr 1=silly/xpath/alike[@index]/name "information retrieval"
      </screen>
   </para>
   <para>
      See <xref linkend="querymodel-bib1-mapping"/> for details, and
      <xref linkend="server-sru"/>
      for the SRU PQF query extension using string names as a fast
      debugging facility.
   </para>
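   <para>
    When such queries are generated by a program, the term has to be quoted
    whenever it is empty or contains whitespace, and the index name has to
    stay a single token so that the PQF parser can read it. The Python helper
    <function>string_use_query</function> below is a hypothetical sketch of
    that rule; it simply rejects terms containing a double quote rather than
    guessing at an escape syntax.
   </para>
   <programlisting><![CDATA[
```python
def string_use_query(index_name, term):
    """Build '@attr 1=<index_name> <term>' with a string use attribute."""
    if not index_name or any(c.isspace() for c in index_name):
        raise ValueError("index name must be a single non-empty token")
    if '"' in term:
        raise ValueError("embedded double quotes are not handled here")
    if term == "" or any(c.isspace() for c in term):
        term = '"%s"' % term  # empty terms and term lists need quotes
    return "@attr 1=%s %s" % (index_name, term)


print(string_use_query("sometext", "information retrieval"))
# @attr 1=sometext "information retrieval"
print(string_use_query("Code-language", "eng"))
# @attr 1=Code-language eng
```
]]></programlisting>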
  </sect3>

  </sect2>

  <sect2 id="querymodel-exp1">
   <title>Explain Attribute Set</title>
    <para>
     The Z39.50 standard defines the
     <ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
     <literal>exp-1</literal>, which is used to discover information
     about a server's search semantics and functional capabilities.
     Zebra exposes a "classic"
     Explain database under the base name <literal>IR-Explain-1</literal>, which
     is populated with system internal information.
    </para>
   <para>
     The attribute set <literal>exp-1</literal> consists of a single
     <literal>Use (type 1)</literal> attribute.
   </para>
   <para>
     In addition, the non-Use
     <literal>bib-1</literal> attributes, that is, the types
     <literal>Relation</literal>, <literal>Position</literal>,
     <literal>Structure</literal>, <literal>Truncation</literal>,
     and <literal>Completeness</literal> are imported from
     the <literal>bib-1</literal> attribute set, and may be used
     within any Explain query.
   </para>

   <sect3 id="querymodel-exp1-use">
    <title>Use Attributes (type = 1)</title>
    <para>
     The following Explain search attributes are supported:
     <literal>ExplainCategory</literal> (@attr 1=1),
     <literal>DatabaseName</literal> (@attr 1=3),
     <literal>DateAdded</literal> (@attr 1=9),
     <literal>DateChanged</literal> (@attr 1=10).
    </para>
    <para>
     A search in the use attribute <literal>ExplainCategory</literal>
     supports only these predefined values:
     <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
     <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
    </para>
     <para>
      See <filename>tab/explain.att</filename>
      for more information.
      </para>
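   <para>
    The Explain queries in the yaz-client examples follow one pattern: an
    <literal>ExplainCategory</literal> (1=1) term, optionally combined by
    <literal>@and</literal> with a <literal>DatabaseName</literal> (1=3) term.
    The hypothetical Python helper <function>explain_query</function> below
    builds the form with an explicitly specified attribute set; it lower-cases
    the category name only because the yaz-client examples use lower-case
    values.
   </para>
   <programlisting><![CDATA[
```python
EXPLAIN_CATEGORIES = {"categorylist", "targetinfo", "databaseinfo",
                      "attributedetails"}


def explain_query(category, database=None):
    """Build an exp-1 PQF query for the IR-Explain-1 database."""
    category = category.lower()
    if category not in EXPLAIN_CATEGORIES:
        raise ValueError("unsupported ExplainCategory: " + category)
    if database is None:
        return "@attrset exp1 @attr 1=1 %s" % category
    return "@attrset exp1 @and @attr 1=1 %s @attr 1=3 %s" % (category,
                                                           database)


print(explain_query("DatabaseInfo", "Default"))
# @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
```
]]></programlisting>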
   </sect3>

    <sect3>
     <title>Explain searches with yaz-client</title>
    <para>
     Classic Explain only defines retrieval of Explain information
     via ASN.1. Practically no Z39.50 clients support this. Fortunately,
     they don't have to - Zebra allows retrieval of this information
     in several formats:
     <literal>SUTRS</literal>, <literal>XML</literal>,
     <literal>GRS-1</literal> and <literal>ASN.1</literal> Explain.
    </para>

     <para>
      List supported categories to find out which Explain commands are
      supported:
      <screen>
       Z> base IR-Explain-1
       Z> find @attr exp1 1=1 categorylist
       Z> form sutrs
       Z> show 1+2
      </screen>
     </para>

     <para>
      Get target info, that is, investigate which databases exist at
      this server endpoint:
      <screen>
       Z> base IR-Explain-1
       Z> find @attr exp1 1=1 targetinfo
       Z> form xml
       Z> show 1+1
       Z> form grs-1
       Z> show 1+1
       Z> form sutrs
       Z> show 1+1
      </screen>
     </para>

     <para>
      List all supported databases. The number of hits
      is the number of databases found, which most commonly are the
      following two:
      the <literal>Default</literal> and the
      <literal>IR-Explain-1</literal> databases.
      <screen>
       Z> base IR-Explain-1
       Z> f @attr exp1 1=1 databaseinfo
       Z> form sutrs
       Z> show 1+2
      </screen>
     </para>

     <para>
      Get the database info record for database <literal>Default</literal>:
      <screen>
       Z> base IR-Explain-1
       Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
      </screen>
      Identical query with explicitly specified attribute set:
      <screen>
       Z> base IR-Explain-1
       Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
      </screen>
     </para>

     <para>
      Get the attribute details record for database
      <literal>Default</literal>.
      This query is very useful to study the internal Zebra indexes.
      If records have been indexed using the <literal>alvis</literal>
      XSLT filter, the string representation names of the known indexes can be
      found.
      <screen>
       Z> base IR-Explain-1
       Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
      </screen>
      Identical query with explicitly specified attribute set:
      <screen>
       Z> base IR-Explain-1
       Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
      </screen>
     </para>
    </sect3>

  </sect2>

  <sect2 id="querymodel-bib1">
   <title>Bib1 Attribute Set</title>
   <para>
    Something about querying to be written ...
   </para>
   <para>
    Most of the information contained in this section is an excerpt of
    the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
    SEMANTICS</literal>, found in <ulink
    url="&url.z39.50.attset.bib1.1995;">The BIB-1
    Attribute Set Semantics</ulink> from 1995, and also in an updated
   <ulink url="&url.z39.50.attset.bib1;">Bib-1
    Attribute Set</ulink>
    version from 2003. Index Data is not the copyright holder of this
    information.
   </para>


   <sect3 id="querymodel-bib1-use">
    <title>Use Attributes (type = 1)</title>
   </sect3>

   <sect3 id="querymodel-bib1-relation">
    <title>Relation Attributes (type = 2)</title>
   </sect3>

   <sect3 id="querymodel-bib1-position">
    <title>Position Attributes (type = 3)</title>
   </sect3>

   <sect3 id="querymodel-bib1-structure">
    <title>Structure Attributes (type = 4)</title>
   </sect3>

   <sect3 id="querymodel-bib1-truncation">
    <title>Truncation Attributes (type = 5)</title>
   </sect3>

   <sect3 id="querymodel-bib1-completeness">
    <title>Completeness Attributes (type = 6)</title>
   </sect3>

   <sect3 id="querymodel-bib1-sorting">
    <title>Zebra Extension Sorting Attributes (type = 7)</title>
   </sect3>

   <sect3 id="querymodel-bib1-estimation">
    <title>Zebra Extension Search Estimation Attributes (type = 8)</title>
   </sect3>

   <sect3 id="querymodel-bib1-weight">
    <title>Zebra Extension Weight Attributes (type = 9)</title>
   </sect3>

  </sect2>

   <sect2 id="querymodel-bib1-mapping">
    <title>Mapping from Bib1 Attributes to Zebra internal
     register indexes</title>

   <para>
    <emphasis>Use</emphasis> attributes are interpreted according to the
    attribute sets which have been loaded in the
    <literal>zebra.cfg</literal> file, and are matched against specific
    fields as specified in the <literal>.abs</literal> file which
    describes the profile of the records which have been loaded.
    If no Use attribute is provided, a default of Bib-1 Any is assumed.
   </para>

   <para>
    If a <emphasis>Structure</emphasis> attribute of
    <emphasis>Phrase</emphasis> is used in conjunction with a
    <emphasis>Completeness</emphasis> attribute of
    <emphasis>Complete (Sub)field</emphasis>, the term is matched
    against the contents of the phrase (long word) register, if one
    exists for the given <emphasis>Use</emphasis> attribute.
    A phrase register is created for those fields in the
    <literal>.abs</literal> file that contain a
    <literal>p</literal>-specifier.
    <!-- ### whatever the hell _that_ is -->
   </para>

   <para>
    If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
    used in conjunction with <emphasis>Incomplete Field</emphasis> (the
    default value for <emphasis>Completeness</emphasis>), the
    search is directed against the normal word registers, but if the term
    contains multiple words, the term will only match if all of the words
    are found immediately adjacent, and in the given order.
    The word search is performed on those fields that are indexed as
    type <literal>w</literal> in the <literal>.abs</literal> file.
   </para>

   <para>
    If the <emphasis>Structure</emphasis> attribute is
    <emphasis>Word List</emphasis>,
    <emphasis>Free-form Text</emphasis>, or
    <emphasis>Document Text</emphasis>, the term is treated as a
    natural-language, relevance-ranked query.
    This search type uses the word register, i.e. those fields
    that are indexed as type <literal>w</literal> in the
    <literal>.abs</literal> file.
   </para>

   <para>
    If the <emphasis>Structure</emphasis> attribute is
    <emphasis>Numeric String</emphasis>, the term is treated as an integer.
    The search is performed on those fields that are indexed
    as type <literal>n</literal> in the <literal>.abs</literal> file.
   </para>

   <para>
    If the <emphasis>Structure</emphasis> attribute is
    <emphasis>URx</emphasis>, the term is treated as a URX (URL) entity.
    The search is performed on those fields that are indexed as type
    <literal>u</literal> in the <literal>.abs</literal> file.
   </para>

   <para>
    If the <emphasis>Structure</emphasis> attribute is
    <emphasis>Local Number</emphasis>, the term is treated as a
    native Zebra Record Identifier.
   </para>
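   <para>
    The rules above closely mirror the attribute handling in
    <function>zebra_maps_attr</function>
    (<filename>util/zebramap.c</filename>). The Python transcription below of
    an excerpt of that function is a reading aid only: it returns the register
    letter and search type for given bib-1 attribute values, leaves out the
    sort flag, and may not match later Zebra versions. The keyword arguments
    are named for this sketch; <literal>-1</literal> stands for "attribute not
    given", as in the C code.
   </para>
   <programlisting><![CDATA[
```python
def zebra_maps_attr(completeness=-1, relation=-1, structure=-1,
                    weight=-1, use=-1):
    """Return (reg_id, search_type, rank_type) for bib-1 attribute values.

    Raises ValueError for structure values the excerpt does not handle.
    """
    complete = completeness in (2, 3)   # complete (sub)field
    search_type = "phrase"
    rank_type = "void"
    if relation == 102:                 # relevance ranking
        if weight == -1:
            weight = 34
        rank_type = "rank,w=%d,u=%d" % (weight, use)
    if relation == 103:                 # handled by the "always" search type
        return "w", "always", rank_type
    reg_id = "p" if complete else "w"   # phrase register vs. word register
    if structure == 6:                  # word list
        search_type = "and-list"
    elif structure in (105, 106):       # free-form text, document text
        search_type = "or-list"
    elif structure in (-1, 1, 2, 108):  # unset, phrase, word, string
        search_type = "phrase"
    elif structure == 107:              # local number: no register
        search_type, reg_id = "local", None
    elif structure == 109:              # numeric string
        search_type, reg_id = "numeric", "n"
    elif structure == 104:              # urx
        reg_id = "u"
    elif structure == 3:                # key
        reg_id = "0"
    elif structure == 4:                # year
        reg_id = "y"
    elif structure == 5:                # date
        reg_id = "d"
    else:
        raise ValueError("unsupported structure: %d" % structure)
    return reg_id, search_type, rank_type


print(zebra_maps_attr(structure=1, completeness=3))  # ('p', 'phrase', 'void')
print(zebra_maps_attr(structure=109))                # ('n', 'numeric', 'void')
```
]]></programlisting>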

   <para>
    If the <emphasis>Relation</emphasis> attribute is
    <emphasis>Equals</emphasis> (default), the term is matched
    in a normal fashion (modulo truncation and processing of
    individual words, if required).
    If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
    <emphasis>Less Than or Equal</emphasis>,
    <emphasis>Greater Than</emphasis>, or <emphasis>Greater Than or
     Equal</emphasis>, the term is assumed to be numerical, and a
    standard regular expression is constructed to match the given
    expression.
    If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
    the standard natural-language query processor is invoked.
   </para>
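   <para>
    The idea of turning a numeric relation into a regular expression can be
    illustrated for <emphasis>Greater Than</emphasis> on non-negative integers
    written without leading zeros. The hypothetical Python function
    <function>greater_than_regex</function> below sketches the technique; it
    is not the expression Zebra actually builds. It accepts every number with
    more digits than the bound and, for each digit position, every number that
    shares the bound's prefix and then has a larger digit. Negative values,
    such as the -114 in the GILS example, are not handled.
   </para>
   <programlisting><![CDATA[
```python
import re


def greater_than_regex(bound):
    """Regex matching decimal integers (no leading zeros) greater than bound."""
    digits = str(bound)
    n = len(digits)
    # Any number with more digits than the bound is larger.
    alternatives = ["[1-9][0-9]{%d,}" % n]
    for i, d in enumerate(digits):
        d = int(d)
        if d == 9:
            continue  # no larger digit exists at this position
        rest = n - i - 1
        # Same prefix, a larger digit here, then any digits for the rest.
        alternatives.append("%s[%d-9][0-9]{%d}" % (digits[:i], d + 1, rest))
    return "(?:%s)" % "|".join(alternatives)


print(greater_than_regex(114))
# (?:[1-9][0-9]{3,}|[2-9][0-9]{2}|1[2-9][0-9]{1}|11[5-9][0-9]{0})
print(bool(re.fullmatch(greater_than_regex(114), "115")))  # True
print(bool(re.fullmatch(greater_than_regex(114), "99")))   # False
```
]]></programlisting>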

   <para>
    For the <emphasis>Truncation</emphasis> attribute,
    <emphasis>No Truncation</emphasis> is the default.
    <emphasis>Left Truncation</emphasis> is not supported.
    <emphasis>Process # in search term</emphasis> is supported, as is
    <emphasis>Regxp-1</emphasis>.
    <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
    search. By default, a single error (deletion, insertion,
    replacement) is accepted when terms are matched against the register
    contents.
   </para>
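   <para>
    The fault-tolerant matching enabled by truncation value 103 can be
    pictured as an edit-distance bound: a register term matches when it can
    be turned into the query term with at most one deletion, insertion or
    replacement. The classic Levenshtein dynamic program below sketches that
    criterion only; how Zebra applies the tolerance while scanning the
    register is not shown, and the names <function>edit_distance</function>
    and <function>within_errors</function> are hypothetical.
   </para>
   <programlisting><![CDATA[
```python
def edit_distance(a, b):
    """Levenshtein distance: minimal deletions, insertions and replacements."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # replacement (or match)
            ))
        previous = current
    return previous[-1]


def within_errors(register_term, query_term, errors=1):
    """True if the terms differ by at most `errors` edits."""
    return edit_distance(register_term, query_term) <= errors


print(within_errors("retrieval", "retrival"))   # True: one deletion
print(within_errors("retrieval", "retreival"))  # False: a swap costs 2 edits
```
]]></programlisting>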
  </sect2>

   <sect2  id="querymodel-regular">
    <title>Regular expressions</title>

    <para>
     Each term in a query is interpreted as a regular expression if
     the truncation value is either <emphasis>Regxp-1</emphasis> (102)
     or <emphasis>Regxp-2</emphasis> (103).
     Both query types follow the same syntax with the operands:
     <variablelist>

      <varlistentry>
       <term>x</term>
       <listitem>
        <para>
         Matches the character <emphasis>x</emphasis>.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
       <term>.</term>
       <listitem>
        <para>
         Matches any character.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
       <term><literal>[</literal>..<literal>]</literal></term>
       <listitem>
        <para>
         Matches the set of characters specified,
         such as <literal>[abc]</literal> or <literal>[a-c]</literal>.
        </para>
       </listitem>
      </varlistentry>
     </variablelist>
     and the operators:
     <variablelist>

      <varlistentry>
       <term>x*</term>
       <listitem>
        <para>
         Matches <emphasis>x</emphasis> zero or more times. Priority: high.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
       <term>x+</term>
       <listitem>
        <para>
         Matches <emphasis>x</emphasis> one or more times. Priority: high.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
       <term>x?</term>
       <listitem>
        <para>
         Matches <emphasis>x</emphasis> zero times or once. Priority: high.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
       <term>xy</term>
       <listitem>
        <para>
         Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
         Priority: medium.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
       <term>x|y</term>
       <listitem>
        <para>
         Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
         Priority: low.
        </para>
       </listitem>
      </varlistentry>
     </variablelist>
     The order of evaluation may be changed by using parentheses.
    </para>
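    <para>
     For the operands and operators listed above, the notation and the
     priorities coincide with those of Python's <literal>re</literal> module,
     which makes it a convenient place to try out a pattern before using it as
     a term. The sketch below assumes that a pattern has to match a whole
     register term (hence <function>re.fullmatch</function>); that is an
     assumption of the example, the term list is invented, and Zebra's own
     regular expression engine is not involved.
    </para>
    <programlisting><![CDATA[
```python
import re

# Invented register terms to try patterns against.
terms = ["information", "informatics", "inform", "retrieval", "retrieve"]


def matching(pattern):
    """Return the terms matched in full by the pattern."""
    return [t for t in terms if re.fullmatch(pattern, t)]


print(matching("informat.*"))      # ['information', 'informatics']
print(matching("retriev(al|e)"))   # ['retrieval', 'retrieve']
print(matching("inform(ation)?"))  # ['information', 'inform']
print(matching("[a-h]nform.*"))    # []
```
]]></programlisting>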

    <para>
     If the first character of the <emphasis>Regxp-2</emphasis> query
     is a plus character (<literal>+</literal>), it marks the
     beginning of a section with non-standard specifiers.
     The next plus character marks the end of the section.
     Currently Zebra only supports one specifier, the error tolerance,
     which consists of one digit.
    </para>
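    <para>
     Splitting the optional specifier section off a
     <emphasis>Regxp-2</emphasis> (truncation value 103) term could look like
     the following hypothetical Python helper. It assumes, as described above,
     that the section is opened and closed by <literal>+</literal> and holds a
     single digit, and it falls back to the default of one error when no
     section is present.
    </para>
    <programlisting><![CDATA[
```python
def split_regexp2_term(term, default_errors=1):
    """Return (errors, pattern) for a term such as '+2+informat.*'."""
    if term.startswith("+"):
        end = term.find("+", 1)  # the next plus closes the section
        if end == -1:
            raise ValueError("unterminated specifier section: " + term)
        spec = term[1:end]
        if len(spec) != 1 or not spec.isdigit():
            raise ValueError("expected one digit, got %r" % spec)
        return int(spec), term[end + 1:]
    return default_errors, term


print(split_regexp2_term("+2+informat.*"))  # (2, 'informat.*')
print(split_regexp2_term("retrieval"))      # (1, 'retrieval')
```
]]></programlisting>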

    <para>
     Since the plus operator is normally a suffix operator, the addition to
     the query syntax doesn't violate the syntax for standard regular
     expressions.
    </para>

   </sect2>

   <sect2  id="querymodel-examples">
    <title>Query examples</title>

    <para>
     Phrase search for <emphasis>information retrieval</emphasis> in
     the title register:
     <screen>
      @attr 1=4 "information retrieval"
     </screen>
    </para>

    <para>
     Ranked search for the same thing:
     <screen>
      @attr 1=4 @attr 2=102 "Information retrieval"
     </screen>
    </para>

    <para>
     Phrase search with a regular expression:
     <screen>
      @attr 1=4 @attr 5=102 "informat.* retrieval"
     </screen>
    </para>

    <para>
     Ranked search with a regular expression:
     <screen>
      @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
     </screen>
    </para>

    <para>
     In the GILS schema (<literal>gils.abs</literal>), the
     west-bounding-coordinate is indexed as type <literal>n</literal>,
     and is therefore searched by specifying
     <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
     To match all those records with west-bounding-coordinate greater
     than -114 we use the following query:
     <screen>
      @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
     </screen>
    </para>
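    <para>
     Queries like the GILS example, where one attribute comes from a different
     attribute set than the others, can also be assembled programmatically.
     The Python helper <function>atomic_query</function> below is a
     hypothetical sketch: each attribute is an
     <literal>(attrset, type, value)</literal> triple, where an attribute set
     of <literal>None</literal> means the query's default attribute set, and
     empty terms or terms containing spaces are quoted.
    </para>
    <programlisting><![CDATA[
```python
def atomic_query(attributes, term):
    """Render an atomic PQF query from (attrset, type, value) triples."""
    parts = []
    for attrset, attr_type, value in attributes:
        if attrset is None:
            parts.append("@attr %d=%s" % (attr_type, value))
        else:
            # Per-attribute attribute set, e.g. '@attr gils 1=2038'.
            parts.append("@attr %s %d=%s" % (attrset, attr_type, value))
    if term == "" or " " in term:
        term = '"%s"' % term
    parts.append(term)
    return " ".join(parts)


print(atomic_query([(None, 4, 109), (None, 2, 5), ("gils", 1, 2038)], "-114"))
# @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
```
]]></programlisting>
    <para>
     The same helper reproduces the ranked regular expression example when
     given the attributes 1=4, 5=102 and 2=102 and the term
     <literal>informat.* retrieval</literal>.
    </para>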
   </sect2>


     <!-- see in util/zebramap.c
      int zebra_maps_attr

  if (completeness_value == 2 || completeness_value == 3)
        *complete_flag = 1;
    else
        *complete_flag = 0;
    *reg_id = 0;

    *sort_flag =(sort_relation_value > 0) ? 1 : 0;
    *search_type = "phrase";
    strcpy(rank_type, "void");
    if (relation_value == 102)
    {
        if (weight_value == -1)
            weight_value = 34;
        sprintf(rank_type, "rank,w=%d,u=%d", weight_value, use_value);
    }
    if (relation_value == 103)
    {
        *search_type = "always";
        *reg_id = 'w';
        return 0;
    }
    if (*complete_flag)
        *reg_id = 'p';
    else
        *reg_id = 'w';
    switch (structure_value)
    {
    case 6:   /* word list */
        *search_type = "and-list";
        break;
    case 105: /* free-form-text */
        *search_type = "or-list";
        break;
    case 106: /* document-text */
        *search_type = "or-list";
        break;
    case -1:
    case 1:   /* phrase */
    case 2:   /* word */
    case 108: /* string */
        *search_type = "phrase";
        break;
    case 107: /* local-number */
        *search_type = "local";
        *reg_id = 0;
        break;
    case 109: /* numeric string */
        *reg_id = 'n';
        *search_type = "numeric";
        break;
    case 104: /* urx */
        *reg_id = 'u';
        *search_type = "phrase";
        break;
    case 3:   /* key */
        *reg_id = '0';
        *search_type = "phrase";
        break;
    case 4:  /* year */
        *reg_id = 'y';
        *search_type = "phrase";
        break;
    case 5:  /* date */
        *reg_id = 'd';
        *search_type = "phrase";
        break;
    default:
        return -1;
    }
    return 0;

     -->

   <!--
   <para>
    The RecordType parameter in the <literal>zebra.cfg</literal> file, or
    the <literal>-t</literal> option to the indexer tells Zebra how to
    process input records.
    Two basic types of processing are available - raw text and structured
    data. Raw text is just that, and it is selected by providing the
    argument <emphasis>text</emphasis> to Zebra. Structured records are
    all handled internally using the basic mechanisms described in the
    subsequent sections.
    Zebra can read structured records in many different formats.
   </para>
   -->
  </sect1>


  <sect1 id="querymodel-cql-to-pqf">
   <title>Server Side CQL to PQF Query Translation</title>
   <para>
    Using the
    <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
    YAZ Frontend Virtual
    Hosts option, one can configure
    the YAZ Frontend CQL-to-PQF
    converter, specifying the interpretation of various
    <ulink url="&url.cql;">CQL</ulink>
    indexes, relations, etc. in terms of Type-1 query attributes.
    <!-- The  yaz-client config file -->
   </para>
   <para>
    For example, using server-side CQL-to-PQF conversion, one might
    query a Zebra server like this:
    <screen>
    <![CDATA[
     yaz-client localhost:9999
     Z> querytype cql
     Z> find text=(plant and soil)
     ]]>
    </screen>
     and - if properly configured - even static relevance ranking can
     be performed using CQL query syntax:
    <screen>
    <![CDATA[
     Z> find text = /relevant (plant and soil)
     ]]>
     </screen>
   </para>

   <para>
    The same configuration can also be used to
    search using client-side CQL-to-PQF conversion.
    There are only two differences. <literal>querytype cql2rpn</literal>
    is used instead of
    <literal>querytype cql</literal>, and the local
    conversion file is passed to <literal>yaz-client</literal> with the
    <literal>-q</literal> option:
    <screen>
    <![CDATA[
     yaz-client -q local/cql2pqf.txt localhost:9999
     Z> querytype cql2rpn
     Z> find text=(plant and soil)
     ]]>
    </screen>
   </para>

   <para>
    Exhaustive information can be found in the section
    "Specification of CQL to RPN mappings" of the YAZ manual,
    <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
     http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
    and is therefore not repeated here.
   </para>
  <!--
  <para>
    See
      <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
      http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
    for the Maintenance Agency's work-in-progress mapping of Dublin Core
    indexes to Attribute Architecture (util, XD and BIB-2)
    attributes.
   </para>
   -->
 </sect1>



<!--
  <sect1 id="architecture-querylanguage">
   <title>Query Languages</title>

   <para>

http://www.loc.gov/z3950/agency/document.html

    PQF and BIB-1 stuff to be explained
    <ulink url="&url.z39.50.attset.bib1;">
     http://www.loc.gov/z3950/agency/defns/bib1.html</ulink>

     <ulink url="&url.z39.50.attset.bib1.1995;">
     http://www.loc.gov/z3950/agency/bib1.html</ulink>

     http://www.loc.gov/z3950/agency/markup/13.html

  </para>
  </sect1>


These attribute types are recognized regardless of attribute set. Some are recognized for search, others for scan.

Search

Type  Name           Version
7     Embedded Sort  1.1
8     Term Set       1.1
9     Rank Weight    1.1
9     Approx Limit   1.4
10    Term Ref       1.4

Embedded Sort

The embedded sort is a way to specify sorting within a query. This removes the need to send a separate Sort Request. It is faster, and it does not require clients to support the Sort Facility.

The value of attribute type 7 is 1 for ascending or 2 for descending order. The attributes+term (APT) node holding the sort specification is separate from the rest of the query and must be @or'ed with it. The term of that APT is the sort level: 0 for the primary sort key, 1 for the secondary sort key, and so on. Examples:

Search for water, sort by title (ascending):

  @or @attr 1=1016 water @attr 7=1 @attr 1=4 0

Search for water, sort by title ascending, then date descending:

  @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
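
The nesting rule above can be sketched in C. Each sort key needs one prefix @or, and the sort level is used as the term. This is a hypothetical helper, not part of the Zebra or YAZ API. The names pqf_embed_sort, appendf and struct sort_key are made up for illustration. The function only builds the PQF string and does not send any query.

```c
#include <stdarg.h>
#include <stdio.h>

/* One embedded sort key (hypothetical helper type). */
struct sort_key {
    int use_attr;   /* BIB-1 use attribute, e.g. 4 = title, 30 = date */
    int direction;  /* attribute type 7: 1 = ascending, 2 = descending */
};

/* Appends printf-style text at buf + *len; returns -1 if it does not fit. */
static int appendf(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *len, size - *len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t) n >= size - *len)
        return -1;
    *len += (size_t) n;
    return 0;
}

/* Writes "@or ... @or <query> <key 0> <key 1> ..." into buf. Key i becomes
   "@attr 7=<direction> @attr 1=<use_attr> <i>", where the term i is the
   sort level (0 = primary, 1 = secondary, ...). Every sort APT is @or'ed
   onto the query, so nkeys prefix @or operators are needed.
   Returns 0 on success, -1 if buf is too small. */
static int pqf_embed_sort(char *buf, size_t size, const char *query,
                          const struct sort_key *keys, int nkeys)
{
    size_t len = 0;
    int i;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (i = 0; i < nkeys; i++)
        if (appendf(buf, size, &len, "@or ") != 0)
            return -1;
    if (appendf(buf, size, &len, "%s", query) != 0)
        return -1;
    for (i = 0; i < nkeys; i++)
        if (appendf(buf, size, &len, " @attr 7=%d @attr 1=%d %d",
                    keys[i].direction, keys[i].use_attr, i) != 0)
            return -1;
    return 0;
}
```

Called with the query "@attr 1=1016 water" and the keys {4, 1} and {30, 2}, it produces the second example above.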

Term Set

The Term Set feature allows a search to store the hit terms in a "pseudo" result set. It thus combines a normal search with a scan-like facility. It requires a client that can handle named result sets, since the search generates two result sets. The value of attribute type 8 is the name of a result set (a string). The terms in the term set are returned as SUTRS records.

Search for u in title, right truncated, and store the term set in a result set named uset:

  @attr 5=1 @attr 1=4 @attr 8=uset u

The model has one serious flaw: we don't know the size of the term set.

Rank Weight

Rank weight is a way to pass a value to a ranking algorithm, so that one APT can have one weight while another has a different one.

Search for utah in title with weight 30, as well as for utah in any index with weight 20:

  @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
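
A similar hypothetical C sketch builds a query with the same shape as the utah example. It emits the 2=102 relation followed by an @or chain in which every APT carries its own rank weight (attribute type 9). The names pqf_weighted_or, appendf and struct weighted_apt are made up for illustration. Placing 2=102 in front of the @or simply mirrors the example. It says nothing about how Zebra interprets that placement.

```c
#include <stdarg.h>
#include <stdio.h>

/* One weighted attributes+term (APT) node (hypothetical helper type). */
struct weighted_apt {
    int weight;         /* value for attribute type 9 (rank weight) */
    const char *attrs;  /* further attributes, e.g. "@attr 1=4"; "" or NULL for none */
    const char *term;
};

/* Appends printf-style text at buf + *len; returns -1 if it does not fit. */
static int appendf(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *len, size - *len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t) n >= size - *len)
        return -1;
    *len += (size_t) n;
    return 0;
}

/* Writes "@attr 2=102 @or ... @or <apt 0> <apt 1> ..." into buf. Each APT
   becomes "@attr 9=<weight> [<attrs>] <term>"; napts APTs need napts - 1
   prefix @or operators. Returns 0 on success, -1 on overflow or if
   napts < 1. */
static int pqf_weighted_or(char *buf, size_t size,
                           const struct weighted_apt *apts, int napts)
{
    size_t len = 0;
    int i;

    if (size == 0 || napts < 1)
        return -1;
    buf[0] = '\0';
    if (appendf(buf, size, &len, "@attr 2=102") != 0)
        return -1;
    for (i = 1; i < napts; i++)
        if (appendf(buf, size, &len, " @or") != 0)
            return -1;
    for (i = 0; i < napts; i++) {
        if (appendf(buf, size, &len, " @attr 9=%d", apts[i].weight) != 0)
            return -1;
        if (apts[i].attrs != NULL && apts[i].attrs[0] != '\0'
            && appendf(buf, size, &len, " %s", apts[i].attrs) != 0)
            return -1;
        if (appendf(buf, size, &len, " %s", apts[i].term) != 0)
            return -1;
    }
    return 0;
}
```

With the APTs {30, "@attr 1=4", "utah"} and {20, "", "utah"}, it produces exactly the example query above.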

Approx Limit

Newer Zebra versions normally compute the hit count for every APT (leaf) in the query tree. These hit counts are returned as part of the searchResult-1 facility.

By setting a limit for an APT, we can make Zebra switch to an approximate hit count once that limit is reached. A value of zero means an exact hit count.

We are interested in the exact hit count for a, but for b we allow estimates for 1000 hits and higher:

  @and a @attr 9=1000 b

This facility clashes with rank weight, since both use attribute type 9! Fortunately this is a Zebra 1.4 feature, so we can change it without upsetting anybody.

Term Ref

Zebra supports the searchResult-1 facility.

If attribute type 10 is given, its value specifies a subqueryId that is returned as part of the search result. It is a way for a client to name an APT within a query.

Scan

Type  Name               Version
8     Result Set Narrow  1.3
9     Approx Limit       1.4

Result Set Narrow

If attribute type 8 is given for scan, its value is the name of a result set. Each hit count in the scan is @and'ed with the given result set.

Approx Limit

The approx limit (as for search) is a way to enable approximate hit counts for scan. However, it does NOT appear to work at the moment.


 AdamDickmeiss - 19 Dec 2005


-->

</chapter>

 <!-- Keep this comment at the end of the file
 Local variables:
 mode: sgml
 sgml-omittag:t
 sgml-shorttag:t
 sgml-minimize-attributes:nil
 sgml-always-quote-attributes:t
 sgml-indent-step:1
 sgml-indent-data:t
 sgml-parent-document: "zebra.xml"
 sgml-local-catalogs: nil
 sgml-namecase-general:t
 End:
 -->