doc/querymodel.xml

   1  <chapter id="querymodel">
   2   <title>Query Model</title>
   3
   4   <section id="querymodel-overview">
   5    <title>Query Model Overview</title>
   6
   7    <section id="querymodel-query-languages">
   8     <title>Query Languages</title>
   9
  10     <para>
  11      &zebra; was conceived as a networked Information Retrieval engine adhering
  12      to the international standards
  13      <ulink url="&url.z39.50;">&acro.z3950;</ulink> and
  14      <ulink url="&url.sru;">&acro.sru;</ulink>,
  15      and implements the
  16      type-1 Reverse Polish Notation (&acro.rpn;) query
  17      model defined there.
  18      Unfortunately, this model defines only a binary
  19      encoded representation, which is used as transport packaging in
  20      the &acro.z3950; protocol layer. This representation is not human
  21      readable, nor does it define any convenient way to specify queries.
  22     </para>
  23     <para>
  24      Since the type-1 (&acro.rpn;)
  25      query structure has no direct, useful string
  26      representation, every client application needs to provide some
  27      form of mapping from a local query notation or representation to it.
  28     </para>
  29
  30
  31     <section id="querymodel-query-languages-pqf">
  32      <title>Prefix Query Format (&acro.pqf;)</title>
  33      <para>
  34       Index Data has defined a textual representation in the
  35       <ulink url="&url.yaz.pqf;">Prefix Query Format</ulink>, short
  36       <emphasis>&acro.pqf;</emphasis>, which maps
  37       one-to-one to binary encoded
  38       <emphasis>type-1 &acro.rpn;</emphasis> queries.
  39       &acro.pqf; has been adopted by other
  40       parties developing &acro.z3950; software, and is often referred to as
  41       <emphasis>Prefix Query Notation</emphasis>, or in short
  42       &acro.pqn;. See
  43       <xref linkend="querymodel-rpn"/> for further explanations and
  44       descriptions of &zebra;'s capabilities.
  45      </para>
  46     </section>
  47
  48     <section id="querymodel-query-languages-cql">
  49      <title>Common Query Language (&acro.cql;)</title>
  50      <para>
  51       The query model of the type-1 &acro.rpn;,
  52       expressed in &acro.pqf;/&acro.pqn;, is natively supported.
  53       On the other hand, the default &acro.sru;
  54       web services <emphasis>Common Query Language</emphasis>
  55       <ulink url="&url.cql;">&acro.cql;</ulink> is not natively supported.
  56      </para>
  57      <para>
  58       &zebra; can be configured to understand and map &acro.cql; to &acro.pqf;. See
  59       <xref linkend="querymodel-cql-to-pqf"/>.
  60      </para>
  61     </section>
  62
  63    </section>
  64
  65    <section id="querymodel-operation-types">
  66     <title>Operation types</title>
  67     <para>
  68      &zebra; supports all three
  69      &acro.z3950;/&acro.sru; operations defined in the
  70      standards: explain, search,
  71      and scan. A short description of the
  72      functionality and purpose of each is in order here.
  73     </para>
  74
  75     <section id="querymodel-operation-type-explain">
  76      <title>Explain Operation</title>
  77      <para>
  78       The <emphasis>syntax</emphasis> of &acro.z3950;/&acro.sru; queries is
  79       well known to any client, but the specific
  80       <emphasis>semantics</emphasis> - taking into account a
  81       particular server's functionality and abilities - must be
  82       discovered from case to case. This is the purpose of the
  83       explain operation, which provides the means for learning which
  84       <emphasis>fields</emphasis> (also called
  85       <emphasis>indexes</emphasis> or <emphasis>access points</emphasis>)
  86       are provided, which default parameters the server uses, which
  87       retrieval document formats are defined, and which specific parts
  88       of the general query model are supported.
  89      </para>
  90      <para>
  91       The &acro.z3950; protocol embeds the explain operation
  92       as a
  93       search in the magic
  94       <literal>IR-Explain-1</literal> database;
  95       see <xref linkend="querymodel-exp1"/>.
  96      </para>
  97      <para>
  98       In &acro.sru;, explain is an entirely separate
  99       operation, which returns a ZeeRex &acro.xml; record according to the
 100       structure defined by the protocol.
 101      </para>
 102      <para>
 103       In both cases, the information gathered through
 104       explain operations can be used to
 105       auto-configure a client user interface to the server's
 106       capabilities.
 107      </para>
 108     </section>
 109
 110     <section id="querymodel-operation-type-search">
 111      <title>Search Operation</title>
 112      <para>
 113       Search and retrieve interactions are the raison d'être.
 114       They are used to query the remote database and
 115       return search result documents.  Search queries span from
 116       simple free text searches to nested complex boolean queries,
 117       targeting specific indexes, and possibly enhanced with many
 118       query semantic specifications. Search interactions are the heart
 119       and soul of &acro.z3950;/&acro.sru; servers.
 120      </para>
 121     </section>
 122
 123     <section id="querymodel-operation-type-scan">
 124      <title>Scan Operation</title>
 125      <para>
 126       The scan operation is a helper function
 127       which operates on one index or access point at a time.
 128      </para>
 129      <para>
 130       It provides
 131       the means to investigate the content of specific indexes.
 132       Scanning an index returns a handful of terms actually found in
 133       that index, and in addition the scan
 134       operation returns the number of documents indexed by each term.
 135       A search client can use this information to propose proper
 136       spelling of search terms, to auto-fill search boxes, or to
 137       display  controlled vocabularies.
 138      </para>
 139     </section>
 140
 141    </section>
 142
 143  </section>
 144
 145
 146   <section id="querymodel-rpn">
 147    <title>&acro.rpn; queries and semantics</title>
 148    <para>
 149     The <ulink url="&url.yaz.pqf;">&acro.pqf; grammar</ulink>
 150     is documented in the &yaz; manual, and shall not be
 151     repeated here. This textual &acro.pqf; representation
 152     is not transmitted to &zebra; during search; instead, the
 153     client maps it to the equivalent &acro.z3950; binary
 154     query parse tree.
 155    </para>
 156
 157    <section id="querymodel-rpn-tree">
 158     <title>&acro.rpn; tree structure</title>
 159     <para>
 160      The &acro.rpn; parse tree - or the equivalent textual representation in &acro.pqf; -
 161      may start with one specification of the
 162      <emphasis>attribute set</emphasis> used. It is followed by a query
 163      tree, which
 164      consists of <emphasis>atomic query parts (&acro.apt;)</emphasis> or
 165      <emphasis>named result sets</emphasis>, optionally
 166      paired by <emphasis>boolean binary operators</emphasis>, and
 167      finally <emphasis>recursively combined</emphasis> into
 168      complex query trees.
 169     </para>
 170
 171     <section id="querymodel-attribute-sets">
 172      <title>Attribute sets</title>
 173      <para>
 174       Attribute sets define the exact meaning and semantics of queries
 175       issued. &zebra; comes with some predefined attribute set
 176       definitions; others can easily be defined and added to the
 177       configuration.
 178      </para>
 179
 180      <table id="querymodel-attribute-sets-table" frame="top">
 181       <title>Attribute sets predefined in &zebra;</title>
 182       <tgroup cols="4">
 183        <thead>
 184         <row>
 185          <entry>Attribute set</entry>
 186          <entry>&acro.pqf; notation (shorthand)</entry>
 187          <entry>Notes</entry>
 188          <entry>Status</entry>
 189         </row>
 190        </thead>
 191
 192        <tbody>
 193         <row>
 194          <entry>Explain</entry>
 195          <entry><literal>exp-1</literal></entry>
 196          <entry>Special attribute set used on the special automagic
 197           <literal>IR-Explain-1</literal> database to gain information on
 198           server capabilities, database names, and database
 199           semantics.</entry>
 200          <entry>predefined</entry>
 201         </row>
 202         <row>
 203          <entry>&acro.bib1;</entry>
 204          <entry><literal>bib-1</literal></entry>
 205          <entry>Standard &acro.pqf; query language attribute set which defines the
 206           semantics of &acro.z3950; searching. In addition, all of the
 207           non-use attributes (types 2-14) define the hard-wired
 208           &zebra; internal query
 209           processing.</entry>
 210          <entry>default</entry>
 211         </row>
 212         <row>
 213          <entry>GILS</entry>
 214          <entry><literal>gils</literal></entry>
 215          <entry>Extension to the &acro.bib1; attribute set.</entry>
 216          <entry>predefined</entry>
 217         </row>
 218         <!--
 219         <row>
 220         <entry>&acro.idxpath;</entry>
 221         <entry><literal>idxpath</literal></entry>
 222         <entry>Hardwired &acro.xpath; like attribute set, only available for
 223         indexing with the &acro.grs1; record model</entry>
 224         <entry>deprecated</entry>
 225        </row>
 226         -->
 227        </tbody>
 228       </tgroup>
 229      </table>
 230
 231      <para>
 232       The use attribute (type 1) mappings for the
 233       predefined attribute sets are found in the
 234       attribute set configuration files <filename>tab/*.att</filename>.
 235      </para>
 236
 237      <note>
 238       <para>
 239        The &zebra; internal query processing is modeled after
 240        the &acro.bib1; attribute set, and the non-use
 241        attributes type 2-6 are hard-wired in. It is therefore essential
 242        to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
 243       </para>
 244      </note>
 245
 246     </section>
 247
 248     <section id="querymodel-boolean-operators">
 249      <title>Boolean operators</title>
 250      <para>
 251       A pair of sub query trees, or of atomic queries, is combined
 252       using the standard boolean operators into new query trees.
 253       Thus, boolean operators are always internal nodes in the query tree.
 254      </para>
 255
 256      <table id="querymodel-boolean-operators-table" frame="top">
 257       <title>Boolean operators</title>
 258       <tgroup cols="3">
 259        <thead>
 260         <row>
 261          <entry>Keyword</entry>
 262          <entry>Operator</entry>
 263          <entry>Description</entry>
 264         </row>
 265        </thead>
 266        <tbody>
 267         <row><entry><literal>@and</literal></entry>
 268          <entry>binary AND operator</entry>
 269          <entry>Set intersection of two atomic queries' hit sets</entry>
 270         </row>
 271         <row><entry><literal>@or</literal></entry>
 272          <entry>binary OR operator</entry>
 273          <entry>Set union of two atomic queries' hit sets</entry>
 274         </row>
 275         <row><entry><literal>@not</literal></entry>
 276          <entry>binary AND NOT operator</entry>
 277          <entry>Set difference of two atomic queries' hit sets</entry>
 278         </row>
 279         <row><entry><literal>@prox</literal></entry>
 280          <entry>binary PROXIMITY operator</entry>
 281          <entry>Set intersection of two atomic queries' hit sets. In
 282           addition, the intersection set is purged of all
 283           documents which do not satisfy the requested query
 284           term proximity. Usually a proper subset of the AND
 285           operation.</entry>
 286         </row>
 287        </tbody>
 288       </tgroup>
 289      </table>
 290
 291      <para>
 292       For example, we can combine the terms
 293       <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
 294       into different searches in the default index of the default
 295       attribute set as follows.
 296       Querying for the union of all documents containing the
 297       terms <emphasis>information</emphasis> OR
 298       <emphasis>retrieval</emphasis>:
 299       <screen>
 300        Z> find @or information retrieval
 301       </screen>
 302      </para>
 303      <para>
 304       Querying for the intersection of all documents containing the
 305       terms <emphasis>information</emphasis> AND
 306       <emphasis>retrieval</emphasis>.
 307       The hit set is a subset of the corresponding
 308       OR query:
 309       <screen>
 310        Z> find @and information retrieval
 311       </screen>
 312      </para>
 313      <para>
 314       Querying for the intersection of all documents containing the
 315       terms <emphasis>information</emphasis> AND
 316       <emphasis>retrieval</emphasis>, taking proximity into account.
 317       The hit set is a subset of the corresponding
 318       AND query
 319       (see the <ulink url="&url.yaz.pqf;">&acro.pqf; grammar</ulink> for
 320       details on the proximity operator):
 321       <screen>
 322        Z> find @prox 0 3 0 2 k 2 information retrieval
 323       </screen>
 324      </para>
 325      <para>
 326       Querying for the intersection of all documents containing the
 327       terms <emphasis>information</emphasis> AND
 328       <emphasis>retrieval</emphasis>, in the same order and near each
 329       other as described in the term list.
 330       The hit set is a subset of the corresponding
 331       PROXIMITY query.
 332       <screen>
 333        Z> find "information retrieval"
 334       </screen>
 335      </para>
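          <para>
           The table above also lists <literal>@not</literal>, which the
           preceding examples do not exercise. As a minimal sketch in the same
           <application>yaz-client</application> notation, and assuming the
           same default index and attribute set, the following query asks for
           documents containing <emphasis>information</emphasis> but not
           <emphasis>retrieval</emphasis>:
           <screen>
            Z> find @not information retrieval
           </screen>
          </para>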
 336     </section>
 337
 338
 339     <section id="querymodel-atomic-queries">
 340      <title>Atomic queries (&acro.apt;)</title>
 341      <para>
 342       Atomic queries are the query parts which work on one access point
 343       only. These consist of <emphasis>an attribute list</emphasis>
 344       followed by a <emphasis>single term</emphasis> or a
 345       <emphasis>quoted term list</emphasis>, and are often called
 346       <emphasis>Attributes-Plus-Terms (&acro.apt;)</emphasis> queries.
 347      </para>
 348      <para>
 349       Atomic (&acro.apt;) queries are always leaf nodes in the &acro.pqf; query tree.
 350       Unsupplied non-use attributes (types 2-12) are either inherited from
 351       higher nodes in the query tree, or are set to &zebra;'s default values.
 352       See <xref linkend="querymodel-bib1"/> for details.
 353      </para>
 354
 355      <table id="querymodel-atomic-queries-table" frame="top">
 356       <title>Atomic queries (&acro.apt;)</title>
 357       <tgroup cols="3">
 358        <thead>
 359         <row>
 360          <entry>Name</entry>
 361          <entry>Type</entry>
 362          <entry>Notes</entry>
 363         </row>
 364       </thead>
 365        <tbody>
 366         <row>
 367          <entry><emphasis>attribute list</emphasis></entry>
 368          <entry>List of <emphasis>orthogonal</emphasis> attributes</entry>
 369          <entry>Any of the orthogonal attribute types may be omitted;
 370           these are then inherited from higher query tree nodes, or if not
 371           inherited, are set to the default &zebra; configuration values.
 372          </entry>
 373         </row>
 374         <row>
 375          <entry><emphasis>term</emphasis></entry>
 376          <entry>single <emphasis>term</emphasis>
 377           or <emphasis>quoted term list</emphasis>   </entry>
 378          <entry>Here the search terms or list of search terms is added
 379           to the query</entry>
 380         </row>
 381        </tbody>
 382       </tgroup>
 383      </table>
 384      <para>
 385       Querying for the term <emphasis>information</emphasis> in the
 386       default index using the default attribute set, the server choice
 387       of access point/index, and the default non-use attributes.
 388       <screen>
 389        Z> find information
 390       </screen>
 391      </para>
 392      <para>
 393       Equivalent query fully specified including all default values:
 394       <screen>
 395        Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information
 396       </screen>
 397      </para>
 398
 399      <para>
 400       Finding all documents which have the term
 401       <emphasis>debussy</emphasis> in the title field.
 402       <screen>
 403        Z> find @attr 1=4 debussy
 404       </screen>
 405      </para>
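          <para>
           As a sketch of the attribute inheritance described above, an
           attribute placed in front of a boolean operator should apply to
           every atomic query below that node. Under that reading, and with
           <emphasis>debussy</emphasis> and <emphasis>prelude</emphasis>
           chosen purely as illustrative search terms, the following two
           queries are expected to be equivalent:
           <screen>
            Z> find @attr 1=4 @and debussy prelude
            Z> find @and @attr 1=4 debussy @attr 1=4 prelude
           </screen>
          </para>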
 406
 407      <para>
 408       The <emphasis>scan</emphasis> operation is only supported with
 409       atomic &acro.apt; queries, as it is bound to one access point at a
 410       time. Boolean query trees are not allowed during
 411       <emphasis>scan</emphasis>.
 412       </para>
 413
 414      <para>
 415       For example, we might want to scan the title index, starting with
 416       the term
 417       <emphasis>debussy</emphasis>, and displaying this and the
 418       following terms in lexicographic order:
 419       <screen>
 420        Z> scan @attr 1=4 debussy
 421       </screen>
 422      </para>
 423     </section>
 424
 425
 426     <section id="querymodel-resultset">
 427      <title>Named Result Sets</title>
 428      <para>
 429       Named result sets are supported in &zebra;, and result sets can be
 430       used as operands without limitations. It follows that named
 431       result sets are leaf nodes in the &acro.pqf; query tree, exactly as
 432       atomic &acro.apt; queries are.
 433      </para>
 434      <para>
 435       After the execution of a search, the result set is available at
 436       the server, such that the client can use it for subsequent
 437       searches or retrieval requests. The &acro.z3950; standard actually
 438       stresses the fact that result sets are volatile. A result set may cease
 439       to exist at any point in time after the search, in which case the server will
 440       send a diagnostic to the effect that the requested
 441       result set does not exist any more.
 442      </para>
 443
 444      <para>
 445       Defining a named result set and re-using it in the next query,
 446       using <application>yaz-client</application>. Notice that the client, not
 447       the server, assigns the string '1' to the
 448       named result set.
 449       <screen>
 450        Z> f @attr 1=4 mozart
 451        ...
 452        Number of hits: 43, setno 1
 453        ...
 454        Z> f @and @set 1 @attr 1=4 amadeus
 455        ...
 456        Number of hits: 14, setno 2
 457       </screen>
 458      </para>
 459
 460      <note>
 461       <para>
 462        Named result sets are only supported by the &acro.z3950; protocol.
 463        The &acro.sru; web service is stateless, and therefore the notion of
 464        named result sets does not exist when accessing a &zebra; server by
 465        the &acro.sru; protocol.
 466       </para>
 467      </note>
 468     </section>
 469
 470     <section id="querymodel-use-string">
 471      <title>&zebra;'s special access point of type 'string'</title>
 472      <para>
 473       The numeric <emphasis>use (type 1)</emphasis> attribute is usually
 474       referred to from a given
 475       attribute set. In addition, &zebra; lets you use
 476       <emphasis>any internal index
 477        name defined in your configuration</emphasis>
 478       as use attribute value. This is a great feature for
 479       debugging, and when you do
 480       not need the complexity of defined use attribute values. It is
 481       the preferred way of accessing &zebra; indexes directly.
 482      </para>
 483      <para>
 484       Finding all documents which have the term list "information
 485       retrieval" in a &zebra; index, using its internal full string
 486       name, and scanning the same index:
 487       <screen>
 488        Z> find @attr 1=sometext "information retrieval"
 489        Z> scan @attr 1=sometext aterm
 490       </screen>
 491      </para>
 492      <para>
 493       Searching or scanning
 494       the bib-1 use attribute 54 using its string name:
 495       <screen>
 496        Z> find @attr 1=Code-language eng
 497        Z> scan @attr 1=Code-language ""
 498       </screen>
 499      </para>
 500      <para>
 501       It is possible to search
 502       in any silly string index, provided it is defined in your
 503       indexing rules and can be parsed by the &acro.pqf; parser.
 504       This is definitely not the recommended use of
 505       this facility, as it might confuse your users with some very
 506       unexpected results.
 507       <screen>
 508        Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
 509       </screen>
 510      </para>
 511      <para>
 512       See also <xref linkend="querymodel-pqf-apt-mapping"/> for details, and
 513       <xref linkend="zebrasrv-sru"/>
 514       for the &acro.sru; &acro.pqf; query extension using string names as a fast
 515       debugging facility.
 516      </para>
 517     </section>
 518
 519     <section id="querymodel-use-xpath">
 520      <title>&zebra;'s special access point of type 'XPath'
 521       for &acro.grs1; filters</title>
 522      <para>
 523       As we have seen above, it is possible (albeit seldom a great
 524       idea) to emulate
 525       <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
 526       search by defining <emphasis>use (type 1)</emphasis>
 527       <emphasis>string</emphasis> attributes which in appearance
 528       <emphasis>resemble XPath queries</emphasis>. There are two
 529       problems with this approach: first, the XPath-look-alike has to
 530       be defined at indexing time, so no new, undefined
 531       XPath queries can be entered at search time; and second, it might
 532       confuse users very much that an XPath-alike index name in fact
 533       gets populated from a possibly entirely different &acro.xml; element
 534       than it pretends to access.
 535      </para>
 536      <para>
 537       When using the &acro.grs1; Record Model
 538       (see  <xref linkend="grs"/>), we have the
 539       possibility to embed <emphasis>live</emphasis>
 540       XPath expressions
 541       in the &acro.pqf; queries, which are here called
 542       <emphasis>use (type 1)</emphasis> <emphasis>xpath</emphasis>
 543       attributes. To use them, you must set the
 544       <literal>xpath enable</literal> directive in your
 545       <literal>.abs</literal> configuration files.
 546      </para>
 547      <note>
 548       <para>
 549        Only a <emphasis>very</emphasis> restricted subset of the
 550        <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
 551        standard is supported as the &acro.grs1; record model is simpler than
 552        a full &acro.xml; &acro.dom; structure. See the following examples for
 553        possibilities.
 554       </para>
 555      </note>
 556      <para>
 557       Finding all documents which have the term "content"
 558       inside a text node found in a specific &acro.xml; &acro.dom;
 559       <emphasis>subtree</emphasis>, whose starting element is
 560       addressed by XPath.
 561       <screen>
 562        Z> find @attr 1=/root content
 563        Z> find @attr 1=/root/first content
 564       </screen>
 565       <emphasis>Notice that the
 566        XPath must be absolute, i.e., must start with '/', and that the
 567        XPath <literal>descendant-or-self</literal> axis followed by a
 568        text node selection <literal>text()</literal> is implicitly
 569        appended to the stated XPath.
 570       </emphasis>
 571       It follows that the above searches are interpreted as:
 572       <screen>
 573        Z> find @attr 1=/root//text() content
 574        Z> find @attr 1=/root/first//text() content
 575       </screen>
 576      </para>
 577
 578      <para>
 579       Searching inside attribute strings is possible:
 580       <screen>
 581        Z> find @attr 1=/link/@creator morten
 582       </screen>
 583       </para>
 584
 585      <para>
 586       Filtering the addressed XPath by a predicate working on exact
 587       string values in
 588       attributes (in the &acro.xml; sense) is possible. The first query below returns all documents which
 589       have the term "english" contained in one of the text sub nodes of
 590       the subtree defined by the XPath
 591       <literal>/record/title[@lang='en']</literal>; the remaining queries
 592       apply similar predicate filtering:
 593       <screen>
 594        Z> find @attr 1=/record/title[@lang='en'] english
 595        Z> find @attr 1=/link[@creator='sisse'] sibelius
 596        Z> find @attr 1=/link[@creator='sisse']/description[@xml:lang='da'] sibelius
 597       </screen>
 598      </para>
 599
 600      <para>
 601       Combining numeric indexes, boolean expressions,
 602       and xpath based searches is possible:
 603       <screen>
 604        Z> find @attr 1=/record/title @and foo bar
 605        Z> find @and @attr 1=/record/title foo @attr 1=4 bar
 606       </screen>
 607      </para>
 608      <para>
 609       Escaping &acro.pqf; keywords and other non-parseable XPath constructs
 610       with <literal>'{ }'</literal> to prevent client-side &acro.pqf; parsing
 611       syntax errors:
 612       <screen>
 613        Z> find @attr {1=/root/first[@attr='danish']} content
 614        Z> find @attr {1=/record/@set} oai
 615       </screen>
 616      </para>
 617      <warning>
 618       <para>
 619        It is worth mentioning that these dynamically performed XPath
 620        queries are a performance bottleneck, as no optimized
 621        specialized indexes can be used. Therefore, avoid the use of
 622        this facility when speed is essential, and the database content
 623        size is medium to large.
 624       </para>
 625      </warning>
 626     </section>
 627    </section>
 628
 629    <section id="querymodel-exp1">
 630     <title>Explain Attribute Set</title>
 631     <para>
 632      The &acro.z3950; standard defines the
 633      <ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
 634      Exp-1, which is used to discover information
 635      about a server's search semantics and functional capabilities.
 636      &zebra; exposes a "classic"
 637      Explain database by base name <literal>IR-Explain-1</literal>, which
 638      is populated with system internal information.
 639     </para>
 640    <para>
 641      The attribute-set <literal>exp-1</literal> consists of a single
 642      use attribute (type 1).
 643     </para>
 644     <para>
 645      In addition, the non-Use
 646      &acro.bib1; attributes, that is, the types
 647      <emphasis>Relation</emphasis>, <emphasis>Position</emphasis>,
 648      <emphasis>Structure</emphasis>, <emphasis>Truncation</emphasis>,
 649      and <emphasis>Completeness</emphasis> are imported from
 650      the &acro.bib1; attribute set, and may be used
 651      within any explain query.
 652     </para>
 653
 654     <section id="querymodel-exp1-use">
 655     <title>Use Attributes (type = 1)</title>
 656      <para>
 657       The following Explain search attributes are supported:
 658       <literal>ExplainCategory</literal> (@attr 1=1),
 659       <literal>DatabaseName</literal> (@attr 1=3),
 660       <literal>DateAdded</literal> (@attr 1=9),
 661       <literal>DateChanged</literal> (@attr 1=10).
 662      </para>
 663      <para>
 664       A search in the use attribute  <literal>ExplainCategory</literal>
 665       supports only these predefined values:
 666       <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
 667       <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
 668      </para>
 669      <para>
 670       See <filename>tab/explain.att</filename> and the
 671       <ulink url="&url.z39.50;">&acro.z3950;</ulink> standard
 672       for more information.
 673      </para>
 674     </section>
 675
 676     <section id="querymodel-examples">
 677      <title>Explain searches with yaz-client</title>
 678      <para>
 679       Classic Explain only defines retrieval of Explain information
 680       via ASN.1. Practically no &acro.z3950; clients support this. Fortunately
 681       they don't have to - &zebra; allows retrieval of this information
 682       in other formats:
 683       <literal>&acro.sutrs;</literal>, <literal>&acro.xml;</literal>,
 684       <literal>&acro.grs1;</literal> and  <literal>ASN.1</literal> Explain.
 685      </para>
 686
 687      <para>
 688       List supported categories to find out which explain commands are
 689       supported:
 690       <screen>
 691        Z> base IR-Explain-1
 692        Z> find @attr exp1 1=1 categorylist
 693        Z> form sutrs
 694        Z> show 1+2
 695       </screen>
 696      </para>
 697
 698      <para>
 699       Get target info, that is, investigate which databases exist at
 700       this server endpoint:
 701       <screen>
 702        Z> base IR-Explain-1
 703        Z> find @attr exp1 1=1 targetinfo
 704        Z> form xml
 705        Z> show 1+1
 706        Z> form grs-1
 707        Z> show 1+1
 708        Z> form sutrs
 709        Z> show 1+1
 710       </screen>
 711      </para>
 712
 713      <para>
 714       List all supported databases. The number of hits
 715       is the number of databases found, which most commonly are the
 716       following two:
 717       the <literal>Default</literal> and the
 718       <literal>IR-Explain-1</literal> databases.
 719       <screen>
 720        Z> base IR-Explain-1
 721        Z> find @attr exp1 1=1 databaseinfo
 722        Z> form sutrs
 723        Z> show 1+2
 724       </screen>
 725      </para>
 726
 727      <para>
 728       Get database info record for database <literal>Default</literal>.
 729       <screen>
 730        Z> base IR-Explain-1
 731        Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
 732       </screen>
 733       Identical query with explicitly specified attribute set:
 734       <screen>
 735        Z> base IR-Explain-1
 736        Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
 737       </screen>
 738      </para>
 739
 740      <para>
 741       Get attribute details record for database
 742       <literal>Default</literal>.
 743       This query is very useful to study the internal &zebra; indexes.
 744       If records have been indexed using the <literal>alvis</literal>
 745       &acro.xslt; filter, the string representation names of the known indexes can be
 746       found.
 747       <screen>
 748        Z> base IR-Explain-1
 749        Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
 750       </screen>
 751       Identical query with explicitly specified attribute set:
 752       <screen>
 753        Z> base IR-Explain-1
 754        Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
 755       </screen>
 756      </para>
 757     </section>
 758
 759    </section>
 760
 761    <section id="querymodel-bib1">
 762     <title>&acro.bib1; Attribute Set</title>
 763     <para>
 764      Most of the information contained in this section is an excerpt of
 765      the ATTRIBUTE SET &acro.bib1; (&acro.z3950;-1995) SEMANTICS
 766      found in <ulink url="&url.z39.50.attset.bib1.1995;">The &acro.bib1;
 767       Attribute Set Semantics</ulink> from 1995, also available in an updated
 768      <ulink url="&url.z39.50.attset.bib1;">&acro.bib1;
 769       Attribute Set</ulink>
 770      version from 2003. Index Data is not the copyright holder of this
 771      information, except for the configuration details, the listing of
 772      &zebra;'s capabilities, and the example queries.
 773     </para>
 774
 775
 776    <section id="querymodel-bib1-use">
 777      <title>Use Attributes (type 1)</title>
 778
 779     <para>
 780      A use attribute specifies an access point for any atomic query.
 781      These access points are highly dependent on the attribute set used
 782      in the query, and are user configurable using the following
 783      default configuration files:
 784      <filename>tab/bib1.att</filename>,
 785      <filename>tab/dan1.att</filename>,
 786      <filename>tab/explain.att</filename>, and
 787      <filename>tab/gils.att</filename>.
 788      </para>
 789     <para>
      For example, a few of the &acro.bib1; use
      attributes defined in <filename>tab/bib1.att</filename> are:
 792       <screen>
 793        att 1               Personal-name
 794        att 2               Corporate-name
 795        att 3               Conference-name
 796        att 4               Title
 797        ...
 798        att 1009            Subject-name-personal
 799        att 1010            Body-of-text
 800        att 1011            Date/time-added-to-db
 801        ...
 802        att 1016            Any
 803        att 1017            Server-choice
 804        att 1018            Publisher
 805        ...
 806        att 1035            Anywhere
 807        att 1036            Author-Title-Subject
 808       </screen>
 809      </para>
 810     <para>
 811      New attribute sets can be added by adding new
 812      <filename>tab/*.att</filename> configuration files, which need to
 813      be sourced in the main configuration <filename>zebra.cfg</filename>.
 814      </para>
 815     <para>
      In addition, &zebra; allows access to
 817      <emphasis>internal index names</emphasis> and <emphasis>dynamic
 818      XPath</emphasis> as use attributes; see
 819       <xref linkend="querymodel-use-string"/> and
 820      <xref linkend="querymodel-use-xpath"/>.
 821     </para>
 822
 823     <para>
 824      Phrase search for <emphasis>information retrieval</emphasis> in
 825      the title-register, scanning the same register afterwards:
 826      <screen>
 827       Z> find @attr 1=4 "information retrieval"
 828       Z> scan @attr 1=4 information
 829      </screen>
 830     </para>
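    <para>
     As a further illustration, access points can be combined with the
     boolean operators. The following sketch assumes that the
     <literal>Personal-name</literal> (1) and <literal>Title</literal> (4)
     use attributes listed above are indexed in the target database; the
     search terms are made up for the example:
     <screen>
      Z> find @and @attr 1=1 smith @attr 1=4 "information retrieval"
     </screen>
    </para>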
 831     </section>
 832
 833    </section>
 834
 835
 836    <section id="querymodel-bib1-nonuse">
 837      <title>&zebra; general Bib1 Non-Use Attributes (type 2-6)</title>
 838
 839     <section id="querymodel-bib1-relation">
 840      <title>Relation Attributes (type 2)</title>
 841
 842      <para>
 843       Relation attributes describe the relationship of the access
 844       point (left side
 845       of the relation) to the search term as qualified by the attributes (right
 846       side of the relation), e.g., Date-publication &lt;= 1975.
 847       </para>
 848
 849      <table id="querymodel-bib1-relation-table" frame="top">
 850       <title>Relation Attributes (type 2)</title>
 851       <tgroup cols="3">
 852        <thead>
 853         <row>
 854          <entry>Relation</entry>
 855          <entry>Value</entry>
 856          <entry>Notes</entry>
 857         </row>
 858        </thead>
 859        <tbody>
 860         <row>
 861          <entry>Less than</entry>
 862          <entry>1</entry>
 863          <entry>supported</entry>
 864         </row>
 865         <row>
 866          <entry>Less than or equal</entry>
 867          <entry>2</entry>
 868          <entry>supported</entry>
 869         </row>
 870         <row>
 871          <entry>Equal</entry>
 872          <entry>3</entry>
 873          <entry>default</entry>
 874         </row>
 875         <row>
 876          <entry>Greater or equal</entry>
 877          <entry>4</entry>
 878          <entry>supported</entry>
 879         </row>
 880         <row>
 881          <entry>Greater than</entry>
 882          <entry>5</entry>
 883          <entry>supported</entry>
 884         </row>
 885         <row>
 886          <entry>Not equal</entry>
 887          <entry>6</entry>
 888          <entry>unsupported</entry>
 889         </row>
 890         <row>
 891          <entry>Phonetic</entry>
 892          <entry>100</entry>
 893          <entry>unsupported</entry>
 894         </row>
 895         <row>
 896          <entry>Stem</entry>
 897          <entry>101</entry>
 898          <entry>unsupported</entry>
 899         </row>
 900         <row>
 901          <entry>Relevance</entry>
 902          <entry>102</entry>
 903          <entry>supported</entry>
 904         </row>
 905         <row>
 906          <entry>AlwaysMatches</entry>
 907          <entry>103</entry>
 908          <entry>supported *</entry>
 909         </row>
 910        </tbody>
 911       </tgroup>
 912      </table>
 913      <note>
 914       <para>
       AlwaysMatches searches are only supported if alwaysmatches indexing
       has been enabled. See <xref linkend="default-idx-file"/>.
 917       </para>
 918       </note>
 919
 920      <para>
 921       The relation attributes 1-5 are supported and work exactly as
 922       expected.
 923       All ordering operations are based on a lexicographical ordering,
 924       <emphasis>except</emphasis> when the
 925       structure attribute numeric (109) is used. In
 926       this case, ordering is numerical. See
 927       <xref linkend="querymodel-bib1-structure"/>.
 928       <screen>
 929        Z> find @attr 1=Title @attr 2=1 music
 930        ...
 931        Number of hits: 11745, setno 1
 932        ...
 933        Z> find @attr 1=Title @attr 2=2 music
 934        ...
 935        Number of hits: 11771, setno 2
 936        ...
 937        Z> find @attr 1=Title @attr 2=3 music
 938        ...
 939        Number of hits: 532, setno 3
 940        ...
 941        Z> find @attr 1=Title @attr 2=4 music
 942        ...
 943        Number of hits: 11463, setno 4
 944        ...
 945        Z> find @attr 1=Title @attr 2=5 music
 946        ...
 947        Number of hits: 11419, setno 5
 948       </screen>
 949      </para>
 950
 951      <para>
 952       The relation attribute
 953       <emphasis>Relevance (102)</emphasis> is supported, see
 954       <xref linkend="administration-ranking"/> for full information.
 955      </para>
 956
 957      <para>
 958       Ranked search for <emphasis>information retrieval</emphasis> in
 959       the title-register:
 960       <screen>
 961        Z> find @attr 1=4 @attr 2=102 "information retrieval"
 962       </screen>
 963      </para>
 964
 965      <para>
      The relation attribute
      <emphasis>AlwaysMatches (103)</emphasis> is, in the default
      configuration, supported in conjunction with the structure attribute
      <emphasis>Phrase (1)</emphasis> (which may be omitted, since it is
      the default).
 972       It can be configured to work with other structure attributes,
 973       see the configuration file
 974       <filename>tab/default.idx</filename> and
 975        <xref linkend="querymodel-pqf-apt-mapping"/>.
 976      </para>
 977      <para>
 978       <emphasis>AlwaysMatches (103)</emphasis> is a
 979       great way to discover how many documents have been indexed in a
 980       given field. The search term is ignored, but needed for correct
 981       &acro.pqf; syntax. An empty search term may be supplied.
 982       <screen>
 983        Z> find @attr 1=Title  @attr 2=103  ""
 984        Z> find @attr 1=Title  @attr 2=103  @attr 4=1 ""
 985       </screen>
 986      </para>
 987
 988
 989     </section>
 990
 991     <section id="querymodel-bib1-position">
 992      <title>Position Attributes (type 3)</title>
 993
 994      <para>
 995       The position attribute specifies the location of the search term
 996       within the field or subfield in which it appears.
 997      </para>
 998
 999      <table id="querymodel-bib1-position-table" frame="top">
1000       <title>Position Attributes (type 3)</title>
1001       <tgroup cols="3">
1002        <thead>
1003         <row>
1004          <entry>Position</entry>
1005          <entry>Value</entry>
1006          <entry>Notes</entry>
1007         </row>
1008        </thead>
1009        <tbody>
1010         <row>
1011          <entry>First in field </entry>
1012          <entry>1</entry>
1013          <entry>supported *</entry>
1014         </row>
1015         <row>
1016          <entry>First in subfield</entry>
1017          <entry>2</entry>
1018          <entry>supported *</entry>
1019         </row>
1020         <row>
1021          <entry>Any position in field</entry>
1022          <entry>3</entry>
1023          <entry>default</entry>
1024         </row>
1025        </tbody>
1026       </tgroup>
1027      </table>
1028
1029      <note>
1030       <para>
       &zebra; only supports first-in-field searches if
       <literal>firstinfield</literal> is enabled for the index.
       Refer to <xref linkend="default-idx-file"/>.
       &zebra; does not distinguish between first in field and
       first in subfield. They result in the same hit count.
       Searching for first position in (sub)field is only supported in &zebra;
       2.0.2 and later.
1038       </para>
1039      </note>
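     <para>
      As an illustrative sketch, assuming that <literal>firstinfield</literal>
      has been enabled for the title index as described in the note above,
      the following queries look for titles that begin with the given term
      and for titles that contain it at any position:
      <screen>
       Z> find @attr 1=4 @attr 3=1 information
       Z> find @attr 1=4 @attr 3=3 information
      </screen>
     </para>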
1040     </section>
1041
1042     <section id="querymodel-bib1-structure">
1043      <title>Structure Attributes (type 4)</title>
1044
1045      <para>
1046       The structure attribute specifies the type of search
1047       term. This causes the search to be mapped on
1048       different &zebra; internal indexes, which must have been defined
1049       at index time.
1050      </para>
1051
1052      <para>
1053       The possible values of the
1054       <literal>structure attribute (type 4)</literal> can be defined
1055       using the configuration file <filename>
1056       tab/default.idx</filename>.
1057       The default configuration is summarized in this table.
1058      </para>
1059
1060      <table id="querymodel-bib1-structure-table" frame="top">
1061       <title>Structure Attributes (type 4)</title>
1062       <tgroup cols="3">
1063        <thead>
1064         <row>
1065          <entry>Structure</entry>
1066          <entry>Value</entry>
1067          <entry>Notes</entry>
1068         </row>
1069        </thead>
1070        <tbody>
1071         <row>
1072          <entry>Phrase </entry>
1073          <entry>1</entry>
1074          <entry>default</entry>
1075         </row>
1076         <row>
1077          <entry>Word</entry>
1078          <entry>2</entry>
1079          <entry>supported</entry>
1080         </row>
1081         <row>
1082          <entry>Key</entry>
1083          <entry>3</entry>
1084          <entry>supported</entry>
1085         </row>
1086         <row>
1087          <entry>Year</entry>
1088          <entry>4</entry>
1089          <entry>supported</entry>
1090         </row>
1091         <row>
1092          <entry>Date (normalized)</entry>
1093          <entry>5</entry>
1094          <entry>supported</entry>
1095         </row>
1096         <row>
1097          <entry>Word list</entry>
1098          <entry>6</entry>
1099          <entry>supported</entry>
1100         </row>
1101         <row>
1102          <entry>Date (un-normalized)</entry>
1103          <entry>100</entry>
1104          <entry>unsupported</entry>
1105         </row>
1106         <row>
1107          <entry>Name (normalized) </entry>
1108          <entry>101</entry>
1109          <entry>unsupported</entry>
1110         </row>
1111         <row>
1112          <entry>Name (un-normalized) </entry>
1113          <entry>102</entry>
1114          <entry>unsupported</entry>
1115         </row>
1116         <row>
1117          <entry>Structure</entry>
1118          <entry>103</entry>
1119          <entry>unsupported</entry>
1120         </row>
1121         <row>
1122          <entry>Urx</entry>
1123          <entry>104</entry>
1124          <entry>supported</entry>
1125         </row>
1126         <row>
1127          <entry>Free-form-text</entry>
1128          <entry>105</entry>
1129          <entry>supported</entry>
1130         </row>
1131         <row>
1132          <entry>Document-text</entry>
1133          <entry>106</entry>
1134          <entry>supported</entry>
1135         </row>
1136         <row>
1137          <entry>Local-number</entry>
1138          <entry>107</entry>
1139          <entry>supported</entry>
1140         </row>
1141         <row>
1142          <entry>String</entry>
1143          <entry>108</entry>
1144          <entry>unsupported</entry>
1145         </row>
1146         <row>
1147          <entry>Numeric string</entry>
1148          <entry>109</entry>
1149          <entry>supported</entry>
1150         </row>
1151        </tbody>
1152       </tgroup>
1153      </table>
1154
1155     <para>
     The structure attribute value
     <literal>Word list (6)</literal>
     is supported, and maps to the boolean <literal>AND</literal>
     combination of the words supplied. The word list is useful when
     Google-like bag-of-words queries need to be translated from a GUI
     query language to &acro.pqf;.  For example, the following queries
     are equivalent:
1163      <screen>
1164       Z> find @attr 1=Title @attr 4=6 "mozart amadeus"
1165       Z> find @attr 1=Title  @and mozart amadeus
1166      </screen>
1167     </para>
1168
1169     <para>
     The structure attribute values
     <literal>Free-form-text (105)</literal> and
     <literal>Document-text (106)</literal>
     are supported, and both map to the boolean <literal>OR</literal>
     combination of the words supplied. The following queries
     are equivalent:
1176      <screen>
1177       Z> find @attr 1=Body-of-text @attr 4=105 "bach salieri teleman"
1178       Z> find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman"
1179       Z> find @attr 1=Body-of-text @or bach @or salieri teleman
1180      </screen>
1181      This <literal>OR</literal> list of terms is very useful in
1182      combination with relevance ranking:
1183      <screen>
1184       Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman"
1185      </screen>
1186     </para>
1187
1188     <para>
     The structure attribute value
     <literal>Local-number (107)</literal>
     is supported, and always maps to the &zebra; internal document ID,
     irrespective of which use attribute is specified. The following queries
     have exactly the same unique record in the hit set:
1194      <screen>
1195       Z> find @attr 4=107 10
1196       Z> find @attr 1=4 @attr 4=107 10
1197       Z> find @attr 1=1010 @attr 4=107 10
1198      </screen>
1199     </para>
1200
1201     <para>
1202      In
1203      the GILS schema (<literal>gils.abs</literal>), the
1204      west-bounding-coordinate is indexed as type <literal>n</literal>,
1205      and is therefore searched by specifying
1206      <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
1207      To match all those records with west-bounding-coordinate greater
1208      than -114 we use the following query:
1209      <screen>
1210       Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
1211      </screen>
1212      </para>
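    <para>
     Relation and structure attributes can be combined to express a
     numeric range. As a sketch building on the query above (the bounds
     -120 and -114 are arbitrary example values), the following query
     should match records whose west-bounding-coordinate is greater than
     or equal to -120 and less than or equal to -114:
     <screen>
      Z> find @and @attr 4=109 @attr 2=4 @attr gils 1=2038 -120 @attr 4=109 @attr 2=2 @attr gils 1=2038 -114
     </screen>
    </para>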
1213      <note>
1214       <para>
1215        The exact mapping between &acro.pqf; queries and &zebra; internal indexes
1216        and index types is explained in
1217        <xref linkend="querymodel-pqf-apt-mapping"/>.
1218       </para>
1219      </note>
1220     </section>
1221
1222     <section id="querymodel-bib1-truncation">
     <title>Truncation Attributes (type 5)</title>
1224
1225      <para>
1226       The truncation attribute specifies whether variations of one or
1227       more characters are allowed between search term and hit terms, or
1228       not. Using non-default truncation attributes will broaden the
1229       document hit set of a search query.
1230      </para>
1231
1232      <table id="querymodel-bib1-truncation-table" frame="top">
1233       <title>Truncation Attributes (type 5)</title>
1234       <tgroup cols="3">
1235        <thead>
1236         <row>
1237          <entry>Truncation</entry>
1238          <entry>Value</entry>
1239          <entry>Notes</entry>
1240         </row>
1241        </thead>
1242        <tbody>
1243         <row>
1244          <entry>Right truncation </entry>
1245          <entry>1</entry>
1246          <entry>supported</entry>
1247         </row>
1248         <row>
1249          <entry>Left truncation</entry>
1250          <entry>2</entry>
1251          <entry>supported</entry>
1252         </row>
1253         <row>
1254          <entry>Left and right truncation</entry>
1255          <entry>3</entry>
1256          <entry>supported</entry>
1257         </row>
1258         <row>
1259          <entry>Do not truncate</entry>
1260          <entry>100</entry>
1261          <entry>default</entry>
1262         </row>
1263         <row>
1264          <entry>Process # in search term</entry>
1265          <entry>101</entry>
1266          <entry>supported</entry>
1267         </row>
1268         <row>
1269          <entry>RegExpr-1 </entry>
1270          <entry>102</entry>
1271          <entry>supported</entry>
1272         </row>
1273         <row>
1274          <entry>RegExpr-2</entry>
1275          <entry>103</entry>
1276          <entry>supported</entry>
1277         </row>
1278        </tbody>
1279       </tgroup>
1280      </table>
1281
1282      <para>
      The truncation attribute values 1-3 behave as expected:
1284       <screen>
1285        Z> scan @attr 1=Body-of-text  schnittke
1286        ...
1287        * schnittke (81)
1288        schnittkes (31)
1289        schnittstelle (1)
1290        ...
1291        Z> find @attr 1=Body-of-text  @attr 5=1 schnittke
1292        ...
1293        Number of hits: 95, setno 7
1294        ...
1295        Z> find @attr 1=Body-of-text  @attr 5=2 schnittke
1296        ...
1297        Number of hits: 81, setno 6
1298        ...
1299        Z> find @attr 1=Body-of-text  @attr 5=3 schnittke
1300        ...
1301        Number of hits: 95, setno 8
1302       </screen>
1303       </para>
1304
1305      <para>
      The truncation attribute value
      <literal>Process # in search term (101)</literal> is a
      poor man's regular expression search. It maps
      each <literal>#</literal> to <literal>.*</literal>, and
      then performs a <literal>Regexp-1 (102)</literal> regular
      expression search. The following two queries are equivalent:
1312       <screen>
1313        Z> find @attr 1=Body-of-text  @attr 5=101 schnit#ke
1314        Z> find @attr 1=Body-of-text  @attr 5=102 schnit.*ke
1315        ...
1316        Number of hits: 89, setno 10
1317       </screen>
1318      </para>
1319
1320      <para>
      The truncation attribute value
      <literal>Regexp-1 (102)</literal> is a normal regular expression search;
      see <xref linkend="querymodel-regular"/> for details.
1324       <screen>
1325        Z> find @attr 1=Body-of-text  @attr 5=102 schnit+ke
1326        Z> find @attr 1=Body-of-text  @attr 5=102 schni[a-t]+ke
1327       </screen>
1328      </para>
1329
1330      <para>
      The truncation attribute value
      <literal>Regexp-2 (103)</literal> is a &zebra;-specific extension
      which allows <emphasis>fuzzy</emphasis> matches. A single
      spelling error in a search term is allowed, i.e., a document
      is hit if it includes a term which can be mapped to the
      search term by one character substitution, addition, deletion or
      transposition.
1338       <screen>
1339        Z> find @attr 1=Body-of-text  @attr 5=100 schnittke
1340        ...
1341        Number of hits: 81, setno 14
1342        ...
1343        Z> find @attr 1=Body-of-text  @attr 5=103 schnittke
1344        ...
1345        Number of hits: 103, setno 15
1346        ...
1347       </screen>
1348       </para>
1349     </section>
1350
1351     <section id="querymodel-bib1-completeness">
    <title>Completeness Attributes (type 6)</title>
1353
1354
1355      <para>
      The <literal>completeness attribute (type 6)</literal>
      specifies whether a given search term or term list is either
      part of the terms of a given index/field
      (<literal>Incomplete subfield (1)</literal>), or is
      literally what is found in the entire field's index
      (<literal>Complete field (3)</literal>).
1362       </para>
1363
1364      <table id="querymodel-bib1-completeness-table" frame="top">
      <title>Completeness Attributes (type 6)</title>
1366       <tgroup cols="3">
1367        <thead>
1368         <row>
1369          <entry>Completeness</entry>
1370          <entry>Value</entry>
1371          <entry>Notes</entry>
1372         </row>
1373        </thead>
1374        <tbody>
1375         <row>
1376          <entry>Incomplete subfield</entry>
1377          <entry>1</entry>
1378          <entry>default</entry>
1379         </row>
1380         <row>
1381          <entry>Complete subfield</entry>
1382          <entry>2</entry>
1383          <entry>deprecated</entry>
1384         </row>
1385         <row>
1386          <entry>Complete field</entry>
1387          <entry>3</entry>
1388          <entry>supported</entry>
1389         </row>
1390        </tbody>
1391       </tgroup>
1392      </table>
1393
1394      <para>
      The <literal>completeness attribute (type 6)</literal>
      is only partially and conditionally
      supported, in the sense that it is ignored if the hit index is
      not of structure <literal>type="w"</literal> or
      <literal>type="p"</literal>.
1400       </para>
1401      <para>
1402       <literal>Incomplete subfield (1)</literal> is the default, and
1403       makes &zebra; use
1404       register <literal>type="w"</literal>, whereas
1405       <literal>Complete field (3)</literal> triggers
1406       search and scan in index <literal>type="p"</literal>.
1407      </para>
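     <para>
      As a sketch, assuming that the title field is indexed both as
      words (<literal>type="w"</literal>) and as a phrase
      (<literal>type="p"</literal>), the first query below should match
      records whose title contains the phrase, whereas the second should
      only match records whose entire title consists of it:
      <screen>
       Z> find @attr 1=4 @attr 6=1 "information retrieval"
       Z> find @attr 1=4 @attr 6=3 "information retrieval"
      </screen>
     </para>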
1408      <para>
      <literal>Complete subfield (2)</literal> is a remnant
      from the days of the <literal>&acro.marc;</literal>
      binary format. &zebra; does not support it, but silently maps it
      to <literal>Complete field (3)</literal>.
1413      </para>
1414
1415      <note>
1416       <para>
1417        The exact mapping between &acro.pqf; queries and &zebra; internal indexes
1418        and index types is explained in
1419        <xref linkend="querymodel-pqf-apt-mapping"/>.
1420       </para>
1421      </note>
1422     </section>
1423    </section>
1424
1425    </section>
1426
1427
1428   <section id="querymodel-zebra">
1429    <title>Extended &zebra; &acro.rpn; Features</title>
1430    <para>
    The &zebra; internal query engine has been extended to meet specific needs
    not covered by the <literal>bib-1</literal> attribute set query
    model. These extensions are <emphasis>non-standard</emphasis>
    and <emphasis>non-portable</emphasis>: most functional extensions
    are modeled on the <literal>bib-1</literal> attribute set,
    defining type 7 and higher values.
1437     There are also the special
1438     <literal>string</literal> type index names for the
1439     <literal>idxpath</literal> attribute set.
1440    </para>
1441
1442    <section id="querymodel-zebra-attr-allrecords">
1443     <title>&zebra; specific retrieval of all records</title>
1444     <para>
1445      &zebra; defines a hardwired <literal>string</literal> index name
1446      called <literal>_ALLRECORDS</literal>. It matches any record
1447      contained in the database, if used in conjunction with
1448      the relation attribute
1449      <literal>AlwaysMatches (103)</literal>.
1450      </para>
1451     <para>
     The <literal>_ALLRECORDS</literal> index name is used for total database
     export. The search term is ignored and may be empty.
1454      <screen>
1455       Z> find @attr 1=_ALLRECORDS @attr 2=103 ""
1456      </screen>
1457     </para>
1458     <para>
1459      Combination with other index types can be made. For example, to
1460      find all records which are <emphasis>not</emphasis> indexed in
1461      the <literal>Title</literal> register, issue one of the two
1462      equivalent queries:
1463      <screen>
1464       Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=Title @attr 2=103 ""
1465       Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=4 @attr 2=103 ""
1466      </screen>
1467     </para>
1468     <warning>
1469      <para>
1470       The special string index <literal>_ALLRECORDS</literal> is
1471       experimental, and the provided functionality and syntax may very
1472       well change in future releases of &zebra;.
1473      </para>
1474     </warning>
1475    </section>
1476
1477    <section id="querymodel-zebra-attr-search">
1478     <title>&zebra; specific Search Extensions to all Attribute Sets</title>
1479     <para>
1480      &zebra; extends the &acro.bib1; attribute types, and these extensions are
1481      recognized regardless of attribute
1482      set used in a <literal>search</literal> operation query.
1483     </para>
1484
1485     <table id="querymodel-zebra-attr-search-table" frame="top">
1486      <title>&zebra; Search Attribute Extensions</title>
1487      <tgroup cols="4">
1488       <thead>
1489        <row>
1490          <entry>Name</entry>
1491         <entry>Value</entry>
1492         <entry>Operation</entry>
1493         <entry>&zebra; version</entry>
1494        </row>
1495       </thead>
1496       <tbody>
1497        <row>
1498         <entry>Embedded Sort</entry>
1499         <entry>7</entry>
1500         <entry>search</entry>
1501         <entry>1.1</entry>
1502        </row>
1503        <row>
1504         <entry>Term Set</entry>
1505         <entry>8</entry>
1506         <entry>search</entry>
1507         <entry>1.1</entry>
1508        </row>
1509        <row>
1510         <entry>Rank Weight</entry>
1511         <entry>9</entry>
1512         <entry>search</entry>
1513         <entry>1.1</entry>
1514        </row>
1515        <row>
1516         <entry>Term Reference</entry>
1517         <entry>10</entry>
1518         <entry>search</entry>
1519         <entry>1.4</entry>
1520        </row>
1521        <row>
1522         <entry>Local Approx Limit</entry>
1523         <entry>11</entry>
1524         <entry>search</entry>
1525         <entry>1.4</entry>
1526        </row>
1527        <row>
1528         <entry>Global Approx Limit</entry>
1529         <entry>12</entry>
1530         <entry>search</entry>
1531         <entry>2.0.8</entry>
1532        </row>
1533        <row>
1534         <entry>Maximum number of truncated terms (truncmax)</entry>
1535         <entry>13</entry>
1536         <entry>search</entry>
1537         <entry>2.0.10</entry>
1538        </row>
1539        <row>
1540         <entry>
1541          Specifies whether un-indexed fields should be ignored.
1542          A zero value (default) throws a diagnostic when an un-indexed
1543          field is specified. A non-zero value makes it return 0 hits.
1544         </entry>
1545         <entry>14</entry>
1546         <entry>search</entry>
1547         <entry>2.0.16</entry>
1548        </row>
1549       </tbody>
1550      </tgroup>
1551     </table>
1552
1553     <section id="querymodel-zebra-attr-sorting">
1554      <title>&zebra; Extension Embedded Sort Attribute (type 7)</title>
1555      <para>
      The embedded sort is a way to specify sorting within a query, thus
      removing the need to send a Sort Request separately. It is
      faster, and it does not require clients to deal with the Sort
      Facility.
1560      </para>
1561
1562      <para>
1563       All ordering operations are based on a lexicographical ordering,
1564       <emphasis>except</emphasis> when the
1565       <literal>structure attribute numeric (109)</literal> is used. In
1566       this case, ordering is numerical. See
1567       <xref linkend="querymodel-bib1-structure"/>.
1568      </para>
1569
1570      <para>
1571       The possible values after attribute <literal>type 7</literal> are
1572       <literal>1</literal> ascending and
1573       <literal>2</literal> descending.
1574       The attributes+term (&acro.apt;) node is separate from the
1575       rest and must be <literal>@or</literal>'ed.
1576       The term associated with &acro.apt; is the sorting level in integers,
1577       where <literal>0</literal> means primary sort,
1578       <literal>1</literal> means secondary sort, and so forth.
1579       See also <xref linkend="administration-ranking"/>.
1580      </para>
1581      <para>
      For example, to search for water and sort by title (ascending):
1583       <screen>
1584        Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
1585       </screen>
1586      </para>
1587      <para>
      Or, to search for water and sort by title ascending, then by date descending:
1589       <screen>
1590        Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
1591       </screen>
1592      </para>
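     <para>
      A descending sort follows the same pattern. As a further sketch,
      assuming as in the example above that use attribute 30 refers to a
      date index, the following query searches for water and sorts the
      result by date in descending order:
      <screen>
       Z> find @or @attr 1=1016 water @attr 7=2 @attr 1=30 0
      </screen>
     </para>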
1593     </section>
1594
1595      <!--
1596     &zebra; Extension Term Set Attribute
1597     From the manual text, I can not see what is the point with this feature.
1598     I think it makes more sense when there are multiple terms in a query, or
1599     something...
1600
1601     We decided 2006-06-03 to disable this feature, as it is covered by
1602     scan within a resultset. Better use ressources to upgrade this
1603     feature for good performance.
1604     -->
1605
1606      <!--
1607     <section id="querymodel-zebra-attr-estimation">
1608      <title>&zebra; Extension Term Set Attribute (type 8)</title>
1609     <para>
1610      The Term Set feature is a facility that allows a search to store
1611      hitting terms in a "pseudo" resultset; thus a search (as usual) +
1612      a scan-like facility. Requires a client that can do named result
1613      sets since the search generates two result sets. The value for
1614      attribute 8 is the name of a result set (string). The terms in
1615      the named term set are returned as &acro.sutrs; records.
1616     </para>
1617     <para>
1618      For example, searching  for u in title, right truncated, and
1619      storing the result in term set named 'aset'
1620      <screen>
1621       Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
1622      </screen>
1623     </para>
1624     <warning>
1625      The model has one serious flaw: we don't know the size of term
1626      set. Experimental. Do not use in production code.
1627     </warning>
1628     </section>
1629     -->
1630
1631
1632     <section id="querymodel-zebra-attr-weight">
1633      <title>&zebra; Extension Rank Weight Attribute (type 9)</title>
1634      <para>
      Rank weight is a way to pass a value to a ranking algorithm, so
      that one &acro.apt; has one value while another has a different one.
1637       See also <xref linkend="administration-ranking"/>.
1638      </para>
1639      <para>
1640       For example, searching  for utah in title with weight 30 as well
1641       as any with weight 20:
1642       <screen>
1643        Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
1644       </screen>
1645      </para>
1646     </section>
1647
1648     <section id="querymodel-zebra-attr-termref">
1649      <title>&zebra; Extension Term Reference Attribute (type 10)</title>
1650      <para>
      &zebra; supports the searchResult-1 facility.
      If the Term Reference Attribute (type 10) is
      given, it specifies a subqueryId value that is returned as part of the
      search result. It is a way for a client to name an &acro.apt; part of a
      query.
1656      </para>
1657      <!--
1658      <para>
1659      <screen>
1660     </screen>
1661     </para>
1662      -->
1663      <warning>
1664       <para>
1665        Experimental. Do not use in production code.
1666        </para>
1667      </warning>
1668
1669     </section>
1670
1671
1672
1673     <section id="querymodel-zebra-local-attr-limit">
1674      <title>Local Approximative Limit Attribute (type 11)</title>
1675      <para>
1676       &zebra; computes - unless otherwise configured -
1677       the exact hit count for every &acro.apt;
1678       (leaf) in the query tree. These hit counts are returned as part of
1679       the searchResult-1 facility in the binary encoded &acro.z3950; search
1680       response packages.
1681      </para>
1682      <para>
1683       When an estimation limit is set on the result set size of an &acro.apt;
1684       leaf, &zebra; stops processing that result set once the limit
1685       is reached.
1686       Hit counts under this limit are still precise, but hit counts over it
1687       are estimated using the statistics gathered from the chopped
1688       result set.
1689      </para>
1690      <para>
1691       Specifying a limit of <literal>0</literal> results in exact hit counts.
1692      </para>
1693      <para>
1694       For example, we might be interested in exact hit count for a, but
1695       for b we allow hit count estimates for 1000 and higher.
1696       <screen>
1697        Z> find @and a @attr 11=1000 b
1698       </screen>
1699      </para>
1700      <note>
1701       <para>
1702        The estimated hit count facility makes searches faster, as one
1703        only needs to process large hit lists partially.
1704        It is mostly used in huge databases, where you want to trade
1705        exactness of hit counts against speed of execution.
1706       </para>
1707      </note>
1708      <warning>
1709       <para>
1710        Do not use approximative hit count limits
1711        in conjunction with relevance ranking, as re-sorting of the
1712        result set only works when the entire result set has
1713        been processed.
1714       </para>
1715      </warning>
1716     </section>
1717
1718     <section id="querymodel-zebra-global-attr-limit">
1719      <title>Global Approximative Limit Attribute (type 12)</title>
1720      <para>
1721       By default &zebra; computes precise hit counts for a query as
1722       a whole. Setting attribute 12 makes it perform approximative
1723       hit counts instead. It has the same semantics as
1724       <literal>estimatehits</literal> for the <xref linkend="zebra-cfg"/>.
1725      </para>
1726      <para>
1727       The attribute (12) can occur anywhere in the query tree.
1728       Unlike regular attributes it does not relate to the leaf (&acro.apt;)
1729       - but to the whole query.
1730      </para>
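          <para>
           As a sketch only, assuming the attribute value is a hit count
           threshold in the same way as the <literal>estimatehits</literal>
           setting (the value <literal>1000</literal> is an arbitrary
           choice for illustration), the two queries here would be expected
           to behave identically, since attribute 12 applies to the whole
           query wherever it is placed:
           <screen>
            Z> find @attr 12=1000 @and a b
            Z> find @and a @attr 12=1000 b
           </screen>
          </para>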
1731      <warning>
1732       <para>
1733        Do not use approximative hit count limits
1734        in conjunction with relevance ranking, as re-sorting of the
1735        result set only works when the entire result set has
1736        been processed.
1737       </para>
1738      </warning>
1739     </section>
1740
1741    </section>
1742
1743    <section id="querymodel-zebra-attr-scan">
1744     <title>&zebra; specific Scan Extensions to all Attribute Sets</title>
1745     <para>
1746      &zebra; extends the Bib1 attribute types, and these extensions are
1747      recognized regardless of attribute
1748      set used in a scan operation query.
1749     </para>
1750     <table id="querymodel-zebra-attr-scan-table" frame="top">
1751      <title>&zebra; Scan Attribute Extensions</title>
1752      <tgroup cols="4">
1753       <thead>
1754        <row>
1755         <entry>Name</entry>
1756         <entry>Type</entry>
1757         <entry>Operation</entry>
1758         <entry>&zebra; version</entry>
1759        </row>
1760       </thead>
1761       <tbody>
1762        <row>
1763         <entry>Result Set Narrow</entry>
1764         <entry>8</entry>
1765         <entry>scan</entry>
1766         <entry>1.3</entry>
1767        </row>
1768        <row>
1769         <entry>Approximative Limit</entry>
1770         <entry>12</entry>
1771         <entry>scan</entry>
1772         <entry>2.0.20</entry>
1773        </row>
1774       </tbody>
1775      </tgroup>
1776     </table>
1777
1778     <section id="querymodel-zebra-attr-narrow">
1779      <title>&zebra; Extension Result Set Narrow (type 8)</title>
1780      <para>
1781       If attribute Result Set Narrow (type 8)
1782       is given for scan, the value is the name of a
1783       result set. Each hit count in scan is
1784       <literal>@and</literal>'ed with the result set given.
1785      </para>
1786      <para>
1787       Consider for example
1788       the case of scanning all title fields around the
1789       scanterm <emphasis>mozart</emphasis>, then refining the scan by
1790       issuing a filtering query for <emphasis>amadeus</emphasis> to
1791       restrict the scan to the result set of the query:
1792       <screen>
1793       Z> scan @attr 1=4 mozart
1794       ...
1795       * mozart (43)
1796         mozartforskningen (1)
1797         mozartiana (1)
1798         mozarts (16)
1799       ...
1800       Z> f @attr 1=4 amadeus
1801       ...
1802       Number of hits: 15, setno 2
1803       ...
1804       Z> scan @attr 1=4 @attr 8=2 mozart
1805       ...
1806       * mozart (14)
1807         mozartforskningen (0)
1808         mozartiana (0)
1809         mozarts (1)
1810       ...
1811       </screen>
1812      </para>
1813
1814      <para>
1815       &zebra; 2.0.2 and later is able to skip 0 hit counts. This, however,
1816       is known not to scale if the number of terms to skip is high.
1817       This will most likely happen if the result set is small (which
1818       results in many 0 hits).
1819      </para>
1820     </section>
1821
1822     <section id="querymodel-zebra-attr-approx">
1823      <title>&zebra; Extension Approximative Limit (type 12)</title>
1824      <para>
1825       The &zebra; Extension Approximative Limit (type 12) is a way to
1826       enable approximate hit counts for scan hit counts, in the same
1827       way as for search hit counts.
1828      </para>
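          <para>
           A minimal sketch, assuming the attribute value acts as a
           threshold in the same way as for search; the value
           <literal>1000</literal> and the scan term are arbitrary examples:
           <screen>
            Z> scan @attr 1=4 @attr 12=1000 mozart
           </screen>
          </para>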
1829     </section>
1830    </section>
1831
1832    <section id="querymodel-idxpath">
1833     <title>&zebra; special &acro.idxpath; Attribute Set for &acro.grs1; indexing</title>
1834     <para>
1835      The attribute-set <literal>idxpath</literal> consists of a single
1836      Use (type 1) attribute. All non-use attributes behave as normal.
1837     </para>
1838     <para>
1839      This feature is enabled when defining the
1840      <literal>xpath enable</literal> option in the &acro.grs1; filter
1841      <filename>*.abs</filename> configuration files. If one wants to use
1842      the special <literal>idxpath</literal> numeric attribute set, the
1843      main &zebra; configuration file <filename>zebra.cfg</filename>
1844      directive <literal>attset: idxpath.att</literal> must be enabled.
1845     </para>
1846     <warning>
1847      <para>
1848       The <literal>idxpath</literal> is deprecated, may not be
1849       supported in future &zebra; versions, and should definitely
1850       not be used in production code.
1851      </para>
1852     </warning>
1853
1854     <section id="querymodel-idxpath-use">
1855     <title>&acro.idxpath; Use Attributes (type = 1)</title>
1856      <para>
1857       This attribute set allows one to search &acro.grs1; filter indexed
1858       records by &acro.xpath; like structured index names.
1859      </para>
1860
1861      <warning>
1862       <para>
1863        The <literal>idxpath</literal> option defines hard-coded
1864        index names, which might clash with your own index names.
1865       </para>
1866      </warning>
1867
1868      <table id="querymodel-idxpath-use-table" frame="top">
1869       <title>&zebra; specific &acro.idxpath; Use Attributes (type 1)</title>
1870       <tgroup cols="4">
1871        <thead>
1872         <row>
1873          <entry>&acro.idxpath;</entry>
1874          <entry>Value</entry>
1875          <entry>String Index</entry>
1876          <entry>Notes</entry>
1877         </row>
1878        </thead>
1879        <tbody>
1880         <row>
1881          <entry>&acro.xpath; Begin</entry>
1882          <entry>1</entry>
1883          <entry>_XPATH_BEGIN</entry>
1884          <entry>deprecated</entry>
1885         </row>
1886         <row>
1887          <entry>&acro.xpath; End</entry>
1888          <entry>2</entry>
1889          <entry>_XPATH_END</entry>
1890          <entry>deprecated</entry>
1891         </row>
1892         <row>
1893          <entry>&acro.xpath; CData</entry>
1894          <entry>1016</entry>
1895          <entry>_XPATH_CDATA</entry>
1896          <entry>deprecated</entry>
1897         </row>
1898         <row>
1899          <entry>&acro.xpath; Attribute Name</entry>
1900          <entry>3</entry>
1901          <entry>_XPATH_ATTR_NAME</entry>
1902          <entry>deprecated</entry>
1903         </row>
1904         <row>
1905          <entry>&acro.xpath; Attribute CData</entry>
1906          <entry>1015</entry>
1907          <entry>_XPATH_ATTR_CDATA</entry>
1908          <entry>deprecated</entry>
1909         </row>
1910        </tbody>
1911       </tgroup>
1912      </table>
1913
1914      <para>
1915       See <filename>tab/idxpath.att</filename> for more information.
1916      </para>
1917      <para>
1918       Search for all documents starting with root element
1919       <literal>/root</literal> (either using the numeric or the string
1920       use attributes):
1921       <screen>
1922        Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/
1923        Z> find @attr idxpath 1=1 @attr 4=3 root/
1924        Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/
1925       </screen>
1926      </para>
1927      <para>
1928       Search for all documents where specific nested &acro.xpath;
1929       <literal>/c1/c2/../cn</literal> exists. Notice the very
1930       counter-intuitive <emphasis>reverse</emphasis> notation!
1931       <screen>
1932        Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/
1933        Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/
1934       </screen>
1935      </para>
1936      <para>
1937       Search for CDATA string <emphasis>text</emphasis> in any element:
1938       <screen>
1939        Z> find @attrset idxpath @attr 1=1016 text
1940        Z> find @attr 1=_XPATH_CDATA text
1941       </screen>
1942      </para>
1943      <para>
1944        Search for CDATA string <emphasis>anothertext</emphasis> in any
1945        attribute:
1946       <screen>
1947        Z> find @attrset idxpath @attr 1=1015 anothertext
1948        Z> find @attr 1=_XPATH_ATTR_CDATA anothertext
1949       </screen>
1950      </para>
1951      <para>
1952        Search for all documents which have an &acro.xml; element node
1953        including an &acro.xml; attribute named <emphasis>creator</emphasis>:
1954       <screen>
1955        Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
1956        Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
1957       </screen>
1958      </para>
1959      <para>
1960       Combining usual <literal>bib-1</literal> attribute set searches
1961       with <literal>idxpath</literal> attribute set searches:
1962       <screen>
1963        Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart
1964        Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart
1965       </screen>
1966      </para>
1967      <para>
1968       Scanning is supported on all <literal>idxpath</literal>
1969       indexes, both specified as numeric use attributes, or as string
1970       index names.
1971       <screen>
1972        Z> scan  @attrset idxpath @attr 1=1016 text
1973        Z> scan  @attr 1=_XPATH_ATTR_CDATA anothertext
1974        Z> scan  @attrset idxpath @attr 1=3 @attr 4=3 ''
1975       </screen>
1976      </para>
1977
1978     </section>
1979    </section>
1980
1981
1982    <section id="querymodel-pqf-apt-mapping">
1983     <title>Mapping from &acro.pqf; atomic &acro.apt; queries to &zebra; internal
1984      register indexes</title>
1985     <para>
1986      The rules for &acro.pqf; &acro.apt; mapping are rather tricky to grasp.
1987      We deal first with the rules for deciding which
1988      internal register or string index to use, according to the use
1989      attribute or access point specified in the query. Thereafter we
1990      deal with the rules for determining the correct structure type of
1991      the named register.
1992     </para>
1993
1994    <section id="querymodel-pqf-apt-mapping-accesspoint">
1995     <title>Mapping of &acro.pqf; &acro.apt; access points</title>
1996     <para>
1997       &zebra; understands four fundamentally different types of access
1998       points, of which only the
1999       <emphasis>numeric use attribute</emphasis> type access points
2000       are defined by the  <ulink url="&url.z39.50;">&acro.z3950;</ulink>
2001       standard.
2002       All other access point types are &zebra; specific, and non-portable.
2003     </para>
2004
2005      <table id="querymodel-zebra-mapping-accesspoint-types" frame="top">
2006       <title>Access point name mapping</title>
2007       <tgroup cols="4">
2008        <thead>
2009         <row>
2010          <entry>Access Point</entry>
2011          <entry>Type</entry>
2012          <entry>Grammar</entry>
2013          <entry>Notes</entry>
2014         </row>
2015       </thead>
2016       <tbody>
2017        <row>
2018         <entry>Use attribute</entry>
2019         <entry>numeric</entry>
2020         <entry>[1-9][0-9]*</entry>
2021         <entry>directly mapped to string index name</entry>
2022        </row>
2023        <row>
2024         <entry>String index name</entry>
2025         <entry>string</entry>
2026         <entry>[a-zA-Z](\-?[a-zA-Z0-9])*</entry>
2027         <entry>normalized name is used as internal string index name</entry>
2028        </row>
2029        <row>
2030         <entry>&zebra; internal index name</entry>
2031         <entry>zebra</entry>
2032         <entry>_[a-zA-Z](_?[a-zA-Z0-9])*</entry>
2033         <entry>hardwired internal string index name</entry>
2034        </row>
2035        <row>
2036         <entry>&acro.xpath; special index</entry>
2037         <entry>XPath</entry>
2038         <entry>/.*</entry>
2039         <entry>special xpath search for &acro.grs1; indexed records</entry>
2040        </row>
2041        </tbody>
2042       </tgroup>
2043      </table>
2044
2045      <para>
2046       <literal>Attribute set names</literal> and
2047       <literal>string index names</literal> are normalized
2048       according to the following rules: all <emphasis>single</emphasis>
2049       hyphens <literal>'-'</literal> are stripped, and all upper case
2050       letters are folded to lower case.
2051      </para>
2052
2053      <para>
2054       <emphasis>Numeric use attributes</emphasis> are mapped
2055       to the &zebra; internal
2056       string index according to the attribute set definition in use.
2057       The default attribute set is <literal>&acro.bib1;</literal>, and may be
2058       omitted in the &acro.pqf; query.
2059      </para>
2060
2061      <para>
2062       Given the name normalization and numeric
2063       use attribute mapping described above, the following
2064       &acro.pqf; queries are considered equivalent (assuming the default
2065       configuration has not been altered):
2066       <screen>
2067       Z> find  @attr 1=Body-of-text serenade
2068       Z> find  @attr 1=bodyoftext serenade
2069       Z> find  @attr 1=BodyOfText serenade
2070       Z> find  @attr 1=bO-d-Y-of-tE-x-t serenade
2071       Z> find  @attr 1=1010 serenade
2072       Z> find  @attrset &acro.bib1; @attr 1=1010 serenade
2073       Z> find  @attrset bib1 @attr 1=1010 serenade
2074       Z> find  @attrset Bib1 @attr 1=1010 serenade
2075       Z> find  @attrset b-I-b-1 @attr 1=1010 serenade
2076      </screen>
2077     </para>
2078
2079     <para>
2080       The <emphasis>numerical</emphasis>
2081       <literal>use attributes (type 1)</literal>
2082       are interpreted according to the
2083       attribute sets which have been loaded in the
2084       <literal>zebra.cfg</literal> file, and are matched against specific
2085       fields as specified in the <literal>.abs</literal> file which
2086       describes the profile of the records which have been loaded.
2087       If no use attribute is provided, a default of
2088       &acro.bib1; Use Any (1016) is assumed.
2089       The predefined use attribute sets
2090       can be reconfigured by  tweaking the configuration files
2091       <filename>tab/*.att</filename>, and
2092       new attribute sets can be defined by adding similar files in the
2093       configuration path <literal>profilePath</literal> of the server.
2094     </para>
2095
2096      <para>
2097       String indexes can be accessed directly,
2098       independently of which attribute set is in use; the attribute set is
2099       simply ignored. The above mentioned name normalization applies.
2100       String index names are defined in the
2101       indexing filter configuration files, for example in the
2102       <literal>&acro.grs1;</literal>
2103       <filename>*.abs</filename> configuration files, or in the
2104       <literal>alvis</literal> filter &acro.xslt; indexing stylesheets.
2105      </para>
2106
2107      <para>
2108       &zebra; internal indexes can be accessed directly,
2109       according to the same rules as the user defined
2110       string indexes. The only difference is that
2111       &zebra; internal index names are hardwired,
2112       all uppercase and
2113       must start with the character <literal>'_'</literal>.
2114      </para>
2115
2116      <para>
2117       Finally, <literal>&acro.xpath;</literal> access points are only
2118       available using the <literal>&acro.grs1;</literal> filter for indexing.
2119       These access point names must start with the character
2120       <literal>'/'</literal>, they are <emphasis>not
2121       normalized</emphasis>, but passed unaltered to the &zebra; internal
2122       &acro.xpath; engine. See <xref linkend="querymodel-use-xpath"/>.
2123
2124      </para>
2125
2126
2127     </section>
2128
2129
2130    <section id="querymodel-pqf-apt-mapping-structuretype">
2131      <title>Mapping of &acro.pqf; &acro.apt; structure and completeness to
2132       register type</title>
2133     <para>
2134       Internally &zebra; has in its default configuration several
2135      different types of registers or indexes, whose tokenization and
2136       character normalization rules differ. This reflects the fact that
2137       searching fundamentally different tokens like dates, numbers,
2138       bitfields and string based text needs different rule sets.
2139      </para>
2140
2141      <table id="querymodel-zebra-mapping-structure-types" frame="top">
2142       <title>Structure and completeness mapping to register types</title>
2143       <tgroup cols="4">
2144        <thead>
2145         <row>
2146          <entry>Structure</entry>
2147          <entry>Completeness</entry>
2148          <entry>Register type</entry>
2149          <entry>Notes</entry>
2150         </row>
2151        </thead>
2152        <tbody>
2153         <row>
2154          <entry>
2155           phrase (@attr 4=1), word (@attr 4=2),
2156           word-list (@attr 4=6),
2157           free-form-text  (@attr 4=105), or document-text (@attr 4=106)
2158          </entry>
2159          <entry>Incomplete field (@attr 6=1)</entry>
2160          <entry>Word ('w')</entry>
2161          <entry>Traditional tokenized and character normalized word index</entry>
2162         </row>
2163         <row>
2164          <entry>
2165           phrase (@attr 4=1), word (@attr 4=2),
2166           word-list (@attr 4=6),
2167           free-form-text  (@attr 4=105), or document-text (@attr 4=106)
2168          </entry>
2169          <entry>Complete field (@attr 6=3)</entry>
2170          <entry>Phrase ('p')</entry>
2171          <entry>Character normalized, but not tokenized index for phrase
2172           matches
2173          </entry>
2174         </row>
2175         <row>
2176          <entry>urx (@attr 4=104)</entry>
2177          <entry>ignored</entry>
2178          <entry>URX/URL ('u')</entry>
2179          <entry>Special index for URL web addresses</entry>
2180         </row>
2181         <row>
2182          <entry>numeric (@attr 4=109)</entry>
2183          <entry>ignored</entry>
2184          <entry>Numeric ('n')</entry>
2185          <entry>Special index for digital numbers</entry>
2186         </row>
2187         <row>
2188          <entry>key (@attr 4=3)</entry>
2189          <entry>ignored</entry>
2190          <entry>Null bitmap ('0')</entry>
2191          <entry>Used for non-tokenized and non-normalized bit sequences</entry>
2192         </row>
2193         <row>
2194          <entry>year (@attr 4=4)</entry>
2195          <entry>ignored</entry>
2196          <entry>Year ('y')</entry>
2197          <entry>Non-tokenized and non-normalized 4 digit numbers</entry>
2198         </row>
2199         <row>
2200          <entry>date (@attr 4=5)</entry>
2201          <entry>ignored</entry>
2202          <entry>Date ('d')</entry>
2203          <entry>Non-tokenized and non-normalized ISO date strings</entry>
2204         </row>
2205         <row>
2206          <entry>ignored</entry>
2207          <entry>ignored</entry>
2208          <entry>Sort ('s')</entry>
2209          <entry>Used with special sort attribute set (@attr 7=1, @attr 7=2)</entry>
2210         </row>
2211         <row>
2212          <entry>overruled</entry>
2213          <entry>overruled</entry>
2214          <entry>special</entry>
2215          <entry>Internal record ID register, used whenever
2216           Relation Always Matches (@attr 2=103) is specified</entry>
2217         </row>
2218        </tbody>
2219       </tgroup>
2220      </table>
2221
2222      <!-- see in util/zebramap.c -->
2223
2224     <para>
2225      If a <emphasis>Structure</emphasis> attribute of
2226      <emphasis>Phrase</emphasis> is used in conjunction with a
2227      <emphasis>Completeness</emphasis> attribute of
2228      <emphasis>Complete (Sub)field</emphasis>, the term is matched
2229      against the contents of the phrase (long word) register, if one
2230      exists for the given <emphasis>Use</emphasis> attribute.
2231      A phrase register is created for those fields in the
2232      &acro.grs1; <filename>*.abs</filename> file that contain a
2233      <literal>p</literal>-specifier.
2234       <screen>
2235        Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
2236        ...
2237        bayreuther festspiele (1)
2238        * beethoven bibliography database (1)
2239        benny carter (1)
2240        ...
2241        Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography"
2242        ...
2243        Number of hits: 0, setno 5
2244        ...
2245        Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database"
2246        ...
2247        Number of hits: 1, setno 6
2248        </screen>
2249     </para>
2250
2251     <para>
2252      If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
2253      used in conjunction with <emphasis>Incomplete Field</emphasis> - the
2254      default value for <emphasis>Completeness</emphasis>, the
2255      search is directed against the normal word registers, but if the term
2256      contains multiple words, the term will only match if all of the words
2257      are found immediately adjacent, and in the given order.
2258      The word search is performed on those fields that are indexed as
2259      type <literal>w</literal> in the &acro.grs1; <filename>*.abs</filename> file.
2260       <screen>
2261        Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
2262        ...
2263          beefheart (1)
2264        * beethoven (18)
2265          beethovens (7)
2266        ...
2267        Z> find @attr 1=Title @attr 4=1 @attr 6=1 beethoven
2268        ...
2269        Number of hits: 18, setno 1
2270        ...
2271        Z> find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven  bibliography"
2272        ...
2273        Number of hits: 2, setno 2
2274        ...
2275      </screen>
2276     </para>
2277
2278     <para>
2279      If the <emphasis>Structure</emphasis> attribute is
2280      <emphasis>Word List</emphasis>,
2281      <emphasis>Free-form Text</emphasis>, or
2282      <emphasis>Document Text</emphasis>, the term is treated as a
2283      natural-language, relevance-ranked query.
2284      This search type uses the word register, i.e. those fields
2285      that are indexed as type <literal>w</literal> in the
2286      &acro.grs1; <filename>*.abs</filename> file.
2287     </para>
2288
2289     <para>
2290      If the <emphasis>Structure</emphasis> attribute is
2291      <emphasis>Numeric String</emphasis> the term is treated as an integer.
2292      The search is performed on those fields that are indexed
2293      as type <literal>n</literal> in the &acro.grs1;
2294       <filename>*.abs</filename> file.
2295     </para>
2296
2297     <para>
2298      If the <emphasis>Structure</emphasis> attribute is
2299      <emphasis>URX</emphasis> the term is treated as a URX (URL) entity.
2300      The search is performed on those fields that are indexed as type
2301      <literal>u</literal> in the <filename>*.abs</filename> file.
2302     </para>
2303
2304     <para>
2305      If the <emphasis>Structure</emphasis> attribute is
2306      <emphasis>Local Number</emphasis> the term is treated as a
2307      native &zebra; Record Identifier.
2308     </para>
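         <para>
          For instance, a lookup of a single record by its internal
          identifier could look like the following sketch. Local Number is
          structure value 107 in the &acro.bib1; attribute set; the
          identifier <literal>10</literal> is a hypothetical value chosen
          for illustration.
          <screen>
           Z> find @attr 4=107 10
          </screen>
         </para>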
2309
2310     <para>
2311      If the <emphasis>Relation</emphasis> attribute is
2312      <emphasis>Equals</emphasis> (default), the term is matched
2313      in a normal fashion (modulo truncation and processing of
2314      individual words, if required).
2315      If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
2316      <emphasis>Less Than or Equal</emphasis>,
2317      <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
2318       Equal</emphasis>, the term is assumed to be numerical, and a
2319      standard regular expression is constructed to match the given
2320      expression.
2321      If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
2322      the standard natural-language query processor is invoked.
2323     </para>
2324
2325     <para>
2326      For the <emphasis>Truncation</emphasis> attribute,
2327      <emphasis>No Truncation</emphasis> is the default.
2328      <emphasis>Left Truncation</emphasis> is not supported.
2329      <emphasis>Process # in search term</emphasis> is supported, as is
2330      <emphasis>Regxp-1</emphasis>.
2331      <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
2332      search. As a default, a single error (deletion, insertion,
2333      replacement) is accepted when terms are matched against the register
2334      contents.
2335     </para>
2336
2337      </section>
2338    </section>
2339
2340    <section  id="querymodel-regular">
2341     <title>&zebra; Regular Expressions in Truncation Attribute (type = 5)</title>
2342
2343     <para>
2344      Each term in a query is interpreted as a regular expression if
2345      the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
2346      or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
2347      Both query types follow the same syntax with the operands:
2348     </para>
2349
2350     <table id="querymodel-regular-operands-table" frame="top">
2351      <title>Regular Expression Operands</title>
2352      <tgroup cols="2">
2353       <tbody>
2354        <row>
2355         <entry><literal>x</literal></entry>
2356         <entry>Matches the character <literal>x</literal>.</entry>
2357        </row>
2358        <row>
2359         <entry><literal>.</literal></entry>
2360         <entry>Matches any character.</entry>
2361        </row>
2362        <row>
2363         <entry><literal>[ .. ]</literal></entry>
2364         <entry>Matches the set of characters specified;
2365          such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</entry>
2366        </row>
2367       </tbody>
2368      </tgroup>
2369     </table>
2370
2371     <para>
2372      The above operands can be combined with the following operators:
2373     </para>
2374
2375     <table id="querymodel-regular-operators-table" frame="top">
2376      <title>Regular Expression Operators</title>
2377      <tgroup cols="2">
2378       <tbody>
2379        <row>
2380         <entry><literal>x*</literal></entry>
2381         <entry>Matches <literal>x</literal> zero or more times.
2382          Priority: high.</entry>
2383        </row>
2384        <row>
2385         <entry><literal>x+</literal></entry>
2386         <entry>Matches <literal>x</literal> one or more times.
2387          Priority: high.</entry>
2388        </row>
2389        <row>
2390         <entry><literal>x?</literal></entry>
2391         <entry>Matches <literal>x</literal> zero times or once.
2392          Priority: high.</entry>
2393        </row>
2394        <row>
2395         <entry><literal>xy</literal></entry>
2396         <entry> Matches <literal>x</literal>, then <literal>y</literal>.
2397          Priority: medium.</entry>
2398        </row>
2399        <row>
2400         <entry><literal>x|y</literal></entry>
2401         <entry> Matches either <literal>x</literal> or <literal>y</literal>.
2402          Priority: low.</entry>
2403        </row>
2404        <row>
2405         <entry><literal>( )</literal></entry>
2406         <entry>The order of evaluation may be changed by using parentheses.</entry>
2407        </row>
2408       </tbody>
2409       </tgroup>
2410     </table>
2411
2412     <para>
2413      If the first character of the <literal>Regxp-2</literal> query
2414      is a plus character (<literal>+</literal>) it marks the
2415      beginning of a section with non-standard specifiers.
2416      The next plus character marks the end of the section.
2417      Currently &zebra; only supports one specifier, the error tolerance,
2418      which consists of one digit.
2419      <!-- TODO Nice thing, but what does
2420      that error tolerance digit *mean*? Maybe an example would be nice? -->
2421     </para>
2422
2423     <para>
2424      Since the plus operator is normally a suffix operator, the addition to
2425      the query syntax doesn't violate the syntax for standard regular
2426      expressions.
2427     </para>
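         <para>
          As an illustrative sketch only: assuming the digit gives the
          number of errors (deletions, insertions, replacements) tolerated,
          a fuzzy search in the title register that accepts up to two
          errors might be written as shown here. The term
          <literal>mozart</literal> and the tolerance <literal>2</literal>
          are arbitrary choices for the example.
          <screen>
           Z> find @attr 1=4 @attr 5=103 +2+mozart
          </screen>
         </para>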
2428
2429     <para>
2430      For example, a phrase search with regular expressions  in
2431      the title-register is performed like this:
2432      <screen>
2433       Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
2434      </screen>
2435     </para>
2436
2437     <para>
2438      Combinations with other attributes are possible. For example, a
2439      ranked search with a regular expression:
2440      <screen>
2441       Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
2442      </screen>
2443     </para>
2444    </section>
2445
2446
2447    <!--
2448    <para>
2449     The RecordType parameter in the <literal>zebra.cfg</literal> file, or
2450     the <literal>-t</literal> option to the indexer tells &zebra; how to
2451     process input records.
2452     Two basic types of processing are available - raw text and structured
2453     data. Raw text is just that, and it is selected by providing the
2454     argument <literal>text</literal> to &zebra;. Structured records are
2455     all handled internally using the basic mechanisms described in the
2456     subsequent sections.
2457     &zebra; can read structured records in many different formats.
2458    </para>
2459    -->
2460   </section>
2461
2462
2463   <section id="querymodel-cql-to-pqf">
2464    <title>Server Side &acro.cql; to &acro.pqf; Query Translation</title>
2465    <para>
2466     Using the
2467     <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
2468     &yaz; Frontend Virtual
2469     Hosts option, one can configure
2470     the &yaz; Frontend &acro.cql;-to-&acro.pqf;
2471     converter, specifying the interpretation of various
2472     <ulink url="&url.cql;">&acro.cql;</ulink>
2473     indexes, relations, etc. in terms of Type-1 query attributes.
2474     <!-- The  yaz-client config file -->
2475    </para>
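        <para>
         As a minimal sketch, a &yaz; Frontend Virtual Hosts configuration
         file that enables the converter might look as follows. The
         listener address, the <literal>id</literal> values, and the file
         names <literal>zebra.cfg</literal> and
         <literal>cql2pqf.txt</literal> are assumptions made for
         illustration; adapt them to the local installation.
         <screen>
         <![CDATA[
          <yazgfs>
           <!-- hypothetical listener on port 9999 -->
           <listen id="public">tcp:@:9999</listen>
           <server id="main" listenref="public">
            <!-- Zebra configuration used by this virtual host -->
            <config>zebra.cfg</config>
            <!-- CQL-to-PQF conversion rules, file name assumed -->
            <cql2rpn>cql2pqf.txt</cql2rpn>
           </server>
          </yazgfs>
          ]]>
         </screen>
        </para>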
2476    <para>
2477     For example, using server-side &acro.cql;-to-&acro.pqf; conversion, one might
2478     query a &zebra; server like this:
2479     <screen>
2480     <![CDATA[
2481      yaz-client localhost:9999
2482      Z> querytype cql
2483      Z> find text=(plant and soil)
2484      ]]>
2485     </screen>
2486      If the server is properly configured, even static relevance ranking can
2487      be requested using &acro.cql; query syntax:
2488     <screen>
2489     <![CDATA[
2490      Z> find text = /relevant (plant and soil)
2491      ]]>
2492      </screen>
2493    </para>
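        <para>
         How these queries are translated depends on the conversion file.
         The fragment below is a sketch of entries that could support the
         two searches above. The context set URI, the
         <literal>text</literal> index mapping to Bib-1 Use attribute 1010,
         and the remaining attribute values are assumptions modelled on
         common defaults, not a configuration taken from this manual.
         <screen>
         <![CDATA[
          # context set prefix used by the index definitions (URI assumed)
          set.cql = info:srw/cql-context-set/1/cql-v1.1

          # hypothetical index: "text" maps to Bib-1 Use 1010 (Body of text)
          index.cql.text = 1=1010

          # plain "=" relation, and the /relevant relation modifier
          relation.eq = 2=3
          relation.scr = 2=3
          relationModifier.relevant = 2=102

          # defaults applied to search terms
          position.any = 3=3 6=1
          structure.* = 4=1
          always = 6=1
          ]]>
         </screen>
        </para>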
2494
2495    <para>
2496     The same configuration can also be used to
2497     search using client-side &acro.cql;-to-&acro.pqf; conversion.
2498     The only differences are <literal>querytype cql2rpn</literal>
2499     instead of
2500     <literal>querytype cql</literal>, and the <literal>-q</literal> option,
2501     which names a local conversion file:
2502     <screen>
2503     <![CDATA[
2504      yaz-client -q local/cql2pqf.txt localhost:9999
2505      Z> querytype cql2rpn
2506      Z> find text=(plant and soil)
2507      ]]>
2508      </screen>
2509    </para>
2510
2511    <para>
2512     Exhaustive information can be found in the section
2513     <ulink url="&url.yaz.cql2pqf;">&acro.cql; to &acro.rpn; conversion</ulink>
2514     in the &yaz; manual.
2515    </para>
2516   <!--
2517   <para>
2518     See
2519    <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html"/>
2520    for the Maintenance Agency's work-in-progress mapping of Dublin Core
2521     indexes to Attribute Architecture (util, XD and BIB-2)
2522    attributes.
2523   </para>
2524    -->
2525  </section>
2526
2527 </chapter>
2528
2529  <!-- Keep this comment at the end of the file
2530  Local variables:
2531  mode: sgml
2532  sgml-omittag:t
2533  sgml-shorttag:t
2534  sgml-minimize-attributes:nil
2535  sgml-always-quote-attributes:t
2536  sgml-indent-step:1
2537  sgml-indent-data:t
2538  sgml-parent-document: "zebra.xml"
2539  sgml-local-catalogs: nil
2540  sgml-namecase-general:t
2541  End:
2542  -->