-<chapter id="querymodel">
- <!-- $Id: querymodel.xml,v 1.2 2006-06-13 13:45:08 marc Exp $ -->
- <title>Query Model</title>
+ <chapter id="querymodel">
+ <!-- $Id: querymodel.xml,v 1.30 2007-02-02 11:10:08 marc Exp $ -->
+ <title>Query Model</title>
+
+ <section id="querymodel-overview">
+ <title>Query Model Overview</title>
+
+ <section id="querymodel-query-languages">
+ <title>Query Languages</title>
+
+ <para>
+ &zebra; was born as a networked Information Retrieval engine adhering
+ to the international standards
+ <ulink url="&url.z39.50;">&z3950;</ulink> and
+ <ulink url="&url.sru;">&sru;</ulink>,
+ and implements the
+ type-1 Reverse Polish Notation (&rpn;) query
+ model defined there.
+ Unfortunately, this model has only defined a binary
+ encoded representation, which is used as transport packaging in
+ the &z3950; protocol layer. This representation is not
+ human-readable, nor does it define any convenient way to specify queries.
+ </para>
+ <para>
+ Since the type-1 (&rpn;)
+ query structure has no direct, useful string
+ representation, every client application needs to provide some
+ form of mapping from a local query notation or representation to it.
+ </para>
+
+
+ <section id="querymodel-query-languages-pqf">
+ <title>Prefix Query Format (&pqf;)</title>
+ <para>
+ Index Data has defined a textual representation in the
+ <ulink url="&url.yaz.pqf;">Prefix Query Format</ulink>, or
+ <emphasis>&pqf;</emphasis> for short, which maps
+ one-to-one to binary encoded
+ <emphasis>type-1 &rpn;</emphasis> queries.
+ &pqf; has been adopted by other
+ parties developing &z3950; software, and is often referred to as
+ <emphasis>Prefix Query Notation</emphasis>, or in short
+ &pqn;. See
+ <xref linkend="querymodel-rpn"/> for further explanations and
+ descriptions of &zebra;'s capabilities.
+ </para>
+ </section>
+
+ <section id="querymodel-query-languages-cql">
+ <title>Common Query Language (&cql;)</title>
+ <para>
+ The query model of the type-1 &rpn;,
+ expressed in &pqf;/&pqn;, is natively supported.
+ On the other hand, the default &sru;
+ web services <emphasis>Common Query Language</emphasis>
+ <ulink url="&url.cql;">&cql;</ulink> is not natively supported.
+ </para>
+ <para>
+ &zebra; can be configured to understand and map &cql; to &pqf;. See
+ <xref linkend="querymodel-cql-to-pqf"/>.
+ </para>
+ </section>
- <sect1 id="querymodel-overview">
- <title>Query Model Overview</title>
+ </section>
- <para>
- Zebra is born as a networking Information Retrieval engine adhering
- to the international standards
- <ulink url="&url.z39.50;">Z39.50</ulink> and
- <ulink url="&url.sru;">SRU</ulink>,
- and implement the query model defined there.
- Unfortunately, the Z39.50 query model has only defined a binary
- encoded representation, which is used as transport packaging in
- the Z39.50 protocol layer. This representation is not human
- readable, nor defines any convenient way to specify queries.
- </para>
- <para>
- Therefore, Index Data has defined a textual representaion in the
- <literal>Prefix Query Format</literal>, short
- <literal>PQF</literal>, which then has been adopted by other
- parties developing Z39.50 software. It is also often referred to as
- <literal>Prefix Query Notation</literal>, or in short
- <literal>PQN</literal>, and is thoroughly explained in
- <xref linkend="querymodel-pqf"/>.
- </para>
+ <section id="querymodel-operation-types">
+ <title>Operation types</title>
+ <para>
+ &zebra; supports all three
+ &z3950;/&sru; operations defined in the
+ standards: explain, search,
+ and scan. A short description of the
+ functionality and purpose of each is quite in order here.
+ </para>
- <para>
- In addition, Zebra can be configured to understand and map the
- <literal>Common Query Language</literal>
- (<ulink url="&url.cql;">CQL</ulink>)
- to PQF. See an introduction on the mapping to the internal query
- representation in
- <xref linkend="querymodel-cql-to-pqf"/>.
- </para>
- </sect1>
+ <section id="querymodel-operation-type-explain">
+ <title>Explain Operation</title>
+ <para>
+ The <emphasis>syntax</emphasis> of &z3950;/&sru; queries is
+ well known to any client, but the specific
+ <emphasis>semantics</emphasis> - taking into account a
+ particular server's functionalities and abilities - must be
+ discovered from case to case. Enter the
+ explain operation, which provides the means for learning which
+ <emphasis>fields</emphasis> (also called
+ <emphasis>indexes</emphasis> or <emphasis>access points</emphasis>)
+ are provided, which default parameters the server uses, which
+ document retrieval formats are defined, and which specific parts
+ of the general query model are supported.
+ </para>
+ <para>
+ &z3950; embeds the explain operation
+ as a
+ search in the magic
+ <literal>IR-Explain-1</literal> database;
+ see <xref linkend="querymodel-exp1"/>.
+ </para>
+ <para>
+ In &sru;, explain is an entirely separate
+ operation, which returns a ZeeRex &xml; record according to the
+ structure defined by the protocol.
+ </para>
+ <para>
+ In both cases, the information gathered through
+ explain operations can be used to
+ auto-configure a client user interface to the server's
+ capabilities.
+ </para>
+ </section>
- <sect1 id="querymodel-pqf">
- <title>Prefix Query Format structure and syntax</title>
- <para>
- The
- <ulink url="&url.yaz.pqf;">PQF
- grammer</ulink> is documented in the YAZ manual, and shall not be
- repeated here.
- This textual PQF representation
- is always during search mapped to the equivalent Zebra internal
- query parse tree.
- </para>
+ <section id="querymodel-operation-type-search">
+ <title>Search Operation</title>
+ <para>
+ Search and retrieve interactions are the raison d'être.
+ They are used to query the remote database and
+ return search result documents. Search queries span from
+ simple free text searches to nested complex boolean queries,
+ targeting specific indexes, and possibly enhanced with many
+ query semantic specifications. Search interactions are the heart
+ and soul of &z3950;/&sru; servers.
+ </para>
+ </section>
+
+ <section id="querymodel-operation-type-scan">
+ <title>Scan Operation</title>
+ <para>
+ The scan operation is a helper functionality,
+ which operates on one index or access point at a time.
+ </para>
+ <para>
+ It provides
+ the means to investigate the content of specific indexes.
+ Scanning an index returns a handful of terms actually found in
+ the indexes, and in addition the scan
+ operation returns the number of documents indexed by each term.
+ A search client can use this information to propose proper
+ spelling of search terms, to auto-fill search boxes, or to
+ display controlled vocabularies.
+ </para>
+ </section>
+
+ </section>
+
+ </section>
- <sect2 id="querymodel-pqf-tree">
- <title>PQF tree structure</title>
+
+ <section id="querymodel-rpn">
+ <title>&rpn; queries and semantics</title>
<para>
- The PQF parse tree - or the equivalent textual representation -
- may start with one specification of the
- <emphasis>attribute set</emphasis> used. Following is a query
- tree, which
- consists of <emphasis>atomic query parts</emphasis>, eventually
- paired by <emphasis>boolean binary operators</emphasis>, and
- finally <emphasis>recursively combined </emphasis> into
- complex query trees.
+ The <ulink url="&url.yaz.pqf;">&pqf; grammar</ulink>
+ is documented in the &yaz; manual, and shall not be
+ repeated here. This textual &pqf; representation
+ is not transmitted to &zebra; during search; instead, the
+ client maps it to the equivalent &z3950; binary
+ query parse tree.
</para>
-
- <sect3 id="querymodel-attribute-sets">
- <title>Attribute sets</title>
+
+ <section id="querymodel-rpn-tree">
+ <title>&rpn; tree structure</title>
<para>
+ The &rpn; parse tree - or the equivalent textual representation in &pqf; -
+ may start with one specification of the
+ <emphasis>attribute set</emphasis> used. It is followed by a query
+ tree, which
+ consists of <emphasis>atomic query parts (&apt;)</emphasis> or
+ <emphasis>named result sets</emphasis>, optionally
+ paired by <emphasis>boolean binary operators</emphasis>, and
+ <emphasis>recursively combined</emphasis> into
+ complex query trees.
+ </para>
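The tree structure described above can be sketched in a few lines of Python. This is only an illustrative model, not part of &zebra; or &yaz;: the class names `Apt`, `ResultSet`, and `Bool` and the helper `to_pqf` are hypothetical, the proximity operator is left out because it needs extra parameters, and terms containing double quotes are not handled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Apt:
    """Leaf node: an attribute list plus a term (hypothetical model)."""
    term: str
    attrs: Dict[int, str]

    def to_pqf(self) -> str:
        attr_part = [f"@attr {t}={v}" for t, v in sorted(self.attrs.items())]
        # Quote empty terms and term lists containing whitespace.
        quote = self.term == "" or any(ch.isspace() for ch in self.term)
        term = f'"{self.term}"' if quote else self.term
        return " ".join(attr_part + [term])


@dataclass
class ResultSet:
    """Leaf node: a reference to a named result set."""
    name: str

    def to_pqf(self) -> str:
        return f"@set {self.name}"


@dataclass
class Bool:
    """Internal node: a binary boolean operator over two subtrees."""
    op: str
    left: "Node"
    right: "Node"

    def to_pqf(self) -> str:
        if self.op not in ("and", "or", "not"):
            raise ValueError(f"unsupported operator: {self.op}")
        return f"@{self.op} {self.left.to_pqf()} {self.right.to_pqf()}"


Node = Union[Apt, ResultSet, Bool]


def to_pqf(root: Node, attrset: Optional[str] = None) -> str:
    """Serialize a query tree, optionally led by one attribute set."""
    prefix = f"@attrset {attrset} " if attrset else ""
    return prefix + root.to_pqf()


tree = Bool("and", ResultSet("1"), Apt("amadeus", {1: "4"}))
print(to_pqf(tree))
```

Leaves are always `Apt` or `ResultSet` objects and operators are always internal nodes, which mirrors the recursive definition given in the paragraph.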
+
+ <section id="querymodel-attribute-sets">
+ <title>Attribute sets</title>
+ <para>
Attribute sets define the exact meaning and semantics of queries
- issued. Zebra comes with some predefined attribute set
+ issued. &zebra; comes with some predefined attribute set
definitions; others can easily be defined and added to the
configuration.
- <note>
- The Zebra internal query procesing is modeled after
- the <literal>Bib1</literal> attribute set, and the non-use
- attributes type 2-9 are hard-wired in. It is therefore essential
- to be familiar with <xref linkend="querymodel-bib1"/>.
- </note>
- </para>
-
- <table id="querymodel-attribute-sets-table">
- <caption>Attribute sets predefined in Zebra</caption>
- <!--
- <thead>
- <tr><td>one</td><td>two</td></tr>
- </thead>
- -->
- <tbody>
- <tr>
- <td><emphasis>exp-1</emphasis></td>
- <td><literal>Explain</literal> attribute set</td>
- <td>Special attribute set used on the special automagic
- <literal>IR-Explain-1</literal> database to gain information on
- server capabilities, database names, and database
- and semantics.</td>
- </tr>
- <tr>
- <td><emphasis>bib-1</emphasis></td>
- <td><literal>Bib1</literal> attribute set</td>
- <td>Standard PQF query language attribute set which defines the
- semantics of Z39.50 searching. In addition, all of the
- non-use attributes (type 2-9) define the Zebra internal query
- processing</td>
- </tr>
- <tr>
- <td><emphasis>gils</emphasis></td>
- <td><literal>GILS</literal> attribute set</td>
- <td>Extention to the <literal>Bib1</literal> attribute set.</td>
- </tr>
- </tbody>
- </table>
- </sect3>
-
- <sect3 id="querymodel-boolean-operators">
- <title>Boolean operators</title>
- <para>
- A pair of subquery trees, or of atomic queries, is combined
+ </para>
+
+ <table id="querymodel-attribute-sets-table" frame="top">
+ <title>Attribute sets predefined in &zebra;</title>
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Attribute set</entry>
+ <entry>&pqf; notation (Short hand)</entry>
+ <entry>Notes</entry>
+ <entry>Status</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry>Explain</entry>
+ <entry><literal>exp-1</literal></entry>
+ <entry>Special attribute set used on the special automagic
+ <literal>IR-Explain-1</literal> database to gain information on
+ server capabilities, database names, and database
+ semantics.</entry>
+ <entry>predefined</entry>
+ </row>
+ <row>
+ <entry>&bib1;</entry>
+ <entry><literal>bib-1</literal></entry>
+ <entry>Standard &pqf; query language attribute set which defines the
+ semantics of &z3950; searching. In addition, all of the
+ non-use attributes (types 2-12) define the hard-wired
+ &zebra; internal query
+ processing.</entry>
+ <entry>default</entry>
+ </row>
+ <row>
+ <entry>GILS</entry>
+ <entry><literal>gils</literal></entry>
+ <entry>Extension to the &bib1; attribute set.</entry>
+ <entry>predefined</entry>
+ </row>
+ <!--
+ <row>
+ <entry>&idxpath;</entry>
+ <entry><literal>idxpath</literal></entry>
+ <entry>Hardwired &xpath; like attribute set, only available for
+ indexing with the &grs1; record model</entry>
+ <entry>deprecated</entry>
+ </row>
+ -->
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ The use attribute (type 1) mappings for the
+ predefined attribute sets are found in the
+ attribute set configuration files <filename>tab/*.att</filename>.
+ </para>
+
+ <note>
+ <para>
+ The &zebra; internal query processing is modeled after
+ the &bib1; attribute set, and the non-use
+ attributes of types 2-6 are hard-wired in. It is therefore essential
+ to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
+ </para>
+ </note>
+
+ </section>
+
+ <section id="querymodel-boolean-operators">
+ <title>Boolean operators</title>
+ <para>
+ A pair of sub query trees, or of atomic queries, is combined
using the standard boolean operators into new query trees.
- </para>
-
- <table id="querymodel-boolean-operators-table">
- <caption>Boolean operators</caption>
- <!--
- <thead>
- <tr><td>one</td><td>two</td></tr>
- </thead>
- -->
- <tbody>
- <tr><td><emphasis>@and</emphasis></td>
- <td>binary <literal>AND</literal> operator</td>
- <td>Set intersection of two atomic queries hit sets</td>
- </tr>
- <tr><td><emphasis>@or</emphasis></td>
- <td>binary <literal>OR</literal> operator</td>
- <td>Set union of two atomic queries hit sets</td>
- </tr>
- <tr><td><emphasis>@not</emphasis></td>
- <td>binary <literal>AND NOT</literal> operator</td>
- <td>Set complement of two atomic queries hit sets</td>
- </tr>
- <tr><td><emphasis>@prox</emphasis></td>
- <td>binary <literal>PROXIMY</literal> operator</td>
- <td>Set intersection of two atomic queries hit sets. In
- addition, the intersection set is purged for all
- documents which do not satisfy the requested query
- term proximity. Usually a proper subset of the AND
- operation.</td>
- </tr>
- </tbody>
- </table>
-
- <para>
+ Thus, boolean operators are always internal nodes in the query tree.
+ </para>
+
+ <table id="querymodel-boolean-operators-table" frame="top">
+ <title>Boolean operators</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Keyword</entry>
+ <entry>Operator</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row><entry><literal>@and</literal></entry>
+ <entry>binary AND operator</entry>
+ <entry>Set intersection of the hit sets of two atomic queries</entry>
+ </row>
+ <row><entry><literal>@or</literal></entry>
+ <entry>binary OR operator</entry>
+ <entry>Set union of the hit sets of two atomic queries</entry>
+ </row>
+ <row><entry><literal>@not</literal></entry>
+ <entry>binary AND NOT operator</entry>
+ <entry>Set difference of the hit sets of two atomic queries</entry>
+ </row>
+ <row><entry><literal>@prox</literal></entry>
+ <entry>binary PROXIMITY operator</entry>
+ <entry>Set intersection of the hit sets of two atomic queries. In
+ addition, all documents which do not satisfy the requested query
+ term proximity are purged from the intersection set. The result is
+ usually a proper subset of that of the AND
+ operation.</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
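The set semantics in the table can be illustrated with plain Python sets. The document ids and word positions here are made-up sample data, and the proximity test (terms at most three word positions apart, in any order) is just one assumed setting of the proximity operator, not a description of how &zebra; evaluates it internally.

```python
# Hypothetical hit sets: ids of documents matching each atomic query.
information = {1, 2, 3, 5}
retrieval = {2, 3, 4}

hits_or = information.union(retrieval)          # @or
hits_and = information.intersection(retrieval)  # @and
hits_not = information.difference(retrieval)    # @not (AND NOT)

# Hypothetical word positions of each term in the documents of the AND set.
positions = {
    2: {"information": [0], "retrieval": [1]},
    3: {"information": [0], "retrieval": [10]},
}


def near(doc_id, max_distance=3):
    """True if some pair of term occurrences is close enough."""
    doc = positions[doc_id]
    return any(
        abs(i - r) <= max_distance
        for i in doc["information"]
        for r in doc["retrieval"]
    )


# @prox: start from the AND set and purge documents failing the proximity test.
hits_prox = {doc_id for doc_id in hits_and if near(doc_id)}

print(sorted(hits_or), sorted(hits_and), sorted(hits_not), sorted(hits_prox))
```

With this sample data the proximity result is a proper subset of the AND result, which in turn is a subset of the OR result.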
+
+ <para>
For example, we can combine the terms
<emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
into different searches in the default index of the default
attribute set, as follows.
Querying for the union of all documents containing the
terms <emphasis>information</emphasis> OR
<emphasis>retrieval</emphasis>:
- <screen>
- @or information retrieval
- </screen>
- </para>
- <para>
+ <screen>
+ Z> find @or information retrieval
+ </screen>
+ </para>
+ <para>
Querying for the intersection of all documents containing the
terms <emphasis>information</emphasis> AND
<emphasis>retrieval</emphasis>:
- The hit set is a subset of the coresponding
+ The hit set is a subset of the corresponding
OR query.
- <screen>
- @and information retrieval
- </screen>
- </para>
- <para>
+ <screen>
+ Z> find @and information retrieval
+ </screen>
+ </para>
+ <para>
Querying for the intersection of all documents containing the
terms <emphasis>information</emphasis> AND
<emphasis>retrieval</emphasis>, taking proximity into account:
- The hit set is a subset of the coresponding
- AND query.
- <screen>
- @prox information retrieval
- </screen>
- </para>
- <para>
+ The hit set is a subset of the corresponding
+ AND query
+ (see the <ulink url="&url.yaz.pqf;">&pqf; grammar</ulink> for
+ details on the proximity operator):
+ <screen>
+ Z> find @prox 0 3 0 2 k 2 information retrieval
+ </screen>
+ </para>
+ <para>
Querying for the intersection of all documents containing the
terms <emphasis>information</emphasis> AND
<emphasis>retrieval</emphasis>, in the same order and near each
- other as described in the term list
- The hit set is a subset of the coresponding
- PROXIMY query.
- <screen>
- "information retrieval"
- </screen>
- </para>
- </sect3>
-
-
- <sect3 id="querymodel-atomic-queries">
- <title>Atomic queries</title>
- <para>
- Atomic queries are the query parts which work on one acess point
- only. These consist of <literal>an attribute list</literal>
- followed by a <literal>single term</literal> or a
- <literal>quoted term list</literal>.
- </para>
- <para>
- Unsupplied non-use attributes type 2-9 are either inherited from
- higher nodes in the query tree, or are set to Zebra's default values.
+ other as described in the term list.
+ The hit set is a subset of the corresponding
+ PROXIMITY query.
+ <screen>
+ Z> find "information retrieval"
+ </screen>
+ </para>
+ </section>
+
+
+ <section id="querymodel-atomic-queries">
+ <title>Atomic queries (&apt;)</title>
+ <para>
+ Atomic queries are the query parts which work on one access point
+ only. These consist of <emphasis>an attribute list</emphasis>
+ followed by a <emphasis>single term</emphasis> or a
+ <emphasis>quoted term list</emphasis>, and are often called
+ <emphasis>Attributes-Plus-Terms (&apt;)</emphasis> queries.
+ </para>
+ <para>
+ Atomic (&apt;) queries are always leaf nodes in the &pqf; query tree.
+ Unsupplied non-use attributes of types 2-12 are either inherited from
+ higher nodes in the query tree, or are set to &zebra;'s default values.
See <xref linkend="querymodel-bib1"/> for details.
- </para>
-
- <table id="querymodel-atomic-queries-table">
- <caption>Atomic queries</caption>
- <!--
- <thead>
- <tr><td>one</td><td>two</td></tr>
- </thead>
- -->
- <tbody>
- <tr><td><emphasis>attribute list</emphasis></td>
- <td>List of <literal>orthogonal</literal> attributes</td>
- <td>Any of the orthogonal attribute types may be omitted,
+ </para>
+
+ <table id="querymodel-atomic-queries-table" frame="top">
+ <title>Atomic queries (&apt;)</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><emphasis>attribute list</emphasis></entry>
+ <entry>List of <emphasis>orthogonal</emphasis> attributes</entry>
+ <entry>Any of the orthogonal attribute types may be omitted;
omitted types are inherited from higher query tree nodes, or if not
- inherited, are set to the default Zebra configuration values.
- </td>
- </tr>
- <tr><td><emphasis>term</emphasis></td>
- <td>single <literal>term</literal>
- or <literal>quoted term list</literal> </td>
- <td>Here the search terms or list of search terms is added
- to the query</td>
- </tr>
- </tbody>
- </table>
- <para>
+ inherited, are set to the default &zebra; configuration values.
+ </entry>
+ </row>
+ <row>
+ <entry><emphasis>term</emphasis></entry>
+ <entry>single <emphasis>term</emphasis>
+ or <emphasis>quoted term list</emphasis> </entry>
+ <entry>Here the search terms or list of search terms is added
+ to the query</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <para>
Querying for the term <emphasis>information</emphasis> in the
- default index using the default attribite set, the server choice
+ default index using the default attribute set, the server choice
of access point/index, and the default non-use attributes.
- <screen>
- "information"
- </screen>
- </para>
- <para>
- Equivalent query fully specified:
<screen>
- @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
+ Z> find information
</screen>
- </para>
-
- <para>
- Finding all documents which have empty titles. Notice that the
- empty term must be quoted, but is otherwise legal.
+ </para>
+ <para>
+ Equivalent query fully specified including all default values:
<screen>
- @attr 1=4 ""
+ Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information
</screen>
- </para>
+ </para>
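The way omitted attributes fall back to defaults can be sketched as a dictionary merge. The default values are copied from the fully specified query above; the function name `fully_specified` is hypothetical, and inheritance from higher nodes in the query tree is deliberately not modeled.

```python
# Default bib-1 attribute values, as listed in the fully specified query above.
DEFAULTS = {1: "1017", 2: "3", 3: "3", 4: "1", 5: "100", 6: "1"}


def fully_specified(term, attrs=None, attrset="bib-1"):
    """Return a PQF string with every default attribute spelled out.

    Attributes given by the caller override the defaults.
    """
    merged = {**DEFAULTS, **(attrs or {})}
    attr_part = " ".join(f"@attr {t}={v}" for t, v in sorted(merged.items()))
    return f"@attrset {attrset} {attr_part} {term}"


print(fully_specified("information"))
print(fully_specified("debussy", {1: "4"}))
```

The first call reproduces the fully specified query shown above; the second only replaces the use attribute and keeps the remaining defaults.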
- </sect3>
+ <para>
+ Finding all documents which have the term
+ <emphasis>debussy</emphasis> in the title field.
+ <screen>
+ Z> find @attr 1=4 debussy
+ </screen>
+ </para>
- <sect3 id="querymodel-use-string">
- <title>Zebra's special use attribute of type 'string'</title>
<para>
- The numeric <literal>use (type 1)</literal> attribute is usually
- refered to from a given
- attribute set. In addition, Zebra let you use
+ The <emphasis>scan</emphasis> operation is only supported with
+ atomic &apt; queries, as it is bound to one access point at a
+ time. Boolean query trees are not allowed during
+ <emphasis>scan</emphasis>.
+ </para>
+
+ <para>
+ For example, we might want to scan the title index, starting with
+ the term
+ <emphasis>debussy</emphasis>, and displaying this and the
+ following terms in lexicographic order:
+ <screen>
+ Z> scan @attr 1=4 debussy
+ </screen>
+ </para>
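What a scan returns can be modeled roughly as a walk over a sorted term list. This is a simplified sketch over a hypothetical in-memory index (a mapping from term to the set of document ids containing it); it ignores details of the real scan operation such as the position of the start term within the returned window.

```python
import bisect


def scan(index, start_term, count=5):
    """Return up to `count` (term, document count) pairs, in lexicographic
    order, starting at the first indexed term not less than `start_term`."""
    terms = sorted(index)
    pos = bisect.bisect_left(terms, start_term)
    return [(term, len(index[term])) for term in terms[pos:pos + count]]


# Hypothetical title index: term -> ids of documents containing it.
title_index = {
    "bach": {1},
    "debussy": {2, 3},
    "mozart": {4},
    "ravel": {5, 6, 7},
}

print(scan(title_index, "debussy", count=2))
```

Because the walk needs a single sorted term list, the sketch also shows why scan is bound to one access point at a time.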
+ </section>
+
+
+ <section id="querymodel-resultset">
+ <title>Named Result Sets</title>
+ <para>
+ Named result sets are supported in &zebra;, and result sets can be
+ used as operands without limitations. It follows that named
+ result sets are leaf nodes in the &pqf; query tree, exactly as
+ atomic &apt; queries are.
+ </para>
+ <para>
+ After the execution of a search, the result set is available at
+ the server, such that the client can use it for subsequent
+ searches or retrieval requests. The &z3950; standard actually
+ stresses the fact that result sets are volatile. A result set may cease
+ to exist at any point in time after the search, in which case the server will
+ send a diagnostic to the effect that the requested
+ result set does not exist any more.
+ </para>
+
+ <para>
+ Defining a named result set and re-using it in the next query,
+ using <application>yaz-client</application>. Notice that the client, not
+ the server, assigns the string '1' to the
+ named result set.
+ <screen>
+ Z> f @attr 1=4 mozart
+ ...
+ Number of hits: 43, setno 1
+ ...
+ Z> f @and @set 1 @attr 1=4 amadeus
+ ...
+ Number of hits: 14, setno 2
+ </screen>
+ </para>
+
+ <note>
+ <para>
+ Named result sets are only supported by the &z3950; protocol.
+ The &sru; web service is stateless, and therefore the notion of
+ named result sets does not exist when accessing a &zebra; server by
+ the &sru; protocol.
+ </para>
+ </note>
+ </section>
+
+ <section id="querymodel-use-string">
+ <title>&zebra;'s special access point of type 'string'</title>
+ <para>
+ The numeric <emphasis>use (type 1)</emphasis> attribute is usually
+ referred to from a given
+ attribute set. In addition, &zebra; lets you use
<emphasis>any internal index
- name defined in your configuration</emphasis>
- as use atribute value. This is a great feature for
+ name defined in your configuration</emphasis>
+ as the use attribute value. This is a great feature for
debugging, and when you do
- not need the complecity of defined use attribute values. It is
- the preferred way of accessing Zebra indexes directly.
+ not need the complexity of defined use attribute values. It is
+ the preferred way of accessing &zebra; indexes directly.
</para>
<para>
Finding all documents which have the term list "information
- retrieval" in an Zebra index, using it's internal full string name.
+ retrieval" in a &zebra; index, using its internal full string
+ name, and scanning the same index.
<screen>
- @attr 1=sometext "information retrieval"
+ Z> find @attr 1=sometext "information retrieval"
+ Z> scan @attr 1=sometext aterm
</screen>
- </para>
+ </para>
<para>
- Searching the bib-1 use attribute 54 using it's string name:
+ Searching or scanning
+ the bib-1 use attribute 54 using its string name:
<screen>
- @attr 1=Code-language eng
+ Z> find @attr 1=Code-language eng
+ Z> scan @attr 1=Code-language ""
</screen>
- </para>
+ </para>
<para>
- Searching in any silly string index - if it's defined in your
- indexation rules and can be parsed by the PQF parser.
+ It is possible to search
+ in any silly string index - if it's defined in your
+ indexing rules and can be parsed by the &pqf; parser.
This is definitely not the recommended use of
this facility, as it might confuse your users with some very
unexpected results.
<screen>
- @attr 1=silly/xpath/alike[@index]/name "information retrieval"
+ Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
+ </screen>
+ </para>
+ <para>
+ See also <xref linkend="querymodel-pqf-apt-mapping"/> for details, and
+ <xref linkend="zebrasrv-sru"/>
+ for the &sru; &pqf; query extension using string names as a fast
+ debugging facility.
+ </para>
+ </section>
+
+ <section id="querymodel-use-xpath">
+ <title>&zebra;'s special access point of type 'XPath'
+ for &grs1; filters</title>
+ <para>
+ As we have seen above, it is possible (albeit seldom a great
+ idea) to emulate
+ <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
+ search by defining <emphasis>use (type 1)</emphasis>
+ <emphasis>string</emphasis> attributes which in appearance
+ <emphasis>resemble XPath queries</emphasis>. There are two
+ problems with this approach: first, the XPath look-alike has to
+ be defined at indexing time, so no new, undefined
+ XPath queries can be entered at search time; and second, it might
+ confuse users very much that an XPath-alike index name in fact
+ gets populated from a possibly entirely different &xml; element
+ than it pretends to access.
+ </para>
+ <para>
+ When using the &grs1; Record Model
+ (see <xref linkend="grs"/>), we have the
+ possibility to embed <emphasis>live</emphasis>
+ XPath expressions
+ in the &pqf; queries, which are here called
+ <emphasis>use (type 1)</emphasis> <emphasis>xpath</emphasis>
+ attributes. You must enable the
+ <literal>xpath enable</literal> directive in your
+ <literal>.abs</literal> configuration files.
+ </para>
+ <note>
+ <para>
+ Only a <emphasis>very</emphasis> restricted subset of the
+ <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
+ standard is supported as the &grs1; record model is simpler than
+ a full &xml; &dom; structure. See the following examples for
+ possibilities.
+ </para>
+ </note>
+ <para>
+ Finding all documents which have the term "content"
+ inside a text node found in a specific &xml; &dom;
+ <emphasis>subtree</emphasis>, whose starting element is
+ addressed by XPath.
+ <screen>
+ Z> find @attr 1=/root content
+ Z> find @attr 1=/root/first content
+ </screen>
+ <emphasis>Notice that the
+ XPath must be absolute, i.e., must start with '/', and that the
+ XPath <literal>descendant-or-self</literal> axis followed by a
+ text node selection <literal>text()</literal> is implicitly
+ appended to the stated XPath.
+ </emphasis>
+ It follows that the above searches are interpreted as:
+ <screen>
+ Z> find @attr 1=/root//text() content
+ Z> find @attr 1=/root/first//text() content
+ </screen>
+ </para>
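The implicit rewriting described above can be expressed as a tiny helper. The function name `effective_xpath` is hypothetical, and the sketch covers only element paths such as those in the examples; how attribute paths are treated is not stated here and is therefore not modeled.

```python
def effective_xpath(xpath):
    """Return the XPath as it is interpreted for an element path:
    the descendant-or-self axis plus text() is appended implicitly."""
    if not xpath.startswith("/"):
        raise ValueError("the XPath must be absolute, i.e. start with '/'")
    return xpath + "//text()"


print(effective_xpath("/root"))
print(effective_xpath("/root/first"))
```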
+
+ <para>
+ Searching inside attribute strings is possible:
+ <screen>
+ Z> find @attr 1=/link/@creator morten
+ </screen>
+ </para>
+
+ <para>
+ Filter the addressing XPath by a predicate working on exact
+ string values in
+ attributes (in the &xml; sense) can be done: return all those docs which
+ have the term "english" contained in one of all text sub nodes of
+ the subtree defined by the XPath
+ <literal>/record/title[@lang='en']</literal>. And similar
+ predicate filtering.
+ <screen>
+ Z> find @attr 1=/record/title[@lang='en'] english
+ Z> find @attr 1=/link[@creator='sisse'] sibelius
+ Z> find @attr 1=/link[@creator='sisse']/description[@xml:lang='da'] sibelius
+ </screen>
+ </para>
+
+ <para>
+ Combining numeric indexes, boolean expressions,
+ and XPath based searches is possible:
+ <screen>
+ Z> find @attr 1=/record/title @and foo bar
+ Z> find @and @attr 1=/record/title foo @attr 1=4 bar
+ </screen>
+ </para>
+ <para>
+ Escaping &pqf; keywords and other non-parseable XPath constructs
+ with <literal>'{ }'</literal> to prevent client-side &pqf; parsing
+ syntax errors:
+ <screen>
+ Z> find @attr {1=/root/first[@attr='danish']} content
+ Z> find @attr {1=/record/@set} oai
+ </screen>
+ </para>
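A client that builds such queries programmatically might always wrap the XPath use attribute in braces. The helper below is a minimal sketch with a hypothetical name; it simply reproduces the escaping style of the examples above and does not validate the XPath itself.

```python
def xpath_query(xpath, term):
    """Build a PQF query whose use attribute is a brace-escaped XPath."""
    # Doubled braces in an f-string produce literal '{' and '}'.
    return f"@attr {{1={xpath}}} {term}"


print(xpath_query("/root/first[@attr='danish']", "content"))
print(xpath_query("/record/@set", "oai"))
```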
+ <warning>
+ <para>
+ It is worth mentioning that these dynamically performed XPath
+ queries are a performance bottleneck, as no optimized
+ specialized indexes can be used. Therefore, avoid the use of
+ this facility when speed is essential, and the database content
+ size is medium to large.
+ </para>
+ </warning>
+ </section>
+ </section>
+
+ <section id="querymodel-exp1">
+ <title>Explain Attribute Set</title>
+ <para>
+ The &z3950; standard defines the
+ <ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
+ Exp-1, which is used to discover information
+ about a server's search semantics and functional capabilities.
+ &zebra; exposes a "classic"
+ Explain database by base name <literal>IR-Explain-1</literal>, which
+ is populated with system internal information.
+ </para>
+ <para>
+ The attribute-set <literal>exp-1</literal> consists of a single
+ use attribute (type 1).
+ </para>
+ <para>
+ In addition, the non-Use
+ &bib1; attributes, that is, the types
+ <emphasis>Relation</emphasis>, <emphasis>Position</emphasis>,
+ <emphasis>Structure</emphasis>, <emphasis>Truncation</emphasis>,
+ and <emphasis>Completeness</emphasis> are imported from
+ the &bib1; attribute set, and may be used
+ within any explain query.
+ </para>
+
+ <section id="querymodel-exp1-use">
+ <title>Use Attributes (type = 1)</title>
+ <para>
+ The following Explain search attributes are supported:
+ <literal>ExplainCategory</literal> (@attr 1=1),
+ <literal>DatabaseName</literal> (@attr 1=3),
+ <literal>DateAdded</literal> (@attr 1=9),
+ <literal>DateChanged</literal> (@attr 1=10).
+ </para>
+ <para>
+ A search in the use attribute <literal>ExplainCategory</literal>
+ supports only these predefined values:
+ <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
+ <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
+ </para>
+ <para>
+ See <filename>tab/explain.att</filename> and the
+ <ulink url="&url.z39.50;">&z3950;</ulink> standard
+ for more information.
+ </para>
+ </section>
+
+ <section id="querymodel-examples">
+ <title>Explain searches with yaz-client</title>
+ <para>
+ Classic Explain only defines retrieval of Explain information
+ via ASN.1. Practically no &z3950; clients support this. Fortunately
+ they don't have to - &zebra; allows retrieval of this information
+ in other formats:
+ <literal>&sutrs;</literal>, <literal>&xml;</literal>,
+ <literal>&grs1;</literal> and <literal>ASN.1</literal> Explain.
+ </para>
+
+ <para>
+ List supported categories to find out which explain commands are
+ supported:
+ <screen>
+ Z> base IR-Explain-1
+ Z> find @attr exp1 1=1 categorylist
+ Z> form sutrs
+ Z> show 1+2
+ </screen>
+ </para>
+
+ <para>
+ Get target info, that is, investigate which databases exist at
+ this server endpoint:
+ <screen>
+ Z> base IR-Explain-1
+ Z> find @attr exp1 1=1 targetinfo
+ Z> form xml
+ Z> show 1+1
+ Z> form grs-1
+ Z> show 1+1
+ Z> form sutrs
+ Z> show 1+1
+ </screen>
+ </para>
+
+ <para>
+ List all supported databases. The number of hits
+ is the number of databases found, which most commonly are the
+ following two:
+ the <literal>Default</literal> and the
+ <literal>IR-Explain-1</literal> databases.
+ <screen>
+ Z> base IR-Explain-1
+ Z> find @attr exp1 1=1 databaseinfo
+ Z> form sutrs
+ Z> show 1+2
+ </screen>
+ </para>
+
+ <para>
+ Get database info record for database <literal>Default</literal>.
+ <screen>
+ Z> base IR-Explain-1
+ Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
+ </screen>
+ Identical query with explicitly specified attribute set:
+ <screen>
+ Z> base IR-Explain-1
+ Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
+ </screen>
+ </para>
+
+ <para>
+ Get attribute details record for database
+ <literal>Default</literal>.
+ This query is very useful to study the internal &zebra; indexes.
+ If records have been indexed using the <literal>alvis</literal>
+ &xslt; filter, the string representation names of the known indexes can be
+ found.
+ <screen>
+ Z> base IR-Explain-1
+ Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
+ </screen>
+ Identical query with explicitly specified attribute set:
+ <screen>
+ Z> base IR-Explain-1
+ Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
+ </screen>
+ </para>
+ </section>
+
+ </section>
+
+ <section id="querymodel-bib1">
+ <title>&bib1; Attribute Set</title>
+ <para>
+ Most of the information contained in this section is an excerpt of
+ the ATTRIBUTE SET &bib1; (&z3950;-1995) SEMANTICS
+ found at <ulink url="&url.z39.50.attset.bib1.1995;">The &bib1;
+ Attribute Set Semantics</ulink> from 1995, also available in an updated
+ <ulink url="&url.z39.50.attset.bib1;">&bib1;
+ Attribute Set</ulink>
+ version from 2003. Index Data is not the copyright holder of this
+ information, except for the configuration details, the listing of
+ &zebra;'s capabilities, and the example queries.
+ </para>
+
+
+ <section id="querymodel-bib1-use">
+ <title>Use Attributes (type 1)</title>
+
+ <para>
+ A use attribute specifies an access point for any atomic query.
+ These access points are highly dependent on the attribute set used
+ in the query, and are user configurable using the following
+ default configuration files:
+ <filename>tab/bib1.att</filename>,
+ <filename>tab/dan1.att</filename>,
+ <filename>tab/explain.att</filename>, and
+ <filename>tab/gils.att</filename>.
+ </para>
+ <para>
+ For example, a few &bib1; use
+ attributes from the <filename>tab/bib1.att</filename> file are:
+ <screen>
+ att 1 Personal-name
+ att 2 Corporate-name
+ att 3 Conference-name
+ att 4 Title
+ ...
+ att 1009 Subject-name-personal
+ att 1010 Body-of-text
+ att 1011 Date/time-added-to-db
+ ...
+ att 1016 Any
+ att 1017 Server-choice
+ att 1018 Publisher
+ ...
+ att 1035 Anywhere
+ att 1036 Author-Title-Subject
+ </screen>
+ </para>
+ <para>
+ New attribute sets can be added by adding new
+ <filename>tab/*.att</filename> configuration files, which need to
+ be sourced in the main configuration <filename>zebra.cfg</filename>.
+ </para>
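+ <para>
+ As a minimal sketch, the relevant part of a
+ <filename>zebra.cfg</filename> might look as follows. The
+ <literal>attset</literal> directive loads attribute set files;
+ <filename>mylocal.att</filename> is a hypothetical file name used
+ only for illustration:
+ <screen>
+ attset: bib1.att
+ attset: explain.att
+ # hypothetical site-specific attribute set
+ attset: mylocal.att
+ </screen>
+ </para>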
+ <para>
+ In addition, &zebra; allows access to
+ <emphasis>internal index names</emphasis> and <emphasis>dynamic
+ XPath</emphasis> as use attributes; see
+ <xref linkend="querymodel-use-string"/> and
+ <xref linkend="querymodel-use-xpath"/>.
+ </para>
+
+ <para>
+ Phrase search for <emphasis>information retrieval</emphasis> in
+ the title-register, scanning the same register afterwards:
+ <screen>
+ Z> find @attr 1=4 "information retrieval"
+ Z> scan @attr 1=4 information
+ </screen>
+ </para>
+ </section>
+
+ </section>
+
+
+ <section id="querymodel-bib1-nonuse">
+ <title>&zebra; general Bib1 Non-Use Attributes (type 2-6)</title>
+
+ <section id="querymodel-bib1-relation">
+ <title>Relation Attributes (type 2)</title>
+
+ <para>
+ Relation attributes describe the relationship of the access
+ point (left side
+ of the relation) to the search term as qualified by the attributes (right
+ side of the relation), e.g., Date-publication &lt;= 1975.
+ </para>
+
+ <table id="querymodel-bib1-relation-table" frame="top">
+ <title>Relation Attributes (type 2)</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Relation</entry>
+ <entry>Value</entry>
+ <entry>Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Less than</entry>
+ <entry>1</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Less than or equal</entry>
+ <entry>2</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Equal</entry>
+ <entry>3</entry>
+ <entry>default</entry>
+ </row>
+ <row>
+ <entry>Greater or equal</entry>
+ <entry>4</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Greater than</entry>
+ <entry>5</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Not equal</entry>
+ <entry>6</entry>
+ <entry>unsupported</entry>
+ </row>
+ <row>
+ <entry>Phonetic</entry>
+ <entry>100</entry>
+ <entry>unsupported</entry>
+ </row>
+ <row>
+ <entry>Stem</entry>
+ <entry>101</entry>
+ <entry>unsupported</entry>
+ </row>
+ <row>
+ <entry>Relevance</entry>
+ <entry>102</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>AlwaysMatches</entry>
+ <entry>103</entry>
+ <entry>supported *</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <note>
+ <para>
+ AlwaysMatches searches are only supported if alwaysmatches indexing
+ has been enabled. See <xref linkend="default-idx-file"/>.
+ </para>
+ </note>
+
+ <para>
+ The relation attributes 1-5 are supported and work exactly as
+ expected.
+ All ordering operations are based on a lexicographical ordering,
+ <emphasis>except</emphasis> when the
+ structure attribute numeric (109) is used. In
+ this case, ordering is numerical. See
+ <xref linkend="querymodel-bib1-structure"/>.
+ <screen>
+ Z> find @attr 1=Title @attr 2=1 music
+ ...
+ Number of hits: 11745, setno 1
+ ...
+ Z> find @attr 1=Title @attr 2=2 music
+ ...
+ Number of hits: 11771, setno 2
+ ...
+ Z> find @attr 1=Title @attr 2=3 music
+ ...
+ Number of hits: 532, setno 3
+ ...
+ Z> find @attr 1=Title @attr 2=4 music
+ ...
+ Number of hits: 11463, setno 4
+ ...
+ Z> find @attr 1=Title @attr 2=5 music
+ ...
+ Number of hits: 11419, setno 5
+ </screen>
+ </para>
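+ <para>
+ Relation attributes can be combined with boolean operators to
+ express ranges. The following sketch assumes that a date of
+ publication register (&bib1; use attribute 31) has been indexed,
+ and searches for publication years from 1990 through 1999.
+ With four-digit years, the default lexicographical ordering gives
+ the same result as a numerical one:
+ <screen>
+ Z> find @and @attr 1=31 @attr 2=4 1990 @attr 1=31 @attr 2=2 1999
+ </screen>
+ </para>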
+
+ <para>
+ The relation attribute
+ <emphasis>Relevance (102)</emphasis> is supported, see
+ <xref linkend="administration-ranking"/> for full information.
+ </para>
+
+ <para>
+ Ranked search for <emphasis>information retrieval</emphasis> in
+ the title-register:
+ <screen>
+ Z> find @attr 1=4 @attr 2=102 "information retrieval"
+ </screen>
+ </para>
+
+ <para>
+ The relation attribute
+ <emphasis>AlwaysMatches (103)</emphasis> is, in the default
+ configuration,
+ supported in conjunction with structure attribute
+ <emphasis>Phrase (1)</emphasis> (which may be omitted by
+ default).
+ It can be configured to work with other structure attributes,
+ see the configuration file
+ <filename>tab/default.idx</filename> and
+ <xref linkend="querymodel-pqf-apt-mapping"/>.
+ </para>
+ <para>
+ <emphasis>AlwaysMatches (103)</emphasis> is a
+ great way to discover how many documents have been indexed in a
+ given field. The search term is ignored, but needed for correct
+ &pqf; syntax. An empty search term may be supplied.
+ <screen>
+ Z> find @attr 1=Title @attr 2=103 ""
+ Z> find @attr 1=Title @attr 2=103 @attr 4=1 ""
+ </screen>
+ </para>
+
+
+ </section>
+
+ <section id="querymodel-bib1-position">
+ <title>Position Attributes (type 3)</title>
+
+ <para>
+ The position attribute specifies the location of the search term
+ within the field or subfield in which it appears.
+ </para>
+
+ <table id="querymodel-bib1-position-table" frame="top">
+ <title>Position Attributes (type 3)</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Position</entry>
+ <entry>Value</entry>
+ <entry>Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>First in field </entry>
+ <entry>1</entry>
+ <entry>supported *</entry>
+ </row>
+ <row>
+ <entry>First in subfield</entry>
+ <entry>2</entry>
+ <entry>supported *</entry>
+ </row>
+ <row>
+ <entry>Any position in field</entry>
+ <entry>3</entry>
+ <entry>default</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <note>
+ <para>
+ &zebra; only supports first-in-field searches if
+ <literal>firstinfield</literal> is enabled for the index.
+ Refer to <xref linkend="default-idx-file"/>.
+ &zebra; does not distinguish between first in field and
+ first in subfield. They result in the same hit count.
+ Searching for first position in (sub)field is only supported in &zebra;
+ 2.0.2 and later.
+ </para>
+ </note>
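+ <para>
+ As an illustration, and assuming that <literal>firstinfield</literal>
+ indexing is enabled for the title register, the following query
+ only matches records whose title field starts with the term
+ <emphasis>information</emphasis>:
+ <screen>
+ Z> find @attr 1=4 @attr 3=1 information
+ </screen>
+ </para>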
+ </section>
+
+ <section id="querymodel-bib1-structure">
+ <title>Structure Attributes (type 4)</title>
+
+ <para>
+ The structure attribute specifies the type of search
+ term. This causes the search to be mapped on
+ different &zebra; internal indexes, which must have been defined
+ at index time.
+ </para>
+
+ <para>
+ The possible values of the
+ <literal>structure attribute (type 4)</literal> can be defined
+ using the configuration file <filename>
+ tab/default.idx</filename>.
+ The default configuration is summarized in this table.
+ </para>
+
+ <table id="querymodel-bib1-structure-table" frame="top">
+ <title>Structure Attributes (type 4)</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Structure</entry>
+ <entry>Value</entry>
+ <entry>Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Phrase </entry>
+ <entry>1</entry>
+ <entry>default</entry>
+ </row>
+ <row>
+ <entry>Word</entry>
+ <entry>2</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Key</entry>
+ <entry>3</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Year</entry>
+ <entry>4</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Date (normalized)</entry>
+ <entry>5</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Word list</entry>
+ <entry>6</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Date (un-normalized)</entry>
+ <entry>100</entry>
+ <entry>unsupported</entry>
+ </row>
+ <row>
+ <entry>Name (normalized) </entry>
+ <entry>101</entry>
+ <entry>unsupported</entry>
+ </row>
+ <row>
+ <entry>Name (un-normalized) </entry>
+ <entry>102</entry>
+ <entry>unsupported</entry>
+ </row>
+ <row>
+ <entry>Structure</entry>
+ <entry>103</entry>
+ <entry>unsupported</entry>
+ </row>
+ <row>
+ <entry>Urx</entry>
+ <entry>104</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Free-form-text</entry>
+ <entry>105</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Document-text</entry>
+ <entry>106</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Local-number</entry>
+ <entry>107</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>String</entry>
+ <entry>108</entry>
+ <entry>unsupported</entry>
+ </row>
+ <row>
+ <entry>Numeric string</entry>
+ <entry>109</entry>
+ <entry>supported</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ The structure attribute value
+ <literal>Word list (6)</literal>
+ is supported, and maps to the boolean <literal>AND</literal>
+ combination of the words supplied. The word list is useful when
+ Google-like bag-of-words queries need to be translated from a GUI
+ query language to &pqf;. For example, the following queries
+ are equivalent:
+ <screen>
+ Z> find @attr 1=Title @attr 4=6 "mozart amadeus"
+ Z> find @attr 1=Title @and mozart amadeus
+ </screen>
+ </para>
+
+ <para>
+ The structure attribute values
+ <literal>Free-form-text (105)</literal> and
+ <literal>Document-text (106)</literal>
+ are supported, and both map to the boolean <literal>OR</literal>
+ combination of the words supplied. The following queries
+ are equivalent:
+ <screen>
+ Z> find @attr 1=Body-of-text @attr 4=105 "bach salieri teleman"
+ Z> find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman"
+ Z> find @attr 1=Body-of-text @or bach @or salieri teleman
+ </screen>
+ This <literal>OR</literal> list of terms is very useful in
+ combination with relevance ranking:
+ <screen>
+ Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman"
+ </screen>
+ </para>
+
+ <para>
+ The structure attribute value
+ <literal>Local-number (107)</literal>
+ is supported, and always maps to the &zebra; internal document ID,
+ irrespective of which use attribute is specified. The following queries
+ have exactly the same unique record in the hit set:
+ <screen>
+ Z> find @attr 4=107 10
+ Z> find @attr 1=4 @attr 4=107 10
+ Z> find @attr 1=1010 @attr 4=107 10
+ </screen>
+ </para>
+
+ <para>
+ In
+ the GILS schema (<literal>gils.abs</literal>), the
+ west-bounding-coordinate is indexed as type <literal>n</literal>,
+ and is therefore searched by specifying
+ <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
+ To match all those records with west-bounding-coordinate greater
+ than -114 we use the following query:
+ <screen>
+ Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
+ </screen>
+ </para>
+ <note>
+ <para>
+ The exact mapping between &pqf; queries and &zebra; internal indexes
+ and index types is explained in
+ <xref linkend="querymodel-pqf-apt-mapping"/>.
+ </para>
+ </note>
+ </section>
+
+ <section id="querymodel-bib1-truncation">
+ <title>Truncation Attributes (type = 5)</title>
+
+ <para>
+ The truncation attribute specifies whether variations of one or
+ more characters are allowed between search term and hit terms, or
+ not. Using non-default truncation attributes will broaden the
+ document hit set of a search query.
+ </para>
+
+ <table id="querymodel-bib1-truncation-table" frame="top">
+ <title>Truncation Attributes (type 5)</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Truncation</entry>
+ <entry>Value</entry>
+ <entry>Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Right truncation </entry>
+ <entry>1</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Left truncation</entry>
+ <entry>2</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Left and right truncation</entry>
+ <entry>3</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>Do not truncate</entry>
+ <entry>100</entry>
+ <entry>default</entry>
+ </row>
+ <row>
+ <entry>Process # in search term</entry>
+ <entry>101</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>RegExpr-1 </entry>
+ <entry>102</entry>
+ <entry>supported</entry>
+ </row>
+ <row>
+ <entry>RegExpr-2</entry>
+ <entry>103</entry>
+ <entry>supported</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ The truncation attribute values 1-3 behave as expected:
+ <screen>
+ Z> scan @attr 1=Body-of-text schnittke
+ ...
+ * schnittke (81)
+ schnittkes (31)
+ schnittstelle (1)
+ ...
+ Z> find @attr 1=Body-of-text @attr 5=1 schnittke
+ ...
+ Number of hits: 95, setno 7
+ ...
+ Z> find @attr 1=Body-of-text @attr 5=2 schnittke
+ ...
+ Number of hits: 81, setno 6
+ ...
+ Z> find @attr 1=Body-of-text @attr 5=3 schnittke
+ ...
+ Number of hits: 95, setno 8
+ </screen>
+ </para>
+
+ <para>
+ The truncation attribute value
+ <literal>Process # in search term (101)</literal> is a
+ poor man's regular expression search. It maps
+ each <literal>#</literal> to <literal>.*</literal>, and
+ then performs a <literal>Regexp-1 (102)</literal> regular
+ expression search. The following two queries are equivalent:
+ <screen>
+ Z> find @attr 1=Body-of-text @attr 5=101 schnit#ke
+ Z> find @attr 1=Body-of-text @attr 5=102 schnit.*ke
+ ...
+ Number of hits: 89, setno 10
+ </screen>
+ </para>
+
+ <para>
+ The truncation attribute value
+ <literal>Regexp-1 (102)</literal> is a normal regular expression search,
+ see <xref linkend="querymodel-regular"/> for details.
+ <screen>
+ Z> find @attr 1=Body-of-text @attr 5=102 schnit+ke
+ Z> find @attr 1=Body-of-text @attr 5=102 schni[a-t]+ke
+ </screen>
+ </para>
+
+ <para>
+ The truncation attribute value
+ <literal>Regexp-2 (103)</literal> is a &zebra; specific extension
+ which allows <emphasis>fuzzy</emphasis> matches. One single
+ error in spelling of search terms is allowed, i.e., a document
+ is hit if it includes a term which can be mapped to the used
+ search term by one character substitution, addition, deletion or
+ change of position.
+ <screen>
+ Z> find @attr 1=Body-of-text @attr 5=100 schnittke
+ ...
+ Number of hits: 81, setno 14
+ ...
+ Z> find @attr 1=Body-of-text @attr 5=103 schnittke
+ ...
+ Number of hits: 103, setno 15
+ ...
+ </screen>
+ </para>
+ </section>
+
+ <section id="querymodel-bib1-completeness">
+ <title>Completeness Attributes (type = 6)</title>
+
+
+ <para>
+ The <literal>Completeness Attribute (type = 6)</literal>
+ is used to specify whether a given search term or term list is
+ part of the terms of a given index/field
+ (<literal>Incomplete subfield (1)</literal>), or is
+ literally what is found in the entire field's index
+ (<literal>Complete field (3)</literal>).
+ </para>
+
+ <table id="querymodel-bib1-completeness-table" frame="top">
+ <title>Completeness Attributes (type = 6)</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Completeness</entry>
+ <entry>Value</entry>
+ <entry>Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Incomplete subfield</entry>
+ <entry>1</entry>
+ <entry>default</entry>
+ </row>
+ <row>
+ <entry>Complete subfield</entry>
+ <entry>2</entry>
+ <entry>deprecated</entry>
+ </row>
+ <row>
+ <entry>Complete field</entry>
+ <entry>3</entry>
+ <entry>supported</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ The <literal>Completeness Attribute (type = 6)</literal>
+ is only partially and conditionally
+ supported, in the sense that it is ignored if the hit index is
+ not of structure <literal>type="w"</literal> or
+ <literal>type="p"</literal>.
+ </para>
+ <para>
+ <literal>Incomplete subfield (1)</literal> is the default, and
+ makes &zebra; use
+ register <literal>type="w"</literal>, whereas
+ <literal>Complete field (3)</literal> triggers
+ search and scan in index <literal>type="p"</literal>.
+ </para>
+ <para>
+ The <literal>Complete subfield (2)</literal> value is a remnant
+ of the happy <literal>&marc;</literal>
+ binary format days. &zebra; does not support it, but silently maps it
+ to <literal>Complete field (3)</literal>.
+ </para>
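+ <para>
+ For example, assuming the title has been indexed in both a word
+ and a phrase register, the first query below matches records
+ containing the phrase anywhere in the title, whereas the second
+ one is run against the phrase register and is intended to match
+ only titles consisting of exactly that phrase:
+ <screen>
+ Z> find @attr 1=4 @attr 6=1 "information retrieval"
+ Z> find @attr 1=4 @attr 6=3 "information retrieval"
+ </screen>
+ </para>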
+
+ <note>
+ <para>
+ The exact mapping between &pqf; queries and &zebra; internal indexes
+ and index types is explained in
+ <xref linkend="querymodel-pqf-apt-mapping"/>.
+ </para>
+ </note>
+ </section>
+ </section>
+
+ </section>
+
+
+ <section id="querymodel-zebra">
+ <title>Extended &zebra; &rpn; Features</title>
+ <para>
+ The &zebra; internal query engine has been extended to cover specific needs
+ not covered by the <literal>bib-1</literal> attribute set query
+ model. These extensions are <emphasis>non-standard</emphasis>
+ and <emphasis>non-portable</emphasis>: most functional extensions
+ are modeled over the <literal>bib-1</literal> attribute set,
+ defining type 7 and higher values.
+ There are also the special
+ <literal>string</literal> type index names for the
+ <literal>idxpath</literal> attribute set.
+ </para>
+
+ <section id="querymodel-zebra-attr-allrecords">
+ <title>&zebra; specific retrieval of all records</title>
+ <para>
+ &zebra; defines a hardwired <literal>string</literal> index name
+ called <literal>_ALLRECORDS</literal>. It matches any record
+ contained in the database, if used in conjunction with
+ the relation attribute
+ <literal>AlwaysMatches (103)</literal>.
+ </para>
+ <para>
+ The <literal>_ALLRECORDS</literal> index name is used for total database
+ export. The search term is ignored and may be empty.
+ <screen>
+ Z> find @attr 1=_ALLRECORDS @attr 2=103 ""
+ </screen>
+ </para>
+ <para>
+ Combination with other index types can be made. For example, to
+ find all records which are <emphasis>not</emphasis> indexed in
+ the <literal>Title</literal> register, issue one of the two
+ equivalent queries:
+ <screen>
+ Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=Title @attr 2=103 ""
+ Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=4 @attr 2=103 ""
+ </screen>
+ </para>
+ <warning>
+ <para>
+ The special string index <literal>_ALLRECORDS</literal> is
+ experimental, and the provided functionality and syntax may very
+ well change in future releases of &zebra;.
+ </para>
+ </warning>
+ </section>
+
+ <section id="querymodel-zebra-attr-search">
+ <title>&zebra; specific Search Extensions to all Attribute Sets</title>
+ <para>
+ &zebra; extends the &bib1; attribute types, and these extensions are
+ recognized regardless of attribute
+ set used in a <literal>search</literal> operation query.
+ </para>
+
+ <table id="querymodel-zebra-attr-search-table" frame="top">
+ <title>&zebra; Search Attribute Extensions</title>
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Value</entry>
+ <entry>Operation</entry>
+ <entry>&zebra; version</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Embedded Sort</entry>
+ <entry>7</entry>
+ <entry>search</entry>
+ <entry>1.1</entry>
+ </row>
+ <row>
+ <entry>Term Set</entry>
+ <entry>8</entry>
+ <entry>search</entry>
+ <entry>1.1</entry>
+ </row>
+ <row>
+ <entry>Rank Weight</entry>
+ <entry>9</entry>
+ <entry>search</entry>
+ <entry>1.1</entry>
+ </row>
+ <row>
+ <entry>Term Reference</entry>
+ <entry>10</entry>
+ <entry>search</entry>
+ <entry>1.4</entry>
+ </row>
+ <row>
+ <entry>Local Approx Limit</entry>
+ <entry>11</entry>
+ <entry>search</entry>
+ <entry>1.4</entry>
+ </row>
+ <row>
+ <entry>Global Approx Limit</entry>
+ <entry>12</entry>
+ <entry>search</entry>
+ <entry>2.0.8</entry>
+ </row>
+ <row>
+ <entry>Maximum number of truncated terms (truncmax)</entry>
+ <entry>13</entry>
+ <entry>search</entry>
+ <entry>2.0.10</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <section id="querymodel-zebra-attr-sorting">
+ <title>&zebra; Extension Embedded Sort Attribute (type 7)</title>
+ <para>
+ The embedded sort is a way to specify sort within a query - thus
+ removing the need to send a Sort Request separately. It is both
+ faster and does not require clients to deal with the Sort
+ Facility.
+ </para>
+
+ <para>
+ All ordering operations are based on a lexicographical ordering,
+ <emphasis>except</emphasis> when the
+ <literal>structure attribute numeric (109)</literal> is used. In
+ this case, ordering is numerical. See
+ <xref linkend="querymodel-bib1-structure"/>.
+ </para>
+
+ <para>
+ The possible values after attribute <literal>type 7</literal> are
+ <literal>1</literal> ascending and
+ <literal>2</literal> descending.
+ The attributes+term (&apt;) node is separate from the
+ rest and must be <literal>@or</literal>'ed.
+ The term associated with &apt; is the sorting level in integers,
+ where <literal>0</literal> means primary sort,
+ <literal>1</literal> means secondary sort, and so forth.
+ See also <xref linkend="administration-ranking"/>.
+ </para>
+ <para>
+ For example, searching for water, sorted by title (ascending):
+ <screen>
+ Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
+ </screen>
+ </para>
+ <para>
+ Or, searching for water, sorted by title ascending, then by date descending:
+ <screen>
+ Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
+ </screen>
+ </para>
+ </section>
+
+ <!--
+ &zebra; Extension Term Set Attribute
+ From the manual text, I can not see what is the point with this feature.
+ I think it makes more sense when there are multiple terms in a query, or
+ something...
+
+ We decided 2006-06-03 to disable this feature, as it is covered by
+ scan within a resultset. Better use ressources to upgrade this
+ feature for good performance.
+ -->
+
+ <!--
+ <section id="querymodel-zebra-attr-estimation">
+ <title>&zebra; Extension Term Set Attribute (type 8)</title>
+ <para>
+ The Term Set feature is a facility that allows a search to store
+ hitting terms in a "pseudo" resultset; thus a search (as usual) +
+ a scan-like facility. Requires a client that can do named result
+ sets since the search generates two result sets. The value for
+ attribute 8 is the name of a result set (string). The terms in
+ the named term set are returned as &sutrs; records.
+ </para>
+ <para>
+ For example, searching for u in title, right truncated, and
+ storing the result in term set named 'aset'
+ <screen>
+ Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
+ </screen>
+ </para>
+ <warning>
+ The model has one serious flaw: we don't know the size of term
+ set. Experimental. Do not use in production code.
+ </warning>
+ </section>
+ -->
+
+
+ <section id="querymodel-zebra-attr-weight">
+ <title>&zebra; Extension Rank Weight Attribute (type 9)</title>
+ <para>
+ Rank weight is a way to pass a value to a ranking algorithm, so
+ that one &apt; has one value while another has a different one.
+ See also <xref linkend="administration-ranking"/>.
+ </para>
+ <para>
+ For example, searching for utah in title with weight 30 as well
+ as any with weight 20:
+ <screen>
+ Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
+ </screen>
+ </para>
+ </section>
+
+ <section id="querymodel-zebra-attr-termref">
+ <title>&zebra; Extension Term Reference Attribute (type 10)</title>
+ <para>
+ &zebra; supports the searchResult-1 facility.
+ If the Term Reference Attribute (type 10) is
+ given, that specifies a subqueryId value returned as part of the
+ search result. It is a way for a client to name an &apt; part of a
+ query.
+ </para>
+ <!--
+ <para>
+ <screen>
+ </screen>
+ </para>
+ -->
+ <warning>
+ <para>
+ Experimental. Do not use in production code.
+ </para>
+ </warning>
+
+ </section>
+
+
+
+ <section id="querymodel-zebra-local-attr-limit">
+ <title>Local Approximative Limit Attribute (type 11)</title>
+ <para>
+ &zebra; computes - unless otherwise configured -
+ the exact hit count for every &apt;
+ (leaf) in the query tree. These hit counts are returned as part of
+ the searchResult-1 facility in the binary encoded &z3950; search
+ response packages.
+ </para>
+ <para>
+ By setting an estimation limit on the size of the result set of the &apt;
+ leaves, &zebra; stops processing the result set when the limit
+ is reached.
+ Hit counts under this limit are still precise, but hit counts over it
+ are estimated using the statistics gathered from the chopped
+ result set.
+ </para>
+ <para>
+ Specifying a limit of <literal>0</literal> results in exact hit counts.
+ </para>
+ <para>
+ For example, we might be interested in exact hit count for a, but
+ for b we allow hit count estimates for 1000 and higher.
+ <screen>
+ Z> find @and a @attr 11=1000 b
</screen>
- </para>
- <para>
- See <xref linkend="querymodel-bib1-mapping"/> for details, and
- <xref linkend="server-sru"/>
- for the SRU PQF query extention using string names as a fast
- debugging facility.
- </para>
- </sect3>
+ </para>
+ <note>
+ <para>
+ The estimated hit count facility makes searches faster, as one
+ only needs to process large hit lists partially.
+ It is mostly used in huge databases, where you want to trade
+ exactness of hit counts against speed of execution.
+ </para>
+ </note>
+ <warning>
+ <para>
+ Do not use approximative hit count limits
+ in conjunction with relevance ranking, as re-sorting of the
+ result set only works when the entire result set has
+ been processed.
+ </para>
+ </warning>
+ </section>
+
+ <section id="querymodel-zebra-global-attr-limit">
+ <title>Global Approximative Limit Attribute (type 12)</title>
+ <para>
+ By default &zebra; computes precise hit counts for a query as
+ a whole. Setting attribute 12 makes it perform approximative
+ hit counts instead. It has the same semantics as
+ <literal>estimatehits</literal> for the <xref linkend="zebra-cfg"/>.
+ </para>
+ <para>
+ The attribute (12) can occur anywhere in the query tree.
+ Unlike regular attributes it does not relate to the leaf (&apt;)
+ - but to the whole query.
+ </para>
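+ <para>
+ A sketch of such a query, assuming that the attribute value is a
+ hit count limit as for <literal>estimatehits</literal>; the
+ terms <literal>a</literal> and <literal>b</literal> are
+ placeholders:
+ <screen>
+ Z> find @attr 12=1000 @and a b
+ </screen>
+ </para>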
+ <warning>
+ <para>
+ Do not use approximative hit count limits
+ in conjunction with relevance ranking, as re-sorting of the
+ result set only works when the entire result set has
+ been processed.
+ </para>
+ </warning>
+ </section>
- </sect2>
+ </section>
- <sect2 id="querymodel-exp1">
- <title>Explain Attribute Set</title>
+ <section id="querymodel-zebra-attr-scan">
+ <title>&zebra; specific Scan Extensions to all Attribute Sets</title>
<para>
- The Z39.50 standard defines the
- <ulink url="&url.z39.50.explain;">Explain</ulink>attribute set
- <literal>exp-1</literal>, which is used to discover information
- about a server's search semantics and functional capabilities
- Zebra exposes a "classic"
- Explain database by base name <literal>IR-Explain-1</literal>, which
- is populated with system internal information.
+ &zebra; extends the Bib1 attribute types, and these extensions are
+ recognized regardless of attribute
+ set used in a scan operation query.
</para>
- <para>
- The attribute-set <literal>exp-1</literal> consists of a single
- <literal>Use (type 1)</literal> attribute.
- </para>
- <para>
- In addition, the non-Use
- <literal>bib-1</literal> attributes, that is, the types
- <literal>Relation</literal>, <literal>Position</literal>,
- <literal>Structure</literal>, <literal>Truncation</literal>,
- and <literal>Completeness</literal> are imported from
- the <literal>bib-1</literal> attribute set, and may be used
- within any explain query.
- </para>
+ <table id="querymodel-zebra-attr-scan-table" frame="top">
+ <title>&zebra; Scan Attribute Extensions</title>
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>Operation</entry>
+ <entry>&zebra; version</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Result Set Narrow</entry>
+ <entry>8</entry>
+ <entry>scan</entry>
+ <entry>1.3</entry>
+ </row>
+ <row>
+ <entry>Approximative Limit</entry>
+ <entry>9</entry>
+ <entry>scan</entry>
+ <entry>1.4</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
- <sect3 id="querymodel-exp1-use">
- <title>Use Attributes (type = 1)</title>
+ <section id="querymodel-zebra-attr-narrow">
+ <title>&zebra; Extension Result Set Narrow (type 8)</title>
+ <para>
+ If attribute Result Set Narrow (type 8)
+ is given for scan, the value is the name of a
+ result set. Each hit count in scan is
+ <literal>@and</literal>'ed with the result set given.
+ </para>
+ <para>
+ Consider for example
+ the case of scanning all title fields around the
+ scanterm <emphasis>mozart</emphasis>, then refining the scan by
+ issuing a filtering query for <emphasis>amadeus</emphasis> to
+ restrict the scan to the result set of the query:
+ <screen>
+ Z> scan @attr 1=4 mozart
+ ...
+ * mozart (43)
+ mozartforskningen (1)
+ mozartiana (1)
+ mozarts (16)
+ ...
+ Z> f @attr 1=4 amadeus
+ ...
+ Number of hits: 15, setno 2
+ ...
+ Z> scan @attr 1=4 @attr 8=2 mozart
+ ...
+ * mozart (14)
+ mozartforskningen (0)
+ mozartiana (0)
+ mozarts (1)
+ ...
+ </screen>
+ </para>
+
+ <para>
+ &zebra; 2.0.2 and later is able to skip 0 hit counts. This, however,
+ is known not to scale if the number of terms to skip is high.
+ This will most likely happen if the result set is small (and
+ results in many 0 hits).
+ </para>
+ </section>
+
+ <section id="querymodel-zebra-attr-approx">
+ <title>&zebra; Extension Approximative Limit (type 11)</title>
+ <para>
+ The &zebra; Extension Approximative Limit (type 11) is a way to
+ enable approximate hit counts for scan hit counts, in the same
+ way as for search hit counts.
+ </para>
+ </section>
+ </section>
+
+ <section id="querymodel-idxpath">
+ <title>&zebra; special &idxpath; Attribute Set for &grs1; indexing</title>
<para>
- The following Explain search atributes are supported:
- <literal>ExplainCategory</literal> (@attr 1=1),
- <literal>DatabaseName</literal> (@attr 1=3),
- <literal>DateAdded</literal> (@attr 1=9),
- <literal>DateChanged</literal>(@attr 1=10).
+ The attribute-set <literal>idxpath</literal> consists of a single
+ Use (type 1) attribute. All non-use attributes behave as normal.
</para>
<para>
- A search in the use attribute <literal>ExplainCategory</literal>
- supports only these predefined values:
- <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
- <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
+ This feature is enabled when defining the
+ <literal>xpath enable</literal> option in the &grs1; filter
+ <filename>*.abs</filename> configuration files. If one wants to use
+ the special <literal>idxpath</literal> numeric attribute set, the
+ main &zebra; configuration file <filename>zebra.cfg</filename>
+ directive <literal>attset: idxpath.att</literal> must be enabled.
</para>
+ <warning>
<para>
- See <filename>tab/explain.att</filename> and the
- for more information.
+ The <literal>idxpath</literal> is deprecated, may not be
+ supported in future &zebra; versions, and should definitely
+ not be used in production code.
+ </para>
+ </warning>
+
+ <section id="querymodel-idxpath-use">
+ <title>&idxpath; Use Attributes (type = 1)</title>
+ <para>
+ This attribute set allows one to search &grs1; filter indexed
+ records by &xpath; like structured index names.
+ </para>
+
+ <warning>
+ <para>
+ The <literal>idxpath</literal> option defines hard-coded
+ index names, which might clash with your own index names.
</para>
- </sect3>
-
- <sect3>
- <title>Explain searches with yaz-client</title>
- <para>
- Classic Explain only defines retrieval of Explain information
- via ASN.1. Pratically no Z39.50 clients supports this. Fortunately
- they don't have to - Zebra allows retrieval of this information
- in other formats:
- <literal>SUTRS</literal>, <literal>XML</literal>,
- <literal>GRS-1</literal> and <literal>ASN.1</literal> Explain.
- </para>
+ </warning>
+
+ <table id="querymodel-idxpath-use-table" frame="top">
+ <title>&zebra; specific &idxpath; Use Attributes (type 1)</title>
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>&idxpath;</entry>
+ <entry>Value</entry>
+ <entry>String Index</entry>
+ <entry>Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>&xpath; Begin</entry>
+ <entry>1</entry>
+ <entry>_XPATH_BEGIN</entry>
+ <entry>deprecated</entry>
+ </row>
+ <row>
+ <entry>&xpath; End</entry>
+ <entry>2</entry>
+ <entry>_XPATH_END</entry>
+ <entry>deprecated</entry>
+ </row>
+ <row>
+ <entry>&xpath; CData</entry>
+ <entry>1016</entry>
+ <entry>_XPATH_CDATA</entry>
+ <entry>deprecated</entry>
+ </row>
+ <row>
+ <entry>&xpath; Attribute Name</entry>
+ <entry>3</entry>
+ <entry>_XPATH_ATTR_NAME</entry>
+ <entry>deprecated</entry>
+ </row>
+ <row>
+ <entry>&xpath; Attribute CData</entry>
+ <entry>1015</entry>
+ <entry>_XPATH_ATTR_CDATA</entry>
+ <entry>deprecated</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
<para>
- List supported categories to find out which explain commands are
- supported:
+ See <filename>tab/idxpath.att</filename> for more information.
+ </para>
+ <para>
+ Search for all documents starting with root element
+ <literal>/root</literal> (either using the numeric or the string
+ use attributes):
<screen>
- Z> base IR-Explain-1
- Z> @attr exp1 1=1 categorylist
- Z> form sutrs
- Z> show 1+2
+ Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/
+ Z> find @attr idxpath 1=1 @attr 4=3 root/
+ Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/
</screen>
</para>
-
<para>
- Get target info, that is, investigate which databases exist at
- this server endpoint:
+ Search for all documents where specific nested &xpath;
+ <literal>/c1/c2/../cn</literal> exists. Notice the very
+ counter-intuitive <emphasis>reverse</emphasis> notation!
<screen>
- Z> base IR-Explain-1
- Z> @attr exp1 1=1 targetinfo
- Z> form xml
- Z> show 1+1
- Z> form grs-1
- Z> show 1+1
- Z> form sutrs
- Z> show 1+1
+ Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/
+ Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/
</screen>
</para>
-
<para>
- List all supported databases, the number of hits
- is the number of databases found, which most commonly are the
- following two:
- the <literal>Default</literal> and the
- <literal>IR-Explain-1</literal> databases.
+ Search for CDATA string <emphasis>text</emphasis> in any element:
<screen>
- Z> base IR-Explain-1
- Z> f @attr exp1 1=1 databaseinfo
- Z> form sutrs
- Z> show 1+2
+ Z> find @attrset idxpath @attr 1=1016 text
+ Z> find @attr 1=_XPATH_CDATA text
</screen>
</para>
-
<para>
- Get database info record for database <literal>Default</literal>.
- <screen>
- Z> base IR-Explain-1
- Z> @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
+ Search for CDATA string <emphasis>anothertext</emphasis> in any
+ attribute:
+ <screen>
+ Z> find @attrset idxpath @attr 1=1015 anothertext
+ Z> find @attr 1=_XPATH_ATTR_CDATA anothertext
</screen>
- Identical query with explicitly specified attribute set:
- <screen>
- Z> base IR-Explain-1
- Z> @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
+ </para>
+ <para>
+ Search for all documents which have an &xml; element node
+ including an &xml; attribute named <emphasis>creator</emphasis>:
+ <screen>
+ Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
+ Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
</screen>
</para>
-
<para>
- Get attribute details record for database
- <literal>Default</literal>.
- This query is very useful to study the internal Zebra indexes.
- If records have been indexed using the <literal>alvis</literal>
- XSLT filter, the string representation names of the known indexes can be
- found.
+ Combining usual <literal>bib-1</literal> attribute set searches
+ with <literal>idxpath</literal> attribute set searches:
<screen>
- Z> base IR-Explain-1
- Z> @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
+ Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart
+ Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart
</screen>
- Identical query with explicitly specified attribute set:
+ </para>
+ <para>
+ Scanning is supported on all <literal>idxpath</literal>
+ indexes, whether specified as numeric use attributes or as string
+ index names.
<screen>
- Z> base IR-Explain-1
- Z> @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
+ Z> scan @attrset idxpath @attr 1=1016 text
+ Z> scan @attr 1=_XPATH_ATTR_CDATA anothertext
+ Z> scan @attrset idxpath @attr 1=3 @attr 4=3 ''
</screen>
</para>
- </sect3>
- </sect2>
+ </section>
+ </section>
- <sect2 id="querymodel-bib1">
- <title>Bib1 Attribute Set</title>
- <para>
- Something about querying to be written ..
- </para>
- <para>
- Most of the information contained in this section is an excerpt of
- the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
- SEMANTICS</literal>, found at <ulink
- url="&url.z39.50.attset.bib1.1995;">The BIB-1
- Attribute Set Semantics</ulink> from 1995, also in an updated
- <ulink url="&url.z39.50.attset.bib1;">Bib-1
- Attribute Set</ulink>
- version from 2003. Index Data is not the copyright holder of this
- information.
- </para>
-
-
- <sect3 id="querymodel-bib1-use">
- <title>Use Attributes (type = 1)</title>
- </sect3>
- <sect3 id="querymodel-bib1-relation">
- <title>Relation Attributes (type = 2)</title>
- </sect3>
- <para>
- </para>
+ <section id="querymodel-pqf-apt-mapping">
+ <title>Mapping from &pqf; atomic &apt; queries to &zebra; internal
+ register indexes</title>
+ <para>
+ The rules for &pqf; &apt; mapping are rather tricky to grasp at first.
+ We first deal with the rules for deciding which
+ internal register or string index to use, according to the use
+ attribute or access point specified in the query. Thereafter we
+ deal with the rules for determining the correct structure type of
+ the named register.
+ </para>
- <sect3 id="querymodel-bib1-position">
- <title>Position Attributes (type = 3)</title>
- </sect3>
+ <section id="querymodel-pqf-apt-mapping-accesspoint">
+ <title>Mapping of &pqf; &apt; access points</title>
+ <para>
+ &zebra; understands four fundamentally different types of access
+ points, of which only the
+ <emphasis>numeric use attribute</emphasis> type access points
+ are defined by the <ulink url="&url.z39.50;">&z3950;</ulink>
+ standard.
+ All other access point types are &zebra; specific, and non-portable.
+ </para>
- <sect3 id="querymodel-bib1-structure">
- <title>Structure Attributes (type = 4)</title>
- </sect3>
+ <table id="querymodel-zebra-mapping-accesspoint-types" frame="top">
+ <title>Access point name mapping</title>
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Access Point</entry>
+ <entry>Type</entry>
+ <entry>Grammar</entry>
+ <entry>Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Use attribute</entry>
+ <entry>numeric</entry>
+ <entry>[1-9][0-9]*</entry>
+ <entry>directly mapped to string index name</entry>
+ </row>
+ <row>
+ <entry>String index name</entry>
+ <entry>string</entry>
+ <entry>[a-zA-Z](\-?[a-zA-Z0-9])*</entry>
+ <entry>normalized name is used as internal string index name</entry>
+ </row>
+ <row>
+ <entry>&zebra; internal index name</entry>
+ <entry>zebra</entry>
+ <entry>_[a-zA-Z](_?[a-zA-Z0-9])*</entry>
+ <entry>hardwired internal string index name</entry>
+ </row>
+ <row>
+ <entry>&xpath; special index</entry>
+ <entry>XPath</entry>
+ <entry>/.*</entry>
+ <entry>special xpath search for &grs1; indexed records</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ <literal>Attribute set names</literal> and
+ <literal>string index names</literal> are normalized
+ according to the following rules: all <emphasis>single</emphasis>
+ hyphens <literal>'-'</literal> are stripped, and all upper case
+ letters are folded to lower case.
+ </para>
+
+ <para>
+ <emphasis>Numeric use attributes</emphasis> are mapped
+ to the &zebra; internal
+ string index according to the attribute set definition in use.
+ The default attribute set is <literal>&bib1;</literal>, and may be
+ omitted in the &pqf; query.
+ </para>
+
+ <para>
+ According to normalization and numeric
+ use attribute mapping, it follows that the following
+ &pqf; queries are considered equivalent (assuming the default
+ configuration has not been altered):
+ <screen>
+ Z> find @attr 1=Body-of-text serenade
+ Z> find @attr 1=bodyoftext serenade
+ Z> find @attr 1=BodyOfText serenade
+ Z> find @attr 1=bO-d-Y-of-tE-x-t serenade
+ Z> find @attr 1=1010 serenade
+ Z> find @attrset &bib1; @attr 1=1010 serenade
+ Z> find @attrset bib1 @attr 1=1010 serenade
+ Z> find @attrset Bib1 @attr 1=1010 serenade
+ Z> find @attrset b-I-b-1 @attr 1=1010 serenade
+ </screen>
+ </para>
- <sect3 id="querymodel-bib1-truncation">
- <title>Truncation Attributes (type = 5)</title>
- </sect3>
+ <para>
+ The <emphasis>numerical</emphasis>
+ <literal>use attributes (type 1)</literal>
+ are interpreted according to the
+ attribute sets which have been loaded in the
+ <literal>zebra.cfg</literal> file, and are matched against specific
+ fields as specified in the <literal>.abs</literal> file which
+ describes the profile of the records which have been loaded.
+ If no use attribute is provided, a default of
+ &bib1; Use Any (1016) is assumed.
+ The predefined use attribute sets
+ can be reconfigured by tweaking the configuration files
+ <filename>tab/*.att</filename>, and
+ new attribute sets can be defined by adding similar files in the
+ configuration path <literal>profilePath</literal> of the server.
+ </para>
- <sect3 id="querymodel-bib1-completeness">
- <title>Completeness Attributes (type = 6)</title>
- </sect3>
+ <para>
+ String indexes can be accessed directly,
+ independently of which attribute set is in use; the attribute set
+ is simply ignored. The above mentioned name normalization applies.
+ String index names are defined in the
+ used indexing filter configuration files, for example in the
+ <literal>&grs1;</literal>
+ <filename>*.abs</filename> configuration files, or in the
+ <literal>alvis</literal> filter &xslt; indexing stylesheets.
+ </para>
- <sect3 id="querymodel-bib1-sorting">
- <title>Zebra Extention Sorting Attributes (type = 7)</title>
- </sect3>
+ <para>
+ &zebra; internal indexes can be accessed directly,
+ according to the same rules as the user defined
+ string indexes. The only difference is that
+ &zebra; internal index names are hardwired,
+ all uppercase and
+ must start with the character <literal>'_'</literal>.
+ </para>
- <sect3 id="querymodel-bib1-estimation">
- <title>Zebra Extention Search Estimation Attributes (type = 8)</title>
- </sect3>
+ <para>
+ Finally, <literal>&xpath;</literal> access points are only
+ available using the <literal>&grs1;</literal> filter for indexing.
+ These access point names must start with the character
+ <literal>'/'</literal>, they are <emphasis>not
+ normalized</emphasis>, but passed unaltered to the &zebra; internal
+ &xpath; engine. See <xref linkend="querymodel-use-xpath"/>.
- <sect3 id="querymodel-bib1-weight">
- <title>Zebra Extention Weight Attributes (type = 9)</title>
- </sect3>
-
- </sect2>
+ </para>
- <sect2 id="querymodel-bib1-mapping">
- <title>Mapping from Bib1 Attributes to Zebra internal
- register indexes</title>
- <para>
+
+ </section>
+
+
+ <section id="querymodel-pqf-apt-mapping-structuretype">
+ <title>Mapping of &pqf; &apt; structure and completeness to
+ register type</title>
+ <para>
+ In its default configuration, &zebra; has several
+ different types of registers or indexes, whose tokenization and
+ character normalization rules differ. This reflects the fact that
+ searching fundamentally different tokens like dates, numbers,
+ bitfields and string based text needs different rule sets.
</para>
- <para>
- <emphasis>Use</emphasis> attributes are interpreted according to the
- attribute sets which have been loaded in the
- <literal>zebra.cfg</literal> file, and are matched against specific
- fields as specified in the <literal>.abs</literal> file which
- describes the profile of the records which have been loaded.
- If no Use attribute is provided, a default of Bib-1 Any is assumed.
- </para>
+ <table id="querymodel-zebra-mapping-structure-types" frame="top">
+ <title>Structure and completeness mapping to register types</title>
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Structure</entry>
+ <entry>Completeness</entry>
+ <entry>Register type</entry>
+ <entry>Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>
+ phrase (@attr 4=1), word (@attr 4=2),
+ word-list (@attr 4=6),
+ free-form-text (@attr 4=105), or document-text (@attr 4=106)
+ </entry>
+ <entry>Incomplete field (@attr 6=1)</entry>
+ <entry>Word ('w')</entry>
+ <entry>Traditional tokenized and character normalized word index</entry>
+ </row>
+ <row>
+ <entry>
+ phrase (@attr 4=1), word (@attr 4=2),
+ word-list (@attr 4=6),
+ free-form-text (@attr 4=105), or document-text (@attr 4=106)
+ </entry>
+ <entry>Complete field (@attr 6=3)</entry>
+ <entry>Phrase ('p')</entry>
+ <entry>Character normalized, but not tokenized index for phrase
+ matches
+ </entry>
+ </row>
+ <row>
+ <entry>urx (@attr 4=104)</entry>
+ <entry>ignored</entry>
+ <entry>URX/URL ('u')</entry>
+ <entry>Special index for URL web addresses</entry>
+ </row>
+ <row>
+ <entry>numeric (@attr 4=109)</entry>
+ <entry>ignored</entry>
+ <entry>Numeric ('n')</entry>
+ <entry>Special index for numbers</entry>
+ </row>
+ <row>
+ <entry>key (@attr 4=3)</entry>
+ <entry>ignored</entry>
+ <entry>Null bitmap ('0')</entry>
+ <entry>Used for non-tokenized and non-normalized bit sequences</entry>
+ </row>
+ <row>
+ <entry>year (@attr 4=4)</entry>
+ <entry>ignored</entry>
+ <entry>Year ('y')</entry>
+ <entry>Non-tokenized and non-normalized 4 digit numbers</entry>
+ </row>
+ <row>
+ <entry>date (@attr 4=5)</entry>
+ <entry>ignored</entry>
+ <entry>Date ('d')</entry>
+ <entry>Non-tokenized and non-normalized ISO date strings</entry>
+ </row>
+ <row>
+ <entry>ignored</entry>
+ <entry>ignored</entry>
+ <entry>Sort ('s')</entry>
+ <entry>Used with special sort attribute set (@attr 7=1, @attr 7=2)</entry>
+ </row>
+ <row>
+ <entry>overruled</entry>
+ <entry>overruled</entry>
+ <entry>special</entry>
+ <entry>Internal record ID register, used whenever
+ Relation Always Matches (@attr 2=103) is specified</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <!-- see in util/zebramap.c -->
+
+ <para>
+ If a <emphasis>Structure</emphasis> attribute of
+ <emphasis>Phrase</emphasis> is used in conjunction with a
+ <emphasis>Completeness</emphasis> attribute of
+ <emphasis>Complete (Sub)field</emphasis>, the term is matched
+ against the contents of the phrase (long word) register, if one
+ exists for the given <emphasis>Use</emphasis> attribute.
+ A phrase register is created for those fields in the
+ &grs1; <filename>*.abs</filename> file that contain a
+ <literal>p</literal>-specifier.
+ <screen>
+ Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
+ ...
+ bayreuther festspiele (1)
+ * beethoven bibliography database (1)
+ benny carter (1)
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography"
+ ...
+ Number of hits: 0, setno 5
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database"
+ ...
+ Number of hits: 1, setno 6
+ </screen>
+ </para>
- <para>
- If a <emphasis>Structure</emphasis> attribute of
- <emphasis>Phrase</emphasis> is used in conjunction with a
- <emphasis>Completeness</emphasis> attribute of
- <emphasis>Complete (Sub)field</emphasis>, the term is matched
- against the contents of the phrase (long word) register, if one
- exists for the given <emphasis>Use</emphasis> attribute.
- A phrase register is created for those fields in the
- <literal>.abs</literal> file that contains a
- <literal>p</literal>-specifier.
- <!-- ### whatever the hell _that_ is -->
- </para>
+ <para>
+ If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
+ used in conjunction with <emphasis>Incomplete Field</emphasis> - the
+ default value for <emphasis>Completeness</emphasis>, the
+ search is directed against the normal word registers, but if the term
+ contains multiple words, the term will only match if all of the words
+ are found immediately adjacent, and in the given order.
+ The word search is performed on those fields that are indexed as
+ type <literal>w</literal> in the &grs1; <filename>*.abs</filename> file.
+ <screen>
+ Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
+ ...
+ beefheart (1)
+ * beethoven (18)
+ beethovens (7)
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=1 beethoven
+ ...
+ Number of hits: 18, setno 1
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven bibliography"
+ ...
+ Number of hits: 2, setno 2
+ ...
+ </screen>
+ </para>
- <para>
- If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
- used in conjunction with <emphasis>Incomplete Field</emphasis> - the
- default value for <emphasis>Completeness</emphasis>, the
- search is directed against the normal word registers, but if the term
- contains multiple words, the term will only match if all of the words
- are found immediately adjacent, and in the given order.
- The word search is performed on those fields that are indexed as
- type <literal>w</literal> in the <literal>.abs</literal> file.
- </para>
+ <para>
+ If the <emphasis>Structure</emphasis> attribute is
+ <emphasis>Word List</emphasis>,
+ <emphasis>Free-form Text</emphasis>, or
+ <emphasis>Document Text</emphasis>, the term is treated as a
+ natural-language, relevance-ranked query.
+ This search type uses the word register, i.e. those fields
+ that are indexed as type <literal>w</literal> in the
+ &grs1; <filename>*.abs</filename> file.
+ </para>
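+ <para>
+ As a hedged illustration (the index name and the search terms are
+ assumptions, not taken from a particular database), a word-list
+ search and a free-form-text search against the title word register
+ might look like this. Judging from
+ <filename>util/zebramap.c</filename>, word-list terms appear to be
+ combined as an and-list, and free-form-text terms as an or-list:
+ <screen>
+ Z> find @attr 1=Title @attr 4=6 "beethoven bibliography"
+ Z> find @attr 1=Title @attr 4=105 "beethoven bibliography"
+ </screen>
+ </para>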
- <para>
- If the <emphasis>Structure</emphasis> attribute is
- <emphasis>Word List</emphasis>,
- <emphasis>Free-form Text</emphasis>, or
- <emphasis>Document Text</emphasis>, the term is treated as a
- natural-language, relevance-ranked query.
- This search type uses the word register, i.e. those fields
- that are indexed as type <literal>w</literal> in the
- <literal>.abs</literal> file.
- </para>
+ <para>
+ If the <emphasis>Structure</emphasis> attribute is
+ <emphasis>Numeric String</emphasis> the term is treated as an integer.
+ The search is performed on those fields that are indexed
+ as type <literal>n</literal> in the &grs1;
+ <filename>*.abs</filename> file.
+ </para>
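+ <para>
+ For example, assuming the GILS schema
+ (<filename>gils.abs</filename>) is in use and the
+ west-bounding-coordinate is indexed as type <literal>n</literal>,
+ all records with a west-bounding-coordinate greater than -114
+ could be matched with a query along these lines:
+ <screen>
+ Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
+ </screen>
+ </para>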
- <para>
- If the <emphasis>Structure</emphasis> attribute is
- <emphasis>Numeric String</emphasis> the term is treated as an integer.
- The search is performed on those fields that are indexed
- as type <literal>n</literal> in the <literal>.abs</literal> file.
- </para>
+ <para>
+ If the <emphasis>Structure</emphasis> attribute is
+ <emphasis>URX</emphasis> the term is treated as a URX (URL) entity.
+ The search is performed on those fields that are indexed as type
+ <literal>u</literal> in the <filename>*.abs</filename> file.
+ </para>
- <para>
- If the <emphasis>Structure</emphasis> attribute is
- <emphasis>URx</emphasis> the term is treated as a URX (URL) entity.
- The search is performed on those fields that are indexed as type
- <literal>u</literal> in the <literal>.abs</literal> file.
- </para>
+ <para>
+ If the <emphasis>Structure</emphasis> attribute is
+ <emphasis>Local Number</emphasis> the term is treated as
+ a native &zebra; Record Identifier.
+ </para>
- <para>
- If the <emphasis>Structure</emphasis> attribute is
- <emphasis>Local Number</emphasis> the term is treated as
- native Zebra Record Identifier.
- </para>
+ <para>
+ If the <emphasis>Relation</emphasis> attribute is
+ <emphasis>Equals</emphasis> (default), the term is matched
+ in a normal fashion (modulo truncation and processing of
+ individual words, if required).
+ If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
+ <emphasis>Less Than or Equal</emphasis>,
+ <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
+ Equal</emphasis>, the term is assumed to be numerical, and a
+ standard regular expression is constructed to match the given
+ expression.
+ If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
+ the standard natural-language query processor is invoked.
+ </para>
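+ <para>
+ A short sketch of two of these cases, assuming a title index
+ reachable through &bib1; use attribute 4: the first query relies on
+ the default relation, while the second requests relevance ranking
+ with <literal>@attr 2=102</literal>:
+ <screen>
+ Z> find @attr 1=4 "information retrieval"
+ Z> find @attr 1=4 @attr 2=102 "information retrieval"
+ </screen>
+ </para>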
- <para>
- If the <emphasis>Relation</emphasis> attribute is
- <emphasis>Equals</emphasis> (default), the term is matched
- in a normal fashion (modulo truncation and processing of
- individual words, if required).
- If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
- <emphasis>Less Than or Equal</emphasis>,
- <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
- Equal</emphasis>, the term is assumed to be numerical, and a
- standard regular expression is constructed to match the given
- expression.
- If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
- the standard natural-language query processor is invoked.
- </para>
+ <para>
+ For the <emphasis>Truncation</emphasis> attribute,
+ <emphasis>No Truncation</emphasis> is the default.
+ <emphasis>Left Truncation</emphasis> is not supported.
+ <emphasis>Process # in search term</emphasis> is supported, as is
+ <emphasis>Regxp-1</emphasis>.
+ <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
+ search. As a default, a single error (deletion, insertion,
+ replacement) is accepted when terms are matched against the register
+ contents.
+ </para>
- <para>
- For the <emphasis>Truncation</emphasis> attribute,
- <emphasis>No Truncation</emphasis> is the default.
- <emphasis>Left Truncation</emphasis> is not supported.
- <emphasis>Process # in search term</emphasis> is supported, as is
- <emphasis>Regxp-1</emphasis>.
- <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
- search. As a default, a single error (deletion, insertion,
- replacement) is accepted when terms are matched against the register
- contents.
- </para>
- </sect2>
+ </section>
+ </section>
- <sect2 id="querymodel-regular">
- <title>Regular expressions</title>
+ <section id="querymodel-regular">
+ <title>&zebra; Regular Expressions in Truncation Attribute (type = 5)</title>
<para>
Each term in a query is interpreted as a regular expression if
- the truncation value is either <emphasis>Regxp-1</emphasis> (102)
- or <emphasis>Regxp-2</emphasis> (103).
+ the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
+ or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
Both query types follow the same syntax with the operands:
- <variablelist>
-
- <varlistentry>
- <term>x</term>
- <listitem>
- <para>
- Matches the character <emphasis>x</emphasis>.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>.</term>
- <listitem>
- <para>
- Matches any character.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><literal>[</literal>..<literal>]</literal></term>
- <listitem>
- <para>
- Matches the set of characters specified;
- such as <literal>[abc]</literal> or <literal>[a-c]</literal>.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
- and the operators:
- <variablelist>
-
- <varlistentry>
- <term>x*</term>
- <listitem>
- <para>
- Matches <emphasis>x</emphasis> zero or more times. Priority: high.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>x+</term>
- <listitem>
- <para>
- Matches <emphasis>x</emphasis> one or more times. Priority: high.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>x?</term>
- <listitem>
- <para>
- Matches <emphasis>x</emphasis> zero or once. Priority: high.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>xy</term>
- <listitem>
- <para>
- Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
- Priority: medium.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>x|y</term>
- <listitem>
- <para>
- Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
- Priority: low.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
- The order of evaluation may be changed by using parentheses.
- </para>
-
- <para>
- If the first character of the <emphasis>Regxp-2</emphasis> query
+ </para>
+
+ <table id="querymodel-regular-operands-table" frame="top">
+ <title>Regular Expression Operands</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry><literal>x</literal></entry>
+ <entry>Matches the character <literal>x</literal>.</entry>
+ </row>
+ <row>
+ <entry><literal>.</literal></entry>
+ <entry>Matches any character.</entry>
+ </row>
+ <row>
+ <entry><literal>[ .. ]</literal></entry>
+ <entry>Matches the set of characters specified;
+ such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ The above operands can be combined with the following operators:
+ </para>
+
+ <table id="querymodel-regular-operators-table" frame="top">
+ <title>Regular Expression Operators</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry><literal>x*</literal></entry>
+ <entry>Matches <literal>x</literal> zero or more times.
+ Priority: high.</entry>
+ </row>
+ <row>
+ <entry><literal>x+</literal></entry>
+ <entry>Matches <literal>x</literal> one or more times.
+ Priority: high.</entry>
+ </row>
+ <row>
+ <entry><literal>x?</literal></entry>
+ <entry>Matches <literal>x</literal> zero or one time.
+ Priority: high.</entry>
+ </row>
+ <row>
+ <entry><literal>xy</literal></entry>
+ <entry> Matches <literal>x</literal>, then <literal>y</literal>.
+ Priority: medium.</entry>
+ </row>
+ <row>
+ <entry><literal>x|y</literal></entry>
+ <entry> Matches either <literal>x</literal> or <literal>y</literal>.
+ Priority: low.</entry>
+ </row>
+ <row>
+ <entry><literal>( )</literal></entry>
+ <entry>The order of evaluation may be changed by using parentheses.</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ If the first character of the <literal>Regxp-2</literal> query
is a plus character (<literal>+</literal>) it marks the
beginning of a section with non-standard specifiers.
The next plus character marks the end of the section.
- Currently Zebra only supports one specifier, the error tolerance,
+ Currently &zebra; only supports one specifier, the error tolerance,
 which consists of one digit.
+ <!-- TODO Nice thing, but what does
+ that error tolerance digit *mean*? Maybe an example would be nice? -->
</para>
<para>
expressions.
</para>
- </sect2>
-
- <sect2 id="querymodel-examples">
- <title>Query examples</title>
-
- <para>
- Phrase search for <emphasis>information retrieval</emphasis> in
- the title-register:
- <screen>
- @attr 1=4 "information retrieval"
- </screen>
- </para>
-
- <para>
- Ranked search for the same thing:
- <screen>
- @attr 1=4 @attr 2=102 "Information retrieval"
- </screen>
- </para>
-
<para>
- Phrase search with a regular expression:
+ For example, a phrase search with regular expressions in
+ the title-register is performed like this:
<screen>
- @attr 1=4 @attr 5=102 "informat.* retrieval"
+ Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
</screen>
</para>
<para>
- Ranked search with a regular expression:
+ Combinations with other attributes are possible. For example, a
+ ranked search with a regular expression:
<screen>
- @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
+ Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
</screen>
</para>
+ </section>
- <para>
- In the GILS schema (<literal>gils.abs</literal>), the
- west-bounding-coordinate is indexed as type <literal>n</literal>,
- and is therefore searched by specifying
- <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
- To match all those records with west-bounding-coordinate greater
- than -114 we use the following query:
- <screen>
- @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
- </screen>
- </para>
- </sect2>
-
-
- <!-- see in util/zebramap.c
- int zebra_maps_attr
-
- if (completeness_value == 2 || completeness_value == 3)
- *complete_flag = 1;
- else
- *complete_flag = 0;
- *reg_id = 0;
-
- *sort_flag =(sort_relation_value > 0) ? 1 : 0;
- *search_type = "phrase";
- strcpy(rank_type, "void");
- if (relation_value == 102)
- {
- if (weight_value == -1)
- weight_value = 34;
- sprintf(rank_type, "rank,w=%d,u=%d", weight_value, use_value);
- }
- if (relation_value == 103)
- {
- *search_type = "always";
- *reg_id = 'w';
- return 0;
- }
- if (*complete_flag)
- *reg_id = 'p';
- else
- *reg_id = 'w';
- switch (structure_value)
- {
- case 6: /* word list */
- *search_type = "and-list";
- break;
- case 105: /* free-form-text */
- *search_type = "or-list";
- break;
- case 106: /* document-text */
- *search_type = "or-list";
- break;
- case -1:
- case 1: /* phrase */
- case 2: /* word */
- case 108: /* string */
- *search_type = "phrase";
- break;
- case 107: /* local-number */
- *search_type = "local";
- *reg_id = 0;
- break;
- case 109: /* numeric string */
- *reg_id = 'n';
- *search_type = "numeric";
- break;
- case 104: /* urx */
- *reg_id = 'u';
- *search_type = "phrase";
- break;
- case 3: /* key */
- *reg_id = '0';
- *search_type = "phrase";
- break;
- case 4: /* year */
- *reg_id = 'y';
- *search_type = "phrase";
- break;
- case 5: /* date */
- *reg_id = 'd';
- *search_type = "phrase";
- break;
- default:
- return -1;
- }
- return 0;
-
- -->
<!--
<para>
The RecordType parameter in the <literal>zebra.cfg</literal> file, or
- the <literal>-t</literal> option to the indexer tells Zebra how to
+ the <literal>-t</literal> option to the indexer tells &zebra; how to
process input records.
Two basic types of processing are available - raw text and structured
data. Raw text is just that, and it is selected by providing the
- argument <emphasis>text</emphasis> to Zebra. Structured records are
+ argument <literal>text</literal> to &zebra;. Structured records are
all handled internally using the basic mechanisms described in the
subsequent sections.
- Zebra can read structured records in many different formats.
+ &zebra; can read structured records in many different formats.
</para>
-->
- </sect1>
+ </section>
- <sect1 id="querymodel-cql-to-pqf">
- <title>Server Side CQL to PQF Query Translation</title>
+ <section id="querymodel-cql-to-pqf">
+ <title>Server Side &cql; to &pqf; Query Translation</title>
<para>
Using the
 <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
- YAZ Frontend Virtual
+ &yaz; Frontend Virtual
Hosts option, one can configure
- the YAZ Frontend CQL-to-PQF
+ the &yaz; Frontend &cql;-to-&pqf;
converter, specifying the interpretation of various
- <ulink url="&url.cql;">CQL</ulink>
+ <ulink url="&url.cql;">&cql;</ulink>
indexes, relations, etc. in terms of Type-1 query attributes.
<!-- The yaz-client config file -->
</para>
<para>
- For example, using server-side CQL-to-PQF conversion, one might
+ For example, using server-side &cql;-to-&pqf; conversion, one might
query a zebra server like this:
<screen>
<![CDATA[
]]>
</screen>
and - if properly configured - even static relevance ranking can
- be performed using CQL query syntax:
+ be performed using &cql; query syntax:
<screen>
<![CDATA[
Z> find text = /relevant (plant and soil)
<para>
By the way, the same configuration can be used to
- search using client-side CQL-to-PQF conversion:
+ search using client-side &cql;-to-&pqf; conversion:
(the only difference is <literal>querytype cql2rpn</literal>
instead of
<literal>querytype cql</literal>, and the call specifying a local
<para>
Exhaustive information can be found in the
- Section "Specification of CQL to RPN mappings" in the YAZ manual.
- <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
- http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
- and shall therefore not be repeated here.
+ Section "Specification of &cql; to &rpn; mappings" in the &yaz; manual,
+ <ulink url="&url.yaz.cql2pqf;"/>,
+ and shall therefore not be repeated here.
</para>
<!--
<para>
See
- <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
- http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
- for the Maintenance Agency's work-in-progress mapping of Dublin Core
+ <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html"/>
+ for the Maintenance Agency's work-in-progress mapping of Dublin Core
indexes to Attribute Architecture (util, XD and BIB-2)
- attributes.
- </para>
- -->
- </sect1>
-
-
-
-<!--
- <sect1 id="architecture-querylanguage">
- <title>Query Languages</title>
-
- <para>
-
-http://www.loc.gov/z3950/agency/document.html
-
- PQF and BIB-1 stuff to be explained
- <ulink url="&url.z39.50.attset.bib1;">
- http://www.loc.gov/z3950/agency/defns/bib1.html</ulink>
-
- <ulink url="&url.z39.50.attset.bib1.1995;">
- http://www.loc.gov/z3950/agency/bib1.html</ulink>
-
- http://www.loc.gov/z3950/agency/markup/13.html
-
+ attributes.
</para>
- </sect1>
-
-
-These attribute types are recognized regardless of attribute set. Some are recognized for search, others for scan.
-
-Search
-
-Type Name Version
-7 Embedded Sort 1.1
-8 Term Set 1.1
-9 Rank weight 1.1
-9 Approx Limit 1.4
-10 Term Ref 1.4
-
-Embedded Sort
-
-The embedded sort is a way to specify sort within a query - thus removing the need to send a Sort Request separately. It is both faster and does not require clients that deal with the Sort Facility.
-
-The value after attribute type 7 is 1=ascending, 2=descending.. The attributes+term (APT) node is separate from the rest and must be @or'ed. The term associated with APT is the level .. 0=primary sort, 1=secondary sort etc.. Example:
-
-Search for water, sort by title (ascending):
-
- @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
-
-Search for water, sort by title ascending, then date descending:
-
- @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
-
-Term Set
-
-The Term Set feature is a facility that allows a search to store hitting terms in a "pseudo" resultset; thus a search (as usual) + a scan-like facility. Requires a client that can do named result sets since the search generates two result sets. The value for attribute 8 is the name of a result set (string). The terms in term set are returned as SUTRS records.
-
-Seach for u in title, right truncated.. Store result in result set named uset.
-
- @attr 5=1 @attr 1=4 @attr 8=uset u
-
-The model as one serious flaw.. We don't know the size of term set.
-
-Rank weight
-
-Rank weight is a way to pass a value to a ranking algorithm - so that one APT has one value - while another as a different one.
-
-Search for utah in title with weight 30 as well as any with weight 20.
-
- @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
-
-Approx Limit
-
-Newer Zebra versions normally estemiates hit count for every APT (leaf) in the query tree. These hit counts are returned as part of the searchResult-1 facility.
-
-By setting a limit for the APT we can make Zebra turn into approximate hit count when a certain hit count limit is reached. A value of zero means exact hit count.
-
-We are intersted in exact hit count for a, but for b we allow estimates for 1000 and higher..
-
- @and a @attr 9=1000 b
-
-This facility clashes with rank weight! Fortunately this is a Zebra 1.4 thing so we can change this without upsetting anybody!
-
-Term Ref
-
-Zebra supports the searchResult-1 facility.
-
-If attribute 10 is given, that specifies a subqueryId value returned as part of the search result. It is a way for a client to name an APT part of a query.
-
-Scan
-
-Type Name Version
-8 Result set narrow 1.3
-9 Approx Limit 1.4
-
-Result set narrow
-
-If attribute 8 is given for scan, the value is the name of a result set. Each hit count in scan is @and'ed with the result set given.
-
-Approx limit
-
-The approx (as for search) is a way to enable approx hit counts for scan hit counts. However, it does NOT appear to work at the moment.
-
-
- AdamDickmeiss - 19 Dec 2005
-
-
--->
+ -->
+ </section>
</chapter>