<chapter id="querymodel">
- <!-- $Id: querymodel.xml,v 1.5 2006-06-14 13:57:45 marc Exp $ -->
+ <!-- $Id: querymodel.xml,v 1.10 2006-06-21 13:32:33 marc Exp $ -->
<title>Query Model</title>
<sect1 id="querymodel-overview">
to the international standards
<ulink url="&url.z39.50;">Z39.50</ulink> and
<ulink url="&url.sru;">SRU</ulink>,
- and implement the query model defined there.
- Unfortunately, the Z39.50 query model has only defined a binary
+ and implement the
+ <literal>type-1 Reverse Polish Notation (RPN)</literal> query
+ model defined there.
+ Unfortunately, this model defines only a binary
encoded representation, which is used as transport packaging in
the Z39.50 protocol layer. This representation is not human-readable,
nor does it define any convenient way to specify queries.
</para>
- <!-- tell about RPN - include link to YAZ
- url.yaz.pqf -->
+ <para>
+ Since the <literal>type-1 (RPN)</literal>
+ query structure has no direct, useful string
+ representation, every origin application needs to provide some
+ form of mapping from a local query notation or representation to it.
+ </para>
+
<sect3 id="querymodel-query-languages-pqf">
<title>Prefix Query Format (PQF)</title>
<para>
Index Data has defined a textual representation in the
<literal>Prefix Query Format</literal>, short
- <literal>PQF</literal>, which then has been adopted by other
- parties developing Z39.50 software. It is also often referred to as
+ <literal>PQF</literal>, which maps
+ <literal>one-to-one</literal> to binary encoded
+ <literal>type-1 RPN</literal> query packages.
+ It has been adopted by other
+ parties developing Z39.50 software, and is often referred to as
<literal>Prefix Query Notation</literal>, or in short
- <literal>PQN</literal>, and is thoroughly explained in
- <xref linkend="querymodel-pqf"/>.
+ <literal>PQN</literal>. See
+ <xref linkend="querymodel-pqf"/> for further explanations and
+ descriptions of Zebra's capabilities.
</para>
</sect3>
-
- <!-- PQF/RPN is natively supported. CQL is NOT . So we need a map -->
<sect3 id="querymodel-query-languages-cql">
<title>Common Query Language (CQL)</title>
- <para>
- In addition, Zebra can be configured to understand and map the
- <literal>Common Query Language</literal>
- (<ulink url="&url.cql;">CQL</ulink>)
- to PQF. See an introduction on the mapping to the internal query
- representation in
+ <para>
+ The <literal>type-1 RPN</literal> query model,
+ expressed in <literal>PQF/PQN</literal>, is natively supported.
+ On the other hand, the <literal>Common Query Language</literal>
+ (<ulink url="&url.cql;">CQL</ulink>), the default query language of
+ <literal>SRU</literal> web services, is not natively supported.
+ </para>
+ <para>
+ Zebra can be configured to understand and map CQL to PQF. See
<xref linkend="querymodel-cql-to-pqf"/>.
</para>
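+ <para>
+ As a rough illustration only, assume a mapping configuration that
+ ties the CQL index <literal>dc.title</literal> to the
+ <literal>Bib1</literal> use attribute <literal>1=4</literal>.
+ This mapping is an assumption made for the example, not a
+ documented default. A CQL query and its PQF counterpart might
+ then look like this:
+ <screen>
+ CQL: dc.title = mozart
+ PQF: @attr 1=4 mozart
+ </screen>
+ </para>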
</sect3>
</sect2>
- <sect2 id="querymodel-query-types">
- <title>Query types</title>
+ <sect2 id="querymodel-operation-types">
+ <title>Operation types</title>
<para>
+ Zebra supports all three
+ <literal>Z39.50/SRU</literal> operations defined in the
+ standards: <literal>explain</literal>, <literal>search</literal>,
+ and <literal>scan</literal>. The functionality and purpose of
+ each is described briefly below.
</para>
- <sect3 id="querymodel-query-type-explain">
- <title>Explain Queries</title>
+ <sect3 id="querymodel-operation-type-explain">
+ <title>Explain Operation</title>
<para>
+ The <emphasis>syntax</emphasis> of Z39.50/SRU queries is
+ well known to any client, but the specific
+ <emphasis>semantics</emphasis>, which depend on a
+ particular server's functionality and abilities, must be
+ discovered case by case. This is the purpose of the
+ <literal>explain</literal> operation, which provides the means
+ for learning which
+ <emphasis>fields</emphasis> (also called
+ <emphasis>indexes</emphasis> or <emphasis>access points</emphasis>)
+ are provided, which default parameters the server uses, which
+ document retrieval formats are defined, and which specific parts
+ of the general query model are supported.
+ </para>
+ <para>
+ Z39.50 embeds the <literal>explain</literal> operation
+ by performing a
+ <literal>search</literal> in the magic
+ <literal>IR-Explain-1</literal> database;
+ see <xref linkend="querymodel-exp1"/>.
+ </para>
+ <para>
+ In SRU, <literal>explain</literal> is an entirely separate
+ operation, which returns a <literal>ZeeRex
+ XML</literal> record according to the
+ structure defined by the protocol.
+ </para>
+ <para>
+ In both cases, the information gathered through
+ <literal>explain</literal> operations can be used to
+ auto-configure a client user interface to the server's
+ capabilities.
</para>
</sect3>
- <sect3 id="querymodel-query-type-search">
- <title>Search Queries</title>
+ <sect3 id="querymodel-operation-type-search">
+ <title>Search Operation</title>
<para>
+ Search and retrieve interactions are the raison d'être.
+ They are used to query the remote database and
+ return search result documents. Search queries range from
+ simple free text searches to nested complex boolean queries,
+ targeting specific indexes, and possibly enhanced with many
+ query semantic specifications. Search interactions are the heart
+ and soul of Z39.50/SRU servers.
</para>
</sect3>
- <sect3 id="querymodel-query-type-scan">
- <title>Scan Queries</title>
+ <sect3 id="querymodel-operation-type-scan">
+ <title>Scan Operation</title>
+ <para>
+ The <literal>scan</literal> operation is a helper function,
+ which operates on one index or access point at a time.
+ </para>
<para>
+ It provides
+ the means to investigate the content of specific indexes.
+ Scanning an index returns a handful of terms actually found in
+ the index, and in addition the <literal>scan</literal>
+ operation returns the number of documents indexed by each term.
+ A search client can use this information to propose proper
+ spelling of search terms, to auto-fill search boxes, or to
+ display controlled vocabularies.
</para>
</sect3>
may start with one specification of the
<emphasis>attribute set</emphasis> used. Following is a query
tree, which
- consists of <emphasis>atomic query parts</emphasis>, eventually
+ consists of <emphasis>atomic query parts (APT)</emphasis> or
+ <emphasis>named result sets</emphasis>, optionally
paired by <emphasis>boolean binary operators</emphasis>, and
finally <emphasis>recursively combined </emphasis> into
complex query trees.
issued. Zebra comes with some predefined attribute set
definitions, others can easily be defined and added to the
configuration.
- <note>
- The Zebra internal query procesing is modeled after
- the <literal>Bib1</literal> attribute set, and the non-use
- attributes type 2-6 are hard-wired in. It is therefore essential
- to be familiar with <xref linkend="querymodel-bib1"/>.
- </note>
</para>
+
- <table id="querymodel-attribute-sets-table">
+ <table id="querymodel-attribute-sets-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+
<caption>Attribute sets predefined in Zebra</caption>
- <!--
+
<thead>
- <tr><td>one</td><td>two</td></tr>
+ <tr>
+ <td>Attribute set</td>
+ <td>Short hand</td>
+ <td>Status</td>
+ <td>Notes</td>
+ </tr>
</thead>
- -->
+
<tbody>
<tr>
- <td><emphasis>exp-1</emphasis></td>
- <td><literal>Explain</literal> attribute set</td>
+ <td><literal>Explain</literal></td>
+ <td><literal>exp-1</literal></td>
<td>Special attribute set used on the special automagic
<literal>IR-Explain-1</literal> database to gain information on
server capabilities, database names, and database
and semantics.</td>
+ <td>predefined</td>
</tr>
<tr>
- <td><emphasis>bib-1</emphasis></td>
- <td><literal>Bib1</literal> attribute set</td>
+ <td><literal>Bib1</literal></td>
+ <td><literal>bib-1</literal></td>
<td>Standard PQF query language attribute set which defines the
semantics of Z39.50 searching. In addition, all of the
- non-use attributes (type 2-9) define the Zebra internal query
- processing</td>
+ non-use attributes (type 2-9) define the hard-wired
+ Zebra internal query
+ processing.</td>
+ <td>default</td>
</tr>
<tr>
- <td><emphasis>gils</emphasis></td>
- <td><literal>GILS</literal> attribute set</td>
+ <td><literal>GILS</literal></td>
+ <td><literal>gils</literal></td>
<td>Extension to the <literal>Bib1</literal> attribute set.</td>
+ <td>predefined</td>
+ </tr>
+ <tr>
+ <td><literal>IDXPATH</literal></td>
+ <td><literal>idxpath</literal></td>
+ <td>Hardwired XPATH like attribute set, only available for
+ indexing with the GRS record model</td>
+ <td>deprecated</td>
</tr>
</tbody>
</table>
</sect3>
+
+ <para>
+ The use attributes (type 1) of the predefined attribute sets can
+ be reconfigured by tweaking the files
+ <filename>tab/*.att</filename>.
+ New attribute sets can be defined by adding similar files in the
+ configuration path of the server.
+ </para>
+
+ <note>
+ The Zebra internal query processing is modeled after
+ the <literal>Bib1</literal> attribute set, and the non-use
+ attributes type 2-6 are hard-wired in. It is therefore essential
+ to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
+ </note>
+
<sect3 id="querymodel-boolean-operators">
<title>Boolean operators</title>
using the standard boolean operators into new query trees.
</para>
- <table id="querymodel-boolean-operators-table">
+ <table id="querymodel-boolean-operators-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+
<caption>Boolean operators</caption>
<!--
<thead>
</thead>
-->
<tbody>
- <tr><td><emphasis>@and</emphasis></td>
+ <tr><td><literal>@and</literal></td>
<td>binary <literal>AND</literal> operator</td>
<td>Set intersection of two atomic queries' hit sets</td>
</tr>
- <tr><td><emphasis>@or</emphasis></td>
+ <tr><td><literal>@or</literal></td>
<td>binary <literal>OR</literal> operator</td>
<td>Set union of two atomic queries' hit sets</td>
</tr>
- <tr><td><emphasis>@not</emphasis></td>
+ <tr><td><literal>@not</literal></td>
<td>binary <literal>AND NOT</literal> operator</td>
<td>Set difference of two atomic queries' hit sets</td>
</tr>
- <tr><td><emphasis>@prox</emphasis></td>
+ <tr><td><literal>@prox</literal></td>
<td>binary <literal>PROXIMITY</literal> operator</td>
<td>Set intersection of two atomic queries hit sets. In
addition, the intersection set is purged for all
<sect3 id="querymodel-atomic-queries">
- <title>Atomic queries</title>
+ <title>Atomic queries (APT)</title>
<para>
Atomic queries are the query parts which work on one access point
only. These consist of <literal>an attribute list</literal>
followed by a <literal>single term</literal> or a
- <literal>quoted term list</literal>.
+ <literal>quoted term list</literal>, and are often called
+ <emphasis>Attributes-Plus-Terms (APT)</emphasis> queries.
</para>
<para>
Unsupplied non-use attributes type 2-9 are either inherited from
See <xref linkend="querymodel-bib1"/> for details.
</para>
- <table id="querymodel-atomic-queries-table">
+ <table id="querymodel-atomic-queries-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+
<caption>Atomic queries</caption>
<!--
<thead>
</screen>
</para>
<para>
- Equivalent query fully specified:
+ Equivalent query fully specified including all default values:
<screen>
Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
</screen>
</sect3>
+
+ <sect3 id="querymodel-resultset">
+ <title>Named Result Sets</title>
+ <para>
+ Named result sets are supported in Zebra, and result sets can be
+ used as operands without limitations.
+ </para>
+ <para>
+ After the execution of a search, the result set is available at
+ the server, such that the client can use it for subsequent
+ searches or retrieval requests. The Z39.50 standard actually
+ stresses the fact that result sets are volatile. A result set may
+ cease to exist at any time after the search, in which case the
+ server will send a diagnostic to the effect that the requested
+ result set does not exist any more.
+ </para>
+
+ <para>
+ Defining a named result set and re-using it in the next query,
+ using <literal>yaz-client</literal>:
+ <screen>
+ Z> f @attr 1=4 mozart
+ ...
+ Number of hits: 43, setno 1
+ ...
+ Z> f @and @set 1 @attr 1=4 amadeus
+ ...
+ Number of hits: 14, setno 2
+ ...
+ Z> f @attr 1=1016 beethoven
+ ...
+ Number of hits: 26, setno 3
+ ...
+ </screen>
+ </para>
+
+ <note>
+ Named result sets are only supported by the Z39.50 protocol.
+ The SRU web service is stateless, and therefore the notion of
+ named result sets does not exist when accessing a Zebra server by
+ the SRU protocol.
+ </note>
+ </sect3>
+
+
<sect3 id="querymodel-use-string">
<title>Zebra's special use attribute type 1 of form 'string'</title>
<para>
this facility when speed is essential, and the database content
size is medium to large.
</warning>
+
</sect3>
</sect2>
<sect2 id="querymodel-bib1">
<title>Bib1 Attribute Set</title>
<para>
- Something about querying to be written ..
- </para>
- <para>
Most of the information contained in this section is an excerpt of
the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
SEMANTICS</literal>,
<ulink url="&url.z39.50.attset.bib1;">Bib-1
Attribute Set</ulink>
version from 2003. Index Data is not the copyright holder of this
- information.
+ information, except for the configuration details, the listing of
+ Zebra's capabilities, and the example queries.
</para>
<sect3 id="querymodel-bib1-use">
- <title>Use Attributes (type = 1)</title>
- </sect3>
+ <title>Use Attributes (type 1)</title>
+
+ <para>
+ A use attribute specifies an access point for any atomic query.
+ These access points are highly dependent on the attribute set used
+ in the query, and are user configurable using the following
+ default configuration files:
+ <filename>tab/bib1.att</filename>,
+ <filename>tab/dan1.att</filename>,
+ <filename>tab/explain.att</filename>, and
+ <filename>tab/gils.att</filename>.
+ New attribute sets can be added by adding new
+ <filename>tab/*.att</filename> configuration files, which need to
+ be sourced in the main configuration <filename>zebra.cfg</filename>.
+ </para>
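+ <para>
+ A minimal sketch of the corresponding
+ <filename>zebra.cfg</filename> fragment follows. The file name
+ <filename>myset.att</filename> is hypothetical and stands for a
+ locally defined attribute set; all other settings are omitted.
+ <screen>
+ # predefined attribute set shipped with Zebra
+ attset: bib1.att
+ # hypothetical, locally defined attribute set (assumption)
+ attset: myset.att
+ </screen>
+ </para>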
+
+ <para>
+ In addition, Zebra allows access to
+ <emphasis>internal index names</emphasis> and <emphasis>dynamic
+ XPath</emphasis> as use attributes.
+ See <xref linkend="querymodel-use-string"/> and
+ <xref linkend="querymodel-use-xpath"/> for
+ alternative access to the Zebra internal index names and XPath queries.
+ </para>
<para>
Phrase search for <emphasis>information retrieval</emphasis> in
Z> find @attr 1=4 "information retrieval"
</screen>
</para>
+ </sect3>
- <para>
- See also <xref linkend="querymodel-use-string and "/>
- <xref linkend="querymodel-use-xpath"/> for
- alternative acess to the Zebra internal index names and XPath queries.
- </para>
+ </sect2>
+
+ <sect2 id="querymodel-bib1-nonuse">
+ <title>Zebra general Bib1 Non-Use Attributes (type 2-6)</title>
<sect3 id="querymodel-bib1-relation">
- <title>Relation Attributes (type = 2)</title>
- <para>
- Supported operations: = (default, of omitted), < > <=, >= .
- Unsupported: Not equal.
+ <title>Relation Attributes (type 2)</title>
- The following relation attributes are also supported: relevance (102).
- <!-- always-matches (103) not supported for all indexes -->
+ <para>
+ Relation attributes describe the relationship of the access
+ point (left side
+ of the relation) to the search term as qualified by the attributes (right
+ side of the relation), e.g., Date-publication &lt;= 1975.
+ </para>
- All operations are based on a lexicographical ordering,
- <emphasis>expect</emphasis> in the case for the
- following structure attributes: numeric(109).
- </para>
+ <table id="querymodel-bib1-relation-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+
+ <caption>Relation Attributes (type 2)</caption>
+ <thead>
+ <tr>
+ <td>Relation</td>
+ <td>Value</td>
+ <td>Notes</td>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td> Less than</td>
+ <td>1</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Less than or equal</td>
+ <td>2</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Equal</td>
+ <td>3</td>
+ <td>default</td>
+ </tr>
+ <tr>
+ <td>Greater or equal</td>
+ <td>4</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Greater than</td>
+ <td>5</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Not equal</td>
+ <td>6</td>
+ <td>unsupported</td>
+ </tr>
+ <tr>
+ <td>Phonetic</td>
+ <td>100</td>
+ <td>unsupported</td>
+ </tr>
+ <tr>
+ <td>Stem</td>
+ <td>101</td>
+ <td>unsupported</td>
+ </tr>
+ <tr>
+ <td>Relevance</td>
+ <td>102</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>AlwaysMatches</td>
+ <td>103</td>
+ <td>unsupported</td>
+ </tr>
+ </tbody>
+ </table>
+ <para>
+ The relation attribute
+ <literal>relevance (102)</literal> is supported, see
+ <xref linkend="administration-ranking"/> for full information.
+ <!-- always-matches (103) not supported for all indexes -->
+ </para>
+
<para>
+ All ordering operations are based on a lexicographical ordering,
+ <emphasis>except</emphasis> when the
+ <literal>structure attribute numeric (109)</literal> is used. In
+ this case, ordering is numerical. See
+ <xref linkend="querymodel-bib1-structure"/>.
+ </para>
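+ <para>
+ As a sketch, the date relation mentioned at the start of this
+ section could be written as follows. This assumes that the year
+ of publication is indexed under the <literal>Bib1</literal> use
+ attribute <literal>1=31</literal> (Date of publication) with a
+ numeric index; whether such an index exists depends on the local
+ configuration.
+ <screen>
+ Z> find @attr 1=31 @attr 2=2 @attr 4=109 1975
+ </screen>
+ </para>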
+
+ <para>
Ranked search for <emphasis>information retrieval</emphasis> in
- the title-register
- (see <xref linkend="administration-ranking"/> for the glory details):
+ the title-register:
<screen>
Z> find @attr 1=4 @attr 2=102 "information retrieval"
</screen>
</sect3>
<sect3 id="querymodel-bib1-position">
- <title>Position Attributes (type = 3)</title>
+ <title>Position Attributes (type 3)</title>
+
<para>
- Only value of (any position(3) is supported. first in field(1),
- and first in subfield(2) are unsupported but using them
- does not trigger an error.
+ The position attribute specifies the location of the search term
+ within the field or subfield in which it appears.
+ </para>
+
+ <table id="querymodel-bib1-position-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+
+ <caption>Position Attributes (type 3)</caption>
+ <thead>
+ <tr>
+ <td>Position</td>
+ <td>Value</td>
+ <td>Notes</td>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>First in field </td>
+ <td>1</td>
+ <td>unsupported</td>
+ </tr>
+ <tr>
+ <td>First in subfield</td>
+ <td>2</td>
+ <td>unsupported</td>
+ </tr>
+ <tr>
+ <td>Any position in field</td>
+ <td>3</td>
+ <td>default</td>
+ </tr>
+ </tbody>
+ </table>
+
+ <para>
+ The position attribute values <literal>first in field (1)</literal>
+ and <literal>first in subfield (2)</literal> are unsupported.
+ Using them does not trigger an error, but silently defaults to
+ <literal>any position in field (3)</literal>.
<!-- It should -->
</para>
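+ <para>
+ Given this behaviour, the following two queries should return the
+ same hit set. This is only a sketch; the search term is chosen
+ arbitrarily.
+ <screen>
+ Z> find @attr 1=4 @attr 3=1 information
+ Z> find @attr 1=4 @attr 3=3 information
+ </screen>
+ </para>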
</sect3>
<sect3 id="querymodel-bib1-structure">
- <title>Structure Attributes (type = 4)</title>
- <!-- See tab/default.idx -->
+ <title>Structure Attributes (type 4)</title>
+
+ <para>
+ The structure attribute specifies the type of search
+ term. This causes the search to be mapped on
+ different Zebra internal indexes, which must have been defined
+ at index time.
+ </para>
+
+ <para>
+ The possible values of the
+ <literal>structure attribute (type 4)</literal> can be defined
+ using the configuration file
+ <filename>tab/default.idx</filename>.
+ The default configuration is summarized in this table.
+ </para>
+
+ <table id="querymodel-bib1-structure-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+
+ <caption>Structure Attributes (type 4)</caption>
+ <thead>
+ <tr>
+ <td>Structure</td>
+ <td>Value</td>
+ <td>Notes</td>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Phrase </td>
+ <td>1</td>
+ <td>default</td>
+ </tr>
+ <tr>
+ <td>Word</td>
+ <td>2</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Key</td>
+ <td>3</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Year</td>
+ <td>4</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Date (normalized)</td>
+ <td>5</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Word list</td>
+ <td>6</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Date (un-normalized)</td>
+ <td>100</td>
+ <td>unsupported</td>
+ </tr>
+ <tr>
+ <td>Name (normalized) </td>
+ <td>101</td>
+ <td>unsupported</td>
+ </tr>
+ <tr>
+ <td>Name (un-normalized) </td>
+ <td>102</td>
+ <td>unsupported</td>
+ </tr>
+ <tr>
+ <td>Structure</td>
+ <td>103</td>
+ <td>unsupported</td>
+ </tr>
+ <tr>
+ <td>Urx</td>
+ <td>104</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Free-form-text</td>
+ <td>105</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Document-text</td>
+ <td>106</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Local-number</td>
+ <td>107</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>String</td>
+ <td>108</td>
+ <td>unsupported</td>
+ </tr>
+ <tr>
+ <td>Numeric string</td>
+ <td>109</td>
+ <td>supported</td>
+ </tr>
+ </tbody>
+ </table>
</sect3>
<para>
+ The structure attribute value <literal>local-number
+ (107)</literal>
+ is supported, and always maps to the Zebra internal document ID.
+ </para>
+
+ <para>
For example, in
the GILS schema (<literal>gils.abs</literal>), the
west-bounding-coordinate is indexed as type <literal>n</literal>,
<sect3 id="querymodel-bib1-truncation">
<title>Truncation Attributes (type 5)</title>
+
+ <para>
+ The truncation attribute specifies whether variations of one or
+ more characters are allowed between search term and hit terms, or
+ not. Using non-default truncation attributes will broaden the
+ document hit set of a search query.
+ </para>
+
+ <table id="querymodel-bib1-truncation-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+
+ <caption>Truncation Attributes (type 5)</caption>
+ <thead>
+ <tr>
+ <td>Truncation</td>
+ <td>Value</td>
+ <td>Notes</td>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Right truncation </td>
+ <td>1</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Left truncation</td>
+ <td>2</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Left and right truncation</td>
+ <td>3</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>Do not truncate</td>
+ <td>100</td>
+ <td>default</td>
+ </tr>
+ <tr>
+ <td>Process # in search term</td>
+ <td>101</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>RegExpr-1 </td>
+ <td>102</td>
+ <td>supported</td>
+ </tr>
+ <tr>
+ <td>RegExpr-2</td>
+ <td>103</td>
+ <td>supported</td>
+ </tr>
+ </tbody>
+ </table>
+
+ <para>
+ Truncation attribute value
+ <literal>Process # in search term (101)</literal> is a
+ poor-man's regular expression search. It maps
+ each <literal>#</literal> to <literal>.*</literal>, and
+ then performs a <literal>Regexp-1 (102)</literal> regular
+ expression search.
+ </para>
+ <para>
+ Truncation attribute value
+ <literal>Regexp-1 (102)</literal> is a normal regular
+ expression search.
+ </para>
<para>
- Supported are: No truncation(100) which is the default,
- Right trunation(1), Left truncation(2),
- Left&Right truncation(3),
- Process <literal>#</literal> in term(100) which maps
- each # to <literal>.*</literal>,
- Regexp-1(102) normal regular, Regexp-2(103) (regular with fuzzy),
+ Truncation attribute value
+ <literal>Regexp-2 (103)</literal> is a Zebra-specific extension
+ which allows <emphasis>fuzzy</emphasis> matches. One single
+ error in spelling of search terms is allowed, i.e., a document
+ is hit if it includes a term which can be mapped to the used
+ search term by one character substitution, addition, deletion or
+ change of position.
+ </para>
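+ <para>
+ Two illustrative queries against the title index follow: the
+ first uses right truncation, the second the
+ <literal>#</literal> processing described above. The search
+ terms are chosen arbitrarily, and the hits depend on the
+ database content.
+ <screen>
+ Z> find @attr 1=4 @attr 5=1 inform
+ Z> find @attr 1=4 @attr 5=101 inform#
+ </screen>
+ </para>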
<!--
Special 104, 105, 106 are deprecated and will be removed! -->
- </para>
</sect3>
<sect3 id="querymodel-bib1-completeness">
set used in a <literal>search</literal> operation query.
</para>
- <table id="querymodel-zebra-attr-search-table">
+ <table id="querymodel-zebra-attr-search-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+
<caption>Zebra Search Attribute Extensions</caption>
<thead>
<tr>
- <td><emphasis>Name and Type</emphasis></td>
+ <td>Name</td>
+ <td>Type</td>
<td>Operation</td>
<td>Zebra version</td>
</tr>
</thead>
<tbody>
<tr>
- <td><emphasis>Embedded Sort (type 7)</emphasis></td>
+ <td>Embedded Sort</td>
+ <td>7</td>
<td>search</td>
<td>1.1</td>
</tr>
<tr>
- <td><emphasis>Term Set (type 8)</emphasis></td>
+ <td>Term Set</td>
+ <td>8</td>
<td>search</td>
<td>1.1</td>
</tr>
<tr>
- <td><emphasis>Rank weight (type 9)</emphasis></td>
+ <td>Rank Weight</td>
+ <td>9</td>
<td>search</td>
<td>1.1</td>
</tr>
<tr>
- <td><emphasis>Approx Limit (type 9)</emphasis></td>
+ <td>Approx Limit</td>
+ <td>9</td>
<td>search</td>
<td>1.4</td>
</tr>
<tr>
- <td><emphasis>Term Reference (type 10)</emphasis></td>
+ <td>Term Reference</td>
+ <td>10</td>
<td>search</td>
<td>1.4</td>
</tr>
<title>Zebra Extension Term Reference Attribute (type 10)</title>
</sect3>
<para>
- Zebra supports the searchResult-1 facility. If attribute 10 is
+ Zebra supports the <literal>searchResult-1</literal> facility.
+ If the <literal>Term Reference Attribute (type 10)</literal> is
given, that specifies a subqueryId value returned as part of the
search result. It is a way for a client to name an APT part of a
query.
recognized regardless of attribute
set used in a <literal>scan</literal> operation query.
</para>
- <table id="querymodel-zebra-attr-scan-table">
+ <table id="querymodel-zebra-attr-scan-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+
<caption>Zebra Scan Attribute Extensions</caption>
<thead>
<tr>
- <td><emphasis>Name and Type</emphasis></td>
+ <td>Name</td>
+ <td>Type</td>
<td>Operation</td>
<td>Zebra version</td>
</tr>
</thead>
<tbody>
<tr>
- <td><emphasis>Result Set Narrow (type 8)</emphasis></td>
+ <td>Result Set Narrow</td>
+ <td>8</td>
<td>scan</td>
<td>1.3</td>
</tr>
<tr>
- <td><emphasis>Approximative Limit (type 9)</emphasis></td>
+ <td>Approximative Limit</td>
+ <td>9</td>
<td>scan</td>
<td>1.4</td>
</tr>
</tbody>
</table>
- <sect3 id="querymodel-zebra-attr-xyz">
+ <sect3 id="querymodel-zebra-attr-narrow">
<title>Zebra Extension Result Set Narrow (type 8)</title>
</sect3>
<para>
- If attribute 8 is given for scan, the value is the name of a
- result set. Each hit count in scan is @and'ed with the result set
- given.
+ If attribute <literal>Result Set Narrow (type 8)</literal>
+ is given for <literal>scan</literal>, the value is the name of a
+ result set. Each hit count in <literal>scan</literal> is
+ <literal>@and</literal>'ed with the result set given.
</para>
- <!--
<para>
+ Consider for example
+ the case of scanning all title fields around the
+ scanterm <emphasis>mozart</emphasis>, then refining the scan by
+ issuing a filtering query for <emphasis>amadeus</emphasis> to
+ restrict the scan to the result set of the query:
<screen>
+ Z> scan @attr 1=4 mozart
+ ...
+ * mozart (43)
+ mozartforskningen (1)
+ mozartiana (1)
+ mozarts (16)
+ ...
+ Z> f @attr 1=4 amadeus
+ ...
+ Number of hits: 15, setno 2
+ ...
+ Z> scan @attr 1=4 @attr 8=2 mozart
+ ...
+ * mozart (14)
+ mozartforskningen (0)
+ mozartiana (0)
+ mozarts (1)
+ ...
</screen>
</para>
- -->
+
<warning>
- Experimental and buggy. Definitely not to be used in production code.
+ Experimental. Do not use in production code.
</warning>
- <sect3 id="querymodel-zebra-attr-xyz">
+ <sect3 id="querymodel-zebra-attr-approx">
<title>Zebra Extension Approximative Limit (type 9)</title>
</sect3>
<para>
- The approximative limit (as for search) is a way to enable approx
- hit counts for scan hit counts.
+ The <literal>Zebra Extension Approximative Limit (type
+ 9)</literal> is a way to enable approximate
+ hit counts for <literal>scan</literal>, in the same
+ way as for <literal>search</literal> hit counts.
</para>
<!--
<para>
</para>
-->
<warning>
- Experimental. Do not use in production code.
+ Experimental and buggy. Definitely not to be used in production code.
</warning>
</sect2>
-
+
+
+ <sect2 id="querymodel-idxpath">
+ <title>Zebra special IDXPATH Attribute Set for GRS indexing</title>
+ <para>
+ The attribute-set <literal>idxpath</literal> consists of a single
+ <literal>Use (type 1)</literal> attribute. All non-use attributes
+ behave as normal.
+ </para>
+ <para>
+ This feature is enabled when defining the
+ <literal>xpath enable</literal> option in the GRS filter
+ <literal>*.abs</literal> configuration files. If one wants to use
+ the special <literal>idxpath</literal> numeric attribute set, the
+ main Zebra configuration file <filename>zebra.cfg</filename>
+ directive <literal>attset: idxpath.att</literal> must be enabled.
+ </para>
+ <warning>The <literal>idxpath</literal> attribute set is deprecated, may not be
+ supported in future Zebra versions, and should definitely
+ not be used in production code.
+ </warning>
+
+ <sect3 id="querymodel-idxpath-use">
+ <title>IDXPATH Use Attributes (type 1)</title>
+ <para>
+ This attribute set allows one to search GRS filter indexed
+ records by XPATH-like structured index names. It is enabled
+ as described at the beginning of this section.
+ </para>
+
+
+ <warning>The <literal>idxpath</literal> option defines hard-coded
+ index names, which might clash with your own index names.
+ </warning>
+
+ <table id="querymodel-idxpath-use-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+
+ <caption>Zebra specific IDXPATH Use Attributes (type 1)</caption>
+ <thead>
+ <tr>
+ <td>IDXPATH</td>
+ <td>Value</td>
+ <td>String Index</td>
+ <td>Notes</td>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>XPATH Begin</td>
+ <td>1</td>
+ <td>_XPATH_BEGIN</td>
+ <td>deprecated</td>
+ </tr>
+ <tr>
+ <td>XPATH End</td>
+ <td>2</td>
+ <td>_XPATH_END</td>
+ <td>deprecated</td>
+ </tr>
+ <tr>
+ <td>XPATH CData</td>
+ <td>1016</td>
+ <td>_XPATH_CDATA</td>
+ <td>deprecated</td>
+ </tr>
+ <tr>
+ <td>XPATH Attribute Name</td>
+ <td>3</td>
+ <td>_XPATH_ATTR_NAME</td>
+ <td>deprecated</td>
+ </tr>
+ <tr>
+ <td>XPATH Attribute CData</td>
+ <td>1015</td>
+ <td>_XPATH_ATTR_CDATA</td>
+ <td>deprecated</td>
+ </tr>
+ </tbody>
+ </table>
+
+
+ <para>
+ See <filename>tab/idxpath.att</filename> for more information.
+ </para>
+ <para>
+ Search for all documents starting with root element
+ <literal>/root</literal> (either using the numeric or the string
+ use attributes):
+ <screen>
+ Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/
+ Z> find @attr idxpath 1=1 @attr 4=3 root/
+ Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/
+ </screen>
+ </para>
+ <para>
+ Search for all documents where specific nested XPATH
+ <literal>/c1/c2/../cn</literal> exists. Notice the very
+ counter-intuitive <emphasis>reverse</emphasis> notation!
+ <screen>
+ Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/
+ Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/
+ </screen>
+ </para>
+ <para>
+ Search for CDATA string <emphasis>text</emphasis> in any element:
+ <screen>
+ Z> find @attrset idxpath @attr 1=1016 text
+ Z> find @attr 1=_XPATH_CDATA text
+ </screen>
+ </para>
+ <para>
+ Search for CDATA string <emphasis>anothertext</emphasis> in any
+ attribute:
+ <screen>
+ Z> find @attrset idxpath @attr 1=1015 anothertext
+ Z> find @attr 1=_XPATH_ATTR_CDATA anothertext
+ </screen>
+ </para>
+ <para>
+ Search for all documents which have an XML element node
+ including an XML attribute named <emphasis>creator</emphasis>:
+ <screen>
+ Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
+ Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
+ </screen>
+ </para>
+ <para>
+ Combining usual <literal>bib-1</literal> attribute set searches
+ with <literal>idxpath</literal> attribute set searches:
+ <screen>
+ Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart
+ Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart
+ </screen>
+ </para>
+
+ </sect3>
+ </sect2>
+
<sect2 id="querymodel-bib1-mapping">
<title>Mapping from Bib1 Attributes to Zebra internal
Both query types follow the same syntax with the operands:
</para>
- <table id="querymodel-regular-operands-table">
+ <table id="querymodel-regular-operands-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+
<caption>Regular Expression Operands</caption>
<!--
<thead>
-->
<tbody>
<tr>
- <td><emphasis>x</emphasis></td>
- <td>Matches the character <emphasis>x</emphasis>.</td>
+ <td><literal>x</literal></td>
+ <td>Matches the character <literal>x</literal>.</td>
</tr>
<tr>
- <td><emphasis>.</emphasis></td>
+ <td><literal>.</literal></td>
<td>Matches any character.</td>
</tr>
<tr>
- <td><emphasis>[ .. ]</emphasis></td>
+ <td><literal>[ .. ]</literal></td>
<td>Matches the set of characters specified;
such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
</tr>
The above operands can be combined with the following operators:
</para>
-
- <table id="querymodel-regular-operators-table">
+ <table id="querymodel-regular-operators-table"
+ frame="all" rowsep="1" colsep="1" align="center">
<caption>Regular Expression Operators</caption>
<!--
<thead>
-->
<tbody>
<tr>
- <td><emphasis>x*</emphasis></td>
- <td>Matches <emphasis>x</emphasis> zero or more times.
+ <td><literal>x*</literal></td>
+ <td>Matches <literal>x</literal> zero or more times.
Priority: high.</td>
</tr>
<tr>
- <td><emphasis>x+</emphasis></td>
- <td>Matches <emphasis>x</emphasis> one or more times.
+ <td><literal>x+</literal></td>
+ <td>Matches <literal>x</literal> one or more times.
Priority: high.</td>
</tr>
<tr>
- <td><emphasis>x?</emphasis></td>
- <td> Matches <emphasis>x</emphasis> zero or once.
+ <td><literal>x?</literal></td>
+ <td> Matches <literal>x</literal> zero or once.
Priority: high.</td>
</tr>
<tr>
- <td><emphasis>xy</emphasis></td>
- <td> Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
+ <td><literal>xy</literal></td>
+ <td> Matches <literal>x</literal>, then <literal>y</literal>.
Priority: medium.</td>
</tr>
<tr>
- <td><emphasis>x|y</emphasis></td>
- <td> Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
+ <td><literal>x|y</literal></td>
+ <td> Matches either <literal>x</literal> or <literal>y</literal>.
Priority: low.</td>
</tr>
<tr>
- <td><emphasis>( )</emphasis></td>
+ <td><literal>( )</literal></td>
<td>The order of evaluation may be changed by using parentheses.</td>
</tr>
</tbody>
</table>
-
+
<para>
- If the first character of the <emphasis>Regxp-2</emphasis> query
+ If the first character of the <literal>Regexp-2</literal> query
is a plus character (<literal>+</literal>) it marks the
beginning of a section with non-standard specifiers.
The next plus character marks the end of the section.
<para>
Combinations with other attributes are possible. For example, a
- ranked search with a regular expression
- (see <xref linkend="administration-ranking"/> for the glory details):
+ ranked search with a regular expression:
<screen>
Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
</screen>
process input records.
Two basic types of processing are available - raw text and structured
data. Raw text is just that, and it is selected by providing the
- argument <emphasis>text</emphasis> to Zebra. Structured records are
+ argument <literal>text</literal> to Zebra. Structured records are
all handled internally using the basic mechanisms described in the
subsequent sections.
Zebra can read structured records in many different formats.