+ </screen>
+ </para>
+ -->
+ <warning>
+ Experimental and buggy. Definitely not to be used in production code.
+ </warning>
+
+
+ </sect2>
+
+
+ <sect2 id="querymodel-idxpath">
+ <title>Zebra special IDXPATH Attribute Set for GRS indexing</title>
+ <para>
+ The attribute-set <literal>idxpath</literal> consists of a single
+ <literal>Use (type 1)</literal> attribute. All non-use attributes
+ behave as normal.
+ </para>
+ <para>
+ This feature is enabled when defining the
+ <literal>xpath enable</literal> option in the GRS filter
+ <filename>*.abs</filename> configuration files. If one wants to use
+ the special <literal>idxpath</literal> numeric attribute set, the
+ main Zebra configuration file <filename>zebra.cfg</filename>
+ directive <literal>attset: idxpath.att</literal> must be enabled.
+ </para>
+ <warning>The <literal>idxpath</literal> is deprecated, may not be
+ supported in future Zebra versions, and should definitely
+ not be used in production code.
+ </warning>
+
+ <sect3 id="querymodel-idxpath-use">
+ <title>IDXPATH Use Attributes (type = 1)</title>
+ <para>
+ This attribute set allows one to search GRS filter indexed
+ records by XPATH-like structured index names.
+ </para>
+
+ <warning>The <literal>idxpath</literal> option defines hard-coded
+ index names, which might clash with your own index names.
+ </warning>
+
+ <table id="querymodel-idxpath-use-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+
+ <caption>Zebra specific IDXPATH Use Attributes (type 1)</caption>
+ <thead>
+ <tr>
+ <td>IDXPATH</td>
+ <td>Value</td>
+ <td>String Index</td>
+ <td>Notes</td>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>XPATH Begin</td>
+ <td>1</td>
+ <td>_XPATH_BEGIN</td>
+ <td>deprecated</td>
+ </tr>
+ <tr>
+ <td>XPATH End</td>
+ <td>2</td>
+ <td>_XPATH_END</td>
+ <td>deprecated</td>
+ </tr>
+ <tr>
+ <td>XPATH CData</td>
+ <td>1016</td>
+ <td>_XPATH_CDATA</td>
+ <td>deprecated</td>
+ </tr>
+ <tr>
+ <td>XPATH Attribute Name</td>
+ <td>3</td>
+ <td>_XPATH_ATTR_NAME</td>
+ <td>deprecated</td>
+ </tr>
+ <tr>
+ <td>XPATH Attribute CData</td>
+ <td>1015</td>
+ <td>_XPATH_ATTR_CDATA</td>
+ <td>deprecated</td>
+ </tr>
+ </tbody>
+ </table>
+
+
+ <para>
+ See <filename>tab/idxpath.att</filename> for more information.
+ </para>
+ <para>
+ Search for all documents starting with root element
+ <literal>/root</literal> (using either the numeric or the string
+ use attributes):
+ <screen>
+ Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/
+ Z> find @attr idxpath 1=1 @attr 4=3 root/
+ Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/
+ </screen>
+ </para>
+ <para>
+ Search for all documents where specific nested XPATH
+ <literal>/c1/c2/../cn</literal> exists. Notice the very
+ counter-intuitive <emphasis>reverse</emphasis> notation!
+ <screen>
+ Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/
+ Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/
+ </screen>
+ </para>
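+ <para>
+ The reversed notation can be produced mechanically. The following
+ Python sketch is an illustration only: the helper name
+ <literal>to_idxpath_term</literal> is hypothetical and not part of
+ Zebra, and it assumes a plain absolute path made of element names
+ separated by <literal>'/'</literal>.
+ </para>

```python
def to_idxpath_term(xpath: str) -> str:
    """Reverse the steps of an absolute path such as /c1/c2/c3.

    Hypothetical helper, not part of Zebra: it only mirrors the
    reversed notation shown in the idxpath examples (c3/c2/c1/).
    """
    # Drop the empty strings produced by leading or trailing slashes.
    steps = [step for step in xpath.split("/") if step]
    if not steps:
        raise ValueError("expected at least one element name")
    # Innermost element first, each step followed by a slash.
    return "/".join(reversed(steps)) + "/"


if __name__ == "__main__":
    print(to_idxpath_term("/root"))       # root/
    print(to_idxpath_term("/c1/c2/c3"))   # c3/c2/c1/
```

+ <para>
+ The resulting term is what follows
+ <literal>@attr 1=_XPATH_BEGIN @attr 4=3</literal> in the queries above.
+ </para>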
+ <para>
+ Search for CDATA string <emphasis>text</emphasis> in any element:
+ <screen>
+ Z> find @attrset idxpath @attr 1=1016 text
+ Z> find @attr 1=_XPATH_CDATA text
+ </screen>
+ </para>
+ <para>
+ Search for CDATA string <emphasis>anothertext</emphasis> in any
+ attribute:
+ <screen>
+ Z> find @attrset idxpath @attr 1=1015 anothertext
+ Z> find @attr 1=_XPATH_ATTR_CDATA anothertext
+ </screen>
+ </para>
+ <para>
+ Search for all documents which have an XML element node
+ including an XML attribute named <emphasis>creator</emphasis>:
+ <screen>
+ Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
+ Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
+ </screen>
+ </para>
+ <para>
+ Combining usual <literal>bib-1</literal> attribute set searches
+ with <literal>idxpath</literal> attribute set searches:
+ <screen>
+ Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart
+ Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart
+ </screen>
+ </para>
+ <para>
+ Scanning is supported on all <literal>idxpath</literal>
+ indexes, whether specified as numeric use attributes or as string
+ index names:
+ <screen>
+ Z> scan @attrset idxpath @attr 1=1016 text
+ Z> scan @attr 1=_XPATH_ATTR_CDATA anothertext
+ Z> scan @attrset idxpath @attr 1=3 @attr 4=3 ''
+ </screen>
+ </para>
+
+ </sect3>
+ </sect2>
+
+
+ <sect2 id="querymodel-pqf-apt-mapping">
+ <title>Mapping from PQF atomic APT queries to Zebra internal
+ register indexes</title>
+ <para>
+ The rules for PQF APT mapping can be hard to grasp at first.
+ We first cover the rules for deciding which
+ internal register or string index to use, according to the use
+ attribute or access point specified in the query. Thereafter we
+ cover the rules for determining the correct structure type of
+ the named register.
+ </para>
+
+ <sect3 id="querymodel-pqf-apt-mapping-accesspoint">
+ <title>Mapping of PQF APT access points</title>
+ <para>
+ Zebra understands four fundamentally different types of access
+ points, of which only the
+ <emphasis>numeric use attribute</emphasis> type access points
+ are defined by the <ulink url="&url.z39.50;">Z39.50</ulink>
+ standard.
+ All other access point types are Zebra specific, and non-portable.
+ </para>
+
+ <table id="querymodel-zebra-mapping-accesspoint-types"
+ frame="all" rowsep="1" colsep="1" align="center">
+
+ <caption>Access point name mapping</caption>
+ <thead>
+ <tr>
+ <td>Access Point</td>
+ <td>Type</td>
+ <td>Grammar</td>
+ <td>Notes</td>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Use attribute</td>
+ <td>numeric</td>
+ <td>[1-9][0-9]*</td>
+ <td>directly mapped to string index name</td>
+ </tr>
+ <tr>
+ <td>String index name</td>
+ <td>string</td>
+ <td>[a-zA-Z](\-?[a-zA-Z0-9])*</td>
+ <td>normalized name is used as internal string index name</td>
+ </tr>
+ <tr>
+ <td>Zebra internal index name</td>
+ <td>zebra</td>
+ <td>_[a-zA-Z](_?[a-zA-Z0-9])*</td>
+ <td>hardwired internal string index name</td>
+ </tr>
+ <tr>
+ <td>XPATH special index</td>
+ <td>XPath</td>
+ <td>/.*</td>
+ <td>special xpath search for GRS indexed records</td>
+ </tr>
+ </tbody>
+ </table>
+
+ <para>
+ <literal>Attribute set names</literal> and
+ <literal>string index names</literal> are normalized
+ according to the following rules: all <emphasis>single</emphasis>
+ hyphens <literal>'-'</literal> are stripped, and all upper case
+ letters are folded to lower case.
+ </para>
+
+ <para>
+ <emphasis>Numeric use attributes</emphasis> are mapped
+ to the Zebra internal
+ string index according to the attribute set definition in use.
+ The default attribute set is <literal>Bib-1</literal>, and may be
+ omitted in the PQF query.
+ </para>
+
+ <para>
+ Given the name normalization and numeric
+ use attribute mapping rules, the following
+ PQF queries are considered equivalent (assuming the default
+ configuration has not been altered):
+ <screen>
+ Z> find @attr 1=Body-of-text serenade
+ Z> find @attr 1=bodyoftext serenade
+ Z> find @attr 1=BodyOfText serenade
+ Z> find @attr 1=bO-d-Y-of-tE-x-t serenade
+ Z> find @attr 1=1010 serenade
+ Z> find @attrset Bib-1 @attr 1=1010 serenade
+ Z> find @attrset bib1 @attr 1=1010 serenade
+ Z> find @attrset Bib1 @attr 1=1010 serenade
+ Z> find @attrset b-I-b-1 @attr 1=1010 serenade
+ </screen>
+ </para>
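+ <para>
+ The name normalization can be sketched in a few lines of Python.
+ This is an illustration, not Zebra's implementation: the function
+ name <literal>normalize_name</literal> is hypothetical, and it
+ assumes that names never contain consecutive hyphens, so removing
+ every hyphen is equivalent to removing every single hyphen.
+ </para>

```python
def normalize_name(name: str) -> str:
    """Strip hyphens and fold upper case letters to lower case.

    Hypothetical helper; assumes the name has no consecutive hyphens.
    """
    return name.replace("-", "").lower()


if __name__ == "__main__":
    index_names = ["Body-of-text", "bodyoftext", "BodyOfText", "bO-d-Y-of-tE-x-t"]
    attrset_names = ["Bib-1", "bib1", "Bib1", "b-I-b-1"]
    # Each group collapses to a single normalized spelling.
    print({normalize_name(n) for n in index_names})    # {'bodyoftext'}
    print({normalize_name(n) for n in attrset_names})  # {'bib1'}
```

+ <para>
+ The sketch covers only the spelling variants. Mapping a numeric use
+ attribute such as <literal>1010</literal> to an index name depends on
+ the attribute set definition files and is not modelled here.
+ </para>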
+
+ <para>
+ The <emphasis>numerical</emphasis>
+ <literal>use attributes (type 1)</literal>
+ are interpreted according to the
+ attribute sets which have been loaded in the
+ <literal>zebra.cfg</literal> file, and are matched against specific
+ fields as specified in the <literal>.abs</literal> file which
+ describes the profile of the records which have been loaded.
+ If no use attribute is provided, a default of
+ <literal>Bib-1 Use Any (1016)</literal> is
+ assumed.
+ The predefined <literal>use attribute sets</literal>
+ can be reconfigured by tweaking the configuration files
+ <filename>tab/*.att</filename>, and
+ new attribute sets can be defined by adding similar files in the
+ configuration path <literal>profilePath</literal> of the server.
+ </para>
+
+ <para>
+ <literal>String indexes</literal> can be accessed directly,
+ independently of which attribute set is in use; any attribute set
+ specification is simply ignored. The above mentioned name normalization applies.
+ <literal>String index names</literal> are defined in the
+ used indexing filter configuration files, for example in the
+ <literal>GRS</literal>
+ <filename>*.abs</filename> configuration files, or in the
+ <literal>alvis</literal> filter XSLT indexing stylesheets.
+ </para>
+
+ <para>
+ <literal>Zebra internal indexes</literal> can be accessed directly,
+ according to the same rules as the user defined
+ <literal>string indexes</literal>. The only difference is that
+ <literal>Zebra internal index names</literal> are hardwired,
+ all uppercase and
+ must start with the character <literal>'_'</literal>.
+ </para>
+
+ <para>
+ Finally, <literal>XPATH</literal> access points are only
+ available using the <literal>GRS</literal> filter for indexing.
+ These access point names must start with the character
+ <literal>'/'</literal>, they are <emphasis>not
+ normalized</emphasis>, but passed unaltered to the Zebra internal
+ XPATH engine. See <xref linkend="querymodel-use-xpath"/>.
+
+ </para>
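+ <para>
+ As an illustration of how the four access point types can be told
+ apart, the following Python sketch matches an access point against the
+ grammars of the table above. It is not Zebra's own parser: the names
+ <literal>ACCESS_POINT_GRAMMARS</literal> and
+ <literal>classify_access_point</literal> are hypothetical, and numeric
+ use attributes are assumed to be positive decimal integers.
+ </para>

```python
import re

# Grammars taken from the access point table; the numeric pattern assumes
# positive decimal integers such as 4 or 1010.
ACCESS_POINT_GRAMMARS = [
    ("numeric", re.compile(r"[1-9][0-9]*")),
    ("string", re.compile(r"[a-zA-Z](-?[a-zA-Z0-9])*")),
    ("zebra", re.compile(r"_[a-zA-Z](_?[a-zA-Z0-9])*")),
    ("xpath", re.compile(r"/.*")),
]


def classify_access_point(value):
    """Return the access point type name, or None if no grammar matches."""
    for kind, pattern in ACCESS_POINT_GRAMMARS:
        # fullmatch requires the whole value to fit the grammar.
        if pattern.fullmatch(value):
            return kind
    return None


if __name__ == "__main__":
    for example in ["1010", "Body-of-text", "_XPATH_BEGIN", "/root/title"]:
        print(example, classify_access_point(example))
```

+ <para>
+ The four grammars start with different character classes, so at most
+ one of them can match a given access point.
+ </para>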
+
+
+ </sect3>
+
+
+ <sect3 id="querymodel-pqf-apt-mapping-structuretype">
+ <title>Mapping of PQF APT structure and completeness to
+ register type</title>
+ <para>
+ Internally Zebra has in its default configuration several
+ different types of registers or indexes, whose tokenization and
+ character normalization rules differ. This reflects the fact that
+ searching fundamentally different tokens like dates, numbers,
+ bitfields and string based text needs different rule sets.
+ </para>
+
+ <table id="querymodel-zebra-mapping-structure-types"
+ frame="all" rowsep="1" colsep="1" align="center">
+
+ <caption>Structure and completeness mapping to register types</caption>
+ <thead>
+ <tr>
+ <td>Structure</td>
+ <td>Completeness</td>
+ <td>Register type</td>
+ <td>Notes</td>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>
+ phrase (@attr 4=1), word (@attr 4=2),
+ word-list (@attr 4=6),
+ free-form-text (@attr 4=105), or document-text (@attr 4=106)
+ </td>
+ <td>Incomplete field (@attr 6=1)</td>
+ <td>Word ('w')</td>
+ <td>Traditional tokenized and character normalized word index</td>
+ </tr>
+ <tr>
+ <td>
+ phrase (@attr 4=1), word (@attr 4=2),
+ word-list (@attr 4=6),
+ free-form-text (@attr 4=105), or document-text (@attr 4=106)
+ </td>
+ <td>Complete field (@attr 6=3)</td>
+ <td>Phrase ('p')</td>
+ <td>Character normalized, but not tokenized index for phrase
+ matches
+ </td>
+ </tr>
+ <tr>
+ <td>urx (@attr 4=104)</td>
+ <td>ignored</td>
+ <td>URX/URL ('u')</td>
+ <td>Special index for URL web addresses</td>
+ </tr>
+ <tr>
+ <td>numeric (@attr 4=109)</td>
+ <td>ignored</td>
+ <td>Numeric ('n')</td>
+ <td>Special index for digital numbers</td>
+ </tr>
+ <tr>
+ <td>key (@attr 4=3)</td>
+ <td>ignored</td>
+ <td>Null bitmap ('0')</td>
+ <td>Used for non-tokenized and non-normalized bit sequences</td>
+ </tr>
+ <tr>
+ <td>year (@attr 4=4)</td>
+ <td>ignored</td>
+ <td>Year ('y')</td>
+ <td>Non-tokenized and non-normalized 4 digit numbers</td>
+ </tr>
+ <tr>
+ <td>date (@attr 4=5)</td>
+ <td>ignored</td>
+ <td>Date ('d')</td>
+ <td>Non-tokenized and non-normalized ISO date strings</td>
+ </tr>
+ <tr>
+ <td>ignored</td>
+ <td>ignored</td>
+ <td>Sort ('s')</td>
+ <td>Used with special sort attribute set (@attr 7=1, @attr 7=2)</td>
+ </tr>
+ <tr>
+ <td>overruled</td>
+ <td>overruled</td>
+ <td>special</td>
+ <td>Internal record ID register, used whenever
+ Relation Always Matches (@attr 2=103) is specified</td>
+ </tr>
+ </tbody>
+ </table>
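+ <para>
+ The table can be read as a small lookup. The Python sketch below is an
+ illustration under stated assumptions, not Zebra's code: the names
+ <literal>register_type</literal>,
+ <literal>WORD_LIKE_STRUCTURES</literal> and
+ <literal>FIXED_REGISTERS</literal> are hypothetical, the numeric
+ register is assumed to be <literal>'n'</literal>, an omitted
+ completeness attribute is taken as Incomplete field
+ (<literal>6=1</literal>), and the sort and record ID rows are left out.
+ </para>

```python
# Structure values that select the word or phrase register,
# depending on the completeness attribute.
WORD_LIKE_STRUCTURES = {1, 2, 6, 105, 106}

# Structure values whose register ignores completeness.
# Assumption: the numeric register is named 'n'.
FIXED_REGISTERS = {104: "u", 109: "n", 3: "0", 4: "y", 5: "d"}


def register_type(structure, completeness=1):
    """Return the register type letter, or None for combinations
    that the table does not cover."""
    if structure in FIXED_REGISTERS:
        return FIXED_REGISTERS[structure]
    if structure in WORD_LIKE_STRUCTURES:
        if completeness == 1:   # incomplete field
            return "w"
        if completeness == 3:   # complete field
            return "p"
    return None


if __name__ == "__main__":
    print(register_type(1))      # w
    print(register_type(1, 3))   # p
    print(register_type(4, 3))   # y
```

+ <para>
+ Combinations that the table does not list, such as a word structure
+ with <literal>@attr 6=2</literal>, return <literal>None</literal> in
+ this sketch rather than guessing a register.
+ </para>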
+
+ <!-- see in util/zebramap.c -->
+
+ <para>
+ If a <emphasis>Structure</emphasis> attribute of
+ <emphasis>Phrase</emphasis> is used in conjunction with a
+ <emphasis>Completeness</emphasis> attribute of
+ <emphasis>Complete (Sub)field</emphasis>, the term is matched
+ against the contents of the phrase (long word) register, if one
+ exists for the given <emphasis>Use</emphasis> attribute.
+ A phrase register is created for those fields in the
+ GRS <filename>*.abs</filename> file that contain a
+ <literal>p</literal>-specifier.
+ <screen>
+ Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
+ ...
+ bayreuther festspiele (1)
+ * beethoven bibliography database (1)
+ benny carter (1)
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography"
+ ...
+ Number of hits: 0, setno 5
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database"
+ ...
+ Number of hits: 1, setno 6
+ </screen>
+ </para>
+
+ <para>
+ If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
+ used in conjunction with <emphasis>Incomplete Field</emphasis> (the
+ default value for <emphasis>Completeness</emphasis>), the
+ search is directed against the normal word registers, but if the term
+ contains multiple words, the term will only match if all of the words
+ are found immediately adjacent, and in the given order.
+ The word search is performed on those fields that are indexed as
+ type <literal>w</literal> in the GRS <filename>*.abs</filename> file.
+ <screen>
+ Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
+ ...
+ beefheart (1)
+ * beethoven (18)
+ beethovens (7)
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=1 beethoven
+ ...
+ Number of hits: 18, setno 1
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven bibliography"
+ ...
+ Number of hits: 2, setno 2
+ ...
+ </screen>
+ </para>
+
+ <para>
+ If the <emphasis>Structure</emphasis> attribute is
+ <emphasis>Word List</emphasis>,
+ <emphasis>Free-form Text</emphasis>, or
+ <emphasis>Document Text</emphasis>, the term is treated as a
+ natural-language, relevance-ranked query.
+ This search type uses the word register, i.e. those fields
+ that are indexed as type <literal>w</literal> in the
+ GRS <filename>*.abs</filename> file.