X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=d7a7f8489ae6cbc385672497dc1fbcea1ecf9aeb;hp=afdb4078ceae66d6eb96dc5f100c4e03a8c6dfb9;hb=ba0720e26f508ba3396e232d2f82037c0e701698;hpb=5ca4e60e990af6ad6b62ebff855d7b642f37c3ec diff --git a/doc/querymodel.xml b/doc/querymodel.xml index afdb407..d7a7f84 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,5 +1,4 @@ - Query Model
@@ -11,18 +10,18 @@ &zebra; is born as a networking Information Retrieval engine adhering to the international standards - Z39.50 and - SRU, + &acro.z3950; and + &acro.sru;, and implement the - type-1 Reverse Polish Notation (RPN) query + type-1 Reverse Polish Notation (&acro.rpn;) query model defined there. Unfortunately, this model has only defined a binary encoded representation, which is used as transport packaging in - the Z39.50 protocol layer. This representation is not human + the &acro.z3950; protocol layer. This representation is not human readable, nor defines any convenient way to specify queries. - Since the type-1 (RPN) + Since the type-1 (&acro.rpn;) query structure has no direct, useful string representation, every client application needs to provide some form of mapping from a local query notation or representation to it. @@ -30,33 +29,33 @@
- Prefix Query Format (PQF) + Prefix Query Format (&acro.pqf;) Index Data has defined a textual representation in the Prefix Query Format, short - PQF, which maps + &acro.pqf;, which maps one-to-one to binary encoded - type-1 RPN queries. - PQF has been adopted by other - parties developing Z39.50 software, and is often referred to as + type-1 &acro.rpn; queries. + &acro.pqf; has been adopted by other + parties developing &acro.z3950; software, and is often referred to as Prefix Query Notation, or in short - PQN. See + &acro.pqn;. See for further explanations and descriptions of &zebra;'s capabilities.
- Common Query Language (CQL) + Common Query Language (&acro.cql;) - The query model of the type-1 RPN, - expressed in PQF/PQN is natively supported. - On the other hand, the default SRU + The query model of the type-1 &acro.rpn;, + expressed in &acro.pqf;/&acro.pqn; is natively supported. + On the other hand, the default &acro.sru; web services Common Query Language - CQL is not natively supported. + &acro.cql; is not natively supported. - &zebra; can be configured to understand and map CQL to PQF. See + &zebra; can be configured to understand and map &acro.cql; to &acro.pqf;. See .
@@ -67,7 +66,7 @@ Operation types &zebra; supports all of the three different - Z39.50/SRU operations defined in the + &acro.z3950;/&acro.sru; operations defined in the standards: explain, search, and scan. A short description of the functionality and purpose of each is quite in order here. @@ -76,7 +75,7 @@
Explain Operation - The syntax of Z39.50/SRU queries is + The syntax of &acro.z3950;/&acro.sru; queries is well known to any client, but the specific semantics - taking into account a particular servers functionalities and abilities - must be @@ -89,15 +88,15 @@ of the general query model are supported. - The Z39.50 embeds the explain operation + The &acro.z3950; protocol embeds the explain operation by performing a search in the magic IR-Explain-1 database; see . - In SRU, explain is an entirely separate - operation, which returns an ZeeRex XML record according to the + In &acro.sru;, explain is an entirely separate + operation, which returns a ZeeRex &acro.xml; record according to the structure defined by the protocol. @@ -117,7 +116,7 @@ simple free text searches to nested complex boolean queries, targeting specific indexes, and possibly enhanced with many query semantic specifications. Search interactions are the heart - and soul of Z39.50/SRU servers. + and soul of &acro.z3950;/&acro.sru; servers.
@@ -143,26 +142,25 @@
-
- RPN queries and semantics + &acro.rpn; queries and semantics - The PQF grammar - is documented in the YAZ manual, and shall not be - repeated here. This textual PQF representation - is not transmistted to &zebra; during search, but it is in the - client mapped to the equivalent Z39.50 binary + The &acro.pqf; grammar + is documented in the &yaz; manual, and shall not be + repeated here. This textual &acro.pqf; representation + is not transmitted to &zebra; during search, but it is in the + client mapped to the equivalent &acro.z3950; binary query parse tree.
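The following minimal Python sketch shows how a textual PQF query maps onto a parse tree. It is not the YAZ parser and covers only a subset of the grammar: an optional leading `@attrset`, `@attr` lists, the `@and`/`@or`/`@not` operators, `@set` references, and bare, `"quoted"` or `{braced}` terms. `@prox`, `@term` and other constructs are intentionally left out. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class APT:
    """Atomic query: attribute list plus one term (hypothetical class)."""
    attrs: List[Tuple[int, str]]  # (attribute type, value), e.g. (1, "4")
    term: str


@dataclass
class BoolOp:
    op: str  # "and", "or" or "not"
    left: "Node"
    right: "Node"


@dataclass
class ResultSet:
    name: str  # reference to a named result set


Node = Union[APT, BoolOp, ResultSet]


def tokenize(pqf: str) -> List[Tuple[str, bool]]:
    """Split PQF into (text, quoted) tokens; quoted tokens are never keywords."""
    tokens, i = [], 0
    while i < len(pqf):
        c = pqf[i]
        if c.isspace():
            i += 1
        elif c in '"{':
            close = '"' if c == '"' else "}"
            j = pqf.index(close, i + 1)  # raises ValueError if unterminated
            tokens.append((pqf[i + 1:j], True))
            i = j + 1
        else:
            j = i
            while j < len(pqf) and not pqf[j].isspace():
                j += 1
            tokens.append((pqf[i:j], False))
            i = j
    return tokens


def parse(pqf: str) -> Node:
    tokens = tokenize(pqf)
    pos = 0

    def take() -> Tuple[str, bool]:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def node() -> Node:
        text, quoted = take()
        if not quoted and text in ("@and", "@or", "@not"):
            # Prefix notation: the operator comes first, then both operands.
            return BoolOp(text[1:], node(), node())
        if not quoted and text == "@set":
            return ResultSet(take()[0])
        attrs = []
        while not quoted and text == "@attr":
            spec, _ = take()
            if "=" not in spec:  # optional attribute set name before type=value
                spec, _ = take()
            attr_type, value = spec.split("=", 1)
            attrs.append((int(attr_type), value))
            text, quoted = take()
        return APT(attrs, text)

    if tokens[:1] == [("@attrset", False)]:
        pos = 2  # skip the attribute set name; this sketch does not record it
    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after query")
    return tree
```

For example, `parse('@and @attr 1=4 mozart @attr 1=4 amadeus')` returns a `BoolOp("and", ...)` node whose two leaves are `APT` nodes. Because the tokenizer records whether a token was quoted, a quoted term such as `"@and"` is never treated as an operator.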
- RPN tree structure + &acro.rpn; tree structure - The RPN parse tree - or the equivalent textual representation in PQF - + The &acro.rpn; parse tree - or the equivalent textual representation in &acro.pqf; - may start with one specification of the attribute set used. Following is a query tree, which - consists of atomic query parts (APT) or + consists of atomic query parts (&acro.apt;) or named result sets, eventually paired by boolean binary operators, and finally recursively combined into @@ -184,7 +182,7 @@ Attribute set - PQF notation (Short hand) + &acro.pqf; notation (Short hand) Status Notes @@ -201,11 +199,11 @@ predefined - Bib-1 + &acro.bib1; bib-1 - Standard PQF query language attribute set which defines the - semantics of Z39.50 searching. In addition, all of the - non-use attributes (types 2-12) define the hard-wired + Standard &acro.pqf; query language attribute set which defines the + semantics of &acro.z3950; searching. In addition, all of the + non-use attributes (types 2-14) define the hard-wired &zebra; internal query processing. default @@ -213,18 +211,9 @@ GILS gils - Extension to the Bib-1 attribute set. + Extension to the &acro.bib1; attribute set. predefined - @@ -238,7 +227,7 @@ The &zebra; internal query processing is modeled after - the Bib-1 attribute set, and the non-use + the &acro.bib1; attribute set, and the non-use attributes type 2-6 are hard-wired in. It is therefore essential to be familiar with . @@ -317,7 +306,7 @@ retrieval, taking proximity into account: The hit set is a subset of the corresponding AND query - (see the PQF grammar for + (see the &acro.pqf; grammar for details on the proximity operator): Z> find @prox 0 3 0 2 k 2 information retrieval @@ -338,23 +327,23 @@
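Boolean operators combine the hit sets of their operands. The following Python sketch evaluates a query tree against a hypothetical in-memory inverted index. AND is treated as set intersection, OR as union, and NOT as difference (the left operand minus the right operand). Named result sets can be reused as operands. The tuple-based tree, the index contents and the function name are all assumptions made for illustration. Proximity is left out, although, as noted above, its hit set is always a subset of the corresponding AND result.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical toy inverted index: (access point, term) -> ids of matching records.
INDEX: Dict[Tuple[str, str], FrozenSet[int]] = {
    ("title", "information"): frozenset({1, 2, 3}),
    ("title", "retrieval"): frozenset({2, 3, 4}),
    ("title", "mozart"): frozenset({5}),
}


def evaluate(node: tuple, named_sets: Dict[str, FrozenSet[int]]) -> FrozenSet[int]:
    """Evaluate a tree of nested tuples:
    ("apt", access_point, term), ("set", name), or (op, left, right)
    where op is "and", "or" or "not"."""
    kind = node[0]
    if kind == "apt":
        return INDEX.get((node[1], node[2]), frozenset())
    if kind == "set":
        return named_sets[node[1]]  # KeyError if the result set does not exist
    left = evaluate(node[1], named_sets)
    right = evaluate(node[2], named_sets)
    if kind == "and":
        return left & right
    if kind == "or":
        return left | right
    if kind == "not":
        return left - right
    raise ValueError(f"unknown node type {kind!r}")


# Store a result set under the name "1", then reuse it as a leaf operand.
sets: Dict[str, FrozenSet[int]] = {}
sets["1"] = evaluate(("and", ("apt", "title", "information"), ("apt", "title", "retrieval")), sets)
print(sorted(sets["1"]))  # [2, 3]
print(sorted(evaluate(("or", ("set", "1"), ("apt", "title", "mozart")), sets)))  # [2, 3, 5]
```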
- Atomic queries (APT) + Atomic queries (&acro.apt;) Atomic queries are the query parts which work on one access point only. These consist of an attribute list followed by a single term or a quoted term list, and are often called - Attributes-Plus-Terms (APT) queries. + Attributes-Plus-Terms (&acro.apt;) queries. - Atomic (APT) queries are always leaf nodes in the PQF query tree. + Atomic (&acro.apt;) queries are always leaf nodes in the &acro.pqf; query tree. UN-supplied non-use attributes types 2-12 are either inherited from higher nodes in the query tree, or are set to &zebra;'s default values. See for details. - Atomic queries (APT) + Atomic queries (&acro.apt;) @@ -407,7 +396,7 @@ The scan operation is only supported with - atomic APT queries, as it is bound to one access point at a + atomic &acro.apt; queries, as it is bound to one access point at a time. Boolean query trees are not allowed during scan. @@ -429,8 +418,8 @@ Named result sets are supported in &zebra;, and result sets can be used as operands without limitations. It follows that named - result sets are leaf nodes in the PQF query tree, exactly as - atomic APT queries are. + result sets are leaf nodes in the &acro.pqf; query tree, exactly as + atomic &acro.apt; queries are. After the execution of a search, the result set is available at @@ -460,10 +449,10 @@ - Named result sets are only supported by the Z39.50 protocol. - The SRU web service is stateless, and therefore the notion of + Named result sets are only supported by the &acro.z3950; protocol. + The &acro.sru; web service is stateless, and therefore the notion of named result sets does not exist when accessing a &zebra; server by - the SRU protocol. + the &acro.sru; protocol. @@ -483,7 +472,7 @@ Finding all documents which have the term list "information - retrieval" in an &zebra; index, using it's internal full string + retrieval" in an &zebra; index, using its internal full string name. Scanning the same index. 
Z> find @attr 1=sometext "information retrieval" @@ -492,7 +481,7 @@ Searching or scanning - the bib-1 use attribute 54 using it's string name: + the bib-1 use attribute 54 using its string name: Z> find @attr 1=Code-language eng Z> scan @attr 1=Code-language "" @@ -501,7 +490,7 @@ It is possible to search in any silly string index - if it's defined in your - indexation rules and can be parsed by the PQF parser. + indexing rules and can be parsed by the &acro.pqf; parser. This is definitely not the recommended use of this facility, as it might confuse your users with some very unexpected results. @@ -512,14 +501,14 @@ See also for details, and - for the SRU PQF query extension using string names as a fast + for the &acro.sru; &acro.pqf; query extension using string names as a fast debugging facility.
&zebra;'s special access point of type 'XPath' - for GRS filters + for &acro.grs1; filters As we have seen above, it is possible (albeit seldom a great idea) to emulate @@ -528,18 +517,18 @@ string attributes which in appearance resemble XPath queries. There are two problems with this approach: first, the XPath-look-alike has to - be defined at indexation time, no new undefined + be defined at indexing time, no new undefined XPath queries can entered at search time, and second, it might confuse users very much that an XPath-alike index name in fact - gets populated from a possible entirely different XML element + gets populated from a possibly entirely different &acro.xml; element than it pretends to access. - When using the GRS Record Model + When using the &acro.grs1; Record Model (see ), we have the possibility to embed life XPath expressions - in the PQF queries, which are here called + in the &acro.pqf; queries, which are here called use (type 1) xpath attributes. You must enable the xpath enable directive in your @@ -549,14 +538,14 @@ Only a very restricted subset of the XPath 1.0 - standard is supported as the GRS record model is simpler than - a full XML DOM structure. See the following examples for + standard is supported as the &acro.grs1; record model is simpler than + a full &acro.xml; &acro.dom; structure. See the following examples for possibilities. Finding all documents which have the term "content" - inside a text node found in a specific XML DOM + inside a text node found in a specific &acro.xml; &acro.dom; subtree, whose starting element is addressed by XPath. @@ -586,7 +575,7 @@ Filter the addressing XPath by a predicate working on exact string values in - attributes (in the XML sense) can be done: return all those docs which + attributes (in the &acro.xml; sense) can be done: return all those docs which have the term "english" contained in one of all text sub nodes of the subtree defined by the XPath /record/title[@lang='en'].
And similar @@ -607,8 +596,8 @@ - Escaping PQF keywords and other non-parseable XPath constructs - with '{ }' to prevent client-side PQF parsing + Escaping &acro.pqf; keywords and other non-parseable XPath constructs + with '{ }' to prevent client-side &acro.pqf; parsing syntax errors: Z> find @attr {1=/root/first[@attr='danish']} content @@ -630,7 +619,7 @@
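Here is a small Python helper, shown as a hypothetical sketch, that builds such an atomic query with the XPath wrapped in `{ }` so that `@`, `[`, `=` and quotes inside it do not trip up a client-side PQF parser. It always adds the braces, which is assumed to be harmless for simple paths. It rejects XPaths that do not start with `/`, XPaths containing `}`, and terms containing double quotes, because it does not attempt any further escaping.

```python
def xpath_apt(xpath: str, term: str) -> str:
    """Build a PQF atomic query whose use attribute is a live XPath expression."""
    if not xpath.startswith("/"):
        raise ValueError("XPath access points must start with '/'")
    if "}" in xpath:
        raise ValueError("cannot brace-quote an XPath that contains '}'")
    if '"' in term:
        raise ValueError("this sketch does not escape double quotes in terms")
    # Quote multi-word terms so the PQF parser sees a single term.
    if any(ch.isspace() for ch in term):
        term = f'"{term}"'
    # Wrap "1=<xpath>" in braces so its characters are not parsed as PQF syntax.
    return f"@attr {{1={xpath}}} {term}"


print(xpath_apt("/root/first[@attr='danish']", "content"))
# @attr {1=/root/first[@attr='danish']} content
```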
Explain Attribute Set - The Z39.50 standard defines the + The &acro.z3950; standard defines the Explain attribute set Exp-1, which is used to discover information about a server's search semantics and functional capabilities @@ -644,11 +633,11 @@ In addition, the non-Use - Bib-1 attributes, that is, the types + &acro.bib1; attributes, that is, the types Relation, Position, Structure, Truncation, and Completeness are imported from - the Bib-1 attribute set, and may be used + the &acro.bib1; attribute set, and may be used within any explain query. @@ -669,7 +658,7 @@ See tab/explain.att and the - Z39.50 standard + &acro.z3950; standard for more information.
@@ -678,11 +667,11 @@ Explain searches with yaz-client Classic Explain only defines retrieval of Explain information - via ASN.1. Practically no Z39.50 clients supports this. Fortunately + via ASN.1. Practically no &acro.z3950; clients support this. Fortunately they don't have to - &zebra; allows retrieval of this information in other formats: - SUTRS, XML, - GRS-1 and ASN.1 Explain. + &acro.sutrs;, &acro.xml;, + &acro.grs1; and ASN.1 Explain. @@ -743,7 +732,7 @@ Default. This query is very useful to study the internal &zebra; indexes. If records have been indexed using the alvis - XSLT filter, the string representation names of the known indexes can be + &acro.xslt; filter, the string representation names of the known indexes can be found. Z> base IR-Explain-1 @@ -760,13 +749,13 @@
- Bib-1 Attribute Set + &acro.bib1; Attribute Set Most of the information contained in this section is an excerpt of - the ATTRIBUTE SET BIB-1 (Z39.50-1995) SEMANTICS - found at . The Bib-1 + the ATTRIBUTE SET &acro.bib1; (&acro.z3950;-1995) SEMANTICS + found at . The &acro.bib1; Attribute Set Semantics from 1995, also in an updated - Bib-1 + &acro.bib1; Attribute Set version from 2003. Index Data is not the copyright holder of this information, except for the configuration details, the listing of @@ -788,7 +777,7 @@ tab/gils.att. - For example, some few Bib-1 use + For example, some few &acro.bib1; use attributes from the tab/bib1.att are: att 1 Personal-name @@ -836,7 +825,7 @@
&zebra; general Bib1 Non-Use Attributes (type 2-6) - +
Relation Attributes (type 2) @@ -922,7 +911,7 @@ The relation attributes 1-5 are supported and work exactly as expected. All ordering operations are based on a lexicographical ordering, - expect when the + except when the structure attribute numeric (109) is used. In this case, ordering is numerical. See . @@ -979,7 +968,7 @@ AlwaysMatches (103) is a great way to discover how many documents have been indexed in a given field. The search term is ignored, but needed for correct - PQF syntax. An empty search term may be supplied. + &acro.pqf; syntax. An empty search term may be supplied. Z> find @attr 1=Title @attr 2=103 "" Z> find @attr 1=Title @attr 2=103 @attr 4=1 "" @@ -1039,7 +1028,7 @@
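The practical difference between lexicographic and numeric ordering can be shown with a short Python sketch. Plain Python string comparison stands in for Zebra's lexicographic ordering, and `int` comparison stands in for the numeric structure attribute (109). Zebra's own ordering may also depend on its character normalization, which this sketch ignores. The `less_than` helper is hypothetical and models relation 1 (less than).

```python
def less_than(values, bound, numeric=False):
    """Model relation 'less than' (@attr 2=1): string comparison by default,
    integer comparison when numeric=True (standing in for @attr 4=109)."""
    key = int if numeric else str
    return [v for v in values if key(v) < key(bound)]


values = ["9", "10", "100", "2"]
print(sorted(values))                        # ['10', '100', '2', '9']
print(sorted(values, key=int))               # ['2', '9', '10', '100']
print(less_than(values, "9"))                # ['10', '100', '2']
print(less_than(values, "9", numeric=True))  # ['2']
```

Under lexicographic ordering, "10" and "100" sort before "9". This is why range searches on numbers need the numeric structure attribute.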
- +
Structure Attributes (type 4) @@ -1053,8 +1042,7 @@ The possible values of the structure attribute (type 4) can be defined - using the configuration file - tab/default.idx. + using the configuration file tab/default.idx. The default configuration is summarized in this table. @@ -1152,14 +1140,13 @@
- The structure attribute values Word list (6) is supported, and maps to the boolean AND combination of words supplied. The word list is useful when - google-like bag-of-word queries need to be translated from a GUI - query language to PQF. For example, the following queries + Google-like bag-of-word queries need to be translated from a GUI + query language to &acro.pqf;. For example, the following queries are equivalent: Z> find @attr 1=Title @attr 4=6 "mozart amadeus" @@ -1185,7 +1172,7 @@ Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman" - + The structure attribute value Local number (107) @@ -1213,13 +1200,14 @@ - The exact mapping between PQF queries and &zebra; internal indexes + The exact mapping between &acro.pqf; queries and &zebra; internal indexes and index types is explained in .
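The word-list equivalence can be illustrated with a small Python sketch that rewrites a multi-word term into an explicit nested `@and` query. The helper name is hypothetical. It assumes the words are whitespace-separated and contain no PQF special characters, and it repeats the use attribute on every leaf instead of relying on attribute inheritance. Zebra performs this expansion itself when given `@attr 4=6`, so the sketch only illustrates the semantics.

```python
def word_list_to_and(use_attr: str, words: str) -> str:
    """Expand a word-list term into the equivalent nested @and PQF query."""
    terms = words.split()
    if not terms:
        raise ValueError("empty word list")
    apts = [f"@attr 1={use_attr} {term}" for term in terms]
    # Prefix notation: n operands need n - 1 leading @and operators.
    return "@and " * (len(apts) - 1) + " ".join(apts)


print(word_list_to_and("Title", "mozart amadeus"))
# @and @attr 1=Title mozart @attr 1=Title amadeus
```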
+
Truncation Attributes (type = 5) @@ -1407,27 +1395,27 @@ search and scan in index type="p". - The Complete subfield (2) is a reminiscens - from the happy MARC + The Complete subfield (2) is a remnant + from the happy &acro.marc; binary format days. &zebra; does not support it, but maps silently to Complete field (3). - The exact mapping between PQF queries and &zebra; internal indexes + The exact mapping between &acro.pqf; queries and &zebra; internal indexes and index types is explained in .
-
- +
+
- Extended &zebra; RPN Features + Extended &zebra; &acro.rpn; Features The &zebra; internal query engine has been extended to specific needs not covered by the bib-1 attribute set query @@ -1478,7 +1466,7 @@
- &zebra; Extension Approximative Limit (type 11) + &zebra; Extension Approximative Limit (type 12) - The &zebra; Extension Approximative Limit (type 11) is a way to + The &zebra; Extension Approximative Limit (type 12) is a way to enable approximate hit counts for scan hit counts, in the same way as for search hit counts. @@ -1821,14 +1814,14 @@
- &zebra; special IDXPATH Attribute Set for GRS indexing + &zebra; special &acro.idxpath; Attribute Set for &acro.grs1; indexing The attribute-set idxpath consists of a single Use (type 1) attribute. All non-use attributes behave as normal. This feature is enabled when defining the - xpath enable option in the GRS filter + xpath enable option in the &acro.grs1; filter *.abs configuration files. If one wants to use the special idxpath numeric attribute set, the main &zebra; configuration file zebra.cfg @@ -1843,10 +1836,10 @@
- IDXPATH Use Attributes (type = 1) + &acro.idxpath; Use Attributes (type = 1) - This attribute set allows one to search GRS filter indexed - records by XPATH like structured index names. + This attribute set allows one to search &acro.grs1; filter indexed + records by &acro.xpath;-like structured index names. @@ -1857,11 +1850,11 @@ - &zebra; specific IDXPATH Use Attributes (type 1) + &zebra; specific &acro.idxpath; Use Attributes (type 1) - IDXPATH + &acro.idxpath; Value String Index Notes @@ -1869,31 +1862,31 @@ - XPATH Begin + &acro.xpath; Begin 1 _XPATH_BEGIN deprecated - XPATH End + &acro.xpath; End 2 _XPATH_END deprecated - XPATH CData + &acro.xpath; CData 1016 _XPATH_CDATA deprecated - XPATH Attribute Name + &acro.xpath; Attribute Name 3 _XPATH_ATTR_NAME deprecated - XPATH Attribute CData + &acro.xpath; Attribute CData 1015 _XPATH_ATTR_CDATA deprecated @@ -1916,7 +1909,7 @@ - Search for all documents where specific nested XPATH + Search for all documents where specific nested &acro.xpath; /c1/c2/../cn exists. Notice the very counter-intuitive reverse notation! @@ -1940,8 +1933,8 @@ - Search for all documents with have an XML element node - including an XML attribute named creator + Search for all documents which have an &acro.xml; element node + including an &acro.xml; attribute named creator Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator @@ -1971,10 +1964,10 @@
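The reverse notation mentioned above is easy to get wrong by hand, so here is a Python sketch that builds the query from a normal top-down path. It assumes that the idxpath term lists the element names from the innermost element to the outermost, joined by `/`, and that the query uses value 1 (XPATH Begin) with structure `@attr 4=3`, following the table and the `creator` example above. Both are inferences rather than confirmed syntax, and the table marks these attributes as deprecated. The helper name is hypothetical.

```python
def idxpath_exists_query(xpath: str) -> str:
    """Build a (deprecated) idxpath query testing that a nested element path exists."""
    steps = [step for step in xpath.split("/") if step]
    if not steps:
        raise ValueError("empty element path")
    # Assumed reverse notation: innermost element first, root element last.
    reversed_path = "/".join(reversed(steps))
    # Use value 1 (XPATH Begin) and structure 4=3 (key), as assumed above.
    return f"@attrset idxpath @attr 1=1 @attr 4=3 {reversed_path}"


print(idxpath_exists_query("/c1/c2/c3"))
# @attrset idxpath @attr 1=1 @attr 4=3 c3/c2/c1
```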
- Mapping from PQF atomic APT queries to &zebra; internal + <title>Mapping from &acro.pqf; atomic &acro.apt; queries to &zebra; internal register indexes - The rules for PQF APT mapping are rather tricky to grasp in the + The rules for &acro.pqf; &acro.apt; mapping are rather tricky to grasp in the first place. We deal first with the rules for deciding which internal register or string index to use, according to the use attribute or access point specified in the query. Thereafter we @@ -1983,12 +1976,12 @@
- Mapping of PQF APT access points + Mapping of &acro.pqf; &acro.apt; access points &zebra; understands four fundamental different types of access points, of which only the numeric use attribute type access points - are defined by the Z39.50 + are defined by the &acro.z3950; standard. All other access point types are &zebra; specific, and non-portable. @@ -2024,10 +2017,10 @@ hardwired internal string index name - XPATH special index + &acro.xpath; special index XPath /.* - special xpath search for GRS indexed records + special xpath search for &acro.grs1; indexed records
@@ -2045,14 +2038,14 @@ Numeric use attributes are mapped to the &zebra; internal string index according to the attribute set definition in use. - The default attribute set is Bib-1, and may be - omitted in the PQF query. + The default attribute set is &acro.bib1;, and may be + omitted in the &acro.pqf; query. According to normalization and numeric use attribute mapping, it follows that the following - PQF queries are considered equivalent (assuming the default + &acro.pqf; queries are considered equivalent (assuming the default configuration has not been altered): Z> find @attr 1=Body-of-text serenade @@ -2060,7 +2053,7 @@ Z> find @attr 1=BodyOfText serenade Z> find @attr 1=bO-d-Y-of-tE-x-t serenade Z> find @attr 1=1010 serenade - Z> find @attrset Bib-1 @attr 1=1010 serenade + Z> find @attrset bib1 @attr 1=1010 serenade Z> find @attrset bib1 @attr 1=1010 serenade Z> find @attrset Bib1 @attr 1=1010 serenade Z> find @attrset b-I-b-1 @attr 1=1010 serenade @@ -2076,7 +2069,7 @@ fields as specified in the .abs file which describes the profile of the records which have been loaded. If no use attribute is provided, a default of - Bib-1 Use Any (1016) is assumed. + &acro.bib1; Use Any (1016) is assumed. The predefined use attribute sets can be reconfigured by tweaking the configuration files tab/*.att, and @@ -2090,9 +2083,9 @@ ignored. The above mentioned name normalization applies. String index names are defined in the used indexing filter configuration files, for example in the - GRS + &acro.grs1; *.abs configuration files, or in the - alvis filter XSLT indexing stylesheets. + alvis filter &acro.xslt; indexing stylesheets. @@ -2105,12 +2098,12 @@ - Finally, XPATH access points are only - available using the GRS filter for indexing. + Finally, &acro.xpath; access points are only + available using the &acro.grs1; filter for indexing. 
These access point names must start with the character '/', they are not normalized, but passed unaltered to the &zebra; internal - XPATH engine. See . + &acro.xpath; engine. See . @@ -2119,10 +2112,10 @@
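The equivalent spellings listed above (for example `Body-of-text`, `BodyOfText` and `bO-d-Y-of-tE-x-t`, or `bib1`, `Bib1` and `b-I-b-1`) are consistent with a normalization rule that drops hyphens and folds letters to lower case. The following Python sketch implements that rule. The rule is inferred from the examples and may differ in detail from Zebra's actual code. The function name is hypothetical. XPath access points, which start with `/`, are passed through unaltered as described above.

```python
def normalize_name(name: str) -> str:
    """Assumed name normalization: drop hyphens and fold to lower case.
    XPath access points (starting with '/') are left untouched."""
    if name.startswith("/"):
        return name
    return name.replace("-", "").lower()


print({normalize_name(v) for v in ["Body-of-text", "BodyOfText", "bO-d-Y-of-tE-x-t"]})
# {'bodyoftext'}
print({normalize_name(s) for s in ["bib1", "Bib1", "b-I-b-1"]})
# {'bib1'}
```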
- Mapping of PQF APT structure and completeness to + <title>Mapping of &acro.pqf; &acro.apt; structure and completeness to register type - Internally &zebra; has in it's default configuration several + Internally &zebra; has in its default configuration several different types of registers or indexes, whose tokenization and character normalization rules differ. This reflects the fact that searching fundamental different tokens like dates, numbers, @@ -2172,26 +2165,26 @@ numeric (@attr 4=109) ignored - Numeric ('u') + Numeric ('n') Special index for digital numbers key (@attr 4=3) ignored Null bitmap ('0') - Used for non-tokenizated and non-normalized bit sequences + Used for non-tokenized and non-normalized bit sequences year (@attr 4=4) ignored Year ('y') - Non-tokenizated and non-normalized 4 digit numbers + Non-tokenized and non-normalized 4 digit numbers date (@attr 4=5) ignored Date ('d') - Non-tokenizated and non-normalized ISO date strings + Non-tokenized and non-normalized ISO date strings ignored @@ -2220,7 +2213,7 @@ against the contents of the phrase (long word) register, if one exists for the given Use attribute. A phrase register is created for those fields in the - GRS *.abs file that contains a + &acro.grs1; *.abs file that contains a p-specifier. Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven @@ -2247,7 +2240,7 @@ contains multiple words, the term will only match if all of the words are found immediately adjacent, and in the given order. The word search is performed on those fields that are indexed as - type w in the GRS *.abs file. + type w in the &acro.grs1; *.abs file. Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven ... @@ -2274,14 +2267,14 @@ natural-language, relevance-ranked query. This search type uses the word register, i.e. those fields that are indexed as type w in the - GRS *.abs file. + &acro.grs1; *.abs file. If the Structure attribute is Numeric String the term is treated as an integer. 
The search is performed on those fields that are indexed - as type n in the GRS + as type n in the &acro.grs1; *.abs file. @@ -2452,20 +2445,20 @@
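The register selection described in this section can be summarized as a small lookup. The Python sketch below covers only the rows discussed here: phrase (4=1) split by completeness, key (3), year (4), date (5) and numeric (109). It raises an error for every other structure value rather than guessing. It follows the earlier statement that Complete subfield (2) is treated as Complete field (3). It also assumes that completeness defaults to 1 (incomplete subfield) when none is given. The function and table names are hypothetical.

```python
# Register type letters for the structure attribute (type 4) values that map
# to a single register regardless of completeness (type 6).
SIMPLE_STRUCTURES = {
    3: "0",    # key: null bitmap register
    4: "y",    # year
    5: "d",    # date
    109: "n",  # numeric
}


def register_type(structure: int, completeness: int = 1) -> str:
    """Pick a register type from structure (4=...) and completeness (6=...)."""
    if structure == 1:  # phrase
        # Complete subfield (2) is treated like complete field (3) and uses
        # the phrase register; incomplete subfield (1) uses the word register.
        return "p" if completeness in (2, 3) else "w"
    if structure in SIMPLE_STRUCTURES:
        return SIMPLE_STRUCTURES[structure]
    raise ValueError(f"structure {structure} is not covered by this sketch")


print(register_type(1, 3), register_type(1, 1), register_type(109))  # p w n
```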
- Server Side CQL to PQF Query Translation + Server Side &acro.cql; to &acro.pqf; Query Translation Using the <cql2rpn>l2rpn.txt</cql2rpn> - YAZ Frontend Virtual + &yaz; Frontend Virtual Hosts option, one can configure - the YAZ Frontend CQL-to-PQF + the &yaz; Frontend &acro.cql;-to-&acro.pqf; converter, specifying the interpretation of various - CQL + &acro.cql; indexes, relations, etc. in terms of Type-1 query attributes. - For example, using server-side CQL-to-PQF conversion, one might + For example, using server-side &acro.cql;-to-&acro.pqf; conversion, one might query a zebra server like this: and - if properly configured - even static relevance ranking can - be performed using CQL query syntax: + be performed using &acro.cql; query syntax: find text = /relevant (plant and soil) @@ -2485,7 +2478,7 @@ By the way, the same configuration can be used to - search using client-side CQL-to-PQF conversion: + search using client-side &acro.cql;-to-&acro.pqf; conversion: (the only difference is querytype cql2rpn instead of querytype cql, and the call specifying a local @@ -2501,9 +2494,8 @@ Exhaustive information can be found in the - Section "Specification of CQL to RPN mappings" in the YAZ manual. - , - and shall therefore not be repeated here. + Section &acro.cql; to &acro.rpn; conversion + in the &yaz; manual.