X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=d7a7f8489ae6cbc385672497dc1fbcea1ecf9aeb;hp=b3df12e3da4e077b27ca229810020a11ddae71b2;hb=ba0720e26f508ba3396e232d2f82037c0e701698;hpb=b19b79e382ef8196f1625763db1af3a82b1e0c81 diff --git a/doc/querymodel.xml b/doc/querymodel.xml index b3df12e..d7a7f84 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,5 +1,4 @@ - Query Model
@@ -11,18 +10,18 @@ &zebra; is born as a networking Information Retrieval engine adhering to the international standards - &z3950; and - &sru;, + &acro.z3950; and + &acro.sru;, and implement the - type-1 Reverse Polish Notation (&rpn;) query + type-1 Reverse Polish Notation (&acro.rpn;) query model defined there. Unfortunately, this model has only defined a binary encoded representation, which is used as transport packaging in - the &z3950; protocol layer. This representation is not human + the &acro.z3950; protocol layer. This representation is not human readable, nor defines any convenient way to specify queries. - Since the type-1 (&rpn;) + Since the type-1 (&acro.rpn;) query structure has no direct, useful string representation, every client application needs to provide some form of mapping from a local query notation or representation to it. @@ -30,33 +29,33 @@
- Prefix Query Format (&pqf;) + Prefix Query Format (&acro.pqf;) Index Data has defined a textual representation in the Prefix Query Format, short - &pqf;, which maps + &acro.pqf;, which maps one-to-one to binary encoded - type-1 &rpn; queries. - &pqf; has been adopted by other - parties developing &z3950; software, and is often referred to as + type-1 &acro.rpn; queries. + &acro.pqf; has been adopted by other + parties developing &acro.z3950; software, and is often referred to as Prefix Query Notation, or in short - &pqn;. See + &acro.pqn;. See for further explanations and descriptions of &zebra;'s capabilities.
- Common Query Language (&cql;) + Common Query Language (&acro.cql;) - The query model of the type-1 &rpn;, - expressed in &pqf;/&pqn; is natively supported. - On the other hand, the default &sru; + The query model of the type-1 &acro.rpn;, + expressed in &acro.pqf;/&acro.pqn; is natively supported. + On the other hand, the default &acro.sru; web services Common Query Language - &cql; is not natively supported. + &acro.cql; is not natively supported. - &zebra; can be configured to understand and map &cql; to &pqf;. See + &zebra; can be configured to understand and map &acro.cql; to &acro.pqf;. See .
@@ -67,7 +66,7 @@ Operation types &zebra; supports all of the three different - &z3950;/&sru; operations defined in the + &acro.z3950;/&acro.sru; operations defined in the standards: explain, search, and scan. A short description of the functionality and purpose of each is quite in order here. @@ -76,7 +75,7 @@
Explain Operation - The syntax of &z3950;/&sru; queries is + The syntax of &acro.z3950;/&acro.sru; queries is well known to any client, but the specific semantics - taking into account a particular servers functionalities and abilities - must be @@ -89,15 +88,15 @@ of the general query model are supported. - The &z3950; embeds the explain operation + The &acro.z3950; embeds the explain operation by performing a search in the magic IR-Explain-1 database; see . - In &sru;, explain is an entirely separate - operation, which returns an ZeeRex &xml; record according to the + In &acro.sru;, explain is an entirely separate + operation, which returns an ZeeRex &acro.xml; record according to the structure defined by the protocol. @@ -117,7 +116,7 @@ simple free text searches to nested complex boolean queries, targeting specific indexes, and possibly enhanced with many query semantic specifications. Search interactions are the heart - and soul of &z3950;/&sru; servers. + and soul of &acro.z3950;/&acro.sru; servers.
@@ -143,26 +142,25 @@
-
- &rpn; queries and semantics + &acro.rpn; queries and semantics - The &pqf; grammar + The &acro.pqf; grammar is documented in the &yaz; manual, and shall not be - repeated here. This textual &pqf; representation - is not transmistted to &zebra; during search, but it is in the - client mapped to the equivalent &z3950; binary + repeated here. This textual &acro.pqf; representation + is not transmitted to &zebra; during search, but it is in the + client mapped to the equivalent &acro.z3950; binary query parse tree.
- &rpn; tree structure + &acro.rpn; tree structure - The &rpn; parse tree - or the equivalent textual representation in &pqf; - + The &acro.rpn; parse tree - or the equivalent textual representation in &acro.pqf; - may start with one specification of the attribute set used. Following is a query tree, which - consists of atomic query parts (&apt;) or + consists of atomic query parts (&acro.apt;) or named result sets, eventually paired by boolean binary operators, and finally recursively combined into @@ -184,7 +182,7 @@ Attribute set - &pqf; notation (Short hand) + &acro.pqf; notation (Short hand) Status Notes @@ -201,11 +199,11 @@ predefined - &bib1; + &acro.bib1; bib-1 - Standard &pqf; query language attribute set which defines the - semantics of &z3950; searching. In addition, all of the - non-use attributes (types 2-12) define the hard-wired + Standard &acro.pqf; query language attribute set which defines the + semantics of &acro.z3950; searching. In addition, all of the + non-use attributes (types 2-14) define the hard-wired &zebra; internal query processing. default @@ -213,18 +211,9 @@ GILS gils - Extension to the &bib1; attribute set. + Extension to the &acro.bib1; attribute set. predefined - @@ -238,7 +227,7 @@ The &zebra; internal query processing is modeled after - the &bib1; attribute set, and the non-use + the &acro.bib1; attribute set, and the non-use attributes type 2-6 are hard-wired in. It is therefore essential to be familiar with . @@ -317,7 +306,7 @@ retrieval, taking proximity into account: The hit set is a subset of the corresponding AND query - (see the &pqf; grammar for + (see the &acro.pqf; grammar for details on the proximity operator): Z> find @prox 0 3 0 2 k 2 information retrieval @@ -338,23 +327,23 @@
- Atomic queries (&apt;) + Atomic queries (&acro.apt;) Atomic queries are the query parts which work on one access point only. These consist of an attribute list followed by a single term or a quoted term list, and are often called - Attributes-Plus-Terms (&apt;) queries. + Attributes-Plus-Terms (&acro.apt;) queries. - Atomic (&apt;) queries are always leaf nodes in the &pqf; query tree. + Atomic (&acro.apt;) queries are always leaf nodes in the &acro.pqf; query tree. UN-supplied non-use attributes types 2-12 are either inherited from higher nodes in the query tree, or are set to &zebra;'s default values. See for details. - Atomic queries (&apt;) + Atomic queries (&acro.apt;) @@ -407,7 +396,7 @@ The scan operation is only supported with - atomic &apt; queries, as it is bound to one access point at a + atomic &acro.apt; queries, as it is bound to one access point at a time. Boolean query trees are not allowed during scan. @@ -429,8 +418,8 @@ Named result sets are supported in &zebra;, and result sets can be used as operands without limitations. It follows that named - result sets are leaf nodes in the &pqf; query tree, exactly as - atomic &apt; queries are. + result sets are leaf nodes in the &acro.pqf; query tree, exactly as + atomic &acro.apt; queries are. After the execution of a search, the result set is available at @@ -460,10 +449,10 @@ - Named result sets are only supported by the &z3950; protocol. - The &sru; web service is stateless, and therefore the notion of + Named result sets are only supported by the &acro.z3950; protocol. + The &acro.sru; web service is stateless, and therefore the notion of named result sets does not exist when accessing a &zebra; server by - the &sru; protocol. + the &acro.sru; protocol. @@ -483,7 +472,7 @@ Finding all documents which have the term list "information - retrieval" in an &zebra; index, using it's internal full string + retrieval" in an &zebra; index, using its internal full string name. Scanning the same index. 
Z> find @attr 1=sometext "information retrieval" @@ -492,7 +481,7 @@ Searching or scanning - the bib-1 use attribute 54 using it's string name: + the bib-1 use attribute 54 using its string name: Z> find @attr 1=Code-language eng Z> scan @attr 1=Code-language "" @@ -501,7 +490,7 @@ It is possible to search in any silly string index - if it's defined in your - indexation rules and can be parsed by the &pqf; parser. + indexing rules and can be parsed by the &acro.pqf; parser. This is definitely not the recommended use of this facility, as it might confuse your users with some very unexpected results. @@ -512,14 +501,14 @@ See also for details, and - for the &sru; &pqf; query extension using string names as a fast + for the &acro.sru; &acro.pqf; query extension using string names as a fast debugging facility.
&zebra;'s special access point of type 'XPath' - for &grs1; filters + for &acro.grs1; filters As we have seen above, it is possible (albeit seldom a great idea) to emulate @@ -528,18 +517,18 @@ string attributes which in appearance resemble XPath queries. There are two problems with this approach: first, the XPath-look-alike has to - be defined at indexation time, no new undefined + be defined at indexing time, no new undefined XPath queries can entered at search time, and second, it might confuse users very much that an XPath-alike index name in fact - gets populated from a possible entirely different &xml; element + gets populated from a possible entirely different &acro.xml; element than it pretends to access. - When using the &grs1; Record Model + When using the &acro.grs1; Record Model (see ), we have the possibility to embed life XPath expressions - in the &pqf; queries, which are here called + in the &acro.pqf; queries, which are here called use (type 1) xpath attributes. You must enable the xpath enable directive in your @@ -549,14 +538,14 @@ Only a very restricted subset of the XPath 1.0 - standard is supported as the &grs1; record model is simpler than - a full &xml; &dom; structure. See the following examples for + standard is supported as the &acro.grs1; record model is simpler than + a full &acro.xml; &acro.dom; structure. See the following examples for possibilities. Finding all documents which have the term "content" - inside a text node found in a specific &xml; &dom; + inside a text node found in a specific &acro.xml; &acro.dom; subtree, whose starting element is addressed by XPath. 
@@ -586,7 +575,7 @@ Filter the addressing XPath by a predicate working on exact string values in - attributes (in the &xml; sense) can be done: return all those docs which + attributes (in the &acro.xml; sense) can be done: return all those docs which have the term "english" contained in one of all text sub nodes of the subtree defined by the XPath /record/title[@lang='en']. And similar @@ -607,8 +596,8 @@ - Escaping &pqf; keywords and other non-parseable XPath constructs - with '{ }' to prevent client-side &pqf; parsing + Escaping &acro.pqf; keywords and other non-parseable XPath constructs + with '{ }' to prevent client-side &acro.pqf; parsing syntax errors: Z> find @attr {1=/root/first[@attr='danish']} content @@ -630,7 +619,7 @@
Explain Attribute Set - The &z3950; standard defines the + The &acro.z3950; standard defines the Explain attribute set Exp-1, which is used to discover information about a server's search semantics and functional capabilities @@ -644,11 +633,11 @@ In addition, the non-Use - &bib1; attributes, that is, the types + &acro.bib1; attributes, that is, the types Relation, Position, Structure, Truncation, and Completeness are imported from - the &bib1; attribute set, and may be used + the &acro.bib1; attribute set, and may be used within any explain query. @@ -669,7 +658,7 @@ See tab/explain.att and the - &z3950; standard + &acro.z3950; standard for more information.
@@ -678,11 +667,11 @@ Explain searches with yaz-client Classic Explain only defines retrieval of Explain information - via ASN.1. Practically no &z3950; clients supports this. Fortunately + via ASN.1. Practically no &acro.z3950; clients supports this. Fortunately they don't have to - &zebra; allows retrieval of this information in other formats: - &sutrs;, &xml;, - &grs1; and ASN.1 Explain. + &acro.sutrs;, &acro.xml;, + &acro.grs1; and ASN.1 Explain. @@ -743,7 +732,7 @@ Default. This query is very useful to study the internal &zebra; indexes. If records have been indexed using the alvis - &xslt; filter, the string representation names of the known indexes can be + &acro.xslt; filter, the string representation names of the known indexes can be found. Z> base IR-Explain-1 @@ -760,13 +749,13 @@
- &bib1; Attribute Set + &acro.bib1; Attribute Set Most of the information contained in this section is an excerpt of - the ATTRIBUTE SET &bib1; (&z3950;-1995) SEMANTICS - found at . The &bib1; + the ATTRIBUTE SET &acro.bib1; (&acro.z3950;-1995) SEMANTICS + found at . The &acro.bib1; Attribute Set Semantics from 1995, also in an updated - &bib1; + &acro.bib1; Attribute Set version from 2003. Index Data is not the copyright holder of this information, except for the configuration details, the listing of @@ -788,7 +777,7 @@ tab/gils.att. - For example, some few &bib1; use + For example, some few &acro.bib1; use attributes from the tab/bib1.att are: att 1 Personal-name @@ -836,7 +825,7 @@
&zebra; general Bib1 Non-Use Attributes (type 2-6) - +
Relation Attributes (type 2) @@ -922,7 +911,7 @@ The relation attributes 1-5 are supported and work exactly as expected. All ordering operations are based on a lexicographical ordering, - expect when the + except when the structure attribute numeric (109) is used. In this case, ordering is numerical. See . @@ -979,7 +968,7 @@ AlwaysMatches (103) is a great way to discover how many documents have been indexed in a given field. The search term is ignored, but needed for correct - &pqf; syntax. An empty search term may be supplied. + &acro.pqf; syntax. An empty search term may be supplied. Z> find @attr 1=Title @attr 2=103 "" Z> find @attr 1=Title @attr 2=103 @attr 4=1 "" @@ -1039,7 +1028,7 @@
- +
Structure Attributes (type 4) @@ -1053,8 +1042,7 @@ The possible values of the structure attribute (type 4) can be defined - using the configuration file - tab/default.idx. + using the configuration file tab/default.idx. The default configuration is summarized in this table. @@ -1152,14 +1140,13 @@
- The structure attribute values Word list (6) is supported, and maps to the boolean AND combination of words supplied. The word list is useful when - google-like bag-of-word queries need to be translated from a GUI - query language to &pqf;. For example, the following queries + Google-like bag-of-word queries need to be translated from a GUI + query language to &acro.pqf;. For example, the following queries are equivalent: Z> find @attr 1=Title @attr 4=6 "mozart amadeus" @@ -1185,7 +1172,7 @@ Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman" - + The structure attribute value Local number (107) @@ -1213,13 +1200,14 @@ - The exact mapping between &pqf; queries and &zebra; internal indexes + The exact mapping between &acro.pqf; queries and &zebra; internal indexes and index types is explained in .
+
Truncation Attributes (type = 5) @@ -1407,27 +1395,27 @@ search and scan in index type="p". - The Complete subfield (2) is a reminiscens - from the happy &marc; + The Complete subfield (2) is a reminiscent + from the happy &acro.marc; binary format days. &zebra; does not support it, but maps silently to Complete field (3). - The exact mapping between &pqf; queries and &zebra; internal indexes + The exact mapping between &acro.pqf; queries and &zebra; internal indexes and index types is explained in .
-
- +
+
- Extended &zebra; &rpn; Features + Extended &zebra; &acro.rpn; Features The &zebra; internal query engine has been extended to specific needs not covered by the bib-1 attribute set query @@ -1478,7 +1466,7 @@
- &zebra; Extension Approximative Limit (type 11) + &zebra; Extension Approximative Limit (type 12) - The &zebra; Extension Approximative Limit (type 11) is a way to + The &zebra; Extension Approximative Limit (type 12) is a way to enable approximate hit counts for scan hit counts, in the same way as for search hit counts. @@ -1821,14 +1814,14 @@
- &zebra; special &idxpath; Attribute Set for &grs1; indexing + &zebra; special &acro.idxpath; Attribute Set for &acro.grs1; indexing The attribute-set idxpath consists of a single Use (type 1) attribute. All non-use attributes behave as normal. This feature is enabled when defining the - xpath enable option in the &grs1; filter + xpath enable option in the &acro.grs1; filter *.abs configuration files. If one wants to use the special idxpath numeric attribute set, the main &zebra; configuration file zebra.cfg @@ -1843,10 +1836,10 @@
- &idxpath; Use Attributes (type = 1) + &acro.idxpath; Use Attributes (type = 1) - This attribute set allows one to search &grs1; filter indexed - records by &xpath; like structured index names. + This attribute set allows one to search &acro.grs1; filter indexed + records by &acro.xpath; like structured index names. @@ -1857,11 +1850,11 @@ - &zebra; specific &idxpath; Use Attributes (type 1) + &zebra; specific &acro.idxpath; Use Attributes (type 1) - &idxpath; + &acro.idxpath; Value String Index Notes @@ -1869,31 +1862,31 @@ - &xpath; Begin + &acro.xpath; Begin 1 _XPATH_BEGIN deprecated - &xpath; End + &acro.xpath; End 2 _XPATH_END deprecated - &xpath; CData + &acro.xpath; CData 1016 _XPATH_CDATA deprecated - &xpath; Attribute Name + &acro.xpath; Attribute Name 3 _XPATH_ATTR_NAME deprecated - &xpath; Attribute CData + &acro.xpath; Attribute CData 1015 _XPATH_ATTR_CDATA deprecated @@ -1916,7 +1909,7 @@ - Search for all documents where specific nested &xpath; + Search for all documents where specific nested &acro.xpath; /c1/c2/../cn exists. Notice the very counter-intuitive reverse notation! @@ -1940,8 +1933,8 @@ - Search for all documents with have an &xml; element node - including an &xml; attribute named creator + Search for all documents with have an &acro.xml; element node + including an &acro.xml; attribute named creator Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator @@ -1971,10 +1964,10 @@
- Mapping from &pqf; atomic &apt; queries to &zebra; internal + Mapping from &acro.pqf; atomic &acro.apt; queries to &zebra; internal register indexes - The rules for &pqf; &apt; mapping are rather tricky to grasp in the + The rules for &acro.pqf; &acro.apt; mapping are rather tricky to grasp in the first place. We deal first with the rules for deciding which internal register or string index to use, according to the use attribute or access point specified in the query. Thereafter we @@ -1983,12 +1976,12 @@
- Mapping of &pqf; &apt; access points + Mapping of &acro.pqf; &acro.apt; access points &zebra; understands four fundamental different types of access points, of which only the numeric use attribute type access points - are defined by the &z3950; + are defined by the &acro.z3950; standard. All other access point types are &zebra; specific, and non-portable. @@ -2024,10 +2017,10 @@ hardwired internal string index name - &xpath; special index + &acro.xpath; special index XPath /.* - special xpath search for &grs1; indexed records + special xpath search for &acro.grs1; indexed records
@@ -2045,14 +2038,14 @@ Numeric use attributes are mapped to the &zebra; internal string index according to the attribute set definition in use. - The default attribute set is &bib1;, and may be - omitted in the &pqf; query. + The default attribute set is &acro.bib1;, and may be + omitted in the &acro.pqf; query. According to normalization and numeric use attribute mapping, it follows that the following - &pqf; queries are considered equivalent (assuming the default + &acro.pqf; queries are considered equivalent (assuming the default configuration has not been altered): Z> find @attr 1=Body-of-text serenade @@ -2060,7 +2053,7 @@ Z> find @attr 1=BodyOfText serenade Z> find @attr 1=bO-d-Y-of-tE-x-t serenade Z> find @attr 1=1010 serenade - Z> find @attrset &bib1; @attr 1=1010 serenade + Z> find @attrset bib1 @attr 1=1010 serenade Z> find @attrset bib1 @attr 1=1010 serenade Z> find @attrset Bib1 @attr 1=1010 serenade Z> find @attrset b-I-b-1 @attr 1=1010 serenade @@ -2076,7 +2069,7 @@ fields as specified in the .abs file which describes the profile of the records which have been loaded. If no use attribute is provided, a default of - &bib1; Use Any (1016) is assumed. + &acro.bib1; Use Any (1016) is assumed. The predefined use attribute sets can be reconfigured by tweaking the configuration files tab/*.att, and @@ -2090,9 +2083,9 @@ ignored. The above mentioned name normalization applies. String index names are defined in the used indexing filter configuration files, for example in the - &grs1; + &acro.grs1; *.abs configuration files, or in the - alvis filter &xslt; indexing stylesheets. + alvis filter &acro.xslt; indexing stylesheets. @@ -2105,12 +2098,12 @@ - Finally, &xpath; access points are only - available using the &grs1; filter for indexing. + Finally, &acro.xpath; access points are only + available using the &acro.grs1; filter for indexing. 
These access point names must start with the character '/', they are not normalized, but passed unaltered to the &zebra; internal - &xpath; engine. See . + &acro.xpath; engine. See . @@ -2119,10 +2112,10 @@
- Mapping of &pqf; &apt; structure and completeness to + Mapping of &acro.pqf; &acro.apt; structure and completeness to register type - Internally &zebra; has in it's default configuration several + Internally &zebra; has in its default configuration several different types of registers or indexes, whose tokenization and character normalization rules differ. This reflects the fact that searching fundamental different tokens like dates, numbers, @@ -2172,26 +2165,26 @@ numeric (@attr 4=109) ignored - Numeric ('u') + Numeric ('n') Special index for digital numbers key (@attr 4=3) ignored Null bitmap ('0') - Used for non-tokenizated and non-normalized bit sequences + Used for non-tokenized and non-normalized bit sequences year (@attr 4=4) ignored Year ('y') - Non-tokenizated and non-normalized 4 digit numbers + Non-tokenized and non-normalized 4 digit numbers date (@attr 4=5) ignored Date ('d') - Non-tokenizated and non-normalized ISO date strings + Non-tokenized and non-normalized ISO date strings ignored @@ -2220,7 +2213,7 @@ against the contents of the phrase (long word) register, if one exists for the given Use attribute. A phrase register is created for those fields in the - &grs1; *.abs file that contains a + &acro.grs1; *.abs file that contains a p-specifier. Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven @@ -2247,7 +2240,7 @@ contains multiple words, the term will only match if all of the words are found immediately adjacent, and in the given order. The word search is performed on those fields that are indexed as - type w in the &grs1; *.abs file. + type w in the &acro.grs1; *.abs file. Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven ... @@ -2274,14 +2267,14 @@ natural-language, relevance-ranked query. This search type uses the word register, i.e. those fields that are indexed as type w in the - &grs1; *.abs file. + &acro.grs1; *.abs file. If the Structure attribute is Numeric String the term is treated as an integer. 
The search is performed on those fields that are indexed - as type n in the &grs1; + as type n in the &acro.grs1; *.abs file. @@ -2452,20 +2445,20 @@
- Server Side &cql; to &pqf; Query Translation + Server Side &acro.cql; to &acro.pqf; Query Translation Using the <cql2rpn>l2rpn.txt</cql2rpn> &yaz; Frontend Virtual Hosts option, one can configure - the &yaz; Frontend &cql;-to-&pqf; + the &yaz; Frontend &acro.cql;-to-&acro.pqf; converter, specifying the interpretation of various - &cql; + &acro.cql; indexes, relations, etc. in terms of Type-1 query attributes. - For example, using server-side &cql;-to-&pqf; conversion, one might + For example, using server-side &acro.cql;-to-&acro.pqf; conversion, one might query a zebra server like this: and - if properly configured - even static relevance ranking can - be performed using &cql; query syntax: + be performed using &acro.cql; query syntax: find text = /relevant (plant and soil) @@ -2485,7 +2478,7 @@ By the way, the same configuration can be used to - search using client-side &cql;-to-&pqf; conversion: + search using client-side &acro.cql;-to-&acro.pqf; conversion: (the only difference is querytype cql2rpn instead of querytype cql, and the call specifying a local @@ -2501,9 +2494,8 @@ Exhaustive information can be found in the - Section "Specification of &cql; to &rpn; mappings" in the &yaz; manual. - , - and shall therefore not be repeated here. + Section &acro.cql; to &acro.rpn; conversion + in the &yaz; manual.
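Note on the last hunk: the idea of mapping CQL indexes and relation modifiers to Type-1 attributes can be illustrated with a toy sketch. This is not the YAZ converter and does not read a real cql2rpn configuration file. The tables `INDEX_TO_USE` and `MODIFIER_TO_ATTR` and the helpers `clause_to_pqf` and `and_pqf` are hypothetical names introduced only for illustration. The attribute values are taken from queries quoted elsewhere in this document (`1=1010` for Body-of-text, `2=102` for the relevance relation), and the assumption that a CQL index called `text` maps to Body-of-text is ours. The PQF produced by an actual server configuration may differ.

```python
# Toy illustration only: NOT the YAZ CQL-to-PQF converter.
# Both tables are hypothetical stand-ins for what a cql2rpn
# configuration would define on a real server.
INDEX_TO_USE = {"text": "1=1010"}          # assumed: CQL index "text" -> Body-of-text
MODIFIER_TO_ATTR = {"relevant": "2=102"}   # assumed: /relevant -> relevance relation


def clause_to_pqf(index, term, modifiers=()):
    """Render one CQL search clause as a PQF attributes-plus-term string."""
    attrs = [INDEX_TO_USE[index]] + [MODIFIER_TO_ATTR[m] for m in modifiers]
    prefix = " ".join(f"@attr {a}" for a in attrs)
    # Multi-word terms are quoted so that PQF treats them as one term.
    quoted = f'"{term}"' if " " in term else term
    return f"{prefix} {quoted}"


def and_pqf(left, right):
    """Combine two PQF sub-queries with the prefix boolean operator @and."""
    return f"@and {left} {right}"


if __name__ == "__main__":
    # CQL: text = /relevant (plant and soil)
    left = clause_to_pqf("text", "plant", ["relevant"])
    right = clause_to_pqf("text", "soil", ["relevant"])
    print(and_pqf(left, right))
```

The sketch distributes the index and relation modifier over both operands of the boolean, which is one plausible reading of the CQL query quoted in the hunk; it is meant to show the shape of the mapping, not the converter's exact output.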