X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=1e41be98eb56481353e6cf3a116fcdae2f3816ec;hb=676ae79af06721621b1f66bdaec06164b3ba7b1f;hp=cbdf9022e48e743c9659819de2c2a34ba2b6d4c5;hpb=e7ac5d718e802430433faca1e4f040b2cfcf4977;p=idzebra-moved-to-github.git diff --git a/doc/querymodel.xml b/doc/querymodel.xml index cbdf902..1e41be9 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,5 +1,5 @@ - + Query Model
@@ -11,18 +11,18 @@ &zebra; is born as a networking Information Retrieval engine adhering to the international standards - &z3950; and - &sru;, + &acro.z3950; and + &acro.sru;, and implements the - type-1 Reverse Polish Notation (&rpn;) query + type-1 Reverse Polish Notation (&acro.rpn;) query model defined there. Unfortunately, this model has only defined a binary encoded representation, which is used as transport packaging in - the &z3950; protocol layer. This representation is not human + the &acro.z3950; protocol layer. This representation is not human readable, nor defines any convenient way to specify queries. - Since the type-1 (&rpn;) + Since the type-1 (&acro.rpn;) query structure has no direct, useful string representation, every client application needs to provide some form of mapping from a local query notation or representation to it. @@ -30,33 +30,33 @@
- Prefix Query Format (&pqf;) + Prefix Query Format (&acro.pqf;) Index Data has defined a textual representation in the Prefix Query Format, short - &pqf;, which maps + &acro.pqf;, which maps one-to-one to binary encoded - type-1 &rpn; queries. - &pqf; has been adopted by other - parties developing &z3950; software, and is often referred to as + type-1 &acro.rpn; queries. + &acro.pqf; has been adopted by other + parties developing &acro.z3950; software, and is often referred to as Prefix Query Notation, or in short - &pqn;. See + &acro.pqn;. See for further explanations and descriptions of &zebra;'s capabilities.
- Common Query Language (&cql;) + Common Query Language (&acro.cql;) - The query model of the type-1 &rpn;, - expressed in &pqf;/&pqn; is natively supported. - On the other hand, the default &sru; + The query model of the type-1 &acro.rpn;, + expressed in &acro.pqf;/&acro.pqn; is natively supported. + On the other hand, the default &acro.sru; web services Common Query Language - &cql; is not natively supported. + &acro.cql; is not natively supported. - &zebra; can be configured to understand and map &cql; to &pqf;. See + &zebra; can be configured to understand and map &acro.cql; to &acro.pqf;. See .
@@ -67,7 +67,7 @@ Operation types &zebra; supports all of the three different - &z3950;/&sru; operations defined in the + &acro.z3950;/&acro.sru; operations defined in the standards: explain, search, and scan. A short description of the functionality and purpose of each is quite in order here. @@ -76,7 +76,7 @@
Explain Operation - The syntax of &z3950;/&sru; queries is + The syntax of &acro.z3950;/&acro.sru; queries is well known to any client, but the specific semantics - taking into account a particular server's functionalities and abilities - must be @@ -89,15 +89,15 @@ of the general query model are supported. - The &z3950; embeds the explain operation + The &acro.z3950; embeds the explain operation by performing a search in the magic IR-Explain-1 database; see . - In &sru;, explain is an entirely separate - operation, which returns an ZeeRex &xml; record according to the + In &acro.sru;, explain is an entirely separate + operation, which returns an ZeeRex &acro.xml; record according to the structure defined by the protocol. @@ -117,7 +117,7 @@ simple free text searches to nested complex boolean queries, targeting specific indexes, and possibly enhanced with many query semantic specifications. Search interactions are the heart - and soul of &z3950;/&sru; servers. + and soul of &acro.z3950;/&acro.sru; servers.
@@ -145,24 +145,24 @@
- &rpn; queries and semantics + &acro.rpn; queries and semantics - The &pqf; grammar + The &acro.pqf; grammar is documented in the &yaz; manual, and shall not be - repeated here. This textual &pqf; representation + repeated here. This textual &acro.pqf; representation is not transmitted to &zebra; during search, but it is in the - client mapped to the equivalent &z3950; binary + client mapped to the equivalent &acro.z3950; binary query parse tree.
- &rpn; tree structure + &acro.rpn; tree structure - The &rpn; parse tree - or the equivalent textual representation in &pqf; - + The &acro.rpn; parse tree - or the equivalent textual representation in &acro.pqf; - may start with one specification of the attribute set used. Following is a query tree, which - consists of atomic query parts (&apt;) or + consists of atomic query parts (&acro.apt;) or named result sets, eventually paired by boolean binary operators, and finally recursively combined into @@ -184,7 +184,7 @@ Attribute set - &pqf; notation (Short hand) + &acro.pqf; notation (Short hand) Status Notes @@ -201,11 +201,11 @@ predefined - &bib1; + &acro.bib1; bib-1 - Standard &pqf; query language attribute set which defines the - semantics of &z3950; searching. In addition, all of the - non-use attributes (types 2-12) define the hard-wired + Standard &acro.pqf; query language attribute set which defines the + semantics of &acro.z3950; searching. In addition, all of the + non-use attributes (types 2-14) define the hard-wired &zebra; internal query processing. default @@ -213,15 +213,15 @@ GILS gils - Extension to the &bib1; attribute set. + Extension to the &acro.bib1; attribute set. predefined @@ -238,7 +238,7 @@ The &zebra; internal query processing is modeled after - the &bib1; attribute set, and the non-use + the &acro.bib1; attribute set, and the non-use attributes type 2-6 are hard-wired in. It is therefore essential to be familiar with . @@ -317,7 +317,7 @@ retrieval, taking proximity into account: The hit set is a subset of the corresponding AND query - (see the &pqf; grammar for + (see the &acro.pqf; grammar for details on the proximity operator): Z> find @prox 0 3 0 2 k 2 information retrieval @@ -338,23 +338,23 @@
- Atomic queries (&apt;) + Atomic queries (&acro.apt;) Atomic queries are the query parts which work on one access point only. These consist of an attribute list followed by a single term or a quoted term list, and are often called - Attributes-Plus-Terms (&apt;) queries. + Attributes-Plus-Terms (&acro.apt;) queries. - Atomic (&apt;) queries are always leaf nodes in the &pqf; query tree. + Atomic (&acro.apt;) queries are always leaf nodes in the &acro.pqf; query tree. UN-supplied non-use attributes types 2-12 are either inherited from higher nodes in the query tree, or are set to &zebra;'s default values. See for details. - Atomic queries (&apt;) + Atomic queries (&acro.apt;) @@ -407,7 +407,7 @@ The scan operation is only supported with - atomic &apt; queries, as it is bound to one access point at a + atomic &acro.apt; queries, as it is bound to one access point at a time. Boolean query trees are not allowed during scan. @@ -429,8 +429,8 @@ Named result sets are supported in &zebra;, and result sets can be used as operands without limitations. It follows that named - result sets are leaf nodes in the &pqf; query tree, exactly as - atomic &apt; queries are. + result sets are leaf nodes in the &acro.pqf; query tree, exactly as + atomic &acro.apt; queries are. After the execution of a search, the result set is available at @@ -460,10 +460,10 @@ - Named result sets are only supported by the &z3950; protocol. - The &sru; web service is stateless, and therefore the notion of + Named result sets are only supported by the &acro.z3950; protocol. + The &acro.sru; web service is stateless, and therefore the notion of named result sets does not exist when accessing a &zebra; server by - the &sru; protocol. + the &acro.sru; protocol. @@ -483,7 +483,7 @@ Finding all documents which have the term list "information - retrieval" in an &zebra; index, using it's internal full string + retrieval" in an &zebra; index, using its internal full string name. Scanning the same index. 
Z> find @attr 1=sometext "information retrieval" @@ -492,7 +492,7 @@ Searching or scanning - the bib-1 use attribute 54 using it's string name: + the bib-1 use attribute 54 using its string name: Z> find @attr 1=Code-language eng Z> scan @attr 1=Code-language "" @@ -501,7 +501,7 @@ It is possible to search in any silly string index - if it's defined in your - indexation rules and can be parsed by the &pqf; parser. + indexation rules and can be parsed by the &acro.pqf; parser. This is definitely not the recommended use of this facility, as it might confuse your users with some very unexpected results. @@ -512,14 +512,14 @@ See also for details, and - for the &sru; &pqf; query extension using string names as a fast + for the &acro.sru; &acro.pqf; query extension using string names as a fast debugging facility.
&zebra;'s special access point of type 'XPath' - for &grs1; filters + for &acro.grs1; filters As we have seen above, it is possible (albeit seldom a great idea) to emulate @@ -531,15 +531,15 @@ be defined at indexation time, no new undefined XPath queries can entered at search time, and second, it might confuse users very much that an XPath-alike index name in fact - gets populated from a possible entirely different &xml; element + gets populated from a possible entirely different &acro.xml; element than it pretends to access. - When using the &grs1; Record Model + When using the &acro.grs1; Record Model (see ), we have the possibility to embed life XPath expressions - in the &pqf; queries, which are here called + in the &acro.pqf; queries, which are here called use (type 1) xpath attributes. You must enable the xpath enable directive in your @@ -549,14 +549,14 @@ Only a very restricted subset of the XPath 1.0 - standard is supported as the &grs1; record model is simpler than - a full &xml; &dom; structure. See the following examples for + standard is supported as the &acro.grs1; record model is simpler than + a full &acro.xml; &acro.dom; structure. See the following examples for possibilities. Finding all documents which have the term "content" - inside a text node found in a specific &xml; &dom; + inside a text node found in a specific &acro.xml; &acro.dom; subtree, whose starting element is addressed by XPath. @@ -586,7 +586,7 @@ Filter the addressing XPath by a predicate working on exact string values in - attributes (in the &xml; sense) can be done: return all those docs which + attributes (in the &acro.xml; sense) can be done: return all those docs which have the term "english" contained in one of all text sub nodes of the subtree defined by the XPath /record/title[@lang='en']. 
And similar @@ -607,8 +607,8 @@ - Escaping &pqf; keywords and other non-parseable XPath constructs - with '{ }' to prevent client-side &pqf; parsing + Escaping &acro.pqf; keywords and other non-parseable XPath constructs + with '{ }' to prevent client-side &acro.pqf; parsing syntax errors: Z> find @attr {1=/root/first[@attr='danish']} content @@ -630,7 +630,7 @@
Explain Attribute Set - The &z3950; standard defines the + The &acro.z3950; standard defines the Explain attribute set Exp-1, which is used to discover information about a server's search semantics and functional capabilities @@ -644,11 +644,11 @@ In addition, the non-Use - &bib1; attributes, that is, the types + &acro.bib1; attributes, that is, the types Relation, Position, Structure, Truncation, and Completeness are imported from - the &bib1; attribute set, and may be used + the &acro.bib1; attribute set, and may be used within any explain query. @@ -669,7 +669,7 @@ See tab/explain.att and the - &z3950; standard + &acro.z3950; standard for more information.
@@ -678,11 +678,11 @@ Explain searches with yaz-client Classic Explain only defines retrieval of Explain information - via ASN.1. Practically no &z3950; clients supports this. Fortunately + via ASN.1. Practically no &acro.z3950; clients supports this. Fortunately they don't have to - &zebra; allows retrieval of this information in other formats: - &sutrs;, &xml;, - &grs1; and ASN.1 Explain. + &acro.sutrs;, &acro.xml;, + &acro.grs1; and ASN.1 Explain. @@ -743,7 +743,7 @@ Default. This query is very useful to study the internal &zebra; indexes. If records have been indexed using the alvis - &xslt; filter, the string representation names of the known indexes can be + &acro.xslt; filter, the string representation names of the known indexes can be found. Z> base IR-Explain-1 @@ -760,13 +760,13 @@
- &bib1; Attribute Set + &acro.bib1; Attribute Set Most of the information contained in this section is an excerpt of - the ATTRIBUTE SET &bib1; (&z3950;-1995) SEMANTICS - found at . The &bib1; + the ATTRIBUTE SET &acro.bib1; (&acro.z3950;-1995) SEMANTICS + found at . The &acro.bib1; Attribute Set Semantics from 1995, also in an updated - &bib1; + &acro.bib1; Attribute Set version from 2003. Index Data is not the copyright holder of this information, except for the configuration details, the listing of @@ -788,7 +788,7 @@ tab/gils.att. - For example, some few &bib1; use + For example, some few &acro.bib1; use attributes from the tab/bib1.att are: att 1 Personal-name @@ -922,7 +922,7 @@ The relation attributes 1-5 are supported and work exactly as expected. All ordering operations are based on a lexicographical ordering, - expect when the + except when the structure attribute numeric (109) is used. In this case, ordering is numerical. See . @@ -979,7 +979,7 @@ AlwaysMatches (103) is a great way to discover how many documents have been indexed in a given field. The search term is ignored, but needed for correct - &pqf; syntax. An empty search term may be supplied. + &acro.pqf; syntax. An empty search term may be supplied. Z> find @attr 1=Title @attr 2=103 "" Z> find @attr 1=Title @attr 2=103 @attr 4=1 "" @@ -1159,7 +1159,7 @@ is supported, and maps to the boolean AND combination of words supplied. The word list is useful when google-like bag-of-word queries need to be translated from a GUI - query language to &pqf;. For example, the following queries + query language to &acro.pqf;. For example, the following queries are equivalent: Z> find @attr 1=Title @attr 4=6 "mozart amadeus" @@ -1213,7 +1213,7 @@ - The exact mapping between &pqf; queries and &zebra; internal indexes + The exact mapping between &acro.pqf; queries and &zebra; internal indexes and index types is explained in . 
@@ -1408,14 +1408,14 @@ The Complete subfield (2) is a reminiscence - from the happy &marc; + from the happy &acro.marc; binary format days. &zebra; does not support it, but maps silently to Complete field (3). - The exact mapping between &pqf; queries and &zebra; internal indexes + The exact mapping between &acro.pqf; queries and &zebra; internal indexes and index types is explained in . @@ -1427,7 +1427,7 @@
- Extended &zebra; &rpn; Features + Extended &zebra; &acro.rpn; Features The &zebra; internal query engine has been extended to specific needs not covered by the bib-1 attribute set query @@ -1478,7 +1478,7 @@
@@ -1552,7 +1562,7 @@ All ordering operations are based on a lexicographical ordering, - expect when the + except when the structure attribute numeric (109) is used. In this case, ordering is numerical. See . @@ -1562,9 +1572,9 @@ The possible values after attribute type 7 are 1 ascending and 2 descending. - The attributes+term (&apt;) node is separate from the + The attributes+term (&acro.apt;) node is separate from the rest and must be @or'ed. - The term associated with &apt; is the sorting level in integers, + The term associated with &acro.apt; is the sorting level in integers, where 0 means primary sort, 1 means secondary sort, and so forth. See also . @@ -1603,7 +1613,7 @@ a scan-like facility. Requires a client that can do named result sets since the search generates two result sets. The value for attribute 8 is the name of a result set (string). The terms in - the named term set are returned as &sutrs; records. + the named term set are returned as &acro.sutrs; records. For example, searching for u in title, right truncated, and @@ -1624,7 +1634,7 @@ &zebra; Extension Rank Weight Attribute (type 9) Rank weight is a way to pass a value to a ranking algorithm - so - that one &apt; has one value - while another as a different one. + that one &acro.apt; has one value - while another as a different one. See also . @@ -1642,7 +1652,7 @@ &zebra; supports the searchResult-1 facility. If the Term Reference Attribute (type 10) is given, that specifies a subqueryId value returned as part of the - search result. It is a way for a client to name an &apt; part of a + search result. It is a way for a client to name an &acro.apt; part of a query. 
- For example, using server-side &cql;-to-&pqf; conversion, one might + For example, using server-side &acro.cql;-to-&acro.pqf; conversion, one might query a zebra server like this: and - if properly configured - even static relevance ranking can - be performed using &cql; query syntax: + be performed using &acro.cql; query syntax: find text = /relevant (plant and soil) @@ -2485,7 +2495,7 @@ By the way, the same configuration can be used to - search using client-side &cql;-to-&pqf; conversion: + search using client-side &acro.cql;-to-&acro.pqf; conversion: (the only difference is querytype cql2rpn instead of querytype cql, and the call specifying a local @@ -2501,7 +2511,7 @@ Exhaustive information can be found in the - Section &cql; to &rpn; conversion" + Section &acro.cql; to &acro.rpn; conversion" in the &yaz; manual.
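Most hunks in this diff apply one mechanical change: acronym entities such as `&z3950;` become `&acro.z3950;`, while `&zebra;` and `&yaz;` keep their old form. The following is a minimal sketch of that rename, not the tool actually used for the commit. The entity list is read off the hunks above, and the function name `add_acro_prefix` is hypothetical. The other changes in the diff (`2-12` to `2-14`, `it's` to `its`, `expect` to `except`) are outside its scope.

```python
import re

# Entity names that gain the "acro." prefix in this diff (read off the hunks;
# an assumption that the list is complete). &zebra; and &yaz; are not renamed.
ACRONYMS = ("z3950", "sru", "rpn", "pqf", "pqn", "cql", "xml", "apt",
            "bib1", "grs1", "dom", "sutrs", "xslt", "marc")

# Match "&name;" only when "name" directly follows "&", so an already
# converted "&acro.name;" is left alone and the rewrite is idempotent.
_ENTITY = re.compile(r"&(" + "|".join(ACRONYMS) + r");")


def add_acro_prefix(text: str) -> str:
    """Rewrite &name; to &acro.name; for the listed acronym entities."""
    return _ENTITY.sub(r"&acro.\1;", text)


if __name__ == "__main__":
    sample = "the &z3950; protocol layer, as used by &zebra;"
    print(add_acro_prefix(sample))
```

Because converted entities no longer match the pattern, the function can be run repeatedly over the same file without producing `&acro.acro.…;`.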