X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=20e6b4b942d1d314639eb7543c83c1bf1dc48961;hp=bae113f2551991bc0c62332313c2d2500ede7aa2;hb=558bf94a5f36eb89b0ca7ac4780b641da852c36b;hpb=8b21c0028a137ae87201c1f4334879dabb23bad7 diff --git a/doc/querymodel.xml b/doc/querymodel.xml index bae113f..20e6b4b 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,5 +1,5 @@ - + Query Model @@ -8,8 +8,8 @@ Zebra is born as a networking Information Retrieval engine adhering to the international standards - Z39.50 and - SRU, + Z39.50 and + SRU, and implement the query model defined there. Unfortunately, the Z39.50 query model has only defined a binary encoded representation, which is used as transport packaging in @@ -29,7 +29,7 @@ In addition, Zebra can be configured to understand and map the Common Query Language - (CQL) + (CQL) to PQF. See an introduction on the mapping to the internal query representation in . @@ -40,22 +40,281 @@ Prefix Query Format structure and syntax The - PQF - grammer is documented in the YAZ manual. + PQF + grammar is documented in the YAZ manual, and shall not be + repeated here. This textual PQF representation is always during search mapped to the equivalent Zebra internal query parse tree. + + PQF tree structure + The PQF parse tree - or the equivalent textual representation - + may start with one specification of the + attribute set used. Following is a query + tree, which + consists of atomic query parts, optionally + paired by boolean binary operators, and + finally recursively combined into + complex query trees. + + Attribute sets + + Attribute sets define the exact meaning and semantics of queries + issued. Zebra comes with some predefined attribute set + definitions; others can easily be defined and added to the + configuration. + + The Zebra internal query processing is modeled after + the Bib1 attribute set, and the non-use + attributes type 2-9 are hard-wired in. 
It is therefore essential + to be familiar with . + + + + + + + + + + + + + + + + + + + + + + + +
Attribute sets predefined in Zebra
exp-1 | Explain attribute set | Special attribute set used on the special automagic IR-Explain-1 database to gain information on server capabilities, database names, and database and semantics.
bib-1 | Bib1 attribute set | Standard PQF query language attribute set which defines the semantics of Z39.50 searching. In addition, all of the non-use attributes (type 2-9) define the Zebra internal query processing.
gils | GILS attribute set | Extension to the Bib1 attribute set.
+
+ + + Boolean operators + + A pair of subquery trees, or of atomic queries, is combined + using the standard boolean operators into new query trees. + + + + + + + + + + + + + + + + + + + + + + + +
Boolean operators
@and | binary AND operator | Set intersection of the hit sets of two atomic queries
@or | binary OR operator | Set union of the hit sets of two atomic queries
@not | binary AND NOT operator | Set difference of the hit sets of two atomic queries: hits of the first query that are not hits of the second
@prox | binary PROXIMITY operator | Set intersection of the hit sets of two atomic queries. In addition, the intersection set is purged of all documents which do not satisfy the requested query term proximity. Usually a proper subset of the AND operation.
+ + + For example, we can combine the terms + information and retrieval + into different searches in the default index of the default + attribute set as follows. + Querying for the union of all documents containing the + terms information OR + retrieval: + + @or information retrieval + + + + Querying for the intersection of all documents containing the + terms information AND + retrieval: + The hit set is a subset of the corresponding + OR query. + + @and information retrieval + + + + Querying for the intersection of all documents containing the + terms information AND + retrieval, taking proximity into account: + The hit set is a subset of the corresponding + AND query. + + @prox information retrieval + + + + Querying for the intersection of all documents containing the + terms information AND + retrieval, in the same order and near each + other as described in the term list. + The hit set is a subset of the corresponding + PROXIMITY query. + + "information retrieval" + + +
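The operator examples above can also be composed programmatically. The following Python sketch is not part of Zebra or YAZ; the helper name pqf_binary is hypothetical and chosen only for illustration. It assumes both operands are already valid PQF fragments and simply joins them in prefix order. The @prox operator is deliberately left out, since its full syntax is documented in the YAZ manual rather than here.

```python
# Hypothetical helper, not part of Zebra or YAZ:
# compose PQF query strings in prefix (operator-first) order.
BOOLEAN_OPERATORS = ("@and", "@or", "@not")


def pqf_binary(operator: str, left: str, right: str) -> str:
    """Combine two PQF subqueries with a binary boolean operator.

    Both operands are assumed to be valid PQF fragments already;
    no quoting or validation of the operands is attempted here.
    """
    if operator not in BOOLEAN_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f"{operator} {left} {right}"


if __name__ == "__main__":
    # Union of the two terms, as in the first example above.
    union = pqf_binary("@or", "information", "retrieval")
    print(union)
    # Subquery trees nest without parentheses because PQF is a prefix notation.
    print(pqf_binary("@and", union, "zebra"))
```

Because PQF is a prefix notation, a nested subquery is passed as an ordinary operand and needs no grouping syntax.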
+ + + + Atomic queries + + Atomic queries are the query parts which work on one access point + only. These consist of an attribute list + followed by a single term or a + quoted term list. + + + Unsupplied non-use attributes type 2-9 are either inherited from + higher nodes in the query tree, or are set to Zebra's default values. + See for details. + + + + + + + + + + + + + + + + +
Atomic queries
attribute list | List of orthogonal attributes | Any of the orthogonal attribute types may be omitted; these are inherited from higher query tree nodes or, if not inherited, are set to the default Zebra configuration values.
term | single term or quoted term list | Here the search term or list of search terms is added to the query.
+ + Querying for the term information in the + default index using the default attribute set, the server choice + of access point/index, and the default non-use attributes. + + "information" + + + + Equivalent query fully specified: + + @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information" + + + + + Finding all documents which have empty titles. Notice that the + empty term must be quoted, but is otherwise legal. + + @attr 1=4 "" + + + +
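The atomic query examples above follow one pattern: an optional attribute set, a list of type=value attributes, and a quoted term. A minimal Python sketch of that pattern follows. The function name pqf_atomic and its parameters are hypothetical and not part of Zebra or YAZ. It always quotes the term, so the empty term stays legal, and it rejects terms containing double quotes or backslashes rather than assuming any particular escaping rule.

```python
# Hypothetical helper, not part of Zebra or YAZ:
# render one atomic PQF query from an attribute list and a term.
from typing import Mapping, Optional, Union


def pqf_atomic(
    term: str,
    attributes: Optional[Mapping[int, Union[int, str]]] = None,
    attrset: Optional[str] = None,
) -> str:
    """Build '[@attrset NAME] [@attr TYPE=VALUE ...] "TERM"'.

    Attribute types that are not supplied are simply omitted, leaving
    inheritance and defaults to the server. The attrset prefix is meant
    for a query that starts with an attribute set specification.
    """
    if '"' in term or "\\" in term:
        # Escaping rules are not covered here, so refuse instead of guessing.
        raise ValueError("term must not contain double quotes or backslashes")
    parts = []
    if attrset is not None:
        parts.extend(["@attrset", attrset])
    for attr_type, value in (attributes or {}).items():
        parts.extend(["@attr", f"{attr_type}={value}"])
    parts.append(f'"{term}"')  # always quoted, so "" remains a legal term
    return " ".join(parts)


if __name__ == "__main__":
    # The fully specified query from the example above.
    print(pqf_atomic("information",
                     {1: 1017, 2: 3, 3: 3, 4: 1, 5: 100, 6: 1},
                     attrset="bib-1"))
    # Empty-title search: the empty term is emitted as "".
    print(pqf_atomic("", {1: 4}))
```

The attributes are emitted in the insertion order of the mapping, which is how the sketch reproduces the ordering used in the fully specified example.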
+ + + Zebra's special use attribute of type 'string' + + The numeric use (type 1) attribute is usually + referred to from a given + attribute set. In addition, Zebra lets you use + any internal index + name defined in your configuration + as use attribute value. This is a great feature for + debugging, and when you do + not need the complexity of defined use attribute values. It is + the preferred way of accessing Zebra indexes directly. + + + Finding all documents which have the term list "information + retrieval" in a Zebra index, using its internal full string name. + + @attr 1=sometext "information retrieval" + + + + Searching the bib-1 use attribute 54 using its string name: + + @attr 1=Code-language eng + + + + Searching in any silly string index - if it's defined in your + indexing rules and can be parsed by the PQF parser. + This is definitely not the recommended use of + this facility, as it might confuse your users with some very + unexpected results. + + @attr 1=silly/xpath/alike[@index]/name "information retrieval" + + + + See for details, and + + for the SRU PQF query extension using string names as a fast + debugging facility. + + + +
+ Explain Attribute Set + + The Z39.50 standard defines the + Explain attribute set + exp-1, which is used to discover information + about a server's search semantics and functional capabilities. + Zebra exposes a "classic" + Explain database by base name IR-Explain-1, which + is populated with system internal information. + - The attribute-set exp-1 is defined for - searching an Explain IR-Explain-1 database. - It consists of a single Use (type 1) attribute. + The attribute-set exp-1 consists of a single + Use (type 1) attribute. In addition, the non-Use @@ -63,7 +322,7 @@ Relation, Position, Structure, Truncation, and Completeness are imported from - the bib-1 attrubute set, and may be used + the bib-1 attribute set, and may be used within any explain query. @@ -90,6 +349,15 @@ Explain searches with yaz-client + + Classic Explain only defines retrieval of Explain information + via ASN.1. Practically no Z39.50 client supports this. Fortunately + they don't have to - Zebra allows retrieval of this information + in other formats: + SUTRS, XML, + GRS-1 and ASN.1 Explain. + + List supported categories to find out which explain commands are supported: @@ -173,10 +441,9 @@ Most of the information contained in this section is an excerpt of the ATTRIBUTE SET BIB-1 (Z39.50-1995) SEMANTICS, found at The BIB-1 + url="&url.z39.50.attset.bib1.1995;">The BIB-1 Attribute Set Semantics from 1995, also in an updated - Bib-1 + Bib-1 Attribute Set version from 2003. Index Data is not the copyright holder of this information. @@ -188,21 +455,21 @@ - Relation Attributes (type = 2) + Relation Attributes (type = 2) - Position Attributes (type = 3) + Position Attributes (type = 3) - Structure Attributes (type = 4) + Structure Attributes (type = 4) - Truncation Attributes (type = 5) + Truncation Attributes (type = 5) @@ -570,7 +837,7 @@ Hosts option, one can configure the YAZ Frontend CQL-to-PQF converter, specifying the interpretation of various - CQL + CQL indexes, relations, etc. 
in terms of Type-1 query attributes.
@@ -639,10 +906,10 @@ http://www.loc.gov/z3950/agency/document.html PQF and BIB-1 stuff to be explained - + http://www.loc.gov/z3950/agency/defns/bib1.html - + http://www.loc.gov/z3950/agency/bib1.html http://www.loc.gov/z3950/agency/markup/13.html