From 3f97d3b682ee70a2b523b67153a83e9545d1b610 Mon Sep 17 00:00:00 2001 From: Marc Cromme Date: Sun, 25 Jun 2006 21:54:03 +0000 Subject: [PATCH] added few comments --- doc/querymodel.xml | 194 +++++++++++++++++++++++++++++----------------------- 1 file changed, 108 insertions(+), 86 deletions(-) diff --git a/doc/querymodel.xml b/doc/querymodel.xml index 88c2fd7..ee17061 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,10 +1,9 @@ - + Query Model - Query Model Overview - + Query Model Overview Query Languages @@ -34,16 +33,16 @@ Prefix Query Format (PQF) - Index Data has defined a textual representaion in the + Index Data has defined a textual representation in the Prefix Query Format, short - PQF, which mappes + PQF, which maps one-to-one to binary encoded type-1 RPN query packages. It has been adopted by other parties developing Z39.50 software, and is often referred to as Prefix Query Notation, or in short PQN. See - for further explanaitions and + for further explanations and descriptions of Zebra's capabilities. @@ -92,15 +91,15 @@ of the general query model are supported. - The Z39.50 embeddes the explain operation - by perfoming a + The Z39.50 embeds the explain operation + by performing a search in the magic IR-Explain-1 database; see . - In SRU, explain is an entirely seperate - operation, which returns an Zeerex + In SRU, explain is an entirely separate + operation, which returns an ZeeRex XML record according to the structure defined by the protocol. @@ -134,9 +133,9 @@ It provides the means to investigate the content of specific indexes. - Scanning an index returns a handfull of terms actually fond in + Scanning an index returns a handful of terms actually fond in the indexes, and in addition the scan - operation returns th enumber of documents indexed by each term. + operation returns the number of documents indexed by each term. 
A search client can use this information to propose proper spelling of search terms, to auto-fill search boxes, or to display controlled vocabularies. @@ -219,7 +218,7 @@ GILS gils - Extention to the Bib1 attribute set. + Extension to the Bib1 attribute set. predefined attribute list @@ -382,7 +386,7 @@ Querying for the term information in the - default index using the default attribite set, the server choice + default index using the default attribute set, the server choice of access point/index, and the default non-use attributes. Z> find information @@ -394,7 +398,7 @@ Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information - + Finding all documents which have the term debussy in the title field. @@ -403,6 +407,22 @@ + + The scan operation is only supported with + atomic APT queries, as it is bound to one access point at a + time. Boolean query trees are not allowed during + scan. + + + + For example, we migh want to scan the title index, starting with + the term + debussy, and displaying this and the + following terms in lexicographic order: + + Z> scan @attr 1=4 debussy + + @@ -410,13 +430,15 @@ Named Result Sets Named result sets are supported in Zebra, and result sets can be - used as operands without limitations. + used as operands without limitations. It follows that named + result sets are leaf nodes in the PQF query tree, exactly as + atomic APT queries are. After the execution of a search, the result set is available at the server, such that the client can use it for subsequent searches or retrieval requests. The Z30.50 standard actually - stresses the fact that result sets are voliatile. It may cease + stresses the fact that result sets are volatile. It may cease to exist at any time point after search, and the server will send a diagnostic to the effect that the requested result set does not exist any more. @@ -444,7 +466,7 @@ Named result sets are only supported by the Z39.50 protocol. 
The SRU web service is stateless, and therefore the notion of - named result sets does not exist when acessing a Zebra server by + named result sets does not exist when accessing a Zebra server by the SRU protocol. @@ -454,13 +476,13 @@ Zebra's special access point of type 'string' The numeric use (type 1) attribute is usually - refered to from a given + referred to from a given attribute set. In addition, Zebra let you use any internal index name defined in your configuration - as use atribute value. This is a great feature for + as use attribute value. This is a great feature for debugging, and when you do - not need the complecity of defined use attribute values. It is + not need the complexity of defined use attribute values. It is the preferred way of accessing Zebra indexes directly. @@ -494,7 +516,7 @@ See also for details, and - for the SRU PQF query extention using string names as a fast + for the SRU PQF query extension using string names as a fast debugging facility. @@ -507,7 +529,7 @@ idea) to emulate XPath 1.0 based search by defining use (type 1) - string attributes which in appearence + string attributes which in appearance resemble XPath queries. There are two problems with this approach: first, the XPath-look-alike has to be defined at indexation time, no new undefined @@ -525,7 +547,7 @@ use (type 1) xpath attributes. You must enable the xpath enable directive in your - .abs config files. + .abs configuration files. Only a very restricted subset of the @@ -538,14 +560,14 @@ Finding all documents which have the term "content" inside a text node found in a specific XML DOM subtree, whose starting element is - adressed by XPath. + addressed by XPath. 
Z> find @attr 1=/root content Z> find @attr 1=/root/first content Notice that the XPath must be absolute, i.e., must start with '/', and that the - XPath decendant-or-self axis followed by a + XPath descendant-or-self axis followed by a text node selection text() is implicitly appended to the stated XPath. @@ -564,7 +586,7 @@ - Filter the adressing XPath by a predicate working on exact + Filter the addressing XPath by a predicate working on exact string values in attributes (in the XML sense) can be done: return all those docs which have the term "english" contained in one of all text subnodes of @@ -596,7 +618,7 @@ It is worth mentioning that these dynamic performed XPath - queries are a performance bottelneck, as no optimized + queries are a performance bottleneck, as no optimized specialized indexes can be used. Therefore, avoid the use of this facility when speed is essential, and the database content size is medium to large. @@ -634,7 +656,7 @@ Use Attributes (type = 1) - The following Explain search atributes are supported: + The following Explain search attributes are supported: ExplainCategory (@attr 1=1), DatabaseName (@attr 1=3), DateAdded (@attr 1=9), @@ -657,7 +679,7 @@ Explain searches with yaz-client Classic Explain only defines retrieval of Explain information - via ASN.1. Pratically no Z39.50 clients supports this. Fortunately + via ASN.1. Practically no Z39.50 clients supports this. Fortunately they don't have to - Zebra allows retrieval of this information in other formats: SUTRS, XML, @@ -744,7 +766,7 @@ Most of the information contained in this section is an excerpt of the ATTRIBUTE SET BIB-1 (Z39.50-1995) SEMANTICS, - found at . The BIB-1 + found at . The BIB-1 Attribute Set Semantics from 1995, also in an updated Bib-1 Attribute Set @@ -759,7 +781,7 @@ A use attribute specifies an access point for any atomic query. 
- These acess points are highly dependent on the attribute set used + These access points are highly dependent on the attribute set used in the query, and are user configurable using the following default configuration files: tab/bib1.att, @@ -772,7 +794,7 @@ - In addition, Zebra allows the acess of + In addition, Zebra allows the access of internal index names and dynamic XPath as use attributes; see and @@ -1004,7 +1026,7 @@ structure attribute (type 4) can be defined using the configuration file tab/default.idx. - The default configuration is summerized in this table. + The default configuration is summarized in this table. find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman" Z> find @attr 1=Body-of-text @or bach @or salieri teleman - This OR list of terms is very usefull in + This OR list of terms is very useful in combination with relevance ranking: Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman" @@ -1174,7 +1196,7 @@ The truncation attribute specifies whether variations of one or - more characters are allowed between serch term and hit terms, or + more characters are allowed between search term and hit terms, or not. Using non-default truncation attributes will broaden the document hit set of a search query. @@ -1257,7 +1279,7 @@ Process # in search term (101) is a poor-man's regular expression search. It maps each # to .*, and - performes then a Regexp-1 (102) regular + performs then a Regexp-1 (102) regular expression search. The following two queries are equivalent: Z> find @attr 1=Body-of-text @attr 5=101 schnit#ke @@ -1279,12 +1301,12 @@ The truncation attribute value - Regexp-2 (103) is a Zebra specific extention + Regexp-2 (103) is a Zebra specific extension which allows fuzzy matches. One single error in spelling of search terms is allowed, i.e., a document is hit if it includes a term which can be mapped to the used search term by one character substitution, addition, deletion or - change of posiiton. 
+ change of position. Z> find @attr 1=Body-of-text @attr 5=100 schnittke ... @@ -1377,11 +1399,11 @@ The Zebra internal query engine has been extended to specific needs not covered by the bib-1 attribute set query - model. These extentions are non-standard - and non-portable: most functional extentions + model. These extensions are non-standard + and non-portable: most functional extensions are modeled over the bib-1 attribute set, defining type 7-9 attributes. - There are also the speciel + There are also the special string type index names for the idxpath attribute set. @@ -1421,9 +1443,9 @@ - Zebra specific Search Extentions to all Attribute Sets + Zebra specific Search Extensions to all Attribute Sets - Zebra extends the Bib1 attribute types, and these extentions are + Zebra extends the Bib1 attribute types, and these extensions are recognized regardless of attribute set used in a search operation query. @@ -1431,7 +1453,7 @@
- Zebra Search Attribute Extentions + Zebra Search Attribute Extensions Name @@ -1475,7 +1497,7 @@
- Zebra Extention Embedded Sort Attribute (type 7) + Zebra Extension Embedded Sort Attribute (type 7) The embedded sort is a way to specify sort within a query - thus @@ -1517,7 +1539,7 @@ - Zebra Extention Term Set Attribute (type 8) + Zebra Extension Term Set Attribute (type 8) The Term Set feature is a facility that allows a search to store @@ -1540,7 +1562,7 @@
- Zebra Extention Rank Weight Attribute (type 9) + Zebra Extension Rank Weight Attribute (type 9) Rank weight is a way to pass a value to a ranking algorithm - so @@ -1556,10 +1578,10 @@ - Zebra Extention Approximative Limit Attribute (type 9) + Zebra Extension Approximative Limit Attribute (type 9) - Newer Zebra versions normally estemiates hit count for every APT + Newer Zebra versions normally estimate hit count for every APT (leaf) in the query tree. These hit counts are returned as part of the searchResult-1 facility in the binary encoded Z39.50 search response packages. @@ -1570,14 +1592,14 @@ reached. A value of zero means exact hit count. - For example, we might be intersted in exact hit count for a, but + For example, we might be interested in exact hit count for a, but for b we allow hit count estimates for 1000 and higher. Z> find @and a @attr 9=1000 b - The estimated hit count fascility makes searches faster, as one + The estimated hit count facility makes searches faster, as one only needs to process large hit lists partially. @@ -1585,11 +1607,11 @@ documents in the hit lists need to be examined for scoring and re-sorting. It is an experimental - extention. Do not use in production code. + extension. Do not use in production code. - Zebra Extention Term Reference Attribute (type 10) + Zebra Extension Term Reference Attribute (type 10) Zebra supports the searchResult-1 facility. @@ -1613,16 +1635,16 @@ - Zebra specific Scan Extentions to all Attribute Sets + Zebra specific Scan Extensions to all Attribute Sets - Zebra extends the Bib1 attribute types, and these extentions are + Zebra extends the Bib1 attribute types, and these extensions are recognized regardless of attribute set used in a scan operation query. - Zebra Scan Attribute Extentions + Zebra Scan Attribute Extensions Name @@ -1648,7 +1670,7 @@
- Zebra Extention Result Set Narrow (type 8) + Zebra Extension Result Set Narrow (type 8) If attribute Result Set Narrow (type 8) @@ -1661,7 +1683,7 @@ the case of scanning all title fields around the scanterm mozart, then refining the scan by issuing a filtering query for amadeus to - restric the scan to the result set of the query: + restrict the scan to the result set of the query: Z> scan @attr 1=4 mozart ... @@ -1689,11 +1711,11 @@ - Zebra Extention Approximative Limit (type 9) + Zebra Extension Approximative Limit (type 9) - The Zebra Extention Approximative Limit (type - 9) is a way to enable approx + The Zebra Extension Approximative Limit (type + 9) is a way to enable approximate hit counts for scan hit counts, in the same way as for search hit counts. @@ -1723,7 +1745,7 @@ xpath enable option in the GRS filter *.abs configuration files. If one wants to use the special idxpath numeric attribute set, the - main Zebra configuraiton file zebra.cfg + main Zebra configuration file zebra.cfg directive attset: idxpath.att must be enabled. The idxpath is depreciated, may not be @@ -1835,7 +1857,7 @@
- Combining usual bib-1 attribut set searches + Combining usual bib-1 attribute set searches with idxpath attribute set searches: Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart @@ -1843,7 +1865,7 @@ - Scanning is supportet on all idxpath + Scanning is supported on all idxpath indexes, both specified as numeric use attributes, or as string index names. @@ -1883,10 +1905,10 @@ - Acces point name mapping + Access point name mapping - Acess Point + Access Point Type Grammar Notes @@ -1894,7 +1916,7 @@ - Use attibute + Use attribute numeric [1-9][1-9]* directly mapped to string index name @@ -1931,7 +1953,7 @@ Numeric use attributes are mapped to the Zebra internal - string index according to the attribute set defintion in use. + string index according to the attribute set definition in use. The default attribute set is Bib-1, and may be omitted in the PQF query. @@ -1973,7 +1995,7 @@ - String indexes can be acessed directly, + String indexes can be accessed directly, independently which attribute set is in use. These are just ignored. The above mentioned name normalization applies. String index names are defined in the @@ -1984,10 +2006,10 @@ - Zebra internal indexes can be acessed directly, + Zebra internal indexes can be accessed directly, according to the same rules as the user defined string indexes. The only difference is that - Zebra internal indexe names are hardwired, + Zebra internal index names are hardwired, all uppercase and must start with the character '_'. @@ -1995,7 +2017,7 @@ Finally, XPATH access points are only available using the GRS filter for indexing. - These acees point names must start with the character + These access point names must start with the character '/', they are not normalized, but passed unaltered to the Zebra internal XPATH engine. See . @@ -2013,7 +2035,7 @@ Internally Zebra has in it's default configuration several different types of registers or indexes, whose tokenization and character normalization rules differ. This reflects the fact that
- serching fundamental different tokens like dates, numbers, + searching fundamental different tokens like dates, numbers, bitfields and string based text needs different rulesets. @@ -2175,7 +2197,7 @@ If the Structure attribute is - URx the term is treated as a URX (URL) entity. + URX the term is treated as a URX (URL) entity. The search is performed on those fields that are indexed as type u in the *.abs file. -- 1.7.10.4