X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=831eeff71e5e7e94013c342a9a0a897c37183af8;hb=4478d785b7769691261005c98063b98a5a5971b3;hp=709af6ede7d3f27ef3988b57b3824b6d889632fe;hpb=5ceb96687dc82a78d6beba7f066c37abb0cd2bd5;p=idzebra-moved-to-github.git

diff --git a/doc/querymodel.xml b/doc/querymodel.xml
index 709af6e..831eeff 100644
--- a/doc/querymodel.xml
+++ b/doc/querymodel.xml
@@ -1,5 +1,5 @@
-
+
  Query Model
@@ -24,7 +24,7 @@
  Since the type-1 (RPN) query structure has no direct, useful string
- representation, every origin application needs to provide some
+ representation, every client application needs to provide some
  form of mapping from a local query notation or representation to it.
@@ -84,7 +84,7 @@
  explain operation, which provides the means for learning which fields (also called
- indexes or access points
+ indexes or access points)
  are provided, which default parameter the server uses, which retrieve document formats are defined, and which specific parts of the general query model are supported.
@@ -132,7 +132,7 @@
  It provides the means to investigate the content of specific indexes.
- Scanning an index returns a handful of terms actually fond in
+ Scanning an index returns a handful of terms actually found in
  the indexes, and in addition the scan operation returns the number of documents indexed by each term. A search client can use this information to propose proper
@@ -152,7 +152,8 @@
  The PQF grammar is documented in the YAZ manual, and shall not be repeated here. This textual PQF representation
- is always during search mapped to the equivalent Zebra internal
+ is not transmistted to Zebra during search, but it is in the
+ client mapped to the equivalent Z39.50 binary
  query parse tree.
@@ -209,7 +210,7 @@
  bib-1 Standard PQF query language attribute set which defines the semantics of Z39.50 searching.
  In addition, all of the
- non-use attributes (type 2-9) define the hard-wired
+ non-use attributes (types 2-11) define the hard-wired
  Zebra internal query processing. default
@@ -226,7 +227,7 @@
  idxpath Hardwired XPATH like attribute set, only available for indexing with the GRS record model
- depreciated
+ deprecated
  -->
@@ -349,7 +350,7 @@
  Atomic (APT) queries are always leaf nodes in the PQF query tree.
- UN-supplied non-use attributes type 2-9 are either inherited from
+ UN-supplied non-use attributes types 2-11 are either inherited from
  higher nodes in the query tree, or are set to Zebra's default values. See for details.
@@ -445,7 +446,9 @@
  Defining a named result set and re-using it in the next query,
- using yaz-client.
+ using yaz-client. Notice that the client, not
+ the server, assigns the string '1' to the
+ named result set.
  Z> f @attr 1=4 mozart ...
@@ -454,11 +457,6 @@
  Z> f @and @set 1 @attr 1=4 amadeus ... Number of hits: 14, setno 2
- ...
- Z> f @attr 1=1016 beethoven
- ...
- Number of hits: 26, setno 3
- ...
@@ -609,7 +607,8 @@
  Escaping PQF keywords and other non-parseable XPath constructs
- with '{ }' to prevent syntax errors:
+ with '{ }' to prevent client-side PQF parsing
+ syntax errors:
  Z> find @attr {1=/root/first[@attr='danish']} content Z> find @attr {1=/record/@set} oai
@@ -787,13 +786,35 @@
  tab/dan1.att, tab/explain.att, and tab/gils.att.
+
+
+ For example, some few Bib-1 use
+ attributes from the tab/bib1.att are:
+
+ att 1 Personal-name
+ att 2 Corporate-name
+ att 3 Conference-name
+ att 4 Title
+ ...
+ att 1009 Subject-name-personal
+ att 1010 Body-of-text
+ att 1011 Date/time-added-to-db
+ ...
+ att 1016 Any
+ att 1017 Server-choice
+ att 1018 Publisher
+ ...
+ att 1035 Anywhere
+ att 1036 Author-Title-Subject
+
+
+
  New attribute sets can be added by adding new tab/*.att configuration files, which need to be sourced in the main configuration zebra.cfg.
-
- In addition, Zebra allows the access of
+ In addition, Zebra allows the access of
  internal index names and dynamic XPath as use attributes; see and
@@ -996,7 +1017,7 @@
  Any position in field 3
- default
+ supported
@@ -1004,9 +1025,9 @@
  The position attribute values first in field (1), and first in subfield(2) are unsupported.
- Using them does not trigger an error, but silent defaults to
- any position in field (3).
-
+ Using them silently maps to
+ any position in field (3). A proper diagnostic
+ should have been issued.
@@ -1351,7 +1372,7 @@
  Complete subfield 2
- depreciated
+ deprecated
  Complete field
@@ -1537,9 +1558,21 @@
+
+
+
+
+
  Zebra Extension Rank Weight Attribute (type 9)
@@ -1577,31 +1613,46 @@
- Zebra Extension Approximative Limit Attribute (type 9)
+ Zebra Extension Approximative Limit Attribute (type 11)
- Newer Zebra versions normally estimate hit count for every APT
+ Zebra computes - unless otherwise configured -
+ the exact hit count for every APT
  (leaf) in the query tree. These hit counts are returned as part of the searchResult-1 facility in the binary encoded Z39.50 search response packages.
- By setting a limit for the APT we can make Zebra turn into
- approximate hit count when a certain hit count limit is
- reached. A value of zero means exact hit count.
+ By setting an estimation limit size of the resultset of the APT
+ leaves, Zebra stoppes processing the result set when the limit
+ length is reached.
+ Hit counts under this limit are still precise, but hit counts over it
+ are estimated using the statistics gathered from the chopped
+ result set.
+
+
+ Specifying a limit of 0 resuts in exact hit counts.
  For example, we might be interested in exact hit count for a, but for b we allow hit count estimates for 1000 and higher.
- Z> find @and a @attr 9=1000 b
+ Z> find @and a @attr 11=1000 b
  The estimated hit count facility makes searches faster, as one only needs to process large hit lists partially.
+ It is mostly used in huge databases, where you you want trade
+ exactness of hit counts against speed of execution.
+ Do not use approximative hit count limits
+ in conjunction with relevance ranking, as re-sorting of the
+ result set obviosly only works when the entire result set has
+ been processed.
+
+
  This facility clashes with rank weight, because there all documents in the hit lists need to be examined for scoring and re-sorting.
@@ -1710,11 +1761,11 @@
- Zebra Extension Approximative Limit (type 9)
+ Zebra Extension Approximative Limit (type 11)
  The Zebra Extension Approximative Limit (type
- 9) is a way to enable approximate
+ 11) is a way to enable approximate
  hit counts for scan hit counts, in the same way as for search hit counts.
@@ -1747,7 +1798,7 @@
  main Zebra configuration file zebra.cfg directive attset: idxpath.att must be enabled.
- The idxpath is depreciated, may not be
+ The idxpath is deprecated, may not be
  supported in future Zebra versions, and should definitely not be used in production code.
@@ -1780,31 +1831,31 @@
  XPATH Begin 1 _XPATH_BEGIN
- depreciated
+ deprecated
  XPATH End 2 _XPATH_END
- depreciated
+ deprecated
  XPATH CData 1016 _XPATH_CDATA
- depreciated
+ deprecated
  XPATH Attribute Name 3 _XPATH_ATTR_NAME
- depreciated
+ deprecated
  XPATH Attribute CData 1015 _XPATH_ATTR_CDATA
- depreciated
+ deprecated
@@ -2325,6 +2376,8 @@
  The next plus character marks the end of the section. Currently Zebra only supports one specifier, the error tolerance, which consists one digit.
+