X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=fb69f7489cb91703306a06a01d250a84d0de4815;hb=7cba8e6df70ba12b7dad1570564ff9e7a8112dd3;hp=ee170612f843dcf06c8dda7fd57f3cebec968528;hpb=3f97d3b682ee70a2b523b67153a83e9545d1b610;p=idzebra-moved-to-github.git diff --git a/doc/querymodel.xml b/doc/querymodel.xml index ee17061..fb69f74 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,10 +1,10 @@ - + Query Model Query Model Overview - + Query Languages @@ -24,43 +24,42 @@ Since the type-1 (RPN) query structure has no direct, useful string - representation, every origin application needs to provide some + representation, every client application needs to provide some form of mapping from a local query notation or representation to it. - - - - - Prefix Query Format (PQF) - - - Index Data has defined a textual representation in the - Prefix Query Format, short - PQF, which maps - one-to-one to binary encoded - type-1 RPN query packages. - It has been adopted by other - parties developing Z39.50 software, and is often referred to as - Prefix Query Notation, or in short - PQN. See - for further explanations and - descriptions of Zebra's capabilities. - - - - Common Query Language (CQL) + + + + Prefix Query Format (PQF) + + Index Data has defined a textual representation in the + Prefix Query Format, short + PQF, which maps + one-to-one to binary encoded + type-1 RPN queries. + PQF has been adopted by other + parties developing Z39.50 software, and is often referred to as + Prefix Query Notation, or in short + PQN. See + for further explanations and + descriptions of Zebra's capabilities. + + + + + Common Query Language (CQL) - The query model of the type-1 RPN, - expressed in PQF/PQN is natively supported. - On the other hand, the default SRU - webservices Common Query Language - CQL is not natively supported. + The query model of the type-1 RPN, + expressed in PQF/PQN is natively supported. 
+ On the other hand, the default SRU + web services Common Query Language + CQL is not natively supported. - Zebra can be configured to understand and map CQL to PQF. See - . - - + Zebra can be configured to understand and map CQL to PQF. See + . + + @@ -85,7 +84,7 @@ explain operation, which provides the means for learning which fields (also called - indexes or access points + indexes or access points) are provided, which default parameter the server uses, which retrieve document formats are defined, and which specific parts of the general query model are supported. @@ -133,7 +132,7 @@ It provides the means to investigate the content of specific indexes. - Scanning an index returns a handful of terms actually fond in + Scanning an index returns a handful of terms actually found in the indexes, and in addition the scan operation returns the number of documents indexed by each term. A search client can use this information to propose proper @@ -150,10 +149,11 @@ Prefix Query Format syntax and semantics - The PQF grammer + The PQF grammar is documented in the YAZ manual, and shall not be repeated here. This textual PQF representation - is always during search mapped to the equivalent Zebra internal + is not transmitted to Zebra during search, but it is in the + client mapped to the equivalent Z39.50 binary query parse tree. @@ -227,7 +227,7 @@ idxpath Hardwired XPATH like attribute set, only available for indexing with the GRS record model - depreciated + deprecated --> @@ -251,7 +251,7 @@ Boolean operators - A pair of subquery trees, or of atomic queries, is combined + A pair of sub query trees, or of atomic queries, is combined using the standard boolean operators into new query trees. Thus, boolean operators are always internal nodes in the query tree. @@ -281,7 +281,7 @@ Set complement of two atomic queries hit sets @prox - binary PROXIMY operator + binary PROXIMITY operator Set intersection of two atomic queries hit sets. 
In addition, the intersection set is purged for all documents which do not satisfy the requested query @@ -331,7 +331,7 @@ retrieval, in the same order and near each other as described in the term list. The hit set is a subset of the corresponding - PROXIMY query. + PROXIMITY query. Z> find "information retrieval" @@ -350,7 +350,7 @@ Atomic (APT) queries are always leaf nodes in the PQF query tree. - Unsupplied non-use attributes type 2-9 are either inherited from + UN-supplied non-use attributes type 2-9 are either inherited from higher nodes in the query tree, or are set to Zebra's default values. See for details. @@ -415,7 +415,7 @@ - For example, we migh want to scan the title index, starting with + For example, we might want to scan the title index, starting with the term debussy, and displaying this and the following terms in lexicographic order: @@ -446,7 +446,9 @@ Defining a named result set and re-using it in the next query, - using yaz-client. + using yaz-client. Notice that the client, not + the server, assigns the string '1' to the + named result set. Z> f @attr 1=4 mozart ... @@ -455,11 +457,6 @@ Z> f @and @set 1 @attr 1=4 amadeus ... Number of hits: 14, setno 2 - ... - Z> f @attr 1=1016 beethoven - ... - Number of hits: 26, setno 3 - ... @@ -589,7 +586,7 @@ Filter the addressing XPath by a predicate working on exact string values in attributes (in the XML sense) can be done: return all those docs which - have the term "english" contained in one of all text subnodes of + have the term "english" contained in one of all text sub nodes of the subtree defined by the XPath /record/title[@lang='en']. And similar predicate filtering. 
@@ -610,7 +607,8 @@ Escaping PQF keywords and other non-parseable XPath constructs - with '{ }' to prevent syntax errors: + with '{ }' to prevent client-side PQF parsing + syntax errors: Z> find @attr {1=/root/first[@attr='danish']} content Z> find @attr {1=/record/@set} oai @@ -788,13 +786,35 @@ tab/dan1.att, tab/explain.att, and tab/gils.att. + + + For example, some few Bib-1 use + attributes from the tab/bib1.att are: + + att 1 Personal-name + att 2 Corporate-name + att 3 Conference-name + att 4 Title + ... + att 1009 Subject-name-personal + att 1010 Body-of-text + att 1011 Date/time-added-to-db + ... + att 1016 Any + att 1017 Server-choice + att 1018 Publisher + ... + att 1035 Anywhere + att 1036 Author-Title-Subject + + + New attribute sets can be added by adding new tab/*.att configuration files, which need to - be sourced in the main configuration zebra.cfg. + be sourced in the main configuration zebra.cfg. - - In addition, Zebra allows the access of + In addition, Zebra allows the access of internal index names and dynamic XPath as use attributes; see and @@ -997,7 +1017,7 @@ Any position in field 3 - default + supported @@ -1005,9 +1025,9 @@ The position attribute values first in field (1), and first in subfield(2) are unsupported. - Using them does not trigger an error, but silent defaults to - any position in field (3). - + Using them silently maps to + any position in field (3). A proper diagnostic + should have been issued. @@ -1352,7 +1372,7 @@ Complete subfield 2 - depreciated + deprecated Complete field @@ -1538,9 +1558,21 @@ + + + + + Zebra Extension Rank Weight Attribute (type 9) @@ -1581,15 +1616,22 @@ Zebra Extension Approximative Limit Attribute (type 9) - Newer Zebra versions normally estimate hit count for every APT + Zebra computes - unless otherwise configured - + the exact hit count for every APT (leaf) in the query tree. 
These hit counts are returned as part of the searchResult-1 facility in the binary encoded Z39.50 search response packages. - By setting a limit for the APT we can make Zebra turn into - approximate hit count when a certain hit count limit is - reached. A value of zero means exact hit count. + By setting an estimation limit size of the resultset of the APT + leaves, Zebra stops processing the result set when the limit + length is reached. + Hit counts under this limit are still precise, but hit counts over it + are estimated using the statistics gathered from the chopped + result set. + + + Specifying a limit of 0 results in exact hit counts. For example, we might be interested in exact hit count for a, but @@ -1601,8 +1643,16 @@ The estimated hit count facility makes searches faster, as one only needs to process large hit lists partially. + It is mostly used in huge databases, where you want to trade + exactness of hit counts against speed of execution. + Do not use approximative hit count limits + in conjunction with relevance ranking, as re-sorting of the + result set obviously only works when the entire result set has + been processed. + + This facility clashes with rank weight, because there all documents in the hit lists need to be examined for scoring and re-sorting. @@ -1748,7 +1798,7 @@ main Zebra configuration file zebra.cfg directive attset: idxpath.att must be enabled. - The idxpath is depreciated, may not be + The idxpath is deprecated, may not be supported in future Zebra versions, and should definitely not be used in production code. 
@@ -1781,31 +1831,31 @@ XPATH Begin 1 _XPATH_BEGIN - depreciated + deprecated XPATH End 2 _XPATH_END - depreciated + deprecated XPATH CData 1016 _XPATH_CDATA - depreciated + deprecated XPATH Attribute Name 3 _XPATH_ATTR_NAME - depreciated + deprecated XPATH Attribute CData 1015 _XPATH_ATTR_CDATA - depreciated + deprecated @@ -2036,7 +2086,7 @@ different types of registers or indexes, whose tokenization and character normalization rules differ. This reflects the fact that searching fundamental different tokens like dates, numbers, - bitfields and string based text needs different rulesets. + bitfields and string based text needs different rule sets. urx (@attr 4=104) - + @@ -2326,6 +2376,8 @@ The next plus character marks the end of the section. Currently Zebra only supports one specifier, the error tolerance, which consists one digit. +
ignored URX/URL ('u') - Special index for URL web adresses + Special index for URL web addresses
numeric (@attr 4=109)