X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=bed7a2e3d153c420b414c4ce30ad8ad4a91fd327;hb=0002d3ccff37e5598553683e95714ca5711f05e8;hp=bae113f2551991bc0c62332313c2d2500ede7aa2;hpb=a4d2f62568bcf788630502fc1cbcad1163d3f87a;p=idzebra-moved-to-github.git diff --git a/doc/querymodel.xml b/doc/querymodel.xml index bae113f..bed7a2e 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,112 +1,519 @@ - - - Query Model - + + + Query Model + Query Model Overview + + + + Query Languages + + + Zebra is born as a networking Information Retrieval engine adhering + to the international standards + Z39.50 and + SRU, + and implement the query model defined there. + Unfortunately, the Z39.50 query model has only defined a binary + encoded representation, which is used as transport packaging in + the Z39.50 protocol layer. This representation is not human + readable, nor defines any convenient way to specify queries. + + + + + Prefix Query Format (PQF) - Zebra is born as a networking Information Retrieval engine adhering - to the international standards - Z39.50 and - SRU, - and implement the query model defined there. - Unfortunately, the Z39.50 query model has only defined a binary - encoded representation, which is used as transport packaging in - the Z39.50 protocol layer. This representation is not human - readable, nor defines any convenient way to specify queries. - - - Therefore, Index Data has defined a textual representaion in the - Prefix Query Format, short - PQF, which then has been adopted by other - parties developing Z39.50 software. It is also often referred to as - Prefix Query Notation, or in short - PQN, and is thoroughly explained in - . - + Index Data has defined a textual representaion in the + Prefix Query Format, short + PQF, which then has been adopted by other + parties developing Z39.50 software. It is also often referred to as + Prefix Query Notation, or in short + PQN, and is thoroughly explained in + . 
+ + + + + + Common Query Language (CQL) - In addition, Zebra can be configured to understand and map the - Common Query Language - (CQL) - to PQF. See an introduction on the mapping to the internal query - representation in - . - - + In addition, Zebra can be configured to understand and map the + Common Query Language + (CQL) + to PQF. See an introduction on the mapping to the internal query + representation in + . + + + + + + + Query types + + + + + Explain Queries + + + + + + Search Queries + + + + + + Scan Queries + + + + + + + + Prefix Query Format structure and syntax - The - PQF - grammer is documented in the YAZ manual. - This textual PQF representation + The PQF grammer + is documented in the YAZ manual, and shall not be + repeated here. This textual PQF representation is always during search mapped to the equivalent Zebra internal query parse tree. + + + PQF tree structure + + The PQF parse tree - or the equivalent textual representation - + may start with one specification of the + attribute set used. Following is a query + tree, which + consists of atomic query parts, eventually + paired by boolean binary operators, and + finally recursively combined into + complex query trees. + + + + Attribute sets + + Attribute sets define the exact meaning and semantics of queries + issued. Zebra comes with some predefined attribute set + definitions, others can easily be defined and added to the + configuration. + + The Zebra internal query procesing is modeled after + the Bib1 attribute set, and the non-use + attributes type 2-6 are hard-wired in. It is therefore essential + to be familiar with . + + + + + + + + + + + + + + + + + + + + + + + +
Attribute sets predefined in Zebra
exp-1 | Explain attribute set | Special attribute set used on the special automagic IR-Explain-1 database to gain information on server capabilities, database names, and database and semantics.
bib-1 | Bib1 attribute set | Standard PQF query language attribute set which defines the semantics of Z39.50 searching. In addition, all of the non-use attributes (type 2-9) define the Zebra internal query processing.
gils | GILS attribute set | Extension to the Bib1 attribute set.
+
+ + + Boolean operators + + A pair of subquery trees, or of atomic queries, is combined + using the standard boolean operators into new query trees. + + + + + + + + + + + + + + + + + + + + + + + +
Boolean operators
@and | binary AND operator | Set intersection of two atomic queries' hit sets
@or | binary OR operator | Set union of two atomic queries' hit sets
@not | binary AND NOT operator | Set complement of two atomic queries' hit sets
@prox | binary PROXIMITY operator | Set intersection of two atomic queries' hit sets. In addition, the intersection set is purged of all documents which do not satisfy the requested query term proximity. Usually a proper subset of the AND operation.
+ + + For example, we can combine the terms + information and retrieval + into different searches in the default index of the default + attribute set as follows. + Querying for the union of all documents containing the + terms information OR + retrieval: + + Z> find @or information retrieval + + + + Querying for the intersection of all documents containing the + terms information AND + retrieval: + The hit set is a subset of the corresponding + OR query. + + Z> find @and information retrieval + + + + Querying for the intersection of all documents containing the + terms information AND + retrieval, taking proximity into account: + The hit set is a subset of the corresponding + AND query. + + Z> find @prox information retrieval + + + + Querying for the intersection of all documents containing the + terms information AND + retrieval, in the same order and near each + other as described in the term list. + The hit set is a subset of the corresponding + PROXIMITY query. + + Z> find "information retrieval" + + +
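The subset relations stated in the examples above can be illustrated with a small sketch. It is not part of Zebra or YAZ: the helper names `pqf` and `combine_hits` are hypothetical, and hit sets are modelled as plain Python sets of document IDs. The `@prox` operator is left out, since in the YAZ PQF grammar it takes additional parameters.

```python
# Hypothetical helpers, not a Zebra or YAZ API: they only illustrate
# how the prefix boolean operators relate to set operations on hit sets.

BOOLEAN_OPERATORS = ("@and", "@or", "@not")


def pqf(operator, left, right):
    """Build a prefix-notation query string from two subqueries."""
    if operator not in BOOLEAN_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f"{operator} {left} {right}"


def combine_hits(operator, left, right):
    """Combine two hit sets (sets of document IDs) as the operator would."""
    if operator == "@and":
        return left & right   # set intersection
    if operator == "@or":
        return left | right   # set union
    if operator == "@not":
        return left - right   # AND NOT: documents in left but not in right
    raise ValueError(f"unsupported operator: {operator}")


if __name__ == "__main__":
    information = {1, 2, 3}   # made-up document IDs for illustration
    retrieval = {2, 3, 4}
    print(pqf("@or", "information", "retrieval"))
    print(combine_hits("@and", information, retrieval))
```

Nested queries follow by passing an already built subquery string as `left` or `right`, for example `pqf("@and", pqf("@or", "a", "b"), "c")`.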
+ + + + Atomic queries + + Atomic queries are the query parts which work on one acess point + only. These consist of an attribute list + followed by a single term or a + quoted term list. + + + Unsupplied non-use attributes type 2-9 are either inherited from + higher nodes in the query tree, or are set to Zebra's default values. + See for details. + + + + + + + + + + + + + + + +
Atomic queries
attribute list | List of orthogonal attributes | Any of the orthogonal attribute types may be omitted, these are inherited from higher query tree nodes, or if not inherited, are set to the default Zebra configuration values.
term | single term or quoted term list | Here the search terms or list of search terms is added to the query
+ + Querying for the term information in the + default index using the default attribute set, the server choice + of access point/index, and the default non-use attributes. + + Z> find "information" + + + + Equivalent query fully specified: + + Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information" + + + + + Finding all documents which have empty titles. Notice that the + empty term must be quoted, but is otherwise legal. + + Z> find @attr 1=4 "" + + +
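How omitted attributes fall back to defaults can be sketched as follows. The default values are taken from the fully specified query above; the function name `atomic_query` is hypothetical, and the sketch assumes that the term itself contains no double quotes.

```python
# Default attribute values as listed in the fully specified example query.
# The helper itself is an illustration only, not a Zebra or YAZ function.
DEFAULT_ATTRIBUTES = {1: 1017, 2: 3, 3: 3, 4: 1, 5: 100, 6: 1}


def atomic_query(term, attributes=None, attrset=None):
    """Return a fully specified PQF atomic query for one term.

    Attributes that are not supplied are filled in from DEFAULT_ATTRIBUTES.
    Assumption: the term contains no double quotes.
    """
    merged = {**DEFAULT_ATTRIBUTES, **(attributes or {})}
    parts = []
    if attrset is not None:
        parts += ["@attrset", attrset]
    for attr_type in sorted(merged):
        parts.append(f"@attr {attr_type}={merged[attr_type]}")
    parts.append(f'"{term}"')   # always quoted, so the empty term stays legal
    return " ".join(parts)


if __name__ == "__main__":
    print(atomic_query("information", attrset="bib-1"))
    print(atomic_query("", {1: 4}))
```

Always quoting the term keeps the empty-title search from the last example well-formed.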
+ + + Zebra's special use attribute type 1 of form 'string' + + The numeric use (type 1) attribute is usually + refered to from a given + attribute set. In addition, Zebra let you use + any internal index + name defined in your configuration + as use atribute value. This is a great feature for + debugging, and when you do + not need the complecity of defined use attribute values. It is + the preferred way of accessing Zebra indexes directly. + + + Finding all documents which have the term list "information + retrieval" in an Zebra index, using it's internal full string name. + + Z> find @attr 1=sometext "information retrieval" + + + + Searching the bib-1 use attribute 54 using it's string name: + + Z> find @attr 1=Code-language eng + + + + Searching in any silly string index - if it's defined in your + indexation rules and can be parsed by the PQF parser. + This is definitely not the recommended use of + this facility, as it might confuse your users with some very + unexpected results. + + Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval" + + + + See for details, and + + for the SRU PQF query extention using string names as a fast + debugging facility. + + + + + Zebra's special use attribute type 1 of form 'XPath' + for GRS filters + + As we have seen above, it is possible (albeit seldom a great + idea) to emulate + XPath 1.0 based + search by defining use (type 1) + string attributes which in appearence + resemble XPath queries. There are two + problems with this approach: first, the XPath-look-alike has to + be defined at indexation time, no new undefined + XPath queries can entered at search time, and second, it might + confuse users very much that an XPath-alike index name in fact + gets populated from a possible entirely different XML element + than it pretends to acess. 
+ + + When using the GRS Record Model + (see ), we have the + possibility to embed life + XPath expressions + in the PQF queries, which are here called + use (type 1) xpath + attributes. You must enable the + xpath enable directive in your + .abs config files. + + + Only a very restricted subset of the + XPath 1.0 + standard is supported as the GRS record model is simpler than + a full XML DOM structure. See the following examples for + possibilities. + + + Finding all documents which have the term "content" + inside a text node found in a specific XML DOM + subtree, whose starting element is + adressed by XPath. + + Z> find @attr 1=/root content + Z> find @attr 1=/root/first content + + Notice that the + XPath must be absolute, i.e., must start with '/', and that the + XPath decendant-or-self axis followed by a + text node selection text() is implicitly + appended to the stated XPath. + + It follows that the above searches are interpreted as: + + Z> find @attr 1=/root//text() content + Z> find @attr 1=/root/first//text() content + + + + + Filter the adressing XPath by a predicate working on exact + string values in + attributes (in the XML sense) can be done: return all those docs which + have the term "english" contained in one of all text subnodes of + the subtree defined by the XPath + /record/title[@lang='en'] + + Z> find @attr 1=/record/title[@lang='en'] english + + + + + Combining numeric indexes, boolean expressions, + and xpath based searches is possible: + + Z> find @attr 1=/record/title @and foo bar + Z> find @and @attr 1=/record/title foo @attr 1=4 bar + + + + Escaping PQF keywords and other non-parseable XPath constructs + with '{ }' to prevent syntax errors: + + Z> find @attr {1=/root/first[@attr='danish']} content + Z> find @attr {1=/root/second[@attr='danish lake']} + Z> find @attr {1=/root/third[@attr='dansk s\xc3\xb8']} + + + + It is worth mentioning that these dynamic performed XPath + queries are a performance bottelneck, as no optimized + 
specialized indexes can be used. Therefore, avoid the use of + this facility when speed is essential, and the database content + size is medium to large. + + + +
+ + + Explain Attribute Set + + The Z39.50 standard defines the + Explainattribute set + exp-1, which is used to discover information + about a server's search semantics and functional capabilities + Zebra exposes a "classic" + Explain database by base name IR-Explain-1, which + is populated with system internal information. + - - - - Explain Attribute Set - - The attribute-set exp-1 is defined for - searching an Explain IR-Explain-1 database. - It consists of a single Use (type 1) attribute. - - + The attribute-set exp-1 consists of a single + Use (type 1) attribute. + + In addition, the non-Use bib-1 attributes, that is, the types Relation, Position, Structure, Truncation, and Completeness are imported from - the bib-1 attrubute set, and may be used + the bib-1 attribute set, and may be used within any explain query. - + - + Use Attributes (type = 1) - - The following Explain search atributes are supported: - ExplainCategory (@attr 1=1), - DatabaseName (@attr 1=3), - DateAdded (@attr 1=9), - DateChanged(@attr 1=10). - - - A search in the use attribute ExplainCategory - supports only these predefined values: - CategoryList, TargetInfo, - DatabaseInfo, AttributeDetails. - + + The following Explain search atributes are supported: + ExplainCategory (@attr 1=1), + DatabaseName (@attr 1=3), + DateAdded (@attr 1=9), + DateChanged(@attr 1=10). + + + A search in the use attribute ExplainCategory + supports only these predefined values: + CategoryList, TargetInfo, + DatabaseInfo, AttributeDetails. + See tab/explain.att and the + Z39.50 standard for more information. - - + + Explain searches with yaz-client + Classic Explain only defines retrieval of Explain information + via ASN.1. Pratically no Z39.50 clients supports this. Fortunately + they don't have to - Zebra allows retrieval of this information + in other formats: + SUTRS, XML, + GRS-1 and ASN.1 Explain. 
+ + + List supported categories to find out which explain commands are supported: Z> base IR-Explain-1 - Z> @attr exp1 1=1 categorylist + Z> find @attr exp1 1=1 categorylist Z> form sutrs Z> show 1+2 - + Get target info, that is, investigate which databases exist at this server endpoint: Z> base IR-Explain-1 - Z> @attr exp1 1=1 targetinfo + Z> find @attr exp1 1=1 targetinfo Z> form xml Z> show 1+1 Z> form grs-1 @@ -115,7 +522,7 @@ Z> show 1+1 - + List all supported databases, the number of hits is the number of databases found, which most commonly are the @@ -124,7 +531,7 @@ IR-Explain-1 databases. Z> base IR-Explain-1 - Z> f @attr exp1 1=1 databaseinfo + Z> find @attr exp1 1=1 databaseinfo Z> form sutrs Z> show 1+2 @@ -134,15 +541,15 @@ Get database info record for database Default. Z> base IR-Explain-1 - Z> @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default + Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default Identical query with explicitly specified attribute set: Z> base IR-Explain-1 - Z> @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default + Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default - + Get attribute details record for database Default. @@ -152,321 +559,700 @@ found. Z> base IR-Explain-1 - Z> @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default + Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default Identical query with explicitly specified attribute set: Z> base IR-Explain-1 - Z> @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default + Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default - - - - - Bib1 Attribute Set - - Something about querying to be written .. - - - Most of the information contained in this section is an excerpt of - the ATTRIBUTE SET BIB-1 (Z39.50-1995) - SEMANTICS, found at The BIB-1 - Attribute Set Semantics from 1995, also in an updated - Bib-1 - Attribute Set - version from 2003. 
Index Data is not the copyright holder of this - information. - + + + + + Bib1 Attribute Set + + Something about querying to be written .. + + + Most of the information contained in this section is an excerpt of + the ATTRIBUTE SET BIB-1 (Z39.50-1995) + SEMANTICS, + found at . The BIB-1 + Attribute Set Semantics from 1995, also in an updated + Bib-1 + Attribute Set + version from 2003. Index Data is not the copyright holder of this + information. + - Use Attributes (type = 1) - - - - Relation Attributes (type = 2) - - - + Use Attributes (type 1) + - - Position Attributes (type = 3) - + + A use attribute specifies an access point for any atomic query. + These acess points are highly dependent on the attribute set used + in the query, and are user configurable using the following + default configuration files: + tab/bib1.att, + tab/dan1.att, + tab/explain.att, and + tab/gils.att. + New attribute sets can be added by adding new + tab/*.att configuration files, which need to + be sourced in the main configuration zebra.cfg. + - - Structure Attributes (type = 4) - + + In addition, Zebra allows the acess of + internal index names and dynamic + XPath as use attributes. + See + for + alternative acess to the Zebra internal index names and XPath queries. + - - Truncation Attributes (type = 5) - + + Phrase search for information retrieval in + the title-register: + + Z> find @attr 1=4 "information retrieval" + + - - Completeness Attributes (type = 6) - + + + Relation Attributes (type 2) + + + Relation attributes describe the relationship of the access + point (left side + of the relation) to the search term as qualified by the attributes (right + side of the relation), e.g., Date-publication <= 1975. + - - Zebra Extention Sorting Attributes (type = 7) - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Relation Attributes (type 2)
Relation | Value | Notes
Less than | 1 | supported
Less than or equal | 2 | supported
Equal | 3 | default
Greater or equal | 4 | supported
Greater than | 5 | supported
Not equal | 6 | unsupported
Phonetic | 100 | unsupported
Stem | 101 | unsupported
Relevance | 102 | supported
AlwaysMatches | 103 | supported
- - Zebra Extention Search Estimation Attributes (type = 8) - + + The relation attribute + relevance (102) is supported, see + for full information. + + + + + All ordering operations are based on a lexicographical ordering, + except when the + structure attribute numeric (109) is used. In + this case, ordering is numerical. See + . + - - Zebra Extention Weight Attributes (type = 9) - - -
+ + Ranked search for information retrieval in + the title-register + (see for the glory details): + + Z> find @attr 1=4 @attr 2=102 "information retrieval" + + + - - Mapping from Bib1 Attributes to Zebra internal - register indexes + + Position Attributes (type 3) + + The position attribute specifies the location of the search term + within the field or subfield in which it appears. - - Use attributes are interpreted according to the - attribute sets which have been loaded in the - zebra.cfg file, and are matched against specific - fields as specified in the .abs file which - describes the profile of the records which have been loaded. - If no Use attribute is provided, a default of Bib-1 Any is assumed. - + + + + + + + + + + + + + + + + + + + + + + + + + + +
Position Attributes (type 3)
Position | Value | Notes
First in field | 1 | unsupported
First in subfield | 2 | unsupported
Any position in field | 3 | default
+ + + The position attribute values first in field (1), + and first in subfield (2) are unsupported. + Using them does not trigger an error, but silently defaults to + any position in field (3). + + +
+ + + Structure Attributes (type 4) + + + The structure attribute specifies the type of search + term. This causes the search to be mapped on + different Zebra internal indexes, which must have been defined + at index time. + - - If a Structure attribute of - Phrase is used in conjunction with a - Completeness attribute of - Complete (Sub)field, the term is matched - against the contents of the phrase (long word) register, if one - exists for the given Use attribute. - A phrase register is created for those fields in the - .abs file that contains a - p-specifier. - - + + The possible values of the + structure attribute (type 4) can be defined + using the configuraiton file + tab/default.idx. + The default configuration is summerized in this table. + - - If Structure=Phrase is - used in conjunction with Incomplete Field - the - default value for Completeness, the - search is directed against the normal word registers, but if the term - contains multiple words, the term will only match if all of the words - are found immediately adjacent, and in the given order. - The word search is performed on those fields that are indexed as - type w in the .abs file. - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Structure Attributes (type 4)
Structure | Value | Notes
Phrase | 1 | default
Word | 2 | supported
Key | 3 | supported
Year | 4 | supported
Date (normalized) | 5 | supported
Word list | 6 | supported
Date (un-normalized) | 100 | unsupported
Name (normalized) | 101 | unsupported
Name (un-normalized) | 102 | unsupported
Structure | 103 | unsupported
Urx | 104 | supported
Free-form-text | 105 | supported
Document-text | 106 | supported
Local-number | 107 | supported
String | 108 | unsupported
Numeric string | 109 | supported
+
+ + + The structure attribute value local-number + (107) + is supported, and maps always to the Zebra internal document ID. + - - If the Structure attribute is - Word List, - Free-form Text, or - Document Text, the term is treated as a - natural-language, relevance-ranked query. - This search type uses the word register, i.e. those fields - that are indexed as type w in the - .abs file. - + + For example, in + the GILS schema (gils.abs), the + west-bounding-coordinate is indexed as type n, + and is therefore searched by specifying + structure=Numeric String. + To match all those records with west-bounding-coordinate greater + than -114 we use the following query: + + Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114 + + - - If the Structure attribute is - Numeric String the term is treated as an integer. - The search is performed on those fields that are indexed - as type n in the .abs file. - + + Truncation Attributes (type = 5) - - If the Structure attribute is - URx the term is treated as a URX (URL) entity. - The search is performed on those fields that are indexed as type - u in the .abs file. - + + The truncation attribute specifies whether variations of one or + more characters are allowed between serch term and hit terms, or + not. Using non-default truncation attributes will broaden the + document hit set of a search query. + - - If the Structure attribute is - Local Number the term is treated as - native Zebra Record Identifier. - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Truncation Attributes (type 5)
Truncation | Value | Notes
Right truncation | 1 | supported
Left truncation | 2 | supported
Left and right truncation | 3 | supported
Do not truncate | 100 | default
Process # in search term | 101 | supported
RegExpr-1 | 102 | supported
RegExpr-2 | 103 | supported
- - If the Relation attribute is - Equals (default), the term is matched - in a normal fashion (modulo truncation and processing of - individual words, if required). - If Relation is Less Than, - Less Than or Equal, - Greater than, or Greater than or - Equal, the term is assumed to be numerical, and a - standard regular expression is constructed to match the given - expression. - If Relation is Relevance, - the standard natural-language query processor is invoked. - + + Truncation attribute value + Process # in search term (101) is a + poor-man's regular expression search. It maps + each # to .*, and + then performs a Regexp-1 (102) regular + expression search. + + + Truncation attribute value + Regexp-1 (102) is a normal regular expression search, + see. + + + Truncation attribute value + Regexp-2 (103) is a Zebra specific extension + which allows fuzzy matches. One single + error in spelling of search terms is allowed, i.e., a document + is hit if it includes a term which can be mapped to the used + search term by one character substitution, addition, deletion or + change of position. + + +
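The two behaviours described above can be modelled in a few lines. This is only an illustration using Python's `re` module, not Zebra's matching code: the function names are hypothetical, escaping the literal parts of the term is an assumption of this sketch, and "change of position" is read here as swapping two adjacent characters.

```python
import re


def hash_to_regex(term):
    """Map each '#' in a search term to '.*' (truncation value 101).

    Assumption of this sketch: all other characters are taken literally,
    so they are escaped before the pattern is handed to a regex engine.
    """
    return ".*".join(re.escape(piece) for piece in term.split("#"))


def within_one_error(term, candidate):
    """True if candidate equals term or differs by one spelling error:
    one substitution, one addition, one deletion, or one swap of two
    adjacent characters (the reading of 'change of position' used here)."""
    if term == candidate:
        return True
    if abs(len(term) - len(candidate)) > 1:
        return False
    if len(term) == len(candidate):
        diff = [i for i in range(len(term)) if term[i] != candidate[i]]
        if len(diff) == 1:
            return True    # one substitution
        return (len(diff) == 2 and diff[1] == diff[0] + 1
                and term[diff[0]] == candidate[diff[1]]
                and term[diff[1]] == candidate[diff[0]])   # adjacent swap
    # Lengths differ by one: try removing each character of the longer word.
    short, long_ = sorted((term, candidate), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


if __name__ == "__main__":
    print(hash_to_regex("inform#"))
    print(within_one_error("retrieval", "retreival"))
```

In this model `hash_to_regex("inform#")` yields a pattern that matches `information`, and `retreival` counts as a fuzzy hit for `retrieval` because only two adjacent characters are swapped.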
+ + + Completeness Attributes (type = 6) + + This attribute is only used when a register of type w or p is to be + chosen; completeness is ignored for all other register types. + Incomplete field (1) is the default and makes Zebra use + register type w. + Complete subfield (2) and complete field (3) both trigger + a search in register type p. + + +
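The mapping from completeness value to register type described above is small enough to write down as a lookup. The names below are hypothetical and the sketch covers only the w and p registers named in the text.

```python
# Completeness attribute value -> register type, as described in the text.
# Illustration only; not a Zebra data structure.
REGISTER_BY_COMPLETENESS = {
    1: "w",   # incomplete field (default): word register
    2: "p",   # complete subfield: phrase register
    3: "p",   # complete field: phrase register
}


def register_type(completeness=None):
    """Return the register type for a completeness value (default: 1)."""
    value = 1 if completeness is None else completeness
    if value not in REGISTER_BY_COMPLETENESS:
        raise ValueError(f"unknown completeness value: {value}")
    return REGISTER_BY_COMPLETENESS[value]


if __name__ == "__main__":
    print(register_type())    # no attribute given: default applies
    print(register_type(3))
```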
+ - - For the Truncation attribute, - No Truncation is the default. - Left Truncation is not supported. - Process # in search term is supported, as is - Regxp-1. - Regxp-2 enables the fault-tolerant (fuzzy) - search. As a default, a single error (deletion, insertion, - replacement) is accepted when terms are matched against the register - contents. - -
+ + Zebra specific Search Extentions to all Attribute Sets + + Zebra extends the Bib1 attribute types, and these extentions are + recognized regardless of attribute + set used in a search operation query. + - - Regular expressions + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Zebra Search Attribute Extensions
Name | Value | Operation | Zebra version
Embedded Sort | 7 | search | 1.1
Term Set | 8 | search | 1.1
Rank Weight | 9 | search | 1.1
Approx Limit | 9 | search | 1.4
Term Reference | 10 | search | 1.4
+ + + Zebra Extention Embedded Sort Attribute (type 7) + + + The embedded sort is a way to specify sort within a query - thus + removing the need to send a Sort Request separately. It is both + faster and does not require clients to deal with the Sort + Facility. + + + The possible values after attribute type 7 are + 1 ascending and + 2 descending. + The attributes+term (APT) node is separate from the + rest and must be @or'ed. + The term associated with APT is the sorting level in integers, + where 0 means primary sort, + 1 means secondary sort, and so forth. + See also . + + + For example, searching for water, sort by title (ascending) + + Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 + + + + Or, searching for water, sort by title ascending, then date descending + + Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1 + + + + Zebra Extention Term Set Attribute (type 8) + - Each term in a query is interpreted as a regular expression if - the truncation value is either Regxp-1 (102) - or Regxp-2 (103). - Both query types follow the same syntax with the operands: - - - - x - - - Matches the character x. - - - - - . - - - Matches any character. - - - - - [..] - - - Matches the set of characters specified; - such as [abc] or [a-c]. - - - - - and the operators: - - - - x* - - - Matches x zero or more times. Priority: high. - - - - - x+ - - - Matches x one or more times. Priority: high. - - - - - x? - - - Matches x zero or once. Priority: high. - - - - - xy - - - Matches x, then y. - Priority: medium. - - - - - x|y - - - Matches either x or y. - Priority: low. - - - - - The order of evaluation may be changed by using parentheses. + The Term Set feature is a facility that allows a search to store + hitting terms in a "pseudo" resultset; thus a search (as usual) + + a scan-like facility. Requires a client that can do named result + sets since the search generates two result sets. 
The value for + attribute 8 is the name of a result set (string). The terms in + the named term set are returned as SUTRS records. - - If the first character of the Regxp-2 query - is a plus character (+) it marks the - beginning of a section with non-standard specifiers. - The next plus character marks the end of the section. - Currently Zebra only supports one specifier, the error tolerance, - which consists one digit. + For example, searching for u in title, right truncated, and + storing the result in term set named 'aset' + + Z> find @attr 5=1 @attr 1=4 @attr 8=aset u + + + The model has one serious flaw: we don't know the size of term + set. Experimental. Do not use in production code. + + + Zebra Extention Rank Weight Attribute (type 9) + - Since the plus operator is normally a suffix operator the addition to - the query syntax doesn't violate the syntax for standard regular - expressions. + Rank weight is a way to pass a value to a ranking algorithm - so + that one APT has one value - while another as a different one. + See also . - -
- - - Query examples - - Phrase search for information retrieval in - the title-register: - - @attr 1=4 "information retrieval" + For example, searching for utah in title with weight 30 as well + as any with weight 20: + + Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah + + Zebra Extention Approximative Limit Attribute (type 9) + + + Newer Zebra versions normally estemiates hit count for every APT + (leaf) in the query tree. These hit counts are returned as part of + the searchResult-1 facility in the binary encoded Z39.50 search + response packages. + - Ranked search for the same thing: + By setting a limit for the APT we can make Zebra turn into + approximate hit count when a certain hit count limit is + reached. A value of zero means exact hit count. + + + For example, we might be intersted in exact hit count for a, but + for b we allow hit count estimates for 1000 and higher. - @attr 1=4 @attr 2=102 "Information retrieval" + Z> find @and a @attr 9=1000 b - + + The estimated hit count fascility makes searches faster, as one + only needs to process large hit lists partially. + + + This facility clashes with rank weight, because there all + documents in the hit lists need to be examined for scoring and + re-sorting. + It is an experimental + extention. Do not use in production code. + + + + Zebra Extention Term Reference Attribute (type 10) + + + Zebra supports the searchResult-1 facility. If attribute 10 is + given, that specifies a subqueryId value returned as part of the + search result. It is a way for a client to name an APT part of a + query. + + + + Experimental. Do not use in production code. + + + + + + + Zebra specific Scan Extentions to all Attribute Sets + + Zebra extends the Bib1 attribute types, and these extentions are + recognized regardless of attribute + set used in a scan operation query. + + + + + + + + + + + + + + + + + + + + + + +
Zebra Scan Attribute Extensions
Name and Type | Operation | Zebra version
Result Set Narrow (type 8) | scan | 1.3
Approximative Limit (type 9) | scan | 1.4
+ + + Zebra Extention Result Set Narrow (type 8) + + + If attribute 8 is given for scan, the value is the name of a + result set. Each hit count in scan is @and'ed with the result set + given. + + + + Experimental and buggy. Definitely not to be used in production code. + + + Zebra Extention Approximative Limit (type 9) + + + The approximative limit (as for search) is a way to enable approx + hit counts for scan hit counts. + + + + Experimental. Do not use in production code. + + +
+ + + + Mapping from Bib1 Attributes to Zebra internal + register indexes + + TO-DO + + + + + Use attributes are interpreted according to the + attribute sets which have been loaded in the + zebra.cfg file, and are matched against specific + fields as specified in the .abs file which + describes the profile of the records which have been loaded. + If no Use attribute is provided, a default of Bib-1 Any is assumed. + + + + If a Structure attribute of + Phrase is used in conjunction with a + Completeness attribute of + Complete (Sub)field, the term is matched + against the contents of the phrase (long word) register, if one + exists for the given Use attribute. + A phrase register is created for those fields in the + .abs file that contains a + p-specifier. + + + + + If Structure=Phrase is + used in conjunction with Incomplete Field - the + default value for Completeness, the + search is directed against the normal word registers, but if the term + contains multiple words, the term will only match if all of the words + are found immediately adjacent, and in the given order. + The word search is performed on those fields that are indexed as + type w in the .abs file. + + + + If the Structure attribute is + Word List, + Free-form Text, or + Document Text, the term is treated as a + natural-language, relevance-ranked query. + This search type uses the word register, i.e. those fields + that are indexed as type w in the + .abs file. + + + + If the Structure attribute is + Numeric String the term is treated as an integer. + The search is performed on those fields that are indexed + as type n in the .abs file. + + + + If the Structure attribute is + URx the term is treated as a URX (URL) entity. + The search is performed on those fields that are indexed as type + u in the .abs file. + + + + If the Structure attribute is + Local Number the term is treated as + native Zebra Record Identifier. 
+ + + + If the Relation attribute is + Equals (default), the term is matched + in a normal fashion (modulo truncation and processing of + individual words, if required). + If Relation is Less Than, + Less Than or Equal, + Greater than, or Greater than or + Equal, the term is assumed to be numerical, and a + standard regular expression is constructed to match the given + expression. + If Relation is Relevance, + the standard natural-language query processor is invoked. + + + + For the Truncation attribute, + No Truncation is the default. + Left Truncation is not supported. + Process # in search term is supported, as is + Regxp-1. + Regxp-2 enables the fault-tolerant (fuzzy) + search. As a default, a single error (deletion, insertion, + replacement) is accepted when terms are matched against the register + contents. + + + + + Zebra Regular Expressions in Truncation Attribute (type = 5) + + + Each term in a query is interpreted as a regular expression if + the truncation value is either Regxp-1 (@attr 5=102) + or Regxp-2 (@attr 5=103). + Both query types follow the same syntax with the operands: + + + + + + + + + + + + + + + + + + + +
Regular Expression Operands
x | Matches the character x.
. | Matches any character.
[ .. ] | Matches the set of characters specified; such as [abc] or [a-c].
+ + + The above operands can be combined with the following operators: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Regular Expression Operators
x* | Matches x zero or more times. Priority: high.
x+ | Matches x one or more times. Priority: high.
x? | Matches x zero or once. Priority: high.
xy | Matches x, then y. Priority: medium.
x|y | Matches either x or y. Priority: low.
( ) | The order of evaluation may be changed by using parentheses.
+ + + If the first character of the Regxp-2 query + is a plus character (+) it marks the + beginning of a section with non-standard specifiers. + The next plus character marks the end of the section. + Currently Zebra only supports one specifier, the error tolerance, + which consists of one digit. + + + + Since the plus operator is normally a suffix operator the addition to + the query syntax doesn't violate the syntax for standard regular + expressions. + + + + For example, a phrase search with regular expressions in + the title-register is performed like this: + + Z> find @attr 1=4 @attr 5=102 "informat.* retrieval" + + + + + Combinations with other attributes are possible. For example, a + ranked search with a regular expression + (see for the gory details): + + Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval" + + +
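The operator priorities and the leading specifier section can be tried out with a short sketch. It uses Python's `re` module as a stand-in, on the assumption that it agrees with Zebra for the simple operands and operators listed above; the function `split_regxp2` is hypothetical and only mirrors the description of the plus-delimited error tolerance, with a default of one error.

```python
import re


def split_regxp2(query, default_tolerance=1):
    """Split a Regxp-2 query into (error tolerance, regular expression).

    A leading '+' opens the specifier section and the next '+' closes it;
    the only specifier modelled here is a single-digit error tolerance.
    """
    if not query.startswith("+"):
        return default_tolerance, query
    end = query.find("+", 1)
    if end == -1:
        raise ValueError("unterminated specifier section")
    spec = query[1:end]
    if len(spec) != 1 or spec not in "0123456789":
        raise ValueError(f"expected a single digit, got {spec!r}")
    return int(spec), query[end + 1:]


def matches(pattern, text):
    """Whole-term match, using Python's re as a stand-in regex engine."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    # Priority: repetition binds tighter than concatenation, which binds
    # tighter than alternation, so "ab|cd*" reads as (ab)|(c(d*)).
    print(matches("ab|cd*", "cddd"), matches("(ab|c)d*", "abdd"))
    print(split_regxp2("+2+informat.*"))
```

The pattern from the example above, `informat.* retrieval`, matches the phrase `information retrieval` in this model.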
+ @@ -630,108 +1622,6 @@ - -