X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=82d25ec6a072ed011b76c004f72e2819cf3b7328;hb=8b2d919ca1ab2134c098057bb0965ec7dc42cd9d;hp=88c2fd7799e247f9968d09d4c746034e12eab9e1;hpb=1ab2e1e0d6f2aa60baa5195b0a313f689d4c1027;p=idzebra-moved-to-github.git diff --git a/doc/querymodel.xml b/doc/querymodel.xml index 88c2fd7..82d25ec 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,11 +1,10 @@ - + Query Model - Query Model Overview + Query Model Overview - Query Languages @@ -15,7 +14,7 @@ Z39.50 and SRU, and implement the - type-1 Reverse Polish Notation (RPN) query + type-1 Reverse Polish Notation (RPN) query model defined there. Unfortunately, this model has only defined a binary encoded representation, which is used as transport packaging in @@ -23,45 +22,44 @@ readable, nor defines any convenient way to specify queries. - Since the type-1 (RPN) + Since the type-1 (RPN) query structure has no direct, useful string - representation, every origin application needs to provide some + representation, every client application needs to provide some form of mapping from a local query notation or representation to it. - - - - - Prefix Query Format (PQF) - - - Index Data has defined a textual representaion in the - Prefix Query Format, short - PQF, which mappes - one-to-one to binary encoded - type-1 RPN query packages. - It has been adopted by other - parties developing Z39.50 software, and is often referred to as - Prefix Query Notation, or in short - PQN. See - for further explanaitions and - descriptions of Zebra's capabilities. - - - - Common Query Language (CQL) + + + + Prefix Query Format (PQF) + + Index Data has defined a textual representation in the + Prefix Query Format, short + PQF, which maps + one-to-one to binary encoded + type-1 RPN queries. + PQF has been adopted by other + parties developing Z39.50 software, and is often referred to as + Prefix Query Notation, or in short + PQN. 
See + for further explanations and + descriptions of Zebra's capabilities. + + + + + Common Query Language (CQL) - The query model of the type-1 RPN, - expressed in PQF/PQN is natively supported. - On the other hand, the default SRU - webservices Common Query Language - CQL is not natively supported. + The query model of the type-1 RPN, + expressed in PQF/PQN is natively supported. + On the other hand, the default SRU + web services Common Query Language + CQL is not natively supported. - Zebra can be configured to understand and map CQL to PQF. See - . - - + Zebra can be configured to understand and map CQL to PQF. See + . + + @@ -69,9 +67,9 @@ Operation types Zebra supports all of the three different - Z39.50/SRU operations defined in the - standards: explain, search, - and scan. A short description of the + Z39.50/SRU operations defined in the + standards: explain, search, + and scan. A short description of the functionality and purpose of each is quite in order here. @@ -83,30 +81,28 @@ semantics - taking into account a particular servers functionalities and abilities - must be discovered from case to case. Enters the - explain operation, which provides the means - for learning which + explain operation, which provides the means for learning which fields (also called - indexes or access points + indexes or access points) are provided, which default parameter the server uses, which retrieve document formats are defined, and which specific parts of the general query model are supported. - The Z39.50 embeddes the explain operation - by perfoming a - search in the magic + The Z39.50 embeds the explain operation + by performing a + search in the magic IR-Explain-1 database; see . - In SRU, explain is an entirely seperate - operation, which returns an Zeerex - XML record according to the + In SRU, explain is an entirely separate + operation, which returns an ZeeRex XML record according to the structure defined by the protocol. 
In both cases, the information gathered through - explain operations can be used to + explain operations can be used to auto-configure a client user interface to the servers capabilities. @@ -128,15 +124,15 @@ Scan Operation - The scan operation is a helper functionality, + The scan operation is a helper functionality, which operates on one index or access point a time. It provides the means to investigate the content of specific indexes. - Scanning an index returns a handfull of terms actually fond in - the indexes, and in addition the scan - operation returns th enumber of documents indexed by each term. + Scanning an index returns a handful of terms actually found in + the indexes, and in addition the scan + operation returns the number of documents indexed by each term. A search client can use this information to propose proper spelling of search terms, to auto-fill search boxes, or to display controlled vocabularies. @@ -151,10 +147,11 @@ Prefix Query Format syntax and semantics - The PQF grammer + The PQF grammar is documented in the YAZ manual, and shall not be repeated here. This textual PQF representation - is always during search mapped to the equivalent Zebra internal + is not transmistted to Zebra during search, but it is in the + client mapped to the equivalent Z39.50 binary query parse tree. @@ -180,115 +177,116 @@ definitions, others can easily be defined and added to the configuration. - - - - - +
Attribute sets predefined in Zebra
+ Attribute sets predefined in Zebra + - - - - - - - - + + Attribute set + Short hand + Status + Notes + + + - - - - - - - - - - - - - - - - - - + processing. + default + + + GILS + gils + Extension to the Bib1 attribute set. + predefined + +
Attribute set | Short hand | Status | Notes
Explainexp-1Special attribute set used on the special automagic + + Explain + exp-1 + Special attribute set used on the special automagic IR-Explain-1 database to gain information on server capabilities, database names, and database - and semantics.predefined
Bib1bib-1Standard PQF query language attribute set which defines the + and semantics. + predefined + + + Bib1 + bib-1 + Standard PQF query language attribute set which defines the semantics of Z39.50 searching. In addition, all of the - non-use attributes (type 2-9) define the hard-wired + non-use attributes (types 2-11) define the hard-wired Zebra internal query - processing.default
GILS | gils | Extention to the Bib1 attribute set. | predefined
+ + + The use attributes (type 1) mappings the + predefined attribute sets are found in the + attribute set configuration files tab/*.att. + + + + + The Zebra internal query processing is modeled after + the Bib1 attribute set, and the non-use + attributes type 2-6 are hard-wired in. It is therefore essential + to be familiar with . + + +
- - - The use attributes (type 1) mappings the - predefined attribute sets are found in the - attribute set configuration files tab/*.att. - - - - The Zebra internal query processing is modeled after - the Bib1 attribute set, and the non-use - attributes type 2-6 are hard-wired in. It is therefore essential - to be familiar with . - - Boolean operators - A pair of subquery trees, or of atomic queries, is combined + A pair of sub query trees, or of atomic queries, is combined using the standard boolean operators into new query trees. + Thus, boolean operators are always internal nodes in the query tree. - - - +
Boolean operators
+ Boolean operators + - - - - - - + + Keyword + Operator + Description + + - - - - - - - - - - - - - - - - + @and + binary AND operator + Set intersection of two atomic queries hit sets + + @or + binary OR operator + Set union of two atomic queries hit sets + + @not + binary AND NOT operator + Set complement of two atomic queries hit sets + + @prox + binary PROXIMITY operator + Set intersection of two atomic queries hit sets. In + addition, the intersection set is purged for all + documents which do not satisfy the requested query + term proximity. Usually a proper subset of the AND + operation. + +
Keyword | Operator | Description
@and | binary AND operator | Set intersection of two atomic queries hit sets
@or | binary OR operator | Set union of two atomic queries hit sets
@not | binary AND NOT operator | Set complement of two atomic queries hit sets
@prox | binary PROXIMY operator | Set intersection of two atomic queries hit sets. In - addition, the intersection set is purged for all - documents which do not satisfy the requested query - term proximity. Usually a proper subset of the AND - operation.
@@ -307,7 +305,7 @@ Querying for the intersection of all documents containing the terms information AND retrieval: - The hit set is a subset of the coresponding + The hit set is a subset of the corresponding OR query. Z> find @and information retrieval @@ -317,20 +315,21 @@ Querying for the intersection of all documents containing the terms information AND retrieval, taking proximity into account: - The hit set is a subset of the coresponding - AND query. + The hit set is a subset of the corresponding + AND query + (see the PQF grammar for + details on the proximity operator): Z> find @prox 0 3 0 2 k 2 information retrieval - See PQF grammer for details. Querying for the intersection of all documents containing the terms information AND retrieval, in the same order and near each - other as described in the term list - The hit set is a subset of the coresponding - PROXIMY query. + other as described in the term list. + The hit set is a subset of the corresponding + PROXIMITY query. Z> find "information retrieval" @@ -341,48 +340,51 @@ Atomic queries (APT) - Atomic queries are the query parts which work on one acess point + Atomic queries are the query parts which work on one access point only. These consist of an attribute list followed by a single term or a quoted term list, and are often called Attributes-Plus-Terms (APT) queries. - Unsupplied non-use attributes type 2-9 are either inherited from + Atomic (APT) queries are always leaf nodes in the PQF query tree. + UN-supplied non-use attributes types 2-11 are either inherited from higher nodes in the query tree, or are set to Zebra's default values. See for details. - - - - - - - - - - - - - - + + + + term + single term + or quoted term list + Here the search terms or list of search terms is added + to the query + +
Atomic queries
attribute listList of orthogonal attributesAny of the orthogonal attribute types may be omitted, + + attribute list + List of orthogonal attributes + Any of the orthogonal attribute types may be omitted, these are inherited from higher query tree nodes, or if not inherited, are set to the default Zebra configuration values. -
term | single term - or quoted term list | Here the search terms or list of search terms is added - to the query
Querying for the term information in the - default index using the default attribite set, the server choice + default index using the default attribute set, the server choice of access point/index, and the default non-use attributes. Z> find information @@ -394,7 +396,7 @@ Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information - + Finding all documents which have the term debussy in the title field. @@ -403,6 +405,22 @@ + + The scan operation is only supported with + atomic APT queries, as it is bound to one access point at a + time. Boolean query trees are not allowed during + scan. + + + + For example, we might want to scan the title index, starting with + the term + debussy, and displaying this and the + following terms in lexicographic order: + + Z> scan @attr 1=4 debussy + +
@@ -410,13 +428,15 @@ Named Result Sets Named result sets are supported in Zebra, and result sets can be - used as operands without limitations. + used as operands without limitations. It follows that named + result sets are leaf nodes in the PQF query tree, exactly as + atomic APT queries are. After the execution of a search, the result set is available at the server, such that the client can use it for subsequent searches or retrieval requests. The Z30.50 standard actually - stresses the fact that result sets are voliatile. It may cease + stresses the fact that result sets are volatile. It may cease to exist at any time point after search, and the server will send a diagnostic to the effect that the requested result set does not exist any more. @@ -424,7 +444,9 @@ Defining a named result set and re-using it in the next query, - using yaz-client. + using yaz-client. Notice that the client, not + the server, assigns the string '1' to the + named result set. Z> f @attr 1=4 mozart ... @@ -433,34 +455,30 @@ Z> f @and @set 1 @attr 1=4 amadeus ... Number of hits: 14, setno 2 - ... - Z> f @attr 1=1016 beethoven - ... - Number of hits: 26, setno 3 - ... - Named result sets are only supported by the Z39.50 protocol. - The SRU web service is stateless, and therefore the notion of - named result sets does not exist when acessing a Zebra server by - the SRU protocol. + + Named result sets are only supported by the Z39.50 protocol. + The SRU web service is stateless, and therefore the notion of + named result sets does not exist when accessing a Zebra server by + the SRU protocol. +
- - + Zebra's special access point of type 'string' The numeric use (type 1) attribute is usually - refered to from a given + referred to from a given attribute set. In addition, Zebra let you use any internal index name defined in your configuration - as use atribute value. This is a great feature for + as use attribute value. This is a great feature for debugging, and when you do - not need the complecity of defined use attribute values. It is + not need the complexity of defined use attribute values. It is the preferred way of accessing Zebra indexes directly. @@ -494,7 +512,7 @@ See also for details, and - for the SRU PQF query extention using string names as a fast + for the SRU PQF query extension using string names as a fast debugging facility. @@ -507,7 +525,7 @@ idea) to emulate XPath 1.0 based search by defining use (type 1) - string attributes which in appearence + string attributes which in appearance resemble XPath queries. There are two problems with this approach: first, the XPath-look-alike has to be defined at indexation time, no new undefined @@ -525,27 +543,29 @@ use (type 1) xpath attributes. You must enable the xpath enable directive in your - .abs config files. + .abs configuration files. - Only a very restricted subset of the - XPath 1.0 - standard is supported as the GRS record model is simpler than - a full XML DOM structure. See the following examples for - possibilities. + + Only a very restricted subset of the + XPath 1.0 + standard is supported as the GRS record model is simpler than + a full XML DOM structure. See the following examples for + possibilities. + Finding all documents which have the term "content" inside a text node found in a specific XML DOM subtree, whose starting element is - adressed by XPath. + addressed by XPath. 
Z> find @attr 1=/root content Z> find @attr 1=/root/first content Notice that the XPath must be absolute, i.e., must start with '/', and that the - XPath decendant-or-self axis followed by a + XPath descendant-or-self axis followed by a text node selection text() is implicitly appended to the stated XPath. @@ -564,10 +584,10 @@ - Filter the adressing XPath by a predicate working on exact + Filter the addressing XPath by a predicate working on exact string values in attributes (in the XML sense) can be done: return all those docs which - have the term "english" contained in one of all text subnodes of + have the term "english" contained in one of all text sub nodes of the subtree defined by the XPath /record/title[@lang='en']. And similar predicate filtering. @@ -588,22 +608,23 @@ Escaping PQF keywords and other non-parseable XPath constructs - with '{ }' to prevent syntax errors: + with '{ }' to prevent client-side PQF parsing + syntax errors: Z> find @attr {1=/root/first[@attr='danish']} content Z> find @attr {1=/record/@set} oai - It is worth mentioning that these dynamic performed XPath - queries are a performance bottelneck, as no optimized - specialized indexes can be used. Therefore, avoid the use of - this facility when speed is essential, and the database content - size is medium to large. + + It is worth mentioning that these dynamic performed XPath + queries are a performance bottleneck, as no optimized + specialized indexes can be used. Therefore, avoid the use of + this facility when speed is essential, and the database content + size is medium to large. + - - @@ -634,7 +655,7 @@ Use Attributes (type = 1) - The following Explain search atributes are supported: + The following Explain search attributes are supported: ExplainCategory (@attr 1=1), DatabaseName (@attr 1=3), DateAdded (@attr 1=9), @@ -657,7 +678,7 @@ Explain searches with yaz-client Classic Explain only defines retrieval of Explain information - via ASN.1. 
Pratically no Z39.50 clients supports this. Fortunately + via ASN.1. Practically no Z39.50 clients supports this. Fortunately they don't have to - Zebra allows retrieval of this information in other formats: SUTRS, XML, @@ -744,7 +765,7 @@ Most of the information contained in this section is an excerpt of the ATTRIBUTE SET BIB-1 (Z39.50-1995) SEMANTICS, - found at . The BIB-1 + found at . The BIB-1 Attribute Set Semantics from 1995, also in an updated Bib-1 Attribute Set @@ -759,20 +780,42 @@ A use attribute specifies an access point for any atomic query. - These acess points are highly dependent on the attribute set used + These access points are highly dependent on the attribute set used in the query, and are user configurable using the following default configuration files: tab/bib1.att, tab/dan1.att, tab/explain.att, and tab/gils.att. + + + For example, some few Bib-1 use + attributes from the tab/bib1.att are: + + att 1 Personal-name + att 2 Corporate-name + att 3 Conference-name + att 4 Title + ... + att 1009 Subject-name-personal + att 1010 Body-of-text + att 1011 Date/time-added-to-db + ... + att 1016 Any + att 1017 Server-choice + att 1018 Publisher + ... + att 1035 Anywhere + att 1036 Author-Title-Subject + + + New attribute sets can be added by adding new tab/*.att configuration files, which need to - be sourced in the main configuration zebra.cfg. + be sourced in the main configuration zebra.cfg. - - In addition, Zebra allows the acess of + In addition, Zebra allows the access of internal index names and dynamic XPath as use attributes; see and @@ -805,74 +848,73 @@ side of the relation), e.g., Date-publication <= 1975. - - - - - - - - - +
Relation Attributes (type 2)
Relation | Value | Notes
+ Relation Attributes (type 2) + + + + Relation + Value + Notes + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + Less than + 1 + supported + + + Less than or equal + 2 + supported + + + Equal + 3 + default + + + Greater or equal + 4 + supported + + + Greater than + 5 + supported + + + Not equal + 6 + unsupported + + + Phonetic + 100 + unsupported + + + Stem + 101 + unsupported + + + Relevance + 102 + supported + + + AlwaysMatches + 103 + supported + +
Less than | 1 | supported
Less than or equal | 2 | supported
Equal | 3 | default
Greater or equal | 4 | supported
Greater than | 5 | supported
Not equal | 6 | unsupported
Phonetic | 100 | unsupported
Stem | 101 | unsupported
Relevance | 102 | supported
AlwaysMatches | 103 | supported
- + - The relation attributes - 1-5 are supported and work exactly as + The relation attributes 1-5 are supported and work exactly as expected. All ordering operations are based on a lexicographical ordering, expect when the @@ -880,23 +922,23 @@ this case, ordering is numerical. See . - Z> find @attr 1=Title @attr 2=1 music + Z> find @attr 1=Title @attr 2=1 music ... Number of hits: 11745, setno 1 ... - Z> find @attr 1=Title @attr 2=2 music + Z> find @attr 1=Title @attr 2=2 music ... Number of hits: 11771, setno 2 ... - Z> find @attr 1=Title @attr 2=3 music + Z> find @attr 1=Title @attr 2=3 music ... Number of hits: 532, setno 3 ... - Z> find @attr 1=Title @attr 2=4 music + Z> find @attr 1=Title @attr 2=4 music ... Number of hits: 11463, setno 4 ... - Z> find @attr 1=Title @attr 2=5 music + Z> find @attr 1=Title @attr 2=5 music ... Number of hits: 11419, setno 5 @@ -950,42 +992,42 @@ within the field or subfield in which it appears. - - - - - - - - - +
Position Attributes (type 3)
Position | Value | Notes
+ Position Attributes (type 3) + + + + Position + Value + Notes + - - - - - - - - - - - - - - - + + First in field + 1 + unsupported + + + First in subfield + 2 + unsupported + + + Any position in field + 3 + supported + +
First in field | 1 | unsupported
First in subfield | 2 | unsupported
Any position in field | 3 | default
The position attribute values first in field (1), and first in subfield(2) are unsupported. - Using them does not trigger an error, but silent defaults to - any position in field (3). - + Using them silently maps to + any position in field (3). A proper diagnostic + should have been issued.
@@ -1004,105 +1046,104 @@ structure attribute (type 4) can be defined using the configuration file tab/default.idx. - The default configuration is summerized in this table. + The default configuration is summarized in this table. - - - - - - - - - +
Structure Attributes (type 4)
Structure | Value | Notes
+ Structure Attributes (type 4) + + + + Structure + Value + Notes + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + Phrase + 1 + default + + + Word + 2 + supported + + + Key + 3 + supported + + + Year + 4 + supported + + + Date (normalized) + 5 + supported + + + Word list + 6 + supported + + + Date (un-normalized) + 100 + unsupported + + + Name (normalized) + 101 + unsupported + + + Name (un-normalized) + 102 + unsupported + + + Structure + 103 + unsupported + + + Urx + 104 + supported + + + Free-form-text + 105 + supported + + + Document-text + 106 + supported + + + Local-number + 107 + supported + + + String + 108 + unsupported + + + Numeric string + 109 + supported + +
Phrase | 1 | default
Word | 2 | supported
Key | 3 | supported
Year | 4 | supported
Date (normalized) | 5 | supported
Word list | 6 | supported
Date (un-normalized) | 100 | unsupported
Name (normalized) | 101 | unsupported
Name (un-normalized) | 102 | unsupported
Structure | 103 | unsupported
Urx | 104 | supported
Free-form-text | 105 | supported
Document-text | 106 | supported
Local-number | 107 | supported
String | 108 | unsupported
Numeric string | 109 | supported
- The structure attribute values Word list (6) @@ -1129,7 +1170,7 @@ Z> find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman" Z> find @attr 1=Body-of-text @or bach @or salieri teleman - This OR list of terms is very usefull in + This OR list of terms is very useful in combination with relevance ranking: Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman" @@ -1160,73 +1201,74 @@ Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114 - + - The exact mapping between PQF queries and Zebra internal indexes - and index types is explained in + + The exact mapping between PQF queries and Zebra internal indexes + and index types is explained in . - - - - + + + + Truncation Attributes (type = 5) The truncation attribute specifies whether variations of one or - more characters are allowed between serch term and hit terms, or + more characters are allowed between search term and hit terms, or not. Using non-default truncation attributes will broaden the document hit set of a search query. - - - - - - - - - +
Truncation Attributes (type 5)
Truncation | Value | Notes
+ Truncation Attributes (type 5) + + + + Truncation + Value + Notes + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + Right truncation + 1 + supported + + + Left truncation + 2 + supported + + + Left and right truncation + 3 + supported + + + Do not truncate + 100 + default + + + Process # in search term + 101 + supported + + + RegExpr-1 + 102 + supported + + + RegExpr-2 + 103 + supported + +
Right truncation | 1 | supported
Left truncation | 2 | supported
Left and right truncation | 3 | supported
Do not truncate | 100 | default
Process # in search term | 101 | supported
RegExpr-1 | 102 | supported
RegExpr-2 | 103 | supported
@@ -1257,7 +1299,7 @@ Process # in search term (101) is a poor-man's regular expression search. It maps each # to .*, and - performes then a Regexp-1 (102) regular + performs then a Regexp-1 (102) regular expression search. The following two queries are equivalent: Z> find @attr 1=Body-of-text @attr 5=101 schnit#ke @@ -1279,12 +1321,12 @@ The truncation attribute value - Regexp-2 (103) is a Zebra specific extention + Regexp-2 (103) is a Zebra specific extension which allows fuzzy matches. One single error in spelling of search terms is allowed, i.e., a document is hit if it includes a term which can be mapped to the used search term by one character substitution, addition, deletion or - change of posiiton. + change of position. Z> find @attr 1=Body-of-text @attr 5=100 schnittke ... @@ -1311,33 +1353,34 @@ (Complete field (3)). - - - - - - - - +
Completeness Attributes (type = 6)
Completeness | Value | Notes
+ Completeness Attributes (type = 6) + + + + Completeness + Value + Notes + - - - - - - - - - - - - - - - + + Incomplete subfield + 1 + default + + + Complete subfield + 2 + deprecated + + + Complete field + 3 + supported + +
Incomplete subfield | 1 | default
Complete subfield | 2 | depreciated
Complete field | 3 | supported
@@ -1362,10 +1405,12 @@ - The exact mapping between PQF queries and Zebra internal indexes - and index types is explained in + + The exact mapping between PQF queries and Zebra internal indexes + and index types is explained in . - +
+
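The attribute types 1 to 6 covered by the hunks above can be illustrated with a small sketch that spells out a fully qualified atomic (APT) query as a PQF string. This is only an illustration: the helper names `build_apt` and `quote_term` are hypothetical and not part of Zebra or YAZ, and the default values are the ones marked "default" in the tables above (relation 3, position 3, structure 1, truncation 100, completeness 1).

```python
# Hypothetical helpers for illustration; not part of Zebra or YAZ.
# Defaults for the non-use attribute types 2-6, taken from the tables above.
BIB1_DEFAULTS = {2: 3, 3: 3, 4: 1, 5: 100, 6: 1}


def quote_term(term):
    """Wrap a multi-word term in double quotes, as in "information retrieval"."""
    if '"' in term:
        raise ValueError("embedded double quotes are not handled in this sketch")
    return f'"{term}"' if any(ch.isspace() for ch in term) else term


def build_apt(term, use=1017, attrs=None, attrset="bib-1"):
    """Return a PQF string for one atomic query with every attribute explicit.

    use   -- use attribute (type 1): a number or an index name such as "Title"
    attrs -- overrides for types 2-6, e.g. {2: 102} for relevance ranking
    """
    merged = {**BIB1_DEFAULTS, **(attrs or {})}
    parts = [f"@attrset {attrset}", f"@attr 1={use}"]
    parts += [f"@attr {attr_type}={value}" for attr_type, value in sorted(merged.items())]
    parts.append(quote_term(term))
    return " ".join(parts)


if __name__ == "__main__":
    print(build_apt("information"))
    print(build_apt("music", use="Title", attrs={2: 1}))
```

Called with only a term, the sketch yields the same string as the fully specified `find` example quoted earlier in this diff, which is a convenient way to see what the server assumes when attributes are left out.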
@@ -1377,11 +1422,11 @@ The Zebra internal query engine has been extended to specific needs not covered by the bib-1 attribute set query - model. These extentions are non-standard - and non-portable: most functional extentions + model. These extensions are non-standard + and non-portable: most functional extensions are modeled over the bib-1 attribute set, defining type 7-9 attributes. - There are also the speciel + There are also the special string type index names for the idxpath attribute set. @@ -1413,112 +1458,124 @@ - The special string index _ALLRECORDS is - experimental, and the provided functionality and syntax may very - well change in future releases of Zebra. + + The special string index _ALLRECORDS is + experimental, and the provided functionality and syntax may very + well change in future releases of Zebra. + - - Zebra specific Search Extentions to all Attribute Sets + Zebra specific Search Extensions to all Attribute Sets - Zebra extends the Bib1 attribute types, and these extentions are + Zebra extends the Bib1 attribute types, and these extensions are recognized regardless of attribute set used in a search operation query. - - - - - - - - - - - + +
Zebra Search Attribute Extentions
Name | Value | Operation | Zebra version
+ Zebra Search Attribute Extensions + + + + Name + Value + Operation + Zebra version + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Embedded Sort | 7 | search | 1.1
Term Set | 8 | search | 1.1
Rank Weight | 9 | search | 1.1
Approx Limit | 9 | search | 1.4
Term Reference | 10 | search | 1.4
- + + + Embedded Sort + 7 + search + 1.1 + + + Term Set + 8 + search + 1.1 + + + Rank Weight + 9 + search + 1.1 + + + Approx Limit + 11 + search + 1.4 + + + Term Reference + 10 + search + 1.4 + + + + + - Zebra Extention Embedded Sort Attribute (type 7) - - - The embedded sort is a way to specify sort within a query - thus - removing the need to send a Sort Request separately. It is both - faster and does not require clients to deal with the Sort - Facility. - - - - All ordering operations are based on a lexicographical ordering, - expect when the - structure attribute numeric (109) is used. In - this case, ordering is numerical. See + Zebra Extension Embedded Sort Attribute (type 7) + + The embedded sort is a way to specify sort within a query - thus + removing the need to send a Sort Request separately. It is both + faster and does not require clients to deal with the Sort + Facility. + + + + All ordering operations are based on a lexicographical ordering, + expect when the + structure attribute numeric (109) is used. In + this case, ordering is numerical. See . - + + + + The possible values after attribute type 7 are + 1 ascending and + 2 descending. + The attributes+term (APT) node is separate from the + rest and must be @or'ed. + The term associated with APT is the sorting level in integers, + where 0 means primary sort, + 1 means secondary sort, and so forth. + See also . + + + For example, searching for water, sort by title (ascending) + + Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 + + + + Or, searching for water, sort by title ascending, then date descending + + Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1 + + + - - The possible values after attribute type 7 are - 1 ascending and - 2 descending. - The attributes+term (APT) node is separate from the - rest and must be @or'ed. - The term associated with APT is the sorting level in integers, - where 0 means primary sort, - 1 means secondary sort, and so forth. 
- See also . - - - For example, searching for water, sort by title (ascending) - - Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 - - - - Or, searching for water, sort by title ascending, then date descending - - Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1 - - + + + + + + Zebra Extension Rank Weight Attribute (type 9) + + Rank weight is a way to pass a value to a ranking algorithm - so + that one APT has one value - while another as a different one. + See also . + + + For example, searching for utah in title with weight 30 as well + as any with weight 20: + + Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah + + + + - Zebra Extention Approximative Limit Attribute (type 9) + Zebra Extension Approximative Limit Attribute (type 11) + + Zebra computes - unless otherwise configured - + the exact hit count for every APT + (leaf) in the query tree. These hit counts are returned as part of + the searchResult-1 facility in the binary encoded Z39.50 search + response packages. + + + By setting an estimation limit size of the resultset of the APT + leaves, Zebra stoppes processing the result set when the limit + length is reached. + Hit counts under this limit are still precise, but hit counts over it + are estimated using the statistics gathered from the chopped + result set. + + + Specifying a limit of 0 resuts in exact hit counts. + + + For example, we might be interested in exact hit count for a, but + for b we allow hit count estimates for 1000 and higher. + + Z> find @and a @attr 11=1000 b + + + + + The estimated hit count facility makes searches faster, as one + only needs to process large hit lists partially. + It is mostly used in huge databases, where you you want trade + exactness of hit counts against speed of execution. 
+ + + + + Do not use approximative hit count limits + in conjunction with relevance ranking, as re-sorting of the + result set obviosly only works when the entire result set has + been processed. + + + + + This facility clashes with rank weight, because there all + documents in the hit lists need to be examined for scoring and + re-sorting. + It is an experimental + extension. Do not use in production code. + + - - Newer Zebra versions normally estemiates hit count for every APT - (leaf) in the query tree. These hit counts are returned as part of - the searchResult-1 facility in the binary encoded Z39.50 search - response packages. - - - By setting a limit for the APT we can make Zebra turn into - approximate hit count when a certain hit count limit is - reached. A value of zero means exact hit count. - - - For example, we might be intersted in exact hit count for a, but - for b we allow hit count estimates for 1000 and higher. - - Z> find @and a @attr 9=1000 b - - - - The estimated hit count fascility makes searches faster, as one - only needs to process large hit lists partially. - - - This facility clashes with rank weight, because there all - documents in the hit lists need to be examined for scoring and - re-sorting. - It is an experimental - extention. Do not use in production code. - - Zebra Extention Term Reference Attribute (type 10) - - - Zebra supports the searchResult-1 facility. - If the Term Reference Attribute (type 10) is - given, that specifies a subqueryId value returned as part of the - search result. It is a way for a client to name an APT part of a - query. - - - - Experimental. Do not use in production code. - - - + --> + + + Experimental. Do not use in production code. + + + +
- Zebra specific Scan Extentions to all Attribute Sets + Zebra specific Scan Extensions to all Attribute Sets - Zebra extends the Bib1 attribute types, and these extentions are + Zebra extends the Bib1 attribute types, and these extensions are recognized regardless of attribute - set used in a scan operation query. + set used in a scan operation query. - - - - - - - - - - +
Zebra Scan Attribute Extentions
Name | Type | Operation | Zebra version
+ Zebra Scan Attribute Extensions + + + + Name + Type + Operation + Zebra version + - - - - - - - - - - - - - - -
Result Set Narrow | 8 | scan | 1.3
Approximative Limit | 9 | scan | 1.4
- + + + Result Set Narrow + 8 + scan + 1.3 + + + Approximative Limit + 9 + scan + 1.4 + + + + + - Zebra Extention Result Set Narrow (type 8) - - - If attribute Result Set Narrow (type 8) - is given for scan, the value is the name of a - result set. Each hit count in scan is - @and'ed with the result set given. - - - Consider for example - the case of scanning all title fields around the - scanterm mozart, then refining the scan by - issuing a filtering query for amadeus to - restric the scan to the result set of the query: - + Zebra Extension Result Set Narrow (type 8) + + If attribute Result Set Narrow (type 8) + is given for scan, the value is the name of a + result set. Each hit count in scan is + @and'ed with the result set given. + + + Consider for example + the case of scanning all title fields around the + scanterm mozart, then refining the scan by + issuing a filtering query for amadeus to + restrict the scan to the result set of the query: + Z> scan @attr 1=4 mozart ... * mozart (43) @@ -1681,54 +1762,57 @@ mozartiana (0) mozarts (1) ... - - - + + + - Experimental. Do not use in production code. - + + Experimental. Do not use in production code. + + + - Zebra Extention Approximative Limit (type 9) - - - The Zebra Extention Approximative Limit (type - 9) is a way to enable approx - hit counts for scan hit counts, in the same - way as for search hit counts. - - - - Experimental and buggy. Definitely not to be used in production code. - - - + --> + + + Experimental and buggy. Definitely not to be used in production code. + + +
- Zebra special IDXPATH Attribute Set for GRS indexing The attribute-set idxpath consists of a single - Use (type 1) attribute. All non-use attributes - behave as normal. + Use (type 1) attribute. All non-use attributes behave as normal. This feature is enabled when defining the xpath enable option in the GRS filter *.abs configuration files. If one wants to use the special idxpath numeric attribute set, the - main Zebra configuraiton file zebra.cfg + main Zebra configuration file zebra.cfg directive attset: idxpath.att must be enabled. - The idxpath is depreciated, may not be - supported in future Zebra versions, and should definitely - not be used in production code. + + + The idxpath is deprecated, may not be + supported in future Zebra versions, and should definitely + not be used in production code. + @@ -1738,57 +1822,59 @@ records by XPATH like structured index names. - The idxpath option defines hard-coded - index names, which might clash with your own index names. + + + The idxpath option defines hard-coded + index names, which might clash with your own index names. + - - - - - - - - - - +
Zebra specific IDXPATH Use Attributes (type 1)
IDXPATH | Value | String Index | Notes
+ Zebra specific IDXPATH Use Attributes (type 1) + + + + IDXPATH + Value + String Index + Notes + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + XPATH Begin + 1 + _XPATH_BEGIN + deprecated + + + XPATH End + 2 + _XPATH_END + deprecated + + + XPATH CData + 1016 + _XPATH_CDATA + deprecated + + + XPATH Attribute Name + 3 + _XPATH_ATTR_NAME + deprecated + + + XPATH Attribute CData + 1015 + _XPATH_ATTR_CDATA + deprecated + +
XPATH Begin | 1 | _XPATH_BEGIN | depreciated
XPATH End | 2 | _XPATH_END | depreciated
XPATH CData | 1016 | _XPATH_CDATA | depreciated
XPATH Attribute Name | 3 | _XPATH_ATTR_NAME | depreciated
XPATH Attribute CData | 1015 | _XPATH_ATTR_CDATA | depreciated
- See tab/idxpath.att for more information. @@ -1835,7 +1921,7 @@ - Combining usual bib-1 attribut set searches + Combining usual bib-1 attribute set searches with idxpath attribute set searches: Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart @@ -1843,7 +1929,7 @@ - Scanning is supportet on all idxpath + Scanning is supported on all idxpath indexes, both specified as numeric use attributes, or as string index names. @@ -1880,58 +1966,58 @@ All other access point types are Zebra specific, and non-portable. - - - +
Acces point name mapping
+ Access point name mapping + - - - - - - + + Access Point + Type + Grammar + Notes + - - - - - - - - - - - - - - - - - - - - - - - - - -
Acess Point | Type | Grammar | Notes
Use attibute | numeric | [1-9][1-9]* | directly mapped to string index name
String index name | string | [a-zA-Z](\-?[a-zA-Z0-9])* | normalized name is used as internal string index name
Zebra internal index name | zebra | _[a-zA-Z](_?[a-zA-Z0-9])* | hardwired internal string index name
XPATH special index | XPath | /.* | special xpath search for GRS indexed records
- - - Attribute set names and - string index names are normalizes - according to the following rules: all single - hyphens '-' are stripped, and all upper case - letters are folded to lower case. + + Use attribute + numeric + [1-9][1-9]* + directly mapped to string index name + + + String index name + string + [a-zA-Z](\-?[a-zA-Z0-9])* + normalized name is used as internal string index name + + + Zebra internal index name + zebra + _[a-zA-Z](_?[a-zA-Z0-9])* + hardwired internal string index name + + + XPATH special index + XPath + /.* + special xpath search for GRS indexed records + + + + + + + Attribute set names and + string index names are normalized + according to the following rules: all single + hyphens '-' are stripped, and all upper case + letters are folded to lower case. - + Numeric use attributes are mapped to the Zebra internal - string index according to the attribute set defintion in use. + string index according to the attribute set definition in use. The default attribute set is Bib-1, and may be omitted in the PQF query. @@ -1963,9 +2049,8 @@ fields as specified in the .abs file which describes the profile of the records which have been loaded. If no use attribute is provided, a default of - Bib-1 Use Any (1016) is - assumed. - The predefined use attribute sets + Bib-1 Use Any (1016) is assumed. + The predefined use attribute sets can be reconfigured by tweaking the configuration files tab/*.att, and new attribute sets can be defined by adding similar files in the @@ -1973,10 +2058,10 @@ - String indexes can be acessed directly, + String indexes can be accessed directly, independently of which attribute set is in use. These are just ignored. The above mentioned name normalization applies.
- String index names are defined in the + String index names are defined in the used indexing filter configuration files, for example in the GRS *.abs configuration files, or in the @@ -1984,10 +2069,10 @@ - Zebra internal indexes can be acessed directly, + Zebra internal indexes can be accessed directly, according to the same rules as the user defined - string indexes. The only difference is that - Zebra internal indexe names are hardwired, + string indexes. The only difference is that + Zebra internal index names are hardwired, all uppercase and must start with the character '_'. @@ -1995,7 +2080,7 @@ Finally, XPATH access points are only available using the GRS filter for indexing. - These acees point names must start with the character + These access point names must start with the character '/', they are not normalized, but passed unaltered to the Zebra internal XPATH engine. See . @@ -2013,91 +2098,91 @@ Internally Zebra has in its default configuration several different types of registers or indexes, whose tokenization and character normalization rules differ. This reflects the fact that - serching fundamental different tokens like dates, numbers, - bitfields and string based text needs different rulesets. + searching fundamentally different tokens like dates, numbers, + bitfields and string based text needs different rule sets. - - - -
Structure and completeness mapping to register types
+ Structure and completeness mapping to register types + - - - - - - - - - - + + phrase (@attr 4=1), word (@attr 4=2), word-list (@attr 4=6), free-form-text (@attr 4=105), or document-text (@attr 4=106) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Structure | Completeness | Register type | Notes
+ + Structure + Completeness + Register type + Notes + + +
Incomplete field (@attr 6=1) | Word ('w') | Traditional tokenized and character normalized word index
+ + Incomplete field (@attr 6=1) + Word ('w') + Traditional tokenized and character normalized word index + + + phrase (@attr 4=1), word (@attr 4=2), word-list (@attr 4=6), free-form-text (@attr 4=105), or document-text (@attr 4=106) - complete field' (@attr 6=3) | Phrase ('p') | Character normalized, but not tokenized index for phrase + + Complete field (@attr 6=3) + Phrase ('p') + Character normalized, but not tokenized index for phrase matches -
urx (@attr 4=104) | ignored | URX/URL ('u') | Special index for URL web adresses
numeric (@attr 4=109) | ignored | Numeric ('u') | Special index for digital numbers
key (@attr 4=3) | ignored | Null bitmap ('0') | Used for non-tokenizated and non-normalized bit sequences
year (@attr 4=4) | ignored | Year ('y') | Non-tokenizated and non-normalized 4 digit numbers
date (@attr 4=5) | ignored | Date ('d') | Non-tokenizated and non-normalized ISO date strings
ignored | ignored | Sort ('s') | Used with special sort attribute set (@attr 7=1, @attr 7=2)
overruled | overruled | special | Internal record ID register, used whenever - Relation Always Matches (@attr 2=103) is specified
- + + + + urx (@attr 4=104) + ignored + URX/URL ('u') + Special index for URL web addresses + + + numeric (@attr 4=109) + ignored + Numeric ('n') + Special index for digital numbers + + + key (@attr 4=3) + ignored + Null bitmap ('0') + Used for non-tokenized and non-normalized bit sequences + + + year (@attr 4=4) + ignored + Year ('y') + Non-tokenized and non-normalized 4 digit numbers + + + date (@attr 4=5) + ignored + Date ('d') + Non-tokenized and non-normalized ISO date strings + + + ignored + ignored + Sort ('s') + Used with special sort attribute set (@attr 7=1, @attr 7=2) + + + overruled + overruled + special + Internal record ID register, used whenever + Relation Always Matches (@attr 2=103) is specified + + + + + @@ -2111,7 +2196,7 @@ GRS *.abs file that contains a p-specifier. - Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven + Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven ... bayreuther festspiele (1) * beethoven bibliography database (1) @@ -2137,7 +2222,7 @@ The word search is performed on those fields that are indexed as type w in the GRS *.abs file. - Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven + Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven ... beefheart (1) * beethoven (18) @@ -2175,7 +2260,7 @@ If the Structure attribute is - URx the term is treated as a URX (URL) entity. + URX the term is treated as a URX (URL) entity. The search is performed on those fields that are indexed as type u in the *.abs file. @@ -2226,77 +2311,68 @@ Both query types follow the same syntax with the operands: - - - - - - - - - - - - - - - - - - -
Regular Expression Operands
x | Matches the character x.
. | Matches any character.
[ .. ] | Matches the set of characters specified; - such as [abc] or [a-c].
+ + Regular Expression Operands + + + + x + Matches the character x. + + + . + Matches any character. + + + [ .. ] + Matches the set of characters specified; + such as [abc] or [a-c]. + + + +
The above operands can be combined with the following operators: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Regular Expression Operators
x* | Matches x zero or more times. - Priority: high.
x+ | Matches x one or more times. - Priority: high.
x? | Matches x zero or once. - Priority: high.
xy | Matches x, then y. - Priority: medium.
x|y | Matches either x or y. - Priority: low.
( ) | The order of evaluation may be changed by using parentheses.
- + + + Regular Expression Operators + + + + x* + Matches x zero or more times. + Priority: high. + + + x+ + Matches x one or more times. + Priority: high. + + + x? + Matches x zero or once. + Priority: high. + + + xy + Matches x, then y. + Priority: medium. + + + x|y + Matches either x or y. + Priority: low. + + + ( ) + The order of evaluation may be changed by using parentheses. + + + +
+ If the first character of the Regxp-2 query is a plus character (+) it marks the @@ -2304,6 +2380,8 @@ The next plus character marks the end of the section. Currently Zebra only supports one specifier, the error tolerance, which consists of one digit. +