X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=f5d69338b0addb553360fc6949b9b50d7c17565c;hp=d359d18a40d5168817b97536e859b2a18fed9314;hb=c3ff843e467932c6027a8b3b2ebda7b44612447e;hpb=997db1975fa2132c9bb155b69c86f1310f5136b4 diff --git a/doc/querymodel.xml b/doc/querymodel.xml index d359d18..f5d6933 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,306 +1,291 @@ - Query Model - - + +
Query Model Overview - - +
Query Languages - + - Zebra is born as a networking Information Retrieval engine adhering - to the international standards - Z39.50 and - SRU, - and implement the - type-1 Reverse Polish Notation (RPN) query + &zebra; is born as a networking Information Retrieval engine adhering + to the international standards + &acro.z3950; and + &acro.sru;, + and implement the + type-1 Reverse Polish Notation (&acro.rpn;) query model defined there. Unfortunately, this model has only defined a binary encoded representation, which is used as transport packaging in - the Z39.50 protocol layer. This representation is not human - readable, nor defines any convenient way to specify queries. + the &acro.z3950; protocol layer. This representation is not human + readable, nor defines any convenient way to specify queries. - Since the type-1 (RPN) + Since the type-1 (&acro.rpn;) query structure has no direct, useful string - representation, every origin application needs to provide some + representation, every client application needs to provide some form of mapping from a local query notation or representation to it. - - + - - Prefix Query Format (PQF) - - Index Data has defined a textual representaion in the - Prefix Query Format, short - PQF, which mappes - one-to-one to binary encoded - type-1 RPN query packages. - It has been adopted by other - parties developing Z39.50 software, and is often referred to as - Prefix Query Notation, or in short - PQN. See - for further explanaitions and - descriptions of Zebra's capabilities. - - +
+ Prefix Query Format (&acro.pqf;) + + Index Data has defined a textual representation in the + Prefix Query Format, short + &acro.pqf;, which maps + one-to-one to binary encoded + type-1 &acro.rpn; queries. + &acro.pqf; has been adopted by other + parties developing &acro.z3950; software, and is often referred to as + Prefix Query Notation, or in short + &acro.pqn;. See + for further explanations and + descriptions of &zebra;'s capabilities. + +
- - Common Query Language (CQL) +
+ Common Query Language (&acro.cql;) - The query model of the type-1 RPN, - expressed in PQF/PQN is natively supported. - On the other hand, the default SRU - webservices Common Query Language - CQL is not natively supported. + The query model of the type-1 &acro.rpn;, + expressed in &acro.pqf;/&acro.pqn; is natively supported. + On the other hand, the default &acro.sru; + web services Common Query Language + &acro.cql; is not natively supported. - Zebra can be configured to understand and map CQL to PQF. See - . - - - - + &zebra; can be configured to understand and map &acro.cql; to &acro.pqf;. See + . + +
- +
+ +
Operation types - Zebra supports all of the three different - Z39.50/SRU operations defined in the - standards: explain, search, - and scan. A short description of the - functionality and purpose of each is quite in order here. + &zebra; supports all of the three different + &acro.z3950;/&acro.sru; operations defined in the + standards: explain, search, + and scan. A short description of the + functionality and purpose of each is quite in order here. - +
Explain Operation - The syntax of Z39.50/SRU queries is + The syntax of &acro.z3950;/&acro.sru; queries is well known to any client, but the specific semantics - taking into account a particular servers functionalities and abilities - must be - discovered from case to case. Enters the - explain operation, which provides the means - for learning which + discovered from case to case. Enters the + explain operation, which provides the means for learning which fields (also called - indexes or access points + indexes or access points) are provided, which default parameter the server uses, which retrieve document formats are defined, and which specific parts - of the general query model are supported. + of the general query model are supported. - The Z39.50 embeddes the explain operation - by perfoming a - search in the magic + The &acro.z3950; embeds the explain operation + by performing a + search in the magic IR-Explain-1 database; - see . + see . - In SRU, explain is an entirely seperate - operation, which returns an Zeerex - XML record according to the + In &acro.sru;, explain is an entirely separate + operation, which returns an ZeeRex &acro.xml; record according to the structure defined by the protocol. In both cases, the information gathered through - explain operations can be used to + explain operations can be used to auto-configure a client user interface to the servers - capabilities. + capabilities. - +
- + - +
Scan Operation - The scan operation is a helper functionality, - which operates on one index or access point a time. + The scan operation is a helper functionality, + which operates on one index or access point a time. It provides the means to investigate the content of specific indexes. - Scanning an index returns a handfull of terms actually fond in - the indexes, and in addition the scan - operation returns th enumber of documents indexed by each term. + Scanning an index returns a handful of terms actually found in + the indexes, and in addition the scan + operation returns the number of documents indexed by each term. A search client can use this information to propose proper - spelling of search terms, to auto-fill search boxes, or to + spelling of search terms, to auto-fill search boxes, or to display controlled vocabularies. - +
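The scan behaviour described above can be sketched with a toy in-memory index. This is only an illustration: the `scan` helper, the `title_index` dictionary and its document IDs are hypothetical, and the sketch does not reflect how the server actually stores its indexes.

```python
from bisect import bisect_left

def scan(index, start_term, count=3):
    """Return up to `count` (term, document count) pairs, starting at the
    first indexed term that sorts at or after `start_term`."""
    terms = sorted(index)                    # terms in lexicographic order
    first = bisect_left(terms, start_term)   # position of the start term
    return [(term, len(index[term])) for term in terms[first:first + count]]

# Hypothetical title index: term -> set of made-up document IDs.
title_index = {
    "bach": {1, 4},
    "debussy": {2},
    "mozart": {1, 3, 5},
    "ravel": {2, 6},
}

print(scan(title_index, "debussy"))
# [('debussy', 1), ('mozart', 3), ('ravel', 2)]
```

A client could use such term and count pairs to fill a suggestion box or to display a browsable term list.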
- +
- +
- - - Prefix Query Format structure and syntax +
+ &acro.rpn; queries and semantics - The PQF grammer - is documented in the YAZ manual, and shall not be - repeated here. This textual PQF representation - is always during search mapped to the equivalent Zebra internal - query parse tree. + The &acro.pqf; grammar + is documented in the &yaz; manual, and shall not be + repeated here. This textual &acro.pqf; representation + is not transmitted to &zebra; during search, but it is in the + client mapped to the equivalent &acro.z3950; binary + query parse tree. - - - PQF tree structure + +
+ &acro.rpn; tree structure - The PQF parse tree - or the equivalent textual representation - - may start with one specification of the + The &acro.rpn; parse tree - or the equivalent textual representation in &acro.pqf; - + may start with one specification of the attribute set used. Following is a query - tree, which - consists of atomic query parts (APT) or + tree, which + consists of atomic query parts (&acro.apt;) or named result sets, eventually - paired by boolean binary operators, and - finally recursively combined into - complex query trees. + paired by boolean binary operators, and + finally recursively combined into + complex query trees. - - + +
Attribute sets Attribute sets define the exact meaning and semantics of queries - issued. Zebra comes with some predefined attribute set + issued. &zebra; comes with some predefined attribute set definitions, others can easily be defined and added to the configuration. - - - - - +
Attribute sets predefined in Zebra
+ Attribute sets predefined in &zebra; + - - - - - - - - + + Attribute set + &acro.pqf; notation (Short hand) + Status + Notes + + + - - - - - - - - - - - - - - - - - - - + and semantics. + predefined + + + &acro.bib1; + bib-1 + Standard &acro.pqf; query language attribute set which defines the + semantics of &acro.z3950; searching. In addition, all of the + non-use attributes (types 2-14) define the hard-wired + &zebra; internal query + processing. + default + + + GILS + gils + Extension to the &acro.bib1; attribute set. + predefined + +
Attribute setShort handStatusNotes
Explainexp-1Special attribute set used on the special automagic + + Explain + exp-1 + Special attribute set used on the special automagic IR-Explain-1 database to gain information on server capabilities, database names, and database - and semantics.predefined
Bib1bib-1Standard PQF query language attribute set which defines the - semantics of Z39.50 searching. In addition, all of the - non-use attributes (type 2-9) define the hard-wired - Zebra internal query - processing.default
GILSgilsExtention to the Bib1 attribute set.predefined
- - - The use attributes (type 1) of the predefined attribute sets can - be reconfigured by tweaking the files - tab/*.att. - New attribute sets can be defined by adding similar files in the - configuration path of the server. - + + The use attributes (type 1) mappings the + predefined attribute sets are found in the + attribute set configuration files tab/*.att. + - - The Zebra internal query processing is modeled after - the Bib1 attribute set, and the non-use - attributes type 2-6 are hard-wired in. It is therefore essential - to be familiar with . - + + + The &zebra; internal query processing is modeled after + the &acro.bib1; attribute set, and the non-use + attributes type 2-6 are hard-wired in. It is therefore essential + to be familiar with . + + + +
- - +
Boolean operators - A pair of subquery trees, or of atomic queries, is combined + A pair of sub query trees, or of atomic queries, is combined using the standard boolean operators into new query trees. + Thus, boolean operators are always internal nodes in the query tree. - - - +
Boolean operators
+ Boolean operators + - - - - - - + + Keyword + Operator + Description + + - - - - - - - - - - - - - - - - + @and + binary AND operator + Set intersection of two atomic queries hit sets + + @or + binary OR operator + Set union of two atomic queries hit sets + + @not + binary AND NOT operator + Set complement of two atomic queries hit sets + + @prox + binary PROXIMITY operator + Set intersection of two atomic queries hit sets. In + addition, the intersection set is purged for all + documents which do not satisfy the requested query + term proximity. Usually a proper subset of the AND + operation. + +
KeywordOperatorDescription
@andbinary AND operatorSet intersection of two atomic queries hit sets
@orbinary OR operatorSet union of two atomic queries hit sets
@notbinary AND NOT operatorSet complement of two atomic queries hit sets
@proxbinary PROXIMY operatorSet intersection of two atomic queries hit sets. In - addition, the intersection set is purged for all - documents which do not satisfy the requested query - term proximity. Usually a proper subset of the AND - operation.
- + - For example, we can combine the terms - information and retrieval + For example, we can combine the terms + information and retrieval into different searches in the default index of the default attribute set as follows. Querying for the union of all documents containing the terms information OR - retrieval: + retrieval: Z> find @or information retrieval @@ -308,8 +293,8 @@ Querying for the intersection of all documents containing the terms information AND - retrieval: - The hit set is a subset of the coresponding + retrieval: + The hit set is a subset of the corresponding OR query. Z> find @and information retrieval @@ -319,114 +304,138 @@ Querying for the intersection of all documents containing the terms information AND retrieval, taking proximity into account: - The hit set is a subset of the coresponding - AND query. + The hit set is a subset of the corresponding + AND query + (see the &acro.pqf; grammar for + details on the proximity operator): Z> find @prox 0 3 0 2 k 2 information retrieval - See PQF grammer for details. Querying for the intersection of all documents containing the terms information AND retrieval, in the same order and near each - other as described in the term list - The hit set is a subset of the coresponding - PROXIMY query. + other as described in the term list. + The hit set is a subset of the corresponding + PROXIMITY query. Z> find "information retrieval" - - - - - Atomic queries (APT) +
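The set semantics of the first three boolean operators can be illustrated with a small sketch. It assumes that hit sets are plain sets of document IDs; the `combine` helper and the IDs are hypothetical and belong to no real API. The proximity operator is left out because it needs term positions, which this model does not carry.

```python
# Illustration only: hit sets modelled as Python sets of made-up document IDs.
def combine(operator, left, right):
    """Combine two hit sets the way the boolean operators are described."""
    if operator == "@and":
        return left & right   # set intersection
    if operator == "@or":
        return left | right   # set union
    if operator == "@not":
        return left - right   # AND NOT: in left, but not in right
    raise ValueError(f"unsupported operator: {operator}")

information = {1, 2, 3, 5}
retrieval = {2, 3, 8}

print(sorted(combine("@or", information, retrieval)))   # [1, 2, 3, 5, 8]
print(sorted(combine("@and", information, retrieval)))  # [2, 3]
print(sorted(combine("@not", information, retrieval)))  # [1, 5]
```

As in the example queries, the result of `@and` is always a subset of the corresponding `@or` result.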
+ + +
+ Atomic queries (&acro.apt;) - Atomic queries are the query parts which work on one acess point - only. These consist of an attribute list - followed by a single term or a - quoted term list, and are often called - Attributes-Plus-Terms (APT) queries. + Atomic queries are the query parts which work on one access point + only. These consist of an attribute list + followed by a single term or a + quoted term list, and are often called + Attributes-Plus-Terms (&acro.apt;) queries. - Unsupplied non-use attributes type 2-9 are either inherited from - higher nodes in the query tree, or are set to Zebra's default values. - See for details. + Atomic (&acro.apt;) queries are always leaf nodes in the &acro.pqf; query tree. + UN-supplied non-use attributes types 2-12 are either inherited from + higher nodes in the query tree, or are set to &zebra;'s default values. + See for details. - - - - + + Name + Type + Notes + + - - - - - - - - - - + inherited, are set to the default &zebra; configuration values. + + + + term + single term + or quoted term list + Here the search terms or list of search terms is added + to the query + +
Atomic queries
attribute listList of orthogonal attributesAny of the orthogonal attribute types may be omitted, + + attribute list + List of orthogonal attributes + Any of the orthogonal attribute types may be omitted, these are inherited from higher query tree nodes, or if not - inherited, are set to the default Zebra configuration values. -
termsingle term - or quoted term list Here the search terms or list of search terms is added - to the query
Querying for the term information in the - default index using the default attribite set, the server choice + default index using the default attribute set, the server choice of access point/index, and the default non-use attributes. - Z> find "information" + Z> find information Equivalent query fully specified including all default values: - Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information" + Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information - + - Finding all documents which have empty titles. Notice that the - empty term must be quoted, but is otherwise legal. + Finding all documents which have the term + debussy in the title field. - Z> find @attr 1=4 "" + Z> find @attr 1=4 debussy - - - - + + The scan operation is only supported with + atomic &acro.apt; queries, as it is bound to one access point at a + time. Boolean query trees are not allowed during + scan. + + + + For example, we might want to scan the title index, starting with + the term + debussy, and displaying this and the + following terms in lexicographic order: + + Z> scan @attr 1=4 debussy + + +
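To make the role of the default values concrete, here is a minimal sketch that expands an atomic query into its fully specified form. The default values are copied from the fully specified example query above; the helper names are hypothetical, and the quoting rule is a simplification that ignores quote characters inside the term.

```python
# Default attribute values, copied from the fully specified example query.
DEFAULT_ATTRIBUTES = {1: 1017, 2: 3, 3: 3, 4: 1, 5: 100, 6: 1}

def quote_term(term):
    """Quote empty terms and term lists containing whitespace (simplified)."""
    if term == "" or any(ch.isspace() for ch in term):
        return '"' + term + '"'
    return term

def fully_specified_apt(term, attributes=None):
    """Build a PQF string in which every attribute type 1-6 is spelled out."""
    merged = dict(DEFAULT_ATTRIBUTES)
    merged.update(attributes or {})          # explicit values win over defaults
    parts = [f"@attr {kind}={value}" for kind, value in sorted(merged.items())]
    return " ".join(["@attrset bib-1", *parts, quote_term(term)])

print(fully_specified_apt("information"))
print(fully_specified_apt("information retrieval", {1: 4}))
```

The first call reproduces the fully specified query shown above; the second overrides only the use attribute and quotes the term list.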
+ + +
Named Result Sets - Named result sets are supported in Zebra, and result sets can be - used as operands without limitations. + Named result sets are supported in &zebra;, and result sets can be + used as operands without limitations. It follows that named + result sets are leaf nodes in the &acro.pqf; query tree, exactly as + atomic &acro.apt; queries are. - + After the execution of a search, the result set is available at the server, such that the client can use it for subsequent searches or retrieval requests. The Z30.50 standard actually - stresses the fact that result sets are voliatile. It may cease + stresses the fact that result sets are volatile. It may cease to exist at any time point after search, and the server will send a diagnostic to the effect that the requested result set does not exist any more. - + Defining a named result set and re-using it in the next query, - using yaz-client. + using yaz-client. Notice that the client, not + the server, assigns the string '1' to the + named result set. Z> f @attr 1=4 mozart ... @@ -435,39 +444,35 @@ Z> f @and @set 1 @attr 1=4 amadeus ... Number of hits: 14, setno 2 - ... - Z> f @attr 1=1016 beethoven - ... - Number of hits: 26, setno 3 - ... - + - Named result sets are only supported by the Z39.50 protocol. - The SRU web service is stateless, and therefore the notion of - named result sets does not exist when acessing a Zebra server by - the SRU protocol. + + Named result sets are only supported by the &acro.z3950; protocol. + The &acro.sru; web service is stateless, and therefore the notion of + named result sets does not exist when accessing a &zebra; server by + the &acro.sru; protocol. + - +
- - - Zebra's special use attribute type 1 of form 'string' +
+ &zebra;'s special access point of type 'string' - The numeric use (type 1) attribute is usually - refered to from a given - attribute set. In addition, Zebra let you use + The numeric use (type 1) attribute is usually + referred to from a given + attribute set. In addition, &zebra; let you use any internal index - name defined in your configuration - as use atribute value. This is a great feature for + name defined in your configuration + as use attribute value. This is a great feature for debugging, and when you do - not need the complecity of defined use attribute values. It is - the preferred way of accessing Zebra indexes directly. + not need the complexity of defined use attribute values. It is + the preferred way of accessing &zebra; indexes directly. Finding all documents which have the term list "information - retrieval" in an Zebra index, using it's internal full string + retrieval" in an &zebra; index, using its internal full string name. Scanning the same index. Z> find @attr 1=sometext "information retrieval" @@ -476,7 +481,7 @@ Searching or scanning - the bib-1 use attribute 54 using it's string name: + the bib-1 use attribute 54 using its string name: Z> find @attr 1=Code-language eng Z> scan @attr 1=Code-language "" @@ -485,7 +490,7 @@ It is possible to search in any silly string index - if it's defined in your - indexation rules and can be parsed by the PQF parser. + indexing rules and can be parsed by the &acro.pqf; parser. This is definitely not the recommended use of this facility, as it might confuse your users with some very unexpected results. @@ -494,152 +499,155 @@ - See also for details, and - - for the SRU PQF query extention using string names as a fast + See also for details, and + + for the &acro.sru; &acro.pqf; query extension using string names as a fast debugging facility. - - - - Zebra's special use attribute type 1 of form 'XPath' - for GRS filters +
+ +
+ &zebra;'s special access point of type 'XPath' + for &acro.grs1; filters As we have seen above, it is possible (albeit seldom a great - idea) to emulate + idea) to emulate XPath 1.0 based - search by defining use (type 1) - string attributes which in appearence + search by defining use (type 1) + string attributes which in appearance resemble XPath queries. There are two problems with this approach: first, the XPath-look-alike has to - be defined at indexation time, no new undefined + be defined at indexing time, no new undefined XPath queries can entered at search time, and second, it might confuse users very much that an XPath-alike index name in fact - gets populated from a possible entirely different XML element - than it pretends to access. + gets populated from a possible entirely different &acro.xml; element + than it pretends to access. - When using the GRS Record Model - (see ), we have the - possibility to embed life + When using the &acro.grs1; Record Model + (see ), we have the + possibility to embed life XPath expressions - in the PQF queries, which are here called - use (type 1) xpath - attributes. You must enable the - xpath enable directive in your - .abs config files. + in the &acro.pqf; queries, which are here called + use (type 1) xpath + attributes. You must enable the + xpath enable directive in your + .abs configuration files. - Only a very restricted subset of the - XPath 1.0 - standard is supported as the GRS record model is simpler than - a full XML DOM structure. See the following examples for - possibilities. + + Only a very restricted subset of the + XPath 1.0 + standard is supported as the &acro.grs1; record model is simpler than + a full &acro.xml; &acro.dom; structure. See the following examples for + possibilities. + - Finding all documents which have the term "content" - inside a text node found in a specific XML DOM - subtree, whose starting element is - adressed by XPath. 
+ Finding all documents which have the term "content" + inside a text node found in a specific &acro.xml; &acro.dom; + subtree, whose starting element is + addressed by XPath. - Z> find @attr 1=/root content + Z> find @attr 1=/root content Z> find @attr 1=/root/first content Notice that the XPath must be absolute, i.e., must start with '/', and that the - XPath decendant-or-self axis followed by a + XPath descendant-or-self axis followed by a text node selection text() is implicitly appended to the stated XPath. It follows that the above searches are interpreted as: - Z> find @attr 1=/root//text() content + Z> find @attr 1=/root//text() content Z> find @attr 1=/root/first//text() content - + Searching inside attribute strings is possible: - Z> find @attr 1=/link/@creator morten + Z> find @attr 1=/link/@creator morten - + - - Filter the adressing XPath by a predicate working on exact + + Filter the addressing XPath by a predicate working on exact string values in - attributes (in the XML sense) can be done: return all those docs which - have the term "english" contained in one of all text subnodes of + attributes (in the &acro.xml; sense) can be done: return all those docs which + have the term "english" contained in one of all text sub nodes of the subtree defined by the XPath /record/title[@lang='en']. And similar predicate filtering. 
Z> find @attr 1=/record/title[@lang='en'] english Z> find @attr 1=/link[@creator='sisse'] sibelius - Z> find @attr 1=/link[@creator='sisse']/description[@xml:lang='da'] sibelius + Z> find @attr 1=/link[@creator='sisse']/description[@xml:lang='da'] sibelius - - - Combining numeric indexes, boolean expressions, + + + Combining numeric indexes, boolean expressions, and xpath based searches is possible: Z> find @attr 1=/record/title @and foo bar Z> find @and @attr 1=/record/title foo @attr 1=4 bar - + - Escaping PQF keywords and other non-parseable XPath constructs - with '{ }' to prevent syntax errors: + Escaping &acro.pqf; keywords and other non-parseable XPath constructs + with '{ }' to prevent client-side &acro.pqf; parsing + syntax errors: Z> find @attr {1=/root/first[@attr='danish']} content Z> find @attr {1=/record/@set} oai - It is worth mentioning that these dynamic performed XPath - queries are a performance bottelneck, as no optimized - specialized indexes can be used. Therefore, avoid the use of - this facility when speed is essential, and the database content - size is medium to large. + + It is worth mentioning that these dynamic performed XPath + queries are a performance bottleneck, as no optimized + specialized indexes can be used. Therefore, avoid the use of + this facility when speed is essential, and the database content + size is medium to large. + +
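The implicit expansion of element paths described in this section can be written down as a small sketch. The `interpreted_xpath` helper is hypothetical and only covers the element case shown in the examples; how attribute steps such as `/link/@creator` are handled internally is not stated here, so the sketch refuses them rather than guess.

```python
def interpreted_xpath(xpath):
    """Return the XPath as it is interpreted for an element path: the
    descendant-or-self axis plus a text node selection is appended."""
    if not xpath.startswith("/"):
        raise ValueError("the XPath must be absolute, i.e. start with '/'")
    last_step = xpath.rsplit("/", 1)[-1]
    if last_step.startswith("@"):
        # Assumption: attribute addressing is outside the scope of this sketch.
        raise ValueError("attribute steps are not covered by this sketch")
    return xpath + "//text()"

print(interpreted_xpath("/root"))        # /root//text()
print(interpreted_xpath("/root/first"))  # /root/first//text()
```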
+
- - -
- - +
Explain Attribute Set - The Z39.50 standard defines the - Explainattribute set - exp-1, which is used to discover information + The &acro.z3950; standard defines the + Explain attribute set + Exp-1, which is used to discover information about a server's search semantics and functional capabilities - Zebra exposes a "classic" + &zebra; exposes a "classic" Explain database by base name IR-Explain-1, which - is populated with system internal information. + is populated with system internal information. - - The attribute-set exp-1 consists of a single - Use (type 1) attribute. + + The attribute-set exp-1 consists of a single + use attribute (type 1). In addition, the non-Use - bib-1 attributes, that is, the types - Relation, Position, - Structure, Truncation, - and Completeness are imported from - the bib-1 attribute set, and may be used - within any explain query. + &acro.bib1; attributes, that is, the types + Relation, Position, + Structure, Truncation, + and Completeness are imported from + the &acro.bib1; attribute set, and may be used + within any explain query. - - - Use Attributes (type = 1) - - The following Explain search atributes are supported: - ExplainCategory (@attr 1=1), - DatabaseName (@attr 1=3), - DateAdded (@attr 1=9), + +
+ Use Attributes (type = 1) + + The following Explain search attributes are supported: + ExplainCategory (@attr 1=1), + DatabaseName (@attr 1=3), + DateAdded (@attr 1=9), DateChanged(@attr 1=10). @@ -649,26 +657,26 @@ DatabaseInfo, AttributeDetails. - See tab/explain.att and the - Z39.50 standard + See tab/explain.att and the + &acro.z3950; standard for more information. - - - +
+ +
Explain searches with yaz-client Classic Explain only defines retrieval of Explain information - via ASN.1. Pratically no Z39.50 clients supports this. Fortunately - they don't have to - Zebra allows retrieval of this information + via ASN.1. Practically no &acro.z3950; clients supports this. Fortunately + they don't have to - &zebra; allows retrieval of this information in other formats: - SUTRS, XML, - GRS-1 and ASN.1 Explain. + &acro.sutrs;, &acro.xml;, + &acro.grs1; and ASN.1 Explain. - + List supported categories to find out which explain commands are - supported: + supported: Z> base IR-Explain-1 Z> find @attr exp1 1=1 categorylist @@ -676,7 +684,7 @@ Z> show 1+2 - + Get target info, that is, investigate which databases exist at this server endpoint: @@ -691,7 +699,7 @@ Z> show 1+1 - + List all supported databases, the number of hits is the number of databases found, which most commonly are the @@ -705,7 +713,7 @@ Z> show 1+2 - + Get database info record for database Default. @@ -718,13 +726,13 @@ Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default - + Get attribute details record for database Default. - This query is very useful to study the internal Zebra indexes. + This query is very useful to study the internal &zebra; indexes. If records have been indexed using the alvis - XSLT filter, the string representation names of the known indexes can be + &acro.xslt; filter, the string representation names of the known indexes can be found. Z> base IR-Explain-1 @@ -736,451 +744,528 @@ Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default - - - - - - Bib1 Attribute Set +
+ +
+ +
+ &acro.bib1; Attribute Set Most of the information contained in this section is an excerpt of - the ATTRIBUTE SET BIB-1 (Z39.50-1995) - SEMANTICS, - found at . The BIB-1 - Attribute Set Semantics from 1995, also in an updated - Bib-1 - Attribute Set + the ATTRIBUTE SET &acro.bib1; (&acro.z3950;-1995) SEMANTICS + found at . The &acro.bib1; + Attribute Set Semantics from 1995, also in an updated + &acro.bib1; + Attribute Set version from 2003. Index Data is not the copyright holder of this information, except for the configuration details, the listing of - Zebra's capabilities, and the example queries. + &zebra;'s capabilities, and the example queries. - - - + + +
Use Attributes (type 1) - - A use attribute specifies an access point for any atomic query. - These acess points are highly dependent on the attribute set used - in the query, and are user configurable using the following - default configuration files: - tab/bib1.att, - tab/dan1.att, - tab/explain.att, and - tab/gils.att. - New attribute sets can be added by adding new - tab/*.att configuration files, which need to - be sourced in the main configuration zebra.cfg. + + A use attribute specifies an access point for any atomic query. + These access points are highly dependent on the attribute set used + in the query, and are user configurable using the following + default configuration files: + tab/bib1.att, + tab/dan1.att, + tab/explain.att, and + tab/gils.att. + + + For example, some few &acro.bib1; use + attributes from the tab/bib1.att are: + + att 1 Personal-name + att 2 Corporate-name + att 3 Conference-name + att 4 Title + ... + att 1009 Subject-name-personal + att 1010 Body-of-text + att 1011 Date/time-added-to-db + ... + att 1016 Any + att 1017 Server-choice + att 1018 Publisher + ... + att 1035 Anywhere + att 1036 Author-Title-Subject + + + + New attribute sets can be added by adding new + tab/*.att configuration files, which need to + be sourced in the main configuration zebra.cfg. + + + In addition, &zebra; allows the access of + internal index names and dynamic + XPath as use attributes; see + and + . - - In addition, Zebra allows the acess of - internal index names and dynamic - XPath as use attributes; see - and - . - + + Phrase search for information retrieval in + the title-register, scanning the same register afterwards: + + Z> find @attr 1=4 "information retrieval" + Z> scan @attr 1=4 information + + +
- - Phrase search for information retrieval in - the title-register, scanning the same register afterwards: - - Z> find @attr 1=4 "information retrieval" - Z> scan @attr 1=4 information - - -
+
-
+
+ &zebra; general Bib1 Non-Use Attributes (type 2-6) - - Zebra general Bib1 Non-Use Attributes (type 2-6) - - +
Relation Attributes (type 2) - + Relation attributes describe the relationship of the access - point (left side + point (left side of the relation) to the search term as qualified by the attributes (right side of the relation), e.g., Date-publication <= 1975. - - - + - - - - - - - +
Relation Attributes (type 2)
RelationValueNotes
+ Relation Attributes (type 2) + + + + Relation + Value + Notes + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + Less than + 1 + supported + + + Less than or equal + 2 + supported + + + Equal + 3 + default + + + Greater or equal + 4 + supported + + + Greater than + 5 + supported + + + Not equal + 6 + unsupported + + + Phonetic + 100 + unsupported + + + Stem + 101 + unsupported + + + Relevance + 102 + supported + + + AlwaysMatches + 103 + supported * + +
Less than1supported
Less than or equal2supported
Equal3default
Greater or equal4supported
Greater than5supported
Not equal6unsupported
Phonetic100unsupported
Stem101unsupported
Relevance102supported
AlwaysMatches103unsupported
+ + + AlwaysMatches searches are only supported if alwaysmatches indexing + has been enabled. See + + + + + The relation attributes 1-5 are supported and work exactly as + expected. + All ordering operations are based on a lexicographical ordering, + except when the + structure attribute numeric (109) is used. In + this case, ordering is numerical. See + . + + Z> find @attr 1=Title @attr 2=1 music + ... + Number of hits: 11745, setno 1 + ... + Z> find @attr 1=Title @attr 2=2 music + ... + Number of hits: 11771, setno 2 + ... + Z> find @attr 1=Title @attr 2=3 music + ... + Number of hits: 532, setno 3 + ... + Z> find @attr 1=Title @attr 2=4 music + ... + Number of hits: 11463, setno 4 + ... + Z> find @attr 1=Title @attr 2=5 music + ... + Number of hits: 11419, setno 5 + + - The relation attribute - relevance (102) is supported, see + The relation attribute + Relevance (102) is supported, see for full information. - - - - All ordering operations are based on a lexicographical ordering, - expect when the - structure attribute numeric (109) is used. In - this case, ordering is numerical. See - . - - Ranked search for information retrieval in - the title-register: - - Z> find @attr 1=4 @attr 2=102 "information retrieval" - - - + Ranked search for information retrieval in + the title-register: + + Z> find @attr 1=4 @attr 2=102 "information retrieval" + + + + + The relation attribute + AlwaysMatches (103) is in the default + configuration + supported in conjecture with structure attribute + Phrase (1) (which may be omitted by + default). + It can be configured to work with other structure attributes, + see the configuration file + tab/default.idx and + . + + + AlwaysMatches (103) is a + great way to discover how many documents have been indexed in a + given field. The search term is ignored, but needed for correct + &acro.pqf; syntax. An empty search term may be supplied. + + Z> find @attr 1=Title @attr 2=103 "" + Z> find @attr 1=Title @attr 2=103 @attr 4=1 "" + + + - +
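The difference between lexicographic and numeric ordering matters for the relation attributes 1-5. The following sketch uses plain Python string and integer comparison as a stand-in; the `less_than` helper and the term list are hypothetical, and the real ordering also depends on the configured index type.

```python
terms = ["9", "10", "100", "25"]

def less_than(terms, bound, numeric=False):
    """Return the terms that sort before `bound`, either as strings
    (lexicographic) or as integers (numeric)."""
    key = int if numeric else str
    return sorted((t for t in terms if key(t) < key(bound)), key=key)

print(less_than(terms, "25"))                # ['10', '100']
print(less_than(terms, "25", numeric=True))  # ['9', '10']
```

With lexicographic ordering "100" sorts before "25" and "9" after it, which is why numeric data usually calls for the numeric structure attribute.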
+ +
Position Attributes (type 3) - + The position attribute specifies the location of the search term within the field or subfield in which it appears. - - - - - - - - - +
Position Attributes (type 3)
Position | Value | Notes
+ Position Attributes (type 3) + + + + Position + Value + Notes + - - - - - - - - - - - - - - - + + First in field + 1 + supported * + + + First in subfield + 2 + supported * + + + Any position in field + 3 + default + +
First in field | 1 | unsupported
First in subfield | 2 | unsupported
Any position in field | 3 | default
- - - The position attribute values first in field (1), - and first in subfield(2) are unsupported. - Using them does not trigger an error, but silent defaults to - any position in field (3). - + + + + &zebra; only supports first-in-field searches if the + firstinfield is enabled for the index. + Refer to . + &zebra; does not distinguish between first in field and + first in subfield. They result in the same hit count. + Searching for first position in (sub)field is only supported in &zebra; + 2.0.2 and later. - - - + +
+ +
Structure Attributes (type 4) - + The structure attribute specifies the type of search term. This causes the search to be mapped on - different Zebra internal indexes, which must have been defined - at index time. + different &zebra; internal indexes, which must have been defined + at index time. - - The possible values of the + + The possible values of the structure attribute (type 4) can be defined - using the configuration file - tab/default.idx. - The default configuration is summerized in this table. + using the configuration file tab/default.idx. + The default configuration is summarized in this table. - - - - - - - - - +
Structure Attributes (type 4)
Structure | Value | Notes
+ Structure Attributes (type 4) + + + + Structure + Value + Notes + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + Phrase + 1 + default + + + Word + 2 + supported + + + Key + 3 + supported + + + Year + 4 + supported + + + Date (normalized) + 5 + supported + + + Word list + 6 + supported + + + Date (un-normalized) + 100 + unsupported + + + Name (normalized) + 101 + unsupported + + + Name (un-normalized) + 102 + unsupported + + + Structure + 103 + unsupported + + + Urx + 104 + supported + + + Free-form-text + 105 + supported + + + Document-text + 106 + supported + + + Local-number + 107 + supported + + + String + 108 + unsupported + + + Numeric string + 109 + supported + +
Phrase | 1 | default
Word | 2 | supported
Key | 3 | supported
Year | 4 | supported
Date (normalized) | 5 | supported
Word list | 6 | supported
Date (un-normalized) | 100 | unsupported
Name (normalized) | 101 | unsupported
Name (un-normalized) | 102 | unsupported
Structure | 103 | unsupported
Urx | 104 | supported
Free-form-text | 105 | supported
Document-text | 106 | supported
Local-number | 107 | supported
String | 108 | unsupported
Numeric string | 109 | supported
- + + The structure attribute values + Word list (6) + is supported, and maps to the boolean AND + combination of words supplied. The word list is useful when + Google-like bag-of-word queries need to be translated from a GUI + query language to &acro.pqf;. For example, the following queries + are equivalent: + + Z> find @attr 1=Title @attr 4=6 "mozart amadeus" + Z> find @attr 1=Title @and mozart amadeus + + - - The structure attribute values - Word list (6) - is supported, and maps to the boolean AND - combination of words supplied. The word list is useful when - google-like bag-of-word queries need to be translated from a GUI - query language to PQF. For example, the following queries - are equivalent: - - Z> find @attr 1=Title @attr 4=6 "mozart amadeus" - Z> find @attr 1=Title @and mozart amadeus - - + + The structure attribute value + Free-form-text (105) and + Document-text (106) + are supported, and map both to the boolean OR + combination of words supplied. The following queries + are equivalent: + + Z> find @attr 1=Body-of-text @attr 4=105 "bach salieri teleman" + Z> find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman" + Z> find @attr 1=Body-of-text @or bach @or salieri teleman + + This OR list of terms is very useful in + combination with relevance ranking: + + Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman" + + - - The structure attribute value - Free-form-text (105) and - Document-text (106) - are supported, and map both to the boolean OR - combination of words supplied. 
The following queries - are equivalent: - - Z> find @attr 1=Body-of-text @attr 4=105 "bach salieri teleman" - Z> find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman" - Z> find @attr 1=Body-of-text @or bach @or salieri teleman - - This OR list of terms is very usefull in - combination with relevance ranking: - - Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman" - - - - - The structure attribute value - Local number (107) - is supported, and maps always to the Zebra internal document ID, - irrespectively which use attribute is specified. The following queries - have exactly the same unique record in the hit set: - - Z> find @attr 4=107 10 - Z> find @attr 1=4 @attr 4=107 10 - Z> find @attr 1=1010 @attr 4=107 10 - - + + The structure attribute value + Local number (107) + is supported, and maps always to the &zebra; internal document ID, + irrespectively which use attribute is specified. The following queries + have exactly the same unique record in the hit set: + + Z> find @attr 4=107 10 + Z> find @attr 1=4 @attr 4=107 10 + Z> find @attr 1=1010 @attr 4=107 10 + + - - In - the GILS schema (gils.abs), the - west-bounding-coordinate is indexed as type n, - and is therefore searched by specifying - structure=Numeric String. - To match all those records with west-bounding-coordinate greater - than -114 we use the following query: - - Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114 - - + + In + the GILS schema (gils.abs), the + west-bounding-coordinate is indexed as type n, + and is therefore searched by specifying + structure=Numeric String. + To match all those records with west-bounding-coordinate greater + than -114 we use the following query: + + Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114 + + - The exact mapping between PQF queries and Zebra internal indexes - and index types is explained in - . - + + The exact mapping between &acro.pqf; queries and &zebra; internal indexes + and index types is explained in + . + + +
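The equivalences stated above for Word list (6), Free-form-text (105) and Document-text (106) can be sketched as a small query rewriter. This is only an illustration of the documented equivalence: the function name `expand_word_list` is hypothetical, and splitting the term on whitespace is an assumption, since real tokenization depends on the register configuration.

```python
def expand_word_list(use, operator, term):
    """Build the explicit boolean PQF query for a multi-word term.

    `operator` is "and" for Word list (4=6), or "or" for
    Free-form-text (4=105) and Document-text (4=106).
    """
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    words = term.split()  # assumption: whitespace-separated words
    if not words:
        raise ValueError("empty search term")
    # PQF is a prefix notation in which each boolean operator takes exactly
    # two operands, so n words need n-1 operators nested to the right.
    query = words[-1]
    for word in reversed(words[:-1]):
        query = f"@{operator} {word} {query}"
    return f"@attr 1={use} {query}"


print(expand_word_list("Title", "and", "mozart amadeus"))
print(expand_word_list("Body-of-text", "or", "bach salieri teleman"))
```

The two printed queries have the same shape as the explicit `@and` and `@or` forms shown in the examples above.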
-
- +
Truncation Attributes (type = 5) The truncation attribute specifies whether variations of one or - more characters are allowed between serch term and hit terms, or + more characters are allowed between search term and hit terms, or not. Using non-default truncation attributes will broaden the document hit set of a search query. - - - - - - - - - +
Truncation Attributes (type 5)
Truncation | Value | Notes
+ Truncation Attributes (type 5) + + + + Truncation + Value + Notes + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + Right truncation + 1 + supported + + + Left truncation + 2 + supported + + + Left and right truncation + 3 + supported + + + Do not truncate + 100 + default + + + Process # in search term + 101 + supported + + + RegExpr-1 + 102 + supported + + + RegExpr-2 + 103 + supported + +
Right truncation | 1 | supported
Left truncation | 2 | supported
Left and right truncation | 3 | supported
Do not truncate | 100 | default
Process # in search term | 101 | supported
RegExpr-1 | 102 | supported
RegExpr-2 | 103 | supported
@@ -1204,14 +1289,14 @@ ... Number of hits: 95, setno 8 - + - The truncation attribute value + The truncation attribute value Process # in search term (101) is a poor-man's regular expression search. It maps each # to .*, and - performes then a Regexp-1 (102) regular + performs then a Regexp-1 (102) regular expression search. The following two queries are equivalent: Z> find @attr 1=Body-of-text @attr 5=101 schnit#ke @@ -1220,10 +1305,10 @@ Number of hits: 89, setno 10 - + - The truncation attribute value - Regexp-1 (102) is a normal regular search, + The truncation attribute value + Regexp-1 (102) is a normal regular search, see for details. Z> find @attr 1=Body-of-text @attr 5=102 schnit+ke @@ -1232,13 +1317,13 @@ - The truncation attribute value - Regexp-2 (103) is a Zebra specific extention + The truncation attribute value + Regexp-2 (103) is a &zebra; specific extension which allows fuzzy matches. One single error in spelling of search terms is allowed, i.e., a document is hit if it includes a term which can be mapped to the used search term by one character substitution, addition, deletion or - change of posiiton. + change of position. Z> find @attr 1=Body-of-text @attr 5=100 schnittke ... @@ -1249,478 +1334,587 @@ Number of hits: 103, setno 15 ... - - - - - Completeness Attributes (type = 6) + +
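The mapping of Process # in search term (101) onto a regular expression can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the helper name `hash_to_regex` is hypothetical, all characters other than `#` are assumed to pass through unchanged, Python's `re` module stands in for the &zebra; regular expression engine, and the candidate terms are made up.

```python
import re


def hash_to_regex(term):
    """Map each '#' to '.*', as described for truncation value 101.

    Assumption: every other character is passed through unchanged.
    """
    return term.replace("#", ".*")


pattern = hash_to_regex("schnit#ke")

# Hypothetical register terms; re.fullmatch assumes the whole term must match.
candidates = ["schnittke", "schnitke", "schnitzel"]
matches = [word for word in candidates if re.fullmatch(pattern, word)]

print(pattern)
print(matches)
```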
+ +
+ Completeness Attributes (type = 6) The Completeness Attributes (type = 6) - is used to specify that a given search term or term list is either - part of the terms of a given index/field + is used to specify that a given search term or term list is either + part of the terms of a given index/field (Incomplete subfield (1)), or is what literally is found in the entire field's index (Complete field (3)). - + - - - - - - - - +
Completeness Attributes (type = 6)
Completeness | Value | Notes
+ Completeness Attributes (type = 6) + + + + Completeness + Value + Notes + - - - - - - - - - - - - - - - + + Incomplete subfield + 1 + default + + + Complete subfield + 2 + deprecated + + + Complete field + 3 + supported + +
Incomplete subfield | 1 | default
Complete subfield | 2 | depreciated
Complete field | 3 | supported
The Completeness Attributes (type = 6) is only partially and conditionally supported in the sense that it is ignored if the hit index is - not of structure type="w" or + not of structure type="w" or type="p". - + Incomplete subfield (1) is the default, and - makes Zebra use - register type="w", whereas + makes &zebra; use + register type="w", whereas Complete field (3) triggers search and scan in index type="p". - The Complete subfield (2) is a reminiscens - from the happy MARC - binary format days. Zebra does not support it, but maps silently + The Complete subfield (2) is a reminiscent + from the happy &acro.marc; + binary format days. &zebra; does not support it, but maps silently to Complete field (3). - The exact mapping between PQF queries and Zebra internal indexes - and index types is explained in - . - - - - - - - - - Advanced Zebra PQF Features + + The exact mapping between &acro.pqf; queries and &zebra; internal indexes + and index types is explained in + . + + +
+ +
+ +
+ +
+ Extended &zebra; &acro.rpn; Features - The Zebra internal query engine has been extended to specific needs + The &zebra; internal query engine has been extended to specific needs not covered by the bib-1 attribute set query - model. These extentions are non-standard - and non-portable: most functional extentions + model. These extensions are non-standard + and non-portable: most functional extensions are modeled over the bib-1 attribute set, - defining type 7-9 attributes. - There are also the speciel + defining type 7 and higher values. + There are also the special string type index names for the - idxpath attribute set. + idxpath attribute set. - - - - Zebra specific Search Extentions to all Attribute Sets - - Zebra extends the Bib1 attribute types, and these extentions are - recognized regardless of attribute - set used in a search operation query. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Zebra Search Attribute Extentions
Name | Value | Operation | Zebra version
Embedded Sort | 7 | search | 1.1
Term Set | 8 | search | 1.1
Rank Weight | 9 | search | 1.1
Approx Limit | 9 | search | 1.4
Term Reference | 10 | search | 1.4
- - Zebra Extention Embedded Sort Attribute (type 7) - +
+ &zebra; specific retrieval of all records - The embedded sort is a way to specify sort within a query - thus - removing the need to send a Sort Request separately. It is both - faster and does not require clients to deal with the Sort - Facility. + &zebra; defines a hardwired string index name + called _ALLRECORDS. It matches any record + contained in the database, if used in conjunction with + the relation attribute + AlwaysMatches (103). - The possible values after attribute type 7 are - 1 ascending and - 2 descending. - The attributes+term (APT) node is separate from the - rest and must be @or'ed. - The term associated with APT is the sorting level in integers, - where 0 means primary sort, - 1 means secondary sort, and so forth. - See also . - - - For example, searching for water, sort by title (ascending) + The _ALLRECORDS index name is used for total database + export. The search term is ignored, it may be empty. - Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 + Z> find @attr 1=_ALLRECORDS @attr 2=103 "" - Or, searching for water, sort by title ascending, then date descending + Combination with other index types can be made. For example, to + find all records which are not indexed in + the Title register, issue one of the two + equivalent queries: - Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1 - - - - - Zebra Extention Term Set Attribute (type 8) - - - The Term Set feature is a facility that allows a search to store - hitting terms in a "pseudo" resultset; thus a search (as usual) + - a scan-like facility. Requires a client that can do named result - sets since the search generates two result sets. The value for - attribute 8 is the name of a result set (string). The terms in - the named term set are returned as SUTRS records. 
- - - For example, searching for u in title, right truncated, and - storing the result in term set named 'aset' - - Z> find @attr 5=1 @attr 1=4 @attr 8=aset u + Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=Title @attr 2=103 "" + Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=4 @attr 2=103 "" - The model has one serious flaw: we don't know the size of term - set. Experimental. Do not use in production code. + + The special string index _ALLRECORDS is + experimental, and the provided functionality and syntax may very + well change in future releases of &zebra;. + +
- - Zebra Extention Rank Weight Attribute (type 9) - - - Rank weight is a way to pass a value to a ranking algorithm - so - that one APT has one value - while another as a different one. - See also . - + - - Zebra Extention Approximative Limit (type 9) - - - The Zebra Extention Approximative Limit (type - 9) is a way to enable approx - hit counts for scan hit counts, in the same - way as for search hit counts. - - - - Experimental and buggy. Definitely not to be used in production code. - + + &zebra; Scan Attribute Extensions + + + + Name + Type + Operation + &zebra; version + + + + + Result Set Narrow + 8 + scan + 1.3 + + + Approximative Limit + 12 + scan + 2.0.20 + + + +
+ +
+ &zebra; Extension Result Set Narrow (type 8) + + If attribute Result Set Narrow (type 8) + is given for scan, the value is the name of a + result set. Each hit count in scan is + @and'ed with the result set given. + + + Consider for example + the case of scanning all title fields around the + scanterm mozart, then refining the scan by + issuing a filtering query for amadeus to + restrict the scan to the result set of the query: + + Z> scan @attr 1=4 mozart + ... + * mozart (43) + mozartforskningen (1) + mozartiana (1) + mozarts (16) + ... + Z> f @attr 1=4 amadeus + ... + Number of hits: 15, setno 2 + ... + Z> scan @attr 1=4 @attr 8=2 mozart + ... + * mozart (14) + mozartforskningen (0) + mozartiana (0) + mozarts (1) + ... + + + + + &zebra; 2.0.2 and later is able to skip 0 hit counts. This, however, + is known not to scale if the number of terms to skip is high. + This most likely will happen if the result set is small (and + result in many 0 hits). + +
+
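The effect of Result Set Narrow on scan hit counts can be modelled with plain Python sets. This is a conceptual sketch only: the postings, record ids and the function name `narrowed_scan` are invented for illustration, and nothing here reflects how &zebra; stores its registers.

```python
# Hypothetical postings: scan term -> ids of records containing it in the title.
postings = {
    "mozart": {1, 2, 3, 4},
    "mozartiana": {5},
    "mozarts": {2, 6},
}

# Hypothetical result set of an earlier search, given by record id.
result_set = {2, 3, 6}


def narrowed_scan(postings, result_set, skip_zero=False):
    """Count, per scan term, only the records that are also in the result set."""
    counts = {term: len(ids & result_set) for term, ids in postings.items()}
    if skip_zero:
        # Mimics dropping terms whose narrowed hit count is 0.
        counts = {term: n for term, n in counts.items() if n > 0}
    return counts


print(narrowed_scan(postings, result_set))
print(narrowed_scan(postings, result_set, skip_zero=True))
```

With a small result set most terms end up with a count of 0, which is the situation the note above warns about.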
+ &zebra; Extension Approximative Limit (type 12) + + The &zebra; Extension Approximative Limit (type 12) is a way to + enable approximate hit counts for scan hit counts, in the same + way as for search hit counts. + +
+
- - - - - Zebra special IDXPATH Attribute Set for GRS indexing +
+ &zebra; special &acro.idxpath; Attribute Set for &acro.grs1; indexing - The attribute-set idxpath consists of a single - Use (type 1) attribute. All non-use attributes - behave as normal. + The attribute-set idxpath consists of a single + Use (type 1) attribute. All non-use attributes behave as normal. This feature is enabled when defining the - xpath enable option in the GRS filter - *.abs configuration files. If one wants to use + xpath enable option in the &acro.grs1; filter + *.abs configuration files. If one wants to use the special idxpath numeric attribute set, the - main Zebra configuraiton file zebra.cfg + main &zebra; configuration file zebra.cfg directive attset: idxpath.att must be enabled. - The idxpath is depreciated, may not be - supported in future Zebra versions, and should definitely - not be used in production code. + + + The idxpath is deprecated, may not be + supported in future &zebra; versions, and should definitely + not be used in production code. + - - IDXPATH Use Attributes (type = 1) +
+ &acro.idxpath; Use Attributes (type = 1) - This attribute set allows one to search GRS filter indexed - records by XPATH like structured index names. + This attribute set allows one to search &acro.grs1; filter indexed + records by &acro.xpath; like structured index names. - The idxpath option defines hard-coded - index names, which might clash with your own index names. + + + The idxpath option defines hard-coded + index names, which might clash with your own index names. + - - - - - - - - - - +
Zebra specific IDXPATH Use Attributes (type 1)
IDXPATH | Value | String Index | Notes
+ &zebra; specific &acro.idxpath; Use Attributes (type 1) + + + + &acro.idxpath; + Value + String Index + Notes + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + &acro.xpath; Begin + 1 + _XPATH_BEGIN + deprecated + + + &acro.xpath; End + 2 + _XPATH_END + deprecated + + + &acro.xpath; CData + 1016 + _XPATH_CDATA + deprecated + + + &acro.xpath; Attribute Name + 3 + _XPATH_ATTR_NAME + deprecated + + + &acro.xpath; Attribute CData + 1015 + _XPATH_ATTR_CDATA + deprecated + +
XPATH Begin | 1 | _XPATH_BEGIN | depreciated
XPATH End | 2 | _XPATH_END | depreciated
XPATH CData | 1016 | _XPATH_CDATA | depreciated
XPATH Attribute Name | 3 | _XPATH_ATTR_NAME | depreciated
XPATH Attribute CData | 1015 | _XPATH_ATTR_CDATA | depreciated
- See tab/idxpath.att for more information. - Search for all documents starting with root element + Search for all documents starting with root element /root (either using the numeric or the string use attributes): - Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/ - Z> find @attr idxpath 1=1 @attr 4=3 root/ - Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/ + Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/ + Z> find @attr idxpath 1=1 @attr 4=3 root/ + Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/ - Search for all documents where specific nested XPATH + Search for all documents where specific nested &acro.xpath; /c1/c2/../cn exists. Notice the very counter-intuitive reverse notation! - Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/ - Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/ + Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/ + Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/ @@ -1731,23 +1925,23 @@ - Search for CDATA string anothertext in any - attribute: - + Search for CDATA string anothertext in any + attribute: + Z> find @attrset idxpath @attr 1=1015 anothertext Z> find @attr 1=_XPATH_ATTR_CDATA anothertext - Search for all documents with have an XML element node - including an XML attribute named creator - - Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator - Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator + Search for all documents with have an &acro.xml; element node + including an &acro.xml; attribute named creator + + Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator + Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator - Combining usual bib-1 attribut set searches + Combining usual bib-1 attribute set searches with idxpath attribute set searches: Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart @@ -1755,9 +1949,9 @@ - Scanning is supportet on all idxpath + Scanning is supported on all idxpath indexes, both specified as numeric use attributes, or as string - index names. + index names. 
Z> scan @attrset idxpath @attr 1=1016 text Z> scan @attr 1=_XPATH_ATTR_CDATA anothertext @@ -1765,190 +1959,371 @@ - - +
+
- - Mapping from Bib1 Attributes to Zebra internal + <section id="querymodel-pqf-apt-mapping"> + <title>Mapping from &acro.pqf; atomic &acro.apt; queries to &zebra; internal register indexes - TO-DO - - - - - - - - Use attributes are interpreted according to the - attribute sets which have been loaded in the - zebra.cfg file, and are matched against specific - fields as specified in the .abs file which - describes the profile of the records which have been loaded. - If no Use attribute is provided, a default of Bib-1 Any is assumed. - - - - If a Structure attribute of - Phrase is used in conjunction with a - Completeness attribute of - Complete (Sub)field, the term is matched - against the contents of the phrase (long word) register, if one - exists for the given Use attribute. - A phrase register is created for those fields in the - .abs file that contains a - p-specifier. - + The rules for &acro.pqf; &acro.apt; mapping are rather tricky to grasp in the + first place. We deal first with the rules for deciding which + internal register or string index to use, according to the use + attribute or access point specified in the query. Thereafter we + deal with the rules for determining the correct structure type of + the named register. - - If Structure=Phrase is - used in conjunction with Incomplete Field - the - default value for Completeness, the - search is directed against the normal word registers, but if the term - contains multiple words, the term will only match if all of the words - are found immediately adjacent, and in the given order. - The word search is performed on those fields that are indexed as - type w in the .abs file. - +
+ Mapping of &acro.pqf; &acro.apt; access points + + &zebra; understands four fundamentally different types of access + points, of which only the + numeric use attribute type access points + are defined by the &acro.z3950; + standard. + All other access point types are &zebra; specific, and non-portable. + - - If the Structure attribute is - Word List, - Free-form Text, or - Document Text, the term is treated as a - natural-language, relevance-ranked query. - This search type uses the word register, i.e. those fields - that are indexed as type w in the - .abs file. - + + Access point name mapping + + + + Access Point + Type + Grammar + Notes + + + + + Use attribute + numeric + [1-9][0-9]* + directly mapped to string index name + + + String index name + string + [a-zA-Z](\-?[a-zA-Z0-9])* + normalized name is used as internal string index name + + + &zebra; internal index name + zebra + _[a-zA-Z](_?[a-zA-Z0-9])* + hardwired internal string index name + + + &acro.xpath; special index + XPath + /.* + special xpath search for &acro.grs1; indexed records + + + +
- - If the Structure attribute is - Numeric String the term is treated as an integer. - The search is performed on those fields that are indexed - as type n in the .abs file. - + + Attribute set names and + string index names are normalizes + according to the following rules: all single + hyphens '-' are stripped, and all upper case + letters are folded to lower case. + - - If the Structure attribute is - URx the term is treated as a URX (URL) entity. - The search is performed on those fields that are indexed as type - u in the .abs file. - + + Numeric use attributes are mapped + to the &zebra; internal + string index according to the attribute set definition in use. + The default attribute set is &acro.bib1;, and may be + omitted in the &acro.pqf; query. + - - If the Structure attribute is - Local Number the term is treated as - native Zebra Record Identifier. - + + According to normalization and numeric + use attribute mapping, it follows that the following + &acro.pqf; queries are considered equivalent (assuming the default + configuration has not been altered): + + Z> find @attr 1=Body-of-text serenade + Z> find @attr 1=bodyoftext serenade + Z> find @attr 1=BodyOfText serenade + Z> find @attr 1=bO-d-Y-of-tE-x-t serenade + Z> find @attr 1=1010 serenade + Z> find @attrset bib1 @attr 1=1010 serenade + Z> find @attrset bib1 @attr 1=1010 serenade + Z> find @attrset Bib1 @attr 1=1010 serenade + Z> find @attrset b-I-b-1 @attr 1=1010 serenade + + - - If the Relation attribute is - Equals (default), the term is matched - in a normal fashion (modulo truncation and processing of - individual words, if required). - If Relation is Less Than, - Less Than or Equal, - Greater than, or Greater than or - Equal, the term is assumed to be numerical, and a - standard regular expression is constructed to match the given - expression. - If Relation is Relevance, - the standard natural-language query processor is invoked. 
- + + The numerical + use attributes (type 1) + are interpreted according to the + attribute sets which have been loaded in the + zebra.cfg file, and are matched against specific + fields as specified in the .abs file which + describes the profile of the records which have been loaded. + If no use attribute is provided, a default of + &acro.bib1; Use Any (1016) is assumed. + The predefined use attribute sets + can be reconfigured by tweaking the configuration files + tab/*.att, and + new attribute sets can be defined by adding similar files in the + configuration path profilePath of the server. + + + + String indexes can be accessed directly, + independently which attribute set is in use. These are just + ignored. The above mentioned name normalization applies. + String index names are defined in the + used indexing filter configuration files, for example in the + &acro.grs1; + *.abs configuration files, or in the + alvis filter &acro.xslt; indexing stylesheets. + + + + &zebra; internal indexes can be accessed directly, + according to the same rules as the user defined + string indexes. The only difference is that + &zebra; internal index names are hardwired, + all uppercase and + must start with the character '_'. + + + + Finally, &acro.xpath; access points are only + available using the &acro.grs1; filter for indexing. + These access point names must start with the character + '/', they are not + normalized, but passed unaltered to the &zebra; internal + &acro.xpath; engine. See . + + + + +
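The name normalization and the four access point types described in this section can be summarized in a short Python sketch. The function names `normalize_name` and `classify_access_point` are hypothetical, and the sketch assumes that numeric use attributes are plain positive integers such as 1010; it classifies by leading character only and does not validate the full grammars.

```python
import re


def normalize_name(name):
    """Strip hyphens and fold upper case to lower case."""
    return name.replace("-", "").lower()


def classify_access_point(value):
    """Return 'numeric', 'xpath', 'zebra' or 'string' for an access point."""
    if re.fullmatch(r"[1-9][0-9]*", value):  # assumption: positive integer
        return "numeric"
    if value.startswith("/"):
        return "xpath"  # passed on without normalization
    if value.startswith("_"):
        return "zebra"  # hardwired internal index name
    return "string"


for name in ["Body-of-text", "bodyoftext", "BodyOfText", "bO-d-Y-of-tE-x-t"]:
    print(name, "->", normalize_name(name))

for value in ["1010", "Title", "_ALLRECORDS", "/root"]:
    print(value, "->", classify_access_point(value))
```

All four spellings of the string index name normalize to the same internal name, which is why the corresponding queries above are equivalent.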
- - For the Truncation attribute, - No Truncation is the default. - Left Truncation is not supported. - Process # in search term is supported, as is - Regxp-1. - Regxp-2 enables the fault-tolerant (fuzzy) - search. As a default, a single error (deletion, insertion, - replacement) is accepted when terms are matched against the register - contents. - -
- - Zebra Regular Expressions in Truncation Attribute (type = 5) - +
+ Mapping of &acro.pqf; &acro.apt; structure and completeness to + register type + + Internally &zebra; has in its default configuration several + different types of registers or indexes, whose tokenization and + character normalization rules differ. This reflects the fact that + searching fundamental different tokens like dates, numbers, + bitfields and string based text needs different rule sets. + + + + Structure and completeness mapping to register types + + + + Structure + Completeness + Register type + Notes + + + + + + phrase (@attr 4=1), word (@attr 4=2), + word-list (@attr 4=6), + free-form-text (@attr 4=105), or document-text (@attr 4=106) + + Incomplete field (@attr 6=1) + Word ('w') + Traditional tokenized and character normalized word index + + + + phrase (@attr 4=1), word (@attr 4=2), + word-list (@attr 4=6), + free-form-text (@attr 4=105), or document-text (@attr 4=106) + + complete field' (@attr 6=3) + Phrase ('p') + Character normalized, but not tokenized index for phrase + matches + + + + urx (@attr 4=104) + ignored + URX/URL ('u') + Special index for URL web addresses + + + numeric (@attr 4=109) + ignored + Numeric ('n') + Special index for digital numbers + + + key (@attr 4=3) + ignored + Null bitmap ('0') + Used for non-tokenized and non-normalized bit sequences + + + year (@attr 4=4) + ignored + Year ('y') + Non-tokenized and non-normalized 4 digit numbers + + + date (@attr 4=5) + ignored + Date ('d') + Non-tokenized and non-normalized ISO date strings + + + ignored + ignored + Sort ('s') + Used with special sort attribute set (@attr 7=1, @attr 7=2) + + + overruled + overruled + special + Internal record ID register, used whenever + Relation Always Matches (@attr 2=103) is specified + + + +
+ + + + + If a Structure attribute of + Phrase is used in conjunction with a + Completeness attribute of + Complete (Sub)field, the term is matched + against the contents of the phrase (long word) register, if one + exists for the given Use attribute. + A phrase register is created for those fields in the + &acro.grs1; *.abs file that contains a + p-specifier. + + Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven + ... + bayreuther festspiele (1) + * beethoven bibliography database (1) + benny carter (1) + ... + Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography" + ... + Number of hits: 0, setno 5 + ... + Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database" + ... + Number of hits: 1, setno 6 + + + + + If Structure=Phrase is + used in conjunction with Incomplete Field - the + default value for Completeness, the + search is directed against the normal word registers, but if the term + contains multiple words, the term will only match if all of the words + are found immediately adjacent, and in the given order. + The word search is performed on those fields that are indexed as + type w in the &acro.grs1; *.abs file. + + Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven + ... + beefheart (1) + * beethoven (18) + beethovens (7) + ... + Z> find @attr 1=Title @attr 4=1 @attr 6=1 beethoven + ... + Number of hits: 18, setno 1 + ... + Z> find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven bibliography" + ... + Number of hits: 2, setno 2 + ... + + + + + If the Structure attribute is + Word List, + Free-form Text, or + Document Text, the term is treated as a + natural-language, relevance-ranked query. + This search type uses the word register, i.e. those fields + that are indexed as type w in the + &acro.grs1; *.abs file. + + + + If the Structure attribute is + Numeric String the term is treated as an integer. + The search is performed on those fields that are indexed + as type n in the &acro.grs1; + *.abs file. 
+ + + + If the Structure attribute is + URX the term is treated as a URX (URL) entity. + The search is performed on those fields that are indexed as type + u in the *.abs file. + + + + If the Structure attribute is + Local Number the term is treated as + native &zebra; Record Identifier. + + + + If the Relation attribute is + Equals (default), the term is matched + in a normal fashion (modulo truncation and processing of + individual words, if required). + If Relation is Less Than, + Less Than or Equal, + Greater than, or Greater than or + Equal, the term is assumed to be numerical, and a + standard regular expression is constructed to match the given + expression. + If Relation is Relevance, + the standard natural-language query processor is invoked. + + + + For the Truncation attribute, + No Truncation is the default. + Left Truncation is not supported. + Process # in search term is supported, as is + Regxp-1. + Regxp-2 enables the fault-tolerant (fuzzy) + search. As a default, a single error (deletion, insertion, + replacement) is accepted when terms are matched against the register + contents. + + +
+ + +
+ &zebra; Regular Expressions in Truncation Attribute (type = 5) + Each term in a query is interpreted as a regular expression if the truncation value is either Regxp-1 (@attr 5=102) @@ -1956,84 +2331,77 @@ Both query types follow the same syntax with the operands: - - - - - - - - - - - - - - - - - - -
Regular Expression Operands
xMatches the character x.
.Matches any character.
[ .. ]Matches the set of characters specified; - such as [abc] or [a-c].
+ + Regular Expression Operands + + + + x + Matches the character x. + + + . + Matches any character. + + + [ .. ] + Matches the set of characters specified; + such as [abc] or [a-c]. + + + +
The above operands can be combined with the following operators: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Regular Expression Operators
x*Matches x zero or more times. - Priority: high.
x+Matches x one or more times. - Priority: high.
x? Matches x zero or once. - Priority: high.
xy Matches x, then y. - Priority: medium.
x|y Matches either x or y. - Priority: low.
( )The order of evaluation may be changed by using parentheses.
+ + Regular Expression Operators + + + + x* + Matches x zero or more times. + Priority: high. + + + x+ + Matches x one or more times. + Priority: high. + + + x? + Matches x zero times or once. + Priority: high. + + + xy + Matches x, then y. + Priority: medium. + + + x|y + Matches either x or y. + Priority: low. + + + ( ) + The order of evaluation may be changed by using parentheses. + + + +
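The operand and operator tables can be tried out with any regular expression library that shares these basic constructs. The sketch below uses Python's re module purely as a stand-in; it is not the engine &zebra; uses, and the helper name matches as well as the assumption that a pattern must cover the whole term are illustrative only.

```python
import re


def matches(pattern, term):
    """Return True if the whole term matches the pattern.

    Illustration only: Python's re module stands in for the operators
    listed in the tables above; it is not the engine Zebra uses.
    """
    return re.fullmatch(pattern, term) is not None


# High priority: *, + and ? bind to the preceding operand only.
print(matches("ab*", "a"), matches("ab*", "abbb"))    # b zero or more times
print(matches("ab+", "a"), matches("ab+", "ab"))      # b one or more times
print(matches("ab?", "ab"), matches("ab?", "abb"))    # b zero times or once

# Low priority: x|y splits the whole expression, so a|bc means a or bc.
print(matches("a|bc", "bc"), matches("a|bc", "ac"))

# Parentheses change the order of evaluation: (a|b)c means ac or bc.
print(matches("(a|b)c", "ac"))

# Operands: a character set followed by any character.
print(matches("[a-c].", "bz"))
```

Comparing a|bc with (a|b)c shows why the priority column matters: concatenation binds more tightly than alternation unless parentheses say otherwise.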
If the first character of the Regxp-2 query is a plus character (+) it marks the beginning of a section with non-standard specifiers. The next plus character marks the end of the section. - Currently Zebra only supports one specifier, the error tolerance, - which consists one digit. + Currently &zebra; only supports one specifier, the error tolerance, + which consists of one digit. + @@ -2057,95 +2425,90 @@ Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval" - +
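The specifier section of a Regxp-2 term can be separated from the pattern with a few lines of code. This is a minimal sketch of the rule as worded above, not &zebra;'s own parser: the function name split_error_tolerance is hypothetical, and the default of 1 is taken from the statement that a single error is accepted by default.

```python
def split_error_tolerance(query, default=1):
    """Split a Regxp-2 term into (error_tolerance, pattern).

    Hypothetical helper for illustration; it follows the rule in the text:
    a leading '+' opens the specifier section, the next '+' closes it, and
    the only specifier is a single digit giving the error tolerance.
    """
    if not query.startswith("+"):
        # No specifier section: assume the documented default of one error.
        return default, query
    end = query.find("+", 1)
    if end == -1:
        raise ValueError("specifier section is not closed by a second '+'")
    spec = query[1:end]
    if len(spec) != 1 or spec not in "0123456789":
        raise ValueError("error tolerance must be exactly one digit")
    return int(spec), query[end + 1:]


print(split_error_tolerance("+2+informat.*"))
print(split_error_tolerance("informat.*"))
```

Only the first two plus characters are treated as delimiters, so a + operator that appears later in the pattern itself is left untouched.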
+ - -
+ - - Server Side CQL to PQF Query Translation +
+ Server Side &acro.cql; to &acro.pqf; Query Translation Using the <cql2rpn>l2rpn.txt</cql2rpn> - YAZ Frontend Virtual + &yaz; Frontend Virtual Hosts option, one can configure - the YAZ Frontend CQL-to-PQF - converter, specifying the interpretation of various - CQL + the &yaz; Frontend &acro.cql;-to-&acro.pqf; + converter, specifying the interpretation of various + &acro.cql; indexes, relations, etc. in terms of Type-1 query attributes. - + - For example, using server-side CQL-to-PQF conversion, one might + For example, using server-side &acro.cql;-to-&acro.pqf; conversion, one might query a zebra server like this: - querytype cql Z> find text=(plant and soil) ]]> - and - if properly configured - even static relevance ranking can - be performed using CQL query syntax: + and - if properly configured - even static relevance ranking can + be performed using &acro.cql; query syntax: - find text = /relevant (plant and soil) ]]> - + - By the way, the same configuration can be used to - search using client-side CQL-to-PQF conversion: - (the only difference is querytype cql2rpn - instead of + By the way, the same configuration can be used to + search using client-side &acro.cql;-to-&acro.pqf; conversion: + (the only difference is querytype cql2rpn + instead of querytype cql, and the call specifying a local conversion file) - querytype cql2rpn Z> find text=(plant and soil) ]]> - + Exhaustive information can be found in the - Section "Specification of CQL to RPN mappings" in the YAZ manual. - - http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map, - and shall therefore not be repeated here. - - - - - +
-
+