X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=d7a7f8489ae6cbc385672497dc1fbcea1ecf9aeb;hp=cdb344d90153a72d25f4d328e35f5b594def22ab;hb=ba0720e26f508ba3396e232d2f82037c0e701698;hpb=5ec1f005c341f170fb66ddf9189fc624a10fc79d
diff --git a/doc/querymodel.xml b/doc/querymodel.xml
index cdb344d..d7a7f84 100644
--- a/doc/querymodel.xml
+++ b/doc/querymodel.xml
@@ -1,5 +1,4 @@ - Query Model
@@ -9,20 +8,20 @@ Query Languages - Zebra is born as a networking Information Retrieval engine adhering + &zebra; is born as a networking Information Retrieval engine adhering to the international standards - Z39.50 and - SRU, + &acro.z3950; and + &acro.sru;, and implement the - type-1 Reverse Polish Notation (RPN) query + type-1 Reverse Polish Notation (&acro.rpn;) query model defined there. Unfortunately, this model has only defined a binary encoded representation, which is used as transport packaging in - the Z39.50 protocol layer. This representation is not human + the &acro.z3950; protocol layer. This representation is not human readable, nor defines any convenient way to specify queries. - Since the type-1 (RPN) + Since the type-1 (&acro.rpn;) query structure has no direct, useful string representation, every client application needs to provide some form of mapping from a local query notation or representation to it. @@ -30,33 +29,33 @@
- Prefix Query Format (PQF) + Prefix Query Format (&acro.pqf;) Index Data has defined a textual representation in the Prefix Query Format, short - PQF, which maps + &acro.pqf;, which maps one-to-one to binary encoded - type-1 RPN queries. - PQF has been adopted by other - parties developing Z39.50 software, and is often referred to as + type-1 &acro.rpn; queries. + &acro.pqf; has been adopted by other + parties developing &acro.z3950; software, and is often referred to as Prefix Query Notation, or in short - PQN. See + &acro.pqn;. See for further explanations and - descriptions of Zebra's capabilities. + descriptions of &zebra;'s capabilities.
- Common Query Language (CQL) + Common Query Language (&acro.cql;) - The query model of the type-1 RPN, - expressed in PQF/PQN is natively supported. - On the other hand, the default SRU + The query model of the type-1 &acro.rpn;, + expressed in &acro.pqf;/&acro.pqn; is natively supported. + On the other hand, the default &acro.sru; web services Common Query Language - CQL is not natively supported. + &acro.cql; is not natively supported. - Zebra can be configured to understand and map CQL to PQF. See + &zebra; can be configured to understand and map &acro.cql; to &acro.pqf;. See .
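As a rough illustration of what mapping CQL to PQF involves, the following Python sketch translates a single CQL clause of the form index = term into a PQF string. The table CQL_INDEX_TO_PQF and the functions quote_term and cql_clause_to_pqf are hypothetical names used only for this example; this is not Zebra's configuration format, and a real mapping would also have to cover relations, structure, and boolean operators.

```python
# Hypothetical lookup table: CQL index name -> PQF attribute list.
# The PQF side uses string index names as use attributes (type 1);
# which CQL index maps to which Zebra index is an assumption here.
CQL_INDEX_TO_PQF = {
    "dc.title": "@attr 1=Title",
    "dc.language": "@attr 1=Code-language",
}


def quote_term(term: str) -> str:
    """Wrap a term in double quotes if it is empty or contains whitespace."""
    if '"' in term:
        # Embedded quotes are out of scope for this toy example.
        raise ValueError("terms containing double quotes are not handled")
    if term == "" or any(ch.isspace() for ch in term):
        return '"' + term + '"'
    return term


def cql_clause_to_pqf(clause: str) -> str:
    """Map one 'index = term' CQL clause to a PQF string (toy version)."""
    index, sep, term = clause.partition("=")
    if not sep:
        raise ValueError("expected a clause of the form 'index = term'")
    index = index.strip()
    # Drop surrounding whitespace and CQL double quotes from the term.
    term = term.strip().strip('"')
    if index not in CQL_INDEX_TO_PQF:
        raise KeyError(f"no PQF mapping configured for CQL index {index!r}")
    return f"{CQL_INDEX_TO_PQF[index]} {quote_term(term)}"
```

The resulting string can be used after find at a yaz-client prompt; unknown CQL indexes raise an error instead of silently falling back to a default index.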
@@ -66,8 +65,8 @@
Operation types - Zebra supports all of the three different - Z39.50/SRU operations defined in the + &zebra; supports all of the three different + &acro.z3950;/&acro.sru; operations defined in the standards: explain, search, and scan. A short description of the functionality and purpose of each is quite in order here. @@ -76,7 +75,7 @@
Explain Operation - The syntax of Z39.50/SRU queries is + The syntax of &acro.z3950;/&acro.sru; queries is well known to any client, but the specific semantics - taking into account a particular servers functionalities and abilities - must be @@ -89,15 +88,15 @@ of the general query model are supported. - The Z39.50 embeds the explain operation + The &acro.z3950; embeds the explain operation by performing a search in the magic IR-Explain-1 database; see . - In SRU, explain is an entirely separate - operation, which returns an ZeeRex XML record according to the + In &acro.sru;, explain is an entirely separate + operation, which returns an ZeeRex &acro.xml; record according to the structure defined by the protocol. @@ -117,7 +116,7 @@ simple free text searches to nested complex boolean queries, targeting specific indexes, and possibly enhanced with many query semantic specifications. Search interactions are the heart - and soul of Z39.50/SRU servers. + and soul of &acro.z3950;/&acro.sru; servers.
@@ -143,26 +142,25 @@
-
- RPN queries and semantics + &acro.rpn; queries and semantics - The PQF grammar - is documented in the YAZ manual, and shall not be - repeated here. This textual PQF representation - is not transmistted to Zebra during search, but it is in the - client mapped to the equivalent Z39.50 binary + The &acro.pqf; grammar + is documented in the &yaz; manual, and shall not be + repeated here. This textual &acro.pqf; representation + is not transmitted to &zebra; during search, but it is in the + client mapped to the equivalent &acro.z3950; binary query parse tree.
- RPN tree structure + &acro.rpn; tree structure - The RPN parse tree - or the equivalent textual representation in PQF - + The &acro.rpn; parse tree - or the equivalent textual representation in &acro.pqf; - may start with one specification of the attribute set used. Following is a query tree, which - consists of atomic query parts (APT) or + consists of atomic query parts (&acro.apt;) or named result sets, eventually paired by boolean binary operators, and finally recursively combined into @@ -173,18 +171,18 @@ Attribute sets Attribute sets define the exact meaning and semantics of queries - issued. Zebra comes with some predefined attribute set + issued. &zebra; comes with some predefined attribute set definitions, others can easily be defined and added to the configuration. - Attribute sets predefined in Zebra + Attribute sets predefined in &zebra; Attribute set - PQF notation (Short hand) + &acro.pqf; notation (Short hand) Status Notes @@ -201,30 +199,21 @@ predefined - Bib-1 + &acro.bib1; bib-1 - Standard PQF query language attribute set which defines the - semantics of Z39.50 searching. In addition, all of the - non-use attributes (types 2-12) define the hard-wired - Zebra internal query + Standard &acro.pqf; query language attribute set which defines the + semantics of &acro.z3950; searching. In addition, all of the + non-use attributes (types 2-14) define the hard-wired + &zebra; internal query processing. default GILS gils - Extension to the Bib-1 attribute set. + Extension to the &acro.bib1; attribute set. predefined -
@@ -237,8 +226,8 @@ - The Zebra internal query processing is modeled after - the Bib-1 attribute set, and the non-use + The &zebra; internal query processing is modeled after + the &acro.bib1; attribute set, and the non-use attributes type 2-6 are hard-wired in. It is therefore essential to be familiar with . @@ -317,7 +306,7 @@ retrieval, taking proximity into account: The hit set is a subset of the corresponding AND query - (see the PQF grammar for + (see the &acro.pqf; grammar for details on the proximity operator): Z> find @prox 0 3 0 2 k 2 information retrieval @@ -338,23 +327,23 @@
- Atomic queries (APT) + Atomic queries (&acro.apt;) Atomic queries are the query parts which work on one access point only. These consist of an attribute list followed by a single term or a quoted term list, and are often called - Attributes-Plus-Terms (APT) queries. + Attributes-Plus-Terms (&acro.apt;) queries. - Atomic (APT) queries are always leaf nodes in the PQF query tree. + Atomic (&acro.apt;) queries are always leaf nodes in the &acro.pqf; query tree. UN-supplied non-use attributes types 2-12 are either inherited from - higher nodes in the query tree, or are set to Zebra's default values. + higher nodes in the query tree, or are set to &zebra;'s default values. See for details. - Atomic queries (APT) + Atomic queries (&acro.apt;) @@ -369,7 +358,7 @@ List of orthogonal attributes Any of the orthogonal attribute types may be omitted, these are inherited from higher query tree nodes, or if not - inherited, are set to the default Zebra configuration values. + inherited, are set to the default &zebra; configuration values. @@ -407,7 +396,7 @@ The scan operation is only supported with - atomic APT queries, as it is bound to one access point at a + atomic &acro.apt; queries, as it is bound to one access point at a time. Boolean query trees are not allowed during scan. @@ -427,10 +416,10 @@
Named Result Sets - Named result sets are supported in Zebra, and result sets can be + Named result sets are supported in &zebra;, and result sets can be used as operands without limitations. It follows that named - result sets are leaf nodes in the PQF query tree, exactly as - atomic APT queries are. + result sets are leaf nodes in the &acro.pqf; query tree, exactly as + atomic &acro.apt; queries are. After the execution of a search, the result set is available at @@ -460,30 +449,30 @@ - Named result sets are only supported by the Z39.50 protocol. - The SRU web service is stateless, and therefore the notion of - named result sets does not exist when accessing a Zebra server by - the SRU protocol. + Named result sets are only supported by the &acro.z3950; protocol. + The &acro.sru; web service is stateless, and therefore the notion of + named result sets does not exist when accessing a &zebra; server by + the &acro.sru; protocol.
- Zebra's special access point of type 'string' + &zebra;'s special access point of type 'string' The numeric use (type 1) attribute is usually referred to from a given - attribute set. In addition, Zebra let you use + attribute set. In addition, &zebra; let you use any internal index name defined in your configuration as use attribute value. This is a great feature for debugging, and when you do not need the complexity of defined use attribute values. It is - the preferred way of accessing Zebra indexes directly. + the preferred way of accessing &zebra; indexes directly. Finding all documents which have the term list "information - retrieval" in an Zebra index, using it's internal full string + retrieval" in an &zebra; index, using its internal full string name. Scanning the same index. Z> find @attr 1=sometext "information retrieval" @@ -492,7 +481,7 @@ Searching or scanning - the bib-1 use attribute 54 using it's string name: + the bib-1 use attribute 54 using its string name: Z> find @attr 1=Code-language eng Z> scan @attr 1=Code-language "" @@ -501,7 +490,7 @@ It is possible to search in any silly string index - if it's defined in your - indexation rules and can be parsed by the PQF parser. + indexing rules and can be parsed by the &acro.pqf; parser. This is definitely not the recommended use of this facility, as it might confuse your users with some very unexpected results. @@ -512,14 +501,14 @@ See also for details, and - for the SRU PQF query extension using string names as a fast + for the &acro.sru; &acro.pqf; query extension using string names as a fast debugging facility.
- Zebra's special access point of type 'XPath' - for GRS filters + &zebra;'s special access point of type 'XPath' + for &acro.grs1; filters As we have seen above, it is possible (albeit seldom a great idea) to emulate @@ -528,18 +517,18 @@ string attributes which in appearance resemble XPath queries. There are two problems with this approach: first, the XPath-look-alike has to - be defined at indexation time, no new undefined + be defined at indexing time, no new undefined XPath queries can entered at search time, and second, it might confuse users very much that an XPath-alike index name in fact - gets populated from a possible entirely different XML element + gets populated from a possible entirely different &acro.xml; element than it pretends to access. - When using the GRS Record Model + When using the &acro.grs1; Record Model (see ), we have the possibility to embed life XPath expressions - in the PQF queries, which are here called + in the &acro.pqf; queries, which are here called use (type 1) xpath attributes. You must enable the xpath enable directive in your @@ -549,14 +538,14 @@ Only a very restricted subset of the XPath 1.0 - standard is supported as the GRS record model is simpler than - a full XML DOM structure. See the following examples for + standard is supported as the &acro.grs1; record model is simpler than + a full &acro.xml; &acro.dom; structure. See the following examples for possibilities. Finding all documents which have the term "content" - inside a text node found in a specific XML DOM + inside a text node found in a specific &acro.xml; &acro.dom; subtree, whose starting element is addressed by XPath. 
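A minimal Python sketch of how a client might assemble such an XPath use attribute query as a PQF string. The helper name xpath_apt is hypothetical. It always wraps the attribute in '{ }', the quoting this document uses to keep XPath constructs away from client-side PQF parsing, on the assumption that the brace form is accepted for any XPath and not only for those the parser would otherwise reject.

```python
def xpath_apt(xpath: str, term: str) -> str:
    """Build a PQF atomic query whose use attribute (type 1) is an XPath.

    The attribute is always brace-quoted; this is an assumption made for
    simplicity, not a documented requirement.
    """
    if "{" in xpath or "}" in xpath:
        # Braces inside the XPath would clash with the quoting itself.
        raise ValueError("XPath must not contain '{' or '}'")
    if '"' in term:
        raise ValueError("terms containing double quotes are not handled")
    # Quote the term only when needed (empty or containing whitespace).
    if term == "" or any(ch.isspace() for ch in term):
        term = '"' + term + '"'
    return "@attr {1=" + xpath + "} " + term
```

For example, xpath_apt("/record/title[@lang='en']", "english") yields a string that can follow find at a yaz-client prompt, provided the xpath enable directive is active for the GRS-1 filter in use.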
@@ -586,7 +575,7 @@ Filter the addressing XPath by a predicate working on exact string values in - attributes (in the XML sense) can be done: return all those docs which + attributes (in the &acro.xml; sense) can be done: return all those docs which have the term "english" contained in one of all text sub nodes of the subtree defined by the XPath /record/title[@lang='en']. And similar @@ -607,8 +596,8 @@ - Escaping PQF keywords and other non-parseable XPath constructs - with '{ }' to prevent client-side PQF parsing + Escaping &acro.pqf; keywords and other non-parseable XPath constructs + with '{ }' to prevent client-side &acro.pqf; parsing syntax errors: Z> find @attr {1=/root/first[@attr='danish']} content @@ -630,11 +619,11 @@
Explain Attribute Set - The Z39.50 standard defines the + The &acro.z3950; standard defines the Explain attribute set Exp-1, which is used to discover information about a server's search semantics and functional capabilities - Zebra exposes a "classic" + &zebra; exposes a "classic" Explain database by base name IR-Explain-1, which is populated with system internal information. @@ -644,11 +633,11 @@ In addition, the non-Use - Bib-1 attributes, that is, the types + &acro.bib1; attributes, that is, the types Relation, Position, Structure, Truncation, and Completeness are imported from - the Bib-1 attribute set, and may be used + the &acro.bib1; attribute set, and may be used within any explain query. @@ -669,7 +658,7 @@ See tab/explain.att and the - Z39.50 standard + &acro.z3950; standard for more information.
@@ -678,11 +667,11 @@ Explain searches with yaz-client Classic Explain only defines retrieval of Explain information - via ASN.1. Practically no Z39.50 clients supports this. Fortunately - they don't have to - Zebra allows retrieval of this information + via ASN.1. Practically no &acro.z3950; clients supports this. Fortunately + they don't have to - &zebra; allows retrieval of this information in other formats: - SUTRS, XML, - GRS-1 and ASN.1 Explain. + &acro.sutrs;, &acro.xml;, + &acro.grs1; and ASN.1 Explain. @@ -741,9 +730,9 @@ Get attribute details record for database Default. - This query is very useful to study the internal Zebra indexes. + This query is very useful to study the internal &zebra; indexes. If records have been indexed using the alvis - XSLT filter, the string representation names of the known indexes can be + &acro.xslt; filter, the string representation names of the known indexes can be found. Z> base IR-Explain-1 @@ -760,17 +749,17 @@
- Bib-1 Attribute Set + &acro.bib1; Attribute Set Most of the information contained in this section is an excerpt of - the ATTRIBUTE SET BIB-1 (Z39.50-1995) SEMANTICS - found at . The Bib-1 + the ATTRIBUTE SET &acro.bib1; (&acro.z3950;-1995) SEMANTICS + found at . The &acro.bib1; Attribute Set Semantics from 1995, also in an updated - Bib-1 + &acro.bib1; Attribute Set version from 2003. Index Data is not the copyright holder of this information, except for the configuration details, the listing of - Zebra's capabilities, and the example queries. + &zebra;'s capabilities, and the example queries. @@ -788,7 +777,7 @@ tab/gils.att. - For example, some few Bib-1 use + For example, some few &acro.bib1; use attributes from the tab/bib1.att are: att 1 Personal-name @@ -814,7 +803,7 @@ be sourced in the main configuration zebra.cfg. - In addition, Zebra allows the access of + In addition, &zebra; allows the access of internal index names and dynamic XPath as use attributes; see and @@ -835,8 +824,8 @@
- Zebra general Bib1 Non-Use Attributes (type 2-6) - + &zebra; general Bib1 Non-Use Attributes (type 2-6) +
Relation Attributes (type 2) @@ -922,7 +911,7 @@ The relation attributes 1-5 are supported and work exactly as expected. All ordering operations are based on a lexicographical ordering, - expect when the + except when the structure attribute numeric (109) is used. In this case, ordering is numerical. See . @@ -979,7 +968,7 @@ AlwaysMatches (103) is a great way to discover how many documents have been indexed in a given field. The search term is ignored, but needed for correct - PQF syntax. An empty search term may be supplied. + &acro.pqf; syntax. An empty search term may be supplied. Z> find @attr 1=Title @attr 2=103 "" Z> find @attr 1=Title @attr 2=103 @attr 4=1 "" @@ -1029,32 +1018,31 @@ - Zebra only supports first-in-field seaches if the + &zebra; only supports first-in-field seaches if the firstinfield is enabled for the index Refer to . - Zebra does not distinguish between first in field and + &zebra; does not distinguish between first in field and first in subfield. They result in the same hit count. - Searching for first position in (sub)field in only supported in Zebra + Searching for first position in (sub)field in only supported in &zebra; 2.0.2 and later.
- +
Structure Attributes (type 4) The structure attribute specifies the type of search term. This causes the search to be mapped on - different Zebra internal indexes, which must have been defined + different &zebra; internal indexes, which must have been defined at index time. The possible values of the structure attribute (type 4) can be defined - using the configuration file - tab/default.idx. + using the configuration file tab/default.idx. The default configuration is summarized in this table. @@ -1152,14 +1140,13 @@
- The structure attribute values Word list (6) is supported, and maps to the boolean AND combination of words supplied. The word list is useful when - google-like bag-of-word queries need to be translated from a GUI - query language to PQF. For example, the following queries + Google-like bag-of-word queries need to be translated from a GUI + query language to &acro.pqf;. For example, the following queries are equivalent: Z> find @attr 1=Title @attr 4=6 "mozart amadeus" @@ -1185,11 +1172,11 @@ Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman" - + The structure attribute value Local number (107) - is supported, and maps always to the Zebra internal document ID, + is supported, and maps always to the &zebra; internal document ID, irrespectively which use attribute is specified. The following queries have exactly the same unique record in the hit set: @@ -1213,13 +1200,14 @@ - The exact mapping between PQF queries and Zebra internal indexes + The exact mapping between &acro.pqf; queries and &zebra; internal indexes and index types is explained in .
+
Truncation Attributes (type = 5) @@ -1330,7 +1318,7 @@ The truncation attribute value - Regexp-2 (103) is a Zebra specific extension + Regexp-2 (103) is a &zebra; specific extension which allows fuzzy matches. One single error in spelling of search terms is allowed, i.e., a document is hit if it includes a term which can be mapped to the used @@ -1401,35 +1389,35 @@ Incomplete subfield (1) is the default, and - makes Zebra use + makes &zebra; use register type="w", whereas Complete field (3) triggers search and scan in index type="p". - The Complete subfield (2) is a reminiscens - from the happy MARC - binary format days. Zebra does not support it, but maps silently + The Complete subfield (2) is a reminiscent + from the happy &acro.marc; + binary format days. &zebra; does not support it, but maps silently to Complete field (3). - The exact mapping between PQF queries and Zebra internal indexes + The exact mapping between &acro.pqf; queries and &zebra; internal indexes and index types is explained in .
-
- +
+
- Extended Zebra RPN Features + Extended &zebra; &acro.rpn; Features - The Zebra internal query engine has been extended to specific needs + The &zebra; internal query engine has been extended to specific needs not covered by the bib-1 attribute set query model. These extensions are non-standard and non-portable: most functional extensions @@ -1441,9 +1429,9 @@
- Zebra specific retrieval of all records + &zebra; specific retrieval of all records - Zebra defines a hardwired string index name + &zebra; defines a hardwired string index name called _ALLRECORDS. It matches any record contained in the database, if used in conjunction with the relation attribute @@ -1470,28 +1458,28 @@ The special string index _ALLRECORDS is experimental, and the provided functionality and syntax may very - well change in future releases of Zebra. + well change in future releases of &zebra;.
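A short Python sketch of how a client could build the PQF string for retrieving all records. It assumes that the relation attribute meant here is AlwaysMatches (103), as used with @attr 2=103 elsewhere in this document, and that an empty quoted term is supplied only to keep the PQF syntactically valid. The names ALWAYS_MATCHES and all_records_pqf are hypothetical.

```python
# Relation attribute (type 2) value AlwaysMatches; assumed to be the
# relation that _ALLRECORDS must be combined with.
ALWAYS_MATCHES = 103


def all_records_pqf() -> str:
    """Return a PQF query intended to match every record in the database."""
    # The search term is ignored for AlwaysMatches, so an empty term is used.
    return f'@attr 1=_ALLRECORDS @attr 2={ALWAYS_MATCHES} ""'
```

Prefixing the returned string with find gives a command for the yaz-client prompt. Since _ALLRECORDS is marked experimental, keeping the string behind one helper makes it easy to adjust if the syntax changes.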