X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=c7ebc179395a2bde2962fbf98fec157d817f73e4;hb=8d4fd86574ab7de92b916c99073550348eb778f1;hp=8852fc7a719b1129f7a1698ceea4ea1f1883389e;hpb=3d60205d934852596d8939b4db1114ec53a9d2f4;p=idzebra-moved-to-github.git diff --git a/doc/querymodel.xml b/doc/querymodel.xml index 8852fc7..c7ebc17 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,40 +1,152 @@ - + Query Model Query Model Overview + + + Query Languages + + + Zebra was born as a networking Information Retrieval engine adhering + to the international standards + Z39.50 and + SRU, + and implements the + type-1 Reverse Polish Notation (RPN) query + model defined there. + Unfortunately, this model only defines a binary + encoded representation, which is used as transport packaging in + the Z39.50 protocol layer. This representation is not human + readable, nor does it define any convenient way to specify queries. + + + Since the type-1 (RPN) + query structure has no direct, useful string + representation, every origin application needs to provide some + form of mapping from a local query notation or representation to it. + + + + + Prefix Query Format (PQF) + - Zebra is born as a networking Information Retrieval engine adhering - to the international standards - Z39.50 and - SRU, - and implement the query model defined there. - Unfortunately, the Z39.50 query model has only defined a binary - encoded representation, which is used as transport packaging in - the Z39.50 protocol layer. This representation is not human - readable, nor defines any convenient way to specify queries. - - - Therefore, Index Data has defined a textual representaion in the - Prefix Query Format, short - PQF, which then has been adopted by other - parties developing Z39.50 software. It is also often referred to as - Prefix Query Notation, or in short - PQN, and is thoroughly explained in - . 
- - - - In addition, Zebra can be configured to understand and map the - Common Query Language - (CQL) - to PQF. See an introduction on the mapping to the internal query - representation in - . - - + Index Data has defined a textual representation in the + Prefix Query Format, or + PQF for short, which maps + one-to-one to binary encoded + type-1 RPN query packages. + It has been adopted by other + parties developing Z39.50 software, and is often referred to as + Prefix Query Notation, or in short + PQN. See + for further explanations and + descriptions of Zebra's capabilities. + + + + + Common Query Language (CQL) + + The query model of the type-1 RPN, + expressed in PQF/PQN, is natively supported. + On the other hand, the default SRU + webservices Common Query Language + CQL is not natively supported. + + + Zebra can be configured to understand and map CQL to PQF. See + . + + + + + + + Operation types + + Zebra supports all three + Z39.50/SRU operations defined in the + standards: explain, search, + and scan. A short description of the + functionality and purpose of each is in order here. + + + + Explain Operation + + The syntax of Z39.50/SRU queries is + well known to any client, but the specific + semantics - taking into account a + particular server's functionalities and abilities - must be + discovered from case to case. Enter the + explain operation, which provides the means + for learning which + fields (also called + indexes or access points) + are provided, which default parameters the server uses, which + retrieval document formats are defined, and which specific parts + of the general query model are supported. + + + Z39.50 embeds the explain operation + by performing a + search in the magic + IR-Explain-1 database; + see . + + + In SRU, explain is an entirely separate + operation, which returns a ZeeRex + XML record according to the + structure defined by the protocol. 
+ + + In both cases, the information gathered through + explain operations can be used to + auto-configure a client user interface to the server's + capabilities. + + + + + Search Operation + + Search and retrieve interactions are the raison d'être. + They are used to query the remote database and + return search result documents. Search queries range from + simple free text searches to nested complex boolean queries, + targeting specific indexes, and possibly enhanced with many + query semantic specifications. Search interactions are the heart + and soul of Z39.50/SRU servers. + + + + + Scan Operation + + The scan operation is a helper functionality, + which operates on one index or access point at a time. + + + It provides + the means to investigate the content of specific indexes. + Scanning an index returns a handful of terms actually found in + the indexes, and in addition the scan + operation returns the number of documents indexed by each term. + A search client can use this information to propose proper + spelling of search terms, to auto-fill search boxes, or to + display controlled vocabularies. + + + + + + + Prefix Query Format structure and syntax @@ -53,7 +165,8 @@ may start with one specification of the attribute set used. Following is a query tree, which - consists of atomic query parts, eventually + consists of atomic query parts (APT) or + named result sets, possibly paired by boolean binary operators, and finally recursively combined into complex query trees. @@ -66,46 +179,75 @@ issued. Zebra comes with some predefined attribute set definitions, others can easily be defined and added to the configuration. - - The Zebra internal query procesing is modeled after - the Bib1 attribute set, and the non-use - attributes type 2-9 are hard-wired in. It is therefore essential - to be familiar with . - + - +
+ - + - - + + + - - + + + non-use attributes (type 2-9) define the hard-wired + Zebra internal query + processing. + - - + + + + + + + + +
Attribute sets predefined in Zebra
exp-1Explain attribute setExplainexp-1 Special attribute set used on the special automagic IR-Explain-1 database to gain information on server capabilities, database names, and database and semantics.predefined
bib-1Bib1 attribute setBib1bib-1 Standard PQF query language attribute set which defines the semantics of Z39.50 searching. In addition, all of the - non-use attributes (type 2-9) define the Zebra internal query - processingdefault
gilsGILS attribute setGILSgils Extension to the Bib1 attribute set.predefined
IDXPATHidxpathHardwired XPATH like attribute set, only available for + indexing with the GRS record modeldeprecated
+ + + The use attributes (type 1) of the predefined attribute sets can + be reconfigured by tweaking the files + tab/*.att. + New attribute sets can be defined by adding similar files in the + configuration path of the server. + + + + The Zebra internal query processing is modeled after + the Bib1 attribute set, and the non-use + attributes type 2-6 are hard-wired in. It is therefore essential + to be familiar with . + + Boolean operators @@ -114,7 +256,9 @@ using the standard boolean operators into new query trees. - +
+ - + - + - + - +
Boolean operators
@and
@and binary AND operator Set intersection of two atomic queries hit sets
@or
@or binary OR operator Set union of two atomic queries hit sets
@not
@not binary AND NOT operator Set complement of two atomic queries hit sets
@prox
@prox binary PROXIMITY operator Set intersection of two atomic queries hit sets. In addition, the intersection set is purged for all @@ -192,12 +336,13 @@ - Atomic queries + Atomic queries (APT) Atomic queries are the query parts which work on one access point only. These consist of an attribute list followed by a single term or a - quoted term list. + quoted term list, and are often called + Attributes-Plus-Terms (APT) queries. Unsupplied non-use attributes type 2-9 are either inherited from @@ -205,7 +350,9 @@ See for details. - 
+ + + + All ordering operations are based on a lexicographical ordering, + except when the + structure attribute numeric (109) is used. In + this case, ordering is numerical. See + . - + Ranked search for information retrieval in - the title-register - (see for the glory details): + the title-register: Z> find @attr 1=4 @attr 2=102 "information retrieval" - + + - Position Attributes (type = 3) + Position Attributes (type 3) + + + The position attribute specifies the location of the search term + within the field or subfield in which it appears. + + + +
Atomic queries
+ + + + + + + + + + + + + + + + + + + + + + + + + + +
Position Attributes (type 3)
Position | Value | Notes
First in field | 1 | unsupported
First in subfield | 2 | unsupported
Any position in field | 3 | default
+ + + The position attribute values first in field (1) + and first in subfield (2) are unsupported. + Using them does not trigger an error; they silently default to + any position in field (3). + + 
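The fallback described above can be pictured with a minimal Python sketch. It is only an illustration of the documented behaviour, not Zebra code: the names `effective_position` and `title_query` are hypothetical, and the use attribute `1=4` (title) is borrowed from the ranked-search example earlier in this section.

```python
# Hypothetical helper names; nothing here is part of Zebra or YAZ.
ANY_POSITION_IN_FIELD = 3  # the only supported position value per the table above


def effective_position(requested: int) -> int:
    """Return the position value that is actually applied to a search."""
    if requested in (1, 2):
        # Unsupported values: no error, silent fallback to "any position in field".
        return ANY_POSITION_IN_FIELD
    if requested == ANY_POSITION_IN_FIELD:
        return requested
    # The text does not describe other values, so this sketch refuses them.
    raise ValueError(f"position value {requested} is not covered by this sketch")


def title_query(term: str, position: int) -> str:
    """Build a PQF string for a title search with an explicit position attribute."""
    return f'@attr 1=4 @attr 3={position} "{term}"'


if __name__ == "__main__":
    for requested in (1, 2, 3):
        print(title_query("information retrieval", requested),
              "-> evaluated with position", effective_position(requested))
```

In other words, under this reading the three queries printed by the sketch would all be evaluated the same way by the server.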
- Structure Attributes (type = 4) + Structure Attributes (type 4) + + + The structure attribute specifies the type of search + term. This causes the search to be mapped on + different Zebra internal indexes, which must have been defined + at index time. + + + + The possible values of the + structure attribute (type 4) can be defined + using the configuration file + tab/default.idx. + The default configuration is summarized in this table. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Structure Attributes (type 4)
Structure | Value | Notes
Phrase | 1 | default
Word | 2 | supported
Key | 3 | supported
Year | 4 | supported
Date (normalized) | 5 | supported
Word list | 6 | supported
Date (un-normalized) | 100 | unsupported
Name (normalized) | 101 | unsupported
Name (un-normalized) | 102 | unsupported
Structure | 103 | unsupported
Urx | 104 | supported
Free-form-text | 105 | supported
Document-text | 106 | supported
Local-number | 107 | supported
String | 108 | unsupported
Numeric string | 109 | supported
+ + The structure attribute value local-number + (107) + is supported, and always maps to the Zebra internal document ID. + For example, in @@ -596,10 +1057,101 @@ Truncation Attributes (type = 5) + + + The truncation attribute specifies whether variations of one or + more characters are allowed between search term and hit terms, or + not. Using non-default truncation attributes will broaden the + document hit set of a search query. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Truncation Attributes (type 5)
Truncation | Value | Notes
Right truncation | 1 | supported
Left truncation | 2 | supported
Left and right truncation | 3 | supported
Do not truncate | 100 | default
Process # in search term | 101 | supported
RegExpr-1 | 102 | supported
RegExpr-2 | 103 | supported
+ + + Truncation attribute value + Process # in search term (101) is a + poor-man's regular expression search. It maps + each # to .*, and + then performs a Regexp-1 (102) regular + expression search. + + + Truncation attribute value + Regexp-1 (102) is a normal regular expression search, + see. + + + Truncation attribute value + Regexp-2 (103) is a Zebra specific extension + which allows fuzzy matches. One single + error in spelling of search terms is allowed, i.e., a document + is hit if it includes a term which can be mapped to the used + search term by one character substitution, addition, deletion or + change of position. + + 
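The two behaviours described above can be sketched in Python. This is an illustration under stated assumptions, not Zebra's implementation: Python's `re` module stands in for Zebra's own regular expression dialect, whole-term matching is assumed, "change of position" is read as swapping two adjacent characters, and the function names are hypothetical.

```python
import re


def hash_to_regex(term: str) -> str:
    """Map each '#' to '.*' and escape every other character (Python re syntax)."""
    return "".join(".*" if ch == "#" else re.escape(ch) for ch in term)


def matches_hash_term(term: str, candidate: str) -> bool:
    """True if the indexed term `candidate` matches the '#'-style search term."""
    return re.fullmatch(hash_to_regex(term), candidate) is not None


def within_one_edit(a: str, b: str) -> bool:
    """True if b equals a, or differs by one substitution, insertion,
    deletion, or swap of two adjacent characters."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    # Skip the common prefix; i is the first index where the strings differ.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        if a[i + 1:] == b[i + 1:]:
            return True  # one substitution
        return (i + 1 < len(a) and a[i] == b[i + 1] and a[i + 1] == b[i]
                and a[i + 2:] == b[i + 2:])  # one adjacent swap
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return longer[i + 1:] == shorter[i:]  # one insertion or deletion


if __name__ == "__main__":
    print(matches_hash_term("inform#", "information"))
    print(within_one_edit("retrieval", "retreival"))
```

The single-pass check avoids computing a full edit distance, which is enough when only one error is tolerated.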
Completeness Attributes (type = 6) + + This attribute is ONLY used when choosing between the + register types w and p; completeness is ignored if neither w nor p is to be + used. + Incomplete field (1) is the default and makes Zebra use + register type w. + Complete subfield (2) and complete field (3) both trigger + search field type p. + @@ -612,38 +1164,46 @@ set used in a search operation query.
- +
+ - + + - + + - + + - + + - + + - + + @@ -759,7 +1319,8 @@ Zebra Extention Term Reference Attribute (type 10) - Zebra supports the searchResult-1 facility. If attribute 10 is + Zebra supports the searchResult-1 facility. + If the Term Reference Attribute (type 10) is given, that specifies a subqueryId value returned as part of the search result. It is a way for a client to name an APT part of a query. @@ -785,53 +1346,83 @@ recognized regardless of attribute set used in a scan operation query. -
Zebra Search Attribute Extensions
Name and TypeNameValue Operation Zebra version
Embedded Sort (type 7)Embedded Sort7 search 1.1
Term Set (type 8)Term Set8 search 1.1
Rank weight (type 9)Rank Weight9 search 1.1
Approx Limit (type 9)Approx Limit9 search 1.4
Term Reference (type 10)Term Reference10 search 1.4
+
+ - + + - + + - + +
Zebra Scan Attribute Extensions
Name and TypeNameType Operation Zebra version
Result Set Narrow (type 8)Result Set Narrow8 scan 1.3
Approximative Limit (type 9)Approximative Limit9 scan 1.4
- + Zebra Extention Result Set Narrow (type 8) - If attribute 8 is given for scan, the value is the name of a - result set. Each hit count in scan is @and'ed with the result set - given. + If attribute Result Set Narrow (type 8) + is given for scan, the value is the name of a + result set. Each hit count in scan is + @and'ed with the result set given. - + - Experimental and buggy. Definitely not to be used in production code. + Experimental. Do not use in production code. - + Zebra Extention Approximative Limit (type 9) - The approximative limit (as for search) is a way to enable approx - hit counts for scan hit counts. + The Zebra Extention Approximative Limit (type + 9) is a way to enable approx + hit counts for scan hit counts, in the same + way as for search hit counts. - Experimental. Do not use in production code. + Experimental and buggy. Definitely not to be used in production code. - + + + + Zebra special IDXPATH Attribute Set for GRS indexing + + The attribute-set idxpath consists of a single + Use (type 1) attribute. All non-use attributes + behave as normal. + + + This feature is enabled when defining the + xpath enable option in the GRS filter + *.abs configuration files. If one wants to use + the special idxpath numeric attribute set, the + main Zebra configuraiton file zebra.cfg + directive attset: idxpath.att must be enabled. + + The idxpath is depreciated, may not be + supported in future Zebra versions, and should definitely + not be used in production code. + + + + IDXPATH Use Attributes (type = 1) + + This attribute set allows one to search GRS filter indexed + records by XPATH like structured index names. It is enabled by + specifying the + + + + The idxpath option defines hard-coded + index names, which might clash with your own index names. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Zebra specific IDXPATH Use Attributes (type 1)
IDXPATH | Value | String Index | Notes
XPATH Begin | 1 | _XPATH_BEGIN | deprecated
XPATH End | 2 | _XPATH_END | deprecated
XPATH CData | 1016 | _XPATH_CDATA | deprecated
XPATH Attribute Name | 3 | _XPATH_ATTR_NAME | deprecated
XPATH Attribute CData | 1015 | _XPATH_ATTR_CDATA | deprecated
+ + + + See tab/idxpath.att for more information. + + + Search for all documents starting with root element + /root (either using the numeric or the string + use attributes): + + Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/ + Z> find @attr idxpath 1=1 @attr 4=3 root/ + Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/ + + + + Search for all documents where a specific nested XPATH + /c1/c2/../cn exists. Notice the very + counter-intuitive reverse notation! + + Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/ + Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/ + + + + Search for CDATA string text in any element: + + Z> find @attrset idxpath @attr 1=1016 text + Z> find @attr 1=_XPATH_CDATA text + + + + Search for CDATA string anothertext in any + attribute: + + Z> find @attrset idxpath @attr 1=1015 anothertext + Z> find @attr 1=_XPATH_ATTR_CDATA anothertext + + + + Search for all documents which have an XML element node + including an XML attribute named creator: + + Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator + Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator + + + + Combining usual bib-1 attribute set searches + with idxpath attribute set searches: + + Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart + Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart + + + +
+
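The reverse notation used in the idxpath examples above is easy to get wrong by hand. The following Python sketch shows one way a client might derive the search term from an ordinary path. The function names are hypothetical and not part of Zebra or YAZ; the sketch only handles plain element steps such as `/c1/c2/c3`, and the attribute values are copied from the `_XPATH_BEGIN` examples above.

```python
def idxpath_term(xpath: str) -> str:
    """Turn '/c1/c2/c3' into the reversed idxpath term 'c3/c2/c1/'."""
    steps = [step for step in xpath.split("/") if step]
    if not steps:
        raise ValueError("expected at least one element step, e.g. '/root'")
    # Innermost element first, outermost last, with a trailing slash.
    return "/".join(reversed(steps)) + "/"


def xpath_begin_query(xpath: str) -> str:
    """Build the PQF query, using the string use attribute and structure 4=3."""
    return f"@attr 1=_XPATH_BEGIN @attr 4=3 {idxpath_term(xpath)}"


if __name__ == "__main__":
    print(xpath_begin_query("/root"))
    print(xpath_begin_query("/c1/c2/c3"))
```

Keeping the reversal in one helper means the rest of a client can keep working with paths in their natural order.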
+ Mapping from Bib1 Attributes to Zebra internal @@ -1034,7 +1763,9 @@ Both query types follow the same syntax with the operands: </para> - <table id="querymodel-regular-operands-table"> + <table id="querymodel-regular-operands-table" + frame="all" rowsep="1" colsep="1" align="center"> + <caption>Regular Expression Operands</caption> <!-- <thead> @@ -1043,15 +1774,15 @@ --> <tbody> <tr> - <td><emphasis>x</emphasis></td> - <td>Matches the character <emphasis>x</emphasis>.</td> + <td><literal>x</literal></td> + <td>Matches the character <literal>x</literal>.</td> </tr> <tr> - <td><emphasis>.</emphasis></td> + <td><literal>.</literal></td> <td>Matches any character.</td> </tr> <tr> - <td><emphasis>[ .. ]</emphasis></td> + <td><literal>[ .. ]</literal></td> <td>Matches the set of characters specified; such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td> </tr> @@ -1062,8 +1793,8 @@ The above operands can be combined with the following operators: </para> - - <table id="querymodel-regular-operators-table"> + <table id="querymodel-regular-operators-table" + frame="all" rowsep="1" colsep="1" align="center"> <caption>Regular Expression Operators</caption> <!-- <thead> @@ -1072,39 +1803,39 @@ --> <tbody> <tr> - <td><emphasis>x*</emphasis></td> - <td>Matches <emphasis>x</emphasis> zero or more times. + <td><literal>x*</literal></td> + <td>Matches <literal>x</literal> zero or more times. Priority: high.</td> </tr> <tr> - <td><emphasis>x+</emphasis></td> - <td>Matches <emphasis>x</emphasis> one or more times. + <td><literal>x+</literal></td> + <td>Matches <literal>x</literal> one or more times. Priority: high.</td> </tr> <tr> - <td><emphasis>x?</emphasis></td> - <td> Matches <emphasis>x</emphasis> zero or once. + <td><literal>x?</literal></td> + <td> Matches <literal>x</literal> zero or once. Priority: high.</td> </tr> <tr> - <td><emphasis>xy</emphasis></td> - <td> Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>. 
+ <td><literal>xy</literal></td> + <td> Matches <literal>x</literal>, then <literal>y</literal>. Priority: medium.</td> </tr> <tr> - <td><emphasis>x|y</emphasis></td> - <td> Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>. + <td><literal>x|y</literal></td> + <td> Matches either <literal>x</literal> or <literal>y</literal>. Priority: low.</td> </tr> <tr> - <td><emphasis>( )</emphasis></td> + <td><literal>( )</literal></td> <td>The order of evaluation may be changed by using parentheses.</td> </tr> </tbody> </table> - + <para> - If the first character of the <emphasis>Regxp-2</emphasis> query + If the first character of the <literal>Regxp-2</literal> query is a plus character (<literal>+</literal>) it marks the beginning of a section with non-standard specifiers. The next plus character marks the end of the section. @@ -1128,8 +1859,7 @@ <para> Combinations with other attributes are possible. For example, a - ranked search with a regular expression - (see <xref linkend="administration-ranking"/> for the glory details): + ranked search with a regular expression: <screen> Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval" </screen> @@ -1144,7 +1874,7 @@ process input records. Two basic types of processing are available - raw text and structured data. Raw text is just that, and it is selected by providing the - argument <emphasis>text</emphasis> to Zebra. Structured records are + argument <literal>text</literal> to Zebra. Structured records are all handled internally using the basic mechanisms described in the subsequent sections. Zebra can read structured records in many different formats.