X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=b067c98a4a6ce946896319d0161f46055a347590;hb=fa91adf8d4d03f1b04c2ad3e4be9bc9a487e9a51;hp=8852fc7a719b1129f7a1698ceea4ea1f1883389e;hpb=3d60205d934852596d8939b4db1114ec53a9d2f4;p=idzebra-moved-to-github.git diff --git a/doc/querymodel.xml b/doc/querymodel.xml index 8852fc7..b067c98 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,40 +1,87 @@ - + Query Model Query Model Overview + + + Query Languages + + + Zebra is born as a networking Information Retrieval engine adhering + to the international standards + Z39.50 and + SRU, + and implements the query model defined there. + Unfortunately, the Z39.50 query model has only defined a binary + encoded representation, which is used as transport packaging in + the Z39.50 protocol layer. This representation is not human + readable, nor defines any convenient way to specify queries. + + + + + + + Prefix Query Format (PQF) + - Zebra is born as a networking Information Retrieval engine adhering - to the international standards - Z39.50 and - SRU, - and implements the query model defined there. - Unfortunately, the Z39.50 query model has only defined a binary - encoded representation, which is used as transport packaging in - the Z39.50 protocol layer. This representation is not human - readable, nor defines any convenient way to specify queries. - - - Therefore, Index Data has defined a textual representation in the - Prefix Query Format, short - PQF, which then has been adopted by other - parties developing Z39.50 software. It is also often referred to as - Prefix Query Notation, or in short - PQN, and is thoroughly explained in - . - - + Index Data has defined a textual representation in the + Prefix Query Format, short + PQF, which then has been adopted by other + parties developing Z39.50 software. It is also often referred to as + Prefix Query Notation, or in short + PQN, and is thoroughly explained in + . 
+ + + + + + + Common Query Language (CQL) - In addition, Zebra can be configured to understand and map the - Common Query Language - (CQL) - to PQF. See an introduction on the mapping to the internal query - representation in - . - - + In addition, Zebra can be configured to understand and map the + Common Query Language + (CQL) + to PQF. See an introduction on the mapping to the internal query + representation in + . + + + + + + + Query types + + + + + Explain Queries + + + + + + Search Queries + + + + + + Scan Queries + + + + + + + + Prefix Query Format structure and syntax @@ -53,7 +100,7 @@ may start with one specification of the attribute set used. Following is a query tree, which - consists of atomic query parts, eventually + consists of atomic query parts (APT), eventually paired by boolean binary operators, and finally recursively combined into complex query trees. @@ -69,12 +116,14 @@ The Zebra internal query procesing is modeled after the Bib1 attribute set, and the non-use - attributes type 2-9 are hard-wired in. It is therefore essential + attributes type 2-6 are hard-wired in. It is therefore essential to be familiar with . - +
+ - + - + - + @@ -114,7 +163,9 @@ using the standard boolean operators into new query trees. -
Attribute sets predefined in Zebra
- exp-1 + exp-1 Explain attribute set Special attribute set used on the special automagic IR-Explain-1 database to gain information on @@ -91,7 +140,7 @@ and semantics.
- bib-1 + bib-1 Bib1 attribute set Standard PQF query language attribute set which defines the semantics of Z39.50 searching. In addition, all of the @@ -99,7 +148,7 @@ processing
- gils + gils GILS attribute set Extension to the Bib1 attribute set.
+
+ - + - + - + - + - - + + - + - + @@ -1062,8 +1491,8 @@ The above operands can be combined with the following operators: - -
Boolean operators
@and
@and binary AND operator Set intersection of two atomic queries hit sets
@or
@or binary OR operator Set union of two atomic queries hit sets
@not
@not binary AND NOT operator Set complement of two atomic queries hit sets
@prox
@prox binary PROXIMITY operator Set intersection of two atomic queries hit sets. In addition, the intersection set is purged for all @@ -192,12 +243,13 @@ - Atomic queries + Atomic queries (APT) Atomic queries are the query parts which work on one access point only. These consist of an attribute list followed by a single term or a quoted term list. + quoted term list, and are often called + Attributes-Plus-Terms (APT) queries. Unsupplied non-use attributes type 2-9 are either inherited from @@ -205,7 +257,9 @@ See for details. - 
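The boolean operators table above describes @and, @or and @not as set operations on the hit sets of two atomic queries. The following is a minimal Python sketch of that description only; the document IDs and the `combine` helper are hypothetical illustrations, not part of Zebra, and @prox is left out because its distance filtering is not fully specified here.

```python
# Hypothetical hit sets: document IDs matched by two atomic (APT) queries.
hits_a = {1, 2, 3, 5}
hits_b = {2, 3, 7}


def combine(op, left, right):
    """Combine two hit sets as the operator table describes.

    This is an illustration of the set semantics, not Zebra code.
    """
    if op == "@and":
        return left & right   # set intersection
    if op == "@or":
        return left | right   # set union
    if op == "@not":
        return left - right   # AND NOT: in left but not in right
    raise ValueError("unsupported operator: " + op)


and_hits = combine("@and", hits_a, hits_b)
or_hits = combine("@or", hits_a, hits_b)
not_hits = combine("@not", hits_a, hits_b)
```

Because each operator is binary, a larger query tree is evaluated by applying `combine` recursively to the results of its two subtrees.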
+ + + + All ordering operations are based on a lexicographical ordering, + except when the + structure attribute numeric (109) is used. In + this case, ordering is numerical. See + . - + Ranked search for information retrieval in - the title-register - (see for the glory details): + the title-register: Z> find @attr 1=4 @attr 2=102 "information retrieval" - + + - Position Attributes (type = 3) + Position Attributes (type 3) + + + The position attribute specifies the location of the search term + within the field or subfield in which it appears. + + +
Atomic queries
+ + + + + + + + + + + + + + + + + + + + + + + + + + +
Position Attributes (type 3)
Position               Value  Notes
First in field         1      unsupported
First in subfield      2      unsupported
Any position in field  3      default
+ + + The position attribute values first in field (1), + and first in subfield (2) are unsupported. + Using them does not trigger an error, but silently defaults to + any position in field (3). + + 
- Structure Attributes (type = 4) + Structure Attributes (type 4) + + + The structure attribute specifies the type of search + term. This causes the search to be mapped on + different Zebra internal indexes, which must have been defined + at index time. + + + + The possible values of the + structure attribute (type 4) can be defined + using the configuration file + tab/default.idx. + The default configuration is summarized in this table. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Structure Attributes (type 4)
Structure             Value  Notes
Phrase                1      default
Word                  2      supported
Key                   3      supported
Year                  4      supported
Date (normalized)     5      supported
Word list             6      supported
Date (un-normalized)  100    unsupported
Name (normalized)     101    unsupported
Name (un-normalized)  102    unsupported
Structure             103    unsupported
Urx                   104    supported
Free-form-text        105    supported
Document-text         106    supported
Local-number          107    supported
String                108    unsupported
Numeric string        109    supported
+ + The structure attribute value local-number + (107) + is supported, and always maps to the Zebra internal document ID. + For example, in @@ -596,10 +915,101 @@ Truncation Attributes (type = 5) + + + The truncation attribute specifies whether variations of one or + more characters are allowed between search term and hit terms, or + not. Using non-default truncation attributes will broaden the + document hit set of a search query. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Truncation Attributes (type 5)
Truncation                 Value  Notes
Right truncation           1      supported
Left truncation            2      supported
Left and right truncation  3      supported
Do not truncate            100    default
Process # in search term   101    supported
RegExpr-1                  102    supported
RegExpr-2                  103    supported
+ + + Truncation attribute value + Process # in search term (101) is a + poor-man's regular expression search. It maps + each # to .*, and + then performs a Regexp-1 (102) regular + expression search. + + + Truncation attribute value + Regexp-1 (102) is a normal regular search, + see. + + + Truncation attribute value + Regexp-2 (103) is a Zebra specific extension + which allows fuzzy matches. One single + error in spelling of search terms is allowed, i.e., a document + is hit if it includes a term which can be mapped to the used + search term by one character substitution, addition, deletion or + change of position. + + 
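The two truncation behaviours described above can be sketched in Python. This is an illustration under stated assumptions, not Zebra's implementation: `hash_to_regex` and `within_one_error` are hypothetical helper names, Python's `re` module stands in for the Regexp-1 engine, and "change of position" is assumed to mean swapping two adjacent characters.

```python
import re


def hash_to_regex(term):
    """Map each '#' in the term to '.*', as described for 'Process # in search term'.

    All other characters are escaped so they match literally.
    """
    return ".*".join(re.escape(part) for part in term.split("#"))


def within_one_error(term, candidate):
    """Return True if candidate equals term or differs by a single error.

    A single error is one substitution, addition, deletion, or (by
    assumption) one swap of two adjacent characters.
    """
    if term == candidate:
        return True
    lt, lc = len(term), len(candidate)
    if abs(lt - lc) > 1:
        return False
    # Skip the common prefix; i is the index of the first difference.
    i = 0
    while i < min(lt, lc) and term[i] == candidate[i]:
        i += 1
    if lt == lc:
        # One substitution at position i.
        if term[i + 1:] == candidate[i + 1:]:
            return True
        # One swap of the adjacent characters at i and i + 1.
        return (
            i + 1 < lt
            and term[i] == candidate[i + 1]
            and term[i + 1] == candidate[i]
            and term[i + 2:] == candidate[i + 2:]
        )
    # Lengths differ by one: one addition or deletion at position i.
    longer, shorter = (term, candidate) if lt > lc else (candidate, term)
    return longer[i + 1:] == shorter[i:]


pattern = hash_to_regex("inf#tion")
```

With these helpers, `re.fullmatch(pattern, "information")` succeeds, and `within_one_error("retrieval", "retreival")` accepts the swapped letters while rejecting a term such as "retrieve" that needs more than one correction.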
Completeness Attributes (type = 6) + + This attribute is ONLY used if structure w, p is to be + chosen. Completeness is ignored if not w, p is to be + used. + Incomplete field (1) is the default and makes Zebra use + register type w. + Complete subfield (2) and complete field (3) both trigger + search field type p. + @@ -612,38 +1022,46 @@ set used in a search operation query.
- +
+ - + + - + + - + + - + + - + + - + + @@ -759,7 +1177,8 @@ Zebra Extention Term Reference Attribute (type 10) - Zebra supports the searchResult-1 facility. If attribute 10 is + Zebra supports the searchResult-1 facility. + If the Term Reference Attribute (type 10) is given, that specifies a subqueryId value returned as part of the search result. It is a way for a client to name an APT part of a query. @@ -785,36 +1204,42 @@ recognized regardless of attribute set used in a scan operation query. -
Zebra Search Attribute Extentions
- Name and Type + Name + Value Operation Zebra version
- Embedded Sort (type 7) + Embedded Sort + 7 search 1.1
- Term Set (type 8) + Term Set + 8 search 1.1
- Rank weight (type 9) + Rank Weight + 9 search 1.1
- Approx Limit (type 9) + Approx Limit + 9 search 1.4
- Term Reference (type 10) + Term Reference + 10 search 1.4
+
+ - + + - + + - + +
Zebra Scan Attribute Extentions
- Name and Type + Name + Type Operation Zebra version
- Result Set Narrow (type 8) + Result Set Narrow + 8 scan 1.3
- Approximative Limit (type 9) + Approximative Limit + 9 scan 1.4
- + Zebra Extention Result Set Narrow (type 8) - If attribute 8 is given for scan, the value is the name of a - result set. Each hit count in scan is @and'ed with the result set - given. + If attribute Result Set Narrow (type 8) + is given for scan, the value is the name of a + result set. Each hit count in scan is + @and'ed with the result set given.
- x Matches the character x. + x Matches the character x.
- . + . Matches any character.
- [ .. ] + [ .. ] Matches the set of characters specified; such as [abc] or [a-c].
+
- - + - - + - - + - - + - - + - +
Regular Expression Operators
x* Matches x zero or more times. + x* Matches x zero or more times. Priority: high.
x+ Matches x one or more times. + x+ Matches x one or more times. Priority: high.
x? Matches x zero or once. + x? Matches x zero or once. Priority: high.
xy Matches x, then y. + xy Matches x, then y. Priority: medium.
x|y Matches either x or y. + x|y Matches either x or y. Priority: low.
( )( ) The order of evaluation may be changed by using parentheses.
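The priorities in the operator table above (repetition high, concatenation medium, alternation low) can be checked with a short Python sketch. Python's `re` module is used here only as a stand-in that happens to apply the same precedence for these operators; it is not Zebra's regular expression engine, and the `matches` helper is a hypothetical name.

```python
import re


def matches(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# Without parentheses, "ab|cd*" groups as (ab)|(c(d*)):
# "*" binds tighter than concatenation, which binds tighter than "|".
unparenthesized = "ab|cd*"

# Parentheses change the order of evaluation: zero or more of "ab" or "cd".
parenthesized = "(ab|cd)*"

results = {
    "ab": matches(unparenthesized, "ab"),
    "cddd": matches(unparenthesized, "cddd"),
    "abd": matches(unparenthesized, "abd"),
    "abcdab": matches(parenthesized, "abcdab"),
}
```

The unparenthesized pattern accepts "ab" and "cddd" but not "abd", since the trailing "d*" belongs to the second alternative only.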
- + - If the first character of the Regxp-2 query + If the first character of the Regxp-2 query is a plus character (+) it marks the beginning of a section with non-standard specifiers. The next plus character marks the end of the section. @@ -1128,8 +1557,7 @@ Combinations with other attributes are possible. For example, a - ranked search with a regular expression - (see for the glory details): + ranked search with a regular expression: Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval" @@ -1144,7 +1572,7 @@ process input records. Two basic types of processing are available - raw text and structured data. Raw text is just that, and it is selected by providing the - argument text to Zebra. Structured records are + argument text to Zebra. Structured records are all handled internally using the basic mechanisms described in the subsequent sections. Zebra can read structured records in many different formats.