X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=bed7a2e3d153c420b414c4ce30ad8ad4a91fd327;hb=0002d3ccff37e5598553683e95714ca5711f05e8;hp=58c39c62f9537a5b14c5ec56627e2aae7e347ffe;hpb=6d074f35cdc58c223a2f0e4c7ee9d9be5d47ddfb;p=idzebra-moved-to-github.git diff --git a/doc/querymodel.xml b/doc/querymodel.xml index 58c39c6..bed7a2e 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,43 +1,85 @@ - + Query Model Query Model Overview - - Zebra is born as a networking Information Retrieval engine adhering - to the international standards - Z39.50 and - SRU, - and implement the query model defined there. - Unfortunately, the Z39.50 query model has only defined a binary - encoded representation, which is used as transport packaging in - the Z39.50 protocol layer. This representation is not human - readable, nor defines any convenient way to specify queries. - + + + Query Languages + + + Zebra is born as a networking Information Retrieval engine adhering + to the international standards + Z39.50 and + SRU, + and implement the query model defined there. + Unfortunately, the Z39.50 query model has only defined a binary + encoded representation, which is used as transport packaging in + the Z39.50 protocol layer. This representation is not human + readable, nor defines any convenient way to specify queries. + + + + Prefix Query Format (PQF) + - Therefore, Index Data has defined a textual representation of the - RPN query: Prefix Query Format, short - PQF, which then has been adopted by other - parties developing Z39.50 software. It is also often referred to as - Prefix Query Notation, or in short - PQN, and is thoroughly explained in - . - + Index Data has defined a textual representation in the + Prefix Query Format, short + PQF, which then has been adopted by other + parties developing Z39.50 software. It is also often referred to as + Prefix Query Notation, or in short + PQN, and is thoroughly explained in + . 
+ + + + + Common Query Language (CQL) - In addition, Zebra can be configured to understand and map the - Common Query Language - (CQL) - to PQF. See an introduction on the mapping to the internal query - representation in - . - - + In addition, Zebra can be configured to understand and map the + Common Query Language + (CQL) + to PQF. See an introduction on the mapping to the internal query + representation in + . + + + + + + + Query types + + + + + Explain Queries + + + + + + Search Queries + + + + + + Scan Queries + + + + + + + + Prefix Query Format structure and syntax @@ -72,7 +114,7 @@ The Zebra internal query procesing is modeled after the Bib1 attribute set, and the non-use - attributes type 2-9 are hard-wired in. It is therefore essential + attributes type 2-6 are hard-wired in. It is therefore essential to be familiar with . @@ -548,10 +590,33 @@ - Use Attributes (type = 1) + Use Attributes (type 1) + A use attribute specifies an access point for any atomic query. + These access points are highly dependent on the attribute set used + in the query, and are user configurable using the following + default configuration files: + tab/bib1.att, + tab/dan1.att, + tab/explain.att, and + tab/gils.att. + New attribute sets can be added by adding new + tab/*.att configuration files, which need to + be sourced in the main configuration zebra.cfg. + + + + In addition, Zebra allows the access of + internal index names and dynamic + XPath as use attributes. + See + for + alternative access to the Zebra internal index names and XPath queries. + + + Phrase search for information retrieval in the title-register: @@ -561,23 +626,94 @@ - Relation Attributes (type = 2) - - - Supported operations: = (default, of omitted), < > <=, >= . - Unsupported: Not equal. + Relation Attributes (type 2) - The following relation attributes are also supported: relevance (102). 
- + + Relation attributes describe the relationship of the access + point (left side + of the relation) to the search term as qualified by the attributes (right + side of the relation), e.g., Date-publication <= 1975. + - All operations are based on a lexicographical ordering, - expect in the case for the - following structure attributes: numeric(109). + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Relation Attributes (type 2)
Relation | Value | Notes
Less than | 1 | supported
Less than or equal | 2 | supported
Equal | 3 | default
Greater or equal | 4 | supported
Greater than | 5 | supported
Not equal | 6 | unsupported
Phonetic | 100 | unsupported
Stem | 101 | unsupported
Relevance | 102 | supported
AlwaysMatches | 103 | supported
- -
- + + The relation attribute + relevance (102) is supported, see + for full information. + + + + All ordering operations are based on a lexicographical ordering, + except when the + structure attribute numeric (109) is used. In + this case, ordering is numerical. See + . + + + Ranked search for information retrieval in the title-register (see for the glory details): @@ -585,22 +721,172 @@ Z> find @attr 1=4 @attr 2=102 "information retrieval"
- + + - Position Attributes (type = 3) + Position Attributes (type 3) + - Only value of (any position(3) is supported. first in field(1), - and first in subfield(2) are unsupported but using them - does not trigger an error. + The position attribute specifies the location of the search term + within the field or subfield in which it appears. + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Position Attributes (type 3)
Position | Value | Notes
First in field | 1 | unsupported
First in subfield | 2 | unsupported
Any position in field | 3 | default
+ + + The position attribute values first in field (1), + and first in subfield (2) are unsupported. + Using them does not trigger an error, but silently defaults to + any position in field (3). +
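As an illustration of how the use, relation and position attributes combine, the following minimal sketch assembles a PQF atomic query string of the form shown in the ranked title search (`@attr 1=4 @attr 2=102 "information retrieval"`). The helper name `pqf_query` is hypothetical and not part of Zebra or YAZ; the sketch only formats text and does not talk to a server.

```python
def pqf_query(term, attributes):
    """Build a PQF atomic query from (type, value) attribute pairs.

    Hypothetical helper for illustration only; not a Zebra or YAZ API.
    """
    if '"' in term:
        # Quoting of embedded double quotes is out of scope for this sketch.
        raise ValueError("term must not contain a double quote")
    prefix = " ".join(f"@attr {atype}={value}" for atype, value in attributes)
    quoted = f'"{term}"'
    return f"{prefix} {quoted}" if prefix else quoted


# Use attribute 1=4 with relation 2=102 (relevance), as in the ranked title search.
ranked = pqf_query("information retrieval", [(1, 4), (2, 102)])

# Position 3=3 (any position in field) is the documented default;
# here it is simply stated explicitly.
explicit = pqf_query("information retrieval", [(1, 4), (2, 102), (3, 3)])

print(ranked)
print(explicit)
```

Because position values 1 and 2 are unsupported and fall back to 3, a client gains nothing by emitting them; stating `3=3` explicitly or omitting the position attribute should describe the same search.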
- Structure Attributes (type = 4) - + Structure Attributes (type 4) + + + The structure attribute specifies the type of search + term. This causes the search to be mapped on + different Zebra internal indexes, which must have been defined + at index time. + + + + The possible values of the + structure attribute (type 4) can be defined + using the configuration file + tab/default.idx. + The default configuration is summarized in this table. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Structure Attributes (type 4)
Structure | Value | Notes
Phrase | 1 | default
Word | 2 | supported
Key | 3 | supported
Year | 4 | supported
Date (normalized) | 5 | supported
Word list | 6 | supported
Date (un-normalized) | 100 | unsupported
Name (normalized) | 101 | unsupported
Name (un-normalized) | 102 | unsupported
Structure | 103 | unsupported
Urx | 104 | supported
Free-form-text | 105 | supported
Document-text | 106 | supported
Local-number | 107 | supported
String | 108 | unsupported
Numeric string | 109 | supported
+ The structure attribute value local-number + (107) + is supported, and always maps to the Zebra internal document ID. + + + For example, in the GILS schema (gils.abs), the west-bounding-coordinate is indexed as type n, @@ -615,16 +901,86 @@ Truncation Attributes (type = 5) + + + The truncation attribute specifies whether variations of one or + more characters are allowed between search term and hit terms, or + not. Using non-default truncation attributes will broaden the + document hit set of a search query. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Truncation Attributes (type 5)
Truncation | Value | Notes
Right truncation | 1 | supported
Left truncation | 2 | supported
Left and right truncation | 3 | supported
Do not truncate | 100 | default
Process # in search term | 101 | supported
RegExpr-1 | 102 | supported
RegExpr-2 | 103 | supported
+ + + Truncation attribute value + Process # in search term (101) is a + poor-man's regular expression search. It maps + each # to .*, and + then performs a Regexp-1 (102) regular + expression search. + + + Truncation attribute value + Regexp-1 (102) is a normal regular search, + see. + - Supported are: No truncation(100) which is the default, - Right trunation(1), Left truncation(2), - Left&Right truncation(3), - Process # in term(100) which maps - each # to .*, - Regexp-1(102) normal regular, Regexp-2(103) (regular with fuzzy), + Truncation attribute value + Regexp-2 (103) is a Zebra-specific extension + which allows fuzzy matches. One single + error in spelling of search terms is allowed, i.e., a document + is hit if it includes a term which can be mapped to the used + search term by one character substitution, addition, deletion or + change of position. + + -
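The two truncation behaviours described above can be sketched in a few lines. This is only an illustration of the stated rules, not Zebra's implementation: `hash_to_regex` and `within_one_error` are hypothetical names, Python's `re` syntax stands in for Zebra's own regular expression syntax, and "change of position" is assumed here to mean swapping two adjacent characters.

```python
import re


def hash_to_regex(term):
    """Map each '#' in the term to '.*' and keep all other text literal."""
    return ".*".join(re.escape(part) for part in term.split("#"))


def within_one_error(a, b):
    """True if b differs from a by at most one substitution, addition,
    deletion, or swap of two adjacent characters (an assumption)."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    # Find the first position where the two strings differ.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # One substituted character, with identical remainders.
        if a[i + 1:] == b[i + 1:]:
            return True
        # Two adjacent characters swapped, with identical remainders.
        return a[i:i + 2] == b[i:i + 2][::-1] and a[i + 2:] == b[i + 2:]
    # Lengths differ by one: drop one character from the longer string.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return longer[i + 1:] == shorter[i:]


pattern = hash_to_regex("inf#tion")
print(pattern)
print(bool(re.fullmatch(pattern, "information")))
print(within_one_error("retrieval", "retreival"))
```

The first helper mirrors the `#` to `.*` rewriting of truncation value 101; the second is a stand-in for the one-error tolerance described for value 103, which Zebra applies inside its regular expression matching rather than as a separate string comparison.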
@@ -637,6 +993,7 @@ register type w. complete subfield(2) and complete field(3) both triggers search field type p. +
@@ -653,34 +1010,40 @@ Zebra Search Attribute Extentions - Name and Type + Name + Value Operation Zebra version - Embedded Sort (type 7) + Embedded Sort + 7 search 1.1 - Term Set (type 8) + Term Set + 8 search 1.1 - Rank weight (type 9) + Rank Weight + 9 search 1.1 - Approx Limit (type 9) + Approx Limit + 9 search 1.4 - Term Reference (type 10) + Term Reference + 10 search 1.4