From: Marc Cromme Date: Thu, 22 Jun 2006 14:01:55 +0000 (+0000) Subject: zebra specific stuff split into advanced section X-Git-Tag: before.bug.529~13 X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=commitdiff_plain;h=997db1975fa2132c9bb155b69c86f1310f5136b4 zebra specific stuff split into advanced section added section on completeness field tested and verified all PQF examples --- diff --git a/doc/Makefile.am b/doc/Makefile.am index 9d74037..f88b056 100644 --- a/doc/Makefile.am +++ b/doc/Makefile.am @@ -1,4 +1,4 @@ -## $Id: Makefile.am,v 1.48 2006-06-13 09:26:59 marc Exp $ +## $Id: Makefile.am,v 1.49 2006-06-22 14:01:55 marc Exp $ docdir=$(datadir)/doc/@PACKAGE@ SUBDIRS = common @@ -59,6 +59,7 @@ HTMLFILES = \ protocol-support.html \ querymodel-cql-to-pqf.html \ querymodel-pqf.html \ + querymodel-zebra.html \ querymodel.html \ quick-start.html \ record-model-alvisxslt-conf.html \ diff --git a/doc/querymodel.xml b/doc/querymodel.xml index c7ebc17..d359d18 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,5 +1,5 @@ - + Query Model @@ -222,6 +222,7 @@ Extention to the Bib1 attribute set. predefined + @@ -260,11 +262,13 @@ frame="all" rowsep="1" colsep="1" align="center"> Boolean operators - @and binary AND operator @@ -318,8 +322,9 @@ The hit set is a subset of the coresponding AND query. - Z> find @prox information retrieval + Z> find @prox 0 3 0 2 k 2 information retrieval + See PQF grammer for details. Querying for the intersection of all documents containing the @@ -360,14 +365,16 @@ --> - attribute list + + attribute list List of orthogonal attributes Any of the orthogonal attribute types may be omitted, these are inherited from higher query tree nodes, or if not inherited, are set to the default Zebra configuration values. 
- term + + term single term or quoted term list Here the search terms or list of search terms is added @@ -460,19 +467,24 @@ Finding all documents which have the term list "information - retrieval" in an Zebra index, using it's internal full string name. + retrieval" in an Zebra index, using it's internal full string + name. Scanning the same index. Z> find @attr 1=sometext "information retrieval" + Z> scan @attr 1=sometext aterm - Searching the bib-1 use attribute 54 using it's string name: + Searching or scanning + the bib-1 use attribute 54 using it's string name: Z> find @attr 1=Code-language eng + Z> scan @attr 1=Code-language "" - Searching in any silly string index - if it's defined in your + It is possible to search + in any silly string index - if it's defined in your indexation rules and can be parsed by the PQF parser. This is definitely not the recommended use of this facility, as it might confuse your users with some very @@ -482,7 +494,7 @@ - See for details, and + See also for details, and for the SRU PQF query extention using string names as a fast debugging facility. @@ -504,7 +516,7 @@ XPath queries can entered at search time, and second, it might confuse users very much that an XPath-alike index name in fact gets populated from a possible entirely different XML element - than it pretends to acess. + than it pretends to access. When using the GRS Record Model @@ -546,15 +558,25 @@ + + Searching inside attribute strings is possible: + + Z> find @attr 1=/link/@creator morten + + + Filter the adressing XPath by a predicate working on exact string values in attributes (in the XML sense) can be done: return all those docs which have the term "english" contained in one of all text subnodes of the subtree defined by the XPath - /record/title[@lang='en'] + /record/title[@lang='en']. And similar + predicate filtering. 
Z> find @attr 1=/record/title[@lang='en'] english + Z> find @attr 1=/link[@creator='sisse'] sibelius + Z> find @attr 1=/link[@creator='sisse']/description[@xml:lang='da'] sibelius @@ -571,8 +593,7 @@ with '{ }' to prevent syntax errors: Z> find @attr {1=/root/first[@attr='danish']} content - Z> find @attr {1=/root/second[@attr='danish lake']} - Z> find @attr {1=/root/third[@attr='dansk s\xc3\xb8']} + Z> find @attr {1=/record/@set} oai @@ -755,17 +776,17 @@ In addition, Zebra allows the acess of internal index names and dynamic - XPath as use attributes. - See and - for - alternative acess to the Zebra internal index names and XPath queries. + XPath as use attributes; see + and + . Phrase search for information retrieval in - the title-register: + the title-register, scanning the same register afterwards: Z> find @attr 1=4 "information retrieval" + Z> scan @attr 1=4 information @@ -935,7 +956,7 @@ The possible values of the structure attribute (type 4) can be defined - using the configuraiton file + using the configuration file tab/default.idx. The default configuration is summerized in this table. @@ -1034,16 +1055,56 @@ - + + + + The structure attribute values + Word list (6) + is supported, and maps to the boolean AND + combination of words supplied. The word list is useful when + google-like bag-of-word queries need to be translated from a GUI + query language to PQF. For example, the following queries + are equivalent: + + Z> find @attr 1=Title @attr 4=6 "mozart amadeus" + Z> find @attr 1=Title @and mozart amadeus + + + + + The structure attribute value + Free-form-text (105) and + Document-text (106) + are supported, and map both to the boolean OR + combination of words supplied. 
The following queries + are equivalent: + + Z> find @attr 1=Body-of-text @attr 4=105 "bach salieri teleman" + Z> find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman" + Z> find @attr 1=Body-of-text @or bach @or salieri teleman + + This OR list of terms is very usefull in + combination with relevance ranking: + + Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman" + + - The structure attribute value local-number - (107) - is supported, and maps always to the Zebra internal document ID. - + The structure attribute value + Local number (107) + is supported, and maps always to the Zebra internal document ID, + irrespectively which use attribute is specified. The following queries + have exactly the same unique record in the hit set: + + Z> find @attr 4=107 10 + Z> find @attr 1=4 @attr 4=107 10 + Z> find @attr 1=1010 @attr 4=107 10 + + - For example, in + In the GILS schema (gils.abs), the west-bounding-coordinate is indexed as type n, and is therefore searched by specifying @@ -1054,6 +1115,13 @@ Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114 + + The exact mapping between PQF queries and Zebra internal indexes + and index types is explained in + . + + + Truncation Attributes (type = 5) @@ -1116,44 +1184,161 @@ - Truncation attribute value - Process # in search term (100) is a + The truncation attribute values 1-3 perform the obvious way: + + Z> scan @attr 1=Body-of-text schnittke + ... + * schnittke (81) + schnittkes (31) + schnittstelle (1) + ... + Z> find @attr 1=Body-of-text @attr 5=1 schnittke + ... + Number of hits: 95, setno 7 + ... + Z> find @attr 1=Body-of-text @attr 5=2 schnittke + ... + Number of hits: 81, setno 6 + ... + Z> find @attr 1=Body-of-text @attr 5=3 schnittke + ... + Number of hits: 95, setno 8 + + + + + The truncation attribute value + Process # in search term (101) is a poor-man's regular expression search. It maps each # to .*, and performes then a Regexp-1 (102) regular - expression search. 
+ expression search. The following two queries are equivalent: + + Z> find @attr 1=Body-of-text @attr 5=101 schnit#ke + Z> find @attr 1=Body-of-text @attr 5=102 schnit.*ke + ... + Number of hits: 89, setno 10 + + - Truncation attribute value + The truncation attribute value Regexp-1 (102) is a normal regular search, - see. + see for details. + + Z> find @attr 1=Body-of-text @attr 5=102 schnit+ke + Z> find @attr 1=Body-of-text @attr 5=102 schni[a-t]+ke + + - Truncation attribute value + The truncation attribute value Regexp-2 (103) is a Zebra specific extention which allows fuzzy matches. One single error in spelling of search terms is allowed, i.e., a document is hit if it includes a term which can be mapped to the used search term by one character substitution, addition, deletion or change of posiiton. + + Z> find @attr 1=Body-of-text @attr 5=100 schnittke + ... + Number of hits: 81, setno 14 + ... + Z> find @attr 1=Body-of-text @attr 5=103 schnittke + ... + Number of hits: 103, setno 15 + ... + - Completeness Attributes (type = 6) + + - This attribute is ONLY used if structure w, p is to be - chosen. completeness is ignorned if not w, p is to be - used.. - Incomplete field(1) is the default and makes Zebra use - register type w. - complete subfield(2) and complete field(3) both triggers - search field type p. + The Completeness Attributes (type = 6) + is used to specify that a given search term or term list is either + part of the terms of a given index/field + (Incomplete subfield (1)), or is + what literally is found in the entire field's index + (Complete field (3)). + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Completeness Attributes (type = 6)
Completeness          Value   Notes
Incomplete subfield   1       default
Complete subfield     2       depreciated
Complete field        3       supported
+ + + The Completeness Attributes (type = 6) + is only partially and conditionally + supported in the sense that it is ignored if the hit index is + not of structure type="w" or + type="p". + + + Incomplete subfield (1) is the default, and + makes Zebra use + register type="w", whereas + Complete field (3) triggers + search and scan in index type="p". + + + The Complete subfield (2) is a reminiscens + from the happy MARC + binary format days. Zebra does not support it, but maps silently + to Complete field (3). + + + The exact mapping between PQF queries and Zebra internal indexes + and index types is explained in + . +
+ +
+ + + + Advanced Zebra PQF Features + + The Zebra internal query engine has been extended to specific needs + not covered by the bib-1 attribute set query + model. These extentions are non-standard + and non-portable: most functional extentions + are modeled over the bib-1 attribute set, + defining type 7-9 attributes. + There are also the speciel + string type index names for the + idxpath attribute set. + @@ -1462,11 +1647,9 @@ IDXPATH Use Attributes (type = 1) This attribute set allows one to search GRS filter indexed - records by XPATH like structured index names. It is enabled by - specifying the + records by XPATH like structured index names. - The idxpath option defines hard-coded index names, which might clash with your own index names. @@ -1571,6 +1754,16 @@ Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart + + Scanning is supportet on all idxpath + indexes, both specified as numeric use attributes, or as string + index names. + + Z> scan @attrset idxpath @attr 1=1016 text + Z> scan @attr 1=_XPATH_ATTR_CDATA anothertext + Z> scan @attrset idxpath @attr 1=3 @attr 4=3 '' + +
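Illustration (not part of the commit): the query equivalences documented in the hunks above can be sketched client-side in a few lines of Python. This is only a sketch under stated assumptions. The helper names hash_to_regexp and expand_term_list are hypothetical and are not part of Zebra or YAZ; they merely reproduce the textual rewrites the documentation describes (each # maps to .*, Word list (6) maps to an AND combination, Free-form-text (105) and Document-text (106) map to an OR combination). How Zebra performs these mappings internally is not shown in this diff.

```python
def hash_to_regexp(term: str) -> str:
    """Rewrite a '#'-truncated term (5=101) as the Regexp-1 term (5=102).

    Hypothetical helper: mirrors the documented rule that each '#'
    is mapped to '.*'.
    """
    return term.replace("#", ".*")


def expand_term_list(use_attr: str, operator: str, terms: str) -> str:
    """Expand a quoted term list into a chain of binary PQF operators.

    Hypothetical helper: 'operator' is '@and' for Word list (4=6) and
    '@or' for Free-form-text (4=105) / Document-text (4=106).
    """
    words = terms.split()
    if not words:
        raise ValueError("empty term list")
    pieces = []
    # Every word except the last is preceded by one binary operator,
    # which yields the right-nested form used in the documentation.
    for word in words[:-1]:
        pieces += [operator, word]
    pieces.append(words[-1])
    return f"@attr 1={use_attr} " + " ".join(pieces)


if __name__ == "__main__":
    print(hash_to_regexp("schnit#ke"))
    print(expand_term_list("Title", "@and", "mozart amadeus"))
    print(expand_term_list("Body-of-text", "@or", "bach salieri teleman"))
```

The strings produced for the three sample inputs are the same as the right-hand sides of the equivalent queries quoted in the documentation text, so they can be pasted after "Z> find" in a client session for comparison.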