From 558bf94a5f36eb89b0ca7ac4780b641da852c36b Mon Sep 17 00:00:00 2001 From: Marc Cromme Date: Tue, 13 Jun 2006 13:45:08 +0000 Subject: [PATCH] added a lot of info about attribute sets, PQF query structure, and string use attributes --- doc/administration.xml | 6 +- doc/architecture.xml | 6 +- doc/introduction.xml | 10 +- doc/querymodel.xml | 307 ++++++++++++++++++++++++++++++++++++++--- doc/recordmodel-alvisxslt.xml | 8 +- doc/server.xml | 14 +- doc/zebrasrv-man.xml | 8 +- doc/zebrasrv-virtual.xml | 17 +-- 8 files changed, 320 insertions(+), 56 deletions(-) diff --git a/doc/administration.xml b/doc/administration.xml index 10babae..7303d30 100644 --- a/doc/administration.xml +++ b/doc/administration.xml @@ -1,5 +1,5 @@ - + Administrating Zebra + Overview of Zebra Architecture @@ -173,10 +173,10 @@ as HTTP server, honoring SRW SOAP requests, and - SRU + SRU REST requests. Moreover, it can translate incoming - CQL + CQL queries to PQF queries, if diff --git a/doc/introduction.xml b/doc/introduction.xml index 81deaab..6c43d25 100644 --- a/doc/introduction.xml +++ b/doc/introduction.xml @@ -1,5 +1,5 @@ - + Introduction @@ -443,14 +443,14 @@ SRW-to-Z39.50 gateway, currently in beta test. --> Experimental support of the - Search/Retrieve Via URL ( SRU) - + Search/Retrieve Via URL ( SRU) + REST webservice, and the Search/Retrieve Web Service ( SRW) SOAP Web Service have recently been added to the YAZ/Zebra - combo - including server side Common Query Language (CQL) - parsing + combo - including server side Common Query Language (CQL) + parsing and configuration. It remains to find a sponsor for further testing, documentation and packaging of this exiting component. 
diff --git a/doc/querymodel.xml b/doc/querymodel.xml index bae113f..20e6b4b 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,5 +1,5 @@ - + Query Model @@ -8,8 +8,8 @@ Zebra is born as a networking Information Retrieval engine adhering to the international standards - Z39.50 and - SRU, + Z39.50 and + SRU, and implement the query model defined there. Unfortunately, the Z39.50 query model has only defined a binary encoded representation, which is used as transport packaging in @@ -29,7 +29,7 @@ In addition, Zebra can be configured to understand and map the Common Query Language - (CQL) + (CQL) to PQF. See an introduction on the mapping to the internal query representation in . @@ -40,22 +40,281 @@ Prefix Query Format structure and syntax The - PQF - grammer is documented in the YAZ manual. + PQF + grammar is documented in the YAZ manual, and is not + repeated here. This textual PQF representation is always during search mapped to the equivalent Zebra internal query parse tree. + + PQF tree structure + The PQF parse tree - or the equivalent textual representation - + may start with one specification of the + attribute set used. It is followed by a query + tree, which + consists of atomic query parts, possibly + paired by binary boolean operators, and + finally recursively combined into + complex query trees. + + Attribute sets + + Attribute sets define the exact meaning and semantics of the queries + issued. Zebra comes with some predefined attribute set + definitions; others can easily be defined and added to the + configuration. + + The Zebra internal query processing is modeled after + the Bib1 attribute set, and the non-use + attributes type 2-9 are hard-wired in. It is therefore essential + to be familiar with . + + + + + + + + + + + + + + + + + + + + + + + +
Attribute sets predefined in Zebra
exp-1 | Explain attribute set | Special attribute set used on the special automagic + IR-Explain-1 database to gain information on + server capabilities, database names, and database + semantics.
bib-1 | Bib1 attribute set | Standard PQF query language attribute set which defines the + semantics of Z39.50 searching. In addition, all of the + non-use attributes (type 2-9) define the Zebra internal query + processing.
gils | GILS attribute set | Extension to the Bib1 attribute set.
+
+ + + Boolean operators + + A pair of subquery trees, or of atomic queries, is combined + using the standard boolean operators into new query trees. + + + + + + + + + + + + + + + + + + + + + + + +
Boolean operators
@and | binary AND operator | Set intersection of the hit sets of two atomic queries
@or | binary OR operator | Set union of the hit sets of two atomic queries
@not | binary AND NOT operator | Set difference of the hit sets of two atomic queries (documents matching the first query but not the second)
@prox | binary PROXIMITY operator | Set intersection of the hit sets of two atomic queries. In + addition, the intersection set is purged of all + documents which do not satisfy the requested query + term proximity. Usually a proper subset of the AND + operation.
+ + + For example, we can combine the terms + information and retrieval + into different searches in the default index of the default + attribute set as follows. + Querying for the union of all documents containing the + terms information OR + retrieval: + + @or information retrieval + + + + Querying for the intersection of all documents containing the + terms information AND + retrieval. + The hit set is a subset of the corresponding + OR query: + + @and information retrieval + + + + Querying for the intersection of all documents containing the + terms information AND + retrieval, taking proximity into account. + The hit set is a subset of the corresponding + AND query: + + @prox information retrieval + + + + Querying for the intersection of all documents containing the + terms information AND + retrieval, in the same order and near each + other, as given in the term list. + The hit set is a subset of the corresponding + PROXIMITY query: + + "information retrieval" + + +
+ + + + Atomic queries + + Atomic queries are the query parts which work on one access point + only. These consist of an attribute list + followed by a single term or a + quoted term list. + + + Unsupplied non-use attributes type 2-9 are either inherited from + higher nodes in the query tree, or are set to Zebra's default values. + See for details. + + + + + + + + + + + + + + + +
Atomic queries
attribute list | List of orthogonal attributes | Any of the orthogonal attribute types may be omitted; + these are then inherited from higher query tree nodes, or if not + inherited, are set to the default Zebra configuration values. +
term | single term + or quoted term list | Here the search term or list of search terms is added + to the query.
+ + Querying for the term information in the + default index using the default attribute set, the server choice + of access point/index, and the default non-use attributes: + + "information" + + + + Equivalent query fully specified: + + @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information" + + + + + Finding all documents which have empty titles. Notice that the + empty term must be quoted, but is otherwise legal: + + @attr 1=4 "" + + + +
+ + + Zebra's special use attribute of type 'string' + + The numeric use (type 1) attribute is usually + referred to from a given + attribute set. In addition, Zebra lets you use + any internal index + name defined in your configuration + as the use attribute value. This is a great feature for + debugging, and when you do + not need the complexity of defined use attribute values. It is + the preferred way of accessing Zebra indexes directly. + + + Finding all documents which have the term list "information + retrieval" in a Zebra index, using its internal full string name: + + @attr 1=sometext "information retrieval" + + + + Searching the bib-1 use attribute 54 using its string name: + + @attr 1=Code-language eng + + + + Searching in any silly string index, provided it is defined in your + indexing rules and can be parsed by the PQF parser. + This is definitely not the recommended use of + this facility, as it might confuse your users with some very + unexpected results. + + @attr 1=silly/xpath/alike[@index]/name "information retrieval" + + + + See for details, and + + for the SRU PQF query extension using string names as a fast + debugging facility. + + + +
+ Explain Attribute Set + + The Z39.50 standard defines the + Explain attribute set + exp-1, which is used to discover information + about a server's search semantics and functional capabilities. + Zebra exposes a "classic" + Explain database by base name IR-Explain-1, which + is populated with system internal information. + - The attribute-set exp-1 is defined for - searching an Explain IR-Explain-1 database. - It consists of a single Use (type 1) attribute. + The attribute-set exp-1 consists of a single + Use (type 1) attribute. In addition, the non-Use @@ -63,7 +322,7 @@ Relation, Position, Structure, Truncation, and Completeness are imported from - the bib-1 attrubute set, and may be used + the bib-1 attribute set, and may be used within any explain query. @@ -90,6 +349,15 @@ Explain searches with yaz-client + + Classic Explain only defines retrieval of Explain information + via ASN.1. Practically no Z39.50 client supports this. Fortunately + they don't have to - Zebra allows retrieval of this information + in other formats: + SUTRS, XML, + GRS-1 and ASN.1 Explain. + + List supported categories to find out which explain commands are supported: @@ -173,10 +441,9 @@ Most of the information contained in this section is an excerpt of the ATTRIBUTE SET BIB-1 (Z39.50-1995) SEMANTICS, found at The BIB-1 + url="&url.z39.50.attset.bib1.1995;">The BIB-1 Attribute Set Semantics from 1995, also in an updated - Bib-1 + Bib-1 Attribute Set version from 2003. Index Data is not the copyright holder of this information. @@ -188,21 +455,21 @@ - Relation Attributes (type = 2) + Relation Attributes (type = 2) - Position Attributes (type = 3) + Position Attributes (type = 3) - Structure Attributes (type = 4) + Structure Attributes (type = 4) - Truncation Attributes (type = 5) + Truncation Attributes (type = 5) @@ -570,7 +837,7 @@ Hosts option, one can configure the YAZ Frontend CQL-to-PQF converter, specifying the interpretation of various - CQL + CQL indexes, relations, etc. 
in terms of Type-1 query attributes.
@@ -639,10 +906,10 @@ http://www.loc.gov/z3950/agency/document.html PQF and BIB-1 stuff to be explained - + http://www.loc.gov/z3950/agency/defns/bib1.html - + http://www.loc.gov/z3950/agency/bib1.html http://www.loc.gov/z3950/agency/markup/13.html diff --git a/doc/recordmodel-alvisxslt.xml b/doc/recordmodel-alvisxslt.xml index be69601..970e922 100644 --- a/doc/recordmodel-alvisxslt.xml +++ b/doc/recordmodel-alvisxslt.xml @@ -1,5 +1,5 @@ - + ALVIS XML Record Model and Filter Module @@ -58,7 +58,7 @@ unique, these are the literal schema or element set names used in SRW, - SRU and + SRU and Z39.50 protocol queries. The paths in the stylesheet attributes are relative to zebras working directory, or absolute to file @@ -218,7 +218,7 @@ the YAZ manual CQL section for the details of the YAZ frontend server - CQL + CQL configuration.
@@ -491,7 +491,7 @@ c) Main "alvis" XSLT filter config file: and so on. - in db/ a cql2pqf.txt yaz-client config file - which is also used in the yaz-server CQL-to-PQF process + which is also used in the yaz-server CQL-to-PQF process see: http://www.indexdata.com/yaz/doc/tools.tkl#tools.cql.map diff --git a/doc/server.xml b/doc/server.xml index 4113196..dd0a9e9 100644 --- a/doc/server.xml +++ b/doc/server.xml @@ -1,5 +1,5 @@ - + The Z39.50 Server @@ -309,17 +309,7 @@ The records in the explain database are of type - grs.sgml and can be retrieved as - SUTRS, XML, - GRS-1 and ASN.1 Explain. - - - Classic Explain only defines retrieaval of Explain information - via ASN.1. Pratically no Z39.50 clients supports this. Fortunately - they don't have to - since Zebra allows retrieval of this information - in the other formats. - - + grs.sgml. The root element for the Explain grs.sgml records is explain, thus explain.abs is used for indexing. diff --git a/doc/zebrasrv-man.xml b/doc/zebrasrv-man.xml index ae8ff7c..07aa192 100644 --- a/doc/zebrasrv-man.xml +++ b/doc/zebrasrv-man.xml @@ -1,6 +1,12 @@ + %local; + + %entities; + + %common; @@ -11,7 +17,7 @@ - + diff --git a/doc/zebrasrv-virtual.xml b/doc/zebrasrv-virtual.xml index 2ed8c92..248ab14 100644 --- a/doc/zebrasrv-virtual.xml +++ b/doc/zebrasrv-virtual.xml @@ -1,5 +1,5 @@ @@ -11,9 +11,9 @@ A backend can be configured to execute in a particular working - directory. Or the YAZ frontend may perform CQL to RPN conversion, thus - allowing traditional Z39.50 backends to be offered as a SRW/ SRU - service. SRW/ SRU Explain information for a particular backend may also + directory. Or the YAZ frontend may perform CQL to RPN conversion, thus + allowing traditional Z39.50 backends to be offered as a SRW/ SRU + service. SRW/ SRU Explain information for a particular backend may also be specified. @@ -128,8 +128,8 @@ element cql2rpn (optional) - Specifies a filename that includes CQL to RPN conversion for this - backend server. 
See CQL section in YAZ manual. + Specifies a filename that includes CQL to RPN conversion for this + backend server. See CQL section in YAZ manual. If given, the backend server will only "see" a Type-1/RPN query. @@ -138,7 +138,7 @@ element explain (optional) - Specifies SRW/ SRU ZeeRex content for this server. Copied verbatim + Specifies SRW/ SRU ZeeRex content for this server. Copied verbatim to the client. As things are now, some of the Explain content seems redundant because host information, etc. is also stored elsewhere. @@ -199,7 +199,8 @@ elements. - For "server2" elements for CQL to RPN conversion + For "server2" elements for +CQL to RPN conversion is supported and explain information has been added (a short one here to keep the example small). -- 1.7.10.4
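
A minimal sketch, not part of the patch itself, of how the PQF strings shown in the doc/querymodel.xml hunk (attribute set prefix, attribute list, quoted term, boolean operators) could be assembled programmatically. It only builds query strings and does not talk to Zebra. The helper names `quote_term`, `atomic`, `combine` and `with_attrset` are hypothetical and are not YAZ or Zebra APIs. As a conservative assumption, terms containing double quotes or backslashes are rejected rather than escaped, and `@prox` is left out on purpose.

```python
from typing import Iterable, Tuple, Union

# An attribute value is numeric (e.g. 4) or, for Zebra's string use
# attribute, an index name (e.g. "Code-language").
AttrValue = Union[int, str]


def quote_term(term: str) -> str:
    """Wrap a term (possibly empty, possibly a term list) in double quotes."""
    if '"' in term or "\\" in term:
        # Assumption: escaping rules are out of scope for this sketch.
        raise ValueError("terms with quotes or backslashes are not handled here")
    return f'"{term}"'


def atomic(term: str, attrs: Iterable[Tuple[int, AttrValue]] = ()) -> str:
    """Build an atomic query: an attribute list followed by a quoted term."""
    parts = [f"@attr {attr_type}={value}" for attr_type, value in attrs]
    parts.append(quote_term(term))
    return " ".join(parts)


def combine(operator: str, left: str, right: str) -> str:
    """Pair two subqueries with a binary boolean operator in prefix form."""
    if operator not in ("and", "or", "not"):
        raise ValueError(f"unsupported operator: {operator}")
    return f"@{operator} {left} {right}"


def with_attrset(attrset: str, query: str) -> str:
    """Prefix a complete query tree with one attribute set specification."""
    return f"@attrset {attrset} {query}"


if __name__ == "__main__":
    # Empty title search, as in the patch's atomic query examples.
    print(atomic("", [(1, 4)]))
    # Union of two terms in the default index.
    print(combine("or", atomic("information"), atomic("retrieval")))
    # String use attribute naming an internal index.
    print(atomic("information retrieval", [(1, "sometext")]))
```

Because every helper returns a plain string, nested trees fall out of ordinary function composition, which mirrors the recursive query tree structure the patch describes.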