X-Git-Url: http://git.indexdata.com/?p=yaz-moved-to-github.git;a=blobdiff_plain;f=doc%2Ftools.xml;h=6c02fc5cc920f498ae323057c7315b9de0ef5060;hp=450b3fc1602b7dd3a949da8e3ddb6c5d6a4113b2;hb=b7dce0eae6656ccb499233e04f5a5bf90178c7cd;hpb=73f8c92214bd7afdb0e465dec053272130b53bb5 diff --git a/doc/tools.xml b/doc/tools.xml index 450b3fc..6c02fc5 100644 --- a/doc/tools.xml +++ b/doc/tools.xml @@ -1,4 +1,4 @@ - + Supporting Tools @@ -16,7 +16,7 @@ Z_RPNQuery structure. Some programmers will prefer to construct the query manually, perhaps using odr_malloc() to simplify memory management. - The &yaz; distribution includes two separate, query-generating tools + The &yaz; distribution includes three separate, query-generating tools that may be of use to you. @@ -131,7 +131,7 @@ top-set ::= [ '@attrset' string ] - query-struct ::= attr-spec | simple | complex | '@term' term-type + query-struct ::= attr-spec | simple | complex | '@term' term-type query attr-spec ::= '@attr' [ string ] string query-struct @@ -173,11 +173,15 @@ The @attr operator is followed by an attribute specification (attr-spec above). The specification consists - of optional an attribute set, an attribute type-value pair and - a sub query. The attribute type-value pair is packed in one string: - an attribute type, a dash, followed by an attribute value. + of an optional attribute set, an attribute type-value pair and + a sub-query. The attribute type-value pair is packed in one string: + an attribute type, an equals sign, and an attribute value, like this: + @attr 1=1003. The type is always an integer but the value may be either an integer or a string (if it doesn't start with a digit character). + A string attribute-value is encoded as a Type-1 ``complex'' + attribute with the list of values containing the single string + specified, and including no semantic indicators. @@ -297,101 +301,111 @@ PQF queries - Queries using simple terms. - - dylan - "bob dylan" - - - Boolean operators. 
- - @or "dylan" "zimmerman" - @and @or dylan zimmerman when - @and when @or dylan zimmerman - - - - Reference to result sets. - - @set Result-1 - @and @set seta setb - - - - Attributes for terms. - - @attr 1=4 computer - @attr 1=4 @attr 4=1 "self portrait" - @attr exp1 @attr 1=1 CategoryList - @attr gils 1=2008 Copenhagen - @attr 1=/book/title computer - - - - Proximity. - - @prox 0 3 1 2 k 2 dylan zimmerman - - - Here the parameters 0, 3, 1, 2, k and 2 represent exclusion, - distance, ordered, relation, which-code and unit-code, in that - order. So: - - - exclusion = 0: the proximity condition must hold - - - distance = 3: the terms must be three units apart - - - ordered = 1: they must occur in the order they are specified - - - relation = 2: lessThanOrEqual (to the distance of 3 units) - - - which-code is ``known'', so the standard unit-codes are used - - - unit-code = 2: word. - - - So the whole proximity query means that the words - dylan and zimmerman must - both occur in the record, in that order, differing in position - by three or fewer words (i.e. with two or fewer words between - them.) The query would find ``Bob Dylan, aka. Robert - Zimmerman'', but not ``Bob Dylan, born as Robert Zimmerman'' - since the distance in this case is four. - - - - Specifying term type. - - @term string "a UTF-8 string, maybe?" 
- - - Mixed queries - - @or @and bob dylan @set Result-1 - - @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming" - - @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109 + PQF queries using simple terms + + + dylan + "bob dylan" + + + + PQF boolean operators + + + @or "dylan" "zimmerman" + @and @or dylan zimmerman when + @and when @or dylan zimmerman + + + + PQF references to result sets + + + @set Result-1 + @and @set seta setb + + + + Attributes for terms + + + @attr 1=4 computer + @attr 1=4 @attr 4=1 "self portrait" + @attrset exp1 @attr 1=1 CategoryList + @attr gils 1=2008 Copenhagen + @attr 1=/book/title computer + + + + PQF Proximity queries + + + @prox 0 3 1 2 k 2 dylan zimmerman + + + Here the parameters 0, 3, 1, 2, k and 2 represent exclusion, + distance, ordered, relation, which-code and unit-code, in that + order. So: + + + exclusion = 0: the proximity condition must hold + + + distance = 3: the terms must be three units apart + + + ordered = 1: they must occur in the order they are specified + + + relation = 2: lessThanOrEqual (to the distance of 3 units) + + + which-code is ``known'', so the standard unit-codes are used + + + unit-code = 2: word. + + + So the whole proximity query means that the words + dylan and zimmerman must + both occur in the record, in that order, differing in position + by three or fewer words (i.e. with two or fewer words between + them.) The query would find ``Bob Dylan, aka. Robert + Zimmerman'', but not ``Bob Dylan, born as Robert Zimmerman'' + since the distance in this case is four. + + + + PQF specification of search term + + + @term string "a UTF-8 string, maybe?" 
+ + + + PQF mixed queries + + + @or @and bob dylan @set Result-1 + + @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming" + + @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109 - + - The last of these examples is a spatial search: in - the GILS attribute set, - access point - 2038 indicates West Bounding Coordinate and - 2030 indicates East Bounding Coordinate, - so the query is for areas extending from -114 degrees - to no more than -109 degrees. + access point + 2038 indicates West Bounding Coordinate and + 2039 indicates East Bounding Coordinate, + so the query is for areas extending from -114 degrees + to no more than -109 degrees. - - + + + CCL @@ -407,8 +421,7 @@ - The EUROPAGATE - research project working under the Libraries programme + The EUROPAGATE research project working under the Libraries programme of the European Commission's DG XIII has, amongst other useful tools, implemented a general-purpose CCL parser which produces an output structure that can be trivially converted to the internal RPN @@ -548,82 +561,153 @@ or c for completeness. The attributes for the special qualifier name term are used when no CCL qualifier is given in a query. + Common Bib-1 attributes + + + + + + Type + Description + + + + + u=value + + Use attribute. Common use attributes are + 1 Personal-name, 4 Title, 7 ISBN, 8 ISSN, 30 Date, + 62 Subject, 1003 Author, 1016 Any. Specify value + as an integer. + + + + + r=value + + Relation attribute. Common values are + 1 <, 2 <=, 3 =, 4 >=, 5 >, 6 <>, + 100 phonetic, 101 stem, 102 relevance, 103 always matches. + + + + + p=value + + Position attribute. Values: 1 first in field, 2 + first in any subfield, 3 any position in field. + + + + + s=value + + Structure attribute. Values: 1 phrase, 2 word, + 3 key, 4 year, 5 date, 6 word list, 100 date (un), + 101 name (norm), 102 name (un), 103 structure, 104 urx, + 105 free-form-text, 106 document-text, 107 local-number, + 108 string, 109 numeric string. 
+ + + + + t=value + + Truncation attribute. Values: 1 right, 2 left, + 3 left& right, 100 none, 101 process #, 102 regular-1, + 103 regular-2, 104 CCL. + + + + + c=value + + Completeness attribute. Values: 1 incomplete subfield, + 2 complete subfield, 3 complete field. + + + + + +
- The attribute value val may be - specified as in integer. It is also possible to specify - non-numeric values, however, which are used in combination with - certain types. The special combinations are: - - s=pw - - The structure is set to either word or phrase depending - on the number of tokens in a term (phrase-word). - - - - - s=al - - Each token in the term is ANDed. (and-list). - This does not set the structure at all. - - - - - s=ol - - Each token in the term is ORed. (or-list). - This does not set the structure at all. - - - - - r=o - - Allows operators greather-than, less-than, ... equals and - sets relation attribute accordingly (relation ordered). - - - - - t=l - - Allows term to be left-truncated. - If term is of the form ?x, the resulting - Type-1 term is x and truncation is left. - - - - - t=r - - Allows term to be right-truncated. - If term is of the form x?, the resulting - Type-1 term is x and truncation is right. - - - - - t=n - - If term is does not include ?, the - truncation attribute is set to none (100). - - - - - t=b - - Allows term to be both left&right truncated. - If term is of the form ?x?, the - resulting term is x and trunctation is - set to both left&right. - - - - - + The complete list of Bib-1 attributes can be found + + here + . + + + It is also possible to specify non-numeric attribute values, + which are used in combination with certain types. + The special combinations are: + + Special attribute combos + + + + + + Name + Description + + + + + s=pw + The structure is set to either word or phrase depending + on the number of tokens in a term (phrase-word). + + + + s=al + Each token in the term is ANDed. (and-list). + This does not set the structure at all. + + + + s=ol + Each token in the term is ORed. (or-list). + This does not set the structure at all. + + + + r=o + Allows operators greater-than, less-than, ... equals and + sets relation attribute accordingly (relation ordered). + + + + t=l + Allows term to be left-truncated. 
+ If term is of the form ?x, the resulting + Type-1 term is x and truncation is left. + + + + t=r + Allows term to be right-truncated. + If term is of the form x?, the resulting + Type-1 term is x and truncation is right. + + + + t=n + If term does not include ?, the + truncation attribute is set to none (100). + + + + t=b + Allows term to be both left&right truncated. + If term is of the form ?x?, the + resulting term is x and truncation is + set to both left&right. + + + + +
CCL profile @@ -635,26 +719,43 @@ au u=1 s=1 term s=105 ranked r=102 + date u=30 r=o - Three qualifiers are defined, ti, - au and ranked. + Four qualifiers are defined - ti, + au, ranked and + date. + + ti and au both set structure attribute to phrase (s=1). ti sets the use-attribute to 4. au sets the use-attribute to 1. When no qualifiers are used in the query the structure-attribute is - set to free-form-text (105). - + set to free-form-text (105) (rule for term). + The date sets the relation attribute to + the relation used in the CCL query and sets the use attribute + to 30 (Bib-1 Date). + You can combine attributes. To Search for "ranked title" you can do ti,ranked=knuth computer - which will use "relation is ranked", "use is title", "structure is - phrase". + which will set relation=ranked, use=title, structure=phrase. + + + Query + + year > 1980 + + is a valid query, while + + ti > 1980 + + is invalid. @@ -690,9 +791,9 @@ CCL directives - - - + + + Name @@ -1107,9 +1208,9 @@ int cql_transform_error(cql_transform_t ct, char **addinfop); error-code and sets the string-pointer at *addinfop to point to a string containing additional information about the error that occurred: for - example, if the error code is 15 (``Illegal or unsupported index + example, if the error code is 15 (``Illegal or unsupported context set''), the additional information is the name of the requested - index set that was not recognised. + context set that was not recognised. The SRW error-codes may be translated into brief human-readable @@ -1165,26 +1266,37 @@ int cql_transform_FILE(cql_transform_t ct, The following CQL patterns are recognized: - qualifier.set.name + index.set.name - This pattern is invoked when a CQL qualifier, such as + This pattern is invoked when a CQL index, such as dc.title is converted. set - and name is the index set and qualifier + and name are the context set and index name respectively. Typically, the RPN specifies an equivalent use attribute. 
- For terms not bound by a qualifier the pattern - qualifier.srw.serverChoice is used. - Here, the prefix srw is defined as - http://www.loc.gov/zing/cql/srw-indexes/v1.0/. + For terms not bound by an index the pattern + index.cql.serverChoice is used. + Here, the prefix cql is defined as + http://www.loc.gov/zing/cql/cql-indexes/v1.0/. If this pattern is not defined, the mapping will fail. + qualifier.set.name + (DEPRECATED) + + + + For backwards compatibility, this is recognised as a synonym of + index.set.name + + + + relation.relation @@ -1266,10 +1378,10 @@ int cql_transform_FILE(cql_transform_t ct, - This specification defines a CQL index set for a given prefix. + This specification defines a CQL context set for a given prefix. The value on the right hand side is the URI for the set - not RPN. All prefixes used in - qualifier patterns must be defined this way. + index patterns must be defined this way. @@ -1277,16 +1389,16 @@ int cql_transform_FILE(cql_transform_t ct, CQL to RPN mapping file - This simple file defines two index sets, three qualifiers and three + This simple file defines two context sets, three indexes and three relations, a position pattern and a default structure. @attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer" - by rules qualifier.srw.serverChoice, + by rules index.cql.serverChoice, relation.scr, structure.*, position.any. @@ -1415,15 +1527,13 @@ typedef struct oident PROTO_Z3950 - PROTO_SR + PROTO_GENERAL - If you don't care about talking to SR-based implementations (few - exist, and they may become fewer still if and when the ISO SR and ANSI - Z39.50 documents are merged into a single standard), you can ignore - this field on incoming packages, and always set it to PROTO_Z3950 - for outgoing packages. + Use PROTO_Z3950 for Z39.50 Object Identifiers, + PROTO_GENERAL for other types (such as + those associated with ILL). 
@@ -1510,6 +1620,10 @@ typedef struct oident again, corresponding to the specific OIDs defined by the standard. + Refer to the + + Registry of Z39.50 Object Identifiers for the + whole list. @@ -1574,6 +1688,49 @@ typedef struct oident + Three utility functions are provided for translating OIDs' + symbolic names (e.g. Usmarc) into OID structures + (int arrays) and strings containing the OID in dotted notation + (e.g. 1.2.840.10003.9.5.1). They are: + + + + int *oid_name_to_oid(oid_class oclass, const char *name, int *oid); + char *oid_to_dotstring(const int *oid, char *oidbuf); + char *oid_name_to_dotstring(oid_class oclass, const char *name, char *oidbuf); + + + + oid_name_to_oid() + translates the specified symbolic name, + interpreted as being of class oclass. (The + class must be specified as many symbolic names exist within + multiple classes - for example, Zthes is the + symbolic name of an attribute set, a schema and a tag-set.) The + sequence of integers representing the OID is written into the + area oid provided by the caller; it is the + caller's responsibility to ensure that this area is large enough + to contain the translated OID. As a convenience, the address of + the buffer (i.e. the value of oid) is + returned. + + + oid_to_dotstring() + translates the int-array oid into a dotted + string which is written into the area oidbuf + supplied by the caller; it is the caller's responsibility to + ensure that this area is large enough. The address of the buffer + is returned. + + + oid_name_to_dotstring() + combines the previous two functions to derive a dotted string + representing the OID specified by oclass and + name, writing it into the buffer passed as + oidbuf and returning its address. + + + Finally, the module provides the following utility functions, whose meaning should be obvious: @@ -1611,7 +1768,7 @@ typedef struct oident release the associated memory again. 
For the structures describing the Z39.50 PDUs and related structures, it is convenient to use the memory-management system of the &odr; subsystem (see - Using ODR). However, in some circumstances + ). However, in some circumstances where you might otherwise benefit from using a simple nibble memory management system, it may be impractical to use odr_malloc() and odr_reset().