diff --git a/doc/tools.xml b/doc/tools.xml index 4d19542..56fe958 100644 --- a/doc/tools.xml +++ b/doc/tools.xml @@ -1,4 +1,4 @@ - + Supporting Tools @@ -20,7 +20,7 @@ that may be of use to you. - Prefix Query Format + Prefix Query Format Since RPN or reverse polish notation is really just a fancy way of @@ -32,19 +32,73 @@ in simple test applications and scripting environments (like Tcl). The demonstration client included with YAZ uses the PQF. + + + + The PQF has been adopted by other parties developing Z39.50 + software. It is often referred to as Prefix Query Notation + - PQN. + + + + The PQF is defined by the pquery module in the YAZ library. + There are two sets of functions that have similar behavior. The first + set operates on a PQF parser handle; the second set doesn't. The first + set of functions is more flexible than the second set. The second set + is obsolete and is only provided to ensure backwards compatibility. + + + The first set of functions all operate on a PQF parser handle: + + + #include <yaz/pquery.h> + + YAZ_PQF_Parser yaz_pqf_create (void); + + void yaz_pqf_destroy (YAZ_PQF_Parser p); + + Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf); + + Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o, + Odr_oid **attributeSetId, const char *qbuf); + + + int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off); + - The PQF is defined by the pquery module in the YAZ library. The - pquery.h file provides the declaration of the - functions + A PQF parser is created and destroyed by the functions + yaz_pqf_create and + yaz_pqf_destroy respectively. + The function yaz_pqf_parse parses the query given + by the string qbuf.
If parsing was successful, + a Z39.50 RPN Query is returned, which is created using the ODR stream + o. If parsing failed, a NULL pointer is + returned. + The function yaz_pqf_scan takes a scan query in + qbuf. If parsing was successful, the function + returns an attributes-plus-term pointer and modifies + attributeSetId to hold the attribute set for the + scan request - both allocated using the ODR stream o. + If parsing failed, yaz_pqf_scan returns a NULL pointer. + Error information for bad queries can be obtained by a call to + yaz_pqf_error, which returns an error code, + modifies *msg to point to an error description, + and modifies *off to the offset within the last + query where parsing failed. - -Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf); + + The second set of functions is declared as follows: + + + #include <yaz/pquery.h> + + Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf); -Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto, - Odr_oid **attributeSetP, const char *qbuf); + Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto, + Odr_oid **attributeSetP, const char *qbuf); -int p_query_attset (const char *arg); - + int p_query_attset (const char *arg); + The function p_query_rpn() takes as arguments an &odr; stream (see section The ODR Module) @@ -57,10 +111,10 @@ int p_query_attset (const char *arg); If the parse went well, p_query_rpn() returns a pointer to a Z_RPNQuery structure which can be - placed directly into a Z_SearchRequest. + placed directly into a Z_SearchRequest. + If parsing failed due to a syntax error, a NULL pointer is returned. - The p_query_attset specifies which attribute set to use if the query doesn't specify one by the @attrset operator.
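The error offset can be used to point the user at the part of the query that failed to parse. The sketch below shows one way to do this. It builds a two-line diagnostic: the query on the first line, then a caret under the failing position followed by the message. The helper name format_pqf_error is hypothetical and is not part of YAZ. It assumes that the value stored in *off is a byte offset into qbuf, and it takes the message as a plain string, such as the one yaz_pqf_error returns via *msg.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of YAZ): formats a PQF parse error as
 *
 *   <query>
 *   <spaces>^ <message>
 *
 * `off` is assumed to be a byte offset into `query`, e.g. the value that
 * yaz_pqf_error stores in *off. Returns the length of the text written
 * (excluding the terminating NUL), or -1 if it does not fit in `out`. */
int format_pqf_error(const char *query, const char *msg, size_t off,
                     char *out, size_t max)
{
    size_t len = strlen(query);
    size_t pos;
    int n;

    assert(out != NULL && max > 0);
    if (off > len)                  /* clamp an out-of-range offset */
        off = len;

    /* First line: the query text itself. */
    n = snprintf(out, max, "%s\n", query);
    if (n < 0 || (size_t) n >= max)
        return -1;
    pos = (size_t) n;

    /* Second line: pad with spaces up to the offset, then the caret. */
    if (pos + off >= max)
        return -1;
    memset(out + pos, ' ', off);
    pos += off;
    n = snprintf(out + pos, max - pos, "^ %s", msg);
    if (n < 0 || (size_t) n >= max - pos)
        return -1;
    return (int) (pos + (size_t) n);
}
```

A caller would typically use a helper like this after yaz_pqf_parse has returned NULL and yaz_pqf_error has filled in the message and offset. Writing into a caller-supplied buffer keeps the helper independent of any ODR stream.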
@@ -77,7 +131,7 @@ int p_query_attset (const char *arg); top-set ::= [ '@attrset' string ] - query-struct ::= attr-spec | simple | complex + query-struct ::= attr-spec | simple | complex | '@term' term-type attr-spec ::= '@attr' [ string ] string query-struct @@ -89,7 +143,7 @@ int p_query_attset (const char *arg); result-set ::= '@set' string. - term ::= string + term ::= string. proximity ::= exclusion distance ordered relation which-code unit-code. @@ -104,6 +158,8 @@ int p_query_attset (const char *arg); which-code ::= 'known' | 'private' | integer. unit-code ::= integer. + + term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'. @@ -115,31 +171,230 @@ int p_query_attset (const char *arg); - The following are all examples of valid queries in the PQF. + The @attr operator is followed by an attribute specification + (attr-spec above). The specification consists + of an optional attribute set, an attribute type-value pair and + a sub-query. The attribute type-value pair is packed in one string: + an attribute type, an equals sign (=), followed by an attribute value. + The type is always an integer, but the value may be either an + integer or a string (if it doesn't start with a digit character). - - dylan - - "bob dylan" - - @or "dylan" "zimmerman" - - @set Result-1 - - @or @and bob dylan @set Result-1 - - @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming" - - @attr 4=1 @attr 1=4 "self portrait" - - @prox 0 3 1 2 k 2 dylan zimmerman + + Version 3 of the Z39.50 specification defines various encodings of terms. + Use @term type + string, + where type is one of: general, + numeric or string + (for InternationalString). + If no term type has been given, the general form + is used. This is the only encoding allowed in both versions 2 and 3 + of the Z39.50 standard.
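The rule for the attribute type-value pair is simple enough to express directly in C. The sketch below splits strings such as 1=4 or 1=/book/title into an integer type and a value. The value is treated as an integer only when it starts with a digit; otherwise it is kept as a string. The struct attr_pair and the function split_attr_pair are hypothetical illustrations, not the internals of the pquery module. The sketch also rejects a numeric value with trailing characters (such as 1=4x), which is an assumption rather than documented YAZ behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Illustrative representation of one attribute type-value pair;
 * the names are hypothetical, not taken from YAZ. */
struct attr_pair {
    int type;               /* attribute type: always an integer */
    int is_numeric;         /* 1 if the value is an integer, else 0 */
    long num_value;         /* valid when is_numeric == 1 */
    const char *str_value;  /* points into the input when is_numeric == 0 */
};

/* Splits "type=value" (e.g. "1=4" or "1=/book/title").
 * Returns 0 on success and -1 if the string is malformed. */
int split_attr_pair(const char *spec, struct attr_pair *out)
{
    char *end;
    const char *value;

    assert(spec != NULL && out != NULL);
    if (!isdigit((unsigned char) spec[0]))
        return -1;                      /* the type must be an integer */
    out->type = (int) strtol(spec, &end, 10);
    if (*end != '=' || end[1] == '\0')
        return -1;                      /* expect '=' and a non-empty value */
    value = end + 1;
    if (isdigit((unsigned char) value[0])) {
        /* Value starts with a digit: treat it as an integer. */
        out->is_numeric = 1;
        out->num_value = strtol(value, &end, 10);
        out->str_value = NULL;
        if (*end != '\0')
            return -1;                  /* this sketch rejects e.g. "1=4x" */
    } else {
        /* Anything else is a string value, e.g. "/book/title". */
        out->is_numeric = 0;
        out->num_value = 0;
        out->str_value = value;
    }
    return 0;
}
```

With this split, @attr 1=4 yields type 1 with the integer value 4. @attr 1=/book/title yields type 1 with the string value /book/title.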
+ + + + Using Proximity Operators with PQF + + + This is an advanced topic, describing how to construct + queries that make very specific requirements on the + relative location of their operands. + You may wish to skip this section and go straight to + the example PQF queries. + + + + + Most Z39.50 servers do not support proximity searching, or + support only a small subset of the full functionality that + can be expressed using the PQF proximity operator. Be + aware that the ability to express a + query in PQF is no guarantee that any given server will + be able to execute it. + + + + + + The proximity operator @prox is a special + and more restrictive version of the conjunction operator + @and. Its semantics are described in + section 3.7.2 (Proximity) of the Z39.50 standard itself, which + can be read on-line at + + + + In PQF, the proximity operation is represented by a sequence + of the form + +@prox exclusion distance ordered relation which-code unit-code + + in which the meanings of the parameters are as described in + the standard, and they can take the following values: + + exclusion + 0 = false (i.e. the proximity condition specified by the + remaining parameters must be satisfied) or + 1 = true (the proximity condition specified by the + remaining parameters must not be + satisfied). + + distance + An integer specifying the difference between the locations + of the operands: e.g. two adjacent words would have + distance=1 since their locations differ by one unit. + + ordered + 1 = ordered (the operands must occur in the order the + query specifies them) or + 0 = unordered (they may appear in either order). + + relation + Recognised values are + 1 (lessThan), + 2 (lessThanOrEqual), + 3 (equal), + 4 (greaterThanOrEqual), + 5 (greaterThan) and + 6 (notEqual).
+ + which-code + known + or + k + (the unit-code parameter is taken from the well-known list + of alternatives described below) or + private + or + p + (the unit-code parameter has semantics specific to an + out-of-band agreement such as a profile). + + unit-code + If the which-code parameter is known + then the recognised values are + 1 (character), + 2 (word), + 3 (sentence), + 4 (paragraph), + 5 (section), + 6 (chapter), + 7 (document), + 8 (element), + 9 (subelement), + 10 (elementType) and + 11 (byte). + If which-code is private then the + acceptable values are determined by the profile. + + + (The numeric values of the relation and well-known unit-code + parameters are taken straight from + the ASN.1 of the proximity structure in the standard.)
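The sketch below makes the six positional parameters concrete. It decodes the tokens that follow @prox into a structure and checks them against the values listed above. The names prox_params and parse_prox are hypothetical. The sketch accepts only the textual which-code forms known/k and private/p; the grammar also allows an integer which-code, which is not handled here. It does not restrict the distance value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative container for the six @prox parameters; the names are
 * hypothetical and simply mirror the PQF grammar. */
struct prox_params {
    int exclusion;   /* 0 = false, 1 = true */
    long distance;   /* difference between operand locations */
    int ordered;     /* 1 = ordered, 0 = unordered */
    int relation;    /* 1 lessThan ... 6 notEqual */
    int known;       /* 1 for known/k, 0 for private/p */
    long unit_code;  /* 1..11 when known; profile-defined when private */
};

/* Parses a whole decimal integer; returns 0 on success. */
static int parse_long(const char *s, long *v)
{
    char *end;

    if (*s == '\0')
        return -1;
    *v = strtol(s, &end, 10);
    return *end == '\0' ? 0 : -1;
}

/* Decodes the six tokens following @prox, e.g. "0 3 1 2 k 2".
 * Returns 0 on success and -1 if any token is outside the listed values. */
int parse_prox(const char *tok[6], struct prox_params *p)
{
    long v;

    assert(tok != NULL && p != NULL);
    if (parse_long(tok[0], &v) != 0 || (v != 0 && v != 1))
        return -1;
    p->exclusion = (int) v;
    if (parse_long(tok[1], &v) != 0)
        return -1;
    p->distance = v;
    if (parse_long(tok[2], &v) != 0 || (v != 0 && v != 1))
        return -1;
    p->ordered = (int) v;
    if (parse_long(tok[3], &v) != 0 || v < 1 || v > 6)
        return -1;
    p->relation = (int) v;
    if (strcmp(tok[4], "known") == 0 || strcmp(tok[4], "k") == 0)
        p->known = 1;
    else if (strcmp(tok[4], "private") == 0 || strcmp(tok[4], "p") == 0)
        p->known = 0;
    else
        return -1;       /* integer which-codes are not handled here */
    if (parse_long(tok[5], &v) != 0)
        return -1;
    if (p->known && (v < 1 || v > 11))
        return -1;       /* outside the well-known unit list */
    p->unit_code = v;
    return 0;
}
```

For the tokens 0 3 1 2 k 2 the sketch yields the following: exclusion false, distance 3, ordered, relation lessThanOrEqual, a well-known unit, and unit-code 2 (word).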
+ + + So the whole proximity query means that the words + dylan and zimmerman must + both occur in the record, in that order, differing in position + by three or fewer words (i.e. with two or fewer words between + them.) The query would find ``Bob Dylan, aka. Robert + Zimmerman'', but not ``Bob Dylan, born as Robert Zimmerman'' + since the distance in this case is four. + + + + Specifying term type. + + @term string "a UTF-8 string, maybe?" + + + Mixed queries + + @or @and bob dylan @set Result-1 + + @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming" + + @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109 + + + + The last of these examples is a spatial search: in + the GILS attribute set, + access point + 2038 indicates West Bounding Coordinate and + 2030 indicates East Bounding Coordinate, + so the query is for areas extending from -114 degrees + to no more than -109 degrees. + + + + - Common Command Language + Common Command Language Not all users enjoy typing in prefix query structures and numerical @@ -211,40 +466,43 @@ int p_query_attset (const char *arg); -- Proximity operator - - - The following queries are all valid: - - - - dylan - - "bob dylan" - - dylan or zimmerman - - set=1 - - (dylan and bob) or set=1 - - - - Assuming that the qualifiers ti, au - and date are defined we may use: - - - - ti=self portrait - - au=(bob dylan and slow train coming) - - date>1980 and (ti=((self portrait))) - - - + + CCL queries + + The following queries are all valid: + + + + dylan + + "bob dylan" + + dylan or zimmerman + + set=1 + + (dylan and bob) or set=1 + + + + Assuming that the qualifiers ti, + au + and date are defined we may use: + + + + ti=self portrait + + au=(bob dylan and slow train coming) + + date>1980 and (ti=((self portrait))) + + + + CCL Qualifiers - + Qualifiers are used to direct the search to a particular searchable index, such as title (ti) and author indexes (au). 
The CCL standard @@ -258,66 +516,67 @@ int p_query_attset (const char *arg); - Consider a scenario where the target support ranked searches in the - title-index. In this case, the user could specify - - - - ti,ranked=knuth computer - - - and the ranked would map to relation=relevance - (2=102) and the ti would map to title (1=4). - - - - A "profile" with a set predefined CCL qualifiers can be read from a - file. The YAZ client reads its CCL qualifiers from a file named + A CCL profile is a set of predefined CCL qualifiers that may be + read from a file. + The YAZ client reads its CCL qualifiers from a file named default.bib. Each line in the file has the form: qualifier-name - type=val - type=val ... + [attributeset,]type=val + [attributeset,]type=val ... where qualifier-name is the name of the qualifier to be used (eg. ti), - type is a BIB-1 category type and - val is the corresponding BIB-1 attribute - value. - The type can be either numeric or it may be - either u (use), r (relation), - p (position), s (structure), - t (truncation) or c (completeness). - The qualifier-name term - has a special meaning. - The types and values for this definition is used when - no qualifiers are present. - - - - Consider the following definition: - - - - ti u=4 s=1 - au u=1 s=1 - term s=105 - - - Two qualifiers are defined, ti and - au. - They both set the structure-attribute to phrase (1). - ti - sets the use-attribute to 4. au sets the - use-attribute to 1. - When no qualifiers are used in the query the structure-attribute is - set to free-form-text (105). + type is attribute type in the attribute + set (Bib-1 is used if no attribute set is given) and + val is attribute value. + The type can be specified as an + integer or as it be specified either as a single-letter: + u for use, + r for relation,p for position, + s for structure,t for truncation + or c for completeness. + The attributes for the special qualifier name term + are used when no CCL qualifier is given in a query. 
+ CCL profile + + Consider the following definition: + + + + ti u=4 s=1 + au u=1 s=1 + term s=105 + ranked r=102 + + + Three qualifiers are defined, ti, + au and ranked. + ti and au both set + structure attribute to phrase (s=1). + ti + sets the use-attribute to 4. au sets the + use-attribute to 1. + When no qualifiers are used in the query the structure-attribute is + set to free-form-text (105). + + + You can combine attributes. To Search for "ranked title" you + can do + + ti,ranked=knuth computer + + which will use "relation is ranked", "use is title", "structure is + phrase". + + + CCL API @@ -382,6 +641,540 @@ struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str, + CQL + + CQL + - Common Query Language - was defined for the + SRW + protocol. + In many ways CQL has a similar syntax to CCL. + The objective of CQL is different. Where CCL aims to be + an end-user language, CQL is the protocol + query language for SRW. + + + + If you are new to CQL, read the + Gentle + Introduction. + + + + The CQL parser in &yaz; provides the following: + + + + It parses and validates a CQL query. + + + + + It generates a C structure that allows you to convert + a CQL query to some other query language, such as SQL. + + + + + The parser converts a valid CQL query to PQF, thus providing a + way to use CQL for both SRW/SRU servers and Z39.50 targets at the + same time. + + + + + The parser converts CQL to + + XCQL. + XCQL is an XML representation of CQL. + XCQL is part of the SRW specification. However, since SRU + supports CQL only, we don't expect XCQL to be widely used. + Furthermore, CQL has the advantage over XCQL that it is + easy to read. + + + + + CQL parsing + + A CQL parser is represented by the CQL_parser + handle. Its contents should be considered &yaz; internal (private). 
+ +#include <yaz/cql.h> + +typedef struct cql_parser *CQL_parser; + +CQL_parser cql_parser_create(void); +void cql_parser_destroy(CQL_parser cp); + + A parser is created by cql_parser_create and + is destroyed by cql_parser_destroy. + + + To parse a CQL query string, the following function + is provided: + +int cql_parser_string(CQL_parser cp, const char *str); + + A CQL query is parsed by the cql_parser_string + which takes a query str. + If the query was valid (no syntax errors), then zero is returned; + otherwise a non-zero error code is returned. + + + +int cql_parser_stream(CQL_parser cp, + int (*getbyte)(void *client_data), + void (*ungetbyte)(int b, void *client_data), + void *client_data); + +int cql_parser_stdio(CQL_parser cp, FILE *f); + + The functions cql_parser_stream and + cql_parser_stdio parses a CQL query + - just like cql_parser_string. + The only difference is that the CQL query can be + fed to the parser in different ways. + The cql_parser_stream uses a generic + byte stream as input. The cql_parser_stdio + uses a FILE handle which is opened for reading. + + + + CQL tree + + The the query string is validl, the CQL parser + generates a tree representing the structure of the + CQL query. + + + +struct cql_node *cql_parser_result(CQL_parser cp); + + cql_parser_result returns the + a pointer to the root node of the resulting tree. + + + Each node in a CQL tree is represented by a + struct cql_node. 
+ It is defined as follows: + +#define CQL_NODE_ST 1 +#define CQL_NODE_BOOL 2 +#define CQL_NODE_MOD 3 +struct cql_node { + int which; + union { + struct { + char *index; + char *term; + char *relation; + struct cql_node *modifiers; + struct cql_node *prefixes; + } st; + struct { + char *value; + struct cql_node *left; + struct cql_node *right; + struct cql_node *modifiers; + struct cql_node *prefixes; + } boolean; + struct { + char *name; + char *value; + struct cql_node *next; + } mod; + } u; +}; + + There are three kinds of nodes, search term (ST), boolean (BOOL), + and modifier (MOD). + + + The search term node has five members: + + + + index: index for search term. + If an index is unspecified for a search term, + index will be NULL. + + + + + term: the search term itself. + + + + + relation: relation for search term. + + + + + modifiers: relation modifiers for search + term. The modifiers is a simple linked + list (NULL for last entry). Each relation modifier node + is of type MOD. + + + + + prefixes: index prefixes for search + term. The prefixes is a simple linked + list (NULL for last entry). Each prefix node + is of type MOD. + + + + + + + The boolean node represents both and, + or, not as well as + proximity. + + + + left and right: left + - and right operand respectively. + + + + + modifiers: proximity arguments. + + + + + prefixes: index prefixes. + The prefixes is a simple linked + list (NULL for last entry). Each prefix node + is of type MOD. + + + + + + + The modifier node is a "utility" node used for name-value pairs, + such as prefixes, proximity arguements, etc. + + + + name name of mod node. + + + + + value value of mod node. + + + + + next: pointer to next node which is + always a mod node (NULL for last entry). + + + + + + + CQL to PQF conversion + + Conversion to PQF (and Z39.50 RPN) is tricky by the fact + that the resulting RPN depends on the Z39.50 target + capabilities (combinations of supported attributes). 
+ In addition, the CQL and SRW operates on index prefixes + (URI or strings), whereas the RPN uses Object Identifiers + for attribute sets. + + + The CQL library of &yaz; defines a cql_transform_t + type. It represents a particular mapping between CQL and RPN. + This handle is created and destroyed by the functions: + +cql_transform_t cql_transform_open_FILE (FILE *f); +cql_transform_t cql_transform_open_fname(const char *fname); +void cql_transform_close(cql_transform_t ct); + + The first two functions create a tranformation handle from + either an already open FILE or from a filename respectively. + + + The handle is destroyed by cql_transform_close + in which case no further reference of the handle is allowed. + + + When a cql_transform_t handle has been created + you can convert to RPN. + +int cql_transform_buf(cql_transform_t ct, + struct cql_node *cn, char *out, int max); + + This function converts the CQL tree cn + using handle ct. + For the resulting PQF, you supply a buffer out + which must be able to hold at at least max + characters. + + + If conversion failed, cql_transform_buf + returns a non-zero SRW error code; otherwise zero is returned + (conversion successful). The meanings of the numeric error + codes are listed in the SRW specifications at + + + + If conversion fails, more information can be obtained by calling + +int cql_transform_error(cql_transform_t ct, char **addinfop); + + This function returns the most recently returned numeric + error-code and sets the string-pointer at + *addinfop to point to a string containing + additional information about the error that occurred: for + example, if the error code is 15 (``Illegal or unsupported index + set''), the additional information is the name of the requested + index set that was not recognised. + + + If you wish to be able to produce a PQF result in a different + way, there are two alternatives. 
+ +void cql_transform_pr(cql_transform_t ct, + struct cql_node *cn, + void (*pr)(const char *buf, void *client_data), + void *client_data); + +int cql_transform_FILE(cql_transform_t ct, + struct cql_node *cn, FILE *f); + + The former function produces output to a user-defined + output stream. The latter writes the result to an already + open FILE. + + + + Specification of CQL to RPN mapping + + The file supplied to functions + cql_transform_open_FILE, + cql_transform_open_fname follows + a structure found in many Unix utilities. + It consists of mapping specifications - one per line. + Lines starting with # are ignored (comments). + + + Each line is of the form + + CQL pattern = RPN equivalent + + + + An RPN pattern is a simple attribute list. Each attribute pair + takes the form: + + [set] type=value + + The attribute set is optional. + The type is the attribute type, + value the attribute value. + + + The following CQL patterns are recognized: + + + qualifier.set.name + + + + This pattern is invoked when a CQL qualifier, such as + dc.title is converted. set + and name is the index set and qualifier + name respectively. + Typically, the RPN specifies an equivalent use attribute. + + + For terms not bound by a qualifier the pattern + qualifier.srw.serverChoice is used. + Here, the prefix srw is defined as + http://www.loc.gov/zing/cql/srw-indexes/v1.0/. + If this pattern is not defined, the mapping will fail. + + + + + relation.relation + + + + This pattern specifies how a CQL relation is mapped to RPN. + pattern is name of relation + operator. Since = is used as + separator between CQL pattern and RPN, CQL relations + including = cannot be + used directly. To avoid a conflict, the names + ge, + eq, + le, + must be used for CQL operators, greater-than-or-equal, + equal, less-than-or-equal respectively. + The RPN pattern is supposed to include a relation attribute. + + + For terms not bound by a relation, the pattern + relation.scr is used. 
If the pattern + is not defined, the mapping will fail. + + + The special pattern, relation.* is used + when no other relation pattern is matched. + + + + + + relationModifier.mod + + + + This pattern specifies how a CQL relation modifier is mapped to RPN. + The RPN pattern is usually a relation attribute. + + + + + + structure.type + + + + This pattern specifies how a CQL structure is mapped to RPN. + Note that this CQL pattern is somewhat to similar to + CQL pattern relation. + The type is a CQL relation. + + + The pattern, structure.* is used + when no other structure pattern is matched. + Usually, the RPN equivalent specifies a structure attribute. + + + + + + position.type + + + + This pattern specifies how the anchor (position) of + CQL is mapped to RPN. + The type is one + of first, any, + last, firstAndLast. + + + The pattern, position.* is used + when no other position pattern is matched. + + + + + + set.prefix + + + + This specification defines a CQL index set for a given prefix. + The value on the right hand side is the URI for the set - + not RPN. All prefixes used in + qualifier patterns must be defined this way. + + + + + + CQL to RPN mapping file + + This simple file defines two index sets, three qualifiers and three + relations, a position pattern and a default structure. + + + + + With the mappings above, the CQL query + + computer + + is converted to the PQF: + + @attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer" + + by rules qualifier.srw.serverChoice, + relation.scr, structure.*, + position.any. + + + CQL query + + computer^ + + is rejected, since position.right is + undefined. + + + CQL query + + >my = "http://www.loc.gov/zing/cql/dc-indexes/v1.0/" my.title = x + + is converted to + + @attr 1=4 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "x" + + + + + CQL to XCQL conversion + + Conversion from CQL to XCQL is trivial and does not + require a mapping to be defined. 
+ There are three functions to choose from, depending on the + way you wish to store the resulting output (XML buffer + containing XCQL). + +int cql_to_xml_buf(struct cql_node *cn, char *out, int max); +void cql_to_xml(struct cql_node *cn, + void (*pr)(const char *buf, void *client_data), + void *client_data); +void cql_to_xml_stdio(struct cql_node *cn, FILE *f); + + The function cql_to_xml_buf converts + to XCQL and stores the result in a user-supplied buffer of a given + max size. + + + cql_to_xml writes the result to + a user-defined output stream. + cql_to_xml_stdio writes to + a file. + + + Object Identifiers