X-Git-Url: http://git.indexdata.com/?p=yaz-moved-to-github.git;a=blobdiff_plain;f=doc%2Ftools.xml;h=51de23355a296217fa9cd0ecbb01a34250bcac95;hp=850a6fc23a7de44af21aef276c68cd672d38e396;hb=d193403feb3df490f60175d387603f4daf89cf1f;hpb=30dc1fd726606bff28c2f3884f3f294c42550008 diff --git a/doc/tools.xml b/doc/tools.xml index 850a6fc..51de233 100644 --- a/doc/tools.xml +++ b/doc/tools.xml @@ -1,4 +1,4 @@ - + Supporting Tools @@ -20,7 +20,7 @@ that may be of use to you. - Prefix Query Format + Prefix Query Format Since RPN or reverse polish notation is really just a fancy way of @@ -32,19 +32,73 @@ in simple test applications and scripting environments (like Tcl). The demonstration client included with YAZ uses the PQF. + + + + The PQF has been adopted by other parties developing Z39.50 + software. It is often referred to as Prefix Query Notation + (PQN). + + - The PQF is defined by the pquery module in the YAZ library. The - pquery.h file provides the declaration of the - functions + The PQF is defined by the pquery module in the YAZ library. + There are two sets of functions that have similar behavior. The first + set operates on a PQF parser handle; the second set doesn't. The first + set is more flexible than the second. The second set + is obsolete and is provided only to ensure backwards compatibility. 
- -Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf); + + The functions in the first set all operate on a PQF parser handle: + + + #include <yaz/pquery.h> -Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto, - Odr_oid **attributeSetP, const char *qbuf); + YAZ_PQF_Parser yaz_pqf_create (void); -int p_query_attset (const char *arg); - + void yaz_pqf_destroy (YAZ_PQF_Parser p); + + Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf); + + Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o, + Odr_oid **attributeSetId, const char *qbuf); + + + int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off); + + + A PQF parser is created and destroyed by the functions + yaz_pqf_create and + yaz_pqf_destroy respectively. + Function yaz_pqf_parse parses the query given + by string qbuf. If parsing was successful, + a Z39.50 RPN Query is returned, which is created using ODR stream + o. If parsing failed, a NULL pointer is + returned. + Function yaz_pqf_scan takes a scan query in + qbuf. If parsing was successful, the function + returns an attributes plus term pointer and modifies + attributeSetId to hold the attribute set for the + scan request - both allocated using ODR stream o. + If parsing failed, yaz_pqf_scan returns a NULL pointer. + Error information for bad queries can be obtained by a call to + yaz_pqf_error, which returns an error code, + modifies *msg to point to an error description, + and modifies *off to the offset within the last + query where parsing failed. 
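One way a caller might use the offset reported through *off is to print a marker under the query at the position where parsing stopped. The helper below is a minimal sketch of that idea and is not part of YAZ: the name pqf_error_marker and its signature are hypothetical, and it assumes only that the offset is a byte index into qbuf, as the description of yaz_pqf_error above suggests.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of YAZ: formats the query, a newline,
   and a caret placed under byte offset `off`, followed by `msg`.
   Returns 0 on success, -1 if `out` (max bytes) is too small. */
int pqf_error_marker(const char *qbuf, size_t off, const char *msg,
                     char *out, size_t max)
{
    size_t len = strlen(qbuf);
    int n;

    if (off > len)      /* clamp so the caret never runs past the query */
        off = len;
    /* "%*s" with an empty string emits `off` spaces before the caret */
    n = snprintf(out, max, "%s\n%*s^ %s", qbuf, (int) off, "", msg);
    return (n < 0 || (size_t) n >= max) ? -1 : 0;
}
```

In a real program the msg and off arguments would come from yaz_pqf_error after yaz_pqf_parse has returned NULL; the helper itself has no dependency on the library.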
+ + + The second set of functions are declared as follows: + + + #include <yaz/pquery.h> + + Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf); + + Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto, + Odr_oid **attributeSetP, const char *qbuf); + + int p_query_attset (const char *arg); + The function p_query_rpn() takes as arguments an &odr; stream (see section The ODR Module) @@ -57,10 +111,10 @@ int p_query_attset (const char *arg); If the parse went well, p_query_rpn() returns a pointer to a Z_RPNQuery structure which can be - placed directly into a Z_SearchRequest. + placed directly into a Z_SearchRequest. + If parsing failed, due to syntax error, a NULL pointer is returned. - The p_query_attset specifies which attribute set to use if the query doesn't specify one by the @attrset operator. @@ -72,53 +126,71 @@ int p_query_attset (const char *arg); The grammar of the PQF is as follows: - - Query ::= [ '@attrset' AttSet ] QueryStruct. + + query ::= top-set query-struct. - AttSet ::= string. + top-set ::= [ '@attrset' string ] - QueryStruct ::= [ Attribute ] Simple | Complex. + query-struct ::= attr-spec | simple | complex | '@term' term-type - Attribute ::= '@attr' [ AttSet ] AttributeType '=' AttributeValue. + attr-spec ::= '@attr' [ string ] string query-struct - AttributeType ::= integer. + complex ::= operator query-struct query-struct. - AttributeValue ::= integer. + operator ::= '@and' | '@or' | '@not' | '@prox' proximity. - Complex ::= Operator QueryStruct QueryStruct. + simple ::= result-set | term. - Operator ::= '@and' | '@or' | '@not' | '@prox' Proximity. + result-set ::= '@set' string. - Simple ::= ResultSet | Term. + term ::= string. - ResultSet ::= '@set' string. + proximity ::= exclusion distance ordered relation which-code unit-code. - Term ::= string | '"' string '"'. + exclusion ::= '1' | '0' | 'void'. - Proximity ::= Exclusion Distance Ordered Relation WhichCode UnitCode. + distance ::= integer. 
- Exclusion ::= '1' | '0' | 'void'. + ordered ::= '1' | '0'. - Distance ::= integer. + relation ::= integer. - Ordered ::= '1' | '0'. + which-code ::= 'known' | 'private' | integer. - Relation ::= integer. + unit-code ::= integer. - WhichCode ::= 'known' | 'private' | integer. - - UnitCode ::= integer. - + term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'. + You will note that the syntax above is a fairly faithful - representation of RPN, except for the Attibute, which has been + representation of RPN, except for the Attribute, which has been moved a step away from the term, allowing you to associate one or more attributes with an entire query structure. The parser will automatically apply the given attributes to each term as required. + The @attr operator is followed by an attribute specification + (attr-spec above). The specification consists + of an optional attribute set, an attribute type-value pair and + a sub query. The attribute type-value pair is packed in one string: + an attribute type, an equals sign, followed by an attribute value. + The type is always an integer but the value may be either an + integer or a string (if it doesn't start with a digit character). + + + + Z39.50 version 3 defines various encodings of terms. + Use the @term operator to indicate the encoding type: + general, numeric, + string (for InternationalString), etc. + If no term type has been given, the general form + is used, which is the only encoding allowed in both versions 2 and 3 + of the Z39.50 standard. + + + The following are all examples of valid queries in the PQF. 
@@ -133,6 +205,8 @@ int p_query_attset (const char *arg); @or @and bob dylan @set Result-1 + @attr 1=4 computer + @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming" @attr 4=1 @attr 1=4 "self portrait" @@ -140,10 +214,14 @@ int p_query_attset (const char *arg); @prox 0 3 1 2 k 2 dylan zimmerman @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109 + + @term string "a UTF-8 string, maybe?" + + @attr 1=/book/title computer - Common Command Language + Common Command Language Not all users enjoy typing in prefix query structures and numerical @@ -156,14 +234,15 @@ int p_query_attset (const char *arg); - The EUROPAGATE research project working under the Libraries programme + The EUROPAGATE + research project working under the Libraries programme of the European Commission's DG XIII has, amongst other useful tools, implemented a general-purpose CCL parser which produces an output structure that can be trivially converted to the internal RPN - representation of YAZ (The Z_RPNQuery structure). + representation of &yaz; (The Z_RPNQuery structure). Since the CCL utility - along with the rest of the software - produced by EUROPAGATE - is made freely available on a liberal license, it - is included as a supplement to YAZ. + produced by EUROPAGATE - is made freely available on a liberal + license, it is included as a supplement to &yaz;. CCL Syntax @@ -206,7 +285,7 @@ int p_query_attset (const char *arg); | string -- Qualifiers is a list of strings separated by comma - Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<' + Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<' -- Relational operators. This really doesn't follow the ISO8777 -- standard. @@ -385,6 +464,524 @@ struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str, + CQL + + CQL + - Common Query Language - was defined for the + SRW + protocol. + In many ways CQL has a similar syntax to CCL. + The objective of CQL is different. 
Where CCL aims to be + an end-user language, CQL is the protocol + query language for SRW. + + + + If you are new to CQL, read the + Gentle + Introduction. + + + + The CQL parser in &yaz; provides the following: + + + + It parses and validates a CQL query. + + + + + It generates a C structure that allows you to convert + a CQL query to some other query language, such as SQL. + + + + + The parser converts a valid CQL query to PQF, thus providing a + way to use CQL for both SRW/SRU servers and Z39.50 targets at the + same time. + + + + + The parser converts CQL to + + XCQL. + XCQL is an XML representation of CQL. + XCQL is part of the SRW specification. However, since SRU + supports CQL only, we don't expect XCQL to be widely used. + Furthermore, CQL has the advantage over XCQL that it is + easy to read. + + + + + CQL parsing + + A CQL parser is represented by the CQL_parser + handle. Its contents should be considered &yaz; internal (private). + +#include <yaz/cql.h> + +typedef struct cql_parser *CQL_parser; + +CQL_parser cql_parser_create(void); +void cql_parser_destroy(CQL_parser cp); + + A parser is created by cql_parser_create and + is destroyed by cql_parser_destroy. + + + To parse a CQL query string, the following function + is provided: + +int cql_parser_string(CQL_parser cp, const char *str); + + A CQL query is parsed by cql_parser_string, + which takes a query str. + If the query was valid (no syntax errors), then zero is returned; + otherwise a non-zero error code is returned. + + + +int cql_parser_stream(CQL_parser cp, + int (*getbyte)(void *client_data), + void (*ungetbyte)(int b, void *client_data), + void *client_data); + +int cql_parser_stdio(CQL_parser cp, FILE *f); + + The functions cql_parser_stream and + cql_parser_stdio parse a CQL query + - just like cql_parser_string. + The only difference is that the CQL query can be + fed to the parser in different ways. + The cql_parser_stream uses a generic + byte stream as input. 
The cql_parser_stdio + uses a FILE handle which is opened for reading. + + + + CQL tree + + If the query string is valid, the CQL parser + generates a tree representing the structure of the + CQL query. + + + +struct cql_node *cql_parser_result(CQL_parser cp); + + cql_parser_result returns + a pointer to the root node of the resulting tree. + + + Each node in a CQL tree is represented by a + struct cql_node. + It is defined as follows: + +#define CQL_NODE_ST 1 +#define CQL_NODE_BOOL 2 +#define CQL_NODE_MOD 3 +struct cql_node { + int which; + union { + struct { + char *index; + char *term; + char *relation; + struct cql_node *modifiers; + struct cql_node *prefixes; + } st; + struct { + char *value; + struct cql_node *left; + struct cql_node *right; + struct cql_node *modifiers; + struct cql_node *prefixes; + } bool; + struct { + char *name; + char *value; + struct cql_node *next; + } mod; + } u; +}; + + There are three kinds of nodes: search term (ST), boolean (BOOL), + and modifier (MOD). + + + The search term node has five members: + + + + index: index for search term. + If an index is unspecified for a search term, + index will be NULL. + + + + + term: the search term itself. + + + + + relation: relation for search term. + + + + + modifiers: relation modifiers for search + term. The modifiers is a simple linked + list (NULL for last entry). Each relation modifier node + is of type MOD. + + + + + prefixes: index prefixes for search + term. The prefixes is a simple linked + list (NULL for last entry). Each prefix node + is of type MOD. + + + + + + + The boolean node represents and, + or, and not as well as + proximity. + + + + left and right: left + and right operands respectively. + + + + + modifiers: proximity arguments. + + + + + prefixes: index prefixes. + The prefixes is a simple linked + list (NULL for last entry). Each prefix node + is of type MOD. 
+ + + + + + + The modifier node is a "utility" node used for name-value pairs, + such as prefixes, proximity arguments, etc. + + + + name: name of mod node. + + + + + value: value of mod node. + + + + + next: pointer to next node which is + always a mod node (NULL for last entry). + + + + + + + CQL to PQF conversion + + Conversion to PQF (and Z39.50 RPN) is complicated by the fact + that the resulting RPN depends on the Z39.50 target + capabilities (combinations of supported attributes). + In addition, CQL and SRW operate on index prefixes + (URI or strings), whereas the RPN uses Object Identifiers + for attribute sets. + + + The CQL library of &yaz; defines a cql_transform_t + type. It represents a particular mapping between CQL and RPN. + This handle is created and destroyed by the functions: + +cql_transform_t cql_transform_open_FILE (FILE *f); +cql_transform_t cql_transform_open_fname(const char *fname); +void cql_transform_close(cql_transform_t ct); + + The first two functions create a transformation handle from + either an already open FILE or from a filename respectively. + + + The handle is destroyed by cql_transform_close, + after which no further reference to the handle is allowed. + + + When a cql_transform_t handle has been created + you can convert to RPN. + +int cql_transform_buf(cql_transform_t ct, + struct cql_node *cn, char *out, int max); + + This function converts the CQL tree cn + using handle ct. + For the resulting PQF, you supply a buffer out + which must be able to hold at least max + characters. + + + If conversion failed, cql_transform_buf + returns a non-zero error code; otherwise zero is returned + (conversion successful). + + + If you wish to be able to produce a PQF result in a different + way, there are two alternatives. 
+ +void cql_transform_pr(cql_transform_t ct, + struct cql_node *cn, + void (*pr)(const char *buf, void *client_data), + void *client_data); + +int cql_transform_FILE(cql_transform_t ct, + struct cql_node *cn, FILE *f); + + The former function produces output to a user-defined + output stream. The latter writes the result to an already + open FILE. + + + + Specification of CQL to RPN mapping + + The file supplied to functions + cql_transform_open_FILE and + cql_transform_open_fname follows + a structure found in many Unix utilities. + It consists of mapping specifications - one per line. + Lines starting with # are ignored (comments). + + + Each line is of the form + + CQL pattern = RPN equivalent + + + + An RPN pattern is a simple attribute list. Each attribute pair + takes the form: + + [set] type=value + + The attribute set is optional. + The type is the attribute type, + value the attribute value. + + + The following CQL patterns are recognized: + + + qualifier.set.name + + + + This pattern is invoked when a CQL qualifier, such as + dc.title, is converted. set + and name are the index set and qualifier + name respectively. + Typically, the RPN specifies an equivalent use attribute. + + + For terms not bound by a qualifier the pattern + qualifier.srw.serverChoice is used. + Here, the prefix srw is defined as + http://www.loc.gov/zing/cql/srw-indexes/v1.0/. + If this pattern is not defined, the mapping will fail. + + + + + relation.relation + + + + This pattern specifies how a CQL relation is mapped to RPN. + relation is the name of the relation + operator. Since = is used as + separator between CQL pattern and RPN, CQL relations + including = cannot be + used directly. To avoid a conflict, the names + ge, + eq, + le + must be used for the CQL operators greater-than-or-equal, + equal, less-than-or-equal respectively. + The RPN pattern is supposed to include a relation attribute. + + + For terms not bound by a relation, the pattern + relation.scr is used. 
If the pattern + is not defined, the mapping will fail. + + + The special pattern, relation.* is used + when no other relation pattern is matched. + + + + + + relationModifier.mod + + + + This pattern specifies how a CQL relation modifier is mapped to RPN. + The RPN pattern is usually a relation attribute. + + + + + + structure.type + + + + This pattern specifies how a CQL structure is mapped to RPN. + Note that this CQL pattern is somewhat similar to + CQL pattern relation. + The type is a CQL relation. + + + The pattern, structure.* is used + when no other structure pattern is matched. + Usually, the RPN equivalent specifies a structure attribute. + + + + + + position.type + + + + This pattern specifies how the anchor (position) of + CQL is mapped to RPN. + The type is one + of first, any, + last, firstAndLast. + + + The pattern, position.* is used + when no other position pattern is matched. + + + + + + set.prefix + + + + This specification defines a CQL index set for a given prefix. + The value on the right hand side is the URI for the set - + not RPN. All prefixes used in + qualifier patterns must be defined this way. + + + + + + Small CQL to RPN mapping file + + This small file defines two index sets, three qualifiers and three + relations, a position pattern and a default structure. + + + set.srw = http://www.loc.gov/zing/cql/srw-indexes/v1.0/ + set.dc = http://www.loc.gov/zing/cql/dc-indexes/v1.0/ + + qualifier.srw.serverChoice = 1=1016 + qualifier.dc.title = 1=4 + qualifier.dc.subject = 1=21 + + relation.< = 2=1 + relation.eq = 2=3 + relation.scr = 2=3 + + position.any = 3=3 6=1 + + structure.* = 4=1 + + + With the mappings above, the CQL query + + computer + + is converted to the PQF: + + @attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer" + + by rules qualifier.srw.serverChoice, + relation.scr, structure.*, + position.any. + + + CQL query + + computer^ + + is rejected, since position.right is + undefined. 
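The way the right-hand sides of the matched rules combine into the PQF string of the first example can be sketched in a few lines of C. This is only an illustration of the expansion and not the code used by cql_transform_buf: the names append_attrs and build_pqf are hypothetical, the caller is assumed to have already selected the matching rules, and the term is assumed to contain no double quotes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not YAZ code: appends one RPN attribute list such
   as "3=3 6=1" or "gils 1=2038" to out as "@attr ..." operators.
   Returns 0 on success, -1 if out (max bytes) is too small. */
int append_attrs(const char *rpn, char *out, size_t max)
{
    size_t used = strlen(out);
    int after_set = 0;          /* previous token was a bare set name */
    const char *p = rpn;

    for (;;) {
        size_t n;

        p += strspn(p, " ");    /* skip blanks between tokens */
        if (*p == '\0')
            return 0;
        n = strcspn(p, " ");    /* length of this token */
        if (used + n + 8 > max) /* "@attr " + token + ' ' + NUL */
            return -1;
        used += (size_t) sprintf(out + used, "%s%.*s ",
                                 after_set ? "" : "@attr ", (int) n, p);
        /* a token without '=' is a set name; its pair follows directly */
        after_set = memchr(p, '=', n) == NULL;
        p += n;
    }
}

/* Hypothetical helper: expands each selected rule in order, then appends
   the quoted term. Returns 0 on success, -1 if out is too small. */
int build_pqf(const char *const *rules, size_t nrules, const char *term,
              char *out, size_t max)
{
    size_t i, used;

    if (max == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < nrules; i++)
        if (append_attrs(rules[i], out, max) != 0)
            return -1;
    used = strlen(out);
    if (used + strlen(term) + 3 > max)  /* two quotes + NUL */
        return -1;
    sprintf(out + used, "\"%s\"", term);
    return 0;
}
```

Passing the four right-hand sides 1=1016, 2=3, 4=1 and 3=3 6=1 together with the term computer reproduces the PQF string shown for the first example; an optional set name in front of a pair is kept inside the same @attr operator.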
+ + + CQL query + + >my = "http://www.loc.gov/zing/cql/dc-indexes/v1.0/" my.title = x + + is converted to + + @attr 1=4 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "x" + + + + + CQL to XCQL conversion + + Conversion from CQL to XCQL is trivial and does not + require a mapping to be defined. + There are three functions to choose from depending on the + way you wish to store the resulting output (XML buffer + containing XCQL). + +int cql_to_xml_buf(struct cql_node *cn, char *out, int max); +void cql_to_xml(struct cql_node *cn, + void (*pr)(const char *buf, void *client_data), + void *client_data); +void cql_to_xml_stdio(struct cql_node *cn, FILE *f); + + Function cql_to_xml_buf converts + to XCQL and stores the result in a user-supplied buffer of a given + max size. + + + cql_to_xml writes the result to + a user-defined output stream. + cql_to_xml_stdio writes to + a file. + + + Object Identifiers @@ -417,7 +1014,7 @@ struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str, The OID module provides a higher-level representation of the - family of object identifers which describe the Z39.50 protocol and its + family of object identifiers which describe the Z39.50 protocol and its related objects. The definition of the module interface is given in the oid.h file. @@ -583,7 +1180,7 @@ typedef struct oident The oid_ent_to_oid() function can be used whenever you need to prepare a PDU containing one or more OIDs. The separation of - the protocol element from the remainer of the + the protocol element from the remainder of the OID-description makes it simple to write applications that can communicate with either Z39.50 or OSI SR-based applications. @@ -702,7 +1299,7 @@ typedef struct oident sgml-indent-step:1 sgml-indent-data:t sgml-parent-document: "yaz.xml" - sgml-local-catalogs: "../../docbook/docbook.cat" + sgml-local-catalogs: nil sgml-namecase-general:t End: -->