1 <!-- $Id: tools.xml,v 1.15 2003-01-22 09:43:32 adam Exp $ -->
2 <chapter id="tools"><title>Supporting Tools</title>
In support of the service API - primarily the ASN module, which
provides the programmatic interface to the Z39.50 APDUs - &yaz; contains
a collection of tools that support the development of applications.
10 <sect1 id="tools.query"><title>Query Syntax Parsers</title>
13 Since the type-1 (RPN) query structure has no direct, useful string
14 representation, every origin application needs to provide some form of
15 mapping from a local query notation or representation to a
16 <token>Z_RPNQuery</token> structure. Some programmers will prefer to
17 construct the query manually, perhaps using
18 <function>odr_malloc()</function> to simplify memory management.
19 The &yaz; distribution includes two separate, query-generating tools
20 that may be of use to you.
23 <sect2 id="PQF"><title>Prefix Query Format</title>
26 Since RPN or reverse polish notation is really just a fancy way of
27 describing a suffix notation format (operator follows operands), it
28 would seem that the confusion is total when we now introduce a prefix
29 notation for RPN. The reason is one of simple laziness - it's somewhat
30 simpler to interpret a prefix format, and this utility was designed
31 for maximum simplicity, to provide a baseline representation for use
32 in simple test applications and scripting environments (like Tcl). The
33 demonstration client included with YAZ uses the PQF.
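To make the relationship between the two notations concrete, the following minimal sketch rewrites a prefix expression into suffix (RPN) order. It is not part of &yaz; and does not use the pquery module: the function <function>prefix_to_postfix</function> and its helpers are hypothetical names, only the operators <literal>@and</literal>, <literal>@or</literal> and <literal>@not</literal> are handled, and terms are assumed to be unquoted single words.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the next whitespace-delimited token from *pp
 * into tok. Returns 1 if a token was read, 0 at end of input and -1 if
 * the token does not fit into tok. */
static int next_token(const char **pp, char *tok, size_t toksz)
{
    const char *p = *pp;
    size_t n = 0;

    while (*p == ' ' || *p == '\t')
        p++;
    if (*p == '\0')
        return 0;
    while (*p != '\0' && *p != ' ' && *p != '\t') {
        if (n + 1 >= toksz)
            return -1;
        tok[n++] = *p++;
    }
    tok[n] = '\0';
    *pp = p;
    return 1;
}

/* Append word to out, separated by one space. Returns 0 on success. */
static int append_word(char *out, size_t outsz, const char *word)
{
    size_t used = strlen(out);
    size_t need = strlen(word) + (used ? 1 : 0);

    if (used + need + 1 > outsz)
        return -1;
    if (used)
        out[used++] = ' ';
    strcpy(out + used, word);
    return 0;
}

/* Read one prefix sub-query and emit it in suffix order:
 * both operands first, then the operator. */
static int convert(const char **pp, char *out, size_t outsz)
{
    char tok[64];

    if (next_token(pp, tok, sizeof tok) != 1)
        return -1;
    if (!strcmp(tok, "@and") || !strcmp(tok, "@or") || !strcmp(tok, "@not")) {
        if (convert(pp, out, outsz) != 0 || convert(pp, out, outsz) != 0)
            return -1;
    } else if (tok[0] == '@') {
        return -1; /* other operators are outside the scope of this sketch */
    }
    return append_word(out, outsz, tok);
}

/* Returns 0 on success, -1 on a malformed query or too small a buffer. */
int prefix_to_postfix(const char *query, char *out, size_t outsz)
{
    char extra[64];

    assert(query != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    if (convert(&query, out, outsz) != 0)
        return -1;
    /* Anything left after a complete query is an error. */
    return next_token(&query, extra, sizeof extra) == 0 ? 0 : -1;
}
```

For the input <literal>@or @and bob dylan zimmerman</literal> the sketch produces <literal>bob dylan @and zimmerman @or</literal>, which is the same query with each operator moved behind its operands.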
The PQF has been adopted by other parties developing Z39.50
software. It is often referred to as Prefix Query Notation (PQN).
The PQF is defined by the pquery module in the YAZ library.
There are two sets of functions with similar behavior. The first
set operates on a PQF parser handle; the second set does not. The
first set is more flexible than the second, which is obsolete and
is provided only for backwards compatibility.
The functions in the first set all operate on a PQF parser handle:
#include <yaz/pquery.h>

YAZ_PQF_Parser yaz_pqf_create (void);

void yaz_pqf_destroy (YAZ_PQF_Parser p);

Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf);

Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o,
                      Odr_oid **attributeSetId, const char *qbuf);

int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off);
A PQF parser is created and destroyed by the functions
<function>yaz_pqf_create</function> and
<function>yaz_pqf_destroy</function> respectively.
Function <function>yaz_pqf_parse</function> parses the query given
by the string <literal>qbuf</literal>. If parsing was successful,
a Z39.50 RPN Query is returned, which is created using ODR stream
<literal>o</literal>. If parsing failed, a NULL pointer is returned.
Function <function>yaz_pqf_scan</function> takes a scan query in
<literal>qbuf</literal>. If parsing was successful, the function
returns a pointer to the attributes plus term and sets
<literal>attributeSetId</literal> to hold the attribute set for the
scan request - both allocated using ODR stream <literal>o</literal>.
If parsing failed, <function>yaz_pqf_scan</function> returns a NULL pointer.
Error information for bad queries can be obtained by a call to
<function>yaz_pqf_error</function>, which returns an error code,
sets <literal>*msg</literal> to point to an error description,
and sets <literal>*off</literal> to the offset within the last
query where parsing failed.
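An offset into the query string is most useful when it is shown to the user. The following sketch, which does not call into &yaz; at all, builds a two-line diagnostic with a caret under the failing position. The function name <function>format_error_marker</function> is hypothetical, and the offset is assumed to be a byte offset of the kind described above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: write the query, a newline, and a caret placed
 * under byte offset off into out. An offset beyond the end of the query
 * is clamped to the end. Returns 0 on success, -1 if out is too small. */
int format_error_marker(const char *query, size_t off,
                        char *out, size_t outsz)
{
    size_t len;

    assert(query != NULL && out != NULL);
    len = strlen(query);
    if (off > len)
        off = len;
    /* query + newline + padding + caret + terminating NUL */
    if (len + 1 + off + 1 + 1 > outsz)
        return -1;
    memcpy(out, query, len);
    out[len] = '\n';
    memset(out + len + 1, ' ', off);
    out[len + 1 + off] = '^';
    out[len + 2 + off] = '\0';
    return 0;
}
```

An application could pass the query it just tried to parse together with the reported offset, and print the resulting text next to the error description.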
The second set of functions is declared as follows:
#include <yaz/pquery.h>

Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);

Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
                      Odr_oid **attributeSetP, const char *qbuf);

int p_query_attset (const char *arg);
The function <function>p_query_rpn()</function> takes as arguments an
&odr; stream (see section <link linkend="odr">The ODR Module</link>)
to provide a memory source (the structure created is released on
the next call to <function>odr_reset()</function> on the stream), a
protocol identifier (one of the constants <token>PROTO_Z3950</token> and
<token>PROTO_SR</token>), and
a null-terminated string holding the query.
112 If the parse went well, <function>p_query_rpn()</function> returns a
113 pointer to a <literal>Z_RPNQuery</literal> structure which can be
114 placed directly into a <literal>Z_SearchRequest</literal>.
If parsing failed due to a syntax error, a NULL pointer is returned.
The function <literal>p_query_attset</literal> specifies which attribute set
to use if the query doesn't specify one with the
<literal>@attrset</literal> operator.
It returns 0 if the argument is a
valid attribute set specifier; otherwise it returns -1.
126 The grammar of the PQF is as follows:
query ::= top-set query-struct.
top-set ::= [ '@attrset' string ]
query-struct ::= attr-spec | simple | complex | '@term' term-type
attr-spec ::= '@attr' [ string ] string query-struct
complex ::= operator query-struct query-struct.
operator ::= '@and' | '@or' | '@not' | '@prox' proximity.
simple ::= result-set | term.
result-set ::= '@set' string.
proximity ::= exclusion distance ordered relation which-code unit-code.
exclusion ::= '1' | '0' | 'void'.
distance ::= integer.
ordered ::= '1' | '0'.
relation ::= integer.
which-code ::= 'known' | 'private' | integer.
unit-code ::= integer.
term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'.
166 You will note that the syntax above is a fairly faithful
167 representation of RPN, except for the Attribute, which has been
168 moved a step away from the term, allowing you to associate one or more
169 attributes with an entire query structure. The parser will
170 automatically apply the given attributes to each term as required.
The @attr operator is followed by an attribute specification
(<literal>attr-spec</literal> above). The specification consists
of an optional attribute set, an attribute type-value pair and
a sub query. The attribute type-value pair is packed in one string:
an attribute type, an equals sign, and an attribute value.
The type is always an integer but the value may be either an
integer or a string (if it doesn't start with a digit character).
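The packing of the type-value pair can be illustrated with a short sketch that splits one such string, using the <literal>type=value</literal> form seen in the example queries of this section (for instance <literal>@attr 1=4</literal>). This is not the code used by the pquery module: the structure <literal>attr_pair</literal> and the function <function>split_attr_pair</function> are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical result type for one attribute type-value pair. */
struct attr_pair {
    int type;            /* attribute type, always an integer */
    int value_is_number; /* 1 if the value is numeric, 0 if it is a string */
    long number;         /* valid when value_is_number is 1 */
    const char *string;  /* the value text; points into the spec string */
};

/* Split a specification such as "1=4" or "1=/book/title".
 * Returns 0 on success, -1 if the string is not of the form type=value. */
int split_attr_pair(const char *spec, struct attr_pair *ap)
{
    char *end;
    long type;

    assert(spec != NULL && ap != NULL);
    if (!isdigit((unsigned char) spec[0]))
        return -1;                      /* the type must be an integer */
    type = strtol(spec, &end, 10);
    if (*end != '=' || end[1] == '\0')
        return -1;                      /* separator or value missing */
    ap->type = (int) type;
    ap->string = end + 1;
    /* A value that starts with a digit is treated as an integer,
     * anything else as a string. */
    ap->value_is_number = isdigit((unsigned char) end[1]) != 0;
    ap->number = ap->value_is_number ? strtol(end + 1, NULL, 10) : 0;
    return 0;
}
```

With this sketch, <literal>1=4</literal> yields type 1 with the numeric value 4, while <literal>1=/book/title</literal> yields type 1 with a string value.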
Z39.50 version 3 defines various encodings of terms.
Use the @term operator to indicate the encoding type:
<literal>general</literal>, <literal>numeric</literal>,
<literal>string</literal> (for InternationalString), and so on.
If no term type has been given, the <literal>general</literal> form
is used, which is the only encoding allowed in both version 2 and
version 3 of the Z39.50 standard.
194 The following are all examples of valid queries in the PQF.
@or "dylan" "zimmerman"
@or @and bob dylan @set Result-1
@attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
@attr 4=1 @attr 1=4 "self portrait"
@prox 0 3 1 2 k 2 dylan zimmerman
@and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
@term string "a UTF-8 string, maybe?"
@attr 1=/book/title computer
224 <sect2 id="CCL"><title>Common Command Language</title>
227 Not all users enjoy typing in prefix query structures and numerical
228 attribute values, even in a minimalistic test client. In the library
229 world, the more intuitive Common Command Language (or ISO 8777) has
230 enjoyed some popularity - especially before the widespread
availability of graphical interfaces. It is still useful in
applications where you need to provide a
symbolic language for expressing boolean query structures.
237 The <ulink url="http://europagate.dtv.dk/">EUROPAGATE</ulink>
238 research project working under the Libraries programme
239 of the European Commission's DG XIII has, amongst other useful tools,
240 implemented a general-purpose CCL parser which produces an output
structure that can be trivially converted to the internal RPN
representation of &yaz; (the <literal>Z_RPNQuery</literal> structure).
Since the CCL utility - along with the rest of the software
produced by EUROPAGATE - is made freely available under a liberal
license, it is included as a supplement to &yaz;.
248 <sect3><title>CCL Syntax</title>
The CCL parser obeys the following grammar for the FIND argument.
The syntax is annotated by the lines prefixed by
<literal>--</literal>.
257 CCL-Find ::= CCL-Find Op Elements
260 Op ::= "and" | "or" | "not"
261 -- The above means that Elements are separated by boolean operators.
263 Elements ::= '(' CCL-Find ')'
266 | Qualifiers Relation Terms
267 | Qualifiers Relation '(' CCL-Find ')'
268 | Qualifiers '=' string '-' string
269 -- Elements is either a recursive definition, a result set reference, a
270 -- list of terms, qualifiers followed by terms, qualifiers followed
271 -- by a recursive definition or qualifiers in a range (lower - upper).
273 Set ::= 'set' = string
274 -- Reference to a result set
276 Terms ::= Terms Prox Term
278 -- Proximity of terms.
282 -- This basically means that a term may include a blank
284 Qualifiers ::= Qualifiers ',' string
286 -- Qualifiers is a list of strings separated by comma
288 Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
289 -- Relational operators. This really doesn't follow the ISO8777
293 -- Proximity operator
298 The following queries are all valid:
310 (dylan and bob) or set=1
314 Assuming that the qualifiers <literal>ti</literal>, <literal>au</literal>
315 and <literal>date</literal> are defined we may use:
321 au=(bob dylan and slow train coming)
323 date>1980 and (ti=((self portrait)))
328 <sect3><title>CCL Qualifiers</title>
331 Qualifiers are used to direct the search to a particular searchable
332 index, such as title (ti) and author indexes (au). The CCL standard
333 itself doesn't specify a particular set of qualifiers, but it does
334 suggest a few short-hand notations. You can customize the CCL parser
335 to support a particular set of qualifiers to reflect the current target
336 profile. Traditionally, a qualifier would map to a particular
337 use-attribute within the BIB-1 attribute set. However, you could also
338 define qualifiers that would set, for example, the
Consider a scenario where the target supports ranked searches in the
title index. In this case, the user could specify
348 ti,ranked=knuth computer
351 and the <literal>ranked</literal> would map to relation=relevance
352 (2=102) and the <literal>ti</literal> would map to title (1=4).
A "profile" with a set of predefined CCL qualifiers can be read from a
file. The YAZ client reads its CCL qualifiers from a file named
<filename>default.bib</filename>. Each line in the file has the form:
362 <replaceable>qualifier-name</replaceable>
363 <replaceable>type</replaceable>=<replaceable>val</replaceable>
364 <replaceable>type</replaceable>=<replaceable>val</replaceable> ...
368 where <replaceable>qualifier-name</replaceable> is the name of the
qualifier to be used (e.g. <literal>ti</literal>),
370 <replaceable>type</replaceable> is a BIB-1 category type and
371 <replaceable>val</replaceable> is the corresponding BIB-1 attribute
The <replaceable>type</replaceable> can be given either as a number or as
one of the letters <literal>u</literal> (use), <literal>r</literal> (relation),
<literal>p</literal> (position), <literal>s</literal> (structure),
<literal>t</literal> (truncation) or <literal>c</literal> (completeness).
The <replaceable>qualifier-name</replaceable> <literal>term</literal>
has a special meaning.
The types and values for this definition are used when
<emphasis>no</emphasis> qualifiers are present.
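The line format described above can be sketched as a small stand-alone parser. This is not the code behind <function>ccl_qual_file</function>: the names <literal>qual_def</literal> and <function>parse_qual_line</function> are hypothetical, attribute values are assumed to be plain integers, and the type letters are mapped to the BIB-1 attribute type numbers 1 (use) through 6 (completeness).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAIRS 8

/* Hypothetical in-memory form of one qualifier definition line. */
struct qual_def {
    char name[32];
    int npairs;
    int type[MAX_PAIRS];
    int value[MAX_PAIRS];
};

/* Map a type token (a letter or a number) to an attribute type number.
 * Returns 0 if the token is not recognized. */
static int type_number(const char *s, size_t len)
{
    static const char letters[] = "urpstc"; /* attribute types 1..6 */
    const char *hit;
    size_t i;
    int n = 0;

    if (len == 1 && (hit = strchr(letters, s[0])) != NULL)
        return (int) (hit - letters) + 1;
    for (i = 0; i < len; i++) {
        if (!isdigit((unsigned char) s[i]))
            return 0;
        n = n * 10 + (s[i] - '0');
    }
    return len ? n : 0;
}

/* Parse "qualifier-name type=val type=val ...".
 * Returns 0 on success, -1 on a malformed line. */
int parse_qual_line(const char *line, struct qual_def *qd)
{
    char buf[256];
    char *tok;

    assert(line != NULL && qd != NULL);
    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);
    tok = strtok(buf, " \t\r\n");
    if (tok == NULL || strlen(tok) >= sizeof qd->name)
        return -1;
    strcpy(qd->name, tok);
    qd->npairs = 0;
    while ((tok = strtok(NULL, " \t\r\n")) != NULL) {
        char *eq = strchr(tok, '=');
        char *end;
        int type;
        long value;

        if (eq == NULL || qd->npairs == MAX_PAIRS)
            return -1;
        type = type_number(tok, (size_t) (eq - tok));
        value = strtol(eq + 1, &end, 10);
        if (type == 0 || end == eq + 1 || *end != '\0')
            return -1;
        qd->type[qd->npairs] = type;
        qd->value[qd->npairs] = (int) value;
        qd->npairs++;
    }
    return qd->npairs > 0 ? 0 : -1;
}
```

Under these assumptions, the line <literal>ti u=4 s=1</literal> is read as the pairs (1, 4) and (4, 1), which correspond to <literal>@attr 1=4</literal> and <literal>@attr 4=1</literal> in PQF.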
384 Consider the following definition:
393 Two qualifiers are defined, <literal>ti</literal> and
394 <literal>au</literal>.
395 They both set the structure-attribute to phrase (1).
396 <literal>ti</literal>
397 sets the use-attribute to 4. <literal>au</literal> sets the
399 When no qualifiers are used in the query the structure-attribute is
400 set to free-form-text (105).
404 <sect3><title>CCL API</title>
All public definitions can be found in the header file
<filename>ccl.h</filename>. A profile is created by calling
the function <function>ccl_qual_mk</function>, which returns a profile
handle of type <literal>CCL_bibset</literal>.
414 To read a file containing qualifier definitions the function
415 <function>ccl_qual_file</function> may be convenient. This function
416 takes an already opened <literal>FILE</literal> handle pointer as
417 argument along with a <literal>CCL_bibset</literal> handle.
421 To parse a simple string with a FIND query use the function
424 struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
425 int *error, int *pos);
428 which takes the CCL profile (<literal>bibset</literal>) and query
429 (<literal>str</literal>) as input. Upon successful completion the RPN
tree is returned. If an error occurs, such as a syntax error, the integer
431 pointed to by <literal>error</literal> holds the error code and
432 <literal>pos</literal> holds the offset inside query string in which
437 An English representation of the error may be obtained by calling
438 the <literal>ccl_err_msg</literal> function. The error codes are
439 listed in <filename>ccl.h</filename>.
To convert the CCL RPN tree (type
<literal>struct ccl_rpn_node *</literal>)
to the Z_RPNQuery of YAZ, use the function <function>ccl_rpn_query</function>.
This function, which is part of YAZ, is implemented in
<filename>yaz-ccl.c</filename>.
After calling this function, the CCL RPN tree is probably no longer
needed. The function <literal>ccl_rpn_delete</literal> destroys the CCL RPN tree.
453 A CCL profile may be destroyed by calling the
454 <function>ccl_qual_rm</function> function.
458 The token names for the CCL operators may be changed by setting the
459 globals (all type <literal>char *</literal>)
460 <literal>ccl_token_and</literal>, <literal>ccl_token_or</literal>,
461 <literal>ccl_token_not</literal> and <literal>ccl_token_set</literal>.
462 An operator may have aliases, i.e. there may be more than one name for
463 the operator. To do this, separate each alias with a space character.
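The alias convention can be illustrated with a sketch of how a word might be compared against such a space-separated list. It is independent of the CCL parser: the function name <function>alias_matches</function> is hypothetical, and the comparison is assumed to be exact and case-sensitive.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return 1 if word equals one of the space-separated
 * names in aliases, 0 otherwise. */
int alias_matches(const char *aliases, const char *word)
{
    size_t wlen;

    assert(aliases != NULL && word != NULL);
    wlen = strlen(word);
    if (wlen == 0)
        return 0;
    while (*aliases != '\0') {
        size_t alen = strcspn(aliases, " "); /* length of current alias */

        if (alen == wlen && memcmp(aliases, word, wlen) == 0)
            return 1;
        aliases += alen;
        aliases += strspn(aliases, " ");     /* skip separating spaces */
    }
    return 0;
}
```

With an alias string such as <literal>and og</literal>, both <literal>and</literal> and <literal>og</literal> match, while a partial word such as <literal>an</literal> does not.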
467 <sect2 id="tools.cql"><title>CQL</title>
469 <ulink url="http://www.loc.gov/z3950/agency/zing/cql/">CQL</ulink>
470 - Common Query Language - was defined for the
471 <ulink url="http://www.loc.gov/z3950/agency/zing/srw/">SRW</ulink>
In many ways CQL has a similar syntax to CCL, but
its objective is different. Where CCL aims to be
475 an end-user language, CQL is <emphasis>the</emphasis> protocol
476 query language for SRW. Unlike PQF (Z39.50 Type-1), CQL is easy
481 If you are new to CQL, read the
482 <ulink url="http://zing.z3950.org/cql/intro.html">Gentle
483 Introduction</ulink>.
487 The CQL parser in &yaz; provides the following:
491 It parses and validates a CQL query.
496 It generates a C structure that allows you to convert
497 a CQL query to some other query language, such as SQL.
502 The parser converts a valid CQL query to PQF, thus providing a
503 way to use CQL for both SRW/SRU servers and Z39.50 targets at the
509 The parser converts CQL to
510 <ulink url="http://www.loc.gov/z3950/agency/zing/cql/xcql.html">
512 XCQL is an XML representation of CQL.
513 XCQL is part of the SRW specification. However, since SRU
514 supports CQL only, we don't expect XCQL to be widely used.
515 Furthermore, CQL has the advantage over XCQL that it is
521 <sect3 id="tools.cql.parsing"><title>CQL parsing</title>
523 A CQL parser is represented by the <literal>CQL_parser</literal>
524 handle. Its contents should be considered &yaz; internal (private).
#include <yaz/cql.h>

typedef struct cql_parser *CQL_parser;

CQL_parser cql_parser_create(void);
void cql_parser_destroy(CQL_parser cp);

int cql_parser_string(CQL_parser cp, const char *str);
535 A parser is created by <function>cql_parser_create</function> and
536 is destroyed by <function>cql_parser_destroy</function>.
A CQL query is parsed by the function <function>cql_parser_string</function>,
which takes a query <parameter>str</parameter>.
If the query is valid (no syntax errors), zero is returned;
otherwise a non-zero error code is returned.
546 int cql_parser_stream(CQL_parser cp,
547 int (*getbyte)(void *client_data),
548 void (*ungetbyte)(int b, void *client_data),
551 int cql_parser_stdio(CQL_parser cp, FILE *f);
The functions <function>cql_parser_stream</function> and
<function>cql_parser_stdio</function> parse a CQL query,
just like <function>cql_parser_string</function>.
556 The only difference is that the CQL query can be
557 fed to the parser in different ways.
558 The <function>cql_parser_stream</function> uses a generic
559 byte stream as input. The <function>cql_parser_stdio</function>
560 uses a <literal>FILE</literal> handle which is opened for reading.
563 <sect3 id="tools.cql.tree"><title>CQL tree</title>
565 We now turn to the tree representation of a valid CQL query.
567 #define CQL_NODE_ST 1
568 #define CQL_NODE_BOOL 2
569 #define CQL_NODE_MOD 3
577 struct cql_node *modifiers;
578 struct cql_node *prefixes;
582 struct cql_node *left;
583 struct cql_node *right;
584 struct cql_node *modifiers;
585 struct cql_node *prefixes;
590 struct cql_node *next;
595 There are three kinds of nodes, search term (ST), boolean (BOOL),
599 The search term node has five members:
603 <literal>index</literal>: index for search term.
604 If an index is unspecified for a search term,
605 <literal>index</literal> will be NULL.
610 <literal>term</literal>: the search term itself.
615 <literal>relation</literal>: relation for search term.
620 <literal>modifiers</literal>: relation modifiers for search
621 term. The <literal>modifiers</literal> is a simple linked
622 list (NULL for last entry). Each relation modifier node
623 is of type <literal>MOD</literal>.
628 <literal>prefixes</literal>: index prefixes for search
629 term. The <literal>prefixes</literal> is a simple linked
630 list (NULL for last entry). Each prefix node
631 is of type <literal>MOD</literal>.
The boolean node represents <literal>and</literal>,
<literal>or</literal>, <literal>not</literal> as well as
<literal>left</literal> and <literal>right</literal>: the left
and right operands, respectively.
650 <literal>modifiers</literal>: proximity arguments.
655 <literal>prefixes</literal>: index prefixes.
656 The <literal>prefixes</literal> is a simple linked
657 list (NULL for last entry). Each prefix node
658 is of type <literal>MOD</literal>.
The modifier node is a "utility" node used for name-value pairs,
such as prefixes, proximity arguments, etc.
<literal>name</literal>: name of mod node.
<literal>value</literal>: value of mod node.
680 <literal>next</literal>: pointer to next node which is
681 always a mod node (NULL for last entry).
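Because modifier nodes form a simple NULL-terminated linked list of name-value pairs, looking up a particular entry is a plain list walk. The sketch below uses a stand-in structure, <literal>mod_node</literal>, that only mirrors the three members described above. It is not the <literal>struct cql_node</literal> declared in <filename>cql.h</filename>, and the function name <function>mod_lookup</function> is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a modifier node: a name-value pair plus a link to the
 * next node (NULL for the last entry). */
struct mod_node {
    const char *name;
    const char *value;
    struct mod_node *next;
};

/* Return the value of the first node whose name matches, or NULL if the
 * list holds no such node. */
const char *mod_lookup(const struct mod_node *list, const char *name)
{
    assert(name != NULL);
    for (; list != NULL; list = list->next) {
        if (list->name != NULL && strcmp(list->name, name) == 0)
            return list->value;
    }
    return NULL;
}
```

The same walk applies to any of the lists mentioned in this section, whether they hold relation modifiers, index prefixes or proximity arguments.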
690 <sect1 id="tools.oid"><title>Object Identifiers</title>
The basic YAZ representation of an OID is an array of integers,
terminated with the value -1. The &odr; module provides two
utility functions to create and copy this type of data element:
699 Odr_oid *odr_getoidbystr(ODR o, char *str);
703 Creates an OID based on a string-based representation using dots (.)
704 to separate elements in the OID.
708 Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
712 Creates a copy of the OID referenced by the <emphasis>o</emphasis>
714 Both functions take an &odr; stream as parameter. This stream is used to
715 allocate memory for the data elements, which is released on a
716 subsequent call to <function>odr_reset()</function> on that stream.
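The array representation can be made concrete with a sketch that converts a dotted string into a -1 terminated integer array. Unlike <function>odr_getoidbystr</function>, it writes into a caller-supplied buffer instead of allocating from an &odr; stream. The function name <function>oid_from_dotted</function> is hypothetical, and overflow of very large arc values is not checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: parse a string such as "1.2.840.10003.5.10" into
 * dst as a -1 terminated array. dstlen is the capacity of dst in
 * elements, including the terminator. Returns the number of arcs
 * stored, or -1 on a syntax error or if dst is too small. */
int oid_from_dotted(const char *str, int *dst, size_t dstlen)
{
    size_t n = 0;

    assert(str != NULL && dst != NULL);
    for (;;) {
        int arc = 0;

        /* Each arc must start with a digit, and there must be room for
         * both the arc and the terminator. */
        if (!isdigit((unsigned char) *str) || n + 1 >= dstlen)
            return -1;
        while (isdigit((unsigned char) *str))
            arc = arc * 10 + (*str++ - '0');
        dst[n++] = arc;
        if (*str == '\0')
            break;
        if (*str++ != '.')
            return -1;
    }
    dst[n] = -1;
    return (int) n;
}
```

For the string <literal>1.2.840.10003.5.10</literal> the sketch stores six arcs followed by the terminating -1.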
720 The OID module provides a higher-level representation of the
721 family of object identifiers which describe the Z39.50 protocol and its
722 related objects. The definition of the module interface is given in
723 the <filename>oid.h</filename> file.
727 The interface is mainly based on the <literal>oident</literal> structure.
728 The definition of this structure looks like this:
732 typedef struct oident
737 int oidsuffix[OID_SIZE];
743 The proto field takes one of the values
752 If you don't care about talking to SR-based implementations (few
753 exist, and they may become fewer still if and when the ISO SR and ANSI
754 Z39.50 documents are merged into a single standard), you can ignore
755 this field on incoming packages, and always set it to PROTO_Z3950
756 for outgoing packages.
760 The oclass field takes one of the values
782 corresponding to the OID classes defined by the Z39.50 standard.
784 Finally, the value field takes one of the values
842 again, corresponding to the specific OIDs defined by the standard.
846 The desc field contains a brief, mnemonic name for the OID in question.
854 struct oident *oid_getentbyoid(int *o);
858 takes as argument an OID, and returns a pointer to a static area
859 containing an <literal>oident</literal> structure. You typically use
860 this function when you receive a PDU containing an OID, and you wish
861 to branch out depending on the specific OID value.
869 int *oid_ent_to_oid(struct oident *ent, int *dst);
Takes as argument an <literal>oident</literal> structure - in which
the <literal>proto</literal>, <literal>oclass</literal>, and
<literal>value</literal> fields are assumed to be set correctly -
and returns a pointer to the buffer given by <literal>dst</literal>,
which then holds the
representation of the corresponding OID. The function returns
NULL and leaves the array <literal>dst</literal> unchanged if the
mapping could not take place.
The array <literal>dst</literal> should be at least of size
<literal>OID_SIZE</literal>.
885 The <function>oid_ent_to_oid()</function> function can be used whenever
886 you need to prepare a PDU containing one or more OIDs. The separation of
887 the <literal>protocol</literal> element from the remainder of the
888 OID-description makes it simple to write applications that can
889 communicate with either Z39.50 or OSI SR-based applications.
897 oid_value oid_getvalbyname(const char *name);
takes as argument a mnemonic OID name, and returns the
<literal>value</literal> field of the first entry in the database that
contains the given name in its <literal>desc</literal> field.
907 Finally, the module provides the following utility functions, whose
908 meaning should be obvious:
void oid_oidcpy(int *t, int *s);
void oid_oidcat(int *t, int *s);
int oid_oidcmp(int *o1, int *o2);
int oid_oidlen(int *o);
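For readers who want to see what such utilities involve, here is a sketch of length and comparison routines over -1 terminated integer arrays. These are not the &yaz; implementations: the names <function>oidlen_sketch</function> and <function>oidcmp_sketch</function> are hypothetical, and the ordering returned by the comparison (element by element, with a shorter prefix sorting first) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Number of arcs in a -1 terminated OID array. */
int oidlen_sketch(const int *o)
{
    int n = 0;

    assert(o != NULL);
    while (o[n] >= 0)
        n++;
    return n;
}

/* Compare two -1 terminated OID arrays element by element.
 * Returns 0 if they are equal, a negative value if a sorts before b,
 * and a positive value otherwise. */
int oidcmp_sketch(const int *a, const int *b)
{
    assert(a != NULL && b != NULL);
    while (*a == *b && *a >= 0) {
        a++;
        b++;
    }
    if (*a == *b)
        return 0;       /* both arrays reached their terminator */
    /* The terminator -1 is smaller than any arc, so a shorter prefix
     * compares as less than the longer array. */
    return *a > *b ? 1 : -1;
}
```

Copying and concatenation follow the same pattern: walk the source array until the terminator and write a new -1 at the end of the destination.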
920 The OID module has been criticized - and perhaps rightly so
921 - for needlessly abstracting the
922 representation of OIDs. Other toolkits use a simple
923 string-representation of OIDs with good results. In practice, we have
924 found the interface comfortable and quick to work with, and it is a
925 simple matter (for what it's worth) to create applications compatible
with both ISO SR and Z39.50. Finally, the use of the
<literal>oident</literal> database is by no means mandatory.
928 You can easily create your own system for representing OIDs, as long
929 as it is compatible with the low-level integer-array representation
936 <sect1 id="tools.nmem"><title>Nibble Memory</title>
939 Sometimes when you need to allocate and construct a large,
940 interconnected complex of structures, it can be a bit of a pain to
941 release the associated memory again. For the structures describing the
942 Z39.50 PDUs and related structures, it is convenient to use the
943 memory-management system of the &odr; subsystem (see
944 <link linkend="odr-use">Using ODR</link>). However, in some circumstances
945 where you might otherwise benefit from using a simple nibble memory
946 management system, it may be impractical to use
947 <function>odr_malloc()</function> and <function>odr_reset()</function>.
948 For this purpose, the memory manager which also supports the &odr;
949 streams is made available in the NMEM module. The external interface
950 to this module is given in the <filename>nmem.h</filename> file.
954 The following prototypes are given:
NMEM nmem_create(void);
void nmem_destroy(NMEM n);
void *nmem_malloc(NMEM n, int size);
void nmem_reset(NMEM n);
int nmem_total(NMEM n);
void nmem_init(void);
void nmem_exit(void);
968 The <function>nmem_create()</function> function returns a pointer to a
969 memory control handle, which can be released again by
970 <function>nmem_destroy()</function> when no longer needed.
971 The function <function>nmem_malloc()</function> allocates a block of
972 memory of the requested size. A call to <function>nmem_reset()</function>
973 or <function>nmem_destroy()</function> will release all memory allocated
on the handle since it was created (or since the last call to
<function>nmem_reset()</function>). The function
976 <function>nmem_total()</function> returns the number of bytes currently
977 allocated on the handle.
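The allocation pattern described above, in which many small requests are served from one handle and released together, can be sketched as follows. This is not the NMEM implementation: all <literal>arena_</literal> names are hypothetical, the block size of 4096 bytes is an arbitrary choice, sizes use <literal>size_t</literal> rather than <literal>int</literal>, and the sketch makes no attempt at thread safety.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 4096

/* Unit used to keep every returned pointer suitably aligned. */
union align_unit { long double ld; void *p; long l; };
#define ALIGN sizeof(union align_unit)

struct block {
    struct block *next;
    size_t used;             /* bytes handed out from this block */
    size_t size;             /* capacity of data in bytes */
    union align_unit data[]; /* payload area */
};

struct arena {
    struct block *blocks;    /* most recently created block first */
    size_t total;            /* bytes requested since create or reset */
};

struct arena *arena_create(void)
{
    return calloc(1, sizeof(struct arena));
}

/* Nibble size bytes off the current block, starting a new block when
 * the current one cannot satisfy the request. */
void *arena_malloc(struct arena *a, size_t size)
{
    struct block *b;
    size_t need;

    assert(a != NULL);
    need = (size + ALIGN - 1) / ALIGN * ALIGN; /* round up to alignment */
    b = a->blocks;
    if (b == NULL || b->size - b->used < need) {
        size_t bsize = need > BLOCK_SIZE ? need : BLOCK_SIZE;

        b = malloc(offsetof(struct block, data) + bsize);
        if (b == NULL)
            return NULL;
        b->next = a->blocks;
        b->used = 0;
        b->size = bsize;
        a->blocks = b;
    }
    b->used += need;
    a->total += size;
    return (char *) b->data + (b->used - need);
}

/* Release everything allocated on the handle, keeping the handle. */
void arena_reset(struct arena *a)
{
    assert(a != NULL);
    while (a->blocks != NULL) {
        struct block *next = a->blocks->next;

        free(a->blocks);
        a->blocks = next;
    }
    a->total = 0;
}

size_t arena_total(const struct arena *a)
{
    assert(a != NULL);
    return a->total;
}

void arena_destroy(struct arena *a)
{
    if (a != NULL) {
        arena_reset(a);
        free(a);
    }
}
```

The design choice worth noting is that individual allocations are never freed one by one. The caller builds an interconnected structure, uses it, and then discards all of it with a single reset or destroy call.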
The nibble memory pool is shared amongst threads. POSIX
mutexes and WIN32 critical sections are used to keep the
module thread safe. Function <function>nmem_init()</function>
initializes the nibble memory library and is called automatically
the first time <literal>YAZ.DLL</literal> is loaded. &yaz; uses
function <function>DllMain</function> to achieve this. You should
<emphasis>not</emphasis> call <function>nmem_init</function> or
<function>nmem_exit</function> unless you're absolutely sure what
you're doing. Note that in previous &yaz; versions you had to call
<function>nmem_init</function> yourself.
996 <!-- Keep this comment at the end of the file
1001 sgml-minimize-attributes:nil
1002 sgml-always-quote-attributes:t
1005 sgml-parent-document: "yaz.xml"
1006 sgml-local-catalogs: nil
1007 sgml-namecase-general:t