doc/tools.xml

   1 <!-- $Id: tools.xml,v 1.21 2003-02-23 14:23:40 adam Exp $ -->
   2  <chapter id="tools"><title>Supporting Tools</title>
   3
   4   <para>
   5    In support of the service API, primarily the ASN module, which
   6    provides the programmatic interface to the Z39.50 APDUs, &yaz; contains
   7    a collection of tools that support the development of applications.
   8   </para>
   9
  10   <sect1 id="tools.query"><title>Query Syntax Parsers</title>
  11
  12    <para>
  13     Since the type-1 (RPN) query structure has no direct, useful string
  14     representation, every origin application needs to provide some form of
  15     mapping from a local query notation or representation to a
  16     <token>Z_RPNQuery</token> structure. Some programmers will prefer to
  17     construct the query manually, perhaps using
  18     <function>odr_malloc()</function> to simplify memory management.
  19     The &yaz; distribution includes two separate, query-generating tools
  20     that may be of use to you.
  21    </para>
  22
  23    <sect2 id="PQF"><title>Prefix Query Format</title>
  24
  25     <para>
  26      Since RPN or reverse polish notation is really just a fancy way of
  27      describing a suffix notation format (operator follows operands), it
  28      would seem that the confusion is total when we now introduce a prefix
  29      notation for RPN. The reason is one of simple laziness - it's somewhat
  30      simpler to interpret a prefix format, and this utility was designed
  31      for maximum simplicity, to provide a baseline representation for use
  32      in simple test applications and scripting environments (like Tcl). The
  33      demonstration client included with YAZ uses the PQF.
  34     </para>
  35
  36     <note>
  37      <para>
  38       PQF has been adopted by other parties developing Z39.50
  39       software. It is often referred to as Prefix Query Notation
  40       - PQN.
  41      </para>
  42     </note>
  43     <para>
  44      The PQF is defined by the pquery module in the YAZ library.
  45      There are two sets of functions that have similar behavior. The first
  46      set operates on a PQF parser handle; the second set does not. The
  47      first set is more flexible than the second. The second set
  48      is obsolete and is only provided to ensure backwards compatibility.
  49     </para>
  50     <para>
  51      The first set of functions all operate on a PQF parser handle:
  52     </para>
  53     <synopsis>
  54      #include &lt;yaz/pquery.h&gt;
  55
  56      YAZ_PQF_Parser yaz_pqf_create (void);
  57
  58      void yaz_pqf_destroy (YAZ_PQF_Parser p);
  59
  60      Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf);
  61
  62      Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o,
  63                           Odr_oid **attributeSetId, const char *qbuf);
  64
  65
  66      int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off);
  67     </synopsis>
  68     <para>
  69      A PQF parser is created and destroyed by the functions
  70      <function>yaz_pqf_create</function> and
  71      <function>yaz_pqf_destroy</function> respectively.
  72      The function <function>yaz_pqf_parse</function> parses the query given
  73      by the string <literal>qbuf</literal>. If parsing succeeds,
  74      a Z39.50 RPN query is returned, allocated using the ODR stream
  75      <literal>o</literal>. If parsing fails, a NULL pointer is
  76      returned.
  77      The function <function>yaz_pqf_scan</function> takes a scan query in
  78      <literal>qbuf</literal>. If parsing succeeds, the function
  79      returns a pointer to the attributes plus term and sets
  80      <literal>*attributeSetId</literal> to the attribute set for the
  81      scan request; both are allocated using the ODR stream <literal>o</literal>.
  82      If parsing fails, <function>yaz_pqf_scan</function> returns a NULL pointer.
  83      Error information for bad queries can be obtained by calling
  84      <function>yaz_pqf_error</function>, which returns an error code,
  85      sets <literal>*msg</literal> to point to an error description,
  86      and sets <literal>*off</literal> to the offset within the last
  87      query where parsing failed.
  88     </para>
  89     <para>
  90      The second set of functions is declared as follows:
  91     </para>
  92     <synopsis>
  93      #include &lt;yaz/pquery.h&gt;
  94
  95      Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
  96
  97      Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
  98                              Odr_oid **attributeSetP, const char *qbuf);
  99
 100      int p_query_attset (const char *arg);
 101     </synopsis>
 102     <para>
 103      The function <function>p_query_rpn()</function> takes as arguments an
 104      &odr; stream (see section <link linkend="odr">The ODR Module</link>)
 105      to provide a memory source (the structure created is released on
 106      the next call to <function>odr_reset()</function> on the stream), a
 107      protocol identifier (one of the constants <token>PROTO_Z3950</token> and
 108      <token>PROTO_SR</token>), and
 109      finally a null-terminated string holding the query string.
 110     </para>
 111     <para>
 112      If the parse went well, <function>p_query_rpn()</function> returns a
 113      pointer to a <literal>Z_RPNQuery</literal> structure which can be
 114      placed directly into a <literal>Z_SearchRequest</literal>.
 115      If parsing fails because of a syntax error, a NULL pointer is returned.
 116     </para>
 117     <para>
 118      The function <literal>p_query_attset</literal> specifies which attribute set
 119      to use if the query doesn't specify one with the
 120      <literal>@attrset</literal> operator.
 121      It returns 0 if the argument is a
 122      valid attribute set specifier; otherwise it returns -1.
 123     </para>
 124
 125     <para>
 126      The grammar of the PQF is as follows:
 127     </para>
 128
 129     <literallayout>
 130      query ::= top-set query-struct.
 131
 132      top-set ::= &lsqb; '@attrset' string &rsqb;
 133
 134      query-struct ::= attr-spec | simple | complex | '@term' term-type
 135
 136      attr-spec ::= '@attr' &lsqb; string &rsqb; string query-struct
 137
 138      complex ::= operator query-struct query-struct.
 139
 140      operator ::= '@and' | '@or' | '@not' | '@prox' proximity.
 141
 142      simple ::= result-set | term.
 143
 144      result-set ::= '@set' string.
 145
 146      term ::= string.
 147
 148      proximity ::= exclusion distance ordered relation which-code unit-code.
 149
 150      exclusion ::= '1' | '0' | 'void'.
 151
 152      distance ::= integer.
 153
 154      ordered ::= '1' | '0'.
 155
 156      relation ::= integer.
 157
 158      which-code ::= 'known' | 'private' | integer.
 159
 160      unit-code ::= integer.
 161
 162      term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'.
 163     </literallayout>
 164
 165     <para>
 166      You will note that the syntax above is a fairly faithful
 167      representation of RPN, except for the Attribute, which has been
 168      moved a step away from the term, allowing you to associate one or more
 169      attributes with an entire query structure. The parser will
 170      automatically apply the given attributes to each term as required.
 171     </para>
 172
 173     <para>
 174      The @attr operator is followed by an attribute specification
 175      (<literal>attr-spec</literal> above). The specification consists
 176      of an optional attribute set, an attribute type-value pair and
 177      a sub-query. The attribute type-value pair is packed in one string:
 178      an attribute type, an equals sign, and an attribute value.
 179      The type is always an integer, but the value may be either an
 180      integer or a string (if it doesn't start with a digit character).
 181     </para>
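         <para>
          The packed type-value string can be taken apart with a few lines
          of C. The following is a minimal sketch, not the code used by the
          pquery module: <literal>struct attr_pair</literal> and
          <function>parse_attr_pair</function> are hypothetical names
          introduced here, and the sketch assumes that type and value are
          separated by the <literal>=</literal> character, as in queries
          such as <literal>@attr 1=4 computer</literal>.
         </para>
         <programlisting><![CDATA[
```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for one parsed "type=value" pair. */
struct attr_pair {
    int type;              /* attribute type, always an integer */
    int is_numeric;        /* 1 if the value starts with a digit */
    long num_value;        /* numeric value, valid when is_numeric is 1 */
    const char *str_value; /* value text, points into the input string */
};

/* Split "type=value". Returns 0 on success, -1 on malformed input. */
int parse_attr_pair(const char *spec, struct attr_pair *out)
{
    char *end;
    long type;

    assert(spec != NULL && out != NULL);
    if (!isdigit((unsigned char) spec[0]))
        return -1;                      /* the type must be an integer */
    type = strtol(spec, &end, 10);
    if (*end != '=' || end[1] == '\0')
        return -1;                      /* missing '=' or empty value */
    out->type = (int) type;
    out->str_value = end + 1;
    out->is_numeric = isdigit((unsigned char) end[1]) ? 1 : 0;
    out->num_value = out->is_numeric ? strtol(end + 1, NULL, 10) : 0;
    return 0;
}
```
]]></programlisting>
         <para>
          With this sketch, <literal>1=4</literal> yields type 1 with the
          numeric value 4, while <literal>1=/book/title</literal> yields
          type 1 with a string value.
         </para>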
 182
 183     <para>
 184      Version 3 of the Z39.50 specification defines various encodings of terms.
 185      Use <literal>@term </literal> <replaceable>type</replaceable> to select one,
 186      where type is one of <literal>general</literal>,
 187      <literal>numeric</literal>, <literal>string</literal>
 188      (for InternationalString), and so on (see <literal>term-type</literal> in the grammar above).
 189      If no term type is given, the <literal>general</literal> form
 190      is used, which is the only encoding allowed in both versions 2 and 3
 191      of the Z39.50 standard.
 192     </para>
 193
 194     <example><title>PQF queries</title>
 195
 196      <para>Queries using simple terms.
 197       <screen>
 198       dylan
 199       "bob dylan"
 200       </screen>
 201      </para>
 202      <para>Boolean operators.
 203       <screen>
 204        @or "dylan" "zimmerman"
 205        @and @or dylan zimmerman when
 206        @and when @or dylan zimmerman
 207       </screen>
 208      </para>
 209      <para>
 210       Reference to result sets.
 211       <screen>
 212        @set Result-1
 213        @and @set seta setb
 214       </screen>
 215      </para>
 216      <para>
 217       Attributes for terms.
 218       <screen>
 219        @attr 1=4 computer
 220        @attr 1=4 @attr 4=1 "self portrait"
 221        @attr exp1 @attr 1=1 CategoryList
 222        @attr gils 1=2008 Copenhagen
 223        @attr 1=/book/title computer
 224       </screen>
 225      </para>
 226      <para>
 227       Proximity.
 228       <screen>
 229        @prox 0 3 1 2 k 2 dylan zimmerman
 230        </screen>
 231       </para>
 232      <para>
 233       Specifying term type.
 234       <screen>
 235        @term string "a UTF-8 string, maybe?"
 236       </screen>
 237      </para>
 238      <para>Mixed queries
 239       <screen>
 240        @or @and bob dylan @set Result-1
 241
 242        @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
 243
 244        @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
 245       </screen>
 246      </para>
 247     </example>
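         <para>
          To see why a prefix format is simple to interpret, consider the
          following sketch, which rewrites the boolean part of a PQF query
          as a parenthesized infix string. It is an illustration only and
          covers a small subset of the grammar: the query is assumed to be
          split into tokens already, only <literal>@and</literal>,
          <literal>@or</literal> and <literal>@not</literal> are treated as
          operators, and <function>pqf_to_infix</function> is a
          hypothetical name, not part of &yaz;.
         </para>
         <programlisting><![CDATA[
```c
#include <assert.h>
#include <string.h>

/* Append src to dst without exceeding max bytes; returns 0 on success. */
static int append(char *dst, size_t max, const char *src)
{
    if (strlen(dst) + strlen(src) + 1 > max)
        return -1;
    strcat(dst, src);
    return 0;
}

/* Render the sub-query starting at tok[*pos]; advances *pos past it. */
static int render(const char **tok, int n, int *pos, char *out, size_t max)
{
    const char *t;

    if (*pos >= n)
        return -1;                  /* an operator is missing an operand */
    t = tok[(*pos)++];
    if (!strcmp(t, "@and") || !strcmp(t, "@or") || !strcmp(t, "@not")) {
        /* Prefix notation: the operator comes first, then two operands. */
        if (append(out, max, "(") || render(tok, n, pos, out, max))
            return -1;
        if (append(out, max, " ") || append(out, max, t + 1)
            || append(out, max, " "))
            return -1;
        if (render(tok, n, pos, out, max) || append(out, max, ")"))
            return -1;
        return 0;
    }
    return append(out, max, t);     /* anything else is a plain term */
}

/* Convert a token list to infix text; returns 0 on success, -1 on error. */
int pqf_to_infix(const char **tok, int n, char *out, size_t max)
{
    int pos = 0;

    assert(tok != NULL && out != NULL && max > 0);
    out[0] = '\0';
    if (render(tok, n, &pos, out, max))
        return -1;
    return pos == n ? 0 : -1;       /* leftover tokens are an error */
}
```
]]></programlisting>
         <para>
          Given the tokens of <literal>@and @or dylan zimmerman when</literal>,
          the sketch produces
          <literal>((dylan or zimmerman) and when)</literal>. One recursive
          function with no operator precedence table is all that is needed.
         </para>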
 248    </sect2>
 249    <sect2 id="CCL"><title>Common Command Language</title>
 250
 251     <para>
 252      Not all users enjoy typing in prefix query structures and numerical
 253      attribute values, even in a minimalistic test client. In the library
 254      world, the more intuitive Common Command Language (or ISO 8777) has
 255      enjoyed some popularity - especially before the widespread
 256      availability of graphical interfaces. It is still useful in
 257      applications where you for some reason or other need to provide a
 258      symbolic language for expressing boolean query structures.
 259     </para>
 260
 261     <para>
 262      The <ulink url="http://europagate.dtv.dk/">EUROPAGATE</ulink>
 263      research project working under the Libraries programme
 264      of the European Commission's DG XIII has, amongst other useful tools,
 265      implemented a general-purpose CCL parser which produces an output
 266      structure that can be trivially converted to the internal RPN
 267      representation of &yaz; (the <literal>Z_RPNQuery</literal> structure).
 268      Since the CCL utility, along with the rest of the software
 269      produced by EUROPAGATE, is made freely available under a liberal
 270      license, it is included as a supplement to &yaz;.
 271     </para>
 272
 273     <sect3><title>CCL Syntax</title>
 274
 275      <para>
 276       The CCL parser obeys the following grammar for the FIND argument.
 277       The syntax is annotated by the lines prefixed by
 278       <literal>&dash;&dash;</literal>.
 279      </para>
 280
 281      <screen>
 282       CCL-Find ::= CCL-Find Op Elements
 283                 | Elements.
 284
 285       Op ::= "and" | "or" | "not"
 286       -- The above means that Elements are separated by boolean operators.
 287
 288       Elements ::= '(' CCL-Find ')'
 289                 | Set
 290                 | Terms
 291                 | Qualifiers Relation Terms
 292                 | Qualifiers Relation '(' CCL-Find ')'
 293                 | Qualifiers '=' string '-' string
 294       -- Elements is either a recursive definition, a result set reference, a
 295       -- list of terms, qualifiers followed by terms, qualifiers followed
 296       -- by a recursive definition or qualifiers in a range (lower - upper).
 297
 298       Set ::= 'set' '=' string
 299       -- Reference to a result set
 300
 301       Terms ::= Terms Prox Term
 302              | Term
 303       -- Proximity of terms.
 304
 305       Term ::= Term string
 306             | string
 307       -- This basically means that a term may include a blank
 308
 309       Qualifiers ::= Qualifiers ',' string
 310                   | string
 311       -- Qualifiers is a list of strings separated by comma
 312
 313       Relation ::= '=' | '>=' | '&lt;=' | '&lt;>' | '>' | '&lt;'
 314       -- Relational operators. This really doesn't follow the ISO8777
 315       -- standard.
 316
 317       Prox ::= '%' | '!'
 318       -- Proximity operator
 319
 320      </screen>
 321
 322      <para>
 323       The following queries are all valid:
 324      </para>
 325
 326      <screen>
 327       dylan
 328
 329       "bob dylan"
 330
 331       dylan or zimmerman
 332
 333       set=1
 334
 335       (dylan and bob) or set=1
 336
 337      </screen>
 338      <para>
 339       Assuming that the qualifiers <literal>ti</literal>, <literal>au</literal>
 340       and <literal>date</literal> are defined we may use:
 341      </para>
 342
 343      <screen>
 344       ti=self portrait
 345
 346       au=(bob dylan and slow train coming)
 347
 348       date>1980 and (ti=((self portrait)))
 349
 350      </screen>
 351
 352     </sect3>
 353     <sect3><title>CCL Qualifiers</title>
 354
 355      <para>
 356       Qualifiers are used to direct the search to a particular searchable
 357       index, such as title (ti) and author indexes (au). The CCL standard
 358       itself doesn't specify a particular set of qualifiers, but it does
 359       suggest a few short-hand notations. You can customize the CCL parser
 360       to support a particular set of qualifiers to reflect the current target
 361       profile. Traditionally, a qualifier would map to a particular
 362       use-attribute within the BIB-1 attribute set. However, you could also
 363       define qualifiers that would set, for example, the
 364       structure-attribute.
 365      </para>
 366
 367      <para>
 368       Consider a scenario where the target supports ranked searches in the
 369       title-index. In this case, the user could specify
 370      </para>
 371
 372      <screen>
 373       ti,ranked=knuth computer
 374      </screen>
 375      <para>
 376       and <literal>ranked</literal> would map to relation=relevance
 377       (2=102) while <literal>ti</literal> would map to title (1=4).
 378      </para>
 379
 380      <para>
 381       A "profile" with a set of predefined CCL qualifiers can be read from a
 382       file. The YAZ client reads its CCL qualifiers from a file named
 383       <filename>default.bib</filename>. Each line in the file has the form:
 384      </para>
 385
 386      <para>
 387       <replaceable>qualifier-name</replaceable>
 388       <replaceable>type</replaceable>=<replaceable>val</replaceable>
 389       <replaceable>type</replaceable>=<replaceable>val</replaceable> ...
 390      </para>
 391
 392      <para>
 393       where <replaceable>qualifier-name</replaceable> is the name of the
 394       qualifier to be used (e.g. <literal>ti</literal>),
 395       <replaceable>type</replaceable> is a BIB-1 category type and
 396       <replaceable>val</replaceable> is the corresponding BIB-1 attribute
 397       value.
 398       The <replaceable>type</replaceable> is either numeric or one of
 399       <literal>u</literal> (use), <literal>r</literal> (relation),
 400       <literal>p</literal> (position), <literal>s</literal> (structure),
 401       <literal>t</literal> (truncation) or <literal>c</literal> (completeness).
 402       The <replaceable>qualifier-name</replaceable> <literal>term</literal>
 403       has a special meaning.
 404       The types and values for this definition are used when
 405       <emphasis>no</emphasis> qualifiers are present.
 406      </para>
 407
 408      <para>
 409       Consider the following definition:
 410      </para>
 411
 412      <screen>
 413       ti       u=4 s=1
 414       au       u=1 s=1
 415       term     s=105
 416      </screen>
 417      <para>
 418       Two qualifiers are defined, <literal>ti</literal> and
 419       <literal>au</literal>.
 420       They both set the structure-attribute to phrase (1).
 421       <literal>ti</literal>
 422       sets the use-attribute to 4. <literal>au</literal> sets the
 423       use-attribute to 1.
 424       When no qualifiers are used in the query the structure-attribute is
 425       set to free-form-text (105).
 426      </para>
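          <para>
           The line format is simple enough to parse by hand. The sketch
           below reads one profile line into a small structure and
           translates the one-letter types into BIB-1 attribute type
           numbers (use is 1, relation 2, position 3, structure 4,
           truncation 5, completeness 6). It is not the parser used by
           <function>ccl_qual_file</function>; the names
           <literal>struct qual_def</literal>,
           <function>parse_qual_line</function> and the limit of eight
           pairs per line are assumptions made for the example.
          </para>
          <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAIRS 8

/* Hypothetical in-memory form of one profile line. */
struct qual_def {
    char name[32];          /* qualifier name, e.g. "ti" */
    int n;                  /* number of type=val pairs found */
    int type[MAX_PAIRS];    /* numeric attribute types */
    int value[MAX_PAIRS];   /* attribute values */
};

/* Map "u", "r", "p", "s", "t", "c" or a number to a numeric type. */
static int type_code(const char *tok)
{
    static const char letters[] = "urpstc";   /* types 1..6, in order */
    const char *p;
    char *end;
    long v;

    if (tok[0] != '\0' && tok[1] == '\0'
        && (p = strchr(letters, tok[0])) != NULL)
        return (int) (p - letters) + 1;
    v = strtol(tok, &end, 10);
    if (end == tok || *end != '\0' || v <= 0)
        return -1;
    return (int) v;
}

/* Parse "name type=val type=val ...". Returns 0 on success, -1 on error. */
int parse_qual_line(const char *line, struct qual_def *q)
{
    char tok[16];
    int value, used, type;

    assert(line != NULL && q != NULL);
    q->n = 0;
    if (sscanf(line, " %31s%n", q->name, &used) != 1)
        return -1;
    line += used;
    while (sscanf(line, " %15[^= \t\n]=%d%n", tok, &value, &used) == 2) {
        type = type_code(tok);
        if (type < 0 || q->n == MAX_PAIRS)
            return -1;
        q->type[q->n] = type;
        q->value[q->n] = value;
        q->n++;
        line += used;
    }
    line += strspn(line, " \t\r\n");
    /* Trailing text that is not a type=val pair makes the line invalid. */
    return (*line == '\0' && q->n > 0) ? 0 : -1;
}
```
]]></programlisting>
          <para>
           For the line <literal>ti u=4 s=1</literal> this gives the pairs
           (1, 4) and (4, 1), which correspond to
           <literal>@attr 1=4 @attr 4=1</literal> in PQF.
          </para>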
 427
 428     </sect3>
 429     <sect3><title>CCL API</title>
 430      <para>
 431       All public definitions can be found in the header file
 432       <filename>ccl.h</filename>. A profile is identified by a handle of type
 433       <literal>CCL_bibset</literal>. A profile must be created with a call
 434       to the function <function>ccl_qual_mk</function>, which returns
 435       such a handle.
 436      </para>
 437
 438      <para>
 439       To read a file containing qualifier definitions the function
 440       <function>ccl_qual_file</function> may be convenient. This function
 441       takes an already opened <literal>FILE</literal> handle pointer as
 442       argument along with a <literal>CCL_bibset</literal> handle.
 443      </para>
 444
 445      <para>
 446       To parse a simple string with a FIND query use the function
 447      </para>
 448      <screen>
 449 struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
 450                                    int *error, int *pos);
 451      </screen>
 452      <para>
 453       which takes the CCL profile (<literal>bibset</literal>) and query
 454       (<literal>str</literal>) as input. Upon successful completion the RPN
 455       tree is returned. If an error occurs, such as a syntax error, the integer
 456       pointed to by <literal>error</literal> holds the error code and
 457       the integer pointed to by <literal>pos</literal> holds the offset
 458       inside the query string at which the parsing failed.
 459      </para>
 460
 461      <para>
 462       An English representation of the error may be obtained by calling
 463       the <literal>ccl_err_msg</literal> function. The error codes are
 464       listed in <filename>ccl.h</filename>.
 465      </para>
 466
 467      <para>
 468       To convert the CCL RPN tree (type
 469       <literal>struct ccl_rpn_node *</literal>)
 470       to the Z_RPNQuery of YAZ, the function <function>ccl_rpn_query</function>
 471       must be used. This function, which is part of YAZ, is implemented in
 472       <filename>yaz-ccl.c</filename>.
 473       After calling this function the CCL RPN tree is probably no longer
 474       needed. The function <literal>ccl_rpn_delete</literal> destroys the CCL RPN tree.
 475      </para>
 476
 477      <para>
 478       A CCL profile may be destroyed by calling the
 479       <function>ccl_qual_rm</function> function.
 480      </para>
 481
 482      <para>
 483       The token names for the CCL operators may be changed by setting the
 484       globals (all of type <literal>char *</literal>)
 485       <literal>ccl_token_and</literal>, <literal>ccl_token_or</literal>,
 486       <literal>ccl_token_not</literal> and <literal>ccl_token_set</literal>.
 487       An operator may have aliases, i.e. there may be more than one name for
 488       the operator. To do this, separate each alias with a space character.
 489      </para>
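          <para>
           As an illustration of the alias convention, the sketch below
           tests a word against a space-separated list of operator names.
           <function>token_matches</function> is a hypothetical helper,
           not a CCL function, and <literal>og</literal> is just an example
           alias. The comparison is exact; whether the CCL parser itself
           ignores case is not stated here, so that choice is an
           assumption.
          </para>
          <programlisting><![CDATA[
```c
#include <assert.h>
#include <string.h>

/* Return 1 if word equals one of the space-separated names in aliases. */
int token_matches(const char *aliases, const char *word)
{
    size_t wlen;

    assert(aliases != NULL && word != NULL);
    wlen = strlen(word);
    if (wlen == 0)
        return 0;
    while (*aliases != '\0') {
        size_t len = strcspn(aliases, " ");   /* length of this alias */

        if (len == wlen && memcmp(aliases, word, len) == 0)
            return 1;
        aliases += len;
        aliases += strspn(aliases, " ");      /* skip the separator(s) */
    }
    return 0;
}
```
]]></programlisting>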
 490     </sect3>
 491    </sect2>
 492    <sect2 id="tools.cql"><title>CQL</title>
 493     <para>
 494      <ulink url="http://www.loc.gov/z3950/agency/zing/cql/">CQL</ulink>
 495       - Common Query Language - was defined for the
 496      <ulink url="http://www.loc.gov/z3950/agency/zing/srw/">SRW</ulink>
 497      protocol.
 498      In many ways CQL has a similar syntax to CCL.
 499      The objective of CQL is different. Where CCL aims to be
 500      an end-user language, CQL is <emphasis>the</emphasis> protocol
 501      query language for SRW.
 502     </para>
 503     <tip>
 504      <para>
 505       If you are new to CQL, read the
 506       <ulink url="http://zing.z3950.org/cql/intro.html">Gentle
 507        Introduction</ulink>.
 508      </para>
 509     </tip>
 510     <para>
 511      The CQL parser in &yaz; provides the following:
 512      <itemizedlist>
 513       <listitem>
 514        <para>
 515         It parses and validates a CQL query.
 516        </para>
 517       </listitem>
 518       <listitem>
 519        <para>
 520         It generates a C structure that allows you to convert
 521         a CQL query to some other query language, such as SQL.
 522        </para>
 523       </listitem>
 524       <listitem>
 525        <para>
 526         The parser converts a valid CQL query to PQF, thus providing a
 527         way to use CQL for both SRW/SRU servers and Z39.50 targets at the
 528         same time.
 529        </para>
 530       </listitem>
 531       <listitem>
 532        <para>
 533         The parser converts CQL to
 534         <ulink url="http://www.loc.gov/z3950/agency/zing/cql/xcql.html">
 535          XCQL</ulink>.
 536         XCQL is an XML representation of CQL.
 537         XCQL is part of the SRW specification. However, since SRU
 538         supports CQL only, we don't expect XCQL to be widely used.
 539         Furthermore, CQL has the advantage over XCQL that it is
 540         easy to read.
 541        </para>
 542       </listitem>
 543      </itemizedlist>
 544     </para>
 545     <sect3 id="tools.cql.parsing"><title>CQL parsing</title>
 546      <para>
 547       A CQL parser is represented by the <literal>CQL_parser</literal>
 548       handle. Its contents should be considered &yaz; internal (private).
 549       <synopsis>
 550 #include &lt;yaz/cql.h&gt;
 551
 552 typedef struct cql_parser *CQL_parser;
 553
 554 CQL_parser cql_parser_create(void);
 555 void cql_parser_destroy(CQL_parser cp);
 556       </synopsis>
 557      A parser is created by <function>cql_parser_create</function> and
 558      is destroyed by <function>cql_parser_destroy</function>.
 559      </para>
 560      <para>
 561       To parse a CQL query string, the following function
 562       is provided:
 563       <synopsis>
 564 int cql_parser_string(CQL_parser cp, const char *str);
 565       </synopsis>
 566       A CQL query is parsed by the function <function>cql_parser_string</function>,
 567       which takes a query <parameter>str</parameter>.
 568       If the query is valid (no syntax errors), zero is returned;
 569       otherwise a non-zero error code is returned.
 570      </para>
 571      <para>
 572       <synopsis>
 573 int cql_parser_stream(CQL_parser cp,
 574                       int (*getbyte)(void *client_data),
 575                       void (*ungetbyte)(int b, void *client_data),
 576                       void *client_data);
 577
 578 int cql_parser_stdio(CQL_parser cp, FILE *f);
 579       </synopsis>
 580       The functions <function>cql_parser_stream</function> and
 581       <function>cql_parser_stdio</function> parse a CQL query
 582       just like <function>cql_parser_string</function>.
 583       The only difference is that the CQL query can be
 584       fed to the parser in different ways.
 585       The <function>cql_parser_stream</function> function uses a generic
 586       byte stream as input. The <function>cql_parser_stdio</function> function
 587       uses a <literal>FILE</literal> handle which is opened for reading.
 588      </para>
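          <para>
           The two callbacks taken by
           <function>cql_parser_stream</function> might look as follows
           when the query lives in a string. This is a sketch with the
           callback signatures from the synopsis above;
           <literal>struct str_source</literal>,
           <function>str_getbyte</function> and
           <function>str_ungetbyte</function> are hypothetical names, and
           returning 0 at the end of input is an assumption, since the
           text does not say which value the parser expects there.
          </para>
          <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client_data: a NUL-terminated query plus a read position. */
struct str_source {
    const char *buf;
    size_t pos;
};

/* Matches the getbyte parameter: return the next byte, or 0 at the end
   (the end-of-input value is an assumption of this sketch). */
int str_getbyte(void *client_data)
{
    struct str_source *src = client_data;

    assert(src != NULL && src->buf != NULL);
    if (src->buf[src->pos] == '\0')
        return 0;
    return (unsigned char) src->buf[src->pos++];
}

/* Matches the ungetbyte parameter: push the last byte back by stepping
   the read position; b itself is already stored in the buffer. */
void str_ungetbyte(int b, void *client_data)
{
    struct str_source *src = client_data;

    assert(src != NULL);
    if (b != 0 && src->pos > 0)
        src->pos--;
}
```
]]></programlisting>
          <para>
           With a parser handle <literal>cp</literal> and a
           <literal>struct str_source src</literal>, the call would then be
           <literal>cql_parser_stream(cp, str_getbyte, str_ungetbyte, &amp;src)</literal>.
          </para>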
 589     </sect3>
 590
 591     <sect3 id="tools.cql.tree"><title>CQL tree</title>
 592      <para>
 593       If the query string is valid, the CQL parser
 594       generates a tree representing the structure of the
 595       CQL query.
 596      </para>
 597      <para>
 598       <synopsis>
 599 struct cql_node *cql_parser_result(CQL_parser cp);
 600       </synopsis>
 601       <function>cql_parser_result</function> returns
 602       a pointer to the root node of the resulting tree.
 603      </para>
 604      <para>
 605       Each node in a CQL tree is represented by a
 606       <literal>struct cql_node</literal>.
 607       It is defined as follows:
 608       <synopsis>
 609 #define CQL_NODE_ST 1
 610 #define CQL_NODE_BOOL 2
 611 #define CQL_NODE_MOD 3
 612 struct cql_node {
 613     int which;
 614     union {
 615         struct {
 616             char *index;
 617             char *term;
 618             char *relation;
 619             struct cql_node *modifiers;
 620             struct cql_node *prefixes;
 621         } st;
 622         struct {
 623             char *value;
 624             struct cql_node *left;
 625             struct cql_node *right;
 626             struct cql_node *modifiers;
 627             struct cql_node *prefixes;
 628         } boolean;
 629         struct {
 630             char *name;
 631             char *value;
 632             struct cql_node *next;
 633         } mod;
 634     } u;
 635 };
 636       </synopsis>
 637       There are three kinds of nodes, search term (ST), boolean (BOOL),
 638       and modifier (MOD).
 639      </para>
 640      <para>
 641       The search term node has five members:
 642       <itemizedlist>
 643        <listitem>
 644         <para>
 645          <literal>index</literal>: index for search term.
 646          If an index is unspecified for a search term,
 647          <literal>index</literal> will be NULL.
 648         </para>
 649        </listitem>
 650        <listitem>
 651         <para>
 652          <literal>term</literal>: the search term itself.
 653         </para>
 654        </listitem>
 655        <listitem>
 656         <para>
 657          <literal>relation</literal>: relation for search term.
 658         </para>
 659        </listitem>
 660        <listitem>
 661         <para>
 662          <literal>modifiers</literal>: relation modifiers for search
 663          term. The <literal>modifiers</literal> is a simple linked
 664          list (NULL for last entry). Each relation modifier node
 665          is of type <literal>MOD</literal>.
 666         </para>
 667        </listitem>
 668        <listitem>
 669         <para>
 670          <literal>prefixes</literal>: index prefixes for search
 671          term. The <literal>prefixes</literal> is a simple linked
 672          list (NULL for last entry). Each prefix node
 673          is of type <literal>MOD</literal>.
 674         </para>
 675        </listitem>
 676       </itemizedlist>
 677      </para>
 678
 679      <para>
 680       The boolean node represents <literal>and</literal>,
 681       <literal>or</literal>, <literal>not</literal> as well as
 682       proximity.
 683       <itemizedlist>
 684        <listitem>
 685         <para>
 686          <literal>left</literal> and <literal>right</literal>: the left
 687          and right operands respectively.
 688         </para>
 689        </listitem>
 690        <listitem>
 691         <para>
 692          <literal>modifiers</literal>: proximity arguments.
 693         </para>
 694        </listitem>
 695        <listitem>
 696         <para>
 697          <literal>prefixes</literal>: index prefixes.
 698          The <literal>prefixes</literal> is a simple linked
 699          list (NULL for last entry). Each prefix node
 700          is of type <literal>MOD</literal>.
 701         </para>
 702        </listitem>
 703       </itemizedlist>
 704      </para>
 705
 706      <para>
 707       The modifier node is a "utility" node used for name-value pairs,
 708       such as prefixes, proximity arguments, etc.
 709       <itemizedlist>
 710        <listitem>
 711         <para>
 712          <literal>name</literal>: name of mod node.
 713         </para>
 714        </listitem>
 715        <listitem>
 716         <para>
 717          <literal>value</literal>: value of mod node.
 718         </para>
 719        </listitem>
 720        <listitem>
 721         <para>
 722          <literal>next</literal>: pointer to next node which is
 723          always a mod node (NULL for last entry).
 724         </para>
 725        </listitem>
 726       </itemizedlist>
 727      </para>
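          <para>
           A tree of this shape is typically processed with one recursive
           function that switches on <literal>which</literal>. The sketch
           below renders a tree as infix text. It repeats the structure
           definition so that it compiles without the &yaz; headers, and it
           assumes that <literal>value</literal> in a boolean node holds
           the operator name, which the description above does not state.
           <function>dump_node</function> is a hypothetical name, and
           modifier and prefix lists are ignored for brevity.
          </para>
          <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CQL_NODE_ST 1
#define CQL_NODE_BOOL 2
#define CQL_NODE_MOD 3

/* Local copy of the documented structure, so that the sketch compiles
   without the YAZ headers. */
struct cql_node {
    int which;
    union {
        struct {
            char *index;
            char *term;
            char *relation;
            struct cql_node *modifiers;
            struct cql_node *prefixes;
        } st;
        struct {
            char *value;
            struct cql_node *left;
            struct cql_node *right;
            struct cql_node *modifiers;
            struct cql_node *prefixes;
        } boolean;
        struct {
            char *name;
            char *value;
            struct cql_node *next;
        } mod;
    } u;
};

/* Append s to the NUL-terminated text in out, truncating at max bytes. */
static void put(char *out, size_t max, const char *s)
{
    size_t used = strlen(out);

    if (s != NULL && used + 1 < max)
        snprintf(out + used, max - used, "%s", s);
}

/* Append an infix rendering of the tree rooted at n to out. */
void dump_node(const struct cql_node *n, char *out, size_t max)
{
    assert(out != NULL && max > 0);
    if (n == NULL)
        return;
    switch (n->which) {
    case CQL_NODE_ST:
        if (n->u.st.index != NULL) {        /* NULL means no index given */
            put(out, max, n->u.st.index);
            put(out, max, " ");
            put(out, max, n->u.st.relation);
            put(out, max, " ");
        }
        put(out, max, n->u.st.term);
        break;
    case CQL_NODE_BOOL:
        put(out, max, "(");
        dump_node(n->u.boolean.left, out, max);
        put(out, max, " ");
        put(out, max, n->u.boolean.value);  /* assumed: operator name */
        put(out, max, " ");
        dump_node(n->u.boolean.right, out, max);
        put(out, max, ")");
        break;
    case CQL_NODE_MOD:
        put(out, max, n->u.mod.name);
        put(out, max, "=");
        put(out, max, n->u.mod.value);
        break;
    }
}
```
]]></programlisting>
          <para>
           The caller passes a buffer that already holds a NUL-terminated
           string (usually empty); the rendering is appended to it.
          </para>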
 728
 729     </sect3>
 730     <sect3 id="tools.cql.pqf"><title>CQL to PQF conversion</title>
 731      <para>
 732       Conversion to PQF (and Z39.50 RPN) is complicated by the fact
 733       that the resulting RPN depends on the Z39.50 target
 734       capabilities (combinations of supported attributes).
 735       In addition, CQL and SRW operate on index prefixes
 736       (URIs or strings), whereas RPN uses Object Identifiers
 737       for attribute sets.
 738      </para>
 739      <para>
 740       The CQL library of &yaz; defines a <literal>cql_transform_t</literal>
 741       type. It represents a particular mapping between CQL and RPN.
 742       This handle is created and destroyed by the functions:
 743      <synopsis>
 744 cql_transform_t cql_transform_open_FILE (FILE *f);
 745 cql_transform_t cql_transform_open_fname(const char *fname);
 746 void cql_transform_close(cql_transform_t ct);
 747       </synopsis>
 748       The first two functions create a transformation handle from
 749       either an already open FILE or a filename, respectively.
 750      </para>
 751      <para>
 752       The handle is destroyed by <function>cql_transform_close</function>,
 753       after which the handle must not be referenced again.
 754      </para>
 755      <para>
 756       When a <literal>cql_transform_t</literal> handle has been created
 757       you can convert to RPN.
 758       <synopsis>
 759 int cql_transform_buf(cql_transform_t ct,
 760                       struct cql_node *cn, char *out, int max);
 761       </synopsis>
 762       This function converts the CQL tree <literal>cn</literal>
 763       using handle <literal>ct</literal>.
 764       For the resulting PQF, you supply a buffer <literal>out</literal>
 765       which must be able to hold at least <literal>max</literal>
 766       characters.
 767      </para>
 768      <para>
 769       If conversion failed, <function>cql_transform_buf</function>
 770       returns a non-zero error code; otherwise zero is returned
 771       (conversion successful).
 772      </para>
 773      <para>
 774       If you wish to be able to produce a PQF result in a different
 775       way, there are two alternatives.
 776       <synopsis>
 777 void cql_transform_pr(cql_transform_t ct,
 778                       struct cql_node *cn,
 779                       void (*pr)(const char *buf, void *client_data),
 780                       void *client_data);
 781
 782 int cql_transform_FILE(cql_transform_t ct,
 783                        struct cql_node *cn, FILE *f);
 784       </synopsis>
 785       The former function produces output to a user-defined
 786       output stream. The latter writes the result to an already
 787       open <literal>FILE</literal>.
 788      </para>
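          <para>
           The <literal>pr</literal> callback of
           <function>cql_transform_pr</function> receives the PQF output
           piece by piece. A minimal sketch of a callback that collects the
           pieces in a growing heap buffer is shown below;
           <literal>struct out_buf</literal> and
           <function>collect_pr</function> are hypothetical names, and each
           <literal>buf</literal> passed to the callback is assumed to be a
           NUL-terminated string.
          </para>
          <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical client_data: heap buffer holding the output so far. */
struct out_buf {
    char *data;     /* NUL-terminated once anything has been added */
    size_t len;     /* number of bytes stored, excluding the NUL */
};

/* Matches the pr parameter: append one piece of output to the buffer. */
void collect_pr(const char *buf, void *client_data)
{
    struct out_buf *ob = client_data;
    size_t add;
    char *grown;

    assert(buf != NULL && ob != NULL);
    add = strlen(buf);
    grown = realloc(ob->data, ob->len + add + 1);
    if (grown == NULL)
        return;                 /* keep what we have if memory runs out */
    memcpy(grown + ob->len, buf, add + 1);
    ob->data = grown;
    ob->len += add;
}
```
]]></programlisting>
          <para>
           With a transform handle <literal>ct</literal> and a tree
           <literal>cn</literal>, the call would be
           <literal>cql_transform_pr(ct, cn, collect_pr, &amp;ob)</literal>,
           where <literal>ob</literal> starts out as
           <literal>{ NULL, 0 }</literal> and its <literal>data</literal>
           member is released with <function>free</function> afterwards.
          </para>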
 789     </sect3>
 790     <sect3 id="tools.cql.map">
 791      <title>Specification of CQL to RPN mapping</title>
 792      <para>
 793       The file supplied to the functions
 794       <function>cql_transform_open_FILE</function> and
 795       <function>cql_transform_open_fname</function> follows
 796       a structure found in many Unix utilities.
 797       It consists of mapping specifications - one per line.
 798       Lines starting with <literal>#</literal> are ignored (comments).
 799      </para>
 800      <para>
 801       Each line is of the form
 802       <literallayout>
 803        <replaceable>CQL pattern</replaceable><literal> = </literal> <replaceable> RPN equivalent</replaceable>
 804       </literallayout>
 805      </para>
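     <para>
      The line format can be sketched as follows. This is a minimal
      illustration, not the parser used by &yaz;; the function name
      <function>split_mapping_line</function> is hypothetical, and the
      sketch assumes that leading and trailing blanks are insignificant
      and that the first <literal>=</literal> on a line is the separator.
     </para>
     <programlisting><![CDATA[
```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Split one mapping line into its CQL pattern and right-hand side.
   Returns 1 if a mapping was found, 0 for blank lines and comments,
   and -1 for a malformed line (no '=', empty pattern, or overflow). */
int split_mapping_line(const char *line, char *pattern, size_t pattern_max,
                       char *value, size_t value_max)
{
    const char *eq, *end;
    size_t n;

    assert(line && pattern && value);
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0' || *line == '#')
        return 0;

    /* The first '=' separates pattern from value; later ones belong
       to the value (for example the '=' in "1=4"). */
    eq = strchr(line, '=');
    if (!eq)
        return -1;

    /* Pattern: text before '=', with trailing blanks removed. */
    end = eq;
    while (end > line && isspace((unsigned char)end[-1]))
        end--;
    n = (size_t)(end - line);
    if (n == 0 || n >= pattern_max)
        return -1;
    memcpy(pattern, line, n);
    pattern[n] = '\0';

    /* Value: text after '=', with surrounding blanks removed. */
    eq++;
    while (isspace((unsigned char)*eq))
        eq++;
    end = eq + strlen(eq);
    while (end > eq && isspace((unsigned char)end[-1]))
        end--;
    n = (size_t)(end - eq);
    if (n >= value_max)
        return -1;
    memcpy(value, eq, n);
    value[n] = '\0';
    return 1;
}
```
]]></programlisting>
     <para>
      Splitting at the first <literal>=</literal> is also why a CQL
      pattern cannot itself contain that character.
     </para>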
 806      <para>
 807       An RPN pattern is a simple attribute list. Each attribute pair
 808       takes the form:
 809       <literallayout>
 810        [<replaceable>set</replaceable>] <replaceable>type</replaceable><literal>=</literal><replaceable>value</replaceable>
 811       </literallayout>
 812       The attribute <replaceable>set</replaceable> is optional.
 813       The <replaceable>type</replaceable> is the attribute type,
 814       <replaceable>value</replaceable> the attribute value.
 815      </para>
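     <para>
      To show how such an attribute list relates to PQF, the sketch
      below rewrites each pair with an <literal>@attr</literal> prefix.
      The function <function>attrs_to_pqf</function> is hypothetical;
      it assumes that a token without <literal>=</literal> is an
      attribute set name belonging to the pair that follows it. The
      converter in &yaz; may handle details differently.
     </para>
     <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrite an attribute list such as "3=3 6=1" as "@attr 3=3 @attr 6=1 ".
   A token without '=' is assumed to be an attribute set name that
   belongs to the pair following it. Returns 0 on success, -1 if the
   list is malformed or out is too small. */
int attrs_to_pqf(const char *attrs, char *out, size_t max)
{
    size_t used = 0;
    int pending_set = 0;      /* 1 after a set name, until its pair is seen */

    assert(attrs && out && max > 0);
    out[0] = '\0';
    while (*attrs) {
        size_t n = strcspn(attrs, " \t");
        int w;

        if (n == 0) {         /* skip blanks between tokens */
            attrs++;
            continue;
        }
        if (memchr(attrs, '=', n)) {
            /* type=value: add "@attr " unless a set name already did */
            w = snprintf(out + used, max - used, "%s%.*s ",
                         pending_set ? "" : "@attr ", (int)n, attrs);
            pending_set = 0;
        } else {
            if (pending_set)
                return -1;    /* two set names in a row */
            w = snprintf(out + used, max - used, "@attr %.*s ",
                         (int)n, attrs);
            pending_set = 1;
        }
        if (w < 0 || (size_t)w >= max - used)
            return -1;        /* output did not fit */
        used += (size_t)w;
        attrs += n;
    }
    return pending_set ? -1 : 0;   /* a trailing set name is an error */
}
```
]]></programlisting>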
 816      <para>
 817       The following CQL patterns are recognized:
 818       <variablelist>
 819        <varlistentry><term>
 820          <literal>qualifier.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
 821         </term>
 822         <listitem>
 823          <para>
          This pattern is invoked when a CQL qualifier, such as
          <literal>dc.title</literal>, is converted.
          <replaceable>set</replaceable> and <replaceable>name</replaceable>
          are the index set and qualifier name, respectively.
          Typically, the RPN specifies an equivalent use attribute.
 829          </para>
 830          <para>
 831           For terms not bound by a qualifier the pattern
 832           <literal>qualifier.srw.serverChoice</literal> is used.
 833           Here, the prefix <literal>srw</literal> is defined as
 834           <literal>http://www.loc.gov/zing/cql/srw-indexes/v1.0/</literal>.
 835           If this pattern is not defined, the mapping will fail.
 836          </para>
 837         </listitem>
 838        </varlistentry>
 839        <varlistentry><term>
 840          <literal>relation.</literal><replaceable>relation</replaceable>
 841         </term>
 842         <listitem>
 843          <para>
          This pattern specifies how a CQL relation is mapped to RPN.
          <replaceable>relation</replaceable> is the name of the relation
          operator. Since <literal>=</literal> is used as the
          separator between CQL pattern and RPN, CQL relations
          that include <literal>=</literal> cannot be
          used directly. To avoid a conflict, the names
          <literal>ge</literal>,
          <literal>eq</literal> and
          <literal>le</literal>
          must be used for the CQL operators greater-than-or-equal,
          equal and less-than-or-equal, respectively.
          The RPN pattern should include a relation attribute.
 856          </para>
 857          <para>
 858           For terms not bound by a relation, the pattern
 859           <literal>relation.scr</literal> is used. If the pattern
 860           is not defined, the mapping will fail.
 861          </para>
 862          <para>
          The special pattern <literal>relation.*</literal> is used
          when no other relation pattern is matched.
 865          </para>
 866         </listitem>
 867        </varlistentry>
 868
 869        <varlistentry><term>
 870          <literal>relationModifier.</literal><replaceable>mod</replaceable>
 871         </term>
 872         <listitem>
 873          <para>
 874           This pattern specifies how a CQL relation modifier is mapped to RPN.
 875           The RPN pattern is usually a relation attribute.
 876          </para>
 877         </listitem>
 878        </varlistentry>
 879
 880        <varlistentry><term>
 881          <literal>structure.</literal><replaceable>type</replaceable>
 882         </term>
 883         <listitem>
 884          <para>
 885           This pattern specifies how a CQL structure is mapped to RPN.
          Note that this CQL pattern is somewhat similar to
          the CQL pattern <literal>relation</literal>.
 888           The <replaceable>type</replaceable> is a CQL relation.
 889          </para>
 890          <para>
          The pattern <literal>structure.*</literal> is used
          when no other structure pattern is matched.
 893           Usually, the RPN equivalent specifies a structure attribute.
 894          </para>
 895         </listitem>
 896        </varlistentry>
 897
 898        <varlistentry><term>
 899          <literal>position.</literal><replaceable>type</replaceable>
 900         </term>
 901         <listitem>
 902          <para>
 903           This pattern specifies how the anchor (position) of
 904           CQL is mapped to RPN.
 905           The <replaceable>type</replaceable> is one
 906           of <literal>first</literal>, <literal>any</literal>,
 907           <literal>last</literal>, <literal>firstAndLast</literal>.
 908          </para>
 909          <para>
          The pattern <literal>position.*</literal> is used
          when no other position pattern is matched.
 912          </para>
 913         </listitem>
 914        </varlistentry>
 915
 916        <varlistentry><term>
 917          <literal>set.</literal><replaceable>prefix</replaceable>
 918         </term>
 919         <listitem>
 920          <para>
 921           This specification defines a CQL index set for a given prefix.
          The value on the right-hand side is the URI for the set,
          <emphasis>not</emphasis> RPN. All prefixes used in
 924           qualifier patterns must be defined this way.
 925          </para>
 926         </listitem>
 927        </varlistentry>
 928       </variablelist>
 929      </para>
 930      <example><title>Small CQL to RPN mapping file</title>
 931       <para>
 932        This small file defines two index sets, three qualifiers and three
 933        relations, a position pattern and a default structure.
 934       </para>
 935       <programlisting><![CDATA[
 936        set.srw    = http://www.loc.gov/zing/cql/srw-indexes/v1.0/
 937        set.dc     = http://www.loc.gov/zing/cql/dc-indexes/v1.0/
 938
 939        qualifier.srw.serverChoice = 1=1016
 940        qualifier.dc.title         = 1=4
 941        qualifier.dc.subject       = 1=21
 942
 943        relation.<                 = 2=1
 944        relation.eq                = 2=3
 945        relation.scr               = 2=3
 946
 947        position.any               = 3=3 6=1
 948
 949        structure.*                = 4=1
 950 ]]>
 951       </programlisting>
 952       <para>
 953        With the mappings above, the CQL query
 954        <screen>
 955         computer
 956        </screen>
 957        is converted to the PQF:
 958        <screen>
 959         @attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer"
 960        </screen>
 961        by rules <literal>qualifier.srw.serverChoice</literal>,
 962        <literal>relation.scr</literal>, <literal>structure.*</literal>,
 963        <literal>position.any</literal>.
 964       </para>
 965       <para>
 966        CQL query
 967        <screen>
 968         computer^
 969        </screen>
 970        is rejected, since <literal>position.right</literal> is
 971        undefined.
 972       </para>
 973       <para>
 974        CQL query
 975        <screen>
 976         >my = "http://www.loc.gov/zing/cql/dc-indexes/v1.0/" my.title = x
 977        </screen>
 978        is converted to
 979        <screen>
 980         @attr 1=4 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "x"
 981        </screen>
 982       </para>
 983      </example>
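     <para>
      The way a specific pattern takes precedence over its
      <literal>*</literal> counterpart can be sketched with a small
      lookup table. This is only an illustration of the selection rule
      described above; <literal>struct rule</literal>,
      <literal>example_rules</literal> and <function>find_rule</function>
      are hypothetical names, and the table simply repeats the example
      mapping file.
     </para>
     <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct rule {
    const char *pattern;   /* left-hand side, e.g. "relation.eq" */
    const char *rpn;       /* right-hand side attribute list */
};

/* The non-set rules from the example mapping file. */
const struct rule example_rules[] = {
    { "qualifier.srw.serverChoice", "1=1016" },
    { "qualifier.dc.title",         "1=4" },
    { "qualifier.dc.subject",       "1=21" },
    { "relation.<",                 "2=1" },
    { "relation.eq",                "2=3" },
    { "relation.scr",               "2=3" },
    { "position.any",               "3=3 6=1" },
    { "structure.*",                "4=1" },
    { NULL, NULL }
};

/* Look up "<category>.<name>"; if absent, fall back to "<category>.*".
   Returns the attribute list, or NULL when neither rule exists. */
const char *find_rule(const struct rule *rules,
                      const char *category, const char *name)
{
    char key[128];
    int pass;

    assert(rules && category && name);
    for (pass = 0; pass < 2; pass++) {
        const struct rule *r;
        int w = snprintf(key, sizeof key, "%s.%s",
                         category, pass == 0 ? name : "*");

        if (w < 0 || (size_t)w >= sizeof key)
            return NULL;       /* key too long to be a known pattern */
        for (r = rules; r->pattern; r++)
            if (strcmp(r->pattern, key) == 0)
                return r->rpn;
    }
    return NULL;
}
```
]]></programlisting>
     <para>
      A <literal>NULL</literal> result corresponds to the case where the
      query is rejected because no applicable pattern is defined.
     </para>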
 984     </sect3>
 985     <sect3 id="tools.cql.xcql"><title>CQL to XCQL conversion</title>
 986      <para>
      Conversion from CQL to XCQL is trivial and does not
      require a mapping to be defined.
      There are three functions to choose from, depending on
      how you wish to store the resulting output (an XML buffer
      containing XCQL).
 992       <synopsis>
 993 int cql_to_xml_buf(struct cql_node *cn, char *out, int max);
 994 void cql_to_xml(struct cql_node *cn,
 995                 void (*pr)(const char *buf, void *client_data),
 996                 void *client_data);
 997 void cql_to_xml_stdio(struct cql_node *cn, FILE *f);
 998       </synopsis>
      Function <function>cql_to_xml_buf</function> converts
      to XCQL and stores the result in a user-supplied buffer,
      <literal>out</literal>, of size <literal>max</literal>.
1002      </para>
1003      <para>
      <function>cql_to_xml</function> writes the result to
      a user-defined output stream.
      <function>cql_to_xml_stdio</function> writes to
      a file.
1008      </para>
1009     </sect3>
1010    </sect2>
1011   </sect1>
1012   <sect1 id="tools.oid"><title>Object Identifiers</title>
1013
1014    <para>
    The basic YAZ representation of an OID is an array of integers,
    terminated with the value -1. The &odr; module provides two
    utility functions to create and copy data elements of this type:
1018    </para>
1019
1020    <screen>
1021     Odr_oid *odr_getoidbystr(ODR o, char *str);
1022    </screen>
1023
1024    <para>
1025     Creates an OID based on a string-based representation using dots (.)
1026     to separate elements in the OID.
1027    </para>
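   <para>
    The relation between the dotted string and the integer array can be
    sketched as below. This is not the &yaz; implementation: the function
    <function>oid_from_dotted</function> is hypothetical, and unlike
    <function>odr_getoidbystr</function> it writes into storage supplied
    by the caller rather than allocating from an &odr; stream.
   </para>
   <programlisting><![CDATA[
```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Parse a dotted OID string into dst as an array of integers
   terminated by -1. dst has room for max ints, terminator included.
   Returns dst, or NULL on a syntax error or overflow. */
int *oid_from_dotted(const char *str, int *dst, size_t max)
{
    size_t n = 0;

    assert(str && dst);
    for (;;) {
        int v = 0;

        if (*str < '0' || *str > '9')
            return NULL;               /* each element needs a digit */
        while (*str >= '0' && *str <= '9') {
            if (v > (INT_MAX - 9) / 10)
                return NULL;           /* element too large for an int */
            v = v * 10 + (*str++ - '0');
        }
        if (n + 1 >= max)
            return NULL;               /* keep one slot for the -1 */
        dst[n++] = v;
        if (*str == '\0')
            break;
        if (*str != '.')
            return NULL;               /* only dots may separate elements */
        str++;
    }
    dst[n] = -1;
    return dst;
}
```
]]></programlisting>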
1028
1029    <screen>
1030     Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
1031    </screen>
1032
1033    <para>
1034     Creates a copy of the OID referenced by the <emphasis>o</emphasis>
1035     parameter.
1036     Both functions take an &odr; stream as parameter. This stream is used to
1037     allocate memory for the data elements, which is released on a
1038     subsequent call to <function>odr_reset()</function> on that stream.
1039    </para>
1040
1041    <para>
1042     The OID module provides a higher-level representation of the
1043     family of object identifiers which describe the Z39.50 protocol and its
1044     related objects. The definition of the module interface is given in
1045     the <filename>oid.h</filename> file.
1046    </para>
1047
1048    <para>
1049     The interface is mainly based on the <literal>oident</literal> structure.
1050     The definition of this structure looks like this:
1051    </para>
1052
1053    <screen>
1054 typedef struct oident
1055 {
1056     oid_proto proto;
1057     oid_class oclass;
1058     oid_value value;
1059     int oidsuffix[OID_SIZE];
1060     char *desc;
1061 } oident;
1062    </screen>
1063
1064    <para>
1065     The proto field takes one of the values
1066    </para>
1067
1068    <screen>
1069     PROTO_Z3950
1070     PROTO_SR
1071    </screen>
1072
1073    <para>
1074     If you don't care about talking to SR-based implementations (few
1075     exist, and they may become fewer still if and when the ISO SR and ANSI
1076     Z39.50 documents are merged into a single standard), you can ignore
1077     this field on incoming packages, and always set it to PROTO_Z3950
1078     for outgoing packages.
1079    </para>
1080    <para>
1081
1082     The oclass field takes one of the values
1083    </para>
1084
1085    <screen>
1086     CLASS_APPCTX
1087     CLASS_ABSYN
1088     CLASS_ATTSET
1089     CLASS_TRANSYN
1090     CLASS_DIAGSET
1091     CLASS_RECSYN
1092     CLASS_RESFORM
1093     CLASS_ACCFORM
1094     CLASS_EXTSERV
1095     CLASS_USERINFO
1096     CLASS_ELEMSPEC
1097     CLASS_VARSET
1098     CLASS_SCHEMA
1099     CLASS_TAGSET
1100     CLASS_GENERAL
1101    </screen>
1102
1103    <para>
1104     corresponding to the OID classes defined by the Z39.50 standard.
1105
1106     Finally, the value field takes one of the values
1107    </para>
1108
1109    <screen>
1110     VAL_APDU
1111     VAL_BER
1112     VAL_BASIC_CTX
1113     VAL_BIB1
1114     VAL_EXP1
1115     VAL_EXT1
1116     VAL_CCL1
1117     VAL_GILS
1118     VAL_WAIS
1119     VAL_STAS
1120     VAL_DIAG1
1121     VAL_ISO2709
1122     VAL_UNIMARC
1123     VAL_INTERMARC
1124     VAL_CCF
1125     VAL_USMARC
1126     VAL_UKMARC
1127     VAL_NORMARC
1128     VAL_LIBRISMARC
1129     VAL_DANMARC
1130     VAL_FINMARC
1131     VAL_MAB
1132     VAL_CANMARC
1133     VAL_SBN
1134     VAL_PICAMARC
1135     VAL_AUSMARC
1136     VAL_IBERMARC
1137     VAL_EXPLAIN
1138     VAL_SUTRS
1139     VAL_OPAC
1140     VAL_SUMMARY
1141     VAL_GRS0
1142     VAL_GRS1
1143     VAL_EXTENDED
1144     VAL_RESOURCE1
1145     VAL_RESOURCE2
1146     VAL_PROMPT1
1147     VAL_DES1
1148     VAL_KRB1
1149     VAL_PRESSET
1150     VAL_PQUERY
1151     VAL_PCQUERY
1152     VAL_ITEMORDER
1153     VAL_DBUPDATE
1154     VAL_EXPORTSPEC
1155     VAL_EXPORTINV
1156     VAL_NONE
1157     VAL_SETM
1158     VAL_SETG
1159     VAL_VAR1
1160     VAL_ESPEC1
1161    </screen>
1162
1163    <para>
1164     again, corresponding to the specific OIDs defined by the standard.
1165    </para>
1166
1167    <para>
1168     The desc field contains a brief, mnemonic name for the OID in question.
1169    </para>
1170
1171    <para>
1172     The function
1173    </para>
1174
1175    <screen>
1176     struct oident *oid_getentbyoid(int *o);
1177    </screen>
1178
1179    <para>
1180     takes as argument an OID, and returns a pointer to a static area
1181     containing an <literal>oident</literal> structure. You typically use
1182     this function when you receive a PDU containing an OID, and you wish
1183     to branch out depending on the specific OID value.
1184    </para>
1185
1186    <para>
1187     The function
1188    </para>
1189
1190    <screen>
1191     int *oid_ent_to_oid(struct oident *ent, int *dst);
1192    </screen>
1193
1194    <para>
    takes as argument an <literal>oident</literal> structure, in which
    the <literal>proto</literal>, <literal>oclass</literal>, and
    <literal>value</literal> fields are assumed to be set correctly,
    and returns a pointer to the buffer given by <literal>dst</literal>,
    which then contains the base
    representation of the corresponding OID. The function returns
    NULL, and leaves the array <literal>dst</literal> unchanged, if
    no mapping could be made.
    The array <literal>dst</literal> should be at least of size
    <literal>OID_SIZE</literal>.
1204    </para>
1205    <para>
1206
1207     The <function>oid_ent_to_oid()</function> function can be used whenever
1208     you need to prepare a PDU containing one or more OIDs. The separation of
1209     the <literal>protocol</literal> element from the remainder of the
1210     OID-description makes it simple to write applications that can
1211     communicate with either Z39.50 or OSI SR-based applications.
1212    </para>
1213
1214    <para>
1215     The function
1216    </para>
1217
1218    <screen>
1219     oid_value oid_getvalbyname(const char *name);
1220    </screen>
1221
1222    <para>
    takes as argument a mnemonic OID name, and returns the
    <literal>value</literal> field of the first entry in the database that
    contains the given name in its <literal>desc</literal> field.
1226    </para>
1227
1228    <para>
1229     Finally, the module provides the following utility functions, whose
1230     meaning should be obvious:
1231    </para>
1232
1233    <screen>
1234     void oid_oidcpy(int *t, int *s);
1235     void oid_oidcat(int *t, int *s);
1236     int oid_oidcmp(int *o1, int *o2);
1237     int oid_oidlen(int *o);
1238    </screen>
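   <para>
    For readers who prefer to see the intended behaviour spelled out,
    here is a minimal sketch of what such helpers might do on the
    -1 terminated representation. The <literal>sketch_</literal> names
    are hypothetical, and the ordering returned by
    <function>sketch_oidcmp</function> for unequal OIDs is an assumption;
    consult <filename>oid.h</filename> for the actual functions.
   </para>
   <programlisting><![CDATA[
```c
#include <assert.h>

/* Number of elements before the -1 terminator. */
int sketch_oidlen(const int *o)
{
    int n = 0;

    assert(o);
    while (o[n] != -1)
        n++;
    return n;
}

/* Copy s, terminator included, into t. t must be large enough. */
void sketch_oidcpy(int *t, const int *s)
{
    assert(t && s);
    while ((*t++ = *s++) != -1)
        ;
}

/* Append the elements of s to the OID already stored in t. */
void sketch_oidcat(int *t, const int *s)
{
    assert(t && s);
    sketch_oidcpy(t + sketch_oidlen(t), s);
}

/* Return 0 if equal, negative if o1 sorts first, positive otherwise.
   An OID that is a proper prefix of another sorts before it. */
int sketch_oidcmp(const int *o1, const int *o2)
{
    assert(o1 && o2);
    while (*o1 == *o2 && *o1 != -1) {
        o1++;
        o2++;
    }
    if (*o1 == *o2)
        return 0;
    return *o1 < *o2 ? -1 : 1;
}
```
]]></programlisting>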
1239
1240    <note>
1241     <para>
1242      The OID module has been criticized - and perhaps rightly so
1243      - for needlessly abstracting the
1244      representation of OIDs. Other toolkits use a simple
1245      string-representation of OIDs with good results. In practice, we have
1246      found the interface comfortable and quick to work with, and it is a
1247      simple matter (for what it's worth) to create applications compatible
     with both ISO SR and Z39.50. Finally, the use of the
     <literal>oident</literal> database is by no means mandatory.
1250      You can easily create your own system for representing OIDs, as long
1251      as it is compatible with the low-level integer-array representation
1252      of the ODR module.
1253     </para>
1254    </note>
1255
1256   </sect1>
1257
1258   <sect1 id="tools.nmem"><title>Nibble Memory</title>
1259
1260    <para>
1261     Sometimes when you need to allocate and construct a large,
1262     interconnected complex of structures, it can be a bit of a pain to
1263     release the associated memory again. For the structures describing the
1264     Z39.50 PDUs and related structures, it is convenient to use the
1265     memory-management system of the &odr; subsystem (see
1266     <link linkend="odr-use">Using ODR</link>). However, in some circumstances
1267     where you might otherwise benefit from using a simple nibble memory
1268     management system, it may be impractical to use
1269     <function>odr_malloc()</function> and <function>odr_reset()</function>.
1270     For this purpose, the memory manager which also supports the &odr;
1271     streams is made available in the NMEM module. The external interface
1272     to this module is given in the <filename>nmem.h</filename> file.
1273    </para>
1274
1275    <para>
1276     The following prototypes are given:
1277    </para>
1278
1279    <screen>
1280     NMEM nmem_create(void);
1281     void nmem_destroy(NMEM n);
1282     void *nmem_malloc(NMEM n, int size);
1283     void nmem_reset(NMEM n);
1284     int nmem_total(NMEM n);
1285     void nmem_init(void);
1286     void nmem_exit(void);
1287    </screen>
1288
1289    <para>
1290     The <function>nmem_create()</function> function returns a pointer to a
1291     memory control handle, which can be released again by
1292     <function>nmem_destroy()</function> when no longer needed.
1293     The function <function>nmem_malloc()</function> allocates a block of
1294     memory of the requested size. A call to <function>nmem_reset()</function>
1295     or <function>nmem_destroy()</function> will release all memory allocated
    on the handle since it was created (or since the last call to
    <function>nmem_reset()</function>). The function
1298     <function>nmem_total()</function> returns the number of bytes currently
1299     allocated on the handle.
1300    </para>
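   <para>
    The general idea behind this kind of allocator can be sketched as
    follows: memory is carved out of larger chunks, and everything is
    released in one step. This is an independent, simplified
    illustration with hypothetical <literal>arena_</literal> names, not
    the NMEM source; it assumes a C11 compiler (for
    <literal>max_align_t</literal>) and is not thread safe.
   </para>
   <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_CHUNK 4096      /* default payload size of one chunk */

/* Chunk header; the union keeps the payload after it aligned. */
union arena_chunk {
    struct {
        union arena_chunk *next;
        size_t cap;           /* payload capacity in bytes */
        size_t used;          /* payload bytes handed out so far */
    } h;
    max_align_t align;
};

/* Hypothetical stand-in for an NMEM handle. */
struct arena {
    union arena_chunk *chunks;   /* newest chunk first */
    size_t total;                /* bytes requested since create/reset */
};

static size_t round_up(size_t n)
{
    size_t a = sizeof(max_align_t);
    return (n + a - 1) / a * a;
}

struct arena *arena_create(void)
{
    return calloc(1, sizeof(struct arena));
}

/* Hand out size bytes from the newest chunk, adding a chunk if needed. */
void *arena_malloc(struct arena *a, size_t size)
{
    union arena_chunk *c;
    size_t need;
    void *p;

    assert(a);
    if (size > (size_t)-1 / 2)
        return NULL;             /* avoid overflow in the arithmetic */
    need = round_up(size ? size : 1);
    c = a->chunks;
    if (!c || c->h.cap - c->h.used < need) {
        size_t cap = need > ARENA_CHUNK ? need : ARENA_CHUNK;

        c = malloc(sizeof *c + cap);
        if (!c)
            return NULL;
        c->h.next = a->chunks;
        c->h.cap = cap;
        c->h.used = 0;
        a->chunks = c;
    }
    p = (char *)(c + 1) + c->h.used;
    c->h.used += need;
    a->total += size;
    return p;
}

/* Release every block allocated on the handle; the handle stays valid. */
void arena_reset(struct arena *a)
{
    assert(a);
    while (a->chunks) {
        union arena_chunk *next = a->chunks->h.next;
        free(a->chunks);
        a->chunks = next;
    }
    a->total = 0;
}

size_t arena_total(const struct arena *a)
{
    assert(a);
    return a->total;
}

void arena_destroy(struct arena *a)
{
    if (!a)
        return;
    arena_reset(a);
    free(a);
}
```
]]></programlisting>
   <para>
    The point of the design is that callers never free individual
    blocks: a single reset or destroy call releases a whole complex of
    structures at once.
   </para>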
1301
1302    <para>
    The nibble memory pool is shared amongst threads. POSIX
    mutexes and WIN32 critical sections are used to keep the
    module thread safe. Function <function>nmem_init()</function>
    initializes the nibble memory library; it is called automatically
    the first time <literal>YAZ.DLL</literal> is loaded. &yaz; uses
    the function <function>DllMain</function> to achieve this. You should
    <emphasis>not</emphasis> call <function>nmem_init</function> or
    <function>nmem_exit</function> unless you're absolutely sure what
    you're doing. Note that in previous &yaz; versions you had to call
    <function>nmem_init</function> yourself.
1313    </para>
1314
1315   </sect1>
1316  </chapter>
1317
1318  <!-- Keep this comment at the end of the file
1319  Local variables:
1320  mode: sgml
1321  sgml-omittag:t
1322  sgml-shorttag:t
1323  sgml-minimize-attributes:nil
1324  sgml-always-quote-attributes:t
1325  sgml-indent-step:1
1326  sgml-indent-data:t
1327  sgml-parent-document: "yaz.xml"
1328  sgml-local-catalogs: nil
1329  sgml-namecase-general:t
1330  End:
1331  -->