doc/tools.xml

   1 <!-- $Id: tools.xml,v 1.22 2003-03-18 13:30:21 adam Exp $ -->
   2  <chapter id="tools"><title>Supporting Tools</title>
   3
   4   <para>
   5    In support of the service API - primarily the ASN module, which
   6    provides the programmatic interface to the Z39.50 APDUs, &yaz; contains
   7    a collection of tools that support the development of applications.
   8   </para>
   9
  10   <sect1 id="tools.query"><title>Query Syntax Parsers</title>
  11
  12    <para>
  13     Since the type-1 (RPN) query structure has no direct, useful string
  14     representation, every origin application needs to provide some form of
  15     mapping from a local query notation or representation to a
  16     <token>Z_RPNQuery</token> structure. Some programmers will prefer to
  17     construct the query manually, perhaps using
  18     <function>odr_malloc()</function> to simplify memory management.
  19     The &yaz; distribution includes two separate, query-generating tools
  20     that may be of use to you.
  21    </para>
  22
  23    <sect2 id="PQF"><title>Prefix Query Format</title>
  24
  25     <para>
  26      Since RPN or reverse polish notation is really just a fancy way of
  27      describing a suffix notation format (operator follows operands), it
  28      would seem that the confusion is total when we now introduce a prefix
  29      notation for RPN. The reason is one of simple laziness - it's somewhat
  30      simpler to interpret a prefix format, and this utility was designed
  31      for maximum simplicity, to provide a baseline representation for use
  32      in simple test applications and scripting environments (like Tcl). The
  33      demonstration client included with YAZ uses the PQF.
  34     </para>
  35
  36     <note>
  37      <para>
  38       The PQF has been adopted by other parties developing Z39.50
  39       software. It is often referred to as Prefix Query Notation
  40       - PQN.
  41      </para>
  42     </note>
  43     <para>
  44      The PQF is defined by the pquery module in the YAZ library.
  45      There are two sets of functions with similar behavior. The first
  46      set operates on a PQF parser handle; the second set does not. The
  47      first set is more flexible than the second, which is obsolete
  48      and is provided only to ensure backwards compatibility.
  49     </para>
  50     <para>
  51      The functions in the first set all operate on a PQF parser handle:
  52     </para>
  53     <synopsis>
  54      #include &lt;yaz/pquery.h&gt;
  55
  56      YAZ_PQF_Parser yaz_pqf_create (void);
  57
  58      void yaz_pqf_destroy (YAZ_PQF_Parser p);
  59
  60      Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf);
  61
  62      Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o,
  63                           Odr_oid **attributeSetId, const char *qbuf);
  64
  65
  66      int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off);
  67     </synopsis>
  68     <para>
  69      A PQF parser is created and destroyed by the functions
  70      <function>yaz_pqf_create</function> and
  71      <function>yaz_pqf_destroy</function> respectively.
  72      Function <function>yaz_pqf_parse</function> parses the query given
  73      by the string <literal>qbuf</literal>. If parsing was successful,
  74      a Z39.50 RPN Query is returned which is created using ODR stream
  75      <literal>o</literal>. If parsing failed, a NULL pointer is
  76      returned.
  77      Function <function>yaz_pqf_scan</function> takes a scan query in
  78      <literal>qbuf</literal>. If parsing was successful, the function
  79      returns a pointer to the attributes plus term and sets
  80      <literal>*attributeSetId</literal> to the attribute set for the
  81      scan request - both allocated using ODR stream <literal>o</literal>.
  82      If parsing failed, <function>yaz_pqf_scan</function> returns a NULL pointer.
  83      Error information for bad queries can be obtained by a call to
  84      <function>yaz_pqf_error</function> which returns an error code,
  85      sets <literal>*msg</literal> to point to an error description,
  86      and sets <literal>*off</literal> to the offset within the last
  87      query where parsing failed.
  88     </para>
  89     <para>
  90      The second set of functions is declared as follows:
  91     </para>
  92     <synopsis>
  93      #include &lt;yaz/pquery.h&gt;
  94
  95      Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
  96
  97      Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
  98                              Odr_oid **attributeSetP, const char *qbuf);
  99
 100      int p_query_attset (const char *arg);
 101     </synopsis>
 102     <para>
 103      The function <function>p_query_rpn()</function> takes as arguments an
 104      &odr; stream (see section <link linkend="odr">The ODR Module</link>)
 105      to provide a memory source (the structure created is released on
 106      the next call to <function>odr_reset()</function> on the stream), a
 107      protocol identifier (one of the constants <token>PROTO_Z3950</token> and
 108      <token>PROTO_SR</token>), and
 109      finally a null-terminated string holding the query string.
 110     </para>
 111     <para>
 112      If the parse went well, <function>p_query_rpn()</function> returns a
 113      pointer to a <literal>Z_RPNQuery</literal> structure which can be
 114      placed directly into a <literal>Z_SearchRequest</literal>.
 115      If parsing failed due to a syntax error, a NULL pointer is returned.
 116     </para>
 117     <para>
 118      The function <literal>p_query_attset</literal> specifies which
 119      attribute set to use if the query doesn't specify one with the
 120      <literal>@attrset</literal> operator.
 121      It returns 0 if the argument is a
 122      valid attribute set specifier; otherwise it returns -1.
 123     </para>
 124
 125     <para>
 126      The grammar of the PQF is as follows:
 127     </para>
 128
 129     <literallayout>
 130      query ::= top-set query-struct.
 131
 132      top-set ::= &lsqb; '@attrset' string &rsqb;
 133
 134      query-struct ::= attr-spec | simple | complex | '@term' term-type
 135
 136      attr-spec ::= '@attr' &lsqb; string &rsqb; string query-struct
 137
 138      complex ::= operator query-struct query-struct.
 139
 140      operator ::= '@and' | '@or' | '@not' | '@prox' proximity.
 141
 142      simple ::= result-set | term.
 143
 144      result-set ::= '@set' string.
 145
 146      term ::= string.
 147
 148      proximity ::= exclusion distance ordered relation which-code unit-code.
 149
 150      exclusion ::= '1' | '0' | 'void'.
 151
 152      distance ::= integer.
 153
 154      ordered ::= '1' | '0'.
 155
 156      relation ::= integer.
 157
 158      which-code ::= 'known' | 'private' | integer.
 159
 160      unit-code ::= integer.
 161
 162      term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'.
 163     </literallayout>
 164
 165     <para>
 166      You will note that the syntax above is a fairly faithful
 167      representation of RPN, except for the Attribute, which has been
 168      moved a step away from the term, allowing you to associate one or more
 169      attributes with an entire query structure. The parser will
 170      automatically apply the given attributes to each term as required.
 171     </para>
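    <para>
     To make the recursive shape of the grammar concrete, the following is a
     minimal sketch of a recognizer for a subset of it. It is not the YAZ
     parser: the names <literal>parse_query_struct</literal> and
     <literal>pqf_subset_ok</literal> are hypothetical, the input is assumed
     to be already split into tokens, and <literal>@attrset</literal>,
     <literal>@prox</literal> and <literal>@term</literal> are left out.
    </para>
    <programlisting><![CDATA[
```c
#include <string.h>

/* Consume one query-struct starting at tok[*pos].
   Returns 0 if it is well-formed, -1 otherwise. */
static int parse_query_struct(const char **tok, int n, int *pos)
{
    const char *t;

    if (*pos >= n)
        return -1;
    t = tok[(*pos)++];
    if (!strcmp(t, "@and") || !strcmp(t, "@or") || !strcmp(t, "@not")) {
        /* complex ::= operator query-struct query-struct */
        if (parse_query_struct(tok, n, pos) != 0)
            return -1;
        return parse_query_struct(tok, n, pos);
    }
    if (!strcmp(t, "@attr")) {
        /* attr-spec ::= '@attr' [ string ] string query-struct */
        if (*pos < n && !strchr(tok[*pos], '='))
            (*pos)++;                 /* optional attribute set name */
        if (*pos >= n || !strchr(tok[*pos], '='))
            return -1;
        (*pos)++;                     /* the packed type=value string */
        return parse_query_struct(tok, n, pos);
    }
    if (!strcmp(t, "@set")) {
        /* result-set ::= '@set' string */
        if (*pos >= n)
            return -1;
        (*pos)++;
        return 0;
    }
    if (t[0] == '@')
        return -1;                    /* operator outside this subset */
    return 0;                         /* term ::= string */
}

/* Returns 1 if the whole token list is exactly one query-struct. */
int pqf_subset_ok(const char **tok, int n)
{
    int pos = 0;

    return parse_query_struct(tok, n, &pos) == 0 && pos == n;
}
```
]]></programlisting>
    <para>
     Because every operator announces how many operands follow, the
     recognizer never needs to look back, which is the simplicity the
     prefix notation was chosen for.
    </para>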
 172
 173     <para>
 174      The @attr operator is followed by an attribute specification
 175      (<literal>attr-spec</literal> above). The specification consists
 176      of an optional attribute set, an attribute type-value pair and
 177      a sub-query. The attribute type-value pair is packed in one string:
 178      an attribute type, an equals sign, and an attribute value.
 179      The type is always an integer but the value may be either an
 180      integer or a string (if it doesn't start with a digit character).
 181     </para>
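    <para>
     The packed type-value string can be taken apart as sketched below.
     This only illustrates the rule stated above (integer type, then a value
     that is an integer when it starts with a digit and a string otherwise);
     <literal>struct attr_pair</literal> and
     <literal>split_attr_pair</literal> are hypothetical names, not part of
     YAZ.
    </para>
    <programlisting><![CDATA[
```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical holder for one parsed pair (not a YAZ type). */
struct attr_pair {
    int type;               /* attribute type, always an integer */
    int is_numeric;         /* 1 if the value is an integer, 0 if a string */
    long num_value;         /* valid when is_numeric is 1 */
    const char *str_value;  /* points into the input when is_numeric is 0 */
};

/* Split a string such as "1=4" or "1=/book/title".
   Returns 0 on success, -1 if the string is malformed. */
int split_attr_pair(const char *spec, struct attr_pair *out)
{
    char *end;
    long type;

    if (!isdigit((unsigned char) spec[0]))
        return -1;
    type = strtol(spec, &end, 10);
    if (*end != '=' || end[1] == '\0')
        return -1;
    out->type = (int) type;
    if (isdigit((unsigned char) end[1])) {
        /* value starts with a digit: treat it as an integer */
        out->is_numeric = 1;
        out->num_value = strtol(end + 1, NULL, 10);
        out->str_value = NULL;
    } else {
        /* anything else is kept as a string value */
        out->is_numeric = 0;
        out->num_value = 0;
        out->str_value = end + 1;
    }
    return 0;
}
```
]]></programlisting>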
 182
 183     <para>
 184      Version 3 of the Z39.50 specification defines various encodings of terms.
 185      Use <literal>@term </literal> <replaceable>type</replaceable> to select one,
 186      where type is one of: <literal>general</literal>,
 187      <literal>numeric</literal>, <literal>string</literal>
 188      (for InternationalString), <literal>oid</literal>, <literal>datetime</literal> or <literal>null</literal>.
 189      If no term type has been given, the <literal>general</literal> form
 190      is used, which is the only encoding allowed in both version 2 and
 191      version 3 of the Z39.50 standard.
 192     </para>
 193
 194     <example><title>PQF queries</title>
 195
 196      <para>Queries using simple terms.
 197       <screen>
 198       dylan
 199       "bob dylan"
 200       </screen>
 201      </para>
 202      <para>Boolean operators.
 203       <screen>
 204        @or "dylan" "zimmerman"
 205        @and @or dylan zimmerman when
 206        @and when @or dylan zimmerman
 207       </screen>
 208      </para>
 209      <para>
 210       Reference to result sets.
 211       <screen>
 212        @set Result-1
 213        @and @set seta setb
 214       </screen>
 215      </para>
 216      <para>
 217       Attributes for terms.
 218       <screen>
 219        @attr 1=4 computer
 220        @attr 1=4 @attr 4=1 "self portrait"
 221        @attrset exp1 @attr 1=1 CategoryList
 222        @attr gils 1=2008 Copenhagen
 223        @attr 1=/book/title computer
 224       </screen>
 225      </para>
 226      <para>
 227       Proximity.
 228       <screen>
 229        @prox 0 3 1 2 k 2 dylan zimmerman
 230        </screen>
 231       </para>
 232      <para>
 233       Specifying term type.
 234       <screen>
 235        @term string "a UTF-8 string, maybe?"
 236       </screen>
 237      </para>
 238      <para>Mixed queries
 239       <screen>
 240        @or @and bob dylan @set Result-1
 241
 242        @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
 243
 244        @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
 245       </screen>
 246      </para>
 247     </example>
 248    </sect2>
 249    <sect2 id="CCL"><title>Common Command Language</title>
 250
 251     <para>
 252      Not all users enjoy typing in prefix query structures and numerical
 253      attribute values, even in a minimalistic test client. In the library
 254      world, the more intuitive Common Command Language (or ISO 8777) has
 255      enjoyed some popularity - especially before the widespread
 256      availability of graphical interfaces. It is still useful in
 257      applications where you for some reason or other need to provide a
 258      symbolic language for expressing boolean query structures.
 259     </para>
 260
 261     <para>
 262      The <ulink url="http://europagate.dtv.dk/">EUROPAGATE</ulink>
 263      research project working under the Libraries programme
 264      of the European Commission's DG XIII has, amongst other useful tools,
 265      implemented a general-purpose CCL parser which produces an output
 266      structure that can be trivially converted to the internal RPN
 267      representation of &yaz; (the <literal>Z_RPNQuery</literal> structure).
 268      Since the CCL utility - along with the rest of the software
 269      produced by EUROPAGATE - is made freely available under a liberal
 270      license, it is included as a supplement to &yaz;.
 271     </para>
 272
 273     <sect3><title>CCL Syntax</title>
 274
 275      <para>
 276       The CCL parser obeys the following grammar for the FIND argument.
 277       The syntax is annotated by lines prefixed by
 278       <literal>&dash;&dash;</literal>.
 279      </para>
 280
 281      <screen>
 282       CCL-Find ::= CCL-Find Op Elements
 283                 | Elements.
 284
 285       Op ::= "and" | "or" | "not"
 286       -- The above means that Elements are separated by boolean operators.
 287
 288       Elements ::= '(' CCL-Find ')'
 289                 | Set
 290                 | Terms
 291                 | Qualifiers Relation Terms
 292                 | Qualifiers Relation '(' CCL-Find ')'
 293                 | Qualifiers '=' string '-' string
 294       -- Elements is either a recursive definition, a result set reference, a
 295       -- list of terms, qualifiers followed by terms, qualifiers followed
 296       -- by a recursive definition or qualifiers in a range (lower - upper).
 297
 298       Set ::= 'set' '=' string
 299       -- Reference to a result set
 300
 301       Terms ::= Terms Prox Term
 302              | Term
 303       -- Proximity of terms.
 304
 305       Term ::= Term string
 306             | string
 307       -- This basically means that a term may include a blank
 308
 309       Qualifiers ::= Qualifiers ',' string
 310                   | string
 311       -- Qualifiers is a list of strings separated by comma
 312
 313       Relation ::= '=' | '>=' | '&lt;=' | '&lt;>' | '>' | '&lt;'
 314       -- Relational operators. This really doesn't follow the ISO8777
 315       -- standard.
 316
 317       Prox ::= '%' | '!'
 318       -- Proximity operator
 319
 320      </screen>
 321
 322      <example><title>CCL queries</title>
 323       <para>
 324        The following queries are all valid:
 325       </para>
 326
 327       <screen>
 328        dylan
 329
 330        "bob dylan"
 331
 332        dylan or zimmerman
 333
 334        set=1
 335
 336        (dylan and bob) or set=1
 337
 338       </screen>
 339       <para>
 340        Assuming that the qualifiers <literal>ti</literal>,
 341        <literal>au</literal>
 342        and <literal>date</literal> are defined we may use:
 343       </para>
 344
 345       <screen>
 346        ti=self portrait
 347
 348        au=(bob dylan and slow train coming)
 349
 350        date>1980 and (ti=((self portrait)))
 351
 352       </screen>
 353      </example>
 354
 355     </sect3>
 356     <sect3><title>CCL Qualifiers</title>
 357
 358      <para>
 359       Qualifiers are used to direct the search to a particular searchable
 360       index, such as title (ti) and author indexes (au). The CCL standard
 361       itself doesn't specify a particular set of qualifiers, but it does
 362       suggest a few short-hand notations. You can customize the CCL parser
 363       to support a particular set of qualifiers to reflect the current target
 364       profile. Traditionally, a qualifier would map to a particular
 365       use-attribute within the BIB-1 attribute set. However, you could also
 366       define qualifiers that would set, for example, the
 367       structure-attribute.
 368      </para>
 369
 370      <para>
 371       A CCL profile is a set of predefined CCL qualifiers that may be
 372       read from a file.
 373       The YAZ client reads its CCL qualifiers from a file named
 374       <filename>default.bib</filename>. Each line in the file has the form:
 375      </para>
 376
 377      <para>
 378       <replaceable>qualifier-name</replaceable>
 379       [<replaceable>attributeset</replaceable><literal>,</literal>]<replaceable>type</replaceable><literal>=</literal><replaceable>val</replaceable>
 380       [<replaceable>attributeset</replaceable><literal>,</literal>]<replaceable>type</replaceable><literal>=</literal><replaceable>val</replaceable> ...
 381      </para>
 382
 383      <para>
 384       where <replaceable>qualifier-name</replaceable> is the name of the
 385       qualifier to be used (e.g. <literal>ti</literal>),
 386       <replaceable>type</replaceable> is the attribute type in the attribute
 387       set (Bib-1 is used if no attribute set is given) and
 388       <replaceable>val</replaceable> is the attribute value.
 389       The <replaceable>type</replaceable> can be specified either as an
 390       integer or as a single letter:
 391       <literal>u</literal> for use,
 392       <literal>r</literal> for relation, <literal>p</literal> for position,
 393       <literal>s</literal> for structure, <literal>t</literal> for truncation
 394       or <literal>c</literal> for completeness.
 395       The attributes for the special qualifier name <literal>term</literal>
 396       are used when no CCL qualifier is given in a query.
 397      </para>
 398
 399      <example><title>CCL profile</title>
 400       <para>
 401        Consider the following definition:
 402       </para>
 403
 404       <screen>
 405        ti       u=4 s=1
 406        au       u=1 s=1
 407        term     s=105
 408        ranked   r=102
 409       </screen>
 410       <para>
 411        Three qualifiers are defined, <literal>ti</literal>,
 412        <literal>au</literal> and <literal>ranked</literal>.
 413        <literal>ti</literal> and <literal>au</literal> both set
 414        the structure attribute to phrase (s=1).
 415        <literal>ti</literal>
 416        sets the use-attribute to 4. <literal>au</literal> sets the
 417        use-attribute to 1.
 418        When no qualifiers are used in the query the structure-attribute is
 419        set to free-form-text (105).
 420       </para>
 421       <para>
 422        You can combine attributes. To search for "ranked title" you
 423        can do
 424        <screen>
 425         ti,ranked=knuth computer
 426        </screen>
 427        which will use "relation is ranked", "use is title", "structure is
 428        phrase".
 429       </para>
 430      </example>
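     <para>
      As an illustration of the profile line format, the sketch below parses
      one line into a qualifier name and a list of attribute type-value
      pairs. It is not the YAZ implementation:
      <literal>struct qual_def</literal> and
      <literal>parse_qual_line</literal> are hypothetical names, the optional
      attribute set prefix is not handled, and values are assumed to be
      integers. The single letters are mapped to the Bib-1 attribute type
      numbers 1 (use) through 6 (completeness).
     </para>
     <programlisting><![CDATA[
```c
#include <stdlib.h>
#include <string.h>

#define MAX_ATTRS 8

/* Hypothetical in-memory form of one profile line (not a YAZ type). */
struct qual_def {
    char name[32];
    int n_attrs;
    int type[MAX_ATTRS];
    int value[MAX_ATTRS];
};

/* Map a single-letter type to its Bib-1 type number; 0 if unknown. */
static int letter_to_type(char c)
{
    switch (c) {
    case 'u': return 1; /* use */
    case 'r': return 2; /* relation */
    case 'p': return 3; /* position */
    case 's': return 4; /* structure */
    case 't': return 5; /* truncation */
    case 'c': return 6; /* completeness */
    }
    return 0;
}

/* Parse "name type=val type=val ...".
   Returns 0 on success, -1 on a malformed line. */
int parse_qual_line(const char *line, struct qual_def *q)
{
    char buf[256];
    char *tok;

    if (strlen(line) >= sizeof(buf))
        return -1;
    strcpy(buf, line);
    tok = strtok(buf, " \t\r\n");
    if (!tok || strlen(tok) >= sizeof(q->name))
        return -1;
    strcpy(q->name, tok);
    q->n_attrs = 0;
    while ((tok = strtok(NULL, " \t\r\n")) != NULL) {
        char *eq = strchr(tok, '=');
        int type;

        if (!eq || eq == tok || eq[1] == '\0' || q->n_attrs == MAX_ATTRS)
            return -1;
        *eq = '\0';                       /* split into type and value */
        if (tok[1] == '\0' && letter_to_type(tok[0]))
            type = letter_to_type(tok[0]);
        else
            type = atoi(tok);             /* numeric type, e.g. "2" */
        if (type <= 0)
            return -1;
        q->type[q->n_attrs] = type;
        q->value[q->n_attrs] = atoi(eq + 1);
        q->n_attrs++;
    }
    return 0;
}
```
]]></programlisting>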
 431
 432     </sect3>
 433     <sect3><title>CCL API</title>
 434      <para>
 435       All public definitions can be found in the header file
 436       <filename>ccl.h</filename>. A profile identifier is of type
 437       <literal>CCL_bibset</literal>. A profile must be created with a call
 438       to the function <function>ccl_qual_mk</function> which returns a profile
 439       handle of type <literal>CCL_bibset</literal>.
 440      </para>
 441
 442      <para>
 443       To read a file containing qualifier definitions the function
 444       <function>ccl_qual_file</function> may be convenient. This function
 445       takes an already opened <literal>FILE</literal> handle pointer as
 446       argument along with a <literal>CCL_bibset</literal> handle.
 447      </para>
 448
 449      <para>
 450       To parse a simple string with a FIND query use the function
 451      </para>
 452      <screen>
 453 struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
 454                                    int *error, int *pos);
 455      </screen>
 456      <para>
 457       which takes the CCL profile (<literal>bibset</literal>) and query
 458       (<literal>str</literal>) as input. Upon successful completion the RPN
 459       tree is returned. If an error occurs, such as a syntax error, the integer
 460       pointed to by <literal>error</literal> holds the error code and
 461       <literal>*pos</literal> holds the offset inside the query string at which
 462       parsing failed.
 463      </para>
 464
 465      <para>
 466       An English representation of the error may be obtained by calling
 467       the <literal>ccl_err_msg</literal> function. The error codes are
 468       listed in <filename>ccl.h</filename>.
 469      </para>
 470
 471      <para>
 472       To convert the CCL RPN tree (type
 473       <literal>struct ccl_rpn_node *</literal>)
 474       to the Z_RPNQuery of YAZ the function <function>ccl_rpn_query</function>
 475       must be used. This function, which is part of YAZ, is implemented in
 476       <filename>yaz-ccl.c</filename>.
 477       After calling this function the CCL RPN tree is probably no longer
 478       needed. The function <literal>ccl_rpn_delete</literal> destroys the CCL RPN tree.
 479      </para>
 480
 481      <para>
 482       A CCL profile may be destroyed by calling the
 483       <function>ccl_qual_rm</function> function.
 484      </para>
 485
 486      <para>
 487       The token names for the CCL operators may be changed by setting the
 488       globals (all of type <literal>char *</literal>)
 489       <literal>ccl_token_and</literal>, <literal>ccl_token_or</literal>,
 490       <literal>ccl_token_not</literal> and <literal>ccl_token_set</literal>.
 491       An operator may have aliases, i.e. there may be more than one name for
 492       the operator. To do this, separate each alias with a space character.
 493      </para>
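     <para>
      The alias convention can be pictured with the sketch below, which
      checks whether a word equals one of the space-separated names in a
      token string such as <literal>"and und"</literal>. The function
      <literal>token_matches</literal> is a hypothetical helper, not a YAZ
      function, and it compares names exactly; how the CCL parser itself
      treats letter case is not covered here.
     </para>
     <programlisting><![CDATA[
```c
#include <string.h>

/* Return 1 if word equals one of the space-separated aliases, else 0. */
int token_matches(const char *aliases, const char *word)
{
    size_t wlen = strlen(word);
    const char *p = aliases;

    if (wlen == 0)
        return 0;
    while (*p) {
        size_t len;

        while (*p == ' ')           /* skip separators */
            p++;
        len = strcspn(p, " ");      /* length of the next alias */
        if (len == wlen && memcmp(p, word, len) == 0)
            return 1;
        p += len;
    }
    return 0;
}
```
]]></programlisting>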
 494     </sect3>
 495    </sect2>
 496    <sect2 id="tools.cql"><title>CQL</title>
 497     <para>
 498      <ulink url="http://www.loc.gov/z3950/agency/zing/cql/">CQL</ulink>
 499       - Common Query Language - was defined for the
 500      <ulink url="http://www.loc.gov/z3950/agency/zing/srw/">SRW</ulink>
 501      protocol.
 502      In many ways CQL has a similar syntax to CCL,
 503      but its objective is different. Where CCL aims to be
 504      an end-user language, CQL is <emphasis>the</emphasis> protocol
 505      query language for SRW.
 506     </para>
 507     <tip>
 508      <para>
 509       If you are new to CQL, read the
 510       <ulink url="http://zing.z3950.org/cql/intro.html">Gentle
 511        Introduction</ulink>.
 512      </para>
 513     </tip>
 514     <para>
 515      The CQL parser in &yaz; provides the following:
 516      <itemizedlist>
 517       <listitem>
 518        <para>
 519         It parses and validates a CQL query.
 520        </para>
 521       </listitem>
 522       <listitem>
 523        <para>
 524         It generates a C structure that allows you to convert
 525         a CQL query to some other query language, such as SQL.
 526        </para>
 527       </listitem>
 528       <listitem>
 529        <para>
 530         The parser converts a valid CQL query to PQF, thus providing a
 531         way to use CQL for both SRW/SRU servers and Z39.50 targets at the
 532         same time.
 533        </para>
 534       </listitem>
 535       <listitem>
 536        <para>
 537         The parser converts CQL to
 538         <ulink url="http://www.loc.gov/z3950/agency/zing/cql/xcql.html">
 539          XCQL</ulink>.
 540         XCQL is an XML representation of CQL.
 541         XCQL is part of the SRW specification. However, since SRU
 542         supports CQL only, we don't expect XCQL to be widely used.
 543         Furthermore, CQL has the advantage over XCQL that it is
 544         easy to read.
 545        </para>
 546       </listitem>
 547      </itemizedlist>
 548     </para>
 549     <sect3 id="tools.cql.parsing"><title>CQL parsing</title>
 550      <para>
 551       A CQL parser is represented by the <literal>CQL_parser</literal>
 552       handle. Its contents should be considered &yaz; internal (private).
 553       <synopsis>
 554 #include &lt;yaz/cql.h&gt;
 555
 556 typedef struct cql_parser *CQL_parser;
 557
 558 CQL_parser cql_parser_create(void);
 559 void cql_parser_destroy(CQL_parser cp);
 560       </synopsis>
 561      A parser is created by <function>cql_parser_create</function> and
 562      is destroyed by <function>cql_parser_destroy</function>.
 563      </para>
 564      <para>
 565       To parse a CQL query string, the following function
 566       is provided:
 567       <synopsis>
 568 int cql_parser_string(CQL_parser cp, const char *str);
 569       </synopsis>
 570       A CQL query is parsed by the function <function>cql_parser_string</function>,
 571       which takes a query <parameter>str</parameter>.
 572       If the query was valid (no syntax errors), then zero is returned;
 573       otherwise a non-zero error code is returned.
 574      </para>
 575      <para>
 576       <synopsis>
 577 int cql_parser_stream(CQL_parser cp,
 578                       int (*getbyte)(void *client_data),
 579                       void (*ungetbyte)(int b, void *client_data),
 580                       void *client_data);
 581
 582 int cql_parser_stdio(CQL_parser cp, FILE *f);
 583       </synopsis>
 584       The functions <function>cql_parser_stream</function> and
 585       <function>cql_parser_stdio</function> parse a CQL query
 586       - just like <function>cql_parser_string</function>.
 587       The only difference is that the CQL query can be
 588       fed to the parser in different ways.
 589       The <function>cql_parser_stream</function> uses a generic
 590       byte stream as input. The <function>cql_parser_stdio</function>
 591       uses a <literal>FILE</literal> handle which is opened for reading.
 592      </para>
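     <para>
      A pair of callbacks matching the <literal>getbyte</literal> and
      <literal>ungetbyte</literal> signatures might look like the sketch
      below, which reads from a string held in memory.
      <literal>struct mem_source</literal>, <literal>mem_getbyte</literal>
      and <literal>mem_ungetbyte</literal> are hypothetical names, and the
      sketch assumes that end of input is signalled by returning 0.
     </para>
     <programlisting><![CDATA[
```c
#include <stddef.h>

/* Hypothetical client data: a string and the current read position. */
struct mem_source {
    const char *buf;
    size_t pos;
};

/* Return the next byte, or 0 at end of input (assumed end marker). */
int mem_getbyte(void *client_data)
{
    struct mem_source *src = (struct mem_source *) client_data;

    if (src->buf[src->pos] == '\0')
        return 0;
    return (unsigned char) src->buf[src->pos++];
}

/* Push the most recently read byte back; the end marker is ignored. */
void mem_ungetbyte(int b, void *client_data)
{
    struct mem_source *src = (struct mem_source *) client_data;

    if (b != 0 && src->pos > 0)
        src->pos--;
}
```
]]></programlisting>
     <para>
      With a <literal>struct mem_source</literal> variable
      <literal>src</literal>, the two callbacks and the address of
      <literal>src</literal> would be passed as the last three arguments of
      <function>cql_parser_stream</function>.
     </para>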
 593     </sect3>
 594
 595     <sect3 id="tools.cql.tree"><title>CQL tree</title>
 596      <para>
 597       If the query string is valid, the CQL parser
 598       generates a tree representing the structure of the
 599       CQL query.
 600      </para>
 601      <para>
 602       <synopsis>
 603 struct cql_node *cql_parser_result(CQL_parser cp);
 604       </synopsis>
 605       <function>cql_parser_result</function> returns
 606       a pointer to the root node of the resulting tree.
 607      </para>
 608      <para>
 609       Each node in a CQL tree is represented by a
 610       <literal>struct cql_node</literal>.
 611       It is defined as follows:
 612       <synopsis>
 613 #define CQL_NODE_ST 1
 614 #define CQL_NODE_BOOL 2
 615 #define CQL_NODE_MOD 3
 616 struct cql_node {
 617     int which;
 618     union {
 619         struct {
 620             char *index;
 621             char *term;
 622             char *relation;
 623             struct cql_node *modifiers;
 624             struct cql_node *prefixes;
 625         } st;
 626         struct {
 627             char *value;
 628             struct cql_node *left;
 629             struct cql_node *right;
 630             struct cql_node *modifiers;
 631             struct cql_node *prefixes;
 632         } boolean;
 633         struct {
 634             char *name;
 635             char *value;
 636             struct cql_node *next;
 637         } mod;
 638     } u;
 639 };
 640       </synopsis>
 641       There are three kinds of nodes, search term (ST), boolean (BOOL),
 642       and modifier (MOD).
 643      </para>
 644      <para>
 645       The search term node has five members:
 646       <itemizedlist>
 647        <listitem>
 648         <para>
 649          <literal>index</literal>: index for search term.
 650          If an index is unspecified for a search term,
 651          <literal>index</literal> will be NULL.
 652         </para>
 653        </listitem>
 654        <listitem>
 655         <para>
 656          <literal>term</literal>: the search term itself.
 657         </para>
 658        </listitem>
 659        <listitem>
 660         <para>
 661          <literal>relation</literal>: relation for search term.
 662         </para>
 663        </listitem>
 664        <listitem>
 665         <para>
 666          <literal>modifiers</literal>: relation modifiers for search
 667          term. The <literal>modifiers</literal> is a simple linked
 668          list (NULL for last entry). Each relation modifier node
 669          is of type <literal>MOD</literal>.
 670         </para>
 671        </listitem>
 672        <listitem>
 673         <para>
 674          <literal>prefixes</literal>: index prefixes for search
 675          term. The <literal>prefixes</literal> is a simple linked
 676          list (NULL for last entry). Each prefix node
 677          is of type <literal>MOD</literal>.
 678         </para>
 679        </listitem>
 680       </itemizedlist>
 681      </para>
 682
 683      <para>
 684       The boolean node represents <literal>and</literal>,
 685       <literal>or</literal> and <literal>not</literal>, as well as
 686       proximity.
 687       <itemizedlist>
 688        <listitem>
 689         <para>
 690          <literal>left</literal> and <literal>right</literal>: the left
 691          and right operands, respectively.
 692         </para>
 693        </listitem>
 694        <listitem>
 695         <para>
 696          <literal>modifiers</literal>: proximity arguments.
 697         </para>
 698        </listitem>
 699        <listitem>
 700         <para>
 701          <literal>prefixes</literal>: index prefixes.
 702          The <literal>prefixes</literal> is a simple linked
 703          list (NULL for last entry). Each prefix node
 704          is of type <literal>MOD</literal>.
 705         </para>
 706        </listitem>
 707       </itemizedlist>
 708      </para>
 709
 710      <para>
 711       The modifier node is a "utility" node used for name-value pairs,
 712       such as prefixes, proximity arguments, etc.
 713       <itemizedlist>
 714        <listitem>
 715         <para>
 716          <literal>name</literal>: name of mod node.
 717         </para>
 718        </listitem>
 719        <listitem>
 720         <para>
 721          <literal>value</literal>: value of mod node.
 722         </para>
 723        </listitem>
 724        <listitem>
 725         <para>
 726          <literal>next</literal>: pointer to next node which is
 727          always a mod node (NULL for last entry).
 728         </para>
 729        </listitem>
 730       </itemizedlist>
 731      </para>
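     <para>
      A tree of this shape is naturally processed by recursion on the
      <literal>which</literal> member. The sketch below counts search term
      nodes and list entries. So that it stands alone, it repeats the
      structure definition shown above; in a program built against YAZ the
      definition would come from <filename>yaz/cql.h</filename> instead. The
      functions <literal>count_search_terms</literal> and
      <literal>count_mods</literal> are hypothetical helpers.
     </para>
     <programlisting><![CDATA[
```c
#include <stddef.h>

/* Copy of the definition given in this section, for self-containment. */
#define CQL_NODE_ST 1
#define CQL_NODE_BOOL 2
#define CQL_NODE_MOD 3
struct cql_node {
    int which;
    union {
        struct {
            char *index;
            char *term;
            char *relation;
            struct cql_node *modifiers;
            struct cql_node *prefixes;
        } st;
        struct {
            char *value;
            struct cql_node *left;
            struct cql_node *right;
            struct cql_node *modifiers;
            struct cql_node *prefixes;
        } boolean;
        struct {
            char *name;
            char *value;
            struct cql_node *next;
        } mod;
    } u;
};

/* Count the search term nodes reachable through boolean nodes. */
int count_search_terms(const struct cql_node *cn)
{
    if (!cn)
        return 0;
    switch (cn->which) {
    case CQL_NODE_ST:
        return 1;
    case CQL_NODE_BOOL:
        return count_search_terms(cn->u.boolean.left)
            + count_search_terms(cn->u.boolean.right);
    default:
        return 0;   /* MOD nodes carry no search terms */
    }
}

/* Count the entries of a modifier or prefix list (NULL ends the list). */
int count_mods(const struct cql_node *mod)
{
    int n = 0;

    for (; mod && mod->which == CQL_NODE_MOD; mod = mod->u.mod.next)
        n++;
    return n;
}
```
]]></programlisting>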
 732
 733     </sect3>
 734     <sect3 id="tools.cql.pqf"><title>CQL to PQF conversion</title>
 735      <para>
 736       Conversion to PQF (and Z39.50 RPN) is complicated by the fact
 737       that the resulting RPN depends on the Z39.50 target
 738       capabilities (combinations of supported attributes).
 739       In addition, CQL and SRW operate on index prefixes
 740       (URIs or strings), whereas RPN uses Object Identifiers
 741       for attribute sets.
 742      </para>
 743      <para>
 744       The CQL library of &yaz; defines a <literal>cql_transform_t</literal>
 745       type. It represents a particular mapping between CQL and RPN.
 746       This handle is created and destroyed by the functions:
 747      <synopsis>
 748 cql_transform_t cql_transform_open_FILE (FILE *f);
 749 cql_transform_t cql_transform_open_fname(const char *fname);
 750 void cql_transform_close(cql_transform_t ct);
 751       </synopsis>
 752       The first two functions create a transformation handle from
 753       either an already open FILE or from a filename respectively.
 754      </para>
 755      <para>
 756       The handle is destroyed by <function>cql_transform_close</function>,
 757       after which the handle must no longer be referenced.
 758      </para>
 759      <para>
 760       When a <literal>cql_transform_t</literal> handle has been created
 761       you can convert to RPN.
 762       <synopsis>
 763 int cql_transform_buf(cql_transform_t ct,
 764                       struct cql_node *cn, char *out, int max);
 765       </synopsis>
 766       This function converts the CQL tree <literal>cn</literal>
 767       using handle <literal>ct</literal>.
 768       For the resulting PQF, you supply a buffer <literal>out</literal>
 769       which must be able to hold at least <literal>max</literal>
 770       characters.
 771      </para>
 772      <para>
 773       If conversion failed, <function>cql_transform_buf</function>
 774       returns a non-zero error code; otherwise zero is returned
 775       (conversion successful).
 776      </para>
 777      <para>
 778       If you wish to be able to produce a PQF result in a different
 779       way, there are two alternatives.
 780       <synopsis>
 781 void cql_transform_pr(cql_transform_t ct,
 782                       struct cql_node *cn,
 783                       void (*pr)(const char *buf, void *client_data),
 784                       void *client_data);
 785
 786 int cql_transform_FILE(cql_transform_t ct,
 787                        struct cql_node *cn, FILE *f);
 788       </synopsis>
 789       The former function produces output to a user-defined
 790       output stream. The latter writes the result to an already
 791       open <literal>FILE</literal>.
 792      </para>
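     <para>
      As a hedged illustration of the callback form, the sketch below
      shows one possible <literal>pr</literal> handler that collects the
      output in a fixed-size buffer passed through
      <literal>client_data</literal>. The structure
      <literal>pqf_buf</literal> and the function
      <literal>pqf_buf_pr</literal> are hypothetical names invented for
      this example; they are not part of &yaz;. The sketch assumes that
      the handler may be invoked several times, each time with a
      NUL-terminated fragment of the result.
     </para>
     <programlisting><![CDATA[
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator for this example; not part of YAZ. */
struct pqf_buf {
    char *data;      /* caller-owned storage, initially an empty string */
    size_t max;      /* capacity of data, including the terminating NUL */
    size_t len;      /* characters stored so far */
    int truncated;   /* set to 1 if any output had to be dropped */
};

/* Has the shape void (*pr)(const char *buf, void *client_data). */
static void pqf_buf_pr(const char *buf, void *client_data)
{
    struct pqf_buf *b = (struct pqf_buf *) client_data;
    size_t n = strlen(buf);
    size_t room;

    if (b->max == 0)
    {
        if (n > 0)
            b->truncated = 1;
        return;
    }
    room = b->max - 1 - b->len;     /* keep one byte for the NUL */
    if (n > room)
    {
        n = room;
        b->truncated = 1;
    }
    memcpy(b->data + b->len, buf, n);
    b->len += n;
    b->data[b->len] = '\0';
}
]]>
     </programlisting>
     <para>
      With a handle <literal>ct</literal> and a parsed tree
      <literal>cn</literal>, such a handler would be passed as
      <literal>cql_transform_pr(ct, cn, pqf_buf_pr, &amp;b)</literal>, where
      <literal>b</literal> is a <literal>pqf_buf</literal> whose
      <literal>data</literal> starts out as an empty string.
     </para>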
 793     </sect3>
 794     <sect3 id="tools.cql.map">
 795      <title>Specification of CQL to RPN mapping</title>
 796      <para>
 797       The file supplied to the functions
 798       <function>cql_transform_open_FILE</function> and
 799       <function>cql_transform_open_fname</function> follows
 800       a structure found in many Unix utilities.
 801       It consists of mapping specifications - one per line.
 802       Lines starting with <literal>#</literal> are ignored (comments).
 803      </para>
 804      <para>
 805       Each line is of the form
 806       <literallayout>
 807        <replaceable>CQL pattern</replaceable><literal> = </literal> <replaceable> RPN equivalent</replaceable>
 808       </literallayout>
 809      </para>
 810      <para>
 811       An RPN pattern is a simple attribute list. Each attribute pair
 812       takes the form:
 813       <literallayout>
 814        [<replaceable>set</replaceable>] <replaceable>type</replaceable><literal>=</literal><replaceable>value</replaceable>
 815       </literallayout>
 816       The attribute <replaceable>set</replaceable> is optional.
 817       The <replaceable>type</replaceable> is the attribute type,
 818       <replaceable>value</replaceable> the attribute value.
 819      </para>
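     <para>
      The line format can be made concrete with a small sketch that
      splits one line at the first <literal>=</literal>. The function
      <literal>split_mapping_line</literal> is a hypothetical helper
      written for this example and is not the parser used by &yaz;;
      it also assumes that leading blanks before a pattern or a
      <literal>#</literal> may be skipped. Because the split happens at
      the first <literal>=</literal>, a pattern name that itself contains
      <literal>=</literal> would be cut short.
     </para>
     <programlisting><![CDATA[
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not YAZ's own parser.
   Returns 1 for a mapping line, 0 for a comment or blank line,
   -1 for a malformed line or if an output buffer is too small. */
static int split_mapping_line(const char *line,
                              char *pattern, size_t pattern_max,
                              char *rpn, size_t rpn_max)
{
    const char *eq, *end;
    size_t n;

    while (isspace((unsigned char) *line))
        line++;
    if (*line == '\0' || *line == '#')
        return 0;
    eq = strchr(line, '=');             /* first '=' is the separator */
    if (!eq)
        return -1;
    /* left-hand side: trim blanks before the separator */
    end = eq;
    while (end > line && isspace((unsigned char) end[-1]))
        end--;
    n = (size_t) (end - line);
    if (n == 0 || n >= pattern_max)
        return -1;
    memcpy(pattern, line, n);
    pattern[n] = '\0';
    /* right-hand side: trim blanks on both sides */
    eq++;
    while (isspace((unsigned char) *eq))
        eq++;
    end = eq + strlen(eq);
    while (end > eq && isspace((unsigned char) end[-1]))
        end--;
    n = (size_t) (end - eq);
    if (n == 0 || n >= rpn_max)
        return -1;
    memcpy(rpn, eq, n);
    rpn[n] = '\0';
    return 1;
}
]]>
     </programlisting>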
 820      <para>
 821       The following CQL patterns are recognized:
 822       <variablelist>
 823        <varlistentry><term>
 824          <literal>qualifier.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
 825         </term>
 826         <listitem>
 827          <para>
 828           This pattern is invoked when a CQL qualifier, such as
 829           <literal>dc.title</literal>, is converted. <replaceable>set</replaceable>
 830           and <replaceable>name</replaceable> are the index set and qualifier
 831           name, respectively.
 832           Typically, the RPN specifies an equivalent use attribute.
 833          </para>
 834          <para>
 835           For terms not bound by a qualifier the pattern
 836           <literal>qualifier.srw.serverChoice</literal> is used.
 837           Here, the prefix <literal>srw</literal> is defined as
 838           <literal>http://www.loc.gov/zing/cql/srw-indexes/v1.0/</literal>.
 839           If this pattern is not defined, the mapping will fail.
 840          </para>
 841         </listitem>
 842        </varlistentry>
 843        <varlistentry><term>
 844          <literal>relation.</literal><replaceable>relation</replaceable>
 845         </term>
 846         <listitem>
 847          <para>
 848           This pattern specifies how a CQL relation is mapped to RPN.
 849           <replaceable>relation</replaceable> is the name of the relation
 850           operator. Since <literal>=</literal> is used as the
 851           separator between CQL pattern and RPN, CQL relations
 852           including <literal>=</literal> cannot be
 853           used directly. To avoid a conflict, the names
 854           <literal>ge</literal>,
 855           <literal>eq</literal>,
 856           <literal>le</literal>
 857           must be used for the CQL operators greater-than-or-equal,
 858           equal, and less-than-or-equal, respectively.
 859           The RPN pattern is supposed to include a relation attribute.
 860          </para>
 861          <para>
 862           For terms not bound by a relation, the pattern
 863           <literal>relation.scr</literal> is used. If the pattern
 864           is not defined, the mapping will fail.
 865          </para>
 866          <para>
 867           The special pattern, <literal>relation.*</literal> is used
 868           when no other relation pattern is matched.
 869          </para>
 870         </listitem>
 871        </varlistentry>
 872
 873        <varlistentry><term>
 874          <literal>relationModifier.</literal><replaceable>mod</replaceable>
 875         </term>
 876         <listitem>
 877          <para>
 878           This pattern specifies how a CQL relation modifier is mapped to RPN.
 879           The RPN pattern is usually a relation attribute.
 880          </para>
 881         </listitem>
 882        </varlistentry>
 883
 884        <varlistentry><term>
 885          <literal>structure.</literal><replaceable>type</replaceable>
 886         </term>
 887         <listitem>
 888          <para>
 889           This pattern specifies how a CQL structure is mapped to RPN.
 890           Note that this CQL pattern is somewhat similar to
 891           CQL pattern <literal>relation</literal>.
 892           The <replaceable>type</replaceable> is a CQL relation.
 893          </para>
 894          <para>
 895           The pattern, <literal>structure.*</literal> is used
 896           when no other structure pattern is matched.
 897           Usually, the RPN equivalent specifies a structure attribute.
 898          </para>
 899         </listitem>
 900        </varlistentry>
 901
 902        <varlistentry><term>
 903          <literal>position.</literal><replaceable>type</replaceable>
 904         </term>
 905         <listitem>
 906          <para>
 907           This pattern specifies how the anchor (position) of
 908           CQL is mapped to RPN.
 909           The <replaceable>type</replaceable> is one
 910           of <literal>first</literal>, <literal>any</literal>,
 911           <literal>last</literal>, <literal>firstAndLast</literal>.
 912          </para>
 913          <para>
 914           The pattern, <literal>position.*</literal> is used
 915           when no other position pattern is matched.
 916          </para>
 917         </listitem>
 918        </varlistentry>
 919
 920        <varlistentry><term>
 921          <literal>set.</literal><replaceable>prefix</replaceable>
 922         </term>
 923         <listitem>
 924          <para>
 925           This specification defines a CQL index set for a given prefix.
 926           The value on the right hand side is the URI for the set -
 927           <emphasis>not</emphasis> RPN. All prefixes used in
 928           qualifier patterns must be defined this way.
 929          </para>
 930         </listitem>
 931        </varlistentry>
 932       </variablelist>
 933      </para>
 934      <example><title>CQL to RPN mapping file</title>
 935       <para>
 936        This simple file defines two index sets, three qualifiers and three
 937        relations, a position pattern and a default structure.
 938       </para>
 939       <programlisting><![CDATA[
 940        set.srw    = http://www.loc.gov/zing/cql/srw-indexes/v1.0/
 941        set.dc     = http://www.loc.gov/zing/cql/dc-indexes/v1.0/
 942
 943        qualifier.srw.serverChoice = 1=1016
 944        qualifier.dc.title         = 1=4
 945        qualifier.dc.subject       = 1=21
 946
 947        relation.<                 = 2=1
 948        relation.eq                = 2=3
 949        relation.scr               = 2=3
 950
 951        position.any               = 3=3 6=1
 952
 953        structure.*                = 4=1
 954 ]]>
 955       </programlisting>
 956       <para>
 957        With the mappings above, the CQL query
 958        <screen>
 959         computer
 960        </screen>
 961        is converted to the PQF:
 962        <screen>
 963         @attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer"
 964        </screen>
 965        by rules <literal>qualifier.srw.serverChoice</literal>,
 966        <literal>relation.scr</literal>, <literal>structure.*</literal>,
 967        <literal>position.any</literal>.
 968       </para>
 969       <para>
 970        CQL query
 971        <screen>
 972         computer^
 973        </screen>
 974        is rejected, since <literal>position.last</literal> is
 975        undefined.
 976       </para>
 977       <para>
 978        CQL query
 979        <screen>
 980         >my = "http://www.loc.gov/zing/cql/dc-indexes/v1.0/" my.title = x
 981        </screen>
 982        is converted to
 983        <screen>
 984         @attr 1=4 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "x"
 985        </screen>
 986       </para>
 987      </example>
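     <para>
      Judging from the example output above, each attribute pair of a
      matched rule appears in the PQF as an <literal>@attr</literal>
      prefix, followed by the quoted term. The sketch below reproduces
      that assembly step under this assumption.
      <literal>build_pqf</literal> is a hypothetical helper, not a
      &yaz; function; it does not handle the optional attribute set and
      does not escape quotes inside the term.
     </para>
     <programlisting><![CDATA[
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of YAZ. Emits "@attr <pair> " for every
   blank-separated pair in each RPN equivalent, then the quoted term.
   Returns 0 on success, -1 if out (of size max) is too small. */
static int build_pqf(const char *const *rpn, int count,
                     const char *term, char *out, size_t max)
{
    size_t len = 0;
    int i, n;

    for (i = 0; i < count; i++)
    {
        const char *p = rpn[i];
        while (*p)
        {
            size_t w;
            p += strspn(p, " \t");      /* skip blanks between pairs */
            w = strcspn(p, " \t");      /* length of one type=value pair */
            if (w == 0)
                break;
            n = snprintf(out + len, max - len, "@attr %.*s ", (int) w, p);
            if (n < 0 || (size_t) n >= max - len)
                return -1;
            len += (size_t) n;
            p += w;
        }
    }
    n = snprintf(out + len, max - len, "\"%s\"", term);
    if (n < 0 || (size_t) n >= max - len)
        return -1;
    return 0;
}
]]>
     </programlisting>
     <para>
      Called with the RPN equivalents <literal>1=1016</literal>,
      <literal>2=3</literal>, <literal>4=1</literal> and
      <literal>3=3 6=1</literal> and the term
      <literal>computer</literal>, the helper yields the PQF string shown
      for the first query in the example.
     </para>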
 988     </sect3>
 989     <sect3 id="tools.cql.xcql"><title>CQL to XCQL conversion</title>
 990      <para>
 991       Conversion from CQL to XCQL is trivial and does not
 992       require a mapping to be defined.
 993       There are three functions to choose from depending on the
 994       way you wish to store the resulting output (XML buffer
 995       containing XCQL).
 996       <synopsis>
 997 int cql_to_xml_buf(struct cql_node *cn, char *out, int max);
 998 void cql_to_xml(struct cql_node *cn,
 999                 void (*pr)(const char *buf, void *client_data),
1000                 void *client_data);
1001 void cql_to_xml_stdio(struct cql_node *cn, FILE *f);
1002       </synopsis>
1003       Function <function>cql_to_xml_buf</function> converts
1004       to XCQL and stores the result in a user-supplied buffer of a given
1005       maximum size.
1006      </para>
1007      <para>
1008       <function>cql_to_xml</function> writes the result to
1009       a user-defined output stream.
1010       <function>cql_to_xml_stdio</function> writes to
1011       a file.
1012      </para>
1013     </sect3>
1014    </sect2>
1015   </sect1>
1016   <sect1 id="tools.oid"><title>Object Identifiers</title>
1017
1018    <para>
1019     The basic YAZ representation of an OID is an array of integers,
1020     terminated with the value -1. The &odr; module provides two
1021     utility functions to create and copy this type of data element:
1022    </para>
1023
1024    <screen>
1025     Odr_oid *odr_getoidbystr(ODR o, char *str);
1026    </screen>
1027
1028    <para>
1029     Creates an OID based on a string-based representation using dots (.)
1030     to separate elements in the OID.
1031    </para>
1032
1033    <screen>
1034     Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
1035    </screen>
1036
1037    <para>
1038     Creates a copy of the OID referenced by the <emphasis>o</emphasis>
1039     parameter.
1040     Both functions take an &odr; stream as parameter. This stream is used to
1041     allocate memory for the data elements, which is released on a
1042     subsequent call to <function>odr_reset()</function> on that stream.
1043    </para>
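   <para>
    To make the integer-array representation concrete, the following
    sketch converts a dotted string into an array terminated by -1.
    <literal>parse_dotted_oid</literal> is a hypothetical stand-in
    written for this example: unlike
    <function>odr_getoidbystr()</function> it writes into a
    caller-supplied array instead of allocating on an &odr; stream, and
    it does not guard against integer overflow in very long elements.
   </para>
   <programlisting><![CDATA[
#include <stddef.h>

/* Hypothetical stand-in for illustration; not a YAZ function.
   Returns the number of elements stored (excluding the -1 terminator),
   or -1 on a syntax error or if dst (of size max) is too small. */
static int parse_dotted_oid(const char *str, int *dst, size_t max)
{
    size_t n = 0;

    for (;;)
    {
        int v = 0;
        if (*str < '0' || *str > '9')
            return -1;                  /* each element needs a digit */
        while (*str >= '0' && *str <= '9')
            v = v * 10 + (*str++ - '0');
        if (n + 1 >= max)
            return -1;                  /* keep room for the terminator */
        dst[n++] = v;
        if (*str == '\0')
            break;
        if (*str != '.')
            return -1;                  /* only dots may separate elements */
        str++;
    }
    dst[n] = -1;
    return (int) n;
}
]]>
   </programlisting>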
1044
1045    <para>
1046     The OID module provides a higher-level representation of the
1047     family of object identifiers which describe the Z39.50 protocol and its
1048     related objects. The definition of the module interface is given in
1049     the <filename>oid.h</filename> file.
1050    </para>
1051
1052    <para>
1053     The interface is mainly based on the <literal>oident</literal> structure.
1054     The definition of this structure looks like this:
1055    </para>
1056
1057    <screen>
1058 typedef struct oident
1059 {
1060     oid_proto proto;
1061     oid_class oclass;
1062     oid_value value;
1063     int oidsuffix[OID_SIZE];
1064     char *desc;
1065 } oident;
1066    </screen>
1067
1068    <para>
1069     The proto field takes one of the values
1070    </para>
1071
1072    <screen>
1073     PROTO_Z3950
1074     PROTO_SR
1075    </screen>
1076
1077    <para>
1078     If you don't care about talking to SR-based implementations (few
1079     exist, and they may become fewer still if and when the ISO SR and ANSI
1080     Z39.50 documents are merged into a single standard), you can ignore
1081     this field on incoming packages, and always set it to PROTO_Z3950
1082     for outgoing packages.
1083    </para>
1084    <para>
1085
1086     The oclass field takes one of the values
1087    </para>
1088
1089    <screen>
1090     CLASS_APPCTX
1091     CLASS_ABSYN
1092     CLASS_ATTSET
1093     CLASS_TRANSYN
1094     CLASS_DIAGSET
1095     CLASS_RECSYN
1096     CLASS_RESFORM
1097     CLASS_ACCFORM
1098     CLASS_EXTSERV
1099     CLASS_USERINFO
1100     CLASS_ELEMSPEC
1101     CLASS_VARSET
1102     CLASS_SCHEMA
1103     CLASS_TAGSET
1104     CLASS_GENERAL
1105    </screen>
1106
1107    <para>
1108     corresponding to the OID classes defined by the Z39.50 standard.
1109
1110     Finally, the value field takes one of the values
1111    </para>
1112
1113    <screen>
1114     VAL_APDU
1115     VAL_BER
1116     VAL_BASIC_CTX
1117     VAL_BIB1
1118     VAL_EXP1
1119     VAL_EXT1
1120     VAL_CCL1
1121     VAL_GILS
1122     VAL_WAIS
1123     VAL_STAS
1124     VAL_DIAG1
1125     VAL_ISO2709
1126     VAL_UNIMARC
1127     VAL_INTERMARC
1128     VAL_CCF
1129     VAL_USMARC
1130     VAL_UKMARC
1131     VAL_NORMARC
1132     VAL_LIBRISMARC
1133     VAL_DANMARC
1134     VAL_FINMARC
1135     VAL_MAB
1136     VAL_CANMARC
1137     VAL_SBN
1138     VAL_PICAMARC
1139     VAL_AUSMARC
1140     VAL_IBERMARC
1141     VAL_EXPLAIN
1142     VAL_SUTRS
1143     VAL_OPAC
1144     VAL_SUMMARY
1145     VAL_GRS0
1146     VAL_GRS1
1147     VAL_EXTENDED
1148     VAL_RESOURCE1
1149     VAL_RESOURCE2
1150     VAL_PROMPT1
1151     VAL_DES1
1152     VAL_KRB1
1153     VAL_PRESSET
1154     VAL_PQUERY
1155     VAL_PCQUERY
1156     VAL_ITEMORDER
1157     VAL_DBUPDATE
1158     VAL_EXPORTSPEC
1159     VAL_EXPORTINV
1160     VAL_NONE
1161     VAL_SETM
1162     VAL_SETG
1163     VAL_VAR1
1164     VAL_ESPEC1
1165    </screen>
1166
1167    <para>
1168     again, corresponding to the specific OIDs defined by the standard.
1169    </para>
1170
1171    <para>
1172     The desc field contains a brief, mnemonic name for the OID in question.
1173    </para>
1174
1175    <para>
1176     The function
1177    </para>
1178
1179    <screen>
1180     struct oident *oid_getentbyoid(int *o);
1181    </screen>
1182
1183    <para>
1184     takes as argument an OID, and returns a pointer to a static area
1185     containing an <literal>oident</literal> structure. You typically use
1186     this function when you receive a PDU containing an OID, and you wish
1187     to branch out depending on the specific OID value.
1188    </para>
1189
1190    <para>
1191     The function
1192    </para>
1193
1194    <screen>
1195     int *oid_ent_to_oid(struct oident *ent, int *dst);
1196    </screen>
1197
1198    <para>
1199     takes as argument an <literal>oident</literal> structure - in which
1200     the <literal>proto</literal>, <literal>oclass</literal>, and
1201     <literal>value</literal> fields are assumed to be set correctly -
1202     and returns a pointer to the buffer given by <literal>dst</literal>,
1203     containing the base
1204     representation of the corresponding OID. The function returns
1205     NULL and leaves the array dst unchanged if a mapping could not take place.
1206     The array <literal>dst</literal> should be at least of size
1207     <literal>OID_SIZE</literal>.
1208    </para>
1209    <para>
1210
1211     The <function>oid_ent_to_oid()</function> function can be used whenever
1212     you need to prepare a PDU containing one or more OIDs. The separation of
1213     the <literal>protocol</literal> element from the remainder of the
1214     OID-description makes it simple to write applications that can
1215     communicate with either Z39.50 or OSI SR-based applications.
1216    </para>
1217
1218    <para>
1219     The function
1220    </para>
1221
1222    <screen>
1223     oid_value oid_getvalbyname(const char *name);
1224    </screen>
1225
1226    <para>
1227     takes as argument a mnemonic OID name, and returns the
1228     <literal>value</literal> field of the first entry in the database that
1229     contains the given name in its <literal>desc</literal> field.
1230    </para>
1231
1232    <para>
1233     Finally, the module provides the following utility functions, which
1234     copy, concatenate, compare, and measure -1-terminated OID arrays:
1235    </para>
1236
1237    <screen>
1238     void oid_oidcpy(int *t, int *s);
1239     void oid_oidcat(int *t, int *s);
1240     int oid_oidcmp(int *o1, int *o2);
1241     int oid_oidlen(int *o);
1242    </screen>
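   <para>
    How such helpers behave on -1-terminated arrays can be sketched
    with stand-in implementations. The <literal>demo_</literal>
    functions below are hypothetical and written for this example only;
    in particular, the convention that the comparison returns zero for
    equal OIDs is an assumption modelled on
    <function>strcmp</function>, not a statement about the &yaz; source.
   </para>
   <programlisting><![CDATA[
/* Hypothetical re-implementations for illustration only; the demo_
   prefix keeps them apart from the YAZ functions. */

/* Number of elements before the -1 terminator. */
static int demo_oidlen(const int *o)
{
    int len = 0;
    while (o[len] >= 0)
        len++;
    return len;
}

/* Copy s, including its terminator, to t. */
static void demo_oidcpy(int *t, const int *s)
{
    while ((*t++ = *s++) >= 0)
        ;
}

/* Append s to the OID already stored in t; t must have enough room. */
static void demo_oidcat(int *t, const int *s)
{
    demo_oidcpy(t + demo_oidlen(t), s);
}

/* Zero if equal; otherwise the sign tells which array sorts first. */
static int demo_oidcmp(const int *o1, const int *o2)
{
    while (*o1 == *o2 && *o1 >= 0)
    {
        o1++;
        o2++;
    }
    if (*o1 == *o2)
        return 0;
    return *o1 > *o2 ? 1 : -1;
}
]]>
   </programlisting>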
1243
1244    <note>
1245     <para>
1246      The OID module has been criticized - and perhaps rightly so
1247      - for needlessly abstracting the
1248      representation of OIDs. Other toolkits use a simple
1249      string-representation of OIDs with good results. In practice, we have
1250      found the interface comfortable and quick to work with, and it is a
1251      simple matter (for what it's worth) to create applications compatible
1252      with both ISO SR and Z39.50. Finally, the use of the
1253      <literal>oident</literal> database is by no means mandatory.
1254      You can easily create your own system for representing OIDs, as long
1255      as it is compatible with the low-level integer-array representation
1256      of the ODR module.
1257     </para>
1258    </note>
1259
1260   </sect1>
1261
1262   <sect1 id="tools.nmem"><title>Nibble Memory</title>
1263
1264    <para>
1265     Sometimes when you need to allocate and construct a large,
1266     interconnected complex of structures, it can be a bit of a pain to
1267     release the associated memory again. For the structures describing the
1268     Z39.50 PDUs and related structures, it is convenient to use the
1269     memory-management system of the &odr; subsystem (see
1270     <link linkend="odr-use">Using ODR</link>). However, in some circumstances
1271     where you might otherwise benefit from using a simple nibble memory
1272     management system, it may be impractical to use
1273     <function>odr_malloc()</function> and <function>odr_reset()</function>.
1274     For this purpose, the memory manager which also supports the &odr;
1275     streams is made available in the NMEM module. The external interface
1276     to this module is given in the <filename>nmem.h</filename> file.
1277    </para>
1278
1279    <para>
1280     The following prototypes are given:
1281    </para>
1282
1283    <screen>
1284     NMEM nmem_create(void);
1285     void nmem_destroy(NMEM n);
1286     void *nmem_malloc(NMEM n, int size);
1287     void nmem_reset(NMEM n);
1288     int nmem_total(NMEM n);
1289     void nmem_init(void);
1290     void nmem_exit(void);
1291    </screen>
1292
1293    <para>
1294     The <function>nmem_create()</function> function returns a pointer to a
1295     memory control handle, which can be released again by
1296     <function>nmem_destroy()</function> when no longer needed.
1297     The function <function>nmem_malloc()</function> allocates a block of
1298     memory of the requested size. A call to <function>nmem_reset()</function>
1299     or <function>nmem_destroy()</function> will release all memory allocated
1300     on the handle since it was created (or since the last call to
1301     <function>nmem_reset()</function>). The function
1302     <function>nmem_total()</function> returns the number of bytes currently
1303     allocated on the handle.
1304    </para>
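   <para>
    The idea behind such an allocator can be sketched in a few lines.
    The toy version below is not the NMEM implementation; all
    <literal>toy_</literal> names and the block size are assumptions
    made for this example. It hands out small pieces from larger blocks
    and releases everything at once on reset. It simply frees its blocks
    on reset and is not thread safe.
   </para>
   <programlisting><![CDATA[
#include <stddef.h>
#include <stdlib.h>

#define TOY_BLOCK 4096               /* payload bytes per block (arbitrary) */

union toy_align { long double ld; long long ll; void *p; };

struct toy_block {
    struct toy_block *next;
    size_t used;                     /* bytes handed out from data */
    size_t size;                     /* capacity of data in bytes */
    union toy_align data[];          /* payload, suitably aligned */
};

struct toy_nmem {
    struct toy_block *blocks;        /* most recent block first */
    size_t total;                    /* bytes requested since create/reset */
};

static struct toy_nmem *toy_create(void)
{
    return calloc(1, sizeof(struct toy_nmem));
}

static void *toy_malloc(struct toy_nmem *n, size_t size)
{
    struct toy_block *b = n->blocks;
    size_t a = sizeof(union toy_align);
    size_t need = (size + a - 1) / a * a;    /* round up to keep alignment */
    void *r;

    if (!b || b->size - b->used < need)
    {
        /* current block is full (or missing): start a new one */
        size_t cap = need > TOY_BLOCK ? need : TOY_BLOCK;
        b = malloc(offsetof(struct toy_block, data) + cap);
        if (!b)
            return NULL;
        b->next = n->blocks;
        b->used = 0;
        b->size = cap;
        n->blocks = b;
    }
    r = (char *) b->data + b->used;
    b->used += need;
    n->total += size;
    return r;
}

/* Release every block at once; individual pieces are never freed. */
static void toy_reset(struct toy_nmem *n)
{
    while (n->blocks)
    {
        struct toy_block *b = n->blocks;
        n->blocks = b->next;
        free(b);
    }
    n->total = 0;
}

static void toy_destroy(struct toy_nmem *n)
{
    if (n)
    {
        toy_reset(n);
        free(n);
    }
}
]]>
   </programlisting>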
1305
1306    <para>
1307     The nibble memory pool is shared amongst threads. POSIX
1308     mutexes and WIN32 critical sections are used to keep the
1309     module thread safe. Function <function>nmem_init()</function>
1310     initializes the nibble memory library and it is called automatically
1311     the first time the <literal>YAZ.DLL</literal> is loaded. &yaz; uses
1312     function <function>DllMain</function> to achieve this. You should
1313     <emphasis>not</emphasis> call <function>nmem_init</function> or
1314     <function>nmem_exit</function> unless you're absolutely sure what
1315     you're doing. Note that in previous &yaz; versions you'd have to call
1316     <function>nmem_init</function> yourself.
1317    </para>
1318
1319   </sect1>
1320  </chapter>
1321
1322  <!-- Keep this comment at the end of the file
1323  Local variables:
1324  mode: sgml
1325  sgml-omittag:t
1326  sgml-shorttag:t
1327  sgml-minimize-attributes:nil
1328  sgml-always-quote-attributes:t
1329  sgml-indent-step:1
1330  sgml-indent-data:t
1331  sgml-parent-document: "yaz.xml"
1332  sgml-local-catalogs: nil
1333  sgml-namecase-general:t
1334  End:
1335  -->