doc/tools.xml

   1 <!-- $Header: /home/cvsroot/yaz/doc/tools.xml,v 1.1 2001-01-04 13:36:25 adam Exp $ -->
   2 <chapter><title>Supporting Tools</title>
   3
   4 <para>
   5 In support of the service API (primarily the ASN module, which
   6 provides the programmatic interface to the Z39.50 APDUs), YAZ contains
   7 a collection of tools that aid the development of applications.
   8 </para>
   9
  10 <sect1><title>Query Syntax Parsers</title>
  11
  12 <para>
  13 Since the type-1 (RPN) query structure has no direct, useful string
  14 representation, every origin application needs to provide some form of
  15 mapping from a local query notation or representation to a
  16 <token>Z_RPNQuery</token> structure. Some programmers will prefer to
  17 construct the query manually, perhaps using <function>odr_malloc()</function>
  18 to simplify memory management. The &yaz; distribution includes two separate,
  19 query-generating tools that may be of use to you.
  20 </para>
  21
  22 <sect2><title id="PQF">Prefix Query Format</title>
  23
  24 <para>
  25 Since RPN or reverse polish notation is really just a fancy way of
  26 describing a suffix notation format (operator follows operands), it
  27 would seem that the confusion is total when we now introduce a prefix
  28 notation for RPN. The reason is one of simple laziness - it's somewhat
  29 simpler to interpret a prefix format, and this utility was designed
  30 for maximum simplicity, to provide a baseline representation for use
  31 in simple test applications and scripting environments (like Tcl). The
  32 demonstration client included with YAZ uses the PQF.
  33 </para>
  34 <para>
  35 The PQF is defined by the pquery module in the YAZ library. The
  36 <filename>pquery.h</filename> file provides the declaration of the functions
  37 </para>
  38 <screen>
  39 Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
  40
  41 Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
  42       Odr_oid **attributeSetP, const char *qbuf);
  43
  44 int p_query_attset (const char *arg);
  45 </screen>
  46 <para>
  47 The function <function>p_query_rpn()</function> takes as arguments an
  48 &odr; stream (see section <link linkend="odr">The ODR Module</link>)
  49 to provide a memory source (the structure created is released on
  50 the next call to <function>odr_reset()</function> on the stream), a
  51 protocol identifier (one of the constants <token>PROTO_Z3950</token> and
  52 <token>PROTO_SR</token>), and finally a null-terminated string
  53 holding the query. The attribute set is not an argument; it is
  54 selected with <function>p_query_attset()</function>, described below.
  55 </para>
  56 <para>
  57 If the parse went well, <function>p_query_rpn()</function> returns a
  58 pointer to a <literal>Z_RPNQuery</literal> structure which can be
  59 placed directly into a <literal>Z_SearchRequest</literal>.
  60 </para>
  61 <para>
  62
  63 The function <function>p_query_attset()</function> specifies which
  64 attribute set to use if the query doesn't specify one with the
  65 <literal>@attrset</literal> operator. It returns 0 if the argument is a
  66 valid attribute set specifier; otherwise it returns -1.
  67 </para>
  68
  69 <para>
  70 The grammar of the PQF is as follows:
  71 </para>
  72
  73 <screen>
  74 Query ::= &lsqb; AttSet &rsqb; QueryStruct.
  75
  76 AttSet ::= string.
  77
  78 QueryStruct ::= { Attribute } Simple | Complex.
  79
  80 Attribute ::= '@attr' AttributeType '=' AttributeValue.
  81
  82 AttributeType ::= integer.
  83
  84 AttributeValue ::= integer.
  85
  86 Complex ::= Operator QueryStruct QueryStruct.
  87
  88 Operator ::= '@and' | '@or' | '@not' | '@prox' Proximity.
  89
  90 Simple ::= ResultSet | Term.
  91
  92 ResultSet ::= '@set' string.
  93
  94 Term ::= string | '"' string '"'.
  95
  96 Proximity ::= Exclusion Distance Ordered Relation WhichCode UnitCode.
  97
  98 Exclusion ::= '1' | '0' | 'void'.
  99
 100 Distance ::= integer.
 101
 102 Ordered ::= '1' | '0'.
 103
 104 Relation ::= integer.
 105
 106 WhichCode ::= 'known' | 'private' | integer.
 107
 108 UnitCode ::= integer.
 109 </screen>
 110
 111 <para>
 112 You will note that the syntax above is a fairly faithful
 113 representation of RPN, except for the Attribute, which has been
 114 moved a step away from the term, allowing you to associate one or more
 115 attributes with an entire query structure. The parser will
 116 automatically apply the given attributes to each term as required.
 117 </para>
 118
 119 <para>
 120 The following are all examples of valid queries in the PQF.
 121 </para>
 122
 123 <screen>
 124 dylan
 125
 126 "bob dylan"
 127
 128 @or "dylan" "zimmerman"
 129
 130 @set Result-1
 131
 132 @or @and bob dylan @set Result-1
 133
 134 @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
 135
 136 @attr 4=1 @attr 1=4 "self portrait"
 137
 138 @prox 0 3 1 2 k 2 dylan zimmerman
 139 </screen>
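<para>
To illustrate why a prefix format is simple to interpret, the following is a
minimal sketch of a recognizer for a subset of the grammar above. It is not
the pquery module: the names <function>pqf_next_token()</function>,
<function>pqf_count_operands()</function> and <function>pqf_check()</function>
are hypothetical, and the optional leading AttSet and the
<literal>@prox</literal> operator are left out. Because each operator
announces what follows it, the recognizer only has to read one token and then
recurse once per operand.
</para>
<screen><![CDATA[
```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of YAZ. Reads the next token from *pp into
 * buf (truncating if it does not fit). Returns 1 for a plain token, 2 for a
 * double-quoted string, 0 at end of input, -1 on an unterminated quote. */
static int pqf_next_token(const char **pp, char *buf, size_t bufsize)
{
    const char *p = *pp;
    size_t n = 0;
    int quoted;

    assert(bufsize > 0);
    while (*p && isspace((unsigned char) *p))
        p++;
    if (!*p)
        return 0;
    quoted = (*p == '"');
    if (quoted)
        p++;
    while (*p && (quoted ? *p != '"' : !isspace((unsigned char) *p))) {
        if (n + 1 < bufsize)
            buf[n++] = *p;
        p++;
    }
    if (quoted) {
        if (*p != '"')
            return -1;
        p++;                             /* skip the closing quote */
    }
    buf[n] = '\0';
    *pp = p;
    return quoted ? 2 : 1;
}

/* Hypothetical recognizer for a subset of the PQF grammar (no AttSet, no
 * @prox). Returns the number of operands (terms and result set references)
 * in the query structure starting at *pp, or -1 on a syntax error. */
static int pqf_count_operands(const char **pp)
{
    char tok[128];
    int kind = pqf_next_token(pp, tok, sizeof tok);

    if (kind <= 0)
        return -1;                       /* an operand was expected */
    if (kind == 2 || tok[0] != '@')
        return 1;                        /* plain or quoted term */
    if (!strcmp(tok, "@attr")) {
        /* skip the "type=value" token, then parse what the attribute
         * applies to */
        if (pqf_next_token(pp, tok, sizeof tok) != 1 || !strchr(tok, '='))
            return -1;
        return pqf_count_operands(pp);
    }
    if (!strcmp(tok, "@set"))
        return pqf_next_token(pp, tok, sizeof tok) > 0 ? 1 : -1;
    if (!strcmp(tok, "@and") || !strcmp(tok, "@or") || !strcmp(tok, "@not")) {
        int left = pqf_count_operands(pp);
        int right = left < 0 ? -1 : pqf_count_operands(pp);
        return right < 0 ? -1 : left + right;
    }
    return -1;                           /* unknown operator */
}

/* Returns the operand count of a complete query, or -1 if the query is
 * malformed or followed by trailing input. */
int pqf_check(const char *query)
{
    char tok[128];
    int count = pqf_count_operands(&query);

    if (count >= 0 && pqf_next_token(&query, tok, sizeof tok) != 0)
        return -1;
    return count;
}
```
]]></screen>
<para>
Under these assumptions, the query
<literal>@or @and bob dylan @set Result-1</literal> yields 3 (two terms and
one result set reference), while <literal>@and dylan</literal> yields -1
because the second operand is missing.
</para>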
 140
 141 </sect2>
 142 <sect2><title id="CCL">Common Command Language</title>
 143
 144 <para>
 145 Not all users enjoy typing in prefix query structures and numerical
 146 attribute values, even in a minimalistic test client. In the library
 147 world, the more intuitive Common Command Language (or ISO 8777) has
 148 enjoyed some popularity - especially before the widespread
 149 availability of graphical interfaces. It is still useful in
 150 applications where you for some reason or other need to provide a
 151 symbolic language for expressing boolean query structures.
 152 </para>
 153
 154 <para>
 155 The EUROPAGATE research project working under the Libraries programme
 156 of the European Commission's DG XIII has, amongst other useful tools,
 157 implemented a general-purpose CCL parser which produces an output
 158 structure that can be trivially converted to the internal RPN
 159 representation of YAZ (the <literal>Z_RPNQuery</literal> structure).
 160 Since the CCL utility, along with the rest of the software
 161 produced by EUROPAGATE, is made freely available under a liberal license,
 162 it is included as a supplement to YAZ.
 163 </para>
 164
 165 <sect3><title>CCL Syntax</title>
 166
 167 <para>
 168 The CCL parser obeys the following grammar for the FIND argument.
 169 The syntax is annotated by the lines prefixed by
 170 <literal>&dash;&dash;</literal>.
 171 </para>
 172
 173 <screen>
 174 CCL-Find ::= CCL-Find Op Elements
 175            | Elements.
 176
 177 Op ::= "and" | "or" | "not"
 178 -- The above means that Elements are separated by boolean operators.
 179
 180 Elements ::= '(' CCL-Find ')'
 181            | Set
 182            | Terms
 183            | Qualifiers Relation Terms
 184            | Qualifiers Relation '(' CCL-Find ')'
 185            | Qualifiers '=' string '-' string
 186 -- Elements is either a recursive definition, a result set reference, a
 187 -- list of terms, qualifiers followed by terms, qualifiers followed
 188 -- by a recursive definition or qualifiers in a range (lower - upper).
 189
 190 Set ::= 'set' '=' string
 191 -- Reference to a result set
 192
 193 Terms ::= Terms Prox Term
 194         | Term
 195 -- Proximity of terms.
 196
 197 Term ::= Term string
 198        | string
 199 -- This basically means that a term may include a blank
 200
 201 Qualifiers ::= Qualifiers ',' string
 202              | string
 203 -- Qualifiers is a list of strings separated by comma
 204
 205 Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
 206 -- Relational operators. This really doesn't follow the ISO8777
 207 -- standard.
 208
 209 Prox ::= '%' | '!'
 210 -- Proximity operator
 211
 212 </screen>
 213
 214 <para>
 215 The following queries are all valid:
 216 </para>
 217
 218 <screen>
 219 dylan
 220
 221 "bob dylan"
 222
 223 dylan or zimmerman
 224
 225 set=1
 226
 227 (dylan and bob) or set=1
 228
 229 </screen>
 230 <para>
 231 Assuming that the qualifiers <literal>ti</literal>, <literal>au</literal>
 232 and <literal>date</literal> are defined we may use:
 233 </para>
 234
 235 <screen>
 236 ti=self portrait
 237
 238 au=(bob dylan and slow train coming)
 239
 240 date>1980 and (ti=((self portrait)))
 241
 242 </screen>
 243
 244 </sect3>
 245 <sect3><title>CCL Qualifiers</title>
 246
 247 <para>
 248 Qualifiers are used to direct the search to a particular searchable
 249 index, such as title (ti) and author indexes (au). The CCL standard
 250 itself doesn't specify a particular set of qualifiers, but it does
 251 suggest a few short-hand notations. You can customize the CCL parser
 252 to support a particular set of qualifiers to reflect the current target
 253 profile. Traditionally, a qualifier would map to a particular
 254 use-attribute within the BIB-1 attribute set. However, you could also
 255 define qualifiers that would set, for example, the
 256 structure-attribute.
 257 </para>
 258
 259 <para>
 260 Consider a scenario where the target supports ranked searches in the
 261 title-index. In this case, the user could specify
 262 </para>
 263
 264 <screen>
 265 ti,ranked=knuth computer
 266 </screen>
 267 <para>
 268 and the <literal>ranked</literal> would map to structure=free-form-text
 269 (4=105) and the <literal>ti</literal> would map to title (1=4).
 270 </para>
 271
 272 <para>
 273 A "profile" with a set of predefined CCL qualifiers can be read from a
 274 file. The YAZ client reads its CCL qualifiers from a file named
 275 <filename>default.bib</filename>. Each line in the file has the form:
 276 </para>
 277
 278 <para>
 279 <replaceable>qualifier-name</replaceable>
 280    <replaceable>type</replaceable>=<replaceable>val</replaceable> <replaceable>type</replaceable>=<replaceable>val</replaceable> ...
 281 </para>
 282
 283 <para>
 284 where <replaceable>qualifier-name</replaceable> is the name of the
 285 qualifier to be used (e.g. <literal>ti</literal>),
 286 <replaceable>type</replaceable> is a BIB-1 category type and
 287 <replaceable>val</replaceable> is the corresponding BIB-1 attribute value.
 288 The <replaceable>type</replaceable> can be given either as a number or as
 289 one of <literal>u</literal> (use), <literal>r</literal> (relation),
 290 <literal>p</literal> (position), <literal>s</literal> (structure),
 291 <literal>t</literal> (truncation) or <literal>c</literal> (completeness).
 292 The <replaceable>qualifier-name</replaceable> <literal>term</literal> has a
 293 special meaning. The types and values for this definition are used when
 294 <emphasis>no</emphasis> qualifiers are present.
 295 </para>
 296
 297 <para>
 298 Consider the following definition:
 299 </para>
 300
 301 <screen>
 302 ti       u=4 s=1
 303 au       u=1 s=1
 304 term     s=105
 305 </screen>
 306 <para>
 307 Two qualifiers are defined, <literal>ti</literal> and <literal>au</literal>.
 308 They both set the structure-attribute to phrase (1). <literal>ti</literal>
 309 sets the use-attribute to 4. <literal>au</literal> sets the use-attribute
 310 to 1. When no qualifiers are used in the query the structure-attribute is
 311 set to free-form-text (105).
 312 </para>
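<para>
As a rough illustration of the file format described above, the following
sketch parses a single profile line into a small structure. It is not the
code behind <function>ccl_qual_file</function>: the names
<literal>struct qual_def</literal>, <function>qual_type_number()</function>
and <function>qual_parse_line()</function> are hypothetical, and the sketch
assumes that the shorthands <literal>u</literal>, <literal>r</literal>,
<literal>p</literal>, <literal>s</literal>, <literal>t</literal> and
<literal>c</literal> stand for the BIB-1 attribute types 1 to 6 in that order.
</para>
<screen><![CDATA[
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ATTRS 8

/* Hypothetical in-memory form of one profile line; not a YAZ type. */
struct qual_def {
    char name[32];
    int  type[MAX_ATTRS];    /* BIB-1 attribute type, e.g. 1 = use */
    int  value[MAX_ATTRS];   /* attribute value */
    int  count;              /* number of type=value pairs */
};

/* Map a type field to an attribute type number. Accepts the one-letter
 * shorthands (assumed order: u r p s t c = 1..6) or a positive number.
 * Returns -1 if the field is neither. */
static int qual_type_number(const char *s)
{
    static const char letters[] = "urpstc";
    const char *hit;
    char *end;
    long n;

    if (s[0] && !s[1] && (hit = strchr(letters, s[0])) != NULL)
        return (int) (hit - letters) + 1;
    n = strtol(s, &end, 10);
    return (*s && !*end && n > 0) ? (int) n : -1;
}

/* Parse one line such as "ti u=4 s=1" into def.
 * Returns 0 on success, -1 on a malformed line. */
int qual_parse_line(const char *line, struct qual_def *def)
{
    char copy[256], *tok;

    assert(line != NULL && def != NULL);
    if (strlen(line) >= sizeof copy)
        return -1;
    strcpy(copy, line);              /* strtok modifies its input */

    tok = strtok(copy, " \t\r\n");
    if (!tok || strlen(tok) >= sizeof def->name)
        return -1;
    strcpy(def->name, tok);
    def->count = 0;

    while ((tok = strtok(NULL, " \t\r\n")) != NULL) {
        char *eq = strchr(tok, '=');
        char *end;
        int type;
        long value;

        if (!eq || def->count == MAX_ATTRS)
            return -1;
        *eq = '\0';                  /* split "type=value" in place */
        type = qual_type_number(tok);
        value = strtol(eq + 1, &end, 10);
        if (type < 0 || end == eq + 1 || *end)
            return -1;
        def->type[def->count] = type;
        def->value[def->count] = (int) value;
        def->count++;
    }
    return def->count > 0 ? 0 : -1;
}
```
]]></screen>
<para>
With this sketch, the line <literal>ti u=4 s=1</literal> produces the pairs
(1, 4) and (4, 1), which correspond to the <literal>1=4</literal> and
<literal>4=1</literal> notation used in the text.
</para>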
 313
 314 </sect3>
 315 <sect3><title>CCL API</title>
 316 <para>
 317 All public definitions can be found in the header file
 318 <filename>ccl.h</filename>. A profile identifier is of type
 319 <literal>CCL_bibset</literal>. A profile must be created with the call to
 320 the function <function>ccl_qual_mk</function> which returns a profile
 321 handle of type <literal>CCL_bibset</literal>.
 322 </para>
 323
 324 <para>
 325 To read a file containing qualifier definitions the function
 326 <function>ccl_qual_file</function> may be convenient. This function takes
 327 an already opened <literal>FILE</literal> handle pointer as argument
 328 along with a <literal>CCL_bibset</literal> handle.
 329 </para>
 330
 331 <para>
 332 To parse a simple string with a FIND query use the function
 333 </para>
 334 <screen>
 335   struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
 336                                      int *error, int *pos);
 337 </screen>
 338 <para>
 339 which takes the CCL profile (<literal>bibset</literal>) and query
 340 (<literal>str</literal>) as input. Upon successful completion the RPN
 341 tree is returned. If an error occurs, such as a syntax error, the integer
 342 pointed to by <literal>error</literal> holds the error code and
 343 <literal>pos</literal> holds the offset within the query string at which
 344 the parsing failed.
 345 </para>
 346
 347 <para>
 348 An English representation of the error may be obtained by calling
 349 the <literal>ccl_err_msg</literal> function. The error codes are listed in
 350 <filename>ccl.h</filename>.
 351 </para>
 352
 353 <para>
 354 To convert the CCL RPN tree (type <literal>struct ccl_rpn_node *</literal>)
 355 to the Z_RPNQuery of YAZ the function <function>ccl_rpn_query</function>
 356 must be used. This function, which is part of YAZ, is implemented in
 357 <filename>yaz-ccl.c</filename>.
 358 After calling this function the CCL RPN tree is probably no longer
 359 needed. The function <function>ccl_rpn_delete</function> destroys the CCL RPN tree.
 360 </para>
 361
 362 <para>
 363 A CCL profile may be destroyed by calling the <function>ccl_qual_rm</function>
 364 function.
 365 </para>
 366
 367 <para>
 368 The token names for the CCL operators may be changed by setting the
 369 globals (all type <literal>char *</literal>)
 370 <literal>ccl_token_and</literal>, <literal>ccl_token_or</literal>,
 371 <literal>ccl_token_not</literal> and <literal>ccl_token_set</literal>.
 372 An operator may have aliases, i.e. there may be more than one name for
 373 the operator. To do this, separate each alias with a space character.
 374 </para>
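<para>
The alias convention can be pictured with the following sketch, which checks
a word against a space-separated alias list such as the one you might store
in <literal>ccl_token_and</literal>. The function
<function>token_matches_alias()</function> is hypothetical and is not how the
CCL parser itself is implemented; in particular, the case-sensitive
comparison is an assumption of this sketch.
</para>
<screen><![CDATA[
```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if word is equal to one of the
 * space-separated aliases in list, and 0 otherwise. */
int token_matches_alias(const char *list, const char *word)
{
    size_t wlen;

    assert(list != NULL && word != NULL);
    wlen = strlen(word);
    while (*list) {
        size_t alen = strcspn(list, " ");   /* length of the next alias */

        if (alen > 0 && alen == wlen && strncmp(list, word, alen) == 0)
            return 1;
        list += alen;
        while (*list == ' ')                /* skip separators */
            list++;
    }
    return 0;
}
```
]]></screen>
<para>
With the list <literal>and und</literal>, both <literal>and</literal> and
<literal>und</literal> match, while a prefix such as <literal>an</literal>
does not.
</para>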
 375 </sect3>
 376 </sect2>
 377 </sect1>
 378 <sect1><title>Object Identifiers</title>
 379
 380 <para>
 381 The basic YAZ representation of an OID is an array of integers,
 382 terminated with the value -1. The &odr; module provides two
 383 utility functions to create and copy data elements of this type:
 384 </para>
 385
 386 <screen>
 387   Odr_oid *odr_getoidbystr(ODR o, char *str);
 388 </screen>
 389
 390 <para>
 391 Creates an OID based on a string-based representation using dots (.)
 392 to separate elements in the OID.
 393 </para>
 394
 395 <screen>
 396 Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
 397 </screen>
 398
 399 <para>
 400 Creates a copy of the OID referenced by the <emphasis>o</emphasis> parameter.
 401 Both functions take an &odr; stream as parameter. This stream is used to
 402 allocate memory for the data elements, which is released on a
 403 subsequent call to <function>odr_reset()</function> on that stream.
 404 </para>
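<para>
To make the array representation concrete, the following sketch converts a
dotted string into an integer array terminated with -1. It is a stand-in
written for illustration only: <function>oid_from_dotted()</function> is a
hypothetical name, and unlike <function>odr_getoidbystr()</function> it
writes into a caller-supplied buffer instead of allocating on an &odr; stream.
</para>
<screen><![CDATA[
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in, not the ODR function. Parses a dotted string such
 * as "1.2.840.10003.3.1" into dst and terminates the array with -1.
 * max is the capacity of dst in ints, including room for the terminator.
 * Returns the number of elements stored, or -1 if the string is malformed
 * or dst is too small. Numeric overflow is not checked. */
int oid_from_dotted(const char *str, int *dst, size_t max)
{
    size_t n = 0;

    assert(str != NULL && dst != NULL);
    while (*str) {
        int value = 0;

        if (!isdigit((unsigned char) *str) || n + 1 >= max)
            return -1;
        while (isdigit((unsigned char) *str))
            value = value * 10 + (*str++ - '0');
        dst[n++] = value;
        if (*str == '.') {
            str++;
            if (!*str)
                return -1;           /* trailing dot */
        } else if (*str) {
            return -1;               /* unexpected character */
        }
    }
    if (n == 0)
        return -1;                   /* empty string */
    dst[n] = -1;
    return (int) n;
}
```
]]></screen>
<para>
For the string <literal>1.2.840.10003.3.1</literal> the sketch stores the six
elements 1, 2, 840, 10003, 3 and 1, followed by the terminating -1.
</para>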
 405
 406 <para>
 407 The OID module provides a higher-level representation of the
 408 family of object identifiers which describe the Z39.50 protocol and its
 409 related objects. The definition of the module interface is given in
 410 the <filename>oid.h</filename> file.
 411 </para>
 412
 413 <para>
 414 The interface is mainly based on the <literal>oident</literal> structure. The
 415 definition of this structure looks like this:
 416 </para>
 417
 418 <screen>
 419 typedef struct oident
 420 {
 421     oid_proto proto;
 422     oid_class oclass;
 423     oid_value value;
 424     int oidsuffix[OID_SIZE];
 425     char *desc;
 426 } oident;
 427 </screen>
 428
 429 <para>
 430 The proto field takes one of the values
 431 </para>
 432
 433 <screen>
 434 PROTO_Z3950
 435 PROTO_SR
 436 </screen>
 437
 438 <para>
 439 If you don't care about talking to SR-based implementations (few
 440 exist, and they may become fewer still if and when the ISO SR and ANSI
 441 Z39.50 documents are merged into a single standard), you can ignore
 442 this field on incoming packages, and always set it to PROTO_Z3950
 443 for outgoing packages.
 444 </para>
 445 <para>
 446
 447 The oclass field takes one of the values
 448 </para>
 449
 450 <screen>
 451 CLASS_APPCTX
 452 CLASS_ABSYN
 453 CLASS_ATTSET
 454 CLASS_TRANSYN
 455 CLASS_DIAGSET
 456 CLASS_RECSYN
 457 CLASS_RESFORM
 458 CLASS_ACCFORM
 459 CLASS_EXTSERV
 460 CLASS_USERINFO
 461 CLASS_ELEMSPEC
 462 CLASS_VARSET
 463 CLASS_SCHEMA
 464 CLASS_TAGSET
 465 CLASS_GENERAL
 466 </screen>
 467
 468 <para>
 469 corresponding to the OID classes defined by the Z39.50 standard.
 470
 471 Finally, the value field takes one of the values
 472 </para>
 473
 474 <screen>
 475 VAL_APDU
 476 VAL_BER
 477 VAL_BASIC_CTX
 478 VAL_BIB1
 479 VAL_EXP1
 480 VAL_EXT1
 481 VAL_CCL1
 482 VAL_GILS
 483 VAL_WAIS
 484 VAL_STAS
 485 VAL_DIAG1
 486 VAL_ISO2709
 487 VAL_UNIMARC
 488 VAL_INTERMARC
 489 VAL_CCF
 490 VAL_USMARC
 491 VAL_UKMARC
 492 VAL_NORMARC
 493 VAL_LIBRISMARC
 494 VAL_DANMARC
 495 VAL_FINMARC
 496 VAL_MAB
 497 VAL_CANMARC
 498 VAL_SBN
 499 VAL_PICAMARC
 500 VAL_AUSMARC
 501 VAL_IBERMARC
 502 VAL_EXPLAIN
 503 VAL_SUTRS
 504 VAL_OPAC
 505 VAL_SUMMARY
 506 VAL_GRS0
 507 VAL_GRS1
 508 VAL_EXTENDED
 509 VAL_RESOURCE1
 510 VAL_RESOURCE2
 511 VAL_PROMPT1
 512 VAL_DES1
 513 VAL_KRB1
 514 VAL_PRESSET
 515 VAL_PQUERY
 516 VAL_PCQUERY
 517 VAL_ITEMORDER
 518 VAL_DBUPDATE
 519 VAL_EXPORTSPEC
 520 VAL_EXPORTINV
 521 VAL_NONE
 522 VAL_SETM
 523 VAL_SETG
 524 VAL_VAR1
 525 VAL_ESPEC1
 526 </screen>
 527
 528 <para>
 529 again, corresponding to the specific OIDs defined by the standard.
 530 </para>
 531
 532 <para>
 533 The desc field contains a brief, mnemonic name for the OID in question.
 534 </para>
 535
 536 <para>
 537 The function
 538 </para>
 539
 540 <screen>
 541   struct oident *oid_getentbyoid(int *o);
 542 </screen>
 543
 544 <para>
 545 takes as argument an OID, and returns a pointer to a static area
 546 containing an <literal>oident</literal> structure. You typically use
 547 this function when you receive a PDU containing an OID, and you wish
 548 to branch out depending on the specific OID value.
 549 </para>
 550
 551 <para>
 552 The function
 553 </para>
 554
 555 <screen>
 556   int *oid_ent_to_oid(struct oident *ent, int *dst);
 557 </screen>
 558
 559 <para>
 560 takes as argument an <literal>oident</literal> structure, in which
 561 the <literal>proto</literal>, <literal>oclass</literal>, and
 562 <literal>value</literal> fields are assumed to be set correctly,
 563 and returns a pointer to the buffer given by <literal>dst</literal>,
 564 which then contains the base
 565 representation of the corresponding OID. The function returns
 566 NULL and leaves the array dst unchanged if no mapping could be made.
 567 The array <literal>dst</literal> should be at least of size
 568 <literal>OID_SIZE</literal>.
 569 </para>
 570 <para>
 571
 572 The <function>oid_ent_to_oid()</function> function can be used whenever
 573 you need to prepare a PDU containing one or more OIDs. The separation of
 574 the <literal>protocol</literal> element from the remainder of the
 575 OID-description makes it simple to write applications that can
 576 communicate with either Z39.50 or OSI SR-based applications.
 577 </para>
 578
 579 <para>
 580 The function
 581 </para>
 582
 583 <screen>
 584   oid_value oid_getvalbyname(const char *name);
 585 </screen>
 586
 587 <para>
 588 takes as argument a mnemonic OID name, and returns the
 589 <literal>value</literal> field of the first entry in the database that
 590 contains the given name in its <literal>desc</literal> field.
 591 </para>
 592
 593 <para>
 594 Finally, the module provides the following utility functions, whose
 595 meaning should be obvious:
 596 </para>
 597
 598 <screen>
 599   void oid_oidcpy(int *t, int *s);
 600   void oid_oidcat(int *t, int *s);
 601   int oid_oidcmp(int *o1, int *o2);
 602   int oid_oidlen(int *o);
 603 </screen>
 604
 605 <note>
 606 <para>
 607 The OID module has been criticized - and perhaps rightly so
 608 - for needlessly abstracting the
 609 representation of OIDs. Other toolkits use a simple
 610 string-representation of OIDs with good results. In practice, we have
 611 found the interface comfortable and quick to work with, and it is a
 612 simple matter (for what it's worth) to create applications compatible with
 613 both ISO SR and Z39.50. Finally, the use of the <literal>oident</literal>
 614 database is by no means mandatory. You can easily create your
 615 own system for representing OIDs, as long as it is compatible with the
 616 low-level integer-array representation of the ODR module.
 617 </para>
 618 </note>
 619
 620 </sect1>
 621
 622 <sect1><title>Nibble Memory</title>
 623
 624 <para>
 625 Sometimes when you need to allocate and construct a large,
 626 interconnected complex of structures, it can be a bit of a pain to
 627 release the associated memory again. For the structures describing the
 628 Z39.50 PDUs and related structures, it is convenient to use the
 629 memory-management system of the &odr; subsystem (see
 630 <link linkend="odr-use">Using ODR</link>). However, in some circumstances
 631 where you might otherwise benefit from using a simple nibble memory
 632 management system, it may be impractical to use
 633 <function>odr_malloc()</function> and <function>odr_reset()</function>.
 634 For this purpose, the memory manager which also supports the &odr; streams
 635 is made available in the NMEM module. The external interface to this module is given in the <filename>nmem.h</filename> file.
 636 </para>
 637
 638 <para>
 639 The following prototypes are given:
 640 </para>
 641
 642 <screen>
 643 NMEM nmem_create(void);
 644 void nmem_destroy(NMEM n);
 645 void *nmem_malloc(NMEM n, int size);
 646 void nmem_reset(NMEM n);
 647 int nmem_total(NMEM n);
 648 void nmem_init(void);
 649 </screen>
 650
 651 <para>
 652 The <function>nmem_create()</function> function returns a pointer to a
 653 memory control handle, which can be released again by
 654 <function>nmem_destroy()</function> when no longer needed.
 655 The function <function>nmem_malloc()</function> allocates a block of
 656 memory of the requested size. A call to <function>nmem_reset()</function> or
 657 <function>nmem_destroy()</function> will release all memory allocated on
 658 the handle since it was created (or since the last call to
 659 <function>nmem_reset()</function>). The function
 660 <function>nmem_total()</function> returns the number of bytes currently
 661 allocated on the handle.
 662 </para>
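<para>
The idea behind this style of memory management can be sketched as follows.
The code is not the NMEM implementation: the <literal>arena_</literal> names,
the chunk size and the alignment union are all assumptions made for the
example, and the thread safety mentioned in the note that follows is left
out. Small requests are carved out of larger chunks, and a reset frees every
chunk in one pass, so the caller never releases individual allocations.
</para>
<screen><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_CHUNK 4096              /* assumed default payload per chunk */

union arena_align { long l; double d; void *p; };

struct arena_chunk {
    struct arena_chunk *next;
    size_t used;                      /* payload bytes handed out so far */
    size_t capacity;                  /* payload bytes in this chunk */
    union arena_align data[];         /* payload, suitably aligned */
};

struct arena {
    struct arena_chunk *chunks;       /* newest chunk first */
    size_t total;                     /* bytes handed out, after rounding */
};

struct arena *arena_create(void)
{
    return calloc(1, sizeof(struct arena));
}

/* Allocate size bytes that stay valid until the next reset or destroy. */
void *arena_malloc(struct arena *a, size_t size)
{
    struct arena_chunk *c;
    void *p;

    assert(a != NULL);
    /* round up so that the next allocation stays aligned */
    size = (size + sizeof(union arena_align) - 1)
           / sizeof(union arena_align) * sizeof(union arena_align);
    c = a->chunks;
    if (!c || c->capacity - c->used < size) {
        size_t cap = size > ARENA_CHUNK ? size : ARENA_CHUNK;

        c = malloc(offsetof(struct arena_chunk, data) + cap);
        if (!c)
            return NULL;
        c->next = a->chunks;
        c->used = 0;
        c->capacity = cap;
        a->chunks = c;
    }
    p = (char *) c->data + c->used;
    c->used += size;
    a->total += size;
    return p;
}

/* Release everything allocated since creation or the last reset. */
void arena_reset(struct arena *a)
{
    assert(a != NULL);
    while (a->chunks) {
        struct arena_chunk *next = a->chunks->next;

        free(a->chunks);
        a->chunks = next;
    }
    a->total = 0;
}

void arena_destroy(struct arena *a)
{
    if (a) {
        arena_reset(a);
        free(a);
    }
}
```
]]></screen>
<para>
A caller would create one arena per unit of work, allocate freely with
<function>arena_malloc()</function> while building its structures, and call
<function>arena_reset()</function> or <function>arena_destroy()</function>
once when the work is done.
</para>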
 663
 664 <note>
 665 <para>
 666 The nibble memory pool is shared amongst threads. POSIX
 667 mutexes and WIN32 Critical Sections are used to keep the
 668 module thread safe. On WIN32, the function <function>nmem_init()</function>
 669 initialises the Critical Section handle and should be called once before any
 670 other nmem function is used.
 671 </para>
 672 </note>
 673
 674 </sect1>
 675 </chapter>