Supporting Tools
In support of the service API (primarily the ASN module, which
provides the programmatic interface to the Z39.50 APDUs), YAZ contains
a collection of tools that support the development of applications.
Query Syntax Parsers
Since the type-1 (RPN) query structure has no direct, useful string
representation, every origin application needs to provide some form of
mapping from a local query notation or representation to a
Z_RPNQuery structure. Some programmers will prefer to
construct the query manually, perhaps using
odr_malloc() to simplify memory management.
The YAZ distribution includes two separate query-generating tools
that may be of use to you.
Prefix Query Format
Since RPN or reverse polish notation is really just a fancy way of
describing a suffix notation format (operator follows operands), it
would seem that the confusion is total when we now introduce a prefix
notation for RPN. The reason is simple laziness: a prefix format is
somewhat easier to interpret, and this utility was designed for maximum
simplicity, to provide a baseline representation for use in simple test
applications and scripting environments (like Tcl). The demonstration
client included with YAZ uses the Prefix Query Format (PQF).
The PQF is defined by the pquery module in the YAZ library. The
pquery.h file declares the following
functions:
Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
Odr_oid **attributeSetP, const char *qbuf);
int p_query_attset (const char *arg);
The function p_query_rpn() takes as arguments an
ODR stream (see section The ODR Module)
to provide a memory source (the structure created is released on
the next call to odr_reset() on the stream), a
protocol identifier (either PROTO_Z3950 or
PROTO_SR), and finally a null-terminated string holding the
query.
If the parse went well, p_query_rpn() returns a
pointer to a Z_RPNQuery structure which can be
placed directly into a Z_SearchRequest.
The function p_query_attset() specifies which attribute set
to use if the query doesn't specify one with the
@attrset operator.
It returns 0 if the argument is a
valid attribute set specifier; otherwise it returns -1.
The grammar of the PQF is as follows:
Query ::= [ AttSet ] QueryStruct.
AttSet ::= '@attrset' string.
QueryStruct ::= { Attribute } Simple | Complex.
Attribute ::= '@attr' AttributeType '=' AttributeValue.
AttributeType ::= integer.
AttributeValue ::= integer.
Complex ::= Operator QueryStruct QueryStruct.
Operator ::= '@and' | '@or' | '@not' | '@prox' Proximity.
Simple ::= ResultSet | Term.
ResultSet ::= '@set' string.
Term ::= string | '"' string '"'.
Proximity ::= Exclusion Distance Ordered Relation WhichCode UnitCode.
Exclusion ::= '1' | '0' | 'void'.
Distance ::= integer.
Ordered ::= '1' | '0'.
Relation ::= integer.
WhichCode ::= 'known' | 'private' | integer.
UnitCode ::= integer.
You will note that the syntax above is a fairly faithful
representation of RPN, except for the Attribute, which has been
moved a step away from the term, allowing you to associate one or more
attributes with an entire query structure. The parser will
automatically apply the given attributes to each term as required.
The following are all examples of valid queries in the PQF.
dylan
"bob dylan"
@or "dylan" "zimmerman"
@set Result-1
@or @and bob dylan @set Result-1
@attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
@attr 4=1 @attr 1=4 "self portrait"
@prox 0 3 1 2 k 2 dylan zimmerman
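The claim that a prefix format is easy to interpret can be made concrete. The following is a minimal recursive-descent sketch, not the pquery module: it accepts only a subset of the grammar above (plain and quoted terms, @and, @or, @not and @attr), pushes each attribute down onto every term beneath it, and renders the result as a parenthesised infix string instead of building a Z_RPNQuery. The function pqf_to_infix, its output format and the MAX_ATTRS limit are hypothetical.

```c
#include <ctype.h>
#include <string.h>

#define MAX_ATTRS 8   /* hypothetical nesting limit for @attr */

struct pqf_parser {
    const char *p;     /* current position in the query string */
    char *out;         /* output buffer */
    size_t len, cap;   /* bytes used / capacity of out */
    int ok;            /* cleared on syntax error or overflow */
};

/* Append n bytes of s to the output, keeping it NUL-terminated. */
static void emit(struct pqf_parser *ps, const char *s, size_t n)
{
    if (ps->len + n + 1 > ps->cap) { ps->ok = 0; return; }
    memcpy(ps->out + ps->len, s, n);
    ps->len += n;
    ps->out[ps->len] = '\0';
}

/* Read the next token; *quoted tells whether it was a "quoted" string. */
static int next_token(struct pqf_parser *ps, char *tok, size_t cap, int *quoted)
{
    size_t n = 0;
    while (isspace((unsigned char) *ps->p)) ps->p++;
    *quoted = (*ps->p == '"');
    if (*ps->p == '\0') return 0;
    if (*quoted) {
        ps->p++;
        while (*ps->p && *ps->p != '"' && n + 1 < cap) tok[n++] = *ps->p++;
        if (*ps->p != '"') return 0;                  /* unterminated or too long */
        ps->p++;
    } else {
        while (*ps->p && !isspace((unsigned char) *ps->p) && n + 1 < cap)
            tok[n++] = *ps->p++;
        if (*ps->p && !isspace((unsigned char) *ps->p)) return 0;   /* too long */
    }
    tok[n] = '\0';
    return 1;
}

/* QueryStruct: attrs holds the "type=value" strings of enclosing @attr's. */
static void parse_struct(struct pqf_parser *ps, const char **attrs, int nattrs)
{
    char tok[128];
    int quoted, i;

    if (!ps->ok) return;
    if (!next_token(ps, tok, sizeof tok, &quoted)) { ps->ok = 0; return; }
    if (!quoted && strcmp(tok, "@attr") == 0) {
        char attr[128];   /* stays alive while the sub-structure is parsed */
        if (nattrs == MAX_ATTRS || !next_token(ps, attr, sizeof attr, &quoted)) {
            ps->ok = 0;
            return;
        }
        attrs[nattrs] = attr;
        parse_struct(ps, attrs, nattrs + 1);
    } else if (!quoted && (strcmp(tok, "@and") == 0 || strcmp(tok, "@or") == 0 ||
                           strcmp(tok, "@not") == 0)) {
        const char *op = tok[1] == 'a' ? " AND " : tok[1] == 'o' ? " OR " : " NOT ";
        emit(ps, "(", 1);
        parse_struct(ps, attrs, nattrs);              /* first operand */
        emit(ps, op, strlen(op));
        parse_struct(ps, attrs, nattrs);              /* second operand */
        emit(ps, ")", 1);
    } else if (!quoted && tok[0] == '@') {
        ps->ok = 0;       /* @set, @prox etc. are not handled by this sketch */
    } else {
        emit(ps, tok, strlen(tok));                   /* Term plus inherited attributes */
        for (i = 0; i < nattrs; i++) {
            emit(ps, i == 0 ? "[" : ",", 1);
            emit(ps, attrs[i], strlen(attrs[i]));
        }
        if (nattrs > 0) emit(ps, "]", 1);
    }
}

/* Returns 1 and writes an infix rendering to out, or 0 on error. */
int pqf_to_infix(const char *query, char *out, size_t cap)
{
    struct pqf_parser ps;
    const char *attrs[MAX_ATTRS];

    if (cap == 0) return 0;
    ps.p = query; ps.out = out; ps.len = 0; ps.cap = cap; ps.ok = 1;
    out[0] = '\0';
    parse_struct(&ps, attrs, 0);
    while (isspace((unsigned char) *ps.p)) ps.p++;
    if (*ps.p != '\0') ps.ok = 0;                     /* trailing tokens */
    return ps.ok;
}
```

Each operator simply consumes its two operands by recursion, so no precedence rules or lookahead are needed. One simplification: the attribute is read as a single word such as 1=4, which suffices for the examples above but is looser than the AttributeType '=' AttributeValue rule.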
Common Command Language
Not all users enjoy typing in prefix query structures and numerical
attribute values, even in a minimalistic test client. In the library
world, the more intuitive Common Command Language (or ISO 8777) has
enjoyed some popularity - especially before the widespread
availability of graphical interfaces. It is still useful in
applications where, for one reason or another, you need to provide a
symbolic language for expressing boolean query structures.
The EUROPAGATE research project working under the Libraries programme
of the European Commission's DG XIII has, amongst other useful tools,
implemented a general-purpose CCL parser which produces an output
structure that can be trivially converted to the internal RPN
representation of YAZ (the Z_RPNQuery structure).
Since the CCL utility - along with the rest of the software
produced by EUROPAGATE - is made freely available on a liberal license, it
is included as a supplement to YAZ.
CCL Syntax
The CCL parser obeys the following grammar for the FIND argument.
The syntax is annotated in the lines prefixed by --.
CCL-Find ::= CCL-Find Op Elements
| Elements.
Op ::= "and" | "or" | "not"
-- The above means that Elements are separated by boolean operators.
Elements ::= '(' CCL-Find ')'
| Set
| Terms
| Qualifiers Relation Terms
| Qualifiers Relation '(' CCL-Find ')'
| Qualifiers '=' string '-' string
-- Elements is either a recursive definition, a result set reference, a
-- list of terms, qualifiers followed by terms, qualifiers followed
-- by a recursive definition or qualifiers in a range (lower - upper).
Set ::= 'set' '=' string
-- Reference to a result set
Terms ::= Terms Prox Term
| Term
-- Proximity of terms.
Term ::= Term string
| string
-- This basically means that a term may include a blank
Qualifiers ::= Qualifiers ',' string
| string
-- Qualifiers is a list of strings separated by comma
Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
-- Relational operators. This really doesn't follow the ISO 8777
-- standard.
Prox ::= '%' | '!'
-- Proximity operator
The following queries are all valid:
dylan
"bob dylan"
dylan or zimmerman
set=1
(dylan and bob) or set=1
Assuming that the qualifiers ti, au
and date are defined, we may use:
ti=self portrait
au=(bob dylan and slow train coming)
date>1980 and (ti=((self portrait)))
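The grammar is built from a small token set: words, quoted strings, parentheses, commas, the relational operators and the proximity characters. A minimal tokenizer sketch for that set is given below; the names ccl_sketch_next, struct ccl_tok and the TOK_ kinds are hypothetical and do not describe the internals of the CCL parser distributed with YAZ. The boolean operators and set come out as ordinary words, leaving their recognition to the parser proper.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds for the CCL grammar above. */
enum ccl_tok_kind {
    TOK_WORD, TOK_QUOTED, TOK_LPAREN, TOK_RPAREN, TOK_COMMA,
    TOK_REL, TOK_PROX, TOK_EOF, TOK_BAD
};

struct ccl_tok {
    enum ccl_tok_kind kind;
    const char *start;   /* points into the query string */
    size_t len;
};

/* Scan one token at *pp and advance *pp past it. */
struct ccl_tok ccl_sketch_next(const char **pp)
{
    const char *p = *pp;
    struct ccl_tok t;

    while (isspace((unsigned char) *p)) p++;
    t.start = p;
    t.len = 1;
    if (*p == '\0') {
        t.kind = TOK_EOF;
        t.len = 0;
    } else if (*p == '"') {
        const char *q = strchr(p + 1, '"');
        if (q) {                       /* token text excludes the quotes */
            t.kind = TOK_QUOTED;
            t.start = p + 1;
            t.len = (size_t) (q - p - 1);
            *pp = q + 1;
            return t;
        }
        t.kind = TOK_BAD;              /* unterminated quote: swallow the rest */
        t.len = strlen(p);
    } else if (*p == '(') {
        t.kind = TOK_LPAREN;
    } else if (*p == ')') {
        t.kind = TOK_RPAREN;
    } else if (*p == ',') {
        t.kind = TOK_COMMA;
    } else if (*p == '%' || *p == '!') {
        t.kind = TOK_PROX;
    } else if (*p == '=' || *p == '<' || *p == '>') {
        t.kind = TOK_REL;              /* =, <, >, <=, >=, <> */
        if ((*p == '<' && (p[1] == '=' || p[1] == '>')) ||
            (*p == '>' && p[1] == '='))
            t.len = 2;
    } else {
        t.kind = TOK_WORD;             /* up to a blank or a special character */
        t.len = strcspn(p, " \t\r\n\f\v()=<>,%!\"");
    }
    *pp = t.start + t.len;
    return t;
}
```

Because the relational operators end a word, date>1980 splits into date, > and 1980 without requiring blanks around the operator.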
CCL Qualifiers
Qualifiers are used to direct the search to a particular searchable
index, such as title (ti) and author indexes (au). The CCL standard
itself doesn't specify a particular set of qualifiers, but it does
suggest a few short-hand notations. You can customize the CCL parser
to support a particular set of qualifiers to reflect the current target
profile. Traditionally, a qualifier would map to a particular
use-attribute within the BIB-1 attribute set. However, you could also
define qualifiers that would set, for example, the
structure-attribute.
Consider a scenario where the target supports ranked searches in the
title-index. In this case, the user could specify
ti,ranked=knuth computer
and ranked would map to structure=free-form-text
(4=105) and ti would map to title (1=4).
A "profile" with a set of predefined CCL qualifiers can be read from a
file. The YAZ client reads its CCL qualifiers from a file named
default.bib. Each line in the file has the form:
qualifier-name type=val type=val ...
where qualifier-name is the name of the
qualifier to be used (e.g. ti),
type is a BIB-1 attribute type and
val is the corresponding BIB-1 attribute
value.
The type can be numeric, or one of
u (use), r (relation),
p (position), s (structure),
t (truncation) or c (completeness).
The qualifier name term
has a special meaning:
the types and values for this definition are used when
no qualifiers are present.
Consider the following definition:
ti u=4 s=1
au u=1 s=1
term s=105
Two qualifiers are defined, ti and
au.
They both set the structure-attribute to phrase (1).
ti sets the use-attribute to 4, and au sets the
use-attribute to 1.
When no qualifiers are used in the query the structure-attribute is
set to free-form-text (105).
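A sketch of reading one such definition line with the standard C library follows. It stores the qualifier name and a list of numeric (type, value) pairs, translating the letters u, r, p, s, t and c into BIB-1 attribute types 1 to 6. The struct qual_def type, parse_qual_line and the fixed buffer sizes are hypothetical; this is not how ccl_qual_file() is implemented, and comment or blank lines are not handled.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define QUAL_MAX_ATTRS 8

/* Hypothetical in-memory form of one profile line. */
struct qual_def {
    char name[32];
    int nattrs;
    int type[QUAL_MAX_ATTRS];    /* BIB-1 attribute type, e.g. 1 = use */
    int value[QUAL_MAX_ATTRS];   /* attribute value, e.g. 4 = title */
};

/* u, r, p, s, t, c map to BIB-1 attribute types 1..6; 0 if unknown. */
static int type_from_letter(char c)
{
    static const char letters[] = "urpstc";
    const char *hit = c ? strchr(letters, c) : NULL;
    return hit ? (int) (hit - letters) + 1 : 0;
}

/* Parse "name type=val type=val ..."; returns 0 on success, -1 on error. */
int parse_qual_line(const char *line, struct qual_def *def)
{
    char buf[256];
    char *tok;

    if (strlen(line) >= sizeof buf) return -1;
    strcpy(buf, line);                       /* strtok modifies its input */
    tok = strtok(buf, " \t\r\n");
    if (!tok || strlen(tok) >= sizeof def->name) return -1;
    strcpy(def->name, tok);
    def->nattrs = 0;
    while ((tok = strtok(NULL, " \t\r\n")) != NULL) {
        char *eq = strchr(tok, '=');
        char *end;
        long type, val;

        if (!eq || eq == tok || def->nattrs == QUAL_MAX_ATTRS) return -1;
        *eq = '\0';
        if (isdigit((unsigned char) tok[0])) {
            type = strtol(tok, &end, 10);    /* numeric type, e.g. 1=4 */
            if (*end != '\0') return -1;
        } else {
            type = tok[1] == '\0' ? type_from_letter(tok[0]) : 0;
        }
        val = strtol(eq + 1, &end, 10);
        if (type <= 0 || end == eq + 1 || *end != '\0') return -1;
        def->type[def->nattrs] = (int) type;
        def->value[def->nattrs] = (int) val;
        def->nattrs++;
    }
    return 0;
}
```

With the definition above, the line ti u=4 s=1 yields the pairs (1, 4) and (4, 1), i.e. use=title and structure=phrase.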
CCL API
All public definitions can be found in the header file
ccl.h. A profile identifier is of type
CCL_bibset. A profile must be created by calling the
function ccl_qual_mk, which returns a profile
handle of type CCL_bibset.
To read a file containing qualifier definitions the function
ccl_qual_file may be convenient. This function
takes an already opened FILE handle pointer as
argument along with a CCL_bibset handle.
To parse a simple string with a FIND query use the function
struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
int *error, int *pos);
which takes the CCL profile (bibset) and query
(str) as input. Upon successful completion the RPN
tree is returned. If an error occurs, such as a syntax error, the integer
pointed to by error holds the error code and the integer pointed to by
pos holds the offset within the query string at which
parsing failed.
An English description of the error may be obtained by calling
the ccl_err_msg function. The error codes are
listed in ccl.h.
To convert the CCL RPN tree (type
struct ccl_rpn_node *)
to the Z_RPNQuery of YAZ, use the function ccl_rpn_query.
This function, which is part of YAZ, is implemented in
yaz-ccl.c.
After calling this function, the CCL RPN tree is probably no longer
needed. The function ccl_rpn_delete destroys the CCL RPN tree.
A CCL profile may be destroyed by calling the
ccl_qual_rm function.
The token names for the CCL operators may be changed by setting the
globals (all of type char *)
ccl_token_and, ccl_token_or,
ccl_token_not and ccl_token_set.
An operator may have aliases, i.e. there may be more than one name for
the operator. To do this, separate each alias with a space character.
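Matching a query word against a space-separated alias list reduces to a word-by-word comparison, as in the sketch below. The helper token_matches_alias and the alias string "and &" used in its comment are hypothetical; the comparison is case-sensitive, which is a choice of this sketch rather than a statement about the CCL parser.

```c
#include <string.h>

/* Return 1 if word (len bytes, not necessarily NUL-terminated) equals one
   of the space-separated aliases in names, e.g. names = "and &". */
int token_matches_alias(const char *names, const char *word, size_t len)
{
    while (*names) {
        size_t n;
        while (*names == ' ') names++;         /* skip separators */
        n = strcspn(names, " ");              /* length of this alias */
        if (n > 0 && n == len && memcmp(names, word, len) == 0)
            return 1;
        names += n;
    }
    return 0;
}
```

Taking a length instead of a NUL-terminated word lets a parser test a token in place, without copying it out of the query string.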
Object Identifiers
The basic YAZ representation of an OID is an array of integers,
terminated with the value -1. The ODR module provides two
utility functions to create and copy data elements of this type:
Odr_oid *odr_getoidbystr(ODR o, char *str);
Creates an OID based on a string-based representation using dots (.)
to separate elements in the OID.
Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
Creates a copy of the OID referenced by the o
parameter.
Both functions take an ODR stream as parameter. This stream is used to
allocate memory for the data elements, which is released on a
subsequent call to odr_reset() on that stream.
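To illustrate the -1-terminated representation, the following sketch converts a dotted string into such an array and back. Unlike odr_getoidbystr(), it writes into a caller-supplied buffer instead of allocating from an ODR stream; the names oid_from_dotted and oid_to_dotted are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a dotted OID such as "1.2.840.10003.3.1" into dst, appending the
   -1 terminator.  dst has room for max ints.  Returns the number of
   components, or -1 if the string is malformed or dst is too small.
   (Overflow of an individual component is not checked.) */
int oid_from_dotted(const char *str, int *dst, int max)
{
    int n = 0;
    for (;;) {
        char *end;
        if (!isdigit((unsigned char) *str) || n + 1 >= max)
            return -1;
        dst[n++] = (int) strtol(str, &end, 10);
        if (*end == '\0')
            break;
        if (*end != '.')
            return -1;
        str = end + 1;
    }
    dst[n] = -1;
    return n;
}

/* Render a -1-terminated OID in dotted form; the output is cut short
   if buf (cap bytes, cap > 0) is too small. */
void oid_to_dotted(const int *oid, char *buf, size_t cap)
{
    size_t used = 0;
    buf[0] = '\0';
    for (; *oid != -1; oid++) {
        int w = snprintf(buf + used, cap - used, used ? ".%d" : "%d", *oid);
        if (w < 0 || (size_t) w >= cap - used)
            break;
        used += (size_t) w;
    }
}
```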
The OID module provides a higher-level representation of the
family of object identifiers which describe the Z39.50 protocol and its
related objects. The definition of the module interface is given in
the oid.h file.
The interface is mainly based on the oident structure.
The definition of this structure looks like this:
typedef struct oident
{
oid_proto proto;
oid_class oclass;
oid_value value;
int oidsuffix[OID_SIZE];
char *desc;
} oident;
The proto field takes one of the values
PROTO_Z3950
PROTO_SR
If you don't care about talking to SR-based implementations (few
exist, and they may become fewer still if and when the ISO SR and ANSI
Z39.50 documents are merged into a single standard), you can ignore
this field on incoming packages, and always set it to PROTO_Z3950
for outgoing packages.
The oclass field takes one of the values
CLASS_APPCTX
CLASS_ABSYN
CLASS_ATTSET
CLASS_TRANSYN
CLASS_DIAGSET
CLASS_RECSYN
CLASS_RESFORM
CLASS_ACCFORM
CLASS_EXTSERV
CLASS_USERINFO
CLASS_ELEMSPEC
CLASS_VARSET
CLASS_SCHEMA
CLASS_TAGSET
CLASS_GENERAL
corresponding to the OID classes defined by the Z39.50 standard.
Finally, the value field takes one of the values
VAL_APDU
VAL_BER
VAL_BASIC_CTX
VAL_BIB1
VAL_EXP1
VAL_EXT1
VAL_CCL1
VAL_GILS
VAL_WAIS
VAL_STAS
VAL_DIAG1
VAL_ISO2709
VAL_UNIMARC
VAL_INTERMARC
VAL_CCF
VAL_USMARC
VAL_UKMARC
VAL_NORMARC
VAL_LIBRISMARC
VAL_DANMARC
VAL_FINMARC
VAL_MAB
VAL_CANMARC
VAL_SBN
VAL_PICAMARC
VAL_AUSMARC
VAL_IBERMARC
VAL_EXPLAIN
VAL_SUTRS
VAL_OPAC
VAL_SUMMARY
VAL_GRS0
VAL_GRS1
VAL_EXTENDED
VAL_RESOURCE1
VAL_RESOURCE2
VAL_PROMPT1
VAL_DES1
VAL_KRB1
VAL_PRESSET
VAL_PQUERY
VAL_PCQUERY
VAL_ITEMORDER
VAL_DBUPDATE
VAL_EXPORTSPEC
VAL_EXPORTINV
VAL_NONE
VAL_SETM
VAL_SETG
VAL_VAR1
VAL_ESPEC1
again, corresponding to the specific OIDs defined by the standard.
The desc field contains a brief, mnemonic name for the OID in question.
The function
struct oident *oid_getentbyoid(int *o);
takes as argument an OID, and returns a pointer to a static area
containing an oident structure. You typically use
this function when you receive a PDU containing an OID, and you wish
to branch out depending on the specific OID value.
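A lookup in the spirit of oid_getentbyoid() can be sketched as a linear search over a static table, comparing the -1-terminated arrays element by element. The table below is a hypothetical three-entry stand-in for the YAZ OID database: it stores full OIDs and a desc string only, whereas real oident entries carry proto, oclass and value as well. The OIDs themselves are the registered Z39.50 ones for the BIB-1 attribute set and the USMARC and SUTRS record syntaxes.

```c
#include <stddef.h>

/* Hypothetical miniature OID database (not the YAZ table). */
struct oid_entry {
    int oid[8];          /* full OID, -1-terminated */
    const char *desc;    /* brief mnemonic name */
};

static const struct oid_entry oid_db[] = {
    { {1, 2, 840, 10003, 3, 1, -1},   "Bib-1"  },   /* attribute set */
    { {1, 2, 840, 10003, 5, 10, -1},  "USMARC" },   /* record syntax */
    { {1, 2, 840, 10003, 5, 101, -1}, "SUTRS"  },   /* record syntax */
};

/* Nonzero if a and b agree up to and including the -1 terminator. */
static int oid_equal(const int *a, const int *b)
{
    while (*a == *b && *a != -1) {
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the entry for oid, or NULL if it is not in the table. */
const struct oid_entry *lookup_by_oid(const int *oid)
{
    size_t i;
    for (i = 0; i < sizeof oid_db / sizeof oid_db[0]; i++)
        if (oid_equal(oid_db[i].oid, oid))
            return &oid_db[i];
    return NULL;
}
```

A PDU handler would then branch on the returned entry, treating NULL as an unknown OID.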
The function
int *oid_ent_to_oid(struct oident *ent, int *dst);
takes as argument an oident structure (in which
the proto, oclass, and
value fields are assumed to be set correctly)
and returns a pointer to the buffer given by dst,
which holds the base
representation of the corresponding OID. If no mapping can be found,
the function returns NULL and leaves the array dst unchanged.
The array dst should be at least of size
OID_SIZE.
The oid_ent_to_oid() function can be used whenever
you need to prepare a PDU containing one or more OIDs. The separation of
the protocol element from the remainder of the
OID-description makes it simple to write applications that can
communicate with either Z39.50 or OSI SR-based applications.
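The benefit of keeping the protocol apart from the rest of the description can be shown with a sketch that stores only the class-and-value suffix (3.1 for BIB-1, say) and prepends a protocol-specific prefix when the OID is built. The enum, sketch_ent_to_oid and SKETCH_OID_SIZE are hypothetical stand-ins for oid_proto, oid_ent_to_oid() and OID_SIZE. The Z39.50 prefix 1.2.840.10003 is the standard one; the SR prefix 1.0.10163 is an assumption based on ISO 10163 being the SR protocol standard.

```c
#include <stddef.h>

#define SKETCH_OID_SIZE 20

/* Hypothetical protocol selector, standing in for oid_proto. */
enum sketch_proto { SKETCH_Z3950, SKETCH_SR };

static const int z3950_prefix[] = {1, 2, 840, 10003, -1};
static const int sr_prefix[] = {1, 0, 10163, -1};   /* assumed SR arc */

/* Build prefix + suffix into dst (SKETCH_OID_SIZE ints).  Returns dst,
   or NULL with dst untouched if the result would not fit. */
int *sketch_ent_to_oid(enum sketch_proto proto, const int *suffix, int *dst)
{
    const int *prefix = proto == SKETCH_Z3950 ? z3950_prefix : sr_prefix;
    int np = 0, ns = 0, i;

    while (prefix[np] != -1) np++;
    while (suffix[ns] != -1) ns++;
    if (np + ns + 1 > SKETCH_OID_SIZE)
        return NULL;
    for (i = 0; i < np; i++) dst[i] = prefix[i];
    for (i = 0; i < ns; i++) dst[np + i] = suffix[i];
    dst[np + ns] = -1;
    return dst;
}
```

The same suffix table then serves both protocols; only the one-line prefix choice differs.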
The function
oid_value oid_getvalbyname(const char *name);
takes as argument a mnemonic OID name, and returns the
value field of the first entry in the database that
contains the given name in its desc field.
Finally, the module provides the following utility functions, whose
meaning should be obvious:
void oid_oidcpy(int *t, int *s);
void oid_oidcat(int *t, int *s);
int oid_oidcmp(int *o1, int *o2);
int oid_oidlen(int *o);
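One plausible reading of these four helpers, written against the -1-terminated representation, is sketched below. The sketch_ names are hypothetical and this is not the YAZ source; in particular, the ordering returned by the comparison (element by element, with a proper prefix sorting first) is an assumption, and callers should rely only on 0 meaning equal.

```c
/* Number of components before the -1 terminator. */
int sketch_oidlen(const int *o)
{
    int n = 0;
    while (o[n] != -1) n++;
    return n;
}

/* Copy s, including its terminator, into t. */
void sketch_oidcpy(int *t, const int *s)
{
    while ((*t++ = *s++) != -1)
        ;
}

/* Append s to the end of t; t must have room for both. */
void sketch_oidcat(int *t, const int *s)
{
    sketch_oidcpy(t + sketch_oidlen(t), s);
}

/* 0 if equal; otherwise -1 or 1 from the first differing element.
   Since -1 is smaller than any component, a proper prefix sorts first. */
int sketch_oidcmp(const int *o1, const int *o2)
{
    while (*o1 == *o2 && *o1 != -1) {
        o1++;
        o2++;
    }
    if (*o1 == *o2) return 0;
    return *o1 < *o2 ? -1 : 1;
}
```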
The OID module has been criticized - and perhaps rightly so
- for needlessly abstracting the
representation of OIDs. Other toolkits use a simple
string-representation of OIDs with good results. In practice, we have
found the interface comfortable and quick to work with, and it is a
simple matter (for what it's worth) to create applications compatible
with both ISO SR and Z39.50. Finally, the use of the
oident database is by no means mandatory.
You can easily create your own system for representing OIDs, as long
as it is compatible with the low-level integer-array representation
of the ODR module.
Nibble Memory
Sometimes when you need to allocate and construct a large,
interconnected complex of structures, it can be a bit of a pain to
release the associated memory again. For the structures describing the
Z39.50 PDUs and related structures, it is convenient to use the
memory-management system of the ODR subsystem (see
Using ODR). However, in some circumstances
where you might otherwise benefit from using a simple nibble memory
management system, it may be impractical to use
odr_malloc() and odr_reset().
For this purpose, the memory manager which also supports the ODR
streams is made available in the NMEM module. The external interface
to this module is given in the nmem.h file.
The following prototypes are given:
NMEM nmem_create(void);
void nmem_destroy(NMEM n);
void *nmem_malloc(NMEM n, int size);
void nmem_reset(NMEM n);
int nmem_total(NMEM n);
void nmem_init(void);
The nmem_create() function returns a pointer to a
memory control handle, which can be released again by
nmem_destroy() when no longer needed.
The function nmem_malloc() allocates a block of
memory of the requested size. A call to nmem_reset()
or nmem_destroy() will release all memory allocated
on the handle since it was created (or since the last call to
nmem_reset()). The function
nmem_total() returns the number of bytes currently
allocated on the handle.
The nibble memory pool is shared amongst threads. POSIX
mutexes and WIN32 critical sections are used to keep the
module thread safe. On WIN32, the function nmem_init()
initialises the critical section handle and should be called once
before any other nmem function is used.
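The allocate-freely, release-once pattern described in this section can be illustrated with a small arena allocator built on malloc(). It is a sketch, not the NMEM implementation: the arena_* names are hypothetical, the 4096-byte block size is arbitrary, arena_total() counts requested rather than reserved bytes, and there is no locking, so unlike NMEM a handle must not be used from several threads at once. It needs C11 for max_align_t.

```c
#include <stddef.h>
#include <stdlib.h>

#define ARENA_BLOCK 4096                 /* default block size (arbitrary) */
#define ARENA_ALIGN sizeof(max_align_t)  /* slice granularity keeps alignment */

struct arena_block {
    struct arena_block *next;
    size_t size, used;                   /* payload capacity and bytes in use */
    max_align_t data[];                  /* payload, aligned for any type */
};

typedef struct arena {
    struct arena_block *blocks;          /* newest block first */
    size_t total;                        /* bytes requested since create/reset */
} arena;

arena *arena_create(void)
{
    return calloc(1, sizeof(arena));
}

void *arena_malloc(arena *a, size_t size)
{
    struct arena_block *b = a->blocks;
    size_t need = (size + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    void *p;

    if (need == 0)
        need = ARENA_ALIGN;              /* give size 0 a distinct pointer */
    if (!b || b->size - b->used < need) {
        /* start a new block; any space left in the old one is wasted */
        size_t cap = need > ARENA_BLOCK ? need : ARENA_BLOCK;
        b = malloc(sizeof *b + cap);
        if (!b)
            return NULL;
        b->next = a->blocks;
        b->size = cap;
        b->used = 0;
        a->blocks = b;
    }
    p = (char *) b->data + b->used;
    b->used += need;
    a->total += size;
    return p;
}

/* Free every block at once; the handle itself remains usable. */
void arena_reset(arena *a)
{
    while (a->blocks) {
        struct arena_block *next = a->blocks->next;
        free(a->blocks);
        a->blocks = next;
    }
    a->total = 0;
}

void arena_destroy(arena *a)
{
    if (a) {
        arena_reset(a);
        free(a);
    }
}

size_t arena_total(const arena *a)
{
    return a->total;
}
```

Individual allocations are never freed; the cost of building a large interconnected structure is one walk over the block list at reset time, which is the property that makes this style convenient for PDU-like data.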