Supporting Tools
In support of the service API - primarily the ASN module, which
provides the programmatic interface to the Z39.50 APDUs - &yaz; contains
a collection of tools that support the development of applications.
Query Syntax Parsers
Since the type-1 (RPN) query structure has no direct, useful string
representation, every origin application needs to provide some form of
mapping from a local query notation or representation to a
Z_RPNQuery structure. Some programmers will prefer to
construct the query manually, perhaps using
odr_malloc() to simplify memory management.
The &yaz; distribution includes three separate query-generating tools
that may be of use to you.
Prefix Query Format
Since RPN or reverse polish notation is really just a fancy way of
describing a suffix notation format (operator follows operands), it
would seem that the confusion is total when we now introduce a prefix
notation for RPN. The reason is one of simple laziness - it's somewhat
simpler to interpret a prefix format, and this utility was designed
for maximum simplicity, to provide a baseline representation for use
in simple test applications and scripting environments (like Tcl). The
demonstration client included with YAZ uses the PQF.
The PQF has been adopted by other parties developing Z39.50
software. It is often referred to as Prefix Query Notation
- PQN.
The PQF is defined by the pquery module in the YAZ library.
There are two sets of functions with similar behavior. The first
set operates on a PQF parser handle; the second does not. The first
set is more flexible than the second; the second set
is obsolete and is only provided to ensure backwards compatibility.
The first set of functions all operate on a PQF parser handle:
#include <yaz/pquery.h>
YAZ_PQF_Parser yaz_pqf_create (void);
void yaz_pqf_destroy (YAZ_PQF_Parser p);
Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf);
Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o,
Odr_oid **attributeSetId, const char *qbuf);
int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off);
A PQF parser is created and destroyed by the functions
yaz_pqf_create and
yaz_pqf_destroy respectively.
Function yaz_pqf_parse parses the query given
by the string qbuf. If parsing was successful,
a Z39.50 RPN Query is returned, created using ODR stream
o. If parsing failed, a NULL pointer is
returned.
Function yaz_pqf_scan takes a scan query in
qbuf. If parsing was successful, the function
returns an attributes-plus-term pointer and modifies
attributeSetId to hold the attribute set for the
scan request - both allocated using ODR stream o.
If parsing failed, yaz_pqf_scan returns a NULL pointer.
Error information for bad queries can be obtained by a call to
yaz_pqf_error, which returns an error code and
modifies *msg to point to an error description,
and modifies *off to the offset within the
query where parsing failed.
The second set of functions are declared as follows:
#include <yaz/pquery.h>
Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
Odr_oid **attributeSetP, const char *qbuf);
int p_query_attset (const char *arg);
The function p_query_rpn() takes as arguments an
&odr; stream (see section The ODR Module)
to provide a memory source (the structure created is released on
the next call to odr_reset() on the stream), a
protocol identifier (one of the constants PROTO_Z3950 and
PROTO_SR), an attribute set reference, and
finally a null-terminated string holding the query string.
If the parse went well, p_query_rpn() returns a
pointer to a Z_RPNQuery structure which can be
placed directly into a Z_SearchRequest.
If parsing failed due to a syntax error, a NULL pointer is returned.
The function p_query_attset specifies which attribute set
to use if the query doesn't specify one via the
@attrset operator.
It returns 0 if the argument is a
valid attribute set specifier; otherwise it returns -1.
The grammar of the PQF is as follows:
query ::= top-set query-struct.
top-set ::= [ '@attrset' string ]
query-struct ::= attr-spec | simple | complex | '@term' term-type query
attr-spec ::= '@attr' [ string ] string query-struct
complex ::= operator query-struct query-struct.
operator ::= '@and' | '@or' | '@not' | '@prox' proximity.
simple ::= result-set | term.
result-set ::= '@set' string.
term ::= string.
proximity ::= exclusion distance ordered relation which-code unit-code.
exclusion ::= '1' | '0' | 'void'.
distance ::= integer.
ordered ::= '1' | '0'.
relation ::= integer.
which-code ::= 'known' | 'private' | integer.
unit-code ::= integer.
term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'.
You will note that the syntax above is a fairly faithful
representation of RPN, except for the Attribute, which has been
moved a step away from the term, allowing you to associate one or more
attributes with an entire query structure. The parser will
automatically apply the given attributes to each term as required.
The @attr operator is followed by an attribute specification
(attr-spec above). The specification consists
of an optional attribute set, an attribute type-value pair and
a sub-query. The attribute type-value pair is packed in one string:
an attribute type, an equals sign, and an attribute value, like this:
@attr 1=1003.
The type is always an integer but the value may be either an
integer or a string (if it doesn't start with a digit character).
A string attribute-value is encoded as a Type-1 ``complex''
attribute with the list of values containing the single string
specified, and including no semantic indicators.
Version 3 of the Z39.50 specification defines various encodings of terms.
Use @term type string,
where type is one of: general,
numeric or string
(for InternationalString).
If no term type has been given, the general form
is used. This is the only encoding allowed in both versions 2 and 3
of the Z39.50 standard.
Using Proximity Operators with PQF
This is an advanced topic, describing how to construct
queries that make very specific requirements on the
relative location of their operands.
You may wish to skip this section and go straight to
the example PQF queries.
Most Z39.50 servers do not support proximity searching, or
support only a small subset of the full functionality that
can be expressed using the PQF proximity operator. Be
aware that the ability to express a
query in PQF is no guarantee that any given server will
be able to execute it.
The proximity operator @prox is a special
and more restrictive version of the conjunction operator
@and. Its semantics are described in
section 3.7.2 (Proximity) of the Z39.50 standard itself, which
can be read on-line.
In PQF, the proximity operation is represented by a sequence
of the form
@prox exclusion distance ordered relation which-code unit-code
in which the meanings of the parameters are as described in
the standard, and they can take the following values:
exclusion
0 = false (i.e. the proximity condition specified by the
remaining parameters must be satisfied) or
1 = true (the proximity condition specified by the
remaining parameters must not be
satisfied).
distance
An integer specifying the difference between the locations
of the operands: e.g. two adjacent words would have
distance=1 since their locations differ by one unit.
ordered
1 = ordered (the operands must occur in the order the
query specifies them) or
0 = unordered (they may appear in either order).
relation
Recognised values are
1 (lessThan),
2 (lessThanOrEqual),
3 (equal),
4 (greaterThanOrEqual),
5 (greaterThan) and
6 (notEqual).
which-code
known or k
(the unit-code parameter is taken from the well-known list
of alternatives described below) or
private or p
(the unit-code parameter has semantics specific to an
out-of-band agreement such as a profile).
unit-code
If the which-code parameter is known
then the recognised values are
1 (character),
2 (word),
3 (sentence),
4 (paragraph),
5 (section),
6 (chapter),
7 (document),
8 (element),
9 (subelement),
10 (elementType) and
11 (byte).
If which-code is private then the
acceptable values are determined by the profile.
(The numeric values of the relation and well-known unit-code
parameters are taken straight from
the ASN.1 of the proximity structure in the standard.)
PQF queries
PQF queries using simple terms
dylan
"bob dylan"
PQF boolean operators
@or "dylan" "zimmerman"
@and @or dylan zimmerman when
@and when @or dylan zimmerman
PQF references to result sets
@set Result-1
@and @set seta @set setb
Attributes for terms
@attr 1=4 computer
@attr 1=4 @attr 4=1 "self portrait"
@attrset exp1 @attr 1=1 CategoryList
@attr gils 1=2008 Copenhagen
@attr 1=/book/title computer
PQF Proximity queries
@prox 0 3 1 2 k 2 dylan zimmerman
Here the parameters 0, 3, 1, 2, k and 2 represent exclusion,
distance, ordered, relation, which-code and unit-code, in that
order. So:
exclusion = 0: the proximity condition must hold
distance = 3: the terms must be three units apart
ordered = 1: they must occur in the order they are specified
relation = 2: lessThanOrEqual (to the distance of 3 units)
which-code is ``known'', so the standard unit-codes are used
unit-code = 2: word.
So the whole proximity query means that the words
dylan and zimmerman must
both occur in the record, in that order, differing in position
by three or fewer words (i.e. with two or fewer words between
them). The query would find ``Bob Dylan, aka. Robert
Zimmerman'', but not ``Bob Dylan, born as Robert Zimmerman'',
since the distance in this case is four.
PQF specification of search term
@term string "a UTF-8 string, maybe?"
PQF mixed queries
@or @and bob dylan @set Result-1
@attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
@and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
The last of these examples is a spatial search: in
the GILS attribute set,
access point
2038 indicates West Bounding Coordinate and
2039 indicates East Bounding Coordinate,
so the query is for areas extending from -114 degrees
to no more than -109 degrees.
CCL
Not all users enjoy typing in prefix query structures and numerical
attribute values, even in a minimalistic test client. In the library
world, the more intuitive Common Command Language - CCL (ISO 8777)
has enjoyed some popularity - especially before the widespread
availability of graphical interfaces. It is still useful in
applications where you for some reason or other need to provide a
symbolic language for expressing boolean query structures.
The EUROPAGATE research project working under the Libraries programme
of the European Commission's DG XIII has, amongst other useful tools,
implemented a general-purpose CCL parser which produces an output
structure that can be trivially converted to the internal RPN
representation of &yaz; (The Z_RPNQuery structure).
Since the CCL utility - along with the rest of the software
produced by EUROPAGATE - is made freely available under a liberal
license, it is included as a supplement to &yaz;.
CCL Syntax
The CCL parser obeys the following grammar for the FIND argument.
The syntax is annotated in the lines prefixed by
--.
CCL-Find ::= CCL-Find Op Elements
| Elements.
Op ::= "and" | "or" | "not"
-- The above means that Elements are separated by boolean operators.
Elements ::= '(' CCL-Find ')'
| Set
| Terms
| Qualifiers Relation Terms
| Qualifiers Relation '(' CCL-Find ')'
| Qualifiers '=' string '-' string
-- Elements is either a recursive definition, a result set reference, a
-- list of terms, qualifiers followed by terms, qualifiers followed
-- by a recursive definition or qualifiers in a range (lower - upper).
Set ::= 'set' = string
-- Reference to a result set
Terms ::= Terms Prox Term
| Term
-- Proximity of terms.
Term ::= Term string
| string
-- This basically means that a term may include a blank
Qualifiers ::= Qualifiers ',' string
| string
-- Qualifiers is a list of strings separated by comma
Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
-- Relational operators. This really doesn't follow the ISO8777
-- standard.
Prox ::= '%' | '!'
-- Proximity operator
CCL queries
The following queries are all valid:
dylan
"bob dylan"
dylan or zimmerman
set=1
(dylan and bob) or set=1
Assuming that the qualifiers ti,
au
and date are defined we may use:
ti=self portrait
au=(bob dylan and slow train coming)
date>1980 and (ti=((self portrait)))
CCL Qualifiers
Qualifiers are used to direct the search to a particular searchable
index, such as title (ti) and author indexes (au). The CCL standard
itself doesn't specify a particular set of qualifiers, but it does
suggest a few short-hand notations. You can customize the CCL parser
to support a particular set of qualifiers to reflect the current target
profile. Traditionally, a qualifier would map to a particular
use-attribute within the BIB-1 attribute set. It is also
possible to set other attributes, such as the structure
attribute.
A CCL profile is a set of predefined CCL qualifiers that may be
read from a file or set in the CCL API.
The YAZ client reads its CCL qualifiers from a file named
default.bib. There are four types of
lines in a CCL profile: qualifier specification,
qualifier alias, comments and directives.
Qualifier specification
A qualifier specification is of the form:
qualifier-name
[attributeset,]type=val
[attributeset,]type=val ...
where qualifier-name is the name of the
qualifier to be used (e.g. ti),
type is the attribute type in the attribute
set (Bib-1 is used if no attribute set is given) and
val is the attribute value.
The type can be specified as an
integer or as a single letter:
u for use,
r for relation, p for position,
s for structure, t for truncation
or c for completeness.
The attributes for the special qualifier name term
are used when no CCL qualifier is given in a query.
Common Bib-1 attributes
u=value
Use attribute (1). Common use attributes are
1 Personal-name, 4 Title, 7 ISBN, 8 ISSN, 30 Date,
62 Subject, 1003 Author, 1016 Any. Specify value
as an integer.
r=value
Relation attribute (2). Common values are
1 <, 2 <=, 3 =, 4 >=, 5 >, 6 <>,
100 phonetic, 101 stem, 102 relevance, 103 always matches.
p=value
Position attribute (3). Values: 1 first in field, 2
first in any subfield, 3 any position in field.
s=value
Structure attribute (4). Values: 1 phrase, 2 word,
3 key, 4 year, 5 date, 6 word list, 100 date (un),
101 name (norm), 102 name (un), 103 structure, 104 urx,
105 free-form-text, 106 document-text, 107 local-number,
108 string, 109 numeric string.
t=value
Truncation attribute (5). Values: 1 right, 2 left,
3 left & right, 100 none, 101 process #, 102 regular-1,
103 regular-2, 104 CCL.
c=value
Completeness attribute (6). Values: 1 incomplete subfield,
2 complete subfield, 3 complete field.
Refer to the complete
list of Bib-1 attributes
It is also possible to specify non-numeric attribute values,
which are used in combination with certain types.
The special combinations are:
Special attribute combos
s=pw
The structure is set to either word or phrase depending
on the number of tokens in a term (phrase-word).
s=al
Each token in the term is ANDed. (and-list).
This does not set the structure at all.
s=ol
Each token in the term is ORed. (or-list).
This does not set the structure at all.
r=o
Allows ranges and the operators greater-than, less-than, ...
equals.
This sets the Bib-1 relation attribute accordingly (relation
ordered). A query construct is only treated as a range if
a dash is used and it is surrounded by white space. So
-1980 is treated as the term
"-1980", not <= 1980.
If - 1980 is used, however, that is
treated as a range.
r=r
Similar to r=o but assumes that terms
are non-negative (not prefixed with -).
Thus, a dash will always be treated as a range.
The construct 1980-1990 is
treated as a range with r=r but as a
single term "1980-1990" with
r=o. The special attribute
r=r is available in YAZ 2.0.24 or later.
t=l
Allows term to be left-truncated.
If term is of the form ?x, the resulting
Type-1 term is x and truncation is left.
t=r
Allows term to be right-truncated.
If term is of the form x?, the resulting
Type-1 term is x and truncation is right.
t=n
If the term does not include ?, the
truncation attribute is set to none (100).
t=b
Allows term to be both left- and right-truncated.
If term is of the form ?x?, the
resulting term is x and truncation is
set to both left and right.
CCL profile
Consider the following definition:
ti u=4 s=1
au u=1 s=1
term s=105
ranked r=102
date u=30 r=o
Both ti and au set the
structure attribute to phrase (s=1).
ti sets the use attribute to 4;
au sets the use attribute to 1.
When no qualifiers are used in the query the structure-attribute is
set to free-form-text (105) (rule for term).
The date sets the relation attribute to
the relation used in the CCL query and sets the use attribute
to 30 (Bib-1 Date).
You can combine attributes. To search for "ranked title" you
can do
ti,ranked=knuth computer
which will set relation=ranked, use=title, structure=phrase.
Query
date > 1980
is a valid query. But
ti > 1980
is invalid.
Qualifier alias
A qualifier alias is of the form:
q q1 q2 ..
which declares q to
be an alias for q1,
q2... such that the CCL
query q=x is equivalent to
q1=x or q2=x or ....
Comments
Lines containing only white space, or lines that begin with the
character #, are treated as comments.
Directives
A directive specification takes the form
@directive value
CCL directives
truncation
Truncation character. Default: ?
field
Specifies how multiple fields are to be
combined. There are two modes: or:
multiple qualifier fields are ORed;
merge: attributes for the qualifier
fields are merged and assigned to one term. Default: merge
case
Specifies if CCL operators and qualifiers should be
compared with case sensitivity or not. Specify 0 for
case sensitive; 1 for case insensitive. Default: 0
and
Specifies token for CCL operator AND. Default: and
or
Specifies token for CCL operator OR. Default: or
not
Specifies token for CCL operator NOT. Default: not
set
Specifies token for CCL operator SET. Default: set
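For instance, a profile file might combine directives (in the @directive value form described above) with qualifier specifications. The qualifier names here are illustrative, not prescribed:

```
# Illustrative CCL profile fragment
@truncation ?
@case 0
ti u=4 s=1
au u=1 s=1
```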
CCL API
All public definitions can be found in the header file
ccl.h. A profile identifier is of type
CCL_bibset. A profile must be created with a call
to the function ccl_qual_mk, which returns a profile
handle of type CCL_bibset.
To read a file containing qualifier definitions the function
ccl_qual_file may be convenient. This function
takes an already opened FILE handle pointer as
argument along with a CCL_bibset handle.
To parse a simple string with a FIND query use the function
struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
int *error, int *pos);
which takes the CCL profile (bibset) and query
(str) as input. Upon successful completion the RPN
tree is returned. If an error occurs, such as a syntax error, the integer
pointed to by error holds the error code and
pos holds the offset within the query string at which
parsing failed.
An English representation of the error may be obtained by calling
the ccl_err_msg function. The error codes are
listed in ccl.h.
To convert the CCL RPN tree (type
struct ccl_rpn_node *)
to the Z_RPNQuery of YAZ, the function ccl_rpn_query
must be used. This function, which is part of YAZ, is implemented in
yaz-ccl.c.
After calling this function the CCL RPN tree is probably no longer
needed. The function ccl_rpn_delete destroys the CCL RPN tree.
A CCL profile may be destroyed by calling the
ccl_qual_rm function.
The token names for the CCL operators may be changed by setting the
globals (all type char *)
ccl_token_and, ccl_token_or,
ccl_token_not and ccl_token_set.
An operator may have aliases, i.e. there may be more than one name for
the operator. To do this, separate each alias with a space character.
CQL
CQL - Common Query Language - was defined for the
SRU protocol.
In many ways CQL has a similar syntax to CCL.
The objective of CQL is different. Where CCL aims to be
an end-user language, CQL is the protocol
query language for SRU.
If you are new to CQL, read the
Gentle Introduction.
The CQL parser in &yaz; provides the following:
It parses and validates a CQL query.
It generates a C structure that allows you to convert
a CQL query to some other query language, such as SQL.
The parser converts a valid CQL query to PQF, thus providing a
way to use CQL for both SRU servers and Z39.50 targets at the
same time.
The parser converts CQL to
XCQL.
XCQL is an XML representation of CQL.
XCQL is part of the SRU specification. However, since SRU
supports CQL only, we don't expect XCQL to be widely used.
Furthermore, CQL has the advantage over XCQL that it is
easy to read.
CQL parsing
A CQL parser is represented by the CQL_parser
handle. Its contents should be considered &yaz; internal (private).
#include <yaz/cql.h>
typedef struct cql_parser *CQL_parser;
CQL_parser cql_parser_create(void);
void cql_parser_destroy(CQL_parser cp);
A parser is created by cql_parser_create and
is destroyed by cql_parser_destroy.
To parse a CQL query string, the following function
is provided:
int cql_parser_string(CQL_parser cp, const char *str);
A CQL query is parsed by the cql_parser_string
which takes a query str.
If the query was valid (no syntax errors), then zero is returned;
otherwise -1 is returned to indicate a syntax error.
int cql_parser_stream(CQL_parser cp,
int (*getbyte)(void *client_data),
void (*ungetbyte)(int b, void *client_data),
void *client_data);
int cql_parser_stdio(CQL_parser cp, FILE *f);
The functions cql_parser_stream and
cql_parser_stdio parse a CQL query
- just like cql_parser_string.
The only difference is that the CQL query can be
fed to the parser in different ways.
The cql_parser_stream uses a generic
byte stream as input. The cql_parser_stdio
uses a FILE handle which is opened for reading.
CQL tree
If the query string is valid, the CQL parser
generates a tree representing the structure of the
CQL query.
struct cql_node *cql_parser_result(CQL_parser cp);
cql_parser_result returns a
pointer to the root node of the resulting tree.
Each node in a CQL tree is represented by a
struct cql_node.
It is defined as follows:
#define CQL_NODE_ST 1
#define CQL_NODE_BOOL 2
struct cql_node {
int which;
union {
struct {
char *index;
char *index_uri;
char *term;
char *relation;
char *relation_uri;
struct cql_node *modifiers;
} st;
struct {
char *value;
struct cql_node *left;
struct cql_node *right;
struct cql_node *modifiers;
} boolean;
} u;
};
There are two node types: search term (ST) and boolean (BOOL).
A modifier is treated as a search term too.
The search term node has five members:
index: index for search term.
If an index is unspecified for a search term,
index will be NULL.
index_uri: index URI for search term
or NULL if none could be resolved for the index.
term: the search term itself.
relation: relation for search term.
relation_uri: relation URI for search term.
modifiers: relation modifiers for the search
term. The modifiers list itself consists of cql_nodes,
each of type ST.
The boolean node represents and,
or, not, as well as
prox (proximity).
left and right: the left
and right operands respectively.
modifiers: proximity arguments.
CQL to PQF conversion
Conversion to PQF (and Z39.50 RPN) is complicated by the fact
that the resulting RPN depends on the Z39.50 target
capabilities (combinations of supported attributes).
In addition, CQL and SRU operate on index prefixes
(URIs or strings), whereas the RPN uses Object Identifiers
for attribute sets.
The CQL library of &yaz; defines a cql_transform_t
type. It represents a particular mapping between CQL and RPN.
This handle is created and destroyed by the functions:
cql_transform_t cql_transform_open_FILE (FILE *f);
cql_transform_t cql_transform_open_fname(const char *fname);
void cql_transform_close(cql_transform_t ct);
The first two functions create a transformation handle from
either an already open FILE or from a filename respectively.
The handle is destroyed by cql_transform_close,
after which no further reference to the handle is allowed.
When a cql_transform_t handle has been created
you can convert to RPN.
int cql_transform_buf(cql_transform_t ct,
struct cql_node *cn, char *out, int max);
This function converts the CQL tree cn
using handle ct.
For the resulting PQF, you supply a buffer out
which must be able to hold at least max
characters.
If conversion failed, cql_transform_buf
returns a non-zero SRU error code; otherwise zero is returned
(conversion successful). The meanings of the numeric error
codes are listed in the SRU specifications.
If conversion fails, more information can be obtained by calling
int cql_transform_error(cql_transform_t ct, char **addinfop);
This function returns the most recently returned numeric
error-code and sets the string-pointer at
*addinfop to point to a string containing
additional information about the error that occurred: for
example, if the error code is 15 (``Illegal or unsupported context
set''), the additional information is the name of the requested
context set that was not recognised.
The SRU error-codes may be translated into brief human-readable
error messages using
const char *cql_strerror(int code);
If you wish to be able to produce a PQF result in a different
way, there are two alternatives.
void cql_transform_pr(cql_transform_t ct,
struct cql_node *cn,
void (*pr)(const char *buf, void *client_data),
void *client_data);
int cql_transform_FILE(cql_transform_t ct,
struct cql_node *cn, FILE *f);
The former function produces output to a user-defined
output stream. The latter writes the result to an already
open FILE.
Specification of CQL to RPN mappings
The file supplied to functions
cql_transform_open_FILE,
cql_transform_open_fname follows
a structure found in many Unix utilities.
It consists of mapping specifications - one per line.
Lines starting with # are ignored (comments).
Each line is of the form
CQL pattern = RPN equivalent
An RPN pattern is a simple attribute list. Each attribute pair
takes the form:
[set] type=value
The attribute set is optional.
The type is the attribute type,
value the attribute value.
The following CQL patterns are recognized:
index.set.name
This pattern is invoked when a CQL index, such as
dc.title is converted. set
and name are the context set and index
name respectively.
Typically, the RPN specifies an equivalent use attribute.
For terms not bound by an index the pattern
index.cql.serverChoice is used.
Here, the prefix cql is defined as
http://www.loc.gov/zing/cql/cql-indexes/v1.0/.
If this pattern is not defined, the mapping will fail.
qualifier.set.name
(DEPRECATED)
For backwards compatibility, this is recognised as a synonym of
index.set.name.
relation.relation
This pattern specifies how a CQL relation is mapped to RPN.
relation is the name of the relation
operator. Since = is used as
separator between the CQL pattern and the RPN, CQL relations
including = cannot be
used directly.
ge,
eq,
le,
must be used for CQL operators, greater-than-or-equal,
equal, less-than-or-equal respectively.
The RPN pattern is supposed to include a relation attribute.
For terms not bound by a relation, the pattern
relation.scr is used. If the pattern
is not defined, the mapping will fail.
The special pattern, relation.* is used
when no other relation pattern is matched.
relationModifier.mod
This pattern specifies how a CQL relation modifier is mapped to RPN.
The RPN pattern is usually a relation attribute.
structure.type
This pattern specifies how a CQL structure is mapped to RPN.
Note that this CQL pattern is somewhat similar to
the CQL pattern relation.
The type is a CQL relation.
The pattern, structure.* is used
when no other structure pattern is matched.
Usually, the RPN equivalent specifies a structure attribute.
position.type
This pattern specifies how the anchor (position) of
CQL is mapped to RPN.
The type is one
of first, any,
last, firstAndLast.
The pattern, position.* is used
when no other position pattern is matched.
set.prefix
This specification defines a CQL context set for a given prefix.
The value on the right hand side is the URI for the set -
not RPN. All prefixes used in
index patterns must be defined this way.
CQL to RPN mapping file
This simple file defines two context sets, three indexes and three
relations, a position pattern and a default structure.
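The example file itself is not reproduced here. A minimal illustrative fragment in the format described above - consistent with the conversions shown below, though not necessarily identical to the file shipped with &yaz; - could look like:

```
# Context sets (prefix = URI)
set.cql = http://www.loc.gov/zing/cql/cql-indexes/v1.0/
set.dc  = http://www.loc.gov/zing/cql/dc-indexes/v1.0/

# Indexes
index.cql.serverChoice = 1=1016
index.dc.title         = 1=4
index.dc.creator       = 1=1003

# Relations
relation.eq  = 2=3
relation.scr = 2=3
relation.le  = 2=2

# Default structure and position
structure.*  = 4=1
position.any = 3=3 6=1
```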
With the mappings above, the CQL query
computer
is converted to the PQF:
@attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer"
by rules index.cql.serverChoice,
relation.scr, structure.*,
position.any.
CQL query
computer^
is rejected, since position.right is
undefined.
CQL query
>my = "http://www.loc.gov/zing/cql/dc-indexes/v1.0/" my.title = x
is converted to
@attr 1=4 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "x"
CQL to XCQL conversion
Conversion from CQL to XCQL is trivial and does not
require a mapping to be defined.
There are three functions to choose from, depending on the
way you wish to store the resulting output (an XML buffer
containing XCQL).
int cql_to_xml_buf(struct cql_node *cn, char *out, int max);
void cql_to_xml(struct cql_node *cn,
void (*pr)(const char *buf, void *client_data),
void *client_data);
void cql_to_xml_stdio(struct cql_node *cn, FILE *f);
Function cql_to_xml_buf converts
to XCQL and stores result in a user supplied buffer of a given
max size.
cql_to_xml writes the result to
a user-defined output stream.
cql_to_xml_stdio writes to a
file.
Object Identifiers
The basic YAZ representation of an OID is an array of integers,
terminated with the value -1. The &odr; module provides two
utility functions to create and copy this type of data element:
Odr_oid *odr_getoidbystr(ODR o, char *str);
Creates an OID based on a string-based representation using dots (.)
to separate elements in the OID.
Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
Creates a copy of the OID referenced by the o
parameter.
Both functions take an &odr; stream as parameter. This stream is used to
allocate memory for the data elements, which is released on a
subsequent call to odr_reset() on that stream.
The OID module provides a higher-level representation of the
family of object identifiers which describe the Z39.50 protocol and its
related objects. The definition of the module interface is given in
the oid.h file.
The interface is mainly based on the oident structure.
The definition of this structure looks like this:
typedef struct oident
{
oid_proto proto;
oid_class oclass;
oid_value value;
int oidsuffix[OID_SIZE];
char *desc;
} oident;
The proto field takes one of the values
PROTO_Z3950
PROTO_GENERAL
Use PROTO_Z3950 for Z39.50 Object Identifers,
PROTO_GENERAL for other types (such as
those associated with ILL).
The oclass field takes one of the values
CLASS_APPCTX
CLASS_ABSYN
CLASS_ATTSET
CLASS_TRANSYN
CLASS_DIAGSET
CLASS_RECSYN
CLASS_RESFORM
CLASS_ACCFORM
CLASS_EXTSERV
CLASS_USERINFO
CLASS_ELEMSPEC
CLASS_VARSET
CLASS_SCHEMA
CLASS_TAGSET
CLASS_GENERAL
corresponding to the OID classes defined by the Z39.50 standard.
Finally, the value field takes one of the values
VAL_APDU
VAL_BER
VAL_BASIC_CTX
VAL_BIB1
VAL_EXP1
VAL_EXT1
VAL_CCL1
VAL_GILS
VAL_WAIS
VAL_STAS
VAL_DIAG1
VAL_ISO2709
VAL_UNIMARC
VAL_INTERMARC
VAL_CCF
VAL_USMARC
VAL_UKMARC
VAL_NORMARC
VAL_LIBRISMARC
VAL_DANMARC
VAL_FINMARC
VAL_MAB
VAL_CANMARC
VAL_SBN
VAL_PICAMARC
VAL_AUSMARC
VAL_IBERMARC
VAL_EXPLAIN
VAL_SUTRS
VAL_OPAC
VAL_SUMMARY
VAL_GRS0
VAL_GRS1
VAL_EXTENDED
VAL_RESOURCE1
VAL_RESOURCE2
VAL_PROMPT1
VAL_DES1
VAL_KRB1
VAL_PRESSET
VAL_PQUERY
VAL_PCQUERY
VAL_ITEMORDER
VAL_DBUPDATE
VAL_EXPORTSPEC
VAL_EXPORTINV
VAL_NONE
VAL_SETM
VAL_SETG
VAL_VAR1
VAL_ESPEC1
again, corresponding to the specific OIDs defined by the standard.
Refer to the
Registry of Z39.50 Object Identifiers for the
whole list.
The desc field contains a brief, mnemonic name for the OID in question.
The function
struct oident *oid_getentbyoid(int *o);
takes as argument an OID, and returns a pointer to a static area
containing an oident structure. You typically use
this function when you receive a PDU containing an OID, and you wish
to branch out depending on the specific OID value.
The function
int *oid_ent_to_oid(struct oident *ent, int *dst);
takes as argument an oident structure - in which
the proto, oclass, and
value fields are assumed to be set correctly -
and returns a pointer to the buffer given by dst,
containing the integer-array representation of the corresponding OID.
The function returns NULL, and the array dst is
unchanged, if no mapping could take place.
The array dst should be at least of size
OID_SIZE.
The oid_ent_to_oid() function can be used whenever
you need to prepare a PDU containing one or more OIDs. The separation of
the protocol element from the remainder of the
OID-description makes it simple to write applications that can
communicate with either Z39.50 or OSI SR-based applications.
The function
oid_value oid_getvalbyname(const char *name);
takes as argument a mnemonic OID name, and returns the
value field of the first entry in the database that
contains the given name in its desc field.
Three utility functions are provided for translating OIDs'
symbolic names (e.g. Usmarc) into OID structures
(int arrays) and into strings containing the OID in dotted notation
(e.g. 1.2.840.10003.9.5.1). They are:
int *oid_name_to_oid(oid_class oclass, const char *name, int *oid);
char *oid_to_dotstring(const int *oid, char *oidbuf);
char *oid_name_to_dotstring(oid_class oclass, const char *name, char *oidbuf);
oid_name_to_oid()
translates the specified symbolic name,
interpreted as being of class oclass. (The
class must be specified because many symbolic names exist in
multiple classes - for example, Zthes is the
symbolic name of an attribute set, a schema and a tag-set.) The
sequence of integers representing the OID is written into the
area oid provided by the caller; it is the
caller's responsibility to ensure that this area is large enough
to contain the translated OID. As a convenience, the address of
the buffer (i.e. the value of oid) is
returned.
oid_to_dotstring()
translates the int-array oid into a dotted
string which is written into the area oidbuf
supplied by the caller; it is the caller's responsibility to
ensure that this area is large enough. The address of the buffer
is returned.
oid_name_to_dotstring()
combines the previous two functions to derive a dotted string
representing the OID specified by oclass and
name, writing it into the buffer passed as
oidbuf and returning its address.
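To make the dotted-notation conversion concrete, here is a small
self-contained sketch in the spirit of oid_to_dotstring().
It assumes the convention that OID int arrays are terminated by -1
(an assumption for this illustration; consult oid.h for the authoritative
representation), and the my_ name is a hypothetical stand-in, not part of
the &yaz; API:

```c
#include <stdio.h>

/* Sketch of a dotted-string conversion like oid_to_dotstring(),
 * assuming the OID int array is terminated by -1. The caller must
 * supply a buffer large enough for the result. */
static char *my_oid_to_dotstring(const int *oid, char *oidbuf)
{
    char *p = oidbuf;
    int i;
    *p = '\0';
    for (i = 0; oid[i] != -1; i++)
        p += sprintf(p, i ? ".%d" : "%d", oid[i]);
    return oidbuf;
}
```

For example, the USMARC record-syntax OID {1, 2, 840, 10003, 5, 10, -1}
converts to the string "1.2.840.10003.5.10".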
Finally, the module provides the following utility functions, whose
meaning should be obvious:
void oid_oidcpy(int *t, int *s);
void oid_oidcat(int *t, int *s);
int oid_oidcmp(int *o1, int *o2);
int oid_oidlen(int *o);
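The presumed semantics of two of these helpers can be sketched as follows,
again assuming -1-terminated int arrays (check oid.h for the authoritative
definitions). The my_ functions below are hypothetical stand-ins mirroring
oid_oidlen() and oid_oidcmp():

```c
/* Count the number of components before the -1 terminator. */
static int my_oidlen(const int *o)
{
    int len = 0;
    while (*o++ != -1)
        len++;
    return len;
}

/* Compare two OIDs component by component, strcmp()-style:
 * negative, zero, or positive result. */
static int my_oidcmp(const int *o1, const int *o2)
{
    while (*o1 == *o2 && *o1 != -1)
    {
        o1++;
        o2++;
    }
    return (*o1 > *o2) - (*o1 < *o2);
}
```

With this convention a shorter OID that is a prefix of a longer one compares
less, since the -1 sentinel is smaller than any component value.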
The OID module has been criticized - and perhaps rightly so
- for needlessly abstracting the
representation of OIDs. Other toolkits use a simple
string-representation of OIDs with good results. In practice, we have
found the interface comfortable and quick to work with, and it is a
simple matter (for what it's worth) to create applications compatible
with both ISO SR and Z39.50. Finally, the use of the
oident database is by no means mandatory.
You can easily create your own system for representing OIDs, as long
as it is compatible with the low-level integer-array representation
of the ODR module.
Nibble Memory
Sometimes when you need to allocate and construct a large,
interconnected complex of structures, it can be a bit of a pain to
release the associated memory again. For the structures describing the
Z39.50 PDUs and related structures, it is convenient to use the
memory-management system of the &odr; subsystem (see
). However, in some circumstances
where you might otherwise benefit from using a simple nibble memory
management system, it may be impractical to use
odr_malloc() and odr_reset().
For this purpose, the memory manager which also supports the &odr;
streams is made available in the NMEM module. The external interface
to this module is given in the nmem.h file.
The following prototypes are given:
NMEM nmem_create(void);
void nmem_destroy(NMEM n);
void *nmem_malloc(NMEM n, int size);
void nmem_reset(NMEM n);
int nmem_total(NMEM n);
void nmem_init(void);
void nmem_exit(void);
The nmem_create() function returns a pointer to a
memory control handle, which can be released again by
nmem_destroy() when no longer needed.
The function nmem_malloc() allocates a block of
memory of the requested size. A call to nmem_reset()
or nmem_destroy() will release all memory allocated
on the handle since it was created (or since the last call to
nmem_reset()). The function
nmem_total() returns the number of bytes currently
allocated on the handle.
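The semantics described above can be illustrated with a toy allocator that
records every allocation on the handle and releases them all at once on
reset. This is only a sketch of the observable behavior; the real NMEM
implementation is organized quite differently (and is thread-aware). All
my_ names are hypothetical:

```c
#include <stdlib.h>

/* Toy nibble-memory handle: a list of allocations plus a byte count. */
struct my_nmem_block
{
    void *buf;
    struct my_nmem_block *next;
};
typedef struct my_nmem
{
    struct my_nmem_block *blocks;
    int total;
} *MY_NMEM;

static MY_NMEM my_nmem_create(void)
{
    MY_NMEM n = malloc(sizeof *n);
    n->blocks = 0;
    n->total = 0;
    return n;
}

/* Allocate a block and remember it on the handle. */
static void *my_nmem_malloc(MY_NMEM n, int size)
{
    struct my_nmem_block *b = malloc(sizeof *b);
    b->buf = malloc(size);
    b->next = n->blocks;
    n->blocks = b;
    n->total += size;
    return b->buf;
}

/* Release everything allocated on the handle since creation or last reset. */
static void my_nmem_reset(MY_NMEM n)
{
    while (n->blocks)
    {
        struct my_nmem_block *b = n->blocks;
        n->blocks = b->next;
        free(b->buf);
        free(b);
    }
    n->total = 0;
}

static void my_nmem_destroy(MY_NMEM n)
{
    my_nmem_reset(n);
    free(n);
}

static int my_nmem_total(MY_NMEM n)
{
    return n->total;
}
```

The point of the pattern is that a complex of interconnected structures
built on one handle needs no individual free() calls: one reset or destroy
releases it all.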
The nibble memory pool is shared amongst threads. POSIX
mutexes and WIN32 critical sections are used to keep the
module thread-safe. The function nmem_init()
initializes the nibble memory library; it is called automatically
the first time the YAZ.DLL is loaded - &yaz; uses the
function DllMain to achieve this. You should
not call nmem_init or
nmem_exit unless you're absolutely sure what
you're doing. Note that in previous &yaz; versions you had to call
nmem_init yourself.
Log
&yaz; has evolved a fairly complex log system which should be useful both
for debugging &yaz; itself, debugging applications that use &yaz;, and for
production use of those applications.
The log functions are declared in header yaz/log.h
and implemented in src/log.c.
Due to a name clash with syslog and some math utilities, the logging
interface was modified as of YAZ 2.0.29. The obsolete interface
is still available in the header file yaz/log.h.
The key points of the interface are:
void yaz_log(int level, const char *fmt, ...)
void yaz_log_init(int level, const char *prefix, const char *name);
void yaz_log_init_file(const char *fname);
void yaz_log_init_level(int level);
void yaz_log_init_prefix(const char *prefix);
void yaz_log_time_format(const char *fmt);
void yaz_log_init_max_size(int mx);
int yaz_log_mask_str(const char *str);
int yaz_log_module_level(const char *name);
The reason for the whole log module is the yaz_log
function. It takes a bitmask indicating the log levels, a
printf-like format string, and a variable number of
arguments to log.
The log level is a bitmask that says on which level(s)
the log entry should be made, and optionally sets some behaviour of the
logging. In the simplest cases, it can be one of YLOG_FATAL,
YLOG_DEBUG, YLOG_WARN, YLOG_LOG. Those can be combined with bits
that modify the way the log entry is written: YLOG_ERRNO,
YLOG_NOTIME, YLOG_FLUSH.
Most of the remaining bits are deprecated and should not be used; use
the dynamic log levels instead.
Applications that use &yaz; should not use YLOG_LOG for ordinary
messages, but should make use of the dynamic log level system. This consists
of two parts: defining the log levels and checking them.
To define the log levels, the (main) program should pass a string to
yaz_log_mask_str to define which log levels are to be
logged. This string should be a comma-separated list of log level names,
and can contain both hard-coded names and dynamic ones. The log level
calculation starts with YLOG_DEFAULT_LEVEL and adds a bit
for each word it meets, unless the word starts with a '-', in which case it
clears the bit. If the string 'none' is found,
all bits are cleared. Typically this string comes from the command-line,
often identified by -v. The
yaz_log_mask_str returns a log level that should be
passed to yaz_log_init_level for it to take effect.
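The mask-string convention can be illustrated with a self-contained sketch.
The names and bit values below are hypothetical (&yaz; assigns dynamic bits
internally); only the parsing rules - comma-separated names, a leading '-'
to clear a bit, and 'none' to clear everything - follow the description
above:

```c
#include <string.h>

/* Hypothetical log-level bits for the sketch. */
#define MY_LOG_SERVER  0x01
#define MY_LOG_SESSION 0x02
#define MY_LOG_REQUEST 0x04

static int my_name_to_bit(const char *name)
{
    if (!strcmp(name, "server"))  return MY_LOG_SERVER;
    if (!strcmp(name, "session")) return MY_LOG_SESSION;
    if (!strcmp(name, "request")) return MY_LOG_REQUEST;
    return 0;
}

/* Sketch of yaz_log_mask_str-style parsing: start from a default mask,
 * add a bit per name, clear it on a leading '-', clear all on "none". */
static int my_log_mask_str(const char *str, int start_mask)
{
    int mask = start_mask;
    char buf[256];
    char *tok;
    strncpy(buf, str, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (tok = strtok(buf, ","); tok; tok = strtok(0, ","))
    {
        if (!strcmp(tok, "none"))
            mask = 0;
        else if (*tok == '-')
            mask &= ~my_name_to_bit(tok + 1);
        else
            mask |= my_name_to_bit(tok);
    }
    return mask;
}
```

A command line like -v server,requestdetail,-session would typically be fed
through such a parser and the result passed on to the level-setting call.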
Each module should check which log bits it should use, by calling
yaz_log_module_level with a suitable name for the
module. The name is stripped of any preceding path and extension,
so it is quite possible to use __FILE__ for it. If the
name has been passed to yaz_log_mask_str, the routine
returns a non-zero bitmask, which should then be used in subsequent calls
to yaz_log. (It can also be tested, so as to avoid unnecessary calls to
yaz_log in time-critical places, or when the log entry would take time
to construct.)
&yaz; uses the following dynamic log levels:
server, session, request, requestdetail for the server
functionality.
zoom for the ZOOM client API.
ztest for the simple test server.
malloc, nmem, odr, eventl for internal debugging of &yaz; itself.
Of course, any program using &yaz; is welcome to define as many new ones as
it needs.
By default the log is written to stderr, but this can be changed by a call
to yaz_log_init_file or
yaz_log_init. If the log is directed to a file, the
file size is checked at every write, and if it exceeds the limit given in
yaz_log_init_max_size, the log is rotated. The
rotation keeps one old version (with a .1 appended to
the name). The size defaults to 1GB. Setting it to zero will disable the
rotation feature.
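The rotation policy described above can be sketched as follows: before each
write, check the file size against the limit and, if it is exceeded, rename
the file with a .1 suffix, replacing any previous old
version. This is an illustration of the policy only, not &yaz;'s
implementation, and the names are hypothetical:

```c
#include <stdio.h>

/* Return the size of a file in bytes, or -1 if it does not exist. */
static long my_file_size(const char *fname)
{
    FILE *f = fopen(fname, "rb");
    long sz;
    if (!f)
        return -1;
    fseek(f, 0, SEEK_END);
    sz = ftell(f);
    fclose(f);
    return sz;
}

/* Append msg to the log file, rotating first if the size limit is
 * exceeded. Only one old version (<name>.1) is kept. A limit of 0
 * disables rotation, as described above. */
static void my_log_write(const char *fname, long max_size, const char *msg)
{
    FILE *f;
    if (max_size > 0 && my_file_size(fname) > max_size)
    {
        char old[1024];
        snprintf(old, sizeof old, "%s.1", fname);
        remove(old);       /* discard the previous old version */
        rename(fname, old);
    }
    f = fopen(fname, "a");
    if (f)
    {
        fputs(msg, f);
        fclose(f);
    }
}
```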
A typical &yaz; log looks like this:
13:23:14-23/11 yaz-ztest(1) [session] Starting session from tcp:127.0.0.1 (pid=30968)
13:23:14-23/11 yaz-ztest(1) [request] Init from 'YAZ' (81) (ver 2.0.28) OK
13:23:17-23/11 yaz-ztest(1) [request] Search Z: @attrset Bib-1 foo OK:7 hits
13:23:22-23/11 yaz-ztest(1) [request] Present: [1] 2+2 OK 2 records returned
13:24:13-23/11 yaz-ztest(1) [request] Close OK
The log entries start with a time stamp. This can be omitted by setting the
YLOG_NOTIME bit in the log level, so that automated tests
can produce identical log files that are easy to diff. The
format of the time stamp can be set with
yaz_log_time_format, which takes a format string just
like strftime.
Next in a log line comes the prefix, often the name of the program. For
&yaz;-based servers, it can also contain the session number. Then
come one or more log bits in square brackets, depending on the logging
level set by yaz_log_init_level and the log level
passed to yaz_log. Finally come the format
string and additional values passed to yaz_log.
The log level YLOG_LOGLVL, enabled by the string
loglevel, will log all the log-level affecting
operations. This can come in handy if you need to know what other log
levels would be useful. Grep the logfile for [loglevel].
The log system is almost independent of the rest of &yaz;; the only
important dependency is on nmem, and that only for
using the semaphore definition there.
The dynamic log levels and log rotation were introduced in &yaz; 2.0.28. At
the same time, the log bit names were changed from
LOG_something to YLOG_something,
to avoid collision with syslog.h.
MARC
YAZ provides a fast utility that decodes MARC records and
encodes them to a variety of output formats. The MARC records must
be encoded in ISO2709.
/* create handler */
yaz_marc_t yaz_marc_create(void);
/* destroy */
void yaz_marc_destroy(yaz_marc_t mt);
/* set XML mode YAZ_MARC_LINE, YAZ_MARC_SIMPLEXML, ... */
void yaz_marc_xml(yaz_marc_t mt, int xmlmode);
#define YAZ_MARC_LINE 0
#define YAZ_MARC_SIMPLEXML 1
#define YAZ_MARC_OAIMARC 2
#define YAZ_MARC_MARCXML 3
#define YAZ_MARC_ISO2709 4
#define YAZ_MARC_XCHANGE 5
/* supply iconv handle for character set conversion .. */
void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);
/* set debug level, 0=none, 1=more, 2=even more, .. */
void yaz_marc_debug(yaz_marc_t mt, int level);
/* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
On success, result in *result with size *rsize. */
int yaz_marc_decode_buf (yaz_marc_t mt, const char *buf, int bsize,
char **result, int *rsize);
/* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
On success, result in WRBUF */
int yaz_marc_decode_wrbuf (yaz_marc_t mt, const char *buf,
int bsize, WRBUF wrbuf);
A MARC conversion handle must be created by using
yaz_marc_create and destroyed
by calling yaz_marc_destroy.
All other functions operate on a yaz_marc_t handle.
The output is specified by a call to yaz_marc_xml.
The xmlmode must be one of
YAZ_MARC_LINE
A simple line-by-line format suitable for display but not
recommended for further (machine) processing.
YAZ_MARC_MARCXML
The resulting record is converted to MARCXML.
YAZ_MARC_ISO2709
The resulting record is converted to ISO2709 (MARC).
The actual conversion functions are
yaz_marc_decode_buf and
yaz_marc_decode_wrbuf, which decode a MARC record
and encode it in the chosen output format. The former operates on simple
buffers; the latter stores the resulting record in a WRBUF handle (WRBUF
is a simple string type).
Display of MARC record
The following program snippet illustrates how the MARC API may
be used to convert a MARC record to the line-by-line format: