Supporting Tools

In support of the service API (primarily the ASN module, which provides the programmatic interface to the Z39.50 APDUs), YAZ contains a collection of tools that support the development of applications.
Query Syntax Parsers
Since the type-1 (RPN) query structure has no direct, useful string representation, every origin application needs to provide some form of mapping from a local query notation or representation to a Z_RPNQuery structure. Some programmers will prefer to construct the query manually, perhaps using odr_malloc() to simplify memory management. The YAZ distribution includes three separate query-generating tools that may be of use to you.
Prefix Query Format
Since RPN, or reverse Polish notation, is really just a fancy way of describing a suffix notation format (operator follows operands), it would seem that the confusion is total when we now introduce a prefix notation for RPN. The reason is one of simple laziness: it's somewhat simpler to interpret a prefix format, and this utility was designed for maximum simplicity, to provide a baseline representation for use in simple test applications and scripting environments (like Tcl). The demonstration client included with YAZ uses the PQF. The PQF has been adopted by other parties developing Z39.50 software. It is often referred to as Prefix Query Notation (PQN).
The PQF is defined by the pquery module in the YAZ library. There are two sets of functions with similar behavior. The first set operates on a PQF parser handle; the second set doesn't. The first set of functions is more flexible than the second. The second set is obsolete and is only provided to ensure backwards compatibility.
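To illustrate why a prefix format is simple to interpret, here is a small standalone sketch (not the YAZ pquery module) that validates a tiny subset of PQF: bare or quoted terms plus the binary operators @and, @or and @not. Because each operator precedes its operands, a recursive-descent parser knows as soon as it sees the operator that exactly two sub-queries must follow. The names pqf_subset_valid, parse_query and skip_ws are hypothetical, and attributes, result sets and @prox are deliberately left out.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch, not YAZ code. Skips blanks between tokens. */
static const char *skip_ws(const char *s)
{
    while (*s && isspace((unsigned char) *s))
        s++;
    return s;
}

/* Parses one query starting at s and returns a pointer just past it,
   or NULL on a syntax error. Supported subset (an assumption of this
   sketch): bare words, "quoted phrases", and @and/@or/@not followed by
   two sub-queries. */
static const char *parse_query(const char *s)
{
    s = skip_ws(s);
    if (*s == '\0')
        return NULL;                          /* an operand is missing */
    if (*s == '@') {
        size_t len = strcspn(s, " \t\n");     /* length of the operator token */
        if (!((len == 4 && strncmp(s, "@and", 4) == 0) ||
              (len == 3 && strncmp(s, "@or", 3) == 0) ||
              (len == 4 && strncmp(s, "@not", 4) == 0)))
            return NULL;                      /* operator outside the subset */
        /* Prefix notation: the operator is seen first, so the parser
           knows that exactly two operands must follow. */
        s = parse_query(s + len);
        return s ? parse_query(s) : NULL;
    }
    if (*s == '"') {                          /* quoted phrase */
        const char *end = strchr(s + 1, '"');
        return end ? end + 1 : NULL;          /* NULL: unterminated quote */
    }
    return s + strcspn(s, " \t\n");           /* bare word */
}

/* Returns 1 if q is exactly one well-formed query of the subset, else 0. */
int pqf_subset_valid(const char *q)
{
    const char *rest = parse_query(q);
    return rest != NULL && *skip_ws(rest) == '\0';
}
```

For example, pqf_subset_valid("@and @or dylan zimmerman when") returns 1, while pqf_subset_valid("@and dylan") returns 0 because the second operand of @and is missing.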
The first set of functions all operate on a PQF parser handle:
    #include <yaz/pquery.h>
    YAZ_PQF_Parser yaz_pqf_create(void);
    void yaz_pqf_destroy(YAZ_PQF_Parser p);
    Z_RPNQuery *yaz_pqf_parse(YAZ_PQF_Parser p, ODR o, const char *qbuf);
    Z_AttributesPlusTerm *yaz_pqf_scan(YAZ_PQF_Parser p, ODR o, Odr_oid **attributeSetId, const char *qbuf);
    int yaz_pqf_error(YAZ_PQF_Parser p, const char **msg, size_t *off);
A PQF parser is created and destroyed by the functions yaz_pqf_create and yaz_pqf_destroy respectively. The function yaz_pqf_parse parses the query given by the string qbuf. If parsing was successful, a Z39.50 RPN query is returned, allocated using the ODR stream o. If parsing failed, a NULL pointer is returned. The function yaz_pqf_scan takes a scan query in qbuf. If parsing was successful, the function returns an attributes-plus-term pointer and sets *attributeSetId to hold the attribute set for the scan request; both are allocated using the ODR stream o. If parsing failed, yaz_pqf_scan returns a NULL pointer. Error information for bad queries can be obtained by calling yaz_pqf_error, which returns an error code, sets *msg to point to an error description, and sets *off to the offset within the last query where parsing failed.
The second set of functions is declared as follows:
    #include <yaz/pquery.h>
    Z_RPNQuery *p_query_rpn(ODR o, oid_proto proto, const char *qbuf);
    Z_AttributesPlusTerm *p_query_scan(ODR o, oid_proto proto, Odr_oid **attributeSetP, const char *qbuf);
    int p_query_attset(const char *arg);
The function p_query_rpn() takes as arguments an ODR stream (see section The ODR Module) to provide a memory source (the structure created is released on the next call to odr_reset() on the stream), a protocol identifier (one of the constants PROTO_Z3950 and PROTO_SR), and finally a null-terminated string holding the query string.
If the parse went well, p_query_rpn() returns a pointer to a Z_RPNQuery structure which can be placed directly into a Z_SearchRequest. If parsing failed due to a syntax error, a NULL pointer is returned. The function p_query_attset specifies which attribute set to use if the query doesn't specify one with the @attrset operator. p_query_attset returns 0 if the argument is a valid attribute set specifier; otherwise it returns -1.
The grammar of the PQF is as follows:
    query ::= top-set query-struct.
    top-set ::= [ '@attrset' string ]
    query-struct ::= attr-spec | simple | complex | '@term' term-type query
    attr-spec ::= '@attr' [ string ] string query-struct
    complex ::= operator query-struct query-struct.
    operator ::= '@and' | '@or' | '@not' | '@prox' proximity.
    simple ::= result-set | term.
    result-set ::= '@set' string.
    term ::= string.
    proximity ::= exclusion distance ordered relation which-code unit-code.
    exclusion ::= '1' | '0' | 'void'.
    distance ::= integer.
    ordered ::= '1' | '0'.
    relation ::= integer.
    which-code ::= 'known' | 'private' | integer.
    unit-code ::= integer.
    term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'.
You will note that the syntax above is a fairly faithful representation of RPN, except for the Attribute, which has been moved a step away from the term, allowing you to associate one or more attributes with an entire query structure. The parser will automatically apply the given attributes to each term as required.
The @attr operator is followed by an attribute specification (attr-spec above). The specification consists of an optional attribute set, an attribute type-value pair and a sub-query. The attribute type-value pair is packed in one string: an attribute type, an equals sign, and an attribute value, like this: @attr 1=1003. The type is always an integer, but the value may be either an integer or a string (if it doesn't start with a digit character).
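The rule for the type-value pair can be sketched as a small standalone C function. This is a hypothetical helper (split_attr_spec and struct attr_spec are not part of the YAZ API) written only to illustrate the rule; it handles just the type=value string, not the optional attribute set that may precede it, and it truncates long string values to fit a fixed buffer.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: result of splitting one "type=value" string. */
struct attr_spec {
    int type;              /* attribute type, always an integer */
    int is_numeric;        /* 1: numeric value, 0: string value */
    long num_value;        /* valid when is_numeric == 1 */
    char str_value[64];    /* valid when is_numeric == 0 (truncated if longer) */
};

/* Hypothetical helper, not YAZ code. Splits spec (e.g. "1=1003") into
   type and value. Returns 0 on success, -1 if spec is not type=value. */
int split_attr_spec(const char *spec, struct attr_spec *out)
{
    const char *eq = strchr(spec, '=');
    char *end;

    if (eq == NULL || !isdigit((unsigned char) spec[0]))
        return -1;
    out->type = (int) strtol(spec, &end, 10);
    if (end != eq || eq[1] == '\0')
        return -1;         /* type must be all digits; value must be present */
    if (isdigit((unsigned char) eq[1])) {
        /* value starts with a digit: treat it as numeric */
        out->is_numeric = 1;
        out->num_value = strtol(eq + 1, NULL, 10);
    } else {
        /* anything else is a string value, e.g. 1=/book/title */
        out->is_numeric = 0;
        strncpy(out->str_value, eq + 1, sizeof(out->str_value) - 1);
        out->str_value[sizeof(out->str_value) - 1] = '\0';
    }
    return 0;
}
```

With this sketch, "1=1003" yields type 1 with the numeric value 1003, while "1=/book/title" yields type 1 with the string value /book/title.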
A string attribute-value is encoded as a Type-1 "complex" attribute with the list of values containing the single string specified, and including no semantic indicators.
Version 3 of the Z39.50 specification defines various encodings of terms. Use @term type string, where type is one of: general, numeric or string (for InternationalString). If no term type has been given, the general form is used. This is the only encoding allowed in both versions 2 and 3 of the Z39.50 standard.
Using Proximity Operators with PQF
This is an advanced topic, describing how to construct queries that make very specific requirements on the relative location of their operands. You may wish to skip this section and go straight to the example PQF queries.
Most Z39.50 servers do not support proximity searching, or support only a small subset of the full functionality that can be expressed using the PQF proximity operator. Be aware that the ability to express a query in PQF is no guarantee that any given server will be able to execute it.
The proximity operator @prox is a special and more restrictive version of the conjunction operator @and. Its semantics are described in section 3.7.2 (Proximity) of the Z39.50 standard itself, which can be read online. In PQF, the proximity operation is represented by a sequence of the form
    @prox exclusion distance ordered relation which-code unit-code
in which the meanings of the parameters are as described in the standard, and they can take the following values:
exclusion: 0 = false (i.e. the proximity condition specified by the remaining parameters must be satisfied) or 1 = true (the proximity condition specified by the remaining parameters must not be satisfied).
distance: An integer specifying the difference between the locations of the operands: e.g. two adjacent words would have distance=1 since their locations differ by one unit.
ordered: 1 = ordered (the operands must occur in the order the query specifies them) or 0 = unordered (they may appear in either order).
relation: Recognised values are 1 (lessThan), 2 (lessThanOrEqual), 3 (equal), 4 (greaterThanOrEqual), 5 (greaterThan) and 6 (notEqual).
which-code: known or k (the unit-code parameter is taken from the well-known list of alternatives described below) or private or p (the unit-code parameter has semantics specific to an out-of-band agreement such as a profile).
unit-code: If the which-code parameter is known, then the recognised values are 1 (character), 2 (word), 3 (sentence), 4 (paragraph), 5 (section), 6 (chapter), 7 (document), 8 (element), 9 (subelement), 10 (elementType) and 11 (byte). If which-code is private, then the acceptable values are determined by the profile.
(The numeric values of the relation and well-known unit-code parameters are taken straight from the ASN.1 of the proximity structure in the standard.)
PQF queries
PQF queries using simple terms:
    dylan
    "bob dylan"
PQF boolean operators:
    @or "dylan" "zimmerman"
    @and @or dylan zimmerman when
    @and when @or dylan zimmerman
PQF references to result sets:
    @set Result-1
    @and @set seta @set setb
Attributes for terms:
    @attr 1=4 computer
    @attr 1=4 @attr 4=1 "self portrait"
    @attrset exp1 @attr 1=1 CategoryList
    @attr gils 1=2008 Copenhagen
    @attr 1=/book/title computer
PQF proximity queries:
    @prox 0 3 1 2 k 2 dylan zimmerman
Here the parameters 0, 3, 1, 2, k and 2 represent exclusion, distance, ordered, relation, which-code and unit-code, in that order. So:
    exclusion = 0: the proximity condition must hold
    distance = 3: the reference distance is three units
    ordered = 1: the terms must occur in the order they are specified
    relation = 2: lessThanOrEqual (to the distance of 3 units)
    which-code is "known", so the standard unit-codes are used
    unit-code = 2: word.
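The parameter semantics above can be condensed into a small predicate. This standalone C sketch is an illustration only, not how YAZ or any server evaluates @prox: it assumes word units (unit-code 2), that each operand occurs exactly once in the record, and it takes the operands' word positions directly. The function name prox_match is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical sketch, not YAZ code: evaluates @prox parameters for word
   units, given the word positions pos1 and pos2 of the first and second
   operand. Assumes both operands occur exactly once in the record.
   Returns 1 if the record satisfies the operator, 0 otherwise. */
int prox_match(int exclusion, int distance, int ordered, int relation,
               int pos1, int pos2)
{
    int d = abs(pos2 - pos1);   /* actual distance in units */
    int ok;

    switch (relation) {
    case 1: ok = d < distance; break;    /* lessThan */
    case 2: ok = d <= distance; break;   /* lessThanOrEqual */
    case 3: ok = d == distance; break;   /* equal */
    case 4: ok = d >= distance; break;   /* greaterThanOrEqual */
    case 5: ok = d > distance; break;    /* greaterThan */
    case 6: ok = d != distance; break;   /* notEqual */
    default: ok = 0; break;              /* unknown relation code */
    }
    if (ordered && pos2 < pos1)
        ok = 0;                 /* ordered: first operand must come first */
    /* exclusion = 1: the condition above must NOT be satisfied */
    return exclusion ? !ok : ok;
}
```

For the example query @prox 0 3 1 2 k 2 dylan zimmerman, prox_match(0, 3, 1, 2, 2, 5) returns 1 (dylan at word 2, zimmerman at word 5), while prox_match(0, 3, 1, 2, 2, 6) returns 0 because the words are four apart.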
So the whole proximity query means that the words dylan and zimmerman must both occur in the record, in that order, differing in position by three or fewer words (i.e. with two or fewer words between them). The query would find "Bob Dylan, aka. Robert Zimmerman", but not "Bob Dylan, born as Robert Zimmerman", since the distance in this case is four.
PQF specification of search term type:
    @term string "a UTF-8 string, maybe?"
PQF mixed queries:
    @or @and bob dylan @set Result-1
    @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
    @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
The last of these examples is a spatial search: in the GILS attribute set, access point 2038 indicates West Bounding Coordinate and 2039 indicates East Bounding Coordinate, so the query is for areas extending from -114 degrees to no more than -109 degrees.
CCL
Not all users enjoy typing in prefix query structures and numerical attribute values, even in a minimalistic test client. In the library world, the more intuitive Common Command Language (CCL, ISO 8777) has enjoyed some popularity, especially before the widespread availability of graphical interfaces. It is still useful in applications where you for some reason or other need to provide a symbolic language for expressing boolean query structures.
CCL Syntax
The CCL parser obeys the following grammar for the FIND argument. The syntax is annotated in the lines prefixed by --.
    CCL-Find ::= CCL-Find Op Elements
               | Elements.
    Op ::= "and" | "or" | "not"
    -- The above means that Elements are separated by boolean operators.
    Elements ::= '(' CCL-Find ')'
               | Set
               | Terms
               | Qualifiers Relation Terms
               | Qualifiers Relation '(' CCL-Find ')'
               | Qualifiers '=' string '-' string
    -- Elements is either a recursive definition, a result set reference, a
    -- list of terms, qualifiers followed by terms, qualifiers followed
    -- by a recursive definition or qualifiers in a range (lower - upper).
    Set ::= 'set' = string
    -- Reference to a result set
    Terms ::= Terms Prox Term
            | Term
    -- Proximity of terms.
    Term ::= Term string
           | string
    -- This basically means that a term may include a blank
    Qualifiers ::= Qualifiers ',' string
                 | string
    -- Qualifiers is a list of strings separated by comma
    Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
    -- Relational operators. This really doesn't follow the ISO8777
    -- standard.
    Prox ::= '%' | '!'
    -- Proximity operator
CCL queries
The following queries are all valid:
    dylan
    "bob dylan"
    dylan or zimmerman
    set=1
    (dylan and bob) or set=1
    righttrunc?
    "notrunc?"
    singlechar#mask
Assuming that the qualifiers ti, au and date are defined, we may use:
    ti=self portrait
    au=(bob dylan and slow train coming)
    date>1980 and (ti=((self portrait)))
CCL Qualifiers
Qualifiers are used to direct the search to a particular searchable index, such as title (ti) and author indexes (au). The CCL standard itself doesn't specify a particular set of qualifiers, but it does suggest a few short-hand notations. You can customize the CCL parser to support a particular set of qualifiers to reflect the current target profile. Traditionally, a qualifier would map to a particular use-attribute within the BIB-1 attribute set. It is also possible to set other attributes, such as the structure attribute.
A CCL profile is a set of predefined CCL qualifiers that may be read from a file or set in the CCL API. The YAZ client reads its CCL qualifiers from a file named default.bib. There are four types of lines in a CCL profile: qualifier specification, qualifier alias, comments and directives.
Qualifier specification
A qualifier specification is of the form:
    qualifier-name [attributeset,]type=val [attributeset,]type=val ...
where qualifier-name is the name of the qualifier to be used (e.g. ti), type is the attribute type in the attribute set (Bib-1 is used if no attribute set is given) and val is the attribute value.
The type can be specified either as an integer or as a single letter: u for use, r for relation, p for position, s for structure, t for truncation or c for completeness. The attributes for the special qualifier name term are used when no CCL qualifier is given in a query.
Common Bib-1 attributes
    Type      Description
    u=value   Use attribute (1). Common use attributes are 1 Personal-name, 4 Title, 7 ISBN, 8 ISSN, 30 Date, 62 Subject, 1003 Author, 1016 Any. Specify value as an integer.
    r=value   Relation attribute (2). Common values are 1 <, 2 <=, 3 =, 4 >=, 5 >, 6 <>, 100 phonetic, 101 stem, 102 relevance, 103 always matches.
    p=value   Position attribute (3). Values: 1 first in field, 2 first in any subfield, 3 any position in field.
    s=value   Structure attribute (4). Values: 1 phrase, 2 word, 3 key, 4 year, 5 date, 6 word list, 100 date (un), 101 name (norm), 102 name (un), 103 structure, 104 urx, 105 free-form-text, 106 document-text, 107 local-number, 108 string, 109 numeric string.
    t=value   Truncation attribute (5). Values: 1 right, 2 left, 3 left & right, 100 none, 101 process #, 102 regular-1, 103 regular-2, 104 CCL.
    c=value   Completeness attribute (6). Values: 1 incomplete subfield, 2 complete subfield, 3 complete field.
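A qualifier specification line can be split into its parts with a few lines of C. The sketch below is hypothetical and standalone (parse_qualifier_line and its structs are not YAZ functions, and this is not how ccl_qual_file is implemented): it maps the single letters u, r, p, s, t and c to attribute types 1 to 6 as listed in the table, accepts numeric types as-is, keeps values as strings because some of them are non-numeric, and does not handle the optional attributeset, prefix.

```c
#include <stdio.h>
#include <string.h>

#define MAX_QUAL_ATTRS 8

/* Hypothetical types for this sketch. */
struct qual_attr { int type; char value[16]; };
struct qual_spec { char name[32]; int n; struct qual_attr attr[MAX_QUAL_ATTRS]; };

/* Maps u, r, p, s, t, c to Bib-1 attribute types 1..6; 0 if unknown. */
static int letter_to_type(char c)
{
    static const char letters[] = "urpstc";
    const char *p = c ? strchr(letters, c) : NULL;
    return p ? (int) (p - letters) + 1 : 0;
}

/* Hypothetical helper, not YAZ code. Parses a line such as "ti u=4 s=1".
   Returns 0 on success, -1 otherwise. The optional "attributeset,"
   prefix is not supported in this sketch. */
int parse_qualifier_line(const char *line, struct qual_spec *q)
{
    char buf[256];
    char *tok;

    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    tok = strtok(buf, " \t\n");             /* first token: qualifier name */
    if (tok == NULL)
        return -1;
    strncpy(q->name, tok, sizeof(q->name) - 1);
    q->name[sizeof(q->name) - 1] = '\0';
    q->n = 0;
    while ((tok = strtok(NULL, " \t\n")) != NULL) {
        char *eq = strchr(tok, '=');
        struct qual_attr *a;
        int type;

        if (eq == NULL || eq == tok || q->n == MAX_QUAL_ATTRS)
            return -1;
        *eq = '\0';                          /* split "type" from "value" */
        if (strlen(tok) == 1 && letter_to_type(tok[0]))
            type = letter_to_type(tok[0]);   /* single-letter form */
        else if (sscanf(tok, "%d", &type) != 1)
            return -1;                       /* neither a letter nor an integer */
        a = &q->attr[q->n++];
        a->type = type;
        strncpy(a->value, eq + 1, sizeof(a->value) - 1);
        a->value[sizeof(a->value) - 1] = '\0';
    }
    return 0;
}
```

For "ti u=4 s=1" this yields the name ti with the pairs (1, "4") and (4, "1"); for "date u=30 r=o" the relation value stays the string "o".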

Refer to the Bib-1 attribute set documentation for the complete list of Bib-1 attributes.
It is also possible to specify non-numeric attribute values, which are used in combination with certain types. The special combinations are:
Special attribute combos
    Name   Description
    s=pw   The structure is set to either word or phrase depending on the number of tokens in a term (phrase-word).
    s=al   Each token in the term is ANDed (and-list). This does not set the structure at all.
    s=ol   Each token in the term is ORed (or-list). This does not set the structure at all.
    s=ag   Tokens that appear as phrases (with a blank in them) get structure phrase attached (4=1). Tokens that appear to be words get structure word attached (4=2). Phrases and words are ANDed. This is a variant of s=al and s=pw, with the main difference that words are not split (with operator AND) but instead kept in one RPN token. This facility appeared in YAZ 4.2.38.
    r=o    Allows ranges and the operators greater-than, less-than, ... equals. This sets the Bib-1 relation attribute accordingly (relation ordered). A query construct is only treated as a range if a dash is used and it is surrounded by white-space. So -1980 is treated as the term "-1980", not <= 1980. If - 1980 is used, however, that is treated as a range.
    r=r    Similar to r=o but assumes that terms are non-negative (not prefixed with -). Thus, a dash will always be treated as a range. The construct 1980-1990 is treated as a range with r=r but as a single term "1980-1990" with r=o. The special attribute r=r is available in YAZ 2.0.24 or later.
    t=l    Allows the term to be left-truncated. If the term is of the form ?x, the resulting Type-1 term is x and truncation is left.
    t=r    Allows the term to be right-truncated. If the term is of the form x?, the resulting Type-1 term is x and truncation is right.
    t=n    If the term does not include ?, the truncation attribute is set to none (100).
    t=b    Allows the term to be both left and right truncated. If the term is of the form ?x?, the resulting term is x and truncation is set to both left and right.
    t=x    Allows masking anywhere in a term, thus fully supporting # (mask one character) and ? (zero or more of any). If masking is used, truncation is set to 102 (regexp-1 in term) and the term is converted accordingly to a regular expression.
    t=z    Allows masking anywhere in a term, thus fully supporting # (mask one character) and ? (zero or more of any). If masking is used, truncation is set to 104 (Z39.58 in term) and the term is converted accordingly to a Z39.58 masking term, which is actually the same truncation as CCL itself.
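The t=l, t=r, t=b and t=n rules can be sketched as follows, assuming all four are enabled for the qualifier and that ? is the truncation character. This standalone function (demo_ccl_truncation, a hypothetical name) is an illustration, not the YAZ CCL parser: it strips the leading and trailing ? characters, returns the Bib-1 truncation value (1 right, 2 left, 3 left and right, 100 none), and rejects a ? inside the term, which would need t=x or t=z.

```c
#include <string.h>

/* Hypothetical sketch, not YAZ code. Strips leading/trailing '?' from term
   into out (size outsize) and returns the Bib-1 truncation value:
   1 right, 2 left, 3 left and right, 100 none. Returns -1 for a '?'
   inside the term (t=x/t=z territory) or if out is too small.
   Assumes t=l, t=r, t=b and t=n are all enabled. */
int demo_ccl_truncation(const char *term, char *out, size_t outsize)
{
    size_t len = strlen(term);
    int left = len > 0 && term[0] == '?';
    int right = len > (size_t) left && term[len - 1] == '?';
    size_t start = (size_t) left;
    size_t stop = right ? len - 1 : len;    /* term body is [start, stop) */

    if (memchr(term + start, '?', stop - start) != NULL)
        return -1;                          /* masking inside the term */
    if (stop - start + 1 > outsize)
        return -1;                          /* no room for body + '\0' */
    memcpy(out, term + start, stop - start);
    out[stop - start] = '\0';
    if (left && right)
        return 3;
    if (left)
        return 2;
    if (right)
        return 1;
    return 100;
}
```

For example, "?x?" becomes the term x with truncation 3, "comp?" becomes comp with truncation 1, and "dylan" is left as-is with truncation 100.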

CCL profile
Consider the following definition:
    ti       u=4 s=1
    au       u=1 s=1
    term     s=105
    ranked   r=102
    date     u=30 r=o
ti and au both set the structure attribute to phrase (s=1). ti sets the use-attribute to 4. au sets the use-attribute to 1. When no qualifiers are used in the query, the structure-attribute is set to free-form-text (105) (rule for term). The date qualifier sets the relation attribute to the relation used in the CCL query and sets the use attribute to 30 (Bib-1 Date).
You can combine attributes. To search for "ranked title" you can do
    ti,ranked=knuth computer
which will set relation=ranked, use=title, structure=phrase.
The query
    date > 1980
is valid, but
    ti > 1980
is invalid, since only date is given ordered relations (r=o).
Qualifier alias
A qualifier alias is of the form:
    q q1 q2 ..
which declares q to be an alias for q1, q2... such that the CCL query q=x is equivalent to q1=x or q2=x or ....
Comments
Lines containing only white space, and lines that begin with the character #, are treated as comments.
Directives
Directive specifications take the form
    @directive value
CCL directives (name, default value and description):
    truncation (default ?): Truncation character.
    mask (default #): Masking character. Requires YAZ 4.2.58 or later.
    field (default merge): Specifies how multiple fields are to be combined. There are two modes: or, where multiple qualifier fields are ORed, and merge, where attributes for the qualifier fields are merged and assigned to one term.
    case (default 1): Specifies whether CCL operators and qualifiers should be compared with case sensitivity or not. Specify 1 for case sensitive; 0 for case insensitive.
    and (default and): Specifies the token for the CCL operator AND.
    or (default or): Specifies the token for the CCL operator OR.
    not (default not): Specifies the token for the CCL operator NOT.
    set (default set): Specifies the token for the CCL operator SET.
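The effect of the operator directives together with the case directive can be sketched as a token matcher. This hypothetical helper (demo_token_matches is not a YAZ function) assumes that, as with the ccl_token_* globals in the CCL API, an operator value may list several aliases separated by spaces (for example "and &&", a made-up alias list); it is an illustration, not YAZ's own comparison code.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not YAZ code: returns 1 if token equals one of the
   space-separated aliases (e.g. "and &&"), else 0. case_sensitive mirrors
   the case directive: 1 = exact comparison, 0 = ignore letter case. */
int demo_token_matches(const char *aliases, const char *token, int case_sensitive)
{
    size_t tlen = strlen(token);

    while (*aliases) {
        size_t alen = strcspn(aliases, " ");   /* length of current alias */

        if (alen > 0 && alen == tlen) {
            size_t i;
            for (i = 0; i < alen; i++) {
                int a = (unsigned char) aliases[i];
                int t = (unsigned char) token[i];
                if (!case_sensitive) {
                    a = tolower(a);
                    t = tolower(t);
                }
                if (a != t)
                    break;
            }
            if (i == alen)
                return 1;                       /* every character matched */
        }
        aliases += alen;
        while (*aliases == ' ')
            aliases++;                          /* skip separating spaces */
    }
    return 0;
}
```

With case-insensitive comparison, demo_token_matches("and &&", "AND", 0) returns 1; with case sensitivity switched on, the same token no longer matches, while "&&" still does.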

CCL API
All public definitions can be found in the header file ccl.h. A profile identifier is of type CCL_bibset. A profile must be created with a call to the function ccl_qual_mk, which returns a profile handle of type CCL_bibset.
To read a file containing qualifier definitions, the function ccl_qual_file may be convenient. This function takes an already opened FILE handle pointer as argument along with a CCL_bibset handle.
To parse a simple string with a FIND query, use the function
    struct ccl_rpn_node *ccl_find_str(CCL_bibset bibset, const char *str, int *error, int *pos);
which takes the CCL profile (bibset) and query (str) as input. Upon successful completion the RPN tree is returned. If an error occurs, such as a syntax error, the integer pointed to by error holds the error code and pos holds the offset inside the query string at which parsing failed. An English representation of the error may be obtained by calling the ccl_err_msg function. The error codes are listed in ccl.h.
To convert the CCL RPN tree (type struct ccl_rpn_node *) to the Z_RPNQuery of YAZ, the function ccl_rpn_query must be used. This function, which is part of YAZ, is implemented in yaz-ccl.c. After calling this function the CCL RPN tree is probably no longer needed. The function ccl_rpn_delete destroys the CCL RPN tree.
A CCL profile may be destroyed by calling the ccl_qual_rm function.
The token names for the CCL operators may be changed by setting the globals (all of type char *) ccl_token_and, ccl_token_or, ccl_token_not and ccl_token_set. An operator may have aliases, i.e. there may be more than one name for the operator. To do this, separate each alias with a space character.
CQL
CQL (Common Query Language) was defined for the SRU protocol. In many ways CQL has a similar syntax to CCL. The objective of CQL is different, however. Where CCL aims to be an end-user language, CQL is the protocol query language for SRU.
If you are new to CQL, read the Gentle Introduction.
The CQL parser in YAZ provides the following:
It parses and validates a CQL query.
It generates a C structure that allows you to convert a CQL query to some other query language, such as SQL.
The parser converts a valid CQL query to PQF, thus providing a way to use CQL for both SRU servers and Z39.50 targets at the same time.
The parser converts CQL to XCQL. XCQL is an XML representation of CQL. XCQL is part of the SRU specification. However, since SRU supports CQL only, we don't expect XCQL to be widely used. Furthermore, CQL has the advantage over XCQL that it is easy to read.
CQL parsing
A CQL parser is represented by the CQL_parser handle. Its contents should be considered YAZ internal (private).
    #include <yaz/cql.h>
    typedef struct cql_parser *CQL_parser;
    CQL_parser cql_parser_create(void);
    void cql_parser_destroy(CQL_parser cp);
A parser is created by cql_parser_create and is destroyed by cql_parser_destroy.
To parse a CQL query string, the following function is provided:
    int cql_parser_string(CQL_parser cp, const char *str);
A CQL query is parsed by cql_parser_string, which takes a query str. If the query was valid (no syntax errors), zero is returned; otherwise -1 is returned to indicate a syntax error.
    int cql_parser_stream(CQL_parser cp, int (*getbyte)(void *client_data), void (*ungetbyte)(int b, void *client_data), void *client_data);
    int cql_parser_stdio(CQL_parser cp, FILE *f);
The functions cql_parser_stream and cql_parser_stdio parse a CQL query, just like cql_parser_string. The only difference is that the CQL query can be fed to the parser in different ways. cql_parser_stream uses a generic byte stream as input. cql_parser_stdio uses a FILE handle which is opened for reading.
CQL tree
If the query string is valid, the CQL parser generates a tree representing the structure of the CQL query.
    struct cql_node *cql_parser_result(CQL_parser cp);
cql_parser_result returns a pointer to the root node of the resulting tree.
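As an illustration of converting a parsed tree to another query language such as SQL, the following sketch walks a simplified tree. The struct demo_node type and the demo_to_sql function are hypothetical stand-ins, not YAZ's struct cql_node or any YAZ API: the sketch models only search-term and boolean nodes, ignores modifiers and URIs, and performs no SQL quoting or escaping, so it must not be used with untrusted input. With the real struct cql_node you would switch on its which member and read the u.st and u.boolean members instead.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a CQL tree node (not YAZ's). */
enum demo_kind { DEMO_ST, DEMO_BOOL };

struct demo_node {
    enum demo_kind which;
    const char *index, *relation, *term;   /* DEMO_ST */
    const char *value;                      /* DEMO_BOOL: "and", "or", "not" */
    const struct demo_node *left, *right;   /* DEMO_BOOL */
};

/* Appends an SQL WHERE-clause rendering of n to the NUL-terminated buf
   (capacity max). Returns 0 on success, -1 if buf is too small or the
   boolean operator is unknown. No quoting or escaping is done. */
int demo_to_sql(const struct demo_node *n, char *buf, size_t max)
{
    size_t used = strlen(buf);

    if (n->which == DEMO_ST) {
        /* "any" is a hypothetical column for terms without an index */
        int r = snprintf(buf + used, max - used, "%s %s '%s'",
                         n->index ? n->index : "any", n->relation, n->term);
        return (r < 0 || (size_t) r >= max - used) ? -1 : 0;
    } else {
        const char *op = strcmp(n->value, "and") == 0 ? " AND "
                       : strcmp(n->value, "or") == 0 ? " OR "
                       : strcmp(n->value, "not") == 0 ? " AND NOT " : NULL;

        if (op == NULL || used + 1 >= max)
            return -1;
        strcat(buf, "(");
        if (demo_to_sql(n->left, buf, max) != 0 ||
            strlen(buf) + strlen(op) >= max)
            return -1;
        strcat(buf, op);
        if (demo_to_sql(n->right, buf, max) != 0 || strlen(buf) + 1 >= max)
            return -1;
        strcat(buf, ")");
        return 0;
    }
}
```

For a tree representing title = computer and author = knuth, demo_to_sql writes (title = 'computer' AND author = 'knuth') into an initially empty buffer.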
Each node in a CQL tree is represented by a struct cql_node. It is defined as follows:
    #define CQL_NODE_ST 1
    #define CQL_NODE_BOOL 2
    #define CQL_NODE_SORT 3
    struct cql_node {
        int which;
        union {
            struct {
                char *index;
                char *index_uri;
                char *term;
                char *relation;
                char *relation_uri;
                struct cql_node *modifiers;
            } st;
            struct {
                char *value;
                struct cql_node *left;
                struct cql_node *right;
                struct cql_node *modifiers;
            } boolean;
            struct {
                char *index;
                struct cql_node *next;
                struct cql_node *modifiers;
                struct cql_node *search;
            } sort;
        } u;
    };
There are three node types: search term (ST), boolean (BOOL) and sortby (SORT). A modifier is treated as a search term too.
The search term node has six members:
index: index for the search term. If an index is unspecified for a search term, index will be NULL.
index_uri: index URI for the search term, or NULL if none could be resolved for the index.
term: the search term itself.
relation: relation for the search term.
relation_uri: relation URI for the search term.
modifiers: relation modifiers for the search term. The modifiers are themselves a list of cql_nodes, each of type ST.
The boolean node represents and, or, not and proximity.
left and right: left and right operand respectively.
modifiers: proximity arguments.
The sort node represents the SORTBY clause.
CQL to PQF conversion
Conversion to PQF (and Z39.50 RPN) is made tricky by the fact that the resulting RPN depends on the Z39.50 target capabilities (combinations of supported attributes). In addition, CQL and SRU operate on index prefixes (URIs or strings), whereas RPN uses Object Identifiers for attribute sets.
The CQL library of YAZ defines a cql_transform_t type. It represents a particular mapping between CQL and RPN.
This handle is created and destroyed by the functions:
    cql_transform_t cql_transform_open_FILE(FILE *f);
    cql_transform_t cql_transform_open_fname(const char *fname);
    void cql_transform_close(cql_transform_t ct);
The first two functions create a transformation handle from either an already open FILE or from a filename respectively.
The handle is destroyed by cql_transform_close, after which no further reference to the handle is allowed.
When a cql_transform_t handle has been created you can convert to RPN.
    int cql_transform_buf(cql_transform_t ct, struct cql_node *cn, char *out, int max);
This function converts the CQL tree cn using handle ct. For the resulting PQF, you supply a buffer out which must be able to hold at least max characters.
If conversion failed, cql_transform_buf returns a non-zero SRU error code; otherwise zero is returned (conversion successful). The meanings of the numeric error codes are listed in the SRU specifications.
If conversion fails, more information can be obtained by calling
    int cql_transform_error(cql_transform_t ct, char **addinfop);
This function returns the most recently returned numeric error code and sets the string pointer at *addinfop to point to a string containing additional information about the error that occurred: for example, if the error code is 15 ("Illegal or unsupported context set"), the additional information is the name of the requested context set that was not recognised.
The SRU error codes may be translated into brief human-readable error messages using
    const char *cql_strerror(int code);
If you wish to be able to produce a PQF result in a different way, there are two alternatives.
    void cql_transform_pr(cql_transform_t ct, struct cql_node *cn, void (*pr)(const char *buf, void *client_data), void *client_data);
    int cql_transform_FILE(cql_transform_t ct, struct cql_node *cn, FILE *f);
The former function produces output to a user-defined output stream. The latter writes the result to an already open FILE.
Specification of CQL to RPN mappings
The file supplied to the functions cql_transform_open_FILE and cql_transform_open_fname follows a structure found in many Unix utilities. It consists of mapping specifications, one per line. Lines starting with # are ignored (comments).
Each line is of the form
    CQL pattern = RPN equivalent
An RPN pattern is a simple attribute list. Each attribute pair takes the form:
    [set] type=value
The attribute set is optional. The type is the attribute type, value the attribute value.
The character * (asterisk) has special meaning when used in the RPN pattern. Each occurrence of * is substituted with the CQL matching name (index, relation, qualifier etc). This facility can be used to copy a CQL name verbatim to the RPN result.
The following CQL patterns are recognized:
index.set.name
This pattern is invoked when a CQL index, such as dc.title, is converted. set and name are the context set and index name respectively. Typically, the RPN specifies an equivalent use attribute.
For terms not bound by an index, the pattern index.cql.serverChoice is used. Here, the prefix cql is defined as http://www.loc.gov/zing/cql/cql-indexes/v1.0/. If this pattern is not defined, the mapping will fail.
The pattern index.set.* is used when no other index pattern is matched.
qualifier.set.name (DEPRECATED)
For backwards compatibility, this is recognised as a synonym of index.set.name.
relation.relation
This pattern specifies how a CQL relation is mapped to RPN. The part after the dot is the name of the relation operator. Since = is used as a separator between the CQL pattern and RPN, CQL relations including = cannot be used directly. To avoid a conflict, the names ge, eq and le must be used for the CQL operators greater-than-or-equal, equal and less-than-or-equal respectively. The RPN pattern is supposed to include a relation attribute.
For terms not bound by a relation, the pattern relation.scr is used. If the pattern is not defined, the mapping will fail.
The special pattern relation.* is used when no other relation pattern is matched.
relationModifier.mod
This pattern specifies how a CQL relation modifier is mapped to RPN. The RPN pattern is usually a relation attribute.
structure.type
This pattern specifies how a CQL structure is mapped to RPN. Note that this CQL pattern is somewhat similar to the CQL pattern relation. The type is a CQL relation.
The pattern structure.* is used when no other structure pattern is matched. Usually, the RPN equivalent specifies a structure attribute.
position.type
This pattern specifies how the anchor (position) of CQL is mapped to RPN. The type is one of first, any, last, firstAndLast.
The pattern position.* is used when no other position pattern is matched.
set.prefix
This specification defines a CQL context set for a given prefix. The value on the right-hand side is the URI for the set, not RPN. All prefixes used in index patterns must be defined this way.
set
This specification defines a default CQL context set for index names. The value on the right-hand side is the URI for the set.
CQL to RPN mapping file
This simple file defines two context sets, three indexes and three relations, a position pattern and a default structure.
With the mappings above, the CQL query
    computer
is converted to the PQF:
    @attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer"
by rules index.cql.serverChoice, relation.scr, structure.*, position.any.
The CQL query
    computer^
is rejected, since position.right is undefined.
The CQL query
    >my = "http://www.loc.gov/zing/cql/dc-indexes/v1.0/" my.title = x
is converted to
    @attr 1=4 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "x"
CQL to RPN string attributes
In this example we allow any index to be passed to RPN as a use attribute.
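The lookup order and the * substitution that such a mapping relies on can be sketched in C. This standalone function (map_cql_name and struct mapping are hypothetical names, not the YAZ cql_transform code) covers only the index.set.name and index.set.* forms: an exact pattern wins, otherwise the wildcard pattern is used with every * replaced by the CQL index name, and the mapping fails if neither exists.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical type: one "CQL pattern = RPN equivalent" line. */
struct mapping { const char *pattern; const char *rpn; };

/* Hypothetical helper, not YAZ code. Looks up category.set.name
   (e.g. "index", "dc", "title"): an exact pattern wins, else
   category.set.* is tried. Each '*' in the chosen RPN text is replaced
   by name. Returns 0 on success, -1 if no pattern matches (the mapping
   fails) or out is too small. */
int map_cql_name(const struct mapping *m, size_t count,
                 const char *category, const char *set, const char *name,
                 char *out, size_t outsize)
{
    char exact[128], wild[128];
    const char *rpn = NULL;
    size_t i, o = 0;

    if (outsize == 0)
        return -1;
    snprintf(exact, sizeof exact, "%s.%s.%s", category, set, name);
    snprintf(wild, sizeof wild, "%s.%s.*", category, set);
    for (i = 0; i < count && rpn == NULL; i++)
        if (strcmp(m[i].pattern, exact) == 0)
            rpn = m[i].rpn;
    for (i = 0; i < count && rpn == NULL; i++)
        if (strcmp(m[i].pattern, wild) == 0)
            rpn = m[i].rpn;
    if (rpn == NULL)
        return -1;
    for (; *rpn; rpn++) {
        const char *piece = (*rpn == '*') ? name : rpn;  /* text to copy */
        size_t len = (*rpn == '*') ? strlen(name) : 1;
        if (o + len + 1 > outsize)
            return -1;
        memcpy(out + o, piece, len);
        o += len;
    }
    out[o] = '\0';
    return 0;
}
```

For instance, with a hypothetical mapping line index.rpn.* = 1=*, the index title in context set rpn is mapped to 1=title, while an exact line such as index.dc.title = 1=4 takes precedence for dc.title.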
The http://bogus/rpn context set is also the default, so we can make queries such as
    title = a
which is converted to
    @attr 2=3 @attr 4=1 @attr 3=3 @attr 1=title "a"
CQL to RPN using Bath Profile
The file etc/pqf.properties has mappings from the Bath Profile and Dublin Core to RPN. If YAZ is installed as a package, it's usually located in /usr/share/yaz/etc and is part of the development package, such as libyaz-dev.
CQL to XCQL conversion
Conversion from CQL to XCQL is trivial and does not require a mapping to be defined. There are three functions to choose from, depending on the way you wish to store the resulting output (XML buffer containing XCQL).
    int cql_to_xml_buf(struct cql_node *cn, char *out, int max);
    void cql_to_xml(struct cql_node *cn, void (*pr)(const char *buf, void *client_data), void *client_data);
    void cql_to_xml_stdio(struct cql_node *cn, FILE *f);
The function cql_to_xml_buf converts to XCQL and stores the result in a user-supplied buffer of a given max size.
cql_to_xml writes the result to a user-defined output stream. cql_to_xml_stdio writes to a file.
PQF to CQL conversion
Conversion from PQF to CQL is offered by the two functions shown below. The former uses a generic stream for the result. The latter puts the result in a WRBUF (string container).
    #include <yaz/rpn2cql.h>
    int cql_transform_rpn2cql_stream(cql_transform_t ct, void (*pr)(const char *buf, void *client_data), void *client_data, Z_RPNQuery *q);
    int cql_transform_rpn2cql_wrbuf(cql_transform_t ct, WRBUF w, Z_RPNQuery *q);
The configuration is the same as used in CQL to PQF conversions.
Object Identifiers
The basic YAZ representation of an OID is an array of integers, terminated with the value -1. This integer is of type Odr_oid.
Fundamental OID operations and the type Odr_oid are defined in yaz/oid_util.h.
An OID can either be declared as an automatic variable or it can be allocated using the memory utilities or ODR/NMEM. It's guaranteed that an OID can fit in OID_SIZE integers.
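The -1-terminated array representation can be illustrated with a standalone sketch. The functions demo_dotstring_to_oid and demo_oid_cmp and the capacity DEMO_OID_SIZE are hypothetical stand-ins for oid_dotstring_to_oid, oid_oidcmp and OID_SIZE; they use plain int rather than Odr_oid so that the code compiles without YAZ, and the capacity of 20 is an arbitrary choice for this sketch.

```c
#include <ctype.h>
#include <stdlib.h>

#define DEMO_OID_SIZE 20   /* hypothetical capacity chosen for this sketch */

/* Hypothetical stand-in, not YAZ code: fills oid with the components of a
   dotted string such as "1.2.840.10003.3.1", followed by the -1 terminator.
   Returns 0 on success, -1 if the string is malformed or too long. */
int demo_dotstring_to_oid(const char *s, int *oid)
{
    int n = 0;

    while (*s) {
        char *end;
        long v;

        if (!isdigit((unsigned char) *s) || n == DEMO_OID_SIZE - 1)
            return -1;                /* need a digit and room for -1 */
        v = strtol(s, &end, 10);
        oid[n++] = (int) v;
        s = end;
        if (*s == '.') {
            s++;
            if (*s == '\0')
                return -1;            /* trailing dot */
        } else if (*s != '\0')
            return -1;                /* unexpected character */
    }
    if (n == 0)
        return -1;                    /* empty string */
    oid[n] = -1;                      /* terminator */
    return 0;
}

/* Returns 0 if both -1-terminated OIDs are identical, non-zero otherwise. */
int demo_oid_cmp(const int *a, const int *b)
{
    while (*a != -1 && *a == *b) {
        a++;
        b++;
    }
    return *a != *b;
}
```

Parsing "1.2.840.10003.3.1" fills the array with 1, 2, 840, 10003, 3, 1 followed by the -1 terminator, the same layout as the hand-built Bib-1 OID in the next example.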
Create OID on stack
We can create an OID for the Bib-1 attribute set with:
    Odr_oid bib1[OID_SIZE];
    bib1[0] = 1;
    bib1[1] = 2;
    bib1[2] = 840;
    bib1[3] = 10003;
    bib1[4] = 3;
    bib1[5] = 1;
    bib1[6] = -1;
An OID may also be filled from a string-based representation using dots (.). This is achieved by the function
    int oid_dotstring_to_oid(const char *name, Odr_oid *oid);
This function returns 0 if name could be converted; -1 otherwise.
Using oid_dotstring_to_oid
We can fill the Bib-1 attribute set OID more easily with:
    Odr_oid bib1[OID_SIZE];
    oid_dotstring_to_oid("1.2.840.10003.3.1", bib1);
We can also allocate an OID dynamically on an ODR stream with:
    Odr_oid *odr_getoidbystr(ODR o, const char *str);
This creates an OID from a string-based representation using dots. This function takes an ODR stream as parameter. This stream is used to allocate memory for the data elements, which is released on a subsequent call to odr_reset() on that stream.
Using odr_getoidbystr
We can create an OID for the Bib-1 attribute set with:
    Odr_oid *bib1 = odr_getoidbystr(odr, "1.2.840.10003.3.1");
The function
    char *oid_oid_to_dotstring(const Odr_oid *oid, char *oidbuf)
does the reverse of oid_dotstring_to_oid. It converts an OID to the string-based representation using dots. The supplied char buffer oidbuf holds the resulting string and must be at least OID_STR_MAX in size.
OIDs can be copied with oid_oidcpy, which takes two OID lists as arguments. Alternatively, an OID copy can be allocated on an ODR stream with:
    Odr_oid *odr_oiddup(ODR odr, const Odr_oid *o);
OIDs can be compared with oid_oidcmp, which returns zero if the two OIDs provided are identical; non-zero otherwise.
OID database
From YAZ version 3 onwards, the oident system has been replaced by an OID database. "OID database" is a misnomer: the old oident system was also a database.
The OID database is really just a map between named Object Identifiers (string) and their OID raw equivalents.
Most operations either convert from string to OID or the other way around. Unfortunately, whenever we supply a string we must also specify the OID class. The class is necessary because some strings correspond to multiple OIDs. An example of such a string is Bib-1, which may be either an attribute set or a diagnostic set.

Applications using the YAZ database should include yaz/oid_db.h.

A YAZ database handle is of type yaz_oid_db_t. Actually, that's a pointer, but you need not deal with that. YAZ has a built-in database which can be considered "constant" for most purposes. We can get hold of it by using the function yaz_oid_std.

All functions with prefix yaz_string_to_oid convert from class + string to OID. There are several variants of this operation, due to different memory allocation strategies.

All functions with prefix yaz_oid_to_string convert from OID to string + class.

Create OID with YAZ DB

We can create an OID for the Bib-1 attribute set on the ODR stream odr with:

    Odr_oid *bib1 =
        yaz_string_to_oid_odr(yaz_oid_std(), CLASS_ATTSET, "Bib-1", odr);

This is more complex than using odr_getoidbystr. You would only use yaz_string_to_oid_odr when the string (here Bib-1) is supplied by a user or a configuration.

Standard OIDs

All the object identifiers in the standard OID database, as returned by yaz_oid_std, can be referenced directly in a program as constant OIDs. Each constant OID is prefixed with yaz_oid_, followed by the OID class (lowercase), then by the OID name (normalized and lowercase).

See for a list of all object identifiers built into YAZ. These are declared in yaz/oid_std.h but are included by yaz/oid_db.h as well.

Use a built-in OID

We can allocate our own OID filled with the constant OID for Bib-1 with:

    Odr_oid *bib1 = odr_oiddup(o, yaz_oid_attset_bib1);

Nibble Memory

Sometimes when you need to allocate and construct a large, interconnected complex of structures, it can be a bit of a pain to release the associated memory again.
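The idea behind nibble memory is to take many small allocations from one pool and release them all together in a single call. The tiny pool below illustrates that pattern. Everything in it is hypothetical: it is not the NMEM implementation, it is not thread-safe, and counting only the requested bytes in demo_nmem_total is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Each allocation gets its own block, chained so the pool can free them all. */
struct demo_block {
    struct demo_block *next;
    max_align_t data[];   /* payload, suitably aligned for any type (C11) */
};

struct demo_nmem {
    struct demo_block *blocks;
    size_t total;         /* bytes requested since create or last reset */
};
typedef struct demo_nmem *DEMO_NMEM;

DEMO_NMEM demo_nmem_create(void)
{
    DEMO_NMEM n = malloc(sizeof *n);
    if (n) {
        n->blocks = NULL;
        n->total = 0;
    }
    return n;
}

void *demo_nmem_malloc(DEMO_NMEM n, size_t size)
{
    struct demo_block *b = malloc(sizeof *b + size);
    if (!b)
        return NULL;
    b->next = n->blocks;  /* push onto the pool's block list */
    n->blocks = b;
    n->total += size;
    return b->data;
}

/* Release every block at once; the handle itself stays usable. */
void demo_nmem_reset(DEMO_NMEM n)
{
    while (n->blocks) {
        struct demo_block *next = n->blocks->next;
        free(n->blocks);
        n->blocks = next;
    }
    n->total = 0;
}

size_t demo_nmem_total(DEMO_NMEM n)
{
    return n->total;
}

void demo_nmem_destroy(DEMO_NMEM n)
{
    if (n) {
        demo_nmem_reset(n);
        free(n);
    }
}
```

The caller never frees individual structures. One reset or destroy call releases the whole interconnected complex, however it was linked together.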
For the structures describing the Z39.50 PDUs and related structures, it is convenient to use the memory-management system of the &odr; subsystem (see The ODR Module). However, in some circumstances where you might otherwise benefit from using a simple nibble-memory management system, it may be impractical to use odr_malloc() and odr_reset(). For this purpose, the memory manager which also supports the &odr; streams is made available in the NMEM module. The external interface to this module is given in the nmem.h file.

The following prototypes are given:

    NMEM nmem_create(void);
    void nmem_destroy(NMEM n);
    void *nmem_malloc(NMEM n, size_t size);
    void nmem_reset(NMEM n);
    size_t nmem_total(NMEM n);
    void nmem_init(void);
    void nmem_exit(void);

The nmem_create() function returns a pointer to a memory control handle, which can be released again by nmem_destroy() when no longer needed. The function nmem_malloc() allocates a block of memory of the requested size. A call to nmem_reset() or nmem_destroy() will release all memory allocated on the handle since it was created (or since the last call to nmem_reset()). The function nmem_total() returns the number of bytes currently allocated on the handle.

The nibble-memory pool is shared amongst threads. POSIX mutexes and Win32 critical sections are used to keep the module thread-safe. Function nmem_init() initializes the nibble-memory library, and it is called automatically the first time YAZ.DLL is loaded. &yaz; uses the function DllMain to achieve this. You should not call nmem_init or nmem_exit unless you're absolutely sure what you're doing. Note that in previous &yaz; versions you had to call nmem_init yourself.

Log

&yaz; has evolved a fairly complex log system which should be useful for debugging &yaz; itself, for debugging applications that use &yaz;, and for production use of those applications.

The log functions are declared in header yaz/log.h and implemented in src/log.c.
Due to a name clash with syslog and some math utilities, the logging interface was modified as of YAZ 2.0.29. The obsolete interface is still available in header file yaz/log.h. The key points of the interface are:

    void yaz_log(int level, const char *fmt, ...);
    void yaz_log_init(int level, const char *prefix, const char *name);
    void yaz_log_init_file(const char *fname);
    void yaz_log_init_level(int level);
    void yaz_log_init_prefix(const char *prefix);
    void yaz_log_time_format(const char *fmt);
    void yaz_log_init_max_size(int mx);

    int yaz_log_mask_str(const char *str);
    int yaz_log_module_level(const char *name);

The reason for the whole log module is the yaz_log function. It takes a bitmask indicating the log levels, a printf-like format string, and a variable number of arguments to log.

The log level is a bitmask that says on which level(s) the log entry should be made, and optionally sets some behaviour of the logging. In the simplest cases, it can be one of YLOG_FATAL, YLOG_DEBUG, YLOG_WARN or YLOG_LOG. Those can be combined with bits that modify the way the log entry is written: YLOG_ERRNO, YLOG_NOTIME, YLOG_FLUSH. Most of the remaining bits are deprecated and should not be used. Use the dynamic log levels instead.

Applications that use &yaz; should not use YLOG_LOG for ordinary messages, but should make use of the dynamic log-level system. This consists of two parts: defining the log level and checking it.

To define the log levels, the (main) program should pass a string to yaz_log_mask_str to define which log levels are to be logged. This string should be a comma-separated list of log-level names, and can contain both hard-coded names and dynamic ones. The log-level calculation starts with YLOG_DEFAULT_LEVEL and adds a bit for each word it meets, unless the word starts with a '-', in which case it clears the bit. If the string 'none' is found, all bits are cleared. Typically this string comes from the command line, often identified by -v.
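The mask-string rules just described can be modelled in a few lines. The rules are: start from a default level, set a bit for each known name, clear it when the name has a leading '-', and clear everything on 'none'. This is a minimal sketch, not yaz_log_mask_str itself. The bit values, the assumed default level, the name table and demo_log_mask_str are all hypothetical. Real YAZ uses its YLOG_* constants and also supports dynamic names, which this sketch simply ignores.

```c
#include <string.h>

/* Hypothetical bit values; YAZ defines its own YLOG_* constants. */
#define DEMO_FATAL   0x0001
#define DEMO_DEBUG   0x0002
#define DEMO_WARN    0x0004
#define DEMO_LOG     0x0008
#define DEMO_SERVER  0x0100   /* stands in for a dynamic level */
#define DEMO_SESSION 0x0200   /* stands in for a dynamic level */
#define DEMO_DEFAULT_LEVEL (DEMO_FATAL | DEMO_WARN | DEMO_LOG)  /* assumed default */

static const struct {
    const char *name;
    int bit;
} demo_levels[] = {
    { "fatal", DEMO_FATAL },   { "debug", DEMO_DEBUG },
    { "warn", DEMO_WARN },     { "log", DEMO_LOG },
    { "server", DEMO_SERVER }, { "session", DEMO_SESSION },
};

/* Bit for a name of the given length; 0 for unknown names. */
static int demo_level_bit(const char *word, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof demo_levels / sizeof demo_levels[0]; i++)
        if (strlen(demo_levels[i].name) == len &&
            memcmp(demo_levels[i].name, word, len) == 0)
            return demo_levels[i].bit;
    return 0;
}

/* Parse a comma-separated list such as "debug,-warn" or "none,server". */
int demo_log_mask_str(const char *str)
{
    int level = DEMO_DEFAULT_LEVEL;

    while (*str) {
        const char *comma = strchr(str, ',');
        size_t len = comma ? (size_t) (comma - str) : strlen(str);

        if (len == 4 && memcmp(str, "none", 4) == 0)
            level = 0;                                   /* clear all bits */
        else if (len > 0 && str[0] == '-')
            level &= ~demo_level_bit(str + 1, len - 1);  /* clear one bit */
        else
            level |= demo_level_bit(str, len);           /* set one bit */
        str += len;
        if (*str == ',')
            str++;
    }
    return level;
}
```

Words are processed left to right, so in "none,server" the 'none' first clears everything and 'server' is then the only bit set.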
The yaz_log_mask_str function returns a log level that should be passed to yaz_log_init_level for it to take effect.

Each module should check which log bits it should use, by calling yaz_log_module_level with a suitable name for the module. The name is stripped of any preceding path and extension, so it is quite possible to use __FILE__ for it. If the name has been passed to yaz_log_mask_str, the routine returns a non-zero bitmask, which should then be used in subsequent calls to yaz_log. (It can also be tested, so as to avoid unnecessary calls to yaz_log in time-critical places, or when the log entry would take time to construct.)

YAZ uses the following dynamic log levels: server, session, request and requestdetail for the server functionality; zoom for the ZOOM client API; ztest for the simple test server; malloc, nmem, odr and eventl for internal debugging of YAZ itself. Of course, any program using YAZ is welcome to define as many new ones as it needs.

By default the log is written to stderr, but this can be changed by a call to yaz_log_init_file or yaz_log_init. If the log is directed to a file, the file size is checked at every write, and if it exceeds the limit given in yaz_log_init_max_size, the log is rotated. The rotation keeps one old version (with a .1 appended to the name). The size defaults to 1GB. Setting it to zero will disable the rotation feature.

A typical YAZ log looks like this:

    13:23:14-23/11 yaz-ztest(1) [session] Starting session from tcp:127.0.0.1 (pid=30968)
    13:23:14-23/11 yaz-ztest(1) [request] Init from 'YAZ' (81) (ver 2.0.28) OK
    13:23:17-23/11 yaz-ztest(1) [request] Search Z: @attrset Bib-1 foo OK:7 hits
    13:23:22-23/11 yaz-ztest(1) [request] Present: [1] 2+2 OK 2 records returned
    13:24:13-23/11 yaz-ztest(1) [request] Close OK

The log entries start with a time stamp. This can be omitted by setting the YLOG_NOTIME bit in the log level. This way, automated tests can produce identical log files that are easy to diff.
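Passing __FILE__ works because yaz_log_module_level normalizes the name by stripping a leading path and an extension. That normalization can be sketched as follows. demo_module_name is a hypothetical helper, not part of YAZ. Treating only the last '.' as the start of the extension and accepting both '/' and '\' as separators are assumptions of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Copy the module name derived from path into out (at most size bytes,
   including the terminating NUL), e.g. "src/zoom-c.c" -> "zoom-c". */
void demo_module_name(const char *path, char *out, size_t size)
{
    const char *base = path;
    const char *p;
    size_t len;

    if (size == 0)
        return;
    for (p = path; *p; p++)             /* drop any leading directories */
        if (*p == '/' || *p == '\\')
            base = p + 1;
    p = strrchr(base, '.');             /* drop the extension, if any */
    len = p ? (size_t) (p - base) : strlen(base);
    if (len >= size)                    /* truncate to fit the buffer */
        len = size - 1;
    memcpy(out, base, len);
    out[len] = '\0';
}
```

With this normalization, a module can call yaz_log_module_level(__FILE__) once and keep the returned mask for its later yaz_log calls.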
The format of the time stamp can be set with yaz_log_time_format, which takes a format string just like strftime.

Next in a log line comes the prefix, often the name of the program. For YAZ-based servers, it can also contain the session number. Then comes one or more log bits in square brackets, depending on the logging level set by yaz_log_init_level and the log level passed to yaz_log. Finally come the format string and additional values passed to yaz_log.

The log level YLOG_LOGLVL, enabled by the string loglevel, will log all the log-level affecting operations. This can come in handy if you need to know what other log levels would be useful. Grep the log file for [loglevel].

The log system is almost independent of the rest of &yaz;; the only important dependency is on nmem, and that only for using the semaphore definition there.

The dynamic log levels and log rotation were introduced in &yaz; 2.0.28. At the same time, the log bit names were changed from LOG_something to YLOG_something, to avoid collision with syslog.h.

MARC

YAZ provides a fast utility for working with MARC records. Early versions of the MARC utility only allowed decoding of ISO2709. Today the utility can both encode and decode a variety of formats.

    /* create handler */
    yaz_marc_t yaz_marc_create(void);
    /* destroy */
    void yaz_marc_destroy(yaz_marc_t mt);

    /* set XML mode YAZ_MARC_LINE, YAZ_MARC_SIMPLEXML, ... */
    void yaz_marc_xml(yaz_marc_t mt, int xmlmode);
    #define YAZ_MARC_LINE      0
    #define YAZ_MARC_SIMPLEXML 1
    #define YAZ_MARC_OAIMARC   2
    #define YAZ_MARC_MARCXML   3
    #define YAZ_MARC_ISO2709   4
    #define YAZ_MARC_XCHANGE   5
    #define YAZ_MARC_CHECK     6
    #define YAZ_MARC_TURBOMARC 7
    #define YAZ_MARC_JSON      8

    /* supply iconv handle for character set conversion .. */
    void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);

    /* set debug level, 0=none, 1=more, 2=even more, .. */
    void yaz_marc_debug(yaz_marc_t mt, int level);

    /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
       On success, result in *result with size *rsize. */
    int yaz_marc_decode_buf(yaz_marc_t mt, const char *buf, int bsize,
                            const char **result, size_t *rsize);

    /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
       On success, result in WRBUF */
    int yaz_marc_decode_wrbuf(yaz_marc_t mt, const char *buf,
                              int bsize, WRBUF wrbuf);

The synopsis is just a basic subset of all functionality. Refer to the actual header file marcdisp.h for details.

A MARC conversion handle must be created by using yaz_marc_create and destroyed by calling yaz_marc_destroy.

All other functions operate on a yaz_marc_t handle. The output is specified by a call to yaz_marc_xml. The xmlmode must be one of:

YAZ_MARC_LINE: A simple line-by-line format suitable for display but not recommended for further (machine) processing.
YAZ_MARC_MARCXML: MARCXML.
YAZ_MARC_ISO2709: ISO2709 (sometimes just referred to as "MARC").
YAZ_MARC_XCHANGE: MarcXchange.
YAZ_MARC_CHECK: Pseudo format for validation only. Does not generate any real output except diagnostics.
YAZ_MARC_TURBOMARC: XML format with the same semantics as MARCXML but more compact and geared towards fast processing with XSLT. Refer to the TurboMARC section for more information.
YAZ_MARC_JSON: MARC-in-JSON format.

The actual conversion functions are yaz_marc_decode_buf and yaz_marc_decode_wrbuf, which decode and encode a MARC record. The former function operates on simple buffers; the latter stores the resulting record in a WRBUF handle (WRBUF is a simple string type).

Display of MARC record

The following program snippet illustrates how the MARC API may be used to convert a MARC record to the line-by-line format:

TurboMARC

TurboMARC is yet another XML encoding of a MARC record. The format was designed for fast processing with XSLT.

Applications like Pazpar2 use XSLT to convert an XML-encoded MARC record to an internal representation. This conversion mostly checks the tag of a MARC field to determine the basic rules in the conversion.
This check is costly when the tag is encoded as an attribute in MARCXML. Having the tag value as the element name instead makes processing many times faster (at least for Libxslt).

TurboMARC is encoded as follows:

Record elements are part of the namespace "http://www.indexdata.com/turbomarc".
A record is enclosed in element r.
A collection of records is enclosed in element collection.
The leader is encoded as element l with the leader content as its (text) value.
A control field is encoded as element c concatenated with the tag value of the control field if the tag value matches the regular expression [a-zA-Z0-9]*. If the tag value does not match the regular expression [a-zA-Z0-9]*, the control field is encoded as element c and attribute code will hold the tag value. This rule ensures that in the rare cases where a tag value might result in non-well-formed XML, YAZ encodes it as a coded attribute (as in MARCXML).
The control field content is the text value of this element. Indicators are encoded as attribute names i1, i2, etc. and corresponding values for each indicator.
A data field is encoded as element d concatenated with the tag value of the data field, or using the attribute code as described in the rules for control fields. The children of the data field element are subfield elements. Each subfield element is encoded as s concatenated with the subfield code. The text of the subfield element is the contents of the subfield. Indicators are encoded as attributes for the data field element, similar to the encoding for control fields.

Retrieval Facility

YAZ version 2.1.20 or later includes a Retrieval facility tool which allows an SRU/Z39.50 server to describe itself and perform record conversions. The idea is the following:

An SRU/Z39.50 client sends a retrieval request which includes a combination of the following parameters: syntax (format), schema (or element set name).
The retrieval facility is invoked with parameters in a server/proxy.
The retrieval facility matches the parameters against a set of "supported" retrieval types. If there is no match, the retrieval signals an error (syntax and/or schema not supported).
For a successful match, the backend is invoked with the same or altered retrieval parameters (syntax, schema). If a record is received from the backend, it is converted to the frontend name/syntax.
The resulting record is sent back to the client and tagged with the frontend syntax/schema.

The Retrieval facility is driven by an XML configuration. The configuration is neither Z39.50 ZeeRex nor SRU ZeeRex, but it should be easy to generate both of them from the XML configuration. (Unfortunately, the two versions of ZeeRex differ substantially in this regard.)

Retrieval XML format

All elements should be covered by the namespace http://indexdata.com/yaz. The root element node must be retrievalinfo.

The retrievalinfo element must include one or more retrieval elements. Each retrieval element defines a specific combination of syntax, name and identifier supported by this retrieval service.

The retrieval element may include any of the following attributes:

syntax (REQUIRED): Defines the record syntax. Possible values are any of the names defined in YAZ's OID database or a raw OID in (n.n ... n).
name (OPTIONAL): Defines the name of the retrieval format. This can be any string. For SRU, the value is equivalent to schema (short-hand); for Z39.50 it's equivalent to a simple element set name. For YAZ 3.0.24 and later, this name may be specified as a glob expression with operators * and ?.
identifier (OPTIONAL): Defines the URI schema name of the retrieval format. This can be any string. For SRU, the value is equivalent to URI schema. For Z39.50, there is no equivalent.

The retrieval element may include one backend element. If a backend element is given, it specifies how the records are retrieved by some backend and how the records are converted from the backend to the "frontend".
The attributes name and syntax may be specified for the backend element. The semantics of these attributes are equivalent to those for the retrieval element. However, these values are passed to the "backend".

The backend element may include one or more conversion instructions (as child elements). The supported conversions are:

marc: The marc element specifies a conversion to and from ISO2709-encoded MARC and &acro.marcxml;/MarcXchange. The following attributes may be specified:
  inputformat (REQUIRED): Format of input. Supported values are marc (for ISO2709), xml (MARCXML/MarcXchange) and json (MARC-in-JSON).
  outputformat (REQUIRED): Format of output. Supported values are line (MARC line format), marcxml (for MARCXML), marc (ISO2709), marcxchange (for MarcXchange), or json (MARC-in-JSON).
  inputcharset (OPTIONAL): Encoding of input. For XML input formats, this need not be given, but for ISO2709-based input formats, this should be set to the encoding used. For MARC21 records, a common inputcharset value would be marc-8.
  outputcharset (OPTIONAL): Encoding of output. If outputformat is XML-based, it is strongly recommended to use utf-8.
xslt: The xslt element specifies a conversion via &acro.xslt;. The following attributes may be specified:
  stylesheet (REQUIRED): Stylesheet file.

Retrieval Facility Examples

MARC21 backend

A typical way to use the retrieval facility is to enable XML for servers that only support ISO2709-encoded MARC21 records.

This means that our frontend supports:
MARC21 F(ull) records.
MARC21 B(rief) records.
MARCXML records.
Dublin Core records.

MARCXML backend

SRW/SRU and Solr backends return records in XML. If they return MARCXML or MarcXchange, the retrieval module can convert those into ISO2709 formats, most commonly USMARC (AKA MARC21). In this example, the backend returns MARCXML for schema="marcxml".

This means that our frontend supports:
MARC21 records (any element set name) in MARC-8 encoding.
MARCXML records for element-set=marcxml.
Dublin Core records for element-set=dc.

API

It should be easy to use the retrieval system from applications. Refer to the headers yaz/retrieval.h and yaz/record_conv.h.

Sorting

This chapter describes sorting and how it is supported in YAZ. Sorting applies to a result set. The Z39.50 sorting facility takes one or more input result sets and one result set as output. The simplest case is that the input set is the same as the output set.

Z39.50 sorting has a separate APDU (service) that is, thus, performed following a search (two phases).

In SRU/Solr, however, the model is different. Here, sorting is specified during the search operation. Note, however, that SRU might perform sort as a separate search, by referring to an existing result set in the query (result-set reference).

Using the Z39.50 sort service

yaz-client and the ZOOM API support the Z39.50 sort facility. In both cases the sort sequence, or sort criteria, is specified using a string notation. This is a one-line notation suitable for being entered manually or generated, and it allows for easy logging.

For the ZOOM API, the sort is specified in the call to the ZOOM_query_sortby function. For yaz-client, the sort is performed and specified using the sort and sort+ commands. For a description of the sort criteria notation, refer to the sort command in the yaz-client manual.

The ZOOM API might choose one of several sort strategies for sorting. Refer to .

Type-7 sort

Type-7 sort is an extension to the Bib-1 based RPN query where the sort specification is embedded as an Attribute-Plus-Term.

The objective of introducing Type-7 sorting is that it allows a client to perform sorting even if it does not implement/support Z39.50 sort. Virtually all Z39.50 client software supports RPN queries. It also may improve performance because the sort criteria are specified along with the search query.
The sort is triggered by the presence of type 7, and the value of type 7 specifies the sortRelation. The value for type 7 is 1 for ascending and 2 for descending. For the sortElement, only the generic part is handled. If the generic sortKey is of type sortField, then attribute type 1 is present and the value is sortField (InternationalString). If the generic sortKey is of type sortAttributes, then the attributes in the list are used. A generic sortKey of type elementSpec is not supported.

The term in the sorting Attribute-Plus-Term combo should hold an integer. The value is 0 for the primary sorting criterion, 1 for the second criterion, etc.
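As a rough illustration of these rules, the following sketch builds a PQF string that combines a search with a Type-7 sort clause. It is only an illustration under stated assumptions:

- the sort Attribute-Plus-Term is OR-ed with the search query;
- the sort field is given as the value of a type-1 attribute;
- terms need no quoting.

demo_type7_query is a hypothetical helper, not part of YAZ.

```c
#include <stdio.h>
#include <string.h>

/* Writes "@or <query> @attr 7=<1|2> @attr 1=<field> <priority>" into buf.
   descending selects 2 (descending) instead of 1 (ascending); priority is
   0 for the primary criterion, 1 for the second, and so on.
   Returns the string length, or -1 on error or truncation. */
int demo_type7_query(char *buf, size_t size, const char *query,
                     const char *sort_field, int descending, int priority)
{
    int n;

    if (strchr(sort_field, ' '))  /* this sketch does not quote terms */
        return -1;
    n = snprintf(buf, size, "@or %s @attr 7=%d @attr 1=%s %d",
                 query, descending ? 2 : 1, sort_field, priority);
    return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

For example, an ascending sort on use attribute 4 as the primary criterion, applied to the query @attr 1=4 water, yields "@or @attr 1=4 water @attr 7=1 @attr 1=4 0".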