X-Git-Url: http://git.indexdata.com/?p=yaz-moved-to-github.git;a=blobdiff_plain;f=doc%2Ftools.xml;h=2d9b62b30e7ccffcc2d486e14660e62e1c72e60a;hp=8b3fe80fe73a0166080fc9f1d9d3f3973d6c2b65;hb=6e32d36c608fe4c1e345e07ba0bf93b1129f58f1;hpb=0f72f09a46621eb0aa9960b990dd35c221333e4d diff --git a/doc/tools.xml b/doc/tools.xml index 8b3fe80..2d9b62b 100644 --- a/doc/tools.xml +++ b/doc/tools.xml @@ -1,675 +1,2692 @@ - -Supporting Tools - - -In support of the service API - primarily the ASN module, which -provides the programmatic interface to the Z39.50 APDUs, YAZ contains -a collection of tools that support the development of applications. - - -Query Syntax Parsers - - -Since the type-1 (RPN) query structure has no direct, useful string -representation, every origin application needs to provide some form of -mapping from a local query notation or representation to a -Z_RPNQuery structure. Some programmers will prefer to -construct the query manually, perhaps using odr_malloc() -to simplify memory management. The &yaz; distribution includes two separate, -query-generating tools that may be of use to you. - - -Prefix Query Format - - -Since RPN or reverse polish notation is really just a fancy way of -describing a suffix notation format (operator follows operands), it -would seem that the confusion is total when we now introduce a prefix -notation for RPN. The reason is one of simple laziness - it's somewhat -simpler to interpret a prefix format, and this utility was designed -for maximum simplicity, to provide a baseline representation for use -in simple test applications and scripting environments (like Tcl). The -demonstration client included with YAZ uses the PQF. - - -The PQF is defined by the pquery module in the YAZ library. 
The -pquery.h file provides the declaration of the functions - - -Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf); - -Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto, - Odr_oid **attributeSetP, const char *qbuf); - -int p_query_attset (const char *arg); - - -The function p_query_rpn() takes as arguments an -&odr; stream (see section The ODR Module) -to provide a memory source (the structure created is released on -the next call to odr_reset() on the stream), a -protocol identifier (one of the constants PROTO_Z3950 and -PROTO_SR), an attribute set -reference, and finally a null-terminated string holding the query -string. - - -If the parse went well, p_query_rpn() returns a -pointer to a Z_RPNQuery structure which can be -placed directly into a Z_SearchRequest. - - - -The p_query_attset specifies which attribute set to use if -the query doesn't specify one by the @attrset operator. -The p_query_attset returns 0 if the argument is a -valid attribute set specifier; otherwise the function returns -1. - - - -The grammar of the PQF is as follows: - - - -Query ::= [ AttSet ] QueryStruct. - -AttSet ::= string. - -QueryStruct ::= { Attribute } Simple | Complex. - -Attribute ::= '@attr' AttributeType '=' AttributeValue. - -AttributeType ::= integer. - -AttributeValue ::= integer. - -Complex ::= Operator QueryStruct QueryStruct. - -Operator ::= '@and' | '@or' | '@not' | '@prox' Proximity. - -Simple ::= ResultSet | Term. - -ResultSet ::= '@set' string. - -Term ::= string | '"' string '"'. - -Proximity ::= Exclusion Distance Ordered Relation WhichCode UnitCode. - -Exclusion ::= '1' | '0' | 'void'. - -Distance ::= integer. - -Ordered ::= '1' | '0'. - -Relation ::= integer. - -WhichCode ::= 'known' | 'private' | integer. - -UnitCode ::= integer. 
- - - -You will note that the syntax above is a fairly faithful -representation of RPN, except for the Attibute, which has been -moved a step away from the term, allowing you to associate one or more -attributes with an entire query structure. The parser will -automatically apply the given attributes to each term as required. - - - -The following are all examples of valid queries in the PQF. - - - -dylan - -"bob dylan" - -@or "dylan" "zimmerman" - -@set Result-1 - -@or @and bob dylan @set Result-1 - -@attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming" - -@attr 4=1 @attr 1=4 "self portrait" - -@prox 0 3 1 2 k 2 dylan zimmerman - - - -Common Command Language - - -Not all users enjoy typing in prefix query structures and numerical -attribute values, even in a minimalistic test client. In the library -world, the more intuitive Common Command Language (or ISO 8777) has -enjoyed some popularity - especially before the widespread -availability of graphical interfaces. It is still useful in -applications where you for some reason or other need to provide a -symbolic language for expressing boolean query structures. - - - -The EUROPAGATE research project working under the Libraries programme -of the European Commission's DG XIII has, amongst other useful tools, -implemented a general-purpose CCL parser which produces an output -structure that can be trivially converted to the internal RPN -representation of YAZ (The Z_RPNQuery structure). -Since the CCL utility - along with the rest of the software -produced by EUROPAGATE - is made freely available on a liberal license, it -is included as a supplement to YAZ. - - -CCL Syntax - - -The CCL parser obeys the following grammar for the FIND argument. -The syntax is annotated by in the lines prefixed by -‐‐. - - - -CCL-Find ::= CCL-Find Op Elements - | Elements. - -Op ::= "and" | "or" | "not" --- The above means that Elements are separated by boolean operators. 
- -Elements ::= '(' CCL-Find ')' - | Set - | Terms - | Qualifiers Relation Terms - | Qualifiers Relation '(' CCL-Find ')' - | Qualifiers '=' string '-' string --- Elements is either a recursive definition, a result set reference, a --- list of terms, qualifiers followed by terms, qualifiers followed --- by a recursive definition or qualifiers in a range (lower - upper). - -Set ::= 'set' = string --- Reference to a result set - -Terms ::= Terms Prox Term - | Term --- Proximity of terms. - -Term ::= Term string - | string --- This basically means that a term may include a blank - -Qualifiers ::= Qualifiers ',' string - | string --- Qualifiers is a list of strings separated by comma - -Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<' --- Relational operators. This really doesn't follow the ISO8777 --- standard. - -Prox ::= '%' | '!' --- Proximity operator - - - - -The following queries are all valid: - - - -dylan - -"bob dylan" - -dylan or zimmerman - -set=1 - -(dylan and bob) or set=1 - - - -Assuming that the qualifiers ti, au -and date are defined we may use: - - - -ti=self portrait - -au=(bob dylan and slow train coming) - -date>1980 and (ti=((self portrait))) - - - - -CCL Qualifiers - - -Qualifiers are used to direct the search to a particular searchable -index, such as title (ti) and author indexes (au). The CCL standard -itself doesn't specify a particular set of qualifiers, but it does -suggest a few short-hand notations. You can customize the CCL parser -to support a particular set of qualifiers to relect the current target -profile. Traditionally, a qualifier would map to a particular -use-attribute within the BIB-1 attribute set. However, you could also -define qualifiers that would set, for example, the -structure-attribute. - - - -Consider a scenario where the target support ranked searches in the -title-index. 
In this case, the user could specify - - -> -ti,ranked=knuth computer - - -and the ranked would map to structure=free-form-text -(4=105) and the ti would map to title (1=4). - - - -A "profile" with a set predefined CCL qualifiers can be read from a -file. The YAZ client reads its CCL qualifiers from a file named -default.bib. Each line in the file has the form: - - - -qualifier-name - type=val type=val ... - - - -where qualifier-name is the name of the -qualifier to be used (eg. ti), -type is a BIB-1 category type and -val is the corresponding BIB-1 attribute value. -The type can be either numeric or it may be -either u (use), r (relation), -p (position), s (structure), -t (truncation) or c (completeness). -The qualifier-name term has a -special meaning. The types and values for this definition is used when -no qualifiers are present. - - - -Consider the following definition: - - - -ti u=4 s=1 -au u=1 s=1 -term s=105 - - -Two qualifiers are defined, ti and au. -They both set the structure-attribute to phrase (1). ti -sets the use-attribute to 4. au sets the use-attribute -to 1. When no qualifiers are used in the query the structure-attribute is -set to free-form-text (105). - - - -CCL API - -All public definitions can be found in the header file -ccl.h. A profile identifier is of type -CCL_bibset. A profile must be created with the call to -the function ccl_qual_mk which returns a profile -handle of type CCL_bibset. - - - -To read a file containing qualifier definitions the function -ccl_qual_file may be convenient. This function takes -an already opened FILE handle pointer as argument -along with a CCL_bibset handle. - - - -To parse a simple string with a FIND query use the function - - - struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str, - int *error, int *pos); - - -which takes the CCL profile (bibset) and query -(str) as input. Upon successful completion the RPN -tree is returned. 
If an error eccur, such as a syntax error, the integer -pointed to by error holds the error code and -pos holds the offset inside query string in which -the parsing failed. - - - -An english representation of the error may be obtained by calling -the ccl_err_msg function. The error codes are listed in -ccl.h. - - - -To convert the CCL RPN tree (type struct ccl_rpn_node *) -to the Z_RPNQuery of YAZ the function ccl_rpn_query -must be used. This function which is part of YAZ is implemented in -yaz-ccl.c. -After calling this function the CCL RPN tree is probably no longer -needed. The ccl_rpn_delete destroys the CCL RPN tree. - - - -A CCL profile may be destroyed by calling the ccl_qual_rm -function. - - - -The token names for the CCL operators may be changed by setting the -globals (all type char *) -ccl_token_and, ccl_token_or, -ccl_token_not and ccl_token_set. -An operator may have aliases, i.e. there may be more than one name for -the operator. To do this, separate each alias with a space character. - - - - -Object Identifiers - - -The basic YAZ representation of an OID is an array of integers, -terminated with the value -1. The &odr; module provides two -utility-functions to create and copy this type of data elements: - - - - Odr_oid *odr_getoidbystr(ODR o, char *str); - - - -Creates an OID based on a string-based representation using dots (.) -to separate elements in the OID. - - - -Odr_oid *odr_oiddup(ODR odr, Odr_oid *o); - - - -Creates a copy of the OID referenced by the o parameter. -Both functions take an &odr; stream as parameter. This stream is used to -allocate memory for the data elements, which is released on a -subsequent call to odr_reset() on that stream. - - - -The OID module provides a higher-level representation of the -family of object identifers which describe the Z39.50 protocol and its -related objects. The definition of the module interface is given in -the oid.h file. - - - -The interface is mainly based on the oident structure. 
The -definition of this structure looks like this: - - - -typedef struct oident -{ - oid_proto proto; - oid_class oclass; - oid_value value; - int oidsuffix[OID_SIZE]; - char *desc; -} oident; - - - -The proto field takes one of the values - - - -PROTO_Z3950 -PROTO_SR - - - -If you don't care about talking to SR-based implementations (few -exist, and they may become fewer still if and when the ISO SR and ANSI -Z39.50 documents are merged into a single standard), you can ignore -this field on incoming packages, and always set it to PROTO_Z3950 -for outgoing packages. - - - -The oclass field takes one of the values - - - -CLASS_APPCTX -CLASS_ABSYN -CLASS_ATTSET -CLASS_TRANSYN -CLASS_DIAGSET -CLASS_RECSYN -CLASS_RESFORM -CLASS_ACCFORM -CLASS_EXTSERV -CLASS_USERINFO -CLASS_ELEMSPEC -CLASS_VARSET -CLASS_SCHEMA -CLASS_TAGSET -CLASS_GENERAL - - - -corresponding to the OID classes defined by the Z39.50 standard. - -Finally, the value field takes one of the values - - - -VAL_APDU -VAL_BER -VAL_BASIC_CTX -VAL_BIB1 -VAL_EXP1 -VAL_EXT1 -VAL_CCL1 -VAL_GILS -VAL_WAIS -VAL_STAS -VAL_DIAG1 -VAL_ISO2709 -VAL_UNIMARC -VAL_INTERMARC -VAL_CCF -VAL_USMARC -VAL_UKMARC -VAL_NORMARC -VAL_LIBRISMARC -VAL_DANMARC -VAL_FINMARC -VAL_MAB -VAL_CANMARC -VAL_SBN -VAL_PICAMARC -VAL_AUSMARC -VAL_IBERMARC -VAL_EXPLAIN -VAL_SUTRS -VAL_OPAC -VAL_SUMMARY -VAL_GRS0 -VAL_GRS1 -VAL_EXTENDED -VAL_RESOURCE1 -VAL_RESOURCE2 -VAL_PROMPT1 -VAL_DES1 -VAL_KRB1 -VAL_PRESSET -VAL_PQUERY -VAL_PCQUERY -VAL_ITEMORDER -VAL_DBUPDATE -VAL_EXPORTSPEC -VAL_EXPORTINV -VAL_NONE -VAL_SETM -VAL_SETG -VAL_VAR1 -VAL_ESPEC1 - - - -again, corresponding to the specific OIDs defined by the standard. - - - -The desc field contains a brief, mnemonic name for the OID in question. - - - -The function - - - - struct oident *oid_getentbyoid(int *o); - - - -takes as argument an OID, and returns a pointer to a static area -containing an oident structure. 
You typically use -this function when you receive a PDU containing an OID, and you wish -to branch out depending on the specific OID value. - - - -The function - - - - int *oid_ent_to_oid(struct oident *ent, int *dst); - - - -Takes as argument an oident structure - in which -the proto, oclass/, and -value fields are assumed to be set correctly - -and returns a pointer to a the buffer as given by dst -containing the base -representation of the corresponding OID. The function returns -NULL and the array dst is unchanged if a mapping couldn't place. -The array dst should be at least of size -OID_SIZE. - - - -The oid_ent_to_oid() function can be used whenever -you need to prepare a PDU containing one or more OIDs. The separation of -the protocol element from the remainer of the -OID-description makes it simple to write applications that can -communicate with either Z39.50 or OSI SR-based applications. - - - -The function - - -< - oid_value oid_getvalbyname(const char *name); - - - -takes as argument a mnemonic OID name, and returns the -/value field of the first entry in the database that -contains the given name in its desc field. - - - -Finally, the module provides the following utility functions, whose -meaning should be obvious: - - - - void oid_oidcpy(int *t, int *s); - void oid_oidcat(int *t, int *s); - int oid_oidcmp(int *o1, int *o2); - int oid_oidlen(int *o); - - - - -The OID module has been criticized - and perhaps rightly so -- for needlessly abstracting the -representation of OIDs. Other toolkits use a simple -string-representation of OIDs with good results. In practice, we have -found the interface comfortable and quick to work with, and it is a -simple matter (for what it's worth) to create applications compatible with -both ISO SR and Z39.50. Finally, the use of the /oident -database is by no means mandatory. 
You can easily create your -own system for representing OIDs, as long as it is compatible with the -low-level integer-array representation of the ODR module. - - - - - -Nibble Memory - - -Sometimes when you need to allocate and construct a large, -interconnected complex of structures, it can be a bit of a pain to -release the associated memory again. For the structures describing the -Z39.50 PDUs and related structures, it is convenient to use the -memory-management system of the &odr; subsystem (see -Using ODR). However, in some circumstances -where you might otherwise benefit from using a simple nibble memory -management system, it may be impractical to use -odr_malloc() and odr_reset(). -For this purpose, the memory manager which also supports the &odr; streams -is made available in the NMEM module. The external interface to this module is given in the nmem.h file. - - - -The following prototypes are given: - - - -NMEM nmem_create(void); -void nmem_destroy(NMEM n); -void *nmem_malloc(NMEM n, int size); -void nmem_reset(NMEM n); -int nmem_total(NMEM n); -void nmem_init(void); - - - -The nmem_create() function returns a pointer to a -memory control handle, which can be released again by -nmem_destroy() when no longer needed. -The function nmem_malloc() allocates a block of -memory of the requested size. A call to nmem_reset() or -nmem_destroy() will release all memory allocated on -the handle since it was created (or since the last call to -nmem_reset(). The function -nmem_total() returns the number of bytes currently -allocated on the handle. - - - - -The nibble memory pool is shared amonst threads. POSIX -mutex'es and WIN32 Critical sections are introduced to keep the -module thread safe. On WIN32 function nmem_init() -initialises the Critical Section handle and should be called once before any -other nmem function is used. 
- - - - - \ No newline at end of file + Supporting Tools + + + In support of the service API - primarily the ASN module, which + provides the programmatic interface to the Z39.50 APDUs, &yaz; contains + a collection of tools that support the development of applications. + + + Query Syntax Parsers + + + Since the type-1 (RPN) query structure has no direct, useful string + representation, every origin application needs to provide some form of + mapping from a local query notation or representation to a + Z_RPNQuery structure. Some programmers will prefer to + construct the query manually, perhaps using + odr_malloc() to simplify memory management. + The &yaz; distribution includes three separate, query-generating tools + that may be of use to you. + + + Prefix Query Format + + + Since RPN or reverse polish notation is really just a fancy way of + describing a suffix notation format (operator follows operands), it + would seem that the confusion is total when we now introduce a prefix + notation for RPN. The reason is one of simple laziness - it's somewhat + simpler to interpret a prefix format, and this utility was designed + for maximum simplicity, to provide a baseline representation for use + in simple test applications and scripting environments (like Tcl). The + demonstration client included with YAZ uses the PQF. + + + + + The PQF has been adopted by other parties developing Z39.50 + software. It is often referred to as Prefix Query Notation + - PQN. + + + + The PQF is defined by the pquery module in the YAZ library. + There are two sets of functions that have similar behavior. The first + set operates on a PQF parser handle; the second set doesn't. The first + set of functions is more flexible than the second set. The second set + is obsolete and is only provided to ensure backwards compatibility.
+ + + The first set of functions all operate on a PQF parser handle: + + + #include <yaz/pquery.h> + + YAZ_PQF_Parser yaz_pqf_create(void); + + void yaz_pqf_destroy(YAZ_PQF_Parser p); + + Z_RPNQuery *yaz_pqf_parse(YAZ_PQF_Parser p, ODR o, const char *qbuf); + + Z_AttributesPlusTerm *yaz_pqf_scan(YAZ_PQF_Parser p, ODR o, + Odr_oid **attributeSetId, const char *qbuf); + + int yaz_pqf_error(YAZ_PQF_Parser p, const char **msg, size_t *off); + + + A PQF parser is created and destroyed by the functions + yaz_pqf_create and + yaz_pqf_destroy respectively. + Function yaz_pqf_parse parses the query given + by the string qbuf. If parsing was successful, + a Z39.50 RPN Query is returned which is created using ODR stream + o. If parsing failed, a NULL pointer is + returned. + Function yaz_pqf_scan takes a scan query in + qbuf. If parsing was successful, the function + returns an attributes-plus-term pointer and modifies + attributeSetId to hold the attribute set for the + scan request - both allocated using ODR stream o. + If parsing failed, yaz_pqf_scan returns a NULL pointer. + Error information for bad queries can be obtained by a call to + yaz_pqf_error which returns an error code, + modifies *msg to point to an error description, + and modifies *off to the offset within the last + query where parsing failed. + + + The second set of functions is declared as follows: + + + #include <yaz/pquery.h> + + Z_RPNQuery *p_query_rpn(ODR o, oid_proto proto, const char *qbuf); + + Z_AttributesPlusTerm *p_query_scan(ODR o, oid_proto proto, + Odr_oid **attributeSetP, const char *qbuf); + + int p_query_attset(const char *arg); + + + The function p_query_rpn() takes as arguments an + &odr; stream (see section The ODR Module) + to provide a memory source (the structure created is released on + the next call to odr_reset() on the stream), a + protocol identifier (one of the constants PROTO_Z3950 and + PROTO_SR), and + finally a null-terminated string holding the query string.
+ + + If the parse went well, p_query_rpn() returns a + pointer to a Z_RPNQuery structure which can be + placed directly into a Z_SearchRequest. + If parsing failed, due to syntax error, a NULL pointer is returned. + + + The p_query_attset specifies which attribute set + to use if the query doesn't specify one by the + @attrset operator. + The p_query_attset returns 0 if the argument is a + valid attribute set specifier; otherwise the function returns -1. + + + + The grammar of the PQF is as follows: + + + + query ::= top-set query-struct. + + top-set ::= [ '@attrset' string ] + + query-struct ::= attr-spec | simple | complex | '@term' term-type query + + attr-spec ::= '@attr' [ string ] string query-struct + + complex ::= operator query-struct query-struct. + + operator ::= '@and' | '@or' | '@not' | '@prox' proximity. + + simple ::= result-set | term. + + result-set ::= '@set' string. + + term ::= string. + + proximity ::= exclusion distance ordered relation which-code unit-code. + + exclusion ::= '1' | '0' | 'void'. + + distance ::= integer. + + ordered ::= '1' | '0'. + + relation ::= integer. + + which-code ::= 'known' | 'private' | integer. + + unit-code ::= integer. + + term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'. + + + + You will note that the syntax above is a fairly faithful + representation of RPN, except for the Attribute, which has been + moved a step away from the term, allowing you to associate one or more + attributes with an entire query structure. The parser will + automatically apply the given attributes to each term as required. + + + + The @attr operator is followed by an attribute specification + (attr-spec above). The specification consists + of an optional attribute set, an attribute type-value pair and + a sub-query. The attribute type-value pair is packed in one string: + an attribute type, an equals sign, and an attribute value, like this: + @attr 1=1003. 
+ The type is always an integer but the value may be either an + integer or a string (if it doesn't start with a digit character). + A string attribute-value is encoded as a Type-1 ``complex'' + attribute with the list of values containing the single string + specified, and including no semantic indicators. + + + + Version 3 of the Z39.50 specification defines various encodings of terms. + Use @term type + string, + where type is one of: general, + numeric or string + (for InternationalString). + If no term type has been given, the general form + is used. This is the only encoding allowed in both versions 2 and 3 + of the Z39.50 standard. + + + + Using Proximity Operators with PQF + + + This is an advanced topic, describing how to construct + queries that make very specific requirements on the + relative location of their operands. + You may wish to skip this section and go straight to + the example PQF queries. + + + + + Most Z39.50 servers do not support proximity searching, or + support only a small subset of the full functionality that + can be expressed using the PQF proximity operator. Be + aware that the ability to express a + query in PQF is no guarantee that any given server will + be able to execute it. + + + + + + The proximity operator @prox is a special + and more restrictive version of the conjunction operator + @and. Its semantics are described in + section 3.7.2 (Proximity) of the Z39.50 standard itself, which + can be read on-line. + + + + In PQF, the proximity operation is represented by a sequence + of the form + +@prox exclusion distance ordered relation which-code unit-code + + in which the meanings of the parameters are as described in + the standard, and they can take the following values: + + exclusion + 0 = false (i.e. the proximity condition specified by the + remaining parameters must be satisfied) or + 1 = true (the proximity condition specified by the + remaining parameters must not be + satisfied).
+ + distance + An integer specifying the difference between the locations + of the operands: e.g. two adjacent words would have + distance=1 since their locations differ by one unit. + + ordered + 1 = ordered (the operands must occur in the order the + query specifies them) or + 0 = unordered (they may appear in either order). + + relation + Recognised values are + 1 (lessThan), + 2 (lessThanOrEqual), + 3 (equal), + 4 (greaterThanOrEqual), + 5 (greaterThan) and + 6 (notEqual). + + which-code + known + or + k + (the unit-code parameter is taken from the well-known list + of alternatives described below) or + private + or + p + (the unit-code parameter has semantics specific to an + out-of-band agreement such as a profile). + + unit-code + If the which-code parameter is known + then the recognised values are + 1 (character), + 2 (word), + 3 (sentence), + 4 (paragraph), + 5 (section), + 6 (chapter), + 7 (document), + 8 (element), + 9 (subelement), + 10 (elementType) and + 11 (byte). + If which-code is private then the + acceptable values are determined by the profile. + + + (The numeric values of the relation and well-known unit-code + parameters are taken straight from + the ASN.1 of the proximity structure in the standard.) + + + + PQF queries + + + PQF queries using simple terms + + + dylan + + "bob dylan" + + + + + PQF boolean operators + + + @or "dylan" "zimmerman" + + @and @or dylan zimmerman when + + @and when @or dylan zimmerman + + + + + PQF references to result sets + + + @set Result-1 + + @and @set seta @set setb + + + + + Attributes for terms + + + @attr 1=4 computer + + @attr 1=4 @attr 4=1 "self portrait" + + @attrset exp1 @attr 1=1 CategoryList + + @attr gils 1=2008 Copenhagen + + @attr 1=/book/title computer + + + + + PQF Proximity queries + + + @prox 0 3 1 2 k 2 dylan zimmerman + + + Here the parameters 0, 3, 1, 2, k and 2 represent exclusion, + distance, ordered, relation, which-code and unit-code, in that + order.
So: + + + exclusion = 0: the proximity condition must hold + + + distance = 3: the terms must be three units apart + + + ordered = 1: they must occur in the order they are specified + + + relation = 2: lessThanOrEqual (to the distance of 3 units) + + + which-code is ``known'', so the standard unit-codes are used + + + unit-code = 2: word. + + + So the whole proximity query means that the words + dylan and zimmerman must + both occur in the record, in that order, differing in position + by three or fewer words (i.e. with two or fewer words between + them). The query would find ``Bob Dylan, aka. Robert + Zimmerman'', but not ``Bob Dylan, born as Robert Zimmerman'' + since the distance in this case is four. + + + + + PQF specification of search term type + + + @term string "a UTF-8 string, maybe?" + + + + + PQF mixed queries + + + @or @and bob dylan @set Result-1 + + @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming" + + @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109 + + + + The last of these examples is a spatial search: in + the GILS attribute set, + access point + 2038 indicates West Bounding Coordinate and + 2039 indicates East Bounding Coordinate, + so the query is for areas extending from -114 degrees + to no more than -109 degrees. + + + + + + + CCL + + + Not all users enjoy typing in prefix query structures and numerical + attribute values, even in a minimalistic test client. In the library + world, the more intuitive Common Command Language - CCL (ISO 8777) + has enjoyed some popularity - especially before the widespread + availability of graphical interfaces. It is still useful in + applications where you for some reason or other need to provide a + symbolic language for expressing boolean query structures. + + + + CCL Syntax + + + The CCL parser obeys the following grammar for the FIND argument. + The syntax is annotated in the lines prefixed by + --. + + + + CCL-Find ::= CCL-Find Op Elements + | Elements.
+ + Op ::= "and" | "or" | "not" + -- The above means that Elements are separated by boolean operators. + + Elements ::= '(' CCL-Find ')' + | Set + | Terms + | Qualifiers Relation Terms + | Qualifiers Relation '(' CCL-Find ')' + | Qualifiers '=' string '-' string + -- Elements is either a recursive definition, a result set reference, a + -- list of terms, qualifiers followed by terms, qualifiers followed + -- by a recursive definition or qualifiers in a range (lower - upper). + + Set ::= 'set' = string + -- Reference to a result set + + Terms ::= Terms Prox Term + | Term + -- Proximity of terms. + + Term ::= Term string + | string + -- This basically means that a term may include a blank + + Qualifiers ::= Qualifiers ',' string + | string + -- Qualifiers is a list of strings separated by comma + + Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<' + -- Relational operators. This really doesn't follow the ISO8777 + -- standard. + + Prox ::= '%' | '!' + -- Proximity operator + + + + + CCL queries + + The following queries are all valid: + + + + dylan + + "bob dylan" + + dylan or zimmerman + + set=1 + + (dylan and bob) or set=1 + + righttrunc? + + "notrunc?" + + singlechar#mask + + + + Assuming that the qualifiers ti, + au + and date are defined we may use: + + + + ti=self portrait + + au=(bob dylan and slow train coming) + + date>1980 and (ti=((self portrait))) + + + + + + + CCL Qualifiers + + + Qualifiers are used to direct the search to a particular searchable + index, such as title (ti) and author indexes (au). The CCL standard + itself doesn't specify a particular set of qualifiers, but it does + suggest a few short-hand notations. You can customize the CCL parser + to support a particular set of qualifiers to reflect the current target + profile. Traditionally, a qualifier would map to a particular + use-attribute within the BIB-1 attribute set. It is also + possible to set other attributes, such as the structure + attribute. 
+ + + + A CCL profile is a set of predefined CCL qualifiers that may be + read from a file or set in the CCL API. + The YAZ client reads its CCL qualifiers from a file named + default.bib. There are four types of + lines in a CCL profile: qualifier specification, + qualifier alias, comments and directives. + + + Qualifier specification + + A qualifier specification is of the form: + + + + qualifier-name + [attributeset,]type=val + [attributeset,]type=val ... + + + + where qualifier-name is the name of the + qualifier to be used (e.g. ti), + type is the attribute type in the attribute + set (Bib-1 is used if no attribute set is given) and + val is the attribute value. + The type can be specified as an + integer or as a single letter: + u for use, + r for relation, p for position, + s for structure, t for truncation + or c for completeness. + The attributes for the special qualifier name term + are used when no CCL qualifier is given in a query. + + Common Bib-1 attributes + + + + + + Type + Description + + + + + u=value + + Use attribute (1). Common use attributes are + 1 Personal-name, 4 Title, 7 ISBN, 8 ISSN, 30 Date, + 62 Subject, 1003 Author, 1016 Any. Specify value + as an integer. + + + + + r=value + + Relation attribute (2). Common values are + 1 <, 2 <=, 3 =, 4 >=, 5 >, 6 <>, + 100 phonetic, 101 stem, 102 relevance, 103 always matches. + + + + + p=value + + Position attribute (3). Values: 1 first in field, 2 + first in any subfield, 3 any position in field. + + + + + s=value + + Structure attribute (4). Values: 1 phrase, 2 word, + 3 key, 4 year, 5 date, 6 word list, 100 date (un), + 101 name (norm), 102 name (un), 103 structure, 104 urx, + 105 free-form-text, 106 document-text, 107 local-number, + 108 string, 109 numeric string. + + + + + t=value + + Truncation attribute (5). Values: 1 right, 2 left, + 3 left & right, 100 none, 101 process #, 102 regular-1, + 103 regular-2, 104 CCL. + + + + + c=value + + Completeness attribute (6).
Values: 1 incomplete subfield, + 2 complete subfield, 3 complete field. + + + + + +
+
+ + Refer to the complete + list of Bib-1 attributes. + + + It is also possible to specify non-numeric attribute values, + which are used in combination with certain types. + The special combinations are: + + + Special attribute combos + + + + + + Name + Description + + + + + s=pw + The structure is set to either word or phrase depending + on the number of tokens in a term (phrase-word). + + + + s=al + Each token in the term is ANDed (and-list). + This does not set the structure at all. + + + + s=ol + Each token in the term is ORed (or-list). + This does not set the structure at all. + + + + s=ag + Tokens that appear as phrases (with blanks in them) get + structure phrase attached (4=1). Tokens that appear to be words + get structure word attached (4=2). Phrases and words are + ANDed. This is a variant of s=al and s=pw, with the main + difference that words are not split (with operator AND) + but instead kept in one RPN token. This facility appeared + in YAZ 4.2.38. + + + + r=o + Allows ranges and the operators greater-than, less-than, ... + equals. + This sets the Bib-1 relation attribute accordingly (relation + ordered). A query construct is only treated as a range if + a dash is used and it is surrounded by white-space. So + -1980 is treated as the term + "-1980", not <= 1980. + If - 1980 is used, however, that is + treated as a range. + + + + r=r + Similar to r=o but assumes that terms + are non-negative (not prefixed with -). + Thus, a dash will always be treated as a range. + The construct 1980-1990 is + treated as a range with r=r but as a + single term "1980-1990" with + r=o. The special attribute + r=r is available in YAZ 2.0.24 or later. + + + + t=l + Allows the term to be left-truncated. + If the term is of the form ?x, the resulting + Type-1 term is x and truncation is left. + + + + t=r + Allows the term to be right-truncated. + If the term is of the form x?, the resulting + Type-1 term is x and truncation is right.
+ + + + t=n + If the term does not include ?, the + truncation attribute is set to none (100). + + + + t=b + Allows the term to be both left & right truncated. + If the term is of the form ?x?, the + resulting term is x and truncation is + set to both left & right. + + + + t=x + Allows masking anywhere in a term, thus fully supporting + # (mask one character) and ? (zero or more of any). + If masking is used, truncation is set to 102 (regexp-1 in term) + and the term is converted accordingly to a regular expression. + + + + t=z + Allows masking anywhere in a term, thus fully supporting + # (mask one character) and ? (zero or more of any). + If masking is used, truncation is set to 104 (Z39.58 in term) + and the term is converted accordingly to a Z39.58 masking term - + actually the same truncation as CCL itself. + + + + + +
+
+ CCL profile + + Consider the following definition: + + + + ti u=4 s=1 + au u=1 s=1 + term s=105 + ranked r=102 + date u=30 r=o + + + ti and au both set + structure attribute to phrase (s=1). + ti + sets the use-attribute to 4. au sets the + use-attribute to 1. + When no qualifiers are used in the query the structure-attribute is + set to free-form-text (105) (rule for term). + The date sets the relation attribute to + the relation used in the CCL query and sets the use attribute + to 30 (Bib-1 Date). + + + You can combine attributes. To Search for "ranked title" you + can do + + ti,ranked=knuth computer + + which will set relation=ranked, use=title, structure=phrase. + + + Query + + date > 1980 + + is a valid query. But + + ti > 1980 + + is invalid. + + +
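To make the relationship between a qualifier specification and the resulting attributes more concrete, here is a minimal sketch in C. It only handles letter=integer pairs such as u=4 s=1 and prints them as a PQF-style @attr prefix. The names ccl_sketch_type_number and ccl_sketch_spec_to_pqf are hypothetical and are not part of YAZ; the real CCL parser also handles attribute sets, aliases and the special values (s=pw, r=o, ...), none of which are covered here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a YAZ function): map a single-letter CCL
   attribute type to its Bib-1 attribute type number, 0 if unknown. */
static int ccl_sketch_type_number(char letter)
{
    static const char letters[] = "urpstc"; /* use=1 ... completeness=6 */
    const char *p = letter ? strchr(letters, letter) : NULL;
    return p ? (int) (p - letters) + 1 : 0;
}

/* Hypothetical helper: turn "u=4 s=1" into "@attr 1=4 @attr 4=1 ".
   Returns 0 on success, -1 on a malformed spec or a too small buffer
   (in which case out may hold a partial result). */
static int ccl_sketch_spec_to_pqf(const char *spec, char *out, size_t max)
{
    size_t used = 0;

    assert(spec && out);
    if (max == 0)
        return -1;
    out[0] = '\0';
    for (;;) {
        char letter;
        int value, consumed, type, n;

        while (*spec == ' ' || *spec == '\t')
            spec++;                      /* skip blanks between pairs */
        if (*spec == '\0')
            return 0;                    /* whole spec consumed */
        if (sscanf(spec, "%c=%d%n", &letter, &value, &consumed) != 2)
            return -1;
        type = ccl_sketch_type_number(letter);
        if (type == 0)
            return -1;
        n = snprintf(out + used, max - used, "@attr %d=%d ", type, value);
        if (n < 0 || (size_t) n >= max - used)
            return -1;
        used += (size_t) n;
        spec += consumed;
    }
}
```

Under these assumptions, the ti line of the profile above (u=4 s=1) yields the prefix for use attribute 4 and structure attribute 1, which would then be followed by the search term.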
+ + Qualifier alias + + A qualifier alias is of the form: + + + q + q1 q2 .. + + + which declares q to + be an alias for q1, + q2... such that the CCL + query q=x is equivalent to + q1=x or q2=x or .... + + + + + Comments + + Lines with white space or lines that begin with + character # are treated as comments. + + + + + Directives + + Directive specifications takes the form + + @directive value + + + CCL directives + + + + + + + Name + Description + Default + + + + + truncation + Truncation character + ? + + + mask + Masking character. Requires YAZ 4.2.58 or later + # + + + field + Specifies how multiple fields are to be + combined. There are two modes: or: + multiple qualifier fields are ORed, + merge: attributes for the qualifier + fields are merged and assigned to one term. + + merge + + + case + Specifies if CCL operators and qualifiers should be + compared with case sensitivity or not. Specify 1 for + case sensitive; 0 for case insensitive. + 1 + + + + and + Specifies token for CCL operator AND. + and + + + + or + Specifies token for CCL operator OR. + or + + + + not + Specifies token for CCL operator NOT. + not + + + + set + Specifies token for CCL operator SET. + set + + + +
+
+
+ + CCL API + + All public definitions can be found in the header file + ccl.h. A profile identifier is of type + CCL_bibset. A profile must be created with a call + to the function ccl_qual_mk which returns a profile + handle of type CCL_bibset. + + + + To read a file containing qualifier definitions the function + ccl_qual_file may be convenient. This function + takes an already opened FILE handle pointer as + argument along with a CCL_bibset handle. + + + + To parse a simple string with a FIND query use the function + + +struct ccl_rpn_node *ccl_find_str(CCL_bibset bibset, const char *str, + int *error, int *pos); + + + which takes the CCL profile (bibset) and query + (str) as input. Upon successful completion the RPN + tree is returned. If an error occurs, such as a syntax error, the integer + pointed to by error holds the error code and + pos holds the offset inside the query string at which + parsing failed. + + + + An English representation of the error may be obtained by calling + the ccl_err_msg function. The error codes are + listed in ccl.h. + + + + To convert the CCL RPN tree (type + struct ccl_rpn_node *) + to the Z_RPNQuery of YAZ the function ccl_rpn_query + must be used. This function, which is part of YAZ, is implemented in + yaz-ccl.c. + After calling this function the CCL RPN tree is probably no longer + needed. The function ccl_rpn_delete destroys the CCL RPN tree. + + + + A CCL profile may be destroyed by calling the + ccl_qual_rm function. + + + + The token names for the CCL operators may be changed by setting the + globals (all type char *) + ccl_token_and, ccl_token_or, + ccl_token_not and ccl_token_set. + An operator may have aliases, i.e. there may be more than one name for + the operator. To do this, separate each alias with a space character. + + +
+ CQL + + CQL + - Common Query Language - was defined for the + SRU protocol. + In many ways CQL has a similar syntax to CCL. + The objective of CQL is different. Where CCL aims to be + an end-user language, CQL is the protocol + query language for SRU. + + + + If you are new to CQL, read the + Gentle Introduction. + + + + The CQL parser in &yaz; provides the following: + + + + It parses and validates a CQL query. + + + + + It generates a C structure that allows you to convert + a CQL query to some other query language, such as SQL. + + + + + The parser converts a valid CQL query to PQF, thus providing a + way to use CQL for both SRU servers and Z39.50 targets at the + same time. + + + + + The parser converts CQL to XCQL. + XCQL is an XML representation of CQL. + XCQL is part of the SRU specification. However, since SRU + supports CQL only, we don't expect XCQL to be widely used. + Furthermore, CQL has the advantage over XCQL that it is + easy to read. + + + + + CQL parsing + + A CQL parser is represented by the CQL_parser + handle. Its contents should be considered &yaz; internal (private). + +#include <yaz/cql.h> + +typedef struct cql_parser *CQL_parser; + +CQL_parser cql_parser_create(void); +void cql_parser_destroy(CQL_parser cp); + + A parser is created by cql_parser_create and + is destroyed by cql_parser_destroy. + + + To parse a CQL query string, the following function + is provided: + +int cql_parser_string(CQL_parser cp, const char *str); + + A CQL query is parsed by the cql_parser_string + which takes a query str. + If the query was valid (no syntax errors), then zero is returned; + otherwise -1 is returned to indicate a syntax error. + + + +int cql_parser_stream(CQL_parser cp, + int (*getbyte)(void *client_data), + void (*ungetbyte)(int b, void *client_data), + void *client_data); + +int cql_parser_stdio(CQL_parser cp, FILE *f); + + The functions cql_parser_stream and + cql_parser_stdio parses a CQL query + - just like cql_parser_string. 
+ The only difference is that the CQL query can be + fed to the parser in different ways. + The cql_parser_stream uses a generic + byte stream as input. The cql_parser_stdio + uses a FILE handle which is opened for reading. + + + + CQL tree + + If the query string is valid, the CQL parser + generates a tree representing the structure of the + CQL query. + + + +struct cql_node *cql_parser_result(CQL_parser cp); + + cql_parser_result returns + a pointer to the root node of the resulting tree. + + + Each node in a CQL tree is represented by a + struct cql_node. + It is defined as follows: + +#define CQL_NODE_ST 1 +#define CQL_NODE_BOOL 2 +#define CQL_NODE_SORT 3 +struct cql_node { + int which; + union { + struct { + char *index; + char *index_uri; + char *term; + char *relation; + char *relation_uri; + struct cql_node *modifiers; + } st; + struct { + char *value; + struct cql_node *left; + struct cql_node *right; + struct cql_node *modifiers; + } boolean; + struct { + char *index; + struct cql_node *next; + struct cql_node *modifiers; + struct cql_node *search; + } sort; + } u; +}; + + There are three node types: search term (ST), boolean (BOOL) + and sortby (SORT). + A modifier is treated as a search term too. + + + The search term node has six members: + + + + index: index for search term. + If an index is unspecified for a search term, + index will be NULL. + + + + + index_uri: index URI for search term + or NULL if none could be resolved for the index. + + + + + term: the search term itself. + + + + + relation: relation for search term. + + + + + relation_uri: relation URI for search term. + + + + + modifiers: relation modifiers for search + term. The modifiers are themselves a list of cql_nodes, + each of type ST. + + + + + + + The boolean node represents and, + or, not + and proximity. + + + + left and right: the left + and right operands respectively. + + + + + modifiers: proximity arguments. + + + + + + + The sort node represents the SORTBY clause.
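As an illustration of how such a tree can be traversed, the following sketch counts the search-term nodes reachable from a root node. The struct is copied from the definition above so that the example compiles on its own; a real program would include yaz/cql.h and obtain the tree from cql_parser_result instead of building it by hand. The helper names cql_sketch_count_terms and cql_sketch_demo are hypothetical, and the sketch assumes that the query of a sort node hangs off its search member.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the definition above; normally provided by yaz/cql.h. */
#define CQL_NODE_ST 1
#define CQL_NODE_BOOL 2
#define CQL_NODE_SORT 3
struct cql_node {
    int which;
    union {
        struct {
            char *index;
            char *index_uri;
            char *term;
            char *relation;
            char *relation_uri;
            struct cql_node *modifiers;
        } st;
        struct {
            char *value;
            struct cql_node *left;
            struct cql_node *right;
            struct cql_node *modifiers;
        } boolean;
        struct {
            char *index;
            struct cql_node *next;
            struct cql_node *modifiers;
            struct cql_node *search;
        } sort;
    } u;
};

/* Hypothetical helper: count search-term nodes, ignoring modifier lists. */
static int cql_sketch_count_terms(const struct cql_node *n)
{
    if (!n)
        return 0;
    switch (n->which) {
    case CQL_NODE_ST:
        return 1;
    case CQL_NODE_BOOL:
        return cql_sketch_count_terms(n->u.boolean.left)
            + cql_sketch_count_terms(n->u.boolean.right);
    case CQL_NODE_SORT:
        return cql_sketch_count_terms(n->u.sort.search);
    default:
        return 0;
    }
}

/* Builds the tree for a query of the shape "a and b" by hand and
   returns the number of search terms found in it. */
static int cql_sketch_demo(void)
{
    struct cql_node left = {0}, right = {0}, root = {0};

    assert(cql_sketch_count_terms(NULL) == 0);
    left.which = CQL_NODE_ST;
    right.which = CQL_NODE_ST;
    root.which = CQL_NODE_BOOL;
    root.u.boolean.left = &left;
    root.u.boolean.right = &right;
    return cql_sketch_count_terms(&root);
}
```

The same recursive shape (switch on which, then descend into the matching union member) is what a converter to another query language would typically use.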
+ + + + CQL to PQF conversion + + Conversion to PQF (and Z39.50 RPN) is complicated by the fact + that the resulting RPN depends on the Z39.50 target + capabilities (combinations of supported attributes). + In addition, CQL and SRU operate on index prefixes + (URIs or strings), whereas RPN uses Object Identifiers + for attribute sets. + + + The CQL library of &yaz; defines a cql_transform_t + type. It represents a particular mapping between CQL and RPN. + This handle is created and destroyed by the functions: + +cql_transform_t cql_transform_open_FILE (FILE *f); +cql_transform_t cql_transform_open_fname(const char *fname); +void cql_transform_close(cql_transform_t ct); + + The first two functions create a transformation handle from + either an already open FILE or from a filename respectively. + + + The handle is destroyed by cql_transform_close, + after which no further reference to the handle is allowed. + + + When a cql_transform_t handle has been created + you can convert to RPN. + +int cql_transform_buf(cql_transform_t ct, + struct cql_node *cn, char *out, int max); + + This function converts the CQL tree cn + using handle ct. + For the resulting PQF, you supply a buffer out + which must be able to hold at least max + characters. + + + If conversion failed, cql_transform_buf + returns a non-zero SRU error code; otherwise zero is returned + (conversion successful). The meanings of the numeric error + codes are listed in the SRU specification (no + direct link is available anymore).
+ + + If conversion fails, more information can be obtained by calling + +int cql_transform_error(cql_transform_t ct, char **addinfop); + + This function returns the most recently returned numeric + error-code and sets the string-pointer at + *addinfop to point to a string containing + additional information about the error that occurred: for + example, if the error code is 15 (``Illegal or unsupported context + set''), the additional information is the name of the requested + context set that was not recognised. + + + The SRU error-codes may be translated into brief human-readable + error messages using + +const char *cql_strerror(int code); + + + + If you wish to be able to produce a PQF result in a different + way, there are two alternatives. + +void cql_transform_pr(cql_transform_t ct, + struct cql_node *cn, + void (*pr)(const char *buf, void *client_data), + void *client_data); + +int cql_transform_FILE(cql_transform_t ct, + struct cql_node *cn, FILE *f); + + The former function produces output to a user-defined + output stream. The latter writes the result to an already + open FILE. + + + + Specification of CQL to RPN mappings + + The file supplied to functions + cql_transform_open_FILE, + cql_transform_open_fname follows + a structure found in many Unix utilities. + It consists of mapping specifications - one per line. + Lines starting with # are ignored (comments). + + + Each line is of the form + + CQL pattern = RPN equivalent + + + + An RPN pattern is a simple attribute list. Each attribute pair + takes the form: + + [set] type=value + + The attribute set is optional. + The type is the attribute type, + value the attribute value. + + + The character * (asterisk) has special meaning + when used in the RPN pattern. + Each occurrence of * is substituted with the + CQL matching name (index, relation, qualifier etc). + This facility can be used to copy a CQL name verbatim to the RPN result. 
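The asterisk substitution can be sketched in a few lines of C. The function below copies an RPN pattern and replaces each * with the CQL name; with the pattern 1=* and the name title it produces 1=title. The name rpn_sketch_expand is hypothetical and this is not how YAZ implements the feature internally; it only illustrates the rule described above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a YAZ function): copy pattern to out,
   replacing every '*' with name. Returns 0 on success and -1 if
   the result (including the terminating NUL) does not fit in max bytes. */
static int rpn_sketch_expand(const char *pattern, const char *name,
                             char *out, size_t max)
{
    size_t used = 0;
    size_t name_len;

    assert(pattern && name && out);
    name_len = strlen(name);
    for (; *pattern; pattern++) {
        if (*pattern == '*') {
            if (used + name_len >= max)
                return -1;
            memcpy(out + used, name, name_len); /* insert CQL name */
            used += name_len;
        } else {
            if (used + 1 >= max)
                return -1;
            out[used++] = *pattern;             /* copy verbatim */
        }
    }
    if (used >= max)
        return -1;
    out[used] = '\0';
    return 0;
}
```

A pattern without an asterisk is copied unchanged, so the same routine would serve both fixed and wildcard mappings.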
+ + + The following CQL patterns are recognized: + + + index.set.name + + + + This pattern is invoked when a CQL index, such as + dc.title is converted. set + and name are the context set and index + name respectively. + Typically, the RPN specifies an equivalent use attribute. + + + For terms not bound by an index the pattern + index.cql.serverChoice is used. + Here, the prefix cql is defined as + http://www.loc.gov/zing/cql/cql-indexes/v1.0/. + If this pattern is not defined, the mapping will fail. + + + The pattern, + index.set.* + is used when no other index pattern is matched. + + + + + qualifier.set.name + (DEPRECATED) + + + + For backwards compatibility, this is recognised as a synonym of + index.set.name + + + + + relation.relation + + + + This pattern specifies how a CQL relation is mapped to RPN. + pattern is name of relation + operator. Since = is used as + separator between CQL pattern and RPN, CQL relations + including = cannot be + used directly. To avoid a conflict, the names + ge, + eq, + le, + must be used for CQL operators, greater-than-or-equal, + equal, less-than-or-equal respectively. + The RPN pattern is supposed to include a relation attribute. + + + For terms not bound by a relation, the pattern + relation.scr is used. If the pattern + is not defined, the mapping will fail. + + + The special pattern, relation.* is used + when no other relation pattern is matched. + + + + + + relationModifier.mod + + + + This pattern specifies how a CQL relation modifier is mapped to RPN. + The RPN pattern is usually a relation attribute. + + + + + + structure.type + + + + This pattern specifies how a CQL structure is mapped to RPN. + Note that this CQL pattern is somewhat to similar to + CQL pattern relation. + The type is a CQL relation. + + + The pattern, structure.* is used + when no other structure pattern is matched. + Usually, the RPN equivalent specifies a structure attribute. 
+ + + + + + position.type + + + + This pattern specifies how the anchor (position) of + CQL is mapped to RPN. + The type is one + of first, any, + last, firstAndLast. + + + The pattern, position.* is used + when no other position pattern is matched. + + + + + + set.prefix + + + + This specification defines a CQL context set for a given prefix. + The value on the right hand side is the URI for the set - + not RPN. All prefixes used in + index patterns must be defined this way. + + + + + + set + + + + This specification defines a default CQL context set for index names. + The value on the right hand side is the URI for the set. + + + + + + + + CQL to RPN mapping file + + This simple file defines two context sets, three indexes and three + relations, a position pattern and a default structure. + + + + + With the mappings above, the CQL query + + computer + + is converted to the PQF: + + @attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer" + + by rules index.cql.serverChoice, + relation.scr, structure.*, + position.any. + + + CQL query + + computer^ + + is rejected, since position.right is + undefined. + + + CQL query + + >my = "http://www.loc.gov/zing/cql/dc-indexes/v1.0/" my.title = x + + is converted to + + @attr 1=4 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "x" + + + + + CQL to RPN string attributes + + In this example we allow any index to be passed to RPN as + a use attribute. + + + + + The http://bogus/rpn context set is also the default + so we can make queries such as + + title = a + + which is converted to + + @attr 2=3 @attr 4=1 @attr 3=3 @attr 1=title "a" + + + + + CQL to RPN using Bath Profile + + The file etc/pqf.properties has mappings from + the Bath Profile and Dublin Core to RPN. + If YAZ is installed as a package it's usually located + in /usr/share/yaz/etc and part of the + development package, such as libyaz-dev. + + + + CQL to XCQL conversion + + Conversion from CQL to XCQL is trivial and does not + require a mapping to be defined. 
+ There are three functions to choose from depending on the + way you wish to store the resulting output (XML buffer + containing XCQL). + +int cql_to_xml_buf(struct cql_node *cn, char *out, int max); +void cql_to_xml(struct cql_node *cn, + void (*pr)(const char *buf, void *client_data), + void *client_data); +void cql_to_xml_stdio(struct cql_node *cn, FILE *f); + + Function cql_to_xml_buf converts + to XCQL and stores the result in a user-supplied buffer of a given + max size. + + + cql_to_xml writes the result to + a user-defined output stream. + cql_to_xml_stdio writes to + a file. + + + + PQF to CQL conversion + + Conversion from PQF to CQL is offered by the two functions shown + below. The former uses a generic stream for the result. The latter + puts the result in a WRBUF (string container). + +#include <yaz/rpn2cql.h> + +int cql_transform_rpn2cql_stream(cql_transform_t ct, + void (*pr)(const char *buf, void *client_data), + void *client_data, + Z_RPNQuery *q); + +int cql_transform_rpn2cql_wrbuf(cql_transform_t ct, + WRBUF w, + Z_RPNQuery *q); + + The configuration is the same as used in CQL to PQF conversions. + + + +
+ Object Identifiers + + + The basic YAZ representation of an OID is an array of integers, + terminated with the value -1. Each integer is of type + Odr_oid. + + + Fundamental OID operations and the type Odr_oid + are defined in yaz/oid_util.h. + + + An OID can either be declared as an automatic variable or it can be + allocated using the memory utilities or ODR/NMEM. It's + guaranteed that an OID can fit in OID_SIZE integers. + + Create OID on stack + + We can create an OID for the Bib-1 attribute set with: + + Odr_oid bib1[OID_SIZE]; + bib1[0] = 1; + bib1[1] = 2; + bib1[2] = 840; + bib1[3] = 10003; + bib1[4] = 3; + bib1[5] = 1; + bib1[6] = -1; + + + + + An OID may also be filled from a string-based representation using + dots (.). This is achieved by the function + + int oid_dotstring_to_oid(const char *name, Odr_oid *oid); + + This function returns 0 if name could be converted; -1 otherwise. + + Using oid_dotstring_to_oid + + We can fill the Bib-1 attribute set OID more easily with: + + Odr_oid bib1[OID_SIZE]; + oid_dotstring_to_oid("1.2.840.10003.3.1", bib1); + + + + + We can also allocate an OID dynamically on an ODR stream with: + + Odr_oid *odr_getoidbystr(ODR o, const char *str); + + This creates an OID from a string-based representation using dots. + This function takes an &odr; stream as parameter. This stream is used to + allocate memory for the data elements, which is released on a + subsequent call to odr_reset() on that stream. + + + Using odr_getoidbystr + + We can create an OID for the Bib-1 attribute set with: + + Odr_oid *bib1 = odr_getoidbystr(odr, "1.2.840.10003.3.1"); + + + + + + The function + + char *oid_oid_to_dotstring(const Odr_oid *oid, char *oidbuf) + + does the reverse of oid_dotstring_to_oid. It + converts an OID to the string-based representation using dots. + The supplied char buffer oidbuf holds the resulting + string and must be at least OID_STR_MAX in size.
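To show what the -1 terminated array representation looks like in practice, here is a small stand-alone sketch that parses a dotted string into such an array. It deliberately does not use the YAZ headers: the plain int element type, the SKETCH_OID_SIZE capacity and the function name oid_sketch_from_dotstring are all assumptions made for illustration, and in a real program the library function shown above should be used instead.

```c
#include <assert.h>
#include <ctype.h>

#define SKETCH_OID_SIZE 20 /* assumed capacity; YAZ defines its own OID_SIZE */

/* Hypothetical helper (not a YAZ function): parse "1.2.840" into
   oid[] = {1, 2, 840, -1}. Returns 0 on success, -1 on malformed input
   or when the OID would not fit. No overflow checking is done on the
   individual numbers. */
static int oid_sketch_from_dotstring(const char *name, int *oid)
{
    int n = 0;

    assert(name && oid);
    while (isdigit((unsigned char) *name)) {
        int value = 0;

        if (n >= SKETCH_OID_SIZE - 1)
            return -1;                 /* keep room for the -1 terminator */
        while (isdigit((unsigned char) *name))
            value = value * 10 + (*name++ - '0');
        oid[n++] = value;
        if (*name == '.') {
            name++;
            if (!isdigit((unsigned char) *name))
                return -1;             /* trailing dot or ".." */
        }
    }
    if (*name != '\0' || n == 0)
        return -1;                     /* junk after the OID, or empty */
    oid[n] = -1;
    return 0;
}
```

For the Bib-1 attribute set string 1.2.840.10003.3.1 this fills six components followed by the terminator, matching the hand-written stack example above.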
+ + + + OIDs can be copied with oid_oidcpy which takes + two OID lists as arguments. Alternatively, an OID copy can be allocated + on an ODR stream with: + + Odr_oid *odr_oiddup(ODR odr, const Odr_oid *o); + + + + + OIDs can be compared with oid_oidcmp which returns + zero if the two OIDs provided are identical; non-zero otherwise. + + + OID database + + From YAZ version 3 and later, the oident system has been replaced + by an OID database. OID database is a misnomer; the old oident + system was also a database. + + + The OID database is really just a map between named Object Identifiers + (string) and their OID raw equivalents. Most operations either + convert from string to OID or the other way around. + + + Unfortunately, whenever we supply a string we must also specify the + OID class. The class is necessary because some + strings correspond to multiple OIDs. An example of such a string is + Bib-1 which may either be an attribute-set + or a diagnostic-set. + + + Applications using the YAZ database should include + yaz/oid_db.h. + + + A YAZ database handle is of type yaz_oid_db_t. + Actually that's a pointer, but you need not deal with that. + YAZ has a built-in database which can be considered "constant" for + most purposes. + We can get hold of that by using the function yaz_oid_std. + + + All functions with prefix yaz_string_to_oid + convert from class and string to OID. We have variants of this + operation due to different memory allocation strategies. + + + All functions with prefix + yaz_oid_to_string convert from OID to string + and class. + + + Create OID with YAZ DB + + We can create an OID for the Bib-1 attribute set on the ODR stream + odr with: + + Odr_oid *bib1 = + yaz_string_to_oid_odr(yaz_oid_std(), CLASS_ATTSET, "Bib-1", odr); + + This is more complex than using odr_getoidbystr. + You would only use yaz_string_to_oid_odr when the + string (here Bib-1) is supplied by a user or configuration.
+ + + + + Standard OIDs + + + All the object identifiers in the standard OID database as returned + by yaz_oid_std can be referenced directly in a + program as a constant OID. + Each constant OID is prefixed with yaz_oid_ - + followed by OID class (lowercase) - then by OID name (normalized and + lowercase). + + + See the list of all object identifiers + built into YAZ. + These are declared in yaz/oid_std.h but are + included by yaz/oid_db.h as well. + + + Use a built-in OID + + We can allocate our own OID filled with the constant OID for + Bib-1 with: + + Odr_oid *bib1 = odr_oiddup(o, yaz_oid_attset_bib1); + + + + + + Nibble Memory + + + Sometimes when you need to allocate and construct a large, + interconnected complex of structures, it can be a bit of a pain to + release the associated memory again. For the structures describing the + Z39.50 PDUs and related structures, it is convenient to use the + memory-management system of the &odr; subsystem. + However, in some circumstances + where you might otherwise benefit from using a simple nibble memory + management system, it may be impractical to use + odr_malloc() and odr_reset(). + For this purpose, the memory manager which also supports the &odr; + streams is made available in the NMEM module. The external interface + to this module is given in the nmem.h file. + + + + The following prototypes are given: + + + + NMEM nmem_create(void); + void nmem_destroy(NMEM n); + void *nmem_malloc(NMEM n, size_t size); + void nmem_reset(NMEM n); + size_t nmem_total(NMEM n); + void nmem_init(void); + void nmem_exit(void); + + + + The nmem_create() function returns a pointer to a + memory control handle, which can be released again by + nmem_destroy() when no longer needed. + The function nmem_malloc() allocates a block of + memory of the requested size. A call to nmem_reset() + or nmem_destroy() will release all memory allocated + on the handle since it was created (or since the last call to + nmem_reset()).
 The function + nmem_total() returns the number of bytes currently + allocated on the handle. + + + + The nibble memory pool is shared amongst threads. POSIX + mutexes and WIN32 critical sections are used to keep the + module thread safe. Function nmem_init() + initializes the nibble memory library and it is called automatically + the first time the YAZ.DLL is loaded. &yaz; uses + function DllMain to achieve this. You should + not call nmem_init or + nmem_exit unless you're absolutely sure what + you're doing. Note that in previous &yaz; versions you'd have to call + nmem_init yourself. + + + + + Log + + &yaz; has evolved a fairly complex log system which should be useful + for debugging &yaz; itself, for debugging applications that use &yaz;, and for + production use of those applications. + + + The log functions are declared in header yaz/log.h + and implemented in src/log.c. + Due to a name clash with syslog and some math utilities the logging + interface has been modified as of YAZ 2.0.29. The obsolete interface + is still available in header file yaz/log.h. + The key points of the interface are: + + + void yaz_log(int level, const char *fmt, ...); + + void yaz_log_init(int level, const char *prefix, const char *name); + void yaz_log_init_file(const char *fname); + void yaz_log_init_level(int level); + void yaz_log_init_prefix(const char *prefix); + void yaz_log_time_format(const char *fmt); + void yaz_log_init_max_size(int mx); + + int yaz_log_mask_str(const char *str); + int yaz_log_module_level(const char *name); + + + + The reason for the whole log module is the yaz_log + function. It takes a bitmask indicating the log levels, a + printf-like format string, and a variable number of + arguments to log. + + + + The log level is a bit mask that says on which level(s) + the log entry should be made, and optionally sets some behaviour of the + logging. In the simplest cases, it can be one of YLOG_FATAL, + YLOG_DEBUG, YLOG_WARN, YLOG_LOG.
 Those can be combined with bits + that modify the way the log entry is written: YLOG_ERRNO, + YLOG_NOTIME, YLOG_FLUSH. + Most of the rest of the bits are deprecated, and should not be used. Use + the dynamic log levels instead. + + + + Applications that use &yaz; should not use YLOG_LOG for ordinary + messages, but should make use of the dynamic loglevel system. This consists + of two parts: defining the loglevel and checking it. + + + + To define the log levels, the (main) program should pass a string to + yaz_log_mask_str to define which log levels are to be + logged. This string should be a comma-separated list of log level names, + and can contain both hard-coded names and dynamic ones. The log level + calculation starts with YLOG_DEFAULT_LEVEL and adds a bit + for each word it meets, unless the word starts with a '-', in which case it + clears the bit. If the string 'none' is found, + all bits are cleared. Typically this string comes from the command-line, + often identified by -v. The + yaz_log_mask_str returns a log level that should be + passed to yaz_log_init_level for it to take effect. + + + + Each module should check which log bits it should use, by calling + yaz_log_module_level with a suitable name for the + module. Any preceding path and any extension are stripped from the name, + so it is quite possible to use __FILE__ for it. If the + name has been passed to yaz_log_mask_str, the routine + returns a non-zero bitmask, which should then be used in subsequent calls + to yaz_log. (It can also be tested, so as to avoid unnecessary calls to + yaz_log, in time-critical places, or when the log entry would take time + to construct.) + + + + Yaz uses the following dynamic log levels: + server, session, request, requestdetail for the server + functionality. + zoom for the zoom client api. + ztest for the simple test server. + malloc, nmem, odr, eventl for internal debugging of yaz itself.
+ Of course, any program using yaz is welcome to define as many new ones as + it needs. + + + + By default the log is written to stderr, but this can be changed by a call + to yaz_log_init_file or + yaz_log_init. If the log is directed to a file, the + file size is checked at every write, and if it exceeds the limit given in + yaz_log_init_max_size, the log is rotated. The + rotation keeps one old version (with a .1 appended to + the name). The size defaults to 1GB. Setting it to zero will disable the + rotation feature. + + + + A typical yaz-log looks like this + 13:23:14-23/11 yaz-ztest(1) [session] Starting session from tcp:127.0.0.1 (pid=30968) + 13:23:14-23/11 yaz-ztest(1) [request] Init from 'YAZ' (81) (ver 2.0.28) OK + 13:23:17-23/11 yaz-ztest(1) [request] Search Z: @attrset Bib-1 foo OK:7 hits + 13:23:22-23/11 yaz-ztest(1) [request] Present: [1] 2+2 OK 2 records returned + 13:24:13-23/11 yaz-ztest(1) [request] Close OK + + + + The log entries start with a time stamp. This can be omitted by setting the + YLOG_NOTIME bit in the loglevel. This way automatic tests + can be expected to produce identical log files that are easy to diff. The + format of the time stamp can be set with + yaz_log_time_format, which takes a format string just + like strftime. + + + + Next in a log line comes the prefix, often the name of the program. For + yaz-based servers, it can also contain the session number. Then + comes one or more logbits in square brackets, depending on the logging + level set by yaz_log_init_level and the loglevel + passed to yaz_log. Finally comes the format + string and additional values passed to yaz_log. + + + + The log level YLOG_LOGLVL, enabled by the string + loglevel, will log all the log-level affecting + operations. This can come in handy if you need to know what other log + levels would be useful. Grep the logfile for [loglevel].
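The mask-string rules described earlier for yaz_log_mask_str (start from a default level, add a bit per word, clear it if the word starts with '-', clear everything on none) can be sketched as follows. This is not the YAZ implementation: the level names, bit values and default in the table are made up for the example, unknown words are simply ignored here (whereas YAZ treats them as new dynamic levels), and the function name sketch_mask_str is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Made-up level table for illustration; real values live in yaz/log.h. */
struct sketch_level { const char *name; int bit; };
static const struct sketch_level sketch_levels[] = {
    {"fatal", 1}, {"debug", 2}, {"warn", 4}, {"log", 8}, {"zoom", 16},
};
#define SKETCH_DEFAULT_LEVEL (1 | 4 | 8) /* assumed default: fatal,warn,log */

/* Return the bit for a word of the given length, or 0 if unknown. */
static int sketch_lookup(const char *word, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof sketch_levels / sizeof sketch_levels[0]; i++)
        if (strlen(sketch_levels[i].name) == len
            && strncmp(sketch_levels[i].name, word, len) == 0)
            return sketch_levels[i].bit;
    return 0;
}

/* Hypothetical helper: compute a level from a comma-separated list. */
static int sketch_mask_str(const char *str)
{
    int level = SKETCH_DEFAULT_LEVEL;

    assert(str);
    while (*str) {
        size_t len = strcspn(str, ",");  /* length of the current word */
        const char *word = str;
        size_t wlen = len;
        int clear = 0;

        if (wlen > 0 && *word == '-') {  /* "-name" clears the bit */
            clear = 1;
            word++;
            wlen--;
        }
        if (!clear && wlen == 4 && strncmp(word, "none", 4) == 0)
            level = 0;                   /* "none" clears all bits */
        else if (clear)
            level &= ~sketch_lookup(word, wlen);
        else
            level |= sketch_lookup(word, wlen);
        str += len;
        if (*str == ',')
            str++;
    }
    return level;
}
```

With this table, the string debug,-warn keeps the default fatal and log bits, adds debug and removes warn, while none,zoom leaves only the zoom bit set.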
+ + + + The log system is almost independent of the rest of &yaz;; the only + important dependence is on nmem, and that only for + using the semaphore definition there. + + + + The dynamic log levels and log rotation were introduced in &yaz; 2.0.28. At + the same time, the log bit names were changed from + LOG_something to YLOG_something, + to avoid collision with syslog.h. + + + + + MARC + + + YAZ provides a fast utility for working with MARC records. + Early versions of the MARC utility only allowed decoding of ISO2709. + Today the utility can both encode and decode a variety of formats. + + + + /* create handler */ + yaz_marc_t yaz_marc_create(void); + /* destroy */ + void yaz_marc_destroy(yaz_marc_t mt); + + /* set XML mode YAZ_MARC_LINE, YAZ_MARC_SIMPLEXML, ... */ + void yaz_marc_xml(yaz_marc_t mt, int xmlmode); + #define YAZ_MARC_LINE 0 + #define YAZ_MARC_SIMPLEXML 1 + #define YAZ_MARC_OAIMARC 2 + #define YAZ_MARC_MARCXML 3 + #define YAZ_MARC_ISO2709 4 + #define YAZ_MARC_XCHANGE 5 + #define YAZ_MARC_CHECK 6 + #define YAZ_MARC_TURBOMARC 7 + #define YAZ_MARC_JSON 8 + + /* supply iconv handle for character set conversion .. */ + void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd); + + /* set debug level, 0=none, 1=more, 2=even more, .. */ + void yaz_marc_debug(yaz_marc_t mt, int level); + + /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure. + On success, result in *result with size *rsize. */ + int yaz_marc_decode_buf(yaz_marc_t mt, const char *buf, int bsize, + const char **result, size_t *rsize); + + /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure. + On success, result in WRBUF */ + int yaz_marc_decode_wrbuf(yaz_marc_t mt, const char *buf, + int bsize, WRBUF wrbuf); + + + + + The synopsis is just a basic subset of all functionality. Refer + to the actual header file marcdisp.h for + details.
+
+ A MARC conversion handle must be created by using yaz_marc_create and destroyed by calling yaz_marc_destroy.
+
+ All other functions operate on a yaz_marc_t handle. The output is specified by a call to yaz_marc_xml. The xmlmode must be one of:
+
+ YAZ_MARC_LINE
+   A simple line-by-line format suitable for display but not recommended for further (machine) processing.
+
+ YAZ_MARC_MARCXML
+   MARCXML.
+
+ YAZ_MARC_ISO2709
+   ISO2709 (sometimes just referred to as "MARC").
+
+ YAZ_MARC_XCHANGE
+   MarcXchange.
+
+ YAZ_MARC_CHECK
+   Pseudo format for validation only. Does not generate any real output except diagnostics.
+
+ YAZ_MARC_TURBOMARC
+   XML format with the same semantics as MARCXML but more compact and geared towards fast processing with XSLT. Refer to the TurboMARC section for more information.
+
+ YAZ_MARC_JSON
+   MARC-in-JSON format.
+
+ The actual conversion functions are yaz_marc_decode_buf and yaz_marc_decode_wrbuf, which decode and encode a MARC record. The former operates on simple buffers; the latter stores the resulting record in a WRBUF handle (WRBUF is a simple string type).
+
+ Display of MARC record
+
+ The following program snippet illustrates how the MARC API may be used to convert a MARC record to the line-by-line format:
+
+ TurboMARC
+
+ TurboMARC is yet another XML encoding of a MARC record. The format was designed for fast processing with XSLT.
+
+ Applications like Pazpar2 use XSLT to convert an XML-encoded MARC record to an internal representation. This conversion mostly checks the tag of a MARC field to determine the basic rules in the conversion. This check is costly when the tag is encoded as an attribute, as in MARCXML. Having the tag value as the element name instead makes processing many times faster (at least for Libxslt).
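The speed-up rests on the tag-to-element-name rule that the encoding rules of this section spell out: a tag made only of ASCII letters and digits becomes part of the element name, and any other tag is carried in a code attribute. A minimal sketch of that decision follows. It is not the YAZ implementation; turbomarc_element_name and is_ascii_alnum are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if c is an ASCII letter or digit, 0 otherwise. */
static int is_ascii_alnum(char c)
{
    return (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z');
}

/* Hypothetical helper (not part of YAZ). Writes the element name for a
   field into name. prefix is 'c' for a control field, 'd' for a data
   field and 's' for a subfield; tag is the tag value or subfield code.
   Returns 0 if the tag matches [a-zA-Z0-9]* and was appended to the
   prefix, or 1 if the name is the bare prefix and the tag has to be
   written as a code attribute instead. */
int turbomarc_element_name(char *name, size_t size,
                           char prefix, const char *tag)
{
    const char *p;

    for (p = tag; *p; p++) {
        if (!is_ascii_alnum(*p)) {
            snprintf(name, size, "%c", prefix);
            return 1;
        }
    }
    snprintf(name, size, "%c%s", prefix, tag);
    return 0;
}
```

With this rule, control field 001 maps to element c001 and data field 245 to d245, so a stylesheet can match on the element name directly. A tag such as "2 5", which contains a blank, yields the bare element d and a return value of 1, signalling that the caller should emit the code attribute.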
+
+ TurboMARC is encoded as follows:
+
+ Record elements are part of the namespace "http://www.indexdata.com/turbomarc".
+
+ A record is enclosed in element r.
+
+ A collection of records is enclosed in element collection.
+
+ The leader is encoded as element l with the leader content as its (text) value.
+
+ A control field is encoded as element c concatenated with the tag value of the control field if the tag value matches the regular expression [a-zA-Z0-9]*. If the tag value does not match the regular expression [a-zA-Z0-9]*, the control field is encoded as element c and the attribute code holds the tag value. This rule ensures that, in the rare cases where a tag value would result in non-well-formed XML, YAZ encodes it as a coded attribute (as in MARCXML).
+
+ The control field content is the text value of this element. Indicators are encoded as attributes named i1, i2, etc., with the corresponding value for each indicator.
+
+ A data field is encoded as element d concatenated with the tag value of the data field, or using the attribute code as described in the rules for control fields. The children of the data field element are subfield elements. Each subfield element is encoded as s concatenated with the subfield code. The text of the subfield element is the contents of the subfield. Indicators are encoded as attributes of the data field element, similar to the encoding for control fields.
+
+ Retrieval Facility
+
+ YAZ version 2.1.20 or later includes a Retrieval facility tool which allows an SRU/Z39.50 server to describe itself and perform record conversions. The idea is the following:
+
+ An SRU/Z39.50 client sends a retrieval request which includes a combination of the following parameters: syntax (format), schema (or element set name).
+
+ The retrieval facility is invoked with parameters in a server/proxy.
+ The retrieval facility matches the parameters against a set of "supported" retrieval types. If there is no match, the retrieval facility signals an error (syntax and/or schema not supported).
+
+ For a successful match, the backend is invoked with the same or altered retrieval parameters (syntax, schema). If a record is received from the backend, it is converted to the frontend name/syntax.
+
+ The resulting record is sent back to the client and tagged with the frontend syntax/schema.
+
+ The Retrieval facility is driven by an XML configuration. The configuration is neither Z39.50 ZeeRex nor SRU ZeeRex, but it should be easy to generate both of them from the XML configuration. (Unfortunately, the two versions of ZeeRex differ substantially in this regard.)
+
+ Retrieval XML format
+
+ All elements should be covered by the namespace http://indexdata.com/yaz . The root element node must be retrievalinfo.
+
+ The retrievalinfo must include one or more retrieval elements. Each retrieval defines a specific combination of syntax, name and identifier supported by this retrieval service.
+
+ The retrieval element may include any of the following attributes:
+
+ syntax (REQUIRED)
+   Defines the record syntax. Possible values are any of the names defined in YAZ' OID database or a raw OID (n.n ... n).
+
+ name (OPTIONAL)
+   Defines the name of the retrieval format. This can be any string. For SRU, the value is equivalent to schema (short-hand); for Z39.50 it is equivalent to simple element set name. For YAZ 3.0.24 and later this name may be specified as a glob expression with operators * and ?.
+
+ identifier (OPTIONAL)
+   Defines the URI schema name of the retrieval format. This can be any string. For SRU, the value is equivalent to URI schema. For Z39.50, there is no equivalent.
+
+ The retrieval element may include one backend element.
+ If a backend element is given, it specifies how the records are retrieved by some backend and how the records are converted from the backend to the "frontend".
+
+ The attributes name and syntax may be specified for the backend element. The semantics of these attributes are equivalent to those for the retrieval element. However, these values are passed to the "backend".
+
+ The backend element may include one or more conversion instructions (as child elements). The supported conversions are:
+
+ marc
+   The marc element specifies a conversion to and from ISO2709 encoded MARC and &acro.marcxml;/MarcXchange. The following attributes may be specified:
+
+   inputformat (REQUIRED)
+     Format of input. Supported values are marc (for ISO2709), xml (MARCXML/MarcXchange) and json (MARC-in-JSON).
+
+   outputformat (REQUIRED)
+     Format of output. Supported values are line (MARC line format), marcxml (for MARCXML), marc (ISO2709), marcxchange (for MarcXchange), or json (MARC-in-JSON).
+
+   inputcharset (OPTIONAL)
+     Encoding of input. For XML input formats, this need not be given, but for ISO2709 based input formats, this should be set to the encoding used. For MARC21 records, a common inputcharset value would be marc-8.
+
+   outputcharset (OPTIONAL)
+     Encoding of output. If outputformat is XML based, it is strongly recommended to use utf-8.
+
+ xslt
+   The xslt element specifies a conversion via &acro.xslt;. The following attributes may be specified:
+
+   stylesheet (REQUIRED)
+     Stylesheet file.
+
+ Retrieval Facility Examples
+
+ MARC21 backend
+
+ A typical way to use the retrieval facility is to enable XML for servers that only support ISO2709 encoded MARC21 records.
+
+ This means that our frontend supports:
+
+ MARC21 F(ull) records.
+
+ MARC21 B(rief) records.
+
+ MARCXML records.
+
+ Dublin core records.
+
+ MARCXML backend
+
+ SRW/SRU and Solr backends return records in XML. If they return MARCXML or MarcXchange, the retrieval module can convert those into ISO2709 formats, most commonly USMARC (AKA MARC21). In this example, the backend returns MARCXML for schema="marcxml".
+
+ This means that our frontend supports:
+
+ MARC21 records (any element set name) in MARC-8 encoding.
+
+ MARCXML records for element-set=marcxml.
+
+ Dublin core records for element-set=dc.
+
+ API
+
+ It should be easy to use the retrieval systems from applications. Refer to the headers yaz/retrieval.h and yaz/record_conv.h.
+
+ Sorting
+
+ This chapter describes sorting and how it is supported in YAZ. Sorting applies to a result-set. The Z39.50 sorting facility takes one or more input result-sets and one result-set as output. The simplest case is that the input-set is the same as the output-set.
+
+ Z39.50 sorting has a separate APDU (service) and is thus performed following a search (two phases).
+
+ In SRU/Solr, however, the model is different. Here, sorting is specified during the search operation. Note, however, that SRU might perform the sort as a separate search, by referring to an existing result-set in the query (result-set reference).
+
+ Using the Z39.50 sort service
+
+ yaz-client and the ZOOM API support the Z39.50 sort facility. In either case the sort sequence or sort criteria are given in a string notation. This is a one-line notation suitable for being manually entered or generated, and it allows for easy logging (one-liner). For the ZOOM API, the sort is specified in the call to the ZOOM_query_sortby function. For yaz-client the sort is performed and specified using the sort and sort+ commands. For a description of the sort criteria notation, refer to the sort command in the yaz-client manual.
+
+ The ZOOM API might choose one of several sort strategies for sorting. Refer to the ZOOM documentation.
+
+ Type-7 sort
+
+ Type-7 sort is an extension to the Bib-1 based RPN query where the sort specification is embedded as an Attribute-Plus-Term.
+
+ The objective of introducing Type-7 sorting is to allow a client to perform sorting even if it does not implement/support Z39.50 sort. Virtually all Z39.50 client software supports RPN queries. It may also improve performance because the sort criteria are specified along with the search query.
+
+ The sort is triggered by the presence of type 7, and the value of type 7 specifies the sortRelation. The value for type 7 is 1 for ascending and 2 for descending. For the sortElement, only the generic part is handled. If the generic sortKey is of type sortField, then attribute type 1 is present and the value is sortField (InternationalString). If the generic sortKey is of type sortAttributes, then the attributes in the list are used. A generic sortKey of type elementSpec is not supported.
+
+ The term in the sorting Attribute-Plus-Term combo should hold an integer. The value is 0 for the primary sorting criterion, 1 for the second criterion, etc.
+
+ Facets
+
+ YAZ supports facets in the Solr, SRU 2.0 and Z39.50 protocols.
+
+ Like Type-1/RPN, YAZ supports a string notation for specifying facets. For the API this is performed by yaz_pqf_parse_facet_list.
+
+ For ZOOM C the facets are given by the option "facets". For yaz-client the notation is used by the facets command.
+
+ The grammar of this specification is as follows:
+
+   facet-spec ::= facet-list
+
+   facet-list ::= facet-list ',' attr-spec | attr-spec
+
+   attr-spec ::= attr-spec '@attr' string | '@attr' string
+
+ The notation is inspired by PQF. The string following '@attr' may not include blanks and is of the form type=value, where type is an integer and value is a string or an integer.
+
+ The Facets specification is not Bib-1.
+ The following types apply:
+
+ Facet attributes
+
+ Type  Description
+ 1     Field-name. This is often a string, e.g. "Author", "Year", etc.
+ 2     Sort order. Value should be an integer. Value 0: count descending (frequency). Value 1: alpha ascending.
+ 3     Number of terms requested.
+ 4     Start offset.
+
+
+ +
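The facet grammar and the four attribute types are small enough to be read with a hand-written parser. The following is a minimal sketch under stated assumptions, not the YAZ parser: it does not call yaz_pqf_parse_facet_list, the names struct facet_field and parse_facet_spec are hypothetical, values are assumed to contain neither blanks nor commas, and any type other than 1 to 4 is rejected.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: one entry per attr-spec in the facet-list. */
struct facet_field {
    char name[64];  /* type 1: field name, empty if absent */
    long sort;      /* type 2: sort order, -1 if absent */
    long limit;     /* type 3: number of terms requested, -1 if absent */
    long start;     /* type 4: start offset, -1 if absent */
};

static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Parses the len characters at p as a decimal integer.
   Returns 1 only if all len characters were consumed. */
static int parse_long(const char *p, size_t len, long *out)
{
    char *end;

    *out = strtol(p, &end, 10);
    return end == p + len;
}

/* Parses one "@attr type=value" item at p and stores the value in f.
   Returns a pointer just past the value, or NULL on a syntax error. */
static const char *parse_attr(const char *p, struct facet_field *f)
{
    char *end;
    long type;
    size_t len;
    int ok = 0;

    if (strncmp(p, "@attr", 5) != 0 || (p[5] != ' ' && p[5] != '\t'))
        return NULL;
    p = skip_blanks(p + 5);
    type = strtol(p, &end, 10);
    if (end == p || *end != '=')
        return NULL;
    p = end + 1;
    len = strcspn(p, " \t,");  /* value ends at a blank, comma or end */
    if (len == 0)
        return NULL;
    switch (type) {
    case 1:
        if (len < sizeof f->name) {
            memcpy(f->name, p, len);
            f->name[len] = '\0';
            ok = 1;
        }
        break;
    case 2: ok = parse_long(p, len, &f->sort); break;
    case 3: ok = parse_long(p, len, &f->limit); break;
    case 4: ok = parse_long(p, len, &f->start); break;
    }
    return ok ? p + len : NULL;
}

/* Parses a facet-spec into at most max entries of out.
   Returns the number of entries, or -1 on error or overflow. */
int parse_facet_spec(const char *spec, struct facet_field *out, int max)
{
    int n = 0;
    const char *p = skip_blanks(spec);

    for (;;) {
        struct facet_field f = { "", -1, -1, -1 };

        if (n == max)
            return -1;
        do {                    /* attr-spec: one or more '@attr' items */
            p = parse_attr(p, &f);
            if (!p)
                return -1;
            p = skip_blanks(p);
        } while (*p == '@');
        out[n++] = f;
        if (*p == '\0')
            return n;
        if (*p != ',')          /* facet-list items are comma separated */
            return -1;
        p = skip_blanks(p + 1);
    }
}
```

For the specification "@attr 1=title @attr 3=10, @attr 1=author @attr 2=1" this sketch returns two entries: title with 10 terms requested, and author with sort order 1 (alpha ascending). A specification that lacks the '@attr' keyword, or gives a non-integer value for types 2 to 4, is rejected with -1.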