X-Git-Url: http://git.indexdata.com/?p=yaz-moved-to-github.git;a=blobdiff_plain;f=doc%2Ftools.xml;h=e1209a1e2423043e97c44ca761bc8e6c5859ece3;hp=b5f220d0ab13b099a42f1f180b6806ec6cdefae5;hb=f7db9d090f9a6baf83fa7f1e6c277d1ecc4d0a64;hpb=cbd3dfed0ab7919bf1deb0e8af5fed4ffb8bdf5d diff --git a/doc/tools.xml b/doc/tools.xml index b5f220d..e1209a1 100644 --- a/doc/tools.xml +++ b/doc/tools.xml @@ -1,4 +1,3 @@ - Supporting Tools @@ -129,11 +128,11 @@ query ::= top-set query-struct. - top-set ::= [ '@attrset' string ] + top-set ::= [ '@attrset' string ] query-struct ::= attr-spec | simple | complex | '@term' term-type query - attr-spec ::= '@attr' [ string ] string query-struct + attr-spec ::= '@attr' [ string ] string query-struct complex ::= operator query-struct query-struct. @@ -225,7 +224,7 @@ @and. Its semantics are described in section 3.7.2 (Proximity) of Z39.50 the standard itself, which can be read on-line at - + In PQF, the proximity operation is represented by a sequence @@ -294,50 +293,63 @@ (The numeric values of the relation and well-known unit-code parameters are taken straight from - the ASN.1 of the proximity structure in the standard.) PQF queries - PQF queries using simple terms + + PQF queries using simple terms dylan + "bob dylan" - PQF boolean operators + + PQF boolean operators @or "dylan" "zimmerman" + @and @or dylan zimmerman when + @and when @or dylan zimmerman - PQF references to result sets + + PQF references to result sets @set Result-1 - @and @set seta setb + + @and @set seta @set setb - Attributes for terms + + Attributes for terms @attr 1=4 computer + @attr 1=4 @attr 4=1 "self portrait" + @attrset exp1 @attr 1=1 CategoryList + @attr gils 1=2008 Copenhagen + @attr 1=/book/title computer - PQF Proximity queries + + PQF Proximity queries @prox 0 3 1 2 k 2 dylan zimmerman @@ -376,14 +388,16 @@ - PQF specification of search term + + PQF specification of search term type @term string "a UTF-8 string, maybe?" 
- PQF mixed queries + + PQF mixed queries @or @and bob dylan @set Result-1 @@ -420,23 +434,13 @@ symbolic language for expressing boolean query structures. - - The EUROPAGATE research project working under the Libraries programme - of the European Commission's DG XIII has, amongst other useful tools, - implemented a general-purpose CCL parser which produces an output - structure that can be trivially converted to the internal RPN - representation of &yaz; (The Z_RPNQuery structure). - Since the CCL utility - along with the rest of the software - produced by EUROPAGATE - is made freely available on a liberal - license, it is included as a supplement to &yaz;. - - - CCL Syntax + + CCL Syntax The CCL parser obeys the following grammar for the FIND argument. The syntax is annotated by in the lines prefixed by - ‐‐. + --. @@ -480,7 +484,8 @@ - CCL queries + + CCL queries The following queries are all valid: @@ -514,7 +519,8 @@ - CCL Qualifiers + + CCL Qualifiers Qualifiers are used to direct the search to a particular searchable @@ -536,7 +542,8 @@ lines in a CCL profile: qualifier specification, qualifier alias, comments and directives. - Qualifier specification + + Qualifier specification A qualifier specification is of the form: @@ -561,7 +568,8 @@ or c for completeness. The attributes for the special qualifier name term are used when no CCL qualifier is given in a query. - Common Bib-1 attributes +
+ Common Bib-1 attributes @@ -575,7 +583,7 @@ u=value - Use attribute. Common use attributes are + Use attribute (1). Common use attributes are 1 Personal-name, 4 Title, 7 ISBN, 8 ISSN, 30 Date, 62 Subject, 1003 Author), 1016 Any. Specify value as an integer. @@ -585,7 +593,7 @@ r=value - Relation attribute. Common values are + Relation attribute (2). Common values are 1 <, 2 <=, 3 =, 4 >=, 5 >, 6 <>, 100 phonetic, 101 stem, 102 relevance, 103 always matches. @@ -594,7 +602,7 @@ p=value - Position attribute. Values: 1 first in field, 2 + Position attribute (3). Values: 1 first in field, 2 first in any subfield, 3 any position in field. @@ -602,7 +610,7 @@ s=value - Structure attribute. Values: 1 phrase, 2 word, + Structure attribute (4). Values: 1 phrase, 2 word, 3 key, 4 year, 5 date, 6 word list, 100 date (un), 101 name (norm), 102 name (un), 103 structure, 104 urx, 105 free-form-text, 106 document-text, 107 local-number, @@ -613,7 +621,7 @@ t=value - Truncation attribute. Values: 1 right, 2 left, + Truncation attribute (5). Values: 1 right, 2 left, 3 left& right, 100 none, 101 process #, 102 regular-1, 103 regular-2, 104 CCL. @@ -622,7 +630,7 @@ c=value - Completeness attribute. Values: 1 incomplete subfield, + Completeness attribute (6). Values: 1 incomplete subfield, 2 complete subfield, 3 complete field. @@ -632,17 +640,16 @@
- The complete list of Bib-1 attributes can be found - - here - . + Refer to or the complete + list of Bib-1 attributes It is also possible to specify non-numeric attribute values, which are used in combination with certain types. The special combinations are: - Special attribute combos +
+     Special attribute combos
 
 
@@ -672,9 +679,39 @@
 
 
+          s=ag
+          Tokens that appear as phrases (with blanks in them) get
+            structure phrase attached. Tokens that appear as words
+            get structure word attached. Phrases and words are
+            ANDed. This is a variant of s=al and s=pw, with the main
+            difference that words are not split (with operator AND)
+            but instead kept in one RPN token. This facility appeared
+            in YAZ 4.2.38.
+        
+        
          r=o
-          Allows operators greater-than, less-than, ... equals and
-            sets relation attribute accordingly (relation ordered).
+          Allows ranges and the operators greater-than, less-than, ...
+            equals.
+            This sets the Bib-1 relation attribute accordingly (relation
+            ordered). A query construct is only treated as a range if
+            a dash is used and it is surrounded by white space. So
+            -1980 is treated as the term
+            "-1980", not <= 1980.
+            If - 1980 is used, however, it is
+            treated as a range.
+        
+        
+          r=r
+          Similar to r=o but assumes that terms
+            are non-negative (not prefixed with -).
+            Thus, a dash will always be treated as a range.
+            The construct 1980-1990 is
+            treated as a range with r=r but as a
+            single term "1980-1990" with
+            r=o. The special attribute
+            r=r is available in YAZ 2.0.24 or later.
         
 
@@ -705,11 +742,29 @@
 
            set to both left&right.
 
+        
+          t=x
+          Allows masking anywhere in a term, thus fully supporting
+            # (mask one character) and ? (zero or more of any).
+            If masking is used, truncation is set to 102 (regexp-1 in term)
+            and the term is converted accordingly to a regular expression.
+        
+        
+          t=z
+          Allows masking anywhere in a term, thus fully supporting
+            # (mask one character) and ? (zero or more of any).
+            If masking is used, truncation is set to 104 (Z39.58 in term)
+            and the term is converted accordingly to a Z39.58 masking term -
+            actually the same truncation as CCL itself.
+        
+        
- CCL profile + CCL profile Consider the following definition: @@ -722,11 +777,6 @@ date u=30 r=o - Four qualifiers are defined - ti, - au, ranked and - date. - - ti and au both set structure attribute to phrase (s=1). ti @@ -749,9 +799,9 @@ Query - year > 1980 + date > 1980 - is a valid query, while + is a valid query. But ti > 1980 @@ -759,7 +809,8 @@
-     Qualifier alias
+    
+     Qualifier alias
    
     A qualifier alias is of the form:
    
     q q1 q2 ..
    
     which declares q to be an
     alias for q1, q2... such that the
     CCL query q=x is equivalent to
     q1=x or q2=x or ....
    
   
-     Comments
+    
+     Comments
    
     Lines containing only white space, or lines that begin with the
     character #, are treated as comments.
    
   
-     Directives
+    
+     Directives
    
     Directive specifications take the form
    
     @directive value
    
-     CCL directives
+ CCL directives @@ -819,10 +873,10 @@ case - Specificies if CCL operatores and qualifiers should be - compared with case sensitivity or not. Specify 0 for - case sensitive; 1 for case insensitive. - 0 + Specifies if CCL operators and qualifiers should be + compared with case sensitivity or not. Specify 1 for + case sensitive; 0 for case insensitive. + 1 @@ -853,7 +907,8 @@
- CCL API + + CCL API All public definitions can be found in the header file ccl.h. A profile identifier is of type @@ -916,22 +971,20 @@ struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str, - CQL + CQL - CQL + CQL - Common Query Language - was defined for the - SRW - protocol. + SRU protocol. In many ways CQL has a similar syntax to CCL. The objective of CQL is different. Where CCL aims to be an end-user language, CQL is the protocol - query language for SRW. + query language for SRU. If you are new to CQL, read the - Gentle - Introduction. + Gentle Introduction. @@ -951,17 +1004,16 @@ struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str, The parser converts a valid CQL query to PQF, thus providing a - way to use CQL for both SRW/SRU servers and Z39.50 targets at the + way to use CQL for both SRU servers and Z39.50 targets at the same time. The parser converts CQL to - - XCQL. + XCQL. XCQL is an XML representation of CQL. - XCQL is part of the SRW specification. However, since SRU + XCQL is part of the SRU specification. However, since SRU supports CQL only, we don't expect XCQL to be widely used. Furthermore, CQL has the advantage over XCQL that it is easy to read. @@ -969,7 +1021,7 @@ struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str, - CQL parsing + CQL parsing A CQL parser is represented by the CQL_parser handle. Its contents should be considered &yaz; internal (private). 
@@ -1015,7 +1067,7 @@
 
 int cql_parser_stdio(CQL_parser cp, FILE *f);
 
-    CQL tree
+    
+    CQL tree
    
     If the query string is valid, the CQL parser
     generates a tree representing the structure of the
     query.
    
struct cql_node *cql_parser_result(CQL_parser cp);
    
#define CQL_NODE_ST 1
#define CQL_NODE_BOOL 2
-#define CQL_NODE_MOD 3
struct cql_node {
    int which;
    union {
        struct {
            char *index;
+            char *index_uri;
            char *term;
            char *relation;
+            char *relation_uri;
            struct cql_node *modifiers;
-            struct cql_node *prefixes;
        } st;
        struct {
            char *value;
            struct cql_node *left;
            struct cql_node *right;
            struct cql_node *modifiers;
-            struct cql_node *prefixes;
        } boolean;
-        struct {
-            char *name;
-            char *value;
-            struct cql_node *next;
-        } mod;
    } u;
};
    
-    There are three kinds of nodes, search term (ST), boolean (BOOL),
-    and modifier (MOD).
+    There are two node types: search term (ST) and boolean (BOOL).
+    A modifier is treated as a search term too.
    
    The search term node has the following members:
    
    
+    index_uri: index URI for the search term,
+    or NULL if none could be resolved for the index.
+    
+    
    term: the search term itself.
    
    
-    modifiers: relation modifiers for search
-    term. The modifiers is a simple linked
-    list (NULL for last entry). Each relation modifier node
-    is of type MOD.
+    relation_uri: relation URI for the search term.
    
    
-    prefixes: index prefixes for search
-    term. The prefixes is a simple linked
-    list (NULL for last entry). Each prefix node
-    is of type MOD.
+    modifiers: relation modifiers for the search
+    term. The modifiers list is itself a list of cql_nodes,
+    each of type ST.
    
    
    modifiers: proximity arguments.
    
-    
-    prefixes: index prefixes.
-    The prefixes is a simple linked
-    list (NULL for last entry). Each prefix node
-    is of type MOD.
-    
-   
-  
-   The modifier node is a "utility" node used for name-value pairs,
-   such as prefixes, proximity arguments, etc.
- - - - name name of mod node. - - - - - value value of mod node. - - - - - next: pointer to next node which is - always a mod node (NULL for last entry). - - - CQL to PQF conversion + CQL to PQF conversion Conversion to PQF (and Z39.50 RPN) is tricky by the fact that the resulting RPN depends on the Z39.50 target capabilities (combinations of supported attributes). - In addition, the CQL and SRW operates on index prefixes + In addition, the CQL and SRU operates on index prefixes (URI or strings), whereas the RPN uses Object Identifiers for attribute sets. @@ -1194,10 +1211,10 @@ int cql_transform_buf(cql_transform_t ct, If conversion failed, cql_transform_buf - returns a non-zero SRW error code; otherwise zero is returned + returns a non-zero SRU error code; otherwise zero is returned (conversion successful). The meanings of the numeric error - codes are listed in the SRW specifications at - + codes are listed in the SRU specifications at + If conversion fails, more information can be obtained by calling @@ -1208,12 +1225,12 @@ int cql_transform_error(cql_transform_t ct, char **addinfop); error-code and sets the string-pointer at *addinfop to point to a string containing additional information about the error that occurred: for - example, if the error code is 15 (``Illegal or unsupported index + example, if the error code is 15 (``Illegal or unsupported context set''), the additional information is the name of the requested - index set that was not recognised. + context set that was not recognised. - The SRW error-codes may be translated into brief human-readable + The SRU error-codes may be translated into brief human-readable error messages using const char *cql_strerror(int code); @@ -1236,8 +1253,8 @@ int cql_transform_FILE(cql_transform_t ct, open FILE. 
- - Specification of CQL to RPN mapping + + Specification of CQL to RPN mappings The file supplied to functions cql_transform_open_FILE, @@ -1263,26 +1280,49 @@ int cql_transform_FILE(cql_transform_t ct, value the attribute value. + The character * (asterisk) has special meaning + when used in the RPN pattern. + Each occurrence of * is substituted with the + CQL matching name (index, relation, qualifier etc). + This facility can be used to copy a CQL name verbatim to the RPN result. + + The following CQL patterns are recognized: - qualifier.set.name + index.set.name - This pattern is invoked when a CQL qualifier, such as + This pattern is invoked when a CQL index, such as dc.title is converted. set - and name is the index set and qualifier + and name are the context set and index name respectively. Typically, the RPN specifies an equivalent use attribute. - For terms not bound by a qualifier the pattern - qualifier.srw.serverChoice is used. - Here, the prefix srw is defined as - http://www.loc.gov/zing/cql/srw-indexes/v1.0/. + For terms not bound by an index the pattern + index.cql.serverChoice is used. + Here, the prefix cql is defined as + http://www.loc.gov/zing/cql/cql-indexes/v1.0/. If this pattern is not defined, the mapping will fail. + + The pattern, + index.set.* + is used when no other index pattern is matched. + + + + + qualifier.set.name + (DEPRECATED) + + + + For backwards compatibility, this is recognised as a synonym of + index.set.name + @@ -1367,35 +1407,48 @@ int cql_transform_FILE(cql_transform_t ct, - This specification defines a CQL index set for a given prefix. + This specification defines a CQL context set for a given prefix. The value on the right hand side is the URI for the set - not RPN. All prefixes used in - qualifier patterns must be defined this way. + index patterns must be defined this way. + + + set + + + + This specification defines a default CQL context set for index names. 
+ The value on the right hand side is the URI for the set. + + + + - CQL to RPN mapping file + + CQL to RPN mapping file - This simple file defines two index sets, three qualifiers and three + This simple file defines two context sets, three indexes and three relations, a position pattern and a default structure. @@ -1407,7 +1460,7 @@ int cql_transform_FILE(cql_transform_t ct, @attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer" - by rules qualifier.srw.serverChoice, + by rules index.cql.serverChoice, relation.scr, structure.*, position.any. @@ -1430,8 +1483,51 @@ int cql_transform_FILE(cql_transform_t ct, + + CQL to RPN string attributes + + In this example we allow any index to be passed to RPN as + a use attribute. + + + + + The http://bogus/rpn context set is also the default + so we can make queries such as + + title = a + + which is converted to + + @attr 2=3 @attr 4=1 @attr 3=3 @attr 1=title "a" + + + + + CQL to RPN using Bath Profile + + The file etc/pqf.properties has mappings from + the Bath Profile and Dublin Core to RPN. + If YAZ is installed as a package it's usually located + in /usr/share/yaz/etc and part of the + development package, such as libyaz-dev. + + - CQL to XCQL conversion + CQL to XCQL conversion Conversion from CQL to XCQL is trivial and does not require a mapping to be defined. @@ -1462,293 +1558,178 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); The basic YAZ representation of an OID is an array of integers, - terminated with the value -1. The &odr; module provides two - utility-functions to create and copy this type of data elements: - - - - Odr_oid *odr_getoidbystr(ODR o, char *str); - - - - Creates an OID based on a string-based representation using dots (.) - to separate elements in the OID. - - - - Odr_oid *odr_oiddup(ODR odr, Odr_oid *o); - - - - Creates a copy of the OID referenced by the o - parameter. - Both functions take an &odr; stream as parameter. 
This stream is used to - allocate memory for the data elements, which is released on a - subsequent call to odr_reset() on that stream. - - - - The OID module provides a higher-level representation of the - family of object identifiers which describe the Z39.50 protocol and its - related objects. The definition of the module interface is given in - the oid.h file. - - - - The interface is mainly based on the oident structure. - The definition of this structure looks like this: - - - -typedef struct oident -{ - oid_proto proto; - oid_class oclass; - oid_value value; - int oidsuffix[OID_SIZE]; - char *desc; -} oident; - - - - The proto field takes one of the values - - - - PROTO_Z3950 - PROTO_GENERAL - - - - Use PROTO_Z3950 for Z39.50 Object Identifers, - PROTO_GENERAL for other types (such as - those associated with ILL). - - - - The oclass field takes one of the values + terminated with the value -1. This integer is of type + Odr_oid. - - - CLASS_APPCTX - CLASS_ABSYN - CLASS_ATTSET - CLASS_TRANSYN - CLASS_DIAGSET - CLASS_RECSYN - CLASS_RESFORM - CLASS_ACCFORM - CLASS_EXTSERV - CLASS_USERINFO - CLASS_ELEMSPEC - CLASS_VARSET - CLASS_SCHEMA - CLASS_TAGSET - CLASS_GENERAL - - - - corresponding to the OID classes defined by the Z39.50 standard. 
- - Finally, the value field takes one of the values - - - - VAL_APDU - VAL_BER - VAL_BASIC_CTX - VAL_BIB1 - VAL_EXP1 - VAL_EXT1 - VAL_CCL1 - VAL_GILS - VAL_WAIS - VAL_STAS - VAL_DIAG1 - VAL_ISO2709 - VAL_UNIMARC - VAL_INTERMARC - VAL_CCF - VAL_USMARC - VAL_UKMARC - VAL_NORMARC - VAL_LIBRISMARC - VAL_DANMARC - VAL_FINMARC - VAL_MAB - VAL_CANMARC - VAL_SBN - VAL_PICAMARC - VAL_AUSMARC - VAL_IBERMARC - VAL_EXPLAIN - VAL_SUTRS - VAL_OPAC - VAL_SUMMARY - VAL_GRS0 - VAL_GRS1 - VAL_EXTENDED - VAL_RESOURCE1 - VAL_RESOURCE2 - VAL_PROMPT1 - VAL_DES1 - VAL_KRB1 - VAL_PRESSET - VAL_PQUERY - VAL_PCQUERY - VAL_ITEMORDER - VAL_DBUPDATE - VAL_EXPORTSPEC - VAL_EXPORTINV - VAL_NONE - VAL_SETM - VAL_SETG - VAL_VAR1 - VAL_ESPEC1 - - - again, corresponding to the specific OIDs defined by the standard. - Refer to the - - Registry of Z39.50 Object Identifiers for the - whole list. + Fundamental OID operations and the type Odr_oid + are defined in yaz/oid_util.h. - - The desc field contains a brief, mnemonic name for the OID in question. + An OID can either be declared as a automatic variable or it can + allocated using the memory utilities or ODR/NMEM. It's + guaranteed that an OID can fit in OID_SIZE integers. - + Create OID on stack + + We can create an OID for the Bib-1 attribute set with: + + Odr_oid bib1[OID_SIZE]; + bib1[0] = 1; + bib1[1] = 2; + bib1[2] = 840; + bib1[3] = 10003; + bib1[4] = 3; + bib1[5] = 1; + bib1[6] = -1; + + + - The function + And OID may also be filled from a string-based representation using + dots (.). This is achieved by function + + int oid_dotstring_to_oid(const char *name, Odr_oid *oid); + + This functions returns 0 if name could be converted; -1 otherwise. - - - struct oident *oid_getentbyoid(int *o); - - - - takes as argument an OID, and returns a pointer to a static area - containing an oident structure. You typically use - this function when you receive a PDU containing an OID, and you wish - to branch out depending on the specific OID value. 
Using oid_dotstring_to_oid
 
 We can fill the Bib-1 attribute set OID more easily with:
 
 Odr_oid bib1[OID_SIZE];
 oid_dotstring_to_oid("1.2.840.10003.3.1", bib1);
 
 
-    
-     The function
-    
-     int *oid_ent_to_oid(struct oident *ent, int *dst);
-    
-     Takes as argument an oident structure - in which
-     the proto, oclass, and
-     value fields are assumed to be set correctly -
-     and returns a pointer to the buffer as given by dst
-     containing the base
-     representation of the corresponding OID. The function returns
-     NULL and the array dst is unchanged if a mapping couldn't take place.
-     The array dst should be at least of size
-     OID_SIZE.
-    
-    
-     The oid_ent_to_oid() function can be used whenever
-     you need to prepare a PDU containing one or more OIDs. The separation of
-     the protocol element from the remainder of the
-     OID-description makes it simple to write applications that can
-     communicate with either Z39.50 or OSI SR-based applications.
-    
+    
+     We can also allocate an OID dynamically on an ODR stream with:
+     
+      Odr_oid *odr_getoidbystr(ODR o, const char *str);
+     
+     This creates an OID from the string-based representation using dots.
+     This function takes an &odr; stream as parameter. This stream is used to
+     allocate memory for the data elements, which is released on a
+     subsequent call to odr_reset() on that stream.
+    
+    
+     Using odr_getoidbystr
+     
+      We can create an OID for the Bib-1 attribute set with:
+      
+       Odr_oid *bib1 = odr_getoidbystr(odr, "1.2.840.10003.3.1");
+      
+     
+    
+    
+     The function
+     
+      char *oid_oid_to_dotstring(const Odr_oid *oid, char *oidbuf)
+     
+     does the reverse of oid_dotstring_to_oid. It
+     converts an OID to the string-based representation using dots.
+     The supplied char buffer oidbuf holds the resulting
+     string and must be at least OID_STR_MAX in size.
+    
+    
+     OIDs can be copied with oid_oidcpy, which takes
+     two OID lists as arguments. Alternatively, an OID copy can be allocated
+     on an ODR stream with:
+     
+      Odr_oid *odr_oiddup(ODR odr, const Odr_oid *o);
+     
+    
 
-    
-     oid_value oid_getvalbyname(const char *name);
-    
-     takes as argument a mnemonic OID name, and returns the
-     value field of the first entry in the database that
-     contains the given name in its desc field.
+ OIDs can be copied with oid_oidcpy which takes + two OID lists as arguments. Alternativly, an OID copy can be allocated + on a ODR stream with: + + Odr_oid *odr_oiddup(ODR odr, const Odr_oid *o); + - + - Three utility functions are provided for translating OIDs' - symbolic names (e.g. Usmarc into OID structures - (int arrays) and strings containing the OID in dotted notation - (e.g. 1.2.840.10003.9.5.1). They are: - - - - int *oid_name_to_oid(oid_class oclass, const char *name, int *oid); - char *oid_to_dotstring(const int *oid, char *oidbuf); - char *oid_name_to_dotstring(oid_class oclass, const char *name, char *oidbuf); - - - - oid_name_to_oid() - translates the specified symbolic name, - interpreted as being of class oclass. (The - class must be specified as many symbolic names exist within - multiple classes - for example, Zthes is the - symbolic name of an attribute set, a schema and a tag-set.) The - sequence of integers representing the OID is written into the - area oid provided by the caller; it is the - caller's responsibility to ensure that this area is large enough - to contain the translated OID. As a convenience, the address of - the buffer (i.e. the value of oid) is - returned. - - - oid_to_dotstring() - Translates the int-array oid into a dotted - string which is written into the area oidbuf - supplied by the caller; it is the caller's responsibility to - ensure that this area is large enough. The address of the buffer - is returned. - - - oid_name_to_dotstring() - combines the previous two functions to derive a dotted string - representing the OID specified by oclass and - name, writing it into the buffer passed as - oidbuf and returning its address. - - - - Finally, the module provides the following utility functions, whose - meaning should be obvious: + OIDs can be compared with oid_oidcmp which returns + zero if the two OIDs provided are identical; non-zero otherwise. 
OID database
 
 From YAZ version 3 onward, the oident system has been replaced
 by an OID database. "OID database" is a misnomer: the old oident
 system was also a database.
 
 The OID database is really just a map between named Object Identifiers
 (strings) and their raw OID equivalents. Most operations either
 convert from string to OID or the other way around.
 
 Unfortunately, whenever we supply a string we must also specify the
 OID class. The class is necessary because some
 strings correspond to multiple OIDs. An example of such a string is
 Bib-1, which may either be an attribute-set
 or a diagnostic-set.
 
 Applications using the YAZ database should include
 yaz/oid_db.h.
 
 A YAZ database handle is of type yaz_oid_db_t.
 Actually that's a pointer; you need not deal with that directly.
 YAZ has a built-in database which can be considered "constant" for
 most purposes.
 We can get hold of it by using the function yaz_oid_std.
 
 All functions with prefix yaz_string_to_oid
 convert from class + string to OID. We have variants of this
 operation due to different memory allocation strategies.
 
 All functions with prefix
 yaz_oid_to_string convert from OID + class to string.
 
-     
-     void oid_oidcpy(int *t, int *s);
-     void oid_oidcat(int *t, int *s);
-     int oid_oidcmp(int *o1, int *o2);
-     int oid_oidlen(int *o);
-     
 
 Create OID with YAZ DB
 
 We can create an OID for the Bib-1 attribute set on the ODR stream
 odr with:
 
 Odr_oid *bib1 =
     yaz_string_to_oid_odr(yaz_oid_std(), CLASS_ATTSET, "Bib-1", odr);
 
 This is more complex than using odr_getoidbystr.
 You would only use yaz_string_to_oid_odr when the
 string (here Bib-1) is supplied by a user or configuration.
 
 Standard OIDs
 
 All the object identifiers in the standard OID database as returned
 by yaz_oid_std can be referenced directly in a
 program as a constant OID.
 Each constant OID is prefixed with yaz_oid_ -
 followed by OID class (lowercase) - then by OID name (normalized and
 lowercase).
 
 See for list of all object identifiers
 built into YAZ.
 These are declared in yaz/oid_std.h but are
 included by yaz/oid_db.h as well.
 
 Use a built-in OID
 
 We can allocate our own OID filled with the constant OID for
 Bib-1 with:
 
 Odr_oid *bib1 = odr_oiddup(o, yaz_oid_attset_bib1);
 
 
-    
-     The OID module has been criticized - and perhaps rightly so -
-     for needlessly abstracting the
-     representation of OIDs. Other toolkits use a simple
-     string-representation of OIDs with good results.
In practice, we have - found the interface comfortable and quick to work with, and it is a - simple matter (for what it's worth) to create applications compatible - with both ISO SR and Z39.50. Finally, the use of the - /oident database is by no means mandatory. - You can easily create your own system for representing OIDs, as long - as it is compatible with the low-level integer-array representation - of the ODR module. + All the object identifers in the standard OID database as returned + by yaz_oid_std can referenced directly in a + program as a constant OID. + Each constant OID is prefixed with yaz_oid_ - + followed by OID class (lowercase) - then by OID name (normalized and + lowercase). + + + See for list of all object identifiers + built into YAZ. + These are declared in yaz/oid_std.h but are + included by yaz/oid_db.h as well. - + Use a built-in OID + + We can allocate our own OID filled with the constant OID for + Bib-1 with: + + Odr_oid *bib1 = odr_oiddup(o, yaz_oid_attset_bib1); + + + + - Nibble Memory @@ -1773,9 +1754,9 @@ typedef struct oident NMEM nmem_create(void); void nmem_destroy(NMEM n); - void *nmem_malloc(NMEM n, int size); + void *nmem_malloc(NMEM n, size_t size); void nmem_reset(NMEM n); - int nmem_total(NMEM n); + size_t nmem_total(NMEM n); void nmem_init(void); void nmem_exit(void); @@ -1807,6 +1788,636 @@ typedef struct oident + + Log + + &yaz; has evolved a fairly complex log system which should be useful both + for debugging &yaz; itself, debugging applications that use &yaz;, and for + production use of those applications. + + + The log functions are declared in header yaz/log.h + and implemented in src/log.c. + Due to name clash with syslog and some math utilities the logging + interface has been modified as of YAZ 2.0.29. The obsolete interface + is still available if in header file yaz/log.h. + The key points of the interface are: + + + void yaz_log(int level, const char *fmt, ...) 
+ + void yaz_log_init(int level, const char *prefix, const char *name); + void yaz_log_init_file(const char *fname); + void yaz_log_init_level(int level); + void yaz_log_init_prefix(const char *prefix); + void yaz_log_time_format(const char *fmt); + void yaz_log_init_max_size(int mx); + + int yaz_log_mask_str(const char *str); + int yaz_log_module_level(const char *name); + + + + The reason for the whole log module is the yaz_log + function. It takes a bitmask indicating the log levels, a + printf-like format string, and a variable number of + arguments to log. + + + + The log level is a bit mask, that says on which level(s) + the log entry should be made, and optionally set some behaviour of the + logging. In the most simple cases, it can be one of YLOG_FATAL, + YLOG_DEBUG, YLOG_WARN, YLOG_LOG. Those can be combined with bits + that modify the way the log entry is written:YLOG_ERRNO, + YLOG_NOTIME, YLOG_FLUSH. + Most of the rest of the bits are deprecated, and should not be used. Use + the dynamic log levels instead. + + + + Applications that use &yaz;, should not use the LOG_LOG for ordinary + messages, but should make use of the dynamic loglevel system. This consists + of two parts, defining the loglevel and checking it. + + + + To define the log levels, the (main) program should pass a string to + yaz_log_mask_str to define which log levels are to be + logged. This string should be a comma-separated list of log level names, + and can contain both hard-coded names and dynamic ones. The log level + calculation starts with YLOG_DEFAULT_LEVEL and adds a bit + for each word it meets, unless the word starts with a '-', in which case it + clears the bit. If the string 'none' is found, + all bits are cleared. Typically this string comes from the command-line, + often identified by -v. The + yaz_log_mask_str returns a log level that should be + passed to yaz_log_init_level for it to take effect. 
+ + + + Each module should check what log bits it should be used, by calling + yaz_log_module_level with a suitable name for the + module. The name is cleared from a preceding path and an extension, if any, + so it is quite possible to use __FILE__ for it. If the + name has been passed to yaz_log_mask_str, the routine + returns a non-zero bitmask, which should then be used in consequent calls + to yaz_log. (It can also be tested, so as to avoid unnecessary calls to + yaz_log, in time-critical places, or when the log entry would take time + to construct.) + + + + Yaz uses the following dynamic log levels: + server, session, request, requestdetail for the server + functionality. + zoom for the zoom client api. + ztest for the simple test server. + malloc, nmem, odr, eventl for internal debugging of yaz itself. + Of course, any program using yaz is welcome to define as many new ones, as + it needs. + + + + By default the log is written to stderr, but this can be changed by a call + to yaz_log_init_file or + yaz_log_init. If the log is directed to a file, the + file size is checked at every write, and if it exceeds the limit given in + yaz_log_init_max_size, the log is rotated. The + rotation keeps one old version (with a .1 appended to + the name). The size defaults to 1GB. Setting it to zero will disable the + rotation feature. + + + + A typical yaz-log looks like this + 13:23:14-23/11 yaz-ztest(1) [session] Starting session from tcp:127.0.0.1 (pid=30968) + 13:23:14-23/11 yaz-ztest(1) [request] Init from 'YAZ' (81) (ver 2.0.28) OK + 13:23:17-23/11 yaz-ztest(1) [request] Search Z: @attrset Bib-1 foo OK:7 hits + 13:23:22-23/11 yaz-ztest(1) [request] Present: [1] 2+2 OK 2 records returned + 13:24:13-23/11 yaz-ztest(1) [request] Close OK + + + + The log entries start with a time stamp. This can be omitted by setting the + YLOG_NOTIME bit in the loglevel. This way automatic tests + can be hoped to produce identical log files, that are easy to diff. 
The
+   format of the time stamp can be set with
+   yaz_log_time_format, which takes a format string just
+   like strftime.
+
+   Next in a log line comes the prefix, often the name of the program. For
+   yaz-based servers, it can also contain the session number. Then
+   come one or more log bits in square brackets, depending on the log
+   level set by yaz_log_init_level and the level
+   passed to yaz_log. Finally comes the format
+   string and additional values passed to yaz_log.
+
+   The log level YLOG_LOGLVL, enabled by the string
+   loglevel, will log all the log-level affecting
+   operations. This can come in handy if you need to know what other log
+   levels would be useful. Grep the logfile for [loglevel].
+
+   The log system is almost independent of the rest of &yaz;; the only
+   important dependency is on nmem, and that only for
+   using the semaphore definition there.
+
+   The dynamic log levels and log rotation were introduced in &yaz; 2.0.28.
+   At the same time, the log bit names were changed from
+   LOG_something to YLOG_something,
+   to avoid collision with syslog.h.
+
+
+  MARC
+
+   YAZ provides a fast utility for working with MARC records.
+   Early versions of the MARC utility only allowed decoding of ISO2709.
+   Today the utility can both encode and decode a variety of formats.
+
+    /* create handler */
+    yaz_marc_t yaz_marc_create(void);
+    /* destroy */
+    void yaz_marc_destroy(yaz_marc_t mt);
+
+    /* set XML mode YAZ_MARC_LINE, YAZ_MARC_SIMPLEXML, ... */
+    void yaz_marc_xml(yaz_marc_t mt, int xmlmode);
+    #define YAZ_MARC_LINE      0
+    #define YAZ_MARC_SIMPLEXML 1
+    #define YAZ_MARC_OAIMARC   2
+    #define YAZ_MARC_MARCXML   3
+    #define YAZ_MARC_ISO2709   4
+    #define YAZ_MARC_XCHANGE   5
+    #define YAZ_MARC_CHECK     6
+    #define YAZ_MARC_TURBOMARC 7
+
+    /* supply iconv handle for character set conversion .. */
+    void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);
+
+    /* set debug level, 0=none, 1=more, 2=even more, .. 
*/
+    void yaz_marc_debug(yaz_marc_t mt, int level);
+
+    /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
+       On success, result in *result with size *rsize. */
+    int yaz_marc_decode_buf(yaz_marc_t mt, const char *buf, int bsize,
+                            const char **result, size_t *rsize);
+
+    /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
+       On success, result in WRBUF */
+    int yaz_marc_decode_wrbuf(yaz_marc_t mt, const char *buf,
+                              int bsize, WRBUF wrbuf);
+]]>
+
+   The synopsis is just a basic subset of all functionality. Refer
+   to the actual header file marcdisp.h for
+   details.
+
+   A MARC conversion handle must be created by using
+   yaz_marc_create and destroyed
+   by calling yaz_marc_destroy.
+
+   All other functions operate on a yaz_marc_t handle.
+   The output is specified by a call to yaz_marc_xml.
+   The xmlmode must be one of
+
+   YAZ_MARC_LINE
+   A simple line-by-line format suitable for display but not
+   recommended for further (machine) processing.
+
+   YAZ_MARC_MARCXML
+   MARCXML.
+
+   YAZ_MARC_ISO2709
+   ISO2709 (sometimes just referred to as "MARC").
+
+   YAZ_MARC_XCHANGE
+   MarcXchange.
+
+   YAZ_MARC_CHECK
+   Pseudo format for validation only. Does not generate
+   any real output except diagnostics.
+
+   YAZ_MARC_TURBOMARC
+   XML format with the same semantics as MARCXML but more compact
+   and geared towards fast processing with XSLT. Refer to
+   the TurboMARC section for more information.
+
+   The actual conversion functions are
+   yaz_marc_decode_buf and
+   yaz_marc_decode_wrbuf, which decode and encode
+   a MARC record. The former operates on simple buffers; the latter
+   stores the resulting record in a WRBUF handle (WRBUF is a simple string
+   type). 
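Using only the functions declared in the synopsis above, a conversion to the line-by-line format might look like the following sketch. It assumes the YAZ headers and library are available; dump_marc_line is an illustrative name, not part of YAZ.

```c
#include <stdio.h>
#include <yaz/marcdisp.h>
#include <yaz/wrbuf.h>

/* Sketch: decode one ISO2709 record in (buf, size) and print it in
   the line-by-line format. Returns the yaz_marc_decode_wrbuf result
   (>0 on success). */
static int dump_marc_line(const char *buf, int size)
{
    yaz_marc_t mt = yaz_marc_create();
    WRBUF wr = wrbuf_alloc();
    int r;

    yaz_marc_xml(mt, YAZ_MARC_LINE);       /* select output format */
    r = yaz_marc_decode_wrbuf(mt, buf, size, wr);
    if (r > 0)
        fputs(wrbuf_cstr(wr), stdout);     /* the decoded record */

    wrbuf_destroy(wr);
    yaz_marc_destroy(mt);
    return r;
}
```

Switching the argument of yaz_marc_xml to YAZ_MARC_MARCXML or YAZ_MARC_TURBOMARC would produce the corresponding XML encodings instead.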
+
+
+  Display of MARC record
+
+   The following program snippet illustrates how the MARC API may
+   be used to convert a MARC record to the line-by-line format:
+
+
+
+  TurboMARC
+
+   TurboMARC is yet another XML encoding of a MARC record. The format
+   was designed for fast processing with XSLT.
+
+   Applications like
+   Pazpar2 use XSLT to convert an XML-encoded MARC record to an internal
+   representation. This conversion mostly checks the tag of a MARC field
+   to determine the basic rules in the conversion. This check is
+   costly when the tag is encoded as an attribute in MARCXML.
+   Having the tag value in the element name instead makes processing
+   many times faster (at least for Libxslt).
+
+   TurboMARC is encoded as follows:
+
+   Record elements are part of the namespace
+   "http://www.indexdata.com/turbomarc".
+
+   A record is enclosed in element r.
+
+   A collection of records is enclosed in element
+   collection.
+
+   The leader is encoded as element l with the
+   leader content as its (text) value.
+
+   A control field is encoded as element c concatenated
+   with the tag value of the control field if the tag value
+   matches the regular expression [a-zA-Z0-9]*.
+   If the tag value does not match the regular expression
+   [a-zA-Z0-9]*, the control field is encoded
+   as element c and attribute code
+   will hold the tag value.
+   This rule ensures that in the rare cases where a tag value might
+   result in non-wellformed XML, YAZ encodes it with a code attribute
+   (as in MARCXML).
+
+   The control field content is the text value of this element.
+   Indicators are encoded as attribute names
+   i1, i2, etc. with
+   corresponding values for each indicator.
+
+   A data field is encoded as element d concatenated
+   with the tag value of the data field, or using the attribute
+   code as described in the rules for control fields.
+   The children of the data field element are subfield elements.
+   Each subfield element is encoded as s
+   concatenated with the subfield code. 
+   The text of the subfield element is the contents of the subfield.
+   Indicators are encoded as attributes of the data field element, similar
+   to the encoding for control fields.
+
+
+
+ Retrieval Facility
+
+   YAZ version 2.1.20 or later includes a Retrieval facility tool
+   which allows an SRU/Z39.50 server to describe itself and perform record
+   conversions. The idea is the following:
+
+   An SRU/Z39.50 client sends a retrieval request which includes
+   a combination of the following parameters: syntax (format),
+   schema (or element set name).
+
+   The retrieval facility is invoked with parameters in a
+   server/proxy. The retrieval facility matches the parameters against a
+   set of "supported" retrieval types.
+   If there is no match, the retrieval facility signals an error
+   (syntax and / or schema not supported).
+
+   For a successful match, the backend is invoked with the same
+   or altered retrieval parameters (syntax, schema). If
+   a record is received from the backend, it is converted to the
+   frontend name / syntax.
+
+   The resulting record is sent back to the client and tagged with
+   the frontend syntax / schema.
+
+   The Retrieval facility is driven by an XML configuration. The
+   configuration is neither Z39.50 ZeeRex nor SRU ZeeRex, but it
+   should be easy to generate both of them from the XML configuration
+   (unfortunately the two versions
+   of ZeeRex differ substantially in this regard).
+
+  Retrieval XML format
+
+   All elements should be covered by namespace
+   http://indexdata.com/yaz.
+   The root element node must be retrievalinfo.
+
+   The retrievalinfo must include one or
+   more retrieval elements. Each
+   retrieval defines a specific combination of
+   syntax, name and identifier supported by this retrieval service.
+
+   The retrieval element may include any of the
+   following attributes:
+
+   syntax (REQUIRED)
+   Defines the record syntax. Possible values are any
+   of the names defined in YAZ' OID database or a raw
+   OID in (n.n ... 
n).
+
+   name (OPTIONAL)
+   Defines the name of the retrieval format. This can be
+   any string. For SRU, the value is equivalent to schema (short-hand);
+   for Z39.50 it's equivalent to a simple element set name.
+   For YAZ 3.0.24 and later this name may be specified as a glob
+   expression with operators
+   * and ?.
+
+   identifier (OPTIONAL)
+   Defines the URI schema name of the retrieval format. This can be
+   any string. For SRU, the value is equivalent to the URI schema.
+   For Z39.50, there is no equivalent.
+
+   The retrieval may include one
+   backend element. If a backend
+   element is given, it specifies how the records are retrieved by
+   some backend and how the records are converted from the backend to
+   the "frontend".
+
+   The attributes name and syntax
+   may be specified for the backend element. The
+   semantics of these attributes are equivalent to those for
+   retrieval. However, these values are passed to
+   the "backend".
+
+   The backend element may include one or more
+   conversion instructions (as child elements). The supported
+   conversions are:
+
+   marc
+   The marc element specifies a conversion
+   to and from ISO2709-encoded MARC and
+   &acro.marcxml;/MarcXchange.
+   The following attributes may be specified:
+
+   inputformat (REQUIRED)
+   Format of input. Supported values are
+   marc (for ISO2709) and xml
+   (for MARCXML/MarcXchange).
+
+   outputformat (REQUIRED)
+   Format of output. Supported values are
+   line (MARC line format),
+   marcxml (for MARCXML),
+   marc (ISO2709) and
+   marcxchange (for MarcXchange).
+
+   inputcharset (OPTIONAL)
+   Encoding of input. For XML input formats, this need not
+   be given, but for ISO2709-based input formats, it should
+   be set to the encoding used. For MARC21 records, a common
+   inputcharset value would be marc-8.
+
+   outputcharset (OPTIONAL)
+   Encoding of output. If outputformat is XML-based, it is
+   strongly recommended to use utf-8. 
+
+
+   xslt
+   The xslt element specifies a conversion
+   via &acro.xslt;. The following attributes may be specified:
+
+   stylesheet (REQUIRED)
+   Stylesheet file.
+
+
+
+ Retrieval Facility Examples
+
+  MARC21 backend
+
+   A typical way to use the retrieval facility is to enable XML
+   for servers that only support ISO2709-encoded MARC21 records.
+
+]]>
+
+   This means that our frontend supports:
+
+   MARC21 F(ull) records.
+
+   MARC21 B(rief) records.
+
+   MARCXML records.
+
+   Dublin Core records.
+
+
+  API
+
+   It should be easy to use the retrieval system from applications. Refer
+   to the headers
+   yaz/retrieval.h and
+   yaz/record_conv.h.
+
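A configuration along the following lines would describe the MARC21 backend scenario above (the F/B frontend names, MARCXML and Dublin Core retrievals). This is a sketch built from the attribute descriptions in this section; the identifier URI and the stylesheet file name are illustrative assumptions, not values mandated by YAZ.

```xml
<retrievalinfo xmlns="http://indexdata.com/yaz">
  <!-- pass MARC21 F(ull) and B(rief) straight through to the backend -->
  <retrieval syntax="usmarc" name="F"/>
  <retrieval syntax="usmarc" name="B"/>
  <!-- MARCXML: fetch ISO2709 from the backend, convert to MARCXML -->
  <retrieval syntax="xml" name="marcxml"
             identifier="info:srw/schema/1/marcxml-v1.1">
    <backend syntax="usmarc" name="F">
      <marc inputformat="marc" outputformat="marcxml"
            inputcharset="marc-8"/>
    </backend>
  </retrieval>
  <!-- Dublin Core: same MARC conversion, then an XSLT step -->
  <retrieval syntax="xml" name="dc">
    <backend syntax="usmarc" name="F">
      <marc inputformat="marc" outputformat="marcxml"
            inputcharset="marc-8"/>
      <xslt stylesheet="MARC21slim2DC.xsl"/>
    </backend>
  </retrieval>
</retrievalinfo>
```

Each retrieval element advertises one frontend syntax/name combination; the backend element, where present, states what is actually fetched and which conversion chain produces the frontend record.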