X-Git-Url: http://git.indexdata.com/?p=yaz-moved-to-github.git;a=blobdiff_plain;f=doc%2Ftools.xml;h=d19faf90ff430bd3172a4281357f9dac4ff62156;hp=cc3e3c5635a783d151863a6e2a66e877b21aec67;hb=0cd4a65083c86690792faa7bb14de67a30bcfc20;hpb=835fe1fa5d34428ba2803cd4a2b1a9b9aec48ab0 diff --git a/doc/tools.xml b/doc/tools.xml index cc3e3c5..d19faf9 100644 --- a/doc/tools.xml +++ b/doc/tools.xml @@ -1,5 +1,5 @@ Supporting Tools - + In support of the service API - primarily the ASN module, which provides the pro-grammatic interface to the Z39.50 APDUs, &yaz; contains @@ -40,7 +40,7 @@ - The PQF is defined by the pquery module in the YAZ library. + The PQF is defined by the pquery module in the YAZ library. There are two sets of function that have similar behavior. First set operates on a PQF parser handle, second set doesn't. First set set of functions are more flexible than the second set. Second set @@ -52,17 +52,16 @@ #include <yaz/pquery.h> - YAZ_PQF_Parser yaz_pqf_create (void); + YAZ_PQF_Parser yaz_pqf_create(void); - void yaz_pqf_destroy (YAZ_PQF_Parser p); + void yaz_pqf_destroy(YAZ_PQF_Parser p); - Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf); + Z_RPNQuery *yaz_pqf_parse(YAZ_PQF_Parser p, ODR o, const char *qbuf); - Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o, + Z_AttributesPlusTerm *yaz_pqf_scan(YAZ_PQF_Parser p, ODR o, Odr_oid **attributeSetId, const char *qbuf); - - int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off); + int yaz_pqf_error(YAZ_PQF_Parser p, const char **msg, size_t *off); A PQF parser is created and destructed by functions @@ -73,7 +72,7 @@ a Z39.50 RPN Query is returned which is created using ODR stream o. If parsing failed, a NULL pointer is returned. - Function yaz_pqf_scan takes a scan query in + Function yaz_pqf_scan takes a scan query in qbuf. 
If parsing was successful, the function returns attributes plus term pointer and modifies attributeSetId to hold attribute set for the @@ -91,12 +90,12 @@ #include <yaz/pquery.h> - Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf); + Z_RPNQuery *p_query_rpn(ODR o, oid_proto proto, const char *qbuf); - Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto, + Z_AttributesPlusTerm *p_query_scan(ODR o, oid_proto proto, Odr_oid **attributeSetP, const char *qbuf); - int p_query_attset (const char *arg); + int p_query_attset(const char *arg); The function p_query_rpn() takes as arguments an @@ -110,7 +109,7 @@ If the parse went well, p_query_rpn() returns a pointer to a Z_RPNQuery structure which can be - placed directly into a Z_SearchRequest. + placed directly into a Z_SearchRequest. If parsing failed, due to syntax error, a NULL pointer is returned. @@ -170,7 +169,7 @@ - The @attr operator is followed by an attribute specification + The @attr operator is followed by an attribute specification (attr-spec above). The specification consists of an optional attribute set, an attribute type-value pair and a sub-query. The attribute type-value pair is packed in one string: @@ -194,7 +193,7 @@ is used. This is the only encoding allowed in both versions 2 and 3 of the Z39.50 standard. - + Using Proximity Operators with PQF @@ -221,7 +220,7 @@ The proximity operator @prox is a special and more restrictive version of the conjunction operator - @and. Its semantics are described in + @and. 
Its semantics are described in section 3.7.2 (Proximity) of Z39.50 the standard itself, which can be read on-line at @@ -401,9 +400,9 @@ @or @and bob dylan @set Result-1 - + @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming" - + @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109 @@ -483,45 +482,51 @@ -- Proximity operator - + CCL queries The following queries are all valid: - + dylan - + "bob dylan" - + dylan or zimmerman - + set=1 - + (dylan and bob) or set=1 - + + righttrunc? + + "notrunc?" + + singlechar#mask + Assuming that the qualifiers ti, au and date are defined we may use: - + ti=self portrait - + au=(bob dylan and slow train coming) date>1980 and (ti=((self portrait))) - + - + CCL Qualifiers - + Qualifiers are used to direct the search to a particular searchable index, such as title (ti) and author indexes (au). The CCL standard @@ -547,13 +552,13 @@ A qualifier specification is of the form: - + - qualifier-name + qualifier-name [attributeset,]type=val - [attributeset,]type=val ... + [attributeset,]type=val ... - + where qualifier-name is the name of the qualifier to be used (eg. ti), @@ -562,7 +567,7 @@ val is attribute value. The type can be specified as an integer or as it be specified either as a single-letter: - u for use, + u for use, r for relation,p for position, s for structure,t for truncation or c for completeness. @@ -644,10 +649,10 @@ list of Bib-1 attributes - It is also possible to specify non-numeric attribute values, + It is also possible to specify non-numeric attribute values, which are used in combination with certain types. The special combinations are: - + Special attribute combos @@ -672,26 +677,37 @@ This does not set the structure at all. - + s=ol Each token in the term is ORed. (or-list). This does not set the structure at all. - + + s=ag + Tokens that appears as phrases (with blank in them) gets + structure phrase attached (4=1). 
Tokens that appear to be words + gets structure word attached (4=2). Phrases and words are + ANDed. This is a variant of s=al and s=pw, with the main + difference that words are not split (with operator AND) + but instead kept in one RPN token. This facility appeared + in YAZ 4.2.38. + + + r=o Allows ranges and the operators greather-than, less-than, ... equals. This sets Bib-1 relation attribute accordingly (relation ordered). A query construct is only treated as a range if dash is used and that is surrounded by white-space. So - -1980 is treated as term + -1980 is treated as term "-1980" not <= 1980. If - 1980 is used, however, that is treated as a range. - + r=r Similar to r=o but assumes that terms are non-negative (not prefixed with -). @@ -703,27 +719,27 @@ r=r is available in YAZ 2.0.24 or later. - + t=l Allows term to be left-truncated. If term is of the form ?x, the resulting Type-1 term is x and truncation is left. - + t=r Allows term to be right-truncated. If term is of the form x?, the resulting Type-1 term is x and truncation is right. - + t=n If term is does not include ?, the truncation attribute is set to none (100). - + t=b Allows term to be both left&right truncated. If term is of the form ?x?, the @@ -740,6 +756,15 @@ + t=z + Allows masking anywhere in a term, thus fully supporting + # (mask one character) and ? (zero or more of any). + If masking is used, trunction is set to 104 (Z39.58 in term) + and the term is converted accordingly to Z39.58 masking term - + actually the same truncation as CCL itself. + + +
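Reviewer sketch for the truncation rows touched in this hunk (t=l, t=r, t=n, t=b and the new t=z). The following self-contained C fragment does not call YAZ; it only illustrates how a term could be classified by where ? and # occur. The names trunc_kind and classify_truncation are hypothetical. The precedence is also an assumption: the simple forms ?x, x? and ?x? are reported first, and any # or interior ? is reported as Z39.58 masking. The real CCL parser additionally depends on which t= values the qualifier enables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for the outcomes described in the table.
   The documentation text gives numeric values only for "none" (100)
   and Z39.58 masking (104); the other values are not assumed here. */
enum trunc_kind {
    TRUNC_NONE,   /* no ? in term: t=n               */
    TRUNC_LEFT,   /* term of the form ?x: t=l        */
    TRUNC_RIGHT,  /* term of the form x?: t=r        */
    TRUNC_BOTH,   /* term of the form ?x?: t=b       */
    TRUNC_Z3958   /* # or ? anywhere else: t=z       */
};

/* Classify a single CCL term by the position of its masking characters.
   Assumption: all of t=l, t=r, t=b, t=n and t=z are enabled. */
static enum trunc_kind classify_truncation(const char *term)
{
    size_t len, i;
    int lead, trail;

    assert(term != NULL);
    len = strlen(term);
    if (len == 0)
        return TRUNC_NONE;

    lead = (term[0] == '?');
    trail = (len > 1 && term[len - 1] == '?');

    for (i = 0; i < len; i++) {
        /* single-character mask always needs Z39.58 style masking */
        if (term[i] == '#')
            return TRUNC_Z3958;
        /* a ? that is neither first nor last cannot be expressed
           as plain left/right truncation */
        if (term[i] == '?' && i != 0 && i != len - 1)
            return TRUNC_Z3958;
    }
    if (lead && trail)
        return TRUNC_BOTH;
    if (lead)
        return TRUNC_LEFT;
    if (trail)
        return TRUNC_RIGHT;
    return TRUNC_NONE;
}
```

With the example queries added earlier in this diff, classify_truncation("righttrunc?") yields TRUNC_RIGHT and classify_truncation("singlechar#mask") yields TRUNC_Z3958 under these assumptions.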
@@ -748,7 +773,7 @@ Consider the following definition: - + ti u=4 s=1 au u=1 s=1 @@ -757,7 +782,7 @@ date u=30 r=o - ti and au both set + ti and au both set structure attribute to phrase (s=1). ti sets the use-attribute to 4. au sets the @@ -770,7 +795,7 @@ You can combine attributes. To Search for "ranked title" you - can do + can do ti,ranked=knuth computer @@ -795,12 +820,12 @@ A qualifier alias is of the form: - q + q q1 q2 .. which declares q to - be an alias for q1, + be an alias for q1, q2... such that the CCL query q=x is equivalent to q1=x or q2=x or .... @@ -842,6 +867,11 @@ ? + mask + Masking character. Requires YAZ 4.2.58 or later + # + + field Specifies how multiple fields are to be combined. There are two modes: or: @@ -853,10 +883,10 @@ case - Specificies if CCL operatores and qualifiers should be - compared with case sensitivity or not. Specify 0 for - case sensitive; 1 for case insensitive. - 0 + Specifies if CCL operators and qualifiers should be + compared with case sensitivity or not. Specify 1 for + case sensitive; 0 for case insensitive. + 1 @@ -908,8 +938,8 @@ To parse a simple string with a FIND query use the function -struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str, - int *error, int *pos); +struct ccl_rpn_node *ccl_find_str(CCL_bibset bibset, const char *str, + int *error, int *pos); which takes the CCL profile (bibset) and query @@ -963,7 +993,7 @@ struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str, - If you are new to CQL, read the + If you are new to CQL, read the Gentle Introduction. @@ -1046,7 +1076,7 @@ int cql_parser_stdio(CQL_parser cp, FILE *f); uses a FILE handle which is opened for reading.
- + CQL tree The the query string is valid, the CQL parser @@ -1061,12 +1091,13 @@ struct cql_node *cql_parser_result(CQL_parser cp); a pointer to the root node of the resulting tree. - Each node in a CQL tree is represented by a + Each node in a CQL tree is represented by a struct cql_node. It is defined as follows: #define CQL_NODE_ST 1 #define CQL_NODE_BOOL 2 +#define CQL_NODE_SORT 3 struct cql_node { int which; union { @@ -1084,10 +1115,17 @@ struct cql_node { struct cql_node *right; struct cql_node *modifiers; } boolean; + struct { + char *index; + struct cql_node *next; + struct cql_node *modifiers; + struct cql_node *search; + } sort; } u; }; - There are two node types: search term (ST) and boolean (BOOL). + There are three node types: search term (ST), boolean (BOOL) + and sortby (SORT). A modifier is treated as a search term too. @@ -1132,8 +1170,8 @@ struct cql_node { - The boolean node represents both and, - or, not as well as + The boolean node represents and, + or, not + proximity. @@ -1150,12 +1188,16 @@ struct cql_node { + + The sort node represents both the SORTBY clause. + + CQL to PQF conversion Conversion to PQF (and Z39.50 RPN) is tricky by the fact that the resulting RPN depends on the Z39.50 target - capabilities (combinations of supported attributes). + capabilities (combinations of supported attributes). In addition, the CQL and SRU operates on index prefixes (URI or strings), whereas the RPN uses Object Identifiers for attribute sets. @@ -1173,7 +1215,7 @@ void cql_transform_close(cql_transform_t ct); either an already open FILE or from a filename respectively. - The handle is destroyed by cql_transform_close + The handle is destroyed by cql_transform_close in which case no further reference of the handle is allowed. 
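Reviewer sketch for the CQL tree hunk above, which adds a third node type (SORT) next to ST and BOOL. The fragment below is a minimal illustration of walking such a tree, not the YAZ implementation. It uses a simplified stand-in, struct demo_node, rather than the real struct cql_node: the boolean and sort members mirror the fields visible in the diff (minus modifiers), while the st members are an assumption because that part of the struct is not shown in the hunk. Since the hunk does not say which sort node carries the search, the walk follows both the next and search pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for CQL_NODE_ST, CQL_NODE_BOOL and CQL_NODE_SORT. */
#define DEMO_NODE_ST   1
#define DEMO_NODE_BOOL 2
#define DEMO_NODE_SORT 3

/* Simplified, hypothetical mirror of struct cql_node. */
struct demo_node {
    int which;
    union {
        struct {                      /* assumed fields */
            const char *index;
            const char *term;
        } st;
        struct {
            const char *value;        /* and, or, not, prox */
            struct demo_node *left;
            struct demo_node *right;
        } boolean;
        struct {
            const char *index;        /* sort key */
            struct demo_node *next;   /* further sort keys, if any */
            struct demo_node *search; /* the query being sorted */
        } sort;
    } u;
};

/* Count the search-term leaves reachable from n. */
static int count_terms(const struct demo_node *n)
{
    if (n == NULL)
        return 0;
    switch (n->which) {
    case DEMO_NODE_ST:
        return 1;
    case DEMO_NODE_BOOL:
        return count_terms(n->u.boolean.left)
             + count_terms(n->u.boolean.right);
    case DEMO_NODE_SORT:
        return count_terms(n->u.sort.next)
             + count_terms(n->u.sort.search);
    default:
        assert(!"unknown node type");
        return 0;
    }
}
```

Code written against the old two-type tree would hit the default branch on a SORT node, which is the main reason a caller's switch needs a third case after this change.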
@@ -1183,7 +1225,7 @@ void cql_transform_close(cql_transform_t ct); int cql_transform_buf(cql_transform_t ct, struct cql_node *cn, char *out, int max); - This function converts the CQL tree cn + This function converts the CQL tree cn using handle ct. For the resulting PQF, you supply a buffer out which must be able to hold at at least max @@ -1236,7 +1278,7 @@ int cql_transform_FILE(cql_transform_t ct, Specification of CQL to RPN mappings - The file supplied to functions + The file supplied to functions cql_transform_open_FILE, cql_transform_open_fname follows a structure found in many Unix utilities. @@ -1274,7 +1316,7 @@ int cql_transform_FILE(cql_transform_t ct, - This pattern is invoked when a CQL index, such as + This pattern is invoked when a CQL index, such as dc.title is converted. set and name are the context set and index name respectively. @@ -1288,7 +1330,7 @@ int cql_transform_FILE(cql_transform_t ct, If this pattern is not defined, the mapping will fail. - The pattern, + The pattern, index.set.* is used when no other index pattern is matched. @@ -1353,7 +1395,7 @@ int cql_transform_FILE(cql_transform_t ct, This pattern specifies how a CQL structure is mapped to RPN. Note that this CQL pattern is somewhat to similar to - CQL pattern relation. + CQL pattern relation. The type is a CQL relation. @@ -1388,7 +1430,7 @@ int cql_transform_FILE(cql_transform_t ct, This specification defines a CQL context set for a given prefix. - The value on the right hand side is the URI for the set - + The value on the right hand side is the URI for the set - not RPN. All prefixes used in index patterns must be defined this way. @@ -1421,7 +1463,7 @@ int cql_transform_FILE(cql_transform_t ct, index.cql.serverChoice = 1=1016 index.dc.title = 1=4 index.dc.subject = 1=21 - + relation.< = 2=1 relation.eq = 2=3 relation.scr = 2=3 @@ -1516,7 +1558,7 @@ int cql_transform_FILE(cql_transform_t ct, containing XCQL). 
int cql_to_xml_buf(struct cql_node *cn, char *out, int max); -void cql_to_xml(struct cql_node *cn, +void cql_to_xml(struct cql_node *cn, void (*pr)(const char *buf, void *client_data), void *client_data); void cql_to_xml_stdio(struct cql_node *cn, FILE *f); @@ -1532,13 +1574,34 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); a file. + + PQF to CQL conversion + + Conversion from PQF to CQL is offered by the two functions shown + below. The former uses a generic stream for result. The latter + puts result in a WRBUF (string container). + +#include <yaz/rpn2cql.h> + +int cql_transform_rpn2cql_stream(cql_transform_t ct, + void (*pr)(const char *buf, void *client_data), + void *client_data, + Z_RPNQuery *q); + +int cql_transform_rpn2cql_wrbuf(cql_transform_t ct, + WRBUF w, + Z_RPNQuery *q); + + The configuration is the same as used in CQL to PQF conversions. + + Object Identifiers The basic YAZ representation of an OID is an array of integers, - terminated with the value -1. This integer is of type + terminated with the value -1. This integer is of type Odr_oid. @@ -1621,12 +1684,12 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); Odr_oid *odr_oiddup(ODR odr, const Odr_oid *o); - + OIDs can be compared with oid_oidcmp which returns zero if the two OIDs provided are identical; non-zero otherwise. - + OID database From YAZ version 3 and later, the oident system has been replaced @@ -1639,21 +1702,21 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); convert from string to OID or other way around. - Unfortunately, whenever we supply a string we must also specify the + Unfortunately, whenever we supply a string we must also specify the OID class. The class is necessary because some strings correspond to multiple OIDs. An example of such a string is - Bib-1 which may either be an attribute-set + Bib-1 which may either be an attribute-set or a diagnostic-set. 
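Reviewer sketch for the OID paragraphs in the context above: an OID as an array of integers terminated by -1, and a comparison that returns zero only for identical OIDs. The fragment is self-contained and does not use YAZ; demo_oid, demo_oid_len and demo_oid_cmp are hypothetical stand-ins for Odr_oid and oid_oidcmp, and the exact integer width of Odr_oid is not assumed. The two sample OIDs are the Bib-1 attribute set (1.2.840.10003.3.1) and the Bib-1 diagnostic set (1.2.840.10003.4.1), which shows why the name "Bib-1" alone is ambiguous without an OID class.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Odr_oid. */
typedef int demo_oid;

/* Number of arcs before the -1 terminator. */
static size_t demo_oid_len(const demo_oid *o)
{
    size_t n = 0;

    assert(o != NULL);
    while (o[n] != -1)
        n++;
    return n;
}

/* Returns zero if both OIDs are identical, non-zero otherwise. */
static int demo_oid_cmp(const demo_oid *a, const demo_oid *b)
{
    assert(a != NULL && b != NULL);
    while (*a == *b) {
        if (*a == -1)
            return 0;   /* both reached the terminator together */
        a++;
        b++;
    }
    return 1;           /* arcs differ, or one OID is a prefix */
}

/* Same name "Bib-1", two different OID classes. */
static const demo_oid demo_bib1_attset[]  = { 1, 2, 840, 10003, 3, 1, -1 };
static const demo_oid demo_bib1_diagset[] = { 1, 2, 840, 10003, 4, 1, -1 };
```

Here demo_oid_cmp(demo_bib1_attset, demo_bib1_diagset) is non-zero, so a lookup by the string "Bib-1" has to be told which class is meant.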
- Applications using the YAZ database should include + Applications using the YAZ database should include yaz/oid_db.h. A YAZ database handle is of type yaz_oid_db_t. Actually that's a pointer. You need not think deal with that. YAZ has a built-in database which can be considered "constant" for - most purposes. + most purposes. We can get hold that by using function yaz_oid_std. @@ -1672,7 +1735,7 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); We can create an OID for the Bib-1 attribute set on the ODR stream odr with: - Odr_oid *bib1 = + Odr_oid *bib1 = yaz_string_to_oid_odr(yaz_oid_std(), CLASS_ATTSET, "Bib-1", odr); This is more complex than using odr_getoidbystr. @@ -1683,7 +1746,7 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); Standard OIDs - + All the object identifers in the standard OID database as returned by yaz_oid_std can referenced directly in a @@ -1764,16 +1827,16 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); not call nmem_init or nmem_exit unless you're absolute sure what you're doing. Note that in previous &yaz; versions you'd have to call - nmem_init yourself. + nmem_init yourself. Log - &yaz; has evolved a fairly complex log system which should be useful both + &yaz; has evolved a fairly complex log system which should be useful both for debugging &yaz; itself, debugging applications that use &yaz;, and for - production use of those applications. + production use of those applications. The log functions are declared in header yaz/log.h @@ -1827,7 +1890,7 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); logged. This string should be a comma-separated list of log level names, and can contain both hard-coded names and dynamic ones. The log level calculation starts with YLOG_DEFAULT_LEVEL and adds a bit - for each word it meets, unless the word starts with a '-', in which case it + for each word it meets, unless the word starts with a '-', in which case it clears the bit. 
If the string 'none' is found, all bits are cleared. Typically this string comes from the command-line, often identified by -v. The @@ -1836,15 +1899,15 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); - Each module should check what log bits it should be used, by calling + Each module should check what log bits it should be used, by calling yaz_log_module_level with a suitable name for the module. The name is cleared from a preceding path and an extension, if any, so it is quite possible to use __FILE__ for it. If the name has been passed to yaz_log_mask_str, the routine returns a non-zero bitmask, which should then be used in consequent calls to yaz_log. (It can also be tested, so as to avoid unnecessary calls to - yaz_log, in time-critical places, or when the log entry would take time - to construct.) + yaz_log, in time-critical places, or when the log entry would take time + to construct.) @@ -1906,20 +1969,20 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); The log system is almost independent of the rest of &yaz;, the only important dependence is of nmem, and that only for - using the semaphore definition there. + using the semaphore definition there. The dynamic log levels and log rotation were introduced in &yaz; 2.0.28. At the same time, the log bit names were changed from - LOG_something to YLOG_something, + LOG_something to YLOG_something, to avoid collision with syslog.h. - + MARC - + YAZ provides a fast utility for working with MARC records. Early versions of the MARC utility only allowed decoding of ISO2709. @@ -2039,7 +2102,7 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); - The actual conversion functions are + The actual conversion functions are yaz_marc_decode_buf and yaz_marc_decode_wrbuf which decodes and encodes a MARC record. The former function operates on simple buffers, the @@ -2097,7 +2160,7 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); collection. 
- The leader is encoded as element l with the + The leader is encoded as element l with the leader content as its (text) value. @@ -2140,7 +2203,7 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); YAZ version 2.1.20 or later includes a Retrieval facility tool which allows a SRU/Z39.50 to describe itself and perform record conversions. The idea is the following: - + @@ -2188,13 +2251,13 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); Retrieval XML format - All elements should be covered by namespace + All elements should be covered by namespace http://indexdata.com/yaz . The root element node must be retrievalinfo. The retrievalinfo must include one or - more retrieval elements. Each + more retrieval elements. Each retrieval defines specific combination of syntax, name and identifier supported by this retrieval service. @@ -2235,7 +2298,7 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); - The retrieval may include one + The retrieval may include one backend element. If a backend element is given, it specifies how the records are retrieved by some backend and how the records are converted from the backend to @@ -2256,8 +2319,8 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); marc - The marc element specifies a conversion - to - and from ISO2709 encoded MARC and + The marc element specifies a conversion + to - and from ISO2709 encoded MARC and &acro.marcxml;/MarcXchange. The following attributes may be specified: @@ -2265,7 +2328,7 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); inputformat (REQUIRED) - Format of input. Supported values are + Format of input. Supported values are marc (for ISO2709); and xml for MARCXML/MarcXchange. @@ -2275,8 +2338,8 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); outputformat (REQUIRED) - Format of output. Supported values are - line (MARC line format); + Format of output. Supported values are + line (MARC line format); marcxml (for MARCXML), marc (ISO2709), marcxhcange (for MarcXchange). 
@@ -2387,19 +2450,69 @@ void cql_to_xml_stdio(struct cql_node *cn, FILE *f); + + + MARCXML backend + + SRW/SRU and Solr backends returns records in XML. + If they return MARCXML or MarcXchange, the retrieval module + can convert those into ISO2709 formats, most commonly USMARC + (AKA MARC21). + In this example, the backend returns MARCXML for schema="marcxml". + + + + + + + + + + + + + + +]]> + + + This means that our frontend supports: + + + + MARC21 records (any element set name) in MARC-8 encoding. + + + + + MARCXML records for element-set=marcxml + + + + + Dublin core records for element-set=dc. + + + + + + API It should be easy to use the retrieval systems from applications. Refer to the headers - yaz/retrieval.h and + yaz/retrieval.h and yaz/record_conv.h.
- +