-<!-- $Id: tools.xml,v 1.11 2002-05-30 20:57:31 adam Exp $ -->
<chapter id="tools"><title>Supporting Tools</title>
<para>
<token>Z_RPNQuery</token> structure. Some programmers will prefer to
construct the query manually, perhaps using
<function>odr_malloc()</function> to simplify memory management.
- The &yaz; distribution includes two separate, query-generating tools
+ The &yaz; distribution includes three separate, query-generating tools
that may be of use to you.
</para>
- <sect2><title id="PQF">Prefix Query Format</title>
+ <sect2 id="PQF"><title>Prefix Query Format</title>
<para>
Since RPN or reverse polish notation is really just a fancy way of
in simple test applications and scripting environments (like Tcl). The
demonstration client included with YAZ uses the PQF.
</para>
+
+ <note>
+ <para>
+ The PQF have been adopted by other parties developing Z39.50
+ software. It is often referred to as Prefix Query Notation
+ - PQN.
+ </para>
+ </note>
<para>
- The PQF is defined by the pquery module in the YAZ library. The
- <filename>pquery.h</filename> file provides the declaration of the
- functions
+ The PQF is defined by the pquery module in the YAZ library.
+ There are two sets of function that have similar behavior. First
+ set operates on a PQF parser handle, second set doesn't. First set
+ set of functions are more flexible than the second set. Second set
+ is obsolete and is only provided to ensure backwards compatibility.
</para>
- <screen>
-Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
+ <para>
+ First set of functions all operate on a PQF parser handle:
+ </para>
+ <synopsis>
+ #include <yaz/pquery.h>
-Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
- Odr_oid **attributeSetP, const char *qbuf);
+ YAZ_PQF_Parser yaz_pqf_create (void);
-int p_query_attset (const char *arg);
- </screen>
+ void yaz_pqf_destroy (YAZ_PQF_Parser p);
+
+ Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf);
+
+ Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o,
+ Odr_oid **attributeSetId, const char *qbuf);
+
+
+ int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off);
+ </synopsis>
+ <para>
+ A PQF parser is created and destructed by functions
+ <function>yaz_pqf_create</function> and
+ <function>yaz_pqf_destroy</function> respectively.
+ Function <function>yaz_pqf_parse</function> parses query given
+ by string <literal>qbuf</literal>. If parsing was successful,
+ a Z39.50 RPN Query is returned which is created using ODR stream
+ <literal>o</literal>. If parsing failed, a NULL pointer is
+ returned.
+ Function <function>yaz_pqf_scan</function> takes a scan query in
+ <literal>qbuf</literal>. If parsing was successful, the function
+ returns attributes plus term pointer and modifies
+ <literal>attributeSetId</literal> to hold attribute set for the
+ scan request - both allocated using ODR stream <literal>o</literal>.
+ If parsing failed, yaz_pqf_scan returns a NULL pointer.
+ Error information for bad queries can be obtained by a call to
+ <function>yaz_pqf_error</function> which returns an error code and
+ modifies <literal>*msg</literal> to point to an error description,
+ and modifies <literal>*off</literal> to the offset within last
+ query were parsing failed.
+ </para>
+ <para>
+ The second set of functions are declared as follows:
+ </para>
+ <synopsis>
+ #include <yaz/pquery.h>
+
+ Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
+
+ Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
+ Odr_oid **attributeSetP, const char *qbuf);
+
+ int p_query_attset (const char *arg);
+ </synopsis>
<para>
The function <function>p_query_rpn()</function> takes as arguments an
&odr; stream (see section <link linkend="odr">The ODR Module</link>)
<para>
If the parse went well, <function>p_query_rpn()</function> returns a
pointer to a <literal>Z_RPNQuery</literal> structure which can be
- placed directly into a <literal>Z_SearchRequest</literal>.
+ placed directly into a <literal>Z_SearchRequest</literal>.
+ If parsing failed, due to syntax error, a NULL pointer is returned.
</para>
<para>
-
The <literal>p_query_attset</literal> specifies which attribute set
to use if the query doesn't specify one by the
<literal>@attrset</literal> operator.
<literallayout>
query ::= top-set query-struct.
- top-set ::= [ '@attrset' string ]
+ top-set ::= [ '@attrset' string ]
- query-struct ::= attr-spec | simple | complex
+ query-struct ::= attr-spec | simple | complex | '@term' term-type query
- attr-spec ::= '@attr' [ string ] string query-struct
+ attr-spec ::= '@attr' [ string ] string query-struct
complex ::= operator query-struct query-struct.
result-set ::= '@set' string.
- term ::= string
+ term ::= string.
proximity ::= exclusion distance ordered relation which-code unit-code.
which-code ::= 'known' | 'private' | integer.
unit-code ::= integer.
+
+ term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'.
</literallayout>
<para>
</para>
<para>
- The following are all examples of valid queries in the PQF.
+ The @attr operator is followed by an attribute specification
+ (<literal>attr-spec</literal> above). The specification consists
+ of an optional attribute set, an attribute type-value pair and
+ a sub-query. The attribute type-value pair is packed in one string:
+ an attribute type, an equals sign, and an attribute value, like this:
+ <literal>@attr 1=1003</literal>.
+ The type is always an integer but the value may be either an
+ integer or a string (if it doesn't start with a digit character).
+ A string attribute-value is encoded as a Type-1 ``complex''
+ attribute with the list of values containing the single string
+ specified, and including no semantic indicators.
</para>
- <screen>
- dylan
-
- "bob dylan"
-
- @or "dylan" "zimmerman"
-
- @set Result-1
-
- @or @and bob dylan @set Result-1
-
- @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
-
- @attr 4=1 @attr 1=4 "self portrait"
-
- @prox 0 3 1 2 k 2 dylan zimmerman
-
- @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
- </screen>
+ <para>
+ Version 3 of the Z39.50 specification defines various encoding of terms.
+ Use <literal>@term </literal> <replaceable>type</replaceable>
+ <replaceable>string</replaceable>,
+ where type is one of: <literal>general</literal>,
+ <literal>numeric</literal> or <literal>string</literal>
+ (for InternationalString).
+ If no term type has been given, the <literal>general</literal> form
+ is used. This is the only encoding allowed in both versions 2 and 3
+ of the Z39.50 standard.
+ </para>
+
+ <sect3 id="PQF-prox">
+ <title>Using Proximity Operators with PQF</title>
+ <note>
+ <para>
+ This is an advanced topic, describing how to construct
+ queries that make very specific requirements on the
+ relative location of their operands.
+ You may wish to skip this section and go straight to
+ <link linkend="pqf-examples">the example PQF queries</link>.
+ </para>
+ <para>
+ <warning>
+ <para>
+ Most Z39.50 servers do not support proximity searching, or
+ support only a small subset of the full functionality that
+ can be expressed using the PQF proximity operator. Be
+ aware that the ability to <emphasis>express</emphasis> a
+ query in PQF is no guarantee that any given server will
+ be able to <emphasis>execute</emphasis> it.
+ </para>
+ </warning>
+ </para>
+ </note>
+ <para>
+ The proximity operator <literal>@prox</literal> is a special
+ and more restrictive version of the conjunction operator
+ <literal>@and</literal>. Its semantics are described in
+ section 3.7.2 (Proximity) of Z39.50 the standard itself, which
+ can be read on-line at
+ <ulink url="&url.z39.50.proximity;"/>
+ </para>
+ <para>
+ In PQF, the proximity operation is represented by a sequence
+ of the form
+ <screen>
+@prox <replaceable>exclusion</replaceable> <replaceable>distance</replaceable> <replaceable>ordered</replaceable> <replaceable>relation</replaceable> <replaceable>which-code</replaceable> <replaceable>unit-code</replaceable>
+ </screen>
+ in which the meanings of the parameters are as described in in
+ the standard, and they can take the following values:
+ <itemizedlist>
+ <listitem><formalpara><title>exclusion</title><para>
+ 0 = false (i.e. the proximity condition specified by the
+ remaining parameters must be satisfied) or
+ 1 = true (the proximity condition specified by the
+ remaining parameters must <emphasis>not</emphasis> be
+ satisifed).
+ </para></formalpara></listitem>
+ <listitem><formalpara><title>distance</title><para>
+ An integer specifying the difference between the locations
+ of the operands: e.g. two adjacent words would have
+ distance=1 since their locations differ by one unit.
+ </para></formalpara></listitem>
+ <listitem><formalpara><title>ordered</title><para>
+ 1 = ordered (the operands must occur in the order the
+ query specifies them) or
+ 0 = unordered (they may appear in either order).
+ </para></formalpara></listitem>
+ <listitem><formalpara><title>relation</title><para>
+ Recognised values are
+ 1 (lessThan),
+ 2 (lessThanOrEqual),
+ 3 (equal),
+ 4 (greaterThanOrEqual),
+ 5 (greaterThan) and
+ 6 (notEqual).
+ </para></formalpara></listitem>
+ <listitem><formalpara><title>which-code</title><para>
+ <literal>known</literal>
+ or
+ <literal>k</literal>
+ (the unit-code parameter is taken from the well-known list
+ of alternatives described in below) or
+ <literal>private</literal>
+ or
+ <literal>p</literal>
+ (the unit-code paramater has semantics specific to an
+ out-of-band agreement such as a profile).
+ </para></formalpara></listitem>
+ <listitem><formalpara><title>unit-code</title><para>
+ If the which-code parameter is <literal>known</literal>
+ then the recognised values are
+ 1 (character),
+ 2 (word),
+ 3 (sentence),
+ 4 (paragraph),
+ 5 (section),
+ 6 (chapter),
+ 7 (document),
+ 8 (element),
+ 9 (subelement),
+ 10 (elementType) and
+ 11 (byte).
+ If which-code is <literal>private</literal> then the
+ acceptable values are determined by the profile.
+ </para></formalpara></listitem>
+ </itemizedlist>
+ (The numeric values of the relation and well-known unit-code
+ parameters are taken straight from
+ <ulink url="&url.z39.50.proximity.asn1;"
+ >the ASN.1</ulink> of the proximity structure in the standard.)
+ </para>
+ </sect3>
+ <sect3 id="pqf-examples"><title>PQF queries</title>
+
+ <example id="example.pqf.simple.terms">
+ <title>PQF queries using simple terms</title>
+ <para>
+ <screen>
+ dylan
+
+ "bob dylan"
+ </screen>
+ </para>
+ </example>
+ <example id="pqf.example.pqf.boolean.operators">
+ <title>PQF boolean operators</title>
+ <para>
+ <screen>
+ @or "dylan" "zimmerman"
+
+ @and @or dylan zimmerman when
+
+ @and when @or dylan zimmerman
+ </screen>
+ </para>
+ </example>
+ <example id="example.pqf.result.sets">
+ <title>PQF references to result sets</title>
+ <para>
+ <screen>
+ @set Result-1
+
+ @and @set seta @set setb
+ </screen>
+ </para>
+ </example>
+ <example id="example.pqf.attributes">
+ <title>Attributes for terms</title>
+ <para>
+ <screen>
+ @attr 1=4 computer
+
+ @attr 1=4 @attr 4=1 "self portrait"
+
+ @attrset exp1 @attr 1=1 CategoryList
+
+ @attr gils 1=2008 Copenhagen
+
+ @attr 1=/book/title computer
+ </screen>
+ </para>
+ </example>
+ <example id="example.pqf.proximity">
+ <title>PQF Proximity queries</title>
+ <para>
+ <screen>
+ @prox 0 3 1 2 k 2 dylan zimmerman
+ </screen>
+ <note><para>
+ Here the parameters 0, 3, 1, 2, k and 2 represent exclusion,
+ distance, ordered, relation, which-code and unit-code, in that
+ order. So:
+ <itemizedlist>
+ <listitem><para>
+ exclusion = 0: the proximity condition must hold
+ </para></listitem>
+ <listitem><para>
+ distance = 3: the terms must be three units apart
+ </para></listitem>
+ <listitem><para>
+ ordered = 1: they must occur in the order they are specified
+ </para></listitem>
+ <listitem><para>
+ relation = 2: lessThanOrEqual (to the distance of 3 units)
+ </para></listitem>
+ <listitem><para>
+ which-code is ``known'', so the standard unit-codes are used
+ </para></listitem>
+ <listitem><para>
+ unit-code = 2: word.
+ </para></listitem>
+ </itemizedlist>
+ So the whole proximity query means that the words
+ <literal>dylan</literal> and <literal>zimmerman</literal> must
+ both occur in the record, in that order, differing in position
+ by three or fewer words (i.e. with two or fewer words between
+ them.) The query would find ``Bob Dylan, aka. Robert
+ Zimmerman'', but not ``Bob Dylan, born as Robert Zimmerman''
+ since the distance in this case is four.
+ </para></note>
+ </para>
+ </example>
+ <example id="example.pqf.search.term.type">
+ <title>PQF specification of search term type</title>
+ <para>
+ <screen>
+ @term string "a UTF-8 string, maybe?"
+ </screen>
+ </para>
+ </example>
+ <example id="example.pqf.mixed.queries">
+ <title>PQF mixed queries</title>
+ <para>
+ <screen>
+ @or @and bob dylan @set Result-1
+
+ @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
+
+ @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
+ </screen>
+ <note>
+ <para>
+ The last of these examples is a spatial search: in
+ <ulink url="http://www.gils.net/prof_v2.html#sec_7_4"
+ >the GILS attribute set</ulink>,
+ access point
+ 2038 indicates West Bounding Coordinate and
+ 2030 indicates East Bounding Coordinate,
+ so the query is for areas extending from -114 degrees
+ to no more than -109 degrees.
+ </para>
+ </note>
+ </para>
+ </example>
+ </sect3>
</sect2>
- <sect2><title id="CCL">Common Command Language</title>
+ <sect2 id="CCL"><title>CCL</title>
<para>
Not all users enjoy typing in prefix query structures and numerical
attribute values, even in a minimalistic test client. In the library
- world, the more intuitive Common Command Language (or ISO 8777) has
- enjoyed some popularity - especially before the widespread
+ world, the more intuitive Common Command Language - CCL (ISO 8777)
+ has enjoyed some popularity - especially before the widespread
availability of graphical interfaces. It is still useful in
applications where you for some reason or other need to provide a
symbolic language for expressing boolean query structures.
</para>
- <para>
- The <ulink url="http://europagate.dtv.dk/">EUROPAGATE</ulink>
- research project working under the Libraries programme
- of the European Commission's DG XIII has, amongst other useful tools,
- implemented a general-purpose CCL parser which produces an output
- structure that can be trivially converted to the internal RPN
- representation of &yaz; (The <literal>Z_RPNQuery</literal> structure).
- Since the CCL utility - along with the rest of the software
- produced by EUROPAGATE - is made freely available on a liberal
- license, it is included as a supplement to &yaz;.
- </para>
-
- <sect3><title>CCL Syntax</title>
+ <sect3 id="ccl.syntax">
+ <title>CCL Syntax</title>
<para>
The CCL parser obeys the following grammar for the FIND argument.
The syntax is annotated by in the lines prefixed by
- <literal>‐‐</literal>.
+ <literal>--</literal>.
</para>
<screen>
-- Proximity operator
</screen>
-
- <para>
- The following queries are all valid:
- </para>
-
- <screen>
- dylan
-
- "bob dylan"
-
- dylan or zimmerman
-
- set=1
-
- (dylan and bob) or set=1
-
- </screen>
- <para>
- Assuming that the qualifiers <literal>ti</literal>, <literal>au</literal>
- and <literal>date</literal> are defined we may use:
- </para>
-
- <screen>
- ti=self portrait
-
- au=(bob dylan and slow train coming)
-
- date>1980 and (ti=((self portrait)))
-
- </screen>
-
+
+ <example id="example.ccl.queries">
+ <title>CCL queries</title>
+ <para>
+ The following queries are all valid:
+ </para>
+
+ <screen>
+ dylan
+
+ "bob dylan"
+
+ dylan or zimmerman
+
+ set=1
+
+ (dylan and bob) or set=1
+
+ </screen>
+ <para>
+ Assuming that the qualifiers <literal>ti</literal>,
+ <literal>au</literal>
+ and <literal>date</literal> are defined we may use:
+ </para>
+
+ <screen>
+ ti=self portrait
+
+ au=(bob dylan and slow train coming)
+
+ date>1980 and (ti=((self portrait)))
+
+ </screen>
+ </example>
+
</sect3>
- <sect3><title>CCL Qualifiers</title>
-
+ <sect3 id="ccl.qualifiers">
+ <title>CCL Qualifiers</title>
+
<para>
Qualifiers are used to direct the search to a particular searchable
index, such as title (ti) and author indexes (au). The CCL standard
suggest a few short-hand notations. You can customize the CCL parser
to support a particular set of qualifiers to reflect the current target
profile. Traditionally, a qualifier would map to a particular
- use-attribute within the BIB-1 attribute set. However, you could also
- define qualifiers that would set, for example, the
- structure-attribute.
- </para>
-
- <para>
- Consider a scenario where the target support ranked searches in the
- title-index. In this case, the user could specify
- </para>
-
- <screen>
- ti,ranked=knuth computer
- </screen>
- <para>
- and the <literal>ranked</literal> would map to relation=relevance
- (2=102) and the <literal>ti</literal> would map to title (1=4).
- </para>
-
- <para>
- A "profile" with a set predefined CCL qualifiers can be read from a
- file. The YAZ client reads its CCL qualifiers from a file named
- <filename>default.bib</filename>. Each line in the file has the form:
- </para>
-
- <para>
- <replaceable>qualifier-name</replaceable>
- <replaceable>type</replaceable>=<replaceable>val</replaceable>
- <replaceable>type</replaceable>=<replaceable>val</replaceable> ...
- </para>
-
- <para>
- where <replaceable>qualifier-name</replaceable> is the name of the
- qualifier to be used (eg. <literal>ti</literal>),
- <replaceable>type</replaceable> is a BIB-1 category type and
- <replaceable>val</replaceable> is the corresponding BIB-1 attribute
- value.
- The <replaceable>type</replaceable> can be either numeric or it may be
- either <literal>u</literal> (use), <literal>r</literal> (relation),
- <literal>p</literal> (position), <literal>s</literal> (structure),
- <literal>t</literal> (truncation) or <literal>c</literal> (completeness).
- The <replaceable>qualifier-name</replaceable> <literal>term</literal>
- has a special meaning.
- The types and values for this definition is used when
- <emphasis>no</emphasis> qualifiers are present.
+ use-attribute within the BIB-1 attribute set. It is also
+ possible to set other attributes, such as the structure
+ attribute.
</para>
<para>
- Consider the following definition:
+ A CCL profile is a set of predefined CCL qualifiers that may be
+ read from a file or set in the CCL API.
+ The YAZ client reads its CCL qualifiers from a file named
+ <filename>default.bib</filename>. There are four types of
+ lines in a CCL profile: qualifier specification,
+ qualifier alias, comments and directives.
</para>
-
- <screen>
- ti u=4 s=1
- au u=1 s=1
- term s=105
- </screen>
- <para>
- Two qualifiers are defined, <literal>ti</literal> and
- <literal>au</literal>.
- They both set the structure-attribute to phrase (1).
- <literal>ti</literal>
- sets the use-attribute to 4. <literal>au</literal> sets the
- use-attribute to 1.
- When no qualifiers are used in the query the structure-attribute is
- set to free-form-text (105).
- </para>
-
+ <sect4 id="ccl.qualifier.specification">
+ <title>Qualifier specification</title>
+ <para>
+ A qualifier specification is of the form:
+ </para>
+
+ <para>
+ <replaceable>qualifier-name</replaceable>
+ [<replaceable>attributeset</replaceable><literal>,</literal>]<replaceable>type</replaceable><literal>=</literal><replaceable>val</replaceable>
+ [<replaceable>attributeset</replaceable><literal>,</literal>]<replaceable>type</replaceable><literal>=</literal><replaceable>val</replaceable> ...
+ </para>
+
+ <para>
+ where <replaceable>qualifier-name</replaceable> is the name of the
+ qualifier to be used (eg. <literal>ti</literal>),
+ <replaceable>type</replaceable> is attribute type in the attribute
+ set (Bib-1 is used if no attribute set is given) and
+ <replaceable>val</replaceable> is attribute value.
+ The <replaceable>type</replaceable> can be specified as an
+ integer or as it be specified either as a single-letter:
+ <literal>u</literal> for use,
+ <literal>r</literal> for relation,<literal>p</literal> for position,
+ <literal>s</literal> for structure,<literal>t</literal> for truncation
+ or <literal>c</literal> for completeness.
+ The attributes for the special qualifier name <literal>term</literal>
+ are used when no CCL qualifier is given in a query.
+ <table id="ccl.common.bib1.attributes">
+ <title>Common Bib-1 attributes</title>
+ <tgroup cols="2">
+ <colspec colwidth="2*" colname="type"></colspec>
+ <colspec colwidth="9*" colname="description"></colspec>
+ <thead>
+ <row>
+ <entry>Type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><literal>u=</literal><replaceable>value</replaceable></entry>
+ <entry>
+ Use attribute (1). Common use attributes are
+ 1 Personal-name, 4 Title, 7 ISBN, 8 ISSN, 30 Date,
+ 62 Subject, 1003 Author), 1016 Any. Specify value
+ as an integer.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>r=</literal><replaceable>value</replaceable></entry>
+ <entry>
+ Relation attribute (2). Common values are
+ 1 <, 2 <=, 3 =, 4 >=, 5 >, 6 <>,
+ 100 phonetic, 101 stem, 102 relevance, 103 always matches.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>p=</literal><replaceable>value</replaceable></entry>
+ <entry>
+ Position attribute (3). Values: 1 first in field, 2
+ first in any subfield, 3 any position in field.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>s=</literal><replaceable>value</replaceable></entry>
+ <entry>
+ Structure attribute (4). Values: 1 phrase, 2 word,
+ 3 key, 4 year, 5 date, 6 word list, 100 date (un),
+ 101 name (norm), 102 name (un), 103 structure, 104 urx,
+ 105 free-form-text, 106 document-text, 107 local-number,
+ 108 string, 109 numeric string.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>t=</literal><replaceable>value</replaceable></entry>
+ <entry>
+ Truncation attribute (5). Values: 1 right, 2 left,
+ 3 left& right, 100 none, 101 process #, 102 regular-1,
+ 103 regular-2, 104 CCL.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>c=</literal><replaceable>value</replaceable></entry>
+ <entry>
+ Completeness attribute (6). Values: 1 incomplete subfield,
+ 2 complete subfield, 3 complete field.
+ </entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+ </para>
+ <para>
+ Refer to <xref linkend="bib1"/> or the complete
+ <ulink url="&url.z39.50.attset.bib1;">list of Bib-1 attributes</ulink>
+ </para>
+ <para>
+ It is also possible to specify non-numeric attribute values,
+ which are used in combination with certain types.
+ The special combinations are:
+
+ <table id="ccl.special.attribute.combos">
+ <title>Special attribute combos</title>
+ <tgroup cols="2">
+ <colspec colwidth="2*" colname="name"></colspec>
+ <colspec colwidth="9*" colname="description"></colspec>
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><literal>s=pw</literal></entry><entry>
+ The structure is set to either word or phrase depending
+ on the number of tokens in a term (phrase-word).
+ </entry>
+ </row>
+ <row>
+ <entry><literal>s=al</literal></entry><entry>
+ Each token in the term is ANDed. (and-list).
+ This does not set the structure at all.
+ </entry>
+ </row>
+
+ <row><entry><literal>s=ol</literal></entry><entry>
+ Each token in the term is ORed. (or-list).
+ This does not set the structure at all.
+ </entry>
+ </row>
+
+ <row><entry><literal>r=o</literal></entry><entry>
+ Allows ranges and the operators greather-than, less-than, ...
+ equals.
+ This sets Bib-1 relation attribute accordingly (relation
+ ordered). A query construct is only treated as a range if
+ dash is used and that is surrounded by white-space. So
+ <literal>-1980</literal> is treated as term
+ <literal>"-1980"</literal> not <literal><= 1980</literal>.
+ If <literal>- 1980</literal> is used, however, that is
+ treated as a range.
+ </entry>
+ </row>
+
+ <row><entry><literal>r=r</literal></entry><entry>
+ Similar to <literal>r=o</literal> but assumes that terms
+ are non-negative (not prefixed with <literal>-</literal>).
+ Thus, a dash will always be treated as a range.
+ The construct <literal>1980-1990</literal> is
+ treated as a range with <literal>r=r</literal> but as a
+ single term <literal>"1980-1990"</literal> with
+ <literal>r=o</literal>. The special attribute
+ <literal>r=r</literal> is available in YAZ 2.0.24 or later.
+ </entry>
+ </row>
+
+ <row><entry><literal>t=l</literal></entry><entry>
+ Allows term to be left-truncated.
+ If term is of the form <literal>?x</literal>, the resulting
+ Type-1 term is <literal>x</literal> and truncation is left.
+ </entry>
+ </row>
+
+ <row><entry><literal>t=r</literal></entry><entry>
+ Allows term to be right-truncated.
+ If term is of the form <literal>x?</literal>, the resulting
+ Type-1 term is <literal>x</literal> and truncation is right.
+ </entry>
+ </row>
+
+ <row><entry><literal>t=n</literal></entry><entry>
+ If term is does not include <literal>?</literal>, the
+ truncation attribute is set to none (100).
+ </entry>
+ </row>
+
+ <row><entry><literal>t=b</literal></entry><entry>
+ Allows term to be both left&right truncated.
+ If term is of the form <literal>?x?</literal>, the
+ resulting term is <literal>x</literal> and trunctation is
+ set to both left&right.
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </para>
+ <example id="example.ccl.profile"><title>CCL profile</title>
+ <para>
+ Consider the following definition:
+ </para>
+
+ <screen>
+ ti u=4 s=1
+ au u=1 s=1
+ term s=105
+ ranked r=102
+ date u=30 r=o
+ </screen>
+ <para>
+ <literal>ti</literal> and <literal>au</literal> both set
+ structure attribute to phrase (s=1).
+ <literal>ti</literal>
+ sets the use-attribute to 4. <literal>au</literal> sets the
+ use-attribute to 1.
+ When no qualifiers are used in the query the structure-attribute is
+ set to free-form-text (105) (rule for <literal>term</literal>).
+ The <literal>date</literal> sets the relation attribute to
+ the relation used in the CCL query and sets the use attribute
+ to 30 (Bib-1 Date).
+ </para>
+ <para>
+ You can combine attributes. To Search for "ranked title" you
+ can do
+ <screen>
+ ti,ranked=knuth computer
+ </screen>
+ which will set relation=ranked, use=title, structure=phrase.
+ </para>
+ <para>
+ Query
+ <screen>
+ date > 1980
+ </screen>
+ is a valid query. But
+ <screen>
+ ti > 1980
+ </screen>
+ is invalid.
+ </para>
+ </example>
+ </sect4>
+ <sect4 id="ccl.qualifier.alias">
+ <title>Qualifier alias</title>
+ <para>
+ A qualifier alias is of the form:
+ </para>
+ <para>
+ <replaceable>q</replaceable>
+ <replaceable>q1</replaceable> <replaceable>q2</replaceable> ..
+ </para>
+ <para>
+ which declares <replaceable>q</replaceable> to
+ be an alias for <replaceable>q1</replaceable>,
+ <replaceable>q2</replaceable>... such that the CCL
+ query <replaceable>q=x</replaceable> is equivalent to
+ <replaceable>q1=x or q2=x or ...</replaceable>.
+ </para>
+ </sect4>
+
+ <sect4 id="ccl.comments">
+ <title>Comments</title>
+ <para>
+ Lines with white space or lines that begin with
+ character <literal>#</literal> are treated as comments.
+ </para>
+ </sect4>
+
+ <sect4 id="ccl.directives">
+ <title>Directives</title>
+ <para>
+ Directive specifications takes the form
+ </para>
+ <para><literal>@</literal><replaceable>directive</replaceable> <replaceable>value</replaceable>
+ </para>
+ <table id="ccl.directives.table">
+ <title>CCL directives</title>
+ <tgroup cols="3">
+ <colspec colwidth="2*" colname="name"></colspec>
+ <colspec colwidth="8*" colname="description"></colspec>
+ <colspec colwidth="1*" colname="default"></colspec>
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Description</entry>
+ <entry>Default</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>truncation</entry>
+ <entry>Truncation character</entry>
+ <entry><literal>?</literal></entry>
+ </row>
+ <row>
+ <entry>field</entry>
+ <entry>Specifies how multiple fields are to be
+ combined. There are two modes: <literal>or</literal>:
+ multiple qualifier fields are ORed,
+ <literal>merge</literal>: attributes for the qualifier
+ fields are merged and assigned to one term.
+ </entry>
+ <entry><literal>merge</literal></entry>
+ </row>
+ <row>
+ <entry>case</entry>
+ <entry>Specificies if CCL operatores and qualifiers should be
+ compared with case sensitivity or not. Specify 0 for
+ case sensitive; 1 for case insensitive.</entry>
+ <entry><literal>0</literal></entry>
+ </row>
+
+ <row>
+ <entry>and</entry>
+ <entry>Specifies token for CCL operator AND.</entry>
+ <entry><literal>and</literal></entry>
+ </row>
+
+ <row>
+ <entry>or</entry>
+ <entry>Specifies token for CCL operator OR.</entry>
+ <entry><literal>or</literal></entry>
+ </row>
+
+ <row>
+ <entry>not</entry>
+ <entry>Specifies token for CCL operator NOT.</entry>
+ <entry><literal>not</literal></entry>
+ </row>
+
+ <row>
+ <entry>set</entry>
+ <entry>Specifies token for CCL operator SET.</entry>
+ <entry><literal>set</literal></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect4>
</sect3>
- <sect3><title>CCL API</title>
+ <sect3 id="ccl.api">
+ <title>CCL API</title>
<para>
All public definitions can be found in the header file
<filename>ccl.h</filename>. A profile identifier is of type
</para>
</sect3>
</sect2>
+ <sect2 id="cql"><title>CQL</title>
+ <para>
+ <ulink url="&url.cql;">CQL</ulink>
+ - Common Query Language - was defined for the
+ <ulink url="&url.sru;">SRU</ulink> protocol.
+ In many ways CQL has a similar syntax to CCL.
+ The objective of CQL is different. Where CCL aims to be
+ an end-user language, CQL is <emphasis>the</emphasis> protocol
+ query language for SRU.
+ </para>
+ <tip>
+ <para>
+ If you are new to CQL, read the
+ <ulink url="&url.cql.intro;">Gentle Introduction</ulink>.
+ </para>
+ </tip>
+ <para>
+ The CQL parser in &yaz; provides the following:
+ <itemizedlist>
+ <listitem>
+ <para>
+ It parses and validates a CQL query.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ It generates a C structure that allows you to convert
+ a CQL query to some other query language, such as SQL.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The parser converts a valid CQL query to PQF, thus providing a
+ way to use CQL for both SRU servers and Z39.50 targets at the
+ same time.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The parser converts CQL to
+ <ulink url="&url.xcql;">XCQL</ulink>.
+ XCQL is an XML representation of CQL.
+ XCQL is part of the SRU specification. However, since SRU
+ supports CQL only, we don't expect XCQL to be widely used.
+ Furthermore, CQL has the advantage over XCQL that it is
+ easy to read.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <sect3 id="cql.parsing"><title>CQL parsing</title>
+ <para>
+ A CQL parser is represented by the <literal>CQL_parser</literal>
+ handle. Its contents should be considered &yaz; internal (private).
+ <synopsis>
+#include <yaz/cql.h>
+
+typedef struct cql_parser *CQL_parser;
+
+CQL_parser cql_parser_create(void);
+void cql_parser_destroy(CQL_parser cp);
+ </synopsis>
+ A parser is created by <function>cql_parser_create</function> and
+ is destroyed by <function>cql_parser_destroy</function>.
+ </para>
+ <para>
+ To parse a CQL query string, the following function
+ is provided:
+ <synopsis>
+int cql_parser_string(CQL_parser cp, const char *str);
+ </synopsis>
+ A CQL query is parsed by the <function>cql_parser_string</function>
+ which takes a query <parameter>str</parameter>.
+ If the query was valid (no syntax errors), then zero is returned;
+ otherwise -1 is returned to indicate a syntax error.
+ </para>
+ <para>
+ <synopsis>
+int cql_parser_stream(CQL_parser cp,
+ int (*getbyte)(void *client_data),
+ void (*ungetbyte)(int b, void *client_data),
+ void *client_data);
+
+int cql_parser_stdio(CQL_parser cp, FILE *f);
+ </synopsis>
+ The functions <function>cql_parser_stream</function> and
+ <function>cql_parser_stdio</function> parses a CQL query
+ - just like <function>cql_parser_string</function>.
+ The only difference is that the CQL query can be
+ fed to the parser in different ways.
+ The <function>cql_parser_stream</function> uses a generic
+ byte stream as input. The <function>cql_parser_stdio</function>
+ uses a <literal>FILE</literal> handle which is opened for reading.
+ </para>
+ </sect3>
+
+ <sect3 id="cql.tree"><title>CQL tree</title>
+ <para>
+ The the query string is valid, the CQL parser
+ generates a tree representing the structure of the
+ CQL query.
+ </para>
+ <para>
+ <synopsis>
+struct cql_node *cql_parser_result(CQL_parser cp);
+ </synopsis>
+ <function>cql_parser_result</function> returns the
+ a pointer to the root node of the resulting tree.
+ </para>
+ <para>
+ Each node in a CQL tree is represented by a
+ <literal>struct cql_node</literal>.
+ It is defined as follows:
+ <synopsis>
+#define CQL_NODE_ST 1
+#define CQL_NODE_BOOL 2
+struct cql_node {
+ int which;
+ union {
+ struct {
+ char *index;
+ char *index_uri;
+ char *term;
+ char *relation;
+ char *relation_uri;
+ struct cql_node *modifiers;
+ } st;
+ struct {
+ char *value;
+ struct cql_node *left;
+ struct cql_node *right;
+ struct cql_node *modifiers;
+ } boolean;
+ } u;
+};
+ </synopsis>
+ There are two node types: search term (ST) and boolean (BOOL).
+ A modifier is treated as a search term too.
+ </para>
+ <para>
+ The search term node has five members:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>index</literal>: index for search term.
+ If an index is unspecified for a search term,
+ <literal>index</literal> will be NULL.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>index_uri</literal>: index URi for search term
+ or NULL if none could be resolved for the index.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>term</literal>: the search term itself.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>relation</literal>: relation for search term.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>relation_uri</literal>: relation URI for search term.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>modifiers</literal>: relation modifiers for search
+ term. The <literal>modifiers</literal> list itself of cql_nodes
+ each of type <literal>ST</literal>.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ The boolean node represents both <literal>and</literal>,
+ <literal>or</literal>, not as well as
+ proximity.
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>left</literal> and <literal>right</literal>: left
+ - and right operand respectively.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>modifiers</literal>: proximity arguments.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ </sect3>
+ <sect3 id="cql.to.pqf"><title>CQL to PQF conversion</title>
+ <para>
+ Conversion to PQF (and Z39.50 RPN) is tricky by the fact
+ that the resulting RPN depends on the Z39.50 target
+ capabilities (combinations of supported attributes).
+ In addition, the CQL and SRU operates on index prefixes
+ (URI or strings), whereas the RPN uses Object Identifiers
+ for attribute sets.
+ </para>
+ <para>
+ The CQL library of &yaz; defines a <literal>cql_transform_t</literal>
+ type. It represents a particular mapping between CQL and RPN.
+ This handle is created and destroyed by the functions:
+ <synopsis>
+cql_transform_t cql_transform_open_FILE (FILE *f);
+cql_transform_t cql_transform_open_fname(const char *fname);
+void cql_transform_close(cql_transform_t ct);
+ </synopsis>
+ The first two functions create a tranformation handle from
+ either an already open FILE or from a filename respectively.
+ </para>
+ <para>
+ The handle is destroyed by <function>cql_transform_close</function>
+ in which case no further reference of the handle is allowed.
+ </para>
+ <para>
+ When a <literal>cql_transform_t</literal> handle has been created
+ you can convert to RPN.
+ <synopsis>
+int cql_transform_buf(cql_transform_t ct,
+ struct cql_node *cn, char *out, int max);
+ </synopsis>
+ This function converts the CQL tree <literal>cn</literal>
+ using handle <literal>ct</literal>.
+ For the resulting PQF, you supply a buffer <literal>out</literal>
+ which must be able to hold at at least <literal>max</literal>
+ characters.
+ </para>
+ <para>
+ If conversion failed, <function>cql_transform_buf</function>
+ returns a non-zero SRU error code; otherwise zero is returned
+ (conversion successful). The meanings of the numeric error
+ codes are listed in the SRU specifications at
+ <ulink url="&url.sru.diagnostics.list;"/>
+ </para>
+ <para>
+ If conversion fails, more information can be obtained by calling
+ <synopsis>
+int cql_transform_error(cql_transform_t ct, char **addinfop);
+ </synopsis>
+ This function returns the most recently returned numeric
+ error-code and sets the string-pointer at
+ <literal>*addinfop</literal> to point to a string containing
+ additional information about the error that occurred: for
+ example, if the error code is 15 (``Illegal or unsupported context
+ set''), the additional information is the name of the requested
+ context set that was not recognised.
+ </para>
+ <para>
+ The SRU error-codes may be translated into brief human-readable
+ error messages using
+ <synopsis>
+const char *cql_strerror(int code);
+ </synopsis>
+ </para>
+ <para>
+ If you wish to be able to produce a PQF result in a different
+ way, there are two alternatives.
+ <synopsis>
+void cql_transform_pr(cql_transform_t ct,
+ struct cql_node *cn,
+ void (*pr)(const char *buf, void *client_data),
+ void *client_data);
+
+int cql_transform_FILE(cql_transform_t ct,
+ struct cql_node *cn, FILE *f);
+ </synopsis>
+ The former function produces output to a user-defined
+ output stream. The latter writes the result to an already
+ open <literal>FILE</literal>.
+ </para>
+ </sect3>
+ <sect3 id="cql.to.rpn">
+ <title>Specification of CQL to RPN mappings</title>
+ <para>
+ The file supplied to functions
+ <function>cql_transform_open_FILE</function>,
+ <function>cql_transform_open_fname</function> follows
+ a structure found in many Unix utilities.
+ It consists of mapping specifications - one per line.
+ Lines starting with <literal>#</literal> are ignored (comments).
+ </para>
+ <para>
+ Each line is of the form
+ <literallayout>
+ <replaceable>CQL pattern</replaceable><literal> = </literal> <replaceable> RPN equivalent</replaceable>
+ </literallayout>
+ </para>
+ <para>
+ An RPN pattern is a simple attribute list. Each attribute pair
+ takes the form:
+ <literallayout>
+ [<replaceable>set</replaceable>] <replaceable>type</replaceable><literal>=</literal><replaceable>value</replaceable>
+ </literallayout>
+ The attribute <replaceable>set</replaceable> is optional.
+ The <replaceable>type</replaceable> is the attribute type,
+ <replaceable>value</replaceable> the attribute value.
+ </para>
+ <para>
+ The character <literal>*</literal> (asterisk) has special meaning
+ when used in the RPN pattern.
+ Each occurrence of <literal>*</literal> is substituted with the
+ CQL matching name (index, relation, qualifier etc).
+ This facility can be used to copy a CQL name verbatim to the RPN result.
+ </para>
+ <para>
+ The following CQL patterns are recognized:
+ <variablelist>
+ <varlistentry><term>
+ <literal>index.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
+ </term>
+ <listitem>
+ <para>
+ This pattern is invoked when a CQL index, such as
+ dc.title is converted. <replaceable>set</replaceable>
+ and <replaceable>name</replaceable> are the context set and index
+ name respectively.
+ Typically, the RPN specifies an equivalent use attribute.
+ </para>
+ <para>
+ For terms not bound by an index the pattern
+ <literal>index.cql.serverChoice</literal> is used.
+ Here, the prefix <literal>cql</literal> is defined as
+ <literal>http://www.loc.gov/zing/cql/cql-indexes/v1.0/</literal>.
+ If this pattern is not defined, the mapping will fail.
+ </para>
+ <para>
+ The pattern,
+ <literal>index.</literal><replaceable>set</replaceable><literal>.*</literal>
+ is used when no other index pattern is matched.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term>
+ <literal>qualifier.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
+ (DEPRECATED)
+ </term>
+ <listitem>
+ <para>
+ For backwards compatibility, this is recognised as a synonym of
+ <literal>index.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term>
+ <literal>relation.</literal><replaceable>relation</replaceable>
+ </term>
+ <listitem>
+ <para>
+ This pattern specifies how a CQL relation is mapped to RPN.
+ <replaceable>pattern</replaceable> is name of relation
+ operator. Since <literal>=</literal> is used as
+ separator between CQL pattern and RPN, CQL relations
+ including <literal>=</literal> cannot be
+ used directly. To avoid a conflict, the names
+ <literal>ge</literal>,
+ <literal>eq</literal>,
+ <literal>le</literal>,
+ must be used for CQL operators, greater-than-or-equal,
+ equal, less-than-or-equal respectively.
+ The RPN pattern is supposed to include a relation attribute.
+ </para>
+ <para>
+ For terms not bound by a relation, the pattern
+ <literal>relation.scr</literal> is used. If the pattern
+ is not defined, the mapping will fail.
+ </para>
+ <para>
+ The special pattern, <literal>relation.*</literal> is used
+ when no other relation pattern is matched.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>
+ <literal>relationModifier.</literal><replaceable>mod</replaceable>
+ </term>
+ <listitem>
+ <para>
+ This pattern specifies how a CQL relation modifier is mapped to RPN.
+ The RPN pattern is usually a relation attribute.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>
+ <literal>structure.</literal><replaceable>type</replaceable>
+ </term>
+ <listitem>
+ <para>
+ This pattern specifies how a CQL structure is mapped to RPN.
+ Note that this CQL pattern is somewhat to similar to
+ CQL pattern <literal>relation</literal>.
+ The <replaceable>type</replaceable> is a CQL relation.
+ </para>
+ <para>
+ The pattern, <literal>structure.*</literal> is used
+ when no other structure pattern is matched.
+ Usually, the RPN equivalent specifies a structure attribute.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>
+ <literal>position.</literal><replaceable>type</replaceable>
+ </term>
+ <listitem>
+ <para>
+ This pattern specifies how the anchor (position) of
+ CQL is mapped to RPN.
+ The <replaceable>type</replaceable> is one
+ of <literal>first</literal>, <literal>any</literal>,
+ <literal>last</literal>, <literal>firstAndLast</literal>.
+ </para>
+ <para>
+ The pattern, <literal>position.*</literal> is used
+ when no other position pattern is matched.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>
+ <literal>set.</literal><replaceable>prefix</replaceable>
+ </term>
+ <listitem>
+ <para>
+ This specification defines a CQL context set for a given prefix.
+ The value on the right hand side is the URI for the set -
+ <emphasis>not</emphasis> RPN. All prefixes used in
+ index patterns must be defined this way.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>
+ <literal>set</literal>
+ </term>
+ <listitem>
+ <para>
+ This specification defines a default CQL context set for index names.
+ The value on the right hand side is the URI for the set.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+ <example id="example.cql.to.rpn.mapping">
+ <title>CQL to RPN mapping file</title>
+ <para>
+ This simple file defines two context sets, three indexes and three
+ relations, a position pattern and a default structure.
+ </para>
+ <programlisting><![CDATA[
+ set.cql = http://www.loc.gov/zing/cql/context-sets/cql/v1.1/
+ set.dc = http://www.loc.gov/zing/cql/dc-indexes/v1.0/
+
+ index.cql.serverChoice = 1=1016
+ index.dc.title = 1=4
+ index.dc.subject = 1=21
+
+ relation.< = 2=1
+ relation.eq = 2=3
+ relation.scr = 2=3
+
+ position.any = 3=3 6=1
+
+ structure.* = 4=1
+]]>
+ </programlisting>
+ <para>
+ With the mappings above, the CQL query
+ <screen>
+ computer
+ </screen>
+ is converted to the PQF:
+ <screen>
+ @attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer"
+ </screen>
+ by rules <literal>index.cql.serverChoice</literal>,
+ <literal>relation.scr</literal>, <literal>structure.*</literal>,
+ <literal>position.any</literal>.
+ </para>
+ <para>
+ CQL query
+ <screen>
+ computer^
+ </screen>
+ is rejected, since <literal>position.right</literal> is
+ undefined.
+ </para>
+ <para>
+ CQL query
+ <screen>
+ >my = "http://www.loc.gov/zing/cql/dc-indexes/v1.0/" my.title = x
+ </screen>
+ is converted to
+ <screen>
+ @attr 1=4 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "x"
+ </screen>
+ </para>
+ </example>
+ <example id="example.cql.to.rpn.string">
+ <title>CQL to RPN string attributes</title>
+ <para>
+ In this example we allow any index to be passed to RPN as
+ a use attribute.
+ </para>
+ <programlisting><![CDATA[
+ # Identifiers for prefixes used in this file. (index.*)
+ set.cql = info:srw/cql-context-set/1/cql-v1.1
+ set.rpn = http://bogus/rpn
+ set = http://bogus/rpn
+
+ # The default index when none is specified by the query
+ index.cql.serverChoice = 1=any
+
+ index.rpn.* = 1=*
+ relation.eq = 2=3
+ structure.* = 4=1
+ position.any = 3=3
+]]>
+ </programlisting>
+ <para>
+ The <literal>http://bogus/rpn</literal> context set is also the default
+ so we can make queries such as
+ <screen>
+ title = a
+ </screen>
+ which is converted to
+ <screen>
+ @attr 2=3 @attr 4=1 @attr 3=3 @attr 1=title "a"
+ </screen>
+ </para>
+ </example>
+ <example id="example.cql.to.rpn.bathprofile">
+ <title>CQL to RPN using Bath Profile</title>
+ <para>
+ The file <filename>etc/pqf.properties</filename> has mappings from
+ the Bath Profile and Dublin Core to RPN.
+ If YAZ is installed as a package it's usually located
+ in <filename>/usr/share/yaz/etc</filename> and part of the
+ development package, such as <literal>libyaz-dev</literal>.
+ </para>
+ </example>
+ </sect3>
+ <sect3 id="cql.xcql"><title>CQL to XCQL conversion</title>
+ <para>
+ Conversion from CQL to XCQL is trivial and does not
+ require a mapping to be defined.
+ There three functions to choose from depending on the
+ way you wish to store the resulting output (XML buffer
+ containing XCQL).
+ <synopsis>
+int cql_to_xml_buf(struct cql_node *cn, char *out, int max);
+void cql_to_xml(struct cql_node *cn,
+ void (*pr)(const char *buf, void *client_data),
+ void *client_data);
+void cql_to_xml_stdio(struct cql_node *cn, FILE *f);
+ </synopsis>
+ Function <function>cql_to_xml_buf</function> converts
+ to XCQL and stores result in a user supplied buffer of a given
+ max size.
+ </para>
+ <para>
+ <function>cql_to_xml</function> writes the result in
+ a user defined output stream.
+ <function>cql_to_xml_stdio</function> writes to a
+ a file.
+ </para>
+ </sect3>
+ </sect2>
</sect1>
<sect1 id="tools.oid"><title>Object Identifiers</title>
<para>
The basic YAZ representation of an OID is an array of integers,
- terminated with the value -1. The &odr; module provides two
- utility-functions to create and copy this type of data elements:
- </para>
-
- <screen>
- Odr_oid *odr_getoidbystr(ODR o, char *str);
- </screen>
-
- <para>
- Creates an OID based on a string-based representation using dots (.)
- to separate elements in the OID.
- </para>
-
- <screen>
- Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
- </screen>
-
- <para>
- Creates a copy of the OID referenced by the <emphasis>o</emphasis>
- parameter.
- Both functions take an &odr; stream as parameter. This stream is used to
- allocate memory for the data elements, which is released on a
- subsequent call to <function>odr_reset()</function> on that stream.
+ terminated with the value -1. This integer is of type
+ <literal>Odr_oid</literal>.
</para>
-
- <para>
- The OID module provides a higher-level representation of the
- family of object identifiers which describe the Z39.50 protocol and its
- related objects. The definition of the module interface is given in
- the <filename>oid.h</filename> file.
- </para>
-
<para>
- The interface is mainly based on the <literal>oident</literal> structure.
- The definition of this structure looks like this:
+ Fundamental OID operations and the type <literal>Odr_oid</literal>
+ are defined in <filename>yaz/oid_util.h</filename>.
</para>
-
- <screen>
-typedef struct oident
-{
- oid_proto proto;
- oid_class oclass;
- oid_value value;
- int oidsuffix[OID_SIZE];
- char *desc;
-} oident;
- </screen>
-
- <para>
- The proto field takes one of the values
- </para>
-
- <screen>
- PROTO_Z3950
- PROTO_SR
- </screen>
-
<para>
- If you don't care about talking to SR-based implementations (few
- exist, and they may become fewer still if and when the ISO SR and ANSI
- Z39.50 documents are merged into a single standard), you can ignore
- this field on incoming packages, and always set it to PROTO_Z3950
- for outgoing packages.
+ An OID can either be declared as a automatic variable or it can
+ allocated using the memory utilities or ODR/NMEM. It's
+ guaranteed that an OID can fit in <literal>OID_SIZE</literal> integers.
</para>
+ <example id="tools.oid.bib1.1"><title>Create OID on stack</title>
+ <para>
+ We can create an OID for the Bib-1 attribute set with:
+ <screen>
+ Odr_oid bib1[OID_SIZE];
+ bib1[0] = 1;
+ bib1[1] = 2;
+ bib1[2] = 840;
+ bib1[3] = 10003;
+ bib1[4] = 3;
+ bib1[5] = 1;
+ bib1[6] = -1;
+ </screen>
+ </para>
+ </example>
<para>
-
- The oclass field takes one of the values
- </para>
-
- <screen>
- CLASS_APPCTX
- CLASS_ABSYN
- CLASS_ATTSET
- CLASS_TRANSYN
- CLASS_DIAGSET
- CLASS_RECSYN
- CLASS_RESFORM
- CLASS_ACCFORM
- CLASS_EXTSERV
- CLASS_USERINFO
- CLASS_ELEMSPEC
- CLASS_VARSET
- CLASS_SCHEMA
- CLASS_TAGSET
- CLASS_GENERAL
- </screen>
-
- <para>
- corresponding to the OID classes defined by the Z39.50 standard.
-
- Finally, the value field takes one of the values
- </para>
-
- <screen>
- VAL_APDU
- VAL_BER
- VAL_BASIC_CTX
- VAL_BIB1
- VAL_EXP1
- VAL_EXT1
- VAL_CCL1
- VAL_GILS
- VAL_WAIS
- VAL_STAS
- VAL_DIAG1
- VAL_ISO2709
- VAL_UNIMARC
- VAL_INTERMARC
- VAL_CCF
- VAL_USMARC
- VAL_UKMARC
- VAL_NORMARC
- VAL_LIBRISMARC
- VAL_DANMARC
- VAL_FINMARC
- VAL_MAB
- VAL_CANMARC
- VAL_SBN
- VAL_PICAMARC
- VAL_AUSMARC
- VAL_IBERMARC
- VAL_EXPLAIN
- VAL_SUTRS
- VAL_OPAC
- VAL_SUMMARY
- VAL_GRS0
- VAL_GRS1
- VAL_EXTENDED
- VAL_RESOURCE1
- VAL_RESOURCE2
- VAL_PROMPT1
- VAL_DES1
- VAL_KRB1
- VAL_PRESSET
- VAL_PQUERY
- VAL_PCQUERY
- VAL_ITEMORDER
- VAL_DBUPDATE
- VAL_EXPORTSPEC
- VAL_EXPORTINV
- VAL_NONE
- VAL_SETM
- VAL_SETG
- VAL_VAR1
- VAL_ESPEC1
- </screen>
-
- <para>
- again, corresponding to the specific OIDs defined by the standard.
+ And OID may also be filled from a string-based representation using
+ dots (.). This is achieved by function
+ <screen>
+ int oid_dotstring_to_oid(const char *name, Odr_oid *oid);
+ </screen>
+ This functions returns 0 if name could be converted; -1 otherwise.
</para>
-
- <para>
- The desc field contains a brief, mnemonic name for the OID in question.
+ <example id="tools.oid.bib1.2"><title>Using oid_oiddotstring_to_oid</title>
+ <para>
+ We can fill the Bib-1 attribute set OID easier with:
+ <screen>
+ Odr_oid bib1[OID_SIZE];
+ oid_oiddotstring_to_oid("1.2.840.10003.3.1", bib1);
+ </screen>
</para>
-
+ </example>
<para>
- The function
- </para>
-
+ We can also allocate an OID dynamically on a ODR stream with:
<screen>
- struct oident *oid_getentbyoid(int *o);
+ Odr_oid *odr_getoidbystr(ODR o, const char *str);
</screen>
-
- <para>
- takes as argument an OID, and returns a pointer to a static area
- containing an <literal>oident</literal> structure. You typically use
- this function when you receive a PDU containing an OID, and you wish
- to branch out depending on the specific OID value.
- </para>
-
- <para>
- The function
+ This creates an OID from string-based representation using dots.
+ This function take an &odr; stream as parameter. This stream is used to
+ allocate memory for the data elements, which is released on a
+ subsequent call to <function>odr_reset()</function> on that stream.
</para>
- <screen>
- int *oid_ent_to_oid(struct oident *ent, int *dst);
- </screen>
-
- <para>
- Takes as argument an <literal>oident</literal> structure - in which
- the <literal>proto</literal>, <literal>oclass</literal>/, and
- <literal>value</literal> fields are assumed to be set correctly -
- and returns a pointer to a the buffer as given by <literal>dst</literal>
- containing the base
- representation of the corresponding OID. The function returns
- NULL and the array dst is unchanged if a mapping couldn't place.
- The array <literal>dst</literal> should be at least of size
- <literal>OID_SIZE</literal>.
- </para>
- <para>
-
- The <function>oid_ent_to_oid()</function> function can be used whenever
- you need to prepare a PDU containing one or more OIDs. The separation of
- the <literal>protocol</literal> element from the remainder of the
- OID-description makes it simple to write applications that can
- communicate with either Z39.50 or OSI SR-based applications.
- </para>
+ <example id="tools.oid.bib1.3"><title>Using odr_getoidbystr</title>
+ <para>
+ We can create a OID for the Bib-1 attribute set with:
+ <screen>
+ Odr_oid *bib1 = odr_getoidbystr(odr, "1.2.840.10003.3.1");
+ </screen>
+ </para>
+ </example>
<para>
The function
+ <screen>
+ char *oid_oid_to_dotstring(const Odr_oid *oid, char *oidbuf)
+ </screen>
+ does the reverse of <function>oid_oiddotstring_to_oid</function>. It
+ converts an OID to the string-based representation using dots.
+ The supplied char buffer <literal>oidbuf</literal> holds the resulting
+ string and must be at least <literal>OID_STR_MAX</literal> in size.
</para>
- <screen>
- oid_value oid_getvalbyname(const char *name);
- </screen>
-
<para>
- takes as argument a mnemonic OID name, and returns the
- <literal>/value</literal> field of the first entry in the database that
- contains the given name in its <literal>desc</literal> field.
+ OIDs can be copied with <function>oid_oidcpy</function> which takes
+ two OID lists as arguments. Alternativly, an OID copy can be allocated
+ on a ODR stream with:
+ <screen>
+ Odr_oid *odr_oiddup(ODR odr, const Odr_oid *o);
+ </screen>
</para>
-
+
<para>
- Finally, the module provides the following utility functions, whose
- meaning should be obvious:
+ OIDs can be compared with <function>oid_oidcmp</function> which returns
+ zero if the two OIDs provided are identical; non-zero otherwise.
</para>
+
+ <sect2 id="tools.oid.database"><title>OID database</title>
+ <para>
+ From YAZ version 3 and later, the oident system has been replaced
+ by an OID database. OID database is a misnomer .. the old odient
+ system was also a database.
+ </para>
+ <para>
+ The OID database is really just a map between named Object Identifiers
+ (string) and their OID raw equivalents. Most operations either
+ convert from string to OID or other way around.
+ </para>
+ <para>
+ Unfortunately, whenever we supply a string we must also specify the
+ <emphasis>OID class</emphasis>. The class is necessary because some
+ strings correspond to multiple OIDs. An example of such a string is
+ <literal>Bib-1</literal> which may either be an attribute-set
+ or a diagnostic-set.
+ </para>
+ <para>
+ Applications using the YAZ database should include
+ <filename>yaz/oid_db.h</filename>.
+ </para>
+ <para>
+ A YAZ database handle is of type <literal>yaz_oid_db_t</literal>.
+ Actually that's a pointer. You need not think deal with that.
+ YAZ has a built-in database which can be considered "constant" for
+ most purposes.
+ We can get hold that by using function <function>yaz_oid_std</function>.
+ </para>
+ <para>
+ All functions with prefix <function>yaz_string_to_oid</function>
+ converts from class + string to OID. We have variants of this
+ operation due to different memory allocation strategies.
+ </para>
+ <para>
+ All functions with prefix
+ <function>yaz_oid_to_string</function> converts from OID to string
+ + class.
+ </para>
- <screen>
- void oid_oidcpy(int *t, int *s);
- void oid_oidcat(int *t, int *s);
- int oid_oidcmp(int *o1, int *o2);
- int oid_oidlen(int *o);
- </screen>
+ <example id="tools.oid.bib1.4"><title>Create OID with YAZ DB</title>
+ <para>
+ We can create an OID for the Bib-1 attribute set on the ODR stream
+ odr with:
+ <screen>
+ Odr_oid *bib1 =
+ yaz_string_to_oid_odr(yaz_oid_std(), CLASS_ATTSET, "Bib-1", odr);
+ </screen>
+ This is more complex than using <function>odr_getoidbystr</function>.
+ You would only use <function>yaz_string_to_oid_odr</function> when the
+ string (here Bib-1) is supplied by a user or configuration.
+ </para>
+ </example>
- <note>
+ </sect2>
+ <sect2 id="tools.oid.std"><title>Standard OIDs</title>
+
+ <para>
+ All the object identifers in the standard OID database as returned
+ by <function>yaz_oid_std</function> can referenced directly in a
+ program as a constant OID.
+ Each constant OID is prefixed with <literal>yaz_oid_</literal> -
+ followed by OID class (lowercase) - then by OID name (normalized and
+ lowercase).
+ </para>
<para>
- The OID module has been criticized - and perhaps rightly so
- - for needlessly abstracting the
- representation of OIDs. Other toolkits use a simple
- string-representation of OIDs with good results. In practice, we have
- found the interface comfortable and quick to work with, and it is a
- simple matter (for what it's worth) to create applications compatible
- with both ISO SR and Z39.50. Finally, the use of the
- <literal>/oident</literal> database is by no means mandatory.
- You can easily create your own system for representing OIDs, as long
- as it is compatible with the low-level integer-array representation
- of the ODR module.
+ See <xref linkend="list-oids"/> for list of all object identifiers
+ built into YAZ.
+ These are declared in <filename>yaz/oid_std.h</filename> but are
+ included by <filename>yaz/oid_db.h</filename> as well.
</para>
- </note>
+ <example id="tools.oid.bib1.5"><title>Use a built-in OID</title>
+ <para>
+ We can allocate our own OID filled with the constant OID for
+ Bib-1 with:
+ <screen>
+ Odr_oid *bib1 = odr_oiddup(o, yaz_oid_attset_bib1);
+ </screen>
+ </para>
+ </example>
+ </sect2>
</sect1>
-
<sect1 id="tools.nmem"><title>Nibble Memory</title>
<para>
release the associated memory again. For the structures describing the
Z39.50 PDUs and related structures, it is convenient to use the
memory-management system of the &odr; subsystem (see
- <link linkend="odr-use">Using ODR</link>). However, in some circumstances
+ <xref linkend="odr.use"/>). However, in some circumstances
where you might otherwise benefit from using a simple nibble memory
management system, it may be impractical to use
<function>odr_malloc()</function> and <function>odr_reset()</function>.
</para>
</sect1>
+
+ <sect1 id="tools.log"><title>Log</title>
+ <para>
+ &yaz; has evolved a fairly complex log system which should be useful both
+ for debugging &yaz; itself, debugging applications that use &yaz;, and for
+ production use of those applications.
+ </para>
+ <para>
+ The log functions are declared in header <filename>yaz/log.h</filename>
+ and implemented in <filename>src/log.c</filename>.
+ Due to name clash with syslog and some math utilities the logging
+ interface has been modified as of YAZ 2.0.29. The obsolete interface
+ is still available if in header file <filename>yaz/log.h</filename>.
+ The key points of the interface are:
+ </para>
+ <screen>
+ void yaz_log(int level, const char *fmt, ...)
+
+ void yaz_log_init(int level, const char *prefix, const char *name);
+ void yaz_log_init_file(const char *fname);
+ void yaz_log_init_level(int level);
+ void yaz_log_init_prefix(const char *prefix);
+ void yaz_log_time_format(const char *fmt);
+ void yaz_log_init_max_size(int mx);
+
+ int yaz_log_mask_str(const char *str);
+ int yaz_log_module_level(const char *name);
+ </screen>
+
+ <para>
+ The reason for the whole log module is the <function>yaz_log</function>
+ function. It takes a bitmask indicating the log levels, a
+ <literal>printf</literal>-like format string, and a variable number of
+ arguments to log.
+ </para>
+
+ <para>
+ The <literal>log level</literal> is a bit mask, that says on which level(s)
+ the log entry should be made, and optionally set some behaviour of the
+ logging. In the most simple cases, it can be one of <literal>YLOG_FATAL,
+ YLOG_DEBUG, YLOG_WARN, YLOG_LOG</literal>. Those can be combined with bits
+ that modify the way the log entry is written:<literal>YLOG_ERRNO,
+ YLOG_NOTIME, YLOG_FLUSH</literal>.
+ Most of the rest of the bits are deprecated, and should not be used. Use
+ the dynamic log levels instead.
+ </para>
+
+ <para>
+ Applications that use &yaz;, should not use the LOG_LOG for ordinary
+ messages, but should make use of the dynamic loglevel system. This consists
+ of two parts, defining the loglevel and checking it.
+ </para>
+
+ <para>
+ To define the log levels, the (main) program should pass a string to
+ <function>yaz_log_mask_str</function> to define which log levels are to be
+ logged. This string should be a comma-separated list of log level names,
+ and can contain both hard-coded names and dynamic ones. The log level
+ calculation starts with <literal>YLOG_DEFAULT_LEVEL</literal> and adds a bit
+ for each word it meets, unless the word starts with a '-', in which case it
+ clears the bit. If the string <literal>'none'</literal> is found,
+ all bits are cleared. Typically this string comes from the command-line,
+ often identified by <literal>-v</literal>. The
+ <function>yaz_log_mask_str</function> returns a log level that should be
+ passed to <function>yaz_log_init_level</function> for it to take effect.
+ </para>
+
+ <para>
+ Each module should check what log bits it should be used, by calling
+ <function>yaz_log_module_level</function> with a suitable name for the
+ module. The name is cleared from a preceding path and an extension, if any,
+ so it is quite possible to use <literal>__FILE__</literal> for it. If the
+ name has been passed to <function>yaz_log_mask_str</function>, the routine
+ returns a non-zero bitmask, which should then be used in consequent calls
+ to yaz_log. (It can also be tested, so as to avoid unnecessary calls to
+ yaz_log, in time-critical places, or when the log entry would take time
+ to construct.)
+ </para>
+
+ <para>
+ Yaz uses the following dynamic log levels:
+ <literal>server, session, request, requestdetail</literal> for the server
+ functionality.
+ <literal>zoom</literal> for the zoom client api.
+ <literal>ztest</literal> for the simple test server.
+ <literal>malloc, nmem, odr, eventl</literal> for internal debugging of yaz itself.
+ Of course, any program using yaz is welcome to define as many new ones, as
+ it needs.
+ </para>
+
+ <para>
+ By default the log is written to stderr, but this can be changed by a call
+ to <function>yaz_log_init_file</function> or
+ <function>yaz_log_init</function>. If the log is directed to a file, the
+ file size is checked at every write, and if it exceeds the limit given in
+ <function>yaz_log_init_max_size</function>, the log is rotated. The
+ rotation keeps one old version (with a <literal>.1</literal> appended to
+ the name). The size defaults to 1GB. Setting it to zero will disable the
+ rotation feature.
+ </para>
+
+ <screen>
+ A typical yaz-log looks like this
+ 13:23:14-23/11 yaz-ztest(1) [session] Starting session from tcp:127.0.0.1 (pid=30968)
+ 13:23:14-23/11 yaz-ztest(1) [request] Init from 'YAZ' (81) (ver 2.0.28) OK
+ 13:23:17-23/11 yaz-ztest(1) [request] Search Z: @attrset Bib-1 foo OK:7 hits
+ 13:23:22-23/11 yaz-ztest(1) [request] Present: [1] 2+2 OK 2 records returned
+ 13:24:13-23/11 yaz-ztest(1) [request] Close OK
+ </screen>
+
+ <para>
+ The log entries start with a time stamp. This can be omitted by setting the
+ <literal>YLOG_NOTIME</literal> bit in the loglevel. This way automatic tests
+ can be hoped to produce identical log files, that are easy to diff. The
+ format of the time stamp can be set with
+ <function>yaz_log_time_format</function>, which takes a format string just
+ like <function>strftime</function>.
+ </para>
+
+ <para>
+ Next in a log line comes the prefix, often the name of the program. For
+ yaz-based servers, it can also contain the session number. Then
+ comes one or more logbits in square brackets, depending on the logging
+ level set by <function>yaz_log_init_level</function> and the loglevel
+ passed to <function>yaz_log_init_level</function>. Finally comes the format
+ string and additional values passed to <function>yaz_log</function>
+ </para>
+
+ <para>
+ The log level <literal>YLOG_LOGLVL</literal>, enabled by the string
+ <literal>loglevel</literal>, will log all the log-level affecting
+ operations. This can come in handy if you need to know what other log
+ levels would be useful. Grep the logfile for <literal>[loglevel]</literal>.
+ </para>
+
+ <para>
+ The log system is almost independent of the rest of &yaz;, the only
+ important dependence is of <filename>nmem</filename>, and that only for
+ using the semaphore definition there.
+ </para>
+
+ <para>
+ The dynamic log levels and log rotation were introduced in &yaz; 2.0.28. At
+ the same time, the log bit names were changed from
+ <literal>LOG_something</literal> to <literal>YLOG_something</literal>,
+ to avoid collision with <filename>syslog.h</filename>.
+ </para>
+
+ </sect1>
+
+ <sect1 id="marc"><title>MARC</title>
+
+ <para>
+ YAZ provides a fast utility that decodes MARC records and
+ encodes to a varity of output formats. The MARC records must
+ be encoded in ISO2709.
+ </para>
+ <synopsis><![CDATA[
+ #include <yaz/marcdisp.h>
+
+ /* create handler */
+ yaz_marc_t yaz_marc_create(void);
+ /* destroy */
+ void yaz_marc_destroy(yaz_marc_t mt);
+
+ /* set XML mode YAZ_MARC_LINE, YAZ_MARC_SIMPLEXML, ... */
+ void yaz_marc_xml(yaz_marc_t mt, int xmlmode);
+ #define YAZ_MARC_LINE 0
+ #define YAZ_MARC_SIMPLEXML 1
+ #define YAZ_MARC_OAIMARC 2
+ #define YAZ_MARC_MARCXML 3
+ #define YAZ_MARC_ISO2709 4
+ #define YAZ_MARC_XCHANGE 5
+
+ /* supply iconv handle for character set conversion .. */
+ void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);
+
+ /* set debug level, 0=none, 1=more, 2=even more, .. */
+ void yaz_marc_debug(yaz_marc_t mt, int level);
+
+ /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
+ On success, result in *result with size *rsize. */
+ int yaz_marc_decode_buf (yaz_marc_t mt, const char *buf, int bsize,
+ char **result, int *rsize);
+
+ /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
+ On success, result in WRBUF */
+ int yaz_marc_decode_wrbuf (yaz_marc_t mt, const char *buf,
+ int bsize, WRBUF wrbuf);
+]]>
+ </synopsis>
+ <para>
+ A MARC conversion handle must be created by using
+ <function>yaz_marc_create</function> and destroyed
+ by calling <function>yaz_marc_destroy</function>.
+ </para>
+ <para>
+ All other function operate on a <literal>yaz_marc_t</literal> handle.
+ The output is specified by a call to <function>yaz_marc_xml</function>.
+ The <literal>xmlmode</literal> must be one of
+ <variablelist>
+ <varlistentry>
+ <term>YAZ_MARC_LINE</term>
+ <listitem>
+ <para>
+ A simple line-by-line format suitable for display but not
+ recommend for further (machine) processing.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_MARCXML</term>
+ <listitem>
+ <para>
+ The resulting record is converted to MARCXML.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_ISO2709</term>
+ <listitem>
+ <para>
+ The resulting record is converted to ISO2709 (MARC).
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ The actual conversion functions are
+ <function>yaz_marc_decode_buf</function> and
+ <function>yaz_marc_decode_wrbuf</function> which decodes and encodes
+ a MARC record. The former function operates on simple buffers, the
+ stores the resulting record in a WRBUF handle (WRBUF is a simple string
+ type).
+ </para>
+ <example id="example.marc.display">
+ <title>Display of MARC record</title>
+ <para>
+ The followint program snippet illustrates how the MARC API may
+ be used to convert a MARC record to the line-by-line format:
+ <programlisting><![CDATA[
+ void print_marc(const char *marc_buf, int marc_buf_size)
+ {
+ char *result; /* for result buf */
+ int result_len; /* for size of result */
+ yaz_marc_t mt = yaz_marc_create();
+ yaz_marc_xml(mt, YAZ_MARC_LINE);
+ yaz_marc_decode_buf(mt, marc_buf, marc_buf_size,
+ &result, &result_len);
+ fwrite(result, result_len, 1, stdout);
+ yaz_marc_destroy(mt); /* note that result is now freed... */
+ }
+]]>
+ </programlisting>
+ </para>
+ </example>
+ </sect1>
+
+ <sect1 id="tools.retrieval">
+ <title>Retrieval Facility</title>
+ <para>
+ YAZ version 2.1.20 or later includes a Retrieval facility tool
+ which allows a SRU/Z39.50 to describe itself and perform record
+ conversions. The idea is the following:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ An SRU/Z39.50 client sends a retrieval request which includes
+ a combination of the following parameters: syntax (format),
+ schema (or element set name).
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The retrieval facility is invoked with parameters in a
+ server/proxy. The retrieval facility matches the parameters a set of
+ "supported" retrieval types.
+ If there is no match, the retrieval signals an error
+ (syntax and / or schema not supported).
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ For a successful match, the backend is invoked with the same
+ or altered retrieval parameters (syntax, schema). If
+ a record is received from the backend, it is converted to the
+ frontend name / syntax.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The resulting record is sent back the client and tagged with
+ the frontend syntax / schema.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+ </para>
+ <para>
+ The Retrieval facility is driven by an XML configuration. The
+ configuration is neither Z39.50 ZeeRex or SRU ZeeRex. But it
+ should be easy to generate both of them from the XML configuration.
+ (unfortunately the two versions
+ of ZeeRex differ substantially in this regard).
+ </para>
+ <sect2 id="tools.retrieval.format">
+ <title>Retrieval XML format</title>
+ <para>
+ All elements should be covered by namespace
+ <literal>http://indexdata.com/yaz</literal> .
+ The root element node must be <literal>retrievalinfo</literal>.
+ </para>
+ <para>
+ The <literal>retrievalinfo</literal> must include one or
+ more <literal>retrieval</literal> elements. Each
+ <literal>retrieval</literal> defines specific combination of
+ syntax, name and identifier supported by this retrieval service.
+ </para>
+ <para>
+ The <literal>retrieval</literal> element may include any of the
+ following attributes:
+ <variablelist>
+ <varlistentry><term><literal>syntax</literal> (REQUIRED)</term>
+ <listitem>
+ <para>
+ Defines the record syntax. Possible values is any
+ of the names defined in YAZ' OID database or a raw
+ OID in (n.n ... n).
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term><literal>name</literal> (OPTIONAL)</term>
+ <listitem>
+ <para>
+ Defines the name of the retrieval format. This can be
+ any string. For SRU, the value, is equivalent to schema (short-hand);
+ for Z39.50 it's equivalent to simple element set name.
+ For YAZ 3.0.24 and later this name may be specified as a glob
+ expression with operators
+ <literal>*</literal> and <literal>?</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term><literal>identifier</literal> (OPTIONAL)</term>
+ <listitem>
+ <para>
+ Defines the URI schema name of the retrieval format. This can be
+ any string. For SRU, the value, is equivalent to URI schema.
+ For Z39.50, there is no equivalent.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ The <literal>retrieval</literal> may include one
+ <literal>backend</literal> element. If a <literal>backend</literal>
+ element is given, it specifies how the records are retrieved by
+ some backend and how the records are converted from the backend to
+ the "frontend".
+ </para>
+ <para>
+ The attributes, <literal>name</literal> and <literal>syntax</literal>
+ may be specified for the <literal>backend</literal> element. These
+ semantics of these attributes is equivalent to those for the
+ <literal>retrieval</literal>. However, these values are passed to
+ the "backend".
+ </para>
+ <para>
+ The <literal>backend</literal> element may includes one or more
+ conversion instructions (as children elements). The supported
+ conversions are:
+ <variablelist>
+ <varlistentry><term><literal>marc</literal></term>
+ <listitem>
+ <para>
+ The <literal>marc</literal> element specifies a conversion
+ to - and from ISO2709 encoded MARC and
+ <ulink url="&url.marcxml;">&acro.marcxml;</ulink>/MarcXchange.
+ The following attributes may be specified:
+
+ <variablelist>
+ <varlistentry><term><literal>inputformat</literal> (REQUIRED)</term>
+ <listitem>
+ <para>
+ Format of input. Supported values are
+ <literal>marc</literal> (for ISO2709); and <literal>xml</literal>
+ for MARCXML/MarcXchange.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term><literal>outputformat</literal> (REQUIRED)</term>
+ <listitem>
+ <para>
+ Format of output. Supported values are
+ <literal>line</literal> (MARC line format);
+ <literal>marcxml</literal> (for MARCXML),
+ <literal>marc</literal> (ISO2709),
+ <literal>marcxhcange</literal> (for MarcXchange).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term><literal>inputcharset</literal> (OPTIONAL)</term>
+ <listitem>
+ <para>
+ Encoding of input. For XML input formats, this need not
+ be given, but for ISO2709 based inputformats, this should
+ be set to the encoding used. For MARC21 records, a common
+ inputcharset value would be <literal>marc-8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term><literal>outputcharset</literal> (OPTIONAL)</term>
+ <listitem>
+ <para>
+ Encoding of output. If outputformat is XML based, it is
+ strongly recommened to use <literal>utf-8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term><literal>xslt</literal></term>
+ <listitem>
+ <para>
+ The <literal>xslt</literal> element specifies a conversion
+ via &acro.xslt;. The following attributes may be specified:
+
+ <variablelist>
+ <varlistentry><term><literal>stylesheet</literal> (REQUIRED)</term>
+ <listitem>
+ <para>
+ Stylesheet file.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </sect2>
+ <sect2 id="tools.retrieval.examples">
+ <title>Retrieval Facility Examples</title>
+ <example id="tools.retrieval.marc21">
+ <title>MARC21 backend</title>
+ <para>
+ A typical way to use the retrieval facility is to enable XML
+ for servers that only supports ISO2709 encoded MARC21 records.
+ </para>
+ <programlisting><![CDATA[
+ <retrievalinfo>
+ <retrieval syntax="usmarc" name="F"/>
+ <retrieval syntax="usmarc" name="B"/>
+ <retrieval syntax="xml" name="marcxml"
+ identifier="info:srw/schema/1/marcxml-v1.1">
+ <backend syntax="usmarc" name="F">
+ <marc inputformat="marc" outputformat="marcxml"
+ inputcharset="marc-8"/>
+ </backend>
+ </retrieval>
+ <retrieval syntax="xml" name="dc">
+ <backend syntax="usmarc" name="F">
+ <marc inputformat="marc" outputformat="marcxml"
+ inputcharset="marc-8"/>
+ <xslt stylesheet="MARC21slim2DC.xsl"/>
+ </backend>
+ </retrieval>
+ </retrievalinfo>
+]]>
+ </programlisting>
+ <para>
+ This means that our frontend supports:
+ <itemizedlist>
+ <listitem>
+ <para>
+ MARC21 F(ull) records.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ MARC21 B(rief) records.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ MARCXML records.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Dublin core records.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </example>
+ </sect2>
+ <sect2 id="tools.retrieval.api">
+ <title>API</title>
+ <para>
+ It should be easy to use the retrieval systems from applications. Refer
+ to the headers
+ <filename>yaz/retrieval.h</filename> and
+ <filename>yaz/record_conv.h</filename>.
+ </para>
+ </sect2>
+ </sect1>
</chapter>
<!-- Keep this comment at the end of the file