-<!-- $Id: tools.xml,v 1.22 2003-03-18 13:30:21 adam Exp $ -->
+<!-- $Id: tools.xml,v 1.35 2004-04-22 13:12:49 adam Exp $ -->
<chapter id="tools"><title>Supporting Tools</title>
<para>
<token>Z_RPNQuery</token> structure. Some programmers will prefer to
construct the query manually, perhaps using
<function>odr_malloc()</function> to simplify memory management.
- The &yaz; distribution includes two separate, query-generating tools
+ The &yaz; distribution includes three separate, query-generating tools
that may be of use to you.
</para>
top-set ::= [ '@attrset' string ]
- query-struct ::= attr-spec | simple | complex | '@term' term-type
+ query-struct ::= attr-spec | simple | complex | '@term' term-type query
attr-spec ::= '@attr' [ string ] string query-struct
<para>
The @attr operator is followed by an attribute specification
(<literal>attr-spec</literal> above). The specification consists
- of optional an attribute set, an attribute type-value pair and
- a sub query. The attribute type-value pair is packed in one string:
- an attribute type, a dash, followed by an attribute value.
+ of an optional attribute set, an attribute type-value pair and
+ a sub-query. The attribute type-value pair is packed in one string:
+ an attribute type, an equals sign, and an attribute value, like this:
+ <literal>@attr 1=1003</literal>.
The type is always an integer but the value may be either an
integer or a string (if it doesn't start with a digit character).
+ A string attribute-value is encoded as a Type-1 ``complex''
+ attribute with the list of values containing the single string
+ specified, and including no semantic indicators.
</para>
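The packed type-value string described above can be split with a few lines of C. The following is a minimal sketch, not code from the YAZ distribution: the `attr_pair` struct and the `parse_attr_pair` function are hypothetical names introduced only for illustration, and the rule "a value is numeric when it starts with a digit" is taken from the paragraph above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical holder for one parsed attribute pair (not a YAZ type). */
struct attr_pair {
    int type;               /* attribute type, always an integer */
    int is_numeric;         /* 1 if the value is an integer, 0 if a string */
    long num_value;         /* meaningful only when is_numeric is 1 */
    const char *str_value;  /* points into the input, just after '=' */
};

/* Parse a "type=value" string such as "1=1003" or "1=/book/title".
 * Returns 0 on success, -1 if the string is malformed. */
int parse_attr_pair(const char *spec, struct attr_pair *out)
{
    char *end;
    long type;

    assert(spec != NULL && out != NULL);
    if (!isdigit((unsigned char)spec[0]))
        return -1;                      /* the type must be an integer */
    type = strtol(spec, &end, 10);
    if (*end != '=' || end[1] == '\0')
        return -1;                      /* missing '=' or empty value */
    out->type = (int)type;
    out->str_value = end + 1;
    /* The value is treated as an integer when it starts with a digit. */
    if (isdigit((unsigned char)end[1])) {
        out->is_numeric = 1;
        out->num_value = strtol(end + 1, NULL, 10);
    } else {
        out->is_numeric = 0;
        out->num_value = 0;
    }
    return 0;
}
```

Calling `parse_attr_pair("1=1003", &p)` fills in type 1 with a numeric value, while `parse_attr_pair("1=/book/title", &p)` leaves the value as a string.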
<para>
Version 3 of the Z39.50 specification defines various encoding of terms.
- Use the <literal>@term </literal> <replaceable>type</replaceable>,
+ Use <literal>@term </literal> <replaceable>type</replaceable>
+ <replaceable>string</replaceable>,
where type is one of: <literal>general</literal>,
- <literal>numeric</literal>, <literal>string</literal>
- (for InternationalString), ..
+ <literal>numeric</literal> or <literal>string</literal>
+ (for InternationalString).
If no term type has been given, the <literal>general</literal> form
- is used which is the only encoding allowed in both version 2 - and 3
+ is used. This is the only encoding allowed in both versions 2 and 3
of the Z39.50 standard.
</para>
- <example><title>PQF queries</title>
+ <sect3 id="PQF-prox">
+ <title>Using Proximity Operators with PQF</title>
+ <note>
+ <para>
+ This is an advanced topic, describing how to construct
+ queries that make very specific requirements on the
+ relative location of their operands.
+ You may wish to skip this section and go straight to
+ <link linkend="pqf-examples">the example PQF queries</link>.
+ </para>
+ <para>
+ <warning>
+ <para>
+ Most Z39.50 servers do not support proximity searching, or
+ support only a small subset of the full functionality that
+ can be expressed using the PQF proximity operator. Be
+ aware that the ability to <emphasis>express</emphasis> a
+ query in PQF is no guarantee that any given server will
+ be able to <emphasis>execute</emphasis> it.
+ </para>
+ </warning>
+ </para>
+ </note>
+ <para>
+ The proximity operator <literal>@prox</literal> is a special
+ and more restrictive version of the conjunction operator
+ <literal>@and</literal>. Its semantics are described in
+ section 3.7.2 (Proximity) of the Z39.50 standard itself, which
+ can be read on-line at
+ <ulink url="http://lcweb.loc.gov/z3950/agency/markup/09.html"/>.
+ </para>
+ <para>
+ In PQF, the proximity operation is represented by a sequence
+ of the form
+ <screen>
+@prox <replaceable>exclusion</replaceable> <replaceable>distance</replaceable> <replaceable>ordered</replaceable> <replaceable>relation</replaceable> <replaceable>which-code</replaceable> <replaceable>unit-code</replaceable>
+ </screen>
+ in which the meanings of the parameters are as described in
+ the standard, and they can take the following values:
+ <itemizedlist>
+ <listitem><formalpara><title>exclusion</title><para>
+ 0 = false (i.e. the proximity condition specified by the
+ remaining parameters must be satisfied) or
+ 1 = true (the proximity condition specified by the
+ remaining parameters must <emphasis>not</emphasis> be
+ satisfied).
+ </para></formalpara></listitem>
+ <listitem><formalpara><title>distance</title><para>
+ An integer specifying the difference between the locations
+ of the operands: e.g. two adjacent words would have
+ distance=1 since their locations differ by one unit.
+ </para></formalpara></listitem>
+ <listitem><formalpara><title>ordered</title><para>
+ 1 = ordered (the operands must occur in the order the
+ query specifies them) or
+ 0 = unordered (they may appear in either order).
+ </para></formalpara></listitem>
+ <listitem><formalpara><title>relation</title><para>
+ Recognised values are
+ 1 (lessThan),
+ 2 (lessThanOrEqual),
+ 3 (equal),
+ 4 (greaterThanOrEqual),
+ 5 (greaterThan) and
+ 6 (notEqual).
+ </para></formalpara></listitem>
+ <listitem><formalpara><title>which-code</title><para>
+ <literal>known</literal>
+ or
+ <literal>k</literal>
+ (the unit-code parameter is taken from the well-known list
+ of alternatives described below) or
+ <literal>private</literal>
+ or
+ <literal>p</literal>
+ (the unit-code parameter has semantics specific to an
+ out-of-band agreement such as a profile).
+ </para></formalpara></listitem>
+ <listitem><formalpara><title>unit-code</title><para>
+ If the which-code parameter is <literal>known</literal>
+ then the recognised values are
+ 1 (character),
+ 2 (word),
+ 3 (sentence),
+ 4 (paragraph),
+ 5 (section),
+ 6 (chapter),
+ 7 (document),
+ 8 (element),
+ 9 (subelement),
+ 10 (elementType) and
+ 11 (byte).
+ If which-code is <literal>private</literal> then the
+ acceptable values are determined by the profile.
+ </para></formalpara></listitem>
+ </itemizedlist>
+ (The numeric values of the relation and well-known unit-code
+ parameters are taken straight from
+ <ulink url="http://lcweb.loc.gov/z3950/agency/asn1.html#ProximityOperator"
+ >the ASN.1</ulink> of the proximity structure in the standard.)
+ </para>
+ </sect3>
- <para>Queries using simple terms.
- <screen>
- dylan
- "bob dylan"
- </screen>
- </para>
- <para>Boolean operators.
- <screen>
- @or "dylan" "zimmerman"
- @and @or dylan zimmerman when
- @and when @or dylan zimmerman
- </screen>
- </para>
- <para>
- Reference to result sets.
- <screen>
- @set Result-1
- @and @set seta setb
- </screen>
- </para>
- <para>
- Attributes for terms.
- <screen>
- @attr 1=4 computer
- @attr 1=4 @attr 4=1 "self portrait"
- @attr exp1 @attr 1=1 CategoryList
- @attr gils 1=2008 Copenhagen
- @attr 1=/book/title computer
- </screen>
- </para>
- <para>
- Proximity.
- <screen>
- @prox 0 3 1 2 k 2 dylan zimmerman
+ <sect3 id="pqf-examples"><title>PQF queries</title>
+
+ <example><title>PQF queries using simple terms</title>
+ <para>
+ <screen>
+ dylan
+ "bob dylan"
</screen>
</para>
- <para>
- Specifying term type.
- <screen>
- @term string "a UTF-8 string, maybe?"
- </screen>
- </para>
- <para>Mixed queries
- <screen>
- @or @and bob dylan @set Result-1
-
- @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
-
- @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
+ </example>
+ <example><title>PQF boolean operators</title>
+ <para>
+ <screen>
+ @or "dylan" "zimmerman"
+ @and @or dylan zimmerman when
+ @and when @or dylan zimmerman
+ </screen>
+ </para>
+ </example>
+ <example><title>PQF references to result sets</title>
+ <para>
+ <screen>
+ @set Result-1
+ @and @set seta setb
+ </screen>
+ </para>
+ </example>
+ <example><title>Attributes for terms</title>
+ <para>
+ <screen>
+ @attr 1=4 computer
+ @attr 1=4 @attr 4=1 "self portrait"
+ @attrset exp1 @attr 1=1 CategoryList
+ @attr gils 1=2008 Copenhagen
+ @attr 1=/book/title computer
+ </screen>
+ </para>
+ </example>
+ <example><title>PQF Proximity queries</title>
+ <para>
+ <screen>
+ @prox 0 3 1 2 k 2 dylan zimmerman
+ </screen>
+ <note><para>
+ Here the parameters 0, 3, 1, 2, k and 2 represent exclusion,
+ distance, ordered, relation, which-code and unit-code, in that
+ order. So:
+ <itemizedlist>
+ <listitem><para>
+ exclusion = 0: the proximity condition must hold
+ </para></listitem>
+ <listitem><para>
+ distance = 3: the terms must be three units apart
+ </para></listitem>
+ <listitem><para>
+ ordered = 1: they must occur in the order they are specified
+ </para></listitem>
+ <listitem><para>
+ relation = 2: lessThanOrEqual (to the distance of 3 units)
+ </para></listitem>
+ <listitem><para>
+ which-code is ``known'', so the standard unit-codes are used
+ </para></listitem>
+ <listitem><para>
+ unit-code = 2: word.
+ </para></listitem>
+ </itemizedlist>
+ So the whole proximity query means that the words
+ <literal>dylan</literal> and <literal>zimmerman</literal> must
+ both occur in the record, in that order, differing in position
+ by three or fewer words (i.e. with two or fewer words between
+ them.) The query would find ``Bob Dylan, aka. Robert
+ Zimmerman'', but not ``Bob Dylan, born as Robert Zimmerman''
+ since the distance in this case is four.
+ </para></note>
+ </para>
+ </example>
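To make the parameter semantics concrete, here is a small sketch in C of how a proximity test over two term positions could be evaluated. It is an illustration only and not how any particular server implements `@prox`: the function name `prox_match` and the enum are hypothetical, positions are assumed to be counted in the chosen unit (words in the example above), and the sketch assumes that the exclusion flag negates the whole condition, ordering included.

```c
#include <assert.h>

/* Relation codes as listed for the @prox relation parameter. */
enum prox_relation {
    PROX_LT = 1, PROX_LE, PROX_EQ, PROX_GE, PROX_GT, PROX_NE
};

/* Returns 1 if the pair of positions satisfies the proximity test,
 * 0 if it does not, and -1 if the relation code is not recognised. */
int prox_match(int pos_left, int pos_right, int exclusion,
               int distance, int ordered, int relation)
{
    int diff = pos_right - pos_left;
    int ok;

    assert(distance >= 0);
    if (relation < PROX_LT || relation > PROX_NE)
        return -1;

    if (ordered && diff < 0) {
        ok = 0;                 /* operands occur in the wrong order */
    } else {
        if (diff < 0)
            diff = -diff;       /* unordered: only the gap matters */
        switch (relation) {
        case PROX_LT: ok = diff <  distance; break;
        case PROX_LE: ok = diff <= distance; break;
        case PROX_EQ: ok = diff == distance; break;
        case PROX_GE: ok = diff >= distance; break;
        case PROX_GT: ok = diff >  distance; break;
        default:      ok = diff != distance; break;  /* PROX_NE */
        }
    }
    return exclusion ? !ok : ok;
}
```

With the parameters of the example query (`0 3 1 2`), word positions 2 and 5 match because they differ by three, while positions 2 and 6 do not.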
+ <example><title>PQF specification of search term</title>
+ <para>
+ <screen>
+ @term string "a UTF-8 string, maybe?"
+ </screen>
+ </para>
+ </example>
+ <example><title>PQF mixed queries</title>
+ <para>
+ <screen>
+ @or @and bob dylan @set Result-1
+
+ @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
+
+ @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
</screen>
- </para>
- </example>
+ <note>
+ <para>
+ The last of these examples is a spatial search: in
+ <ulink url="http://www.gils.net/prof_v2.html#sec_7_4"
+ >the GILS attribute set</ulink>,
+ access point
+ 2038 indicates West Bounding Coordinate and
+ 2039 indicates East Bounding Coordinate,
+ so the query is for areas extending from -114 degrees
+ to no more than -109 degrees.
+ </para>
+ </note>
+ </para>
+ </example>
+ </sect3>
</sect2>
- <sect2 id="CCL"><title>Common Command Language</title>
+ <sect2 id="CCL"><title>CCL</title>
<para>
Not all users enjoy typing in prefix query structures and numerical
attribute values, even in a minimalistic test client. In the library
- world, the more intuitive Common Command Language (or ISO 8777) has
- enjoyed some popularity - especially before the widespread
+ world, the more intuitive Common Command Language - CCL (ISO 8777)
+ has enjoyed some popularity - especially before the widespread
availability of graphical interfaces. It is still useful in
applications where you for some reason or other need to provide a
symbolic language for expressing boolean query structures.
</para>
<para>
- The <ulink url="http://europagate.dtv.dk/">EUROPAGATE</ulink>
- research project working under the Libraries programme
+ The EUROPAGATE research project working under the Libraries programme
of the European Commission's DG XIII has, amongst other useful tools,
implemented a general-purpose CCL parser which produces an output
structure that can be trivially converted to the internal RPN
suggest a few short-hand notations. You can customize the CCL parser
to support a particular set of qualifiers to reflect the current target
profile. Traditionally, a qualifier would map to a particular
- use-attribute within the BIB-1 attribute set. However, you could also
- define qualifiers that would set, for example, the
- structure-attribute.
+ use-attribute within the BIB-1 attribute set. It is also
+ possible to set other attributes, such as the structure
+ attribute.
</para>
<para>
A CCL profile is a set of predefined CCL qualifiers that may be
- read from a file.
+ read from a file or set in the CCL API.
The YAZ client reads its CCL qualifiers from a file named
- <filename>default.bib</filename>. Each line in the file has the form:
- </para>
-
- <para>
- <replaceable>qualifier-name</replaceable>
- [<replaceable>attributeset</replaceable><literal>,</literal>]<replaceable>type</replaceable><literal>=</literal><replaceable>val</replaceable>
- [<replaceable>attributeset</replaceable><literal>,</literal>]<replaceable>type</replaceable><literal>=</literal><replaceable>val</replaceable> ...
+ <filename>default.bib</filename>. There are four types of
+ lines in a CCL profile: qualifier specification,
+ qualifier alias, comments and directives.
</para>
-
- <para>
- where <replaceable>qualifier-name</replaceable> is the name of the
- qualifier to be used (eg. <literal>ti</literal>),
- <replaceable>type</replaceable> is attribute type in the attribute
- set (Bib-1 is used if no attribute set is given) and
- <replaceable>val</replaceable> is attribute value.
- The <replaceable>type</replaceable> can be specified as an
- integer or as it be specified either as a single-letter:
- <literal>u</literal> for use,
- <literal>r</literal> for relation,<literal>p</literal> for position,
- <literal>s</literal> for structure,<literal>t</literal> for truncation
- or <literal>c</literal> for completeness.
- The attributes for the special qualifier name <literal>term</literal>
- are used when no CCL qualifier is given in a query.
- </para>
-
- <example><title>CCL profile</title>
+ <sect4><title id="qualifier-specification">Qualifier specification</title>
<para>
- Consider the following definition:
+ A qualifier specification is of the form:
</para>
- <screen>
- ti u=4 s=1
- au u=1 s=1
- term s=105
- ranked r=102
- </screen>
<para>
- Three qualifiers are defined, <literal>ti</literal>,
- <literal>au</literal> and <literal>ranked</literal>.
- <literal>ti</literal> and <literal>au</literal> both set
- structure attribute to phrase (s=1).
- <literal>ti</literal>
- sets the use-attribute to 4. <literal>au</literal> sets the
- use-attribute to 1.
- When no qualifiers are used in the query the structure-attribute is
- set to free-form-text (105).
+ <replaceable>qualifier-name</replaceable>
+ [<replaceable>attributeset</replaceable><literal>,</literal>]<replaceable>type</replaceable><literal>=</literal><replaceable>val</replaceable>
+ [<replaceable>attributeset</replaceable><literal>,</literal>]<replaceable>type</replaceable><literal>=</literal><replaceable>val</replaceable> ...
</para>
+
<para>
- You can combine attributes. To Search for "ranked title" you
- can do
+ where <replaceable>qualifier-name</replaceable> is the name of the
+ qualifier to be used (eg. <literal>ti</literal>),
+ <replaceable>type</replaceable> is attribute type in the attribute
+ set (Bib-1 is used if no attribute set is given) and
+ <replaceable>val</replaceable> is attribute value.
+ The <replaceable>type</replaceable> can be specified as an
+ integer or as a single letter:
+ <literal>u</literal> for use,
+ <literal>r</literal> for relation, <literal>p</literal> for position,
+ <literal>s</literal> for structure, <literal>t</literal> for truncation
+ or <literal>c</literal> for completeness.
+ The attributes for the special qualifier name <literal>term</literal>
+ are used when no CCL qualifier is given in a query.
+ <table><title>Common Bib-1 attributes</title>
+ <tgroup cols="2">
+ <colspec colwidth="2*" colname="type"></colspec>
+ <colspec colwidth="9*" colname="description"></colspec>
+ <thead>
+ <row>
+ <entry>Type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><literal>u=</literal><replaceable>value</replaceable></entry>
+ <entry>
+ Use attribute. Common use attributes are
+ 1 Personal-name, 4 Title, 7 ISBN, 8 ISSN, 21 Subject heading,
+ 30 Date, 1003 Author, 1016 Any. Specify value
+ as an integer.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>r=</literal><replaceable>value</replaceable></entry>
+ <entry>
+ Relation attribute. Common values are
+ 1 &lt;, 2 &lt;=, 3 =, 4 &gt;=, 5 &gt;, 6 &lt;&gt;,
+ 100 phonetic, 101 stem, 102 relevance, 103 always matches.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>p=</literal><replaceable>value</replaceable></entry>
+ <entry>
+ Position attribute. Values: 1 first in field, 2
+ first in any subfield, 3 any position in field.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>s=</literal><replaceable>value</replaceable></entry>
+ <entry>
+ Structure attribute. Values: 1 phrase, 2 word,
+ 3 key, 4 year, 5 date, 6 word list, 100 date (un),
+ 101 name (norm), 102 name (un), 103 structure, 104 urx,
+ 105 free-form-text, 106 document-text, 107 local-number,
+ 108 string, 109 numeric string.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>t=</literal><replaceable>value</replaceable></entry>
+ <entry>
+ Truncation attribute. Values: 1 right, 2 left,
+ 3 left and right, 100 none, 101 process #, 102 regular-1,
+ 103 regular-2, 104 CCL.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>c=</literal><replaceable>value</replaceable></entry>
+ <entry>
+ Completeness attribute. Values: 1 incomplete subfield,
+ 2 complete subfield, 3 complete field.
+ </entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+ </para>
+ <para>
+ The complete list of Bib-1 attributes can be found
+ <ulink url="http://lcweb.loc.gov/z3950/agency/defns/bib1.html">
+ here
+ </ulink>.
+ </para>
+ <para>
+ It is also possible to specify non-numeric attribute values,
+ which are used in combination with certain types.
+ The special combinations are:
+
+ <table><title>Special attribute combos</title>
+ <tgroup cols="2">
+ <colspec colwidth="2*" colname="name"></colspec>
+ <colspec colwidth="9*" colname="description"></colspec>
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><literal>s=pw</literal></entry><entry>
+ The structure is set to either word or phrase depending
+ on the number of tokens in a term (phrase-word).
+ </entry>
+ </row>
+ <row>
+ <entry><literal>s=al</literal></entry><entry>
+ Each token in the term is ANDed. (and-list).
+ This does not set the structure at all.
+ </entry>
+ </row>
+
+ <row><entry><literal>s=ol</literal></entry><entry>
+ Each token in the term is ORed. (or-list).
+ This does not set the structure at all.
+ </entry>
+ </row>
+
+ <row><entry><literal>r=o</literal></entry><entry>
+ Allows operators greater-than, less-than, ... equals and
+ sets relation attribute accordingly (relation ordered).
+ </entry>
+ </row>
+
+ <row><entry><literal>t=l</literal></entry><entry>
+ Allows term to be left-truncated.
+ If term is of the form <literal>?x</literal>, the resulting
+ Type-1 term is <literal>x</literal> and truncation is left.
+ </entry>
+ </row>
+
+ <row><entry><literal>t=r</literal></entry><entry>
+ Allows term to be right-truncated.
+ If term is of the form <literal>x?</literal>, the resulting
+ Type-1 term is <literal>x</literal> and truncation is right.
+ </entry>
+ </row>
+
+ <row><entry><literal>t=n</literal></entry><entry>
+ If term does not include <literal>?</literal>, the
+ truncation attribute is set to none (100).
+ </entry>
+ </row>
+
+ <row><entry><literal>t=b</literal></entry><entry>
+ Allows term to be both left and right truncated.
+ If term is of the form <literal>?x?</literal>, the
+ resulting term is <literal>x</literal> and truncation is
+ set to both left and right.
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </para>
+ <example><title>CCL profile</title>
+ <para>
+ Consider the following definition:
+ </para>
+
<screen>
- ti,ranked=knuth computer
- </screen>
- which will use "relation is ranked", "use is title", "structure is
- phrase".
+ ti u=4 s=1
+ au u=1 s=1
+ term s=105
+ ranked r=102
+ date u=30 r=o
+ </screen>
+ <para>
+ Four qualifiers are defined - <literal>ti</literal>,
+ <literal>au</literal>, <literal>ranked</literal> and
+ <literal>date</literal>.
+ </para>
+ <para>
+ <literal>ti</literal> and <literal>au</literal> both set
+ structure attribute to phrase (s=1).
+ <literal>ti</literal>
+ sets the use-attribute to 4. <literal>au</literal> sets the
+ use-attribute to 1.
+ When no qualifiers are used in the query the structure-attribute is
+ set to free-form-text (105) (rule for <literal>term</literal>).
+ The <literal>date</literal> qualifier sets the relation attribute to
+ the relation used in the CCL query and sets the use attribute
+ to 30 (Bib-1 Date).
+ </para>
+ <para>
+ You can combine attributes. To search for "ranked title" you
+ can do
+ <screen>
+ ti,ranked=knuth computer
+ </screen>
+ which will set relation=ranked, use=title, structure=phrase.
+ </para>
+ <para>
+ Query
+ <screen>
+ date > 1980
+ </screen>
+ is a valid query, while
+ <screen>
+ ti > 1980
+ </screen>
+ is invalid.
+ </para>
+ </example>
+ </sect4>
+ <sect4><title>Qualifier alias</title>
+ <para>
+ A qualifier alias is of the form:
</para>
- </example>
-
+ <para>
+ <replaceable>q</replaceable>
+ <replaceable>q1</replaceable> <replaceable>q2</replaceable> ..
+ </para>
+ <para>
+ which declares <replaceable>q</replaceable> to
+ be an alias for <replaceable>q1</replaceable>,
+ <replaceable>q2</replaceable>... such that the CCL
+ query <replaceable>q=x</replaceable> is equivalent to
+ <replaceable>q1=x or q2=x or ...</replaceable>.
+ </para>
+ </sect4>
+
+ <sect4><title>Comments</title>
+ <para>
+ Lines consisting only of white space and lines that begin with the
+ character <literal>#</literal> are treated as comments.
+ </para>
+ </sect4>
+
+ <sect4><title>Directives</title>
+ <para>
+ Directive specifications take the form
+ </para>
+ <para><literal>@</literal><replaceable>directive</replaceable> <replaceable>value</replaceable>
+ </para>
+ <table><title>CCL directives</title>
+ <tgroup cols="3">
+ <colspec colwidth="2*" colname="name"></colspec>
+ <colspec colwidth="8*" colname="description"></colspec>
+ <colspec colwidth="1*" colname="default"></colspec>
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Description</entry>
+ <entry>Default</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>truncation</entry>
+ <entry>Truncation character</entry>
+ <entry><literal>?</literal></entry>
+ </row>
+ <row>
+ <entry>field</entry>
+ <entry>Specifies how multiple fields are to be
+ combined. There are two modes: <literal>or</literal>:
+ multiple qualifier fields are ORed,
+ <literal>merge</literal>: attributes for the qualifier
+ fields are merged and assigned to one term.
+ </entry>
+ <entry><literal>merge</literal></entry>
+ </row>
+ <row>
+ <entry>case</entry>
+ <entry>Specifies whether CCL operators and qualifiers should be
+ compared with case sensitivity or not. Specify 0 for
+ case sensitive; 1 for case insensitive.</entry>
+ <entry><literal>0</literal></entry>
+ </row>
+
+ <row>
+ <entry>and</entry>
+ <entry>Specifies token for CCL operator AND.</entry>
+ <entry><literal>and</literal></entry>
+ </row>
+
+ <row>
+ <entry>or</entry>
+ <entry>Specifies token for CCL operator OR.</entry>
+ <entry><literal>or</literal></entry>
+ </row>
+
+ <row>
+ <entry>not</entry>
+ <entry>Specifies token for CCL operator NOT.</entry>
+ <entry><literal>not</literal></entry>
+ </row>
+
+ <row>
+ <entry>set</entry>
+ <entry>Specifies token for CCL operator SET.</entry>
+ <entry><literal>set</literal></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect4>
</sect3>
<sect3><title>CCL API</title>
<para>
A CQL query is parsed by the <function>cql_parser_string</function>
which takes a query <parameter>str</parameter>.
If the query was valid (no syntax errors), then zero is returned;
- otherwise a non-zero error code is returned.
+ otherwise -1 is returned to indicate a syntax error.
</para>
<para>
<synopsis>
<sect3 id="tools.cql.tree"><title>CQL tree</title>
<para>
- The the query string is validl, the CQL parser
+ If the query string is valid, the CQL parser
generates a tree representing the structure of the
CQL query.
</para>
</para>
<para>
If conversion failed, <function>cql_transform_buf</function>
- returns a non-zero error code; otherwise zero is returned
- (conversion successful).
+ returns a non-zero SRW error code; otherwise zero is returned
+ (conversion successful). The meanings of the numeric error
+ codes are listed in the SRW specifications at
+ <ulink url="http://www.loc.gov/srw/diagnostic-list.html"/>.
+ </para>
+ <para>
+ If conversion fails, more information can be obtained by calling
+ <synopsis>
+int cql_transform_error(cql_transform_t ct, char **addinfop);
+ </synopsis>
+ This function returns the most recently returned numeric
+ error-code and sets the string-pointer at
+ <literal>*addinfop</literal> to point to a string containing
+ additional information about the error that occurred: for
+ example, if the error code is 15 (``Illegal or unsupported context
+ set''), the additional information is the name of the requested
+ context set that was not recognised.
+ </para>
+ <para>
+ The SRW error-codes may be translated into brief human-readable
+ error messages using
+ <synopsis>
+const char *cql_strerror(int code);
+ </synopsis>
</para>
<para>
If you wish to be able to produce a PQF result in a different
The following CQL patterns are recognized:
<variablelist>
<varlistentry><term>
- <literal>qualifier.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
+ <literal>index.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
</term>
<listitem>
<para>
- This pattern is invoked when a CQL qualifier, such as
+ This pattern is invoked when a CQL index, such as
dc.title is converted. <replaceable>set</replaceable>
- and <replaceable>name</replaceable> is the index set and qualifier
+ and <replaceable>name</replaceable> are the context set and index
name respectively.
Typically, the RPN specifies an equivalent use attribute.
</para>
<para>
- For terms not bound by a qualifier the pattern
- <literal>qualifier.srw.serverChoice</literal> is used.
- Here, the prefix <literal>srw</literal> is defined as
- <literal>http://www.loc.gov/zing/cql/srw-indexes/v1.0/</literal>.
+ For terms not bound by an index the pattern
+ <literal>index.cql.serverChoice</literal> is used.
+ Here, the prefix <literal>cql</literal> is defined as
+ <literal>http://www.loc.gov/zing/cql/cql-indexes/v1.0/</literal>.
If this pattern is not defined, the mapping will fail.
</para>
</listitem>
</varlistentry>
<varlistentry><term>
+ <literal>qualifier.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
+ (DEPRECATED)
+ </term>
+ <listitem>
+ <para>
+ For backwards compatibility, this is recognised as a synonym of
+ <literal>index.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term>
<literal>relation.</literal><replaceable>relation</replaceable>
</term>
<listitem>
</term>
<listitem>
<para>
- This specification defines a CQL index set for a given prefix.
+ This specification defines a CQL context set for a given prefix.
The value on the right hand side is the URI for the set -
<emphasis>not</emphasis> RPN. All prefixes used in
- qualifier patterns must be defined this way.
+ index patterns must be defined this way.
</para>
</listitem>
</varlistentry>
</para>
<example><title>CQL to RPN mapping file</title>
<para>
- This simple file defines two index sets, three qualifiers and three
+ This simple file defines two context sets, three indexes and three
relations, a position pattern and a default structure.
</para>
<programlisting><![CDATA[
- set.srw = http://www.loc.gov/zing/cql/srw-indexes/v1.0/
+ set.cql = http://www.loc.gov/zing/cql/context-sets/cql/v1.1/
set.dc = http://www.loc.gov/zing/cql/dc-indexes/v1.0/
- qualifier.srw.serverChoice = 1=1016
- qualifier.dc.title = 1=4
- qualifier.dc.subject = 1=21
+ index.cql.serverChoice = 1=1016
+ index.dc.title = 1=4
+ index.dc.subject = 1=21
relation.< = 2=1
relation.eq = 2=3
<screen>
@attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer"
</screen>
- by rules <literal>qualifier.srw.serverChoice</literal>,
+ by rules <literal>index.cql.serverChoice</literal>,
<literal>relation.scr</literal>, <literal>structure.*</literal>,
<literal>position.any</literal>.
</para>
<screen>
PROTO_Z3950
- PROTO_SR
+ PROTO_GENERAL
</screen>
<para>
- If you don't care about talking to SR-based implementations (few
- exist, and they may become fewer still if and when the ISO SR and ANSI
- Z39.50 documents are merged into a single standard), you can ignore
- this field on incoming packages, and always set it to PROTO_Z3950
- for outgoing packages.
+ Use <literal>PROTO_Z3950</literal> for Z39.50 Object Identifiers,
+ <literal>PROTO_GENERAL</literal> for other types (such as
+ those associated with ILL).
</para>
<para>
<para>
again, corresponding to the specific OIDs defined by the standard.
+ Refer to the
+ <ulink url="http://lcweb.loc.gov/z3950/agency/defns/oids.html">
+ Registry of Z39.50 Object Identifiers</ulink> for the
+ whole list.
</para>
<para>
</para>
<para>
+ Three utility functions are provided for translating OIDs'
+ symbolic names (e.g. <literal>Usmarc</literal>) into OID structures
+ (int arrays) and strings containing the OID in dotted notation
+ (e.g. <literal>1.2.840.10003.9.5.1</literal>). They are:
+ </para>
+
+ <screen>
+ int *oid_name_to_oid(oid_class oclass, const char *name, int *oid);
+ char *oid_to_dotstring(const int *oid, char *oidbuf);
+ char *oid_name_to_dotstring(oid_class oclass, const char *name, char *oidbuf);
+ </screen>
+
+ <para>
+ <literal>oid_name_to_oid()</literal>
+ translates the specified symbolic <literal>name</literal>,
+ interpreted as being of class <literal>oclass</literal>. (The
+ class must be specified as many symbolic names exist within
+ multiple classes - for example, <literal>Zthes</literal> is the
+ symbolic name of an attribute set, a schema and a tag-set.) The
+ sequence of integers representing the OID is written into the
+ area <literal>oid</literal> provided by the caller; it is the
+ caller's responsibility to ensure that this area is large enough
+ to contain the translated OID. As a convenience, the address of
+ the buffer (i.e. the value of <literal>oid</literal>) is
+ returned.
+ </para>
+ <para>
+ <literal>oid_to_dotstring()</literal>
+ translates the int-array <literal>oid</literal> into a dotted
+ string which is written into the area <literal>oidbuf</literal>
+ supplied by the caller; it is the caller's responsibility to
+ ensure that this area is large enough. The address of the buffer
+ is returned.
+ </para>
+ <para>
+ <literal>oid_name_to_dotstring()</literal>
+ combines the previous two functions to derive a dotted string
+ representing the OID specified by <literal>oclass</literal> and
+ <literal>name</literal>, writing it into the buffer passed as
+ <literal>oidbuf</literal> and returning its address.
+ </para>
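The dotted-string conversion can be illustrated without linking against YAZ. The sketch below is a stand-in written for this explanation, not the library's `oid_to_dotstring()`: the name `sketch_oid_to_dotstring` is hypothetical, it assumes the int array is terminated by a negative value, and unlike the library function it takes an explicit buffer size so that an undersized buffer is reported rather than overrun.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an OID held as an int array (assumed to end with a negative
 * sentinel) as a dotted string in buf. Returns buf on success, or NULL
 * if the buffer of bufsize bytes is too small. */
char *sketch_oid_to_dotstring(const int *oid, char *buf, size_t bufsize)
{
    size_t used = 0;
    int i;

    assert(oid != NULL && buf != NULL);
    if (bufsize == 0)
        return NULL;
    buf[0] = '\0';
    for (i = 0; oid[i] >= 0; i++) {
        /* every arc after the first is preceded by a dot */
        int n = snprintf(buf + used, bufsize - used,
                         i ? ".%d" : "%d", oid[i]);
        if (n < 0 || (size_t)n >= bufsize - used)
            return NULL;        /* output would have been truncated */
        used += (size_t)n;
    }
    return buf;
}
```

Passing the array `{1, 2, 840, 10003, 9, 5, 1, -1}` and a sufficiently large buffer yields the dotted form shown above; with the real YAZ functions the caller remains responsible for sizing the buffer.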
+
+ <para>
Finally, the module provides the following utility functions, whose
meaning should be obvious:
</para>
release the associated memory again. For the structures describing the
Z39.50 PDUs and related structures, it is convenient to use the
memory-management system of the &odr; subsystem (see
- <link linkend="odr-use">Using ODR</link>). However, in some circumstances
+ <xref linkend="odr.use"/>). However, in some circumstances
where you might otherwise benefit from using a simple nibble memory
management system, it may be impractical to use
<function>odr_malloc()</function> and <function>odr_reset()</function>.
</para>
</sect1>
+
+ <sect1 id="tools.marc"><title>MARC</title>
+
+ <para>
+ YAZ provides a fast utility that decodes MARC records and
+ encodes to a variety of output formats. The MARC records must
+ be encoded in ISO2709.
+ </para>
+ <synopsis><![CDATA[
+ #include <yaz/marcdisp.h>
+
+ /* create handler */
+ yaz_marc_t yaz_marc_create(void);
+ /* destroy */
+ void yaz_marc_destroy(yaz_marc_t mt);
+
+ /* set XML mode YAZ_MARC_LINE, YAZ_MARC_SIMPLEXML, ... */
+ void yaz_marc_xml(yaz_marc_t mt, int xmlmode);
+ #define YAZ_MARC_LINE 0
+ #define YAZ_MARC_SIMPLEXML 1
+ #define YAZ_MARC_OAIMARC 2
+ #define YAZ_MARC_MARCXML 3
+ #define YAZ_MARC_ISO2709 4
+
+ /* supply iconv handle for character set conversion .. */
+ void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);
+
+ /* set debug level, 0=none, 1=more, 2=even more, .. */
+ void yaz_marc_debug(yaz_marc_t mt, int level);
+
+ /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
+ On success, result in *result with size *rsize. */
+ int yaz_marc_decode_buf (yaz_marc_t mt, const char *buf, int bsize,
+ char **result, int *rsize);
+
+ /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
+ On success, result in WRBUF */
+ int yaz_marc_decode_wrbuf (yaz_marc_t mt, const char *buf,
+ int bsize, WRBUF wrbuf);
+]]>
+ </synopsis>
+ <para>
+ A MARC conversion handle must be created by using
+ <function>yaz_marc_create</function> and destroyed
+ by calling <function>yaz_marc_destroy</function>.
+ </para>
+ <para>
+ All other functions operate on a <literal>yaz_marc_t</literal> handle.
+ The output is specified by a call to <function>yaz_marc_xml</function>.
+ The <literal>xmlmode</literal> must be one of
+ <variablelist>
+ <varlistentry>
+ <term>YAZ_MARC_LINE</term>
+ <listitem>
+ <para>
+ A simple line-by-line format suitable for display but not
+ recommended for further (machine) processing.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_MARCXML</term>
+ <listitem>
+ <para>
+ The resulting record is converted to MARCXML.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_ISO2709</term>
+ <listitem>
+ <para>
+ The resulting record is converted to ISO2709 (MARC).
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ The actual conversion functions are
+ <function>yaz_marc_decode_buf</function> and
+ <function>yaz_marc_decode_wrbuf</function>, which decode
+ a MARC record and encode it in the selected output format. The former
+ operates on simple buffers; the latter stores the resulting record in
+ a WRBUF handle (WRBUF is a simple string type).
+ </para>
+ <example>
+ <title>Display of MARC record</title>
+ <para>
+ The following program snippet illustrates how the MARC API may
+ be used to convert a MARC record to the line-by-line format:
+ <programlisting><![CDATA[
+ #include <stdio.h>
+ #include <yaz/marcdisp.h>
+
+ void print_marc(const char *marc_buf, int marc_buf_size)
+ {
+ char *result; /* for result buf */
+ int result_len; /* for size of result */
+ yaz_marc_t mt = yaz_marc_create();
+ yaz_marc_xml(mt, YAZ_MARC_LINE);
+ /* yaz_marc_decode_buf returns >0 on success; print only then */
+ if (yaz_marc_decode_buf(mt, marc_buf, marc_buf_size,
+ &result, &result_len) > 0)
+ fwrite(result, result_len, 1, stdout);
+ yaz_marc_destroy(mt); /* note that result is now freed... */
+ }
+]]>
+ </programlisting>
+ </para>
+ </example>
+ </sect1>
+
</chapter>
<!-- Keep this comment at the end of the file