-<!-- $Id: tools.xml,v 1.57 2007-02-08 11:36:59 adam Exp $ -->
<chapter id="tools"><title>Supporting Tools</title>
<para>
symbolic language for expressing boolean query structures.
</para>
- <para>
- The EUROPAGATE research project working under the Libraries programme
- of the European Commission's DG XIII has, amongst other useful tools,
- implemented a general-purpose CCL parser which produces an output
- structure that can be trivially converted to the internal RPN
- representation of &yaz; (The <literal>Z_RPNQuery</literal> structure).
- Since the CCL utility - along with the rest of the software
- produced by EUROPAGATE - is made freely available on a liberal
- license, it is included as a supplement to &yaz;.
- </para>
-
<sect3 id="ccl.syntax">
<title>CCL Syntax</title>
</table>
</para>
<para>
- Refer to the complete
+ Refer to <xref linkend="bib1"/> or the complete
<ulink url="&url.z39.50.attset.bib1;">list of Bib-1 attributes</ulink>
</para>
<para>
set to both left&right.
</entry>
</row>
+
+ <row><entry><literal>t=x</literal></entry><entry>
+ Allows masking anywhere in a term, thus fully supporting
+ # (mask one character) and ? (zero or more of any).
+ If masking is used, truncation is set to 102 (regexp-1 in term)
+ and the term is converted accordingly to a regular expression.
+ </entry>
+ </row>
+
+ <row><entry><literal>t=z</literal></entry><entry>
+ Allows masking anywhere in a term, thus fully supporting
+ # (mask one character) and ? (zero or more of any).
+ If masking is used, truncation is set to 104 (Z39.58 in term)
+ and the term is converted accordingly to a Z39.58 masking term,
+ which is actually the same truncation syntax as CCL itself.
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
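The t=x masking rule in the table above can be sketched in C as follows. This is a minimal illustration and not the YAZ implementation: the helper name sketch_ccl_mask_to_regex is hypothetical, and the mapping of # to "." and ? to ".*", as well as the set of characters that get escaped, are assumptions about how such a conversion could work.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of YAZ: convert a CCL masked term to a
   regular expression. '#' becomes '.', '?' becomes '.*' (both assumed
   mappings), and regex metacharacters in the term are backslash-escaped.
   Returns 0 on success, -1 if dst (dst_size bytes) is too small. */
int sketch_ccl_mask_to_regex(const char *term, char *dst, size_t dst_size)
{
    size_t n = 0;

    if (dst_size == 0)
        return -1;
    for (; *term; term++)
    {
        char piece[3];
        size_t len;

        if (*term == '#')
            strcpy(piece, ".");            /* mask exactly one character */
        else if (*term == '?')
            strcpy(piece, ".*");           /* zero or more of any character */
        else if (strchr(".*+|()[]{}^$\\", *term))
        {
            piece[0] = '\\';               /* escape a regex metacharacter */
            piece[1] = *term;
            piece[2] = '\0';
        }
        else
        {
            piece[0] = *term;              /* ordinary character, copy as is */
            piece[1] = '\0';
        }
        len = strlen(piece);
        if (n + len + 1 > dst_size)        /* keep room for the terminator */
            return -1;
        memcpy(dst + n, piece, len);
        n += len;
    }
    dst[n] = '\0';
    return 0;
}
```

With this sketch, the term com#uter? would become the expression com.uter.*, which a server could then evaluate with truncation attribute 102.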
<para>
The basic YAZ representation of an OID is an array of integers,
- terminated with the value -1. The &odr; module provides two
- utility-functions to create and copy this type of data elements:
+ terminated with the value -1. Each integer in the array is of type
+ <literal>Odr_oid</literal>.
</para>
-
- <screen>
- Odr_oid *odr_getoidbystr(ODR o, char *str);
- </screen>
-
<para>
- Creates an OID based on a string-based representation using dots (.)
- to separate elements in the OID.
+ Fundamental OID operations and the type <literal>Odr_oid</literal>
+ are defined in <filename>yaz/oid_util.h</filename>.
</para>
-
- <screen>
- Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
- </screen>
-
- <para>
- Creates a copy of the OID referenced by the <emphasis>o</emphasis>
- parameter.
- Both functions take an &odr; stream as parameter. This stream is used to
- allocate memory for the data elements, which is released on a
- subsequent call to <function>odr_reset()</function> on that stream.
- </para>
-
<para>
- The OID module provides a higher-level representation of the
- family of object identifiers which describe the Z39.50 protocol and its
- related objects. The definition of the module interface is given in
- the <filename>oid.h</filename> file.
+ An OID can either be declared as an automatic variable or it can be
+ allocated using the memory utilities or ODR/NMEM. It is
+ guaranteed that an OID can fit in <literal>OID_SIZE</literal> integers.
</para>
-
- <para>
- The interface is mainly based on the <literal>oident</literal> structure.
- The definition of this structure looks like this:
- </para>
-
- <screen>
-typedef struct oident
-{
- oid_proto proto;
- oid_class oclass;
- oid_value value;
- int oidsuffix[OID_SIZE];
- char *desc;
-} oident;
- </screen>
-
- <para>
- The proto field takes one of the values
- </para>
-
- <screen>
- PROTO_Z3950
- PROTO_GENERAL
- </screen>
-
+ <example id="tools.oid.bib1.1"><title>Create OID on stack</title>
+ <para>
+ We can create an OID for the Bib-1 attribute set with:
+ <screen>
+ Odr_oid bib1[OID_SIZE];
+ bib1[0] = 1;
+ bib1[1] = 2;
+ bib1[2] = 840;
+ bib1[3] = 10003;
+ bib1[4] = 3;
+ bib1[5] = 1;
+ bib1[6] = -1;
+ </screen>
+ </para>
+ </example>
<para>
- Use <literal>PROTO_Z3950</literal> for Z39.50 Object Identifers,
- <literal>PROTO_GENERAL</literal> for other types (such as
- those associated with ILL).
+ An OID may also be filled from a string-based representation using
+ dots (.). This is achieved by the function
+ <screen>
+ int oid_dotstring_to_oid(const char *name, Odr_oid *oid);
+ </screen>
+ This function returns 0 if name could be converted; -1 otherwise.
</para>
- <para>
-
- The oclass field takes one of the values
+ <example id="tools.oid.bib1.2"><title>Using oid_dotstring_to_oid</title>
+ <para>
+ We can fill the Bib-1 attribute set OID more easily with:
+ <screen>
+ Odr_oid bib1[OID_SIZE];
+ oid_dotstring_to_oid("1.2.840.10003.3.1", bib1);
+ </screen>
</para>
-
- <screen>
- CLASS_APPCTX
- CLASS_ABSYN
- CLASS_ATTSET
- CLASS_TRANSYN
- CLASS_DIAGSET
- CLASS_RECSYN
- CLASS_RESFORM
- CLASS_ACCFORM
- CLASS_EXTSERV
- CLASS_USERINFO
- CLASS_ELEMSPEC
- CLASS_VARSET
- CLASS_SCHEMA
- CLASS_TAGSET
- CLASS_GENERAL
- </screen>
-
+ </example>
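To make the conversion from a dotted string concrete, here is a minimal, self-contained sketch of how such a parser might work. It is not the YAZ implementation: the name sketch_dotstring_to_oid and the capacity SKETCH_OID_SIZE are hypothetical, and plain int stands in for Odr_oid. Only the return convention (0 on success, -1 otherwise) and the -1 terminator follow the text above.

```c
#include <ctype.h>

#define SKETCH_OID_SIZE 20  /* assumed capacity; YAZ defines its own OID_SIZE */

/* Hypothetical stand-in, not YAZ code: parse "1.2.840..." into an int
   array terminated with -1. Returns 0 on success, -1 on malformed input
   or when the OID would not fit. On failure, oid may be partly written.
   Overflow of very large arcs is not checked in this sketch. */
int sketch_dotstring_to_oid(const char *name, int *oid)
{
    int n = 0;

    while (*name)
    {
        int value = 0;

        if (!isdigit((unsigned char) *name))
            return -1;                     /* each arc must start with a digit */
        if (n >= SKETCH_OID_SIZE - 1)
            return -1;                     /* leave room for the terminator */
        while (isdigit((unsigned char) *name))
            value = value * 10 + (*name++ - '0');
        oid[n++] = value;
        if (*name == '.')
        {
            name++;
            if (!*name)
                return -1;                 /* trailing dot */
        }
        else if (*name)
            return -1;                     /* unexpected character */
    }
    if (n == 0)
        return -1;                         /* empty string */
    oid[n] = -1;
    return 0;
}
```

Calling the sketch with "1.2.840.10003.3.1" fills the array with the same seven values that the stack example assigns by hand.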
<para>
- corresponding to the OID classes defined by the Z39.50 standard.
-
- Finally, the value field takes one of the values
- </para>
-
+ We can also allocate an OID dynamically on an ODR stream with:
<screen>
- VAL_APDU
- VAL_BER
- VAL_BASIC_CTX
- VAL_BIB1
- VAL_EXP1
- VAL_EXT1
- VAL_CCL1
- VAL_GILS
- VAL_WAIS
- VAL_STAS
- VAL_DIAG1
- VAL_ISO2709
- VAL_UNIMARC
- VAL_INTERMARC
- VAL_CCF
- VAL_USMARC
- VAL_UKMARC
- VAL_NORMARC
- VAL_LIBRISMARC
- VAL_DANMARC
- VAL_FINMARC
- VAL_MAB
- VAL_CANMARC
- VAL_SBN
- VAL_PICAMARC
- VAL_AUSMARC
- VAL_IBERMARC
- VAL_EXPLAIN
- VAL_SUTRS
- VAL_OPAC
- VAL_SUMMARY
- VAL_GRS0
- VAL_GRS1
- VAL_EXTENDED
- VAL_RESOURCE1
- VAL_RESOURCE2
- VAL_PROMPT1
- VAL_DES1
- VAL_KRB1
- VAL_PRESSET
- VAL_PQUERY
- VAL_PCQUERY
- VAL_ITEMORDER
- VAL_DBUPDATE
- VAL_EXPORTSPEC
- VAL_EXPORTINV
- VAL_NONE
- VAL_SETM
- VAL_SETG
- VAL_VAR1
- VAL_ESPEC1
+ Odr_oid *odr_getoidbystr(ODR o, const char *str);
</screen>
-
- <para>
- again, corresponding to the specific OIDs defined by the standard.
- Refer to the
- <ulink url="&url.z39.50.oids;">
- Registry of Z39.50 Object Identifiers</ulink> for the
- whole list.
- </para>
-
- <para>
- The desc field contains a brief, mnemonic name for the OID in question.
- </para>
-
- <para>
- The function
+ This creates an OID from a string-based representation using dots.
+ This function takes an &odr; stream as parameter. This stream is used to
+ allocate memory for the data elements, which is released on a
+ subsequent call to <function>odr_reset()</function> on that stream.
</para>
- <screen>
- struct oident *oid_getentbyoid(int *o);
- </screen>
-
- <para>
- takes as argument an OID, and returns a pointer to a static area
- containing an <literal>oident</literal> structure. You typically use
- this function when you receive a PDU containing an OID, and you wish
- to branch out depending on the specific OID value.
- </para>
+ <example id="tools.oid.bib1.3"><title>Using odr_getoidbystr</title>
+ <para>
+ We can create an OID for the Bib-1 attribute set with:
+ <screen>
+ Odr_oid *bib1 = odr_getoidbystr(odr, "1.2.840.10003.3.1");
+ </screen>
+ </para>
+ </example>
<para>
The function
- </para>
-
- <screen>
- int *oid_ent_to_oid(struct oident *ent, int *dst);
- </screen>
-
- <para>
- Takes as argument an <literal>oident</literal> structure - in which
- the <literal>proto</literal>, <literal>oclass</literal>/, and
- <literal>value</literal> fields are assumed to be set correctly -
- and returns a pointer to a the buffer as given by <literal>dst</literal>
- containing the base
- representation of the corresponding OID. The function returns
- NULL and the array dst is unchanged if a mapping couldn't place.
- The array <literal>dst</literal> should be at least of size
- <literal>OID_SIZE</literal>.
- </para>
- <para>
-
- The <function>oid_ent_to_oid()</function> function can be used whenever
- you need to prepare a PDU containing one or more OIDs. The separation of
- the <literal>protocol</literal> element from the remainder of the
- OID-description makes it simple to write applications that can
- communicate with either Z39.50 or OSI SR-based applications.
+ <screen>
+ char *oid_oid_to_dotstring(const Odr_oid *oid, char *oidbuf);
+ </screen>
+ does the reverse of <function>oid_dotstring_to_oid</function>. It
+ converts an OID to the string-based representation using dots.
+ The supplied char buffer <literal>oidbuf</literal> holds the resulting
+ string and must be at least <literal>OID_STR_MAX</literal> in size.
</para>
<para>
- The function
+ OIDs can be copied with <function>oid_oidcpy</function> which takes
+ two OID lists as arguments. Alternatively, an OID copy can be allocated
+ on an ODR stream with:
+ <screen>
+ Odr_oid *odr_oiddup(ODR odr, const Odr_oid *o);
+ </screen>
</para>
-
- <screen>
- oid_value oid_getvalbyname(const char *name);
- </screen>
-
- <para>
- takes as argument a mnemonic OID name, and returns the
- <literal>/value</literal> field of the first entry in the database that
- contains the given name in its <literal>desc</literal> field.
- </para>
-
- <para>
- Three utility functions are provided for translating OIDs'
- symbolic names (e.g. <literal>Usmarc</literal> into OID structures
- (int arrays) and strings containing the OID in dotted notation
- (e.g. <literal>1.2.840.10003.9.5.1</literal>). They are:
- </para>
-
- <screen>
- int *oid_name_to_oid(oid_class oclass, const char *name, int *oid);
- char *oid_to_dotstring(const int *oid, char *oidbuf);
- char *oid_name_to_dotstring(oid_class oclass, const char *name, char *oidbuf);
- </screen>
-
- <para>
- <literal>oid_name_to_oid()</literal>
- translates the specified symbolic <literal>name</literal>,
- interpreted as being of class <literal>oclass</literal>. (The
- class must be specified as many symbolic names exist within
- multiple classes - for example, <literal>Zthes</literal> is the
- symbolic name of an attribute set, a schema and a tag-set.) The
- sequence of integers representing the OID is written into the
- area <literal>oid</literal> provided by the caller; it is the
- caller's responsibility to ensure that this area is large enough
- to contain the translated OID. As a convenience, the address of
- the buffer (i.e. the value of <literal>oid</literal>) is
- returned.
- </para>
- <para>
- <literal>oid_to_dotstring()</literal>
- Translates the int-array <literal>oid</literal> into a dotted
- string which is written into the area <literal>oidbuf</literal>
- supplied by the caller; it is the caller's responsibility to
- ensure that this area is large enough. The address of the buffer
- is returned.
- </para>
- <para>
- <literal>oid_name_to_dotstring()</literal>
- combines the previous two functions to derive a dotted string
- representing the OID specified by <literal>oclass</literal> and
- <literal>name</literal>, writing it into the buffer passed as
- <literal>oidbuf</literal> and returning its address.
- </para>
-
+
<para>
- Finally, the module provides the following utility functions, whose
- meaning should be obvious:
+ OIDs can be compared with <function>oid_oidcmp</function> which returns
+ zero if the two OIDs provided are identical; non-zero otherwise.
</para>
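Because an OID is just an integer array terminated with -1, copying and comparing are simple loops. The sketch below illustrates the idea only. The names sketch_oidcpy and sketch_oidcmp are hypothetical, plain int stands in for Odr_oid, the destination-first argument order is an assumption, and the actual YAZ functions may differ in detail.

```c
/* Hypothetical helpers, not YAZ code: operate on int arrays that are
   terminated with -1, as described for OIDs above. */

/* Copy src, including its -1 terminator, into dst. The caller must make
   sure dst is large enough. */
void sketch_oidcpy(int *dst, const int *src)
{
    while (*src != -1)
        *dst++ = *src++;
    *dst = -1;
}

/* Return 0 if the two OIDs are identical; non-zero otherwise. */
int sketch_oidcmp(const int *a, const int *b)
{
    while (*a == *b && *a != -1)
    {
        a++;
        b++;
    }
    if (*a == *b)
        return 0;              /* both reached the terminator together */
    return *a > *b ? 1 : -1;
}
```

Note that a shorter OID that is a prefix of a longer one compares as different, because the terminator of the shorter array does not match the next arc of the longer one.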
+
+ <sect2 id="tools.oid.database"><title>OID database</title>
+ <para>
+ From YAZ version 3 and later, the oident system has been replaced
+ by an OID database. The name "OID database" is something of a
+ misnomer, since the old oident system was also a database.
+ </para>
+ <para>
+ The OID database is really just a map between named Object Identifiers
+ (strings) and their raw OID equivalents. Most operations either
+ convert from string to OID or the other way around.
+ </para>
+ <para>
+ Unfortunately, whenever we supply a string we must also specify the
+ <emphasis>OID class</emphasis>. The class is necessary because some
+ strings correspond to multiple OIDs. An example of such a string is
+ <literal>Bib-1</literal> which may either be an attribute-set
+ or a diagnostic-set.
+ </para>
+ <para>
+ Applications using the YAZ OID database should include
+ <filename>yaz/oid_db.h</filename>.
+ </para>
+ <para>
+ A YAZ OID database handle is of type <literal>yaz_oid_db_t</literal>.
+ It is actually a pointer, but you need not deal with that detail.
+ YAZ has a built-in database which can be considered "constant" for
+ most purposes.
+ We can get hold of it by calling the function <function>yaz_oid_std</function>.
+ </para>
+ <para>
+ All functions with the prefix <function>yaz_string_to_oid</function>
+ convert from class + string to OID. There are several variants of this
+ operation because of different memory allocation strategies.
+ </para>
+ <para>
+ All functions with the prefix
+ <function>yaz_oid_to_string</function> convert from OID to string
+ + class.
+ </para>
- <screen>
- void oid_oidcpy(int *t, int *s);
- void oid_oidcat(int *t, int *s);
- int oid_oidcmp(int *o1, int *o2);
- int oid_oidlen(int *o);
- </screen>
+ <example id="tools.oid.bib1.4"><title>Create OID with YAZ DB</title>
+ <para>
+ We can create an OID for the Bib-1 attribute set on the ODR stream
+ odr with:
+ <screen>
+ Odr_oid *bib1 =
+ yaz_string_to_oid_odr(yaz_oid_std(), CLASS_ATTSET, "Bib-1", odr);
+ </screen>
+ This is more complex than using <function>odr_getoidbystr</function>.
+ You would only use <function>yaz_string_to_oid_odr</function> when the
+ string (here Bib-1) is supplied by a user or configuration.
+ </para>
+ </example>
- <note>
+ </sect2>
+ <sect2 id="tools.oid.std"><title>Standard OIDs</title>
+
<para>
- The OID module has been criticized - and perhaps rightly so
- - for needlessly abstracting the
- representation of OIDs. Other toolkits use a simple
- string-representation of OIDs with good results. In practice, we have
- found the interface comfortable and quick to work with, and it is a
- simple matter (for what it's worth) to create applications compatible
- with both ISO SR and Z39.50. Finally, the use of the
- <literal>/oident</literal> database is by no means mandatory.
- You can easily create your own system for representing OIDs, as long
- as it is compatible with the low-level integer-array representation
- of the ODR module.
+ All the object identifiers in the standard OID database as returned
+ by <function>yaz_oid_std</function> can be referenced directly in a
+ program as a constant OID.
+ Each constant OID is prefixed with <literal>yaz_oid_</literal> -
+ followed by OID class (lowercase) - then by OID name (normalized and
+ lowercase).
+ </para>
+ <para>
+ See <xref linkend="list-oids"/> for a list of all object identifiers
+ built into YAZ.
+ These are declared in <filename>yaz/oid_std.h</filename> but are
+ included by <filename>yaz/oid_db.h</filename> as well.
</para>
- </note>
+ <example id="tools.oid.bib1.5"><title>Use a built-in OID</title>
+ <para>
+ We can allocate our own OID filled with the constant OID for
+ Bib-1 with:
+ <screen>
+ Odr_oid *bib1 = odr_oiddup(odr, yaz_oid_attset_bib_1);
+ </screen>
+ </para>
+ </example>
+ </sect2>
</sect1>
-
<sect1 id="tools.nmem"><title>Nibble Memory</title>
<para>
<screen>
NMEM nmem_create(void);
void nmem_destroy(NMEM n);
- void *nmem_malloc(NMEM n, int size);
+ void *nmem_malloc(NMEM n, size_t size);
void nmem_reset(NMEM n);
- int nmem_total(NMEM n);
+ size_t nmem_total(NMEM n);
void nmem_init(void);
void nmem_exit(void);
</screen>
<sect1 id="marc"><title>MARC</title>
<para>
- YAZ provides a fast utility that decodes MARC records and
- encodes to a varity of output formats. The MARC records must
- be encoded in ISO2709.
+ YAZ provides a fast utility for working with MARC records.
+ Early versions of the MARC utility only allowed decoding of ISO2709.
+ Today the utility can both decode from and encode to a variety of formats.
</para>
<synopsis><![CDATA[
#include <yaz/marcdisp.h>
#define YAZ_MARC_MARCXML 3
#define YAZ_MARC_ISO2709 4
#define YAZ_MARC_XCHANGE 5
+ #define YAZ_MARC_CHECK 6
+ #define YAZ_MARC_TURBOMARC 7
/* supply iconv handle for character set conversion .. */
void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);
/* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
On success, result in *result with size *rsize. */
- int yaz_marc_decode_buf (yaz_marc_t mt, const char *buf, int bsize,
- char **result, int *rsize);
+ int yaz_marc_decode_buf(yaz_marc_t mt, const char *buf, int bsize,
+ const char **result, size_t *rsize);
/* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
On success, result in WRBUF */
- int yaz_marc_decode_wrbuf (yaz_marc_t mt, const char *buf,
- int bsize, WRBUF wrbuf);
+ int yaz_marc_decode_wrbuf(yaz_marc_t mt, const char *buf,
+ int bsize, WRBUF wrbuf);
]]>
</synopsis>
+ <note>
+ <para>
+ The synopsis is just a basic subset of all functionality. Refer
+ to the actual header file <filename>marcdisp.h</filename> for
+ details.
+ </para>
+ </note>
<para>
A MARC conversion handle must be created by using
<function>yaz_marc_create</function> and destroyed
<term>YAZ_MARC_MARCXML</term>
<listitem>
<para>
- The resulting record is converted to MARCXML.
+ <ulink url="&url.marcxml;">MARCXML</ulink>.
</para>
</listitem>
</varlistentry>
<term>YAZ_MARC_ISO2709</term>
<listitem>
<para>
- The resulting record is converted to ISO2709 (MARC).
+ ISO2709 (sometimes just referred to as "MARC").
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_XCHANGE</term>
+ <listitem>
+ <para>
+ <ulink url="&url.marcxchange;">MarcXchange</ulink>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_CHECK</term>
+ <listitem>
+ <para>
+ Pseudo format for validation only. Does not generate
+ any real output except diagnostics.
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_TURBOMARC</term>
+ <listitem>
+ <para>
+ XML format with same semantics as MARCXML but more compact
+ and geared towards fast processing with XSLT. Refer to
+ <xref linkend="tools.turbomarc"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
<para>
<example id="example.marc.display">
<title>Display of MARC record</title>
<para>
- The followint program snippet illustrates how the MARC API may
+ The following program snippet illustrates how the MARC API may
be used to convert a MARC record to the line-by-line format:
<programlisting><![CDATA[
void print_marc(const char *marc_buf, int marc_buf_size)
{
- char *result; /* for result buf */
+ const char *result; /* for result buf */
- int result_len; /* for size of result */
+ size_t result_len; /* for size of result */
yaz_marc_t mt = yaz_marc_create();
yaz_marc_xml(mt, YAZ_MARC_LINE);
yaz_marc_decode_buf(mt, marc_buf, marc_buf_size,
</programlisting>
</para>
</example>
+ <sect2 id="tools.turbomarc">
+ <title>TurboMARC</title>
+ <para>
+ TurboMARC is yet another XML encoding of a MARC record. The format
+ was designed for fast processing with XSLT.
+ </para>
+ <para>
+ Applications like
+ Pazpar2 use XSLT to convert an XML encoded MARC record to an internal
+ representation. This conversion mostly checks the tag of a MARC field
+ to determine the basic rules in the conversion. This check is
+ costly when the tag is encoded as an attribute, as in MARCXML.
+ Having the tag value as part of the element name instead makes processing
+ many times faster (at least for Libxslt).
+ </para>
+ <para>
+ TurboMARC is encoded as follows:
+ <itemizedlist>
+ <listitem><para>
+ Record elements are part of the namespace
+ "<literal>http://www.indexdata.com/turbomarc</literal>".
+ </para></listitem>
+ <listitem><para>
+ A record is enclosed in element <literal>r</literal>.
+ </para></listitem>
+ <listitem><para>
+ A collection of records is enclosed in element
+ <literal>collection</literal>.
+ </para></listitem>
+ <listitem><para>
+ The leader is encoded as element <literal>l</literal> with the
+ leader content as its (text) value.
+ </para></listitem>
+ <listitem><para>
+ A control field is encoded as element <literal>c</literal> concatenated
+ with the tag value of the control field if the tag value
+ matches the regular expression <literal>[a-zA-Z0-9]*</literal>.
+ If the tag value does not match the regular expression
+ <literal>[a-zA-Z0-9]*</literal> the control field is encoded
+ as element <literal>c</literal> and attribute <literal>code</literal>
+ will hold the tag value.
+ This rule ensures that, in the rare cases where a tag value might
+ result in non-well-formed XML, YAZ encodes it as a coded attribute
+ (as in MARCXML).
+ </para>
+ <para>
+ The control field content is the text value of this element.
+ Indicators are encoded as attribute names
+ <literal>i1</literal>, <literal>i2</literal>, etc., with the
+ corresponding value for each indicator.
+ </para></listitem>
+ <listitem><para>
+ A data field is encoded as element <literal>d</literal> concatenated
+ with the tag value of the data field or using the attribute
+ <literal>code</literal> as described in the rules for control fields.
+ The children of the data field element are subfield elements.
+ Each subfield element is encoded as <literal>s</literal>
+ concatenated with the subfield code.
+ The text of the subfield element is the contents of the subfield.
+ Indicators are encoded as attributes for the data field element similar
+ to the encoding for control fields.
+ </para></listitem>
+ </itemizedlist>
+ </para>
+ </sect2>
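The element naming rule for control and data fields described above can be sketched as follows. This is an illustration only and not how YAZ generates TurboMARC: the helper name sketch_turbomarc_open_tag is hypothetical, it ignores indicators and the namespace, and it does not XML-escape the tag value placed in the code attribute.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of YAZ. prefix is "c" for a control
   field or "d" for a data field. If tag consists only of characters
   matching [a-zA-Z0-9], the opening tag is <prefix+tag>; otherwise the
   tag value goes into the code attribute (unescaped in this sketch).
   Returns 0 on success, -1 if dst (dst_size bytes) is too small. */
int sketch_turbomarc_open_tag(const char *prefix, const char *tag,
                              char *dst, size_t dst_size)
{
    const char *p;
    int simple = 1;
    int n;

    for (p = tag; *p; p++)
    {
        if (!((*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z') ||
              (*p >= '0' && *p <= '9')))
            simple = 0;                    /* tag is not safe as a name part */
    }
    if (simple)
        n = snprintf(dst, dst_size, "<%s%s>", prefix, tag);
    else
        n = snprintf(dst, dst_size, "<%s code=\"%s\">", prefix, tag);
    return (n < 0 || (size_t) n >= dst_size) ? -1 : 0;
}
```

For a data field with tag 245 the sketch produces the opening tag of element d245, which an XSLT template can match by element name instead of testing an attribute value.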
</sect1>
<sect1 id="tools.retrieval">
<para>
Defines the name of the retrieval format. This can be
any string. For SRU, the value, is equivalent to schema (short-hand);
- for Z39.50 it's equivalent to simple element set name.
+ for Z39.50 it's equivalent to simple element set name.
+ For YAZ 3.0.24 and later this name may be specified as a glob
+ expression with operators
+ <literal>*</literal> and <literal>?</literal>.
</para>
</listitem>
</varlistentry>
<para>
The <literal>marc</literal> element specifies a conversion
to - and from ISO2709 encoded MARC and
- <ulink url="&url.marcxml;">&marcxml;</ulink>/MarcXchange.
+ <ulink url="&url.marcxml;">&acro.marcxml;</ulink>/MarcXchange.
The following attributes may be specified:
<variablelist>
<listitem>
<para>
The <literal>xslt</literal> element specifies a conversion
- via &xslt;. The following attributes may be specified:
+ via &acro.xslt;. The following attributes may be specified:
<variablelist>
<varlistentry><term><literal>stylesheet</literal> (REQUIRED)</term>
</sect2>
<sect2 id="tools.retrieval.examples">
<title>Retrieval Facility Examples</title>
- <example>
- <title id="tools.retrieval.marc21">MARC21 backend</title>
+ <example id="tools.retrieval.marc21">
+ <title>MARC21 backend</title>
<para>
A typical way to use the retrieval facility is to enable XML
for servers that only supports ISO2709 encoded MARC21 records.
</para>
</example>
</sect2>
- <sect2>
+ <sect2 id="tools.retrieval.api">
<title>API</title>
<para>
It should be easy to use the retrieval systems from applications. Refer