-<!-- $Id: tools.xml,v 1.62 2007-05-08 08:22:35 adam Exp $ -->
<chapter id="tools"><title>Supporting Tools</title>
<para>
symbolic language for expressing boolean query structures.
</para>
- <para>
- The EUROPAGATE research project working under the Libraries programme
- of the European Commission's DG XIII has, amongst other useful tools,
- implemented a general-purpose CCL parser which produces an output
- structure that can be trivially converted to the internal RPN
- representation of &yaz; (The <literal>Z_RPNQuery</literal> structure).
- Since the CCL utility - along with the rest of the software
- produced by EUROPAGATE - is made freely available on a liberal
- license, it is included as a supplement to &yaz;.
- </para>
-
<sect3 id="ccl.syntax">
<title>CCL Syntax</title>
</table>
</para>
<para>
- Refer to the complete
+ Refer to <xref linkend="bib1"/> or the complete
<ulink url="&url.z39.50.attset.bib1;">list of Bib-1 attributes</ulink>
</para>
<para>
</para>
</example>
</sect2>
-
- <sect2 id="tools.oid.oident"><title>OID oident</title>
-
- <note>
- <para>
- The oident utility has been removed from YAZ version 3. This
- sub section only applies to YAZ version 2.
- </para>
- </note>
-
- <para>
- The OID module provides a higher-level representation of the
- family of object identifiers which describe the Z39.50 protocol and its
- related objects. The definition of the module interface is given in
- the <filename>oid.h</filename> file.
- </para>
-
- <para>
- The interface is mainly based on the <literal>oident</literal> structure.
- The definition of this structure looks like this:
- </para>
-
- <screen>
-typedef struct oident
-{
- oid_proto proto;
- oid_class oclass;
- oid_value value;
- int oidsuffix[OID_SIZE];
- char *desc;
-} oident;
- </screen>
-
- <para>
- The proto field takes one of the values
- </para>
-
- <screen>
- PROTO_Z3950
- PROTO_GENERAL
- </screen>
-
- <para>
- Use <literal>PROTO_Z3950</literal> for Z39.50 Object Identifers,
- <literal>PROTO_GENERAL</literal> for other types (such as
- those associated with ILL).
- </para>
- <para>
-
- The oclass field takes one of the values
- </para>
-
- <screen>
- CLASS_APPCTX
- CLASS_ABSYN
- CLASS_ATTSET
- CLASS_TRANSYN
- CLASS_DIAGSET
- CLASS_RECSYN
- CLASS_RESFORM
- CLASS_ACCFORM
- CLASS_EXTSERV
- CLASS_USERINFO
- CLASS_ELEMSPEC
- CLASS_VARSET
- CLASS_SCHEMA
- CLASS_TAGSET
- CLASS_GENERAL
- </screen>
-
- <para>
- corresponding to the OID classes defined by the Z39.50 standard.
-
- Finally, the value field takes one of the values
- </para>
-
- <screen>
- VAL_APDU
- VAL_BER
- VAL_BASIC_CTX
- VAL_BIB1
- VAL_EXP1
- VAL_EXT1
- VAL_CCL1
- VAL_GILS
- VAL_WAIS
- VAL_STAS
- VAL_DIAG1
- VAL_ISO2709
- VAL_UNIMARC
- VAL_INTERMARC
- VAL_CCF
- VAL_USMARC
- VAL_UKMARC
- VAL_NORMARC
- VAL_LIBRISMARC
- VAL_DANMARC
- VAL_FINMARC
- VAL_MAB
- VAL_CANMARC
- VAL_SBN
- VAL_PICAMARC
- VAL_AUSMARC
- VAL_IBERMARC
- VAL_EXPLAIN
- VAL_SUTRS
- VAL_OPAC
- VAL_SUMMARY
- VAL_GRS0
- VAL_GRS1
- VAL_EXTENDED
- VAL_RESOURCE1
- VAL_RESOURCE2
- VAL_PROMPT1
- VAL_DES1
- VAL_KRB1
- VAL_PRESSET
- VAL_PQUERY
- VAL_PCQUERY
- VAL_ITEMORDER
- VAL_DBUPDATE
- VAL_EXPORTSPEC
- VAL_EXPORTINV
- VAL_NONE
- VAL_SETM
- VAL_SETG
- VAL_VAR1
- VAL_ESPEC1
- </screen>
-
- <para>
- again, corresponding to the specific OIDs defined by the standard.
- Refer to the
- <ulink url="&url.z39.50.oids;">
- Registry of Z39.50 Object Identifiers</ulink> for the
- whole list.
- </para>
-
- <para>
- The desc field contains a brief, mnemonic name for the OID in question.
- </para>
-
- <para>
- The function
- </para>
-
- <screen>
- struct oident *oid_getentbyoid(int *o);
- </screen>
-
- <para>
- takes as argument an OID, and returns a pointer to a static area
- containing an <literal>oident</literal> structure. You typically use
- this function when you receive a PDU containing an OID, and you wish
- to branch out depending on the specific OID value.
- </para>
-
- <para>
- The function
- </para>
-
- <screen>
- int *oid_ent_to_oid(struct oident *ent, int *dst);
- </screen>
-
- <para>
- Takes as argument an <literal>oident</literal> structure - in which
- the <literal>proto</literal>, <literal>oclass</literal>/, and
- <literal>value</literal> fields are assumed to be set correctly -
- and returns a pointer to a the buffer as given by <literal>dst</literal>
- containing the base
- representation of the corresponding OID. The function returns
- NULL and the array dst is unchanged if a mapping couldn't place.
- The array <literal>dst</literal> should be at least of size
- <literal>OID_SIZE</literal>.
- </para>
- <para>
-
- The <function>oid_ent_to_oid()</function> function can be used whenever
- you need to prepare a PDU containing one or more OIDs. The separation of
- the <literal>protocol</literal> element from the remainder of the
- OID-description makes it simple to write applications that can
- communicate with either Z39.50 or OSI SR-based applications.
- </para>
-
- <para>
- The function
- </para>
-
- <screen>
- oid_value oid_getvalbyname(const char *name);
- </screen>
-
- <para>
- takes as argument a mnemonic OID name, and returns the
- <literal>/value</literal> field of the first entry in the database that
- contains the given name in its <literal>desc</literal> field.
- </para>
-
- <para>
- Three utility functions are provided for translating OIDs'
- symbolic names (e.g. <literal>Usmarc</literal> into OID structures
- (int arrays) and strings containing the OID in dotted notation
- (e.g. <literal>1.2.840.10003.9.5.1</literal>). They are:
- </para>
-
- <screen>
- int *oid_name_to_oid(oid_class oclass, const char *name, int *oid);
- char *oid_to_dotstring(const int *oid, char *oidbuf);
- char *oid_name_to_dotstring(oid_class oclass, const char *name, char *oidbuf);
- </screen>
-
- <para>
- <literal>oid_name_to_oid()</literal>
- translates the specified symbolic <literal>name</literal>,
- interpreted as being of class <literal>oclass</literal>. (The
- class must be specified as many symbolic names exist within
- multiple classes - for example, <literal>Zthes</literal> is the
- symbolic name of an attribute set, a schema and a tag-set.) The
- sequence of integers representing the OID is written into the
- area <literal>oid</literal> provided by the caller; it is the
- caller's responsibility to ensure that this area is large enough
- to contain the translated OID. As a convenience, the address of
- the buffer (i.e. the value of <literal>oid</literal>) is
- returned.
- </para>
- <para>
- <literal>oid_to_dotstring()</literal>
- Translates the int-array <literal>oid</literal> into a dotted
- string which is written into the area <literal>oidbuf</literal>
- supplied by the caller; it is the caller's responsibility to
- ensure that this area is large enough. The address of the buffer
- is returned.
- </para>
- <para>
- <literal>oid_name_to_dotstring()</literal>
- combines the previous two functions to derive a dotted string
- representing the OID specified by <literal>oclass</literal> and
- <literal>name</literal>, writing it into the buffer passed as
- <literal>oidbuf</literal> and returning its address.
- </para>
-
- <note>
- <para>
- The OID module has been criticized - and perhaps rightly so
- - for needlessly abstracting the
- representation of OIDs. Other toolkits use a simple
- string-representation of OIDs with good results. In practice, we have
- found the interface comfortable and quick to work with, and it is a
- simple matter (for what it's worth) to create applications compatible
- with both ISO SR and Z39.50. Finally, the use of the
- <literal>/oident</literal> database is by no means mandatory.
- You can easily create your own system for representing OIDs, as long
- as it is compatible with the low-level integer-array representation
- of the ODR module.
- </para>
- </note>
-
- </sect2>
</sect1>
<sect1 id="tools.nmem"><title>Nibble Memory</title>
<screen>
NMEM nmem_create(void);
void nmem_destroy(NMEM n);
- void *nmem_malloc(NMEM n, int size);
+ void *nmem_malloc(NMEM n, size_t size);
void nmem_reset(NMEM n);
- int nmem_total(NMEM n);
+ size_t nmem_total(NMEM n);
void nmem_init(void);
void nmem_exit(void);
</screen>
<sect1 id="marc"><title>MARC</title>
<para>
- YAZ provides a fast utility that decodes MARC records and
- encodes to a varity of output formats. The MARC records must
- be encoded in ISO2709.
+ YAZ provides a fast utility for working with MARC records.
+ Early versions of the MARC utility only allowed decoding of ISO2709.
+ Today the utility can both encode to and decode from a variety of formats.
</para>
<synopsis><![CDATA[
#include <yaz/marcdisp.h>
#define YAZ_MARC_MARCXML 3
#define YAZ_MARC_ISO2709 4
#define YAZ_MARC_XCHANGE 5
+ #define YAZ_MARC_CHECK 6
+ #define YAZ_MARC_TURBOMARC 7
/* supply iconv handle for character set conversion .. */
void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);
/* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
On success, result in *result with size *rsize. */
- int yaz_marc_decode_buf (yaz_marc_t mt, const char *buf, int bsize,
- char **result, int *rsize);
+ int yaz_marc_decode_buf(yaz_marc_t mt, const char *buf, int bsize,
+ const char **result, size_t *rsize);
/* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
On success, result in WRBUF */
- int yaz_marc_decode_wrbuf (yaz_marc_t mt, const char *buf,
- int bsize, WRBUF wrbuf);
+ int yaz_marc_decode_wrbuf(yaz_marc_t mt, const char *buf,
+ int bsize, WRBUF wrbuf);
]]>
</synopsis>
+ <note>
+ <para>
+ The synopsis is just a basic subset of all functionality. Refer
+ to the actual header file <filename>marcdisp.h</filename> for
+ details.
+ </para>
+ </note>
<para>
A MARC conversion handle must be created by using
<function>yaz_marc_create</function> and destroyed
<term>YAZ_MARC_MARCXML</term>
<listitem>
<para>
- The resulting record is converted to MARCXML.
+ <ulink url="&url.marcxml;">MARCXML</ulink>.
</para>
</listitem>
</varlistentry>
<term>YAZ_MARC_ISO2709</term>
<listitem>
<para>
- The resulting record is converted to ISO2709 (MARC).
+ ISO2709 (sometimes just referred to as "MARC").
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_XCHANGE</term>
+ <listitem>
+ <para>
+ <ulink url="&url.marcxchange;">MarcXchange</ulink>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_CHECK</term>
+ <listitem>
+ <para>
+ Pseudo format for validation only. Does not generate
+ any real output except diagnostics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_TURBOMARC</term>
+ <listitem>
+ <para>
+ XML format with the same semantics as MARCXML but more compact
+ and geared towards fast processing with XSLT. Refer to
+ <xref linkend="tools.turbomarc"/> for more information.
</para>
</listitem>
</varlistentry>
+
</variablelist>
</para>
<para>
<example id="example.marc.display">
<title>Display of MARC record</title>
<para>
- The followint program snippet illustrates how the MARC API may
+ The following program snippet illustrates how the MARC API may
be used to convert a MARC record to the line-by-line format:
<programlisting><![CDATA[
void print_marc(const char *marc_buf, int marc_buf_size)
{
- char *result; /* for result buf */
+ const char *result; /* for result buf */
- int result_len; /* for size of result */
+ size_t result_len; /* for size of result */
yaz_marc_t mt = yaz_marc_create();
yaz_marc_xml(mt, YAZ_MARC_LINE);
yaz_marc_decode_buf(mt, marc_buf, marc_buf_size,
</programlisting>
</para>
</example>
+ <sect2 id="tools.turbomarc">
+ <title>TurboMARC</title>
+ <para>
+ TurboMARC is yet another XML encoding of a MARC record. The format
+ was designed for fast processing with XSLT.
+ </para>
+ <para>
+ Applications like
+ Pazpar2 use XSLT to convert an XML-encoded MARC record to an internal
+ representation. This conversion mostly checks the tag of a MARC field
+ to determine the basic rules in the conversion. This check is
+ costly when the tag is encoded as an attribute, as in MARCXML.
+ Having the tag value as part of the element name instead makes processing
+ many times faster (at least for Libxslt).
+ </para>
+ <para>
+ TurboMARC is encoded as follows:
+ <itemizedlist>
+ <listitem><para>
+ Record elements are part of the namespace
+ "<literal>http://www.indexdata.com/turbomarc</literal>".
+ </para></listitem>
+ <listitem><para>
+ A record is enclosed in element <literal>r</literal>.
+ </para></listitem>
+ <listitem><para>
+ A collection of records is enclosed in element
+ <literal>collection</literal>.
+ </para></listitem>
+ <listitem><para>
+ The leader is encoded as element <literal>l</literal> with the
+ leader content as its (text) value.
+ </para></listitem>
+ <listitem><para>
+ A control field is encoded as element <literal>c</literal> concatenated
+ with the tag value of the control field if the tag value
+ matches the regular expression <literal>[a-zA-Z0-9]*</literal>.
+ If the tag value does not match the regular expression
+ <literal>[a-zA-Z0-9]*</literal>, the control field is encoded
+ as element <literal>c</literal> and the attribute <literal>code</literal>
+ holds the tag value.
+ This rule ensures that in the rare cases where a tag value would
+ result in non-well-formed XML, YAZ encodes it as a coded attribute
+ (as in MARCXML).
+ </para>
+ <para>
+ The control field content is the text value of this element.
+ Indicators are encoded as attributes named
+ <literal>i1</literal>, <literal>i2</literal>, etc., with the
+ corresponding value for each indicator.
+ </para></listitem>
+ <listitem><para>
+ A data field is encoded as element <literal>d</literal> concatenated
+ with the tag value of the data field or using the attribute
+ <literal>code</literal> as described in the rules for control fields.
+ The children of the data field element are subfield elements.
+ Each subfield element is encoded as <literal>s</literal>
+ concatenated with the subfield code.
+ The text of the subfield element is the contents of the subfield.
+ Indicators are encoded as attributes for the data field element similar
+ to the encoding for control fields.
+ </para></listitem>
+ </itemizedlist>
+ </para>
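The element-naming rule for control and data fields described above can be sketched in C as follows. This is only an illustration of the rule as stated in the list, not the code YAZ itself uses: the helper names `tag_is_alnum` and `turbomarc_start_tag` are hypothetical, and indicators as well as XML escaping of the attribute value are left out.

```c
#include <stdio.h>
#include <stddef.h>

/* Return 1 if tag consists only of ASCII letters and digits, that is,
   if it matches [a-zA-Z0-9]* (the empty string matches as well). */
static int tag_is_alnum(const char *tag)
{
    for (; *tag; tag++)
    {
        char c = *tag;
        if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
              (c >= '0' && c <= '9')))
            return 0;
    }
    return 1;
}

/* Hypothetical helper, not part of the YAZ API: write the start tag for
   a field into out. prefix is "c" for control fields and "d" for data
   fields. Indicators and XML escaping of the attribute value are left
   out to keep the sketch short. */
static void turbomarc_start_tag(char *out, size_t out_size,
                                const char *prefix, const char *tag)
{
    if (tag_is_alnum(tag))
        /* Tag is safe as part of an element name: <c001>, <d245>, ... */
        snprintf(out, out_size, "<%s%s>", prefix, tag);
    else
        /* Fall back to the MARCXML-like form with a code attribute. */
        snprintf(out, out_size, "<%s code=\"%s\">", prefix, tag);
}
```

With this sketch, a control field with tag `001` gives the start tag `<c001>`, while a data field with the tag `24$` falls back to `<d code="24$">`.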
+ </sect2>
</sect1>
<sect1 id="tools.retrieval">
<para>
Defines the name of the retrieval format. This can be
any string. For SRU, the value, is equivalent to schema (short-hand);
- for Z39.50 it's equivalent to simple element set name.
+ for Z39.50 it's equivalent to simple element set name.
+ For YAZ 3.0.24 and later this name may be specified as a glob
+ expression with operators
+ <literal>*</literal> and <literal>?</literal>.
</para>
</listitem>
</varlistentry>
<para>
The <literal>marc</literal> element specifies a conversion
to - and from ISO2709 encoded MARC and
- <ulink url="&url.marcxml;">&marcxml;</ulink>/MarcXchange.
+ <ulink url="&url.marcxml;">&acro.marcxml;</ulink>/MarcXchange.
The following attributes may be specified:
<variablelist>
<listitem>
<para>
The <literal>xslt</literal> element specifies a conversion
- via &xslt;. The following attributes may be specified:
+ via &acro.xslt;. The following attributes may be specified:
<variablelist>
<varlistentry><term><literal>stylesheet</literal> (REQUIRED)</term>