-<!-- $Id: tools.xml,v 1.34 2003-12-18 17:27:31 mike Exp $ -->
+<!-- $Id: tools.xml,v 1.49 2006-04-25 11:25:08 marc Exp $ -->
<chapter id="tools"><title>Supporting Tools</title>
<para>
<literal>@and</literal>. Its semantics are described in
section 3.7.2 (Proximity) of Z39.50 the standard itself, which
can be read on-line at
- <ulink url="http://lcweb.loc.gov/z3950/agency/markup/09.html"/>
+ <ulink url="&url.z39.50.proximity;"/>
</para>
<para>
In PQF, the proximity operation is represented by a sequence
</itemizedlist>
(The numeric values of the relation and well-known unit-code
parameters are taken straight from
- <ulink url="http://lcweb.loc.gov/z3950/agency/asn1.html#ProximityOperator"
+ <ulink url="&url.z39.50.proximity.asn1;"
>the ASN.1</ulink> of the proximity structure in the standard.)
</para>
</sect3>
<para>
<screen>
dylan
+
"bob dylan"
</screen>
</para>
<para>
<screen>
@or "dylan" "zimmerman"
+
@and @or dylan zimmerman when
+
@and when @or dylan zimmerman
</screen>
</para>
<para>
<screen>
@set Result-1
- @and @set seta setb
+
+ @and @set seta @set setb
</screen>
</para>
</example>
<para>
<screen>
@attr 1=4 computer
+
@attr 1=4 @attr 4=1 "self portrait"
+
@attrset exp1 @attr 1=1 CategoryList
+
@attr gils 1=2008 Copenhagen
+
@attr 1=/book/title computer
</screen>
</para>
<row>
<entry><literal>u=</literal><replaceable>value</replaceable></entry>
<entry>
- Use attribute. Common use attributes are
+ Use attribute (1). Common use attributes are
1 Personal-name, 4 Title, 7 ISBN, 8 ISSN, 30 Date,
62 Subject, 1003 Author), 1016 Any. Specify value
as an integer.
<row>
<entry><literal>r=</literal><replaceable>value</replaceable></entry>
<entry>
- Relation attribute. Common values are
+ Relation attribute (2). Common values are
1 <, 2 <=, 3 =, 4 >=, 5 >, 6 <>,
100 phonetic, 101 stem, 102 relevance, 103 always matches.
</entry>
<row>
<entry><literal>p=</literal><replaceable>value</replaceable></entry>
<entry>
- Position attribute. Values: 1 first in field, 2
+ Position attribute (3). Values: 1 first in field, 2
first in any subfield, 3 any position in field.
</entry>
</row>
<row>
<entry><literal>s=</literal><replaceable>value</replaceable></entry>
<entry>
- Structure attribute. Values: 1 phrase, 2 word,
+ Structure attribute (4). Values: 1 phrase, 2 word,
3 key, 4 year, 5 date, 6 word list, 100 date (un),
101 name (norm), 102 name (un), 103 structure, 104 urx,
105 free-form-text, 106 document-text, 107 local-number,
<row>
<entry><literal>t=</literal><replaceable>value</replaceable></entry>
<entry>
- Truncation attribute. Values: 1 right, 2 left,
+ Truncation attribute (5). Values: 1 right, 2 left,
3 left& right, 100 none, 101 process #, 102 regular-1,
103 regular-2, 104 CCL.
</entry>
<row>
<entry><literal>c=</literal><replaceable>value</replaceable></entry>
<entry>
- Completeness attribute. Values: 1 incomplete subfield,
+ Completeness attribute (6). Values: 1 incomplete subfield,
2 complete subfield, 3 complete field.
</entry>
</row>
</table>
</para>
<para>
- The complete list of Bib-1 attributes can be found
- <ulink url="http://lcweb.loc.gov/z3950/agency/defns/bib1.html">
- here
- </ulink>.
+ Refer to the complete
+ <ulink url="&url.z39.50.attset.bib1;">list of Bib-1 attributes</ulink>.
</para>
<para>
It is also possible to specify non-numeric attribute values,
</row>
<row><entry><literal>r=o</literal></entry><entry>
- Allows operators greather-than, less-than, ... equals and
- sets relation attribute accordingly (relation ordered).
+ Allows ranges and the operators greater-than, less-than, ...
+ equals.
+ This sets the Bib-1 relation attribute accordingly (relation
+ ordered). A query construct is only treated as a range if
+ a dash is used and it is surrounded by white-space. So
+ <literal>-1980</literal> is treated as the term
+ <literal>"-1980"</literal>, not <literal>&lt;= 1980</literal>.
+ If <literal>- 1980</literal> is used, however, that is
+ treated as a range.
+ </entry>
+ </row>
+
+ <row><entry><literal>r=r</literal></entry><entry>
+ Similar to <literal>r=o</literal> but assumes that terms
+ are non-negative (not prefixed with <literal>-</literal>).
+ Thus, a dash will always be treated as a range.
+ The construct <literal>1980-1990</literal> is
+ treated as a range with <literal>r=r</literal> but as a
+ single term <literal>"1980-1990"</literal> with
+ <literal>r=o</literal>. The special attribute
+ <literal>r=r</literal> is available in YAZ 2.0.24 or later.
</entry>
</row>
date u=30 r=o
</screen>
<para>
- Four qualifiers are defined - <literal>ti</literal>,
- <literal>au</literal>, <literal>ranked</literal> and
- <literal>date</literal>.
- </para>
- <para>
<literal>ti</literal> and <literal>au</literal> both set
structure attribute to phrase (s=1).
<literal>ti</literal>
<para>
Query
<screen>
- year > 1980
+ date > 1980
</screen>
- is a valid query, while
+ is a valid query. But
<screen>
ti > 1980
</screen>
be an alias for <replaceable>q1</replaceable>,
<replaceable>q2</replaceable>... such that the CCL
query <replaceable>q=x</replaceable> is equivalent to
- <replaceable>q1=x or w2=x or ...</replaceable>.
+ <replaceable>q1=x or q2=x or ...</replaceable>.
</para>
</sect4>
</sect2>
<sect2 id="tools.cql"><title>CQL</title>
<para>
- <ulink url="http://www.loc.gov/z3950/agency/zing/cql/">CQL</ulink>
+ <ulink url="&url.cql;">CQL</ulink>
- Common Query Language - was defined for the
- <ulink url="http://www.loc.gov/z3950/agency/zing/srw/">SRW</ulink>
- protocol.
+ <ulink url="&url.srw;">SRW</ulink> protocol.
In many ways CQL has a similar syntax to CCL.
The objective of CQL is different. Where CCL aims to be
an end-user language, CQL is <emphasis>the</emphasis> protocol
<tip>
<para>
If you are new to CQL, read the
- <ulink url="http://zing.z3950.org/cql/intro.html">Gentle
- Introduction</ulink>.
+ <ulink url="&url.cql.intro;">Gentle Introduction</ulink>.
</para>
</tip>
<para>
<listitem>
<para>
The parser converts CQL to
- <ulink url="http://www.loc.gov/z3950/agency/zing/cql/xcql.html">
- XCQL</ulink>.
+ <ulink url="&url.xcql;">XCQL</ulink>.
XCQL is an XML representation of CQL.
XCQL is part of the SRW specification. However, since SRU
supports CQL only, we don't expect XCQL to be widely used.
<synopsis>
#define CQL_NODE_ST 1
#define CQL_NODE_BOOL 2
-#define CQL_NODE_MOD 3
struct cql_node {
int which;
union {
struct {
char *index;
+ char *index_uri;
char *term;
char *relation;
+ char *relation_uri;
struct cql_node *modifiers;
- struct cql_node *prefixes;
} st;
struct {
char *value;
struct cql_node *left;
struct cql_node *right;
struct cql_node *modifiers;
- struct cql_node *prefixes;
} boolean;
- struct {
- char *name;
- char *value;
- struct cql_node *next;
- } mod;
} u;
};
</synopsis>
- There are three kinds of nodes, search term (ST), boolean (BOOL),
- and modifier (MOD).
+ There are two node types: search term (ST) and boolean (BOOL).
+ A modifier is treated as a search term too.
</para>
<para>
- The search term node has five members:
+ The search term node has six members:
</listitem>
<listitem>
<para>
+ <literal>index_uri</literal>: index URI for search term
+ or NULL if none could be resolved for the index.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
<literal>term</literal>: the search term itself.
</para>
</listitem>
</listitem>
<listitem>
<para>
- <literal>modifiers</literal>: relation modifiers for search
- term. The <literal>modifiers</literal> is a simple linked
- list (NULL for last entry). Each relation modifier node
- is of type <literal>MOD</literal>.
+ <literal>relation_uri</literal>: relation URI for search term.
</para>
</listitem>
<listitem>
<para>
- <literal>prefixes</literal>: index prefixes for search
- term. The <literal>prefixes</literal> is a simple linked
- list (NULL for last entry). Each prefix node
- is of type <literal>MOD</literal>.
+ <literal>modifiers</literal>: relation modifiers for search
+ term. The <literal>modifiers</literal> member is itself a list
+ of cql_nodes, each of type <literal>ST</literal>.
</para>
</listitem>
</itemizedlist>
<literal>modifiers</literal>: proximity arguments.
</para>
</listitem>
- <listitem>
- <para>
- <literal>prefixes</literal>: index prefixes.
- The <literal>prefixes</literal> is a simple linked
- list (NULL for last entry). Each prefix node
- is of type <literal>MOD</literal>.
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- <para>
- The modifier node is a "utility" node used for name-value pairs,
- such as prefixes, proximity arguements, etc.
- <itemizedlist>
- <listitem>
- <para>
- <literal>name</literal> name of mod node.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>value</literal> value of mod node.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>next</literal>: pointer to next node which is
- always a mod node (NULL for last entry).
- </para>
- </listitem>
</itemizedlist>
</para>
returns a non-zero SRW error code; otherwise zero is returned
(conversion successful). The meanings of the numeric error
codes are listed in the SRW specifications at
- <ulink url="http://www.loc.gov/srw/diagnostic-list.html"/>
+ <ulink url="&url.sru.diagnostics.list;"/>
</para>
<para>
If conversion fails, more information can be obtained by calling
</para>
</sect3>
<sect3 id="tools.cql.map">
- <title>Specification of CQL to RPN mapping</title>
+ <title>Specification of CQL to RPN mappings</title>
<para>
The file supplied to functions
<function>cql_transform_open_FILE</function>,
<para>
again, corresponding to the specific OIDs defined by the standard.
Refer to the
- <ulink url="http://lcweb.loc.gov/z3950/agency/defns/oids.html">
+ <ulink url="&url.z39.50.oids;">
Registry of Z39.50 Object Identifiers</ulink> for the
whole list.
</para>
</para>
</sect1>
+
+ <sect1 id="tools.log"><title>Log</title>
+ <para>
+ &yaz; has evolved a fairly complex log system which should be useful
+ for debugging &yaz; itself, debugging applications that use &yaz;, and for
+ production use of those applications.
+ </para>
+ <para>
+ The log functions are declared in header <filename>yaz/log.h</filename>
+ and implemented in <filename>src/log.c</filename>.
+ Due to a name clash with syslog and some math utilities, the logging
+ interface has been modified as of YAZ 2.0.29. The obsolete interface
+ is still available in header file <filename>yaz/log.h</filename>.
+ The key points of the interface are:
+ </para>
+ <screen>
+ void yaz_log(int level, const char *fmt, ...);
+
+ void yaz_log_init(int level, const char *prefix, const char *name);
+ void yaz_log_init_file(const char *fname);
+ void yaz_log_init_level(int level);
+ void yaz_log_init_prefix(const char *prefix);
+ void yaz_log_time_format(const char *fmt);
+ void yaz_log_init_max_size(int mx);
+
+ int yaz_log_mask_str(const char *str);
+ int yaz_log_module_level(const char *name);
+ </screen>
+
+ <para>
+ The reason for the whole log module is the <function>yaz_log</function>
+ function. It takes a bitmask indicating the log levels, a
+ <literal>printf</literal>-like format string, and a variable number of
+ arguments to log.
+ </para>
+
+ <para>
+ The <literal>log level</literal> is a bit mask that says on which level(s)
+ the log entry should be made, and optionally sets some behaviour of the
+ logging. In the simplest cases, it can be one of <literal>YLOG_FATAL,
+ YLOG_DEBUG, YLOG_WARN, YLOG_LOG</literal>. Those can be combined with bits
+ that modify the way the log entry is written: <literal>YLOG_ERRNO,
+ YLOG_NOTIME, YLOG_FLUSH</literal>.
+ Most of the remaining bits are deprecated and should not be used. Use
+ the dynamic log levels instead.
+ </para>
+
+ <para>
+ Applications that use &yaz; should not use <literal>YLOG_LOG</literal> for
+ ordinary messages, but should make use of the dynamic loglevel system. This
+ consists of two parts: defining the loglevel and checking it.
+ </para>
+
+ <para>
+ To define the log levels, the (main) program should pass a string to
+ <function>yaz_log_mask_str</function> to define which log levels are to be
+ logged. This string should be a comma-separated list of log level names,
+ and can contain both hard-coded names and dynamic ones. The log level
+ calculation starts with <literal>YLOG_DEFAULT_LEVEL</literal> and adds a bit
+ for each word it meets, unless the word starts with a '-', in which case it
+ clears the bit. If the string <literal>'none'</literal> is found,
+ all bits are cleared. Typically this string comes from the command-line,
+ often identified by <literal>-v</literal>. The
+ <function>yaz_log_mask_str</function> function returns a log level that
+ should be passed to <function>yaz_log_init_level</function> for it to take effect.
+ </para>
+
+ <para>
+ Each module should check which log bits it should use, by calling
+ <function>yaz_log_module_level</function> with a suitable name for the
+ module. Any preceding path and extension are stripped from the name,
+ so it is quite possible to use <literal>__FILE__</literal> for it. If the
+ name has been passed to <function>yaz_log_mask_str</function>, the routine
+ returns a non-zero bitmask, which should then be used in subsequent calls
+ to <function>yaz_log</function>. (It can also be tested, so as to avoid
+ unnecessary calls to <function>yaz_log</function> in time-critical places,
+ or when the log entry would take time to construct.)
+ </para>
+
+ <para>
+ &yaz; uses the following dynamic log levels:
+ <literal>server, session, request, requestdetail</literal> for the server
+ functionality.
+ <literal>zoom</literal> for the ZOOM client API.
+ <literal>ztest</literal> for the simple test server.
+ <literal>malloc, nmem, odr, eventl</literal> for internal debugging of &yaz; itself.
+ Of course, any program using &yaz; is welcome to define as many new ones as
+ it needs.
+ </para>
+
+ <para>
+ By default the log is written to stderr, but this can be changed by a call
+ to <function>yaz_log_init_file</function> or
+ <function>yaz_log_init</function>. If the log is directed to a file, the
+ file size is checked at every write, and if it exceeds the limit given in
+ <function>yaz_log_init_max_size</function>, the log is rotated. The
+ rotation keeps one old version (with a <literal>.1</literal> appended to
+ the name). The size defaults to 1GB. Setting it to zero will disable the
+ rotation feature.
+ </para>
+
+ <screen>
+ A typical yaz-log looks like this
+ 13:23:14-23/11 yaz-ztest(1) [session] Starting session from tcp:127.0.0.1 (pid=30968)
+ 13:23:14-23/11 yaz-ztest(1) [request] Init from 'YAZ' (81) (ver 2.0.28) OK
+ 13:23:17-23/11 yaz-ztest(1) [request] Search Z: @attrset Bib-1 foo OK:7 hits
+ 13:23:22-23/11 yaz-ztest(1) [request] Present: [1] 2+2 OK 2 records returned
+ 13:24:13-23/11 yaz-ztest(1) [request] Close OK
+ </screen>
+
+ <para>
+ The log entries start with a time stamp. This can be omitted by setting the
+ <literal>YLOG_NOTIME</literal> bit in the loglevel. That way, automated tests
+ can produce identical log files that are easy to diff. The
+ format of the time stamp can be set with
+ <function>yaz_log_time_format</function>, which takes a format string just
+ like <function>strftime</function>.
+ </para>
+
+ <para>
+ Next in a log line comes the prefix, often the name of the program. For
+ yaz-based servers, it can also contain the session number. Then
+ come one or more logbits in square brackets, depending on the logging
+ level set by <function>yaz_log_init_level</function> and the loglevel
+ passed to <function>yaz_log</function>. Finally come the format
+ string and additional values passed to <function>yaz_log</function>.
+ </para>
+
+ <para>
+ The log level <literal>YLOG_LOGLVL</literal>, enabled by the string
+ <literal>loglevel</literal>, will log all the log-level affecting
+ operations. This can come in handy if you need to know what other log
+ levels would be useful. Grep the logfile for <literal>[loglevel]</literal>.
+ </para>
+
+ <para>
+ The log system is almost independent of the rest of &yaz;. The only
+ important dependence is on <filename>nmem</filename>, and that only for
+ using the semaphore definition there.
+ </para>
+
+ <para>
+ The dynamic log levels and log rotation were introduced in &yaz; 2.0.28. At
+ the same time, the log bit names were changed from
+ <literal>LOG_something</literal> to <literal>YLOG_something</literal>,
+ to avoid collision with <filename>syslog.h</filename>.
+ </para>
+
+ </sect1>
+
+ <sect1 id="tools.marc"><title>MARC</title>
+
+ <para>
+ YAZ provides a fast utility that decodes MARC records and
+ encodes to a variety of output formats. The MARC records must
+ be encoded in ISO2709.
+ </para>
+ <synopsis><![CDATA[
+ #include <yaz/marcdisp.h>
+
+ /* create handler */
+ yaz_marc_t yaz_marc_create(void);
+ /* destroy */
+ void yaz_marc_destroy(yaz_marc_t mt);
+
+ /* set XML mode YAZ_MARC_LINE, YAZ_MARC_SIMPLEXML, ... */
+ void yaz_marc_xml(yaz_marc_t mt, int xmlmode);
+ #define YAZ_MARC_LINE 0
+ #define YAZ_MARC_SIMPLEXML 1
+ #define YAZ_MARC_OAIMARC 2
+ #define YAZ_MARC_MARCXML 3
+ #define YAZ_MARC_ISO2709 4
+ #define YAZ_MARC_XCHANGE 5
+
+ /* supply iconv handle for character set conversion .. */
+ void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);
+
+ /* set debug level, 0=none, 1=more, 2=even more, .. */
+ void yaz_marc_debug(yaz_marc_t mt, int level);
+
+ /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
+ On success, result in *result with size *rsize. */
+ int yaz_marc_decode_buf (yaz_marc_t mt, const char *buf, int bsize,
+ char **result, int *rsize);
+
+ /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
+ On success, result in WRBUF */
+ int yaz_marc_decode_wrbuf (yaz_marc_t mt, const char *buf,
+ int bsize, WRBUF wrbuf);
+]]>
+ </synopsis>
+ <para>
+ A MARC conversion handle must be created by using
+ <function>yaz_marc_create</function> and destroyed
+ by calling <function>yaz_marc_destroy</function>.
+ </para>
+ <para>
+ All other functions operate on a <literal>yaz_marc_t</literal> handle.
+ The output is specified by a call to <function>yaz_marc_xml</function>.
+ The <literal>xmlmode</literal> must be one of
+ <variablelist>
+ <varlistentry>
+ <term>YAZ_MARC_LINE</term>
+ <listitem>
+ <para>
+ A simple line-by-line format suitable for display but not
+ recommended for further (machine) processing.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_MARCXML</term>
+ <listitem>
+ <para>
+ The resulting record is converted to MARCXML.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_ISO2709</term>
+ <listitem>
+ <para>
+ The resulting record is converted to ISO2709 (MARC).
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ The actual conversion functions are
+ <function>yaz_marc_decode_buf</function> and
+ <function>yaz_marc_decode_wrbuf</function>, which decode and encode
+ a MARC record. The former operates on simple buffers; the latter
+ stores the resulting record in a WRBUF handle (WRBUF is a simple string
+ type).
+ </para>
+ <example>
+ <title>Display of MARC record</title>
+ <para>
+ The following program snippet illustrates how the MARC API may
+ be used to convert a MARC record to the line-by-line format:
+ <programlisting><![CDATA[
+ void print_marc(const char *marc_buf, int marc_buf_size)
+ {
+     char *result;      /* for result buf */
+     int result_len;    /* for size of result */
+     yaz_marc_t mt = yaz_marc_create();
+     yaz_marc_xml(mt, YAZ_MARC_LINE);
+     /* decode returns >0 on success; result is only valid then */
+     if (yaz_marc_decode_buf(mt, marc_buf, marc_buf_size,
+                             &result, &result_len) > 0)
+         fwrite(result, result_len, 1, stdout);
+     yaz_marc_destroy(mt);  /* note that result is now freed... */
+ }
+]]>
+ </programlisting>
+ </para>
+ </example>
+ </sect1>
+
</chapter>
<!-- Keep this comment at the end of the file