+ <sect1 id="tools.retrieval">
+ <title>Retrieval Facility</title>
+ <para>
+ YAZ version 2.1.20 or later includes a Retrieval facility tool
+ which allows an SRU/Z39.50 server to describe itself and perform record
+ conversions. The idea is the following:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ An SRU/Z39.50 client sends a retrieval request which includes
+ a combination of the following parameters: syntax (format),
+ schema (or element set name).
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The retrieval facility is invoked with these parameters in a
+ server/proxy. The retrieval facility matches the parameters against a set of
+ "supported" retrieval types.
+ If there is no match, the retrieval facility signals an error
+ (syntax and/or schema not supported).
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ For a successful match, the backend is invoked with the same
+ or altered retrieval parameters (syntax, schema). If
+ a record is received from the backend, it is converted to the
+ frontend name / syntax.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The resulting record is sent back to the client and tagged with
+ the frontend syntax / schema.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+ </para>
+ <para>
+ The Retrieval facility is driven by an XML configuration. The
+ configuration is neither Z39.50 ZeeRex nor SRU ZeeRex, but it
+ should be easy to generate both of them from the XML configuration.
+ (Unfortunately, the two versions
+ of ZeeRex differ substantially in this regard.)
+ </para>
+ <sect2 id="tools.retrieval.format">
+ <title>Retrieval XML format</title>
+ <para>
+ All elements should be covered by the namespace
+ <literal>http://indexdata.com/yaz</literal>.
+ The root element node must be <literal>retrievalinfo</literal>.
+ </para>
+ <para>
+ The <literal>retrievalinfo</literal> element must include one or
+ more <literal>retrieval</literal> elements. Each
+ <literal>retrieval</literal> defines a specific combination of
+ syntax, name and identifier supported by this retrieval service.
+ </para>
+ <para>
+ The <literal>retrieval</literal> element may include any of the
+ following attributes:
+ <variablelist>
+ <varlistentry><term><literal>syntax</literal> (REQUIRED)</term>
+ <listitem>
+ <para>
+ Defines the record syntax. Possible values are any
+ of the names defined in the YAZ OID database or a raw
+ OID in dotted form (n.n ... n).
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term><literal>name</literal> (OPTIONAL)</term>
+ <listitem>
+ <para>
+ Defines the name of the retrieval format. This can be
+ any string. For SRU, the value is equivalent to the schema (short-hand);
+ for Z39.50 it is equivalent to the simple element set name.
+ For YAZ 3.0.24 and later this name may be specified as a glob
+ expression with operators
+ <literal>*</literal> and <literal>?</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term><literal>identifier</literal> (OPTIONAL)</term>
+ <listitem>
+ <para>
+ Defines the URI schema name of the retrieval format. This can be
+ any string. For SRU, the value is equivalent to the URI schema.
+ For Z39.50, there is no equivalent.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
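+ <para>
+ As an illustration of these attributes, the following fragment is a
+ minimal sketch rather than a recommended configuration. The element set
+ names, the glob pattern <literal>dc*</literal> and the use of the
+ MARC21 (USMARC) OID are assumptions chosen for the example.
+ </para>
+ <programlisting><![CDATA[
+ <retrievalinfo xmlns="http://indexdata.com/yaz">
+   <!-- record syntax given by its name in the OID database -->
+   <retrieval syntax="usmarc" name="F"/>
+   <!-- the same record syntax given as a raw OID -->
+   <retrieval syntax="1.2.840.10003.5.10" name="B"/>
+   <!-- glob expression (YAZ 3.0.24 and later): any name starting with dc -->
+   <retrieval syntax="xml" name="dc*"/>
+   <!-- short-hand name plus the URI schema name used by SRU -->
+   <retrieval syntax="xml" name="marcxml"
+              identifier="info:srw/schema/1/marcxml-v1.1"/>
+ </retrievalinfo>
+]]>
+ </programlisting>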
+ <para>
+ The <literal>retrieval</literal> may include one
+ <literal>backend</literal> element. If a <literal>backend</literal>
+ element is given, it specifies how the records are retrieved by
+ some backend and how the records are converted from the backend to
+ the "frontend".
+ </para>
+ <para>
+ The attributes <literal>name</literal> and <literal>syntax</literal>
+ may be specified for the <literal>backend</literal> element. The
+ semantics of these attributes are equivalent to those for the
+ <literal>retrieval</literal> element. However, these values are passed to
+ the "backend".
+ </para>
+ <para>
+ The <literal>backend</literal> element may include one or more
+ conversion instructions (as child elements). The supported
+ conversions are:
+ <variablelist>
+ <varlistentry><term><literal>marc</literal></term>
+ <listitem>
+ <para>
+ The <literal>marc</literal> element specifies a conversion
+ to and from ISO2709 encoded MARC and
+ <ulink url="&url.marcxml;">&acro.marcxml;</ulink>/MarcXchange.
+ The following attributes may be specified:
+
+ <variablelist>
+ <varlistentry><term><literal>inputformat</literal> (REQUIRED)</term>
+ <listitem>
+ <para>
+ Format of input. Supported values are
+ <literal>marc</literal> (for ISO2709), <literal>xml</literal>
+ (MARCXML/MarcXchange) and <literal>json</literal>
+ (<ulink url="&url.marc_in_json;">MARC-in-JSON</ulink>).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term><literal>outputformat</literal> (REQUIRED)</term>
+ <listitem>
+ <para>
+ Format of output. Supported values are
+ <literal>line</literal> (MARC line format),
+ <literal>marcxml</literal> (for MARCXML),
+ <literal>marc</literal> (ISO2709),
+ <literal>marcxchange</literal> (for MarcXchange),
+ or <literal>json</literal>
+ (<ulink url="&url.marc_in_json;">MARC-in-JSON</ulink>).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term><literal>inputcharset</literal> (OPTIONAL)</term>
+ <listitem>
+ <para>
+ Encoding of input. For XML input formats, this need not
+ be given, but for ISO2709 based inputformats, this should
+ be set to the encoding used. For MARC21 records, a common
+ inputcharset value would be <literal>marc-8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term><literal>outputcharset</literal> (OPTIONAL)</term>
+ <listitem>
+ <para>
+ Encoding of output. If outputformat is XML based, it is
+ strongly recommended to use <literal>utf-8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term><literal>xslt</literal></term>
+ <listitem>
+ <para>
+ The <literal>xslt</literal> element specifies a conversion
+ via &acro.xslt;. The following attributes may be specified:
+
+ <variablelist>
+ <varlistentry><term><literal>stylesheet</literal> (REQUIRED)</term>
+ <listitem>
+ <para>
+ Stylesheet file.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term><literal>solrmarc</literal></term>
+ <listitem>
+ <para>
+ The <literal>solrmarc</literal> element decodes solrmarc records.
+ It assumes that the input is pure solrmarc text (no escaping)
+ and converts every sequence of the form #XX; to a single
+ character with the hexadecimal value given by XX. The output,
+ presumably, is a valid ISO2709 buffer.
+ </para>
+ <para>
+ This conversion is available in YAZ 5.0.21 and later.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
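+ <para>
+ The attributes of the <literal>marc</literal> conversion can be combined
+ as in the following sketch, which offers plain-text records in MARC line
+ format. The choice of <literal>sutrs</literal> as frontend syntax and of
+ element set name <literal>F</literal> for the backend are assumptions made
+ for the example, not requirements of the facility.
+ </para>
+ <programlisting><![CDATA[
+ <retrievalinfo xmlns="http://indexdata.com/yaz">
+   <retrieval syntax="sutrs">
+     <!-- fetch full ISO2709 MARC21 records from the backend -->
+     <backend syntax="usmarc" name="F">
+       <!-- decode MARC-8 input and emit MARC line format in UTF-8 -->
+       <marc inputformat="marc" outputformat="line"
+             inputcharset="marc-8" outputcharset="utf-8"/>
+     </backend>
+   </retrieval>
+ </retrievalinfo>
+]]>
+ </programlisting>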
+ </sect2>
+ <sect2 id="tools.retrieval.examples">
+ <title>Retrieval Facility Examples</title>
+ <example id="tools.retrieval.marc21">
+ <title>MARC21 backend</title>
+ <para>
+ A typical way to use the retrieval facility is to enable XML
+ for servers that only support ISO2709 encoded MARC21 records.
+ </para>
+ <programlisting><![CDATA[
+ <retrievalinfo>
+ <retrieval syntax="usmarc" name="F"/>
+ <retrieval syntax="usmarc" name="B"/>
+ <retrieval syntax="xml" name="marcxml"
+ identifier="info:srw/schema/1/marcxml-v1.1">
+ <backend syntax="usmarc" name="F">
+ <marc inputformat="marc" outputformat="marcxml"
+ inputcharset="marc-8"/>
+ </backend>
+ </retrieval>
+ <retrieval syntax="xml" name="dc">
+ <backend syntax="usmarc" name="F">
+ <marc inputformat="marc" outputformat="marcxml"
+ inputcharset="marc-8"/>
+ <xslt stylesheet="MARC21slim2DC.xsl"/>
+ </backend>
+ </retrieval>
+ </retrievalinfo>
+]]>
+ </programlisting>
+ <para>
+ This means that our frontend supports:
+ <itemizedlist>
+ <listitem>
+ <para>
+ MARC21 F(ull) records.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ MARC21 B(rief) records.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ MARCXML records.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Dublin core records.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </example>
+
+ <example id="tools.retrieval.marcxml">
+ <title>MARCXML backend</title>
+ <para>
+ SRW/SRU and Solr backends return records in XML.
+ If they return MARCXML or MarcXchange, the retrieval module
+ can convert those into ISO2709 formats, most commonly USMARC
+ (AKA MARC21).
+ In this example, the backend returns MARCXML for schema="marcxml".
+ </para>
+ <programlisting><![CDATA[
+ <retrievalinfo>
+ <retrieval syntax="usmarc">
+ <backend syntax="xml" name="marcxml">
+ <marc inputformat="xml" outputformat="marc"
+ outputcharset="marc-8"/>
+ </backend>
+ </retrieval>
+ <retrieval syntax="xml" name="marcxml"
+ identifier="info:srw/schema/1/marcxml-v1.1"/>
+ <retrieval syntax="xml" name="dc">
+ <backend syntax="xml" name="marcxml">
+ <xslt stylesheet="MARC21slim2DC.xsl"/>
+ </backend>
+ </retrieval>
+ </retrievalinfo>
+]]>
+ </programlisting>
+ <para>
+ This means that our frontend supports:
+ <itemizedlist>
+ <listitem>
+ <para>
+ MARC21 records (any element set name) in MARC-8 encoding.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ MARCXML records for element-set=marcxml.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Dublin core records for element-set=dc.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </example>
+
+ </sect2>
+ <sect2 id="tools.retrieval.api">
+ <title>API</title>
+ <para>
+ It should be easy to use the retrieval systems from applications. Refer
+ to the headers
+ <filename>yaz/retrieval.h</filename> and
+ <filename>yaz/record_conv.h</filename>.
+ </para>
+ </sect2>
+ </sect1>
+ <sect1 id="sorting"><title>Sorting</title>
+ <para>
+ This chapter describes sorting and how it is supported in YAZ.
+ Sorting applies to a result-set.
+ The <ulink url="http://www.loc.gov/z3950/agency/markup/05.html#3.2.7">
+ Z39.50 sorting facility
+ </ulink>
+ takes one or more input result-sets
+ and one result-set as output. The simplest case is that
+ the input-set is the same as the output-set.
+ </para>
+ <para>
+ Z39.50 sorting has a separate APDU (service) and is thus performed
+ after a search, as a second phase.
+ </para>
+ <para>
+ In SRU/Solr, however, the model is different. Here, sorting is specified
+ during the search operation. Note, however, that SRU might
+ perform the sort as a separate search, by referring to an existing result-set
+ in the query (result-set reference).
+ </para>
+ <sect2><title>Using the Z39.50 sort service</title>
+ <para>
+ yaz-client and the ZOOM API support the Z39.50 sort facility. In both
+ cases the sort sequence, or sort criteria, is given in a string notation.
+ This is a one-line notation that is suitable for being manually
+ entered or generated, and that allows for easy logging.
+ For the ZOOM API, the sort is specified in the call to the ZOOM_query_sortby
+ function. For yaz-client the sort is specified and performed using
+ the sort and sort+ commands. For a description of the sort criteria notation
+ refer to the <link linkend="sortspec">sort command</link> in the
+ yaz-client manual.
+ </para>
+ <para>
+ The ZOOM API might choose one of several sort strategies for
+ sorting. Refer to <xref linkend="zoom-sort-strategy"/>.
+ </para>
+ </sect2>
+ <sect2><title>Type-7 sort</title>
+ <para>
+ Type-7 sort is an extension to the Bib-1 based RPN query where the
+ sort specification is embedded as an Attribute-Plus-Term.
+ </para>
+ <para>
+ The objective of introducing Type-7 sorting is to allow
+ a client to perform sorting even if it does not implement/support
+ Z39.50 sort. Virtually all Z39.50 client software supports
+ RPN queries. It may also improve performance because the sort
+ criteria are specified along with the search query.
+ </para>
+ <para>
+ The sort is triggered by the presence of type 7, and the value of type 7
+ specifies the
+ <ulink url="http://www.loc.gov/z3950/agency/asn1.html#SortKeySpec">
+ sortRelation
+ </ulink>.
+ The value for type 7 is 1 for ascending and 2 for descending.
+ For the
+ <ulink url="http://www.loc.gov/z3950/agency/asn1.html#SortElement">
+ sortElement
+ </ulink>
+ only the generic part is handled. If the generic sortKey is of type
+ sortField, then attribute type 1 is present and the value is
+ sortField (InternationalString). If the generic sortKey is of type
+ sortAttributes, then the attributes in the list are used. A generic sortKey
+ of type elementSpec is not supported.
+ </para>
+ <para>
+ The term in the sorting Attribute-Plus-Term combo should hold
+ an integer. The value is 0 for the primary sorting criterion, 1 for the
+ secondary criterion, etc.
+ </para>
+ </sect2>
+ </sect1>
+ <sect1 id="facets"><title>Facets</title>
+ <para>
+ YAZ supports facets in the Solr, SRU 2.0 and Z39.50 protocols.
+ </para>
+ <para>
+ Like Type-1/RPN, YAZ supports a string notation for specifying
+ facets. For the API this is performed by
+ <function>yaz_pqf_parse_facet_list</function>.
+ </para>
+ <para>
+ For ZOOM C the facets are given by the option "facets".
+ For yaz-client the notation is used by the facets command.
+ </para>
+ <para>
+ The grammar of this specification is as follows:
+ <literallayout>
+ facet-spec ::= facet-list
+
+ facet-list ::= facet-list ',' attr-spec | attr-spec
+
+ attr-spec ::= attr-spec '@attr' string | '@attr' string
+
+ </literallayout>
+ The notation is inspired by PQF. The string following '@attr'
+ may not include blanks and is of the form
+ <replaceable>type</replaceable><literal>=</literal><replaceable>value</replaceable>,
+ where <replaceable>type</replaceable> is an integer and
+ <replaceable>value</replaceable> is a string or an integer.
+ </para>
+ <para>
+ The Facets specification is not Bib-1. The following types apply:
+ </para>
+ <table id="facet.attributes">
+ <title>Facet attributes</title>
+ <tgroup cols="2">
+ <colspec colwidth="2*" colname="type"></colspec>
+ <colspec colwidth="9*" colname="description"></colspec>
+ <thead>
+ <row>
+ <entry>Type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>1</entry>
+ <entry>
+ Field-name. This is often a string, e.g. "Author", "Year", etc.
+ </entry>
+ </row>
+
+ <row>
+ <entry>2</entry>
+ <entry>
+ Sort order. Value should be an integer.
+ Value 0: count descending (frequency). Value 1: alpha ascending.
+ </entry>
+ </row>
+
+ <row>
+ <entry>3</entry>
+ <entry>
+ Number of terms requested.
+ </entry>
+ </row>
+
+ <row>
+ <entry>4</entry>
+ <entry>
+ Start offset.
+ </entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
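+ <para>
+ As an illustration of the notation and of the types in
+ <xref linkend="facet.attributes"/>, the following specification is a
+ sketch. The field names Author and Year are assumptions; the names
+ actually available depend on the target. It asks for up to 10 Author
+ terms sorted by descending count, and for Year terms in ascending
+ alphabetical order:
+ </para>
+ <programlisting><![CDATA[
+ @attr 1=Author @attr 2=0 @attr 3=10, @attr 1=Year @attr 2=1
+]]>
+ </programlisting>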
+ </sect1>