<?xml version="1.0" standalone="no"?>
-<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
- "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.4//EN"
+ "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"
[
<!ENTITY % local SYSTEM "local.ent">
%local;
<refentryinfo>
<productname>Pazpar2</productname>
<productnumber>&version;</productnumber>
+ <info><orgname>Index Data</orgname></info>
</refentryinfo>
+
<refmeta>
<refentrytitle>Pazpar2 conf</refentrytitle>
<manvolnum>5</manvolnum>
+ <refmiscinfo class="manual">File formats and conventions</refmiscinfo>
</refmeta>
<refnamediv>
<refsect1><title>FORMAT</title>
<para>
- The configuration file is XML-structured. It must be valid XML. All
+ The configuration file is XML-structured. It must be well-formed XML. All
elements specific to Pazpar2 should belong to the namespace
<literal>http://www.indexdata.com/pazpar2/1.0</literal>
(this is assumed in the
- following examples). The root element is named <literal>pazpar2</literal>.
+ following examples). The root element is named "<literal>pazpar2</literal>".
Under the root element are a number of elements which group categories of
information. The categories are described below.
</para>
+ <refsect2 id="config-threads"><title>threads</title>
+ <para>
+ This section is optional and is supported for Pazpar2 version 1.3.1 and
+ later. It is identified by the element "<literal>threads</literal>", which
+ may include one attribute, "<literal>number</literal>", that specifies
+ the number of worker-threads that the Pazpar2 instance is to use.
+ A value of 0 (zero) disables worker-threads (all work is carried out
+ in the main thread).
+ </para>
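+ <para>
+ As an illustration only (the value 4 is an arbitrary assumption,
+ not a recommendation), a configuration that asks for four
+ worker-threads could look like this:
+ <screen><![CDATA[
+ <pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
+   <!-- number="0" would carry out all work in the main thread -->
+   <threads number="4"/>
+   <!-- server elements follow here -->
+ </pazpar2>
+ ]]></screen>
+ </para>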
+ </refsect2>
<refsect2 id="config-server"><title>server</title>
<para>
- This section governs overall behavior of the server. The data
+ This section governs overall behavior of a server endpoint. It is identified
+ by the element "server" which takes an optional attribute, "id", which
+ identifies this particular Pazpar2 server. Any string value for "id"
+ may be given.
+ </para>
+ <para>The data
elements are described below. From Pazpar2 version 1.2 this is
a repeatable element.
</para>
</para>
</listitem>
</varlistentry>
-
- <varlistentry>
- <term>relevance</term>
- <listitem>
- <para>
- Specifies ICU tokenization and transformation rules
- for tokens that are used in Pazpar2's relevance ranking. The 'id'
- attribute is currently not used, and the 'locale'
- attribute must be set to one of the locale strings
- defined in ICU. The child elements listed below can be
- in any order, except the 'index' element which logically
- belongs to the end of the list. The stated tokenization,
- transformation and charmapping instructions are performed
- in order from top to bottom.
- </para>
- <variablelist> <!-- Level 2 -->
- <varlistentry><term>casemap</term>
- <listitem>
- <para>
- The attribute 'rule' defines the direction of the
- per-character casemapping, allowed values are "l"
- (lower), "u" (upper), "t" (title).
- </para>
- </listitem>
- </varlistentry>
- <varlistentry><term>transform</term>
- <listitem>
- <para>
- Normalization and transformation of tokens follows
- the rules defined in the 'rule' attribute. For
- possible values we refer to the extensive ICU
- documentation found at the
- <ulink url="&url.icu.transform;">ICU
- transformation</ulink> home page. Set filtering
- principles are explained at the
- <ulink url="&url.icu.unicode.set;">ICU set and
- filtering</ulink> page.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry><term>tokenize</term>
- <listitem>
- <para>
- Tokenization is the only rule in the ICU chain
- which splits one token into multiple tokens. The
- 'rule' attribute may have the following values:
- "s" (sentence), "l" (line-break), "w" (word), and
- "c" (character), the later probably not being
- very useful in a pruning Pazpar2 installation.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
- </listitem>
- </varlistentry>
<varlistentry>
- <term>sort</term>
+ <term>relevance / sort / mergekey / facet</term>
<listitem>
<para>
- Specifies ICU tokenization and transformation rules
- for tokens that are used in Pazpar2's sorting. The contents
- is similar to that of <literal>relevance</literal>.
+ Specifies character set normalization for relevancy, sorting,
+ mergekey and facets for the server. These definitions serve as
+ defaults for services that do not specify their own. For the meaning
+ of these settings refer to the "relevance" element inside service.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term>mergekey</term>
+ <term>settings</term>
<listitem>
<para>
- Specifies ICU tokenization and transformation rules
- for tokens that are used in Pazpar2's mergekey. The contents
- is similar to that of <literal>relevance</literal>.
+ Specifies target settings for the server. These settings serve
+ as defaults for all services that do not specify their own.
+ The settings element requires one attribute, 'src', which specifies
+ a settings file or a directory. If a directory is given, all
+ files with suffix <filename>.xml</filename> are read from this
+ directory. Refer to
+ <xref linkend="target_settings"/> for more information.
</para>
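+ <para>
+ A minimal sketch, assuming a hypothetical directory named
+ <filename>settings</filename>; every file in it with suffix
+ <filename>.xml</filename> would be read, and the result would act
+ as the default for the service, which has no settings of its own:
+ <screen><![CDATA[
+ <server>
+   <listen port="9004"/>
+   <!-- "settings" is a placeholder directory name -->
+   <settings src="settings"/>
+   <service>
+     <metadata name="title" brief="yes"/>
+   </service>
+ </server>
+ ]]></screen>
+ </para>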
</listitem>
</varlistentry>
One of these elements is required for every data element in
the internal representation of the record (see
<xref linkend="data_model"/>. It governs
- subsequent processing as pertains to sorting, relevance
- ranking, merging, and display of data elements. It supports
- the following attributes:
+ subsequent processing as pertains to sorting, relevance
+ ranking, merging, and display of data elements. It supports
+ the following attributes:
</para>
<variablelist> <!-- level 3 -->
longest element (strlen), 'range' (calculate a range
of values across all matching records), 'all' (include
all elements), or 'no' (don't merge; this is the
- default);
+ default);
</para>
</listitem>
</varlistentry>
<varlistentry><term>mergekey</term>
<listitem>
<para>
- If set to <literal>yes</literal>, the value of this
- metadata element is appended to the resulting mergekey.
- By default metadata is not part of a mergekey.
+ If set to '<literal>required</literal>', the value of this
+ metadata element is appended to the resulting mergekey if
+ the metadata is present in a record instance.
+ If the metadata element is not present, a unique mergekey
+ will be generated instead.
+ </para>
+ <para>
+ If set to '<literal>optional</literal>', the value of this
+ metadata element is appended to the resulting mergekey if
+ the metadata is present in a record instance. If the metadata
+ is not present, it will be empty.
+ </para>
+ <para>
+ If set to '<literal>no</literal>' or the mergekey attribute is
+ omitted, the metadata will not be used in the creation of a
+ mergekey.
</para>
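+ <para>
+ As a hedged illustration (the choice of metadata names is an
+ assumption made for this example, not a recommendation), the three
+ values could be combined as follows:
+ <screen><![CDATA[
+ <!-- a record without a title gets a unique mergekey -->
+ <metadata name="title" brief="yes" merge="longest" mergekey="required"/>
+ <!-- a missing author contributes an empty value -->
+ <metadata name="author" brief="yes" merge="longest" mergekey="optional"/>
+ <!-- never part of the mergekey (same as omitting the attribute) -->
+ <metadata name="url" merge="unique" mergekey="no"/>
+ ]]></screen>
+ </para>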
</listitem>
</varlistentry>
-
<varlistentry><term>setting</term>
<listitem>
<para>
- This attribute allows you to make use of static database
- settings in the processing of records. Three possible values
- are allowed. 'no' is the default and doesn't do anything.
- 'postproc' copies the value of a setting with the same name
- into the output of the normalization stylesheet(s). 'parameter'
- makes the value of a setting with the same name available
- as a parameter to the normalization stylesheet, so you
- can further process the value inside of the stylesheet, or use
- the value to decide how to deal with other data values.
+ This attribute allows you to make use of static database
+ settings in the processing of records. Three possible values
+ are allowed. 'no' is the default and doesn't do anything.
+ 'postproc' copies the value of a setting with the same name
+ into the output of the normalization stylesheet(s). 'parameter'
+ makes the value of a setting with the same name available
+ as a parameter to the normalization stylesheet, so you
+ can further process the value inside of the stylesheet, or use
+ the value to decide how to deal with other data values.
</para>
<para>
+ The purpose of using settings in this way can either be to
+ control the behavior of normalization stylesheet in a database-
+ dependent way, or to easily make database-dependent values
+ available to display-logic in your user interface, without having
+ to implement complicated interactions between the user interface
+ and your configuration system.
</para>
- The purpose of using settings in this way can either be to
- control the behavior of normalization stylesheet in a database-
- dependent way, or to easily make database-dependent values
- available to display-logic in your user interface, without having
- to implement complicated interactions between the user interface
- and your configuration system.
</listitem>
</varlistentry>
+
</variablelist> <!-- attributes to metadata -->
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>relevance</term>
+ <listitem>
+ <para>
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's relevance ranking.
+ The 'id' attribute is currently not used, and the 'locale'
+ attribute must be set to one of the locale strings
+ defined in ICU. The child elements listed below can be
+ in any order, except the 'index' element which logically
+ belongs to the end of the list. The stated tokenization,
+ transformation and charmapping instructions are performed
+ in order from top to bottom.
+ </para>
+ <variablelist> <!-- Level 2 -->
+ <varlistentry><term>casemap</term>
+ <listitem>
+ <para>
+ The attribute 'rule' defines the direction of the
+ per-character casemapping, allowed values are "l"
+ (lower), "u" (upper), "t" (title).
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term>transform</term>
+ <listitem>
+ <para>
+ Normalization and transformation of tokens follows
+ the rules defined in the 'rule' attribute. For
+ possible values we refer to the extensive ICU
+ documentation found at the
+ <ulink url="&url.icu.transform;">ICU
+ transformation</ulink> home page. Set filtering
+ principles are explained at the
+ <ulink url="&url.icu.unicode.set;">ICU set and
+ filtering</ulink> page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term>tokenize</term>
+ <listitem>
+ <para>
+ Tokenization is the only rule in the ICU chain
+ which splits one token into multiple tokens. The
+ 'rule' attribute may have the following values:
+ "s" (sentence), "l" (line-break), "w" (word), and
+ "c" (character), the latter probably not being
+ very useful in a pruning Pazpar2 installation.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <para>
+ From Pazpar2 version 1.1 the ICU wrapper from YAZ is used.
+ Refer to the <ulink url="&url.yaz.yaz-icu;">yaz-icu</ulink>
+ utility for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>sort</term>
+ <listitem>
+ <para>
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's sorting. The contents
+ is similar to that of <literal>relevance</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>mergekey</term>
+ <listitem>
+ <para>
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's mergekey. The contents
+ is similar to that of <literal>relevance</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>facet</term>
+ <listitem>
+ <para>
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's facets. The contents
+ is similar to that of <literal>relevance</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>settings</term>
+ <listitem>
+ <para>
+ Specifies target settings for this service. Refer to
+ <xref linkend="target_settings"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>timeout</term>
+ <listitem>
+ <para>
+ Specifies timeout parameters for this service.
+ The <literal>timeout</literal>
+ element supports the following attributes:
+ <literal>session</literal>, <literal>z3950_operation</literal>,
+ <literal>z3950_session</literal>, which specify the
+ 'session timeout', 'Z39.50 operation timeout' and
+ 'Z39.50 session timeout' respectively. The Z39.50 operation
+ timeout is the time Pazpar2 will wait for an active Z39.50/SRU
+ operation before it gives up (times out). The Z39.50 session
+ timeout is the time Pazpar2 will keep an idle session
+ (no operation in progress) alive.
+ </para>
+ <para>
+ The following is recommended but not required:
+ z3950_operation (30) &lt; session (60) &lt; z3950_session (180).
+ The default values are given in parentheses.
+ </para>
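+ <para>
+ For illustration, a <literal>timeout</literal> element that spells
+ out the default values given above, and therefore follows the
+ recommended ordering, might read:
+ <screen><![CDATA[
+ <!-- z3950_operation < session < z3950_session -->
+ <timeout session="60" z3950_operation="30" z3950_session="180"/>
+ ]]></screen>
+ </para>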
+ </listitem>
+ </varlistentry>
+
</variablelist> <!-- Data elements in service directive -->
</listitem>
</varlistentry>
+
</variablelist> <!-- Data elements in server directive -->
</refsect2>
<refsect1><title>EXAMPLE</title>
<para>Below is a working example configuration:
- <screen><![CDATA[
-<?xml version="1.0" encoding="UTF-8"?>
-<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
-
-<server>
- <listen port="9004"/>
- <proxy host="us1.indexdata.com" myurl="us1.indexdata.com"/>
-
- <!-- optional ICU ranking configuration example -->
- <!--
- <icu_chain id="el:word" locale="el">
- <normalize rule="[:Control:] Any-Remove"/>
- <tokenize rule="l"/>
- <normalize rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
- <casemap rule="l"/>
- <index/>
- </icu_chain>
- -->
-
- <service>
- <metadata name="title" brief="yes" sortkey="skiparticle" merge="longest" rank="6"/>
- <metadata name="isbn" merge="unique"/>
- <metadata name="date" brief="yes" sortkey="numeric" type="year" merge="range"
- termlist="yes"/>
- <metadata name="author" brief="yes" termlist="yes" merge="longest" rank="2"/>
- <metadata name="subject" merge="unique" termlist="yes" rank="3"/>
- <metadata name="url" merge="unique"/>
- </service>
-</server>
-
-</pazpar2>
-]]></screen>
+ <screen><![CDATA[
+ <?xml version="1.0" encoding="UTF-8"?>
+ <pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
+
+ <threads number="10"/>
+ <server>
+ <listen port="9004"/>
+ <service>
+ <metadata name="title" brief="yes" sortkey="skiparticle"
+ merge="longest" rank="6"/>
+ <metadata name="isbn" merge="unique"/>
+ <metadata name="date" brief="yes" sortkey="numeric"
+ type="year" merge="range" termlist="yes"/>
+ <metadata name="author" brief="yes" termlist="yes"
+ merge="longest" rank="2"/>
+ <metadata name="subject" merge="unique" termlist="yes" rank="3"/>
+ <metadata name="url" merge="unique"/>
+ <relevance>
+ <icu_chain id="relevance" locale="el">
+ <transform rule="[:Control:] Any-Remove"/>
+ <tokenize rule="l"/>
+ <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
+ <casemap rule="l"/>
+ </icu_chain>
+ </relevance>
+ <settings src="mysettings"/>
+ <timeout session="60"/>
+ </service>
+ </server>
+ </pazpar2>
+ ]]></screen>
</para>
</refsect1>
<set name="pz:cclmap:isbn" value="u=7"/>
<set name="pz:cclmap:issn" value="u=8"/>
<set name="pz:cclmap:date" value="u=30 r=r"/>
+
+ <set name="pz:limitmap:title" value="rpn:@attr 1=4 @attr 6=3"/>
+ <set name="pz:limitmap:date" value="ccl:date"/>
<!-- Retrieval settings -->
</para>
</listitem>
</varlistentry>
- <varlistentry>
+ <varlistentry id="requestsyntax">
<term>pz:requestsyntax</term>
<listitem>
<para>
<term>pz:nativesyntax</term>
<listitem>
<para>
- The representation (syntax) of the retrieval records. Currently
- recognized values are iso2709 and xml.
+ Specifies how Pazpar2 should map retrieved records to XML. Currently
+ supported values are <literal>xml</literal>,
+ <literal>iso2709</literal> and <literal>txml</literal>.
</para>
<para>
- For iso2709, can also specify a native character set, e.g. "iso2709;latin-1".
- If no character set is provided, MARC-8 is assumed.
+ The value <literal>iso2709</literal> makes Pazpar2 convert retrieved
+ MARC records to MARCXML. In order to convert to XML, the exact
+ character set of the MARC records must be known (if not, the resulting
+ XML is probably not well-formed). The character set may be
+ specified by adding:
+ <literal>;charset=</literal><replaceable>charset</replaceable> to
+ <literal>iso2709</literal>. If omitted, a charset of
+ MARC-8 is assumed. This is correct for most MARC21/USMARC records.
</para>
<para>
- If pz:nativesyntax is not specified, pazpar2 will attempt to determine
- the value based on the response from the server.
+ The value <literal>txml</literal> is like <literal>iso2709</literal>
+ except that records are converted to TurboMARC instead of MARCXML.
+ </para>
+ <para>
+ The value <literal>xml</literal> is used if Pazpar2 retrieves
+ records that are already XML (no conversion takes place).
</para>
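+ <para>
+ A sketch of the three forms, using the <literal>set</literal>
+ syntax shown in the settings examples. The lines are alternatives,
+ and the charset name <literal>iso-8859-1</literal> is an assumption
+ chosen for illustration; use whatever name matches the target's
+ records:
+ <screen><![CDATA[
+ <!-- MARC records in an assumed Latin-1 encoding, converted to MARCXML -->
+ <set name="pz:nativesyntax" value="iso2709;charset=iso-8859-1"/>
+ <!-- MARC records in MARC-8 (the default), converted to TurboMARC -->
+ <set name="pz:nativesyntax" value="txml"/>
+ <!-- records that are already XML -->
+ <set name="pz:nativesyntax" value="xml"/>
+ ]]></screen>
+ </para>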
</listitem>
</varlistentry>
</varlistentry>
<varlistentry>
+ <term>pz:negotiation_charset</term>
+ <listitem>
+ <para>
+ Sets character set for Z39.50 negotiation. Most targets do not support
+ this, and some will even close connection if set (crash on server
+ side or similar). If set, you probably want to set it to
+ <literal>UTF-8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term>pz:xslt</term>
<listitem>
<para>
- Provides the path of an XSLT stylesheet which will be used to
- map incoming records to the internal representation.
+ A comma-separated list of files that specify
+ how to convert incoming records to the internal representation.
+ </para>
+ <para>
+ The suffix of each file specifies the kind of transformation.
+ Suffix "<literal>.xsl</literal>" makes an XSL transform. Suffix
+ "<literal>.mmap</literal>" will use the MMAP transform (described below).
+ </para>
+ <para>
+ The special value "<literal>auto</literal>" will use a file
+ which is the <link linkend="requestsyntax">pz:requestsyntax's</link>
+ value followed by
+ <literal>'.xsl'</literal>.
+ </para>
+ <para>
+ When mapping MARC records, XSLT can be bypassed for increased
+ performance with the alternate "MARC map" format. Provide the
+ path of a file with extension ".mmap" containing on each line:
+ <programlisting>
+ &lt;field&gt; &lt;subfield&gt; &lt;metadata element&gt;</programlisting>
+ For example:
+ <programlisting>
+ 245 a title
+ 500 $ description
+ 773 * citation</programlisting>
+ To map the field value specify a subfield of '$'. To store a
+ concatenation of all subfields, specify a subfield of '*'.
</para>
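+ <para>
+ The alternatives above might be written as follows. All file names
+ are hypothetical placeholders, and only one of these lines would be
+ used for a given target:
+ <screen><![CDATA[
+ <!-- two XSL transforms, given as a comma separated list -->
+ <set name="pz:xslt" value="first.xsl,second.xsl"/>
+ <!-- a MARC map file instead of XSLT -->
+ <set name="pz:xslt" value="mymap.mmap"/>
+ <!-- derive the file name from pz:requestsyntax plus ".xsl" -->
+ <set name="pz:xslt" value="auto"/>
+ ]]></screen>
+ </para>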
</listitem>
</varlistentry>
</para>
</listitem>
</varlistentry>
-
+
<varlistentry>
<term>pz:apdulog</term>
<listitem>
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>pz:sru</term>
+ <listitem>
+ <para>
+ This setting enables
+ <ulink url="&url.sru;">SRU</ulink>/<ulink url="&url.solr;">SOLR</ulink>
+ support.
+ It has four possible settings.
+ 'get' enables SRU access through GET requests. 'post' enables SRU/POST
+ support, which is less commonly supported but useful if very large requests are
+ to be submitted. 'srw' enables the SRW (SRU over SOAP) variation of
+ the protocol.
+ </para>
+ <para>
+ A value of 'solr' enables SOLR client support. This is supported
+ for Pazpar2 version 1.5.0 and later.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>pz:sru_version</term>
+ <listitem>
+ <para>
+ This allows the SRU version to be specified. If unset, Pazpar2
+ will use the default of YAZ (currently 1.2). Should be set
+ to 1.1 or 1.2. For SOLR, the current supported/tested version is 1.4.
+ </para>
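+ <para>
+ As a small sketch, a target reached through SRU GET requests with
+ an explicitly chosen protocol version could be described by this
+ pair of settings (the values are picked for illustration only):
+ <screen><![CDATA[
+ <set name="pz:sru" value="get"/>
+ <set name="pz:sru_version" value="1.2"/>
+ ]]></screen>
+ </para>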
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>pz:pqf_prefix</term>
+ <listitem>
+ <para>
+ Allows you to specify an arbitrary PQF query language substring.
+ The provided string is prefixed to the user's query after it has been
+ normalized to PQF internally in Pazpar2.
+ This allows you to attach complex 'filters' to queries for a given
+ target, sometimes necessary to select sub-catalogs
+ in union catalog systems, etc.
+ </para>
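+ <para>
+ A hypothetical example: the attribute number and the term
+ <literal>examplepublisher</literal> are placeholders invented for
+ illustration, not values taken from any real target.
+ <screen><![CDATA[
+ <set name="pz:pqf_prefix" value="@and @attr 1=1018 examplepublisher"/>
+
+ <!-- Assuming the user's query normalizes to:  @attr 1=4 water
+      the target would presumably receive:
+      @and @attr 1=1018 examplepublisher @attr 1=4 water -->
+ ]]></screen>
+ </para>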
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>pz:pqf_strftime</term>
+ <listitem>
+ <para>
+ Allows you to extend a query with dates and operators.
+ The provided string allows certain substitutions and serves as a
+ format string.
+ The special two-character sequence '%%' is replaced by the
+ original query. Other sequences beginning with a percent sign are
+ conversions supported by strftime.
+ All other characters are copied verbatim. For example, the string
+ <literal>@and @attr 1=30 @attr 2=3 %Y %%</literal>
+ would search for current year combined with the original PQF (%%).
+ </para>
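+ <para>
+ To make the substitution concrete, the following sketch assumes a
+ user query that normalizes to <literal>@attr 1=4 water</literal>
+ and a search carried out in the year 2012; both are assumptions
+ made for the example:
+ <screen><![CDATA[
+ <set name="pz:pqf_strftime" value="@and @attr 1=30 @attr 2=3 %Y %%"/>
+
+ <!-- %Y is replaced by the current year, %% by the original query,
+      so the resulting PQF would presumably be:
+      @and @attr 1=30 @attr 2=3 2012 @attr 1=4 water -->
+ ]]></screen>
+ </para>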
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>pz:sort</term>
+ <listitem>
+ <para>
+ Specifies sort criteria to be applied to the result set.
+ Only works for targets which support the sort service.
+ </para>
+ </listitem>
+ </varlistentry>
<varlistentry>
- <term>pz:sru</term>
- <listitem>
- <para>
- This setting enables SRU/SRW support. It has three possible settings.
- 'get', enables SRU access through GET requests. 'post' enables SRU/POST
- support, less commonly supported, but useful if very large requests are
- to be submitted. 'srw' enables the SRW variation of the protocol.
- </para>
- </listitem>
+ <term>pz:recordfilter</term>
+ <listitem>
+ <para>
+ Specifies a filter which allows Pazpar2 to only include
+ records that meet certain criteria in a result. Unmatched records
+ will be ignored. The filter takes the form name, name~value, or name=value, which
+ will include only records with a metadata element (name) that contains the
+ substring given (~value) or matches the value exactly (=value). If value is omitted, all records
+ with the named
+ metadata element present will be included.
+ </para>
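+ <para>
+ The three forms could be written as shown below. The metadata
+ names and values are examples only, and the lines are alternatives
+ rather than settings to be combined:
+ <screen><![CDATA[
+ <!-- keep only records that have a title element -->
+ <set name="pz:recordfilter" value="title"/>
+ <!-- keep only records whose title contains the substring "water" -->
+ <set name="pz:recordfilter" value="title~water"/>
+ <!-- keep only records whose date is exactly "2010" -->
+ <set name="pz:recordfilter" value="date=2010"/>
+ ]]></screen>
+ </para>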
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>pz:preferred</term>
+ <listitem>
+ <para>
+ Specifies that a target is preferred, e.g. a possibly local, faster target. Using block=pref on the show command
+ will wait for all these targets to return records before releasing the block. If no target is preferred,
+ block=pref is identical to block=1, which releases the block when one target has returned records.
+ </para>
+ </listitem>
</varlistentry>
<varlistentry>
- <term>pz:sru_version</term>
- <listitem>
- <para>
- This allows SRU version to be specified. If unset Pazpar2
- will the default of YAZ (currently 1.2). Should be set
- to 1.1 or 1.2.
- </para>
- </listitem>
+ <term>pz:block_timeout</term>
+ <listitem>
+ <para>
+ (Not yet implemented.) Specifies the time after which a block should be released anyway.
+ </para>
+ </listitem>
</varlistentry>
<varlistentry>
- <term>pz:pqf_prefix</term>
- <listitem>
- <para>
- Allows you to specify an arbitrary PQF query language substring. The provided
- string is prefixed the user's query after it has been normalized to PQF
- internally in pazpar2. This allows you to attach complex 'filters' to
- queries for a gien target, sometimes necessary to select sub-catalogs
- in union catalog systems, etc.
+ <term>pz:facetmap:<replaceable>name</replaceable></term>
+ <listitem>
+ <para>
+ Specifies that for field <replaceable>name</replaceable>, the target
+ supports (native) facets. The value is the name of the
+ field on the target.
+ </para>
+ <note>
+ <para>
+ At this point only SOLR targets have been tested with this
+ facility.
</para>
- </listitem>
+ </note>
+ </listitem>
</varlistentry>
<varlistentry>
- <term>pz:sort</term>
- <listitem>
- <para>
- Specifies sort criteria to be applied to the result set. Only works for targets
- which support the sort service.
+ <term>pz:limitmap:<replaceable>name</replaceable></term>
+ <listitem>
+ <para>
+ Specifies attributes for limiting a search to a field - using
+ the limit parameter for search. In some cases the mapping of
+ a field to a value is identical to an existing cclmap field; in
+ other cases the field must be specified in a different way - for
+ example to match a complete field (rather than parts of a subfield).
+ </para>
+ <para>
+ The value of limitmap may have one of two forms: a reference to
+ an existing CCL field or a raw PQF string. The leading string
+ determines the type: either <literal>ccl:</literal> for a CCL field or
+ <literal>rpn:</literal> for PQF/RPN.
+ </para>
+ <note>
+ <para>
+ The limitmap facility is supported for Pazpar2 version 1.6.0.
</para>
- </listitem>
+ </note>
+ </listitem>
</varlistentry>
+
</variablelist>
- </refsect2>
+ </refsect2>
+
</refsect1>
<refsect1><title>SEE ALSO</title>
<para>