-<chapter id="administration">
- <!-- $Id: administration.xml,v 1.34 2006-06-12 09:39:18 marc Exp $ -->
- <title>Administrating Zebra</title>
- <!-- ### It's a bit daft that this chapter (which describes half of
- the configuration-file formats) is separated from
- "recordmodel-grs.xml" (which describes the other half) by the
- instructions on running zebraidx and zebrasrv. Some careful
- re-ordering is required here.
- -->
+ <chapter id="administration">
+ <title>Administrating &zebra;</title>
+ <!-- ### It's a bit daft that this chapter (which describes half of
+ the configuration-file formats) is separated from
+ "recordmodel-grs.xml" (which describes the other half) by the
+ instructions on running zebraidx and zebrasrv. Some careful
+ re-ordering is required here.
+ -->
- <para>
- Unlike many simpler retrieval systems, Zebra supports safe, incremental
- updates to an existing index.
- </para>
-
- <para>
- Normally, when Zebra modifies the index it reads a number of records
- that you specify.
- Depending on your specifications and on the contents of each record
- one the following events take place for each record:
- <variablelist>
-
- <varlistentry>
- <term>Insert</term>
- <listitem>
- <para>
- The record is indexed as if it never occurred before.
- Either the Zebra system doesn't know how to identify the record or
- Zebra can identify the record but didn't find it to be already indexed.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>Modify</term>
- <listitem>
- <para>
- The record has already been indexed.
- In this case either the contents of the record or the location
- (file) of the record indicates that it has been indexed before.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>Delete</term>
- <listitem>
- <para>
- The record is deleted from the index. As in the
- update-case it must be able to identify the record.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
- </para>
-
- <para>
- Please note that in both the modify- and delete- case the Zebra
- indexer must be able to generate a unique key that identifies the record
- in question (more on this below).
- </para>
-
- <para>
- To administrate the Zebra retrieval system, you run the
- <literal>zebraidx</literal> program.
- This program supports a number of options which are preceded by a dash,
- and a few commands (not preceded by dash).
-</para>
-
- <para>
- Both the Zebra administrative tool and the Z39.50 server share a
- set of index files and a global configuration file.
- The name of the configuration file defaults to
- <literal>zebra.cfg</literal>.
- The configuration file includes specifications on how to index
- various kinds of records and where the other configuration files
- are located. <literal>zebrasrv</literal> and <literal>zebraidx</literal>
- <emphasis>must</emphasis> be run in the directory where the
- configuration file lives unless you indicate the location of the
- configuration file by option <literal>-c</literal>.
- </para>
-
- <sect1 id="record-types">
- <title>Record Types</title>
-
- <para>
- Indexing is a per-record process, in which either insert/modify/delete
- will occur. Before a record is indexed search keys are extracted from
- whatever might be the layout the original record (sgml,html,text, etc..).
- The Zebra system currently supports two fundamental types of records:
- structured and simple text.
- To specify a particular extraction process, use either the
- command line option <literal>-t</literal> or specify a
- <literal>recordType</literal> setting in the configuration file.
- </para>
-
- </sect1>
-
- <sect1 id="configuration-file">
- <title>The Zebra Configuration File</title>
-
- <para>
- The Zebra configuration file, read by <literal>zebraidx</literal> and
- <literal>zebrasrv</literal> defaults to <literal>zebra.cfg</literal>
- unless specified by <literal>-c</literal> option.
- </para>
-
- <para>
- You can edit the configuration file with a normal text editor.
- parameter names and values are separated by colons in the file. Lines
- starting with a hash sign (<literal>#</literal>) are
- treated as comments.
- </para>
-
<para>
- If you manage different sets of records that share common
- characteristics, you can organize the configuration settings for each
- type into "groups".
- When <literal>zebraidx</literal> is run and you wish to address a
- given group you specify the group name with the <literal>-g</literal>
- option.
- In this case settings that have the group name as their prefix
- will be used by <literal>zebraidx</literal>.
- If no <literal>-g</literal> option is specified, the settings
- without prefix are used.
+ Unlike many simpler retrieval systems, &zebra; supports safe, incremental
+ updates to an existing index.
</para>
-
- <para>
- In the configuration file, the group name is placed before the option
- name itself, separated by a dot (.). For instance, to set the record type
- for group <literal>public</literal> to <literal>grs.sgml</literal>
- (the SGML-like format for structured records) you would write:
- </para>
-
- <para>
- <screen>
- public.recordType: grs.sgml
- </screen>
- </para>
-
- <para>
- To set the default value of the record type to <literal>text</literal>
- write:
- </para>
-
- <para>
- <screen>
- recordType: text
- </screen>
- </para>
-
- <para>
- The available configuration settings are summarized below. They will be
- explained further in the following sections.
- </para>
-
- <!--
- FIXME - Didn't Adam make something to have multiple databases in multiple dirs...
- -->
-
+
<para>
+ Normally, when &zebra; modifies the index it reads a number of records
+ that you specify.
+ Depending on your specifications and on the contents of each record
+    one of the following events takes place for each record:
<variablelist>
-
- <varlistentry>
- <term>
- <emphasis>group</emphasis>
- .recordType[<emphasis>.name</emphasis>]:
- <replaceable>type</replaceable>
- </term>
- <listitem>
- <para>
- Specifies how records with the file extension
- <emphasis>name</emphasis> should be handled by the indexer.
- This option may also be specified as a command line option
- (<literal>-t</literal>). Note that if you do not specify a
- <emphasis>name</emphasis>, the setting applies to all files.
- In general, the record type specifier consists of the elements (each
- element separated by dot), <emphasis>fundamental-type</emphasis>,
- <emphasis>file-read-type</emphasis> and arguments. Currently, two
- fundamental types exist, <literal>text</literal> and
- <literal>grs</literal>.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>group</emphasis>.recordId:
- <replaceable>record-id-spec</replaceable></term>
- <listitem>
- <para>
- Specifies how the records are to be identified when updated. See
- <xref linkend="locating-records"/>.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>group</emphasis>.database:
- <replaceable>database</replaceable></term>
- <listitem>
- <para>
- Specifies the Z39.50 database name.
- <!-- FIXME - now we can have multiple databases in one server. -H -->
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>group</emphasis>.storeKeys:
- <replaceable>boolean</replaceable></term>
- <listitem>
- <para>
- Specifies whether key information should be saved for a given
- group of records. If you plan to update/delete this type of
- records later this should be specified as 1; otherwise it
- should be 0 (default), to save register space.
- <!-- ### this is the first mention of "register" -->
- See <xref linkend="file-ids"/>.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>group</emphasis>.storeData:
- <replaceable>boolean</replaceable></term>
- <listitem>
- <para>
- Specifies whether the records should be stored internally
- in the Zebra system files.
- If you want to maintain the raw records yourself,
- this option should be false (0).
- If you want Zebra to take care of the records for you, it
- should be true(1).
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <!-- ### probably a better place to define "register" -->
- <term>register: <replaceable>register-location</replaceable></term>
- <listitem>
- <para>
- Specifies the location of the various register files that Zebra uses
- to represent your databases.
- See <xref linkend="register-location"/>.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>shadow: <replaceable>register-location</replaceable></term>
- <listitem>
- <para>
- Enables the <emphasis>safe update</emphasis> facility of Zebra, and
- tells the system where to place the required, temporary files.
- See <xref linkend="shadow-registers"/>.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>lockDir: <replaceable>directory</replaceable></term>
- <listitem>
- <para>
- Directory in which various lock files are stored.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>keyTmpDir: <replaceable>directory</replaceable></term>
- <listitem>
- <para>
- Directory in which temporary files used during zebraidx's update
- phase are stored.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>setTmpDir: <replaceable>directory</replaceable></term>
- <listitem>
- <para>
- Specifies the directory that the server uses for temporary result sets.
- If not specified <literal>/tmp</literal> will be used.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>profilePath: <replaceable>path</replaceable></term>
- <listitem>
- <para>
- Specifies a path of profile specification files.
- The path is composed of one or more directories separated by
- colon. Similar to PATH for UNIX systems.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>attset: <replaceable>filename</replaceable></term>
- <listitem>
- <para>
- Specifies the filename(s) of attribute set files for use in
- searching. At least the Bib-1 set should be loaded
- (<literal>bib1.att</literal>).
- The <literal>profilePath</literal> setting is used to look for
- the specified files.
- See <xref linkend="attset-files"/>
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>memMax: <replaceable>size</replaceable></term>
- <listitem>
- <para>
- Specifies <replaceable>size</replaceable> of internal memory
- to use for the zebraidx program.
- The amount is given in megabytes - default is 4 (4 MB).
- The more memory, the faster large updates happen, up to about
- half the free memory available on the computer.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>tempfiles: <replaceable>Yes/Auto/No</replaceable></term>
- <listitem>
- <para>
- Tells zebra if it should use temporary files when indexing. The
- default is Auto, in which case zebra uses temporary files only
- if it would need more that <replaceable>memMax</replaceable>
- megabytes of memory. This should be good for most uses.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>root: <replaceable>dir</replaceable></term>
- <listitem>
- <para>
- Specifies a directory base for Zebra. All relative paths
- given (in profilePath, register, shadow) are based on this
- directory. This setting is useful if your Zebra server
- is running in a different directory from where
- <literal>zebra.cfg</literal> is located.
- </para>
- </listitem>
- </varlistentry>
<varlistentry>
- <term>passwd: <replaceable>file</replaceable></term>
+ <term>Insert</term>
<listitem>
<para>
- Specifies a file with description of user accounts for Zebra.
- The format is similar to that known to Apache's htpasswd files
- and UNIX' passwd files. Non-empty lines not beginning with
- # are considered account lines. There is one account per-line.
- A line consists of fields separate by a single colon character.
- First field is username, second is password.
+ The record is indexed as if it never occurred before.
+ Either the &zebra; system doesn't know how to identify the record or
+ &zebra; can identify the record but didn't find it to be already indexed.
</para>
</listitem>
</varlistentry>
-
<varlistentry>
- <term>passwd.c: <replaceable>file</replaceable></term>
+ <term>Modify</term>
<listitem>
<para>
- Specifies a file with description of user accounts for Zebra.
- File format is similar to that used by the passwd directive except
- that the password are encrypted. Use Apache's htpasswd or similar
- for maintenanace.
+ The record has already been indexed.
+ In this case either the contents of the record or the location
+ (file) of the record indicates that it has been indexed before.
</para>
</listitem>
</varlistentry>
-
<varlistentry>
- <term>perm.<replaceable>user</replaceable>:
- <replaceable>permstring</replaceable></term>
+ <term>Delete</term>
<listitem>
<para>
- Specifies permissions (priviledge) for a user that are allowed
- to access Zebra via the passwd system. There are two kinds
- of permissions currently: read (r) and write(w). By default
- users not listed in a permission directive are given the read
- priviledge. To specify permissions for a user with no
- username, or Z39.50 anonymous style use
- <literal>anonymous</literal>. The permstring consists of
- a sequence of characters. Include character <literal>w</literal>
- for write/update access, <literal>r</literal> for read access.
+        The record is deleted from the index. As in the
+        modify case, &zebra; must be able to identify the record.
</para>
</listitem>
</varlistentry>
-
- <varlistentry>
- <term>dbaccess <replaceable>accessfile</replaceable></term>
- <listitem>
- <para>
- Names a file which lists database subscriptions for individual users.
- The access file should consists of lines of the form <literal>username:
- dbnames</literal>, where dbnames is a list of database names, seprated by
- '+'. No whitespace is allowed in the database list.
- </para>
- </listitem>
- </varlistentry>
-
</variablelist>
</para>
-
- </sect1>
-
- <sect1 id="locating-records">
- <title>Locating Records</title>
-
- <para>
- The default behavior of the Zebra system is to reference the
- records from their original location, i.e. where they were found when you
- ran <literal>zebraidx</literal>.
- That is, when a client wishes to retrieve a record
- following a search operation, the files are accessed from the place
- where you originally put them - if you remove the files (without
- running <literal>zebraidx</literal> again, the server will return
- diagnostic number 14 (``System error in presenting records'') to
- the client.
- </para>
-
- <para>
- If your input files are not permanent - for example if you retrieve
- your records from an outside source, or if they were temporarily
- mounted on a CD-ROM drive,
- you may want Zebra to make an internal copy of them. To do this,
- you specify 1 (true) in the <literal>storeData</literal> setting. When
- the Z39.50 server retrieves the records they will be read from the
- internal file structures of the system.
- </para>
-
- </sect1>
-
- <sect1 id="simple-indexing">
- <title>Indexing with no Record IDs (Simple Indexing)</title>
-
- <para>
- If you have a set of records that are not expected to change over time
- you may can build your database without record IDs.
- This indexing method uses less space than the other methods and
- is simple to use.
- </para>
-
- <para>
- To use this method, you simply omit the <literal>recordId</literal> entry
- for the group of files that you index. To add a set of records you use
- <literal>zebraidx</literal> with the <literal>update</literal> command. The
- <literal>update</literal> command will always add all of the records that it
- encounters to the index - whether they have already been indexed or
- not. If the set of indexed files change, you should delete all of the
- index files, and build a new index from scratch.
- </para>
-
- <para>
- Consider a system in which you have a group of text files called
- <literal>simple</literal>.
- That group of records should belong to a Z39.50 database called
- <literal>textbase</literal>.
- The following <literal>zebra.cfg</literal> file will suffice:
- </para>
- <para>
-
- <screen>
- profilePath: /usr/local/idzebra/tab
- attset: bib1.att
- simple.recordType: text
- simple.database: textbase
- </screen>
- </para>
-
<para>
- Since the existing records in an index can not be addressed by their
- IDs, it is impossible to delete or modify records when using this method.
+    Please note that in both the modify and delete cases the &zebra;
+ indexer must be able to generate a unique key that identifies the record
+ in question (more on this below).
</para>
-
- </sect1>
-
- <sect1 id="file-ids">
- <title>Indexing with File Record IDs</title>
-
- <para>
- If you have a set of files that regularly change over time: Old files
- are deleted, new ones are added, or existing files are modified, you
- can benefit from using the <emphasis>file ID</emphasis>
- indexing methodology.
- Examples of this type of database might include an index of WWW
- resources, or a USENET news spool area.
- Briefly speaking, the file key methodology uses the directory paths
- of the individual records as a unique identifier for each record.
- To perform indexing of a directory with file keys, again, you specify
- the top-level directory after the <literal>update</literal> command.
- The command will recursively traverse the directories and compare
- each one with whatever have been indexed before in that same directory.
- If a file is new (not in the previous version of the directory) it
- is inserted into the registers; if a file was already indexed and
- it has been modified since the last update, the index is also
- modified; if a file has been removed since the last
- visit, it is deleted from the index.
- </para>
-
+
<para>
- The resulting system is easy to administrate. To delete a record you
- simply have to delete the corresponding file (say, with the
- <literal>rm</literal> command). And to add records you create new
- files (or directories with files). For your changes to take effect
- in the register you must run <literal>zebraidx update</literal> with
- the same directory root again. This mode of operation requires more
- disk space than simpler indexing methods, but it makes it easier for
- you to keep the index in sync with a frequently changing set of data.
- If you combine this system with the <emphasis>safe update</emphasis>
- facility (see below), you never have to take your server off-line for
- maintenance or register updating purposes.
+ To administrate the &zebra; retrieval system, you run the
+ <literal>zebraidx</literal> program.
+    This program supports a number of options, each preceded by a dash,
+    and a few commands (not preceded by a dash).
</para>
-
+
<para>
- To enable indexing with pathname IDs, you must specify
- <literal>file</literal> as the value of <literal>recordId</literal>
- in the configuration file. In addition, you should set
- <literal>storeKeys</literal> to <literal>1</literal>, since the Zebra
- indexer must save additional information about the contents of each record
- in order to modify the indexes correctly at a later time.
+ Both the &zebra; administrative tool and the &acro.z3950; server share a
+ set of index files and a global configuration file.
+ The name of the configuration file defaults to
+ <literal>zebra.cfg</literal>.
+ The configuration file includes specifications on how to index
+ various kinds of records and where the other configuration files
+ are located. <literal>zebrasrv</literal> and <literal>zebraidx</literal>
+ <emphasis>must</emphasis> be run in the directory where the
+ configuration file lives unless you indicate the location of the
+ configuration file by option <literal>-c</literal>.
</para>
-
+
+ <sect1 id="record-types">
+ <title>Record Types</title>
+
+ <para>
+     Indexing is a per-record process, in which either an insert, a modify,
+     or a delete takes place. Before a record is indexed, search keys are
+     extracted from it, whatever the layout of the original record
+     (&acro.sgml;, HTML, text, etc.).
+ The &zebra; system currently supports two fundamental types of records:
+ structured and simple text.
+ To specify a particular extraction process, use either the
+ command line option <literal>-t</literal> or specify a
+ <literal>recordType</literal> setting in the configuration file.
+ </para>
+
+ </sect1>
+
+ <sect1 id="zebra-cfg">
+ <title>The &zebra; Configuration File</title>
+
+ <para>
+     The &zebra; configuration file, read by <literal>zebraidx</literal> and
+     <literal>zebrasrv</literal>, defaults to <literal>zebra.cfg</literal>
+     unless otherwise specified with the <literal>-c</literal> option.
+ </para>
+
+ <para>
+     You can edit the configuration file with a normal text editor.
+     Parameter names and values are separated by colons in the file. Lines
+ starting with a hash sign (<literal>#</literal>) are
+ treated as comments.
+ </para>
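+
+    <para>
+     For example, a minimal configuration file using this syntax might
+     look like the following (the values shown are illustrative only):
+     <screen>
+      # zebra.cfg - comment lines start with a hash sign
+      profilePath: /usr/local/idzebra/tab
+      recordType: text
+     </screen>
+    </para>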
+
+ <para>
+ If you manage different sets of records that share common
+ characteristics, you can organize the configuration settings for each
+ type into "groups".
+ When <literal>zebraidx</literal> is run and you wish to address a
+ given group you specify the group name with the <literal>-g</literal>
+ option.
+ In this case settings that have the group name as their prefix
+ will be used by <literal>zebraidx</literal>.
+ If no <literal>-g</literal> option is specified, the settings
+ without prefix are used.
+ </para>
+
+ <para>
+ In the configuration file, the group name is placed before the option
+ name itself, separated by a dot (.). For instance, to set the record type
+ for group <literal>public</literal> to <literal>grs.sgml</literal>
+ (the &acro.sgml;-like format for structured records) you would write:
+ </para>
+
+ <para>
+ <screen>
+ public.recordType: grs.sgml
+ </screen>
+ </para>
+
+ <para>
+ To set the default value of the record type to <literal>text</literal>
+ write:
+ </para>
+
+ <para>
+ <screen>
+ recordType: text
+ </screen>
+ </para>
+
+ <para>
+ The available configuration settings are summarized below. They will be
+ explained further in the following sections.
+ </para>
+
<!--
- FIXME - There must be a simpler way to do this with Adams string tags -H
- -->
+ FIXME - Didn't Adam make something to have multiple databases in multiple dirs...
+ -->
- <para>
- For example, to update records of group <literal>esdd</literal>
- located below
- <literal>/data1/records/</literal> you should type:
- <screen>
- $ zebraidx -g esdd update /data1/records
- </screen>
- </para>
-
- <para>
- The corresponding configuration file includes:
- <screen>
- esdd.recordId: file
- esdd.recordType: grs.sgml
- esdd.storeKeys: 1
- </screen>
- </para>
-
- <note>
- <para>You cannot start out with a group of records with simple
- indexing (no record IDs as in the previous section) and then later
- enable file record Ids. Zebra must know from the first time that you
- index the group that
- the files should be indexed with file record IDs.
+ <para>
+ <variablelist>
+
+ <varlistentry>
+ <term>
+ <emphasis>group</emphasis>
+ .recordType[<emphasis>.name</emphasis>]:
+ <replaceable>type</replaceable>
+ </term>
+ <listitem>
+ <para>
+ Specifies how records with the file extension
+ <emphasis>name</emphasis> should be handled by the indexer.
+ This option may also be specified as a command line option
+ (<literal>-t</literal>). Note that if you do not specify a
+ <emphasis>name</emphasis>, the setting applies to all files.
+ In general, the record type specifier consists of the elements (each
+        element separated by a dot), <emphasis>fundamental-type</emphasis>,
+ <emphasis>file-read-type</emphasis> and arguments. Currently, two
+ fundamental types exist, <literal>text</literal> and
+ <literal>grs</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>group</emphasis>.recordId:
+ <replaceable>record-id-spec</replaceable></term>
+ <listitem>
+ <para>
+ Specifies how the records are to be identified when updated. See
+ <xref linkend="locating-records"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>group</emphasis>.database:
+ <replaceable>database</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the &acro.z3950; database name.
+ <!-- FIXME - now we can have multiple databases in one server. -H -->
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>group</emphasis>.storeKeys:
+ <replaceable>boolean</replaceable></term>
+ <listitem>
+ <para>
+ Specifies whether key information should be saved for a given
+        group of records. If you plan to update or delete records of this
+        type later, this should be set to 1; otherwise it
+        should be 0 (the default), to save register space.
+ <!-- ### this is the first mention of "register" -->
+ See <xref linkend="file-ids"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>group</emphasis>.storeData:
+ <replaceable>boolean</replaceable></term>
+ <listitem>
+ <para>
+ Specifies whether the records should be stored internally
+ in the &zebra; system files.
+ If you want to maintain the raw records yourself,
+ this option should be false (0).
+ If you want &zebra; to take care of the records for you, it
+        should be true (1).
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <!-- ### probably a better place to define "register" -->
+ <term>register: <replaceable>register-location</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the location of the various register files that &zebra; uses
+ to represent your databases.
+ See <xref linkend="register-location"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>shadow: <replaceable>register-location</replaceable></term>
+ <listitem>
+ <para>
+ Enables the <emphasis>safe update</emphasis> facility of &zebra;, and
+ tells the system where to place the required, temporary files.
+ See <xref linkend="shadow-registers"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lockDir: <replaceable>directory</replaceable></term>
+ <listitem>
+ <para>
+ Directory in which various lock files are stored.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>keyTmpDir: <replaceable>directory</replaceable></term>
+ <listitem>
+ <para>
+ Directory in which temporary files used during zebraidx's update
+ phase are stored.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>setTmpDir: <replaceable>directory</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the directory that the server uses for temporary result sets.
+ If not specified <literal>/tmp</literal> will be used.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>profilePath: <replaceable>path</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a path of profile specification files.
+        The path is composed of one or more directories separated by
+        colons, similar to <literal>PATH</literal> on UNIX systems.
+ </para>
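+       <para>
+        For example, to search the current directory first and then a
+        (site-specific, illustrative) installation directory:
+        <screen>
+         profilePath: .:/usr/local/idzebra/tab
+        </screen>
+       </para>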
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>modulePath: <replaceable>path</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a path of record filter modules.
+        Specifies a path of record filter modules.
+        The path is composed of one or more directories separated by
+        colons, similar to <literal>PATH</literal> on UNIX systems.
+ The 'make install' procedure typically puts modules in
+ <filename>/usr/local/lib/idzebra-2.0/modules</filename>.
+ </para>
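+       <para>
+        For example, pointing at the default module directory mentioned
+        above:
+        <screen>
+         modulePath: /usr/local/lib/idzebra-2.0/modules
+        </screen>
+       </para>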
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>index: <replaceable>filename</replaceable></term>
+ <listitem>
+ <para>
+ Defines the filename which holds fields structure
+ definitions. If omitted, the file <filename>default.idx</filename>
+ is read.
+ Refer to <xref linkend="default-idx-file"/> for
+ more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>sortmax: <replaceable>integer</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the maximum number of records that will be sorted
+ in a result set. If the result set contains more than
+ <replaceable>integer</replaceable> records, records after the
+ limit will not be sorted. If omitted, the default value is
+ 1,000.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>staticrank: <replaceable>integer</replaceable></term>
+ <listitem>
+ <para>
+        Specifies whether static ranking is enabled (1) or
+        disabled (0). If omitted, it is disabled, corresponding
+        to a value of 0.
+        Refer to <xref linkend="administration-ranking-static"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+
+ <varlistentry>
+ <term>estimatehits: <replaceable>integer</replaceable></term>
+ <listitem>
+ <para>
+        Controls whether &zebra; should calculate approximate hit counts,
+        and from which hit count the approximation is used.
+        A value of 0 disables approximate hit counts.
+        For a positive value, an approximate hit count is reported
+        once it is known to be larger than <replaceable>integer</replaceable>.
+ </para>
+ <para>
+ Approximate hit counts can also be triggered by a particular
+ attribute in a query.
+ Refer to <xref linkend="querymodel-zebra-global-attr-limit"/>.
+ </para>
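+       <para>
+        For example, to report approximate hit counts once a count is
+        known to exceed 10000 (an illustrative threshold):
+        <screen>
+         estimatehits: 10000
+        </screen>
+       </para>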
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>attset: <replaceable>filename</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the filename(s) of attribute set files for use in
+ searching. In many configurations <filename>bib1.att</filename>
+ is used, but that is not required. If Classic Explain
+        attributes are to be used for searching,
+ <filename>explain.att</filename> must be given.
+        The path to att-files in general can be given using the
+        <literal>profilePath</literal> setting.
+ See also <xref linkend="attset-files"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>memMax: <replaceable>size</replaceable></term>
+ <listitem>
+ <para>
+ Specifies <replaceable>size</replaceable> of internal memory
+ to use for the zebraidx program.
+ The amount is given in megabytes - default is 4 (4 MB).
+ The more memory, the faster large updates happen, up to about
+ half the free memory available on the computer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>tempfiles: <replaceable>Yes/Auto/No</replaceable></term>
+ <listitem>
+ <para>
+        Tells &zebra; whether to use temporary files when indexing. The
+        default is Auto, in which case &zebra; uses temporary files only
+        if it would need more than <replaceable>memMax</replaceable>
+        megabytes of memory. This should be good for most uses.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>root: <replaceable>dir</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a directory base for &zebra;. All relative paths
+ given (in profilePath, register, shadow) are based on this
+ directory. This setting is useful if your &zebra; server
+ is running in a different directory from where
+ <literal>zebra.cfg</literal> is located.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>passwd: <replaceable>file</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a file with description of user accounts for &zebra;.
+       The format is similar to that of Apache's htpasswd files
+       and UNIX passwd files. Non-empty lines not beginning with
+       # are considered account lines. There is one account per line.
+       A line consists of fields separated by a single colon character.
+       The first field is the username, the second the password.
+ </para>
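+      <para>
+       A minimal account file in this format might look like the
+       following (hypothetical users and passwords):
+       <screen>
+        # one account per line, username:password
+        admin:secret
+        guest:guest
+       </screen>
+      </para>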
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>passwd.c: <replaceable>file</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a file with description of user accounts for &zebra;.
+       Specifies a file with description of user accounts for &zebra;.
+       The file format is similar to that used by the passwd directive,
+       except that the passwords are encrypted. Use Apache's htpasswd or similar
+ for maintenance.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>perm.<replaceable>user</replaceable>:
+ <replaceable>permstring</replaceable></term>
+ <listitem>
+ <para>
+       Specifies the permissions (privileges) of a user who is allowed
+       to access &zebra; via the passwd system. There are currently two
+       kinds of permissions: read (r) and write (w). By default,
+       users not listed in a permission directive are given the read
+       privilege. To specify permissions for a user with no
+       username, or for &acro.z3950; anonymous-style access, use
+       <literal>anonymous</literal>. The permstring consists of
+ a sequence of characters. Include character <literal>w</literal>
+ for write/update access, <literal>r</literal> for read access and
+ <literal>a</literal> to allow anonymous access through this account.
+ </para>
+ </listitem>
+ </varlistentry>
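+
+ <para>
+ As an invented illustration, the following lines would give a
+ user called <literal>admin</literal> full access while limiting
+ anonymous connections to searching:
+ <screen>
+ perm.admin: rw
+ perm.anonymous: r
+ </screen>
+ </para>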
+
+ <varlistentry>
+ <term>dbaccess: <replaceable>accessfile</replaceable></term>
+ <listitem>
+ <para>
+ Names a file which lists database subscriptions for individual users.
+ The access file should consist of lines of the form
+ <literal>username: dbnames</literal>, where dbnames is a list of
+ database names separated by '+'. No whitespace is allowed in the
+ database list.
+ </para>
+ </listitem>
+ </varlistentry>
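+
+ <para>
+ An access file might contain lines such as these (the usernames
+ and database names are invented for illustration):
+ <screen>
+ admin: books+journals+reports
+ guest: books
+ </screen>
+ </para>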
+
+ <varlistentry>
+ <term>encoding: <replaceable>charsetname</replaceable></term>
+ <listitem>
+ <para>
+ Tells &zebra; to interpret the terms in &acro.z3950; queries as
+ having been encoded using the specified character
+ encoding. The default is <literal>ISO-8859-1</literal>; one
+ useful alternative is <literal>UTF-8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>storeKeys: <replaceable>value</replaceable></term>
+ <listitem>
+ <para>
+ Specifies whether &zebra; keeps a copy of indexed keys.
+ Use a value of 1 to enable, 0 to disable. If the storeKeys setting
+ is omitted, it is enabled. Stored keys
+ are required for updating and deleting records. Disable
+ storeKeys only if you want to save space and plan to index the data once.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>storeData: <replaceable>value</replaceable></term>
+ <listitem>
+ <para>
+ Specifies whether &zebra; keeps a copy of indexed records.
+ Use a value of 1 to enable, 0 to disable. If the storeData setting
+ is omitted, it is enabled. A storeData setting of 0 (disabled) makes
+ &zebra; fetch records from their original location in the file
+ system, using the filename, file offset and file length. For the
+ DOM and ALVIS filters, the storeData setting is ignored.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
</para>
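+
+ <para>
+ Putting several of these settings together, a small
+ <literal>zebra.cfg</literal> might include lines such as the
+ following (the password file name is invented for illustration):
+ <screen>
+ encoding: UTF-8
+ passwd: zebra.passwd
+ perm.anonymous: r
+ storeKeys: 1
+ storeData: 1
+ </screen>
+ </para>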
- </note>
-
- <para>
- You cannot explicitly delete records when using this method (using the
- <literal>delete</literal> command to <literal>zebraidx</literal>. Instead
- you have to delete the files from the file system (or move them to a
- different location)
- and then run <literal>zebraidx</literal> with the
- <literal>update</literal> command.
- </para>
- <!-- ### what happens if a file contains multiple records? -->
-</sect1>
-
- <sect1 id="generic-ids">
- <title>Indexing with General Record IDs</title>
-
- <para>
- When using this method you construct an (almost) arbitrary, internal
- record key based on the contents of the record itself and other system
- information. If you have a group of records that explicitly associates
- an ID with each record, this method is convenient. For example, the
- record format may contain a title or a ID-number - unique within the group.
- In either case you specify the Z39.50 attribute set and use-attribute
- location in which this information is stored, and the system looks at
- that field to determine the identity of the record.
- </para>
-
- <para>
- As before, the record ID is defined by the <literal>recordId</literal>
- setting in the configuration file. The value of the record ID specification
- consists of one or more tokens separated by whitespace. The resulting
- ID is represented in the index by concatenating the tokens and
- separating them by ASCII value (1).
- </para>
-
- <para>
- There are three kinds of tokens:
- <variablelist>
-
- <varlistentry>
- <term>Internal record info</term>
- <listitem>
- <para>
- The token refers to a key that is
- extracted from the record. The syntax of this token is
- <literal>(</literal> <emphasis>set</emphasis> <literal>,</literal>
- <emphasis>use</emphasis> <literal>)</literal>,
- where <emphasis>set</emphasis> is the
- attribute set name <emphasis>use</emphasis> is the
- name or value of the attribute.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>System variable</term>
- <listitem>
- <para>
- The system variables are preceded by
-
- <screen>
- $
- </screen>
- and immediately followed by the system variable name, which
- may one of
- <variablelist>
-
- <varlistentry>
- <term>group</term>
- <listitem>
- <para>
- Group name.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>database</term>
- <listitem>
- <para>
- Current database specified.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>type</term>
- <listitem>
- <para>
- Record type.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>Constant string</term>
- <listitem>
- <para>
- A string used as part of the ID — surrounded
- by single- or double quotes.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
- </para>
-
- <para>
- For instance, the sample GILS records that come with the Zebra
- distribution contain a unique ID in the data tagged Control-Identifier.
- The data is mapped to the Bib-1 use attribute Identifier-standard
- (code 1007). To use this field as a record id, specify
- <literal>(bib1,Identifier-standard)</literal> as the value of the
- <literal>recordId</literal> in the configuration file.
- If you have other record types that uses the same field for a
- different purpose, you might add the record type
- (or group or database name) to the record id of the gils
- records as well, to prevent matches with other types of records.
- In this case the recordId might be set like this:
-
- <screen>
- gils.recordId: $type (bib1,Identifier-standard)
- </screen>
-
- </para>
-
- <para>
- (see <xref linkend="record-model-grs"/>
- for details of how the mapping between elements of your records and
- searchable attributes is established).
- </para>
-
- <para>
- As for the file record ID case described in the previous section,
- updating your system is simply a matter of running
- <literal>zebraidx</literal>
- with the <literal>update</literal> command. However, the update with general
- keys is considerably slower than with file record IDs, since all files
- visited must be (re)read to discover their IDs.
- </para>
-
- <para>
- As you might expect, when using the general record IDs
- method, you can only add or modify existing records with the
- <literal>update</literal> command.
- If you wish to delete records, you must use the,
- <literal>delete</literal> command, with a directory as a parameter.
- This will remove all records that match the files below that root
- directory.
- </para>
-
- </sect1>
-
- <sect1 id="register-location">
- <title>Register Location</title>
-
- <para>
- Normally, the index files that form dictionaries, inverted
- files, record info, etc., are stored in the directory where you run
- <literal>zebraidx</literal>. If you wish to store these, possibly large,
- files somewhere else, you must add the <literal>register</literal>
- entry to the <literal>zebra.cfg</literal> file.
- Furthermore, the Zebra system allows its file
- structures to span multiple file systems, which is useful for
- managing very large databases.
- </para>
-
- <para>
- The value of the <literal>register</literal> setting is a sequence
- of tokens. Each token takes the form:
-
- <screen>
- <emphasis>dir</emphasis><literal>:</literal><emphasis>size</emphasis>.
- </screen>
-
- The <emphasis>dir</emphasis> specifies a directory in which index files
- will be stored and the <emphasis>size</emphasis> specifies the maximum
- size of all files in that directory. The Zebra indexer system fills
- each directory in the order specified and use the next specified
- directories as needed.
- The <emphasis>size</emphasis> is an integer followed by a qualifier
- code,
- <literal>b</literal> for bytes,
- <literal>k</literal> for kilobytes.
- <literal>M</literal> for megabytes,
- <literal>G</literal> for gigabytes.
- </para>
-
- <para>
- For instance, if you have allocated two disks for your register, and
- the first disk is mounted
- on <literal>/d1</literal> and has 2GB of free space and the
- second, mounted on <literal>/d2</literal> has 3.6 GB, you could
- put this entry in your configuration file:
-
- <screen>
- register: /d1:2G /d2:3600M
- </screen>
-
- </para>
-
- <para>
- Note that Zebra does not verify that the amount of space specified is
- actually available on the directory (file system) specified - it is
- your responsibility to ensure that enough space is available, and that
- other applications do not attempt to use the free space. In a large
- production system, it is recommended that you allocate one or more
- file system exclusively to the Zebra register files.
- </para>
-
- </sect1>
-
- <sect1 id="shadow-registers">
- <title>Safe Updating - Using Shadow Registers</title>
-
- <sect2>
- <title>Description</title>
-
+
+ </sect1>
+
+ <sect1 id="locating-records">
+ <title>Locating Records</title>
+
<para>
- The Zebra server supports <emphasis>updating</emphasis> of the index
- structures. That is, you can add, modify, or remove records from
- databases managed by Zebra without rebuilding the entire index.
- Since this process involves modifying structured files with various
- references between blocks of data in the files, the update process
- is inherently sensitive to system crashes, or to process interruptions:
- Anything but a successfully completed update process will leave the
- register files in an unknown state, and you will essentially have no
- recourse but to re-index everything, or to restore the register files
- from a backup medium.
- Further, while the update process is active, users cannot be
- allowed to access the system, as the contents of the register files
- may change unpredictably.
+ The default behavior of the &zebra; system is to reference the
+ records from their original location, i.e. where they were found when you
+ ran <literal>zebraidx</literal>.
+ That is, when a client wishes to retrieve a record
+ following a search operation, the files are accessed from the place
+ where you originally put them - if you remove the files (without
+ running <literal>zebraidx</literal> again), the server will return
+ diagnostic number 14 (``System error in presenting records'') to
+ the client.
</para>
-
+
<para>
- You can solve these problems by enabling the shadow register system in
- Zebra.
- During the updating procedure, <literal>zebraidx</literal> will temporarily
- write changes to the involved files in a set of "shadow
- files", without modifying the files that are accessed by the
- active server processes. If the update procedure is interrupted by a
- system crash or a signal, you simply repeat the procedure - the
- register files have not been changed or damaged, and the partially
- written shadow files are automatically deleted before the new updating
- procedure commences.
+ If your input files are not permanent - for example, if you retrieve
+ your records from an outside source, or if they were temporarily
+ mounted on a CD-ROM drive -
+ you may want &zebra; to make an internal copy of them. To do this,
+ specify 1 (true) in the <literal>storeData</literal> setting. When
+ the &acro.z3950; server retrieves the records, they will be read from the
+ internal file structures of the system.
</para>
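+
+ <para>
+ As a sketch, if the records of a file group called
+ <literal>cdrom</literal> (an invented name) come from temporary
+ media, the relevant configuration line would be:
+ <screen>
+ cdrom.storeData: 1
+ </screen>
+ </para>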
-
+
+ </sect1>
+
+ <sect1 id="simple-indexing">
+ <title>Indexing with no Record IDs (Simple Indexing)</title>
+
<para>
- At the end of the updating procedure (or in a separate operation, if
- you so desire), the system enters a "commit mode". First,
- any active server processes are forced to access those blocks that
- have been changed from the shadow files rather than from the main
- register files; the unmodified blocks are still accessed at their
- normal location (the shadow files are not a complete copy of the
- register files - they only contain those parts that have actually been
- modified). If the commit process is interrupted at any point during the
- commit process, the server processes will continue to access the
- shadow files until you can repeat the commit procedure and complete
- the writing of data to the main register files. You can perform
- multiple update operations to the registers before you commit the
- changes to the system files, or you can execute the commit operation
- at the end of each update operation. When the commit phase has
- completed successfully, any running server processes are instructed to
- switch their operations to the new, operational register, and the
- temporary shadow files are deleted.
+ If you have a set of records that are not expected to change over time,
+ you can build your database without record IDs.
+ This indexing method uses less space than the other methods and
+ is simple to use.
</para>
-
- </sect2>
-
- <sect2>
- <title>How to Use Shadow Register Files</title>
-
+
<para>
- The first step is to allocate space on your system for the shadow
- files.
- You do this by adding a <literal>shadow</literal> entry to the
- <literal>zebra.cfg</literal> file.
- The syntax of the <literal>shadow</literal> entry is exactly the
- same as for the <literal>register</literal> entry
- (see <xref linkend="register-location"/>).
- The location of the shadow area should be
- <emphasis>different</emphasis> from the location of the main register
- area (if you have specified one - remember that if you provide no
- <literal>register</literal> setting, the default register area is the
- working directory of the server and indexing processes).
+ To use this method, you simply omit the <literal>recordId</literal> entry
+ for the group of files that you index. To add a set of records you use
+ <literal>zebraidx</literal> with the <literal>update</literal> command. The
+ <literal>update</literal> command will always add all of the records that it
+ encounters to the index - whether they have already been indexed or
+ not. If the set of indexed files changes, you should delete all of the
+ index files and build a new index from scratch.
</para>
-
+
<para>
- The following excerpt from a <literal>zebra.cfg</literal> file shows
- one example of a setup that configures both the main register
- location and the shadow file area.
- Note that two directories or partitions have been set aside
- for the shadow file area. You can specify any number of directories
- for each of the file areas, but remember that there should be no
- overlaps between the directories used for the main registers and the
- shadow files, respectively.
+ Consider a system in which you have a group of text files called
+ <literal>simple</literal>.
+ That group of records should belong to a &acro.z3950; database called
+ <literal>textbase</literal>.
+ The following <literal>zebra.cfg</literal> file will suffice:
</para>
<para>
-
+
<screen>
- register: /d1:500M
- shadow: /scratch1:100M /scratch2:200M
+ profilePath: /usr/local/idzebra/tab
+ attset: bib1.att
+ simple.recordType: text
+ simple.database: textbase
</screen>
-
+
</para>
-
+
<para>
- When shadow files are enabled, an extra command is available at the
- <literal>zebraidx</literal> command line.
- In order to make changes to the system take effect for the
- users, you'll have to submit a "commit" command after a
- (sequence of) update operation(s).
+ Since the existing records in an index can not be addressed by their
+ IDs, it is impossible to delete or modify records when using this method.
</para>
-
+
+ </sect1>
+
+ <sect1 id="file-ids">
+ <title>Indexing with File Record IDs</title>
+
+ <para>
+ If you have a set of files that regularly change over time: Old files
+ are deleted, new ones are added, or existing files are modified, you
+ can benefit from using the <emphasis>file ID</emphasis>
+ indexing methodology.
+ Examples of this type of database might include an index of WWW
+ resources, or a USENET news spool area.
+ Briefly speaking, the file key methodology uses the directory paths
+ of the individual records as a unique identifier for each record.
+ To perform indexing of a directory with file keys, again, you specify
+ the top-level directory after the <literal>update</literal> command.
+ The command will recursively traverse the directories and compare
+ each file with whatever has been indexed before in that same directory.
+ If a file is new (not in the previous version of the directory) it
+ is inserted into the registers; if a file was already indexed and
+ it has been modified since the last update, the index is also
+ modified; if a file has been removed since the last
+ visit, it is deleted from the index.
+ </para>
+
+ <para>
+ The resulting system is easy to administrate. To delete a record you
+ simply have to delete the corresponding file (say, with the
+ <literal>rm</literal> command). And to add records you create new
+ files (or directories with files). For your changes to take effect
+ in the register you must run <literal>zebraidx update</literal> with
+ the same directory root again. This mode of operation requires more
+ disk space than simpler indexing methods, but it makes it easier for
+ you to keep the index in sync with a frequently changing set of data.
+ If you combine this system with the <emphasis>safe update</emphasis>
+ facility (see below), you never have to take your server off-line for
+ maintenance or register updating purposes.
+ </para>
+
<para>
-
+ To enable indexing with pathname IDs, you must specify
+ <literal>file</literal> as the value of <literal>recordId</literal>
+ in the configuration file. In addition, you should set
+ <literal>storeKeys</literal> to <literal>1</literal>, since the &zebra;
+ indexer must save additional information about the contents of each record
+ in order to modify the indexes correctly at a later time.
+ </para>
+
+ <!--
+ FIXME - There must be a simpler way to do this with Adams string tags -H
+ -->
+
+ <para>
+ For example, to update records of group <literal>esdd</literal>
+ located below
+ <literal>/data1/records/</literal> you should type:
<screen>
- $ zebraidx update /d1/records
- $ zebraidx commit
+ $ zebraidx -g esdd update /data1/records
</screen>
-
</para>
-
+
+ <para>
+ The corresponding configuration file includes:
+ <screen>
+ esdd.recordId: file
+ esdd.recordType: grs.sgml
+ esdd.storeKeys: 1
+ </screen>
+ </para>
+
+ <note>
+ <para>You cannot start out with a group of records with simple
+ indexing (no record IDs, as in the previous section) and then later
+ enable file record IDs. &zebra; must know, from the first time that
+ you index the group, that
+ the files should be indexed with file record IDs.
+ </para>
+ </note>
+
+ <para>
+ You cannot explicitly delete records when using this method (using the
+ <literal>delete</literal> command of <literal>zebraidx</literal>). Instead
+ you have to delete the files from the file system (or move them to a
+ different location)
+ and then run <literal>zebraidx</literal> with the
+ <literal>update</literal> command.
+ </para>
+ <!-- ### what happens if a file contains multiple records? -->
+ </sect1>
+
+ <sect1 id="generic-ids">
+ <title>Indexing with General Record IDs</title>
+
+ <para>
+ When using this method you construct an (almost) arbitrary, internal
+ record key based on the contents of the record itself and other system
+ information. If you have a group of records that explicitly associates
+ an ID with each record, this method is convenient. For example, the
+ record format may contain a title or an ID number unique within the group.
+ In either case you specify the &acro.z3950; attribute set and use-attribute
+ location in which this information is stored, and the system looks at
+ that field to determine the identity of the record.
+ </para>
+
<para>
- Or you can execute multiple updates before committing the changes:
+ As before, the record ID is defined by the <literal>recordId</literal>
+ setting in the configuration file. The value of the record ID specification
+ consists of one or more tokens separated by whitespace. The resulting
+ ID is represented in the index by concatenating the tokens,
+ separated by the character with ASCII value 1.
</para>
-
+
<para>
-
+ There are three kinds of tokens:
+ <variablelist>
+
+ <varlistentry>
+ <term>Internal record info</term>
+ <listitem>
+ <para>
+ The token refers to a key that is
+ extracted from the record. The syntax of this token is
+ <literal>(</literal> <emphasis>set</emphasis> <literal>,</literal>
+ <emphasis>use</emphasis> <literal>)</literal>,
+ where <emphasis>set</emphasis> is the
+ attribute set name and <emphasis>use</emphasis> is the
+ name or value of the attribute.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>System variable</term>
+ <listitem>
+ <para>
+ The system variables are preceded by
+
+ <screen>
+ $
+ </screen>
+ and immediately followed by the system variable name, which
+ may be one of
+ <variablelist>
+
+ <varlistentry>
+ <term>group</term>
+ <listitem>
+ <para>
+ Group name.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>database</term>
+ <listitem>
+ <para>
+ Current database specified.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>type</term>
+ <listitem>
+ <para>
+ Record type.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>Constant string</term>
+ <listitem>
+ <para>
+ A string used as part of the ID — surrounded
+ by single or double quotes.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
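+
+ <para>
+ As a sketch combining all three kinds of tokens, a hypothetical
+ configuration might use a constant string, a system variable and
+ a field taken from the record itself (the group name and
+ attribute choice here are invented for illustration):
+ <screen>
+ myrecords.recordId: 'myprefix' $database (bib1,Title)
+ </screen>
+ </para>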
+
+ <para>
+ For instance, the sample GILS records that come with the &zebra;
+ distribution contain a unique ID in the data tagged Control-Identifier.
+ The data is mapped to the &acro.bib1; use attribute Identifier-standard
+ (code 1007). To use this field as a record id, specify
+ <literal>(bib1,Identifier-standard)</literal> as the value of the
+ <literal>recordId</literal> in the configuration file.
+ If you have other record types that use the same field for a
+ different purpose, you might add the record type
+ (or group or database name) to the record ID of the GILS
+ records as well, to prevent matches with other types of records.
+ In this case the recordId might be set like this:
+
<screen>
- $ zebraidx -g books update /d1/records /d2/more-records
- $ zebraidx -g fun update /d3/fun-records
- $ zebraidx commit
+ gils.recordId: $type (bib1,Identifier-standard)
</screen>
-
+
+ </para>
+
+ <para>
+ (see <xref linkend="grs"/>
+ for details of how the mapping between elements of your records and
+ searchable attributes is established).
+ </para>
+
+ <para>
+ As for the file record ID case described in the previous section,
+ updating your system is simply a matter of running
+ <literal>zebraidx</literal>
+ with the <literal>update</literal> command. However, the update with general
+ keys is considerably slower than with file record IDs, since all files
+ visited must be (re)read to discover their IDs.
+ </para>
+
+ <para>
+ As you might expect, when using the general record IDs
+ method, you can only add new records or modify existing ones with the
+ <literal>update</literal> command.
+ If you wish to delete records, you must use the
+ <literal>delete</literal> command, with a directory as a parameter.
+ This will remove all records that match the files below that root
+ directory.
</para>
-
+
+ </sect1>
+
+ <sect1 id="register-location">
+ <title>Register Location</title>
+
+ <para>
+ Normally, the index files that form dictionaries, inverted
+ files, record info, etc., are stored in the directory where you run
+ <literal>zebraidx</literal>. If you wish to store these, possibly large,
+ files somewhere else, you must add the <literal>register</literal>
+ entry to the <literal>zebra.cfg</literal> file.
+ Furthermore, the &zebra; system allows its file
+ structures to span multiple file systems, which is useful for
+ managing very large databases.
+ </para>
+
<para>
- If one of the update operations above had been interrupted, the commit
- operation on the last line would fail: <literal>zebraidx</literal>
- will not let you commit changes that would destroy the running register.
- You'll have to rerun all of the update operations since your last
- commit operation, before you can commit the new changes.
+ The value of the <literal>register</literal> setting is a sequence
+ of tokens. Each token takes the form:
+
+ <emphasis>dir</emphasis><literal>:</literal><emphasis>size</emphasis>
+
+ The <emphasis>dir</emphasis> specifies a directory in which index files
+ will be stored and the <emphasis>size</emphasis> specifies the maximum
+ size of all files in that directory. The &zebra; indexer system fills
+ each directory in the order specified and uses the next specified
+ directories as needed.
+ The <emphasis>size</emphasis> is an integer followed by a qualifier
+ code:
+ <literal>b</literal> for bytes,
+ <literal>k</literal> for kilobytes,
+ <literal>M</literal> for megabytes,
+ <literal>G</literal> for gigabytes.
+ Specifying a negative value disables the size check (the unit is still
+ required; use <literal>-1b</literal>).
</para>
-
+
<para>
- Similarly, if the commit operation fails, <literal>zebraidx</literal>
- will not let you start a new update operation before you have
- successfully repeated the commit operation.
- The server processes will keep accessing the shadow files rather
- than the (possibly damaged) blocks of the main register files
- until the commit operation has successfully completed.
+ For instance, suppose you have allocated three disks for your register:
+ the first, mounted on <literal>/d1</literal>, has 2 GB of free space;
+ the second, mounted on <literal>/d2</literal>, has 3.6 GB; and the third,
+ mounted on <literal>/d3</literal>, has more space than you care to
+ limit. You could then put this entry in your configuration file:
+
+ <screen>
+ register: /d1:2G /d2:3600M /d3:-1b
+ </screen>
</para>
-
+
<para>
- You should be aware that update operations may take slightly longer
- when the shadow register system is enabled, since more file access
- operations are involved. Further, while the disk space required for
- the shadow register data is modest for a small update operation, you
- may prefer to disable the system if you are adding a very large number
- of records to an already very large database (we use the terms
- <emphasis>large</emphasis> and <emphasis>modest</emphasis>
- very loosely here, since every application will have a
- different perception of size).
- To update the system without the use of the the shadow files,
- simply run <literal>zebraidx</literal> with the <literal>-n</literal>
- option (note that you do not have to execute the
- <emphasis>commit</emphasis> command of <literal>zebraidx</literal>
- when you temporarily disable the use of the shadow registers in
- this fashion.
- Note also that, just as when the shadow registers are not enabled,
- server processes will be barred from accessing the main register
- while the update procedure takes place.
+ Note that &zebra; does not verify that the amount of space specified is
+ actually available on the directory (file system) specified - it is
+ your responsibility to ensure that enough space is available, and that
+ other applications do not attempt to use the free space. In a large
+ production system, it is recommended that you allocate one or more
+ file systems exclusively to the &zebra; register files.
</para>
-
- </sect2>
-
- </sect1>
+ </sect1>
+
+ <sect1 id="shadow-registers">
+ <title>Safe Updating - Using Shadow Registers</title>
+
+ <sect2 id="shadow-registers-description">
+ <title>Description</title>
+
+ <para>
+ The &zebra; server supports <emphasis>updating</emphasis> of the index
+ structures. That is, you can add, modify, or remove records from
+ databases managed by &zebra; without rebuilding the entire index.
+ Since this process involves modifying structured files with various
+ references between blocks of data in the files, the update process
+ is inherently sensitive to system crashes, or to process interruptions:
+ Anything but a successfully completed update process will leave the
+ register files in an unknown state, and you will essentially have no
+ recourse but to re-index everything, or to restore the register files
+ from a backup medium.
+ Further, while the update process is active, users cannot be
+ allowed to access the system, as the contents of the register files
+ may change unpredictably.
+ </para>
+
+ <para>
+ You can solve these problems by enabling the shadow register system in
+ &zebra;.
+ During the updating procedure, <literal>zebraidx</literal> will temporarily
+ write changes to the involved files in a set of "shadow
+ files", without modifying the files that are accessed by the
+ active server processes. If the update procedure is interrupted by a
+ system crash or a signal, you simply repeat the procedure - the
+ register files have not been changed or damaged, and the partially
+ written shadow files are automatically deleted before the new updating
+ procedure commences.
+ </para>
+
+ <para>
+ At the end of the updating procedure (or in a separate operation, if
+ you so desire), the system enters a "commit mode". First,
+ any active server processes are forced to access those blocks that
+ have been changed from the shadow files rather than from the main
+ register files; the unmodified blocks are still accessed at their
+ normal location (the shadow files are not a complete copy of the
+ register files - they only contain those parts that have actually been
+ modified). If the commit process is interrupted at any point during the
+ commit process, the server processes will continue to access the
+ shadow files until you can repeat the commit procedure and complete
+ the writing of data to the main register files. You can perform
+ multiple update operations to the registers before you commit the
+ changes to the system files, or you can execute the commit operation
+ at the end of each update operation. When the commit phase has
+ completed successfully, any running server processes are instructed to
+ switch their operations to the new, operational register, and the
+ temporary shadow files are deleted.
+ </para>
+
+ </sect2>
+
+ <sect2 id="shadow-registers-how-to-use">
+ <title>How to Use Shadow Register Files</title>
+
+ <para>
+ The first step is to allocate space on your system for the shadow
+ files.
+ You do this by adding a <literal>shadow</literal> entry to the
+ <literal>zebra.cfg</literal> file.
+ The syntax of the <literal>shadow</literal> entry is exactly the
+ same as for the <literal>register</literal> entry
+ (see <xref linkend="register-location"/>).
+ The location of the shadow area should be
+ <emphasis>different</emphasis> from the location of the main register
+ area (if you have specified one - remember that if you provide no
+ <literal>register</literal> setting, the default register area is the
+ working directory of the server and indexing processes).
+ </para>
+
+ <para>
+ The following excerpt from a <literal>zebra.cfg</literal> file shows
+ one example of a setup that configures both the main register
+ location and the shadow file area.
+ Note that two directories or partitions have been set aside
+ for the shadow file area. You can specify any number of directories
+ for each of the file areas, but remember that there should be no
+ overlaps between the directories used for the main registers and the
+ shadow files, respectively.
+ </para>
+ <para>
+
+ <screen>
+ register: /d1:500M
+ shadow: /scratch1:100M /scratch2:200M
+ </screen>
+
+ </para>
+
+ <para>
+ When shadow files are enabled, an extra command is available at the
+ <literal>zebraidx</literal> command line.
+ In order to make changes to the system take effect for the
+ users, you'll have to submit a "commit" command after a
+ (sequence of) update operation(s).
+ </para>
+
+ <para>
+
+ <screen>
+ $ zebraidx update /d1/records
+ $ zebraidx commit
+ </screen>
+
+ </para>
+
+ <para>
+ Or you can execute multiple updates before committing the changes:
+ </para>
- <sect1 id="administration-ranking">
- <title>Relevance Ranking and Sorting of Result Sets</title>
+ <para>
- <sect2>
- <title>Overview</title>
- <para>
- The default ordering of a result set is left up to the server,
- which inside Zebra means sorting in ascending document ID order.
- This is not always the order humans want to browse the sometimes
- quite large hit sets. Ranking and sorting comes to the rescue.
- </para>
+ <screen>
+ $ zebraidx -g books update /d1/records /d2/more-records
+ $ zebraidx -g fun update /d3/fun-records
+ $ zebraidx commit
+ </screen>
- <para>
- In cases where a good presentation ordering can be computed at
- indexing time, we can use a fixed <literal>static ranking</literal>
- scheme, which is provided for the <literal>alvis</literal>
- indexing filter. This defines a fixed ordering of hit lists,
- independently of the query issued.
- </para>
+ </para>
- <para>
- There are cases, however, where relevance of hit set documents is
- highly dependent on the query processed.
- Simply put, <literal>dynamic relevance ranking</literal>
- sorts a set of retrieved
- records such
- that those most likely to be relevant to your request are
- retrieved first.
- Internally, Zebra retrieves all documents that satisfy your
- query, and re-orders the hit list to arrange them based on
- a measurement of similarity between your query and the content of
- each record.
- </para>
+ <para>
+ If one of the update operations above had been interrupted, the commit
+ operation on the last line would fail: <literal>zebraidx</literal>
+ will not let you commit changes that would destroy the running register.
+ You'll have to rerun all of the update operations since your last
+ commit operation, before you can commit the new changes.
+ </para>
- <para>
- Finally, there are situations where hit sets of documents should be
- <literal>sorted</literal> during query time according to the
- lexicographical ordering of certain sort indexes created at
- indexing time.
- </para>
- </sect2>
+ <para>
+ Similarly, if the commit operation fails, <literal>zebraidx</literal>
+ will not let you start a new update operation before you have
+ successfully repeated the commit operation.
+ The server processes will keep accessing the shadow files rather
+ than the (possibly damaged) blocks of the main register files
+ until the commit operation has successfully completed.
+ </para>
+ <para>
+ You should be aware that update operations may take slightly longer
+ when the shadow register system is enabled, since more file access
+ operations are involved. Further, while the disk space required for
+ the shadow register data is modest for a small update operation, you
+ may prefer to disable the system if you are adding a very large number
+ of records to an already very large database (we use the terms
+ <emphasis>large</emphasis> and <emphasis>modest</emphasis>
+ very loosely here, since every application will have a
+ different perception of size).
+     To update the system without the use of the shadow files,
+     simply run <literal>zebraidx</literal> with the <literal>-n</literal>
+     option (note that you do not have to execute the
+     <emphasis>commit</emphasis> command of <literal>zebraidx</literal>
+     when you temporarily disable the use of the shadow registers in
+     this fashion).
+ Note also that, just as when the shadow registers are not enabled,
+ server processes will be barred from accessing the main register
+ while the update procedure takes place.
+ </para>
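+    <para>
+     For example, a one-off update that bypasses the shadow registers
+     (assuming, as above, that the records live in
+     <filename>/d1/records</filename>) could look like this:
+    </para>
+    <para>
+     <screen>
+      $ zebraidx -n update /d1/records
+     </screen>
+    </para>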
- <sect2 id="administration-ranking-static">
- <title>Static Ranking</title>
-
- <para>
- Zebra uses internally inverted indexes to look up term occurencies
- in documents. Multiple queries from different indexes can be
- combined by the binary boolean operations <literal>AND</literal>,
- <literal>OR</literal> and/or <literal>NOT</literal> (which
- is in fact a binary <literal>AND NOT</literal> operation).
- To ensure fast query execution
- speed, all indexes have to be sorted in the same order.
- </para>
- <para>
- The indexes are normally sorted according to document
- <literal>ID</literal> in
- ascending order, and any query which does not invoke a special
- re-ranking function will therefore retrieve the result set in
- document
- <literal>ID</literal>
- order.
- </para>
- <para>
- If one defines the
- <screen>
- staticrank: 1
- </screen>
- directive in the main core Zebra config file, the internal document
- keys used for ordering are augmented by a preceeding integer, which
- contains the static rank of a given document, and the index lists
- are ordered
- first by ascending static rank,
- then by ascending document <literal>ID</literal>.
- Zero
- is the ``best'' rank, as it occurs at the
- beginning of the list; higher numbers represent worse scores.
- </para>
- <para>
- The experimental <literal>alvis</literal> filter provides a
- directive to fetch static rank information out of the indexed XML
- records, thus making <emphasis>all</emphasis> hit sets orderd
- after <emphasis>ascending</emphasis> static
- rank, and for those doc's which have the same static rank, ordered
- after <emphasis>ascending</emphasis> doc <literal>ID</literal>.
- See <xref linkend="record-model-alvisxslt"/> for the gory details.
- </para>
- </sect2>
+ </sect2>
+ </sect1>
- <sect2 id="administration-ranking-dynamic">
- <title>Dynamic Ranking</title>
- <para>
- In order to fiddle with the static rank order, it is necessary to
- invoke additional re-ranking/re-ordering using dynamic
- ranking or score functions. These functions return positive
- integer scores, where <emphasis>highest</emphasis> score is
- ``best'';
- hit sets are sorted according to
- <emphasis>decending</emphasis>
- scores (in contrary
- to the index lists which are sorted according to
- ascending rank number and document ID).
- </para>
- <para>
- Dynamic ranking is enabled by a directive like one of the
- following in the zebra config file (use only one of these a time!):
- <screen>
- rank: rank-1 # default TDF-IDF like
- rank: rank-static # dummy do-nothing
- </screen>
- Notice that the <literal>rank-1</literal> algorithm
- does not use the static rank
- information in the list keys, and will produce the same ordering
- with or without static ranking enabled.
- </para>
- <para>
- The dummy <literal>rank-static</literal> reranking/scoring
- function returns just
- <literal>score = max int - staticrank</literal>
- in order to preserve the static ordering of hit sets that would
- have been produced had it not been invoked.
- Obviously, to combine static and dynamic ranking usefully,
- it is necessary
- to make a new ranking
- function; this is left
- as an exercise for the reader.
- </para>
+ <sect1 id="administration-ranking">
+ <title>Relevance Ranking and Sorting of Result Sets</title>
- <para>
- Dynamic ranking is done at query time rather than
- indexing time (this is why we
- call it ``dynamic ranking'' in the first place ...)
- It is invoked by adding
- the Bib-1 relation attribute with
- value ``relevance'' to the PQF query (that is,
- <literal>@attr 2=102</literal>, see also
- <ulink url="ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt">
- The BIB-1 Attribute Set Semantics</ulink>).
- To find all articles with the word <literal>Eoraptor</literal> in
- the title, and present them relevance ranked, issue the PQF query:
- <screen>
- @attr 2=102 @attr 1=4 Eoraptor
- </screen>
- </para>
-
- <para>
- The default <literal>rank-1</literal> ranking module implements a
- TF-IDF (Term Frequecy over Inverse Document Frequency) like algorithm.
- </para>
+ <sect2 id="administration-overview">
+ <title>Overview</title>
+ <para>
+ The default ordering of a result set is left up to the server,
+ which inside &zebra; means sorting in ascending document ID order.
+ This is not always the order humans want to browse the sometimes
+ quite large hit sets. Ranking and sorting comes to the rescue.
+ </para>
- <warning>
- <para>
- Notice that <literal>dynamic ranking</literal> is not compatible
- with <literal>estimated hit sizes</literal>, as all documents in
- a hit set must be acessed to compute the correct placing in a
- ranking sorted list. Therefore the use attribute setting
- <literal>@attr 2=102</literal> clashes with
- <literal>@attr 9=integer</literal>.
- </para>
- </warning>
+ <para>
+ In cases where a good presentation ordering can be computed at
+ indexing time, we can use a fixed <literal>static ranking</literal>
+ scheme, which is provided for the <literal>alvis</literal>
+ indexing filter. This defines a fixed ordering of hit lists,
+ independently of the query issued.
+ </para>
- <para>
- It is possible to apply dynamic ranking on only parts of the PQF query:
- <screen>
- @and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer
- </screen>
- searches for all documents which have the term 'Utah' on the
- body of text, and which have the term 'Springer' in the publisher
- field, and sort them in the order of the relvance ranking made on
- the body-of-text index only.
- </para>
<para>
- Ranking weights may be used to pass a value to a ranking
- algorithm, using the non-standard BIB-1 attribute type 9.
- This allows one branch of a query to use one value while
- another branch uses a different one. For example, we can search
- for <literal>utah</literal> in the title index with weight 30, as
- well as in the ``any'' index with weight 20:
+ There are cases, however, where relevance of hit set documents is
+ highly dependent on the query processed.
+ Simply put, <literal>dynamic relevance ranking</literal>
+ sorts a set of retrieved records such that those most likely to be
+ relevant to your request are retrieved first.
+ Internally, &zebra; retrieves all documents that satisfy your
+ query, and re-orders the hit list to arrange them based on
+ a measurement of similarity between your query and the content of
+ each record.
+ </para>
+
+ <para>
+ Finally, there are situations where hit sets of documents should be
+ <literal>sorted</literal> during query time according to the
+ lexicographical ordering of certain sort indexes created at
+ indexing time.
+ </para>
+ </sect2>
+
+
+ <sect2 id="administration-ranking-static">
+ <title>Static Ranking</title>
+
+ <para>
+      &zebra; internally uses inverted indexes to look up term frequencies
+ in documents. Multiple queries from different indexes can be
+ combined by the binary boolean operations <literal>AND</literal>,
+ <literal>OR</literal> and/or <literal>NOT</literal> (which
+ is in fact a binary <literal>AND NOT</literal> operation).
+ To ensure fast query execution
+ speed, all indexes have to be sorted in the same order.
+ </para>
+ <para>
+ The indexes are normally sorted according to document
+ <literal>ID</literal> in
+ ascending order, and any query which does not invoke a special
+ re-ranking function will therefore retrieve the result set in
+ document
+ <literal>ID</literal>
+ order.
+ </para>
+ <para>
+ If one defines the
<screen>
- @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
+ staticrank: 1
</screen>
+      directive in the main &zebra; configuration file, the internal document
+ keys used for ordering are augmented by a preceding integer, which
+ contains the static rank of a given document, and the index lists
+ are ordered
+ first by ascending static rank,
+ then by ascending document <literal>ID</literal>.
+ Zero
+ is the ``best'' rank, as it occurs at the
+ beginning of the list; higher numbers represent worse scores.
</para>
- <warning>
- <para>
- The ranking-weight feature is experimental. It may change in future
- releases of zebra, and is not production mature.
- </para>
- </warning>
-
- <para>
- Notice that dynamic ranking can be enabled in sever side CQL
- query expansion by adding <literal>@attr 2=102</literal> to
- the CQL config file. For example
+ <para>
+ The experimental <literal>alvis</literal> filter provides a
+ directive to fetch static rank information out of the indexed &acro.xml;
+      records, thus making <emphasis>all</emphasis> hit sets ordered
+      by <emphasis>ascending</emphasis> static
+      rank, and, for those documents which have the same static rank,
+      by <emphasis>ascending</emphasis> document <literal>ID</literal>.
+ See <xref linkend="record-model-alvisxslt"/> for the gory details.
+ </para>
+ </sect2>
+
+
+ <sect2 id="administration-ranking-dynamic">
+ <title>Dynamic Ranking</title>
+ <para>
+ In order to fiddle with the static rank order, it is necessary to
+ invoke additional re-ranking/re-ordering using dynamic
+ ranking or score functions. These functions return positive
+ integer scores, where <emphasis>highest</emphasis> score is
+ ``best'';
+ hit sets are sorted according to <emphasis>descending</emphasis>
+      scores (contrary
+ to the index lists which are sorted according to
+ ascending rank number and document ID).
+ </para>
+ <para>
+ Dynamic ranking is enabled by a directive like one of the
+      following in the zebra configuration file (use only one of these at a time!):
<screen>
- relationModifier.relevant = 2=102
+      rank: rank-1        # default TF-IDF like
+ rank: rank-static # dummy do-nothing
</screen>
- invokes dynamic ranking each time a CQL query of the form
- <screen>
- Z> querytype cql
- Z> f alvis.text =/relevant house
- </screen>
- is issued. Dynamic ranking can also be automatically used on
- specific CQL indexes by (for example) setting
+ </para>
+
+ <para>
+ Dynamic ranking is done at query time rather than
+ indexing time (this is why we
+      call it ``dynamic ranking'' in the first place ...).
+ It is invoked by adding
+ the &acro.bib1; relation attribute with
+ value ``relevance'' to the &acro.pqf; query (that is,
+ <literal>@attr 2=102</literal>, see also
+ <ulink url="&url.z39.50;bib1.html">
+ The &acro.bib1; Attribute Set Semantics</ulink>, also in
+ <ulink url="&url.z39.50.attset.bib1;">HTML</ulink>).
+ To find all articles with the word <literal>Eoraptor</literal> in
+ the title, and present them relevance ranked, issue the &acro.pqf; query:
<screen>
- index.alvis.text = 1=text 2=102
+ @attr 2=102 @attr 1=4 Eoraptor
</screen>
- which then invokes dynamic ranking each time a CQL query of the form
- <screen>
- Z> querytype cql
- Z> f alvis.text = house
- </screen>
- is issued.
- </para>
+ </para>
+
+ <sect3 id="administration-ranking-dynamic-rank1">
+ <title>Dynamically ranking using &acro.pqf; queries with the 'rank-1'
+ algorithm</title>
- </sect2>
+ <para>
+ The default <literal>rank-1</literal> ranking module implements a
+       TF/IDF (Term Frequency over Inverse Document Frequency) like
+ algorithm. In contrast to the usual definition of TF/IDF
+ algorithms, which only considers searching in one full-text
+ index, this one works on multiple indexes at the same time.
+ More precisely,
+ &zebra; does boolean queries and searches in specific addressed
+ indexes (there are inverted indexes pointing from terms in the
+ dictionary to documents and term positions inside documents).
+ It works like this:
+ <variablelist>
+ <varlistentry>
+ <term>Query Components</term>
+ <listitem>
+ <para>
+ First, the boolean query is dismantled into its principal components,
+ i.e. atomic queries where one term is looked up in one index.
+ For example, the query
+ <screen>
+ @attr 2=102 @and @attr 1=1010 Utah @attr 1=1018 Springer
+ </screen>
+ is a boolean AND between the atomic parts
+ <screen>
+ @attr 2=102 @attr 1=1010 Utah
+ </screen>
+ and
+ <screen>
+ @attr 2=102 @attr 1=1018 Springer
+ </screen>
+          each of which is processed separately.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>Atomic hit lists</term>
+ <listitem>
+ <para>
+ Second, for each atomic query, the hit list of documents is
+ computed.
+ </para>
+ <para>
+ In this example, two hit lists for each index
+ <literal>@attr 1=1010</literal> and
+ <literal>@attr 1=1018</literal> are computed.
+ </para>
+ </listitem>
+ </varlistentry>
- <sect2 id="administration-ranking-sorting">
- <title>Sorting</title>
- <para>
- Zebra sorts efficiently using special sorting indexes
+ <varlistentry>
+ <term>Atomic scores</term>
+ <listitem>
+ <para>
+          Third, each document in the hit list is assigned a score
+          (<emphasis>if</emphasis> ranking
+ is enabled and requested in the query) using a TF/IDF scheme.
+ </para>
+ <para>
+ In this example, both atomic parts of the query assign the magic
+ <literal>@attr 2=102</literal> relevance attribute, and are
+ to be used in the relevance ranking functions.
+ </para>
+ <para>
+ It is possible to apply dynamic ranking on only parts of the
+ &acro.pqf; query:
+ <screen>
+ @and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer
+ </screen>
+           searches for all documents which have the term 'Utah' in the
+ body of text, and which have the term 'Springer' in the publisher
+ field, and sort them in the order of the relevance ranking made on
+ the body-of-text index only.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Hit list merging</term>
+ <listitem>
+ <para>
+ Fourth, the atomic hit lists are merged according to the boolean
+ conditions to a final hit list of documents to be returned.
+ </para>
+ <para>
+          This step is always performed, whether or not
+          dynamic ranking is enabled.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Document score computation</term>
+ <listitem>
+ <para>
+ Fifth, the total score of a document is computed as a linear
+          combination of the atomic scores of the atomic hit lists.
+ </para>
+ <para>
+ Ranking weights may be used to pass a value to a ranking
+ algorithm, using the non-standard &acro.bib1; attribute type 9.
+ This allows one branch of a query to use one value while
+ another branch uses a different one. For example, we can search
+ for <literal>utah</literal> in the
+ <literal>@attr 1=4</literal> index with weight 30, as
+ well as in the <literal>@attr 1=1010</literal> index with weight 20:
+ <screen>
+           @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 @attr 1=1010 utah
+ </screen>
+ </para>
+ <para>
+ The default weight is
+          sqrt(1000) ~ 34, as the &acro.z3950; standard prescribes that the top score
+ is 1000 and the bottom score is 0, encoded in integers.
+ </para>
+ <warning>
+ <para>
+ The ranking-weight feature is experimental. It may change in future
+ releases of zebra.
+ </para>
+ </warning>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Re-sorting of hit list</term>
+ <listitem>
+ <para>
+          Finally, the merged hit list is re-ordered according to scores.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ </para>
+
+
+ <para>
+ The <literal>rank-1</literal> algorithm
+ does not use the static rank
+ information in the list keys, and will produce the same ordering
+ with or without static ranking enabled.
+ </para>
+
+
+ <!--
+ <sect3 id="administration-ranking-dynamic-rank1">
+ <title>Dynamically ranking &acro.pqf; queries with the 'rank-static'
+ algorithm</title>
+ <para>
+ The dummy <literal>rank-static</literal> reranking/scoring
+ function returns just
+ <literal>score = max int - staticrank</literal>
+ in order to preserve the static ordering of hit sets that would
+ have been produced had it not been invoked.
+ Obviously, to combine static and dynamic ranking usefully,
+ it is necessary
+ to make a new ranking
+ function; this is left
+ as an exercise for the reader.
+ </para>
+ </sect3>
+ -->
+
+ <warning>
+ <para>
+ <literal>Dynamic ranking</literal> is not compatible
+ with <literal>estimated hit sizes</literal>, as all documents in
+ a hit set must be accessed to compute the correct placing in a
+ ranking sorted list. Therefore the use attribute setting
+ <literal>@attr 2=102</literal> clashes with
+ <literal>@attr 9=integer</literal>.
+ </para>
+ </warning>
+
+ <!--
+ we might want to add ranking like this:
+ UNPUBLISHED:
+ Simple BM25 Extension to Multiple Weighted Fields
+ Stephen Robertson, Hugo Zaragoza and Michael Taylor
+ Microsoft Research
+ ser@microsoft.com
+ hugoz@microsoft.com
+ mitaylor2microsoft.com
+ -->
+
+ </sect3>
+
+ <sect3 id="administration-ranking-dynamic-cql">
+ <title>Dynamically ranking &acro.cql; queries</title>
+ <para>
+       Dynamic ranking can be enabled during server side &acro.cql;
+ query expansion by adding <literal>@attr 2=102</literal>
+ chunks to the &acro.cql; config file. For example
+ <screen>
+ relationModifier.relevant = 2=102
+ </screen>
+ invokes dynamic ranking each time a &acro.cql; query of the form
+ <screen>
+ Z> querytype cql
+ Z> f alvis.text =/relevant house
+ </screen>
+ is issued. Dynamic ranking can also be automatically used on
+ specific &acro.cql; indexes by (for example) setting
+ <screen>
+ index.alvis.text = 1=text 2=102
+ </screen>
+ which then invokes dynamic ranking each time a &acro.cql; query of the form
+ <screen>
+ Z> querytype cql
+ Z> f alvis.text = house
+ </screen>
+ is issued.
+ </para>
+
+ </sect3>
+
+ </sect2>
+
+
+ <sect2 id="administration-ranking-sorting">
+ <title>Sorting</title>
+ <para>
+ &zebra; sorts efficiently using special sorting indexes
       (type=<literal>s</literal>), so each sortable index must be known
       at indexing time and specified in the configuration of record
- indexing. For example, to enable sorting according to the BIB-1
+ indexing. For example, to enable sorting according to the &acro.bib1;
<literal>Date/time-added-to-db</literal> field, one could add the line
<screen>
- xelm /*/@created Date/time-added-to-db:s
+ xelm /*/@created Date/time-added-to-db:s
</screen>
to any <literal>.abs</literal> record-indexing configuration file.
- Similarily, one could add an indexing element of the form
- <screen><![CDATA[
+ Similarly, one could add an indexing element of the form
+ <screen><![CDATA[
<z:index name="date-modified" type="s">
- <xsl:value-of select="some/xpath"/>
- </z:index>
+ <xsl:value-of select="some/xpath"/>
+ </z:index>
]]></screen>
to any <literal>alvis</literal>-filter indexing stylesheet.
- </para>
- <para>
- Indexing can be specified at searching time using a query term
- carrying the non-standard
- BIB-1 attribute-type <literal>7</literal>. This removes the
- need to send a Z39.50 <literal>Sort Request</literal>
- separately, and can dramatically improve latency when the client
- and server are on separate networks.
- The sorting part of the query is separate from the rest of the
- query - the actual search specification - and must be combined
- with it using OR.
- </para>
- <para>
- A sorting subquery needs two attributes: an index (such as a
- BIB-1 type-1 attribute) specifying which index to sort on, and a
- type-7 attribute whose value is be <literal>1</literal> for
- ascending sorting, or <literal>2</literal> for descending. The
- term associated with the sorting attribute is the priority of
- the sort key, where <literal>0</literal> specifies the primary
- sort key, <literal>1</literal> the secondary sort key, and so
- on.
- </para>
+ </para>
+ <para>
+ Indexing can be specified at searching time using a query term
+ carrying the non-standard
+ &acro.bib1; attribute-type <literal>7</literal>. This removes the
+ need to send a &acro.z3950; <literal>Sort Request</literal>
+ separately, and can dramatically improve latency when the client
+ and server are on separate networks.
+ The sorting part of the query is separate from the rest of the
+ query - the actual search specification - and must be combined
+ with it using OR.
+ </para>
+ <para>
+ A sorting subquery needs two attributes: an index (such as a
+ &acro.bib1; type-1 attribute) specifying which index to sort on, and a
+       type-7 attribute whose value is <literal>1</literal> for
+ ascending sorting, or <literal>2</literal> for descending. The
+ term associated with the sorting attribute is the priority of
+ the sort key, where <literal>0</literal> specifies the primary
+ sort key, <literal>1</literal> the secondary sort key, and so
+ on.
+ </para>
<para>For example, a search for water, sort by title (ascending),
- is expressed by the PQF query
+ is expressed by the &acro.pqf; query
<screen>
- @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
+ @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
</screen>
- whereas a search for water, sort by title ascending,
+ whereas a search for water, sort by title ascending,
then date descending would be
<screen>
- @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
+ @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
</screen>
</para>
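+     <para>
+      Such a query can be issued interactively from
+      <literal>yaz-client</literal> (the host and port here are
+      illustrative only):
+      <screen>
+       Z> open localhost:9999
+       Z> f @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
+      </screen>
+     </para>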
<para>
Notice the fundamental differences between <literal>dynamic
- ranking</literal> and <literal>sorting</literal>: there can be
+ ranking</literal> and <literal>sorting</literal>: there can be
only one ranking function defined and configured; but multiple
sorting indexes can be specified dynamically at search
time. Ranking does not need to use specific indexes, so
dynamic ranking can be enabled and disabled without
re-indexing; whereas, sorting indexes need to be
defined before indexing.
- </para>
+ </para>
- </sect2>
+ </sect2>
- </sect1>
+ </sect1>
- <sect1 id="administration-extended-services">
- <title>Extended Services: Remote Insert, Update and Delete</title>
-
- <para>
+ <sect1 id="administration-extended-services">
+ <title>Extended Services: Remote Insert, Update and Delete</title>
+
+ <note>
+ <para>
+ Extended services are only supported when accessing the &zebra;
+ server using the <ulink url="&url.z39.50;">&acro.z3950;</ulink>
+ protocol. The <ulink url="&url.sru;">&acro.sru;</ulink> protocol does
+ not support extended services.
+ </para>
+ </note>
+
+ <para>
The extended services are not enabled by default in zebra - due to the
- fact that they modify the system.
- In order to allow anybody to update, use
- <screen>
- perm.anonymous: rw
- </screen>
+ fact that they modify the system. &zebra; can be configured
+ to allow anybody to
+ search, and to allow only updates for a particular admin user
in the main zebra configuration file <filename>zebra.cfg</filename>.
- Or, even better, allow only updates for a particular admin user. For
- user <literal>admin</literal>, you could use:
+ For user <literal>admin</literal>, you could use:
<screen>
+ perm.anonymous: r
perm.admin: rw
passwd: passwordfile
</screen>
- And in <filename>passwordfile</filename>, specify users and
- passwords as colon seperated strings:
- <screen>
+ And in the password file
+ <filename>passwordfile</filename>, you have to specify users and
+       encrypted passwords as colon-separated strings.
+ Use a tool like <filename>htpasswd</filename>
+ to maintain the encrypted passwords.
+ <screen>
admin:secret
- </screen>
- </para>
- <para>
- We can now start a yaz-client admin session and create a database:
- <screen>
- <![CDATA[
- $ yaz-client localhost:9999 -u admin/secret
- Z> adm-create
- ]]>
- </screen>
- Now the <literal>Default</literal> database was created,
- we can insert an XML file (esdd0006.grs
- from example/gils/records) and index it:
- <screen>
- <![CDATA[
- Z> update insert 1 esdd0006.grs
- ]]>
- </screen>
- The 3rd parameter - <literal>1</literal> here -
- is the opaque record ID from <literal>Ext update</literal>.
- It a record ID that <emphasis>we</emphasis> assign to the record
- in question. If we do not
- assign one, the usual rules for match apply (recordId: from zebra.cfg).
- </para>
- <para>
- Actually, we should have a way to specify "no opaque record id" for
- yaz-client's update command.. We'll fix that.
- </para>
- <para>
- The newly inserted record can be searched as usual:
+ </screen>
+ It is essential to configure &zebra; to store records internally,
+ and to support
+ modifications and deletion of records:
<screen>
- <![CDATA[
- Z> f utah
- Sent searchRequest.
- Received SearchResponse.
- Search was a success.
- Number of hits: 1, setno 1
- SearchResult-1: term=utah cnt=1
- records returned: 0
- Elapsed: 0.014179
- ]]>
+ storeData: 1
+ storeKeys: 1
</screen>
- </para>
- <para>
- Let's delete the beast:
     The general record type should be set to any record filter which
     is able to parse &acro.xml; records; you may use either of the two
     declarations (but not both simultaneously!)
<screen>
- <![CDATA[
- Z> update delete 1
- No last record (update ignored)
- Z> update delete 1 esdd0006.grs
- Got extended services response
- Status: done
- Elapsed: 0.072441
- Z> f utah
- Sent searchRequest.
- Received SearchResponse.
- Search was a success.
- Number of hits: 0, setno 2
- SearchResult-1: term=utah cnt=0
- records returned: 0
- Elapsed: 0.013610
- ]]>
- </screen>
- </para>
- <para>
- If shadow register is enabled in your
- <filename>zebra.cfg</filename>,
- you must run the adm-commit command
+ recordType: dom.filter_dom_conf.xml
+ # recordType: grs.xml
+ </screen>
     Notice the difference from the specific instructions
<screen>
- <![CDATA[
- Z> adm-commit
- ]]>
+ recordType.xml: dom.filter_dom_conf.xml
+ # recordType.xml: grs.xml
</screen>
- after each update session in order write your changes from the
- shadow to the life register space.
+ which only work when indexing XML files from the filesystem using
+ the <literal>*.xml</literal> naming convention.
</para>
<para>
- Extended services are also available from the YAZ client layer. An
- example of an YAZ-PHP extended service transaction is given here:
+ To enable transaction safe shadow indexing,
+ which is extra important for this kind of operation, set
<screen>
- <![CDATA[
- $record = '<record><title>A fine specimen of a record</title></record>';
-
- $options = array('action' => 'recordInsert',
- 'syntax' => 'xml',
- 'record' => $record,
- 'databaseName' => 'mydatabase'
- );
-
- yaz_es($yaz, 'update', $options);
- yaz_es($yaz, 'commit', array());
- yaz_wait();
-
- if ($error = yaz_error($yaz))
- echo "$error";
- ]]>
- </screen>
- The <literal>action</literal> parameter can be any of
- <literal>recordInsert</literal> (will fail if the record already exists),
- <literal>recordReplace</literal> (will fail if the record does not exist),
- <literal>recordDelete</literal> (will fail if the record does not
- exist), and
- <literal>specialUpdate</literal> (will insert or update the record
- as needed).
- </para>
- <para>
- If a record is inserted
- using the action <literal>recordInsert</literal>
- one can specify the optional
- <literal>recordIdOpaque</literal> parameter, which is a
- client-supplied, opaque record identifier. This identifier will
- replace zebra's own automagic identifier generation.
- </para>
- <para>
- When using the action <literal>recordReplace</literal> or
- <literal>recordDelete</literal>, one must specify the additional
- <literal>recordIdNumber</literal> parameter, which must be an
- existing Zebra internal system ID number. When retrieving existing
- records, the ID number is returned in the field
- <literal>/*/id:idzebra/localnumber</literal> in the namespace
- <literal>xmlns:id="http://www.indexdata.dk/zebra/"</literal>,
- where it can be picked up for later record updates or deletes.
+ shadow: directoryname: size (e.g. 1000M)
+ </screen>
+ See <xref linkend="zebra-cfg"/> for additional information on
+ these configuration options.
</para>
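+    <para>
+     For instance, with a scratch partition set aside for the shadow
+     area (the path and size below are illustrative only), the entry
+     could read:
+     <screen>
+      shadow: /scratch1:1000M
+     </screen>
+    </para>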
- </sect1>
+ <note>
+ <para>
+ It is not possible to carry information about record types or
+ similar to &zebra; when using extended services, due to
+ limitations of the <ulink url="&url.z39.50;">&acro.z3950;</ulink>
+      protocol. Therefore, indexing filters cannot be chosen on a
+ per-record basis. One and only one general &acro.xml; indexing filter
+ must be defined.
+ <!-- but because it is represented as an OID, we would need some
+ form of proprietary mapping scheme between record type strings and
+ OIDs. -->
+ <!--
+ However, as a minimum, it would be extremely useful to enable
+ people to use &acro.marc21;, assuming grs.marcxml.marc21 as a record
+ type.
+ -->
+ </para>
+ </note>
- <sect1 id="gfs-config">
- <title>YAZ Frontend Virtual Hosts</title>
+ <sect2 id="administration-extended-services-z3950">
+ <title>Extended services in the &acro.z3950; protocol</title>
+
<para>
- <command>zebrasrv</command> uses the YAZ server frontend and does
- support multiple virtual servers behind multiple listening sockets.
+ The <ulink url="&url.z39.50;">&acro.z3950;</ulink> standard allows
+ servers to accept special binary <emphasis>extended services</emphasis>
+ protocol packages, which may be used to insert, update and delete
+     records on a server. These packages carry control and update
+     information, encoded in seven package fields:
</para>
- &zebrasrv-virtual;
-
- <para>
- Section "Virtual Hosts" in the YAZ manual.
- <filename>http://www.indexdata.dk/yaz/doc/server.vhosts.tkl</filename>
- </para>
- </sect1>
+ <table id="administration-extended-services-z3950-table" frame="top">
+ <title>Extended services &acro.z3950; Package Fields</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Parameter</entry>
+ <entry>Value</entry>
+ <entry>Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><literal>type</literal></entry>
+ <entry><literal>'update'</literal></entry>
+ <entry>Must be set to trigger extended services</entry>
+ </row>
+ <row>
+ <entry><literal>action</literal></entry>
+ <entry><literal>string</literal></entry>
+ <entry>
+ Extended service action type with
+ one of four possible values: <literal>recordInsert</literal>,
+ <literal>recordReplace</literal>,
+ <literal>recordDelete</literal>,
+ and <literal>specialUpdate</literal>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>record</literal></entry>
+ <entry><literal>&acro.xml; string</literal></entry>
+ <entry>An &acro.xml; formatted string containing the record</entry>
+ </row>
+ <row>
+ <entry><literal>syntax</literal></entry>
+ <entry><literal>'xml'</literal></entry>
+	  <entry>May be XML, SUTRS or MARC; GRS-1 is not supported.
+	   The default filter (record type) given by <literal>recordType</literal> in
+	   <filename>zebra.cfg</filename> is used to parse the record.</entry>
+ </row>
+ <row>
+ <entry><literal>recordIdOpaque</literal></entry>
+ <entry><literal>string</literal></entry>
+ <entry>
+ Optional client-supplied, opaque record
+ identifier used under insert operations.
+ </entry>
+ </row>
+ <row>
+	  <entry><literal>recordIdNumber</literal></entry>
+ <entry><literal>positive number</literal></entry>
+ <entry>&zebra;'s internal system number,
+ not allowed for <literal>recordInsert</literal> or
+ <literal>specialUpdate</literal> actions which result in fresh
+ record inserts.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>databaseName</literal></entry>
+ <entry><literal>database identifier</literal></entry>
+ <entry>
+ The name of the database to which the extended services should be
+ applied.
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
- <sect1 id="administration-cql-to-pqf">
- <title>Server Side CQL to PQF Query Translation</title>
- <para>
- Using the
- <literal><cql2rpn>l2rpn.txt</cql2rpn></literal>
- YAZ Frontend Virtual
- Hosts option, one can configure
- the YAZ Frontend CQL-to-PQF
- converter, specifying the interpretation of various
- <ulink url="http://www.loc.gov/standards/sru/cql/">CQL</ulink>
- indexes, relations, etc. in terms of Type-1 query attributes.
- <!-- The yaz-client config file -->
- </para>
- <para>
- For example, using server-side CQL-to-PQF conversion, one might
- query a zebra server like this:
- <screen>
- <![CDATA[
- yaz-client localhost:9999
- Z> querytype cql
- Z> find text=(plant and soil)
- ]]>
- </screen>
- and - if properly configured - even static relevance ranking can
- be performed using CQL query syntax:
- <screen>
- <![CDATA[
- Z> find text = /relevant (plant and soil)
- ]]>
+
+ <para>
+ The <literal>action</literal> parameter can be any of
+ <literal>recordInsert</literal> (will fail if the record already exists),
+ <literal>recordReplace</literal> (will fail if the record does not exist),
+ <literal>recordDelete</literal> (will fail if the record does not
+ exist), and
+     <literal>specialUpdate</literal> (will insert or update the record
+     as needed; record deletion is not possible).
+ </para>
+
+ <para>
+ During all actions, the
+ usual rules for internal record ID generation apply, unless an
+ optional <literal>recordIdNumber</literal> &zebra; internal ID or a
+ <literal>recordIdOpaque</literal> string identifier is assigned.
+     The default ID generation is
+     configured using the <literal>recordId:</literal> directive in
+     <filename>zebra.cfg</filename>.
+ See <xref linkend="zebra-cfg"/>.
+ </para>
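+    <para>
+     As a sketch, such a <literal>recordId</literal> directive in
+     <filename>zebra.cfg</filename> might look like this (the attribute
+     set and use attribute shown are examples only):
+     <screen>
+      recordId: (bib1,Identifier-standard)
+     </screen>
+    </para>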
+
+ <para>
+ Setting of the <literal>recordIdNumber</literal> parameter,
+ which must be an existing &zebra; internal system ID number, is not
+ allowed during any <literal>recordInsert</literal> or
+ <literal>specialUpdate</literal> action resulting in fresh record
+ inserts.
+ </para>
+
+ <para>
+ When retrieving existing
+ records indexed with &acro.grs1; indexing filters, the &zebra; internal
+ ID number is returned in the field
+ <literal>/*/id:idzebra/localnumber</literal> in the namespace
+ <literal>xmlns:id="http://www.indexdata.dk/zebra/"</literal>,
+ where it can be picked up for later record updates or deletes.
+ </para>
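+    <para>
+     As a sketch, a retrieved record might then carry the internal ID
+     like this (the surrounding element structure depends on your
+     indexing filter, and the number shown is made up):
+     <screen>
+      <![CDATA[
+      <gils xmlns:id="http://www.indexdata.dk/zebra/">
+        ...
+        <id:idzebra>
+          <localnumber>17</localnumber>
+        </id:idzebra>
+      </gils>
+      ]]>
+     </screen>
+    </para>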
+
+ <para>
+ A new element set for retrieval of internal record
+ data has been added, which can be used to access minimal records
+ containing only the <literal>recordIdNumber</literal> &zebra;
+ internal ID, or the <literal>recordIdOpaque</literal> string
+ identifier. This works for any indexing filter used.
+ See <xref linkend="special-retrieval"/>.
+ </para>
+
+ <para>
+     The <literal>recordIdOpaque</literal> string parameter
+     is a client-supplied, opaque record
+     identifier, which may be used under
+     insert, update and delete operations. The
+     client software is responsible for assigning these to
+     records. This identifier
+     replaces &zebra;'s own automatic identifier generation with a unique
+     mapping from <literal>recordIdOpaque</literal> to the
+     &zebra; internal <literal>recordIdNumber</literal>.
+     <emphasis>The opaque <literal>recordIdOpaque</literal> string
+     identifiers
+     are neither visible in retrieved records nor
+     searchable, so the value of this parameter is
+     questionable. It serves mostly as a convenient mapping from
+     application-domain string identifiers to &zebra; internal IDs.
+ </emphasis>
+ </para>
+ </sect2>
+
+
+ <sect2 id="administration-extended-services-yaz-client">
+ <title>Extended services from yaz-client</title>
+
+ <para>
+ We can now start a yaz-client admin session and create a database:
+ <screen>
+ <![CDATA[
+ $ yaz-client localhost:9999 -u admin/secret
+ Z> adm-create
+ ]]>
</screen>
- </para>
+     Now that the <literal>Default</literal> database has been created,
+     we can insert an &acro.xml; file (esdd0006.grs
+     from example/gils/records) and index it:
+ <screen>
+ <![CDATA[
+ Z> update insert id1234 esdd0006.grs
+ ]]>
+ </screen>
+     The third parameter, <literal>id1234</literal> here,
+     is the <literal>recordIdOpaque</literal> package field.
+ </para>
+ <para>
+     There is currently no way to specify "no opaque record ID" with
+     yaz-client's update command; this is a known limitation.
+ </para>
+ <para>
+ The newly inserted record can be searched as usual:
+ <screen>
+ <![CDATA[
+ Z> f utah
+ Sent searchRequest.
+ Received SearchResponse.
+ Search was a success.
+ Number of hits: 1, setno 1
+ SearchResult-1: term=utah cnt=1
+ records returned: 0
+ Elapsed: 0.014179
+ ]]>
+ </screen>
+ </para>
+ <para>
+ Let's delete the beast, using the same
+ <literal>recordIdOpaque</literal> string parameter:
+ <screen>
+ <![CDATA[
+ Z> update delete id1234
+ No last record (update ignored)
+ Z> update delete 1 esdd0006.grs
+ Got extended services response
+ Status: done
+ Elapsed: 0.072441
+ Z> f utah
+ Sent searchRequest.
+ Received SearchResponse.
+ Search was a success.
+ Number of hits: 0, setno 2
+ SearchResult-1: term=utah cnt=0
+ records returned: 0
+ Elapsed: 0.013610
+ ]]>
+ </screen>
+ </para>
+ <para>
+     If the shadow register is enabled in your
+     <filename>zebra.cfg</filename>,
+     you must run the <literal>adm-commit</literal> command
+ <screen>
+ <![CDATA[
+ Z> adm-commit
+ ]]>
+ </screen>
+     after each update session in order to write your changes from the
+     shadow to the live register space.
+ </para>
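+    <para>
+     For reference, enabling the shadow register is a one-line setting in
+     <filename>zebra.cfg</filename>; the directory name and size below
+     are examples only:
+     <screen>
+      shadow: ./shadow:1000M
+     </screen>
+    </para>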
+ </sect2>
- <para>
- By the way, the same configuration can be used to
- search using client-side CQL-to-PQF conversion:
- (the only difference is <literal>querytype cql2rpn</literal>
- instead of
- <literal>querytype cql</literal>, and the call specifying a local
- conversion file)
- <screen>
- <![CDATA[
- yaz-client -q local/cql2pqf.txt localhost:9999
- Z> querytype cql2rpn
- Z> find text=(plant and soil)
- ]]>
+
+ <sect2 id="administration-extended-services-yaz-php">
+ <title>Extended services from yaz-php</title>
+
+ <para>
+     Extended services are also available from the &yaz; &acro.php; client
+     layer. An example of a &yaz;-&acro.php; extended service transaction
+     is given here:
+ <screen>
+ <![CDATA[
+ $record = '<record><title>A fine specimen of a record</title></record>';
+
+ $options = array('action' => 'recordInsert',
+ 'syntax' => 'xml',
+ 'record' => $record,
+ 'databaseName' => 'mydatabase'
+ );
+
+ yaz_es($yaz, 'update', $options);
+ yaz_es($yaz, 'commit', array());
+ yaz_wait();
+
+ if ($error = yaz_error($yaz))
+ echo "$error";
+ ]]>
</screen>
- </para>
+ </para>
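+    <para>
+     A corresponding delete transaction can be sketched along the same
+     lines. This assumes that the &yaz; &acro.php; layer accepts the
+     <literal>recordIdOpaque</literal> option, and that the record was
+     previously inserted with the opaque identifier
+     <literal>id1234</literal>:
+     <screen>
+      <![CDATA[
+      $options = array('action' => 'recordDelete',
+                       'syntax' => 'xml',
+                       'record' => $record,
+                       'recordIdOpaque' => 'id1234',
+                       'databaseName' => 'mydatabase'
+                      );
+
+      yaz_es($yaz, 'update', $options);
+      yaz_es($yaz, 'commit', array());
+      yaz_wait();
+
+      if ($error = yaz_error($yaz))
+        echo "$error";
+      ]]>
+     </screen>
+    </para>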
+ </sect2>
- <para>
- Exhaustive information can be found in the
- Section "Specification of CQL to RPN mappings" in the YAZ manual.
- <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
- http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
- and shall therefore not be repeated here.
- </para>
- <!--
- <para>
- See
- <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
- http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
- for the Maintenance Agency's work-in-progress mapping of Dublin Core
- indexes to Attribute Architecture (util, XD and BIB-2)
- attributes.
- </para>
- -->
- </sect1>
+ <sect2 id="administration-extended-services-debugging">
+ <title>Extended services debugging guide</title>
+ <para>
+ When debugging ES over PHP we recommend the following order of tests:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+       Make sure you have a suitable record on your filesystem, which you
+       can index from the filesystem using the zebraidx command.
+       Do it exactly as you planned, using one of the GRS-1 filters,
+       or the DOMXML filter.
+ When this works, proceed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+       Check that your server setup is OK before you write a single line
+       of PHP using extended services.
+       Take the same record from the file system, and send it as an
+       extended service via
+       <literal>yaz-client</literal> as described in
+       <xref linkend="administration-extended-services-yaz-client"/>,
+       and
+       remember the <literal>-a</literal> option, which shows you what
+       goes over the wire! Notice also the section on permissions:
+ try
+ <screen>
+ perm.anonymous: rw
+ </screen>
+       in <filename>zebra.cfg</filename> to make sure you do not run into
+       permission problems (but never expose such an insecure setup on the
+       internet!). Then, make sure to set the general
+       <literal>recordType</literal> instruction, pointing correctly
+       to the GRS-1 filters,
+       or the DOMXML filter.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ If you insist on using the <literal>sysno</literal> in the
+ <literal>recordIdNumber</literal> setting,
+       please make sure you only do updates and deletes. &zebra;'s internal
+       system number is not allowed for
+ <literal>recordInsert</literal> or
+ <literal>specialUpdate</literal> actions
+ which result in fresh record inserts.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+       If the shadow register is enabled in your
+       <filename>zebra.cfg</filename>, you must remember to run the
+ <screen>
+ Z> adm-commit
+ </screen>
+ command as well.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ If this works, then proceed to do the same thing in your PHP script.
+ </para>
+ </listitem>
+ </itemizedlist>
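+    <para>
+     The first step above might, for instance, look like this (the paths
+     are examples only, and the commit is only needed when the shadow
+     register is enabled):
+     <screen>
+      $ zebraidx update records/esdd0006.grs
+      $ zebraidx commit
+     </screen>
+    </para>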
+
+
+ </sect2>
+ </sect1>
-
-</chapter>
+ </chapter>
<!-- Keep this comment at the end of the file
Local variables:
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
- sgml-parent-document: "zebra.xml"
+ sgml-parent-document: "idzebra.xml"
sgml-local-catalogs: nil
sgml-namecase-general:t
End: