+ <varlistentry>
+ <term>
+ <emphasis>group</emphasis>
+ .recordType[<emphasis>.name</emphasis>]:
+ <replaceable>type</replaceable>
+ </term>
+ <listitem>
+ <para>
+ Specifies how records with the file extension
+ <emphasis>name</emphasis> should be handled by the indexer.
+ This option may also be specified as a command line option
+ (<literal>-t</literal>). Note that if you do not specify a
+ <emphasis>name</emphasis>, the setting applies to all files.
+ In general, the record type specifier consists of the following
+ elements, separated by dots: <emphasis>fundamental-type</emphasis>,
+ <emphasis>file-read-type</emphasis> and arguments. Currently, two
+ fundamental types exist, <literal>text</literal> and
+ <literal>grs</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>group</emphasis>.recordId:
+ <replaceable>record-id-spec</replaceable></term>
+ <listitem>
+ <para>
+ Specifies how the records are to be identified when updated. See
+ <xref linkend="locating-records"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>group</emphasis>.database:
+ <replaceable>database</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the &acro.z3950; database name.
+ <!-- FIXME - now we can have multiple databases in one server. -H -->
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>group</emphasis>.storeKeys:
+ <replaceable>boolean</replaceable></term>
+ <listitem>
+ <para>
+ Specifies whether key information should be saved for a given
+ group of records. If you plan to update or delete records of this
+ type later, this should be specified as 1; otherwise it
+ should be 0 (default), to save register space.
+ <!-- ### this is the first mention of "register" -->
+ See <xref linkend="file-ids"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>group</emphasis>.storeData:
+ <replaceable>boolean</replaceable></term>
+ <listitem>
+ <para>
+ Specifies whether the records should be stored internally
+ in the &zebra; system files.
+ If you want to maintain the raw records yourself,
+ this option should be false (0).
+ If you want &zebra; to take care of the records for you, it
+ should be true (1).
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <!-- ### probably a better place to define "register" -->
+ <term>register: <replaceable>register-location</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the location of the various register files that &zebra; uses
+ to represent your databases.
+ See <xref linkend="register-location"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>shadow: <replaceable>register-location</replaceable></term>
+ <listitem>
+ <para>
+ Enables the <emphasis>safe update</emphasis> facility of &zebra;, and
+ tells the system where to place the required, temporary files.
+ See <xref linkend="shadow-registers"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lockDir: <replaceable>directory</replaceable></term>
+ <listitem>
+ <para>
+ Directory in which various lock files are stored.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>keyTmpDir: <replaceable>directory</replaceable></term>
+ <listitem>
+ <para>
+ Directory in which temporary files used during zebraidx's update
+ phase are stored.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>setTmpDir: <replaceable>directory</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the directory that the server uses for temporary result sets.
+ If not specified, <literal>/tmp</literal> will be used.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>profilePath: <replaceable>path</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a path of profile specification files.
+ The path is composed of one or more directories separated by
+ colons, similar to <literal>PATH</literal> on UNIX systems.
+ </para>
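+ <para>
+ As a sketch, a path with two directories could be written as follows.
+ The second directory, <filename>/home/zebra/tab</filename>, is a
+ hypothetical location used only for illustration:
+ <screen>
+ profilePath: /usr/local/idzebra/tab:/home/zebra/tab
+ </screen>
+ </para>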
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>modulePath: <replaceable>path</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a path of record filter modules.
+ The path is composed of one or more directories separated by
+ colons, similar to <literal>PATH</literal> on UNIX systems.
+ The 'make install' procedure typically puts modules in
+ <filename>/usr/local/lib/idzebra-2.0/modules</filename>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>index: <replaceable>filename</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the file that holds the field structure
+ definitions. If omitted, the file <filename>default.idx</filename>
+ is read.
+ Refer to <xref linkend="default-idx-file"/> for
+ more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>sortmax: <replaceable>integer</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the maximum number of records that will be sorted
+ in a result set. If the result set contains more than
+ <replaceable>integer</replaceable> records, records after the
+ limit will not be sorted. If omitted, the default value is
+ 1,000.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>staticrank: <replaceable>integer</replaceable></term>
+ <listitem>
+ <para>
+ Specifies whether static ranking is enabled (1) or
+ disabled (0). If omitted, it is disabled, corresponding
+ to a value of 0.
+ Refer to <xref linkend="administration-ranking-static"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+
+ <varlistentry>
+ <term>estimatehits: <replaceable>integer</replaceable></term>
+ <listitem>
+ <para>
+ Controls whether &zebra; calculates approximate hit counts and
+ the hit count above which approximation is used.
+ A value of 0 disables approximate hit counts.
+ For a positive value, approximate hit counts are used when the hit
+ count is known to be larger than <replaceable>integer</replaceable>.
+ </para>
+ <para>
+ Approximate hit counts can also be triggered by a particular
+ attribute in a query.
+ Refer to <xref linkend="querymodel-zebra-global-attr-limit"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>attset: <replaceable>filename</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the filename(s) of attribute set files for use in
+ searching. In many configurations <filename>bib1.att</filename>
+ is used, but that is not required. If Classic Explain
+ attributes are to be used for searching,
+ <filename>explain.att</filename> must be given.
+ The path to attribute set files can be given using the
+ <literal>profilePath</literal> setting.
+ See also <xref linkend="attset-files"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>memMax: <replaceable>size</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the amount of internal memory,
+ <replaceable>size</replaceable>, to use for the zebraidx program.
+ The amount is given in megabytes; the default is 4 (4 MB).
+ The more memory, the faster large updates happen, up to about
+ half the free memory available on the computer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>tempfiles: <replaceable>Yes/Auto/No</replaceable></term>
+ <listitem>
+ <para>
+ Tells &zebra; whether it should use temporary files when indexing. The
+ default is Auto, in which case &zebra; uses temporary files only
+ if it would need more than <replaceable>memMax</replaceable>
+ megabytes of memory. This should be good for most uses.
+ </para>
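+ <para>
+ As an illustration, the two settings might be combined as shown
+ here. The value 64 is an arbitrary example, not a recommendation:
+ <screen>
+ memMax: 64
+ tempfiles: Auto
+ </screen>
+ </para>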
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>root: <replaceable>dir</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a directory base for &zebra;. All relative paths
+ given (in profilePath, register, shadow) are based on this
+ directory. This setting is useful if your &zebra; server
+ is running in a different directory from where
+ <literal>zebra.cfg</literal> is located.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>passwd: <replaceable>file</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a file with description of user accounts for &zebra;.
+ The format is similar to that of Apache's htpasswd files
+ and UNIX passwd files. Non-empty lines not beginning with
+ # are considered account lines. There is one account per line.
+ A line consists of fields separated by a single colon character.
+ The first field is the username, the second is the password.
+ </para>
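+ <para>
+ A minimal sketch of such a file, assuming two hypothetical accounts
+ (<literal>admin</literal> and <literal>guest</literal>) with
+ placeholder passwords:
+ <screen>
+ # lines beginning with a hash sign are not account lines
+ admin:secret
+ guest:guest
+ </screen>
+ </para>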
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>passwd.c: <replaceable>file</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a file with description of user accounts for &zebra;.
+ The file format is similar to that used by the passwd directive except
+ that the passwords are encrypted. Use Apache's htpasswd or similar
+ for maintenance.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>perm.<replaceable>user</replaceable>:
+ <replaceable>permstring</replaceable></term>
+ <listitem>
+ <para>
+ Specifies permissions (privileges) for a user that is allowed
+ to access &zebra; via the passwd system. There are currently two
+ kinds of permissions: read (r) and write (w). By default,
+ users not listed in a permission directive are given the read
+ privilege. To specify permissions for a user with no
+ username, or &acro.z3950; anonymous style, use
+ <literal>anonymous</literal>. The permstring consists of
+ a sequence of characters. Include character <literal>w</literal>
+ for write/update access, <literal>r</literal> for read access and
+ <literal>a</literal> to allow anonymous access through this account.
+ </para>
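+ <para>
+ For example, assuming a hypothetical account named
+ <literal>admin</literal>, the following sketch would give anonymous
+ users read access and give <literal>admin</literal> both read and
+ write access:
+ <screen>
+ perm.anonymous: r
+ perm.admin: rw
+ </screen>
+ </para>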
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>dbaccess: <replaceable>accessfile</replaceable></term>
+ <listitem>
+ <para>
+ Names a file which lists database subscriptions for individual users.
+ The access file should consist of lines of the form
+ <literal>username: dbnames</literal>, where dbnames is a list of
+ database names, separated by '+'. No whitespace is allowed in the
+ database list.
+ </para>
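+ <para>
+ A sketch of an access file, using hypothetical user and database
+ names, could look like this:
+ <screen>
+ alice: books+journals
+ bob: books
+ </screen>
+ </para>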
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>encoding: <replaceable>charsetname</replaceable></term>
+ <listitem>
+ <para>
+ Tells &zebra; to interpret the terms in Z39.50 queries as
+ having been encoded using the specified character
+ encoding. The default is <literal>ISO-8859-1</literal>; one
+ useful alternative is <literal>UTF-8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>storeKeys: <replaceable>value</replaceable></term>
+ <listitem>
+ <para>
+ Specifies whether &zebra; keeps a copy of indexed keys.
+ Use a value of 1 to enable; 0 to disable. If the storeKeys setting is
+ omitted, it is enabled. storeKeys must be enabled
+ for updating and deleting records. Disable storeKeys
+ only if you want to save space and plan to index data just once.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>storeData: <replaceable>value</replaceable></term>
+ <listitem>
+ <para>
+ Specifies whether &zebra; keeps a copy of indexed records.
+ Use a value of 1 to enable; 0 to disable. If the storeData setting is
+ omitted, it is enabled. A storeData setting of 0 (disabled) makes
+ &zebra; fetch records from the original location in the file
+ system using filename, file offset and file length. For the
+ DOM and ALVIS filters, the storeData setting is ignored.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+
+ </sect1>
+
+ <sect1 id="locating-records">
+ <title>Locating Records</title>
+
+ <para>
+ The default behavior of the &zebra; system is to reference the
+ records from their original location, i.e. where they were found when you
+ ran <literal>zebraidx</literal>.
+ That is, when a client wishes to retrieve a record
+ following a search operation, the files are accessed from the place
+ where you originally put them. If you remove the files (without
+ running <literal>zebraidx</literal> again), the server will return
+ diagnostic number 14 (``System error in presenting records'') to
+ the client.
+ </para>
+
+ <para>
+ If your input files are not permanent - for example if you retrieve
+ your records from an outside source, or if they were temporarily
+ mounted on a CD-ROM drive,
+ you may want &zebra; to make an internal copy of them. To do this,
+ you specify 1 (true) in the <literal>storeData</literal> setting. When
+ the &acro.z3950; server retrieves the records they will be read from the
+ internal file structures of the system.
+ </para>
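+ <para>
+ For instance, using the group-level form of the setting and a
+ hypothetical group called <literal>remote</literal>, the records of
+ that group could be copied into the system with a line such as:
+ <screen>
+ remote.storeData: 1
+ </screen>
+ </para>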
+
+ </sect1>
+
+ <sect1 id="simple-indexing">
+ <title>Indexing with no Record IDs (Simple Indexing)</title>
+
+ <para>
+ If you have a set of records that are not expected to change over time
+ you can build your database without record IDs.
+ This indexing method uses less space than the other methods and
+ is simple to use.
+ </para>
+
+ <para>
+ To use this method, you simply omit the <literal>recordId</literal> entry
+ for the group of files that you index. To add a set of records you use
+ <literal>zebraidx</literal> with the <literal>update</literal> command. The
+ <literal>update</literal> command will always add all of the records that it
+ encounters to the index - whether they have already been indexed or
+ not. If the set of indexed files changes, you should delete all of the
+ index files, and build a new index from scratch.
+ </para>
+
+ <para>
+ Consider a system in which you have a group of text files called
+ <literal>simple</literal>.
+ That group of records should belong to a &acro.z3950; database called
+ <literal>textbase</literal>.
+ The following <literal>zebra.cfg</literal> file will suffice:
+ </para>
+ <para>
+
+ <screen>
+ profilePath: /usr/local/idzebra/tab
+ attset: bib1.att
+ simple.recordType: text
+ simple.database: textbase
+ </screen>
+
+ </para>
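+ <para>
+ With this configuration in place, indexing might be started along
+ these lines. This is a sketch: the directory
+ <filename>/data/simple</filename> is a hypothetical location for the
+ text files, and the <literal>-g</literal> option is assumed to select
+ the group whose settings apply:
+ <screen>
+ $ zebraidx -g simple update /data/simple
+ </screen>
+ </para>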
+
+ <para>
+ Since the existing records in an index cannot be addressed by their
+ IDs, it is impossible to delete or modify records when using this method.
+ </para>
+
+ </sect1>
+
+ <sect1 id="file-ids">
+ <title>Indexing with File Record IDs</title>
+
+ <para>
+ If you have a set of files that regularly changes over time (old files
+ are deleted, new ones are added, or existing files are modified), you
+ can benefit from using the <emphasis>file ID</emphasis>
+ indexing methodology.
+ Examples of this type of database might include an index of WWW
+ resources, or a USENET news spool area.
+ Briefly speaking, the file key methodology uses the directory paths
+ of the individual records as a unique identifier for each record.
+ To perform indexing of a directory with file keys, again, you specify
+ the top-level directory after the <literal>update</literal> command.
+ The command will recursively traverse the directories and compare
+ each one with whatever has been indexed before in that same directory.
+ If a file is new (not in the previous version of the directory) it
+ is inserted into the registers; if a file was already indexed and
+ it has been modified since the last update, the index is also
+ modified; if a file has been removed since the last
+ visit, it is deleted from the index.
+ </para>
+
+ <para>
+ The resulting system is easy to administer. To delete a record you
+ simply have to delete the corresponding file (say, with the
+ <literal>rm</literal> command). To add records, you create new
+ files (or directories with files). For your changes to take effect
+ in the register you must run <literal>zebraidx update</literal> with
+ the same directory root again. This mode of operation requires more
+ disk space than simpler indexing methods, but it makes it easier for
+ you to keep the index in sync with a frequently changing set of data.
+ If you combine this system with the <emphasis>safe update</emphasis>
+ facility (see below), you never have to take your server off-line for
+ maintenance or register updating purposes.