+<sect1>Overview
+
+<p>
+The Zebra system is a fielded free-text indexing and retrieval engine with a
+Z39.50 frontend. You can use any commercial or freeware Z39.50 client
+to access data stored in Zebra.
+
+The Zebra server is our first step towards the development of a fully
+configurable, open information system. Eventually, it will be paired
+off with a powerful Z39.50 client to support complex information
+management tasks within almost any application domain. We're making
+the server available now because it's no fun to be in the open
+information retrieval business all by yourself. We want to allow
+people with interesting data to make their things
+available in interesting ways, without having to start out
+by implementing yet another protocol stack from scratch.
+
+This document is an introduction to the Zebra system. It will tell you
+how to compile the software, and how to prepare your first database.
+It also explains how the server can be configured to give you the
+functionality that you need.
+
+If you find the software interesting, you should join the support
+mailing-list by sending email to <tt/zebra-request@indexdata.dk/.
+
+<sect1>Features
+
+<p>
+This is a list of some of the most important features of the
+system.
+
+<itemize>
+
+<item>
+Supports updating - records can be added and deleted without
+rebuilding the index from scratch.
+The update procedure is tolerant to crashes or hard interrupts
+during register updating - registers can be reconstructed following a crash.
+Registers can be safely updated even while users are accessing the server.
+
+<item>
+Supports large databases - files for indices, etc. can be
+automatically partitioned over multiple disks.
+
+<item>
+Supports arbitrarily complex records - base input format is an
+SGML-like syntax which allows nested (structured) data elements, as
+well as variant forms of data.
+
+<item>
+Supports arbitrary storage formats. A system of input filters driven by
+regular expressions allows you to easily process most ASCII-based
+data formats. SGML, ISO2709 (MARC), and raw text are also supported.
+
+<item>
+Supports boolean queries as well as relevance-ranking (free-text)
+searching. Right truncation and masking in terms are supported, as
+well as full regular expressions.
+
+<item>
+Supports multiple concrete syntaxes
+for record exchange (depending on the configuration): GRS-1, SUTRS,
+ISO2709 (*MARC). Records can be mapped between record syntaxes and
+schema on the fly.
+
+<item>
+Supports approximate matching in registers (i.e., spelling mistakes,
+etc.).
+
+</itemize>
+
+<p>
+Protocol support:
+
+<itemize>
+
+<item>
+Protocol facilities: Init, Search, Retrieve, Browse and Sort.
+
+<item>
+Piggy-backed presents are honored in the search-request.
+
+<item>
+Named result sets are supported.
+
+<item>
+Easily configured to support different application profiles, with
+tables for attribute sets, tag sets, and abstract syntaxes.
+Additional tables control facilities such as element mappings to
+different schema (e.g., GILS-to-USMARC).
+
+<item>
+Complex composition specifications using Espec-1 are partially
+supported (simple element requests only).
+
+<item>
+Element Set Names are defined using the Espec-1 capability of the
+system, and are given in configuration files as simple element
+requests (and possibly variant requests).
+
+<item>
+Some variant support (not fully implemented yet).
+
+<item>
+Using the YAZ toolkit for the protocol implementation, the
+server can utilise a plug-in XTI/mOSI implementation (not included) to
+provide SR services over an OSI stack, as well as Z39.50 over TCP/IP.
+
+<item>
+Zebra runs on most Unix-like systems as well as Windows NT - a binary
+distribution for Windows NT is forthcoming - so far, the installation
+requires MSVC++ to compile the system (we use version 5.0).
+
+</itemize>
+
+<sect1>Future Work
+
+<p>
+This is a beta-release of the software, to allow you to look at
+it, try it out, and assess whether it can be of use to you.
+
+These are some of the plans that we have for the software in the near
+and far future, approximately ordered by their relative importance.
+Items marked with an
+asterisk will be implemented before the
+last beta release.
+
+<itemize>
+
+<item>
+*Complete the support for variants.
+
+<item>
+*Finalize the data element <it/include/ facility to support multimedia
+data elements in records.
+
+<item>
+Add more sophisticated relevance ranking mechanisms. Add support for soundex
+and stemming. Add relevance <it/feedback/ support.
+
+<item>
+Complete EXPLAIN support.
+
+<item>
+Add support for very large records by implementing segmentation and/or
+variant pieces.
+
+<item>
+Support the Item Update extended service of the protocol.
+
+<item>
+We want to add a management system that allows you to
+control your databases and configuration tables from a graphical
+interface. We'll probably use Tcl/Tk to stay platform-independent.
+
+</itemize>
+
+Programmers thrive on user feedback. If you are interested in a facility that
+you don't see mentioned here, or if there's something you think we
+could do better, please drop us a mail. If you think it's all really
+neat, you're welcome to drop us a line saying that, too. You'll find
+contact info at the end of this file.
+
+<sect>Compiling the software
+
+<p>
+An ANSI C compiler is required to compile the Zebra
+server system — <tt/gcc/ works fine if your own system doesn't
+provide an adequate compiler.
+
+Unpack the distribution archive. The <tt>configure</tt> shell script
+attempts to guess correct values for various system-dependent variables
+used during compilation. It uses those values to create a 'Makefile' in
+each directory of Zebra.
+
+To run the configure script type:
+<tscreen><verb>
+ ./configure
+</verb></tscreen>
+
+The configure script attempts to use the C compiler specified by
+the <tt>CC</tt> environment variable. If not set, <tt>cc</tt>
+will be used. The <tt>CFLAGS</tt> environment variable holds
+options to be passed to the C compiler. If you're using a Bourne-shell
+compatible shell you may pass something like this:
+<tscreen><verb>
+ CC=/opt/ccs/bin/cc CFLAGS=-O ./configure
+</verb></tscreen>
+
+When configured, build the software by typing:
+<tscreen><verb>
+ make
+</verb></tscreen>
+
+As an option you may type <tt>make depend</tt> to create
+source file dependencies for the package. This is only needed,
+however, if you alter the source.
+
+If the build succeeds, two executables will have been created in the
+sub-directory <tt/index/:
+<descrip>
+<tag><tt>zebrasrv</tt></tag> The Z39.50 server and search engine.
+<tag><tt>zebraidx</tt></tag> The administrative tool for the search index.
+</descrip>
+
+<sect>Quick Start
+<p>
+In this section, we will test the system by indexing a small set of sample
+GILS records that are included with the software distribution. Go to the
+<tt>test/gils</tt> subdirectory of the distribution archive. There you will
+find a configuration
+file named <tt>zebra.cfg</tt> with the following contents:
+<tscreen><verb>
+# Where are the YAZ tables located.
+profilePath: ../../../yaz/tab ../../tab
+
+# Files that describe the attribute sets supported.
+attset: bib1.att
+attset: gils.att
+</verb></tscreen>
+
+Now, edit the file and set <tt>profilePath</tt> to the path of the
+YAZ profile tables (sub directory <tt>tab</tt> of the YAZ distribution
+archive).
+
+The 48 test records are located in the sub directory <tt>records</tt>.
+To index these, type:
+<tscreen><verb>
+$ ../../index/zebraidx -t grs.sgml update records
+</verb></tscreen>
+
+In the command above the option <tt>-t</tt> specifies the record
+type — in this case <tt>grs.sgml</tt>. The word <tt>update</tt> followed
+by a directory root updates all files below that directory node.
+
+If your indexing command was successful, you are now ready to
+fire up a server. To start a server on port 2100, type:
+<tscreen><verb>
+$ ../../index/zebrasrv tcp:@:2100
+</verb></tscreen>
+
+The Zebra index that you have just created has a single database
+named <tt/Default/. The database contains records structured according to
+the GILS profile, and the server will
+return records in either USMARC, GRS-1, or SUTRS depending
+on what your client asks
+for.
+
+To test the server, you can use any Z39.50 client (1992 or later). For
+instance, you can use the demo client that comes with YAZ: Just cd to
+the <tt/client/ subdirectory of the YAZ distribution and type:
+
+<tscreen><verb>
+$ client tcp:localhost:2100
+</verb></tscreen>
+
+When the client has connected, you can type:
+
+<tscreen><verb>
+Z> find surficial
+Z> show 1
+</verb></tscreen>
+
+The default retrieval syntax for the client is USMARC. To try other
+formats for the same record, try:
+
+<tscreen><verb>
+Z>format sutrs
+Z>show 1
+Z>format grs-1
+Z>show 1
+Z>elements B
+Z>show 1
+</verb></tscreen>
+
+<it>NOTE: You may notice that more fields are returned when your
+client requests SUTRS or GRS-1 records. When retrieving GILS records,
+this is normal - not all of the GILS data elements have mappings in
+the USMARC record format.</it>
+
+If you've made it this far, there's a good chance that
+you've got through the compilation OK.
+
+<sect>Administrating Zebra<label id="administrating">
+
+<p>
+Unlike many simpler retrieval systems, Zebra supports safe, incremental
+updates to an existing index.
+
+Normally, when Zebra modifies the index it reads a number of records
+that you specify.
+Depending on your specifications and on the contents of each record
+one of the following events takes place for each record:
+<descrip>
+<tag>Insert</tag> The record is indexed as if it never occurred
+before. Either the Zebra system doesn't know how to identify the record or
+Zebra can identify the record but didn't find it to be already indexed.
+<tag>Modify</tag> The record has already been indexed. In this case
+either the contents of the record or the location (file) of the record
+indicates that it has been indexed before.
+<tag>Delete</tag> The record is deleted from the index. As in the
+modify case, the indexer must be able to identify the record.
+</descrip>
+
+Please note that in both the modify- and delete- case the Zebra
+indexer must be able to generate a unique key that identifies the record in
+question (more on this below).
+
+To administrate the Zebra retrieval system, you run the
+<tt>zebraidx</tt> program. This program supports a number of options
+which are preceded by a minus, and a few commands (not preceded by
+minus).
+
+Both the Zebra administrative tool and the Z39.50 server share a
+set of index files and a global configuration file. The
+name of the configuration file defaults to <tt>zebra.cfg</tt>.
+The configuration file includes specifications on how to index
+various kinds of records and where the other configuration files
+are located. <tt>zebrasrv</tt> and <tt>zebraidx</tt> <em>must</em>
+be run in the directory where the configuration file lives unless you
+indicate the location of the configuration file by option
+<tt>-c</tt>.
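+
+As an illustration, the following sketch starts both programs from an
+arbitrary working directory. The path <tt>/etc/zebra/zebra.cfg</tt> is
+hypothetical; substitute the location of your own configuration file.
+<tscreen><verb>
+$ zebraidx -c /etc/zebra/zebra.cfg update /data1/records
+$ zebrasrv -c /etc/zebra/zebra.cfg tcp:@:2100
+</verb></tscreen>
+Bear in mind that relative paths inside the configuration file are
+likely to be resolved against the working directory, so absolute paths
+are the safer choice when you use <tt>-c</tt> in this way.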
+
+<sect1>Record Types<label id="record-types">
+<p>
+Indexing is a per-record process, in which either insert/modify/delete
+will occur. Before a record is indexed, search keys are extracted from
+the original record, whatever its layout (SGML, HTML, text, etc.).
+The Zebra system currently supports two fundamental types of records:
+structured and simple text.
+To specify a particular extraction process, use either the
+command line option <tt>-t</tt> or specify a
+<tt>recordType</tt> setting in the configuration file.
+
+<sect1>The Zebra Configuration File<label id="configuration-file">
+<p>
+The Zebra configuration file, read by <tt>zebraidx</tt> and
+<tt>zebrasrv</tt>, defaults to <tt>zebra.cfg</tt> unless another file is
+specified with the <tt>-c</tt> option.
+
+You can edit the configuration file with a normal text editor.
+Parameter names and values are separated by colons in the file. Lines
+starting with a hash sign (<tt/#/) are treated as comments.
+
+If you manage different sets of records that share common
+characteristics, you can organize the configuration settings for each
+type into &dquot;groups&dquot;.
+When <tt>zebraidx</tt> is run and you wish to address a given group
+you specify the group name with the <tt>-g</tt> option. In this case
+settings that have the group name as their prefix will be used
+by <tt>zebraidx</tt>. If no <tt/-g/ option is specified, the settings
+with no prefix are used.
+
+In the configuration file, the group name is placed before the option
+name itself, separated by a dot (.). For instance, to set the record type
+for group <tt/public/ to <tt/grs.sgml/ (the SGML-like format for structured
+records) you would write:
+
+<tscreen><verb>
+public.recordType: grs.sgml
+</verb></tscreen>
+
+To set the default value of the record type to <tt/text/ write:
+
+<tscreen><verb>
+recordType: text
+</verb></tscreen>
+
+The available configuration settings are summarized below. They will be
+explained further in the following sections.
+
+<descrip>
+<tag><it>group</it>.recordType[<it>.name</it>]</tag>
+ Specifies how records with the file extension <it>name</it> should
+ be handled by the indexer. This option may also be specified
+ as a command line option (<tt>-t</tt>). Note that if you do not
+ specify a <it/name/, the setting applies to all files. In general,
+ the record type specifier consists of the elements (each
+ element separated by dot), <it>fundamental-type</it>,
+ <it>file-read-type</it> and arguments. Currently, two
+ fundamental types exist, <tt>text</tt> and <tt>grs</tt>.
+ <tag><it>group</it>.recordId</tag>
+ Specifies how the records are to be identified when updated. See
+section <ref id="locating-records" name="Locating Records">.
+<tag><it>group</it>.database</tag>
+ Specifies the Z39.50 database name.
+<tag><it>group</it>.storeKeys</tag>
+ Specifies whether key information should be saved for a given
+ group of records. If you plan to update/delete this type of
+ records later this should be specified as 1; otherwise it
+ should be 0 (default), to save register space. See section
+<ref id="file-ids" name="Indexing With File Record IDs">.
+<tag><it>group</it>.storeData</tag>
+ Specifies whether the records should be stored internally
+ in the Zebra system files. If you want to maintain the raw records yourself,
+ this option should be false (0). If you want Zebra to take care of the records
+ for you, it should be true (1).
+<tag>register</tag>
+ Specifies the location of the various register files that Zebra uses
+ to represent your databases. See section
+<ref id="register-location" name="Register Location">.
+<tag>shadow</tag>
+ Enables the <it/safe update/ facility of Zebra, and tells the system
+ where to place the required, temporary files. See section
+<ref id="shadow-registers" name="Safe Updating - Using Shadow Registers">.
+<tag>lockDir</tag>
+ Directory in which various lock files are stored.
+<tag>keyTmpDir</tag>
+ Directory in which temporary files used during zebraidx' update
+ phase are stored.
+<tag>setTmpDir</tag>
+ Specifies the directory that the server uses for temporary result sets.
+ If not specified <tt>/tmp</tt> will be used.
+<tag>profilePath</tag>
+ Specifies the location of profile specification files.
+<tag>attset</tag>
+ Specifies the filename(s) of attribute set files for use in
+ searching. At least the Bib-1 set should be loaded (<tt/bib1.att/).
+ The <tt/profilePath/ setting is used to look for the specified files.
+ See section <ref id="attset-files" name="The Attribute Set Files">.
+<tag>memMax</tag>
+ Specifies size of internal memory to use for the zebraidx program. The
+ amount is given in megabytes - default is 4 (4 MB).
+</descrip>
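+
+As a sketch of how these settings fit together, a configuration file
+for a single group might look like the one below. The group name
+<tt/public/ is borrowed from the example above; the directory names and
+the database name <tt/pubdocs/ are invented for illustration, so adjust
+them to your own installation.
+<tscreen><verb>
+# Profile tables and attribute sets (the path is an assumption).
+profilePath: /usr/local/yaz/tab
+attset: bib1.att
+
+# Settings without a group prefix apply when no -g option is given.
+recordType: text
+
+# Settings for the group "public".
+public.recordType: grs.sgml
+public.recordId: file
+public.storeKeys: 1
+public.storeData: 0
+public.database: pubdocs
+
+# Housekeeping (hypothetical directories).
+lockDir: /var/zebra/lock
+keyTmpDir: /var/zebra/tmp
+setTmpDir: /var/zebra/sets
+memMax: 8
+</verb></tscreen>
+Running <tt>zebraidx -g public update</tt> on a directory would then
+use the <tt/public/ settings, while a run without <tt/-g/ would fall
+back to the unprefixed <tt/recordType/.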
+<sect1>Locating Records<label id="locating-records">
+<p>
+The default behaviour of the Zebra system is to reference the
+records from their original location, i.e. where they were found when you
+ran <tt/zebraidx/. That is, when a client wishes to retrieve a record
+following a search operation, the files are accessed from the place
+where you originally put them - if you remove the files (without
+running <tt/zebraidx/ again), the client will receive a diagnostic
+message.
+
+If your input files are not permanent - for example if you retrieve
+your records from an outside source, or if they were temporarily
+mounted on a CD-ROM drive,
+you may want Zebra to make an internal copy of them. To do this,
+you specify 1 (true) in the <tt>storeData</tt> setting. When
+the Z39.50 server retrieves the records they will be read from the
+internal file structures of the system.
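+
+For example, a group of records read from removable media could be
+configured as follows. This is only a sketch: the group name <tt/cdrom/
+and the database name <tt/archive/ are hypothetical.
+<tscreen><verb>
+cdrom.recordType: text
+cdrom.database: archive
+# Keep an internal copy so the original files may disappear later.
+cdrom.storeData: 1
+</verb></tscreen>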
+
+<sect1>Indexing with no Record IDs (Simple Indexing)
+
+<p>
+If you have a set of records that are not expected to change over time
+you can build your database without record IDs.
+This indexing method uses less space than the other methods and
+is simple to use.
+
+To use this method, you simply omit the <tt>recordId</tt> entry
+for the group of files that you index. To add a set of records you use
+<tt>zebraidx</tt> with the <tt>update</tt> command. The
+<tt>update</tt> command will always add all of the records that it
+encounters to the index - whether they have already been indexed or
+not. If the set of indexed files changes, you should delete all of the
+index files, and build a new index from scratch.
+
+Consider a system in which you have a group of text files called
+<tt>simple</tt>. That group of records should belong to a Z39.50 database
+called <tt>textbase</tt>. The following <tt/zebra.cfg/ file will suffice:
+
+<tscreen><verb>
+profilePath: /usr/local/yaz
+attset: bib1.att
+simple.recordType: text
+simple.database: textbase
+</verb></tscreen>
+
+Since the existing records in an index can not be addressed by their
+IDs, it is impossible to delete or modify records when using this method.
+
+<sect1>Indexing with File Record IDs<label id="file-ids">
+
+<p>
+If you have a set of files that regularly changes over time (old files
+are deleted, new ones are added, or existing files are modified), you
+can benefit from using the <it/file ID/ indexing methodology. Examples
+of this type of database might include an index of WWW resources, or a
+USENET news spool area. Briefly speaking, the file key methodology
+uses the directory paths of the individual records as a unique
+identifier for each record. To perform indexing of a directory with
+file keys, again, you specify the top-level directory after the
+<tt>update</tt> command. The command will recursively traverse the
+directories and compare each one with whatever has been indexed before in
+that same directory. If a file is new (not in the previous version of
+the directory) it is inserted into the registers; if a file was
+already indexed and it has been modified since the last update,
+the index is also modified; if a file has been removed since the last
+visit, it is deleted from the index.
+
+The resulting system is easy to administrate. To delete a record you
+simply have to delete the corresponding file (say, with the <tt/rm/
+command). And to add records you create new files (or directories with
+files). For your changes to take effect in the register you must run
+<tt>zebraidx update</tt> with the same directory root again. This mode
+of operation requires more disk space than simpler indexing methods,
+but it makes it easier for you to keep the index in sync with a
+frequently changing set of data. If you combine this system with the
+<it/safe update/ facility (see below), you never have to take your
+server offline for maintenance or register updating purposes.
+
+To enable indexing with pathname IDs, you must specify <tt>file</tt> as
+the value of <tt>recordId</tt> in the configuration file. In addition,
+you should set <tt>storeKeys</tt> to <tt>1</tt>, since the Zebra
+indexer must save additional information about the contents of each record
+in order to modify the indices correctly at a later time.
+
+For example, to update records of group <tt>esdd</tt> located below
+<tt>/data1/records/</tt> you should type:
+<tscreen><verb>
+$ zebraidx -g esdd update /data1/records
+</verb></tscreen>
+
+The corresponding configuration file includes:
+<tscreen><verb>
+esdd.recordId: file
+esdd.recordType: grs.sgml
+esdd.storeKeys: 1
+</verb></tscreen>
+
+<em>Important note: You cannot start out with a group of records with simple
+indexing (no record IDs as in the previous section) and then later
+enable file record IDs. Zebra must know from the first time that you
+index the group that
+the files should be indexed with file record IDs.
+</em>
+
+You cannot explicitly delete records when using this method (using the
+<bf/delete/ command to <tt/zebraidx/). Instead
+you have to delete the files from the file system (or move them to a
+different location)
+and then run <tt>zebraidx</tt> with the <bf/update/ command.
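+
+A typical maintenance session for the <tt>esdd</tt> group above might
+therefore look like this sketch. The file names are hypothetical.
+<tscreen><verb>
+$ rm /data1/records/obsolete.sgml
+$ cp /tmp/incoming/new.sgml /data1/records/
+$ zebraidx -g esdd update /data1/records
+</verb></tscreen>
+The single <tt>update</tt> run removes the deleted file from the index
+and inserts the new one.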
+
+<sect1>Indexing with General Record IDs
+<p>
+When using this method you construct an (almost) arbitrary, internal
+record key based on the contents of the record itself and other system
+information. If you have a group of records that explicitly associates
+an ID with each record, this method is convenient. For example, the
+record format may contain a title or an ID number - unique within the group.
+In either case you specify the Z39.50 attribute set and use-attribute
+location in which this information is stored, and the system looks at
+that field to determine the identity of the record.
+
+As before, the record ID is defined by the <tt>recordId</tt> setting
+in the configuration file. The value of the record ID specification
+consists of one or more tokens separated by whitespace. The resulting
+ID is
+represented in the index by concatenating the tokens and separating them by
+ASCII value (1).
+
+There are three kinds of tokens:
+<descrip>
+<tag>Internal record info</tag> The token refers to a key that is
+extracted from the record. The syntax of this token is
+ <tt/(/ <em/set/ <tt/,/ <em/use/ <tt/)/, where <em/set/ is the
+attribute set name and <em/use/ is the name or value of the attribute.
+<tag>System variable</tag> The system variables are preceded by
+<verb>$</verb> and immediately followed by the system variable name, which
+may be one of
+ <descrip>
+ <tag>group</tag> Group name.
+ <tag>database</tag> Current database specified.
+ <tag>type</tag> Record type.
+ </descrip>
+<tag>Constant string</tag> A string used as part of the ID — surrounded
+ by single- or double quotes.
+</descrip>
+
+For instance, the sample GILS records that come with the Zebra
+distribution contain a unique ID in the data tagged Control-Identifier.
+The data is mapped to the Bib-1 use attribute Identifier-standard
+(code 1007). To use this field as a record id, specify
+<tt>(bib1,Identifier-standard)</tt> as the value of the
+<tt>recordId</tt> in the configuration file.
+If you have other record types that use the same field for a
+different purpose, you might add the record type
+(or group or database name) to the record id of the gils
+records as well, to prevent matches with other types of records.
+In this case the recordId might be set like this:
+<tscreen><verb>
+gils.recordId: $type (bib1,Identifier-standard)
+</verb></tscreen>
+
+(see section <ref id="data-model" name="Configuring Your Data Model">
+for details of how the mapping between elements of your records and
+searchable attributes is established).
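+
+The three kinds of tokens can be combined in a single setting. The
+following line is a sketch only; the constant string <tt/v1/ is an
+arbitrary example.
+<tscreen><verb>
+gils.recordId: $database 'v1' (bib1,Identifier-standard)
+</verb></tscreen>
+Here the ID is built from the current database name, the constant
+string, and the key extracted from the record, in that order.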
+
+As for the file record ID case described in the previous section,
+updating your system is simply a matter of running <tt>zebraidx</tt>
+with the <tt>update</tt> command. However, the update with general
+keys is considerably slower than with file record IDs, since all files
+visited must be (re)read to discover their IDs.
+
+As you might expect, when using the general record IDs
+method, you can only add new records or modify existing ones with the <tt>update</tt>
+command. If you wish to delete records, you must use the
+<tt>delete</tt> command, with a directory as a parameter.
+This will remove all records that match the files below that root
+directory.
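+
+For instance, assuming a group <tt/gils/ configured with the
+<tt>recordId</tt> shown above, and the records to be withdrawn collected
+under the hypothetical directory <tt>/data1/withdrawn</tt>:
+<tscreen><verb>
+$ zebraidx -g gils delete /data1/withdrawn
+</verb></tscreen>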
+
+<sect1>Register Location<label id="register-location">
+
+<p>
+Normally, the index files that form dictionaries, inverted
+files, record info, etc., are stored in the directory where you run
+<tt>zebraidx</tt>. If you wish to store these, possibly large, files
+somewhere else, you must add the <tt>register</tt> entry to the
+<tt/zebra.cfg/ file. Furthermore, the Zebra system allows its file
+structures to
+span multiple file systems, which is useful for managing very large
+databases.
+
+The value of the <tt>register</tt> setting is a sequence of tokens.
+Each token takes the form:
+<tscreen>
+<em>dir</em><tt>:</tt><em>size</em>.
+</tscreen>
+The <em>dir</em> specifies a directory in which index files will be
+stored and the <em>size</em> specifies the maximum size of all
+files in that directory. The Zebra indexer system fills each directory
+in the order specified and uses the next specified directories as needed.
+The <em>size</em> is an integer followed by a qualifier
+code, <tt>M</tt> for megabytes, <tt>k</tt> for kilobytes.
+
+For instance, if you have allocated two disks for your register, and
+the first disk is mounted
+on <tt>/d1</tt> and has 200 Mb of free space and the
+second, mounted on <tt>/d2</tt> has 300 Mb, you could
+put this entry in your configuration file:
+<tscreen><verb>
+register: /d1:200M /d2:300M
+</verb></tscreen>
+
+Note that Zebra does not verify that the amount of space specified is
+actually available on the directory (file system) specified - it is
+your responsibility to ensure that enough space is available, and that
+other applications do not attempt to use the free space. In a large production system,
+it is recommended that you allocate one or more filesystems exclusively
+to the Zebra register files.
+
+<sect1>Safe Updating - Using Shadow Registers<label id="shadow-registers">
+
+<sect2>Description
+
+<p>
+The Zebra server supports <it/updating/ of the index structures. That is,
+you can add, modify, or remove records from databases managed by Zebra
+without rebuilding the entire index. Since this process involves
+modifying structured files with various references between blocks of
+data in the files, the update process is inherently sensitive to
+system crashes, or to process interruptions: Anything but a
+successfully completed update process will leave the register files in
+an unknown state, and you will essentially have no recourse but to
+re-index everything, or to restore the register files from a backup
+medium. Further, while the update process is active, users cannot be
+allowed to access the system, as the contents of the register files
+may change unpredictably.
+
+You can solve these problems by enabling the shadow register system in
+Zebra. During the updating procedure, <tt/zebraidx/ will temporarily
+write changes to the involved files in a set of &dquot;shadow
+files&dquot;, without modifying the files that are accessed by the
+active server processes. If the update procedure is interrupted by a
+system crash or a signal, you simply repeat the procedure - the
+register files have not been changed or damaged, and the partially
+written shadow files are automatically deleted before the new updating
+procedure commences.
+
+At the end of the updating procedure (or in a separate operation, if
+you so desire), the system enters a &dquot;commit mode&dquot;. First,
+any active server processes are forced to access those blocks that
+have been changed from the shadow files rather than from the main
+register files; the unmodified blocks are still accessed at their
+normal location (the shadow files are not a complete copy of the
+register files - they only contain those parts that have actually been
+modified). If the commit process is interrupted at any point,
+the server processes will continue to access the
+shadow files until you can repeat the commit procedure and complete
+the writing of data to the main register files. You can perform
+multiple update operations to the registers before you commit the
+changes to the system files, or you can execute the commit operation
+at the end of each update operation. When the commit phase has
+completed successfully, any running server processes are instructed to
+switch their operations to the new, operational register, and the
+temporary shadow files are deleted.
+
+<sect2>How to Use Shadow Register Files
+
+<p>
+The first step is to allocate space on your system for the shadow
+files. You do this by adding a <tt/shadow/ entry to the <tt/zebra.cfg/
+file. The syntax of the <tt/shadow/ entry is exactly the same as for
+the <tt/register/ entry (see section <ref name="Register Location"
+id="register-location">). The location of the shadow area should be
+<it/different/ from the location of the main register area (if you
+have specified one - remember that if you provide no <tt/register/
+setting, the default register area is the
+working directory of the server and indexing processes).
+
+The following excerpt from a <tt/zebra.cfg/ file shows one example of
+a setup that configures both the main register location and the shadow
+file area. Note that two directories or partitions have been set aside
+for the shadow file area. You can specify any number of directories
+for each of the file areas, but remember that there should be no
+overlaps between the directories used for the main registers and the
+shadow files, respectively.
+
+<tscreen><verb>
+register: /d1:500M
+
+shadow: /scratch1:100M /scratch2:200M
+</verb></tscreen>
+
+When shadow files are enabled, an extra command is available at the
+<tt/zebraidx/ command line. In order to make changes to the system
+take effect for the users, you'll have to submit a
+&dquot;commit&dquot; command after a (sequence of) update
+operation(s). You can ask the indexer to commit the changes
+immediately after the update operation:
+
+<tscreen><verb>
+$ zebraidx update /d1/records update /d2/more-records commit
+</verb></tscreen>
+
+Or you can execute multiple updates before committing the changes:
+
+<tscreen><verb>
+$ zebraidx -g books update /d1/records update /d2/more-records
+$ zebraidx -g fun update /d3/fun-records
+$ zebraidx commit
+</verb></tscreen>
+
+If one of the update operations above had been interrupted, the commit
+operation on the last line would fail: <tt/zebraidx/ will not let you
+commit changes that would destroy the running register. You'll have to
+rerun all of the update operations since your last commit operation,
+before you can commit the new changes.
+
+Similarly, if the commit operation fails, <tt/zebraidx/ will not let
+you start a new update operation before you have successfully repeated
+the commit operation. The server processes will keep accessing the
+shadow files rather than the (possibly damaged) blocks of the main
+register files until the commit operation has successfully completed.
+
+You should be aware that update operations may take slightly longer
+when the shadow register system is enabled, since more file access
+operations are involved. Further, while the disk space required for
+the shadow register data is modest for a small update operation, you
+may prefer to disable the system if you are adding a very large number
+of records to an already very large database (we use the terms
+<it/large/ and <it/modest/ very loosely here, since every
+application will have a different perception of size). To update the system
+without the use of the shadow files, simply run <tt/zebraidx/ with
+the <tt/-n/ option (note that you do not have to execute the
+<bf/commit/ command of <tt/zebraidx/ when you temporarily disable the
+use of the shadow registers in this fashion). Note also that, just as
+when the shadow registers are not enabled, server processes will be
+barred from accessing the main register while the update procedure
+takes place.
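+
+For example, a large bulk load that bypasses the shadow files could be
+run as follows (the group and directory names are reused from the
+examples above, purely for illustration):
+<tscreen><verb>
+$ zebraidx -n -g books update /d1/records
+</verb></tscreen>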
+
+<sect>Running the Maintenance Interface (zebraidx)
+
+<p>
+The following is a complete reference to the command line interface to
+the <tt/zebraidx/ application.
+
+<bf/Syntax/
+<tscreen><verb>
+$ zebraidx [options] command [directory] ...
+</verb></tscreen>
+<bf/Options/
+<descrip>
+<tag>-t <it/type/</tag>Update all files as <it/type/. Currently, the
+types supported are <tt/text/ and <tt/grs/<it/.subtype/. If no
+<it/subtype/ is provided for the GRS (General Record Structure) type,
+the canonical input format is assumed (see section <ref
+id="local-representation" name="Local Representation">). Generally, it
+is probably advisable to specify the record types in the
+<tt/zebra.cfg/ file (see section <ref id="record-types" name="Record
+Types">), to avoid confusion at subsequent updates.
+
+<tag>-c <it/config-file/</tag>Read the configuration file
+<it/config-file/ instead of <tt/zebra.cfg/.
+
+<tag>-g <it/group/</tag>Update the files according to the group
+settings for <it/group/ (see section <ref id="configuration-file"
+name="The Zebra Configuration File">).
+
+<tag>-d <it/database/</tag>The records located should be associated
+with the database name <it/database/ for access through the Z39.50
+server.
+
+<tag>-m <it/mbytes/</tag>Use <it/mbytes/ megabytes of memory before flushing
+keys to background storage. This setting affects performance when
+updating large databases.
+
+<tag>-n</tag>Disable the use of shadow registers for this operation
+(see section <ref id="shadow-registers" name="Robust Updating - Using
+Shadow Registers">).
+
+<tag>-s</tag>Show analysis of the indexing process. The maintenance
+program works in a read-only mode and doesn't change the state
+of the index. This option is very useful when you wish to test a
+new profile.
+
+<tag>-V</tag>Show Zebra version.
+
+<tag>-v <it/level/</tag>Set the log level to <it/level/. <it/level/
+should be one of <tt/none/, <tt/debug/, or <tt/all/.
+
+</descrip>
+
+<bf/Commands/
+<descrip>
+<tag>Update <it/directory/</tag>Update the register with the files
+contained in <it/directory/. If no directory is provided, a list of
+files is read from <tt/stdin/. See section <ref
+id="administrating" name="Administrating Zebra">.
+
+<tag>Delete <it/directory/</tag>Remove the records corresponding to
+the files found under <it/directory/ from the register.
+
+<tag/Commit/Write the changes resulting from the last <bf/update/
+commands to the register. This command is only available if the use of
+shadow register files is enabled (see section <ref
+id="shadow-registers" name="Robust Updating - Using Shadow
+Registers">).
+
+</descrip>
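+
+The following invocations sketch how the options and commands above
+can be combined. The paths, the database name and the memory figure
+are placeholders chosen for illustration, and the one-name-per-line
+format of the file list on <tt/stdin/ is an assumption:
+
+<tscreen><verb>
+$ zebraidx -s -t grs update /d1/records
+$ zebraidx -c /usr/local/zebra/zebra.cfg -d books -m 16 update /d1/records
+$ find /d1/records -type f | zebraidx update
+$ zebraidx commit
+</verb></tscreen>
+
+The first line only analyses the records and leaves the index
+untouched. The second reads an alternative configuration file and
+associates the records with the database <tt/books/. The third supplies
+the list of files on <tt/stdin/ instead of naming a directory. The
+final <bf/commit/ is needed only when shadow registers are enabled.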
+
+<sect>The Z39.50 Server
+
+<sect1>Running the Z39.50 Server (zebrasrv)
+
+<p>
+<bf/Syntax/
+<tscreen><verb>
+zebrasrv [options] [listener-address ...]
+</verb></tscreen>
+
+<bf/Options/
+<descrip>
+<tag>-a <it/APDU file/</tag> Specify a file for dumping PDUs (for diagnostic purposes).
+The special name &dquot;-&dquot; sends output to <tt/stderr/.
+
+<tag>-c <it/config-file/</tag> Read configuration information from <it/config-file/. The default configuration is <tt>./zebra.cfg</tt>.
+
+<tag/-S/Don't fork on connection requests. This can be useful for
+symbolic-level debugging. The server can only accept a single
+connection in this mode.
+
+<tag/-s/Use the SR protocol.
+
+<tag/-z/Use the Z39.50 protocol (default). These two options complement
+each other. You can use both multiple times on the same command
+line, between listener-specifications (see below). This way, you
+can set up the server to listen for connections in both protocols
+concurrently, on different local ports.
+
+<tag>-l <it/logfile/</tag>Specify an output file for the diagnostic
+messages. The default is to write this information to <tt/stderr/.
+
+<tag>-v <it/log-level/</tag>The log level. Use a comma-separated list of members of the set
+{fatal,debug,warn,log,all,none}.
+
+<tag>-u <it/username/</tag>Set user ID. Sets the real UID of the server process to that of the
+given <it/username/. It's useful if you aren't comfortable with having the
+server run as root, but you need to start it as such to bind a
+privileged port.
+
+<tag>-w <it/working-directory/</tag>Change working directory.
+
+<tag>-i</tag>Run under the Internet superserver, <tt/inetd/. Make
+sure you use the logfile option <tt/-l/ in conjunction with this
+mode and specify the <tt/-l/ option before any other options.
+
+<tag>-t <it/timeout/</tag>Set the idle session timeout (default 60 minutes).
+
+<tag>-k <it/kilobytes/</tag>Set the (approximate) maximum size of
+present response messages. Default is 1024 Kb (1 Mb).
+</descrip>
+
+A <it/listener-address/ consists of a transport mode followed by a
+colon (:) followed by a listener address. The transport mode is
+either <tt/osi/ or <tt/tcp/.
+
+For TCP, an address has the form
+
+<tscreen><verb>
+hostname | IP-number [: portnumber]
+</verb></tscreen>
+
+The port number defaults to 210 (standard Z39.50 port).
+
+For OSI (only available if the server is compiled with XTI/mOSI
+support enabled), the address form is
+
+<tscreen><verb>
+[t-selector /] hostname | IP-number [: portnumber]
+</verb></tscreen>
+
+The transport selector is given as a string of hex digits (with an even
+number of digits). The default port number is 102 (RFC1006 port).
+
+Examples
+
+<tscreen>
+<verb>
+tcp:dranet.dra.com
+
+osi:0402/dbserver.osiworld.com:3000
+</verb>
+</tscreen>
+
+In both cases, the special hostname &dquot;@&dquot; is mapped to
+the address INADDR_ANY, which causes the server to listen on any local
+interface. To start the server listening on the registered ports for
+Z39.50 and SR over OSI/RFC1006, and to drop root privileges once the
+ports are bound, execute the server like this (from a root shell):
+
+<tscreen><verb>
+zebrasrv -u daemon tcp:@ -s osi:@
+</verb></tscreen>
+
+You can replace <tt/daemon/ with another user, e.g. your own account, or
+a dedicated IR server account.
+
+The default behavior for <tt/zebrasrv/ is to establish a single TCP/IP
+listener, for the Z39.50 protocol, on port 9999.
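+
+As a further sketch, the options described above might be combined as
+shown below. The file names, the port number and the timeout value are
+placeholders, and the timeout is assumed to be given in minutes, as
+the stated default suggests:
+
+<tscreen><verb>
+zebrasrv -c /usr/local/zebra/zebra.cfg -l /var/log/zebrasrv.log -t 30 tcp:@:2100
+zebrasrv -l /var/log/zebrasrv.log -i -c /usr/local/zebra/zebra.cfg
+</verb></tscreen>
+
+The first line starts a Z39.50 listener on all local interfaces with a
+shorter idle timeout. The second is the kind of command line an
+<tt/inetd/ entry would execute, with <tt/-l/ given before any other
+option.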
+
+<sect1>Z39.50 Protocol Support and Behavior
+
+<sect2>Initialization
+
+<p>
+During initialization, the server will negotiate to version 3 of the
+Z39.50 protocol, and the option bits for Search, Present, Scan,
+NamedResultSets, and concurrentOperations will be set, if requested by
+the client. The maximum PDU size is negotiated down to a maximum of
+1Mb by default.
+
+<sect2>Search<label id="search">
+
+<p>
+The supported query types are 1 and 101. All operators are currently
+supported with the restriction that only proximity units of type "word" are
+supported for the proximity operator.
+Queries can be arbitrarily complex.
+Named result sets are supported, and result sets can be used as operands
+without limitations.
+Searches may span multiple databases.
+
+The server has full support for piggy-backed present requests (see
+also the following section).
+
+<bf/Use/ attributes are interpreted according to the attribute sets which
+have been loaded in the <tt/zebra.cfg/ file, and are matched against
+specific fields as specified in the <tt/.abs/ file which describes the
+profile of the records which have been loaded. If no <bf/Use/
+attribute is provided, a default of Bib-1 <bf/Any/ is assumed.
+
+If a <bf/Structure/ attribute of <bf/Phrase/ is used in conjunction with a
+<bf/Completeness/ attribute of <bf/Complete (Sub)field/, the term is
+matched against the contents of the phrase (long word) register, if one
+exists for the given <bf/Use/ attribute.
+A phrase register is created for those fields in the <tt/.abs/
+file that contain a <tt/p/-specifier.
+
+If <bf/Structure/=<bf/Phrase/ is used in conjunction with
+<bf/Incomplete Field/ (the default value for <bf/Completeness/), the
+search is directed against the normal word registers, but if the term
+contains multiple words, the term will only match if all of the words
+are found immediately adjacent, and in the given order.
+The word search is performed on those fields that are indexed as
+type <tt/w/ in the <tt/.abs/ file.
+
+If the <bf/Structure/ attribute is <bf/Word List/,
+<bf/Free-form Text/, or <bf/Document Text/, the term is treated as a
+natural-language, relevance-ranked query.
+This search type uses the word register, i.e. those fields
+that are indexed as type <tt/w/ in the <tt/.abs/ file.
+
+If the <bf/Structure/ attribute is <bf/Numeric String/ the
+term is treated as an integer. The search is performed on those
+fields that are indexed as type <tt/n/ in the <tt/.abs/ file.
+
+If the <bf/Structure/ attribute is <bf/URx/ the
+term is treated as a URX (URL) entity. The search is performed on those
+fields that are indexed as type <tt/u/ in the <tt/.abs/ file.
+
+If the <bf/Structure/ attribute is <bf/Local Number/ the
+term is treated as a native Zebra Record Identifier.
+
+If the <bf/Relation/ attribute is <bf/Equals/ (default), the term is
+matched in a normal fashion (modulo truncation and processing of
+individual words, if required). If <bf/Relation/ is <bf/Less Than/,
+<bf/Less Than or Equal/, <bf/Greater than/, or <bf/Greater than or
+Equal/, the term is assumed to be numerical, and a standard regular
+expression is constructed to match the given expression. If
+<bf/Relation/ is <bf/Relevance/, the standard natural-language query
+processor is invoked.
+
+For the <bf/Truncation/ attribute, <bf/No Truncation/ is the default.
+<bf/Left Truncation/ is not supported. <bf/Process #/ is supported, as
+is <bf/Regxp-1/. <bf/Regxp-2/ enables the fault-tolerant (fuzzy)
+search. As a default, a single error (deletion, insertion,
+replacement) is accepted when terms are matched against the register
+contents.
+
+<sect3>Regular expressions
+<p>
+
+Each term in a query is interpreted as a regular expression if
+the truncation value is either <bf/Regxp-1/ (102) or <bf/Regxp-2/ (103).
+Both query types follow the same syntax with the operands:
+<descrip>
+<tag/x/ Matches the character <it/x/.
+<tag/./ Matches any character.
+<tag><tt/[/..<tt/]/</tag> Matches the set of characters specified;
+ such as <tt/[abc]/ or <tt/[a-c]/.
+</descrip>
+and the operators:
+<descrip>
+<tag/x*/ Matches <it/x/ zero or more times. Priority: high.
+<tag/x+/ Matches <it/x/ one or more times. Priority: high.
+<tag/x?/ Matches <it/x/ zero or one time. Priority: high.
+<tag/xy/ Matches <it/x/, then <it/y/. Priority: medium.
+<tag/x|y/ Matches either <it/x/ or <it/y/. Priority: low.
+</descrip>
+The order of evaluation may be changed by using parentheses.
+
+If the first character of the <bf/Regxp-2/ query is a plus character
+(<tt/+/) it marks the beginning of a section with non-standard
+specifiers. The next plus character marks the end of the section.
+Currently Zebra only supports one specifier, the error tolerance,
+which consists of one digit.
+
+Since the plus operator is normally a suffix operator the addition to
+the query syntax doesn't violate the syntax for standard regular
+expressions.
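+
+As an illustration of the specifier syntax just described, the
+following query asks for a <bf/Regxp-2/ match that tolerates up to two
+errors. It is a sketch derived from the description above: the
+misspelled term and the choice of the title register are only
+examples.
+
+<verb>
+ @attr 1=4 @attr 5=103 "+2+informaton"
+</verb>
+
+Without the leading <tt/+2+/ section, the same query would fall back
+to the default tolerance of a single error.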
+
+<sect3>Query examples
+<p>
+
+Phrase search for <bf/information retrieval/ in the title-register:
+<verb>
+ @attr 1=4 "information retrieval"
+</verb>
+
+Ranked search for the same thing:
+<verb>
+ @attr 1=4 @attr 2=102 "Information retrieval"
+</verb>
+
+Phrase search with a regular expression:
+<verb>
+ @attr 1=4 @attr 5=102 "informat.* retrieval"
+</verb>
+
+Ranked search with a regular expression:
+<verb>
+ @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
+</verb>
+
+In the GILS schema (<tt/gils.abs/), the west-bounding-coordinate is
+indexed as type <tt/n/, and is therefore searched by specifying
+<bf/structure/=<bf/Numeric String/.
+To match all those records with west-bounding-coordinate greater
+than -114 we use the following query:
+<verb>
+ @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
+</verb>
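+
+Two further combinations, given as sketches rather than tested
+queries. The attribute values follow the Bib-1 numbering
+(<bf/Structure/ 1 is <bf/Phrase/, <bf/Completeness/ 2 is
+<bf/Complete Subfield/, <bf/Relation/ 2 is <bf/Less Than or Equal/),
+and the first query assumes that the title field carries a
+<tt/p/-specifier in the <tt/.abs/ file.
+
+Match against the phrase register for the title:
+<verb>
+ @attr 1=4 @attr 4=1 @attr 6=2 "information retrieval"
+</verb>
+
+Records with west-bounding-coordinate less than or equal to -114:
+<verb>
+ @attr 4=109 @attr 2=2 @attr gils 1=2038 -114
+</verb>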
+
+<sect2>Present
+<p>
+The present facility is supported in a standard fashion. The requested
+record syntax is matched against the ones supported by the profile of
+each record retrieved. If no record syntax is given, SUTRS is the
+default. The requested element set name, again, is matched against any
+provided by the relevant record profiles.
+
+<sect2>Scan
+
+<p>
+The attribute combinations provided with the TermListAndStartPoint are
+processed in the same way as operands in a query (see above).
+Currently, only the term and the globalOccurrences are returned with
+the TermInfo structure.
+
+<sect2>Sort
+
+<p>
+Z39.50 specifies three different types of sort criteria.
+Of these Zebra supports the attribute specification type in which
+case the use attribute specifies the "Sort register".
+Sort registers are created for those fields that are of type "sort" in
+the default.idx file.
+The corresponding character mapping file in default.idx specifies the
+ordinal of each character used in the actual sort.
+
+Z39.50 allows the client to specify sorting on one or more input
+result sets and one output result set.
+Zebra supports sorting on one result set only, which may or may not
+be the same as the output result set.
+
+<sect2>Close
+
+<p>
+If a Close PDU is received, the server will respond with a Close PDU
+with reason=FINISHED, no matter which protocol version was negotiated
+during initialization. If the protocol version is 3 or more, the
+server will generate a Close PDU under certain circumstances,
+including a session timeout (60 minutes by default), and certain kinds of
+protocol errors. Once a Close PDU has been sent, the protocol
+association is considered broken, and the transport connection will be
+closed immediately upon receipt of further data, or following a short
+timeout.
+
+<sect>The Record Model
+