Updated doc.

diff --git a/doc/zebra.sgml b/doc/zebra.sgml

index 8384f1b..6774a64 100644
--- a/doc/zebra.sgml
+++ b/doc/zebra.sgml
@@ -1,13 +1,13 @@
  <!doctype linuxdoc system>
  
  <!--
-  $Id: zebra.sgml,v 1.26 1996-05-13 10:20:17 quinn Exp $
+  $Id: zebra.sgml,v 1.29 1996-10-29 14:11:20 adam Exp $
  -->
  
  <article>
  <title>Zebra Server - Administrators's Guide and Reference
  <author><htmlurl url="http://www.indexdata.dk/" name="Index Data">, <tt><htmlurl url="mailto:info@index.ping.dk" name="info@index.ping.dk"></>
-<date>$Revision: 1.26 $
+<date>$Revision: 1.29 $
  <abstract>
  The Zebra information server combines a versatile fielded/free-text
  search engine with a Z39.50-1995 frontend to provide a powerful and flexible
@@ -49,7 +49,7 @@ mailing-list by sending Email to <tt/zebra-request@index.ping.dk/.
  <sect1>Features
  
  <p>
-This is a listof some of the most important features of the
+This is a list of some of the most important features of the
  system.
  
  <itemize>
@@ -225,6 +225,9 @@ profilePath: ../../yaz/tab ../tab
  # Files that describe the attribute sets supported.
  attset: bib1.att
  attset: gils.att
+
+# Name of character map file.
+charMap: scan.chr
  </verb></tscreen>
  
  Now, edit the file and set <tt>profilePath</tt> to the path of the
@@ -234,11 +237,11 @@ archive).
  The 48 test records are located in the sub directory <tt>records</tt>.
  To index these, type:
  <tscreen><verb>
-$ ../index/zebraidx -t grs update records
+$ ../index/zebraidx -t grs.sgml update records
  </verb></tscreen>
  
  In the command above the option <tt>-t</tt> specified the record
-type &mdash; in this case <tt>grs</tt>. The word <tt>update</tt> followed
+type &mdash; in this case <tt>grs.sgml</tt>. The word <tt>update</tt> followed
  by a directory root updates all files below that directory node.
  
  If your indexing command was successful, you are now ready to
@@ -361,13 +364,12 @@ by <tt>zebraidx</tt>. If no <tt/-g/ option is specified, the settings
  with no prefix are used.
  
  In the configuration file, the group name is placed before the option
-name
-itself, separated by a dot (.). For instance, to set the record type
-for group <tt/public/ to <tt/grs/ (the common format for structured
+name itself, separated by a dot (.). For instance, to set the record type
+for group <tt/public/ to <tt/grs.sgml/ (the SGML-like format for structured
  records) you would write:
  
  <tscreen><verb>
-public.recordType: grs
+public.recordType: grs.sgml
  </verb></tscreen>
  
  To set the default value of the record type to <tt/text/ write:
@@ -384,8 +386,12 @@ explained further in the following sections.
   Specifies how records with the file extension <it>name</it> should
   be handled by the indexer. This option may also be specified
   as a command line option (<tt>-t</tt>). Note that if you do not
- specify a <it/name/, the setting applies to all files.
-<tag><it>group</it>.recordId</tag>
+ specify a <it/name/, the setting applies to all files. In general,
+ the record type specifier consists of the following elements,
+ separated by dots: <it>fundamental-type</it>,
+ <it>file-read-type</it> and arguments. Currently, two
+ fundamental types exist: <tt>text</tt> and <tt>grs</tt>.
+ <tag><it>group</it>.recordId</tag>
   Specifies how the records are to be identified when updated. See
  section <ref id="locating-records" name="Locating Records">.
  <tag><it>group</it>.database</tag>
@@ -409,9 +415,12 @@ section <ref id="locating-records" name="Locating Records">.
   Enables the <it/safe update/ facility of Zebra, and tells the system
   where to place the required, temporary files. See section
  <ref id="shadow-registers" name="Safe Updating - Using Shadow Registers">.
-<tag>lockPath</tag>
+<tag>lockDir</tag>
   Directory in which various lock files are stored.
-<tag>tempSetPath</tag>
+<tag>keyTmpDir</tag>
+ Directory in which temporary files used during the update phase of
+ <tt>zebraidx</tt> are stored.
+<tag>setTmpDir</tag>
   Specifies the directory that the server uses for temporary result sets.
   If not specified <tt>/tmp</tt> will be used.
  <tag>profilePath</tag>
@@ -421,8 +430,13 @@ section <ref id="locating-records" name="Locating Records">.
   searching. At least the Bib-1 set should be loaded (<tt/bib1.att/).
   The <tt/profilePath/ setting is used to look for the specified files.
   See section <ref id="attset-files" name="The Attribute Set Files">
+<tag>charMap</tag>
+ Specifies the filename of a character map. Zebra uses the path given
+ by <tt>profilePath</tt> to locate this file.
+<tag>memMax</tag>
+ Specifies the size of internal memory to be used by <tt>zebraidx</tt>. The
+ amount is given in megabytes; the default is 4 (4 MB).
  </descrip>
-
  <sect1>Locating Records<label id="locating-records">
  <p>
  The default behaviour of the Zebra system is to reference the
@@ -971,7 +985,7 @@ expression is constructed to match the given expression. If
  processor is invoked.
  
  For the <bf/Truncation/ attribute, <bf/No Truncation/ is the default.
-<bf/Left Truncation/ is not supported. <bf/Process #/ is supported, as
+<bf/Left Truncation/ is not supported. <bf/Process &num;/ is supported, as
  is <bf/Regxp-1/. <bf/Regxp-2/ enables the fault-tolerant (fuzzy)
  search. As a default, a single error (deletion, insertion,
  replacement) is accepted when terms are matched against the register
@@ -1018,6 +1032,10 @@ record. Any number of record schema can coexist in the system.
  Although it may be wise to use only a single schema within
  one database, the system poses no such restrictions.
  
+The record model described in this chapter applies to the fundamental
+record type <tt>grs</tt> as introduced in
+section <ref id="record-types" name="Record Types">.
+
  Records pass through three different states during processing in the
  system.
  
@@ -1061,6 +1079,9 @@ a single, canonical input format that gives access to the full
  spectrum of structure and flexibility in the system. In Zebra, this
  canonical format is an &dquot;SGML-like&dquot; syntax.
  
+To use the canonical format, specify <tt>grs.sgml</tt> as the record
+type.
+
  Consider a record describing an information resource (such a record is
  sometimes known as a <it/locator record/). It might contain a field
  describing the distributor of the information resource, which might in
@@ -1195,7 +1216,10 @@ work with.
  
  Input filters are ASCII files, generally with the suffix <tt/.flt/.
  The system looks for the files in the directories given in the
-<bf/profilePath/ setting in the <tt/zebra.cfg/ file.
+<bf/profilePath/ setting in the <tt/zebra.cfg/ files. The record type
+for the filter is <tt>grs.regx.</tt><it>filter-filename</it>
+(fundamental type <tt>grs</tt>, file read type <tt>regx</tt>, argument
+<it>filter-filename</it>).
  
  Generally, an input filter consists of a sequence of rules, where each
  rule consists of a sequence of expressions, followed by an action. The
@@ -1905,7 +1929,7 @@ belonging to the Explain schema.
  <sect>License
  
  <p>
-Copyright &copy; 1995, Index Data.
+Copyright &copy; 1995, 1996 Index Data.
  
  All rights reserved.
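
The record type specifier documented in this change has three dot-separated elements: fundamental type, file read type and arguments. Examples are <tt>grs.sgml</tt> and <tt>grs.regx.</tt><it>filter-filename</it>. The C sketch below shows how such a specifier could be split. It is not taken from the Zebra sources. The struct <tt>record_type</tt>, the function <tt>parse_record_type</tt> and the buffer sizes are hypothetical. The rule that everything after the second dot forms the arguments is also an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the parts of a record type specifier such as
   "grs.sgml" or "grs.regx.<filter-filename>". Not a Zebra type. */
struct record_type {
    char fundamental[32]; /* "text" or "grs" */
    char read_type[32];   /* e.g. "sgml" or "regx"; empty if absent */
    char args[256];       /* everything after the second dot (assumption) */
};

/* Splits spec at its first two dots. Returns 0 on success and -1 if the
   fundamental type is empty or unknown, or if a part is too long. */
static int parse_record_type(const char *spec, struct record_type *rt)
{
    const char *dot1, *dot2;
    size_t n;

    assert(spec != NULL && rt != NULL);
    memset(rt, 0, sizeof(*rt)); /* also guarantees NUL termination below */

    dot1 = strchr(spec, '.');
    dot2 = dot1 ? strchr(dot1 + 1, '.') : NULL;

    /* Element 1: fundamental type, up to the first dot. */
    n = dot1 ? (size_t)(dot1 - spec) : strlen(spec);
    if (n == 0 || n >= sizeof(rt->fundamental))
        return -1;
    memcpy(rt->fundamental, spec, n);
    /* The documentation currently names exactly two fundamental types. */
    if (strcmp(rt->fundamental, "text") != 0 &&
        strcmp(rt->fundamental, "grs") != 0)
        return -1;
    if (!dot1)
        return 0;

    /* Element 2: file read type, between the first and second dot. */
    n = dot2 ? (size_t)(dot2 - (dot1 + 1)) : strlen(dot1 + 1);
    if (n >= sizeof(rt->read_type))
        return -1;
    memcpy(rt->read_type, dot1 + 1, n);
    if (!dot2)
        return 0;

    /* Element 3: arguments, kept whole so they may themselves contain dots. */
    if (strlen(dot2 + 1) >= sizeof(rt->args))
        return -1;
    strcpy(rt->args, dot2 + 1);
    return 0;
}
```

Under these assumptions, consider <tt>grs.regx.usmarc.flt</tt>, where <tt>usmarc.flt</tt> is a hypothetical filter file. It splits into <tt>grs</tt>, <tt>regx</tt> and <tt>usmarc.flt</tt>. Keeping the remainder whole means a filter filename containing dots survives intact. This diff does not state whether Zebra itself expects the <tt>.flt</tt> suffix in the argument.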