<chapter id="fields-and-charsets">
- <!-- $Id: field-structure.xml,v 1.8 2006-11-28 13:05:57 marc Exp $ -->
+ <!-- $Id: field-structure.xml,v 1.10 2006-11-28 14:37:45 marc Exp $ -->
<title>Field Structure and Character Sets
</title>
</listitem></varlistentry>
</variablelist>
</para>
+ <para>
+ The following are three excerpts from the standard
+ <filename>tab/default.idx</filename> configuration file. Notice
+ that <literal>index</literal> and <literal>sort</literal>
+ are grouping directives, each of which binds the directives
+ that follow it:
+ <screen>
+ # Traditional word index
+ # Used if completeness is 'incomplete field' (@attr 6=1) and
+ # structure is word/phrase/word-list/free-form-text/document-text
+ index w
+ completeness 0
+ position 1
+ alwaysmatches 1
+ firstinfield 1
+ charmap string.chr
+
+ ...
+
+ # Null map index (no mapping at all)
+ # Used if structure=key (@attr 4=3)
+ index 0
+ completeness 0
+ position 1
+ charmap @
+
+ ...
+
+ # Sort register
+ sort s
+ completeness 1
+ charmap string.chr
+ </screen>
+ </para>
</section>
<section id="character-map-files">
<para>
The contents of the character map files are structured as follows:
<variablelist>
+ <varlistentry>
+ <term>encoding <replaceable>encoding-name</replaceable></term>
+ <listitem>
+ <para>
+ This directive must appear at the very beginning of the file. It
+ specifies the character encoding used throughout the file. If
+ omitted, the encoding <literal>ISO-8859-1</literal> is assumed.
+ </para>
+ <para>
+ For example, the test file
+ <literal>test/rusmarc/tab/string.chr</literal> contains the following
+ encoding directive:
+ <screen>
+ encoding koi8-r
+ </screen>
+ and the test file
+ <literal>test/charmap/string.utf8.chr</literal> is encoded
+ in UTF-8:
+ <screen>
+ encoding utf-8
+ </screen>
+ </para>
+ </listitem></varlistentry>
<varlistentry>
<term>lowercase <replaceable>value-set</replaceable></term>