1 <chapter id="quick-start">
<title>Quick Start</title>
5 In this section, we will test the system by indexing a small set of sample
6 GILS records that are included with the software distribution. Go to the
7 <literal>test/gils</literal> subdirectory of the distribution archive.
8 There you will find a configuration
9 file named <literal>zebra.cfg</literal> with the following contents:
12 # Where are the YAZ tables located.
13 profilePath: ../../../yaz/tab ../../tab
15 # Files that describe the attribute sets supported.
Now, edit the file and set <literal>profilePath</literal> to the path of the
YAZ profile tables (subdirectory <literal>tab</literal> of the YAZ
distribution archive).
The 48 test records are located in the subdirectory
29 <literal>records</literal>. To index these, type:
32 $ ../../index/zebraidx -t grs.sgml update records
In the command above, the option <literal>-t</literal> specifies the record
type, in this case <literal>grs.sgml</literal>.
39 The word <literal>update</literal> followed
40 by a directory root updates all files below that directory node.
44 If your indexing command was successful, you are now ready to
45 fire up a server. To start a server on port 2100, type:
48 $ ../../index/zebrasrv tcp:@:2100
54 The Zebra index that you have just created has a single database
55 named <literal>Default</literal>.
56 The database contains records structured according to
the GILS profile, and the server will
return records in USMARC, GRS-1, or SUTRS format, depending
on what your client asks for.
63 To test the server, you can use any Z39.50 client (1992 or later).
For instance, you can use the demo client that comes with YAZ: just
cd to the <literal>client</literal> subdirectory of the YAZ distribution and type:
70 $ ./yaz-client tcp:localhost:2100
75 When the client has connected, you can type:
87 The default retrieval syntax for the client is USMARC. To try other
88 formats for the same record, try:
102 <para>You may notice that more fields are returned when your
103 client requests SUTRS or GRS-1 records. When retrieving GILS records,
104 this is normal - not all of the GILS data elements have mappings in
105 the USMARC record format.
109 If you've made it this far, there's a good chance that
110 you've got through the compilation OK.
115 <chapter id="administration">
116 <title>Administrating Zebra</title>
119 Unlike many simpler retrieval systems, Zebra supports safe, incremental
120 updates to an existing index.
124 Normally, when Zebra modifies the index it reads a number of records
Depending on your specifications and on the contents of each record,
one of the following events takes place for each record:
134 The record is indexed as if it never occurred before.
135 Either the Zebra system doesn't know how to identify the record or
136 Zebra can identify the record but didn't find it to be already indexed.
144 The record has already been indexed. In this case
145 either the contents of the record or the location (file) of the record
146 indicates that it has been indexed before.
The record is deleted from the index. As in the
update case, Zebra must be able to identify the record.
Please note that in both the modify and delete cases the Zebra
indexer must be able to generate a unique key that identifies the record in
question (more on this below).
169 To administrate the Zebra retrieval system, you run the
170 <literal>zebraidx</literal> program.
171 This program supports a number of options which are preceded by a dash,
172 and a few commands (not preceded by dash).
176 Both the Zebra administrative tool and the Z39.50 server share a
177 set of index files and a global configuration file. The
178 name of the configuration file defaults to <literal>zebra.cfg</literal>.
179 The configuration file includes specifications on how to index
180 various kinds of records and where the other configuration files
181 are located. <literal>zebrasrv</literal> and <literal>zebraidx</literal>
182 <emphasis>must</emphasis> be run in the directory where the
183 configuration file lives unless you indicate the location of the
184 configuration file by option <literal>-c</literal>.
187 <sect1 id="record-types">
188 <title>Record Types</title>
Indexing is a per-record process in which an insert, modify, or delete
operation occurs. Before a record is indexed, search keys are extracted from
the original record, whatever its layout (SGML, HTML, text, etc.).
The Zebra system currently supports two fundamental types of records:
structured and simple text.
196 To specify a particular extraction process, use either the
197 command line option <literal>-t</literal> or specify a
198 <literal>recordType</literal> setting in the configuration file.
203 <sect1 id="configuration-file">
204 <title>The Zebra Configuration File</title>
The Zebra configuration file, read by <literal>zebraidx</literal> and
<literal>zebrasrv</literal>, defaults to <literal>zebra.cfg</literal>
unless another file is specified with the <literal>-c</literal> option.
You can edit the configuration file with a normal text editor.
Parameter names and values are separated by colons in the file. Lines
starting with a hash sign (<literal>#</literal>) are treated as comments.
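The file syntax described above is simple enough to read with a few lines of code. The following Python sketch is only an illustration of that syntax, not Zebra's own parser; the function name <literal>parse_zebra_cfg</literal> is hypothetical, and splitting on the first colon only is an assumption made so that values such as <literal>/d1:200M</literal> stay intact.

```python
def parse_zebra_cfg(text):
    """Parse 'name: value' lines, skipping blank lines and '#' comments.

    Illustrative only: this is not the parser used by zebraidx or zebrasrv.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'name: value' line; ignored in this sketch
        settings[name.strip()] = value.strip()
    return settings


sample = """\
# Where are the YAZ tables located.
profilePath: ../../../yaz/tab ../../tab
public.recordType: grs.sgml
register: /d1:200M /d2:300M
"""
settings = parse_zebra_cfg(sample)
```

With the sample text above, <literal>settings</literal> holds three entries, and the comment line is ignored.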
If you manage different sets of records that share common
characteristics, you can organize the configuration settings for each
set into a group.
When <literal>zebraidx</literal> is run and you wish to address a
given group, you specify the group name with the <literal>-g</literal>
option.
226 In this case settings that have the group name as their prefix
227 will be used by <literal>zebraidx</literal>.
228 If no <literal>-g</literal> option is specified, the settings
229 without prefix are used.
233 In the configuration file, the group name is placed before the option
234 name itself, separated by a dot (.). For instance, to set the record type
235 for group <literal>public</literal> to <literal>grs.sgml</literal>
236 (the SGML-like format for structured records) you would write:
241 public.recordType: grs.sgml
To set the default value of the record type to <literal>text</literal>,
write <literal>recordType: text</literal>.
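The way group prefixes select a setting can be modelled as a dictionary lookup. This Python sketch is a simplified model, not Zebra's implementation: the function name <literal>get_setting</literal> is hypothetical, and falling back to the unprefixed setting when a group has no value of its own is an assumption based on the description of unprefixed settings as defaults.

```python
def get_setting(settings, name, group=None):
    """Look up a setting, preferring the group-prefixed form.

    Assumption: when the group has no entry of its own, the unprefixed
    (default) setting is used instead.
    """
    if group is not None:
        prefixed = f"{group}.{name}"
        if prefixed in settings:
            return settings[prefixed]
    return settings.get(name)


settings = {
    "public.recordType": "grs.sgml",  # applies to group 'public' only
    "recordType": "text",             # default for everything else
}
public_type = get_setting(settings, "recordType", group="public")
default_type = get_setting(settings, "recordType")
```

Here <literal>public_type</literal> is <literal>grs.sgml</literal>, while a lookup without a group yields <literal>text</literal>.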
257 The available configuration settings are summarized below. They will be
258 explained further in the following sections.
266 <emphasis>group</emphasis>
267 .recordType[<emphasis>.name</emphasis>]
271 Specifies how records with the file extension
272 <emphasis>name</emphasis> should be handled by the indexer.
273 This option may also be specified as a command line option
274 (<literal>-t</literal>). Note that if you do not specify a
275 <emphasis>name</emphasis>, the setting applies to all files.
276 In general, the record type specifier consists of the elements (each
277 element separated by dot), <emphasis>fundamental-type</emphasis>,
278 <emphasis>file-read-type</emphasis> and arguments. Currently, two
279 fundamental types exist, <literal>text</literal> and
280 <literal>grs</literal>.
285 <term><emphasis>group</emphasis>.recordId</term>
288 Specifies how the records are to be identified when updated. See
289 section <xref linkend="locating-records"/>.
294 <term><emphasis>group</emphasis>.database</term>
297 Specifies the Z39.50 database name.
302 <term><emphasis>group</emphasis>.storeKeys</term>
Specifies whether key information should be saved for a given
group of records. If you plan to update or delete records of this
type later, this should be specified as 1; otherwise it
should be 0 (default), to save register space. See section
309 <xref linkend="file-ids"/>.
314 <term><emphasis>group</emphasis>.storeData</term>
317 Specifies whether the records should be stored internally
318 in the Zebra system files.
319 If you want to maintain the raw records yourself,
320 this option should be false (0).
If you want Zebra to take care of the records for you, it
should be true (1).
327 <term>register</term>
330 Specifies the location of the various register files that Zebra uses
331 to represent your databases. See section
332 <xref linkend="register-location"/>.
Enables the <emphasis>safe update</emphasis> facility of Zebra, and
tells the system where to place the required, temporary files.
See section <xref linkend="shadow-registers"/>.
351 Directory in which various lock files are stored.
356 <term>keyTmpDir</term>
Directory in which the temporary files used during a
<literal>zebraidx</literal> update are stored.
365 <term>setTmpDir</term>
368 Specifies the directory that the server uses for temporary result sets.
369 If not specified <literal>/tmp</literal> will be used.
374 <term>profilePath</term>
377 Specifies the location of profile specification files.
385 Specifies the filename(s) of attribute set files for use in
386 searching. At least the Bib-1 set should be loaded
387 (<literal>bib1.att</literal>).
The <literal>profilePath</literal> setting is used to look for
the specified files.
See section <xref linkend="attset-files"/>.
398 Specifies size of internal memory to use for the zebraidx program. The
399 amount is given in megabytes - default is 4 (4 MB).
408 <sect1 id="locating-records">
409 <title>Locating Records</title>
412 The default behaviour of the Zebra system is to reference the
413 records from their original location, i.e. where they were found when you
414 ran <literal>zebraidx</literal>.
That is, when a client wishes to retrieve a record
following a search operation, the files are accessed from the place
where you originally put them. If you remove the files (without
running <literal>zebraidx</literal> again), the client
will receive a diagnostic message.
423 If your input files are not permanent - for example if you retrieve
424 your records from an outside source, or if they were temporarily
425 mounted on a CD-ROM drive,
426 you may want Zebra to make an internal copy of them. To do this,
427 you specify 1 (true) in the <literal>storeData</literal> setting. When
428 the Z39.50 server retrieves the records they will be read from the
429 internal file structures of the system.
434 <sect1 id="simple-indexing">
435 <title>Indexing with no Record IDs (Simple Indexing)</title>
438 If you have a set of records that are not expected to change over time
you can build your database without record IDs.
440 This indexing method uses less space than the other methods and
445 To use this method, you simply omit the <literal>recordId</literal> entry
446 for the group of files that you index. To add a set of records you use
447 <literal>zebraidx</literal> with the <literal>update</literal> command. The
448 <literal>update</literal> command will always add all of the records that it
449 encounters to the index - whether they have already been indexed or
not. If the set of indexed files changes, you should delete all of the
451 index files, and build a new index from scratch.
455 Consider a system in which you have a group of text files called
456 <literal>simple</literal>.
457 That group of records should belong to a Z39.50 database called
458 <literal>textbase</literal>.
459 The following <literal>zebra.cfg</literal> file will suffice:
464 profilePath: /usr/local/yaz
466 simple.recordType: text
467 simple.database: textbase
Since the existing records in an index cannot be addressed by their
474 IDs, it is impossible to delete or modify records when using this method.
479 <sect1 id="file-ids">
480 <title>Indexing with File Record IDs</title>
If you have a set of files that regularly change over time (old files
are deleted, new ones are added, or existing files are modified), you
can benefit from using the <emphasis>file ID</emphasis>
indexing methodology.
487 Examples of this type of database might include an index of WWW
488 resources, or a USENET news spool area.
489 Briefly speaking, the file key methodology uses the directory paths
490 of the individual records as a unique identifier for each record.
491 To perform indexing of a directory with file keys, again, you specify
492 the top-level directory after the <literal>update</literal> command.
The command will recursively traverse the directories and compare
each one with whatever has been indexed before in that same directory.
495 If a file is new (not in the previous version of the directory) it
496 is inserted into the registers; if a file was already indexed and
497 it has been modified since the last update, the index is also
498 modified; if a file has been removed since the last
499 visit, it is deleted from the index.
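The comparison described above amounts to a three-way classification of files. The Python sketch below is a conceptual model only, not Zebra's code: the function name <literal>classify_changes</literal> is hypothetical, and detecting a modification by comparing modification times is an assumption, since the text does not say how Zebra decides that a file has changed.

```python
def classify_changes(previous, current):
    """Classify files as inserted, modified, or deleted.

    previous, current: dicts mapping a file path to its modification time,
    as seen at the last update and now. Using modification times is an
    assumption of this sketch.
    """
    inserted = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] > previous[p]
    )
    return {"insert": inserted, "modify": modified, "delete": deleted}


# Hypothetical snapshots of the same directory at two points in time.
previous = {"records/a.sgml": 100, "records/b.sgml": 100, "records/c.sgml": 100}
current = {"records/a.sgml": 100, "records/b.sgml": 250, "records/d.sgml": 300}
changes = classify_changes(previous, current)
```

In this example <literal>d.sgml</literal> is new, <literal>b.sgml</literal> has changed, <literal>c.sgml</literal> has disappeared, and <literal>a.sgml</literal> is left alone.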
503 The resulting system is easy to administrate. To delete a record you
504 simply have to delete the corresponding file (say, with the
505 <literal>rm</literal> command). And to add records you create new
506 files (or directories with files). For your changes to take effect
507 in the register you must run <literal>zebraidx update</literal> with
508 the same directory root again. This mode of operation requires more
509 disk space than simpler indexing methods, but it makes it easier for
510 you to keep the index in sync with a frequently changing set of data.
511 If you combine this system with the <emphasis>safe update</emphasis>
512 facility (see below), you never have to take your server offline for
513 maintenance or register updating purposes.
517 To enable indexing with pathname IDs, you must specify
518 <literal>file</literal> as the value of <literal>recordId</literal>
519 in the configuration file. In addition, you should set
520 <literal>storeKeys</literal> to <literal>1</literal>, since the Zebra
521 indexer must save additional information about the contents of each record
522 in order to modify the indices correctly at a later time.
For example, to update records of group <literal>esdd</literal>
located below
<literal>/data1/records/</literal>, you should type:
530 $ zebraidx -g esdd update /data1/records
535 The corresponding configuration file includes:
538 esdd.recordType: grs.sgml
<para>You cannot start out with a group of records with simple
indexing (no record IDs as in the previous section) and then later
enable file record IDs. Zebra must know from the first time that you
index the group that
the files should be indexed with file record IDs.
You cannot explicitly delete records when using this method (using the
<literal>delete</literal> command of <literal>zebraidx</literal>). Instead
you have to delete the files from the file system (or move them to a
different location)
and then run <literal>zebraidx</literal> with the
<literal>update</literal> command.
562 <sect1 id="generic-ids">
563 <title>Indexing with General Record IDs</title>
When using this method you construct an (almost) arbitrary, internal
567 record key based on the contents of the record itself and other system
568 information. If you have a group of records that explicitly associates
569 an ID with each record, this method is convenient. For example, the
record format may contain a title or an ID number that is unique within the group.
571 In either case you specify the Z39.50 attribute set and use-attribute
572 location in which this information is stored, and the system looks at
573 that field to determine the identity of the record.
577 As before, the record ID is defined by the <literal>recordId</literal>
578 setting in the configuration file. The value of the record ID specification
579 consists of one or more tokens separated by whitespace. The resulting
580 ID is represented in the index by concatenating the tokens and
581 separating them by ASCII value (1).
585 There are three kinds of tokens:
589 <term>Internal record info</term>
592 The token refers to a key that is
593 extracted from the record. The syntax of this token is
594 <literal>(</literal> <emphasis>set</emphasis> <literal>,</literal>
595 <emphasis>use</emphasis> <literal>)</literal>,
where <emphasis>set</emphasis> is the
attribute set name and <emphasis>use</emphasis> is the
name or value of the attribute.
603 <term>System variable</term>
The system variables are preceded by <literal>$</literal>
611 and immediately followed by the system variable name, which
624 <term>database</term>
627 Current database specified.
644 <term>Constant string</term>
A string used as part of the ID, surrounded
by single or double quotes.
656 For instance, the sample GILS records that come with the Zebra
657 distribution contain a unique ID in the data tagged Control-Identifier.
658 The data is mapped to the Bib-1 use attribute Identifier-standard
659 (code 1007). To use this field as a record id, specify
660 <literal>(bib1,Identifier-standard)</literal> as the value of the
661 <literal>recordId</literal> in the configuration file.
If you have other record types that use the same field for a
different purpose, you might add the record type
(or group or database name) to the record ID of the GILS
records as well, to prevent matches with other types of records.
666 In this case the recordId might be set like this:
669 gils.recordId: $type (bib1,Identifier-standard)
675 (see section <xref linkend="data-model"/>
676 for details of how the mapping between elements of your records and
677 searchable attributes is established).
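To make the structure of a <literal>recordId</literal> specification concrete, here is a Python sketch that expands the three token kinds and joins the results with the ASCII value 1 separator. It is a model of the rules described in this section, not Zebra's implementation: the function name <literal>build_record_id</literal>, the two lookup dictionaries, and the identifier value <literal>ESDD0006</literal> are all hypothetical.

```python
import re

# One alternative per token kind: (set,use), $variable, "string" or 'string'.
TOKEN_RE = re.compile(r"""\(([^,()]+),([^()]+)\)|\$(\w+)|"([^"]*)"|'([^']*)'""")


def build_record_id(spec, record_keys, system_vars):
    """Expand a recordId specification into a single ID string.

    record_keys: dict mapping (set, use) to the key extracted from the record.
    system_vars: dict mapping a system variable name to its current value.
    Both dictionaries stand in for information Zebra would supply itself.
    """
    parts = []
    for match in TOKEN_RE.finditer(spec):
        attr_set, use, var, double_quoted, single_quoted = match.groups()
        if attr_set is not None:
            parts.append(record_keys[(attr_set.strip(), use.strip())])
        elif var is not None:
            parts.append(system_vars[var])
        else:
            parts.append(double_quoted if double_quoted is not None else single_quoted)
    # Tokens are concatenated, separated by the character with ASCII value 1.
    return "\x01".join(parts)


record_id = build_record_id(
    "$type (bib1,Identifier-standard)",
    {("bib1", "Identifier-standard"): "ESDD0006"},  # hypothetical record key
    {"type": "grs.sgml"},
)
```

Because the record type is part of the ID, a record of another type carrying the same identifier would produce a different ID.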
681 As for the file record ID case described in the previous section,
682 updating your system is simply a matter of running
683 <literal>zebraidx</literal>
684 with the <literal>update</literal> command. However, the update with general
685 keys is considerably slower than with file record IDs, since all files
686 visited must be (re)read to discover their IDs.
690 As you might expect, when using the general record IDs
691 method, you can only add or modify existing records with the
692 <literal>update</literal> command.
If you wish to delete records, you must use the
<literal>delete</literal> command, with a directory as a parameter.
This will remove all records that match the files below that root
directory.
701 <sect1 id="register-location">
702 <title>Register Location</title>
705 Normally, the index files that form dictionaries, inverted
706 files, record info, etc., are stored in the directory where you run
707 <literal>zebraidx</literal>. If you wish to store these, possibly large,
708 files somewhere else, you must add the <literal>register</literal>
709 entry to the <literal>zebra.cfg</literal> file.
710 Furthermore, the Zebra system allows its file
711 structures to span multiple file systems, which is useful for
712 managing very large databases.
716 The value of the <literal>register</literal> setting is a sequence
717 of tokens. Each token takes the form:
720 <emphasis>dir</emphasis><literal>:</literal><emphasis>size</emphasis>.
723 The <emphasis>dir</emphasis> specifies a directory in which index files
724 will be stored and the <emphasis>size</emphasis> specifies the maximum
size of all files in that directory. The Zebra indexer system fills
each directory in the order specified and uses the next specified
directories as needed.
728 The <emphasis>size</emphasis> is an integer followed by a qualifier
729 code, <literal>M</literal> for megabytes,
730 <literal>k</literal> for kilobytes.
734 For instance, if you have allocated two disks for your register, and
735 the first disk is mounted
on <literal>/d1</literal> and has 200 MB of free space and the
second, mounted on <literal>/d2</literal>, has 300 MB, you could
738 put this entry in your configuration file:
741 register: /d1:200M /d2:300M
747 Note that Zebra does not verify that the amount of space specified is
748 actually available on the directory (file system) specified - it is
749 your responsibility to ensure that enough space is available, and that
750 other applications do not attempt to use the free space. In a large
production system, it is recommended that you allocate one or more
file systems exclusively to the Zebra register files.
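The <literal>register</literal> value and the fill order can be modelled with a short Python sketch. This is an illustration under stated assumptions, not Zebra's code: the function names are hypothetical, the qualifiers are taken to mean 1024 bytes for <literal>k</literal> and 1024 × 1024 bytes for <literal>M</literal>, and whole files are placed in one directory each, which may be coarser than what Zebra actually does.

```python
def parse_register_spec(spec):
    """Parse 'dir:size dir:size ...' into a list of (dir, bytes) pairs.

    Assumption: 'k' means 1024 bytes and 'M' means 1024 * 1024 bytes.
    """
    units = {"k": 1024, "M": 1024 * 1024}
    areas = []
    for token in spec.split():
        directory, _, size = token.rpartition(":")
        if not directory or size[-1:] not in units or not size[:-1].isdigit():
            raise ValueError(f"bad register token: {token!r}")
        areas.append((directory, int(size[:-1]) * units[size[-1]]))
    return areas


def place_files(areas, file_sizes):
    """Fill each directory in the order given, moving on when one is full."""
    placement = []
    index, used = 0, 0
    for size in file_sizes:
        while index < len(areas) and used + size > areas[index][1]:
            index, used = index + 1, 0  # current area is full, try the next
        if index == len(areas):
            raise RuntimeError("register areas exhausted")
        placement.append(areas[index][0])
        used += size
    return placement


MB = 1024 * 1024
areas = parse_register_spec("/d1:200M /d2:300M")
placement = place_files(areas, [150 * MB, 100 * MB, 100 * MB])
```

With these sizes the first file fits in <literal>/d1</literal>; the second would exceed its 200 MB limit, so it and the third file go to <literal>/d2</literal>.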
757 <sect1 id="shadow-registers">
758 <title>Safe Updating - Using Shadow Registers</title>
761 <title>Description</title>
764 The Zebra server supports <emphasis>updating</emphasis> of the index
765 structures. That is, you can add, modify, or remove records from
766 databases managed by Zebra without rebuilding the entire index.
767 Since this process involves modifying structured files with various
768 references between blocks of data in the files, the update process
769 is inherently sensitive to system crashes, or to process interruptions:
770 Anything but a successfully completed update process will leave the
771 register files in an unknown state, and you will essentially have no
772 recourse but to re-index everything, or to restore the register files
773 from a backup medium.
774 Further, while the update process is active, users cannot be
775 allowed to access the system, as the contents of the register files
776 may change unpredictably.
You can solve these problems by enabling the shadow register system in
Zebra.
782 During the updating procedure, <literal>zebraidx</literal> will temporarily
783 write changes to the involved files in a set of "shadow
784 files", without modifying the files that are accessed by the
785 active server processes. If the update procedure is interrupted by a
786 system crash or a signal, you simply repeat the procedure - the
787 register files have not been changed or damaged, and the partially
written shadow files are automatically deleted before the new updating
procedure begins.
793 At the end of the updating procedure (or in a separate operation, if
794 you so desire), the system enters a "commit mode". First,
795 any active server processes are forced to access those blocks that
796 have been changed from the shadow files rather than from the main
797 register files; the unmodified blocks are still accessed at their
798 normal location (the shadow files are not a complete copy of the
799 register files - they only contain those parts that have actually been
modified). If the commit process is interrupted at any point,
the server processes will continue to access the
802 shadow files until you can repeat the commit procedure and complete
803 the writing of data to the main register files. You can perform
804 multiple update operations to the registers before you commit the
805 changes to the system files, or you can execute the commit operation
806 at the end of each update operation. When the commit phase has
807 completed successfully, any running server processes are instructed to
808 switch their operations to the new, operational register, and the
809 temporary shadow files are deleted.
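The update and commit phases described above can be pictured with a small in-memory model. The Python sketch below is conceptual only and says nothing about Zebra's file formats or locking: the class <literal>ShadowedRegister</literal> and its methods are hypothetical names, and blocks are represented as dictionary entries.

```python
class ShadowedRegister:
    """Toy model of a main register with a shadow area for modified blocks."""

    def __init__(self, blocks):
        self.main = dict(blocks)  # block number -> data (the main register)
        self.shadow = {}          # holds modified blocks only
        self.committing = False

    def write(self, block_no, data):
        # Updates go to the shadow area; the main register is untouched.
        self.shadow[block_no] = data

    def read(self, block_no):
        # Servers use shadow blocks only once commit mode has begun.
        if self.committing and block_no in self.shadow:
            return self.shadow[block_no]
        return self.main[block_no]

    def discard(self):
        # An interrupted update: drop the partially written shadow data.
        if not self.committing:
            self.shadow.clear()

    def begin_commit(self):
        self.committing = True

    def finish_commit(self):
        # Copy modified blocks into the main register, then drop the shadow.
        self.main.update(self.shadow)
        self.shadow.clear()
        self.committing = False


register = ShadowedRegister({0: "a", 1: "b"})
register.write(1, "B")       # servers still read "b" for block 1
register.begin_commit()      # servers now read "B" from the shadow area
register.finish_commit()     # "B" is in the main register, shadow is empty
```

If the process stops between <literal>begin_commit</literal> and <literal>finish_commit</literal>, readers in this model keep seeing the shadow data, which mirrors the behaviour described for an interrupted commit.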
815 <title>How to Use Shadow Register Files</title>
The first step is to allocate space on your system for the shadow
files.
820 You do this by adding a <literal>shadow</literal> entry to the
821 <literal>zebra.cfg</literal> file.
822 The syntax of the <literal>shadow</literal> entry is exactly the
823 same as for the <literal>register</literal> entry
824 (see section <xref linkend="register-location"/>).
825 The location of the shadow area should be
826 <emphasis>different</emphasis> from the location of the main register
827 area (if you have specified one - remember that if you provide no
828 <literal>register</literal> setting, the default register area is the
829 working directory of the server and indexing processes).
833 The following excerpt from a <literal>zebra.cfg</literal> file shows
834 one example of a setup that configures both the main register
835 location and the shadow file area.
836 Note that two directories or partitions have been set aside
837 for the shadow file area. You can specify any number of directories
838 for each of the file areas, but remember that there should be no
839 overlaps between the directories used for the main registers and the
840 shadow files, respectively.
847 shadow: /scratch1:100M /scratch2:200M
853 When shadow files are enabled, an extra command is available at the
854 <literal>zebraidx</literal> command line.
855 In order to make changes to the system take effect for the
856 users, you'll have to submit a "commit" command after a
857 (sequence of) update operation(s).
858 You can ask the indexer to commit the changes immediately
859 after the update operation:
865 $ zebraidx update /d1/records update /d2/more-records commit
871 Or you can execute multiple updates before committing the changes:
877 $ zebraidx -g books update /d1/records update /d2/more-records
$ zebraidx -g fun update /d3/fun-records
$ zebraidx commit
885 If one of the update operations above had been interrupted, the commit
886 operation on the last line would fail: <literal>zebraidx</literal>
887 will not let you commit changes that would destroy the running register.
888 You'll have to rerun all of the update operations since your last
889 commit operation, before you can commit the new changes.
893 Similarly, if the commit operation fails, <literal>zebraidx</literal>
894 will not let you start a new update operation before you have
895 successfully repeated the commit operation.
896 The server processes will keep accessing the shadow files rather
897 than the (possibly damaged) blocks of the main register files
898 until the commit operation has successfully completed.
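If you drive <literal>zebraidx</literal> from a script, the rule that a commit should only follow successful updates can be encoded directly. The Python sketch below is one possible wrapper, not part of Zebra: the function names are hypothetical, only the <literal>-g</literal> option and the <literal>update</literal> and <literal>commit</literal> commands from this manual are used, and it assumes that <literal>zebraidx</literal> is on the search path and signals failure with a non-zero exit status.

```python
import subprocess


def update_command(directories, group=None):
    """Build the argument list for one zebraidx update run."""
    argv = ["zebraidx"]
    if group:
        argv += ["-g", group]
    for directory in directories:
        argv += ["update", directory]
    return argv


def run_updates_then_commit(batches, run=subprocess.run):
    """Run each update; commit only if every update succeeded.

    Assumption: a non-zero exit status means the update failed.
    Returns True when the commit itself succeeded.
    """
    for argv in batches:
        if run(argv).returncode != 0:
            return False  # do not commit; the updates must be rerun first
    return run(["zebraidx", "commit"]).returncode == 0


batches = [
    update_command(["/d1/records", "/d2/more-records"], group="books"),
    update_command(["/d3/fun-records"], group="fun"),
]
# run_updates_then_commit(batches) would execute the commands for real.
```

The <literal>run</literal> parameter exists so that the logic can be tried out with a stand-in function before it is pointed at a real register.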
902 You should be aware that update operations may take slightly longer
903 when the shadow register system is enabled, since more file access
904 operations are involved. Further, while the disk space required for
905 the shadow register data is modest for a small update operation, you
906 may prefer to disable the system if you are adding a very large number
907 of records to an already very large database (we use the terms
908 <emphasis>large</emphasis> and <emphasis>modest</emphasis>
909 very loosely here, since every application will have a
910 different perception of size).
To update the system without the use of the shadow files,
simply run <literal>zebraidx</literal> with the <literal>-n</literal>
option (note that you do not have to execute the
<emphasis>commit</emphasis> command of <literal>zebraidx</literal>
when you temporarily disable the use of the shadow registers in
this fashion).
917 Note also that, just as when the shadow registers are not enabled,
918 server processes will be barred from accessing the main register
919 while the update procedure takes place.
928 <chapter id="zebraidx">
929 <title>Running the Maintenance Interface (zebraidx)</title>
932 The following is a complete reference to the command line interface to
933 the <literal>zebraidx</literal> application.
940 $ zebraidx [options] command [directory] ...
947 <term>-t <replaceable>type</replaceable></term>
950 Update all files as <replaceable>type</replaceable>. Currently, the
951 types supported are <literal>text</literal> and
952 <literal>grs</literal><replaceable>.subtype</replaceable>.
953 If no <replaceable>subtype</replaceable> is provided for the GRS
954 (General Record Structure) type, the canonical input format
955 is assumed (see section <xref linkend="local-representation"/>).
Generally, it is probably advisable to specify the record types
in the <literal>zebra.cfg</literal> file (see section
<xref linkend="record-types"/>), to avoid confusion at
subsequent updates.
964 <term>-c <replaceable>config-file</replaceable></term>
967 Read the configuration file
968 <replaceable>config-file</replaceable> instead of
969 <literal>zebra.cfg</literal>.
974 <term>-g <replaceable>group</replaceable></term>
977 Update the files according to the group
978 settings for <replaceable>group</replaceable> (see section
979 <xref linkend="configuration-file"/>).
984 <term>-d <replaceable>database</replaceable></term>
987 The records located should be associated with the database name
988 <replaceable>database</replaceable> for access through the Z39.50 server.
993 <term>-m <replaceable>mbytes</replaceable></term>
Use <replaceable>mbytes</replaceable> megabytes of memory before flushing
997 keys to background storage. This setting affects performance when
998 updating large databases.
1006 Disable the use of shadow registers for this operation
1007 (see section <xref linkend="shadow-registers"/>).
1015 Show analysis of the indexing process. The maintenance
1016 program works in a read-only mode and doesn't change the state
of the index. This option is very useful when you wish to test a
1031 <term>-v <replaceable>level</replaceable></term>
1034 Set the log level to <replaceable>level</replaceable>.
1035 <replaceable>level</replaceable> should be one of
1036 <literal>none</literal>, <literal>debug</literal>, and
1037 <literal>all</literal>.
1049 <term>update <replaceable>directory</replaceable></term>
1052 Update the register with the files contained in
1053 <replaceable>directory</replaceable>.
1054 If no directory is provided, a list of files is read from
1055 <literal>stdin</literal>.
1056 See section <xref linkend="administration"/>.
1061 <term>delete <replaceable>directory</replaceable></term>
1064 Remove the records corresponding to the files found under
1065 <replaceable>directory</replaceable> from the register.
1073 Write the changes resulting from the last <literal>update</literal>
1074 commands to the register. This command is only available if the use of
1075 shadow register files is enabled (see section
1076 <xref linkend="shadow-registers"/>).
1085 <chapter id="server">
1086 <title>The Z39.50 Server</title>
1088 <sect1 id="zebrasrv">
1089 <title>Running the Z39.50 Server (zebrasrv)</title>
1092 <emphasis remap="bf">Syntax</emphasis>
1095 zebrasrv [options] [listener-address ...]
1101 <emphasis remap="bf">Options</emphasis>
1105 <term>-a <replaceable>APDU file</replaceable></term>
1108 Specify a file for dumping PDUs (for diagnostic purposes).
1109 The special name "-" sends output to <literal>stderr</literal>.
1114 <term>-c <replaceable>config-file</replaceable></term>
1117 Read configuration information from
1118 <replaceable>config-file</replaceable>.
1119 The default configuration is <literal>./zebra.cfg</literal>.
1127 Don't fork on connection requests. This can be useful for
1128 symbolic-level debugging. The server can only accept a single
1129 connection in this mode.
1137 Use the SR protocol.
Use the Z39.50 protocol (default). These two options complement
each other. You can use both multiple times on the same command
1147 line, between listener-specifications (see below). This way, you
1148 can set up the server to listen for connections in both protocols
1149 concurrently, on different local ports.
1154 <term>-l <replaceable>logfile</replaceable></term>
1157 Specify an output file for the diagnostic messages.
1158 The default is to write this information to <literal>stderr</literal>.
1163 <term>-v <replaceable>log-level</replaceable></term>
1166 The log level. Use a comma-separated list of members of the set
1167 {fatal,debug,warn,log,all,none}.
1172 <term>-u <replaceable>username</replaceable></term>
1175 Set user ID. Sets the real UID of the server process to that of the
1176 given <replaceable>username</replaceable>.
It's useful if you aren't comfortable with having the
server run as root, but you need to start it as such to bind a
privileged port.
1184 <term>-w <replaceable>working-directory</replaceable></term>
1187 Change working directory.
1195 Run under the Internet superserver, <literal>inetd</literal>.
1196 Make sure you use the logfile option <literal>-l</literal> in
1197 conjunction with this mode and specify the <literal>-l</literal>
1198 option before any other options.
1203 <term>-t <replaceable>timeout</replaceable></term>
1206 Set the idle session timeout (default 60 minutes).
1211 <term>-k <replaceable>kilobytes</replaceable></term>
Set the (approximate) maximum size of
present response messages. Default is 1024 KB (1 MB).
1223 A <replaceable>listener-address</replaceable> consists of a transport
1224 mode followed by a colon (:) followed by a listener address.
1225 The transport mode is either <literal>ssl</literal> or
1226 <literal>tcp</literal>.
1230 For TCP, an address has the form
1236 hostname | IP-number [: portnumber]
1242 The port number defaults to 210 (standard Z39.50 port).
1254 ssl:secure.lib.com:3000
1260 In both cases, the special hostname "@" is mapped to
1261 the address INADDR_ANY, which causes the server to listen on any local
1262 interface. To start the server listening on the registered port for
1263 Z39.50, and to drop root privileges once the ports are bound, execute
1264 the server like this (from a root shell):
1270 zebrasrv -u daemon tcp:@
1276 You can replace <literal>daemon</literal> with another user, e.g.
1277 your own account, or a dedicated IR server account.
1281 The default behavior for <literal>zebrasrv</literal> is to establish
1282 a single TCP/IP listener, for the Z39.50 protocol, on port 9999.
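The listener-address rules above can be sketched in Python. This is an illustration only, not the way zebrasrv parses its arguments: the function name parse_listener is hypothetical, and mapping "@" to "0.0.0.0" is an assumption that stands in for INADDR_ANY on IPv4.

```python
DEFAULT_PORT = 210  # standard Z39.50 port, used when no port is given


def parse_listener(spec):
    """Split 'mode:host[:port]' into a (mode, host, port) tuple."""
    mode, sep, address = spec.partition(":")
    if not sep or mode not in ("tcp", "ssl"):
        raise ValueError("transport mode must be 'tcp' or 'ssl': %r" % spec)
    host, _, port = address.partition(":")
    if not host:
        raise ValueError("missing hostname or IP number: %r" % spec)
    if host == "@":
        host = "0.0.0.0"  # assumption: IPv4 spelling of INADDR_ANY
    return mode, host, int(port) if port else DEFAULT_PORT


print(parse_listener("tcp:@:2100"))
print(parse_listener("ssl:secure.lib.com:3000"))
print(parse_listener("tcp:@"))
```

Note that the port 210 default applies only when a listener is given without a port number; when no listener is given at all, the server falls back to port 9999 as described above.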
1287 <sect1 id="protocol-support">
1288 <title>Z39.50 Protocol Support and Behavior</title>
1291 <title>Initialization</title>
1294 During initialization, the server will negotiate to version 3 of the
1295 Z39.50 protocol, and the option bits for Search, Present, Scan,
1296 NamedResultSets, and concurrentOperations will be set, if requested by
1297 the client. The maximum PDU size is negotiated down to a maximum of
1304 <title>Search</title>
1307 The supported query types are 1 and 101. All operators are currently
1308 supported with the restriction that only proximity units of type "word"
1309 are supported for the proximity operator.
1310 Queries can be arbitrarily complex.
1311 Named result sets are supported, and result sets can be used as operands
1312 without limitations.
1313 Searches may span multiple databases.
1317 The server has full support for piggy-backed present requests (see
1318 also the following section).
1322 <emphasis>Use</emphasis> attributes are interpreted according to the
1323 attribute sets which have been loaded in the
1324 <literal>zebra.cfg</literal> file, and are matched against specific
1325 fields as specified in the <literal>.abs</literal> file which
1326 describes the profile of the records which have been loaded.
1327 If no Use attribute is provided, a default of Bib-1 Any is assumed.
1331 If a <emphasis>Structure</emphasis> attribute of
1332 <emphasis>Phrase</emphasis> is used in conjunction with a
1333 <emphasis>Completeness</emphasis> attribute of
1334 <emphasis>Complete (Sub)field</emphasis>, the term is matched
1335 against the contents of the phrase (long word) register, if one
1336 exists for the given <emphasis>Use</emphasis> attribute.
1337 A phrase register is created for those fields in the
1338 <literal>.abs</literal> file that contain a
1339 <literal>p</literal>-specifier.
1343 If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
1344 used in conjunction with <emphasis>Incomplete Field</emphasis> (the
1345 default value for <emphasis>Completeness</emphasis>), the
1346 search is directed against the normal word registers, but if the term
1347 contains multiple words, the term will only match if all of the words
1348 are found immediately adjacent, and in the given order.
1349 The word search is performed on those fields that are indexed as
1350 type <literal>w</literal> in the <literal>.abs</literal> file.
1354 If the <emphasis>Structure</emphasis> attribute is
1355 <emphasis>Word List</emphasis>,
1356 <emphasis>Free-form Text</emphasis>, or
1357 <emphasis>Document Text</emphasis>, the term is treated as a
1358 natural-language, relevance-ranked query.
1359 This search type uses the word register, i.e. those fields
1360 that are indexed as type <literal>w</literal> in the
1361 <literal>.abs</literal> file.
1365 If the <emphasis>Structure</emphasis> attribute is
1366 <emphasis>Numeric String</emphasis>, the term is treated as an integer.
1367 The search is performed on those fields that are indexed
1368 as type <literal>n</literal> in the <literal>.abs</literal> file.
1372 If the <emphasis>Structure</emphasis> attribute is
1373 <emphasis>URx</emphasis>, the term is treated as a URX (URL) entity.
1374 The search is performed on those fields that are indexed as type
1375 <literal>u</literal> in the <literal>.abs</literal> file.
1379 If the <emphasis>Structure</emphasis> attribute is
1380 <emphasis>Local Number</emphasis>, the term is treated as a
1381 native Zebra record identifier.
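The Structure rules above amount to a small lookup from attribute values to registers. The following Python sketch restates them; it is a summary of the text, not Zebra's internal logic, and the function name select_register as well as the "record-id" label are hypothetical. The phrase register is only used if one exists for the given Use attribute, which the sketch does not check.

```python
RANKED = ("Word List", "Free-form Text", "Document Text")


def select_register(structure, completeness="Incomplete Field"):
    """Return (register, ranked) for a Structure/Completeness pair."""
    if structure == "Phrase":
        if completeness == "Complete (Sub)field":
            return ("p", False)      # phrase (long word) register
        return ("w", False)          # word register, words adjacent and in order
    if structure in RANKED:
        return ("w", True)           # natural-language, relevance-ranked query
    if structure == "Numeric String":
        return ("n", False)          # term treated as an integer
    if structure == "URx":
        return ("u", False)
    if structure == "Local Number":
        return ("record-id", False)  # hypothetical label for the record identifier
    raise ValueError("structure not covered by this sketch: %r" % structure)


print(select_register("Phrase", "Complete (Sub)field"))
print(select_register("Free-form Text"))
```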
1385 If the <emphasis>Relation</emphasis> attribute is
1386 <emphasis>Equals</emphasis> (default), the term is matched
1387 in a normal fashion (modulo truncation and processing of
1388 individual words, if required).
1389 If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
1390 <emphasis>Less Than or Equal</emphasis>,
1391 <emphasis>Greater Than</emphasis>, or <emphasis>Greater Than or
1392 Equal</emphasis>, the term is assumed to be numerical, and a
1393 standard regular expression is constructed to match the given
1395 If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
1396 the standard natural-language query processor is invoked.
1400 For the <emphasis>Truncation</emphasis> attribute,
1401 <emphasis>No Truncation</emphasis> is the default.
1402 <emphasis>Left Truncation</emphasis> is not supported.
1403 <emphasis>Process #</emphasis> is supported, as is
1404 <emphasis>Regxp-1</emphasis>.
1405 <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
1406 search. By default, a single error (deletion, insertion,
1407 replacement) is accepted when terms are matched against the register
1412 <title>Regular expressions</title>
1415 Each term in a query is interpreted as a regular expression if
1416 the truncation value is either <emphasis>Regxp-1</emphasis> (102)
1417 or <emphasis>Regxp-2</emphasis> (103).
1418 Both query types follow the same syntax with the operands:
1425 Matches the character <emphasis>x</emphasis>.
1433 Matches any character.
1438 <term><literal>[</literal>..<literal>]</literal></term>
1441 Matches the set of characters specified;
1442 such as <literal>[abc]</literal> or <literal>[a-c]</literal>.
1454 Matches <emphasis>x</emphasis> zero or more times. Priority: high.
1462 Matches <emphasis>x</emphasis> one or more times. Priority: high.
1470 Matches <emphasis>x</emphasis> zero times or once. Priority: high.
1478 Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
1484 <term>x|y</term>
1487 Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
1493 The order of evaluation may be changed by using parentheses.
1497 If the first character of the <emphasis>Regxp-2</emphasis> query
1498 is a plus character (<literal>+</literal>) it marks the
1499 beginning of a section with non-standard specifiers.
1500 The next plus character marks the end of the section.
1501 Currently Zebra only supports one specifier, the error tolerance,
1502 which consists of one digit.
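To make the error-tolerance specifier concrete, here is a minimal Python sketch. It assumes the term is a literal string rather than a full regular expression, and it uses the classic Levenshtein distance to count deletions, insertions and replacements; Zebra's own matcher is not shown in this manual, so the function names and the algorithm are illustrative assumptions.

```python
def split_tolerance(term, default=1):
    """Return (max_errors, pattern) for a term such as '+2+retrieval'."""
    if term.startswith("+"):
        end = term.find("+", 1)
        if end == -1:
            raise ValueError("unterminated specifier section: %r" % term)
        spec = term[1:end]
        if len(spec) != 1 or spec not in "0123456789":
            raise ValueError("error tolerance must be one digit: %r" % term)
        return int(spec), term[end + 1:]
    return default, term


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # replace, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_equal(term, word):
    """True if word is within the term's error tolerance."""
    max_errors, pattern = split_tolerance(term)
    return edit_distance(pattern, word) <= max_errors


print(fuzzy_equal("informaton", "information"))
print(fuzzy_equal("+2+retreival", "retrieval"))
```

With the default tolerance of one error, a swapped pair of letters does not match, because a transposition counts as two replacements in this distance measure.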
1506 Since the plus operator is normally a suffix operator, the addition to
1507 the query syntax doesn't violate the syntax for standard regular expressions.
1514 <title>Query examples</title>
1517 Phrase search for <emphasis>information retrieval</emphasis> in
1520 @attr 1=4 "information retrieval"
1525 Ranked search for the same thing:
1527 @attr 1=4 @attr 2=102 "Information retrieval"
1532 Phrase search with a regular expression:
1534 @attr 1=4 @attr 5=102 "informat.* retrieval"
1539 Ranked search with a regular expression:
1541 @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
1546 In the GILS schema (<literal>gils.abs</literal>), the
1547 west-bounding-coordinate is indexed as type <literal>n</literal>,
1548 and is therefore searched by specifying
1549 <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
1550 To match all those records with west-bounding-coordinate greater
1551 than -114 we use the following query:
1553 @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
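The numbers in these queries are Bib-1 attribute types: 1 is Use, 2 is Relation, 4 is Structure and 5 is Truncation. The following Python sketch assembles the same query strings from named attributes, which can make them easier to read. The helper pqf is hypothetical and is not part of Zebra or YAZ; it does not escape quotes inside terms.

```python
ATTRIBUTE_TYPES = {"use": 1, "relation": 2, "structure": 4, "truncation": 5}


def pqf(term, *attrs):
    """Build a prefix query string.

    Each attribute is (type_name, value) or (type_name, value, attribute_set).
    """
    parts = []
    for name, value, *attrset in attrs:
        prefix = "@attr %s " % attrset[0] if attrset else "@attr "
        parts.append("%s%d=%d" % (prefix, ATTRIBUTE_TYPES[name], value))
    if " " in term:
        term = '"%s"' % term  # multi-word terms are quoted
    return " ".join(parts + [term])


# Phrase search in the title (Use=4).
print(pqf("information retrieval", ("use", 4)))
# Numeric search: Structure=109, Relation=5 (greater than), GILS Use=2038.
print(pqf("-114", ("structure", 109), ("relation", 5), ("use", 2038, "gils")))
```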
1560 <title>Present</title>
1562 The present facility is supported in a standard fashion. The requested
1563 record syntax is matched against the ones supported by the profile of
1564 each record retrieved. If no record syntax is given, SUTRS is the
1565 default. The requested element set name, again, is matched against any
1566 provided by the relevant record profiles.
1572 The attribute combinations provided with the termListAndStartPoint are
1573 processed in the same way as operands in a query (see above).
1574 Currently, only the term and the globalOccurrences are returned with
1575 the termInfo structure.
1582 Z39.50 specifies three different types of sort criteria.
1583 Of these, Zebra supports the attribute specification type, in which
1584 case the use attribute specifies the "Sort register".
1585 Sort registers are created for those fields that are of type "sort" in
1586 the default.idx file.
1587 The corresponding character mapping file in default.idx specifies the
1588 ordinal of each character used in the actual sort.
1592 Z39.50 allows the client to specify sorting on one or more input
1593 result sets and one output result set.
1594 Zebra supports sorting on one result set only which may or may not
1595 be the same as the output result set.
1599 <title>Close</title>
1601 If a Close PDU is received, the server will respond with a Close PDU
1602 with reason=FINISHED, no matter which protocol version was negotiated
1603 during initialization. If the protocol version is 3 or more, the
1604 server will generate a Close PDU under certain circumstances,
1605 including a session timeout (60 minutes by default), and certain kinds of
1606 protocol errors. Once a Close PDU has been sent, the protocol
1607 association is considered broken, and the transport connection will be
1608 closed immediately upon receipt of further data, or following a short
1615 <chapter id="record-model">
1616 <title>The Record Model</title>
1619 The Zebra system is designed to support a wide range of data management
1620 applications. The system can be configured to handle virtually any
1621 kind of structured data. Each record in the system is associated with
1622 a <emphasis>record schema</emphasis> which lends context to the data
1623 elements of the record.
1624 Any number of record schemas can coexist in the system.
1625 Although it may be wise to use only a single schema within
1626 one database, the system imposes no such restriction.
1630 The record model described in this chapter applies to the fundamental,
1632 record type <literal>grs</literal> as introduced in
1633 section <xref linkend="record-types"/>.
1637 Records pass through three different states during processing in the
1647 When records are accessed by the system, they are represented
1648 in their local, or native format. This might be SGML or HTML files,
1649 News or Mail archives, or MARC records. If the system doesn't already
1650 know how to read the type of data you need to store, you can set up an
1651 input filter by preparing conversion rules based on regular
1652 expressions and possibly augmented by a flexible scripting language
1654 The input filter produces as output an internal representation:
1661 When records are processed by the system, they are represented
1662 in a tree-structure, constructed by tagged data elements hanging off a
1663 root node. The tagged elements may contain data or yet more tagged
1664 elements in a recursive structure. The system performs various
1665 actions on this tree structure (indexing, element selection, schema
1673 Before transmitting records to the client, they are first
1674 converted from the internal structure to a form suitable for exchange
1675 over the network - according to the Z39.50 standard.
1683 <sect1 id="local-representation">
1684 <title>Local Representation</title>
1687 As mentioned earlier, Zebra places few restrictions on the type of
1688 data that you can index and manage. Generally, whatever the form of
1689 the data, it is parsed by an input filter specific to that format, and
1690 turned into an internal structure that Zebra knows how to handle. This
1691 process takes place whenever the record is accessed - for indexing and
1696 The RecordType parameter in the <literal>zebra.cfg</literal> file, or
1697 the <literal>-t</literal> option to the indexer tells Zebra how to
1698 process input records.
1699 Two basic types of processing are available - raw text and structured
1700 data. Raw text is just that, and it is selected by providing the
1701 argument <emphasis>text</emphasis> to Zebra. Structured records are
1702 all handled internally using the basic mechanisms described in the
1703 subsequent sections.
1704 Zebra can read structured records in many different formats.
1705 How this is done is governed by additional parameters after the
1706 "grs" keyword, separated by "." characters.
1710 Three basic subtypes of the <emphasis>grs</emphasis> type are
1711 currently available:
1717 <term>grs.sgml</term>
1720 This is the canonical input format —
1721 described below. It is a simple SGML-like syntax.
1726 <term>grs.regx.<emphasis>filter</emphasis></term>
1729 This enables a user-supplied input
1730 filter. The mechanisms of these filters are described below.
1735 <term>grs.marc.<emphasis>abstract syntax</emphasis></term>
1738 This allows Zebra to read
1739 records in the ISO2709 (MARC) encoding standard. In this case, the
1740 last parameter <emphasis>abstract syntax</emphasis> names the
1741 <literal>.abs</literal> file (see below)
1742 which describes the specific MARC structure of the input record as
1743 well as the indexing rules.
1751 <title>Canonical Input Format</title>
1754 Although input data can take any form, it is sometimes useful to
1755 describe the record processing capabilities of the system in terms of
1756 a single, canonical input format that gives access to the full
1757 spectrum of structure and flexibility in the system. In Zebra, this
1758 canonical format is an "SGML-like" syntax.
1762 To use the canonical format specify <literal>grs.sgml</literal> as
1767 Consider a record describing an information resource (such a record is
1768 sometimes known as a <emphasis>locator record</emphasis>).
1769 It might contain a field describing the distributor of the
1770 information resource, which might in turn be partitioned into
1771 various fields providing details about the distributor, like this:
1777 <Distributor>
1778 <Name> USGS/WRD </Name>
1779 <Organization> USGS/WRD </Organization>
1780 <Street-Address>
1781 U.S. GEOLOGICAL SURVEY, 505 MARQUETTE, NW
1782 </Street-Address>
1783 <City> ALBUQUERQUE </City>
1784 <State> NM </State>
1785 <Zip-Code> 87102 </Zip-Code>
1786 <Country> USA </Country>
1787 <Telephone> (505) 766-5560 </Telephone>
1788 </Distributor>
1795 The indentation above serves to illustrate how Zebra
1796 interprets the markup. The indentation, in itself, has no
1797 significance to the parser for the canonical input format, which
1798 discards superfluous whitespace.
1802 The keywords surrounded by <...> are
1803 <emphasis>tags</emphasis>, while the sections of text
1804 in between are the <emphasis>data elements</emphasis>.
1805 A data element is characterized by its location in the tree
1806 that is made up by the nested elements.
1807 Each element is terminated by a closing tag, beginning
1808 with <literal><</literal>/ and containing the same symbolic
1809 tag-name as the corresponding opening tag.
1810 The general closing tag, <literal><</literal>/>,
1811 terminates the element started by the last opening tag. The
1812 structuring of elements is significant.
1813 The element <emphasis>Telephone</emphasis>,
1814 for instance, may be indexed and presented to the client differently,
1815 depending on whether it appears inside the
1816 <emphasis>Distributor</emphasis> element, or some other,
1817 structured data element such as a <emphasis>Supplier</emphasis> element.
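The tag rules above can be sketched as a small stack-based parser in Python. It is a simplified illustration, not Zebra's reader: it handles named closing tags and the general closing tag, collapses superfluous whitespace, and ignores variant tags. The function name parse_canonical and the tuple layout of the result are assumptions made for this example.

```python
import re

# A token is either a tag (<name>, </name>, </>) or a run of text.
TOKEN = re.compile(r"<(/?)([^<>]*)>|([^<>]+)")


def parse_canonical(text):
    """Return a list of (tag, children) nodes; data elements are strings."""
    root = ("", [])
    stack = [root]
    for closing, name, data in TOKEN.findall(text):
        name = name.strip()
        if data:
            data = " ".join(data.split())  # discard superfluous whitespace
            if data:
                stack[-1][1].append(data)
        elif closing:
            # '</>' closes the last opened element; '</name>' must match it.
            if len(stack) == 1:
                raise ValueError("closing tag without an open element")
            if name and name != stack[-1][0]:
                raise ValueError("expected </%s>, got </%s>" % (stack[-1][0], name))
            stack.pop()
        else:
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
    if len(stack) != 1:
        raise ValueError("unclosed element: %s" % stack[-1][0])
    return root[1]


record = "<Distributor> <Name> USGS/WRD </Name> <City> ALBUQUERQUE </> </Distributor>"
print(parse_canonical(record))
```

Because each data element is stored under its parent node, the position of an element such as Telephone within the tree stays available to later processing steps.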
1821 <title>Record Root</title>
1824 The first tag in a record describes the root node of the tree that
1825 makes up the total record. In the canonical input format, the root tag
1826 should contain the name of the schema that lends context to the
1827 elements of the record (see section
1828 <xref linkend="internal-representation"/>).
1829 The following is a GILS record that
1830 contains only a single element. Strictly speaking, that makes it an
1831 illegal GILS record, since the GILS profile includes several mandatory
1832 elements. However, Zebra does not validate the contents of a record
1833 against the Z39.50 profile; it merely attempts to match up elements
1834 of a local representation with the given schema:
1841 <title>Zen and the Art of Motorcycle Maintenance</title>
1850 <title>Variants</title>
1853 Zebra allows you to provide individual data elements in a number of
1854 <emphasis>variant forms</emphasis>. Examples of variant forms are
1855 textual data elements which might appear in different languages, and
1856 images which may appear in different formats or layouts.
1857 The variant system in Zebra is essentially a representation of
1858 the variant mechanism of Z39.50-1995.
1862 The following is an example of a title element which occurs in two
1863 different languages.
1870 <var lang lang "eng">
1871 Zen and the Art of Motorcycle Maintenance</>
1872 <var lang lang "dan">
1873 Zen og Kunsten at Vedligeholde en Motorcykel</>
1880 The syntax of the <emphasis>variant element</emphasis> is
1881 <literal><var class type value></literal>.
1882 The available values for the <emphasis>class</emphasis> and
1883 <emphasis>type</emphasis> fields are given by the variant set
1884 that is associated with the current schema
1885 (see section <xref linkend="variant-set"/>).
1889 Variant elements are terminated by the general end-tag </>, by
1890 the variant end-tag </var>, by the appearance of another variant
1891 tag with the same <emphasis>class</emphasis> and
1892 <emphasis>value</emphasis> settings, or by the
1893 appearance of another, normal tag. In other words, the end-tags for
1894 the variants used in the example above could have been omitted.
1898 Variant elements can be nested. The element
1905 <var lang lang "eng"><var body iana "text/plain">
1906 Zen and the Art of Motorcycle Maintenance
1913 Associates two variant components to the variant list for the title
1918 Given the nesting rules described above, we could write
1925 <var body iana "text/plain">
1926 <var lang lang "eng">
1927 Zen and the Art of Motorcycle Maintenance
1928 <var lang lang "dan">
1929 Zen og Kunsten at Vedligeholde en Motorcykel
1936 The title element above comes in two variants. Both have the IANA body
1937 type "text/plain", but one is in English, and the other in
1938 Danish. The client, using the element selection mechanism of Z39.50,
1939 can retrieve information about the available variant forms of data
1940 elements, or it can select specific variants based on the requirements
1949 <title>Input Filters</title>
1952 In order to handle general input formats, Zebra allows the
1953 operator to define filters which read individual records in their
1954 native format and produce an internal representation that the system
1959 Input filters are ASCII files, generally with the suffix
1960 <literal>.flt</literal>.
1961 The system looks for the files in the directories given in the
1962 <emphasis>profilePath</emphasis> setting in the
1963 <literal>zebra.cfg</literal> file.
1964 The record type for the filter is
1965 <literal>grs.regx.</literal><emphasis>filter-filename</emphasis>
1966 (fundamental type <literal>grs</literal>, file read
1967 type <literal>regx</literal>, argument
1968 <emphasis>filter-filename</emphasis>).
1972 Generally, an input filter consists of a sequence of rules, where each
1973 rule consists of a sequence of expressions, followed by an action. The
1974 expressions are evaluated against the contents of the input record,
1975 and the actions normally contribute to the generation of an internal
1976 representation of the record.
1980 An expression can be either of the following:
1990 The action associated with this expression is evaluated
1991 exactly once in the lifetime of the application, before any records
1992 are read. It can be used in conjunction with an action that
1993 initializes tables or other resources that are used in the processing
2002 Matches the beginning of the record. It can be used to
2003 initialize variables, etc. Typically, the
2004 <emphasis>BEGIN</emphasis> rule is also used
2005 to establish the root node of the record.
2013 Matches the end of the record - when all of the contents
2014 of the record have been processed.
2019 <term>/pattern/</term>
2022 Matches a string of characters from the input record.
2030 This keyword may only be used between two patterns.
2031 It matches everything between (not including) those patterns.
2039 The action associated with this expression is evaluated
2040 once, before the application terminates. It can be used to release
2041 system resources - typically ones allocated in the
2042 <emphasis>INIT</emphasis> step.
2050 An action is surrounded by curly braces ({...}), and
2051 consists of a sequence of statements. Statements may be separated
2052 by newlines or semicolons (;).
2053 Within actions, the strings that matched the expressions
2054 immediately preceding the action can be referred to as
2055 $0, $1, $2, etc.
2059 The available statements are:
2066 <term>begin <emphasis>type [parameter ... ]</emphasis></term>
2070 data element. The type is one of the following:
2077 Begin a new record. The following parameter should be the
2078 name of the schema that describes the structure of the record, e.g.
2079 <literal>gils</literal> or <literal>wais</literal> (see below).
2080 The <literal>begin record</literal> call should precede
2081 any other use of the <emphasis>begin</emphasis> statement.
2086 <term>element</term>
2089 Begin a new tagged element. The parameter is the
2090 name of the tag. If the tag is not matched anywhere in the tagsets
2091 referenced by the current schema, it is treated as a local string
2097 <term>variant</term>
2100 Begin a new node in a variant tree. The parameters are
2101 <emphasis>class type value</emphasis>.
2113 Create a data element. The concatenated arguments make
2114 up the value of the data element.
2115 The option <literal>-text</literal> signals that
2116 the layout (whitespace) of the data should be retained for
2118 The option <literal>-element</literal>
2119 <emphasis>tag</emphasis> wraps the data up in
2120 the <emphasis>tag</emphasis>.
2121 The use of the <literal>-element</literal> option is equivalent to
2122 preceding the command with a <emphasis>begin
2123 element</emphasis> command, and following
2124 it with the <emphasis>end</emphasis> command.
2129 <term>end <emphasis>[type]</emphasis></term>
2132 Close a tagged element. If no parameter is given,
2133 the last element on the stack is terminated.
2134 The first parameter, if any, is a type name, similar
2135 to the <emphasis>begin</emphasis> statement.
2136 For the <emphasis>element</emphasis> type, a tag
2137 name can be provided to terminate a specific tag.
2145 The following input filter reads a Usenet news file, producing a
2146 record in the WAIS schema. Note that the body of a news posting is
2147 separated from the list of headers by a blank line (or rather a
2148 sequence of two newline characters).
2154 BEGIN { begin record wais }
2156 /^From:/ BODY /$/ { data -element name $1 }
2157 /^Subject:/ BODY /$/ { data -element title $1 }
2158 /^Date:/ BODY /$/ { data -element lastModified $1 }
2160 begin element bodyOfDisplay
2161 begin variant body iana "text/plain"
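For readers who find the rule syntax unfamiliar, the following Python sketch mimics what the filter does to a single news posting. It is only an analogy written for this explanation: Zebra does not execute Python, the function filter_news and the dictionary layout are hypothetical, and the rules here are applied in a fixed order rather than in input order.

```python
import re

# Each rule pairs a header pattern with the element that receives the match.
HEADER_RULES = [
    (re.compile(r"^From:(.*)$", re.MULTILINE), "name"),
    (re.compile(r"^Subject:(.*)$", re.MULTILINE), "title"),
    (re.compile(r"^Date:(.*)$", re.MULTILINE), "lastModified"),
]


def filter_news(posting):
    """Turn one news posting into a record in the WAIS schema (as a dict)."""
    # The headers end at the first blank line, i.e. two newline characters.
    headers, _, body = posting.partition("\n\n")
    record = {"schema": "wais", "elements": []}
    for pattern, tag in HEADER_RULES:
        match = pattern.search(headers)
        if match:
            record["elements"].append((tag, match.group(1).strip()))
    # The body keeps its layout, like a data element created with -text.
    record["elements"].append(("bodyOfDisplay", body))
    return record


posting = "From: jane@example.org\nSubject: Hello\nDate: 1 Jan 1999\n\nFirst line\nSecond line\n"
print(filter_news(posting))
```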
2170 If Zebra is compiled with support for Tcl (Tool Command Language)
2171 enabled, the statements described above are supplemented with a complete
2172 scripting environment, including control structures (conditional
2173 expressions and loop constructs), and powerful string manipulation
2174 mechanisms for modifying the elements of a record. Tcl is a popular
2175 scripting environment, with several tutorials available both online
2180 <emphasis>NOTE: Tcl support is not currently available, but will be
2181 included with one of the next alpha or beta releases.</emphasis>
2185 <emphasis>NOTE: Variant support is not currently available in the input
2186 filter, but will be included with one of the next alpha or beta
2187 releases.</emphasis>
2194 <sect1 id="internal-representation">
2195 <title>Internal Representation</title>
2198 When records are manipulated by the system, they're represented in a
2199 tree-structure, with data elements at the leaf nodes, and tags or
2200 variant components at the non-leaf nodes. The root-node identifies the
2201 schema that lends context to the tagging and structuring of the
2202 record. Imagine a simple record, consisting of a 'title' element and
2203 an 'author' element:
2209 TITLE "Zen and the Art of Motorcycle Maintenance"
2211 AUTHOR "Robert Pirsig"
2217 A slightly more complex record would have the author element consist
2218 of two elements, a surname and a first name:
2224 TITLE "Zen and the Art of Motorcycle Maintenance"
2234 The root of the record will refer to the record schema that describes
2235 the structuring of this particular record. The schema defines the
2236 element tags (TITLE, FIRST-NAME, etc.) that may occur in the record, as
2237 well as the structuring (SURNAME should appear below AUTHOR, etc.). In
2238 addition, the schema establishes element set names that are used by
2239 the client to request a subset of the elements of a given record. The
2240 schema may also establish rules for converting the record to a
2241 different schema, by stating, for each element, a mapping to a
2246 <title>Tagged Elements</title>
2249 A data element is characterized by its tag, and its position in the
2250 structure of the record. For instance, while the tag "telephone
2251 number" may be used in different places in a record, we may need to
2252 distinguish between these occurrences, both for searching and
2253 presentation purposes. For instance, while the phone numbers for the
2254 "customer" and the "service provider" are both
2255 instances of the same type of resource (a telephone number), it
2256 is essential that they be kept separate. The record schema provides
2257 the structure of the record, and names each data element (defined by
2258 the sequence of tags - the tag path - by which the element can be
2259 reached from the root of the record).
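The idea of a tag path can be illustrated with a few lines of Python. The sketch below models a record as nested dictionaries, which is a simplification (a dictionary cannot hold repeated tags), and the tag names and phone numbers are made up for the example.

```python
def tag_paths(node, prefix=()):
    """Yield (tag_path, data) for every data element in a nested dict."""
    for tag, child in node.items():
        path = prefix + (tag,)
        if isinstance(child, dict):
            yield from tag_paths(child, path)  # descend into a tagged element
        else:
            yield path, child                  # leaf: a data element


record = {
    "customer": {"telephone": "555-0100"},
    "serviceProvider": {"telephone": "555-0199"},
}

paths = dict(tag_paths(record))
for path, data in paths.items():
    print("/".join(path), "=", data)
```

Both elements carry the tag telephone, but their tag paths differ, so they can be indexed and presented separately.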
2265 <title>Variants</title>
2268 The children of a tag node may be either more tag nodes, a data node
2269 (possibly accompanied by tag nodes),
2270 or a tree of variant nodes. The children of variant nodes are either
2271 more variant nodes or a data node (possibly accompanied by more
2272 variant nodes). Each leaf node, which is normally a
2273 data node, corresponds to a <emphasis>variant form</emphasis> of the
2274 tagged element identified by the tag which parents the variant tree.
2275 The following title element occurs in two different languages:
2281 VARIANT LANG=ENG "War and Peace"
2283 VARIANT LANG=DAN "Krig og Fred"
2289 Which of the two elements is transmitted to the client by the server
2290 depends on the specifications provided by the client, if any.
2294 In practice, each variant node is associated with a triple of class,
2295 type, value, corresponding to the variant mechanism of Z39.50.
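One way to picture variant selection is as a match on (class, type, value) triples. The Python sketch below returns the first variant form that carries every requested triple; the selection rule and the function name select_variant are assumptions made for illustration, since the server's actual selection logic is not described here.

```python
def select_variant(variants, wanted):
    """Return the data of the first variant carrying all wanted triples.

    variants: list of (triples, data) pairs, one per variant form.
    wanted:   set of (class, type, value) triples requested by the client.
    """
    for triples, data in variants:
        if wanted <= set(triples):
            return data
    return None


title = [
    ([("lang", "lang", "eng"), ("body", "iana", "text/plain")], "War and Peace"),
    ([("lang", "lang", "dan"), ("body", "iana", "text/plain")], "Krig og Fred"),
]

print(select_variant(title, {("lang", "lang", "dan")}))
print(select_variant(title, set()))
```

An empty request matches every form, so this sketch falls back to the first variant listed.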
2301 <title>Data Elements</title>
2304 Data nodes have no children (they are always leaf nodes in the record
2310 Documentation needs extension here about types of nodes - numerical,
2311 textual, etc., plus the various types of inclusion notes.
2319 <sect1 id="data-model">
2320 <title>Configuring Your Data Model</title>
2323 The following sections describe the configuration files that govern
2324 the internal management of data records. The system searches for the files
2325 in the directories specified by the <emphasis>profilePath</emphasis>
2326 setting in the <literal>zebra.cfg</literal> file.
2330 <title>The Abstract Syntax</title>
2333 The abstract syntax definition (also known as an Abstract Record
2334 Structure, or ARS) is the focal point of the
2335 record schema description. For a given schema, the ABS file may state any
2336 or all of the following:
2345 The object identifier of the Z39.50 schema associated
2346 with the ARS, so that it can be referred to by the client.
2352 The attribute set (which can possibly be a compound of multiple
2353 sets) which applies in the profile. This is used when indexing and
2354 searching the records belonging to the given profile.
2360 The Tag set (again, this can consist of several different sets).
2361 This is used when reading the records from a file, to recognize the
2362 different tags, and when transmitting the record to the client -
2363 mapping the tags to their numerical representation, if they are
2370 The variant set which is used in the profile. This provides a
2371 vocabulary for specifying the <emphasis>forms</emphasis> of data that appear inside
2378 Element set names, which are a shorthand way for the client to
2379 ask for a subset of the data elements contained in a record. Element
2380 set names, in the retrieval module, are mapped to <emphasis>element
2381 specifications</emphasis>, which contain information equivalent to the
2382 <emphasis>Espec-1</emphasis> syntax of Z39.50.
2388 Map tables, which may specify mappings to
2389 <emphasis>other</emphasis> database profiles, if desired.
2395 Possibly, a set of rules describing the mapping of elements to a
2396 MARC representation.
2403 A list of element descriptions (this is the actual ARS of the
2404 schema, in Z39.50 terms), which lists the ways in which the various
2405 tags can be used and organized hierarchically.
2414 Several of the entries above simply refer to other files, which
2415 describe the given objects.
2421 <title>The Configuration Files</title>
2424 This section describes the syntax and use of the various tables which
2425 are used by the retrieval module.
2429 The number of different file types may appear daunting at first, but
2430 each type corresponds fairly clearly to a single aspect of the Z39.50
2431 retrieval facilities. Further, the average database administrator,
2432 who is simply reusing an existing profile for which tables already
2433 exist, shouldn't have to worry too much about the contents of these tables.
2437 Generally, the files are simple ASCII files, which can be maintained
2438 using any text editor. Blank lines, and lines beginning with a (#) are
2439 ignored. Any characters on a line that follow a (#) are also ignored.
2440 All other lines contain <emphasis>directives</emphasis>, which provide
2441 some setting or value to the system.
2442 Generally, settings are characterized by a single
2443 keyword, identifying the setting, followed by a number of parameters.
2444 Some settings are repeatable (r), while others may occur only once in a
2445 file. Some settings are optional (o), while others again are mandatory (m).
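The general file syntax described above is simple enough to sketch in Python. The parser below strips comments, skips blank lines, and splits each directive into a keyword and its parameters. It is an illustration rather than Zebra's own reader, and the file names in the sample are hypothetical.

```python
def parse_directives(text):
    """Return a list of (keyword, parameters) pairs from a table file."""
    directives = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop everything after '#'
        if not line:
            continue                          # blank or comment-only line
        keyword, *params = line.split()
        directives.append((keyword, params))
    return directives


sample = """\
# hypothetical profile fragment
name gils            # shorthand name for diagnostics

attset gils.att
esetname B gils-b.est
esetname F @
"""

for keyword, params in parse_directives(sample):
    print(keyword, params)
```

Keeping the result as a list rather than a dictionary preserves repeatable settings, such as the two esetname lines in the sample.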
2452 <title>The Abstract Syntax (.abs) Files</title>
2455 The name of this file type is slightly misleading in Z39.50 terms,
2456 since, apart from the actual abstract syntax of the profile, it also
2457 includes most of the other definitions that go into a database
2462 When a record in the canonical, SGML-like format is read from a file
2463 or from the database, the first tag of the file should reference the
2464 profile that governs the layout of the record. If the first tag of the
2465 record is, say, <literal><gils></literal>, the system will look
2466 for the profile definition in the file <literal>gils.abs</literal>.
2467 Profile definitions are cached, so they only have to be read once
2468 during the lifespan of the current process.
2472 When writing your own input filters, the
2473 <emphasis>begin record</emphasis> statement
2474 introduces the profile, and should always be called first thing when
2475 introducing a new record.
2479 The file may contain the following directives:
2486 <term>name <emphasis>symbolic-name</emphasis></term>
2489 (m) This provides a shorthand name or
2490 description for the profile. Mostly useful for diagnostic purposes.
2495 <term>reference <emphasis>OID-name</emphasis></term>
2498 (m) The reference name of the OID for the profile.
2499 The reference names can be found in the <emphasis>util</emphasis>
2500 module of <emphasis>YAZ</emphasis>.
2505 <term>attset <emphasis>filename</emphasis></term>
2508 (m) The attribute set that is used for
2509 indexing and searching records belonging to this profile.
2514 <term>tagset <emphasis>filename</emphasis></term>
2517 (o) The tag set (if any) that describes
2518 the fields of the records.
2523 <term>varset <emphasis>filename</emphasis></term>
2526 (o) The variant set used in the profile.
2531 <term>maptab <emphasis>filename</emphasis></term>
2534 (o,r) This points to a
2535 conversion table that might be used if the client asks for the record
2536 in a different schema from the native one.
2538 </listitem></varlistentry>
2540 <term>marc <emphasis>filename</emphasis></term>
2543 (o) Points to a file containing parameters
2544 for representing the record contents in the ISO2709 syntax. Read the
2545 description of the MARC representation facility below.
2547 </listitem></varlistentry>
2549 <term>esetname <emphasis>name filename</emphasis></term>
2552 (o,r) Associates the
2553 given element set name with an element selection file. If an (@) is
2554 given in place of the filename, this corresponds to a null mapping for
2555 the given element set name.
2557 </listitem></varlistentry>
2559 <term>any <emphasis>tags</emphasis></term>
2562 (o) This directive specifies a list of attributes
2563 which should be appended to the attribute list given for each
2564 element. The effect is to make every single element in the abstract
2565 syntax searchable by way of the given attributes. This directive
2566 provides an efficient way of supporting free-text searching across all
2567 elements. However, it does increase the size of the index
2568 significantly. The attributes can be qualified with a structure, as in
2569 the <emphasis>elm</emphasis> directive below.
2571 </listitem></varlistentry>
2573 <term>elm <emphasis>path name attributes</emphasis></term>
2576 (o,r) Adds an element to the abstract record syntax of the schema.
2577 The <emphasis>path</emphasis> follows the
2578 syntax which is suggested by the Z39.50 document - that is, a sequence
2579 of tags separated by slashes (/). Each tag is given as a
2580 comma-separated pair of tag type and tag value surrounded by parentheses.
2581 The <emphasis>name</emphasis> is the name of the element, and
2582 the <emphasis>attributes</emphasis>
2583 specifies which attributes to use when indexing the element in a
2584 comma-separated list.
2585 A ! in place of the attribute name is equivalent to
2586 specifying an attribute name identical to the element name.
2587 A - in place of the attribute name
2588 specifies that no indexing is to take place for the given element.
2589 The attributes can be qualified with <emphasis>field
2590 types</emphasis> to specify which
2591 character set should govern the indexing procedure for that field.
2592 The same data element may be indexed into several different
2593 fields, using different character set definitions.
2594 See the section <xref linkend="field-structure-and-character-sets"/>.
2595 The default field type is "w" for <emphasis>word</emphasis>.
2597 </listitem></varlistentry>
2603 The mechanism for controlling indexing is not adequate for
2604 complex databases, and will probably be moved into a separate
2605 configuration table eventually.
2610 The following is an excerpt from the abstract syntax file for the GILS
2618 reference GILS-schema
2623 maptab gils-usmarc.map
2627 esetname VARIANT gils-variant.est # for WAIS-compliance
2628 esetname B gils-b.est
2629 esetname G gils-g.est
2634 elm (1,14) localControlNumber Local-number
2635 elm (1,16) dateOfLastModification Date/time-last-modified
2636 elm (2,1) title w:!,p:!
2637 elm (4,1) controlIdentifier Identifier-standard
2638 elm (2,6) abstract Abstract
2639 elm (4,51) purpose !
2640 elm (4,52) originator -
2641 elm (4,53) accessConstraints !
2642 elm (4,54) useConstraints !
2643 elm (4,70) availability -
2644 elm (4,70)/(4,90) distributor -
2645 elm (4,70)/(4,90)/(2,7) distributorName !
2646 elm (4,70)/(4,90)/(2,10) distributorOrganization !
2647 elm (4,70)/(4,90)/(4,2) distributorStreetAddress !
2648 elm (4,70)/(4,90)/(4,3) distributorCity !
2655 <sect2 id="attset-files">
2656 <title>The Attribute Set (.att) Files</title>
2659 This file type describes the <emphasis>Use</emphasis> elements of
2661 It contains the following directives.
2667 <term>name <emphasis>symbolic-name</emphasis></term>
2670 (m) This provides a shorthand name or
2671 description for the attribute set.
2672 Mostly useful for diagnostic purposes.
2674 </listitem></varlistentry>
2676 <term>reference <emphasis>OID-name</emphasis></term>
2679 (m) The reference name of the OID for
2681 The reference names can be found in the <emphasis>util</emphasis>
2682 module of <emphasis>YAZ</emphasis>.
2684 </listitem></varlistentry>
2686 <term>include <emphasis>filename</emphasis></term>
2689 (o,r) This directive is used to
2690 include another attribute set as a part of the current one. This is
2691 used when a new attribute set is defined as an extension to another
2692 set. For instance, many new attribute sets are defined as extensions
2693 to the <emphasis>bib-1</emphasis> set.
2694 This is an important feature of the retrieval
2695 system of Z39.50, as it ensures the highest possible level of
2696 interoperability, as those access points of your database which are
2697 derived from the external set (say, bib-1) can be used even by clients
2698 who are unaware of the new set.
2700 </listitem></varlistentry>
2703 <emphasis>att-value att-name [local-value]</emphasis></term>
2707 repeatable directive introduces a new attribute to the set. The
2708 attribute value is stored in the index (unless a
2709 <emphasis>local-value</emphasis> is
2710 given, in which case this is stored). The name is used to refer to the
2711 attribute from the <emphasis>abstract syntax</emphasis>.
2713 </listitem></varlistentry>
2718 This is an excerpt from the GILS attribute set definition.
2719 Notice how the file describing the <emphasis>bib-1</emphasis>
2720 attribute set is referenced.
2727 reference GILS-attset
2730 att 2001 distributorName
2731 att 2002 indextermsControlled
2733 att 2004 accessConstraints
2734 att 2005 useConstraints
2742 <title>The Tag Set (.tag) Files</title>
2745 This file type defines the tagset of the profile, possibly by
2746 referencing other tag sets (most tag sets, for instance, will include
2747 tagsetG and tagsetM from the Z39.50 specification). The file may
2748 contain the following directives.
2755 <term>name <emphasis>symbolic-name</emphasis></term>
2758 (m) This provides a shorthand name or
2759 description for the tag set. Mostly useful for diagnostic purposes.
2761 </listitem></varlistentry>
2763 <term>reference <emphasis>OID-name</emphasis></term>
2766 (o) The reference name of the OID for the tag set.
2767 The reference names can be found in the <emphasis>util</emphasis>
2768 module of <emphasis>YAZ</emphasis>.
2769 The directive is optional, since not all tag sets
2770 are registered outside of their schema.
2772 </listitem></varlistentry>
2774 <term>type <emphasis>integer</emphasis></term>
2777 (m) The type number of the tagset within the schema
2778 profile (note: this specification really should belong to the .abs
2779 file. This will be fixed in a future release).
2781 </listitem></varlistentry>
2783 <term>include <emphasis>filename</emphasis></term>
2786 (o,r) This directive is used
2787 to include the definitions of other tag sets into the current one.
2789 </listitem></varlistentry>
2791 <term>tag <emphasis>number names type</emphasis></term>
2794 (o,r) Introduces a new tag to the set.
2795 The <emphasis>number</emphasis> is the tag number as used
2796 in the protocol (there is currently no mechanism for
2797 specifying string tags, but this would be quick
2799 The <emphasis>names</emphasis> parameter is a list of names
2800 by which the tag should be recognized in the input file format.
2801 The names should be separated by slashes (/).
2802 The <emphasis>type</emphasis> is the recommended datatype of
2804 It should be one of the following:
2870 </listitem></varlistentry>
2875 The following is an excerpt from the TagsetG definition file.
2886 tag 3 publicationPlace string
2887 tag 4 publicationDate string
2888 tag 5 documentId string
2889 tag 6 abstract string
2891 tag 8 date generalizedtime
2892 tag 9 bodyOfDisplay string
2893 tag 10 organization string
2899 <sect2 id="variant-set">
2900 <title>The Variant Set (.var) Files</title>
2903 The variant set file is a straightforward representation of the
2904 variant set definitions associated with the protocol. At present, only
2905 the <emphasis>Variant-1</emphasis> set is known.
2909 These are the directives allowed in the file.
2916 <term>name <emphasis>symbolic-name</emphasis></term>
2919 (m) This provides a shorthand name or
2920 description for the variant set. Mostly useful for diagnostic purposes.
2922 </listitem></varlistentry>
2924 <term>reference <emphasis>OID-name</emphasis></term>
2927 (o) The reference name of the OID for
2928 the variant set, if one is required. The reference names can be found
2929 in the <emphasis>util</emphasis> module of <emphasis>YAZ</emphasis>.
2931 </listitem></varlistentry>
2933 <term>class <emphasis>integer class-name</emphasis></term>
2936 (m,r) Introduces a new
2937 class to the variant set.
2939 </listitem></varlistentry>
2941 <term>type <emphasis>integer type-name datatype</emphasis></term>
2945 new type to the current class (the one introduced by the most recent
2946 <emphasis>class</emphasis> directive).
2947 The type names belong to the same name space as the one used
2948 in the tag set definition file.
2950 </listitem></varlistentry>
2955 The following is an excerpt from the file describing the variant set
2956 <emphasis>Variant-1</emphasis>.
2967 type 1 variantId octetstring
2972 type 2 z39.50 string
2981 <title>The Element Set (.est) Files</title>
2984 The element set specification files describe a selection of a subset
2985 of the elements of a database record. The element selection mechanism
2986 is equivalent to the one supplied by the <emphasis>Espec-1</emphasis>
2987 syntax of the Z39.50 specification.
2988 In fact, the internal representation of an element set
2989 specification is identical to the <emphasis>Espec-1</emphasis> structure,
2990 and we'll refer you to the description of that structure for most of
2991 the detailed semantics of the directives below.
2996 Not all of the Espec-1 functionality has been implemented yet.
2997 The fields that are mentioned below all work as expected, unless
3003 The directives available in the element set file are as follows:
3009 <term>defaultVariantSetId <emphasis>OID-name</emphasis></term>
3012 (o) If variants are used in
3013 the following, this should provide the name of the variant set used
3014 (it's not currently possible to specify a different set in the
3015 individual variant request). In almost all cases (certainly all
3016 profiles known to us), the name
3017 <literal>Variant-1</literal> should be given here.
3019 </listitem></varlistentry>
3021 <term>defaultVariantRequest <emphasis>variant-request</emphasis></term>
3025 provides a default variant request for
3026 use when the individual element requests (see below) do not contain a
3027 variant request. Variant requests consist of a blank-separated list of
3028 variant components. A variant component is a comma-separated,
3029 parenthesized triple of variant class, type, and value (the two former
3030 values being represented as integers). The value can currently only be
3031 entered as a string (this will change to depend on the definition of
3032 the variant in question). The special value (@) is interpreted as a
3033 null value, however.
3035 </listitem></varlistentry>
3038 <emphasis>path ['variant' variant-request]</emphasis></term>
3041 (o,r) This corresponds to a simple element request
3042 in <emphasis>Espec-1</emphasis>.
3043 The path consists of a sequence of tag-selectors, where each of
3044 these can consist of either:
3051 A simple tag, consisting of a comma-separated type-value pair in
3052 parentheses, possibly followed by a colon (:) followed by an
3053 occurrences-specification (see below). The tag-value can be a number
3054 or a string. If the first character is an apostrophe ('), this
3055 forces the value to be interpreted as a string, even if it
3056 appears to be numerical.
3062 A WildThing, represented as a question mark (?), possibly
3063 followed by a colon (:) followed by an occurrences
3064 specification (see below).
3070 A WildPath, represented as an asterisk (*). Note that the last
3071 element of the path should not be a WildPath (WildPaths don't
3072 work in this version).
3081 The occurrences-specification can be either the string
3082 <literal>all</literal>, the string <literal>last</literal>, or
3083 an explicit value-range. The value-range is represented as
3084 an integer (the starting point), possibly followed by a
3085 plus (+) and a second integer (the number of elements, default
3090 The variant-request has the same syntax as the defaultVariantRequest
3091 above. Note that it may sometimes be useful to give an empty variant
3092 request, simply to disable the default for a specific set of fields
3093 (we aren't certain if this is proper <emphasis>Espec-1</emphasis>,
3094 but it works in this implementation).
3096 </listitem></varlistentry>
3101 The following is an example of an element specification belonging to
3108 simpleelement (1,10)
3109 simpleelement (1,12)
3111 simpleelement (1,14)
3113 simpleelement (4,52)
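<para>
As a further sketch, the hypothetical element set file below combines the
directives described above: a default variant set, a default variant
request, an occurrences-specification, a WildThing, and a per-element
variant request. The tag paths and variant triples are chosen purely for
illustration and are not taken from the distribution, so adapt them to
your own profile before use.
</para>
<screen>
# Hypothetical example, not part of the distribution.
defaultVariantSetId Variant-1
defaultVariantRequest (1,1,@)

# First occurrence only of element (2,1).
simpleelement (2,1):1

# All occurrences of (4,70) and any tag directly below each of them,
# with an explicit variant request that replaces the default.
simpleelement (4,70):all/? variant (2,1,@)
</screen>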
3120 <sect2 id="schema-mapping">
3121 <title>The Schema Mapping (.map) Files</title>
3124 Sometimes, the client might want to receive a database record in
3125 a schema that differs from the native schema of the record. For
3126 instance, a client might only know how to process WAIS records, while
3127 the database record is represented in a more specific schema, such as
3128 GILS. In this module, a mapping of data to one of the MARC formats is
3129 also thought of as a schema mapping (mapping the elements of the
3130 record into fields consistent with the given MARC specification, prior
3131 to actually converting the data to ISO2709). This use of the
3132 object identifier for USMARC as a schema identifier represents an
3133 overloading of the OID which might not be entirely proper. However,
3134 it represents the dual role of schema and record syntax which
3135 is assumed by the MARC family in Z39.50.
3139 <emphasis>NOTE: The schema-mapping functions are so far limited to a
3140 straightforward mapping of elements. This should be extended with
3141 mechanisms for conversions of the element contents, and conditional
3142 mappings of elements based on the record contents.</emphasis>
3146 These are the directives of the schema mapping file format:
3153 <term>targetName <emphasis>name</emphasis></term>
3156 (m) A symbolic name for the target schema
3157 of the table. Useful mostly for diagnostic purposes.
3159 </listitem></varlistentry>
3161 <term>targetRef <emphasis>OID-name</emphasis></term>
3164 (m) An OID name for the target schema.
3165 This is used, for instance, by a server receiving a request to present
3166 a record in a different schema from the native one.
3167 The name, again, is found in the <emphasis>oid</emphasis>
3168 module of <emphasis>YAZ</emphasis>.
3170 </listitem></varlistentry>
3172 <term>map <emphasis>element-name target-path</emphasis></term>
3176 an element mapping rule to the table.
3178 </listitem></varlistentry>
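<para>
No example accompanies this file type in the text above, so the following
is a minimal sketch of what a mapping table might look like. The element
names are borrowed from the GILS excerpt shown earlier. The target name,
the OID name, and in particular the form of the target paths are
assumptions made for illustration, and should be checked against the
<literal>gils-usmarc.map</literal> file in the distribution.
</para>
<screen>
# Hypothetical mapping of two GILS elements to MARC-style fields.
# The target-path notation is assumed, not confirmed.
targetName usmarc
targetRef USmarc

map title    (3,245)/(3,a)
map abstract (3,520)/(3,a)
</screen>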
3185 <title>The MARC (ISO2709) Representation (.mar) Files</title>
3188 This file provides rules for representing a record in the ISO2709
3189 format. The rules pertain mostly to the values of the constant-length
3190 header of the record.
3194 <emphasis>NOTE: This will be described better. We're in the process of
3195 re-evaluating and most likely changing the way that MARC records are
3196 handled by the system.</emphasis>
3201 <sect2 id="field-structure-and-character-sets">
3202 <title>Field Structure and Character Sets
3206 In order to provide a flexible approach to national character set
3207 handling, Zebra allows the administrator to set up the
3208 system to handle any 8-bit character set — including sets that
3209 require multi-octet diacritics or other multi-octet characters. The
3210 definition of a character set includes a specification of the
3211 permissible values, their sort order (this affects the display in the
3212 SCAN function), and relationships between upper- and lowercase
3213 characters. Finally, the definition includes the specification of
3214 space characters for the set.
3218 The operator can define different character sets for different fields,
3219 typical examples being standard text fields, numerical fields, and
3220 special-purpose fields such as WWW-style linkages (URx).
3224 The field types, and hence character sets, are associated with data
3225 elements by the .abs files (see above).
3226 The file <literal>default.idx</literal>
3227 provides the association between field type codes (as used in the .abs
3228 files) and the character map files (with the .chr suffix). The format
3229 of the .idx file is as follows:
3236 <term>index <emphasis>field type code</emphasis></term>
3239 This directive introduces a new search index code.
3240 The argument is a one-character code to be used in the
3241 .abs files to select this particular index type. An index, roughly,
3242 corresponds to a particular structure attribute during search. Refer
3243 to section <xref linkend="search"/>.
3245 </listitem></varlistentry>
3247 <term>sort <emphasis>field type code</emphasis></term>
3250 This directive introduces a
3251 sort index. The argument is a one-character code to be used in the
3252 .abs file to select this particular index type. The corresponding
3253 use attribute must be used in the sort request to refer to this
3254 particular sort index. The corresponding character map (see below)
3255 is used in the sort process.
3257 </listitem></varlistentry>
3259 <term>completeness <emphasis>boolean</emphasis></term>
3262 This directive enables or disables complete field indexing.
3263 The value of the <emphasis>boolean</emphasis> should be 0
3264 (disable) or 1 (enable). If completeness is enabled, the index entry will
3265 contain the complete contents of the field (up to a limit), with words
3266 (non-space characters) separated by single space characters
3267 (normalized to " " on display). When completeness is
3268 disabled, each word is indexed as a separate entry. Complete subfield
3269 indexing is most useful for fields which are typically browsed (e.g.
3270 titles, authors, or subjects), or instances where a match on a
3271 complete subfield is essential (e.g. exact title searching). For fields
3272 where completeness is disabled, the search engine will interpret a
3273 search containing space characters as a word proximity search.
3275 </listitem></varlistentry>
3277 <term>charmap <emphasis>filename</emphasis></term>
3280 This is the filename of the character
3281 map to be used for this index or field type.
3283 </listitem></varlistentry>
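<para>
To illustrate how these directives fit together, here is a minimal sketch
of an .idx file. The codes <literal>w</literal> and <literal>p</literal>
match the field types used in the GILS excerpt above
(<literal>w:!,p:!</literal>); the sort code <literal>s</literal> and the
file name <literal>string.chr</literal> are assumptions made for this
example.
</para>
<screen>
# Word index: each word becomes a separate index entry.
index w
completeness 0
charmap string.chr

# Phrase index: the complete field contents form one entry.
index p
completeness 1
charmap string.chr

# Sort index (code assumed for illustration).
sort s
completeness 1
charmap string.chr
</screen>
<para>
Each <literal>index</literal> or <literal>sort</literal> line opens a new
definition, and the <literal>completeness</literal> and
<literal>charmap</literal> lines that follow apply to it.
</para>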
3288 The contents of the character map files are structured as follows:
3295 <term>lowercase <emphasis>value-set</emphasis></term>
3298 This directive introduces the basic value set of the field type.
3299 The format is an ordered list (without spaces) of the
3300 characters which may occur in "words" of the given type.
3301 The order of the entries in the list determines the
3302 sort order of the index. In addition to single characters, the
3303 following combinations are legal:
3311 Backslashes may be used to introduce three-digit octal, or
3312 two-digit hex representations of single characters
3313 (preceded by <literal>x</literal>).
3314 In addition, the combinations
3315 \\, \\r, \\n, \\t, \\s (space — remember that real
3316 space-characters may not occur in the value definition), and
3317 \\ are recognised, with their usual interpretation.
3323 Curly braces {} may be used to enclose ranges of single
3324 characters (possibly using the escape convention described in the
3325 preceding point), e.g. {a-z} to introduce the
3326 standard range of ASCII characters.
3327 Note that the interpretation of such a range depends on
3328 the concrete representation in your local, physical character set.
3334 Parentheses () may be used to enclose multi-byte characters -
3335 e.g. diacritics or special national combinations (e.g. Spanish
3336 "ll"). When found in the input stream (or a search term),
3337 these characters are viewed and sorted as a single character, with a
3338 sorting value depending on the position of the group in the value
3346 </listitem></varlistentry>
3348 <term>uppercase <emphasis>value-set</emphasis></term>
3351 This directive introduces the
3352 upper-case equivalents to the value set (if any). The number and
3353 order of the entries in the list should be the same as in the
3354 <literal>lowercase</literal> directive.
3356 </listitem></varlistentry>
3358 <term>space <emphasis>value-set</emphasis></term>
3361 This directive introduces the characters
3362 which separate words in the input stream. Depending on the
3363 completeness mode of the field in question, these characters either
3364 terminate an index entry, or delimit individual "words" in
3365 the input stream. The order of the elements is not significant —
3366 otherwise the representation is the same as for the
3367 <literal>uppercase</literal> and <literal>lowercase</literal>
3370 </listitem></varlistentry>
3372 <term>map <emphasis>value-set</emphasis>
3373 <emphasis>target</emphasis></term>
3376 This directive introduces a
3377 mapping between each of the members of the value-set on the left to
3378 the character on the right. The character on the right must occur in
3379 the value set (the <literal>lowercase</literal> directive) of
3380 the character set, but
3381 it may be a parenthesis-enclosed multi-octet character. This directive
3382 may be used to map diacritics to their base characters, or to map
3383 HTML-style character-representations to their natural form, etc.
3385 </listitem></varlistentry>
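<para>
The four directives can be combined as in the following sketch of a
character map file. It is a hypothetical example, not a copy of any file
in the distribution. It treats the Spanish "ll" as a single character
that sorts directly after "l", and folds a few accented letters onto their
base character.
</para>
<screen>
# Hypothetical character map, for illustration only.
# "ll" is one character, sorted between "l" and "m".
lowercase {0-9}{a-k}l(ll){m-z}
uppercase {0-9}{A-K}L(LL){M-Z}

# Control characters, the space character, and some punctuation
# separate words.
space {\001-\040}.,;:!?

# Map accented forms to the base character, which must occur
# in the lowercase set.
map éèê e
</screen>
<para>
Note that the <literal>uppercase</literal> list has the same number of
entries, in the same order, as the <literal>lowercase</literal> list.
</para>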
3393 <sect1 id="formats">
3394 <title>Exchange Formats</title>
3397 Converting records from the internal structure to an exchange format
3398 is largely an automatic process. Currently, the following exchange
3399 formats are supported:
3406 GRS-1. The internal representation is based on GRS-1, so the
3407 conversion here is straightforward. The system will create
3408 applied variant and supported variant lists as required, if a record
3409 contains variant information.
3415 SUTRS. Again, the mapping is fairly straightforward. Indentation
3416 is used to show the hierarchical structure of the record. All
3417 "GRS" type records support both the GRS-1 and SUTRS
3424 ISO2709-based formats (USMARC, etc.). Only records with a
3425 two-level structure (corresponding to fields and subfields) can be
3426 directly mapped to ISO2709. For records with a different structuring
3427 (e.g., GILS), the representation in a structure like USMARC involves a
3428 schema-mapping (see section <xref linkend="schema-mapping"/>), to an
3429 "implied" USMARC schema (implied,
3430 because there is no formal schema which specifies the use of the
3431 USMARC fields outside of ISO2709). The resultant, two-level record is
3432 then mapped directly from the internal representation to ISO2709. See
3433 the GILS schema definition files for a detailed example of this
3440 Explain. This representation is only available for records
3441 belonging to the Explain schema.
3447 Summary. This ASN-1 based structure is only available for records
3448 belonging to the Summary schema - or schemas which provide a mapping
3449 to this schema (see the description of the schema mapping facility
3456 SOIF. Support for this syntax is experimental, and is currently
3457 keyed to a private Index Data OID (1.2.840.10003.5.1000.81.2). All
3458 abstract syntaxes can be mapped to the SOIF format, although nested
3459 elements are represented by concatenation of the tag names at each
3471 <!-- Keep this comment at the end of the file
3476 sgml-minimize-attributes:nil
3477 sgml-always-quote-attributes:t
3480 sgml-parent-document: "zebra.xml"
3481 sgml-local-catalogs: nil
3482 sgml-namecase-general:t