<!doctype linuxdoc system>

<!--
  $Id: zebra.sgml,v 1.47 2000-02-25 11:35:41 adam Exp $
-->

<article>
<title>Zebra Server - Administrator's Guide and Reference
<author><htmlurl url="http://www.indexdata.dk/" name="Index Data">,
<tt><htmlurl url="mailto:info@indexdata.dk" name="info@indexdata.dk"></>
<date>$Revision: 1.47 $
<abstract>

The Zebra server combines a versatile fielded/free-text
indexing/search engine with a Z39.50-1995 frontend to provide a powerful and flexible
information mining tool. This document explains the procedure for
installing and configuring Zebra, and outlines the possibilities
for managing data and providing Z39.50
services with the software. Zebra is a free version of the Index Data Z'mbol
information system, and it excludes some functionality such as incremental
database updating and support for large databases.
</abstract>

<toc>

<sect>Introduction

<sect1>Overview

<p>
Zebra is a fielded free-text indexing and retrieval engine with a
Z39.50 frontend. You can use any commercial or freeware Z39.50 client
to access data stored in Zebra.

The Zebra server can be used at the core of a Z39.50-based information retrieval
framework. We're making
the server available now to allow researchers and small organisations to
share their information in the best possible way. We believe that Z39.50
currently represents one of the best ways of sharing information with others, and
we would like to encourage as many people as possible to do so.

This document is a guide to using Zebra. It will tell you
how to compile the software and how to prepare your first database.
It also explains how the server can be configured to give you the
functionality that you need.

If you find the software interesting, you should join the support
mailing list by sending email to <tt/zebra-request@indexdata.dk/.

If you are interested in running a commercial service, if you wish to run large
databases, or if you wish to make incremental updates to your databases even
while users are accessing your system, then you might be interested in the Z'mbol
Information Server, which is available from <htmlurl
url="http://www.indexdata.dk/zmbol/" name="Index Data"> or Fretwell-Downing
Informatics. Z'mbol is a complete and supported package which offers many
exciting possibilities that we have not been able to fit into this package.

<sect1>Features

<p>
This is a list of some of the most important features of the
system.

<itemize>

<item>
Supports arbitrarily complex records. The base input format is an
XML-like syntax which allows nested (structured) data elements, as
well as variant forms of data.

<item>
Supports arbitrary storage formats. A system of input filters driven by
regular expressions allows you to easily process most ASCII-based
data formats. SGML/XML, ISO2709 (MARC), and raw text are also supported.

<item>
Supports boolean queries as well as relevance-ranking (free-text)
searching. Right truncation and masking in terms are supported, as
well as full regular expressions.

<item>
Supports multiple concrete syntaxes
for record exchange (depending on the configuration): GRS-1, SUTRS,
ISO2709 (*MARC), XML. Records can be mapped between record syntaxes and
schemas on the fly.

<item>
Supports approximate matching in registers (e.g., to tolerate spelling
mistakes).

<item> Supports a subset of the Z39.50 Explain Facility. Zebra's Explain database
is automatically updated when a set of records is loaded into Zebra.

</itemize>

<p>
Protocol support:

<itemize>

<item>
Protocol facilities: Init, Search, Retrieve (Present), Browse (Scan), Sort, Close, and Explain.

<item>
Piggy-backed presents are honored in the search request.

<item>
Named result sets are supported.

<item>
Easily configured to support different application profiles, with
tables for attribute sets, tag sets, and abstract syntaxes.
Additional tables control facilities such as element mappings to
different schemas (e.g., GILS-to-USMARC).

<item>
Complex composition specifications using Espec-1 are partially
supported (simple element requests only).

<item>
Element Set Names are defined using the Espec-1 capability of the
system, and are given in configuration files as simple element
requests (and possibly variant requests).

<item>
Zebra runs on most Unix-like systems as well as Windows NT. A binary
distribution for Windows NT is forthcoming; so far, installing on
Windows NT requires Microsoft Visual C++ to compile the system (we use version 6.0).

</itemize>

<sect1>Future Work

<p>

These are some of the plans that we have for the software in the near
and far future, roughly in order of their relative importance.
Items marked with an asterisk will be implemented before the
last beta release.

<itemize>

<item>
*Complete the support for variants.

<item>
*Finalize the data element <it/include/ facility to support multimedia
data elements in records.

<item>
Add more sophisticated relevance ranking mechanisms. Add support for soundex
and stemming. Add relevance <it/feedback/ support.

<item>
Complete EXPLAIN support.

<item>
We want to add a management system that allows you to
control your databases and configuration tables from a graphical
interface. We'll probably use Tcl/Tk to stay platform-independent.

</itemize>

Programmers thrive on user feedback. If you are interested in a facility that
you don't see mentioned here, or if there's something you think we
could do better, please send us an e-mail. If you think it's all really
neat, you're welcome to drop us a line saying that, too. You'll find
contact info at the end of this file.

<sect>Compiling the software
<p>
You need the
<bf><htmlurl url="http://www.indexdata.dk/yaz/" name="YAZ"></>
package in order to compile this software. We suggest you
unpack <bf/YAZ/ in the same directory as Zebra. Running
<tt>./configure</tt> (UNIX only) and then <tt>make</tt> (<tt>nmake</tt> on WIN32)
is usually all it takes to compile YAZ.

<sect1>UNIX
<p>
An ANSI C compiler is required to compile the Zebra
server system; <tt/gcc/ works very well if your own system doesn't
provide an adequate compiler.

Unpack the distribution archive. The <tt>configure</tt> shell script
attempts to guess correct values for various system-dependent variables
used during compilation. It uses those values to create a <tt>Makefile</tt> in
each directory of Zebra.

To run the configure script, type:
<tscreen><verb>
  ./configure
</verb></tscreen>

The configure script attempts to use the C compiler specified by
the <tt>CC</tt> environment variable. If this is not set, GNU C
will be used if it is available. The <tt>CFLAGS</tt> environment variable
holds options to be passed to the C compiler. If you're using a
Bourne-compatible shell, you may pass something like this:
<tscreen><verb>
  CC=/opt/ccs/bin/cc CFLAGS=-O ./configure
</verb></tscreen>

To customize Zebra, the configure script accepts a set of options. The
most important are:
<descrip>
<tag><tt>-</tt><tt>-prefix </tt>path</tag> Specifies the installation prefix. This is
only needed if you run <tt>make install</tt> later to perform a
"system" installation. The prefix is <tt>/usr/local</tt> if not
specified.
<tag><tt>-</tt><tt>-with-tclconfig=</tt>DIR</tag> If Tcl is installed on
the system, you can tell configure in which directory Tcl's
<tt>tclConfig.sh</tt> is stored. The <tt>tclConfig.sh</tt> file includes
information about the settings required to link with Tcl's libraries.
If you don't specify this option, configure will check whether Tcl's shell
<tt>tclsh</tt> is in your path and, if it is, guess where
the corresponding <tt>tclConfig.sh</tt> is located. If <tt>tclsh</tt> is not found in
your path and this option is not given, Zebra will not include Tcl support.
<tag><tt>-</tt><tt>-with-yazconfig=</tt>DIR</tag> This option allows you to
specify the directory that contains YAZ's <tt>yaz-config</tt>.
This option is useful if you wish to compile Zebra with a specific
version of YAZ. YAZ version 1.5 and later create a script
<tt>yaz-config</tt> that includes information on the compiler settings
needed to link with it.
</descrip>

When configured, build the software by typing:
<tscreen><verb>
  make
</verb></tscreen>

Optionally, you may type <tt>make depend</tt> to create
source file dependencies for the package. This is only needed,
however, if you modify the source code later.

If the build succeeds, two executables are created in the subdirectory
<tt>bin</tt>:
<descrip>
<tag><tt>zebrasrv</tt></tag> The Z39.50 server and search engine.
<tag><tt>zebraidx</tt></tag> The administrative tool for the search index.
</descrip>

<p>
The next step is optional and is only needed if you wish to install
Zebra in system directories such as <tt>/usr/bin</tt>, <tt>/usr/lib</tt>, etc.

To perform this step, type
<tscreen><verb>
  make install
</verb></tscreen>

The executables will be installed in <it>prefix</it>/bin, and profile
tables will be installed in <it>prefix</it>/lib/zebra/tab. Here <it>prefix</it>
is the installation prefix given with <tt>-</tt><tt>-prefix</tt>; the default is <tt>/usr/local</tt>.

<sect1>WIN32

<p>
Zebra is shipped with "makefiles" for the NMAKE tool that comes
with Visual C++.

Start an MS-DOS prompt and change to the subdirectory <tt>WIN</tt>, where
the file <tt>makefile</tt> is located. Customize the installation
by editing the <tt>makefile</tt> file (for example, using WordPad).

The following summarises the most important settings in that file.

<descrip>
<tag><tt>YAZDIR</tt></tag> Specifies where YAZ is located.
<tag><tt>DEBUG</tt></tag> If set to 1, the software is
compiled with debugging libraries. If set to 0, the software
is compiled with release (non-debugging) libraries.
<tag>BZIP2</tag> A group of settings (<tt>BZIP2LIB</tt>, ...)
that must be defined if BZIP2 compression support is desired.
</descrip>

When you are satisfied with the settings in the makefile, type
<tscreen><verb>
nmake
</verb></tscreen>

If compilation was successful, the executables <tt>zebraidx.exe</tt>
and <tt>zebrasrv.exe</tt> are put in the subdirectory <tt>BIN</tt>.

<sect>Quick Start
<p>
In this section, we will test the system by indexing a small set of sample
GILS records that are included with the software distribution. Go to the
<tt>test/gils</tt> subdirectory of the distribution archive. There you will
find a configuration
file named <tt>zebra.cfg</tt> with the following contents:
<tscreen><verb>
# Where the schema files, attribute files, etc. are located.
profilePath: .:../../tab:../../../yaz/tab

# Files that describe the attribute sets supported.
attset: explain.att
attset: bib1.att
attset: gils.att
</verb></tscreen>

Now, edit the file and set <tt>profilePath</tt> so that it includes the path of the
YAZ profile tables (subdirectory <tt>tab</tt> of the YAZ distribution
archive).

The 48 test records are located in the subdirectory <tt>records</tt>.
To index these, type:
<tscreen><verb>
$ ../../bin/zebraidx -t grs.sgml update records
</verb></tscreen>

In the command above, the option <tt>-t</tt> specifies the record
type, in this case <tt>grs.sgml</tt>. The word <tt>update</tt> followed
by a directory root updates all files below that directory node.

If your indexing command was successful, you are now ready to
fire up a server. To start a server on port 2100, type:
<tscreen><verb>
$ ../../bin/zebrasrv tcp:@:2100
</verb></tscreen>

The Zebra index that you have just created has a single database
named <tt/Default/. The database contains records structured according to
the GILS profile, and the server will
return records in XML, USMARC, GRS-1, or SUTRS, depending
on what your client asks for.

To test the server, you can use any Z39.50 client (1992 or later). For
instance, you can use the demo client that comes with YAZ: just cd to
the <tt/client/ subdirectory of the YAZ distribution and type:

<tscreen><verb>
$ ./yaz-client tcp:localhost:2100
</verb></tscreen>

When the client has connected, you can type:

<tscreen><verb>
Z> find surficial
Z> show 1
</verb></tscreen>

The default retrieval syntax for the client is USMARC. To see the same
record in other formats, type:

<tscreen><verb>
Z> format sutrs
Z> show 1
Z> format grs-1
Z> show 1
Z> format xml
Z> show 1
Z> elements B
Z> show 1
</verb></tscreen>

<it>NOTE: You may notice that more fields are returned when your
client requests SUTRS or GRS-1 records. When retrieving GILS records,
this is normal; not all of the GILS data elements have mappings in
the USMARC record format.</it>

If you've made it this far, there's a good chance that
you've got through the compilation OK.

<sect>Administrating Zebra<label id="administrating">

<p>

To administrate Zebra, you run the
<tt>zebraidx</tt> program. This program supports a number of options,
which are preceded by a minus sign, and a few commands, which are not.

Both the Zebra administrative tool and the Z39.50 server share a
set of index files and a global configuration file. The
name of the configuration file defaults to <tt>zebra.cfg</tt>.
The configuration file includes specifications on how to index
various kinds of records and where the other configuration files
are located. <tt>zebrasrv</tt> and <tt>zebraidx</tt> <em>must</em>
be run in the directory where the configuration file lives unless you
indicate the location of the configuration file with the option
<tt>-c</tt>.

<sect1>Record Types<label id="record-types">
<p>
Indexing is a per-record process. Before a record is indexed, search
keys are extracted from it, whatever the layout of the original
record (SGML, HTML, text, etc.).
The Zebra system currently supports two fundamental types of records:
structured and simple text.
To specify a particular extraction process, use either the
command line option <tt>-t</tt> or a
<tt>recordType</tt> setting in the configuration file.

<sect1>The Zebra Configuration File<label id="configuration-file">
<p>
The Zebra configuration file, read by <tt>zebraidx</tt> and
<tt>zebrasrv</tt>, defaults to <tt>zebra.cfg</tt> unless another file
is specified with the <tt>-c</tt> option.

You can edit the configuration file with a normal text editor.
Parameter names and values are separated by colons in the file. Lines
starting with a hash sign (<tt/&num;/) are treated as comments.

If you manage different sets of records that share common
characteristics, you can organize the configuration settings for each
type into &dquot;groups&dquot;.
When <tt>zebraidx</tt> is run and you wish to address a given group,
you specify the group name with the <tt>-g</tt> option. In this case,
settings that have the group name as their prefix will be used
by <tt>zebraidx</tt>. If no <tt/-g/ option is specified, the settings
with no prefix are used.

In the configuration file, the group name is placed before the option
name itself, separated by a dot (.). For instance, to set the record type
for group <tt/public/ to <tt/grs.sgml/ (the SGML-like format for structured
records) you would write:

<tscreen><verb>
public.recordType: grs.sgml
</verb></tscreen>

To set the default value of the record type to <tt/text/, write:

<tscreen><verb>
recordType: text
</verb></tscreen>
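
The rules above (colon-separated <tt>name: value</tt> lines, hash-sign comments, and an optional group prefix) can be illustrated with a small Python sketch. It is not Zebra's own parser: the functions <tt>parse_cfg</tt> and <tt>lookup</tt> are hypothetical, and falling back to the unprefixed setting when a group has no value of its own is an assumption.

```python
# Minimal sketch of reading zebra.cfg-style text under the rules described
# above: "name: value" lines, '#' comment lines, and an optional "group."
# prefix on the name. Illustration only, not Zebra's own parser.


def parse_cfg(text):
    """Map each setting name (possibly group-prefixed) to its list of values.

    Values are kept in a list because some settings, such as attset,
    may appear several times.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only: values such as profilePath
        # contain colons themselves.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line; ignored in this sketch
        settings.setdefault(name.strip(), []).append(value.strip())
    return settings


def lookup(settings, name, group=None):
    """Return the values of a setting, optionally for a group (zebraidx -g).

    Assumption: a group without its own value falls back to the
    unprefixed setting.
    """
    if group is not None and f"{group}.{name}" in settings:
        return settings[f"{group}.{name}"]
    return settings.get(name, [])


cfg = """# Default record type
recordType: text
public.recordType: grs.sgml
profilePath: .:../../tab
attset: bib1.att
attset: gils.att
"""
settings = parse_cfg(cfg)
print(lookup(settings, "recordType", group="public"))  # ['grs.sgml']
print(lookup(settings, "recordType"))                  # ['text']
```

Keeping every value in a list means repeated settings such as <tt>attset</tt> are not silently overwritten.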

The available configuration settings are summarized below. They will be
explained further in the following sections.

<descrip>
<tag><it>group</it>.recordType&lsqb;<it>.name</it>&rsqb;</tag>
 Specifies how records with the file extension <it>name</it> should
 be handled by the indexer. This option may also be specified
 as a command line option (<tt>-t</tt>). Note that if you do not
 specify a <it/name/, the setting applies to all files. In general,
 the record type specifier consists of the following elements, separated
 by dots: <it>fundamental-type</it>,
 <it>file-read-type</it> and arguments. Currently, two
 fundamental types exist, <tt>text</tt> and <tt>grs</tt>.
<tag><it>group</it>.recordId</tag>
 Specifies how the records are to be identified when updated. See
section <ref id="locating-records" name="Locating Records">.
<tag><it>group</it>.database</tag>
 Specifies the Z39.50 database name.
<tag><it>group</it>.storeKeys</tag>
 Specifies whether key information should be saved for a given
 group of records. If you plan to update/delete this type of
 records later, this should be specified as 1; otherwise it
 should be 0 (default), to save register space.
<tag><it>group</it>.storeData</tag>
 Specifies whether the records should be stored internally
 in the Zebra system files. If you want to maintain the raw records yourself,
 this option should be false (0). If you want Zebra to take care of the records
 for you, it should be true (1).
<tag>lockDir</tag>
 Directory in which various lock files are stored.
<tag>keyTmpDir</tag>
 Directory in which temporary files used during the <tt>zebraidx</tt> update
 phase are stored.
<tag>setTmpDir</tag>
 Specifies the directory that the server uses for temporary result sets.
 If not specified, <tt>/tmp</tt> will be used.
<tag>profilePath</tag>
 Specifies the location of profile specification files.
<tag>attset</tag>
 Specifies the filename(s) of attribute set files for use in
 searching. At least the Bib-1 set should be loaded (<tt/bib1.att/).
 The <tt/profilePath/ setting is used to look for the specified files.
 See section <ref id="attset-files" name="The Attribute Set Files">.
<tag>memMax</tag>
 Specifies the size of internal memory to use for the zebraidx program. The
 amount is given in megabytes; the default is 4 (4 MB).
</descrip>
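
The optional <it>.name</it> suffix on <tt>recordType</tt> selects a record type by file extension. The Python sketch below shows one way such a lookup could work. The helper names are hypothetical, and the lookup order (group-prefixed before unprefixed, extension-specific before general) is an assumption, not documented Zebra behaviour.

```python
import os.path

# Minimal sketch of choosing a record type for one input file from
# recordType settings, including the optional .name (file extension)
# suffix. The lookup order used here is an assumption for illustration.


def record_type_for(settings, filename, group=None):
    """settings maps names such as "recordType.sgml" to a single value."""
    ext = os.path.splitext(filename)[1].lstrip(".")
    prefixes = [f"{group}.", ""] if group else [""]
    for prefix in prefixes:
        keys = [f"{prefix}recordType.{ext}"] if ext else []
        keys.append(f"{prefix}recordType")
        for key in keys:
            if key in settings:
                return settings[key]
    return None  # no record type configured for this file


def fundamental_type(record_type):
    """First dot-separated element of a specifier, e.g. "grs" or "text"."""
    return record_type.split(".", 1)[0]


settings = {
    "recordType": "text",
    "recordType.sgml": "grs.sgml",
    "public.recordType": "grs.sgml",
}
print(record_type_for(settings, "notes.txt"))            # text
print(record_type_for(settings, "rec1.sgml"))            # grs.sgml
print(record_type_for(settings, "notes.txt", "public"))  # grs.sgml
```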

<sect1>Locating Records<label id="locating-records">
<p>
The default behaviour of the Zebra system is to reference the
records from their original location, i.e., where they were found when you
ran <tt/zebraidx/. That is, when a client wishes to retrieve a record
following a search operation, the files are accessed from the place
where you originally put them. If you remove the files (without
running <tt/zebraidx/ again), the client will receive a diagnostic
message.

If your input files are not permanent (for example, if you retrieve
your records from an outside source, or if they were temporarily
mounted on a CD-ROM drive),
you may want Zebra to make an internal copy of them. To do this,
you specify 1 (true) in the <tt>storeData</tt> setting. When
the Z39.50 server retrieves the records, they will be read from the
internal file structures of the system.

<sect1>Indexing example

<p>
Consider a system in which you have a group of text files called
<tt>simple</tt>. That group of records should belong to a Z39.50 database
called <tt>textbase</tt>. The following <tt/zebra.cfg/ file will suffice:

<tscreen><verb>
profilePath: /usr/lib/yaz/tab:/usr/lib/zebra/tab
attset: explain.att
attset: bib1.att
simple.recordType: text
simple.database: textbase
</verb></tscreen>

<sect>Running the Maintenance Interface (zebraidx)

<p>
The following is a complete reference to the command line interface to
the <tt/zebraidx/ application.

<bf/Syntax/
<tscreen><verb>
$ zebraidx &lsqb;options&rsqb; command &lsqb;directory&rsqb; ...
</verb></tscreen>
<bf/Options/
<descrip>
<tag>-t <it/type/</tag>Update all files as <it/type/. Currently, the
types supported are <tt/text/ and <tt/grs/<it/.subtype/. If no
<it/subtype/ is provided for the GRS (General Record Structure) type,
the canonical input format is assumed (see section <ref
id="local-representation" name="Local Representation">). It is
generally advisable to specify the record types in the
<tt/zebra.cfg/ file (see section <ref id="record-types" name="Record
Types">), to avoid confusion at subsequent updates.

<tag>-c <it/config-file/</tag>Read the configuration file
<it/config-file/ instead of <tt/zebra.cfg/.

<tag>-g <it/group/</tag>Update the files according to the group
settings for <it/group/ (see section <ref id="configuration-file"
name="The Zebra Configuration File">).

<tag>-d <it/database/</tag>Associate the records located with the
database name <it/database/ for access through the Z39.50
server.

<tag>-m <it/mbytes/</tag>Use <it/mbytes/ megabytes of memory before flushing
keys to background storage. This setting affects performance when
updating large databases.

<tag>-s</tag>Show analysis of the indexing process. The maintenance
program works in a read-only mode and doesn't change the state
of the index. This option is very useful when you wish to test a
new profile.

<tag>-V</tag>Show Zebra version.

<tag>-v <it/level/</tag>Set the log level to <it/level/. <it/level/
should be one of <tt/none/, <tt/debug/, or <tt/all/.

</descrip>

<bf/Commands/
<descrip>
<tag>update <it/directory/</tag>Update the register with the files
contained in <it/directory/. If no directory is provided, a list of
files is read from <tt/stdin/. See section <ref
id="administrating" name="Administrating Zebra">.

</descrip>

<sect>The Z39.50 Server

<sect1>Running the Z39.50 Server (zebrasrv)

<p>
<bf/Syntax/
<tscreen><verb>
zebrasrv &lsqb;options&rsqb; &lsqb;listener-address ...&rsqb;
</verb></tscreen>

<bf/Options/
<descrip>
<tag>-a <it/APDU file/</tag> Specify a file for dumping PDUs (for diagnostic purposes).
The special name &dquot;-&dquot; sends output to <tt/stderr/.

<tag>-c <it/config-file/</tag> Read configuration information from <it/config-file/.
The default configuration is <tt>./zebra.cfg</tt>.

<tag/-S/Don't fork on connection requests. This can be useful for
symbolic-level debugging. The server can only accept a single
connection in this mode.

<tag>-l <it/logfile/</tag>Specify an output file for the diagnostic
messages. The default is to write this information to <tt/stderr/.

<tag>-v <it/log-level/</tag>The log level. Use a comma-separated list of members of the set
{fatal,debug,warn,log,all,none}.

<tag>-u <it/username/</tag>Set user ID. Sets the real UID of the server process to that of the
given <it/username/. This is useful if you aren't comfortable with having the
server run as root, but you need to start it as root to bind a
privileged port.

<tag>-w <it/working-directory/</tag>Change working directory.

<tag>-i</tag>Run under the Internet superserver, <tt/inetd/. Make
sure you use the logfile option <tt/-l/ in conjunction with this
mode and specify the <tt/-l/ option before any other options.

<tag>-t <it/timeout/</tag>Set the idle session timeout (default 60 minutes).

<tag>-k <it/kilobytes/</tag>Set the (approximate) maximum size of
present response messages. Default is 1024 KB (1 MB).
</descrip>

A <it/listener-address/ consists of a transport mode followed by a
colon (:) followed by a listener address. The transport mode is
either <tt/osi/ or <tt/tcp/.

For TCP, an address has the form

<tscreen><verb>
hostname | IP-number &lsqb;: portnumber&rsqb;
</verb></tscreen>

The port number defaults to 210 (the standard Z39.50 port).

The special hostname &dquot;@&dquot; is mapped to
the address INADDR_ANY, which causes the server to listen on any local
interface. To start the server listening on the registered port for
Z39.50, and to drop root privileges once the
port is bound, execute the server like this (from a root shell):

<tscreen><verb>
zebrasrv -u daemon tcp:@
</verb></tscreen>

You can replace <tt/daemon/ with another user, e.g., your own account, or
a dedicated IR server account.

If no listener address is given, <tt/zebrasrv/ by default establishes a single TCP/IP
listener, for the Z39.50 protocol, on port 9999.
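
The listener-address rules can be summarised in a short Python sketch. The function <tt>parse_listener</tt> is a hypothetical illustration of the rules above (transport mode, colon, then host and optional port, with 210 as the default port and "@" mapped to INADDR_ANY); it is not part of Zebra or YAZ, and it does not handle OSI addresses.

```python
# Sketch of splitting a zebrasrv listener address into its parts, following
# the rules above: transport mode, colon, then for TCP "host[:port]", with
# port 210 as the default and "@" meaning INADDR_ANY. Hypothetical helper;
# OSI addresses are not handled.

DEFAULT_Z3950_PORT = 210
INADDR_ANY = "0.0.0.0"


def parse_listener(address):
    """Return (mode, host, port) for a TCP listener address."""
    mode, sep, rest = address.partition(":")
    if not sep or mode not in ("tcp", "osi"):
        raise ValueError(f"not a listener address: {address!r}")
    if mode == "osi":
        raise NotImplementedError("OSI addresses are outside this sketch")
    host, _, port = rest.partition(":")
    if host == "@":
        host = INADDR_ANY  # listen on every local interface
    port_number = int(port) if port else DEFAULT_Z3950_PORT
    return mode, host, port_number


print(parse_listener("tcp:@:2100"))  # ('tcp', '0.0.0.0', 2100)
print(parse_listener("tcp:@"))       # ('tcp', '0.0.0.0', 210)
```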

<sect1>Z39.50 Protocol Support and Behavior

<sect2>Initialization

<p>
During initialization, the server will negotiate to version 3 of the
Z39.50 protocol (unless the client specifies a lower version), and the option bits for Search, Present, Scan,
NamedResultSets, and concurrentOperations will be set, if requested by
the client. The maximum PDU size is negotiated down to a maximum of
1 MB by default.
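
The negotiation rules above can be illustrated with a short Python sketch. The option names, the function <tt>negotiate_init</tt>, and the reading of "1 MB" as 1024 * 1024 bytes are assumptions made for illustration; this is not the YAZ or Zebra API.

```python
# Sketch of the Init negotiation described above. Option names, the
# function, and the byte value for "1 MB" are illustrative assumptions.

SERVER_MAX_VERSION = 3
SERVER_OPTIONS = {"search", "present", "scan",
                  "namedResultSets", "concurrentOperations"}
SERVER_MAX_PDU = 1024 * 1024  # assumed meaning of "1 MB"


def negotiate_init(client_version, client_options, client_max_pdu):
    """Return the (version, options, max PDU size) the server agrees to."""
    version = min(client_version, SERVER_MAX_VERSION)
    # An option is granted only if the client requested it and the
    # server supports it.
    options = SERVER_OPTIONS.intersection(client_options)
    max_pdu = min(client_max_pdu, SERVER_MAX_PDU)
    return version, options, max_pdu


print(negotiate_init(3, {"search", "scan", "resourceCtrl"}, 4 * 1024 * 1024))
```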

<sect2>Search<label id="search">

<p>
The supported query types are 1 and 101. All operators are currently
supported, with the restriction that only proximity units of type "word" are
supported for the proximity operator.
Queries can be arbitrarily complex.
Named result sets are supported, and result sets can be used as operands
without limitations.
Searches may span multiple databases.

The server has full support for piggy-backed present requests (see
also the following section).

<bf/Use/ attributes are interpreted according to the attribute sets which
have been loaded in the <tt/zebra.cfg/ file, and are matched against
specific fields as specified in the <tt/.abs/ file which describes the
profile of the records which have been loaded. If no <bf/Use/
attribute is provided, a default of Bib-1 <bf/Any/ is assumed.

If a <bf/Structure/ attribute of <bf/Phrase/ is used in conjunction with a
<bf/Completeness/ attribute of <bf/Complete (Sub)field/, the term is
matched against the contents of the phrase (long word) register, if one
exists for the given <bf/Use/ attribute.
A phrase register is created for those fields in the <tt/.abs/
file that contain a <tt/p/-specifier.

If <bf/Structure/=<bf/Phrase/ is used in conjunction with
<bf/Incomplete Field/ (the default value for <bf/Completeness/), the
search is directed against the normal word registers, but if the term
contains multiple words, the term will only match if all of the words
are found immediately adjacent, and in the given order.
The word search is performed on those fields that are indexed as
type <tt/w/ in the <tt/.abs/ file.
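
The adjacency rule for phrase searches against the word register can be sketched in Python with a toy positional index. The index layout and both helper functions are assumptions for illustration; Zebra's word register and its character mapping (this sketch simply lower-cases and splits on white space) differ.

```python
# Sketch of the adjacency rule for Structure=Phrase with Incomplete Field:
# every word of the term must occur, immediately adjacent and in the given
# order. A toy positional word index stands in for Zebra's word register.

from collections import defaultdict


def build_word_index(records):
    """Map word -> {record id -> set of word positions}."""
    index = defaultdict(lambda: defaultdict(set))
    for rec_id, text in records.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][rec_id].add(pos)
    return index


def phrase_search(index, term):
    """Return ids of records containing the words of term consecutively."""
    words = term.lower().split()
    if not words:
        return set()
    hits = set()
    for rec_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # Word i of the term must sit at position start + i.
            if all(start + i in index.get(w, {}).get(rec_id, ())
                   for i, w in enumerate(words[1:], 1)):
                hits.add(rec_id)
                break
    return hits


records = {
    1: "information retrieval systems",
    2: "retrieval of information",
}
index = build_word_index(records)
print(phrase_search(index, "information retrieval"))  # {1}
```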

If the <bf/Structure/ attribute is <bf/Word List/,
<bf/Free-form Text/, or <bf/Document Text/, the term is treated as a
natural-language, relevance-ranked query.
This search type uses the word register, i.e., those fields
that are indexed as type <tt/w/ in the <tt/.abs/ file.

If the <bf/Structure/ attribute is <bf/Numeric String/, the
term is treated as an integer. The search is performed on those
fields that are indexed as type <tt/n/ in the <tt/.abs/ file.

If the <bf/Structure/ attribute is <bf/URx/, the
term is treated as a URX (URL) entity. The search is performed on those
fields that are indexed as type <tt/u/ in the <tt/.abs/ file.

If the <bf/Structure/ attribute is <bf/Local Number/, the
term is treated as a native Zebra record identifier.
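
The Structure rules above amount to a dispatch from attribute values to <tt>.abs</tt> index types. The Python sketch below summarises them. Attribute values are spelled out by name rather than by Bib-1 number, and the function is a hypothetical illustration, not Zebra code. It also ignores the caveat that a phrase register exists only for fields with a <tt>p</tt>-specifier.

```python
# Sketch summarising which register a term is searched in, according to
# the Structure (and Completeness) rules described above. Attribute values
# are given by name; "local-id" stands for Zebra's native record identifier.

RANKED_STRUCTURES = {"word list", "free-form text", "document text"}


def select_register(structure, completeness="incomplete field"):
    """Return (index type, how the term is handled)."""
    if structure == "phrase":
        if completeness in ("complete subfield", "complete field"):
            return "p", "whole term against the phrase register"
        return "w", "all words adjacent, in the given order"
    if structure in RANKED_STRUCTURES:
        return "w", "relevance-ranked natural-language query"
    if structure == "numeric string":
        return "n", "integer"
    if structure == "urx":
        return "u", "URL"
    if structure == "local number":
        return "local-id", "native Zebra record identifier"
    raise ValueError(f"structure not covered by this sketch: {structure}")


print(select_register("phrase", "complete field"))  # ('p', 'whole term against the phrase register')
```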

If the <bf/Relation/ attribute is <bf/Equals/ (default), the term is
matched in the normal fashion (modulo truncation and processing of
individual words, if required). If <bf/Relation/ is <bf/Less Than/,
<bf/Less Than or Equal/, <bf/Greater than/, or <bf/Greater than or
Equal/, the term is assumed to be numerical, and a standard regular
expression is constructed to match the given expression. If
<bf/Relation/ is <bf/Relevance/, the standard natural-language query
processor is invoked.
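
To show how a numeric relation can be expressed as a regular expression, the Python sketch below builds a pattern matching exactly the decimal integers greater than a given value. It handles only non-negative integers without leading zeros, and it is one possible construction, not necessarily the one Zebra uses. Zebra must also cope with negative values such as the coordinate in the GILS example further below.

```python
import re

# Sketch of turning "greater than N" into a regular expression over
# decimal digit strings. Handles non-negative integers without leading
# zeros only; Zebra's own construction may differ.


def greater_than_regex(n):
    """Return a regex that fully matches exactly the integers > n (n >= 0)."""
    if n < 0:
        raise ValueError("this sketch only handles n >= 0")
    s = str(n)
    k = len(s)
    # Any number with more digits than n is larger.
    alternatives = [f"[1-9][0-9]{{{k},}}"]
    # Same length: keep a prefix of n, then a larger digit, then anything.
    for i, ch in enumerate(s):
        d = int(ch)
        if d == 9:
            continue  # no larger digit exists at this position
        rest = k - i - 1
        tail = f"[0-9]{{{rest}}}" if rest else ""
        alternatives.append(f"{s[:i]}[{d + 1}-9]{tail}")
    return "(?:" + "|".join(alternatives) + ")"


print(greater_than_regex(114))
# (?:[1-9][0-9]{3,}|[2-9][0-9]{2}|1[2-9][0-9]{1}|11[5-9])
print(bool(re.fullmatch(greater_than_regex(114), "120")))  # True
```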

For the <bf/Truncation/ attribute, <bf/No Truncation/ is the default.
<bf/Left Truncation/ is not supported. <bf/Process &num;/ is supported, as
is <bf/Regxp-1/. <bf/Regxp-2/ enables the fault-tolerant (fuzzy)
search. By default, a single error (deletion, insertion,
replacement) is accepted when terms are matched against the register
contents.
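
The error tolerance can be pictured as an edit-distance limit. The Python sketch below accepts a register term when at most <tt>max_errors</tt> single-character deletions, insertions, or replacements separate it from the query term (Levenshtein distance). This is a simplification: it compares literal terms, whereas Zebra applies the tolerance while matching <bf/Regxp-2/ regular expressions.

```python
# Sketch of fault-tolerant matching with a plain Levenshtein distance on
# literal terms. A simplification of the Regxp-2 behaviour described above.


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # drop a character of a
                           cur[j - 1] + 1,             # add a character of b
                           prev[j - 1] + (ca != cb)))  # replace (or keep)
        prev = cur
    return prev[-1]


def fuzzy_matches(term, register, max_errors=1):
    """Return register terms within max_errors edits of term."""
    return [t for t in register if edit_distance(term, t) <= max_errors]


print(fuzzy_matches("surficial", ["surficial", "superficial", "surfacial"]))
# ['surficial', 'surfacial']
```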

<sect3>Regular expressions
<p>

Each term in a query is interpreted as a regular expression if
the truncation value is either <bf/Regxp-1/ (102) or <bf/Regxp-2/ (103).
Both query types follow the same syntax, with the operands:
<descrip>
<tag/x/ Matches the character <it/x/.
<tag/./ Matches any character.
<tag><tt/[/..<tt/]/</tag> Matches the set of characters specified,
 such as <tt/[abc]/ or <tt/[a-c]/.
</descrip>
and the operators:
<descrip>
<tag/x*/ Matches <it/x/ zero or more times. Priority: high.
<tag/x+/ Matches <it/x/ one or more times. Priority: high.
<tag/x?/ Matches <it/x/ zero times or once. Priority: high.
<tag/xy/ Matches <it/x/, then <it/y/. Priority: medium.
<tag/x|y/ Matches either <it/x/ or <it/y/. Priority: low.
</descrip>
The order of evaluation may be changed by using parentheses.

If the first character of the <bf/Regxp-2/ query is a plus character
(<tt/+/), it marks the beginning of a section with non-standard
specifiers. The next plus character marks the end of the section.
Currently Zebra only supports one specifier, the error tolerance,
which consists of one digit.

Since the plus operator is normally a suffix operator, this addition to
the query syntax doesn't violate the syntax for standard regular
expressions.
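
The Python sketch below separates the optional error-tolerance section of a <bf/Regxp-2/ term from the pattern itself. The function is hypothetical, and it uses one error as the default to match the default tolerance stated in the previous section. The operators listed above are written the same way in Python's <tt>re</tt> module, so the remaining pattern can be tried out with <tt>re.fullmatch</tt>, assuming a pattern has to match a whole term.

```python
import re

# Sketch of separating the optional "+digit+" section of a Regxp-2 term
# from the regular expression proper. Hypothetical helper; a missing
# section means the default tolerance of one error.


def split_regxp2(term, default_errors=1):
    """Return (error tolerance, pattern) for a Regxp-2 query term."""
    m = re.match(r"\+(\d)\+", term)
    if m:
        return int(m.group(1)), term[m.end():]
    return default_errors, term


errors, pattern = split_regxp2("+2+informat.*")
print(errors, bool(re.fullmatch(pattern, "information")))  # 2 True
```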
 749
 750 <sect3>Query examples
 751 <p>
 752
 753 Phrase search for <bf/information retrieval/ in the title-register:
 754 <verb>
 755  @attr 1=4 "information retrieval"
 756 </verb>
 757
 758 Ranked search for the same thing:
 759 <verb>
 760  @attr 1=4 @attr 2=102 "Information retrieval"
 761 </verb>
 762
 763 Phrase search with a regular expression:
 764 <verb>
 765  @attr 1=4 @attr 5=102 "informat.* retrieval"
 766 </verb>
 767
 768 Ranked search with a regular expression:
 769 <verb>
 770  @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
 771 </verb>
 772
 773 In the GILS schema (<tt/gils.abs/), the west-bounding-coordinate is
 774 indexed as type <tt/n/, and is therefore searched by specifying
 775 <bf/structure/=<bf/Numeric String/.
To match all records with a west-bounding-coordinate greater
than -114 (relation attribute 2=5, <it/greater than/), we use the following query:
 778 <verb>
 779  @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
 780 </verb>
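
The queries above use the prefix query notation accepted by YAZ-based clients. The hypothetical Python helper below is an illustration only. It assembles such strings from attribute specifications, where each specification is either a (type, value) pair or an (attribute set, type, value) triple. It quotes multi-word terms but does not escape embedded quote characters.

```python
def pqf_attr(spec):
    """Render one attribute as '@attr type=value' or '@attr set type=value'."""
    if len(spec) == 3:
        attset, attr_type, value = spec
        return "@attr %s %d=%d" % (attset, attr_type, value)
    attr_type, value = spec
    return "@attr %d=%d" % (attr_type, value)


def build_pqf(term, attrs):
    """Build a single-term prefix query from a term and attribute specs."""
    parts = [pqf_attr(spec) for spec in attrs]
    # A term containing blanks is quoted so that it forms a single operand.
    parts.append('"%s"' % term if " " in term else term)
    return " ".join(parts)


# Ranked phrase search with a regular expression on the title (use 4).
print(build_pqf("informat.* retrieval", [(1, 4), (5, 102), (2, 102)]))
# Numeric comparison on the GILS west-bounding-coordinate (use 2038).
print(build_pqf("-114", [(4, 109), (2, 5), ("gils", 1, 2038)]))
```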
 781
 782 <sect2>Present
 783 <p>
 784 The present facility is supported in a standard fashion. The requested
 785 record syntax is matched against the ones supported by the profile of
 786 each record retrieved. If no record syntax is given, SUTRS is the
 787 default. The requested element set name, again, is matched against any
 788 provided by the relevant record profiles.
 789
 790 <sect2>Scan
 791
 792 <p>
 793 The attribute combinations provided with the TermListAndStartPoint are
 794 processed in the same way as operands in a query (see above).
 795 Currently, only the term and the globalOccurrences are returned with
 796 the TermInfo structure.
 797
 798 <sect2>Sort
 799
 800 <p>
Z39.50 specifies three different types of sort criteria.
Of these, Zebra supports the attribute specification type, in which
case the Use attribute specifies the "sort register".
Sort registers are created for those fields that are of type "sort" in
the <tt/default.idx/ file.
The corresponding character mapping file in <tt/default.idx/ specifies the
ordinal of each character used in the actual sort.
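
The Python sketch below illustrates the role of the ordinals. It does not read Zebra's character map files. The mapping is hypothetical: digits sort before letters, and upper- and lower-case letters share an ordinal, which makes the resulting sort case-insensitive. The sketch also assumes that characters without an ordinal are simply skipped.

```python
import string


def make_ordinal_map():
    """Hypothetical map: digits first, then letters, case folded together."""
    ordinals = {}
    next_ordinal = 1
    for digit in string.digits:
        ordinals[digit] = next_ordinal
        next_ordinal += 1
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase):
        ordinals[lower] = ordinals[upper] = next_ordinal
        next_ordinal += 1
    return ordinals


ORDINALS = make_ordinal_map()


def sort_key(value):
    """Sort key: the sequence of ordinals, skipping unmapped characters."""
    return tuple(ORDINALS[ch] for ch in value if ch in ORDINALS)


titles = ["zen", "Alpine", "alpha", "2001"]
print(sorted(titles, key=sort_key))  # ['2001', 'alpha', 'Alpine', 'zen']
print(sorted(titles))                # ['2001', 'Alpine', 'alpha', 'zen']
```

The second line shows a plain code-point sort for comparison. There, every upper-case letter sorts before every lower-case letter.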
 808
 809 Z39.50 allows the client to specify sorting on one or more input
 810 result sets and one output result set.
Zebra supports sorting on one input result set only, which may or may not
be the same as the output result set.
 813
 814 <sect2>Close
 815
 816 <p>
 817 If a Close PDU is received, the server will respond with a Close PDU
 818 with reason=FINISHED, no matter which protocol version was negotiated
 819 during initialization. If the protocol version is 3 or more, the
 820 server will generate a Close PDU under certain circumstances,
 821 including a session timeout (60 minutes by default), and certain kinds of
 822 protocol errors. Once a Close PDU has been sent, the protocol
 823 association is considered broken, and the transport connection will be
 824 closed immediately upon receipt of further data, or following a short
 825 timeout.
 826
 827 <sect>The Record Model
 828
 829 <p>
 830 Zebra is designed to support a wide range of data management
 831 applications. The system can be configured to handle virtually any
 832 kind of structured data. Each record in the system is associated with
 833 a <it/record schema/ which lends context to the data elements of the
record. Any number of record schemas can coexist in the system.
Although it may be wise to use only a single schema within
one database, the system imposes no such restriction.
 837
 838 The record model described in this chapter applies to the fundamental,
 839 structured
 840 record type <tt>grs</tt> as introduced in
 841 section <ref id="record-types" name="Record Types">.
 842
 843 Records pass through three different states during processing in the
 844 system.
 845
 846 <itemize>
 847 <item>When records are accessed by the system, they are represented
in their local, or native, format. This might be SGML or HTML files,
News or Mail archives, or MARC records. If the system doesn't already
know how to read the type of data you need to store, you can set up an
input filter by preparing conversion rules based on regular
expressions, possibly augmented by a flexible scripting language (Tcl). The input filter
produces an internal representation as output.
 854
 855 <item>When records are processed by the system, they are represented
 856 in a tree-structure, constructed by tagged data elements hanging off a
 857 root node. The tagged elements may contain data or yet more tagged
 858 elements in a recursive structure. The system performs various
 859 actions on this tree structure (indexing, element selection, schema
mapping, etc.).
 861
 862 <item>Before transmitting records to the client, they are first
 863 converted from the internal structure to a form suitable for exchange
 864 over the network - according to the Z39.50 standard.
 865 </itemize>
 866
 867 <sect1>Local Representation<label id="local-representation">
 868
 869 <p>
 870 As mentioned earlier, Zebra places few restrictions on the type of
 871 data that you can index and manage. Generally, whatever the form of
 872 the data, it is parsed by an input filter specific to that format, and
 873 turned into an internal structure that Zebra knows how to handle. This
 874 process takes place whenever the record is accessed - for indexing and
 875 retrieval.
 876
 877 <p>
The RecordType parameter in the <tt/zebra.cfg/ file, or the <tt/-t/
option to the indexer, tells Zebra how to process input records. Two
basic types of processing are available: raw text and structured
data. Raw text is just that, and it is selected by providing the
argument <bf/text/ to Zebra. Structured records are all handled
internally using the basic mechanisms described in the subsequent
sections. Zebra can read structured records in many different formats.
How this is done is governed by additional parameters after the
&dquot;grs&dquot; keyword, separated by &dquot;.&dquot; characters.

Four basic subtypes of the <bf/grs/ type are currently available:
 889
 890 <descrip>
 891 <tag>grs.sgml</tag>This is the canonical input format &mdash;
 892 described below. It is a simple SGML-like syntax.
 893
 894 <tag>grs.regx.<it/filter/</tag>This enables a user-supplied input
 895 filter. The mechanisms of these filters are described below.
 896
 897 <tag>grs.tcl.<it/filter/</tag>This enables a user-supplied input
filter with Tcl rules (only available if Zebra is compiled with Tcl
 899 support).
 900
 901 <tag>grs.marc.<it/abstract syntax/</tag>This allows Zebra to read
 902 records in the ISO2709 (MARC) encoding standard. In this case, the
last parameter <it/abstract syntax/ names the .abs file (see below)
 904 which describes the specific MARC structure of the input record as
 905 well as the indexing rules.
 906 </descrip>
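
A RecordType value is therefore a dot-separated string: a fundamental type, an optional subtype and, for some subtypes, an argument such as a filter or abstract syntax name. The hypothetical Python helper below sketches that decomposition. It is not Zebra's own parser, and the rule that <tt/regx/, <tt/tcl/ and <tt/marc/ require an argument is inferred from the list above.

```python
# Subtypes listed above; True means a further argument is required.
GRS_SUBTYPES = {"sgml": False, "regx": True, "tcl": True, "marc": True}


def parse_record_type(record_type):
    """Split a RecordType value into (type, subtype, argument).

    Hypothetical helper: 'text' -> ('text', None, None),
    'grs.regx.news' -> ('grs', 'regx', 'news').
    """
    parts = record_type.split(".", 2)
    parts += [None] * (3 - len(parts))
    fundamental, subtype, argument = parts
    if fundamental == "grs":
        if subtype not in GRS_SUBTYPES:
            raise ValueError("unknown grs subtype: %r" % subtype)
        if GRS_SUBTYPES[subtype] and not argument:
            raise ValueError("grs.%s needs an argument" % subtype)
    return fundamental, subtype, argument


print(parse_record_type("grs.marc.usmarc"))  # ('grs', 'marc', 'usmarc')
```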
 907
 908 <sect2>Canonical Input Format
 909
 910 <p>
 911 Although input data can take any form, it is sometimes useful to
 912 describe the record processing capabilities of the system in terms of
 913 a single, canonical input format that gives access to the full
 914 spectrum of structure and flexibility in the system. In Zebra, this
 915 canonical format is an &dquot;SGML-like&dquot; syntax.
 916
To use the canonical format, specify <tt>grs.sgml</tt> as the record
type.
 919
 920 Consider a record describing an information resource (such a record is
 921 sometimes known as a <it/locator record/). It might contain a field
 922 describing the distributor of the information resource, which might in
 923 turn be partitioned into various fields providing details about the
 924 distributor, like this:
 925
 926 <tscreen><verb>
 927 <Distributor>
 928     <Name> USGS/WRD &etago;Name>
 929     <Organization> USGS/WRD &etago;Organization>
 930     <Street-Address>
 931         U.S. GEOLOGICAL SURVEY, 505 MARQUETTE, NW
 932     &etago;Street-Address>
 933     <City> ALBUQUERQUE &etago;City>
 934     <State> NM &etago;State>
 935     <Zip-Code> 87102 &etago;Zip-Code>
 936     <Country> USA &etago;Country>
 937     <Telephone> (505) 766-5560 &etago;Telephone>
 938 &etago;Distributor>
 939 </verb></tscreen>
 940
 941 <it>NOTE: The indentation used above is used to illustrate how Zebra
 942 interprets the markup. The indentation, in itself, has no
 943 significance to the parser for the canonical input format, which
 944 discards superfluous whitespace.</it>
 945
 946 The keywords surrounded by &lt;...&gt; are <it/tags/, while the
 947 sections of text in between are the <it/data elements/. A data element
 948 is characterized by its location in the tree that is made up by the
 949 nested elements. Each element is terminated by a closing tag -
 950 beginning with <tt/&etago;/, and containing the same symbolic tag-name as
 951 the corresponding opening tag. The general closing tag - <tt/&etago;&gt;/ -
 952 terminates the element started by the last opening tag. The
 953 structuring of elements is significant. The element <bf/Telephone/,
 954 for instance, may be indexed and presented to the client differently,
 955 depending on whether it appears inside the <bf/Distributor/ element,
or inside some other structured data element, such as a <bf/Supplier/ element.
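
The nesting rules above can be sketched as a small stack-based parser in Python. This is an illustration under stated assumptions, not Zebra's reader. Variant tags are not handled. A named closing tag must match the innermost open element; this is stricter than necessary and an assumption of the sketch. The general closing tag closes whatever element is innermost. As in the note above, runs of whitespace in data are collapsed.

```python
import re

# A tag is "<name>", "</name>" or the general closing tag "</>";
# anything else up to the next "<" is data.
TOKEN = re.compile(r"<(/?)([^<>]*)>|([^<]+)")


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []  # child Node objects and data strings


def parse_canonical(text):
    """Build a tag tree from a record in the SGML-like canonical format."""
    holder = Node(None)  # artificial parent of the record root
    stack = [holder]
    for match in TOKEN.finditer(text):
        closing, name, data = match.groups()
        if data is not None:
            words = data.split()  # superfluous whitespace is discarded
            if words:
                stack[-1].children.append(" ".join(words))
        elif not closing:
            node = Node(name.strip())
            stack[-1].children.append(node)
            stack.append(node)
        else:
            if len(stack) == 1:
                raise ValueError("closing tag without an open element")
            name = name.strip()
            # "</>" closes the innermost element; "</name>" must match it.
            if name and name != stack[-1].tag:
                raise ValueError("expected </%s>, found </%s>" % (stack[-1].tag, name))
            stack.pop()
    if (len(stack) != 1 or len(holder.children) != 1
            or not isinstance(holder.children[0], Node)):
        raise ValueError("record must consist of exactly one root element")
    return holder.children[0]


record = parse_canonical("<gils><title>Zen</title></gils>")
print(record.tag, record.children[0].tag, record.children[0].children)
# gils title ['Zen']
```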
 957
 958 <sect3>Record Root
 959
 960 <p>
 961 The first tag in a record describes the root node of the tree that
 962 makes up the total record. In the canonical input format, the root tag
 963 should contain the name of the schema that lends context to the
elements of the record (see section <ref id="internal-representation"
name="Internal Representation">). The following is a GILS record that
contains only a single element. Strictly speaking, that makes it an
illegal GILS record, since the GILS profile includes several mandatory
elements. Zebra does not, however, validate the contents of a record
against the Z39.50 profile; it merely attempts to match up the elements
of a local representation with the given schema:
 971
 972 <tscreen><verb>
 973 <gils>
 974     <title>Zen and the Art of Motorcycle Maintenance&etago;title>
 975 &etago;gils>
 976 </verb></tscreen>
 977
 978 <sect3>Variants
 979
 980 <p>
 981 Zebra allows you to provide individual data elements in a number of
 982 <it/variant forms/. Examples of variant forms are textual data
 983 elements which might appear in different languages, and images which
 984 may appear in different formats or layouts. The variant system in
 985 Zebra is
 986 essentially a representation of the variant mechanism of
 987 Z39.50-1995.
 988
 989 The following is an example of a title element which occurs in two
 990 different languages.
 991
 992 <tscreen><verb>
 993 <title>
 994   <var lang lang "eng">
 995     Zen and the Art of Motorcycle Maintenance&etago;>
 996   <var lang lang "dan">
 997     Zen og Kunsten at Vedligeholde en Motorcykel&etago;>
 998 &etago;title>
 999 </verb></tscreen>
1000
1001 The syntax of the <it/variant element/ is <tt>&lt;<bf/var/ <it/class
1002 type value/&gt;</tt>. The available values for the <it/class/ and
1003 <it/type/ fields are given by the variant set that is associated with the
1004 current schema (see section <ref id="variant-set" name="Variant Set
1005 File">).
1006
1007 Variant elements are terminated by the general end-tag &etago;>, by
1008 the variant end-tag &etago;var>, by the appearance of another variant
tag with the same <it/class/ and <it/type/ settings, or by the
appearance of another, normal tag. In other words, the end-tags for
the variants used in the example above could have been omitted.
1012
1013 Variant elements can be nested. The element
1014
1015 <tscreen><verb>
1016 <title>
1017   <var lang lang "eng"><var body iana "text/plain">
1018     Zen and the Art of Motorcycle Maintenance
1019 &etago;title>
1020 </verb></tscreen>
1021
associates two variant components with the variant list for the title
element.
1024
1025 Given the nesting rules described above, we could write
1026
1027 <tscreen><verb>
1028 <title>
  <var body iana "text/plain">
1030     <var lang lang "eng">
1031       Zen and the Art of Motorcycle Maintenance
1032     <var lang lang "dan">
1033       Zen og Kunsten at Vedligeholde en Motorcykel
1034 &etago;title>
1035 </verb></tscreen>
1036
1037 The title element above comes in two variants. Both have the IANA body
1038 type &dquot;text/plain&dquot;, but one is in English, and the other in
1039 Danish. The client, using the element selection mechanism of Z39.50,
1040 can retrieve information about the available variant forms of data
1041 elements, or it can select specific variants based on the requirements
1042 of the end-user.
1043
1044 <sect2>Input Filters
1045
1046 <p>
1047 In order to handle general input formats, Zebra allows the
1048 operator to define filters which read individual records in their native format
1049 and produce an internal representation that the system can
1050 work with.
1051
1052 Input filters are ASCII files, generally with the suffix <tt/.flt/.
1053 The system looks for the files in the directories given in the
<bf/profilePath/ setting in the <tt/zebra.cfg/ file. The record type
1055 for the filter is <tt>grs.regx.</tt><it>filter-filename</it>
1056 (fundamental type <tt>grs</tt>, file read type <tt>regx</tt>, argument
1057 <it>filter-filename</it>).
1058
1059 Generally, an input filter consists of a sequence of rules, where each
1060 rule consists of a sequence of expressions, followed by an action. The
1061 expressions are evaluated against the contents of the input record,
1062 and the actions normally contribute to the generation of an internal
1063 representation of the record.
1064
1065 An expression can be either of the following:
1066
1067 <descrip>
1068 <tag/INIT/The action associated with this expression is evaluated
1069 exactly once in the lifetime of the application, before any records
1070 are read. It can be used in conjunction with an action that
1071 initializes tables or other resources that are used in the processing
1072 of input records.
1073
1074 <tag/BEGIN/Matches the beginning of the record. It can be used to
1075 initialize variables, etc. Typically, the <bf/BEGIN/ rule is also used
1076 to establish the root node of the record.
1077
<tag/END/Matches the end of the record, when all of the contents
of the record have been processed.
1080
1081 <tag>/pattern/</tag>Matches a string of characters from the input
1082 record.
1083
1084 <tag/BODY/This keyword may only be used between two patterns. It
1085 matches everything between (not including) those patterns.
1086
<tag/FINISH/The action associated with this expression is evaluated
once, before the application terminates. It can be used to release
system resources, typically ones allocated in the <bf/INIT/ step.
1090
1091 </descrip>
1092
1093 An action is surrounded by curly braces ({...}), and consists of a
1094 sequence of statements. Statements may be separated by newlines or
1095 semicolons (;). Within actions, the strings that matched the
1096 expressions immediately preceding the action can be referred to as
1097 &dollar;0, &dollar;1, &dollar;2, etc.
1098
1099 The available statements are:
1100
1101 <descrip>
1102
1103 <tag>begin <it/type &lsqb;parameter ... &rsqb;/</tag>Begin a new
1104 data element. The type is one of the following:
1105 <descrip>
<tag/record/Begin a new record. The following parameter should be the
name of the schema that describes the structure of the record, e.g.
<tt/gils/ or <tt/wais/ (see below). The <tt/begin record/ call should
precede any other use of the <bf/begin/ statement.
1111
1112 <tag/element/Begin a new tagged element. The parameter is the
1113 name of the tag. If the tag is not matched anywhere in the tagsets
1114 referenced by the current schema, it is treated as a local string
1115 tag.
1116
1117 <tag/variant/Begin a new node in a variant tree. The parameters are
1118 <it/class type value/.
1119
1120 </descrip>
1121
1122 <tag/data/Create a data element. The concatenated arguments make
1123 up the value of the data element. The option <tt/-text/ signals that
1124 the layout (whitespace) of the data should be retained for
1125 transmission. The option <tt/-element/ <it/tag/ wraps the data up in
1126 the <it/tag/. The use of the <tt/-element/ option is equivalent to
1127 preceding the command with a <bf/begin element/ command, and following
1128 it with the <bf/end/ command.
1129
1130 <tag>end <it/&lsqb;type&rsqb;/</tag>Close a tagged element. If no parameter is given,
1131 the last element on the stack is terminated. The first parameter, if
1132 any, is a type name, similar to the <bf/begin/ statement. For the
1133 <bf/element/ type, a tag name can be provided to terminate a specific tag.
1134
1135 </descrip>
1136
1137 The following input filter reads a Usenet news file, producing a
1138 record in the WAIS schema. Note that the body of a news posting is
separated from the list of headers by a blank line (or rather a
sequence of two newline characters).
1141
1142 <tscreen><verb>
1143 BEGIN                { begin record wais }
1144
1145 /^From:/ BODY /$/    { data -element name $1 }
1146 /^Subject:/ BODY /$/ { data -element title $1 }
1147 /^Date:/ BODY /$/    { data -element lastModified $1 }
1148 /\n\n/ BODY END      {
1149                         begin element bodyOfDisplay
1150                         begin variant body iana "text/plain"
1151                         data -text $1
1152                         end record
1153                      }
1154 </verb></tscreen>
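
To show what the filter above extracts, here is a rough Python emulation of its rules. It is not how Zebra executes filters. It assumes a simplification in which data written without <tt/-text/ is stripped of surrounding whitespace. It also emits the header elements in rule order rather than in the order in which they occur in the article.

```python
import re

# Header rules of the filter above: header name -> WAIS element name.
HEADER_RULES = [("From", "name"), ("Subject", "title"), ("Date", "lastModified")]


def news_to_wais(article):
    """Return (element, value) pairs resembling the filter's output."""
    # A blank line (two newline characters) separates headers from body.
    headers, _, body = article.partition("\n\n")
    elements = []
    for header, element in HEADER_RULES:
        pattern = r"^%s:(.*)$" % re.escape(header)
        match = re.search(pattern, headers, re.MULTILINE)
        if match:
            # Simplification: surrounding whitespace is stripped here.
            elements.append((element, match.group(1).strip()))
    # Like 'data -text', the body keeps its layout unchanged.
    elements.append(("bodyOfDisplay", body))
    return elements


article = "From: alice@example.org\nSubject: Zebra news\nDate: 25 Feb 2000\n\nHello\n"
for pair in news_to_wais(article):
    print(pair)
```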
1155
1156 If Zebra is compiled with support for Tcl (Tool Command Language)
1157 enabled, the statements described above are supplemented with a complete
1158 scripting environment, including control structures (conditional
1159 expressions and loop constructs), and powerful string manipulation
1160 mechanisms for modifying the elements of a record. Tcl is a popular
1161 scripting environment, with several tutorials available both online
1162 and in hardcopy.
1163
1164 <it>NOTE: Variant support is not currently available in the input
1165 filter, but will be included with one of the next
1166 releases.</it>
1167
1168 <sect1>Internal Representation<label id="internal-representation">
1169
1170 <p>
1171 When records are manipulated by the system, they're represented in a
1172 tree-structure, with data elements at the leaf nodes, and tags or
1173 variant components at the non-leaf nodes. The root-node identifies the
1174 schema that lends context to the tagging and structuring of the
1175 record. Imagine a simple record, consisting of a 'title' element and
1176 an 'author' element:
1177
1178 <tscreen><verb>
1179         TITLE     "Zen and the Art of Motorcycle Maintenance"
1180 ROOT
1181         AUTHOR    "Robert Pirsig"
1182 </verb></tscreen>
1183
1184 A slightly more complex record would have the author element consist
1185 of two elements, a surname and a first name:
1186
1187 <tscreen><verb>
1188         TITLE     "Zen and the Art of Motorcycle Maintenance"
1189 ROOT
1190                   FIRST-NAME "Robert"
1191         AUTHOR
1192                   SURNAME    "Pirsig"
1193 </verb></tscreen>
1194
1195 The root of the record will refer to the record schema that describes
1196 the structuring of this particular record. The schema defines the
1197 element tags (TITLE, FIRST-NAME, etc.) that may occur in the record, as
1198 well as the structuring (SURNAME should appear below AUTHOR, etc.). In
1199 addition, the schema establishes element set names that are used by
1200 the client to request a subset of the elements of a given record. The
1201 schema may also establish rules for converting the record to a
1202 different schema, by stating, for each element, a mapping to a
1203 different tag path.
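
The two-level record above can be modelled directly as a tree. The Python sketch below uses a hypothetical <tt/TreeNode/ class, not a Zebra data structure. It builds the author/title example and lists each data element by the tag path that reaches it from the root. The root itself, which names the schema, is not part of the path.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    tag: str
    data: Optional[str] = None  # set only on data-bearing leaf nodes
    children: List["TreeNode"] = field(default_factory=list)


def leaf_paths(root):
    """Yield (tag_path, data) for every data element below the root."""
    def walk(node, prefix):
        path = prefix + (node.tag,)
        if node.data is not None:
            yield path, node.data
        for child in node.children:
            yield from walk(child, path)
    for child in root.children:
        yield from walk(child, ())


record = TreeNode("ROOT", children=[
    TreeNode("TITLE", "Zen and the Art of Motorcycle Maintenance"),
    TreeNode("AUTHOR", children=[
        TreeNode("FIRST-NAME", "Robert"),
        TreeNode("SURNAME", "Pirsig"),
    ]),
])

for path, value in leaf_paths(record):
    print("/".join(path), "=", value)
# TITLE = Zen and the Art of Motorcycle Maintenance
# AUTHOR/FIRST-NAME = Robert
# AUTHOR/SURNAME = Pirsig
```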
1204
1205 <sect2>Tagged Elements
1206
1207 <p>
1208 A data element is characterized by its tag, and its position in the
1209 structure of the record. For instance, while the tag &dquot;telephone
1210 number&dquot; may be used different places in a record, we may need to
1211 distinguish between these occurrences, both for searching and
1212 presentation purposes. For instance, while the phone numbers for the
1213 &dquot;customer&dquot; and the &dquot;service provider&dquot; are both
1214 representatives for the same type of resource (a telephone number), it
1215 is essential that they be kept separate. The record schema provides
1216 the structure of the record, and names each data element (defined by
1217 the sequence of tags - the tag path - by which the element can be
1218 reached from the root of the record).
1219
1220 <sect2>Variants
1221
1222 <p>
1223 The children of a tag node may be either more tag nodes, a data node
1224 (possibly accompanied by tag nodes),
1225 or a tree of variant nodes. The children of  variant nodes are either
1226 more variant nodes or a data node (possibly accompanied by more
1227 variant nodes). Each leaf node, which is normally a
1228 data node, corresponds to a <it/variant form/ of the tagged element
1229 identified by the tag which parents the variant tree. The following
1230 title element occurs in two different languages:
1231
1232 <tscreen><verb>
1233       VARIANT LANG=ENG  "War and Peace"
1234 TITLE
1235       VARIANT LANG=DAN  "Krig og Fred"
1236 </verb></tscreen>
1237
Which of the two elements is transmitted to the client by the server
1239 depends on the specifications provided by the client, if any.
1240
1241 In practice, each variant node is associated with a triple of class,
1242 type, value, corresponding to the variant mechanism of Z39.50.
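
As a rough illustration of selecting among variant forms, the Python sketch below keys each form of the TITLE element by its class, type and value triple. It returns the forms that match a request. Zebra's actual selection follows the Z39.50 variant request in the client's element specification. Using <tt/None/ as a wildcard component is an assumption of this sketch.

```python
# Variant forms of one tagged element, keyed by (class, type, value),
# mirroring the TITLE example above.
TITLE_VARIANTS = {
    ("lang", "lang", "eng"): "War and Peace",
    ("lang", "lang", "dan"): "Krig og Fred",
}


def matches(triple, wanted):
    """True if triple agrees with wanted; None in wanted matches anything."""
    return all(w is None or w == t for t, w in zip(triple, wanted))


def select_variants(variants, wanted=None):
    """Return the data of the variant forms that satisfy the request.

    Without a request, every form is returned.
    """
    if wanted is None:
        return list(variants.values())
    return [data for triple, data in variants.items() if matches(triple, wanted)]


print(select_variants(TITLE_VARIANTS, ("lang", "lang", "dan")))  # ['Krig og Fred']
```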
1243
1244 <sect2>Data Elements
1245
1246 <p>
1247 Data nodes have no children (they are always leaf nodes in the record
1248 tree).
1249
1250 <it>NOTE: Documentation needs extension here about types of nodes - numerical,
1251 textual, etc., plus the various types of inclusion notes.</it>
1252
1253 <sect1>Configuring Your Data Model<label id="data-model">
1254
1255 <p>
1256 The following sections describe the configuration files that govern
1257 the internal management of data records. The system searches for the files
1258 in the directories specified by the <bf/profilePath/ setting in the
1259 <tt/zebra.cfg/ file.
1260
<sect2>About Object Identifiers
<p>
When Object Identifiers (OIDs) need to be specified in the following
sections, either a named OID reference or a raw OID reference may be used. For the
named OIDs, refer to the source file <tt>util/oid.c</tt> from YAZ. Raw
canonical OIDs are specified in dot-notation (for example,
1.2.840.10003.3.1000.81.1).
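
The two forms of OID reference can be told apart mechanically. The Python sketch below resolves either form to a tuple of integers. Its name table is a hypothetical one-entry excerpt, because the authoritative list of names is the one in YAZ's <tt>util/oid.c</tt>. The function <tt/resolve_oid/ is likewise illustrative.

```python
import re

# Hypothetical excerpt of a name table; the authoritative list of named
# OIDs is in YAZ's util/oid.c.
NAMED_OIDS = {
    "Bib-1": (1, 2, 840, 10003, 3, 1),
}

# Two or more non-negative integers separated by dots.
DOTTED = re.compile(r"[0-9]+(\.[0-9]+)+")


def resolve_oid(reference):
    """Turn a named or dotted-notation OID reference into a tuple of ints."""
    if DOTTED.fullmatch(reference):
        return tuple(int(arc) for arc in reference.split("."))
    try:
        return NAMED_OIDS[reference]
    except KeyError:
        raise ValueError("unknown OID name: %r" % reference) from None


print(resolve_oid("1.2.840.10003.3.1000.81.1"))
```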
1268
1269 <sect2>The Abstract Syntax
1270
1271 <p>
1272 The abstract syntax definition (also known as an Abstract Record
1273 Structure, or ARS) is the focal point of the
1274 record schema description. For a given schema, the ABS file may state any
1275 or all of the following:
1276
1277 <itemize>
1278 <item>The object identifier of the Z39.50 schema associated
1279 with the ARS, so that it can be referred to by the client.
1280
1281 <item>The attribute set (which can possibly be a compound of multiple
1282 sets) which applies in the profile. This is used when indexing and
1283 searching the records belonging to the given profile.
1284
1285 <item>The Tag set (again, this can consist of several different sets).
1286 This is used when reading the records from a file, to recognize the
1287 different tags, and when transmitting the record to the client -
1288 mapping the tags to their numerical representation, if they are
1289 known.
1290
1291 <item>The variant set which is used in the profile. This provides a
1292 vocabulary for specifying the <it/forms/ of data that appear inside
1293 the records.
1294
1295 <item>Element set names, which are a shorthand way for the client to
1296 ask for a subset of the data elements contained in a record. Element
1297 set names, in the retrieval module, are mapped to <it/element
1298 specifications/, which contain information equivalent to the
1299 <it/Espec-1/ syntax of Z39.50.
1300
1301 <item>Map tables, which may specify mappings to <it/other/ database
1302 profiles, if desired.
1303
1304 <item>Possibly, a set of rules describing the mapping of elements to a
1305 MARC representation.
1306
1307 <item>A list of element descriptions (this is the actual ARS of the
1308 schema, in Z39.50 terms), which lists the ways in which the various
1309 tags can be used and organized hierarchically.
1310 </itemize>
1311
1312 Several of the entries above simply refer to other files, which
1313 describe the given objects.
1314
1315 <sect2>The Configuration Files
1316
1317 <p>
1318 This section describes the syntax and use of the various tables which
1319 are used by the retrieval module.
1320
1321 The number of different file types may appear daunting at first, but
1322 each type corresponds fairly clearly to a single aspect of the Z39.50
1323 retrieval facilities. Further, the average database administrator,
1324 who is simply reusing an existing profile for which tables already
1325 exist, shouldn't have to worry too much about the contents of these tables.
1326
Generally, the files are simple ASCII files, which can be maintained
using any text editor. Blank lines, and lines beginning with a (&num;), are
ignored. Any characters on a line following a (&num;) are also ignored.
All other
lines contain <it/directives/, which provide some setting or value
to the system. Generally, settings are characterized by a single
keyword, identifying the setting, followed by a number of parameters.
Some settings are repeatable (r), while others may occur only once in a
file. Some settings are optional (o), while others again are
mandatory (m).
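
These lexical rules are simple enough to capture in a few lines. The Python sketch below is not Zebra's reader. It splits a file into keyword and parameter lists and then checks the (r) and (m) markers. The caller supplies which keywords are repeatable or mandatory; the sketch does not know the rules for each file type.

```python
def parse_directives(text):
    """Split a configuration file into (keyword, [parameters]) tuples."""
    directives = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # a '#' starts a comment
        words = line.split()
        if words:  # blank and comment-only lines disappear
            directives.append((words[0], words[1:]))
    return directives


def check_occurrences(directives, repeatable=(), mandatory=()):
    """Enforce the (r) and (m) markers for the given keywords."""
    counts = {}
    for keyword, _ in directives:
        counts[keyword] = counts.get(keyword, 0) + 1
    for keyword, count in counts.items():
        if count > 1 and keyword not in repeatable:
            raise ValueError("%s may occur only once" % keyword)
    for keyword in mandatory:
        if keyword not in counts:
            raise ValueError("missing mandatory directive %s" % keyword)


sample = "name gils\n# Element set names\nesetname B gils-b.est  # brief\n"
print(parse_directives(sample))
```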
1337
1338 <sect2>The Abstract Syntax (.abs) Files
1339
1340 <p>
1341 The name of this file type is slightly misleading in Z39.50 terms,
1342 since, apart from the actual abstract syntax of the profile, it also
1343 includes most of the other definitions that go into a database
1344 profile.
1345
1346 When a record in the canonical, SGML-like format is read from a file
1347 or from the database, the first tag of the file should reference the
1348 profile that governs the layout of the record. If the first tag of the
1349 record is, say, <tt>&lt;gils&gt;</tt>, the system will look for the profile
1350 definition in the file <tt/gils.abs/. Profile definitions are cached,
1351 so they only have to be read once during the lifespan of the current
1352 process.
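
The lookup and caching behaviour just described might be sketched in Python as follows. The directory list stands in for the <bf/profilePath/ setting, and this sketch does not claim how Zebra splits that setting into directories. <tt/load_profile/ returns the raw file text as a placeholder for a parsed profile. All function names are hypothetical.

```python
import functools
import os
import re

# The first tag of a canonical record, e.g. "<gils>".
ROOT_TAG = re.compile(r"\s*<([^<>/\s]+)>")


def profile_filename(record_text):
    """Map the root tag of a canonical record to its .abs file name."""
    match = ROOT_TAG.match(record_text)
    if match is None:
        raise ValueError("record does not begin with a root tag")
    return match.group(1) + ".abs"


def find_profile(filename, directories):
    """Return the first existing directory/filename, or None."""
    for directory in directories:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


@functools.lru_cache(maxsize=None)
def load_profile(path):
    """Read a profile once; repeated calls in this process use the cache."""
    with open(path) as handle:
        return handle.read()


print(profile_filename("<gils>\n<title>Zen</title>"))  # gils.abs
```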
1353
When writing your own input filters, the <bf/begin record/ statement
introduces the profile, and should always be called first thing when
introducing a new record.
1357
1358 The file may contain the following directives:
1359
1360 <descrip>
1361 <tag>name <it/symbolic-name/</tag> (m) This provides a shorthand name or
1362 description for the profile. Mostly useful for diagnostic purposes.
1363
1364 <tag>reference <it/OID-name/</tag> (m) The OID for
1365 the profile (name or dotted-numerical list).
1366
1367 <tag>attset <it/filename/</tag> (m) The attribute set that is used for
1368 indexing and searching records belonging to this profile.
1369
1370 <tag>tagset <it/filename/ &lsqb;<it/type/&rsqb;</tag> (o) The tag
set (if any) that describes the fields of the records. The type, which
1372 is optional, specifies the tag type. If not given, the type-specifier
1373 in the Tag Set files is used.
1374
1375 <tag>varset <it/filename/</tag> (o) The variant set used in the profile.
1376
1377 <tag>maptab <it/filename/</tag> (o,r) This points to a
1378 conversion table that might be used if the client asks for the record
1379 in a different schema from the native one.
1380
1381 <tag>marc <it/filename/</tag> (o) Points to a file containing parameters
1382 for representing the record contents in the ISO2709 syntax. Read the
1383 description of the MARC representation facility below.
1384
1385 <tag>esetname <it/name filename/</tag> (o,r) Associates the
1386 given element set name with an element selection file. If an (@) is
1387 given in place of the filename, this corresponds to a null mapping for
1388 the given element set name.
1389
1390 <tag>any <it/tags/</tag> (o) This directive specifies a list of
1391 attributes which should be appended to the attribute list given for each
1392 element. The effect is to make every single element in the abstract
1393 syntax searchable by way of the given attributes. This directive
1394 provides an efficient way of supporting free-text searching across all
1395 elements. However, it does increase the size of the index
1396 significantly. The attributes can be qualified with a structure, as in
1397 the <bf/elm/ directive below.
1398
1399 <tag>elm <it/path name attributes/</tag> (o,r) Adds an element
1400 to the abstract record syntax of the schema. The <it/path/ follows the
1401 syntax which is suggested by the Z39.50 document - that is, a sequence
of tags separated by slashes (/). Each tag is given as a
comma-separated pair of tag type and tag value, surrounded by parentheses.
1404 The <it/name/ is the name of the element, and the <it/attributes/
1405 specifies which attributes to use when indexing the element in a
1406 comma-separated list. A &excl; in
1407 place of the attribute name is equivalent to specifying an attribute
1408 name identical to the element name. A - in place of the attribute name
1409 specifies that no indexing is to take place for the given element. The
1410 attributes can be qualified with <it/field types/ to specify which
1411 character set should govern the indexing procedure for that field. The
1412 same data element may be indexed into several different fields, using
1413 different character set definitions. See the section
1414 <ref id="field structure and character sets"
1415 name="Field Structure and Character Sets">.
1416 The default field type is &dquot;w&dquot; for
1417 <it/word/.
1418 </descrip>
1419
1420 The following is an excerpt from the abstract syntax file for the GILS
1421 profile.
1422
1423 <tscreen><verb>
1424 name gils
1425 reference GILS-schema
1426 attset gils.att
1427 tagset gils.tag
1428 varset var1.var
1429
1430 maptab gils-usmarc.map
1431
1432 # Element set names
1433
1434 esetname VARIANT gils-variant.est  # for WAIS-compliance
1435 esetname B gils-b.est
1436 esetname G gils-g.est
1437 esetname F @
1438
1439 elm (1,10)              rank                        -
1440 elm (1,12)              url                         -
1441 elm (1,14)              localControlNumber     Local-number
1442 elm (1,16)              dateOfLastModification Date/time-last-modified
1443 elm (2,1)               Title                       w:!,p:!
1444 elm (4,1)               controlIdentifier      Identifier-standard
1445 elm (2,6)               abstract               Abstract
1446 elm (4,51)              purpose                     !
1447 elm (4,52)              originator                  -
1448 elm (4,53)              accessConstraints           !
1449 elm (4,54)              useConstraints              !
1450 elm (4,70)              availability                -
1451 elm (4,70)/(4,90)       distributor                 -
1452 elm (4,70)/(4,90)/(2,7) distributorName             !
elm (4,70)/(4,90)/(2,10) distributorOrganization    !
1454 elm (4,70)/(4,90)/(4,2) distributorStreetAddress    !
1455 elm (4,70)/(4,90)/(4,3) distributorCity             !
1456 </verb></tscreen>
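
The <bf/elm/ syntax can be read mechanically. The Python sketch below is a hypothetical parser, not Zebra's own. It handles the tag path, the <tt/!/ and <tt/-/ shorthands and the default field type <tt/w/, and expects exactly three fields after the keyword. A malformed tag in the path is rejected.

```python
import re

# One step of a tag path: "(type,value)", e.g. "(4,70)".
TAG_STEP = re.compile(r"\((\d+),([^()]+)\)")


def parse_elm(line):
    """Parse 'elm path name attributes' into (path, name, indexing).

    path     -- list of (tag_type, tag_value) pairs
    indexing -- list of (field_type, attribute); empty if not indexed
    """
    keyword, path_text, name, attr_text = line.split()
    if keyword != "elm":
        raise ValueError("not an elm directive: %r" % line)
    path = []
    for step in path_text.split("/"):
        match = TAG_STEP.fullmatch(step)
        if match is None:
            raise ValueError("malformed tag %r in path %r" % (step, path_text))
        value = match.group(2)
        path.append((int(match.group(1)), int(value) if value.isdigit() else value))
    if attr_text == "-":  # '-' means the element is not indexed
        return path, name, []
    indexing = []
    for item in attr_text.split(","):
        field_type, _, attribute = item.rpartition(":")
        if attribute == "!":  # '!' stands for the element's own name
            attribute = name
        indexing.append((field_type or "w", attribute))  # default type: w
    return path, name, indexing


print(parse_elm("elm (2,1) Title w:!,p:!"))
# ([(2, 1)], 'Title', [('w', 'Title'), ('p', 'Title')])
```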
1457
1458 <sect2>The Attribute Set (.att) Files<label id="attset-files">
1459
1460 <p>
1461 This file type describes the <bf/Use/ elements of an attribute set.
1462 It contains the following directives.
1463
1464 <descrip>
1465
1466 <tag>name <it/symbolic-name/</tag> (m) This provides a shorthand name or
1467 description for the attribute set. Mostly useful for diagnostic purposes.
1468
1469 <tag>reference <it/OID-name/</tag> (m) The reference name of the OID for
1470 the attribute set.
1471
1472 <tag>include <it/filename/</tag> (o,r) This directive is used to
1473 include another attribute set as a part of the current one. This is
1474 used when a new attribute set is defined as an extension to another
1475 set. For instance, many new attribute sets are defined as extensions
to the <bf/bib-1/ set. This is an important feature of the retrieval
system of Z39.50, as it ensures the highest possible level of
interoperability: those access points of your database which are
derived from the external set (say, bib-1) can be used even by clients
that are unaware of the new set.
1481
1482 <tag>att <it/att-value att-name &lsqb;local-value&rsqb;/</tag> (o,r) This
1483 repeatable directive introduces a new attribute to the set. The
attribute value is stored in the index, unless a <it/local-value/ is
given, in which case the local value is stored instead. The name is used
to refer to the attribute from the <it/abstract syntax/. </descrip>
1487
1488 This is an excerpt from the GILS attribute set definition. Notice how
1489 the file describing the <it/bib-1/ attribute set is referenced.
1490
1491 <tscreen><verb>
1492 name gils
1493 reference GILS-attset
1494 include bib1.att
1495
1496 att 2001                distributorName
1497 att 2002                indexTermsControlled
1498 att 2003                purpose
1499 att 2004                accessConstraints
1500 att 2005                useConstraints
1501 </verb></tscreen>
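
As an illustration only, the following Python sketch reads the directives
described above from one .att file into a dictionary. The helper
<tt/parse_att/ and the structure it returns are hypothetical and are not
part of Zebra; the sketch assumes that a <it/local-value/ is an integer,
and it records <bf/include/ directives without loading the referenced files.

<tscreen><verb>
def parse_att(text):
    """Collect .att directives (hypothetical helper, not Zebra code)."""
    result = {"name": None, "reference": None, "include": [], "att": {}}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        directive, args = fields[0], fields[1:]
        if directive in ("name", "reference"):
            result[directive] = args[0]
        elif directive == "include":
            result["include"].append(args[0])  # recorded, not followed
        elif directive == "att":
            value, att_name = int(args[0]), args[1]
            # The index stores the local-value if one is given (assumed to
            # be an integer here), otherwise the attribute value itself.
            stored = int(args[2]) if len(args) > 2 else value
            result["att"][att_name] = (value, stored)
    return result
</verb></tscreen>

Applied to the GILS excerpt above, <tt/parse_att/ would map
<tt/distributorName/ to the pair (2001, 2001), since no local value is given.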
1502
1503 <sect2>The Tag Set (.tag) Files
1504
1505 <p>
This file type defines the tag set of the profile, possibly by
referencing other tag sets (most tag sets, for instance, will include
tagsetG and tagsetM from the Z39.50 specification). The file may
contain the following directives.
1510
1511 <descrip>
1512 <tag>name <it/symbolic-name/</tag> (m) This provides a shorthand name or
1513 description for the tag set. Mostly useful for diagnostic purposes.
1514
1515 <tag>reference <it/OID-name/</tag> (o) The reference name of the OID for
1516 the tag set. The directive is optional, since not all tag sets are
1517 registered outside of their schema.
1518
1519 <tag>type <it/integer/</tag> (m) The type number of the tagset within the schema
1520 profile (note: this specification really should belong to the .abs
1521 file. This will be fixed in a future release).
1522
1523 <tag>include <it/filename/</tag> (o,r) This directive is used
1524 to include the definitions of other tag sets into the current one.
1525
<tag>tag <it/number names type/</tag> (o,r) Introduces a new
tag to the set. The <it/number/ is the tag number as used in the protocol
(there is currently no mechanism for specifying string tags, but this
would be quick work to add). The <it/names/ parameter
is a list of names by which the tag should be recognized in the input
file format. The names should be separated by slashes (/). The
<it/type/ is the recommended datatype of the tag. It should be one of
the following:
1534 <itemize>
1535 <item>structured
1536 <item>string
1537 <item>numeric
1538 <item>bool
1539 <item>oid
1540 <item>generalizedtime
1541 <item>intunit
1542 <item>int
1543 <item>octetstring
1544 <item>null
1545 </itemize>
1546 </descrip>
1547
1548 The following is an excerpt from the TagsetG definition file.
1549
1550 <tscreen><verb>
1551 name tagsetg
1552 reference TagsetG
1553 type 2
1554
1555 tag     1       title           string
1556 tag     2       author          string
1557 tag     3       publicationPlace string
1558 tag     4       publicationDate string
1559 tag     5       documentId      string
1560 tag     6       abstract        string
1561 tag     7       name            string
1562 tag     8       date            generalizedtime
1563 tag     9       bodyOfDisplay   string
1564 tag     10      organization    string
1565 </verb></tscreen>
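
The <bf/tag/ directive can be illustrated with a short Python sketch that
splits one line into its number, its list of alternative names and its
datatype, and rejects datatypes outside the list above. The function
<tt/parse_tag_line/ is hypothetical, and it assumes that the four fields
are separated by blanks, as in the TagsetG excerpt.

<tscreen><verb>
TAG_DATATYPES = {"structured", "string", "numeric", "bool", "oid",
                 "generalizedtime", "intunit", "int", "octetstring", "null"}


def parse_tag_line(line):
    """Split a 'tag number names type' directive (hypothetical helper)."""
    keyword, number, names, datatype = line.split()
    if keyword != "tag":
        raise ValueError("not a tag directive: " + line)
    if datatype not in TAG_DATATYPES:
        raise ValueError("unknown datatype: " + datatype)
    # Only numeric tags are supported; alternative names are slash-separated.
    return int(number), names.split("/"), datatype
</verb></tscreen>

For the line <tt>tag 8 date generalizedtime</tt> from the excerpt, the
sketch returns the number 8, the name list containing only <tt/date/, and
the datatype <tt/generalizedtime/.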
1566
1567 <sect2>The Variant Set (.var) Files<label id="variant-set">
1568
1569 <p>
1570 The variant set file is a straightforward representation of the
1571 variant set definitions associated with the protocol. At present, only
1572 the <it/Variant-1/ set is known.
1573
1574 These are the directives allowed in the file.
1575
1576 <descrip>
1577 <tag>name <it/symbolic-name/</tag> (m) This provides a shorthand name or
1578 description for the variant set. Mostly useful for diagnostic purposes.
1579
1580 <tag>reference <it/OID-name/</tag> (o) The reference name of the OID for
1581 the variant set, if one is required.
1582
1583 <tag>class <it/integer class-name/</tag> (m,r) Introduces a new
1584 class to the variant set.
1585
<tag>type <it/integer type-name datatype/</tag> (m,r) Adds a
1587 new type to the current class (the one introduced by the most recent
1588 <bf/class/ directive). The type names belong to the same name space as
1589 the one used in the tag set definition file.
1590 </descrip>
1591
1592 The following is an excerpt from the file describing the variant set
1593 <it/Variant-1/.
1594
1595 <tscreen><verb>
1596 name variant-1
1597 reference Variant-1
1598
1599 class 1 variantId
1600
1601   type  1       variantId               octetstring
1602
1603 class 2 body
1604
1605   type  1       iana                    string
1606   type  2       z39.50                  string
1607   type  3       other                   string
1608 </verb></tscreen>
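
The rule that a <bf/type/ directive belongs to the most recent <bf/class/
can be sketched in Python as follows. The helper <tt/parse_var/ is
hypothetical; it ignores the <bf/name/ and <bf/reference/ directives and
keys the result by the integer class and type numbers.

<tscreen><verb>
def parse_var(text):
    """Group each 'type' under the most recent 'class' (hypothetical)."""
    classes = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] not in ("class", "type"):
            continue  # name, reference and blank lines are ignored here
        if fields[0] == "class":
            current = int(fields[1])
            classes[current] = {"name": fields[2], "types": {}}
        elif current is None:
            raise ValueError("type directive before any class")
        else:
            type_id, type_name, datatype = fields[1:4]
            classes[current]["types"][int(type_id)] = (type_name, datatype)
    return classes
</verb></tscreen>

For the Variant-1 excerpt above, class 2 (<tt/body/) would hold the types
<tt/iana/, <tt/z39.50/ and <tt/other/, each with the datatype <tt/string/.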
1609
1610 <sect2>The Element Set (.est) Files
1611
1612 <p>
The element set specification files describe a subset of the elements
of a database record. The element selection mechanism
is equivalent to the one supplied by the <it/Espec-1/ syntax of the
Z39.50 specification. In fact, the internal representation of an
element set specification is identical to the <it/Espec-1/ structure,
so we refer you to the description of that structure for most of
the detailed semantics of the directives below.
1620
1621 <it>
NOTE: Not all of the Espec-1 functionality has been implemented yet.
The fields that are mentioned below all work as expected, unless
otherwise noted.
1625 </it>
1626
1627 The directives available in the element set file are as follows:
1628
1629 <descrip>
1630 <tag>defaultVariantSetId <it/OID-name/</tag> (o) If variants are used in
1631 the following, this should provide the name of the variantset used
1632 (it's not currently possible to specify a different set in the
1633 individual variant request). In almost all cases (certainly all
1634 profiles known to us), the name <tt/Variant-1/ should be given here.
1635
1636 <tag>defaultVariantRequest <it/variant-request/</tag> (o) This directive
1637 provides a default variant request for
1638 use when the individual element requests (see below) do not contain a
1639 variant request. Variant requests consist of a blank-separated list of
variant components. A variant component is a comma-separated,
parenthesized triple of variant class, type, and value (the first two
being represented as integers). The value can currently only be
entered as a string (this will change to depend on the definition of
the variant in question). The special value (@) is, however,
interpreted as a null value.
1646
1647 <tag>simpleElement <it/path &lsqb;'variant' variant-request&rsqb;/</tag>
1648 (o,r) This corresponds to a simple element request in <it/Espec-1/. The
path consists of a sequence of tag-selectors, each of which can be
one of the following:
1651
1652 <itemize>
1653 <item>A simple tag, consisting of a comma-separated type-value pair in
parentheses, possibly followed by a colon (:) followed by an
1655 occurrences-specification (see below). The tag-value can be a number
1656 or a string. If the first character is an apostrophe ('), this forces
1657 the value to be interpreted as a string, even if it appears to be numerical.
1658
1659 <item>A WildThing, represented as a question mark (?), possibly
1660 followed by a colon (:) followed by an occurrences specification (see
1661 below).
1662
<item>A WildPath, represented as an asterisk (*). Note that the last
element of the path should not be a WildPath (WildPaths don't work in
this version).
1666 </itemize>
1667
1668 The occurrences-specification can be either the string <tt/all/, the
1669 string <tt/last/, or an explicit value-range. The value-range is
1670 represented as an integer (the starting point), possibly followed by a
1671 plus (+) and a second integer (the number of elements, default being
1672 one).
1673
1674 The variant-request has the same syntax as the defaultVariantRequest
1675 above. Note that it may sometimes be useful to give an empty variant
1676 request, simply to disable the default for a specific set of fields
(we aren't certain whether this is proper <it/Espec-1/, but it works in
1678 this implementation).
1679 </descrip>
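
The variant-request syntax can be illustrated with a small Python parser.
The name <tt/parse_variant_request/ is hypothetical, and the sketch assumes
that values contain neither blanks nor commas. The special value @ is
returned as <tt/None/, and an empty request yields an empty list.

<tscreen><verb>
def parse_variant_request(spec):
    """Turn '(class,type,value) ...' into a list of triples (hypothetical).

    Assumes values contain no blanks or commas; '@' is returned as None.
    """
    triples = []
    for component in spec.split():
        if not (component.startswith("(") and component.endswith(")")):
            raise ValueError("malformed variant component: " + component)
        var_class, var_type, value = component[1:-1].split(",")
        triples.append((int(var_class), int(var_type),
                        None if value == "@" else value))
    return triples
</verb></tscreen>

A request such as <tt>(2,1,text/html) (2,2,@)</tt> would give two triples,
the second of which carries a null value.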
1680
1681 The following is an example of an element specification belonging to
1682 the GILS profile.
1683
1684 <tscreen><verb>
1685 simpleelement (1,10)
1686 simpleelement (1,12)
1687 simpleelement (2,1)
1688 simpleelement (1,14)
1689 simpleelement (4,1)
1690 simpleelement (4,52)
1691 </verb></tscreen>
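
The path syntax of <bf/simpleElement/ can be sketched in Python as below.
All function names are hypothetical. The sketch assumes that tag-selectors
are separated by slashes (/), as in the element paths of the .abs files.
Occurrences are returned as <tt/all/, <tt/last/, a (start, count) pair, or
<tt/None/ when none is given, and a path that ends in a WildPath is rejected.

<tscreen><verb>
def parse_occurrences(spec):
    """'all', 'last', 'N' or 'N+M' (M elements from N; M defaults to 1)."""
    if spec in ("all", "last"):
        return spec
    start, _, count = spec.partition("+")
    return (int(start), int(count) if count else 1)


def parse_tag_selector(sel):
    """Return (kind, tag, occurrences) for one tag-selector (hypothetical)."""
    if sel == "*":
        return ("wildpath", None, None)
    if sel.startswith("?"):
        kind, tag, rest = "wildthing", None, sel[1:]
    elif sel.startswith("("):
        close = sel.index(")")
        tag_type, tag_value = sel[1:close].split(",", 1)
        if tag_value.startswith("'"):
            tag_value = tag_value[1:]  # apostrophe forces a string value
        elif tag_value.isdigit():
            tag_value = int(tag_value)
        kind, tag, rest = "tag", (int(tag_type), tag_value), sel[close + 1:]
    else:
        raise ValueError("bad tag-selector: " + sel)
    if not rest:
        return (kind, tag, None)
    if not rest.startswith(":"):
        raise ValueError("bad tag-selector: " + sel)
    return (kind, tag, parse_occurrences(rest[1:]))


def parse_path(path):
    # Assumption: selectors are separated by '/', as in .abs element paths.
    selectors = [parse_tag_selector(sel) for sel in path.split("/")]
    if selectors[-1][0] == "wildpath":
        raise ValueError("a path must not end with a WildPath")
    return selectors
</verb></tscreen>

For example, <tt>(4,70)/(4,90):2+3</tt> would select three occurrences of
tag (4,90), starting at the second, below tag (4,70).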
1692
1693 <sect2>The Schema Mapping (.map) Files<label id="schema-mapping">
1694
1695 <p>
1696 Sometimes, the client might want to receive a database record in
1697 a schema that differs from the native schema of the record. For
1698 instance, a client might only know how to process WAIS records, while
1699 the database record is represented in a more specific schema, such as
1700 GILS. In this module, a mapping of data to one of the MARC formats is
1701 also thought of as a schema mapping (mapping the elements of the
1702 record into fields consistent with the given MARC specification, prior
to actually converting the data to ISO2709). This use of the
1704 object identifier for USMARC as a schema identifier represents an
1705 overloading of the OID which might not be entirely proper. However,
1706 it represents the dual role of schema and record syntax which
1707 is assumed by the MARC family in Z39.50.
1708
1709 <it>
1710 NOTE: The schema-mapping functions are so far limited to a
1711 straightforward mapping of elements. This should be extended with
1712 mechanisms for conversions of the element contents, and conditional
1713 mappings of elements based on the record contents.
1714 </it>
1715
1716 These are the directives of the schema mapping file format:
1717
1718 <descrip>
1719 <tag>targetName <it/name/</tag> (m) A symbolic name for the target schema
1720 of the table. Useful mostly for diagnostic purposes.
1721
1722 <tag>targetRef <it/OID-name/</tag> (m) An OID name for the target schema.
1723 This is used, for instance, by a server receiving a request to present
1724 a record in a different schema from the native one.
1725
1726 <tag>map <it/element-name target-path/</tag> (o,r) Adds
1727 an element mapping rule to the table.
1728 </descrip>
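
A minimal Python sketch of a straightforward element mapping follows. The
helpers <tt/parse_map/ and <tt/apply_map/ are hypothetical. The sketch
assumes a flat record represented as a list of (element-name, value) pairs,
treats target paths as opaque strings, and drops elements that have no
mapping rule; Zebra's own behaviour may differ on that last point.

<tscreen><verb>
def parse_map(text):
    """Collect 'map element-name target-path' rules (hypothetical)."""
    rules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "map":
            rules[fields[1]] = fields[2]
    return rules


def apply_map(record, rules):
    """Rename elements of a flat record of (name, value) pairs.

    Assumption: elements without a rule are dropped from the result.
    """
    return [(rules[name], value) for name, value in record if name in rules]
</verb></tscreen>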
1729
1730 <sect2>The MARC (ISO2709) Representation (.mar) Files
1731
1732 <p>
1733 This file provides rules for representing a record in the ISO2709
1734 format. The rules pertain mostly to the values of the constant-length
1735 header of the record.
1736
1737 <it>NOTE: This will be described better. We're in the process of
1738 re-evaluating and most likely changing the way that MARC records are
1739 handled by the system.</it>
1740
1741 <sect2>Field Structure and Character Sets
1742 <label id="field structure and character sets">
1743
1744 <p>
1745 In order to provide a flexible approach to national character set
handling, Zebra allows the administrator to configure the
system to handle any 8-bit character set, including sets that
1748 require multi-octet diacritics or other multi-octet characters. The
1749 definition of a character set includes a specification of the
1750 permissible values, their sort order (this affects the display in the
1751 SCAN function), and relationships between upper- and lowercase
1752 characters. Finally, the definition includes the specification of
1753 space characters for the set.
1754
1755 The operator can define different character sets for different fields,
1756 typical examples being standard text fields, numerical fields, and
1757 special-purpose fields such as WWW-style linkages (URx).
1758
1759 The field types, and hence character sets, are associated with data
1760 elements by the .abs files (see above). The file <tt/default.idx/
1761 provides the association between field type codes (as used in the .abs
1762 files) and the character map files (with the .chr suffix). The format
of the .idx file is as follows:
1764
1765 <descrip>
1766 <tag>index <it/field type code/</tag>This directive introduces a new
1767 search index code. The argument is a one-character code to be used in the
1768 .abs files to select this particular index type. An index, roughly,
1769 corresponds to a particular structure attribute during search. Refer
1770 to section <ref id="search" name="Search">.
1771
<tag>sort <it/field type code/</tag>This directive introduces a
sort index. The argument is a one-character code to be used in the
.abs file to select this particular index type. The corresponding
1775 use attribute must be used in the sort request to refer to this
1776 particular sort index. The corresponding character map (see below)
1777 is used in the sort process.
1778
1779 <tag>completeness <it/boolean/</tag>This directive enables or disables
1780 complete field indexing. The value of the <it/boolean/ should be 0
(disable) or 1 (enable). If completeness is enabled, the index entry will
1782 contain the complete contents of the field (up to a limit), with words
1783 (non-space characters) separated by single space characters
1784 (normalized to &dquot; &dquot; on display). When completeness is
1785 disabled, each word is indexed as a separate entry. Complete subfield
1786 indexing is most useful for fields which are typically browsed (eg.
1787 titles, authors, or subjects), or instances where a match on a
1788 complete subfield is essential (eg. exact title searching). For fields
1789 where completeness is disabled, the search engine will interpret a
1790 search containing space characters as a word proximity search.
1791
1792 <tag>charmap <it/filename/</tag> This is the filename of the character
map to be used for this index (field type).
1794 </descrip>
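
The effect of the <bf/completeness/ directive can be illustrated with the
Python sketch below. The function <tt/index_entries/ is hypothetical; the
caller passes the space characters of the field's character map, and the
length limit on complete entries is not modelled.

<tscreen><verb>
def index_entries(field, complete, space_chars=" "):
    """Index entries for one field value (hypothetical sketch).

    complete=True gives one entry with words joined by single blanks;
    complete=False gives one entry per word.
    """
    words, current = [], ""
    for ch in field:
        if ch in space_chars:
            if current:
                words.append(current)
            current = ""
        else:
            current += ch
    if current:
        words.append(current)
    if not words:
        return []
    return [" ".join(words)] if complete else words
</verb></tscreen>

With completeness enabled, a title such as <tt/The  Zebra server/ (note
the double blank) yields the single entry <tt/The Zebra server/; with
completeness disabled it yields three separate entries.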
1795
1796 The contents of the character map files are structured as follows:
1797
1798 <descrip>
1799 <tag>lowercase <it/value-set/</tag>This directive introduces the basic
1800 value set of the field type. The format is an ordered list (without
1801 spaces) of the characters which may occur in &dquot;words&dquot; of
1802 the given type. The order of the entries in the list determines the
1803 sort order of the index. In addition to single characters, the
1804 following combinations are legal:
1805
1806 <itemize>
1807 <item>Backslashes may be used to introduce three-digit octal, or
1808 two-digit hex representations of single characters (preceded by <tt/x/).
1809 In addition, the combinations
\\, \\r, \\n, \\t, \\s (space: remember that real space characters
may not occur in the value definition), and \\ are recognised,
1812 with their usual interpretation.
1813
1814 <item>Curly braces {} may be used to enclose ranges of single
1815 characters (possibly using the escape convention described in the
preceding point), eg. {a-z} to introduce the range of lowercase ASCII
letters. Note that the interpretation of such a range depends on
1818 the concrete representation in your local, physical character set.
1819
<item>Parentheses () may be used to enclose multi-byte characters,
eg. diacritics or special national combinations (eg. Spanish
1822 &dquot;ll&dquot;). When found in the input stream (or a search term),
1823 these characters are viewed and sorted as a single character, with a
1824 sorting value depending on the position of the group in the value
1825 statement.
1826 </itemize>
1827
1828 <tag>uppercase <it/value-set/</tag>This directive introduces the
upper-case equivalents to the value set (if any). The number and
1830 order of the entries in the list should be the same as in the
1831 <tt/lowercase/ directive.
1832
<tag>space <it/value-set/</tag>This directive introduces the characters
which separate words in the input stream. Depending on the
completeness mode of the field in question, these characters either
terminate an index entry, or delimit individual &dquot;words&dquot; in
the input stream. The order of the elements is not significant;
otherwise the representation is the same as for the <tt/uppercase/ and
<tt/lowercase/ directives.
1840
1841 <tag>map <it/value-set/ <it/target/</tag>This directive introduces a
1842 mapping between each of the members of the value-set on the left to
1843 the character on the right. The character on the right must occur in
1844 the value set (the <tt/lowercase/ directive) of the character set, but
it may be a parenthesis-enclosed multi-octet character. This directive
1846 may be used to map diacritics to their base characters, or to map
1847 HTML-style character-representations to their natural form, etc.
1848 </descrip>
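
The following Python sketch models the four character map directives. The
function <tt/expand_value_set/ and the class <tt/CharMap/ are hypothetical
and are not Zebra's implementation. To keep it short, the sketch does not
handle backslash escapes, assumes one range per pair of braces, and silently
drops input characters that are outside the value set. A longest-match rule
lets a parenthesized group such as (ll) win over a single l.

<tscreen><verb>
def expand_value_set(spec):
    """Expand single characters, {x-y} ranges and (xy) groups into a list."""
    items, i = [], 0
    while i < len(spec):
        if spec[i] == "{":
            close = spec.index("}", i)
            first, last = spec[i + 1], spec[close - 1]
            items.extend(chr(c) for c in range(ord(first), ord(last) + 1))
            i = close + 1
        elif spec[i] == "(":
            close = spec.index(")", i)
            items.append(spec[i + 1:close])
            i = close + 1
        else:
            items.append(spec[i])
            i += 1
    return items


class CharMap:
    """Hypothetical model of a .chr character map (not Zebra code)."""

    def __init__(self, lowercase, uppercase="", space=" ", maps=()):
        lower = expand_value_set(lowercase)
        # Position in the lowercase list defines the sort order.
        self.sort_value = {c: n for n, c in enumerate(lower)}
        self.fold = {c: c for c in lower}
        # Upper-case entries correspond to lowercase ones by position.
        for upper, low in zip(expand_value_set(uppercase), lower):
            self.fold[upper] = low
        # 'map' directives: every member of the value-set maps to the target.
        for value_set, target in maps:
            for c in expand_value_set(value_set):
                self.fold[c] = expand_value_set(target)[0]
        self.space = set(expand_value_set(space))
        self.longest = max([len(k) for k in self.fold] +
                           [len(k) for k in self.space] + [1])

    def _match(self, text, i):
        # Longest match first, so a group such as "ll" beats "l".
        for size in range(min(self.longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in self.fold or chunk in self.space:
                return chunk
        return text[i]

    def words(self, text):
        """Split text into words, each a list of folded characters."""
        words, word, i = [], [], 0
        while i < len(text):
            chunk = self._match(text, i)
            i += len(chunk)
            if chunk in self.space:
                if word:
                    words.append(word)
                word = []
            elif chunk in self.fold:
                word.append(self.fold[chunk])
            # other characters are dropped (an assumption of this sketch)
        if word:
            words.append(word)
        return words

    def sort_key(self, word):
        return [self.sort_value[c] for c in word]
</verb></tscreen>

With the value set <tt>{a-l}(ll){m-z}</tt>, for instance, the word
<tt/calle/ is read as c, a, ll, e, and any word starting with ll sorts
after every word starting with a single l.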
1849
1850 <sect1>Exchange Formats
1851
1852 <p>
Converting records from the internal structure to an exchange format
1854 is largely an automatic process. Currently, the following exchange
1855 formats are supported:
1856
1857 <itemize>
1858 <item>GRS-1. The internal representation is based on GRS-1, so the
1859 conversion here is straightforward. The system will create
1860 applied variant and supported variant lists as required, if a record
1861 contains variant information.
1862
<item>SUTRS. Again, the mapping is fairly straightforward. Indentation
1864 is used to show the hierarchical structure of the record. All
1865 &dquot;GRS&dquot; type records support both the GRS-1 and SUTRS
1866 representations.
1867
1868 <item>ISO2709-based formats (USMARC, etc.). Only records with a
1869 two-level structure (corresponding to fields and subfields) can be
1870 directly mapped to ISO2709. For records with a different structuring
1871 (eg., GILS), the representation in a structure like USMARC involves a
1872 schema-mapping (see section <ref id="schema-mapping" name="Schema
1873 Mapping">), to an &dquot;implied&dquot; USMARC schema (implied,
1874 because there is no formal schema which specifies the use of the
1875 USMARC fields outside of ISO2709). The resultant, two-level record is
1876 then mapped directly from the internal representation to ISO2709. See
1877 the GILS schema definition files for a detailed example of this
1878 approach.
1879
1880 <item>Explain. This representation is only available for records
1881 belonging to the Explain schema.
1882
<item>Summary. This ASN.1-based structure is only available for records
belonging to the Summary schema, or to schemas which provide a mapping
1885 to this schema (see the description of the schema mapping facility
1886 above).
1887
1888 <item>SOIF. Support for this syntax is experimental, and is currently
1889 keyed to a private Index Data OID (1.2.840.10003.5.1000.81.2). All
1890 abstract syntaxes can be mapped to the SOIF format, although nested
1891 elements are represented by concatenation of the tag names at each
1892 level.
1893
<item>XML. The use of XML as a transfer syntax in Z39.50 is not yet widely established,
so its use here must be characterised as somewhat experimental. The
1896 tag-names used are taken from the tag-set in use, except for local string tags
1897 where the tag itself is passed through unchanged.
1898
1899 </itemize>
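
The two ways of flattening hierarchy described above, indentation for
SUTRS and tag-name concatenation for SOIF, can be sketched in Python. The
functions are hypothetical. The record is assumed to be a list of
(tag-name, value) pairs in which a value is either a string or a nested
list; the two-space indentation, the blank between name and value, and the
hyphen used to join tag names are assumptions, not Zebra's actual output.

<tscreen><verb>
def sutrs_lines(elements, depth=0, indent="  "):
    """Render nested (name, value) pairs, indenting each level (sketch)."""
    lines = []
    for name, value in elements:
        if isinstance(value, list):
            lines.append(indent * depth + name)
            lines.extend(sutrs_lines(value, depth + 1, indent))
        else:
            lines.append(indent * depth + name + " " + value)
    return lines


def soif_pairs(elements, prefix="", sep="-"):
    """Flatten nesting by concatenating tag names (separator is assumed)."""
    pairs = []
    for name, value in elements:
        full = prefix + sep + name if prefix else name
        if isinstance(value, list):
            pairs.extend(soif_pairs(value, full, sep))
        else:
            pairs.append((full, value))
    return pairs
</verb></tscreen>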
1900
1901 <sect>License
1902
1903 <p>
1904 Zebra
1905 Copyright (c) 1995-2000 Index Data ApS.
1906
1907 All rights reserved.
1908
1909 Use and redistribution in source or binary form, with or without
1910 modification, of any or all of this software and documentation is
1911 permitted, provided that the following Conditions 1 to 6 set out below
1912 are met.
1913
1914 1. Unless prior specific written permission is obtained this copyright
1915 and permission notice appear with all copies of the software and its
1916 documentation. Notices of copyright or attribution which appear at the
1917 beginning of any file must remain unchanged.
1918
1919 2. The names of Index Data or the individual authors may not be used
1920 to endorse or promote products derived from this software without
1921 specific prior written permission.
1922
1923 3. Source code or binary versions of this software and its documentation
1924 may be used freely in not for profit applications limited to databases
1925 of 100,000 records maximum. Other applications - such as publishing over
1926 100,000 records, providing for-pay services, distributing a product based
1927 in whole or in part on this software or its documentation, or generally
1928 distributing this software or its documentation under a different license
1929 require a commercial license from Index Data.
1930
1931 4. The software may be installed and used for evaluation purposes in
1932 conjunction with such commercially licensed applications for a trial
1933 period no longer than 60 days.
1934
1935 5. Unless a prior specific written agreement is obtained THIS SOFTWARE
1936 IS PROVIDED "AS IS" AND WITHOUT WARRANTY OF ANY KIND, EXPRESS, IMPLIED,
1937 OR OTHERWISE, INCLUDING WITHOUT LIMITATION, ANY WARRANTY OF
1938 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL
1939 INDEX DATA BE LIABLE FOR ANY SPECIAL, INCIDENTAL, INDIRECT OR
1940 CONSEQUENTIAL DAMAGES OF ANY KIND, OR ANY DAMAGES WHATSOEVER RESULTING
1941 FROM LOSS OF USE, DATA OR PROFITS, WHETHER OR NOT ADVISED OF THE
1942 POSSIBILITY OF DAMAGE, AND ON ANY THEORY OF LIABILITY, ARISING OUT OF
1943 OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
1944
1945 6. Commercial licenses and support agreements for Zebra and related
1946 Index Data products such as Z'bol (c) - and written agreements
1947 relating to these Conditions may be obtained only from Index Data
1948 or its appointed agents as follows:
1949
1950 Index Data: www.indexdata.dk
1951 Fretwell-Downing Informatics: www.fdgroup.co.uk
1952 Fretwell-Downing Informatics USA: www.fdi.com
1953
1954 <sect>About Index Data and the Zebra Server
1955
1956 <p>
1957 Index Data is a consulting and software-development enterprise that
1958 specialises in information management and retrieval applications. Our
1959 interests and expertise span a broad range of related fields, and one
1960 of our primary, long-term objectives is the development of a powerful
1961 information management
1962 system with open network interfaces and hypermedia capabilities. Zebra is an
1963 important component in this strategy.
1964
1965 We make this software available free of charge for not-for-profit
1966 purposes, as a service to the networking community, and to further
1967 the development and use of quality software for open network
1968 communication. We encourage your comments and questions if you have ideas, things
you would like to see in future versions, or things you would like to
1970 contribute.
1971
1972 If you like this software, and would like to use all or part of it in
1973 a commercial product, or to provide a commercial database service,
1974 please contact us. The Z'mbol Information System represents the commercial
variant of Zebra. It includes full support, additional functionality and
1976 performance-boosting features, and it has what we think is a very exciting
1977 development path.
1978
1979 <tscreen><verb>
1980 Index Data
1981 Ryesgade 3
1982 DK-2200 Copenhagen N
1983 </verb></tscreen>
1984
1985 <p>
1986 <tscreen><verb>
1987 Phone: +45 3536 3672
1988 Fax  : +45 3536 0449
1989 Email: info@indexdata.dk
1990 </verb></tscreen>
1991
1992 The <it>Random House College Dictionary</it>, 1975 edition
1993 offers this definition of the
1994 word &dquot;Zebra&dquot;:
1995
1996 <it>
1997 Zebra, n., any of several horselike, African mammals of the genus Equus,
1998 having a characteristic pattern of black or dark-brown stripes on
1999 a whitish background.
2000 </it>
2001
2002 </article>