From 5ca4e60e990af6ad6b62ebff855d7b642f37c3ec Mon Sep 17 00:00:00 2001 From: Marc Cromme Date: Fri, 2 Feb 2007 09:58:39 +0000 Subject: [PATCH] added acronyme entities --- doc/administration.xml | 102 +++++++++++++------------- doc/architecture.xml | 54 +++++++------- doc/examples.xml | 18 ++--- doc/field-structure.xml | 10 +-- doc/idzebra-config.xml | 4 +- doc/indexdata.xml | 4 +- doc/installation.xml | 64 ++++++++--------- doc/introduction.xml | 119 +++++++++++++++++++----------- doc/license.xml | 10 +-- doc/marc_indexing.xml | 18 ++--- doc/querymodel.xml | 160 ++++++++++++++++++++--------------------- doc/quickstart.xml | 8 +-- doc/recordmodel-alvisxslt.xml | 70 +++++++++--------- doc/recordmodel-grs.xml | 42 +++++------ doc/zebra.xml | 34 ++++----- doc/zebraidx.xml | 20 +++--- 16 files changed, 380 insertions(+), 357 deletions(-) diff --git a/doc/administration.xml b/doc/administration.xml index 829ef75..13baec6 100644 --- a/doc/administration.xml +++ b/doc/administration.xml @@ -1,6 +1,6 @@ - - Administrating Zebra + + Administrating &zebra; - Unlike many simpler retrieval systems, Zebra supports safe, incremental + Unlike many simpler retrieval systems, &zebra; supports safe, incremental updates to an existing index. - Normally, when Zebra modifies the index it reads a number of records + Normally, when &zebra; modifies the index it reads a number of records that you specify. Depending on your specifications and on the contents of each record one the following events take place for each record: @@ -25,8 +25,8 @@ The record is indexed as if it never occurred before. - Either the Zebra system doesn't know how to identify the record or - Zebra can identify the record but didn't find it to be already indexed. + Either the &zebra; system doesn't know how to identify the record or + &zebra; can identify the record but didn't find it to be already indexed. 
@@ -53,20 +53,20 @@ - Please note that in both the modify- and delete- case the Zebra + Please note that in both the modify- and delete- case the &zebra; indexer must be able to generate a unique key that identifies the record in question (more on this below). - To administrate the Zebra retrieval system, you run the + To administrate the &zebra; retrieval system, you run the zebraidx program. This program supports a number of options which are preceded by a dash, and a few commands (not preceded by dash). - Both the Zebra administrative tool and the Z39.50 server share a + Both the &zebra; administrative tool and the Z39.50 server share a set of index files and a global configuration file. The name of the configuration file defaults to zebra.cfg. @@ -85,7 +85,7 @@ Indexing is a per-record process, in which either insert/modify/delete will occur. Before a record is indexed search keys are extracted from whatever might be the layout the original record (sgml,html,text, etc..). - The Zebra system currently supports two fundamental types of records: + The &zebra; system currently supports two fundamental types of records: structured and simple text. To specify a particular extraction process, use either the command line option -t or specify a @@ -95,10 +95,10 @@ - The Zebra Configuration File + The &zebra; Configuration File - The Zebra configuration file, read by zebraidx and + The &zebra; configuration file, read by zebraidx and zebrasrv defaults to zebra.cfg unless specified by -c option. @@ -220,10 +220,10 @@ Specifies whether the records should be stored internally - in the Zebra system files. + in the &zebra; system files. If you want to maintain the raw records yourself, this option should be false (0). - If you want Zebra to take care of the records for you, it + If you want &zebra; to take care of the records for you, it should be true(1). 
@@ -233,7 +233,7 @@ register: register-location - Specifies the location of the various register files that Zebra uses + Specifies the location of the various register files that &zebra; uses to represent your databases. See . @@ -243,7 +243,7 @@ shadow: register-location - Enables the safe update facility of Zebra, and + Enables the safe update facility of &zebra;, and tells the system where to place the required, temporary files. See . @@ -316,7 +316,7 @@ estimatehits:: integer - Controls whether Zebra should calculate approximite hit counts and + Controls whether &zebra; should calculate approximite hit counts and at which hit count it is to be enabled. A value of 0 disables approximiate hit counts. For a positive value approximaite hit count is enabled @@ -373,9 +373,9 @@ root: dir - Specifies a directory base for Zebra. All relative paths + Specifies a directory base for &zebra;. All relative paths given (in profilePath, register, shadow) are based on this - directory. This setting is useful if your Zebra server + directory. This setting is useful if your &zebra; server is running in a different directory from where zebra.cfg is located. @@ -386,7 +386,7 @@ passwd: file - Specifies a file with description of user accounts for Zebra. + Specifies a file with description of user accounts for &zebra;. The format is similar to that known to Apache's htpasswd files and UNIX' passwd files. Non-empty lines not beginning with # are considered account lines. There is one account per-line. @@ -400,7 +400,7 @@ passwd.c: file - Specifies a file with description of user accounts for Zebra. + Specifies a file with description of user accounts for &zebra;. File format is similar to that used by the passwd directive except that the password are encrypted. Use Apache's htpasswd or similar for maintenance. @@ -414,7 +414,7 @@ Specifies permissions (priviledge) for a user that are allowed - to access Zebra via the passwd system. 
There are two kinds + to access &zebra; via the passwd system. There are two kinds of permissions currently: read (r) and write(w). By default users not listed in a permission directive are given the read privilege. To specify permissions for a user with no @@ -448,7 +448,7 @@ Locating Records - The default behavior of the Zebra system is to reference the + The default behavior of the &zebra; system is to reference the records from their original location, i.e. where they were found when you run zebraidx. That is, when a client wishes to retrieve a record @@ -463,7 +463,7 @@ If your input files are not permanent - for example if you retrieve your records from an outside source, or if they were temporarily mounted on a CD-ROM drive, - you may want Zebra to make an internal copy of them. To do this, + you may want &zebra; to make an internal copy of them. To do this, you specify 1 (true) in the storeData setting. When the Z39.50 server retrieves the records they will be read from the internal file structures of the system. @@ -557,7 +557,7 @@ To enable indexing with pathname IDs, you must specify file as the value of recordId in the configuration file. In addition, you should set - storeKeys to 1, since the Zebra + storeKeys to 1, since the &zebra; indexer must save additional information about the contents of each record in order to modify the indexes correctly at a later time. @@ -587,7 +587,7 @@ You cannot start out with a group of records with simple indexing (no record IDs as in the previous section) and then later - enable file record Ids. Zebra must know from the first time that you + enable file record Ids. &zebra; must know from the first time that you index the group that the files should be indexed with file record IDs. @@ -698,7 +698,7 @@ - For instance, the sample GILS records that come with the Zebra + For instance, the sample GILS records that come with the &zebra; distribution contain a unique ID in the data tagged Control-Identifier. 
The data is mapped to the Bib-1 use attribute Identifier-standard (code 1007). To use this field as a record id, specify @@ -752,7 +752,7 @@ zebraidx. If you wish to store these, possibly large, files somewhere else, you must add the register entry to the zebra.cfg file. - Furthermore, the Zebra system allows its file + Furthermore, the &zebra; system allows its file structures to span multiple file systems, which is useful for managing very large databases. @@ -767,7 +767,7 @@ The dir specifies a directory in which index files will be stored and the size specifies the maximum - size of all files in that directory. The Zebra indexer system fills + size of all files in that directory. The &zebra; indexer system fills each directory in the order specified and use the next specified directories as needed. The size is an integer followed by a qualifier @@ -792,12 +792,12 @@ - Note that Zebra does not verify that the amount of space specified is + Note that &zebra; does not verify that the amount of space specified is actually available on the directory (file system) specified - it is your responsibility to ensure that enough space is available, and that other applications do not attempt to use the free space. In a large production system, it is recommended that you allocate one or more - file system exclusively to the Zebra register files. + file system exclusively to the &zebra; register files. @@ -809,9 +809,9 @@ Description - The Zebra server supports updating of the index + The &zebra; server supports updating of the index structures. That is, you can add, modify, or remove records from - databases managed by Zebra without rebuilding the entire index. + databases managed by &zebra; without rebuilding the entire index. 
Since this process involves modifying structured files with various references between blocks of data in the files, the update process is inherently sensitive to system crashes, or to process interruptions: @@ -826,7 +826,7 @@ You can solve these problems by enabling the shadow register system in - Zebra. + &zebra;. During the updating procedure, zebraidx will temporarily write changes to the involved files in a set of "shadow files", without modifying the files that are accessed by the @@ -977,7 +977,7 @@ Overview The default ordering of a result set is left up to the server, - which inside Zebra means sorting in ascending document ID order. + which inside &zebra; means sorting in ascending document ID order. This is not always the order humans want to browse the sometimes quite large hit sets. Ranking and sorting comes to the rescue. @@ -996,7 +996,7 @@ Simply put, dynamic relevance ranking sorts a set of retrieved records such that those most likely to be relevant to your request are retrieved first. - Internally, Zebra retrieves all documents that satisfy your + Internally, &zebra; retrieves all documents that satisfy your query, and re-orders the hit list to arrange them based on a measurement of similarity between your query and the content of each record. @@ -1015,7 +1015,7 @@ Static Ranking - Zebra uses internally inverted indexes to look up term occurencies + &zebra; uses internally inverted indexes to look up term occurencies in documents. 
Multiple queries from different indexes can be combined by the binary boolean operations AND, OR and/or NOT (which @@ -1037,7 +1037,7 @@ staticrank: 1 - directive in the main core Zebra configuration file, the internal document + directive in the main core &zebra; configuration file, the internal document keys used for ordering are augmented by a preceding integer, which contains the static rank of a given document, and the index lists are ordered @@ -1110,7 +1110,7 @@ algorithms, which only considers searching in one full-text index, this one works on multiple indexes at the same time. More precisely, - Zebra does boolean queries and searches in specific addressed + &zebra; does boolean queries and searches in specific addressed indexes (there are inverted indexes pointing from terms in the dictionary to documents and term positions inside documents). It works like this: @@ -1415,7 +1415,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci Sorting - Zebra sorts efficiently using special sorting indexes + &zebra; sorts efficiently using special sorting indexes (type=s; so each sortable index must be known at indexing time, specified in the configuration of record indexing. For example, to enable sorting according to the BIB-1 @@ -1485,7 +1485,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci - Extended services are only supported when accessing the Zebra + Extended services are only supported when accessing the &zebra; server using the Z39.50 protocol. The SRU protocol does not support extended services. @@ -1494,7 +1494,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci The extended services are not enabled by default in zebra - due to the - fact that they modify the system. Zebra can be configured + fact that they modify the system. 
&zebra; can be configured to allow anybody to search, and to allow only updates for a particular admin user in the main zebra configuration file zebra.cfg. @@ -1512,7 +1512,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci admin:secret - It is essential to configure Zebra to store records internally, + It is essential to configure &zebra; to store records internally, and to support modifications and deletion of records: @@ -1537,7 +1537,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci It is not possible to carry information about record types or - similar to Zebra when using extended services, due to + similar to &zebra; when using extended services, due to limitations of the Z39.50 protocol. Therefore, indexing filters can not be chosen on a per-record basis. One and only one general XML indexing filter @@ -1613,7 +1613,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci recordIdNumber positive number - Zebra's internal system number, + &zebra;'s internal system number, not allowed for recordInsert or specialUpdate actions which result in fresh record inserts. @@ -1645,7 +1645,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci During all actions, the usual rules for internal record ID generation apply, unless an - optional recordIdNumber Zebra internal ID or a + optional recordIdNumber &zebra; internal ID or a recordIdOpaque string identifier is assigned. The default ID generation is configured using the recordId: from @@ -1655,7 +1655,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci Setting of the recordIdNumber parameter, - which must be an existing Zebra internal system ID number, is not + which must be an existing &zebra; internal system ID number, is not allowed during any recordInsert or specialUpdate action resulting in fresh record inserts. 
@@ -1663,7 +1663,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci When retrieving existing - records indexed with GRS indexing filters, the Zebra internal + records indexed with GRS indexing filters, the &zebra; internal ID number is returned in the field /*/id:idzebra/localnumber in the namespace xmlns:id="http://www.indexdata.dk/zebra/", @@ -1673,7 +1673,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci A new element set for retrieval of internal record data has been added, which can be used to access minimal records - containing only the recordIdNumber Zebra + containing only the recordIdNumber &zebra; internal ID, or the recordIdOpaque string identifier. This works for any indexing filter used. See . @@ -1688,13 +1688,13 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci records. This identifier will replace zebra's own automagic identifier generation with a unique mapping from recordIdOpaque to the - Zebra internal recordIdNumber. + &zebra; internal recordIdNumber. The opaque recordIdOpaque string identifiers are not visible in retrieval records, nor are searchable, so the value of this parameter is questionable. It serves mostly as a convenient mapping from - application domain string identifiers to Zebra internal ID's. + application domain string identifiers to &zebra; internal ID's. diff --git a/doc/architecture.xml b/doc/architecture.xml index 3a8131a..86a04fb 100644 --- a/doc/architecture.xml +++ b/doc/architecture.xml @@ -1,29 +1,29 @@ - - Overview of Zebra Architecture + + Overview of &zebra; Architecture
Local Representation - As mentioned earlier, Zebra places few restrictions on the type of + As mentioned earlier, &zebra; places few restrictions on the type of data that you can index and manage. Generally, whatever the form of the data, it is parsed by an input filter specific to that format, and - turned into an internal structure that Zebra knows how to handle. This + turned into an internal structure that &zebra; knows how to handle. This process takes place whenever the record is accessed - for indexing and retrieval. The RecordType parameter in the zebra.cfg file, or - the -t option to the indexer tells Zebra how to + the -t option to the indexer tells &zebra; how to process input records. Two basic types of processing are available - raw text and structured data. Raw text is just that, and it is selected by providing the - argument text to Zebra. Structured records are + argument text to &zebra;. Structured records are all handled internally using the basic mechanisms described in the subsequent sections. - Zebra can read structured records in many different formats. + &zebra; can read structured records in many different formats. + Example Configurations @@ -15,7 +15,7 @@ option to specify an alternative master configuration file. - The master configuration file tells Zebra: + The master configuration file tells &zebra;: @@ -64,7 +64,7 @@ Example 1: XML Indexing And Searching - This example shows how Zebra can be used with absolutely minimal + This example shows how &zebra; can be used with absolutely minimal configuration to index a body of XML documents, and search them using @@ -88,9 +88,9 @@ would you? :-) - Now we need to create a Zebra database to hold and index the XML + Now we need to create a &zebra; database to hold and index the XML records. We do this with the - Zebra indexer, zebraidx, which is + &zebra; indexer, zebraidx, which is driven by the zebra.cfg configuration file. 
For our purposes, we don't need any special behaviour - we can use the defaults - so we can start with a @@ -102,7 +102,7 @@ - That's all you need for a minimal Zebra configuration. Now you can + That's all you need for a minimal &zebra; configuration. Now you can roll the XML records into the database and build the indexes: zebraidx update records @@ -228,7 +228,7 @@ <Zthes> element. - This is a two-step process. First, we need to tell Zebra that we + This is a two-step process. First, we need to tell &zebra; that we want to support the BIB-1 attribute set. Then we need to tell it which elements of its record pertain to access point 4. @@ -271,7 +271,7 @@ xelm /Zthes/termModifiedBy termModifiedBy:w Declare Bib-1 attribute set. See bib1.att in - Zebra's tab directory. + &zebra;'s tab directory. @@ -375,7 +375,7 @@ rendering engine can handle. I generated the EPS version of the image by exporting a line-drawing done in TGIF, then converted that to the GIF using a shell-script called "epstogif" which used an appallingly baroque sequence of conversions, which I would prefer not to pollute -the Zebra build environment with: +the &zebra; build environment with: #!/bin/sh diff --git a/doc/field-structure.xml b/doc/field-structure.xml index 449260d..4079205 100644 --- a/doc/field-structure.xml +++ b/doc/field-structure.xml @@ -1,11 +1,11 @@ - + Field Structure and Character Sets In order to provide a flexible approach to national character set - handling, Zebra allows the administrator to configure the set up the + handling, &zebra; allows the administrator to configure the set up the system to handle any 8-bit character set — including sets that require multi-octet diacritics or other multi-octet characters. The definition of a character set includes a specification of the @@ -149,13 +149,13 @@ The character map files are used to define the word tokenization and character normalization performed before inserting text into - the inverse indexes. 
Zebra ships with the predefined character map + the inverse indexes. &zebra; ships with the predefined character map files tab/*.chr. Users are allowed to add and/or modify maps according to their needs. - Character maps predefined in Zebra + Character maps predefined in &zebra; @@ -389,7 +389,7 @@ In addition to specifying sort orders, space (blank) handling, and upper/lowercase folding, you can also use the character map - files to make Zebra ignore leading articles in sorting records, + files to make &zebra; ignore leading articles in sorting records, or when doing complete field searching. diff --git a/doc/idzebra-config.xml b/doc/idzebra-config.xml index 247ad61..8e635b9 100644 --- a/doc/idzebra-config.xml +++ b/doc/idzebra-config.xml @@ -8,7 +8,7 @@ %common; ]> - + zebra @@ -104,7 +104,7 @@ --modules - Return directory for Zebra modules. + Return directory for &zebra; modules. diff --git a/doc/indexdata.xml b/doc/indexdata.xml index 9433231..431bed4 100644 --- a/doc/indexdata.xml +++ b/doc/indexdata.xml @@ -1,6 +1,6 @@ - - About Index Data and the Zebra Server + + About Index Data and the &zebra; Server Index Data is a consulting and software-development enterprise that diff --git a/doc/installation.xml b/doc/installation.xml index c654d4b..328c1c2 100644 --- a/doc/installation.xml +++ b/doc/installation.xml @@ -1,8 +1,8 @@ - + Installation - Zebra is written in ANSI C and was implemented with portability in mind. + &zebra; is written in ANSI C and was implemented with portability in mind. We primarily use GCC on UNIX and Microsoft Visual C++ on Windows. @@ -21,7 +21,7 @@ - Zebra can be configured to use the following utilities (most of + &zebra; can be configured to use the following utilities (most of which are optional): @@ -30,9 +30,9 @@ (required) - Zebra uses YAZ to support Z39.50 / + &zebra; uses YAZ to support Z39.50 / SRU. - Also the memory management utilites from YAZ is used by Zebra. 
+ Also the memory management utilites from YAZ is used by &zebra;. @@ -64,7 +64,7 @@ Tcl is required if you need to use the Tcl record filter - for Zebra. You can find binary packages for Tcl for many + for &zebra;. You can find binary packages for Tcl for many Unices and Windows. @@ -78,8 +78,8 @@ GNU Automake and Autoconf are only required if you're - using the CVS version of Zebra. You do not need these - if you have fetched a Zebra tar. + using the CVS version of &zebra;. You do not need these + if you have fetched a &zebra; tar. @@ -90,7 +90,7 @@ These tools are only required if you're writing - documentation for Zebra. You need the following + documentation for &zebra;. You need the following Debian packages: docbook, docbook-xml, docbook-xsl, docbook-utils, xsltproc. @@ -111,7 +111,7 @@ shell script attempts to guess correct values for various system-dependent variables used during compilation. It uses those values to create a Makefile in each - directory of Zebra. + directory of &zebra;. @@ -177,7 +177,7 @@ index/*.so - The .so-files are Zebra record filter modules. + The .so-files are &zebra; record filter modules. There are modules for reading MARC (mod-grs-marc.so), XML (mod-grs-xml.so) , etc. @@ -191,18 +191,18 @@ Using configure option --disable-shared builds - Zebra statically and links "in" Zebra filter code statically, i.e. + &zebra; statically and links "in" &zebra; filter code statically, i.e. no .so-files are generated - You can now use Zebra. If you wish to install it system-wide, then + You can now use &zebra;. If you wish to install it system-wide, then as root type make install - By default this will install the Zebra executables in + By default this will install the &zebra; executables in /usr/local/bin, and the standard configuration files in /usr/local/share/idzebra-2.0. 
If @@ -240,7 +240,7 @@ apt-get update as root, the - Zebra indexer is + &zebra; indexer is easily installed issuing apt-get install idzebra-2.0 idzebra-2.0-doc @@ -251,7 +251,7 @@
 Ubuntu/Debian and GNU/Debian on other platforms
- These Zebra
+ These &zebra;
 packages are specifically compiled for GNU/Debian Linux systems. Installation on other GNU/Debian systems is possible by
@@ -272,7 +272,7 @@
 apt-get build-dep idzebra-2.0 as root, the
- Zebra indexer is
+ &zebra; indexer is
 recompiled and installed issuing fakeroot apt-get source --compile idzebra-2.0
@@ -288,15 +288,15 @@
WIN32 - The easiest way to install Zebra on Windows is by downloading + The easiest way to install &zebra; on Windows is by downloading an installer from here. The installer comes with source too - in case you wish to - compile Zebra with different Compiler options. + compile &zebra; with different Compiler options. - Zebra is shipped with "makefiles" for the NMAKE tool that comes + &zebra; is shipped with "makefiles" for the NMAKE tool that comes with Microsoft Visual C++. Version 2003 and 2005 has been tested. We expect that zebra compiles with version 6 as well. @@ -323,7 +323,7 @@ YAZDIR - Directory of YAZ source. Zebra's makefile expects to find + Directory of YAZ source. &zebra;'s makefile expects to find yaz.lib, yaz.dll in yazdir/lib and yazdir/bin respectively. @@ -335,7 +335,7 @@ HAVE_EXPAT, EXPAT_DIR - If HAVE_EXPAT is set to 1, Zebra is compiled + If HAVE_EXPAT is set to 1, &zebra; is compiled with Expat support. In this configuration, set ZEBRA_DIR to the Expat source directory. @@ -348,7 +348,7 @@ HAVE_ICONV, ICONV_DIR - If HAVE_ICONV is set to 1, Zebra is compiled + If HAVE_ICONV is set to 1, &zebra; is compiled with iconv support. In this configuration, set ICONV_DIR to the iconv source directory. Iconv binaries can be downloaded from @@ -363,7 +363,7 @@ BZIP2DEF - Define these symbols if Zebra is to be compiled with + Define these symbols if &zebra; is to be compiled with BZIP2 record compression support. @@ -372,10 +372,10 @@ - The DEBUG setting in the makefile for Zebra must + The DEBUG setting in the makefile for &zebra; must be set to the same value as DEBUG setting in the makefile for YAZ. - If not, the Zebra server/indexer will crash. + If not, the &zebra; server/indexer will crash. @@ -395,7 +395,7 @@ - If you wish to recompile Zebra - for example if you modify + If you wish to recompile &zebra; - for example if you modify settings in the makefile you can delete object files, etc by running. 
@@ -408,12 +408,12 @@ bin/zebraidx.exe - The Zebra indexer. + The &zebra; indexer. bin/zebrasrv.exe - The Zebra server. + The &zebra; server. @@ -423,9 +423,9 @@
- Upgrading from Zebra version 1.3.x
+ Upgrading from &zebra; version 1.3.x
- Zebra's installation directories have changed a bit. In addition,
+ &zebra;'s installation directories have changed a bit. In addition,
 the new loadable modules must be defined in the master zebra.cfg configuration file. The old version 1.3.x configuration options
@@ -444,7 +444,7 @@
- The internal binary register structures have changed; all Zebra
+ The internal binary register structures have changed; all &zebra;
 databases must be re-indexed after upgrade.
diff --git a/doc/introduction.xml b/doc/introduction.xml
index 29d759d..ff4e41e 100644
--- a/doc/introduction.xml
+++ b/doc/introduction.xml
@@ -1,12 +1,45 @@
-
+
 Introduction
Overview + + &zebra; is a free, fast, friendly information management system. It can + index records in XML/SGML, MARC, e-mail archives and many other + formats, and quickly find them using a combination of boolean + searching and relevance ranking. Search-and-retrieve applications can + be written using APIs in a wide variety of languages, communicating + with the &zebra; server using industry-standard information-retrieval + protocols or web services. + + + &zebra; is licensed Open Source, and can be + deployed by anyone for any purpose without license fees. The C source + code is open to anybody to read and change under the GPL license. + + + &zebra; is a networked component which acts as a reliable &z3950; server + for both record/document search, presentation, insert, update and + delete operations. In addition, it understands the &sru; family of + webservices, which exist in REST GET/POST and truly SOAP flavors. + + + &zebra; is available as MS Windows 2003 Server (32 bit) self-extracting + package as well as GNU/Debian Linux (32 bit and 64 bit) precompiled + packages. It has been deployed successfully on other Unix systems, + including Sun Sparc, HP Unix, and many variants of Linux and BSD + based systems. + + + http://www.indexdata.com/zebra/ + http://ftp.indexdata.dk/pub/zebra/win32/ + http://ftp.indexdata.dk/pub/zebra/debian/ + + - Zebra + &zebra; is a high-performance, general-purpose structured text indexing and retrieval engine. It reads records in a variety of input formats (eg. email, XML, MARC) and provides access @@ -15,11 +48,11 @@ - Zebra supports large databases (tens of millions of records, + &zebra; supports large databases (tens of millions of records, tens of gigabytes of data). It allows safe, incremental - database updates on live systems. Because Zebra supports + database updates on live systems. 
Because &zebra; supports the industry-standard information retrieval protocol, Z39.50, - you can search Zebra databases using an enormous variety of + you can search &zebra; databases using an enormous variety of programs and toolkits, both commercial and free, which understand this protocol. Application libraries are available to allow bespoke clients to be written in Perl, C, C++, Java, Tcl, Visual @@ -29,7 +62,7 @@ - This document is an introduction to the Zebra system. It explains + This document is an introduction to the &zebra; system. It explains how to compile the software, how to prepare your first database, and how to configure the server to give you the functionality that you need. @@ -37,11 +70,11 @@
- Zebra Features Overview
+ &zebra; Features Overview
- Zebra Features Overview
+ &zebra; Features Overview
@@ -125,7 +158,7 @@
 Document storage Index-only, Key storage, Document storage Data can be, and usually is, imported
- into Zebra's own storage, but Zebra can also refer to
+ into &zebra;'s own storage, but &zebra; can also refer to
 external files, building and maintaining indexes of "live" collections.
@@ -152,7 +185,7 @@
 Supported Platforms UNIX, Linux, Windows (NT/2000/2003/XP)
- Zebra is written in portable C, so it runs on most
+ &zebra; is written in portable C, so it runs on most
 Unix-like systems as well as Windows (NT/2000/2003/XP). Binary distributions are available for GNU/Debian Linux and Windows
@@ -251,9 +284,9 @@
- References and Zebra based Applications
+ References and &zebra; based Applications
- Zebra has been deployed in numerous applications, in both the
+ &zebra; has been deployed in numerous applications, in both the
 academic and commercial worlds, in application domains as diverse as bibliographic catalogues, geospatial information, structured vocabulary browsing, government information locators, civic
@@ -278,7 +311,7 @@
 LibLime, a company that is marketing and supporting Koha, adds in
- the new release of Koha 3.0 the Zebra
+ the new release of Koha 3.0 the &zebra;
 database server to drive its bibliographic database.
@@ -287,10 +320,10 @@
 in the Koha 2.x series. After extensive evaluations of the best of the Open Source textual database engines - including MySQL full-text searching, PostgreSQL, Lucene and Plucene - the team
- selected Zebra.
+ selected &zebra;.
- "Zebra completely eliminates scalability limitations, because it
+ "&zebra; completely eliminates scalability limitations, because it
 can support tens of millions of records." explained Joshua Ferraro, LibLime's Technology President and Koha's Project Release Manager. "Our performance tests showed search results in
@@ -298,16 +331,16 @@
 modest i386 900Mhz test server."
- "Zebra also includes support for true boolean search expressions
+ "&zebra; also includes support for true boolean search expressions
 and relevance-ranked free-text queries, both of which the Koha
- 2.x series lack. Zebra also supports incremental and safe
+ 2.x series lack. &zebra; also supports incremental and safe
 database updates, which allow on-the-fly record
- management. Finally, since Zebra has at its heart the Z39.50
+ management. Finally, since &zebra; has at its heart the Z39.50
 protocol, it greatly improves Koha's support for that critical library standard."
- Although the bibliographic database will be moved to Zebra, Koha + Although the bibliographic database will be moved to &zebra;, Koha 3.0 will continue to use a relational SQL-based database design for the 'factual' database. "Relational database managers have their strengths, in spite of their inability to handle large @@ -338,7 +371,7 @@ As a surplus, 100% MARC compatibility has been achieved using the - Zebra Server from Index Data as backend server. + &zebra; Server from Index Data as backend server.
@@ -354,7 +387,7 @@ UTF8-encoding. - Reindex.net runs on GNU/Debian Linux with Zebra and Simpleserver + Reindex.net runs on GNU/Debian Linux with &zebra; and Simpleserver from Index Data for bibliographic data. The relational database system Sybase 9 XML is used for @@ -422,11 +455,11 @@ bioinformatics. - The Zebra information retrieval indexing machine is used inside + The &zebra; information retrieval indexing machine is used inside the Alvis framework to manage huge collections of natural language processed and enhanced XML data, coming from a topic relevant web crawl. - In this application, Zebra swallows and manages 37GB of XML data + In this application, &zebra; swallows and manages 37GB of XML data in about 4 hours, resulting in search times of fractions of seconds. @@ -449,9 +482,9 @@ The member libraries send in data files representing their periodicals, including both brief bibliographic data and summary holdings. Then 21 individual Z39.50 targets are created, each - using Zebra, and all mounted on the single hardware server. + using &zebra;, and all mounted on the single hardware server. The live service provides a web gateway allowing Z39.50 searching - of all of the targets or a selection of them. Zebra's small + of all of the targets or a selection of them. &zebra;'s small footprint allows a relatively modest system to comfortably host the 21 servers. @@ -469,7 +502,7 @@ In order to evaluate this interface for recall and precision, they - chose Zebra as the basis for retrieval effectiveness. The Zebra + chose &zebra; as the basis for retrieval effectiveness. The &zebra; server contains a copy of the GIRT database, consisting of more than 76000 records in SGML format (bibliographic records from social science), which are mapped to MARC for presentation. @@ -494,7 +527,7 @@
Various web indexes - Zebra has been used by a variety of institutions to construct + &zebra; has been used by a variety of institutions to construct indexes of large web sites, typically in the region of tens of millions of pages. In this role, it functions somewhat similarly to the engine of google or altavista, but for a selected intranet @@ -504,7 +537,7 @@ For example, Liverpool University's web-search facility (see on the home page at - and many sub-pages) works by relevance-searching a Zebra database + and many sub-pages) works by relevance-searching a &zebra; database which is populated by the Harvest-NG web-crawling software. @@ -514,31 +547,31 @@ Kang-Jin Lee - has recently modified the Harvest web indexer to use Zebra as + has recently modified the Harvest web indexer to use &zebra; as its native repository engine. His comments on the switch over from the old engine are revealing:
- The first results after some testing with Zebra are very + The first results after some testing with &zebra; are very promising. The tests were done with around 220,000 SOIF files, which occupies 1.6GB of disk space. - Building the index from scratch takes around one hour with Zebra + Building the index from scratch takes around one hour with &zebra; where [old-engine] needs around five hours. While [old-engine] - blocks search requests when updating its index, Zebra can still + blocks search requests when updating its index, &zebra; can still answer search requests. [...] - Zebra supports incremental indexing which will speed up indexing + &zebra; supports incremental indexing which will speed up indexing even further. While the search time of [old-engine] varies from some seconds - to some minutes depending how expensive the query is, Zebra + to some minutes depending how expensive the query is, &zebra; usually takes around one to three seconds, even for expensive queries. [...] - Zebra can search more than 100 times faster than [old-engine] + &zebra; can search more than 100 times faster than [old-engine] and can process multiple search requests simultaneously @@ -553,20 +586,20 @@
Support - You can get support for Zebra from at least three sources. + You can get support for &zebra; from at least three sources. - First, there's the Zebra web site at + First, there's the &zebra; web site at , which always has the most recent version available for download. - If you have a problem with Zebra, the first thing to do is see + If you have a problem with &zebra;, the first thing to do is see whether it's fixed in the current release. - Second, there's the Zebra mailing list. Its home page at + Second, there's the &zebra; mailing list. Its home page at includes a complete archive of all messages that have ever been - posted on the list. The Zebra mailing list is used both for + posted on the list. The &zebra; mailing list is used both for announcements from the authors (new releases, bug fixes, etc.) and general discussion. You are welcome to seek support there. Join by filling the form on the list home page. @@ -595,7 +628,7 @@ Improved support for XML in search and retrieval. Eventually, - the goal is for Zebra to pull double duty as a flexible + the goal is for &zebra; to pull double duty as a flexible information retrieval engine and high-performance XML repository. The recent addition of XPath searching is one example of the kind of enhancement we're working on. @@ -607,13 +640,13 @@ on this filter has been sponsored by the ALVIS EU project . We expect this filter to mature soon, as it is planned to be included in the version 2.0 - release of Zebra. + release of &zebra;. - Finalisation and documentation of Zebra's C programming + Finalisation and documentation of &zebra;'s C programming API, allowing updates, database management and other functions not readily expressed in Z39.50. We will also consider exposing the API through SOAP. 
diff --git a/doc/license.xml b/doc/license.xml index ac23303..d20138d 100644 --- a/doc/license.xml +++ b/doc/license.xml @@ -1,21 +1,21 @@ - + License - Zebra Server, + &zebra; Server, Copyright © 1995-2007 Index Data ApS. - Zebra is free software; you can redistribute it and/or modify it under + &zebra; is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. - Zebra is distributed in the hope that it will be useful, but WITHOUT ANY + &zebra; is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. @@ -23,7 +23,7 @@ You should have received a copy of the GNU General Public License - along with Zebra; see the file LICENSE.zebra. If not, write to the + along with &zebra;; see the file LICENSE.zebra. If not, write to the Free Software Foundation, 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA diff --git a/doc/marc_indexing.xml b/doc/marc_indexing.xml index a949a4b..813e246 100644 --- a/doc/marc_indexing.xml +++ b/doc/marc_indexing.xml @@ -2,13 +2,13 @@ - + - Indexing of MARC records by Zebra + Indexing of MARC records by &zebra; - Zebra is suitable for distribution of MARC records via Z39.50. We + &zebra; is suitable for distribution of MARC records via Z39.50. We have a several possibilities to describe the indexing process of MARC records. This document shows these possibilities. @@ -59,7 +59,7 @@ records. At the beginning, we have to define the term index-formula for MARC records. This term helps to understand the notation of extended indexing of MARC records -by Zebra. Our definition is based on the document "The +by &zebra;. Our definition is based on the document "The table of conformity for Z39.50 use attributes and RUSMARC fields". 
The document is available only in russian language. @@ -69,7 +69,7 @@ The document is available only in russian language. 71-00$a, $g, $h ($c){.$b ($c)} , (1) -We know that Zebra supports a Bib-1 attribute - right truncation. +We know that &zebra; supports a Bib-1 attribute - right truncation. In this case, the index-formula (1) consists from forms, defined in the same way as (1) @@ -138,13 +138,13 @@ forms, defined in the same way as (1) -Notation of <emphasis>index-formula</emphasis> for Zebra +Notation of <emphasis>index-formula</emphasis> for &zebra; Extended indexing overloads path of -elm definition in abstract syntax file of Zebra +elm definition in abstract syntax file of &zebra; (.abs file). It means that names beginning with -"mc-" are interpreted by Zebra as +"mc-" are interpreted by &zebra; as index-formula. The database index is created and linked with access point (Bib-1 use attribute) according to this formula. @@ -284,7 +284,7 @@ elm mc-100___$a[0-7]_ Date/time-added-to-db ! elm 70._1_$a,_$g_ Author !:w,!:p -When Zebra finds a field according to "70." pattern it checks +When &zebra; finds a field according to "70." pattern it checks the indicators. In this case the value of first indicator doesn't mater, but the value of second one must be whitespace, in another case a field is not indexed. diff --git a/doc/querymodel.xml b/doc/querymodel.xml index cdb344d..afdb407 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,5 +1,5 @@ - + Query Model
@@ -9,7 +9,7 @@ Query Languages - Zebra is born as a networking Information Retrieval engine adhering + &zebra; is born as a networking Information Retrieval engine adhering to the international standards Z39.50 and SRU, @@ -42,7 +42,7 @@ Prefix Query Notation, or in short PQN. See for further explanations and - descriptions of Zebra's capabilities. + descriptions of &zebra;'s capabilities.
@@ -56,7 +56,7 @@ CQL is not natively supported.
- Zebra can be configured to understand and map CQL to PQF. See + &zebra; can be configured to understand and map CQL to PQF. See .
@@ -66,7 +66,7 @@
Operation types - Zebra supports all of the three different + &zebra; supports all of the three different Z39.50/SRU operations defined in the standards: explain, search, and scan. A short description of the @@ -150,7 +150,7 @@ The PQF grammar is documented in the YAZ manual, and shall not be repeated here. This textual PQF representation - is not transmistted to Zebra during search, but it is in the + is not transmitted to &zebra; during search, but it is in the client mapped to the equivalent Z39.50 binary query parse tree. @@ -173,13 +173,13 @@ Attribute sets Attribute sets define the exact meaning and semantics of queries - issued. Zebra comes with some predefined attribute set + issued. &zebra; comes with some predefined attribute set definitions, others can easily be defined and added to the configuration.
- Attribute sets predefined in Zebra + Attribute sets predefined in &zebra; @@ -206,7 +206,7 @@ Standard PQF query language attribute set which defines the semantics of Z39.50 searching. In addition, all of the non-use attributes (types 2-12) define the hard-wired - Zebra internal query + &zebra; internal query processing. default @@ -237,7 +237,7 @@ - The Zebra internal query processing is modeled after + The &zebra; internal query processing is modeled after the Bib-1 attribute set, and the non-use attributes type 2-6 are hard-wired in. It is therefore essential to be familiar with . @@ -349,7 +349,7 @@ Atomic (APT) queries are always leaf nodes in the PQF query tree. UN-supplied non-use attributes types 2-12 are either inherited from - higher nodes in the query tree, or are set to Zebra's default values. + higher nodes in the query tree, or are set to &zebra;'s default values. See for details. @@ -369,7 +369,7 @@ List of orthogonal attributes Any of the orthogonal attribute types may be omitted, these are inherited from higher query tree nodes, or if not - inherited, are set to the default Zebra configuration values. + inherited, are set to the default &zebra; configuration values. @@ -427,7 +427,7 @@
Named Result Sets - Named result sets are supported in Zebra, and result sets can be + Named result sets are supported in &zebra;, and result sets can be used as operands without limitations. It follows that named result sets are leaf nodes in the PQF query tree, exactly as atomic APT queries are. @@ -462,28 +462,28 @@ Named result sets are only supported by the Z39.50 protocol. The SRU web service is stateless, and therefore the notion of - named result sets does not exist when accessing a Zebra server by + named result sets does not exist when accessing a &zebra; server by the SRU protocol.
- Zebra's special access point of type 'string' + &zebra;'s special access point of type 'string' The numeric use (type 1) attribute is usually referred to from a given - attribute set. In addition, Zebra let you use + attribute set. In addition, &zebra; lets you use any internal index name defined in your configuration as use attribute value. This is a great feature for debugging, and when you do not need the complexity of defined use attribute values. It is - the preferred way of accessing Zebra indexes directly. + the preferred way of accessing &zebra; indexes directly. Finding all documents which have the term list "information - retrieval" in an Zebra index, using it's internal full string + retrieval" in a &zebra; index, using its internal full string name. Scanning the same index. Z> find @attr 1=sometext "information retrieval" @@ -518,7 +518,7 @@
- Zebra's special access point of type 'XPath' + <title>&zebra;'s special access point of type 'XPath' for GRS filters As we have seen above, it is possible (albeit seldom a great @@ -634,7 +634,7 @@ Explain attribute set Exp-1, which is used to discover information about a server's search semantics and functional capabilities - Zebra exposes a "classic" + &zebra; exposes a "classic" Explain database by base name IR-Explain-1, which is populated with system internal information. @@ -679,7 +679,7 @@ Classic Explain only defines retrieval of Explain information via ASN.1. Practically no Z39.50 clients supports this. Fortunately - they don't have to - Zebra allows retrieval of this information + they don't have to - &zebra; allows retrieval of this information in other formats: SUTRS, XML, GRS-1 and ASN.1 Explain. @@ -741,7 +741,7 @@ Get attribute details record for database Default. - This query is very useful to study the internal Zebra indexes. + This query is very useful to study the internal &zebra; indexes. If records have been indexed using the alvis XSLT filter, the string representation names of the known indexes can be found. @@ -770,7 +770,7 @@ Attribute Set version from 2003. Index Data is not the copyright holder of this information, except for the configuration details, the listing of - Zebra's capabilities, and the example queries. + &zebra;'s capabilities, and the example queries. @@ -814,7 +814,7 @@ be sourced in the main configuration zebra.cfg. - In addition, Zebra allows the access of + In addition, &zebra; allows the access of internal index names and dynamic XPath as use attributes; see and @@ -835,7 +835,7 @@
- Zebra general Bib1 Non-Use Attributes (type 2-6) + &zebra; general Bib1 Non-Use Attributes (type 2-6)
Relation Attributes (type 2) @@ -1029,12 +1029,12 @@ - Zebra only supports first-in-field seaches if the + &zebra; only supports first-in-field searches if the firstinfield is enabled for the index Refer to . - Zebra does not distinguish between first in field and + &zebra; does not distinguish between first in field and first in subfield. They result in the same hit count. - Searching for first position in (sub)field in only supported in Zebra + Searching for first position in (sub)field is only supported in &zebra; 2.0.2 and later. @@ -1046,7 +1046,7 @@ The structure attribute specifies the type of search term. This causes the search to be mapped on - different Zebra internal indexes, which must have been defined + different &zebra; internal indexes, which must have been defined at index time. @@ -1189,7 +1189,7 @@ The structure attribute value Local number (107) - is supported, and maps always to the Zebra internal document ID, + is supported, and maps always to the &zebra; internal document ID, irrespectively which use attribute is specified. The following queries have exactly the same unique record in the hit set: @@ -1213,7 +1213,7 @@ - The exact mapping between PQF queries and Zebra internal indexes + The exact mapping between PQF queries and &zebra; internal indexes and index types is explained in . @@ -1330,7 +1330,7 @@ The truncation attribute value - Regexp-2 (103) is a Zebra specific extension + Regexp-2 (103) is a &zebra; specific extension which allows fuzzy matches. One single error in spelling of search terms is allowed, i.e., a document is hit if it includes a term which can be mapped to the used @@ -1401,7 +1401,7 @@ Incomplete subfield (1) is the default, and - makes Zebra use + makes &zebra; use register type="w", whereas Complete field (3) triggers search and scan in index type="p". @@ -1409,13 +1409,13 @@ The Complete subfield (2) is a reminiscens from the happy MARC - binary format days.
Zebra does not support it, but maps silently + binary format days. &zebra; does not support it, but maps silently to Complete field (3). - The exact mapping between PQF queries and Zebra internal indexes + The exact mapping between PQF queries and &zebra; internal indexes and index types is explained in . @@ -1427,9 +1427,9 @@
- Extended Zebra RPN Features + Extended &zebra; RPN Features - The Zebra internal query engine has been extended to specific needs + The &zebra; internal query engine has been extended to specific needs not covered by the bib-1 attribute set query model. These extensions are non-standard and non-portable: most functional extensions @@ -1441,9 +1441,9 @@
- Zebra specific retrieval of all records + &zebra; specific retrieval of all records - Zebra defines a hardwired string index name + &zebra; defines a hardwired string index name called _ALLRECORDS. It matches any record contained in the database, if used in conjunction with the relation attribute @@ -1470,28 +1470,28 @@ The special string index _ALLRECORDS is experimental, and the provided functionality and syntax may very - well change in future releases of Zebra. + well change in future releases of &zebra;.
- Zebra Search Attribute Extensions + &zebra; Search Attribute Extensions Name Value Operation - Zebra version + &zebra; version @@ -1542,7 +1542,7 @@
- Zebra Extension Embedded Sort Attribute (type 7) + &zebra; Extension Embedded Sort Attribute (type 7) The embedded sort is a way to specify sort within a query - thus removing the need to send a Sort Request separately. It is both @@ -1584,7 +1584,7 @@
@@ -2438,14 +2438,14 @@
diff --git a/doc/quickstart.xml b/doc/quickstart.xml index c3beee4..abb816a 100644 --- a/doc/quickstart.xml +++ b/doc/quickstart.xml @@ -1,12 +1,12 @@ - + Quick Start In this section, we will test the system by indexing a small set of - sample GILS records that are included with the Zebra distribution, - running a Zebra server against the newly created database, and + sample GILS records that are included with the &zebra; distribution, + running a &zebra; server against the newly created database, and searching the indexes with a client that connects to that server. @@ -35,7 +35,7 @@ - The Zebra index that you have just created has a single database + The &zebra; index that you have just created has a single database named Default. The database contains records structured according to the GILS profile, and the server will diff --git a/doc/recordmodel-alvisxslt.xml b/doc/recordmodel-alvisxslt.xml index e64bb84..328bbce 100644 --- a/doc/recordmodel-alvisxslt.xml +++ b/doc/recordmodel-alvisxslt.xml @@ -1,15 +1,15 @@ - - ALVIS XML Record Model and Filter Module + + ALVIS &xml; Record Model and Filter Module The record model described in this chapter applies to the fundamental, - structured XML + structured &xml; record type alvis, introduced in - . The ALVIS XML record model + . The ALVIS &xml; record model is experimental, and it's inner workings might change in future - releases of the Zebra Information Server. + releases of the &zebra; Information Server. This filter has been developed under the @@ -22,7 +22,7 @@
ALVIS Record Filter - The experimental, loadable Alvis XML/XSLT filter module + The experimental, loadable Alvis &xml;/XSLT filter module mod-alvis.so is packaged in the GNU/Debian package libidzebra1.4-mod-alvis. It is invoked by the zebra.cfg configuration statement @@ -35,7 +35,7 @@ path db/filter_alvis_conf.xml. The Alvis XSLT filter configuration file must be - valid XML. It might look like this (This example is + valid &xml;. It might look like this (This example is used for indexing and display of OAI harvested records): <?xml version="1.0" encoding="UTF-8"?> @@ -66,7 +66,7 @@ The <split level="2"/> decides where the - XML Reader shall split the + &xml; Reader shall split the collections of records into individual records, which then are loaded into DOM, and have the indexing XSLT stylesheet applied. @@ -78,13 +78,13 @@
ALVIS Internal Record Representation - When indexing, an XML Reader is invoked to split the input - files into suitable record XML pieces. Each record piece is then - transformed to an XML DOM structure, which is essentially the + When indexing, an &xml; Reader is invoked to split the input + files into suitable record &xml; pieces. Each record piece is then + transformed to an &xml; DOM structure, which is essentially the record model. Only XSLT transformations can be applied during index, search and retrieval. Consequently, output formats are - restricted to whatever XSLT can deliver from the record XML - structure, be it other XML formats, HTML, or plain text. In case + restricted to whatever XSLT can deliver from the record &xml; + structure, be it other &xml; formats, HTML, or plain text. In case you have libxslt1 running with EXSLT support, you can use this functionality inside the Alvis filter configuration XSLT stylesheets. @@ -127,13 +127,13 @@ </z:record> - This means the following: From the original XML file - one-record.xml (or from the XML record DOM of the + This means the following: From the original &xml; file + one-record.xml (or from the &xml; record DOM of the same form coming from a splitted input file), the indexing - stylesheet produces an indexing XML record, which is defined by + stylesheet produces an indexing &xml; record, which is defined by the record element in the magic namespace xmlns:z="http://indexdata.dk/zebra/xslt/1". - Zebra uses the content of + &zebra; uses the content of z:id="oai:JTRS:CP-3290---Volume-I" as internal record ID, and - in case static ranking is set - the content of z:rank="47896" as static rank. 
Following the @@ -236,7 +236,7 @@ As mentioned above, there can be only one indexing stylesheet, and configuration of the indexing process is a synonym - of writing an XSLT stylesheet which produces XML output containing the + of writing an XSLT stylesheet which produces &xml; output containing the magic elements discussed in . Obviously, there are million of different ways to accomplish this @@ -246,19 +246,19 @@ Stylesheets can be written in the pull or the push style: pull - means that the output XML structure is taken as starting point of + means that the output &xml; structure is taken as starting point of the internal structure of the XSLT stylesheet, and portions of - the input XML are pulled out and inserted - into the right spots of the output XML structure. On the other + the input &xml; are pulled out and inserted + into the right spots of the output &xml; structure. On the other side, push XSLT stylesheets are recursavly calling their template definitions, a process which is commanded - by the input XML structure, and avake to produce some output XML + by the input &xml; structure, and avake to produce some output &xml; whenever some special conditions in the input styelsheets are met. The pull type is well-suited for input - XML with strong and well-defined structure and semantcs, like the + &xml; with strong and well-defined structure and semantcs, like the following OAI indexing example, whereas the push type might be the only possible way to - sort out deeply recursive input XML formats. + sort out deeply recursive input &xml; formats. A pull stylesheet example used to index @@ -313,16 +313,16 @@ Notice also, that the names and types of the indexes can be defined in the indexing XSLT stylesheet dynamically according to - content in the original XML records, which has + content in the original &xml; records, which has opportunities for great power and wizardery as well as grande disaster. 
The following excerpt of a push stylesheet might - be a good idea according to your strict control of the XML + be a good idea according to your strict control of the &xml; input format (due to rigerours checking against well-defined and - tight RelaxNG or XML Schema's, for example): + tight RelaxNG or &xml; Schema's, for example): @@ -333,11 +333,11 @@ ]]> This template creates indexes which have the name of the working - node of any input XML file, and assigns a '1' to the index. + node of any input &xml; file, and assigns a '1' to the index. The example query find @attr 1=xyz 1 finds all files which contain at least one - xyz XML element. In case you can not control + xyz &xml; element. In case you can not control which element names the input files contain, you might ask for disaster and bad karma using this technique. @@ -378,15 +378,15 @@ XSLT transformation, as far as the stylesheet is registered in the main Alvis XSLT filter configuration file, see . - In principle anything that can be expressed in XML, HTML, and + In principle anything that can be expressed in &xml;, HTML, and TEXT can be the output of a schema or element set directive during search, as long as the information comes from the - original input record XML DOM tree - (and not the transformed and indexed XML!!). + original input record &xml; DOM tree + (and not the transformed and indexed &xml;!!). - In addition, internal administrative information from the Zebra + In addition, internal administrative information from the &zebra; indexer can be accessed during record retrieval. The following example is a summary of the possibilities: @@ -492,7 +492,7 @@ c) Main "alvis" XSLT filter config file: see: http://www.indexdata.com/yaz/doc/tools.tkl#tools.cql.map - in db/ an indexing XSLT stylesheet. 
This is a PULL-type XSLT thing, - as it constructs the new XML structure by pulling data out of the + as it constructs the new &xml; structure by pulling data out of the respective elements/attributes of the old structure. Notice the special zebra namespace, and the special elements in this @@ -502,7 +502,7 @@ c) Main "alvis" XSLT filter config file: indicates that a new record with given id and static rank has to be updated. - encloses all the text/XML which shall be indexed in the index named + encloses all the text/&xml; which shall be indexed in the index named "title" and of index type "w" (see file default.idx in your zebra installation) diff --git a/doc/recordmodel-grs.xml b/doc/recordmodel-grs.xml index 68744b0..c370ded 100644 --- a/doc/recordmodel-grs.xml +++ b/doc/recordmodel-grs.xml @@ -1,5 +1,5 @@ - + GRS Record Model and Filter Modules @@ -33,7 +33,7 @@ grs.marc.type - This allows Zebra to read + This allows &zebra; to read records in the ISO2709 (MARC) encoding standard. Last parameter type names the .abs file (see below) @@ -58,7 +58,7 @@ grs.marcxml.type - This allows Zebra to read ISO2709 encoded records. + This allows &zebra; to read ISO2709 encoded records. Last parameter type names the .abs file (see below) which describes the specific MARC structure of the input record as @@ -83,12 +83,12 @@ This filter reads XML records and uses Expat to - parse them and convert them into IDZebra's internal + parse them and convert them into ID&zebra;'s internal grs record model. Only one record per file is supported, due to the fact XML does not allow two documents to "follow" each other (there is no way to know when a document is finished). - This filter is only available if Zebra is compiled with EXPAT support. + This filter is only available if &zebra; is compiled with EXPAT support. 
The loadable grs.xml filter module @@ -136,7 +136,7 @@ Although input data can take any form, it is sometimes useful to describe the record processing capabilities of the system in terms of a single, canonical input format that gives access to the full - spectrum of structure and flexibility in the system. In Zebra, this + spectrum of structure and flexibility in the system. In &zebra;, this canonical format is an "SGML-like" syntax. @@ -175,7 +175,7 @@ + - Zebra - User's Guide and Reference + &zebra; - User's Guide and Reference - - AdamDickmeiss - - - HeikkiLevanto - - - MarcCromme - - - MikeTaylor - - - SebastianHammer - + &adam; + &heikki; + &marccromme; + &mike; + &sebastian; &version; @@ -39,18 +29,18 @@ - Zebra is a free, fast, friendly information management system. It - can index records in XML/SGML, MARC, e-mail archives and many + &zebra; is a free, fast, friendly information management system. It + can index records in &xml;, &sgml;, &marc;, e-mail archives and many other formats, and quickly find them using a combination of boolean searching and relevance ranking. Search-and-retrieve applications can be written using APIs in a wide variety of - languages, communicating with the Zebra server using + languages, communicating with the &zebra; server using industry-standard information-retrieval protocols or web services. - This manual explains how to build and install Zebra, configure it + This manual explains how to build and install &zebra;, configure it appropriately for your application, add data and set up a running - information service. It describes version &version; of Zebra. + information service. It describes version &version; of &zebra;. 
diff --git a/doc/zebraidx.xml b/doc/zebraidx.xml index 609ad35..8b4f8b0 100644 --- a/doc/zebraidx.xml +++ b/doc/zebraidx.xml @@ -8,7 +8,7 @@ %common; ]> - + zebra @@ -22,7 +22,7 @@ zebraidx - Zebra Administrative Tool + &zebra; Administrative Tool @@ -48,7 +48,7 @@ DESCRIPTION zebraidx allows you to insert, delete or updates - records in Zebra. zebraidx accepts a set options + records in &zebra;. zebraidx accepts a set options (see below) and exactly one command (mandatory). @@ -64,7 +64,7 @@ directory. If no directory is provided, a list of files is read from stdin. - See Administration in the Zebra + See Administration in the &zebra; Manual. @@ -86,7 +86,7 @@ commands to the register. This command is only available if the use of shadow register files is enabled (see Shadow Registers in the - Zebra Manual). + &zebra; Manual). @@ -129,7 +129,7 @@ and grs.subtype. Generally, it is probably advisable to specify the record types in the zebra.cfg file (see - Record Types in the Zebra manual), + Record Types in the &zebra; manual), to avoid confusion at subsequent updates. @@ -150,8 +150,8 @@ Update the files according to the group settings for group - (see Zebra Configuration File in - the Zebra manual). + (see &zebra; Configuration File in + the &zebra; manual). @@ -200,7 +200,7 @@ Disable the use of shadow registers for this operation (see Shadow Registers in - the Zebra manual). + the &zebra; manual). @@ -219,7 +219,7 @@ -V - Show Zebra version. + Show &zebra; version. -- 1.7.10.4