From 528edc9943ba3311a40b4ab875c0bc9aca5caa87 Mon Sep 17 00:00:00 2001 From: Mike Taylor Date: Tue, 8 Oct 2002 08:09:43 +0000 Subject: [PATCH] Rolling commit --- doc/examples.xml | 148 +++++++++++++++++++++++++------------------------- doc/installation.xml | 9 +-- doc/introduction.xml | 91 ++++++++++++++++++++++--------- doc/quickstart.xml | 41 +++++++------- doc/zebra.xml.in | 4 +- 5 files changed, 168 insertions(+), 125 deletions(-) diff --git a/doc/examples.xml b/doc/examples.xml index 153eaed..53f839d 100644 --- a/doc/examples.xml +++ b/doc/examples.xml @@ -1,5 +1,5 @@ - + Example Configurations @@ -19,106 +19,119 @@ - Where to find the default indexing rules (### default.idx) + Where to find subsidiary configuration files, including + default.idx + which specifies the default indexing rules. - ### Something to do with explain.abs?! + What attribute sets to recognise in searches. - ### Where to find other configuration files, e.g. searches using - BIB-1 attributes require a bib1.att configuration file (even if - the access point is actually an XPath expression). These are - searched for in the working directory unless otherwise - specified. + Policy details such as what record type to expect, what + low-level indexing algorithm to use, how to identify potential + duplicate records, etc. + + Now let's see what goes in the zebra.cfg file + for some example configurations. + - Example 1: Minimal Configuration + Example 1: XML Indexing And Searching This example shows how Zebra can be used with absolutely minimal - configuration to index a body of XML documents, and search them - using XPath expressions to specify access points. + configuration to index a body of + XML + documents, and search them using + XPath + expressions to specify access points. - Go to the zebra/examples/dinosauricon directory. + Go to the examples/dinosauricon subdirectory + of the distribution archive. There you will find a records subdirectory, which contains some raw XML data to be added to the database: in - this case, two files, genera.xml and - taxa.xml, which contain information about all - the known dinosaur genera as of August 2002. + this case, as single file, genera.xml, + which contain information about all the known dinosaur genera as of + August 2002. Now we need to create the Zebra database, which we do with the - Zebra indexer, zebraidx. This program's - behaviour is driven by a configuration life, generally called - zebra.cfg, although this can be changed with the - -c option. For our purposes, we don't need any - special behaviour - we can use the defaults - so an empty - configuration will do just fine. We can either create an empty - zebra.cfg or specify the name of an existing - empty file using, for example, -c /dev/null. - - - In this case, we'll use an empty zebra.cfg so - we can add more configuration to it later. + Zebra indexer, zebraidx, which is + driven by the zebra.cfg configuration file. + For our purposes, we don't need any + special behaviour - we can use the defaults - so we start with a + minimal file that just tells zebraidx where to + find the default indexing rules, and how to parse the records: + + profilePath: .:../../tab:../../../yaz/tab + recordType: grs.sgml + That's all you need for a minimal Zebra configuration. Now you can roll the XML records into the database and build the indexes: - zebraidx -t grs.sgml update records + zebraidx update records - (### What does "grs.sgml" actually mean?) Now start the server. Like the indexer, its behaviour is - controlled by a configuration file, generally - zebra.cfg; and like the indexer, it works just - fine with an empty configuration. + controlled by the + zebra.cfg file; and like the indexer, it works + just fine with this minimal configuration. zebrasrv By default, the server listens on IP port number 9999, although - this can easily be changed. + this can easily be changed - see + . Now you can use the Z39.50 client program of your choice to execute XPath-based boolean queries and fetch the XML records that satisfy them: - Z> open tcp:@:9999 - Connecting...Ok. - Z> find @attr 1=/GENUS/MEANING @or vertebra jaw - Number of hits: 1 - Z> format xml - Z> show 1 - Z> show 1 - <GENUS name="Hudiesaurus" type="with" xmlns:idzebra="http://www.indexdata.dk/zebra/"> - <MEANING> - butterfly <LOW>vertebra</LOW> lizard - </MEANING> - <LENGTH value="30"></LENGTH> - <PLACE name="China"></PLACE> - <REMAINS content="4 teeth, forelimb, first dorsal vertebra"></REMAINS> - <SPECIES name="sinojapanorum" status="nudum"> - <AUTHOR name="Dong" year="1997"></AUTHOR> - <MEANING> - Chinese-Japanese - </MEANING> - </SPECIES> - <idzebra:size>359</idzebra:size><idzebra:localnumber>447</idzebra:localnumber><idzebra:filename>records/genera.xml</idzebra:filename></GENUS> + $ yaz-client tcp:@:9999 + Connecting...Ok. + Z> find @attr 1=/GENUS/MEANING @and lizard earthquakes + Number of hits: 1 + Z> format xml + Z> show 1 + <GENUS name="Sauroposeidon" type="with"> + <MEANING>lizard Poseidon <LOW>(Greek god of, among other things, earthquakes)</LOW></MEANING> + <SPECIES name="proteles"> + <AUTHOR type="vide" name="Franklin" year="2000"></AUTHOR> + <AUTHOR name="Wedel, Cifelli, Sanders"></AUTHOR> + </SPECIES> + <PLACE name="Oklahoma"></PLACE> + <TIME value="Albian"></TIME> + <LENGTH value="30" q="1"></LENGTH> + <REMAINS content="rib, cervical vertebrae"></REMAINS> + <ESSAY> + <P> This new <NOMEN name="Brachiosaurus"></NOMEN>-like <LINK content="dinosaur"></LINK> + was perhaps the tallest. With its head raised, it stood 60 feet (nearly + 20 m) tall. </P> + </ESSAY> + + <idzebra xmlns="http://www.indexdata.dk/zebra/"> + <size>593</size> + <localnumber>891</localnumber> + <filename>records/genera.xml</filename> + </idzebra> + </GENUS> @@ -127,28 +140,15 @@ - Example 2: Adding Some Configuration + Example 2: Supporting Z39.50 Searches You may have noticed as zebraidx was building - the database that it issued several warnings, which we ignored at - the time: - -zebraidx -t grs.sgml update records -02:12:32-30/08: zebraidx(18151) [warn] default.idx [No such file or directory] -02:12:32-30/08: zebraidx(18151) [warn] Couldn't open explain.abs [No such file or directory] -02:12:32-30/08: zebraidx(18151) [warn] records/genera.xml:0 Couldn't open GENUS.abs [No such file or directory] -02:12:32-30/08: zebraidx(18151) [warn] records/genera.xml:0 Unknown register type: 0 -02:12:32-30/08: zebraidx(18151) [warn] records/genera.xml:0 Unknown register type: w -02:12:35-30/08: zebraidx(18151) [warn] records/taxa.xml:0 Couldn't open TAXON.abs [No such file or directory] - - And the server issued several more as the client connected to it, - then searched for and retrieved a record: + the database that it issued a warning, which we ignored at the + time: -02:17:10-30/08: zebrasrv(18165) [warn] default.idx [No such file or directory] -02:17:10-30/08: zebrasrv(18165) [warn] Couldn't open explain.abs [No such file or directory] -02:17:57-30/08: zebrasrv(18165) [warn] Unknown register type: w -02:18:42-30/08: zebrasrv(18165) [warn] Couldn't open GENUS.abs [No such file or directory] + $ zebraidx update records + 00:45:46-08/10: ../../index/zebraidx(5016) [warn] records/genera.xml:0 Couldn't open GENUS.abs [No such file or directory] @@ -161,7 +161,7 @@ zebraidx -t grs.sgml update records The master configuration file, zebra.cfg, which is as short and simple as it can be: - # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.6 2002-09-20 09:58:04 mike Exp $ + # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.7 2002-10-08 08:09:43 mike Exp $ # Bare-bones master configuration file for Zebra profilePath: .:../../tab:../../../yaz/tab @@ -178,7 +178,7 @@ zebraidx -t grs.sgml update records The BIB-1 attribute set configuration file, bib1.att, which is also as short as possible: - # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.6 2002-09-20 09:58:04 mike Exp $ + # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.7 2002-10-08 08:09:43 mike Exp $ # Bare-bones BIB-1 attribute set file for Zebra reference Bib-1 diff --git a/doc/installation.xml b/doc/installation.xml index e456616..05b9ab5 100644 --- a/doc/installation.xml +++ b/doc/installation.xml @@ -1,5 +1,5 @@ - + Installation An ANSI C compiler is required to compile the Zebra @@ -44,7 +44,7 @@ - When configured build the software by typing: + When configured, build the software by typing: make @@ -53,7 +53,7 @@ - If successful, two executables have been created in the sub-directory + If successful, two executables are created in the sub-directory index. @@ -77,7 +77,8 @@ - You can now use Zebra. If you wish to install it system-wide, type + You can now use Zebra. If you wish to install it system-wide, then + as root type make install diff --git a/doc/introduction.xml b/doc/introduction.xml index 8303231..e305382 100644 --- a/doc/introduction.xml +++ b/doc/introduction.xml @@ -1,5 +1,5 @@ - + Introduction @@ -35,17 +35,6 @@ and how to configure the server to give you the functionality that you need. - - - If you use Zebra, you should visit its - web site, - where you can join the - - mailing-list - by sending email to - ### zebra-subscribe@mailman.indexdata.dk - - @@ -69,7 +58,7 @@ Arbitrarily complex records. The internal data format is an structured format conceptually similar to XML or GRS-1, - which allows nested structured data elements and + which allows lists, nested structured data elements and variant forms of data. @@ -90,8 +79,9 @@ Configurable to understand many input formats. A system of input filters driven by - regular expressions allows you to easily process most ASCII-based - data formats. SGML, XML, ISO2709 (MARC), and raw text are also + regular expressions allows most ASCII-based + data formats to be easily processed. + SGML, XML, ISO2709 (MARC), and raw text are also supported. @@ -101,7 +91,7 @@ Searching supports a powerful combination of boolean queries as well as relevance-ranking (free-text) queries. Truncation, masking, full regular expression matching and "approximate - matching" (eg. spelling mistakes) are all supported. + matching" (eg. spelling mistakes) are all handled. @@ -118,7 +108,8 @@ Zebra is written in portable C, so it runs on most Unix-like systems as well as Windows NT. A binary distribution for Windows NT is - available. + available at + @@ -134,14 +125,18 @@ - Protocol facilities: Init, Search, Present (retrieval), Delete, - Scan (index browsing) and Sort. + Protocol facilities: Init, Search, Present (retrieval), + Segmentation (support for very large records), Delete, Scan + (index browsing), Sort, Close and some Extended Services. - Piggy-backed presents are honored in the search-request. + Piggy-backed presents are honored in the search request - that + is, a subset of the found records can be returned directly with + a search response, enabling search and retrieval to happen in a + single round-trip. @@ -238,7 +233,7 @@ (GIRT is the German Indexing and Retrieval Testdatabase. It is a standard German-language test database for intelligent indexing and retrieval systems. See - + ) Evaluation will take place as part of the TREC/CLEF campaign 2003 @@ -253,8 +248,9 @@ ULS (Union List of Serials) - The London School of Economics (### I think) - are involved in a projects called ULS to provide a union catalogue + The M25-Link systems team + () + are involved in a project called ULS to provide a union catalogue for periodicals in 21 member libraries. They do this with an unusual architecture which they call a ``non-distributed virtual union catalogue''. @@ -265,7 +261,9 @@ holdings. Then 21 individual Z39.50 targets are created, each using Zebra, and all mounted on the single hardware server. The live service provides a web gateway allowing Z39.50 searching - of all 21 targets or a selection of them. + of all of the targets or a selection of them. Zebra's small + footprint allows a relatively modest system to comfortably host + the 21 servers. More information can be found at @@ -280,7 +278,7 @@ indexes of large web sites, typically in the region of tens of millions of pages. In this role, it functions somewhat similarly to the engine of google or altavista, but for a selected intranet - or subset of the whole Web. + or a subset of the whole Web. For example, Liverpool University's web-search facility (see on @@ -296,6 +294,39 @@ + + + Support + + You can get support for Zebra from at least three sources. + + + First, there's the Zebra web site at + , + which always has the most recent version available for download. + If you have a problem with Zebra, the first thing to do is see + whether it's fixed in the current release. + + + Second, there's the Zebra mailing list. Its home page at + + includes a complete archive of all messages that have ever been + posted on the list. The Zebra mailing list is used both for + announcements from the authors (new + releases, bug fixes, etc.) and general discussion. You are welcome + to seek support there. Join by sending email to + zebra-subscribe-###@mailman.indexdata.dk + + + Third, it's possible to buy a commercial support contract, with + well defined service levels and response times, from Index Data. + See + + for details. + + + + Future Directions @@ -314,6 +345,9 @@ information retrieval engine and high-performance XML repository. + + ### Partially done. + @@ -321,6 +355,9 @@ Access to search engine through SOAP/RPC API to allow the construction of applications without requiring Z39.50 tools. + + ### Partially done, thanks to the new SRW/Z39.50 gateway. + @@ -352,7 +389,9 @@ If you think it's all really neat, you're welcome to drop us a line - saying that, too. You'll find contact info at the end of this file. + saying that, too. You can email us on + indo@indexdata.dk + or check the contact info at the end of this manual. diff --git a/doc/quickstart.xml b/doc/quickstart.xml index 8cac9bc..f962421 100644 --- a/doc/quickstart.xml +++ b/doc/quickstart.xml @@ -1,5 +1,5 @@ - + Quick Start @@ -15,17 +15,21 @@ file named zebra.cfg with the following contents: - # Where are the YAZ tables located. - profilePath: ../../../yaz/tab ../../tab - + # Where the schema files, attribute files, etc are located. + profilePath: .:../../tab:../../../yaz/tab:/usr/local/share/yaz/tab:/usr/share/yaz/tab + # Files that describe the attribute sets supported. attset: bib1.att attset: gils.att + attset: explain.att + + recordtype: grs.sgml + isam: c - Now, edit the file and set profilePath to the path of the + If necessary, edit the file and set profilePath to the path of the YAZ profile tables (sub directory tab of the YAZ distribution archive). @@ -35,14 +39,12 @@ records. To index these, type: - $ ../../index/zebraidx -t grs.sgml update records + zebraidx update records - In the command above the option -t specified the record - type — in this case grs.sgml. - The word update followed + In the command above, the word update followed by a directory root updates all files below that directory node. @@ -51,7 +53,7 @@ fire up a server. To start a server on port 2100, type: - $ ../../index/zebrasrv tcp:@:2100 + zebrasrv tcp:@:2100 @@ -66,14 +68,12 @@ - To test the server, you can use any Z39.50 client (1992 or later). - For instance, you can use the demo client that comes with YAZ: Just - cd to the client subdirectory of the YAZ distribution - and type: + To test the server, you can use any Z39.50 client. + For instance, you can use the demo client that comes with YAZ: - $ ./yaz-client tcp:localhost:2100 + yaz-client tcp:localhost:2100 @@ -81,8 +81,7 @@ When the client has connected, you can type: - - + Z> find surficial Z> show 1 @@ -114,8 +113,12 @@ - If you've made it this far, there's a good chance that - you've got through the compilation OK. + If you've made it this far, you know that your installation is + working, but there's a certain amount of voodoo going on - for + example, the mysterious incantations in the + zebra.cfg file. In order to help us understand + these fully, the next chapter will work through a series of + increasingly complex example configurations. diff --git a/doc/zebra.xml.in b/doc/zebra.xml.in index b9980d9..2ab791d 100644 --- a/doc/zebra.xml.in +++ b/doc/zebra.xml.in @@ -13,7 +13,7 @@ ]> - + Zebra - User's Guide and Reference @@ -38,7 +38,7 @@ Zebra is a free, fast, friendly information management system. It can index records in XML/SGML, MARC, e-mail archives and many other formats, and quickly find them using a combination of - boolean searching and relevance ranking. Search/retrieval + boolean searching and relevance ranking. Search-and-retrieve applications can be written using APIs in a wide variety of languages, communicating with the Zebra server using industry-standard information-retrieval protocols. -- 1.7.10.4