X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fintroduction.xml;h=e38516cab00a766f024aff87f8e7003f58a8acf8;hb=72e26b79a2b55e5aaec932bad7e645e83824c5c4;hp=3e4d19f54a263530394fbab2a796ee19c36fb2fa;hpb=b7fc2a00e8b425dafdee22ec0fd73599f84b1760;p=idzebra-moved-to-github.git

diff --git a/doc/introduction.xml b/doc/introduction.xml
index 3e4d19f..e38516c 100644
--- a/doc/introduction.xml
+++ b/doc/introduction.xml
@@ -1,15 +1,14 @@
-
+
 Introduction Overview
-
- Zebra
+ Zebra
 is a high-performance, general-purpose structured text
- indexing and retrieval engine. It reads structured records in a
+ indexing and retrieval engine. It reads records in a
 variety of input formats (eg. email, XML, MARC) and provides access to them through a powerful combination of boolean search expressions and relevance-ranked free-text queries.
@@ -49,7 +48,7 @@
- Very large databases: files for indexes, etc. can be
+ Very large databases: logical files can be
 automatically partitioned over multiple disks.
@@ -57,7 +56,7 @@
 Arbitrarily complex records. The internal data format
- is an structured format conceptually similar to XML or GRS-1,
+ is a structured format conceptually similar to XML or GRS-1,
 which allows lists, nested structured data elements and variant forms of data.
@@ -236,7 +235,7 @@
 NLI-Z39.50 - a Natural Language Interface for Libraries
- Fernuniversität Hagen in Germany have developed a natural
+ Fernuniversität Hagen in Germany have developed a natural
 language interface for access to library databases. In order to evaluate this interface for recall and precision, they
@@ -304,9 +303,45 @@
 which is populated by the Harvest-NG web-crawling software.
- For more information, contact John Gilbertson
+ For more information on Liverpool university's intranet search
+ architecture, contact John Gilbertson
 jgilbert@liverpool.ac.uk
+
+ Kang-Jin Lee
+ lee@arco.de,
+ has recently modified the Harvest web indexer to use Zebra as
+ its native repository engine. His comments on the switch over
+ from the old engine are revealing:
+
+
+ The first results after some testing with Zebra are very
+ promising. The tests were done with around 220,000 SOIF files,
+ which occupies 1.6GB of disk space.
+
+
+ Building the index from scratch takes around one hour with Zebra
+ where [old-engine] needs around five hours. While [old-engine]
+ blocks search requests when updating its index, Zebra can still
+ answer search requests.
+ [...]
+ Zebra supports incremental indexing which will speed up indexing
+ even further.
+
+
+ While the search time of [old-engine] varies from some seconds
+ to some minutes depending how expensive the query is, Zebra
+ usually takes around one to three seconds, even for expensive
+ queries.
+ [...]
+ Zebra can search more than 100 times faster than [old-engine]
+ and can process multiple search requests simultaneously
+
+
+ I am very happy to see such nice software available under GPL.
+
+
+
@@ -331,15 +366,14 @@
 announcements from the authors (new releases, bug fixes, etc.) and general discussion. You are welcome to seek support there. Join by sending email to
- zebra-request@indexdata.dk. Put the word
+ zebra-request@indexdata.dk with the word
 subscribe in the body of the message. Third, it's possible to buy a commercial support contract, with well defined service levels and response times, from Index Data. See
-
-
+
 for details.
@@ -361,20 +395,17 @@
 Improved support for XML in search and retrieval. Eventually, the goal is for Zebra to pull double duty as a flexible information retrieval engine and high-performance XML
- repository.
-
-
- ### Partially done.
+ repository. The recent addition of XPath searching is one
+ example of the kind of enhancement we're working on.
- Access to search engine through SOAP/RPC API to allow the
+ Access to the search engine through SOAP/RPC API to allow the
 construction of applications without requiring Z39.50 tools.
-
-
- ### Partially done, thanks to the new SRW/Z39.50 gateway.
+ This will shortly be available by means of Index Data's
+ SRW-to-Z39.50 gateway, currently in beta test.
@@ -389,6 +420,17 @@
+ Support for the use of Perl both for access to the Zebra API
+ and for building extension ``plug-ins'' such as input filters.
+ The code for this has been contributed to the source tree by
+ Peter Popovics
+ pop@indexdata.dk,
+ and is in the process of being integrated and tested.
+
+
+
+
+
 Improved free-text searching. We're first and foremost octet jockeys and we're actively looking for organisations or people who'd like to contribute experience in relevance ranking and text