+ <section id="features-protocol">
+ <title>&zebra; Networked Protocols</title>
+
+ <table id="table-features-protocol" frame="top">
+ <title>&zebra; networked protocols</title>
+ <tgroup cols="4">
+ <colspec colwidth="1*" colname="feature"/>
+ <colspec colwidth="1*" colname="availability"/>
+ <colspec colwidth="3*" colname="notes"/>
+ <colspec colwidth="2*" colname="references"/>
+ <thead>
+ <row>
+ <entry>Feature</entry>
+ <entry>Availability</entry>
+ <entry>Notes</entry>
+ <entry>Reference</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Fundamental operations</entry>
+ <entry>&acro.z3950;/&acro.sru; <literal>explain</literal>,
+ <literal>search</literal>, <literal>scan</literal>, and
+ <literal>update</literal></entry>
+ <entry></entry>
+ <entry><xref linkend="querymodel-operation-types"/></entry>
+ </row>
+ <row>
+ <entry>&acro.z3950; protocol support</entry>
+ <entry>yes</entry>
+ <entry>Supported protocol facilities are:
+ <literal>init</literal>, <literal>search</literal>,
+ <literal>present</literal> (retrieval),
+ segmentation (support for very large records),
+ <literal>delete</literal>, <literal>scan</literal>
+ (index browsing), <literal>sort</literal>,
+ <literal>close</literal>, and the <literal>update</literal>
+ Extended Service for adding new &acro.xml; records or replacing
+ existing ones. Piggy-backed presents in the search request are
+ honored, and named result sets are supported.</entry>
+ <entry><xref linkend="protocol-support"/></entry>
+ </row>
+ <row>
+ <entry>Web Service support</entry>
+ <entry>&acro.sru;</entry>
+ <entry>The protocol operations <literal>explain</literal>,
+ <literal>searchRetrieve</literal> and <literal>scan</literal>
+ are supported. Queries in <ulink url="&url.cql;">&acro.cql;</ulink>
+ are converted to &zebra;'s internal &acro.rpn; query model, and
+ extended &acro.rpn; queries are supported for both
+ <literal>searchRetrieve</literal> and <literal>scan</literal>.</entry>
+ <entry><xref linkend="zebrasrv-sru-support"/></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
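+ <para>
+ As a rough illustration of the &acro.sru; operations listed above,
+ the following Python sketch builds HTTP GET request URLs for
+ <literal>explain</literal>, <literal>searchRetrieve</literal> and
+ <literal>scan</literal>. It is a minimal sketch, not part of &zebra;:
+ the host, port and database name
+ (<literal>localhost:9999/Default</literal>) and the helper
+ <literal>sru_url</literal> are hypothetical placeholders. The parameter
+ names (<literal>operation</literal>, <literal>version</literal>,
+ <literal>query</literal>, <literal>scanClause</literal> and so on) follow
+ the &acro.sru; 1.1 specification, and the &acro.cql; index
+ <literal>dc.title</literal> assumes the server has a matching
+ &acro.cql;-to-&acro.rpn; mapping configured.
+ </para>
+ <programlisting>
```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and database name are placeholders,
# not values documented for any particular Zebra installation.
BASE_URL = "http://localhost:9999/Default"


def sru_url(operation, version="1.1", **params):
    """Build an SRU GET URL for explain, searchRetrieve or scan."""
    if operation not in ("explain", "searchRetrieve", "scan"):
        raise ValueError("unsupported SRU operation: " + operation)
    query = {"operation": operation, "version": version}
    # Extra SRU parameters are appended in the order they are given.
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


if __name__ == "__main__":
    # searchRetrieve carries a CQL query (assumed index: dc.title).
    print(sru_url("searchRetrieve", query="dc.title = zebra",
                  startRecord=1, maximumRecords=10))
    # scan browses an index starting at the given term.
    print(sru_url("scan", scanClause="dc.title = zebra", maximumTerms=20))
    # explain describes the server and its databases.
    print(sru_url("explain"))
```
+ </programlisting>
+ <para>
+ <literal>urlencode</literal> percent-encodes the &acro.cql; query, so
+ characters such as spaces and <literal>=</literal> travel safely in
+ the URL; the server then converts the &acro.cql; to &acro.rpn; before
+ searching.
+ </para>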
+ </section>
+
+ <section id="features-scalability">
+ <title>&zebra; Data Size and Scalability</title>
+
+ <table id="table-features-scalability" frame="top">
+ <title>&zebra; data size and scalability</title>
+ <tgroup cols="4">
+ <colspec colwidth="1*" colname="feature"/>
+ <colspec colwidth="1*" colname="availability"/>
+ <colspec colwidth="3*" colname="notes"/>
+ <colspec colwidth="2*" colname="references"/>
+ <thead>
+ <row>
+ <entry>Feature</entry>
+ <entry>Availability</entry>
+ <entry>Notes</entry>
+ <entry>Reference</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Number of records</entry>
+ <entry>40-60 million</entry>
+ <entry></entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Data size</entry>
+ <entry>100 GB of record data</entry>
+ <entry>&zebra;-based applications have successfully indexed up
+ to 100 GB of record data.</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Scale out</entry>
+ <entry>multiple disks</entry>
+ <entry></entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Performance</entry>
+ <entry><literal>O(n * log N)</literal></entry>
+ <entry>&zebra; query speed is governed roughly by
+ <literal>O(log N)</literal>,
+ where <literal>N</literal> is the total database size, and by
+ <literal>O(n)</literal>, where <literal>n</literal> is the
+ size of the hit set for the specific query.</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Average search times</entry>
+ <entry></entry>
+ <entry>Even on very large databases, a throughput of 20 queries per
+ second with an average response time of 1 second is possible,
+ provided that the boolean queries are precise enough
+ to produce hit sets on the order of 1,000 to 5,000
+ documents.</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Large databases</entry>
+ <entry>64-bit file pointers</entry>
+ <entry>64-bit file pointers allow register files to grow beyond
+ the 2 GB limit. Logical files can be
+ automatically partitioned over multiple disks, thus allowing for
+ large databases.</entry>
+ <entry></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
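+ <para>
+ The 2 GB ceiling follows from the width of file offsets: on 32-bit Unix
+ systems without large-file support, <literal>off_t</literal> is
+ typically a signed 32-bit integer, so the largest addressable offset is
+ 2<superscript>31</superscript> - 1 bytes. The short Python sketch below
+ works through that arithmetic for 32-bit and 64-bit signed offsets.
+ It only illustrates the arithmetic; the helper
+ <literal>max_offset</literal> is hypothetical and says nothing about how
+ &zebra;'s register code actually stores its file pointers.
+ </para>
+ <programlisting>
```python
# Hypothetical helper: the largest byte offset a signed file pointer
# of the given width can hold (two's complement, one bit for the sign).
def max_offset(bits):
    return 2 ** (bits - 1) - 1


GIB = 2 ** 30  # one gibibyte
EIB = 2 ** 60  # one exbibyte

if __name__ == "__main__":
    off32 = max_offset(32)
    off64 = max_offset(64)
    # 2**31 - 1 bytes is one byte short of 2 GiB: the "2 GB limit".
    print("32-bit:", off32, "bytes =", off32 / GIB, "GiB")
    # 2**63 - 1 bytes is just under 8 EiB, far beyond any disk array.
    print("64-bit:", off64, "bytes =", off64 / EIB, "EiB")
```
+ </programlisting>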
+ </section>
+
+ <section id="features-platforms">
+ <title>&zebra; Supported Platforms</title>
+
+ <table id="table-features-platforms" frame="top">
+ <title>&zebra; supported platforms</title>
+ <tgroup cols="4">
+ <colspec colwidth="1*" colname="feature"/>
+ <colspec colwidth="1*" colname="availability"/>
+ <colspec colwidth="3*" colname="notes"/>
+ <colspec colwidth="2*" colname="references"/>
+ <thead>
+ <row>
+ <entry>Feature</entry>
+ <entry>Availability</entry>
+ <entry>Notes</entry>
+ <entry>Reference</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Linux</entry>
+ <entry></entry>
+ <entry>GNU/Linux (32-bit and 64-bit) on local disks with a journaling
+ file system, such as ReiserFS or (preferably) JFS.
+ NFS file systems are not supported.
+ Debian GNU/Linux packages are available.</entry>
+ <entry><xref linkend="installation-debian"/></entry>
+ </row>
+ <row>
+ <entry>Unix</entry>
+ <entry>tar-ball</entry>
+ <entry>&zebra; is written in portable C, so it runs on most
+ Unix-like systems.
+ A standard tar-ball installation is possible on many major Unix systems.</entry>
+ <entry><xref linkend="installation-unix"/></entry>
+ </row>
+ <row>
+ <entry>Windows</entry>
+ <entry>NT/2000/2003/XP</entry>
+ <entry>&zebra; also runs on Windows (NT/2000/2003/XP), and
+ Windows installer packages are available.</entry>
+ <entry><xref linkend="installation-win32"/></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </section>
+
+
+ </section>
+
+ <section id="introduction-apps">
+ <title>References and &zebra; based Applications</title>
+ <para>
+ &zebra; has been deployed in numerous applications, in both the
+ academic and commercial worlds, in application domains as diverse
+ as bibliographic catalogues, geospatial information, structured
+ vocabulary browsing, government information locators, civic
+ information systems, environmental observations, museum information
+ and web indexes.
+ </para>
+ <para>
+ Notable applications include the following:
+ </para>
+
+
+ <section id="koha-ils">
+ <title>Koha free open-source ILS</title>
+ <para>
+ <ulink url="http://www.koha.org/">Koha</ulink> is a full-featured
+ open-source ILS, initially developed in
+ New Zealand by Katipo Communications Ltd, and first deployed in
+ January of 2000 for Horowhenua Library Trust. It is currently
+ maintained by a team of software providers and library technology
+ staff from around the globe.
+ </para>
+ <para>
+ <ulink url="http://liblime.com/">LibLime</ulink>,
+ a company that markets and supports Koha, has added the &zebra;
+ database server to the new Koha 3.0 release to drive its
+ bibliographic database.
+ </para>
+ <para>
+ In early 2005, the Koha project development team began looking at
+ ways to improve &acro.marc; support and overcome scalability limitations
+ in the Koha 2.x series. After extensive evaluation of the leading
+ open-source text database engines, including MySQL
+ full-text searching, PostgreSQL, Lucene and Plucene, the team
+ selected &zebra;.
+ </para>
+ <para>
+ "&zebra; completely eliminates scalability limitations, because it
+ can support tens of millions of records," explained Joshua
+ Ferraro, LibLime's Technology President and Koha's Project
+ Release Manager. "Our performance tests showed search results in
+ under a second for databases with over 5 million records on a
+ modest i386 900 MHz test server."
+ </para>
+ <para>
+ "&zebra; also includes support for true boolean search expressions
+ and relevance-ranked free-text queries, both of which the Koha
+ 2.x series lack. &zebra; also supports incremental and safe
+ database updates, which allow on-the-fly record
+ management. Finally, since &zebra; has at its heart the &acro.z3950;
+ protocol, it greatly improves Koha's support for that critical
+ library standard."
+ </para>
+ <para>
+ Although the bibliographic database will be moved to &zebra;, Koha
+ 3.0 will continue to use a relational SQL-based database design
+ for the 'factual' database. "Relational database managers have
+ their strengths, in spite of their inability to handle large
+ numbers of bibliographic records efficiently," summed up Ferraro.
+ "We're taking the best from both worlds in our redesigned Koha
+ 3.0."
+ </para>
+ <para>
+ See also LibLime's newsletter article
+ <ulink url="http://www.liblime.com/newsletter/2006/01/features/koha-earns-its-stripes/">
+ Koha Earns its Stripes</ulink>.
+ </para>
+ </section>
+
+
+ <section id="kete-dom">
+ <title>Kete Open Source Digital Library and Archiving software</title>
+ <para>
+ <ulink url="http://kete.net.nz/">Kete</ulink> is a digital object
+ management repository, initially developed in
+ New Zealand. Initial development was
+ a partnership between the Horowhenua Library Trust and
+ Katipo Communications Ltd., funded as part of the Community
+ Partnership Fund in 2006.
+ Kete is purpose-built
+ software that enables communities to build their own digital
+ libraries, archives and repositories.
+ </para>
+ <para>
+ It is based on Ruby-on-Rails and MySQL, and integrates the &zebra; server
+ and the &yaz; toolkit for indexing and retrieval of its content.
+ &zebra; runs as a separate process from the Kete
+ application.
+ See
+ how Kete <ulink
+ url="http://kete.net.nz/documentation/topics/show/139-managing-zebra">manages
+ &zebra;</ulink>.
+ </para>
+ <para>
+ Why does Kete use &zebra;? Speed, scalability and easy
+ integration with Koha. Read their
+ <ulink
+ url="http://kete.net.nz/blog/topics/show/44-who-what-why-when-answering-some-of-the-niggly-development-questions">detailed
+ reasoning</ulink>.
+ </para>
+ </section>
+
+ <section id="reindex-ils">
+ <title>ReIndex.Net web based ILS</title>
+ <para>
+ <ulink url="http://www.reindex.net/index.php?lang=en">Reindex.net</ulink>
+ is a web-based library service offering all
+ traditional library functions at a very high level, plus many new
+ services. Reindex.net is a comprehensive and powerful web system
+ based on standards such as &acro.xml; and &acro.z3950;.
+ Reindex supports &acro.marc21;, dan&acro.marc; or Dublin Core with
+ UTF-8 encoding.
+ </para>
+ <para>
+ Reindex.net runs on Debian GNU/Linux, using &zebra; and Simpleserver
+ from Index
+ Data for bibliographic data and the relational database system
+ Sybase 9 &acro.xml; for
+ administrative data.
+ Internally, &acro.marcxml; is used for bibliographic records. Updates
+ use &acro.z3950; Extended Services.
+ </para>
+ </section>
+
+ <section id="dads-article-database">
+ <title>DADS - the DTV Article Database
+ Service</title>
+ <para>
+ DADS is a huge database of more than ten million records, totalling
+ over ten gigabytes of data. The records are metadata about academic
+ journal articles, primarily scientific; about 10% of these
+ metadata records link to the full text of the articles they
+ describe, a body of about a terabyte of information (although the
+ full text is not indexed).
+ </para>
+ <para>
+ It allows students and researchers at DTU (Danmarks Tekniske
+ Universitet, the Technical University of Denmark) to find and order
+ articles from multiple databases in a single query. The database
+ contains literature on all engineering subjects. It is available
+ online through a web gateway, though currently only to registered
+ users.
+ </para>
+ <para>
+ More information can be found at
+ <ulink url="http://www.dtic.dtu.dk/"/> and
+ <ulink url="http://dads.dtv.dk"/>.
+ </para>
+ </section>
+
+ <section id="uls">
+ <title>ULS (Union List of Serials)</title>
+ <para>
+ The M25 Systems Team
+ has created a union catalogue for the periodicals of the
+ twenty-one constituent libraries of the University of London and
+ the University of Westminster
+ (<ulink url="http://www.m25lib.ac.uk/ULS/"/>).
+ They have achieved this using an
+ unusual architecture, which they describe as a
+ <quote>non-distributed virtual union catalogue</quote>.
+ </para>
+ <para>
+ The member libraries send in data files representing their
+ periodicals, including both brief bibliographic data and summary
+ holdings. Then 21 individual &acro.z3950; targets are created, each
+ using &zebra;, and all mounted on a single hardware server.
+ The live service provides a web gateway allowing &acro.z3950; searching
+ of all of the targets or a selection of them. &zebra;'s small
+ footprint allows a relatively modest system to comfortably host
+ the 21 servers.
+ </para>
+ <para>
+ More information can be found at
+ <ulink url="http://www.m25lib.ac.uk/ULS/"/>.
+ </para>
+ </section>
+
+ <section id="various-web-indexes">
+ <title>Various web indexes</title>
+ <para>
+ &zebra; has been used by a variety of institutions to construct
+ indexes of large web sites, typically in the region of tens of
+ millions of pages. In this role, it functions somewhat similarly
+ to the engine of Google or AltaVista, but for a selected intranet
+ or a subset of the whole Web.
+ </para>
+ <para>
+ For example, Liverpool University's web-search facility (available
+ on the home page at
+ <ulink url="http://www.liv.ac.uk/"/>
+ and on many sub-pages) works by relevance-searching a &zebra; database
+ which is populated by the Harvest-NG web-crawling software.
+ </para>
+ <para>
+ For more information on Liverpool University's intranet search
+ architecture, contact John Gilbertson
+ <email>jgilbert@liverpool.ac.uk</email>.
+ </para>
+ <para>
+ Kang-Jin Lee
+ has recently modified the Harvest web indexer to use &zebra; as
+ its native repository engine. His comments on the switchover
+ from the old engine are revealing:
+ <blockquote>
+ <para>
+ The first results after some testing with &zebra; are very
+ promising. The tests were done with around 220,000 SOIF files,
+ which occupies 1.6GB of disk space.
+ </para>
+ <para>
+ Building the index from scratch takes around one hour with &zebra;
+ where [old-engine] needs around five hours. While [old-engine]
+ blocks search requests when updating its index, &zebra; can still
+ answer search requests.
+ [...]
+ &zebra; supports incremental indexing which will speed up indexing
+ even further.
+ </para>
+ <para>
+ While the search time of [old-engine] varies from some seconds
+ to some minutes depending how expensive the query is, &zebra;
+ usually takes around one to three seconds, even for expensive
+ queries.
+ [...]
+ &zebra; can search more than 100 times faster than [old-engine]
+ and can process multiple search requests simultaneously
+ </para>
+ <para>
+ I am very happy to see such nice software available under GPL.
+ </para>
+ </blockquote>
+ </para>
+ </section>
+ </section>
+
+ <section id="introduction-support">
+ <title>Support</title>
+ <para>
+ You can get support for &zebra; from at least three sources.
+ </para>