Honor position attribute, i.e. allow first-in-field search. To

diff --git a/doc/introduction.xml b/doc/introduction.xml
index a5987fe..947b4a8 100644
--- a/doc/introduction.xml
+++ b/doc/introduction.xml
@@ -1,8 +1,8 @@
  <chapter id="introduction">
- <!-- $Id: introduction.xml,v 1.31 2006-04-24 12:53:03 marc Exp $ -->
+ <!-- $Id: introduction.xml,v 1.39 2006-09-03 21:37:26 adam Exp $ -->
   <title>Introduction</title>
   
- <sect1>
+ <section id="overview">
    <title>Overview</title>
    
    <para>
@@ -34,9 +34,9 @@
     and how to configure the server to give you the
     functionality that you need.
    </para>
- </sect1>
+ </section>
   
- <sect1 id="features">
+ <section id="features">
    <title>Features</title>
    
    <para>
@@ -126,7 +126,7 @@
    </para>
    
    <para>
-   Z39.50 protocol support:
+     <ulink url="&url.z39.50;">Z39.50</ulink> protocol support:
    </para>
    
    <para>   
@@ -137,15 +137,6 @@
        Segmentation (support for very large records), Delete, Scan
        (index browsing), Sort, Close and support for the ``update''
        Extended Service to add or replace an existing XML record.
-       <!-- Adam says:
-            * Supported
-            You can insert/delete/replace an XML record given an
-            "external" ID.  Actually this way of doing ES Update was
-            meant for an OAI application that Ian Ibbotson had in
-            mind to implement. The "update" command in YAZ client
-            implements this on the client side. My plan is to make
-            this available in ZOOM "extended" soon..
-       -->
       </para>
      </listitem>
  
@@ -194,11 +185,53 @@
     </itemizedlist>
     
    </para>
+
    
- </sect1>
+  <para>
+    <ulink url="&url.sru;">SRU</ulink> Web Service support:
+  </para>
+  <para>   
+   <itemizedlist>
+    <listitem>
+     <para>
+       The protocol operations <literal>explain</literal>, 
+       <literal>searchRetrieve</literal> and <literal>scan</literal>
+       are supported. 
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+       Conversion of <ulink url="&url.cql;">CQL</ulink> queries to the
+       internal RPN query model is supported.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+       Multiple XML record formats
+      for data retrieval are supported, modelled on the GRS-1, SUTRS and
+      MARC record formats. Records can be mapped between record
+       schemas on the fly. Arbitrarily complex XSLT transformations
+      can be applied during record retrieval if one uses the 
+       <literal>alvis</literal> filter module.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+       Additional PQF query syntax for
+       <literal>searchRetrieve</literal>
+       and <literal>scan</literal> operations is supported.
+     </para>
+    </listitem>
+
+   </itemizedlist>
+   
+  </para>
+
+  
+ </section>
   
-  <sect1 id="apps">
-  <title>Applications</title>
+  <section id="introduction-apps">
+  <title>References and Zebra-based Applications</title>
    <para>
     Zebra has been deployed in numerous applications, in both the
     academic and commercial worlds, in application domains as diverse
@@ -211,9 +244,110 @@
     Notable applications include the following:
    </para>
  
-  <sect2>
-   <title>DADS - the DTV Article Database Service</title>
+
+  <section id="koha-ils">
+   <title>Koha free open-source ILS</title>
     <para>
+     <ulink url="http://www.koha.org/">Koha</ulink> is a full-featured
+     open-source ILS, initially developed in
+     New Zealand by Katipo Communications Ltd, and first deployed in
+     January of 2000 for Horowhenua Library Trust. It is currently
+     maintained by a team of software providers and library technology
+     staff from around the globe. 
+    </para>
+    <para>
+     <ulink url="http://liblime.com/">LibLime</ulink>, 
+     a company that markets and supports Koha, adds the Zebra
+     database server to the new Koha 3.0 release to drive its
+     bibliographic database.
+    </para>
+    <para>
+     In early 2005, the Koha project development team began looking at
+     ways to improve MARC support and overcome scalability limitations
+     in the Koha 2.x series. After extensive evaluations of the best
+     of the Open Source textual database engines - including MySQL
+     full-text searching, PostgreSQL, Lucene and Plucene - the team
+     selected Zebra. 
+    </para>
+    <para>
+     "Zebra completely eliminates scalability limitations, because it
+     can support tens of millions of records," explained Joshua
+     Ferraro, LibLime's Technology President and Koha's Project
+     Release Manager. "Our performance tests showed search results in
+     under a second for databases with over 5 million records on a
+     modest i386 900MHz test server."
+    </para>
+    <para>
+     "Zebra also includes support for true boolean search expressions
+     and relevance-ranked free-text queries, both of which the Koha
+     2.x series lack. Zebra also supports incremental and safe
+     database updates, which allow on-the-fly record
+     management. Finally, since Zebra has at its heart the Z39.50
+     protocol, it greatly improves Koha's support for that critical
+     library standard." 
+    </para>
+    <para> 
+     Although the bibliographic database will be moved to Zebra, Koha
+     3.0 will continue to use a relational SQL-based database design
+     for the 'factual' database. "Relational database managers have
+     their strengths, in spite of their inability to handle large
+     numbers of bibliographic records efficiently," summed up Ferraro.
+     "We're taking the best from both worlds in our redesigned Koha
+     3.0."
+     </para>
+     <para>
+     See also LibLime's newsletter article
+      <ulink url="http://www.liblime.com/newsletter/2006/01/features/koha-earns-its-stripes/">
+     Koha Earns its Stripes</ulink>.
+     </para>
+   </section>
+
+  <section id="emilda-ils">
+   <title>Emilda open source ILS</title>
+   <para>
+     <ulink url="http://www.emilda.org/">Emilda</ulink> 
+     is a complete Integrated Library System, released under the 
+     GNU General Public License. It has a
+     full-featured Web-OPAC, allowing comprehensive system management
+     from virtually any computer with an Internet connection, has a
+     template-based layout allowing anyone to alter the visual
+     appearance of Emilda, and is
+     XML based for fast and easy portability to virtually any
+     language.
+     Currently, Emilda is used at three schools in Espoo, Finland.
+    </para>
+    <para>
+     In addition, 100% MARC compatibility has been achieved by using the
+    Zebra Server from Index Data as the backend server.
+    </para> 
+   </section>
+
+  <section id="reindex-ils">
+   <title>ReIndex.Net web based ILS</title>
+    <para>
+     <ulink url="http://www.reindex.net/index.php?lang=en">Reindex.net</ulink>
+     is a web-based library service offering all
+     traditional functions at a very high level, plus many new
+     services. Reindex.net is a comprehensive and powerful web system
+     based on standards such as XML and Z39.50.
+     Reindex supports MARC21, danMARC or Dublin Core with
+     UTF-8 encoding.
+    </para>
+    <para>
+     Reindex.net runs on Debian GNU/Linux with Zebra and Simpleserver
+     from Index Data for bibliographic data.
+     The relational database system
+     Sybase 9 XML is used for
+     administrative data.
+     Internally, MARCXML is used for bibliographic records. Updates
+     use Z39.50 Extended Services.
+    </para>
+   </section>
+
+   <section id="dads-article-database">
+    <title>DADS - the DTV Article Database
+     Service</title>
+    <para>
      DADS is a huge database of more than ten million records, totalling
      over ten gigabytes of data.  The records are metadata about academic
      journal articles, primarily scientific; about 10% of these
@@ -234,9 +368,9 @@
      <ulink url="http://www.dtv.dk/"/> and
      <ulink url="http://dads.dtv.dk"/>
     </para>
-  </sect2>
+  </section>
  
-  <sect2>
+  <section id="infonet-eprints">
     <title>Infonet Eprints</title>
     <para>
       The InfoNet Eprints service from the 
@@ -253,39 +387,33 @@
      The online search facility is found at
      <ulink url="http://preprints.cvt.dk"/>.
     </para>
-  </sect2>
+  </section>
  
-  <sect2>
-   <title>NLI-Z39.50 - a Natural Language Interface for Libraries</title>
+  <section id="alvis-project">
+   <title>Alvis</title>
     <para>
-    Fernuniversit&#x00E4;t Hagen in Germany have developed a natural
-    language interface for access to library databases.
-    <!-- <ulink
-    url="http://ki212.fernuni-hagen.de/nli/NLIintro.html"/> -->
-    In order to evaluate this interface for recall and precision, they
-    chose Zebra as the basis for retrieval effectiveness.  The Zebra
-    server contains a copy of the GIRT database, consisting of more
-    than 76000 records in SGML format (bibliographic records from
-    social science), which are mapped to MARC for presentation.
-   </para>
-   <para>
-    (GIRT is the German Indexing and Retrieval Testdatabase.  It is a
-    standard German-language test database for intelligent indexing
-    and retrieval systems.  See
-    <ulink url="http://www.gesis.org/forschung/informationstechnologie/clef-delos.htm"/>)
-   </para>
-   <para>
-    Evaluation will take place as part of the TREC/CLEF campaign 2003 
-    <ulink url="http://clef.iei.pi.cnr.it"/>.
-    <!-- or <ulink url="http://www4.eurospider.ch/CLEF/"/> -->
-   </para>
-   <para>
-    For more information, contact Johannes Leveling
-    <email>Johannes.Leveling@FernUni-Hagen.De</email>
-   </para>
-  </sect2>
+     The <ulink url="http://www.alvis.info/alvis/">Alvis</ulink> EU
+     project, run under the 6th Framework Programme (IST-1-002068-STP),
+     is building a semantic-based peer-to-peer search engine. A
+     consortium of eleven partners from six different European
+     Community countries plus Switzerland and China contributes
+     expertise in a broad range of specialties including network
+     topologies, routing algorithms, linguistic analysis and
+     bioinformatics. 
+    </para>
+    <para>
+     The Zebra information retrieval indexing machine is used inside
+     the Alvis framework to
+     manage huge collections of natural-language-processed and
+     enhanced XML data coming from a topic-relevant web crawl.
+     In this application, Zebra ingests and manages 37 GB of XML data
+     in about 4 hours, giving search times of a fraction of a
+     second.
+     </para>
+   </section>
+
  
-  <sect2>
+  <section id="uls">
     <title>ULS (Union List of Serials)</title>
     <para>
      The M25 Systems Team
@@ -311,9 +439,39 @@
      More information can be found at
      <ulink url="http://www.m25lib.ac.uk/ULS/"/>
     </para>
-  </sect2>
+  </section>
+
+  <section id="nli">
+   <title>NLI-Z39.50 - a Natural Language Interface for Libraries</title>
+   <para>
+    Fernuniversit&#x00E4;t Hagen in Germany have developed a natural
+    language interface for access to library databases.
+    <!-- <ulink
+    url="http://ki212.fernuni-hagen.de/nli/NLIintro.html"/> -->
+    In order to evaluate this interface for recall and precision, they
+    chose Zebra as the basis for retrieval effectiveness.  The Zebra
+    server contains a copy of the GIRT database, consisting of more
+    than 76000 records in SGML format (bibliographic records from
+    social science), which are mapped to MARC for presentation.
+   </para>
+   <para>
+    (GIRT is the German Indexing and Retrieval Testdatabase.  It is a
+    standard German-language test database for intelligent indexing
+    and retrieval systems.  See
+    <ulink url="http://www.gesis.org/forschung/informationstechnologie/clef-delos.htm"/>)
+   </para>
+   <para>
+    Evaluation will take place as part of the TREC/CLEF campaign 2003 
+    <ulink url="http://clef.iei.pi.cnr.it"/>.
+    <!-- or <ulink url="http://www4.eurospider.ch/CLEF/"/> -->
+   </para>
+   <para>
+    For more information, contact Johannes Leveling
+    <email>Johannes.Leveling@FernUni-Hagen.De</email>
+   </para>
+  </section>
  
-  <sect2>
+  <section id="various-web-indexes">
     <title>Various web indexes</title>
     <para>
      Zebra has been used by a variety of institutions to construct
@@ -336,7 +494,6 @@
     </para>
     <para>
      Kang-Jin Lee
-    <email>lee@arco.de</email>,
      has recently modified the Harvest web indexer to use Zebra as
      its native repository engine.  His comments on the switch over
      from the old engine are revealing:
@@ -369,11 +526,11 @@
       </para>
      </blockquote>
     </para>
-  </sect2>
- </sect1>
+  </section>
+ </section>
  
  
- <sect1 id="support">
+ <section id="introduction-support">
    <title>Support</title>
    <para>
     You can get support for Zebra from at least three sources.
@@ -401,10 +558,10 @@
     <ulink url="http://indexdata.dk/support/"/>
     for details.
    </para>
- </sect1>  
+ </section>  
  
  
- <sect1 id="future">
+ <section id="future">
    <title>Future Directions</title>
    
    <para>
@@ -443,14 +600,14 @@
          <ulink url="http://www.loc.gov/standards/sru/srw/">SRW</ulink>-to-Z39.50 gateway, currently in beta test.
         -->
         Experimental support of the 
-       Search/Retrieve Via URL ( <ulink url="http://www.loc.gov/standards/sru/">SRU</ulink>) 
-       <ulink url="http://www.loc.gov/standards/sru/"/>
+       Search/Retrieve Via URL (<ulink url="&url.sru;">SRU</ulink>)
+       <ulink url="&url.sru;"/>
         REST webservice, and the 
          Search/Retrieve Web Service ( <ulink url="http://www.loc.gov/standards/sru/srw/">SRW</ulink>)
         <ulink url="http://www.loc.gov/standards/sru/srw/"/>
         SOAP Web Service have recently been added to the YAZ/Zebra
-       combo - including server side Common Query Language (<ulink url="http://www.loc.gov/standards/sru/cql/">CQL</ulink>)
-       <ulink url="http://www.loc.gov/standards/sru/cql/"/> parsing
+       combo - including server side Common Query Language (<ulink url="&url.cql;">CQL</ulink>)
+       <ulink url="&url.cql;"/> parsing
         and configuration. It remains to find a sponsor for further testing,
         documentation and packaging of this exiting component.
       </para>
@@ -501,11 +658,19 @@
     or check the contact info at the end of this manual.
    </para>
    
- </sect1>
+ </section>
  </chapter>
- <!-- Keep this Emacs mode comment at the end of the file
-Local variables:
-mode: nxml
-End:
--->
-
+ <!-- Keep this comment at the end of the file
+ Local variables:
+ mode: sgml
+ sgml-omittag:t
+ sgml-shorttag:t
+ sgml-minimize-attributes:nil
+ sgml-always-quote-attributes:t
+ sgml-indent-step:1
+ sgml-indent-data:t
+ sgml-parent-document: "zebra.xml"
+ sgml-local-catalogs: nil
+ sgml-namecase-general:t
+ End:
+ -->