X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fintroduction.xml;h=03401df0eeea89962fa1ddb5757eb71e0d911527;hb=2b17f33968c38dfe50b7710da1c9e5004da9cf8b;hp=2175218889eaa6911e295dc2ea81d4d7b7b975aa;hpb=7d77cebae2b7af01eb7211f4ca9860217b3d32cb;p=idzebra-moved-to-github.git

diff --git a/doc/introduction.xml b/doc/introduction.xml
index 2175218..03401df 100644
--- a/doc/introduction.xml
+++ b/doc/introduction.xml
@@ -1,39 +1,49 @@
 <chapter id="introduction">
- <!-- $Id: introduction.xml,v 1.4 2002-04-09 19:20:22 adam Exp $ -->
+ <!-- $Id: introduction.xml,v 1.12 2002-08-30 01:17:10 mike Exp $ -->
  <title>Introduction</title>
  
  <sect1>
   <title>Overview</title>
   
   <para>
-   The Zebra system is a fielded free-text indexing and retrieval engine with a
-   Z39.50 front-end. You can use any commercial or free-ware Z39.50 client
-   to access data stored in Zebra.
+   <ulink url="http://www.indexdata.dk/zebra/">
+     Zebra</ulink>
+   is a high-performance, general-purpose structured text
+   indexing and retrieval engine. It reads structured records in a
+   variety of input formats (e.g. email, XML, MARC) and provides access
+   to them through a powerful combination of boolean search
+   expressions and relevance-ranked free-text queries.
   </para>
-  
+
   <para>
-   The Zebra server is our first step towards the development of a fully
-   configurable, open information system. Eventually, it will be paired
-   off with a powerful Z39.50 client to support complex information
-   management tasks within almost any application domain. We're making
-   the server available now because it's no fun to be in the open
-   information retrieval business all by yourself. We want to allow
-   people with interesting data to make their things
-   available in interesting ways, without having to start out
-   by implementing yet another protocol stack from scratch.
+   Zebra supports large databases (tens of millions of records,
+   tens of gigabytes of data). It allows safe, incremental
+   database updates on live systems. Because Zebra supports
+   the industry-standard information retrieval protocol, Z39.50,
+   you can search Zebra databases using an enormous variety of
+   programs and toolkits, both commercial and free, which understand
+   this protocol.  Application libraries are available to allow
+   bespoke clients to be written in Perl, C, C++, Java, Tcl, Visual
+   Basic, Python, PHP and more - see
+   <ulink url="http://zoom.z3950.org/">the ZOOM web site</ulink>
+   for more information on some of these client toolkits.
   </para>
-  
+
   <para>
-   This document is an introduction to the Zebra system. It will tell you
-   how to compile the software, and how to prepare your first database.
-   It also explains how the server can be configured to give you the
+   This document is an introduction to the Zebra system. It explains
+   how to compile the software, how to prepare your first database,
+   and how to configure the server to give you the
    functionality that you need.
   </para>
   
   <para>
-   If you find the software interesting, you should join the support
-   mailing-list by sending email to
-   <literal>zebra-request@indexdata.dk</literal>.
+   If you use Zebra, you should visit its
+   <ulink url="http://www.indexdata.dk/zebra/">web site</ulink>,
+   where you can join the
+   <ulink url="http://www.indexdata.dk/mailman/listinfo/zebralist">
+   mailing-list</ulink>
+   by sending email to
+   <email>### zebra-subscribe@mailman.indexdata.dk</email>.
   </para>
   
  </sect1>
@@ -42,42 +52,44 @@
   <title>Features</title>
   
   <para>
-   This is a list of some of the most important features of the
-   system.
+   This is an overview of some of Zebra's most important features:
   </para>
   
   <para>
    <itemizedlist>
+
     <listitem>
      <para>
-      Supports updating - records can be added and deleted without
-      rebuilding the index from scratch.
-      The update procedure is tolerant to crashes or hard interrupts
-      during register updating - registers can be reconstructed following
-      a crash.
-      Registers can be safely updated even while users are accessing
-      the server.
+      Very large databases: files for indexes, etc. can be
+      automatically partitioned over multiple disks.
      </para>
     </listitem>
 
     <listitem>
      <para>
-      Supports large databases - files for indices, etc. can be
-      automatically partitioned over multiple disks.
+      Arbitrarily complex records.  The internal data format
+      is a structured format conceptually similar to XML or GRS-1,
+      which allows nested structured data elements and
+      variant forms of data.
      </para>
     </listitem>
 
     <listitem>
      <para>
-      Supports arbitrarily complex records - base input format is an
-      SGML-like syntax which allows nested (structured) data elements, as
-      well as variant forms of data.
+      Robust updating - records can be added and deleted "on the fly"
+      without rebuilding the index from scratch.
+      Records can be safely updated even while users are accessing
+      the server.
+      The update procedure is tolerant of crashes or hard interrupts
+      during database updating - data can be reconstructed following
+      a crash.
      </para>
     </listitem>
 
     <listitem>
      <para>
-      Supports random storage formats. A system of input filters driven by
+      Configurable to understand many input formats.
+      A system of input filters driven by
       regular expressions allows you to easily process most ASCII-based
       data formats. SGML, XML, ISO2709 (MARC), and raw text are also
       supported.
@@ -86,25 +98,27 @@
 
     <listitem>     
      <para>
-      Supports boolean queries as well as relevance-ranking (free-text)
-      searching. Right truncation and masking in terms are supported, as
-      well as full regular expressions.
+      Searching supports a powerful combination of boolean queries
+      and relevance-ranked (free-text) queries.  Truncation,
+      masking, full regular expression matching and "approximate
+      matching" (e.g. tolerating spelling mistakes) are all supported.
      </para>
     </listitem>
 
     <listitem>
-     <para>
-      Supports multiple concrete syntaxes
-      for record exchange (depending on the configuration): GRS-1, SUTRS,
-      XML, ISO2709 (*MARC). Records can be mapped between record syntaxes
-      and schema on the fly.      
-     </para>
+      <para>
+	Index-only databases: data can be, and usually is, imported
+        into Zebra's own storage, but Zebra can also refer to
+        external files, building and maintaining indexes of "live"
+	collections.
+      </para>
     </listitem>
 
-    <listitem>     
+    <listitem>
      <para>
-      Supports approximate matching in registers (ie. spelling mistakes,
-      etc).
+      Zebra is written in portable C, so it runs on most Unix-like systems 
+      as well as Windows NT.  A binary distribution for Windows NT is
+      available.
      </para>
     </listitem>
     
@@ -113,14 +127,15 @@
   </para>
   
   <para>
-   Protocol support:
+   Z39.50 protocol support:
   </para>
   
   <para>   
    <itemizedlist>
     <listitem>
      <para>
-      Protocol facilities: Init, Search, Retrieve, Delete, Browse and Sort.
+      Protocol facilities: Init, Search, Present (retrieval), Delete,
+      Scan (index browsing) and Sort.
      </para>
     </listitem>
 
@@ -135,6 +150,7 @@
       Named result sets are supported.
      </para>
     </listitem>
+
     <listitem>
      <para>
       Easily configured to support different application profiles, with
@@ -146,105 +162,192 @@
 
     <listitem>
      <para>
-      Complex composition specifications using Espec-1 are partially
-      supported (simple element requests only).
-     </para>
-    </listitem>
-
-    <listitem>
-     <para>
-      Element Set Names are defined using the Espec-1 capability of the
-      system, and are given in configuration files as simple element
-      requests (and possibly variant requests).
+      Complex composition specifications using Espec-1 (partial support).
+      Element sets are defined using the Espec-1 capability,
+      and are specified in configuration files as simple element
+      requests (and, optionally, variant requests).
      </para>
     </listitem>
 
     <listitem>
      <para>
-      Some variant support (not fully implemented yet).
+      Multiple record syntaxes
+      for data retrieval: GRS-1, SUTRS,
+      XML, ISO2709 (MARC), etc. Records can be mapped between record syntaxes
+      and schemas on the fly.      
      </para>
     </listitem>
 
-    <listitem>
-     <para>
-      Zebra runs on most Unix-like systems as well as Windows NT - a binary
-      distribution for Windows NT is available.
-     </para>
-    </listitem>
-    
    </itemizedlist>
    
   </para>
   
  </sect1>
  
+ <sect1 id="apps">
+  <title>Applications</title>
+  <para>
+   Zebra has been deployed in numerous applications, in both the
+   academic and commercial worlds, in application domains as diverse
+   as bibliographic catalogues, geospatial information, structured
+   vocabulary browsing, government information locators, civic
+   information systems, environmental observations, museum information
+   and web indexes.
+  </para>
+  <para>
+   Notable applications include the following:
+  </para>
+
+  <sect2>
+   <title>DADS - the DTV Article Database Service</title>
+   <para>
+    DADS is a huge database of more than ten million records, totalling
+    over ten gigabytes of data.  The records are metadata about academic
+    journal articles, primarily scientific; about 10% of these
+    metadata records link to the full text of the articles they
+    describe, a body of about a terabyte of information (although the
+    full text is not indexed).
+   </para>
+   <para>
+    It allows students and researchers at DTU (Danmarks Tekniske
+    Universitet, the Technical University of Denmark) to find and order
+    articles from multiple databases in a single query.  The database
+    contains literature on all engineering subjects.  It's available
+    on-line through a web gateway, though currently only to registered
+    users.
+   </para>
+   <para>
+    More information can be found at
+    <ulink url="http://www.dtv.dk/help/dads/index_e.htm"/>.
+   </para>
+  </sect2>
+
+<!--
+Envelope-to: zebra@miketaylor.org.uk
+From: Johannes Leveling <Johannes.Leveling@FernUni-Hagen.de>
+Content-Type: text/plain; charset=iso-8859-1
+Date: Thu, 29 Aug 2002 19:19:55 +0200
+To: zebra@miketaylor.org.uk
+Subject: [Zebralist] Looking for Deployment Stories
+In-Reply-To: <200208281002.LAA16526@seatbooker.net>
+X-Virus-Scanned: by AMaViS perl-11
+X-MIME-Autoconverted: from quoted-printable to 8bit by localhost.localdomain id g7TLWR905724
+
+Mike Taylor writes:
+ > People,
+ > 
+ > In collaboration with Sebastian, Adam and Heikki, I am reworking some
+ > parts of the Zebra documentation in preparation for the forthcoming
+ > release.  One area I am keen to expand on is (briefly) describing
+ > interesting applications of Zebra.  If you've deployed it in a way
+ > that you consider interesting, I'd love to hear from you, however
+ > briefly.  Think of this as a chance to get some free publicity for
+ > your application in the Zebra documentation.
+ > 
+ > Replies off-list to <zebra@miketaylor.org.uk>, please.
+ > 
+ >  _/|_	 _______________________________________________________________
+ > /o ) \/  Mike Taylor   <mike@miketaylor.org.uk>   www.miketaylor.org.uk
+ > )_v__/\  There are some good things you can never have too much of.
+ > 
+ > 
+ > _______________________________________________
+ > Zebralist mailing list
+ > Zebralist@indexdata.dk
+ > http://www.indexdata.dk/mailman/listinfo/zebralist
+ > 
+Intersting?
+We have developed a natural language interface (NLI-Z39.50) for access
+to library databases at the Fernuniversität Hagen, Germany
+(http://ki212.fernuni-hagen.de/nli/NLI.html).
+To prepare formal information retrieval evaluation,
+we chose the Zebra server as the basis for
+evaluating retrieval effectiveness (measuring recall 
+and precision for the GIRT database). The Zebra database 
+consists of more than 76000 records in SGML format (bibliographic 
+records from social science), which are mapped to MARC for presentation. 
+Evaluation will take place as part of the TREC/CLEF campaign 2003 
+(see http://clef.iei.pi.cnr.it or http://www4.eurospider.ch/CLEF/).
+
+
+Johannes Leveling        Praktische Informatik VII/KI           
+                         FernUniversität Hagen
+
+Email : Johannes.Leveling@FernUni-Hagen.De  
+Tel.  : +49 2331 987-4525
+
+-->
+
+  <sect2>
+   <title>Various web indexes</title>
+   <para>
+    Zebra has been used by a variety of institutions to construct
+    indexes of large web sites, typically in the region of tens of
+    millions of pages.  In this role, it functions somewhat similarly
+    to the engine of Google or AltaVista, but for a selected intranet
+    or subset of the whole Web.
+   </para>
+   <para>
+    ### examples, details and numbers, please!
+   </para>
+  </sect2>
+ </sect1>
+
  <sect1 id="future">
-  <title>Future Work</title>
+  <title>Future Directions</title>
   
   <para>
    These are some of the plans that we have for the software in the near
-   and far future, approximately ordered after their relative importance.
-   Items marked with an
-   asterisk will be implemented before the
-   last beta release.
+   and far future, ordered approximately as we expect to work on them.
   </para>
   
   <para>
    <itemizedlist>
-    <listitem>
-     <para>
-      *Complete the support for variants.
-     </para>
-    </listitem>
-
-    <listitem>
-     <para>
-      *Finalize the data element <emphasis>include</emphasis> facility
-      to support multimedia data elements in records.
-     </para>
-    </listitem>
 
     <listitem>
      <para>
-      Add more sophisticated relevance ranking mechanisms.
-      Add support for soundex and stemming.
-      Add relevance <emphasis>feedback</emphasis> support.
+       Improved support for XML in search and retrieval. Eventually,
+       the goal is for Zebra to pull double duty as a flexible
+       information retrieval engine and high-performance XML
+       repository.
      </para>
     </listitem>
 
     <listitem>
      <para>
-      Complete EXPLAIN support.
+       Access to the search engine through a SOAP/RPC API to allow the
+       construction of applications without requiring Z39.50 tools.
      </para>
     </listitem>
 
     <listitem>
      <para>
-      Add support for very large records by implementing segmentation and/or
-      variant pieces.
+       Finalisation and documentation of Zebra's C programming
+       API, allowing updates, database management and other functions
+       not readily expressed in Z39.50.  We will also consider
+       exposing the API through SOAP.
      </para>
     </listitem>
 
     <listitem>
      <para>
-      Support the Item Update extended service of the protocol.
+       Improved free-text searching. We're first and foremost octet jockeys and
+       we're actively looking for organisations or people who'd like
+       to contribute experience in relevance ranking and text
+       searching.
      </para>
     </listitem>
 
-    <listitem>
-     <para>
-      We want to add a management system that allows you to
-      control your databases and configuration tables from a graphical
-      interface.
-     </para>
-    </listitem>
    </itemizedlist>
   </para>
   
   <para>
    Programmers thrive on user feedback. If you are interested in a
    facility that you don't see mentioned here, or if there's something
-   you think we could do better, please drop us a mail.
+   you think we could do better, please drop us a mail.  Better still,
+   implement it and send us the patches.
+  </para>
+  <para>
    If you think it's all really neat, you're welcome to drop us a line
    saying that, too. You'll find contact info at the end of this file.
   </para>