doc/introduction.xml

<chapter id="introduction">
 <!-- $Id: introduction.xml,v 1.8 2002-08-27 07:49:23 mike Exp $ -->
 <title>Introduction</title>

 <sect1>
  <title>Overview</title>

  <para>
   <ulink url="http://www.indexdata.dk/zebra/">Zebra</ulink>
   is a high-performance, general-purpose structured text
   indexing and retrieval engine. It reads structured records in a
   variety of input formats (e.g., email, XML, MARC) and allows access
   to them through exact boolean search expressions and
   relevance-ranked free-text queries.
  </para>

  <para>
   Zebra supports large databases (more than ten gigabytes of data,
   tens of millions of records). It supports safe, incremental
   database updates on live systems. You can access data stored in
   Zebra using a variety of Index Data tools (e.g., YAZ and PHP/YAZ) as
   well as commercial and freeware Z39.50 clients and toolkits.
  </para>

  <para>
   This document is an introduction to the Zebra system. It tells you
   how to compile the software and how to prepare your first database.
   It also explains how the server can be configured to give you the
   functionality that you need.
  </para>

  <para>
   If you find the software interesting, you should visit the
   <ulink url="http://www.indexdata.dk/zebra/">Zebra web site</ulink>,
   where you can join the
   <ulink url="http://www.indexdata.dk/mailman/listinfo/zebralist">mailing list</ulink>.
  </para>

 </sect1>

 <sect1 id="features">
  <title>Features</title>

  <para>
   This is an overview of some of the most important features of the
   system.
  </para>

  <para>
   <itemizedlist>

    <listitem>
     <para>
      Supports large databases: files for indices, etc. can be
      automatically partitioned over multiple disks.
     </para>
    </listitem>

    <listitem>
     <para>
      Supports arbitrarily complex records: the base input format is an
      SGML-like syntax which allows nested (structured) data elements, as
      well as variant forms of data.
     </para>
    </listitem>

    <listitem>
     <para>
      Robust updating: records can be added and deleted without
      rebuilding the index from scratch.
      The update procedure is tolerant of crashes or hard interrupts
      during register updating; registers can be reconstructed following
      a crash.
      Registers can be safely updated even while users are accessing
      the server.
     </para>
    </listitem>

    <listitem>
     <para>
      Supports a wide range of storage formats. A system of input filters
      driven by regular expressions allows you to easily process most
      ASCII-based data formats. SGML, XML, ISO2709 (MARC), and raw text
      are also supported.
     </para>
    </listitem>

    <listitem>
     <para>
      Supports boolean queries as well as relevance-ranked (free-text)
      searching. Right truncation and masking in terms are supported, as
      well as full regular expressions.
     </para>
    </listitem>

    <listitem>
     <para>
      Can import the data into Zebra's own storage, or just refer to
      external files (good for building indexes of "live"
      collections).
     </para>
    </listitem>

    <listitem>
     <para>
      Supports multiple concrete syntaxes
      for record exchange (depending on the configuration): GRS-1, SUTRS,
      XML, ISO2709 (MARC). Records can be mapped between record syntaxes
      and schemas on the fly.
     </para>
    </listitem>

    <listitem>
     <para>
      Supports approximate matching in registers (i.e., tolerance of
      spelling mistakes, etc.).
     </para>
    </listitem>

    <listitem>
     <para>
      Zebra is written in portable C, so it runs on most Unix-like systems
      as well as Windows NT; a binary distribution for Windows NT is available.
     </para>
    </listitem>

   </itemizedlist>

  </para>

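  <para>
   As a rough illustration of the boolean and truncation features listed
   above, the query below is written in the Prefix Query Format (PQF)
   used by the YAZ tools. It combines a right-truncated title term with
   an author term. This is only a sketch: it assumes a database whose
   configuration indexes the Bib-1 title (use attribute 4) and author
   (use attribute 1003) access points, and the search terms are
   hypothetical.
  </para>
  <screen>
   @and @attr 1=4 @attr 5=1 comput @attr 1=1003 knuth
  </screen>
  <para>
   Here <literal>@attr 5=1</literal> requests right truncation, so the
   title term can match words such as "computer" and "computing", while
   <literal>@and</literal> requires both terms to match.
  </para>
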
  <para>
   Z39.50 protocol support:
  </para>

  <para>
   <itemizedlist>
    <listitem>
     <para>
      Protocol facilities: Init, Search, Retrieve, Delete, Browse and Sort.
     </para>
    </listitem>

    <listitem>
     <para>
      Piggy-backed presents in the Search Request are honored.
     </para>
    </listitem>

    <listitem>
     <para>
      Named result sets are supported.
     </para>
    </listitem>

    <listitem>
     <para>
      Easily configured to support different application profiles, with
      tables for attribute sets, tag sets, and abstract syntaxes.
      Additional tables control facilities such as element mappings to
      different schemas (e.g., GILS-to-USMARC).
     </para>
    </listitem>

    <listitem>
     <para>
      Complex composition specifications using Espec-1 are partially
      supported (simple element requests only).
     </para>
    </listitem>

    <listitem>
     <para>
      Element Set Names are defined using the Espec-1 capability of the
      system, and are given in configuration files as simple element
      requests (and possibly variant requests).
     </para>
    </listitem>

   </itemizedlist>

  </para>

 </sect1>

 <sect1 id="apps">
  <title>Applications</title>
  <para>
   Zebra has been deployed in numerous applications, in both the
   academic and commercial worlds, in application domains as diverse
   as bibliographic information, geospatial, ### (Help, guys!)
  </para>
  <para>
   Notable applications include the following:
  </para>

  <sect2>
   <title>DADS: the DTV Article Database Service</title>
   <para>
    DADS is a huge database of ### records, allowing students and
    researchers at DTU (###) to search and order articles from several
    different databases at once. The database contains
    literature on all engineering subjects. It is available online
    through a web gateway at
    <ulink url="http://www.dtv.dk/search/index_e.htm">http://www.dtv.dk/search/index_e.htm</ulink>,
    though only to members of the university.
   </para>
   <para>
    ### Much more information needed.
   </para>
  </sect2>
 </sect1>

 <sect1 id="future">
  <title>Future Work</title>

  <para>
   These are some of the plans that we have for the software in the near
   and distant future, roughly in order of relative importance.
  </para>

  <para>
   <itemizedlist>

    <listitem>
     <para>
      Improved support for XML in search and retrieval. Eventually,
      the goal is for Zebra to pull double duty as a flexible
      information retrieval engine and high-performance XML
      repository.
     </para>
    </listitem>

    <listitem>
     <para>
      Access to the search engine through a SOAP/RPC API to allow the
      construction of applications without requiring Z39.50 tools.
     </para>
    </listitem>

    <listitem>
     <para>
      Finalisation and documentation of the Zebra API. Consider
      exposing the API through SOAP as well (allowing updates and
      database management).
     </para>
    </listitem>

    <listitem>
     <para>
      Improved free-text searching. We're first and foremost octet jockeys,
      so we're actively looking for organisations or people who'd like
      to contribute experience in relevance ranking and text
      searching.
     </para>
    </listitem>

   </itemizedlist>
  </para>

  <para>
   Programmers thrive on user feedback. If you are interested in a
   facility that you don't see mentioned here, or if there's something
   you think we could do better, please drop us a mail.
   If you think it's all really neat, you're welcome to drop us a line
   saying that, too. You'll find contact info at the end of this file.
  </para>

 </sect1>
</chapter>
 <!-- Keep this comment at the end of the file
 Local variables:
 mode: sgml
 sgml-omittag:t
 sgml-shorttag:t
 sgml-minimize-attributes:nil
 sgml-always-quote-attributes:t
 sgml-indent-step:1
 sgml-indent-data:t
 sgml-parent-document: "zebra.xml"
 sgml-local-catalogs: nil
 sgml-namecase-general:t
 End:
 -->