   1 <chapter id="introduction">
   2  <!-- $Id: introduction.xml,v 1.7 2002-08-05 08:27:05 quinn Exp $ -->
   3  <title>Introduction</title>
   4
   5  <sect1>
   6   <title>Overview</title>
   7
   8   <para>
   9    The
  10    <ulink url="http://www.indexdata.dk/zebra/">
  11      Zebra</ulink>
  12    server is a high-performance, general-purpose structured text
  13    indexing and retrieval engine. It reads structured records in a
  14    variety of input formats (e.g. email, XML, MARC) and allows access
  15    to them through exact boolean search expressions and
  16    relevance-ranked free-text queries.
  17    </para>
  18
  19    <para>
  20    Zebra supports large databases (more than ten gigabytes of data,
  21    tens of millions of records). It supports incremental, safe
  22    database updates on live systems. You can access data stored in
  23    Zebra using a variety of Index Data tools (e.g. YAZ and PHP/YAZ) as
  24    well as commercial and freeware Z39.50 clients and toolkits.
  25    </para>
  26
  27   <para>
  28    This document is an introduction to the Zebra system. It will tell you
  29    how to compile the software, and how to prepare your first database.
  30    It also explains how the server can be configured to give you the
  31    functionality that you need.
  32   </para>
  33
  34   <para>
  35
  36    If you find the software interesting, you should visit the
  37    <ulink url="http://www.indexdata.dk/zebra/">
  38      Zebra web site</ulink>, where you can join the
  39    <ulink url="http://www.indexdata.dk/mailman/listinfo/zebralist">
  40    mailing-list</ulink>
  41    by sending email to
  42   </para>
  43
  44  </sect1>
  45
  46  <sect1 id="features">
  47   <title>Features</title>
  48
  49   <para>
  50    This is an overview of some of the most important features of the
  51    system.
  52   </para>
  53
  54   <para>
  55    <itemizedlist>
  56
  57     <listitem>
  58      <para>
  59       Supports large databases - files for indices, etc. can be
  60       automatically partitioned over multiple disks.
  61      </para>
  62     </listitem>
  63
  64     <listitem>
  65      <para>
  66       Supports arbitrarily complex records - base input format is an
  67       SGML-like syntax which allows nested (structured) data elements, as
  68       well as variant forms of data.
  69      </para>
  70     </listitem>
  71
  72     <listitem>
  73      <para>
  74       Robust updating - records can be added and deleted without
  75       rebuilding the index from scratch.
  76       The update procedure is tolerant to crashes or hard interrupts
  77       during register updating - registers can be reconstructed following
  78       a crash.
  79       Registers can be safely updated even while users are accessing
  80       the server.
  81      </para>
  82     </listitem>
  83
  84     <listitem>
  85      <para>
  86       Supports arbitrary storage formats. A system of input filters driven by
  87       regular expressions allows you to easily process most ASCII-based
  88       data formats. SGML, XML, ISO2709 (MARC), and raw text are also
  89       supported.
  90      </para>
  91     </listitem>
  92
  93     <listitem>
  94      <para>
  95       Supports boolean queries as well as relevance-ranking (free-text)
  96       searching. Right truncation and masking in terms are supported, as
  97       well as full regular expressions.
  98      </para>
  99     </listitem>
 100
 101     <listitem>
 102       <para>
 103         Can import the data into Zebra's own storage, or just refer to
 104         external files (good for building indexes of "live"
 105         collections).
 106       </para>
 107     </listitem>
 108
 109     <listitem>
 110      <para>
 111       Supports multiple concrete syntaxes
 112       for record exchange (depending on the configuration): GRS-1, SUTRS,
 113       XML, ISO2709 (*MARC). Records can be mapped between record syntaxes
 114       and schema on the fly.
 115      </para>
 116     </listitem>
 117
 118     <listitem>
 119      <para>
 120       Supports approximate matching in registers (e.g. to tolerate spelling
 121       mistakes).
 122      </para>
 123     </listitem>
 124
 125     <listitem>
 126      <para>
 127       Zebra is written in portable C, so it runs on most Unix-like systems
 128       as well as Windows NT - a binary distribution for Windows NT is available.
 129      </para>
 130     </listitem>
 131
 132    </itemizedlist>
 133
 134   </para>
 135
 136   <para>
 137    Z39.50 protocol support:
 138   </para>
 139
 140   <para>
 141    <itemizedlist>
 142     <listitem>
 143      <para>
 144       Protocol facilities: Init, Search, Retrieve, Delete, Browse and Sort.
 145      </para>
 146     </listitem>
 147
 148     <listitem>
 149      <para>
 150       Piggy-backed presents are honored in the search-request.
 151      </para>
 152     </listitem>
 153
 154     <listitem>
 155      <para>
 156       Named result sets are supported.
 157      </para>
 158     </listitem>
 159     <listitem>
 160      <para>
 161       Easily configured to support different application profiles, with
 162       tables for attribute sets, tag sets, and abstract syntaxes.
 163       Additional tables control facilities such as element mappings to
 164       different schema (e.g. GILS-to-USMARC).
 165      </para>
 166     </listitem>
 167
 168     <listitem>
 169      <para>
 170       Complex composition specifications using Espec-1 are partially
 171       supported (simple element requests only).
 172      </para>
 173     </listitem>
 174
 175     <listitem>
 176      <para>
 177       Element Set Names are defined using the Espec-1 capability of the
 178       system, and are given in configuration files as simple element
 179       requests (and possibly variant requests).
 180      </para>
 181     </listitem>
 182
 183    </itemizedlist>
 184
 185   </para>
 186
 187  </sect1>
 188
 189  <sect1 id="future">
 190   <title>Future Work</title>
 191
 192   <para>
 193    These are some of the plans that we have for the software in the near
 194    and far future, roughly ordered by their relative importance.
 195   </para>
 196
 197   <para>
 198    <itemizedlist>
 199
 200     <listitem>
 201      <para>
 202        Improved support for XML in search and retrieval. Eventually,
 203        the goal is for Zebra to pull double duty as a flexible
 204        information retrieval engine and high-performance XML
 205        repository.
 206      </para>
 207     </listitem>
 208
 209     <listitem>
 210      <para>
 211        Access to the search engine through a SOAP/RPC API to allow the
 212        construction of applications without requiring Z39.50 tools.
 213      </para>
 214     </listitem>
 215
 216     <listitem>
 217      <para>
 218        Finalisation and documentation of the Zebra API. Consider
 219        exposing the API through SOAP as well (allowing updates,
 220        database management).
 221      </para>
 222     </listitem>
 223
 224     <listitem>
 225      <para>
 226        Improved free-text searching. We're first and foremost octet jockeys and
 227        we're actively looking for organisations or people who'd like
 228        to contribute experience in relevance ranking and text
 229        searching.
 230      </para>
 231     </listitem>
 232
 233    </itemizedlist>
 234   </para>
 235
 236   <para>
 237    Programmers thrive on user feedback. If you are interested in a
 238    facility that you don't see mentioned here, or if there's something
 239    you think we could do better, please drop us a mail.
 240    If you think it's all really neat, you're welcome to drop us a line
 241    saying that, too. You'll find contact info at the end of this file.
 242   </para>
 243
 244  </sect1>
 245 </chapter>
 246  <!-- Keep this comment at the end of the file
 247  Local variables:
 248  mode: sgml
 249  sgml-omittag:t
 250  sgml-shorttag:t
 251  sgml-minimize-attributes:nil
 252  sgml-always-quote-attributes:t
 253  sgml-indent-step:1
 254  sgml-indent-data:t
 255  sgml-parent-document: "zebra.xml"
 256  sgml-local-catalogs: nil
 257  sgml-namecase-general:t
 258  End:
 259  -->