Describe option -h

diff --git a/doc/book.xml b/doc/book.xml

index 69fc0d9..c72c540 100644
--- a/doc/book.xml
+++ b/doc/book.xml
@@ -6,10 +6,10 @@
       %local;
       <!ENTITY % entities SYSTEM "entities.ent">
       %entities;
-     <!ENTITY % common SYSTEM "common/common.ent">
-     %common;
+     <!ENTITY % idcommon SYSTEM "common/common.ent">
+     %idcommon;
  ]>
-<!-- $Id: book.xml,v 1.12 2007-04-23 07:03:06 adam Exp $ -->
+<!-- $Id: book.xml,v 1.23 2007-07-06 20:15:06 adam Exp $ -->
  <book id="book">
   <bookinfo>
    <title>Pazpar2 - User's Guide and Reference</title>
@@ -19,6 +19,9 @@
    <author>
     <firstname>Adam</firstname><surname>Dickmeiss</surname>
    </author>
+  <author>
+   <firstname>Marc</firstname><surname>Cromme</surname>
+  </author>
    <releaseinfo>&version;</releaseinfo>
    <copyright>
     <year>&copyright-year;</year>
@@ -28,11 +31,11 @@
     <simpara>
      Pazpar2 is a high-performance, user interface-independent, data
      model-independent metasearching
-    middleware featuring merging, relevance ranking, record sorting, 
+    middle-ware featuring merging, relevance ranking, record sorting, 
      and faceted results.
     </simpara>
     <simpara>
-    This document is a guide and reference to Pazpar version &version;.
+    This document is a guide and reference to Pazpar2 version &version;.
     </simpara>
     <simpara>
      <inlinemediaobject>
@@ -45,392 +48,570 @@
     </inlinemediaobject>
     </simpara>
    </abstract>
-  </bookinfo>
+ </bookinfo>
   
-  <chapter id="introduction">
-   <title>Introduction</title>
-   <para>
-     Pazpar2 is a stand-alone metasearch client with a webservice API, designed
-     to be used either from a browser-based client (JavaScript, Flash, Java,
-     etc.), from from server-side code, or any combination of the two.
-     Pazpar2 is a highly optimized client designed to
-     search many resources in parallel. It implements record merging,
-     relevance-ranking and sorting by arbitrary data content, and facet
-     analysis for browsing purposes. It is designed to be data model
-     independent, and is capable of working with MARC, DublinCore, or any
-     other XML-structured response format -- XSLT is used to normalize and extract
-     data from retrieval records for display and analysis. It can be used
-     against any server which supports the Z39.50 protocol. Proprietary
-     backend modules can be used to support a large number of other protocols
-     (please contact Index Data for further information about this).
-   </para>
-   <para>
-     Additional functionality such as
-     user management, attractive displays are expected to be implemented by
-     applications that use pazpar2. Pazpar2 is user interface independent.
-     Its functionality is exposed through a simple REST-style webservice API,
-     designed to be simple to use from an Ajax-enbled browser, Flash
-     animation, Java applet, etc., or from a higher-level server-side language
-     like PHP or Java. Because session information can be shared between
-     browser-based logic and your server-side scripting, there is tremendous
-     flexibility in how you implement your business logic on top of pazpar2.
-   </para>
-   <para>
-     Once you launch a search in pazpar2, the operation continues behind the
-     scenes. Pazpar2 connects to servers, carries out searches, and
-     retrieves, deduplicates, and stores results internally. Your application
-     code may periodically inquire about the status of an ongoing operation,
-     and ask to see records or other result set facets. Result become
-     available immediately, and it is easy to build end-user interfaces which
-     feel extremely responsive, even when searching more than 100 servers
-     concurrently.
-   </para>
-   <para>
-     Pazpar2 is designed to be highly configurable. Incoming records are
-     normalized to XML/UTF-8, and then further normalized using XSLT to a
-     simple internal representation that is suitable for analysis. By
-     providing XSLT stylesheets for different kinds of result records, you
-     can tune pazpar2 to work against different kinds of information
-     retrieval servers. Finally, metadata is extracted, in a configurable
-     way, from this internal record, to support display, merging, ranking,
-     result set facets, and sorting. Pazpar2 is not bound to a specific model
-     of metadata, such as DublinCore or MARC -- by providing the right
-     configuration, it can work with a number of different kinds of data in
-     support of many different applications.
-   </para>
-   <para>
-     Pazpar2 is designed to be efficient and scalable. You can set it up to
-     search several hundred targets in parallel, or you can use it to support
-     hundreds of concurrent users. It is implemented with the same attention
-     to performance and economy that we use in our indexing engines, so that
-     you can focus on building your application, without worrying about the
-     details of metasearch logic. You can devote all of your attention to
-     usability and let pazpar2 do what it does best -- metasearch.
-    </para>
-    <para>
-      If you wish to connect to commercial or other databases which do not
-      support open standards, please contact Index Data. We have a licensing
-      agreement with a third party vendor which will enable pazpar2 to access
-      thousands of online databases, in addition the vast number of catalogs
-      and online services that support the Z39.50 protocol.
-    </para>
-    <para>
-      Pazpar2 is our attempt to re-think the traditional paradigms for
-      implementing and deploying metasearch logic, with an uncompromising
-      approach to performance, and attempting to make maximum use of the
-      capabilities of modern browsers. The demo user interface that
-      accompanies the distribution is but one example. If you think of new
-      ways of using pazpar2, we hope you'll share them with us, and if we
-      can provide assistance with regards to training, design, programming,
-      integration with different backends, hosting, or support, please don't
-      hesitate to contact us. If you'd like to see functionality in pazpar2
-      that is not there today, please don't hesitate to contact us. It may
-      already be in our development pipeline, or there might be a
-      possibility for you to help out by sponsoring development time or
-      code. Either way, get in touch and we will give you straight answers.
-    </para>
-    <para>
-      Enjoy!
-    </para>
-    <para>
-      Pazpar2 is covered by the GNU license version 2.
-      See <xref linkend="license"/> for further information.
-    </para>
-  </chapter>
+ <chapter id="introduction">
+  <title>Introduction</title>
+  <para>
+   Pazpar2 is a stand-alone metasearch client with a web-service API, designed
+   to be used either from a browser-based client (JavaScript, Flash, Java,
+   etc.), from server-side code, or any combination of the two.
+   Pazpar2 is a highly optimized client designed to
+   search many resources in parallel. It implements record merging,
+   relevance-ranking and sorting by arbitrary data content, and facet
+   analysis for browsing purposes. It is designed to be data model
+   independent, and is capable of working with MARC, DublinCore, or any
+   other <ulink url="&url.xml;">XML</ulink>-structured response format
+   -- <ulink url="&url.xslt;">XSLT</ulink> is used to normalize and extract
+   data from retrieval records for display and analysis. It can be used
+   against any server which supports the 
+   <ulink url="&url.z39.50;">Z39.50</ulink> protocol. Proprietary
+   backend modules can be used to support a large number of other protocols
+   (please contact Index Data for further information about this).
+  </para>
+  <para>
+   Additional functionality, such as user management and attractive
+   displays, is expected to be implemented by
+   applications that use Pazpar2. Pazpar2 is user interface independent.
+   Its functionality is exposed through a simple REST-style web-service API,
+   designed to be simple to use from an Ajax-enabled browser, Flash
+   animation, Java applet, etc., or from a higher-level server-side language
+   like PHP or Java. Because session information can be shared between
+   browser-based logic and your server-side scripting, there is tremendous
+   flexibility in how you implement your business logic on top of Pazpar2.
+  </para>
+  <para>
+   Once you launch a search in Pazpar2, the operation continues behind the
+   scenes. Pazpar2 connects to servers, carries out searches, and
+   retrieves, deduplicates, and stores results internally. Your application
+   code may periodically inquire about the status of an ongoing operation,
+   and ask to see records or other result set facets. Results become
+   available immediately, and it is easy to build end-user interfaces which
+   feel extremely responsive, even when searching more than 100 servers
+   concurrently.
+  </para>
+  <para>
+   Pazpar2 is designed to be highly configurable. Incoming records are
+   normalized to XML/UTF-8, and then further normalized using XSLT to a
+   simple internal representation that is suitable for analysis. By
+   providing XSLT stylesheets for different kinds of result records, you
+   can tune Pazpar2 to work against different kinds of information
+   retrieval servers. Finally, metadata is extracted, in a configurable
+   way, from this internal record, to support display, merging, ranking,
+   result set facets, and sorting. Pazpar2 is not bound to a specific model
+   of metadata, such as DublinCore or MARC -- by providing the right
+   configuration, it can work with a number of different kinds of data in
+   support of many different applications.
+  </para>
+  <para>
+   Pazpar2 is designed to be efficient and scalable. You can set it up to
+   search several hundred targets in parallel, or you can use it to support
+   hundreds of concurrent users. It is implemented with the same attention
+   to performance and economy that we use in our indexing engines, so that
+   you can focus on building your application, without worrying about the
+   details of metasearch logic. You can devote all of your attention to
+   usability and let Pazpar2 do what it does best -- metasearch.
+  </para>
+  <para>
+   If you wish to connect to commercial or other databases which do not
+   support open standards, please contact Index Data. We have a licensing
+   agreement with a third party vendor which will enable Pazpar2 to access
+   thousands of online databases, in addition to the vast number of catalogs
+   and online services that support the Z39.50 protocol.
+  </para>
+  <para>
+   Pazpar2 is our attempt to re-think the traditional paradigms for
+   implementing and deploying metasearch logic, with an uncompromising
+   approach to performance, and attempting to make maximum use of the
+   capabilities of modern browsers. The demo user interface that
+   accompanies the distribution is but one example. If you think of new
+   ways of using Pazpar2, we hope you'll share them with us, and if we
+   can provide assistance with regards to training, design, programming,
+   integration with different backends, hosting, or support, please don't
+   hesitate to contact us. If you'd like to see functionality in Pazpar2
+   that is not there today, please don't hesitate to contact us. It may
+   already be in our development pipeline, or there might be a
+   possibility for you to help out by sponsoring development time or
+   code. Either way, get in touch and we will give you straight answers.
+  </para>
+  <para>
+   Enjoy!
+  </para>
+  <para>
+   Pazpar2 is covered by the GNU General Public License (GPL) version 2.
+   See <xref linkend="license"/> for further information.
+  </para>
+ </chapter>
+
+ <chapter id="installation">
+  <title>Installation</title>
+  <para>
+   The Pazpar2 package is very small. It includes documentation as well
+   as the Pazpar2 server. The package also includes a simple user
+   interface, test1, which consists of a single HTML page and a single
+   JavaScript file to illustrate the use of Pazpar2.
+  </para>
+  <para>
+   Pazpar2 depends on the following tools/libraries:
+   <variablelist>
+    <varlistentry><term><ulink url="&url.yaz;">YAZ</ulink></term>
+     <listitem>
+      <para>
+       The popular Z39.50 toolkit for the C language.
+       YAZ <emphasis>must</emphasis> be compiled with Libxml2/Libxslt support.
+      </para>
+     </listitem>
+    </varlistentry>
+    <varlistentry><term><ulink url="&url.icu;">International
+       Components for Unicode (ICU)</ulink></term>
+     <listitem>
+      <para>
+       ICU provides Unicode support for non-English languages with
+       character sets outside the range of 7-bit ASCII, like
+       Greek, Russian, German and French. Pazpar2 uses the ICU
+       Unicode character conversions, Unicode normalization, case
+       folding and other fundamental operations needed in
+       tokenization, normalization and ranking of records. 
+      </para>
+      <para>
+       Compiling, linking, and usage of the ICU libraries is optional,
+       but strongly recommended for usage in an international
+       environment.  
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+  <para>
+   In order to compile Pazpar2, a C compiler which supports C99 or later
+   is required.
+  </para>
+
+  <section id="installation.unix">
+   <title>Installation on Unix (from Source)</title>
+   <para>
+    The latest source code for Pazpar2 is available from
+    <ulink url="&url.pazpar2.download;"/>.
+    Most systems provide binary packages for the required tools.
+    If, for example, the Libxml2/Libxslt libraries are already
+    installed as development packages, use these.
+   </para>
+   
+   <para>
+    Ensure that the development libraries and header files are
+    available on your system before compiling Pazpar2. For installation
+    of YAZ, refer to the YAZ installation chapter.
+   </para>
+   <screen>
+    gunzip -c pazpar2-version.tar.gz|tar xf -
+    cd pazpar2-version
+    ./configure
+    make
+    su
+    make install
+   </screen>
+   <para>
+    Running <literal>make install</literal> installs the manpages as well as
+    the Pazpar2 server, <literal>pazpar2</literal>,
+    in PREFIX<literal>/sbin</literal>.
+    By default, PREFIX is <literal>/usr/local/</literal>. This can be
+    changed with the configure option <option>--prefix</option>.
+   </para>
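+   <para>
+    As a minimal sketch of a non-default installation, the commands below
+    install Pazpar2 under a different prefix. The directory
+    <filename>/opt/pazpar2</filename> is only an illustrative assumption;
+    any suitable location should work the same way.
+   </para>
+   <screen>
+    ./configure --prefix=/opt/pazpar2
+    make
+    make install
+   </screen>
+   <para>
+    With this prefix, the server would be expected in
+    <filename>/opt/pazpar2/sbin/pazpar2</filename>.
+   </para>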
+  </section>
  
-  <chapter id="installation">
-   <title>Installation</title>
+  <section id="installation.test1">
+   <title>Installation of test1 interface</title>
     <para>
-    Pazpar2 depends on the following tools/libraries:
-    <variablelist>
-     <varlistentry><term><ulink url="&url.yaz;">YAZ</ulink></term>
-      <listitem>
-       <para>
-       The popular Z39.50 toolkit for the C language. YAZ must be
-       compiled with Libxml2/Libxslt support.
-       </para>
-      </listitem>
-     </varlistentry>
-    </variablelist>
+    In this section we outline how to install a simple interface that
+    is part of the Pazpar2 source package. Note that Debian users can
+    save time by just installing package <literal>pazpar2-test1</literal>.
     </para>
     <para>
-    In order to compile Pazpar2 an ANSI C compiler is
-    required. The requirements should be the same as for YAZ.
+    A web server must be installed and running on the system, such as Apache.
     </para>
  
-   <section id="installation.unix">
-    <title>Installation on Unix (from Source)</title>
-    <para>
-     Here is a quick step-by-step guide on how to compile the
-     tools that Pazpar2 uses. Only few systems have none of the required
-     tools binary packages. If, for example, Libxml2/libxslt are already
-     installed as development packages use these.
-    </para>
-    
-    <para>
-     Ensure that the development libraries + header files are
-     available on your system before compiling Pazpar2. For installation
-     of YAZ, refer to the YAZ installation chapter.
-    </para>
+   <para>
+    Start the Pazpar2 daemon using the binary built in the source tree:
      <screen>
-     gunzip -c pazpar2-version.tar.gz|tar xf -
-     cd pazpar2-version
-     ./configure
-     make
-     su
-     make install
+     cd etc
+     cp pazpar2.cfg.dist pazpar2.cfg
+     ../src/pazpar2 -f pazpar2.cfg -t edu.xml
      </screen>
-   </section>
+    This starts a Pazpar2 listener on port 8004. It proxies
+    HTTP requests to port 80 on localhost, which we assume is the regular
+    HTTP server on the system. Inspect and modify
+    <filename>pazpar2.cfg</filename> as needed if this is to be changed.
+    The <option>-t</option> option specifies the list of targets
+    to use for searches.
+   </para>
+   <para>
+    Open a new console to carry out the remaining steps.
+    For more information about the options of pazpar2, refer to the manpage.
+   </para>
  
-   <section id="installation.debian">
-    <title>Installation on Debian GNU/Linux</title>
-    <para>
-     All dependencies for Pazpar2 are available as 
-     <ulink url="&url.debian;">Debian</ulink>
-     packages for the sarge (stable in 2005) and etch (testing in 2005)
-     distributions.
-    </para>
-    <para>
-     The procedures for Debian based systems, such as
-     <ulink url="&url.ubuntu;">Ubuntu</ulink> is probably similar
-    </para>
+   <para>
+    The test1 UI is located in <literal>www/test1</literal>. Ensure this
+    directory is available to the web server by copying
+    <literal>test1</literal> to the document root, creating a symlink, or
+    using Apache's <literal>Alias</literal> directive.
+   </para>
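+   <para>
+    As a sketch of the <literal>Alias</literal> approach, a line such as
+    the following could be added to the Apache configuration. The path
+    <filename>/home/user/pazpar2</filename> is a hypothetical location of
+    the unpacked source tree; substitute your own. Depending on the Apache
+    setup, the directory may also need to be made readable through a
+    matching <literal>Directory</literal> block.
+    <screen>
+     # Hypothetical path: adjust to where the Pazpar2 source was unpacked
+     Alias /test1 /home/user/pazpar2/www/test1
+    </screen>
+   </para>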
+
+   <para>
+    The test1 interface should now be available on port 8004.
+   </para>
+   <para>
+    If you don't see the test1 interface, check whether test1 is really
+    available at the same URL on port 80. If it is not, the configuration
+    of Apache (or whichever web server you use) is not correct.
+   </para>
+   <para>
+    To use Apache as a frontend for the interface on port 80,
+    for example for public access, refer to
+    <xref linkend="installation.apache2proxy"/>.
+   </para>
+  </section>
+
+  <section id="installation.debian">
+   <title>Installation on Debian GNU/Linux</title>
+   <para>
+    Index Data provides Debian packages for Pazpar2. These are prepared
+    for Debian versions Etch and Lenny (as of 2007).
+    These packages are available at
+    <ulink url="&url.pazpar2.download.debian;"/>.
+   </para>
+  </section>
+
+  <section id="installation.apache2proxy">
+   <title>Apache 2 Proxy</title>
+   <para>
+    Apache 2 has a 
+    <ulink url="http://httpd.apache.org/docs/2.2/mod/mod_proxy.html">
+     proxy module
+    </ulink> which allows Pazpar2 to become a backend to an Apache 2
+    based web service. The Apache 2 proxy must operate in the
+    <emphasis>Reverse</emphasis> Proxy mode.
+   </para>
+   
+   <para>
+    On a Debian based Apache 2 system, the relevant modules can
+    be enabled with:
      <screen>
-     apt-get install libyaz-dev
+     sudo a2enmod proxy_http
      </screen>
+   </para>
+
+   <para>
+    Traditionally, Pazpar2 interprets URL paths with the suffix
+    <literal>/search.pz2</literal>.
+    The 
+    <ulink 
+     url="http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass"
+     >ProxyPass</ulink> directive of Apache must be used to map a URL path
+    to the Pazpar2 server (listening port).
+   </para>
+
+   <note>
      <para>
-     With these packages installed, the usual configure + make
-     procedure can be used for Pazpar2 as outlined in
-     <xref linkend="installation.unix"/>.
+     The ProxyPass directive takes a prefix rather than
+     a suffix as its URL path. It is important that the JavaScript code
+     uses the prefix configured for it.
      </para>
-   </section>
-  </chapter>
+   </note>
  
-  <chapter id="using">
-    <title>Using pazpar2</title>
+   <example id="installation.apache2proxy.example">
+    <title>Apache 2 proxy configuration</title>
      <para>
-      This chapter provides a general introduction to the use and deployment of pazpar2.
+     If Pazpar2 is running on port 8004 and the portal accesses
+     <filename>search.pz2</filename> within the portal directory
+     <filename>/myportal/</filename>, we could use the following
+     Apache 2 configuration:
+
+     <screen><![CDATA[
+      <IfModule mod_proxy.c>
+       ProxyRequests Off
+      
+       <Proxy *>
+        AddDefaultCharset off
+        Order deny,allow
+        Allow from all
+       </Proxy>
+      
+       ProxyPass /myportal/search.pz2 http://localhost:8004/search.pz2
+       ProxyVia Off
+      </IfModule>
+      ]]></screen>
      </para>
+   </example>
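+   <para>
+    One way to check the mapping, assuming the configuration above is
+    active, Apache listens on port 80 of the local machine, and the
+    <command>curl</command> tool is installed, is to send an
+    <literal>init</literal> command through the proxy:
+    <screen>
+     # Request passes through Apache (port 80) to Pazpar2 (port 8004)
+     curl 'http://localhost/myportal/search.pz2?command=init'
+    </screen>
+    If the proxy is set up correctly, the reply should be an XML
+    document produced by Pazpar2 rather than an Apache error page.
+   </para>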
+  </section>
  
-    <section id="architecture">
-      <title>Pazpar2 and your systems architecture</title>
-      <para>
-       Pazpar2 is designed to provide asynchronous, behind-the-scenes
-       metasearching functionality to your application, exposing this
-       functionality using a simple webservice API that can be accessed
-       from any number of development environments. In particular, it is
-       possible to combine pazpar2 either with your server-side dynamic
-       website scripting, with scripting or code running in the browser, or
-       with any combination of the two. Pazpar2 is an excellent tool for
-       building advanced, Ajax-based user interfaces for metasearch
-       functionality, but it isn't a requirement -- you can choose to use
-       pazpar2 entirely as a backend to your regular server-side scripting.
-       When you do use pazpar2 in conjunction
-       with browser scripting (JavaScript/Ajax, Flash, applets, etc.), there are
-       special considerations.
-      </para>
+ </chapter>
  
-      <para>
-        Pazpar2 implements a simple but efficient HTTP server, and it is
-       designed to interact directly with scripting running in the browser
-       for the best possible performance, and to limit overhead when
-       several browser clients generate numerous webservice requests.
-       However, it is still desirable to use a conventional webserver,
-       such as Apache, to serve up graphics, HTML documents, and
-       server-side scripting. Because the security sandbox environment of
-       most browser-side programming environments only allows communication
-       with the server from which the enclosing HTML page or object
-       originated, pazpar2 is designed so that it can act as a transparent
-       proxy in front of an existing webserver (see <xref
-       linkend="pazpar2_conf"/> for details). In this mode, all regular
-       HTTP requests are transparently passed through to your webserver,
-       while pazpar2 only intercepts search-related webservice requests.
-      </para>
+ <chapter id="using">
+  <title>Using Pazpar2</title>
+  <para>
+   This chapter provides a general introduction to the use and
+   deployment of Pazpar2. 
+  </para>
  
-      <para>
-        If you want to expose your combined service on port 80, you can
-       either run your regular webserver on a different port, a different
-       server, or a different IP address associated with the same server.
-      </para>
+  <section id="architecture">
+   <title>Pazpar2 and your systems architecture</title>
+   <para>
+    Pazpar2 is designed to provide asynchronous, behind-the-scenes
+    metasearching functionality to your application, exposing this
+    functionality using a simple webservice API that can be accessed
+    from any number of development environments. In particular, it is
+    possible to combine Pazpar2 either with your server-side dynamic
+    website scripting, with scripting or code running in the browser, or
+    with any combination of the two. Pazpar2 is an excellent tool for
+    building advanced, Ajax-based user interfaces for metasearch
+    functionality, but it isn't a requirement -- you can choose to use
+    Pazpar2 entirely as a backend to your regular server-side scripting.
+    When you do use Pazpar2 in conjunction
+    with browser scripting (JavaScript/Ajax, Flash, applets,
+    etc.), there are special considerations.
+   </para>
  
-      <para>
-        Sometimes, it may be necessary to implement functionality on your
-       regular webserver that makes use of search results, for example to
-       implement data import functionality, emailing results, history
-       lists, personal citation lists, interlibrary loan functionality
-       ,etc. Fortunately, it is simple to exchange information between
-       pazpar2, your browser scripting, and backend server-side scripting.
-       You can send a session ID and possibly a record ID from your browser
-       code to your server code, and from there use pazpar2s webservice API
-       to access result sets or individual records. You could even 'hide'
-       all of pazpar2s functionality between your own API implemented on
-       the server-side, and access that from the browser or elsewhere. The
-       possibilities are just about endless.
-      </para>
-    </section>
+   <para>
+    Pazpar2 implements a simple but efficient HTTP server, and it is
+    designed to interact directly with scripting running in the browser
+    for the best possible performance, and to limit overhead when
+    several browser clients generate numerous webservice requests.
+    However, it is still desirable to use a conventional webserver,
+    such as Apache, to serve up graphics, HTML documents, and
+    server-side scripting. Because the security sandbox environment of
+    most browser-side programming environments only allows communication
+    with the server from which the enclosing HTML page or object
+    originated, Pazpar2 is designed so that it can act as a transparent
+    proxy in front of an existing webserver (see <xref
+     linkend="pazpar2_conf"/> for details). 
+    In this mode, all regular
+    HTTP requests are transparently passed through to your webserver,
+    while Pazpar2 only intercepts search-related webservice requests.
+   </para>
  
-    <section id="data_model">
-      <title>Your data model</title>
-      <para>
-        Pazpar2 does not have a preconceived model of what makes up a data
-       model. There are no assumption that records have specific fields or
-       that they are organized in any particular way. The only assumption
-       is that data comes packaged in a form that the software can work
-       with (presently, that means XML or MARC), and that you can provide
-       the necessary information to massage it into pazpar2's internal
-       record abstraction.
-      </para>
+   <para>
+    If you want to expose your combined service on port 80, you can
+    either run your regular webserver on a different port, a different
+    server, or a different IP address associated with the same server.
+   </para>
  
-      <para>
-        Handling retrieval records in pazpar2 is a two-step process. First,
-       you decide which data elements of the source record you are
-       interested in, and you specify any desired massaging or combining of
-       elements using an XSLT stylesheet (MARC records are automatically
-       normalized to MARCXML before this step). If desired, you can run
-       multiple XSLT stylesheets in series to accomplish this, but the
-       output of the last one should be a representation of the record in a
-       schema that pazpar2 understands.
-      </para>
+   <para>
+    Pazpar2 can also work behind
+    a reverse proxy. Refer to <xref linkend="installation.apache2proxy"/>
+    for more information.
+    This allows your existing HTTP server to operate on port 80 as usual.
+    Pazpar2 can be started on another (internal) port.
+   </para>
  
-      <para>
-        The intermediate, internal representation of the record looks like
-       this:
-       <screen><![CDATA[
-<record   xmlns="http://www.indexdata.com/pazpar2/1.0"
-         mergekey="title The Shining author King, Stephen">
+   <para>
+    Sometimes, it may be necessary to implement functionality on your
+    regular webserver that makes use of search results, for example to
+    implement data import functionality, emailing results, history
+    lists, personal citation lists, interlibrary loan functionality,
+    etc. Fortunately, it is simple to exchange information between
+    Pazpar2, your browser scripting, and backend server-side scripting.
+    You can send a session ID and possibly a record ID from your browser
+    code to your server code, and from there use Pazpar2's webservice API
+    to access result sets or individual records. You could even 'hide'
+    all of Pazpar2's functionality behind your own API implemented on
+    the server-side, and access that from the browser or elsewhere. The
+    possibilities are just about endless.
+   </para>
+  </section>
  
-    <metadata type="title">The Shining</metadata>
+  <section id="data_model">
+   <title>Your data model</title>
+   <para>
+    Pazpar2 does not have a preconceived model of what makes up a data
+    model. There is no assumption that records have specific fields or
+    that they are organized in any particular way. The only assumption
+    is that data comes packaged in a form that the software can work
+    with (presently, that means XML or MARC), and that you can provide
+    the necessary information to massage it into Pazpar2's internal
+    record abstraction.
+   </para>
  
-    <metadata type="author">King, Stephen</metadata>
+   <para>
+    Handling retrieval records in Pazpar2 is a two-step process. First,
+    you decide which data elements of the source record you are
+    interested in, and you specify any desired massaging or combining of
+    elements using an XSLT stylesheet (MARC records are automatically
+    normalized to <ulink url="&url.marcxml;">MARCXML</ulink> before this step).
+    If desired, you can run multiple XSLT stylesheets in series to accomplish
+    this, but the output of the last one should be a representation of the
+    record in a schema that Pazpar2 understands.
+   </para>
  
-    <metadata type="kind">ebook</metadata>
+   <para>
+    The intermediate, internal representation of the record looks like
+    this:
+    <screen><![CDATA[
+     <record   xmlns="http://www.indexdata.com/pazpar2/1.0"
+     mergekey="title The Shining author King, Stephen">
  
-    <!-- ... and so on -->
-</record>
-]]></screen>
+     <metadata type="title">The Shining</metadata>
  
-        As you can see, there isn't much to it. There are really only a few
-       important elements to this file.
-      </para>
+     <metadata type="author">King, Stephen</metadata>
  
-      <para>
-        Elements should belong to the namespace
-       http://www.indexdata.com/pazpar2/1.0. If the root node contains the
-       attribute 'mergekey', then every record that generates the same
-       merge key (normalized for case differences, white space, and
-       truncation) will be joined into a cluster. In other words, you
-       decide how records are merged. If you don't include a merge key,
-       records are never merged. The 'metadata' elements provide the meat
-       of the elements -- the content. the 'type' attribute is used to
-       match each element against processing rules that determine what
-       happens to the data element next.
-      </para>
+     <metadata type="kind">ebook</metadata>
  
-      <para>
-        The next processing step is the extraction of metadata from the
-       intermediate representation of the record. This is governed by the
-       'metadata' elements in the 'service' section of the configuration
-       file. See <xref linkend="config-server"/> for details. The metadata
-       in the retrieval record ultimately drives merging, sorting, ranking,
-       the extraction of browse facets, and display, all configurable.
-      </para>
-    </section>
+     <!-- ... and so on -->
+    </record>
+     ]]></screen>
  
-    <section id="client">
-      <title>Client development overview</title>
-      <para>
-        You can use pazpar2 from any environment that allows you to use
-       webservices. The initial goal of the software was to support
-       Ajax-based applications, but there literally are no limits to what
-       you can do. You can use pazpar2 from Javascript, Flash, Java, etc.,
-       on the browser side, and from any development environment on the
-       server side, and you can pass session tokens and record IDs freely
-       around between these environments to build sophisticated applications.
-       Use your imagination.
-      </para>
+    As you can see, there isn't much to it. There are really only a few
+    important elements to this file.
+   </para>
  
-      <para>
-        The webservice API of pazpar2 is described in detail in <xref
-       linkend="pazpar2_protocol"/>.
-      </para>
+   <para>
+    Elements should belong to the namespace
+    <literal>http://www.indexdata.com/pazpar2/1.0</literal>.
+    If the root node contains the
+    attribute 'mergekey', then every record that generates the same
+    merge key (normalized for case differences, white space, and
+    truncation) will be joined into a cluster. In other words, you
+    decide how records are merged. If you don't include a merge key,
+    records are never merged. The 'metadata' elements carry the actual
+    content of the record. The 'type' attribute is used to
+    match each element against processing rules that determine what
+    happens to the data element next.
+   </para>
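+
+   <para>
+    A normalization stylesheet that produces such a record from MARCXML
+    could be sketched roughly as follows. This is a minimal illustration,
+    not the stylesheet shipped with Pazpar2: the choice of MARC fields
+    (245$a for the title, 100$a for the author) and the way the merge key
+    is composed are assumptions made for the example. The
+    <literal>pz:</literal> prefix is bound to the Pazpar2 namespace, which
+    is equivalent to the default-namespace form shown above.
+    <screen><![CDATA[
+     <?xml version="1.0" encoding="UTF-8"?>
+     <xsl:stylesheet version="1.0"
+         xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
+         xmlns:pz="http://www.indexdata.com/pazpar2/1.0"
+         xmlns:marc="http://www.loc.gov/MARC21/slim">
+
+      <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
+
+      <xsl:template match="marc:record">
+       <!-- Assumed field choices: 245$a = title, 100$a = author -->
+       <xsl:variable name="title"
+           select="marc:datafield[@tag='245']/marc:subfield[@code='a']"/>
+       <xsl:variable name="author"
+           select="marc:datafield[@tag='100']/marc:subfield[@code='a']"/>
+
+       <pz:record>
+        <!-- Records that yield the same merge key end up in one cluster -->
+        <xsl:attribute name="mergekey">
+         <xsl:text>title </xsl:text>
+         <xsl:value-of select="$title"/>
+         <xsl:text> author </xsl:text>
+         <xsl:value-of select="$author"/>
+        </xsl:attribute>
+
+        <!-- One metadata element per matching subfield -->
+        <xsl:for-each select="$title">
+         <pz:metadata type="title"><xsl:value-of select="."/></pz:metadata>
+        </xsl:for-each>
+        <xsl:for-each select="$author">
+         <pz:metadata type="author"><xsl:value-of select="."/></pz:metadata>
+        </xsl:for-each>
+       </pz:record>
+      </xsl:template>
+
+      <!-- Suppress text that no template above handles -->
+      <xsl:template match="text()"/>
+
+     </xsl:stylesheet>
+     ]]></screen>
+    Leaving out the <literal>xsl:attribute</literal> block would produce
+    records without a merge key, which are never merged.
+   </para>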
  
-      <para>
-        In brief, you use the 'init' command to create a session, a
-       temporary workspace which carries information about the current
-       search. You start a new search using the 'search' command. Once the
-       search has been started, you can follow its progress using the
-       'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records
-       can be fetched using the 'record' command.
-      </para>
-    </section>
+   <para>
+    The next processing step is the extraction of metadata from the
+    intermediate representation of the record. This is governed by the
+    'metadata' elements in the 'service' section of the configuration
+    file. See <xref linkend="config-server"/> for details. The metadata
+    in the retrieval record ultimately drives merging, sorting, ranking,
+    the extraction of browse facets, and display, all configurable.
+   </para>
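+
+   <para>
+    As a rough sketch, the 'metadata' elements in the 'service' section
+    might look like the fragment below. The attribute names and values
+    shown here are assumptions chosen to match the example record above,
+    not a complete or authoritative list; the configuration reference in
+    <xref linkend="config-server"/> should be consulted for the attributes
+    your version actually supports.
+    <screen><![CDATA[
+     <service>
+      <!-- title: part of the brief record, sortable, weighted in ranking -->
+      <metadata name="title" brief="yes" sortkey="skiparticle"
+                merge="longest" rank="6"/>
+
+      <!-- author: part of the brief record and offered as a browse facet -->
+      <metadata name="author" brief="yes" termlist="yes"
+                merge="longest" rank="2"/>
+
+      <!-- kind: carried along for display only -->
+      <metadata name="kind" brief="yes"/>
+     </service>
+     ]]></screen>
+    Each <literal>name</literal> value corresponds to a
+    <literal>type</literal> attribute in the intermediate record.
+   </para>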
+  </section>
  
-    <section id="nonstandard">
-      <title>Connecting to non-standard resources</title>
-      <para>
-        Pazpar2 uses Z39.50 as its switchboard language -- i.e. as far as it
-       is concerned, all resources speak Z39.50. It is, however, equipped
-       to handle a broad range of different server behavior, through
-       configurable query mapping and record normalization. If you develop
-       configuration, stylesheets, etc., for a new type of resources, we
-       encourage you to share your work. But you can also use pazpar2 to
-       connect to hundreds of resources that do not support standard
-       protocols.
-      </para>
+  <section id="client">
+   <title>Client development overview</title>
+   <para>
+    You can use Pazpar2 from any environment that allows you to use
+    webservices. The initial goal of the software was to support
+    Ajax-based applications, but there literally are no limits to what
+    you can do. You can use Pazpar2 from JavaScript, Flash, Java, etc.,
+    on the browser side, and from any development environment on the
+    server side, and you can pass session tokens and record IDs freely
+    around between these environments to build sophisticated applications.
+    Use your imagination.
+   </para>
  
-      <para>
-        For a growing number of resources, Z39.50 is all you need. Over the
-       last few years, a number of commercial, full-text resources have
-       implemented Z39.50. These can be used through pazpar2 with little or
-       no effort. Resources that use non-standard record formats will
-       require a bit of XSLT work, but that's all.
-      </para>
+   <para>
+    The webservice API of Pazpar2 is described in detail in <xref
+     linkend="pazpar2_protocol"/>.
+   </para>
  
-      <para>
-        But what about resources that don't support Z39.50 at all? The NISO
-       SRU (MXG) protocol is slowly gathering steam. Other resources might
-       support OpenSearch, private, XML/HTTP-based protocols, or something
-       else entirely. Some databases exist only as web user interfaces and
-         will require screen-scraping. Still others exist only as static
-         files, or perhaps as databases supporting the OAI-PMH protocol.
-         There is hope! Read on.
-      </para>
+   <para>
+    In brief, you use the 'init' command to create a session, a
+    temporary workspace which carries information about the current
+    search. You start a new search using the 'search' command. Once the
+    search has been started, you can follow its progress using the
+    'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records
+    can be fetched using the 'record' command.
+   </para>
+  </section>
  
-      <para>
-        Index Data continues to advocate the support of open standards. We
-       work with database vendors to support standards, so you don't have
-       to worry about programming against non-standard services. We also
-       provide tools (see <ulink
-       url="http://www.indexdata.com/simpleserver">SimpleServer</ulink>)
-       which make it comparatively easy to build gateways against servers
-       with non-standard behavior. Again, we encourage you to share any
-       work you do in this direction.
-      </para>
+  <section id="nonstandard">
+   <title>Connecting to non-standard resources</title>
+   <para>
+    Pazpar2 uses Z39.50 as its switchboard language -- i.e. as far as it
+    is concerned, all resources speak Z39.50. It is, however, equipped
+    to handle a broad range of different server behavior, through
+    configurable query mapping and record normalization. If you develop
+    configuration, stylesheets, etc., for a new type of resource, we
+    encourage you to share your work. But you can also use Pazpar2 to
+    connect to hundreds of resources that do not support standard
+    protocols.
+   </para>
  
-      <para>
-        But the bottom line is that working with non-standard resources in
-       metasearching is really, really hard. If you want to build a
-       project with pazpar2, and you need access to resources with
-       non-standard interfaces, we can help. We run gateways to more than
-       2,000 popular, commercial databases and other resources, making it simple
-       to plug them directly into pazpar2. For a small annual fee per
-       database, we can help you establish connections to your licensed
-       resources. Meanwhile, you can help! If you build your own
-       standards-compliant gateways, host them for others, or share the
-       code! And tell your vendors that they can save everybody money and
-       increase the appeal of their resources by supporting standards.
-      </para>
+   <para>
+    For a growing number of resources, Z39.50 is all you need. Over the
+    last few years, a number of commercial, full-text resources have
+    implemented Z39.50. These can be used through Pazpar2 with little or
+    no effort. Resources that use non-standard record formats will
+    require a bit of XSLT work, but that's all.
+   </para>
  
-      <para>
-        There are those who will ask us why we are using Z39.50 as our
-       switchboard langyage rather than a different protocol. Basically,
-       we believe that Z39.50 is presently the most widely implemented 
-       information retrieval protocol that has the level of functionality
-       required to support a good metasearching experience (structured
-       searching, structured, well-defined results). It is also compact and
-       efficient, and there is a very broad range of tools available to
-       implement it.
-      </para>
-    </section>
-  </chapter> <!-- Using pazpar2 -->
+   <para>
+    But what about resources that don't support Z39.50 at all? The NISO
+    SRU (MXG) protocol is slowly gathering steam. Other resources might
+    support OpenSearch, private, XML/HTTP-based protocols, or something
+    else entirely. Some databases exist only as web user interfaces and
+    will require screen-scraping. Still others exist only as static
+    files, or perhaps as databases supporting the OAI-PMH protocol.
+    There is hope! Read on.
+   </para>
+
+   <para>
+    Index Data continues to advocate the support of open standards. We
+    work with database vendors to support standards, so you don't have
+    to worry about programming against non-standard services. We also
+    provide tools (see <ulink
+     url="http://www.indexdata.com/simpleserver">SimpleServer</ulink>)
+    which make it comparatively easy to build gateways against servers
+    with non-standard behavior. Again, we encourage you to share any
+    work you do in this direction.
+   </para>
+
+   <para>
+    But the bottom line is that working with non-standard resources in
+    metasearching is really, really hard. If you want to build a
+    project with Pazpar2, and you need access to resources with
+    non-standard interfaces, we can help. We run gateways to more than
+    2,000 popular, commercial databases and other resources,
+    making it simple 
+    to plug them directly into Pazpar2. For a small annual fee per
+    database, we can help you establish connections to your licensed
+    resources. Meanwhile, you can help! If you build your own
+    standards-compliant gateways, host them for others or share the
+    code. And tell your vendors that they can save everybody money and
+    increase the appeal of their resources by supporting standards.
+   </para>
+
+   <para>
+    There are those who will ask us why we are using Z39.50 as our
+    switchboard language rather than a different protocol. Basically,
+    we believe that Z39.50 is presently the most widely implemented 
+    information retrieval protocol that has the level of functionality
+    required to support a good metasearching experience (structured
+    searching, structured, well-defined results). It is also compact and
+    efficient, and there is a very broad range of tools available to
+    implement it.
+   </para>
+  </section>
+
+  <section id="unicode">
+   <title>Unicode Compliance</title>
+   <para>
+    Pazpar2 is Unicode compliant and language and locale aware, but it
+    relies on the character encoding of each target being specified
+    correctly if the target itself is not UTF-8 based (most aren't).
+    Just a few badly behaving targets can spoil the search experience
+    considerably if, for example, Greek, Russian, or other non-7-bit-ASCII
+    search terms are entered. In these cases some targets return
+    records irrelevant to the query, and the result screens will be
+    cluttered with noise.
+   </para>
+   <para>
+    While noise from misbehaving targets cannot be removed, it can
+    be reduced using truly Unicode-based ranking. This is an
+    option which is available to the system administrator if ICU
+    support is compiled into Pazpar2; see
+    <xref linkend="installation"/> for details.
+   </para>
+   <para>
+    In addition, the ICU tokenization and normalization rules must
+    be defined in the master configuration file described in 
+    <xref linkend="config-server"/>.
+   </para>
+  </section>
+
+ </chapter> <!-- Using Pazpar2 -->
  
   <reference id="reference">
    <title>Reference</title>
@@ -757,7 +938,7 @@ POSSIBILITY OF SUCH DAMAGES.
     </screen> 
    </section>
   </appendix>
-
+ 
  </book>
  
   <!-- Keep this comment at the end of the file