Updates to pz:authentication documentation

[pazpar2-moved-to-github.git] / doc / book.xml
diff --git a/doc/book.xml b/doc/book.xml

index 7d28253..3f5f4c5 100644 (file)
--- a/doc/book.xml
+++ b/doc/book.xml
@@ -1,33 +1,51 @@
  <?xml version="1.0" standalone="no"?>
-<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1//EN"
-    "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd" 
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+    "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"
  [
       <!ENTITY % local SYSTEM "local.ent">
       %local;
       <!ENTITY % entities SYSTEM "entities.ent">
       %entities;
-     <!ENTITY % common SYSTEM "common/common.ent">
-     %common;
+     <!ENTITY % idcommon SYSTEM "common/common.ent">
+     %idcommon;
  ]>
-<!-- $Id: book.xml,v 1.4 2007-01-13 05:48:41 quinn Exp $ -->
  <book id="book">
   <bookinfo>
    <title>Pazpar2 - User's Guide and Reference</title>
    <author>
     <firstname>Sebastian</firstname><surname>Hammer</surname>
    </author>
+  <author>
+   <firstname>Adam</firstname><surname>Dickmeiss</surname>
+  </author>
+  <author>
+   <firstname>Marc</firstname><surname>Cromme</surname>
+  </author>
+  <author>
+   <firstname>Jakub</firstname><surname>Skoczen</surname>
+  </author>
+  <author>
+   <firstname>Mike</firstname><surname>Taylor</surname>
+  </author>
+  <author>
+   <firstname>Dennis</firstname><surname>Schafroth</surname>
+  </author>
+  <releaseinfo>&version;</releaseinfo>
    <copyright>
     <year>&copyright-year;</year>
     <holder>Index Data</holder>
    </copyright>
    <abstract>
     <simpara>
-    Pazpar2 - High-performance, user-interface independent, metasearching
-         middleware featuring record merging, relevance ranking, and faceted search
-         results.
+    Pazpar2 is a high-performance metasearch engine featuring
+    merging, relevance ranking, record sorting,
+    and faceted results.
+    It is middleware: it has no user interface of its own, but can be
+    configured and controlled by an XML-over-HTTP web-service to provide
+    metasearching functionality behind any user interface.
     </simpara>
     <simpara>
-    This document is a guide and reference to Pazpar version &version;.
+    This document is a guide and reference to Pazpar2 version &version;.
     </simpara>
     <simpara>
      <inlinemediaobject>
@@ -44,123 +62,846 @@
  
   <chapter id="introduction">
    <title>Introduction</title>
-  <para>
-    Pazpar2 is a stand-alone package which implements
-    the best we know to do in terms of the core metasearching
-    functionality; that is, searching a number of databases in parallel,
-    merging, and analyzing the results. Additional functionality such as
-    user management, attractive displays are expected to be implemented by
-    applications that use pazpar2. Pazpar2 is user interface independent.
-    Its functionality is exposed through a simple REST-style webservice API,
-    designed to be simple to use from an Ajax-anbled browser, from a
-    higher-level server-side language like PHP or Java, or even from a Flash
-    application.
-  </para>
-  <para>
-    Once you launch a search in pazpar2, the operation continues behind the
+
+  <section id="what.pazpar2.is">
+   <title>What Pazpar2 is</title>
+   <para>
+    Pazpar2 is a stand-alone metasearch engine with a web-service API, designed
+    to be used either from a browser-based client (JavaScript, Flash,
+    Java applet,
+    etc.), from server-side code, or any combination of the two.
+    Pazpar2 is a highly optimized client designed to
+    search many resources in parallel. It implements record merging,
+    relevance-ranking and sorting by arbitrary data content, and facet
+    analysis for browsing purposes. It is designed to be data-model
+    independent, and is capable of working with MARC, DublinCore, or any
+    other <ulink url="&url.xml;">XML</ulink>-structured response format
+    -- <ulink url="&url.xslt;">XSLT</ulink> is used to normalize and extract
+    data from retrieval records for display and analysis. It can be used
+    against any server which supports the
+    <ulink url="&url.z39.50;">Z39.50</ulink>,
+    <ulink url="&url.sru;">SRU/SRW</ulink>
+    or <ulink url="&url.solr;">Apache Solr</ulink> protocol. Proprietary
+    backend modules can function as connectors between these standard
+    protocols and any non-standard API, including web-site scraping, to
+    support a large number of other protocols.
+   </para>
+   <para>
+    Additional functionality such as
+    user management and attractive displays are expected to be implemented by
+    applications that use Pazpar2. Pazpar2 itself is user-interface independent.
+    Its functionality is exposed through a simple XML-based web-service API,
+    designed to be easy to use from an Ajax-enabled browser, Flash
+    animation, Java applet, etc., or from a higher-level server-side language
+    like PHP, Perl or Java. Because session information can be shared between
+    browser-based logic and server-side scripting, there is tremendous
+    flexibility in how you implement application-specific logic on top
+    of Pazpar2.
+   </para>
+   <para>
+    Once you launch a search in Pazpar2, the operation continues behind the
      scenes. Pazpar2 connects to servers, carries out searches, and
      retrieves, deduplicates, and stores results internally. Your application
      code may periodically inquire about the status of an ongoing operation,
-    and ask to see records or other result set facets.
-  </para>
-  <para>
+    and ask to see records or result set facets. Results become
+    available immediately, and it is easy to build end-user interfaces than
+    feel extremely responsive, even when searching more than 100 servers
+    concurrently.
+   </para>
+   <para>
      Pazpar2 is designed to be highly configurable. Incoming records are
      normalized to XML/UTF-8, and then further normalized using XSLT to a
      simple internal representation that is suitable for analysis. By
      providing XSLT stylesheets for different kinds of result records, you
-    can tune pazpar2 to work against different kinds of information
-    retrieval servers. Finally, metadata is extracted, in a configurable
-    way, from this internal record, to support display, merging, ranking,
+    can configure Pazpar2 to work against different kinds of information
+    retrieval servers. Finally, metadata is extracted in a configurable
+    way from this internal record, to support display, merging, ranking,
      result set facets, and sorting. Pazpar2 is not bound to a specific model
-    of metadata, such as DublinCore or MARC -- by providing the right
-    configuration, it can work with a number of different kinds of data in
-    support of many different applications.
-  </para>
-  <para>
+    of metadata, such as DublinCore or MARC: by providing the right
+    configuration, it can work with any combination of different kinds of data
+    in support of many different applications.
+   </para>
+   <para>
      Pazpar2 is designed to be efficient and scalable. You can set it up to
      search several hundred targets in parallel, or you can use it to support
      hundreds of concurrent users. It is implemented with the same attention
      to performance and economy that we use in our indexing engines, so that
-    you can focus on building your application. You can devote all of your
-    attention to usability and let pazpar2 do what it does best -- search.
+    you can focus on building your application without worrying about the
+    details of metasearch logic. You can devote all of your attention to
+    usability and let Pazpar2 do what it does best -- metasearch.
     </para>
- </chapter>
+   <para>
+    Pazpar2 is our attempt to re-think the traditional paradigms for
+    implementing and deploying metasearch logic, with an uncompromising
+    approach to performance, and attempting to make maximum use of the
+    capabilities of modern browsers. The demo user interface that
+    accompanies the distribution is but one example. If you think of new
+    ways of using Pazpar2, we hope you'll share them with us, and if we
+    can provide assistance with regards to training, design, programming,
+    integration with different backends, hosting, or support, please don't
+    hesitate to contact us. If you'd like to see functionality in Pazpar2
+    that is not there today, please don't hesitate to contact us. It may
+    already be in our development pipeline, or there might be a
+    possibility for you to help out by sponsoring development time or
+    code. Either way, get in touch and we will give you straight answers.
+   </para>
+   <para>
+    Enjoy!
+   </para>
+   <para>
+    Pazpar2 is covered by the GNU General Public License (GPL) version 2.
+    See <xref linkend="license"/> for further information.
+   </para>
+  </section>
  
- <chapter id="license">
-  <title>Pazpar2 License</title>
-  <para>To be decided and written.</para>
+  <section id="connectors">
+   <title>Connectors to non-standard databases</title>
+   <para>
+    If you need to access commercial or open access resources that don't support
+    Z39.50 or SRU, one approach would be to use a tool like <ulink
+    url="&url.simpleserver;">SimpleServer</ulink> to build a
+    gateway. An easier option is to use Index Data's <ulink
+    url="&url.mkc;">MasterKey Connect</ulink>
+    service, which will expose virtually <emphasis>any</emphasis> resource
+    through Z39.50/SRU, dead easy to integrate with Pazpar2.
+    The service is hosted, so all you have to do is to let us
+    know which resources you are interested in, and we operate the gateways,
+    or Connectors for you for a low annual charge.
+    Types of resources supported include
+    commercial databases, free online resources, and even local resources;
+    almost anything that can be accessed through a web-facing user
+    interface can be accessed in this way.
+    Contact <email>info@indexdata.com</email> for more information.
+    See <xref linkend="masterkey_connect"/> for an example.
+   </para>
+  </section>
+
+  <section id="name">
+   <title>A note on the name Pazpar2</title>
+   <para>
+    The name Pazpar2 derives from three sources.  One one hand, it is
+    Index Data's second major piece of software that does parallel
+    searching of Z39.50 targets.  On the other, it is a near-homophone
+    of Passpartout, the ever-helpful servant in Jules Verne's novel
+    Around the World in Eighty Days (who helpfully uses the language
+    of his master).  Finally, "passe par tout" means something like
+    "passes through anything" in French -- on other words, a universal
+    solution, or if you like a MasterKey.
+   </para>
+  </section>
   </chapter>
- 
+
   <chapter id="installation">
    <title>Installation</title>
    <para>
+   The Pazpar2 package includes documentation as well
+   as the Pazpar2 server. The package also includes a simple user
+   interface called "test1", which consists of a single HTML page and a single
+   JavaScript file to illustrate the use of Pazpar2.
+  </para>
+  <para>
     Pazpar2 depends on the following tools/libraries:
     <variablelist>
      <varlistentry><term><ulink url="&url.yaz;">YAZ</ulink></term>
-     <listitem>
-      <para>
-       The popular Z39.50 toolkit for the C language. YAZ must be
-       compiled with Libxml2/Libxslt support.
-      </para>
-     </listitem>
+    <listitem>
+     <para>
+      The popular Z39.50 toolkit for the C language.
+      YAZ <emphasis>must</emphasis> be compiled with
+      <ulink url="&url.libxml2;">Libxml2</ulink>/<ulink url="&url.libxslt;">Libxslt</ulink> support.
+     </para>
+     <para>
+      It is highly recommended that YAZ is also compiled with
+      <ulink url="&url.icu;">ICU</ulink> support.
+     </para>
+    </listitem>
      </varlistentry>
     </variablelist>
    </para>
    <para>
-   In order to compile Pazpar2 an ANSI C compiler is
-   required. The requirements should be the same as for YAZ.
+   In order to compile Pazpar2, a C compiler which supports C99 or later
+   is required.
    </para>
  
    <section id="installation.unix">
-   <title>Installation on Unix (from Source)</title>
+   <title>Installation from source on Unix (including Linux, MacOS, etc.)</title>
     <para>
-    Here is a quick step-by-step guide on how to compile the
-    tools that Pazpar2 uses. Only few systems have none of the required
-    tools binary packages. If, for example, Libxml2/libxslt are already
-    installed as development packages use these.
+    The latest source code for Pazpar2 is available from
+    <ulink url="&url.pazpar2.download;"/>.
+    Most Unix-based operating systems have the required
+    tools available as binary packages.
+    For example, if Libxml2/libXSLT libraries
+    are already installed as development packages, use these.
     </para>
-   
+
     <para>
-    Ensure that the development libraries + header files are
+    Ensure that the development libraries and header files are
      available on your system before compiling Pazpar2. For installation
-    of YAZ, refer to the YAZ installation chapter.
+    of YAZ, refer to the Installation chapter of the YAZ manual at
+    <ulink url="&url.yaz.install;"/>.
+   </para>
+   <para>
+    Once the dependencies are in place, Pazpar2 can be unpacked and
+    installed as follows:
     </para>
     <screen>
-    gunzip -c pazpar2-version.tar.gz|tar xf -
-    cd pazpar2-version
+    tar xzf pazpar2-VERSION.tar.gz
+    cd pazpar2-VERSION
      ./configure
      make
-    su
-    make install
+    sudo make install
     </screen>
+   <para>
+    The <literal>make install</literal> will install manpages as well as the
+    Pazpar2 server, <literal>pazpar2</literal>,
+    in PREFIX<literal>/sbin</literal>.
+    By default, PREFIX is <literal>/usr/local/</literal> . This can be
+    changed with configure option <option>--prefix</option>.
+   </para>
+  </section>
+
+  <section id="installation.win32">
+   <title>Installation from source on Windows</title>
+   <para>
+    Pazpar2 can be built for Windows using
+    <ulink url="&url.vstudio;">Microsoft Visual Studio</ulink>.
+    The support files for building YAZ on Windows are located in the
+    <filename>win</filename> directory. The compilation is performed
+    using the <filename>win/makefile</filename> which is to be
+    processed by the NMAKE utility part of Visual Studio.
+   </para>
+   <para>
+    Ensure that the development libraries and header files are
+    available on your system before compiling Pazpar2. For installation
+    of YAZ, refer to
+    the Installation chapter of the YAZ manual at
+    <ulink url="&url.yaz.install;"/>.
+    It is easiest if YAZ and Pazpar2 are unpacked in the same
+    directory (side-by-side).
+   </para>
+   <para>
+    The compilation is tuned by editing the makefile of Pazpar2.
+    The process is similar to YAZ. Adjust the various directories
+    <literal>YAZ_DIR</literal>, <literal>ICU_DIR</literal>, etc.,
+    as required.
+   </para>
+   <para>
+    Compile Pazpar2 by invoking <application>nmake</application> in
+    the <filename>win</filename> directory.
+    The resulting binaries of the build process are located in the
+    <filename>bin</filename> of the Pazpar2 source
+    tree - including the <filename>pazpar2.exe</filename> and necessary DLLs.
+   </para>
+   <para>
+    The Windows version of Pazpar2 is a console application. It may
+    be installed as a Windows Service by adding option
+    <literal>-install</literal> for the pazpar2 program. This will
+    register Pazpar2 as a service and use the other options provided
+    in the same invocation. For example:
+    <screen>
+     cd \MyPazpar2\etc
+     ..\bin\pazpar2 -install -f pazpar2.cfg -l pazpar2.log
+    </screen>
+    The Pazpar2 service may now be controlled via the Service Control
+    Panel. It may be unregistered by passing the <literal>-remove</literal>
+    option. Example:
+    <screen>
+     cd \MyPazpar2\etc
+     ..\bin\pazpar2 -remove
+    </screen>
+   </para>
+  </section>
+
+  <section id="installation.test1">
+   <title>Installation of test interfaces</title>
+   <para>
+    In this section we show how to make available the set of simple
+    interfaces that are part of the Pazpar2 source package, and which
+    demonstrate some ways to use Pazpar2.  (Note that Debian users can
+    save time by just installing the package <literal>pazpar2-test1</literal>.)
+   </para>
+   <para>
+    A web server, such as Apache, must be installed and running on the system.
+   </para>
+
+   <para>
+    Start the Pazpar2 daemon using the 'in-source' binary of the Pazpar2
+    daemon. On Unix the process is:
+    <screen>
+     cd etc
+     cp pazpar2.cfg.dist pazpar2.cfg
+     ../src/pazpar2 -f pazpar2.cfg
+    </screen>
+    And on Windows:
+    <screen>
+     cd etc
+     copy pazpar2.cfg.dist pazpar2.cfg
+     ..\bin\pazpar2 -f pazpar2.cfg
+    </screen>
+    This will start a Pazpar2 listener on port 9004. It will proxy
+    HTTP requests to port 80 on localhost, which we assume will be the regular
+    HTTP server on the system. Inspect and modify pazpar2.cfg as needed
+    if this is to be changed. The pazpar2.cfg file includes settings from the
+    file <filename>settings/edu.xml</filename>
+    to use for searches.
+   </para>
+
+   <para>
+    The test UIs are located in <literal>www</literal>. Ensure that this
+    directory is available to the web server by copying
+    <literal>www</literal> to the document root,
+    using Apache's <literal>Alias</literal> directive, or
+    creating a symbolic link: for example, on a Debian or Ubuntu
+    system with Apache2 installed from the standard package, you might
+    make the link as follows:
+    <screen>
+     cd .../pazpar2
+     sudo ln -s `pwd`/www /var/www/pazpar2-demo
+    </screen>
+   </para>
+
+   <para>
+    This makes the test applications visible at
+    <ulink url="http://localhost/pazpar2-demo/"/>
+    but they can not be run successfully from that URL, as they submit
+    search requests back to the server form which they were served,
+    and Apache2 doesn't know how to handle them.  Instead, the test
+    applications must be accessed from Pazpar2 itself, acting as a
+    proxy to Apache2, at the URL
+    <ulink url="http://localhost:9004/pazpar2-demo/"/>
+   </para>
+
+   <para>
+    From here, the demo applications can be
+    accessed: <literal>test1</literal>, <literal>test2</literal> and
+    <literal>jsdemo</literal>
+    are pure HTML+JavaScript setups, needing no server-side
+    intelligence;
+    <literal>demo</literal>
+    requires PHP on the server.
+   </para>
+   <para>
+    If you don't see the test interfaces, check whether they are available
+    on port 80 (i.e. directly from the Apache2 server).  If not, the
+    Apache configuration is incorrect.
+   </para>
+   <para>
+    In order to use Apache as frontend for the interface on port 80
+    for public access etc., refer to
+    <xref linkend="installation.apache2proxy"/>.
+   </para>
    </section>
  
    <section id="installation.debian">
-   <title>Installation on Debian GNU/Linux</title>
+   <title>Installation on Debian GNU/Linux and Ubuntu</title>
     <para>
-    All dependencies for Pazpar2 are available as 
-    <ulink url="&url.debian;">Debian</ulink>
-    packages for the sarge (stable in 2005) and etch (testing in 2005)
-    distributions.
+    Index Data provides Debian and Ubuntu packages for Pazpar2 and YAZ.
+    Refer to these directories:
+    <ulink url="&url.pazpar2.download;debian/"/> and
+    <ulink url="&url.pazpar2.download;ubuntu/"/>.
     </para>
+  </section>
+
+  <section id="installation.centos">
+   <title>Installation on RedHat / CentOS</title>
     <para>
-    The procedures for Debian based systems, such as
-    <ulink url="&url.ubuntu;">Ubuntu</ulink> is probably similar
+    Index Data provides CentOS packages for Pazpar2 and YAZ.
+    Refer to
+    <ulink url="&url.pazpar2.download;redhat/centos"/> for
+    CentOS packages.
     </para>
-   <screen>
-    apt-get install libyaz-dev
-   </screen>
+  </section>
+
+  <section id="installation.apache2proxy">
+   <title>Apache 2 Proxy</title>
     <para>
-    With these packages installed, the usual configure + make
-    procedure can be used for Pazpar2 as outlined in
-    <xref linkend="installation.unix"/>.
+    Apache 2 has a
+    <ulink
+       url="http://httpd.apache.org/docs/2.2/mod/mod_proxy.html">
+     proxy module
+    </ulink>
+    which allows Pazpar2 to become a backend to an Apache 2
+    based web service. The Apache 2 proxy must operate in the
+    <emphasis>Reverse</emphasis> Proxy mode.
     </para>
+
+   <para>
+    On a Debian based Apache 2 system, the relevant modules can
+    be enabled with:
+    <screen>
+     sudo a2enmod proxy_http proxy_balancer
+    </screen>
+   </para>
+
+   <para>
+    Traditionally Pazpar2 interprets URL paths with suffix
+    <literal>/search.pz2</literal>.
+    The
+    <ulink
+       url="http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass">
+    ProxyPass
+    </ulink>
+    directive of Apache must be used to map a URL path
+    the the Pazpar2 server (listening port).
+   </para>
+
+   <note>
+    <para>
+     The ProxyPass directive takes a prefix rather than
+     a suffix as URL path. It is important that the Java Script code
+     uses the prefix given for it.
+    </para>
+   </note>
+
+   <example id="installation.apache2proxy.example">
+    <title>Apache 2 proxy configuration</title>
+    <para>
+     If Pazpar2 is running on port 8004 and the portal is using
+     <filename>search.pz2</filename> inside portal in directory
+     <filename>/myportal/</filename> we could use the following
+     Apache 2 configuration:
+
+     <screen><![CDATA[
+      <IfModule mod_proxy.c>
+       ProxyRequests Off
+
+       <Proxy *>
+        AddDefaultCharset off
+        Order deny,allow
+        Allow from all
+       </Proxy>
+
+       ProxyPass /myportal/search.pz2 http://localhost:8004/search.pz2
+       ProxyVia Off
+      </IfModule>
+      ]]></screen>
+    </para>
+   </example>
    </section>
+
   </chapter>
- 
+
+ <chapter id="using">
+  <title>Using Pazpar2</title>
+  <para>
+   This chapter provides a general introduction to the use and
+   deployment of Pazpar2.
+  </para>
+
+  <section id="architecture">
+   <title>Pazpar2 and your systems architecture</title>
+   <para>
+    Pazpar2 is designed to provide asynchronous, behind-the-scenes
+    metasearching functionality to your application, exposing this
+    functionality using a simple webservice API that can be accessed
+    from any number of development environments. In particular, it is
+    possible to combine Pazpar2 either with your server-side dynamic
+    website scripting, with scripting or code running in the browser, or
+    with any combination of the two. Pazpar2 is an excellent tool for
+    building advanced, Ajax-based user interfaces for metasearch
+    functionality, but it isn't a requirement -- you can choose to use
+    Pazpar2 entirely as a backend to your regular server-side scripting.
+    When you do use Pazpar2 in conjunction
+    with browser scripting (JavaScript/Ajax, Flash, applets,
+    etc.), there are   special considerations.
+   </para>
+
+   <para>
+    Pazpar2 implements a simple but efficient HTTP server, and it is
+    designed to interact directly with scripting running in the browser
+    for the best possible performance, and to limit overhead when
+    several browser clients generate numerous webservice requests.
+    However, it is still desirable to use a conventional webserver,
+    such as Apache, to serve up graphics, HTML documents, and
+    server-side scripting. Because the security sandbox environment of
+    most browser-side programming environments only allows communication
+    with the server from which the enclosing HTML page or object
+    originated, Pazpar2 is designed so that it can act as a transparent
+    proxy in front of an existing webserver (see <xref
+    linkend="pazpar2_conf"/> for details).
+    In this mode, all regular
+    HTTP requests are transparently passed through to your webserver,
+    while Pazpar2 only intercepts search-related webservice requests.
+   </para>
+
+   <para>
+    If you want to expose your combined service on port 80, you can
+    either run your regular webserver on a different port, a different
+    server, or a different IP address associated with the same server.
+   </para>
+
+   <para>
+    Pazpar2 can also work behind
+    a reverse Proxy. Refer to <xref linkend="installation.apache2proxy"/>)
+    for more information.
+    This allows your existing HTTP server to operate on port 80 as usual.
+    Pazpar2 can be started on another (internal) port.
+   </para>
+
+   <para>
+    Sometimes, it may be necessary to implement functionality on your
+    regular webserver that makes use of search results, for example to
+    implement data import functionality, emailing results, history
+    lists, personal citation lists, interlibrary loan functionality,
+    etc. Fortunately, it is simple to exchange information between
+    Pazpar2, your browser scripting, and backend server-side scripting.
+    You can send a session ID and possibly a record ID from your browser
+    code to your server code, and from there use Pazpar2s webservice API
+    to access result sets or individual records. You could even 'hide'
+    all of Pazpar2s functionality between your own API implemented on
+    the server-side, and access that from the browser or elsewhere. The
+    possibilities are just about endless.
+   </para>
+  </section>
+
+  <section id="data_model">
+   <title>Your data model</title>
+   <para>
+    Pazpar2 does not have a preconceived model of what makes up a data
+    model. There are no assumptions that records have specific fields or
+    that they are organized in any particular way. The only assumption
+    is that data comes packaged in a form that the software can work
+    with (presently, that means XML or MARC), and that you can provide
+    the necessary information to massage it into Pazpar2's internal
+    record abstraction.
+   </para>
+
+   <para>
+    Handling retrieval records in Pazpar2 is a two-step process. First,
+    you decide which data elements of the source record you are
+    interested in, and you specify any desired massaging or combining of
+    elements using an XSLT stylesheet (MARC records are automatically
+    normalized to <ulink url="&url.marcxml;">MARCXML</ulink> before this step).
+    If desired, you can run multiple XSLT stylesheets in series to accomplish
+    this, but the output of the last one should be a representation of the
+    record in a schema that Pazpar2 understands.
+   </para>
+
+   <para>
+    The intermediate, internal representation of the record looks like
+    this:
+    <screen><![CDATA[
+     <record xmlns="http://www.indexdata.com/pazpar2/1.0"
+       mergekey="title The Shining author King, Stephen">
+
+       <metadata type="title" rank="2">The Shining</metadata>
+
+       <metadata type="author">King, Stephen</metadata>
+
+       <metadata type="kind">ebook</metadata>
+       <!-- ... and so on -->
+     </record>
+]]></screen>
+
+    As you can see, there isn't much to it. There are really only a few
+    important elements to this file.
+   </para>
+
+   <para>
+    Elements should belong to the namespace
+    <literal>http://www.indexdata.com/pazpar2/1.0</literal>.
+    If the root node contains the
+    attribute 'mergekey', then every record that generates the same
+    merge key (normalized for case differences, white space, and
+    truncation) will be joined into a cluster. In other words, you
+    decide how records are merged. If you don't include a merge key,
+    records are never merged. The 'metadata' elements provide the meat
+    of the elements -- the content. the 'type' attribute is used to
+    match each element against processing rules that determine what
+    happens to the data element next. The attribute, 'rank' specifies
+    specifies a multipler for ranking for this element.
+   </para>
+
+   <para>
+    The next processing step is the extraction of metadata from the
+    intermediate representation of the record. This is governed by the
+    'metadata' elements in the 'service' section of the configuration
+    file. See <xref linkend="config-server"/> for details. The metadata
+    in the retrieval record ultimately drives merging, sorting, ranking,
+    the extraction of browse facets, and display, all configurable.
+   </para>
+
+   <para>
+    Pazpar2 1.6.37 and later also allows already clustered records to
+    be ingested. Suppose a database already clusters for us and we would like
+    to keep that cluster for Pazpar2. In that case we can generate a
+    <literal>cluster</literal> wrapper element that holds individual
+    <literal>record</literal> elements.
+   </para>
+   <para>
+    Cluster record example:
+    <screen><![CDATA[
+     <cluster xmlns="http://www.indexdata.com/pazpar2/1.0">
+       <record>
+         <metadata type="title" rank="2">The Shining</metadata>
+        <metadata type="author">King, Stephen</metadata>
+        <metadata type="kind">ebook</metadata>
+       </record>
+       <record>
+         <metadata type="title" rank="2">The Shining</metadata>
+        <metadata type="author">King, Stephen</metadata>
+        <metadata type="kind">audio</metadata>
+       </record>
+    </cluster>
+     ]]></screen>
+   </para>
+  </section>
+
+  <section id="client">
+   <title>Client development overview</title>
+   <para>
+    You can use Pazpar2 from any environment that allows you to use
+    webservices. The initial goal of the software was to support
+    Ajax-based applications, but there literally are no limits to what
+    you can do. You can use Pazpar2 from Javascript, Flash, Java, etc.,
+    on the browser side, and from any development environment on the
+    server side, and you can pass session tokens and record IDs freely
+    around between these environments to build sophisticated applications.
+    Use your imagination.
+   </para>
+
+   <para>
+    The webservice API of Pazpar2 is described in detail in <xref
+     linkend="pazpar2_protocol"/>.
+   </para>
+
+   <para>
+    In brief, you use the 'init' command to create a session, a
+    temporary workspace which carries information about the current
+    search. You start a new search using the 'search' command. Once the
+    search has been started, you can follow its progress using the
+    'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records
+    can be fetched using the 'record' command. 
+   </para>
+  </section>
+
+  &sect-ajaxdev;
+
+  <section id="unicode">
+   <title>Unicode Compliance</title>
+   <para>
+    Pazpar2 is Unicode compliant and language and locale aware but relies
+    on character encoding for the targets to be specified correctly if
+    the targets themselves are not UTF-8 based (most aren't).
+    Just a few bad behaving targets can spoil the search experience
+    considerably if for example Greek, Russian or otherwise non 7-bit ASCII
+    search terms are entered. In these cases some targets return
+    records irrelevant to the query, and the result screens will be
+    cluttered with noise.
+   </para>
+   <para>
+    While noise from misbehaving targets can not be removed, it can
+    be reduced using truly Unicode based ranking. This is an
+    option which is available to the system administrator if ICU
+    support is compiled into YAZ, see
+    <xref linkend="installation"/> for details.
+   </para>
+   <para>
+    In addition, the ICU tokenization and normalization rules must
+    be defined in the master configuration file described in
+    <xref linkend="config-server"/>.
+   </para>
+  </section>
+
+  <section id="load_balancing">
+   <title>Load balancing</title>
+   <para>
+     Just like any web server, Pazpar2, can be load balanced by a standard
+     hardware or software load balancer as long as the session stickiness
+     is ensured. If you are already running the Apache2 web server in front
+     of Pazpar2 and use the apache mod_proxy module to 'relay' client
+     requests to Pazpar2, this set up can be easily extended to include
+     load balancing capabilites.
+     To do so you need to enable the
+     <ulink url="http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html">
+      mod_proxy_balance
+     </ulink>
+     module in your Apache2 installation.
+   </para>
+
+   <para>
+    On a Debian based Apache 2 system, the relevant modules can
+    be enabled with:
+    <screen>
+     sudo a2enmod proxy_http
+    </screen>
+   </para>
+
+   <para>
+    The mod_proxy_balancer can pass all 'sessionsticky' requests to the
+    same backend worker as long as the requests are marked with the
+    originating worker's ID (called 'route'). If the Pazpar2 serverID is
+    configured (by setting an 'id' attribute on the 'server' element in
+    the Pazpar2 configuration file) Pazpar2 will append it to the
+    'session' element returned during the 'init' in a mod_proxy_balancer
+    compatible manner.
+    Since the 'session' is then re-sent by the client (for all pazpar2
+    request besides 'init'), the balancer can use the marker to pass
+    the request to the right route. To do so the balancer needs to be
+    configured to inspect the 'session' parameter.
+   </para>
+
+   <example id="load_balancing.example">
+    <title>Apache 2 load balancing configuration</title>
+    <para>
+     Having 4 Pazpar2 instances running on the same host, port range of
+     8004-8007 and serverIDs of: pz1, pz2, pz3 and pz4 respectively we
+     could use the following Apache 2 configuration to expose a single
+     pazpar2 'endpoint' on a standard
+     (<filename>/pazpar2/search.pz2</filename>) location:
+
+     <screen><![CDATA[
+       <Proxy *>
+         AddDefaultCharset off
+         Order deny,allow
+         Allow from all
+       </Proxy>
+       ProxyVia Off
+
+       # 'route' has to match the configured pazpar2 server ID
+       <Proxy balancer://pz2cluster>
+         BalancerMember http://localhost:8004 route=pz1
+         BalancerMember http://localhost:8005 route=pz2
+         BalancerMember http://localhost:8006 route=pz3
+         BalancerMember http://localhost:8007 route=pz4
+       </Proxy>
+
+       # route is resent in the 'session' param which has the form:
+       # 'sessid.serverid', understandable by the mod_proxy_load_balancer
+       # this is not going to work if the client tampers with the 'session' param
+       ProxyPass /pazpar2/search.pz2 balancer://pz2cluster lbmethod=byrequests stickysession=session nofailover=On
+     ]]></screen>
+
+     The 'ProxyPass' line sets up a reverse proxy for request
+     ‘/pazpar2/search.pz2’ and delegates all requests to the load balancer
+     (virtual worker) with name ‘pz2cluster’.
+     Sticky sessions are enabled and implemented using the ‘session’ parameter.
+     The ‘Proxy’ section lists all the servers (real workers) which the
+     load balancer can use.
+    </para>
+
+   </example>
+
+  </section>
+
+  <section id="relevance_ranking">
+   <title>Relevance ranking</title>
+   <para>
+    Pazpar2 uses a variant of the fterm frequency–inverse document frequency
+    (Tf-idf) ranking algorithm.
+   </para>
+   <para>
+    The Tf-part is straightforward to calculate and is based on the
+    documents that Pazpar2 fetches. The idf-part, however, is more tricky
+    since the corpus at hand is ONLY the relevant documents and not
+    irrelevant ones. Pazpar2 does not have the full corpus -- only the
+    documents that match a particular search.
+   </para>
+   <para>
+    Computatation of the Tf-part is based on the normalized documents.
+    The length, the position and terms are thus normalized at this point.
+    Also the computation if performed for each document received from the
+    target - before merging takes place. The result of a TF-compuation is
+    added to the TF-total of a cluster. Thus, if a document occurs twice,
+    then the TF-part is doubled. That, however, can be adjusted, because the
+    TF-part may be divided by the number of documents in a cluster.
+   </para>
+   <para>
+    The algorithm used by Pazpar2 has two phases. In phase one
+    Pazpar2 computes a tf-array .. This is being done as records are
+    fetched form the database. In this case, the rank weigth
+    <literal>w</literal>, the and rank tweaks <literal>lead</literal>,
+    <literal>follow</literal> and <literal>length</literal>.
+
+   </para>
+   <screen><![CDATA[
+    tf[1,2,..N] = 0;
+    foreach document in a cluster
+       foreach field
+          w[1,2,..N] = 0;
+          for i = 1, .. N:  (each term)
+             foreach pos (where term i occurs in field)
+                // w is configured weight for field
+                // pos is position of term in field
+                w[i] += w / (1 + log2(1+lead*pos))
+                if (d > 0)
+                    w[i] += w[i] * follow / (1+log2(d)
+          // length: length of field (number of terms that is)
+         if (length strategy is "linear")
+             tf[i] += w[i] / length;
+          else if (length strategy is "log")
+             tf[i] += w[i] / log2(length);
+          else if (length strategy is "none")
+             tf[i] += w[i];
+         ]]></screen>
+   <para>
+    In phase two, the idf-array is computed and the final score
+    is computed. This is done for each cluster as part of each show command.
+    The rank tweak <literal>cluster</literal> is in use here.
+   </para>
+   <screen><![CDATA[
+    // dococcur[i]: number of records where term occurs
+    // doctotal: number of records
+    for i = 1, .., N (each term)
+      if (dococcur[i] > 0)
+         idf[i] = log(1 + doctotal / dococcur[i])
+      else
+         idf[i] = 0;
+
+    relevance = 0;
+    for i = 1, .., N: (each term)
+       if (cluster is "yes")
+          tf[i] = tf[i] / cluster_size;
+       relevance += 100000 * tf[i] / idf[i];
+       ]]></screen>
+   <para>
+    For controlling the ranking parameters, refer to the
+    <link linkend="service-rank">rank</link> element of the 
+    service definition.
+    Refer to the <link linkend="metadata-rank">rank</link> attribute
+    of the metadata element for how to control ranking for individual
+    metadata fields.
+   </para>
+  </section> <!-- relevance_ranking -->
+
+  <section id="masterkey_connect">
+   <title>Pazpar2 and MasterKey Connect</title>
+   <para>
+    MasterKey Connect is a hosted connector, or gateway, service that exposes
+    whatever searchable resources you need. Since the service exposes all
+    resources using Z39.50 (or SRU), it is easy to set up Pazpar2 to use the
+    service. In particular, since all connectors expose basically the same core
+    behavior, it is a good use of Pazpar2's mechanism for managing default
+    behaviors across similar databases.
+   </para>
+   <para>
+    After installation of Pazpar2, the directory
+    <filename>/etc/pazpar2/settings/mkc</filename> (location may
+    vary depending on installation preferences) contains an example setup that
+    searches two different resources through a MasterKey Connect demo account.
+    The file mkc.xml contains default parameters that will work for all
+    MasterKey Connect resources (if you decide to become a customer of the
+    service, you will substitute your own account credentials for
+    the guest/guest). The other files contain specific information about
+    a couple of demonstration resources.
+   </para>
+
+   <para>
+    To play with the demo, just create a symlink from
+    <filename>/etc/pazpar2/services-enabled/default.xml</filename>
+    to <filename>/etc/pazpar2/services-available/mkc.xml</filename>.
+    And restart Pazpar2. You should now be able to search the two demo
+    resources using JSDemo or any user interface of your choice.
+    If you are interested in learning more about MasterKey Connect, or to
+    try out the service for free against your favorite online resource, just
+    contact us at <email>info@indexdata.com</email>.
+   </para>
+  </section>
+
+ </chapter> <!-- Using Pazpar2 -->
+
   <reference id="reference">
    <title>Reference</title>
-  <partintro>
+  <partintro id="reference-introduction">
     <para>
      The material in this chapter is drawn directly from the individual
      manual entries.
@@ -168,19 +909,45 @@
    </partintro>
    &manref;
   </reference>
+
+ <appendix id="license">
+  <title>License</title>
+
+ <para>
+  Pazpar2,
+  Copyright &#xa9; &copyright-year; Index Data.
+ </para>
+
+ <para>
+  Pazpar2 is free software; you can redistribute it and/or modify it under
+  the terms of the GNU General Public License as published by the Free
+  Software Foundation; either version 2, or (at your option) any later
+  version.
+ </para>
+
+ <para>
+  Pazpar2 is distributed in the hope that it will be useful, but WITHOUT ANY
+  WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+  for more details.
+ </para>
+
+ <para>
+  You should have received a copy of the GNU General Public License
+  along with Pazpar2; see the file LICENSE.  If not, write to the
+  Free Software Foundation,
+  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ </para>
+
+ </appendix>
+
+ &gpl2;
+
  </book>
  
- <!-- Keep this comment at the end of the file
- Local variables:
- mode: sgml
- sgml-omittag:t
- sgml-shorttag:t
- sgml-minimize-attributes:nil
- sgml-always-quote-attributes:t
- sgml-indent-step:1
- sgml-indent-data:t
- sgml-parent-document: nil
- sgml-local-catalogs: nil
- sgml-namecase-general:t
- End:
- -->
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: nxml
+nxml-child-indent: 1
+End:
+-->