X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fbook.xml;h=58e9ca6da36de1707880354a304f71bcf3972bb1;hb=c7f4f3051cc7215e8513a7ddeb765d7db64c8f11;hp=ff71070fa2ad635a578073d76752a22176a968a7;hpb=685b5c5a8b3860fd30d3d8776c8828ed1a8fb7a1;p=pazpar2-moved-to-github.git
diff --git a/doc/book.xml b/doc/book.xml
index ff71070..58e9ca6 100644
--- a/doc/book.xml
+++ b/doc/book.xml
@@ -1,6 +1,6 @@
-
%local;
@@ -24,6 +24,12 @@
JakubSkoczen
+
+ MikeTaylor
+
+
+ DennisSchafroth
+
&version;
&copyright-year;
@@ -31,10 +37,12 @@
- Pazpar2 is a high-performance, user interface-independent, data
- model-independent metasearching
- middle-ware featuring merging, relevance ranking, record sorting,
+ Pazpar2 is a high-performance metasearch engine featuring
+ merging, relevance ranking, record sorting,
and faceted results.
+ It is middleware: it has no user interface of its own, but can be
+ configured and controlled by an XML-over-HTTP web-service to provide
+ metasearching functionality behind any user interface.
This document is a guide and reference to Pazpar2 version &version;.
@@ -43,145 +51,181 @@
-
-
-
-
-
+
+
+
+
+
-
+
Introduction
-
- Pazpar2 is a stand-alone metasearch client with a web-service API, designed
- to be used either from a browser-based client (JavaScript, Flash, Java,
- etc.), from server-side code, or any combination of the two.
- Pazpar2 is a highly optimized client designed to
- search many resources in parallel. It implements record merging,
- relevance-ranking and sorting by arbitrary data content, and facet
- analysis for browsing purposes. It is designed to be data model
- independent, and is capable of working with MARC, DublinCore, or any
- other XML-structured response format
- -- XSLT is used to normalize and extract
- data from retrieval records for display and analysis. It can be used
- against any server which supports the
- Z39.50 and SRU/SRW
- protocol. Proprietary
- backend modules can be used to support a large number of other protocols
- (please contact Index Data for further information about this).
-
-
- Additional functionality such as
- user management, attractive displays are expected to be implemented by
- applications that use Pazpar2. Pazpar2 is user interface independent.
- Its functionality is exposed through a simple REST-style web-service API,
- designed to be simple to use from an Ajax-enabled browser, Flash
- animation, Java applet, etc., or from a higher-level server-side language
- like PHP or Java. Because session information can be shared between
- browser-based logic and your server-side scripting, there is tremendous
- flexibility in how you implement your business logic on top of Pazpar2.
-
-
- Once you launch a search in Pazpar2, the operation continues behind the
- scenes. Pazpar2 connects to servers, carries out searches, and
- retrieves, deduplicates, and stores results internally. Your application
- code may periodically inquire about the status of an ongoing operation,
- and ask to see records or other result set facets. Result becomes
- available immediately, and it is easy to build end-user interfaces which
- feel extremely responsive, even when searching more than 100 servers
- concurrently.
-
-
- Pazpar2 is designed to be highly configurable. Incoming records are
- normalized to XML/UTF-8, and then further normalized using XSLT to a
- simple internal representation that is suitable for analysis. By
- providing XSLT stylesheets for different kinds of result records, you
- can tune Pazpar2 to work against different kinds of information
- retrieval servers. Finally, metadata is extracted, in a configurable
- way, from this internal record, to support display, merging, ranking,
- result set facets, and sorting. Pazpar2 is not bound to a specific model
- of metadata, such as DublinCore or MARC -- by providing the right
- configuration, it can work with a number of different kinds of data in
- support of many different applications.
-
-
- Pazpar2 is designed to be efficient and scalable. You can set it up to
- search several hundred targets in parallel, or you can use it to support
- hundreds of concurrent users. It is implemented with the same attention
- to performance and economy that we use in our indexing engines, so that
- you can focus on building your application, without worrying about the
- details of metasearch logic. You can devote all of your attention to
- usability and let Pazpar2 do what it does best -- metasearch.
-
-
- If you wish to connect to commercial or other databases which do not
- support open standards, please contact Index Data. We have a licensing
- agreement with a third party vendor which will enable Pazpar2 to access
- thousands of online databases, in addition to the vast number of catalogs
- and online services that support the Z39.50/SRU/SRW protocols.
-
-
- Pazpar2 is our attempt to re-think the traditional paradigms for
- implementing and deploying metasearch logic, with an uncompromising
- approach to performance, and attempting to make maximum use of the
- capabilities of modern browsers. The demo user interface that
- accompanies the distribution is but one example. If you think of new
- ways of using Pazpar2, we hope you'll share them with us, and if we
- can provide assistance with regards to training, design, programming,
- integration with different backends, hosting, or support, please don't
- hesitate to contact us. If you'd like to see functionality in Pazpar2
- that is not there today, please don't hesitate to contact us. It may
- already be in our development pipeline, or there might be a
- possibility for you to help out by sponsoring development time or
- code. Either way, get in touch and we will give you straight answers.
-
-
- Enjoy!
-
-
- Pazpar2 is covered by the GNU license version 2.
- See for further information.
-
+
+
+ What Pazpar2 is
+
+ Pazpar2 is a stand-alone metasearch engine with a web-service API, designed
+ to be used either from a browser-based client (JavaScript, Flash,
+ Java applet,
+ etc.), from server-side code, or any combination of the two.
+ Pazpar2 is a highly optimized client designed to
+ search many resources in parallel. It implements record merging,
+ relevance-ranking and sorting by arbitrary data content, and facet
+ analysis for browsing purposes. It is designed to be data-model
+ independent, and is capable of working with MARC, DublinCore, or any
+ other XML-structured response format
+ -- XSLT is used to normalize and extract
+ data from retrieval records for display and analysis. It can be used
+ against any server which supports the
+ Z39.50, SRU/SRW
+ or SOLR protocol. Proprietary
+ backend modules can function as connectors between these standard
+ protocols and any non-standard API, including web-site scraping, to
+ support a large number of other protocols.
+
+
+ Additional functionality such as
+ user management and attractive displays are expected to be implemented by
+ applications that use Pazpar2. Pazpar2 itself is user-interface independent.
+ Its functionality is exposed through a simple XML-based web-service API,
+ designed to be easy to use from an Ajax-enabled browser, Flash
+ animation, Java applet, etc., or from a higher-level server-side language
+ like PHP, Perl or Java. Because session information can be shared between
+ browser-based logic and server-side scripting, there is tremendous
+ flexibility in how you implement application-specific logic on top
+ of Pazpar2.
+
+
+ Once you launch a search in Pazpar2, the operation continues behind the
+ scenes. Pazpar2 connects to servers, carries out searches, and
+ retrieves, deduplicates, and stores results internally. Your application
+ code may periodically inquire about the status of an ongoing operation,
+ and ask to see records or result set facets. Results become
+ available immediately, and it is easy to build end-user interfaces that
+ feel extremely responsive, even when searching more than 100 servers
+ concurrently.
+
+
+ Pazpar2 is designed to be highly configurable. Incoming records are
+ normalized to XML/UTF-8, and then further normalized using XSLT to a
+ simple internal representation that is suitable for analysis. By
+ providing XSLT stylesheets for different kinds of result records, you
+ can configure Pazpar2 to work against different kinds of information
+ retrieval servers. Finally, metadata is extracted in a configurable
+ way from this internal record, to support display, merging, ranking,
+ result set facets, and sorting. Pazpar2 is not bound to a specific model
+ of metadata, such as DublinCore or MARC: by providing the right
+ configuration, it can work with any combination of different kinds of data
+ in support of many different applications.
+
+
+ Pazpar2 is designed to be efficient and scalable. You can set it up to
+ search several hundred targets in parallel, or you can use it to support
+ hundreds of concurrent users. It is implemented with the same attention
+ to performance and economy that we use in our indexing engines, so that
+ you can focus on building your application without worrying about the
+ details of metasearch logic. You can devote all of your attention to
+ usability and let Pazpar2 do what it does best -- metasearch.
+
+
+ Pazpar2 is our attempt to re-think the traditional paradigms for
+ implementing and deploying metasearch logic, with an uncompromising
+ approach to performance, and attempting to make maximum use of the
+ capabilities of modern browsers. The demo user interface that
+ accompanies the distribution is but one example. If you think of new
+ ways of using Pazpar2, we hope you'll share them with us, and if we
+ can provide assistance with regards to training, design, programming,
+ integration with different backends, hosting, or support, please don't
+ hesitate to contact us. If you'd like to see functionality in Pazpar2
+ that is not there today, please don't hesitate to contact us. It may
+ already be in our development pipeline, or there might be a
+ possibility for you to help out by sponsoring development time or
+ code. Either way, get in touch and we will give you straight answers.
+
+
+ Enjoy!
+
+
+ Pazpar2 is covered by the GNU General Public License (GPL) version 2.
+ See for further information.
+
+
+
+
+ Connectors to non-standard databases
+
+ If you need to access commercial or open access resources that don't support
+ Z39.50 or SRU, one approach would be to use a tool like SimpleServer to build a
+ gateway. An easier option is to use Index Data's MasterKey Connect
+ service, which will expose virtually any resource
+ through Z39.50/SRU, making it easy to integrate with Pazpar2.
+ The service is hosted, so all you have to do is let us
+ know which resources you are interested in, and we operate the gateways,
+ or Connectors, for you for a low annual charge.
+ Types of resources supported include
+ commercial databases, free online resources, and even local resources;
+ almost anything that can be accessed through a web-facing user
+ interface can be accessed in this way.
+ Contact info@indexdata.com for more information.
+ See for an example.
+
+
+
+
+ A note on the name Pazpar2
+
+ The name Pazpar2 derives from three sources. On one hand, it is
+ Index Data's second major piece of software that does parallel
+ searching of Z39.50 targets. On the other, it is a near-homophone
+ of Passepartout, the ever-helpful servant in Jules Verne's novel
+ Around the World in Eighty Days (who helpfully uses the language
+ of his master). Finally, "passe par tout" means something like
+ "passes through anything" in French -- in other words, a universal
+ solution, or if you like a MasterKey.
+
+
Installation
- The Pazpar2 package is very small. It includes documentation as well
+ The Pazpar2 package includes documentation as well
as the Pazpar2 server. The package also includes a simple user
- interface test1 which consists of a single HTML page and a single
+ interface called "test1", which consists of a single HTML page and a single
JavaScript file to illustrate the use of Pazpar2.
Pazpar2 depends on the following tools/libraries:
YAZ
-
-
- The popular Z39.50 toolkit for the C language.
- YAZ must be compiled with Libxml2/Libxslt support.
-
-
+
+
+ The popular Z39.50 toolkit for the C language.
+ YAZ must be compiled with Libxml2/Libxslt support.
+
+
International
- Components for Unicode (ICU)
-
-
- ICU provides Unicode support for non-English languages with
- character sets outside the range of 7bit ASCII, like
- Greek, Russian, German and French. Pazpar2 uses the ICU
- Unicode character conversions, Unicode normalization, case
- folding and other fundamental operations needed in
- tokenization, normalization and ranking of records.
-
-
- Compiling, linking, and usage of the ICU libraries is optional,
- but strongly recommended for usage in an international
- environment.
-
-
+ Components for Unicode (ICU)
+
+
+ ICU provides Unicode support for non-English languages with
+ character sets outside the range of 7bit ASCII, like
+ Greek, Russian, German and French. Pazpar2 uses the ICU
+ Unicode character conversions, Unicode normalization, case
+ folding and other fundamental operations needed in
+ tokenization, normalization and ranking of records.
+
+
+ Compiling, linking, and usage of the ICU libraries is optional,
+ but strongly recommended for usage in an international
+ environment.
+
+
@@ -191,32 +235,36 @@
- Installation on Unix (from Source)
+ Installation from source on Unix (including Linux, MacOS, etc.)
The latest source code for Pazpar2 is available from
.
- Only few systems have none of the required
- tools binary packages.
- If, for example, Libxml2/libXSLT libraries
- are already installed as development packages use these.
+ Most Unix-based operating systems have the required
+ tools available as binary packages.
+ For example, if Libxml2/libXSLT libraries
+ are already installed as development packages, use these.
-
+
- Ensure that the development libraries + header files are
+ Ensure that the development libraries and header files are
available on your system before compiling Pazpar2. For installation
- of YAZ, refer to the YAZ installation chapter.
+ of YAZ, refer to the Installation chapter of the YAZ manual at
+ .
+
+
+ Once the dependencies are in place, Pazpar2 can be unpacked and
+ installed as follows:
- gunzip -c pazpar2-version.tar.gz|tar xf -
- cd pazpar2-version
+ tar xzf pazpar2-VERSION.tar.gz
+ cd pazpar2-VERSION
./configure
make
- su
- make install
+ sudo make install
The make install will install manpages as well as the
- Pazpar2 server, pazpar2,
+ Pazpar2 server, pazpar2,
in PREFIX/sbin.
By default, PREFIX is /usr/local/ . This can be
changed with configure option .
@@ -224,63 +272,67 @@
- Installation on Windows (from Source)
-
- Pazpar2 can be built for Windows using
- Microsoft Visual Studio.
- The support files for building YAZ on Windows are located in the
- win directory. The compilation is performed
- using the win/makefile which is to be
- processed by the NMAKE utility part of Visual Studio.
-
-
- Ensure that the development libraries + header files are
- available on your system before compiling Pazpar2. For installation
- of YAZ, refer to the YAZ installation chapter.
- It is easiest if YAZ and Pazpar2 are unpacked in the same
- directory (side-by-side).
-
-
- The compilation is tuned by editing the makefile of Pazpar2.
- The process is similar to YAZ. Adjust the various directories
- YAZ_DIR, ZLIB_DIR, ..
-
-
- Compile Pazpar2 by invoking nmake in
- the win directory.
- The resulting binaries of the build process are located in the
- bin of the Pazpar2 source
- tree - including the pazpar2.exe and necessary DLLs.
-
-
- The Windows version of Pazpar2 is a console application. It may
- be installed as a Windows Service by adding option
- -install for the pazpar2 program. This will
- register Pazpar2 as a service and use the other options provided
- in the same invocation. For example:
-
- cd \MyPazpar2\etc
- ..\bin\pazpar2 -install -f pazpar2.cfg -l pazpar2.log
-
- The Pazpar2 service may now be controlled via the Service Control
- Panel. It may be unregistered by passing the -remove
- option. Example:
-
- cd \MyPazpar2\etc
- ..\bin\pazpar2 -remove
-
-
+ Installation from source on Windows
+
+ Pazpar2 can be built for Windows using
+ Microsoft Visual Studio.
+ The support files for building YAZ on Windows are located in the
+ win directory. The compilation is performed
+ using the win/makefile which is to be
+ processed by the NMAKE utility part of Visual Studio.
+
+
+ Ensure that the development libraries and header files are
+ available on your system before compiling Pazpar2. For installation
+ of YAZ, refer to
+ the Installation chapter of the YAZ manual at
+ .
+ It is easiest if YAZ and Pazpar2 are unpacked in the same
+ directory (side-by-side).
+
+
+ The compilation is tuned by editing the makefile of Pazpar2.
+ The process is similar to YAZ. Adjust the various directories
+ YAZ_DIR, ZLIB_DIR, etc.,
+ as required.
+
+
+ Compile Pazpar2 by invoking nmake in
+ the win directory.
+ The resulting binaries of the build process are located in the
+ bin of the Pazpar2 source
+ tree - including the pazpar2.exe and necessary DLLs.
+
+
+ The Windows version of Pazpar2 is a console application. It may
+ be installed as a Windows Service by adding option
+ -install for the pazpar2 program. This will
+ register Pazpar2 as a service and use the other options provided
+ in the same invocation. For example:
+
+ cd \MyPazpar2\etc
+ ..\bin\pazpar2 -install -f pazpar2.cfg -l pazpar2.log
+
+ The Pazpar2 service may now be controlled via the Service Control
+ Panel. It may be unregistered by passing the -remove
+ option. Example:
+
+ cd \MyPazpar2\etc
+ ..\bin\pazpar2 -remove
+
+
- Installation of test1 interface
+ Installation of test interfaces
- In this section we outline how to install a simple interface that
- is part of the Pazpar2 source package. Note that Debian users can
- save time by just installing package pazpar2-test1.
+ In this section we show how to make available the set of simple
+ interfaces that are part of the Pazpar2 source package, and which
+ demonstrate some ways to use Pazpar2. (Note that Debian users can
+ save time by just installing the package pazpar2-test1.)
- A web server must be installed and running on the system, such as Apache.
+ A web server, such as Apache, must be installed and running on the system.
@@ -297,76 +349,103 @@
copy pazpar2.cfg.dist pazpar2.cfg
..\bin\pazpar2 -f pazpar2.cfg
- This will start a Pazpar2 listener on port 9004. It will proxy
- HTTP requests to localhost - port 80, which we assume will be the regular
+ This will start a Pazpar2 listener on port 9004. It will proxy
+ HTTP requests to port 80 on localhost, which we assume will be the regular
HTTP server on the system. Inspect and modify pazpar2.cfg as needed
- if this is to be changed. The pazpar2.cfg includes settings from the
+ if this is to be changed. The pazpar2.cfg file includes settings from the
file settings/edu.xml
to use for searches.
+
- Make a new console and move to the other stuff.
- For more information about pazpar2 options refer to the manpage.
+ The test UIs are located in www. Ensure that this
+ directory is available to the web server by copying
+ www to the document root,
+ using Apache's Alias directive, or
+ creating a symbolic link: for example, on a Debian or Ubuntu
+ system with Apache2 installed from the standard package, you might
+ make the link as follows:
+
+ cd .../pazpar2
+ sudo ln -s `pwd`/www /var/www/pazpar2-demo
+
- The test1 UI is located in www/test1. Ensure this
- directory is available to the web server by either copying
- test1 to the document root, create a symlink or
- use Apache's Alias directive.
+ This makes the test applications visible at
+
+ but they cannot be run successfully from that URL, as they submit
+ search requests back to the server from which they were served,
+ and Apache2 doesn't know how to handle them. Instead, the test
+ applications must be accessed from Pazpar2 itself, acting as a
+ proxy to Apache2, at the URL
+
- The interface test1 interface should now be available on port 8004.
+ From here, the demo applications can be
+ accessed: test1, test2 and
+ jsdemo
+ are pure HTML+JavaScript setups, needing no server-side
+ intelligence;
+ demo
+ requires PHP on the server.
- If you don't see the test1 interface. See if test1 is really available
- on the same URL but on port 80. If it's not, the Apache configuration
- (or other) is not correct.
+ If you don't see the test interfaces, check whether they are available
+ on port 80 (i.e. directly from the Apache2 server). If not, the
+ Apache configuration is incorrect.
In order to use Apache as frontend for the interface on port 80
- for public access etc., refer to
+ for public access etc., refer to
.
- Installation on Debian GNU/Linux
+ Installation on Debian GNU/Linux and Ubuntu
- Index Data provides Debian packages for Pazpar2. These are prepared
- for Debian versions Etch and Lenny (as of 2007).
- These packages are available at
- .
+ Index Data provides Debian and Ubuntu packages for Pazpar2.
+ As of February 2010, these
+ are prepared for Debian versions Etch, Lenny and Squeeze; and for
+ Ubuntu versions 8.04 (hardy), 8.10 (intrepid), 9.04 (jaunty) and
+ 9.10 (karmic). These packages are available at
+ and
+ .
Apache 2 Proxy
- Apache 2 has a
-
+ Apache 2 has a
+
proxy module
- which allows Pazpar2 to become a backend to an Apache 2
+
+ which allows Pazpar2 to become a backend to an Apache 2
based web service. The Apache 2 proxy must operate in the
Reverse Proxy mode.
-
+
On a Debian based Apache 2 system, the relevant modules can
be enabled with:
- sudo a2enmod proxy_http
+ sudo a2enmod proxy_http proxy_balancer
- Traditionally Pazpar2 interprets URL paths with suffix
+ Traditionally Pazpar2 interprets URL paths with suffix
/search.pz2.
- The
- ProxyPass directive of Apache must be used to map a URL path
+ The
+
+ ProxyPass
+
+ directive of Apache must be used to map a URL path
to the Pazpar2 server (listening port).
@@ -389,13 +468,13 @@
ProxyRequests Off
-
+
AddDefaultCharset off
Order deny,allow
Allow from all
-
+
ProxyPass /myportal/search.pz2 http://localhost:8004/search.pz2
ProxyVia Off
@@ -410,7 +489,7 @@
Using Pazpar2
This chapter provides a general introduction to the use and
- deployment of Pazpar2.
+ deployment of Pazpar2.
@@ -443,7 +522,7 @@
with the server from which the enclosing HTML page or object
originated, Pazpar2 is designed so that it can act as a transparent
proxy in front of an existing webserver (see for details).
+ linkend="pazpar2_conf"/> for details).
In this mode, all regular
HTTP requests are transparently passed through to your webserver,
while Pazpar2 only intercepts search-related webservice requests.
@@ -506,18 +585,17 @@
The intermediate, internal representation of the record looks like
this:
-
- The Shining
+
- King, Stephen
+ The Shining
- ebook
+ King, Stephen
-
-
- ]]>
+ ebook
+
+
+]]>
As you can see, there isn't much to it. There are really only a few
important elements to this file.
@@ -534,7 +612,8 @@
records are never merged. The 'metadata' elements provide the meat
of the elements -- the content. The 'type' attribute is used to
match each element against processing rules that determine what
- happens to the data element next.
+ happens to the data element next. The 'rank' attribute
+ specifies a multiplier for ranking for this element.
@@ -545,6 +624,31 @@
in the retrieval record ultimately drives merging, sorting, ranking,
the extraction of browse facets, and display, all configurable.
+
+
+ Pazpar2 1.6.37 and later also allows already clustered records to
+ be ingested. Suppose a database already clusters for us and we would like
+ to keep that cluster for Pazpar2. In that case we can generate a
+ cluster wrapper element that holds individual
+ record elements.
+
+
+ Cluster record example:
+
+
+ The Shining
+ King, Stephen
+ ebook
+
+
+ The Shining
+ King, Stephen
+ audio
+
+
+ ]]>
+
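As a rough illustration of the cluster layout described above, the following Python sketch builds a cluster wrapper holding two record elements. It is only a sketch under stated assumptions: the field type names (`title`, `author`, `medium`) and the helper `make_record` are hypothetical, and the namespace declaration that a real normalization stylesheet would emit is left out.

```python
import xml.etree.ElementTree as ET


def make_record(fields):
    """Build one 'record' element from (type, value) pairs.

    Each pair becomes a 'metadata' child whose 'type' attribute names
    the field. The type names used below are illustrative only.
    """
    record = ET.Element("record")
    for field_type, value in fields:
        meta = ET.SubElement(record, "metadata", {"type": field_type})
        meta.text = value
    return record


# A cluster wrapper that keeps two already-clustered records together.
cluster = ET.Element("cluster")
cluster.append(make_record([
    ("title", "The Shining"),
    ("author", "King, Stephen"),
    ("medium", "ebook"),
]))
cluster.append(make_record([
    ("title", "The Shining"),
    ("author", "King, Stephen"),
    ("medium", "audio"),
]))

print(ET.tostring(cluster, encoding="unicode"))
```

In a real deployment this structure would normally be produced by the XSLT normalization step rather than by application code; the sketch is only meant to make the nesting visible.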
@@ -571,81 +675,12 @@
search. You start a new search using the 'search' command. Once the
search has been started, you can follow its progress using the
'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records
- can be fetched using the 'record' command.
+ can be fetched using the 'record' command.
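The command sequence described above might be driven from server-side code roughly as follows. This is a minimal sketch, not a definitive client: it only builds request URLs, the base URL assumes a local Pazpar2 listening on port 9004, and the `command` and `query` parameter names are assumptions that are not spelled out in this section (only the `session` parameter is mentioned elsewhere in this chapter).

```python
from urllib.parse import urlencode

# Assumed endpoint: a local Pazpar2 instance on port 9004 (hypothetical).
BASE_URL = "http://localhost:9004/search.pz2"


def command_url(command, session=None, **params):
    """Return the URL for one webservice command.

    'command' is the command name ('init', 'search', 'stat', 'show', ...);
    'session' is the session identifier obtained when the session was
    created; any further keyword arguments are passed as query parameters.
    """
    query = {"command": command}
    if session is not None:
        query["session"] = session
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# A typical sequence: create a session, start a search, then poll.
print(command_url("init"))
print(command_url("search", session="12345", query="shining"))
print(command_url("stat", session="12345"))
print(command_url("show", session="12345"))
```

An application would fetch each URL with its HTTP client of choice and parse the XML responses; the session value `12345` is a placeholder.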
&sect-ajaxdev;
-
- Connecting to non-standard resources
-
- Pazpar2 uses Z39.50 as its switchboard language -- i.e. as far as it
- is concerned, all resources speak Z39.50, or its webservices derivatives,
- SRU/SRW. It is, however, equipped
- to handle a broad range of different server behavior, through
- configurable query mapping and record normalization. If you develop
- configuration, stylesheets, etc., for a new type of resources, we
- encourage you to share your work. But you can also use Pazpar2 to
- connect to hundreds of resources that do not support standard
- protocols.
-
-
-
- For a growing number of resources, Z39.50 is all you need. Over the
- last few years, a number of commercial, full-text resources have
- implemented Z39.50. These can be used through Pazpar2 with little or
- no effort. Resources that use non-standard record formats will
- require a bit of XSLT work, but that's all.
-
-
-
- But what about resources that don't support Z39.50 at all? Some resources might
- support OpenSearch, private, XML/HTTP-based protocols, or something
- else entirely. Some databases exist only as web user interfaces and
- will require screen-scraping. Still others exist only as static
- files, or perhaps as databases supporting the OAI-PMH protocol.
- There is hope! Read on.
-
-
-
- Index Data continues to advocate the support of open standards. We
- work with database vendors to support standards, so you don't have
- to worry about programming against non-standard services. We also
- provide tools (see SimpleServer)
- which make it comparatively easy to build gateways against servers
- with non-standard behavior. Again, we encourage you to share any
- work you do in this direction.
-
-
-
- But the bottom line is that working with non-standard resources in
- metasearching is really, really hard. If you want to build a
- project with Pazpar2, and you need access to resources with
- non-standard interfaces, we can help. We run gateways to more than
- 2,000 popular, commercial databases and other resources,
- making it simple
- to plug them directly into Pazpar2. For a small annual fee per
- database, we can help you establish connections to your licensed
- resources. Meanwhile, you can help! If you build your own
- standards-compliant gateways, host them for others, or share the
- code! And tell your vendors that they can save everybody money and
- increase the appeal of their resources by supporting standards.
-
-
-
- There are those who will ask us why we are using Z39.50 as our
- switchboard language rather than a different protocol. Basically,
- we believe that Z39.50 is presently the most widely implemented
- information retrieval protocol that has the level of functionality
- required to support a good metasearching experience (structured
- searching, structured, well-defined results). It is also compact and
- efficient, and there is a very broad range of tools available to
- implement it.
-
-
-
Unicode Compliance
@@ -667,11 +702,206 @@
In addition, the ICU tokenization and normalization rules must
- be defined in the master configuration file described in
+ be defined in the master configuration file described in
.
+
+ Load balancing
+
+ Just like any web server, Pazpar2 can be load balanced by a standard
+ hardware or software load balancer as long as session stickiness
+ is ensured. If you are already running the Apache2 web server in front
+ of Pazpar2 and use the Apache mod_proxy module to 'relay' client
+ requests to Pazpar2, this setup can be easily extended to include
+ load balancing capabilities.
+ To do so you need to enable the
+
+ mod_proxy_balancer
+
+ module in your Apache2 installation.
+
+
+
+ On a Debian based Apache 2 system, the relevant modules can
+ be enabled with:
+
+ sudo a2enmod proxy_http proxy_balancer
+
+
+
+
+ The mod_proxy_balancer can pass all 'sessionsticky' requests to the
+ same backend worker as long as the requests are marked with the
+ originating worker's ID (called 'route'). If the Pazpar2 serverID is
+ configured (by setting an 'id' attribute on the 'server' element in
+ the Pazpar2 configuration file) Pazpar2 will append it to the
+ 'session' element returned during the 'init' in a mod_proxy_balancer
+ compatible manner.
+ Since the 'session' is then re-sent by the client (for all Pazpar2
+ requests besides 'init'), the balancer can use the marker to pass
+ the request to the right route. To do so the balancer needs to be
+ configured to inspect the 'session' parameter.
+
+
+
+ Apache 2 load balancing configuration
+
+ With four Pazpar2 instances running on the same host on ports
+ 8004-8007, with server IDs pz1, pz2, pz3 and pz4 respectively, we
+ could use the following Apache 2 configuration to expose a single
+ Pazpar2 'endpoint' at a standard
+ (/pazpar2/search.pz2) location:
+
+
+ AddDefaultCharset off
+ Order deny,allow
+ Allow from all
+
+ ProxyVia Off
+
+ # 'route' has to match the configured pazpar2 server ID
+ <Proxy balancer://pz2cluster>
+ BalancerMember http://localhost:8004 route=pz1
+ BalancerMember http://localhost:8005 route=pz2
+ BalancerMember http://localhost:8006 route=pz3
+ BalancerMember http://localhost:8007 route=pz4
+ </Proxy>
+
+ # route is resent in the 'session' param which has the form:
+ # 'sessid.serverid', understandable by mod_proxy_balancer
+ # this is not going to work if the client tampers with the 'session' param
+ ProxyPass /pazpar2/search.pz2 balancer://pz2cluster lbmethod=byrequests stickysession=session nofailover=On
+ ]]>
+
+ The 'ProxyPass' line sets up a reverse proxy for requests to
+ "/pazpar2/search.pz2" and delegates all requests to the load balancer
+ (virtual worker) with name "pz2cluster".
+ Sticky sessions are enabled and implemented using the "session" parameter.
+ The "Proxy" section lists all the servers (real workers) which the
+ load balancer can use.
+
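To make the routing rule concrete, here is a small Python sketch of what the balancer does conceptually with the 'session' parameter. It is an illustration only, not how mod_proxy_balancer is implemented: the helper names are hypothetical, and the route table simply mirrors the four-instance example above.

```python
# Route table mirroring the example configuration (server ID -> backend).
ROUTES = {
    "pz1": "http://localhost:8004",
    "pz2": "http://localhost:8005",
    "pz3": "http://localhost:8006",
    "pz4": "http://localhost:8007",
}


def split_session(session):
    """Split a 'sessid.serverid' value into (session id, route).

    The route is None when the value carries no server ID, for example
    when no 'id' attribute is configured on the 'server' element.
    """
    sessid, sep, route = session.partition(".")
    return sessid, (route if sep else None)


def backend_for(session):
    """Return the backend URL a sticky request should go to, or None."""
    _, route = split_session(session)
    return ROUTES.get(route)


print(backend_for("12345.pz3"))
```

A request whose 'session' value carries no known route cannot be pinned to a worker, which is why the server ID must be configured on every Pazpar2 instance behind the balancer.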
+
+
+
+
+
+
+ Relevance ranking
+
+ Pazpar2 uses a variant of the term frequency-inverse document frequency
+ (Tf-idf) ranking algorithm.
+
+
+ The Tf-part is straightforward to calculate and is based on the
+ documents that Pazpar2 fetches. The idf-part, however, is more tricky
+ since the corpus at hand is ONLY the relevant documents and not
+ irrelevant ones. Pazpar2 does not have the full corpus -- only the
+ documents that match a particular search.
+
+
+ Computation of the Tf-part is based on the normalized documents.
+ The length, the position and terms are thus normalized at this point.
+ Also, the computation is performed for each document received from the
+ target - before merging takes place. The result of a TF-computation is
+ added to the TF-total of a cluster. Thus, if a document occurs twice,
+ then the TF-part is doubled. That, however, can be adjusted, because the
+ TF-part may be divided by the number of documents in a cluster.
+
+
+ The algorithm used by Pazpar2 has two phases. In phase one
+ Pazpar2 computes a tf-array. This is done as records are
+ fetched from the database. In this case, the rank weight
+ w and the rank tweaks lead,
+ follow and length are used.
+
+
+ 0)
+ w[i] += w[i] * follow / (1+log2(d))
+ // length: length of field (number of terms that is)
+ if (length strategy is "linear")
+ tf[i] += w[i] / length;
+ else if (length strategy is "log")
+ tf[i] += w[i] / log2(length);
+ else if (length strategy is "none")
+ tf[i] += w[i];
+ ]]>
+
+ In phase two, the idf-array and the final score are computed.
+ This is done for each cluster as part of each show command.
+ The rank tweak cluster is in use here.
+
+ 0)
+ idf[i] = log(1 + doctotal / dococcur[i])
+ else
+ idf[i] = 0;
+
+ relevance = 0;
+ for i = 1, .., N: (each term)
+ if (cluster is "yes")
+ tf[i] = tf[i] / cluster_size;
+ relevance += 100000 * tf[i] * idf[i];
+ ]]>
+
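The phase-two computation can be sketched in Python as follows. This is a hedged illustration rather than Pazpar2's actual code: it assumes the usual Tf-idf product of the two factors, takes `doctotal` to be the number of documents received and `dococcur[i]` the number of those containing term i, and models the 'cluster' rank tweak as a boolean. The function names are hypothetical.

```python
import math


def idf_vector(doctotal, dococcur):
    """Compute idf[i] = log(1 + doctotal / dococcur[i]), or 0 if unseen."""
    return [math.log(1 + doctotal / n) if n > 0 else 0.0 for n in dococcur]


def relevance(tf, idf, cluster_size, cluster=True):
    """Combine per-term tf and idf values into one cluster score.

    When 'cluster' is true, each tf value is divided by the number of
    documents in the cluster, so that duplicates do not inflate the score.
    """
    score = 0.0
    for tf_i, idf_i in zip(tf, idf):
        if cluster:
            tf_i = tf_i / cluster_size
        score += 100000 * tf_i * idf_i
    return score


# Two query terms; the second one never occurs in any document.
idf = idf_vector(doctotal=10, dococcur=[10, 0])
print(relevance(tf=[2.0, 0.0], idf=idf, cluster_size=2))
```

A term that occurs in no document gets an idf of 0 and therefore contributes nothing to the score, which is the reason for the guard on `dococcur[i]`.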
+ For controlling the ranking parameters, refer to the
+ rank element of the
+ service definition.
+ Refer to the rank attribute
+ of the metadata element for how to control ranking for individual
+ metadata fields.
+
+
+
+
+ Pazpar2 and MasterKey Connect
+
+ MasterKey Connect is a hosted connector, or gateway, service that exposes
+ whatever searchable resources you need. Since the service exposes all
+ resources using Z39.50 (or SRU), it is easy to set up Pazpar2 to use the
+ service. In particular, since all connectors expose basically the same core
+ behavior, it is a good use of Pazpar2's mechanism for managing default
+ behaviors across similar databases.
+
+
+ After installation of Pazpar2, the directory
+ /etc/pazpar2/settings/mkc (location may
+ vary depending on installation preferences) contains an example setup that
+ searches two different resources through a MasterKey Connect demo account.
+ The file mkc.xml contains default parameters that will work for all
+ MasterKey Connect resources (if you decide to become a customer of the
+ service, you will substitute your own account credentials for
+ the guest/guest). The other files contain specific information about
+ a couple of demonstration resources.
+
+
+
+ To play with the demo, just create a symlink from
+ /etc/pazpar2/services-enabled/default.xml
+ to /etc/pazpar2/services-available/mkc.xml,
+ and restart Pazpar2. You should now be able to search the two demo
+ resources using JSDemo or any user interface of your choice.
+ If you are interested in learning more about MasterKey Connect, or to
+ try out the service for free against your favorite online resource, just
+ contact us at info@indexdata.com.
+
+
+
@@ -685,51 +915,44 @@
&manref;
- License
-
-
- Pazpar2,
- Copyright © &copyright-year; Index Data.
-
-
-
- Pazpar2 is free software; you can redistribute it and/or modify it under
- the terms of the GNU General Public License as published by the Free
- Software Foundation; either version 2, or (at your option) any later
- version.
-
-
-
- Pazpar2 is distributed in the hope that it will be useful, but WITHOUT ANY
- WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
- for more details.
-
-
-
- You should have received a copy of the GNU General Public License
- along with Pazpar2; see the file LICENSE. If not, write to the
- Free Software Foundation,
- 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
-
+
+ License
+
+
+ Pazpar2,
+ Copyright © &copyright-year; Index Data.
+
+
+
+ Pazpar2 is free software; you can redistribute it and/or modify it under
+ the terms of the GNU General Public License as published by the Free
+ Software Foundation; either version 2, or (at your option) any later
+ version.
+
+
+
+ Pazpar2 is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ for more details.
+
+
+
+ You should have received a copy of the GNU General Public License
+ along with Pazpar2; see the file LICENSE. If not, write to the
+ Free Software Foundation,
+ 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+
&gpl2;
-
+
-
+