X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fbook.xml;h=58e9ca6da36de1707880354a304f71bcf3972bb1;hb=c7f4f3051cc7215e8513a7ddeb765d7db64c8f11;hp=0bbd98e4c0ae7df7a3605c26c836e6baf7d7b182;hpb=7ab0676c686a45df10325dc96ec42e5f3210d9d7;p=pazpar2-moved-to-github.git diff --git a/doc/book.xml b/doc/book.xml index 0bbd98e..58e9ca6 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -1,6 +1,6 @@ - %local; @@ -24,6 +24,12 @@ Jakub Skoczen + + Mike Taylor + + + Dennis Schafroth + &version; &copyright-year; @@ -31,10 +37,12 @@ - Pazpar2 is a high-performance, user interface-independent, data - model-independent metasearching - middle-ware featuring merging, relevance ranking, record sorting, + Pazpar2 is a high-performance metasearch engine featuring + merging, relevance ranking, record sorting, and faceted results. + It is middleware: it has no user interface of its own, but can be + configured and controlled by an XML-over-HTTP web-service to provide + metasearching functionality behind any user interface. This document is a guide and reference to Pazpar2 version &version;. @@ -43,145 +51,181 @@ - - - - - + + + + + - + Introduction - - Pazpar2 is a stand-alone metasearch client with a web-service API, designed - to be used either from a browser-based client (JavaScript, Flash, Java, - etc.), from server-side code, or any combination of the two. - Pazpar2 is a highly optimized client designed to - search many resources in parallel. It implements record merging, - relevance-ranking and sorting by arbitrary data content, and facet - analysis for browsing purposes. It is designed to be data model - independent, and is capable of working with MARC, DublinCore, or any - other XML-structured response format - -- XSLT is used to normalize and extract - data from retrieval records for display and analysis. It can be used - against any server which supports the - Z39.50 and SRU/SRW - protocol. 
Proprietary - backend modules can be used to support a large number of other protocols - (please contact Index Data for further information about this). - - - Additional functionality such as - user management, attractive displays are expected to be implemented by - applications that use Pazpar2. Pazpar2 is user interface independent. - Its functionality is exposed through a simple REST-style web-service API, - designed to be simple to use from an Ajax-enabled browser, Flash - animation, Java applet, etc., or from a higher-level server-side language - like PHP or Java. Because session information can be shared between - browser-based logic and your server-side scripting, there is tremendous - flexibility in how you implement your business logic on top of Pazpar2. - - - Once you launch a search in Pazpar2, the operation continues behind the - scenes. Pazpar2 connects to servers, carries out searches, and - retrieves, deduplicates, and stores results internally. Your application - code may periodically inquire about the status of an ongoing operation, - and ask to see records or other result set facets. Result becomes - available immediately, and it is easy to build end-user interfaces which - feel extremely responsive, even when searching more than 100 servers - concurrently. - - - Pazpar2 is designed to be highly configurable. Incoming records are - normalized to XML/UTF-8, and then further normalized using XSLT to a - simple internal representation that is suitable for analysis. By - providing XSLT stylesheets for different kinds of result records, you - can tune Pazpar2 to work against different kinds of information - retrieval servers. Finally, metadata is extracted, in a configurable - way, from this internal record, to support display, merging, ranking, - result set facets, and sorting. 
Pazpar2 is not bound to a specific model - of metadata, such as DublinCore or MARC -- by providing the right - configuration, it can work with a number of different kinds of data in - support of many different applications. - - - Pazpar2 is designed to be efficient and scalable. You can set it up to - search several hundred targets in parallel, or you can use it to support - hundreds of concurrent users. It is implemented with the same attention - to performance and economy that we use in our indexing engines, so that - you can focus on building your application, without worrying about the - details of metasearch logic. You can devote all of your attention to - usability and let Pazpar2 do what it does best -- metasearch. - - - If you wish to connect to commercial or other databases which do not - support open standards, please contact Index Data. We have a licensing - agreement with a third party vendor which will enable Pazpar2 to access - thousands of online databases, in addition to the vast number of catalogs - and online services that support the Z39.50/SRU/SRW protocols. - - - Pazpar2 is our attempt to re-think the traditional paradigms for - implementing and deploying metasearch logic, with an uncompromising - approach to performance, and attempting to make maximum use of the - capabilities of modern browsers. The demo user interface that - accompanies the distribution is but one example. If you think of new - ways of using Pazpar2, we hope you'll share them with us, and if we - can provide assistance with regards to training, design, programming, - integration with different backends, hosting, or support, please don't - hesitate to contact us. If you'd like to see functionality in Pazpar2 - that is not there today, please don't hesitate to contact us. It may - already be in our development pipeline, or there might be a - possibility for you to help out by sponsoring development time or - code. Either way, get in touch and we will give you straight answers. 
- - - Enjoy! - - - Pazpar2 is covered by the GNU license version 2. - See for further information. - + +
+ What Pazpar2 is + + Pazpar2 is a stand-alone metasearch engine with a web-service API, designed + to be used either from a browser-based client (JavaScript, Flash, + Java applet, + etc.), from server-side code, or any combination of the two. + Pazpar2 is a highly optimized client designed to + search many resources in parallel. It implements record merging, + relevance-ranking and sorting by arbitrary data content, and facet + analysis for browsing purposes. It is designed to be data-model + independent, and is capable of working with MARC, DublinCore, or any + other XML-structured response format + -- XSLT is used to normalize and extract + data from retrieval records for display and analysis. It can be used + against any server which supports the + Z39.50, SRU/SRW + or SOLR protocol. Proprietary + backend modules can function as connectors between these standard + protocols and any non-standard API, including web-site scraping, to + support a large number of other protocols. + + + Additional functionality such as + user management and attractive displays are expected to be implemented by + applications that use Pazpar2. Pazpar2 itself is user-interface independent. + Its functionality is exposed through a simple XML-based web-service API, + designed to be easy to use from an Ajax-enabled browser, Flash + animation, Java applet, etc., or from a higher-level server-side language + like PHP, Perl or Java. Because session information can be shared between + browser-based logic and server-side scripting, there is tremendous + flexibility in how you implement application-specific logic on top + of Pazpar2. + + + Once you launch a search in Pazpar2, the operation continues behind the + scenes. Pazpar2 connects to servers, carries out searches, and + retrieves, deduplicates, and stores results internally. Your application + code may periodically inquire about the status of an ongoing operation, + and ask to see records or result set facets. 
Results become + available immediately, and it is easy to build end-user interfaces that + feel extremely responsive, even when searching more than 100 servers + concurrently. + + + Pazpar2 is designed to be highly configurable. Incoming records are + normalized to XML/UTF-8, and then further normalized using XSLT to a + simple internal representation that is suitable for analysis. By + providing XSLT stylesheets for different kinds of result records, you + can configure Pazpar2 to work against different kinds of information + retrieval servers. Finally, metadata is extracted in a configurable + way from this internal record, to support display, merging, ranking, + result set facets, and sorting. Pazpar2 is not bound to a specific model + of metadata, such as DublinCore or MARC: by providing the right + configuration, it can work with any combination of different kinds of data + in support of many different applications. + + + Pazpar2 is designed to be efficient and scalable. You can set it up to + search several hundred targets in parallel, or you can use it to support + hundreds of concurrent users. It is implemented with the same attention + to performance and economy that we use in our indexing engines, so that + you can focus on building your application without worrying about the + details of metasearch logic. You can devote all of your attention to + usability and let Pazpar2 do what it does best -- metasearch. + + + Pazpar2 is our attempt to re-think the traditional paradigms for + implementing and deploying metasearch logic, with an uncompromising + approach to performance, and attempting to make maximum use of the + capabilities of modern browsers. The demo user interface that + accompanies the distribution is but one example. 
If you think of new + ways of using Pazpar2, we hope you'll share them with us, and if we + can provide assistance with regards to training, design, programming, + integration with different backends, hosting, or support, please don't + hesitate to contact us. If you'd like to see functionality in Pazpar2 + that is not there today, please don't hesitate to contact us. It may + already be in our development pipeline, or there might be a + possibility for you to help out by sponsoring development time or + code. Either way, get in touch and we will give you straight answers. + + + Enjoy! + + + Pazpar2 is covered by the GNU General Public License (GPL) version 2. + See for further information. + +
+ +
+ Connectors to non-standard databases + + If you need to access commercial or open access resources that don't support + Z39.50 or SRU, one approach would be to use a tool like SimpleServer to build a + gateway. An easier option is to use Index Data's MasterKey Connect + service, which will expose virtually any resource + through Z39.50/SRU, making it easy to integrate with Pazpar2. + The service is hosted, so all you have to do is to let us + know which resources you are interested in, and we operate the gateways, + or Connectors, for you for a low annual charge. + Types of resources supported include + commercial databases, free online resources, and even local resources; + almost anything that can be accessed through a web-facing user + interface can be accessed in this way. + Contact info@indexdata.com for more information. + See for an example. + +
+ +
+ A note on the name Pazpar2 + + The name Pazpar2 derives from three sources. On one hand, it is + Index Data's second major piece of software that does parallel + searching of Z39.50 targets. On the other, it is a near-homophone + of Passpartout, the ever-helpful servant in Jules Verne's novel + Around the World in Eighty Days (who helpfully uses the language + of his master). Finally, "passe par tout" means something like + "passes through anything" in French -- in other words, a universal + solution, or if you like a MasterKey. + +
Installation - The Pazpar2 package is very small. It includes documentation as well + The Pazpar2 package includes documentation as well as the Pazpar2 server. The package also includes a simple user - interface test1 which consists of a single HTML page and a single + interface called "test1", which consists of a single HTML page and a single JavaScript file to illustrate the use of Pazpar2. Pazpar2 depends on the following tools/libraries: YAZ - - - The popular Z39.50 toolkit for the C language. - YAZ must be compiled with Libxml2/Libxslt support. - - + + + The popular Z39.50 toolkit for the C language. + YAZ must be compiled with Libxml2/Libxslt support. + + International - Components for Unicode (ICU) - - - ICU provides Unicode support for non-English languages with - character sets outside the range of 7bit ASCII, like - Greek, Russian, German and French. Pazpar2 uses the ICU - Unicode character conversions, Unicode normalization, case - folding and other fundamental operations needed in - tokenization, normalization and ranking of records. - - - Compiling, linking, and usage of the ICU libraries is optional, - but strongly recommended for usage in an international - environment. - - + Components for Unicode (ICU) + + + ICU provides Unicode support for non-English languages with + character sets outside the range of 7bit ASCII, like + Greek, Russian, German and French. Pazpar2 uses the ICU + Unicode character conversions, Unicode normalization, case + folding and other fundamental operations needed in + tokenization, normalization and ranking of records. + + + Compiling, linking, and usage of the ICU libraries is optional, + but strongly recommended for usage in an international + environment. + + @@ -191,32 +235,36 @@
- Installation on Unix (from Source) + Installation from source on Unix (including Linux, MacOS, etc.) The latest source code for Pazpar2 is available from . - Only few systems have none of the required - tools binary packages. - If, for example, Libxml2/libXSLT libraries - are already installed as development packages use these. + Most Unix-based operating systems have the required + tools available as binary packages. + For example, if Libxml2/libXSLT libraries + are already installed as development packages, use these. - + - Ensure that the development libraries + header files are + Ensure that the development libraries and header files are available on your system before compiling Pazpar2. For installation - of YAZ, refer to the YAZ installation chapter. + of YAZ, refer to the Installation chapter of the YAZ manual at + . + + + Once the dependencies are in place, Pazpar2 can be unpacked and + installed as follows: - gunzip -c pazpar2-version.tar.gz|tar xf - - cd pazpar2-version + tar xzf pazpar2-VERSION.tar.gz + cd pazpar2-VERSION ./configure make - su - make install + sudo make install The make install will install manpages as well as the - Pazpar2 server, pazpar2, + Pazpar2 server, pazpar2, in PREFIX/sbin. By default, PREFIX is /usr/local/ . This can be changed with configure option . @@ -224,63 +272,67 @@
- Installation on Windows (from Source) - - Pazpar2 can be built for Windows using - Microsoft Visual Studio. - The support files for building YAZ on Windows are located in the - win directory. The compilation is performed - using the win/makefile which is to be - processed by the NMAKE utility part of Visual Studio. - - - Ensure that the development libraries + header files are - available on your system before compiling Pazpar2. For installation - of YAZ, refer to the YAZ installation chapter. - It is easiest if YAZ and Pazpar2 are unpacked in the same - directory (side-by-side). - - - The compilation is tuned by editing the makefile of Pazpar2. - The process is similar to YAZ. Adjust the various directories - YAZ_DIR, ZLIB_DIR, .. - - - Compile Pazpar2 by invoking nmake in - the win directory. - The resulting binaries of the build process are located in the - bin of the Pazpar2 source - tree - including the pazpar2.exe and necessary DLLs. - - - The Windows version of Pazpar2 is a console application. It may - be installed as a Windows Service by adding option - -install for the pazpar2 program. This will - register Pazpar2 as a service and use the other options provided - in the same invocation. For example: - - cd \MyPazpar2\etc - ..\bin\pazpar2 -install -c pazpar2.cfg -l pazpar2.log - - The Pazpar2 service may now be controlled via the Service Control - Panel. It may be unregistered by passing the -remove - option. Example: - - cd \MyPazpar2\etc - ..\bin\pazpar2 -remove - - + Installation from source on Windows + + Pazpar2 can be built for Windows using + Microsoft Visual Studio. + The support files for building YAZ on Windows are located in the + win directory. The compilation is performed + using the win/makefile which is to be + processed by the NMAKE utility part of Visual Studio. + + + Ensure that the development libraries and header files are + available on your system before compiling Pazpar2. 
For installation + of YAZ, refer to + the Installation chapter of the YAZ manual at + . + It is easiest if YAZ and Pazpar2 are unpacked in the same + directory (side-by-side). + + + The compilation is tuned by editing the makefile of Pazpar2. + The process is similar to YAZ. Adjust the various directories + YAZ_DIR, ZLIB_DIR, etc., + as required. + + + Compile Pazpar2 by invoking nmake in + the win directory. + The resulting binaries of the build process are located in the + bin of the Pazpar2 source + tree - including the pazpar2.exe and necessary DLLs. + + + The Windows version of Pazpar2 is a console application. It may + be installed as a Windows Service by adding option + -install for the pazpar2 program. This will + register Pazpar2 as a service and use the other options provided + in the same invocation. For example: + + cd \MyPazpar2\etc + ..\bin\pazpar2 -install -f pazpar2.cfg -l pazpar2.log + + The Pazpar2 service may now be controlled via the Service Control + Panel. It may be unregistered by passing the -remove + option. Example: + + cd \MyPazpar2\etc + ..\bin\pazpar2 -remove + +
- Installation of test1 interface + Installation of test interfaces - In this section we outline how to install a simple interface that - is part of the Pazpar2 source package. Note that Debian users can - save time by just installing package pazpar2-test1. + In this section we show how to make available the set of simple + interfaces that are part of the Pazpar2 source package, and which + demonstrate some ways to use Pazpar2. (Note that Debian users can + save time by just installing the package pazpar2-test1.) - A web server must be installed and running on the system, such as Apache. + A web server, such as Apache, must be installed and running on the system. @@ -289,86 +341,111 @@ cd etc cp pazpar2.cfg.dist pazpar2.cfg - cp edu.xml settings ../src/pazpar2 -f pazpar2.cfg And on Windows: cd etc copy pazpar2.cfg.dist pazpar2.cfg - copy edu.xml settings ..\bin\pazpar2 -f pazpar2.cfg - This will start a Pazpar2 listener on port 9004. It will proxy - HTTP requests to localhost - port 80, which we assume will be the regular + This will start a Pazpar2 listener on port 9004. It will proxy + HTTP requests to port 80 on localhost, which we assume will be the regular HTTP server on the system. Inspect and modify pazpar2.cfg as needed - if this is to be changed. The pazpar2.cfg includes settings from the - directory settings. + if this is to be changed. The pazpar2.cfg file includes settings from the + file settings/edu.xml to use for searches. + - Make a new console and move to the other stuff. - For more information about pazpar2 options refer to the manpage. + The test UIs are located in www. 
Ensure that this + directory is available to the web server by copying + www to the document root, + using Apache's Alias directive, or + creating a symbolic link: for example, on a Debian or Ubuntu + system with Apache2 installed from the standard package, you might + make the link as follows: + + cd .../pazpar2 + sudo ln -s `pwd`/www /var/www/pazpar2-demo + - The test1 UI is located in www/test1. Ensure this - directory is available to the web server by either copying - test1 to the document root, create a symlink or - use Apache's Alias directive. + This makes the test applications visible at + + but they can not be run successfully from that URL, as they submit + search requests back to the server form which they were served, + and Apache2 doesn't know how to handle them. Instead, the test + applications must be accessed from Pazpar2 itself, acting as a + proxy to Apache2, at the URL + - The interface test1 interface should now be available on port 8004. + From here, the demo applications can be + accessed: test1, test2 and + jsdemo + are pure HTML+JavaScript setups, needing no server-side + intelligence; + demo + requires PHP on the server. - If you don't see the test1 interface. See if test1 is really available - on the same URL but on port 80. If it's not, the Apache configuration - (or other) is not correct. + If you don't see the test interfaces, check whether they are available + on port 80 (i.e. directly from the Apache2 server). If not, the + Apache configuration is incorrect. In order to use Apache as frontend for the interface on port 80 - for public access etc., refer to + for public access etc., refer to .
- Installation on Debian GNU/Linux + Installation on Debian GNU/Linux and Ubuntu - Index Data provides Debian packages for Pazpar2. These are prepared - for Debian versions Etch and Lenny (as of 2007). - These packages are available at - . + Index Data provides Debian and Ubuntu packages for Pazpar2. + As of February 2010, these + are prepared for Debian versions Etch, Lenny and Squeeze; and for + Ubuntu versions 8.04 (hardy), 8.10 (intrepid), 9.04 (jaunty) and + 9.10 (karmic). These packages are available at + and + .
Apache 2 Proxy - Apache 2 has a - + Apache 2 has a + proxy module - which allows Pazpar2 to become a backend to an Apache 2 + + which allows Pazpar2 to become a backend to an Apache 2 based web service. The Apache 2 proxy must operate in the Reverse Proxy mode. - + On a Debian based Apache 2 system, the relevant modules can be enabled with: - sudo a2enmod proxy_http + sudo a2enmod proxy_http proxy_balancer - Traditionally Pazpar2 interprets URL paths with suffix + Traditionally Pazpar2 interprets URL paths with suffix /search.pz2. - The - ProxyPass directive of Apache must be used to map a URL path + The + + ProxyPass + + directive of Apache must be used to map a URL path to the Pazpar2 server (listening port). @@ -391,13 +468,13 @@ ProxyRequests Off - + AddDefaultCharset off Order deny,allow Allow from all - + ProxyPass /myportal/search.pz2 http://localhost:8004/search.pz2 ProxyVia Off @@ -412,7 +489,7 @@ Using Pazpar2 This chapter provides a general introduction to the use and - deployment of Pazpar2. + deployment of Pazpar2.
@@ -445,7 +522,7 @@ with the server from which the enclosing HTML page or object originated, Pazpar2 is designed so that it can act as a transparent proxy in front of an existing webserver (see for details). + linkend="pazpar2_conf"/> for details). In this mode, all regular HTTP requests are transparently passed through to your webserver, while Pazpar2 only intercepts search-related webservice requests. @@ -508,18 +585,17 @@ The intermediate, internal representation of the record looks like this: - - The Shining + - King, Stephen + The Shining - ebook + King, Stephen - - - ]]> + ebook + + +]]> As you can see, there isn't much to it. There are really only a few important elements to this file. @@ -536,7 +612,8 @@ records are never merged. The 'metadata' elements provide the meat of the elements -- the content. The 'type' attribute is used to match each element against processing rules that determine what - happens to the data element next. + happens to the data element next. The 'rank' attribute + specifies a multiplier for ranking this element. @@ -547,6 +624,31 @@ in the retrieval record ultimately drives merging, sorting, ranking, the extraction of browse facets, and display, all configurable. + + + Pazpar2 1.6.37 and later also allow already clustered records to + be ingested. Suppose a database already clusters for us and we would like + to keep that cluster for Pazpar2. In that case we can generate a + cluster wrapper element that holds individual + record elements. + + + Cluster record example: + + + The Shining + King, Stephen + ebook + + + The Shining + King, Stephen + audio + + + ]]> +
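The record and cluster structure described above can be sketched in Python. This is only an illustration under stated assumptions: the `pz` prefix, the namespace URI, the `record`/`metadata`/`cluster` element names as spelled here, and the `medium` field type are not visible in this diff (the markup was lost in extraction), and `make_record` and `make_cluster` are hypothetical helpers, not part of Pazpar2.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI and "pz" prefix are not shown in the text above.
PZ_NS = "http://www.indexdata.com/pazpar2/1.0"
ET.register_namespace("pz", PZ_NS)


def make_record(fields):
    """Build one record element from (type, value, rank) tuples; rank may be None."""
    record = ET.Element(f"{{{PZ_NS}}}record")
    for field_type, value, rank in fields:
        metadata = ET.SubElement(record, f"{{{PZ_NS}}}metadata", {"type": field_type})
        if rank is not None:
            # 'rank' is the per-element ranking multiplier described above.
            metadata.set("rank", str(rank))
        metadata.text = value
    return record


def make_cluster(records):
    """Wrap already clustered records in a single cluster element."""
    cluster = ET.Element(f"{{{PZ_NS}}}cluster")
    cluster.extend(records)
    return cluster


# Illustrative data mirroring the "The Shining" example; the rank value is made up.
ebook = make_record([("title", "The Shining", 2),
                     ("author", "King, Stephen", None),
                     ("medium", "ebook", None)])
audio = make_record([("title", "The Shining", 2),
                     ("author", "King, Stephen", None),
                     ("medium", "audio", None)])
cluster = make_cluster([ebook, audio])
print(ET.tostring(cluster, encoding="unicode"))
```

In a real deployment this structure would normally be produced by the XSLT normalization stylesheet rather than by application code; the sketch only makes the shape of the data concrete.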
@@ -573,81 +675,12 @@ search. You start a new search using the 'search' command. Once the search has been started, you can follow its progress using the 'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records - can be fetched using the 'record' command. + can be fetched using the 'record' command.
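As a rough illustration of the command sequence just described, the following Python sketch builds the request URLs a client might issue. It assumes the port-9004 listener and the `/search.pz2` path mentioned elsewhere in this document; the `query` parameter name and the session value are assumptions made for the example, and `command_url` is a hypothetical helper rather than part of any Pazpar2 client library.

```python
from urllib.parse import urlencode

# Assumed endpoint: the port-9004 listener from the installation notes,
# combined with the conventional /search.pz2 path.
BASE_URL = "http://localhost:9004/search.pz2"


def command_url(command, session=None, **params):
    """Build the URL for one web-service command (hypothetical helper)."""
    query = {"command": command}
    if session is not None:
        # Every command except 'init' carries the session identifier.
        query["session"] = session
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# A typical sequence; the session ID is a made-up placeholder that would
# normally be taken from the response to 'init'.
session = "12345"
urls = [
    command_url("init"),
    command_url("search", session, query="the shining"),
    command_url("stat", session),
    command_url("show", session),
    command_url("termlist", session),
]
for url in urls:
    print(url)
```

The sketch stops at URL construction; a client would fetch each URL, parse the XML response, and repeat 'stat' or 'show' until the search has settled.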
&sect-ajaxdev;
- Connecting to non-standard resources - - Pazpar2 uses Z39.50 as its switchboard language -- i.e. as far as it - is concerned, all resources speak Z39.50, or its webservices derivatives, - SRU/SRW. It is, however, equipped - to handle a broad range of different server behavior, through - configurable query mapping and record normalization. If you develop - configuration, stylesheets, etc., for a new type of resources, we - encourage you to share your work. But you can also use Pazpar2 to - connect to hundreds of resources that do not support standard - protocols. - - - - For a growing number of resources, Z39.50 is all you need. Over the - last few years, a number of commercial, full-text resources have - implemented Z39.50. These can be used through Pazpar2 with little or - no effort. Resources that use non-standard record formats will - require a bit of XSLT work, but that's all. - - - - But what about resources that don't support Z39.50 at all? Some resources might - support OpenSearch, private, XML/HTTP-based protocols, or something - else entirely. Some databases exist only as web user interfaces and - will require screen-scraping. Still others exist only as static - files, or perhaps as databases supporting the OAI-PMH protocol. - There is hope! Read on. - - - - Index Data continues to advocate the support of open standards. We - work with database vendors to support standards, so you don't have - to worry about programming against non-standard services. We also - provide tools (see SimpleServer) - which make it comparatively easy to build gateways against servers - with non-standard behavior. Again, we encourage you to share any - work you do in this direction. - - - - But the bottom line is that working with non-standard resources in - metasearching is really, really hard. If you want to build a - project with Pazpar2, and you need access to resources with - non-standard interfaces, we can help. 
We run gateways to more than - 2,000 popular, commercial databases and other resources, - making it simple - to plug them directly into Pazpar2. For a small annual fee per - database, we can help you establish connections to your licensed - resources. Meanwhile, you can help! If you build your own - standards-compliant gateways, host them for others, or share the - code! And tell your vendors that they can save everybody money and - increase the appeal of their resources by supporting standards. - - - - There are those who will ask us why we are using Z39.50 as our - switchboard language rather than a different protocol. Basically, - we believe that Z39.50 is presently the most widely implemented - information retrieval protocol that has the level of functionality - required to support a good metasearching experience (structured - searching, structured, well-defined results). It is also compact and - efficient, and there is a very broad range of tools available to - implement it. - -
-
Unicode Compliance @@ -669,11 +702,206 @@ In addition, the ICU tokenization and normalization rules must - be defined in the master configuration file described in + be defined in the master configuration file described in .
+
+ Load balancing + + Just like any web server, Pazpar2 can be load balanced by a standard + hardware or software load balancer as long as session stickiness + is ensured. If you are already running the Apache2 web server in front + of Pazpar2 and use the Apache mod_proxy module to 'relay' client + requests to Pazpar2, this setup can be easily extended to include + load balancing capabilities. + To do so you need to enable the + + mod_proxy_balancer + + module in your Apache2 installation. + + + + On a Debian based Apache 2 system, the relevant modules can + be enabled with: + + sudo a2enmod proxy_http proxy_balancer + + + + + The mod_proxy_balancer can pass all 'sessionsticky' requests to the + same backend worker as long as the requests are marked with the + originating worker's ID (called 'route'). If the Pazpar2 serverID is + configured (by setting an 'id' attribute on the 'server' element in + the Pazpar2 configuration file) Pazpar2 will append it to the + 'session' element returned during the 'init' in a mod_proxy_balancer + compatible manner. + Since the 'session' is then re-sent by the client (for all pazpar2 + requests besides 'init'), the balancer can use the marker to pass + the request to the right route. To do so the balancer needs to be + configured to inspect the 'session' parameter. 
+ + + + Apache 2 load balancing configuration + + With four Pazpar2 instances running on the same host, on ports + 8004-8007 and with serverIDs pz1, pz2, pz3 and pz4 respectively, we + could use the following Apache 2 configuration to expose a single + pazpar2 'endpoint' on a standard + (/pazpar2/search.pz2) location: + + + AddDefaultCharset off + Order deny,allow + Allow from all + + ProxyVia Off + + # 'route' has to match the configured pazpar2 server ID + + BalancerMember http://localhost:8004 route=pz1 + BalancerMember http://localhost:8005 route=pz2 + BalancerMember http://localhost:8006 route=pz3 + BalancerMember http://localhost:8007 route=pz4 + + + # route is resent in the 'session' param which has the form: + # 'sessid.serverid', understandable by mod_proxy_balancer + # this is not going to work if the client tampers with the 'session' param + ProxyPass /pazpar2/search.pz2 balancer://pz2cluster lbmethod=byrequests stickysession=session nofailover=On + ]]> + + The 'ProxyPass' line sets up a reverse proxy for requests to + ‘/pazpar2/search.pz2’ and delegates all requests to the load balancer + (virtual worker) with name ‘pz2cluster’. + Sticky sessions are enabled and implemented using the ‘session’ parameter. + The ‘Proxy’ section lists all the servers (real workers) which the + load balancer can use. + + + + +
+ +
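To make the sticky-session mechanism concrete, here is a small Python sketch of the routing decision a balancer makes from the 'session' parameter. It relies only on the 'sessid.serverid' form and the four example workers given above; the function name `route_for_session` and the sample session values are hypothetical, and this is not how mod_proxy_balancer itself is implemented.

```python
# Worker table taken from the example configuration above:
# route name (Pazpar2 server ID) -> backend URL.
ROUTES = {
    "pz1": "http://localhost:8004",
    "pz2": "http://localhost:8005",
    "pz3": "http://localhost:8006",
    "pz4": "http://localhost:8007",
}


def route_for_session(session, routes=ROUTES):
    """Return the backend for a 'sessid.serverid' session value, or None.

    None means no usable route marker was found, in which case a balancer
    would fall back to its normal scheduling method.
    """
    _, separator, server_id = session.rpartition(".")
    if not separator:
        # No '.' in the value: the session carries no server ID.
        return None
    return routes.get(server_id)


# Made-up session values for illustration.
print(route_for_session("2044502273.pz3"))
print(route_for_session("2044502273"))
```

This also shows why the server ID must match the 'route' name exactly: an unknown suffix simply yields no route.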
+ Relevance ranking + + Pazpar2 uses a variant of the term frequency–inverse document frequency + (Tf-idf) ranking algorithm. + + + The Tf-part is straightforward to calculate and is based on the + documents that Pazpar2 fetches. The idf-part, however, is trickier + since the corpus at hand is ONLY the relevant documents and not + irrelevant ones. Pazpar2 does not have the full corpus -- only the + documents that match a particular search. + + + Computation of the Tf-part is based on the normalized documents. + The length, the position and terms are thus normalized at this point. + Also, the computation is performed for each document received from the + target - before merging takes place. The result of a TF-computation is + added to the TF-total of a cluster. Thus, if a document occurs twice, + then the TF-part is doubled. That, however, can be adjusted, because the + TF-part may be divided by the number of documents in a cluster. + + + The algorithm used by Pazpar2 has two phases. In phase one + Pazpar2 computes a tf-array. This is done as records are + fetched from the database. In this phase, the rank weight + w and the rank tweaks lead, + follow and length are in use. + + + 0) + w[i] += w[i] * follow / (1+log2(d)) + // length: length of field (number of terms that is) + if (length strategy is "linear") + tf[i] += w[i] / length; + else if (length strategy is "log") + tf[i] += w[i] / log2(length); + else if (length strategy is "none") + tf[i] += w[i]; + ]]> + + In phase two, the idf-array is computed and the final score + is computed. This is done for each cluster as part of each show command. + The rank tweak cluster is in use here. + + 0) + idf[i] = log(1 + doctotal / dococcur[i]) + else + idf[i] = 0; + + relevance = 0; + for i = 1, .., N: (each term) + if (cluster is "yes") + tf[i] = tf[i] / cluster_size; + relevance += 100000 * tf[i] / idf[i]; + ]]> + + For controlling the ranking parameters, refer to the + rank element of the + service definition. 
+ Refer to the rank attribute + of the metadata element for how to control ranking for individual + metadata fields. + +
+ +
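Two building blocks of the pseudocode above, the length strategies from phase one and the idf formula from phase two, can be written out as a short Python sketch. It follows the listing literally and is not taken from the Pazpar2 source: the function names are hypothetical, the natural logarithm is assumed for the unqualified `log`, and a field length below 2 is rejected under the "log" strategy because log2(1) is zero and the listing does not say how that case is handled.

```python
import math


def length_normalize(weight, length, strategy):
    """Apply one of the three length strategies to a term weight w[i]."""
    if strategy == "linear":
        return weight / length
    if strategy == "log":
        if length < 2:
            # Assumption: the listing leaves this case open, so refuse it
            # here rather than guess at the real behaviour.
            raise ValueError("'log' strategy needs a field length of at least 2")
        return weight / math.log2(length)
    if strategy == "none":
        return weight
    raise ValueError(f"unknown length strategy: {strategy!r}")


def idf(doctotal, dococcur):
    """Inverse document frequency for one term, as in phase two.

    doctotal is the number of documents seen; dococcur is how many of
    them contain the term. Natural log is an assumption.
    """
    if dococcur > 0:
        return math.log(1 + doctotal / dococcur)
    return 0.0


# Example: a weight of 4.0 in a four-term field under each strategy.
for strategy in ("linear", "log", "none"):
    print(strategy, length_normalize(4.0, 4, strategy))
print(idf(10, 10), idf(10, 0))
```

Because Pazpar2 only sees the documents that matched the search, `doctotal` here is the size of the retrieved set, not of a full corpus, which is the limitation the section describes.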
+ Pazpar2 and MasterKey Connect + + MasterKey Connect is a hosted connector, or gateway, service that exposes + whatever searchable resources you need. Since the service exposes all + resources using Z39.50 (or SRU), it is easy to set up Pazpar2 to use the + service. In particular, since all connectors expose basically the same core + behavior, it is a good use of Pazpar2's mechanism for managing default + behaviors across similar databases. + + + After installation of Pazpar2, the directory + /etc/pazpar2/settings/mkc (location may + vary depending on installation preferences) contains an example setup that + searches two different resources through a MasterKey Connect demo account. + The file mkc.xml contains default parameters that will work for all + MasterKey Connect resources (if you decide to become a customer of the + service, you will substitute your own account credentials for + guest/guest). The other files contain specific information about + a couple of demonstration resources. + + + + To play with the demo, just create a symlink from + /etc/pazpar2/services-enabled/default.xml + to /etc/pazpar2/services-available/mkc.xml, + and restart Pazpar2. You should now be able to search the two demo + resources using JSDemo or any user interface of your choice. + If you are interested in learning more about MasterKey Connect, or in + trying out the service for free against your favorite online resource, just + contact us at info@indexdata.com. + +
+ @@ -687,51 +915,44 @@ &manref; - License - - - Pazpar2, - Copyright © &copyright-year; Index Data. - - - - Pazpar2 is free software; you can redistribute it and/or modify it under - the terms of the GNU General Public License as published by the Free - Software Foundation; either version 2, or (at your option) any later - version. - - - - Pazpar2 is distributed in the hope that it will be useful, but WITHOUT ANY - WARRANTY; without even the implied warranty of MERCHANTABILITY or - FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License - for more details. - - - - You should have received a copy of the GNU General Public License - along with Pazpar2; see the file LICENSE. If not, write to the - Free Software Foundation, - 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA - + + License + + + Pazpar2, + Copyright © &copyright-year; Index Data. + + + + Pazpar2 is free software; you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation; either version 2, or (at your option) any later + version. + + + + Pazpar2 is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + for more details. + + + + You should have received a copy of the GNU General Public License + along with Pazpar2; see the file LICENSE. If not, write to the + Free Software Foundation, + 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + &gpl2; - + - +