X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fbook.xml;h=4a460b5e3b8506431476238023ba703c941f606a;hb=b91e0566bea75bbed670fd2eca5c2868e4879053;hp=94f34580336d1ffc836480245755acbfa1394c22;hpb=d198a8df0eb138d75b8ca09734bfa0b368b80d43;p=pazpar2-moved-to-github.git diff --git a/doc/book.xml b/doc/book.xml index 94f3458..4a460b5 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -1,15 +1,14 @@ - %local; %entities; - - %common; + + %idcommon; ]> - Pazpar2 - User's Guide and Reference @@ -19,6 +18,18 @@ AdamDickmeiss + + MarcCromme + + + JakubSkoczen + + + MikeTaylor + + + DennisSchafroth + &version; ©right-year; @@ -26,713 +37,917 @@ - Pazpar2 is a high-performance, user interface-independent, data - model-independent metasearching - middleware featuring merging, relevance ranking, record sorting, + Pazpar2 is a high-performance metasearch engine featuring + merging, relevance ranking, record sorting, and faceted results. + It is middleware: it has no user interface of its own, but can be + configured and controlled by an XML-over-HTTP web-service to provide + metasearching functionality behind any user interface. - This document is a guide and reference to Pazpar version &version;. + This document is a guide and reference to Pazpar2 version &version;. - - - - - + + + + + - - - - Introduction - - Pazpar2 is a stand-alone metasearch client with a webservice API, designed - to be used either from a browser-based client (JavaScript, Flash, Java, - etc.), from from server-side code, or any combination of the two. - Pazpar2 is a highly optimized client designed to - search many resources in parallel. It implements record merging, - relevance-ranking and sorting by arbitrary data content, and facet - analysis for browsing purposes. 
It is designed to be data model - independent, and is capable of working with MARC, DublinCore, or any - other XML-structured response format -- XSLT is used to normalize and extract - data from retrieval records for display and analysis. It can be used - against any server which supports the Z39.50 protocol. Proprietary - backend modules can be used to support a large number of other protocols - (please contact Index Data for further information about this). - - - Additional functionality such as - user management, attractive displays are expected to be implemented by - applications that use pazpar2. Pazpar2 is user interface independent. - Its functionality is exposed through a simple REST-style webservice API, - designed to be simple to use from an Ajax-enbled browser, Flash - animation, Java applet, etc., or from a higher-level server-side language - like PHP or Java. Because session information can be shared between - browser-based logic and your server-side scripting, there is tremendous - flexibility in how you implement your business logic on top of pazpar2. - - - Once you launch a search in pazpar2, the operation continues behind the - scenes. Pazpar2 connects to servers, carries out searches, and - retrieves, deduplicates, and stores results internally. Your application - code may periodically inquire about the status of an ongoing operation, - and ask to see records or other result set facets. Result become - available immediately, and it is easy to build end-user interfaces which - feel extremely responsive, even when searching more than 100 servers - concurrently. - - - Pazpar2 is designed to be highly configurable. Incoming records are - normalized to XML/UTF-8, and then further normalized using XSLT to a - simple internal representation that is suitable for analysis. By - providing XSLT stylesheets for different kinds of result records, you - can tune pazpar2 to work against different kinds of information - retrieval servers. 
Finally, metadata is extracted, in a configurable - way, from this internal record, to support display, merging, ranking, - result set facets, and sorting. Pazpar2 is not bound to a specific model - of metadata, such as DublinCore or MARC -- by providing the right - configuration, it can work with a number of different kinds of data in - support of many different applications. - - - Pazpar2 is designed to be efficient and scalable. You can set it up to - search several hundred targets in parallel, or you can use it to support - hundreds of concurrent users. It is implemented with the same attention - to performance and economy that we use in our indexing engines, so that - you can focus on building your application, without worrying about the - details of metasearch logic. You can devote all of your attention to - usability and let pazpar2 do what it does best -- metasearch. - - - If you wish to connect to commercial or other databases which do not - support open standards, please contact Index Data. We have a licensing - agreement with a third party vendor which will enable pazpar2 to access - thousands of online databases, in addition the vast number of catalogs - and online services that support the Z39.50 protocol. - - - Pazpar2 is our attempt to re-think the traditional paradigms for - implementing and deploying metasearch logic, with an uncompromising - approach to performance, and attempting to make maximum use of the - capabilities of modern browsers. The demo user interface that - accompanies the distribution is but one example. If you think of new - ways of using pazpar2, we hope you'll share them with us, and if we - can provide assistance with regards to training, design, programming, - integration with different backends, hosting, or support, please don't - hesitate to contact us. If you'd like to see functionality in pazpar2 - that is not there today, please don't hesitate to contact us. 
It may - already be in our development pipeline, or there might be a - possibility for you to help out by sponsoring development time or - code. Either way, get in touch and we will give you straight answers. - - - Enjoy! - - - Pazpar2 is covered by the GNU license version 2. - See for further information. - - + + + + Introduction - - Installation +
+ What Pazpar2 is + + Pazpar2 is a stand-alone metasearch engine with a web-service API, designed + to be used either from a browser-based client (JavaScript, Flash, + Java applet, + etc.), from server-side code, or any combination of the two. + Pazpar2 is a highly optimized client designed to + search many resources in parallel. It implements record merging, + relevance-ranking and sorting by arbitrary data content, and facet + analysis for browsing purposes. It is designed to be data-model + independent, and is capable of working with MARC, DublinCore, or any + other XML-structured response format + -- XSLT is used to normalize and extract + data from retrieval records for display and analysis. It can be used + against any server which supports the + Z39.50, + SRU/SRW + or Solr protocol. Proprietary + backend modules can function as connectors between these standard + protocols and any non-standard API, including web-site scraping, to + support a large number of other protocols. + + + Additional functionality such as + user management and attractive displays are expected to be implemented by + applications that use Pazpar2. Pazpar2 itself is user-interface independent. + Its functionality is exposed through a simple XML-based web-service API, + designed to be easy to use from an Ajax-enabled browser, Flash + animation, Java applet, etc., or from a higher-level server-side language + like PHP, Perl or Java. Because session information can be shared between + browser-based logic and server-side scripting, there is tremendous + flexibility in how you implement application-specific logic on top + of Pazpar2. + + + Once you launch a search in Pazpar2, the operation continues behind the + scenes. Pazpar2 connects to servers, carries out searches, and + retrieves, deduplicates, and stores results internally. Your application + code may periodically inquire about the status of an ongoing operation, + and ask to see records or result set facets. 
Results become + available immediately, and it is easy to build end-user interfaces that + feel extremely responsive, even when searching more than 100 servers + concurrently. + + + Pazpar2 is designed to be highly configurable. Incoming records are + normalized to XML/UTF-8, and then further normalized using XSLT to a + simple internal representation that is suitable for analysis. By + providing XSLT stylesheets for different kinds of result records, you + can configure Pazpar2 to work against different kinds of information + retrieval servers. Finally, metadata is extracted in a configurable + way from this internal record, to support display, merging, ranking, + result set facets, and sorting. Pazpar2 is not bound to a specific model + of metadata, such as DublinCore or MARC: by providing the right + configuration, it can work with any combination of different kinds of data + in support of many different applications. + + + Pazpar2 is designed to be efficient and scalable. You can set it up to + search several hundred targets in parallel, or you can use it to support + hundreds of concurrent users. It is implemented with the same attention + to performance and economy that we use in our indexing engines, so that + you can focus on building your application without worrying about the + details of metasearch logic. You can devote all of your attention to + usability and let Pazpar2 do what it does best -- metasearch. + - Pazpar2 depends on the following tools/libraries: - - YAZ - - - The popular Z39.50 toolkit for the C language. YAZ must be - compiled with Libxml2/Libxslt support. - - - - + Pazpar2 is our attempt to re-think the traditional paradigms for + implementing and deploying metasearch logic, with an uncompromising + approach to performance, and attempting to make maximum use of the + capabilities of modern browsers. The demo user interface that + accompanies the distribution is but one example. 
If you think of new + ways of using Pazpar2, we hope you'll share them with us, and if we + can provide assistance with regards to training, design, programming, + integration with different backends, hosting, or support, please don't + hesitate to contact us. If you'd like to see functionality in Pazpar2 + that is not there today, please don't hesitate to contact us. It may + already be in our development pipeline, or there might be a + possibility for you to help out by sponsoring development time or + code. Either way, get in touch and we will give you straight answers. - In order to compile Pazpar2 an ANSI C compiler is - required. The requirements should be the same as for YAZ. + Enjoy! + + Pazpar2 is covered by the GNU General Public License (GPL) version 2. + See for further information. + +
-
- Installation on Unix (from Source) - - Here is a quick step-by-step guide on how to compile the - tools that Pazpar2 uses. Only few systems have none of the required - tools binary packages. If, for example, Libxml2/libxslt are already - installed as development packages use these. - - - - Ensure that the development libraries + header files are - available on your system before compiling Pazpar2. For installation - of YAZ, refer to the YAZ installation chapter. - +
+ Connectors to non-standard databases + + If you need to access commercial or open access resources that don't support + Z39.50 or SRU, one approach would be to use a tool like SimpleServer to build a + gateway. An easier option is to use Index Data's MasterKey Connect + service, which will expose virtually any resource + through Z39.50/SRU, dead easy to integrate with Pazpar2. + The service is hosted, so all you have to do is to let us + know which resources you are interested in, and we operate the gateways, + or Connectors for you for a low annual charge. + Types of resources supported include + commercial databases, free online resources, and even local resources; + almost anything that can be accessed through a web-facing user + interface can be accessed in this way. + Contact info@indexdata.com for more information. + See for an example. + +
+ +
+ A note on the name Pazpar2 + + The name Pazpar2 derives from three sources. On one hand, it is + Index Data's second major piece of software that does parallel + searching of Z39.50 targets. On the other, it is a near-homophone + of Passepartout, the ever-helpful servant in Jules Verne's novel + Around the World in Eighty Days (who helpfully uses the language + of his master). Finally, "passe par tout" means something like + "passes through anything" in French -- in other words, a universal + solution, or if you like a MasterKey. + +
+ + + + Installation + + The Pazpar2 package includes documentation as well + as the Pazpar2 server. The package also includes a simple user + interface called "test1", which consists of a single HTML page and a single + JavaScript file to illustrate the use of Pazpar2. + + + Pazpar2 depends on the following tools/libraries: + + YAZ + + + The popular Z39.50 toolkit for the C language. + YAZ must be compiled with + Libxml2/Libxslt support. + + + It is highly recommended that YAZ is also compiled with + ICU support. + + + + + + + In order to compile Pazpar2, a C compiler which supports C99 or later + is required. + + +
+ Installation from source on Unix (including Linux, MacOS, etc.) + + The latest source code for Pazpar2 is available from + . + Most Unix-based operating systems have the required + tools available as binary packages. + For example, if Libxml2/libXSLT libraries + are already installed as development packages, use these. + + + + Ensure that the development libraries and header files are + available on your system before compiling Pazpar2. For installation + of YAZ, refer to the Installation chapter of the YAZ manual at + . + + + Once the dependencies are in place, Pazpar2 can be unpacked and + installed as follows: + + + tar xzf pazpar2-VERSION.tar.gz + cd pazpar2-VERSION + ./configure + make + sudo make install + + + The make install will install manpages as well as the + Pazpar2 server, pazpar2, + in PREFIX/sbin. + By default, PREFIX is /usr/local/ . This can be + changed with configure option . + +
+ +
+ Installation from source on Windows + + Pazpar2 can be built for Windows using + Microsoft Visual Studio. + The support files for building YAZ on Windows are located in the + win directory. The compilation is performed + using the win/makefile which is to be + processed by the NMAKE utility part of Visual Studio. + + + Ensure that the development libraries and header files are + available on your system before compiling Pazpar2. For installation + of YAZ, refer to + the Installation chapter of the YAZ manual at + . + It is easiest if YAZ and Pazpar2 are unpacked in the same + directory (side-by-side). + + + The compilation is tuned by editing the makefile of Pazpar2. + The process is similar to YAZ. Adjust the various directories + YAZ_DIR, ZLIB_DIR, etc., + as required. + + + Compile Pazpar2 by invoking nmake in + the win directory. + The resulting binaries of the build process are located in the + bin of the Pazpar2 source + tree - including the pazpar2.exe and necessary DLLs. + + + The Windows version of Pazpar2 is a console application. It may + be installed as a Windows Service by adding option + -install for the pazpar2 program. This will + register Pazpar2 as a service and use the other options provided + in the same invocation. For example: + + cd \MyPazpar2\etc + ..\bin\pazpar2 -install -f pazpar2.cfg -l pazpar2.log + + The Pazpar2 service may now be controlled via the Service Control + Panel. It may be unregistered by passing the -remove + option. Example: + + cd \MyPazpar2\etc + ..\bin\pazpar2 -remove + + +
+ +
+ Installation of test interfaces + + In this section we show how to make available the set of simple + interfaces that are part of the Pazpar2 source package, and which + demonstrate some ways to use Pazpar2. (Note that Debian users can + save time by just installing the package pazpar2-test1.) + + + A web server, such as Apache, must be installed and running on the system. + + + + Start the Pazpar2 daemon using the 'in-source' binary. + On Unix the process is: + + cd etc + cp pazpar2.cfg.dist pazpar2.cfg + ../src/pazpar2 -f pazpar2.cfg + + And on Windows: + + cd etc + copy pazpar2.cfg.dist pazpar2.cfg + ..\bin\pazpar2 -f pazpar2.cfg + + This will start a Pazpar2 listener on port 9004. It will proxy + HTTP requests to port 80 on localhost, which we assume will be the regular + HTTP server on the system. Inspect and modify pazpar2.cfg as needed + if this is to be changed. The pazpar2.cfg file includes settings from the + file settings/edu.xml + to use for searches. + + + + The test UIs are located in www. Ensure that this + directory is available to the web server by copying + www to the document root, + using Apache's Alias directive, or + creating a symbolic link: for example, on a Debian or Ubuntu + system with Apache2 installed from the standard package, you might + make the link as follows: + + cd .../pazpar2 + sudo ln -s `pwd`/www /var/www/pazpar2-demo + + + + + This makes the test applications visible at + + but they cannot be run successfully from that URL, as they submit + search requests back to the server from which they were served, + and Apache2 doesn't know how to handle them. Instead, the test + applications must be accessed from Pazpar2 itself, acting as a + proxy to Apache2, at the URL + + + + + From here, the demo applications can be + accessed: test1, test2 and + jsdemo + are pure HTML+JavaScript setups, needing no server-side + intelligence; + demo + requires PHP on the server. 
+ + + If you don't see the test interfaces, check whether they are available + on port 80 (i.e. directly from the Apache2 server). If not, the + Apache configuration is incorrect. + + + In order to use Apache as frontend for the interface on port 80 + for public access etc., refer to + . + +
+ +
+ Installation on Debian GNU/Linux and Ubuntu + + Index Data provides Debian and Ubuntu packages for Pazpar2 and YAZ. + Refer to these directories: + and + . + +
+ +
+ Installation on RedHat / CentOS + + Index Data provides CentOS packages for Pazpar2 and YAZ. + Refer to + for + CentOS packages. + +
+ +
+ Apache 2 Proxy + + Apache 2 has a + + proxy module + + which allows Pazpar2 to become a backend to an Apache 2 + based web service. The Apache 2 proxy must operate in the + Reverse Proxy mode. + + + + On a Debian based Apache 2 system, the relevant modules can + be enabled with: - gunzip -c pazpar2-version.tar.gz|tar xf - - cd pazpar2-version - ./configure - make - su - make install + sudo a2enmod proxy_http proxy_balancer -
+ -
- Installation on Debian GNU/Linux + + Traditionally Pazpar2 interprets URL paths with suffix + /search.pz2. + The + + ProxyPass + + directive of Apache must be used to map a URL path + the the Pazpar2 server (listening port). + + + - All dependencies for Pazpar2 are available as - Debian - packages for the sarge (stable in 2005) and etch (testing in 2005) - distributions. + The ProxyPass directive takes a prefix rather than + a suffix as URL path. It is important that the Java Script code + uses the prefix given for it. + + + + Apache 2 proxy configuration - The procedures for Debian based systems, such as - Ubuntu is probably similar + If Pazpar2 is running on port 8004 and the portal is using + search.pz2 inside portal in directory + /myportal/ we could use the following + Apache 2 configuration: + + + ProxyRequests Off + + + AddDefaultCharset off + Order deny,allow + Allow from all + + + ProxyPass /myportal/search.pz2 http://localhost:8004/search.pz2 + ProxyVia Off + + ]]> + +
+ +
+ + + Using Pazpar2 + + This chapter provides a general introduction to the use and + deployment of Pazpar2. + + +
+ Pazpar2 and your systems architecture + + Pazpar2 is designed to provide asynchronous, behind-the-scenes + metasearching functionality to your application, exposing this + functionality using a simple webservice API that can be accessed + from any number of development environments. In particular, it is + possible to combine Pazpar2 either with your server-side dynamic + website scripting, with scripting or code running in the browser, or + with any combination of the two. Pazpar2 is an excellent tool for + building advanced, Ajax-based user interfaces for metasearch + functionality, but it isn't a requirement -- you can choose to use + Pazpar2 entirely as a backend to your regular server-side scripting. + When you do use Pazpar2 in conjunction + with browser scripting (JavaScript/Ajax, Flash, applets, + etc.), there are special considerations. + + + + Pazpar2 implements a simple but efficient HTTP server, and it is + designed to interact directly with scripting running in the browser + for the best possible performance, and to limit overhead when + several browser clients generate numerous webservice requests. + However, it is still desirable to use a conventional webserver, + such as Apache, to serve up graphics, HTML documents, and + server-side scripting. Because the security sandbox environment of + most browser-side programming environments only allows communication + with the server from which the enclosing HTML page or object + originated, Pazpar2 is designed so that it can act as a transparent + proxy in front of an existing webserver (see for details). + In this mode, all regular + HTTP requests are transparently passed through to your webserver, + while Pazpar2 only intercepts search-related webservice requests. + + + + If you want to expose your combined service on port 80, you can + either run your regular webserver on a different port, a different + server, or a different IP address associated with the same server. 
+ + + + Pazpar2 can also work behind + a reverse proxy. Refer to + for more information. + This allows your existing HTTP server to operate on port 80 as usual. + Pazpar2 can be started on another (internal) port. + + + + Sometimes, it may be necessary to implement functionality on your + regular webserver that makes use of search results, for example to + implement data import functionality, emailing results, history + lists, personal citation lists, interlibrary loan functionality, + etc. Fortunately, it is simple to exchange information between + Pazpar2, your browser scripting, and backend server-side scripting. + You can send a session ID and possibly a record ID from your browser + code to your server code, and from there use Pazpar2's webservice API + to access result sets or individual records. You could even 'hide' + all of Pazpar2's functionality behind your own API implemented on + the server-side, and access that from the browser or elsewhere. The + possibilities are just about endless. + +
+ +
+ Your data model + + Pazpar2 does not have a preconceived model of what makes up a data + model. There are no assumptions that records have specific fields or + that they are organized in any particular way. The only assumption + is that data comes packaged in a form that the software can work + with (presently, that means XML or MARC), and that you can provide + the necessary information to massage it into Pazpar2's internal + record abstraction. + + + + Handling retrieval records in Pazpar2 is a two-step process. First, + you decide which data elements of the source record you are + interested in, and you specify any desired massaging or combining of + elements using an XSLT stylesheet (MARC records are automatically + normalized to MARCXML before this step). + If desired, you can run multiple XSLT stylesheets in series to accomplish + this, but the output of the last one should be a representation of the + record in a schema that Pazpar2 understands. + + + + The intermediate, internal representation of the record looks like + this: + + + The Shining + + King, Stephen + + ebook + + +]]> + + As you can see, there isn't much to it. There are really only a few + important elements to this file. + + + + Elements should belong to the namespace + http://www.indexdata.com/pazpar2/1.0. + If the root node contains the + attribute 'mergekey', then every record that generates the same + merge key (normalized for case differences, white space, and + truncation) will be joined into a cluster. In other words, you + decide how records are merged. If you don't include a merge key, + records are never merged. The 'metadata' elements provide the meat + of the elements -- the content. The 'type' attribute is used to + match each element against processing rules that determine what + happens to the data element next. The 'rank' attribute + specifies a multiplier for ranking for this element. 
+ + + + The next processing step is the extraction of metadata from the + intermediate representation of the record. This is governed by the + 'metadata' elements in the 'service' section of the configuration + file. See for details. The metadata + in the retrieval record ultimately drives merging, sorting, ranking, + the extraction of browse facets, and display, all configurable. + + + + Pazpar2 1.6.37 and later also allows already clustered records to + be ingested. Suppose a database already clusters for us and we would like + to keep that cluster for Pazpar2. In that case we can generate a + cluster wrapper element that holds individual + record elements. + + + Cluster record example: + + + The Shining + King, Stephen + ebook + + + The Shining + King, Stephen + audio + + + ]]> + +
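The merge-key rule described above (records whose normalized merge keys match are joined into one cluster, and records without a merge key are never merged) can be illustrated with a minimal Python sketch. This is not Pazpar2's implementation: the function names are hypothetical, the merge-key strings are invented for illustration, and the normalization shown covers only case and white space, not truncation.

```python
from collections import defaultdict


def normalize_mergekey(key):
    """Lower-case the key and collapse runs of white space.

    Assumed normalization for illustration only; truncation is not modelled.
    """
    return " ".join(key.lower().split())


def cluster_records(records):
    """Group (mergekey, record) pairs into clusters.

    Records sharing a normalized merge key end up in the same cluster;
    records whose merge key is None each form their own cluster.
    """
    clusters = defaultdict(list)
    singletons = []
    for mergekey, record in records:
        if mergekey is None:
            singletons.append([record])
        else:
            clusters[normalize_mergekey(mergekey)].append(record)
    return list(clusters.values()) + singletons


# Hypothetical merge keys: the first two differ only in case and spacing.
records = [
    ("title The Shining author King, Stephen", "ebook"),
    ("title the  shining author king, stephen", "audio"),
    (None, "unkeyed record"),
]
clusters = cluster_records(records)
```

In this sketch the ebook and audio records fall into one cluster because their keys normalize to the same string, while the record without a key stays on its own.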
+ +
+ Client development overview + + You can use Pazpar2 from any environment that allows you to use + webservices. The initial goal of the software was to support + Ajax-based applications, but there literally are no limits to what + you can do. You can use Pazpar2 from Javascript, Flash, Java, etc., + on the browser side, and from any development environment on the + server side, and you can pass session tokens and record IDs freely + around between these environments to build sophisticated applications. + Use your imagination. + + + + The webservice API of Pazpar2 is described in detail in . + + + + In brief, you use the 'init' command to create a session, a + temporary workspace which carries information about the current + search. You start a new search using the 'search' command. Once the + search has been started, you can follow its progress using the + 'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records + can be fetched using the 'record' command. + +
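As a rough illustration of the command sequence just described, the following Python sketch builds the request URLs for 'init', 'search' and 'show' and extracts the session ID from an 'init' response. Several details are assumptions rather than facts taken from this section: the base URL (port 9004 and the /search.pz2 path, as in the test setup), the 'command', 'session' and 'query' parameter names, and the shape of the sample response. The helper names are hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint: a Pazpar2 listener on port 9004 serving /search.pz2.
BASE_URL = "http://localhost:9004/search.pz2"


def command_url(command, base_url=BASE_URL, **params):
    """Build the URL for one web-service command (hypothetical helper)."""
    query = {"command": command}
    query.update(params)
    return base_url + "?" + urlencode(query)


def parse_session_id(init_response):
    """Extract the text of the <session> element from an 'init' response."""
    root = ET.fromstring(init_response)
    session = root.findtext("session")
    if session is None:
        raise ValueError("no <session> element in init response")
    return session


# Illustrative response only; a real one is obtained by fetching
# command_url("init") from a running Pazpar2 instance.
sample_init = "<init><status>OK</status><session>12345</session></init>"

session = parse_session_id(sample_init)
search_url = command_url("search", session=session, query="shining")
show_url = command_url("show", session=session)
```

A client would fetch search_url once and then poll show_url (or the equivalent 'stat', 'bytarget' or 'termlist' URLs) while the search proceeds in the background.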
+ + §-ajaxdev; + +
+ Unicode Compliance + + Pazpar2 is Unicode compliant and language and locale aware, but relies + on the character encoding of the targets being specified correctly if + the targets themselves are not UTF-8 based (most aren't). + Just a few badly behaved targets can spoil the search experience + considerably if, for example, Greek, Russian or other non 7-bit ASCII + search terms are entered. In these cases some targets return + records irrelevant to the query, and the result screens will be + cluttered with noise. + + + While noise from misbehaving targets cannot be removed, it can + be reduced using truly Unicode based ranking. This is an + option which is available to the system administrator if ICU + support is compiled into YAZ, see + for details. + + + In addition, the ICU tokenization and normalization rules must + be defined in the master configuration file described in + . + +
+ +
+ Load balancing + + Just like any web server, Pazpar2 can be load balanced by a standard + hardware or software load balancer as long as session stickiness + is ensured. If you are already running the Apache2 web server in front + of Pazpar2 and use the Apache mod_proxy module to 'relay' client + requests to Pazpar2, this setup can be easily extended to include + load balancing capabilities. + To do so you need to enable the + + mod_proxy_balancer + + module in your Apache2 installation. + + + + On a Debian based Apache 2 system, the relevant modules can + be enabled with: - apt-get install libyaz-dev + sudo a2enmod proxy_http - - With these packages installed, the usual configure + make - procedure can be used for Pazpar2 as outlined in - . - -
-
+ + + + The mod_proxy_balancer can pass all 'sessionsticky' requests to the + same backend worker as long as the requests are marked with the + originating worker's ID (called 'route'). If the Pazpar2 serverID is + configured (by setting an 'id' attribute on the 'server' element in + the Pazpar2 configuration file) Pazpar2 will append it to the + 'session' element returned during the 'init' in a mod_proxy_balancer + compatible manner. + Since the 'session' is then re-sent by the client (for all Pazpar2 + requests besides 'init'), the balancer can use the marker to pass + the request to the right route. To do so the balancer needs to be + configured to inspect the 'session' parameter. + - - Using pazpar2 + + Apache 2 load balancing configuration - This chapter provides a general introduction to the use and deployment of pazpar2. + With four Pazpar2 instances running on the same host on ports + 8004-8007, with serverIDs pz1, pz2, pz3 and pz4 respectively, we + could use the following Apache 2 configuration to expose a single + pazpar2 'endpoint' on a standard + (/pazpar2/search.pz2) location: + + + AddDefaultCharset off + Order deny,allow + Allow from all + + ProxyVia Off + + # 'route' has to match the configured pazpar2 server ID + + BalancerMember http://localhost:8004 route=pz1 + BalancerMember http://localhost:8005 route=pz2 + BalancerMember http://localhost:8006 route=pz3 + BalancerMember http://localhost:8007 route=pz4 + + + # route is resent in the 'session' param which has the form: + # 'sessid.serverid', understandable by the mod_proxy_balancer + # this is not going to work if the client tampers with the 'session' param + ProxyPass /pazpar2/search.pz2 balancer://pz2cluster lbmethod=byrequests stickysession=session nofailover=On + ]]> + + The 'ProxyPass' line sets up a reverse proxy for request + ‘/pazpar2/search.pz2’ and delegates all requests to the load balancer + (virtual worker) with name ‘pz2cluster’. 
+ Sticky sessions are enabled and implemented using the ‘session’ parameter. + The ‘Proxy’ section lists all the servers (real workers) which the + load balancer can use. -
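The routing idea behind sticky sessions can be sketched in a few lines of Python. This only illustrates the 'sessid.serverid' convention described above; it is not how mod_proxy_balancer is implemented, and the function names and the WORKERS table (which simply mirrors the BalancerMember lines of the example configuration) are hypothetical.

```python
# Mirrors the BalancerMember lines in the example configuration.
WORKERS = {
    "pz1": "http://localhost:8004",
    "pz2": "http://localhost:8005",
    "pz3": "http://localhost:8006",
    "pz4": "http://localhost:8007",
}


def route_of(session_param):
    """Return the server ID part of 'sessid.serverid', or None if absent."""
    _, separator, route = session_param.partition(".")
    return route if separator and route else None


def pick_worker(session_param, workers=WORKERS):
    """Choose the backend that owns the session, or None if unknown.

    A real balancer would fall back to its load balancing method
    (or refuse the request) when no route can be determined.
    """
    return workers.get(route_of(session_param))


backend = pick_worker("12345.pz3")
```

A session value without a server ID suffix, or with one that matches no worker, yields no sticky route, which is why the 'id' attribute must be set on every instance behind the balancer.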
- Pazpar2 and your systems architecture - - Pazpar2 is designed to provide asynchronous, behind-the-scenes - metasearching functionality to your application, exposing this - functionality using a simple webservice API that can be accessed - from any number of development environments. In particular, it is - possible to combine pazpar2 either with your server-side dynamic - website scripting, with scripting or code running in the browser, or - with any combination of the two. Pazpar2 is an excellent tool for - building advanced, Ajax-based user interfaces for metasearch - functionality, but it isn't a requirement -- you can choose to use - pazpar2 entirely as a backend to your regular server-side scripting. - When you do use pazpar2 in conjunction - with browser scripting (JavaScript/Ajax, Flash, applets, etc.), there are - special considerations. - - - - Pazpar2 implements a simple but efficient HTTP server, and it is - designed to interact directly with scripting running in the browser - for the best possible performance, and to limit overhead when - several browser clients generate numerous webservice requests. - However, it is still desirable to use a conventional webserver, - such as Apache, to serve up graphics, HTML documents, and - server-side scripting. Because the security sandbox environment of - most browser-side programming environments only allows communication - with the server from which the enclosing HTML page or object - originated, pazpar2 is designed so that it can act as a transparent - proxy in front of an existing webserver (see for details). In this mode, all regular - HTTP requests are transparently passed through to your webserver, - while pazpar2 only intercepts search-related webservice requests. - - - - If you want to expose your combined service on port 80, you can - either run your regular webserver on a different port, a different - server, or a different IP address associated with the same server. 
- - - - Sometimes, it may be necessary to implement functionality on your - regular webserver that makes use of search results, for example to - implement data import functionality, emailing results, history - lists, personal citation lists, interlibrary loan functionality - ,etc. Fortunately, it is simple to exchange information between - pazpar2, your browser scripting, and backend server-side scripting. - You can send a session ID and possibly a record ID from your browser - code to your server code, and from there use pazpar2s webservice API - to access result sets or individual records. You could even 'hide' - all of pazpar2s functionality between your own API implemented on - the server-side, and access that from the browser or elsewhere. The - possibilities are just about endless. - -
- -
- Your data model - - Pazpar2 does not have a preconceived model of what makes up a data - model. There are no assumption that records have specific fields or - that they are organized in any particular way. The only assumption - is that data comes packaged in a form that the software can work - with (presently, that means XML or MARC), and that you can provide - the necessary information to massage it into pazpar2's internal - record abstraction. - - - - Handling retrieval records in pazpar2 is a two-step process. First, - you decide which data elements of the source record you are - interested in, and you specify any desired massaging or combining of - elements using an XSLT stylesheet (MARC records are automatically - normalized to MARCXML before this step). If desired, you can run - multiple XSLT stylesheets in series to accomplish this, but the - output of the last one should be a representation of the record in a - schema that pazpar2 understands. - - - - The intermediate, internal representation of the record looks like - this: - - - The Shining - - King, Stephen - - ebook - - - -]]> + - As you can see, there isn't much to it. There are really only a few - important elements to this file. - - - - Elements should belong to the namespace - http://www.indexdata.com/pazpar2/1.0. If the root node contains the - attribute 'mergekey', then every record that generates the same - merge key (normalized for case differences, white space, and - truncation) will be joined into a cluster. In other words, you - decide how records are merged. If you don't include a merge key, - records are never merged. The 'metadata' elements provide the meat - of the elements -- the content. the 'type' attribute is used to - match each element against processing rules that determine what - happens to the data element next. - - - - The next processing step is the extraction of metadata from the - intermediate representation of the record. 
This is governed by the - 'metadata' elements in the 'service' section of the configuration - file. See for details. The metadata - in the retrieval record ultimately drives merging, sorting, ranking, - the extraction of browse facets, and display, all configurable. - -
- -
- Client development - - You can use pazpar2 from any environment that allows you to use - webservices. The initial goal of the software was to support - Ajax-based applications, but there literally are no limits to what - you can do. You can use pazpar2 from Javascript, Flash, Java, etc., - on the browser side, and from any development environment on the - server side, and you can pass session tokens and record IDs freely - around between these environments to build sophisticated applications. - Use your imagination. - - - - The webservice API of pazpar2 is described in detail in . - - - - In brief, you use the 'init' command to create a session, a - temporary workspace which carries information about the current - search. You start a new search using the 'search' command. Once the - search has been started, you can follow its progress using the - 'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records - can be fetched using the 'record' command. - -
- -
- Connecting to non-standard resources - - Pazpar2 uses Z39.50 as its switchboard language -- i.e. as far as it - is concerned, all resources speak Z39.50. It is, however, equipped - to handle a broad range of different server behavior, through - configurable query mapping and record normalization. If you develop - configuration, stylesheets, etc., for a new type of resources, we - encourage you to share your work. - - - - For a growing number of resources, Z39.50 is all you need. Over the - last few years, a number of commercial, full-text resources have - implemented Z39.50. These can be used through pazpar2 with little or - no effort. Resources that use non-standard record formats will - require a bit of XSLT work, but that's all. - - - - But what about resources that don't support Z39.50 at all? The NISO - SRU protocol is slowly gathering steam. Other resources might - support OpenSearch, private, XML/HTTP-based protocols, or something - else entirely. Some databases exist only as web user interfaces and - will require screen-scraping. Still others exist only as static - files, or perhaps as databases supporting the OAI-PMH protocol. - There is hope! Read on. - -
-
+
- - Reference - +
+ Relevance ranking - The material in this chapter is drawn directly from the individual - manual entries. + Pazpar2 uses a variant of the term frequency–inverse document frequency + (Tf-idf) ranking algorithm. - - &manref; - + + The Tf-part is straightforward to calculate and is based on the + documents that Pazpar2 fetches. The idf-part, however, is trickier + since the corpus at hand is ONLY the relevant documents and not + irrelevant ones. Pazpar2 does not have the full corpus -- only the + documents that match a particular search. + + + Computation of the Tf-part is based on the normalized documents. + The length, the position and terms are thus normalized at this point. + Also, the computation is performed for each document received from the + target, before merging takes place. The result of a TF-computation is + added to the TF-total of a cluster. Thus, if a document occurs twice, + then the TF-part is doubled. That, however, can be adjusted, because the + TF-part may be divided by the number of documents in a cluster. + + + The algorithm used by Pazpar2 has two phases. In phase one + Pazpar2 computes a tf-array. This is done as records are + fetched from the database. In this phase, the rank weight + w and the rank tweaks lead, + follow and length are in use. - License - 
GPL - + + 0) + w[i] += w[i] * follow / (1+log2(d)); + // length: length of field (number of terms that is) + if (length strategy is "linear") + tf[i] += w[i] / length; + else if (length strategy is "log") + tf[i] += w[i] / log2(length); + else if (length strategy is "none") + tf[i] += w[i]; + ]]> - Pazpar2, - Copyright © ©right-year; Index Data. + In phase two, the idf-array is computed and the final score + is computed. This is done for each cluster as part of each show command. + The rank tweak cluster is in use here. - + 0) + idf[i] = log(1 + doctotal / dococcur[i]) + else + idf[i] = 0; + + relevance = 0; + for i = 1, .., N: (each term) + if (cluster is "yes") + tf[i] = tf[i] / cluster_size; + relevance += 100000 * tf[i] * idf[i]; + ]]> - Pazpar2 is free software; you can redistribute it and/or modify it under - the terms of the GNU General Public License as published by the Free - Software Foundation; either version 2, or (at your option) any later - version. + For controlling the ranking parameters, refer to the + rank element of the + service definition. + Refer to the rank attribute + of the metadata element for how to control ranking for individual + metadata fields. - + 
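The two phases described above can be sketched in Python. This is only an illustration under stated assumptions, not Pazpar2's actual implementation: the function names are hypothetical, the opening of the phase-one listing is not visible here, so the per-occurrence weight w is taken as an input rather than derived, and the final score multiplies tf by idf as in standard Tf-idf.

```python
import math


def phase_one_tf(w, d, follow, length, length_strategy="linear"):
    """Contribution of one term occurrence in one field to tf[i].

    w      -- weight already accumulated for the term (assumed input)
    d      -- distance to the previous matching term (0 if none)
    follow -- the 'follow' rank tweak
    length -- number of terms in the field
    """
    if d > 0:
        # Boost terms that closely follow another matching term.
        w += w * follow / (1 + math.log2(d))
    if length_strategy == "linear":
        return w / length
    if length_strategy == "log":
        # Mirrors the listing; length == 1 would divide by zero here.
        return w / math.log2(length)
    if length_strategy == "none":
        return w
    raise ValueError("unknown length strategy: %r" % length_strategy)


def phase_two_relevance(tf, doctotal, dococcur, cluster="no", cluster_size=1):
    """Final score for one cluster from its tf-array.

    doctotal -- number of clusters in the result
    dococcur -- per-term count of clusters in which the term occurs
    """
    relevance = 0.0
    for tf_i, occur in zip(tf, dococcur):
        idf_i = math.log(1 + doctotal / occur) if occur > 0 else 0.0
        if cluster == "yes":
            # Average over the cluster so duplicates do not inflate the score.
            tf_i = tf_i / cluster_size
        relevance += 100000 * tf_i * idf_i
    return relevance
```

With cluster set to "yes", a record that was merged from two identical documents gets the same score as a single copy, which is the adjustment the text describes for doubled TF-parts.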
+ +
+ Pazpar2 and MasterKey Connect - Pazpar2 is distributed in the hope that it will be useful, but WITHOUT ANY - WARRANTY; without even the implied warranty of MERCHANTABILITY or - FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License - for more details. + MasterKey Connect is a hosted connector, or gateway, service that exposes + whatever searchable resources you need. Since the service exposes all + resources using Z39.50 (or SRU), it is easy to set up Pazpar2 to use the + service. In particular, since all connectors expose basically the same core + behavior, it is a good use of Pazpar2's mechanism for managing default + behaviors across similar databases. - - You should have received a copy of the GNU General Public License - along with Pazpar2; see the file LICENSE. If not, write to the - Free Software Foundation, 59 Temple Place - Suite 330, Boston, MA - 02111-1307, USA. + After installation of Pazpar2, the directory + /etc/pazpar2/settings/mkc (location may + vary depending on installation preferences) contains an example setup that + searches two different resources through a MasterKey Connect demo account. + The file mkc.xml contains default parameters that will work for all + MasterKey Connect resources (if you decide to become a customer of the + service, you will substitute your own account credentials for + the guest/guest). The other files contain specific information about + a couple of demonstration resources. + + + + To play with the demo, just create a symlink from + /etc/pazpar2/services-enabled/default.xml + to /etc/pazpar2/services-available/mkc.xml, + and restart Pazpar2. You should now be able to search the two demo + resources using JSDemo or any user interface of your choice. + If you are interested in learning more about MasterKey Connect, or would like to + try out the service for free against your favorite online resource, just + contact us at info@indexdata.com. 
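The symlink step can be sketched as follows. This is a minimal illustration, assuming the default paths quoted above; the helper name enable_service is hypothetical, and restarting Pazpar2 is left out because the command depends on how it was installed.

```python
import os


def enable_service(available_file, enabled_link):
    """Make enabled_link a symlink that points at available_file."""
    if os.path.islink(enabled_link):
        # Replace a previously enabled service definition.
        os.remove(enabled_link)
    # os.symlink takes the target first and the link name second.
    # It raises FileExistsError if enabled_link is a regular file.
    os.symlink(available_file, enabled_link)


# Example call with the paths from the text (usually needs root):
# enable_service("/etc/pazpar2/services-available/mkc.xml",
#                "/etc/pazpar2/services-enabled/default.xml")
```

Note the direction: default.xml in services-enabled is the link, and mkc.xml in services-available is the file it points to.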
- - - GNU GENERAL PUBLIC LICENSE - Version 2, June 1991 - - Copyright (C) 1989, 1991 Free Software Foundation, Inc. - 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - Everyone is permitted to copy and distribute verbatim copies - of this license document, but changing it is not allowed. - - Preamble - - The licenses for most software are designed to take away your -freedom to share and change it. By contrast, the GNU General Public -License is intended to guarantee your freedom to share and change free -software--to make sure the software is free for all its users. This -General Public License applies to most of the Free Software -Foundation's software and to any other program whose authors commit to -using it. (Some other Free Software Foundation software is covered by -the GNU Library General Public License instead.) You can apply it to -your programs, too. - - When we speak of free software, we are referring to freedom, not -price. Our General Public Licenses are designed to make sure that you -have the freedom to distribute copies of free software (and charge for -this service if you wish), that you receive source code or can get it -if you want it, that you can change the software or use pieces of it -in new free programs; and that you know you can do these things. - - To protect your rights, we need to make restrictions that forbid -anyone to deny you these rights or to ask you to surrender the rights. -These restrictions translate to certain responsibilities for you if you -distribute copies of the software, or if you modify it. - - For example, if you distribute copies of such a program, whether -gratis or for a fee, you must give the recipients all the rights that -you have. You must make sure that they, too, receive or can get the -source code. And you must show them these terms so they know their -rights. 
- - We protect your rights with two steps: (1) copyright the software, and -(2) offer you this license which gives you legal permission to copy, -distribute and/or modify the software. - - Also, for each author's protection and ours, we want to make certain -that everyone understands that there is no warranty for this free -software. If the software is modified by someone else and passed on, we -want its recipients to know that what they have is not the original, so -that any problems introduced by others will not reflect on the original -authors' reputations. - - Finally, any free program is threatened constantly by software -patents. We wish to avoid the danger that redistributors of a free -program will individually obtain patent licenses, in effect making the -program proprietary. To prevent this, we have made it clear that any -patent must be licensed for everyone's free use or not licensed at all. - - The precise terms and conditions for copying, distribution and -modification follow. - - GNU GENERAL PUBLIC LICENSE - TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION - - 0. This License applies to any program or other work which contains -a notice placed by the copyright holder saying it may be distributed -under the terms of this General Public License. The "Program", below, -refers to any such program or work, and a "work based on the Program" -means either the Program or any derivative work under copyright law: -that is to say, a work containing the Program or a portion of it, -either verbatim or with modifications and/or translated into another -language. (Hereinafter, translation is included without limitation in -the term "modification".) Each licensee is addressed as "you". - -Activities other than copying, distribution and modification are not -covered by this License; they are outside its scope. 
The act of -running the Program is not restricted, and the output from the Program -is covered only if its contents constitute a work based on the -Program (independent of having been made by running the Program). -Whether that is true depends on what the Program does. - - 1. You may copy and distribute verbatim copies of the Program's -source code as you receive it, in any medium, provided that you -conspicuously and appropriately publish on each copy an appropriate -copyright notice and disclaimer of warranty; keep intact all the -notices that refer to this License and to the absence of any warranty; -and give any other recipients of the Program a copy of this License -along with the Program. - -You may charge a fee for the physical act of transferring a copy, and -you may at your option offer warranty protection in exchange for a fee. - - 2. You may modify your copy or copies of the Program or any portion -of it, thus forming a work based on the Program, and copy and -distribute such modifications or work under the terms of Section 1 -above, provided that you also meet all of these conditions: - - a) You must cause the modified files to carry prominent notices - stating that you changed the files and the date of any change. - - b) You must cause any work that you distribute or publish, that in - whole or in part contains or is derived from the Program or any - part thereof, to be licensed as a whole at no charge to all third - parties under the terms of this License. - - c) If the modified program normally reads commands interactively - when run, you must cause it, when started running for such - interactive use in the most ordinary way, to print or display an - announcement including an appropriate copyright notice and a - notice that there is no warranty (or else, saying that you provide - a warranty) and that users may redistribute the program under - these conditions, and telling the user how to view a copy of this - License. 
(Exception: if the Program itself is interactive but - does not normally print such an announcement, your work based on - the Program is not required to print an announcement.) - -These requirements apply to the modified work as a whole. If -identifiable sections of that work are not derived from the Program, -and can be reasonably considered independent and separate works in -themselves, then this License, and its terms, do not apply to those -sections when you distribute them as separate works. But when you -distribute the same sections as part of a whole which is a work based -on the Program, the distribution of the whole must be on the terms of -this License, whose permissions for other licensees extend to the -entire whole, and thus to each and every part regardless of who wrote it. - -Thus, it is not the intent of this section to claim rights or contest -your rights to work written entirely by you; rather, the intent is to -exercise the right to control the distribution of derivative or -collective works based on the Program. - -In addition, mere aggregation of another work not based on the Program -with the Program (or with a work based on the Program) on a volume of -a storage or distribution medium does not bring the other work under -the scope of this License. - - 3. 
You may copy and distribute the Program (or a work based on it, -under Section 2) in object code or executable form under the terms of -Sections 1 and 2 above provided that you also do one of the following: - - a) Accompany it with the complete corresponding machine-readable - source code, which must be distributed under the terms of Sections - 1 and 2 above on a medium customarily used for software interchange; or, - - b) Accompany it with a written offer, valid for at least three - years, to give any third party, for a charge no more than your - cost of physically performing source distribution, a complete - machine-readable copy of the corresponding source code, to be - distributed under the terms of Sections 1 and 2 above on a medium - customarily used for software interchange; or, - - c) Accompany it with the information you received as to the offer - to distribute corresponding source code. (This alternative is - allowed only for noncommercial distribution and only if you - received the program in object code or executable form with such - an offer, in accord with Subsection b above.) - -The source code for a work means the preferred form of the work for -making modifications to it. For an executable work, complete source -code means all the source code for all modules it contains, plus any -associated interface definition files, plus the scripts used to -control compilation and installation of the executable. However, as a -special exception, the source code distributed need not include -anything that is normally distributed (in either source or binary -form) with the major components (compiler, kernel, and so on) of the -operating system on which the executable runs, unless that component -itself accompanies the executable. 
- -If distribution of executable or object code is made by offering -access to copy from a designated place, then offering equivalent -access to copy the source code from the same place counts as -distribution of the source code, even though third parties are not -compelled to copy the source along with the object code. - - 4. You may not copy, modify, sublicense, or distribute the Program -except as expressly provided under this License. Any attempt -otherwise to copy, modify, sublicense or distribute the Program is -void, and will automatically terminate your rights under this License. -However, parties who have received copies, or rights, from you under -this License will not have their licenses terminated so long as such -parties remain in full compliance. - - 5. You are not required to accept this License, since you have not -signed it. However, nothing else grants you permission to modify or -distribute the Program or its derivative works. These actions are -prohibited by law if you do not accept this License. Therefore, by -modifying or distributing the Program (or any work based on the -Program), you indicate your acceptance of this License to do so, and -all its terms and conditions for copying, distributing or modifying -the Program or works based on it. - - 6. Each time you redistribute the Program (or any work based on the -Program), the recipient automatically receives a license from the -original licensor to copy, distribute or modify the Program subject to -these terms and conditions. You may not impose any further -restrictions on the recipients' exercise of the rights granted herein. -You are not responsible for enforcing compliance by third parties to -this License. - - 7. 
If, as a consequence of a court judgment or allegation of patent -infringement or for any other reason (not limited to patent issues), -conditions are imposed on you (whether by court order, agreement or -otherwise) that contradict the conditions of this License, they do not -excuse you from the conditions of this License. If you cannot -distribute so as to satisfy simultaneously your obligations under this -License and any other pertinent obligations, then as a consequence you -may not distribute the Program at all. For example, if a patent -license would not permit royalty-free redistribution of the Program by -all those who receive copies directly or indirectly through you, then -the only way you could satisfy both it and this License would be to -refrain entirely from distribution of the Program. - -If any portion of this section is held invalid or unenforceable under -any particular circumstance, the balance of the section is intended to -apply and the section as a whole is intended to apply in other -circumstances. - -It is not the purpose of this section to induce you to infringe any -patents or other property right claims or to contest validity of any -such claims; this section has the sole purpose of protecting the -integrity of the free software distribution system, which is -implemented by public license practices. Many people have made -generous contributions to the wide range of software distributed -through that system in reliance on consistent application of that -system; it is up to the author/donor to decide if he or she is willing -to distribute software through any other system and a licensee cannot -impose that choice. - -This section is intended to make thoroughly clear what is believed to -be a consequence of the rest of this License. - - 8. 
If the distribution and/or use of the Program is restricted in -certain countries either by patents or by copyrighted interfaces, the -original copyright holder who places the Program under this License -may add an explicit geographical distribution limitation excluding -those countries, so that distribution is permitted only in or among -countries not thus excluded. In such case, this License incorporates -the limitation as if written in the body of this License. - - 9. The Free Software Foundation may publish revised and/or new versions -of the General Public License from time to time. Such new versions will -be similar in spirit to the present version, but may differ in detail to -address new problems or concerns. - -Each version is given a distinguishing version number. If the Program -specifies a version number of this License which applies to it and "any -later version", you have the option of following the terms and conditions -either of that version or of any later version published by the Free -Software Foundation. If the Program does not specify a version number of -this License, you may choose any version ever published by the Free Software -Foundation. - - 10. If you wish to incorporate parts of the Program into other free -programs whose distribution conditions are different, write to the author -to ask for permission. For software which is copyrighted by the Free -Software Foundation, write to the Free Software Foundation; we sometimes -make exceptions for this. Our decision will be guided by the two goals -of preserving the free status of all derivatives of our free software and -of promoting the sharing and reuse of software generally. - - NO WARRANTY - - 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY -FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN -OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES -PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED -OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF -MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS -TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE -PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, -REPAIR OR CORRECTION. - - 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING -WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR -REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, -INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING -OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED -TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY -YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER -PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE -POSSIBILITY OF SUCH DAMAGES. - - END OF TERMS AND CONDITIONS -
+ + + + + Reference + + + The material in this chapter is drawn directly from the individual + manual entries. + + + &manref; + + + + License + + + Pazpar2, + Copyright © ©right-year; Index Data. + + + + Pazpar2 is free software; you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation; either version 2, or (at your option) any later + version. + + + + Pazpar2 is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + for more details. + + + + You should have received a copy of the GNU General Public License + along with Pazpar2; see the file LICENSE. If not, write to the + Free Software Foundation, + 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + + &gpl2; + - +