X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fbook.xml;h=51807a2a239a01b9faa3e0ddb6f889953aa33ba3;hb=1ee01d28f4d668e35631b20241dd0ce7a108efde;hp=08a3904a6c61cbae77ec9c88b8a36841e59212fa;hpb=1e61b0aa05e2351e33d909f7503eaf936a2d9bb0;p=metaproxy-moved-to-github.git diff --git a/doc/book.xml b/doc/book.xml index 08a3904..51807a2 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -1,47 +1,50 @@ - + + + %local; - - - %common; - - - - + + + %idcommon; ]> - - + Metaproxy - User's Guide and Reference - - MikeTaylor - - - AdamDickmeiss - + + + AdamDickmeiss + + + MarcCromme + + + MikeTaylor + + + &version; - 2006 - Index Data ApS + 2005-2014 + Index Data + This manual is part of Metaproxy version &version;. + + Metaproxy is a universal router, proxy and encapsulated metasearcher for information retrieval protocols. It accepts, processes, interprets and redirects requests from IR clients using - standard protocols such as + standard protocols such as the binary ANSI/NISO Z39.50 - (and in the future SRU - and SRW), as - well as functioning as a limited - HTTP server. + and the information search and retrieval + web service SRU + as well as functioning as a limited + HTTP server. + + Metaproxy is configured by an XML file which specifies how the software should function in terms of routes that the request packets can take through the proxy, each step on a @@ -52,10 +55,7 @@ using the filter API. - The terms under which Metaproxy will be distributed have yet to be - established, but it will not necessarily be open source; so users - should not at this stage redistribute the code without explicit - written permission from the copyright holders, Index Data ApS. + Metaproxy is covered by the GNU General Public License version 2. 
@@ -72,16 +72,15 @@ Introduction - - + Metaproxy - is a standalone program that acts as a universal router, proxy and + is a stand alone program that acts as a universal router, proxy and encapsulated metasearcher for information retrieval protocols such - as Z39.50, and in the future - SRU and SRW. + as Z39.50 and + SRU. To clients, it acts as a server of these protocols: it can be searched, - records can be retrieved from it, etc. + records can be retrieved from it, etc. To servers, it acts as a client: it searches in them, retrieves records from them, etc. it satisfies its clients' requests by transforming them, multiplexing them, forwarding them @@ -104,13 +103,13 @@ being more powerful, flexible, configurable and extensible. Among its many advantages over the older, more pedestrian work are support for multiplexing (encapsulated metasearching), routing by - database name, authentication and authorisation and serving local + database name, authentication and authorization and serving local files via HTTP. Equally significant, its modular architecture facilitites the creation of pluggable modules implementing further functionality. - This manual will briefly describe Metaproxy's licensing situation + This manual will describe how to install Metaproxy before giving an overview of its architecture, then discussing the key concept of a filter in some depth and giving an overview of the various filter types, then discussing the configuration file @@ -125,60 +124,6 @@ - - The Metaproxy License - - - - You are allowed to download this software for evaluation purposes. - You can unpack it, build it, run it, see how it works and how it fits - your needs, all at zero cost. - - - - - You may NOT deploy the software. For the purposes of this license, - deployment means running it for any purpose other than evaluation, - whether or not you or anyone else makes a profit from doing so. 
If - you wish to deploy the software, you must first contact Index Data and - arrange to purchase a DEPLOYMENT LICENCE. If you are unsure - whether or not your proposed use of the software constitutes - deployment, email us at info@indexdata.com - for clarification. - - - - - You may modify your copy of the software (fix bugs, add features) - if you need to. We encourage you to send your changes back to us for - integration into the master copy, but you are not obliged to do so. You - may NOT pass your changes on to any other party. - - - - - There is NO WARRANTY for this software, to the extent permitted by - applicable law. We provide the software ``as is'' without warranty of - any kind, either expressed or implied, including, but not limited to, the - implied warranties of MERCHANTABILITY and FITNESS FOR A - PARTICULAR PURPOSE. The entire risk as to the quality and - performance of the software is with you. Should the software prove - defective, you assume the cost of all necessary servicing, repair or - correction. In no event unless required by applicable law will we be - liable to you for damages, arising out of the use of the software, - including but not limited to loss of data or data being rendered - inaccurate. - - - - - All rights to the software are reserved by Index Data except where - this license explicitly says otherwise. - - - - - Installation @@ -193,7 +138,7 @@ Libxslt - This is an XSLT processor - based on + This is an XSLT processor - based on Libxml2. Both Libxml2 and Libxslt must be installed with the development components (header files, etc.) as well as the run-time libraries. @@ -204,7 +149,8 @@ The popular C++ library. Initial versions of Metaproxy - was built with 1.33.0. Version 1.33.1 works too. + was built with 1.32 but this is no longer supported. + Metaproxy is known to work with Boost version 1.33 through 1.55. @@ -218,11 +164,16 @@ for more information. 
- We have succesfully built Metaproxy using the compilers - GCC version 4.0 and - Microsoft Visual Studio 2003/2005. + We have successfully built Metaproxy using the compilers + GCC and + Microsoft Visual Studio. + + As an option, Metaproxy may also be compiled with + USEMARCON support which allows for + MARC conversions for the filter. +
Installation on Unix (from Source) @@ -231,76 +182,125 @@ tools binary packages. If, for example, Libxml2/libxslt are already installed as development packages use those (and omit compilation). - - - Libxml2/libxslt: - - - gunzip -c libxml2-version.tar.gz|tar xf - - cd libxml2-version - ./configure - make - su - make install - - - gunzip -c libxslt-version.tar.gz|tar xf - - cd libxslt-version - ./configure - make - su - make install - - - YAZ/YAZ++: - - - gunzip -c yaz-version.tar.gz|tar xf - - cd yaz-version - ./configure - make - su - make install - - - gunzip -c yazpp-version.tar.gz|tar xf - - cd yazpp-version - ./configure - make - su - make install - - - Boost: - - - gunzip -c boost-version.tar.gz|tar xf - - cd boost-version - ./configure - make - su - make install - - - Metaproxy: - - - gunzip -c metaproxy-version.tar.gz|tar xf - - cd metaproxy-version - ./configure - make - su - make install - + + + + USEMARCON is not available + as a package at the moment, so Metaproxy must be built from source + if that is to be used. + + + +
+ Libxml2/libxslt + + Libxml2/libxslt: + + + gunzip -c libxml2-version.tar.gz|tar xf - + cd libxml2-version + ./configure + make + su + make install + + + gunzip -c libxslt-version.tar.gz|tar xf - + cd libxslt-version + ./configure + make + su + make install + +
+
+ USEMARCON (optional) + + gunzip -c usemarcon317.tar.gz|tar xf - + cd usemarcon317 + ./configure + make + su + make install + +
+ +
+ YAZ/YAZ++ + + gunzip -c yaz-version.tar.gz|tar xf - + cd yaz-version + ./configure + make + su + make install + + + gunzip -c yazpp-version.tar.gz|tar xf - + cd yazpp-version + ./configure + make + su + make install + +
+
+ Boost + + Metaproxy needs components thread and test from + Boost. + + + gunzip -c boost-version.tar.gz|tar xf - + cd boost-version + ./configure --with-libraries=thread,test,regex --with-toolset=gcc + make + su + make install + + + However, under the hood bjam is used. You can invoke that with + + + ./bjam --toolset=gcc --with-thread --with-test --with-regex stage + + + Replace stage with clean / + install to perform clean and install respectively. + + + Add --prefix=DIR to install Boost in other + prefix than /usr/local. + +
+
+ Metaproxy + + gunzip -c metaproxy-version.tar.gz|tar xf - + cd metaproxy-version + ./configure + make + su + make install + + + You may have to tell configure where Boost is installed by supplying + the options --with-boost and --with-boost-toolset. + The former sets the PREFIX for Boost (same as --prefix for Boost above). + The latter sets the compiler toolset (e.g. gcc34). + + + Pass --help to configure to get a list of + available options. + +
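As a rough illustration of the two Boost options, the following sketch assembles a configure invocation for a Boost installed under /usr/local and built with the gcc34 toolset. Both values are assumptions borrowed from the examples in this section, and the variable names are made up for the illustration. The script only prints the command, so it can be inspected before being run in the metaproxy-version directory.

```shell
#!/bin/sh
# Assumed values: adjust both to match the local Boost installation.
BOOST_PREFIX=/usr/local      # same directory as --prefix when Boost was installed
BOOST_TOOLSET=gcc34          # compiler toolset that Boost was built with

CONFIGURE_ARGS="--with-boost=$BOOST_PREFIX --with-boost-toolset=$BOOST_TOOLSET"

# Print the command instead of running it, so it can be checked first.
echo "./configure $CONFIGURE_ARGS"
```

If configure still cannot locate Boost, running it with --help lists the options this particular release actually accepts.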
Installation on Debian GNU/Linux - All dependencies for Metaproxy are available as - Debian - packages for the sarge (stable in 2005) and etch (testing in 2005) - distributions. + All dependencies for Metaproxy are available as + Debian packages. The procedures for Debian based systems, such as @@ -308,7 +308,10 @@ There is currently no official Debian package for YAZ++. - And the Debian package for YAZ is probably too old. + And the official Debian package for YAZ is probably too old. + But Index Data builds "new" versions of those for Debian (i386, amd64 only). + + Update the /etc/apt/sources.list to include the Index Data repository. See YAZ' Download Debian @@ -316,12 +319,12 @@ apt-get install libxslt1-dev - apt-get install libyazpp-dev + apt-get install libyazpp6-dev apt-get install libboost-dev + apt-get install libboost-system-dev apt-get install libboost-thread-dev - apt-get install libboost-date-time-dev - apt-get install libboost-program-options-dev apt-get install libboost-test-dev + apt-get install libboost-regex-dev With these packages installed, the usual configure + make @@ -330,52 +333,59 @@
+
+ Installation on RPM based Linux Systems + + All external dependencies for Metaproxy are available as + RPM packages, either from your distribution site, or from the + RPMfind site. + + + For example, an installation of the required Boost C++ development + libraries on RedHat Fedora C4 and C5 can be done like this: + + wget ftp://fr.rpmfind.net/wlinux/fedora/core/updates/testing/4/SRPMS/boost-1.33.0-3.fc4.src.rpm + sudo rpmbuild --buildroot src/ --rebuild -p fc4/boost-1.33.0-3.fc4.src.rpm + sudo rpm -U /usr/src/redhat/RPMS/i386/boost-*rpm + + + + The YAZ library is needed to + compile &metaproxy;, see there + for more information on available RPM packages. + + + There is currently no official RPM package for YAZ++. + See the YAZ++ pages + for more information on a Unix tarball install. + + + With these packages installed, the usual configure + make + procedure can be used for Metaproxy as outlined in + . + +
+
Installation on Windows Metaproxy can be compiled with Microsoft Visual Studio. - Version 2003 (C 7.1) and 2005 (C 8.0) is known to work. + Versions 2003 (C 7.1), 2005 (C 8.0), 2008 (C 9.0) and 2013 (C 12.0) + are known to work.
Boost - Get Boost from its home page. - You also need Boost Jam (an alternative to make). - That's also available from the Boost home page. - The files to be downloaded are called something like: - boost_1_33-1.exe - and - boost-jam-3.1.12-1-ntx86.zip. - Unpack Boost Jam first. Put bjam.exe - in your system path. Make a command prompt and ensure - it can be found automatically. If not check the PATH. - The Boost .exe is a self-extracting exe with - complete source for Boost. Compile that source with - Boost Jam (An alternative to Make). - The compilation takes a while. - For Visual Studio 2003, use - - bjam "-sTOOLS=vc-7_1" - - Here vc-7_1 refers to a "Toolset" (compiler system). - For Visual Studio 2005, use - - bjam "-sTOOLS=vc-8_0" - - To install the libraries in a common place, use - - bjam "-sTOOLS=vc-7_1" install - - (or vc-8_0 for VS 2005). - - - By default, the Boost build process installs the resulting - libraries + header files in - \boost\lib, \boost\include. + For Windows, it's easiest to get the precompiled Boost + package from here. + Several versions of the Boost libraries may be selected when + installing Boost for windows. Please choose at least the + multithreaded (non-DLL) version because + the Metaproxy makefile uses that. - For more informatation about installing Boost refer to the + For more information about installing Boost refer to the getting started pages. @@ -386,12 +396,10 @@ Libxslt can be downloaded for Windows from - here. + here. - Libxslt has other dependencies, but thes can all be downloaded - from the same site. Get the following: - iconv, zlib, libxml2, libxslt. + Libxslt also requires libxml2 to operate.
@@ -408,9 +416,7 @@ YAZ++ Get YAZ++ as well. - Version 1.0 or later is required. For now get it from - Index Data's - Snapshot area. + Version 1.5.2 or later is required. YAZ++ includes NMAKE makefiles, similar to those found in the @@ -421,7 +427,7 @@
Metaproxy - Metaproxy is shipped with NMAKE makfiles as well - similar + Metaproxy is shipped with NMAKE makefiles as well - similar to those found in the YAZ++/YAZ packages. Adjust this Makefile to point to the proper locations of Boost, Libxslt, Libxml2, zlib, iconv, yaz and yazpp. @@ -475,11 +481,11 @@ - + - + - After succesful compilation you'll find + After successful compilation you'll find metaproxy.exe in the bin directory. @@ -488,7 +494,154 @@
- + + + YAZ Proxy Comparison + + The table below lists facilities either supported by either + YAZ Proxy or Metaproxy. + + + Metaproxy / YAZ Proxy comparison + + + + Facility + Metaproxy + YAZ Proxy + + + + + Z39.50 server + Using filter + Supported + + + SRU server + Supported with filter + Supported + + + Z39.50 client + Supported with filter + Supported + + + SRU client + Supported with filter + Unsupported + + + Connection reuse + Supported with filter session_shared + Supported + + + Connection share + Supported with filter session_shared + Unsupported + + + Result set reuse + Supported with filter session_shared + Within one Z39.50 session / HTTP keep-alive + + + Record cache + Supported by filter session_shared + Supported for last result set within one Z39.50/HTTP-keep alive session + + + Z39.50 Virtual database, i.e. select any Z39.50 target for database + Supported with filter virt_db + Unsupported + + + SRU Virtual database, i.e. select any Z39.50 target for path + Supported with filter virt_db, + sru_z3950 + Supported + + + Multi target search + Supported with filter multi (round-robin) + Unsupported + + + Retrieval and search limits + Supported using filter limit + Supported + + + Bandwidth limits + Supported using filter limit + Supported + + + Connect limits + Supported by filter frontend_net (connect-max) + Supported + + + Retrieval sanity check and conversions + Supported using filter record_transform + Supported + + + Query check + + Supported by query_rewrite which may be check + a query and throw diagnostics (errors) + + Supported + + + Query rewrite + Supported with query_rewrite + Unsupported + + + Session invalidate for -1 hits + Unsupported + Supported + + + Architecture + Multi-threaded + select for networked modules such as + frontend_net) + Single-threaded using select + + + + Extensability + Most functionality implemented as loadable modules + Unsupported and experimental + + + + USEMARCON + Supported with record_transform + Supported + 
+ + + Portability + + Requires YAZ, YAZ++ and modern C++ compiler supporting + Boost. + + + Requires YAZ and YAZ++. + STL is not required so pretty much any C++ compiler out there should work. + + + + + +
+
+ The Metaproxy Architecture @@ -516,7 +669,7 @@ In general, packages are doctored as they pass through Metaproxy. For example, when the proxy performs authentication - and authorisation on a Z39.50 Init request, it removes the + and authorization on a Z39.50 Init request, it removes the authentication credentials from the package so that they are not passed onto the back-end server; and when search-response packages are obtained from multiple servers, they are merged @@ -555,7 +708,7 @@ The word ``filter'' is sometimes used rather loosely, in two different ways: it may be used to mean a particular type of filter, as when we speak of ``the - auth_simplefilter'' or ``the multi filter''; or it may be used + auth_simple filter'' or ``the multi filter''; or it may be used to be a specific instance of a filter within a Metaproxy configuration. For example, a single configuration will often contain multiple instances of the @@ -569,7 +722,7 @@ plugins that provide new filters. The filter API is small and conceptually simple, but there are many details to master. See the section below on - extensions. + Filters. @@ -587,9 +740,9 @@ Filters - - -
+ + +
Introductory notes It's useful to think of Metaproxy as an interpreter providing a small @@ -612,7 +765,7 @@ as part of Metaproxy, and others may be provided by third parties and dynamically loaded. They all conform to the same simple API of essentially two methods: configure() is - called at startup time, and is passed a DOM tree representing that + called at startup time, and is passed an XML DOM tree representing that part of the configuration file that pertains to this filter instance: it is expected to walk that tree extracting relevant information; and process() is called every @@ -624,31 +777,33 @@ packages (frontend_net); others are sinks: they consume packages and return a result - (z3950_client, - backend_test, - http_file); + (backend_test, + bounce, + http_file, + z3950_client); the others are true filters, that read, process and pass on the packages they are fed (auth_simple, log, multi, query_rewrite, + record_transform, session_shared, + sru_z3950, template, virt_db).
- - + +
Overview of filter types We now briefly consider each of the types of filter supported by the core Metaproxy binary. This overview is intended to give a - flavour of the available functionality; more detailed information + flavor of the available functionality; more detailed information about each type of filter is included below in - the reference guide to Metaproxy filters. + . The filters are here named by the string that is used as the @@ -662,12 +817,35 @@ The filters are here listed in alphabetical order: - -
+ + + +
<literal>auth_simple</literal> (mp::filter::AuthSimple) - Simple authentication and authorisation. The configuration + Simple authentication and authorization. The configuration specifies the name of a file that is the user register, which lists username:password pairs, one per line, colon separated. When a session begins, it @@ -681,51 +859,108 @@ the user.
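The user register format described here (one username:password pair per line, colon separated) can be illustrated with a small shell sketch. The accounts and the helper name check_user are hypothetical, and the lookup only demonstrates the file layout; it is not a description of how the auth_simple filter itself is implemented.

```shell
#!/bin/sh
# Hypothetical user register: one username:password pair per line.
REGISTER=$(mktemp)
cat > "$REGISTER" <<'EOF'
alice:secret1
bob:secret2
EOF

# check_user NAME PASSWORD
# Succeeds only if the exact pair appears in the register.
check_user() {
    while IFS=: read -r user pass; do
        if [ "$user" = "$1" ] && [ "$pass" = "$2" ]; then
            return 0
        fi
    done < "$REGISTER"
    return 1
}

check_user alice secret1 && echo "alice: accepted"
check_user bob wrong || echo "bob: rejected"
```

Because the password is read into the last variable, it keeps any further colons on the line; whether the real filter treats extra colons the same way is not stated in this section.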
- -
+ +
<literal>backend_test</literal> (mp::filter::Backend_test) - A sink that provides dummy responses in the manner of the + A partial sink that provides dummy responses in the manner of the yaz-ztest Z39.50 server. This is useful only for testing. Seriously, you don't need this. Pretend you didn't even read this section.
- -
+ +
+ <literal>bounce</literal> + (mp::filter::Bounce) + + A sink that swallows all packages, + and returns them almost unprocessed. + It never sends any package of any type further down the route, but + turns Z39.50 packages into Z_Close packages and HTTP_Request packages into + HTTP_Response packages with error code 400, and adds a suitable bounce + message. + The bounce filter is usually added at the end of each route + so that, for example, HTTP request packages do not hang + indefinitely when the only other sink in the route is the + partial sink z3950_client, which handles + Z39.50 packages only. + +
+ +
+ <literal>cql_rpn</literal> + (mp::filter::CQLtoRPN) + + A query language transforming filter which catches Z39.50 + searchRequest + packages containing CQL queries, transforms + those to RPN queries, + and sends the searchRequests on to the next + filters. It is useful, among other things, in an SRU context. + +
+ +
<literal>frontend_net</literal> (mp::filter::FrontendNet) A source that accepts Z39.50 connections from a port specified in the configuration, reads protocol units, and feeds them into the next filter in the route. When the result is - revceived, it is returned to the original origin. + received, it is returned to the original origin.
-
+
<literal>http_file</literal> (mp::filter::HttpFile) - A sink that returns the contents of files from the local - filesystem in response to HTTP requests. (Yes, Virginia, this + A partial sink which swallows only + HTTP_Request packages, and + returns the contents of files from the local + filesystem in response to HTTP requests. + It lets Z39.50 packages and all other forthcoming package types + pass untouched. + (Yes, Virginia, this does mean that Metaproxy is also a Web-server in its spare time. So far it does not contain either an email-reader or a Lisp interpreter, but that day is surely coming.)
- -
+ +
+ <literal>load_balance</literal> + (mp::filter::LoadBalance) + + Performs load balancing for incoming Z39.50 init requests. + It is used together with the virt_db filter, + but unlike the multi filter it sends an + entire session to only one of the virtual backends. The + load_balance filter assumes that + all backend targets have equal content, and chooses the backend + with the least load cost for a new session. + + + This filter is experimental and not yet mature enough for heavy-load + production sites. + + + +
+ +
<literal>log</literal> (mp::filter::Log) Writes logging information to standard output, and passes on - the package unchanged. + the package unchanged. A log file name can be specified, as well + as multiple different logging formats.
- -
+ +
<literal>multi</literal> (mp::filter::Multi) @@ -735,12 +970,14 @@ of virtual databases and multi-database searching below.
- -
+ +
<literal>query_rewrite</literal> (mp::filter::QueryRewrite) - Rewrites Z39.50 Type-1 and Type-101 (``RPN'') queries by a + Rewrites Z39.50 Type-1 + and Type-101 (``RPN'') + queries by a three-step process: the query is transliterated from Z39.50 packet structures into an XML representation; that XML representation is transformed by an XSLT stylesheet; and the @@ -748,26 +985,56 @@ structure.
- -
+ + +
+ <literal>record_transform</literal> + (mp::filter::RecordTransform) + + This filter acts only on Z39.50 present requests, and lets all + other types of packages and requests pass untouched. Its use is + twofold: blocking Z39.50 present requests that the backend + server does not understand and cannot honor, and transforming + the present syntax and element set name according to the rules + specified, so that only existing record formats are fetched and then + transformed on the fly to the requested record syntaxes. + +
+ +
<literal>session_shared</literal> (mp::filter::SessionShared) - When this is finished, it will implement global sharing of + This filter implements global sharing of result sets (i.e. between threads and therefore between - clients), yielding performance improvements especially when - incoming requests are from a stateless environment such as a - web-server, in which the client process representing a session - might be any one of many. However: + clients), yielding performance improvements by clever resource + pooling. - - - This filter is not yet completed. - -
- -
+ +
+ <literal>sru_z3950</literal> + (mp::filter::SRUtoZ3950) + + This filter transforms valid + SRU GET/POST/SOAP searchRetrieve requests to Z39.50 init, search, + and present requests, and wraps the + received hit counts and XML records into suitable SRU response + messages. + The sru_z3950 filter also processes SRU + GET/POST/SOAP explain requests, returning + either the absolute minimum required by the standard, or a full + pre-defined ZeeReX explain record. + See the + ZeeReX Explain + standard pages and the + SRU Explain pages + for more information on the correct explain syntax. + SRU scan requests are not supported yet. + +
+ +
<literal>template</literal> (mp::filter::Template) @@ -779,10 +1046,10 @@ intended for civilians.
- -
+ +
<literal>virt_db</literal> - (mp::filter::Virt_db) + (mp::filter::VirtualDB) Performs virtual database selection: based on the name of the database in the search request, a server is selected, and its @@ -794,23 +1061,49 @@ of virtual databases and multi-database searching below.
- -
+ +
<literal>z3950_client</literal> (mp::filter::Z3950Client) - Performs Z39.50 searching and retrieval by proxying the + A partial sink which swallows only Z39.50 packages. + It performs Z39.50 searching and retrieval by proxying the packages that are passed to it. Init requests are sent to the address specified in the VAL_PROXY otherInfo attached to the request: this may have been specified by client, or generated by a virt_db filter earlier in the route. Subsequent requests are sent to the same address, which is remembered at Init time in a Session object. + HTTP_Request packages and all other forthcoming package types + are passed untouched.
+ + +
+ <literal>zeerex_explain</literal> + (mp::filter::ZeerexExplain) + + This filter acts as a sink for + Z39.50 explain requests, returning a static ZeeReX + Explain XML record from the config section. All other packages + are passed through. + See the + ZeeReX Explain + standard pages + for more information on the correct explain syntax. + + + + This filter is not yet completed. + + +
+ +
- - + +
Future directions @@ -830,34 +1123,10 @@ - frontend_sru (source) - - - Receive SRU (and perhaps SRW) requests. - - - - - sru2z3950 (filter) - - - Translate SRU requests into Z39.50 requests. - - - - sru_client (sink) - SRU searching and retrieval. - - - - - srw_client (sink) - - - SRW searching and retrieval. + SRU/GET and SRU/SOAP searching and retrieval. @@ -872,64 +1141,51 @@
- - - + + + Configuration: the Metaproxy configuration file format - - -
+ + +
Introductory notes If Metaproxy is an interpreter providing operations on packages, then its configuration file can be thought of as a program for that - interpreter. Configuration is by means of a single file, the name + interpreter. Configuration is by means of a single XML file, the name of which is supplied as the sole command-line argument to the metaproxy program. (See - the reference guide - below for more information on invoking Metaproxy.) - - - The configuration files are written in XML. (But that's just an - implementation detail - they could just as well have been written - in YAML or Lisp-like S-expressions, or in a custom syntax.) - - - Since XML has been chosen, an XML schema, - config.xsd, is provided for validating - configuration files. This file is supplied in the - etc directory of the Metaproxy distribution. It - can be used by (among other tools) the xmllint - program supplied as part of the libxml2 - distribution: - - - xmllint --noout --schema etc/config.xsd my-config-file.xml - - - (A recent version of libxml2 is required, as - support for XML Schemas is a relatively recent addition.) + below for more information on invoking + Metaproxy.)
- +
- Overview of XML structure + Overview of the config file XML structure All elements and attributes are in the namespace - . + . This is most easily achieved by setting the default namespace on the top-level element, as here: - <yp2 xmlns="http://indexdata.dk/yp2/config/1"> + <metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0"> - The top-level element is <yp2>. This contains a - <start> element, a <filters> element and a - <routes> element, in that order. <filters> is - optional; the other two are mandatory. All three are - non-repeatable. + The top-level element is <metaproxy>. This contains + a <dlpath> element, + a <start> element, + a <filters> element and + a <routes> element, in that order. <dlpath> and + <filters> are optional; the other two are mandatory. + All four are non-repeatable. + + + The <dlpath;> element contains a text element which + specifies the location of filter modules. This is only needed + if Metaproxy must load 3rd party filters (most filters with Metaproxy + are built into the Metaproxy application). The <start> element is empty, but carries a @@ -945,7 +1201,7 @@ and contain various elements that provide suitable configuration for a filter of its type. The filter-specific elements are described in - the reference guide below. + . Filters defined in this part of the file must carry an id attribute so that they can be referenced from elsewhere. @@ -974,14 +1230,15 @@ The following is a small, but complete, Metaproxy configuration file (included in the distribution as - metaproxy/etc/config0.xml). + metaproxy/etc/config1.xml). This file defines a very simple configuration that simply proxies to whatever back-end server the client requests, but logs each request and response. This can be useful for debugging complex client-server dialogues. 
- + + /usr/lib/metaproxy/modules @@ -990,18 +1247,19 @@ - + + - + ]]> It works by defining a single route, called - start, which consists of a sequence of three + start, which consists of a sequence of four filters. The first and last of these are included by reference: their <filter> elements have refid attributes that refer to filters defined @@ -1009,18 +1267,70 @@ middle filter is included inline in the route. - The three filters in the route are as follows: first, a + The four filters in the route are as follows: first, a frontend_net filter accepts Z39.50 requests from any host on port 9000; then these requests are passed through a log filter that emits a message for each request; they are then fed into a z3950_client - filter, which forwards the requests to the client-specified - back-end Z39.509 server. When the response arrives, it is handed + filter, which forwards all Z39.50 requests to the client-specified + back-end Z39.509 server. Those Z39.50 packages are returned by the + z3950_client filter, with the response data + filled by the external Z39.50 server targeted. + All non-Z39.50 packages are passed through to the + bounce filter, which definitely bounces + everything, including fish, bananas, cold pyjamas, + mutton, beef and trout packages. + When the response arrives, it is handed back to the log filter, which emits another - message; and then to the front-end filter, which returns the - response to the client. + message; and then to the frontend_net filter, + which returns the response to the client.
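Since the markup of the example above may not survive every rendering of this document, here is a hedged sketch of a configuration with the same four-filter route, written out by a small shell script. The element and attribute names for the top-level structure follow the description in this chapter; the content of the port and message elements, the filter ids, and the choice of which filters are included by reference are assumptions and may differ from the config1.xml shipped with the distribution.

```shell
#!/bin/sh
# Sketch of the four-filter route described above. Element content such as
# the port value and the log message is assumed, not copied from config1.xml.
CONFIG=$(mktemp)
cat > "$CONFIG" <<'EOF'
<?xml version="1.0"?>
<metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0">
  <dlpath>/usr/lib/metaproxy/modules</dlpath>
  <start route="start"/>
  <filters>
    <filter id="frontend" type="frontend_net">
      <port>@:9000</port>
    </filter>
    <filter id="backend" type="z3950_client"/>
  </filters>
  <routes>
    <route id="start">
      <filter refid="frontend"/>
      <filter type="log">
        <message>log</message>
      </filter>
      <filter refid="backend"/>
      <filter type="bounce"/>
    </route>
  </routes>
</metaproxy>
EOF

# Check well-formedness only if xmllint happens to be installed.
if command -v xmllint >/dev/null 2>&1; then
    xmllint --noout "$CONFIG" && echo "well-formed: $CONFIG"
fi
```

The generated file can then be passed as the sole command-line argument to the metaproxy program, as described in the introductory notes of this chapter.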
+ +
+ Config file modularity + + Metaproxy XML configuration snippets can be reused by other + filters using the XInclude standard, as seen in + the /etc/config-sru-to-z3950.xml example SRU + configuration. + + + + + +]]> + +
+ +
+ Config file syntax checking + + The distribution contains RelaxNG Compact and XML syntax checking + files, as well as XML Schema files. These are found in the + distribution paths + + xml/schema/metaproxy.rnc + xml/schema/metaproxy.rng + xml/schema/metaproxy.xsd + + and can be used to verify or debug the XML structure of + configuration files. For example, using the utility + xmllint, syntax checking is done like this: + + xmllint --noout --schema xml/schema/metaproxy.xsd etc/config-local.xml + xmllint --noout --relaxng xml/schema/metaproxy.rng etc/config-local.xml + + (A recent version of libxml2 is required, as + support for XML Schemas is a relatively recent addition.) + + + You can of course use any other RelaxNG or XML Schema compliant tool + you wish. + +
@@ -1029,7 +1339,7 @@ Virtual databases and multi-database searching -
+
Introductory notes Two of Metaproxy's filters are concerned with multiple-database @@ -1043,14 +1353,14 @@ The interaction between these two filters is necessarily complex: it reflects the real, irreducible complexity of multi-database searching in a protocol such - as Z39.50 that separates initialisation from searching, and in - which the database to be searched is not known at initialisation + as Z39.50 that separates initialization from searching, and in + which the database to be searched is not known at initialization time. It's possible to use these filters without understanding the details of their functioning and the interaction between them; the - next two sections of this chapter are ``HOWTO'' guides for doing + next two sections of this chapter are ``HOW-TO'' guides for doing just that. However, debugging complex configurations will require a deeper understanding, which the last two sections of this chapters attempt to provide. @@ -1088,7 +1398,7 @@ marc - indexdata.dk/marc + indexdata.com/marc ]]> @@ -1113,7 +1423,7 @@ Index Data's tiny testing database of MARC records: - + @@ -1128,21 +1438,22 @@ marc - indexdata.dk/marc + indexdata.com/marc all z3950.loc.gov:7090/voyager - indexdata.dk/marc + indexdata.com/marc 30 + -]]> +]]> (Using a virt_db @@ -1246,7 +1557,7 @@ Z> can be inconvenient in deployment, when users typically don't want to be bothered with problems of this kind and prefer just to get the records from the databases that are available. To obtain this - latter behaviour add an empty + latter behavior add an empty <hideunavailable> element inside the multi filter: @@ -1362,13 +1673,8 @@ Z> merges them into a single Search response, which is what eventually makes it back to the client. -
- -
- A picture is worth a thousand words (but only five hundred on 64-bit architectures) - - + @@ -1381,28 +1687,133 @@ Z> [Here there should be a diagram showing the progress of packages through the filters during a simple virtual-database search and a multi-database search, but is seems that your - toolchain has not been able to include the diagram in this - document. This is because of LaTeX suckage. Time to move to - OpenOffice. Yes, really.] + tool chain has not been able to include the diagram in this + document.] - - - +
+ + Combined SRU webservice and Z39.50 server configuration + + Metaproxy can act as + SRU and + web service server, which translates web service requests to + ANSI/NISO Z39.50 packages and + sends them off to commonly available targets. + + + A typical setup for this operation needs a filter route including the + following modules: + + + + SRU/Z39.50 Server Filter Route Configuration + + + + Filter + Importance + Purpose + + + + + + frontend_net + required + Accepting HTTP connections and passing them to the following + filters. Since this filter also accepts Z39.50 connections, the + server works as SRU and Z39.50 server on the same port. + + + sru_z3950 + required + Accepting SRU GET/POST/SOAP explain and + searchRetrieve requests for the configured databases. + Explain requests are directly served from the static XML configuration. + SearchRetrieve requests are + transformed to Z39.50 search and present packages. + All other HTTP and Z39.50 packages are passed unaltered. + + + http_file + optional + Serving HTTP requests from the filesystem. This is only + needed if the server should serve XSLT stylesheets, static HTML + files or JavaScript for thin browser-based clients. + Z39.50 packages are passed unaltered. + + + cql_rpn + required + Usually, Z39.50 servers do not talk CQL, hence the + translation of the CQL query language to RPN is mandatory in + most cases. Affects only Z39.50 search packages. + + + record_transform + optional + Some Z39.50 backend targets cannot present XML record + syntaxes in commonly wanted element sets. Using this filter, one + can transform binary MARC records to MARCXML records, and + further transform those to any needed XML schema/format by XSLT + transformations. Changes only Z39.50 present packages. + + + session_shared + optional + The stateless nature of web services requires frequent + re-searching of the same targets for display of paged result set + records.
This might be an unacceptable burden for the accessed + backend Z39.50 targets, and this module can be added for + efficient backend target resource pooling. + + + z3950_client + required + Finally, a Z39.50 package sink is needed in the filter + chain to provide the response packages. The Z39.50 client module + is used to access external targets over the network, but any + coming local Z39.50 package sink could be used instead. + + + bounce + required + Any Metaproxy package arriving here did not do so on + purpose, and is bounced back with connection closure. This + prevents packages from hanging indefinitely inside the SRU server. + + + +
+ + A typical minimal example SRU + server configuration file is found in the tarball distribution at + etc/config-sru-to-z3950.xml. + + + Of course, any other Metaproxy modules can be integrated into a + SRU server solution, including, but not limited to, load balancing, + multiple target querying + (see ), and complex RPN query rewrites. + + +
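The filter route described in the table above might be laid out roughly as follows. This is a minimal sketch, not a copy of the distributed etc/config-sru-to-z3950.xml: the port number, the database name and the CQL mapping file path are hypothetical, and the child elements shown inside each filter are recalled from typical Metaproxy configurations, so they should be checked against the reference section and the shipped example before use. Only the required filters from the table are included, in the order the table lists them.

```xml
<?xml version="1.0"?>
<!-- Sketch only: verify every element name against the reference section
     and the distributed etc/config-sru-to-z3950.xml before relying on it. -->
<metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0">
  <start route="start"/>
  <routes>
    <route id="start">
      <!-- required: accepts HTTP (SRU) and Z39.50 connections on one port -->
      <filter type="frontend_net">
        <port>@:9000</port>              <!-- hypothetical port -->
      </filter>
      <!-- required: serves explain, turns searchRetrieve into
           Z39.50 search and present packages -->
      <filter type="sru_z3950">
        <database name="Default"/>       <!-- hypothetical database name -->
      </filter>
      <!-- optional filters (http_file, record_transform, session_shared)
           would be placed at their positions from the table -->
      <!-- required in most cases: CQL to RPN translation -->
      <filter type="cql_rpn">
        <conversion file="etc/cql2pqf.txt"/>  <!-- assumed mapping file path -->
      </filter>
      <!-- required: the Z39.50 package sink that contacts external targets -->
      <filter type="z3950_client">
        <timeout>30</timeout>
      </filter>
      <!-- required: anything that gets this far is bounced back -->
      <filter type="bounce"/>
    </route>
  </routes>
</metaproxy>
```

A file like this can be checked for structural errors with the xmllint commands given in the section on config file syntax checking before starting the server with it.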
+ + @@ -1410,12 +1821,12 @@ Z> Classes in the Metaproxy source code -
+
Introductory notes Stop! Do not read this! You won't enjoy it at all. You should just skip ahead to - the reference guide, + , which tells @@ -1430,7 +1841,7 @@ Z> This chapter contains documentation of the Metaproxy source code, and is of interest only to maintainers and developers. If you need to - change Metaproxy's behaviour or write a new filter, then you will most + change Metaproxy's behavior or write a new filter, then you will most likely find this chapter helpful. Otherwise it's a waste of your good time. Seriously: go and watch a film or something. This is Spinal Tap is particularly good. @@ -1456,7 +1867,7 @@ Z> parentheses. -
+
<literal>mp::FactoryFilter</literal> (<filename>factory_filter.cpp</filename>) @@ -1471,7 +1882,7 @@ Z>
-
+
<literal>mp::FactoryStatic</literal> (<filename>factory_static.cpp</filename>) @@ -1486,24 +1897,24 @@ Z>
-
+
<literal>mp::filter::Base</literal> (<filename>filter.cpp</filename>) The virtual base class of all filters. The filter API is, on the surface at least, extremely simple: two methods. - configure() is passed a DOM tree representing + configure() is passed an XML DOM tree representing that part of the configuration file that pertains to this filter instance, and is expected to walk that tree extracting relevant information. And process() processes a - package (see below). That surface simplicitly is a bit + package (see below). That surface simplicity is a bit misleading, as process() needs to know a lot about the Package class in order to do anything useful.
-
+
<literal>mp::filter::AuthSimple</literal>, <literal>Backend_test</literal>, etc. (<filename>filter_auth_simple.cpp</filename>, @@ -1514,12 +1925,7 @@ Z> <filename>filter_*.cpp</filename> respectively. All the header files should be pretty much identical, in that they declare the class, including a private <literal>Rep</literal> class and a - member pointer to it, and the two public methods. The only extra - information in any filter header is additional private types and - members (which should really all be in the <literal>Rep</literal> - anyway) and private methods (which should also remain known only - to the source file, but C++'s brain-damaged design requires this - dirty laundry to be exhibited in public. Thanks, Bjarne!) + member pointer to it, and the two public methods. </para> <para> The source file for each filter needs to supply: @@ -1550,7 +1956,7 @@ Z> </itemizedlist> </section> - <section> + <section id="class-Package"> <title><literal>mp::Package</literal> (<filename>package.cpp</filename>) @@ -1561,7 +1967,7 @@ Z>
-
+
<literal>mp::Pipe</literal> (<filename>pipe.cpp</filename>) @@ -1571,7 +1977,7 @@ Z>
-
+
<literal>mp::RouterChain</literal> (<filename>router_chain.cpp</filename>) @@ -1579,7 +1985,7 @@ Z>
-
+
<literal>mp::RouterFleXML</literal> (<filename>router_flexml.cpp</filename>) @@ -1587,7 +1993,7 @@ Z>
-
+
<literal>mp::Session</literal> (<filename>session.cpp</filename>) @@ -1595,7 +2001,7 @@ Z>
-
+
<literal>mp::ThreadPoolSocketObserver</literal> (<filename>thread_pool_observer.cpp</filename>) @@ -1603,7 +2009,7 @@ Z>
-
+
<literal>mp::util</literal> (<filename>util.cpp</filename>) @@ -1614,7 +2020,7 @@ Z>
-
+
<literal>mp::xml</literal> (<filename>xmlutil.cpp</filename>) @@ -1669,42 +2075,54 @@ Z> + + Reference + + + The material in this chapter is drawn directly from the individual + manual entries. In particular, the Metaproxy invocation section is + available using man metaproxy, and the section + on each individual filter is available using the name of the filter + as the argument to the man command. + + + &manref; + + + + License + + &copyright; - - Reference guide - The material in this chapter is drawn directly from the individual - manual entries. In particular, the Metaproxy invocation section is - available using man metaproxy, and the section - on each individual filter is available using the name of the filter - as the argument to the man command. - + Metaproxy is free software; you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation; either version 2, or (at your option) any later + version. + + + Metaproxy is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + for more details. + -
- Metaproxy invocation - &progref; -
+ + You should have received a copy of the GNU General Public License + along with Metaproxy; see the file LICENSE. If not, write to the + Free Software Foundation, + 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + -
- Reference guide to Metaproxy filters - &manref; -
- + &gpl2; - +