X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fbook.xml;h=86c2886f521578261098bb0302fc82123e7cca7f;hb=e3b7970c1a533e4c14aa459f80ab002af4fdabf3;hp=03032af0ef8dd490f248b1ff91e18a5fa808400e;hpb=4e876b7359785f69806459e19fcaa8aea37fc15d;p=metaproxy-moved-to-github.git diff --git a/doc/book.xml b/doc/book.xml index 03032af..86c2886 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -1,781 +1,2127 @@ - + + + + %local; + + + + %idcommon; +]> + - Metaproxy - User's Guide and Reference - - MikeTaylor - - - 2006 - Index Data - - - - ### - Metaproxy is ... in need of description :-) - - + Metaproxy - User's Guide and Reference + + + AdamDickmeiss + + + MarcCromme + + + MikeTaylor + + + &version; + + 2005-2014 + Index Data + + + + This manual is part of Metaproxy version &version;. + + + Metaproxy is a universal router, proxy and encapsulated + metasearcher for information retrieval protocols. It accepts, + processes, interprets and redirects requests from IR clients using + standard protocols such as the binary + ANSI/NISO Z39.50 + and the information search and retrieval + web service SRU + as well as functioning as a limited + HTTP server. + + + Metaproxy is configured by an XML file which + specifies how the software should function in terms of routes that + the request packets can take through the proxy, each step on a + route being an instantiation of a filter. Filters come in many + types, one for each operation: accepting Z39.50 packets, logging, + query transformation, multiplexing, etc. Further filter-types can + be added as loadable modules to extend Metaproxy functionality, + using the filter API. + + + Metaproxy is covered by the GNU General Public License version 2. + + + + + + + + + + + + - - - Introduction - + Introduction -
- Overview - Metaproxy - is .. + Metaproxy + is a standalone program that acts as a universal router, proxy and + encapsulated metasearcher for information retrieval protocols such + as Z39.50 and + SRU. + To clients, it acts as a server of these protocols: it can be searched, + records can be retrieved from it, etc. + To servers, it acts as a client: it searches in them, + retrieves records from them, etc. It satisfies its clients' + requests by transforming them, multiplexing them, forwarding them + on to zero or more servers, merging the results, transforming + them, and delivering them back to the client. In addition, it + acts as a simple HTTP server; support + for further protocols can be added in a modular fashion, through the + creation of new filters. + + Anything goes in! + Anything goes out! + Fish, bananas, cold pyjamas, + Mutton, beef and trout! + - attributed to Cole Porter. + - ### We should probably consider saying a little more by way of - introduction. + Metaproxy is a more capable alternative to + YAZ Proxy, + being more powerful, flexible, configurable and extensible. Among + its many advantages over the older, more pedestrian work are + support for multiplexing (encapsulated metasearching), routing by + database name, authentication and authorization and serving local + files via HTTP. Equally significant, its modular architecture + facilitates the creation of pluggable modules implementing further + functionality. -
-
- - - - - Filters - - -
- Introductory notes - It's useful to think of Metaproxy as an interpreter providing a small - number of primitives and operations, but operating on a very - complex data type, namely the ``package''. + This manual will describe how to install Metaproxy + before giving an overview of its architecture, then discussing the + key concept of a filter in some depth and giving an overview of + the various filter types, then discussing the configuration file + format. After this come several optional chapters which may be + freely skipped: a detailed discussion of virtual databases and + multi-database searching, some notes on writing extensions + (additional filter types) and a high-level description of the + source code. Finally comes the reference guide, which contains + instructions for invoking the metaproxy + program, and detailed information on each type of filter, + including examples. + + + + Installation - A package represents a Z39.50 or SRW/U request (whether for Init, - Search, Scan, etc.) together with information about where it came - from. Packages are created by front-end filters such as - frontend_net (see below), which reads them from - the network; other front-end filters are possible. They then pass - along a route consisting of a sequence of filters, each of which - transforms the package and may also have side-effects such as - generating logging. Eventually, the route will yield a response, - which is sent back to the origin. + Metaproxy depends on the following tools/libraries: + + YAZ++ + + + This is a C++ library based on YAZ. + + + + Libxslt + + This is an XSLT processor - based on + Libxml2. Both Libxml2 and + Libxslt must be installed with the development components + (header files, etc.) as well as the run-time libraries. + + + + Boost + + + The popular C++ library. Initial versions of Metaproxy + were built with 1.32 but this is no longer supported. + Metaproxy is known to work with Boost version 1.33 through 1.55. 
+ + + + - There are many kinds of filter: some that are defined statically - as part of Metaproxy, and other that may be provided by third parties - and dynamically loaded. They all conform to the same simple API - of essentially two methods: configure() is - called at startup time, and is passed a DOM tree representing that - part of the configuration file that pertains to this filter - instance: it is expected to walk that tree extracting relevant - information; and process() is called every - time the filter has to processes a package. + In order to compile Metaproxy a modern C++ compiler is + required. Boost, in particular, requires the C++ compiler + to support recent language features. Refer to Boost + Compiler Status + for more information. - While all filters provide the same API, there are different modes - of functionality. Some filters are sources: they create - packages - (frontend_net); - others are sinks: they consume packages and return a result - (z3950_client, - backend_test, - http_file); - the others are true filters, that read, process and pass on the - packages they are fed - (auth_simple, - log, - multi, - session_shared, - template, - virt_db). + We have successfully built Metaproxy using the compilers + GCC and + Microsoft Visual Studio. -
- -
- Individual filters - The filters are here named by the string that is used as the - type attribute of a - <filter> element in the configuration - file to request them, with the name of the class that implements - them in parentheses. + As an option, Metaproxy may also be compiled with + USEMARCON support which allows for + MARC conversions for the filter. +
+ Installation on Unix (from Source) + + Here is a quick step-by-step guide on how to compile all the + tools that Metaproxy uses. Most systems provide binary packages for + at least some of the required tools. If, for example, Libxml2/libxslt are already + installed as development packages, use those (and omit compilation). + + + + + USEMARCON is not available + as a package at the moment, so Metaproxy must be built from source + if that is to be used. + + + +
+ Libxml2/libxslt + + Libxml2/libxslt: + + + gunzip -c libxml2-version.tar.gz|tar xf - + cd libxml2-version + ./configure + make + su + make install + + + gunzip -c libxslt-version.tar.gz|tar xf - + cd libxslt-version + ./configure + make + su + make install + +
+
+ USEMARCON (optional) + + gunzip -c usemarcon317.tar.gz|tar xf - + cd usemarcon317 + ./configure + make + su + make install + +
+ +
+ YAZ/YAZ++ + + gunzip -c yaz-version.tar.gz|tar xf - + cd yaz-version + ./configure + make + su + make install + + + gunzip -c yazpp-version.tar.gz|tar xf - + cd yazpp-version + ./configure + make + su + make install + +
+
+ Boost + + Metaproxy needs the components thread and test from + Boost. + + + gunzip -c boost-version.tar.gz|tar xf - + cd boost-version + ./configure --with-libraries=thread,test,regex --with-toolset=gcc + make + su + make install + + + Under the hood, bjam is used. You can also invoke it directly with + + + ./bjam --toolset=gcc --with-thread --with-test --with-regex stage + + + Replace stage with clean / + install to perform clean and install respectively. + + + Add --prefix=DIR to install Boost under a + prefix other than /usr/local. + +
+
+ Metaproxy + + gunzip -c metaproxy-version.tar.gz|tar xf - + cd metaproxy-version + ./configure + make + su + make install + + + You may have to tell configure where Boost is installed by supplying + options --with-boost and --with-boost-toolset. + The former sets the PREFIX for Boost (same as --prefix for Boost above). + The latter sets the compiler toolset (e.g. gcc34). + + + Pass --help to configure to get a list of + available options. + +
+
-
- <literal>auth_simple</literal> - (mp::filter::AuthSimple) +
+ Installation on Debian GNU/Linux - Simple authentication and authorisation. The configuration - specifies the name of a file that is the user register, which - lists username:password - pairs, one per line, colon separated. When a session begins, it - is rejected unless username and passsword are supplied, and match - a pair in the register. + All dependencies for Metaproxy are available as + Debian packages. - ### discuss authorisation phase + The procedure for Debian-based systems, such as + Ubuntu, is probably similar. -
- -
- <literal>backend_test</literal> - (mp::filter::Backend_test) - A sink that provides dummy responses in the manner of the - yaz-ztest Z39.50 server. This is useful only - for testing. + There is currently no official Debian package for YAZ++. + And the official Debian package for YAZ is probably too old. + But Index Data builds "new" versions of those for Debian (i386, amd64 only). -
- -
- <literal>frontend_net</literal> - (mp::filter::FrontendNet) - A source that accepts Z39.50 and SRW connections from a port - specified in the configuration, reads protocol units, and - feeds them into the next filter, eventually returning the - result to the origin. + Update the /etc/apt/sources.list + to include the Index Data repository. + See YAZ' Download Debian + for more information. -
- -
- <literal>http_file</literal> - (mp::filter::HttpFile) + + apt-get install libxslt1-dev + apt-get install libyazpp6-dev + apt-get install libboost-dev + apt-get install libboost-system-dev + apt-get install libboost-thread-dev + apt-get install libboost-test-dev + apt-get install libboost-regex-dev + - A sink that returns the contents of files from the local - filesystem in response to HTTP requests. (Yes, Virginia, this - does mean that Metaproxy is also a Web-server in its spare time. So - far it does not contain either an email-reader or a Lisp - interpreter, but that day is surely coming.) + With these packages installed, the usual configure + make + procedure can be used for Metaproxy as outlined in + .
-
- <literal>log</literal> - (mp::filter::Log) +
+ Installation on RPM based Linux Systems - Writes logging information to standard output, and passes on - the package unchanged. + All external dependencies for Metaproxy are available as + RPM packages, either from your distribution site, or from the + RPMfind site. -
- -
- <literal>multi</literal> - (mp::filter::Multi) - Performs multicast searching. See the extended discussion of - multi-database searching below. + For example, an installation of the required Boost C++ development + libraries on RedHat Fedora C4 and C5 can be done like this: + + wget ftp://fr.rpmfind.net/wlinux/fedora/core/updates/testing/4/SRPMS/boost-1.33.0-3.fc4.src.rpm + sudo rpmbuild --buildroot src/ --rebuild -p fc4/boost-1.33.0-3.fc4.src.rpm + sudo rpm -U /usr/src/redhat/RPMS/i386/boost-*rpm + -
- -
- <literal>session_shared</literal> - (mp::filter::SessionShared) - When this is finished, it will implement global sharing of - result sets (i.e. between threads and therefore between - clients), but it's not yet done. + The YAZ library is needed to + compile &metaproxy;, see there + for more information on available RPM packages. -
- -
- <literal>template</literal> - (mp::filter::Template) - Does nothing at all, merely passing the packet on. (Maybe it - should be called nop or - passthrough?) This exists not to be used, but - to be copied - to become the skeleton of new filters as they are - written. + There is currently no official RPM package for YAZ++. + See the YAZ++ pages + for more information on a Unix tarball install. -
- -
- <literal>virt_db</literal> - (mp::filter::Virt_db) - Performs virtual database selection. See the extended discussion - of virtual databases below. + With these packages installed, the usual configure + make + procedure can be used for Metaproxy as outlined in + .
-
- <literal>z3950_client</literal> - (mp::filter::Z3950Client) +
+ Installation on Windows - Performs Z39.50 searching and retrieval by proxying the - packages that are passed to it. Init requests are sent to the - address specified in the VAL_PROXY otherInfo - attached to the request: this may have been specified by client, - or generated by a virt_db filter earlier in - the route. Subsequent requests are sent to the same address, - which is remembered at Init time in a Session object. + Metaproxy has been tested with Microsoft + Visual Studio + 2013 (C 12.0). -
-
+
+ Boost + + For Windows, it's easiest to get the precompiled Boost + package from here. + Several versions of the Boost libraries may be selected when + installing Boost for Windows. Please choose at least the + multithreaded (non-DLL) version because + the Metaproxy makefile uses that. + + + For more information about installing Boost refer to the + getting started + pages. + +
+ +
+ Libxslt + + Libxslt can be downloaded + for Windows from + here. + + + Libxslt also requires libxml2 to operate. + +
+ +
+ YAZ + + YAZ can be downloaded + for Windows from + here. + +
+ +
+ YAZ++ + + Get YAZ++ as well. + Version 1.6.0 or later is required. + + + YAZ++ includes NMAKE makefiles, similar to those found in the + YAZ package. + +
+ +
+ Metaproxy + + Metaproxy is shipped with NMAKE makefiles as well - similar + to those found in the YAZ++/YAZ packages. Adjust this Makefile + to point to the proper locations of Boost, Libxslt, Libxml2, + zlib, iconv, yaz and yazpp. + + + + DEBUG + + If set to 1, the software is + compiled with debugging libraries (code generation is + multi-threaded debug DLL). + If set to 0, the software is compiled with release libraries + (code generation is multi-threaded DLL). + + + + + BOOST + + + Boost install location + + + + + + BOOST_VERSION + + + Boost version (replace . with _). + + + + + + BOOST_TOOLSET + + + Boost toolset. + + + + + + LIBXSLT_DIR, + LIBXML2_DIR .. + + + Specify the locations of Libxslt, libiconv, libxml2 and + libxslt. + + + + + + + + After successful compilation you'll find + metaproxy.exe in the + bin directory. + +
-
- Future directions +
+ + + + YAZ Proxy Comparison + + The table below lists facilities supported by either + YAZ Proxy or Metaproxy. + + + Metaproxy / YAZ Proxy comparison + + + + Facility + Metaproxy + YAZ Proxy + + + + + Z39.50 server + Using filter + Supported + + + SRU server + Supported with filter + Supported + + + Z39.50 client + Supported with filter + Supported + + + SRU client + Supported with filter + Unsupported + + + Connection reuse + Supported with filter session_shared + Supported + + + Connection share + Supported with filter session_shared + Unsupported + + + Result set reuse + Supported with filter session_shared + Within one Z39.50 session / HTTP keep-alive + + + Record cache + Supported by filter session_shared + Supported for last result set within one Z39.50/HTTP-keep alive session + + + Z39.50 Virtual database, i.e. select any Z39.50 target for database + Supported with filter virt_db + Unsupported + + + SRU Virtual database, i.e. select any Z39.50 target for path + Supported with filter virt_db, + sru_z3950 + Supported + + + Multi target search + Supported with filter multi (round-robin) + Unsupported + + + Retrieval and search limits + Supported using filter limit + Supported + + + Bandwidth limits + Supported using filter limit + Supported + + + Connect limits + Supported by filter frontend_net (connect-max) + Supported + + + Retrieval sanity check and conversions + Supported using filter record_transform + Supported + + + Query check + + Supported by query_rewrite which may check + a query and throw diagnostics (errors) + + Supported + + + Query rewrite + Supported with query_rewrite + Unsupported + + + Session invalidate for -1 hits + Unsupported + Supported + + + Architecture + Multi-threaded + select (for networked modules such as + frontend_net) + Single-threaded using select + + + + Extensibility + Most functionality implemented as loadable modules + Unsupported and experimental + + + + USEMARCON + Supported with record_transform + Supported + 
+ + + Portability + + Requires YAZ, YAZ++ and modern C++ compiler supporting + Boost. + + + Requires YAZ and YAZ++. + STL is not required so pretty much any C++ compiler out there should work. + + + + + +
+
+ + + The Metaproxy Architecture - Some other filters that do not yet exist, but which would be - useful, are briefly described. These may be added in future - releases. + The Metaproxy architecture is based on three concepts: + the package, + the route + and the filter. - - frontend_cli (source) + Packages - Command-line interface for generating requests. + A package is a request or response, encoded in some protocol, + issued by a client, making its way through Metaproxy, sent to or + received from a server, or sent back to the client. - - - - srw2z3950 (filter) - - Translate SRW requests into Z39.50 requests. + The core of a package is the protocol unit - for example, a + Z39.50 Init Request or Search Response, or an SRU searchRetrieve + URL or Explain Response. In addition to this core, a package + also carries some extra information added and used by Metaproxy + itself. - - - - srw_client (sink) - - SRW searching and retrieval. + In general, packages are doctored as they pass through + Metaproxy. For example, when the proxy performs authentication + and authorization on a Z39.50 Init request, it removes the + authentication credentials from the package so that they are not + passed on to the back-end server; and when search-response + packages are obtained from multiple servers, they are merged + into a single unified package that makes its way back to the + client. - sru_client (sink) + Routes - SRU searching and retrieval. + Packages make their way through routes, which can be thought of + as programs that operate on the package data-type. Each + incoming package initially makes its way through a default + route, but may be switched to a different route based on various + considerations. Routes are made up of sequences of filters (see + below). - opensearch_client (sink) + Filters - A9 OpenSearch searching and retrieval. + Filters provide the individual instructions within a route, and + effect the necessary transformations on packages. 
A particular + configuration of Metaproxy is essentially a set of filters, + described by configuration details and arranged in order in one + or more routes. There are many kinds of filter - about a dozen + at the time of writing with more appearing all the time - each + performing a specific function and configured by different + information. + + + The word ``filter'' is sometimes used rather loosely, in two + different ways: it may be used to mean a particular + type of filter, as when we speak of ``the + auth_simple filter'' or ``the multi filter''; or it may be used + to mean a specific instance of a filter + within a Metaproxy configuration. For example, a single + configuration will often contain multiple instances of the + z3950_client filter. In + operational terms, each of these is a separate filter. In practice, + context always makes it clear which sense of the word ``filter'' + is being used. + + + Extensibility of Metaproxy is primarily through the creation of + plugins that provide new filters. The filter API is small and + conceptually simple, but there are many details to master. See + the section below on + Filters. -
+ + Since packages are created and handled by the system itself, and + routes are conceptually simple, most of the remainder of this + document concentrates on filters. A brief overview of the + filter types follows, along with some thoughts on possible future + directions. + - - Configuration: the Metaproxy configuration file format + + Filters -
- Introductory notes - - If Metaproxy is an interpreter providing operations on packages, then - its configuration file can be thought of as a program for that - interpreter. Configuration is by means of a single file, the name - of which is supplied as the sole command-line argument to the - yp2 program. - - - The configuration files are written in XML. (But that's just an - implementation detail - they could just as well have been written - in YAML or Lisp-like S-expressions, or in a custom syntax.) - - - Since XML has been chosen, an XML schema, - config.xsd, is provided for validating - configuration files. This file is supplied in the - etc directory of the Metaproxy distribution. It - can be used by (among other tools) the xmllint - program supplied as part of the libxml2 - distribution: - - - xmllint --noout --schema etc/config.xsd my-config-file.xml - - - (A recent version of libxml2 is required, as - support for XML Schemas is a relatively recent addition.) - +
+ Introductory notes + + It's useful to think of Metaproxy as an interpreter providing a small + number of primitives and operations, but operating on a very + complex data type, namely the ``package''. + + + A package represents a Z39.50 or SRU/W request (whether for Init, + Search, Scan, etc.) together with information about where it came + from. Packages are created by front-end filters such as + frontend_net (see below), which reads them from + the network; other front-end filters are possible. They then pass + along a route consisting of a sequence of filters, each of which + transforms the package and may also have side-effects such as + generating logging. Eventually, the route will yield a response, + which is sent back to the origin. + + + There are many kinds of filter: some that are defined statically + as part of Metaproxy, and others that may be provided by third parties + and dynamically loaded. They all conform to the same simple API + of essentially two methods: configure() is + called at startup time, and is passed an XML DOM tree representing that + part of the configuration file that pertains to this filter + instance: it is expected to walk that tree extracting relevant + information; and process() is called every + time the filter has to process a package. + + + While all filters provide the same API, there are different modes + of functionality. Some filters are sources: they create + packages + (frontend_net); + others are sinks: they consume packages and return a result + (backend_test, + bounce, + http_file, + z3950_client); + the others are true filters, that read, process and pass on the + packages they are fed + (auth_simple, + log, + multi, + query_rewrite, + record_transform, + session_shared, + sru_z3950, + template, + virt_db). +
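The source / filter / sink distinction described above can be pictured as a single route in a configuration file. The following is a minimal sketch only, not taken from this manual: the top-level element name, its namespace, the id values and the child elements of each filter (port, message) are assumptions made for illustration; only the filter type names and the start / filters / routes layout come from the text in this diff.

```xml
<?xml version="1.0"?>
<!-- Sketch only. The root element name and namespace are assumptions. -->
<metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0">
  <!-- Name of the route at which processing starts -->
  <start route="start"/>
  <filters>
    <!-- Source: creates packages from network input.
         The port element and its value are assumed for illustration. -->
    <filter id="frontend" type="frontend_net">
      <port>@:9000</port>
    </filter>
  </filters>
  <routes>
    <route id="start">
      <!-- Reference to the source defined above -->
      <filter refid="frontend"/>
      <!-- True filter: reads the package, logs it, passes it on.
           The message element is assumed for illustration. -->
      <filter type="log">
        <message>log</message>
      </filter>
      <!-- Sink: consumes the package and returns a dummy response -->
      <filter type="backend_test"/>
    </route>
  </routes>
</metaproxy>
```

Read from top to bottom, the route is the order in which a package visits the filters; the response travels back along the same path to the source, which returns it to the origin.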
-
- Overview of XML structure - - All elements and attributes are in the namespace - . - This is most easily achieved by setting the default namespace on - the top-level element, as here: - - - <yp2 xmlns="http://indexdata.dk/yp2/config/1"> - - - The top-level element is <yp2>. This contains a - <start> element, a <filters> element and a - <routes> element, in that order. <filters> is - optional; the other two are mandatory. All three are - non-repeatable. - - - The <start> element is empty, but carries a - route attribute, whose value is the name of - route at which to start running - analogouse to the name of the - start production in a formal grammar. - - - If present, <filters> contains zero or more <filter> - elements; filters carry a type attribute and - contain various elements that provide suitable configuration for - filters of that type. The filter-specific elements are described - below. Filters defined in this part of the file must carry an - id attribute so that they can be referenced - from elsewhere. - - - <routes> contains one or more <route> elements, each - of which must carry an id element. One of the - routes must have the ID value that was specified as the start - route in the <start> element's route - attribute. Each route contains zero or more <filter> - elements. These are of two types. They may be empty, but carry a - refid attribute whose value is the same as the - id of a filter previously defined in the - <filters> section. Alternatively, a route within a filter - may omit the refid attribute, but contain - configuration elements similar to those used for filters defined - in the <filters> section. - -
+
+ Overview of filter types + + We now briefly consider each of the types of filter supported by + the core Metaproxy binary. This overview is intended to give a + flavor of the available functionality; more detailed information + about each type of filter is included below in + . + + + The filters are here named by the string that is used as the + type attribute of a + <filter> element in the configuration + file to request them, with the name of the class that implements + them in parentheses. (The classname is not needed for normal + configuration and use of Metaproxy; it is useful only to + developers.) + + + The filters are here listed in alphabetical order: + -
- Filter configuration - - All <filter> elements have in common that they must carry a - type attribute whose value is one of the - supported ones, listed in the schema file and discussed below. In - additional, <filters>s occurring the <filters> section - must have an id attribute, and those occurring - within a route must have either a refid - attribute referencing a previously defined filter or contain its - own configuration information. - - - In general, each filter recognises different configuration - elements within its element, as each filter has different - functionality. These are as follows: - + -
- <literal>template</literal> - - <filter type="template"/> - -
+
+ <literal>auth_simple</literal> + (mp::filter::AuthSimple) + + Simple authentication and authorization. The configuration + specifies the name of a file that is the user register, which + lists username:password + pairs, one per line, colon separated. When a session begins, it + is rejected unless username and password are supplied, and match + a pair in the register. The configuration file may also specify + the name of another file that is the target register: this + lists username:dbname,dbname... + sets, one per line, with multiple database names separated by + commas. When a search is processed, it is rejected unless the + database to be searched is one of those listed as available to + the user. + +
+ +
+ <literal>backend_test</literal> + (mp::filter::Backend_test) + + A partial sink that provides dummy responses in the manner of the + yaz-ztest Z39.50 server. This is useful only + for testing. Seriously, you don't need this. Pretend you didn't + even read this section. + +
+ +
+ <literal>bounce</literal> + (mp::filter::Bounce) + + A sink that swallows all packages, + and returns them almost unprocessed. + It never sends any package of any type further down the route, but + turns Z39.50 packages into Z_Close packages and HTTP_Request packages into + HTTP_Response packages with error code 400, and adds a suitable bounce + message. + The bounce filter is usually added at the end of each route + to prevent, for example, HTTP + request packages from hanging indefinitely when the Z39.50 client partial sink + filter is the only sink in the + route. + +
+ +
+ <literal>cql_rpn</literal> + (mp::filter::CQLtoRPN) + + A query language transforming filter which catches Z39.50 + searchRequest + packages containing CQL queries, transforms + those to RPN queries, + and sends the searchRequests on to the next + filters. Among other things, it is useful in an SRU context. + +
+ +
+ <literal>frontend_net</literal> + (mp::filter::FrontendNet) + + A source that accepts Z39.50 connections from a port + specified in the configuration, reads protocol units, and + feeds them into the next filter in the route. When the result is + received, it is returned to the origin. + +
+ +
+ <literal>http_file</literal> + (mp::filter::HttpFile) + + A partial sink which swallows only + HTTP_Request packages, and + returns the contents of files from the local + filesystem in response to HTTP requests. + It lets Z39.50 packages and all other forthcoming package types + pass untouched. + (Yes, Virginia, this + does mean that Metaproxy is also a Web-server in its spare time. So + far it does not contain either an email-reader or a Lisp + interpreter, but that day is surely coming.) + +
+ +
+ <literal>load_balance</literal> + (mp::filter::LoadBalance) + + Performs load balancing for incoming Z39.50 init requests. + It is used together with the virt_db filter, + but unlike the multi filter it sends an + entire session to only one of the virtual backends. The + load_balance filter assumes that + all backend targets have equal content, and chooses the backend + with the least load cost for a new session. + + + This filter is experimental and not yet mature enough for heavy-load + production sites. + + + +
+ +
+ <literal>log</literal> + (mp::filter::Log) + + Writes logging information to standard output, and passes on + the package unchanged. A log file name can be specified, as well + as multiple different logging formats. + +
-
- <literal>virt_db</literal> - - <filter type="virt_db"> - <virtual> - <database>loc</database> - <target>z3950.loc.gov:7090/voyager</target> - </virtual> - <virtual> - <database>idgils</database> - <target>indexdata.dk/gils</target> - </virtual> - </filter> - +
+ <literal>multi</literal> + (mp::filter::Multi) + + Performs multi-database searching. + See + the extended discussion + of virtual databases and multi-database searching below. + +
+ +
+ <literal>query_rewrite</literal> + (mp::filter::QueryRewrite) + + Rewrites Z39.50 Type-1 + and Type-101 (``RPN'') + queries by a + three-step process: the query is transliterated from Z39.50 + packet structures into an XML representation; that XML + representation is transformed by an XSLT stylesheet; and the + resulting XML is transliterated back into the Z39.50 packet + structure. + +
+ + +
+ <literal>record_transform</literal> + (mp::filter::RecordTransform) + + This filter acts only on Z39.50 present requests, and lets all + other types of packages and requests pass untouched. Its use is + twofold: blocking Z39.50 present requests that the backend + server does not understand and cannot honor, and transforming + the present syntax and elementset name according to the rules + specified, to fetch only existing record formats, and transform + them on the fly to requested record syntaxes. + +
+ +
+ <literal>session_shared</literal> + (mp::filter::SessionShared) + + This filter implements global sharing of + result sets (i.e. between threads and therefore between + clients), yielding performance improvements by clever resource + pooling. + +
+ +
+ <literal>sru_z3950</literal> + (mp::filter::SRUtoZ3950) + + This filter transforms valid + SRU GET/POST/SOAP searchRetrieve requests to Z39.50 init, search, + and present requests, and wraps the + received hit counts and XML records into suitable SRU response + messages. + The sru_z3950 filter also processes SRU + GET/POST/SOAP explain requests, returning + either the absolute minimum required by the standard, or a full + pre-defined ZeeReX explain record. + See the + ZeeReX Explain + standard pages and the + SRU Explain pages + for more information on the correct explain syntax. + SRU scan requests are not supported yet. + +
+ +
+ <literal>template</literal> + (mp::filter::Template) + + Does nothing at all, merely passing the packet on. (Maybe it + should be called nop or + passthrough?) This exists not to be used, but + to be copied - to become the skeleton of new filters as they are + written. As with backend_test, this is not + intended for civilians. + +
+ +
+ <literal>virt_db</literal> + (mp::filter::VirtualDB) + + Performs virtual database selection: based on the name of the + database in the search request, a server is selected, and its + address added to the request in a VAL_PROXY + otherInfo packet. It will subsequently be used by a + z3950_client filter. + See + the extended discussion + of virtual databases and multi-database searching below. + +
+ +
+ <literal>z3950_client</literal> + (mp::filter::Z3950Client) + + A partial sink which swallows only Z39.50 packages. + It performs Z39.50 searching and retrieval by proxying the + packages that are passed to it. Init requests are sent to the + address specified in the VAL_PROXY otherInfo + attached to the request: this may have been specified by client, + or generated by a virt_db filter earlier in + the route. Subsequent requests are sent to the same address, + which is remembered at Init time in a Session object. + HTTP_Request packages and all other forthcoming package types + are passed untouched. +
-
- <literal>z3950_client</literal> - - <filter type="z3950_client"> - <timeout>30</timeout> - </filter> - -
-
- +
+ <literal>zeerex_explain</literal> + (mp::filter::ZeerexExplain) + + This filter acts as a sink for + Z39.50 explain requests, returning a static ZeeReX + Explain XML record from the config section. All other packages + are passed through. + See the + ZeeReX Explain + standard pages + for more information on the correct explain syntax. + + + + This filter is not yet completed. + + +
- - Virtual database as multi-database searching +
-
- Introductory notes - - Two of Metaproxy's filters are concerned with multiple-database - operations. Of these, virt_db can work alone - to control the routing of searches to one of a number of servers, - while multi can work with the output of - virt_db to perform multicast searching, merging - the results into a unified result-set. The interaction between - these two filters is necessarily complex, reflecting the real - complexity of multicast searching in a protocol such as Z39.50 - that separates initialisation from searching, with the database to - search known only during the latter operation. - +
+ Future directions - ### Much, much more to say! - -
+ Some other filters that do not yet exist, but which would be + useful, are briefly described. These may be added in future + releases (or may be created by third parties, as loadable + modules). + + + + + frontend_cli (source) + + + Command-line interface for generating requests. + + + + + sru_client (sink) + + + SRU/GET and SRU/SOAP searching and retrieval. + + + + + opensearch_client (sink) + + + A9 OpenSearch searching and retrieval. + + + + +
- - Classes in the Metaproxy source code + + Configuration: the Metaproxy configuration file format -
- Introductory notes - - Stop! Do not read this! - You won't enjoy it at all. - - - This chapter contains documentation of the Metaproxy source code, and is - of interest only to maintainers and developers. If you need to - change Metaproxy's behaviour or write a new filter, then you will most - likely find this chapter helpful. Otherwise it's a waste of your - good time. Seriously: go and watch a film or something. - This is Spinal Tap is particularly good. - - - Still here? OK, let's continue. - - - In general, classes seem to be named big-endianly, so that - FactoryFilter is not a filter that filters - factories, but a factory that produces filters; and - FactoryStatic is a factory for the statically - registered filters (as opposed to those that are dynamically - loaded). - -
+
+ Introductory notes + + If Metaproxy is an interpreter providing operations on packages, then + its configuration file can be thought of as a program for that + interpreter. Configuration is by means of a single XML file, the name + of which is supplied as the sole command-line argument to the + metaproxy program. (See + below for more information on invoking + Metaproxy.) + +
-
- Individual classes +
+ Overview of the config file XML structure + + All elements and attributes are in the namespace + http://indexdata.com/metaproxy. + This is most easily achieved by setting the default namespace on + the top-level element, as here: + + + <metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0"> + + + The top-level element is <metaproxy>. This contains + a <dlpath> element, + a <start> element, + a <filters> element and + a <routes> element, in that order. <dlpath> and + <filters> are optional; the other two are mandatory. + All four are non-repeatable. + + + The <dlpath> element contains text that + specifies the location of filter modules. This is only needed + if Metaproxy must load third-party filters (most filters shipped with Metaproxy + are built into the Metaproxy application). + - The classes making up the Metaproxy application are here listed by - class-name, with the names of the source files that define them in - parentheses. - - 
- <literal>mp::FactoryFilter</literal> - (<filename>factory_filter.cpp</filename>) + The <start> element is empty, but carries a + route attribute, whose value is the name of + the route at which to start running - analogous to the name of the + start production in a formal grammar. + + + If present, <filters> contains zero or more <filter> + elements. Each filter carries a type attribute + which specifies what kind of filter is being defined + (frontend_net, log, etc.) + and contains various elements that provide suitable configuration + for a filter of its type. The filter-specific elements are + described in + . + Filters defined in this part of the file must carry an + id attribute so that they can be referenced + from elsewhere. + - A factory class that exists primarily to provide the - create() method, which takes the name of a - filter class as its argument and returns a new filter of that - type. To enable this, the factory must first be populated by - calling add_creator() for static filters (this - is done by the FactoryStatic class, see below) - and add_creator_dyn() for filters loaded - dynamically. + <routes> contains one or more <route> elements, each + of which must carry an id attribute. One of the + routes must have the ID value that was specified as the start + route in the <start> element's route + attribute. Each route contains zero or more <filter> + elements. These are of two types. They may be empty, but carry a + refid attribute whose value is the same as the + id of a filter previously defined in the + <filters> section. Alternatively, a filter within a route + may omit the refid attribute, but contain + configuration elements similar to those used for filters defined + in the <filters> section. (In other words, each filter in a + route may be included either by reference or by physical + inclusion.)
-
- <literal>mp::FactoryStatic</literal> - (<filename>factory_static.cpp</filename>) + +
+ An example configuration - A subclass of FactoryFilter which is - responsible for registering all the statically defined filter - types. It does this by knowing about all those filters' - structures, which are listed in its constructor. Merely - instantiating this class registers all the static classes. It is - for the benefit of this class that struct - yp2_filter_struct exists, and that all the filter - classes provide a static object of that type. + The following is a small, but complete, Metaproxy configuration + file (included in the distribution as + metaproxy/etc/config1.xml). + This file defines a very simple configuration that simply proxies + to whatever back-end server the client requests, but logs each + request and response. This can be useful for debugging complex + client-server dialogues. + + + + /usr/lib/metaproxy/modules + + + + @:9000 + + + + + + + + + + + + + +]]> + + It works by defining a single route, called + start, which consists of a sequence of four + filters. The first and last of these are included by reference: + their <filter> elements have + refid attributes that refer to filters defined + within the prior <filters> section. The + middle filter is included inline in the route. + + + The four filters in the route are as follows: first, a + frontend_net filter accepts Z39.50 requests + from any host on port 9000; then these requests are passed through + a log filter that emits a message for each + request; they are then fed into a z3950_client + filter, which forwards all Z39.50 requests to the client-specified + back-end Z39.50 server. Those Z39.50 packages are returned by the + z3950_client filter, with the response data + filled in by the external Z39.50 server targeted. + All non-Z39.50 packages are passed through to the + bounce filter, which definitely bounces + everything, including fish, bananas, cold pyjamas, + mutton, beef and trout packages. 
+ When the response arrives, it is handed + back to the log filter, which emits another + message; and then to the frontend_net filter, + which returns the response to the client.
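For orientation, a configuration matching the description above could be sketched as follows. This is a reconstruction under assumptions, not a copy of the distributed file: the filter ids frontend and backend are hypothetical, the <port> element name is assumed, and the choice of which filters are referenced and which are written inline is only illustrative.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the ids and the <port> element name are assumptions. -->
<metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0">
  <dlpath>/usr/lib/metaproxy/modules</dlpath>
  <start route="start"/>
  <filters>
    <!-- Filters defined here carry an id so routes can refer to them. -->
    <filter id="frontend" type="frontend_net">
      <port>@:9000</port>
    </filter>
    <filter id="backend" type="z3950_client"/>
  </filters>
  <routes>
    <route id="start">
      <filter refid="frontend"/>   <!-- included by reference -->
      <filter type="log"/>         <!-- included inline -->
      <filter refid="backend"/>
      <filter type="bounce"/>
    </route>
  </routes>
</metaproxy>
```

The route id matches the route attribute of <start>, which is what makes this route the entry point.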
-
- <literal>mp::filter::Base</literal> - (<filename>filter.cpp</filename>) - - The virtual base class of all filters. The filter API is, on the - surface at least, extremely simple: two methods. - configure() is passed a DOM tree representing - that part of the configuration file that pertains to this filter - instance, and is expected to walk that tree extracting relevant - information. And process() processes a - package (see below). That surface simplicitly is a bit - misleading, as process() needs to know a lot - about the Package class in order to do - anything useful. - +
+ Config file modularity + + Metaproxy XML configuration snippets can be reused by other + filters using the XInclude standard, as seen in + the /etc/config-sru-to-z3950.xml example SRU + configuration. + + + + + +]]> +
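As an illustration of the mechanism, an XInclude element pulls a separately maintained XML fragment into a filter's configuration. In this sketch the xi:include element and its namespace come from the XInclude standard; the <database> wrapper, its name attribute and the file name explain.xml are hypothetical and may not match the shipped example.

```xml
<filter type="sru_z3950">
  <!-- Hypothetical wrapper element and file name, for illustration only. -->
  <database name="Default">
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
                href="explain.xml"/>
  </database>
</filter>
```

When checking such a file with xmllint, the --xinclude option makes it process the inclusions before validating.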
-
- <literal>mp::filter::AuthSimple</literal>, - <literal>Backend_test</literal>, etc. - (<filename>filter_auth_simple.cpp</filename>, - <filename>filter_backend_test.cpp</filename>, etc.) +
+ Config file syntax checking - Individual filters. Each of these is implemented by a header and - a source file, named filter_*.hpp and - filter_*.cpp respectively. All the header - files should be pretty much identical, in that they declare the - class, including a private Rep class and a - member pointer to it, and the two public methods. The only extra - information in any filter header is additional private types and - members (which should really all be in the Rep - anyway) and private methods (which should also remain known only - to the source file, but C++'s brain-damaged design requires this - dirty laundry to be exhibited in public. Thanks, Bjarne!) + The distribution contains RelaxNG Compact and XML syntax checking + files, as well as XML Schema files. These are found in the + distribution paths + + xml/schema/metaproxy.rnc + xml/schema/metaproxy.rng + xml/schema/metaproxy.xsd + + and can be used to verify or debug the XML structure of + configuration files. For example, using the utility + xmllint, syntax checking is done like this: + + xmllint --noout --schema xml/schema/metaproxy.xsd etc/config-local.xml + xmllint --noout --relaxng xml/schema/metaproxy.rng etc/config-local.xml + + (A recent version of libxml2 is required, as + support for XML Schemas is a relatively recent addition.) - The source file for each filter needs to supply: + You can of course use any other RelaxNG or XML Schema compliant tool + you wish. - - - - A definition of the private Rep class. - - - - - Some boilerplate constructors and destructors. - - - - - A configure() method that uses the - appropriate XML fragment. - - - - - Most important, the process() method that - does all the actual work. - - - -
+
+ + + -
- <literal>mp::Package</literal> - (<filename>package.cpp</filename>) + + Virtual databases and multi-database searching + + +
+ Introductory notes - Represents a package on its way through the series of filters - that make up a route. This is essentially a Z39.50 or SRU APDU - together with information about where it came from, which is - modified as it passes through the various filters. + Two of Metaproxy's filters are concerned with multiple-database + operations. Of these, virt_db can work alone + to control the routing of searches to one of a number of servers, + while multi can work together with + virt_db to perform multi-database searching, merging + the results into a unified result-set - ``metasearch in a box''. -
- -
- <literal>mp::Pipe</literal> - (<filename>pipe.cpp</filename>) - This class provides a compatibility layer so that we have an IPC - mechanism that works the same under Unix and Windows. It's not - particularly exciting. + The interaction between + these two filters is necessarily complex: it reflects the real, + irreducible complexity of multi-database searching in a protocol such + as Z39.50 that separates initialization from searching, and in + which the database to be searched is not known at initialization + time. + + + It's possible to use these filters without understanding the + details of their functioning and the interaction between them; the + next two sections of this chapter are ``HOW-TO'' guides for doing + just that. However, debugging complex configurations will require + a deeper understanding, which the last two sections of this + chapter attempt to provide.
-
- <literal>mp::RouterChain</literal> - (<filename>router_chain.cpp</filename>) + +
+ Virtual databases with the <literal>virt_db</literal> filter + + Working alone, the + virt_db + filter routes search requests to one of a selection of + back-end databases. In this way, a single Z39.50 endpoint + (running Metaproxy) can provide access to several different + underlying services, including those that would otherwise be + inaccessible due to firewalls. In many useful configurations, the + back-end databases are local to the Metaproxy installation, but + the software does not enforce this, and any valid Z39.50 servers + may be used as back-ends. + - ### + For example, a virt_db + filter could be set up so that searches in the virtual database + ``lc'' are forwarded to the Library of Congress bibliographic + catalogue server, and searches in the virtual database ``marc'' + are forwarded to the toy database of MARC records that Index Data + hosts for testing purposes. A virt_db + configuration to make this switch would look like this: + + + + lc + z3950.loc.gov:7090/voyager + + + marc + indexdata.com/marc + +]]> + + As well as being useful in its own right, this filter also provides + the foundation for multi-database searching.
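Written out with its element tags, a virt_db configuration for these two virtual databases might look as follows. The <virtual>, <database> and <target> element names follow the older virt_db example that this diff removes; treat the fragment as a sketch rather than as the exact listing from the manual.

```xml
<filter type="virt_db">
  <!-- Each <virtual> maps one virtual database name to a back-end target. -->
  <virtual>
    <database>lc</database>
    <target>z3950.loc.gov:7090/voyager</target>
  </virtual>
  <virtual>
    <database>marc</database>
    <target>indexdata.com/marc</target>
  </virtual>
</filter>
```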
-
- <literal>mp::RouterFleXML</literal> - (<filename>router_flexml.cpp</filename>) + +
+ Multi-database search with the <literal>multi</literal> filter + + To arrange for Metaproxy to broadcast searches to multiple back-end + servers, the configuration needs to include two components: a + virt_db + filter that specifies multiple + <target> + elements, and a subsequent + multi + filter. Here, for example, is a complete configuration that + broadcasts searches to both the Library of Congress catalogue and + Index Data's tiny testing database of MARC records: + + + + + + + + 10 + @:9000 + + + + lc + z3950.loc.gov:7090/voyager + + + marc + indexdata.com/marc + + + all + z3950.loc.gov:7090/voyager + indexdata.com/marc + + + + + 30 + + + + +]]> + + (Using a + virt_db + filter that specifies multiple + <target> + elements but without a subsequent + multi + filter yields surprising and undesirable results, as will be + described below. Don't do that.) + + + Metaproxy can be invoked with this configuration as follows: + + ../src/metaproxy --config config-simple-multi.xml + + And thereafter, Z39.50 clients can connect to the running server + (on port 9000, as specified in the configuration) and search in + any of the databases + lc (the Library of Congress catalogue), + marc (Index Data's test database of MARC records) + or + all (both of these). As an example, a session + using the YAZ command-line client yaz-client is + here included (edited for brevity and clarity): + + base lc +Z> find computer +Search was a success. +Number of hits: 10000, setno 1 +Elapsed: 5.521070 +Z> base marc +Z> find computer +Search was a success. +Number of hits: 10, setno 3 +Elapsed: 0.060187 +Z> base all +Z> find computer +Search was a success. +Number of hits: 10010, setno 4 +Elapsed: 2.237648 +Z> show 1 +[marc]Record type: USmarc +001 11224466 +003 DLC +005 00000000000000.0 +008 910710c19910701nju 00010 eng +010 $a 11224466 +040 $a DLC $c DLC +050 00 $a 123-xyz +100 10 $a Jack Collins +245 10 $a How to program a computer +260 1 $a Penguin +263 $a 8710 +300 $a p. cm. 
+Elapsed: 0.119612 +Z> show 2 +[VOYAGER]Record type: USmarc +001 13339105 +005 20041229102447.0 +008 030910s2004 caua 000 0 eng +035 $a (DLC) 2003112666 +906 $a 7 $b cbc $c orignew $d 4 $e epcn $f 20 $g y-gencatlg +925 0 $a acquire $b 1 shelf copy $x policy default +955 $a pc10 2003-09-10 $a pv12 2004-06-23 to SSCD; $h sj05 2004-11-30 $e sj05 2004-11-30 to Shelf. +010 $a 2003112666 +020 $a 0761542892 +040 $a DLC $c DLC $d DLC +050 00 $a MLCM 2004/03312 (G) +245 10 $a 007, everything or nothing : $b Prima's official strategy guide / $c created by Kaizen Media Group. +246 3 $a Double-O-seven, everything or nothing +246 30 $a Prima's official strategy guide +260 $a Roseville, CA : $b Prima Games, $c c2004. +300 $a 161 p. : $b col. ill. ; $c 28 cm. +500 $a "Platforms: Nintendo GameCube, Macintosh, PC, PlayStation 2 computer entertainment system, Xbox"--P. [4] of cover. +650 0 $a Video games. +710 2 $a Kaizen Media Group. +856 42 $3 Publisher description $u http://www.loc.gov/catdir/description/random052/2003112666.html +Elapsed: 0.150623 +Z> +]]> + + As can be seen, the first record in the result set is from the + Index Data test database, and the second from the Library of + Congress database. The result-set continues alternating records + round-robin style until the point where one of the databases' + records are exhausted. + + + This example uses only two back-end databases; more may be used. + There is no limitation imposed on the number of databases that may + be metasearched in this way: issues of resource usage and + administrative complexity dictate the practical limits. + - ### + What happens when one of the databases doesn't respond? By default, + the entire multi-database search fails, and the appropriate + diagnostic is returned to the client. 
This is usually appropriate + during development, when technicians need maximum information, but + can be inconvenient in deployment, when users typically don't want + to be bothered with problems of this kind and prefer just to get + the records from the databases that are available. To obtain this + latter behavior add an empty + <hideunavailable> + element inside the + multi filter: + + + + ]]> + + Under this regime, an error is reported to the client only if + all the databases in a multi-database search + are unavailable.
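Put together, the relevant part of a route that tolerates unavailable databases could be sketched like this. The element names are taken from the text and from the older examples in this diff; the route id and the timeout value are illustrative, and the leading frontend_net filter is omitted for brevity.

```xml
<route id="start">
  <!-- A frontend_net filter would come first; omitted in this sketch. -->
  <filter type="virt_db">
    <virtual>
      <database>all</database>
      <target>z3950.loc.gov:7090/voyager</target>
      <target>indexdata.com/marc</target>
    </virtual>
  </filter>
  <filter type="multi">
    <!-- Report an error only if every target is unavailable. -->
    <hideunavailable/>
  </filter>
  <filter type="z3950_client">
    <timeout>30</timeout>
  </filter>
</route>
```

The multi filter sits between virt_db and z3950_client, so each copy of the search reaches the client filter with a single target.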
-
- <literal>mp::Session</literal> - (<filename>session.cpp</filename>) + +
+ What's going on? + + Lark's vomit + + This section goes into a level of technical detail that is + probably not necessary in order to configure and use Metaproxy. + It is provided only for those who like to know how things work. + You should feel free to skip on to the next section if this one + doesn't seem like fun. + + + + Hold on tight - this may get a little hairy. + - ### + In the general course of things, a Z39.50 Init request may carry + with it an otherInfo packet of type VAL_PROXY, + whose value indicates the address of a Z39.50 server to which the + ultimate connection is to be made. (This otherInfo packet is + supported by YAZ-based Z39.50 clients and servers, but has not yet + been ratified by the Maintenance Agency and so is not widely used + in non-Index Data software. We're working on it.) + The VAL_PROXY packet functions + analogously to the absoluteURI-style Request-URI used with the GET + method when a web browser asks a proxy to forward its request: see + the + Request-URI + section of + the HTTP 1.1 specification. + + Within Metaproxy, Search requests that are part of the same + session as an Init request that carries a + VAL_PROXY otherInfo are also annotated with the + same information. The role of the virt_db + filter is to rewrite this otherInfo packet dependent on the + virtual database that the client wants to search. + + + When Metaproxy receives a Z39.50 Init request from a client, it + doesn't immediately forward that request to the back-end server. + Why not? Because it doesn't know which + back-end server to forward it to until the client sends a Search + request that specifies the database that it wants to search in. + Instead, it just treasures the Init request up in its heart; and, + later, the first time the client does a search on one of the + specified virtual databases, a connection is forged to the + appropriate server and the Init request is forwarded to it. 
If, + later in the session, the same client searches in a different + virtual database, then a connection is forged to the server that + hosts it, and the same cached Init request is forwarded there, + too. + + + All of this clever Init-delaying is done by the + frontend_net filter. The + virt_db filter knows nothing about it; in + fact, because the Init request that is received from the client + doesn't get forwarded until a Search request is received, the + virt_db filter (and the + z3950_client filter behind it) doesn't even get + invoked at Init time. The only thing that a + virt_db filter ever does is rewrite the + VAL_PROXY otherInfo in the requests that pass + through it. + + + It is possible for a virt_db filter to contain + multiple + <target> + elements. What does this mean? Only that the filter will add + multiple VAL_PROXY otherInfo packets to the + Search requests that pass through it. That's because the virtual + DB filter is dumb, and does exactly what it's told - no more, no + less. + If a Search request with multiple VAL_PROXY + otherInfo packets reaches a z3950_client + filter, this is an error. That filter doesn't know how to deal + with multiple targets, so it will either just pick one and search + in it, or (better) fail with an error message. + + + The multi filter comes to the rescue! This is + the only filter that knows how to deal with multiple + VAL_PROXY otherInfo packets, and it does so by + making multiple copies of the entire Search request: one for each + VAL_PROXY. Each of these new copies is then + passed down through the remaining filters in the route. (The + copies are handled in parallel through the + spawning of new threads.) Since the copies each have only one + VAL_PROXY otherInfo, they can be handled by the + z3950_client filter, which happily deals with + each one individually. 
When the results of the individual + searches come back up to the multi filter, it + merges them into a single Search response, which is what + eventually makes it back to the client. + + + + + + + + + + + + + [Here there should be a diagram showing the progress of + packages through the filters during a simple virtual-database + search and a multi-database search, but it seems that your + tool chain has not been able to include the diagram in this + document.] + + + + A picture is worth a thousand words (but only five hundred on 64-bit architectures) + +
+ + + + + Combined SRU webservice and Z39.50 server configuration + + Metaproxy can act as an + SRU and + web service server, which translates web service requests to + ANSI/NISO Z39.50 packages and + sends them off to commonly available targets. + + + A typical setup for this operation needs a filter route including the + following modules: + + + + SRU/Z39.50 Server Filter Route Configuration + + + + Filter + Importance + Purpose + + + + + + frontend_net + required + Accepting HTTP connections and passing them to the following + filters. Since this filter also accepts Z39.50 connections, the + server works as an SRU and Z39.50 server on the same port. + + + sru_z3950 + required + Accepting SRU GET/POST/SOAP explain and + searchRetrieve requests for the configured databases. + Explain requests are directly served from the static XML configuration. + SearchRetrieve requests are + transformed to Z39.50 search and present packages. + All other HTTP and Z39.50 packages are passed unaltered. + + + http_file + optional + Serving HTTP requests from the filesystem. This is only + needed if the server should serve XSLT stylesheets, static HTML + files or JavaScript for thin browser-based clients. + Z39.50 packages are passed unaltered. + + + cql_rpn + required + Usually, Z39.50 servers do not talk CQL, hence the + translation of the CQL query language to RPN is mandatory in + most cases. Affects only Z39.50 search packages. + + + record_transform + optional + Some Z39.50 backend targets cannot present XML record + syntaxes in commonly wanted element sets. Using this filter, one + can transform binary MARC records to MARCXML records, and + further transform those to any needed XML schema/format by XSLT + transformations. Changes only Z39.50 present packages. + + + session_shared + optional + The stateless nature of web services requires frequent + re-searching of the same targets for display of paged result set + records. 
This might be an unacceptable burden for the accessed + backend Z39.50 targets, and this module can be added for + efficient backend target resource pooling. + + + z3950_client + required + Finally, a Z39.50 package sink is needed in the filter + chain to provide the response packages. The Z39.50 client module + is used to access external targets over the network, but any + future local Z39.50 package sink could be used instead. + + + bounce + required + Any Metaproxy package arriving here did not do so on + purpose, and is bounced back with connection closure. This + prevents packages from hanging indefinitely inside the SRU server. + + + +
+ + A typical minimal example SRU + server configuration file is found in the tarball distribution at + etc/config-sru-to-z3950.xml. + + + Of course, any other Metaproxy modules can be integrated into an + SRU server solution, including, but not limited to, load balancing, + multiple target querying + (see the chapter on virtual databases and multi-database searching), and complex RPN query rewrites. + + + +
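The filter chain from the table in this chapter can be sketched as a single route. This assumes that the table lists the filters in route order. All refid values are hypothetical names for filters that would have to be defined, with their own configuration, in the <filters> section; it is not the content of the shipped example file.

```xml
<routes>
  <route id="start">
    <filter refid="frontend"/>    <!-- frontend_net: HTTP and Z39.50 -->
    <filter refid="sru"/>         <!-- sru_z3950: SRU to Z39.50 -->
    <filter refid="files"/>       <!-- http_file (optional) -->
    <filter refid="cql"/>         <!-- cql_rpn: CQL to RPN -->
    <filter refid="transform"/>   <!-- record_transform (optional) -->
    <filter refid="shared"/>      <!-- session_shared (optional) -->
    <filter refid="backend"/>     <!-- z3950_client: Z39.50 sink -->
    <filter type="bounce"/>       <!-- closes anything left over -->
  </route>
</routes>
```

Dropping the three optional filters leaves the minimal chain that the table marks as required.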
+ + + + -
- <literal>mp::ThreadPoolSocketObserver</literal> - (<filename>thread_pool_observer.cpp</filename>) + + Classes in the Metaproxy source code + + +
+ Introductory notes + + Stop! Do not read this! + You won't enjoy it at all. You should just skip ahead to + , + which tells + + you things you really need to know, like the fact that the + fabulously beautiful planet Bethselamin is now so worried about + the cumulative erosion by ten billion visiting tourists a year + that any net imbalance between the amount you eat and the amount + you excrete whilst on the planet is surgically removed from your + bodyweight when you leave: so every time you go to the lavatory it + is vitally important to get a receipt. + - ### + This chapter contains documentation of the Metaproxy source code, and is + of interest only to maintainers and developers. If you need to + change Metaproxy's behavior or write a new filter, then you will most + likely find this chapter helpful. Otherwise it's a waste of your + good time. Seriously: go and watch a film or something. + This is Spinal Tap is particularly good. + + + Still here? OK, let's continue. + + + In general, classes seem to be named big-endianly, so that + FactoryFilter is not a filter that filters + factories, but a factory that produces filters; and + FactoryStatic is a factory for the statically + registered filters (as opposed to those that are dynamically + loaded).
-
- <literal>mp::util</literal> - (<filename>util.cpp</filename>) +
+ Individual classes - A namespace of various small utility functions and classes, - collected together for convenience. Most importantly, includes - the mp::util::odr class, a wrapper for YAZ's - ODR facilities. + The classes making up the Metaproxy application are here listed by + class-name, with the names of the source files that define them in + parentheses. + +
+ <literal>mp::FactoryFilter</literal> + (<filename>factory_filter.cpp</filename>) + + A factory class that exists primarily to provide the + create() method, which takes the name of a + filter class as its argument and returns a new filter of that + type. To enable this, the factory must first be populated by + calling add_creator() for static filters (this + is done by the FactoryStatic class, see below) + and add_creator_dyn() for filters loaded + dynamically. + +
+ +
+ <literal>mp::FactoryStatic</literal> + (<filename>factory_static.cpp</filename>) + + A subclass of FactoryFilter which is + responsible for registering all the statically defined filter + types. It does this by knowing about all those filters' + structures, which are listed in its constructor. Merely + instantiating this class registers all the static classes. It is + for the benefit of this class that struct + metaproxy_1_filter_struct exists, and that all the filter + classes provide a static object of that type. + +
+ +
+ <literal>mp::filter::Base</literal> + (<filename>filter.cpp</filename>) + + The virtual base class of all filters. The filter API is, on the + surface at least, extremely simple: two methods. + configure() is passed an XML DOM tree representing + that part of the configuration file that pertains to this filter + instance, and is expected to walk that tree extracting relevant + information. And process() processes a + package (see below). That surface simplicity is a bit + misleading, as process() needs to know a lot + about the Package class in order to do + anything useful. + +
+ +
+ <literal>mp::filter::AuthSimple</literal>, + <literal>Backend_test</literal>, etc. + (<filename>filter_auth_simple.cpp</filename>, + <filename>filter_backend_test.cpp</filename>, etc.) + + Individual filters. Each of these is implemented by a header and + a source file, named filter_*.hpp and + filter_*.cpp respectively. All the header + files should be pretty much identical, in that they declare the + class, including a private Rep class and a + member pointer to it, and the two public methods. + + + The source file for each filter needs to supply: + + + + + A definition of the private Rep class. + + + + + Some boilerplate constructors and destructors. + + + + + A configure() method that uses the + appropriate XML fragment. + + + + + Most important, the process() method that + does all the actual work. + + + +
+ +
+ <literal>mp::Package</literal> + (<filename>package.cpp</filename>) + + Represents a package on its way through the series of filters + that make up a route. This is essentially a Z39.50 or SRU APDU + together with information about where it came from, which is + modified as it passes through the various filters. + +
+ +
+ <literal>mp::Pipe</literal> + (<filename>pipe.cpp</filename>) + + This class provides a compatibility layer so that we have an IPC + mechanism that works the same under Unix and Windows. It's not + particularly exciting. + +
+ +
+ <literal>mp::RouterChain</literal> + (<filename>router_chain.cpp</filename>) + + ### to be written + +
+ +
+ <literal>mp::RouterFleXML</literal> + (<filename>router_flexml.cpp</filename>) + + ### to be written + +
+ +
+ <literal>mp::Session</literal> + (<filename>session.cpp</filename>) + + ### to be written + +
+ +
+ <literal>mp::ThreadPoolSocketObserver</literal> + (<filename>thread_pool_observer.cpp</filename>) + + ### to be written + +
+ +
+ <literal>mp::util</literal> + (<filename>util.cpp</filename>) + + A namespace of various small utility functions and classes, + collected together for convenience. Most importantly, includes + the mp::util::odr class, a wrapper for YAZ's + ODR facilities. + +
+ +
+ <literal>mp::xml</literal> + (<filename>xmlutil.cpp</filename>) + + A namespace of various XML utility functions and classes, + collected together for convenience. + +
-
- <literal>mp::xml</literal> - (<filename>xmlutil.cpp</filename>) + +
+ Other Source Files + + In addition to the Metaproxy source files that define the classes + described above, there are a few additional files which are + briefly described here: + + + + metaproxy_prog.cpp + + + The main function of the metaproxy program. + + + + + ex_router_flexml.cpp + + + Identical to metaproxy_prog.cpp: it's not clear why. + + + + + test_*.cpp + + + Unit-tests for various modules. + + + + - A namespace of various XML utility functions and classes, - collected together for convenience. + ### Still to be described: + ex_filter_frontend_net.cpp, + filter_dl.cpp, + plainfile.cpp, + tstdl.cpp.
-
+ -
- Other Source Files + + Reference + + + The material in this chapter is drawn directly from the individual + manual entries. In particular, the Metaproxy invocation section is + available using man metaproxy, and the section + on each individual filter is available using the name of the filter + as the argument to the man command. + + + &manref; + + + + License + + &copyright; + - In addition to the Metaproxy source files that define the classes - described above, there are a few additional files which are - briefly described here: - - - - metaproxy_prog.cpp - - - The main function of the yp2 program. - - - - - ex_router_flexml.cpp - - - Identical to metaproxy_prog.cpp: it's not clear why. - - - - - test_*.cpp - - - Unit-tests for various modules. - - - - + Metaproxy is free software; you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation; either version 2, or (at your option) any later + version. + + - ### Still to be described: - ex_filter_frontend_net.cpp, - filter_dl.cpp, - plainfile.cpp, - tstdl.cpp. + Metaproxy is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + for more details. - - - + - -- - - - - - - - - - 
- + You should have received a copy of the GNU General Public License + along with Metaproxy; see the file LICENSE. If not, write to the + Free Software Foundation, + 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + + + + &gpl2; +