X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fbook.xml;h=dffd47497f61499fd04e72dffe73503b41c7bf06;hb=7cad8681e3c14990e2e8bf31e20b1c36c5f51805;hp=03e08cf94e36763195dba1e6c862c7dc4559f691;hpb=3146ee2a8440f983f36788eca1e310dcd2f16adb;p=metaproxy-moved-to-github.git diff --git a/doc/book.xml b/doc/book.xml index 03e08cf..dffd474 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -2,7 +2,8 @@ + + %local; @@ -17,34 +18,43 @@ --> ]> - + Metaproxy - User's Guide and Reference - - AdamDickmeiss - - - MarcCromme - - - MikeTaylor - + + + AdamDickmeiss + + + MarcCromme + + + MikeTaylor + + + &version; - 2006 + 2005-2007 Index Data ApS + This manual is part of Metaproxy version &version;. + + Metaproxy is a universal router, proxy and encapsulated metasearcher for information retrieval protocols. It accepts, processes, interprets and redirects requests from IR clients using - standard protocols such as + standard protocols such as the binary ANSI/NISO Z39.50 - (and in the future SRU - and SRW), as + and the information search and retrieval + web services SRU + and SRW, as well as functioning as a limited HTTP server. + + Metaproxy is configured by an XML file which specifies how the software should function in terms of routes that the request packets can take through the proxy, each step on a @@ -79,7 +89,7 @@ Metaproxy - is a standalone program that acts as a universal router, proxy and + is a stand-alone program that acts as a universal router, proxy and encapsulated metasearcher for information retrieval protocols such as Z39.50, and in the future SRU and SRW. @@ -107,7 +117,7 @@ being more powerful, flexible, configurable and extensible. Among its many advantages over the older, more pedestrian work are support for multiplexing (encapsulated metasearching), routing by - database name, authentication and authorisation and serving local + database name, authentication and authorization and serving local files via HTTP.
Equally significant, its modular architecture facilitites the creation of pluggable modules implementing further functionality. @@ -155,7 +165,7 @@ You may modify your copy of the software (fix bugs, add features) if you need to. We encourage you to send your changes back to us for integration into the master copy, but you are not obliged to do so. You - may NOT pass your changes on to any other party. + may NOT pass your changes on to any other party. @@ -221,7 +231,7 @@ for more information. - We have succesfully built Metaproxy using the compilers + We have successfully built Metaproxy using the compilers GCC version 4.0 and Microsoft Visual Studio 2003/2005. @@ -378,7 +388,7 @@ \boost\lib, \boost\include. - For more informatation about installing Boost refer to the + For more information about installing Boost refer to the getting started pages. @@ -392,7 +402,7 @@ here. - Libxslt has other dependencies, but thes can all be downloaded + Libxslt has other dependencies, but these can all be downloaded from the same site. Get the following: iconv, zlib, libxml2, libxslt. @@ -424,7 +434,7 @@
Metaproxy - Metaproxy is shipped with NMAKE makfiles as well - similar + Metaproxy is shipped with NMAKE makefiles as well - similar to those found in the YAZ++/YAZ packages. Adjust this Makefile to point to the proper locations of Boost, Libxslt, Libxml2, zlib, iconv, yaz and yazpp. @@ -482,7 +492,7 @@ - After succesful compilation you'll find + After successful compilation you'll find metaproxy.exe in the bin directory. @@ -519,7 +529,7 @@ In general, packages are doctored as they pass through Metaproxy. For example, when the proxy performs authentication - and authorisation on a Z39.50 Init request, it removes the + and authorization on a Z39.50 Init request, it removes the authentication credentials from the package so that they are not passed onto the back-end server; and when search-response packages are obtained from multiple servers, they are merged @@ -572,7 +582,7 @@ plugins that provide new filters. The filter API is small and conceptually simple, but there are many details to master. See the section below on - extensions. + Filters. @@ -627,10 +637,10 @@ packages (frontend_net); others are sinks: they consume packages and return a result - (z3950_client, - backend_test, + (backend_test, bounce, - http_file); + http_file, + z3950_client); the others are true filters, that read, process and pass on the packages they are fed (auth_simple, @@ -651,10 +661,9 @@ We now briefly consider each of the types of filter supported by the core Metaproxy binary. This overview is intended to give a - flavour of the available functionality; more detailed information + flavor of the available functionality; more detailed information about each type of filter is included below in - the reference guide to Metaproxy filters. + . The filters are here named by the string that is used as the @@ -696,7 +705,7 @@ Figure out what additional information we need in: <literal>auth_simple</literal> (mp::filter::AuthSimple) - Simple authentication and authorisation. 
The configuration + Simple authentication and authorization. The configuration specifies the name of a file that is the user register, which lists username:password pairs, one per line, colon separated. When a session begins, it @@ -732,8 +741,8 @@ Figure out what additional information we need in: sets Z39.50 packages to Z_Close, and HTTP_Request packages to HTTP_Response err code 400 packages, and adds a suitable bounce message. - The bounce filter is usually added at end of each filter chain - config.xml to prevent infinite hanging of for example HTTP + The bounce filter is usually added at the end of each filter chain route + to prevent infinite hanging of, for example, HTTP requests packages when only the Z39.50 client partial sink filter is found in the route. @@ -741,13 +750,26 @@ Figure out what additional information we need in:
+ <literal>cql_rpn</literal> + (mp::filter::CQLtoRPN) + + A query-language-transforming filter which catches Z39.50 + searchRequest + packages containing CQL queries, transforms + them into RPN queries, + and passes the searchRequests on to the next + filter. Among other things, it is useful in an SRU context.
+ +
<literal>frontend_net</literal> (mp::filter::FrontendNet) A source that accepts Z39.50 connections from a port specified in the configuration, reads protocol units, and feeds them into the next filter in the route. When the result is - revceived, it is returned to the original origin. + received, it is returned to the original origin.
@@ -755,7 +777,8 @@ Figure out what additional information we need in: <literal>http_file</literal> (mp::filter::HttpFile) - A partial sink which swallows only HTTP_Request packages, and + A partial sink which swallows only + HTTP_Request packages, and returns the contents of files from the local filesystem in response to HTTP requests. It lets Z39.50 packages and all other forthcoming package types @@ -768,6 +791,26 @@ Figure out what additional information we need in:
+ <literal>load_balance</literal> + (mp::filter::LoadBalance) + + Performs load balancing for incoming Z39.50 init requests. + It is used together with the virt_db filter, + but unlike the multi filter, it sends an + entire session to only one of the virtual backends. The + load_balance filter assumes that + all backend targets have equal content, and chooses the backend + with the lowest load cost for a new session. + + + This filter is experimental and not yet mature enough for heavy-load + production sites. + + +
+ +
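The selection rule in the load_balance description (all backends are assumed to hold equal content, and a new session goes to the backend with the lowest load cost) can be sketched in C++ as below. This is a minimal illustration, not Metaproxy's mp::filter::LoadBalance code. The `Backend` struct, its `cost` field and `pick_backend()` are hypothetical names, and this sketch does not model how the real filter measures load cost.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical view of one backend target (not a Metaproxy type).
struct Backend {
    std::string host;  // target address, illustrative only
    double cost;       // assumed load cost: lower means less loaded
};

// Return the index of the backend with the lowest load cost for a new
// session, or -1 if there are no backends. Every backend is assumed to
// hold the same content, so any of them could serve the session.
// Ties go to the backend that appears first in the list.
int pick_backend(const std::vector<Backend>& backends) {
    int best = -1;
    for (std::size_t i = 0; i < backends.size(); ++i) {
        if (best < 0 ||
            backends[i].cost < backends[static_cast<std::size_t>(best)].cost)
            best = static_cast<int>(i);
    }
    return best;
}
```

Breaking ties in favor of the earlier backend is an arbitrary choice made for this sketch. Any consistent rule would do, because the backends are assumed to be interchangeable.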
<literal>log</literal> (mp::filter::Log) @@ -776,7 +819,7 @@ Figure out what additional information we need in: as multiple different logging formats.
- +
<literal>multi</literal> (mp::filter::Multi) @@ -792,7 +835,9 @@ Figure out what additional information we need in: <literal>query_rewrite</literal> (mp::filter::QueryRewrite) - Rewrites Z39.50 Type-1 and Type-101 (``RPN'') queries by a + Rewrites Z39.50 Type-1 + and Type-101 (``RPN'') + queries by a three-step process: the query is transliterated from Z39.50 packet structures into an XML representation; that XML representation is transformed by an XSLT stylesheet; and the @@ -809,9 +854,9 @@ Figure out what additional information we need in: This filter acts only on Z3950 present requests, and let all other types of packages and requests pass untouched. It's use is twofold: blocking Z3950 present requests, which the backend - server does not understand and can not honour, and transforming + server does not understand and cannot honor, and transforming the present syntax and elementset name according to the rules - specified, to fetch only exisitng record formats, and transform + specified, to fetch only existing record formats, and transform them on the fly to requested record syntaxes.
@@ -820,18 +865,11 @@ Figure out what additional information we need in: <literal>session_shared</literal> (mp::filter::SessionShared) - When this is finished, it will implement global sharing of + This filter implements global sharing of result sets (i.e. between threads and therefore between - clients), yielding performance improvements especially when - incoming requests are from a stateless environment such as a - web-server, in which the client process representing a session - might be any one of many. However: + clients), yielding performance improvements by clever resource + pooling. - - - This filter is not yet completed. - -
@@ -839,8 +877,20 @@ Figure out what additional information we need in: (mp::filter::SRUtoZ3950) This filter transforms valid - SRU/GET or SRU/SOAP requests to Z3950 requests, and wraps the - recieved hit counts and XML records into suitable SRU response messages. + SRU GET/POST/SOAP searchRetrieve requests to Z3950 init, search, + and present requests, and wraps the + received hit counts and XML records into suitable SRU response + messages. + The sru_z3950 filter also processes SRU + GET/POST/SOAP explain requests, returning + either the absolute minimum required by the standard, or a full + pre-defined ZeeReX explain record. + See the + ZeeReX Explain + standard pages and the + SRU Explain pages + for more information on the correct explain syntax. + SRU scan requests are not yet supported.
@@ -888,6 +938,29 @@ Figure out what additional information we need in: are passed untouched.
+ + +
+ <literal>zeerex_explain</literal> + (mp::filter::ZeerexExplain) + + This filter acts as a sink for + Z39.50 explain requests, returning a static ZeeReX + Explain XML record from the config section. All other packages + are passed through. + See the + ZeeReX Explain + standard pages + for more information on the correct explain syntax. + + + + This filter is not yet completed. + + +
+ + @@ -910,34 +983,10 @@ Figure out what additional information we need in:
- frontend_sru (source) - - - Receive SRU (and perhaps SRW) requests. - - - - - sru2z3950 (filter) - - - Translate SRU requests into Z39.50 requests. - - - - sru_client (sink) - SRU searching and retrieval. - - - - - srw_client (sink) - - - SRW searching and retrieval. + SRU/GET and SRU/SOAP searching and retrieval. @@ -964,16 +1013,11 @@ Figure out what additional information we need in: If Metaproxy is an interpreter providing operations on packages, then its configuration file can be thought of as a program for that - interpreter. Configuration is by means of a single file, the name + interpreter. Configuration is by means of a single XML file, the name of which is supplied as the sole command-line argument to the metaproxy program. (See - the reference guide - below for more information on invoking Metaproxy.) - - - The configuration files are written in XML. (But that's just an - implementation detail - they could just as well have been written - in YAML or Lisp-like S-expressions, or in a custom syntax.) + below for more information on invoking + Metaproxy.) @@ -981,15 +1025,15 @@ Figure out what additional information we need in: Overview of the config file XML structure All elements and attributes are in the namespace - . + . This is most easily achieved by setting the default namespace on the top-level element, as here: - <yp2 xmlns="http://indexdata.dk/yp2/config/1"> + <metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0"> - The top-level element is <yp2>. This contains a + The top-level element is <metaproxy>. This contains a <start> element, a <filters> element and a <routes> element, in that order. <filters> is optional; the other two are mandatory. All three are @@ -1009,7 +1053,7 @@ Figure out what additional information we need in: and contain various elements that provide suitable configuration for a filter of its type. The filter-specific elements are described in - the reference guide below. + . 
Filters defined in this part of the file must carry an id attribute so that they can be referenced from elsewhere. @@ -1045,7 +1089,7 @@ Figure out what additional information we need in: client-server dialogues. - + @@ -1062,7 +1106,7 @@ Figure out what additional information we need in: - + ]]> It works by defining a single route, called @@ -1084,7 +1128,7 @@ Figure out what additional information we need in: z3950_client filter, with the response data filled by the external Z39.50 server targeted. All non-Z39.50 packages are passed through to the - bounce filter, which defitely bounces + bounce filter, which definitely bounces everything, including fish, bananas, cold pyjamas, mutton, beef and trout packages. When the response arrives, it is handed @@ -1093,12 +1137,31 @@ Figure out what additional information we need in: which returns the response to the client. -
+ +
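The example route described above (frontend_net, then log, then z3950_client, then bounce) depends on partial sinks answering only the packages they understand and passing the rest on, with bounce answering whatever reaches the end of the route. The C++ sketch below models only that pass-through behavior, under simplifying assumptions. `Package`, `Kind`, `Filter`, `Route`, `Z3950Sink` and `Bounce` are hypothetical stand-ins, not Metaproxy's mp::Package or filter classes, and the response strings are placeholders for real protocol responses.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, much-simplified package: a kind plus a response slot.
enum class Kind { Z3950, HttpRequest, Other };

struct Package {
    Kind kind;
    std::string response;  // stays empty until some filter answers
};

struct Route;

// Hypothetical filter interface: answer the package, or pass it on.
class Filter {
public:
    virtual ~Filter() = default;
    virtual void process(Package& p, const Route& route,
                         std::size_t next) const = 0;
};

// A route is an ordered list of filters, walked front to back.
struct Route {
    std::vector<std::unique_ptr<Filter>> filters;

    // Hand the package to the filter at position pos. Past the end of
    // the route nothing happens, so the package gets no response.
    void forward(Package& p, std::size_t pos) const {
        if (pos < filters.size())
            filters[pos]->process(p, *this, pos + 1);
    }
};

// Partial sink in the spirit of z3950_client: answers Z39.50 packages
// and lets every other kind continue down the route.
class Z3950Sink : public Filter {
public:
    void process(Package& p, const Route& route,
                 std::size_t next) const override {
        if (p.kind == Kind::Z3950)
            p.response = "Z39.50 response from backend";
        else
            route.forward(p, next);
    }
};

// Full sink in the spirit of bounce: answers everything it receives.
class Bounce : public Filter {
public:
    void process(Package& p, const Route&, std::size_t) const override {
        if (p.kind == Kind::Z3950)
            p.response = "Z_Close";
        else if (p.kind == Kind::HttpRequest)
            p.response = "HTTP 400";
        else
            p.response = "bounced";
    }
};
```

In this model, a route containing only `Z3950Sink` leaves an HTTP package with an empty response. That corresponds to the hanging requests the bounce filter is added to prevent.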
+ Config file modularity + + Metaproxy XML configuration snippets can be reused by other + filters using the XInclude standard, as seen in + the /etc/config-sru-to-z3950.xml example SRU + configuration. + + + + + +]]> + +
+ +
Config file syntax checking The distribution contains RelaxNG Compact and XML syntax checking files, as well as XML Schema files. These are found in the - distribution pathes + distribution paths xml/schema/metaproxy.rnc xml/schema/metaproxy.rng @@ -1141,14 +1204,14 @@ Figure out what additional information we need in: The interaction between these two filters is necessarily complex: it reflects the real, irreducible complexity of multi-database searching in a protocol such - as Z39.50 that separates initialisation from searching, and in - which the database to be searched is not known at initialisation + as Z39.50 that separates initialization from searching, and in + which the database to be searched is not known at initialization time. It's possible to use these filters without understanding the details of their functioning and the interaction between them; the - next two sections of this chapter are ``HOWTO'' guides for doing + next two sections of this chapter are ``HOW-TO'' guides for doing just that. However, debugging complex configurations will require a deeper understanding, which the last two sections of this chapters attempt to provide. @@ -1186,7 +1249,7 @@ Figure out what additional information we need in: marc - indexdata.dk/marc + indexdata.com/marc ]]> @@ -1211,7 +1274,7 @@ Figure out what additional information we need in: Index Data's tiny testing database of MARC records: - + @@ -1226,12 +1289,12 @@ Figure out what additional information we need in: marc - indexdata.dk/marc + indexdata.com/marc all z3950.loc.gov:7090/voyager - indexdata.dk/marc + indexdata.com/marc @@ -1241,7 +1304,7 @@ Figure out what additional information we need in: -]]> +]]> (Using a virt_db @@ -1345,7 +1408,7 @@ Z> can be inconvenient in deployment, when users typically don't want to be bothered with problems of this kind and prefer just to get the records from the databases that are available. 
To obtain this - latter behaviour add an empty + latter behavior add an empty <hideunavailable> element inside the multi filter: @@ -1480,9 +1543,8 @@ Z> [Here there should be a diagram showing the progress of packages through the filters during a simple virtual-database search and a multi-database search, but is seems that your - toolchain has not been able to include the diagram in this - document. This is because of LaTeX suckage. Time to move to - OpenOffice. Yes, really.] + tool chain has not been able to include the diagram in this + document.] @@ -1514,7 +1687,7 @@ Z> Stop! Do not read this! You won't enjoy it at all. You should just skip ahead to - the reference guide, + , which tells @@ -1529,7 +1702,7 @@ Z> This chapter contains documentation of the Metaproxy source code, and is of interest only to maintainers and developers. If you need to - change Metaproxy's behaviour or write a new filter, then you will most + change Metaproxy's behavior or write a new filter, then you will most likely find this chapter helpful. Otherwise it's a waste of your good time. Seriously: go and watch a film or something. This is Spinal Tap is particularly good. @@ -1595,7 +1768,7 @@ Z> that part of the configuration file that pertains to this filter instance, and is expected to walk that tree extracting relevant information. And process() processes a - package (see below). That surface simplicitly is a bit + package (see below). That surface simplicity is a bit misleading, as process() needs to know a lot about the Package class in order to do anything useful. @@ -1613,12 +1786,7 @@ Z> filter_*.cpp respectively. All the header files should be pretty much identical, in that they declare the class, including a private Rep class and a - member pointer to it, and the two public methods. 
The only extra - information in any filter header is additional private types and - members (which should really all be in the Rep - anyway) and private methods (which should also remain known only - to the source file, but C++'s brain-damaged design requires this - dirty laundry to be exhibited in public. Thanks, Bjarne!) + member pointer to it, and the two public methods. The source file for each filter needs to supply: @@ -1768,9 +1936,9 @@ Z> - - - Reference guide + + Reference + The material in this chapter is drawn directly from the individual manual entries. In particular, the Metaproxy invocation section is @@ -1778,7 +1946,8 @@ Z> on each individual filter is available using the name of the filter as the argument to the man command. - &manref; + + &manref;