From f5628ec48d43245fd435b9ef78b8a37bf1b42544 Mon Sep 17 00:00:00 2001 From: Mike Taylor Date: Thu, 27 Apr 2006 16:16:28 +0000 Subject: [PATCH] Much more on virtual databases and multi-database searching. Rearrange material within VDB chapter. Consistent use of "back-end" throughout prose. Replace "multicast" with "multi-database" throughout. Close up sections around their contents. Fix typos. --- doc/book.xml | 264 +++++++++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 223 insertions(+), 41 deletions(-) diff --git a/doc/book.xml b/doc/book.xml index 385f763..0b861da 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -1,4 +1,4 @@ - + Metaproxy - User's Guide and Reference @@ -578,7 +578,7 @@ <literal>multi</literal> (mp::filter::Multi) - Performs multicast searching. + Performs multi-database searching. See the extended discussion of virtual databases and multi-database searching below. @@ -825,12 +825,11 @@ file (included in the distribution as metaproxy/etc/config0.xml). This file defines a very simple configuration that simply proxies - to whatever backend server the client requests, but logs each + to whatever back-end server the client requests, but logs each request and response. This can be useful for debugging complex client-server dialogues. - + @@ -865,7 +864,7 @@ a log filter that emits a message for each request; they are then fed into a z3950_client filter, which forwards the requests to the client-specified - backend Z39.509 server. When the response arrives, it is handed + back-end Z39.50 server. When the response arrives, it is handed back to the log filter, which emits another message; and then to the front-end filter, which returns the response to the client. @@ -881,31 +880,29 @@
Introductory notes - - Lark's vomit - - This chapter goes into a level of technical detail that is - probably not necessary in order to configure and use Metaproxy. - It is provided only for those who like to know how things work. - You should feel free to skip on to the next section if this one - doesn't seem like fun. - - Two of Metaproxy's filters are concerned with multiple-database operations. Of these, virt_db can work alone to control the routing of searches to one of a number of servers, - while multi can work with the output of - virt_db to perform multicast searching, merging - the results into a unified result-set. The interaction between - these two filters is necessarily complex: it reflecting the real, - irreducible complexity of multicast searching in a protocol such + while multi can work together with + virt_db to perform multi-database searching, merging + the results into a unified result-set - ``metasearch in a box''. + + + The interaction between + these two filters is necessarily complex: it reflects the real, + irreducible complexity of multi-database searching in a protocol such as Z39.50 that separates initialisation from searching, and in which the database to be searched is not known at initialisation time. - Hold on tight - this may get a little hairy. + It's possible to use these filters without understanding the + details of their functioning and the interaction between them; the + next two sections of this chapter are ``HOWTO'' guides for doing + just that. However, debugging complex configurations will require + a deeper understanding, which the last two sections of this + chapter attempt to provide.
@@ -913,6 +910,202 @@
Virtual databases with the <literal>virt_db</literal> filter + Working alone, the purpose of the + virt_db + filter is to route search requests to one of a selection of + back-end databases. In this way, a single Z39.50 endpoint + (running Metaproxy) can provide access to several different + underlying services, including those that would otherwise be + inaccessible due to firewalls. In many useful configurations, the + back-end databases are local to the Metaproxy installation, but + the software does not enforce this, and any valid Z39.50 servers + may be used as back-ends. + + + For example, a virt_db + filter could be set up so that searches in the virtual database + ``lc'' are forwarded to the Library of Congress bibliographic + catalogue server, and searches in the virtual database ``marc'' + are forwarded to the toy database of MARC records that Index Data + hosts for testing purposes. A virt_db + configuration to make this switch would look like this: + + + + lc + z3950.loc.gov:7090/voyager + + + marc + indexdata.dk/marc + +]]> + + As well as being useful in its own right, this filter also provides + the foundation for multi-database searching. +
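To make the routing idea concrete, here is a toy Python sketch of the lookup that the configuration above expresses: each virtual database name maps to one or more back-end targets. It is not Metaproxy's code. The names VIRTUAL_DATABASES, targets_for() and UnknownDatabase are hypothetical, and raising an error for an unmapped database name is an assumption rather than documented behaviour.

```python
from typing import Dict, List

# Hypothetical routing table mirroring the lc/marc virt_db example above.
# Each value is a list so that one virtual database could name several targets.
VIRTUAL_DATABASES: Dict[str, List[str]] = {
    "lc": ["z3950.loc.gov:7090/voyager"],
    "marc": ["indexdata.dk/marc"],
}


class UnknownDatabase(Exception):
    """Raised for a database name with no mapping (assumed behaviour)."""


def targets_for(database: str) -> List[str]:
    """Return the back-end target(s) to which a search in `database` is routed."""
    try:
        return VIRTUAL_DATABASES[database]
    except KeyError:
        raise UnknownDatabase(database) from None


if __name__ == "__main__":
    for name in ("lc", "marc"):
        print(name, "->", targets_for(name))
```

A search in ``lc'' therefore reaches the Library of Congress server alone. The list form only becomes interesting when several targets share one virtual database, which is what multi-database searching relies on.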
+ + +
+ Multi-database search with the <literal>multi</literal> filter + + To arrange for Metaproxy to broadcast searches to multiple back-end + servers, the configuration needs to include two components: a + virt_db + filter that specifies multiple + <target> + elements, and a subsequent + multi + filter. Here, for example, is a complete configuration that + broadcasts searches to both the Library of Congress catalogue and + Index Data's tiny testing database of MARC records: + + + + + + + + 10 + @:9000 + + + + lc + z3950.loc.gov:7090/voyager + + + marc + indexdata.dk/marc + + + all + z3950.loc.gov:7090/voyager + indexdata.dk/marc + + + + + 30 + + + +]]> + + (Using a + virt_db + filter that specifies multiple + <target> + elements but without a subsequent + multi + filter yields surprising and undesirable results, as will be + described below. Don't do that.) + + + Metaproxy can be invoked with this configuration as follows: + + ../src/metaproxy --config config-simple-multi.xml + + And thereafter, Z39.50 clients can connect to the running server + (on port 9000, as specified in the configuration) and search in + any of the databases + lc (the Library of Congress catalogue), + marc (Index Data's test database of MARC records) + or + all (both of these). As an example, a session + using the YAZ command-line client yaz-client is + here included (edited for brevity and clarity): + + base lc +Z> find computer +Search was a success. +Number of hits: 10000, setno 1 +Elapsed: 5.521070 +Z> base marc +Z> find computer +Search was a success. +Number of hits: 10, setno 3 +Elapsed: 0.060187 +Z> base all +Z> find computer +Search was a success. +Number of hits: 10010, setno 4 +Elapsed: 2.237648 +Z> show 1 +[marc]Record type: USmarc +001 11224466 +003 DLC +005 00000000000000.0 +008 910710c19910701nju 00010 eng +010 $a 11224466 +040 $a DLC $c DLC +050 00 $a 123-xyz +100 10 $a Jack Collins +245 10 $a How to program a computer +260 1 $a Penguin +263 $a 8710 +300 $a p. cm. 
+Elapsed: 0.119612 +Z> show 2 +[VOYAGER]Record type: USmarc +001 13339105 +005 20041229102447.0 +008 030910s2004 caua 000 0 eng +035 $a (DLC) 2003112666 +906 $a 7 $b cbc $c orignew $d 4 $e epcn $f 20 $g y-gencatlg +925 0 $a acquire $b 1 shelf copy $x policy default +955 $a pc10 2003-09-10 $a pv12 2004-06-23 to SSCD; $h sj05 2004-11-30 $e sj05 2004-11-30 to Shelf. +010 $a 2003112666 +020 $a 0761542892 +040 $a DLC $c DLC $d DLC +050 00 $a MLCM 2004/03312 (G) +245 10 $a 007, everything or nothing : $b Prima's official strategy guide / $c created by Kaizen Media Group. +246 3 $a Double-O-seven, everything or nothing +246 30 $a Prima's official strategy guide +260 $a Roseville, CA : $b Prima Games, $c c2004. +300 $a 161 p. : $b col. ill. ; $c 28 cm. +500 $a "Platforms: Nintendo GameCube, Macintosh, PC, PlayStation 2 computer entertainment system, Xbox"--P. [4] of cover. +650 0 $a Video games. +710 2 $a Kaizen Media Group. +856 42 $3 Publisher description $u http://www.loc.gov/catdir/description/random052/2003112666.html +Elapsed: 0.150623 +Z> +]]> + + As can be seen, the first record in the result set is from the + Index Data test database, and the second from the Library of + Congress database. The result-set continues alternating records + round-robin style until the point where one of the databases' + records are exhausted. + + + This example uses only two back-end databases; more may be used. + There is no limitation imposed on the number of databases that may + be metasearched in this way: issues of resource usage and + administrative complexity dictate the practical limits. + +
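The merging rule just described can be sketched as a simple round-robin interleave. This is an illustrative Python sketch, not the multi filter's code: merge_round_robin() is a hypothetical helper and the short record lists stand in for back-end result sets. It assumes that once one back-end runs out of records the merge simply carries on with the others, so the merged size is the sum of the individual hit counts (10 + 10000 = 10010 in the session above). It does not model how Metaproxy decides which back-end contributes the first record.

```python
from typing import List, Sequence, Tuple


def merge_round_robin(
    result_sets: Sequence[Tuple[str, Sequence[str]]]
) -> List[Tuple[str, str]]:
    """Interleave records from several (database, records) result sets.

    Takes the first record of each set in turn, then the second of each,
    and so on; sets that are already exhausted are skipped.
    """
    merged: List[Tuple[str, str]] = []
    position = 0
    while True:
        took_any = False
        for database, records in result_sets:
            if position < len(records):
                merged.append((database, records[position]))
                took_any = True
        if not took_any:
            # Every result set is exhausted.
            return merged
        position += 1


if __name__ == "__main__":
    merged = merge_round_robin([
        ("marc", ["m1", "m2"]),
        ("lc", ["l1", "l2", "l3"]),
    ])
    # Record 1 comes from marc and record 2 from lc, as in the session above;
    # after marc's two records are used up, only lc records remain.
    print(merged)
```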
+ + +
+ What's going on? + + Lark's vomit + + This section goes into a level of technical detail that is + probably not necessary in order to configure and use Metaproxy. + It is provided only for those who like to know how things work. + You should feel free to skip on to the next section if this one + doesn't seem like fun. + + + + Hold on tight - this may get a little hairy. + + In the general course of things, a Z39.50 Init request may carry with it an otherInfo packet of type VAL_PROXY, whose value indicates the address of a Z39.50 server to which the @@ -933,25 +1126,8 @@ The role of the virt_db filter is to rewrite this otherInfo packet dependent on the virtual database that the - client wants to search. For example, a virt_db - filter could be set up so that searches in the virtual database - ``lc'' are forwarded to the Library of Congress server, and - searches in the virtual database ``id'' are forwarded to the toy - GILS database that Index Data hosts for testing purposes. A - virt_db configuration to make this switch would - look like this: + client wants to search. - - - lc - z3950.loc.gov:7090/Voyager - - - id - indexdata.dk/gils - - ]]> When Metaproxy receives a Z39.50 Init request from a client, it doesn't immediately forward that request to the back-end server. @@ -972,7 +1148,7 @@ frontend_net filter. The virt_db filter knows nothing about it; in fact, because the Init request that is received from the client - doesn't get forwarded until a Search reqeust is received, the + doesn't get forwarded until a Search request is received, the virt_db filter (and the z3950_client filter behind it) doesn't even get invoked at Init time. The only thing that a @@ -980,8 +1156,14 @@ VAL_PROXY otherInfo in the requests that pass through it. + + ### Describe the use of multiple VAL_PROXY + otherInfos, added by virt_db and used by + multi. +
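The deferred Init described above can be modelled in a few lines. This is a toy Python sketch, not Metaproxy's implementation (which is written in C++). The DeferredInitSession class, the ROUTES table and the packet tuples are hypothetical, and real Init and Search requests carry far more than is shown here. The sketch also assumes, though the text above does not say so, that switching to a database served by a different back-end replays the saved Init to the new server.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from virtual database name to back-end address,
# mirroring the lc/marc example earlier in this chapter (an assumption).
ROUTES: Dict[str, str] = {
    "lc": "z3950.loc.gov:7090/voyager",
    "marc": "indexdata.dk/marc",
}

# (request type, otherInfo fields); the query itself is left out of this model.
Packet = Tuple[str, Dict[str, str]]


class DeferredInitSession:
    """Toy model of one client session: the Init is held back until a
    Search names a database, then sent on with a VAL_PROXY otherInfo."""

    def __init__(self, routes: Dict[str, str]) -> None:
        self.routes = routes
        self.saved_init: Optional[Dict[str, str]] = None
        self.current_target: Optional[str] = None

    def on_init(self, other_info: Dict[str, str]) -> None:
        # Nothing is sent to a back-end yet: the database is still unknown.
        self.saved_init = dict(other_info)

    def on_search(self, database: str) -> List[Packet]:
        """Return the packets that would be sent towards the back-end."""
        if self.saved_init is None:
            raise RuntimeError("Search received before Init")
        target = self.routes[database]
        sent: List[Packet] = []
        if target != self.current_target:
            # Send the saved Init with VAL_PROXY naming the chosen server
            # (replaying it on a database switch is an assumption).
            init = dict(self.saved_init)
            init["VAL_PROXY"] = target
            sent.append(("init", init))
            self.current_target = target
        sent.append(("search", {}))
        return sent
```

The point the model makes is the one in the prose: at Init time no back-end can be chosen, so the first back-end traffic for a session happens only when a Search arrives.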
+
A picture is worth a thousand words (but only five hundred on 64-bit architectures) -- 1.7.10.4