X-Git-Url: http://git.indexdata.com/?p=yazpp-moved-to-github.git;a=blobdiff_plain;f=doc%2Fproxy.xml;h=556245410a2c4366e16f7100a49b887549bba276;hp=3948f4efa8d6cbd9e4def57dd64e949d3fab03ab;hb=d84b43231c7c5b0786e9aa62d0f7ca7ecd83bdb5;hpb=ed3df7ad6c5f8c8dbd9c1c97c5c9cd2957a24ad9 diff --git a/doc/proxy.xml b/doc/proxy.xml index 3948f4e..5562454 100644 --- a/doc/proxy.xml +++ b/doc/proxy.xml @@ -1,39 +1,153 @@ - - - YAZ Proxy + + The YAZ Proxy - The YAZ proxy is a transparent Z39.50 to Z39.50 gateway. - It is useful for debugging Z39.50 software, redirect - Z39.50 packages through fire walls, etc. + The YAZ proxy is a transparent Z39.50-to-Z39.50 gateway. That is, + it is a Z39.50 server which has as its back-end a Z39.50 client + that forwards requests on to another server (known as the + backend target.) - Furthermore, the proxy offers facilities that often - boost performance for "connection-less" Z39.50 clients such + The YAZ Proxy is useful for debugging Z39.50 software, logging + APDUs, redirecting Z39.50 packages through firewalls, etc. + Furthermore, it offers facilities that often + boost performance for connectionless Z39.50 clients such as web gateways. - Unlike most other "server" software the proxy runs single-threaded, + Unlike most other server software, the proxy runs single-threaded, single-process. Every I/O operation - is non-blocking so it is light-weight and very fast. - It does not store "state" information on the hard drive - except the log files you want. + is non-blocking so it is very lightweight and extremely fast. + It does not store any state information on the hard drive, + except any log files you ask for. + +
+ Example: Using the Proxy to Log APDUs + + Suppose you use a commercial Z39.50 client for which you do not + have source code, and it's not behaving how you think it should + when running against some specific server that you have no control + over. One way to diagnose the problem is to find out what packets + (APDUs) are being sent and received, but not all client + applications have facilities to do APDU logging. + + + No problem. Run the proxy on a friendly machine, get it to log + APDUs, and point the errant client at the proxy instead of + directly at the server that's causing it problems. + + + Suppose the server is running on foo.bar.com, + port 18398. Run the proxy on the machine of your choice, say + your.company.com like this: + + + yaz-proxy -a - -t tcp:foo.bar.com:18398 tcp:@:9000 + + + (The -a - option requests APDU logging on + standard output, -t tcp:foo.bar.com:18398 + specifies where the backend target is, and + tcp:@:9000 tells the proxy to listen on port + 9000 and accept connections from any machine.) + + + Now change your client application's configuration so that instead + of connecting to foo.bar.com port 18398, it + connects to your.company.com port 9000, and + start it up. It will work exactly as usual, but all the packets + will be sent via the proxy, which will generate a log like this: + + + +
+
- Specifying the backend target + Specifying the Backend Target - When a Z39.50 client session is accepted by the proxy, the proxy + When the proxy accepts a Z39.50 client session, it determines the backend target by the following rules: - If the Initialize Request PDU from the client - includes Other-Information, with OID, - 1.2.840.10003.10.1000.81.1, that - specifies the target. + If the InitializeRequest PDU from the + client includes an + otherInfo + element with OID + 1.2.840.10003.10.1000.81.1, then the + contents of that element specify the target to be used, in the + usual YAZ address format (typically + tcp:hostname:port) + as described in + the Addresses section of the YAZ manual. - Otherwise, the Proxy uses the default target if given. - (option -t). + Otherwise, the Proxy uses the default target, if one was + specified on the command-line with the -t + option. A default target can also be specified in the + XML Config file. @@ -45,179 +159,512 @@
- Keep-alive facility for Stateless clients + Keep-alive Facility - Stateless clients may generate a cookie for a Z39.50 - session which is sent to the proxy as part of PDUs. - In this case, the proxy will keep the Z39.50 session alive - to the backend target even the connection from the client - to the proxy is closed. When the client contacts the - proxy again it will re-issue the cookie and reuse the - Z39.50 connection with the backend target. Note that there is not - guarantee that the Z39.50 is kept forever to the backend - target, since the proxy will shut it down after certain - idle time. So in effect, the connection from the client's - point of view should be considered stateless. + Keep-alive is a facility whereby the proxy keeps the connection to the + backend open, even if the client closes the connection to the proxy. - As for the target specification, the Other-Information - area is used to hold the cookie with OID - 1.2.840.10003.10.1000.81.2. + If the same or another client later connects to the proxy and requests the + same backend, it is reassigned to this backend connection. In this case, the + proxy sends an initialize response directly to the client and the + initialize handshake with the backend is omitted. + + + When a client reconnects, query and record caching work better if the + proxy assigns it to the same backend session as before, and the result set + (if any) is reused. To achieve this, Index Data defined a session + cookie which identifies the backend session. + + + The cookie is defined by the client and is sent as part of the + Initialize Request, passed in an + otherInfo + element with OID 1.2.840.10003.10.1000.81.2. + + + Clients that do not send a cookie as part of the initialize request + may still see better performance, since the init handshake is saved.
- -
+ +
Query Caching - Simple stateless clients often sends identical Z39.50 searches - in a relatively short period of time (full-list, next-page, - single full-record, etc). And for many targets, it's - much more expensive to produce a new result set than - reuse and existing one. + Simple stateless clients often send identical Z39.50 searches + in a relatively short period of time (e.g. in order to produce a + results-list page, the next page, + a single full-record, etc). And for many targets, it's + much more expensive to produce a new result set than to + reuse an existing one. - The proxy tries to solve that by storing the last query for each - backend target. So when an identical query is received that + The proxy tries to solve that by remembering the last query for each + backend target, so that if an identical query is received next, it is turned into Present Requests rather than new Search Requests. + + + In a future release we will probably allow for + an arbitrary-sized cache for targets supporting named result sets. + + - This optimization should work for any Z39.50 client and/or - target. The target does not have to support named result sets. + You can enable/disable query caching using option -o. 
- -
- Other optimizations + +
+ Record Caching + + As an option, the proxy may also cache result set records for the + last search. + The proxy takes into account the Record Syntax and CompSpec. + The CompSpec includes simple element set names as well. + By default the cache is 200000 bytes per session. + +
+ +
+ Query Validation + + The Proxy may also be configured to trap particular attributes in + Type-1 queries and send Bib-1 diagnostics back to the client without + even consulting the backend target. This facility may be useful if + a target does not properly issue diagnostics when unsupported attributes + are sent to it. + 
+ +
+ Record Syntax Validation + + The proxy may be configured to accept, reject or convert records. + When a record syntax is accepted, the proxy passes search/present requests to the + backend target under the assumption that the target can honor the + request (in fact it may not). When a record is rejected because + the record syntax is "unsupported", the proxy returns a diagnostic to the + client. Finally, the proxy may convert records. + + + In the current version the only supported conversion is + MARC21/USMARC in MARC-8 charset to MARCXML in UTF-8. Future versions of + the proxy may do other record/charset conversions. + 
+ +
+ Other Optimizations - We've had some plans to support caching of result set records, - but this had not yet been implemented. + We've had some plans to support global caching of result set records, + but this has not yet been implemented.
-
- Proxy usage +
+ Proxy Configuration File + + The proxy may optionally read a configuration file, given with option + -c followed by the filename of a config file. + + The config file is in XML format. The YAZ proxy must be compiled + with libxml2 and + libXSLT support in + order for the config file facility to be enabled. - - - yaz-proxy - 8 - - - yaz-proxy - Z39.50 proxy - - - - yaz-proxy - -a fname - -c num - -v level - -t target - -u auth - -o level - host:port - - - - DESCRIPTION + + To check that a config file is well-formed, yaz-proxy may + be invoked without specifying a listening port, i.e. + + yaz-proxy -c myconfig.xml + + If this does not produce errors, the file is well-formed. + + 
+ Proxy Configuration Header + + The proxy config file must have a root element called + proxy. All information except an optional XML + header must be stored within the proxy element. + + + <?xml version="1.0"?> + <proxy> + <!-- content here .. --> + </proxy> + +
+
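The root-element rule above can be illustrated independently of the proxy. The following Python sketch parses the text of a config file and checks that it is well-formed and that its root element is `proxy`. The function name `check_proxy_config` is hypothetical and not part of yaz-proxy; running `yaz-proxy -c myconfig.xml` remains the authoritative check.

```python
import xml.etree.ElementTree as ET


def check_proxy_config(xml_text):
    """Return (ok, message) for a candidate proxy config document.

    Hypothetical helper: it only checks well-formedness and the root
    element name, not the meaning of any child elements.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, "not well-formed: %s" % exc
    if root.tag != "proxy":
        return False, "root element is <%s>, expected <proxy>" % root.tag
    return True, "ok"


if __name__ == "__main__":
    sample = '<?xml version="1.0"?><proxy><!-- content here .. --></proxy>'
    print(check_proxy_config(sample))
```

A document with a different root, or with unbalanced tags, is reported as a failure rather than raising an exception.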
+ Configuration: target + + The element target, which may be repeated zero + or more times with parent element proxy, contains + information about each backend target. + The target element has two attributes: + name which holds the logical name of the backend + target (required) and default (optional) which + (when given) specifies that the backend target is the default target, + equivalent to command line option -t. + + + + <?xml version="1.0"?> + <proxy> + <target name="server1" default="1"> + <!-- description of server1 .. --> + </target> + <target name="server2"> + <!-- description of server2 .. --> + </target> + </proxy> + + 
+
+ Configuration: url + + The url element, which may be repeated one or more times, + should be the child of the target element. + The CDATA of url is the Z-URL of the backend. + + + Multiple url elements may be used. In that case, when + a client initiates a session, the proxy chooses the URL with the lowest + number of active sessions, thereby distributing the load. It is + assumed that each URL represents the same database (data). + 
+
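The URL selection rule for multiple url elements can be sketched as follows. This is a minimal Python illustration, not the proxy's actual code: the function name `pick_backend`, the example URLs, and the choice to break ties in favour of the first URL listed are all assumptions.

```python
def pick_backend(active_sessions):
    """Return the backend URL with the fewest active sessions.

    active_sessions maps a Z-URL to its current session count.
    Ties go to the URL listed first (an assumption of this sketch).
    """
    if not active_sessions:
        raise ValueError("no backend URLs configured")
    return min(active_sessions, key=active_sessions.get)


if __name__ == "__main__":
    # Hypothetical backends serving the same database.
    sessions = {"tcp:a.example.org:210": 3, "tcp:b.example.org:210": 1}
    chosen = pick_backend(sessions)
    sessions[chosen] += 1  # the new client session is now counted
    print(chosen)
```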
+ Configuration: keepalive + The keepalive element holds information about + the keepalive Z39.50 sessions. Keepalive sessions are proxy-to-backend + sessions that are no longer associated with a client session. + + The keepalive element, which is the child of + the target, holds two elements: + bandwidth and pdu. + The bandwidth is the maximum total bytes + transferred to/from the target. If a target session exceeds this + limit, it is shut down (and no longer kept alive). + The pdu is the maximum number of requests sent + to the target. If a target session exceeds this limit, it is + shut down. The idea of these two limits is to avoid very long + sessions that use resources in a backend (that leaks!). + + + The following sets the maximum number of bytes transferred in a + target session to 1 MB and the maximum number of requests to 400. + + <keepalive> + <bandwidth>1048576</bandwidth> + <pdu>400</pdu> + </keepalive> + + 
+
+ Configuration: limit + + The limit section specifies bandwidth and PDU request + limits for an active session. + The proxy records bandwidth and PDU requests during the last 60 seconds + (1 minute). The limit may include the + elements bandwidth, pdu, + and retrieve. The bandwidth + measures the number of bytes transferred within the last minute. + The pdu is the number of requests in the last + minute. The retrieve holds the maximum number of records to + be retrieved in one Present Request. + + + If a bandwidth/pdu limit is reached the proxy will postpone the + requests to the target and wait one or more seconds. The idea of the + limit is to ensure that clients that download hundreds or thousands of + records do not hurt other users. + + + The following sets the maximum number of bytes transferred per minute to + 512 KB (524288 bytes) and the maximum number of requests per minute to 40. + + <limit> + <bandwidth>524288</bandwidth> + <pdu>40</pdu> + </limit> + + + - The proxy is a daemon on its own and runs stand-alone (no - inetd support). The host:port specifies host address and - listening port respectively. Use @ - for ANY address. + Typically the limits for keepalive are much higher than + those for session minute average. - - OPTIONS - - -a fname - - APDU log. - - - -c num - - Specifies maximum number of connections to be cached. - - - -v level - - Debug level (like YAZ). - - - -t target - - Default target. - - - -t target - - Authentication info sent to the backend target. - Useful if you happen to have an internal target that does - require authentication or if the client software does not allow - you to set it. - - - -o level - - Sets level for optimization. Use zero to disable; non-zero - to enable. Handling for this is not fully implemented; - we will probably use a bit mask to enable/disable specific - features. - - - - - - EXAMPLES - - The following starts the proxy so that it listens on port - 9000. The default backend target is LOC. 
- - $ yaz-proxy -t z3950.loc.gov:7090 @:9000 - - The LOC target is sometimes very slow. You can connect to - it using yaz-client as follows: - -$ yaz-client localhost:9000/voyager -Connecting...Ok. -Sent initrequest. -Connection accepted by target. -ID : 34 -Name : Voyager LMS - Z39.50 Server -Version: 1.13 -Options: search present -Elapsed: 7.131197 -Z> f computer -Sent searchRequest. -Received SearchResponse. -Search was a success. -Number of hits: 10000 -records returned: 0 -Elapsed: 6.695174 -Z> f computer -Sent searchRequest. -Received SearchResponse. -Search was a success. -Number of hits: 10000 -records returned: 0 -Elapsed: 0.001417 - - In this test, the second search was more than 4000 times faster - than the first. + +
+ +
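The per-minute accounting described for the limit element can be pictured as a sliding window. The Python sketch below is an illustration under assumptions, not the proxy's implementation: the class name `MinuteLimiter`, its methods, and the decision to treat "reached" as greater-than-or-equal are all hypothetical. Timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from collections import deque


class MinuteLimiter:
    """Track bytes and requests seen in the last `window` seconds."""

    def __init__(self, max_bytes, max_pdus, window=60.0):
        self.max_bytes = max_bytes
        self.max_pdus = max_pdus
        self.window = window
        self.events = deque()  # (timestamp, nbytes), oldest first

    def record(self, now, nbytes):
        """Note one request of nbytes bytes handled at time `now`."""
        self.events.append((now, nbytes))

    def should_postpone(self, now):
        """True if either limit has been reached within the window."""
        # Drop events that have fallen out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()
        total_bytes = sum(n for _, n in self.events)
        return total_bytes >= self.max_bytes or len(self.events) >= self.max_pdus


if __name__ == "__main__":
    limiter = MinuteLimiter(max_bytes=524288, max_pdus=40)
    limiter.record(0.0, 600000)
    print(limiter.should_postpone(1.0))   # over the byte limit
    print(limiter.should_postpone(61.0))  # the old event has expired
```

A caller that gets a true result would wait a second or more and ask again, which mirrors the postponement behaviour described in the text.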
+ Configuration: attribute + + The attribute element specifies acceptance or rejection + of a particular attribute type, value pair. + Well-behaving targets will reject unsupported attributes on their + own. This feature is useful for targets that do not gracefully + handle unsupported attributes. + + + Attribute elements may be repeated. The proxy inspects the attribute + specifications in the order in which they appear in the configuration file. + When a given attribute specification matches a given attribute list + in a query, the proxy takes appropriate action (reject, accept). + + + If no attribute specification matches the attribute list in a query, + it is accepted. + + + The attribute element has two required attributes: + type which is the Attribute Type-1 type, and + value which is the Attribute Type-1 value. + The special value/type * matches any attribute + type/value. A value may also be specified as a list with each + value separated by a comma, or as a + range: low value, dash, high value. + + + If attribute error is given, that holds a + Bib-1 diagnostic which is sent to the client if the particular + type, value is part of a query. + + + If attribute error is not given, the attribute + type, value is accepted and passed to the backend target. + + + A target that supports use attributes 1, 4, 1000 through 1003 and + no other use attributes could use the following rules: + + <attribute type="1" value="1,4,1000-1003"/> + <attribute type="1" value="*" error="114"/> + + 
+ +
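The first-match rule evaluation for attribute elements can be sketched in Python as follows. This is a simplified illustration, not the proxy's code: it checks a single (type, value) pair rather than a whole attribute list, and the names `value_matches`, `check_attribute` and `RULES` are hypothetical.

```python
def value_matches(spec, value):
    """Match an integer against "*", a comma list, or low-high ranges."""
    if spec == "*":
        return True
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            if int(low) <= value <= int(high):
                return True
        elif part and int(part) == value:
            return True
    return False


def check_attribute(rules, attr_type, attr_value):
    """Return None if accepted, else the Bib-1 diagnostic of the first
    matching rule. Rules are (type, value spec, error) in file order."""
    for rule_type, rule_value, error in rules:
        type_ok = rule_type == "*" or int(rule_type) == attr_type
        if type_ok and value_matches(rule_value, attr_value):
            return error  # None means "accept and pass to the backend"
    return None  # no rule matched: accepted


# The two rules from the example above.
RULES = [("1", "1,4,1000-1003", None), ("1", "*", 114)]

if __name__ == "__main__":
    print(check_attribute(RULES, 1, 1002))  # supported use attribute
    print(check_attribute(RULES, 1, 7))     # rejected with diagnostic 114
```

Because rules are tried in order, the catch-all `*` rule only applies to values that the earlier, more specific rule did not accept.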
+ Configuration: syntax + + The syntax element specifies acceptance or rejection + of a particular record syntax request from the client. + + + The syntax element has one required attribute: + type which is the Preferred Record Syntax. + + + If attribute error is given, that holds a + Bib-1 diagnostic which is sent to the client if the particular + record syntax is part of a present or search request. + + + If attribute error is not given, the record syntax + is accepted and passed to the backend target. + + + If attribute marcxml is given, the proxy will + perform MARC21 to MARCXML conversion. In this case the + type should be XML. The proxy will use + preferred record syntax USMARC/MARC21 against the backend target. + + To accept USMARC and offer MARCXML XML records but reject + all other requests, the following configuration could be used: + + <proxy> + <target name="mytarget"> + <syntax type="usmarc"/> + <syntax type="xml" marcxml="1"/> + <syntax type="*" error="238"/> + </target> + </proxy> + + 
+ +
+ Configuration: target-timeout + + The element target-timeout is the child of element + target and specifies the amount of time, in seconds, before + a target session is shut down. + + + This can also be specified on the command line by using option + -T. Refer to . + 
+ +
+ Configuration: client-timeout + + The element client-timeout is the child of element + target and specifies the amount of time, in seconds, before + a client session is shut down. + + This can also be specified on the command line by using option + -i. Refer to . + 
+ +
+ Configuration: preinit + + The element preinit is the child of element + target and specifies the number of spare + connections to a target. By default no spare connections are + created by the proxy. If the proxy uses a target exclusively or + heavily, the preinit sessions ensure that target sessions + have been established before a client makes a connection, and therefore + reduce the cost of the connect-init handshake dramatically. Never set this to + more than 5. + 
+ +
+ Configuration: max-clients + + The element max-clients is the child of element + proxy and specifies the total number of + allowed connections to targets (all targets). If this limit + is reached the proxy will close the least recently used connection. + + + Note that many Unix systems impose a limit on the number of + open files allowed in a single process, typically in the + range 256 (Solaris) to 1024 (Linux). + The proxy uses 2 sockets per session plus a few files + for logging. As a rule of thumb, ensure that 2*max-clients + 5 + files can be opened by the proxy process. + + - The YAZ client allows you to set the backend target in - the Initialize Request using option -p. To connect to - Index Data's target you could use: - - yaz-client -p indexdata.dk localhost:9000/gils - + Using the + bash shell, you can set the limit with + ulimit -n no, where no is the new limit. + Use ulimit -a to display limits. + + 
+ +
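The rule of thumb for max-clients can be checked against the current process limit with a short script. The Python sketch below assumes a Unix system (it uses the standard `resource` module); the helper names `descriptors_needed` and `fits_in_limit` are hypothetical, and the figure of 500 clients is only an example.

```python
import resource

LOG_FILE_ALLOWANCE = 5  # the "+ 5" from the rule of thumb


def descriptors_needed(max_clients):
    """Two sockets per session plus a small allowance for log files."""
    return 2 * max_clients + LOG_FILE_ALLOWANCE


def fits_in_limit(max_clients, soft_limit):
    """True if the open-file limit covers the given max-clients value."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return descriptors_needed(max_clients) <= soft_limit


if __name__ == "__main__":
    # Soft limit on open files for this process (what `ulimit -n` reports).
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("needed:", descriptors_needed(500), "limit:", soft)
    print("ok:", fits_in_limit(500, soft))
```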
+ Configuration: log + + The element log is the child of element + proxy and specifies what is to be logged by the + proxy. - + + Specify the log file with command-line option -l. + + + The text of the log element is a sequence of + options separated by white space. See the table below: + Logging options + + + + + + Option + Description + + + + + client-apdu + + Log APDUs as reported by YAZ for the + communication between the client and the proxy. + This facility is equivalent to the APDU logging that + happens when using option -a; however, + this tells the proxy to log in the same file as given + by -l. + + + + server-apdu + + Log APDUs as reported by YAZ for the + communication between the proxy and the server (backend). + + + + client-requests + + Log a brief description of requests transferred between + the client and the proxy. The name of the request and the size + of the APDU are logged. + + + + server-requests + + Log a brief description of requests transferred between + the proxy and the server (backend). The name of the request + and the size of the APDU are logged. + + + + 
+
+ + To log communication in detail between the proxy and the backend, the + following configuration could be used: + + server-apdu server-requests + +]]> + + 
+ +
+
+ Proxy Usage + + + + &yaz-proxy-ref;
+
OtherInformation Encoding + + The proxy uses the OtherInformation definition to carry + information about the target address and cookie. + + + OtherInformation ::= [201] IMPLICIT SEQUENCE OF SEQUENCE{ + category [1] IMPLICIT InfoCategory OPTIONAL, + information CHOICE{ + characterInfo [2] IMPLICIT InternationalString, + binaryInfo [3] IMPLICIT OCTET STRING, + externallyDefinedInfo [4] IMPLICIT EXTERNAL, + oid [5] IMPLICIT OBJECT IDENTIFIER}} +-- + InfoCategory ::= SEQUENCE{ + categoryTypeId [1] IMPLICIT OBJECT IDENTIFIER OPTIONAL, + categoryValue [2] IMPLICIT INTEGER} + + + The categoryTypeId is either + OID 1.2.840.10003.10.1000.81.1 or 1.2.840.10003.10.1000.81.2, + for proxy target and proxy cookie respectively. The + integer element categoryValue is set to 0. + The value of the proxy target or cookie is stored in element + characterInfo of the information + choice. + 
+