X-Git-Url: http://git.indexdata.com/?p=yazpp-moved-to-github.git;a=blobdiff_plain;f=doc%2Fproxy.xml;h=556245410a2c4366e16f7100a49b887549bba276;hp=cf76fe6459be62fc7811f8936026e655461d61df;hb=d84b43231c7c5b0786e9aa62d0f7ca7ecd83bdb5;hpb=a03eabd5ecf775dc8ba57186fc906b4b698772ed diff --git a/doc/proxy.xml b/doc/proxy.xml index cf76fe6..5562454 100644 --- a/doc/proxy.xml +++ b/doc/proxy.xml @@ -1,5 +1,5 @@ - - The YAZ Proxy + + The YAZ Proxy The YAZ proxy is a transparent Z39.50-to-Z39.50 gateway. That is, it is a Z39.50 server which has as its back-end a Z39.50 client @@ -58,7 +58,7 @@ start it up. It will work exactly as usual, but all the packets will be sent via the proxy, which will generate a log like this: - + - +
Specifying the Backend Target @@ -130,7 +131,9 @@ If the InitializeRequest PDU from the - client includes an otherInfo element with OID + client includes an + otherInfo + element with OID 1.2.840.10003.10.1000.81.1, then the contents of that element specify the target to be used, in the usual YAZ address format (typically @@ -143,7 +146,8 @@ Otherwise, the Proxy uses the default target, if one was specified on the command-line with the -t - option. + option. A default target can also be specified in the + XML Config file. @@ -155,33 +159,36 @@
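The target selection described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not code from the proxy. `resolve_target` is a hypothetical name. otherInfo is modelled as a list of (OID, string) pairs rather than decoded ASN.1.

```python
from typing import Optional, Sequence, Tuple

# OID under which the client names the backend target (from the text above).
PROXY_TARGET_OID = "1.2.840.10003.10.1000.81.1"

def resolve_target(other_info: Sequence[Tuple[str, str]],
                   default_target: Optional[str] = None) -> Optional[str]:
    """Pick the backend for a new session (hypothetical helper).

    other_info is a simplified view of the InitializeRequest otherInfo:
    a list of (categoryTypeId OID, characterInfo string) pairs.
    """
    for oid, value in other_info:
        if oid == PROXY_TARGET_OID:
            return value  # an explicit target from the client wins
    # Fall back to the default from -t (or the XML config); None if unset.
    return default_target
```

For example, `resolve_target([], "fallback:210")` returns `"fallback:210"`.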
- Keep-alive Facility for Stateless Clients + Keep-alive Facility + + The keep-alive is a facility where the proxy keeps the connection to the + backend open, even if the client closes the connection to the proxy. + - Stateless clients such as web gateways may generate a cookie for a Z39.50 - session which is sent to the proxy as part of PDUs. - In this case, the proxy will keep alive its Z39.50 session - to the backend target even when the connection from the client - to the proxy is closed. When the client contacts the - proxy again, and re-issues the same cookie, the proxy reuses the - Z39.50 connection with the backend target. + If the same or a different client connects to the proxy again and requests the + same backend, it will be reassigned to this backend. In this case, the + proxy sends an initialize response directly to the client, and the + initialize handshake with the backend is omitted. - There is no - guarantee that the Z39.50 connection to the backend - target is kept forever: the proxy will shut it down after certain - idle time. - - So in effect, the connection from the client's - point of view should be considered stateless, and the keep-alive - facility should be treated only as a performance booster. + When a client reconnects, query and record caching work better if the + proxy assigns it to the same backend as before, because the result set + (if any) can then be re-used. To achieve this, Index Data defined a session + cookie which identifies the backend session. - Cookies may be passed in an otherInfo element - with OID 1.2.840.10003.10.1000.81.2. + The cookie is defined by the client, is sent as part of the + Initialize Request, and is passed in an + otherInfo + element with OID 1.2.840.10003.10.1000.81.2. + + + Clients that do not send a cookie as part of the initialize request + may still get better performance, since the init handshake is saved.
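The sketch below shows one possible form of the reassignment logic. It assumes that parked sessions are kept in a simple list, and that a matching cookie is preferred over a matching target alone. `KeepAlivePool` and `BackendSession` are illustrative names. The sketch also omits how the real proxy shuts idle sessions down according to the keepalive limits.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(eq=False)  # compare sessions by identity, not by field values
class BackendSession:
    """Hypothetical stand-in for one proxy-to-backend Z39.50 connection."""
    target: str
    cookie: Optional[str] = None

class KeepAlivePool:
    """Parks backend sessions whose client went away (illustrative only)."""

    def __init__(self) -> None:
        self.idle: List[BackendSession] = []
        self.handshakes = 0  # init handshakes actually sent to backends

    def release(self, session: BackendSession) -> None:
        # The client closed its connection; keep the backend connection open.
        self.idle.append(session)

    def acquire(self, target: str, cookie: Optional[str] = None) -> BackendSession:
        # 1. Same target and same cookie: result sets and caches can be re-used.
        if cookie is not None:
            for s in self.idle:
                if s.target == target and s.cookie == cookie:
                    self.idle.remove(s)
                    return s
        # 2. Same target, any cookie: at least the init handshake is saved.
        for s in self.idle:
            if s.target == target:
                self.idle.remove(s)
                s.cookie = cookie
                return s
        # 3. Nothing parked for this target: open a new session.
        self.handshakes += 1
        return BackendSession(target, cookie)
```

The design separates the two benefits named in the text. A cookie match preserves cached state. A target-only match still avoids the backend init handshake.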
- -
+ +
Query Caching Simple stateless clients often send identical Z39.50 searches @@ -196,167 +203,468 @@ backend target, so that if an identical query is received next, it is turned into Present Requests rather than new Search Requests. - + + + In a future release we will probably allow for + an arbitrary-sized cache for targets supporting named result sets. + + - This optimization should work for any Z39.50 client and/or - target. The target does not have to support named result sets. + You can enable or disable query caching using option -o.
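The idea behind query caching can be illustrated with a small sketch. For illustration only, it assumes that the proxy remembers just the last query string and hit count per backend session. `QueryCache` and the `backend_search` callback are hypothetical names, and PQF strings stand in for decoded Type-1 queries.

```python
from typing import Callable, Optional, Tuple

class QueryCache:
    """Remembers the last search on one backend session (hypothetical sketch)."""

    def __init__(self) -> None:
        self.last_query: Optional[str] = None
        self.last_hits: Optional[int] = None

    def handle_search(self, query: str,
                      backend_search: Callable[[str], int]) -> Tuple[str, int]:
        if query == self.last_query:
            # Identical query: reuse the cached hit count; records are then
            # fetched with Present Requests on the existing result set.
            return ("present", self.last_hits)
        hits = backend_search(query)  # a real Search Request to the backend
        self.last_query, self.last_hits = query, hits
        return ("search", hits)
```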
- -
+ +
+ Record Caching + + As an option, the proxy may also cache result set records for the + last search. + The proxy takes into account the Record Syntax and CompSpec. + The CompSpec includes simple element set names as well. + By default the cache is 200000 bytes per session. + +
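One possible shape for such a record cache is sketched below. The key (position, record syntax, CompSpec) and the 200000-byte default follow the text. The oldest-first eviction policy and all names are assumptions made for the example.

```python
from collections import OrderedDict
from typing import Optional, Tuple

Key = Tuple[int, str, str]  # (position, record syntax, CompSpec/element set)

class RecordCache:
    """Per-session record cache bounded in bytes (illustrative sketch)."""

    def __init__(self, max_bytes: int = 200000) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self.records: "OrderedDict[Key, bytes]" = OrderedDict()

    def clear(self) -> None:
        # A new search invalidates the records of the previous one.
        self.records.clear()
        self.used = 0

    def put(self, position: int, syntax: str, comp_spec: str, data: bytes) -> None:
        if len(data) > self.max_bytes:
            return  # never cache a record larger than the whole budget
        key = (position, syntax, comp_spec)
        if key in self.records:
            self.used -= len(self.records.pop(key))
        while self.used + len(data) > self.max_bytes:
            _, old = self.records.popitem(last=False)  # evict the oldest entry
            self.used -= len(old)
        self.records[key] = data
        self.used += len(data)

    def get(self, position: int, syntax: str, comp_spec: str) -> Optional[bytes]:
        # A record cached in another syntax or CompSpec does not match.
        return self.records.get((position, syntax, comp_spec))
```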
+ +
+ Query Validation + + The Proxy may also be configured to trap particular attributes in + Type-1 queries and send Bib-1 diagnostics back to the client without + even consulting the backend target. This facility may be useful if + a target does not properly issue diagnostics when unsupported attributes + are sent to it.
+ +
+ Record Syntax Validation + + The proxy may be configured to accept, reject or convert records. + When a record syntax is accepted, the proxy passes search/present requests to the + backend target under the assumption that the target can honor the + request (in fact it may not). When a request is rejected because + the record syntax is "unsupported", the proxy returns a diagnostic to the + client. Finally, the proxy may convert records. + + + In the current version the only supported conversion is + MARC21/USMARC in MARC-8 charset to MARCXML in UTF-8. Future versions of + the proxy may do other record/charset conversions.
+ +
Other Optimizations - We've had some plans to support caching of result set records, + We've had some plans to support global caching of result set records, but this has not yet been implemented.
-
- Proxy Usage +
+ Proxy Configuration File + The proxy may optionally read a configuration file, given with option + -c followed by the filename of the config file. + + + The config file is in XML format. The YAZ proxy must be compiled + with libxml2 and + libXSLT support in + order for the config file facility to be enabled. - - - yaz-proxy - 8 - - - yaz-proxy - The YAZ toolkit's transparent Z39.50 proxy - - - - yaz-proxy - -a filename - -c num - -v level - -t target - -u auth - -o level - host:port - - - - DESCRIPTION - - The proxy runs stand-alone (not from - inetd). The - host:port - argument specifies host address to listen to, and the port to - listen on. Use the host @ - to listen for connections coming from any address. - - - OPTIONS - - -a filename - - Specifies the name of a file to which to write a log of the - APDUs (protocol packets) that pass through the proxy. The - special filename - may be used to indicate - standard output. - - - -c num - - Specifies the maximum number of connections to be cached - [default 50]. - - - -v level - - Sets the logging level. level is - a comma-separated list of members of the set - {fatal,debug,warn,log,malloc,all,none}. - - - -t target - - Specifies the default backend target to use when a client - connects that does not explicitly specify a target in its - initRequest. - - - -u auth - - Specifies authentication info to be sent to the backend target. - This is useful if you happen to have an internal target that - requires authentication, or if the client software does not allow - you to set it. - - - -o level - - Sets level for optimization. Use zero to disable; non-zero - to enable. Handling for this is not fully implemented; - we will probably use a bit mask to enable/disable specific - features.
- - - - - - EXAMPLES - - The following command starts the proxy, listening on port - 9000, with its default backend target set to the Library of - Congress bibliographic server: - + + To check whether a config file is well-formed, yaz-proxy may + be invoked without specifying a listening port, like this: - $ yaz-proxy -t z3950.loc.gov:7090 @:9000 + yaz-proxy -c myconfig.xml + If this does not produce errors, the file is well-formed. + +
+ Proxy Configuration Header + + The proxy config file must have a root element called + proxy. All information except an optional XML + header must be stored within the proxy element. + + + <?xml version="1.0"?> + <proxy> + <!-- content here .. --> + </proxy> + +
+
+ Configuration: target + + The element target, which may be repeated zero + or more times with parent element proxy, contains + information about each backend target. + The target element has two attributes: + name, which holds the logical name of the backend + target (required), and default (optional), which + (when given) specifies that the backend target is the default target, + equivalent to command line option -t. + + + + <?xml version="1.0"?> + <proxy> + <target name="server1" default="1"> + <!-- description of server1 .. --> + </target> + <target name="server2"> + <!-- description of server2 .. --> + </target> + </proxy> + +
+
+ Configuration: url + + The url element, which may be repeated one or more times, + should be a child of the target element. + The CDATA of url is the Z-URL of the backend. + + + Multiple url elements may be used. In that case, when + a client initiates a session, the proxy chooses the URL with the lowest + number of active sessions, thereby distributing the load. It is + assumed that each URL represents the same database (data).
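The URL selection rule can be sketched as a minimum over active session counts. The dictionary of counts and the function names are hypothetical. The text does not say how ties are broken, so this sketch simply picks the URL listed first.

```python
from typing import Dict

def choose_url(active: Dict[str, int]) -> str:
    """Return the backend URL with the fewest active sessions.

    min() keeps the first minimum it sees, so ties go to the URL listed first.
    """
    return min(active, key=lambda url: active[url])

def open_session(active: Dict[str, int]) -> str:
    url = choose_url(active)
    active[url] += 1  # the new client session now counts against this URL
    return url
```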
+
+ Configuration: keepalive + The keepalive element holds information about + the keepalive Z39.50 sessions. Keepalive sessions are proxy-to-backend + sessions that are no longer associated with a client session. + + The keepalive element, which is a child of + the target element, holds two elements: + bandwidth and pdu. + The bandwidth is the maximum total number of bytes + transferred to/from the target. If a target session exceeds this + limit, it is shut down (and no longer kept alive). + The pdu is the maximum number of requests sent + to the target. If a target session exceeds this limit, it is + shut down. The idea of these two limits is to avoid very long + sessions that consume resources in a backend (for example, a backend that leaks). + + + The following sets the maximum number of bytes transferred in a + target session to 1 MB and the maximum number of requests to 400. + + <keepalive> + <bandwidth>1048576</bandwidth> + <pdu>400</pdu> + </keepalive> + +
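The sketch below reads the two keepalive limits from a target element, using Python's standard `xml.etree.ElementTree`, and then applies them. The helper names are hypothetical. Following the wording above, a session counts as over a limit only when it exceeds the configured value.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

def read_keepalive(target_xml: str) -> Tuple[int, int]:
    """Return (bandwidth, pdu) from the <keepalive> child of a <target>."""
    ka = ET.fromstring(target_xml).find("keepalive")
    return int(ka.findtext("bandwidth")), int(ka.findtext("pdu"))

def may_keep_alive(total_bytes: int, total_pdus: int,
                   limits: Tuple[int, int]) -> bool:
    bandwidth, pdu = limits
    # A session that has exceeded either limit is shut down instead of parked.
    return total_bytes <= bandwidth and total_pdus <= pdu
```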
+
+ Configuration: limit + + The limit section specifies bandwidth and PDU request + limits for an active session. + The proxy measures bandwidth and PDU requests over the last 60 seconds + (1 minute). The limit may include the + elements bandwidth, pdu, + and retrieve. The bandwidth + measures the number of bytes transferred within the last minute. + The pdu is the number of requests in the last + minute. The retrieve holds the maximum number of records to + be retrieved in one Present Request. + + + If a bandwidth/pdu limit is reached, the proxy will postpone the + requests to the target and wait one or more seconds. The idea of the + limit is to ensure that clients that download hundreds or thousands of + records do not hurt other users. + + + The following sets the maximum number of bytes transferred per minute to + 500 Kbytes and the maximum number of requests to 40. + + <limit> + <bandwidth>524288</bandwidth> + <pdu>40</pdu> + </limit> + + + - The LOC target is sometimes very slow. You can connect to - it using yaz-client as follows: + Typically the keepalive limits are much higher than + the per-minute limits for an active session.
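The one-minute accounting can be sketched as a sliding window. This is an illustration, not the proxy's algorithm. `MinuteLimiter` is a hypothetical name, and the caller passes timestamps in as plain seconds. `must_wait` only reports that a request should be postponed; it does not sleep.

```python
from collections import deque
from typing import Deque, Tuple

class MinuteLimiter:
    """Sliding 60-second window over requests (hypothetical sketch)."""
    WINDOW = 60.0

    def __init__(self, max_bandwidth: int = 524288, max_pdu: int = 40) -> None:
        self.max_bandwidth = max_bandwidth
        self.max_pdu = max_pdu
        self.events: Deque[Tuple[float, int]] = deque()  # (time, bytes)

    def _expire(self, now: float) -> None:
        # Drop requests that are older than one minute.
        while self.events and now - self.events[0][0] >= self.WINDOW:
            self.events.popleft()

    def record(self, now: float, nbytes: int) -> None:
        self.events.append((now, nbytes))

    def must_wait(self, now: float) -> bool:
        """True if a new request at time `now` should be postponed."""
        self._expire(now)
        used = sum(b for _, b in self.events)
        return used >= self.max_bandwidth or len(self.events) >= self.max_pdu
```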
+ +
+ Configuration: attribute + + The attribute element specifies acceptance or rejection + of a particular attribute type, value pair. + Well-behaving targets will reject unsupported attributes on their + own. This feature is useful for targets that do not gracefully + handle unsupported attributes. + + + Attribute elements may be repeated. The proxy inspects the attribute + specifications in the order in which they appear in the configuration file. + When a given attribute specification matches a given attribute list + in a query, the proxy takes the appropriate action (reject or accept). + + + If no attribute specification matches the attribute list in a query, + it is accepted. + + + The attribute element has two required attributes: + type, which is the Attribute Type-1 type, and + value, which is the Attribute Type-1 value. + The special value/type * matches any attribute + type/value. A value may also be specified as a comma-separated + list of values, or as a range + given as low value, dash, high value (for example 1000-1003). + + + If attribute error is given, it holds a + Bib-1 diagnostic which is sent to the client if the particular + type, value pair is part of a query. + + + If attribute error is not given, the attribute + type, value pair is accepted and passed to the backend target. + + + A target that supports use attributes 1, 4 and 1000 through 1003 and + no other use attributes could use the following rules: - $ yaz-client localhost:9000/voyager - Connecting...Ok. - Sent initrequest. - Connection accepted by target. - ID : 34 - Name : Voyager LMS - Z39.50 Server - Version: 1.13 - Options: search present - Elapsed: 7.131197 - Z> f computer - Sent searchRequest. - Received SearchResponse. - Search was a success. - Number of hits: 10000 - records returned: 0 - Elapsed: 6.695174 - Z> f computer - Sent searchRequest. - Received SearchResponse. - Search was a success.
- Number of hits: 10000 - records returned: 0 - Elapsed: 0.001417 + <attribute type="1" value="1,4,1000-1003"/> + <attribute type="1" value="*" error="114"/> - - In this test, the second search was more than 4000 times faster - than the first, because the proxy cached the result of the first - search and noticed that the second was the same. +
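The sketch below covers the matching rules described above:

- rules are evaluated in file order;
- `*` is a wildcard;
- values can be comma-separated lists or low-high ranges;
- an attribute is accepted when nothing matches.

The function names are hypothetical. The sketch checks one type, value pair at a time instead of walking a real Type-1 query.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

Rule = Tuple[str, str, Optional[int]]  # (type, value spec, error or None)

def rules_from_xml(target_xml: str) -> List[Rule]:
    """Collect <attribute> rules from a <target> element, in file order."""
    rules = []
    for a in ET.fromstring(target_xml).findall("attribute"):
        err = a.get("error")
        rules.append((a.get("type"), a.get("value"),
                      int(err) if err is not None else None))
    return rules

def value_matches(spec: str, value: int) -> bool:
    if spec == "*":
        return True
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:  # range such as 1000-1003
            low, high = part.split("-", 1)
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False

def check_attribute(rules: List[Rule], attr_type: int,
                    attr_value: int) -> Optional[int]:
    """Return a Bib-1 diagnostic to send, or None to pass the attribute on."""
    for rule_type, rule_value, error in rules:
        if (rule_type == "*" or int(rule_type) == attr_type) and \
                value_matches(rule_value, attr_value):
            return error  # first matching rule decides; None means accept
    return None  # no rule matched: accept
```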
+ +
+ Configuration: syntax + + The syntax element specifies acceptance or rejection + of a particular record syntax request from the client. + + + The syntax element has one required attribute: + type, which is the Preferred Record Syntax. + + + If attribute error is given, it holds a + Bib-1 diagnostic which is sent to the client if the particular + record syntax is part of a present or search request. + + + If attribute error is not given, the record syntax + is accepted and passed to the backend target. + + + If attribute marcxml is given, the proxy will + perform MARC21 to MARCXML conversion. In this case the + type should be XML. The proxy will use + preferred record syntax USMARC/MARC21 against the backend target. + + To accept USMARC and offer MARCXML records, but reject + all other requests, the following configuration could be used: + + <proxy> + <target name="mytarget"> + <syntax type="usmarc"/> + <syntax type="xml" marcxml="1"/> + <syntax type="*" error="238"/> + </target> + </proxy> + +
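The sketch below shows the syntax decision, reading the rules from the example configuration with `xml.etree.ElementTree`. Here record syntaxes are compared as case-insensitive names rather than OIDs. The text does not specify what happens when no rule matches, so the pass-through shown here is an assumption. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple, Union

def syntax_rules(target_xml: str) -> List[Dict[str, str]]:
    """Collect the attributes of each <syntax> child, in file order."""
    return [dict(s.attrib) for s in ET.fromstring(target_xml).findall("syntax")]

def check_syntax(rules: List[Dict[str, str]],
                 requested: str) -> Tuple[str, Union[str, int]]:
    req = requested.lower()
    for rule in rules:
        if rule["type"] == "*" or rule["type"].lower() == req:
            if "error" in rule:
                return ("reject", int(rule["error"]))  # Bib-1 diagnostic
            if rule.get("marcxml"):
                # Client gets MARCXML; the backend is asked for USMARC.
                return ("convert", "usmarc")
            return ("accept", requested)
    # Assumption: syntaxes matched by no rule are passed through unchanged.
    return ("accept", requested)
```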
+ +
+ Configuration: target-timeout + + The element target-timeout is a child of element + target and specifies the number of seconds before + a target session is shut down. + + + This can also be specified on the command line by using option + -T. Refer to .
+ +
+ Configuration: client-timeout + + The element client-timeout is a child of element + target and specifies the number of seconds before + a client session is shut down. + + This can also be specified on the command line by using option + -i. Refer to .
+ +
+ Configuration: preinit + + The element preinit is a child of element + target and specifies the number of spare + connections to a target. By default no spare connections are + created by the proxy. If the proxy uses a target exclusively or + heavily, the preinit sessions ensure that target sessions + have been established before the client connects, and will therefore + dramatically reduce the connect-init handshake time. Never set this to + more than 5.
+ +
+ Configuration: max-clients + + The element max-clients is a child of element + proxy and specifies the total number of + allowed connections to targets (all targets). If this limit + is reached, the proxy will close the least recently used connection. + + + Note that many Unix systems impose a limit on the number of + open files allowed in a single process, typically in the + range 256 (Solaris) to 1024 (Linux). + The proxy uses 2 sockets per session plus a few files + for logging. As a rule of thumb, ensure that 2*max-clients + 5 files + can be opened by the proxy process. + + - The YAZ command-line client, - yaz-client, - allows you to set the backend target in - the initRequest using the - -p option. For example, to connect to - Index Data's target you could use: + Using the + bash shell, you can set the limit with + ulimit -n followed by the new limit. + Use ulimit -a to display limits. - - yaz-client -p indexdata.dk localhost:9000/gils
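The rule of thumb translates into a small calculation. The sketch assumes a Unix system, where Python's standard `resource` module exposes the open-files limit via `getrlimit(RLIMIT_NOFILE)`. The function names are hypothetical.

```python
import resource  # Unix only

def fds_needed(max_clients: int) -> int:
    # Rule of thumb from the text: 2 sockets per session plus a few log files.
    return 2 * max_clients + 5

def max_clients_allowed(soft_limit: int) -> int:
    # Largest max-clients value whose descriptor need fits in soft_limit.
    return (soft_limit - 5) // 2

def check_current_process(max_clients: int) -> bool:
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or fds_needed(max_clients) <= soft
```

For example, a soft limit of 1024 allows at most 509 sessions, and a limit of 256 allows at most 125.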
+ +
+ Configuration: log + + The element log is a child of element + proxy and specifies what is to be logged by the + proxy. + + + Specify the log file with command-line option -l. + + + The text of the log element is a sequence of + options separated by white space. See the table below: + Logging options + + + + + + Option + Description + + + + + client-apdu + + Log APDUs as reported by YAZ for the + communication between the client and the proxy. + This facility is equivalent to the APDU logging that + happens when using option -a; however, + this tells the proxy to log to the same file as given + by -l. + + + + server-apdu + + Log APDUs as reported by YAZ for the + communication between the proxy and the server (backend). + + + + clients-requests + + Log a brief description of requests transferred between + the client and the proxy. The name of the request and the size + of the APDU are logged. + + + + server-requests + + Log a brief description of requests transferred between + the proxy and the server (backend). The name of the request + and the size of the APDU are logged. + + + +
+
+ To log communication in detail between the proxy and the backend, the + following configuration could be used: + <log>server-apdu server-requests</log> -
+ +
+
+ Proxy Usage + + + + &yaz-proxy-ref;
+
+ OtherInformation Encoding + + The proxy uses the OtherInformation definition to carry + information about the target address and cookie. + + + OtherInformation ::= [201] IMPLICIT SEQUENCE OF SEQUENCE{ + category [1] IMPLICIT InfoCategory OPTIONAL, + information CHOICE{ + characterInfo [2] IMPLICIT InternationalString, + binaryInfo [3] IMPLICIT OCTET STRING, + externallyDefinedInfo [4] IMPLICIT EXTERNAL, + oid [5] IMPLICIT OBJECT IDENTIFIER}} +-- + InfoCategory ::= SEQUENCE{ + categoryTypeId [1] IMPLICIT OBJECT IDENTIFIER OPTIONAL, + categoryValue [2] IMPLICIT INTEGER} + + + The categoryTypeId is either + OID 1.2.840.10003.10.1000.81.1 or 1.2.840.10003.10.1000.81.2, + for proxy target and proxy cookie respectively. The + integer element categoryValue is set to 0. + The target or cookie value is stored in element + characterInfo of the information + choice. +
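To make the encoding concrete, the sketch below does two things. It builds a plain-Python view of the two otherInfo entries. It also computes the BER content octets of their categoryTypeId OIDs, with tag and length omitted. The dictionary layout and function names are hypothetical; a real client would let its Z39.50 toolkit do the encoding.

```python
from typing import Dict, List, Optional

PROXY_TARGET_OID = "1.2.840.10003.10.1000.81.1"
PROXY_COOKIE_OID = "1.2.840.10003.10.1000.81.2"

def encode_oid(dotted: str) -> bytes:
    """BER content octets of an OBJECT IDENTIFIER (tag and length not included)."""
    arcs = [int(x) for x in dotted.split(".")]
    # The first two arcs share one sub-identifier: 40 * X + Y.
    subids = [40 * arcs[0] + arcs[1]] + arcs[2:]
    out = bytearray()
    for n in subids:
        chunk = [n & 0x7F]  # last byte: continuation bit clear
        n >>= 7
        while n:
            chunk.append(0x80 | (n & 0x7F))  # earlier bytes: continuation bit set
            n >>= 7
        out.extend(reversed(chunk))
    return bytes(out)

def proxy_other_info(target: Optional[str] = None,
                     cookie: Optional[str] = None) -> List[Dict]:
    """Plain-Python view of the otherInfo entries the proxy reads."""
    entries = []
    for oid, value in ((PROXY_TARGET_OID, target), (PROXY_COOKIE_OID, cookie)):
        if value is not None:
            entries.append({
                "category": {"categoryTypeId": oid, "categoryValue": 0},
                "information": ("characterInfo", value),
            })
    return entries
```

For the target OID, `encode_oid` yields the bytes `2a 86 48 ce 13 0a 87 68 51 01`. The first five bytes are the familiar 1.2.840.10003 Z39.50 prefix.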
+