Proxy Reference
Operating Environment
The YAZ proxy is a console program. After startup it spawns
a child process (except on Windows or if option -X is given).
The child process is the core of the proxy and it handles all
communication with clients and servers. The parent process
will restart the child process if it dies unexpectedly and report
the reason. For the YAZ proxy options,
see Proxy Usage (man page).
As an option, the proxy may change user identity to a less privileged
user.
Choosing the Backend Server
When the proxy receives a Z39.50 Initialize Request from a Z39.50
client, it determines the backend server by the following rules:
1. If the InitializeRequest PDU from the client includes an otherInfo
   element with OID 1.2.840.10003.10.1000.81.1, then the contents of
   that element specify the server to be used, in the usual YAZ address
   format (typically tcp:hostname:port) as described in the Addresses
   section of the YAZ manual.
2. Otherwise, the proxy uses the default server, if one was specified
   in the proxy configuration file. See Proxy Configuration File.
3. Otherwise, the proxy uses the default server, if one was specified
   on the command line with the -t option.
4. Otherwise, the proxy closes the connection with the client.
If the proxy receives an SRW/SRU request, the following rules are used:
1. If the default target has Explain information with a database that
   matches the path of the SRW/SRU HTTP request, that backend server is
   used for the SRW/SRU operation.
2. Otherwise, the service returns HTTP 404 (Not Found).
Checking for Explain information only in the default target is a known
limitation: it means that only one SRW/SRU server can be offered.
We expect to improve that in the next version of the YAZ proxy.
Keep-alive Facility
Keep-alive is a facility where the proxy keeps the connection to the
backend server open, even if the client closes its connection to the proxy.
If the same or another client later connects to the proxy and requests the
same backend, it is assigned to this existing backend session. In this case,
the proxy sends an Initialize Response directly to the client and the
initialize handshake with the backend is omitted.
When a client reconnects, query and record caching work better if the
proxy assigns it to the same backend session as before, because the result
set (if any) can be reused. To achieve this, Index Data defined a session
cookie which identifies the backend session.
The cookie is defined by the client and is sent as part of the
Initialize Request and passed in an
otherInfo
element with OID 1.2.840.10003.10.1000.81.2.
Clients that do not send a cookie as part of the Initialize Request
may still see better performance, since the init handshake is saved.
Refer to the keepalive section for how to set up
configuration parameters for keepalive.
Query Caching
Simple stateless clients often send identical Z39.50 searches
in a relatively short period of time (e.g. in order to produce a
results-list page, the next page,
a single full-record, etc). And for many targets, it's
much more expensive to produce a new result set than to
reuse an existing one.
The proxy tries to solve that by remembering the last query for each
backend target, so that if an identical query is received next, it
is turned into Present Requests rather than new Search Requests.
A future release will probably allow for
an arbitrary-sized cache for targets supporting named result sets.
You can enable/disable query caching using option -o.
Record Caching
As an option, the proxy may also cache result set records for the
last search.
The proxy takes into account the Record Syntax and CompSpec.
The CompSpec includes simple element set names as well.
By default the cache is 200000 bytes per session.
Query Validation
The Proxy may also be configured to trap particular attributes in
Type-1 queries and send Bib-1 diagnostics back to the client without
even consulting the backend target. This facility may be useful if
a target does not properly issue diagnostics when unsupported attributes
are sent to it.
Record Syntax Validation
The proxy may be configured to accept, reject or convert records.
When a record syntax is accepted, the proxy passes search/present requests
to the backend target under the assumption that the target can honor the
request (in fact, it may not). When a record is rejected because
the record syntax is "unsupported", the proxy returns a diagnostic to the
client. Finally, the proxy may convert records.
The proxy can convert from MARC to MARCXML and thereby offer an
XML version of any MARC record as long as it is ISO2709 encoded.
If the proxy is compiled with libXSLT support it can also
perform XSLT on XML.
Other Optimizations
We've had some plans to support global caching of result set records,
but this has not yet been implemented.
Proxy Configuration File
The Proxy may read a configuration file using option
-c followed by the filename of a config file.
The config file is XML based. The YAZ proxy must be compiled
with libxml2 and
libXSLT support in
order for the config file facility to be enabled.
See for an XML schema
for the configuration.
To check that a config file is well-formed, yazproxy may
be invoked without specifying a listening port, i.e.
yazproxy -c myconfig.xml
If this does not produce errors, the file is well-formed.
Proxy Configuration Header
The proxy config file must have a root element called
proxy and scoped within namespace
xmlns="http://indexdata.dk/yazproxy/schema/0.8/".
All information except an optional XML header must be stored
within the proxy element.
<?xml version="1.0"?>
<proxy xmlns="http://indexdata.dk/yazproxy/schema/0.8/">
<!-- content here .. -->
</proxy>
target
The element target, which may be repeated zero
or more times under the parent element proxy, contains
information about each backend target.
The target element has two attributes:
name (required), which holds the logical name of the backend
target, and default (optional), which
(when given) specifies that the backend target is the default target,
equivalent to command line option -t.
<?xml version="1.0"?>
<proxy xmlns="http://indexdata.dk/yazproxy/schema/0.8/">
  <target name="server1" default="1">
    <!-- description of server1 .. -->
  </target>
  <target name="server2">
    <!-- description of server2 .. -->
  </target>
</proxy>
url
The url element, which may be repeated one or more times,
should be the child of the target element.
The CDATA of url is the Z-URL of the backend.
Multiple url elements may be used. In that case, when
a client initiates a session, the proxy chooses the URL with the lowest
number of active sessions, thereby distributing the load. It is
assumed that each URL represents the same database (data).
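As an illustration, a target that spreads sessions over two backends might look like the sketch below. The target name and both host names are hypothetical, and the Z-URLs are assumed to use the plain host:port form.

```xml
<target name="server1" default="1">
  <!-- both backends are assumed to serve the same database -->
  <url>backend1.example.org:210</url>
  <url>backend2.example.org:210</url>
</target>
```

With such a configuration, a new client session would be assigned to whichever of the two URLs currently has fewer active sessions.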
target-timeout
The element target-timeout is the child of element
target and specifies the time in seconds before
a target session is shut down.
This can also be specified on the command line by using option
-T. Refer to OPTIONS in Proxy Usage (man page).
client-timeout
The element client-timeout is the child of element
target and specifies the time in seconds before
a client session is shut down.
This can also be specified on the command line by using option
-i. Refer to OPTIONS in Proxy Usage (man page).
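The two timeouts could be set together for one target, roughly as sketched below. The target name, host name and both values are made up for illustration, and the order of the child elements has not been checked against the schema.

```xml
<target name="server1">
  <url>backend1.example.org:210</url>
  <!-- target session is shut down after 30 seconds (example value) -->
  <target-timeout>30</target-timeout>
  <!-- client session is shut down after 60 seconds (example value) -->
  <client-timeout>60</client-timeout>
</target>
```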
keepalive
The keepalive element holds information about
the keepalive Z39.50 sessions. Keepalive sessions are proxy-to-backend
sessions that are no longer associated with a client session.
The keepalive element, which is the child of
the target element, holds two elements:
bandwidth and pdu.
The bandwidth is the maximum total number of bytes
transferred to/from the target. If a target session exceeds this
limit, it is shut down (and no longer kept alive).
The pdu is the maximum number of requests sent
to the target. If a target session exceeds this limit, it is
shut down. The idea of these two limits is to avoid very long
sessions that use resources in a backend (that leaks!).
The following sets the maximum number of bytes transferred in a
target session to 1 MB and the maximum number of requests to 400.
<keepalive>
  <bandwidth>1048576</bandwidth>
  <pdu>400</pdu>
</keepalive>
limit
The limit section specifies bandwidth/pdu request
limits for an active session.
The proxy records bandwidth/pdu requests during the last 60 seconds
(1 minute). The limit may include the
elements bandwidth, pdu,
and retrieve. The bandwidth
measures the number of bytes transferred within the last minute.
The pdu is the number of requests in the last
minute. The retrieve holds the maximum number of records to
be retrieved in one Present Request.
If a bandwidth/pdu limit is reached, the proxy will postpone the
requests to the target and wait one or more seconds. The idea of the
limit is to ensure that clients that download hundreds or thousands of
records do not hurt other users.
The following sets the maximum number of bytes transferred per minute to
512 KB (524288 bytes) and the maximum number of records retrieved in one
Present Request to 40.
<limit>
  <bandwidth>524288</bandwidth>
  <retrieve>40</retrieve>
</limit>
Typically the limits for keepalive are much higher than
the per-minute limits for an active session.
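A limit section that uses all three elements might look like the following sketch. The numbers are arbitrary example values, not recommendations.

```xml
<limit>
  <!-- at most 524288 bytes transferred within the last minute -->
  <bandwidth>524288</bandwidth>
  <!-- at most 40 requests within the last minute -->
  <pdu>40</pdu>
  <!-- at most 50 records retrieved in one Present Request -->
  <retrieve>50</retrieve>
</limit>
```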
attribute
The attribute element specifies whether to accept or reject
a particular attribute type/value pair.
Well-behaving targets will reject unsupported attributes on their
own. This feature is useful for targets that do not gracefully
handle unsupported attributes.
Attribute elements may be repeated. The proxy inspects the attribute
specifications in the order specified in the configuration file.
When a given attribute specification matches a given attribute list
in a query, the proxy takes appropriate action (reject, accept).
If no attribute specification matches the attribute list in a query,
it is accepted.
The attribute element has two required attributes:
type which is the Attribute Type-1 type, and
value which is the Attribute Type-1 value.
The special value/type * matches any attribute
type/value. A value may also be specified as a list with each
value separated by a comma, or as a range:
low value, dash, high value.
If attribute error is given, that holds a
Bib-1 diagnostic which is sent to the client if the particular
type, value is part of a query.
If attribute error is not given, the attribute
type, value is accepted and passed to the backend target.
A target that supports use attributes 1, 4, and 1000 through 1003, and
no other use attributes, could use the following rules:
<attribute type="1" value="1,4,1000-1003"/>
<attribute type="1" value="*" error="114"/>
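The same pattern might be applied to other attribute types. The sketch below assumes a hypothetical target that supports only relation attributes (type 2) with values 1 through 5. The choice of Bib-1 diagnostic 117 (unsupported relation attribute) is an illustration and is not taken from the text above.

```xml
<!-- accept relation attribute values 1 through 5 -->
<attribute type="2" value="1-5"/>
<!-- reject any other relation attribute with Bib-1 diagnostic 117 -->
<attribute type="2" value="*" error="117"/>
```

Because the specifications are inspected in file order, the accept rule has to come before the catch-all reject rule.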
syntax
The syntax element specifies whether to accept or reject
a particular record syntax request from the client.
The syntax element has one required attribute:
type which is the Preferred Record Syntax.
If attribute error is given, that holds a
Bib-1 diagnostic which is sent to the client if the particular
record syntax is part of a Present or Search Request.
If attribute error is not given, the record syntax
is accepted and passed to the backend target.
If attribute marcxml is given, the proxy will
perform MARC21 to MARCXML conversion. In this case the
type should be XML. The proxy will use
preferred record syntax USMARC/MARC21 or backendtype
(if given) against the backend target.
If attribute backendtype is given, that holds the
record syntax to be transmitted to backend.
If attribute stylesheet is given, the proxy
will convert the XML record from the server via XSLT. It is important
that the content from the server is XML. If used in conjunction with
attribute marcxml, the MARC to MARCXML conversion
takes place before the XSLT conversion.
If attribute identifier is given that is the
SRW/SRU record schema identifier for the resulting output record (after
MARCXML and/or XSLT conversion).
If sub element title is given (as child element
of syntax), then that is the official SRW/SRU
name of the resulting record schema.
If sub element name is given that is an alias
for the record schema identifier. Multiple names
may be specified.
MARCXML conversion
To accept USMARC and offer MARCXML plus Dublin Core (via
XSLT conversion), the following configuration could be used:
<proxy>
  <target name="mytarget">
    ..
    <syntax type="usmarc"/>
    <syntax type="xml" marcxml="1"
            identifier="info:srw/schema/1/marcxml-v1.1">
      <title>MARCXML</title>
      <name>marcxml</name>
    </syntax>
    <syntax type="xml" marcxml="1" stylesheet="MARC21slim2SRWDC.xsl"
            identifier="info:srw/schema/1/dc-v1.1">
      <title>Dublin Core</title>
      <name>dc</name>
    </syntax>
    <syntax type="*" error="238"/>
    ..
  </target>
</proxy>
explain
The explain element includes Explain information
for SRW/SRU about the server in the target section. This
information must have a serverInfo element
with a database that this target must be available as (URL path).
For example,
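<explain xmlns="http://explain.z3950.org/dtd/2.0/">
  <serverInfo>
    <host>myhost.org</host>
    <port>8000</port>
    <database>mydatabase</database>
  </serverInfo>
</explain>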
In the above case, the SRW/SRU service is available as
http://myhost.org:8000/mydatabase.
cql2rpn
The content of the cql2rpn element specifies
the path from the working directory to a CQL-to-RPN conversion
file for the server in the target section. This element
is required for SRW/SRU searches to operate against Z39.50
servers that don't support CQL. Most Z39.50 servers only support
Type-1/RPN so this is usually required.
See YAZ documentation for more information about the
CQL
to PQF conversion. See also the
pqf.properties in the etc
(or prefix/share/yazproxy)
directory of the YAZ proxy distribution.
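As a sketch, a target could point to the distributed conversion file as shown below. The target name, host name and the relative path etc/pqf.properties are assumptions; the path has to be adjusted to wherever the file sits relative to the proxy's working directory.

```xml
<target name="server1">
  <url>backend1.example.org:210</url>
  <!-- path is resolved from the proxy's working directory -->
  <cql2rpn>etc/pqf.properties</cql2rpn>
</target>
```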
preinit
The element preinit is the child of element
target and specifies the number of spare
connection to a target. By default no spare connection are
created by the proxy. If the proxy uses a target exclusive or
a lot, the preinit session will ensure that target sessions
have been made before the client makes a connection and will therefore
reduce the connect-init handshake dramatically. Never set this to
more than 5.
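For instance, a heavily used default target could keep two spare sessions ready, as in this sketch. The target name, host name and the value 2 are illustrative assumptions.

```xml
<target name="server1" default="1">
  <url>backend1.example.org:210</url>
  <!-- keep two spare backend sessions established in advance -->
  <preinit>2</preinit>
</target>
```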
max-clients
The element max-clients is the child of element
proxy and specifies the total number of
allowed connections to targets (all targets). If this limit
is reached the proxy will close the least recently used connection.
Note that many Unix systems impose a limit on the number of
open files allowed in a single process, typically in the
range 256 (Solaris) to 1024 (Linux).
The proxy uses 2 sockets per session plus a few files
for logging. As a rule of thumb, ensure that 2*max-clients + 5
files can be opened by the proxy process.
Using the
bash shell, you can set the limit with
ulimit -n no, where no is the number of open files to allow.
Use ulimit -a to display limits.
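To make the rule of thumb concrete, the sketch below uses an example value of 50 for max-clients, which would call for an open-file limit of at least 2*50 + 5 = 105.

```xml
<proxy xmlns="http://indexdata.dk/yazproxy/schema/0.8/">
  <!-- 2*50 + 5 = 105 open files should be available to the process -->
  <max-clients>50</max-clients>
</proxy>
```

A limit of 105 fits within the typical per-process ranges mentioned above, so no ulimit change would normally be needed for this value.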
log
The element log is the child of element
proxy and specifies what is to be logged by the
proxy.
Specify the log file with command-line option -l.
The text of the log element is a sequence of
options separated by white space. See the table below:
Logging options

Option            Description
client-apdu       Log APDUs as reported by YAZ for the communication
                  between the client and the proxy. This facility is
                  equivalent to the APDU logging that happens when using
                  option -a; however, this tells the proxy to log in the
                  same file as given by -l.
server-apdu       Log APDUs as reported by YAZ for the communication
                  between the proxy and the server (backend).
client-requests   Log a brief description of requests transferred between
                  the client and the proxy. The name of the request and
                  the size of the APDU are logged.
server-requests   Log a brief description of requests transferred between
                  the proxy and the server (backend). The name of the
                  request and the size of the APDU are logged.
To log communication in detail between the proxy and the backend, the
following configuration could be used:
<log>server-apdu server-requests</log>
Proxy Usage (man page)
&yaz-proxy-ref;
OtherInformation Encoding
The proxy uses the OtherInformation definition to carry
information about the target address and cookie.
OtherInformation ::= [201] IMPLICIT SEQUENCE OF SEQUENCE{
category [1] IMPLICIT InfoCategory OPTIONAL,
information CHOICE{
characterInfo [2] IMPLICIT InternationalString,
binaryInfo [3] IMPLICIT OCTET STRING,
externallyDefinedInfo [4] IMPLICIT EXTERNAL,
oid [5] IMPLICIT OBJECT IDENTIFIER}}
--
InfoCategory ::= SEQUENCE{
categoryTypeId [1] IMPLICIT OBJECT IDENTIFIER OPTIONAL,
categoryValue [2] IMPLICIT INTEGER}
The categoryTypeId is either
OID 1.2.840.10003.10.1000.81.1 or 1.2.840.10003.10.1000.81.2,
for proxy target and proxy cookie respectively. The
integer element categoryValue is set to 0.
The value of the proxy target or cookie is stored in element
characterInfo of the information
choice.
YAZ Proxy Configuration Schema
Here is an XML Schema for the YAZ proxy configuration file.
The schema, yazproxy.xsd, is located in the sub
directory etc of the distribution.