X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=lib%2FZOOM.pod;h=20fb9a1d2e4b07d7dc614dc418511d62631ed3d3;hb=150f46f520c95d48bc52eacae074639b1718a8fc;hp=83a55e302eb0cd3a730bdb8b07391e50883a3e6f;hpb=4c4359d9cf8b2b16ccb5254a24fc1ab1eb5cab3e;p=ZOOM-Perl-moved-to-github.git diff --git a/lib/ZOOM.pod b/lib/ZOOM.pod index 83a55e3..20fb9a1 100644 --- a/lib/ZOOM.pod +++ b/lib/ZOOM.pod @@ -1,5 +1,3 @@ -# $Id: ZOOM.pod,v 1.2 2005-11-15 17:23:45 mike Exp $ - use strict; use warnings; @@ -11,7 +9,8 @@ ZOOM - Perl extension implementing the ZOOM API for Information Retrieval use ZOOM; eval { - $conn = new ZOOM::Connection($host, $port) + $conn = new ZOOM::Connection($host, $port, + databaseName => "mydb"); $conn->option(preferredRecordSyntax => "usmarc"); $rs = $conn->search_pqf('@attr 1=4 dinosaur'); $n = $rs->size(); @@ -24,7 +23,7 @@ ZOOM - Perl extension implementing the ZOOM API for Information Retrieval =head1 DESCRIPTION This module provides a nice, Perlish implementation of the ZOOM -Abstract API described at http://zoom.z3950.org/api/ +Abstract API described and documented at http://zoom.z3950.org/api/ the ZOOM module is implemented as a set of thin classes on top of the non-OO functions provided by this distribution's C @@ -38,8 +37,8 @@ API such as ZOOM is that all implementations should be compatible anyway; but knowing that the same code is running is reassuring.) The ZOOM module provides two enumerations (C and -C), a single utility function C in the C -package itself, and eight classes: +C), three utility functions C, C +and C in the C package itself, and eight classes: C, C, C, @@ -49,20 +48,1517 @@ C, C and C. -Of these, the Query class is abstract, and has two concrete +Of these, the Query class is abstract, and has four concrete subclasses: -C +C, +C, +C and -C. +C. +Finally, it also provides a +C +module which supplies a useful general-purpose logging facility. 
Many useful ZOOM applications can be built using only the Connection, -ResultSet and Record classes, as in the example code-snippet above. +ResultSet, Record and Exception classes, as in the example +code-snippet above. + +A typical application will begin by creating a Connection object, +then using that to execute searches that yield ResultSet objects, then +fetching records from the result-sets to yield Record objects. If an +error occurs, an Exception object is thrown and can be dealt with. + +More sophisticated applications might also browse the server's indexes +to create a ScanSet, from which indexed terms may be retrieved; others +might send ``Extended Services'' Packages to the server, to achieve +non-standard tasks such as database creation and record update. +Searching using a query syntax other than PQF can be done using a +query object of one of the Query subclasses. Finally, sets of options +may be manipulated independently of the objects they are associated +with using an Options object. + +In general, method calls throw an exception if anything goes wrong, so +you don't need to test for success after each call. See the section +below on the Exception class for details. + +=head1 UTILITY FUNCTIONS + +=head2 ZOOM::diag_str() + + $msg = ZOOM::diag_str(ZOOM::Error::INVALID_QUERY); + +Returns a human-readable English-language string corresponding to the +error code passed as its parameter. This works for any error-code +returned from +C, +C +or +C, +irrespective of whether it is a member of the C +enumeration or drawn from the BIB-1 diagnostic set. + +=head2 ZOOM::diag_srw_str() + + $msg = ZOOM::diag_srw_str(18); + +Returns a human-readable English-language string corresponding to the +specified SRW error code. + +=head2 ZOOM::event_str() + + $msg = ZOOM::event_str(ZOOM::Event::RECV_APDU); + +Returns a human-readable English-language string corresponding to the +event code passed as its parameter. This works for any value of the +C enumeration.
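As a rough illustration of the three lookup functions described above, the following sketch prints the message for one code of each kind. It assumes only that the ZOOM module is installed; no server connection is made. The codes are the ones used in the synopses above, and the exact wording of the returned strings should not be relied on.

```perl
use strict;
use warnings;
use ZOOM;

# Look up the English text for a ZOOM error code, an SRW diagnostic
# code and a ZOOM event code.  None of these calls touches the network.
my $diag  = ZOOM::diag_str(ZOOM::Error::INVALID_QUERY);
my $srw   = ZOOM::diag_srw_str(18);
my $event = ZOOM::event_str(ZOOM::Event::RECV_APDU);

print "diagnostic: $diag\n";
print "SRW:        $srw\n";
print "event:      $event\n";
```

These strings are meant for display and logging; program logic is better driven by the numeric codes.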
+ +=head2 ZOOM::event() + + $connsRef = [ $conn1, $conn2, $conn3 ]; + $which = ZOOM::event($connsRef); + $ev = $connsRef->[$which-1]->last_event() + if ($which != 0); + +Used only in complex asynchronous applications, this function takes a +reference to a list of Connection objects, waits until an event +occurs on any one of them, and returns an integer indicating which of +the connections it occurred on. The return value is a 1-based index +into the list; 0 is returned if no event occurs within the longest +timeout specified by the C options of all the connections. + +See the section below on asynchronous applications. + +=head1 CLASSES + +The eight ZOOM classes are described here in ``sensible order'': +first, the four commonly used classes, in the order that they will +tend to be used in most programs (Connection, ResultSet, Record, +Exception); then the four more esoteric classes in descending order of +how often they are needed. + +With the exception of the Options class, which is an extension to the +ZOOM model, the introduction to each class includes a link to the +relevant section of the ZOOM Abstract API. + +=head2 ZOOM::Connection + + $conn = new ZOOM::Connection("indexdata.dk:210/gils"); + print("server is '", $conn->option("serverImplementationName"), "'\n"); + $conn->option(preferredRecordSyntax => "usmarc"); + $rs = $conn->search_pqf('@attr 1=4 mineral'); + $ss = $conn->scan_pqf('@attr 1=1003 a'); + if ($conn->errcode() != 0) { + die("something went wrong: " . $conn->errmsg()); + } + $conn->destroy(); + +This class represents a connection to an information retrieval server, +using an IR protocol such as ANSI/NISO Z39.50, SRW (the +Search/Retrieve Webservice), SRU (the Search/Retrieve URL) or +OpenSearch.
Not all of these protocols require a low-level connection +to be maintained, but the Connection object nevertheless provides a +location for the necessary cache of configuration and state +information, as well as a uniform API to the connection-oriented +facilities (searching, index browsing, etc.) provided by these +protocols. + +See the description of the C class in the ZOOM Abstract +API at +http://zoom.z3950.org/api/zoom-current.html#3.2 + +=head3 Methods + +=head4 new() + + $conn = new ZOOM::Connection("indexdata.dk", 210); + $conn = new ZOOM::Connection("indexdata.dk:210/gils"); + $conn = new ZOOM::Connection("tcp:indexdata.dk:210/gils"); + $conn = new ZOOM::Connection("http:indexdata.dk:210/gils"); + $conn = new ZOOM::Connection("indexdata.dk", 210, + databaseName => "mydb", + preferredRecordSyntax => "marc"); + +Creates a new Connection object, and immediately connects it to the +specified server. If you want to make a new Connection object but +delay forging the connection, use the C and C +methods instead. + +This constructor can be called with two arguments or a single +argument. In the former case, the arguments are the name and port +number of the Z39.50 server to connect to; in the latter case, the +single argument is a YAZ service-specifier string of the form +described below. + +When the two-argument form is used (which may be done using a vacuous +second argument of zero), any number of additional argument pairs may +be provided, which are interpreted as key-value pairs to be set as +options after the Connection object is created but before it is +connected to the server. This is a convenient way to set options, +including those that must be set before connecting such as +authentication tokens.
+ +The server-name string is of the form: + +=over 4 + +=item + +[I:]I[:I][/I] + +=back + +In which the I and I parts are as in the two-argument +form, the I if provided specifies the name of the +database to be used in subsequent searches on this connection, and the +optional I (default C) indicates what protocol should be +used. At present, the following schemes are supported: + +=over 4 + +=item tcp + +Z39.50 connection. + +=item ssl + +Z39.50 connection encrypted using SSL (Secure Sockets Layer). Not +many servers support this, but Index Data's Zebra is one that does. + +=item unix + +Z39.50 connection on a Unix-domain (local) socket, in which case the +I portion of the string is instead used as a filename in the +local filesystem. + +=item http + +SRU connection over HTTP. + +=back + +If the C scheme is used, the particular SRU flavour to be used +may be specified by the C option, which takes the following +values: + +=over 4 + +=item soap + +SRU over SOAP (i.e. what used to be called SRW). +This is the default. + +=item get + +"SRU Classic" (i.e. SRU over HTTP GET). + +=item post + +SRU over HTTP POST. + +=back + +If an error occurs, an exception is thrown. This may indicate a +networking problem (e.g. the host is not found or unreachable), or a +protocol-level problem (e.g. a Z39.50 server rejected the Init +request). + +=head4 create() / connect() + + $options = new ZOOM::Options(); + $options->option(implementationName => "my client"); + $options->option(implementationId => 12345); + $conn = create ZOOM::Connection($options); + # or + $conn = create ZOOM::Connection(implementationName => "my client", + implementationId => 12345); + + $conn->connect($host, 0); + +The usual Connection constructor, C, brings a new object into +existence and forges the connection to the server all in one +operation, which is often what you want.
For applications that need +more control, however, these two methods separate the two steps, +allowing additional steps in between such as the setting of options. + +C creates and returns a new Connection object, which is +I<not> connected to any server. It may be passed an options block, of +type C (see below), into which options may be set +before or after the creation of the Connection. Alternatively and +equivalently, C may be passed a list of key-value option +pairs directly. The connection to the server may then be forged by +the C method, which accepts hostname and port arguments +like those of the C constructor. + +=head4 error_x() / errcode() / errmsg() / addinfo() / diagset() + + ($errcode, $errmsg, $addinfo, $diagset) = $conn->error_x(); + $errcode = $conn->errcode(); + $errmsg = $conn->errmsg(); + $addinfo = $conn->addinfo(); + $diagset = $conn->diagset(); + +These methods may be used to obtain information about the last error +to have occurred on a connection - although typically they will not +be used, as the same information is available through the +C that is thrown when the error occurs. The +C, +C, +C +and +C +methods each return one element of the diagnostic, and +C +returns all four at once. + +See the C class for the interpretation of these elements. + +=head4 exception() + + die $conn->exception(); + +C returns the same information as C in the +form of a C object which may be thrown or rendered. +If no error occurred on the connection, then C returns an +undefined value. + +=head4 check() + + $conn->check(); + +Checks whether an error is pending on the connection, and throws a +C object if so. Since errors are thrown as they +occur for synchronous connections, there is no need ever to call this +except in asynchronous applications.
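For code that prefers to inspect a connection rather than catch an exception, the four-element list returned by error_x() can be folded into a single message. The helper below is only a sketch: describe_last_error() is a hypothetical name, not part of the ZOOM module, and it relies solely on error_x() returning (code, message, addinfo, diagset) with code 0 meaning that no error occurred.

```perl
use strict;
use warnings;

# Summarise the last error on a connection, or return nothing if there
# was none.  $conn is expected to be a ZOOM::Connection, but any object
# whose error_x() returns (code, message, addinfo, diagset) will do.
sub describe_last_error {
    my ($conn) = @_;
    my ($code, $msg, $addinfo, $diagset) = $conn->error_x();
    return if !$code;                       # code 0 means "no error"

    my $set  = defined $diagset ? $diagset : "unknown";
    my $text = "$set error $code: " . (defined $msg ? $msg : "");
    # The additional information is optional, so append it only if present.
    $text .= " ($addinfo)" if defined $addinfo && length $addinfo;
    return $text;
}

# Typical use, given an existing connection $conn:
#   my $problem = describe_last_error($conn);
#   warn "$problem\n" if defined $problem;
```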
+ +=head4 option() / option_binary() + + print("server is '", $conn->option("serverImplementationName"), "'\n"); + $conn->option(preferredRecordSyntax => "usmarc"); + $conn->option_binary(iconBlob => "foo\0bar"); + die if length($conn->option_binary("iconBlob")) != 7; + +Objects of the Connection, ResultSet, ScanSet and Package classes +carry with them a set of named options which affect their behaviour in +certain ways. See the ZOOM-C options documentation for details: + +Connection options are listed at +http://indexdata.com/yaz/doc/zoom.tkl#zoom.connections + +These options are set and fetched using the C method, which +may be called with either one or two arguments. In the two-argument +form, the option named by the first argument is set to the value of +the second argument, and its old value is returned. In the +one-argument form, the value of the specified option is returned. + +For historical reasons, option values are not binary-clean, so that a +value containing a NUL byte will be returned in truncated form. The +C method behaves identically to C except +that it is binary-clean, so that values containing NUL bytes are set +and returned correctly. + +=head4 search() / search_pqf() + + $rs = $conn->search(new ZOOM::Query::CQL('title=dinosaur')); + # The next two lines are equivalent + $rs = $conn->search(new ZOOM::Query::PQF('@attr 1=4 dinosaur')); + $rs = $conn->search_pqf('@attr 1=4 dinosaur'); + +The principal purpose of a search-and-retrieve protocol is searching +(and, er, retrieval), so the principal method used on a Connection +object is C. It accepts a single argument, a C +object (or, more precisely, an object of a subclass of this class); +and it creates and returns a new ResultSet object representing the set +of records resulting from the search. + +Since queries using PQF (Prefix Query Format) are so common, we make +them a special case by providing a C method.
This is +identical to C except that it accepts a string containing +the query rather than an object, thereby obviating the need to create +a C object. See the documentation of that class for +information about PQF. + +=head4 scan() / scan_pqf() + + $ss = $conn->scan(new ZOOM::Query::CQL('title=dinosaur')); + # The next two lines are equivalent + $ss = $conn->scan(new ZOOM::Query::PQF('@attr 1=4 dinosaur')); + $ss = $conn->scan_pqf('@attr 1=4 dinosaur'); + +Many Z39.50 servers allow you to browse their indexes to find terms to +search for. This is done using the C method, which creates and +returns a new ScanSet object representing the set of terms resulting +from the scan. + +C takes a single argument, but it has to work hard: it +specifies both what index to scan for terms, and where in the index to +start scanning. What's more, the specification of what index to scan +includes multiple facets, such as what database fields it's an index +of (author, subject, title, etc.) and whether to scan for whole fields +or single words (e.g. the title ``The Empire Strikes Back'', or the +four words ``Back'', ``Empire'', ``Strikes'' and ``The'', interleaved +with words from other titles in the same index). + +All of this is done by using a Query object representing a query of a +single term as the C argument. The attributes associated with +the term indicate which index is to be used, and the term itself +indicates the point in the index at which to start the scan. For +example, if the argument is the query C<@attr 1=4 fish>, then + +=over 4 + +=item @attr 1=4 + +This is the BIB-1 attribute with type 1 (meaning access-point, which +specifies an index), and value 4 (which means ``title''). So the scan +is in the title index. + +=item fish + +Start the scan from the lexicographically earliest term that is equal +to or falls after ``fish''.
+ +=back + +The argument C<@attr 1=4 @attr 6=3 fish> would behave similarly; but +the BIB-1 attribute 6=3 means completeness=``complete field'', so the +scan would be for complete titles rather than for words occurring in +titles. + +This takes a bit of getting used to. + +The behaviour of C is affected by the following options, which +may be set on the Connection through which the scan is done: + +=over 4 + +=item number [default: 10] + +Indicates how many terms should be returned in the ScanSet. The +number actually returned may be less, if the start-point is near the +end of the index, but will not be greater. + +=item position [default: 1] + +A 1-based index specifying where in the returned list of terms the +seed-term should appear. By default it should be the first term +returned, but C may be set, for example, to zero (requesting +the next terms I<after> the seed-term), or to the same value as +C (requesting the index terms I<before> the seed term). + +=item stepSize [default: 0] + +An integer indicating how many indexed terms are to be skipped between +each one returned in the ScanSet. By default, no terms are skipped, +but overriding this can be useful to get a high-level overview of the +index. + +=back + +Since scans using PQF (Prefix Query Format) are so common, we make +them a special case by providing a C method. This is +identical to C except that it accepts a string containing the +query rather than an object, thereby obviating the need to create a +C object. + +=head4 package() + + $p = $conn->package(); + $o = new ZOOM::Options(); + $o->option(databaseName => "newdb"); + $p = $conn->package($o); + +Creates and returns a new C, to be used in invoking an +Extended Service. An options block may optionally be passed in. See +the C documentation. + +=head4 last_event() + + if ($conn->last_event() == ZOOM::Event::CONNECT) { + print "Connected!\n"; + } + +Returns a C enumerated value indicating the type of the +last event that occurred on the connection.
This is used only in +complex asynchronous applications - see the sections below on the +C enumeration and asynchronous applications. + +=head4 destroy() + + $conn->destroy() + +Destroys a Connection object, tearing down any low-level connection +associated with it and freeing its resources. It is an error to reuse +a Connection that has been Ced. + +=head2 ZOOM::ResultSet + + $rs = $conn->search_pqf('@attr 1=4 mineral'); + $n = $rs->size(); + for $i (1 .. $n) { + $rec = $rs->record($i-1); + print $rec->render(); + } + +A ResultSet object represents the set of zero or more records +resulting from a search, and is the means whereby these records can be +retrieved. A ResultSet object may maintain a client-side cache of some, +less, none, all or more of the server's records: in general, this is +supposed to be an implementation detail of no interest to a typical +application, although more sophisticated applications do have +facilities for messing with the cache. Most applications will only +need the C, C and C methods. + +There is no C method nor any other explicit constructor. The +only way to create a new ResultSet is by using C (or +C) on a Connection. + +See the description of the C class in the ZOOM Abstract +API at +http://zoom.z3950.org/api/zoom-current.html#3.4 + +=head3 Methods + +=head4 option() + + $rs->option(elementSetName => "f"); + +Allows options to be set into, and read from, a ResultSet, just like +the Connection class's C method. There is no +C method for ResultSet objects. + +ResultSet options are listed at +http://indexdata.com/yaz/doc/zoom.resultsets.tkl + +=head4 size() + + print "Found ", $rs->size(), " records\n"; + +Returns the number of records in the result set.
+ +=head4 record() / record_immediate() + + $rec = $rs->record(0); + $rec2 = $rs->record_immediate(0); + $rec3 = $rs->record_immediate(1) + or print "second record wasn't in cache\n"; + +The C method returns a C object representing +a record from the result-set, whose position is indicated by the argument +passed in. This is a zero-based index, so that legitimate values +range from zero to C<$rs-E<gt>size()-1>. + +The C API is identical, but it never invokes a +network operation, merely returning the record from the ResultSet's +cache if it's already there, or an undefined value otherwise. So if +you use this method, B. + +=head4 records() + + $rs->records(0, 10, 0); + for $i (0..9) { + print $rs->record_immediate($i)->render(); + } + + @nextseven = $rs->records(10, 7, 1); + +The C method only fetches records from the cache, +whereas C fetches them from the server if they have not +already been cached; but the ZOOM module has to guess what the most +efficient strategy for this is. It might fetch each record alone, +when asked for: that's optimal in an application that's only +interested in the top hit from each search, but pessimal for one that +wants to display a whole list of results. Conversely, the software's +strategy might be always to ask for blocks of twenty records: +that's great for assembling long lists of things, but wasteful when +only one record is wanted. The problem is that the ZOOM module can't +tell, when you call C<$rs-E<gt>record()>, what your intention is. + +But you can tell it. The C method fetches a sequence of +records, all in one go. It takes three arguments: the first is the +zero-based index of the first record in the sequence, the second is +the number of records to fetch, and the third is a boolean indication +of whether or not to return the retrieved records as well as adding +them to the cache.
(You can always pass 1 for this if you like, and +Perl will discard the unused return value, but there is a small +efficiency gain to be had by passing 0.) + +Once the records have been retrieved from the server +(i.e. C has completed without throwing an exception), they +can be fetched much more efficiently using C - or +C, which is then guaranteed to succeed. + +=head4 cache_reset() + + $rs->cache_reset() + +Resets the ResultSet's record cache, so that subsequent invocations of +C will fail. I struggle to imagine a real +scenario where you'd want to do this. + +=head4 sort() + + if ($rs->sort("yaz", "1=4 >i 1=21 >s") < 0) { + die "sort failed"; + } + +Sorts the ResultSet in place (discarding any cached records, as they +will in general be sorted into a different position). There are two +arguments: the first is a string indicating the type of the +sort-specification, and the second is the specification itself. + +The C method returns 0 on success, or -1 if the +sort-specification is invalid. + +At present, the only supported sort-specification type is C. +Such a specification consists of a space-separated sequence of keys, +each of which itself consists of two space-separated words (so that +the total number of words in the sort-specification is even). The two +words making up each key are a field and a set of flags. The field +can take one of two forms: if it contains an C<=> sign, then it is a +BIB-1 I=I pair specifying which field to sort +(e.g. C<1=4> for a title sort); otherwise it is sent for the server to +interpret as best it can. The word of flags is made up from one or +more of the following: C<s> for case sensitive, C<i> for case +insensitive; C<E<lt>> for ascending order and C<E<gt>> for descending +order. + +For example, the sort-specification in the code-fragment above will +sort the records in C<$rs> case-insensitively in descending order of +title, with records having equivalent titles sorted case-sensitively +in descending order of subject.
(The BIB-1 access points 4 and 21 +represent title and subject respectively.) + +=head4 destroy() + + $rs->destroy() + +Destroys a ResultSet object, freeing its resources. It is an error to +reuse a ResultSet that has been Ced. + +=head2 ZOOM::Record + + $rec = $rs->record($i); + print $rec->render(); + $raw = $rec->raw(); + $marc = new_from_usmarc MARC::Record($raw); + print "Record title is: ", $marc->title(), "\n"; + +A Record object represents a record that has been retrieved from the +server. + +There is no C method nor any other explicit constructor. The +only way to create a new Record is by using C (or +C, or C) on a ResultSet. + +In general, records are ``owned'' by the result-sets that they were +retrieved from, so they do not have to be explicitly memory-managed: +they are deallocated (and therefore can no longer be used) when the +result-set is destroyed. + +See the description of the C class in the ZOOM Abstract +API at +http://zoom.z3950.org/api/zoom-current.html#3.5 + +=head3 Methods + +=head4 error() / exception() + + if ($rec->error()) { + my($code, $msg, $addinfo, $dset) = $rec->error(); + print "error $code, $msg ($addinfo) from $dset set\n"; + die $rec->exception(); + } + +These functions test for surrogate diagnostics associated with a +record: that is, errors pertaining to a particular record rather than +to the fetch-some-records operation as a whole. (The latter are known +in Z39.50 as non-surrogate diagnostics, and are reported as exceptions +thrown by searches.) If a particular record can't be obtained - for +example, because it is not available in the requested record syntax - +then the record object obtained from the result-set, when interrogated +with these functions, will report the error. + +C returns the error-code, a human-readable message, +additional information and the name of the diagnostic set that the +error is from. When called in a scalar context, it just returns the +error-code.
Since error 0 means "no error", it can be used as a +boolean has-there-been-an-error indicator. + +C returns the same information in the form of a +C object which may be thrown or rendered. If no +error occurred on the record, then C returns an undefined +value. + +=head4 render() + + print $rec->render(); + print $rec->render("charset=latin1,utf8"); + +Returns a human-readable representation of the record. Beyond that, +no promises are made: careful programs should not make assumptions +about the format of the returned string. + +If the optional argument is provided, then it is interpreted as in the +C method (q.v.) + +This method is useful mostly for debugging. + +=head4 raw() + + use MARC::Record; + $raw = $rec->raw(); + $marc = new_from_usmarc MARC::Record($raw); + $trans = $rec->render("charset=latin1,utf8"); + +Returns an opaque blob of data that is the raw form of the record. +Exactly what this is, and what you can do with it, varies depending on +the record-syntax. For example, XML records will be returned as, +well, XML; MARC records will be returned as ISO 2709-encoded blocks +that can be decoded by software such as the fine C +module; GRS-1 records will be ... gosh, what an interesting question. +But no-one uses GRS-1 any more, do they? + +If the optional argument is provided, then it is interpreted as in the +C method (q.v.) + +=head4 get() + + $raw = $rec->get("raw"); + $rendered = $rec->get("render"); + $trans = $rec->get("render;charset=latin1,utf8"); + $trans = $rec->get("render", "charset=latin1,utf8"); + +This is the underlying method used by C and C, and +which in turn delegates to the C function of the +underlying ZOOM-C library. Most applications will find it more +natural to work with C and C. + +C may be called with either one or two arguments. The +two-argument form is syntactic sugar: the two arguments are simply +joined with a semicolon to make a single argument, so the third and +fourth example invocations above are equivalent.
The second argument +(or portion of the first argument following the semicolon) is used in +the C argument of C, as described in +http://www.indexdata.com/yaz/doc/zoom.records.tkl +This is useful primarily for invoking the character-set transformation +- in the examples above, from ISO Latin-1 to UTF-8 Unicode. + +=head4 clone() / destroy() + + $rec = $rs->record($i); + $newrec = $rec->clone(); + $rs->destroy(); + print $newrec->render(); + $newrec->destroy(); + +Usually, it's convenient that Record objects are owned by their +ResultSets and go away when the ResultSet is destroyed; but +occasionally you need a Record to outlive its parent and destroy it +later, explicitly. To do this, C the record, keep the new +Record object that is returned, and C it when it's no +longer needed. This is B situation in which a Record needs to +be destroyed. + +=head2 ZOOM::Exception + +In general, method calls throw an exception (of class +C) if anything goes wrong, so you don't need to test +for success after each call. Exceptions are caught by enclosing the +main code in an C block and checking C<$@> on exit from that +block, as in the code-sample above. + +There are a small number of exceptions to this rule: the three +record-fetching methods in the C class, +C, +C, +and +C +can all return undefined values for legitimate reasons, under +circumstances that do not merit throwing an exception. For this +reason, the return values of these methods should be checked. See the +individual methods' documentation for details. + +An exception carries the following pieces of information: + +=over 4 + +=item error-code + +A numeric code that specifies the type of error. This can be checked +for equality with known values, so that intelligent applications can +take appropriate action. + +=item error-message + +A human-readable message corresponding with the code. 
This can be +shown to users, but its value should not be tested, as it could vary +in different versions or under different locales. + +=item additional information [optional] + +A string containing information specific to the error-code. For +example, when the error-code is the BIB-1 diagnostic 109 ("Database +unavailable"), the additional information is the name of the database +that the application tried to use. For some error-codes, there is no +additional information at all; for some others, the additional +information is undefined and may just be a human-readable string. + +=item diagnostic set [optional] + +A short string specifying the diagnostic set from which the error-code +was drawn: for example, C for a ZOOM-specific error such as +C ("out of memory"), and C for a Z39.50 +error-code drawn from the BIB-1 diagnostic set. + +=back + +In theory, the error-code should be interpreted in the context of the +diagnostic set from which it is drawn; in practice, nearly all errors +are from either the ZOOM or BIB-1 diagnostic sets, and the codes in +those sets have been chosen so as not to overlap, so the diagnostic +set can usually be ignored. + +See the description of the C class in the ZOOM Abstract +API at +http://zoom.z3950.org/api/zoom-current.html#3.7 + +=head3 Methods + +=head4 new() + + die new ZOOM::Exception($errcode, $errmsg, $addinfo, $diagset); + +Creates and returns a new Exception object with the specified +error-code, error-message, additional information and diagnostic set. +Applications will not in general need to use this, but may find it +useful to simulate ZOOM exceptions. As is usual with Perl, exceptions +are thrown using C.
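To make the throw-and-catch pattern concrete, here is a small sketch that simulates a failure by constructing an exception by hand and then handles it by code rather than by message text. It needs the ZOOM module but no server. The sub name pretend_search() is hypothetical, and the values passed to the constructor (diagnostic 109, database "mydb", and the diagnostic-set name "Bib-1") are illustrative assumptions rather than output from a real server.

```perl
use strict;
use warnings;
use ZOOM;

# Stand-in for an operation that fails: throws a hand-made exception
# carrying code, message, additional information and diagnostic set.
sub pretend_search {
    die ZOOM::Exception->new(109, "Database unavailable", "mydb", "Bib-1");
}

eval { pretend_search() };
my $err = $@;    # copy at once, since later evals may overwrite $@

if (ref $err && $err->isa("ZOOM::Exception")) {
    # Branch on the numeric code, never on the message text.
    if ($err->code() == 109) {
        print "database '", $err->addinfo(), "' is unavailable\n";
    } else {
        print "ZOOM error ", $err->code(), ": ", $err->message(), "\n";
    }
} elsif ($err) {
    die $err;    # not a ZOOM exception: pass it on unchanged
}
```

Testing with isa() before calling any methods keeps ordinary string errors from unrelated code from being mistaken for ZOOM exceptions.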
+ +=head4 code() / message() / addinfo() / diagset() + + print "Error ", $@->code(), ": ", $@->message(), "\n"; + print "(addinfo '", $@->addinfo(), "', set '", $@->diagset(), "')\n"; + +These methods, of no arguments, return the exception's error-code, +error-message, additional information and diagnostic set respectively. + +=head4 render() + + print $@->render(); + +Returns a human-readable rendition of an exception. The C<""> +operator is overloaded on the Exception class, so that an Exception +used in a string context is automatically rendered. Among other +consequences, this has the useful result that a ZOOM application that +died due to an uncaught exception will emit an informative message +before exiting. + +=head2 ZOOM::ScanSet + + $ss = $conn->scan_pqf('@attr 1=1003 a'); + $n = $ss->size(); + ($term, $occ) = $ss->term($n-1); + $rs = $conn->search_pqf('@attr 1=1003 "' . $term . '"'); + assert($rs->size() == $occ); + +A ScanSet represents a set of candidate search-terms returned from an +index scan. Its sole purpose is to provide access to those terms, to +the corresponding display terms, and to the occurrence-counts of the +terms. + +There is no C method nor any other explicit constructor. The +only way to create a new ScanSet is by using C on a +Connection. + +See the description of the C class in the ZOOM Abstract +API at +http://zoom.z3950.org/api/zoom-current.html#3.6 + +=head3 Methods + +=head4 size() + + print "Found ", $ss->size(), " terms\n"; + +Returns the number of terms in the scan set. In general, this will be +the scan-set size requested by the C option in the Connection +on which the scan was performed [default 10], but it may be fewer if +the scan is close to the end of the index.
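Because size() may report fewer terms than were requested, loops over a scan set are best bounded by size() rather than by the requested number. A minimal sketch follows, using the zero-based term() call shown in the class synopsis above; scan_terms() is a hypothetical helper name, not part of the module.

```perl
use strict;
use warnings;

# Collect every [term, occurrence-count] pair from a scan set.
# $ss is expected to be a ZOOM::ScanSet; the helper relies only on
# its size() method and on term($i) taking a zero-based index.
sub scan_terms {
    my ($ss) = @_;
    my @pairs;
    for my $i (0 .. $ss->size() - 1) {    # empty range when size() is 0
        my ($term, $occ) = $ss->term($i);
        push @pairs, [ $term, $occ ];
    }
    return @pairs;
}

# Typical use, given an existing connection $conn:
#   my $ss = $conn->scan_pqf('@attr 1=4 fish');
#   printf "%6d  %s\n", $_->[1], $_->[0] for scan_terms($ss);
```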
+ +=head4 term() / display_term() + + $ss = $conn->scan_pqf('@attr 1=1004 whatever'); + ($term, $occurrences) = $ss->term(0); + ($displayTerm, $occurrences2) = $ss->display_term(0); + assert($occurrences == $occurrences2); + if (user_likes_the_look_of($displayTerm)) { + $rs = $conn->search_pqf('@attr 1=1004 "' . $term . '"'); + assert($rs->size() == $occurrences); + } + +These methods return the scanned terms themselves. C returns +the term in a form suitable for submitting as part of a query, whereas +C returns it in a form suitable for displaying to a +user. Both versions also return the number of occurrences of the term +in the index, i.e. the number of hits that will be found if the term +is subsequently used in a query. + +In most cases, the term and display term will be identical; however, +they may be different in cases where punctuation or case is +normalised, or where identifiers rather than the original document +terms are indexed. + +=head4 option() + + print "scan status is ", $ss->option("scanStatus"); + +Allows options to be set into, and read from, a ScanSet, just like +the Connection class's C method. There is no +C method for ScanSet objects. + +ScanSet options are also described, though not particularly +informatively, at +http://indexdata.com/yaz/doc/zoom.scan.tkl + +=head4 destroy() + + $ss->destroy() + +Destroys a ScanSet object, freeing its resources. It is an error to +reuse a ScanSet that has been Ced. + +=head2 ZOOM::Package + + $p = $conn->package(); + $p->option(action => "specialUpdate"); + $p->option(recordIdOpaque => 145); + $p->option(record => content_of("/tmp/record.xml")); + $p->send("update"); + $p->destroy(); + +This class represents an Extended Services Package: an instruction to +the server to do something not covered by the core parts of the Z39.50 +standard (or the equivalent in SRW or SRU).
Since the core protocols
+are read-only, such requests are often used to make changes to the
+database, such as in the record update example above.
+
+Requesting an extended service is a four-step process: first, create a
+package associated with the connection to the relevant database;
+second, set options on the package to instruct the server on what to
+do; third, send the package (which may result in an exception being
+thrown if the server cannot execute the requested operations); and
+finally, destroy the package.
+
+Package options are listed at
+http://indexdata.com/yaz/doc/zoom.ext.tkl
+
+The particular options that have meaning are determined by the
+top-level operation string specified as the argument to C<send()>.
+For example, when the operation is C<update> (the most commonly used
+extended service), the C<action> option may be set to any of
+C<recordInsert>
+(add a new record, failing if that record already exists),
+C<recordDelete>
+(delete a record, failing if it is not in the database),
+C<recordReplace>
+(replace a record, failing if an old version is not already present)
+or
+C<specialUpdate>
+(add a record, replacing any existing version that may be present).
+
+For update, the C<record> option should be set to the full text of the
+XML record to be added, deleted or replaced. Depending on how the server
+is configured, it may extract the record's unique ID from the text
+(e.g. from a known element such as the C<001> field of a MARCXML
+record), or it may require the unique ID to be passed in explicitly using
+the C<recordIdOpaque> option.
+
+Extended services packages are B<not currently described> in the ZOOM
+Abstract API at
+http://zoom.z3950.org/api/zoom-current.html
+They will be added in a forthcoming version, and will function much
+as those implemented in this module.
+
+=head3 Methods
+
+=head4 option()
+
+    $p->option(recordIdOpaque => "46696f6e61");
+
+Allows options to be set into, and read from, a Package, just like
+the Connection class's C<option()> method. There is no
+C<option_binary()> method for Package objects.
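The four-step process described in the class overview (create, set options, send, destroy) can be wrapped in a small helper so that the package is destroyed even when the send fails. This is a sketch under stated assumptions, not part of the module: send_package() is a hypothetical name, and it uses only the package(), option(), send() and destroy() calls shown in this section.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ZOOM): run one extended-services
# request and always destroy the package, even if send() throws.
sub send_package {
    my ($conn, $operation, %options) = @_;
    my $p = $conn->package();                      # step 1: create
    $p->option($_ => $options{$_})                 # step 2: set options
        for sort keys %options;
    my $ok  = eval { $p->send($operation); 1 };    # step 3: send
    my $err = $@;
    $p->destroy();                                 # step 4: destroy
    die $err unless $ok;                           # re-throw after cleanup
    return 1;
}

# Typical use, assuming $conn is an open ZOOM::Connection:
#   send_package($conn, "update",
#                action => "specialUpdate",
#                record => $xml);
```

Any exception raised by the send is re-thrown unchanged, so callers can still inspect its code and message.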
+
+Package options are listed at
+http://indexdata.com/yaz/doc/zoom.ext.tkl
+
+=head4 send()
+
+    $p->send("create");
+
+Sends a package to the server associated with the Connection that
+created it. Problems are reported by throwing an exception. The
+single parameter indicates the operation that the server is being
+requested to perform, and controls the interpretation of the package's
+options. Valid operations include:
+
+=over 4
+
+=item itemorder
+
+Request a copy of a nominated object, e.g. place an ILL request.
+
+=item create
+
+Create a new database, the name of which is specified by the
+C<databaseName> option.
+
+=item drop
+
+Drop an existing database, the name of which is specified by the
+C<databaseName> option.
+
+=item commit
+
+Commit changes made to the database within a transaction.
+
+=item update
+
+Modify the contents of the database by adding, deleting or replacing
+records (as described above in the overview of the C<ZOOM::Package>
+class).
+
+=item xmlupdate
+
+I have no idea what this does.
+
+=back
+
+Although the module is capable of I<sending> all these requests, not
+all servers are capable of I<executing> them. Refusal is indicated by
+throwing an exception. Problems may also be caused by lack of
+privileges; so C<send()> must be used with caution, and is perhaps
+best wrapped in a clause that checks for exceptions, like so:
+
+    eval { $p->send("create") };
+    if ($@ && $@->isa("ZOOM::Exception")) {
+        print "Oops! ", $@->message(), "\n";
+        return $@->code();
+    }
+
+=head4 destroy()
+
+    $p->destroy()
+
+Destroys a Package object, freeing its resources. It is an error to
+reuse a Package that has been C<destroy()>ed.
+
+=head2 ZOOM::Query
+
+    $q = new ZOOM::Query::CQL("creator=pike and subject=unix");
+    $q->sortby("1=4 >i 1=21 >s");
+    $rs = $conn->search($q);
+    $q->destroy();
+
+C<ZOOM::Query> is a virtual base class from which various concrete
+subclasses can be derived. Different subclasses implement different
+types of query.
The sole purpose of a Query object is to be used in a
+C<search()> on a Connection; because PQF is such a common special
+case, the shortcut Connection method C<search_pqf()> is provided.
+
+The following Query subclasses are provided, each providing the
+same set of methods described below:
+
+=over 4
+
+=item ZOOM::Query::PQF
+
+Implements Prefix Query Format (PQF), also sometimes known as Prefix
+Query Notation (PQN). This esoteric but rigorous and expressive
+format is described in the YAZ Manual at
+http://indexdata.com/yaz/doc/tools.tkl#PQF
+
+=item ZOOM::Query::CQL
+
+Implements the Common Query Language (CQL) of SRU, the Search/Retrieve
+URL. CQL is a much friendlier notation than PQF, using a simple infix
+notation. The queries are passed ``as is'' to the server rather than
+being compiled into a Z39.50 Type-1 query, so only CQL-compliant
+servers can support such queries. CQL is described at
+http://www.loc.gov/standards/sru/cql/
+and in a slightly out-of-date but nevertheless useful tutorial at
+http://zing.z3950.org/cql/intro.html
+
+=item ZOOM::Query::CQL2RPN
+
+Implements CQL by compiling it on the client side into a Z39.50
+Type-1 (RPN) query, and sending that. This provides essentially the
+same functionality as C<ZOOM::Query::CQL>, but it will work against
+any standard Z39.50 server rather than only against the small subset
+that support CQL natively. The drawback is that, because the
+compilation is done on the client side, a configuration file is
+required to direct the mapping of CQL constructs such as index names,
+relations and modifiers into Type-1 query attributes. An example CQL
+configuration file is included in the ZOOM-Perl distribution, in the
+file C<samples/cql/pqf.properties>
+
+=item ZOOM::Query::CCL2RPN
+
+Implements CCL by compiling it on the client side into a Z39.50 Type-1
+(RPN) query, and sending that.
Because the compilation is done on the
+client side, a configuration file is required to direct the mapping of
+CCL constructs such as index names and boolean operators into Type-1
+query attributes. An example CCL configuration file is included in
+the ZOOM-Perl distribution, in the file C<samples/ccl/default.bib>
+
+CCL is syntactically very similar to CQL, but much looser. While CQL
+is an entirely precise language in which each possible query has
+rigorously defined semantics, and is thus suitable for transfer as
+part of a protocol, CCL is best deployed as a human-facing UI
+language.
+
+=back
+
+See the description of the C<Query> class in the ZOOM Abstract
+API at
+http://zoom.z3950.org/api/zoom-current.html#3.3
+
+=head3 Methods
+
+=head4 new()
+
+    $q = new ZOOM::Query::CQL('title=dinosaur');
+    $q = new ZOOM::Query::PQF('@attr 1=4 dinosaur');
+
+Creates a new query object, compiling the query passed as its argument
+according to the rules of the particular query-type being
+instantiated. If compilation fails, an exception is thrown.
+Otherwise, the query may be passed to the C<Connection> method
+C<search()>.
+
+    $conn->option(cqlfile => "samples/cql/pqf.properties");
+    $q = new ZOOM::Query::CQL2RPN('title=dinosaur', $conn);
+
+Note that for the C<ZOOM::Query::CQL2RPN> subclass, the Connection
+must also be passed into the constructor. This is used for two
+purposes: first, its C<cqlfile> option is used to find the CQL
+configuration file that directs the translations into RPN; and second,
+if compilation fails, then diagnostic information is cached in the
+Connection and can be retrieved using C<$conn-E<gt>errcode()> and related
+methods.
+
+    $conn->option(cclfile => "samples/ccl/default.bib");
+    # or
+    $conn->option(cclqual => "ti u=4 s=pw\nab u=62 s=pw");
+    $q = new ZOOM::Query::CCL2RPN('ti=dinosaur', $conn);
+
+For the C<ZOOM::Query::CCL2RPN> subclass, too, the Connection must be
+passed into the constructor, for the same reasons as when client-side
+CQL compilation is used.
The C<cclqual> option, if defined, gives a
+CCL qualification specification inline; otherwise, the contents of the
+file named by the C<cclfile> option are used.
+
+=head4 sortby()
+
+    $q->sortby("1=4 >i 1=21 >s");
+
+Sets a sort specification into the query, so that when a C<search()>
+is run on the query, the result is automatically sorted. The sort
+specification language is the same as the C<yaz> sort-specification
+type of the C<ResultSet> method C<sort()>, described above.
+
+=head4 destroy()
+
+    $q->destroy()
+
+Destroys a Query object, freeing its resources. It is an error to
+reuse a Query that has been C<destroy()>ed.
+
+=head2 ZOOM::Options
+
+    $o1 = new ZOOM::Options();
+    $o1->option(user => "alf");
+    $o2 = new ZOOM::Options();
+    $o2->option(password => "fruit");
+    $opts = new ZOOM::Options($o1, $o2);
+    $conn = create ZOOM::Connection($opts);
+    $conn->connect($host); # Uses the specified username and password
+
+Several classes of ZOOM objects carry their own sets of options, which
+can be manipulated using their C<option()> method. Sometimes,
+however, it's useful to deal with the option sets directly, and the
+C<ZOOM::Options> class exists to enable this approach.
+
+Option sets are B<not currently described> in the ZOOM
+Abstract API at
+http://zoom.z3950.org/api/zoom-current.html
+They are an extension to that specification.
+
+=head3 Methods
+
+=head4 new()
+
+    $o1 = new ZOOM::Options();
+    $o1and2 = new ZOOM::Options($o1);
+    $o3 = new ZOOM::Options();
+    $o1and3and4 = new ZOOM::Options($o1, $o3);
+
+Creates and returns a new option set. One or two (but no more)
+existing option sets may be passed as arguments, in which case they
+become ``parents'' of the new set, which thereby ``inherits'' their
+options, the values of the first parent overriding those of the second
+when both have a value for the same key. An option set that inherits
+from a parent that has its own parents also inherits the grandparent's
+options, and so on.
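The inheritance rule is easiest to see with a toy model. The following pure-Perl sketch is an illustration only: OptionSetModel is a hypothetical class, not the ZOOM::Options implementation, and the order in which a first parent's own ancestors are consulted relative to the second parent is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl model of option-set inheritance. This is NOT
# the ZOOM::Options implementation, only an illustration of the lookup
# rule: own value first, then the first parent, then the second.
package OptionSetModel;

sub new {
    my ($class, $parent1, $parent2) = @_;
    my @parents = grep { defined $_ } ($parent1, $parent2);
    return bless { values => {}, parents => \@parents }, $class;
}

sub option {
    my ($self, $key, $value) = @_;
    if (defined $value) {                       # two arguments: set
        $self->{values}{$key} = $value;
        return $value;
    }
    return $self->{values}{$key}                # own value wins
        if exists $self->{values}{$key};
    for my $parent (@{ $self->{parents} }) {    # first parent before second
        my $inherited = $parent->option($key);  # recursion reaches grandparents
        return $inherited if defined $inherited;
    }
    return undef;
}

package main;

my $base = OptionSetModel->new();
$base->option(user => "guest");
$base->option(lang => "en");
my $first = OptionSetModel->new($base);         # inherits from $base
$first->option(user => "alf");
my $second = OptionSetModel->new();
$second->option(user => "bob");
$second->option(password => "fruit");
my $opts = OptionSetModel->new($first, $second);
print join(",", map { $opts->option($_) } qw(user password lang)), "\n";
```

In this model the combined set sees the first parent's user, the second parent's password, and the grandparent's lang.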
+
+=head4 option() / option_binary()
+
+    $o->option(preferredRecordSyntax => "usmarc");
+    $o->option_binary(iconBlob => "foo\0bar");
+    die if length($o->option_binary("iconBlob")) != 7;
+
+These methods are used to get and set options within a set, and behave
+the same way as the same-named C<Connection> methods - see above. As
+with the C<Connection> methods, values passed to and retrieved using
+C<option()> are interpreted as NUL-terminated, while those passed to
+and retrieved from C<option_binary()> are binary-clean.
+
+=head4 bool()
+
+    $o->option(x => "T");
+    $o->option(y => "F");
+    assert($o->bool("x", 1));
+    assert(!$o->bool("y", 1));
+    assert($o->bool("z", 1));
+
+The first argument is a key, and the second is a default value.
+Returns the value associated with the specified key as a boolean, or
+the default value if the key has not been set. The values C<T> (upper
+case) and C<1> are considered true; all other values (including C<t>
+(lower case) and non-zero integers other than one) are considered
+false.
+
+This method is provided in ZOOM-C because in a statically typed
+language it's convenient to have the result returned as an
+easy-to-test type. In a dynamically typed language such as Perl, this
+problem doesn't arise, so C<bool()> is nearly useless; but it is made
+available in case applications need to duplicate the idiosyncratic
+interpretation of truth and falsehood that ZOOM-C uses.
+
+=head4 int()
+
+    $o->option(x => "012");
+    assert($o->int("x", 20) == 12);
+    assert($o->int("y", 20) == 20);
+
+Returns the value associated with the specified key as an integer, or
+the default value if the key has not been set. See the description of
+C<bool()> for why you almost certainly don't want to use this.
+
+=head4 set_int()
+
+    $o->set_int(x => "29");
+
+Sets the value of the specified option as an integer. Of course, Perl
+happily converts strings to integers on its own, so you can just use
+C<option()> for this, but C<set_int()> is guaranteed to use the same
+string-to-integer conversion as ZOOM-C does, which might occasionally
+be useful.
Though I can't imagine how.
+
+=head4 set_callback()
+
+    sub cb {
+        ($udata, $key) = @_;
+        return "$udata-$key-$udata";
+    }
+    $o->set_callback(\&cb, "xyz");
+    assert($o->option("foo") eq "xyz-foo-xyz");
+
+This method allows a callback function to be installed in an option
+set, so that the values of options can be calculated algorithmically
+rather than, as usual, looked up in a table. Along with the callback
+function itself, an additional datum is provided: when an option is
+subsequently looked up, this datum is passed to the callback function
+along with the key; and its return value is returned to the caller as
+the value of the option.
+
+B<Warning.>
+Although it ought to be possible to specify the callback function using
+the C<\&name> syntax above, or a literal C<sub { ... }> code
+reference, the complexities of the Perl-internal memory management
+system mean that the function must currently be specified as a string
+containing the fully-qualified name, e.g. C<"main::cb">.
+
+B<Warning.>
+The current implementation of this method leaks memory, not only
+when the callback is installed, but on every occasion that it is
+consulted to look up an option value.
+
+=head4 destroy()
+
+    $o->destroy()
+
+Destroys an Options object, freeing its resources. It is an error to
+reuse an Options object that has been C<destroy()>ed.
+
+=head1 ENUMERATIONS
+
+The ZOOM module provides two enumerations that list possible return
+values from particular functions. They are described in the following
+sections.
+
+=head2 ZOOM::Error
+
+    if ($@->code() == ZOOM::Error::QUERY_PQF) {
+        return "your query was not accepted";
+    }
+
+This class provides a set of manifest constants representing some of
+the possible error codes that can be raised by the ZOOM module. The
+methods that return error-codes are
+C<ZOOM::Connection::error_x()>,
+C<ZOOM::Connection::errcode()>
+and
+C<ZOOM::Exception::code()>.
+
+The C<ZOOM::Error> class provides the constants
+C<NONE>,
+C<CONNECT>,
+C<MEMORY>,
+C<ENCODE>,
+C<DECODE>,
+C<CONNECTION_LOST>,
+C<INIT>,
+C<INTERNAL>,
+C<TIMEOUT>,
+C<UNSUPPORTED_PROTOCOL>,
+C<UNSUPPORTED_QUERY>,
+C<INVALID_QUERY>,
+C<CQL_PARSE>,
+C<CQL_TRANSFORM>,
+C<CCL_CONFIG>,
+C<CCL_PARSE>,
+C<CREATE_QUERY>,
+C<QUERY_CQL>,
+C<QUERY_PQF>,
+C<SORTBY>,
+C<CLONE>,
+C<PACKAGE>,
+C<SCANTERM>
+and
+C<LOGLEVEL>,
+each of which specifies a client-side error.
These codes constitute
+the C<ZOOM> diagnostic set.
+
+Since errors may also be diagnosed by the server, and returned to the
+client, error codes may also take values from the BIB-1 diagnostic set
+of Z39.50, listed at the Z39.50 Maintenance Agency's web-site at
+http://www.loc.gov/z3950/agency/defns/bib1diag.html
+
+All error-codes, whether client-side from the C<ZOOM::Error>
+enumeration or server-side from the BIB-1 diagnostic set, can be
+translated into human-readable messages by passing them to the
+C<ZOOM::diag_str()> utility function.
+
+=head2 ZOOM::Event
+
+    if ($conn->last_event() == ZOOM::Event::CONNECT) {
+        print "Connected!\n";
+    }
+
+In applications that need it - mostly complex multiplexing
+applications - the C<ZOOM::Connection::last_event()> method is used to
+return an indication of the last event that occurred on a particular
+connection. It always returns a value drawn from this enumeration,
+that is, one of C<NONE>, C<CONNECT>, C<SEND_DATA>, C<RECV_DATA>,
+C<TIMEOUT>, C<UNKNOWN>, C<SEND_APDU>, C<RECV_APDU>, C<RECV_RECORD>,
+C<RECV_SEARCH> or C<ZEND>.
+
+See the section below on asynchronous applications.
+
+=head1 LOGGING
+
+    ZOOM::Log::init_level(ZOOM::Log::mask_str("zoom,myapp,-warn"));
+    ZOOM::Log::log("myapp", "starting up with pid ", $$);
+
+Logging facilities are provided by a set of functions in the
+C<ZOOM::Log> module. Note that C<ZOOM::Log> is not a class, and it
+is not possible to create C<ZOOM::Log> objects: the API is imperative,
+reflecting that of the underlying YAZ logging facilities. Although
+there are nine logging functions altogether, you can ignore nearly
+all of them: most applications that use logging will begin by calling
+C<mask_str()> and C<init_level()> once each, as above, and will then
+repeatedly call C<log()>.
+
+=head2 mask_str()
+
+    $level = ZOOM::Log::mask_str("zoom,myapp,-warn");
+
+Returns an integer corresponding to the log-level specified by the
+parameter. This is a string of zero or more comma-separated
+module-names, each indicating an individual module to be either added
+to the default log-level or removed from it (for those components
+prefixed by a minus-sign).
The names may be those of either standard
+YAZ-logging modules such as C<fatal>, C<debug> and C<warn>, or custom
+modules such as C<myapp> in the example above. The module C<zoom>
+requests logging from the ZOOM module itself, which may be helpful for
+debugging.
+
+Note that calling this function does not in any way change the logging
+state: it merely returns a value. To change the state, this value
+must be passed to C<init_level()>.
+
+=head2 module_level()
+
+    $level = ZOOM::Log::module_level("zoom");
+    ZOOM::Log::log($level, "all systems clear: thrusters invigorated");
+
+Returns the integer corresponding to the single log-level specified as
+the parameter, or zero if that level has not been registered by a
+prior call to C<mask_str()>. Since C<log()> accepts either a numeric
+log-level or a string, there is no reason to call this function; but,
+what the heck, maybe you enjoy that kind of thing. Who are we to
+judge?
+
+=head2 init_level()
+
+    ZOOM::Log::init_level($level);
+
+Initialises the log-level to the specified integer, which is a bitmask
+of values, typically as returned from C<mask_str()>. All subsequent
+calls to C<log()> made with a log-level that matches one of the bits
+in this mask will result in a log-message being emitted. All logging
+can be turned off by calling C<init_level(0)>.
+
+=head2 init_prefix()
+
+    ZOOM::Log::init_prefix($0);
+
+Initialises a prefix string to be included in all log-messages.
+
+=head2 init_file()
+
+    ZOOM::Log::init_file("/tmp/myapp.log");
+
+Initialises the output file to be used for logging: subsequent
+log-messages are written to the nominated file. If this function is
+not called, log-messages are written to the standard error stream.
+
+=head2 init()
+
+    ZOOM::Log::init($level, $0, "/tmp/myapp.log");
+
+Initialises the log-level, the logging prefix and the logging output
+file in a single operation.
+
+=head2 time_format()
+
+    ZOOM::Log::time_format("%Y-%m-%d %H:%M:%S");
+
+Sets the format in which log-messages' timestamps are emitted, by
+means of a format-string like that used in the C function
+C<strftime()>.
The example above emits year, month, day, hours,
+minutes and seconds in big-endian order, such that timestamps can be
+sorted lexicographically.
+
+=head2 init_max_size()
+
+(This doesn't seem to work, so I won't bother describing it.)
+
+=head2 log()
+
+    ZOOM::Log::log(8192, "reducing to warp-factor $wf");
+    ZOOM::Log::log("myapp", "starting up with pid ", $$);
+
+Provided that the first argument, log-level, is among the modules
+previously established by C<init_level()>, this function emits a
+log-message made up of a timestamp, the prefix supplied to
+C<init_prefix()>, if any, and the concatenation of all arguments after
+the first. The message is written to the standard error stream, or
+to the file previously specified by C<init_file()> if this has been
+called.
+
+The log-level argument may be either a numeric value, as returned from
+C<module_level()>, or a string containing the module name.
+
+=head1 ASYNCHRONOUS APPLICATIONS
+
+Although asynchronous applications are conceptually complex, the ZOOM
+support for them is provided through a very simple interface,
+consisting of one option (C<async>), one function (C<ZOOM::event()>),
+one Connection method (C<last_event()>) and an enumeration
+(C<ZOOM::Event>).
+
+The approach is as follows:
+
+=over 4
+
+=item Initialisation
+
+Create several connections to the various servers, each of them having
+the option C<async> set, and with whatever additional options are
+required - e.g. the piggyback retrieval record-count can be set so
+that records will be returned in search responses.
+
+=item Operations
+
+Send searches to the connections, request records, etc.
+
+=item Event harvesting
+
+Repeatedly call C<ZOOM::event()> to discover what responses are being
+received from the servers. Each time this function returns, it
+indicates which of the connections has fired; this connection can then
+be interrogated with the C<last_event()> method to discover what event
+has occurred, and the return value - an element of the C<ZOOM::Event>
+enumeration - can be tested to determine what to do next.
For
+example, the C<ZEND> event indicates that no further operations are
+outstanding on the connection, so any fetched records can now be
+immediately obtained.
+
+=back
+
+Here is a very short program (omitting all error-checking!) which
+demonstrates this process. It parallel-searches three servers (or more
+if you add them to the list), displaying the first record in the
+result-set of each server as soon as it becomes available.
+
+    use ZOOM;
+    @servers = ('z3950.loc.gov:7090/Voyager',
+                'z3950.indexdata.com:210/gils',
+                'agricola.nal.usda.gov:7190/Voyager');
+    for ($i = 0; $i < @servers; $i++) {
+        $z[$i] = new ZOOM::Connection($servers[$i], 0,
+                                      async => 1, # asynchronous mode
+                                      count => 1, # piggyback retrieval count
+                                      preferredRecordSyntax => "usmarc");
+        $r[$i] = $z[$i]->search_pqf("mineral");
+    }
+    while (($i = ZOOM::event(\@z)) != 0) {
+        $ev = $z[$i-1]->last_event();
+        print("connection ", $i-1, ": ", ZOOM::event_str($ev), "\n");
+        if ($ev == ZOOM::Event::ZEND) {
+            $size = $r[$i-1]->size();
+            print "connection ", $i-1, ": $size hits\n";
+            print $r[$i-1]->record(0)->render()
+                if $size > 0;
+        }
+    }
 
 =head1 SEE ALSO
 
+The ZOOM abstract API,
+http://zoom.z3950.org/api/zoom-current.html
+
 The C<Net::Z3950::ZOOM> module, included in the same distribution as this one.
 
 The C<Net::Z3950> module, which this one supersedes.
+http://perl.z3950.org/
+
+The documentation for the ZOOM-C module of the YAZ Toolkit, which this
+module is built on. Specifically, its lists of options are useful.
+http://indexdata.com/yaz/doc/zoom.tkl
+
+The BIB-1 diagnostic set of Z39.50,
+http://www.loc.gov/z3950/agency/defns/bib1diag.html
 
 =head1 AUTHOR
 
@@ -70,7 +1566,7 @@ Mike Taylor, E<lt>mike@indexdata.comE<gt>
 
 =head1 COPYRIGHT AND LICENCE
 
-Copyright (C) 2005 by Index Data.
+Copyright (C) 2005-2014 by Index Data.
 
 This library is free software; you can redistribute it and/or modify
 it under the same terms as Perl itself, either Perl version 5.8.4 or,