# $Id: ZOOM.pod,v 1.10 2005-11-18 17:55:08 mike Exp $

use strict;
use warnings;

=head1 NAME

ZOOM - Perl extension implementing the ZOOM API for Information Retrieval

=head1 SYNOPSIS

 use ZOOM;
 eval {
     $conn = new ZOOM::Connection($host, $port);
     $conn->option(preferredRecordSyntax => "usmarc");
     $rs = $conn->search_pqf('@attr 1=4 dinosaur');
     $n = $rs->size();
     print $rs->record(0)->render();
 };
 if ($@) {
     print "Error ", $@->code(), ": ", $@->message(), "\n";
 }

=head1 DESCRIPTION

This module provides a nice, Perlish implementation of the ZOOM
Abstract API described and documented at http://zoom.z3950.org/api/

The ZOOM module is implemented as a set of thin classes on top of the
non-OO functions provided by this distribution's C module, which
in turn is a thin layer on top of the ZOOM-C code supplied as part of
Index Data's YAZ Toolkit. Because ZOOM-C is also the underlying code
that implements ZOOM bindings in C++, Visual Basic, Scheme, Ruby, .NET
(including C#) and other languages, this Perl module works compatibly
with those other implementations. (Of course, the point of a public
API such as ZOOM is that all implementations should be compatible
anyway; but knowing that the same code is running is reassuring.)

The ZOOM module provides two enumerations (C and C), a single
utility function C in the C package itself, and eight classes:
C, C, C, C, C, C, C and C. Of these, the Query class is
abstract, and has two concrete subclasses: C and C.

Many useful ZOOM applications can be built using only the Connection,
ResultSet, Record and Exception classes, as in the example code-snippet
above. A typical application will begin by creating a Connection
object, then use that to execute searches that yield ResultSet
objects, then fetch records from the result-sets to yield Record
objects. If an error occurs, an Exception object is thrown and can be
dealt with.
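
The four-step flow just described can be wrapped up in a single
subroutine. The following is a minimal sketch, not part of the ZOOM
API: C<fetch_rendered()> is a hypothetical helper name, the caller is
assumed to have already done C<use ZOOM;>, and
C<< ZOOM::Connection->new(...) >> is simply the arrow-syntax form of
C<new ZOOM::Connection(...)>. For brevity, the Connection and
ResultSet are not destroyed if an exception interrupts the sequence.

 use strict;
 use warnings;
 use Scalar::Util qw(blessed);

 # Hypothetical helper (not part of ZOOM): connect, run one PQF search
 # and return up to $max rendered records. Assumes the caller has
 # already loaded the module with "use ZOOM;".
 sub fetch_rendered {
     my ($target, $pqf, $max) = @_;
     my @rendered;
     eval {
         # Same as "new ZOOM::Connection($target)": connects immediately
         my $conn = ZOOM::Connection->new($target);
         my $rs   = $conn->search_pqf($pqf);    # yields a ResultSet
         my $n    = $rs->size();
         $n = $max if $n > $max;                 # fetch at most $max records
         for my $i (0 .. $n - 1) {               # record() is zero-based
             my $rec = $rs->record($i);
             push @rendered, $rec->render() if defined $rec;
         }
         $rs->destroy();
         $conn->destroy();
         1;
     } or do {
         my $e = $@;
         # ZOOM::Exception objects carry an error code and a message
         die "ZOOM error ", $e->code(), ": ", $e->message(), "\n"
             if blessed($e) && $e->isa("ZOOM::Exception");
         die $e;                                 # some other failure
     };
     return @rendered;
 }

A call might then look like this, using the server named elsewhere in
this document:

 @titles = fetch_rendered("indexdata.dk:210/gils", '@attr 1=4 mineral', 5);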

More sophisticated applications might also browse the server's indexes
to create a ScanSet, from which indexed terms may be retrieved; others
might send ``Extended Services'' Packages to the server, to achieve
non-standard tasks such as database creation and record update.
Searching using a query syntax other than PQF can be done using a
query object of one of the Query subclasses. Finally, sets of options
may be manipulated, independently of the objects they are associated
with, using an Options object.

In general, method calls throw an exception if anything goes wrong, so
you don't need to test for success after each call. See the section
below on the Exception class for details.

=head1 UTILITY FUNCTION

=head2 ZOOM::diag_str()

 $msg = ZOOM::diag_str(ZOOM::Error::INVALID_QUERY);

Returns a human-readable English-language string corresponding to the
error code passed as its parameter. This works for any error-code
returned from C, C or C, irrespective of whether it is a member of
the C enumeration or drawn from the BIB-1 diagnostic set.

=head1 CLASSES

The eight ZOOM classes are described here in ``sensible order'':
first, the four commonly used classes, in the order that they will
tend to be used in most programs (Connection, ResultSet, Record,
Exception); then the four more esoteric classes in descending order of
how often they are needed.

With the exception of the Options class, which is an extension to the
ZOOM model, the introduction to each class includes a link to the
relevant section of the ZOOM Abstract API.

=head2 ZOOM::Connection

 $conn = new ZOOM::Connection("indexdata.dk:210/gils");
 print("server is '", $conn->option("serverImplementationName"), "'\n");
 $conn->option(preferredRecordSyntax => "usmarc");
 $rs = $conn->search_pqf('@attr 1=4 mineral');
 $ss = $conn->scan('@attr 1=1003 a');
 if ($conn->errcode() != 0) {
     die("something went wrong: " . $conn->errmsg());
 }
 $conn->destroy();

This class represents a connection to an information retrieval server,
using an IR protocol such as ANSI/NISO Z39.50, SRW (the
Search/Retrieve Webservice), SRU (the Search/Retrieve URL) or
OpenSearch. Not all of these protocols require a low-level connection
to be maintained, but the Connection object nevertheless provides a
location for the necessary cache of configuration and state
information, as well as a uniform API to the connection-oriented
facilities (searching, index browsing, etc.) provided by these
protocols.

See the description of the C class in the ZOOM Abstract API at
http://zoom.z3950.org/api/zoom-current.html#3.2

=head3 Methods

=head4 new()

 $conn = new ZOOM::Connection("indexdata.dk", 210);
 $conn = new ZOOM::Connection("indexdata.dk:210/gils");
 $conn = new ZOOM::Connection("tcp:indexdata.dk:210/gils");
 $conn = new ZOOM::Connection("http:indexdata.dk:210/gils");

Creates a new Connection object, and immediately connects it to the
specified server. If you want to make a new Connection object but
delay forging the connection, use the C and C methods instead.

This constructor can be called with two arguments or a single
argument. In the former case, the arguments are the name and port
number of the Z39.50 server to connect to; in the latter case, the
single argument is a YAZ service-specifier string of the form

=over 4

=item [I:]I[:I][/I]

=back

Here, the I and I parts are as in the two-argument form; the I, if
provided, specifies the name of the database to be used in subsequent
searches on this connection; and the optional I (default C) indicates
which protocol should be used. At present, the following schemes are
supported:

=over 4

=item tcp

Z39.50 connection.

=item ssl

Z39.50 connection encrypted using SSL (Secure Sockets Layer). Not many
servers support this, but Index Data's Zebra is one that does.

=item unix

Z39.50 connection on a Unix-domain (local) socket, in which case the I
portion of the string is instead used as a filename in the local
filesystem.

=item http

SRW connection using SOAP over HTTP.

=back

Support for SRU will follow in the fullness of time.

If an error occurs, an exception is thrown. This may indicate a
networking problem (e.g. the host is not found or unreachable), or a
protocol-level problem (e.g. a Z39.50 server rejected the Init
request).

=head4 create() / connect()

 $options = new ZOOM::Options();
 $options->option(implementationName => "my client");
 $conn = create ZOOM::Connection($options);
 $conn->connect($host, 0);

The usual Connection constructor, C, brings a new object into
existence and forges the connection to the server all in one
operation, which is often what you want. For applications that need
more control, however, these two methods separate the two steps,
allowing additional steps, such as the setting of options, in between.

C creates and returns a new Connection object, which is I<not>
connected to any server. It may be passed an options block, of type C
(see below), into which options may be set before or after the
creation of the Connection. The connection to the server may then be
forged by the C method, the arguments of which are the same as those
of the C constructor.

=head4 error_x() / errcode() / errmsg() / addinfo() / diagset()

 ($errcode, $errmsg, $addinfo, $diagset) = $conn->error_x();
 $errcode = $conn->errcode();
 $errmsg = $conn->errmsg();
 $addinfo = $conn->addinfo();
 $diagset = $conn->diagset();

These methods may be used to obtain information about the last error
to have occurred on a connection - although typically they will not be
used, as the same information is available through the C that is
thrown when the error occurs. The C, C, C and C methods each return
one element of the diagnostic, and C returns all four at once.

See the C for the interpretation of these elements.
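
As an illustration, the following sketch formats the four elements
returned by C<error_x()> into a single line, for example for logging.
C<format_conn_error()> is a hypothetical helper, not a ZOOM method; it
relies only on the convention, visible in the C<errcode() != 0> test in
the Connection synopsis above, that an error code of 0 means that no
error has occurred.

 use strict;
 use warnings;

 # Hypothetical helper (not part of ZOOM): summarise the last error
 # recorded on a ZOOM::Connection in one line, or return undef if the
 # connection reports no error (error code 0).
 sub format_conn_error {
     my ($conn) = @_;
     my ($errcode, $errmsg, $addinfo, $diagset) = $conn->error_x();
     return undef unless $errcode;
     my $text = "error $errcode";
     $text .= " [$diagset]" if defined $diagset && $diagset ne "";
     $text .= ": $errmsg" if defined $errmsg;
     $text .= " ($addinfo)" if defined $addinfo && $addinfo ne "";
     return $text;
 }

It might be called after catching an exception from a Connection
method, assuming the connection still holds the details of the error
that caused it:

 eval { $rs = $conn->search_pqf('@attr 1=4 dinosaur') };
 print STDERR format_conn_error($conn), "\n" if $@;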

=head4 option() / option_binary()

 print("server is '", $conn->option("serverImplementationName"), "'\n");
 $conn->option(preferredRecordSyntax => "usmarc");
 $conn->option_binary(iconBlob => "foo\0bar");
 die if length($conn->option_binary("iconBlob")) != 7;

Objects of the Connection, ResultSet, ScanSet and Package classes
carry with them a set of named options which affect their behaviour in
certain ways. See the ZOOM-C options documentation for details:

=over 4

=item *

Connection options are listed at
http://indexdata.com/yaz/doc/zoom.tkl#zoom.connections

=item *

ScanSet options are listed at
http://indexdata.com/yaz/doc/zoom.scan.tkl
I<### move this observation down to the appropriate place>

=item *

Package options are listed at
http://indexdata.com/yaz/doc/zoom.ext.html
I<### move this observation down to the appropriate place>

=back

These options are set and fetched using the C method, which may be
called with either one or two arguments. In the two-argument form, the
option named by the first argument is set to the value of the second
argument, and its old value is returned. In the one-argument form, the
value of the specified option is returned.

For historical reasons, option values are not binary-clean, so that a
value containing a NUL byte will be returned in truncated form. The C
method behaves identically to C except that it is binary-clean, so
that values containing NUL bytes are set and returned correctly.

=head4 search() / search_pqf()

 $rs = $conn->search(new ZOOM::Query::CQL('title=dinosaur'));
 # The next two lines are equivalent
 $rs = $conn->search(new ZOOM::Query::PQF('@attr 1=4 dinosaur'));
 $rs = $conn->search_pqf('@attr 1=4 dinosaur');

The principal purpose of a search-and-retrieve protocol is searching
(and, er, retrieval), so the principal method used on a Connection
object is C. It accepts a single argument, a C object (or, more
precisely, an object of a subclass of this class); and it creates and
returns a new ResultSet object representing the set of records
resulting from the search.

Since queries using PQF (Prefix Query Format) are so common, we make
them a special case by providing a C method. This is identical to C
except that it accepts a string containing the query rather than an
object, thereby obviating the need to create a C object. See the
documentation of that class for information about PQF.

=head4 scan()

Many Z39.50 servers allow you to browse their indexes to find terms to
search for. This is done using the C method, which creates and
returns a new ScanSet object representing the set of terms resulting
from the scan.

C takes a single argument, but it has to work hard: it specifies both
what index to scan for terms, and where in the index to start
scanning. What's more, the specification of what index to scan
includes multiple facets, such as what database fields it's an index
of (author, subject, title, etc.) and whether to scan for whole fields
or single words (e.g. the title ``I<The Empire Strikes Back>'', or the
four words ``Back'', ``Empire'', ``Strikes'' and ``The'', interleaved
with words from other titles in the same index).

All of this is done by using a single term from the PQF query as the C
argument. (At present, only PQF is supported, although there is no
reason in principle why CQL and other query syntaxes should not be
supported in future.) The attributes associated with the term indicate
which index is to be used, and the term itself indicates the point in
the index at which to start the scan. For example, if the argument is
C<@attr 1=4 fish>, then

=over 4

=item @attr 1=4

This is the BIB-1 attribute with type 1 (meaning access-point, which
specifies an index), and value 4 (which means ``title''). So the scan
is in the title index.

=item fish

Start the scan from the lexicographically earliest term that is equal
to or falls after ``fish''.

=back

The argument C<@attr 1=4 @attr 6=3 fish> would behave similarly; but
the BIB-1 attribute 6=3 means completeness=``complete field'', so the
scan would be for complete titles rather than for words occurring in
titles.

This takes a bit of getting used to.

The behaviour of C is affected by the following options, which may be
set on the Connection through which the scan is done:

=over 4

=item number [default: 10]

Indicates how many terms should be returned in the ScanSet. The number
actually returned may be less, if the start-point is near the end of
the index, but will not be greater.

=item position [default: 1]

A 1-based index specifying where in the returned list of terms the
seed-term should appear. By default it should be the first term
returned, but C may be set, for example, to zero (requesting the next
terms I<after> the seed-term), or to the same value as C (requesting
the index terms I<before> the seed term).

=item stepSize [default: 0]

An integer indicating how many indexed terms are to be skipped between
each one returned in the ScanSet. By default, no terms are skipped,
but overriding this can be useful to get a high-level overview of the
index.

=back

=head4 package()

 $p = $conn->package();
 $o = new ZOOM::Options();
 $o->option(databaseName => "newdb");
 $p = $conn->package($o);

Creates and returns a new C, to be used in invoking an Extended
Service. An options block may optionally be passed in. See the C
documentation.

=head4 destroy()

 $conn->destroy();

Destroys a Connection object, tearing down any low-level connection
associated with it and freeing its resources. It is an error to reuse
a Connection that has been destroyed.

=head2 ZOOM::ResultSet

 $rs = $conn->search_pqf('@attr 1=4 mineral');
 $n = $rs->size();
 for $i (1 .. $n) {
     $rec = $rs->record($i-1);
     print $rec->render();
 }

A ResultSet object represents the set of zero or more records
resulting from a search, and is the means whereby these records can be
retrieved. A ResultSet object may maintain a client-side cache of some,
less, none, all or more of the server's records: in general, this is
supposed to be an implementation detail of no interest to a typical
application, although more sophisticated applications do have
facilities for messing with the cache. Most applications will only
need the C, C and C methods.

There is no C method nor any other explicit constructor. The only way
to create a new ResultSet is by using C (or C) on a Connection.

See the description of the C class in the ZOOM Abstract API at
http://zoom.z3950.org/api/zoom-current.html#3.4

=head3 Methods

=head4 option()

 $rs->option(elementSetName => "f");

Allows options to be set into, and read from, a ResultSet, just like
the Connection class's C method. There is no C method for ResultSet
objects.

ResultSet options are listed at
http://indexdata.com/yaz/doc/zoom.resultsets.tkl

=head4 size()

 print "Found ", $rs->size(), " records\n";

Returns the number of records in the result set.

=head4 record(), record_immediate()

 $rec = $rs->record(0);
 $rec2 = $rs->record_immediate(0);
 $rec3 = $rs->record_immediate(1)
     or print "second record wasn't in cache\n";

The C method returns a C object representing a record from the
result-set, whose position is indicated by the argument passed in.
This is a zero-based index, so that legitimate values range from zero
to C<< $rs->size()-1 >>.

The C API is identical, but it never invokes a network operation,
merely returning the record from the ResultSet's cache if it's already
there, or an undefined value otherwise. So if you use this method,
B<you must always check the return value>.
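
Because C<record_immediate()> returns an undefined value for records
that are not yet cached, code that uses it must be prepared for gaps.
The sketch below is a hypothetical helper, not part of the ResultSet
API: it renders a range of records using only the cache, substituting
a placeholder line for any record that has not been retrieved, and so
never causes network traffic itself.

 use strict;
 use warnings;

 # Hypothetical helper (not part of ZOOM): render records
 # $first .. $first+$count-1 using only what is already in the
 # ResultSet's cache. record_immediate() never goes to the network and
 # returns undef for records that have not been retrieved, so every
 # result is checked before use.
 sub render_cached {
     my ($rs, $first, $count) = @_;
     my $size = $rs->size();
     my @lines;
     for my $i ($first .. $first + $count - 1) {
         last if $i >= $size;                  # stay inside the result set
         my $rec = $rs->record_immediate($i);
         push @lines, defined $rec ? $rec->render()
                                   : "[record $i not yet retrieved]";
     }
     return @lines;
 }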

=head4 records()

 $rs->records(0, 10, 0);
 for $i (0..9) {
     print $rs->record_immediate($i)->render();
 }
 @nextseven = $rs->records(10, 7, 1);

The C method only fetches records from the cache, whereas C fetches
them from the server if they have not already been cached; but the
ZOOM module has to guess what the most efficient strategy for this
is. It might fetch each record alone, when it is asked for: that's
optimal in an application that's only interested in the top hit from
each search, but pessimal for one that wants to display a whole list
of results. Conversely, the software's strategy might be always to ask
for blocks of twenty records: that's great for assembling long lists
of things, but wasteful when only one record is wanted. The problem is
that the ZOOM module can't tell, when you call C<< $rs->record() >>,
what your intention is.

But you can tell it. The C method fetches a sequence of records, all
in one go. It takes three arguments: the first is the zero-based index
of the first record in the sequence, the second is the number of
records to fetch, and the third is a boolean indication of whether or
not to return the retrieved records as well as adding them to the
cache. (You can always pass 1 for this if you like, and Perl will
discard the unused return value, but there is a small efficiency gain
to be had by passing 0.)

Once the records have been retrieved from the server (i.e. C has
completed without throwing an exception), they can be fetched much
more efficiently using C - or C.

=head4 cache_reset()

 $rs->cache_reset();

Resets the ResultSet's record cache, so that subsequent invocations of
C will fail. I struggle to imagine a real scenario where you'd want to
do this.

=head4 sort()

 if ($rs->sort("yaz", "1=4 >i") < 0) {
     die "sort failed";
 }

Sorts the ResultSet in place. ###

=head4 destroy()

 $rs->destroy();

Destroys a ResultSet object, freeing its resources. It is an error to
reuse a ResultSet that has been destroyed.
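
The C<records()> method described above lends itself to paging through
a whole result set a block at a time. In the following sketch,
C<for_each_record()> is a hypothetical helper, not a ResultSet method:
it issues one C<records()> request per block, shortens the final block
so that it never asks for records beyond C<size()>, and hands each
record (which may be undefined, as noted under ZOOM::Exception below)
to a callback along with its zero-based position.

 use strict;
 use warnings;

 # Hypothetical helper (not part of ZOOM): visit every record in a
 # ResultSet, fetching them from the server in blocks of $block with one
 # records() call per block rather than one request per record.
 sub for_each_record {
     my ($rs, $block, $callback) = @_;
     die "block size must be positive\n" unless $block > 0;
     my $n = $rs->size();
     for (my $start = 0; $start < $n; $start += $block) {
         # Never ask for records beyond the end of the result set
         my $count = $n - $start < $block ? $n - $start : $block;
         # Third argument 1: return the records as well as caching them
         my @recs = $rs->records($start, $count, 1);
         for my $j (0 .. $#recs) {
             # Elements may be undef; the callback must allow for that
             $callback->($start + $j, $recs[$j]);
         }
     }
     return;
 }

For example, to print a whole result set twenty records at a time:

 for_each_record($rs, 20, sub {
     my ($i, $rec) = @_;
     print defined $rec ? $rec->render() : "record $i unavailable\n";
 });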

=head2 ZOOM::Record

I<###>

=head2 ZOOM::Exception

In general, method calls throw an exception (of class C) if anything
goes wrong, so you don't need to test for success after each call.
Exceptions are caught by enclosing the main code in an C block and
checking C<$@> on exit from that block, as in the code-sample above.

There are a small number of exceptions to this rule: the three
record-fetching methods in the C class, C, C, and C can all return
undefined values for legitimate reasons, under circumstances that do
not merit throwing an exception. For this reason, the return values of
these methods should be checked. See the individual methods'
documentation for details.

=head3 Methods

I<###>

=head2 ZOOM::ScanSet

I<###>

=head2 ZOOM::Package

I<###>

=head2 ZOOM::Query

I<###>

=head2 ZOOM::Options

I<###>

=head1 ENUMERATIONS

The ZOOM module provides two enumerations that list possible return
values from particular functions. They are described in the following
sections.

=head2 ZOOM::Error

 if ($@->code() == ZOOM::Error::QUERY_PQF) {
     return "your query was not accepted";
 }

This class provides a set of manifest constants representing some of
the possible error codes that can be raised by the ZOOM module. The
methods that return error-codes are C, C and C.

The C class provides the constants C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C and C,
each of which specifies a client-side error. Since errors may also be
diagnosed by the server, and returned to the client, error codes may
also take values from the BIB-1 diagnostic set of Z39.50, listed at
the Z39.50 Maintenance Agency's web-site at
http://www.loc.gov/z3950/agency/defns/bib1diag.html

All error-codes, whether client-side from the C enumeration or
server-side from the BIB-1 diagnostic set, can be translated into
human-readable messages by passing them to the C utility function.
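
Putting these pieces together, an application might map the error code
of a caught exception to a message as in the sketch below.
C<explain_zoom_error()> is a hypothetical helper; it assumes that
C<use ZOOM;> has already been done and that C<$@> holds a ZOOM
exception rather than some other error, and it calls the constant as
C<ZOOM::Error::QUERY_PQF()>, which is equivalent to the bareword form
used in the example above.

 use strict;
 use warnings;

 # Hypothetical helper (not part of ZOOM): turn the error code of a
 # caught ZOOM::Exception into a message for the user. Assumes
 # "use ZOOM;" has already been done, so the ZOOM::Error constants and
 # ZOOM::diag_str() are available.
 sub explain_zoom_error {
     my ($e) = @_;
     my $code = $e->code();
     # A client-side error that this application words specially
     return "your query was not accepted"
         if $code == ZOOM::Error::QUERY_PQF();
     # Anything else, client-side or BIB-1, is translated by diag_str()
     return "error $code: " . ZOOM::diag_str($code);
 }

Typical use:

 eval { $rs = $conn->search_pqf($query) };
 print explain_zoom_error($@), "\n" if $@;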

=head2 ZOOM::Event

 if ($conn->last_event() == ZOOM::Event::CONNECT) {
     print "Connected!\n";
 }

In applications that need it - mostly complex multiplexing
applications - the C method is used to return an indication of the
last event that occurred on a particular connection. It always returns
a value drawn from this enumeration, that is, one of C, C, C, C, C,
C, C, C, C or C.

You almost certainly don't need to know about this. Frankly, I'm not
sure how to use it myself.

=head1 SEE ALSO

The ZOOM abstract API,
http://zoom.z3950.org/api/zoom-current.html

The C module, included in the same distribution as this one.

The C module, which this one supersedes.
http://perl.z3950.org/

The documentation for the ZOOM-C module of the YAZ Toolkit, which this
module is built on. Specifically, its lists of options are useful.
http://indexdata.com/yaz/doc/zoom.tkl

The BIB-1 diagnostic set of Z39.50,
http://www.loc.gov/z3950/agency/defns/bib1diag.html

=head1 AUTHOR

Mike Taylor, E<lt>mike@indexdata.comE<gt>

=head1 COPYRIGHT AND LICENCE

Copyright (C) 2005 by Index Data.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.4 or,
at your option, any later version of Perl 5 you may have available.

=cut

1;