# $Id: ZOOM.pod,v 1.9 2005-11-17 15:31:06 mike Exp $

use strict;
use warnings;

=head1 NAME

ZOOM - Perl extension implementing the ZOOM API for Information Retrieval

=head1 SYNOPSIS

    use ZOOM;
    eval {
        $conn = new ZOOM::Connection($host, $port);
        $conn->option(preferredRecordSyntax => "usmarc");
        $rs = $conn->search_pqf('@attr 1=4 dinosaur');
        $n = $rs->size();
        print $rs->record(0)->render();
    };
    if ($@) {
        print "Error ", $@->code(), ": ", $@->message(), "\n";
    }

=head1 DESCRIPTION

This module provides a nice, Perlish implementation of the ZOOM
Abstract API described and documented at
http://zoom.z3950.org/api/

The ZOOM module is implemented as a set of thin classes on top of the
non-OO functions provided by this distribution's C<Net::Z3950::ZOOM>
module, which in turn is a thin layer on top of the ZOOM-C code
supplied as part of Index Data's YAZ Toolkit. Because ZOOM-C is also
the underlying code that implements ZOOM bindings in C++, Visual
Basic, Scheme, Ruby, .NET (including C#) and other languages, this
Perl module works compatibly with those other implementations. (Of
course, the point of a public API such as ZOOM is that all
implementations should be compatible anyway; but knowing that the same
code is running is reassuring.)

The ZOOM module provides two enumerations (C<ZOOM::Error> and
C<ZOOM::Event>), a single utility function C<diag_str()> in the
C<ZOOM> package itself, and eight classes:
C<ZOOM::Connection>,
C<ZOOM::ResultSet>,
C<ZOOM::Record>,
C<ZOOM::Exception>,
C<ZOOM::ScanSet>,
C<ZOOM::Package>,
C<ZOOM::Query>
and
C<ZOOM::Options>.
Of these, the Query class is abstract, and has two concrete
subclasses:
C<ZOOM::Query::CQL>
and
C<ZOOM::Query::PQF>.

Many useful ZOOM applications can be built using only the Connection,
ResultSet, Record and Exception classes, as in the example
code-snippet above.

A typical application will begin by creating a Connection object,
then use that to execute searches that yield ResultSet objects, then
fetch records from the result-sets to yield Record objects. If an
error occurs, an Exception object is thrown and can be dealt with.
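The control flow described above (run the calls inside an C<eval>
block, then inspect C<$@>) can be tried out without a live server. The
following is only a minimal sketch of that pattern:
C<Demo::Exception>, C<risky_search()> and C<run_search()> are
hypothetical stand-ins invented for this illustration and are not part
of ZOOM. Only the accessor names C<code()> and C<message()> mirror
those used in the SYNOPSIS.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for an exception class; NOT part of ZOOM.
package Demo::Exception;

sub new {
    my ($class, $code, $message) = @_;
    return bless { code => $code, message => $message }, $class;
}
sub code    { return $_[0]{code} }
sub message { return $_[0]{message} }

package main;

# Hypothetical operation that fails by throwing an exception object.
sub risky_search {
    my ($query) = @_;
    die Demo::Exception->new(1, "empty query") if $query eq '';
    return "results for '$query'";
}

# Run the operation inside eval; return either its result or a
# formatted description of the exception that was thrown.
sub run_search {
    my ($query) = @_;
    my $result = eval { risky_search($query) };
    if (my $err = $@) {
        # Only call methods on $@ if it really is one of our objects.
        if (blessed($err) && $err->isa('Demo::Exception')) {
            return "Error " . $err->code() . ": " . $err->message();
        }
        die $err;    # rethrow anything unexpected
    }
    return $result;
}

print run_search('dinosaur'), "\n";
print run_search(''), "\n";
```

Checking that C<$@> is a blessed object of the expected class before
calling methods on it guards against plain-string errors raised by
unrelated code inside the same C<eval> block.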

More sophisticated applications might also browse the server's indexes
to create a ScanSet, from which indexed terms may be retrieved; others
might send ``Extended Services'' Packages to the server, to achieve
non-standard tasks such as database creation and record update.
Searching using a query syntax other than PQF can be done using a
query object of one of the Query subclasses. Finally, sets of options
may be manipulated independently of the objects they are associated
with using an Options object.

In general, method calls throw an exception if anything goes wrong, so
you don't need to test for success after each call. See the section
below on the Exception class for details.

=head1 UTILITY FUNCTION

=head2 ZOOM::diag_str()

    $msg = ZOOM::diag_str(ZOOM::Error::INVALID_QUERY);

Returns a human-readable English-language string corresponding to the
error code that is its only parameter. This works for any error-code
returned from
C<ZOOM::Exception::code()>,
C<ZOOM::Connection::error_x()>
or
C<ZOOM::Connection::errcode()>,
irrespective of whether it is a member of the C<ZOOM::Error>
enumeration or drawn from the BIB-1 diagnostic set.

=head1 CLASSES

The eight ZOOM classes are described here in ``sensible order'':
first, the four commonly used classes, in the order that they will
tend to be used in most programs (Connection, ResultSet, Record,
Exception); then the four more esoteric classes in descending order of
how often they are needed.

With the exception of the Options class, which is an extension to the
ZOOM model, the introduction to each class includes a link to the
relevant section of the ZOOM Abstract API.

=head2 ZOOM::Connection

    $conn = new ZOOM::Connection("indexdata.dk:210/gils");
    print("server is '", $conn->option("serverImplementationName"), "'\n");
    $conn->option(preferredRecordSyntax => "usmarc");
    $rs = $conn->search_pqf('@attr 1=4 mineral');
    $ss = $conn->scan('@attr 1=1003 a');
    if ($conn->errcode() != 0) {
        die("something went wrong: " .
            $conn->errmsg())
    }
    $conn->destroy();

This class represents a connection to an information retrieval server,
using an IR protocol such as ANSI/NISO Z39.50, SRW (the
Search/Retrieve Webservice), SRU (the Search/Retrieve URL) or
OpenSearch. Not all of these protocols require a low-level connection
to be maintained, but the Connection object nevertheless provides a
location for the necessary cache of configuration and state
information, as well as a uniform API to the connection-oriented
facilities (searching, index browsing, etc.) provided by these
protocols.

See the description of the C<Connection> class in the ZOOM Abstract
API at
http://zoom.z3950.org/api/zoom-current.html#3.2

=head3 Methods

=head4 new()

    $conn = new ZOOM::Connection("indexdata.dk", 210);
    $conn = new ZOOM::Connection("indexdata.dk:210/gils");
    $conn = new ZOOM::Connection("tcp:indexdata.dk:210/gils");
    $conn = new ZOOM::Connection("http:indexdata.dk:210/gils");

Creates a new Connection object, and immediately connects it to the
specified server. If you want to make a new Connection object but
delay forging the connection, use the C<create()> and C<connect()>
methods instead.

This constructor can be called with two arguments or a single
argument. In the former case, the arguments are the name and port
number of the Z39.50 server to connect to; in the latter case, the
single argument is a YAZ service-specifier string of the form

=over 4

=item [I<scheme>:]I<host>[:I<port>][/I<databaseName>]

=back

in which the I<host> and I<port> parts are as in the two-argument
form, the I<databaseName>, if provided, specifies the name of the
database to be used in subsequent searches on this connection, and the
optional I<scheme> (default C<tcp>) indicates what protocol should be
used. At present, the following schemes are supported:

=over 4

=item tcp

Z39.50 connection.

=item ssl

Z39.50 connection encrypted using SSL (Secure Sockets Layer). Not
many servers support this, but Index Data's Zebra is one that does.

=item unix

Z39.50 connection on a Unix-domain (local) socket, in which case the
I<host> portion of the string is instead used as a filename in the
local filesystem.

=item http

SRW connection using SOAP over HTTP.

=back

Support for SRU will follow in the fullness of time.

If an error occurs, an exception is thrown. This may indicate a
networking problem (e.g. the host is not found or unreachable), or a
protocol-level problem (e.g. a Z39.50 server rejected the Init
request).

=head4 create() / connect()

    $options = new ZOOM::Options();
    $options->option(implementationName => "my client");
    $conn = create ZOOM::Connection($options);
    $conn->connect($host, 0);

The usual Connection constructor, C<new()>, brings a new object into
existence and forges the connection to the server all in one
operation, which is often what you want. For applications that need
more control, however, these two methods separate the two steps,
allowing additional steps in between such as the setting of options.

C<create()> creates and returns a new Connection object, which is
I<not> connected to any server. It may be passed an options block, of
type C<ZOOM::Options> (see below), into which options may be set
before or after the creation of the Connection. The connection to the
server may then be forged by the C<connect()> method, the arguments of
which are the same as those of the C<new()> constructor.

=head4 error_x() / errcode() / errmsg() / addinfo() / diagset()

    ($errcode, $errmsg, $addinfo, $diagset) = $conn->error_x();
    $errcode = $conn->errcode();
    $errmsg = $conn->errmsg();
    $addinfo = $conn->addinfo();
    $diagset = $conn->diagset();

These methods may be used to obtain information about the last error
to have occurred on a connection - although typically they will not be
used, as the same information is available through the
C<ZOOM::Exception> that is thrown when the error occurs. The
C<errcode()>,
C<errmsg()>,
C<addinfo()>
and
C<diagset()>
methods each return one element of the diagnostic, and
C<error_x()>
returns all four at once.

See the C<ZOOM::Exception> class for the interpretation of these
elements.
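As an illustration of consuming the four-element list returned by
C<error_x()>, the sketch below folds the elements into one printable
line. C<format_diagnostic()> is a hypothetical helper written for this
example, not part of ZOOM, and the sample values passed to it are
illustrative only. It assumes, as the C<errcode() != 0> test earlier
in this document suggests, that an error code of zero means no error.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of ZOOM: turn the four-element list
# ($errcode, $errmsg, $addinfo, $diagset) into one printable line.
sub format_diagnostic {
    my ($errcode, $errmsg, $addinfo, $diagset) = @_;

    # Assumption: a zero (or undefined) error code means "no error".
    return "no error" if !$errcode;

    $diagset = "unknown" if !defined $diagset || $diagset eq '';
    $errmsg  = ""        if !defined $errmsg;

    my $text = "[$diagset] error $errcode: $errmsg";
    # Additional information is optional, so append it only if present.
    $text .= " ($addinfo)" if defined $addinfo && $addinfo ne '';
    return $text;
}

# Illustrative values only; with a real connection one might write
#     print format_diagnostic($conn->error_x()), "\n";
print format_diagnostic(0, "", "", ""), "\n";
print format_diagnostic(109, "Database unavailable", "nosuchdb", "Bib-1"), "\n";
```

Keeping the formatting in one place means the same routine can serve
both the polling style shown here and an exception handler, provided
the handler can supply the same four values.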

=head4 option() / option_binary()

    print("server is '", $conn->option("serverImplementationName"), "'\n");
    $conn->option(preferredRecordSyntax => "usmarc");
    $conn->option_binary(iconBlob => "foo\0bar");
    die if length($conn->option_binary("iconBlob")) != 7;

Objects of the Connection, ResultSet, ScanSet and Package classes
carry with them a set of named options which affect their behaviour in
certain ways. See the ZOOM-C options documentation for details:

=over 4

=item *

Connection options are listed at
http://indexdata.com/yaz/doc/zoom.tkl#zoom.connections

=item *

ResultSet options are listed at
http://indexdata.com/yaz/doc/zoom.resultsets.tkl
I<### move this observation down to the appropriate place>

=item *

ScanSet options are listed at
http://indexdata.com/yaz/doc/zoom.scan.tkl
I<### move this observation down to the appropriate place>

=item *

Package options are listed at
http://indexdata.com/yaz/doc/zoom.ext.html
I<### move this observation down to the appropriate place>

=back

These options are set and fetched using the C<option()> method, which
may be called with either one or two arguments. In the two-argument
form, the option named by the first argument is set to the value of
the second argument, and its old value is returned. In the
one-argument form, the value of the specified option is returned.

For historical reasons, option values are not binary-clean, so that a
value containing a NUL byte will be returned in truncated form. The
C<option_binary()> method behaves identically to C<option()> except
that it is binary-clean, so that values containing NUL bytes are set
and returned correctly.

=head4 search() / search_pqf()

    $rs = $conn->search(new ZOOM::Query::CQL('title=dinosaur'));
    # The next two lines are equivalent
    $rs = $conn->search(new ZOOM::Query::PQF('@attr 1=4 dinosaur'));
    $rs = $conn->search_pqf('@attr 1=4 dinosaur');

The principal purpose of a search-and-retrieve protocol is searching
(and, er, retrieval), so the principal method used on a Connection
object is C<search()>.
It accepts a single argument, a C<ZOOM::Query> object (or, more
precisely, an object of a subclass of this class); and it creates and
returns a new ResultSet object representing the set of records
resulting from the search.

Since queries using PQF (Prefix Query Format) are so common, we make
them a special case by providing a C<search_pqf()> method. This is
identical to C<search()> except that it accepts a string containing
the query rather than an object, thereby obviating the need to create
a C<ZOOM::Query::PQF> object. See the documentation of that class for
information about PQF.

=head4 scan()

Many Z39.50 servers allow you to browse their indexes to find terms to
search for. This is done using the C<scan()> method, which creates
and returns a new ScanSet object representing the set of terms
resulting from the scan.

C<scan()> takes a single argument, but it has to work hard: it
specifies both what index to scan for terms, and where in the index to
start scanning. What's more, the specification of what index to scan
includes multiple facets, such as what database fields it's an index
of (author, subject, title, etc.) and whether to scan for whole fields
or single words (e.g. the title ``I<The Empire Strikes Back>'', or the
four words ``Back'', ``Empire'', ``Strikes'' and ``The'', interleaved
with words from other titles in the same index).

All of this is done by using a single term from the PQF query as the
C<scan()> argument. (At present, only PQF is supported, although
there is no reason in principle why CQL and other query syntaxes
should not be supported in future). The attributes associated with
the term indicate which index is to be used, and the term itself
indicates the point in the index at which to start the scan. For
example, if the argument is C<@attr 1=4 fish>, then

=over 4

=item @attr 1=4

This is the BIB-1 attribute with type 1 (meaning access-point, which
specifies an index), and value 4 (which means ``title''). So the scan
is in the title index.

=item fish

Start the scan from the lexicographically earliest term that is equal
to or falls after ``fish''.

=back

The argument C<@attr 1=4 @attr 6=3 fish> would behave similarly; but
the BIB-1 attribute 6=3 means completeness=``complete field'', so the
scan would be for complete titles rather than for words occurring in
titles.

This takes a bit of getting used to.

I<###> discuss how the values of options affect scanning.

=head4 package()

    $p = $conn->package();
    $o = new ZOOM::Options();
    $o->option(databaseName => "newdb");
    $p = $conn->package($o);

Creates and returns a new C<ZOOM::Package>, to be used in invoking an
Extended Service. An options block may optionally be passed in. See
the C<ZOOM::Package> documentation.

=head4 destroy()

    $conn->destroy();

Destroys a Connection object, tearing down any low-level connection
associated with it and freeing its resources. It is an error to reuse
a Connection that has been C<destroy()>ed.

=head2 ZOOM::ResultSet

I<###>

=head2 ZOOM::Record

I<###>

=head2 ZOOM::Exception

In general, method calls throw an exception (of class
C<ZOOM::Exception>) if anything goes wrong, so you don't need to test
for success after each call. Exceptions are caught by enclosing the
main code in an C<eval{}> block and checking C<$@> on exit from that
block, as in the code-sample above.

There are a small number of exceptions to this rule: the three
record-fetching methods in the C<ZOOM::ResultSet> class,
C<record()>,
C<record_immediate()>
and
C<records()>,
can all return undefined values for legitimate reasons, under
circumstances that do not merit throwing an exception. For this
reason, the return values of these methods should be checked. See the
individual methods' documentation for details.

=head3 Methods

I<###>

=head2 ZOOM::ScanSet

I<###>

=head2 ZOOM::Package

I<###>

=head2 ZOOM::Query

I<###>

=head2 ZOOM::Options

I<###>

=head1 ENUMERATIONS

The ZOOM module provides two enumerations that list possible return
values from particular functions. They are described in the following
sections.

=head2 ZOOM::Error

    if ($@->code() == ZOOM::Error::QUERY_PQF) {
        return "your query was not accepted";
    }

This class provides a set of manifest constants representing some of
the possible error codes that can be raised by the ZOOM module. The
methods that return error-codes are
C<ZOOM::Exception::code()>,
C<ZOOM::Connection::error_x()>
and
C<ZOOM::Connection::errcode()>.

The C<ZOOM::Error> class provides the constants
C<NONE>,
C<CONNECT>,
C<MEMORY>,
C<ENCODE>,
C<DECODE>,
C<CONNECTION_LOST>,
C<INIT>,
C<INTERNAL>,
C<TIMEOUT>,
C<UNSUPPORTED_PROTOCOL>,
C<UNSUPPORTED_QUERY>,
C<INVALID_QUERY>,
C<CREATE_QUERY>,
C<QUERY_CQL>,
C<QUERY_PQF>,
C<SORTBY>,
C<CLONE>,
C<PACKAGE>
and
C<SCANTERM>,
each of which specifies a client-side error. Since errors may also be
diagnosed by the server, and returned to the client, error codes may
also take values from the BIB-1 diagnostic set of Z39.50, listed at
the Z39.50 Maintenance Agency's web-site at
http://www.loc.gov/z3950/agency/defns/bib1diag.html

All error-codes, whether client-side from the C<ZOOM::Error>
enumeration or server-side from the BIB-1 diagnostic set, can be
translated into human-readable messages by passing them to the
C<ZOOM::diag_str()> utility function.

=head2 ZOOM::Event

    if ($conn->last_event() == ZOOM::Event::CONNECT) {
        print "Connected!\n";
    }

In applications that need it - mostly complex multiplexing
applications - the C<ZOOM::Connection::last_event()> method is used to
return an indication of the last event that occurred on a particular
connection. It always returns a value drawn from this enumeration,
that is, one of
C<NONE>,
C<CONNECT>,
C<SEND_DATA>,
C<RECV_DATA>,
C<TIMEOUT>,
C<UNKNOWN>,
C<SEND_APDU>,
C<RECV_APDU>,
C<RECV_RECORD>
or
C<RECV_SEARCH>.

You almost certainly don't need to know about this. Frankly, I'm not
sure how to use it myself.

=head1 SEE ALSO

The ZOOM abstract API,
http://zoom.z3950.org/api/zoom-current.html

The C<Net::Z3950::ZOOM> module, included in the same distribution as
this one.

The C<Net::Z3950> module, which this one supersedes.
http://perl.z3950.org/

The documentation for the ZOOM-C module of the YAZ Toolkit, which this
module is built on. Specifically, its lists of options are useful.
http://indexdata.com/yaz/doc/zoom.tkl

The BIB-1 diagnostic set of Z39.50,
http://www.loc.gov/z3950/agency/defns/bib1diag.html

=head1 AUTHOR

Mike Taylor, E<lt>mike@indexdata.comE<gt>

=head1 COPYRIGHT AND LICENCE

Copyright (C) 2005 by Index Data.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.4 or,
at your option, any later version of Perl 5 you may have available.

=cut

1;