# $Id: ZOOM.pod,v 1.10 2005-11-18 17:55:08 mike Exp $

use strict;
use warnings;

=head1 NAME

ZOOM - Perl extension implementing the ZOOM API for Information Retrieval

=head1 SYNOPSIS

 use ZOOM;
 eval {
     $conn = new ZOOM::Connection($host, $port);
     $conn->option(preferredRecordSyntax => "usmarc");
     $rs = $conn->search_pqf('@attr 1=4 dinosaur');
     $n = $rs->size();
     print $rs->record(0)->render();
 };
 if ($@) {
     print "Error ", $@->code(), ": ", $@->message(), "\n";
 }

=head1 DESCRIPTION

This module provides a nice, Perlish implementation of the ZOOM
Abstract API described and documented at http://zoom.z3950.org/api/

The ZOOM module is implemented as a set of thin classes on top of the
non-OO functions provided by this distribution's C<Net::Z3950::ZOOM>
module, which in turn is a thin layer on top of the ZOOM-C code
supplied as part of
Index Data's YAZ Toolkit.  Because ZOOM-C is also the underlying code
that implements ZOOM bindings in C++, Visual Basic, Scheme, Ruby, .NET
(including C#) and other languages, this Perl module works compatibly
with those other implementations.  (Of course, the point of a public
API such as ZOOM is that all implementations should be compatible
anyway; but knowing that the same code is running is reassuring.)

The ZOOM module provides two enumerations (C<ZOOM::Error> and
C<ZOOM::Event>), a single utility function C<diag_str()> in the C<ZOOM>
package itself, and eight classes:
C<ZOOM::Exception>,
C<ZOOM::Options>,
C<ZOOM::Connection>,
C<ZOOM::Query>,
C<ZOOM::ResultSet>,
C<ZOOM::Record>,
C<ZOOM::ScanSet>
and
C<ZOOM::Package>.
Of these, the Query class is abstract, and has two concrete
subclasses:
C<ZOOM::Query::CQL>
and
C<ZOOM::Query::PQF>.
Many useful ZOOM applications can be built using only the Connection,
ResultSet, Record and Exception classes, as in the example
code-snippet above.

A typical application will begin by creating a Connection object,
then using that to execute searches that yield ResultSet objects, then
fetching records from the result-sets to yield Record objects.  If an
error occurs, an Exception object is thrown and can be dealt with.

More sophisticated applications might also browse the server's indexes
to create a ScanSet, from which indexed terms may be retrieved; others
might send ``Extended Services'' Packages to the server, to achieve
non-standard tasks such as database creation and record update.
Searching using a query syntax other than PQF can be done using a
query object of one of the Query subclasses.  Finally, sets of options
may be manipulated independently of the objects they are associated
with using an Options object.

In general, method calls throw an exception if anything goes wrong, so
you don't need to test for success after each call.  See the section
below on the Exception class for details.

=head1 UTILITY FUNCTION

=head2 ZOOM::diag_str()

 $msg = ZOOM::diag_str(ZOOM::Error::INVALID_QUERY);

Returns a human-readable English-language string corresponding to the
error code passed as its parameter.  This works for any error code
returned from
C<ZOOM::Exception::code()>,
C<ZOOM::Connection::error_x()>
or
C<ZOOM::Connection::errcode()>,
irrespective of whether it is a member of the C<ZOOM::Error>
enumeration or drawn from the BIB-1 diagnostic set.

=head1 CLASSES

The eight ZOOM classes are described here in ``sensible order'':
first, the four commonly used classes, in the order that they will
tend to be used in most programs (Connection, ResultSet, Record,
Exception); then the four more esoteric classes in descending order of
how often they are needed.

With the exception of the Options class, which is an extension to the
ZOOM model, the introduction to each class includes a link to the
relevant section of the ZOOM Abstract API.

=head2 ZOOM::Connection

 $conn = new ZOOM::Connection("indexdata.dk:210/gils");
 print("server is '", $conn->option("serverImplementationName"), "'\n");
 $conn->option(preferredRecordSyntax => "usmarc");
 $rs = $conn->search_pqf('@attr 1=4 mineral');
 $ss = $conn->scan('@attr 1=1003 a');
 if ($conn->errcode() != 0) {
    die("something went wrong: " . $conn->errmsg());
 }
 $conn->destroy();

This class represents a connection to an information retrieval server,
using an IR protocol such as ANSI/NISO Z39.50, SRW (the
Search/Retrieve Webservice), SRU (the Search/Retrieve URL) or
OpenSearch.  Not all of these protocols require a low-level connection
to be maintained, but the Connection object nevertheless provides a
location for the necessary cache of configuration and state
information, as well as a uniform API to the connection-oriented
facilities (searching, index browsing, etc.), provided by these
protocols.

See the description of the C<Connection> class in the ZOOM Abstract
API at
http://zoom.z3950.org/api/zoom-current.html#3.2

=head3 Methods

=head4 new()

 $conn = new ZOOM::Connection("indexdata.dk", 210);
 $conn = new ZOOM::Connection("indexdata.dk:210/gils");
 $conn = new ZOOM::Connection("tcp:indexdata.dk:210/gils");
 $conn = new ZOOM::Connection("http:indexdata.dk:210/gils");

Creates a new Connection object, and immediately connects it to the
specified server.  If you want to make a new Connection object but
delay forging the connection, use the C<create()> and C<connect()>
methods instead.

This constructor can be called with two arguments or a single
argument.  In the former case, the arguments are the name and port
number of the Z39.50 server to connect to; in the latter case, the
single argument is a YAZ service-specifier string of the form

=over 4

=item

[I<scheme>:]I<host>[:I<port>][/I<databaseName>]

=back

In which the I<host> and I<port> parts are as in the two-argument
form, the I<databaseName> if provided specifies the name of the
database to be used in subsequent searches on this connection, and the
optional I<scheme> (default C<tcp>) indicates what protocol should be
used.  At present, the following schemes are supported:

=over 4

=item tcp

Z39.50 connection.

=item ssl

Z39.50 connection encrypted using SSL (Secure Sockets Layer).  Not
many servers support this, but Index Data's Zebra is one that does.

=item unix

Z39.50 connection on a Unix-domain (local) socket, in which case the
I<host> portion of the string is instead used as a filename in the
local filesystem.

=item http

SRW connection using SOAP over HTTP.

=back

Support for SRU will follow in the fullness of time.
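
The service-specifier syntax above can be illustrated with a small,
self-contained sketch.  The helper C<parse_zurl()> is hypothetical: it
is not part of ZOOM or YAZ, it only approximates the grammar shown, and
it does not cope with C<unix> filenames that contain slashes.  It
leaves the port undefined when none is given rather than guessing a
default.

 use strict;
 use warnings;

 # Hypothetical helper: split a service specifier of the form
 # [scheme:]host[:port][/databaseName] into its parts.  An approximation
 # of the syntax described above, not YAZ's own parser.
 sub parse_zurl {
     my ($spec) = @_;
     $spec =~ m{^(?:(tcp|ssl|unix|http):)?([^:/]+)(?::(\d+))?(?:/(.*))?$}
         or return;
     return {
         scheme   => defined $1 ? $1 : "tcp",   # tcp is the documented default
         host     => $2,
         port     => $3,                        # undef if no port was given
         database => $4,                        # undef if no database was given
     };
 }

 my $z = parse_zurl("http:indexdata.dk:210/gils");
 print "$z->{scheme} $z->{host} $z->{port} $z->{database}\n";

Only C<tcp>, C<ssl>, C<unix> and C<http> are recognised as schemes,
matching the list above; a string with any other scheme prefix simply
fails to parse, and the helper returns C<undef>.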

If an error occurs, an exception is thrown.  This may indicate a
networking problem (e.g. the host is not found or unreachable), or a
protocol-level problem (e.g. a Z39.50 server rejected the Init
request).

=head4 create() / connect()

 $options = new ZOOM::Options();
 $options->option(implementationName => "my client");
 $conn = create ZOOM::Connection($options);
 $conn->connect($host, 0);

The usual Connection constructor, C<new()>, brings a new object into
existence and forges the connection to the server all in one
operation, which is often what you want.  For applications that need
more control, however, these two methods separate the two steps,
allowing additional steps in between such as the setting of options.

C<create()> creates and returns a new Connection object, which is
I<not> connected to any server.  It may be passed an options block, of
type C<ZOOM::Options> (see below), into which options may be set
before or after the creation of the Connection.  The connection to the
server may then be forged by the C<connect()> method, the arguments of
which are the same as those of the C<new()> constructor.

=head4 error_x() / errcode() / errmsg() / addinfo() / diagset()

 ($errcode, $errmsg, $addinfo, $diagset) = $conn->error_x();
 $errcode = $conn->errcode();
 $errmsg = $conn->errmsg();
 $addinfo = $conn->addinfo();
 $diagset = $conn->diagset();

These methods may be used to obtain information about the last error
to have occurred on a connection - although typically they will not
be used, as the same information is available through the
C<ZOOM::Exception> that is thrown when the error occurs.  The
C<errcode()>,
C<errmsg()>,
C<addinfo()>
and
C<diagset()>
methods each return one element of the diagnostic, and
C<error_x()>
returns all four at once.

See the C<ZOOM::Exception> documentation for the interpretation of
these elements.

=head4 option() / option_binary()

 print("server is '", $conn->option("serverImplementationName"), "'\n");
 $conn->option(preferredRecordSyntax => "usmarc");
 $conn->option_binary(iconBlob => "foo\0bar");
 die if length($conn->option_binary("iconBlob")) != 7;

Objects of the Connection, ResultSet, ScanSet and Package classes
carry with them a set of named options which affect their behaviour in
certain ways.  See the ZOOM-C options documentation for details:

=over 4

=item *

Connection options are listed at
http://indexdata.com/yaz/doc/zoom.tkl#zoom.connections

=item *

ScanSet options are listed at
http://indexdata.com/yaz/doc/zoom.scan.tkl
I<### move this observation down to the appropriate place>

=item *

Package options are listed at
http://indexdata.com/yaz/doc/zoom.ext.html
I<### move this observation down to the appropriate place>

=back

These options are set and fetched using the C<option()> method, which
may be called with either one or two arguments.  In the two-argument
form, the option named by the first argument is set to the value of
the second argument, and its old value is returned.  In the
one-argument form, the value of the specified option is returned.

For historical reasons, option values are not binary-clean, so that a
value containing a NUL byte will be returned in truncated form.  The
C<option_binary()> method behaves identically to C<option()> except
that it is binary-clean, so that values containing NUL bytes are set
and returned correctly.
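
To make the one- and two-argument forms, and the NUL truncation,
concrete, here is a toy model in plain Perl.  C<ToyOptions> is a
hypothetical class invented for this sketch; it imitates the behaviour
described above but is not how C<ZOOM::Options> or ZOOM-C actually
store options.

 use strict;
 use warnings;

 # Hypothetical stand-in class that mimics option() / option_binary().
 package ToyOptions;

 sub new { return bless { opts => {} }, shift }

 # Binary-clean accessor: one argument gets; two arguments set the new
 # value and return the old one.
 sub option_binary {
     my ($self, $key, @value) = @_;
     my $old = $self->{opts}{$key};
     $self->{opts}{$key} = $value[0] if @value;
     return $old;
 }

 # Non-binary-clean accessor: models C strings, which stop at the first
 # NUL byte, both when setting and when returning a value.
 sub option {
     my ($self, $key, @value) = @_;
     for (@value) { s/\0.*//s if defined $_ }
     my $old = $self->option_binary($key, @value);
     $old =~ s/\0.*//s if defined $old;
     return $old;
 }

 package main;

 my $o = ToyOptions->new();
 $o->option(preferredRecordSyntax => "usmarc");
 my $prev = $o->option(preferredRecordSyntax => "xml");   # old value: "usmarc"
 $o->option_binary(iconBlob => "foo\0bar");
 print length($o->option("iconBlob")), " ",
       length($o->option_binary("iconBlob")), "\n";        # prints "3 7"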

=head4 search() / search_pqf()

 $rs = $conn->search(new ZOOM::Query::CQL('title=dinosaur'));
 # The next two lines are equivalent
 $rs = $conn->search(new ZOOM::Query::PQF('@attr 1=4 dinosaur'));
 $rs = $conn->search_pqf('@attr 1=4 dinosaur');

The principal purpose of a search-and-retrieve protocol is searching
(and, er, retrieval), so the principal method used on a Connection
object is C<search()>.  It accepts a single argument, a C<ZOOM::Query>
object (or, more precisely, an object of a subclass of this class);
and it creates and returns a new ResultSet object representing the set
of records resulting from the search.

Since queries using PQF (Prefix Query Format) are so common, we make
them a special case by providing a C<search_pqf()> method.  This is
identical to C<search()> except that it accepts a string containing
the query rather than an object, thereby obviating the need to create
a C<ZOOM::Query::PQF> object.  See the documentation of that class for
information about PQF.
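
PQF strings are often assembled from user input.  The following sketch
uses two hypothetical helpers, C<pqf_term()> and C<pqf_and()>, that
are not part of ZOOM.  Its field table covers only the BIB-1 use
attributes 4 (title) and 1003 (author), and it does not escape
embedded double quotes.

 use strict;
 use warnings;

 # Hypothetical field table: BIB-1 use attributes for two access points.
 my %USE_ATTR = (title => 4, author => 1003);

 # Build one attribute-plus-term PQF clause, e.g. '@attr 1=4 dinosaur'.
 sub pqf_term {
     my ($field, $term) = @_;
     my $use = $USE_ATTR{$field}
         or die "no BIB-1 use attribute known for '$field'\n";
     # Quote multi-word terms so PQF reads them as a single term;
     # embedded double quotes are not handled here.
     $term = qq{"$term"} if $term =~ /\s/;
     return "\@attr 1=$use $term";
 }

 # Join two PQF clauses with the prefix boolean operator @and.
 sub pqf_and {
     my ($left, $right) = @_;
     return "\@and $left $right";
 }

 my $q = pqf_and(pqf_term(title => "dinosaur"), pqf_term(author => "smith"));
 print "$q\n";

The resulting string, C<@and @attr 1=4 dinosaur @attr 1=1003 smith>,
could then be passed to C<search_pqf()>.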

=head4 scan()

Many Z39.50 servers allow you to browse their indexes to find terms to
search for.  This is done using the C<scan()> method, which creates and
returns a new ScanSet object representing the set of terms resulting
from the scan.

C<scan()> takes a single argument, but it has to work hard: it
specifies both what index to scan for terms, and where in the index to
start scanning.  What's more, the specification of what index to scan
includes multiple facets, such as what database fields it's an index
of (author, subject, title, etc.) and whether to scan for whole fields
or single words (e.g. the title ``I<The Empire Strikes Back>'', or the
four words ``Back'', ``Empire'', ``Strikes'' and ``The'', interleaved
with words from other titles in the same index).

All of this is done by using a single term from the PQF query as the
C<scan()> argument.  (At present, only PQF is supported, although
there is no reason in principle why CQL and other query syntaxes
should not be supported in future).  The attributes associated with
the term indicate which index is to be used, and the term itself
indicates the point in the index at which to start the scan.  For
example, if the argument is C<@attr 1=4 fish>, then

=over 4

=item @attr 1=4

This is the BIB-1 attribute with type 1 (meaning access-point, which
specifies an index) and value 4 (which means ``title'').  So the scan
is in the title index.

=item fish

Start the scan from the lexicographically earliest term that is equal
to or falls after ``fish''.

=back

The argument C<@attr 1=4 @attr 6=3 fish> would behave similarly; but
the BIB-1 attribute 6=3 means completeness=``complete field'', so the
scan would be for complete titles rather than for words occurring in
titles.

This takes a bit of getting used to.
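
To see how the pieces of a scan argument fit together, this sketch
takes one apart.  C<parse_scan_term()> is a hypothetical helper, not a
ZOOM function, and it handles only the simple C<@attr type=value> form:
attribute-set names, string-valued attributes and quoted terms are
beyond it.

 use strict;
 use warnings;

 # Hypothetical helper: split a one-term PQF scan argument into a hash
 # of numeric attribute settings and the seed term.
 sub parse_scan_term {
     my ($pqf) = @_;
     my %attrs;
     while ($pqf =~ s/^\s*\@attr\s+(\d+)=(\d+)\s+//) {
         $attrs{$1} = $2;                  # attribute type => value
     }
     $pqf =~ s/^\s+|\s+$//g;               # what remains is the seed term
     return (\%attrs, $pqf);
 }

 my ($attrs, $seed) = parse_scan_term('@attr 1=4 @attr 6=3 fish');
 # $attrs->{1} is 4 (title index), $attrs->{6} is 3 (complete field),
 # and $seed is "fish".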

The behaviour of C<scan()> is affected by the following options, which
may be set on the Connection through which the scan is done:

=over 4

=item number [default: 10]

Indicates how many terms should be returned in the ScanSet.  The
number actually returned may be less, if the start-point is near the
end of the index, but will not be greater.

=item position [default: 1]

A 1-based index specifying where in the returned list of terms the
seed-term should appear.  By default it should be the first term
returned, but C<position> may be set, for example, to zero (requesting
the next terms I<after> the seed-term), or to the same value as
C<number> (requesting the index terms I<before> the seed term).

=item stepSize [default: 0]

An integer indicating how many indexed terms are to be skipped between
each one returned in the ScanSet.  By default, no terms are skipped,
but overriding this can be useful to get a high-level overview of the
index.

=back
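
The interaction of the three options is easiest to see on a tiny
index.  The following is a toy model written for illustration, not the
algorithm any real server uses; in particular, what a server does near
either end of its index is an assumption here (the model simply returns
fewer terms).

 use strict;
 use warnings;

 # Toy model of the number, position and stepSize scan options applied
 # to a sorted list of terms.
 sub scan_window {
     my ($index, $seed, %opt) = @_;
     my $number   = defined $opt{number}   ? $opt{number}   : 10;
     my $position = defined $opt{position} ? $opt{position} : 1;
     my $stride   = (defined $opt{stepSize} ? $opt{stepSize} : 0) + 1;

     # The seed point is the earliest term equal to or after $seed.
     my $i0 = 0;
     $i0++ while $i0 < @$index && $index->[$i0] lt $seed;

     # Place the seed at 1-based slot $position of the returned list,
     # then take every $stride-th term, dropping any outside the index.
     my $start = $i0 - ($position - 1) * $stride;
     my @terms;
     for my $k (0 .. $number - 1) {
         my $i = $start + $k * $stride;
         push @terms, $index->[$i] if $i >= 0 && $i < @$index;
     }
     return @terms;
 }

 my @index = qw(apple banana cherry fig fish grape kiwi lemon);
 print join(" ", scan_window(\@index, "fish", number => 3)), "\n";
 # fish grape kiwi
 print join(" ", scan_window(\@index, "fish", number => 3, position => 3)), "\n";
 # cherry fig fish

In this model, C<< position => 0 >> yields C<grape kiwi lemon>, and
C<< stepSize => 1 >> yields only C<fish kiwi>, because the third term
would fall past the end of the index.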

=head4 package()

 $p = $conn->package();
 $o = new ZOOM::Options();
 $o->option(databaseName => "newdb");
 $p = $conn->package($o);

Creates and returns a new C<ZOOM::Package>, to be used in invoking an
Extended Service.  An options block may optionally be passed in.  See
the C<ZOOM::Package> documentation.

=head4 destroy()

 $conn->destroy()

Destroys a Connection object, tearing down any low-level connection
associated with it and freeing its resources.  It is an error to reuse
a Connection that has been C<destroy()>ed.

=head2 ZOOM::ResultSet

 $rs = $conn->search_pqf('@attr 1=4 mineral');
 $n = $rs->size();
 for $i (1 .. $n) {
     $rec = $rs->record($i-1);
     print $rec->render();
 }

A ResultSet object represents the set of zero or more records
resulting from a search, and is the means whereby these records can be
retrieved.  A ResultSet object may maintain a client-side cache of
some, none or all of the server's records: in general, this is
supposed to be an implementation detail of no interest to a typical
application, although more sophisticated applications do have
facilities for messing with the cache.  Most applications will only
need the C<size()>, C<record()> and C<sort()> methods.

There is no C<new()> method nor any other explicit constructor.  The
only way to create a new ResultSet is by using C<search()> (or
C<search_pqf()>) on a Connection.

See the description of the C<Result Set> class in the ZOOM Abstract
API at
http://zoom.z3950.org/api/zoom-current.html#3.4

=head3 Methods

=head4 option()

 $rs->option(elementSetName => "f");

Allows options to be set on, and read from, a ResultSet, just like
the Connection class's C<option()> method.  There is no
C<option_binary()> method for ResultSet objects.

ResultSet options are listed at
http://indexdata.com/yaz/doc/zoom.resultsets.tkl

=head4 size()

 print "Found ", $rs->size(), " records\n";

Returns the number of records in the result set.

=head4 record(), record_immediate()

 $rec = $rs->record(0);
 $rec2 = $rs->record_immediate(0);
 $rec3 = $rs->record_immediate(1)
     or print "second record wasn't in cache\n";

The C<record()> method returns a C<ZOOM::Record> object representing
a record from the result set, whose position is indicated by the
argument passed in.  This is a zero-based index, so that legitimate
values range from zero to C<< $rs->size()-1 >>.

The C<record_immediate()> API is identical, but it never invokes a
network operation, merely returning the record from the ResultSet's
cache if it's already there, or an undefined value otherwise.  So if
you use this method, B<you must always check the return value>.
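
The difference between the two methods can be modelled with a toy
class.  C<ToyResultSet> is hypothetical: a plain Perl array stands in
for the server, and a counter records how many simulated fetches
C<record()> makes.  Returning C<undef> for an out-of-range index is a
choice made for this sketch, not documented ZOOM behaviour.

 use strict;
 use warnings;

 # Hypothetical stand-in that models record() versus record_immediate().
 package ToyResultSet;

 sub new {
     my ($class, @server_records) = @_;
     return bless { server => \@server_records, cache => {}, fetches => 0 }, $class;
 }

 sub size { return scalar @{ $_[0]{server} } }

 # Cache-only lookup: never "touches the network"; undef on a miss.
 sub record_immediate {
     my ($self, $i) = @_;
     return $self->{cache}{$i};
 }

 # Cache lookup that falls back to a simulated one-record fetch.
 sub record {
     my ($self, $i) = @_;
     return undef if $i < 0 || $i >= $self->size();
     unless (exists $self->{cache}{$i}) {
         $self->{fetches}++;
         $self->{cache}{$i} = $self->{server}[$i];
     }
     return $self->{cache}{$i};
 }

 package main;

 my $rs    = ToyResultSet->new(qw(rec0 rec1 rec2));
 my $first = $rs->record(0);             # simulated fetch
 my $again = $rs->record_immediate(0);   # now served from the cache
 my $miss  = $rs->record_immediate(1);   # undef: never fetched
 print defined $miss ? "cached\n" : "second record wasn't in cache\n";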

=head4 records()

 $rs->records(0, 10, 0);
 for $i (0..9) {
     print $rs->record_immediate($i)->render();
 }

 @nextseven = $rs->records(10, 7, 1);

The C<record_immediate()> method only fetches records from the cache,
whereas C<record()> fetches them from the server if they have not
already been cached; but the ZOOM module has to guess what the most
efficient strategy for this is.  It might fetch each record alone,
when asked for it: that's optimal in an application that's only
interested in the top hit from each search, but pessimal for one that
wants to display a whole list of results.  Conversely, the software's
strategy might be always to ask for blocks of twenty records:
that's great for assembling long lists of things, but wasteful when
only one record is wanted.  The problem is that the ZOOM module can't
tell, when you call C<< $rs->record() >>, what your intention is.

But you can tell it.  The C<records()> method fetches a sequence of
records, all in one go.  It takes three arguments: the first is the
zero-based index of the first record in the sequence, the second is
the number of records to fetch, and the third is a boolean indication
of whether or not to return the retrieved records as well as adding
them to the cache.  (You can always pass 1 for this if you like, and
Perl will discard the unused return value, but there is a small
efficiency gain to be had by passing 0.)

Once the records have been retrieved from the server
(i.e. C<records()> has completed without throwing an exception), they
can be fetched much more efficiently using C<record()> - or
C<record_immediate()>, which is then guaranteed to succeed.
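
When displaying a long list, it can help to plan the C<records()>
calls in fixed-size blocks.  C<fetch_plan()> is a hypothetical helper
that only computes the I<start> and I<count> arguments; the block size
is yours to choose.

 use strict;
 use warnings;

 # Hypothetical helper: split records 0 .. $total-1 into consecutive
 # [start, count] blocks of at most $block records each.
 sub fetch_plan {
     my ($total, $block) = @_;
     die "block size must be positive\n" if $block < 1;
     my @plan;
     for (my $start = 0; $start < $total; $start += $block) {
         my $count = $total - $start < $block ? $total - $start : $block;
         push @plan, [$start, $count];
     }
     return @plan;
 }

 # With 25 hits and blocks of 10: [0,10], [10,10], [20,5]
 my @plan = fetch_plan(25, 10);

Each pair could then be passed as C<< $rs->records($start, $count, 0) >>
before reading the records back with C<record_immediate()>, assuming
C<records()> behaves as described above.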

=head4 cache_reset()

 $rs->cache_reset()

Resets the ResultSet's record cache, so that subsequent invocations of
C<record_immediate()> will fail.  I struggle to imagine a real
scenario where you'd want to do this.

=head4 sort()

 if ($rs->sort("yaz", "1=4 >i") < 0) {
     die "sort failed";
 }

Sorts the ResultSet in place ###

=head4 destroy()

 $rs->destroy()

Destroys a ResultSet object, freeing its resources.  It is an error to
reuse a ResultSet that has been C<destroy()>ed.

=head2 ZOOM::Record

I<###>

=head2 ZOOM::Exception

In general, method calls throw an exception (of class
C<ZOOM::Exception>) if anything goes wrong, so you don't need to test
for success after each call.  Exceptions are caught by enclosing the
main code in an C<eval{}> block and checking C<$@> on exit from that
block, as in the code-sample above.

There are a small number of exceptions to this rule: the three
record-fetching methods in the C<ZOOM::ResultSet> class,
C<record()>,
C<record_immediate()>,
and
C<records()>
can all return undefined values for legitimate reasons, under
circumstances that do not merit throwing an exception.  For this
reason, the return values of these methods should be checked.  See the
individual methods' documentation for details.
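
The C<eval{}> pattern can be exercised without a server by throwing a
stand-in exception object.  C<Toy::Exception> is hypothetical; it
provides only the C<code()> and C<message()> accessors used in the
SYNOPSIS, and the code and message values below are illustrative.  The
sketch re-throws anything that is not such an object, so that ordinary
Perl errors are not misreported as diagnostics.

 use strict;
 use warnings;
 use Scalar::Util qw(blessed);

 # Hypothetical stand-in for ZOOM::Exception.
 package Toy::Exception;
 sub new     { my ($class, %args) = @_; return bless { %args }, $class }
 sub code    { $_[0]{code} }
 sub message { $_[0]{message} }

 package main;

 # Run a block; return undef on success, or a description of the error.
 sub try_zoom {
     my ($code) = @_;
     eval { $code->(); 1 } and return undef;
     my $err = $@;
     # Re-throw anything that is not one of "our" exception objects,
     # e.g. a plain die() string caused by a bug in the calling code.
     die $err unless blessed($err) && $err->isa("Toy::Exception");
     return "Error " . $err->code() . ": " . $err->message();
 }

 my $report = try_zoom(sub {
     die Toy::Exception->new(code => 114, message => "Unsupported Use attribute");
 });
 print "$report\n";   # Error 114: Unsupported Use attribute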

=head3 Methods

I<###>

=head2 ZOOM::ScanSet

I<###>

=head2 ZOOM::Package

I<###>

=head2 ZOOM::Query

I<###>

=head2 ZOOM::Options

I<###>

=head1 ENUMERATIONS

The ZOOM module provides two enumerations that list possible return
values from particular functions.  They are described in the following
sections.

=head2 ZOOM::Error

 if ($@->code() == ZOOM::Error::QUERY_PQF) {
     return "your query was not accepted";
 }

This class provides a set of manifest constants representing some of
the possible error codes that can be raised by the ZOOM module.  The
methods that return error-codes are
C<ZOOM::Exception::code()>,
C<ZOOM::Connection::error_x()>
and
C<ZOOM::Connection::errcode()>.

The C<ZOOM::Error> class provides the constants
C<NONE>,
C<CONNECT>,
C<MEMORY>,
C<ENCODE>,
C<DECODE>,
C<CONNECTION_LOST>,
C<INIT>,
C<INTERNAL>,
C<TIMEOUT>,
C<UNSUPPORTED_PROTOCOL>,
C<UNSUPPORTED_QUERY>,
C<INVALID_QUERY>,
C<CREATE_QUERY>,
C<QUERY_CQL>,
C<QUERY_PQF>,
C<SORTBY>,
C<CLONE>,
C<PACKAGE>
and
C<SCANTERM>,
each of which specifies a client-side error.  Since errors may also be
diagnosed by the server, and returned to the client, error codes may
also take values from the BIB-1 diagnostic set of Z39.50, listed at
the Z39.50 Maintenance Agency's web-site at
http://www.loc.gov/z3950/agency/defns/bib1diag.html

All error-codes, whether client-side from the C<ZOOM::Error>
enumeration or server-side from the BIB-1 diagnostic set, can be
translated into human-readable messages by passing them to the
C<ZOOM::diag_str()> utility function.

=head2 ZOOM::Event

 if ($conn->last_event() == ZOOM::Event::CONNECT) {
     print "Connected!\n";
 }

In applications that need it - mostly complex multiplexing
applications - the C<ZOOM::Connection::last_event()> method is used to
return an indication of the last event that occurred on a particular
connection.  It always returns a value drawn from this enumeration,
that is, one of C<NONE>, C<CONNECT>, C<SEND_DATA>, C<RECV_DATA>,
C<TIMEOUT>, C<UNKNOWN>, C<SEND_APDU>, C<RECV_APDU>, C<RECV_RECORD> or
C<RECV_SEARCH>.

You almost certainly don't need to know about this.  Frankly, I'm not
sure how to use it myself.

=head1 SEE ALSO

The ZOOM abstract API,
http://zoom.z3950.org/api/zoom-current.html

The C<Net::Z3950::ZOOM> module, included in the same distribution as this one.

The C<Net::Z3950> module, which this one supersedes.
http://perl.z3950.org/

The documentation for the ZOOM-C module of the YAZ Toolkit, which this
module is built on.  Specifically, its lists of options are useful.
http://indexdata.com/yaz/doc/zoom.tkl

The BIB-1 diagnostic set of Z39.50,
http://www.loc.gov/z3950/agency/defns/bib1diag.html

=head1 AUTHOR

Mike Taylor, E<lt>mike@indexdata.comE<gt>

=head1 COPYRIGHT AND LICENCE

Copyright (C) 2005 by Index Data.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.4 or,
at your option, any later version of Perl 5 you may have available.

=cut

1;