Email/Z39.50 gateway guide
<author>Europagate, 1995
<date>$Revision: 1.8 $
<abstract>
This document describes an email server that provides access to the Z39.50 protocol.
</abstract>
<toc>

<sect>Introduction
<p>
This document describes an email server (ES) subsystem developed within the EUROPAGATE project. The first part of this document serves as an administrator's guide. The second part is a follow-up on the Design deliverable (WP4.1) that outlines the deviations from the design and gives a quick overview of the source code.

<sect>Compilation
<p>
An ANSI C compiler is required to compile the ES software. The ES can use either CNIDR's Zdist package or the YAZ package from Index Data to interface with the Z39.50 protocol, so you need to obtain one of these first.

The Zdist package can be found at:
<htmlurl url="ftp://ftp.cnidr.org/pub/NIDR.tools/zdist/zdist102b1-1.tar.Z" name="ftp://ftp.cnidr.org/pub/NIDR.tools/zdist/zdist102b1-1.tar.Z">

The Zdist package doesn't support result-set references, and it has a few bugs. We have therefore included a patch, <tt/zdist.patch/, which fixes some of these bugs. Run patch in the directory above <tt/zdist102b1-1/:
<tscreen><verb>
$ patch <zdist.patch
</verb></tscreen>
The ES server only depends on <tt>libz3950.a</tt>, so you only need to build the Zdist software in the directory <tt/libz3950/.

YAZ can be found at the FTP host:
<htmlurl url="ftp://130.225.252.168/index/yaz" name="ftp://130.225.252.168/index/yaz">

The ES also uses GNU's regex package to parse regular expressions. The ES has been tested with regex-0.12. Some systems, such as Linux, come with the regex package preinstalled.

Unpack <tt>egate.tar.gz</tt> and edit the top-level <tt/Makefile/. Specify where the GNU regex package is located and whether you use YAZ or Zdist. On some systems, you may have to set <tt/NETLIB/ as well.
The shell variables <tt/CC/ and <tt/CFLAGS/ are used by the <tt/Makefile/, so you may modify these before compiling. Now, type <tt/make/.

<sect>Installation
<p>
If the compilation succeeds, you should install the software. Edit the <tt/Makefile/ and set LIBDIR to the installation directory. Since the ES is executed by the mail system, and not by a user, this directory shouldn't be globally executable. When satisfied, type <tt/make install/.

Three executables are installed in LIBDIR:
<descrip>
<tag/eti/ The email transport interface. This program receives incoming mail, identifies the user, and delivers the mail request to the monitor or kernel (depending on configuration).
<tag/monitor/ The monitor is an optional component. Its main objective is to limit the number of simultaneously running kernel processes.
<tag/kernel/ The kernel process is the core of the ES. It parses the user's requests and interfaces with the Z39.50 protocol.
</descrip>

The <tt/sendmail/ program, or a similar program, delivers the mail to the <tt/eti/ program. The <tt/sendmail/ program usually runs as user <tt/mail/ or some other special user name. We strongly suggest that you create a special user and group for the ES software. In this case you should use <tt/chmod/ to set the 'set user ID on execution' bits on the executable files and give that user read/write/execute permissions in LIBDIR.

The mail system needs to know about the ES. Pick a name that serves as the ES user and edit the <tt/aliases/ file used by your mail system (usually <tt>/usr/lib/aliases</tt>). Now add the following line:

<tt>es:"|/usr/local/lib/es/eti </tt><em>options</em><tt>"</tt>

In this example the mail user name is <tt/es/ and LIBDIR is <tt>/usr/local/lib/es</tt>.

The ES system can operate with or without the monitor. When the monitor is used, the number of simultaneously running kernels can be controlled.
If the <tt>eti</tt> program is started with two dashes (<tt>--</tt>), it operates without the monitor, and the options specified after the two dashes are transferred to the kernel.

<sect1>Running with the monitor
<p>
The monitor must be running at all times in this mode. You should start the monitor in one of your boot scripts (rc). For example, the following might be put in a boot script:
<tscreen><verb>
(cd /usr/local/lib/es; ./monitor -d -l mon.log -- -d -l kernel.log &)
</verb></tscreen>
Here the monitor is started with the options <tt>-d -l mon.log</tt>, and the options after the two dashes are transferred to the kernel. In this mode, the eti should contact the monitor (and not the kernel), so the following might be put in the aliases file:
<tscreen><verb>
es:"|/usr/local/lib/es/eti -c /usr/local/lib/es"
</verb></tscreen>
The eti sets the current directory to the path specified by option <tt>-c</tt>.

<sect1>Running without the monitor
<p>
In this mode you should never start the monitor. The eti will contact the kernel directly. The following line could be put in your aliases file:
<tscreen><verb>
es:"|/usr/local/lib/es/eti -c /usr/local/lib/es -- -d -l kernel.log"
</verb></tscreen>

<sect1>eti
<p>
The eti program accepts the following options:
<descrip>
<tag><tt>-l </tt>log</tag> The log file. If absent, stderr is used.
<tag><tt>-d</tt></tag> Turns on debugging.
<tag><tt>-c </tt>dir</tag> Sets the current directory to dir.
<tag><tt>-H</tt></tag> Help message.
<tag><tt>--</tt></tag> Indicates that the eti program should contact the kernel (and not the monitor). All options after this one are transferred to the kernel.
</descrip>

<sect1>monitor
<p>
The monitor program accepts the following command line options:
<descrip>
<tag><tt>-l </tt>log</tag> The log file. If absent, stderr is used.
<tag><tt>-d</tt></tag> Turns on debugging.
<tag><tt>-H</tt></tag> Help message.
<tag><tt>--</tt></tag> Precedes options that are transferred to the kernel.
</descrip>
The monitor normally reads the resource file <tt>default.res</tt> in the current directory. You can change this behaviour by specifying an alternate file on the command line.

<sect1>kernel
<p>
List of options observed by the kernel:
<descrip>
<tag><tt>-d</tt></tag> Turns on debugging.
<tag><tt>-t </tt>target</tag> Opens a connection to target (for testing only).
<tag><tt>-g </tt>lang</tag> Sets the language name.
<tag><tt>-o </tt>res</tag> Overriding resource file name. These resources override both <tt>default.res</tt> and all user resources.
<tag><tt>-h </tt>host</tag> Overrides the host name (for testing only).
<tag><tt>-p </tt>port</tag> Overrides the port number (for testing only).
<tag><tt>-l </tt>log</tag> Specifies the log file.
<tag><tt>-H</tt></tag> Help message.
</descrip>
The kernel normally reads the resource file <tt>default.res</tt> in the current directory. You can change this behaviour by specifying an alternate file on the command line.

<sect>Managing the system
<sect1>Summary of files
<p>
To maintain the ES you need to know the files it uses. These are:
<descrip>
<tag>*.res</tag> Resource files with settings that control how the system operates, such as definitions of targets, messages, etc.
<tag>*.bib</tag> Bib-1 attribute mapping files. These files describe the mapping between CCL and the RPN query.
<tag>user.db</tag> Database of users. Only the eti process accesses this file.
<tag>user.*.r</tag> Resource file for a user, accessed by the kernel. It is only created when the user uses the <tt>def</tt> command.
<tag>user.*.p</tag> Persistency file for a user, accessed by the kernel process.
</descrip>

The ES system is mostly managed by resource files. The following example resource files come with the ES:
<descrip>
<tag><tt>default.res</tt></tag> General resource file with reasonable defaults. This file is read by the monitor and the kernel.
<tag><tt>loc.res</tt></tag> Resource file for the Library of Congress test server.
<tag><tt>drewdb.res</tt></tag> Resource file for Data Research's test server.
<tag><tt>lang.uk.res</tt></tag> Resource file for English conversation.
<tag><tt>lang.dk.res</tt></tag> Resource file for Danish conversation.
</descrip>

<sect1>Resources
<p>
Most general resources should be set in the file <tt>default.res</tt>. Some of the resources may be changed (overridden) by the user, while others may be overridden by individual target definitions. The complete scenario is depicted below:
<tscreen><verb>
 +-------------+
 | default.res |
 +-------------+
        |          +--------------+
        |<---------| "target.res" |
        |          +--------------+
        |
        |          +--------------+
        |<---------|  user.x.res  |
        |          +--------------+
        |
        |          +--------------+
        |<---------|  "lang.res"  |
        |          +--------------+
        |
        |          +--------------+
        |<---------|  "override"  |
        |          +--------------+
      result
</verb></tscreen>
The following describes the general resources:
<descrip>
<tag>gw.reply.mta</tag> Name of the MTA program (default <tt>/usr/lib/sendmail</tt>).
<tag>gw.reply.tmp.prefix</tag> Prefix of temporary files used by the ES.
<tag>gw.reply.tmp.dir</tag> Name of the directory with temporary files.
<tag>gw.marc.log</tag> If this resource is specified, retrieved MARC records will be appended to this file.
<tag>gw.timeout</tag> Idle time before the kernel exits. After the kernel has exited, the Z39.50 persistency layer reconnects when necessary.
<tag>gw.resultset</tag> If this setting is 1, the Z39.50 client will use named result sets. If 0, the Z39.50 system will always use <tt/Default/ as the result-set name.
<tag>gw.persist</tag> If this setting is 1, persistency is enabled; otherwise it is disabled.
<tag>gw.max.process</tag> This setting is the maximum number of simultaneous kernel processes. It is only used by the monitor.
<tag>gw.ignore.which</tag> Some targets don't indicate whether a record is a diagnostic message or a database record.
If this setting is 1, the ES will always try to interpret the record as a database record in ISO2709 format. If 0, the ES will use the record type.
<tag>gw.default.show</tag> Default number of records to retrieve and display when using the show command. This setting may be changed by the user with the <tt>def defaultshow</tt> command.
<tag>gw.max.show</tag> This setting specifies the maximum number of records the user may retrieve in one show command (default 100).
<tag>gw.autoshow</tag> Number of records to retrieve in a find command (default 0). This setting may be changed by the user with the <tt>def autoshow</tt> command.
<tag>gw.display.format</tag> Default display format. This setting may be changed by the user with the <tt>def f</tt> command.
<tag>gw.language</tag> Current language. This setting may be changed by the user with the <tt>def lang</tt> command. When the language is set to some value, say <em/x/, the resource <tt>gw.lang.</tt><em/x/ should hold the name of a resource file read by the kernel.
<tag>gw.lang.<em/x/</tag> Specifies the name of the resource file for language <em/x/.
<tag>gw.target.<em/name/ </tag> Name of the resource file of target <em/name/.
<tag>gw.portno</tag> Z39.50 target port number (default 210).
<tag>gw.hostname</tag> Z39.50 target host name.
<tag>gw.bibset</tag> Name of the file with the Bib-1 attribute mapping.
<tag>gw.databases</tag> Available databases on the target.
<tag>gw.description</tag> Description of a target. This message is returned to the user when the connection is established with the target.
<tag>gw.account</tag> Z39.50 authentication string (default empty, i.e. none).
</descrip>

<sect1>Messages
<p>
There are several resource settings that deal with language dependencies. These fall into the following categories, distinguished by resource name prefix:
<descrip>
<tag>gw.msg</tag> Miscellaneous messages.
<tag>gw.err</tag> Error messages.
<tag>gw.bib1.diag.<em/no/</tag> Diagnostic error message indicated by <em/no/.
<tag>gw.help</tag> Help/description of various commands.
<tag>ccl.command</tag> CCL command names.
<tag>ccl.token</tag> CCL token names.
</descrip>
Refer to the sample files <tt>default.res</tt>, <tt>lang.uk.res</tt> and <tt>lang.dk.res</tt> for all available settings.

<sect1>Target definitions
<p>
To add a target definition called <em/mytarget/ you need to make a resource entry in <tt>default.res</tt> called <tt>gw.target.</tt><em>mytarget</em>. The value of this resource is the name of a resource file, for example <em>mytarget</em><tt>.res</tt>. The resource file should at least define the resources <tt/gw.hostname/, <tt/gw.databases/ and <tt/gw.description/. You might also consider specifying <tt/gw.account/, <tt/gw.bibset/, <tt/gw.resultset/ and <tt/gw.portno/ in the target resource file.

The user only needs to issue the command <tt>target </tt><em>mytarget</em> to use the target. Also, since the database names are already specified, the user doesn't need to use the <tt/base/ command.

<sect1>CCL to RPN mapping
<p>
The mapping between CCL queries and RPN is stored in files, normally with the suffix <tt>.bib</tt>. We will refer to these files as bibset-files. You might consult the file <tt/default.bib/ to see an example of such a file. The mapping is necessary because targets usually only support a small subset of the Bib-1 attribute set and because the CCL qualifiers (field names) are not standardized. A bibset-file is specified by the <tt/gw.bibset/ resource.

A bibset-file line starts in column zero with either a hash character (<tt/#/), which marks a comment so that the rest of the line is ignored, or a CCL qualifier. The name of the CCL qualifier is up to you. However, the special qualifier name <tt/term/ applies to the case where no qualifier is specified in CCL. The CCL qualifier is followed by one or more mapping specifications. A mapping specification takes the form: <em/type/<tt/=/<em/value/<tt/,/<em/value/...
The type is one of the Bib-1 attribute query types:
<descrip>
<tag/u/ Use attribute. Value is an integer.
<tag/t/ Truncation attribute. Value is an integer; or the value is a combination of:
 <descrip>
 <tag/l/ This character indicates that the CCL parser should allow left truncation (2) if indicated by a <tt/?/ on the left side of a term.
 <tag/r/ This character indicates that the CCL parser should allow right truncation (1) if indicated by a <tt/?/ on the right side of a term.
 <tag/b/ This character indicates that the CCL parser should allow both left and right truncation (3) if indicated by a <tt/?/ on both the left and right side of a term.
 <tag/n/ This character indicates that the CCL parser should announce no truncation (100) if no truncation was specified.
 </descrip>
<tag/p/ Position attribute. Value is an integer.
<tag/s/ Structure attribute. Value is an integer; or the value is <tt/pw/, in which case the CCL parser announces word (2) or phrase (1) depending on the number of adjacent terms.
<tag/r/ Relation attribute. Value is an integer; or the value is <tt/o/, in which case the CCL parser will select <em/less than/, <em/less than or equal/, ... <em/greater than/, depending on the relation specified in CCL.
</descrip>

Consider these bibset-lines:
<tscreen><verb>
term  t=l,r,b  s=pw
au    u=1  t=l,r,b  s=pw
date  u=30  r=o
</verb></tscreen>
The first line describes the mapping used when no qualifiers are present, as in:
<tscreen><verb>
find foo bar?
</verb></tscreen>
In this case right truncation is enabled and the structure is <em/phrase/. The second line is used in this search:
<tscreen><verb>
find au=andersen
</verb></tscreen>
where the use attribute is <em/author/ and the structure is <em/word/. The third line is used in:
<tscreen><verb>
find date>1990
</verb></tscreen>
where the use attribute is <em/date/ and the relation is <em/greater than/.
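
The line format described in this section can be sketched in a few lines of Python. This is only an illustration and not the ES's own C parser: the function name <tt/parse_bibset/ and the dictionary layout are assumptions made for the example, and values are kept as strings without being validated against Bib-1.

```python
def parse_bibset(text):
    """Parse bibset lines into {qualifier: {type: [values]}} (illustrative only)."""
    table = {}
    for line in text.splitlines():
        # A hash character in column zero marks a comment; skip blank lines too.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        qualifier, specs = fields[0], fields[1:]
        mapping = {}
        for spec in specs:
            # Each mapping specification has the form type=value,value...
            attr_type, _, values = spec.partition("=")
            mapping[attr_type] = values.split(",")
        table[qualifier] = mapping
    return table


# Sample input: a comment line plus two lines taken from the text above.
SAMPLE = """\
# qualifier  mapping specifications
term t=l,r,b s=pw
date u=30 r=o
"""

table = parse_bibset(SAMPLE)
```

With this layout, <tt/table["term"]["t"]/ holds the allowed truncation letters for unqualified terms, and <tt/table["date"]["r"]/ holds the relation setting for the <tt/date/ qualifier.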
<sect>Implementation
<p>
The implementation of the email server includes all the modules described in the design deliverable. The work was roughly carried out as follows:
<enum>
<item>The logging facilities and resource management utilities were implemented. Virtually all other modules depend on these modules.
<item>A minimal ES was implemented, including a high-level API to the Z39.50 sub-system and a CCL parser with a few commands, such as FIND and SHOW. This version displayed MARC records in a raw format and served as the base for the URP.
<item>The first version of the MARC display formatting tool, FML, was implemented and included in the ES.
<item>The ETI program was implemented along with the IPC (interprocess communication) utilities based on FIFOs. Facilities to keep connections to Z39.50 targets alive were implemented. To identify a user, a file-resident symbol table (small database) was implemented which maps an email username to a unique integer (email userid).
<item>The protocol persistency was implemented and more CCL commands were added.
<item>The monitor program was implemented.
</enum>
The following sections cover the most important modules in the ES and deviations from the design.

<sect1>Z39.50 Interface layer
<p>
The design report specified that the Zdist toolkit from CNIDR would be used in the ES to provide access to the Z39.50 protocol. The package was chosen because it is easy to use and, more importantly, we felt that the API would be reasonably stable and supported. Nevertheless, it turned out that CNIDR chose to change the API completely around January 1995 and announced a new version called zdist102b1-1.

<em>Note: As of this date the newest version of Zdist is still zdist102b1-1. CNIDR seems to concentrate on their Isite package, which also includes a Zdist package presumably similar to the standalone Zdist package.</em>

During the work with the Zdist package a few bugs were discovered.
Fortunately, they could be solved within a few days. We also discovered that the package lacks result-set references. We sent the bug fixes to Kevin Gamiel, who is responsible for the package, but we did not receive a response. So, eventually, we weren't satisfied with the package after all.

In February some of us began the development of a new Z39.50 package called YAZ, in retrospect somewhat motivated by the experiences with existing Z39.50/SR toolkits. To support result-set references we chose to incorporate a YAZ interface in the ES as well, and we designed and implemented a simple high-level Z39.50 origin API that supports both Zdist and YAZ.

The protocol persistency module was implemented on top of the high-level API and not on top of Zdist. The obvious advantage is that the persistency module is not tied to one particular Z39.50/SR package. The persistency information stored for each user is simply:
<itemize>
<item>hostname and port number
<item>authentication string
<item>selected database(s)
<item>next result set number
<item>next result set position
<item>result set information
</itemize>
Information about each result set includes:
<itemize>
<item>name
<item>size (number of hits)
<item>database(s)
<item>query
</itemize>
A persistency file is removed each time a new target is selected. In our experience, the persistency files are very small.

<sect1>CCL
<p>
The CCL was implemented as described in the design. A CCL utility was made as a separate module which implements a tokenization package and a parser which translates from FIND to RPN. The data structure used to represent the RPN query is also used in the Z39.50 search API on top of YAZ or Zdist.

The CCL parser is quite configurable. Token names can be redefined to one or more names (aliases). Also, the mapping between CCL field names (qualifiers) and Bib-1 attributes can be specified either through the C API or in a file.
Although the Z39.50 system in the ES uses the Bib-1 attribute set, the CCL parser itself is not tied to Bib-1.

<sect1>FML
<p>
The FML system is used to handle the presentation of MARC records. There are some deviations from the design report, however. The most important changes are:
<itemize>
<item>The <tt/expr/ function is not implemented. Instead, the arithmetic operators <tt/plus/, <tt/minus/, <tt/mult/ and <tt/div/ are implemented. The relational operators <tt/gt/, <tt/lt/ ... are implemented as well.
<item>The <tt/lindex/ function is called <tt/index/, and it is a binary operator where the left operand is the list and the right operand is the index integer.
<item>The MARC extraction routines are not implemented. Instead, a MARC record is transferred as an argument to a formatting routine (in list notation). The formatting routine then extracts fields from the list using list/string manipulation functions.
<item>A new statement, <tt/bin/, is implemented to define binary operators (functions).
</itemize>

<sect1>IPC
<p>
As described in the design, FIFOs are used to communicate between the ETI, monitor and kernel. The ES can run without the monitor, however. The primary reason for the presence of the monitor was to ensure that the kernel releases the resources used by the persistency layer. But since the persistency layer turned out to use virtually no disk space at all, there was no point in starting a kernel process to remove its files; hence this facility was not implemented. The only remaining purpose of the monitor is to limit the number of running kernels, and even that is probably of little use since most Unix systems will swap kernel processes out anyway.

The idle time before a kernel exits and saves its persistency file is not controlled by the monitor. Saving the persistency file and keeping it is usually a good approach, even when a user doesn't reference or show old result sets, since the user still has a notion of a <em/current target/ and database.
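
The FIFO hand-off mentioned at the start of this section can be illustrated with a minimal Python sketch. It is not the ES's <tt/gip/ utility: the function names, the FIFO file name and the single-message exchange are assumptions made for the example, and it needs a POSIX system that provides <tt/mkfifo/.

```python
import os
import tempfile
import threading


def send_request(fifo_path, text):
    """Writer side (the role the ETI plays): write one request to the FIFO."""
    # Opening for writing blocks until a reader has opened the other end.
    with open(fifo_path, "w") as fifo:
        fifo.write(text)


def receive_request(fifo_path):
    """Reader side (the role of the kernel or monitor): read until the writer closes."""
    with open(fifo_path, "r") as fifo:
        return fifo.read()


def demo():
    """Pass one request through a FIFO created in a scratch directory."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "request.fifo")  # hypothetical file name
    os.mkfifo(fifo_path, 0o600)
    try:
        # The writer runs in its own thread so both ends can open the FIFO.
        writer = threading.Thread(
            target=send_request, args=(fifo_path, "find au=andersen\n")
        )
        writer.start()
        received = receive_request(fifo_path)
        writer.join()
        return received
    finally:
        os.remove(fifo_path)
        os.rmdir(workdir)
```

In the sketch the two ends are threads of one process only to keep it self-contained; in the ES they are separate programs that agree on the FIFO location.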
<sect1>Source
<p>
This section gives a short description of each source module. Each module is implemented in a separate subdirectory. Any public headers are located in the <tt/include/ directory.
<descrip>
<tag/res+log/ is an implementation of the logging system and the resource management subsystem. Note that the resource module depends on the logging facility. Logging is implemented in <tt>gw-log.c</tt> and <tt/gw-log.h/. The file <tt>gw-log-test.c</tt> is a small test program for the logging system. The core of the resource management is implemented in <tt>gw-res.c</tt>. The files <tt>gw-res-bool.c</tt> and <tt>gw-res-int.c</tt> implement two utility routines on top of the resource management. The header file <tt>gw-resp.h</tt> is a private header file and <tt>gw-res.h</tt> is a public header file.
<tag/ccl/ implements CCL to RPN mapping and a tokenization utility for other CCL commands. The mapping function is implemented in <tt>cclfind.c</tt>. Qualifiers are handled in <tt>cclqual.c</tt>, while reading of qualifier mappings from a file is implemented in <tt>cclqfile.c</tt>. Scanning is implemented in <tt>ccltoken.c</tt>. String utilities, which might be changed if other character sets are needed, are implemented in <tt>cclstr.c</tt>. The table of error messages is implemented in <tt>cclerrms.c</tt>.
<tag/util/ implements various utilities:
 <descrip>
 <tag>MARC utility</tag> implemented in <tt>iso2709</tt>...
 <tag>Database utility</tag> implemented in <tt>gw-db.[ch]</tt>. This utility is used to map a user (email) to an integer.
 <tag>String queue utility</tag> implemented in <tt>strqueue.[ch]</tt>. This utility is used to queue incoming mail in the ETI, kernel and the monitor.
 <tag>Pretty printer</tag> implemented in <tt>ttyemit.[ch]</tt>, used by the URP.
 <tag>FIFO IPC utility</tag> implemented in <tt>gip*.[ch]</tt>, used by the ETI, kernel and monitor.
 </descrip>
<tag/fml/ implements FML.
The top-level functions are implemented in <tt>fml.c</tt>, <tt>fmlcall.c</tt> and <tt>fmlcalls.c</tt>. Scanning is implemented in <tt>fmltoken.c</tt>. Memory management is implemented in <tt>fmlmem.c</tt>. Arithmetic operators are implemented in <tt>fmlarit.c</tt>. String manipulation functions are implemented in <tt>fmlstr.c</tt>. Relational operators are implemented in <tt>fmlrel.c</tt>. List manipulations are performed in <tt>fmllist.c</tt>. FML symbol table management is implemented in <tt>fmlsym.c</tt>. Conversion from ISO2709 to list notation is implemented in <tt>fmlmarc.c</tt>.
<tag/zlayer-zdist/ implements the high-level Z39.50 API on top of Zdist. This task is implemented in <tt>zaccess.c</tt>. The public header file is called <tt>zaccess.h</tt>.
<tag/zlayer-yaz/ implements the high-level Z39.50 API on top of YAZ. This task is implemented in <tt>zaccess.c</tt>. The public header file is called <tt>zaccess.h</tt>.
<tag/kernel/ implements the ETI, kernel and monitor. The kernel itself is implemented in <tt>main.c</tt>, <tt>urp.c</tt> and <tt>persist.c</tt>. The ETI is implemented in <tt>eti.c</tt> and the monitor is implemented in <tt>monitor.c</tt>.
</descrip>

<sect>LICENSE
<p>
Copyright © 1995, the EUROPAGATE consortium (see below).

The EUROPAGATE consortium members are:
<itemize>
<item>University College Dublin
<item>Danmarks Teknologiske Videnscenter
<item>An Chomhairle Leabharlanna
<item>Consejo Superior de Investigaciones Cientificas
</itemize>
Permission to use, copy, modify, distribute, and sell this software and its documentation, in whole or in part, for any purpose, is hereby granted, provided that:

1. This copyright and permission notice appear in all copies of the software and its documentation. Notices of copyright or attribution which appear at the beginning of any file must remain unchanged.

2.
The names of EUROPAGATE or the project partners may not be used to endorse or promote products derived from this software without specific prior written permission. 3. Users of this software (implementors and gateway operators) agree to inform the EUROPAGATE consortium of their use of the software. This information will be used to evaluate the EUROPAGATE project and the software, and to plan further developments. The consortium may use the information in later publications. 4. Users of this software agree to make their best efforts, when documenting their use of the software, to acknowledge the EUROPAGATE consortium, and the role played by the software in their work. THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT WARRANTY OF ANY KIND, EXPRESS, IMPLIED, OR OTHERWISE, INCLUDING WITHOUT LIMITATION, ANY WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL THE EUROPAGATE CONSORTIUM OR ITS MEMBERS BE LIABLE FOR ANY SPECIAL, INCIDENTAL, INDIRECT OR CONSEQUENTIAL DAMAGES OF ANY KIND, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER OR NOT ADVISED OF THE POSSIBILITY OF DAMAGE, AND ON ANY THEORY OF LIABILITY, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. </article>