X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fegate.sgml;h=06d222e418f6109b4118ec407e166f326c2a91a2;hb=d6f09c50ab4d498c8fef6f50feb90329b88a5c13;hp=d891eca125c6aac136ab45c91bc75aaaf7f5678d;hpb=2ab2d1285b3a4c685499bdf8e9fd386ec7fec98f;p=egate.git diff --git a/doc/egate.sgml b/doc/egate.sgml index d891eca..06d222e 100644 --- a/doc/egate.sgml +++ b/doc/egate.sgml @@ -1,13 +1,14 @@
-Email/Z39.50 gateway guide -<author>Europagate, 1995 -<date>$Revision: 1.4 $ +<title>Email/Z39.50 gateway guide +<author>Europagate, 1996 <htmlurl url="http://europagate.dtv.dk" + name="http://europagate.dtv.dk"> +<date>$Revision: 1.11 $ <abstract> This document describes a Email server that provides access to the Z39.50 protocol. @@ -18,63 +19,77 @@ Z39.50 protocol. <sect>Introduction <p> -This document describes how to compile, install and setup the -Email server (ES) software. It does not address the internal design of -the software. +This document describes an email server (ES) system developed +within the <htmlurl url="http://europagate.dtv.dk" name="EUROPAGATE"> +project. The first part of this document +serves as an administrator's guide, while the second part is a +follow-up on the Design deliverable (WP4.1) that outlines the +deviations from the design. Also, the second part contains +a quick overview of the source code. -<sect>Before you begin +The software distribution also includes a Web to Z39.50 gateway. Refer +to the web.txt documentation for installation of this gateway. + +<sect>Installation <p> An ANSI C compiler is required in order to compile the ES software. -The ES can use either CNIDR's zdist package or the YAZ package from +The ES can use either CNIDR's Zdist package or the YAZ package from Index Data to interface the Z39.50 protocol. So you need to obtain -either of these first. +one of these first. -The Zdist package can be found in: +The zdist package can be found in: -<url url="ftp://ftp.cnidr.org/pub/NIDR.tools/zdist/zdist102b1-1.tar.Z" > +<htmlurl +url="ftp://ftp.cnidr.org/pub/NIDR.tools/zdist/zdist102b1-1.tar.Z" + name="ftp://ftp.cnidr.org/pub/NIDR.tools/zdist/zdist102b1-1.tar.Z"> -The Zdist package doesn't support result-set references. Also, it has a few -bugs. +The zdist package doesn't support result-set references. Also, it has a few +bugs. 
Therefore we've included a patch <tt/zdist.patch/ which fixes some of these bugs. Run patch in the directory above <tt/zdist102b1-1/: -<tscreen><verb> -$ patch <zdist.patch -</verb></tscreen> +<verb>patch <zdist.patch</verb> +The ES server only depends on <tt>libz3950.a</tt> so you only need +to build the zdist software in the directory <tt/libz3950/. -YAZ can be found in: +YAZ can be found at the FTP host: -<url url="ftp://ftp.algonet.se/pub/index/yaz/">. +<htmlurl url="ftp://ftp.indexdata.dk/index/yaz" + name="ftp://ftp.indexdata.dk/index/yaz"> -The ES also use GNU's regex package to parse regular expressions. +The ES also uses GNU's regex package to parse regular expressions. The ES has been tested with regex-0.12. Some systems, such as Linux, -comes with the regex package preinstalled. - -<sect>Compilation +come with the regex package preinstalled. -<p> Unpack <tt>egate.tar.gz</tt> and edit the top level <tt/Makefile/. Specify -where the GNU regex package is located and specify whether you use -YAZ or zdist. One some systems, you may have to set the <tt/NETLIB/ as -well. +where the GNU regex package is located by setting the variables <tt/REGEXOBJ/ +and <tt/REGXINC/. -You may wish to set <tt/CC/ and <tt/CFLAGS/ in your shell, since these -will affect the compilation — these are not set in the <tt/Makefile/. +A little further down the <tt/Makefile/ you find a section called +<tt/Common settings/ where you specify the location of either YAZ or zdist. +On some systems, you may have to set the <tt/ELIB/ as well to link with +BSD socket libraries. -Now, type <tt/make/. +If you intend only to compile the Email server and not the Web server +you don't have to worry about the section entitled <tt/WWW gateway settings/. -<sect>Installation +The shell variables <tt/CC/ and <tt/CFLAGS/ are used by the +<tt/Makefile/ so you may set these in your shell before you start +compiling. + +Now, type <tt/make email/. 
<p> -If the compilation was successful, you should install the software. -Edit the <tt/Makefile/ and set the LIBDIR to the installation +If the compilation succeeds, you should install the software in some +standard location. +Edit the <tt/Makefile/ and set EMAILLIBDIR to the installation directory. Since, the ES is executed by the mail system, and not by a user, this directory shouldn't be globally executable. -When satisfied, type <tt/make install/. +When satisfied, type <tt/make install.email/. -Three executables are installed in LIBDIR: +Three executables are installed in EMAILLIBDIR: <descrip> <tag/eti/ The email transport interface. This program receives incoming mail, identifies the user, and delivers the mail request @@ -93,7 +108,7 @@ The <tt/sendmail/ or a similar program delivers the mail to the you create a special user and group for the ES software. In this case you should use <tt/chmod/ to and set the 'set user ID on execution' bits on the executable files and give that user read/write/execute -permissions in LIBDIR. +permissions in EMAILLIBDIR. The mail system needs to know about the ES. Pick some name that serves as the ES user and edit <tt/aliases/ used by your mail system (usually @@ -101,7 +116,7 @@ as the ES user and edit <tt/aliases/ used by your mail system (usually <tt>es:"|/usr/local/lib/es/eti </tt><em>options</em><tt>"</tt> -In this example the mail user name was <tt/es/ and the LIBDIR was +In this example the mail user name is <tt/es/ and the EMAILLIBDIR is <tt>/usr/local/lib/es</tt>. The ES system can operate with or without the monitor. When using @@ -111,7 +126,7 @@ two dashes (<tt>--</tt>) it will operate without the monitor and the options specified after the two dashes are transferred to the kernel. -<sect1>With the monitor +<sect1>Running with the monitor <p> The monitor must be running at all times in this mode. 
You should @@ -133,7 +148,7 @@ es:"|/usr/local/lib/es/eti -c /usr/local/lib/es" The eti sets current directory to the path specified by option <tt>-c</tt>. -<sect1>Without the monitor +<sect1>Running without the monitor <p> In this mode you should never start the monitor. @@ -382,7 +397,7 @@ The type is simply one of the six Bib-1 attribute query types: both left and right (3) truncation indicated by a <tt/?/ on both left and right side of a term. <tag/n/ This character indicates that the CCL parser should announce - no truncation (100) if no truncation was indicated. + no truncation (100) if no truncation was specified. </descrip> <tag/p/ Position attribute. Valus is an integer. <tag/s/ Structure attribute. Value is an integer; or the @@ -421,10 +436,230 @@ find date>1990 </verb></tscreen> where the use attribute is <em/date/ and the relation is <em/greater than/. +<sect>Implementation + +<p> +The implementation of the email server includes all the modules described +in the design deliverable. + +The work was roughly carried out as follows: +<enum> +<item>The logging facilities and resource management utilities were + implemented — virtually all other modules depend on these + modules. +<item>A minimal ES was implemented — including a high-level + API to the Z39.50 sub-system and a CCL parser with a few + commands, such as FIND and SHOW. This version displayed MARC + records in a raw format. This version served as the base for the URP. +<item>The first version of the MARC display formatting tool, FML, + was implemented and included in the ES. +<item>The ETI program was implemented along with the IPC + (interprocess communication) utilities based on FIFOs. Facilities + to keep connections alive (to Z39.50 targets) were implemented. + To identify a user, a file-resident symbol table (small database) was + implemented which maps an email username to a unique integer (email userid). +<item>The protocol persistency was implemented and more CCL commands + were added. 
+<item>The monitor program was implemented. +</enum> + +The following sections cover the most important modules in the ES and +deviations from the design. + +<sect1>Z39.50 Interface layer + +<p> +The design report specified that the Zdist toolkit from CNIDR would +be used in the ES to provide access to the Z39.50 protocol. The package +was chosen because it is easy to use and, more importantly, we felt +that the API would be reasonably stable and supported. + +Nevertheless it turned out that CNIDR chose to change the API +completely around January 1995 and announced a new version +called zdist102b1-1. + +<em>Note: As of this date the newest version of Zdist is still +zdist102b1-1. CNIDR seems to concentrate on their Isite package +which also includes a Zdist package presumably similar to the +standalone Zdist package.</em> + +During the work with the Zdist package a few bugs were discovered. +Fortunately, they could be solved within a few days. We also +discovered that the package lacks result-set references. +We posted the bug fixes to Kevin Gamiel who is responsible for +the package but we received no response. So, eventually, we weren't +satisfied with the package after all. + +In February some of us began the development of a new Z39.50 package +called YAZ — in retrospect somewhat motivated by the +experiences with existing Z39.50/SR toolkits. + +To support result-set references we chose to incorporate a YAZ +interface in the ES also. And we designed and implemented a +simple high-level Z39.50 origin API that supported both Zdist and YAZ. + +The protocol persistency module was implemented on top of +the high-level API and not on top of Zdist. The obvious +advantage is that the persistency module is not tied to one +particular Z39.50/SR package. + +Persistency information stored for each user is simply: +<itemize> +<item>hostname and port number. 
+<item>authentication string +<item>selected database(s) +<item>next result set number +<item>next result set position +<item>result set information +</itemize> + +Information about each result set includes: +<itemize> +<item>name +<item>size (number of hits) +<item>database(s) +<item>query +</itemize> + +A persistency file is removed each time a new target is selected. +It is our experience that the persistency files are very small. + +<sect1>CCL + +<p> +The CCL was implemented as described in the design. A CCL utility +was made as a separate module which implements a tokenization +package and a parser which translates from FIND to RPN. The +data structure used to represent the RPN query is also used in +the Z39.50 search API on top of YAZ or Zdist. + +The CCL parser is quite configurable. Token names can be redefined to +one or more names (aliases). Also, the mapping +between CCL field names (qualifiers) and Bib-1 attributes can be +specified in either the C API or a file. + +Although the Z39.50 system in the ES uses the Bib-1 attribute set, the +CCL parser itself is not tied to Bib-1. + +<sect1>FML + +<p> +The FML system is used to handle the presentation of MARC +records. There are some deviations from the design report, however. +The most important changes are: +<itemize> +<item>The <tt/expr/ function is not implemented. Instead arithmetic +operators <tt/plus/, <tt/minus/, <tt/mult/ and <tt/div/ are +implemented. Also relational operators <tt/gt/, <tt/lt/ ... are +implemented. +<item>The <tt/lindex/ function is called <tt/index/ and it is a binary +operator where the left operand is the list and the right operand is +the index integer. +<item>The MARC extraction routines are not implemented. +Instead, a MARC record is transferred as an argument +to a formatting-routine (in list notation). The formatting +routine then extracts fields from the list by list/string +manipulation functions. 
+<item>A new statement, <tt/bin/, is implemented to define +binary operators (functions). +</itemize> + +<sect1>IPC + +<p> +As described in the design, FIFOs are used to communicate between +the ETI, monitor and kernel. The ES can run without the monitor, +however. The primary reason for the presence of the monitor was +to ensure that the kernel releases the resources used by the +persistency layer. But, since the persistency layer did turn out to +use virtually no disk space at all, there was no point in starting +a kernel process to remove its files — hence this facility +was not implemented. The only purpose of the monitor is to keep the +number of running kernels below a maximum level and even that +is probably useless since most unices will swap kernel processes +out anyway. + +The idle time +before a kernel exits and saves its persistency file is not +controlled by the monitor. Saving the persistency file and +keeping it is usually a good approach — even when a +user doesn't reference/show old result-sets since the user +has a notion of <em/current target/ and database. + +<sect1>Source + +<p> +In this section a short description of each source module is +given. Each module is implemented in a separate sub directory. +Any public headers are located in the <tt/include/ directory. + +<descrip> +<tag/res+log/ is an implementation of the logging system +and the resource management sub system. Note that the +resource module depends on the logging facility. Logging +is implemented in <tt>gw-log.c</tt> and <tt/gw-log.h/. The +file <tt>gw-log-test.c</tt> is a small test program for the +logging system. The core of the resource management is implemented +in <tt>gw-res.c</tt>. The files <tt>gw-res-bool.c</tt> and +<tt>gw-res-int.c</tt> implement two utility routines +on top of the resource management. The header file +<tt>gw-resp.h</tt> is a private header file and <tt>gw-res.h</tt> +is a public header file. 
+ +<tag/ccl/ implements CCL to RPN mapping and a tokenization + utility for other CCL commands. The mapping function is + implemented in <tt>cclfind.c</tt>. Qualifiers are handled in + <tt>cclqual.c</tt> while reading of qualifier mappings from a + file is implemented in <tt>cclqfile.c</tt>. Scanning is implemented + in <tt>ccltoken.c</tt>. String utilities, which might be changed if + other character sets are needed, are implemented in + <tt>cclstr.c</tt>. The table of error messages is implemented in + <tt>cclerrms.c</tt>. + +<tag/util/ implements various utilities: + <descrip> + <tag>MARC utility</tag> implemented in <tt>iso2709</tt>... + <tag>Database utility</tag> implemented in <tt>gw-db.[ch]</tt>. This + utility is used to map a user (email) to an integer. + <tag>String queue utility</tag> implemented in <tt>strqueue.[ch]</tt>. This + utility is used to queue incoming mail in the ETI, kernel and + the monitor. + <tag>Pretty printer</tag> implemented in <tt>ttyemit.[ch]</tt> + — used by the URP. + <tag>FIFO IPC utility</tag> implemented in <tt>gip*.[ch]</tt> — + used by the ETI, kernel and monitor. + </descrip> + +<tag/fml/ implements FML. The top level functions are implemented + in <tt>fml.c</tt>, <tt>fmlcall.c</tt> and <tt>fmlcalls.c</tt>. + Scanning is implemented in <tt>fmltoken.c</tt>. + Memory management is implemented in <tt>fmlmem.c</tt>. + Arithmetic operators are implemented in <tt>fmlarit.c</tt>. + String manipulation functions are implemented in <tt>fmlstr.c</tt>. + Relational operators are implemented in <tt>fmlrel.c</tt>. + List manipulations are performed in <tt>fmllist.c</tt>. + FML symbol table management is implemented in <tt>fmlsym.c</tt>. + Conversion from ISO2709 to list notation is implemented in + <tt>fmlmarc.c</tt>. + +<tag/zlayer-zdist/ implements the high-level Z39.50 API on top + of Zdist. This task is implemented in <tt>zaccess.c</tt>. The + public header file is called <tt>zaccess.h</tt>. 
+ +<tag/zlayer-yaz/ implements the high-level Z39.50 API on top + of YAZ. This task is implemented in <tt>zaccess.c</tt>. The + public header file is called <tt>zaccess.h</tt>. + +<tag/kernel/ implements the ETI, kernel and monitor. The kernel + itself is implemented in <tt>main.c</tt>, <tt>urp.c</tt> and + <tt>persist.c</tt>. The ETI is implemented in <tt>eti.c</tt> and + the monitor is implemented in <tt>monitor.c</tt>. +</descrip> + <sect>LICENSE <p> - Copyright © 1995, the EUROPAGATE consortium (see below). + Copyright © 1995-1996, the EUROPAGATE consortium (see below). The EUROPAGATE consortium members are: