<!doctype linuxdoc system>
$Id: ir-tcl.sgml,v 1.2 1995-05-30 08:09:27 adam Exp $
<title>IrTcl User's Guide and Reference
<author>Index Data, <tt/info@index.ping.dk/
This document describes IrTcl, an information retrieval toolkit for
Tcl and Tk that provides access to the Z39.50/SR protocol.

This document describes the <sf/IrTcl/ information retrieval toolkit,
which offers a high-level client interface to the Z39.50 and SR protocols.
The toolkit is based on the Tcl/Tk toolkit developed by Prof. John
K. Ousterhout at the University of California [ref 1].
Tcl is a simple, somewhat shell-like, interpreted language. What
makes Tcl attractive is that it also offers a C API, which makes
extensions to the language possible. The most important Tcl extension is
probably Tk, a Motif look-and-feel interface to the X window system.

To interface with the Z39.50/SR protocol, <sf/IrTcl/ uses <bf/YAZ/.
<bf/YAZ/ offers two transport types: RFC1729/BER on TCP/IP and the mOSI
transport.
However, the mOSI transport is optional; it is not
needed unless you wish to communicate within an OSI environment.
See [ref 2] for more information about the XTI/mOSI implementation.

<sf/IrTcl/ provides two system environments:

<item> A simple command line shell, useful for
<item> A system that operates within the Tk environment, which
makes it very easy to implement GUI clients.

Basically, <sf/IrTcl/ is a set of commands introduced to Tcl.
When extending Tcl there are two approaches: action-oriented commands
and object-oriented commands.

Action-oriented commands manipulate
Tcl variables, and each command introduces only one action.
The string manipulation commands in Tcl are action-oriented.

Object-oriented commands are added for every declared
variable (object). Object-oriented commands usually provide a set of
actions (methods) to manipulate the object.
The widgets in Tk (X objects) are examples of the object-oriented style.

<sf/IrTcl/ commands are object-oriented. The main reason
for this is that the data structures involved in the IR protocol
are not easily represented by Tcl data structures.
Also, the <sf/IrTcl/ objects tend to exist for a relatively long time.
Note that although we use the term object-oriented commands, this
does not mean that the programming style is strictly object-oriented. For
example, there is no such thing as inheritance.

We are now ready to present the three commands introduced to Tcl by
<sf/IrTcl/:

<item> ir: The ir object represents a connection to a target. More
precisely, it describes a Z-association.
<item> ir-set: The ir-set describes a result set, which is
conceptually a collection of records returned by the target.
The ir-set object may retrieve records from a target by means of
the ir object; it may read/write records from/to a local file; or it may be
updated with a user-edited record.
<item> ir-scan: The scan object represents a list of scan lines
retrieved from a target.

To create a new IR object called <tt/z-assoc/, write <tt/ir z-assoc/.

Each object provides a set of <em/settings/ which may be
readable, writable, or both. All settings immediately follow
the name of the object. If a value is present the setting
is set to that value; otherwise the current value is returned.

To set the preferred-message-size to 18000 on the
<tt/z-assoc/ object, use:

    z-assoc preferredMessageSize 18000

To read the current value of preferred-message-size use:

    z-assoc preferredMessageSize
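
The read/write convention for settings can be imitated in plain Tcl.
The following is only a minimal sketch: <tt/make-object/ and <tt/z-mock/
are hypothetical names that are not part of <sf/IrTcl/, and the sketch says
nothing about how the toolkit itself implements its objects.

```tcl
# Hypothetical stand-in for an IrTcl-style object command (not part of IrTcl).
# make-object defines a new command whose first argument names a setting.
proc make-object {name} {
    proc $name {setting args} [format {
        global settings
        if {[llength $args] > 0} {
            # A value is present: store it.
            set settings(%s,$setting) [lindex $args 0]
        }
        # With or without a value, return the current value.
        return $settings(%s,$setting)
    } $name $name]
}

make-object z-mock
z-mock preferredMessageSize 18000      ;# write the setting
puts [z-mock preferredMessageSize]     ;# read it back
```

Reading a setting that has never been written raises a Tcl error in this
sketch; a real object would supply defaults.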

One important category of settings consists of those that relate to the
event-driven model. When <sf/IrTcl/ receives responses from the target, such as
init responses and search responses, a <em/callback/ routine
is called. Callback routines are represented in Tcl as
a list, which is re-interpreted prior to invocation.
The method is similar to the one used in Tk to capture X events.
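
To illustrate what it means for a callback to be a list that is
re-interpreted before invocation, the sketch below stores a handler as a
Tcl list and evaluates it later by hand. The handler body and the object
name <tt/z-assoc/ are placeholders, and the use of <tt/uplevel/ is an
assumption made for the illustration; this document does not show how
<sf/IrTcl/ performs the evaluation internally.

```tcl
# Hypothetical handler; in a real client it would inspect the ir object.
proc init-response {assoc} {
    return "$assoc received an init response"
}

# The callback is stored as a Tcl list: command name followed by arguments.
set callback [list init-response z-assoc]

# Later, when the response arrives, the stored list is evaluated as a script.
# Here we do that by hand at global level.
set result [uplevel #0 $callback]
puts $result
```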

For each SR/Z39.50 request there is a corresponding object action. The most
important actions are:

<item> connect: Establishes a connection with a target.
<item> init: Sends an initialize request.
<item> search: Sends a search request.
<item> present: Sends a present request.
<item> scan: Sends a scan request.

This example shows a complete
connect - init - search - present scenario.

First an IR object, called <tt/z/, is created.
A result set <tt/z.1/ is also introduced by the <tt/ir-set/ command,
and it is specified that the result set uses <tt/z/ as its association.

The setting <tt/databaseNames/ is set to the
database <tt/books/, to which the following searches are directed.
A connection is established to <tt/fake.com/ by the <tt/connect/ action.
A callback is then defined <em/before/ the init request is executed.
The Tcl procedure <tt/init-response/ is called when an
init-response is returned from the target.

The <tt/init-response/ procedure sets up a <tt/search-response/
callback handler and sends a search-request using a query that
consists of the single word <tt/science/.

When the <tt/search-response/ procedure is called it defines
a variable <tt/hits/ and sets it to the value of the setting
<tt/resultCount/. If <tt/hits/ is positive a present-request is
sent, asking for 5 records from position 1.

Finally, a present-response is received and the number of records
returned is stored in the variable <tt/ret/.

    ir z
    ir-set z.1 z
    z databaseNames books
    z connect fake.com
    z callback {init-response}
    z init
    proc init-response {} {
        z.1 callback {search-response}
        z.1 search science
    }
    proc search-response {} {
        set hits [z.1 resultCount]
        if {$hits > 0} {
            z.1 callback {present-response}
            z.1 present 1 5
        }
    }
    proc present-response {} {
        set ret [z.1 numberOfRecordsReturned]
        puts "$ret records returned"
    }

The previous example program does not handle error conditions.
If errors occur in the program they will be trapped by the Tcl error
handler. This is not always appropriate. However, Tcl offers a
<tt/catch/ command to support error handling by the program itself.
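
A minimal sketch of error handling with <tt/catch/ follows. The procedure
<tt/send-search/ is a hypothetical stand-in for an action that may fail;
it is not an <sf/IrTcl/ command and is used only so that the example runs
in a plain Tcl shell.

```tcl
# Hypothetical stand-in for an action that may fail; not an IrTcl command.
proc send-search {query} {
    if {[string length $query] == 0} {
        error "empty query"
    }
    return "search sent: $query"
}

# catch returns 0 on success and a non-zero code on error; the result or
# the error message is stored in the variable named by the second argument.
if {[catch {send-search ""} msg]} {
    puts "search failed: $msg"
} else {
    puts $msg
}
```

The same pattern could presumably be wrapped around real actions such as
<tt/connect/ or <tt/search/, so that a failure is reported by the client
instead of reaching the default Tcl error handler.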

The ir object describes an association with a target.
This section covers the connect-init-disconnect actions provided
by the ir object.
An ir object is created by the <tt/ir/ command and the
created object enters a 'not connected' state, because it isn't
connected to a target yet.

A connection is established by the <tt/connect/ action which is
immediately followed by a hostname. Table ref{tab:irconnect} lists the
settings that affect the <tt/connect/ action.
Obviously, these settings should be set <bf/before/ connecting.

<tag><tt>comstack </tt><tt>mosi|tcpip</tt></tag>
<tag><tt>protocol </tt><tt>Z3950|SR</tt></tag>
ANSI/NISO Z39.50 or ISO SR
<tag><tt>failback </tt><em>list</em></tag>
Fatal error Tcl script. Called on protocol errors or if the target
closes the connection.

If the connect operation succeeds the <tt/init/ action should be used.
Table ref{tab:irinit} lists the init related settings.

<tag><tt>preferredMessageSize </tt><em>integer</em></tag>
Preferred-message-size
<tag><tt>maximumRecordSize </tt><em>integer</em></tag>
<tag><tt>idAuthentication </tt><em>string</em></tag>
<tag><tt>implementationName </tt><em>string</em></tag>
Implementation-name of origin system
<tag><tt>implementationId </tt><em>string</em></tag>
Implementation-id of origin system
<tag><tt>options </tt><em>list</em></tag>
Options to be negotiated in init. The list contains the options that
are set.
<tag><tt>protocolVersion </tt><em>integer</em></tag>
Protocol version: 2, 3, etc.
<tag><tt>initResponse </tt><em>list</em></tag>
Init-response Tcl script
<tag><tt>callback </tt><em>list</em></tag>
General response Tcl script. Only used if initResponse is not specified

The init-response handler should inspect some of the settings in table
ref{tab:irinitresponse}.

<tag><tt>initResult </tt><em>boolean</em></tag>
<tag><tt>preferredMessageSize </tt><em>integer</em></tag>
Preferred-message-size
<tag><tt>maximumRecordSize </tt><em>integer</em></tag>
<tag><tt>targetImplementationName </tt><em>string</em></tag>
Implementation-name of target system
<tag><tt>targetImplementationId </tt><em>string</em></tag>
Implementation-id of target system
<tag><tt>targetImplementationVersion </tt><em>string</em></tag>
Implementation-version of target system
<tag><tt>options </tt><em>list</em></tag>
Options negotiated after init. The list contains the options that are set.
<tag><tt>protocolVersion </tt><em>integer</em></tag>
Protocol version: 2, 3, etc.
<tag><tt>userInformationField </tt><em>string</em></tag>
User information field

Consider a client with the ability to access multiple targets.

We define a list of targets that we wish to connect to.
Each item in the list describes the target parameters with
the following four components: association-name, comstack-type,
protocol-type and a hostname.

The list for the two targets, ISO/SR target DANBIB and TCP/Z39.50
target Data Research, will be defined as:

    set targetList { {danbib mosi sr 0103/find2.denet.dk:4500}
                     {drs tcpip z39 dranet.dra.com} }

The Tcl code below defines, connects to, and initializes the
targets in <tt/targetList/:

    foreach target $targetList {
        set assoc [lindex $target 0]
        ir $assoc
        $assoc comstack [lindex $target 1]
        $assoc protocol [lindex $target 2]
        $assoc failback [list fail-response $assoc]
        $assoc connect [lindex $target 3]
        $assoc initResponse [list init-response $assoc]
        $assoc init
    }

    proc fail-response {assoc} {
        puts "$assoc closed connection or protocol error"
    }

    proc init-response {assoc} {
        if {[$assoc initResult]} {
            puts "$assoc initialized ok"
        } else {
            puts "$assoc didn't initialize"
        }
    }

<tt/target/ is bound to each item in the list of targets.
The <tt/assoc/ variable is set to the ir object name.
Then, the comstack, protocol and failback are set for the <tt/assoc/ object.
The ir object name is passed as an argument to the <tt/fail-response/ routine.
Note the use of the Tcl <tt/list/ command, which
is necessary here because the argument contains variables
(<tt/assoc/) that should be substituted before the handler is defined.
After the connect operation, the <tt/init-response/ handler
is defined in much the same way as the failback handler.
Finally, an init request is executed.
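
The difference between a braced handler and one built with <tt/list/ can
be seen in plain Tcl. The sketch below only builds the two handler strings
and prints them; it does not call any <sf/IrTcl/ command, and the variable
names <tt/deferred/ and <tt/bound/ are chosen for the illustration.

```tcl
set assoc danbib

# Braces suppress substitution: the handler text still contains "$assoc",
# which would be looked up later, when the handler runs.
set deferred {fail-response $assoc}

# list substitutes now and quotes each element, so the handler is bound
# to the current value of assoc.
set bound [list fail-response $assoc]

puts $deferred   ;# fail-response $assoc
puts $bound      ;# fail-response danbib
```

Inside a <tt/foreach/ loop the variable changes on every iteration, so only
the second form ties each handler to its own association name.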

To terminate the connection the <tt/disconnect/ action should be used.
This action has no parameters.
Another connection may be established by a new <tt/connect/ action on
the same ir object.

This section covers the queries used by <sf/IrTcl/, and how searches and
presents are handled.

A search operation and a result set are described by the ir set object.
The ir set object is defined by the <tt/ir-set/ command, which
has two parameters. The first is the name of the new ir set object, and
the second, which is optional, is the name of an association (an ir
object). The second argument is required if the ir set object should be able
to perform searches and presents. However, it is not required if
only ``local'' operations are performed with the ir set object.

When the ir set object is created a number of settings are inherited
from the ir object, such as the selected databases, query type,
etc. Thus, the ir object contains what we could call default
settings.

Search requests are sent by the <tt/search/ action, which
takes a query as its parameter. There are two types of queries,
RPN and CCL, controlled by the setting <tt/queryType/.
A string representation of the query is used in <sf/IrTcl/ since
Tcl has reasonably powerful string manipulation capabilities.
The RPN query used in <sf/IrTcl/ is the prefix query notation also used in
the <bf/YAZ/ test client.

The CCL query is an uninterpreted octet-string which is parsed by the target.
We refer to the standard: ISO 8777. Note that only a few targets
actually support the CCL query and the interpretation of
the standard may vary.

The prefix query notation (which is converted to RPN) offers a few
operators, shown in table ref{tab:prefixop}.

<tag><tt>@attr </tt><em>list op</em></tag>
The attributes in list are applied to op
<tag><tt>@and </tt><em>op1 op2</em></tag>
Boolean <em/and/ on op1 and op2
<tag><tt>@or </tt><em>op1 op2</em></tag>
Boolean <em/or/ on op1 and op2
<tag><tt>@not </tt><em>op1 op2</em></tag>
Boolean <em/and-not/ on op1 and op2 (op1 but not op2)
<tag><tt>@prox </tt><em>list op1 op2</em></tag>
Proximity operation on op1 and op2

It is simple to build RPN queries in <sf/IrTcl/. Search terms
are sequences of characters, for example <tt/science/.

Boolean operators use the prefix notation (instead of the suffix/RPN),
as in:

    @and science technology

Search terms may be associated with attributes. These
attributes are indicated by the <tt/@attr/ operator.
Assuming the bib-1 attribute set, we can set the use-attribute
(type is 1) to title (value is 4) by placing <tt/@attr 1=4/ in front of the term.

Also, it is possible to apply attributes to a range of search terms.
In the query below, both search terms have use=title but the <tt/tech/
term is right truncated:

    @attr 1=4 @and @attr 5=1 tech beta
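
Because queries are plain strings, they can be composed with ordinary Tcl
procedures. The helpers below are a sketch under stated assumptions:
<tt/prefix-and/ and <tt/with-attr/ are hypothetical names that are not part
of <sf/IrTcl/, and every term is assumed to be a single word that needs no
quoting.

```tcl
# Hypothetical helpers for composing prefix queries; not part of IrTcl.
# Terms are assumed to be single words that need no quoting.

# Combine a list of terms with @and, e.g. {a b c} -> "@and a @and b c".
proc prefix-and {terms} {
    set n [llength $terms]
    if {$n == 0} {
        error "at least one term is required"
    }
    set query [lindex $terms end]
    for {set i [expr {$n - 2}]} {$i >= 0} {incr i -1} {
        set query "@and [lindex $terms $i] $query"
    }
    return $query
}

# Apply an attribute (type=value) to a term or to a whole sub-query.
proc with-attr {type value query} {
    return "@attr $type=$value $query"
}

puts [with-attr 1 4 [prefix-and {science technology}]]
```

The resulting string has the same shape as the queries shown above and
could be handed to the <tt/search/ action.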

Table ref{tab:irsearchrequest} lists settings that affect the search.
Setting the <tt/databaseNames/ is mandatory. All other settings
have reasonable defaults.

<tag><tt>databaseNames </tt><em>list</em></tag>
<tag><tt>smallSetUpperBound </tt><em>integer</em></tag>
small set upper bound
<tag><tt>largeSetLowerBound </tt><em>integer</em></tag>
large set lower bound
<tag><tt>mediumSetPresentNumber </tt><em>integer</em></tag>
medium set present number
<tag><tt>replaceIndicator </tt><em>boolean</em></tag>
<tag><tt>setName </tt><em>string</em></tag>
<tag><tt>queryType rpn|ccl</tt></tag>
query type-1 or query type-2
<tag><tt>preferredRecordSyntax </tt><em>string</em></tag>
preferred record syntax. See table ref{tab:recordtypes}.
<tag><tt>smallSetElementSetNames </tt><em>string</em></tag>
small-set-element-set names
<tag><tt>mediumSetElementSetNames </tt><em>string</em></tag>
medium-set-element-set names
<tag><tt>searchResponse </tt><em>list</em></tag>
Search-response Tcl script
<tag><tt>callback </tt><em>list</em></tag>
General response Tcl script. Only used if searchResponse is not specified

The search-response handler, specified by the <tt/callback/ or
the <tt/searchResponse/ setting,
should read some of the settings shown in table ref{tab:irsearchresponse}.

<tag><tt>searchStatus </tt><em>boolean</em></tag>
<tag><tt>responseStatus </tt><em>list</em></tag>
response status information
<tag><tt>resultCount </tt><em>integer</em></tag>
<tag><tt>numberOfRecordsReturned </tt><em>integer</em></tag>
number of records retrieved

The <tt/responseStatus/ signals one of three conditions,
indicated by the value of the first item in the list:

<tag><tt>NSD</tt></tag> indicates that the target has returned one or
more non-surrogate diagnostic messages. The <tt/NSD/ item is followed by
a list with all non-surrogate messages. Each non-surrogate message consists
of three items. The first of the three items is the error
code (integer); the next item is a textual representation of the error
code in plain English; the third item is additional information, possibly
empty if no additional information was returned by the target.

<tag><tt>DBOSD</tt></tag> indicates a successful operation where the
target has returned one or more records. Each record may be
either a database record or a surrogate diagnostic.

<tag><tt>OK</tt></tag> indicates a successful operation where no records are
returned from the target.

We continue with the multiple-targets example.
The <tt/init-response/ procedure will attempt to make searches:

    proc init-response {assoc} {
        puts "$assoc connected"
        ir-set ${assoc}.1 $assoc
        $assoc.1 queryType rpn
        $assoc.1 databaseNames base-a base-b
        $assoc.1 searchResponse [list search-response $assoc ${assoc}.1]
        $assoc.1 search "@attr 1=4 @and @attr 5=1 tech beta"
    }

An ir set object is defined, and it is told the name of the
ir object it belongs to.
The name of the ir set object uses the name of the ir object as a prefix.

Then, the query-type is defined to be RPN, i.e. we will
use the prefix query notation later on.

Two databases, <tt/base-a/ and <tt/base-b/, are selected.

A <tt/search-response/ handler is defined with the
ir object and the ir-set object as parameters and
the search is executed.

The first part of the <tt/search-response/ looks like:

    proc search-response {assoc rset} {
        set status [$rset responseStatus]
        set type [lindex $status 0]
        if {$type == "NSD"} {
            set code [lindex $status 1]
            set msg [lindex $status 2]
            set addinfo [lindex $status 3]
            puts "NSD $code: $msg: $addinfo"
            return
        }
        set hits [$rset resultCount]
        if {$type == "DBOSD"} {
            set ret [$rset numberOfRecordsReturned]
            # the returned records are examined here
        }
    }

The response status is stored in the variable <tt/status/ and
the first element indicates the condition.
If non-surrogate diagnostics are returned they are displayed.
Otherwise, the search was a success and the number of hits
is read. Finally, it is tested whether the search response
returned records (database or diagnostic).

Note that we didn't inspect the search status (setting
<tt/searchStatus/) to determine whether the search was successful or not,
because the standard specifies that one or more non-surrogate
diagnostics should be returned by the target in case of errors.

If one or more records are returned from the target they
will be stored in the result set object.
A search response that contains records is
very similar to a present response. Therefore, some settings
are common to both situations.

The <tt/present/ action sends a present request. The <tt/present/ action is
followed by two optional integers. The first integer is the
result-set starting position (default 1). The second integer
is the number of records requested (default 10).
The settings which could be modified before a <tt/present/
action are shown in table ref{tab:irpresentrequest}.
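
When a result set is larger than one request should fetch, the start
position and record count for each <tt/present/ can be computed up front.
The procedure below is a hypothetical helper, not part of <sf/IrTcl/; it
assumes a positive page size and only does the arithmetic.

```tcl
# Hypothetical helper: split a result set of $hits records into
# {start count} pairs of at most $pageSize records each.
# pageSize is assumed to be a positive integer.
proc present-ranges {hits pageSize} {
    set ranges {}
    for {set start 1} {$start <= $hits} {incr start $pageSize} {
        set count [expr {$hits - $start + 1}]
        if {$count > $pageSize} {
            set count $pageSize
        }
        lappend ranges [list $start $count]
    }
    return $ranges
}

puts [present-ranges 23 10]
```

Each pair corresponds to the two integers that follow the <tt/present/
action, for example <tt/z.1 present 21 3/ for the last pair above.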

<tag><tt>preferredRecordSyntax </tt><em>string</em></tag>
preferred record syntax. See table ref{tab:recordtypes}.
<tag><tt>elementSetElementSetNames </tt><em>string</em></tag>
<tag><tt>presentResponse </tt><em>list</em></tag>
Present-response Tcl script
<tag><tt>callback </tt><em>list</em></tag>
General response Tcl script. Only used if presentResponse is not specified

The present-response handler should inspect the settings
shown in table ref{tab:irpresentresponse}.
Note that the <tt/responseStatus/ and <tt/numberOfRecordsReturned/
settings were also used in the search-response case.

As in the search-response case, records returned from the
target are stored in the result set object.

<tag><tt>presentStatus </tt><em>boolean</em></tag>
<tag><tt>responseStatus </tt><em>list</em></tag>
Response status information
<tag><tt>numberOfRecordsReturned </tt><em>integer</em></tag>
number of records returned
<tag><tt>nextResultSetPosition </tt><em>integer</em></tag>
next result set position

Search responses and present responses may result in
one or more records stored in the ir set object if
the <tt/responseStatus/ setting indicates database or
surrogate diagnostics (<tt/DBOSD/). The individual
records, indexed by an integer position, should be
inspected.

The action <tt/Type/ followed by an integer returns information
about a given position in an ir set. There are three possibilities:

<item> SD: The item is a surrogate diagnostic.
<item> DB: The item is a database record.
<item> <em/empty/: There is no record at the specified position.

To handle the first case, surrogate diagnostic, the
<tt/Diag/ action should be used. It returns three
items: error code (integer), text representation in plain English
(string), and additional information (string, possibly empty).

In the second case, database record, the <tt/recordType/ action should
be used. It returns the record type at the given position.
Some record types are shown in table ref{tab:recordtypes}.

<tag>UNIMARC</tag> UNIMARC
<tag>INTERMARC</tag> INTERMARC
<tag>USMARC</tag> USMARC
<tag>UKMARC</tag> UKMARC
<tag>NORMARC</tag> NORMARC
<tag>LIBRISMARC</tag> LIBRISMARC
<tag>DANMARC</tag> DANMARC
<tag>FINMARC</tag> FINMARC
<tag>SUTRS</tag> SUTRS

We continue our search-response example. In the
<tt/DBOSD/ case, we should inspect the result set items.
Recall that the ir set name was passed to the
search-response handler as argument <tt/rset/.

    if {$type == "DBOSD"} {
        set ret [$rset numberOfRecordsReturned]
        for {set i 1} {$i<=$ret} {incr i} {
            set itype [$rset Type $i]
            if {$itype == "SD"} {
                set diag [$rset Diag $i]
                set code [lindex $diag 0]
                set msg [lindex $diag 1]
                set addinfo [lindex $diag 2]
                puts "$i: SD $code: $msg: $addinfo"
            } elseif {$itype == "DB"} {
                set rtype [$rset RecordType $i]
                puts "$i: type is $rtype"
            }
        }
    }

Each item in the result-set is examined.
If an item is a diagnostic message it is displayed; otherwise,
if it's a database record, its type is displayed.

Where there is a MARC record at a given position, we
want to display it somehow. The action <tt/getMarc/ is what we need.
The <tt/getMarc/ action is followed by a position integer and the type of
extraction we want to make: <tt/field/ or <tt/line/.

The <tt/field/ and <tt/line/ types are followed by three
parameters that serve as extraction masks.
They are called tag, indicator and field.
If the mask matches a tag/indicator/field of a record the information
is extracted. Two characters have special meaning in masks: the
dot (any character) and star (any number of any character).
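
The mask rules can be tried out in plain Tcl. The procedure below is a
hypothetical illustration only: <tt/mask-match/ is not an <sf/IrTcl/
command, and the matching code used by <tt/getMarc/ itself is not shown in
this document and may differ. The sketch relies on <tt/string map/, which
is assumed to be available in the Tcl version in use.

```tcl
# Hypothetical illustration of the mask rules; IrTcl's own matching
# code is not shown in this document and may differ.
# In string match, ? matches one character and * matches any sequence,
# so only the dot needs to be translated.
proc mask-match {mask value} {
    return [string match [string map {. ?} $mask] $value]
}

puts [mask-match 245 245]   ;# 1
puts [mask-match 2.5 245]   ;# 1
puts [mask-match *   100]   ;# 1
puts [mask-match 24. 100]   ;# 0
```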

The <tt/field/ type returns one or more lists of field information
that match the mask specification. Only the content of fields
is returned.

The <tt/line/ type, on the other hand, returns a Tcl list that
completely describes the layout of the MARC record, including
tags and indicators.

The <tt/field/ type is sufficient and efficient when only a
small number of fields are extracted and no
further processing (in Tcl) is necessary.

However, if the MARC record is to be edited or altered in any way, the
<tt/line/ extraction is more powerful, limited only by the Tcl
language.

Consider the record below:

    008 910710c19910701nju 00010 eng
    100 10 $a Jack Collins
    245 10 $a How to program a computer

Assuming this record is at position 1 in ir-set <tt/z.1/, we
might extract the title-field (245 * a), with the following command:

    z.1 getMarc 1 field 245 * a

which gives:

    {How to program a computer}

Using the <tt/line/ instead of <tt/field/ gives:

    {245 {10} {{a {How to program a computer}} }}

If we wish to extract the whole record as a list, we use:

    z.1 getMarc 1 line * * *

which gives:

    {001 {} {{{} { 11224466 }} }}
    {005 {} {{{} 00000000000000.0} }}
    {008 {} {{{} {910710c19910701nju 00010 eng }} }}
    {010 { } {{a { 11224466 }} }}
    {040 { } {{a DLC} {c DLC} }}
    {050 {00} {{a 123-xyz} }}
    {100 {10} {{a {Jack Collins}} }}
    {245 {10} {{a {How to program a computer}} }}
    {260 {1 } {{a Penguin} }}
    {263 { } {{a 8710} }}
    {300 { } {{a {p. cm.}} }}

This example demonstrates how Tcl can be used to examine
a MARC record in the list notation.

The procedure <tt/extract-format/ makes an extraction of
fields in a MARC record based on a number of masks.
There are 5 parameters: <tt/r/, a
record in list notation; <tt/tag/, a regular expression to
match the record tags; <tt/ind/, a regular expression to
match indicators; <tt/field/, a regular expression to
match fields; and finally <tt/text/, a regular expression to
match the content of a field.

    proc extract-format {r tag ind field text} {
        foreach line $r {
            if {[regexp $tag [lindex $line 0]] && \
                    [regexp $ind [lindex $line 1]]} {
                foreach f [lindex $line 2] {
                    if {[regexp $field [lindex $f 0]]} {
                        if {[regexp $text [lindex $f 1]]} {
                            puts [lindex $f 1]
                        }
                    }
                }
            }
        }
    }

To match <tt/comput/ followed by any number of character(s) in the
245 fields in the record from the previous example, we could use:

    set r [z.1 getMarc 1 line * * *]
    extract-format $r 245 .. . comput

which gives:

    How to program a computer
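
A variant that returns the matches as a Tcl list, instead of printing them,
is sketched below so that the result can be processed further.
<tt/collect-format/ is a hypothetical name, and the record is typed in by
hand from two lines of the example above so that the sketch runs without a
target.

```tcl
# Variant of extract-format that returns the matching field contents
# as a list instead of printing them. collect-format is a hypothetical name.
proc collect-format {r tag ind field text} {
    set result {}
    foreach line $r {
        if {![regexp $tag [lindex $line 0]] || ![regexp $ind [lindex $line 1]]} {
            continue
        }
        foreach f [lindex $line 2] {
            if {[regexp $field [lindex $f 0]] && [regexp $text [lindex $f 1]]} {
                lappend result [lindex $f 1]
            }
        }
    }
    return $result
}

# Two lines copied from the record above, in line notation.
set r {
    {100 {10} {{a {Jack Collins}} }}
    {245 {10} {{a {How to program a computer}} }}
}
puts [collect-format $r 245 .. . comput]
```

Note that the regular expressions are unanchored, so a tag pattern such as
<tt/245/ matches any tag that contains those three characters.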

The <tt/putMarc/ action does the opposite of <tt/getMarc/. It
copies a record in Tcl list notation to an ir set object and is
needed if a result-set must be updated by a Tcl modified (user-edited)
record.
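
Editing a record in list notation is ordinary Tcl list processing. The
sketch below replaces the content of one subfield and returns the modified
record. <tt/replace-subfield/ is a hypothetical helper and the sample data
is typed in by hand; the arguments of <tt/putMarc/ are not described in this
section, so the sketch stops at producing the modified list.

```tcl
# Hypothetical helper: return a copy of record r (line notation) in which
# the content of subfield code in every field with the given tag is replaced.
proc replace-subfield {r tag code newtext} {
    set result {}
    foreach line $r {
        set fields {}
        foreach f [lindex $line 2] {
            if {[string compare [lindex $line 0] $tag] == 0 &&
                [string compare [lindex $f 0] $code] == 0} {
                lappend fields [list $code $newtext]
            } else {
                lappend fields $f
            }
        }
        lappend result [list [lindex $line 0] [lindex $line 1] $fields]
    }
    return $result
}

set r {
    {100 {10} {{a {Jack Collins}} }}
    {245 {10} {{a {How to program a computer}} }}
}
set edited [replace-subfield $r 245 a {How to program any computer}]
puts [lindex $edited 1]
```

Tags are compared as strings rather than numbers, so that a tag such as
<tt/001/ is not confused with <tt/1/.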

<item> <bf/Ousterhout, John K./:
Tcl and the Tk Toolkit. Addison-Wesley Company Inc (ISBN
0-201-63337-X). Source and documentation
can be found in <tt>URL:ftp://ftp.cs.berkeley.edu/pub/tcl/</tt>
<item> <bf/Furniss, Peter/:
RFC 1698: Octet Sequences for Upper-Layer OSI to Support
Basic Communications Applications.