<chapter id="examples">
<title>Example Configurations</title>
<sect1 id="examples-overview">
<title>Overview</title>
<command>zebraidx</command> and
<command>zebrasrv</command> are both
driven by a master configuration file, which may refer to other
subsidiary configuration files. By default, they try to use
<filename>zebra.cfg</filename> in the working directory as the
master file; but this can be changed using the <literal>-c</literal>
option to specify an alternative master configuration file.
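<para>
As a minimal sketch of the <literal>-c</literal> option, the commands
below run the indexer and the server against an alternative master
file. The path <filename>/path/to/other.cfg</filename> is a
placeholder chosen for illustration, not a file shipped with &zebra;,
and the <literal>records</literal> directory is assumed to hold the
data to be indexed.
</para>
<screen>
$ zebraidx -c /path/to/other.cfg update records
$ zebrasrv -c /path/to/other.cfg
</screen>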
The master configuration file tells &zebra;:
<itemizedlist>
<listitem>
<para>
Where to find subsidiary configuration files, including both
those that are named explicitly and a few ``magic'' files such
as <literal>default.idx</literal>,
which specifies the default indexing rules.
</para>
</listitem>
<listitem>
<para>
What record schemas to support. (Subsidiary files specify how
to index the contents of records in those schemas, and what
format to use when presenting records in those schemas to clients.)
</para>
</listitem>
<listitem>
<para>
What attribute sets to recognise in searches. (Subsidiary files
specify how to interpret the attributes in terms
of the indexes that are created on the records.)
</para>
</listitem>
<listitem>
<para>
Policy details such as what type of input format to expect when
adding new records, what low-level indexing algorithm to use,
how to identify potential duplicate records, etc.
</para>
</listitem>
</itemizedlist>
Now let's see what goes in the <literal>zebra.cfg</literal> file
for some example configurations.
<title>Example 1: &acro.xml; Indexing And Searching</title>
This example shows how &zebra; can be used with absolutely minimal
configuration to index a body of
<ulink url="&url.xml;">&acro.xml;</ulink>
documents, and search them using
<ulink url="&url.xpath;">XPath</ulink>
expressions to specify access points.
Go to the <literal>examples/zthes</literal> subdirectory
of the distribution archive.
There you will find a <literal>Makefile</literal> that will
populate the <literal>records</literal> subdirectory with a file of
<ulink url="http://zthes.z3950.org/">Zthes</ulink>
records representing a taxonomic hierarchy of dinosaurs. (The
records are generated from the family tree in the file
<literal>dino.tree</literal>.)
Type <literal>make records/dino.xml</literal>
to make the &acro.xml; data file.
(Or you could just type <literal>make dino</literal> to build the &acro.xml;
data file, create the database and populate it with the taxonomic
records all in one shot - but then you wouldn't learn anything.)
Now we need to create a &zebra; database to hold and index the &acro.xml;
records. We do this with the
&zebra; indexer, <command>zebraidx</command>, which is
driven by the <literal>zebra.cfg</literal> configuration file.
For our purposes, we don't need any
special behaviour - we can use the defaults - so we can start with a
minimal file that just tells <command>zebraidx</command> where to
find the default indexing rules, and how to parse the records:
profilePath: .:../../tab
That's all you need for a minimal &zebra; configuration. Now you can
roll the &acro.xml; records into the database and build the indexes:
zebraidx update records
Now start the server. Like the indexer, its behaviour is
controlled by the
<literal>zebra.cfg</literal> file; and like the indexer, it works
just fine with this minimal configuration.
By default, the server listens on IP port number 9999, although
this can easily be changed - see
<xref linkend="zebrasrv"/>.
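<para>
For instance, a listener address can be given on the command line.
The sketch below assumes you want the server to accept connections
on port 2100 on any local interface; the port number is an arbitrary
choice for illustration, and the <command>zebrasrv</command> reference
should be checked for the address forms your version accepts.
</para>
<screen>
$ zebrasrv @:2100
</screen>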
Now you can use the &acro.z3950; client program of your choice to execute
XPath-based boolean queries and fetch the &acro.xml; records that satisfy
them:
Z> find @attr 1=/Zthes/termName Sauroposeidon
<termId>22</termId>
<termName>Sauroposeidon</termName>
<termType>PT</termType>
<termNote>The tallest known dinosaur (18m)</termNote>
<relationType>BT</relationType>
<termId>21</termId>
<termName>Brachiosauridae</termName>
<termType>PT</termType>
<idzebra xmlns="http://www.indexdata.dk/zebra/">
<size>300</size>
<localnumber>23</localnumber>
<filename>records/dino.xml</filename>
Now wasn't that nice and easy?
<sect1 id="example2">
<title>Example 2: Supporting Interoperable Searches</title>
The problem with the previous example is that you need to know the
structure of the documents in order to find them. For example,
when we wanted to find the record for the taxon
<foreignphrase role="taxon">Sauroposeidon</foreignphrase>,
we had to formulate a complex XPath expression,
<literal>/Zthes/termName</literal>,
which embodies the knowledge that taxon names are specified in a
<literal>&lt;termName&gt;</literal> element inside the top-level
<literal>&lt;Zthes&gt;</literal> element.
This is bad not just because it requires a lot of typing, but more
significantly because it ties searching semantics to the physical
structure of the searched records. You can't use the same search
specification to search two databases if their internal
representations are different. Consider a different taxonomy
database in which the records have taxon names specified
inside a <literal>&lt;name&gt;</literal> element nested within an
<literal>&lt;identification&gt;</literal> element
inside a top-level <literal>&lt;taxon&gt;</literal> element: then
you'd need to search for them using
<literal>1=/taxon/identification/name</literal>.
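<para>
To make the contrast concrete, here is how the same question would
have to be phrased against the two databases. The first query is the
one used in the previous example; the second targets the hypothetical
taxonomy database just described and is shown only to illustrate the
problem, not as output from a real server.
</para>
<screen>
Z> find @attr 1=/Zthes/termName Sauroposeidon
Z> find @attr 1=/taxon/identification/name Sauroposeidon
</screen>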
How, then, can we build broadcasting Information Retrieval
applications that look for records in many different databases?
The &acro.z3950; protocol offers a powerful and general solution to this:
abstract ``access points''. In the &acro.z3950; model, an access point
is simply a point at which searches can be directed. Nothing is
said about implementation: in a given database, an access point
might be implemented as an index, a path into physical records, an
algorithm for interrogating relational tables or whatever works.
The only important thing is that the semantics of an access
point is fixed and well defined.
For convenience, access points are gathered into <firstterm>attribute
sets</firstterm>. For example, the &acro.bib1; attribute set is supposed to
contain bibliographic access points such as author, title, subject
and ISBN; the GEO attribute set contains access points pertaining
to geospatial information (bounding coordinates, stratum, latitude
resolution, etc.); the CIMI
attribute set contains access points to do with museum collections
(provenance, inscriptions, etc.).
In practice, the &acro.bib1; attribute set has tended to be a dumping
ground for all sorts of access points, so that, for example, it
includes some geospatial access points as well as strictly
bibliographic ones. Nevertheless, this model
allows a layer of abstraction over the physical representation of
records in databases.
In the &acro.bib1; attribute set, a taxon name is probably best
interpreted as a title - that is, a phrase that identifies the item
in question. &acro.bib1; represents title searches by
access point 4. (See
<ulink url="&url.z39.50.bib1.semantics;">The &acro.bib1; Attribute
Set Semantics</ulink>.)
So we need to configure our dinosaur database so that searches for
&acro.bib1; access point 4 look in the
<literal>&lt;termName&gt;</literal> element,
inside the top-level
<literal>&lt;Zthes&gt;</literal> element.
This is a two-step process. First, we need to tell &zebra; that we
want to support the &acro.bib1; attribute set. Then we need to tell it
which elements of its records pertain to access point 4.
We need to create an <link linkend="abs-file">Abstract Syntax
file</link> named after the document element of the records we're
working with, plus a <literal>.abs</literal> suffix - in this case,
<literal>Zthes.abs</literal> - as follows:
<area id="attset.zthes" coords="2"/>
<area id="attset.attset" coords="3"/>
<area id="termId" coords="7"/>
<area id="termName" coords="8"/>
xelm /Zthes/termId termId:w
xelm /Zthes/termName termName:w,title:w
xelm /Zthes/termQualifier termQualifier:w
xelm /Zthes/termType termType:w
xelm /Zthes/termLanguage termLanguage:w
xelm /Zthes/termNote termNote:w
xelm /Zthes/termCreatedDate termCreatedDate:w
xelm /Zthes/termCreatedBy termCreatedBy:w
xelm /Zthes/termModifiedDate termModifiedDate:w
xelm /Zthes/termModifiedBy termModifiedBy:w
<callout arearefs="attset.zthes">
Declare the Thesaurus attribute set. See <filename>zthes.att</filename>.
<callout arearefs="attset.attset">
Declare the &acro.bib1; attribute set. See <filename>bib1.att</filename> in
&zebra;'s <filename>tab</filename> directory.
<callout arearefs="termId">
This xelm directive selects the contents of nodes matching the XPath
expression <literal>/Zthes/termId</literal>. The contents (CDATA) will be
word searchable by the Zthes attribute termId (value 1001).
<callout arearefs="termName">
Make <literal>termName</literal> word searchable by both the
Zthes attribute termName (1002) and the &acro.bib1; attribute title (4).
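<para>
Because the indexing rules have changed, the existing indexes have to
be rebuilt before the new access point takes effect. A minimal sketch,
assuming the same <literal>records</literal> directory as in the first
example and that it is acceptable to discard the existing register
and build it again from scratch:
</para>
<screen>
$ zebraidx init
$ zebraidx update records
</screen>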
After re-indexing, we can search the database using the &acro.bib1;
title attribute, as follows:
Z> f @attr 1=4 Eoraptor
Received SearchResponse.
Search was a success.
Number of hits: 1, setno 1
SearchResult-1: Eoraptor(1)
Sent presentRequest (1+1).
[Default]Record type: &acro.xml;
<termId>2</termId>
<termName>Eoraptor</termName>
<termType>PT</termType>
<termNote>The most basal known dinosaur</termNote>
The simplest hello-world example could go like this:
<title>The art of motorcycle maintenance</title>
<subject scheme="Dewey">zen</subject>
f @attr 1=/book/title motorcycle
f @attr 1=/book/subject[@scheme=Dewey] zen
If you suddenly decide you want broader interop, you can add
an abs file (more or less like this):
elm (2,1) title title
elm (2,21) subject subject
How to include images:
<imagedata fileref="system.eps" format="eps">
<imagedata fileref="system.gif" format="gif">
<phrase>The Multi-Lingual Search System Architecture</phrase>
<emphasis role="strong">
The Multi-Lingual Search System Architecture.
Network connections across local area networks are
represented by straight lines, and those over the
internet by jagged lines.
Here the three <*object> elements inside the top-level <mediaobject>
are progressively less preferred versions to include, depending on what
the rendering engine can handle. I generated the EPS version of the image
by exporting a line drawing done in TGIF, then converted that to
GIF using a shell script called "epstogif", which uses an appallingly
baroque sequence of conversions that I would prefer not to pollute
the &zebra; build environment with:
#!/bin/sh
# Usage: epstogif file.eps  (the GIF is written to standard output)
# Yes, what follows is stupidly convoluted, but I can't find a
# more straightforward path from the EPS generated by tgif's
# "Print" command into a browser-friendly format.
# Strip only a trailing ".eps" to get the base name.
file=`echo "$1" | sed 's/\.eps$//'`
ps2pdf "$1" "$file".pdf
pdftopbm "$file".pdf "$file"
pnmscale 0.50 < "$file"-000001.pbm | pnmcrop | ppmtogif
rm -f "$file".pdf "$file"-000001.pbm
<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document: "zebra.xml"
sgml-local-catalogs: nil
sgml-namecase-general:t