<chapter id="tutorial">
<!-- $Id: tutorial.xml,v 1.3 2008-02-05 12:16:52 marc Exp $ -->
<title>Tutorial</title>
<sect1 id="tutorial-oai">
<title>A first &acro.oai; indexing example</title>
In this section, we will test the system by indexing a small set of
sample &acro.oai; records that are included with the &zebra; distribution,
running a &zebra; server against the newly created database, and
searching the indexes with a client that connects to that server.
Go to the <literal>examples/oai-pmh</literal> subdirectory of the
distribution archive, or make a deep copy of the Debian installation directory
<literal>/usr/share/idzebra-2.0-examples/oai-pmh</literal>.
An &acro.xml; file containing multiple &acro.oai;
records is located in the subdirectory
<literal>examples/oai-pmh/data</literal>.
Additional &acro.oai; test records can be downloaded by running a shell
script (you may want to abort the script once you have waited
longer than it takes your coffee to brew).
To index these &acro.oai; records, type:
zebraidx-2.0 -c conf/zebra.cfg init
zebraidx-2.0 -c conf/zebra.cfg update data
zebraidx-2.0 -c conf/zebra.cfg commit
If you have not installed &zebra; yet but have compiled the
binaries from this tarball, use the following command form:
../../index/zebraidx -c conf/zebra.cfg this and that
On some systems the &zebra; binaries are installed under generic
names without the version suffix; in that case, use the following command form:
zebraidx -c conf/zebra.cfg this and that
In this command, the word <literal>update</literal> is followed
by the name of a directory: <literal>zebraidx</literal> updates all
files in the hierarchy rooted at <literal>data</literal>.
<literal>-c conf/zebra.cfg</literal> points to the proper
If you are curious how &acro.xml; content is indexed using &acro.xslt;
stylesheets, run the indexing transformation on the example debugging
&acro.oai; record:
xsltproc conf/oai2index.xsl data/debug-record.xml
The output shows the &acro.oai; record transformed into the indexing
&acro.xml; format. &zebra; creates several inverted indexes,
and their names and types are clearly visible in the indexing
If your indexing commands were successful, you are now ready to
fire up a server. To start a server on port 9999, type:
zebrasrv-2.0 -c conf/zebra.cfg @:9999
The &zebra; index that you have just created has a single database
named <literal>Default</literal>.
The database contains several &acro.oai; records, and the server will
return records in the &acro.xml; format only. The indexing engine
split the &acro.xml; file into individual records behind the scenes.
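The three indexing commands and the server start-up can also be driven from a script. The following is a minimal Python sketch, not part of the &zebra; distribution: the helper names are hypothetical, and it assumes that the versioned binaries <literal>zebraidx-2.0</literal> and <literal>zebrasrv-2.0</literal> are on your PATH and that the script runs from the <literal>examples/oai-pmh</literal> directory. Pass a different <literal>binary</literal> argument if your installation uses another name, such as <literal>../../index/zebraidx</literal>.

```python
import subprocess


def zebraidx_commands(binary="zebraidx-2.0", config="conf/zebra.cfg", data_dir="data"):
    """Return the argv lists for the init, update and commit steps shown above."""
    base = [binary, "-c", config]
    return [
        base + ["init"],              # init: empty the register
        base + ["update", data_dir],  # update: index all files below data_dir
        base + ["commit"],            # commit: write shadow-register changes into the register
    ]


def zebrasrv_command(binary="zebrasrv-2.0", config="conf/zebra.cfg", listen="@:9999"):
    """Return the argv list that starts the server; "@:9999" listens on port 9999."""
    return [binary, "-c", config, listen]


def index_and_serve():
    # Run the indexing steps in order; check=True raises on a non-zero exit status.
    for argv in zebraidx_commands():
        subprocess.run(argv, check=True)
    # Start the server in the foreground; it runs until interrupted.
    subprocess.run(zebrasrv_command(), check=True)
```

Calling <literal>index_and_serve()</literal> stops at the first failing command, because <literal>check=True</literal> makes <literal>subprocess.run</literal> raise <literal>CalledProcessError</literal>.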
<sect1 id="tutorial-oai-sru-pqf">
<title>Searching the &acro.oai; database by web service</title>
&zebra; has a built-in web service, which is close to the
&acro.sru; standard web service. We use it to access our new
database using any &acro.xml;-enabled web browser.
This service uses the &acro.pqf; query language.
section we show how to run a fully compliant &acro.sru; server,
including support for the query language &acro.cql;
Searching and retrieving &acro.xml; records is easy. For example,
to search for the term <literal>the</literal>, just point your
browser at this link:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the</ulink>
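Such URLs can also be built programmatically. The sketch below uses only the Python standard library; the function name and its defaults are hypothetical. It percent-encodes every parameter, so &acro.pqf; queries containing spaces, such as <literal>@attr 1=dc_title the</literal>, remain a single valid parameter value. It sends &acro.pqf; in the <literal>x-pquery</literal> parameter used in this section, or &acro.cql; in the standard <literal>query</literal> parameter when <literal>pqf=False</literal>.

```python
from urllib.parse import quote, urlencode


def sru_search_url(query, base="http://localhost:9999/", pqf=True,
                   start=None, maximum=None, schema=None):
    """Build an SRU 1.1 searchRetrieve URL for a Zebra server.

    pqf=True puts the query in the x-pquery parameter (PQF);
    pqf=False puts it in the standard SRU query parameter (CQL).
    """
    params = [("version", "1.1"), ("operation", "searchRetrieve"),
              ("x-pquery" if pqf else "query", query)]
    # Optional paging and record schema parameters, in the order used above.
    if start is not None:
        params.append(("startRecord", start))
    if maximum is not None:
        params.append(("maximumRecords", maximum))
    if schema is not None:
        params.append(("recordSchema", schema))
    # quote (rather than the default quote_plus) encodes spaces as %20.
    return base + "?" + urlencode(params, quote_via=quote)
```

For example, <literal>urllib.request.urlopen(sru_search_url("the", start=1, maximum=1, schema="dc")).read()</literal> fetches the first record in the <literal>dc</literal> schema from the running server.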
These URLs won't work unless you have indexed the example data
and started a &zebra; server as outlined in the previous section.
To actually retrieve a record, we need to extend
our URL as follows:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc</ulink>
This way we can page through the result set in chunks of records;
for example, we access the 6th to the 10th record using the URL
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=6&amp;maximumRecords=5&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=6&amp;maximumRecords=5&amp;recordSchema=dc</ulink>
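Paging is simple arithmetic on the 1-based record positions used by &acro.sru;: page n of size k starts at record (n - 1) * k + 1. A small Python sketch (hypothetical helper name) that yields the <literal>startRecord</literal> and <literal>maximumRecords</literal> values covering a result set, given the total hit count reported by the server:

```python
def result_pages(hits, page_size):
    """Yield (startRecord, maximumRecords) pairs that together cover `hits` records."""
    if not page_size >= 1:
        raise ValueError("page_size must be a positive integer")
    for start in range(1, hits + 1, page_size):
        # The last chunk may be shorter than page_size.
        yield start, min(page_size, hits - start + 1)
```

With 12 hits and chunks of 5, it yields (1, 5), (6, 5) and (11, 2); the second pair corresponds to the 6th to 10th record URL above.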
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve
&amp;x-pquery=title%3Cthe
<sect1 id="tutorial-oai-sru-present">
<title>Presenting search results in different formats</title>
&zebra; uses &acro.xslt; stylesheets for both &acro.xml; record
display retrieval. In this example installation, there are two
retrieval schemas defined in
<literal>conf/dom-conf.xml</literal>:
the <literal>dc</literal> schema implemented in
<literal>conf/oai2dc.xsl</literal>, and
the <literal>zebra</literal> schema implemented in
<literal>conf/oai2zebra.xsl</literal>.
The URLs for accessing both are the same, except for the
value of the <literal>recordSchema</literal> parameter:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc</ulink>
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra</ulink>
For the curious, one can see that the &acro.xslt; transformations
xsltproc conf/oai2dc.xsl data/debug-record.xml
xsltproc conf/oai2zebra.xsl data/debug-record.xml
Note also that the &zebra;-specific parameters are injected by
the engine when retrieving data; therefore, some of the attributes
in the <literal>zebra</literal> retrieval schema are left empty
when running the transformation from the command line.
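Whichever schema you request, the records arrive wrapped in an &acro.sru; searchRetrieve response. The following Python sketch extracts the hit count and the individual records with the standard <literal>xml.etree.ElementTree</literal> module. It assumes the &acro.sru; 1.1 response namespace <literal>http://www.loc.gov/zing/srw/</literal>; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of SRU 1.1 responses (assumed; check your server's output).
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}


def parse_search_retrieve(xml_text):
    """Return (number_of_records, records) from an SRU 1.1 searchRetrieveResponse.

    Each record is a dict with its position, schema and payload. The payload is
    the first child element of recordData (recordPacking=xml) or, if there is
    none, the text content of recordData (recordPacking=string).
    """
    root = ET.fromstring(xml_text)
    hits = int(root.findtext("srw:numberOfRecords", default="0", namespaces=SRW_NS))
    records = []
    for rec in root.iterfind("srw:records/srw:record", SRW_NS):
        data = rec.find("srw:recordData", SRW_NS)
        if data is not None and len(data):
            payload = data[0]
        else:
            payload = data.text if data is not None else None
        records.append({
            "position": rec.findtext("srw:recordPosition", namespaces=SRW_NS),
            "schema": rec.findtext("srw:recordSchema", namespaces=SRW_NS),
            "payload": payload,
        })
    return hits, records
```

The function accepts either a string or the raw bytes returned by <literal>urllib.request.urlopen(url).read()</literal>.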
In addition to the user-defined retrieval schemas, one can always
choose from many built-in schemas. If one is only
interested in the &zebra; internal metadata about a certain
record, one uses the <literal>zebra::meta</literal> schema.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::meta">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::meta</ulink>
The <literal>zebra::data</literal> schema is used to retrieve the
original stored &acro.oai; &acro.xml; record.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::data">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::data</ulink>
<sect1 id="tutorial-oai-sru-searches">
<title>More interesting searches</title>
The &acro.oai; indexing example defines many different index
names; a study of the <literal>conf/oai2index.xsl</literal>
stylesheet reveals the following word type indexes (i.e. those
with suffix <literal>:w</literal>):
By default, searches access the <literal>any:w</literal> index,
but we can direct searches to any access point by constructing the
correct &acro.pqf; query. For example, to search in titles only,
use the query <literal>@attr 1=dc_title the</literal>:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=@attr%201=dc_title%20the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=@attr%201=dc_title%20the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc</ulink>
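Attribute-based queries follow a regular pattern: attribute type 1 (the use attribute) selects the index, and, for the type="0" indexes discussed in the &acro.z3950; section, attribute type 4 with value 3 selects phrase structure. A minimal Python sketch (hypothetical helper) that assembles such terms. Quoting terms that contain whitespace keeps them together as one &acro.pqf; term; rejecting embedded double quotes is a simplifying assumption of this sketch.

```python
def pqf_term(term, use=None, structure=None):
    """Build a PQF term, optionally restricted by attributes.

    use       -- value of attribute type 1, the index name (e.g. "dc_title")
    structure -- value of attribute type 4 (3 = phrase, for type="0" indexes)
    """
    if '"' in term:
        raise ValueError("terms containing double quotes are not handled by this sketch")
    parts = []
    if use is not None:
        parts.append(f"@attr 1={use}")
    if structure is not None:
        parts.append(f"@attr 4={structure}")
    # Quote empty terms, terms with whitespace and terms starting with "@",
    # so the parser reads them as a single term rather than several or an operator.
    if not term or term.startswith("@") or any(ch.isspace() for ch in term):
        term = f'"{term}"'
    parts.append(term)
    return " ".join(parts)
```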
Similarly, we can direct searches to the other defined indexes, or
create Boolean combinations of searches on different
indexes. In this case we search for <literal>the</literal> in
<literal>dc_title</literal> and for <literal>fish</literal> in
<literal>dc_description</literal> using the query
<literal>@and @attr 1=dc_title the @attr 1=dc_description fish</literal>:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=@and%20@attr%201=dc_title%20the%20@attr%201=dc_description%20fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=@and%20@attr%201=dc_title%20the%20@attr%201=dc_description%20fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc</ulink>
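&acro.pqf; uses prefix notation: a Boolean operator precedes its two operands. A small Python sketch (hypothetical helper) that combines any number of sub-queries by nesting the binary operator to the left; with two operands it reproduces the query above exactly.

```python
from functools import reduce

BOOLEAN_OPERATORS = ("@and", "@or", "@not")


def pqf_boolean(operator, *queries):
    """Combine PQF sub-queries with a binary Boolean operator in prefix notation.

    More than two operands are nested to the left, e.g.
    pqf_boolean("@and", "a", "b", "c") returns "@and @and a b c".
    """
    if operator not in BOOLEAN_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    if not queries:
        raise ValueError("need at least one sub-query")
    # reduce folds left: ((a op b) op c), written here in prefix form.
    return reduce(lambda left, right: f"{operator} {left} {right}", queries)
```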
<sect1 id="tutorial-oai-sru-zebra-indexess">
<title>Investigating the content of the indexes</title>
How does the magic work? What is inside the indexes? Why is a certain
record found by a search, and another not? The answer is in the
inverted indexes. You can easily investigate them using the
special &zebra; schema
<literal>zebra::index::fieldname</literal>. In this example you
can see that the <literal>dc_title</literal> index has both word
(type <literal>:w</literal>) and phrase (type
<literal>:p</literal>)
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::index::dc_title">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::index::dc_title</ulink>
But where in the indexes did the term match for the query occur?
This is easily answered with the special &zebra; schema
<literal>zebra::snippet</literal>. The matching terms are
enclosed in <literal>&lt;s&gt;</literal> tags.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::snippet">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::snippet</ulink>
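When snippet records are processed by a program, the highlighted terms can be collected by looking for the s elements. The Python sketch below (hypothetical helper) matches them by local name, because the exact namespace and wrapper elements of the <literal>zebra::snippet</literal> output are not described here; treat the structure it expects as an assumption and check it against your server's actual output.

```python
import xml.etree.ElementTree as ET


def highlighted_terms(snippet_xml):
    """Return the text of every s element in a snippet record, in document order.

    Elements are matched by local name, so this works whether or not the
    snippet markup carries a namespace.
    """
    root = ET.fromstring(snippet_xml)
    terms = []
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # strip a "{namespace}" prefix, if any
        if local_name == "s" and elem.text:
            terms.append(elem.text)
    return terms
```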
How can I refine my search? Which interesting search terms are
found inside my hit set? Try the special &zebra; schema
<literal>zebra::facet::fieldname:type</literal>. In this case, we
investigate additional search terms for the
<literal>dc_title:w</literal> index.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::facet::dc_title:w">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::facet::dc_title:w</ulink>
One can ask for multiple facets. Here, we want them from phrase
<literal>:p</literal>.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::facet::dc_publisher:p,dc_title:p">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::facet::dc_publisher:p,dc_title:p</ulink>
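Facet schema names are plain strings: <literal>zebra::facet::</literal> followed by comma-separated <literal>index:type</literal> pairs, as in the URL above. A trivial Python helper (hypothetical name) that assembles them, for use as the <literal>recordSchema</literal> value:

```python
def facet_schema(*facets):
    """Build a zebra::facet schema name from (index, type) pairs.

    facet_schema(("dc_publisher", "p"), ("dc_title", "p"))
    returns "zebra::facet::dc_publisher:p,dc_title:p".
    """
    if not facets:
        raise ValueError("need at least one (index, type) pair")
    return "zebra::facet::" + ",".join(f"{index}:{kind}" for index, kind in facets)
```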
<sect1 id="tutorial-oai-z3950">
<title>Searching the &acro.oai; database by &acro.z3950; protocol</title>
In this section we repeat the searches and presents we have done so
far, this time using the binary &acro.z3950; protocol. You can use any
For instance, you can use the demo command-line client that comes
Connecting to the server is done by the command
yaz-client localhost:9999
When the client has connected, you can type:
Z39.50 presents using presentation stylesheets:
Z39.50 built-in Zebra presents (in this configuration only if
started without yaz-frontendserver):
Z> elements zebra::meta
Z> elements zebra::meta::sysno
Z> elements zebra::index
Z> elements zebra::snippet
Z> elements zebra::facet::any:w
Z> elements zebra::facet::any:w,dc_title:w
Z39.50 searches targeted at specific indexes and boolean
combinations of these can be issued as well.
Z> find @attr 1=oai_identifier @attr 4=3 oai:caltechcstr.library.caltech.edu:4
Z> find @attr 1=oai_datestamp @attr 4=3 2001-04-20
Z> find @attr 1=oai_setspec @attr 4=3 7374617475733D756E707562
Z> find @attr 1=dc_title communication
Z> find @attr 1=dc_identifier @attr 4=3 http://resolver.caltech.edu/CaltechCSTR:1986.5228-tr-86
Note that all indexes defined by 'type="0"' in the
indexing stylesheet must be searched using the '@attr 4=3'
structure attribute.
Note also that searching and scanning on the indexes
'dc_contributor', 'dc_language', 'dc_rights', and 'dc_source'
might fail, simply because none of the records in the small example set
have these fields set, and consequently, these indexes might not
<sect1 id="tutorial-oai-sru-yazfrontend">
<title>Setting up a correct &acro.sru; web service</title>
Alternatively, start the SRU/SRW/Z39.50 server including the
PQF and CQL query configuration:
zebrasrv -f yazserver.xml
Z39.50 presents using presentation stylesheets:
Z39.50 built-in Zebra presents (in this configuration only if
started without yaz-frontendserver):
Z> elements zebra::meta
Z> elements zebra::meta::sysno
Z> elements zebra::index
Z> elements zebra::snippet
Z> elements zebra::facet::any:w
Z> elements zebra::facet::any:w,dc_title:w
Z39.50 searches targeted at specific indexes
Z> find @attr 1=oai_identifier @attr 4=3 oai:caltechcstr.library.caltech.edu:4
Z> find @attr 1=oai_datestamp @attr 4=3 2001-04-20
Z> find @attr 1=oai_setspec @attr 4=3 7374617475733D756E707562
Z> find @attr 1=dc_title communication
Z> find @attr 1=dc_identifier @attr 4=3 http://resolver.caltech.edu/CaltechCSTR:1986.5228-tr-86
Note that all indexes defined by 'type="0"' in the
indexing stylesheet must be searched using the '@attr 4=3'
structure attribute.
Note also that searching and scanning on the indexes
'dc_contributor', 'dc_language', 'dc_rights', and 'dc_source'
fails, simply because none of the records in this example set
have these fields set, and consequently, these indexes are
yaz-client localhost:9999
Z> scan @attr 1=oai_identifier @attr 4=3 oai
Z> scan @attr 1=oai_datestamp @attr 4=3 1
Z> scan @attr 1=oai_setspec @attr 4=3 2000
Z> scan @attr 1=dc_title communication
Z> scan @attr 1=dc_identifier @attr 4=3 a
Z39.50 search using server-side CQL conversion:
Z> find creator = the
Z> find dc.creator = the
Z> find description &lt; the
Z> find title le some
Z> find title ge some
Z> find identifier eq "http://resolver.caltech.edu/CaltechCSTR:1978.2276-tr-78"
Z> find relation eq something
etc. Note that all indexes defined by 'type="0"' in the
indexing stylesheet must be searched using the 'eq'
Z39.50 scan using server-side CQL conversion:
Unfortunately, this will <emphasis>never</emphasis> work as it is not supported by the
If you want to scan using server-side CQL conversion, you need to
make an SRW connection using yaz-client, or an
SRU connection using REST web services; any browser will do.
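An &acro.sru; scan over REST can also be issued and read from a script. This Python sketch (hypothetical helper names) builds the scan URL and extracts each term with its hit count from the response, assuming the &acro.sru; 1.1 scanResponse layout (<literal>terms/term/value</literal> and <literal>numberOfRecords</literal>) in the <literal>http://www.loc.gov/zing/srw/</literal> namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

# Namespace of SRU 1.1 responses (assumed; check your server's output).
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}


def sru_scan_url(clause, base="http://localhost:9999/"):
    """Build an SRU 1.1 scan URL; clause is a CQL scan clause such as 'title=a'."""
    params = [("version", "1.1"), ("operation", "scan"), ("scanClause", clause)]
    # quote encodes spaces as %20, matching the scan URLs in this section.
    return base + "?" + urlencode(params, quote_via=quote)


def parse_scan(xml_text):
    """Return (term, hit count) pairs from an SRU 1.1 scanResponse, in list order."""
    root = ET.fromstring(xml_text)
    result = []
    for term in root.iterfind("srw:terms/srw:term", SRW_NS):
        value = term.findtext("srw:value", namespaces=SRW_NS)
        count = term.findtext("srw:numberOfRecords", default="0", namespaces=SRW_NS)
        result.append((value, int(count)))
    return result
```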
SRU Explain ZeeRex response:
http://localhost:9999/
http://localhost:9999/?version=1.1&amp;operation=explain
SRU Search Retrieve records:
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve
&amp;query=creator=adam
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve
&amp;query=date=1978-01-01
&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve
&amp;query=dc.title=the
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve
&amp;query=description=the
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve
&amp;query=title%3Cthe
http://localhost:9999/?version=1.1&amp;operation=scan&amp;scanClause=title=a
http://localhost:9999/?version=1.1&amp;operation=scan
&amp;scanClause=identifier%20eq%20a
Notice: you need to use the 'eq' relation for all @attr 4=3 indexes.
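When generating &acro.cql; queries for these indexes, the relation has to be chosen per index. A minimal Python sketch (hypothetical helper) that uses <literal>eq</literal> for the type="0" indexes and <literal>=</literal> otherwise, and double-quotes any term that is not a plain word, as the quoted identifier searches in this section do. The backslash escaping inside quotes follows the &acro.cql; quoting rules; which indexes are type="0" is for the caller to decide.

```python
import re

# Terms made only of word characters, dots, colons and hyphens are sent
# unquoted; anything else (spaces, slashes, relation symbols, ...) is quoted.
_PLAIN_TERM = re.compile(r"[\w.:-]+")


def cql_clause(index, term, exact=False):
    """Build a CQL search clause.

    exact=True uses the 'eq' relation, which this setup requires for indexes
    defined with type="0" (the @attr 4=3 indexes); otherwise '=' is used.
    """
    if not _PLAIN_TERM.fullmatch(term):
        # Inside a quoted CQL term, a backslash escapes '"' and '\'.
        term = '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'
    relation = "eq" if exact else "="
    return f"{index} {relation} {term}"
```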
SRW explain with CQL index points:
Z> open http://localhost:9999
Notice: when opening a connection using the 'http://' prefix, yaz-client
uses SRW SOAP connections, and 'form xml' and 'querytype cql' are
SRW search using implicit server-side CQL:
Z> open http://localhost:9999
Z> find identifier eq "http://resolver.caltech.edu/CaltechCSTR:1978.2276-tr-78"
Z> find description &lt; the
In SRW connection mode, the following fails due to a problem in yaz-client:
SRW scan using implicit server-side CQL:
yaz-client http://localhost:9999
Z> scan title = communication
Z> scan identifier eq a
Notice: you need to use the 'eq' relation for all @attr 4=3 indexes
<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document: "zebra.xml"
sgml-local-catalogs: nil
sgml-namecase-general:t
-->