From 306231655249a6c67f850f6b0e907ed931585b59 Mon Sep 17 00:00:00 2001 From: Marc Cromme Date: Tue, 5 Feb 2008 12:16:52 +0000 Subject: [PATCH] added several sections on web service usage of zebra, including snippets, facets, index element sets and xslt transformations --- doc/tutorial.xml | 255 ++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 163 insertions(+), 92 deletions(-) diff --git a/doc/tutorial.xml b/doc/tutorial.xml index 665c752..fbbd8c3 100644 --- a/doc/tutorial.xml +++ b/doc/tutorial.xml @@ -1,5 +1,5 @@ - + Tutorial @@ -35,7 +35,7 @@ To index these &acro.oai; records, type: zebraidx-2.0 -c conf/zebra.cfg init - zebraidx-2.0 -c conf/zebra.cfg update data/oai-caltech.xml + zebraidx-2.0 -c conf/zebra.cfg update data zebraidx-2.0 -c conf/zebra.cfg commit In case you have not installed zebra yet but have compiled the @@ -53,7 +53,8 @@ In this command, the word update is followed by the name of a directory: zebraidx updates all - files in the hierarchy rooted at that directory. The command option + files in the hierarchy rooted at data. + The command option -c conf/zebra.cfg points to the proper configuration file. @@ -108,8 +109,9 @@ you can point your browser to one of the following url's to search for the term the. 
Just point your browser at this link: - - http://localhost:9999/?version=1.1&operation=searchRetrieve&query=the + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the @@ -122,16 +124,16 @@ In case we actually want to retrieve one record, we need to alter our URl to the following - - http://localhost:9999/?version=1.1&operation=searchRetrieve&query=the&startRecord=1&maximumRecords=1&recordSchema=dc + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc This way we can page through our result set in chunks of records, for example, we access the 6th to the 10th record using the URL - - http://localhost:9999/?version=1.1&operation=searchRetrieve&query=the&startRecord=6&maximumRecords=5&recordSchema=dc + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=6&maximumRecords=5&recordSchema=dc @@ -141,97 +143,164 @@ http://localhost:9999/?version=1.1&operation=searchRetrieve - &query=title%3Cthe + &x-pquery=title%3Cthe --> - - - Presenting search results in different formats + + &zebra; uses &acro.xslt; stylesheets for both &acro.xml; record + indexing and + display at retrieval time. In this example installation, there are two + retrieval schemas defined in + conf/dom-conf.xml: + the dc schema implemented in + conf/oai2dc.xsl, and + the zebra schema implemented in + conf/oai2zebra.xsl. + The URLs for accessing both are the same, except for the different + value of the recordSchema parameter: + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc + + and + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra + + For the curious, running the stylesheets by hand shows that the &acro.xslt; transformations + really do the magic:
+ + xsltproc conf/oai2dc.xsl data/debug-record.xml + xsltproc conf/oai2zebra.xsl data/debug-record.xml + + Notice also that the &zebra;-specific parameters are injected by + the engine when retrieving data; therefore some of the attributes + in the zebra retrieval schema are not filled + when running the transformation from the command line. + -Z39.50 search: - - yaz-client localhost:9999 - Z> format xml - Z> querytype prefix - Z> elements oai - Z> find the - Z> show 1+1 - -Z39.50 presents using presentation stylesheets: - - Z> elements dc - Z> show 2+1 - - Z> elements zebra - Z> show 3+1 - -Z39.50 buildin Zebra presents (in this configuration only if - started without yaz-frontendserver): - - - Z> elements zebra::meta - Z> show 4+1 - - Z> elements zebra::meta::sysno - Z> show 5+1 - - Z> format sutrs - Z> show 5+1 - Z> format xml - - Z> elements zebra::index - Z> show 6+1 - - Z> elements zebra::snippet - Z> show 7+1 - - Z> elements zebra::facet::any:w - Z> show 8+1 + + In addition to the user-defined retrieval schemas, one can always + choose from several built-in schemas. If one is only + interested in the &zebra; internal metadata about a certain + record, one uses the zebra::meta schema. + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::meta + + - Z> elements zebra::facet::any:w,dc_title:w - Z> show 9+1 - + + The zebra::data schema is used to retrieve the + original stored &acro.oai; &acro.xml; record. + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::data + + + + + More interesting searches -Z39.50 searches targeted at specific indexes + + The &acro.oai; indexing example defines many different index + names; a study of the conf/oai2index.xsl + stylesheet reveals the following word type indexes (i.e.
those + with suffix :w): + + any:w + dc_title:w + dc_creator:w + dc_subject:w + dc_description:w + dc_contributor:w + dc_publisher:w + dc_language:w + dc_rights:w + + By default, searches access the any:w index, + but we can direct searches to any access point by constructing the + correct &acro.pqf; query. For example, to search in titles only, + we use + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@attr + 1=dc_title the&startRecord=1&maximumRecords=1&recordSchema=dc + + - Z> elements zebra - Z> find @attr 1=oai_identifier @attr 4=3 oai:caltechcstr.library.caltech.edu:4 - Z> show 1+1 + + Similarly, we can direct searches to the other indexes defined. We + can also create boolean combinations of searches on different + indexes. Here we search for the in + dc_title and for fish in + dc_description using the query + @and @attr 1=dc_title the @attr 1=dc_description fish. + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@and + @attr 1=dc_title the + @attr 1=dc_description fish&startRecord=1&maximumRecords=1&recordSchema=dc + + - Z> find @attr 1=oai_datestamp @attr 4=3 2001-04-20 - Z> show 1+1 - Z> find @attr 1=oai_setspec @attr 4=3 7374617475733D756E707562 - Z> show 1+1 - - Z> find @attr 1=dc_title communication - Z> show 1+1 + - Z> find @attr 1=dc_identifier @attr 4=3 - http://resolver.caltech.edu/CaltechCSTR:1986.5228-tr-86 - Z> show 1+1 + + Investigating the content of the indexes - etc, etc. + + How does the magic work? What is inside the indexes? Why is a certain + record found by a search, and another not? The answer lies in the + inverted indexes. You can easily investigate them using the + special &zebra; schema + zebra::index::fieldname.
In this example you + can see that the dc_title index has both word + (type :w) and phrase (type + :p) + indexed fields: + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::index::dc_title + + - Notice that all indexes defined by 'type="0"' in the - indexing style sheet must be searched using the '@attr 4=3' - structure attribute instruction. + + But where in the indexes did the term match for the query occur? + This is easily answered with the special &zebra; schema + zebra::snippet. The matching terms are + encapsulated by <s> tags. + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::snippet + + - Notice also that searching and scan on indexes - 'dc_contributor', 'dc_language', 'dc_rights', and 'dc_source' - fails, simply because none of the records in this example set - have these fields set, and consequently, these indexes are - _not_ created. + + How can I refine my search? Which interesting search terms are + found inside my hit set? Try the special &zebra; schema + zebra::facet::fieldname:type. In this case, we + investigate additional search terms for the + dc_title:w index. + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::dc_title:w + + + + One can ask for multiple facets. Here, we want them from phrase + indexes of type + :p. + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::dc_publisher:p,dc_title:p + + @@ -310,7 +379,7 @@ Z39.50 searches targeted at specific indexes Z39.50 searches targeted at specific indexes and boolean combinations of these can be issued as well.
- + Z> elements dc Z> find @attr 1=oai_identifier @attr 4=3 oai:caltechcstr.library.caltech.edu:4 Z> show 1+1 @@ -327,7 +396,7 @@ Z39.50 searches targeted at specific indexes Z> find @attr 1=dc_identifier @attr 4=3 http://resolver.caltech.edu/CaltechCSTR:1986.5228-tr-86 Z> show 1+1 - + etc, etc. @@ -352,11 +421,13 @@ Z39.50 searches targeted at specific indexes Setting up a correct &acro.sru; web service -Or, alternatively, starting the SRU/SRW/Z39.50 server including -PQF and CQL query configuration: - - zebrasrv -f yazserver.xml - + + Or, alternatively, starting the SRU/SRW/Z39.50 server including + PQF and CQL query configuration: + + zebrasrv -f yazserver.xml + + @@ -498,23 +569,23 @@ SRU Explain ZeeRex response: SRU Search Retrieve records: http://localhost:9999/?version=1.1&operation=searchRetrieve - &query=creator=adam + &x-pquery=creator=adam http://localhost:9999/?version=1.1&operation=searchRetrieve - &query=date=1978-01-01 + &x-pquery=date=1978-01-01 &startRecord=1&maximumRecords=1&recordSchema=dc http://localhost:9999/?version=1.1&operation=searchRetrieve - &query=dc.title=the + &x-pquery=dc.title=the http://localhost:9999/?version=1.1&operation=searchRetrieve - &query=description=the + &x-pquery=description=the relation tests: http://localhost:9999/?version=1.1&operation=searchRetrieve - &query=title%3Cthe + &x-pquery=title%3Cthe SRU scan: -- 1.7.10.4
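The tutorial URLs above are typed by hand, and some of them contain raw spaces inside &acro.pqf; queries (for example `@attr 1=dc_title the`). This is a minimal sketch, assuming the zebrasrv instance at `http://localhost:9999/` used throughout and Python 3's standard `urllib.parse`. It shows how the same SRU 1.1 searchRetrieve URLs could be built with their `x-pquery`, `startRecord`, `maximumRecords` and `recordSchema` parameters percent-encoded. The helper names `sru_search_url` and `page_url` are hypothetical. The helpers only construct URLs; they neither contact the server nor parse its response. `page_url` also spells out the paging arithmetic: page *n* of size *k* starts at record (n-1)*k+1, so page 2 of size 5 gives `startRecord=6`, which is the "6th to the 10th record" example.

```python
from urllib.parse import quote, urlencode

# Assumption: the zebrasrv instance from this tutorial listens here.
BASE_URL = "http://localhost:9999/"


def sru_search_url(pqf, start=1, maximum=1, schema="dc", base=BASE_URL):
    """Build an SRU 1.1 searchRetrieve URL carrying a PQF query.

    The PQF query is sent in the x-pquery parameter, as in the
    tutorial URLs. Spaces and '=' inside values are percent-encoded
    (as %20 and %3D); ':', '@' and ',' are left readable because
    RFC 3986 allows them in a query string.
    """
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "x-pquery": pqf,
        "startRecord": start,
        "maximumRecords": maximum,
        "recordSchema": schema,
    }
    return base + "?" + urlencode(params, safe=":@,", quote_via=quote)


def page_url(pqf, page, page_size, schema="dc"):
    """URL for 1-based page `page`, holding `page_size` records."""
    start = (page - 1) * page_size + 1
    return sru_search_url(pqf, start=start, maximum=page_size, schema=schema)


if __name__ == "__main__":
    # First hit in the user-defined dc schema.
    print(sru_search_url("the"))
    # Records 6 to 10: page 2 with five records per page.
    print(page_url("the", page=2, page_size=5))
    # Title-only search, then a boolean AND across two indexes.
    print(sru_search_url("@attr 1=dc_title the"))
    print(sru_search_url("@and @attr 1=dc_title the @attr 1=dc_description fish"))
    # Built-in schemas: index dump, snippets, and facets.
    print(sru_search_url("the", schema="zebra::index::dc_title"))
    print(sru_search_url("the", schema="zebra::snippet"))
    print(sru_search_url("the", schema="zebra::facet::dc_publisher:p,dc_title:p"))
```

For a plain query such as `the`, the generated URL is identical to the hand-written one in the tutorial. For index-directed or boolean &acro.pqf; queries, it differs only in that spaces and `=` are escaped, which a browser would otherwise do on its own. Leaving `:`, `@` and `,` unescaped is purely a readability choice.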