X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Ftutorial.xml;h=8d79bd866b5deaf53142f9ce14f3f1354ccbca29;hp=3d8ecb36245022065840aaaee8cab3e2bea62540;hb=dcda88860b03641b6900d43135ca769f005105e8;hpb=b014345252c770fff856df02953cc863c5341a49 diff --git a/doc/tutorial.xml b/doc/tutorial.xml index 3d8ecb3..8d79bd8 100644 --- a/doc/tutorial.xml +++ b/doc/tutorial.xml @@ -1,103 +1,102 @@ - - - Tutorial - - - - A first &acro.oai; indexing example - - - In this section, we will test the system by indexing a small set of - sample &acro.oai; records that are included with the &zebra; distribution, - running a &zebra; server against the newly created database, and - searching the indexes with a client that connects to that server. - - - Go to the examples/oai-pmh subdirectory of the - distribution archive, or make a deep copy of the Debian installation - directory - /usr/share/idzebra-2.0.-examples/oai-pmh. - An XML file containing multiple &acro.oai; - records is located in the sub - directory examples/oai-pmh/data. - - + + Tutorial + + + + A first &acro.oai; indexing example + + + In this section, we will test the system by indexing a small set of + sample &acro.oai; records that are included with the &zebra; distribution, + running a &zebra; server against the newly created database, and + searching the indexes with a client that connects to that server. + + + Go to the examples/oai-pmh subdirectory of the + distribution archive, or make a deep copy of the Debian installation + directory + /usr/share/idzebra-2.0-examples/oai-pmh. + An XML file containing multiple &acro.oai; + records is located in the sub + directory examples/oai-pmh/data. + + Additional OAI test records can be downloaded by running a shell - script (you may want to abort the script when you have waitet - longer than your coffe brews ..). - + script (you may want to abort the script when you have waited + longer than your coffee brews ..). 
+ cd data ./fetch_OAI_data.sh cd ../ - - - + + + To index these &acro.oai; records, type: - - zebraidx-2.0 -c conf/zebra.cfg init - zebraidx-2.0 -c conf/zebra.cfg update data - zebraidx-2.0 -c conf/zebra.cfg commit - - In case you have not installed zebra yet but have compiled the + + zebraidx-2.0 -c conf/zebra.cfg init + zebraidx-2.0 -c conf/zebra.cfg update data + zebraidx-2.0 -c conf/zebra.cfg commit + + In case you have not installed zebra yet but have compiled the binaries from this tarball, use the following command form: - - ../../index/zebraidx -c conf/zebra.cfg this and that - - On some systems the &zebra; binaries are installed under the - generic names, you need to use the following command form: - - zebraidx -c conf/zebra.cfg this and that - - - - - In this command, the word update is followed - by the name of a directory: zebraidx updates all - files in the hierarchy rooted at data. - The command option - -c conf/zebra.cfg points to the proper - configuration file. - - - - You might ask yourself how &acro.xml; content is indexed using &acro.xslt; - stylesheets: to satisfy your curiosity, you might want to run the - indexing transformation on an example debugging &acro.oai; record. - - xsltproc conf/oai2index.xsl data/debug-record.xml - + + ../../index/zebraidx -c conf/zebra.cfg this and that + + On some systems the &zebra; binaries are installed under the + generic names, you need to use the following command form: + + zebraidx -c conf/zebra.cfg this and that + + + + + In this command, the word update is followed + by the name of a directory: zebraidx updates all + files in the hierarchy rooted at data. + The command option + -c conf/zebra.cfg points to the proper + configuration file. + + + + You might ask yourself how &acro.xml; content is indexed using &acro.xslt; + stylesheets: to satisfy your curiosity, you might want to run the + indexing transformation on an example debugging &acro.oai; record. 
+ + xsltproc conf/oai2index.xsl data/debug-record.xml + Here you see the &acro.oai; record transformed into the indexing &acro.xml; format. &zebra; is creating several inverted indexes, and their name and type are clearly visible in the indexing &acro.xml; format. - - - - If your indexing command was successful, you are now ready to - fire up a server. To start a server on port 9999, type: - - zebrasrv-2.0 -c conf/zebra.cfg @:9999 - - - - - The &zebra; index that you have just created has a single database - named Default. - The database contains several &acro.oai; records, and the server will - return records in the &acro.xml; format only. The indexing machine - did the splitting into individual records just behind the scenes. - - - - - - - Searching the &acro.oai; database by web service - - + + + + If your indexing command was successful, you are now ready to + fire up a server. To start a server on port 9999, type: + + zebrasrv-2.0 -c conf/zebra.cfg @:9999 + + + + + The &zebra; index that you have just created has a single database + named Default. + The database contains several &acro.oai; records, and the server will + return records in the &acro.xml; format only. The indexing machine + did the splitting into individual records just behind the scenes. + + + + + + + Searching the &acro.oai; database by web service + + &zebra; has a build-in web service, which is close to the &acro.sru; standard web service. We use it to access our new - database using any &acro.xml; enabled web browser. + database using any &acro.xml; enabled web browser. This service is using the &acro.pqf; query language. In a later section we show how to run a fully compliant &acro.sru; server, @@ -106,75 +105,75 @@ Searching and retrieving &acro.xml; records is easy. For example, - you can point your browser to one of the following url's to + you can point your browser to one of the following URLs to search for the term the. 
Just point your browser at this link: - http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the + url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the"> + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the - These URL's woun't work unless you have indexed the example data + These URLs won't work unless you have indexed the example data and started an &zebra; server as outlined in the previous section. In case we actually want to retrieve one record, we need to alter - our URl to the following - - http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc - + our URL to the following + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc + This way we can page through our result set in chunks of records, for example, we access the 6th to the 10th record using the URL - - http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=6&maximumRecords=5&recordSchema=dc - - + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=6&maximumRecords=5&recordSchema=dc + + - - + &x-pquery=title%3Cthe + --> + - - Presenting search results in different formats + + Presenting search results in different formats &zebra; uses &acro.xslt; stylesheets for both &acro.xml;record indexing and display retrieval. In this example installation, they are two - retrieval schema's defined in - conf/dom-conf.xml: - the dc schema implemented in + retrieval schema's defined in + conf/dom-conf.xml: + the dc schema implemented in conf/oai2dc.xsl, and - the zebra schema implemented in - conf/oai2zebra.xsl. - The URL's for acessing both are the same, except for the different + the zebra schema implemented in + conf/oai2zebra.xsl. 
+ The URLs for accessing both are the same, except for the different value of the recordSchema parameter: http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc - + and http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra - + For the curious, one can see that the &acro.xslt; transformations - really do the magic. + really do the magic. xsltproc conf/oai2dc.xsl data/debug-record.xml xsltproc conf/oai2zebra.xsl data/debug-record.xml - + Notice also that the &zebra; specific parameters are injected by the engine when retrieving data, therefore some of the attributes in the zebra retrieval schema are not filled @@ -197,39 +196,39 @@ original stored &acro.oai; &acro.xml; record. http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::data - + - + - - More interesting searches + + More interesting searches The &acro.oai; indexing example defines many different index names, a study of the conf/oai2index.xsl stylesheet reveals the following word type indexes (i.e. those - swith suffix :w): + with suffix :w): - any:w - dc_title:w - dc_creator:w - dc_subject:w - dc_description:w - dc_contributor:w - dc_publisher:w - dc_language:w - dc_rights:w + any:w + title:w + author:w + subject:w + description:w + contributor:w + publisher:w + language:w + rights:w - By default, searches do access the anr:w index, + By default, searches do access the any:w index, but we can direct searches to any access point by constructing the correct &acro.pqf; query. 
For example, to search in titles only, we use + url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@attr + 1=title the&startRecord=1&maximumRecords=1&recordSchema=dc"> http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@attr - 1=dc_title the&startRecord=1&maximumRecords=1&recordSchema=dc + 1=title the&startRecord=1&maximumRecords=1&recordSchema=dc @@ -237,49 +236,49 @@ Similar we can direct searches to the other indexes defined. Or we can create boolean combinations of searches on different indexes. In this case we search for the in - dc_title and for fish in - dc_description using the query - @and @attr 1=dc_title the @attr 1=dc_description fish. + title and for fish in + description using the query + @and @attr 1=title the @attr 1=description fish. + url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@and + @attr 1=title the + @attr 1=description + fish&startRecord=1&maximumRecords=1&recordSchema=dc"> http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@and - @attr 1=dc_title the - @attr 1=dc_description fish&startRecord=1&maximumRecords=1&recordSchema=dc + @attr 1=title the + @attr 1=description fish&startRecord=1&maximumRecords=1&recordSchema=dc - + - - Investigating the content of the indexes + + Investigating the content of the indexes - How doess the magic work? What is inside the indexes? Why is a certain - record foound by a search, and another not?. The answer is in the - inverterd indexes. You can easily investigate them using the + How does the magic work? What is inside the indexes? Why is a certain + record found by a search, and another not?. The answer is in the + inverted indexes. You can easily investigate them using the special &zebra; schema zebra::index::fieldname. 
In this example you - can see that the dc_title index has both word + can see that the title index has both word (type :w) and phrase (type - :p) - indexed fields, - - http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::index::dc_title - + :p) + indexed fields, + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::index::title + But where in the indexes did the term match for the query occur? Easily answered with the special &zebra; schema - zebra::snippet. The matching terma are - encapsulated by <s> tags. + zebra::snippet. The matching terms are + encapsulated by <s> tags. http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::snippet - + @@ -287,37 +286,37 @@ found inside my hit set? Try the special &zebra; schema zebra::facet::fieldname:type. In this case, we investigate additional search terms for the - dc_title:w index. - - http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::dc_title:w - + title:w index. + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::title:w + One can ask for multiple facets. Here, we want them from phrase indexes of type :p. - - http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::dc_publisher:p,dc_title:p - + + http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::publisher:p,title:p + - + - - Setting up a correct &acro.sru; web service + + Setting up a correct &acro.sru; web service - The &acro.sru; specification mandates that the &acro.cql; query - language is supported and properly configure. 
Also, the server - needs to be able to emmit a proper &acro.explain; &acro.xml; - record, which is used to determine the capabilities of the - specific server instance. - + The &acro.sru; specification mandates that the &acro.cql; query + language is supported and properly configure. Also, the server + needs to be able to emit a proper &acro.explain; &acro.xml; + record, which is used to determine the capabilities of the + specific server instance. + - In this example configuration we expoit the similarities between + In this example configuration we exploit the similarities between the &acro.explain; record and the &acro.cql; query language configuration, we generate the later from the former using an &acro.xslt; transformation. @@ -327,18 +326,18 @@ - The we are all set to start the &acro.sru;/acro.z3950; server including + We are all set to start the &acro.sru;/&acro.z3950; server including &acro.pqf; and &acro.cql; query configuration. It uses the &yaz; frontend server configuration - just type zebrasrv -f conf/yazserver.xml - - + + First, we'd like to be sure that we can see the &acro.explain; &acro.xml; response correctly. You might use either of these equivalent - requests: + requests: http://localhost:9999 @@ -350,42 +349,42 @@ - Now we can issue true &acro.sru; requests. For example, + Now we can issue true &acro.sru; requests. For example, dc.title=the - and dc.description=fish results in the following page + and dc.description=fish results in the following page + url="http://localhost:9999/?version=1.1&operation=searchRetrieve&query=dc.title=the + and dc.description=fish + &startRecord=1&maximumRecords=1&recordSchema=dc"> http://localhost:9999/?version=1.1&operation=searchRetrieve&query=dc.title=the and dc.description=fish &startRecord=1&maximumRecords=1&recordSchema=dc - Scan of indexes is a part of the &acro.sru; server business. For example, + Scan of indexes is a part of the &acro.sru; server business. 
For example, scanning the dc.title index gives us an idea what search terms are found there + url="http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.title=fish"> http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.title=fish , - whereas - -http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.identifier=fish - - accesses the indexed indentifiers. + whereas + + http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.identifier=fish + + accesses the indexed identifiers. - In addition, all &zebra; internal special elemen sets or record - schema's of the form + In addition, all &zebra; internal special element sets or record + schema's of the form zebra:: just work right out of the box + url="http://localhost:9999/?version=1.1&operation=searchRetrieve&query=dc.title=the + and dc.description=fish + &startRecord=1&maximumRecords=1&recordSchema=zebra::snippet"> http://localhost:9999/?version=1.1&operation=searchRetrieve&query=dc.title=the and dc.description=fish &startRecord=1&maximumRecords=1&recordSchema=zebra::snippet @@ -393,26 +392,26 @@ http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.identifi - + Searching the &acro.oai; database by &acro.z3950; protocol - + In this section we repeat the searches and presents we have done so far using the binary &acro.z3950; protocol, you can use any - &acro.z3950; client. + &acro.z3950; client. For instance, you can use the demo command-line client that comes - with &yaz;. + with &yaz;. 
- Connecting to the server is done by the command - + Connecting to the server is done by the command + yaz-client localhost:9999 - + When the client has connected, you can type: @@ -423,45 +422,45 @@ http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.identifi Z> show 1+1 - + &acro.z3950; presents using presentation stylesheets: Z> elements dc Z> show 2+1 - + Z> elements zebra Z> show 3+1 - &acro.z3950; buildin Zebra presents (in this configuration only if + &acro.z3950; buildin Zebra presents (in this configuration only if started without yaz-frontendserver): - + Z> elements zebra::meta Z> show 4+1 - + Z> elements zebra::meta::sysno Z> show 5+1 - + Z> format sutrs Z> show 5+1 Z> format xml - + Z> elements zebra::index Z> show 6+1 - + Z> elements zebra::snippet Z> show 7+1 - + Z> elements zebra::facet::any:w Z> show 1+1 - - Z> elements zebra::facet::dc_publisher:p,dc_title:p + + Z> elements zebra::facet::publisher:p,title:p Z> show 1+1 - + @@ -478,15 +477,15 @@ http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.identifi Z> find @attr 1=oai_setspec @attr 4=3 7374617475733D756E707562 Z> show 1+1 - - Z> find @attr 1=dc_title communication + + Z> find @attr 1=title communication Z> show 1+1 - - Z> find @attr 1=dc_identifier @attr 4=3 + + Z> find @attr 1=identifier @attr 4=3 http://resolver.caltech.edu/CaltechCSTR:1986.5228-tr-86 Z> show 1+1 - etc, etc. + etc, etc. 
@@ -499,79 +498,74 @@ http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.identifi Z> scan @attr 1=oai_datestamp @attr 4=3 1 Z> scan @attr 1=oai_setspec @attr 4=3 2000 Z> - Z> scan @attr 1=dc_title communication - Z> scan @attr 1=dc_identifier @attr 4=3 a - + Z> scan @attr 1=title communication + Z> scan @attr 1=identifier @attr 4=3 a + &acro.z3950; search using server-side CQL conversion: - Z> format xml - Z> querytype cql - Z> elements dc - Z> - Z> find harry - Z> - Z> find dc.creator = the - Z> find dc.creator = the - Z> find dc.title = the - Z> - Z> find dc.description < the - Z> find dc.title > some - Z> - Z> find dc.identifier="http://resolver.caltech.edu/CaltechCSTR:1978.2276-tr-78" - Z> find dc.relation = something - + Z> format xml + Z> querytype cql + Z> elements dc + Z> + Z> find harry + Z> + Z> find dc.creator = the + Z> find dc.creator = the + Z> find dc.title = the + Z> + Z> find dc.description < the + Z> find dc.title > some + Z> + Z> find dc.identifier="http://resolver.caltech.edu/CaltechCSTR:1978.2276-tr-78" + Z> find dc.relation = something + - - &acro.z3950; scan using server side CQL conversion - - unfortunately, this will _never_ work as it is not supported by the - &acro.z3950; standard. - If you want to use scan using server side CQL conversion, you need to - make an SRW connection using yaz-client, or a - SRU connection using REST Web Services - any browser will do. - + + &acro.z3950; scan using server side CQL conversion - + unfortunately, this will _never_ work as it is not supported by the + &acro.z3950; standard. + If you want to use scan using server side CQL conversion, you need to + make an SRW connection using yaz-client, or a + SRU connection using REST Web Services - any browser will do. + - - All indexes defined by 'type="0"' in the - indexing style sheet must be searched using the '@attr 4=3' - structure attribute instruction. 
- + + All indexes defined by 'type="0"' in the + indexing style sheet must be searched using the '@attr 4=3' + structure attribute instruction. + - Notice that searching and scan on indexes - dc_contributor, dc_language, - dc_rights, and dc_source - might fail, simply because none of the records in the small example set - have these fields set, and consequently, these indexes might not - been created. + Notice that searching and scan on indexes + contributor, language, + rights, and source + might fail, simply because none of the records in the small example set + have these fields set, and consequently, these indexes might not + been created. - - - - - + + - -
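Several of the search URLs in the patched tutorial carry a raw &acro.pqf; query, spaces included, in the x-pquery parameter (for example `@attr 1=title the`). A browser will usually escape these for you, but a script has to do it itself. The following is a minimal sketch, not part of the patch, of how such URLs might be assembled in Python. The helper name `search_url` and its keyword arguments are hypothetical. The base address assumes the server was started on port 9999 as shown in the tutorial, and the index names `title` and `description` are the renamed ones from the `+` side of this diff.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Assumption: the server from the tutorial is listening here
# (zebrasrv-2.0 -c conf/zebra.cfg @:9999).
BASE_URL = "http://localhost:9999/"


def search_url(pqf, start=1, maximum=1, schema="dc", base=BASE_URL):
    """Build a searchRetrieve URL carrying a PQF query in x-pquery.

    Hypothetical helper: it only assembles the URL and does not
    contact the server.
    """
    params = [
        ("version", "1.1"),
        ("operation", "searchRetrieve"),
        ("x-pquery", pqf),
        ("startRecord", str(start)),
        ("maximumRecords", str(maximum)),
        ("recordSchema", schema),
    ]
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    return base + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    # Plain term search against the default index, first record, dc schema.
    print(search_url("the"))

    # Boolean combination on two indexes, shown with the snippet schema.
    url = search_url(
        "@and @attr 1=title the @attr 1=description fish",
        schema="zebra::snippet",
    )
    print(url)

    # Decoding the query string gives back the original PQF text.
    print(parse_qs(urlsplit(url).query)["x-pquery"][0])
```

Paging through a result set works the same way: `search_url("the", start=6, maximum=5)` corresponds to the tutorial's example of fetching the 6th to the 10th record.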