X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Fexamples.xml;h=7a5b015e1ccee9aec5ff78077aa6428e2b422831;hp=dc95e1209e282683562398be4d2d60677b984d0c;hb=c00bfddbf0f3608340d61298acc61dafb167f9b2;hpb=c50b7223e10de52e713be64559129ea89e8ed601 diff --git a/doc/examples.xml b/doc/examples.xml index dc95e12..7a5b015 100644 --- a/doc/examples.xml +++ b/doc/examples.xml @@ -1,334 +1,394 @@ - - - Example Configurations - - - Overview - - - zebraidx and zebrasrv are both - driven by a master configuration file, which may refer to other - subsidiary configuration files. By default, they try to use - zebra.cfg in the working directory as the - master file; but this can be changed using the -c - option to specify an alternative master configuration file. - - - The master configuration file tells Zebra: - - - - - Where to find subsidiary configuration files, including both - those that are named explicitly and a few ``magic'' files such - as default.idx, - which specifies the default indexing rules. - - + + Example Configurations - - - What record schemas to support. (Subsidiary files specifiy how - to index the contents of records in those schemas, and what - format to use when presenting records in those schemas to client - software.) - - + + Overview - - - What attribute sets to recognise in searches. (Subsidiary files - specify how to interpret the attributes in terms - of the indexes that are created on the records.) - - + + zebraidx and + zebrasrv are both + driven by a master configuration file, which may refer to other + subsidiary configuration files. By default, they try to use + zebra.cfg in the working directory as the + master file; but this can be changed using the -c + option to specify an alternative master configuration file. 
+ + + The master configuration file tells &zebra;: + - - - Policy details such as what type of input format to expect when - adding new records, what low-level indexing algorithm to use, - how to identify potential duplicate records, etc. - - - - - - - Now let's see what goes in the zebra.cfg file - for some example configurations. - - - - - Example 1: XML Indexing And Searching - - - This example shows how Zebra can be used with absolutely minimal - configuration to index a body of - XML - documents, and search them using - XPath - expressions to specify access points. - - - Go to the examples/zthes subdirectory - of the distribution archive. - There you will find a Makefile that will - populate the records subdirectory with a file of - Zthes - records representing a taxonomic hierarchy of dinosaurs. (The - records are generated from the family tree in the file - dino.tree.) - Type make records/dino.xml - to make the XML data file. - (Or you could just type make to build the XML - data file, create the database and populate it with the taxonomic - records all in one shot - but then you wouldn't learn anything, - would you? :-) - - - Now we need to create a Zebra database to hold and index the XML - records. We do this with the - Zebra indexer, zebraidx, which is - driven by the zebra.cfg configuration file. - For our purposes, we don't need any - special behaviour - we can use the defaults - so we can start with a - minimal file that just tells zebraidx where to - find the default indexing rules, and how to parse the records: - - profilePath: .:../../tab - recordType: grs.sgml - - - - That's all you need for a minimal Zebra configuration. Now you can - roll the XML records into the database and build the indexes: - - zebraidx update records - - - - Now start the server. Like the indexer, its behaviour is - controlled by the - zebra.cfg file; and like the indexer, it works - just fine with this minimal configuration. 
- - zebrasrv - - By default, the server listens on IP port number 9999, although - this can easily be changed - see - . - - - Now you can use the Z39.50 client program of your choice to execute - XPath-based boolean queries and fetch the XML records that satisfy - them: - - $ yaz-client @:9999 - Connecting...Ok. - Z> find @attr 1=/Zthes/termName Sauroposeidon - Number of hits: 1 - Z> format xml - Z> show 1 - <Zthes> + + + Where to find subsidiary configuration files, including both + those that are named explicitly and a few ``magic'' files such + as default.idx, + which specifies the default indexing rules. + + + + + + What record schemas to support. (Subsidiary files specify how + to index the contents of records in those schemas, and what + format to use when presenting records in those schemas to client + software.) + + + + + + What attribute sets to recognise in searches. (Subsidiary files + specify how to interpret the attributes in terms + of the indexes that are created on the records.) + + + + + + Policy details such as what type of input format to expect when + adding new records, what low-level indexing algorithm to use, + how to identify potential duplicate records, etc. + + + + + + + Now let's see what goes in the zebra.cfg file + for some example configurations. + + + + + Example 1: &acro.xml; Indexing And Searching + + + This example shows how &zebra; can be used with absolutely minimal + configuration to index a body of + &acro.xml; + documents, and search them using + XPath + expressions to specify access points. + + + Go to the examples/zthes subdirectory + of the distribution archive. + There you will find a Makefile that will + populate the records subdirectory with a file of + Zthes + records representing a taxonomic hierarchy of dinosaurs. (The + records are generated from the family tree in the file + dino.tree.) + Type make records/dino.xml + to make the &acro.xml; data file. 
+ (Or you could just type make dino to build the &acro.xml; + data file, create the database and populate it with the taxonomic + records all in one shot - but then you wouldn't learn anything, + would you? :-) + + + Now we need to create a &zebra; database to hold and index the &acro.xml; + records. We do this with the + &zebra; indexer, zebraidx, which is + driven by the zebra.cfg configuration file. + For our purposes, we don't need any + special behaviour - we can use the defaults - so we can start with a + minimal file that just tells zebraidx where to + find the default indexing rules, and how to parse the records: + + profilePath: .:../../tab + recordType: grs.sgml + + + + That's all you need for a minimal &zebra; configuration. Now you can + roll the &acro.xml; records into the database and build the indexes: + + zebraidx update records + + + + Now start the server. Like the indexer, its behaviour is + controlled by the + zebra.cfg file; and like the indexer, it works + just fine with this minimal configuration. + + zebrasrv + + By default, the server listens on IP port number 9999, although + this can easily be changed - see + . + + + Now you can use the &acro.z3950; client program of your choice to execute + XPath-based boolean queries and fetch the &acro.xml; records that satisfy + them: + + $ yaz-client @:9999 + Connecting...Ok. 
+ Z> find @attr 1=/Zthes/termName Sauroposeidon + Number of hits: 1 + Z> format xml + Z> show 1 + <Zthes> <termId>22</termId> <termName>Sauroposeidon</termName> <termType>PT</termType> <termNote>The tallest known dinosaur (18m)</termNote> <relation> - <relationType>BT</relationType> - <termId>21</termId> - <termName>Brachiosauridae</termName> - <termType>PT</termType> + <relationType>BT</relationType> + <termId>21</termId> + <termName>Brachiosauridae</termName> + <termType>PT</termType> </relation> - <idzebra xmlns="http://www.indexdata.dk/zebra/"> - <size>300</size> - <localnumber>23</localnumber> - <filename>records/dino.xml</filename> - </idzebra> - </Zthes> - - - - Now wasn't that nice and easy? - - - - - - Example 2: Supporting Interoperable Searches - - - The problem with the previous example is that you need to know the - structure of the documents in order to find them. For example, - when we wanted to find the record for the taxon - Sauroposeidon, - we had to formulate a complex XPath - /Zthes/termName - which embodies the knowledge that taxon names are specified in a - <termName> element inside the top-level - <Zthes> element. - - - This is bad not just because it requires a lot of typing, but more - significantly because it ties searching semantics to the physical - structure of the searched records. You can't use the same search - specification to search two databases if their internal - representations are different. Consider a different taxonomy - database in which the records have taxon names specified - inside a <name> element nested within an - <identification> element - inside a top-level <taxon> element: then - you'd need to search for them using - 1=/taxon/identification/name - - - How, then, can we build broadcasting Information Retrieval - applications that look for records in many different databases? - The Z39.50 protocol offers a powerful and general solution to this: - abstract ``access points''.
In the Z39.50 model, an access point - is simply a point at which searches can be directed. Nothing is - said about implementation: in a given database, an access point - might be implemented as an index, a path into physical records, an - algorithm for interrogating relational tables or whatever works. - The only important point is that the semantics of an access - point are fixed and well defined. - - - For convenience, access points are gathered into attribute - sets. For example, the BIB-1 attribute set is supposed to - contain bibliographic access points such as author, title, subject - and ISBN; the GEO attribute set contains access points pertaining - to geospatial information (bounding coordinates, stratum, latitude - resolution, etc.); the CIMI - attribute set contains access points to do with museum collections - (provenance, inscriptions, etc.). - - - In practice, the BIB-1 attribute set has tended to be a dumping - ground for all sorts of access points, so that, for example, it - includes some geospatial access points as well as strictly - bibliographic ones. Nevertheless, this model - allows a layer of abstraction over the physical representation of - records in databases. - - - In the BIB-1 attribute set, a taxon name is probably best - interpreted as a title - that is, a phrase that identifies the item - in question. BIB-1 represents title searches by - access point 4. (See - The BIB-1 Attribute Set Semantics) - So we need to configure our dinosaur database so that searches for - BIB-1 access point 4 look in the - <termName> element, - inside the top-level - <Zthes> element. - - - ### Here's where it all goes to pieces. The current arrangement is - very awkward (and somewhat embarrassing) to describe, and the new - arrangement hasn't actually been implemented yet. - - - This is a two-step process. First, we need to tell Zebra that we - want to support the BIB-1 attribute set.
Then we need to tell it - which elements of its record pertain to access point 4. - - - We need to create an Abstract Syntax - file named after the document element of the records we're - working with, plus a .abs suffix - in this case, - Zthes.abs - as follows: - - - - - - - - - - - - - - - - - - - + + + -->