X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Fexamples.xml;fp=doc%2Fexamples.xml;h=7a5b015e1ccee9aec5ff78077aa6428e2b422831;hp=ebbac178df224830168ebc8367c93f26a4136753;hb=972bceaa6386f904bc3e4845f1c5598656c5c6f2;hpb=bb39ca3dd76e6339f66813bca1e64b644760e5a2 diff --git a/doc/examples.xml b/doc/examples.xml index ebbac17..7a5b015 100644 --- a/doc/examples.xml +++ b/doc/examples.xml @@ -1,239 +1,239 @@ - - Example Configurations - - - Overview - - - zebraidx and - zebrasrv are both - driven by a master configuration file, which may refer to other - subsidiary configuration files. By default, they try to use - zebra.cfg in the working directory as the - master file; but this can be changed using the -c - option to specify an alternative master configuration file. - - - The master configuration file tells &zebra;: - - - - - Where to find subsidiary configuration files, including both - those that are named explicitly and a few ``magic'' files such - as default.idx, - which specifies the default indexing rules. - - + + Example Configurations - - - What record schemas to support. (Subsidiary files specify how - to index the contents of records in those schemas, and what - format to use when presenting records in those schemas to client - software.) - - + + Overview - - - What attribute sets to recognise in searches. (Subsidiary files - specify how to interpret the attributes in terms - of the indexes that are created on the records.) - - + + zebraidx and + zebrasrv are both + driven by a master configuration file, which may refer to other + subsidiary configuration files. By default, they try to use + zebra.cfg in the working directory as the + master file; but this can be changed using the -c + option to specify an alternative master configuration file. 
+ + + The master configuration file tells &zebra;: + - - - Policy details such as what type of input format to expect when - adding new records, what low-level indexing algorithm to use, - how to identify potential duplicate records, etc. - - - - - - - Now let's see what goes in the zebra.cfg file - for some example configurations. - - - - - Example 1: &acro.xml; Indexing And Searching - - - This example shows how &zebra; can be used with absolutely minimal - configuration to index a body of - &acro.xml; - documents, and search them using - XPath - expressions to specify access points. - - - Go to the examples/zthes subdirectory - of the distribution archive. - There you will find a Makefile that will - populate the records subdirectory with a file of - Zthes - records representing a taxonomic hierarchy of dinosaurs. (The - records are generated from the family tree in the file - dino.tree.) - Type make records/dino.xml - to make the &acro.xml; data file. - (Or you could just type make dino to build the &acro.xml; - data file, create the database and populate it with the taxonomic - records all in one shot - but then you wouldn't learn anything, - would you? :-) - - - Now we need to create a &zebra; database to hold and index the &acro.xml; - records. We do this with the - &zebra; indexer, zebraidx, which is - driven by the zebra.cfg configuration file. - For our purposes, we don't need any - special behaviour - we can use the defaults - so we can start with a - minimal file that just tells zebraidx where to - find the default indexing rules, and how to parse the records: - - profilePath: .:../../tab - recordType: grs.sgml - - - - That's all you need for a minimal &zebra; configuration. Now you can - roll the &acro.xml; records into the database and build the indexes: - - zebraidx update records - - - - Now start the server. 
Like the indexer, its behaviour is - controlled by the - zebra.cfg file; and like the indexer, it works - just fine with this minimal configuration. - - zebrasrv - - By default, the server listens on IP port number 9999, although - this can easily be changed - see - . - - - Now you can use the &acro.z3950; client program of your choice to execute - XPath-based boolean queries and fetch the &acro.xml; records that satisfy - them: - - $ yaz-client @:9999 - Connecting...Ok. - Z> find @attr 1=/Zthes/termName Sauroposeidon - Number of hits: 1 - Z> format xml - Z> show 1 - <Zthes> + + + Where to find subsidiary configuration files, including both + those that are named explicitly and a few ``magic'' files such + as default.idx, + which specifies the default indexing rules. + + + + + + What record schemas to support. (Subsidiary files specify how + to index the contents of records in those schemas, and what + format to use when presenting records in those schemas to client + software.) + + + + + + What attribute sets to recognise in searches. (Subsidiary files + specify how to interpret the attributes in terms + of the indexes that are created on the records.) + + + + + + Policy details such as what type of input format to expect when + adding new records, what low-level indexing algorithm to use, + how to identify potential duplicate records, etc. + + + + + + + Now let's see what goes in the zebra.cfg file + for some example configurations. + + + + + Example 1: &acro.xml; Indexing And Searching + + + This example shows how &zebra; can be used with absolutely minimal + configuration to index a body of + &acro.xml; + documents, and search them using + XPath + expressions to specify access points. + + + Go to the examples/zthes subdirectory + of the distribution archive. + There you will find a Makefile that will + populate the records subdirectory with a file of + Zthes + records representing a taxonomic hierarchy of dinosaurs. 
(The + records are generated from the family tree in the file + dino.tree.) + Type make records/dino.xml + to make the &acro.xml; data file. + (Or you could just type make dino to build the &acro.xml; + data file, create the database and populate it with the taxonomic + records all in one shot - but then you wouldn't learn anything, + would you? :-) + + + Now we need to create a &zebra; database to hold and index the &acro.xml; + records. We do this with the + &zebra; indexer, zebraidx, which is + driven by the zebra.cfg configuration file. + For our purposes, we don't need any + special behaviour - we can use the defaults - so we can start with a + minimal file that just tells zebraidx where to + find the default indexing rules, and how to parse the records: + + profilePath: .:../../tab + recordType: grs.sgml + + + + That's all you need for a minimal &zebra; configuration. Now you can + roll the &acro.xml; records into the database and build the indexes: + + zebraidx update records + + + + Now start the server. Like the indexer, its behaviour is + controlled by the + zebra.cfg file; and like the indexer, it works + just fine with this minimal configuration. + + zebrasrv + + By default, the server listens on IP port number 9999, although + this can easily be changed - see + . + + + Now you can use the &acro.z3950; client program of your choice to execute + XPath-based boolean queries and fetch the &acro.xml; records that satisfy + them: + + $ yaz-client @:9999 + Connecting...Ok. 
+ Z> find @attr 1=/Zthes/termName Sauroposeidon + Number of hits: 1 + Z> format xml + Z> show 1 + <Zthes> <termId>22</termId> <termName>Sauroposeidon</termName> <termType>PT</termType> <termNote>The tallest known dinosaur (18m)</termNote> <relation> - <relationType>BT</relationType> - <termId>21</termId> - <termName>Brachiosauridae</termName> - <termType>PT</termType> + <relationType>BT</relationType> + <termId>21</termId> + <termName>Brachiosauridae</termName> + <termType>PT</termType> </relation> - <idzebra xmlns="http://www.indexdata.dk/zebra/"> - <size>300</size> - <localnumber>23</localnumber> - <filename>records/dino.xml</filename> - </idzebra> - </Zthes> - - - - Now wasn't that nice and easy? - - - - - - Example 2: Supporting Interoperable Searches - - - The problem with the previous example is that you need to know the - structure of the documents in order to find them. For example, - when we wanted to find the record for the taxon - Sauroposeidon, - we had to formulate a complex XPath - /Zthes/termName - which embodies the knowledge that taxon names are specified in a - <termName> element inside the top-level - <Zthes> element. - - - This is bad not just because it requires a lot of typing, but more - significantly because it ties searching semantics to the physical - structure of the searched records. You can't use the same search - specification to search two databases if their internal - representations are different. Consider a different taxonomy - database in which the records have taxon names specified - inside a <name> element nested within a - <identification> element - inside a top-level <taxon> element: then - you'd need to search for them using - 1=/taxon/identification/name - - - How, then, can we build broadcasting Information Retrieval - applications that look for records in many different databases? - The &acro.z3950; protocol offers a powerful and general solution to this: - abstract ``access points''. 
In the &acro.z3950; model, an access point - is simply a point at which searches can be directed. Nothing is - said about implementation: in a given database, an access point - might be implemented as an index, a path into physical records, an - algorithm for interrogating relational tables or whatever works. - The only important thing is that the semantics of an access - point is fixed and well defined. - - - For convenience, access points are gathered into attribute - sets. For example, the &acro.bib1; attribute set is supposed to - contain bibliographic access points such as author, title, subject - and ISBN; the GEO attribute set contains access points pertaining - to geospatial information (bounding coordinates, stratum, latitude - resolution, etc.); the CIMI - attribute set contains access points to do with museum collections - (provenance, inscriptions, etc.) - - - In practice, the &acro.bib1; attribute set has tended to be a dumping - ground for all sorts of access points, so that, for example, it - includes some geospatial access points as well as strictly - bibliographic ones. Nevertheless, this model - allows a layer of abstraction over the physical representation of - records in databases. - - - In the &acro.bib1; attribute set, a taxon name is probably best - interpreted as a title - that is, a phrase that identifies the item - in question. &acro.bib1; represents title searches by - access point 4. (See - The &acro.bib1; Attribute - Set Semantics) - So we need to configure our dinosaur database so that searches for - &acro.bib1; access point 4 look in the - <termName> element, - inside the top-level - <Zthes> element. - - - This is a two-step process. First, we need to tell &zebra; that we - want to support the &acro.bib1; attribute set. Then we need to tell it - which elements of its record pertain to access point 4. 
+ <idzebra xmlns="http://www.indexdata.dk/zebra/"> + <size>300</size> + <localnumber>23</localnumber> + <filename>records/dino.xml</filename> + </idzebra> + </Zthes> + + + + Now wasn't that nice and easy? + + + + + + Example 2: Supporting Interoperable Searches + + + The problem with the previous example is that you need to know the + structure of the documents in order to find them. For example, + when we wanted to find the record for the taxon + Sauroposeidon, + we had to formulate a complex XPath + /Zthes/termName + which embodies the knowledge that taxon names are specified in a + <termName> element inside the top-level + <Zthes> element. + + + This is bad not just because it requires a lot of typing, but more + significantly because it ties searching semantics to the physical + structure of the searched records. You can't use the same search + specification to search two databases if their internal + representations are different. Consider a different taxonomy + database in which the records have taxon names specified + inside a <name> element nested within a + <identification> element + inside a top-level <taxon> element: then + you'd need to search for them using + 1=/taxon/identification/name + + + How, then, can we build broadcasting Information Retrieval + applications that look for records in many different databases? + The &acro.z3950; protocol offers a powerful and general solution to this: + abstract ``access points''. In the &acro.z3950; model, an access point + is simply a point at which searches can be directed. Nothing is + said about implementation: in a given database, an access point + might be implemented as an index, a path into physical records, an + algorithm for interrogating relational tables or whatever works. + The only important thing is that the semantics of an access + point is fixed and well defined. + + + For convenience, access points are gathered into attribute + sets. 
For example, the &acro.bib1; attribute set is supposed to + contain bibliographic access points such as author, title, subject + and ISBN; the GEO attribute set contains access points pertaining + to geospatial information (bounding coordinates, stratum, latitude + resolution, etc.); the CIMI + attribute set contains access points to do with museum collections + (provenance, inscriptions, etc.) + + + In practice, the &acro.bib1; attribute set has tended to be a dumping + ground for all sorts of access points, so that, for example, it + includes some geospatial access points as well as strictly + bibliographic ones. Nevertheless, this model + allows a layer of abstraction over the physical representation of + records in databases. + + + In the &acro.bib1; attribute set, a taxon name is probably best + interpreted as a title - that is, a phrase that identifies the item + in question. &acro.bib1; represents title searches by + access point 4. (See + The &acro.bib1; Attribute + Set Semantics) + So we need to configure our dinosaur database so that searches for + &acro.bib1; access point 4 look in the + <termName> element, + inside the top-level + <Zthes> element. - We need to create an Abstract Syntax - file named after the document element of the records we're + This is a two-step process. First, we need to tell &zebra; that we + want to support the &acro.bib1; attribute set. Then we need to tell it + which elements of its record pertain to access point 4. 
+ + + We need to create an Abstract Syntax + file named after the document element of the records we're working with, plus a .abs suffix - in this case, Zthes.abs - as follows: @@ -244,23 +244,23 @@ - -attset zthes.att -attset bib1.att -xpath enable -systag sysno none - -xelm /Zthes/termId termId:w -xelm /Zthes/termName termName:w,title:w -xelm /Zthes/termQualifier termQualifier:w -xelm /Zthes/termType termType:w -xelm /Zthes/termLanguage termLanguage:w -xelm /Zthes/termNote termNote:w -xelm /Zthes/termCreatedDate termCreatedDate:w -xelm /Zthes/termCreatedBy termCreatedBy:w -xelm /Zthes/termModifiedDate termModifiedDate:w -xelm /Zthes/termModifiedBy termModifiedBy:w - + + attset zthes.att + attset bib1.att + xpath enable + systag sysno none + + xelm /Zthes/termId termId:w + xelm /Zthes/termName termName:w,title:w + xelm /Zthes/termQualifier termQualifier:w + xelm /Zthes/termType termType:w + xelm /Zthes/termLanguage termLanguage:w + xelm /Zthes/termNote termNote:w + xelm /Zthes/termCreatedDate termCreatedDate:w + xelm /Zthes/termCreatedBy termCreatedBy:w + xelm /Zthes/termModifiedDate termModifiedDate:w + xelm /Zthes/termModifiedBy termModifiedBy:w + @@ -292,103 +292,103 @@ xelm /Zthes/termModifiedBy termModifiedBy:w After re-indexing, we can search the database using &acro.bib1; attribute, title, as follows: -Z> form xml -Z> f @attr 1=4 Eoraptor -Sent searchRequest. -Received SearchResponse. -Search was a success. -Number of hits: 1, setno 1 -SearchResult-1: Eoraptor(1) -records returned: 0 -Elapsed: 0.106896 -Z> s -Sent presentRequest (1+1). -Records: 1 -[Default]Record type: &acro.xml; -<Zthes> - <termId>2</termId> - <termName>Eoraptor</termName> - <termType>PT</termType> - <termNote>The most basal known dinosaur</termNote> - ... + Z> form xml + Z> f @attr 1=4 Eoraptor + Sent searchRequest. + Received SearchResponse. + Search was a success. 
+ Number of hits: 1, setno 1 + SearchResult-1: Eoraptor(1) + records returned: 0 + Elapsed: 0.106896 + Z> s + Sent presentRequest (1+1). + Records: 1 + [Default]Record type: &acro.xml; + <Zthes> + <termId>2</termId> + <termName>Eoraptor</termName> + <termType>PT</termType> + <termNote>The most basal known dinosaur</termNote> + ... - - - - - - - + + + -->
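Note on the Zthes.abs hunk above: the `xelm` lines are what tie a physical XPath to one or more named indexes, and the extra `title:w` target on `/Zthes/termName` is what lets the `@attr 1=4` search in the session find Eoraptor. The following is a minimal sketch of that mapping, not part of Zebra itself. The helper name `xelm_index_map` and the constant `ZTHES_ABS` are hypothetical and introduced only for illustration, the listing is abbreviated from the hunk, and the parser assumes every `xelm` line has exactly the three whitespace-separated fields shown there.

```python
# Hypothetical helper, not part of Zebra: reads the `xelm` lines of an
# .abs file and reports which XPaths feed which named index.
from collections import defaultdict

# Abbreviated copy of the Zthes.abs content shown in the diff.
ZTHES_ABS = """\
attset zthes.att
attset bib1.att
xpath enable
systag sysno none

xelm /Zthes/termId termId:w
xelm /Zthes/termName termName:w,title:w
xelm /Zthes/termQualifier termQualifier:w
xelm /Zthes/termType termType:w
"""


def xelm_index_map(abs_text):
    """Map each index name to the list of XPaths that populate it."""
    index_to_paths = defaultdict(list)
    for line in abs_text.splitlines():
        fields = line.split()
        # Assumption: only `xelm <xpath> <name:suffix>[,<name:suffix>...]`
        # lines matter here; every other directive is skipped.
        if len(fields) != 3 or fields[0] != "xelm":
            continue
        _, xpath, targets = fields
        for target in targets.split(","):
            name = target.split(":", 1)[0]  # drop the suffix after the colon
            index_to_paths[name].append(xpath)
    return dict(index_to_paths)


if __name__ == "__main__":
    mapping = xelm_index_map(ZTHES_ABS)
    # Shows which element(s) a title search would look in.
    print(mapping["title"])
```

Run against the abbreviated listing, the only path feeding `title` is `/Zthes/termName`, which matches the text's statement that title searches should look in the `<termName>` element inside the top-level `<Zthes>` element. A second database could map `/taxon/identification/name` to `title` in its own .abs file and answer the same query unchanged.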