From d8ba36b7da24857c593393ef680c3983070e1124 Mon Sep 17 00:00:00 2001
From: Mike Taylor
Date: Fri, 8 Nov 2002 01:01:38 +0000
Subject: [PATCH] Fix to describe new Zthes example instead of dinosauricon.

---
 doc/examples.xml | 103 ++++++++++++++++++++++++++----------------------------
 1 file changed, 49 insertions(+), 54 deletions(-)

diff --git a/doc/examples.xml b/doc/examples.xml
index aefb286..0a77798 100644
--- a/doc/examples.xml
+++ b/doc/examples.xml
@@ -1,5 +1,5 @@
-
+
 Example Configurations
@@ -59,16 +59,20 @@
 expressions to specify access points.
-Go to the examples/dinosauricon subdirectory
+Go to the examples/zthes subdirectory
 of the distribution archive.
-There you will find a records subdirectory,
-which contains some raw XML data to be added to the database: in
-this case, as single file, genera.xml,
-which contain information about all the known dinosaur genera as of
-August 2002.
+There you will find a Makefile that will
+populate the records subdirectory with a file of
+Zthes
+records representing a taxonomic hierarchy of dinosaurs. (The
+records are generated from the family tree in the file
+dino.tree.)
+Type make records/dino.xml
+to make the XML data file.
-Now we need to create the Zebra database, which we do with the
+Now we need to create a Zebra database to hold and index the XML
+records. We do this with the
 Zebra indexer, zebraidx, which is driven by the zebra.cfg configuration file. For our purposes, we don't need any
@@ -106,31 +110,27 @@
 $ yaz-client tcp:@:9999
 Connecting...Ok.
-Z> find @attr 1=/GENUS/SPECIES/AUTHOR/@name Wedel
+Z> find @attr 1=/Zthes/termName Sauroposeidon
 Number of hits: 1
 Z> format xml
 Z> show 1
-<GENUS name="Sauroposeidon" type="with">
-  <MEANING>lizard Poseidon <LOW>(Greek god of, among other things, earthquakes)</LOW></MEANING>
-  <SPECIES name="proteles">
-    <AUTHOR type="vide" name="Franklin" year="2000"></AUTHOR>
-    <AUTHOR name="Wedel, Cifelli, Sanders"></AUTHOR>
-  </SPECIES>
-  <PLACE name="Oklahoma"></PLACE>
-  <TIME value="Albian"></TIME>
-  <LENGTH value="30" q="1"></LENGTH>
-  <REMAINS content="rib, cervical vertebrae"></REMAINS>
-  <ESSAY>
-    <P> This new <NOMEN name="Brachiosaurus"></NOMEN>-like <LINK content="dinosaur"></LINK>
-    was perhaps the tallest. With its head raised, it stood 60 feet (nearly
-    20 m) tall. </P>
-  </ESSAY>
+<Zthes>
+  <termId>22</termId>
+  <termName>Sauroposeidon</termName>
+  <termType>PT</termType>
+  <relation>
+    <relationType>BT</relationType>
+    <termId>21</termId>
+    <termName>Brachiosauridae</termName>
+    <termType>PT</termType>
+  </relation>
+  <idzebra xmlns="http://www.indexdata.dk/zebra/">
-    <size>593</size>
-    <localnumber>891</localnumber>
-    <filename>records/genera.xml</filename>
-  </idzebra>
-</GENUS>
+    <size>245</size>
+    <localnumber>23</localnumber>
+    <filename>records/dino.xml</filename>
+  </idzebra>
+</Zthes>
@@ -145,30 +145,26 @@
 The problem with the previous example is that you need to know the structure of the documents in order to find them. For example,
-when we wanted to know the genera for which Matt Wedel is an
-author
-(Sauroposeidon proteles),
+when we wanted to find the record for the taxon
+Sauroposeidon,
 we had to formulate a complex XPath
-1=/GENUS/SPECIES/AUTHOR/@name
-which embodies the knowledge that author names are specified in the
-name attribute of the
-<AUTHOR> element,
-which is inside the
-<SPECIES> element,
-which in turn is inside the top-level
-<GENUS> element.
+/Zthes/termName
+which embodies the knowledge that taxon names are specified in a
+<termName> element inside the top-level
+<Zthes> element.
 This is bad not just because it requires a lot of typing, but more significantly because it ties searching semantics to the physical structure of the searched records. You can't use the same search specification to search two databases if their internal
-representations are different. Consider an alternative dinosaur
-database in which the records have author names specified
-inside an <authorName> element directly
+representations are different. Consider an alternative taxonomy
+database in which the records have taxon names specified
+inside a <name> element nested within an
+<identification> element
 inside a top-level <taxon> element: then you'd need to search for them using
-1=/taxon/authorName
+1=/taxon/identification/name
 How, then, can we build broadcasting Information Retrieval
@@ -200,28 +196,27 @@
 records in databases.
-In the BIB-1 attribute set, an author search is represented by
-access point 1003. (See
+In the BIB-1 attribute set, a taxon name is probably best
+interpreted as a title - that is, a phrase that identifies the item
+in question. BIB-1 represents title searches by
+access point 4. (See
 ) So we need to configure our dinosaur database so that searches for
-BIB-1 access point 1003 look the
-name attribute of the
-<AUTHOR> element,
-inside the
-<SPECIES> element,
+BIB-1 access point 4 look in the
+<termName> element,
 inside the top-level
-<GENUS> element.
+<Zthes> element.
 This is a two-step process. First, we need to tell Zebra that we want to support the BIB-1 attribute set. Then we need to tell it
-which elements of its record pertain to access point 1003.
+which elements of its record pertain to access point 4.
 We need to create an Abstract Syntax file named after the document element of the records we're working with, plus a .abs suffix - in this case,
-GENUS.abs - as follows:
+Zthes.abs - as follows:
-- 
1.7.10.4
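The point of the change, that one abstract access point (BIB-1 use attribute 4, title) can be resolved against differently structured records, can be sketched in Python. This is a minimal illustration under stated assumptions, not Zebra's implementation: ACCESS_POINTS is a hypothetical stand-in for the per-schema .abs files, matches() is a hypothetical helper, the <taxon> record is the imagined alternative layout from the text, and matching is a plain string comparison rather than Zebra indexing.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Zthes record shown in the yaz-client session.
ZTHES_RECORD = """<Zthes>
  <termId>22</termId>
  <termName>Sauroposeidon</termName>
  <termType>PT</termType>
  <relation>
    <relationType>BT</relationType>
    <termId>21</termId>
    <termName>Brachiosauridae</termName>
    <termType>PT</termType>
  </relation>
</Zthes>"""

# Hypothetical record in the alternative <taxon> layout described in the text.
TAXON_RECORD = """<taxon>
  <identification><name>Sauroposeidon</name></identification>
</taxon>"""

# Hypothetical stand-in for per-schema .abs files: maps the document
# element name to {BIB-1 use attribute: path relative to that element}.
ACCESS_POINTS = {
    "Zthes": {4: "termName"},
    "taxon": {4: "identification/name"},
}


def matches(record_xml: str, use_attr: int, term: str) -> bool:
    """True if the element that use_attr maps to in this schema equals term."""
    root = ET.fromstring(record_xml)
    path = ACCESS_POINTS.get(root.tag, {}).get(use_attr)
    if path is None:
        return False  # this schema does not support the access point
    # findall() with a relative path looks only at the mapped location,
    # so the broader term under <relation>/<termName> is not matched.
    return any((el.text or "").strip() == term for el in root.findall(path))


if __name__ == "__main__":
    # One abstract query, "use attribute 4 = Sauroposeidon", for both layouts;
    # prints True once for each record.
    for record in (ZTHES_RECORD, TAXON_RECORD):
        print(matches(record, 4, "Sauroposeidon"))
```

The client only ever says "title = Sauroposeidon"; knowledge of where titles live stays with each database's mapping, which is the role the Zthes.abs file plays in Zebra.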