X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fexamples.xml;h=f8f9a203ef119a85280f242d1e5672faf4993635;hb=7c3a0352f0492609a3b6b26b63a72b0b2d207aab;hp=0a77798a913bf3d90b77e92203c21e12d1b75044;hpb=d8ba36b7da24857c593393ef680c3983070e1124;p=idzebra-moved-to-github.git diff --git a/doc/examples.xml b/doc/examples.xml index 0a77798..f8f9a20 100644 --- a/doc/examples.xml +++ b/doc/examples.xml @@ -1,5 +1,5 @@ - + Example Configurations @@ -19,23 +19,35 @@ - Where to find subsidiary configuration files, including - default.idx + Where to find subsidiary configuration files, including both + those that are named explicitly and a few ``magic'' files such + as default.idx, which specifies the default indexing rules. - What attribute sets to recognise in searches. + What record schemas to support. (Subsidiary files specify how + to index the contents of records in those schemas, and what + format to use when presenting records in those schemas to client + software.) - Policy details such as what record type to expect, what - low-level indexing algorithm to use, how to identify potential - duplicate records, etc. + What attribute sets to recognise in searches. (Subsidiary files + specify how to interpret the attributes in terms + of the indexes that are created on the records.) + + + + + + Policy details such as what type of input format to expect when + adding new records, what low-level indexing algorithm to use, + how to identify potential duplicate records, etc. @@ -53,9 +65,9 @@ This example shows how Zebra can be used with absolutely minimal configuration to index a body of - XML + XML documents, and search them using - XPath + XPath expressions to specify access points. @@ -69,6 +81,10 @@ dino.tree.) Type make records/dino.xml to make the XML data file. + (Or you could just type make dino to build the XML + data file, create the database and populate it with the taxonomic + records all in one shot - but then you wouldn't learn anything, + would you?
:-) Now we need to create a Zebra database to hold and index the XML @@ -76,7 +92,7 @@ Zebra indexer, zebraidx, which is driven by the zebra.cfg configuration file. For our purposes, we don't need any - special behaviour - we can use the defaults - so we start with a + special behaviour - we can use the defaults - so we can start with a minimal file that just tells zebraidx where to find the default indexing rules, and how to parse the records: @@ -108,7 +124,7 @@ XPath-based boolean queries and fetch the XML records that satisfy them: - $ yaz-client tcp:@:9999 + $ yaz-client @:9999 Connecting...Ok. Z> find @attr 1=/Zthes/termName Sauroposeidon Number of hits: 1 @@ -118,6 +134,7 @@ <termId>22</termId> <termName>Sauroposeidon</termName> <termType>PT</termType> + <termNote>The tallest known dinosaur (18m)</termNote> <relation> <relationType>BT</relationType> <termId>21</termId> @@ -126,7 +143,7 @@ </relation> <idzebra xmlns="http://www.indexdata.dk/zebra/"> - <size>245</size> + <size>300</size> <localnumber>23</localnumber> <filename>records/dino.xml</filename> </idzebra> @@ -134,7 +151,7 @@ - Now wasn't that easy? + Now wasn't that nice and easy? @@ -158,7 +175,7 @@ significantly because it ties searching semantics to the physical structure of the searched records. You can't use the same search specification to search two databases if their internal - representations are different. Consider an alternative taxonomy + representations are different. Consider a different taxonomy database in which the records have taxon names specified inside a <name> element nested within a <identification> element @@ -175,15 +192,16 @@ said about implementation: in a given database, an access point might be implemented as an index, a path into physical records, an algorithm for interrogating relational tables or whatever works. - The key point is that the semantics of an access point are fixed - and well defined.
+ The only important point is that the semantics of an access + point are fixed and well defined. For convenience, access points are gathered into attribute sets. For example, the BIB-1 attribute set is supposed to contain bibliographic access points such as author, title, subject and ISBN; the GEO attribute set contains access points pertaining - to geospatial information (bounding box, ###, etc.); the CIMI + to geospatial information (bounding coordinates, stratum, latitude + resolution, etc.); the CIMI attribute set contains access points to do with museum collections (provenance, inscriptions, etc.) @@ -191,7 +209,7 @@ In practice, the BIB-1 attribute set has tended to be a dumping ground for all sorts of access points, so that, for example, it includes some geospatial access points as well as strictly - bibliographic ones. Nevertheless, the key point is that this model + bibliographic ones. Nevertheless, this model allows a layer of abstraction over the physical representation of records in databases. @@ -200,7 +218,8 @@ interpreted as a title - that is, a phrase that identifies the item in question. BIB-1 represents title searches by access point 4. (See - ) + The BIB-1 Attribute Set Semantics) So we need to configure our dinosaur database so that searches for BIB-1 access point 4 look in the <termName> element, @@ -211,24 +230,89 @@ This is a two-step process. First, we need to tell Zebra that we want to support the BIB-1 attribute set. Then we need to tell it which elements of its record pertain to access point 4.
- - + + We need to create an Abstract Syntax file named after the document element of the records we're - working with, plus a .abs suffix - in this case, - Zthes.abs - as follows: - - - - - - - - - - - - + working with, plus a .abs suffix - in this case, + Zthes.abs - as follows: + + + + + + + + + +attset zthes.att +attset bib1.att +xpath enable +systag sysno none + +xelm /Zthes/termId termId:w +xelm /Zthes/termName termName:w,title:w +xelm /Zthes/termQualifier termQualifier:w +xelm /Zthes/termType termType:w +xelm /Zthes/termLanguage termLanguage:w +xelm /Zthes/termNote termNote:w +xelm /Zthes/termCreatedDate termCreatedDate:w +xelm /Zthes/termCreatedBy termCreatedBy:w +xelm /Zthes/termModifiedDate termModifiedDate:w +xelm /Zthes/termModifiedBy termModifiedBy:w + + + + + Declare Thesaurus attribute set. See zthes.att. + + + + + Declare Bib-1 attribute set. See bib1.att in + Zebra's tab directory. + + + + + This xelm directive selects the contents of nodes matching the XPath expression + /Zthes/termId. The contents (CDATA) will be + word searchable by Zthes attribute termId (value 1001). + + + + + Make termName word searchable by both + Zthes attribute termName (1002) and Bib-1 attribute title (4). + + + + + + After re-indexing, we can search the database using the Bib-1 + attribute title, as follows: + +Z> form xml +Z> f @attr 1=4 Eoraptor +Sent searchRequest. +Received SearchResponse. +Search was a success. +Number of hits: 1, setno 1 +SearchResult-1: Eoraptor(1) +records returned: 0 +Elapsed: 0.106896 +Z> s +Sent presentRequest (1+1). +Records: 1 +[Default]Record type: XML +<Zthes> + <termId>2</termId> + <termName>Eoraptor</termName> + <termType>PT</termType> + <termNote>The most basal known dinosaur</termNote> + ... + + +
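To make the access-point idea concrete, here is a toy Python model of what the xelm directives in Zthes.abs ask for. It is not Zebra code and does not reflect how Zebra actually stores its indexes. Each matched element's text is split into words and filed under every access point named on the right-hand side of the directive, so a BIB-1 title search (use attribute 4) and a Zthes termName search (1002) both reach the same <termName> content. The helper names (parse_xelm, build_index, search), the record ids, and the reading of the :w suffix as plain word indexing are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical excerpt: four of the xelm directives from Zthes.abs above.
ABS_TEXT = """
xelm /Zthes/termId termId:w
xelm /Zthes/termName termName:w,title:w
xelm /Zthes/termType termType:w
xelm /Zthes/termNote termNote:w
"""

# Use-attribute (type 1) values named in the text; others are left out
# rather than guessed.
USE_VALUES = {4: "title", 1001: "termId", 1002: "termName"}


def parse_xelm(abs_text):
    """Return (xpath, [access point names]) pairs from xelm lines.

    Only the access point name before ':' is kept; the ':w' suffix is
    assumed to mean word indexing and is not modelled further.
    """
    rules = []
    for line in abs_text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "xelm":
            continue
        xpath, targets = parts[1], parts[2]
        names = [t.split(":")[0] for t in targets.split(",")]
        rules.append((xpath, names))
    return rules


def build_index(records, rules):
    """Map (access point name, lowercased word) to a set of record ids."""
    index = defaultdict(set)
    for rec_id, xml_text in records.items():
        root = ET.fromstring(xml_text)
        for xpath, names in rules:
            # '/Zthes/termName' applies only when the root element is
            # <Zthes>; ElementTree paths are relative to that root.
            steps = xpath.strip("/").split("/")
            if steps[0] != root.tag:
                continue
            relative = "/".join(steps[1:]) or "."
            for elem in root.findall(relative):
                for word in re.findall(r"\w+", elem.text or ""):
                    for name in names:
                        index[(name, word.lower())].add(rec_id)
    return index


def search(index, use_value, word):
    """Toy analogue of 'find @attr 1=<use_value> <word>' for one word."""
    name = USE_VALUES[use_value]  # KeyError for values not listed above
    return sorted(index.get((name, word.lower()), set()))


# Two cut-down records based on the sample output in the text;
# the dictionary keys are made-up ids.
RECORDS = {
    "eoraptor": "<Zthes><termId>2</termId><termName>Eoraptor</termName>"
                "<termType>PT</termType>"
                "<termNote>The most basal known dinosaur</termNote></Zthes>",
    "sauroposeidon": "<Zthes><termId>22</termId>"
                     "<termName>Sauroposeidon</termName>"
                     "<termType>PT</termType>"
                     "<termNote>The tallest known dinosaur (18m)</termNote>"
                     "</Zthes>",
}

INDEX = build_index(RECORDS, parse_xelm(ABS_TEXT))

if __name__ == "__main__":
    print(search(INDEX, 4, "Eoraptor"))     # ['eoraptor'] via title
    print(search(INDEX, 1002, "eoraptor"))  # ['eoraptor'] via termName
    print(search(INDEX, 4, "dinosaur"))     # [] since termNote is not title
```

Listing termName under both termName and title is what gives the abstraction: a client that only speaks BIB-1 can find Eoraptor without knowing anything about the Zthes record layout, while text in other elements such as termNote stays out of the title index.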