- <idzebra xmlns="http://www.indexdata.dk/zebra/">
- <size>245</size>
- <localnumber>23</localnumber>
- <filename>records/dino.xml</filename>
- </idzebra>
- </Zthes>
- </screen>
- </para>
- <para>
- Now wasn't that easy?
- </para>
- </sect1>
-
-
- <sect1 id="example2">
- <title>Example 2: Supporting Interoperable Searches</title>
-
- <para>
- The problem with the previous example is that you need to know the
- structure of the documents in order to find them. For example,
- when we wanted to find the record for the taxon
- <foreignphrase role="taxon">Sauroposeidon</foreignphrase>,
- we had to formulate a complex XPath
- <literal>/Zthes/termName</literal>
- which embodies the knowledge that taxon names are specified in a
- <literal><termName></literal> element inside the top-level
- <literal><Zthes></literal> element.
- </para>
- <para>
- This is bad not just because it requires a lot of typing, but more
- significantly because it ties searching semantics to the physical
- structure of the searched records. You can't use the same search
- specification to search two databases if their internal
- representations are different. Consider an alternative taxonomy
- database in which the records have taxon names specified
- inside a <literal><name></literal> element nested within a
- <literal><identification></literal> element
- inside a top-level <literal><taxon></literal> element: then
- you'd need to search for them using
- <literal>1=/taxon/identification/name</literal>
- </para>
- <para>
- How, then, can we build broadcasting Information Retrieval
- applications that look for records in many different databases?
- The Z39.50 protocol offers a powerful and general solution to this:
- abstract ``access points''. In the Z39.50 model, an access point
- is simply a point at which searches can be directed. Nothing is
- said about implementation: in a given database, an access point
- might be implemented as an index, a path into physical records, an
- algorithm for interrogating relational tables or whatever works.
- The key point is that the semantics of an access point are fixed
- and well defined.
- </para>
- <para>
- For convenience, access points are gathered into <firstterm>attribute
- sets</firstterm>. For example, the BIB-1 attribute set is supposed to
- contain bibliographic access points such as author, title, subject
- and ISBN; the GEO attribute set contains access points pertaining
- to geospatial information (bounding coordinates, stratum, latitude
- resolution, etc.); the CIMI
- attribute set contains access points to do with museum collections
- (provenance, inscriptions, etc.)
- </para>
- <para>
- In practice, the BIB-1 attribute set has tended to be a dumping
- ground for all sorts of access points, so that, for example, it
- includes some geospatial access points as well as strictly
- bibliographic ones. Nevertheless, the key point is that this model
- allows a layer of abstraction over the physical representation of
- records in databases.
- </para>
- <para>
- In the BIB-1 attribute set, a taxon name is probably best
- interpreted as a title - that is, a phrase that identifies the item
- in question. BIB-1 represents title searches by
- access point 4. (See
- <ulink url="ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt"
- >The BIB-1 Attribute Set Semantics</ulink>)
- So we need to configure our dinosaur database so that searches for
- BIB-1 access point 4 look in the
- <literal><termName></literal> element,
- inside the top-level
- <literal><Zthes></literal> element.
- </para>
- <para>
- This is a two-step process. First, we need to tell Zebra that we
- want to support the BIB-1 attribute set. Then we need to tell it
- which elements of its record pertain to access point 4.
- </para>
- <para>
- We need to create an <link linkend="abs-file">Abstract Syntax
- file</link> named after the document element of the records we're
- working with, plus a <literal>.abs</literal> suffix - in this case,
- <literal>Zthes.abs</literal> - as follows:
- </para>
- <itemizedlist>
- <listitem>
- <para>
-
- </para>
- </listitem>
- <listitem>
- <para>
- </para>
- </listitem>
- </itemizedlist>
- </sect1>
-</chapter>
-
-
-<!--
- The simplest hello-world example could go like this:
-
- Index the document
-
- <book>
- <title>The art of motorcycle maintenance</title>
- <subject scheme="Dewey">zen</subject>
+ &lt;idzebra xmlns="http://www.indexdata.dk/zebra/"&gt;
+ &lt;size&gt;300&lt;/size&gt;
+ &lt;localnumber&gt;23&lt;/localnumber&gt;
+ &lt;filename&gt;records/dino.xml&lt;/filename&gt;
+ &lt;/idzebra&gt;
+ &lt;/Zthes&gt;
+ </screen>
+ </para>
+ <para>
+ Now wasn't that nice and easy?
+ </para>
+ </sect1>
+
+
+ <sect1 id="example2">
+ <title>Example 2: Supporting Interoperable Searches</title>
+
+ <para>
+ The problem with the previous example is that you need to know the
+ structure of the documents in order to find them. For example,
+ when we wanted to find the record for the taxon
+ <foreignphrase role="taxon">Sauroposeidon</foreignphrase>,
+ we had to formulate a complex XPath expression,
+ <literal>/Zthes/termName</literal>,
+ which embodies the knowledge that taxon names are specified in a
+ <literal>&lt;termName&gt;</literal> element inside the top-level
+ <literal>&lt;Zthes&gt;</literal> element.
+ </para>
+ <para>
+ This is bad not just because it requires a lot of typing, but more
+ significantly because it ties searching semantics to the physical
+ structure of the searched records. You can't use the same search
+ specification to search two databases if their internal
+ representations are different. Consider a different taxonomy
+ database in which the records have taxon names specified
+ inside a <literal>&lt;name&gt;</literal> element nested within an
+ <literal>&lt;identification&gt;</literal> element
+ inside a top-level <literal>&lt;taxon&gt;</literal> element: then
+ you'd need to search for them using
+ <literal>1=/taxon/identification/name</literal>.
+ </para>
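+ <para>
+ As an illustrative sketch (the second database is hypothetical, and
+ the query syntax is assumed to follow the same pattern as the
+ client session shown later in this section), the same taxon would
+ have to be searched with two different, structure-bound queries:
+ <screen>
+ Z> f @attr 1=/Zthes/termName Sauroposeidon
+ Z> f @attr 1=/taxon/identification/name Sauroposeidon
+ </screen>
+ Neither query can be reused against the other database, which is
+ exactly the coupling we want to avoid.
+ </para>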
+ <para>
+ How, then, can we build broadcasting Information Retrieval
+ applications that look for records in many different databases?
+ The &acro.z3950; protocol offers a powerful and general solution to this:
+ abstract <quote>access points</quote>. In the &acro.z3950; model, an access point
+ is simply a point at which searches can be directed. Nothing is
+ said about implementation: in a given database, an access point
+ might be implemented as an index, a path into physical records, an
+ algorithm for interrogating relational tables or whatever works.
+ The only important thing is that the semantics of an access
+ point are fixed and well defined.
+ </para>
+ <para>
+ For convenience, access points are gathered into <firstterm>attribute
+ sets</firstterm>. For example, the &acro.bib1; attribute set is supposed to
+ contain bibliographic access points such as author, title, subject
+ and ISBN; the GEO attribute set contains access points pertaining
+ to geospatial information (bounding coordinates, stratum, latitude
+ resolution, etc.); the CIMI
+ attribute set contains access points to do with museum collections
+ (provenance, inscriptions, etc.).
+ </para>
+ <para>
+ In practice, the &acro.bib1; attribute set has tended to be a dumping
+ ground for all sorts of access points, so that, for example, it
+ includes some geospatial access points as well as strictly
+ bibliographic ones. Nevertheless, this model
+ allows a layer of abstraction over the physical representation of
+ records in databases.
+ </para>
+ <para>
+ In the &acro.bib1; attribute set, a taxon name is probably best
+ interpreted as a title - that is, a phrase that identifies the item
+ in question. &acro.bib1; represents title searches by
+ access point 4. (See
+ <ulink url="&url.z39.50.bib1.semantics;">The &acro.bib1; Attribute
+ Set Semantics</ulink>.)
+ So we need to configure our dinosaur database so that searches for
+ &acro.bib1; access point 4 look in the
+ <literal>&lt;termName&gt;</literal> element,
+ inside the top-level
+ <literal>&lt;Zthes&gt;</literal> element.
+ </para>
+ <para>
+ This is a two-step process. First, we need to tell &zebra; that we
+ want to support the &acro.bib1; attribute set. Then we need to tell it
+ which elements of its records pertain to access point 4.
+ </para>
+ <para>
+ We need to create an <link linkend="abs-file">Abstract Syntax
+ file</link> named after the document element of the records we're
+ working with, plus a <literal>.abs</literal> suffix - in this case,
+ <literal>Zthes.abs</literal> - as follows:
+ </para>
+ <programlistingco>
+ <areaspec>
+ <area id="attset.zthes" coords="2"/>
+ <area id="attset.attset" coords="3"/>
+ <area id="termId" coords="7"/>
+ <area id="termName" coords="8"/>
+ </areaspec>
+ <programlisting>
+ attset zthes.att
+ attset bib1.att
+ xpath enable
+ systag sysno none
+
+ xelm /Zthes/termId termId:w
+ xelm /Zthes/termName termName:w,title:w
+ xelm /Zthes/termQualifier termQualifier:w
+ xelm /Zthes/termType termType:w
+ xelm /Zthes/termLanguage termLanguage:w
+ xelm /Zthes/termNote termNote:w
+ xelm /Zthes/termCreatedDate termCreatedDate:w
+ xelm /Zthes/termCreatedBy termCreatedBy:w
+ xelm /Zthes/termModifiedDate termModifiedDate:w
+ xelm /Zthes/termModifiedBy termModifiedBy:w
+ </programlisting>
+ <calloutlist>
+ <callout arearefs="attset.zthes">
+ <para>
+ Declare the Zthes thesaurus attribute set. See <filename>zthes.att</filename>.
+ </para>
+ </callout>
+ <callout arearefs="attset.attset">
+ <para>
+ Declare &acro.bib1; attribute set. See <filename>bib1.att</filename> in
+ &zebra;'s <filename>tab</filename> directory.
+ </para>
+ </callout>
+ <callout arearefs="termId">
+ <para>
+ This <literal>xelm</literal> directive selects the contents of nodes
+ matching the XPath expression <literal>/Zthes/termId</literal>. The
+ contents (character data) will be word searchable through the Zthes
+ attribute termId (value 1001).
+ </para>
+ </callout>
+ <callout arearefs="termName">
+ <para>
+ Make <literal>termName</literal> word searchable through both the
+ Zthes attribute termName (1002) and the &acro.bib1; attribute title (4).
+ </para>
+ </callout>
+ </calloutlist>
+ </programlistingco>
+ <para>
+ After re-indexing, we can search the database using the &acro.bib1;
+ title attribute (access point 4), as follows:
+ <screen>
+ Z> form xml
+ Z> f @attr 1=4 Eoraptor
+ Sent searchRequest.
+ Received SearchResponse.
+ Search was a success.
+ Number of hits: 1, setno 1
+ SearchResult-1: Eoraptor(1)
+ records returned: 0
+ Elapsed: 0.106896
+ Z> s
+ Sent presentRequest (1+1).
+ Records: 1
+ [Default]Record type: &acro.xml;
+ &lt;Zthes&gt;
+ &lt;termId&gt;2&lt;/termId&gt;
+ &lt;termName&gt;Eoraptor&lt;/termName&gt;
+ &lt;termType&gt;PT&lt;/termType&gt;
+ &lt;termNote&gt;The most basal known dinosaur&lt;/termNote&gt;
+ ...
+ </screen>
+ </para>
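+ <para>
+ To make the interoperability point concrete, here is a minimal
+ sketch of what an Abstract Syntax file for the alternative taxonomy
+ database described earlier might look like. The file name
+ <literal>taxon.abs</literal> and its contents are assumptions made
+ for illustration: the sketch reuses only directives that already
+ appear in <literal>Zthes.abs</literal>, and it has not been tested
+ against a real database.
+ <programlisting>
+ attset bib1.att
+ xpath enable
+ systag sysno none
+
+ xelm /taxon/identification/name title:w
+ </programlisting>
+ With a mapping along these lines in place, the query
+ <literal>@attr 1=4 Eoraptor</literal> should address the taxon name
+ in either database, without the client knowing anything about the
+ structure of the records.
+ </para>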
+ </sect1>
+ </chapter>
+
+
+ <!--
+ The simplest hello-world example could go like this:
+
+ Index the document
+
+ <book>
+ <title>The art of motorcycle maintenance</title>
+ <subject scheme="Dewey">zen</subject>