- <chapter id="record-model-alvisxslt">
- <!-- $Id: recordmodel-alvisxslt.xml,v 1.6 2006-04-24 12:53:03 marc Exp $ -->
- <title>ALVIS XML Record Model and Filter Module</title>
-
+<chapter id="record-model-alvisxslt">
+ <title>ALVIS &acro.xml; Record Model and Filter Module</title>
+
+ <warning>
+ <para>
+ The functionality of this record model has been improved and
+ replaced by the DOM &acro.xml; record model, see
+ <xref linkend="record-model-domxml"/>. The Alvis &acro.xml; record
+ model is considered obsolete, and will eventually be removed
+ from future releases of the &zebra; software.
+ </para>
+ </warning>
<para>
The record model described in this chapter applies to the fundamental,
- structured XML
+ structured &acro.xml;
record type <literal>alvis</literal>, introduced in
- <xref linkend="componentmodulesalvis"/>. The ALVIS XML record model
- is experimental, and it's inner workings might change in future
- releases of the Zebra Information Server.
+ <xref linkend="componentmodulesalvis"/>.
</para>
<para> This filter has been developed under the
</para>
- <sect1 id="record-model-alvisxslt-filter">
+ <section id="record-model-alvisxslt-filter">
<title>ALVIS Record Filter</title>
<para>
- The experimental, loadable Alvis XML/XSLT filter module
+ The experimental, loadable Alvis &acro.xml;/&acro.xslt; filter module
<literal>mod-alvis.so</literal> is packaged in the GNU/Debian package
<literal>libidzebra1.4-mod-alvis</literal>.
It is invoked by the <filename>zebra.cfg</filename> configuration statement
</screen>
In this example on all data files with suffix
<filename>*.xml</filename>, where the
- Alvis XSLT filter configuration file is found in the
+ Alvis &acro.xslt; filter configuration file is found in the
path <filename>db/filter_alvis_conf.xml</filename>.
</para>
- <para>The Alvis XSLT filter configuration file must be
- valid XML. It might look like this (This example is
- used for indexing and display of OAI harvested records):
+ <para>The Alvis &acro.xslt; filter configuration file must be
+ valid &acro.xml;. It might look like this (This example is
+ used for indexing and display of &acro.oai; harvested records):
<screen>
<?xml version="1.0" encoding="UTF-8"?>
<schemaInfo>
<schema name="index" identifier="http://indexdata.dk/zebra/xslt/1"
stylesheet="xsl/oai2index.xsl" />
<schema name="dc" stylesheet="xsl/oai2dc.xsl" />
- <!-- use split level 2 when indexing whole OAI Record lists -->
+ <!-- use split level 2 when indexing whole &acro.oai; Record lists -->
<split level="2"/>
</schemaInfo>
</screen>
names defined in the <literal>name</literal> attributes must be
unique, these are the literal <literal>schema</literal> or
<literal>element set</literal> names used in
- <ulink url="http://www.loc.gov/standards/sru/srw/">SRW</ulink>,
- <ulink url="http://www.loc.gov/standards/sru/">SRU</ulink> and
- Z39.50 protocol queries.
+ <ulink url="http://www.loc.gov/standards/sru/srw/">&acro.srw;</ulink>,
+ <ulink url="&url.sru;">&acro.sru;</ulink> and
+ &acro.z3950; protocol queries.
The paths in the <literal>stylesheet</literal> attributes
are relative to zebras working directory, or absolute to file
system root.
</para>
<para>
The <literal><split level="2"/></literal> decides where the
- XML Reader shall split the
+ &acro.xml; Reader shall split the
collections of records into individual records, which then are
- loaded into DOM, and have the indexing XSLT stylesheet applied.
+ loaded into &acro.dom;, and have the indexing &acro.xslt; stylesheet applied.
</para>
<para>
- There must be exactly one indexing XSLT stylesheet, which is
+ There must be exactly one indexing &acro.xslt; stylesheet, which is
defined by the magic attribute
<literal>identifier="http://indexdata.dk/zebra/xslt/1"</literal>.
</para>
- <sect2 id="record-model-alvisxslt-internal">
+ <section id="record-model-alvisxslt-internal">
<title>ALVIS Internal Record Representation</title>
- <para>When indexing, an XML Reader is invoked to split the input
- files into suitable record XML pieces. Each record piece is then
- transformed to an XML DOM structure, which is essentially the
- record model. Only XSLT transformations can be applied during
+ <para>When indexing, an &acro.xml; Reader is invoked to split the input
+ files into suitable record &acro.xml; pieces. Each record piece is then
+ transformed to an &acro.xml; &acro.dom; structure, which is essentially the
+ record model. Only &acro.xslt; transformations can be applied during
index, search and retrieval. Consequently, output formats are
- restricted to whatever XSLT can deliver from the record XML
- structure, be it other XML formats, HTML, or plain text. In case
- you have <literal>libxslt1</literal> running with EXSLT support,
+ restricted to whatever &acro.xslt; can deliver from the record &acro.xml;
+ structure, be it other &acro.xml; formats, HTML, or plain text. In case
+ you have <literal>libxslt1</literal> running with E&acro.xslt; support,
you can use this functionality inside the Alvis
- filter configuration XSLT stylesheets.
+ filter configuration &acro.xslt; stylesheets.
</para>
- </sect2>
+ </section>
- <sect2 id="record-model-alvisxslt-canonical">
+ <section id="record-model-alvisxslt-canonical">
<title>ALVIS Canonical Indexing Format</title>
- <para>The output of the indexing XSLT stylesheets must contain
+ <para>The output of the indexing &acro.xslt; stylesheets must contain
certain elements in the magic
<literal>xmlns:z="http://indexdata.dk/zebra/xslt/1"</literal>
- namespace. The output of the XSLT indexing transformation is then
- parsed using DOM methods, and the contained instructions are
+ namespace. The output of the &acro.xslt; indexing transformation is then
+ parsed using &acro.dom; methods, and the contained instructions are
performed on the <emphasis>magic elements and their
subtrees</emphasis>.
</para>
<?xml version="1.0" encoding="UTF-8"?>
<z:record xmlns:z="http://indexdata.dk/zebra/xslt/1"
z:id="oai:JTRS:CP-3290---Volume-I"
- z:rank="47896"
- z:type="update">
- <z:index name="oai:identifier" type="0">
+ z:rank="47896">
+ <z:index name="oai_identifier" type="0">
oai:JTRS:CP-3290---Volume-I</z:index>
- <z:index name="oai:datestamp" type="0">2004-07-09</z:index>
- <z:index name="oai:setspec" type="0">jtrs</z:index>
- <z:index name="dc:all" type="w">
- <z:index name="dc:title" type="w">Proceedings of the 4th
+ <z:index name="oai_datestamp" type="0">2004-07-09</z:index>
+ <z:index name="oai_setspec" type="0">jtrs</z:index>
+ <z:index name="dc_all" type="w">
+ <z:index name="dc_title" type="w">Proceedings of the 4th
International Conference and Exhibition:
World Congress on Superconductivity - Volume I</z:index>
- <z:index name="dc:creator" type="w">Kumar Krishen and *Calvin
+ <z:index name="dc_creator" type="w">Kumar Krishen and *Calvin
Burnham, Editors</z:index>
</z:index>
</z:record>
</screen>
</para>
- <para>This means the following: From the original XML file
- <literal>one-record.xml</literal> (or from the XML record DOM of the
- same form coming from a splitted input file), the indexing
- stylesheet produces an indexing XML record, which is defined by
+ <para>This means the following: From the original &acro.xml; file
+ <literal>one-record.xml</literal> (or from the &acro.xml; record &acro.dom; of the
+ same form coming from a split input file), the indexing
+ stylesheet produces an indexing &acro.xml; record, which is defined by
the <literal>record</literal> element in the magic namespace
<literal>xmlns:z="http://indexdata.dk/zebra/xslt/1"</literal>.
- Zebra uses the content of
+ &zebra; uses the content of
<literal>z:id="oai:JTRS:CP-3290---Volume-I"</literal> as internal
record ID, and - in case static ranking is set - the content of
<literal>z:rank="47896"</literal> as static rank. Following the
we see that this records is internally ordered
lexicographically according to the value of the string
<literal>oai:JTRS:CP-3290---Volume-I47896</literal>.
- The type of action performed during indexing is defined by
+ <!-- The type of action performed during indexing is defined by
<literal>z:type="update"></literal>, with recognized values
<literal>insert</literal>, <literal>update</literal>, and
- <literal>delete</literal>.
+ <literal>delete</literal>. -->
</para>
<para>In this example, the following literal indexes are constructed:
<screen>
- oai:identifier
- oai:datestamp
- oai:setspec
- dc:all
- dc:title
- dc:creator
+ oai_identifier
+ oai_datestamp
+ oai_setspec
+ dc_all
+ dc_title
+ dc_creator
</screen>
where the indexing type is defined in the
<literal>type</literal> attribute
file <filename>default.idx</filename> will do). Finally, any
<literal>text()</literal> node content recursively contained
inside the <literal>index</literal> will be filtered through the
- appropriate charmap for character normalization, and will be
+ appropriate char map for character normalization, and will be
inserted in the index.
</para>
<para>
will be inserted using the <literal>w</literal> character
normalization defined in <filename>default.idx</filename> into
the index <literal>dc:creator</literal> (that is, after character
- normalization the index will keep the inidividual words
+ normalization the index will keep the individual words
<literal>kumar</literal>, <literal>krishen</literal>,
<literal>and</literal>, <literal>calvin</literal>,
<literal>burnham</literal>, and <literal>editors</literal>), and
the same character normalization map <literal>w</literal>.
</para>
<para>
- Finally, this example configuration can be queried using PQF
- queries, either transported by Z39.50, (here using a yaz-client)
+ Finally, this example configuration can be queried using &acro.pqf;
+ queries, either transported by &acro.z3950;, (here using a yaz-client)
<screen>
<![CDATA[
Z> open localhost:9999
Z> elem dc
Z> form xml
Z>
- Z> f @attr 1=dc:creator Kumar
- Z> scan @attr 1=dc:creator adam
+ Z> f @attr 1=dc_creator Kumar
+ Z> scan @attr 1=dc_creator adam
Z>
- Z> f @attr 1=dc:title @attr 4=2 "proceeding congress superconductivity"
- Z> scan @attr 1=dc:title abc
+ Z> f @attr 1=dc_title @attr 4=2 "proceeding congress superconductivity"
+ Z> scan @attr 1=dc_title abc
]]>
</screen>
or the proprietary
- extentions <literal>x-pquery</literal> and
+ extensions <literal>x-pquery</literal> and
<literal>x-pScanClause</literal> to
- SRU, and SRW
+ &acro.sru;, and &acro.srw;
<screen>
<![CDATA[
- http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=%40attr+1%3Ddc%3Acreator+%40attr+4%3D6+%22the
- http://localhost:9999/?version=1.1&operation=scan&x-pScanClause=@attr+1=dc:date+@attr+4=2+a
+ http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=%40attr+1%3Ddc_creator+%40attr+4%3D6+%22the
+ http://localhost:9999/?version=1.1&operation=scan&x-pScanClause=@attr+1=dc_date+@attr+4=2+a
]]>
</screen>
- See <xref linkend="server-sru"/> for more information on SRU/SRW
- configuration, and <xref linkend="gfs-config"/> or
- <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql">
- the YAZ manual CQL section</ulink>
- for the details
- of the YAZ frontend server
- <ulink url="http://www.loc.gov/standards/sru/cql/">CQL</ulink>
- configuration.
+ See <xref linkend="zebrasrv-sru"/> for more information on &acro.sru;/&acro.srw;
+ configuration, and <xref linkend="gfs-config"/> or the &yaz;
+ <ulink url="&url.yaz.cql;">&acro.cql; section</ulink>
+ for the details or the &yaz; frontend server.
</para>
<para>
Notice that there are no <filename>*.abs</filename>,
- <filename>*.est</filename>, <filename>*.map</filename>, or other GRS-1
+ <filename>*.est</filename>, <filename>*.map</filename>, or other &acro.grs1;
filter configuration files involves in this process, and that the
literal index names are used during search and retrieval.
</para>
- </sect2>
- </sect1>
+ </section>
+ </section>
- <sect1 id="record-model-alvisxslt-conf">
+ <section id="record-model-alvisxslt-conf">
<title>ALVIS Record Model Configuration</title>
- <sect2 id="record-model-alvisxslt-index">
+ <section id="record-model-alvisxslt-index">
<title>ALVIS Indexing Configuration</title>
<para>
As mentioned above, there can be only one indexing
stylesheet, and configuration of the indexing process is a synonym
- of writing an XSLT stylesheet which produces XML output containing the
+ of writing an &acro.xslt; stylesheet which produces &acro.xml; output containing the
magic elements discussed in
<xref linkend="record-model-alvisxslt-internal"/>.
Obviously, there are million of different ways to accomplish this
task, and some comments and code snippets are in order to lead
- our paduans on the right track to the good side of the force.
+ our Padawan's on the right track to the good side of the force.
</para>
<para>
Stylesheets can be written in the <emphasis>pull</emphasis> or
the <emphasis>push</emphasis> style: <emphasis>pull</emphasis>
- means that the output XML structure is taken as starting point of
- the internal structure of the XSLT stylesheet, and portions of
- the input XML are <emphasis>pulled</emphasis> out and inserted
- into the right spots of the output XML structure. On the other
- side, <emphasis>push</emphasis> XSLT stylesheets are recursavly
+ means that the output &acro.xml; structure is taken as starting point of
+ the internal structure of the &acro.xslt; stylesheet, and portions of
+ the input &acro.xml; are <emphasis>pulled</emphasis> out and inserted
+ into the right spots of the output &acro.xml; structure. On the other
+ side, <emphasis>push</emphasis> &acro.xslt; stylesheets are recursively
calling their template definitions, a process which is commanded
- by the input XML structure, and avake to produce some output XML
- whenever some special conditions in the input styelsheets are
+ by the input &acro.xml; structure, and are triggered to produce some output &acro.xml;
+ whenever some special conditions in the input stylesheets are
met. The <emphasis>pull</emphasis> type is well-suited for input
- XML with strong and well-defined structure and semantcs, like the
- following OAI indexing example, whereas the
+ &acro.xml; with strong and well-defined structure and semantics, like the
+ following &acro.oai; indexing example, whereas the
<emphasis>push</emphasis> type might be the only possible way to
- sort out deeply recursive input XML formats.
+ sort out deeply recursive input &acro.xml; formats.
</para>
<para>
A <emphasis>pull</emphasis> stylesheet example used to index
- OAI harvested records could use some of the following template
+ &acro.oai; harvested records could use some of the following template
definitions:
<screen>
<![CDATA[
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:z="http://indexdata.dk/zebra/xslt/1"
- xmlns:oai="http://www.openarchives.org/OAI/2.0/"
- xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
+ xmlns:oai="http://www.openarchives.org/&acro.oai;/2.0/"
+ xmlns:oai_dc="http://www.openarchives.org/&acro.oai;/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
version="1.0">
<!-- match on oai xml record root -->
<xsl:template match="/">
- <z:record z:id="{normalize-space(oai:record/oai:header/oai:identifier)}"
- z:type="update">
- <!-- you might want to use z:rank="{some XSLT function here}" -->
+ <z:record z:id="{normalize-space(oai:record/oai:header/oai:identifier)}">
+ <!-- you might want to use z:rank="{some &acro.xslt; function here}" -->
<xsl:apply-templates/>
</z:record>
</xsl:template>
- <!-- OAI indexing templates -->
+ <!-- &acro.oai; indexing templates -->
<xsl:template match="oai:record/oai:header/oai:identifier">
- <z:index name="oai:identifier" type="0">
+ <z:index name="oai_identifier" type="0">
<xsl:value-of select="."/>
</z:index>
</xsl:template>
<!-- DC specific indexing templates -->
<xsl:template match="oai:record/oai:metadata/oai_dc:dc/dc:title">
- <z:index name="dc:title" type="w">
+ <z:index name="dc_title" type="w">
<xsl:value-of select="."/>
</z:index>
</xsl:template>
<para>
Notice also,
that the names and types of the indexes can be defined in the
- indexing XSLT stylesheet <emphasis>dynamically according to
- content in the original XML records</emphasis>, which has
- opportunities for great power and wizardery as well as grande
+ indexing &acro.xslt; stylesheet <emphasis>dynamically according to
+ content in the original &acro.xml; records</emphasis>, which has
+ opportunities for great power and wizardry as well as grande
disaster.
</para>
<para>
The following excerpt of a <emphasis>push</emphasis> stylesheet
<emphasis>might</emphasis>
- be a good idea according to your strict control of the XML
- input format (due to rigerours checking against well-defined and
- tight RelaxNG or XML Schema's, for example):
+ be a good idea according to your strict control of the &acro.xml;
+ input format (due to rigorous checking against well-defined and
+ tight RelaxNG or &acro.xml; Schema's, for example):
<screen>
<![CDATA[
<xsl:template name="element-name-indexes">
]]>
</screen>
This template creates indexes which have the name of the working
- node of any input XML file, and assigns a '1' to the index.
+ node of any input &acro.xml; file, and assigns a '1' to the index.
The example query
<literal>find @attr 1=xyz 1</literal>
finds all files which contain at least one
- <literal>xyz</literal> XML element. In case you can not control
+ <literal>xyz</literal> &acro.xml; element. In case you can not control
which element names the input files contain, you might ask for
disaster and bad karma using this technique.
</para>
<![CDATA[
<!-- match on oai xml record root -->
<xsl:template match="/">
- <z:record z:type="update">
+ <z:record>
<!-- create dynamic index name from input content -->
<xsl:variable name="dynamic_content">
]]>
</screen>
Don't be tempted to cross
- the line to the dark side of the force, paduan; this leads
+ the line to the dark side of the force, Padawan; this leads
to suffering and pain, and universal
- disentigration of your project schedule.
+ disintegration of your project schedule.
</para>
- </sect2>
+ </section>
- <sect2 id="record-model-alvisxslt-elementset">
+ <section id="record-model-alvisxslt-elementset">
<title>ALVIS Exchange Formats</title>
<para>
An exchange format can be anything which can be the outcome of an
- XSLT transformation, as far as the stylesheet is registered in
- the main Alvis XSLT filter configuration file, see
+ &acro.xslt; transformation, as far as the stylesheet is registered in
+ the main Alvis &acro.xslt; filter configuration file, see
<xref linkend="record-model-alvisxslt-filter"/>.
- In principle anything that can be expressed in XML, HTML, and
+ In principle anything that can be expressed in &acro.xml;, HTML, and
TEXT can be the output of a <literal>schema</literal> or
<literal>element set</literal> directive during search, as long as
the information comes from the
- <emphasis>original input record XML DOM tree</emphasis>
- (and not the transformed and <emphasis>indexed</emphasis> XML!!).
+ <emphasis>original input record &acro.xml; &acro.dom; tree</emphasis>
+ (and not the transformed and <emphasis>indexed</emphasis> &acro.xml;!!).
</para>
<para>
- In addition, internal administrative information from the Zebra
+ In addition, internal administrative information from the &zebra;
indexer can be accessed during record retrieval. The following
example is a summary of the possibilities:
<screen>
</screen>
</para>
- </sect2>
+ </section>
- <sect2 id="record-model-alvisxslt-example">
- <title>ALVIS Filter OAI Indexing Example</title>
+ <section id="record-model-alvisxslt-example">
+ <title>ALVIS Filter &acro.oai; Indexing Example</title>
<para>
- The sourcecode tarball contains a working Alvis filter example in
+ The source code tarball contains a working Alvis filter example in
the directory <filename>examples/alvis-oai/</filename>, which
should get you started.
</para>
<para>
- More example data can be harvested from any OAI complient server,
- see details at the OAI
+ More example data can be harvested from any &acro.oai; compliant server,
+ see details at the &acro.oai;
<ulink url="http://www.openarchives.org/">
http://www.openarchives.org/</ulink> web site, and the community
links at
<ulink url="http://www.oaforum.org/tutorial/">
http://www.oaforum.org/tutorial/</ulink>.
</para>
- </sect2>
+ </section>
- </sect1>
+ </section>
</chapter>
-<!--
-
-c) Main "alvis" XSLT filter config file:
- cat db/filter_alvis_conf.xml
-
- <?xml version="1.0" encoding="UTF8"?>
- <schemaInfo>
- <schema name="alvis" stylesheet="db/alvis2alvis.xsl" />
- <schema name="index" identifier="http://indexdata.dk/zebra/xslt/1"
- stylesheet="db/alvis2index.xsl" />
- <schema name="dc" stylesheet="db/alvis2dc.xsl" />
- <schema name="dc-short" stylesheet="db/alvis2dc_short.xsl" />
- <schema name="snippet" snippet="25" stylesheet="db/alvis2snippet.xsl" />
- <schema name="help" stylesheet="db/alvis2help.xsl" />
- <split level="1"/>
- </schemaInfo>
-
- the paths are relative to the directory where zebra.init is placed
- and is started up.
-
- The split level decides where the SAX parser shall split the
- collections of records into individual records, which then are
- loaded into DOM, and have the indexing XSLT stylesheet applied.
-
- The indexing stylesheet is found by it's identifier.
-
- All the other stylesheets are for presentation after search.
-
-- in data/ a short sample of harvested carnivorous plants
- ZEBRA_INDEX_DIRS=data/carnivor_20050118_2200_short-346.xml
-
-- in root also one single data record - nice for testing the xslt
- stylesheets,
-
- xsltproc db/alvis2index.xsl carni*.xml
-
- and so on.
-
-- in db/ a cql2pqf.txt yaz-client config file
- which is also used in the yaz-server <ulink url="http://www.loc.gov/standards/sru/cql/">CQL</ulink>-to-PQF process
-
- see: http://www.indexdata.com/yaz/doc/tools.tkl#tools.cql.map
-
-- in db/ an indexing XSLT stylesheet. This is a PULL-type XSLT thing,
- as it constructs the new XML structure by pulling data out of the
- respective elements/attributes of the old structure.
-
- Notice the special zebra namespace, and the special elements in this
- namespace which indicate to the zebra indexer what to do.
-
- <z:record id="67ht7" rank="675" type="update">
- indicates that a new record with given id and static rank has to be updated.
-
- <z:index name="title" type="w">
- encloses all the text/XML which shall be indexed in the index named
- "title" and of index type "w" (see file default.idx in your zebra
- installation)
-
-
- </para>
-
- <para>
--->
-
-
-
-
- <!-- Keep this Emacs mode comment at the end of the file
-Local variables:
-mode: nxml
-End:
--->
+ <!-- Keep this comment at the end of the file
+ Local variables:
+ mode: sgml
+ sgml-omittag:t
+ sgml-shorttag:t
+ sgml-minimize-attributes:nil
+ sgml-always-quote-attributes:t
+ sgml-indent-step:1
+ sgml-indent-data:t
+ sgml-parent-document: "zebra.xml"
+ sgml-local-catalogs: nil
+ sgml-namecase-general:t
+ End:
+ -->