From: Marc Cromme
Date: Wed, 15 Feb 2006 14:57:48 +0000 (+0000)
Subject: added sections on alvis filter configuration, not finished yet
X-Git-Tag: before.bug.529~244
X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=commitdiff_plain;h=495a66ecd5fb966a8bd52f95dc25cde9d673e569

added sections on alvis filter configuration, not finished yet
---

diff --git a/doc/Makefile.am b/doc/Makefile.am
index 5983f23..5e189ab 100644
--- a/doc/Makefile.am
+++ b/doc/Makefile.am
@@ -1,4 +1,4 @@
-## $Id: Makefile.am,v 1.35 2006-02-15 12:08:47 marc Exp $
+## $Id: Makefile.am,v 1.36 2006-02-15 14:57:48 marc Exp $
 docdir=$(datadir)/doc/@PACKAGE@
 SUPPORTFILES = \
@@ -30,8 +30,11 @@ XMLFILES = zebra.xml.in \
  zebrasrv-virtual.xml
-HTMLFILES = administration.html \
+HTMLFILES = administration-ranking.html \
+ administration.html \
  apps.html \
+ architecture-maincomponents.html \
+ architecture-workflow.html \
  architecture.html \
  configuration-file.html \
  example1.html \
@@ -50,7 +53,6 @@ HTMLFILES = administration.html \
  introduction.html \
  license.html \
  locating-records.html \
- architecture-maincomponents.html \
  protocol-support.html \
  quick-start.html \
  record-model-alvisxslt-conf.html \
@@ -58,13 +60,12 @@ HTMLFILES = administration.html \
  record-model-grs-conf.html \
  record-model-grs.html \
  register-location.html \
+ server-sru-support.html \
+ server-sru.html \
  server.html \
  shadow-registers.html \
  simple-indexing.html \
- sru-server.html \
- sru-support.html \
  support.html \
- workflow.html \
  zebra.html \
  zebraidx.html
diff --git a/doc/recordmodel-alvisxslt.xml b/doc/recordmodel-alvisxslt.xml
index 764190d..a322f74 100644
--- a/doc/recordmodel-alvisxslt.xml
+++ b/doc/recordmodel-alvisxslt.xml
@@ -1,5 +1,5 @@
-
+
 ALVIS XML Record Model and Filter Module
@@ -12,39 +12,183 @@
 releases of the Zebra Information Server.
-
-
+ This filter has been developed under the
+ ALVIS project funded by
+ the European Community under the "Information Society Technologies"
+ Programme (2002-2006).
+
- ALLVIS Record Filter
+ ALVIS Record Filter
 The experimental, loadable Alvis XML/XSLT filter module mod-alvis.so is packaged in the GNU/Debian package libidzebra1.4-mod-alvis.
+ It is invoked by the Zebra configuration statement
+
+ recordtype.xml: alvis.db/filter_alvis_conf.xml
+
+ on all data files with suffix .xml, where the
+ alvis XSLT filter config file is found in the
+ path db/filter_alvis_conf.xml.
+
+ The alvis XSLT filter config file must be
+ valid XML. It might look like this (used for indexing and display
+ of OAI harvested records):
+
+ <?xml version="1.0" encoding="UTF-8"?>
+ <schemaInfo>
+   <schema name="identity" stylesheet="xsl/identity.xsl" />
+   <schema name="index" identifier="http://indexdata.dk/zebra/xslt/1"
+           stylesheet="xsl/oai2index.xsl" />
+   <schema name="dc" stylesheet="xsl/oai2dc.xsl" />
+   <!-- use split level 2 when indexing whole OAI Record lists -->
+   <split level="2"/>
+ </schemaInfo>
+
+
+
+ All named stylesheets defined inside
+ schema element tags
+ are for presentation after search, including
+ the indexing stylesheet (which is a great debugging help). The
+ names defined in the name attributes must be
+ unique; these are the literal schema or
+ element set names used in
+ SRW,
+ SRU and
+ Z39.50 protocol queries.
+ The paths in the stylesheet attributes
+ are relative to Zebra's working directory, or absolute from the file
+ system root.
+
+
+ The <split level="2"/> decides where the
+ XML Reader shall split the
+ collections of records into individual records, which then are
+ loaded into DOM, and have the indexing XSLT stylesheet applied.
+
+
+ There must be exactly one indexing XSLT stylesheet, which is
+ defined by the magic attribute
+ identifier="http://indexdata.dk/zebra/xslt/1".
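The sample configuration refers to xsl/identity.xsl without showing it. A minimal sketch, assuming that file is simply the standard XSLT 1.0 identity transform (the stylesheet actually shipped with Zebra may differ):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical content for xsl/identity.xsl: copies the record unchanged -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy every attribute and node, then recurse into its children -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Requesting the "identity" schema with a stylesheet of this shape would return the stored record XML as it was loaded into DOM.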
- ALLVIS Internal Record Representation
- FIXME
+ ALVIS Internal Record Representation
+ When indexing, an XML Reader is invoked to split the input
+ files into suitable record XML pieces. Each record piece is then
+ transformed to an XML DOM structure, which is essentially the
+ record model. Only XSLT transformations can be applied during
+ indexing, search and retrieval. Consequently, output formats are
+ restricted to whatever XSLT can deliver from the record XML
+ structure, be it other XML formats, HTML, or plain text. In case
+ you have libxslt1 running with EXSLT support,
+ you can use this functionality inside the alvis
+ filter configuration XSLT stylesheets.
+
- ALLVIS Canonical Format
- FIXME
+ ALVIS Canonical Indexing Format
+ The output of the indexing XSLT stylesheets must contain
+ certain elements in the magic
+ xmlns:z="http://indexdata.dk/zebra/xslt/1"
+ namespace. The output of the XSLT indexing transformation is then
+ parsed using DOM methods, and the contained instructions are
+ performed on the magic elements and their
+ subtrees.
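As an illustration of the canonical format, an indexing stylesheet for OAI records might be sketched as follows. The match patterns, the choice of indexes, and the OAI-PMH and Dublin Core namespace bindings are assumptions made for this example; the real xsl/oai2index.xsl is not shown in this commit and may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical indexing stylesheet; not the project's xsl/oai2index.xsl -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:z="http://indexdata.dk/zebra/xslt/1"
    xmlns:oai="http://www.openarchives.org/OAI/2.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    exclude-result-prefixes="oai dc">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Suppress any text that no template handles explicitly -->
  <xsl:template match="text()"/>

  <!-- Emit one z:record per OAI record, keyed on the OAI identifier -->
  <xsl:template match="oai:record">
    <z:record z:id="{normalize-space(oai:header/oai:identifier)}"
              z:type="update">
      <!-- Index names are plain strings chosen by the stylesheet author -->
      <z:index name="oai:identifier" type="0">
        <xsl:value-of select="normalize-space(oai:header/oai:identifier)"/>
      </z:index>
      <!-- One word index entry per Dublin Core title found in the record -->
      <xsl:for-each select=".//dc:title">
        <z:index name="dc:title" type="w">
          <xsl:value-of select="."/>
        </z:index>
      </xsl:for-each>
    </z:record>
  </xsl:template>

</xsl:stylesheet>
```

Run through xsltproc, a stylesheet of this shape emits one z:record element per OAI record, with z:index children carrying the text to be indexed. The z:rank attribute is left out here because it only matters when static ranking is enabled.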
+
+
+ For example, the output of the command
+
+ xsltproc xsl/oai2index.xsl one-record.xml
+
+ might look like this:
+
+ <?xml version="1.0" encoding="UTF-8"?>
+ <z:record xmlns:z="http://indexdata.dk/zebra/xslt/1"
+           z:id="oai:JTRS:CP-3290---Volume-I"
+           z:rank="47896"
+           z:type="update">
+   <z:index name="oai:identifier" type="0">
+     oai:JTRS:CP-3290---Volume-I</z:index>
+   <z:index name="oai:datestamp" type="0">2004-07-09</z:index>
+   <z:index name="oai:setspec" type="0">jtrs</z:index>
+   <z:index name="dc:all" type="w">
+     <z:index name="dc:title" type="w">Proceedings of the 4th
+       International Conference and Exhibition:
+       World Congress on Superconductivity - Volume I</z:index>
+     <z:index name="dc:creator" type="w">Kumar Krishen and *Calvin
+       Burnham, Editors</z:index>
+   </z:index>
+ </z:record>
+
+
+ This means the following: From the original XML file
+ one-record.xml (or from the XML record DOM of the
+ same form coming from a split input file), the indexing
+ stylesheet produces an indexing XML record, which is defined by
+ the record element in the magic namespace
+ xmlns:z="http://indexdata.dk/zebra/xslt/1".
+ Zebra uses the content of
+ z:id="oai:JTRS:CP-3290---Volume-I" as internal
+ record ID, and - in case static ranking is set - the content of
+ z:rank="47896" as static rank. Following the
+ discussion in XXX we see that this record is internally ordered
+ lexicographically according to the value of the string
+ oai:JTRS:CP-3290---Volume-I47896.
+ The type of action performed during indexing is defined by
+ z:type="update", with recognized values
+ insert, update, and
+ delete.
+
+ Then the following literal indexes are constructed:
+
+ oai:identifier
+ oai:datestamp
+ oai:setspec
+ dc:all
+ dc:title
+ dc:creator
+
+ where the indexing type is defined in the
+ type attribute (any value from the standard config
+ file default.idx will do).
+ Finally, any
+ text() node content recursively contained
+ inside the index will be filtered through the
+ appropriate charmap for character normalization, and will be
+ inserted in the index.
+
+
+ Notice that there are no .abs,
+ .est, .map, or other GRS-1
+ filter configuration files involved in this process. Notice also
+ that the names and types of the indexes can be defined in the
+ indexing XSLT stylesheet dynamically according to
+ content in the original XML records, which offers
+ opportunities for both great power and great disaster.
+
-
-
- ALLVIS Record Model Configuration
- FIXME
+ ALVIS Record Model Configuration
+
+ ALVIS Indexing Configuration
+ FIXME
+
+ FIXME
+
+ FIXME
+
+
-
+
 ALVIS Exchange Formats
 FIXME