X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Frecordmodel-alvisxslt.xml;h=93ce649c4304cbbd36a036cf56ccf0b7cb481176;hb=fdad2b849ba30355fbb50a599af70639e691ae3a;hp=be69601c941735a0eb8cee933b233f8d214bf3e0;hpb=0381d7dc936e74ac2fb55ad217b760c97ace0d5b;p=idzebra-moved-to-github.git diff --git a/doc/recordmodel-alvisxslt.xml b/doc/recordmodel-alvisxslt.xml index be69601..93ce649 100644 --- a/doc/recordmodel-alvisxslt.xml +++ b/doc/recordmodel-alvisxslt.xml @@ -1,15 +1,15 @@ - - ALVIS XML Record Model and Filter Module + + ALVIS &xml; Record Model and Filter Module The record model described in this chapter applies to the fundamental, - structured XML + structured &xml; record type alvis, introduced in - . The ALVIS XML record model + . The ALVIS &xml; record model is experimental, and its inner workings might change in future - releases of the Zebra Information Server. + releases of the &zebra; Information Server. This filter has been developed under the @@ -19,10 +19,10 @@ - +
ALVIS Record Filter - The experimental, loadable Alvis XML/XSLT filter module + The experimental, loadable Alvis &xml;/&xslt; filter module mod-alvis.so is packaged in the GNU/Debian package libidzebra1.4-mod-alvis. It is invoked by the zebra.cfg configuration statement @@ -31,12 +31,12 @@ In this example on all data files with suffix *.xml, where the - Alvis XSLT filter configuration file is found in the + Alvis &xslt; filter configuration file is found in the path db/filter_alvis_conf.xml. - The Alvis XSLT filter configuration file must be - valid XML. It might look like this (This example is - used for indexing and display of OAI harvested records): + The Alvis &xslt; filter configuration file must be + valid &xml;. It might look like this (This example is + used for indexing and display of &oai; harvested records): &lt;?xml version="1.0" encoding="UTF-8"?> &lt;schemaInfo> @@ -44,7 +44,7 @@ &lt;schema name="index" identifier="http://indexdata.dk/zebra/xslt/1" stylesheet="xsl/oai2index.xsl" /> &lt;schema name="dc" stylesheet="xsl/oai2dc.xsl" /> - &lt;!-- use split level 2 when indexing whole OAI Record lists --> + &lt;!-- use split level 2 when indexing whole &oai; Record lists --> &lt;split level="2"/> &lt;/schemaInfo> @@ -57,47 +57,47 @@ names defined in the name attributes must be unique; these are the literal schema or element set names used in - SRW, - SRU and - Z39.50 protocol queries. + &srw;, + &sru; and + &z3950; protocol queries. The paths in the stylesheet attributes are relative to Zebra's working directory, or absolute from the file system root. The &lt;split level="2"/> decides where the - XML Reader shall split the + &xml; Reader shall split the collections of records into individual records, which then are - loaded into DOM, and have the indexing XSLT stylesheet applied. + loaded into &dom;, and have the indexing &xslt; stylesheet applied. 
- There must be exactly one indexing XSLT stylesheet, which is + There must be exactly one indexing &xslt; stylesheet, which is defined by the magic attribute identifier="http://indexdata.dk/zebra/xslt/1". - +
ALVIS Internal Record Representation - When indexing, an XML Reader is invoked to split the input - files into suitable record XML pieces. Each record piece is then - transformed to an XML DOM structure, which is essentially the - record model. Only XSLT transformations can be applied during + When indexing, an &xml; Reader is invoked to split the input + files into suitable record &xml; pieces. Each record piece is then + transformed to an &xml; &dom; structure, which is essentially the + record model. Only &xslt; transformations can be applied during indexing, search and retrieval. Consequently, output formats are - restricted to whatever XSLT can deliver from the record XML - structure, be it other XML formats, HTML, or plain text. In case - you have libxslt1 running with EXSLT support, + restricted to whatever &xslt; can deliver from the record &xml; + structure, be it other &xml; formats, HTML, or plain text. In case + you have libxslt1 running with E&xslt; support, you can use this functionality inside the Alvis - filter configuration XSLT stylesheets. + filter configuration &xslt; stylesheets. - +
- +
ALVIS Canonical Indexing Format - The output of the indexing XSLT stylesheets must contain + The output of the indexing &xslt; stylesheets must contain certain elements in the magic xmlns:z="http://indexdata.dk/zebra/xslt/1" - namespace. The output of the XSLT indexing transformation is then - parsed using DOM methods, and the contained instructions are + namespace. The output of the &xslt; indexing transformation is then + parsed using &dom; methods, and the contained instructions are performed on the magic elements and their subtrees. @@ -113,27 +113,27 @@ z:id="oai:JTRS:CP-3290---Volume-I" z:rank="47896" z:type="update"> - <z:index name="oai:identifier" type="0"> + <z:index name="oai_identifier" type="0"> oai:JTRS:CP-3290---Volume-I</z:index> - <z:index name="oai:datestamp" type="0">2004-07-09</z:index> - <z:index name="oai:setspec" type="0">jtrs</z:index> - <z:index name="dc:all" type="w"> - <z:index name="dc:title" type="w">Proceedings of the 4th + <z:index name="oai_datestamp" type="0">2004-07-09</z:index> + <z:index name="oai_setspec" type="0">jtrs</z:index> + <z:index name="dc_all" type="w"> + <z:index name="dc_title" type="w">Proceedings of the 4th International Conference and Exhibition: World Congress on Superconductivity - Volume I</z:index> - <z:index name="dc:creator" type="w">Kumar Krishen and *Calvin + <z:index name="dc_creator" type="w">Kumar Krishen and *Calvin Burnham, Editors</z:index> </z:index> </z:record> - This means the following: From the original XML file - one-record.xml (or from the XML record DOM of the + This means the following: From the original &xml; file + one-record.xml (or from the &xml; record &dom; of the same form coming from a split input file), the indexing - stylesheet produces an indexing XML record, which is defined by + stylesheet produces an indexing &xml; record, which is defined by the record element in the magic namespace xmlns:z="http://indexdata.dk/zebra/xslt/1". 
- Zebra uses the content of + &zebra; uses the content of z:id="oai:JTRS:CP-3290---Volume-I" as internal record ID, and - in case static ranking is set - the content of z:rank="47896" as static rank. Following the @@ -148,12 +148,12 @@ In this example, the following literal indexes are constructed: - oai:identifier - oai:datestamp - oai:setspec - dc:all - dc:title - dc:creator + oai_identifier + oai_datestamp + oai_setspec + dc_all + dc_title + dc_creator where the indexing type is defined in the type attribute @@ -187,60 +187,56 @@ the same character normalization map w. - Finally, this example configuration can be queried using PQF - queries, either transported by Z39.50, (here using a yaz-client) + Finally, this example configuration can be queried using &pqf; + queries, either transported by &z3950;, (here using a yaz-client) open localhost:9999 Z> elem dc Z> form xml Z> - Z> f @attr 1=dc:creator Kumar - Z> scan @attr 1=dc:creator adam + Z> f @attr 1=dc_creator Kumar + Z> scan @attr 1=dc_creator adam Z> - Z> f @attr 1=dc:title @attr 4=2 "proceeding congress superconductivity" - Z> scan @attr 1=dc:title abc + Z> f @attr 1=dc_title @attr 4=2 "proceeding congress superconductivity" + Z> scan @attr 1=dc_title abc ]]> or the proprietary extensions x-pquery and x-pScanClause to - SRU, and SRW + &sru;, and &srw; - See for more information on SRU/SRW - configuration, and or - - the YAZ manual CQL section - for the details - of the YAZ frontend server - CQL - configuration. + See for more information on &sru;/&srw; + configuration, and or the &yaz; + &cql; section + for the details of the &yaz; frontend server. Notice that there are no *.abs, - *.est, *.map, or other GRS-1 + *.est, *.map, or other &grs1; filter configuration files involved in this process, and that the literal index names are used during search and retrieval. - - 
+
- +
ALVIS Record Model Configuration - +
ALVIS Indexing Configuration As mentioned above, there can be only one indexing stylesheet, and configuration of the indexing process is a synonym - for writing an XSLT stylesheet which produces XML output containing the + for writing an &xslt; stylesheet which produces &xml; output containing the magic elements discussed in . Obviously, there are millions of different ways to accomplish this @@ -250,30 +246,30 @@ Stylesheets can be written in the pull or the push style: pull - means that the output XML structure is taken as starting point of - the internal structure of the XSLT stylesheet, and portions of - the input XML are pulled out and inserted - into the right spots of the output XML structure. On the other - hand, push XSLT stylesheets are recursively + means that the output &xml; structure is taken as starting point of + the internal structure of the &xslt; stylesheet, and portions of + the input &xml; are pulled out and inserted + into the right spots of the output &xml; structure. On the other + hand, push &xslt; stylesheets are recursively calling their template definitions, a process which is commanded - by the input XML structure, and awake to produce some output XML + by the input &xml; structure, and awake to produce some output &xml; whenever some special conditions in the input stylesheets are met. The pull type is well-suited for input - XML with strong and well-defined structure and semantics, like the - following OAI indexing example, whereas the + &xml; with strong and well-defined structure and semantics, like the + following &oai; indexing example, whereas the push type might be the only possible way to - sort out deeply recursive input XML formats. + sort out deeply recursive input &xml; formats. 
A pull stylesheet example used to index - OAI harvested records could use some of the following template + &oai; harvested records could use some of the following template definitions: @@ -286,14 +282,14 @@ - + - + - + @@ -302,7 +298,7 @@ - + @@ -316,17 +312,17 @@ Notice also that the names and types of the indexes can be defined in the - indexing XSLT stylesheet dynamically according to - content in the original XML records, which offers + indexing &xslt; stylesheet dynamically according to + content in the original &xml; records, which offers opportunities for great power and wizardry as well as grand disaster. The following excerpt of a push stylesheet might - be a good idea if you have strict control of the XML + be a good idea if you have strict control of the &xml; input format (due to rigorous checking against well-defined and - tight RelaxNG or XML Schemas, for example): + tight RelaxNG or &xml; Schemas, for example): @@ -337,11 +333,11 @@ ]]> This template creates indexes which have the name of the working - node of any input XML file, and assigns a '1' to the index. + node of any input &xml; file, and assigns a '1' to the index. The example query find @attr 1=xyz 1 finds all files which contain at least one - xyz XML element. In case you cannot control + xyz &xml; element. In case you cannot control which element names the input files contain, you might ask for disaster and bad karma using this technique. @@ -373,24 +369,24 @@ to suffering and pain, and universal disintegration of your project schedule. - +
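The pull style described above can be sketched as follows. This is a minimal, hypothetical stylesheet and not the oai2index.xsl shipped with the examples: it assumes that each split record has an OAI-PMH record element as its document element and carries oai_dc metadata, and the index names simply mirror the oai_identifier, oai_datestamp, dc_title and dc_creator indexes of the canonical indexing example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical pull-style indexing stylesheet. The match paths and index
     names are illustrative assumptions, not the shipped oai2index.xsl. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:z="http://indexdata.dk/zebra/xslt/1"
    xmlns:oai="http://www.openarchives.org/OAI/2.0/"
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    exclude-result-prefixes="oai oai_dc dc">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Pull style: the output structure is laid out here, and values from
       the input record are pulled into their slots. -->
  <xsl:template match="/oai:record">
    <z:record z:id="{normalize-space(oai:header/oai:identifier)}"
              z:type="update">
      <z:index name="oai_identifier" type="0">
        <xsl:value-of select="normalize-space(oai:header/oai:identifier)"/>
      </z:index>
      <z:index name="oai_datestamp" type="0">
        <xsl:value-of select="normalize-space(oai:header/oai:datestamp)"/>
      </z:index>
      <!-- One word-type index entry per title and per creator. -->
      <xsl:for-each select="oai:metadata/oai_dc:dc/dc:title">
        <z:index name="dc_title" type="w">
          <xsl:value-of select="normalize-space(.)"/>
        </z:index>
      </xsl:for-each>
      <xsl:for-each select="oai:metadata/oai_dc:dc/dc:creator">
        <z:index name="dc_creator" type="w">
          <xsl:value-of select="normalize-space(.)"/>
        </z:index>
      </xsl:for-each>
    </z:record>
  </xsl:template>

</xsl:stylesheet>
```

Because the index names are written out literally, a stylesheet of this shape cannot create unexpected indexes from unexpected input elements, which is the trade-off against the dynamic push technique.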
- +
ALVIS Exchange Formats An exchange format can be anything which can be the outcome of an - XSLT transformation, as long as the stylesheet is registered in - the main Alvis XSLT filter configuration file, see + &xslt; transformation, as long as the stylesheet is registered in + the main Alvis &xslt; filter configuration file, see . - In principle anything that can be expressed in XML, HTML, and + In principle anything that can be expressed in &xml;, HTML, and TEXT can be the output of a schema or element set directive during search, as long as the information comes from the - original input record XML DOM tree - (and not the transformed and indexed XML!!). + original input record &xml; &dom; tree + (and not the transformed and indexed &xml;!!). - In addition, internal administrative information from the Zebra + In addition, internal administrative information from the &zebra; indexer can be accessed during record retrieval. The following example is a summary of the possibilities: @@ -422,18 +418,18 @@ - +
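A minimal sketch of a retrieval stylesheet that exposes such administrative information might look like this. The parameter names id, filename, score and schema are assumptions recalled from the Zebra documentation and should be checked against your Zebra version; the z:zebra wrapper and its child elements are purely illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical retrieval stylesheet. The parameter names below are
     assumptions; verify them against the Zebra release in use. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:z="http://indexdata.dk/zebra/xslt/1">

  <!-- Parameters assumed to be filled in by the indexer at retrieval time.
       Each defaults to the empty string when it is not supplied. -->
  <xsl:param name="id" select="''"/>
  <xsl:param name="filename" select="''"/>
  <xsl:param name="score" select="''"/>
  <xsl:param name="schema" select="''"/>

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Emit the administrative values, followed by a verbatim copy of the
       original input record. -->
  <xsl:template match="/">
    <z:zebra>
      <z:id><xsl:value-of select="$id"/></z:id>
      <z:filename><xsl:value-of select="$filename"/></z:filename>
      <z:score><xsl:value-of select="$score"/></z:score>
      <z:schema><xsl:value-of select="$schema"/></z:schema>
      <xsl:copy-of select="*"/>
    </z:zebra>
  </xsl:template>

</xsl:stylesheet>
```

To be selectable as a schema or element set name, such a stylesheet would have to be registered with its own schema entry in the filter configuration file, next to the index and dc entries.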
- - ALVIS Filter OAI Indexing Example +
+ ALVIS Filter &oai; Indexing Example The source code tarball contains a working Alvis filter example in the directory examples/alvis-oai/, which should get you started. - More example data can be harvested from any OAI compliant server, - see details at the OAI + More example data can be harvested from any &oai; compliant server, + see details at the &oai; http://www.openarchives.org/ web site, and the community links at @@ -444,9 +440,9 @@ http://www.oaforum.org/tutorial/. - +
- +
@@ -454,7 +450,7 @@