From 3592c160de9edabf9bdc7a5e0f592af6b9f938bb Mon Sep 17 00:00:00 2001 From: Marc Cromme Date: Tue, 21 Feb 2006 14:54:25 +0000 Subject: [PATCH] added more text on GRS and Alvis filters --- doc/architecture.xml | 379 ++++++++++++++++++++++++++++++++--------- doc/recordmodel-alvisxslt.xml | 24 ++- 2 files changed, 323 insertions(+), 80 deletions(-) diff --git a/doc/architecture.xml b/doc/architecture.xml index 1def20e..9890ab0 100644 --- a/doc/architecture.xml +++ b/doc/architecture.xml @@ -1,5 +1,5 @@ - + Overview of Zebra Architecture @@ -46,36 +46,88 @@ The Zebra indexer and information retrieval server consists of the - following main applications: the zebraidx - indexing maintenance utility, and the zebrasrv - information query and retireval server. Both are using some of the + following main applications: the zebraidx + indexing maintenance utility, and the zebrasrv + information query and retrieval server. Both are using some of the same main components, which are presented here. - This virtual package installs all the necessary packages to start + The virtual Debian package idzebra1.4 + installs all the necessary packages to start working with Zebra - including utility programs, development libraries, - documentation and modules. - idzebra1.4 + documentation and modules. - Core Zebra Module Containing Common Functionality + Core Zebra Libraries Containing Common Functionality - - loads external filter modules used for presenting - the recods in a search response. - - executes search requests in PQF/RPN, which are handed over from - the YAZ server frontend API - - calls resorting/reranking algorithms on the hit sets - - returns - possibly ranked - result sets, hit - numbers, and the like internal data to the YAZ server backend API. - + The core Zebra module is the meat of the zebraidx + indexing maintenance utility, and the zebrasrv + information query and retrieval server binaries. 
In short, the core + libraries are responsible for + + + Dynamic Loading + + of external filter modules, in case the application is + not compiled statically. These filter modules define indexing, + search and retrieval capabilities of the various input formats. + + + + + Index Maintenance + + Zebra maintains Term Dictionaries and ISAM index + entries in inverted index structures kept on disk. These are + optimized for fast insert, update and delete, as well as good + search performance. + + + + + Search Evaluation + + by executing search requests expressed in PQF/RPN + data structures, which are handed over from + the YAZ server frontend API. Search evaluation includes + construction of hit lists according to boolean combinations + of simpler searches. Fast performance is achieved by careful + use of index structures, and by evaluating specific index hit + lists in the correct order. + + + + + Ranking and Sorting + + + components call resorting/re-ranking algorithms on the hit + sets. These might also be pre-sorted not only using the + assigned document IDs, but also using assigned static rank + information. + + + + + Record Presentation + + returns (possibly ranked) result sets, hit + numbers, and similar internal data to the YAZ server backend API + for shipping to the client. Each individual filter module + implements its own specific presentation formats. + + + + + - This package contains all run-time libraries for Zebra. - libidzebra1.4 - This package includes documentation for Zebra in PDF and HTML. - idzebra1.4-doc - This package includes common essential Zebra configuration files + The Debian package libidzebra1.4 + contains all run-time libraries for Zebra; the + documentation in PDF and HTML is found in + idzebra1.4-doc, and idzebra1.4-common + includes common essential Zebra configuration files. @@ -83,27 +135,28 @@ Zebra Indexer - the core Zebra indexer which - - loads external filter modules used for indexing data records of - different type.
- - creates, updates and drops databases and indexes + The zebraidx + indexing maintenance utility + loads external filter modules used for indexing data records of + different types, and creates, updates and drops databases and + indexes according to the rules defined in the filter modules. - This package contains Zebra utilities such as the zebraidx indexer - utility and the zebrasrv server. - idzebra1.4-utils + The Debian package idzebra1.4-utils contains + the zebraidx utility. Zebra Searcher/Retriever - the core Zebra searcher/retriever which + This is the executable that runs the Z39.50/SRU/SRW server and + glues together the core libraries and the filter modules into one + great Information Retrieval server application. - This package contains Zebra utilities such as the zebraidx indexer - utility and the zebrasrv server, and their associated man pages. - idzebra1.4-utils + The Debian package idzebra1.4-utils contains + the zebrasrv utility. @@ -117,33 +170,48 @@ In addition to Z39.50 requests, the YAZ server frontend acts - as HTTP server, honouring - SRW SOAP requests, and SRU REST requests. Moreover, it can - translate inco ming CQL queries to PQF/RPN queries, if + as an HTTP server, honoring + SRW + SOAP requests, and + SRU + REST requests. Moreover, it can + translate incoming + CQL + queries to + PQF + queries, if correctly configured. - YAZ is a toolkit that allows you to develop software using the - ANSI Z39.50/ISO23950 standard for information retrieval. - SRW/ SRU - libyazthread.so - libyaz.so - libyaz + YAZ + is an Open Source + toolkit that allows you to develop software using the + ANSI Z39.50/ISO23950 standard for information retrieval. + It is packaged in the Debian packages + yaz and libyaz. Record Models and Filter Modules - all filter modules which do indexing and record display filtering: -This virtual package contains all base IDZebra filter modules. EMPTY ???
- libidzebra1.4-modules + The hard work of knowing what to index, + how to do it, and which + part of the records to send in a search/retrieve response is + implemented in + various filter modules. It is their responsibility to define the + exact indexing and record display filtering rules. + + + The virtual Debian package + libidzebra1.4-modules installs all base filter + modules. TEXT Record Model and Filter Module - Plain ASCII text filter + Plain ASCII text filter. TODO: add information here. @@ -153,53 +221,103 @@ This virtual package contains all base IDZebra filter modules. EMPTY ??? GRS Record Model and Filter Modules + The GRS filter modules described in - - - grs.danbib GRS filters of various kind (*.abs files) -IDZebra filter grs.danbib (DBC DanBib records) - This package includes grs.danbib filter which parses DanBib records. - DanBib is the Danish Union Catalogue hosted by DBC - (Danish Bibliographic Centre). - libidzebra1.4-mod-grs-danbib - - - - grs.marc - - grs.marcxml - This package includes the grs.marc and grs.marcxml filters that allows - IDZebra to read MARC records based on ISO2709. - - libidzebra1.4-mod-grs-marc - - - grs.regx - - grs.tcl GRS TCL scriptable filter - This package includes the grs.regx and grs.tcl filters. - libidzebra1.4-mod-grs-regx - - - - grs.sgml - libidzebra1.4-mod-grs-sgml not packaged yet ?? - - - grs.xml - This package includes the grs.xml filter which uses Expat to - parse records in XML and turn them into IDZebra's internal grs node. - libidzebra1.4-mod-grs-xml + are all based on the Z39.50 specifications, and it is absolutely + mandatory to have the reference pages on BIB-1 attribute sets at + hand when configuring GRS filters. The GRS filters come in + different flavors, and a short introduction is needed here. + GRS filters of various kinds have also been called ABS filters due + to the *.abs configuration file suffix. + + + The grs.danbib filter was developed for + DBC DanBib records.
DanBib is the Danish Union Catalogue hosted by DBC + (Danish Bibliographic Center). This filter is found in the + Debian package + libidzebra1.4-mod-grs-danbib. + + + The grs.marc and + grs.marcxml filters are designed to parse and + index binary and XML versions of traditional library MARC records + based on the ISO 2709 standard. The Debian package for both + filters is + libidzebra1.4-mod-grs-marc. + + + GRS TCL scriptable filters for extensive user configuration come + in two flavors: a regular expression filter + grs.regx using TCL regular expressions, and + a general scriptable TCL filter called + grs.tcl. + Both are included in the + libidzebra1.4-mod-grs-regx Debian package. + + + A general purpose SGML filter is called + grs.sgml. This filter is not yet packaged, + but is planned to be in the + libidzebra1.4-mod-grs-sgml Debian package. + + + The Debian package + libidzebra1.4-mod-grs-xml includes the + grs.xml filter which uses Expat to + parse records in XML and turn them into IDZebra's internal GRS node + trees. See also the Alvis XML/XSLT filter described in + the next section. ALVIS Record Model and Filter Module - - - alvis Experimental Alvis XSLT filter - mod-alvis.so - libidzebra1.4-mod-alvis + The Alvis filter for XML files is an XSLT-based input + filter. + It indexes element and attribute content of any conceivable XML format + using full XPath support, a feature which the standard Zebra + GRS SGML and XML filters lacked. The indexed documents are + parsed into a standard XML DOM tree, which limits record size + to what fits in available memory. + + + The Alvis filter + uses XSLT display stylesheets, which let + the Zebra DB administrator associate multiple, different views on + the same XML document type. These views are chosen on-the-fly at + search time.
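The Alvis paragraph above says the filter indexes element and attribute content of arbitrary XML via XPath, after parsing each record into an in-memory DOM tree. The following Python sketch illustrates only that idea and is not the Alvis filter's code, which is XSLT-based. It uses the standard-library `xml.etree.ElementTree`, whose `findall()` supports only a limited XPath subset. The function name `index_terms`, the index names and the path expressions are all hypothetical.

```python
import xml.etree.ElementTree as ET


def index_terms(xml_text, xpaths):
    """Collect (index_name, term) pairs from one XML record.

    xpaths maps a hypothetical index name to an ElementTree path
    expression; ElementTree implements only a limited XPath subset,
    unlike the full XPath available to an XSLT processor.
    """
    # The whole record is parsed into an in-memory tree, so record
    # size is bounded by available memory.
    root = ET.fromstring(xml_text)
    terms = []
    # Element content: split the text of each matched element into words.
    for index_name, path in xpaths.items():
        for elem in root.findall(path):
            for word in (elem.text or "").split():
                terms.append((index_name, word.lower()))
    # Attribute content: index every attribute of every element,
    # using "@name" as a hypothetical index name.
    for elem in root.iter():
        for attr, value in elem.attrib.items():
            terms.append(("@" + attr, value.lower()))
    return terms


# Hypothetical sample record.
record = """<record id="R1">
  <title>Zebra Architecture</title>
  <author lang="en">Jane Doe</author>
</record>"""

print(index_terms(record, {"title": "./title", "author": "./author"}))
```

A real XSLT-based filter would express the same selections as XPath expressions inside a stylesheet. The Python form is used here only to keep the sketch self-contained.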
+ + + In addition, the Alvis filter configuration is not bound to the + arcane BIB-1 Z39.50 library catalogue indexing traditions and + folklore, and is therefore easier to understand. + + + Finally, the Alvis filter allows for static ranking at index + time, and for sorting hit lists according to predefined + static ranks. This imposes no extra overhead: the additional cost for both + search and indexing remains + O(1), irrespective of document + collection size. This feature resembles Google's pre-ranking using + its PageRank algorithm. + + + Details on the experimental Alvis XSLT filter are found in + . + + + The Debian package libidzebra1.4-mod-alvis + contains the Alvis filter module. SAFARI Record Model and Filter Module - - safari + SAFARI filter module TODO: add information here. @@ -264,6 +382,109 @@ IDZebra filter grs.danbib (DBC DanBib records) + + + + ALVIS XML Record Model and Filter Module @@ -424,6 +424,28 @@ + + ALVIS Filter OAI Indexing Example + + The source code tarball contains a working Alvis filter example in + the directory examples/alvis-oai/, which + should get you started. + + + More example data can be harvested from any OAI compliant server; + see details at the OAI + + http://www.openarchives.org/ web site, and the community + links at + + http://www.openarchives.org/community/index.html. + A tutorial is + found at + + http://www.oaforum.org/tutorial/. + + + -- 1.7.10.4
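The core library overview in this patch says that search evaluation builds hit lists from boolean combinations of simpler searches, evaluated in a careful order. The Alvis section adds that hit lists can be sorted by static ranks fixed at index time. The following Python sketch shows one way these ideas fit together. It is not Zebra's implementation. It assumes that each posting is keyed by `(-static_rank, doc_id)`, so every hit list is sorted once when indexing. A linear-merge AND then yields results already in static-rank order, with no per-query sort. All function names and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical posting key: (negated static rank, document id), so that
# higher-ranked documents sort first and ties are broken by id.
Key = Tuple[int, int]


def posting_key(doc_id: int, static_rank: Dict[int, int]) -> Key:
    return (-static_rank[doc_id], doc_id)


def build_index(docs: Dict[int, str],
                static_rank: Dict[int, int]) -> Dict[str, List[Key]]:
    """Inverted index whose hit lists are pre-sorted by static rank."""
    index: Dict[str, List[Key]] = {}
    for doc_id, text in docs.items():
        # set() gives at most one posting per (term, document).
        for term in set(text.lower().split()):
            index.setdefault(term, []).append(posting_key(doc_id, static_rank))
    for hits in index.values():
        hits.sort()  # sorting cost is paid once, at index time
    return index


def intersect(a: List[Key], b: List[Key]) -> List[Key]:
    """Boolean AND of two hit lists sorted by the same key (linear merge)."""
    i = j = 0
    out: List[Key] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(index: Dict[str, List[Key]], terms: List[str]) -> List[int]:
    """AND query; shortest hit lists first keep intermediate results small."""
    lists = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for hits in lists[1:]:
        result = intersect(result, hits)
    # The merge preserves key order, so results are already ranked.
    return [doc_id for _, doc_id in result]


docs = {1: "zebra xml filter", 2: "zebra grs filter", 3: "alvis xml filter"}
rank = {1: 10, 2: 50, 3: 30}
index = build_index(docs, rank)
print(search_and(index, ["filter", "xml"]))  # [3, 1]
```

Evaluating the shortest hit list first is a common heuristic for ordering boolean AND evaluation. Whether Zebra uses exactly this ordering is not stated in the source text.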