X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Farchitecture.xml;h=b13903830341a8d00415815f170f160eacbcafc6;hb=b9c1a6fcf5c4821d0190efdecbc14ea5d6c96aec;hp=4cb8dae9d13bf499f678e784ab962e62c516c0d6;hpb=d659f801ef702c37a0d6bcbf6c227fdcc4e75520;p=idzebra-moved-to-github.git diff --git a/doc/architecture.xml b/doc/architecture.xml index 4cb8dae..b139038 100644 --- a/doc/architecture.xml +++ b/doc/architecture.xml @@ -1,11 +1,11 @@ - + Overview of Zebra Architecture - +
Local Representation - + As mentioned earlier, Zebra places few restrictions on the type of data that you can index and manage. Generally, whatever the form of @@ -30,62 +30,9 @@ "grs" keyword, separated by "." characters. --> - - - - Indexing and Retrieval Workflow - - - Records pass through three different states during processing in the - system. - - - - - - - - - When records are accessed by the system, they are represented - in their local, or native format. This might be SGML or HTML files, - News or Mail archives, MARC records. If the system doesn't already - know how to read the type of data you need to store, you can set up an - input filter by preparing conversion rules based on regular - expressions and possibly augmented by a flexible scripting language - (Tcl). - The input filter produces as output an internal representation, - a tree structure. - - - - - - - When records are processed by the system, they are represented - in a tree-structure, constructed by tagged data elements hanging off a - root node. The tagged elements may contain data or yet more tagged - elements in a recursive structure. The system performs various - actions on this tree structure (indexing, element selection, schema - mapping, etc.), - - - - - - - Before transmitting records to the client, they are first - converted from the internal structure to a form suitable for exchange - over the network - according to the Z39.50 standard. - - - - - - - - +
- +
Main Components
 
    The Zebra system is designed to support a wide range of data management
@@ -99,68 +46,121 @@
 
    The Zebra indexer and information retrieval server consists of the
-    following main applications: the zebraidx
-    indexing maintenance utility, and the zebrasrv
-    information query and retireval server. Both are using some of the
+    following main applications: the zebraidx
+    indexing maintenance utility, and the zebrasrv
+    information query and retrieval server. Both use some of the
     same main components, which are presented here.
    
-    This virtual package installs all the necessary packages to start
+    The virtual Debian package idzebra-2.0
+    installs all the necessary packages to start
     working with Zebra - including utility programs, development libraries,
-    documentation and modules. 
-     idzebra1.4 
+    documentation and modules. 
    
 
-  
-   Core Zebra Module Containing Common Functionality
+ 
+    Core Zebra Libraries Containing Common Functionality
 
-  
-   
-    loads external filter modules used for presenting
-    the recods in a search response.
-   
-    executes search requests in PQF/RPN, which are handed over from
-    the YAZ server frontend API
-   
-    calls resorting/reranking algorithms on the hit sets
-   
-    returns - possibly ranked - result sets, hit
-    numbers, and the like internal data to the YAZ server backend API.
-   
+    The core Zebra module is the heart of the zebraidx
+    indexing maintenance utility, and the zebrasrv
+    information query and retrieval server binaries. In short, the core
+    libraries are responsible for
+    
+     
+      Dynamic Loading
+      
+       of external filter modules, in case the application is
+       not compiled statically. These filter modules define indexing,
+       search and retrieval capabilities of the various input formats.
+      
+     
+     
+      Index Maintenance
+      
+       Zebra maintains Term Dictionaries and ISAM index
+       entries in inverted index structures kept on disk. These are
+       optimized for fast insert, update and delete operations, as well
+       as good search performance.
+      
+     
+     
+      Search Evaluation
+      
+       by execution of search requests expressed in PQF/RPN
+       data structures, which are handed over from
+       the YAZ server frontend API. Search evaluation includes
+       construction of hit lists according to boolean combinations
+       of simpler searches. Fast performance is achieved by careful
+       use of index structures, and by evaluating specific index hit
+       lists in the correct order.
+      
+     
+     
+      Ranking and Sorting
+      
+      
+       These components call resorting/re-ranking algorithms on the hit
+       sets. The hit sets might also be pre-sorted not only using the
+       assigned document IDs, but also using assigned static rank
+       information.
+      
+     
+     
+      Record Presentation
+      
+       returns - possibly ranked - result sets, hit
+       numbers, and similar internal data to the YAZ server backend API
+       for shipping to the client. Each individual filter module
+       implements its own specific presentation formats. 
+ + + + + - This package contains all run-time libraries for Zebra. - libidzebra1.4 - This package includes documentation for Zebra in PDF and HTML. - idzebra1.4-doc - This package includes common essential Zebra configuration files - idzebra1.4-common + The Debian package libidzebra-2.0 + contains all run-time libraries for Zebra, the + documentation in PDF and HTML is found in + idzebra-2.0-doc, and + idzebra-2.0-common + includes common essential Zebra configuration files. - +
- +
Zebra Indexer
   
-    the core Zebra indexer which
-   
-    loads external filter modules used for indexing data records of
-    different type.
-   
-    creates, updates and drops databases and indexes
+    The zebraidx
+    indexing maintenance utility
+    loads external filter modules used for indexing data records of
+    different types, and creates, updates and drops databases and
+    indexes
     according to the rules defined in the filter modules.
   
   
-    This package contains Zebra utilities such as the zebraidx indexer
-    utility and the zebrasrv server.
-    idzebra1.4-utils
+    The Debian package idzebra-2.0-utils contains
+    the zebraidx utility.
   
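The index maintenance that zebraidx performs can be pictured with a small sketch. The plain-Python toy below is purely illustrative (all names are hypothetical, and it bears no resemblance to Zebra's real on-disk ISAM structures); it only shows the essential operations an indexer applies to an inverted index: insert a record's terms, and delete them again.

```python
# Illustrative sketch of the inverted-index idea behind an indexer
# like zebraidx: a term dictionary mapping each term to the set of
# document IDs containing it. Hypothetical names; not Zebra's actual
# on-disk format.
from collections import defaultdict

class TinyIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of doc IDs
        self.docs = {}                     # doc ID -> stored record text

    def insert(self, doc_id, text):
        """Index a record: tokenize and register every term."""
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        """Drop a record and all of its postings."""
        for term in self.docs.pop(doc_id, "").lower().split():
            self.postings[term].discard(doc_id)

idx = TinyIndex()
idx.insert(1, "Utah geological survey")
idx.insert(2, "Utah water resources")
print(sorted(idx.postings["utah"]))   # -> [1, 2]
idx.delete(1)
print(sorted(idx.postings["utah"]))   # -> [2]
```

An update is simply a delete followed by a re-insert under the same document ID, which is also how record replacement behaves conceptually.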
- +
Zebra Searcher/Retriever
   
-    the core Zebra searcher/retriever which
+    This is the executable which runs the Z39.50/SRU/SRW server and
+    glues together the core libraries and the filter modules into one
+    Information Retrieval server application.
   
   
-    This package contains Zebra utilities such as the zebraidx indexer
-    utility and the zebrasrv server, and their associated man pages.
-    idzebra1.4-utils
+    The Debian package idzebra-2.0-utils contains
+    the zebrasrv utility.
   
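The boolean hit-list combination the server performs during search evaluation can be illustrated with the classic merge-intersection of sorted posting lists. This is the generic textbook technique, sketched here under that assumption only; Zebra's actual evaluator works on on-disk ISAM lists.

```python
# Generic sketch of boolean search evaluation: an AND node intersects
# two sorted document-ID hit lists in a single linear pass. Textbook
# technique, not Zebra's actual implementation.
def intersect(a, b):
    """Merge-intersect two sorted hit lists of document IDs."""
    hits, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            hits.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return hits

# e.g. evaluating a PQF query like "@and utah water" would combine
# the two terms' hit lists:
print(intersect([1, 2, 5, 9], [2, 3, 5, 7]))   # -> [2, 5]
```

Evaluating the shorter list first, and keeping lists sorted, is what makes this kind of combination cheap even for large result sets.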
- +
YAZ Server Frontend
   
    The YAZ server frontend is
@@ -170,488 +170,205 @@
 
    In addition to Z39.50 requests, the YAZ server frontend acts
-    as HTTP server, honouring
-    SRW SOAP requests, and SRU REST requests. Moreover, it can
-    translate inco ming CQL queries to PQF/RPN queries, if
+    as an HTTP server, honoring
+    SRU SOAP
+    requests, and
+    SRU REST
+    requests. Moreover, it can
+    translate incoming
+    CQL
+    queries to
+    PQF
+    queries, if
     correctly configured.
   
   
-    YAZ is a toolkit that allows you to develop software using the
-    ANSI Z39.50/ISO23950 standard for information retrieval.
-    SRW/ SRU
-    libyazthread.so
-    libyaz.so
-    libyaz
+    YAZ
+    is an Open Source
+    toolkit that allows you to develop software using the
+    ANSI Z39.50/ISO23950 standard for information retrieval.
+    It is packaged in the Debian packages
+    yaz and libyaz.
   
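The CQL-to-PQF translation mentioned above is, in YAZ, driven by a configurable mapping file. The toy below only shows the shape of the rewrite for the simplest case of an index-relation-term query; the two-entry mapping table is an assumption for illustration (BIB-1 does define use attribute 4 as "title" and 1003 as "author"), not YAZ's configuration format.

```python
# Toy illustration of CQL-to-PQF translation for simple "index=term"
# queries. The real translation in YAZ is table-driven via a
# properties file; this hard-coded mapping exists only to show the
# shape of the rewrite into BIB-1 use attributes.
def cql_to_pqf(cql):
    index, _, term = cql.partition("=")
    use_attr = {"title": 4, "author": 1003}[index]   # assumed mapping
    return '@attr 1=%d "%s"' % (use_attr, term)

print(cql_to_pqf("title=zebra"))    # -> @attr 1=4 "zebra"
```

A real translator must also handle boolean operators, relations, wildcards and unknown indexes, which is exactly why the production mapping is configuration, not code.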
- +
Record Models and Filter Modules - all filter modules which do indexing and record display filtering: -This virtual package contains all base IDZebra filter modules. EMPTY ??? - libidzebra1.4-modules + The hard work of knowing what to index, + how to do it, and which + part of the records to send in a search/retrieve response is + implemented in + various filter modules. It is their responsibility to define the + exact indexing and record display filtering rules. + + + The virtual Debian package + libidzebra-2.0-modules installs all base filter + modules. - + +
TEXT Record Model and Filter Module - Plain ASCII text filter - + Plain ASCII text filter. TODO: add information here. - +
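Since this section is still marked TODO, here is only a generic sketch of what a plain-text filter conceptually does: treat the whole record as one stream of word tokens for a catch-all index. This is an assumption for illustration, not the module's documented behavior.

```python
# Conceptual sketch of a plain-text input filter: read the record
# as-is and emit lowercased word tokens. Illustrative only; the
# actual "text" module's behavior is configured in Zebra, not
# hard-coded like this.
import re

def text_filter(record):
    """Tokenize a plain-text record into index terms."""
    return re.findall(r"[a-z0-9]+", record.lower())

print(text_filter("Zebra indexes plain ASCII text."))
# -> ['zebra', 'indexes', 'plain', 'ascii', 'text']
```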
- +
GRS Record Model and Filter Modules
 
-  
-  
-  
-   grs.danbib GRS filters of various kind (*.abs files)
-IDZebra filter grs.danbib (DBC DanBib records)
-    This package includes grs.danbib filter which parses DanBib records.
-    DanBib is the Danish Union Catalogue hosted by DBC
-    (Danish Bibliographic Centre).
-    libidzebra1.4-mod-grs-danbib
-   
-   
-   
-    grs.marc
-    
-    grs.marcxml
-    This package includes the grs.marc and grs.marcxml filters that allows
-    IDZebra to read MARC records based on ISO2709.
-    
-    libidzebra1.4-mod-grs-marc
-   
-   
-    grs.regx
-    
-    grs.tcl GRS TCL scriptable filter
-    This package includes the grs.regx and grs.tcl filters.
-    libidzebra1.4-mod-grs-regx
-   
-   
-   
-    grs.sgml
-    libidzebra1.4-mod-grs-sgml not packaged yet ??
-   
-   
-    grs.xml
-    This package includes the grs.xml filter which uses Expat to
-    parse records in XML and turn them into IDZebra's internal grs node.
-    libidzebra1.4-mod-grs-xml
+    The GRS filter modules described in
+    
+    are all based on the Z39.50 specifications, and it is absolutely
+    mandatory to have the reference pages on BIB-1 attribute sets at
+    hand when configuring GRS filters. The GRS filters come in
+    different flavors, and a short introduction is needed here.
+    GRS filters of various kinds have also been called ABS filters due
+    to the *.abs configuration file suffix.
+   
+   
+    The grs.marc and
+    grs.marcxml filters are suited for parsing and
+    indexing binary and XML versions of traditional library MARC records
+    based on the ISO2709 standard. The Debian package for both
+    filters is
+    libidzebra-2.0-mod-grs-marc.
+   
+   
+    GRS TCL scriptable filters for extensive user configuration come
+    in two flavors: a regular expression filter
+    grs.regx using TCL regular expressions, and
+    a general scriptable TCL filter called
+    grs.tcl. Both are included in the
+    libidzebra-2.0-mod-grs-regx Debian package.
+   
+   
+    A general purpose SGML filter is called
+    grs.sgml.
This filter is not yet packaged,
+    but is planned to be included in the
+    libidzebra-2.0-mod-grs-sgml Debian package.
+   
+   
+    The Debian package
+    libidzebra-2.0-mod-grs-xml includes the
+    grs.xml filter, which uses Expat to
+    parse records in XML and turn them into IDZebra's internal GRS node
+    trees. Also have a look at the Alvis XML/XSLT filter described in
+    the next section.
+   
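The Expat-based approach can be sketched in miniature. The dictionary-based node layout below is hypothetical and far simpler than Zebra's real GRS node trees; it only shows how stream-parsing callbacks build a tagged tree out of an XML record.

```python
# Sketch of the Expat technique used by an XML filter: stream-parse a
# record and let the start/end/character callbacks build a simple
# tree of tagged nodes. The node shape here is hypothetical, not
# Zebra's actual GRS structure.
import xml.parsers.expat

def parse_to_tree(xml_text):
    root = {"tag": None, "data": "", "children": []}
    stack = [root]

    def start(tag, attrs):
        node = {"tag": tag, "data": "", "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)

    def end(tag):
        stack.pop()

    def chars(data):
        stack[-1]["data"] += data

    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = start
    p.EndElementHandler = end
    p.CharacterDataHandler = chars
    p.Parse(xml_text, True)
    return root["children"][0]

tree = parse_to_tree("<record><title>Zebra</title></record>")
print(tree["tag"], tree["children"][0]["data"])   # -> record Zebra
```

Because Expat is a streaming parser, the memory cost is driven by the tree being built, not by the parser itself.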
- +
ALVIS Record Model and Filter Module
 
-   alvis Experimental Alvis XSLT filter
-   mod-alvis.so
-   libidzebra1.4-mod-alvis
+    The Alvis filter for XML files is an XSLT-based input
+    filter.
+    It indexes element and attribute content of any conceivable XML format
+    using full XPATH support, a feature which the standard Zebra
+    GRS SGML and XML filters lacked. The indexed documents are
+    parsed into a standard XML DOM tree, which restricts record size
+    according to availability of memory.
+   
+   
+    The Alvis filter
+    uses XSLT display stylesheets, which let
+    the Zebra DB administrator associate multiple, different views on
+    the same XML document type. These views are chosen on the fly at
+    search time.
   
-  
-  
-   SAFARI Record Model and Filter Module
-   safari
-  
+    In addition, the Alvis filter configuration is not bound to the
+    arcane BIB-1 Z39.50 library catalogue indexing traditions and
+    folklore, and is therefore easier to understand.
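The element-and-attribute indexing described above can be imitated in miniature with Python's ElementTree, which supports only a small XPath subset; this is a conceptual stand-in, since the real Alvis filter is driven by XSLT stylesheets, not by code like this.

```python
# Miniature imitation of XPath-driven indexing: parse an XML record
# into a DOM tree, then pull element content and attribute content
# out as index terms. Illustrative only; Alvis itself uses XSLT.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<article lang="en"><title>Zebra</title>'
    '<author>Index Data</author></article>')

terms = [t.text for t in doc.findall("./title")]   # element content
terms += [doc.get("lang")]                         # attribute content
print(terms)   # -> ['Zebra', 'en']
```

Full XPath support, as Alvis provides, additionally allows predicates and deep descendant axes, which ElementTree's subset does not cover.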
-We can now start a yaz-client admin session and create a database: +
-$ yaz-client localhost:9999 -u admin/secret -Authentication set to Open (admin/secret) -Connecting...OK. -Sent initrequest. -Connection accepted by v3 target. -ID : 81 -Name : Zebra Information Server/GFS/YAZ -Version: Zebra 1.4.0/1.63/2.1.9 -Options: search present delSet triggerResourceCtrl scan sort -extendedServices namedResultSets -Elapsed: 0.007046 -Z> adm-create -Admin request -Got extended services response -Status: done -Elapsed: 0.045009 -: -Now Default was created.. We can now insert an XML file (esdd0006.grs -from example/gils/records) and index it: -Z> update insert 1 esdd0006.grs -Got extended services response -Status: done -Elapsed: 0.438016 +
+ Indexing and Retrieval Workflow -The 3rd parameter.. 1 here .. is the opaque record id from Ext update. -It a record ID that _we_ assign to the record in question. If we do not -assign one the usual rules for match apply (recordId: from zebra.cfg). + + Records pass through three different states during processing in the + system. + -Actually, we should have a way to specify "no opaque record id" for -yaz-client's update command.. We'll fix that. + -Elapsed: 0.438016 -Z> f utah -Sent searchRequest. -Received SearchResponse. -Search was a success. -Number of hits: 1, setno 1 -SearchResult-1: term=utah cnt=1 -records returned: 0 -Elapsed: 0.014179 + + + + + When records are accessed by the system, they are represented + in their local, or native format. This might be SGML or HTML files, + News or Mail archives, MARC records. If the system doesn't already + know how to read the type of data you need to store, you can set up an + input filter by preparing conversion rules based on regular + expressions and possibly augmented by a flexible scripting language + (Tcl). + The input filter produces as output an internal representation, + a tree structure. -Let's delete the beast: -Z> update delete 1 -No last record (update ignored) -Z> update delete 1 esdd0006.grs -Got extended services response -Status: done -Elapsed: 0.072441 -Z> f utah -Sent searchRequest. -Received SearchResponse. -Search was a success. -Number of hits: 0, setno 2 -SearchResult-1: term=utah cnt=0 -records returned: 0 -Elapsed: 0.013610 + + + -If shadow register is enabled you must run the adm-commit command in -order write your changes.. + + When records are processed by the system, they are represented + in a tree-structure, constructed by tagged data elements hanging off a + root node. The tagged elements may contain data or yet more tagged + elements in a recursive structure. 
The system performs various
+      actions on this tree structure (indexing, element selection, schema
+      mapping, etc.).
+     
+    
+   
+   
+    
+     
+      Before transmitting records to the client, they are first
+      converted from the internal structure to a form suitable for exchange
+      over the network - according to the Z39.50 standard.
+     
+    
+   
+  
 
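The three record states above can be condensed into one toy pipeline: a native-format record, a regex-based input filter that produces a tagged tree, and an export step that serializes the tree for the network. All names and formats here are illustrative assumptions, not Zebra's actual internals.

```python
# The three record states as a toy pipeline:
#   state 1: record in its local, native format
#   state 2: tagged tree built by a regex-based input filter
#   state 3: exchange form produced for transmission to the client
import re

native = "TI: Zebra architecture\nAU: Index Data"    # state 1

def input_filter(raw):                               # state 2
    tree = {"tag": "record", "children": []}
    for field, value in re.findall(r"(\w+): (.+)", raw):
        tree["children"].append({"tag": field.lower(), "data": value})
    return tree

def export(tree):                                    # state 3
    body = "".join("<%s>%s</%s>" % (c["tag"], c["data"], c["tag"])
                   for c in tree["children"])
    return "<%s>%s</%s>" % (tree["tag"], body, tree["tag"])

print(export(input_filter(native)))
# -> <record><ti>Zebra architecture</ti><au>Index Data</au></record>
```

Indexing, element selection and schema mapping all operate on the middle, tree-shaped state, which is why the input filters normalize every native format into it first.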