X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Farchitecture.xml;h=b6fe7cf140f8f6bde45b9c40678336cbc4c76117;hb=4aae319a0b820d1e8d3ab5d82c48f5047c9995f9;hp=60281e3f7acca948c3d5e1c7288ffbbeb3725845;hpb=0381d7dc936e74ac2fb55ad217b760c97ace0d5b;p=idzebra-moved-to-github.git diff --git a/doc/architecture.xml b/doc/architecture.xml index 60281e3..b6fe7cf 100644 --- a/doc/architecture.xml +++ b/doc/architecture.xml @@ -1,11 +1,10 @@ - + Overview of Zebra Architecture - - +
Local Representation - + As mentioned earlier, Zebra places few restrictions on the type of data that you can index and manage. Generally, whatever the form of @@ -30,9 +29,9 @@ "grs" keyword, separated by "." characters. --> - +
- +
Main Components The Zebra system is designed to support a wide range of data management @@ -52,13 +51,13 @@ same main components, which are presented here. - The virtual Debian package idzebra1.4 + The virtual Debian package idzebra-2.0 installs all the necessary packages to start working with Zebra - including utility programs, development libraries, documentation and modules. - +
Core Zebra Libraries Containing Common Functionality The core Zebra module is the meat of the zebraidx @@ -122,17 +121,17 @@ - The Debian package libidzebra1.4 + The Debian package libidzebra-2.0 contains all run-time libraries for Zebra, the documentation in PDF and HTML is found in - idzebra1.4-doc, and - idzebra1.4-common + idzebra-2.0-doc, and + idzebra-2.0-common includes common essential Zebra configuration files. - +
- +
Zebra Indexer The zebraidx @@ -142,12 +141,12 @@ indexes according to the rules defined in the filter modules. - The Debian package idzebra1.4-utils contains + The Debian package idzebra-2.0-utils contains the zebraidx utility. - +
- +
Zebra Searcher/Retriever This is the executable which runs the Z39.50/SRU/SRW server and @@ -155,12 +154,12 @@ great Information Retrieval server application. - The Debian package idzebra1.4-utils contains + The Debian package idzebra-2.0-utils contains the zebrasrv utility. - +
- +
YAZ Server Frontend The YAZ server frontend is @@ -171,28 +170,28 @@ In addition to Z39.50 requests, the YAZ server frontend acts as HTTP server, honoring - SRW - SOAP requests, and - SRU - REST requests. Moreover, it can + SRU SOAP + requests, and + SRU REST + requests. Moreover, it can translate incoming - CQL + CQL queries to - PQF + PQF queries, if correctly configured. - YAZ + YAZ is an Open Source toolkit that allows you to develop software using the ANSI Z39.50/ISO23950 standard for information retrieval. It is packaged in the Debian packages yaz and libyaz. - +
- +
Record Models and Filter Modules The hard work of knowing what to index, @@ -204,25 +203,23 @@ The virtual Debian package - libidzebra1.4-modules installs all base filter + libidzebra-2.0-modules installs all base filter modules. - + +
TEXT Record Model and Filter Module Plain ASCII text filter. TODO: add information here. - - +
- +
GRS Record Model and Filter Modules The GRS filter modules described in - + are all based on the Z39.50 specifications, and it is absolutely mandatory to have the reference pages on BIB-1 attribute sets at hand when configuring GRS filters. The GRS filters come in @@ -231,20 +228,12 @@ to the *.abs configuration file suffix. - The grs.danbib filter is developed for - DBC DanBib records. - DanBib is the Danish Union Catalogue hosted by DBC - (Danish Bibliographic Center). This filter is found in the - Debian package - libidzebra1.4-mod-grs-danbib. - - The grs.marc and grs.marcxml filters are suited to parse and index binary and XML versions of traditional library MARC records based on the ISO2709 standard. The Debian package for both filters is - libidzebra1.4-mod-grs-marc. + libidzebra-2.0-mod-grs-marc. GRS TCL scriptable filters for extensive user configuration come @@ -253,26 +242,26 @@ a general scriptable TCL filter called grs.tcl are both included in the - libidzebra1.4-mod-grs-regx Debian package. + libidzebra-2.0-mod-grs-regx Debian package. A general purpose SGML filter is called grs.sgml. This filter is not yet packaged, but planned to be in the - libidzebra1.4-mod-grs-sgml Debian package. + libidzebra-2.0-mod-grs-sgml Debian package. The Debian package - libidzebra1.4-mod-grs-xml includes the + libidzebra-2.0-mod-grs-xml includes the grs.xml filter which uses Expat to + url="&url.expat;">Expat to parse records in XML and turn them into IDZebra's internal GRS node trees. See also the Alvis XML/XSLT filter described in the next section. 
- +
ALVIS Record Model and Filter Module The Alvis filter for XML files is an XSLT based input @@ -309,27 +298,26 @@ . - The Debian package libidzebra1.4-mod-alvis + The Debian package libidzebra-2.0-mod-alvis contains the Alvis filter module. - +
- + - +
+ --> -
+
-
+ - +
Indexing and Retrieval Workflow @@ -379,111 +367,160 @@ - +
- - - + Starting with Zebra version 2.0.5, it is + possible to use a special element set which has the prefix + zebra::. + + + Using this element set will, regardless of record type, return + Zebra's internal index structure/data for a record. + In particular, the regular record filters are not invoked when + these are in use. + This can in some cases make the retrieval faster than regular + retrieval operations (for MARC, XML etc). + + + Special Retrieval Elements + + + + Element Set + Description + Syntax + + + + + zebra::meta::sysno + Get Zebra record system ID + XML and SUTRS + + + zebra::data + Get raw record + all + + + zebra::meta + Get Zebra record internal metadata + XML and SUTRS + + + zebra::index + Get all indexed keys for record + XML and SUTRS + + + + zebra::index::f + + + Get indexed keys for field f for record + + XML and SUTRS + + + + zebra::index::f:t + + + Get indexed keys for field f + and type t for record + + XML and SUTRS + + + +
+ + For example, to fetch the raw binary record data stored in the + Zebra internal storage, or on the filesystem, the following + commands can be issued: + + Z> f @attr 1=title my + Z> format xml + Z> elements zebra::data + Z> s 1+1 + Z> format sutrs + Z> s 1+1 + Z> format usmarc + Z> s 1+1 + + + + The special + zebra::data element set name is + defined for any record syntax, but will always fetch + the raw record data in exactly the original form. No record syntax + specific transformations will be applied to the raw record data. + + + Also, Zebra internal metadata about the record can be accessed: + + Z> f @attr 1=title my + Z> format xml + Z> elements zebra::meta::sysno + Z> s 1+1 + + displays in XML record syntax only the internal + record system number, whereas + + Z> f @attr 1=title my + Z> format xml + Z> elements zebra::meta + Z> s 1+1 + + displays all available metadata on the record. These include system + number, database name, indexed filename, filter used for indexing, + score and static ranking information and finally the byte size of the record. + + + Sometimes, it is very hard to figure out exactly what has been + indexed, how, and in which indexes. Using the indexing stylesheet of + the Alvis filter, one can at least see which portion of the record + went into which index, but a similar aid does not exist for all + other indexing filters. + + + The special + zebra::index element set names are provided to + access information on per-record indexed fields. For example, the + queries + + Z> f @attr 1=title my + Z> format sutrs + Z> elements zebra::index + Z> s 1+1 + + will display all indexed tokens from all indexed fields of the + first record, and it will display in SUTRS + record syntax, whereas + + Z> f @attr 1=title my + Z> format xml + Z> elements zebra::index::title + Z> s 1+1 + Z> elements zebra::index::title:p + Z> s 1+1 + + displays in XML record syntax only the content + of the Zebra string index title, or + even only the type p phrase indexed part of it. 
+ + + + Trying to access numeric Bib-1 use + attributes or trying to access non-existent Zebra internal string + access points will result in a Diagnostic 25: Specified element set + name not valid for specified database. + + +
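
The naming patterns in the Special Retrieval Elements table can be sketched as a small helper. This is an illustration only, not part of Zebra or YAZ: the function names `zebra_element_set` and `yaz_client_commands` are hypothetical, and the sketch only assembles the strings and the command sequence shown in the examples above. It does not contact a server.

```python
# Hypothetical helpers (not part of Zebra or YAZ): they only build the
# element set names and client command lines shown in the examples above.

# The element set families listed in the table.
KNOWN_KINDS = ("data", "meta", "meta::sysno", "index")


def zebra_element_set(kind, field=None, index_type=None):
    """Return a zebra:: element set name, e.g. 'zebra::index::title:p'."""
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown zebra:: element set kind: {kind!r}")
    name = f"zebra::{kind}"
    if field is None:
        if index_type is not None:
            raise ValueError("an index type requires a field")
        return name
    if kind != "index":
        raise ValueError("only zebra::index accepts a field")
    name += f"::{field}"          # zebra::index::f
    if index_type is not None:
        name += f":{index_type}"  # zebra::index::f:t
    return name


def yaz_client_commands(query, record_format, element_set):
    """Return the command lines used in the examples: find, format, elements, show."""
    return [
        f"f {query}",
        f"format {record_format}",
        f"elements {element_set}",
        "s 1+1",
    ]


if __name__ == "__main__":
    element_set = zebra_element_set("index", "title", "p")
    for line in yaz_client_commands("@attr 1=title my", "xml", element_set):
        print(line)
```

Run as a script, this prints the same four commands as the last example above, with `zebra::index::title:p` as the element set. A string access point name such as `title` is required for the field; numeric Bib-1 use attributes lead to Diagnostic 25 as noted.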