X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Farchitecture.xml;h=b6fe7cf140f8f6bde45b9c40678336cbc4c76117;hb=3fe5d30485d3fc95b24ee5e7dc75971447ecb5aa;hp=a4872f611c5b9fccf0ccb5762cd6b81c1e0660d5;hpb=25801551c8321842e7b0c2a65925692ccf63a9e4;p=idzebra-moved-to-github.git
diff --git a/doc/architecture.xml b/doc/architecture.xml
index a4872f6..b6fe7cf 100644
--- a/doc/architecture.xml
+++ b/doc/architecture.xml
@@ -1,11 +1,10 @@ - + Overview of Zebra Architecture - - +
Local Representation - + As mentioned earlier, Zebra places few restrictions on the type of data that you can index and manage. Generally, whatever the form of @@ -30,9 +29,9 @@ "grs" keyword, separated by "." characters. --> - +
- +
Main Components The Zebra system is designed to support a wide range of data management @@ -58,7 +57,7 @@ documentation and modules. - +
Core Zebra Libraries Containing Common Functionality The core Zebra module is the meat of the zebraidx @@ -129,10 +128,10 @@ idzebra-2.0-common includes common essential Zebra configuration files. - +
- +
Zebra Indexer The zebraidx @@ -145,9 +144,9 @@ The Debian package idzebra-2.0-utils contains the zebraidx utility. - +
- +
Zebra Searcher/Retriever This is the executable which runs the Z39.50/SRU/SRW server and @@ -158,9 +157,9 @@ The Debian package idzebra-2.0-utils contains the zebrasrv utility. - +
- +
YAZ Server Frontend The YAZ server frontend is @@ -171,28 +170,28 @@ In addition to Z39.50 requests, the YAZ server frontend acts as an HTTP server, honoring - SRW - SOAP requests, and - SRU - REST requests. Moreover, it can + SRU SOAP + requests, and + SRU REST + requests. Moreover, it can translate incoming CQL queries to - PQF + PQF queries, if correctly configured. - YAZ + YAZ is an Open Source toolkit that allows you to develop software using the ANSI Z39.50/ISO 23950 standard for information retrieval. It is packaged in the Debian packages yaz and libyaz. - +
- +
Record Models and Filter Modules The hard work of knowing what to index, @@ -209,18 +208,18 @@ - +
TEXT Record Model and Filter Module Plain ASCII text filter. TODO: add information here. - +
- +
GRS Record Model and Filter Modules The GRS filter modules described in - + are all based on the Z39.50 specifications, and it is absolutely mandatory to have the reference pages on BIB-1 attribute sets at hand when configuring GRS filters. The GRS filters come in @@ -255,14 +254,14 @@ The Debian package libidzebra-2.0-mod-grs-xml includes the grs.xml filter which uses Expat to - + url="&url.expat;">Expat to parse records in XML and turn them into IDZebra's internal GRS node trees. Also have a look at the Alvis XML/XSLT filter described in the next section. - +
- +
ALVIS Record Model and Filter Module The Alvis filter for XML files is an XSLT based input @@ -302,23 +301,23 @@ The Debian package libidzebra-2.0-mod-alvis contains the Alvis filter module. - +
- +
- +
- +
Indexing and Retrieval Workflow @@ -368,9 +367,160 @@ - - +
+
+ Retrieval of Zebra internal record data
+
+ Starting with Zebra version 2.0.5, it is possible to use special
+ element sets whose names carry the prefix zebra::.
+
+ Requesting one of these element sets returns Zebra's internal index
+ structure or data for a record, regardless of record type. In
+ particular, the regular record filters are not invoked when these
+ element sets are in use, which in some cases can make retrieval
+ faster than regular retrieval operations (for MARC, XML, etc.).
+
+ Special Retrieval Elements
+
+   Element Set          Description                                            Syntax
+   zebra::meta::sysno   Get Zebra record system ID                             XML and SUTRS
+   zebra::data          Get raw record                                         all
+   zebra::meta          Get Zebra record internal metadata                     XML and SUTRS
+   zebra::index         Get all indexed keys for the record                    XML and SUTRS
+   zebra::index::f      Get indexed keys for field f of the record             XML and SUTRS
+   zebra::index::f:t    Get indexed keys for field f and type t of the record  XML and SUTRS
+
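The naming scheme in the Special Retrieval Elements table can be captured in a small helper. The Python sketch below builds zebra:: element set names and reports which record syntaxes the table lists for each name. The functions zebra_element_set and supported_syntaxes are hypothetical and are not part of Zebra or YAZ. The sketch only encodes what the table states and never contacts a server.

```python
from typing import Optional, Set

# Hypothetical helpers, not part of Zebra or YAZ: they only encode the
# element set names and record syntaxes listed in the table.
KINDS = ("data", "meta", "meta::sysno", "index")


def zebra_element_set(kind: str, field: Optional[str] = None,
                      index_type: Optional[str] = None) -> str:
    """Build an element set name such as 'zebra::index::title:p'."""
    if kind not in KINDS:
        raise ValueError(f"unknown zebra:: element set: {kind!r}")
    if kind != "index" and (field or index_type):
        raise ValueError("only zebra::index takes a field f and a type t")
    if index_type and not field:
        raise ValueError("a type t requires a field f")
    name = "zebra::" + kind
    if field:
        name += "::" + field          # zebra::index::f
        if index_type:
            name += ":" + index_type  # zebra::index::f:t
    return name


def supported_syntaxes(element_set: str) -> Optional[Set[str]]:
    """Record syntaxes listed in the table; None stands for 'all'."""
    if not element_set.startswith("zebra::"):
        raise ValueError(f"not a zebra:: element set: {element_set!r}")
    if element_set == "zebra::data":
        return None  # raw record: listed as available in all syntaxes
    return {"xml", "sutrs"}


print(zebra_element_set("index", "title", "p"))          # zebra::index::title:p
print(sorted(supported_syntaxes("zebra::meta::sysno")))  # ['sutrs', 'xml']
```

For zebra::data the helper returns None instead of an explicit list. This mirrors the table's "all" entry without guessing which syntaxes a particular server supports.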
+ For example, to fetch the raw binary record data stored in the
+ Zebra internal storage, or on the filesystem, the following
+ commands can be issued:
+
+   Z> f @attr 1=title my
+   Z> format xml
+   Z> elements zebra::data
+   Z> s 1+1
+   Z> format sutrs
+   Z> s 1+1
+   Z> format usmarc
+   Z> s 1+1
+
+ The special zebra::data element set name is defined for any record
+ syntax, but it always fetches the raw record data in exactly its
+ original form. No record-syntax-specific transformations are applied
+ to the raw record data.
+
+ Zebra internal metadata about the record can also be accessed:
+
+   Z> f @attr 1=title my
+   Z> format xml
+   Z> elements zebra::meta::sysno
+   Z> s 1+1
+
+ displays, in XML record syntax, only the internal record system
+ number, whereas
+
+   Z> f @attr 1=title my
+   Z> format xml
+   Z> elements zebra::meta
+   Z> s 1+1
+
+ displays all available metadata on the record. These include the
+ system number, database name, indexed filename, filter used for
+ indexing, score and static ranking information, and finally the byte
+ size of the record.
+
+ Sometimes it is very hard to figure out exactly what has been
+ indexed, how, and in which indexes. Using the indexing stylesheet of
+ the Alvis filter, one can at least see which portion of the record
+ went into which index, but no similar aid exists for the other
+ indexing filters.
+
+ The special zebra::index element set names are provided to access
+ information on the indexed fields of each record. For example, the
+ queries
+
+   Z> f @attr 1=title my
+   Z> format sutrs
+   Z> elements zebra::index
+   Z> s 1+1
+
+ will display, in SUTRS record syntax, all indexed tokens from all
+ indexed fields of the first record, whereas
+
+   Z> f @attr 1=title my
+   Z> format xml
+   Z> elements zebra::index::title
+   Z> s 1+1
+   Z> elements zebra::index::title:p
+   Z> s 1+1
+
+ display, in XML record syntax, only the content of the Zebra string
+ index title, or even only its phrase-indexed part (type p).
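The yaz-client sessions above follow a fixed pattern. There is one find command, then the element set is chosen once, and each record syntax gets a format/show pair. The following Python sketch generates such a command list, which could then be typed or pasted into yaz-client. The helper yaz_client_commands is hypothetical. It emits only the commands (f, format, elements, s) that the sessions above already use.

```python
from typing import List, Sequence


def yaz_client_commands(query: str, element_set: str,
                        syntaxes: Sequence[str], position: int = 1) -> List[str]:
    """Hypothetical helper: yaz-client commands that fetch one record
    in several record syntaxes, following the sessions above."""
    if not syntaxes:
        raise ValueError("at least one record syntax is required")
    commands = [f"f {query}"]                           # run the search
    for i, syntax in enumerate(syntaxes):
        commands.append(f"format {syntax}")             # pick record syntax
        if i == 0:
            commands.append(f"elements {element_set}")  # set once, as above
        commands.append(f"s {position}+1")              # show one record
    return commands


for line in yaz_client_commands("@attr 1=title my", "zebra::data",
                                ["xml", "sutrs", "usmarc"]):
    print("Z>", line)
```

The loop reproduces the first zebra::data session line for line. Setting the element set only once is copied from that session and is not a claim about yaz-client internals.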
+ Trying to access numeric Bib-1 use attributes, or non-existent Zebra
+ internal string access points, will result in Diagnostic 25:
+ Specified element set name not valid for specified database.
+
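A client could mirror this rule before sending a request. The sketch assumes the rule applies to the zebra::index::f and zebra::index::f:t forms. It also assumes the client keeps its own list of string index names (string_indexes, a hypothetical input, for example taken from the indexing configuration). The function diag25_reason is hypothetical, and the Zebra server remains the authority on which element sets are valid.

```python
from typing import Iterable, Optional


def diag25_reason(element_set: str, string_indexes: Iterable[str]) -> Optional[str]:
    """Hypothetical pre-check: explain why a zebra::index::f[:t] request
    would likely be answered with diagnostic 25, or return None."""
    prefix = "zebra::index::"
    if not element_set.startswith(prefix):
        return None  # only per-field zebra::index requests are checked here
    field = element_set[len(prefix):].split(":", 1)[0]  # drop optional ':t'
    if field.isdigit():
        return f"numeric Bib-1 use attribute {field} is not accepted"
    if field not in set(string_indexes):
        return f"no string access point named {field!r}"
    return None


print(diag25_reason("zebra::index::4", ["title"]))
# numeric Bib-1 use attribute 4 is not accepted
print(diag25_reason("zebra::index::title:p", ["title"]))  # None
```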