X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Farchitecture.xml;fp=doc%2Farchitecture.xml;h=ce39c222bd1b8a23aa2e11900bfee756c48b49d8;hp=5b3a27ebd2b5cc89cc08f7225dca9864c90487be;hb=972bceaa6386f904bc3e4845f1c5598656c5c6f2;hpb=bb39ca3dd76e6339f66813bca1e64b644760e5a2 diff --git a/doc/architecture.xml b/doc/architecture.xml index 5b3a27e..ce39c22 100644 --- a/doc/architecture.xml +++ b/doc/architecture.xml @@ -3,7 +3,7 @@
Local Representation - + As mentioned earlier, &zebra; places few restrictions on the type of data that you can index and manage. Generally, whatever the form of @@ -48,28 +48,28 @@ indexing maintenance utility, and the zebrasrv information query and retrieval server. Both use some of the same main components, which are presented here. - - + + The virtual Debian package idzebra-2.0 installs all the necessary packages to start working with &zebra; - including utility programs, development libraries, - documentation and modules. - - + documentation and modules. + +
Core &zebra; Libraries Containing Common Functionality The core &zebra; module is the meat of the zebraidx indexing maintenance utility, and the zebrasrv information query and retrieval server binaries. In short, the core - libraries are responsible for + libraries are responsible for Dynamic Loading of external filter modules, in case the application is not compiled statically. These filter modules define indexing, - search and retrieval capabilities of the various input formats. + search and retrieval capabilities of the various input formats. @@ -92,7 +92,7 @@ construction of hit lists according to boolean combinations of simpler searches. Fast performance is achieved by careful use of index structures, and by evaluating specific index hit - lists in the correct order. + lists in the correct order. @@ -103,7 +103,7 @@ components call resorting/re-ranking algorithms on the hit sets. These might also be pre-sorted not only using the assigned document IDs, but also using assigned static rank - information. + information. @@ -119,27 +119,27 @@ - - The Debian package libidzebra-2.0 - contains all run-time libraries for &zebra;, while the - documentation in PDF and HTML is found in + The Debian package libidzebra-2.0 + contains all run-time libraries for &zebra;, while the + documentation in PDF and HTML is found in idzebra-2.0-doc, and idzebra-2.0-common includes common essential &zebra; configuration files.
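The boolean construction of hit lists described above can be illustrated with a short Python sketch. It assumes that every hit list is a plain ascending list of integer document IDs; the names `intersect_sorted`, `and_search` and `or_search` are hypothetical and are not part of &zebra;, whose core is written in C on top of its own index structures. Evaluating an AND starting from the shortest list is one concrete reading of evaluating hit lists "in the correct order": the intermediate result can never grow larger than the smallest list.

```python
from typing import List


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Merge-intersect two ascending lists of document IDs in O(len(a) + len(b))."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_search(hit_lists: List[List[int]]) -> List[int]:
    """AND several hit lists, starting with the shortest one."""
    if not hit_lists:
        return []
    # Shortest first keeps every intermediate result as small as possible.
    ordered = sorted(hit_lists, key=len)
    result = ordered[0]
    for hits in ordered[1:]:
        if not result:
            break  # an empty intermediate result cannot grow again
        result = intersect_sorted(result, hits)
    return list(result)


def or_search(hit_lists: List[List[int]]) -> List[int]:
    """OR several hit lists into one ascending, duplicate-free list."""
    return sorted(set().union(*hit_lists))


# Hit lists for three hypothetical terms, as ascending document IDs.
print(and_search([[1, 3, 5, 7, 9], [3, 4, 5], [3, 5, 9]]))  # [3, 5]
print(or_search([[1, 3], [2, 3]]))                           # [1, 2, 3]
```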
- +
&zebra; Indexer The zebraidx - indexing maintenance utility + indexing maintenance utility loads external filter modules used for indexing data records of different types, and creates, updates and drops databases and indexes according to the rules defined in the filter modules. - - + + The Debian package idzebra-2.0-utils contains the zebraidx utility. @@ -150,9 +150,9 @@ This is the executable which runs the &acro.z3950;/&acro.sru;/&acro.srw; server and glues together the core libraries and the filter modules into one - great Information Retrieval server application. - - + great Information Retrieval server application. + + The Debian package idzebra-2.0-utils contains the zebrasrv utility. @@ -161,67 +161,67 @@
&yaz; Server Frontend - The &yaz; server frontend is + The &yaz; server frontend is a full-fledged stateful &acro.z3950; server taking client - connections and forwarding search and scan requests to the + connections and forwarding search and scan requests to the &zebra; core indexer. In addition to &acro.z3950; requests, the &yaz; server frontend acts as an HTTP server, honoring - &acro.sru; &acro.soap; - requests, and + &acro.sru; &acro.soap; + requests, and &acro.sru; &acro.rest; requests. Moreover, it can - translate incoming + translate incoming &acro.cql; queries to &acro.pqf; queries, if - correctly configured. + correctly configured. &yaz; - is an Open Source + is an Open Source toolkit that allows you to develop software using the &acro.ansi; &acro.z3950;/ISO23950 standard for information retrieval. - It is packaged in the Debian packages + It is packaged in the Debian packages yaz and libyaz.
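To make the &acro.cql; to &acro.pqf; translation concrete, here is a deliberately small Python sketch. It handles only flat queries of `index=term` clauses joined by `and`, `or` or `not`, evaluated left to right; real &acro.cql; also has relations, modifiers, quoted terms and parentheses, none of which are covered. The `CQL_TO_BIB1` table is a hypothetical stand-in for the mapping the &yaz; frontend is configured with, using the &acro.bib1; Use attributes 4 (title), 1003 (author) and 1016 (any).

```python
# Hypothetical mapping from CQL index names to Bib-1 Use attributes.
# In a real setup the mapping comes from the server's configuration;
# this dict only stands in for it.
CQL_TO_BIB1 = {
    "title": "1=4",
    "author": "1=1003",
    "any": "1=1016",
}

BOOLEAN_OPS = {"and": "@and", "or": "@or", "not": "@not"}


def clause_to_pqf(clause: str) -> str:
    """Translate one 'index=term' clause (or a bare term) to PQF."""
    index, sep, term = clause.partition("=")
    if not sep:  # a bare term searches the default 'any' index
        index, term = "any", clause
    attr = CQL_TO_BIB1.get(index.lower())
    if attr is None or not term:
        raise ValueError(f"cannot translate clause {clause!r}")
    return f"@attr {attr} {term}"


def cql_to_pqf(query: str) -> str:
    """Translate a flat, left-associative CQL query into prefix PQF."""
    tokens = query.split()
    if not tokens:
        raise ValueError("empty query")
    pqf = clause_to_pqf(tokens[0])
    i = 1
    while i < len(tokens):
        op = BOOLEAN_OPS.get(tokens[i].lower())
        if op is None or i + 1 >= len(tokens):
            raise ValueError(f"expected 'boolean clause' at {tokens[i]!r}")
        # PQF is prefix notation: the operator precedes both operands.
        pqf = f"{op} {pqf} {clause_to_pqf(tokens[i + 1])}"
        i += 2
    return pqf


print(cql_to_pqf("title=my and author=smith"))
# @and @attr 1=4 my @attr 1=1003 smith
```

Because the loop wraps the accumulated query on every step, `a or b not c` becomes `@not @or a b c`, which matches the left-to-right evaluation assumed here.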
- +
Record Models and Filter Modules - The hard work of knowing what to index, + The hard work of knowing what to index, how to do it, and which part of the records to send in a search/retrieve response is - implemented in + implemented in various filter modules. It is their responsibility to define the exact indexing and record display filtering rules. The virtual Debian package libidzebra-2.0-modules installs all base filter - modules. + modules.
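The division of labor between the core and the filter modules can be sketched as a registry of filters sharing one interface. This Python sketch is only an analogy: the `RecordFilter` interface, its `extract`/`retrieve` methods and the filter name `toy.lines` are hypothetical, and real &zebra; filters are C modules such as grs.xml or grs.marc. It shows the point made above: the core only looks a filter up by name, while the filter alone decides what gets indexed and which view of a record is returned for a given element set (here `F` for full, anything else for brief).

```python
from typing import Callable, Dict, List, Protocol


class RecordFilter(Protocol):
    """What a filter module must provide, in this sketch."""

    def extract(self, record: str) -> Dict[str, List[str]]:
        """Return index name -> list of tokens to index."""
        ...

    def retrieve(self, record: str, element_set: str) -> str:
        """Return the view of the record sent back to the client."""
        ...


class LineFilter:
    """Toy filter: every 'field: value' line feeds the index called 'field'."""

    def extract(self, record: str) -> Dict[str, List[str]]:
        index: Dict[str, List[str]] = {}
        for line in record.splitlines():
            field, sep, value = line.partition(":")
            if sep:
                index.setdefault(field.strip(), []).extend(value.lower().split())
        return index

    def retrieve(self, record: str, element_set: str) -> str:
        if element_set == "F":  # full record
            return record
        lines = record.splitlines()
        return lines[0] if lines else ""  # anything else: brief, first line only


# Registry of available filters, keyed by a hypothetical filter name.
FILTERS: Dict[str, Callable[[], RecordFilter]] = {"toy.lines": LineFilter}


def load_filter(name: str) -> RecordFilter:
    try:
        return FILTERS[name]()
    except KeyError:
        raise ValueError(f"unknown filter {name!r}") from None


f = load_filter("toy.lines")
rec = "title: My Book\nauthor: Smith"
print(f.extract(rec))        # {'title': ['my', 'book'], 'author': ['smith']}
print(f.retrieve(rec, "B"))  # title: My Book
```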
&acro.dom; &acro.xml; Record Model and Filter Module The &acro.dom; &acro.xml; filter uses a standard &acro.dom; &acro.xml; structure as - internal data model, and can thus parse, index, and display + internal data model, and can thus parse, index, and display any &acro.xml; document. A parser for binary &acro.marc; records based on the ISO2709 library standard is provided; it transforms these into the internal - &acro.marcxml; &acro.dom; representation. + &acro.marcxml; &acro.dom; representation. The internal &acro.dom; &acro.xml; representation can be fed into four different pipelines, consisting of arbitrarily many successive - &acro.xslt; transformations; these are for + &acro.xslt; transformations; these are for input parsing and initial transformations, @@ -245,7 +245,7 @@ static ranks. - Details on the experimental &acro.dom; &acro.xml; filter are found in + Details on the experimental &acro.dom; &acro.xml; filter are found in . @@ -259,14 +259,14 @@ The functionality of this record model has been improved and - replaced by the &acro.dom; &acro.xml; record model. See + replaced by the &acro.dom; &acro.xml; record model. See . The Alvis filter for &acro.xml; files is an &acro.xslt;-based input - filter. + filter. It indexes element and attribute content of any conceivable &acro.xml; format using full &acro.xpath; support, a feature which the standard &zebra; &acro.grs1; &acro.sgml; and &acro.xml; filters lacked. The indexed documents are @@ -274,7 +274,7 @@ according to the availability of memory. - The Alvis filter + The Alvis filter uses &acro.xslt; display stylesheets, which let the &zebra; DB administrator associate multiple different views with the same &acro.xml; document type. These views are chosen on-the-fly in @@ -289,13 +289,13 @@ Finally, the Alvis filter allows for static ranking at index time, and for sorting hit lists according to predefined static ranks.
This imposes no overhead at all: both - search and indexing still perform in + search and indexing still perform in O(1), irrespective of document collection size. This feature resembles Google's pre-ranking using their PageRank algorithm. - Details on the experimental Alvis &acro.xslt; filter are found in + Details on the experimental Alvis &acro.xslt; filter are found in . @@ -309,12 +309,12 @@ The functionality of this record model has been improved and - replaced by the &acro.dom; &acro.xml; record model. See + replaced by the &acro.dom; &acro.xml; record model. See . - The &acro.grs1; filter modules described in + The &acro.grs1; filter modules described in are all based on the &acro.z3950; specifications, and it is absolutely mandatory to have the reference pages on &acro.bib1; attribute sets on @@ -324,39 +324,39 @@ to the *.abs configuration file suffix. - The grs.marc and + The grs.marc and grs.marcxml filters are suited to parse and - index binary and &acro.xml; versions of traditional library &acro.marc; records + index binary and &acro.xml; versions of traditional library &acro.marc; records based on the ISO2709 standard. The Debian package for both - filters is + filters is libidzebra-2.0-mod-grs-marc. &acro.grs1; TCL scriptable filters for extensive user configuration come - in two flavors: a regular expression filter + in two flavors: a regular expression filter grs.regx using TCL regular expressions, and - a general scriptable TCL filter called - grs.tcl - which are both included in the + a general scriptable TCL filter called + grs.tcl + which are both included in the libidzebra-2.0-mod-grs-regx Debian package. A general-purpose &acro.sgml; filter is called grs.sgml. This filter is not yet packaged, - but is planned to be in the + but is planned to be in the libidzebra-2.0-mod-grs-sgml Debian package.
- The Debian package - libidzebra-2.0-mod-grs-xml includes the + The Debian package + libidzebra-2.0-mod-grs-xml includes the grs.xml filter which uses Expat to + Expat to parse records in &acro.xml; and turn them into ID&zebra;'s internal &acro.grs1; node trees. Also have a look at the Alvis &acro.xml;/&acro.xslt; filter described in the previous section.
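The static ranking offered by the &acro.dom; and Alvis filters can be approximated by a common technique: number documents in descending static-rank order at index time, so that any hit list kept in ascending document-ID order is automatically in rank order as well. The Python sketch below assumes this scheme; the text above does not state how &zebra; stores static ranks internally, and the names `assign_ids_by_static_rank`, `build_index` and `top_k` are hypothetical. All sorting happens at index time; at query time the best hits are simply the first entries of the list.

```python
from typing import Dict, List


def assign_ids_by_static_rank(ranks: Dict[str, int]) -> Dict[str, int]:
    """Index time: number documents by descending static rank (ties by name)."""
    ordered = sorted(ranks, key=lambda name: (-ranks[name], name))
    return {name: doc_id for doc_id, name in enumerate(ordered, start=1)}


def build_index(doc_terms: Dict[str, List[str]], ids: Dict[str, int]) -> Dict[str, List[int]]:
    """Index time: postings lists of ascending document IDs."""
    index: Dict[str, List[int]] = {}
    for name, terms in doc_terms.items():
        for term in set(terms):
            index.setdefault(term, []).append(ids[name])
    for postings in index.values():
        postings.sort()
    return index


def top_k(postings: List[int], k: int) -> List[int]:
    """Query time: ascending IDs are already in static-rank order, so no sort."""
    return postings[:k]


ids = assign_ids_by_static_rank({"a.xml": 10, "b.xml": 50, "c.xml": 30})
index = build_index({"a.xml": ["zebra"], "b.xml": ["zebra", "yaz"], "c.xml": ["yaz"]}, ids)
print(ids)                       # {'b.xml': 1, 'c.xml': 2, 'a.xml': 3}
print(top_k(index["zebra"], 1))  # [1], i.e. b.xml, the highest-ranked hit
```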
- +
TEXT Record Model and Filter Module @@ -390,7 +390,7 @@ - + When records are accessed by the system, they are represented in their local, or native format. This might be &acro.sgml; or HTML files, @@ -516,8 +516,8 @@ Get the facets of a result set. The facet result is returned - as if it were a normal record, while in reality it is a - recap of the most "important" terms in a result set for the fields + as if it were a normal record, while in reality it is a + recap of the most "important" terms in a result set for the fields given. The facet facility first appeared in Zebra 2.0.20. @@ -542,28 +542,28 @@ - The special - zebra::data element set name is - defined for any record syntax, but will always fetch + The special + zebra::data element set name is + defined for any record syntax, but will always fetch the raw record data in exactly the original form. No record syntax - specific transformations will be applied to the raw record data. + specific transformations will be applied to the raw record data. - Also, &zebra; internal metadata about the record can be accessed: + Also, &zebra; internal metadata about the record can be accessed: Z> f @attr 1=title my Z> format xml Z> elements zebra::meta::sysno Z> s 1+1 - + displays in &acro.xml; record syntax only the internal - record system number, whereas + record system number, whereas Z> f @attr 1=title my Z> format xml Z> elements zebra::meta Z> s 1+1 - + displays all available metadata on the record. These include system number, database name, indexed filename, filter used for indexing, score and static ranking information and finally the byte size of the record. @@ -573,13 +573,13 @@ indexed how and in which indexes. Using the indexing stylesheet of the Alvis filter, one can at least see which portion of the record went into which index, but a similar aid does not exist for all - other indexing filters. + other indexing filters. The special zebra::index element set names are provided to access information on per-record indexed fields.
For example, the - queries + queries Z> f @attr 1=title my Z> format sutrs @@ -588,7 +588,7 @@ will display all indexed tokens from all indexed fields of the first record, and it will display in &acro.sutrs; - record syntax, whereas + record syntax, whereas Z> f @attr 1=title my Z> format xml @@ -596,7 +596,7 @@ Z> s 1+1 Z> elements zebra::index::title:p Z> s 1+1 - + displays in &acro.xml; record syntax only the content of the zebra string index title, or even only the type p phrase indexed part of it. @@ -611,7 +611,7 @@
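The element set names used in the examples above follow a simple pattern: `zebra::data`, `zebra::meta` optionally followed by a metadata field such as `sysno`, and `zebra::index` optionally followed by an index name and, after a single colon, an index type such as `p`. As a hypothetical client-side helper (not part of &zebra; or &yaz;, which handle these names on the server), the following Python sketch splits such a name into its parts. It only covers the forms shown in this section; any other element set name, such as one for facets, is rejected.

```python
from typing import NamedTuple, Optional


class ZebraElementSet(NamedTuple):
    kind: str                         # "data", "meta" or "index"
    name: Optional[str] = None        # metadata field or index name, e.g. "sysno", "title"
    index_type: Optional[str] = None  # index type, e.g. "p" for the phrase-indexed part


def parse_element_set(esn: str) -> ZebraElementSet:
    """Split the zebra:: element set names used in this section into parts."""
    prefix = "zebra::"
    if not esn.startswith(prefix):
        raise ValueError(f"not a zebra:: element set name: {esn!r}")
    parts = esn[len(prefix):].split("::")
    kind = parts[0]
    if kind not in ("data", "meta", "index") or len(parts) > 2:
        raise ValueError(f"unsupported element set name: {esn!r}")
    if len(parts) == 1:
        return ZebraElementSet(kind)
    if kind == "data":
        raise ValueError("zebra::data takes no further qualifier")
    name, sep, index_type = parts[1].partition(":")
    # Only zebra::index names may carry a ':type' suffix, and it must be non-empty.
    if not name or (sep and (kind != "index" or not index_type)):
        raise ValueError(f"malformed qualifier in {esn!r}")
    return ZebraElementSet(kind, name, index_type or None)


for esn in ("zebra::data", "zebra::meta::sysno", "zebra::index::title:p"):
    print(parse_element_set(esn))
```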
- +