From 763bf5f4fc8d22feda4784ec7a9db01902902016 Mon Sep 17 00:00:00 2001
From: Adam Dickmeiss
Date: Thu, 24 May 2007 13:44:09 +0000
Subject: [PATCH] Using acro. entities. Replaced some it's to its (where appropriate).

---
 doc/administration.xml        |   94 +++++++-------
 doc/architecture.xml          |  122 +++++++++---------
 doc/examples.xml              |   44 +++----
 doc/installation.xml          |   26 ++--
 doc/introduction.xml          |  120 +++++++++---------
 doc/marc_indexing.xml         |   52 ++++----
 doc/querymodel.xml            |  276 ++++++++++++++++++++---------------------
 doc/quickstart.xml            |   12 +-
 doc/recordmodel-alvisxslt.xml |  132 ++++++++++----------
 doc/recordmodel-domxml.xml    |  248 ++++++++++++++++++------------------
 doc/recordmodel-grs.xml       |  158 +++++++++++------------
 doc/zebra.xml                 |    6 +-
 doc/zebraidx.xml              |    4 +-
 doc/zebrasrv-options.xml      |   12 +-
 doc/zebrasrv-virtual.xml      |   30 ++---
 doc/zebrasrv.xml              |  164 ++++++++++++------------
 16 files changed, 750 insertions(+), 750 deletions(-)

diff --git a/doc/administration.xml b/doc/administration.xml
index d47fc77..a1a7da2 100644
--- a/doc/administration.xml
+++ b/doc/administration.xml
@@ -1,5 +1,5 @@
 - + Administrating &zebra;
@@ -418,7 +418,7 @@
 of permissions currently: read (r) and write(w). By default users not listed in a permission directive are given the read privilege. To specify permissions for a user with no
- username, or &z3950; anonymous style use
+ username, or &acro.z3950; anonymous style use
 anonymous. The permstring consists of a sequence of characters. Include character w for write/update access, r for read access and
@@ -465,7 +465,7 @@
 mounted on a CD-ROM drive, you may want &zebra; to make an internal copy of them. To do this, you specify 1 (true) in the storeData setting. When
- the &z3950; server retrieves the records they will be read from the
+ the &acro.z3950; server retrieves the records they will be read from the
 internal file structures of the system.
@@ -494,7 +494,7 @@
 Consider a system in which you have a group of text files called simple. 
- That group of records should belong to a &z3950; database called + That group of records should belong to a &acro.z3950; database called textbase. The following zebra.cfg file will suffice: @@ -613,7 +613,7 @@ information. If you have a group of records that explicitly associates an ID with each record, this method is convenient. For example, the record format may contain a title or a ID-number - unique within the group. - In either case you specify the &z3950; attribute set and use-attribute + In either case you specify the &acro.z3950; attribute set and use-attribute location in which this information is stored, and the system looks at that field to determine the identity of the record. @@ -700,7 +700,7 @@ For instance, the sample GILS records that come with the &zebra; distribution contain a unique ID in the data tagged Control-Identifier. - The data is mapped to the &bib1; use attribute Identifier-standard + The data is mapped to the &acro.bib1; use attribute Identifier-standard (code 1007). To use this field as a record id, specify (bib1,Identifier-standard) as the value of the recordId in the configuration file. @@ -1049,7 +1049,7 @@ The experimental alvis filter provides a - directive to fetch static rank information out of the indexed &xml; + directive to fetch static rank information out of the indexed &acro.xml; records, thus making all hit sets ordered after ascending static rank, and for those doc's which have the same static rank, ordered @@ -1086,21 +1086,21 @@ indexing time (this is why we call it ``dynamic ranking'' in the first place ...) It is invoked by adding - the &bib1; relation attribute with - value ``relevance'' to the &pqf; query (that is, + the &acro.bib1; relation attribute with + value ``relevance'' to the &acro.pqf; query (that is, @attr 2=102, see also - The &bib1; Attribute Set Semantics, also in + The &acro.bib1; Attribute Set Semantics, also in HTML). 
To find all articles with the word Eoraptor in - the title, and present them relevance ranked, issue the &pqf; query: + the title, and present them relevance ranked, issue the &acro.pqf; query: @attr 2=102 @attr 1=4 Eoraptor - Dynamically ranking using &pqf; queries with the 'rank-1' + <title>Dynamically ranking using &acro.pqf; queries with the 'rank-1' algorithm @@ -1119,7 +1119,7 @@ Query Components - First, the boolean query is dismantled into it's principal components, + First, the boolean query is dismantled into its principal components, i.e. atomic queries where one term is looked up in one index. For example, the query @@ -1167,7 +1167,7 @@ It is possible to apply dynamic ranking on only parts of the - &pqf; query: + &acro.pqf; query: @and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer @@ -1202,7 +1202,7 @@ Ranking weights may be used to pass a value to a ranking - algorithm, using the non-standard &bib1; attribute type 9. + algorithm, using the non-standard &acro.bib1; attribute type 9. This allows one branch of a query to use one value while another branch uses a different one. For example, we can search for utah in the @@ -1214,7 +1214,7 @@ The default weight is - sqrt(1000) ~ 34 , as the &z3950; standard prescribes that the top score + sqrt(1000) ~ 34 , as the &acro.z3950; standard prescribes that the top score is 1000 and the bottom score is 0, encoded in integers. @@ -1339,7 +1339,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci @@ -1555,10 +1555,10 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci - Extended services in the &z3950; protocol + Extended services in the &acro.z3950; protocol - The &z3950; standard allows + The &acro.z3950; standard allows servers to accept special binary extended services protocol packages, which may be used to insert, update and delete records into servers. 
These carry control and update @@ -1566,7 +1566,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci - Extended services &z3950; Package Fields + Extended services &acro.z3950; Package Fields @@ -1594,13 +1594,13 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci record - &xml; string - An &xml; formatted string containing the record + &acro.xml; string + An &acro.xml; formatted string containing the record syntax 'xml' - Only &xml; record syntax is supported + Only &acro.xml; record syntax is supported recordIdOpaque @@ -1663,7 +1663,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci When retrieving existing - records indexed with &grs1; indexing filters, the &zebra; internal + records indexed with &acro.grs1; indexing filters, the &zebra; internal ID number is returned in the field /*/id:idzebra/localnumber in the namespace xmlns:id="http://www.indexdata.dk/zebra/", @@ -1712,7 +1712,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci ]]> Now the Default database was created, - we can insert an &xml; file (esdd0006.grs + we can insert an &acro.xml; file (esdd0006.grs from example/gils/records) and index it: rset) is the count of all documents in this speci Extended services from yaz-php - Extended services are also available from the &yaz; &php; client layer. An - example of an &yaz;-&php; extended service transaction is given here: + Extended services are also available from the &yaz; &acro.php; client layer. An + example of an &yaz;-&acro.php; extended service transaction is given here: A fine specimen of a record'; diff --git a/doc/architecture.xml b/doc/architecture.xml index cecd978..dca8925 100644 --- a/doc/architecture.xml +++ b/doc/architecture.xml @@ -1,5 +1,5 @@ - + Overview of &zebra; Architecture
@@ -87,9 +87,9 @@ Search Evaluation - by execution of search requests expressed in &pqf;/&rpn; + by execution of search requests expressed in &acro.pqf;/&acro.rpn; data structures, which are handed over from - the &yaz; server frontend &api;. Search evaluation includes + the &yaz; server frontend &acro.api;. Search evaluation includes construction of hit lists according to boolean combinations of simpler searches. Fast performance is achieved by careful use of index structures, and by evaluation specific index hit @@ -112,7 +112,7 @@ Record Presentation returns - possibly ranked - result sets, hit - numbers, and the like internal data to the &yaz; server backend &api; + numbers, and the like internal data to the &yaz; server backend &acro.api; for shipping to the client. Each individual filter module implements it's own specific presentation formats. @@ -149,7 +149,7 @@
&zebra; Searcher/Retriever - This is the executable which runs the &z3950;/&sru;/&srw; server and + This is the executable which runs the &acro.z3950;/&acro.sru;/&acro.srw; server and glues together the core libraries and the filter modules to one great Information Retrieval server application. @@ -163,21 +163,21 @@ &yaz; Server Frontend The &yaz; server frontend is - a full fledged stateful &z3950; server taking client + a full fledged stateful &acro.z3950; server taking client connections, and forwarding search and scan requests to the &zebra; core indexer. - In addition to &z3950; requests, the &yaz; server frontend acts + In addition to &acro.z3950; requests, the &yaz; server frontend acts as HTTP server, honoring - &sru; &soap; + &acro.sru; &acro.soap; requests, and - &sru; &rest; + &acro.sru; &acro.rest; requests. Moreover, it can translate incoming - &cql; + &acro.cql; queries to - &pqf; + &acro.pqf; queries, if correctly configured. @@ -185,7 +185,7 @@ &yaz; is an Open Source toolkit that allows you to develop software using the - &ansi; &z3950;/ISO23950 standard for information retrieval. + &acro.ansi; &acro.z3950;/ISO23950 standard for information retrieval. It is packaged in the Debian packages yaz and libyaz. @@ -208,21 +208,21 @@
- &dom; &xml; Record Model and Filter Module + &acro.dom; &acro.xml; Record Model and Filter Module - The &dom; &xml; filter uses a standard &dom; &xml; structure as + The &acro.dom; &acro.xml; filter uses a standard &acro.dom; &acro.xml; structure as internal data model, and can thus parse, index, and display - any &xml; document. + any &acro.xml; document. - A parser for binary &marc; records based on the ISO2709 library + A parser for binary &acro.marc; records based on the ISO2709 library standard is provided, it transforms these to the internal - &marcxml; &dom; representation. + &acro.marcxml; &acro.dom; representation. - The internal &dom; &xml; representation can be fed into four + The internal &acro.dom; &acro.xml; representation can be fed into four different pipelines, consisting of arbitraily many sucessive - &xslt; transformations; these are for + &acro.xslt; transformations; these are for input parsing and initial transformations, @@ -235,55 +235,55 @@ - The &dom; &xml; filter pipelines use &xslt; (and if supported on - your platform, even &exslt;), it brings thus full &xpath; + The &acro.dom; &acro.xml; filter pipelines use &acro.xslt; (and if supported on + your platform, even &acro.exslt;), it brings thus full &acro.xpath; support to the indexing, storage and display rules of not only - &xml; documents, but also binary &marc; records. + &acro.xml; documents, but also binary &acro.marc; records. - Finally, the &dom; &xml; filter allows for static ranking at index + Finally, the &acro.dom; &acro.xml; filter allows for static ranking at index time, and to to sort hit lists according to predefined static ranks. - Details on the experimental &dom; &xml; filter are found in + Details on the experimental &acro.dom; &acro.xml; filter are found in . The Debian package libidzebra-2.0-mod-dom - contains the &dom; filter module. + contains the &acro.dom; filter module.
- ALVIS &xml; Record Model and Filter Module + ALVIS &acro.xml; Record Model and Filter Module The functionality of this record model has been improved and - replaced by the &dom; &xml; record model. See + replaced by the &acro.dom; &acro.xml; record model. See . - The Alvis filter for &xml; files is an &xslt; based input + The Alvis filter for &acro.xml; files is an &acro.xslt; based input filter. - It indexes element and attribute content of any thinkable &xml; format - using full &xpath; support, a feature which the standard &zebra; - &grs1; &sgml; and &xml; filters lacked. The indexed documents are - parsed into a standard &xml; &dom; tree, which restricts record size + It indexes element and attribute content of any thinkable &acro.xml; format + using full &acro.xpath; support, a feature which the standard &zebra; + &acro.grs1; &acro.sgml; and &acro.xml; filters lacked. The indexed documents are + parsed into a standard &acro.xml; &acro.dom; tree, which restricts record size according to availability of memory. The Alvis filter - uses &xslt; display stylesheets, which let + uses &acro.xslt; display stylesheets, which let the &zebra; DB administrator associate multiple, different views on - the same &xml; document type. These views are chosen on-the-fly in + the same &acro.xml; document type. These views are chosen on-the-fly in search time. In addition, the Alvis filter configuration is not bound to the - arcane &bib1; &z3950; library catalogue indexing traditions and + arcane &acro.bib1; &acro.z3950; library catalogue indexing traditions and folklore, and is therefore easier to understand. @@ -296,7 +296,7 @@ their Pagerank algorithm. - Details on the experimental Alvis &xslt; filter are found in + Details on the experimental Alvis &acro.xslt; filter are found in . @@ -306,34 +306,34 @@
- &grs1; Record Model and Filter Modules + &acro.grs1; Record Model and Filter Modules The functionality of this record model has been improved and - replaced by the &dom; &xml; record model. See + replaced by the &acro.dom; &acro.xml; record model. See . - The &grs1; filter modules described in + The &acro.grs1; filter modules described in - are all based on the &z3950; specifications, and it is absolutely - mandatory to have the reference pages on &bib1; attribute sets on - you hand when configuring &grs1; filters. The GRS filters come in + are all based on the &acro.z3950; specifications, and it is absolutely + mandatory to have the reference pages on &acro.bib1; attribute sets on + you hand when configuring &acro.grs1; filters. The GRS filters come in different flavors, and a short introduction is needed here. - &grs1; filters of various kind have also been called ABS filters due + &acro.grs1; filters of various kind have also been called ABS filters due to the *.abs configuration file suffix. The grs.marc and grs.marcxml filters are suited to parse and - index binary and &xml; versions of traditional library &marc; records + index binary and &acro.xml; versions of traditional library &acro.marc; records based on the ISO2709 standard. The Debian package for both filters is libidzebra-2.0-mod-grs-marc. - &grs1; TCL scriptable filters for extensive user configuration come + &acro.grs1; TCL scriptable filters for extensive user configuration come in two flavors: a regular expression filter grs.regx using TCL regular expressions, and a general scriptable TCL filter called @@ -342,7 +342,7 @@ libidzebra-2.0-mod-grs-regx Debian package. - A general purpose &sgml; filter is called + A general purpose &acro.sgml; filter is called grs.sgml. This filter is not yet packaged, but planned to be in the libidzebra-2.0-mod-grs-sgml Debian package. 
@@ -352,8 +352,8 @@ libidzebra-2.0-mod-grs-xml includes the grs.xml filter which uses Expat to - parse records in &xml; and turn them into ID&zebra;'s internal &grs1; node - trees. Have also a look at the Alvis &xml;/&xslt; filter described in + parse records in &acro.xml; and turn them into ID&zebra;'s internal &acro.grs1; node + trees. Have also a look at the Alvis &acro.xml;/&acro.xslt; filter described in the next session.
@@ -394,8 +394,8 @@ When records are accessed by the system, they are represented - in their local, or native format. This might be &sgml; or HTML files, - News or Mail archives, &marc; records. If the system doesn't already + in their local, or native format. This might be &acro.sgml; or HTML files, + News or Mail archives, &acro.marc; records. If the system doesn't already know how to read the type of data you need to store, you can set up an input filter by preparing conversion rules based on regular expressions and possibly augmented by a flexible scripting language @@ -422,7 +422,7 @@ Before transmitting records to the client, they are first converted from the internal structure to a form suitable for exchange - over the network - according to the &z3950; standard. + over the network - according to the &acro.z3950; standard. @@ -444,7 +444,7 @@ In particular, the regular record filters are not invoked when these are in use. This can in some cases make the retrival faster than regular - retrieval operations (for &marc;, &xml; etc). + retrieval operations (for &acro.marc;, &acro.xml; etc).
Special Retrieval Elements @@ -460,7 +460,7 @@ zebra::meta::sysno Get &zebra; record system ID - &xml; and &sutrs; + &acro.xml; and &acro.sutrs; zebra::data @@ -470,12 +470,12 @@ zebra::meta Get &zebra; record internal metadata - &xml; and &sutrs; + &acro.xml; and &acro.sutrs; zebra::index Get all indexed keys for record - &xml; and &sutrs; + &acro.xml; and &acro.sutrs; @@ -484,7 +484,7 @@ Get indexed keys for field f for record - &xml; and &sutrs; + &acro.xml; and &acro.sutrs; @@ -494,7 +494,7 @@ Get indexed keys for field f and type t for record - &xml; and &sutrs; + &acro.xml; and &acro.sutrs; @@ -529,7 +529,7 @@ Z> elements zebra::meta::sysno Z> s 1+1 - displays in &xml; record syntax only internal + displays in &acro.xml; record syntax only internal record system number, whereas Z> f @attr 1=title my @@ -560,7 +560,7 @@ Z> s 1+1 will display all indexed tokens from all indexed fields of the - first record, and it will display in &sutrs; + first record, and it will display in &acro.sutrs; record syntax, whereas Z> f @attr 1=title my @@ -570,13 +570,13 @@ Z> elements zebra::index::title:p Z> s 1+1 - displays in &xml; record syntax only the content + displays in &acro.xml; record syntax only the content of the zebra string index title, or even only the type p phrase indexed part of it. - Trying to access numeric &bib1; use + Trying to access numeric &acro.bib1; use attributes or trying to access non-existent zebra intern string access points will result in a Diagnostic 25: Specified element set 'name not valid for specified database. 
diff --git a/doc/examples.xml b/doc/examples.xml index 2681945..6e0a7ac 100644 --- a/doc/examples.xml +++ b/doc/examples.xml @@ -1,5 +1,5 @@ - + Example Configurations @@ -61,12 +61,12 @@ - Example 1: &xml; Indexing And Searching + Example 1: &acro.xml; Indexing And Searching This example shows how &zebra; can be used with absolutely minimal configuration to index a body of - &xml; + &acro.xml; documents, and search them using XPath expressions to specify access points. @@ -81,14 +81,14 @@ records are generated from the family tree in the file dino.tree.) Type make records/dino.xml - to make the &xml; data file. - (Or you could just type make dino to build the &xml; + to make the &acro.xml; data file. + (Or you could just type make dino to build the &acro.xml; data file, create the database and populate it with the taxonomic records all in one shot - but then you wouldn't learn anything, would you? :-) - Now we need to create a &zebra; database to hold and index the &xml; + Now we need to create a &zebra; database to hold and index the &acro.xml; records. We do this with the &zebra; indexer, zebraidx, which is driven by the zebra.cfg configuration file. @@ -103,7 +103,7 @@ That's all you need for a minimal &zebra; configuration. Now you can - roll the &xml; records into the database and build the indexes: + roll the &acro.xml; records into the database and build the indexes: zebraidx update records @@ -121,8 +121,8 @@ . - Now you can use the &z3950; client program of your choice to execute - XPath-based boolean queries and fetch the &xml; records that satisfy + Now you can use the &acro.z3950; client program of your choice to execute + XPath-based boolean queries and fetch the &acro.xml; records that satisfy them: $ yaz-client @:9999 @@ -187,8 +187,8 @@ How, then, can we build broadcasting Information Retrieval applications that look for records in many different databases? 
- The &z3950; protocol offers a powerful and general solution to this: - abstract ``access points''. In the &z3950; model, an access point + The &acro.z3950; protocol offers a powerful and general solution to this: + abstract ``access points''. In the &acro.z3950; model, an access point is simply a point at which searches can be directed. Nothing is said about implementation: in a given database, an access point might be implemented as an index, a path into physical records, an @@ -198,7 +198,7 @@ For convenience, access points are gathered into attribute - sets. For example, the &bib1; attribute set is supposed to + sets. For example, the &acro.bib1; attribute set is supposed to contain bibliographic access points such as author, title, subject and ISBN; the GEO attribute set contains access points pertaining to geospatial information (bounding coordinates, stratum, latitude @@ -207,7 +207,7 @@ (provenance, inscriptions, etc.) - In practice, the &bib1; attribute set has tended to be a dumping + In practice, the &acro.bib1; attribute set has tended to be a dumping ground for all sorts of access points, so that, for example, it includes some geospatial access points as well as strictly bibliographic ones. Nevertheless, this model @@ -215,21 +215,21 @@ records in databases. - In the &bib1; attribute set, a taxon name is probably best + In the &acro.bib1; attribute set, a taxon name is probably best interpreted as a title - that is, a phrase that identifies the item - in question. &bib1; represents title searches by + in question. &acro.bib1; represents title searches by access point 4. (See - The &bib1; Attribute + The &acro.bib1; Attribute Set Semantics) So we need to configure our dinosaur database so that searches for - &bib1; access point 4 look in the + &acro.bib1; access point 4 look in the <termName> element, inside the top-level <Zthes> element. This is a two-step process. First, we need to tell &zebra; that we - want to support the &bib1; attribute set. 
Then we need to tell it + want to support the &acro.bib1; attribute set. Then we need to tell it which elements of its record pertain to access point 4. @@ -270,7 +270,7 @@ xelm /Zthes/termModifiedBy termModifiedBy:w - Declare &bib1; attribute set. See bib1.att in + Declare &acro.bib1; attribute set. See bib1.att in &zebra;'s tab directory. @@ -284,13 +284,13 @@ xelm /Zthes/termModifiedBy termModifiedBy:w Make termName word searchable by both - Zthes attribute termName (1002) and &bib1; atttribute title (4). + Zthes attribute termName (1002) and &acro.bib1; atttribute title (4). - After re-indexing, we can search the database using &bib1; + After re-indexing, we can search the database using &acro.bib1; attribute, title, as follows: Z> form xml @@ -305,7 +305,7 @@ Elapsed: 0.106896 Z> s Sent presentRequest (1+1). Records: 1 -[Default]Record type: &xml; +[Default]Record type: &acro.xml; <Zthes> <termId>2</termId> <termName>Eoraptor</termName> diff --git a/doc/installation.xml b/doc/installation.xml index 2603948..9f2314d 100644 --- a/doc/installation.xml +++ b/doc/installation.xml @@ -1,8 +1,8 @@ - + Installation - &zebra; is written in &ansi; C and was implemented with portability in mind. + &zebra; is written in &acro.ansi; C and was implemented with portability in mind. We primarily use GCC on UNIX and Microsoft Visual C++ on Windows. @@ -30,8 +30,8 @@ (required) - &zebra; uses &yaz; to support &z3950; / - &sru;. + &zebra; uses &yaz; to support &acro.z3950; / + &acro.sru;. Also the memory management utilites from &yaz; is used by &zebra;. @@ -52,7 +52,7 @@ (optional) - &xml; parser. If you're going to index real &xml; you should + &acro.xml; parser. If you're going to index real &acro.xml; you should install this (filter grs.xml). On most systems you should be able to find binary Expat packages. @@ -103,7 +103,7 @@ On Unix, GCC works fine, but any native C compiler should be possible to use as long as it is - &ansi; C compliant. + &acro.ansi; C compliant. 
@@ -160,7 +160,7 @@ zebrasrv - The &z3950; server and search engine. + The &acro.z3950; server and search engine. @@ -179,8 +179,8 @@ The .so-files are &zebra; record filter modules. There are modules for reading - &marc; (mod-grs-marc.so), - &xml; (mod-grs-xml.so) , etc. + &acro.marc; (mod-grs-marc.so), + &acro.xml; (mod-grs-xml.so) , etc. @@ -453,7 +453,7 @@ redirection to other fields. For example the following snippet of a custom custom/bib1.att - &bib1; attribute set definition file is no + &acro.bib1; attribute set definition file is no longer supported: att 1016 Any 1016,4,1005,62 @@ -465,7 +465,7 @@ Similar behaviour can be expressed in the new release by defining - a new index Any:w in all &grs1; + a new index Any:w in all &acro.grs1; *.abs record indexing configuration files. The above example configuration needs to make the changes from version 1.3.x indexing instructions @@ -486,13 +486,13 @@ att 1016 Body-of-text - with equivalent outcome without editing all &grs1; + with equivalent outcome without editing all &acro.grs1; *.abs record indexing configuration files. Server installations which use the special - &idxpath; attribute set must add the following + &acro.idxpath; attribute set must add the following line to the zebra.cfg configuration file: attset: idxpath.att diff --git a/doc/introduction.xml b/doc/introduction.xml index 37c5fd2..b88f4c9 100644 --- a/doc/introduction.xml +++ b/doc/introduction.xml @@ -1,5 +1,5 @@ - + Introduction
@@ -7,10 +7,10 @@ &zebra; is a free, fast, friendly information management system. It can - index records in &xml;/&sgml;, &marc;, e-mail archives and many other + index records in &acro.xml;/&acro.sgml;, &acro.marc;, e-mail archives and many other formats, and quickly find them using a combination of boolean searching and relevance ranking. Search-and-retrieve applications can - be written using &api;s in a wide variety of languages, communicating + be written using &acro.api;s in a wide variety of languages, communicating with the &zebra; server using industry-standard information-retrieval protocols or web services. @@ -21,11 +21,11 @@ &zebra; is a networked component which acts as a - reliable &z3950; server + reliable &acro.z3950; server for both record/document search, presentation, insert, update and - delete operations. In addition, it understands the &sru; family of - webservices, which exist in &rest; &get;/&post; and truly - &soap; flavors. + delete operations. In addition, it understands the &acro.sru; family of + webservices, which exist in &acro.rest; &acro.get;/&acro.post; and truly + &acro.soap; flavors. &zebra; is available as MS Windows 2003 Server (32 bit) self-extracting @@ -44,7 +44,7 @@ &zebra; is a high-performance, general-purpose structured text indexing and retrieval engine. It reads records in a - variety of input formats (eg. email, &xml;, &marc;) and provides access + variety of input formats (eg. email, &acro.xml;, &acro.marc;) and provides access to them through a powerful combination of boolean search expressions and relevance-ranked free-text queries. @@ -53,13 +53,13 @@ &zebra; supports large databases (tens of millions of records, tens of gigabytes of data). It allows safe, incremental database updates on live systems. 
Because &zebra; supports - the industry-standard information retrieval protocol, &z3950;, + the industry-standard information retrieval protocol, &acro.z3950;, you can search &zebra; databases using an enormous variety of programs and toolkits, both commercial and free, which understand this protocol. Application libraries are available to allow bespoke clients to be written in Perl, C, C++, Java, Tcl, Visual - Basic, Python, &php; and more - see the - &zoom; web site + Basic, Python, &acro.php; and more - see the + &acro.zoom; web site for more information on some of these client toolkits. @@ -118,20 +118,20 @@
Complex semi-structured Documents - &xml; and &grs1; Documents - Both &xml; and &grs1; documents exhibit a &dom; like internal + &acro.xml; and &acro.grs1; Documents + Both &acro.xml; and &acro.grs1; documents exhibit a &acro.dom; like internal representation allowing for complex indexing and display rules and Input document formats - &xml;, &sgml;, Text, ISO2709 (&marc;) + &acro.xml;, &acro.sgml;, Text, ISO2709 (&acro.marc;) A system of input filters driven by regular expressions allows most ASCII-based data formats to be easily processed. - &sgml;, &xml;, ISO2709 (&marc;), and raw text are also + &acro.sgml;, &acro.xml;, ISO2709 (&acro.marc;), and raw text are also supported. @@ -171,25 +171,25 @@ Query languages - &cql; and &rpn;/&pqf; - The type-1 Reverse Polish Notation (&rpn;) - and it's textual representation Prefix Query Format (&pqf;) are - supported. The Common Query Language (&cql;) can be configured as - a mapping from &cql; to &rpn;/&pqf; + &acro.cql; and &acro.rpn;/&acro.pqf; + The type-1 Reverse Polish Notation (&acro.rpn;) + and its textual representation Prefix Query Format (&acro.pqf;) are + supported. 
The Common Query Language (&acro.cql;) can be configured as + a mapping from &acro.cql; to &acro.rpn;/&acro.pqf; and Complex boolean query tree - &cql; and &rpn;/&pqf; - Both &cql; and &rpn;/&pqf; allow atomic query parts (&apt;) to + &acro.cql; and &acro.rpn;/&acro.pqf; + Both &acro.cql; and &acro.rpn;/&acro.pqf; allow atomic query parts (&acro.apt;) to be combined into complex boolean query trees Field search user defined - Atomic query parts (&apt;) are either general, or + Atomic query parts (&acro.apt;) are either general, or directed at user-specified document fields , @@ -335,21 +335,21 @@ - &xml; document transformations - &xslt; based + &acro.xml; document transformations + &acro.xslt; based Record presentation can be performed in many - pre-defined &xml; data - formats, where the original &xml; records are on-the-fly transformed - through any preconfigured &xslt; transformation. It is therefore - trivial to present records in short/full &xml; views, transforming to - RSS, Dublin Core, or other &xml; based data formats, or transform + pre-defined &acro.xml; data + formats, where the original &acro.xml; records are on-the-fly transformed + through any preconfigured &acro.xslt; transformation. It is therefore + trivial to present records in short/full &acro.xml; views, transforming to + RSS, Dublin Core, or other &acro.xml; based data formats, or transform records to XHTML snippets ready for inserting in XHTML pages. Binary record transformations - &marc;, &usmarc;, &marc21; and &marcxml; + &acro.marc;, &acro.usmarc;, &acro.marc21; and &acro.marcxml; post-filter record transformations @@ -357,8 +357,8 @@ Record Syntaxes Multiple record syntaxes - for data retrieval: &grs1;, &sutrs;, - &xml;, ISO2709 (&marc;), etc. Records can be mapped between + for data retrieval: &acro.grs1;, &acro.sutrs;, + &acro.xml;, ISO2709 (&acro.marc;), etc. Records can be mapped between record syntaxes and schemas on the fly. 
@@ -366,7 +366,7 @@ &zebra; internal metadatayes &zebra; internal document metadata can be fetched in - &sutrs; and &xml; record syntaxes. Those are useful in client + &acro.sutrs; and &acro.xml; record syntaxes. Those are useful in client applications. @@ -374,7 +374,7 @@ &zebra; internal raw record datayes &zebra; internal raw, binary record data can be fetched in - &sutrs; and &xml; record syntaxes, leveraging %zebra; to a + &acro.sutrs; and &acro.xml; record syntaxes, leveraging %zebra; to a binary storage system @@ -382,7 +382,7 @@ &zebra; internal record field datayes &zebra; internal record field data can be fetched in - &sutrs; and &xml; record syntaxes. This makes very fast minimal + &acro.sutrs; and &acro.xml; record syntaxes. This makes very fast minimal record data displays possible. @@ -479,9 +479,9 @@ Remote updates - &z3950; extended services + &acro.z3950; extended services Updates can be performed from remote locations using the - &z3950; extended services. Access to extended services can be + &acro.z3950; extended services. Access to extended services can be login-password protected. and @@ -523,14 +523,14 @@ Fundamental operations - &z3950;/&sru; explain, + &acro.z3950;/&acro.sru; explain, search, scan, and update - &z3950; protocol support + &acro.z3950; protocol support yes Protocol facilities supported are: init, search, @@ -539,18 +539,18 @@ delete, scan (index browsing), sort, close and support for the update - Extended Service to add or replace an existing &xml; + Extended Service to add or replace an existing &acro.xml; record. Piggy-backed presents are honored in the search request. Named result sets are supported. Web Service support - &sru_gps; + &acro.sru; The protocol operations explain, searchRetrieve and scan - are supported. &cql; to internal - query model &rpn; + are supported. &acro.cql; to internal + query model &acro.rpn; conversion is supported. Extended RPN queries for search/retrieve and scan are supported. 
@@ -716,7 +716,7 @@ In early 2005, the Koha project development team began looking at - ways to improve &marc; support and overcome scalability limitations + ways to improve &acro.marc; support and overcome scalability limitations in the Koha 2.x series. After extensive evaluations of the best of the Open Source textual database engines - including MySQL full-text searching, PostgreSQL, Lucene and Plucene - the team @@ -735,7 +735,7 @@ and relevance-ranked free-text queries, both of which the Koha 2.x series lack. &zebra; also supports incremental and safe database updates, which allow on-the-fly record - management. Finally, since &zebra; has at its heart the &z3950; + management. Finally, since &zebra; has at its heart the &acro.z3950; protocol, it greatly improves Koha's support for that critical library standard." @@ -765,12 +765,12 @@ from virtually any computer with an Internet connection, has template based layout allowing anyone to alter the visual appearance of Emilda, and is - &xml; based language for fast and easy portability to virtually any + &acro.xml; based language for fast and easy portability to virtually any language. Currently, Emilda is used at three schools in Espoo, Finland. - As a surplus, 100% &marc; compatibility has been achieved using the + As a surplus, 100% &acro.marc; compatibility has been achieved using the &zebra; Server from Index Data as backend server. @@ -782,18 +782,18 @@ is a netbased library service offering all traditional functions on a very high level plus many new services. Reindex.net is a comprehensive and powerful WEB system - based on standards such as &xml; and &z3950;. - updates. Reindex supports &marc21;, dan&marc; eller Dublin Core with + based on standards such as &acro.xml; and &acro.z3950;. + updates. Reindex supports &acro.marc21;, dan&acro.marc; eller Dublin Core with UTF8-encoding. Reindex.net runs on GNU/Debian Linux with &zebra; and Simpleserver from Index Data for bibliographic data. 
The relational database system - Sybase 9 &xml; is used for + Sybase 9 &acro.xml; is used for administrative data. - Internally &marcxml; is used for bibliographical records. Update - utilizes &z3950; extended services. + Internally &acro.marcxml; is used for bibliographical records. Update + utilizes &acro.z3950; extended services. @@ -858,8 +858,8 @@ The &zebra; information retrieval indexing machine is used inside the Alvis framework to manage huge collections of natural language processed and - enhanced &xml; data, coming from a topic relevant web crawl. - In this application, &zebra; swallows and manages 37GB of &xml; data + enhanced &acro.xml; data, coming from a topic relevant web crawl. + In this application, &zebra; swallows and manages 37GB of &acro.xml; data in about 4 hours, resulting in search times of fractions of seconds. @@ -881,9 +881,9 @@ The member libraries send in data files representing their periodicals, including both brief bibliographic data and summary - holdings. Then 21 individual &z3950; targets are created, each + holdings. Then 21 individual &acro.z3950; targets are created, each using &zebra;, and all mounted on the single hardware server. - The live service provides a web gateway allowing &z3950; searching + The live service provides a web gateway allowing &acro.z3950; searching of all of the targets or a selection of them. &zebra;'s small footprint allows a relatively modest system to comfortably host the 21 servers. @@ -895,7 +895,7 @@
- NLI-&z3950; - a Natural Language Interface for Libraries + NLI-&acro.z3950; - a Natural Language Interface for Libraries Fernuniversität Hagen in Germany have developed a natural language interface for access to library databases. @@ -904,8 +904,8 @@ In order to evaluate this interface for recall and precision, they chose &zebra; as the basis for retrieval effectiveness. The &zebra; server contains a copy of the GIRT database, consisting of more - than 76000 records in &sgml; format (bibliographic records from - social science), which are mapped to &marc; for presentation. + than 76000 records in &acro.sgml; format (bibliographic records from + social science), which are mapped to &acro.marc; for presentation. (GIRT is the German Indexing and Retrieval Testdatabase. It is a diff --git a/doc/marc_indexing.xml b/doc/marc_indexing.xml index 75a159f..c8ec14f 100644 --- a/doc/marc_indexing.xml +++ b/doc/marc_indexing.xml @@ -1,38 +1,38 @@ - - + - Indexing of &marc; records by &zebra; + Indexing of &acro.marc; records by &zebra; - &zebra; is suitable for distribution of &marc; records via &z3950;. We - have a several possibilities to describe the indexing process of &marc; records. + &zebra; is suitable for distribution of &acro.marc; records via &acro.z3950;. We + have a several possibilities to describe the indexing process of &acro.marc; records. This document shows these possibilities. - Simple indexing of &marc; records + Simple indexing of &acro.marc; records Simple indexing is not described yet. - Extended indexing of &marc; records + Extended indexing of &acro.marc; records -Extended indexing of &marc; records will help you if you need index a +Extended indexing of &acro.marc; records will help you if you need index a combination of subfields, or index only a part of the whole field, -or use during indexing process embedded fields of &marc; record. +or use during indexing process embedded fields of &acro.marc; record. 
-Extended indexing of &marc; records additionally allows: +Extended indexing of &acro.marc; records additionally allows: -to index data in LEADER of &marc; record +to index data in LEADER of &acro.marc; record @@ -44,23 +44,23 @@ or use during indexing process embedded fields of &marc; record. -to index linked fields for UNI&marc; based formats +to index linked fields for UNI&acro.marc; based formats In compare with simple indexing process the extended indexing -may increase (about 2-3 times) the time of indexing process for &marc; +may increase (about 2-3 times) the time of indexing process for &acro.marc; records. The index-formula At the beginning, we have to define the term index-formula -for &marc; records. This term helps to understand the notation of extended indexing of MARC records +for &acro.marc; records. This term helps to understand the notation of extended indexing of MARC records by &zebra;. Our definition is based on the document "The -table of conformity for &z3950; use attributes and R&usmarc; fields". +table of conformity for &acro.z3950; use attributes and R&acro.usmarc; fields". The document is available only in russian language. The index-formula is the combination of subfields presented in such way: @@ -69,7 +69,7 @@ The document is available only in russian language. 71-00$a, $g, $h ($c){.$b ($c)} , (1) -We know that &zebra; supports a &bib1; attribute - right truncation. +We know that &zebra; supports a &acro.bib1; attribute - right truncation. In this case, the index-formula (1) consists from forms, defined in the same way as (1) @@ -79,7 +79,7 @@ forms, defined in the same way as (1) 71-00$a -The original &marc; record may be without some elements, which included in index-formula. +The original &acro.marc; record may be without some elements, which included in index-formula. This notation includes such operands as: @@ -92,7 +92,7 @@ forms, defined in the same way as (1) - - The position may contain any value, defined by &marc; format. 
+ The position may contain any value, defined by &acro.marc; format. For example, index-formula @@ -132,7 +132,7 @@ forms, defined in the same way as (1) -All another operands are the same as accepted in &marc; world. +All another operands are the same as accepted in &acro.marc; world. @@ -146,7 +146,7 @@ forms, defined in the same way as (1) (.abs file). It means that names beginning with "mc-" are interpreted by &zebra; as index-formula. The database index is created and -linked with access point (&bib1; use attribute) +linked with access point (&acro.bib1; use attribute) according to this formula. For example, index-formula @@ -172,7 +172,7 @@ mc-71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)} . -The position may contain any value, defined by &marc; format. For example, +The position may contain any value, defined by &acro.marc; format. For example, index-formula @@ -232,7 +232,7 @@ includes -All another operands are the same as accepted in &marc; world. +All another operands are the same as accepted in &acro.marc; world. @@ -265,7 +265,7 @@ elm mc-ldr[7] Bib-level ! elm mc-008[0-5] Date/time-added-to-db ! -or for R&usmarc; (this data included in 100th field) +or for R&acro.usmarc; (this data included in 100th field) elm mc-100___$a[0-7]_ Date/time-added-to-db ! @@ -277,7 +277,7 @@ elm mc-100___$a[0-7]_ Date/time-added-to-db ! using indicators while indexing -For R&usmarc; index-formula +For R&acro.usmarc; index-formula 70-#1$a, $g matches @@ -293,9 +293,9 @@ indexed. -indexing embedded (linked) fields for UNI&marc; based formats +indexing embedded (linked) fields for UNI&acro.marc; based formats -For R&usmarc; index-formula +For R&acro.usmarc; index-formula 4--#-$170-#1$a, $g ($c) matches diff --git a/doc/querymodel.xml b/doc/querymodel.xml index cbdf902..0272966 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,5 +1,5 @@ - + Query Model
@@ -11,18 +11,18 @@ &zebra; was born as a networking Information Retrieval engine adhering to the international standards - &z3950; and - &sru;, + &acro.z3950; and + &acro.sru;, and implements the - type-1 Reverse Polish Notation (&rpn;) query + type-1 Reverse Polish Notation (&acro.rpn;) query model defined there. Unfortunately, this model has only defined a binary encoded representation, which is used as transport packaging in - the &z3950; protocol layer. This representation is not human + the &acro.z3950; protocol layer. This representation is not human readable, nor does it define any convenient way to specify queries. - Since the type-1 (&rpn;) + Since the type-1 (&acro.rpn;) query structure has no direct, useful string representation, every client application needs to provide some form of mapping from a local query notation or representation to it. @@ -30,33 +30,33 @@
- Prefix Query Format (&pqf;) + Prefix Query Format (&acro.pqf;) Index Data has defined a textual representation in the Prefix Query Format, short - &pqf;, which maps + &acro.pqf;, which maps one-to-one to binary encoded - type-1 &rpn; queries. - &pqf; has been adopted by other - parties developing &z3950; software, and is often referred to as + type-1 &acro.rpn; queries. + &acro.pqf; has been adopted by other + parties developing &acro.z3950; software, and is often referred to as Prefix Query Notation, or in short - &pqn;. See + &acro.pqn;. See for further explanations and descriptions of &zebra;'s capabilities.
- Common Query Language (&cql;) + Common Query Language (&acro.cql;) - The query model of the type-1 &rpn;, - expressed in &pqf;/&pqn; is natively supported. - On the other hand, the default &sru; + The query model of the type-1 &acro.rpn;, + expressed in &acro.pqf;/&acro.pqn; is natively supported. + On the other hand, the default &acro.sru; web services Common Query Language - &cql; is not natively supported. + &acro.cql; is not natively supported. - &zebra; can be configured to understand and map &cql; to &pqf;. See + &zebra; can be configured to understand and map &acro.cql; to &acro.pqf;. See .
@@ -67,7 +67,7 @@ Operation types &zebra; supports all of the three different - &z3950;/&sru; operations defined in the + &acro.z3950;/&acro.sru; operations defined in the standards: explain, search, and scan. A short description of the functionality and purpose of each is quite in order here. @@ -76,7 +76,7 @@
Explain Operation - The syntax of &z3950;/&sru; queries is + The syntax of &acro.z3950;/&acro.sru; queries is well known to any client, but the specific semantics - taking into account a particular server's functionalities and abilities - must be @@ -89,15 +89,15 @@ of the general query model are supported. - The &z3950; embeds the explain operation + The &acro.z3950; embeds the explain operation by performing a search in the magic IR-Explain-1 database; see . - In &sru;, explain is an entirely separate - operation, which returns an ZeeRex &xml; record according to the + In &acro.sru;, explain is an entirely separate + operation, which returns a ZeeRex &acro.xml; record according to the structure defined by the protocol. @@ -117,7 +117,7 @@ simple free text searches to nested complex boolean queries, targeting specific indexes, and possibly enhanced with many query semantic specifications. Search interactions are the heart - and soul of &z3950;/&sru; servers. + and soul of &acro.z3950;/&acro.sru; servers.
@@ -145,24 +145,24 @@
- &rpn; queries and semantics + &acro.rpn; queries and semantics - The &pqf; grammar + The &acro.pqf; grammar is documented in the &yaz; manual, and shall not be - repeated here. This textual &pqf; representation + repeated here. This textual &acro.pqf; representation is not transmitted to &zebra; during search, but it is in the - client mapped to the equivalent &z3950; binary + client mapped to the equivalent &acro.z3950; binary query parse tree.
- &rpn; tree structure + &acro.rpn; tree structure - The &rpn; parse tree - or the equivalent textual representation in &pqf; - + The &acro.rpn; parse tree - or the equivalent textual representation in &acro.pqf; - may start with one specification of the attribute set used. Following is a query tree, which - consists of atomic query parts (&apt;) or + consists of atomic query parts (&acro.apt;) or named result sets, eventually paired by boolean binary operators, and finally recursively combined into @@ -184,7 +184,7 @@
Attribute set - &pqf; notation (Short hand) + &acro.pqf; notation (Short hand) Status Notes @@ -201,10 +201,10 @@ predefined - &bib1; + &acro.bib1; bib-1 - Standard &pqf; query language attribute set which defines the - semantics of &z3950; searching. In addition, all of the + Standard &acro.pqf; query language attribute set which defines the + semantics of &acro.z3950; searching. In addition, all of the non-use attributes (types 2-12) define the hard-wired &zebra; internal query processing. @@ -213,15 +213,15 @@ GILS gils - Extension to the &bib1; attribute set. + Extension to the &acro.bib1; attribute set. predefined @@ -238,7 +238,7 @@ The &zebra; internal query processing is modeled after - the &bib1; attribute set, and the non-use + the &acro.bib1; attribute set, and the non-use attributes type 2-6 are hard-wired in. It is therefore essential to be familiar with . @@ -317,7 +317,7 @@ retrieval, taking proximity into account: The hit set is a subset of the corresponding AND query - (see the &pqf; grammar for + (see the &acro.pqf; grammar for details on the proximity operator): Z> find @prox 0 3 0 2 k 2 information retrieval @@ -338,23 +338,23 @@
- Atomic queries (&apt;) + Atomic queries (&acro.apt;) Atomic queries are the query parts which work on one access point only. These consist of an attribute list followed by a single term or a quoted term list, and are often called - Attributes-Plus-Terms (&apt;) queries. + Attributes-Plus-Terms (&acro.apt;) queries. - Atomic (&apt;) queries are always leaf nodes in the &pqf; query tree. + Atomic (&acro.apt;) queries are always leaf nodes in the &acro.pqf; query tree. Unsupplied non-use attributes of types 2-12 are either inherited from higher nodes in the query tree, or are set to &zebra;'s default values. See for details.
- Atomic queries (&apt;) + Atomic queries (&acro.apt;) @@ -407,7 +407,7 @@ The scan operation is only supported with - atomic &apt; queries, as it is bound to one access point at a + atomic &acro.apt; queries, as it is bound to one access point at a time. Boolean query trees are not allowed during scan. @@ -429,8 +429,8 @@ Named result sets are supported in &zebra;, and result sets can be used as operands without limitations. It follows that named - result sets are leaf nodes in the &pqf; query tree, exactly as - atomic &apt; queries are. + result sets are leaf nodes in the &acro.pqf; query tree, exactly as + atomic &acro.apt; queries are. After the execution of a search, the result set is available at @@ -460,10 +460,10 @@ - Named result sets are only supported by the &z3950; protocol. - The &sru; web service is stateless, and therefore the notion of + Named result sets are only supported by the &acro.z3950; protocol. + The &acro.sru; web service is stateless, and therefore the notion of named result sets does not exist when accessing a &zebra; server by - the &sru; protocol. + the &acro.sru; protocol. @@ -483,7 +483,7 @@ Finding all documents which have the term list "information - retrieval" in an &zebra; index, using it's internal full string + retrieval" in an &zebra; index, using its internal full string name. Scanning the same index. Z> find @attr 1=sometext "information retrieval" @@ -492,7 +492,7 @@ Searching or scanning - the bib-1 use attribute 54 using it's string name: + the bib-1 use attribute 54 using its string name: Z> find @attr 1=Code-language eng Z> scan @attr 1=Code-language "" @@ -501,7 +501,7 @@ It is possible to search in any silly string index - if it's defined in your - indexation rules and can be parsed by the &pqf; parser. + indexation rules and can be parsed by the &acro.pqf; parser. This is definitely not the recommended use of this facility, as it might confuse your users with some very unexpected results. 
@@ -512,14 +512,14 @@ See also for details, and - for the &sru; &pqf; query extension using string names as a fast + for the &acro.sru; &acro.pqf; query extension using string names as a fast debugging facility.
&zebra;'s special access point of type 'XPath' - for &grs1; filters + for &acro.grs1; filters As we have seen above, it is possible (albeit seldom a great idea) to emulate @@ -531,15 +531,15 @@ be defined at indexation time, no new undefined XPath queries can entered at search time, and second, it might confuse users very much that an XPath-alike index name in fact - gets populated from a possible entirely different &xml; element + gets populated from a possible entirely different &acro.xml; element than it pretends to access. - When using the &grs1; Record Model + When using the &acro.grs1; Record Model (see ), we have the possibility to embed life XPath expressions - in the &pqf; queries, which are here called + in the &acro.pqf; queries, which are here called use (type 1) xpath attributes. You must enable the xpath enable directive in your @@ -549,14 +549,14 @@ Only a very restricted subset of the XPath 1.0 - standard is supported as the &grs1; record model is simpler than - a full &xml; &dom; structure. See the following examples for + standard is supported as the &acro.grs1; record model is simpler than + a full &acro.xml; &acro.dom; structure. See the following examples for possibilities. Finding all documents which have the term "content" - inside a text node found in a specific &xml; &dom; + inside a text node found in a specific &acro.xml; &acro.dom; subtree, whose starting element is addressed by XPath. @@ -586,7 +586,7 @@ Filter the addressing XPath by a predicate working on exact string values in - attributes (in the &xml; sense) can be done: return all those docs which + attributes (in the &acro.xml; sense) can be done: return all those docs which have the term "english" contained in one of all text sub nodes of the subtree defined by the XPath /record/title[@lang='en']. 
And similar @@ -607,8 +607,8 @@ - Escaping &pqf; keywords and other non-parseable XPath constructs - with '{ }' to prevent client-side &pqf; parsing + Escaping &acro.pqf; keywords and other non-parseable XPath constructs + with '{ }' to prevent client-side &acro.pqf; parsing syntax errors: Z> find @attr {1=/root/first[@attr='danish']} content @@ -630,7 +630,7 @@
Explain Attribute Set - The &z3950; standard defines the + The &acro.z3950; standard defines the Explain attribute set Exp-1, which is used to discover information about a server's search semantics and functional capabilities @@ -644,11 +644,11 @@ In addition, the non-Use - &bib1; attributes, that is, the types + &acro.bib1; attributes, that is, the types Relation, Position, Structure, Truncation, and Completeness are imported from - the &bib1; attribute set, and may be used + the &acro.bib1; attribute set, and may be used within any explain query. @@ -669,7 +669,7 @@ See tab/explain.att and the - &z3950; standard + &acro.z3950; standard for more information.
@@ -678,11 +678,11 @@ Explain searches with yaz-client Classic Explain only defines retrieval of Explain information - via ASN.1. Practically no &z3950; clients supports this. Fortunately + via ASN.1. Practically no &acro.z3950; clients support this. Fortunately they don't have to - &zebra; allows retrieval of this information in other formats: - &sutrs;, &xml;, - &grs1; and ASN.1 Explain. + &acro.sutrs;, &acro.xml;, + &acro.grs1; and ASN.1 Explain. @@ -743,7 +743,7 @@ Default. This query is very useful to study the internal &zebra; indexes. If records have been indexed using the alvis - &xslt; filter, the string representation names of the known indexes can be + &acro.xslt; filter, the string representation names of the known indexes can be found. Z> base IR-Explain-1 @@ -760,13 +760,13 @@
- &bib1; Attribute Set + &acro.bib1; Attribute Set Most of the information contained in this section is an excerpt of - the ATTRIBUTE SET &bib1; (&z3950;-1995) SEMANTICS - found at . The &bib1; + the ATTRIBUTE SET &acro.bib1; (&acro.z3950;-1995) SEMANTICS + found at . The &acro.bib1; Attribute Set Semantics from 1995, also in an updated - &bib1; + &acro.bib1; Attribute Set version from 2003. Index Data is not the copyright holder of this information, except for the configuration details, the listing of @@ -788,7 +788,7 @@ tab/gils.att. - For example, some few &bib1; use + For example, some few &acro.bib1; use attributes from the tab/bib1.att are: att 1 Personal-name @@ -979,7 +979,7 @@ AlwaysMatches (103) is a great way to discover how many documents have been indexed in a given field. The search term is ignored, but needed for correct - &pqf; syntax. An empty search term may be supplied. + &acro.pqf; syntax. An empty search term may be supplied. Z> find @attr 1=Title @attr 2=103 "" Z> find @attr 1=Title @attr 2=103 @attr 4=1 "" @@ -1159,7 +1159,7 @@ is supported, and maps to the boolean AND combination of words supplied. The word list is useful when google-like bag-of-word queries need to be translated from a GUI - query language to &pqf;. For example, the following queries + query language to &acro.pqf;. For example, the following queries are equivalent: Z> find @attr 1=Title @attr 4=6 "mozart amadeus" @@ -1213,7 +1213,7 @@ - The exact mapping between &pqf; queries and &zebra; internal indexes + The exact mapping between &acro.pqf; queries and &zebra; internal indexes and index types is explained in . @@ -1408,14 +1408,14 @@ The Complete subfield (2) is a reminiscens - from the happy &marc; + from the happy &acro.marc; binary format days. &zebra; does not support it, but maps silently to Complete field (3). 
- The exact mapping between &pqf; queries and &zebra; internal indexes + The exact mapping between &acro.pqf; queries and &zebra; internal indexes and index types is explained in . @@ -1427,7 +1427,7 @@
- Extended &zebra; &rpn; Features + Extended &zebra; &acro.rpn; Features The &zebra; internal query engine has been extended to specific needs not covered by the bib-1 attribute set query @@ -1478,7 +1478,7 @@
- &dom; &xml; filter pipelines overview + &acro.dom; &acro.xml; filter pipelines overview @@ -78,26 +78,26 @@ input first input parsing and initial - transformations to common &xml; format - Input raw &xml; record buffers, &xml; streams and - binary &marc; buffers - Common &xml; &dom; + transformations to common &acro.xml; format + Input raw &acro.xml; record buffers, &acro.xml; streams and + binary &acro.marc; buffers + Common &acro.xml; &acro.dom; extract second indexing term extraction transformations - Common &xml; &dom; - Indexing &xml; &dom; + Common &acro.xml; &acro.dom; + Indexing &acro.xml; &acro.dom; store second transformations before internal document storage - Common &xml; &dom; - Storage &xml; &dom; + Common &acro.xml; &acro.dom; + Storage &acro.xml; &acro.dom; retrieve @@ -105,40 +105,40 @@ multiple document retrieve transformations from storage to different output formats are possible - Storage &xml; &dom; - Output &xml; syntax in requested formats + Storage &acro.xml; &acro.dom; + Output &acro.xml; syntax in requested formats
- The &dom; &xml; filter pipelines use &xslt; (and if supported on - your platform, even &exslt;), it brings thus full &xpath; + The &acro.dom; &acro.xml; filter pipelines use &acro.xslt; (and if supported on + your platform, even &acro.exslt;), and thus bring full &acro.xpath; support to the indexing, storage and display rules of not only - &xml; documents, but also binary &marc; records. + &acro.xml; documents, but also binary &acro.marc; records.
- &dom; &xml; filter pipeline configuration + &acro.dom; &acro.xml; filter pipeline configuration - The experimental, loadable &dom; &xml;/&xslt; filter module + The experimental, loadable &acro.dom; &acro.xml;/&acro.xslt; filter module mod-dom.so is invoked by the zebra.cfg configuration statement recordtype.xml: dom.db/filter_dom_conf.xml - In this example the &dom; &xml; filter is configured to work + In this example the &acro.dom; &acro.xml; filter is configured to work on all data files with suffix *.xml, where the configuration file is found in the path db/filter_dom_conf.xml. - The &dom; &xslt; filter configuration file must be - valid &xml;. It might look like this: + The &acro.dom; &acro.xslt; filter configuration file must be + valid &acro.xml;. It might look like this: @@ -164,8 +164,8 @@ - The root &xml; element <dom> and all other &dom; - &xml; filter elements are residing in the namespace + The root &acro.xml; element <dom> and all other &acro.dom; + &acro.xml; filter elements are residing in the namespace xmlns="http://indexdata.dk/zebra-2.0". @@ -180,7 +180,7 @@ All pipeline definition elements may contain zero or more ]]> - &xslt; transformation instructions, which are performed + &acro.xslt; transformation instructions, which are performed sequentially from top to bottom. The paths in the stylesheet attributes are relative to zebras working directory, or absolute to the file @@ -192,22 +192,22 @@ Input pipeline The <input> pipeline definition element - may contain either one &xml; Reader definition + may contain either one &acro.xml; Reader definition ]]>, used to split - an &xml; collection input stream into individual &xml; &dom; + an &acro.xml; collection input stream into individual &acro.xml; &acro.dom; documents at the prescribed element level, - or one &marc; binary + or one &acro.marc; binary parsing instruction ]]>, which defines - a conversion to &marcxml; format &dom; trees. 
The allowed values + a conversion to &acro.marcxml; format &acro.dom; trees. The allowed values of the inputcharset attribute depend on your local iconv set-up. - Both input parsers deliver individual &dom; &xml; documents to the + Both input parsers deliver individual &acro.dom; &acro.xml; documents to the following chain of zero or more ]]> - &xslt; transformations. At the end of this pipeline, the documents + &acro.xslt; transformations. At the end of this pipeline, the documents are in the common format, used to feed both the <extract> and <store> pipelines. @@ -218,11 +218,11 @@ Extract pipeline The <extract> pipeline takes documents - from any common &dom; &xml; format to the &zebra; specific - indexing &dom; &xml; format. + from any common &acro.dom; &acro.xml; format to the &zebra; specific + indexing &acro.dom; &acro.xml; format. It may consist of zero or more ]]> - &xslt; transformations, and the outcome is handled to the + &acro.xslt; transformations, and the outcome is handed to the &zebra; core to drive the process of building the inverted indexes. See for @@ -233,11 +233,11 @@
Store pipeline The <store> pipeline takes documents - from any common &dom; &xml; format to the &zebra; specific - storage &dom; &xml; format. + from any common &acro.dom; &acro.xml; format to the &zebra; specific + storage &acro.dom; &acro.xml; format. It may consist of zero or more ]]> - &xslt; transformations, and the outcome is handled to the + &acro.xslt; transformations, and the outcome is handed to the &zebra; core for deposition into the internal storage system.
@@ -248,9 +248,9 @@ <retrieve> pipeline definitions, each of them again consisting of zero or more ]]> - &xslt; transformations. These are used for document - presentation after search, and take the internal storage &dom; - &xml; to the requested output formats during record present + &acro.xslt; transformations. These are used for document + presentation after search, and take the internal storage &acro.dom; + &acro.xml; to the requested output formats during record present requests.
@@ -259,9 +259,9 @@ are distinguished by their unique name attributes, these are the literal schema or element set names used in - &srw;, - &sru; and - &z3950; protocol queries. + &acro.srw;, + &acro.sru; and + &acro.z3950; protocol queries.
@@ -270,10 +270,10 @@ Canonical Indexing Format - &dom; &xml; indexing comes in two flavors: pure - processing-instruction governed plain &xml; documents, and - very - similar to the Alvis filter indexing format - &xml; documents - containing &xml; <record> and + &acro.dom; &acro.xml; indexing comes in two flavors: pure + processing-instruction governed plain &acro.xml; documents, and - very + similar to the Alvis filter indexing format - &acro.xml; documents + containing &acro.xml; <record> and <index> instructions from the magic namespace xmlns:z="http://indexdata.dk/zebra-2.0". @@ -282,11 +282,11 @@ Processing-instruction governed indexing format The output of the processing instruction driven - indexing &xslt; stylesheets must contain + indexing &acro.xslt; stylesheets must contain processing instructions named zebra-2.0. - The output of the &xslt; indexing transformation is then - parsed using &dom; methods, and the contained instructions are + The output of the &acro.xslt; indexing transformation is then + parsed using &acro.dom; methods, and the contained instructions are performed on the elements and their subtrees directly following the processing instructions. @@ -314,11 +314,11 @@
Magic element governed indexing format - The output of the indexing &xslt; stylesheets must contain + The output of the indexing &acro.xslt; stylesheets must contain certain elements in the magic xmlns:z="http://indexdata.dk/zebra-2.0" - namespace. The output of the &xslt; indexing transformation is then - parsed using &dom; methods, and the contained instructions are + namespace. The output of the &acro.xslt; indexing transformation is then + parsed using &acro.dom; methods, and the contained instructions are performed on the magic elements and their subtrees. @@ -455,7 +455,7 @@ - &dom; input documents which are not resulting in both one + &acro.dom; input documents which are not resulting in both one unique valid record instruction and one or more valid index instructions can not be searched and @@ -470,13 +470,13 @@ The examples work as follows: - From the original &xml; file - marc-one.xml (or from the &xml; record &dom; of the + From the original &acro.xml; file + marc-one.xml (or from the &acro.xml; record &acro.dom; of the same form coming from an <input> pipeline), the indexing pipeline <extract> - produces an indexing &xml; record, which is defined by + produces an indexing &acro.xml; record, which is defined by the record instruction &zebra; uses the content of z:id="11224466" @@ -512,8 +512,8 @@ inserted in the named indexes. - Finally, this example configuration can be queried using &pqf; - queries, either transported by &z3950;, (here using a yaz-client) + Finally, this example configuration can be queried using &acro.pqf; + queries, either transported by &acro.z3950;, (here using a yaz-client) open localhost:9999 @@ -533,27 +533,27 @@ or the proprietary extensions x-pquery and x-pScanClause to - &sru;, and &srw; + &acro.sru;, and &acro.srw; - See for more information on &sru;/&srw; + See for more information on &acro.sru;/&acro.srw; configuration, and or the &yaz; - &cql; section + &acro.cql; section for the details or the &yaz; frontend server. 
Notice that there are no *.abs, - *.est, *.map, or other &grs1; + *.est, *.map, or other &acro.grs1; filter configuration files involves in this process, and that the literal index names are used during search and retrieval. In case that we want to support the usual - bib-1 &z3950; numeric access points, it is a + bib-1 &acro.z3950; numeric access points, it is a good idea to choose string index names defined in the default configuration file tab/bib1.att, see @@ -566,15 +566,15 @@
- &dom; Record Model Configuration + &acro.dom; Record Model Configuration
- &dom; Indexing Configuration + &acro.dom; Indexing Configuration As mentioned above, there can be only one indexing pipeline, and configuration of the indexing process is a synonym - of writing an &xslt; stylesheet which produces &xml; output containing the + of writing an &acro.xslt; stylesheet which produces &acro.xml; output containing the magic processing instructions or elements discussed in . Obviously, there are million of different ways to accomplish this @@ -584,32 +584,32 @@ Stylesheets can be written in the pull or the push style: pull - means that the output &xml; structure is taken as starting point of - the internal structure of the &xslt; stylesheet, and portions of - the input &xml; are pulled out and inserted - into the right spots of the output &xml; structure. + means that the output &acro.xml; structure is taken as starting point of + the internal structure of the &acro.xslt; stylesheet, and portions of + the input &acro.xml; are pulled out and inserted + into the right spots of the output &acro.xml; structure. On the other - side, push &xslt; stylesheets are recursively + side, push &acro.xslt; stylesheets are recursively calling their template definitions, a process which is commanded - by the input &xml; structure, and is triggered to produce - some output &xml; + by the input &acro.xml; structure, and is triggered to produce + some output &acro.xml; whenever some special conditions in the input stylesheets are met. The pull type is well-suited for input - &xml; with strong and well-defined structure and semantics, like the - following &oai; indexing example, whereas the + &acro.xml; with strong and well-defined structure and semantics, like the + following &acro.oai; indexing example, whereas the push type might be the only possible way to - sort out deeply recursive input &xml; formats. + sort out deeply recursive input &acro.xml; formats. 
A pull stylesheet example used to index - &oai; harvested records could use some of the following template + &acro.oai; harvested records could use some of the following template definitions: @@ -658,11 +658,11 @@
- &dom; Indexing &marcxml; + &acro.dom; Indexing &acro.marcxml; - The &dom; filter allows indexing of both binary &marc; records - and &marcxml; records, depending on it's configuration. - A typical &marcxml; record might look like this: + The &acro.dom; filter allows indexing of both binary &acro.marc; records + and &acro.marcxml; records, depending on its configuration. + A typical &acro.marcxml; record might look like this: @@ -703,11 +703,11 @@ - It is easily possible to make string manipulation in the &dom; + It is easily possible to make string manipulation in the &acro.dom; filter. For example, if you want to drop some leading articles in the indexing of sort fields, you might want to pick out the - &marcxml; indicator attributes to chop of leading substrings. If - the above &xml; example would have an indicator + &acro.marcxml; indicator attributes to chop of leading substrings. If + the above &acro.xml; example would have an indicator ind2="8" in the title field 245, i.e. @@ -741,7 +741,7 @@ ]]> - The output of the above &marcxml; and &xslt; excerpt would then be: + The output of the above &acro.marcxml; and &acro.xslt; excerpt would then be: How to program a computer @@ -754,20 +754,20 @@
- &dom; Indexing Wizardry + &acro.dom; Indexing Wizardry The names and types of the indexes can be defined in the - indexing &xslt; stylesheet dynamically according to - content in the original &xml; records, which has + indexing &acro.xslt; stylesheet dynamically according to + content in the original &acro.xml; records, which has opportunities for great power and wizardry as well as grande disaster. The following excerpt of a push stylesheet might - be a good idea according to your strict control of the &xml; + be a good idea according to your strict control of the &acro.xml; input format (due to rigorous checking against well-defined and - tight RelaxNG or &xml; Schema's, for example): + tight RelaxNG or &acro.xml; Schema's, for example): @@ -778,11 +778,11 @@ ]]> This template creates indexes which have the name of the working - node of any input &xml; file, and assigns a '1' to the index. + node of any input &acro.xml; file, and assigns a '1' to the index. The example query find @attr 1=xyz 1 finds all files which contain at least one - xyz &xml; element. In case you can not control + xyz &acro.xml; element. In case you can not control which element names the input files contain, you might ask for disaster and bad karma using this technique. @@ -810,18 +810,18 @@ ]]> Don't be tempted to play too smart tricks with the power of - &xslt;, the above example will create zillions of + &acro.xslt;, the above example will create zillions of indexes with unpredictable names, resulting in severe &zebra; index pollution..
- Debuggig &dom; Filter Configurations + Debuggig &acro.dom; Filter Configurations - It can be very hard to debug a &dom; filter setup due to the many - sucessive &marc; syntax translations, &xml; stream splitting and - &xslt; transformations involved. As an aid, you have always the + It can be very hard to debug a &acro.dom; filter setup due to the many + sucessive &acro.marc; syntax translations, &acro.xml; stream splitting and + &acro.xslt; transformations involved. As an aid, you have always the power of the -s command line switch to the zebraidz indexing command at your hand: @@ -836,18 +836,18 @@ - &grs1; Record Model and Filter Modules + + &acro.grs1; Record Model and Filter Modules The functionality of this record model has been improved and - replaced by the DOM &xml; record model. See + replaced by the DOM &acro.xml; record model. See . @@ -19,7 +19,7 @@
- &grs1; Record Filters + &acro.grs1; Record Filters Many basic subtypes of the grs type are currently available: @@ -33,7 +33,7 @@ This is the canonical input format described . It is using - simple &sgml;-like syntax. + simple &acro.sgml;-like syntax. @@ -42,17 +42,17 @@ This allows &zebra; to read - records in the ISO2709 (&marc;) encoding standard. + records in the ISO2709 (&acro.marc;) encoding standard. Last parameter type names the .abs file (see below) - which describes the specific &marc; structure of the input record as + which describes the specific &acro.marc; structure of the input record as well as the indexing rules. The grs.marc uses an internal represtantion - which is not &xml; conformant. In particular &marc; tags are - presented as elements with the same name. And &xml; elements + which is not &acro.xml; conformant. In particular &acro.marc; tags are + presented as elements with the same name. And &acro.xml; elements may not start with digits. Therefore this filter is only - suitable for systems returning &grs1; and &marc; records. For &xml; + suitable for systems returning &acro.grs1; and &acro.marc; records. For &acro.xml; use grs.marcxml filter instead (see below). @@ -69,14 +69,14 @@ This allows &zebra; to read ISO2709 encoded records. Last parameter type names the .abs file (see below) - which describes the specific &marc; structure of the input record as + which describes the specific &acro.marc; structure of the input record as well as the indexing rules. The internal representation for grs.marcxml - is the same as for &marcxml;. + is the same as for &acro.marcxml;. It slightly more complicated to work with than - grs.marc but &xml; conformant. + grs.marc but &acro.xml; conformant. The loadable grs.marcxml filter module @@ -89,11 +89,11 @@ grs.xml - This filter reads &xml; records and uses + This filter reads &acro.xml; records and uses Expat to parse them and convert them into ID&zebra;'s internal grs record model. 
- Only one record per file is supported, due to the fact &xml; does + Only one record per file is supported, due to the fact &acro.xml; does not allow two documents to "follow" each other (there is no way to know when a document is finished). This filter is only available if &zebra; is compiled with EXPAT support. @@ -138,14 +138,14 @@
- &grs1; Canonical Input Format + &acro.grs1; Canonical Input Format Although input data can take any form, it is sometimes useful to describe the record processing capabilities of the system in terms of a single, canonical input format that gives access to the full spectrum of structure and flexibility in the system. In &zebra;, this - canonical format is an "&sgml;-like" syntax. + canonical format is an "&acro.sgml;-like" syntax. @@ -223,7 +223,7 @@ contains only a single element (strictly speaking, that makes it an illegal GILS record, since the GILS profile includes several mandatory elements - &zebra; does not validate the contents of a record against - the &z3950; profile, however - it merely attempts to match up elements + the &acro.z3950; profile, however - it merely attempts to match up elements of a local representation with the given schema): @@ -248,7 +248,7 @@ textual data elements which might appear in different languages, and images which may appear in different formats or layouts. The variant system in &zebra; is essentially a representation of - the variant mechanism of &z3950;-1995. + the variant mechanism of &acro.z3950;-1995. @@ -328,7 +328,7 @@ The title element above comes in two variants. Both have the IANA body type "text/plain", but one is in English, and the other in - Danish. The client, using the element selection mechanism of &z3950;, + Danish. The client, using the element selection mechanism of &acro.z3950;, can retrieve information about the available variant forms of data elements, or it can select specific variants based on the requirements of the end-user. @@ -339,7 +339,7 @@
- &grs1; REGX And TCL Input Filters + &acro.grs1; REGX And TCL Input Filters In order to handle general input formats, &zebra; allows the @@ -591,7 +591,7 @@
- &grs1; Internal Record Representation + &acro.grs1; Internal Record Representation When records are manipulated by the system, they're represented in a @@ -691,7 +691,7 @@ In practice, each variant node is associated with a triple of class, - type, value, corresponding to the variant mechanism of &z3950;. + type, value, corresponding to the variant mechanism of &acro.z3950;.
@@ -715,7 +715,7 @@
- &grs1; Record Model Configuration + &acro.grs1; Record Model Configuration The following sections describe the configuration files that govern @@ -745,7 +745,7 @@ - The object identifier of the &z3950; schema associated + The object identifier of the &acro.z3950; schema associated with the ARS, so that it can be referred to by the client. @@ -782,7 +782,7 @@ ask for a subset of the data elements contained in a record. Element set names, in the retrieval module, are mapped to element specifications, which contain information equivalent to the - Espec-1 syntax of &z3950;. + Espec-1 syntax of &acro.z3950;. @@ -796,7 +796,7 @@ Possibly, a set of rules describing the mapping of elements to a - &marc; representation. + &acro.marc; representation. @@ -804,7 +804,7 @@ A list of element descriptions (this is the actual ARS of the - schema, in &z3950; terms), which lists the ways in which the various + schema, in &acro.z3950; terms), which lists the ways in which the various tags can be used and organized hierarchically. @@ -830,7 +830,7 @@ The number of different file types may appear daunting at first, but - each type corresponds fairly clearly to a single aspect of the &z3950; + each type corresponds fairly clearly to a single aspect of the &acro.z3950; retrieval facilities. Further, the average database administrator, who is simply reusing an existing profile for which tables already exist, shouldn't have to worry too much about the contents of these tables. @@ -855,14 +855,14 @@ The Abstract Syntax (.abs) Files - The name of this file type is slightly misleading in &z3950; terms, + The name of this file type is slightly misleading in &acro.z3950; terms, since, apart from the actual abstract syntax of the profile, it also includes most of the other definitions that go into a database profile. 
- When a record in the canonical, &sgml;-like format is read from a file + When a record in the canonical, &acro.sgml;-like format is read from a file or from the database, the first tag of the file should reference the profile that governs the layout of the record. If the first tag of the record is, say, <gils>, the system will look @@ -946,7 +946,7 @@ (o) Points to a file containing parameters for representing the record contents in the ISO2709 syntax. - Read the description of the &marc; representation facility below. + Read the description of the &acro.marc; representation facility below. @@ -982,7 +982,7 @@ (o,r) Adds an element to the abstract record syntax of the schema. The path follows the - syntax which is suggested by the &z3950; document - that is, a sequence + syntax which is suggested by the &acro.z3950; document - that is, a sequence of tags separated by slashes (/). Each tag is given as a comma-separated pair of tag type and -value surrounded by parenthesis. The name is the name of the element, and @@ -1029,8 +1029,8 @@ melm field$subfield attributes - This directive is specifically for &marc;-formatted records, - ingested either in the form of &marcxml; documents, or in the + This directive is specifically for &acro.marc;-formatted records, + ingested either in the form of &acro.marcxml; documents, or in the ISO2709/Z39.2 format using the grs.marcxml input filter. You can specify indexing rules for any subfield, or you can leave off the $subfield part and specify default rules @@ -1046,7 +1046,7 @@ This directive specifies character encoding for external records. - For records such as &xml; that specifies encoding within the + For records such as &acro.xml; that specifies encoding within the file via a header this directive is ignored. If neither this directive is given, nor an encoding is set within external records, ISO-8859-1 encoding is assumed. 
@@ -1143,7 +1143,7 @@ An automatically generated identifier for the record, unique within this database. It is represented by the <localControlNumber> element in - &xml; and the (1,14) tag in &grs1;. + &acro.xml; and the (1,14) tag in &acro.grs1;. @@ -1258,7 +1258,7 @@ set. For instance, many new attribute sets are defined as extensions to the bib-1 set. This is an important feature of the retrieval - system of &z3950;, as it ensures the highest possible level of + system of &acro.z3950;, as it ensures the highest possible level of interoperability, as those access points of your database which are derived from the external set (say, bib-1) can be used even by clients who are unaware of the new set. @@ -1310,7 +1310,7 @@ This file type defines the tagset of the profile, possibly by referencing other tag sets (most tag sets, for instance, will include - tagsetG and tagsetM from the &z3950; specification. The file may + tagsetG and tagsetM from the &acro.z3950; specification. The file may contain the following directives. @@ -1550,7 +1550,7 @@ The element set specification files describe a selection of a subset of the elements of a database record. The element selection mechanism is equivalent to the one supplied by the Espec-1 - syntax of the &z3950; specification. + syntax of the &acro.z3950; specification. In fact, the internal representation of an element set specification is identical to the Espec-1 structure, and we'll refer you to the description of that structure for most of @@ -1691,14 +1691,14 @@ a schema that differs from the native schema of the record. For instance, a client might only know how to process WAIS records, while the database record is represented in a more specific schema, such as - GILS. In this module, a mapping of data to one of the &marc; formats is + GILS. 
In this module, a mapping of data to one of the &acro.marc; formats is also thought of as a schema mapping (mapping the elements of the - record into fields consistent with the given &marc; specification, prior + record into fields consistent with the given &acro.marc; specification, prior to actually converting the data to the ISO2709). This use of the - object identifier for &usmarc; as a schema identifier represents an + object identifier for &acro.usmarc; as a schema identifier represents an overloading of the OID which might not be entirely proper. However, it represents the dual role of schema and record syntax which - is assumed by the &marc; family in &z3950;. + is assumed by the &acro.marc; family in &acro.z3950;. @@ -1766,7 +1766,7 @@
- &grs1; Exchange Formats + &acro.grs1; Exchange Formats Converting records from the internal structure to an exchange format @@ -1778,7 +1778,7 @@ - &grs1;. The internal representation is based on &grs1;/&xml;, so the + &acro.grs1;. The internal representation is based on &acro.grs1;/&acro.xml;, so the conversion here is straightforward. The system will create applied variant and supported variant lists as required, if a record contains variant information. @@ -1787,34 +1787,34 @@ - &xml;. The internal representation is based on &grs1;/&xml; so - the mapping is trivial. Note that &xml; schemas, preprocessing + &acro.xml;. The internal representation is based on &acro.grs1;/&acro.xml; so + the mapping is trivial. Note that &acro.xml; schemas, preprocessing instructions and comments are not part of the internal representation - and therefore will never be part of a generated &xml; record. + and therefore will never be part of a generated &acro.xml; record. Future versions of the &zebra; will support that. - &sutrs;. Again, the mapping is fairly straightforward. Indentation + &acro.sutrs;. Again, the mapping is fairly straightforward. Indentation is used to show the hierarchical structure of the record. All - "&grs1;" type records support both the &grs1; and &sutrs; + "&acro.grs1;" type records support both the &acro.grs1; and &acro.sutrs; representations. - + - ISO2709-based formats (&usmarc;, etc.). Only records with a + ISO2709-based formats (&acro.usmarc;, etc.). Only records with a two-level structure (corresponding to fields and subfields) can be directly mapped to ISO2709. 
For records with a different structuring - (eg., GILS), the representation in a structure like &usmarc; involves a + (eg., GILS), the representation in a structure like &acro.usmarc; involves a schema-mapping (see ), to an - "implied" &usmarc; schema (implied, + "implied" &acro.usmarc; schema (implied, because there is no formal schema which specifies the use of the - &usmarc; fields outside of ISO2709). The resultant, two-level record is + &acro.usmarc; fields outside of ISO2709). The resultant, two-level record is then mapped directly from the internal representation to ISO2709. See the GILS schema definition files for a detailed example of this approach. @@ -1853,18 +1853,18 @@
- Extended indexing of &marc; records + Extended indexing of &acro.marc; records - Extended indexing of &marc; records will help you if you need index a + Extended indexing of &acro.marc; records will help you if you need index a combination of subfields, or index only a part of the whole field, - or use during indexing process embedded fields of &marc; record. + or use during indexing process embedded fields of &acro.marc; record. - Extended indexing of &marc; records additionally allows: + Extended indexing of &acro.marc; records additionally allows: - to index data in LEADER of &marc; record + to index data in LEADER of &acro.marc; record @@ -1876,25 +1876,25 @@ - to index linked fields for UNI&marc; based formats + to index linked fields for UNI&acro.marc; based formats In compare with simple indexing process the extended indexing - may increase (about 2-3 times) the time of indexing process for &marc; + may increase (about 2-3 times) the time of indexing process for &acro.marc; records.
The index-formula At the beginning, we have to define the term - index-formula for &marc; records. This term helps - to understand the notation of extended indexing of &marc; records by &zebra;. + index-formula for &acro.marc; records. This term helps + to understand the notation of extended indexing of &acro.marc; records by &zebra;. Our definition is based on the document "The table - of conformity for &z3950; use attributes and R&usmarc; fields". + of conformity for &acro.z3950; use attributes and R&acro.usmarc; fields". The document is available only in russian language. @@ -1907,7 +1907,7 @@ - We know that &zebra; supports a &bib1; attribute - right truncation. + We know that &zebra; supports a &acro.bib1; attribute - right truncation. In this case, the index-formula (1) consists from forms, defined in the same way as (1) @@ -1918,7 +1918,7 @@ - The original &marc; record may be without some elements, which included in index-formula. + The original &acro.marc; record may be without some elements, which included in index-formula. @@ -1933,7 +1933,7 @@ - The position may contain any value, defined by - &marc; format. + &acro.marc; format. For example, index-formula @@ -1976,7 +1976,7 @@ - All another operands are the same as accepted in &marc; world. + All another operands are the same as accepted in &acro.marc; world. @@ -1991,7 +1991,7 @@ (.abs file). It means that names beginning with "mc-" are interpreted by &zebra; as index-formula. The database index is created and - linked with access point (&bib1; use attribute) + linked with access point (&acro.bib1; use attribute) according to this formula. For example, index-formula @@ -2018,7 +2018,7 @@ . The position may contain any value, defined by - &marc; format. For example, + &acro.marc; format. For example, index-formula @@ -2082,7 +2082,7 @@ - All another operands are the same as accepted in &marc; world. + All another operands are the same as accepted in &acro.marc; world.
@@ -2115,7 +2115,7 @@ elm mc-008[0-5] Date/time-added-to-db ! - or for R&usmarc; (this data included in 100th field) + or for R&acro.usmarc; (this data included in 100th field) elm mc-100___$a[0-7]_ Date/time-added-to-db ! @@ -2127,7 +2127,7 @@ using indicators while indexing - For R&usmarc; index-formula + For R&acro.usmarc; index-formula 70-#1$a, $g matches @@ -2143,10 +2143,10 @@ - indexing embedded (linked) fields for UNI&marc; based + indexing embedded (linked) fields for UNI&acro.marc; based formats - For R&usmarc; index-formula + For R&acro.usmarc; index-formula 4--#-$170-#1$a, $g ($c) matches ]> - + &zebra; - User's Guide and Reference @@ -30,10 +30,10 @@ &zebra; is a free, fast, friendly information management system. It - can index records in &xml;, &sgml;, &marc;, e-mail archives and many + can index records in &acro.xml;, &acro.sgml;, &acro.marc;, e-mail archives and many other formats, and quickly find them using a combination of boolean searching and relevance ranking. Search-and-retrieve - applications can be written using &api;s in a wide variety of + applications can be written using &acro.api;s in a wide variety of languages, communicating with the &zebra; server using industry-standard information-retrieval protocols or web services. diff --git a/doc/zebraidx.xml b/doc/zebraidx.xml index c8ad513..261ae76 100644 --- a/doc/zebraidx.xml +++ b/doc/zebraidx.xml @@ -8,7 +8,7 @@ %idcommon; ]> - + zebra @@ -160,7 +160,7 @@ The records located should be associated with the database name - database for access through the &z3950; server. + database for access through the &acro.z3950; server. diff --git a/doc/zebrasrv-options.xml b/doc/zebrasrv-options.xml index b0b3fdb..f2fac22 100644 --- a/doc/zebrasrv-options.xml +++ b/doc/zebrasrv-options.xml @@ -1,5 +1,5 @@ -z - Use the &z3950; protocol (default). This option and -s + Use the &acro.z3950; protocol (default). This option and -s complement each other. 
You can use both multiple times on the same command line, between listener-specifications (see below). This way, you @@ -77,7 +77,7 @@ crash at a single server instance. Heikki --> -f vconfig - This specifies an &xml; file that describes + This specifies an &acro.xml; file that describes one or more &yaz; frontend virtual servers. See section VIRTUAL HOSTS for details. @@ -221,7 +221,7 @@ Heikki --> hostname | IP-number [: portnumber] - The port number defaults to 210 (standard &z3950; port) for + The port number defaults to 210 (standard &acro.z3950; port) for privileged users (root), and 9999 for normal users. The special hostname "@" is mapped to the address INADDR_ANY, which causes the server to listen on any local @@ -230,7 +230,7 @@ Heikki --> The default behavior for zebrasrv - if started as non-priviledged user - is to establish - a single TCP/IP listener, for the &z3950; protocol, on port 9999. + a single TCP/IP listener, for the &acro.z3950; protocol, on port 9999. zebrasrv @ zebrasrv tcp:some.server.name.org:1234 @@ -240,7 +240,7 @@ Heikki --> To start the server listening on the registered port for - &z3950;, or on a filesystem socket, + &acro.z3950;, or on a filesystem socket, and to drop root privileges once the ports are bound, execute the server like this from a root shell: diff --git a/doc/zebrasrv-virtual.xml b/doc/zebrasrv-virtual.xml index eb240e5..0a43aa3 100644 --- a/doc/zebrasrv-virtual.xml +++ b/doc/zebrasrv-virtual.xml @@ -1,5 +1,5 @@ @@ -11,27 +11,27 @@ A backend can be configured to execute in a particular working - directory. Or the &yaz; frontend may perform &cql; to &rpn; conversion, thus - allowing traditional &z3950; backends to be offered as a -&sru; service. - &sru; Explain information for a particular backend may also be specified. + directory. Or the &yaz; frontend may perform &acro.cql; to &acro.rpn; conversion, thus + allowing traditional &acro.z3950; backends to be offered as a +&acro.sru; service. 
+ &acro.sru; Explain information for a particular backend may also be specified. For the HTTP protocol, the virtual host is specified in the Host header. - For the &z3950; protocol, the virtual host is specified as in the + For the &acro.z3950; protocol, the virtual host is specified as in the Initialize Request in the OtherInfo, OID 1.2.840.10003.10.1000.81.1. - Not all &z3950; clients allows the VHOST information to be set. + Not all &acro.z3950; clients allows the VHOST information to be set. For those the selection of the backend must rely on the TCP/IP information alone (port and address). - The &yaz; frontend server uses &xml; to describe the backend + The &yaz; frontend server uses &acro.xml; to describe the backend configurations. Command-line option -f - specifies filename of the &xml; configuration. + specifies filename of the &acro.xml; configuration. The configuration uses the root element yazgfs. @@ -128,9 +128,9 @@ element cql2rpn (optional) - Specifies a filename that includes &cql; to &rpn; conversion for this - backend server. See &cql; section in &yaz; manual. - If given, the backend server will only "see" a Type-1/&rpn; query. + Specifies a filename that includes &acro.cql; to &acro.rpn; conversion for this + backend server. See &acro.cql; section in &yaz; manual. + If given, the backend server will only "see" a Type-1/&acro.rpn; query. @@ -138,7 +138,7 @@ element explain (optional) - Specifies &sru; ZeeRex content for this + Specifies &acro.sru; ZeeRex content for this server - copied verbatim to the client. As things are now, some of the Explain content seems redundant because host information, etc. is also stored elsewhere. @@ -154,7 +154,7 @@ - The &xml; below configures a server that accepts connections from + The &acro.xml; below configures a server that accepts connections from two ports, TCP/IP port 9900 and a local UNIX file socket. We name the TCP/IP server public and the other server internal. 
@@ -200,7 +200,7 @@ For "server2" elements for -&cql; to &rpn; conversion +&acro.cql; to &acro.rpn; conversion is supported and explain information has been added (a short one here to keep the example small). diff --git a/doc/zebrasrv.xml b/doc/zebrasrv.xml index 7899207..f517ca6 100644 --- a/doc/zebrasrv.xml +++ b/doc/zebrasrv.xml @@ -8,7 +8,7 @@ %idcommon; ]> - + zebra @@ -31,11 +31,11 @@ DESCRIPTION Zebra is a high-performance, general-purpose structured text indexing and retrieval engine. It reads structured records in a variety of input - formats (eg. email, &xml;, &marc;) and allows access to them through exact + formats (eg. email, &acro.xml;, &acro.marc;) and allows access to them through exact boolean search expressions and relevance-ranked free-text queries. - zebrasrv is the &z3950; and &sru; frontend + zebrasrv is the &acro.z3950; and &acro.sru; frontend server for the Zebra search engine and indexer. @@ -60,14 +60,14 @@ - &z3950; Protocol Support and Behavior + &acro.z3950; Protocol Support and Behavior - &z3950; Initialization + &acro.z3950; Initialization During initialization, the server will negotiate to version 3 of the - &z3950; protocol, and the option bits for Search, Present, Scan, + &acro.z3950; protocol, and the option bits for Search, Present, Scan, NamedResultSets, and concurrentOperations will be set, if requested by the client. The maximum PDU size is negotiated down to a maximum of 1 MB by default. @@ -76,7 +76,7 @@ - &z3950; Search + &acro.z3950; Search The supported query type are 1 and 101. All operators are currently @@ -96,17 +96,17 @@ - &z3950; Present + &acro.z3950; Present The present facility is supported in a standard fashion. The requested record syntax is matched against the ones supported by the profile of - each record retrieved. If no record syntax is given, &sutrs; is the + each record retrieved. If no record syntax is given, &acro.sutrs; is the default. 
The requested element set name, again, is matched against any provided by the relevant record profiles. - &z3950; Scan + &acro.z3950; Scan The attribute combinations provided with the termListAndStartPoint are processed in the same way as operands in a query (see above). @@ -115,10 +115,10 @@ - &z3950; Sort + &acro.z3950; Sort - &z3950; specifies three different types of sort criteria. + &acro.z3950; specifies three different types of sort criteria. Of these Zebra supports the attribute specification type in which case the use attribute specifies the "Sort register". Sort registers are created for those fields that are of type "sort" in @@ -128,14 +128,14 @@ - &z3950; allows the client to specify sorting on one or more input + &acro.z3950; allows the client to specify sorting on one or more input result sets and one output result set. Zebra supports sorting on one result set only which may or may not be the same as the output result set. - &z3950; Close + &acro.z3950; Close If a Close PDU is received, the server will respond with a Close PDU with reason=FINISHED, no matter which protocol version was negotiated @@ -150,10 +150,10 @@ - &z3950; Explain + &acro.z3950; Explain Zebra maintains a "classic" - &z3950; Explain database + &acro.z3950; Explain database on the side. This database is called IR-Explain-1 and can be searched using the attribute set exp-1. @@ -176,54 +176,54 @@ - The &sru; Server + The &acro.sru; Server - In addition to &z3950;, Zebra supports the more recent and - web-friendly IR protocol &sru;. - &sru; can be carried over &soap; or a &rest;-like protocol - that uses HTTP &get; or &post; to request search responses. The request + In addition to &acro.z3950;, Zebra supports the more recent and + web-friendly IR protocol &acro.sru;. + &acro.sru; can be carried over &acro.soap; or a &acro.rest;-like protocol + that uses HTTP &acro.get; or &acro.post; to request search responses. 
The request itself is made of parameters such as query, startRecord, maximumRecords and recordSchema; - the response is an &xml; document containing hit-count, result-set - records, diagnostics, etc. &sru; can be thought of as a re-casting - of &z3950; semantics in web-friendly terms; or as a standardisation + the response is an &acro.xml; document containing hit-count, result-set + records, diagnostics, etc. &acro.sru; can be thought of as a re-casting + of &acro.z3950; semantics in web-friendly terms; or as a standardisation of the ad-hoc query parameters used by search engines such as Google and AltaVista; or as a superset of A9's OpenSearch (which it predates). - Zebra supports &z3950;, &sru; &get;, SRU &post;, SRU &soap; (&srw;) + Zebra supports &acro.z3950;, &acro.sru; &acro.get;, SRU &acro.post;, SRU &acro.soap; (&acro.srw;) - on the same port, recognising what protocol is used by each incoming requests and handling them accordingly. This is a achieved through the use of Deep Magic; civilians are warned not to stand too close. - Running zebrasrv as an &sru; Server + Running zebrasrv as an &acro.sru; Server Because Zebra supports all protocols on one port, it would - seem to follow that the &sru; server is run in the same way as - the &z3950; server, as described above. This is true, but only in + seem to follow that the &acro.sru; server is run in the same way as + the &acro.z3950; server, as described above. This is true, but only in an uninterestingly vacuous way: a Zebra server run in this manner - will indeed recognise and accept &sru; requests; but since it - doesn't know how to handle the &cql; queries that these protocols + will indeed recognise and accept &acro.sru; requests; but since it + doesn't know how to handle the &acro.cql; queries that these protocols use, all it can do is send failure responses. 
- It is possible to cheat, by having &sru; search Zebra with - a &pqf; query instead of &cql;, using the + It is possible to cheat, by having &acro.sru; search Zebra with + a &acro.pqf; query instead of &acro.cql;, using the x-pquery parameter instead of query. This is a non-standard extension - of &cql;, and a + of &acro.cql;, and a very naughty - thing to do, but it does give you a way to see Zebra serving &sru; + thing to do, but it does give you a way to see Zebra serving &acro.sru; ``right out of the box''. If you start your favourite Zebra server in the usual way, on port 9999, then you can send your web browser to: @@ -236,20 +236,20 @@ &maximumRecords=1 - This will display the &xml;-formatted &sru; response that includes the + This will display the &acro.xml;-formatted &acro.sru; response that includes the first record in the result-set found by the query - mineral. (For clarity, the &sru; URL is shown + mineral. (For clarity, the &acro.sru; URL is shown here broken across lines, but the lines should be joined to gether to make single-line URL for the browser to submit.) - In order to turn on Zebra's support for &cql; queries, it's necessary + In order to turn on Zebra's support for &acro.cql; queries, it's necessary to have the &yaz; generic front-end (which Zebra uses) translate them - into the &z3950; Type-1 query format that is used internally. And + into the &acro.z3950; Type-1 query format that is used internally. And to do this, the generic front-end's own configuration file must be used. See ; - the salient point for &sru; support is that + the salient point for &acro.sru; support is that zebrasrv must be started with the -f frontendConfigFile @@ -257,7 +257,7 @@ -c zebraConfigFile option, and that the front-end configuration file must include both a - reference to the Zebra configuration file and the &cql;-to-&pqf; + reference to the Zebra configuration file and the &acro.cql;-to-&acro.pqf; translator configuration file. 
@@ -281,13 +281,13 @@ -c command-line argument, and the <cql2rpn> - element contains the name of the &cql; properties file specifying how - various &cql; indexes, relations, etc. are translated into Type-1 + element contains the name of the &acro.cql; properties file specifying how + various &acro.cql; indexes, relations, etc. are translated into Type-1 queries. A zebra server running with such a configuration can then be - queried using proper, conformant &sru; URLs with &cql; queries: + queried using proper, conformant &acro.sru; URLs with &acro.cql; queries: http://localhost:9999/Default?version=1.1 @@ -299,107 +299,107 @@ - &sru; Protocol Support and Behavior + &acro.sru; Protocol Support and Behavior - Zebra running as an &sru; server supports SRU version 1.1, including - &cql; version 1.1. In particular, it provides support for the + Zebra running as an &acro.sru; server supports SRU version 1.1, including + &acro.cql; version 1.1. In particular, it provides support for the following elements of the protocol. - &sru; Search and Retrieval + &acro.sru; Search and Retrieval Zebra supports the - &sru; searchRetrieve + &acro.sru; searchRetrieve operation. - One of the great strengths of &sru; is that it mandates a standard - query language, &cql;, and that all conforming implementations can + One of the great strengths of &acro.sru; is that it mandates a standard + query language, &acro.cql;, and that all conforming implementations can therefore be trusted to correctly interpret the same queries. It is with some shame, then, that we admit that Zebra also supports an additional query language, our own Prefix Query Format - (&pqf;). - A &pqf; query is submitted by using the extension parameter + (&acro.pqf;). + A &acro.pqf; query is submitted by using the extension parameter x-pquery, in which case the query - parameter must be omitted, which makes the request not valid &sru;. + parameter must be omitted, which makes the request not valid &acro.sru;. 
Please feel free to use this facility within your own - applications; but be aware that it is not only non-standard &sru; + applications; but be aware that it is not only non-standard &acro.sru; but not even syntactically valid, since it omits the mandatory query parameter. - &sru; Scan + &acro.sru; Scan - Zebra supports &sru; scan + Zebra supports &acro.sru; scan operation. - Scanning using &cql; syntax is the default, where the + Scanning using &acro.cql; syntax is the default, where the standard scanClause parameter is used. In addition, a - mutant form of &sru; scan is supported, using + mutant form of &acro.sru; scan is supported, using the non-standard x-pScanClause parameter in place of the standard scanClause to scan on a - &pqf; query clause. + &acro.pqf; query clause. - &sru; Explain + &acro.sru; Explain - Zebra supports &sru; explain. + Zebra supports &acro.sru; explain. The ZeeRex record explaining a database may be requested either - with a fully fledged &sru; request (with + with a fully fledged &acro.sru; request (with operation=explain and version-number specified) - or with a simple HTTP &get; at the server's basename. + or with a simple HTTP &acro.get; at the server's basename. The ZeeRex record returned in response is the one embedded in the &yaz; Frontend Server configuration file that is described in the . Unfortunately, the data found in the - &cql;-to-&pqf; text file must be added by hand-craft into the explain + &acro.cql;-to-&acro.pqf; text file must be added by hand-craft into the explain section of the &yaz; Frontend Server configuration file to be able to provide a suitable explain record. Too bad, but this is all extreme new alpha stuff, and a lot of work has yet to be done .. 
- There is no linkeage whatsoever between the &z3950; explain model - and the &sru; explain response (well, at least not implemented + There is no linkeage whatsoever between the &acro.z3950; explain model + and the &acro.sru; explain response (well, at least not implemented in Zebra, that is ..). Zebra does not provide a means using - &z3950; to obtain the ZeeRex record. + &acro.z3950; to obtain the ZeeRex record. - Other &sru; operations + Other &acro.sru; operations - In the &z3950; protocol, Initialization, Present, Sort and Close - are separate operations. In &sru;, however, these operations do not + In the &acro.z3950; protocol, Initialization, Present, Sort and Close + are separate operations. In &acro.sru;, however, these operations do not exist. - &sru; has no explicit initialization handshake phase, but + &acro.sru; has no explicit initialization handshake phase, but commences immediately with searching, scanning and explain operations. - Neither does &sru; have a close operation, since the protocol is + Neither does &acro.sru; have a close operation, since the protocol is stateless and each request is self-contained. (It is true that - multiple &sru; request/response pairs may be implemented as + multiple &acro.sru; request/response pairs may be implemented as multiple HTTP request/response pairs over a single persistent TCP/IP connection; but the closure of that connection is not a protocol-level operation.) @@ -407,12 +407,12 @@ - Retrieval in &sru; is part of the + Retrieval in &acro.sru; is part of the searchRetrieve operation, in which a search is submitted and the response includes a subset of the records - in the result set. There is no direct analogue of &z3950;'s + in the result set. There is no direct analogue of &acro.z3950;'s Present operation which requests records from an established - result set. In &sru;, this is achieved by sending a subsequent + result set. 
In &acro.sru;, this is achieved by sending a subsequent searchRetrieve request with the query cql.resultSetId=id where id is the identifier of the previously @@ -421,24 +421,24 @@ - Sorting in &cql; is done within the + Sorting in &acro.cql; is done within the searchRetrieve operation - in v1.1, by an explicit sort parameter, but the forthcoming v1.2 or v2.0 will most likely use an extension of the query - language, &cql; sorting. + language, &acro.cql; sorting. - It can be seen, then, that while Zebra operating as an &sru; server + It can be seen, then, that while Zebra operating as an &acro.sru; server does not provide the same set of operations as when operating as a - &z3950; server, it does provide equivalent functionality. + &acro.z3950; server, it does provide equivalent functionality. - &sru; Examples + &acro.sru; Examples Surf into http://localhost:9999 to get an explain response, or use @@ -462,7 +462,7 @@ ]]> - Even search using &pqf; queries using the extended naughty + Even search using &acro.pqf; queries using the extended naughty parameter x-pquery