From b19b79e382ef8196f1625763db1af3a82b1e0c81 Mon Sep 17 00:00:00 2001 From: Marc Cromme Date: Fri, 2 Feb 2007 11:10:08 +0000 Subject: [PATCH] replaces acronymes in XML text with new defined acronyme entities --- doc/administration.xml | 92 +++++++------- doc/architecture.xml | 98 +++++++-------- doc/examples.xml | 44 +++---- doc/installation.xml | 32 ++--- doc/introduction.xml | 106 ++++++++-------- doc/marc_indexing.xml | 52 ++++---- doc/querymodel.xml | 274 ++++++++++++++++++++--------------------- doc/quickstart.xml | 14 +-- doc/recordmodel-alvisxslt.xml | 96 +++++++-------- doc/recordmodel-grs.xml | 166 ++++++++++++------------- doc/zebra.xml | 4 +- doc/zebraidx.xml | 4 +- doc/zebrasrv-options.xml | 14 +-- doc/zebrasrv-synopsis.xml | 4 +- doc/zebrasrv-virtual.xml | 36 +++--- doc/zebrasrv.xml | 174 +++++++++++++------------- 16 files changed, 605 insertions(+), 605 deletions(-) diff --git a/doc/administration.xml b/doc/administration.xml index 13baec6..d47fc77 100644 --- a/doc/administration.xml +++ b/doc/administration.xml @@ -1,5 +1,5 @@ - + Administrating &zebra; @@ -418,7 +418,7 @@ of permissions currently: read (r) and write(w). By default users not listed in a permission directive are given the read privilege. To specify permissions for a user with no - username, or Z39.50 anonymous style use + username, or &z3950; anonymous style use anonymous. The permstring consists of a sequence of characters. Include character w for write/update access, r for read access and @@ -465,7 +465,7 @@ mounted on a CD-ROM drive, you may want &zebra; to make an internal copy of them. To do this, you specify 1 (true) in the storeData setting. When - the Z39.50 server retrieves the records they will be read from the + the &z3950; server retrieves the records they will be read from the internal file structures of the system. @@ -494,7 +494,7 @@ Consider a system in which you have a group of text files called simple. - That group of records should belong to a Z39.50 database called + That group of records should belong to a &z3950; database called textbase. The following zebra.cfg file will suffice: @@ -613,7 +613,7 @@ information. If you have a group of records that explicitly associates an ID with each record, this method is convenient. For example, the record format may contain a title or a ID-number - unique within the group. - In either case you specify the Z39.50 attribute set and use-attribute + In either case you specify the &z3950; attribute set and use-attribute location in which this information is stored, and the system looks at that field to determine the identity of the record. @@ -700,7 +700,7 @@ For instance, the sample GILS records that come with the &zebra; distribution contain a unique ID in the data tagged Control-Identifier. - The data is mapped to the Bib-1 use attribute Identifier-standard + The data is mapped to the &bib1; use attribute Identifier-standard (code 1007). To use this field as a record id, specify (bib1,Identifier-standard) as the value of the recordId in the configuration file. @@ -1049,7 +1049,7 @@ The experimental alvis filter provides a - directive to fetch static rank information out of the indexed XML + directive to fetch static rank information out of the indexed &xml; records, thus making all hit sets ordered after ascending static rank, and for those doc's which have the same static rank, ordered @@ -1086,21 +1086,21 @@ indexing time (this is why we call it ``dynamic ranking'' in the first place ...) It is invoked by adding - the Bib-1 relation attribute with - value ``relevance'' to the PQF query (that is, + the &bib1; relation attribute with + value ``relevance'' to the &pqf; query (that is, @attr 2=102, see also - The BIB-1 Attribute Set Semantics, also in + The &bib1; Attribute Set Semantics, also in HTML). To find all articles with the word Eoraptor in - the title, and present them relevance ranked, issue the PQF query: + the title, and present them relevance ranked, issue the &pqf; query: @attr 2=102 @attr 1=4 Eoraptor - Dynamically ranking using PQF queries with the 'rank-1' + <title>Dynamically ranking using &pqf; queries with the 'rank-1' algorithm @@ -1167,7 +1167,7 @@ It is possible to apply dynamic ranking on only parts of the - PQF query: + &pqf; query: @and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer @@ -1202,7 +1202,7 @@ Ranking weights may be used to pass a value to a ranking - algorithm, using the non-standard BIB-1 attribute type 9. + algorithm, using the non-standard &bib1; attribute type 9. This allows one branch of a query to use one value while another branch uses a different one. For example, we can search for utah in the @@ -1214,7 +1214,7 @@ The default weight is - sqrt(1000) ~ 34 , as the Z39.50 standard prescribes that the top score + sqrt(1000) ~ 34 , as the &z3950; standard prescribes that the top score is 1000 and the bottom score is 0, encoded in integers. @@ -1339,7 +1339,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci @@ -1555,10 +1555,10 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci - Extended services in the Z39.50 protocol + Extended services in the &z3950; protocol - The Z39.50 standard allows + The &z3950; standard allows servers to accept special binary extended services protocol packages, which may be used to insert, update and delete records into servers. These carry control and update @@ -1566,7 +1566,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci - Extended services Z39.50 Package Fields + Extended services &z3950; Package Fields @@ -1594,13 +1594,13 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci record - XML string - An XML formatted string containing the record + &xml; string + An &xml; formatted string containing the record syntax 'xml' - Only XML record syntax is supported + Only &xml; record syntax is supported recordIdOpaque @@ -1663,7 +1663,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci When retrieving existing - records indexed with GRS indexing filters, the &zebra; internal + records indexed with &grs1; indexing filters, the &zebra; internal ID number is returned in the field /*/id:idzebra/localnumber in the namespace xmlns:id="http://www.indexdata.dk/zebra/", @@ -1712,7 +1712,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci ]]> Now the Default database was created, - we can insert an XML file (esdd0006.grs + we can insert an &xml; file (esdd0006.grs from example/gils/records) and index it: rset) is the count of all documents in this speci Extended services from yaz-php - Extended services are also available from the YAZ PHP client layer. An - example of an YAZ-PHP extended service transaction is given here: + Extended services are also available from the &yaz; &php; client layer. An + example of an &yaz;-&php; extended service transaction is given here: A fine specimen of a record'; diff --git a/doc/architecture.xml b/doc/architecture.xml index 86a04fb..fd89051 100644 --- a/doc/architecture.xml +++ b/doc/architecture.xml @@ -1,5 +1,5 @@ - + Overview of &zebra; Architecture
@@ -87,9 +87,9 @@ Search Evaluation - by execution of search requests expressed in PQF/RPN + by execution of search requests expressed in &pqf;/&rpn; data structures, which are handed over from - the YAZ server frontend API. Search evaluation includes + the &yaz; server frontend &api;. Search evaluation includes construction of hit lists according to boolean combinations of simpler searches. Fast performance is achieved by careful use of index structures, and by evaluation specific index hit @@ -112,7 +112,7 @@ Record Presentation returns - possibly ranked - result sets, hit - numbers, and the like internal data to the YAZ server backend API + numbers, and the like internal data to the &yaz; server backend &api; for shipping to the client. Each individual filter module implements it's own specific presentation formats. @@ -149,7 +149,7 @@
&zebra; Searcher/Retriever - This is the executable which runs the Z39.50/SRU/SRW server and + This is the executable which runs the &z3950;/&sru;/&srw; server and glues together the core libraries and the filter modules to one great Information Retrieval server application. @@ -160,32 +160,32 @@
- YAZ Server Frontend + &yaz; Server Frontend - The YAZ server frontend is - a full fledged stateful Z39.50 server taking client + The &yaz; server frontend is + a full fledged stateful &z3950; server taking client connections, and forwarding search and scan requests to the &zebra; core indexer. - In addition to Z39.50 requests, the YAZ server frontend acts + In addition to &z3950; requests, the &yaz; server frontend acts as HTTP server, honoring - SRU SOAP + &sru; &soap; requests, and - SRU REST + &sru; &rest; requests. Moreover, it can translate incoming - CQL + &cql; queries to - PQF + &pqf; queries, if correctly configured. - YAZ + &yaz; is an Open Source toolkit that allows you to develop software using the - ANSI Z39.50/ISO23950 standard for information retrieval. + &ansi; &z3950;/ISO23950 standard for information retrieval. It is packaged in the Debian packages yaz and libyaz. @@ -209,26 +209,26 @@
- ALVIS XML Record Model and Filter Module + ALVIS &xml; Record Model and Filter Module - The Alvis filter for XML files is an XSLT based input + The Alvis filter for &xml; files is an &xslt; based input filter. - It indexes element and attribute content of any thinkable XML format - using full XPATH support, a feature which the standard &zebra; - GRS SGML and XML filters lacked. The indexed documents are - parsed into a standard XML DOM tree, which restricts record size + It indexes element and attribute content of any thinkable &xml; format + using full &xpath; support, a feature which the standard &zebra; + &grs1; &sgml; and &xml; filters lacked. The indexed documents are + parsed into a standard &xml; &dom; tree, which restricts record size according to availability of memory. The Alvis filter - uses XSLT display stylesheets, which let + uses &xslt; display stylesheets, which let the &zebra; DB administrator associate multiple, different views on - the same XML document type. These views are chosen on-the-fly in + the same &xml; document type. These views are chosen on-the-fly in search time. In addition, the Alvis filter configuration is not bound to the - arcane BIB-1 Z39.50 library catalogue indexing traditions and + arcane &bib1; &z3950; library catalogue indexing traditions and folklore, and is therefore easier to understand. @@ -241,7 +241,7 @@ their Pagerank algorithm. - Details on the experimental Alvis XSLT filter are found in + Details on the experimental Alvis &xslt; filter are found in . @@ -251,27 +251,27 @@
- GRS Record Model and Filter Modules + &grs1; Record Model and Filter Modules - The GRS filter modules described in + The &grs1; filter modules described in - are all based on the Z39.50 specifications, and it is absolutely - mandatory to have the reference pages on BIB-1 attribute sets on - you hand when configuring GRS filters. The GRS filters come in + are all based on the &z3950; specifications, and it is absolutely + mandatory to have the reference pages on &bib1; attribute sets on + you hand when configuring &grs1; filters. The GRS filters come in different flavors, and a short introduction is needed here. - GRS filters of various kind have also been called ABS filters due + &grs1; filters of various kind have also been called ABS filters due to the *.abs configuration file suffix. The grs.marc and grs.marcxml filters are suited to parse and - index binary and XML versions of traditional library MARC records + index binary and &xml; versions of traditional library &marc; records based on the ISO2709 standard. The Debian package for both filters is libidzebra-2.0-mod-grs-marc. - GRS TCL scriptable filters for extensive user configuration come + &grs1; TCL scriptable filters for extensive user configuration come in two flavors: a regular expression filter grs.regx using TCL regular expressions, and a general scriptable TCL filter called @@ -280,7 +280,7 @@ libidzebra-2.0-mod-grs-regx Debian package. - A general purpose SGML filter is called + A general purpose &sgml; filter is called grs.sgml. This filter is not yet packaged, but planned to be in the libidzebra-2.0-mod-grs-sgml Debian package. @@ -290,8 +290,8 @@ libidzebra-2.0-mod-grs-xml includes the grs.xml filter which uses Expat to - parse records in XML and turn them into ID&zebra;'s internal GRS node - trees. Have also a look at the Alvis XML/XSLT filter described in + parse records in &xml; and turn them into ID&zebra;'s internal &grs1; node + trees. Have also a look at the Alvis &xml;/&xslt; filter described in the next session.
@@ -332,8 +332,8 @@ When records are accessed by the system, they are represented - in their local, or native format. This might be SGML or HTML files, - News or Mail archives, MARC records. If the system doesn't already + in their local, or native format. This might be &sgml; or HTML files, + News or Mail archives, &marc; records. If the system doesn't already know how to read the type of data you need to store, you can set up an input filter by preparing conversion rules based on regular expressions and possibly augmented by a flexible scripting language @@ -360,7 +360,7 @@ Before transmitting records to the client, they are first converted from the internal structure to a form suitable for exchange - over the network - according to the Z39.50 standard. + over the network - according to the &z3950; standard. @@ -382,7 +382,7 @@ In particular, the regular record filters are not invoked when these are in use. This can in some cases make the retrival faster than regular - retrieval operations (for MARC, XML etc). + retrieval operations (for &marc;, &xml; etc).
Special Retrieval Elements @@ -398,7 +398,7 @@ zebra::meta::sysno Get &zebra; record system ID - XML and SUTRS + &xml; and &sutrs; zebra::data @@ -408,12 +408,12 @@ zebra::meta Get &zebra; record internal metadata - XML and SUTRS + &xml; and &sutrs; zebra::index Get all indexed keys for record - XML and SUTRS + &xml; and &sutrs; @@ -422,7 +422,7 @@ Get indexed keys for field f for record - XML and SUTRS + &xml; and &sutrs; @@ -432,7 +432,7 @@ Get indexed keys for field f and type t for record - XML and SUTRS + &xml; and &sutrs; @@ -467,7 +467,7 @@ Z> elements zebra::meta::sysno Z> s 1+1 - displays in XML record syntax only internal + displays in &xml; record syntax only internal record system number, whereas Z> f @attr 1=title my @@ -498,7 +498,7 @@ Z> s 1+1 will display all indexed tokens from all indexed fields of the - first record, and it will display in SUTRS + first record, and it will display in &sutrs; record syntax, whereas Z> f @attr 1=title my @@ -508,13 +508,13 @@ Z> elements zebra::index::title:p Z> s 1+1 - displays in XML record syntax only the content + displays in &xml; record syntax only the content of the zebra string index title, or even only the type p phrase indexed part of it. - Trying to access numeric Bib-1 use + Trying to access numeric &bib1; use attributes or trying to access non-existent zebra intern string access points will result in a Diagnostic 25: Specified element set 'name not valid for specified database. diff --git a/doc/examples.xml b/doc/examples.xml index a2c2ab3..2681945 100644 --- a/doc/examples.xml +++ b/doc/examples.xml @@ -1,5 +1,5 @@ - + Example Configurations @@ -61,12 +61,12 @@ - Example 1: XML Indexing And Searching + Example 1: &xml; Indexing And Searching This example shows how &zebra; can be used with absolutely minimal configuration to index a body of - XML + &xml; documents, and search them using XPath expressions to specify access points. @@ -81,14 +81,14 @@ records are generated from the family tree in the file dino.tree.) Type make records/dino.xml - to make the XML data file. - (Or you could just type make dino to build the XML + to make the &xml; data file. + (Or you could just type make dino to build the &xml; data file, create the database and populate it with the taxonomic records all in one shot - but then you wouldn't learn anything, would you? :-) - Now we need to create a &zebra; database to hold and index the XML + Now we need to create a &zebra; database to hold and index the &xml; records. We do this with the &zebra; indexer, zebraidx, which is driven by the zebra.cfg configuration file. @@ -103,7 +103,7 @@ That's all you need for a minimal &zebra; configuration. Now you can - roll the XML records into the database and build the indexes: + roll the &xml; records into the database and build the indexes: zebraidx update records @@ -121,8 +121,8 @@ . - Now you can use the Z39.50 client program of your choice to execute - XPath-based boolean queries and fetch the XML records that satisfy + Now you can use the &z3950; client program of your choice to execute + XPath-based boolean queries and fetch the &xml; records that satisfy them: $ yaz-client @:9999 @@ -187,8 +187,8 @@ How, then, can we build broadcasting Information Retrieval applications that look for records in many different databases? - The Z39.50 protocol offers a powerful and general solution to this: - abstract ``access points''. In the Z39.50 model, an access point + The &z3950; protocol offers a powerful and general solution to this: + abstract ``access points''. In the &z3950; model, an access point is simply a point at which searches can be directed. Nothing is said about implementation: in a given database, an access point might be implemented as an index, a path into physical records, an @@ -198,7 +198,7 @@ For convenience, access points are gathered into attribute - sets. For example, the BIB-1 attribute set is supposed to + sets. For example, the &bib1; attribute set is supposed to contain bibliographic access points such as author, title, subject and ISBN; the GEO attribute set contains access points pertaining to geospatial information (bounding coordinates, stratum, latitude @@ -207,7 +207,7 @@ (provenance, inscriptions, etc.) - In practice, the BIB-1 attribute set has tended to be a dumping + In practice, the &bib1; attribute set has tended to be a dumping ground for all sorts of access points, so that, for example, it includes some geospatial access points as well as strictly bibliographic ones. Nevertheless, this model @@ -215,21 +215,21 @@ records in databases. - In the BIB-1 attribute set, a taxon name is probably best + In the &bib1; attribute set, a taxon name is probably best interpreted as a title - that is, a phrase that identifies the item - in question. BIB-1 represents title searches by + in question. &bib1; represents title searches by access point 4. (See - The BIB-1 Attribute + The &bib1; Attribute Set Semantics) So we need to configure our dinosaur database so that searches for - BIB-1 access point 4 look in the + &bib1; access point 4 look in the <termName> element, inside the top-level <Zthes> element. This is a two-step process. First, we need to tell &zebra; that we - want to support the BIB-1 attribute set. Then we need to tell it + want to support the &bib1; attribute set. Then we need to tell it which elements of its record pertain to access point 4. @@ -270,7 +270,7 @@ xelm /Zthes/termModifiedBy termModifiedBy:w - Declare Bib-1 attribute set. See bib1.att in + Declare &bib1; attribute set. See bib1.att in &zebra;'s tab directory. @@ -284,13 +284,13 @@ xelm /Zthes/termModifiedBy termModifiedBy:w Make termName word searchable by both - Zthes attribute termName (1002) and Bib-1 atttribute title (4). + Zthes attribute termName (1002) and &bib1; atttribute title (4). - After re-indexing, we can search the database using Bib-1 + After re-indexing, we can search the database using &bib1; attribute, title, as follows: Z> form xml @@ -305,7 +305,7 @@ Elapsed: 0.106896 Z> s Sent presentRequest (1+1). Records: 1 -[Default]Record type: XML +[Default]Record type: &xml; <Zthes> <termId>2</termId> <termName>Eoraptor</termName> diff --git a/doc/installation.xml b/doc/installation.xml index 328c1c2..2603948 100644 --- a/doc/installation.xml +++ b/doc/installation.xml @@ -1,8 +1,8 @@ - + Installation - &zebra; is written in ANSI C and was implemented with portability in mind. + &zebra; is written in &ansi; C and was implemented with portability in mind. We primarily use GCC on UNIX and Microsoft Visual C++ on Windows. @@ -30,9 +30,9 @@ (required) - &zebra; uses YAZ to support Z39.50 / - SRU. - Also the memory management utilites from YAZ is used by &zebra;. + &zebra; uses &yaz; to support &z3950; / + &sru;. + Also the memory management utilites from &yaz; is used by &zebra;. @@ -52,7 +52,7 @@ (optional) - XML parser. If you're going to index real XML you should + &xml; parser. If you're going to index real &xml; you should install this (filter grs.xml). On most systems you should be able to find binary Expat packages. @@ -103,7 +103,7 @@ On Unix, GCC works fine, but any native C compiler should be possible to use as long as it is - ANSI C compliant. + &ansi; C compliant. @@ -160,7 +160,7 @@ zebrasrv - The Z39.50 server and search engine. + The &z3950; server and search engine. @@ -179,8 +179,8 @@ The .so-files are &zebra; record filter modules. There are modules for reading - MARC (mod-grs-marc.so), - XML (mod-grs-xml.so) , etc. + &marc; (mod-grs-marc.so), + &xml; (mod-grs-xml.so) , etc. @@ -323,7 +323,7 @@ YAZDIR - Directory of YAZ source. &zebra;'s makefile expects to find + Directory of &yaz; source. &zebra;'s makefile expects to find yaz.lib, yaz.dll in yazdir/lib and yazdir/bin respectively. @@ -374,7 +374,7 @@ The DEBUG setting in the makefile for &zebra; must be set to the same value as DEBUG setting in the - makefile for YAZ. + makefile for &yaz;. If not, the &zebra; server/indexer will crash. @@ -453,7 +453,7 @@ redirection to other fields. For example the following snippet of a custom custom/bib1.att - Bib-1 attribute set definition file is no + &bib1; attribute set definition file is no longer supported: att 1016 Any 1016,4,1005,62 @@ -465,7 +465,7 @@ Similar behaviour can be expressed in the new release by defining - a new index Any:w in all GRS + a new index Any:w in all &grs1; *.abs record indexing configuration files. The above example configuration needs to make the changes from version 1.3.x indexing instructions @@ -486,13 +486,13 @@ att 1016 Body-of-text - with equivalent outcome without editing all GRS + with equivalent outcome without editing all &grs1; *.abs record indexing configuration files. Server installations which use the special - IDXPATH attribute set must add the following + &idxpath; attribute set must add the following line to the zebra.cfg configuration file: attset: idxpath.att diff --git a/doc/introduction.xml b/doc/introduction.xml index ff4e41e..47096a3 100644 --- a/doc/introduction.xml +++ b/doc/introduction.xml @@ -1,5 +1,5 @@ - + Introduction
@@ -7,10 +7,10 @@ &zebra; is a free, fast, friendly information management system. It can - index records in XML/SGML, MARC, e-mail archives and many other + index records in &xml;/&sgml;, &marc;, e-mail archives and many other formats, and quickly find them using a combination of boolean searching and relevance ranking. Search-and-retrieve applications can - be written using APIs in a wide variety of languages, communicating + be written using &api;s in a wide variety of languages, communicating with the &zebra; server using industry-standard information-retrieval protocols or web services. @@ -23,7 +23,7 @@ &zebra; is a networked component which acts as a reliable &z3950; server for both record/document search, presentation, insert, update and delete operations. In addition, it understands the &sru; family of - webservices, which exist in REST GET/POST and truly SOAP flavors. + webservices, which exist in &rest; &get;/&post; and truly &soap; flavors. &zebra; is available as MS Windows 2003 Server (32 bit) self-extracting @@ -42,7 +42,7 @@ &zebra; is a high-performance, general-purpose structured text indexing and retrieval engine. It reads records in a - variety of input formats (eg. email, XML, MARC) and provides access + variety of input formats (eg. email, &xml;, &marc;) and provides access to them through a powerful combination of boolean search expressions and relevance-ranked free-text queries. @@ -51,13 +51,13 @@ &zebra; supports large databases (tens of millions of records, tens of gigabytes of data). It allows safe, incremental database updates on live systems. Because &zebra; supports - the industry-standard information retrieval protocol, Z39.50, + the industry-standard information retrieval protocol, &z3950;, you can search &zebra; databases using an enormous variety of programs and toolkits, both commercial and free, which understand this protocol. Application libraries are available to allow bespoke clients to be written in Perl, C, C++, Java, Tcl, Visual - Basic, Python, PHP and more - see the - ZOOM web site + Basic, Python, &php; and more - see the + &zoom; web site for more information on some of these client toolkits. @@ -87,24 +87,24 @@
Boolean query language - CQL and RPN/PQF - The type-1 Reverse Polish Notation (RPN) - and it's textual representation Prefix Query Format (PQF) are - supported. The Common Query Language (CQL) can be configured as - a mapping from CQL to RPN/PQF + &cql; and &rpn;/&pqf; + The type-1 Reverse Polish Notation (&rpn;) + and it's textual representation Prefix Query Format (&pqf;) are + supported. The Common Query Language (&cql;) can be configured as + a mapping from &cql; to &rpn;/&pqf; Operation types - Z39.50/SRU explain, search, and scan + &z3950;/&sru; explain, search, and scan Recursive boolean query tree - CQL and RPN/PQF - Both CQL and RPN/PQF allow atomic query parts (APT) to + &cql; and &rpn;/&pqf; + Both &cql; and &rpn;/&pqf; allow atomic query parts (&apt;) to be combined into complex boolean query trees @@ -119,8 +119,8 @@ Complex semi-structured Documents - XML and GRS-1 Documents - Both XML and GRS-1 documents exhibit a DOM like internal + &xml; and &grs1; Documents + Both &xml; and &grs1; documents exhibit a &dom; like internal representation allowing for complex indexing and display rules @@ -138,12 +138,12 @@ Input document formats - XML, SGML, Text, ISO2709 (MARC) + &xml;, &sgml;, Text, ISO2709 (&marc;) A system of input filters driven by regular expressions allows most ASCII-based data formats to be easily processed. - SGML, XML, ISO2709 (MARC), and raw text are also + &sgml;, &xml;, ISO2709 (&marc;), and raw text are also supported. @@ -178,7 +178,7 @@ Remote update - Z39.50 extended services + &z3950; extended services @@ -192,12 +192,12 @@ - Z39.50 - Z39.50 protocol support + &z3950; + &z3950; protocol support Protocol facilities: Init, Search, Present (retrieval), Segmentation (support for very large records), Delete, Scan (index browsing), Sort, Close and support for the ``update'' - Extended Service to add or replace an existing XML + Extended Service to add or replace an existing &xml; record. Piggy-backed presents are honored in the search request. Named result sets are supported. @@ -206,18 +206,18 @@ Record Syntaxes Multiple record syntaxes - for data retrieval: GRS-1, SUTRS, - XML, ISO2709 (MARC), etc. Records can be mapped between record syntaxes + for data retrieval: &grs1;, &sutrs;, + &xml;, ISO2709 (&marc;), etc. Records can be mapped between record syntaxes and schemas on the fly. Web Service support - SRU GET/POST/SOAP + &sru_gps; The protocol operations explain, searchRetrieve and scan - are supported. CQL to internal - query model RPN conversion is supported. Extended RPN queries + are supported. &cql; to internal + query model &rpn; conversion is supported. Extended RPN queries for search/retrieve and scan are supported. @@ -316,7 +316,7 @@ In early 2005, the Koha project development team began looking at - ways to improve MARC support and overcome scalability limitations + ways to improve &marc; support and overcome scalability limitations in the Koha 2.x series. After extensive evaluations of the best of the Open Source textual database engines - including MySQL full-text searching, PostgreSQL, Lucene and Plucene - the team @@ -335,7 +335,7 @@ and relevance-ranked free-text queries, both of which the Koha 2.x series lack. &zebra; also supports incremental and safe database updates, which allow on-the-fly record - management. Finally, since &zebra; has at its heart the Z39.50 + management. Finally, since &zebra; has at its heart the &z3950; protocol, it greatly improves Koha's support for that critical library standard." @@ -365,12 +365,12 @@ from virtually any computer with an Internet connection, has template based layout allowing anyone to alter the visual appearance of Emilda, and is - XML based language for fast and easy portability to virtually any + &xml; based language for fast and easy portability to virtually any language. Currently, Emilda is used at three schools in Espoo, Finland. - As a surplus, 100% MARC compatibility has been achieved using the + As a surplus, 100% &marc; compatibility has been achieved using the &zebra; Server from Index Data as backend server. @@ -382,18 +382,18 @@ is a netbased library service offering all traditional functions on a very high level plus many new services. Reindex.net is a comprehensive and powerful WEB system - based on standards such as XML and Z39.50. - updates. Reindex supports MARC21, danMARC eller Dublin Core with + based on standards such as &xml; and &z3950;. + updates. Reindex supports &marc21;, dan&marc; eller Dublin Core with UTF8-encoding. Reindex.net runs on GNU/Debian Linux with &zebra; and Simpleserver from Index Data for bibliographic data. The relational database system - Sybase 9 XML is used for + Sybase 9 &xml; is used for administrative data. - Internally MARCXML is used for bibliographical records. Update - utilizes Z39.50 extended services. + Internally &marcxml; is used for bibliographical records. Update + utilizes &z3950; extended services. @@ -458,8 +458,8 @@ The &zebra; information retrieval indexing machine is used inside the Alvis framework to manage huge collections of natural language processed and - enhanced XML data, coming from a topic relevant web crawl. - In this application, &zebra; swallows and manages 37GB of XML data + enhanced &xml; data, coming from a topic relevant web crawl. + In this application, &zebra; swallows and manages 37GB of &xml; data in about 4 hours, resulting in search times of fractions of seconds. @@ -481,9 +481,9 @@ The member libraries send in data files representing their periodicals, including both brief bibliographic data and summary - holdings. Then 21 individual Z39.50 targets are created, each + holdings. Then 21 individual &z3950; targets are created, each using &zebra;, and all mounted on the single hardware server. - The live service provides a web gateway allowing Z39.50 searching + The live service provides a web gateway allowing &z3950; searching of all of the targets or a selection of them. &zebra;'s small footprint allows a relatively modest system to comfortably host the 21 servers. @@ -495,7 +495,7 @@
- NLI-Z39.50 - a Natural Language Interface for Libraries + NLI-&z3950; - a Natural Language Interface for Libraries Fernuniversität Hagen in Germany have developed a natural language interface for access to library databases. @@ -504,8 +504,8 @@ In order to evaluate this interface for recall and precision, they chose &zebra; as the basis for retrieval effectiveness. The &zebra; server contains a copy of the GIRT database, consisting of more - than 76000 records in SGML format (bibliographic records from - social science), which are mapped to MARC for presentation. + than 76000 records in &sgml; format (bibliographic records from + social science), which are mapped to &marc; for presentation. (GIRT is the German Indexing and Retrieval Testdatabase. It is a @@ -627,16 +627,16 @@ - Improved support for XML in search and retrieval. Eventually, + Improved support for &xml; in search and retrieval. Eventually, the goal is for &zebra; to pull double duty as a flexible - information retrieval engine and high-performance XML + information retrieval engine and high-performance &xml; repository. The recent addition of XPath searching is one example of the kind of enhancement we're working on. - There is also the experimental ALVIS XSLT - XML input filter, which unleashes the full power of DOM based - XSLT transformations during indexing and record retrieval. Work + There is also the experimental ALVIS &xslt; + &xml; input filter, which unleashes the full power of &dom; based + &xslt; transformations during indexing and record retrieval. Work on this filter has been sponsored by the ALVIS EU project . We expect this filter to mature soon, as it is planned to be included in the version 2.0 @@ -647,9 +647,9 @@ Finalisation and documentation of &zebra;'s C programming - API, allowing updates, database management and other functions - not readily expressed in Z39.50. We will also consider - exposing the API through SOAP. + &api;, allowing updates, database management and other functions + not readily expressed in &z3950;. We will also consider + exposing the &api; through &soap;. diff --git a/doc/marc_indexing.xml b/doc/marc_indexing.xml index 813e246..75a159f 100644 --- a/doc/marc_indexing.xml +++ b/doc/marc_indexing.xml @@ -1,38 +1,38 @@ - - + - Indexing of MARC records by &zebra; + Indexing of &marc; records by &zebra; - &zebra; is suitable for distribution of MARC records via Z39.50. We - have a several possibilities to describe the indexing process of MARC records. + &zebra; is suitable for distribution of &marc; records via &z3950;. We + have a several possibilities to describe the indexing process of &marc; records. This document shows these possibilities. - Simple indexing of MARC records + Simple indexing of &marc; records Simple indexing is not described yet. - Extended indexing of MARC records + Extended indexing of &marc; records -Extended indexing of MARC records will help you if you need index a +Extended indexing of &marc; records will help you if you need index a combination of subfields, or index only a part of the whole field, -or use during indexing process embedded fields of MARC record. +or use during indexing process embedded fields of &marc; record. -Extended indexing of MARC records additionally allows: +Extended indexing of &marc; records additionally allows: -to index data in LEADER of MARC record +to index data in LEADER of &marc; record @@ -44,23 +44,23 @@ or use during indexing process embedded fields of MARC record. -to index linked fields for UNIMARC based formats +to index linked fields for UNI&marc; based formats In compare with simple indexing process the extended indexing -may increase (about 2-3 times) the time of indexing process for MARC +may increase (about 2-3 times) the time of indexing process for &marc; records. The index-formula At the beginning, we have to define the term index-formula -for MARC records. This term helps to understand the notation of extended indexing of MARC records +for &marc; records. This term helps to understand the notation of extended indexing of MARC records by &zebra;. Our definition is based on the document "The -table of conformity for Z39.50 use attributes and RUSMARC fields". +table of conformity for &z3950; use attributes and R&usmarc; fields". The document is available only in russian language. The index-formula is the combination of subfields presented in such way: @@ -69,7 +69,7 @@ The document is available only in russian language. 71-00$a, $g, $h ($c){.$b ($c)} , (1) -We know that &zebra; supports a Bib-1 attribute - right truncation. +We know that &zebra; supports a &bib1; attribute - right truncation. In this case, the index-formula (1) consists from forms, defined in the same way as (1) @@ -79,7 +79,7 @@ forms, defined in the same way as (1) 71-00$a -The original MARC record may be without some elements, which included in index-formula. +The original &marc; record may be without some elements, which included in index-formula. This notation includes such operands as: @@ -92,7 +92,7 @@ forms, defined in the same way as (1) - - The position may contain any value, defined by MARC format. + The position may contain any value, defined by &marc; format. For example, index-formula @@ -132,7 +132,7 @@ forms, defined in the same way as (1) -All another operands are the same as accepted in MARC world. +All another operands are the same as accepted in &marc; world. @@ -146,7 +146,7 @@ forms, defined in the same way as (1) (.abs file). It means that names beginning with "mc-" are interpreted by &zebra; as index-formula. The database index is created and -linked with access point (Bib-1 use attribute) +linked with access point (&bib1; use attribute) according to this formula. For example, index-formula @@ -172,7 +172,7 @@ mc-71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)} . -The position may contain any value, defined by MARC format. For example, +The position may contain any value, defined by &marc; format. For example, index-formula @@ -232,7 +232,7 @@ includes -All another operands are the same as accepted in MARC world. +All another operands are the same as accepted in &marc; world. @@ -265,7 +265,7 @@ elm mc-ldr[7] Bib-level ! elm mc-008[0-5] Date/time-added-to-db ! -or for RUSMARC (this data included in 100th field) +or for R&usmarc; (this data included in 100th field) elm mc-100___$a[0-7]_ Date/time-added-to-db ! @@ -277,7 +277,7 @@ elm mc-100___$a[0-7]_ Date/time-added-to-db ! using indicators while indexing -For RUSMARC index-formula +For R&usmarc; index-formula 70-#1$a, $g matches @@ -293,9 +293,9 @@ indexed. -indexing embedded (linked) fields for UNIMARC based formats +indexing embedded (linked) fields for UNI&marc; based formats -For RUSMARC index-formula +For R&usmarc; index-formula 4--#-$170-#1$a, $g ($c) matches diff --git a/doc/querymodel.xml b/doc/querymodel.xml index afdb407..b3df12e 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,5 +1,5 @@ - + Query Model
@@ -11,18 +11,18 @@ &zebra; is born as a networking Information Retrieval engine adhering to the international standards - Z39.50 and - SRU, + &z3950; and + &sru;, and implement the - type-1 Reverse Polish Notation (RPN) query + type-1 Reverse Polish Notation (&rpn;) query model defined there. Unfortunately, this model has only defined a binary encoded representation, which is used as transport packaging in - the Z39.50 protocol layer. This representation is not human + the &z3950; protocol layer. This representation is not human readable, nor defines any convenient way to specify queries. - Since the type-1 (RPN) + Since the type-1 (&rpn;) query structure has no direct, useful string representation, every client application needs to provide some form of mapping from a local query notation or representation to it. @@ -30,33 +30,33 @@
- Prefix Query Format (PQF) + Prefix Query Format (&pqf;) Index Data has defined a textual representation in the Prefix Query Format, short - PQF, which maps + &pqf;, which maps one-to-one to binary encoded - type-1 RPN queries. - PQF has been adopted by other - parties developing Z39.50 software, and is often referred to as + type-1 &rpn; queries. + &pqf; has been adopted by other + parties developing &z3950; software, and is often referred to as Prefix Query Notation, or in short - PQN. See + &pqn;. See for further explanations and descriptions of &zebra;'s capabilities.
- Common Query Language (CQL) + Common Query Language (&cql;) - The query model of the type-1 RPN, - expressed in PQF/PQN is natively supported. - On the other hand, the default SRU + The query model of the type-1 &rpn;, + expressed in &pqf;/&pqn; is natively supported. + On the other hand, the default &sru; web services Common Query Language - CQL is not natively supported. + &cql; is not natively supported. - &zebra; can be configured to understand and map CQL to PQF. See + &zebra; can be configured to understand and map &cql; to &pqf;. See .
@@ -67,7 +67,7 @@ Operation types &zebra; supports all of the three different - Z39.50/SRU operations defined in the + &z3950;/&sru; operations defined in the standards: explain, search, and scan. A short description of the functionality and purpose of each is quite in order here. @@ -76,7 +76,7 @@
Explain Operation - The syntax of Z39.50/SRU queries is + The syntax of &z3950;/&sru; queries is well known to any client, but the specific semantics - taking into account a particular servers functionalities and abilities - must be @@ -89,15 +89,15 @@ of the general query model are supported. - The Z39.50 embeds the explain operation + The &z3950; embeds the explain operation by performing a search in the magic IR-Explain-1 database; see . - In SRU, explain is an entirely separate - operation, which returns an ZeeRex XML record according to the + In &sru;, explain is an entirely separate + operation, which returns an ZeeRex &xml; record according to the structure defined by the protocol. @@ -117,7 +117,7 @@ simple free text searches to nested complex boolean queries, targeting specific indexes, and possibly enhanced with many query semantic specifications. Search interactions are the heart - and soul of Z39.50/SRU servers. + and soul of &z3950;/&sru; servers.
@@ -145,24 +145,24 @@
- RPN queries and semantics + &rpn; queries and semantics - The PQF grammar - is documented in the YAZ manual, and shall not be - repeated here. This textual PQF representation + The &pqf; grammar + is documented in the &yaz; manual, and shall not be + repeated here. This textual &pqf; representation is not transmistted to &zebra; during search, but it is in the - client mapped to the equivalent Z39.50 binary + client mapped to the equivalent &z3950; binary query parse tree.
- RPN tree structure + &rpn; tree structure - The RPN parse tree - or the equivalent textual representation in PQF - + The &rpn; parse tree - or the equivalent textual representation in &pqf; - may start with one specification of the attribute set used. Following is a query tree, which - consists of atomic query parts (APT) or + consists of atomic query parts (&apt;) or named result sets, eventually paired by boolean binary operators, and finally recursively combined into @@ -184,7 +184,7 @@
Attribute set - PQF notation (Short hand) + &pqf; notation (Short hand) Status Notes @@ -201,10 +201,10 @@ predefined - Bib-1 + &bib1; bib-1 - Standard PQF query language attribute set which defines the - semantics of Z39.50 searching. In addition, all of the + Standard &pqf; query language attribute set which defines the + semantics of &z3950; searching. In addition, all of the non-use attributes (types 2-12) define the hard-wired &zebra; internal query processing. @@ -213,15 +213,15 @@ GILS gils - Extension to the Bib-1 attribute set. + Extension to the &bib1; attribute set. predefined @@ -238,7 +238,7 @@ The &zebra; internal query processing is modeled after - the Bib-1 attribute set, and the non-use + the &bib1; attribute set, and the non-use attributes type 2-6 are hard-wired in. It is therefore essential to be familiar with . @@ -317,7 +317,7 @@ retrieval, taking proximity into account: The hit set is a subset of the corresponding AND query - (see the PQF grammar for + (see the &pqf; grammar for details on the proximity operator): Z> find @prox 0 3 0 2 k 2 information retrieval @@ -338,23 +338,23 @@
- Atomic queries (APT) + Atomic queries (&apt;) Atomic queries are the query parts which work on one access point only. These consist of an attribute list followed by a single term or a quoted term list, and are often called - Attributes-Plus-Terms (APT) queries. + Attributes-Plus-Terms (&apt;) queries. - Atomic (APT) queries are always leaf nodes in the PQF query tree. + Atomic (&apt;) queries are always leaf nodes in the &pqf; query tree. UN-supplied non-use attributes types 2-12 are either inherited from higher nodes in the query tree, or are set to &zebra;'s default values. See for details.
- Atomic queries (APT) + Atomic queries (&apt;) @@ -407,7 +407,7 @@ The scan operation is only supported with - atomic APT queries, as it is bound to one access point at a + atomic &apt; queries, as it is bound to one access point at a time. Boolean query trees are not allowed during scan. @@ -429,8 +429,8 @@ Named result sets are supported in &zebra;, and result sets can be used as operands without limitations. It follows that named - result sets are leaf nodes in the PQF query tree, exactly as - atomic APT queries are. + result sets are leaf nodes in the &pqf; query tree, exactly as + atomic &apt; queries are. After the execution of a search, the result set is available at @@ -460,10 +460,10 @@ - Named result sets are only supported by the Z39.50 protocol. - The SRU web service is stateless, and therefore the notion of + Named result sets are only supported by the &z3950; protocol. + The &sru; web service is stateless, and therefore the notion of named result sets does not exist when accessing a &zebra; server by - the SRU protocol. + the &sru; protocol. @@ -501,7 +501,7 @@ It is possible to search in any silly string index - if it's defined in your - indexation rules and can be parsed by the PQF parser. + indexation rules and can be parsed by the &pqf; parser. This is definitely not the recommended use of this facility, as it might confuse your users with some very unexpected results. @@ -512,14 +512,14 @@ See also for details, and - for the SRU PQF query extension using string names as a fast + for the &sru; &pqf; query extension using string names as a fast debugging facility.
&zebra;'s special access point of type 'XPath' - for GRS filters + for &grs1; filters As we have seen above, it is possible (albeit seldom a great idea) to emulate @@ -531,15 +531,15 @@ be defined at indexation time, no new undefined XPath queries can entered at search time, and second, it might confuse users very much that an XPath-alike index name in fact - gets populated from a possible entirely different XML element + gets populated from a possible entirely different &xml; element than it pretends to access. - When using the GRS Record Model + When using the &grs1; Record Model (see ), we have the possibility to embed life XPath expressions - in the PQF queries, which are here called + in the &pqf; queries, which are here called use (type 1) xpath attributes. You must enable the xpath enable directive in your @@ -549,14 +549,14 @@ Only a very restricted subset of the XPath 1.0 - standard is supported as the GRS record model is simpler than - a full XML DOM structure. See the following examples for + standard is supported as the &grs1; record model is simpler than + a full &xml; &dom; structure. See the following examples for possibilities. Finding all documents which have the term "content" - inside a text node found in a specific XML DOM + inside a text node found in a specific &xml; &dom; subtree, whose starting element is addressed by XPath. @@ -586,7 +586,7 @@ Filter the addressing XPath by a predicate working on exact string values in - attributes (in the XML sense) can be done: return all those docs which + attributes (in the &xml; sense) can be done: return all those docs which have the term "english" contained in one of all text sub nodes of the subtree defined by the XPath /record/title[@lang='en']. And similar @@ -607,8 +607,8 @@ - Escaping PQF keywords and other non-parseable XPath constructs - with '{ }' to prevent client-side PQF parsing + Escaping &pqf; keywords and other non-parseable XPath constructs + with '{ }' to prevent client-side &pqf; parsing syntax errors: Z> find @attr {1=/root/first[@attr='danish']} content @@ -630,7 +630,7 @@
Explain Attribute Set - The Z39.50 standard defines the + The &z3950; standard defines the Explain attribute set Exp-1, which is used to discover information about a server's search semantics and functional capabilities @@ -644,11 +644,11 @@ In addition, the non-Use - Bib-1 attributes, that is, the types + &bib1; attributes, that is, the types Relation, Position, Structure, Truncation, and Completeness are imported from - the Bib-1 attribute set, and may be used + the &bib1; attribute set, and may be used within any explain query. @@ -669,7 +669,7 @@ See tab/explain.att and the - Z39.50 standard + &z3950; standard for more information.
@@ -678,11 +678,11 @@ Explain searches with yaz-client Classic Explain only defines retrieval of Explain information - via ASN.1. Practically no Z39.50 clients supports this. Fortunately + via ASN.1. Practically no &z3950; clients supports this. Fortunately they don't have to - &zebra; allows retrieval of this information in other formats: - SUTRS, XML, - GRS-1 and ASN.1 Explain. + &sutrs;, &xml;, + &grs1; and ASN.1 Explain. @@ -743,7 +743,7 @@ Default. This query is very useful to study the internal &zebra; indexes. If records have been indexed using the alvis - XSLT filter, the string representation names of the known indexes can be + &xslt; filter, the string representation names of the known indexes can be found. Z> base IR-Explain-1 @@ -760,13 +760,13 @@
- Bib-1 Attribute Set + &bib1; Attribute Set Most of the information contained in this section is an excerpt of - the ATTRIBUTE SET BIB-1 (Z39.50-1995) SEMANTICS - found at . The Bib-1 + the ATTRIBUTE SET &bib1; (&z3950;-1995) SEMANTICS + found at . The &bib1; Attribute Set Semantics from 1995, also in an updated - Bib-1 + &bib1; Attribute Set version from 2003. Index Data is not the copyright holder of this information, except for the configuration details, the listing of @@ -788,7 +788,7 @@ tab/gils.att. - For example, some few Bib-1 use + For example, some few &bib1; use attributes from the tab/bib1.att are: att 1 Personal-name @@ -979,7 +979,7 @@ AlwaysMatches (103) is a great way to discover how many documents have been indexed in a given field. The search term is ignored, but needed for correct - PQF syntax. An empty search term may be supplied. + &pqf; syntax. An empty search term may be supplied. Z> find @attr 1=Title @attr 2=103 "" Z> find @attr 1=Title @attr 2=103 @attr 4=1 "" @@ -1159,7 +1159,7 @@ is supported, and maps to the boolean AND combination of words supplied. The word list is useful when google-like bag-of-word queries need to be translated from a GUI - query language to PQF. For example, the following queries + query language to &pqf;. For example, the following queries are equivalent: Z> find @attr 1=Title @attr 4=6 "mozart amadeus" @@ -1213,7 +1213,7 @@ - The exact mapping between PQF queries and &zebra; internal indexes + The exact mapping between &pqf; queries and &zebra; internal indexes and index types is explained in . @@ -1408,14 +1408,14 @@ The Complete subfield (2) is a reminiscens - from the happy MARC + from the happy &marc; binary format days. &zebra; does not support it, but maps silently to Complete field (3). - The exact mapping between PQF queries and &zebra; internal indexes + The exact mapping between &pqf; queries and &zebra; internal indexes and index types is explained in . @@ -1427,7 +1427,7 @@
- Extended &zebra; RPN Features + Extended &zebra; &rpn; Features The &zebra; internal query engine has been extended to specific needs not covered by the bib-1 attribute set query @@ -1478,7 +1478,7 @@