X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Frecordmodel-grs.xml;h=853410a0f203c748cb88382960d5dbd51fd7fee4;hp=c4ff6c77e148df70cf268c917521097d7a68d43a;hb=972bceaa6386f904bc3e4845f1c5598656c5c6f2;hpb=bb39ca3dd76e6339f66813bca1e64b644760e5a2 diff --git a/doc/recordmodel-grs.xml b/doc/recordmodel-grs.xml index c4ff6c7..853410a 100644 --- a/doc/recordmodel-grs.xml +++ b/doc/recordmodel-grs.xml @@ -1,13 +1,13 @@ &acro.grs1; Record Model and Filter Modules - - - The functionality of this record model has been improved and - replaced by the DOM &acro.xml; record model. See - . - - + + + The functionality of this record model has been improved and + replaced by the DOM &acro.xml; record model. See + . + + The record model described in this chapter applies to the fundamental, @@ -32,7 +32,7 @@ This is the canonical input format described . It is using - simple &acro.sgml;-like syntax. + simple &acro.sgml;-like syntax. @@ -41,7 +41,7 @@ This allows &zebra; to read - records in the ISO2709 (&acro.marc;) encoding standard. + records in the ISO2709 (&acro.marc;) encoding standard. Last parameter type names the .abs file (see below) which describes the specific &acro.marc; structure of the input record as @@ -55,8 +55,8 @@ use grs.marcxml filter instead (see below). - The loadable grs.marc filter module - is packaged in the GNU/Debian package + The loadable grs.marc filter module + is packaged in the GNU/Debian package libidzebra2.0-mod-grs-marc @@ -74,7 +74,7 @@ The internal representation for grs.marcxml is the same as for &acro.marcxml;. - It slightly more complicated to work with than + It slightly more complicated to work with than grs.marc but &acro.xml; conformant. @@ -90,7 +90,7 @@ This filter reads &acro.xml; records and uses Expat to - parse them and convert them into ID&zebra;'s internal + parse them and convert them into ID&zebra;'s internal grs record model. 
Only one record per file is supported, due to the fact &acro.xml; does not allow two documents to "follow" each other (there is no way @@ -101,7 +101,7 @@ The loadable grs.xml filter module is packaged in the GNU/Debian package libidzebra2.0-mod-grs-xml - + @@ -122,7 +122,7 @@ grs.tcl.filter - Similar to grs.regx but using Tcl for rules, described in + Similar to grs.regx but using Tcl for rules, described in . @@ -164,16 +164,16 @@ <Distributor> - <Name> USGS/WRD </Name> - <Organization> USGS/WRD </Organization> - <Street-Address> - U.S. GEOLOGICAL SURVEY, 505 MARQUETTE, NW - </Street-Address> - <City> ALBUQUERQUE </City> - <State> NM </State> - <Zip-Code> 87102 </Zip-Code> - <Country> USA </Country> - <Telephone> (505) 766-5560 </Telephone> + <Name> USGS/WRD </Name> + <Organization> USGS/WRD </Organization> + <Street-Address> + U.S. GEOLOGICAL SURVEY, 505 MARQUETTE, NW + </Street-Address> + <City> ALBUQUERQUE </City> + <State> NM </State> + <Zip-Code> 87102 </Zip-Code> + <Country> USA </Country> + <Telephone> (505) 766-5560 </Telephone> </Distributor> @@ -181,12 +181,12 @@ @@ -230,7 +230,7 @@ <gils> - <title>Zen and the Art of Motorcycle Maintenance</title> + <title>Zen and the Art of Motorcycle Maintenance</title> </gils> @@ -359,7 +359,7 @@ type regx, argument filter-filename). - + Generally, an input filter consists of a sequence of rules, where each rule consists of a sequence of expressions, followed by an action. The @@ -367,7 +367,7 @@ and the actions normally contribute to the generation of an internal representation of the record. - + An expression can be either of the following: @@ -415,7 +415,7 @@ Matches regular expression pattern reg from the input record. The operators supported are the same - as for regular expression queries. Refer to + as for regular expression queries. Refer to . @@ -467,7 +467,7 @@ data element. 
The type is one of the following: - + record @@ -568,10 +568,10 @@ /^Subject:/ BODY /$/ { data -element title $1 } /^Date:/ BODY /$/ { data -element lastModified $1 } /\n\n/ BODY END { - begin element bodyOfDisplay - begin variant body iana "text/plain" - data -text $1 - end record + begin element bodyOfDisplay + begin variant body iana "text/plain" + data -text $1 + end record } @@ -604,9 +604,9 @@ - ROOT - TITLE "Zen and the Art of Motorcycle Maintenance" - AUTHOR "Robert Pirsig" + ROOT + TITLE "Zen and the Art of Motorcycle Maintenance" + AUTHOR "Robert Pirsig" @@ -619,11 +619,11 @@ - ROOT - TITLE "Zen and the Art of Motorcycle Maintenance" - AUTHOR - FIRST-NAME "Robert" - SURNAME "Pirsig" + ROOT + TITLE "Zen and the Art of Motorcycle Maintenance" + AUTHOR + FIRST-NAME "Robert" + SURNAME "Pirsig" @@ -687,38 +687,38 @@ Which of the two elements are transmitted to the client by the server depends on the specifications provided by the client, if any. - + In practice, each variant node is associated with a triple of class, type, value, corresponding to the variant mechanism of &acro.z3950;. - + - +
Data Elements - + Data nodes have no children (they are always leaf nodes in the record tree). - + - +
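The tree outlines shown earlier (ROOT, TITLE, AUTHOR and so on) can be sketched as a small data structure. This is only an illustrative model, not &zebra;'s internal representation: the class names `TagNode` and `DataNode` and the `render` helper are hypothetical. It reflects the one property stated above, namely that data nodes are always leaves.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DataNode:
    """Leaf node holding content; by construction it has no children."""
    text: str


@dataclass
class TagNode:
    """Named node that may contain further tag nodes and data nodes."""
    name: str
    children: List[Union["TagNode", DataNode]] = field(default_factory=list)


def render(node, depth=0):
    """Return outline lines, printing data next to the tag that owns it."""
    data = [c.text for c in node.children if isinstance(c, DataNode)]
    line = "  " * depth + node.name
    if data:
        line += ' "' + " ".join(data) + '"'
    lines = [line]
    for child in node.children:
        if isinstance(child, TagNode):
            lines.extend(render(child, depth + 1))
    return lines


# The second example tree from this chapter, with AUTHOR split into parts.
record = TagNode("ROOT", [
    TagNode("TITLE", [DataNode("Zen and the Art of Motorcycle Maintenance")]),
    TagNode("AUTHOR", [
        TagNode("FIRST-NAME", [DataNode("Robert")]),
        TagNode("SURNAME", [DataNode("Pirsig")]),
    ]),
])

print("\n".join(render(record)))
```

Keeping data in a separate leaf type makes the "no children" rule structural rather than something that has to be checked at run time.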
- + - +
&acro.grs1; Record Model Configuration - + The following sections describe the configuration files that govern - the internal management of grs records. + the internal management of grs records. The system searches for the files in the directories specified by the profilePath setting in the zebra.cfg file. @@ -735,7 +735,7 @@ @@ -766,7 +766,7 @@ known. - + The variant set which is used in the profile. This provides a @@ -800,7 +800,7 @@ - + A list of element descriptions (this is the actual ARS of the schema, in &acro.z3950; terms), which lists the ways in which the various @@ -847,19 +847,19 @@ file. Some settings are optional (o), while others again are mandatory (m). - +
- +
The Abstract Syntax (.abs) Files - + The name of this file type is slightly misleading in &acro.z3950; terms, since, apart from the actual abstract syntax of the profile, it also includes most of the other definitions that go into a database profile. - + When a record in the canonical, &acro.sgml;-like format is read from a file or from the database, the first tag of the file should reference the @@ -867,7 +867,7 @@ record is, say, <gils>, the system will look for the profile definition in the file gils.abs. Profile definitions are cached, so they only have to be read once - during the lifespan of the current process. + during the lifespan of the current process. @@ -876,14 +876,14 @@ introduces the profile, and should always be called first thing when introducing a new record. - + The file may contain the following directives: - + - + name symbolic-name @@ -1003,7 +1003,7 @@ - + xelm xpath attributes @@ -1049,7 +1049,7 @@ file via a header this directive is ignored. If neither this directive is given, nor an encoding is set within external records, ISO-8859-1 encoding is assumed. - + @@ -1058,60 +1058,60 @@ If this directive is followed by enable, then extra indexing is performed to allow for XPath-like queries. - If this directive is not specified - equivalent to + If this directive is not specified - equivalent to disable - no extra XPath-indexing is performed. - @@ -1124,7 +1124,7 @@ Specifies what information, if any, &zebra; should - automatically include in retrieval records for the + automatically include in retrieval records for the ``system fields'' that it supports. systemTag may be any of the following: @@ -1132,24 +1132,24 @@ rank - An integer indicating the relevance-ranking score - assigned to the record. - + An integer indicating the relevance-ranking score + assigned to the record. + sysno - An automatically generated identifier for the record, - unique within this database. 
It is represented by the - <localControlNumber> element in - &acro.xml; and the (1,14) tag in &acro.grs1;. - + An automatically generated identifier for the record, + unique within this database. It is represented by the + <localControlNumber> element in + &acro.xml; and the (1,14) tag in &acro.grs1;. + size - The size, in bytes, of the retrieved record. - + The size, in bytes, of the retrieved record. + @@ -1162,7 +1162,7 @@ - + The mechanism for controlling indexing is not adequate for @@ -1170,7 +1170,7 @@ configuration table eventually. - + The following is an excerpt from the abstract syntax file for the GILS profile. @@ -1202,7 +1202,7 @@ elm (4,1) controlIdentifier Identifier-standard elm (2,6) abstract Abstract elm (4,51) purpose ! - elm (4,52) originator - + elm (4,52) originator - elm (4,53) accessConstraints ! elm (4,54) useConstraints ! elm (4,70) availability - @@ -1222,10 +1222,10 @@ This file type describes the Use elements of - an attribute set. - It contains the following directives. + an attribute set. + It contains the following directives. - + @@ -1273,7 +1273,7 @@ attribute value is stored in the index (unless a local-value is given, in which case this is stored). The name is used to refer to the - attribute from the abstract syntax. + attribute from the abstract syntax. @@ -1563,7 +1563,7 @@ otherwise is noted. - + The directives available in the element set file are as follows: @@ -1701,10 +1701,10 @@ @@ -1756,9 +1756,9 @@
@@ -1836,7 +1836,7 @@ - + SOIF. Support for this syntax is experimental, and is currently @@ -1846,48 +1846,48 @@ level. - + - +
Extended indexing of &acro.marc; records - + Extended indexing of &acro.marc; records will help you if you need to index a combination of subfields, index only a part of a whole field, or use embedded fields of a &acro.marc; record during indexing. - + Extended indexing of &acro.marc; records additionally allows you: - + to index data in the LEADER of a &acro.marc; record - + to index data in control fields (with fixed length) - + to use the values of indicators during indexing - + to index linked fields for UNI&acro.marc; based formats - + - + Compared with simple indexing, extended indexing may increase the indexing time for &acro.marc; records (by about 2-3 times). - +
The index-formula - + At the beginning, we have to define the term index-formula for &acro.marc; records. This term helps to understand the notation of extended indexing of &acro.marc; records by &zebra;. @@ -1895,84 +1895,84 @@ "The table of conformity for &acro.z3950; use attributes and R&acro.usmarc; fields". The document is available only in Russian language. - + The index-formula is the combination of subfields presented in such way: - + 71-00$a, $g, $h ($c){.$b ($c)} , (1) - + We know that &zebra; supports a &acro.bib1; attribute - right truncation. - In this case, the index-formula (1) consists from + In this case, the index-formula (1) consists from forms, defined in the same way as (1) - + 71-00$a, $g, $h 71-00$a, $g 71-00$a - + The original &acro.marc; record may be without some elements, which included in index-formula. - + This notation includes such operands as: - + # It means whitespace character. - + - The position may contain any value, defined by &acro.marc; format. For example, index-formula - + 70-#1$a, $g , (2) - - includes - + + includes + 700#1$a, $g 701#1$a, $g 702#1$a, $g - + - + {...} The repeatable elements are defined in figure-brackets {}. For example, index-formula - + 71-00$a, $g, $h ($c){.$b ($c)} , (3) - + includes - + 71-00$a, $g, $h ($c). $b ($c) 71-00$a, $g, $h ($c). $b ($c). $b ($c) 71-00$a, $g, $h ($c). $b ($c). $b ($c). $b ($c) - + - + All another operands are the same as accepted in &acro.marc; world. @@ -1980,11 +1980,11 @@
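The "-" operand described above can be read as a single-position wildcard. The following is a minimal sketch of that reading, assuming a plain character-by-character comparison; the function name `tag_matches` is hypothetical and this is not how &zebra; itself implements matching.

```python
def tag_matches(pattern: str, tag: str, wildcard: str = "-") -> bool:
    """Return True if tag fits pattern, where wildcard matches any one character."""
    if len(pattern) != len(tag):
        return False
    return all(p == wildcard or p == t for p, t in zip(pattern, tag))


# The tag part of index-formula (2), "70-", covers 700, 701 and 702.
for tag in ("700", "701", "702", "710"):
    print(tag, tag_matches("70-", tag))
```

Which concrete tags actually occur is decided by the &acro.marc; format in use; the sketch only checks whether a given tag fits the pattern.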
- +
Notation of <emphasis>index-formula</emphasis> for &zebra; - - + + Extended indexing overloads path of elm definition in abstract syntax file of &zebra; (.abs file). It means that names beginning with @@ -1992,40 +1992,40 @@ index-formula. The database index is created and linked with access point (&acro.bib1; use attribute) according to this formula. - + For example, index-formula - + 71-00$a, $g, $h ($c){.$b ($c)} , (4) - + in .abs file looks like: - + mc-71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)} - - + + The notation of index-formula uses the operands: - + _ It means whitespace character. - + . The position may contain any value, defined by &acro.marc; format. For example, index-formula - + 70-#1$a, $g , (5) - + matches mc-70._1_$a,_$g_ and includes - + 700_1_$a,_$g_ 701_1_$a,_$g_ @@ -2033,21 +2033,21 @@ - + {...} The repeatable elements are defined in figure-brackets {}. For example, index-formula - + 71#00$a, $g, $h ($c) {.$b ($c)} , (6) - - matches + + matches mc-71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)} and includes - + 71.00_$a,_$g,_$h_(_$c_).$b_(_$c_) 71.00_$a,_$g,_$h_(_$c_).$b_(_$c_).$b_(_$c_) @@ -2055,120 +2055,120 @@ - + <...> Embedded index-formula (for linked fields) is between <>. For example, index-formula - + 4--#-$170-#1$a, $g ($c) , (7) - + matches mc-4.._._$1<70._1_$a,_$g_(_$c_)>_ and includes - + 463_._$1<70._1_$a,_$g_(_$c_)>_ - + - + All another operands are the same as accepted in &acro.marc; world. - +
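The effect of the repeatable group {...} can be illustrated by expanding it a fixed number of times. This is a sketch of the notation only, assuming at most one group per formula and an arbitrary upper bound on repetitions; `expand_repeatable` and `max_repeats` are hypothetical names, not part of &zebra;.

```python
import re


def expand_repeatable(formula: str, max_repeats: int = 3) -> list:
    """Expand the first {...} group into 1..max_repeats repetitions."""
    match = re.search(r"\{([^{}]*)\}", formula)
    if match is None:
        # Nothing to repeat: the formula stands for itself.
        return [formula]
    head = formula[:match.start()]
    group = match.group(1)
    tail = formula[match.end():]
    return [head + group * n + tail for n in range(1, max_repeats + 1)]


for form in expand_repeatable("71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)}", 2):
    print(form)
```

With `max_repeats=2` this prints the two expanded forms listed for formula (6) above.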
Examples - + - + - + indexing LEADER - + You need to use the keyword "ldr" to index the leader. For example, indexing data from the 6th and 7th positions of the LEADER - + elm mc-ldr[6] Record-type ! elm mc-ldr[7] Bib-level ! - + - + - + indexing data from control fields - + indexing date (the time added to database) - + - elm mc-008[0-5] Date/time-added-to-db ! + elm mc-008[0-5] Date/time-added-to-db ! - + or for R&acro.usmarc; (this data is included in the 100th field) - + elm mc-100___$a[0-7]_ Date/time-added-to-db ! - + - + - + using indicators while indexing For R&acro.usmarc; index-formula 70-#1$a, $g matches - + elm 70._1_$a,_$g_ Author !:w,!:p - - When &zebra; finds a field according to + + When &zebra; finds a field according to "70." pattern it checks the indicators. In this case the value of the first indicator does not matter, but the value of the - second one must be whitespace; otherwise the field is not + second one must be whitespace; otherwise the field is not indexed. - + - + indexing embedded (linked) fields for UNI&acro.marc; based formats - - For R&acro.usmarc; index-formula + + For R&acro.usmarc; index-formula 4--#-$1<70-#1$a, $g ($c)> matches - + elm mc-4.._._$1<70._1_$a,_$g_(_$c_)>_ Author !:w,!:p - + Data are extracted from the record if the field matches the "4.._." pattern and data in the linked field match the embedded index-formula 70._1_$a,_$g_(_$c_). - + - + - - + +
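The bracketed suffixes in the examples above (`[6]`, `[0-5]`, `[0-7]`) select character positions from fixed-length data. The sketch below shows one plausible reading, assuming zero-based offsets and an inclusive end position; the helper `slice_positions` and both sample values are hypothetical illustrations, not output of &zebra;.

```python
import re


def slice_positions(data: str, spec: str) -> str:
    """Return the characters selected by a '[n]' or '[n-m]' position spec."""
    match = re.fullmatch(r"\[(\d+)(?:-(\d+))?\]", spec)
    if match is None:
        raise ValueError("unsupported position spec: " + spec)
    start = int(match.group(1))
    # A single position '[n]' is treated as the range n..n.
    end = int(match.group(2)) if match.group(2) else start
    return data[start:end + 1]  # end position is assumed to be inclusive


# Illustrative sample values only.
leader = "00714cam a2200205 a 4500"
field_008 = "850423s1985    nyu           000 0 eng d"

print(slice_positions(leader, "[6]"))       # record type -> a
print(slice_positions(leader, "[7]"))       # bibliographic level -> m
print(slice_positions(field_008, "[0-5]"))  # date added -> 850423
```

Under this reading `mc-008[0-5]` yields six characters, which is consistent with a YYMMDD date at the start of field 008.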
- +