X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Frecordmodel-grs.xml;h=c4ff6c77e148df70cf268c917521097d7a68d43a;hp=68744b002324ca11f853a6bff8317f743dc57d9e;hb=e4c6861efeeea654bfb00c5f0239ee258629d77f;hpb=20748e92bc15812298ad44318f0c54dd297bb9df
diff --git a/doc/recordmodel-grs.xml b/doc/recordmodel-grs.xml
index 68744b0..c4ff6c7 100644
--- a/doc/recordmodel-grs.xml
+++ b/doc/recordmodel-grs.xml
@@ -1,6 +1,13 @@
- - GRS Record Model and Filter Modules + &acro.grs1; Record Model and Filter Modules + + + + The functionality of this record model has been improved and + replaced by the DOM &acro.xml; record model. See + . + + The record model described in this chapter applies to the fundamental,
@@ -11,7 +18,7 @@
- GRS Record Filters + &acro.grs1; Record Filters Many basic subtypes of the grs type are currently available:
@@ -25,7 +32,7 @@
This is the canonical input format described . It is using - simple SGML-like syntax. + simple &acro.sgml;-like syntax.
@@ -33,18 +40,18 @@
grs.marc.type - This allows Zebra to read - records in the ISO2709 (MARC) encoding standard. + This allows &zebra; to read + records in the ISO2709 (&acro.marc;) encoding standard. Last parameter type names the .abs file (see below) - which describes the specific MARC structure of the input record as + which describes the specific &acro.marc; structure of the input record as well as the indexing rules. - The grs.marc uses an internal represtantion - which is not XML conformant. In particular MARC tags are - presented as elements with the same name. And XML elements + The grs.marc uses an internal representation + which is not &acro.xml; conformant. In particular &acro.marc; tags are + presented as elements with the same name. And &acro.xml; elements may not start with digits. Therefore this filter is only - suitable for systems returning GRS-1 and MARC records. For XML + suitable for systems returning &acro.grs1; and &acro.marc; records. For &acro.xml; use grs.marcxml filter instead (see below).
@@ -58,17 +65,17 @@
grs.marcxml.type - This allows Zebra to read ISO2709 encoded records. + This allows &zebra; to read ISO2709 encoded records. Last parameter type names the .abs file (see below) - which describes the specific MARC structure of the input record as + which describes the specific &acro.marc; structure of the input record as well as the indexing rules. The internal representation for grs.marcxml - is the same as for MARCXML. + is the same as for &acro.marcxml;. It slightly more complicated to work with than - grs.marc but XML conformant. + grs.marc but &acro.xml; conformant.
The loadable grs.marcxml filter module
@@ -81,18 +88,18 @@
grs.xml - This filter reads XML records and uses + This filter reads &acro.xml; records and uses Expat to - parse them and convert them into IDZebra's internal + parse them and convert them into ID&zebra;'s internal grs record model. - Only one record per file is supported, due to the fact XML does + Only one record per file is supported, due to the fact &acro.xml; does not allow two documents to "follow" each other (there is no way to know when a document is finished). - This filter is only available if Zebra is compiled with EXPAT support. + This filter is only available if &zebra; is compiled with EXPAT support. The loadable grs.xml filter module - is packagged in the GNU/Debian package + is packaged in the GNU/Debian package libidzebra2.0-mod-grs-xml
@@ -130,14 +137,14 @@
- GRS Canonical Input Format + &acro.grs1; Canonical Input Format Although input data can take any form, it is sometimes useful to describe the record processing capabilities of the system in terms of a single, canonical input format that gives access to the full - spectrum of structure and flexibility in the system. In Zebra, this - canonical format is an "SGML-like" syntax. + spectrum of structure and flexibility in the system. In &zebra;, this + canonical format is an "&acro.sgml;-like" syntax.
@@ -175,7 +182,7 @@
@@ -1758,7 +1765,7 @@
- GRS Exchange Formats + &acro.grs1; Exchange Formats Converting records from the internal structure to an exchange format
@@ -1770,7 +1777,7 @@
- GRS-1. The internal representation is based on GRS-1/XML, so the + &acro.grs1;. The internal representation is based on &acro.grs1;/&acro.xml;, so the conversion here is straightforward. The system will create applied variant and supported variant lists as required, if a record contains variant information.
@@ -1779,34 +1786,34 @@
- XML. The internal representation is based on GRS-1/XML so - the mapping is trivial. Note that XML schemas, preprocessing + &acro.xml;. The internal representation is based on &acro.grs1;/&acro.xml; so + the mapping is trivial. Note that &acro.xml; schemas, preprocessing instructions and comments are not part of the internal representation - and therefore will never be part of a generated XML record. - Future versions of the Zebra will support that. + and therefore will never be part of a generated &acro.xml; record. + Future versions of the &zebra; will support that. - SUTRS. Again, the mapping is fairly straightforward. Indentation + &acro.sutrs;. Again, the mapping is fairly straightforward. Indentation is used to show the hierarchical structure of the record. All - "GRS" type records support both the GRS-1 and SUTRS + "&acro.grs1;" type records support both the &acro.grs1; and &acro.sutrs; representations. - + - ISO2709-based formats (USMARC, etc.). Only records with a + ISO2709-based formats (&acro.usmarc;, etc.). Only records with a two-level structure (corresponding to fields and subfields) can be directly mapped to ISO2709.
For records with a different structuring - (eg., GILS), the representation in a structure like USMARC involves a + (e.g., GILS), the representation in a structure like &acro.usmarc; involves a schema-mapping (see ), to an - "implied" USMARC schema (implied, + "implied" &acro.usmarc; schema (implied, because there is no formal schema which specifies the use of the - USMARC fields outside of ISO2709). The resultant, two-level record is + &acro.usmarc; fields outside of ISO2709). The resultant, two-level record is then mapped directly from the internal representation to ISO2709. See the GILS schema definition files for a detailed example of this approach.
@@ -1845,18 +1852,18 @@
- Extended indexing of MARC records + Extended indexing of &acro.marc; records - Extended indexing of MARC records will help you if you need index a + Extended indexing of &acro.marc; records will help you if you need index a combination of subfields, or index only a part of the whole field, - or use during indexing process embedded fields of MARC record. + or use during indexing process embedded fields of &acro.marc; record. - Extended indexing of MARC records additionally allows: + Extended indexing of &acro.marc; records additionally allows: - to index data in LEADER of MARC record + to index data in LEADER of &acro.marc; record
@@ -1868,26 +1875,26 @@
- to index linked fields for UNIMARC based formats + to index linked fields for UNI&acro.marc; based formats In compare with simple indexing process the extended indexing - may increase (about 2-3 times) the time of indexing process for MARC + may increase (about 2-3 times) the time of indexing process for &acro.marc; records.
The index-formula At the beginning, we have to define the term - index-formula for MARC records. This term helps - to understand the notation of extended indexing of MARC records by Zebra. + index-formula for &acro.marc; records. This term helps + to understand the notation of extended indexing of &acro.marc; records by &zebra;. Our definition is based on the document "The table - of conformity for Z39.50 use attributes and RUSMARC fields". - The document is available only in russian language. + of conformity for &acro.z3950; use attributes and R&acro.usmarc; fields". + The document is available only in Russian language. The index-formula is the combination of
@@ -1899,7 +1906,7 @@
- We know that Zebra supports a Bib-1 attribute - right truncation. + We know that &zebra; supports a &acro.bib1; attribute - right truncation. In this case, the index-formula (1) consists from forms, defined in the same way as (1)
@@ -1910,7 +1917,7 @@
- The original MARC record may be without some elements, which included in index-formula. + The original &acro.marc; record may be without some elements, which included in index-formula.
@@ -1925,7 +1932,7 @@
- The position may contain any value, defined by - MARC format. + &acro.marc; format. For example, index-formula
@@ -1968,22 +1975,22 @@
- All another operands are the same as accepted in MARC world. + All another operands are the same as accepted in &acro.marc; world.
- Notation of <emphasis>index-formula</emphasis> for Zebra + Notation of <emphasis>index-formula</emphasis> for &zebra; Extended indexing overloads path of - elm definition in abstract syntax file of Zebra + elm definition in abstract syntax file of &zebra; (.abs file). It means that names beginning with - "mc-" are interpreted by Zebra as + "mc-" are interpreted by &zebra; as index-formula. The database index is created and - linked with access point (Bib-1 use attribute) + linked with access point (&acro.bib1; use attribute) according to this formula. For example, index-formula
@@ -2010,7 +2017,7 @@
. The position may contain any value, defined by - MARC format. For example, + &acro.marc; format. For example, index-formula
@@ -2074,7 +2081,7 @@
- All another operands are the same as accepted in MARC world. + All another operands are the same as accepted in &acro.marc; world.
@@ -2107,7 +2114,7 @@
elm mc-008[0-5] Date/time-added-to-db ! - or for RUSMARC (this data included in 100th field) + or for R&acro.usmarc; (this data included in 100th field) elm mc-100___$a[0-7]_ Date/time-added-to-db !
@@ -2119,14 +2126,14 @@
using indicators while indexing - For RUSMARC index-formula + For R&acro.usmarc; index-formula 70-#1$a, $g matches elm 70._1_$a,_$g_ Author !:w,!:p - When Zebra finds a field according to + When &zebra; finds a field according to "70." pattern it checks the indicators. In this case the value of first indicator doesn't mater, but the value of second one must be whitespace, in another case a field is not
@@ -2135,10 +2142,10 @@
- indexing embedded (linked) fields for UNIMARC based + indexing embedded (linked) fields for UNI&acro.marc; based formats - For RUSMARC index-formula + For R&acro.usmarc; index-formula 4--#-$170-#1$a, $g ($c) matches