Indexing of &acro.marc; records by &zebra;
&zebra; is suitable for distributing &acro.marc; records via &acro.z3950;. There
are several ways to describe the indexing process for &acro.marc; records.
This document presents them.
Simple indexing of &acro.marc; records
Simple indexing is not described yet.
Extended indexing of &acro.marc; records
Extended indexing of &acro.marc; records helps when you need to index a
combination of subfields, index only part of a field,
or use the embedded fields of a &acro.marc; record during indexing.
Extended indexing of &acro.marc; records additionally allows you:
to index data in the LEADER of a &acro.marc; record
to index data in control fields (with fixed length)
to use the values of indicators during indexing
to index linked fields for UNI&acro.marc; based formats
Compared with simple indexing, extended indexing
may increase the indexing time for &acro.marc; records
by a factor of about 2-3.
The index-formula
First, we define the term index-formula
for &acro.marc; records. This term helps in understanding the notation for extended indexing of &acro.marc; records
by &zebra;. Our definition is based on the document "The
table of conformity for &acro.z3950; use attributes and R&acro.usmarc; fields".
The document is available only in Russian.
An index-formula is a combination of subfields written as follows:
71-00$a, $g, $h ($c){.$b ($c)} , (1)
&zebra; supports the &acro.bib1; right-truncation attribute.
Because of this, index-formula (1) also covers the shorter
forms below, defined in the same way as (1):
71-00$a, $g, $h
71-00$a, $g
71-00$a
The original &acro.marc; record may lack some of the elements included in the index-formula.
The notation uses the following operands:
#
Denotes a whitespace character.
-
The position may contain any value defined by the &acro.marc; format.
For example, the index-formula
70-#1$a, $g , (2)
includes
700#1$a, $g
701#1$a, $g
702#1$a, $g
{...}
Repeatable elements are enclosed in curly braces {}. For example, the
index-formula
71-00$a, $g, $h ($c){.$b ($c)} , (3)
includes
71-00$a, $g, $h ($c). $b ($c)
71-00$a, $g, $h ($c). $b ($c). $b ($c)
71-00$a, $g, $h ($c). $b ($c). $b ($c). $b ($c)
All other operands are the same as those used in the &acro.marc; world.
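The two positional operands can be illustrated with a minimal Python sketch. It is not part of &zebra;; the function name and the sample values are hypothetical, and it assumes that a pattern such as 70-#1 is compared character by character against the three-character tag followed by the two indicators.

```python
def matches(pattern: str, value: str) -> bool:
    """Return True if `value` fits `pattern`, position by position.

    '-' accepts any character, '#' accepts only a space,
    every other pattern character must match literally.
    """
    if len(pattern) != len(value):
        return False
    for p, v in zip(pattern, value):
        if p == "-":
            continue  # any value is allowed at this position
        if p == "#":
            if v != " ":
                return False  # '#' requires whitespace
        elif p != v:
            return False  # literal characters must be equal
    return True


# Hypothetical sample data: tag followed by the two indicators.
candidates = ["700 1", "701 1", "702 1", "710 1", "70011"]
selected = [c for c in candidates if matches("70-#1", c)]
print(selected)
```

Only the values whose tag starts with 70, whose first indicator is blank, and whose second indicator is 1 are selected.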
Notation of index-formula for &zebra;
Extended indexing overloads the path of the
elm definition in the abstract syntax file of &zebra;
(.abs file). This means that names beginning with
"mc-" are interpreted by &zebra; as an
index-formula. The database index is created and
linked with an access point (&acro.bib1; use attribute)
according to this formula.
For example, index-formula
71-00$a, $g, $h ($c){.$b ($c)} , (4)
is written in the .abs file as:
mc-71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)}
The &zebra; notation of an index-formula uses the following operands:
_
Denotes a whitespace character.
.
The position may contain any value defined by the &acro.marc; format. For example, the
index-formula
70-#1$a, $g , (5)
matches mc-70._1_$a,_$g_ and includes
700_1_$a,_$g_
701_1_$a,_$g_
702_1_$a,_$g_
{...}
Repeatable elements are enclosed in curly braces {}. For example, the
index-formula
71-00$a, $g, $h ($c){.$b ($c)} , (6)
matches mc-71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)} and
includes
71.00_$a,_$g,_$h_(_$c_).$b_(_$c_)
71.00_$a,_$g,_$h_(_$c_).$b_(_$c_).$b_(_$c_)
71.00_$a,_$g,_$h_(_$c_).$b_(_$c_).$b_(_$c_).$b_(_$c_)
<...>
An embedded index-formula (for linked fields) is enclosed in angle brackets <>. For example, the
index-formula
4--#-$170-#1$a, $g ($c) , (7)
matches mc-4.._._$1<70._1_$a,_$g_(_$c_)>_ and
includes
463_._$1<70._1_$a,_$g_(_$c_)>_
All other operands are the same as those used in the &acro.marc; world.
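The effect of the {...} operand can be sketched in a few lines of Python. This is only an illustration of the notation, not how &zebra; processes it internally; the function name and the max_repeats parameter are hypothetical, and the sketch assumes the formula contains at most one repeatable group.

```python
import re


def expand_repeats(formula: str, max_repeats: int) -> list[str]:
    """Expand the single {...} group of `formula` 1..max_repeats times."""
    m = re.search(r"\{(.*?)\}", formula)
    if m is None:
        return [formula]  # nothing to repeat
    head = formula[:m.start()]   # text before the group
    group = m.group(1)           # text inside the braces
    tail = formula[m.end():]     # text after the group
    return [head + group * n + tail for n in range(1, max_repeats + 1)]


forms = expand_repeats("71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)}", 3)
for form in forms:
    print(form)
```

With max_repeats set to 3, the printed forms are the three expansions listed for index-formula (6).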
Examples
indexing LEADER
Use the keyword "ldr" to index the leader. For example, to index the data in the 6th
and 7th positions of the LEADER:
elm mc-ldr[6] Record-type !
elm mc-ldr[7] Bib-level !
indexing data from control fields
indexing the date (the time the record was added to the database)
elm mc-008[0-5] Date/time-added-to-db !
or for R&acro.usmarc; (where this data is held in field 100)
elm mc-100___$a[0-7]_ Date/time-added-to-db !
using indicators while indexing
For R&acro.usmarc;, the index-formula
70-#1$a, $g corresponds to
elm mc-70._1_$a,_$g_ Author !:w,!:p
When &zebra; finds a field matching the "70." pattern, it checks
the indicators. In this case the first indicator must be whitespace and
the second must be 1; otherwise the field is not
indexed.
indexing embedded (linked) fields for UNI&acro.marc; based formats
For R&acro.usmarc;, the index-formula
4--#-$170-#1$a, $g ($c) corresponds to
elm mc-4.._._$1<70._1_$a,_$g_(_$c_)>_ Author !:w,!:p
Data are extracted from a record if the field matches the
"4.._." pattern and the data in the linked field match the embedded
index-formula 70._1_$a,_$g_(_$c_).
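The position ranges used in the LEADER and control-field examples, such as [6] and [0-5], can be illustrated with a short Python sketch. It assumes that positions are counted from 0 and that ranges are inclusive, which follows the usual &acro.marc; convention for leader and control-field positions but is not stated explicitly here. The function name and the sample data are hypothetical.

```python
import re


def extract_positions(data: str, spec: str) -> str:
    """Return the characters selected by a spec such as "[6]" or "[0-5]".

    Assumption: positions are 0-based and ranges are inclusive.
    """
    m = re.fullmatch(r"\[(\d+)(?:-(\d+))?\]", spec)
    if m is None:
        raise ValueError(f"unsupported position spec: {spec!r}")
    start = int(m.group(1))
    # A single position such as "[6]" is treated as the range 6-6.
    end = int(m.group(2)) if m.group(2) is not None else start
    return data[start:end + 1]


# Made-up sample values, for illustration only.
leader = "00714cam a2200205 a 4500"
field_008 = "850423s1985    nyu"

record_type = extract_positions(leader, "[6]")
bib_level = extract_positions(leader, "[7]")
date_added = extract_positions(field_008, "[0-5]")
print(record_type, bib_level, date_added)
```

Under these assumptions, mc-ldr[6] and mc-ldr[7] each select one character of the leader, and mc-008[0-5] selects the first six characters of field 008.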