&acro.grs1; Record Model and Filter Modules The functionality of this record model has been improved and replaced by the DOM &acro.xml; record model. See . The record model described in this chapter applies to the fundamental, structured record type grs, introduced in .
&acro.grs1; Record Filters

Many basic subtypes of the grs type are currently available:

grs.sgml
    This is the canonical input format described . It uses a simple &acro.sgml;-like syntax.

grs.marc.type
    This allows &zebra; to read records in the ISO2709 (&acro.marc;) encoding standard. The last parameter, type, names the .abs file (see below) which describes the specific &acro.marc; structure of the input record as well as the indexing rules. The grs.marc filter uses an internal representation which is not &acro.xml; conformant: &acro.marc; tags are presented as elements with the same name, and &acro.xml; element names may not start with digits. This filter is therefore only suitable for systems returning &acro.grs1; and &acro.marc; records. For &acro.xml;, use the grs.marcxml filter instead (see below). The loadable grs.marc filter module is packaged in the GNU/Debian package libidzebra2.0-mod-grs-marc.

grs.marcxml.type
    This allows &zebra; to read ISO2709 encoded records. The last parameter, type, names the .abs file (see below) which describes the specific &acro.marc; structure of the input record as well as the indexing rules. The internal representation for grs.marcxml is the same as for &acro.marcxml;. It is slightly more complicated to work with than grs.marc, but it is &acro.xml; conformant. The loadable grs.marcxml filter module is also contained in the GNU/Debian package libidzebra2.0-mod-grs-marc.

grs.xml
    This filter reads &acro.xml; records and uses Expat to parse them and convert them into ID&zebra;'s internal grs record model. Only one record per file is supported, because &acro.xml; does not allow two documents to "follow" each other (there is no way to know when a document is finished). This filter is only available if &zebra; is compiled with EXPAT support. The loadable grs.xml filter module is packaged in the GNU/Debian package libidzebra2.0-mod-grs-xml.

grs.regx.filter
    This enables a user-supplied Regular Expressions input filter described in . The loadable grs.regx filter module is packaged in the GNU/Debian package libidzebra2.0-mod-grs-regx.

grs.tcl.filter
    Similar to grs.regx but using Tcl for rules, described in . The loadable grs.tcl filter module is also packaged in the GNU/Debian package libidzebra2.0-mod-grs-regx.
&acro.grs1; Canonical Input Format

Although input data can take any form, it is sometimes useful to describe the record processing capabilities of the system in terms of a single, canonical input format that gives access to the full spectrum of structure and flexibility in the system. In &zebra;, this canonical format is an "&acro.sgml;-like" syntax. To use the canonical format, specify grs.sgml as the record type.

Consider a record describing an information resource (such a record is sometimes known as a locator record). It might contain a field describing the distributor of the information resource, which might in turn be partitioned into various fields providing details about the distributor, like this:

    <Distributor>
        <Name> USGS/WRD </Name>
        <Organization> USGS/WRD </Organization>
        <Street-Address>
            U.S. GEOLOGICAL SURVEY, 505 MARQUETTE, NW
        </Street-Address>
        <City> ALBUQUERQUE </City>
        <State> NM </State>
        <Zip-Code> 87102 </Zip-Code>
        <Country> USA </Country>
        <Telephone> (505) 766-5560 </Telephone>
    </Distributor>

The keywords surrounded by <...> are tags, while the sections of text in between are the data elements. A data element is characterized by its location in the tree that is made up by the nested elements. Each element is terminated by a closing tag, beginning with </ and containing the same symbolic tag-name as the corresponding opening tag. The general closing tag, </>, terminates the element started by the last opening tag.

The structuring of elements is significant. The element Telephone, for instance, may be indexed and presented to the client differently, depending on whether it appears inside the Distributor element or inside some other structured data element, such as a Supplier element.
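The nesting and closing-tag rules described above can be illustrated with a small Python sketch. This is not &zebra;'s parser: the function name parse_canonical, the tuple-based tree and the lenient handling of unmatched closing tags are all assumptions made for illustration, and variant tags (<var ...>) are not treated specially.

```python
import re

# Hypothetical tokenizer: a tag (opening, named closing, or the general "</>"),
# or a run of text between tags.
TOKEN = re.compile(r"<(/?)([^<>]*)>|([^<]+)")

def parse_canonical(text):
    """Parse SGML-like input into nested (tag, children) tuples.

    Data elements become plain strings inside the children list.
    """
    root = ("ROOT", [])
    stack = [root]
    for closing, name, data in TOKEN.findall(text):
        if data:
            if data.strip():
                stack[-1][1].append(data.strip())
        elif closing:
            name = name.strip()
            open_names = [node[0] for node in stack[1:]]
            if name and name in open_names:
                # Named closing tag: implicitly close anything opened after it.
                while stack[-1][0] != name:
                    stack.pop()
            if len(stack) > 1:
                # "</>" (or an unknown name) closes the most recent element.
                stack.pop()
        else:
            node = (name.strip(), [])
            stack[-1][1].append(node)
            stack.append(node)
    return root[1]

record = parse_canonical(
    "<gils> <title>Zen</> <Distributor><Name> USGS/WRD </Name></Distributor> </gils>"
)
```

Here the general closing tag </> ends the title element, while </Name> and </Distributor> close their elements by name; the resulting tree keeps Name nested below Distributor, which is what makes the position of an element significant.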
Record Root The first tag in a record describes the root node of the tree that makes up the total record. In the canonical input format, the root tag should contain the name of the schema that lends context to the elements of the record (see ). The following is a GILS record that contains only a single element (strictly speaking, that makes it an illegal GILS record, since the GILS profile includes several mandatory elements - &zebra; does not validate the contents of a record against the &acro.z3950; profile, however - it merely attempts to match up elements of a local representation with the given schema): <gils> <title>Zen and the Art of Motorcycle Maintenance</title> </gils>
Variants

&zebra; allows you to provide individual data elements in a number of variant forms. Examples of variant forms are textual data elements which might appear in different languages, and images which may appear in different formats or layouts. The variant system in &zebra; is essentially a representation of the variant mechanism of &acro.z3950;-1995.

The following is an example of a title element which occurs in two different languages.

    <title>
        <var lang lang "eng">
        Zen and the Art of Motorcycle Maintenance</>
        <var lang lang "dan">
        Zen og Kunsten at Vedligeholde en Motorcykel</>
    </title>

The syntax of the variant element is <var class type value>. The available values for the class and type fields are given by the variant set that is associated with the current schema (see ).

Variant elements are terminated by the general end-tag </>, by the variant end-tag </var>, by the appearance of another variant tag with the same class and value settings, or by the appearance of another, normal tag. In other words, the end-tags for the variants used in the example above could have been omitted.

Variant elements can be nested. The element

    <title>
        <var lang lang "eng"><var body iana "text/plain">
        Zen and the Art of Motorcycle Maintenance
    </title>

associates two variant components with the variant list for the title element.

Given the nesting rules described above, we could write

    <title>
        <var body iana "text/plain">
        <var lang lang "eng">
        Zen and the Art of Motorcycle Maintenance
        <var lang lang "dan">
        Zen og Kunsten at Vedligeholde en Motorcykel
    </title>

The title element above comes in two variants. Both have the IANA body type "text/plain", but one is in English and the other in Danish. The client, using the element selection mechanism of &acro.z3950;, can retrieve information about the available variant forms of data elements, or it can select specific variants based on the requirements of the end-user.
&acro.grs1; REGX And TCL Input Filters

In order to handle general input formats, &zebra; allows the operator to define filters which read individual records in their native format and produce an internal representation that the system can work with.

Input filters are ASCII files, generally with the suffix .flt. The system looks for the files in the directories given in the profilePath setting in the zebra.cfg file. The record type for the filter is grs.regx.filter-filename (fundamental type grs, file read type regx, argument filter-filename).

Generally, an input filter consists of a sequence of rules, where each rule consists of a sequence of expressions, followed by an action. The expressions are evaluated against the contents of the input record, and the actions normally contribute to the generation of an internal representation of the record.

An expression can be any of the following:

INIT
    The action associated with this expression is evaluated exactly once in the lifetime of the application, before any records are read. It can be used in conjunction with an action that initializes tables or other resources that are used in the processing of input records.

BEGIN
    Matches the beginning of the record. It can be used to initialize variables, etc. Typically, the BEGIN rule is also used to establish the root node of the record.

END
    Matches the end of the record, when all of the contents of the record have been processed.

/reg/
    Matches regular expression pattern reg from the input record. The operators supported are the same as for regular expression queries. Refer to .

BODY
    This keyword may only be used between two patterns. It matches everything between (not including) those patterns.

FINISH
    The action associated with this expression is evaluated once, before the application terminates. It can be used to release system resources, typically ones allocated in the INIT step.

An action is surrounded by curly braces ({...}), and consists of a sequence of statements.
Statements may be separated by newlines or semicolons (;). Within actions, the strings that matched the expressions immediately preceding the action can be referred to as $0, $1, $2, etc.

The available statements are:

begin type [parameter ... ]
    Begin a new data element. The type is one of the following:

    record
        Begin a new record. The following parameter should be the name of the schema that describes the structure of the record, e.g., gils or wais (see below). The begin record call should precede any other use of the begin statement.

    element
        Begin a new tagged element. The parameter is the name of the tag. If the tag is not matched anywhere in the tagsets referenced by the current schema, it is treated as a local string tag.

    variant
        Begin a new node in a variant tree. The parameters are class type value.

data parameter
    Create a data element. The concatenated arguments make up the value of the data element. The option -text signals that the layout (whitespace) of the data should be retained for transmission. The option -element tag wraps the data up in the tag. The use of the -element option is equivalent to preceding the command with a begin element command, and following it with the end command.

end [type]
    Close a tagged element. If no parameter is given, the last element on the stack is terminated. The first parameter, if any, is a type name, similar to the begin statement. For the element type, a tag name can be provided to terminate a specific tag.

unread no
    Move the input pointer to the offset of the first character matched by the rule given by no. Rules are numbered from left to right: the first rule is numbered 0, the second rule 1, and so on.

The following input filter reads a Usenet news file, producing a record in the WAIS schema. Note that the body of a news posting is separated from the list of headers by a blank line (or rather, a sequence of two newline characters).
    BEGIN                { begin record wais }

    /^From:/ BODY /$/    { data -element name $1 }
    /^Subject:/ BODY /$/ { data -element title $1 }
    /^Date:/ BODY /$/    { data -element lastModified $1 }
    /\n\n/ BODY END      {
                            begin element bodyOfDisplay
                            begin variant body iana "text/plain"
                            data -text $1
                            end record
                         }

If &zebra; is compiled with support for Tcl enabled, the statements described above are supplemented with a complete scripting environment, including control structures (conditional expressions and loop constructs), and powerful string manipulation mechanisms for modifying the elements of a record.
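To make the effect of the Usenet filter above more concrete, the following Python sketch mimics it outside of &zebra;. It is an approximation only: the function news_to_wais and the dictionary layout are hypothetical, the elements are emitted in rule order rather than input order, and stripping whitespace from header values is an assumption about what a data statement without -text does.

```python
import re

def news_to_wais(raw):
    """Roughly mimic the example filter: headers become elements, the rest is the body."""
    # /\n\n/ BODY END: everything after the first blank line is the body.
    headers, _, body = raw.partition("\n\n")
    record = {"schema": "wais", "elements": []}   # BEGIN { begin record wais }
    rules = [("From", "name"), ("Subject", "title"), ("Date", "lastModified")]
    for prefix, tag in rules:
        # /^Prefix:/ BODY /$/ { data -element tag $1 }
        match = re.search(rf"^{prefix}:(.*)$", headers, flags=re.MULTILINE)
        if match:
            record["elements"].append((tag, match.group(1).strip()))
    # begin element bodyOfDisplay; begin variant body iana "text/plain"; data -text $1
    record["elements"].append(("bodyOfDisplay", ("body", "iana", "text/plain"), body))
    return record

message = (
    "From: alice@example.org\n"
    "Subject: Hello\n"
    "Date: 1 Jan 1995\n"
    "\n"
    "First line\nSecond line\n"
)
wais = news_to_wais(message)
```

The body is stored untouched, in the spirit of the -text option, while each header value is attached to the tag named by its -element option.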
&acro.grs1; Internal Record Representation

When records are manipulated by the system, they're represented in a tree-structure, with data elements at the leaf nodes, and tags or variant components at the non-leaf nodes. The root-node identifies the schema that lends context to the tagging and structuring of the record. Imagine a simple record, consisting of a 'title' element and an 'author' element:

    ROOT
        TITLE "Zen and the Art of Motorcycle Maintenance"
        AUTHOR "Robert Pirsig"

A slightly more complex record would have the author element consist of two elements, a surname and a first name:

    ROOT
        TITLE "Zen and the Art of Motorcycle Maintenance"
        AUTHOR
            FIRST-NAME "Robert"
            SURNAME "Pirsig"

The root of the record will refer to the record schema that describes the structuring of this particular record. The schema defines the element tags (TITLE, FIRST-NAME, etc.) that may occur in the record, as well as the structuring (SURNAME should appear below AUTHOR, etc.). In addition, the schema establishes element set names that are used by the client to request a subset of the elements of a given record. The schema may also establish rules for converting the record to a different schema, by stating, for each element, a mapping to a different tag path.
Tagged Elements A data element is characterized by its tag, and its position in the structure of the record. For instance, while the tag "telephone number" may be used different places in a record, we may need to distinguish between these occurrences, both for searching and presentation purposes. For instance, while the phone numbers for the "customer" and the "service provider" are both representatives for the same type of resource (a telephone number), it is essential that they be kept separate. The record schema provides the structure of the record, and names each data element (defined by the sequence of tags - the tag path - by which the element can be reached from the root of the record).
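The notion of a tag path can be sketched in a few lines of Python. The tuple-based tree and the function tag_paths below are illustrative assumptions, not &zebra;'s internal data structures; the record is the two-level author example from the internal record representation.

```python
def tag_paths(node, prefix=()):
    """Yield (tag_path, data) for every data leaf below node."""
    tag, children = node
    path = prefix + (tag,)
    for child in children:
        if isinstance(child, str):
            # A data element: it is identified by the full path from the root.
            yield path, child
        else:
            yield from tag_paths(child, path)

record = ("ROOT", [
    ("TITLE", ["Zen and the Art of Motorcycle Maintenance"]),
    ("AUTHOR", [
        ("FIRST-NAME", ["Robert"]),
        ("SURNAME", ["Pirsig"]),
    ]),
])

paths = list(tag_paths(record))
```

Two elements that share a tag, such as a telephone number under "customer" and one under "service provider", would produce different paths here, which is how they can be kept apart for searching and presentation.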
Variants

The children of a tag node may be either more tag nodes, a data node (possibly accompanied by tag nodes), or a tree of variant nodes. The children of variant nodes are either more variant nodes or a data node (possibly accompanied by more variant nodes). Each leaf node, which is normally a data node, corresponds to a variant form of the tagged element identified by the tag which parents the variant tree. The following title element occurs in two different languages:

    TITLE
        VARIANT LANG=ENG "War and Peace"
        VARIANT LANG=DAN "Krig og Fred"

Which of the two elements is transmitted to the client by the server depends on the specifications provided by the client, if any.

In practice, each variant node is associated with a triple of class, type, value, corresponding to the variant mechanism of &acro.z3950;.
Data Elements Data nodes have no children (they are always leaf nodes in the record tree).
&acro.grs1; Record Model Configuration The following sections describe the configuration files that govern the internal management of grs records. The system searches for the files in the directories specified by the profilePath setting in the zebra.cfg file.
The Abstract Syntax The abstract syntax definition (also known as an Abstract Record Structure, or ARS) is the focal point of the record schema description. For a given schema, the ABS file may state any or all of the following: The object identifier of the &acro.z3950; schema associated with the ARS, so that it can be referred to by the client. The attribute set (which can possibly be a compound of multiple sets) which applies in the profile. This is used when indexing and searching the records belonging to the given profile. The tag set (again, this can consist of several different sets). This is used when reading the records from a file, to recognize the different tags, and when transmitting the record to the client - mapping the tags to their numerical representation, if they are known. The variant set which is used in the profile. This provides a vocabulary for specifying the forms of data that appear inside the records. Element set names, which are a shorthand way for the client to ask for a subset of the data elements contained in a record. Element set names, in the retrieval module, are mapped to element specifications, which contain information equivalent to the Espec-1 syntax of &acro.z3950;. Map tables, which may specify mappings to other database profiles, if desired. Possibly, a set of rules describing the mapping of elements to a &acro.marc; representation. A list of element descriptions (this is the actual ARS of the schema, in &acro.z3950; terms), which lists the ways in which the various tags can be used and organized hierarchically. Several of the entries above simply refer to other files, which describe the given objects.
The Configuration Files This section describes the syntax and use of the various tables which are used by the retrieval module. The number of different file types may appear daunting at first, but each type corresponds fairly clearly to a single aspect of the &acro.z3950; retrieval facilities. Further, the average database administrator, who is simply reusing an existing profile for which tables already exist, shouldn't have to worry too much about the contents of these tables. Generally, the files are simple ASCII files, which can be maintained using any text editor. Blank lines, and lines beginning with a (#) are ignored. Any characters on a line followed by a (#) are also ignored. All other lines contain directives, which provide some setting or value to the system. Generally, settings are characterized by a single keyword, identifying the setting, followed by a number of parameters. Some settings are repeatable (r), while others may occur only once in a file. Some settings are optional (o), while others again are mandatory (m).
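The line-level rules just described (blank lines ignored, # starting a comment, a keyword followed by parameters) can be sketched as follows. The function parse_directives is hypothetical and only illustrates the general file syntax; it does not check which directives are mandatory or repeatable.

```python
def parse_directives(text):
    """Split a configuration file into (keyword, parameters) pairs."""
    directives = []
    for line in text.splitlines():
        # Everything from "#" to the end of the line is a comment.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        keyword, *params = line.split()
        directives.append((keyword, params))
    return directives

sample = """
# Attribute set excerpt
name gils

att 2001 distributorName   # trailing comment
"""
parsed = parse_directives(sample)
```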
The Abstract Syntax (.abs) Files

The name of this file type is slightly misleading in &acro.z3950; terms, since, apart from the actual abstract syntax of the profile, it also includes most of the other definitions that go into a database profile.

When a record in the canonical, &acro.sgml;-like format is read from a file or from the database, the first tag of the file should reference the profile that governs the layout of the record. If the first tag of the record is, say, <gils>, the system will look for the profile definition in the file gils.abs. Profile definitions are cached, so they only have to be read once during the lifespan of the current process.

When writing your own input filters, the record-begin command introduces the profile, and should always be called first thing when introducing a new record.

The file may contain the following directives:

name symbolic-name
    (m) This provides a shorthand name or description for the profile. Mostly useful for diagnostic purposes.

reference OID-name
    (m) The reference name of the OID for the profile. The reference names can be found in the util module of &yaz;.

attset filename
    (m) The attribute set that is used for indexing and searching records belonging to this profile.

tagset filename
    (o) The tag set (if any) that describes the fields of the records.

varset filename
    (o) The variant set used in the profile.

maptab filename
    (o,r) This points to a conversion table that might be used if the client asks for the record in a different schema from the native one.

marc filename
    (o) Points to a file containing parameters for representing the record contents in the ISO2709 syntax. Read the description of the &acro.marc; representation facility below.

esetname name filename
    (o,r) Associates the given element set name with an element selection file. If an (@) is given in place of the filename, this corresponds to a null mapping for the given element set name.
all tags (o) This directive specifies a list of attributes which should be appended to the attribute list given for each element. The effect is to make every single element in the abstract syntax searchable by way of the given attributes. This directive provides an efficient way of supporting free-text searching across all elements. However, it does increase the size of the index significantly. The attributes can be qualified with a structure, as in the elm directive below. elm path name attributes (o,r) Adds an element to the abstract record syntax of the schema. The path follows the syntax which is suggested by the &acro.z3950; document - that is, a sequence of tags separated by slashes (/). Each tag is given as a comma-separated pair of tag type and -value surrounded by parenthesis. The name is the name of the element, and the attributes specifies which attributes to use when indexing the element in a comma-separated list. A ! in place of the attribute name is equivalent to specifying an attribute name identical to the element name. A - in place of the attribute name specifies that no indexing is to take place for the given element. The attributes can be qualified with field types to specify which character set should govern the indexing procedure for that field. The same data element may be indexed into several different fields, using different character set definitions. See the . The default field type is w for word. xelm xpath attributes Specifies indexing for record nodes given by xpath. Unlike directive elm, this directive allows you to index attribute contents. The xpath uses a syntax similar to XPath. The attributes have same syntax and meaning as directive elm, except that operator ! refers to the nodes selected by xpath. melm field$subfield attributes This directive is specifically for &acro.marc;-formatted records, ingested either in the form of &acro.marcxml; documents, or in the ISO2709/Z39.2 format using the grs.marcxml input filter. 
You can specify indexing rules for any subfield, or you can leave off the $subfield part and specify default rules for all subfields of the given field (note: default rules should come after any subfield-specific rules in the configuration file). The attributes have the same syntax and meaning as for the 'elm' directive above. encoding encodingname This directive specifies character encoding for external records. For records such as &acro.xml; that specifies encoding within the file via a header this directive is ignored. If neither this directive is given, nor an encoding is set within external records, ISO-8859-1 encoding is assumed. xpath enable/disable If this directive is followed by enable, then extra indexing is performed to allow for XPath-like queries. If this directive is not specified - equivalent to disable - no extra XPath-indexing is performed. systag systemTag actualTag Specifies what information, if any, &zebra; should automatically include in retrieval records for the ``system fields'' that it supports. systemTag may be any of the following: rank An integer indicating the relevance-ranking score assigned to the record. sysno An automatically generated identifier for the record, unique within this database. It is represented by the <localControlNumber> element in &acro.xml; and the (1,14) tag in &acro.grs1;. size The size, in bytes, of the retrieved record. The actualTag parameter may be none to indicate that the named element should be omitted from retrieval records. The mechanism for controlling indexing is not adequate for complex databases, and will probably be moved into a separate configuration table eventually. The following is an excerpt from the abstract syntax file for the GILS profile. 
    name gils
    reference GILS-schema
    attset gils.att
    tagset gils.tag
    varset var1.var

    maptab gils-usmarc.map

    # Element set names

    esetname VARIANT gils-variant.est  # for WAIS-compliance
    esetname B gils-b.est
    esetname G gils-g.est
    esetname F @

    elm (1,10)                 rank                      -
    elm (1,12)                 url                       -
    elm (1,14)                 localControlNumber        Local-number
    elm (1,16)                 dateOfLastModification    Date/time-last-modified
    elm (2,1)                  title                     w:!,p:!
    elm (4,1)                  controlIdentifier         Identifier-standard
    elm (2,6)                  abstract                  Abstract
    elm (4,51)                 purpose                   !
    elm (4,52)                 originator                -
    elm (4,53)                 accessConstraints         !
    elm (4,54)                 useConstraints            !
    elm (4,70)                 availability              -
    elm (4,70)/(4,90)          distributor               -
    elm (4,70)/(4,90)/(2,7)    distributorName           !
    elm (4,70)/(4,90)/(2,10)   distributorOrganization   !
    elm (4,70)/(4,90)/(4,2)    distributorStreetAddress  !
    elm (4,70)/(4,90)/(4,3)    distributorCity           !
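As a rough illustration of how the elm lines in the excerpt above break down, the Python sketch below splits a tag path into (type, value) pairs and expands the ! and - shorthands. Both functions are hypothetical helpers, not part of &zebra;, and field-type qualifiers (as on the title line) are deliberately left out.

```python
import re

def parse_tag_path(path):
    """Split '(4,70)/(4,90)/(2,7)' into [(4, '70'), (4, '90'), (2, '7')]."""
    steps = []
    for part in path.split("/"):
        match = re.fullmatch(r"\((\d+),([^()]+)\)", part.strip())
        if match is None:
            raise ValueError(f"not a (type,value) tag: {part!r}")
        steps.append((int(match.group(1)), match.group(2)))
    return steps

def resolve_attributes(element_name, spec):
    """Expand the attribute shorthands of an elm directive.

    '-' means the element is not indexed; '!' stands for an attribute
    named after the element itself. Field-type qualifiers are not handled.
    """
    if spec == "-":
        return []
    return [element_name if attr == "!" else attr for attr in spec.split(",")]

path = parse_tag_path("(4,70)/(4,90)/(2,7)")
name_attrs = resolve_attributes("distributorName", "!")
rank_attrs = resolve_attributes("rank", "-")
local_attrs = resolve_attributes("localControlNumber", "Local-number")
```

With these helpers, the distributorName line resolves to a three-step path indexed under an attribute of the same name, while rank is not indexed at all.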
The Attribute Set (.att) Files

This file type describes the Use elements of an attribute set. It contains the following directives.

name symbolic-name
    (m) This provides a shorthand name or description for the attribute set. Mostly useful for diagnostic purposes.

reference OID-name
    (m) The reference name of the OID for the attribute set. The reference names can be found in the util module of &yaz;.

include filename
    (o,r) This directive is used to include another attribute set as a part of the current one. This is used when a new attribute set is defined as an extension to another set. For instance, many new attribute sets are defined as extensions to the bib-1 set. This is an important feature of the retrieval system of &acro.z3950;: it ensures the highest possible level of interoperability, since those access points of your database which are derived from the external set (say, bib-1) can be used even by clients who are unaware of the new set.

att att-value att-name [local-value]
    (o,r) This repeatable directive introduces a new attribute to the set. The attribute value is stored in the index (unless a local-value is given, in which case this is stored). The name is used to refer to the attribute from the abstract syntax.

This is an excerpt from the GILS attribute set definition. Notice how the file describing the bib-1 attribute set is referenced.

    name gils
    reference GILS-attset
    include bib1.att

    att 2001 distributorName
    att 2002 indextermsControlled
    att 2003 purpose
    att 2004 accessConstraints
    att 2005 useConstraints
The Tag Set (.tag) Files

This file type defines the tagset of the profile, possibly by referencing other tag sets (most tag sets, for instance, will include tagsetG and tagsetM from the &acro.z3950; specification). The file may contain the following directives.

name symbolic-name
    (m) This provides a shorthand name or description for the tag set. Mostly useful for diagnostic purposes.

reference OID-name
    (o) The reference name of the OID for the tag set. The reference names can be found in the util module of &yaz;. The directive is optional, since not all tag sets are registered outside of their schema.

type integer
    (m) The type number of the tagset within the schema profile (note: this specification really should belong to the .abs file. This will be fixed in a future release).

include filename
    (o,r) This directive is used to include the definitions of other tag sets into the current one.

tag number names type
    (o,r) Introduces a new tag to the set. The number is the tag number as used in the protocol (there is currently no mechanism for specifying string tags, but this would be quick work to add). The names parameter is a list of names by which the tag should be recognized in the input file format. The names should be separated by slashes (/). The type is the recommended data type of the tag. It should be one of the following:

    structured
    string
    numeric
    bool
    oid
    generalizedtime
    intunit
    int
    octetstring
    null

The following is an excerpt from the TagsetG definition file.

    name tagsetg
    reference TagsetG
    type 2

    tag 1 title string
    tag 2 author string
    tag 3 publicationPlace string
    tag 4 publicationDate string
    tag 5 documentId string
    tag 6 abstract string
    tag 7 name string
    tag 8 date generalizedtime
    tag 9 bodyOfDisplay string
    tag 10 organization string
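The tag directive maps one or more input names to a numeric tag and a data type. A minimal Python sketch of such a lookup table is shown below; build_tag_table is a hypothetical helper, and the author/creator line is an invented alias used only to show the slash-separated names list (it does not appear in the TagsetG file).

```python
def build_tag_table(lines):
    """Map every recognized tag name to its (number, data type)."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[0] != "tag":
            continue  # name, reference, type, include, blank lines, etc.
        _, number, names, data_type = parts
        for name in names.split("/"):
            table[name] = (int(number), data_type)
    return table

table = build_tag_table([
    "name tagsetg",
    "type 2",
    "tag 1 title string",
    "tag 8 date generalizedtime",
    "tag 2 author/creator string",   # hypothetical alias, for illustration only
])
```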
The Variant Set (.var) Files

The variant set file is a straightforward representation of the variant set definitions associated with the protocol. At present, only the Variant-1 set is known.

These are the directives allowed in the file.

name symbolic-name
    (m) This provides a shorthand name or description for the variant set. Mostly useful for diagnostic purposes.

reference OID-name
    (o) The reference name of the OID for the variant set, if one is required. The reference names can be found in the util module of &yaz;.

class integer class-name
    (m,r) Introduces a new class to the variant set.

type integer type-name datatype
    (m,r) Adds a new type to the current class (the one introduced by the most recent class directive). The type names belong to the same name space as the one used in the tag set definition file.

The following is an excerpt from the file describing the variant set Variant-1.

    name variant-1
    reference Variant-1

    class 1 variantId

    type 1 variantId octetstring

    class 2 body

    type 1 iana string
    type 2 z39.50 string
    type 3 other string
The Element Set (.est) Files The element set specification files describe a selection of a subset of the elements of a database record. The element selection mechanism is equivalent to the one supplied by the Espec-1 syntax of the &acro.z3950; specification. In fact, the internal representation of an element set specification is identical to the Espec-1 structure, and we'll refer you to the description of that structure for most of the detailed semantics of the directives below. Not all of the Espec-1 functionality has been implemented yet. The fields that are mentioned below all work as expected, unless otherwise is noted. The directives available in the element set file are as follows: defaultVariantSetId OID-name (o) If variants are used in the following, this should provide the name of the variantset used (it's not currently possible to specify a different set in the individual variant request). In almost all cases (certainly all profiles known to us), the name Variant-1 should be given here. defaultVariantRequest variant-request (o) This directive provides a default variant request for use when the individual element requests (see below) do not contain a variant request. Variant requests consist of a blank-separated list of variant components. A variant component is a comma-separated, parenthesized triple of variant class, type, and value (the two former values being represented as integers). The value can currently only be entered as a string (this will change to depend on the definition of the variant in question). The special value (@) is interpreted as a null value, however. simpleElement path ['variant' variant-request] (o,r) This corresponds to a simple element request in Espec-1. The path consists of a sequence of tag-selectors, where each of these can consist of either: A simple tag, consisting of a comma-separated type-value pair in parenthesis, possibly followed by a colon (:) followed by an occurrences-specification (see below). 
The tag-value can be a number or a string. If the first character is an apostrophe ('), this forces the value to be interpreted as a string, even if it appears to be numerical. A WildThing, represented as a question mark (?), possibly followed by a colon (:) followed by an occurrences specification (see below). A WildPath, represented as an asterisk (*). Note that the last element of the path should not be a wildPath (wildpaths don't work in this version). The occurrences-specification can be either the string all, the string last, or an explicit value-range. The value-range is represented as an integer (the starting point), possibly followed by a plus (+) and a second integer (the number of elements, default being one). The variant-request has the same syntax as the defaultVariantRequest above. Note that it may sometimes be useful to give an empty variant request, simply to disable the default for a specific set of fields (we aren't certain if this is proper Espec-1, but it works in this implementation). The following is an example of an element specification belonging to the GILS profile. simpleelement (1,10) simpleelement (1,12) simpleelement (2,1) simpleelement (1,14) simpleelement (4,1) simpleelement (4,52)
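The tag-selector and occurrences syntax described above can be sketched in Python as follows. The functions parse_occurrences and parse_selector and the tuples they return are assumptions made for illustration; only a single selector is parsed, and string tag values containing a colon are not supported by this sketch.

```python
import re

def parse_occurrences(spec):
    """'all' and 'last' are kept as-is; 'N' or 'N+M' becomes (start, count)."""
    if spec in ("all", "last"):
        return spec
    start, plus, count = spec.partition("+")
    return (int(start), int(count) if plus else 1)  # count defaults to one

def parse_selector(selector):
    """Parse one tag-selector: a simple tag, a WildThing (?) or a WildPath (*)."""
    body, colon, occ = selector.partition(":")
    occurrences = parse_occurrences(occ) if colon else None
    if body == "?":
        return ("wildThing", occurrences)
    if body == "*":
        return ("wildPath", None)
    match = re.fullmatch(r"\((\d+),(.+)\)", body)
    if match is None:
        raise ValueError(f"not a tag-selector: {selector!r}")
    tag_type, value = int(match.group(1)), match.group(2)
    if value.startswith("'"):
        value = value[1:]        # leading apostrophe forces a string value
    elif value.isdigit():
        value = int(value)
    return ("tag", tag_type, value, occurrences)

simple = parse_selector("(2,1)")
forced = parse_selector("(4,'52):3+2")
wild = parse_selector("?:last")
```

In this sketch (4,'52):3+2 selects a string-valued tag "52" of type 4, starting at occurrence 3 and covering two elements.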
The Schema Mapping (.map) Files Sometimes, the client might want to receive a database record in a schema that differs from the native schema of the record. For instance, a client might only know how to process WAIS records, while the database record is represented in a more specific schema, such as GILS. In this module, a mapping of data to one of the &acro.marc; formats is also thought of as a schema mapping (mapping the elements of the record into fields consistent with the given &acro.marc; specification, prior to actually converting the data to the ISO2709). This use of the object identifier for &acro.usmarc; as a schema identifier represents an overloading of the OID which might not be entirely proper. However, it represents the dual role of schema and record syntax which is assumed by the &acro.marc; family in &acro.z3950;. These are the directives of the schema mapping file format: targetName name (m) A symbolic name for the target schema of the table. Useful mostly for diagnostic purposes. targetRef OID-name (m) An OID name for the target schema. This is used, for instance, by a server receiving a request to present a record in a different schema from the native one. The name, again, is found in the oid module of &yaz;. map element-name target-path (o,r) Adds an element mapping rule to the table.
The &acro.marc; (ISO2709) Representation (.mar) Files This file provides rules for representing a record in the ISO2709 format. The rules pertain mostly to the values of the constant-length header of the record.
&acro.grs1; Exchange Formats Converting records from the internal structure to an exchange format is largely an automatic process. Currently, the following exchange formats are supported: &acro.grs1;. The internal representation is based on &acro.grs1;/&acro.xml;, so the conversion here is straightforward. The system will create applied variant and supported variant lists as required, if a record contains variant information. &acro.xml;. The internal representation is based on &acro.grs1;/&acro.xml;, so the mapping is trivial. Note that &acro.xml; schemas, processing instructions and comments are not part of the internal representation and therefore are not included in a generated &acro.xml; record. Future versions of &zebra; will support that. &acro.sutrs;. Again, the mapping is fairly straightforward. Indentation is used to show the hierarchical structure of the record. All "&acro.grs1;" type records support both the &acro.grs1; and &acro.sutrs; representations. ISO2709-based formats (&acro.usmarc;, etc.). Only records with a two-level structure (corresponding to fields and subfields) can be directly mapped to ISO2709. For records with a different structure (e.g., GILS), the representation in a structure like &acro.usmarc; involves a schema mapping (see ) to an "implied" &acro.usmarc; schema (implied, because there is no formal schema which specifies the use of the &acro.usmarc; fields outside of ISO2709). The resultant two-level record is then mapped directly from the internal representation to ISO2709. See the GILS schema definition files for a detailed example of this approach. Explain. This representation is only available for records belonging to the Explain schema. Summary. This ASN.1-based structure is only available for records belonging to the Summary schema, or to schemas which provide a mapping to this schema (see the description of the schema mapping facility above). SOIF. 
Support for this syntax is experimental, and is currently keyed to a private Index Data OID (1.2.840.10003.5.1000.81.2). All abstract syntaxes can be mapped to the SOIF format, although nested elements are represented by concatenation of the tag names at each level.
Extended indexing of &acro.marc; records Extended indexing of &acro.marc; records helps when you need to index a combination of subfields, index only a part of a field, or use the embedded fields of a &acro.marc; record during indexing. Extended indexing of &acro.marc; records additionally allows you: to index data in the LEADER of a &acro.marc; record; to index data in control fields (with fixed length); to use the values of indicators during indexing; to index linked fields for UNI&acro.marc; based formats. Compared with simple indexing, extended indexing may increase the indexing time for &acro.marc; records by a factor of about 2-3.
The index-formula First, we have to define the term index-formula for &acro.marc; records. This term helps to understand the notation of extended indexing of &acro.marc; records by &zebra;. Our definition is based on the document "The table of conformity for &acro.z3950; use attributes and R&acro.usmarc; fields". The document is available only in Russian. The index-formula is a combination of subfields presented in the following way: 71-00$a, $g, $h ($c){.$b ($c)} , (1) &zebra; supports the &acro.bib1; right truncation attribute. In this case, the index-formula (1) consists of forms defined in the same way as (1): 71-00$a, $g, $h 71-00$a, $g 71-00$a The original &acro.marc; record may lack some of the elements included in the index-formula. This notation uses the following operands: # It means whitespace character. - The position may contain any value defined by the &acro.marc; format. For example, index-formula 70-#1$a, $g , (2) includes 700#1$a, $g 701#1$a, $g 702#1$a, $g {...} The repeatable elements are enclosed in curly braces {}. For example, index-formula 71-00$a, $g, $h ($c){.$b ($c)} , (3) includes 71-00$a, $g, $h ($c). $b ($c) 71-00$a, $g, $h ($c). $b ($c). $b ($c) 71-00$a, $g, $h ($c). $b ($c). $b ($c). $b ($c) All other operands are the same as those accepted in the &acro.marc; world.
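The meaning of the # and - operands in the leading part of an index-formula (three tag positions followed by two indicators, as in 70-#1) can be sketched as a simple comparison. The Python function below is only an illustration; the name header_matches and its argument layout are assumptions, not a description of &zebra; internals.

```python
def header_matches(pattern, tag, ind1, ind2):
    """Check a field's tag and indicators against a pattern such as "70-#1".

    "-" accepts any value in that position, "#" requires a whitespace
    character, and every other character must match literally.
    """
    header = tag + ind1 + ind2
    if len(pattern) != 5 or len(header) != 5:
        raise ValueError("expected a 3-character tag and two 1-character indicators")
    for expected, actual in zip(pattern, header):
        if expected == "-":
            continue  # any value is allowed here
        if expected == "#":
            if actual != " ":
                return False
        elif expected != actual:
            return False
    return True
```

With the pattern 70-#1, fields 700, 701 and 702 with a blank first indicator and a second indicator of 1 are accepted, in line with the forms listed for formula (2); a field 710 or a non-blank first indicator is rejected.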
Notation of <emphasis>index-formula</emphasis> for &zebra; Extended indexing overloads the path of the elm definition in the abstract syntax file of &zebra; (.abs file): names beginning with "mc-" are interpreted by &zebra; as an index-formula. The database index is created and linked with an access point (&acro.bib1; use attribute) according to this formula. For example, index-formula 71-00$a, $g, $h ($c){.$b ($c)} , (4) in the .abs file looks like: mc-71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)} The notation of index-formula uses the operands: _ It means whitespace character. . The position may contain any value defined by the &acro.marc; format. For example, index-formula 70-#1$a, $g , (5) matches mc-70._1_$a,_$g_ and includes 700_1_$a,_$g_ 701_1_$a,_$g_ 702_1_$a,_$g_ {...} The repeatable elements are enclosed in curly braces {}. For example, index-formula 71-00$a, $g, $h ($c){.$b ($c)} , (6) matches mc-71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)} and includes 71.00_$a,_$g,_$h_(_$c_).$b_(_$c_) 71.00_$a,_$g,_$h_(_$c_).$b_(_$c_).$b_(_$c_) 71.00_$a,_$g,_$h_(_$c_).$b_(_$c_).$b_(_$c_).$b_(_$c_) <...> An embedded index-formula (for linked fields) is enclosed in <>. For example, index-formula 4--#-$170-#1$a, $g ($c) , (7) matches mc-4.._._$1<70._1_$a,_$g_(_$c_)>_ and includes 463_._$1<70._1_$a,_$g_(_$c_)>_ All other operands are the same as those accepted in the &acro.marc; world.
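The way a repeatable element in curly braces unfolds into the listed forms can be reproduced with a short Python sketch. It is an illustration only: expand_repeats and max_repeats are names invented for this example, a single {...} group is assumed, and expansion starts at one repetition to mirror the forms shown above.

```python
from typing import List


def expand_repeats(formula: str, max_repeats: int) -> List[str]:
    """Expand the first {...} group into 1..max_repeats repetitions.

    A formula without a {...} group is returned unchanged.
    """
    start = formula.find("{")
    end = formula.find("}", start + 1)
    if start == -1 or end == -1:
        return [formula]
    prefix = formula[:start]
    body = formula[start + 1:end]   # the repeatable element
    suffix = formula[end + 1:]
    return [prefix + body * n + suffix for n in range(1, max_repeats + 1)]
```

Applied to 71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)} with max_repeats set to 3, the function returns the three expanded forms listed for formula (6).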
Examples indexing LEADER You need to use the keyword "ldr" to index the leader. For example, indexing data from the 6th and 7th positions of the LEADER: elm mc-ldr[6] Record-type ! elm mc-ldr[7] Bib-level ! indexing data from control fields indexing date (the time added to database): elm mc-008[0-5] Date/time-added-to-db ! or for R&acro.usmarc; (this data is included in the 100th field): elm mc-100___$a[0-7]_ Date/time-added-to-db ! using indicators while indexing For R&acro.usmarc;, index-formula 70-#1$a, $g matches elm mc-70._1_$a,_$g_ Author !:w,!:p When &zebra; finds a field matching the "70." pattern, it checks the indicators. In this case the value of the first indicator doesn't matter, but the value of the second one must be whitespace; otherwise the field is not indexed. indexing embedded (linked) fields for UNI&acro.marc; based formats For R&acro.usmarc;, index-formula 4--#-$170-#1$a, $g ($c) matches elm mc-4.._._$1<70._1_$a,_$g_(_$c_)>_ Author !:w,!:p Data are extracted from the record if the field matches the "4.._." pattern and the data in the linked field match the embedded index-formula 70._1_$a,_$g_(_$c_).
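The position selectors used above, such as [6] and [0-5], can be read as slices of the leader or of a fixed-length control field. The following Python sketch assumes that positions are counted from zero and that a range includes both ends, which agrees with the usual &acro.marc; numbering of character positions but is not confirmed by this chapter. The function name extract_positions and the sample values are made up for illustration.

```python
import re

# "[6]" selects one position, "[0-5]" an inclusive range of positions.
_POSITIONS = re.compile(r"\[(\d+)(?:-(\d+))?\]")


def extract_positions(data, spec):
    """Return the characters of data selected by a spec like "[6]" or "[0-5]".

    Assumption: zero-based positions, inclusive ranges.
    """
    match = _POSITIONS.fullmatch(spec)
    if match is None:
        raise ValueError("not a position selector: %r" % spec)
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) is not None else first
    if last < first:
        raise ValueError("range end precedes range start: %r" % spec)
    return data[first:last + 1]


# Made-up sample values, for illustration only.
SAMPLE_LEADER = "00714cam a2200205 a 4500"
SAMPLE_008 = "850423s1981    xxu"
```

Under these assumptions, extract_positions(SAMPLE_LEADER, "[6]") selects the single character that mc-ldr[6] would index, and extract_positions(SAMPLE_008, "[0-5]") selects the six characters addressed by mc-008[0-5].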