X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Fadministration.xml;h=b95db6619112ccf3c6a809d3822e27eec6e4b30b;hp=be92e8e0893b8d105638a623d5194f0b45dca4dc;hb=HEAD;hpb=24cf42a15df56f9fe2436eedef816212b9d4fb17 diff --git a/doc/administration.xml b/doc/administration.xml index be92e8e..b95db66 100644 --- a/doc/administration.xml +++ b/doc/administration.xml @@ -1,877 +1,1869 @@ - - - Administrating Zebra - + + Administrating &zebra; + - - Unlike many simpler retrieval systems, Zebra supports safe, incremental - updates to an existing index. - - - - Normally, when Zebra modifies the index it reads a number of records - that you specify. - Depending on your specifications and on the contents of each record - one the following events take place for each record: - - - - Insert - - - The record is indexed as if it never occurred before. - Either the Zebra system doesn't know how to identify the record or - Zebra can identify the record but didn't find it to be already indexed. - - - - - Modify - - - The record has already been indexed. - In this case either the contents of the record or the location - (file) of the record indicates that it has been indexed before. - - - - - Delete - - - The record is deleted from the index. As in the - update-case it must be able to identify the record. - - - - - - - - Please note that in both the modify- and delete- case the Zebra - indexer must be able to generate a unique key that identifies the record - in question (more on this below). - - - - To administrate the Zebra retrieval system, you run the - zebraidx program. - This program supports a number of options which are preceded by a dash, - and a few commands (not preceded by dash). - - - - Both the Zebra administrative tool and the Z39.50 server share a - set of index files and a global configuration file. - The name of the configuration file defaults to - zebra.cfg. 
- The configuration file includes specifications on how to index - various kinds of records and where the other configuration files - are located. zebrasrv and zebraidx - must be run in the directory where the - configuration file lives unless you indicate the location of the - configuration file by option -c. - - - - Record Types - - - Indexing is a per-record process, in which either insert/modify/delete - will occur. Before a record is indexed search keys are extracted from - whatever might be the layout the original record (sgml,html,text, etc..). - The Zebra system currently supports two fundamental types of records: - structured and simple text. - To specify a particular extraction process, use either the - command line option -t or specify a - recordType setting in the configuration file. - - - - - - The Zebra Configuration File - - - The Zebra configuration file, read by zebraidx and - zebrasrv defaults to zebra.cfg - unless specified by -c option. - - - - You can edit the configuration file with a normal text editor. - parameter names and values are separated by colons in the file. Lines - starting with a hash sign (#) are - treated as comments. - - - - If you manage different sets of records that share common - characteristics, you can organize the configuration settings for each - type into "groups". - When zebraidx is run and you wish to address a - given group you specify the group name with the -g - option. - In this case settings that have the group name as their prefix - will be used by zebraidx. - If no -g option is specified, the settings - without prefix are used. - - - - In the configuration file, the group name is placed before the option - name itself, separated by a dot (.). 
For instance, to set the record type - for group public to grs.sgml - (the SGML-like format for structured records) you would write: - - - - - public.recordType: grs.sgml - - - - - To set the default value of the record type to text - write: - - - - recordType: text - + Unlike many simpler retrieval systems, &zebra; supports safe, incremental + updates to an existing index. - - - The available configuration settings are summarized below. They will be - explained further in the following sections. - - - - + + Normally, when &zebra; modifies the index it reads a number of records + that you specify. + Depending on your specifications and on the contents of each record + one the following events take place for each record: - - - - group - .recordType[.name]: - type - - - - Specifies how records with the file extension - name should be handled by the indexer. - This option may also be specified as a command line option - (-t). Note that if you do not specify a - name, the setting applies to all files. - In general, the record type specifier consists of the elements (each - element separated by dot), fundamental-type, - file-read-type and arguments. Currently, two - fundamental types exist, text and - grs. - - - - - group.recordId: - record-id-spec - - - Specifies how the records are to be identified when updated. See - . - - - - - group.database: - database - - - Specifies the Z39.50 database name. - - - - - - group.storeKeys: - boolean - - - Specifies whether key information should be saved for a given - group of records. If you plan to update/delete this type of - records later this should be specified as 1; otherwise it - should be 0 (default), to save register space. - - See . - - - - - group.storeData: - boolean - - - Specifies whether the records should be stored internally - in the Zebra system files. - If you want to maintain the raw records yourself, - this option should be false (0). 
- If you want Zebra to take care of the records for you, it - should be true(1). - - - - - - register: register-location - - - Specifies the location of the various register files that Zebra uses - to represent your databases. - See . - - - - - shadow: register-location - - - Enables the safe update facility of Zebra, and - tells the system where to place the required, temporary files. - See . - - - - - lockDir: directory - - - Directory in which various lock files are stored. - - - - - keyTmpDir: directory - - - Directory in which temporary files used during zebraidx's update - phase are stored. - - - - - setTmpDir: directory - - - Specifies the directory that the server uses for temporary result sets. - If not specified /tmp will be used. - - - - - profilePath: path - - - Specifies a path of profile specification files. - The path is composed of one or more directories separated by - colon. Similar to PATH for UNIX systems. - - - - - attset: filename - - - Specifies the filename(s) of attribute set files for use in - searching. At least the Bib-1 set should be loaded - (bib1.att). - The profilePath setting is used to look for - the specified files. - See - - - + - memMax: size + Insert - Specifies size of internal memory - to use for the zebraidx program. - The amount is given in megabytes - default is 4 (4 MB). + The record is indexed as if it never occurred before. + Either the &zebra; system doesn't know how to identify the record or + &zebra; can identify the record but didn't find it to be already indexed. - - root: dir + Modify - Specifies a directory base for Zebra. All relative paths - given (in profilePath, register, shadow) are based on this - directory. This setting is useful if your Zebra server - is running in a different directory from where - zebra.cfg is located. + The record has already been indexed. + In this case either the contents of the record or the location + (file) of the record indicates that it has been indexed before. 
- - tagsysno: 0|1 + Delete - Species whether Zebra should include system-number data in XML - and GRS-1 records returned to clients, represented by the - <localControlNumber> element in XML - and the (1,14) tag in GRS-1. - The content of these elements is an internally-generated - integer uniquely identifying the record within its database. - It is included by default but may be turned off, with - tagsysno: 0 for databases in which a local - control number is explicitly specified in the input records - themselves. + The record is deleted from the index. As in the + update-case it must be able to identify the record. - - - - - - Locating Records - - - The default behavior of the Zebra system is to reference the - records from their original location, i.e. where they were found when you - ran zebraidx. - That is, when a client wishes to retrieve a record - following a search operation, the files are accessed from the place - where you originally put them - if you remove the files (without - running zebraidx again, the server will return - diagnostic number 14 (``System error in presenting records'') to - the client. - - - - If your input files are not permanent - for example if you retrieve - your records from an outside source, or if they were temporarily - mounted on a CD-ROM drive, - you may want Zebra to make an internal copy of them. To do this, - you specify 1 (true) in the storeData setting. When - the Z39.50 server retrieves the records they will be read from the - internal file structures of the system. - - - - - - Indexing with no Record IDs (Simple Indexing) - - - If you have a set of records that are not expected to change over time - you may can build your database without record IDs. - This indexing method uses less space than the other methods and - is simple to use. - - - - To use this method, you simply omit the recordId entry - for the group of files that you index. To add a set of records you use - zebraidx with the update command. 
The - update command will always add all of the records that it - encounters to the index - whether they have already been indexed or - not. If the set of indexed files change, you should delete all of the - index files, and build a new index from scratch. - - - - Consider a system in which you have a group of text files called - simple. - That group of records should belong to a Z39.50 database called - textbase. - The following zebra.cfg file will suffice: - - - - - profilePath: /usr/local/yaz - attset: bib1.att - simple.recordType: text - simple.database: textbase - - - - - Since the existing records in an index can not be addressed by their - IDs, it is impossible to delete or modify records when using this method. - - - - - - Indexing with File Record IDs - - If you have a set of files that regularly change over time: Old files - are deleted, new ones are added, or existing files are modified, you - can benefit from using the file ID - indexing methodology. - Examples of this type of database might include an index of WWW - resources, or a USENET news spool area. - Briefly speaking, the file key methodology uses the directory paths - of the individual records as a unique identifier for each record. - To perform indexing of a directory with file keys, again, you specify - the top-level directory after the update command. - The command will recursively traverse the directories and compare - each one with whatever have been indexed before in that same directory. - If a file is new (not in the previous version of the directory) it - is inserted into the registers; if a file was already indexed and - it has been modified since the last update, the index is also - modified; if a file has been removed since the last - visit, it is deleted from the index. + Please note that in both the modify- and delete- case the &zebra; + indexer must be able to generate a unique key that identifies the record + in question (more on this below). 
To administrate the &zebra; retrieval system, you run the
zebraidx program.
This program supports a number of options, which are preceded by a dash,
and a few commands (not preceded by a dash).

Both the &zebra; administrative tool and the &acro.z3950; server share a
set of index files and a global configuration file.
The name of the configuration file defaults to
zebra.cfg.
The configuration file includes specifications on how to index
various kinds of records and where the other configuration files
are located. zebrasrv and zebraidx
must be run in the directory where the
configuration file lives unless you indicate the location of the
configuration file with the -c option.

Record Types

Indexing is a per-record process, in which an insert, modify, or delete
takes place for each record. Before a record is indexed, search keys are
extracted from the record, whatever its layout may be (SGML, HTML, plain
text, etc.).
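As a purely hypothetical sketch of how an extraction process might be chosen per group and per file extension (the group name and extension below are invented; grs.sgml and text are the record types discussed in this section), a zebra.cfg fragment could read:

```
# Hypothetical zebra.cfg fragment; group "public" and the .sgml
# extension are examples only.
public.recordType.sgml: grs.sgml
public.recordType: text
```

Here files ending in .sgml belonging to group public would be parsed as structured records, while any other file in that group would fall back to simple text.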
+ The &zebra; system currently supports two fundamental types of records: + structured and simple text. + To specify a particular extraction process, use either the + command line option -t or specify a + recordType setting in the configuration file. + + + + + + The &zebra; Configuration File + + + The &zebra; configuration file, read by zebraidx and + zebrasrv defaults to zebra.cfg + unless specified by -c option. + + + + You can edit the configuration file with a normal text editor. + parameter names and values are separated by colons in the file. Lines + starting with a hash sign (#) are + treated as comments. + + + + If you manage different sets of records that share common + characteristics, you can organize the configuration settings for each + type into "groups". + When zebraidx is run and you wish to address a + given group you specify the group name with the -g + option. + In this case settings that have the group name as their prefix + will be used by zebraidx. + If no -g option is specified, the settings + without prefix are used. + + + + In the configuration file, the group name is placed before the option + name itself, separated by a dot (.). For instance, to set the record type + for group public to grs.sgml + (the &acro.sgml;-like format for structured records) you would write: + + + + + public.recordType: grs.sgml + + + + + To set the default value of the record type to text + write: + + + + + recordType: text + + + + + The available configuration settings are summarized below. They will be + explained further in the following sections. + + + FIXME - Didn't Adam make something to have multiple databases in multiple dirs... 
+ --> - - For example, to update records of group esdd - located below - /data1/records/ you should type: - - $ zebraidx -g esdd update /data1/records - - - - - The corresponding configuration file includes: - - esdd.recordId: file - esdd.recordType: grs.sgml - esdd.storeKeys: 1 - - - - - You cannot start out with a group of records with simple - indexing (no record IDs as in the previous section) and then later - enable file record Ids. Zebra must know from the first time that you - index the group that - the files should be indexed with file record IDs. + + + + + + group + .recordType[.name]: + type + + + + Specifies how records with the file extension + name should be handled by the indexer. + This option may also be specified as a command line option + (-t). Note that if you do not specify a + name, the setting applies to all files. + In general, the record type specifier consists of the elements (each + element separated by dot), fundamental-type, + file-read-type and arguments. Currently, two + fundamental types exist, text and + grs. + + + + + group.recordId: + record-id-spec + + + Specifies how the records are to be identified when updated. See + . + + + + + group.database: + database + + + Specifies the &acro.z3950; database name. + + + + + + group.storeKeys: + boolean + + + Specifies whether key information should be saved for a given + group of records. If you plan to update/delete this type of + records later this should be specified as 1; otherwise it + should be 0 (default), to save register space. + + See . + + + + + group.storeData: + boolean + + + Specifies whether the records should be stored internally + in the &zebra; system files. + If you want to maintain the raw records yourself, + this option should be false (0). + If you want &zebra; to take care of the records for you, it + should be true(1). + + + + + + register: register-location + + + Specifies the location of the various register files that &zebra; uses + to represent your databases. 
+ See . + + + + + shadow: register-location + + + Enables the safe update facility of &zebra;, and + tells the system where to place the required, temporary files. + See . + + + + + lockDir: directory + + + Directory in which various lock files are stored. + + + + + keyTmpDir: directory + + + Directory in which temporary files used during zebraidx's update + phase are stored. + + + + + setTmpDir: directory + + + Specifies the directory that the server uses for temporary result sets. + If not specified /tmp will be used. + + + + + profilePath: path + + + Specifies a path of profile specification files. + The path is composed of one or more directories separated by + colon. Similar to PATH for UNIX systems. + + + + + + modulePath: path + + + Specifies a path of record filter modules. + The path is composed of one or more directories separated by + colon. Similar to PATH for UNIX systems. + The 'make install' procedure typically puts modules in + /usr/local/lib/idzebra-2.0/modules. + + + + + + index: filename + + + Defines the filename which holds fields structure + definitions. If omitted, the file default.idx + is read. + Refer to for + more information. + + + + + + sortmax: integer + + + Specifies the maximum number of records that will be sorted + in a result set. If the result set contains more than + integer records, records after the + limit will not be sorted. If omitted, the default value is + 1,000. + + + + + + staticrank: integer + + + Enables whether static ranking is to be enabled (1) or + disabled (0). If omitted, it is disabled - corresponding + to a value of 0. + Refer to . + + + + + + + estimatehits: integer + + + Controls whether &zebra; should calculate approximate hit counts and + at which hit count it is to be enabled. + A value of 0 disables approximate hit counts. + For a positive value approximate hit count is enabled + if it is known to be larger than integer. 
+ + + Approximate hit counts can also be triggered by a particular + attribute in a query. + Refer to . + + + + + + attset: filename + + + Specifies the filename(s) of attribute set files for use in + searching. In many configurations bib1.att + is used, but that is not required. If Classic Explain + attributes is to be used for searching, + explain.att must be given. + The path to att-files in general can be given using + profilePath setting. + See also . + + + + + memMax: size + + + Specifies size of internal memory + to use for the zebraidx program. + The amount is given in megabytes - default is 4 (4 MB). + The more memory, the faster large updates happen, up to about + half the free memory available on the computer. + + + + + tempfiles: Yes/Auto/No + + + Tells zebra if it should use temporary files when indexing. The + default is Auto, in which case zebra uses temporary files only + if it would need more that memMax + megabytes of memory. This should be good for most uses. + + + + + + root: dir + + + Specifies a directory base for &zebra;. All relative paths + given (in profilePath, register, shadow) are based on this + directory. This setting is useful if your &zebra; server + is running in a different directory from where + zebra.cfg is located. + + + + + + passwd: file + + + Specifies a file with description of user accounts for &zebra;. + The format is similar to that known to Apache's htpasswd files + and UNIX' passwd files. Non-empty lines not beginning with + # are considered account lines. There is one account per-line. + A line consists of fields separate by a single colon character. + First field is username, second is password. + + + + + + passwd.c: file + + + Specifies a file with description of user accounts for &zebra;. + File format is similar to that used by the passwd directive except + that the password are encrypted. Use Apache's htpasswd or similar + for maintenance. 
+ + + + + + perm.user: + permstring + + + Specifies permissions (privilege) for a user that are allowed + to access &zebra; via the passwd system. There are two kinds + of permissions currently: read (r) and write(w). By default + users not listed in a permission directive are given the read + privilege. To specify permissions for a user with no + username, or &acro.z3950; anonymous style use + anonymous. The permstring consists of + a sequence of characters. Include character w + for write/update access, r for read access and + a to allow anonymous access through this account. + + + + + + dbaccess: accessfile + + + Names a file which lists database subscriptions for individual users. + The access file should consists of lines of the form + username: dbnames, where dbnames is a list of + database names, separated by '+'. No whitespace is allowed in the + database list. + + + + + + encoding: charsetname + + + Tells &zebra; to interpret the terms in Z39.50 queries as + having been encoded using the specified character + encoding. The default is ISO-8859-1; one + useful alternative is UTF-8. + + + + + + storeKeys: value + + + Specifies whether &zebra; keeps a copy of indexed keys. + Use a value of 1 to enable; 0 to disable. If storeKeys setting is + omitted, it is enabled. Enabled storeKeys + are required for updating and deleting records. Disable only + storeKeys to save space and only plan to index data once. + + + + + + storeData: value + + + Specifies whether &zebra; keeps a copy of indexed records. + Use a value of 1 to enable; 0 to disable. If storeData setting is + omitted, it is enabled. A storeData setting of 0 (disabled) makes + Zebra fetch records from the original locaction in the file + system using filename, file offset and file length. For the + DOM and ALVIS filter, the storeData setting is ignored. + + + + + + + + + + + Locating Records + + + The default behavior of the &zebra; system is to reference the + records from their original location, i.e. 
where they were found when you ran zebraidx.
That is, when a client wishes to retrieve a record
following a search operation, the files are accessed from the place
where you originally put them - if you remove the files without
running zebraidx again, the server will return
diagnostic number 14 (``System error in presenting records'') to
the client.

If your input files are not permanent - for example, if you retrieve
your records from an outside source, or if they were temporarily
mounted on a CD-ROM drive -
you may want &zebra; to make an internal copy of them. To do this,
you specify 1 (true) in the storeData setting. When
the &acro.z3950; server retrieves the records, they will be read from the
internal file structures of the system.

Indexing with no Record IDs (Simple Indexing)

If you have a set of records that are not expected to change over time,
you can build your database without record IDs.
This indexing method uses less space than the other methods and
is simple to use.

To use this method, you simply omit the recordId entry
for the group of files that you index. To add a set of records you use
zebraidx with the update command. The
update command will always add all of the records that it
encounters to the index - whether they have already been indexed or
not. If the set of indexed files changes, you should delete all of the
index files and build a new index from scratch.

Consider a system in which you have a group of text files called
simple.
That group of records should belong to a &acro.z3950; database called
textbase.
The following zebra.cfg file will suffice:

profilePath: /usr/local/idzebra/tab
attset: bib1.att
simple.recordType: text
simple.database: textbase

Since the existing records in an index cannot be addressed by their
IDs, it is impossible to delete or modify records when using this method.
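The file-ID method described in the next section reduces each update run to a three-way comparison between the files now on disk and what was recorded at the previous update. The following is a minimal sketch of that decision logic (an illustration only, not Zebra's implementation; it uses file modification times where the real indexer keeps its own bookkeeping):

```python
import os

def plan_update(root, indexed):
    """Compare the files under root with a dict of path -> mtime
    recorded at the last update; return the action implied for each
    path. Unchanged files need no action and are omitted."""
    actions = {}
    on_disk = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            on_disk[path] = os.path.getmtime(path)
    for path, mtime in on_disk.items():
        if path not in indexed:
            actions[path] = "insert"    # new file: add to the registers
        elif mtime > indexed[path]:
            actions[path] = "modify"    # changed since the last update
    for path in indexed:
        if path not in on_disk:
            actions[path] = "delete"    # removed from the file system
    return actions
```

Running zebraidx update on the same directory root again is what triggers this comparison in practice.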
+ + + + + + Indexing with File Record IDs + + + If you have a set of files that regularly change over time: Old files + are deleted, new ones are added, or existing files are modified, you + can benefit from using the file ID + indexing methodology. + Examples of this type of database might include an index of WWW + resources, or a USENET news spool area. + Briefly speaking, the file key methodology uses the directory paths + of the individual records as a unique identifier for each record. + To perform indexing of a directory with file keys, again, you specify + the top-level directory after the update command. + The command will recursively traverse the directories and compare + each one with whatever have been indexed before in that same directory. + If a file is new (not in the previous version of the directory) it + is inserted into the registers; if a file was already indexed and + it has been modified since the last update, the index is also + modified; if a file has been removed since the last + visit, it is deleted from the index. + + + The resulting system is easy to administrate. To delete a record you + simply have to delete the corresponding file (say, with the + rm command). And to add records you create new + files (or directories with files). For your changes to take effect + in the register you must run zebraidx update with + the same directory root again. This mode of operation requires more + disk space than simpler indexing methods, but it makes it easier for + you to keep the index in sync with a frequently changing set of data. + If you combine this system with the safe update + facility (see below), you never have to take your server off-line for + maintenance or register updating purposes. + + + + To enable indexing with pathname IDs, you must specify + file as the value of recordId + in the configuration file. 
In addition, you should set + storeKeys to 1, since the &zebra; + indexer must save additional information about the contents of each record + in order to modify the indexes correctly at a later time. + + + + + + For example, to update records of group esdd + located below + /data1/records/ you should type: + + $ zebraidx -g esdd update /data1/records + + + + + The corresponding configuration file includes: + + esdd.recordId: file + esdd.recordType: grs.sgml + esdd.storeKeys: 1 + + + + + You cannot start out with a group of records with simple + indexing (no record IDs as in the previous section) and then later + enable file record Ids. &zebra; must know from the first time that you + index the group that + the files should be indexed with file record IDs. + - - - You cannot explicitly delete records when using this method (using the - delete command to zebraidx. Instead - you have to delete the files from the file system (or move them to a - different location) - and then run zebraidx with the - update command. - - - - - - Indexing with General Record IDs - - - When using this method you construct an (almost) arbitrary, internal - record key based on the contents of the record itself and other system - information. If you have a group of records that explicitly associates - an ID with each record, this method is convenient. For example, the - record format may contain a title or a ID-number - unique within the group. - In either case you specify the Z39.50 attribute set and use-attribute - location in which this information is stored, and the system looks at - that field to determine the identity of the record. - - - - As before, the record ID is defined by the recordId - setting in the configuration file. The value of the record ID specification - consists of one or more tokens separated by whitespace. The resulting - ID is represented in the index by concatenating the tokens and - separating them by ASCII value (1). 
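That concatenation step can be sketched as follows; the token values here are invented for illustration (Zebra derives the real ones from the recordId specification at index time):

```python
# Sketch only: how a recordId of "$type (bib1,Identifier-standard)"
# might collapse into one internal key. The values are hypothetical.
def make_record_id(tokens):
    # Tokens are concatenated, separated by ASCII value 1 (SOH).
    return chr(1).join(tokens)

key = make_record_id(["grs.sgml", "some-identifier-value"])
```

Using a non-printing separator such as ASCII 1 keeps the composite key unambiguous even when token values themselves contain spaces.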
You cannot explicitly delete records when using this method (that is,
by issuing the delete command to zebraidx). Instead
you have to delete the files from the file system (or move them to a
different location)
and then run zebraidx with the
update command.

Indexing with General Record IDs

When using this method you construct an (almost) arbitrary, internal
record key based on the contents of the record itself and other system
information. If you have a group of records that explicitly associates
an ID with each record, this method is convenient. For example, the
record format may contain a title or an ID number unique within the group.
In either case you specify the &acro.z3950; attribute set and use-attribute
location in which this information is stored, and the system looks at
that field to determine the identity of the record.

As before, the record ID is defined by the recordId
setting in the configuration file. The value of the record ID specification
consists of one or more tokens separated by whitespace. The resulting
ID is represented in the index by concatenating the tokens and
separating them by ASCII value (1).

There are three kinds of tokens:

Internal record info

The token refers to a key that is
extracted from the record. The syntax of this token is
( set , use ),
where set is the
attribute set name and use is the
name or value of the attribute.

System variable

The system variables are preceded by

$

and immediately followed by the system variable name, which
may be one of

group

Group name.

database

Current database specified.

type

Record type.

Constant string

A string used as part of the ID, surrounded
by single or double quotes.

For instance, the sample GILS records that come with the &zebra;
distribution contain a unique ID in the data tagged Control-Identifier.
The data is mapped to the &acro.bib1; use attribute Identifier-standard
(code 1007). To use this field as a record id, specify
(bib1,Identifier-standard) as the value of the
recordId in the configuration file.
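The ID construction just described (tokens resolved, then joined with ASCII value 1) can be sketched in a few lines of Python; the helper function and the sample values are illustrative only, not &zebra; internals:

```python
# Toy illustration of general record ID construction: each configured
# token is resolved and the results are joined with ASCII 0x01,
# mirroring the "gils.recordId: $type (bib1,Identifier-standard)" example.
def make_record_id(tokens, record, system):
    parts = []
    for token in tokens:
        if token.startswith("$"):        # system variable: $type, $group, $database
            parts.append(system[token[1:]])
        elif token.startswith("("):      # (set,use): value extracted from the record
            parts.append(record[token])
        else:                            # quoted constant string
            parts.append(token.strip("'\""))
    return "\x01".join(parts)

# Illustrative sample data, not a real GILS record.
record = {"(bib1,Identifier-standard)": "ESDD-0006"}
system = {"type": "grs.sgml", "group": "gils", "database": "Default"}
rid = make_record_id(["$type", "(bib1,Identifier-standard)"], record, system)
# rid == "grs.sgml\x01ESDD-0006"
```

The ASCII-1 separator guarantees that distinct token sequences cannot collide after concatenation.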
If you have other record types that use the same field for a
different purpose, you might add the record type
(or group or database name) to the record id of the gils
records as well, to prevent matches with other types of records.
In this case the recordId might be set like this:

gils.recordId: $type (bib1,Identifier-standard)

(see
for details of how the mapping between elements of your records and
searchable attributes is established).

As for the file record ID case described in the previous section,
updating your system is simply a matter of running
zebraidx
with the update command. However, the update with general
keys is considerably slower than with file record IDs, since all files
visited must be (re)read to discover their IDs.

As you might expect, when using the general record IDs
method, you can only add new records or modify existing ones with the
update command.
If you wish to delete records, you must use the
delete command, with a directory as a parameter.
This will remove all records that match the files below that root
directory.

Register Location

Normally, the index files that form dictionaries, inverted
files, record info, etc., are stored in the directory where you run
zebraidx. If you wish to store these, possibly large,
files somewhere else, you must add the register
entry to the zebra.cfg file.
Furthermore, the &zebra; system allows its file
structures to span multiple file systems, which is useful for
managing very large databases.

The value of the register setting is a sequence
of tokens.
Each token takes the form:

dir:size

The dir specifies a directory in which index files
will be stored and the size specifies the maximum
size of all files in that directory. The &zebra; indexer system fills
each directory in the order specified and uses the next specified
directories as needed.
The size is an integer followed by a qualifier
code:
b for bytes,
k for kilobytes,
M for megabytes,
G for gigabytes.
Specifying a negative value disables the checking (it still needs the unit,
use -1b).

For instance, if you have allocated three disks for your register, and
the first disk is mounted
on /d1 and has 2GB of free space, the
second, mounted on /d2, has 3.6 GB, and the third,
on which you have more space than you bother to worry about, mounted on
/d3, you could put this entry in your configuration file:

register: /d1:2G /d2:3600M /d3:-1b

Note that &zebra; does not verify that the amount of space specified is
actually available on the directory (file system) specified - it is
your responsibility to ensure that enough space is available, and that
other applications do not attempt to use the free space. In a large
production system, it is recommended that you allocate one or more
file systems exclusively to the &zebra; register files.

Safe Updating - Using Shadow Registers

Description

The &zebra; server supports updating of the index
structures. That is, you can add, modify, or remove records from
databases managed by &zebra; without rebuilding the entire index.
+ Since this process involves modifying structured files with various + references between blocks of data in the files, the update process + is inherently sensitive to system crashes, or to process interruptions: + Anything but a successfully completed update process will leave the + register files in an unknown state, and you will essentially have no + recourse but to re-index everything, or to restore the register files + from a backup medium. + Further, while the update process is active, users cannot be + allowed to access the system, as the contents of the register files + may change unpredictably. + + + + You can solve these problems by enabling the shadow register system in + &zebra;. + During the updating procedure, zebraidx will temporarily + write changes to the involved files in a set of "shadow + files", without modifying the files that are accessed by the + active server processes. If the update procedure is interrupted by a + system crash or a signal, you simply repeat the procedure - the + register files have not been changed or damaged, and the partially + written shadow files are automatically deleted before the new updating + procedure commences. + + + + At the end of the updating procedure (or in a separate operation, if + you so desire), the system enters a "commit mode". First, + any active server processes are forced to access those blocks that + have been changed from the shadow files rather than from the main + register files; the unmodified blocks are still accessed at their + normal location (the shadow files are not a complete copy of the + register files - they only contain those parts that have actually been + modified). If the commit process is interrupted at any point during the + commit process, the server processes will continue to access the + shadow files until you can repeat the commit procedure and complete + the writing of data to the main register files. 
You can perform + multiple update operations to the registers before you commit the + changes to the system files, or you can execute the commit operation + at the end of each update operation. When the commit phase has + completed successfully, any running server processes are instructed to + switch their operations to the new, operational register, and the + temporary shadow files are deleted. + + + + + + How to Use Shadow Register Files + + + The first step is to allocate space on your system for the shadow + files. + You do this by adding a shadow entry to the + zebra.cfg file. + The syntax of the shadow entry is exactly the + same as for the register entry + (see ). + The location of the shadow area should be + different from the location of the main register + area (if you have specified one - remember that if you provide no + register setting, the default register area is the + working directory of the server and indexing processes). + + + + The following excerpt from a zebra.cfg file shows + one example of a setup that configures both the main register + location and the shadow file area. + Note that two directories or partitions have been set aside + for the shadow file area. You can specify any number of directories + for each of the file areas, but remember that there should be no + overlaps between the directories used for the main registers and the + shadow files, respectively. + + + + + register: /d1:500M + shadow: /scratch1:100M /scratch2:200M + + + + + + When shadow files are enabled, an extra command is available at the + zebraidx command line. + In order to make changes to the system take effect for the + users, you'll have to submit a "commit" command after a + (sequence of) update operation(s). 
+ + + + + + $ zebraidx update /d1/records + $ zebraidx commit + + + + + + Or you can execute multiple updates before committing the changes: + + + + + + $ zebraidx -g books update /d1/records /d2/more-records + $ zebraidx -g fun update /d3/fun-records + $ zebraidx commit + + + + + + If one of the update operations above had been interrupted, the commit + operation on the last line would fail: zebraidx + will not let you commit changes that would destroy the running register. + You'll have to rerun all of the update operations since your last + commit operation, before you can commit the new changes. + + + + Similarly, if the commit operation fails, zebraidx + will not let you start a new update operation before you have + successfully repeated the commit operation. + The server processes will keep accessing the shadow files rather + than the (possibly damaged) blocks of the main register files + until the commit operation has successfully completed. + + + + You should be aware that update operations may take slightly longer + when the shadow register system is enabled, since more file access + operations are involved. Further, while the disk space required for + the shadow register data is modest for a small update operation, you + may prefer to disable the system if you are adding a very large number + of records to an already very large database (we use the terms + large and modest + very loosely here, since every application will have a + different perception of size). + To update the system without the use of the the shadow files, + simply run zebraidx with the -n + option (note that you do not have to execute the + commit command of zebraidx + when you temporarily disable the use of the shadow registers in + this fashion. + Note also that, just as when the shadow registers are not enabled, + server processes will be barred from accessing the main register + while the update procedure takes place. 
Relevance Ranking and Sorting of Result Sets

Overview

The default ordering of a result set is left up to the server,
which inside &zebra; means sorting in ascending document ID order.
This is not always the order in which humans want to browse the
sometimes quite large hit sets. Ranking and sorting come to the rescue.

In cases where a good presentation ordering can be computed at
indexing time, we can use a fixed static ranking
scheme, which is provided for the alvis
indexing filter. This defines a fixed ordering of hit lists,
independently of the query issued.

There are cases, however, where relevance of hit set documents is
highly dependent on the query processed.
Simply put, dynamic relevance ranking
sorts a set of retrieved records such that those most likely to be
relevant to your request are retrieved first.
Internally, &zebra; retrieves all documents that satisfy your
query, and re-orders the hit list to arrange them based on
a measurement of similarity between your query and the content of
each record.

Finally, there are situations where hit sets of documents should be
sorted during query time according to the
lexicographical ordering of certain sort indexes created at
indexing time.

Static Ranking

&zebra; internally uses inverted indexes to look up term frequencies
in documents. Multiple queries from different indexes can be
combined by the binary boolean operations AND,
OR and/or NOT (which
is in fact a binary AND NOT operation).
To ensure fast query execution
speed, all indexes have to be sorted in the same order.

The indexes are normally sorted according to document
ID in
ascending order, and any query which does not invoke a special
re-ranking function will therefore retrieve the result set in
document
ID
order.
If one defines the

staticrank: 1

directive in the main core &zebra; configuration file, the internal document
keys used for ordering are augmented by a preceding integer, which
contains the static rank of a given document, and the index lists
are ordered
first by ascending static rank,
then by ascending document ID.
Zero
is the ``best'' rank, as it occurs at the
beginning of the list; higher numbers represent worse scores.

The experimental alvis filter provides a
directive to fetch static rank information out of the indexed &acro.xml;
records, thus making all hit sets ordered
by ascending static
rank, and for those documents which have the same static rank, ordered
by ascending doc ID.
See for the gory details.

Dynamic Ranking

In order to fiddle with the static rank order, it is necessary to
invoke additional re-ranking/re-ordering using dynamic
ranking or score functions. These functions return positive
integer scores, where the highest score is
``best'';
hit sets are sorted according to descending
scores (in contrast
to the index lists, which are sorted according to
ascending rank number and document ID).

Dynamic ranking is enabled by a directive like one of the
following in the zebra configuration file (use only one of these at a time!):

rank: rank-1 # default TF-IDF like
rank: rank-static # dummy do-nothing

Dynamic ranking is done at query time rather than
indexing time (this is why we
call it ``dynamic ranking'' in the first place ...)
It is invoked by adding
the &acro.bib1; relation attribute with
value ``relevance'' to the &acro.pqf; query (that is,
@attr 2=102, see also

The &acro.bib1; Attribute Set Semantics, also in
HTML).
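The two orderings described above run in opposite directions: index lists are kept in ascending (static rank, document ID) order, while dynamically ranked hit lists are returned best-score-first. A small Python illustration with made-up numbers, not &zebra; internals:

```python
# Index entries as (static_rank, doc_id): 0 is the best static rank,
# so a plain ascending sort already yields "best first" ordering.
index_list = [(2, 17), (0, 42), (1, 5), (0, 7)]
static_order = sorted(index_list)        # ascending rank, then ascending doc ID
# static_order == [(0, 7), (0, 42), (1, 5), (2, 17)]

# Dynamic ranking instead assigns each hit a positive integer score,
# highest score best, so the hit list is sorted descending by score.
scores = {42: 870, 7: 430, 5: 990, 17: 120}
ranked = sorted(scores, key=lambda doc: scores[doc], reverse=True)
# ranked == [5, 42, 7, 17]
```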
To find all articles with the word Eoraptor in
the title, and present them relevance ranked, issue the &acro.pqf; query:

@attr 2=102 @attr 1=4 Eoraptor

Dynamic ranking using &acro.pqf; queries with the 'rank-1'
algorithm

The default rank-1 ranking module implements a
TF/IDF (Term Frequency over Inverse Document Frequency) like
algorithm. In contrast to the usual definition of TF/IDF
algorithms, which only considers searching in one full-text
index, this one works on multiple indexes at the same time.
More precisely,
&zebra; does boolean queries and searches in specific addressed
indexes (there are inverted indexes pointing from terms in the
dictionary to documents and term positions inside documents).
It works like this:

Query Components

First, the boolean query is dismantled into its principal components,
i.e. atomic queries where one term is looked up in one index.
For example, the query

@attr 2=102 @and @attr 1=1010 Utah @attr 1=1018 Springer

is a boolean AND between the atomic parts

@attr 2=102 @attr 1=1010 Utah

and

@attr 2=102 @attr 1=1018 Springer

each of which is processed by itself.

Atomic hit lists

Second, for each atomic query, the hit list of documents is
computed.

In this example, two hit lists, one for each of the indexes
@attr 1=1010 and
@attr 1=1018, are computed.

Atomic scores

Third, each document in the hit list is assigned a score (if ranking
is enabled and requested in the query) using a TF/IDF scheme.

In this example, both atomic parts of the query assign the magic
@attr 2=102 relevance attribute, and are
to be used in the relevance ranking functions.
It is possible to apply dynamic ranking to only parts of the
&acro.pqf; query:

@and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer

searches for all documents which have the term 'Utah' in the
body of text, and which have the term 'Springer' in the publisher
field, and sorts them in the order of the relevance ranking made on
the body-of-text index only.

Hit list merging

Fourth, the atomic hit lists are merged according to the boolean
conditions to a final hit list of documents to be returned.

This step is always performed, independently of whether
dynamic ranking is enabled or not.

Document score computation

Fifth, the total score of a document is computed as a linear
combination of the atomic scores of the atomic hit lists.

Ranking weights may be used to pass a value to a ranking
algorithm, using the non-standard &acro.bib1; attribute type 9.
This allows one branch of a query to use one value while
another branch uses a different one. For example, we can search
for utah in the
@attr 1=4 index with weight 30, as
well as for city in the
@attr 1=1010 index with weight 20:

@attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 @attr 1=1010 city

The default weight is
sqrt(1000) ~ 34, as the &acro.z3950; standard prescribes that the top score
is 1000 and the bottom score is 0, encoded in integers.

The ranking-weight feature is experimental. It may change in future
releases of zebra.

Re-sorting of hit list

Finally, the final hit list is re-ordered according to scores.

The rank-1 algorithm
does not use the static rank
information in the list keys, and will produce the same ordering
with or without static ranking enabled.

Dynamic ranking is not compatible
with estimated hit sizes, as all documents in
a hit set must be accessed to compute the correct placing in a
ranking sorted list.
Therefore the use attribute setting
@attr 2=102 clashes with
@attr 9=integer.

Dynamically ranking &acro.cql; queries

Dynamic ranking can be enabled during server side &acro.cql;
query expansion by adding @attr 2=102
chunks to the &acro.cql; config file. For example

relationModifier.relevant = 2=102

invokes dynamic ranking each time a &acro.cql; query of the form

Z> querytype cql
Z> f alvis.text =/relevant house

is issued. Dynamic ranking can also be automatically used on
specific &acro.cql; indexes by (for example) setting

index.alvis.text = 1=text 2=102

which then invokes dynamic ranking each time a &acro.cql; query of the form

Z> querytype cql
Z> f alvis.text = house

is issued.

Sorting

&zebra; sorts efficiently using special sorting indexes
(type=s), so each sortable index must be known
at indexing time, specified in the configuration of record
indexing. For example, to enable sorting according to the &acro.bib1;
Date/time-added-to-db field, one could add the line

xelm /*/@created Date/time-added-to-db:s

to any .abs record-indexing configuration file.
Similarly, an equivalent indexing element can be added
to any alvis-filter indexing stylesheet.

Sorting can also be requested at search time using a query term
carrying the non-standard
&acro.bib1; attribute type 7. This removes the
need to send a &acro.z3950; Sort Request
separately, and can dramatically improve latency when the client
and server are on separate networks.
The sorting part of the query is separate from the rest of the
query - the actual search specification - and must be combined
with it using OR.

A sorting subquery needs two attributes: an index (such as a
&acro.bib1; type-1 attribute) specifying which index to sort on, and a
type-7 attribute whose value is 1 for
ascending sorting, or 2 for descending.
The + term associated with the sorting attribute is the priority of + the sort key, where 0 specifies the primary + sort key, 1 the secondary sort key, and so + on. + + For example, a search for water, sort by title (ascending), + is expressed by the &acro.pqf; query + + @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 + + whereas a search for water, sort by title ascending, + then date descending would be + + @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1 + + + + Notice the fundamental differences between dynamic + ranking and sorting: there can be + only one ranking function defined and configured; but multiple + sorting indexes can be specified dynamically at search + time. Ranking does not need to use specific indexes, so + dynamic ranking can be enabled and disabled without + re-indexing; whereas, sorting indexes need to be + defined before indexing. + + + + + + + + + Extended Services: Remote Insert, Update and Delete + + + + Extended services are only supported when accessing the &zebra; + server using the &acro.z3950; + protocol. The &acro.sru; protocol does + not support extended services. + + + + + The extended services are not enabled by default in zebra - due to the + fact that they modify the system. &zebra; can be configured + to allow anybody to + search, and to allow only updates for a particular admin user + in the main zebra configuration file zebra.cfg. + For user admin, you could use: + + perm.anonymous: r + perm.admin: rw + passwd: passwordfile + + And in the password file + passwordfile, you have to specify users and + encrypted passwords as colon separated strings. + Use a tool like htpasswd + to maintain the encrypted passwords. 
admin:secret

It is essential to configure &zebra; to store records internally,
and to support
modifications and deletion of records:

storeData: 1
storeKeys: 1

The general record type should be set to any record filter which
is able to parse &acro.xml; records; you may use either of the two
declarations (but not both simultaneously!)

recordType: dom.filter_dom_conf.xml
# recordType: grs.xml

Notice the difference to the specific instructions

recordType.xml: dom.filter_dom_conf.xml
# recordType.xml: grs.xml

which only work when indexing XML files from the filesystem using
the *.xml naming convention.

To enable transaction safe shadow indexing,
which is extra important for this kind of operation, set

shadow: directoryname: size (e.g. 1000M)
See for additional information on
these configuration options.

It is not possible to carry information about record types or
similar to &zebra; when using extended services, due to
limitations of the &acro.z3950;
protocol. Therefore, indexing filters cannot be chosen on a
per-record basis. One and only one general &acro.xml; indexing filter
must be defined.

Extended services in the &acro.z3950; protocol

The &acro.z3950; standard allows
servers to accept special binary extended services
protocol packages, which may be used to insert, update and delete
records in servers. These carry control and update
information to the servers, which are encoded in seven package fields:

Extended services &acro.z3950; Package Fields

Parameter
Value
Notes

type
'update'
Must be set to trigger extended services

action
string
Extended service action type with
one of four possible values: recordInsert,
recordReplace,
recordDelete,
and specialUpdate

record
&acro.xml; string
An &acro.xml; formatted string containing the record

syntax
'xml'
XML/SUTRS/MARC.
GRS-1 not supported. + The default filter (record type) as given by recordType in + zebra.cfg is used to parse the record. + + + recordIdOpaque + string + + Optional client-supplied, opaque record + identifier used under insert operations. + + + + recordIdNumber + positive number + &zebra;'s internal system number, + not allowed for recordInsert or + specialUpdate actions which result in fresh + record inserts. + + + + databaseName + database identifier + + The name of the database to which the extended services should be + applied. + + + + +
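Conceptually, an update package is just a bundle of the fields in the table above, and the table's combination rules can be sketched as a toy Python check. This is illustrative only, not a real &acro.z3950; client:

```python
# Toy validation of extended-service update packages, following the
# field table: recordIdNumber must not be combined with actions that
# can insert fresh records (recordInsert, specialUpdate).
ACTIONS = {"recordInsert", "recordReplace", "recordDelete", "specialUpdate"}

def check_update_package(pkg):
    if pkg.get("type") != "update":
        return "type must be 'update'"
    if pkg.get("action") not in ACTIONS:
        return "unknown action"
    if "recordIdNumber" in pkg and pkg["action"] in {"recordInsert", "specialUpdate"}:
        return "recordIdNumber not allowed for actions that insert fresh records"
    return "ok"

# Illustrative package: record body and identifiers are made up.
pkg = {"type": "update", "action": "recordInsert",
       "syntax": "xml", "record": "<gils><title>Test</title></gils>",
       "recordIdOpaque": "id1234", "databaseName": "Default"}
assert check_update_package(pkg) == "ok"
assert check_update_package({**pkg, "recordIdNumber": 17}) != "ok"
```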
The action parameter can be any of
recordInsert (will fail if the record already exists),
recordReplace (will fail if the record does not exist),
recordDelete (will fail if the record does not
exist), and
specialUpdate (will insert or update the record
as needed, record deletion is not possible).

During all actions, the
usual rules for internal record ID generation apply, unless an
optional recordIdNumber &zebra; internal ID or a
recordIdOpaque string identifier is assigned.
The default ID generation is
configured using the recordId: setting from
zebra.cfg.
See .

Setting of the recordIdNumber parameter,
which must be an existing &zebra; internal system ID number, is not
allowed during any recordInsert or
specialUpdate action resulting in fresh record
inserts.

When retrieving existing
records indexed with &acro.grs1; indexing filters, the &zebra; internal
ID number is returned in the field
/*/id:idzebra/localnumber in the namespace
xmlns:id="http://www.indexdata.dk/zebra/",
where it can be picked up for later record updates or deletes.

A new element set for retrieval of internal record
data has been added, which can be used to access minimal records
containing only the recordIdNumber &zebra;
internal ID, or the recordIdOpaque string
identifier. This works for any indexing filter used.
See .

The recordIdOpaque string parameter
is a client-supplied, opaque record
identifier, which may be used under
insert, update and delete operations. The
client software is responsible for assigning these to
records. This identifier will
replace zebra's own automagic identifier generation with a unique
mapping from recordIdOpaque to the
&zebra; internal recordIdNumber.
The opaque recordIdOpaque string
identifiers
are not visible in retrieval records, nor are they
searchable, so the value of this parameter is
questionable.
It serves mostly as a convenient mapping from application-domain
string identifiers to Zebra's internal IDs.
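The mapping from an opaque string identifier to Zebra's internal
record number can be pictured with a small sketch. This is a toy
model for illustration only; the class and method names are
hypothetical, and Zebra keeps the real mapping in its register files:

```python
# Toy model of the recordIdOpaque -> recordIdNumber mapping.
# Hypothetical illustration; not Zebra's actual implementation.
class OpaqueIdMap:
    def __init__(self):
        self._by_opaque = {}   # opaque string -> internal number
        self._next_sysno = 1   # Zebra-style positive system numbers

    def insert(self, opaque_id):
        """recordInsert: fails if the opaque id is already known."""
        if opaque_id in self._by_opaque:
            raise KeyError("record already exists: %s" % opaque_id)
        self._by_opaque[opaque_id] = self._next_sysno
        self._next_sysno += 1
        return self._by_opaque[opaque_id]

    def delete(self, opaque_id):
        """recordDelete: fails if the opaque id is unknown."""
        return self._by_opaque.pop(opaque_id)


m = OpaqueIdMap()
sysno = m.insert("id1234")   # opaque id as in the yaz-client example
m.delete("id1234")           # delete via the same opaque id
```

The point of the model is that the client never sees or searches the
internal number; it only ever handles its own opaque string.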
Extended services from yaz-client

We can now start a yaz-client admin session and create a database:

   Z> adm-create

Now that the Default database has been created, we can insert an XML
file (esdd0006.grs from example/gils/records) and index it:

   Z> update insert id1234 esdd0006.grs

The third parameter, id1234 here, is the recordIdOpaque package
field.

Actually, we should have a way to specify "no opaque record id" for
yaz-client's update command. We'll fix that.

The newly inserted record can be searched as usual:

   Z> f utah
   Sent searchRequest.
   Received SearchResponse.
   Search was a success.
   Number of hits: 1, setno 1
   SearchResult-1: term=utah cnt=1
   records returned: 0
   Elapsed: 0.014179

Let's delete the beast, using the same recordIdOpaque string
parameter:

   Z> update delete id1234
   No last record (update ignored)
   Z> update delete 1 esdd0006.grs
   Got extended services response
   Status: done
   Elapsed: 0.072441
   Z> f utah
   Sent searchRequest.
   Received SearchResponse.
   Search was a success.
   Number of hits: 0, setno 2
   SearchResult-1: term=utah cnt=0
   records returned: 0
   Elapsed: 0.013610

If the shadow register is enabled in your zebra.cfg, you must run the
adm-commit command

   Z> adm-commit

after each update session in order to write your changes from the
shadow to the live register space.

Extended services from yaz-php

Extended services are also available from the YAZ PHP client layer.
An example of a YAZ-PHP extended service transaction is given here:

   /* a connection handle $yaz is assumed to have been opened
      with yaz_connect() beforehand; the XML markup around the
      record payload is illustrative */
   $record = '<record><title>A fine specimen of a record</title></record>';

   $options = array('action' => 'recordInsert',
                    'syntax' => 'xml',
                    'record' => $record,
                    'databaseName' => 'mydatabase'
                   );

   yaz_es($yaz, 'update', $options);
   yaz_es($yaz, 'commit', array());
   yaz_wait();

   if ($error = yaz_error($yaz))
       echo "$error";

Extended services debugging guide

When debugging ES over PHP we recommend the following order of tests:

1. Make sure you have a nice record on your filesystem, which you can
   index from the filesystem using the zebraidx command. Do it
   exactly as you planned, using one of the GRS-1 filters or the
   DOM XML filter. When this works, proceed.

2. Check that your server setup is OK before you write a single line
   of PHP using ES. Take the same record from the file system, and
   send it as an ES package via yaz-client as described above, and
   remember the -a option, which shows you what goes over the wire!
   Notice also the section on permissions: try

      perm.anonymous: rw

   in zebra.cfg to make sure you do not run into permission problems
   (but never expose such an insecure setup on the internet!). Then
   make sure to set the general recordType instruction, pointing
   correctly to the GRS-1 filters or the DOM XML filter.

3. If you insist on using the sysno in the recordIdNumber setting,
   please make sure you do only updates and deletes. Zebra's internal
   system number is not allowed for recordInsert or specialUpdate
   actions which result in fresh record inserts.

4. If the shadow register is enabled in your zebra.cfg, you must
   remember to run the

      Z> adm-commit

   command as well.

5. If this works, then proceed to do the same thing in your PHP
   script.
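Pulling the checklist together, a minimal zebra.cfg for such a
debugging session might look like the following sketch; the filter
name and register size are illustrative assumptions, not values taken
from this guide:

```
# illustrative zebra.cfg for an ES debugging session -- adapt before use
recordType: grs.sgml        # point to the filter you actually index with
perm.anonymous: rw          # test-only; never expose this on the internet
shadow: shadow:100M         # if set, remember Z> adm-commit after updates
```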
+