X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Fadministration.xml;h=762ba7db9f2dea232dfc8d6f2106713b6272652c;hp=28e480247e9ad15cd0f2baf2799ec8e60a0a3631;hb=99842ec71f065fd6886daa355923b01d9ce71d26;hpb=d2e692248eac6469ef7a3a3f8044010cb5cc1da7 diff --git a/doc/administration.xml b/doc/administration.xml index 28e4802..762ba7d 100644 --- a/doc/administration.xml +++ b/doc/administration.xml @@ -1,20 +1,19 @@ - - Administrating Zebra + Administrating &zebra; - Unlike many simpler retrieval systems, Zebra supports safe, incremental + Unlike many simpler retrieval systems, &zebra; supports safe, incremental updates to an existing index. - Normally, when Zebra modifies the index it reads a number of records + Normally, when &zebra; modifies the index it reads a number of records that you specify. Depending on your specifications and on the contents of each record one the following events take place for each record: @@ -25,8 +24,8 @@ The record is indexed as if it never occurred before. - Either the Zebra system doesn't know how to identify the record or - Zebra can identify the record but didn't find it to be already indexed. + Either the &zebra; system doesn't know how to identify the record or + &zebra; can identify the record but didn't find it to be already indexed. @@ -53,20 +52,20 @@ - Please note that in both the modify- and delete- case the Zebra + Please note that in both the modify- and delete- case the &zebra; indexer must be able to generate a unique key that identifies the record in question (more on this below). - To administrate the Zebra retrieval system, you run the + To administrate the &zebra; retrieval system, you run the zebraidx program. This program supports a number of options which are preceded by a dash, and a few commands (not preceded by dash). 
- Both the Zebra administrative tool and the Z39.50 server share a + Both the &zebra; administrative tool and the &acro.z3950; server share a set of index files and a global configuration file. The name of the configuration file defaults to zebra.cfg. @@ -85,7 +84,7 @@ Indexing is a per-record process, in which either insert/modify/delete will occur. Before a record is indexed search keys are extracted from whatever might be the layout the original record (sgml,html,text, etc..). - The Zebra system currently supports two fundamental types of records: + The &zebra; system currently supports two fundamental types of records: structured and simple text. To specify a particular extraction process, use either the command line option -t or specify a @@ -94,11 +93,11 @@ - - The Zebra Configuration File + + The &zebra; Configuration File - The Zebra configuration file, read by zebraidx and + The &zebra; configuration file, read by zebraidx and zebrasrv defaults to zebra.cfg unless specified by -c option. @@ -127,7 +126,7 @@ In the configuration file, the group name is placed before the option name itself, separated by a dot (.). For instance, to set the record type for group public to grs.sgml - (the SGML-like format for structured records) you would write: + (the &acro.sgml;-like format for structured records) you would write: @@ -195,7 +194,7 @@ database - Specifies the Z39.50 database name. + Specifies the &acro.z3950; database name. @@ -220,10 +219,10 @@ Specifies whether the records should be stored internally - in the Zebra system files. + in the &zebra; system files. If you want to maintain the raw records yourself, this option should be false (0). - If you want Zebra to take care of the records for you, it + If you want &zebra; to take care of the records for you, it should be true(1). 
@@ -233,7 +232,7 @@ register: register-location - Specifies the location of the various register files that Zebra uses + Specifies the location of the various register files that &zebra; uses to represent your databases. See . @@ -243,7 +242,7 @@ shadow: register-location - Enables the safe update facility of Zebra, and + Enables the safe update facility of &zebra;, and tells the system where to place the required, temporary files. See . @@ -276,25 +275,98 @@ - profilePath: path + profilePath: path Specifies a path of profile specification files. The path is composed of one or more directories separated by - colon. Similar to PATH for UNIX systems. + colon. Similar to PATH for UNIX systems. + + + modulePath: path + + + Specifies a path of record filter modules. + The path is composed of one or more directories separated by + colon. Similar to PATH for UNIX systems. + The 'make install' procedure typically puts modules in + /usr/local/lib/idzebra-2.0/modules. + + + + + + index: filename + + + Defines the filename which holds fields structure + definitions. If omitted, the file default.idx + is read. + Refer to for + more information. + + + + + + sortmax: integer + + + Specifies the maximum number of records that will be sorted + in a result set. If the result set contains more than + integer records, records after the + limit will not be sorted. If omitted, the default value is + 1,000. + + + + + + staticrank: integer + + + Enables whether static ranking is to be enabled (1) or + disabled (0). If omitted, it is disabled - corresponding + to a value of 0. + Refer to . + + + + + + + estimatehits: integer + + + Controls whether &zebra; should calculate approximate hit counts and + at which hit count it is to be enabled. + A value of 0 disables approximate hit counts. + For a positive value approximate hit count is enabled + if it is known to be larger than integer. + + + Approximate hit counts can also be triggered by a particular + attribute in a query. + Refer to . 
+ + + + attset: filename - Specifies the filename(s) of attribute set files for use in - searching. At least the Bib-1 set should be loaded - (bib1.att). - The profilePath setting is used to look for - the specified files. - See + Specifies the filename(s) of attribute set files for use in + searching. In many configurations bib1.att + is used, but that is not required. If Classic Explain + attributes is to be used for searching, + explain.att must be given. + The path to att-files in general can be given using + profilePath setting. + See also . @@ -305,6 +377,19 @@ Specifies size of internal memory to use for the zebraidx program. The amount is given in megabytes - default is 4 (4 MB). + The more memory, the faster large updates happen, up to about + half the free memory available on the computer. + + + + + tempfiles: Yes/Auto/No + + + Tells zebra if it should use temporary files when indexing. The + default is Auto, in which case zebra uses temporary files only + if it would need more that memMax + megabytes of memory. This should be good for most uses. @@ -313,15 +398,112 @@ root: dir - Specifies a directory base for Zebra. All relative paths + Specifies a directory base for &zebra;. All relative paths given (in profilePath, register, shadow) are based on this - directory. This setting is useful if your Zebra server + directory. This setting is useful if your &zebra; server is running in a different directory from where zebra.cfg is located. + + passwd: file + + + Specifies a file with description of user accounts for &zebra;. + The format is similar to that known to Apache's htpasswd files + and UNIX' passwd files. Non-empty lines not beginning with + # are considered account lines. There is one account per-line. + A line consists of fields separate by a single colon character. + First field is username, second is password. + + + + + + passwd.c: file + + + Specifies a file with description of user accounts for &zebra;. 
+ File format is similar to that used by the passwd directive except + that the passwords are encrypted. Use Apache's htpasswd or similar + for maintenance. + + + + + + perm.user: + permstring + + + Specifies permissions (privileges) for a user who is allowed + to access &zebra; via the passwd system. There are two kinds + of permissions currently: read (r) and write (w). By default + users not listed in a permission directive are given the read + privilege. To specify permissions for a user with no + username, or &acro.z3950; anonymous style use + anonymous. The permstring consists of + a sequence of characters. Include character w + for write/update access, r for read access and + a to allow anonymous access through this account. + + + + + + dbaccess: accessfile + + + Names a file which lists database subscriptions for individual users. + The access file should consist of lines of the form + username: dbnames, where dbnames is a list of + database names, separated by '+'. No whitespace is allowed in the + database list. + + + + + + encoding: charsetname + + + Tells &zebra; to interpret the terms in Z39.50 queries as + having been encoded using the specified character + encoding. The default is ISO-8859-1; one + useful alternative is UTF-8. + + + + + + storeKeys: value + + + Specifies whether &zebra; keeps a copy of indexed keys. + Use a value of 1 to enable; 0 to disable. If the storeKeys setting is + omitted, it is enabled. Enabled storeKeys + are required for updating and deleting records. Disable + storeKeys only to save space, and only if you plan to index the data once. + + + + + + storeData: value + + + Specifies whether &zebra; keeps a copy of indexed records. + Use a value of 1 to enable; 0 to disable. If the storeData setting is + omitted, it is enabled. A storeData setting of 0 (disabled) makes + Zebra fetch records from the original location in the file + system using filename, file offset and file length. For the + DOM and ALVIS filter, the storeData setting is ignored.
+ + + + @@ -331,9 +513,9 @@ Locating Records - The default behavior of the Zebra system is to reference the + The default behavior of the &zebra; system is to reference the records from their original location, i.e. where they were found when you - ran zebraidx. + run zebraidx. That is, when a client wishes to retrieve a record following a search operation, the files are accessed from the place where you originally put them - if you remove the files (without @@ -346,9 +528,9 @@ If your input files are not permanent - for example if you retrieve your records from an outside source, or if they were temporarily mounted on a CD-ROM drive, - you may want Zebra to make an internal copy of them. To do this, + you may want &zebra; to make an internal copy of them. To do this, you specify 1 (true) in the storeData setting. When - the Z39.50 server retrieves the records they will be read from the + the &acro.z3950; server retrieves the records they will be read from the internal file structures of the system. @@ -377,14 +559,14 @@ Consider a system in which you have a group of text files called simple. - That group of records should belong to a Z39.50 database called + That group of records should belong to a &acro.z3950; database called textbase. The following zebra.cfg file will suffice: - profilePath: /usr/local/yaz + profilePath: /usr/local/idzebra/tab attset: bib1.att simple.recordType: text simple.database: textbase @@ -440,7 +622,7 @@ To enable indexing with pathname IDs, you must specify file as the value of recordId in the configuration file. In addition, you should set - storeKeys to 1, since the Zebra + storeKeys to 1, since the &zebra; indexer must save additional information about the contents of each record in order to modify the indexes correctly at a later time. @@ -470,7 +652,7 @@ You cannot start out with a group of records with simple indexing (no record IDs as in the previous section) and then later - enable file record Ids. 
Zebra must know from the first time that you + enable file record Ids. &zebra; must know from the first time that you index the group that the files should be indexed with file record IDs. @@ -496,7 +678,7 @@ information. If you have a group of records that explicitly associates an ID with each record, this method is convenient. For example, the record format may contain a title or a ID-number - unique within the group. - In either case you specify the Z39.50 attribute set and use-attribute + In either case you specify the &acro.z3950; attribute set and use-attribute location in which this information is stored, and the system looks at that field to determine the identity of the record. @@ -581,9 +763,9 @@ - For instance, the sample GILS records that come with the Zebra + For instance, the sample GILS records that come with the &zebra; distribution contain a unique ID in the data tagged Control-Identifier. - The data is mapped to the Bib-1 use attribute Identifier-standard + The data is mapped to the &acro.bib1; use attribute Identifier-standard (code 1007). To use this field as a record id, specify (bib1,Identifier-standard) as the value of the recordId in the configuration file. @@ -600,7 +782,7 @@ - (see + (see for details of how the mapping between elements of your records and searchable attributes is established). @@ -635,7 +817,7 @@ zebraidx. If you wish to store these, possibly large, files somewhere else, you must add the register entry to the zebra.cfg file. - Furthermore, the Zebra system allows its file + Furthermore, the &zebra; system allows its file structures to span multiple file systems, which is useful for managing very large databases. @@ -644,13 +826,11 @@ The value of the register setting is a sequence of tokens. Each token takes the form: - - dir:size. - + dir:size The dir specifies a directory in which index files will be stored and the size specifies the maximum - size of all files in that directory. 
The Zebra indexer system fills + size of all files in that directory. The &zebra; indexer system fills each directory in the order specified and use the next specified directories as needed. The size is an integer followed by a qualifier @@ -659,28 +839,30 @@ k for kilobytes. M for megabytes, G for gigabytes. + Specifying a negative value disables the checking (it still needs the unit, + use -1b). - For instance, if you have allocated two disks for your register, and + For instance, if you have allocated three disks for your register, and the first disk is mounted - on /d1 and has 2GB of free space and the - second, mounted on /d2 has 3.6 GB, you could - put this entry in your configuration file: + on /d1 and has 2GB of free space, the + second, mounted on /d2 has 3.6 GB, and the third, + on which you have more space than you bother to worry about, mounted on + /d3 you could put this entry in your configuration file: - register: /d1:2G /d2:3600M + register: /d1:2G /d2:3600M /d3:-1b - - Note that Zebra does not verify that the amount of space specified is + Note that &zebra; does not verify that the amount of space specified is actually available on the directory (file system) specified - it is your responsibility to ensure that enough space is available, and that other applications do not attempt to use the free space. In a large production system, it is recommended that you allocate one or more - file system exclusively to the Zebra register files. + file system exclusively to the &zebra; register files. @@ -688,13 +870,13 @@ Safe Updating - Using Shadow Registers - + Description - The Zebra server supports updating of the index + The &zebra; server supports updating of the index structures. That is, you can add, modify, or remove records from - databases managed by Zebra without rebuilding the entire index. + databases managed by &zebra; without rebuilding the entire index. 
Since this process involves modifying structured files with various references between blocks of data in the files, the update process is inherently sensitive to system crashes, or to process interruptions: @@ -709,7 +891,7 @@ You can solve these problems by enabling the shadow register system in - Zebra. + &zebra;. During the updating procedure, zebraidx will temporarily write changes to the involved files in a set of "shadow files", without modifying the files that are accessed by the @@ -742,7 +924,7 @@ - + How to Use Shadow Register Files @@ -774,7 +956,6 @@ register: /d1:500M - shadow: /scratch1:100M /scratch2:200M @@ -852,8 +1033,927 @@ + + + + Relevance Ranking and Sorting of Result Sets + + + Overview + + The default ordering of a result set is left up to the server, + which inside &zebra; means sorting in ascending document ID order. + This is not always the order humans want to browse the sometimes + quite large hit sets. Ranking and sorting comes to the rescue. + + + + In cases where a good presentation ordering can be computed at + indexing time, we can use a fixed static ranking + scheme, which is provided for the alvis + indexing filter. This defines a fixed ordering of hit lists, + independently of the query issued. + + + + There are cases, however, where relevance of hit set documents is + highly dependent on the query processed. + Simply put, dynamic relevance ranking + sorts a set of retrieved records such that those most likely to be + relevant to your request are retrieved first. + Internally, &zebra; retrieves all documents that satisfy your + query, and re-orders the hit list to arrange them based on + a measurement of similarity between your query and the content of + each record. + + + + Finally, there are situations where hit sets of documents should be + sorted during query time according to the + lexicographical ordering of certain sort indexes created at + indexing time. 
+ + + + + + Static Ranking + + + &zebra; uses internally inverted indexes to look up term frequencies + in documents. Multiple queries from different indexes can be + combined by the binary boolean operations AND, + OR and/or NOT (which + is in fact a binary AND NOT operation). + To ensure fast query execution + speed, all indexes have to be sorted in the same order. + + + The indexes are normally sorted according to document + ID in + ascending order, and any query which does not invoke a special + re-ranking function will therefore retrieve the result set in + document + ID + order. + + + If one defines the + + staticrank: 1 + + directive in the main core &zebra; configuration file, the internal document + keys used for ordering are augmented by a preceding integer, which + contains the static rank of a given document, and the index lists + are ordered + first by ascending static rank, + then by ascending document ID. + Zero + is the ``best'' rank, as it occurs at the + beginning of the list; higher numbers represent worse scores. + + + The experimental alvis filter provides a + directive to fetch static rank information out of the indexed &acro.xml; + records, thus making all hit sets ordered + after ascending static + rank, and for those doc's which have the same static rank, ordered + after ascending doc ID. + See for the gory details. + + + + + + Dynamic Ranking + + In order to fiddle with the static rank order, it is necessary to + invoke additional re-ranking/re-ordering using dynamic + ranking or score functions. These functions return positive + integer scores, where highest score is + ``best''; + hit sets are sorted according to descending + scores (in contrary + to the index lists which are sorted according to + ascending rank number and document ID). 
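The two orderings described above can be illustrated with a short Python sketch. It is not &zebra;'s implementation; the hit tuples, their values and the function names are invented for this example. It only shows that index lists are kept in ascending (static rank, document ID) order, while dynamically ranked hit sets are returned in descending score order.

```python
# Hypothetical hits: (static_rank, doc_id, dynamic_score). All values are made up.
hits = [
    (2, 17, 640),
    (0, 42, 120),
    (2, 5, 980),
    (0, 8, 300),
]


def index_order(hits):
    """Ascending static rank, then ascending document ID (0 is the best rank)."""
    return sorted(hits, key=lambda h: (h[0], h[1]))


def dynamic_order(hits):
    """Descending dynamic score (the highest score is the best)."""
    return sorted(hits, key=lambda h: h[2], reverse=True)


static_ids = [doc_id for _, doc_id, _ in index_order(hits)]
dynamic_ids = [doc_id for _, doc_id, _ in dynamic_order(hits)]
print(static_ids)   # document IDs in index-list order
print(dynamic_ids)  # document IDs in dynamic-ranking order
```

The same documents appear in both lists; only the sort key and its direction differ.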
+ + + Dynamic ranking is enabled by a directive like one of the + following in the zebra configuration file (use only one of these at a time!): + + rank: rank-1 # default TF-IDF like + rank: rank-static # dummy do-nothing + + + + + Dynamic ranking is done at query time rather than + indexing time (this is why we + call it ``dynamic ranking'' in the first place ...) + It is invoked by adding + the &acro.bib1; relation attribute with + value ``relevance'' to the &acro.pqf; query (that is, + @attr 2=102, see also + + The &acro.bib1; Attribute Set Semantics, also in + HTML). + To find all articles with the word Eoraptor in + the title, and present them relevance ranked, issue the &acro.pqf; query: + + @attr 2=102 @attr 1=4 Eoraptor + + + + + Dynamically ranking using &acro.pqf; queries with the 'rank-1' + algorithm + + + The default rank-1 ranking module implements a + TF/IDF (Term Frequency over Inverse Document Frequency) like + algorithm. In contrast to the usual definition of TF/IDF + algorithms, which only consider searching in one full-text + index, this one works on multiple indexes at the same time. + More precisely, + &zebra; does boolean queries and searches in specific addressed + indexes (there are inverted indexes pointing from terms in the + dictionary to documents and term positions inside documents). + It works like this: + + + Query Components + + + First, the boolean query is dismantled into its principal components, + i.e. atomic queries where one term is looked up in one index. + For example, the query + + @attr 2=102 @and @attr 1=1010 Utah @attr 1=1018 Springer + + is a boolean AND between the atomic parts + + @attr 2=102 @attr 1=1010 Utah + + and + + @attr 2=102 @attr 1=1018 Springer + + each of which is processed separately. + + + + + + Atomic hit lists + + + Second, for each atomic query, the hit list of documents is + computed. + + + In this example, two hit lists are computed, one for each of the indexes + @attr 1=1010 and + @attr 1=1018.
+ + + + + + Atomic scores + + + Third, each document in the hit list is assigned a score (_if_ ranking + is enabled and requested in the query) using a TF/IDF scheme. + + + In this example, both atomic parts of the query assign the magic + @attr 2=102 relevance attribute, and are + to be used in the relevance ranking functions. + + + It is possible to apply dynamic ranking on only parts of the + &acro.pqf; query: + + @and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer + + searches for all documents which have the term 'Utah' on the + body of text, and which have the term 'Springer' in the publisher + field, and sort them in the order of the relevance ranking made on + the body-of-text index only. + + + + + + Hit list merging + + + Fourth, the atomic hit lists are merged according to the boolean + conditions to a final hit list of documents to be returned. + + + This step is always performed, independently of the fact that + dynamic ranking is enabled or not. + + + + + + Document score computation + + + Fifth, the total score of a document is computed as a linear + combination of the atomic scores of the atomic hit lists + + + Ranking weights may be used to pass a value to a ranking + algorithm, using the non-standard &acro.bib1; attribute type 9. + This allows one branch of a query to use one value while + another branch uses a different one. For example, we can search + for utah in the + @attr 1=4 index with weight 30, as + well as in the @attr 1=1010 index with weight 20: + + @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 @attr 1=1010 city + + + + The default weight is + sqrt(1000) ~ 34 , as the &acro.z3950; standard prescribes that the top score + is 1000 and the bottom score is 0, encoded in integers. + + + + The ranking-weight feature is experimental. It may change in future + releases of zebra. + + + + + + + Re-sorting of hit list + + + Finally, the final hit list is re-ordered according to scores. 
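The merging, scoring and re-sorting steps listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the atomic scores are invented, the way rank-1 actually computes and normalizes TF/IDF values is not shown, and the tie-break by ascending document ID is an assumption of this sketch rather than documented behaviour. The weights 30 and 20 are taken from the weighted example query.

```python
def merge_or(*hit_lists):
    """Boolean OR: a document qualifies if it occurs in any atomic hit list."""
    docs = set()
    for hits in hit_lists:
        docs.update(hits)
    return docs


def merge_and(*hit_lists):
    """Boolean AND: a document qualifies only if it occurs in every hit list."""
    docs = set(hit_lists[0])
    for hits in hit_lists[1:]:
        docs.intersection_update(hits)
    return docs


def total_scores(docs, weighted_lists):
    """Linear combination: sum of weight * atomic score over the atomic lists."""
    return {
        doc: sum(weight * hits.get(doc, 0) for weight, hits in weighted_lists)
        for doc in docs
    }


def ranked(scores):
    """Descending total score; ties broken by ascending document ID (assumed)."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Made-up atomic hit lists mapping document ID -> atomic score.
title_hits = {1: 5, 2: 1}   # stands in for the @attr 1=4 branch
body_hits = {2: 4, 3: 6}    # stands in for the @attr 1=1010 branch

docs = merge_or(title_hits, body_hits)
scores = total_scores(docs, [(30, title_hits), (20, body_hits)])
order = ranked(scores)
print(order)
```

Note that the merge step decides which documents are in the hit set, while the scores only decide their order, which matches the statement that merging is performed whether or not dynamic ranking is enabled.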
+ + + + + + + + + + + + + + The rank-1 algorithm + does not use the static rank + information in the list keys, and will produce the same ordering + with or without static ranking enabled. + + + + + + + Dynamic ranking is not compatible + with estimated hit sizes, as all documents in + a hit set must be accessed to compute the correct placing in a + ranking sorted list. Therefore the use attribute setting + @attr 2=102 clashes with + @attr 9=integer. + + + + + + + + + Dynamically ranking &acro.cql; queries + + Dynamic ranking can be enabled during sever side &acro.cql; + query expansion by adding @attr 2=102 + chunks to the &acro.cql; config file. For example + + relationModifier.relevant = 2=102 + + invokes dynamic ranking each time a &acro.cql; query of the form + + Z> querytype cql + Z> f alvis.text =/relevant house + + is issued. Dynamic ranking can also be automatically used on + specific &acro.cql; indexes by (for example) setting + + index.alvis.text = 1=text 2=102 + + which then invokes dynamic ranking each time a &acro.cql; query of the form + + Z> querytype cql + Z> f alvis.text = house + + is issued. + + + + + + + + + Sorting + + &zebra; sorts efficiently using special sorting indexes + (type=s; so each sortable index must be known + at indexing time, specified in the configuration of record + indexing. For example, to enable sorting according to the &acro.bib1; + Date/time-added-to-db field, one could add the line + + xelm /*/@created Date/time-added-to-db:s + + to any .abs record-indexing configuration file. + Similarly, one could add an indexing element of the form + + + + ]]> + to any alvis-filter indexing stylesheet. + + + Indexing can be specified at searching time using a query term + carrying the non-standard + &acro.bib1; attribute-type 7. This removes the + need to send a &acro.z3950; Sort Request + separately, and can dramatically improve latency when the client + and server are on separate networks. 
+ The sorting part of the query is separate from the rest of the + query - the actual search specification - and must be combined + with it using OR. + + + A sorting subquery needs two attributes: an index (such as a + &acro.bib1; type-1 attribute) specifying which index to sort on, and a + type-7 attribute whose value is be 1 for + ascending sorting, or 2 for descending. The + term associated with the sorting attribute is the priority of + the sort key, where 0 specifies the primary + sort key, 1 the secondary sort key, and so + on. + + For example, a search for water, sort by title (ascending), + is expressed by the &acro.pqf; query + + @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 + + whereas a search for water, sort by title ascending, + then date descending would be + + @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1 + + + + Notice the fundamental differences between dynamic + ranking and sorting: there can be + only one ranking function defined and configured; but multiple + sorting indexes can be specified dynamically at search + time. Ranking does not need to use specific indexes, so + dynamic ranking can be enabled and disabled without + re-indexing; whereas, sorting indexes need to be + defined before indexing. + + + + + + + + + Extended Services: Remote Insert, Update and Delete + + + + Extended services are only supported when accessing the &zebra; + server using the &acro.z3950; + protocol. The &acro.sru; protocol does + not support extended services. + + + + + The extended services are not enabled by default in zebra - due to the + fact that they modify the system. &zebra; can be configured + to allow anybody to + search, and to allow only updates for a particular admin user + in the main zebra configuration file zebra.cfg. 
+ For user admin, you could use: + + perm.anonymous: r + perm.admin: rw + passwd: passwordfile + + And in the password file + passwordfile, you have to specify users and + encrypted passwords as colon separated strings. + Use a tool like htpasswd + to maintain the encrypted passwords. + + admin:secret + + It is essential to configure &zebra; to store records internally, + and to support + modifications and deletion of records: + + storeData: 1 + storeKeys: 1 + + The general record type should be set to any record filter which + is able to parse &acro.xml; records, you may use any of the two + declarations (but not both simultaneously!) + + recordType: dom.filter_dom_conf.xml + # recordType: grs.xml + + Notice the difference to the specific instructions + + recordType.xml: dom.filter_dom_conf.xml + # recordType.xml: grs.xml + + which only work when indexing XML files from the filesystem using + the *.xml naming convention. + + + To enable transaction safe shadow indexing, + which is extra important for this kind of operation, set + + shadow: directoryname: size (e.g. 1000M) + + See for additional information on + these configuration options. + + + + It is not possible to carry information about record types or + similar to &zebra; when using extended services, due to + limitations of the &acro.z3950; + protocol. Therefore, indexing filters can not be chosen on a + per-record basis. One and only one general &acro.xml; indexing filter + must be defined. + + + + + + + + Extended services in the &acro.z3950; protocol + + + The &acro.z3950; standard allows + servers to accept special binary extended services + protocol packages, which may be used to insert, update and delete + records into servers. 
These carry control and update + information to the servers, which are encoded in seven package fields: + + + + Extended services &acro.z3950; Package Fields + + + + Parameter + Value + Notes + + + + + type + 'update' + Must be set to trigger extended services + + + action + string + + Extended service action type with + one of four possible values: recordInsert, + recordReplace, + recordDelete, + and specialUpdate + + + + record + &acro.xml; string + An &acro.xml; formatted string containing the record + + + syntax + 'xml' + XML/SUTRS/MARC. GRS-1 not supported. + The default filter (record type) as given by recordType in + zebra.cfg is used to parse the record. + + + recordIdOpaque + string + + Optional client-supplied, opaque record + identifier used under insert operations. + + + + recordIdNumber + positive number + &zebra;'s internal system number, + not allowed for recordInsert or + specialUpdate actions which result in fresh + record inserts. + + + + databaseName + database identifier + + The name of the database to which the extended services should be + applied. + + + + +
+ + + + The action parameter can be any of + recordInsert (will fail if the record already exists), + recordReplace (will fail if the record does not exist), + recordDelete (will fail if the record does not + exist), and + specialUpdate (will insert or update the record + as needed, record deletion is not possible). + + + + During all actions, the + usual rules for internal record ID generation apply, unless an + optional recordIdNumber &zebra; internal ID or a + recordIdOpaque string identifier is assigned. + The default ID generation is + configured using the recordId: setting from + zebra.cfg. + See . + + + + Setting of the recordIdNumber parameter, + which must be an existing &zebra; internal system ID number, is not + allowed during any recordInsert or + specialUpdate action resulting in fresh record + inserts. + + + + When retrieving existing + records indexed with &acro.grs1; indexing filters, the &zebra; internal + ID number is returned in the field + /*/id:idzebra/localnumber in the namespace + xmlns:id="http://www.indexdata.dk/zebra/", + where it can be picked up for later record updates or deletes. + + + + A new element set for retrieval of internal record + data has been added, which can be used to access minimal records + containing only the recordIdNumber &zebra; + internal ID, or the recordIdOpaque string + identifier. This works for any indexing filter used. + See . + + + + The recordIdOpaque string parameter + is a client-supplied, opaque record + identifier, which may be used under + insert, update and delete operations. The + client software is responsible for assigning these to + records. This identifier will + replace zebra's own automagic identifier generation with a unique + mapping from recordIdOpaque to the + &zebra; internal recordIdNumber. + The opaque recordIdOpaque string + identifiers + are not visible in retrieval records, nor are they + searchable, so the value of this parameter is + questionable.
+ It serves mostly as a convenient mapping from + application domain string identifiers to &zebra; internal IDs. + + +
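The success and failure rules of the four actions described above can be modelled with a small Python sketch. It is a toy model only: the dictionary stands in for a database keyed by record ID, and apply_action and UpdateError are hypothetical names, not part of &zebra; or &yaz;.

```python
class UpdateError(Exception):
    """Raised when an action is not allowed for the current state of the store."""


def apply_action(store, action, record_id, record=None):
    """Apply one extended-services action to `store` (record ID -> record)."""
    exists = record_id in store
    if action == "recordInsert":
        if exists:
            raise UpdateError("recordInsert: record already exists")
        store[record_id] = record
    elif action == "recordReplace":
        if not exists:
            raise UpdateError("recordReplace: record does not exist")
        store[record_id] = record
    elif action == "recordDelete":
        if not exists:
            raise UpdateError("recordDelete: record does not exist")
        del store[record_id]
    elif action == "specialUpdate":
        # Inserts or updates as needed; deletion is not possible with this action.
        store[record_id] = record
    else:
        raise ValueError(f"unknown action: {action}")


db = {}
apply_action(db, "specialUpdate", "id1234", "<rec>v1</rec>")  # fresh insert
apply_action(db, "recordReplace", "id1234", "<rec>v2</rec>")  # update in place
apply_action(db, "recordDelete", "id1234")                    # now removed
```

In this model a second recordDelete for id1234, or a recordInsert for an ID that is already present, raises UpdateError, mirroring the "will fail" cases listed above.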
+ + + + Extended services from yaz-client + + + We can now start a yaz-client admin session and create a database: + + adm-create + ]]> + + Now that the Default database has been created, + we can insert an &acro.xml; file (esdd0006.grs + from example/gils/records) and index it: + + update insert id1234 esdd0006.grs + ]]> + + The 3rd parameter - id1234 here - + is the recordIdOpaque package field. + + + Actually, we should have a way to specify "no opaque record id" for + yaz-client's update command. We'll fix that. + + + The newly inserted record can be searched as usual: + + f utah + Sent searchRequest. + Received SearchResponse. + Search was a success. + Number of hits: 1, setno 1 + SearchResult-1: term=utah cnt=1 + records returned: 0 + Elapsed: 0.014179 + ]]> + + + + Let's delete the beast, using the same + recordIdOpaque string parameter: + + update delete id1234 + No last record (update ignored) + Z> update delete 1 esdd0006.grs + Got extended services response + Status: done + Elapsed: 0.072441 + Z> f utah + Sent searchRequest. + Received SearchResponse. + Search was a success. + Number of hits: 0, setno 2 + SearchResult-1: term=utah cnt=0 + records returned: 0 + Elapsed: 0.013610 + ]]> + + + + If the shadow register is enabled in your + zebra.cfg, + you must run the adm-commit command + + adm-commit + ]]> + + after each update session in order to write your changes from the + shadow to the live register space. + + + + + + Extended services from yaz-php + + + Extended services are also available from the &yaz; &acro.php; client layer.
An + example of an &yaz;-&acro.php; extended service transaction is given here: + + A fine specimen of a record'; + + $options = array('action' => 'recordInsert', + 'syntax' => 'xml', + 'record' => $record, + 'databaseName' => 'mydatabase' + ); + + yaz_es($yaz, 'update', $options); + yaz_es($yaz, 'commit', array()); + yaz_wait(); + + if ($error = yaz_error($yaz)) + echo "$error"; + ]]> + + + + + + Extended services debugging guide + + When debugging ES over PHP we recommend the following order of tests: + + + + + + Make sure you have a nice record on your filesystem, which you can + index from the filesystem by use of the zebraidx command. + Do it exactly as you planned, using one of the GRS-1 filters, + or the DOMXML filter. + When this works, proceed. + + + + + Check that your server setup is OK before you have coded a single + line of PHP using ES. + Take the same record from the file system, and send it as ES via + yaz-client as described in + , + and + remember the -a option which tells you what + goes over the wire! Notice also the section on permissions: + try + + perm.anonymous: rw + + in zebra.cfg to make sure you do not run into + permission problems (but never expose such an insecure setup on the + internet!!!). Then, make sure to set the general + recordType instruction, pointing correctly + to the GRS-1 filters, + or the DOMXML filters. + + + + + If you insist on using the sysno in the + recordIdNumber setting, + please make sure you do only updates and deletes. Zebra's internal + system number is not allowed for + recordInsert or + specialUpdate actions + which result in fresh record inserts. + + + + + If the shadow register is enabled in your + zebra.cfg, you must remember to run the + + Z> adm-commit + + command as well. + + + + + If this works, then proceed to do the same thing in your PHP script. + + + + + + + +
+
+