X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Fadministration.xml;fp=doc%2Fadministration.xml;h=13baec6a252641c36832553f132ed9be26a2e0de;hp=829ef7591505428863477952c443e459617edb76;hb=5ca4e60e990af6ad6b62ebff855d7b642f37c3ec;hpb=e6ff84c71e457ff668dce640382fc1ad88c37d6d diff --git a/doc/administration.xml b/doc/administration.xml index 829ef75..13baec6 100644 --- a/doc/administration.xml +++ b/doc/administration.xml @@ -1,6 +1,6 @@ - - Administrating Zebra + + Administrating &zebra; - Unlike many simpler retrieval systems, Zebra supports safe, incremental + Unlike many simpler retrieval systems, &zebra; supports safe, incremental updates to an existing index. - Normally, when Zebra modifies the index it reads a number of records + Normally, when &zebra; modifies the index it reads a number of records that you specify. Depending on your specifications and on the contents of each record one the following events take place for each record: @@ -25,8 +25,8 @@ The record is indexed as if it never occurred before. - Either the Zebra system doesn't know how to identify the record or - Zebra can identify the record but didn't find it to be already indexed. + Either the &zebra; system doesn't know how to identify the record or + &zebra; can identify the record but didn't find it to be already indexed. @@ -53,20 +53,20 @@ - Please note that in both the modify- and delete- case the Zebra + Please note that in both the modify- and delete- case the &zebra; indexer must be able to generate a unique key that identifies the record in question (more on this below). - To administrate the Zebra retrieval system, you run the + To administrate the &zebra; retrieval system, you run the zebraidx program. This program supports a number of options which are preceded by a dash, and a few commands (not preceded by dash). 
- Both the Zebra administrative tool and the Z39.50 server share a + Both the &zebra; administrative tool and the Z39.50 server share a set of index files and a global configuration file. The name of the configuration file defaults to zebra.cfg. @@ -85,7 +85,7 @@ Indexing is a per-record process, in which either insert/modify/delete will occur. Before a record is indexed search keys are extracted from whatever might be the layout the original record (sgml,html,text, etc..). - The Zebra system currently supports two fundamental types of records: + The &zebra; system currently supports two fundamental types of records: structured and simple text. To specify a particular extraction process, use either the command line option -t or specify a @@ -95,10 +95,10 @@ - The Zebra Configuration File + The &zebra; Configuration File - The Zebra configuration file, read by zebraidx and + The &zebra; configuration file, read by zebraidx and zebrasrv defaults to zebra.cfg unless specified by -c option. @@ -220,10 +220,10 @@ Specifies whether the records should be stored internally - in the Zebra system files. + in the &zebra; system files. If you want to maintain the raw records yourself, this option should be false (0). - If you want Zebra to take care of the records for you, it + If you want &zebra; to take care of the records for you, it should be true(1). @@ -233,7 +233,7 @@ register: register-location - Specifies the location of the various register files that Zebra uses + Specifies the location of the various register files that &zebra; uses to represent your databases. See . @@ -243,7 +243,7 @@ shadow: register-location - Enables the safe update facility of Zebra, and + Enables the safe update facility of &zebra;, and tells the system where to place the required, temporary files. See . 
@@ -316,7 +316,7 @@ estimatehits:: integer - Controls whether Zebra should calculate approximite hit counts and + Controls whether &zebra; should calculate approximite hit counts and at which hit count it is to be enabled. A value of 0 disables approximiate hit counts. For a positive value approximaite hit count is enabled @@ -373,9 +373,9 @@ root: dir - Specifies a directory base for Zebra. All relative paths + Specifies a directory base for &zebra;. All relative paths given (in profilePath, register, shadow) are based on this - directory. This setting is useful if your Zebra server + directory. This setting is useful if your &zebra; server is running in a different directory from where zebra.cfg is located. @@ -386,7 +386,7 @@ passwd: file - Specifies a file with description of user accounts for Zebra. + Specifies a file with description of user accounts for &zebra;. The format is similar to that known to Apache's htpasswd files and UNIX' passwd files. Non-empty lines not beginning with # are considered account lines. There is one account per-line. @@ -400,7 +400,7 @@ passwd.c: file - Specifies a file with description of user accounts for Zebra. + Specifies a file with description of user accounts for &zebra;. File format is similar to that used by the passwd directive except that the password are encrypted. Use Apache's htpasswd or similar for maintenance. @@ -414,7 +414,7 @@ Specifies permissions (priviledge) for a user that are allowed - to access Zebra via the passwd system. There are two kinds + to access &zebra; via the passwd system. There are two kinds of permissions currently: read (r) and write(w). By default users not listed in a permission directive are given the read privilege. To specify permissions for a user with no @@ -448,7 +448,7 @@ Locating Records - The default behavior of the Zebra system is to reference the + The default behavior of the &zebra; system is to reference the records from their original location, i.e. 
where they were found when you run zebraidx. That is, when a client wishes to retrieve a record @@ -463,7 +463,7 @@ If your input files are not permanent - for example if you retrieve your records from an outside source, or if they were temporarily mounted on a CD-ROM drive, - you may want Zebra to make an internal copy of them. To do this, + you may want &zebra; to make an internal copy of them. To do this, you specify 1 (true) in the storeData setting. When the Z39.50 server retrieves the records they will be read from the internal file structures of the system. @@ -557,7 +557,7 @@ To enable indexing with pathname IDs, you must specify file as the value of recordId in the configuration file. In addition, you should set - storeKeys to 1, since the Zebra + storeKeys to 1, since the &zebra; indexer must save additional information about the contents of each record in order to modify the indexes correctly at a later time. @@ -587,7 +587,7 @@ You cannot start out with a group of records with simple indexing (no record IDs as in the previous section) and then later - enable file record Ids. Zebra must know from the first time that you + enable file record Ids. &zebra; must know from the first time that you index the group that the files should be indexed with file record IDs. @@ -698,7 +698,7 @@ - For instance, the sample GILS records that come with the Zebra + For instance, the sample GILS records that come with the &zebra; distribution contain a unique ID in the data tagged Control-Identifier. The data is mapped to the Bib-1 use attribute Identifier-standard (code 1007). To use this field as a record id, specify @@ -752,7 +752,7 @@ zebraidx. If you wish to store these, possibly large, files somewhere else, you must add the register entry to the zebra.cfg file. - Furthermore, the Zebra system allows its file + Furthermore, the &zebra; system allows its file structures to span multiple file systems, which is useful for managing very large databases. 
@@ -767,7 +767,7 @@ The dir specifies a directory in which index files will be stored and the size specifies the maximum - size of all files in that directory. The Zebra indexer system fills + size of all files in that directory. The &zebra; indexer system fills each directory in the order specified and use the next specified directories as needed. The size is an integer followed by a qualifier @@ -792,12 +792,12 @@ - Note that Zebra does not verify that the amount of space specified is + Note that &zebra; does not verify that the amount of space specified is actually available on the directory (file system) specified - it is your responsibility to ensure that enough space is available, and that other applications do not attempt to use the free space. In a large production system, it is recommended that you allocate one or more - file system exclusively to the Zebra register files. + file system exclusively to the &zebra; register files. @@ -809,9 +809,9 @@ Description - The Zebra server supports updating of the index + The &zebra; server supports updating of the index structures. That is, you can add, modify, or remove records from - databases managed by Zebra without rebuilding the entire index. + databases managed by &zebra; without rebuilding the entire index. Since this process involves modifying structured files with various references between blocks of data in the files, the update process is inherently sensitive to system crashes, or to process interruptions: @@ -826,7 +826,7 @@ You can solve these problems by enabling the shadow register system in - Zebra. + &zebra;. During the updating procedure, zebraidx will temporarily write changes to the involved files in a set of "shadow files", without modifying the files that are accessed by the @@ -977,7 +977,7 @@ Overview The default ordering of a result set is left up to the server, - which inside Zebra means sorting in ascending document ID order. 
+ which inside &zebra; means sorting in ascending document ID order. This is not always the order humans want to browse the sometimes quite large hit sets. Ranking and sorting comes to the rescue. @@ -996,7 +996,7 @@ Simply put, dynamic relevance ranking sorts a set of retrieved records such that those most likely to be relevant to your request are retrieved first. - Internally, Zebra retrieves all documents that satisfy your + Internally, &zebra; retrieves all documents that satisfy your query, and re-orders the hit list to arrange them based on a measurement of similarity between your query and the content of each record. @@ -1015,7 +1015,7 @@ Static Ranking - Zebra uses internally inverted indexes to look up term occurencies + &zebra; uses internally inverted indexes to look up term occurencies in documents. Multiple queries from different indexes can be combined by the binary boolean operations AND, OR and/or NOT (which @@ -1037,7 +1037,7 @@ staticrank: 1 - directive in the main core Zebra configuration file, the internal document + directive in the main core &zebra; configuration file, the internal document keys used for ordering are augmented by a preceding integer, which contains the static rank of a given document, and the index lists are ordered @@ -1110,7 +1110,7 @@ algorithms, which only considers searching in one full-text index, this one works on multiple indexes at the same time. More precisely, - Zebra does boolean queries and searches in specific addressed + &zebra; does boolean queries and searches in specific addressed indexes (there are inverted indexes pointing from terms in the dictionary to documents and term positions inside documents). 
It works like this: @@ -1415,7 +1415,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci Sorting - Zebra sorts efficiently using special sorting indexes + &zebra; sorts efficiently using special sorting indexes (type=s; so each sortable index must be known at indexing time, specified in the configuration of record indexing. For example, to enable sorting according to the BIB-1 @@ -1485,7 +1485,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci - Extended services are only supported when accessing the Zebra + Extended services are only supported when accessing the &zebra; server using the Z39.50 protocol. The SRU protocol does not support extended services. @@ -1494,7 +1494,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci The extended services are not enabled by default in zebra - due to the - fact that they modify the system. Zebra can be configured + fact that they modify the system. &zebra; can be configured to allow anybody to search, and to allow only updates for a particular admin user in the main zebra configuration file zebra.cfg. @@ -1512,7 +1512,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci admin:secret - It is essential to configure Zebra to store records internally, + It is essential to configure &zebra; to store records internally, and to support modifications and deletion of records: @@ -1537,7 +1537,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci It is not possible to carry information about record types or - similar to Zebra when using extended services, due to + similar to &zebra; when using extended services, due to limitations of the Z39.50 protocol. Therefore, indexing filters can not be chosen on a per-record basis. 
One and only one general XML indexing filter @@ -1613,7 +1613,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci recordIdNumber positive number - Zebra's internal system number, + &zebra;'s internal system number, not allowed for recordInsert or specialUpdate actions which result in fresh record inserts. @@ -1645,7 +1645,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci During all actions, the usual rules for internal record ID generation apply, unless an - optional recordIdNumber Zebra internal ID or a + optional recordIdNumber &zebra; internal ID or a recordIdOpaque string identifier is assigned. The default ID generation is configured using the recordId: from @@ -1655,7 +1655,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci Setting of the recordIdNumber parameter, - which must be an existing Zebra internal system ID number, is not + which must be an existing &zebra; internal system ID number, is not allowed during any recordInsert or specialUpdate action resulting in fresh record inserts. @@ -1663,7 +1663,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci When retrieving existing - records indexed with GRS indexing filters, the Zebra internal + records indexed with GRS indexing filters, the &zebra; internal ID number is returned in the field /*/id:idzebra/localnumber in the namespace xmlns:id="http://www.indexdata.dk/zebra/", @@ -1673,7 +1673,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci A new element set for retrieval of internal record data has been added, which can be used to access minimal records - containing only the recordIdNumber Zebra + containing only the recordIdNumber &zebra; internal ID, or the recordIdOpaque string identifier. This works for any indexing filter used. See . 
@@ -1688,13 +1688,13 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci records. This identifier will replace zebra's own automagic identifier generation with a unique mapping from recordIdOpaque to the - Zebra internal recordIdNumber. + &zebra; internal recordIdNumber. The opaque recordIdOpaque string identifiers are not visible in retrieval records, nor are searchable, so the value of this parameter is questionable. It serves mostly as a convenient mapping from - application domain string identifiers to Zebra internal ID's. + application domain string identifiers to &zebra; internal ID's.
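Note on the change as a whole: every hunk above replaces the literal product name with the general entity reference &zebra;, which resolves only if an entity named zebra is declared for the document. The following is a minimal sketch of how such a declaration could look, not the project's actual setup. The inline DOCTYPE subset, the replacement text "Zebra", and the chapter id are assumptions for illustration; the real declaration may live in a shared entity file that this diff does not show.

```xml
<?xml version="1.0"?>
<!DOCTYPE chapter [
  <!-- Hypothetical declaration: the project may define this entity
       in a separate entity file instead of an inline subset. -->
  <!ENTITY zebra "Zebra">
]>
<!-- The id value is assumed; element markup is not visible in the diff. -->
<chapter id="administration">
  <title>Administrating &zebra;</title>
  <para>
    Unlike many simpler retrieval systems, &zebra; supports safe,
    incremental updates to an existing index.
  </para>
</chapter>
```

With a declaration like this in scope, the rendered text is unchanged by the diff, and the product name can later be restyled or renamed in one place.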