X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fadministration.xml;h=eee315e1de6912fe8e955d22d33395c59c0bfbea;hb=b83408311d403f7463c336ec398766ec7d719418;hp=5ebfcd3ed4ac714ec8ef689d2be7fb00535b5dc4;hpb=79e9818dfb6b9a0a04bdd6bc6467c8dae3b8f493;p=idzebra-moved-to-github.git diff --git a/doc/administration.xml b/doc/administration.xml index 5ebfcd3..eee315e 100644 --- a/doc/administration.xml +++ b/doc/administration.xml @@ -1,7 +1,13 @@ - + Administrating Zebra - + + Unlike many simpler retrieval systems, Zebra supports safe, incremental updates to an existing index. @@ -100,7 +106,7 @@ You can edit the configuration file with a normal text editor. parameter names and values are separated by colons in the file. Lines - starting with a hash sign (#) are + starting with a hash sign (#) are treated as comments. @@ -146,9 +152,9 @@ explained further in the following sections. - + @@ -156,7 +162,7 @@ group - .recordType[.name]: + .recordType[.name]: type @@ -190,7 +196,7 @@ Specifies the Z39.50 database name. - FIXME - now we can have multiple databases in one server. -H + @@ -203,6 +209,7 @@ group of records. If you plan to update/delete this type of records later this should be specified as 1; otherwise it should be 0 (default), to save register space. + See . @@ -222,6 +229,7 @@ + register: register-location @@ -253,7 +261,7 @@ keyTmpDir: directory - Directory in which temporary files used during zebraidx' update + Directory in which temporary files used during zebraidx's update phase are stored. @@ -268,7 +276,7 @@ - profilePath: path + profilePath: path Specifies a path of profile specification files. @@ -297,6 +305,19 @@ Specifies size of internal memory to use for the zebraidx program. The amount is given in megabytes - default is 4 (4 MB). + The more memory, the faster large updates happen, up to about + half the free memory available on the computer. + + + + + tempfiles: Yes/Auto/No + + + Tells zebra if it should use temporary files when indexing. 
The + default is Auto, in which case zebra uses temporary files only + if it would need more than memMax + megabytes of memory. This should be good for most uses. @@ -307,13 +328,69 @@ Specifies a directory base for Zebra. All relative paths given (in profilePath, register, shadow) are based on this - directory. This setting is useful if if you Zebra server + directory. This setting is useful if your Zebra server is running in a different directory from where zebra.cfg is located. + + passwd: file + + + Specifies a file with a description of user accounts for Zebra. + The format is similar to that of Apache's htpasswd files + and UNIX passwd files. Non-empty lines not beginning with + # are considered account lines. There is one account per line. + A line consists of fields separated by a single colon character. + The first field is the username, the second is the password. + + + + + + passwd.c: file + + + Specifies a file with a description of user accounts for Zebra. + The file format is similar to that used by the passwd directive except + that the passwords are encrypted. Use Apache's htpasswd or similar + for maintenance. + + + + + + perm.user: + permstring + + + Specifies permissions (privileges) for a user that is allowed + to access Zebra via the passwd system. There are currently two kinds + of permissions: read (r) and write (w). By default + users not listed in a permission directive are given the read + privilege. To specify permissions for a user with no + username (Z39.50 anonymous style), use + anonymous. The permstring consists of + a sequence of characters. Include the character w + for write/update access, r for read access. + + + + + + dbaccess accessfile + + + Names a file which lists database subscriptions for individual users. + The access file should consist of lines of the form username: + dbnames, where dbnames is a list of database names separated by + '+'. No whitespace is allowed in the database list.
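To make the account file formats above concrete, here is a minimal Python sketch of how the `passwd`, `perm.user` and `dbaccess` settings could be interpreted. The function names `parse_passwd`, `parse_dbaccess` and `has_permission` are hypothetical and not part of Zebra, whose real implementation is in C. Encrypted `passwd.c` entries are not handled. Comment handling in `dbaccess` files is not described above, so the sketch does not attempt it.

```python
# Hypothetical sketch of the file formats described above; these helpers
# are illustrative only and do not exist in Zebra.

def parse_passwd(text):
    """Return {username: password} from passwd-style text."""
    accounts = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # empty lines and lines beginning with '#' are skipped
        fields = line.split(":")  # fields are separated by a single colon
        if len(fields) >= 2:
            accounts[fields[0]] = fields[1]  # first: username, second: password
    return accounts

def parse_dbaccess(text):
    """Return {username: [dbname, ...]} from 'username: db1+db2' lines."""
    access = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        user, _, dbnames = line.partition(":")
        # Database names are joined by '+' with no whitespace inside the list.
        access[user.strip()] = dbnames.strip().split("+")
    return access

def has_permission(perms, user, flag):
    """perms maps user -> permstring such as 'rw'.
    Users without a permission entry get read ('r') access only."""
    return flag in perms.get(user, "r")
```

For example, `parse_dbaccess("alice: Default+books")` yields `{"alice": ["Default", "books"]}`.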
+ + + + @@ -329,8 +406,9 @@ That is, when a client wishes to retrieve a record following a search operation, the files are accessed from the place where you originally put them - if you remove the files (without - running zebraidx again, the client - will receive a diagnostic message. + running zebraidx again, the server will return + diagnostic number 14 (``System error in presenting records'') to + the client. @@ -375,7 +453,7 @@ - profilePath: /usr/local/yaz + profilePath: /usr/local/idzebra/tab attset: bib1.att simple.recordType: text simple.database: textbase @@ -436,9 +514,9 @@ in order to modify the indexes correctly at a later time. - - FIXME - There must be a simpler way to do this with Adams string tags -H - + For example, to update records of group esdd @@ -475,6 +553,7 @@ and then run zebraidx with the update command. + @@ -590,7 +669,7 @@ - (see + (see for details of how the mapping between elements of your records and searchable attributes is established). @@ -764,7 +843,6 @@ register: /d1:500M - shadow: /scratch1:100M /scratch2:200M @@ -776,14 +854,13 @@ In order to make changes to the system take effect for the users, you'll have to submit a "commit" command after a (sequence of) update operation(s). - You can ask the indexer to commit the changes immediately - after the update operation: - $ zebraidx update /d1/records update /d2/more-records commit + $ zebraidx update /d1/records + $ zebraidx commit @@ -795,7 +872,7 @@ - $ zebraidx -g books update /d1/records update /d2/more-records + $ zebraidx -g books update /d1/records /d2/more-records $ zebraidx -g fun update /d3/fun-records $ zebraidx commit @@ -843,8 +920,112 @@ + + + + Static and Dynamic Ranking + + + Zebra internally uses inverted indexes to look up term occurrences + in documents. Multiple queries from different indexes can be + combined by the binary boolean operations AND, + OR and/or NOT (which + is in fact a binary AND NOT operation).
+ To ensure fast query execution + speed, all indexes have to be sorted in the same order. + + + The indexes are normally sorted according to document + ID in + ascending order, and any query which does not invoke a special + re-ranking function will therefore retrieve the result set in + document + ID + order. + + + If one defines the + + staticrank: 1 + + directive in the main Zebra config file, the internal document + keys used for ordering are augmented by a preceding integer, which + contains the static rank of a given document, and the index lists + are ordered + first by ascending static rank, + then by ascending document ID. + + + This implies that the default rank 0 + is the best rank at the + beginning of the list, and max int + is the worst static rank. + + + The experimental alvis filter provides a + directive to fetch static rank information out of the indexed XML + records, thus making all hit sets ordered + by ascending static + rank, and for those documents which have the same static rank, ordered + by ascending document ID. + See for the gory details. + + + If one wants to do a little fiddling with the static rank order, + one has to invoke additional re-ranking/re-ordering using dynamic + reranking or score functions. These functions return positive + integer scores, where the highest score is + best, which means that the + hit sets will be sorted according to + descending + scores (in contrast + to the index lists, which are sorted according to + ascending rank number and document ID). + + + + These are enabled in the Zebra config file by a directive like the following (use + only one of these at a time!): + + rank: rank-1 # default + rank: rank-static # dummy + rank: zvrank # TF-IDF like + + Notice that rank-1 and + zvrank do not use the static rank + information in the list keys, and will produce the same ordering + with or without static ranking enabled.
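The ordering rules above can be illustrated with a small Python model in which each index list is a sorted list of keys. By default a key is a plain document ID. With `staticrank: 1` it is a (static rank, document ID) pair. This is an illustrative sketch only: the tuple key layout and the helper names `make_key` and `intersect` are assumptions, not Zebra's on-disk format. It also shows why every list must share one sort order. With a common order, a boolean AND reduces to a single linear merge.

```python
# Illustrative model only: index lists are sorted Python lists of keys.
# The key layout below is an assumption, not Zebra's real key encoding.

def make_key(docid, staticrank=None):
    """Ordering key: the document ID alone, or (staticrank, docid) when
    static ranking is enabled, so lists sort by rank, then by ID."""
    return docid if staticrank is None else (staticrank, docid)

def intersect(a, b):
    """Boolean AND of two lists sorted in the same ascending order,
    computed by one linear merge over both lists."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])   # document present in both lists
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1                # advance the list with the smaller key
        else:
            j += 1
    return result
```

Because a document carries the same static rank in every list, its key compares equal across lists, and the merge still finds it.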
+ + The dummy rank-static reranking/scoring + function returns just + score = max int - staticrank + in order to preserve the ordering of hit sets whether or not it is + called. + Obviously, to combine static and dynamic ranking usefully, one needs + to write a new ranking + function, which is left + as an exercise for the reader. + + + + +
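The following sketch shows why the dummy rank-static score leaves the order unchanged. Sorting hits by descending `max int - staticrank` is the same as sorting by ascending static rank. The Python below assumes a 32-bit signed max int and breaks ties by ascending document ID. Both choices are assumptions made for illustration, and the function names are hypothetical.

```python
# Hypothetical sketch of the rank-static score; MAX_INT is assumed to be
# the 32-bit signed maximum, which the text above does not specify.
MAX_INT = 2**31 - 1

def rank_static_score(staticrank):
    """Higher score for lower (better) static rank."""
    return MAX_INT - staticrank

def order_by_score(hits):
    """hits: (staticrank, docid) keys. Sort by descending score, with ties
    broken by ascending document ID (an assumed tie-break)."""
    return sorted(hits, key=lambda k: (-rank_static_score(k[0]), k[1]))
```

Since `-(MAX_INT - r)` increases with `r`, the result matches the index-list order: `order_by_score(hits) == sorted(hits)`.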