X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fadministration.xml;h=42e13c1f724e485b8b9dc30eba0efb839582ddc2;hb=6d3b83ae7e008f2d61326051b03f7f07d3cc2ef0;hp=1c8df0e37733787c46c7c819ab35bf2b54218eca;hpb=25a37c9be836f891281688788a7a1f967ea2b2cb;p=idzebra-moved-to-github.git

diff --git a/doc/administration.xml b/doc/administration.xml
index 1c8df0e..42e13c1 100644
--- a/doc/administration.xml
+++ b/doc/administration.xml

@@ -1,9 +1,9 @@
 Administrating Zebra

@@ -106,7 +106,7 @@
 You can edit the configuration file with a normal text editor.
 Parameter names and values are separated by colons in the file. Lines
 starting with a hash sign (#) are treated as comments.

@@ -162,7 +162,7 @@
 group.recordType[.name]: type

@@ -276,7 +276,7 @@
 profilePath: path
   Specifies a path of profile specification files.

@@ -305,6 +305,19 @@
   Specifies the size of internal memory to use for the zebraidx
   program. The amount is given in megabytes - default is 4 (4 MB).
+  The more memory, the faster large updates happen, up to about
+  half the free memory available on the computer.

+tempfiles: Yes/Auto/No
+  Tells Zebra if it should use temporary files when indexing. The
+  default is Auto, in which case Zebra uses temporary files only
+  if it would need more than memMax megabytes of memory.
+  This should be good for most uses.

@@ -315,13 +328,69 @@
   Specifies a directory base for Zebra. All relative paths given
   (in profilePath, register, shadow) are based on this
-  directory. This setting is useful if if you Zebra server
+  directory. This setting is useful if your Zebra server
   is running in a different directory from where zebra.cfg is located.

+passwd: file
+  Specifies a file with descriptions of user accounts for Zebra.
+  The format is similar to that known from Apache's htpasswd files
+  and UNIX passwd files. Non-empty lines not beginning with
+  # are considered account lines. There is one account per line.
+  A line consists of fields separated by a single colon character.
+  The first field is the username, the second is the password.

+passwd.c: file
+  Specifies a file with descriptions of user accounts for Zebra.
+  The file format is similar to that used by the passwd directive,
+  except that the passwords are encrypted. Use Apache's htpasswd or
+  a similar tool for maintenance.

+perm.user: permstring
+  Specifies the permissions (privileges) of a user who is allowed
+  to access Zebra via the passwd system. There are currently two
+  kinds of permissions: read (r) and write (w). By default, users
+  not listed in a permission directive are given the read
+  privilege. To specify permissions for a user with no username,
+  or Z39.50 anonymous style, use anonymous. The permstring
+  consists of a sequence of characters. Include the character w
+  for write/update access, r for read access.

+dbaccess: accessfile
+  Names a file which lists database subscriptions for individual users.
+  The access file should consist of lines of the form username:
+  dbnames, where dbnames is a list of database names, separated by
+  '+'. No whitespace is allowed in the database list.

@@ -384,7 +453,7 @@
-profilePath: /usr/local/yaz
+profilePath: /usr/local/idzebra/tab
 attset: bib1.att
 simple.recordType: text
 simple.database: textbase

@@ -484,6 +553,7 @@
 and then run zebraidx with the update command.
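The account and permission directives added above are meant to be used
together. As a rough sketch of how they might be combined - the file names,
user names and database names below are invented for illustration and are
not part of this commit - a zebra.cfg fragment and the two files it refers
to could look like this:

   # zebra.cfg fragment (sketch)
   passwd: passwordfile        # plain-text accounts, one user:password per line
   perm.anonymous: r           # users logging in without a name may only read
   perm.admin: rw              # user "admin" may read and update
   dbaccess: dbaccessfile      # per-user database subscriptions

   # passwordfile (sketch)
   admin:secret
   reader:letmein

   # dbaccessfile (sketch): username, then '+'-separated database names
   admin: Default+textbase
   reader: Default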
@@ -599,7 +669,7 @@
 (see for details of how the mapping between elements of your records
 and searchable attributes is established).

@@ -773,7 +843,6 @@
 register: /d1:500M
-shadow: /scratch1:100M /scratch2:200M

@@ -851,8 +920,342 @@

Static and Dynamic Ranking

Zebra internally uses inverted indexes to look up term occurrences in
documents. Multiple queries from different indexes can be combined by
the binary boolean operations AND, OR and/or NOT (which is in fact a
binary AND NOT operation). To ensure fast query execution speed, all
indexes have to be sorted in the same order.

The indexes are normally sorted according to document ID in ascending
order, and any query which does not invoke a special re-ranking
function will therefore retrieve the result set in document ID order.

If one defines the

   staticrank: 1

directive in the main core Zebra configuration file, the internal
document keys used for ordering are augmented by a preceding integer,
which contains the static rank of a given document, and the index
lists are ordered first by ascending static rank, then by ascending
document ID.

This implies that the default rank 0 is the best rank at the
beginning of the list, and max int is the worst static rank.

The experimental alvis filter provides a directive to fetch static
rank information out of the indexed XML records, thus ordering all
hit sets by ascending static rank and, for those documents which
have the same static rank, by ascending document ID.
See for the gory details.

If one wants to do a little fiddling with the static rank order, one
has to invoke additional re-ranking/re-ordering using dynamic
re-ranking or score functions. These functions return positive
integer scores, where the highest score is best, which means that the
hit sets will be sorted according to descending scores (contrary to
the index lists, which are sorted according to ascending rank number
and document ID).

These functions are enabled in the Zebra configuration file by a
directive like (use only one of them at a time!):

   rank: rank-1        # default
   rank: rank-static   # dummy
   rank: zvrank        # TF-IDF like

Notice that rank-1 and zvrank do not use the static rank information
in the list keys, and will produce the same ordering with or without
static ranking enabled.

The dummy rank-static re-ranking/scoring function returns just

   score = max int - staticrank

in order to preserve the ordering of hit sets with and without its
call. Obviously, to combine static and dynamic ranking usefully, one
wants to write a new ranking function, which is left as an exercise
for the reader.
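To make the ordering described above concrete, here is a small worked
example; the document IDs and static ranks are invented for illustration:

   document ID   staticrank   order without staticrank   order with staticrank
        10           2                  1st                      3rd
        11           0                  2nd                      1st  (rank 0 is best)
        12           1                  3rd                      2nd

With staticrank enabled, the index lists are ordered by ascending static
rank and then by ascending document ID, so document 11 comes first. The
dummy rank-static function then assigns score = max int - staticrank, so
sorting by descending score reproduces exactly the same order.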
Extended Services: Remote Insert, Update and Delete

The extended services are not enabled by default in Zebra - due to
the fact that they modify the system. In order to allow anybody to
update, use

   perm.anonymous: rw

in the main Zebra configuration file zebra.cfg. Or, even better,
allow updates only for a particular admin user. For user admin, you
could use:

   perm.admin: rw
   passwd: passwordfile

And in passwordfile, specify users and passwords as colon-separated
strings:

   admin:secret

We can now start a yaz-client admin session and create a database:

   Z> adm-create

Now that the Default database has been created, we can insert an XML
file (esdd0006.grs from example/gils/records) and index it:

   Z> update insert 1 esdd0006.grs

The third parameter - 1 here - is the opaque record ID from Ext
update. It is the record ID that we assign to the record in
question. If we do not assign one, the usual rules for match apply
(recordId: from zebra.cfg).

Actually, we should have a way to specify "no opaque record id" for
yaz-client's update command. We'll fix that.

The newly inserted record can be searched as usual:

   Z> f utah
   Sent searchRequest.
   Received SearchResponse.
   Search was a success.
   Number of hits: 1, setno 1
   SearchResult-1: term=utah cnt=1
   records returned: 0
   Elapsed: 0.014179

Let's delete the beast:

   Z> update delete 1
   No last record (update ignored)
   Z> update delete 1 esdd0006.grs
   Got extended services response
   Status: done
   Elapsed: 0.072441
   Z> f utah
   Sent searchRequest.
   Received SearchResponse.
   Search was a success.
   Number of hits: 0, setno 2
   SearchResult-1: term=utah cnt=0
   records returned: 0
   Elapsed: 0.013610

If shadow registers are enabled in your zebra.cfg, you must run the
adm-commit command

   Z> adm-commit

after each update session in order to write your changes from the
shadow to the live register space.

Extended services are also available from the YAZ client layer. An
example of a YAZ-PHP extended service transaction is given here:

   $record = '<record>A fine specimen of a record</record>';

   $options = array('action' => 'recordInsert',
                    'syntax' => 'xml',
                    'record' => $record,
                    'databaseName' => 'mydatabase'
                   );

   yaz_es($yaz, 'update', $options);
   yaz_es($yaz, 'commit', array());
   yaz_wait();

   if ($error = yaz_error($yaz))
      echo "$error";

The action parameter can be any of recordInsert (will fail if the
record already exists), recordReplace (will fail if the record does
not exist), recordDelete (will fail if the record does not exist),
and specialUpdate (will insert or update the record as needed).

If a record is inserted using the action recordInsert, one can
specify the optional recordIdOpaque parameter, which is a
client-supplied, opaque record identifier. This identifier will
replace Zebra's own automagic identifier generation.

When using the action recordReplace or recordDelete, one must specify
the additional recordIdNumber parameter, which must be an existing
Zebra internal system ID number. When retrieving existing records,
the ID number is returned in the field /*/id:idzebra/localnumber in
the namespace xmlns:id="http://www.indexdata.dk/zebra/", where it can
be picked up for later record updates or deletes.

YAZ Frontend Virtual Hosts

zebrasrv uses the YAZ server frontend and supports multiple virtual
servers behind multiple listening sockets.

&zebrasrv-virtual;

See Section "Virtual Hosts" in the YAZ manual,
http://www.indexdata.dk/yaz/doc/server.vhosts.tkl
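The details of the virtual host setup live in the YAZ documentation included
above. As a rough sketch only - the host names, port and file names are
assumptions for this example, and the authoritative syntax is the YAZ manual
section just cited - a frontend configuration serving two Zebra databases
from one zebrasrv process might look like this:

   <yazgfs>
     <listen id="public">tcp:@:9999</listen>
     <server id="server1" listenref="public">
       <host>books.example.com</host>
       <config>zebra-books.cfg</config>
     </server>
     <server id="server2" listenref="public">
       <host>articles.example.com</host>
       <config>zebra-articles.cfg</config>
     </server>
   </yazgfs>

Each server element points at its own zebra.cfg, so the two virtual hosts
can index and serve entirely separate registers.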
Server Side CQL to PQF Query Translation

Using the <cql2rpn>l2rpn.txt</cql2rpn> YAZ Frontend Virtual Hosts
option, one can configure the YAZ Frontend CQL-to-PQF converter,
specifying the interpretation of various CQL indexes, relations, etc.
in terms of Type-1 query attributes.

For example, using server-side CQL-to-PQF conversion, one might query
a Zebra server like this:

   Z> querytype cql
   Z> find text=(plant and soil)

and - if properly configured - even static relevance ranking can be
performed using CQL query syntax:

   Z> find text = /relevant (plant and soil)

By the way, the same configuration can be used to search using
client-side CQL-to-PQF conversion (the only difference is querytype
cql2rpn instead of querytype cql, and the call specifying a local
conversion file):

   Z> querytype cql2rpn
   Z> find text=(plant and soil)

Exhaustive information can be found in the section "Specification of
CQL to RPN mappings" in the YAZ manual,
http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map,
and shall therefore not be repeated here.
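The CQL-to-PQF mapping file itself is not shown in this commit. As a rough
sketch only - the index names and attribute values below are assumptions
chosen for illustration, and the authoritative description of the format is
the YAZ manual section cited above - such a properties file might contain
entries along these lines:

   # map CQL indexes to Z39.50 use attributes
   index.cql.serverChoice = 1=text
   index.cql.text         = 1=text

   # relations and relation modifiers
   relation.eq               = 2=3
   relationModifier.relevant = 2=102   # lets "find text = /relevant ..." rank hits

   # defaults for structure and truncation
   structure.*      = 4=1
   truncation.right = 5=1

Each line maps a CQL construct on the left to Type-1 (RPN) attributes on the
right, which is exactly the interpretation step described above.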