X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fadministration.xml;h=7303d30a4dc637a73207c1de70a78440714f34b2;hb=558bf94a5f36eb89b0ca7ac4780b641da852c36b;hp=34f938c44596dd9ef654709f6804f44c7d674b0a;hpb=94ec3a0012667d2c67e4bdba2a8255e863f88925;p=idzebra-moved-to-github.git diff --git a/doc/administration.xml b/doc/administration.xml index 34f938c..7303d30 100644 --- a/doc/administration.xml +++ b/doc/administration.xml @@ -1,5 +1,5 @@ - + Administrating Zebra + - Those are in the zebra config file enabled by a directive like (use - only one of these a time!): - - rank: rank-1 # default - rank: rank-static # dummy - rank: zvrank # TDF-IDF like + Dynamic ranking is done at query time rather than + indexing time (this is why we + call it ``dynamic ranking'' in the first place ...) + It is invoked by adding + the BIB-1 relation attribute with + value ``relevance'' to the PQF query (that is, + @attr 2=102, see also + + The BIB-1 Attribute Set Semantics, also in + HTML). + To find all articles with the word Eoraptor in + the title, and present them relevance ranked, issue the PQF query: + + @attr 2=102 @attr 1=4 Eoraptor - Notice that the rank-1 and - zvrank do not use the static rank - information in the list keys, and will produce the same ordering - with our without static ranking enabled. + + + Dynamic ranking using PQF queries with the 'rank-1' + algorithm + + The default rank-1 ranking module implements a + TF/IDF (Term Frequency over Inverse Document Frequency) like + algorithm. In contrast to the usual definition of TF/IDF + algorithms, which only considers searching in one full-text + index, this one works on multiple indexes at the same time. + More precisely, + Zebra does boolean queries and searches in specifically addressed + indexes (there are inverted indexes pointing from terms in the + dictionary to documents and term positions inside documents).
+ It works like this: + + + Query Components + + + First, the boolean query is dismantled into its principal components, + i.e. atomic queries where one term is looked up in one index. + For example, the query + + @attr 2=102 @and @attr 1=1010 Utah @attr 1=1018 Springer + + is a boolean AND between the atomic parts + + @attr 2=102 @attr 1=1010 Utah + + and + + @attr 2=102 @attr 1=1018 Springer + + each of which is processed independently. + + + + + + Atomic hit lists + + + Second, for each atomic query, the hit list of documents is + computed. + + + In this example, two hit lists, one for each index + @attr 1=1010 and + @attr 1=1018, are computed. + + + + + + Atomic scores + + + Third, each document in the hit list is assigned a score (_if_ ranking + is enabled and requested in the query) using a TF/IDF scheme. + + + In this example, both atomic parts of the query assign the magic + @attr 2=102 relevance attribute, and are + to be used in the relevance ranking functions. + + + It is possible to apply dynamic ranking on only parts of the + PQF query: + + @and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer + + searches for all documents which have the term 'Utah' in the + body of text, and which have the term 'Springer' in the publisher + field, and sorts them in the order of the relevance ranking made on + the body-of-text index only. + + + + + + Hit list merging + + + Fourth, the atomic hit lists are merged according to the boolean + conditions into a final hit list of documents to be returned. + + + This step is always performed, regardless of whether + dynamic ranking is enabled. + + + + + + Document score computation + + + Fifth, the total score of a document is computed as a linear + combination of the atomic scores of the atomic hit lists. + + + Ranking weights may be used to pass a value to a ranking + algorithm, using the non-standard BIB-1 attribute type 9.
+ This allows one branch of a query to use one value while + another branch uses a different one. For example, we can search + for utah in the + @attr 1=4 index with weight 30, as + well as in the @attr 1=1010 index with weight 20: + + @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 @attr 1=1010 utah + + + + The default weight is + 34, close to sqrt(1000) ~ 31.6, as the Z39.50 standard prescribes that the top score + is 1000 and the bottom score is 0, encoded in integers. + + + + The ranking-weight feature is experimental. It may change in future + releases of Zebra. + + + + + + + Re-sorting of hit list + + + Finally, the merged hit list is re-ordered according to scores. + + + + + + + + + + + + + + The rank-1 algorithm + does not use the static rank + information in the list keys, and will produce the same ordering + with or without static ranking enabled. + + + + + + + + + + Dynamic ranking is not compatible + with estimated hit sizes, as all documents in + a hit set must be accessed to compute the correct placing in a + ranking-sorted list. Therefore the relation attribute setting + @attr 2=102 clashes with + @attr 9=integer. + + + + + + + + Dynamic ranking of CQL queries + + Dynamic ranking can be enabled during server-side CQL + query expansion by adding @attr 2=102 + chunks to the CQL config file. For example, + + relationModifier.relevant = 2=102 + + invokes dynamic ranking each time a CQL query of the form + + Z> querytype cql + Z> f alvis.text =/relevant house + + is issued. Dynamic ranking can also be automatically used on + specific CQL indexes by (for example) setting + + index.alvis.text = 1=text 2=102 + + which then invokes dynamic ranking each time a CQL query of the form + + Z> querytype cql + Z> f alvis.text = house + + is issued. + + + + + + + + + Sorting + + Zebra sorts efficiently using special sorting indexes + (type=s), so each sortable index must be known + at indexing time, specified in the configuration of record + indexing.
+ For example, to enable sorting according to the BIB-1 + Date/time-added-to-db field, one could add the line + + xelm /*/@created Date/time-added-to-db:s + + to any .abs record-indexing configuration file. + Similarly, one could add an indexing element of the form + + + + ]]> + to any alvis-filter indexing stylesheet. + + + Sorting can be specified at searching time using a query term + carrying the non-standard + BIB-1 attribute-type 7. This removes the + need to send a Z39.50 Sort Request + separately, and can dramatically improve latency when the client + and server are on separate networks. + The sorting part of the query is separate from the rest of the + query (the actual search specification) and must be combined + with it using OR. + + + A sorting subquery needs two attributes: an index (such as a + BIB-1 type-1 attribute) specifying which index to sort on, and a + type-7 attribute whose value is 1 for + ascending sorting, or 2 for descending. The + term associated with the sorting attribute is the priority of + the sort key, where 0 specifies the primary + sort key, 1 the secondary sort key, and so + on. + + For example, a search for water, sorted by title (ascending), + is expressed by the PQF query + + @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 + + whereas a search for water, sorted by title ascending, + then date descending, would be + + @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1 + + + + Notice the fundamental differences between dynamic + ranking and sorting: there can be + only one ranking function defined and configured, but multiple + sorting indexes can be specified dynamically at search + time. Ranking does not need to use specific indexes, so + dynamic ranking can be enabled and disabled without + re-indexing, whereas sorting indexes need to be + defined before indexing. + + + + + @@ -1120,10 +1530,71 @@ after each update session in order write your changes from the shadow to the life register space.
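The sorting-subquery rules from the Sorting section follow a fixed pattern, so the query strings can be generated mechanically:

- the search is combined with each sort key using @or;
- the type-7 value 1 or 2 selects ascending or descending order;
- the term gives the key's priority, starting at 0.

The Python sketch below is a hypothetical helper and is not part of Zebra or YAZ. The function name `pqf_sort_query` and its list of (use-attribute value, ascending) pairs are assumptions made for illustration. It reproduces the two water queries shown in that section.

```python
from typing import List, Tuple


def pqf_sort_query(search: str, sort_keys: List[Tuple[str, bool]]) -> str:
    """Hypothetical helper: combine a PQF search with sort subqueries.

    sort_keys lists (use_attribute_value, ascending) pairs in priority
    order; the first pair becomes the primary sort key (term 0), the
    second the secondary sort key (term 1), and so on.
    """
    query = search
    for priority, (use_attr, ascending) in enumerate(sort_keys):
        # Type-7 attribute value: 1 = ascending, 2 = descending.
        direction = 1 if ascending else 2
        sort_term = f"@attr 7={direction} @attr 1={use_attr} {priority}"
        # PQF is prefix notation: wrap the query built so far and the
        # new sort subquery in one more @or.
        query = f"@or {query} {sort_term}"
    return query


if __name__ == "__main__":
    # Search for water, sorted by title (use attribute 4) ascending.
    print(pqf_sort_query("@attr 1=1016 water", [("4", True)]))
    # The same search, then sorted by date (use attribute 30) descending.
    print(pqf_sort_query("@attr 1=1016 water", [("4", True), ("30", False)]))
```

Wrapping each additional sort key in another @or around the previous query yields the left-nested `@or @or ...` form used in the documentation. An empty `sort_keys` list returns the search unchanged.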
+ Extended services are also available from the YAZ client layer. An + example of a YAZ-PHP extended service transaction is given here: + + A fine specimen of a record'; - + $options = array('action' => 'recordInsert', + 'syntax' => 'xml', + 'record' => $record, + 'databaseName' => 'mydatabase' + ); + + yaz_es($yaz, 'update', $options); + yaz_es($yaz, 'commit', array()); + yaz_wait(); + + if ($error = yaz_error($yaz)) + echo "$error"; + ]]> + + The action parameter can be any of + recordInsert (will fail if the record already exists), + recordReplace (will fail if the record does not exist), + recordDelete (will fail if the record does not + exist), and + specialUpdate (will insert or update the record + as needed). + + + If a record is inserted + using the action recordInsert, + one can specify the optional + recordIdOpaque parameter, which is a + client-supplied, opaque record identifier. This identifier will + replace Zebra's own automatic identifier generation. + + + When using the action recordReplace or + recordDelete, one must specify the additional + recordIdNumber parameter, which must be an + existing Zebra internal system ID number. When retrieving existing + records, the ID number is returned in the field + /*/id:idzebra/localnumber in the namespace + xmlns:id="http://www.indexdata.dk/zebra/", + where it can be picked up for later record updates or deletes. + + + + YAZ Frontend Virtual Hosts + + zebrasrv uses the YAZ server frontend and + supports multiple virtual servers behind multiple listening sockets. + + &zebrasrv-virtual; + + + Section "Virtual Hosts" in the YAZ manual. + http://www.indexdata.dk/yaz/doc/server.vhosts.tkl + + + +
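The success and failure rules listed above for the extended-service action parameter can be summarized as a small state model. The Python class below is a hypothetical in-memory sketch. It is not Zebra's implementation and not the YAZ-PHP API.

- `RecordStoreModel`, `apply`, `ActionRefused` and the `key` argument are all invented for illustration.
- `key` stands in for however the server identifies an existing record. For recordReplace and recordDelete, the documentation says this is the recordIdNumber.

```python
from typing import Dict


class ActionRefused(Exception):
    """Raised when an action's documented precondition is not met (hypothetical)."""


class RecordStoreModel:
    """Hypothetical model of the documented update actions.

    Only the success/failure rules are modelled; records are plain
    strings keyed by an illustrative identifier.
    """

    def __init__(self) -> None:
        self.records: Dict[str, str] = {}

    def apply(self, action: str, key: str, record: str = "") -> None:
        exists = key in self.records
        if action == "recordInsert":
            # Fails if the record already exists.
            if exists:
                raise ActionRefused(f"recordInsert: {key} already exists")
            self.records[key] = record
        elif action == "recordReplace":
            # Fails if the record does not exist.
            if not exists:
                raise ActionRefused(f"recordReplace: {key} does not exist")
            self.records[key] = record
        elif action == "recordDelete":
            # Fails if the record does not exist.
            if not exists:
                raise ActionRefused(f"recordDelete: {key} does not exist")
            del self.records[key]
        elif action == "specialUpdate":
            # Inserts or updates as needed, so existence never causes a failure.
            self.records[key] = record
        else:
            raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    store = RecordStoreModel()
    store.apply("recordInsert", "42", "<record>v1</record>")
    store.apply("specialUpdate", "42", "<record>v2</record>")  # updates in place
    store.apply("recordDelete", "42")
    try:
        store.apply("recordReplace", "42", "<record>v3</record>")
    except ActionRefused as err:
        print("refused:", err)
```

The model makes one design point explicit: specialUpdate is the only action whose outcome does not depend on whether the record already exists. That makes it the natural choice when a client does not track record state.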