X-Git-Url: http://git.indexdata.com/?p=idzebra-moved-to-github.git;a=blobdiff_plain;f=doc%2Fadministration.xml;h=762ba7db9f2dea232dfc8d6f2106713b6272652c;hp=d47fc776959c255b541814ebe8f64cddac52de99;hb=99842ec71f065fd6886daa355923b01d9ce71d26;hpb=b19b79e382ef8196f1625763db1af3a82b1e0c81 diff --git a/doc/administration.xml b/doc/administration.xml index d47fc77..762ba7d 100644 --- a/doc/administration.xml +++ b/doc/administration.xml @@ -1,5 +1,4 @@ - Administrating &zebra; @@ -300,6 +299,32 @@ + index: filename + + + Defines the filename which holds fields structure + definitions. If omitted, the file default.idx + is read. + Refer to for + more information. + + + + + + sortmax: integer + + + Specifies the maximum number of records that will be sorted + in a result set. If the result set contains more than + integer records, records after the + limit will not be sorted. If omitted, the default value is + 1,000. + + + + + staticrank: integer @@ -313,13 +338,13 @@ - estimatehits:: integer + estimatehits: integer - Controls whether &zebra; should calculate approximite hit counts and + Controls whether &zebra; should calculate approximate hit counts and at which hit count it is to be enabled. - A value of 0 disables approximiate hit counts. - For a positive value approximaite hit count is enabled + A value of 0 disables approximate hit counts. + For a positive value approximate hit count is enabled if it is known to be larger than integer. @@ -413,12 +438,12 @@ permstring - Specifies permissions (priviledge) for a user that are allowed + Specifies permissions (privilege) for a user that are allowed to access &zebra; via the passwd system. There are two kinds of permissions currently: read (r) and write(w). By default users not listed in a permission directive are given the read privilege. To specify permissions for a user with no - username, or &z3950; anonymous style use + username, or &acro.z3950; anonymous style use anonymous. 
The permstring consists of a sequence of characters. Include character w for write/update access, r for read access and @@ -428,13 +453,53 @@ - dbaccess accessfile + dbaccess: accessfile Names a file which lists database subscriptions for individual users. - The access file should consists of lines of the form username: - dbnames, where dbnames is a list of database names, seprated by - '+'. No whitespace is allowed in the database list. + The access file should consist of lines of the form + username: dbnames, where dbnames is a list of + database names, separated by '+'. No whitespace is allowed in the + database list. + + + + + + encoding: charsetname + + + Tells &zebra; to interpret the terms in Z39.50 queries as + having been encoded using the specified character + encoding. The default is ISO-8859-1; one + useful alternative is UTF-8. + + + + + + storeKeys: value + + + Specifies whether &zebra; keeps a copy of indexed keys. + Use a value of 1 to enable; 0 to disable. If the storeKeys setting is + omitted, it is enabled. An enabled storeKeys setting + is required for updating and deleting records. Disable + storeKeys only to save space, and only if you plan to index data once. + + + + + + storeData: value + + + Specifies whether &zebra; keeps a copy of indexed records. + Use a value of 1 to enable; 0 to disable. If the storeData setting is + omitted, it is enabled. A storeData setting of 0 (disabled) makes + Zebra fetch records from the original location in the file + system using filename, file offset and file length. For the + DOM and ALVIS filters, the storeData setting is ignored. @@ -465,7 +530,7 @@ mounted on a CD-ROM drive, you may want &zebra; to make an internal copy of them. To do this, you specify 1 (true) in the storeData setting. When - the &z3950; server retrieves the records they will be read from the + the &acro.z3950; server retrieves the records they will be read from the internal file structures of the system.
@@ -494,7 +559,7 @@ Consider a system in which you have a group of text files called simple. - That group of records should belong to a &z3950; database called + That group of records should belong to a &acro.z3950; database called textbase. The following zebra.cfg file will suffice: @@ -613,7 +678,7 @@ information. If you have a group of records that explicitly associates an ID with each record, this method is convenient. For example, the record format may contain a title or a ID-number - unique within the group. - In either case you specify the &z3950; attribute set and use-attribute + In either case you specify the &acro.z3950; attribute set and use-attribute location in which this information is stored, and the system looks at that field to determine the identity of the record. @@ -700,7 +765,7 @@ For instance, the sample GILS records that come with the &zebra; distribution contain a unique ID in the data tagged Control-Identifier. - The data is mapped to the &bib1; use attribute Identifier-standard + The data is mapped to the &acro.bib1; use attribute Identifier-standard (code 1007). To use this field as a record id, specify (bib1,Identifier-standard) as the value of the recordId in the configuration file. @@ -761,9 +826,7 @@ The value of the register setting is a sequence of tokens. Each token takes the form: - - dir:size. - + dir:size The dir specifies a directory in which index files will be stored and the size specifies the maximum @@ -776,19 +839,21 @@ k for kilobytes. M for megabytes, G for gigabytes. + Specifying a negative value disables the checking (it still needs the unit, + use -1b). 
- For instance, if you have allocated two disks for your register, and + For instance, if you have allocated three disks for your register, and the first disk is mounted - on /d1 and has 2GB of free space and the - second, mounted on /d2 has 3.6 GB, you could - put this entry in your configuration file: + on /d1 and has 2GB of free space, the + second, mounted on /d2, has 3.6 GB, and the third, + on which you have more space than you need to worry about, is mounted on + /d3, you could put this entry in your configuration file: - register: /d1:2G /d2:3600M + register: /d1:2G /d2:3600M /d3:-1b - @@ -1015,7 +1080,7 @@ Static Ranking - &zebra; uses internally inverted indexes to look up term occurencies + &zebra; internally uses inverted indexes to look up term frequencies in documents. Multiple queries from different indexes can be combined by the binary boolean operations AND, OR and/or NOT (which @@ -1049,7 +1114,7 @@ The experimental alvis filter provides a - directive to fetch static rank information out of the indexed &xml; + directive to fetch static rank information out of the indexed &acro.xml; records, thus making all hit sets ordered after ascending static rank, and for those doc's which have the same static rank, ordered @@ -1086,27 +1151,27 @@ indexing time (this is why we call it ``dynamic ranking'' in the first place ...) It is invoked by adding - the &bib1; relation attribute with - value ``relevance'' to the &pqf; query (that is, + the &acro.bib1; relation attribute with + value ``relevance'' to the &acro.pqf; query (that is, @attr 2=102, see also - The &bib1; Attribute Set Semantics, also in + The &acro.bib1; Attribute Set Semantics, also in HTML).
To find all articles with the word Eoraptor in - the title, and present them relevance ranked, issue the &pqf; query: + the title, and present them relevance ranked, issue the &acro.pqf; query: @attr 2=102 @attr 1=4 Eoraptor - Dynamically ranking using &pqf; queries with the 'rank-1' + Dynamically ranking using &acro.pqf; queries with the 'rank-1' algorithm The default rank-1 ranking module implements a TF/IDF (Term Frequecy over Inverse Document Frequency) like - algorithm. In contrast to the usual defintion of TF/IDF + algorithm. In contrast to the usual definition of TF/IDF algorithms, which only considers searching in one full-text index, this one works on multiple indexes at the same time. More precisely, @@ -1119,7 +1184,7 @@ Query Components - First, the boolean query is dismantled into it's principal components, + First, the boolean query is dismantled into its principal components, i.e. atomic queries where one term is looked up in one index. For example, the query @@ -1167,7 +1232,7 @@ It is possible to apply dynamic ranking on only parts of the - &pqf; query: + &acro.pqf; query: @and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer @@ -1202,7 +1267,7 @@ Ranking weights may be used to pass a value to a ranking - algorithm, using the non-standard &bib1; attribute type 9. + algorithm, using the non-standard &acro.bib1; attribute type 9. This allows one branch of a query to use one value while another branch uses a different one. For example, we can search for utah in the @@ -1214,7 +1279,7 @@ The default weight is - sqrt(1000) ~ 34 , as the &z3950; standard prescribes that the top score + sqrt(1000) ~ 34 , as the &acro.z3950; standard prescribes that the top score is 1000 and the bottom score is 0, encoded in integers.
@@ -1339,7 +1404,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci @@ -1555,10 +1629,10 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci - Extended services in the &z3950; protocol + Extended services in the &acro.z3950; protocol - The &z3950; standard allows + The &acro.z3950; standard allows servers to accept special binary extended services protocol packages, which may be used to insert, update and delete records into servers. These carry control and update @@ -1566,7 +1640,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci - Extended services &z3950; Package Fields + Extended services &acro.z3950; Package Fields @@ -1594,19 +1668,21 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci record - &xml; string - An &xml; formatted string containing the record - - - syntax - 'xml' - Only &xml; record syntax is supported + &acro.xml; string + An &acro.xml; formatted string containing the record + + syntax + 'xml' + XML/SUTRS/MARC. GRS-1 not supported. + The default filter (record type) as given by recordType in + zebra.cfg is used to parse the record. + recordIdOpaque string - Optional client-supplied, opaque record + Optional client-supplied, opaque record identifier used under insert operations. 
@@ -1663,7 +1739,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci When retrieving existing - records indexed with &grs1; indexing filters, the &zebra; internal + records indexed with &acro.grs1; indexing filters, the &zebra; internal ID number is returned in the field /*/id:idzebra/localnumber in the namespace xmlns:id="http://www.indexdata.dk/zebra/", @@ -1712,7 +1788,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci ]]> Now the Default database was created, - we can insert an &xml; file (esdd0006.grs + we can insert an &acro.xml; file (esdd0006.grs from example/gils/records) and index it: rset) is the count of all documents in this speci Extended services from yaz-php - Extended services are also available from the &yaz; &php; client layer. An - example of an &yaz;-&php; extended service transaction is given here: + Extended services are also available from the &yaz; &acro.php; client layer. An + example of an &yaz;-&acro.php; extended service transaction is given here: A fine specimen of a record'; @@ -1804,6 +1880,76 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci + + + Extended services debugging guide + + When debugging ES over PHP we recommend the following order of tests: + + + + + + Make sure you have a valid record on your filesystem, which you can + index from the filesystem using the zebraidx command. + Do it exactly as you planned, using one of the GRS-1 filters, + or the DOMXML filter. + When this works, proceed. + + + + + Check that your server setup is OK before you write a single + line of PHP using ES. + Take the same record from the file system, and send it as ES via + yaz-client as described in + , + and + remember the -a option which tells you what + goes over the wire!
Notice also the section on permissions: + try + + perm.anonymous: rw + + in zebra.cfg to make sure you do not run into + permission problems (but never expose such an insecure setup on the + internet!). Then, make sure to set the general + recordType instruction, pointing + to the correct GRS-1 filters, + or the DOMXML filters. + + + + + If you insist on using the sysno in the + recordIdNumber setting, + please make sure you only do updates and deletes. Zebra's internal + system number is not allowed for + recordInsert or + specialUpdate actions + which result in fresh record inserts. + + + + + If a shadow register is enabled in your + zebra.cfg, you must remember to run the + + Z> adm-commit + + command as well. + + + + + If this works, then proceed to do the same thing in your PHP script. + + + + + + +