X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fzebra.sgml;h=7b83012b5d3ec0793793bae434dd4fc24bd6935e;hb=067b55382bc9916b3f7dcd473512c703d4de4a5d;hp=2929114da05935829514a466c1949e9f6b9fb101;hpb=1cdd84e7d045f28abfbca5a76712f1c9b8475809;p=idzebra-moved-to-github.git diff --git a/doc/zebra.sgml b/doc/zebra.sgml index 2929114..7b83012 100644 --- a/doc/zebra.sgml +++ b/doc/zebra.sgml @@ -1,13 +1,13 @@
Zebra Server - Administrators's Guide and Reference <author><htmlurl url="http://www.indexdata.dk/" name="Index Data">, <tt><htmlurl url="mailto:info@index.ping.dk" name="info@index.ping.dk"></> -<date>$Revision: 1.28 $ +<date>$Revision: 1.35 $ <abstract> The Zebra information server combines a versatile fielded/free-text search engine with a Z39.50-1995 frontend to provide a powerful and flexible @@ -159,9 +159,6 @@ data elements in records. *Port the system to Windows NT. <item> -Add index and data compression to save disk space. - -<item> Add more sophisticated relevance ranking mechanisms. Add support for soundex and stemming. Add relevance <it/feedback/ support. @@ -197,7 +194,7 @@ provide an adequate compiler. Unpack the distribution archive. In some cases, you may want to edit the top-level <tt/Makefile/, eg. to select a different C compiler, or -to specify machine-specific libraries in the <bf/NETLIB/ variable. +to specify machine-specific libraries in the <bf/ELIBS/ variable. When you are done editing the <tt>Makefile</tt> type: <tscreen><verb> @@ -415,9 +412,12 @@ section <ref id="locating-records" name="Locating Records">. Enables the <it/safe update/ facility of Zebra, and tells the system where to place the required, temporary files. See section <ref id="shadow-registers" name="Safe Updating - Using Shadow Registers">. -<tag>lockPath</tag> +<tag>lockDir</tag> Directory in which various lock files are stored. -<tag>tempSetPath</tag> +<tag>keyTmpDir</tag> + Directory in which temporary files used during zebraidx' update + phase are stored. +<tag>setTmpDir</tag> Specifies the directory that the server uses for temporary result sets. If not specified <tt>/tmp</tt> will be used. <tag>profilePath</tag> @@ -430,8 +430,10 @@ section <ref id="locating-records" name="Locating Records">. <tag>charMap</tag> Specifies the filename of a character mapping. Zebra uses the path, <tt>profilePath</tt>, to locate this file. 
+<tag>memMax</tag>
+ Specifies size of internal memory to use for the zebraidx program. The
+ amount is given in megabytes - default is 4 (4 MB).
 </descrip>
-
 <sect1>Locating Records<label id="locating-records">
 <p>
 The default behaviour of the Zebra system is to reference the
@@ -944,10 +946,11 @@ the client. The maximum PDU size is negotiated down to a maximum of
 
 <sect2>Search
 <p>
-The supported query type are 1 and 101 All operators except PROXIMITY
-are currently supported. Queries can be arbitrarily complex. Named
-result sets are supported, and result sets can be used as operands
-with no limitations. Searches may span multiple databases.
+The supported query types are 1 and 101. All operators are currently
+supported except that only proximity units of type "word" are supported
+for the proximity operator. Queries can be arbitrarily complex. Named
+result sets are supported, and result sets can be used as operands with
+no limitations. Searches may span multiple databases.
 
 The server has full support for piggy-backed present requests (see
 also the following section).
@@ -986,8 +989,127 @@ search. As a default, a single error (deletion, insertion,
 replacement) is accepted when terms are matched against the register
 contents.
 
-<sect2>Present
+Zebra interprets queries in one of the following ways:
+<descrip>
+<tag>1 Phrase search</tag>
+ Each token separated by white space is truncated according to the
+ value of the truncation attribute. If the completeness attribute
+ is <bf/complete subfield/ the search is directed to the phrase
+ register. For other completeness attribute values the term is split
+ into tokens according to the white-space specification in the
+ character map. Only records in which each token exists in the order
+ specified are matched.
+<tag>2 Word search</tag>
+ The token is truncated according to the value of the truncation attribute.
+ The completeness attribute is ignored.
+<tag>3 Ranked search</tag>
+ Each token separated by white space is truncated according to the value
+ of the truncation attribute. The completeness attribute is ignored.
+<tag>4 Numeric relation</tag>
+ The token should consist of decimal digits. The integer is matched
+ against integers in the register according to the relation attribute.
+ The truncation and completeness attributes are ignored.
+<tag>5 Document identifier</tag>
+ The token consists of exactly one document identifier. The
+ truncation and completeness attributes are ignored.
+</descrip>
+
+For ranked searches the result sets are ranked and a score
+is associated with each record. All other result sets from the
+remaining four types are non-ranked.
+
+Combinations of the structure attribute and the relation attribute
+determine how the query is interpreted. The two following tables
+define how.
+
+<verb>
+                            Structure Attribute (4)
+                    none   phrase(1)  word(2)  word list(6)
+
+          none       1         1         2           3
+          = (3)      1         1         2           3
+          < (1)      4         4         4           4
+Relation  <= (2)     4         4         4           4
+Attribute >= (4)     4         4         4           4
+   (2)    > (5)      4         4         4           4
+          <> (6)     -         -         -           -
+          rel (102)  3         3         3           3
+          other      1         1         2           3
+
+</verb>
+
+<verb>
+                            Structure Attribute (4)
+                     free-form-  document-  local-   string
+                     text        text       number
+                     (105)       (106)      (107)    (108)
+          none         3           3          5        1
+          = (3)        3           3          5        1
+          < (1)        4           4          5        4
+Relation  <= (2)       4           4          5        4
+Attribute >= (4)       4           4          5        4
+   (2)    > (5)        4           4          5        4
+          <> (6)       -           -          5        -
+          rel (102)    3           3          5        3
+          other        3           3          5        1
+
+</verb>
+<sect3>Regular expressions
+<p>
+
+Each term in a query is interpreted as a regular expression if
+the truncation value is either <bf/Regxp-1/ (102) or <bf/Regxp-2/ (103).
+Both query types follow the same syntax with the operands:
+<descrip>
+<tag/x/ Matches the character <it/x/.
+<tag/./ Matches any character.
+<tag><tt/[/..<tt/]/</tag> Matches the set of characters specified;
+ such as <tt/[abc]/ or <tt/[a-c]/.
+</descrip>
+and the operators:
+<descrip>
+<tag/x*/ Matches <it/x/ zero or more times. Priority: high.
+<tag/x+/ Matches <it/x/ one or more times. Priority: high.
+<tag/x?/ Matches <it/x/ zero or one time. Priority: high.
+<tag/xy/ Matches <it/x/, then <it/y/. Priority: medium.
+<tag/x|y/ Matches either <it/x/ or <it/y/. Priority: low.
+</descrip>
+The order of evaluation may be changed by using parentheses.
+
+If the first character of the <bf/Regxp-2/ query is a plus character
+(<tt/+/) it marks the beginning of a section with non-standard
+specifiers. The next plus character marks the end of the section.
+Currently Zebra only supports one specifier, the error tolerance,
+which consists of one digit.
+
+Since the plus operator is normally a suffix operator, the addition to
+the query syntax doesn't violate the syntax for standard regular
+expressions.
+
+<sect3>Query examples
+<p>
+Phrase search for <bf/information retrieval/ in the title-register:
+<verb>
+ @attr 1=4 "information retrieval"
+</verb>
+
+Ranked search for the same thing:
+<verb>
+ @attr 1=4 @attr 2=102 "Information retrieval"
+</verb>
+
+Phrase search with a regular expression:
+<verb>
+ @attr 1=4 @attr 5=102 "informat.* retrieval"
+</verb>
+
+Ranked search with a regular expression:
+<verb>
+ @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
+</verb>
+
+<sect2>Present
 <p>
 The present facility is supported in a standard fashion. The requested
 record syntax is matched against the ones supported by the profile of
@@ -1541,16 +1663,28 @@ given element set name with an element selection file. If an (@) is given
 in place of the filename, this corresponds to a null mapping for the
 given element set name.
 
-<tag>elm <it/path name attribute/</tag> (o,r) Adds an element
+<tag>any <it/tags/</tag> (o) This directive specifies a list of
+attributes which should be appended to the attribute list given for each
+element. The effect is to make every single element in the abstract
+syntax searchable by way of the given attributes.
This directive +provides an efficient way of supporting free-text searching across all +elements. However, it does increase the size of the index +significantly. The attributes can be qualified with a structure, as in +the <bf/elm/ directive below. + +<tag>elm <it/path name attributes/</tag> (o,r) Adds an element to the abstract record syntax of the schema. The <it/path/ follows the syntax which is suggested by the Z39.50 document - that is, a sequence of tags separated by slashes (/). Each tag is given as a comma-separated pair of tag type and -value surrounded by parenthesis. -The <it/name/ is the name of the element, and the <it/attribute/ -specifies what attribute to use when indexing the element. A ! in +The <it/name/ is the name of the element, and the <it/attributes/ +specifies which attributes to use when indexing the element in a +comma-separated list. A ! in place of the attribute name is equivalent to specifying an attribute name identical to the element name. A - in place of the attribute name -specifies that no indexing is to take place for the given element. +specifies that no indexing is to take place for the given element. The +attributes can be qualified with a &dquot;p&dquot; or &dquot;w&dquot; +to specify either word or phrase (complete field) indexing. </descrip> <it>