X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fpazpar2_conf.xml;h=9e75b4810cbd5c3163ac1bd5549857efb21ee3ce;hb=248d8951ec9cd5a7fc786a5b621a51a8e9a852af;hp=4df6094fc398d5398b2d21132ce1e9a0361bab24;hpb=8f811d796c8d84b1bfb932bbbd8580a70a0c1b4d;p=pazpar2-moved-to-github.git diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml index 4df6094..9e75b48 100644 --- a/doc/pazpar2_conf.xml +++ b/doc/pazpar2_conf.xml @@ -1,6 +1,6 @@ - %local; @@ -13,10 +13,13 @@ Pazpar2 &version; + Index Data + Pazpar2 conf 5 + File formats and conventions @@ -48,18 +51,33 @@ FORMAT - The configuration file is XML-structured. It must be valid XML. All + The configuration file is XML-structured. It must be well-formed XML. All elements specific to Pazpar2 should belong to the namespace http://www.indexdata.com/pazpar2/1.0 (this is assumed in the - following examples). The root element is named pazpar2. + following examples). The root element is named "pazpar2". Under the root element are a number of elements which group categories of information. The categories are described below. + threads + + This section is optional and is supported for Pazpar2 version 1.3.1 and + later. It is identified by the element "threads", which + may include one attribute, "number", which specifies + the number of worker threads that the Pazpar2 instance is to use. + A value of 0 (zero) disables worker threads (all work is carried out + in the main thread). + + server - This section governs overall behavior of the server. The data + This section governs the overall behavior of a server endpoint. It is identified + by the element "server", which takes an optional attribute, "id", which + identifies this particular Pazpar2 server. Any string value for "id" + may be given. + + The data elements are described below. From Pazpar2 version 1.2, this is a repeatable element.
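A minimal configuration skeleton showing the threads and server sections together might look like the following. It is a sketch, not a complete configuration: the listen element and its port attribute are assumed from the wider Pazpar2 documentation (they are not described in this excerpt), and the thread count, the port number and the id value "main" are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
  <!-- Optional; Pazpar2 1.3.1 or later. number="0" would carry out
       all work in the main thread. The value 4 is only an example. -->
  <threads number="4"/>
  <!-- One server endpoint; the optional id may be any string. -->
  <server id="main">
    <!-- Assumed listen element; port 9004 is a placeholder. -->
    <listen port="9004"/>
    <!-- service elements go here -->
  </server>
</pazpar2>
```

Because server is repeatable from Pazpar2 version 1.2, a second endpoint would be written as a sibling server element with its own id.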
@@ -101,80 +119,30 @@ - - - relevance - - - Specifies ICU tokenization and transformation rules - for tokens that are used in Pazpar2's relevance ranking. The 'id' - attribute is currently not used, and the 'locale' - attribute must be set to one of the locale strings - defined in ICU. The child elements listed below can be - in any order, except the 'index' element which logically - belongs to the end of the list. The stated tokenization, - transformation and charmapping instructions are performed - in order from top to bottom. - - - casemap - - - The attribute 'rule' defines the direction of the - per-character casemapping, allowed values are "l" - (lower), "u" (upper), "t" (title). - - - - transform - - - Normalization and transformation of tokens follows - the rules defined in the 'rule' attribute. For - possible values we refer to the extensive ICU - documentation found at the - ICU - transformation home page. Set filtering - principles are explained at the - ICU set and - filtering page. - - - - tokenize - - - Tokenization is the only rule in the ICU chain - which splits one token into multiple tokens. The - 'rule' attribute may have the following values: - "s" (sentence), "l" (line-break), "w" (word), and - "c" (character), the later probably not being - very useful in a pruning Pazpar2 installation. - - - - - - - sort + relevance / sort / mergekey / facet - Specifies ICU tokenization and transformation rules - for tokens that are used in Pazpar2's sorting. The contents - is similar to that of relevance. + Specifies character set normalization for relevancy / sorting / + mergekey and facets for the server. These definitions serve as + defaults for services that don't define their own. For the meaning + of these settings, refer to the "relevance" element inside service. - mergekey + settings - Specifies ICU tokenization and transformation rules - for tokens that are used in Pazpar2's mergekey. The contents - is similar to that of relevance.
+ Specifies target settings for the server. These settings serve + as defaults for all services that don't define their own. + The settings element requires one attribute, 'src', which specifies + a settings file or a directory. If a directory is given, all + files with suffix .xml are read from this + directory. Refer to + for more information. @@ -195,7 +163,8 @@ Multiple services must be given a unique ID by specifying the attribute id. A single service may be unnamed (service ID omitted). The - service ID is referred to in the init webservice + service ID is referred to in the + init webservice command's service parameter. @@ -206,9 +175,9 @@ One of these elements is required for every data element in the internal representation of the record (see . It governs - subsequent processing as pertains to sorting, relevance - ranking, merging, and display of data elements. It supports - the following attributes: + subsequent processing as pertains to sorting, relevance + ranking, merging, and display of data elements. It supports + the following attributes: @@ -306,7 +275,7 @@ longest element (strlen), 'range' (calculate a range of values across all matching records), 'all' (include all elements), or 'no' (don't merge; this is the - default); + default); @@ -314,86 +283,242 @@ mergekey - If set to yes, the value of this - metadata element is appended to the resulting mergekey. - By default metadata is not part of a mergekey. + If set to 'required', the value of this + metadata element is appended to the resulting mergekey if + the metadata is present in a record instance. + If the metadata element is not present, a unique mergekey + will be generated instead. + + + If set to 'optional', the value of this + metadata element is appended to the resulting mergekey if + the metadata is present in a record instance. If the metadata + is not present, it will be empty.
+ + + If set to 'no' or the mergekey attribute is + omitted, the metadata will not be used in the creation of a + mergekey. - setting - This attribute allows you to make use of static database - settings in the processing of records. Three possible values - are allowed. 'no' is the default and doesn't do anything. - 'postproc' copies the value of a setting with the same name - into the output of the normalization stylesheet(s). 'parameter' - makes the value of a setting with the same name available - as a parameter to the normalization stylesheet, so you - can further process the value inside of the stylesheet, or use - the value to decide how to deal with other data values. + This attribute allows you to make use of static database + settings in the processing of records. Three possible values + are allowed. 'no' is the default and doesn't do anything. + 'postproc' copies the value of a setting with the same name + into the output of the normalization stylesheet(s). 'parameter' + makes the value of a setting with the same name available + as a parameter to the normalization stylesheet, so you + can further process the value inside the stylesheet, or use + the value to decide how to deal with other data values. + The purpose of using settings in this way can either be to + control the behavior of the normalization stylesheet in a database- + dependent way, or to easily make database-dependent values + available to display-logic in your user interface, without having + to implement complicated interactions between the user interface + and your configuration system. - The purpose of using settings in this way can either be to - control the behavior of normalization stylesheet in a database- - dependent way, or to easily make database-dependent values - available to display-logic in your user interface, without having - to implement complicated interactions between the user interface - and your configuration system.
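As an illustration of the merge, mergekey and setting attributes, a hypothetical set of metadata declarations inside a service element could read as follows. The element names title, author and date, and the custom setting name library, are placeholders for this sketch; only the attribute values are taken from the descriptions above.

```xml
<service>
  <!-- Always part of the mergekey; a record without a title gets a
       unique mergekey, so it is not merged with other records. -->
  <metadata name="title" merge="longest" mergekey="required"/>
  <!-- Part of the mergekey when present; empty otherwise. -->
  <metadata name="author" merge="longest" mergekey="optional"/>
  <!-- Not used in the mergekey (same as omitting the attribute);
       values from merged records are combined into a range. -->
  <metadata name="date" merge="range" mergekey="no"/>
  <!-- Copies the value of a (hypothetical) database setting named
       "library" into the output of the normalization stylesheet(s). -->
  <metadata name="library" setting="postproc"/>
</service>
```

Using setting="parameter" instead of "postproc" would pass the same value to the normalization stylesheet as a parameter, rather than copying it into the stylesheet's output.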
+ + + + relevance + + + Specifies ICU tokenization and transformation rules + for tokens that are used in Pazpar2's relevance ranking. + The 'id' attribute is currently not used, and the 'locale' + attribute must be set to one of the locale strings + defined in ICU. The child elements listed below can be + in any order, except the 'index' element which logically + belongs to the end of the list. The stated tokenization, + transformation and charmapping instructions are performed + in order from top to bottom. + + + casemap + + + The attribute 'rule' defines the direction of the + per-character casemapping; allowed values are "l" + (lower), "u" (upper), "t" (title). + + + + transform + + + Normalization and transformation of tokens follow + the rules defined in the 'rule' attribute. For + possible values we refer to the extensive ICU + documentation found at the + ICU + transformation home page. Set filtering + principles are explained at the + ICU set and + filtering page. + + + + tokenize + + + Tokenization is the only rule in the ICU chain + which splits one token into multiple tokens. The + 'rule' attribute may have the following values: + "s" (sentence), "l" (line-break), "w" (word), and + "c" (character), the latter probably not being + very useful in a pruning Pazpar2 installation. + + + + + + From Pazpar2 version 1.1, the ICU wrapper from YAZ is used. + Refer to the yaz-icu + utility for more information. + + + + + + sort + + + Specifies ICU tokenization and transformation rules + for tokens that are used in Pazpar2's sorting. The content + is similar to that of relevance. + + + + + + mergekey + + + Specifies ICU tokenization and transformation rules + for tokens that are used in Pazpar2's mergekey. The content + is similar to that of relevance. + + + + + + facet + + + Specifies ICU tokenization and transformation rules + for tokens that are used in Pazpar2's facets. The content + is similar to that of relevance.
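A sketch of a relevance chain follows. It assumes the rules are wrapped in the YAZ icu_chain element (see the yaz-icu utility referred to above) and that an English locale is wanted. The transform rules are illustrative ICU transliterator expressions, not recommended values. Read from top to bottom, the chain strips control characters, splits the text into words, removes whitespace and punctuation, then lowercases each token.

```xml
<relevance>
  <!-- Assumed YAZ icu_chain wrapper; 'id' is currently not used -->
  <icu_chain id="relevance" locale="en">
    <!-- Remove control characters before tokenizing -->
    <transform rule="[:Control:] Any-Remove"/>
    <!-- Split the text into word tokens -->
    <tokenize rule="w"/>
    <!-- Drop whitespace and punctuation left in the tokens -->
    <transform rule="[[:WhiteSpace:][:Punctuation:]] Any-Remove"/>
    <!-- Lowercase every token -->
    <casemap rule="l"/>
  </icu_chain>
</relevance>
```

The sort, mergekey and facet elements take the same kind of content. A similar chain could therefore be placed under any of them, with the rules adjusted to suit that purpose.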
+ + + + + settings + + + Specifies target settings for this service. Refer to + . + + + + + + timeout + + + Specifies timeout parameters for this service. + The timeout + element supports the following attributes: + session, z3950_operation, + z3950_session, which specify + 'session timeout', 'Z39.50 operation timeout', + 'Z39.50 session timeout' respectively. The Z39.50 operation + timeout is the time Pazpar2 will wait for an active Z39.50/SRU + operation before it gives up (times out). The Z39.50 session + timeout is the time Pazpar2 will keep the session alive for + an idle session (no operation). + + + The following is recommended but not required: + z3950_operation (30) < session (60) < z3950_session (180). + The default values are given in parentheses. + + + + + - + EXAMPLE Below is a working example configuration: - + + INCLUDE FACILITY + + The XML configuration may be partitioned into multiple files by using + the include element, which takes a single attribute, + src. The value of the src attribute is + a regular shell-like glob pattern. For example, + + + + The include facility requires Pazpar2 version 1.2 or later. + + + TARGET SETTINGS Pazpar2 features a cunning scheme by which you can associate various @@ -429,7 +554,9 @@ environment, where different end-users may need to be represented to some search targets in different ways. This, again, can be managed using an external database or other lookup mechanism. Setting overrides - can be performed either using the 'init' or the 'settings' webservice + can be performed either using the + init or the + settings webservice command. @@ -442,8 +569,10 @@ Finally, as an extreme case of this, the webservice client can - introduce entirely new targets, on the fly, as part of the init or - settings command.
This is useful if you desire to manage information + introduce entirely new targets, on the fly, as part of the + init or + settings command. + This is useful if you desire to manage information about your search targets in a separate application such as a database. You do not need any static settings file whatsoever to run Pazpar2 -- as long as the webservice client is prepared to supply the necessary @@ -680,7 +809,7 @@ - + pz:requestsyntax @@ -716,16 +845,27 @@ pz:nativesyntax - The representation (syntax) of the retrieval records. Currently - recognized values are iso2709 and xml. + Specifies how Pazpar2 should map retrieved records to XML. Currently + supported values are xml, + iso2709 and txml. - For iso2709, can also specify a native character set, e.g. "iso2709;latin-1". - If no character set is provided, MARC-8 is assumed. + The value iso2709 makes Pazpar2 convert retrieved + MARC records to MARCXML. In order to convert to XML, the exact + character set of the MARC records must be known (if not, the resulting + XML is probably not well-formed). The character set may be + specified by appending + ;charset=charset to + iso2709. If omitted, a charset of + MARC-8 is assumed. This is correct for most MARC21/USMARC records. - If pz:nativesyntax is not specified, pazpar2 will attempt to determine - the value based on the response from the server. + The value txml is like iso2709 + except that records are converted to TurboMARC instead of MARCXML. + + + The value xml is used if Pazpar2 retrieves + records that are already XML (no conversion takes place). @@ -742,11 +882,48 @@ + pz:negotiation_charset + + + Sets the character set for Z39.50 negotiation. Most targets do not support + this, and some will even close the connection if it is set (a crash on the server + side or similar). If set, you probably want to set it to + UTF-8. + + + + + pz:xslt - Provides the path of an XSLT stylesheet which will be used to - map incoming records to the internal representation.
+ Is a comma-separated list of files that specify + how to convert incoming records to the internal representation. + + + The suffix of each file specifies the kind of transformation. + Suffix ".xsl" makes an XSL transform. Suffix + ".mmap" will use the MMAP transform (described below). + + + The special value "auto" will use a file + whose name is the pz:requestsyntax + value followed by + '.xsl'. + + + When mapping MARC records, XSLT can be bypassed for increased + performance with the alternate "MARC map" format. Provide the + path of a file with extension ".mmap" containing on each line: + + <field> <subfield> <metadata element> + For example: + + 245 a title + 500 $ description + 773 * citation + To map the field value, specify a subfield of '$'. To store a + concatenation of all subfields, specify a subfield of '*'. @@ -800,7 +977,7 @@ - + pz:apdulog @@ -810,45 +987,135 @@ + + + pz:sru + + + This setting enables + SRU/SOLR + support. + It has four possible settings. + 'get' enables SRU access through GET requests. 'post' enables SRU/POST + support, less commonly supported, but useful if very large requests are + to be submitted. 'srw' enables the SRW (SRU over SOAP) variation of + the protocol. + + + A value of 'solr' enables SOLR client support. This is supported + for Pazpar2 version 1.5.0 and later. + + + + + + pz:sru_version + + + This allows the SRU version to be specified. If unset, Pazpar2 + will use the default of YAZ (currently 1.2). Should be set + to 1.1 or 1.2. For SOLR, the currently supported/tested version is 1.4. + + + + + + pz:pqf_prefix + + + Allows you to specify an arbitrary PQF query language substring. + The provided string is prefixed to the user's query after it has been + normalized to PQF internally in Pazpar2. + This allows you to attach complex 'filters' to queries for a given + target, sometimes necessary to select sub-catalogs + in union catalog systems, etc.
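Target settings like the ones described above are usually collected in a settings file. The following sketch covers one hypothetical Z39.50 target. The target address, the subject term in the prefix and the particular choice of settings are placeholders, and the settings/set layout is assumed from the TARGET SETTINGS section of this manual rather than shown in this excerpt.

```xml
<!-- Hypothetical target address -->
<settings target="z3950.example.org:210/books">
  <set name="pz:requestsyntax" value="marc21"/>
  <!-- ISO2709 MARC converted to MARCXML; records declared as UTF-8 -->
  <set name="pz:nativesyntax" value="iso2709;charset=utf-8"/>
  <!-- "auto": the pz:requestsyntax value followed by .xsl -->
  <set name="pz:xslt" value="auto"/>
  <!-- Hypothetical sub-catalog filter (Bib-1 use attribute 21,
       subject heading); the user's normalized PQF query is
       expected to become the second operand of @and -->
  <set name="pz:pqf_prefix" value="@and @attr 1=21 music"/>
</settings>
```

With pz:xslt set to auto, this target would be normalized with a file named marc21.xsl. For an SRU target, pz:sru would also be set to one of get, post or srw.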
+ + + + + pz:pqf_strftime + + + Allows you to extend a query with dates and operators. + The provided string allows certain substitutions and serves as a + format string. + The special two-character sequence '%%' gets converted to the + original query. Other sequences beginning with the percent sign are + conversions supported by strftime. + All other characters are copied verbatim. For example, the string + @and @attr 1=30 @attr 2=3 %Y %% + would search for the current year combined with the original PQF (%%). + + + + + + pz:sort + + + Specifies sort criteria to be applied to the result set. + Only works for targets which support the sort service. + + + - pz:sru - - - This setting enables SRU/SRW support. It has three possible settings. - 'get', enables SRU access through GET requests. 'post' enables SRU/POST - support, less commonly supported, but useful if very large requests are - to be submitted. 'srw' enables the SRW variation of the protocol. - - + pz:recordfilter + + + Specifies a filter which allows Pazpar2 to include only + records that meet certain criteria in a result. Unmatched records + will be ignored. The filter takes the form name, name~value, or name=value, which + will include only records with a metadata element (name) that has the + substring (~value) given, or matches exactly (=value). If value is omitted, all records + with the named + metadata element present will be included. + + + + + + pz:preferred + + + Specifies that a target is preferred, e.g. a possibly local, faster target. Using block=pref on the show command + will wait for all these targets to return records before releasing the block. If no target is preferred, + block=pref is identical to block=1, which releases the block when one target has returned records. + + - pz:sru_version - - - This allows SRU version to be specified. If unset Pazpar2 - will the default of YAZ (currently 1.2). Should be set - to 1.1 or 1.2. - - + pz:block_timeout + + + (Not yet implemented).
Specifies the time after which a block should be released anyway. + + - pz:pqf_prefix - - - Allows you to specify an arbitrary PQF query language substring. The provided - string is prefixed the user's query after it has been normalized to PQF - internally in pazpar2. This allows you to attach complex 'filters' to - queries for a gien target, sometimes necessary to select sub-catalogs - in union catalog systems, etc. + pz:facetmap:name + + + Specifies that for field name, the target + supports (native) facets. The value is the name of the + field on the target. + + + + At this point, only SOLR targets have been tested with this + facility. - + + + - + + SEE ALSO