X-Git-Url: http://git.indexdata.com/?p=pazpar2-moved-to-github.git;a=blobdiff_plain;f=doc%2Fpazpar2_conf.xml;h=c90c9319eafee4f8c727e7d03aaccaaeac0b8c56;hp=6db09992ecb7a94b483112fb1aca243bcb2c681f;hb=24ad8ea356d71c764af19897e2719670a94a3a05;hpb=b7b3b09b5bf04a832b9602d4717d7e1eb512079c diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml index 6db0999..c90c931 100644 --- a/doc/pazpar2_conf.xml +++ b/doc/pazpar2_conf.xml @@ -1,752 +1,1759 @@ - + %local; %entities; - - %common; + + %idcommon; ]> - Pazpar2 &version; + Index Data + Pazpar2 conf 5 + File formats and conventions - + pazpar2_conf Pazpar2 Configuration - + pazpar2.conf - - DESCRIPTION - - The pazpar2 configuration file, together with any referenced XSLT files, - govern pazpar2's behavior as a client, and control the normalization and - extraction of data elements from incoming result records, for the - purposes of merging, sorting, facet analysis, and display. - - - - The file is specified using the option -f on the pazpar2 command line. - There is not presently a way to reload the configuration file without - restarting pazpar2, although this will most likely be added some time - in the future. - + + + DESCRIPTION + + The Pazpar2 configuration file, together with any referenced XSLT files, + govern Pazpar2's behavior as a client, and control the normalization and + extraction of data elements from incoming result records, for the + purposes of merging, sorting, facet analysis, and display. + + + + The file is specified using the option -f on the Pazpar2 command line. + There is not presently a way to reload the configuration file without + restarting Pazpar2, although this will most likely be added some time + in the future. + - FORMAT + + FORMAT + + The configuration file is XML-structured. It must be well-formed XML. All + elements specific to Pazpar2 should belong to the namespace + http://www.indexdata.com/pazpar2/1.0 + (this is assumed in the + following examples). 
The root element is named "pazpar2". + Under the root element are a number of elements which group categories of + information. The categories are described below. + + + + threads + + This section is optional and is supported for Pazpar2 version 1.3.1 and + later . It is identified by element "threads" which + may include one attribute "number" which specifies + the number of worker-threads that the Pazpar2 instance is to use. + A value of 0 (zero) disables worker-threads (all work is carried out + in main thread). + + + + sockets + + This section is optional and is supported for Pazpar2 version 1.13.0 and + later . It is identified by element "sockets" which + may include one attribute "max" which specifies + the maximum number of sockets to be used by Pazpar2. + + + + file - The configuration file is XML-structured. It must be valid XML. All - elements specific to pazpar2 should belong to the namespace - "http://www.indexdata.com/pazpar2/1.0" (this is assumed in the - following examples). The root element is named 'pazpar2'. Under the - root element are a number of elements which group categories of - information. The categories are described below. - - - server - - This section governs overall behavior of the client. The data - elements are described below. - - - - listen - + This configuration takes one attribute path which + specifies a path to search for local files, such as XSLTs and settings. + The path is a colon separated list of directories. Its default value + is "." which is equivalent to the location of the + main configuration file (where indeed the file element is given). + + + + server + + This section governs overall behavior of a server endpoint. It is identified + by the element "server" which takes an optional attribute, "id", which + identifies this particular Pazpar2 server. Any string value for "id" + may be given. + + + The data + elements are described below. From Pazpar2 version 1.2 this is + a repeatable element. 
+ + + + listen + + + Configures the webservice -- this controls how you can connect + to Pazpar2 from your browser or server-side code. The + attributes 'host' and 'port' control the binding of the + server. The 'host' attribute can be used to bind the server to + a secondary IP address of your system, enabling you to run + Pazpar2 on port 80 alongside a conventional web server. You + can override this setting on the command line using the option -h. + + + + + + proxy + + + If this item is given, Pazpar2 will forward all incoming HTTP + requests that do not contain the filename 'search.pz2' to the + host and port specified using the 'host' and 'port' + attributes. The 'myurl' attribute is required, and should provide + the base URL of the server. Generally, the HTTP URL for the host + specified in the 'listen' parameter. This functionality is + crucial if you wish to use + Pazpar2 in conjunction with browser-based code (JS, Flash, + applets, etc.) which operates in a security sandbox. Such code + can only connect to the same server from which the enclosing + HTML page originated. Pazpar2s proxy functionality enables you + to host all of the main pages (plus images, CSS, etc) of your + application on a conventional webserver, while efficiently + processing webservice requests for metasearch status, results, + etc. + + + + + + icu_chain + + + Specifies character set normalization for relevancy / sorting / + mergekey and facets - for the server. These definitions serves as + default for services that don't have these given. For the meaning + of these settings refer to the + element inside service. + + + + + + relevance / sort / mergekey / facet + + + Obsolete. Use element icu_chain instead. + + + + + + settings + + + Specifies target settings for the server.. These settings serves + as default for all services which don't have these given. + The settings element requires one attribute 'src' which specifies + a settings file or a directory . 
If a directory is given all + files with suffix .xml is read from this + directory. Refer to + for more information. + + + + + + service + + + This nested element controls the behavior of Pazpar2 with + respect to your data model. In Pazpar2, incoming records are + normalized, using XSLT, into an internal representation. + The 'service' section controls the further processing and + extraction of data from the internal representation, primarily + through the 'metadata' sub-element. + + + Pazpar2 version 1.2 and later allows multiple service elements. + Multiple services must be given a unique ID by specifying + attribute id. + A single service may be unnamed (service ID omitted). The + service ID is referred to in the + init webservice + command's service parameter. + + + + + metadata + + + One of these elements is required for every data element in + the internal representation of the record (see + . It governs + subsequent processing as pertains to sorting, relevance + ranking, merging, and display of data elements. It supports + the following attributes: + + + + + name + - Configures the webservice -- this controls how you can connect - to pazpar2 from your browser or server-side code. The - attributes 'host' and 'port' control the binding of the - server. The 'host' attribute can be used to bind the server to - a secondary IP address of your system, enabling you to run - pazpar2 on port 80 alongside a conventional web server. You - can override this setting on the command lineusing the option -h. + This is the name of the data element. It is matched + against the 'type' attribute of the + 'metadata' element + in the normalized record. A warning is produced if + metadata elements with an unknown name are + found in the + normalized record. This name is also used to + represent + data elements in the records returned by the + webservice API, and to name sort lists and browse + facets. 
- - + + - - proxy - + + type + - If this item is given, pazpar2 will forward all incoming HTTP - requests that do not contain the filename 'search.pz2' to the - host and port specified using the 'host' and 'port' - attributes. The 'myurl' attribute is required, and should provide - the base URL of the server. Generally, the HTTP URL for the host - specified in the 'listen' parameter. This functionality is - crucial if you wish to use - pazpar2 in conjunction with browser-based code (JS, Flash, - applets, etc.) which operates in a security sandbox. Such code - can only connect to the same server from which the enclosing - HTML page originated. Pazpar2s proxy functionality enables you - to host all of the main pages (plus images, CSS, etc) of your - application on a conventional webserver, while efficiently - processing webservice requests for metasearch status, results, - etc. + The type of data element. This value governs any + normalization or special processing that might take + place on an element. Possible values are 'generic' + (basic string), 'year' (a range is computed if + multiple years are found in the record). Note: This + list is likely to increase in the future. - - + + - - zproxy - + + brief + - If this item is given, pazpar2 will send all Z39.50 - packages through this Z39.50 proxy server. - At least one of the 'host' and 'post' attributes is required. - The 'host' attribute may contain both host name and port - number, seperated by a colon ':', or only the host name. - An empty 'host' attribute sets the Z39.50 host address - to 'localhost'. + If this is set to 'yes', then the data element is + includes in brief records in the webservice API. Note + that this only makes sense for metadata elements that + are merged (see below). The default value is 'no'. - - + + - - icu_chain - + + sortkey + - Definition of ICU tokenization and normalization rules - are used if ICU support is compiled in. 
The 'id' - attribute is currently not used, and the 'locale' - attribute must be set to one of the locale strings - defined in ICU. The child elements listed below can be - in any order, except the 'index' element which logically - belongs to the end of the list. The stated tokenization, - normalization and charmapping instructions are performed - in order from top to bottom. + Specifies that this data element is to be used for + sorting. The possible values are 'numeric' (numeric + value), 'skiparticle' (string; skip common, leading + articles), and 'no' (no sorting). The default value is + 'no'. - - casemap - - - The attribure 'rule' defines the direction of the - per-character casemapping, allowed values are "l" - (lower), "u" (upper), "t" (title). - - - - normalize - - - Normalization and transformation of tokens follows - the rules defined in the 'rule' attribute. For - possible values we refer to the extensive ICU - documentation found at the - ICU - transformation home page. Set filtering - principles are explained at the - ICU set and - filtering page. - - - - tokenize - - - Tokenization is the only rule in the ICU chain - which splits one token into multiple tokens. The - 'rule' attribute may have the following values: - "s" (sentence), "l" (line-break), "w" (word), and - "c" (character), the later probably not beeing - very useful in a runing pazpar2 installation. - - - - index - - - Finally the 'index' element instruction - without - any 'rule' attribute - is used to store the tokens - after chain processing in the relevance ranking - unit of Pazpar2. It will always be the last - instruction in the chain. - - - - - - - - - service - - This nested element controls the behavior of pazpar2 with - respect to your data model. In pazpar2, incoming records are - normalized, using XSLT, into an internal representation. 
- The 'service' section controls the further processing and - extraction of data from the internal representation, primarily - through the 'metdata' sub-element. + When 'skiparticle' is used, some common articles from the + English and German languages are ignored. At present the + list is: 'the', 'den', 'der', 'die', 'des', 'an', 'a'. + + - - metadata - - - One of these elements is required for every data element in - the internal representation of the record (see - . It governs - subsequent processing as pertains to sorting, relevance - ranking, merging, and display of data elements. It supports - the following attributes: - - - - name - - - This is the name of the data element. It is matched - against the 'type' attribute of the - 'metadata' element - in the normalized record. A warning is produced if - metdata elements with an unknown name are - found in the - normalized record. This name is also used to - represent - data elements in the records returned by the - webservice API, and to name sort lists and browse - facets. - - - - - type - - - The type of data element. This value governs any - normalization or special processing that might take - place on an element. Possible values are 'generic' - (basic string), 'year' (a range is computed if - multiple years are found in the record). Note: This - list is likely to increase in the future. - - - - - brief - - - If this is set to 'yes', then the data element is - includes in brief records in the webservice API. Note - that this only makes sense for metadata elements that - are merged (see below). The default value is 'no'. - - - - - sortkey - - - Specifies that this data element is to be used for - sorting. The possible values are 'numeric' (numeric - value), 'skiparticle' (string; skip common, leading - articles), and 'no' (no sorting). The default value is - 'no'. - - - - - rank - - - Specifies that this element is to be used to - help rank - records against the user's query (when ranking is - requested). 
The value is an integer, used as a - multiplier against the basic TF*IDF score. A value of - 1 is the base, higher values give additional - weight to - elements of this type. The default is '0', which - excludes this element from the rank calculation. - - - - - termlist - - - Specifies that this element is to be used as a - termlist, or browse facet. Values are tabulated from - incoming records, and a highscore of values (with - their associated frequency) is made available to the - client through the webservice API. - The possible values - are 'yes' and 'no' (default). - - - - - merge - - - This governs whether, and how elements are extracted - from individual records and merged into cluster - records. The possible values are: 'unique' (include - all unique elements), 'longest' (include only the - longest element (strlen), 'range' (calculate a range - of values across al matching records), 'all' (include - all elements), or 'no' (don't merge; this is the - default); - - - - - - - - - - - - - - - - EXAMPLE - Below is a working example configuration: - - + + rank + + + Specifies that this element is to be used to + help rank + records against the user's query (when ranking is + requested). + The value is of the form + + M [F N] + + where M is an integer, used as a + weight against the basic TF*IDF score. A value of + 1 is the base, higher values give additional weight to + elements of this type. The default is '0', which + excludes this element from the rank calculation. + + + F is a CCL field and N is the multiplier for terms + that match that part of the CCL field in search. + The F+N combo allows the system to use a different + multiplier for a certain field. For example, a rank value of + "1 au 3" gives a multiplier of 3 for + all terms that are part of the au(thor) terms and 1 for everything else. + + + For Pazpar2 1.6.13 and later, the rank may also be defined + "per-document", by the normalization stylesheet. + + + The per field rank was introduced in Pazpar2 1.6.15. 
Earlier + releases only allowed a rank value M (simple integer). + + See for more + about ranking. + + - - - + + termlist + + + Specifies that this element is to be used as a + termlist, or browse facet. Values are tabulated from + incoming records, and a highscore of values (with + their associated frequency) is made available to the + client through the webservice API. + The possible values + are 'yes' and 'no' (default). + + + - - - + + merge + + + This governs whether, and how elements are extracted + from individual records and merged into cluster + records. The possible values are: 'unique' (include + all unique elements), 'longest' (include only the + longest element (strlen)), 'range' (calculate a range + of values across all matching records), 'all' (include + all elements), or 'no' (don't merge; this is the + default). + + + Pazpar2 1.6.24 also offers a new value for merge, 'first', which + is like 'all' but only takes all elements from the first database that returns + the particular metadata field. + + + + + mergekey + + + If set to 'required', the value of this + metadata element is appended to the resulting mergekey if + the metadata is present in a record instance. + If the metadata element is not present, a unique mergekey + will be generated instead. + + + If set to 'optional', the value of this + metadata element is appended to the resulting mergekey if + the metadata is present in a record instance. If the metadata + is not present, it will be empty. + + + If set to 'no' or the mergekey attribute is + omitted, the metadata will not be used in the creation of a + mergekey. + + + - - + + facetrule + + + Specifies the ICU rule set to be used for normalizing + facets. If facetrule is omitted from metadata, the + rule set 'facet' is used. + + + - - - - - - - - - + + limitcluster + + + Allows a limit on merged metadata. The value of this attribute + is the name of actual metadata content to be used for matching + (most often the same name as metadata name). 
+ + + + Requires Pazpar2 1.6.23 or later. + + + + - -]]> - - + + limitmap + + + Specifies a default limitmap for this field. This is to avoid mass + configuring of targets. However it is important to review/do + this on a per target since it is usually target-specific. + See limitmap for format. + + + - TARGET SETTINGS - - Pazpar2 features a cunning scheme by which you can associate various - kinds of attributes, or settings with search targets. This can be done - through XML files which are read at startup; each file can associate - one or more settings with one or more targets. The file format is generic - in nature, designed to support a wide range of application requirements. The - settings can be purely technical things, like, how to perform a title - search against a given target, or it can associate arbitrary name=value - pairs with groups of targets -- for instance, if you would like to - place all commercial full-text bases in one group for selection - purposes, or you would like to control what targets are accessible - to users by default. - + + facetmap + + + Specifies a default facetmap for this field. This is to avoid mass + configuring of targets. However it is important to review/do + this on a per target since it is usually target-specific. + See facetmap for format. + + + - - During startup, pazpar2 will recursively read a specified directory - (can be identified in the pazpar2.cfg file or on the command line), and - process any settings files found therein. - + + icurule + + + Specifies the ICU rule set to be used for normalizing + metadata text. The "display" part of the rule is kept + in the returned metadata record (record+show commands), the + end result - normalized text - is used for performing + within-cluster merge (unique, longest, etc). 
If the icurule is + omitted, type generic (text) is converted as follows: + any of the characters " ,/.:([" are + chopped of prefix and suffix of text content + unless it includes the + characters "://" (URL). + + + + Requires Pazpar2 1.9.0 or later. + + + + + + + setting + + + This attribute allows you to make use of static database + settings in the processing of records. Three possible values + are allowed. 'no' is the default and doesn't do anything. + 'postproc' copies the value of a setting with the same name + into the output of the normalization stylesheet(s). 'parameter' + makes the value of a setting with the same name available + as a parameter to the normalization stylesheet, so you + can further process the value inside of the stylesheet, or use + the value to decide how to deal with other data values. + + + The purpose of using settings in this way can either be to + control the behavior of normalization stylesheet in a database- + dependent way, or to easily make database-dependent values + available to display-logic in your user interface, without having + to implement complicated interactions between the user interface + and your configuration system. + + + + + + + + - - Clients of the pazpar2 webservice interface can selectively override - settings for individual targets within the scope of one session. This - can be used in conjunction with an external authentication system to - determine which resources are to be accessible to which users. Pazpar2 - itself has no notion of end-users, and so can be used in conjunction - with any type of authentication system. Similarly, the authentication - tokens submitted to access-controlled search targets can similarly be - overriden, to allow use of pazpar2 in a consortial or multi-library - environment, where different end-users may need to be represented to - some search targets in different ways. This, again, can be managed - using an external database or other lookup mechanism. 
Setting overrides - can be performed either using the 'init' or the 'settings' webservice - command (see XXX ref to pazpar2 protocol). - - - - In fact, every setting that applies to a database (except pz:id, which - can only be used for filtering targets to use for a search) can be overriden - on a per-session basis. This allows the client to override specific CCL fields - for searching, etc., to meet the needs of a session or user. - - - - Finally, as an extreme case of this, the webservice client can - introduce entirely new targets, on the fly, as part of the init or - settings command. This is useful if you desire to manage information - about your search targets in a separate application such as a database. - You do not need any static settings file whatsoever to run pazpar2 -- as - long as the webservice client is prepared to supply the necessary - information at the beginning of every session. - - - - NOTE: The following discussion of practical issues related to session and settings - management are cast in terms of a user interface based on Ajax/Javascript - technology. It would apply equally well to many other kinds of browser-based logic. - - - - Typically, a Javascript client is not allowed to directly alter the parameters - of a session. There are two reasons for this. One has to do with access - to information; typically, information about a user will be stored in a - system on the server side, or it will be accessible in some way from the server. - However, since the Javascript client cannot be entirely trusted (some hostile - agent might in fact 'pretend' to be a regular ws client), it is more robust - to control session sesttings from scripting that you run as part of your - webserver. Typically, this can be handled during the session initialization, - as follows: - - - - Step 1: The Javascript client loads, and asks the webserver for a new pazpar2 - session ID. This can be done using a Javascript call, for instance. 
Note that - it is possible to submit Ajax HTTPXmlRequest calls either to pazpar2 or to the - webserver that pazpar2 is proxying for. See (XXX Insert link to pazpar2 protocol). - - - - Step 2: Code on the webserver authenticates the user, by database lookup, - LDAP access, NCIP, etc. Determines which resources the user has access to, - and any user-specific parameters that are to be applied during this session. - - - - Step 3: The webserver initializes a new pazpar2 settings, and sets user-specific - parameters as necessary, using the init webservice command. A new session ID is - returned. - - - - Step 4: The webserver returns this session ID to the Javascript client, which then - uses the session ID to submit searches, show results, etc. - - - - Step 5: When the Javascript client ceases to use the session, pazpar2 destroys - any session-specific information. - - - SETTINGS FILE FORMAT - - Each file contains a root element named <settings>. It may - contain one or more <set> elements. The settings and set - elements may contain the following attributes. Attributes in the set node - overrides those in the setting root element. Each set node must - specify (directly, or inherited from the parent node) at least a - target, name, and value. - - - - target - - - This specifies the search target to which this setting should be - applied. Targets are identified by their Z39.50 URL, generally - including the host, port, and database name, (e.g. - bagel.indexdata.com:210/marc). Two wildcard forms are accepted: - * (asterisk) matches all known targets; - bagel.indexdata.com:210/* matches all known databases on the given - host. - - - A precedence system determines what happens if there are - overlapping values for the same setting name for the same - target. A setting for a specific target name overrides a - setting whch specifies target using a wildcard. This makes it - easy to set defaults for all targets, and then override them - for specific targets or hosts. 
If there are - multiple overlapping settings with the same name and target - value, the 'precedence' attribute determines what happens. + xslt + + + Defines a XSLT stylesheet. The xslt + element takes exactly one attribute id + which names the stylesheet. This can be referred to in target + settings . + + + The content of the xslt element is the embedded stylesheet XML + + + + + icu_chain + + + Specifies a named ICU rule set. The icu_chain element must include + attribute 'id' which specifies the identifier (name) for the ICU + rule set. + Pazpar2 uses the particular rule sets for particular purposes. + Rule set 'relevance' is used to normalize + terms for relevance ranking. Rule set 'sort' is used to + normalize terms for sorting. Rule set 'mergekey' is used to + normalize terms for making a mergekey and, finally. Rule set 'facet' + is normally used to normalize facet terms, unless + facetrule is given for a + metadata field. + + + The icu_chain element must also include a 'locale' + attribute which must be set to one of the locale strings + defined in ICU. The child elements listed below can be + in any order, except the 'index' element which logically + belongs to the end of the list. The stated tokenization, + transformation and charmapping instructions are performed + in order from top to bottom. + + + + casemap + + + The attribute 'rule' defines the direction of the + per-character casemapping, allowed values are "l" + (lower), "u" (upper), "t" (title). - + + + + transform + + + Normalization and transformation of tokens follows + the rules defined in the 'rule' attribute. For + possible values we refer to the extensive ICU + documentation found at the + ICU + transformation home page. Set filtering + principles are explained at the + ICU set and + filtering page. + + + + + tokenize + + + Tokenization is the only rule in the ICU chain + which splits one token into multiple tokens. 
The + 'rule' attribute may have the following values: + "s" (sentence), "l" (line-break), "w" (word), and + "c" (character), the latter probably not being + very useful in a running Pazpar2 installation. + + + + + + From Pazpar2 version 1.1 the ICU wrapper from YAZ is used. + Refer to the yaz-icu + utility for more information. + + + + + + relevance + + + Specifies the ICU rule set used for relevance ranking. + The child element of 'relevance' must be 'icu_chain' and the + 'id' attribute of the icu_chain is ignored. This + definition is obsolete and should be replaced by the equivalent + construct: + + <icu_chain id="relevance" locale="en">..</icu_chain> + + + + + + + sort + + + Specifies the ICU rule set used for sorting. + The child element of 'sort' must be 'icu_chain' and the + 'id' attribute of the icu_chain is ignored. This + definition is obsolete and should be replaced by the equivalent + construct: + + <icu_chain id="sort" locale="en">..</icu_chain> + + + + + + + mergekey + + + Specifies ICU tokenization and transformation rules + for tokens that are used in Pazpar2's mergekey. + The child element of 'mergekey' must be 'icu_chain' and the + 'id' attribute of the icu_chain is ignored. This + definition is obsolete and should be replaced by the equivalent + construct: + + <icu_chain id="mergekey" locale="en">..</icu_chain> + + + - - name - + + + facet + + + Specifies ICU tokenization and transformation rules + for tokens that are used in Pazpar2's facets. + The child element of 'facet' must be 'icu_chain' and the + 'id' attribute of the icu_chain is ignored. This + definition is obsolete and should be replaced by the equivalent + construct: + + <icu_chain id="facet" locale="en">..</icu_chain> + + + + + + + ccldirective + + + Customizes the CCL parsing (interpretation of query parameter + in search). + The name and value of the CCL directive are given by attributes + 'name' and 'value' respectively. Refer to the list of possible names + in the + + YAZ manual + . 
+ + + + + + rank + + + Customizes the ranking (relevance) algorithm. Also known as + rank tweaks. The rank element + accepts the following attributes - all being optional: + + + + cluster + + + Attribute 'cluster' is a boolean + that controls whether Pazpar2 should boost ranking for merged + records. The default is 'yes'. A value of 'no' will make + Pazpar2 average the ranking of each record in a cluster. + + + + + debug + + + Attribute 'debug' is a boolean + that controls whether Pazpar2 should include details + about ranking for each document in the show command's + response. Enable by using value "yes", disable by using + value "no" (default). + + + + + follow + - The name of the setting. This can be anything you like. - However, pazpar2 reserves a number of setting names for - specific purposes, all starting with 'pz:', and it is a good - idea to avoid that prefix if you make up your own setting - names. See below for a list of reserved variables. + Attribute 'follow' is a floating point number greater than + or equal to 0. A positive number will boost weight for terms + that occur close to each other (proximity, distance). + A value of 1 will double the weight if two terms are in + proximity distance of 1 (next to each other). The default + value of 'follow' is 0 (order will not affect weight). - - - - value - + + + + lead + - The value of the setting. Generally, this can be anything you - want -- however, some of the reserved settings may expect - specific kinds of values. + Attribute 'lead' is a floating point number. + It controls if term weight should be reduced by position + from start in a metadata field. A positive value of 'lead' + will reduce weight as the term appears further away from the lead + of the field. Default value is 0 (no reduction of weight by + position). - - - - precedence - + + + + length + - This should be an integer. If not provided, the default value - is 0. 
If two (or more) settings have the same content for - target and name, the precedence value determines the outcome. - If both settings have the same precedence value, they are both - applied to the target(s). If one has a higher value, then the - value of that setting is applied, and the other one is ignored. + Attribute 'length' determines how/if term weight should be + divided by length of metadata field. A value of "linear" + will divide by length. A value of "log" will divide by log2(length). + A value of "none" will leave term weight as is (no division). + Default value is "linear". - - - + + + + + Refer to to see how + these tweaks are used in computation of score. + + + Customization of ranking algorithm was introduced with + Pazpar2 1.6.18. The semantics of some of the fields changed + in versions up to 1.6.22. + + + + + + sort-default + + + Specifies the default sort criteria (default 'relevance'), + which previously was hard-coded as default criteria in search. + This is a fix/work-around to avoid re-searching when using + target-based sorting. In order for this to work efficiently, + the search must also have the sort criteria parameter; otherwise + pazpar2 will do re-searching on search criteria changes, if + changed between search and show command. + + + This configuration was added in pazpar2 1.6.20. + + + + + + + settings + + + Specifies target settings for this service. Refer to + . + + + + + + timeout + + + Specifies timeout parameters for this service. + The timeout + element supports the following attributes: + session, z3950_operation, + z3950_session which specify + 'session timeout', 'Z39.50 operation timeout', + 'Z39.50 session timeout' respectively. The Z39.50 operation + timeout is the time Pazpar2 will wait for an active Z39.50/SRU + operation before it gives up (times out). The Z39.50 session + timeout is the time Pazpar2 will keep the session alive for + an idle session (no operation). 
+ + + The following is recommended but not required: + z3950_operation (30) < session (60) < z3950_session (180). + The default values are given in parentheses. + + + The Z39.50 operation timeout may be set per database. Refer to + . + + + + + + + + + + + + EXAMPLE + + Below is a working example configuration: + + + + + + + + + + + + + + + + + + + + + + + + + + + + ]]> + + + + + INCLUDE FACILITY + + The XML configuration may be partitioned into multiple files by using + the include element which takes a single attribute, + src. The src attribute is + a regular shell-like glob pattern. For example, + + ]]> + + + The include facility requires Pazpar2 version 1.2. + + + + + TARGET SETTINGS + + Pazpar2 features a cunning scheme by which you can associate various + kinds of attributes, or settings, with search targets. This can be done + through XML files which are read at startup; each file can associate + one or more settings with one or more targets. The file format is generic + in nature, designed to support a wide range of application requirements. + The settings can be purely technical things, such as how to perform a title + search against a given target, or they can associate arbitrary name=value + pairs with groups of targets -- for instance, if you would like to + place all commercial full-text bases in one group for selection + purposes, or you would like to control what targets are accessible + to users by default. Per-database settings values can even be used + to drive sorting, facet/termlist generation, or end-user interface display + logic. + + + + During startup, Pazpar2 will recursively read a specified directory + (can be identified in the pazpar2.cfg file or on the command line), and + process any settings files found therein. + + + + Clients of the Pazpar2 webservice interface can selectively override + settings for individual targets within the scope of one session.
This + can be used in conjunction with an external authentication system to + determine which resources are to be accessible to which users. Pazpar2 + itself has no notion of end-users, and so can be used in conjunction + with any type of authentication system. Similarly, the authentication + tokens submitted to access-controlled search targets can be + overridden, to allow use of Pazpar2 in a consortial or multi-library + environment, where different end-users may need to be represented to + some search targets in different ways. This, again, can be managed + using an external database or other lookup mechanism. Setting overrides + can be performed either using the + init or the + settings webservice + command. + + + + In fact, every setting that applies to a database (except pz:id, which + can only be used for filtering targets to use for a search) can be overridden + on a per-session basis. + This allows the client to override specific CCL fields for + searching, etc., to meet the needs of a session or user. + + + + Finally, as an extreme case of this, the webservice client can + introduce entirely new targets, on the fly, as part of the + init or + settings command. + This is useful if you desire to manage information + about your search targets in a separate application such as a database. + You do not need any static settings file whatsoever to run Pazpar2 -- as + long as the webservice client is prepared to supply the necessary + information at the beginning of every session. + + + + + The following discussion of practical issues related to session + and settings management is cast in terms of a user interface based on + Ajax/Javascript technology. It would apply equally well to many other + kinds of browser-based logic. + + + + + Typically, a Javascript client is not allowed to directly alter the + parameters of a session. There are two reasons for this.
One has to do + with access to information; typically, information about a user will + be stored in a system on the server side, or it will be accessible in + some way from the server. However, since the Javascript client cannot + be entirely trusted (some hostile agent might in fact 'pretend' to be + a regular webservice client), it is more robust to control session settings + from scripting that you run as part of your webserver. Typically, this + can be handled during the session initialization, as follows: + + + + Step 1: The Javascript client loads, and asks the webserver for a + new Pazpar2 session ID. This can be done using a Javascript call, for + instance. Note that it is possible to submit Ajax XMLHttpRequest calls + either to Pazpar2 or to the webserver that Pazpar2 is proxying + for. See (XXX Insert link to Pazpar2 protocol). + + + + Step 2: Code on the webserver authenticates the user, by database lookup, + LDAP access, NCIP, etc. It then determines which resources the user has access to, + and any user-specific parameters that are to be applied during this session. + + + + Step 3: The webserver initializes a new Pazpar2 session, and sets + user-specific parameters as necessary, using the init webservice + command. A new session ID is returned. + + + + Step 4: The webserver returns this session ID to the Javascript + client, which then uses the session ID to submit searches, show + results, etc. + + + + Step 5: When the Javascript client ceases to use the session, + Pazpar2 destroys any session-specific information. + + + + SETTINGS FILE FORMAT + + Each file contains a root element named <settings>. It may + contain one or more <set> elements. The settings and set + elements may contain the following attributes. Attributes in the set + node override those in the settings root element. Each set node must + specify (directly, or inherited from the parent node) at least a + target, name, and value.
+ + + + + target + + + This specifies the search target to which this setting should be + applied. Targets are identified by their Z39.50 URL, generally + including the host, port, and database name, (e.g. + bagel.indexdata.com:210/marc). + Two wildcard forms are accepted: + * (asterisk) matches all known targets; + bagel.indexdata.com:210/* matches all + known databases on the given host. + + + A precedence system determines what happens if there are + overlapping values for the same setting name for the same + target. A setting for a specific target name overrides a + setting which specifies target using a wildcard. This makes it + easy to set defaults for all targets, and then override them + for specific targets or hosts. If there are + multiple overlapping settings with the same name and target + value, the 'precedence' attribute determines what happens. + + + For Pazpar2 1.6.4 or later, the target ID may be user-defined, in + which case, the actual host, port, etc is given by setting + . + + + + + name + + + The name of the setting. This can be anything you like. + However, Pazpar2 reserves a number of setting names for + specific purposes, all starting with 'pz:', and it is a good + idea to avoid that prefix if you make up your own setting + names. See below for a list of reserved variables. + + + + + value + + + The value of the setting. Generally, this can be anything you + want -- however, some of the reserved settings may expect + specific kinds of values. + + + + + precedence + + + This should be an integer. If not provided, the default value + is 0. If two (or more) settings have the same content for + target and name, the precedence value determines the outcome. + If both settings have the same precedence value, they are both + applied to the target(s). If one has a higher value, then the + value of that setting is applied, and the other one is ignored. 
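The interplay of the four attributes above can be sketched with a small settings file. This is an illustrative sketch only: the target address is the one used as an example earlier in this section, the setting names are reserved names described in this document, and the chosen values are assumptions made purely to show the precedence rules.

```xml
<settings>
  <!-- Wildcard default: applies to every known target. -->
  <set target="*" name="pz:maxrecs" value="100"/>

  <!-- A specific target overrides the wildcard default. -->
  <set target="bagel.indexdata.com:210/marc" name="pz:maxrecs" value="50"/>

  <!-- Same target and name twice: the entry with the higher
       precedence is applied, the other one is ignored. -->
  <set target="bagel.indexdata.com:210/marc" name="pz:piggyback" value="1"/>
  <set target="bagel.indexdata.com:210/marc" name="pz:piggyback" value="0"
       precedence="1"/>
</settings>
```

Under the rules described above, this target would end up with pz:maxrecs set to 50 and pz:piggyback set to 0, while all other targets keep the wildcard default of 100 for pz:maxrecs.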
+ + + + + + + By setting defaults for target, name, or value in the root + settings node, you can use the settings files in many different + ways. For instance, you can use a single file to set defaults for + many different settings, like search fields, retrieval syntaxes, + etc. You can have one file per server, which groups settings for + that server or target. You could also have one file which associates + a number of targets with a given setting, for instance, to associate + many databases with a given category or class that makes sense + within your application. + + + + The following examples illustrate uses of the settings system to + associate settings with targets to meet different requirements. + + + + The example below associates a set of default values that can be + used across many targets. Note the wildcard for targets. + This associates the given settings with all targets for which no + other information is provided. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ]]> + + + + The next example shows certain settings overridden for one target, + one which returns XML records containing DublinCore elements, and + which furthermore requires a username/password. + + + + + + + + ]]> + + + + The following example associates a specific name/value combination + with a number of targets. The targets below are access-restricted, + and can only be used by users with special credentials. + + + + + ]]> + + + + + + RESERVED SETTING NAMES + + The following setting names are reserved by Pazpar2 to control the + behavior of the client function. + + + + + + pz:allow + - By setting defaults for target, name, or value in the root - settings node, you can use the settings files in many different - ways. For instance, you can use a single file to set defaults for - many different settings, like search fields, retrieval syntaxes, - etc. You can have one file per server, which groups settings for - that server or target. 
You could also have one file which associates - a number of targets with a given setting, for instance, to associate - many databases with a given category or class that makes sense - within your application. + Allows or denies access to the resources it is applied to. Possible + values are '0' and '1'. + The default is '1' (allow access to this resource). + + + + pz:apdulog + - The following examples illustrate uses of the settings system to - associate settings with targets to meet different requirements. + If the 'pz:apdulog' setting is defined and has a value other than 0, + then Z39.50 APDUs are written to the log. + + + + pz:authentication + + + Sets an authentication string for a given database. For Z39.50, + this is carried as part of the Initialize Request. In order to carry + the information in the "open" elements, separate + username and password with a slash (In Z39.50 it is a VisibleString). + In order to carry the information in the idPass elements, separate + username term, password term and, optionally, a group term with a + single blank. + If three terms are given, the order is + user, group, password. + If only two terms are given, the order is + user, password. + - The example below associates a set of default values that can be - used across many targets. Note the wildcard for targets. - This associates the given settings with all targets for which no - other information is provided. - + For HTTP based protocols, such as SRU and Apache Solr, the + authentication string includes a username term and, optionally, + a password term. + Terms are separated by a single blank. The + authentication information is passed either by HTTP basic + authentication or via URL parameters. The mode of operation is + determined by the pz:authentication_mode setting. + + + - - + + pz:authentication_mode + + + Determines how authentication is carried in HTTP based protocols. + Value may be "basic" or "url". + + + - - + + pz:block_timeout + + + (Not yet implemented).
+ Specifies the time after which a block should be released anyway. + + + - - - - - - + + pz:cclmap:xxx + + + This establishes a CCL field definition or other setting, for + the purpose of mapping end-user queries. XXX is the field or + setting name, and the value of the setting provides parameters + (e.g. parameters to send to the server, etc.). Please consult + the YAZ manual for a full overview of the many capabilities of + the powerful and flexible CCL parser. + + + Note that it is easy to establish a set of default parameters, + and then override them individually for a given target. + + + - + + pz:elements + + + The element set name to be used when retrieving records from a + server. + + + - - + + pz:extendrecs + + + If a show command goes to the boundary of a result set for a + database (this depends on sorting) and pz:extendrecs is set to a positive + value, then Pazpar2 waits for show to fetch pz:extendrecs more + records. This setting is best used if a database does native + sorting, because the result set otherwise may be completely + re-sorted during extended fetch. + The default value of pz:extendrecs is 0 (no extended fetch). + + + + The pz:extendrecs setting appeared in Pazpar2 version 1.6.26. + But the behavior changed with the release of Pazpar2 1.6.29. + + + + - + + pz:facetmap:name + + + Specifies that for field name, the target + supports (native) facets. The value is the name of the + field on the target. + + + - - + + pz:facetmap:split:name + + + Like pz:facetmap, but makes Pazpar2 inspect the term value consisting + of two items separated by colon. First item is the raw ID to be + sent to database if limitmap on the field + name is used. The second item is + the display term. + + + This facility was added in Pazpar2 version 1.11.0.
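A native facet mapping can be sketched as a single set element. Both the Pazpar2 field name (subject) and the target-side field name (subject_exact) below are hypothetical and stand in for whatever the target actually exposes; the wildcard target is likewise only an assumption for the example.

```xml
<settings target="*">
  <!-- Hypothetical names: Pazpar2 field "subject" is served by the
       target's own facet field "subject_exact". -->
  <set name="pz:facetmap:subject" value="subject_exact"/>
</settings>
```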
+ + + - + + pz:id + + + This setting can't be 'set' -- it contains the ID (normally + ZURL) for a given target, and is useful for filtering -- + specifically when you want to select one or more specific + targets in the search command. + + + - ]]> + + pz:limitmap:name + + + Specifies attributes for limiting a search to a field, using + the limit parameter for search. It can be used to filter locally + or remotely (search in a target). In some cases the mapping of + a field to a value is identical to an existing cclmap field; in + other cases the field must be specified in a different way, for + example to match a complete field (rather than parts of a subfield). + + + The value of limitmap may have one of three forms: referral to + an existing CCL field, a raw PQF string or a local limit. The leading string + determines the type; either ccl: for CCL field, + rpn: for PQF/RPN, or local: + for filtering in Pazpar2. The local filtering may be followed + by a metadata field (default is to use the name of the + limitmap itself). + + + For Pazpar2 version 1.6.23 and later the limitmap may include multiple + specifications, separated by , (comma). + For example: + ccl:title,local:ltitle,rpn:@attr 1=4. + + + The limitmap facility is supported for Pazpar2 version 1.6.0. + Local filtering is supported in Pazpar2 1.6.6. + + + + + + pz:maxrecs + - The next example shows certain settings overriden for one target, - one which returns XML records containing DublinCore elements, and - which furthermore requires a username/password. - - - - + Controls the maximum number of records to be retrieved from a + server. The default is 100. + + + - - - ]]> + + pz:memcached + + + If set and non-empty, + libMemcached will be + configured and enabled for the target. + The value of this setting is the same as the ZOOM option + memcached, which in turn is the configuration + string passed to the memcached function + of libMemcached. + + + This setting is honored in Pazpar2 1.6.39 or later.
Pazpar2 must + be using YAZ version 5.0.13 or later. + + + + pz:redis + + + If set and non-empty, + redis will be + configured and enabled for the target. + The value of this setting is exactly the same as the redis option for + ZOOM C of YAZ. + - The following example associates a specific name/value combination - with a number of targets. The targets below are access-restricted, - and can only be used by users with special credentials. - - - - - ]]> + This setting is honored in Pazpar2 1.6.43 or later. Pazpar2 must + be using YAZ version 5.2.0 or later. + + - + + pz:nativesyntax + + + Specifies how Pazpar2 should map retrieved records to XML. Currently + supported values are xml, + iso2709 and txml. + + + The value iso2709 makes Pazpar2 convert retrieved + MARC records to MARCXML. In order to convert to XML, the exact + character set of the MARC must be known (if not, the resulting + XML is probably not well-formed). The character set may be + specified by adding: + ;charset to + iso2709. If omitted, a charset of + MARC-8 is assumed. This is correct for most MARC21/USMARC records. + + + The value txml is like iso2709 + except that records are converted to TurboMARC instead of MARCXML. + + + The value xml is used if Pazpar2 retrieves + records that are already XML (no conversion takes place). + + + - RESERVED SETTING NAMES + + pz:negotiation_charset + - The following setting names are reserved by pazpar2 to control the - behavior of the client function. + Sets the character set for Z39.50 negotiation. Most targets do not support + this, and some will even close the connection if set (crash on server + side or similar). If set, you probably want to set it to + UTF-8. + + - - - pz:cclmap:xxx - - - This establishes a CCL field definition or other setting, for - the purpose of mapping end-user queries. XXX is the field or - setting name, and the value of the setting provides parameters - (e.g. parameters to send to the server, etc.).
Please consult - the YAZ manual for a full overview of the many capabilities of - the powerful and flexible CCL parser. - - - Note that it is easy to etablish a set of default parameters, - and then override them individually for a given target. - - - - - pz:requestsyntax - - - This specifies the record syntax to use when requesting - records from a given server. The value can be a symbolic name like - marc21 or xml, or it can be a Z39.50-style dot-separated OID. - - - - - pz:elements - - - The element set name to be used when retrieving records from a - server (not yet implemented). - - - - - pz:piggyback - - - Piggybacking enables the server to retrieve records from the - server as part of the search response in Z39.50. Almost all - servers support this (or fail it gracefully), but a few - servers will produce undesirable results. - Set to '1' to enable piggybacking, '0' to disable it. Default - is 1 (piggybacking enabled). - - - - - pz:nativesyntax - - - The representation (syntax) of the retrieval records. Currently - recognized values are iso2709 and xml. - - - For iso2709, can also specify a native character set, e.g. "iso2709;latin-1". - If no character set is provided, MARC-8 is assumed. - - - - - pz:xslt - - - Provides the path of an XSLT stylesheet which will be used to - map incoming records to the internal representation. - - - - - pz:authentication - - - Sets an authentication string for a given server. See the section on - authorization and authentication for discussion. - - - - - pz:allow - - - Allows or denies access to the resources it is applied to. Possible - values are '0' and '1'. The default is '1' (allow access to this resource). - See the manual section on authorization and authentication for discussion - about how to use this setting. - - - - - pz:maxrecs - - - Controls the maximum number of records to be retrieved from a - server. The default is 100 (not yet implemented). 
- - - - - pz:id - - - This setting can't be 'set' -- it contains the ID (normally - ZURL) for a given target, and is useful for filtering -- - specifically when you want to select one or more specific - targets in the search command. - - - - - + + pz:piggyback + + + Piggybacking enables the server to retrieve records from the + server as part of the search response in Z39.50. Almost all + servers support this (or fail it gracefully), but a few + servers will produce undesirable results. + Set to '1' to enable piggybacking, '0' to disable it. Default + is 1 (piggybacking enabled). + + + + + pz:pqf_prefix + + + Allows you to specify an arbitrary PQF query language substring. + The provided string is prefixed to the user's query after it has been + normalized to PQF internally in pazpar2. + This allows you to attach complex 'filters' to queries for a given + target, sometimes necessary to select sub-catalogs + in union catalog systems, etc. + + + + + + pz:pqf_strftime + + + Allows you to extend a query with dates and operators. + The provided string allows certain substitutions and serves as a + format string. + The special two character sequence '%%' gets converted to the + original query. Other characters leading with the percent sign are + conversions supported by strftime. + All other characters are copied verbatim. For example, the string + @and @attr 1=30 @attr 2=3 %Y %% + would search for current year combined with the original PQF (%%). + + + This setting can also be used as more general alternative to + pz:pqf_prefix -- a way of embedding the submitted query + anywhere in the string rather than appending it to prefix. For + example, if it is desired to omit all records satisfying the + query @attr 1=pica.bib 0007 then this + subquery can be combined with the submitted query as the second + argument of @andnot by using the + pz:pqf_strftime value @not %% @attr 1=pica.bib + 0007. + + + + + + pz:preferred + + + Specifies that a target is preferred, e.g. 
a possibly local, faster + target. Using block=preferred on the + show command will wait for all these + targets to return records before releasing the block. + If no target is preferred, block=preferred will be identical to + block=1, which releases the block when one target has returned records. + + + + + + pz:present_chunk + + + Controls the chunk size in present requests. Pazpar2 will + make (maxrecs / chunk) request(s). The default is 20. + + + + + + pz:queryencoding + + + The encoding of the search terms that a target accepts. Most + targets do not honor UTF-8 in which case this needs to be specified. + Each term in a query will be converted if this setting is given. + + + + + + pz:recordfilter + + + Specifies a filter which allows Pazpar2 to only include + records that meet certain criteria in a result. + Unmatched records will be ignored. + The filter takes the form name, name~value, or name=value, which + will include only records with metadata element (name) that has the + substring (~value) given, or matches exactly (=value). + If value is omitted all records with the named metadata element + present will be included. + + + + + + pz:requestsyntax + + + This specifies the record syntax to use when requesting + records from a given server. The value can be a symbolic name like + marc21 or xml, or it can be a Z39.50-style dot-separated OID. + + + + + + pz:sort + + + Specifies sort criteria to be applied to the result set. + Only works for targets which support the sort service. + + + + + + pz:sortmap:field + + + Specifies native sorting for a target where + field is a sort criterion (see command + show). The value has two components separated by a colon: strategy and + native-field. Strategy is one of z3950, + type7, cql, + sru11, or embed. + The second component, native-field, is the field that is recognized + by the target. + + + + Only supported for Pazpar2 1.6.4 and later. + + + + + + + pz:sru + + + This setting enables + SRU/Solr + support.
+ It has four possible settings. + 'get' enables SRU access through GET requests. 'post' enables SRU/POST + support, less commonly supported, but useful if very large requests are + to be submitted. 'soap' enables the SRW (SRU over SOAP) variation of + the protocol. + + + A value of 'solr' enables Solr client support. This is supported + for Pazpar2 version 1.5.0 and later. + + + + + + pz:sru_version + + + This allows SRU version to be specified. If unset Pazpar2 + will use the default of YAZ (currently 1.2). Should be set + to 1.1 or 1.2. For Solr, the currently supported/tested versions + are 1.4 and 3.x. + + + + + + pz:termlist_term_count + + + Specifies number of facet terms to be requested from the target. + The default is unspecified, i.e. server-decided. Also see pz:facetmap. + + + + + + pz:termlist_term_factor + + + Specifies whether to use a factor for pazpar2 generated facets (1) + or not (0). + When mixing locally generated (by the downloaded (pz:maxrecs) samples) + facets with native (target-generated) facets, the latter will + dominate the facet list since they are generated + based on the complete result set. + By scaling up the facet count using the ratio between total hit + count and the sample size, + the total facet count can be approximated and thus better compared + with native facets. This is not enabled by default. + + + + + + + pz:timeout + + + Specifies timeout for operation (e.g. search and fetch) for + a database. This overrides the z3950_operation timeout + that is given for a service. See . + + + + The timeout facility is supported for Pazpar2 version 1.8.4 and later. + + + + + + + pz:url + + + Specifies URL for the target and overrides the target ID. + + + + pz:url is only recognized for + Pazpar2 1.6.4 and later. + + + + + + pz:xslt + + + Is a comma separated list of stylesheet names that specifies + how to convert incoming records to the internal representation.
+ + + For each name, the embedded stylesheets (XSL) that come with the + service definition are consulted first and take precedence over + external files; see + of service definition). + If the name does not match an embedded stylesheet it is + considered a filename. + + + The suffix of each file specifies the kind of transformation. + Suffix ".xsl" makes an XSL transform. Suffix + ".mmap" will use the MMAP transform (described below). + + + The special value "auto" will use a file + which is the pz:requestsyntax's + value followed by + '.xsl'. + + + When mapping MARC records, XSLT can be bypassed for increased + performance with the alternate "MARC map" format. Provide the + path of a file with extension ".mmap" containing on each line: + + <field> <subfield> <metadata element> + For example: + + 245 a title + 500 $ description + 773 * citation + + To map the field value specify a subfield of '$'. To store a + concatenation of all subfields, specify a subfield of '*'. + + + + + + pz:zproxy + + + The 'pz:zproxy' setting has the value syntax + 'host.internet.address:port'; it is used to tunnel Z39.50 + requests through the named Z39.50 proxy. + + + + + + + + + + SEE ALSO + + + pazpar2 + 8 + + + yaz-icu + 1 + + + pazpar2_protocol + 7 + +