X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fpazpar2_conf.xml;h=3785042f3e9a5007ea738389629bdda910e058b0;hb=fab92b03dfb7807bc6c9aac7b4ce51ff6ee98534;hp=6deafa28ee008f4b10f1df061b31cbe67025cb1c;hpb=bb1af0d3e894c72d2392108890cb20db173f9f13;p=pazpar2-moved-to-github.git diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml index 6deafa2..3785042 100644 --- a/doc/pazpar2_conf.xml +++ b/doc/pazpar2_conf.xml @@ -8,7 +8,7 @@ %common; ]> - + Pazpar2 @@ -31,22 +31,476 @@ DESCRIPTION - + + The pazpar2 configuration file, together with any referenced XSLT files, + governs pazpar2's behavior as a client, and controls the normalization and + extraction of data elements from incoming result records, for the + purposes of merging, sorting, facet analysis, and display. + + + + The file is specified using the option -f on the pazpar2 command line. + There is not presently a way to reload the configuration file without + restarting pazpar2, although this will most likely be added some time + in the future. + + + FORMAT + + The configuration file is XML-structured. It must be valid XML. All + elements specific to pazpar2 should belong to the namespace + "http://www.indexdata.com/pazpar2/1.0" (this is assumed in the + following examples). The root element is named 'pazpar2'. Under the + root element are a number of elements which group categories of + information. The categories are described below. + + + server + + This section governs overall behavior of the client. The data + elements are described below. + + + + listen + + + Configures the webservice -- this controls how you can connect + to pazpar2 from your browser or server-side code. The + attributes 'host' and 'port' control the binding of the + server. The 'host' attribute can be used to bind the server to + a secondary IP address of your system, enabling you to run + pazpar2 on port 80 alongside a conventional web server. You + can override this setting on the command line using the option -h. 
+ + + + + + proxy + + + If this item is given, pazpar2 will forward all incoming HTTP + requests that do not contain the filename 'search.pz2' to the + host and port specified using the 'host' and 'port' + attributes. The 'myurl' attribute is required, and should provide + the base URL of the server, generally the HTTP URL for the host + specified in the 'listen' parameter. This functionality is + crucial if you wish to use + pazpar2 in conjunction with browser-based code (JS, Flash, + applets, etc.) which operates in a security sandbox. Such code + can only connect to the same server from which the enclosing + HTML page originated. Pazpar2's proxy functionality enables you + to host all of the main pages (plus images, CSS, etc.) of your + application on a conventional webserver, while efficiently + processing webservice requests for metasearch status, results, + etc. + + + + + + zproxy + + + If this item is given, pazpar2 will send all Z39.50 + packages through this Z39.50 proxy server. + At least one of the 'host' and 'port' attributes is required. + The 'host' attribute may contain both host name and port + number, separated by a colon ':', or only the host name. + An empty 'host' attribute sets the Z39.50 host address + to 'localhost'. + + + + + + service + + + This nested element controls the behavior of pazpar2 with + respect to your data model. In pazpar2, incoming records are + normalized, using XSLT, into an internal representation. + The 'service' section controls the further processing and + extraction of data from the internal representation, primarily + through the 'metadata' sub-element. + + + + metadata + + + One of these elements is required for every data element in + the internal representation of the record (see + . It governs + subsequent processing as pertains to sorting, relevance + ranking, merging, and display of data elements. It supports + the following attributes: + + + + name + + + This is the name of the data element. 
It is matched + against the 'type' attribute of the 'metadata' element + in the normalized record. A warning is produced if + metadata elements with an unknown name are found in the + normalized record. This name is also used to represent + data elements in the records returned by the + webservice API, and to name sort lists and browse + facets. + + + + + type + + + The type of data element. This value governs any + normalization or special processing that might take + place on an element. Possible values are 'generic' + (basic string) and 'year' (a range is computed if + multiple years are found in the record). Note: This + list is likely to grow in the future. + + + + + brief + + + If this is set to 'yes', then the data element is + included in brief records in the webservice API. Note + that this only makes sense for metadata elements that + are merged (see below). The default value is 'no'. + + + + + sortkey + + + Specifies that this data element is to be used for + sorting. The possible values are 'numeric' (numeric + value), 'skiparticle' (string; skip common, leading + articles), and 'no' (no sorting). The default value is + 'no'. + + + + + rank + + + Specifies that this element is to be used to help rank + records against the user's query (when ranking is + requested). The value is an integer, used as a + multiplier against the basic TF*IDF score. A value of + 1 is the base; higher values give additional weight to + elements of this type. The default is '0', which + excludes this element from the rank calculation. + + + + + termlist + + + Specifies that this element is to be used as a + termlist, or browse facet. Values are tabulated from + incoming records, and a ranked list of values (with + their associated frequency) is made available to the + client through the webservice API. The possible values + are 'yes' and 'no' (default). 
+ + + + + merge + + + This governs whether, and how, elements are extracted + from individual records and merged into cluster + records. The possible values are: 'unique' (include + all unique elements), 'longest' (include only the + longest element, by strlen), 'range' (calculate a range + of values across all matching records), 'all' (include + all elements), or 'no' (don't merge; this is the + default). + + + + + + + + + + + + + + - OPTIONS - - + EXAMPLE + Below is a working example configuration: + + + + + + + + + + - EXAMPLES - + + + + + + + + + + + +]]> + - FILES - + TARGET SETTINGS + + Pazpar2 features a cunning scheme by which you can associate various + kinds of attributes, or settings, with search targets. This is done + through XML files; each file can associate one or more settings + with one or more targets. The file format is generic in nature, + designed to support a wide range of application requirements. A + setting can be purely technical, such as how to perform a title + search against a given target, or it can associate arbitrary name=value + pairs with groups of targets -- for instance, if you would like to + place all commercial full-text bases in one group for selection + purposes, or you would like to control what targets are accessible + to users by default. + + + + During startup, pazpar2 will recursively read a specified directory + (which can be identified in the pazpar2.cfg file or on the command line), and + process any settings files found therein. + + + SETTINGS FILE FORMAT + + Each file contains a root element named <settings>. It may + contain one or more <set> elements. The settings and set + elements may contain the following attributes. Attributes in a set + element override those in the settings root element. Each set node must + specify (directly, or inherited from the parent node) at least a + target, name, and value. + + + + + target + + + This specifies the search target to which this setting should be + applied. 
Targets are identified by their Z39.50 URL, generally + including the host, port, and database name (e.g. + bagel.indexdata.com:210/marc). Two wildcard forms are accepted: + * (asterisk) matches all known targets; + bagel.indexdata.com:210/* matches all known databases on the given + host. + + + A precedence system determines what happens if there are + overlapping values for the same setting name for the same + target. A setting for a specific target name overrides a + setting which specifies the target using a wildcard. This makes it + easy to set defaults for all targets, and then override them + for specific targets or hosts. If there are + multiple overlapping settings with the same name and target + value, the 'precedence' attribute determines what happens. + + + + + name + + + The name of the setting. This can be anything you like. + However, pazpar2 reserves a number of setting names for + specific purposes, all starting with 'pz:', and it is a good + idea to avoid that prefix if you make up your own setting + names. See below for a list of reserved variables. + + + + + value + + + The value of the setting. Generally, this can be anything you + want -- however, some of the reserved settings may expect + specific kinds of values. + + + + + precedence + + + This should be an integer. If not provided, the default value + is 0. If two (or more) settings have the same content for + target and name, the precedence value determines the outcome. + If both settings have the same precedence value, they are both + applied to the target(s). If one has a higher value, then the + value of that setting is applied, and the other one is ignored. + + + + + + + By setting defaults for target, name, or value in the root + settings node, you can use the settings files in many different + ways. For instance, you can use a single file to set defaults for + many different settings, like search fields, retrieval syntaxes, + etc. 
You can have one file per server, which groups settings for + that server or target. You could also have one file which associates + a number of targets with a given setting, for instance, to associate + many databases with a given category or class that makes sense + within your application. + + + + + RESERVED SETTING NAMES + + The following setting names are reserved by pazpar2 to control the + behavior of the client function. + + + + + pz:cclmap:xxx + + + This establishes a CCL field definition or other setting, for + the purpose of mapping end-user queries. XXX is the field or + setting name, and the value of the setting provides parameters + (e.g. parameters to send to the server, etc.). Please consult + the YAZ manual for a full overview of the many capabilities of + the powerful and flexible CCL parser. + + + Note that it is easy to establish a set of default parameters, + and then override them individually for a given target. + + + + + pz:requestsyntax + + + This specifies the record syntax to use when requesting + records from a given server. The value can be a symbolic name like + marc21 or xml, or it can be a Z39.50-style dot-separated OID. + + + + + pz:elements + + + The element set name to be used when retrieving records from a + server. + + + + + pz:piggyback + + + Piggybacking enables the client to retrieve records from the + server as part of the search response in Z39.50. Almost all + servers support this (or fail it gracefully), but a few + servers will produce undesirable results. + Set to '1' to enable piggybacking, '0' to disable it. Default + is 1 (piggybacking enabled). + + + + + pz:nativesyntax + + + The representation of the retrieval records. Currently + recognized values are iso2709 and xml. + + + + + pz:encoding + + + The native encoding of retrieval records. Can be anything + recognized by conv, but typical values are marc8 and latin1. + The default is UTF-8. 
+ + + + + pz:xslt + + + Provides the path of an XSLT stylesheet which will be used to + map incoming records to the internal representation. + + + + + pz:authentication + + + Sets an authentication string for a given server. See the section on + authorization and authentication for discussion. + + + + + pz:allow + + + Allows or denies access to the resources it is applied to. Possible + values are '0' and '1'. The default is '1' (allow access to this resource). + See the manual section on authorization and authentication for discussion + about how to use this setting. + + + + + pz:id + + + This setting can't be 'set' -- it contains the ID (normally + ZURL) for a given target, and is useful for filtering -- + specifically when you want to select one or more specific + targets in the search command. + + + + + + -
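
The settings file format and the wildcard precedence rule described above can be illustrated with a small sketch. This is not taken from the pazpar2 distribution: it assumes that set elements are written as empty elements carrying their attributes, that no namespace declaration is needed on the settings root, and that 'category' is a made-up application setting name. The target URL reuses the bagel.indexdata.com:210/marc example from the text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical settings file; the root-level target="*" is
     inherited by every set element that does not give its own. -->
<settings target="*">
  <!-- Defaults applied to all known targets -->
  <set name="pz:requestsyntax" value="marc21"/>
  <set name="pz:nativesyntax" value="iso2709"/>
  <set name="pz:piggyback" value="1"/>

  <!-- A specific target overrides the wildcard default above -->
  <set target="bagel.indexdata.com:210/marc"
       name="pz:piggyback" value="0"/>

  <!-- Made-up application setting (no 'pz:' prefix), applied to
       all known databases on one host -->
  <set target="bagel.indexdata.com:210/*"
       name="category" value="fulltext"/>
</settings>
```

With a file like this, every target would request marc21 records with piggybacking enabled, except bagel.indexdata.com:210/marc, where the more specific setting disables piggybacking. A 'precedence' attribute would only be needed if two settings shared both the same target and the same name.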