X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fzebrasrv.xml;fp=doc%2Fzebrasrv.xml;h=43c45c465fedb66fd5e1a28669e1a9b9d3d4c489;hb=7b25277add2aae5caabee02213911aeeb65030c8;hp=0000000000000000000000000000000000000000;hpb=7c4f374dbf6bc1a32ff560389c05a2252eaa28ce;p=idzebra-moved-to-github.git diff --git a/doc/zebrasrv.xml b/doc/zebrasrv.xml new file mode 100644 index 0000000..43c45c4 --- /dev/null +++ b/doc/zebrasrv.xml @@ -0,0 +1,509 @@ + + %local; + + %entities; + + %common; +]> + + + + ZEBRA + &version; + + + + zebrasrv + 8 + + + + zebrasrv + Zebra Server + + + + &zebrasrv-synopsis; + + DESCRIPTION + Zebra is a high-performance, general-purpose structured text indexing + and retrieval engine. It reads structured records in a variety of input + formats (e.g. email, XML, MARC) and allows access to them through exact + boolean search expressions and relevance-ranked free-text queries. + + + zebrasrv is the Z39.50 and SRU frontend + server for the Zebra search engine and indexer. + + + On Unix you can run the zebrasrv + server from the command line - and put it + in the background. It may also operate under the inet daemon. + On WIN32 you can run the server as a console application or + as a WIN32 Service. + + + + OPTIONS + + + The options for zebrasrv are the same + as those for YAZ' yaz-ztest. + Option -c specifies a Zebra configuration + file - if omitted zebra.cfg is read. + + + &zebrasrv-options; + + + + Z39.50 Protocol Support and Behavior + + + Z39.50 Initialization + + + During initialization, the server will negotiate to version 3 of the + Z39.50 protocol, and the option bits for Search, Present, Scan, + NamedResultSets, and concurrentOperations will be set, if requested by + the client. The maximum PDU size is negotiated down to a maximum of + 1 MB by default. + + + + + + Z39.50 Search + + + The supported query types are 1 and 101. 
All operators are currently + supported with the restriction that only proximity units of type "word" + are supported for the proximity operator. + Queries can be arbitrarily complex. + Named result sets are supported, and result sets can be used as operands + without limitations. + Searches may span multiple databases. + + + + The server has full support for piggy-backed retrieval (see + also the following section). + + + + + + Z39.50 Present + + The present facility is supported in a standard fashion. The requested + record syntax is matched against the ones supported by the profile of + each record retrieved. If no record syntax is given, SUTRS is the + default. The requested element set name, again, is matched against any + provided by the relevant record profiles. + + + + Z39.50 Scan + + The attribute combinations provided with the termListAndStartPoint are + processed in the same way as operands in a query (see above). + Currently, only the term and the globalOccurrences are returned with + the termInfo structure. + + + + Z39.50 Sort + + + Z39.50 specifies three different types of sort criteria. + Of these Zebra supports the attribute specification type in which + case the use attribute specifies the "Sort register". + Sort registers are created for those fields that are of type "sort" in + the default.idx file. + The corresponding character mapping file in default.idx specifies the + ordinal of each character used in the actual sort. + + + + Z39.50 allows the client to specify sorting on one or more input + result sets and one output result set. + Zebra supports sorting on one result set only which may or may not + be the same as the output result set. + + + + Z39.50 Close + + If a Close PDU is received, the server will respond with a Close PDU + with reason=FINISHED, no matter which protocol version was negotiated + during initialization. 
If the protocol version is 3 or more, the + server will generate a Close PDU under certain circumstances, + including a session timeout (60 minutes by default), and certain kinds of + protocol errors. Once a Close PDU has been sent, the protocol + association is considered broken, and the transport connection will be + closed immediately upon receipt of further data, or following a short + timeout. + + + + + Z39.50 Explain + + Zebra maintains a "classic" + Z39.50 Explain database + on the side. + This database is called IR-Explain-1 and can be + searched using the attribute set exp-1. + + + The records in the explain database are of type + grs.sgml. + The root element for the Explain grs.sgml records is + explain, thus + explain.abs is used for indexing. + + + + Zebra must be able to locate + explain.abs in order to index the Explain + records properly. Zebra will work without it, but the information + will not be searchable. + + + + + + The SRU Server + + In addition to Z39.50, Zebra supports the more recent and + web-friendly IR protocol SRU. + SRU can be carried over SOAP or a REST-like protocol + that uses HTTP GET or POST to request search responses. The request + itself is made of parameters such as + query, + startRecord, + maximumRecords + and + recordSchema; + the response is an XML document containing hit-count, result-set + records, diagnostics, etc. SRU can be thought of as a re-casting + of Z39.50 semantics in web-friendly terms; or as a standardisation + of the ad-hoc query parameters used by search engines such as Google + and AltaVista; or as a superset of A9's OpenSearch (which it + predates). + + + Zebra supports Z39.50, SRU GET, SRU POST, SRU SOAP (SRW) + - on the same port, recognising what protocol is used by each incoming + request and handling it accordingly. This is achieved through + the use of Deep Magic; civilians are warned not to stand too close. 
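As a rough illustration of the request parameters named above, the following Python sketch assembles an SRU GET URL for a searchRetrieve request. The helper name sru_url, the base URL http://localhost:9999/Default and the sample query are assumptions made for the example; only the parameter names come from the SRU protocol itself.

```python
from urllib.parse import urlencode


def sru_url(base, query, start_record=1, maximum_records=10,
            record_schema=None, version="1.1"):
    """Build an SRU searchRetrieve URL for an HTTP GET request.

    sru_url is a hypothetical helper for illustration; it is not part
    of Zebra or YAZ.
    """
    params = {
        "version": version,
        "operation": "searchRetrieve",
        "query": query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # recordSchema is optional, so only send it when one was requested.
    if record_schema is not None:
        params["recordSchema"] = record_schema
    # urlencode percent-escapes characters such as '=' and spaces in the query.
    return base + "?" + urlencode(params)


# Assumed local server address and database name.
url = sru_url("http://localhost:9999/Default", "title=utah", maximum_records=1)
print(url)
```

Building the URL with urlencode rather than by string concatenation keeps queries containing spaces, '=' or '*' safe to paste into a browser or pass to an HTTP client.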
+ + + Running zebrasrv as an SRU Server + + Because Zebra supports all protocols on one port, it would + seem to follow that the SRU server is run in the same way as + the Z39.50 server, as described above. This is true, but only in + an uninterestingly vacuous way: a Zebra server run in this manner + will indeed recognise and accept SRU requests; but since it + doesn't know how to handle the CQL queries that these protocols + use, all it can do is send failure responses. + + + + It is possible to cheat, by having SRU search Zebra with + a PQF query instead of CQL, using the + x-pquery + parameter instead of + query. + This is a + non-standard extension + of CQL, and a + very naughty + thing to do, but it does give you a way to see Zebra serving SRU + ``right out of the box''. If you start your favourite Zebra + server in the usual way, on port 9999, then you can send your web + browser to: + + + http://localhost:9999/Default?version=1.1 + &operation=searchRetrieve + &x-pquery=mineral + &startRecord=1 + &maximumRecords=1 + + + This will display the XML-formatted SRU response that includes the + first record in the result-set found by the query + mineral. (For clarity, the SRU URL is shown + here broken across lines, but the lines should be joined together + to make a single-line URL for the browser to submit.) + + + + In order to turn on Zebra's support for CQL queries, it's necessary + to have the YAZ generic front-end (which Zebra uses) translate them + into the Z39.50 Type-1 query format that is used internally. And + to do this, the generic front-end's own configuration file must be + used. See ; + the salient point for SRU support is that + zebrasrv + must be started with the + -f frontendConfigFile + option rather than the + -c zebraConfigFile + option, + and that the front-end configuration file must include both a + reference to the Zebra configuration file and the CQL-to-PQF + translator configuration file. 
+ + + A minimal front-end configuration file that does this would read as + follows: + + + <![CDATA[ + <yazgfs> + <server> + <config>zebra.cfg</config> + <cql2rpn>../../tab/pqf.properties</cql2rpn> + </server> + </yazgfs> +]]> + + The + <config> + element contains the name of the Zebra configuration file that was + previously specified by the + -c + command-line argument, and the + <cql2rpn> + element contains the name of the CQL properties file specifying how + various CQL indexes, relations, etc. are translated into Type-1 + queries. + + + A Zebra server running with such a configuration can then be + queried using proper, conformant SRU URLs with CQL queries: + + + http://localhost:9999/Default?version=1.1 + &operation=searchRetrieve + &query=title=utah and description=epicent* + &startRecord=1 + &maximumRecords=1 + + + + + SRU Protocol Support and Behavior + + Zebra running as an SRU server supports SRU version 1.1, including + CQL version 1.1. In particular, it provides support for the + following elements of the protocol. + + + + SRU Search and Retrieval + + Zebra supports the + SRU searchRetrieve + operation. + + + One of the great strengths of SRU is that it mandates a standard + query language, CQL, and that all conforming implementations can + therefore be trusted to correctly interpret the same queries. It + is with some shame, then, that we admit that Zebra also supports + an additional query language, our own Prefix Query Format + (PQF). + A PQF query is submitted by using the extension parameter + x-pquery, + in which case the + query + parameter must be omitted, which makes the request not valid SRU. + Please don't do this. + + + + + SRU Scan + + Zebra supports the SRU scan + operation. + Scanning using CQL syntax is the default, where the + standard scanClause parameter is used. + + + In addition, a + mutant form of SRU scan is supported, using + the non-standard x-pScanClause parameter in + place of the standard scanClause to scan on a + PQF query clause. + + + + + SRU Explain + + Zebra supports SRU explain. 
+ + + The ZeeRex record explaining a database may be requested either + with a fully fledged SRU request (with + operation=explain + and version-number specified) + or with a simple HTTP GET at the server's basename. + The ZeeRex record returned in response is the one embedded + in the YAZ Frontend Server configuration file that is described in the + . + + + Unfortunately, the data found in the + CQL-to-PQF text file must be added by hand to the explain + section of the YAZ Frontend Server configuration file to be able + to provide a suitable explain record. + Too bad, but this is all extremely + new alpha stuff, and a lot of work has yet to be done .. + + + There is no linkage whatsoever between the Z39.50 explain model + and the SRU explain response (well, at least not implemented + in Zebra, that is ..). Zebra does not provide a means using + Z39.50 to obtain the ZeeRex record. + + + + + Other SRU operations + + In the Z39.50 protocol, Initialization, Present, Sort and Close + are separate operations. In SRU, however, these operations do not + exist. + + + + + SRU has no explicit initialization handshake phase, but + commences immediately with searching, scanning and explain + operations. + + + + + Neither does SRU have a close operation, since the protocol is + stateless and each request is self-contained. (It is true that + multiple SRU request/response pairs may be implemented as + multiple HTTP request/response pairs over a single persistent + TCP/IP connection; but the closure of that connection is not a + protocol-level operation.) + + + + + Retrieval in SRU is part of the + searchRetrieve operation, in which a search + is submitted and the response includes a subset of the records + in the result set. There is no direct analogue of Z39.50's + Present operation which requests records from an established + result set. 
In SRU, this is achieved by sending a subsequent + searchRetrieve request with the query + cql.resultSetId=id where + id is the identifier of the previously + generated result-set. + + + + + Sorting in SRU is done within the + searchRetrieve operation - in v1.1, by an + explicit sort parameter, but the forthcoming + v1.2 or v2.0 will most likely use an extension of the query + language, CQL sorting. + + + + + It can be seen, then, that while Zebra operating as an SRU server + does not provide the same set of operations as when operating as a + Z39.50 server, it does provide equivalent functionality. + + + + + + SRU Examples + + Surf into http://localhost:9999 + to get an explain response, or use + + + + See number of hits for a query + + + + Fetch records 5-7 in Dublin Core format + + + + Even search using PQF queries using the extended naughty + verb x-pquery + + + + Or scan indexes using the extended extremely naughty + verb x-pScanClause + + Don't do this in production code! + But it's a great fast debugging aid. + + + + + YAZ server virtual hosts + &zebrasrv-virtual; + + + SEE ALSO + + + zebraidx + 1 + + + + + +
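Returning to the "number of hits" case in the SRU Examples section: the sketch below shows one way a client might read the hit count out of a searchRetrieve response. The function name hit_count is hypothetical, and the sample document is hand-written for illustration rather than captured from a Zebra server; it assumes the SRU 1.1 response namespace http://www.loc.gov/zing/srw/.

```python
import xml.etree.ElementTree as ET

# Namespace used by SRU 1.1 searchRetrieve responses.
SRW_NS = "http://www.loc.gov/zing/srw/"


def hit_count(response_xml):
    """Return the numberOfRecords value of an SRU response, or None.

    hit_count is a hypothetical helper for illustration only.
    """
    root = ET.fromstring(response_xml)
    # numberOfRecords is a direct child of the response root element.
    node = root.find("{%s}numberOfRecords" % SRW_NS)
    return int(node.text) if node is not None else None


# Hand-written sample response, not real server output.
sample = """<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>3</zs:numberOfRecords>
</zs:searchRetrieveResponse>"""

print(hit_count(sample))
```

In practice the response text would come from an HTTP GET against the server; a request with maximumRecords=0 is a common way to ask for the count alone, though whether a given server honours that is an assumption to verify.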