Honor position attribute, i.e. allow first-in-field search. To

[idzebra-moved-to-github.git] / NEWS
diff --git a/NEWS b/NEWS

index 93615f1..e141a43 100644 (file)
--- a/NEWS
+++ b/NEWS
@@ -1,588 +1,51 @@
-Added mechanism to ignore leading articles when doing full-field indexing,
-based on the character map files. See the manual for further discussion.
+Honor position attribute, i.e. allow first-in-field search. To
+enable this, "firstinfield 1" must be given for an index in
+default.idx. Enabled in tab/default.idx for w. At this stage
+first-in field is only supported for phrase searches (including
+simple words).
  
-Fixed bug in record management. Releasing blocks could result in
-partial read.
+Common stream reader interface for record filters (struct ZebraRecStream).
  
-Fixed bug in isam:b. A tree split could result in a lost item.
+Debian package fix: packages idzebra-2.0 + libidzebra-2.0-modules did
+not depend properly on sub packages.
  
-Remove isamd. It's not been in use for a long time and isamb is better
-in most cases.
+Experimental segment facility (for matching of words within one
+field/segment).
  
-Change SYSNO to be zint. Change pointers in isamc and isamb to zint.
-Change block number in bfile/cfile to zint. zint is a long integer
-(64-bit). This change practially removes register limits for Zebra.
+--- 2.0.0 2006/08/14
  
-Implement int-list encoding for ISAMs.
+New record filter (record type) 'alvis' which uses XSLT transformations
+to drive both indexing as well as retrieval. See example configuration in the
+'example/alvis-oai' directory.
  
-Added facility to make attibutes in grs.regx and grs.tcl filter using the
-data command with argument -attribute <name> . The content of data is
-the value of the attribute. This command should be used inside a
-begin element , end element section.
+'isamb' is now the default ISAM system. In Zebra 1.3, the default ISAM was
+'isamc'. The type used can still be configured with the 'isam' setting
+in 'zebra.cfg'.
  
-Update zebra.nsi to NSIS 2.
+Index structure is now 64-bit based, also on 32 bit systems. 
+There are no more 2GB register file limits.
  
-Added a new 'cut' directive to charmaps (.chr files) which specifies that
-only characters after the cutting char should be indexed.
+Extended search result tuning. Approximate limit for terms can be enabled
+and specified with attribute 11. The (approx or exact) hit count is returned
+as part of the search response as in 1.3 series. The subqueryID of a search
+term hit count can be specified with attribute 10.
  
-Update Perl internals so that it matches the current Zebra API.
-The recordGroup structure is no longer available. A group of resources
-can still be referenced by setting groupName=>..  in various methods.
+Zebra uses string attributes for indexing internally. Using set+numeric
+use attribute can still be used. This is a search-only conversion which
+inspects '*.att'-set files as indicated using attset-directives in 'zebra.cfg'.
+'attset' references are no longer required, but when used they deserve
+as "check" for that the index names used are also present in '*.att'.
  
-Maximum number of records to be sorted in a result set can be
-specified by setting "sortmax". Default is 1000.
+Zebra record filters (record type handlers) may be built as loadable
+modules (.so's) on Unix. In particular the Zebra 2.0 Debian package uses
+separate packages for each of them. This also means that zebra programs
+such as zebraidx is no longer depending on Tcl/other..
  
-Allow use of string use attributes for regular attribute sets. The
-name matches the name given in the attribute set file. All strings
-starting with / are considered X-Path as usual.
+Documentation updates, especially on query structure and syntax, SRU, 
+XSLT support, alvis filter module, and many added examples.
  
-Fixed bug in grs.regx. filter . 'end element' could pop off top tag
-element for XML tree. It may only pop off if -record is given.
+Improved logging of the 'zebrasrv' and 'zebraidx' binaries.
  
-Added grs.danbib filter - for Danish Bibliographic Centre.
+Improved debian package structure.
  
-Rename CHANGELOG to NEWS.
-
-For text filter, return only header if elementSetName=H . elementSetName=R
-returns contents only. Other elementSetName returns both header+content.
-
-Added test for charmap and rusmarc.
-
-Added feature charmaps (.chr) so that characters may be specified in
-\LXXXX HEX notation.
-
-Fixed problem with encoding directive for charmap(.chr) files.
-
-Allow Remote insert/delete/replace/update with record, recordIdNumber
-(sysno) and/or recordIdOpaque(user supplied record Id). If both
-IDs are omitted internal record ID match is assumed (recordId: - in
-zebra cfg).
-
---- 1.3.15 2004/01/15
-
-Fix bug. X-Path attribute expressions with spaces in them now works.
-
-Fix base address for MARC output.
-
---- 1.3.14 2003/11/29
-
-Fix bug with shadow and result set handling.
-
-Implement MARCXML to ISO2709 conversion.
-
---- 1.3.13 2003/09/26
-
-Add missing examples for Windows install.
-
-Fix bug in regx filter to make it "greedy" again. This bug appeared
-in version 1.3.12.
-
-Fix a few tests.
-
---- 1.3.12 2003/09/08
-
-Fix XML error handling. Stop XML parse immediately if XML parse error
-occur (i.e.  produce one error only).
-
-Zebra ignores "unsupported use attribute" for individual databases
-when search multiple databases (unless all databases fail).
-
-New filter grs.marcxml which works like grs.marc but produces MARCXML.
-
-Added support for database deletion. It is possible to create/drop
-a database from zebraidx utility. Note: only for isam:b.
-
-Write zebrasrv.pid to lockdir.
-
-Bug fix: result sets were not recovered correctly. Had to
-add ODR handle for zebra_search_RPN in order to make it work.
-
-Fixed a bug in regx filters that didn't do anchors (^) correctly.
-
-Fixed a bug in searches with X-Path searches sometimes giving "extra"
-hits.
-
-Zebra server checks for zebrasrv.pid and refuses to start if it is already
-locked by another (running) zebrasrv.
-
-Fixed a bug with text being chunked in pieces for the grs.xml filter.
-
---- 1.3.11 2003/04/25
-
-xelm code updates. xelm works regardless state of 'xpath enable/disable'
-Avoid -L/usr/lib since that is already default library path.
-
-Allow multiple updates within one transaction.
-
-Fixed a bug with >2GB files (overflow in integer expression).
-
---- 1.3.10 2003/04/01
-
-Fix linker error for Perl module.
-
-Fix bug in and operation which in some cases could result in "extra"
-hits. Bug was introduced in 1.3.5.
-
-Fix bug in handling of schema conversion when producing numeric tags.
-
---- 1.3.9 2003/03/27
-
-Zvrank updates. 
-
-Add missing files doc/zvrank.txt and doc/marc_indexing.xml.
-
---- 1.3.8 2003/03/26
-
-Zvrank: an experimental ranking algorithm. See doc/zvrank.txt and
-source in index/zvrank.c. Enable this by using rank: zvrank in zebra.cfg.
-Contributed by Johannes Leveling <Johannes.Leveling at fernuni-hagen.de>
-
-livrank: another experimental ranking algorithm. Source in livcode.c.
-Enable this by using rank: livrank in zebra.cfg and use -DLIV_CODE=1
-for CFLAGS.
-Contributed by Pete Mallinson, University of Liverpool. 
-
-Advanced MARC indexing. See doc/marc_indexing.xml
- Oleg Kolobov <oleg at lib.tpu.ru>
-
-Perl API updates and fixes. 
- Peter Popovics <pop at technomat.hu>
-
-Fixed 'zebraidx delete'.
-
-Implemented 'zebraidx clean'.
-
-64-bit offsets for register files on WIN32 (no 2 GB limit).
-
-Fixed a few memory leaks WRT sorting.
-
---- 1.3.7 2003/02/27
-
-Fixed error handling : error code was not properly returned.
-
-Support Truncation 104 (CCL).
-
---- 1.3.6 2003/02/25
-
-Added missing source files for perl extension.
-
---- 1.3.5 2003/02/23
-
-Implemented xelm directive.
-
-Updated for newer version of YAZ (introduction of string schema).
-
-Directory examples/zthes now part of distribution (was missing
-in previous release).
-
-New .abs directive, systag, that control where to put retrieval
-information. The directive takes two arguments: system tag, element name.
-System tag is one of : rank, sysno, size.
-
---- 1.3.4 2002/11/26
-
-Perl Filter and Perl API. By Peter Popovics.
-
-For zebra.cfg, if no profilePath is specified, directory
- (prefix)/share/idzebra/tab
-is used.
-
-Zebra Examples in examples . Zebra tests in test.
-
-Bug fix: sort index was not properly modified on 
-record updates/deletes.
-
-Fix handling of character entities for sgml filter.
-
-Move data1 to Zebra (used to be part of YAZ).
-
---- 1.3.3 2002/10/05
-
-Fix character encoding of scan response terms.
-
-Fix character decoding of scan request terms.
-
-Fix ESpec handling (requires YAZ 1.9.1)
-
-Fix searches for complete fields.
-
---- 1.3.2 2002/09/09
-
-When name zebra is used in a filename or directory 'idzebra' is used
-instead to avoid confusion with GNU zebra (routing software).
-
-Zebra server stops with a fatal error if config file cannot be read.
-
-New config setting, followLinks, that controls whether update of files
-should follow symbolic. Set it to 1 (for enable) or 0 (to disable).
-By default symbolic links are followed.
-
-Fix MARC transfer . MARC fields had wrong data for multiple fields.
-
-XML record reader moved from YAZ to Zebra, to make YAZ less 
-dependant on external libraries.
-
-Zebra uses yaz_iconv which is mini iconv library supporting UTF-8,
-UCS4, ISO-8859-1. This means that Zebra does UNICODE even
-on systems that doesn't offer iconv.
-
-XML record reader supports external system entities.
-
---- 1.3.1 2002/08/20
-
-New .abs-directive "xpath" that takes one argument: "enable"
-or "disable" to enable and disable XPath -indexing. If no "xpath"
-direcive is found in .abs-file , XPath-indexing is disabled to ensure
-backwards compatibility. For missing .abs-files XPath-indexing is
-enabled so that such records are searchable.
-
-Zebra warns about missing .abs-file only once (for each type).
-
-Fixed a bug in file update where already-inserted files could
-be treated as "new".
-
---- 1.3.0 2002/08/05
-
-Zebra license changed to GNU GPL.
-
-XPath-like queries used when RPN string attributes are used, eg.
-   @attr 1=/portal/title sometitle
-   @attr 1=/portal/title[@xml:lang=da] danishtitle
-   @attr 1=/portal/title/@xml:lang da
-   @attr 1=//title sometitle
-
-Zebra uses UTF-8 internally:
-1) New setting "encoding" for zebra.cfg that specifies encoding for
-OCTET terms in queries and record encoding for most transfer syntaxes
-(except those that use International Strings, such as GRS-1).
-2) The encoding of International strings is UTF-8 by default. It
-may be changed by character set negotiation. If character set
-negotiation is in effect and if records are selected for conversion
-these'll be converted to the selected character set - thus overriding
-the encoding setting in zebra.cfg.
-3) New directive "encoding" in .abs-files. This specifies the external
-character encoding for files indexed by zebra. However, if records
-themselves have an XML header that specifies and encoding that'll be used
-instead.
-
-XML filter (-t grs.xml).
-
-Multiple registers. New setting in resource 'root' that holds base
-directory for register(s). A group a databases may be put in separate
-register in directory root/reg by using db name 'reg/db1' ... 'reg/dbN'.
-
---- 1.1.1 2002/03/21
-
-Fixes for Digital Unix
-
-Implemented hits per term using USR:SearchResult-1.
-
-New Zebra API. Locking system re-implemented.
-
---- 1.1.stable 2002/02/20
-
-Rank weight can be controlled with attribute type 9. Default
-value is 34. Recommended values between 1-36.
-
---- 1.1 2001/10/25
-
-Updated for YAZ version 1.8.
-
-Added support for termsets - a result set of terms matching
-a given query. For @attr 8=<set> creates termset named <set>.
-
-Added support for raw retrieval. Element Set Name R forces the
-text filter which returns the record in its original form.
-
-Added numerical sort - triggered by structure=numeric (4=109).
-
-Remote record import using Z39.50 Extended Services and Segments.
-
-Fixed bug where updating a database with user-defined attributes
-could corrupt the register (bad storeKeys).
-
-Multi-threaded version.
-
-Fixed bug regarding proximity.
-
-Documentation updates.
-
-Fixed bug in record retrieval module that occured on 64-bit OSF 
-architectures.
-
---- 1.0.1 2000/2/10
-
-Fixed bug in makefile for WIN32.
-
-Fixed bug in configure script - used bash-specific features.
-
---- 1.0 1999/12/10
-
-Added support for multiple records in one file for filter grs.sgml.
-
-Changed record index structure. New layout is incompatible with
-previous releases. Added setting "recordcompression" to control
-compression of records. Possible values are "none" (no
-compression) and bzip2 (compression using libbz2).
-
-Added XML transfer syntax support for retrieval of structured records.
-Schema in CompSpec is recognised in retrieval of structured records.
-
-Changed Tcl record filter so that it attemps to read  <filt>.tflt. If
-that fails, the filter reads the file <filt>.flt (regx style filter).
-
-Implemented new Tcl record filter -  use grs.tcl.<filter> to enable it.
-Zebra's configure script automatically attempts to locate Tcl. For
-manual Tcl configuration use option --with-tclconfig=<path> to specify
-where Tcl's library files are located.
-
-Implemented "compression" of Dictionary and ISAM system. Dictionary
-format HAS changed.
-
-Added "tagsysno" directive to zebra.cfg to control under which tag the
-system ID is placed. Use tagsysno: 0 to disable Zebra's system number
-entirely.
-
-Added "tagrank" as above.
-
-Changed file naming scheme for register files from <name>.mf.<no> to
-<name>-<no>.mf.
-
-Implemented "position"-flag for register type (as defined in
-default.idx). When set to zero no position (or seqence number) is
-saved in register for each word occurrence, thus saving some register
-space.
-
-Implemented database mapping. Using mapdb one can specify a database
-to be mapped to one or more physical databases. Usage:
-mapdb <fromdb> <todb> ..
-
-Added SOIF-filter. Thanks to Peter Valkenburg.
-
-For the regx-filter "end element -record" may trigger a mark-of-record
-if outer level is reached.
-
-Tag sets may be typed in the reference to it. From the .abs-file the
-"tagset" directive takes a third optional integer type for the tag set
-referenced. From a .tag-file the "include" directive takes a third
-optional type as well. The old "type" directive in the tag set itself
-is still recognized but acts as the default type for the tag set.
- 
-Zebra supports the specification of arbitrary attributes sets, schemas
-and tag sets, because of the change in YAZ' OID management system.
-
-Fixed bug in Sort that caused it NOT to use character mapping as it
-should.
-
-Zebra now uses GNU configure to generate Makefile(s).
-
-Added un-optimised support for left and left/right truncation attributes.
-
-Added support for relational operators on text when using RPN queries.
-
-Added support for sort specifications in RPN queries. Type 7 specifies
-'sort' where value 1=ascending, value 2=descending. The use attribute
-specifies the field criteria as usual.  The term specifies priority
-where 0=first, 1=second, ...
-
-Changed the way use attributes are specified in the recordId
-specification.
-
-Maximum number of databases in one Zebra register increased.
-
-New setting, databasePath, which specifies that first directory during
-update traversal is the database name (instead of a fixed one).
-
-New setting, explainDatabase, which specifies that databases are
-EXPLAIN aware.
-
-Modified Zebra so that it works with ASN.1 compiled code for YAZ.
-
-Implemented EXPLAIN database maintenance. Zebra automatically
-generate - and update CategoryList, TargetInfo, DatabaseInfo,
-AttributeSetInfo and AttributeDetails records at this stage. The
-records may be transferred as GRS-1, SUTRS or Explain.
-
-Fixed register spec so that colon isn't treated as size separator
-unless followed by [0-9+-] in order to allow DOS drive specifications.
-
-Fixed two bugs in ISAMC system.
-
-Changed the way Zebra keeps its maintenance information about attribute
-sets, available attributes, etc.. Records in "SGML" notation using an
-EXPLAIN schema is now used when appropriate.
-
-Bug fix: Index didn't handle update/insert/delete of the same record
-(i.e. same recordId) in one run (one invocation of zebraidx). Only the
-first occurence of a record is considered.
-
-Most searches now return correct number of hits.
-
-New modular ranking system. Interested programmers are encouraged to
-inspect rank1.c and improve the algorithm.
-
-Bug fix: Lock files weren't removed as they should on NT.
-
-Implemented Z39.50 Sort. Zebra's sort handler uses use attributes to
-specify a "sort register". Refer to the gils sample records which refer
-to index type "s" which is specified as "sort" in the default.idx file.
-Each sort criteria can either be Ascending or Descending and at most
-three sort elements can be specified.  
-
-Bug fix: Character mapping didn't work for text files.
-
---- 1.0b1 1998/1/29
-
-Simple ranked searches now return correct number of hits.
-
-The test option (-s) only makes a read-lock on the index as well
-as using read-only operations anywhere.
-
-Moved towards generic character mapping. Configuration file default.idx
-specifies character map files for register types w, p, u, etc.
-
-Implemented "begin variant" for the sgml.regx - filter.
-
-Fixed a few memory leaks.
-
-Added support for C++, headers uses extern "C" for public definitions.
-
-Bug fix: The show records facility (-s) only displayed information for
-the first record in a file (and not for every record in the file).
-
-Added option "-f <n>" to limit the logging of record operations. After
-<n> records has been processed no logging is performed (unless errors
-occur).
-
-Bug fix: the compressed ISAM system didn't handle update operations
-correctly.
-
-Added setting, "maxResultSetSize", to hold the number of records to 
-save in a result set.
-
-Bug fix: Complete phrase did't work for search operations.
-
-Bug fix: temporary result sets weren't deleted.
-
-Reduced disk space for saved keys (storeKeys = 1).
-
-Added optional, physical ANY (key replication)
-
-Implemented proximity operator in search.
-
-Bug fix: the path name buffers used by file match traversal routines
-have been extended to support long file names.
-
-New C(ompressed) ISAM system. To enable it, specify "isam: c" in the
-configuration file. The resulting register without "storeKeys" is about
-half the size, and the memory used by zebraidx during phase 2 (merge) is
-reduced to a minimum.
-
-Reworked the way Regexp-2 queries with error tolerance are handled and
-specified. The documentation has been updated accordingly.
-
-Bug fix: Zebrasrv didn't search correctly when queries contained masking
-characters. This bug was introduced in 1.0a8.
-
-Zebrasrv now tag records with the proper database name.
-
-New settings, memMax and keyTmpDir.
-
-Changed name of setting lockDir (previously called lockPath) and
-setTmpDir (previously called tempSetPath).
-
-Generalized and changed record type specifications. In short, there are:
-       text                plain SUTRS
-       grs.sgml            structured, "SGML-like" syntax
-       grs.regx.<filter>   structured, Regular expression filter
-       grs.marc.<abs>      Reads *MARC records in the ISO2709 format. <abs>
-                           is the name of an abstract syntax file.                           
-Bug fix: Result sets weren't sorted in operations involving boolean
-operations with "ranked" operands.
-
---- 1.0a8 1996/6/6
-
-Added national character-handling subsystem.
-
-Various fixes.
-
-Small modifications to input filters and profiles.
-
-Added support for SOIF syntax (with private OID).
-
---- 1.0a7 1996/5/16
-
-Fixed buffer-size problem in indexing.
-
-Added compression to temporary files for updating.
-
-Added phrase registers.
-
-Added dynamic mapping of search attribute to multiple termlists (ANY).
-
-Scan support in multiple databases/registers.
-
-Configuration settings are case-insensitive and single dash (-)
-characters are ignored in comparisons.
-
-The index processing ignores empty files - warning given.
-
-New option to zebraidx (-V) displays version information.
-
---- 1.0a6 1996/2/24
-
-Fixed problem in file-update system.
-
-Fixed problem in shadow system; register was sometimes corrupted after
-a commit operation.
-
---- 1.0a5 1996/2/10
-
-Fixed problems in the ISAM subsystem. Caused difficulties when updating
-existing registers.
-
-Fixed small problem in SUTRS-filter. A newline was sometimes inserted before
-the rank and record number.
-
-Fixed bug in the isam subsystem - caused a malfunction when accessing
-words which occurred more than 10000 times.
-
-Distribution should now include YAZ (Z39.50 protocol stack) to simplify
-installation.
-
-Server can now run under inetd. Use option -i, and -w <directory> to
-set working directory to desired location.
-
-New zebraidx command: clean - removes temporary shadow files.
-
-Fixed bug in ISAM system. Occurred rarely during register updates.
-
-Logging during index merge phase is improved. The remaining running
-time is estimated.
-
-Temporary files generated by zebraidx are removed after each run.
-
-Bug fix: Dictionary didn't handle 8-bit characters correctly; was obvious
-when doing scan operations in dictionaries with European characters.
-
---- 1.0a4 1996/01/11
-
-A whole slew of updates, to make the first publicized release. Get the doc
-and check it out.
-
---- 1.0a3 1995/12/06
-
-Memory-problems in ISAM fixed. More blocktypes added to the default setup
-to increase performance on larger databases.
-
-Various minor changes in data management system.
-
---- 1.0a2 1995/12/05
-
-A couple of portability-problems resolved.
-
-Changed some malloc() to xmalloc().
-
---- 1.0a1 1995/11/28
-
-First release.
+--- 1.3.16 2004/08/16