org.marc4j
Class MarcPermissiveStreamReader

java.lang.Object
  extended by org.marc4j.MarcPermissiveStreamReader
All Implemented Interfaces:
MarcReader

public class MarcPermissiveStreamReader
extends Object
implements MarcReader

An iterator over a collection of MARC records in ISO 2709 format that is designed to handle records with errors in their structure or encoding. If the permissive flag is set in the call to the constructor, or if an ErrorHandler object is passed to the constructor, this reader does its best to detect and recover from a number of structural or encoding errors that can occur in a MARC record. If the reader is not set to read permissively, it operates almost identically to the MarcStreamReader class.

No attempt is made to validate the contents of the record at a semantic level. This reader does not know and does not care whether the record has a 245 field, or whether the 008 field is the right length. However, if the record claims to be UTF-8 or MARC8 encoded and you are seeing gibberish in the output, or if the reader throws an exception while trying to read a record, then this reader may be able to produce a usable record from the bad data you have.

The ability to translate the record to UTF-8 as it is read is useful when the UTF-8 version of the record will be used directly by the program reading the MARC data, for instance if the MARC records are to be indexed into a Solr search engine. Previously, a MARC record could only be translated to UTF-8 as it was being written out via a MarcStreamWriter or a MarcXmlWriter.

Example usage:

 InputStream input = new FileInputStream("file.mrc");
 MarcReader reader = new MarcPermissiveStreamReader(input, true, true);
 while (reader.hasNext()) {
     Record record = reader.next();
     // Process record
 }
 

See the org.marc4j.marc package for examples of how to use the Record object model. See the file org.marc4j.samples.PermissiveReaderExample.java for an example of using the MarcPermissiveStreamReader together with the ErrorHandler class to report errors encountered while processing records.
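
To make the phrase "errors in their structure" more concrete, the following is a minimal, standalone sketch of the kind of check a reader could apply to a raw ISO 2709 record. It relies only on facts of the format: a 24-byte leader, the record length in leader positions 00-04, the base address of data in positions 12-16, and the record terminator byte 0x1D. The class name Iso2709LeaderCheck and its methods are hypothetical and are not part of marc4j; this is not how MarcPermissiveStreamReader itself detects or repairs errors.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the marc4j API.
final class Iso2709LeaderCheck {
    static final int LEADER_LENGTH = 24;
    static final byte RECORD_TERMINATOR = 0x1D;

    /** Parses five ASCII digits starting at offset, or returns -1 if any is not a digit. */
    static int parseFiveDigits(byte[] record, int offset) {
        int value = 0;
        for (int i = offset; i < offset + 5; i++) {
            int digit = record[i] - '0';
            if (digit < 0 || digit > 9) {
                return -1;
            }
            value = value * 10 + digit;
        }
        return value;
    }

    /** Returns a description of the first structural problem found, or null if none is found. */
    static String findStructuralProblem(byte[] record) {
        if (record.length < LEADER_LENGTH) {
            return "record is shorter than the 24-byte leader";
        }
        int declaredLength = parseFiveDigits(record, 0);
        if (declaredLength < 0) {
            return "record length (leader 00-04) is not numeric";
        }
        if (declaredLength != record.length) {
            return "declared length " + declaredLength
                    + " does not match actual length " + record.length;
        }
        int baseAddress = parseFiveDigits(record, 12);
        if (baseAddress < LEADER_LENGTH || baseAddress >= record.length) {
            return "base address of data (leader 12-16) is not numeric or out of range";
        }
        if (record[record.length - 1] != RECORD_TERMINATOR) {
            return "record does not end with the record terminator 0x1D";
        }
        return null;
    }

    public static void main(String[] args) {
        // A fragment that is far too short to be a record.
        byte[] tooShort = "00026nam".getBytes(StandardCharsets.US_ASCII);
        System.out.println(findStructuralProblem(tooShort));
    }
}
```

A strict reader would typically fail on any of these conditions, whereas a permissive reader would try to recover and carry on with the next record.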

When no encoding is given as a constructor argument, the parser tries to resolve the encoding by looking at the character coding scheme (leader position 9) in MARC21 records. For UNIMARC records this position is not defined. If the reader is operating in permissive mode and no encoding is given as a constructor argument, the reader looks at the leader and also at the data of the record to determine, to the best of its ability, which character encoding scheme was used to encode the data in a particular MARC record.
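
The two-step idea described above (trust the leader first, then inspect the data) can be sketched as follows. In MARC21, leader position 9 holds 'a' for UCS/Unicode and a blank for MARC-8. The data check shown here simply tests whether the bytes decode as strict UTF-8; that is an assumption chosen for illustration, not the heuristic marc4j actually uses. The class EncodingGuessSketch, its methods, and the returned labels are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the marc4j implementation.
final class EncodingGuessSketch {

    /** Encoding declared by a MARC21 leader: 'a' at position 9 is Unicode, blank is MARC-8. */
    static String declaredEncoding(byte[] leader) {
        if (leader.length < 10) {
            throw new IllegalArgumentException("leader must contain at least 10 bytes");
        }
        byte scheme = leader[9];
        if (scheme == 'a') {
            return "UTF8";
        }
        if (scheme == ' ') {
            return "MARC8";
        }
        return "UNKNOWN";
    }

    /** Returns true if the bytes decode as UTF-8 with no malformed sequences. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns true if any byte has its high bit set. */
    static boolean hasNonAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) {
                return true;
            }
        }
        return false;
    }

    /** Combines the leader's claim with a look at the field data. */
    static String guessEncoding(byte[] leader, byte[] fieldData) {
        String declared = declaredEncoding(leader);
        if (!hasNonAscii(fieldData)) {
            return declared; // pure ASCII gives no evidence against the leader
        }
        if (isValidUtf8(fieldData)) {
            return "UTF8"; // multi-byte sequences that decode cleanly
        }
        // Simplification: data that is not valid UTF-8 is treated as MARC-8,
        // although it could be some other 8-bit encoding.
        return "UTF8".equals(declared) ? "MARC8" : declared;
    }
}
```

The point of the second step is that a record can claim one encoding in its leader while its data is actually in another, which is the situation permissive mode is meant to cope with.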

Version:
$Revision: 1.3 $
Author:
Robert Haschart

Constructor Summary
MarcPermissiveStreamReader(InputStream input, boolean permissive, boolean convertToUTF8)
          Constructs an instance with the specified input stream with possible additional functionality being enabled by setting permissive and/or convertToUTF8 to true.
MarcPermissiveStreamReader(InputStream input, boolean permissive, boolean convertToUTF8, String defaultEncoding)
          Constructs an instance with the specified input stream with possible additional functionality being enabled by setting permissive and/or convertToUTF8 to true.
MarcPermissiveStreamReader(InputStream input, ErrorHandler errors, boolean convertToUTF8)
          Constructs an instance with the specified input stream with possible additional functionality being enabled by passing in an ErrorHandler object and/or setting convertToUTF8 to true.
MarcPermissiveStreamReader(InputStream input, ErrorHandler errors, boolean convertToUTF8, String defaultEncoding)
          Constructs an instance with the specified input stream with possible additional functionality being enabled by passing in an ErrorHandler object and/or setting convertToUTF8 to true.
 
Method Summary
 boolean hasNext()
          Returns true if the iteration has more records, false otherwise.
 Record next()
          Returns the next record in the iteration.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MarcPermissiveStreamReader

public MarcPermissiveStreamReader(InputStream input,
                                  boolean permissive,
                                  boolean convertToUTF8)
Constructs an instance with the specified input stream with possible additional functionality being enabled by setting permissive and/or convertToUTF8 to true. If permissive and convertToUTF8 are both set to false, it functions almost identically to the MarcStreamReader class.


MarcPermissiveStreamReader

public MarcPermissiveStreamReader(InputStream input,
                                  ErrorHandler errors,
                                  boolean convertToUTF8)
Constructs an instance with the specified input stream with possible additional functionality being enabled by passing in an ErrorHandler object and/or setting convertToUTF8 to true. If errors is null and convertToUTF8 is false, it functions almost identically to the MarcStreamReader class. If an ErrorHandler object is passed in, that object is used to log and track any errors in the records as they are decoded. After next() returns, you can query the ErrorHandler to determine whether any errors were detected in the decoding process. See the file org.marc4j.samples.PermissiveReaderExample.java to see how this can be done.


MarcPermissiveStreamReader

public MarcPermissiveStreamReader(InputStream input,
                                  boolean permissive,
                                  boolean convertToUTF8,
                                  String defaultEncoding)
Constructs an instance with the specified input stream with possible additional functionality being enabled by setting permissive and/or convertToUTF8 to true. If permissive and convertToUTF8 are both set to false, it functions almost identically to the MarcStreamReader class. The parameter defaultEncoding specifies the character encoding used in the records that will be read from the input stream. If permissive is set to true, you can specify "BESTGUESS" as the default encoding, and the reader will attempt to determine the character encoding used in the records being read from the input stream. This is especially useful if you are working with records downloaded from an external source and the encoding is either unknown or different from what the records claim it to be.


MarcPermissiveStreamReader

public MarcPermissiveStreamReader(InputStream input,
                                  ErrorHandler errors,
                                  boolean convertToUTF8,
                                  String defaultEncoding)
Constructs an instance with the specified input stream with possible additional functionality being enabled by passing in an ErrorHandler object and/or setting convertToUTF8 to true. If errors is null and convertToUTF8 is false, it functions almost identically to the MarcStreamReader class. The parameter defaultEncoding specifies the character encoding used in the records that will be read from the input stream. If an ErrorHandler object is passed in, which enables permissive reading, you can specify "BESTGUESS" as the default encoding, and the reader will attempt to determine the character encoding used in the records being read from the input stream. This is especially useful if you are working with records downloaded from an external source and the encoding is either unknown or different from what the records claim it to be. The ErrorHandler object is also used to log and track any errors in the records as they are decoded. After next() returns, you can query the ErrorHandler to determine whether any errors were detected in the decoding process. See the file org.marc4j.samples.PermissiveReaderExample.java to see how this can be done.

Method Detail

hasNext

public boolean hasNext()
Returns true if the iteration has more records, false otherwise.

Specified by:
hasNext in interface MarcReader

next

public Record next()
Returns the next record in the iteration.

Specified by:
next in interface MarcReader
Returns:
Record - the record object


Copyright © 2002-2006 Bas Peters. All Rights Reserved.