org.marc4j
Class MarcXmlWriter

java.lang.Object
  extended by org.marc4j.MarcXmlWriter
All Implemented Interfaces:
MarcWriter

public class MarcXmlWriter
extends Object
implements MarcWriter

Class for writing MARC record objects in MARCXML format. This class outputs a SAX event stream to the given OutputStream or Result object. It can be used in a SAX pipeline to postprocess the result. By default this class uses a null transform. It is strongly recommended to use a dedicated XML serializer.

This class requires a JAXP compliant XML parser and XSLT processor. The underlying SAX2 parser should be namespace aware. In addition, this class requires ICU4J to perform Unicode normalization. A stripped-down version of ICU4J 2.6, originating from the XOM project, is included in this distribution.
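
To illustrate what a SAX event stream passed through a null transform looks like, the following sketch uses only JAXP classes from the JDK. It is not the implementation of this class: the class name NullTransformSketch, the method serializeLeader and the sample leader value are assumptions made for the illustration. The element names and the namespace URI are those of MARCXML.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical demo class; not part of marc4j. */
final class NullTransformSketch {

    static final String MARCXML_NS = "http://www.loc.gov/MARC21/slim";

    /** Sends hand-written SAX events through an identity ("null") transform. */
    static String serializeLeader(String leader) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            // No stylesheet argument: events are copied to the Result unchanged.
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer()
                    .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            AttributesImpl noAttributes = new AttributesImpl();
            handler.startDocument();
            handler.startPrefixMapping("", MARCXML_NS);
            handler.startElement(MARCXML_NS, "collection", "collection", noAttributes);
            handler.startElement(MARCXML_NS, "record", "record", noAttributes);
            handler.startElement(MARCXML_NS, "leader", "leader", noAttributes);
            handler.characters(leader.toCharArray(), 0, leader.length());
            handler.endElement(MARCXML_NS, "leader", "leader");
            handler.endElement(MARCXML_NS, "record", "record");
            handler.endElement(MARCXML_NS, "collection", "collection");
            handler.endPrefixMapping("");
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize SAX events", e);
        }
    }

    public static void main(String[] args) {
        // Sample leader value, made up for this sketch.
        System.out.println(serializeLeader("00714cam a2200205 a 4500"));
    }
}
```

Because the handler is just a ContentHandler, the same events could be sent to any other SAX consumer, which is what makes the pipeline use described above possible.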

The following example reads a file with MARC records and writes MARCXML records in UTF-8 encoding to the console:

  
      InputStream input = new FileInputStream("input.mrc");
      MarcReader reader = new MarcStreamReader(input);
              
      MarcWriter writer = new MarcXmlWriter(System.out, true);
      while (reader.hasNext()) {
          Record record = reader.next();
          writer.write(record);
      }
      writer.close();
   
 

To perform a character conversion such as MARC-8 to UCS/Unicode, register a CharConverter:

 writer.setConverter(new AnselToUnicode());
 

In addition you can perform Unicode normalization, which the MARC-8 to UCS/Unicode converter, for example, does not do. With Unicode normalization, text is transformed into the canonical composed form. For example "a´bc" is normalized to "ábc". To perform normalization, set Unicode normalization to true:

 writer.setUnicodeNormalization(true);
 

Please note that converting normalized Unicode back to MARC-8 encoding using UnicodeToAnsel is not guaranteed to work.
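
The effect of canonical composition can be tried out in isolation. The sketch below uses java.text.Normalizer from the JDK as a stand-in; this class itself relies on ICU4J, so treat the code as an illustration of NFC rather than of the writer's internals. The class name NfcSketch and the method toNfc are assumptions. The input uses the combining acute accent (U+0301) after the letter "a".

```java
import java.text.Normalizer;

/** Hypothetical demo class; not part of marc4j. */
final class NfcSketch {

    /** Returns the text in Unicode normalization form C (canonical composition). */
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "a\u0301bc";      // 'a' + COMBINING ACUTE ACCENT + "bc"
        String composed = toNfc(decomposed);  // "\u00E1bc", i.e. precomposed a-acute + "bc"

        // The combining sequence collapses into one code unit.
        System.out.println(decomposed.length() + " -> " + composed.length()); // prints "4 -> 3"
    }
}
```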

This class provides very basic formatting options. For more advanced options create an instance of this class with a SAXResult containing a ContentHandler derived from a dedicated XML serializer.

The following example uses org.apache.xml.serialize.XMLSerializer to write MARC records to XML using MARC-8 to UCS/Unicode conversion and Unicode normalization:

  
      InputStream input = new FileInputStream("input.mrc");
      MarcReader reader = new MarcStreamReader(input);
                
      OutputFormat format = new OutputFormat("xml","UTF-8", true);
      OutputStream out = new FileOutputStream("output.xml");
      XMLSerializer serializer = new XMLSerializer(out, format);
      Result result = new SAXResult(serializer.asContentHandler());
                
      MarcXmlWriter writer = new MarcXmlWriter(result);
      writer.setConverter(new AnselToUnicode());
      writer.setUnicodeNormalization(true);
      while (reader.hasNext()) {
          Record record = reader.next();
          writer.write(record);
      }
      writer.close();
   
 

You can post-process the result using a Source object pointing to a stylesheet resource and a Result object to hold the transformation result tree. The example below converts MARC to MARCXML and transforms the result tree to MODS using the stylesheet provided by the Library of Congress:

  
      String stylesheetUrl = "http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3.xsl";
      Source stylesheet = new StreamSource(stylesheetUrl);
         
      Result result = new StreamResult(System.out);
            
      InputStream input = new FileInputStream("input.mrc");
      MarcReader reader = new MarcStreamReader(input);
      MarcXmlWriter writer = new MarcXmlWriter(result, stylesheet);
      writer.setConverter(new AnselToUnicode());
      while (reader.hasNext()) {
          Record record = (Record) reader.next();
          writer.write(record);
      }
      writer.close();
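
The post-processing step can also be tried without a network connection or a MARC file. The following sketch feeds hand-written MARCXML SAX events into a JAXP TransformerHandler that was created from a stylesheet Source. Everything in it is an assumption made for the illustration: the class name StylesheetPipelineSketch, the method extractLeaders, the sample leader strings and the toy stylesheet, which only prints each leader on its own line and is not the MODS stylesheet mentioned above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical demo class; not part of marc4j. */
final class StylesheetPipelineSketch {

    static final String MARCXML_NS = "http://www.loc.gov/MARC21/slim";

    // Toy stylesheet (an assumption): writes the text of every leader, one per line.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
          + " xmlns:marc='http://www.loc.gov/MARC21/slim'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:for-each select='//marc:leader'>"
          + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
          + "</xsl:for-each>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Emits one record per leader and returns the transformed text. */
    static String extractLeaders(String... leaders) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            Source stylesheet = new StreamSource(new StringReader(STYLESHEET));
            // With a stylesheet argument the handler transforms the event stream.
            TransformerHandler handler = factory.newTransformerHandler(stylesheet);

            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            AttributesImpl noAttributes = new AttributesImpl();
            handler.startDocument();
            handler.startPrefixMapping("", MARCXML_NS);
            handler.startElement(MARCXML_NS, "collection", "collection", noAttributes);
            for (String leader : leaders) {
                handler.startElement(MARCXML_NS, "record", "record", noAttributes);
                handler.startElement(MARCXML_NS, "leader", "leader", noAttributes);
                handler.characters(leader.toCharArray(), 0, leader.length());
                handler.endElement(MARCXML_NS, "leader", "leader");
                handler.endElement(MARCXML_NS, "record", "record");
            }
            handler.endElement(MARCXML_NS, "collection", "collection");
            handler.endPrefixMapping("");
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not transform SAX events", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(extractLeaders("leader-one", "leader-two"));
    }
}
```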
   
 

It is also possible to write the result into a DOM Node:

  
      InputStream input = new FileInputStream("input.mrc");
      MarcReader reader = new MarcStreamReader(input);
      DOMResult result = new DOMResult();
      MarcXmlWriter writer = new MarcXmlWriter(result);
      writer.setConverter(new AnselToUnicode());
      while (reader.hasNext()) {
          Record record = (Record) reader.next();
          writer.write(record);
      }
      writer.close();
         
      Document doc = (Document) result.getNode();
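
Once the result is a DOM Document it can be queried with the standard org.w3c.dom API. The sketch below builds a small MARCXML tree by sending SAX events to a DOMResult through a JAXP identity TransformerHandler and then reads a control field back. It only approximates what the example above produces: the class name DomResultSketch, the method buildDocument and the control number "demo-0001" are assumptions, and no marc4j classes are involved.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical demo class; not part of marc4j. */
final class DomResultSketch {

    static final String MARCXML_NS = "http://www.loc.gov/MARC21/slim";

    /** Builds a one-record MARCXML document whose only field is control field 001. */
    static Document buildDocument(String controlNumber) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();

            DOMResult result = new DOMResult();
            handler.setResult(result);

            AttributesImpl noAttributes = new AttributesImpl();
            AttributesImpl tag001 = new AttributesImpl();
            tag001.addAttribute("", "tag", "tag", "CDATA", "001");

            handler.startDocument();
            handler.startPrefixMapping("", MARCXML_NS);
            handler.startElement(MARCXML_NS, "collection", "collection", noAttributes);
            handler.startElement(MARCXML_NS, "record", "record", noAttributes);
            handler.startElement(MARCXML_NS, "controlfield", "controlfield", tag001);
            handler.characters(controlNumber.toCharArray(), 0, controlNumber.length());
            handler.endElement(MARCXML_NS, "controlfield", "controlfield");
            handler.endElement(MARCXML_NS, "record", "record");
            handler.endElement(MARCXML_NS, "collection", "collection");
            handler.endPrefixMapping("");
            handler.endDocument();

            // After endDocument() the DOMResult holds the finished tree.
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build DOM tree", e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildDocument("demo-0001");
        Element field = (Element) doc
                .getElementsByTagNameNS(MARCXML_NS, "controlfield").item(0);
        System.out.println(field.getAttribute("tag") + " = " + field.getTextContent());
    }
}
```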
   
 

Version:
$Revision: 1.9 $
Author:
Bas Peters

Field Summary
protected static String COLLECTION
           
protected static String CONTROL_FIELD
           
protected static String DATA_FIELD
           
protected static String LEADER
           
protected static String RECORD
           
protected static String SUBFIELD
           
 
Constructor Summary
MarcXmlWriter(OutputStream out)
          Constructs an instance with the specified output stream.
MarcXmlWriter(OutputStream out, boolean indent)
          Constructs an instance with the specified output stream and indentation.
MarcXmlWriter(OutputStream out, String encoding)
          Constructs an instance with the specified output stream and character encoding.
MarcXmlWriter(OutputStream out, String encoding, boolean indent)
          Constructs an instance with the specified output stream, character encoding and indentation.
MarcXmlWriter(Result result)
          Constructs an instance with the specified result.
MarcXmlWriter(Result result, Source stylesheet)
          Constructs an instance with the specified stylesheet source and result.
MarcXmlWriter(Result result, String stylesheetUrl)
          Constructs an instance with the specified stylesheet location and result.
 
Method Summary
 void close()
          Closes the writer.
 CharConverter getConverter()
          Returns the character converter.
protected  char[] getDataElement(String data)
           
 boolean getUnicodeNormalization()
          Returns true if this writer will perform Unicode normalization, false otherwise.
 boolean hasIndent()
          Returns true if indentation is active, false otherwise.
 void setConverter(CharConverter converter)
          Sets the character converter.
protected  void setHandler(Result result, Source stylesheet)
           
 void setIndent(boolean indent)
          Activates or deactivates indentation.
 void setUnicodeNormalization(boolean normalize)
          If set to true this writer will perform Unicode normalization on data elements using normalization form C (NFC).
protected  void toXml(Record record)
           
 void write(Record record)
          Writes a Record object to the result.
protected  void writeEndDocument()
          Writes the root end tag to the result.
protected  void writeStartDocument()
          Writes the root start tag to the result.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CONTROL_FIELD

protected static final String CONTROL_FIELD
See Also:
Constant Field Values

DATA_FIELD

protected static final String DATA_FIELD
See Also:
Constant Field Values

SUBFIELD

protected static final String SUBFIELD
See Also:
Constant Field Values

COLLECTION

protected static final String COLLECTION
See Also:
Constant Field Values

RECORD

protected static final String RECORD
See Also:
Constant Field Values

LEADER

protected static final String LEADER
See Also:
Constant Field Values
Constructor Detail

MarcXmlWriter

public MarcXmlWriter(OutputStream out)
Constructs an instance with the specified output stream. The default character encoding, UTF-8, is used.

Throws:
MarcException

MarcXmlWriter

public MarcXmlWriter(OutputStream out,
                     boolean indent)
Constructs an instance with the specified output stream and indentation. The default character encoding, UTF-8, is used.

Throws:
MarcException

MarcXmlWriter

public MarcXmlWriter(OutputStream out,
                     String encoding)
Constructs an instance with the specified output stream and character encoding.

Throws:
MarcException

MarcXmlWriter

public MarcXmlWriter(OutputStream out,
                     String encoding,
                     boolean indent)
Constructs an instance with the specified output stream, character encoding and indentation.

Throws:
MarcException

MarcXmlWriter

public MarcXmlWriter(Result result)
Constructs an instance with the specified result.

Parameters:
result - the result to write to
Throws:
SAXException

MarcXmlWriter

public MarcXmlWriter(Result result,
                     String stylesheetUrl)
Constructs an instance with the specified stylesheet location and result.

Parameters:
result - the result to write to
stylesheetUrl - the location of the stylesheet
Throws:
SAXException

MarcXmlWriter

public MarcXmlWriter(Result result,
                     Source stylesheet)
Constructs an instance with the specified stylesheet source and result.

Parameters:
result - the result to write to
stylesheet - the stylesheet source
Throws:
SAXException
Method Detail

close

public void close()
Description copied from interface: MarcWriter
Closes the writer.

Specified by:
close in interface MarcWriter

getConverter

public CharConverter getConverter()
Returns the character converter.

Specified by:
getConverter in interface MarcWriter
Returns:
CharConverter the character converter

setConverter

public void setConverter(CharConverter converter)
Sets the character converter.

Specified by:
setConverter in interface MarcWriter
Parameters:
converter - the character converter

setUnicodeNormalization

public void setUnicodeNormalization(boolean normalize)
If set to true this writer will perform Unicode normalization on data elements using normalization form C (NFC). The default is false. The implementation used is ICU4J 2.6. This version is based on Unicode 4.0.

Parameters:
normalize - true if this writer performs Unicode normalization, false otherwise

getUnicodeNormalization

public boolean getUnicodeNormalization()
Returns true if this writer will perform Unicode normalization, false otherwise.

Returns:
boolean - true if this writer performs Unicode normalization, false otherwise.

setHandler

protected void setHandler(Result result,
                          Source stylesheet)
                   throws MarcException
Throws:
MarcException

writeStartDocument

protected void writeStartDocument()
Writes the root start tag to the result.

Throws:
SAXException

writeEndDocument

protected void writeEndDocument()
Writes the root end tag to the result.

Throws:
SAXException

write

public void write(Record record)
Writes a Record object to the result.

Specified by:
write in interface MarcWriter
Parameters:
record - the Record object
Throws:
SAXException

hasIndent

public boolean hasIndent()
Returns true if indentation is active, false otherwise.

Returns:
boolean

setIndent

public void setIndent(boolean indent)
Activates or deactivates indentation. Default value is false.

Parameters:
indent - true to activate indentation, false to deactivate it

toXml

protected void toXml(Record record)
              throws SAXException
Throws:
SAXException

getDataElement

protected char[] getDataElement(String data)


Copyright © 2002-2006 Bas Peters. All Rights Reserved.