org.apache.tika.parser.microsoft
Class ExcelEventParser

java.lang.Object
  extended by org.apache.tika.parser.microsoft.OfficeParser
      extended by org.apache.tika.parser.microsoft.ExcelEventParser
All Implemented Interfaces:
java.io.Serializable, Parser

public class ExcelEventParser
extends OfficeParser
implements java.io.Serializable

Excel parser implementation which uses POI's Event API to handle the contents of a Workbook.

This is an alternative to Tika's ExcelParser implementation, which uses POI's HSSFWorkbook to parse Excel files.

The Event API has a much smaller memory footprint than HSSFWorkbook when processing Excel files, but at the cost of more complexity.

With the Event API, a listener is registered for specific record types. Those records are created, fired off to the listener, and then discarded as the stream is processed.
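
The create, fire, discard cycle described above can be sketched in plain Java. This is only an illustration of the pattern, not POI's or Tika's code: the class EventApiSketch and its Record, Listener and Request types, the record type ids, and the extract helper are all hypothetical names invented for this sketch. The listenForAllRecords flag is assumed to behave like the constructor argument of the same name.

```java
import java.util.ArrayList;
import java.util.List;

public class EventApiSketch {
    // Hypothetical record type ids; real workbook records use other values.
    static final int LABEL = 1;
    static final int NUMBER = 2;
    static final int STYLE = 3;

    /** Hypothetical stand-in for one record read from the stream. */
    static final class Record {
        final int type;
        final String value;

        Record(int type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /** Callback invoked once for each record that is delivered. */
    interface Listener {
        void processRecord(Record record);
    }

    /** Pairs a listener with the record types it registered for. */
    static final class Request {
        private final Listener listener;
        private final List<Integer> types = new ArrayList<>();
        private boolean allRecords;

        Request(Listener listener) {
            this.listener = listener;
        }

        void listenFor(int type) {
            types.add(type);
        }

        void listenForAllRecords() {
            allRecords = true;
        }

        /** Delivers the record only if the listener asked for its type. */
        void fire(Record record) {
            if (allRecords || types.contains(record.type)) {
                listener.processRecord(record);
            }
        }
    }

    /** Streams the given records through a listener that collects text. */
    static String extract(int[] types, String[] values, boolean listenForAllRecords) {
        StringBuilder out = new StringBuilder();
        Request request = new Request(r -> out.append(r.value).append('\n'));
        if (listenForAllRecords) {
            request.listenForAllRecords();
        } else {
            // Assumption: only text-bearing record types are of interest.
            request.listenFor(LABEL);
            request.listenFor(NUMBER);
        }
        for (int i = 0; i < types.length; i++) {
            // Each record is created, fired, and then dropped; nothing
            // accumulates a model of the whole workbook in memory.
            Record record = new Record(types[i], values[i]);
            request.fire(record);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] types = {LABEL, STYLE, NUMBER};
        String[] values = {"Total", "bold", "42"};
        System.out.print(extract(types, values, false));
    }
}
```

With the flag set to false, the STYLE record in this sketch never reaches the listener. Passing true delivers every record, which is mainly useful when inspecting what a stream contains.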

See Also:
HSSFListener, POI Event API How To, Serialized Form

Constructor Summary
ExcelEventParser()
          Create an instance which only listens for the specified records (i.e. listenForAllRecords is false).
ExcelEventParser(boolean listenForAllRecords)
          Create an instance specifying whether to listen for all records or just for the specified few.
 
Method Summary
protected  void extractText(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem, java.lang.Appendable appendable)
          Extracts text from an Excel Workbook writing the extracted content to the specified Appendable.
protected  java.lang.String getContentType()
          Return the content type handled by this parser.
 
Methods inherited from class org.apache.tika.parser.microsoft.OfficeParser
parse
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ExcelEventParser

public ExcelEventParser()
Create an instance which only listens for the specified records (i.e. listenForAllRecords is false).


ExcelEventParser

public ExcelEventParser(boolean listenForAllRecords)
Create an instance specifying whether to listen for all records or just for the specified few.

Note: This constructor is intended primarily for testing and debugging. Under normal operation listenForAllRecords should be false.

Parameters:
listenForAllRecords - true if the HSSFListener should be registered to listen for all records, or false if it should be configured to receive only the specified records.
Method Detail

getContentType

protected java.lang.String getContentType()
Return the content type handled by this parser.

Specified by:
getContentType in class OfficeParser
Returns:
The content type handled

extractText

protected void extractText(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem,
                           java.lang.Appendable appendable)
                    throws java.io.IOException
Extracts text from an Excel Workbook writing the extracted content to the specified Appendable.

Specified by:
extractText in class OfficeParser
Parameters:
filesystem - POI file system
appendable - Where to output the parsed contents
Throws:
java.io.IOException - if an error occurs processing the workbook or writing the extracted content


Copyright © 2008 The Apache Software Foundation. All Rights Reserved.