java.lang.Object
  org.apache.tika.parser.microsoft.OfficeParser
public abstract class OfficeParser
extends java.lang.Object
implements Parser
Defines a Microsoft document content extractor.
| Constructor Summary |
|---|
| OfficeParser() |
| Method Summary | | |
|---|---|---|
| protected abstract void | extractText(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem, java.lang.Appendable appendable) | Extracts the text content of a Microsoft document from its POIFS file system and writes it to the given appendable. |
| protected abstract java.lang.String | getContentType() | Returns the content type of the document being parsed. |
| void | parse(java.io.InputStream stream, org.xml.sax.ContentHandler handler, Metadata metadata) | Extracts properties and text from an MS document input stream. |
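The summary suggests a template-method split: parse is concrete, while subclasses supply getContentType and extractText. The sketch below shows one plausible way such a base class could tie the three together. This page does not document what parse does internally, so the delegation order is an assumption. To stay self-contained, the sketch replaces Tika's Metadata with a Map<String, String> and reads the raw InputStream instead of a POIFSFileSystem; the names SketchOfficeParser and PlainTextSketch and the "Content-Type" key are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical, simplified stand-in for OfficeParser (not Tika's code).
// Assumption: a Map replaces Metadata, and the raw stream replaces POIFSFileSystem.
abstract class SketchOfficeParser {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Subclasses report the MIME type of the format they handle.
    protected abstract String getContentType();

    // Subclasses write the document's plain text into the appendable.
    protected abstract void extractText(InputStream stream, Appendable appendable)
            throws IOException;

    // Template method: records the content type, buffers the extracted text,
    // then emits it to the handler as a minimal XHTML html/body event stream.
    public void parse(InputStream stream, ContentHandler handler, Map<String, String> metadata)
            throws IOException, SAXException {
        metadata.put("Content-Type", getContentType()); // hypothetical metadata key

        StringBuilder text = new StringBuilder();
        extractText(stream, text);

        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startElement(XHTML, "html", "html", noAttrs);
        handler.startElement(XHTML, "body", "body", noAttrs);
        char[] chars = text.toString().toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endDocument();
    }
}

// Hypothetical subclass used only to exercise the template method:
// it treats the whole stream as UTF-8 text.
class PlainTextSketch extends SketchOfficeParser {
    @Override
    protected String getContentType() {
        return "text/plain";
    }

    @Override
    protected void extractText(InputStream stream, Appendable appendable) throws IOException {
        appendable.append(new String(stream.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

A caller passes its own ContentHandler, for example a DefaultHandler subclass that overrides characters() to collect the text, and reads the content type back from the metadata afterwards.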
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public OfficeParser()
| Method Detail |
|---|
public void parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata)
throws java.io.IOException,
org.xml.sax.SAXException,
TikaException
Specified by:
parse in interface Parser
Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
Throws:
java.io.IOException - if the document stream could not be read
org.xml.sax.SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed

protected abstract java.lang.String getContentType()
Returns the content type of the document being parsed.
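getContentType lets each concrete subclass report the MIME type of the format it handles. A minimal sketch, assuming three hypothetical subclasses for the legacy binary Word, Excel, and PowerPoint formats. The class names are illustrative and are not Tika's; the MIME strings are the standard registered types for those formats.

```java
// Hypothetical stand-in for the abstract getContentType contract.
abstract class ContentTypeSketch {
    protected abstract String getContentType();
}

// Legacy binary Word documents (.doc).
class WordTypeSketch extends ContentTypeSketch {
    @Override
    protected String getContentType() {
        return "application/msword";
    }
}

// Legacy binary Excel workbooks (.xls).
class ExcelTypeSketch extends ContentTypeSketch {
    @Override
    protected String getContentType() {
        return "application/vnd.ms-excel";
    }
}

// Legacy binary PowerPoint presentations (.ppt).
class PowerPointTypeSketch extends ContentTypeSketch {
    @Override
    protected String getContentType() {
        return "application/vnd.ms-powerpoint";
    }
}
```

Declaring the type as an abstract method, rather than a constructor argument, forces every subclass to state its format at compile time.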
protected abstract void extractText(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem,
java.lang.Appendable appendable)
throws java.io.IOException,
TikaException
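Extracts the text content of a Microsoft document from its POIFS file system and writes it to the given appendable.
Throws:
java.io.IOException
TikaException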
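extractText writes plain text to a java.lang.Appendable, so one implementation can target a StringBuilder, a StringWriter, or any other Writer. Reading a real POIFS container requires Apache POI, so the sketch below substitutes a hypothetical in-memory list of paragraphs for the POIFSFileSystem. It only illustrates appending to the Appendable and letting its IOException propagate, which the declared throws clause permits; ParagraphTextExtractor is a hypothetical name.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in: a list of paragraphs replaces POIFSFileSystem.
class ParagraphTextExtractor {
    // Writes each paragraph followed by a newline. Any IOException thrown
    // by the Appendable (for example, from a failing Writer) propagates.
    static void extractText(List<String> paragraphs, Appendable appendable) throws IOException {
        for (String paragraph : paragraphs) {
            appendable.append(paragraph).append('\n');
        }
    }
}
```

Because StringBuilder and StringWriter both implement Appendable, the same method can buffer text in memory or stream it to an output Writer without changes.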