Frequently asked questions

Question: How can I extract emphasized text from a document? For example, how do I extract all bold, italic, or underlined text, along with any other layout evidence of emphasis such as the font size?

Answer: ODF models a document as "blocks" (runs) of text, each of which can carry a reference to a style. The style definitions, in turn, state exactly which text attributes each style corresponds to. There are two methods you can use.

  1. First, identify which styles have the bold (or italic) attribute. The document might have more than one style that defines bold text. Then find the text blocks that reference those styles. Sample code for this method is available for download.

  2. For each text block, identify its style. For that style, resolve the underlying text attributes. If the result is bold (or italic, or whatever attribute you are looking for), extract the text. Sample code for this method is available for download.
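
The two methods can be sketched as follows. This is not the downloadable sample code and it does not use the ODF Toolkit API; it is a minimal illustration in Python that reads the ODF XML directly with the standard library. The embedded sample document, the style names (Strong, T1, T2) and the helper names (load_styles, resolve, spans_by_style_lookup, spans_by_resolution) are all assumptions made for the example. It only looks at text:span elements and ignores paragraph styles and default styles, which a real document would also need.

```python
import xml.etree.ElementTree as ET

# Standard ODF namespace URIs.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}

# Hypothetical flat-XML sample; in a packaged .odt file the styles are
# split between content.xml and styles.xml inside the zip archive.
SAMPLE = """<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <office:styles>
    <style:style style:name="Strong" style:family="text">
      <style:text-properties fo:font-weight="bold"/>
    </style:style>
  </office:styles>
  <office:automatic-styles>
    <style:style style:name="T1" style:family="text">
      <style:text-properties fo:font-style="italic"/>
    </style:style>
    <style:style style:name="T2" style:family="text"
                 style:parent-style-name="Strong">
      <style:text-properties fo:font-size="14pt"/>
    </style:style>
  </office:automatic-styles>
  <office:body><office:text>
    <text:p>Plain, <text:span text:style-name="Strong">bold one</text:span>,
    <text:span text:style-name="T1">slanted</text:span>,
    <text:span text:style-name="T2">big bold</text:span>.</text:p>
  </office:text></office:body>
</office:document>"""


def q(prefix, local):
    """Build the '{namespace}local' name that ElementTree uses."""
    return "{%s}%s" % (NS[prefix], local)


def load_styles(root):
    """Map style name -> (parent style name or None, own text properties)."""
    styles = {}
    for style in root.iter(q("style", "style")):
        props = style.find("style:text-properties", NS)
        styles[style.get(q("style", "name"))] = (
            style.get(q("style", "parent-style-name")),
            dict(props.attrib) if props is not None else {},
        )
    return styles


def resolve(styles, name):
    """Merge text properties along the parent chain; child values win."""
    chain, seen = [], set()
    while name in styles and name not in seen:
        seen.add(name)
        parent, props = styles[name]
        chain.append(props)
        name = parent
    merged = {}
    for props in reversed(chain):
        merged.update(props)
    return merged


def spans_by_style_lookup(root, attr, value):
    """Method 1: find styles that declare attr=value, then spans using them."""
    matching = {name for name, (_, props) in load_styles(root).items()
                if props.get(attr) == value}
    return ["".join(span.itertext()) for span in root.iter(q("text", "span"))
            if span.get(q("text", "style-name")) in matching]


def spans_by_resolution(root, predicate):
    """Method 2: for each span, resolve its style and test the result."""
    styles = load_styles(root)
    return ["".join(span.itertext()) for span in root.iter(q("text", "span"))
            if predicate(resolve(styles, span.get(q("text", "style-name"))))]


root = ET.fromstring(SAMPLE)
BOLD = q("fo", "font-weight")
print(spans_by_style_lookup(root, BOLD, "bold"))
print(spans_by_resolution(root, lambda p: p.get(BOLD) == "bold"))
```

On this sample, method 1 as sketched reports only "bold one", because it checks the attributes declared directly on each style, while method 2 also reports "big bold", because style T2 inherits bold from its parent. Other kinds of emphasis can be tested the same way by changing the attribute, for example fo:font-style for italics or fo:font-size for the font size.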

Apache "ODF Toolkit" is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

Copyright © 2011 The Apache Software Foundation. Licensed under the Apache License, Version 2.0.
Apache and the Apache feather logos are trademarks of The Apache Software Foundation.
Other names appearing on the site may be trademarks of their respective owners.