How to extract formatting information from a Word document using Apache POI?


I am using Apache POI to extract formatting information from MS Word files.

I want to extract information such as whether a paragraph has bullets, its background color, foreground color, alignment, etc.

There is not enough documentation or tutorials available for this. I also did not find very useful information in the Javadoc.

Where can I find tutorials or good documentation that can help me learn the Apache POI API?

For HWPF (.doc), the classes you probably want are:

  • Depending on the property you want, it will be on either the paragraph properties or the character properties.

    The best example I can think of for reading a Word document with HWPF, getting out the text, and checking styles and formatting is the WordExtractor code from Apache Tika.

    (.docx is similar, using XWPF.)
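As a rough illustration of the paragraph versus character split described above, here is a minimal sketch for .doc files using HWPF's `Paragraph` and `CharacterRun` classes. The class name `DocFormattingDump` and the helper `alignmentName` are made up for this example, and the mapping of justification codes to labels (0 = left, 1 = center, 2 = right, 3 = justified) is an assumption based on the binary .doc format, so verify it against your own documents. It is not taken from the Tika extractor.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

public class DocFormattingDump {

    // Hypothetical helper: turns the numeric justification code into a label.
    // The code-to-label mapping is an assumption, not something POI provides.
    static String alignmentName(int jc) {
        switch (jc) {
            case 0: return "left";
            case 1: return "center";
            case 2: return "right";
            case 3: return "justified";
            default: return "other(" + jc + ")";
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a .doc file
        try (InputStream in = new FileInputStream(args[0])) {
            HWPFDocument doc = new HWPFDocument(in);
            Range range = doc.getRange();

            for (int i = 0; i < range.numParagraphs(); i++) {
                Paragraph p = range.getParagraph(i);

                // Paragraph-level properties: alignment and list membership
                System.out.println("Paragraph " + i
                        + ": alignment=" + alignmentName(p.getJustification())
                        + ", inList=" + p.isInList());

                // Character-level properties live on the runs inside the paragraph
                for (int j = 0; j < p.numCharacterRuns(); j++) {
                    CharacterRun run = p.getCharacterRun(j);
                    System.out.println("  run " + j
                            + ": font=" + run.getFontName()
                            + ", sizeHalfPoints=" + run.getFontSize()
                            + ", bold=" + run.isBold()
                            + ", italic=" + run.isItalic()
                            + ", colorIndex=" + run.getColor()
                            + ", text=" + run.text().trim());
                }
            }
        }
    }
}
```

Run it with the POI and POI scratchpad jars on the classpath and pass the path to a .doc file as the only argument. For .docx, the corresponding XWPF classes are `XWPFParagraph` and `XWPFRun`.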
