I am using Apache POI to remove formatting information from MS Word files.
I want to remove information such as whether a paragraph has bullets, its background color, foreground color, alignment, etc.
There is not enough documentation or tutorials available for this, and I have not found much useful information in the Javadoc either.
Where can I find tutorials or good documentation that can help me learn the Apache POI API?
For HWPF (.doc), the classes you probably want are:
-
Depending on which property you want, it will be on either the paragraph properties or the character properties.
The best example I can think of for reading a Word document with HWPF, getting the text, and checking styles and formatting is the WordExtractor from Apache Tika:
(.docx is similar, using XWPF.)
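As a rough illustration of the paragraph versus character split on the .docx side, here is a minimal sketch using XWPF. It assumes the `poi-ooxml` library is on the classpath. The class name `StripDocxFormatting` and its helper methods are made up for this example, and dropping the whole `<w:pPr>` / `<w:rPr>` element is just one possible approach, not necessarily the recommended one. It only touches body paragraphs (not tables, headers, or footers) and also discards the paragraph's style reference.

```java
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

// Hypothetical helper class, written for illustration only.
public class StripDocxFormatting {

    /** Removes direct paragraph and run formatting from body paragraphs; text is left untouched. */
    public static void stripFormatting(XWPFDocument doc) {
        for (XWPFParagraph paragraph : doc.getParagraphs()) {
            // Paragraph-level properties (alignment, bullets/numbering, shading,
            // style reference) are stored in the <w:pPr> element.
            if (paragraph.getCTP().isSetPPr()) {
                paragraph.getCTP().unsetPPr();
            }
            for (XWPFRun run : paragraph.getRuns()) {
                // Character-level properties (bold, color, font) are stored in <w:rPr>.
                if (run.getCTR().isSetRPr()) {
                    run.getCTR().unsetRPr();
                }
            }
        }
    }

    /** Builds a small formatted document in memory so the sketch needs no input file. */
    public static XWPFDocument sampleDocument() {
        XWPFDocument doc = new XWPFDocument();
        XWPFParagraph paragraph = doc.createParagraph();
        paragraph.setAlignment(ParagraphAlignment.CENTER); // paragraph property
        XWPFRun run = paragraph.createRun();
        run.setBold(true);        // character property
        run.setColor("FF0000");   // character property (foreground color)
        run.setText("Hello");
        return doc;
    }

    public static void main(String[] args) throws Exception {
        try (XWPFDocument doc = sampleDocument()) {
            stripFormatting(doc);
            XWPFParagraph paragraph = doc.getParagraphs().get(0);
            XWPFRun run = paragraph.getRuns().get(0);
            System.out.println(paragraph.getText()
                    + " bold=" + run.isBold()
                    + " alignment=" + paragraph.getAlignment());
        }
    }
}
```

For a real file you would open it with `new XWPFDocument(inputStream)` instead of `sampleDocument()` and write the result back with `doc.write(outputStream)`. If you only want to drop some properties (say, alignment but not bullets), you would reset those individually rather than unsetting the whole properties element.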