
List:       poi-user
Subject:    Problems with extracting text from xls and doc
From:       Joerg Hohwiller <joerg () j-hohwiller ! de>
Date:       2009-01-25 21:41:01
Message-ID: 497CDC6D.3080401 () j-hohwiller ! de

Hi there,

In my project I have written text extractors for several file formats.
For the binary MS Office formats I am using Apache POI.

For PPT I wrote my own extractor based on DocumentInputStream,
which works quite well.

For XLS I use HSSFEventFactory, but my test case fails because
my listener only receives events for the first sheet. The content
of the second sheet is never found.
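
To illustrate what I expect from the event stream, here is a small
self-contained model. It does not use POI at all: the Kind enum, the
Rec class and the sample stream are made up for illustration, and only
the record names (BOF, BOUNDSHEET, EOF) mirror the BIFF layout, where
a workbook-globals substream is followed by one BOF...EOF substream per
sheet. It is a simplified sketch, not the code from my repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SheetEventModel {

    // Hypothetical record kinds, named after the BIFF records they stand in for.
    enum Kind { BOF, BOUNDSHEET, LABEL, EOF }

    // Hypothetical stand-in for a record: a kind plus an optional text payload.
    static final class Rec {
        final Kind kind;
        final String text;

        Rec(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Collects cell text from every sheet substream, not only the first one.
    // Simplification: nested substreams (e.g. embedded charts) are not modelled.
    static List<String> collectText(List<Rec> stream) {
        List<String> found = new ArrayList<>();
        int substream = -1; // 0 = workbook globals, 1..n = sheets
        for (Rec r : stream) {
            switch (r.kind) {
                case BOF:
                    substream++; // each BOF opens the next substream
                    break;
                case LABEL:
                    if (substream >= 1) {
                        found.add("sheet" + substream + ": " + r.text);
                    }
                    break;
                default:
                    break; // BOUNDSHEET and EOF carry no cell text in this model
            }
        }
        return found;
    }

    // Made-up stream: globals substream, then two sheet substreams.
    static List<Rec> sampleStream() {
        return Arrays.asList(
            new Rec(Kind.BOF, null),
            new Rec(Kind.BOUNDSHEET, "Sheet1"),
            new Rec(Kind.BOUNDSHEET, "Sheet2"),
            new Rec(Kind.EOF, null),
            new Rec(Kind.BOF, null),
            new Rec(Kind.LABEL, "first sheet text"),
            new Rec(Kind.EOF, null),
            new Rec(Kind.BOF, null),
            new Rec(Kind.LABEL, "second sheet text"),
            new Rec(Kind.EOF, null));
    }

    public static void main(String[] args) {
        for (String line : collectText(sampleStream())) {
            System.out.println(line);
        }
    }
}
```

In this model the listener keeps reading past the first sheet's EOF,
so text from both sheets is collected. With the real file my listener
never seems to receive the records of the second substream.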

For DOC I use WordExtractor, but my test case fails because
it does NOT find the content of comments and footnotes.

Can someone tell me whether I am doing something wrong, or whether
this is expected behavior or a bug in POI?

You will find everything (including test-cases and office files) here:
https://m-m-m.svn.sourceforge.net/svnroot/m-m-m/trunk/mmm-search/mmm-search-parser/mmm-search-parser-impl-poi/


Thanks a lot
  Jörg

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

