List: poi-user
Subject: Re: embedded objects in HSSF
From: MSB <markbrdsly () tiscali ! co ! uk>
Date: 2009-07-24 6:14:27
Message-ID: 24639171.post () talk ! nabble ! com
The last thing to try then is the ExtractorFactory class -
http://poi.apache.org/apidocs/org/apache/poi/extractor/ExtractorFactory.html.
It has methods that return ready-made text extractors for embedded
documents. All you can do with these extractors is recover a text dump of
the file's contents, a bit like this:
File inputFile = new File(fileName);
FileInputStream fis = new FileInputStream(inputFile);
POIFSFileSystem fileSystem = new POIFSFileSystem(fis);
// Firstly, get an extractor for the Workbook
POIOLE2TextExtractor oleTextExtractor =
        ExtractorFactory.createExtractor(fileSystem);
// Then an array of extractors for any Excel, Word, PowerPoint
// or Visio objects embedded into it.
POITextExtractor[] embeddedExtractors =
        ExtractorFactory.getEmbededDocsTextExtractors(oleTextExtractor);
for (POITextExtractor textExtractor : embeddedExtractors) {
    // If the embedded object was an Excel spreadsheet.
    if (textExtractor instanceof ExcelExtractor) {
        ExcelExtractor excelExtractor = (ExcelExtractor) textExtractor;
        System.out.println(excelExtractor.getText());
    }
    // A Word Document
    else if (textExtractor instanceof WordExtractor) {
        WordExtractor wordExtractor = (WordExtractor) textExtractor;
        String[] paragraphText = wordExtractor.getParagraphText();
        for (String paragraph : paragraphText) {
            System.out.println(paragraph);
        }
        // Display the document's header and footer text as proof
        // you have it
        System.out.println("Footer text: " + wordExtractor.getFooterText());
        System.out.println("Header text: " + wordExtractor.getHeaderText());
    }
    // PowerPoint Presentation.
    else if (textExtractor instanceof PowerPointExtractor) {
        PowerPointExtractor powerPointExtractor =
                (PowerPointExtractor) textExtractor;
        System.out.println("Text: " + powerPointExtractor.getText());
        System.out.println("Notes: " + powerPointExtractor.getNotes());
    }
    // Visio Drawing
    else if (textExtractor instanceof VisioTextExtractor) {
        VisioTextExtractor visioTextExtractor =
                (VisioTextExtractor) textExtractor;
        System.out.println("Text: " + visioTextExtractor.getText());
    }
}
A text dump is all these extractors offer, but it will be a good test.
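It may also be worth looking at the raw bytes of the embedded object. The exception you quoted complains about a 512 byte header, and an OLE2 file normally opens with a fixed 8 byte signature. What follows is only a sketch using plain JDK classes. The class and method names are made up for illustration and are not part of POI. Assuming getObjectData() returns at all, you could pass its result to looksLikeOle2() to see whether it hands back anything that resembles a complete OLE2 file.

```java
import java.util.Arrays;

public class Ole2HeaderCheck {
    // The 8 byte signature that opens an OLE2 compound document.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // Size of the header block mentioned in the exception message.
    private static final int HEADER_SIZE = 512;

    // Hypothetical helper: true only if the array is long enough to hold
    // a full header and starts with the OLE2 signature.
    public static boolean looksLikeOle2(byte[] data) {
        if (data == null || data.length < HEADER_SIZE) {
            return false;
        }
        byte[] start = Arrays.copyOf(data, OLE2_SIGNATURE.length);
        return Arrays.equals(start, OLE2_SIGNATURE);
    }

    public static void main(String[] args) {
        // An empty array, which is one way a "0 bytes read" message
        // could plausibly arise, fails the check.
        System.out.println(looksLikeOle2(new byte[0]));
    }
}
```

If the check fails for every embedded object, that would suggest the bytes from getObjectData() are not the complete embedded file, which would fit the error you are seeing.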
If you still see errors with this then, if I have understood your
explanation of what you did correctly, I wonder whether there is such a
thing as a circular reference when inserting an object. It may be better to
make a separate physical copy of the file under a different name and then
insert that into the spreadsheet. By inserting the same file within
itself, you could be inadvertently creating the 'problem'; it will be
interesting to see if this is the case.
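If you would rather make that copy from code than by hand, something like the following sketch might do. It uses only JDK classes; the class name, the method name and the file names passed on the command line are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CopyForEmbedding {
    // Hypothetical helper: copies 'source' to a file called 'newName' in
    // the same folder, replacing any earlier copy, and returns its path.
    public static Path copyWithNewName(Path source, String newName)
            throws IOException {
        Path target = source.resolveSibling(newName);
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Usage: CopyForEmbedding <source file> <new name>");
            return;
        }
        Path copy = copyWithNewName(Paths.get(args[0]), args[1]);
        System.out.println("Copied to " + copy);
    }
}
```

The copy can then be inserted into the original workbook from Excel in the usual way.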
Yours
Mark B
stigman wrote:
>
> To create the embedded objects I opened an existing, working spreadsheet
> in Excel 2000 and, using the Insert->Object option, inserted another copy
> of the same working spreadsheet. I also tried creating a new spreadsheet,
> made a copy of it, inserted the copy into the new spreadsheet and got the
> same error. The error seems to occur regardless of the embedded
> OLE2 object type. I also created another method with your code reading the
> xls file and got the same errors.
>
>
> MSB wrote:
>>
>> It may be the way you are creating the embedded workbook object. I have
>> copied the example code from the Quick Guide:
>>
>> It is possible to perform more detailed processing of an embedded Excel,
>> Word or PowerPoint document, or to work with any other type of embedded
>> object.
>>
>> HSSF:
>>
>> POIFSFileSystem fs = new POIFSFileSystem(
>>         new FileInputStream("excel_with_embeded.xls"));
>> HSSFWorkbook workbook = new HSSFWorkbook(fs);
>> for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
>>     //the OLE2 Class Name of the object
>>     String oleName = obj.getOLE2ClassName();
>>     if (oleName.equals("Worksheet")) {
>>         DirectoryNode dn = (DirectoryNode) obj.getDirectory();
>>         HSSFWorkbook embeddedWorkbook = new HSSFWorkbook(dn, fs, false);
>>         //System.out.println(entry.getName() + ": " + embeddedWorkbook.getNumberOfSheets());
>>     } else if (oleName.equals("Document")) {
>>         DirectoryNode dn = (DirectoryNode) obj.getDirectory();
>>         HWPFDocument embeddedWordDocument = new HWPFDocument(dn, fs);
>>         //System.out.println(entry.getName() + ": " + embeddedWordDocument.getRange().text());
>>     } else if (oleName.equals("Presentation")) {
>>         DirectoryNode dn = (DirectoryNode) obj.getDirectory();
>>         SlideShow embeddedPowerPointDocument =
>>                 new SlideShow(new HSLFSlideShow(dn, fs));
>>         //System.out.println(entry.getName() + ": " + embeddedPowerPointDocument.getSlides().length);
>>     } else {
>>         if (obj.hasDirectoryEntry()) {
>>             // The DirectoryEntry is a DocumentNode. Examine its
>>             // entries to find out what it is
>>             DirectoryNode dn = (DirectoryNode) obj.getDirectory();
>>             for (Iterator entries = dn.getEntries(); entries.hasNext();) {
>>                 Entry entry = (Entry) entries.next();
>>                 //System.out.println(oleName + "." + entry.getName());
>>             }
>>         } else {
>>             // There is no DirectoryEntry
>>             // Recover the object's data from the HSSFObjectData instance.
>>             byte[] objectData = obj.getObjectData();
>>         }
>>     }
>> }
>>
>> That code has been tested against the latest 3.5 beta release but I think
>> it should work against 3.2 final as well.
>>
>>
>>
>> stigman wrote:
>>>
>>> I'm trying to read embedded objects in an Excel spreadsheet and am
>>> getting an "Unable to read entire header; 0 bytes read; expected 512 bytes"
>>> exception for all embedded objects when I try to read the embedded
>>> object. It can be another spreadsheet, Word doc or ppt object. I'm able
>>> to read the docs individually with POI and when they are embedded in a
>>> ppt file using HSLF. The exception is thrown when the new workbook
>>> object is being created. I've been stuck on this problem and haven't been
>>> able to find anyone else with this issue; any help would be appreciated.
>>>
>>> using poi 3.2final.
>>> excel spreadsheet is getting created with MS Excel 2000.
>>>
>>> filename is an InputStream.
>>>
>>> HSSFWorkbook wb = new HSSFWorkbook(filename, false);
>>>
>>> for (Iterator<HSSFObjectData> doList =
>>>         wb.getAllEmbeddedObjects().iterator(); doList.hasNext(); ) {
>>>     HSSFObjectData dataObject = (HSSFObjectData) doList.next();
>>>     if(dataObject.hasDirectoryEntry()){
>>>         oleName = dataObject.getOLE2ClassName();
>>>         if("Worksheet".equals(oleName)){
>>>             HSSFWorkbook wBook = new HSSFWorkbook(
>>>                     new ByteArrayInputStream(dataObject.getObjectData()));
>>>         }
>>>     }
>>> }
>>>
>>>
>>>
>>
>>
>
>