'Re: embedded objects in HSSF'

List:       poi-user
Subject:    Re: embedded objects in HSSF
From:       MSB <markbrdsly () tiscali ! co ! uk>
Date:       2009-07-24 6:14:27
Message-ID: 24639171.post () talk ! nabble ! com


The last thing to try, then, is the ExtractorFactory class:
http://poi.apache.org/apidocs/org/apache/poi/extractor/ExtractorFactory.html

It has methods that return ready-made text extractors for embedded
documents. All these extractors can do is recover a text dump of the
file's contents, a bit like this:

File inputFile = new File(fileName);
FileInputStream fis = new FileInputStream(inputFile);
POIFSFileSystem fileSystem = new POIFSFileSystem(fis);
// Firstly, get an extractor for the Workbook
POIOLE2TextExtractor oleTextExtractor =
    ExtractorFactory.createExtractor(fileSystem);
// Then a List of extractors for any embedded Excel, Word, PowerPoint
// or Visio objects embedded into it.
POITextExtractor[] embeddedExtractors =
    ExtractorFactory.getEmbededDocsTextExtractors(oleTextExtractor);
for (POITextExtractor textExtractor : embeddedExtractors) {
    // If the embedded object was an Excel spreadsheet.
    if (textExtractor instanceof ExcelExtractor) {
        ExcelExtractor excelExtractor = (ExcelExtractor) textExtractor;
        System.out.println(excelExtractor.getText());
    }
    // A Word Document
    else if (textExtractor instanceof WordExtractor) {
        WordExtractor wordExtractor = (WordExtractor) textExtractor;
        String[] paragraphText = wordExtractor.getParagraphText();
        for (String paragraph : paragraphText) {
            System.out.println(paragraph);
        }
        // Display the document's header and footer text as proof you have it
        System.out.println("Footer text: " + wordExtractor.getFooterText());
        System.out.println("Header text: " + wordExtractor.getHeaderText());
    }
    // PowerPoint Presentation.
    else if (textExtractor instanceof PowerPointExtractor) {
        PowerPointExtractor powerPointExtractor =
            (PowerPointExtractor) textExtractor;
        System.out.println("Text: " + powerPointExtractor.getText());
        System.out.println("Notes: " + powerPointExtractor.getNotes());
    }
    // Visio Drawing
    else if (textExtractor instanceof VisioTextExtractor) {
        VisioTextExtractor visioTextExtractor =
            (VisioTextExtractor) textExtractor;
        System.out.println("Text: " + visioTextExtractor.getText());
    }
}

It is limited, but it will be a good test.
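
Since the exception complains about reading 0 of the expected 512 header
bytes, it may also be worth looking at the bytes of the embedded object
before handing them to HSSFWorkbook. What follows is only a minimal sketch
using nothing but the JDK. The class Ole2HeaderCheck and its describe method
are hypothetical names of my own, not part of POI. It relies on the fact
that an OLE2 file starts with a 512-byte header whose first eight bytes are
the signature D0 CF 11 E0 A1 B1 1A E1.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of POI): sanity-checks bytes before parsing them as OLE2. */
final class Ole2HeaderCheck {
    // The eight-byte signature that starts every OLE2 compound document.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Size of the header block that the exception message refers to.
    static final int HEADER_SIZE = 512;

    private Ole2HeaderCheck() {
    }

    /** Returns a short description of whether the data could be an OLE2 file. */
    static String describe(byte[] data) {
        if (data == null || data.length == 0) {
            return "empty: no bytes to read";
        }
        if (data.length < HEADER_SIZE) {
            return "too short: " + data.length + " bytes, header needs " + HEADER_SIZE;
        }
        // Compare the first eight bytes with the OLE2 signature.
        byte[] start = Arrays.copyOf(data, OLE2_SIGNATURE.length);
        if (!Arrays.equals(start, OLE2_SIGNATURE)) {
            return "no OLE2 signature";
        }
        return "looks like OLE2";
    }
}
```

You could, for example, print
Ole2HeaderCheck.describe(dataObject.getObjectData()) inside your loop. If it
reports an empty array, the object's data is probably held in a directory
entry rather than in the byte array, which would point back to the
DirectoryNode approach from the Quick Guide.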

If you still see errors with this then, if I have understood your
explanation of what you did correctly, I wonder whether there is such a
thing as a circular reference when inserting an object. It may be better to
make a separate physical copy of the file under a different name and then
insert that into the spreadsheet. By inserting the same file into itself,
you could be inadvertently creating the 'problem'; it will be interesting
to see if this is the case.
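
If you want to script that copy rather than do it by hand, something like
the following might do. It is a rough sketch using only java.nio.file; the
class CopyBeforeEmbedding and the method copyWithNewName are hypothetical
names of my own, and the file names in the usage note are placeholders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: makes a separate physical copy of a file under a new name. */
final class CopyBeforeEmbedding {
    private CopyBeforeEmbedding() {
    }

    /** Copies the file into the same folder under newName and returns the copy's path. */
    static Path copyWithNewName(Path original, String newName) throws IOException {
        Path absolute = original.toAbsolutePath();
        Path copy = absolute.resolveSibling(newName);
        // Refuse to "copy" the file onto itself; the point is a distinct file.
        if (copy.equals(absolute)) {
            throw new IllegalArgumentException("The copy needs a different name");
        }
        return Files.copy(absolute, copy, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Calling CopyBeforeEmbedding.copyWithNewName(Paths.get("book.xls"),
"book-copy.xls") would leave book-copy.xls next to the original, ready to be
inserted through Insert->Object in Excel.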

Yours

Mark B


stigman wrote:
> 
> To create the embedded objects I opened up an existing, working spreadsheet
> in Excel 2000 and, using the insert->object option, inserted another copy
> of the same working spreadsheet. I also tried creating a new spreadsheet,
> made a copy of it and inserted it into the new spreadsheet and got the
> same error. It seems to have the same error regardless of the embedded
> ole2 object type. I also created another method with your code reading the
> xls file and got the same errors. 
> 
> 
> MSB wrote:
>> 
>> It may be the way you are creating the embedded workbook object. I have
>> copied the example code from the Quick Guide:
>> 
>> It is possible to perform more detailed processing of an embedded Excel,
>> Word or PowerPoint document, or to work with any other type of embedded
>> object.
>> 
>> HSSF:
>> 
>>   POIFSFileSystem fs = new POIFSFileSystem(
>>       new FileInputStream("excel_with_embeded.xls"));
>>   HSSFWorkbook workbook = new HSSFWorkbook(fs);
>>   for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
>>       //the OLE2 Class Name of the object
>>       String oleName = obj.getOLE2ClassName();
>>       if (oleName.equals("Worksheet")) {
>>           DirectoryNode dn = (DirectoryNode) obj.getDirectory();
>>           HSSFWorkbook embeddedWorkbook = new HSSFWorkbook(dn, fs, false);
>>           //System.out.println(entry.getName() + ": " + embeddedWorkbook.getNumberOfSheets());
>>       } else if (oleName.equals("Document")) {
>>           DirectoryNode dn = (DirectoryNode) obj.getDirectory();
>>           HWPFDocument embeddedWordDocument = new HWPFDocument(dn, fs);
>>           //System.out.println(entry.getName() + ": " + embeddedWordDocument.getRange().text());
>>       } else if (oleName.equals("Presentation")) {
>>           DirectoryNode dn = (DirectoryNode) obj.getDirectory();
>>           SlideShow embeddedPowerPointDocument =
>>               new SlideShow(new HSLFSlideShow(dn, fs));
>>           //System.out.println(entry.getName() + ": " + embeddedPowerPointDocument.getSlides().length);
>>       } else {
>>           if (obj.hasDirectoryEntry()) {
>>               // The DirectoryEntry is a DocumentNode. Examine its entries to find out what it is
>>               DirectoryNode dn = (DirectoryNode) obj.getDirectory();
>>               for (Iterator entries = dn.getEntries(); entries.hasNext();) {
>>                   Entry entry = (Entry) entries.next();
>>                   //System.out.println(oleName + "." + entry.getName());
>>               }
>>           } else {
>>               // There is no DirectoryEntry
>>               // Recover the object's data from the HSSFObjectData instance.
>>               byte[] objectData = obj.getObjectData();
>>           }
>>       }
>>   }
>>        
>> That code has been tested against the latest 3.5 beta release but I think
>> it should work against 3.2 final as well.
>> 
>> 
>> 
>> stigman wrote:
>>> 
>>> I'm trying to read embedded objects in an excel spreadsheet and am
>>> getting "Unable to read entire header; 0 bytes read; expected 512 bytes"
>>> exception for all embedded objects when I try to read the embedded
>>> object. It can be another spreadsheet, word doc or ppt object. I'm able
>>> to read the docs individually with poi and when they are embedded in a
>>> ppt file using hslf. The exception is thrown when the new workbook
>>> object is being created. I've been stuck on this problem and haven't been
>>> able to find anyone else with this issue, any help would be appreciated.
>>> 
>>> using poi 3.2final.
>>> excel spreadsheet is getting created with MS Excel 2000.
>>> 
>>> filename is an InputStream.
>>> 
>>> HSSFWorkbook wb = new HSSFWorkbook(filename, false);
>>> 
>>> for (Iterator<HSSFObjectData> doList = wb.getAllEmbeddedObjects().iterator();
>>>         doList.hasNext(); ) {
>>>     HSSFObjectData dataObject = (HSSFObjectData) doList.next();
>>>     if (dataObject.hasDirectoryEntry()) {
>>>         oleName = dataObject.getOLE2ClassName();
>>>         if ("Worksheet".equals(oleName)) {
>>>             HSSFWorkbook wBook = new HSSFWorkbook(
>>>                 new ByteArrayInputStream(dataObject.getObjectData()));
>>>         }
>>>     }
>>> }
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
