
List:       poi-user
Subject:    Extracting the custom properties from msword files.
From:       ahammad <ahmed.hammad () gmail ! com>
Date:       2009-01-30 19:31:52
Message-ID: 21754082.post () talk ! nabble ! com


Hello,

I'm working with Nutch to crawl some msword documents. What I'm trying to do
is to add custom properties to the msword files, so that when Nutch crawls,
it extracts those custom properties and indexes them. 

Nutch doesn't do that out of the box, so I concluded that I'll have to change
the msword parser. From searching the web, it looks like the best way to do
this is with the CustomProperties and DocumentSummaryInformation classes. I
followed this example:

http://www.docjar.com/html/api/org/apache/poi/hpsf/examples/ModifyDocumentSummaryInformation.java.html


I kept only the parts I needed to read the custom properties, but I get
compile errors on these two lines:
dsi = PropertySetFactory.newDocumentSummaryInformation();
si = PropertySetFactory.newSummaryInformation();

The errors (both reported in MSExtractor.java) say:
  The method newDocumentSummaryInformation() is undefined for the type PropertySetFactory
  The method newSummaryInformation() is undefined for the type PropertySetFactory
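For reading alone, those two factory calls are not strictly needed: in the modify example they only create an empty property set when the file does not already contain one. A read-only version might look like the sketch below. It assumes a recent POI release (4.x or later, where CustomProperties implements Map<String, Object>) is on the classpath; the class name ReadCustomProperties and the method readCustomProperties are hypothetical, not part of POI.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.poi.hpsf.CustomProperties;
import org.apache.poi.hpsf.DocumentSummaryInformation;
import org.apache.poi.hpsf.NoPropertySetStreamException;
import org.apache.poi.hpsf.PropertySet;
import org.apache.poi.hpsf.PropertySetFactory;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.DocumentInputStream;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper; assumes POI 4.x or later on the classpath.
public class ReadCustomProperties {

    /** Returns the user-defined ("custom") properties of an OLE2 file such as a .doc. */
    static Map<String, Object> readCustomProperties(InputStream in)
            throws IOException, NoPropertySetStreamException {
        Map<String, Object> result = new LinkedHashMap<>();
        try (POIFSFileSystem fs = new POIFSFileSystem(in)) {
            DirectoryNode root = fs.getRoot();
            String streamName = DocumentSummaryInformation.DEFAULT_STREAM_NAME;
            // No DocumentSummaryInformation stream means no custom properties.
            // A reader simply returns an empty map instead of creating one.
            if (!root.hasEntry(streamName)) {
                return result;
            }
            try (DocumentInputStream dis = root.createDocumentInputStream(streamName)) {
                // For this stream the factory returns a DocumentSummaryInformation.
                PropertySet ps = PropertySetFactory.create(dis);
                if (!(ps instanceof DocumentSummaryInformation)) {
                    return result;
                }
                // getCustomProperties() returns null when there is no custom section.
                CustomProperties custom = ((DocumentSummaryInformation) ps).getCustomProperties();
                if (custom != null) {
                    for (Map.Entry<String, Object> e : custom.entrySet()) {
                        result.put(e.getKey(), e.getValue());
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ReadCustomProperties some.doc
        try (InputStream in = new FileInputStream(args[0])) {
            readCustomProperties(in).forEach((name, value) -> System.out.println(name + " = " + value));
        }
    }
}
```

The returned map of name/value pairs is a convenient shape to hand to an indexer later, but how Nutch's parser should consume it is left open here.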

When I check the API documentation, both methods are listed. Why am I
getting these errors?
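One way to check which PropertySetFactory is actually being picked up is to ask the JVM directly, run with the same classpath used to compile MSExtractor.java. A plausible (but unconfirmed) cause is an older POI jar, for example one bundled with Nutch, that predates these methods and shadows the version the API docs describe. The sketch below uses only standard reflection; PoiClasspathCheck, hasStaticNoArgMethod and whereLoaded are hypothetical names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.security.CodeSource;

// Hypothetical diagnostic; uses only the JDK, so it runs with or without POI present.
public class PoiClasspathCheck {

    /** True if className can be loaded and has a public static method named methodName taking no arguments. */
    static boolean hasStaticNoArgMethod(String className, String methodName) {
        try {
            Method m = Class.forName(className).getMethod(methodName);
            return Modifier.isStatic(m.getModifiers());
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    /** The jar or directory className was loaded from, or null if it cannot be loaded or has no code source. */
    static String whereLoaded(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? null : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String factory = "org.apache.poi.hpsf.PropertySetFactory";
        System.out.println("PropertySetFactory loaded from: " + whereLoaded(factory));
        for (String name : new String[] {"newDocumentSummaryInformation", "newSummaryInformation"}) {
            System.out.println(name + "() present: " + hasStaticNoArgMethod(factory, name));
        }
    }
}
```

If the printed location is an unexpected jar, or both methods report false, the compiler is most likely seeing a different POI version than the one documented.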

Also, as far as I can tell this is the only way to do it; if there is a
better way, please suggest it. This is only the first step of the process:
I still need to figure out how to integrate this with Nutch's msword
parser.

Cheers

-- 
View this message in context:
http://www.nabble.com/Extracting-the-custom-properties-from-msword-files.-tp21754082p21754082.html
Sent from the POI - User mailing list archive at Nabble.com.



