
List:       nutch-developers
Subject:    [Nutch-dev] [ nutch-Bugs-999549 ] MSWord document's title
From:       "SourceForge.net" <noreply () sourceforge ! net>
Date:       2004-07-29 9:06:12
Message-ID: E1Bq6rc-00070p-00 () sc8-sf-web3 ! sourceforge ! net

Bugs item #999549, was opened at 2004-07-28 15:47
Message generated for change (Comment added) made by andyhedges
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=999549&group_id=59548

Category: plugin: other
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Andy Hedges (andyhedges)
Assigned to: Nobody/Anonymous (nobody)
Summary: MSWord document's title

Initial Comment:
MSWord document titles weren't being extracted and
stored. This patch fixes that by extracting the title
from the document's "properties".



----------------------------------------------------------------------

>Comment By: Andy Hedges (andyhedges)
Date: 2004-07-29 09:06

Message:

After doing some extensive testing on this, I have discovered
that occasionally Word 'Streams' don't have the
SummaryInformation document in them. This apparently
happens when a Word document is opened in StarOffice (or, I
imagine, OO.o) and saved out again.

Anyway, this new patch sets a timeout on the listener and,
if no SummaryInformation is found before it expires, sets
the title to the empty string.
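
As a rough illustration of the timeout-and-fallback idea (this
is not the patch itself), here is a minimal sketch using only
the JDK. TitleWithTimeout, TitleSource and extractTitle are
hypothetical names invented for this example; TitleSource stands
in for whatever event-driven reader invokes the listener, and no
POI classes are used.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

class TitleWithTimeout {

    /**
     * Hypothetical stand-in for an event-driven parser. It calls the
     * listener with the title if (and only if) it finds summary data.
     */
    interface TitleSource {
        void parse(Consumer<String> onTitle) throws Exception;
    }

    /**
     * Runs the parser on a worker thread and waits up to timeoutMillis
     * for the listener to fire. Returns "" if the listener never fires,
     * the parser fails, or the reported title is null.
     */
    static String extractTitle(TitleSource source, long timeoutMillis) {
        CompletableFuture<String> title = new CompletableFuture<>();

        Thread worker = new Thread(() -> {
            try {
                // The listener completes the future with the title.
                source.parse(title::complete);
            } catch (Exception e) {
                title.completeExceptionally(e);
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive for a stuck parse
        worker.start();

        try {
            String found = title.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return found == null ? "" : found;
        } catch (TimeoutException | ExecutionException e) {
            // No summary data arrived in time, or parsing failed.
            return "";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "";
        }
    }
}
```

With this sketch, a source that reports a title returns it as
soon as the listener fires, while a source that never calls the
listener costs the full timeout before "" comes back.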

This seems a rather complicated way to extract a title from
a document, but that may be due to the nature of the format
or the API. Could someone who is familiar with POI and the
Apache API please comment?

----------------------------------------------------------------------



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers