
List:       lucene-dev
Subject:    RE: corrupted index
From:       Doug Cutting <DCutting () grandcentral ! com>
Date:       2002-04-02 16:23:53

Matt,

I'd welcome a concrete proposal in this area.  Probably we should wait until
we have a final 1.2 release out there before making such changes.  Note that
this could be done compatibly if the new exceptions subclass
java.io.IOException.
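
A minimal sketch of why subclassing keeps things compatible, purely for
illustration.  The three exception names are the ones Matt suggested;
openIndex() and the two caller methods are hypothetical stand-ins, not
Lucene API.  The method signature still says "throws IOException", so
callers that only catch IOException keep compiling and behaving as before,
while new callers can catch the specific cases:

```java
import java.io.IOException;

public class IndexExceptionsSketch {

    /** Hypothetical: the index to open does not exist. */
    static class IndexNotFoundException extends IOException {
        IndexNotFoundException(String message) { super(message); }
    }

    /** Hypothetical: a lock file prevents access to the index. */
    static class IndexLockedException extends IOException {
        IndexLockedException(String message) { super(message); }
    }

    /** Hypothetical: the index appears to be broken. */
    static class IndexCorruptException extends IOException {
        IndexCorruptException(String message) { super(message); }
    }

    /** Stand-in for an index-opening call; the declared exception is unchanged. */
    static void openIndex(String state) throws IOException {
        if (state.equals("missing")) throw new IndexNotFoundException("no index found");
        if (state.equals("locked"))  throw new IndexLockedException("lock file present");
        if (state.equals("corrupt")) throw new IndexCorruptException("index appears broken");
        // any other state: pretend the index opened normally
    }

    /** Existing-style caller: knows only IOException and needs no changes. */
    static String legacyCaller(String state) {
        try {
            openIndex(state);
            return "opened";
        } catch (IOException e) {
            return "io error";
        }
    }

    /** New-style caller: can choose a recovery action per failure type. */
    static String newCaller(String state) {
        try {
            openIndex(state);
            return "opened";
        } catch (IndexNotFoundException e) {
            return "create index";
        } catch (IndexLockedException e) {
            return "retry later";
        } catch (IndexCorruptException e) {
            return "rebuild index";
        } catch (IOException e) {
            return "io error";
        }
    }

    public static void main(String[] args) {
        System.out.println(legacyCaller("locked"));
        System.out.println(newCaller("locked"));
    }
}
```

The recovery actions returned by newCaller() are placeholders; what an
application should actually do in each case is exactly what a concrete
proposal would need to spell out.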

Doug

> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Monday, April 01, 2002 9:06 PM
> To: lucene-dev@jakarta.apache.org
> Cc: matt@jivesoftware.com
> Subject: RE: corrupted index
> 
> 
> I changed the recipient from -user to -dev list, as that seems more
> appropriate.
> I think this would not be a bad idea, if we do it right.
> Things like IndexLockedException, etc. sound alright to me.
> I think Doug once welcomed such a change on one of the lists, too.
> 
> Perhaps a list of suggested exceptions, new exception classes and
> appropriate patches would be the best contribution.
> 
> Thanks,
> Otis
> 
> --- Matt Tucker <matt@jivesoftware.com> wrote:
> > Hey all,
> > 
> > Actually, using shutdown hooks might not be the best idea since Lucene
> > is very often used in server-side Java environments. Many app-servers
> > throw security errors when trying to add shutdown hooks, and I've seen
> > Weblogic crash before when having them in a webapp. Has anyone else run
> > into this?
> > 
> > This all brings up a key issue with Lucene, which is that there is
> > little way to recover from errors gracefully. I'd love to see a number
> > of checked exceptions added. For example:
> > 
> >  IndexNotFoundException -- when trying to open an index that doesn't
> >    exist
> >  IndexLockedException -- when a lock file prevents you from getting an
> >    index
> >  IndexCorruptException -- maybe this would be thrown when an index
> >    appears to be broken?
> > 
> > At the moment, Lucene throws many undocumented IOExceptions and even
> > NullPointerExceptions when an error case comes up. I catch these in my
> > app, but there's really not an intelligent way to recover from them.
> > Adding checked exceptions would be a change of the API, but it seems
> > worth it. I'd be happy to make a more specific proposal if other people
> > feel like this would be a worthwhile direction to go in.
> > 
> > Regards,
> > Matt
> > 
> > Quoting "Spencer, Dave" <dave@lumos.com>:
> > 
> > > Runtime.addShutdownHook:
> > > 
> > > http://java.sun.com/j2se/1.3/docs/api/java/lang/Runtime.html#addShutdownHook(java.lang.Thread)
> > > 
> > > -----Original Message-----
> > > From: Otis Gospodnetic [ mailto:otis_gospodnetic@yahoo.com]
> > > Sent: Sunday, March 17, 2002 12:06 AM
> > > To: Lucene Users List
> > > Subject: Re: corrupted index
> > > 
> > > 
> > > Oh, I just thought of something (wine does body good).
> > > Perhaps one could use Runtime (the class) to catch the JVM shutdown and
> > > do whatever is needed to prevent index corruption.  I believe there are
> > > some shutdown hook methods in there that may let you do that.  I'm too
> > > lazy to look up the API docs now, but I remember reading about that
> > > once, and perhaps it was even mentioned on one of the 2 Lucene mailing
> > > lists.
> > > 
> > > On the other hand, it would be great to have a tool that can verify an
> > > existing index.  I don't know enough about the actual file structure
> > > yet to write something like that, but maybe somebody else has done that
> > > already or would like to contribute.
> > > 
> > > Otis
> > > 
> > > 
> > > --- "Steven J. Owens" <puffmail@darksleep.com> wrote:
> > > > Otis,
> > > >
> > > > > You can remove the .lock file and try re-indexing or continuing
> > > > > indexing where you left off.
> > > > > I am not sure about the corrupt index.  I have never seen it happen,
> > > > > and I believe I recall reading some messages from Doug Cutting saying
> > > > > that index should never be left in an inconsistent state.
> > > >
> > > >      Obviously never "should" be, but if something's pulling the rug
> > > > out from under his JRE, changes could be only partially written,
> > > > right?
> > > >
> > > >      Or is the writing format in some sense transactionally safe?
> > > > I've never worked directly on something like this, but I worked at a
> > > > database software company where they used transaction semantics and a
> > > > journaling scheme to fake a "bulletproof" file system.  Is this how
> > > > the index-writing code is implemented?
> > > >
> > > >      In general, I can guess Doug's response - just torch the old
> > > > index directory and rebuild it; Lucene's indexing is fast enough that
> > > > you don't need to get clever.  This seems to be Doug's stance in
> > > > general (i.e. "don't get fancy, I already put all the fanciness you'll
> > > > need into extremely fast indexing and searching").  So far, it seems
> > > > to work :-).
> > > >
> > > > > I could be making this up, though, so I suggest you search through
> > > > > lucene-user and lucene-dev archives on www.mail-archive.com.
> > > > > A search for "corrupt" should do it.
> > > > > Once you figure things out maybe you can post a summary here.
> > > >
> > > >      I got a little curious, so I went and did the searches.  There is
> > > > exactly one message in each list archive (dev and users) with the
> > > > keyword "corrupt" in it.  The lucene-users instance is irrelevant:
> > > >
> > > > http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00557.html
> > > >
> > > >      The lucene-dev instance is more useful:
> > > >
> > > > http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00157.html
> > > >
> > > >      It's a post from Doug, dated Sept 27, 2001, about adding not just
> > > > thread-safety but process-safety:
> > > >
> > > >   It should be impossible to corrupt an index through the Lucene API.
> > > >   However if a Lucene process exits unexpectedly it can leave the index
> > > >   locked.  The remedy is simply to, at a time when it is certain that no
> > > >   processes are accessing the index, remove all lock files.
> > > >
> > > >      So it sounds like it's worth trying just removing the lock files.
> > > > Hm, is there a way to come up with a "sanity check" you can run on an
> > > > index to make sure it's not corrupted?  This might be an excellent
> > > > thing to reassure yourself with: something went wrong?  Run a sanity
> > > > check, if it fails just reindex.
> > > >
> > > > Steven J. Owens
> > > > puff@darksleep.com




--
To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>

