List: lucene-dev
Subject: RE: corrupted index
From: Doug Cutting <DCutting () grandcentral ! com>
Date: 2002-04-02 16:23:53
Matt,
I'd welcome a concrete proposal in this area. Probably we should wait until
we have a final 1.2 release out there before making such changes. Note that
this could be done compatibly if the new exceptions subclass
java.io.IOException.
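
A rough sketch of that compatibility argument, assuming the exception names
from Matt's message below. None of these classes exist in Lucene today, and
openIndex is a hypothetical stand-in, not a Lucene method:

```java
import java.io.IOException;

class ExceptionSketch {
    // Hypothetical exception types, named after Matt's proposal.
    // Each one extends IOException so existing catch blocks still match.
    static class IndexNotFoundException extends IOException {
        IndexNotFoundException(String message) { super(message); }
    }

    static class IndexLockedException extends IOException {
        IndexLockedException(String message) { super(message); }
    }

    static class IndexCorruptException extends IOException {
        IndexCorruptException(String message) { super(message); }
    }

    // Hypothetical stand-in for an index-opening call. The declared
    // signature is unchanged: it still only says "throws IOException".
    static void openIndex(boolean exists, boolean locked) throws IOException {
        if (!exists) {
            throw new IndexNotFoundException("no index at the given path");
        }
        if (locked) {
            throw new IndexLockedException("lock file present");
        }
    }

    // New-style caller: catches the specific types first and can react
    // differently to each error case.
    static String describe(boolean exists, boolean locked) {
        try {
            openIndex(exists, locked);
            return "opened";
        } catch (IndexLockedException e) {
            return "locked";
        } catch (IndexNotFoundException e) {
            return "missing";
        } catch (IOException e) {
            return "io-error";
        }
    }

    // Old-style caller: written against the current API, catches only
    // IOException, and needs no source changes.
    static String legacy(boolean exists, boolean locked) {
        try {
            openIndex(exists, locked);
            return "opened";
        } catch (IOException e) {
            return "io-error";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(true, true));
        System.out.println(legacy(true, true));
    }
}
```

Because every new class extends IOException, code like legacy() keeps
compiling and behaving as before, while new code can opt in to the more
specific catch clauses.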
Doug
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Monday, April 01, 2002 9:06 PM
> To: lucene-dev@jakarta.apache.org
> Cc: matt@jivesoftware.com
> Subject: RE: corrupted index
>
>
> I changed the recipient from -user to -dev list, as that seems more
> appropriate.
> I think this would not be a bad idea, if we do it right.
> Things like IndexLockedException, etc. sound alright to me.
> I think Doug once welcomed such a change on one of the lists, too.
>
> Perhaps a list of suggested exceptions, new exception classes and
> appropriate patches would be the best contribution.
>
> Thanks,
> Otis
>
> --- Matt Tucker <matt@jivesoftware.com> wrote:
> > Hey all,
> >
> > Actually, using shutdown hooks might not be the best idea since Lucene is
> > very often used in server-side Java environments. Many app-servers throw
> > security errors when trying to add shutdown hooks, and I've seen Weblogic
> > crash before when having them in a webapp. Has anyone else run into this?
> >
> > This all brings up a key issue with Lucene, which is that there is little
> > way to recover from errors gracefully. I'd love to see a number of checked
> > exceptions added. For example:
> >
> > IndexNotFoundException -- when trying to open an index that doesn't exist
> > IndexLockedException -- when a lock file prevents you from getting an index
> > IndexCorruptException -- maybe this would be thrown when an index appears
> >   to be broken?
> >
> > At the moment, Lucene throws many undocumented IOExceptions and even
> > NullPointerExceptions when an error case comes up. I catch these in my
> > app, but there's really not an intelligent way to recover from them.
> > Adding checked exceptions would be a change of the API, but it seems
> > worth it. I'd be happy to make a more specific proposal if other people
> > feel like this would be a worthwhile direction to go in.
> >
> > Regards,
> > Matt
> >
> > Quoting "Spencer, Dave" <dave@lumos.com>:
> >
> > > Runtime.addShutdownHook:
> > >
> > > http://java.sun.com/j2se/1.3/docs/api/java/lang/Runtime.html#addShutdownHook(java.lang.Thread)
> > >
> > > -----Original Message-----
> > > From: Otis Gospodnetic [ mailto:otis_gospodnetic@yahoo.com]
> > > Sent: Sunday, March 17, 2002 12:06 AM
> > > To: Lucene Users List
> > > Subject: Re: corrupted index
> > >
> > >
> > > Oh, I just thought of something (wine does body good).
> > > Perhaps one could use Runtime (the class) to catch the JVM shutdown and
> > > do whatever is needed to prevent index corruption. I believe there are
> > > some shutdown hook methods in there that may let you do that. I'm too
> > > lazy to look up the API docs now, but I remember reading about that
> > > once, and perhaps it was even mentioned on one of the 2 Lucene mailing
> > > lists.
> > >
> > > On the other hand, it would be great to have a tool that can verify an
> > > existing index. I don't know enough about the actual file structure
> > > yet to write something like that, but maybe somebody else has done that
> > > already or would like to contribute.
> > >
> > > Otis
> > >
> > >
> > > --- "Steven J. Owens" <puffmail@darksleep.com> wrote:
> > > > Otis,
> > > >
> > > > > You can remove the .lock file and try re-indexing or continuing
> > > > > indexing where you left off.
> > > > > I am not sure about the corrupt index. I have never seen it happen,
> > > > > and I believe I recall reading some messages from Doug Cutting
> > > > > saying that index should never be left in an inconsistent state.
> > > >
> > > > Obviously never "should" be, but if something's pulling the rug
> > > > out from under his JRE, changes could be only partially written,
> > > > right?
> > > >
> > > > Or is the writing format in some sense transactionally safe?
> > > > I've never worked directly on something like this, but I worked at a
> > > > database software company where they used transaction semantics and a
> > > > journaling scheme to fake a "bulletproof" file system. Is this how
> > > > the index-writing code is implemented?
> > > >
> > > > In general, I can guess Doug's response - just torch the old
> > > > index directory and rebuild it; Lucene's indexing is fast enough that
> > > > you don't need to get clever. This seems to be Doug's stance in
> > > > general (i.e. "don't get fancy, I already put all the fanciness
> > > > you'll need into extremely fast indexing and searching"). So far, it
> > > > seems to work :-).
> > > >
> > > > > I could be making this up, though, so I suggest you search through
> > > > > lucene-user and lucene-dev archives on www.mail-archive.com.
> > > > > A search for "corrupt" should do it.
> > > > > Once you figure things out maybe you can post a summary here.
> > > >
> > > > I got a little curious, so I went and did the searches. There is
> > > > exactly one message in each list archive (dev and users) with the
> > > > keyword "corrupt" in it. The lucene-users instance is irrelevant:
> > > >
> > > >
> > > > http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00557.html
> > >
> > > The lucene-dev instance is more useful:
> > >
> > > http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00157.html
> > >
> > > It's a post from Doug, dated Sept. 27, 2001, about adding not just
> > > thread-safety but process-safety:
> > >
> > > It should be impossible to corrupt an index through the Lucene API.
> > > However if a Lucene process exits unexpectedly it can leave the index
> > > locked. The remedy is simply to, at a time when it is certain that no
> > > processes are accessing the index, remove all lock files.
> > >
> > > So it sounds like it's worth trying just removing the lock files.
> > > Hm, is there a way to come up with a "sanity check" you can run on an
> > > index to make sure it's not corrupted? This might be an excellent
> > > thing to reassure yourself with: something went wrong? Run a sanity
> > > check, if it fails just reindex.
> > >
> > > Steven J. Owens
> > > puff@darksleep.com
--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>