List: lucene-user
Subject: Re: corrupted index
From: Otis Gospodnetic <otis_gospodnetic () yahoo ! com>
Date: 2002-03-17 8:06:10
Oh, I just thought of something (wine does body good).
Perhaps one could use Runtime (the class) to catch the JVM shutdown and
do whatever is needed to prevent index corruption. I believe there are
some shutdown hook methods in there that may let you do that. I'm too
lazy to look up the API docs now, but I remember reading about that
once, and perhaps it was even mentioned on one of the two Lucene mailing
lists.
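For illustration, here is a minimal sketch of that shutdown-hook idea using Runtime.addShutdownHook from the standard library. FakeIndexWriter is a hypothetical stand-in for whatever object holds the index open (it is not a Lucene class), and closeOnShutdown is a helper name made up for this example. Note that hooks run on normal exit, System.exit(), and Ctrl-C, but not when the JVM is killed outright or the machine loses power, so this narrows the window rather than closing it.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownHookSketch {

    /** Hypothetical stand-in for the object that holds the index open. */
    static class FakeIndexWriter implements Closeable {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        @Override
        public void close() throws IOException {
            // compareAndSet makes close() safe to call more than once.
            if (closed.compareAndSet(false, true)) {
                System.out.println("index writer closed cleanly");
            }
        }

        boolean isClosed() {
            return closed.get();
        }
    }

    /** Builds the hook thread; kept separate so it can be exercised directly. */
    static Thread closeOnShutdown(final Closeable resource) {
        return new Thread(() -> {
            try {
                resource.close();
            } catch (IOException e) {
                // Little can be done this late; report and let the JVM exit.
                System.err.println("failed to close index: " + e.getMessage());
            }
        }, "index-shutdown-hook");
    }

    public static void main(String[] args) {
        FakeIndexWriter writer = new FakeIndexWriter();
        // Register the hook before starting any long-running indexing work.
        Runtime.getRuntime().addShutdownHook(closeOnShutdown(writer));
        System.out.println("indexing...");
        // When main returns, the JVM shuts down and runs the hook.
    }
}
```

The hook should stay short and must not rely on other threads still being alive, since the JVM is already on its way down when it runs.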
On the other hand, it would be great to have a tool that can verify an
existing index. I don't know enough about the actual file structure
yet to write something like that, but maybe somebody else has done that
already or would like to contribute.
Otis
--- "Steven J. Owens" <puffmail@darksleep.com> wrote:
> Otis,
>
> > You can remove the .lock file and try re-indexing or continuing
> > indexing where you left off.
> > I am not sure about the corrupt index. I have never seen it happen,
> > and I believe I recall reading some messages from Doug Cutting saying
> > that index should never be left in an inconsistent state.
>
> Obviously never "should" be, but if something's pulling the rug
> out from under his JRE, changes could be only partially written,
> right?
>
> Or is the writing format in some sense transactionally safe?
> I've never worked directly on something like this, but I worked at a
> database software company where they used transaction semantics and a
> journaling scheme to fake a "bulletproof" file system. Is this how
> the index-writing code is implemented?
>
> In general, I can guess Doug's response - just torch the old
> index directory and rebuild it; Lucene's indexing is fast enough that
> you don't need to get clever. This seems to be Doug's stance in
> general (i.e. "don't get fancy, I already put all the fanciness you'll
> need into extremely fast indexing and searching"). So far, it seems
> to work :-).
>
> > I could be making this up, though, so I suggest you search through
> > lucene-user and lucene-dev archives on www.mail-archive.com.
> > A search for "corrupt" should do it.
> > Once you figure things out maybe you can post a summary here.
>
> I got a little curious, so I went and did the searches. There is
> exactly one message in each list archive (dev and users) with the
> keyword "corrupt" in it. The lucene-users instance is irrelevant:
>
> http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00557.html
>
> The lucene-dev instance is more useful:
>
> http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00157.html
>
> It's a post from Doug, dated Sept 27, 2001, about adding not just
> thread-safety but process-safety:
>
> It should be impossible to corrupt an index through the Lucene API.
> However if a Lucene process exits unexpectedly it can leave the index
> locked. The remedy is simply to, at a time when it is certain that no
> processes are accessing the index, remove all lock files.
>
> So it sounds like it's worth trying just removing the lock files.
> Hm, is there a way to come up with a "sanity check" you can run on an
> index to make sure it's not corrupted? This might be an excellent
> thing to reassure yourself with: something went wrong? Run a sanity
> check, if it fails just reindex.
>
> Steven J. Owens
> puff@darksleep.com