List:       lucene-user
Subject:    Re: corrupted index
From:       puffmail () darksleep ! com (Steven J !  Owens)
Date:       2002-03-17 6:54:05

Otis,

> You can remove the .lock file and try re-indexing or continuing
> indexing where you left off.
> I am not sure about the corrupt index.  I have never seen it happen,
> and I believe I recall reading some messages from Doug Cutting saying
> that index should never be left in an inconsistent state.  

     Obviously it never "should" be, but if something's pulling the
rug out from under his JRE, changes could be only partially written,
right?  

     Or is the writing format in some sense transactionally safe?
I've never worked directly on something like this, but I worked at a
database software company where they used transaction semantics and a
journaling scheme to fake a "bulletproof" file system.  Is this how
the index-writing code is implemented?
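
     To make the question concrete, here is a minimal sketch of one
common way to get that kind of all-or-nothing update out of an
ordinary file system: write the new contents to a temporary file, then
rename it over the real one.  This is not a description of Lucene's
actual index-writing code; the class and method names are made up for
illustration, and it assumes the platform's rename replaces an
existing target atomically (true on POSIX file systems, not guaranteed
everywhere).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Lucene.
class AtomicWriteSketch {

    // Replace 'target' with 'contents' so that a reader sees either the
    // old file or the new one, never a half-written mix.
    static void writeAtomically(Path target, byte[] contents) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // The temp file lives in the same directory so the rename stays
        // on one file system.
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            Files.write(tmp, contents);
            // Assumption: ATOMIC_MOVE replaces an existing target on this
            // platform. Java leaves that case implementation specific.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up after a failure.
            Files.deleteIfExists(tmp);
        }
    }
}
```

     If the process dies before the rename, the old file is still
intact and only a stray .tmp file is left behind.  The sketch does not
force the data to disk, so it says nothing about surviving a power
failure.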

     In general, I can guess Doug's response - just torch the old
index directory and rebuild it; Lucene's indexing is fast enough that
you don't need to get clever.  This seems to be Doug's stance in
general (i.e. "don't get fancy, I already put all the fanciness you'll
need into extremely fast indexing and searching").  So far, it seems
to work :-).

> I could be making this up, though, so I suggest you search through
> lucene-user and lucene-dev archives on www.mail-archive.com.
> A search for "corrupt" should do it.
> Once you figure things out maybe you can post a summary here.

     I got a little curious, so I went and did the searches.  There is
exactly one message in each list archive (dev and user) with the
keyword "corrupt" in it.  The lucene-user instance is irrelevant:

http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00557.html

     The lucene-dev instance is more useful:

http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00157.html

     It's a post from Doug, dated Sept. 27, 2001, about adding not
just thread-safety but process-safety:

  It should be impossible to corrupt an index through the Lucene API.
  However if a Lucene process exits unexpectedly it can leave the index
  locked.  The remedy is simply to, at a time when it is certain that no
  processes are accessing the index, remove all lock files.
  
     So it sounds like it's worth trying just removing the lock files.
Hm, is there a way to come up with a "sanity check" you can run on an
index to make sure it's not corrupted?  This might be an excellent
thing to reassure yourself with: if something goes wrong, run the
sanity check, and if it fails, just reindex.
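
     For the lock-file remedy Doug describes, a rough sketch in plain
Java might look like the following.  It assumes the lock files sit
directly in the index directory and have names ending in ".lock"
(going by the ".lock file" Otis mentioned); the class and method names
are made up, and it uses no Lucene API at all.  Per Doug's warning,
only run something like this when you are certain no process is
accessing the index.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class LockCleanupSketch {

    // Delete every regular file ending in ".lock" directly inside
    // indexDir and return the names of the files actually removed.
    // Assumption: that naming convention matches your index's lock files.
    static List<String> removeLockFiles(File indexDir) {
        List<String> removed = new ArrayList<>();
        File[] entries = indexDir.listFiles();
        if (entries == null) {
            return removed; // not a directory, or not readable
        }
        for (File entry : entries) {
            if (entry.isFile() && entry.getName().endsWith(".lock") && entry.delete()) {
                removed.add(entry.getName());
            }
        }
        return removed;
    }
}
```

     Calling LockCleanupSketch.removeLockFiles(new File(path)), where
path is your index directory, leaves every other file in the directory
untouched.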

Steven J. Owens
puff@darksleep.com
