
List:       lucene-user
Subject:    Re: corrupted index
From:       Ype Kingma <ykingma () xs4all ! nl>
Date:       2002-03-17 8:35:17


>Otis,
>
>> You can remove the .lock file and try re-indexing or continuing
>> indexing where you left off.
>> I am not sure about the corrupt index.  I have never seen it happen,
>> and I believe I recall reading some messages from Doug Cutting saying
>> that index should never be left in an inconsistent state. 
>
>     Obviously never "should" be, but if something's pulling the rug
>out from under his JRE, changes could be only partially written,
>right? 
>
>     Or is the writing format in some sense transactionally safe?
>I've never worked directly on something like this, but I worked at a
>database software company where they used transaction semantics and a
>journaling scheme to fake a "bulletproof" file system.  Is this how
>the index-writing code is implemented?
>
>     In general, I can guess Doug's response - just torch the old
>index directory and rebuild it; Lucene's indexing is fast enough that
>you don't need to get clever.  This seems to be Doug's stance in
>general (i.e. "don't get fancy, I already put all the fanciness you'll
>need into extremely fast indexing and searching").  So far, it seems
>to work :-).

Yes, but it's not too difficult to make it work even faster.
Back up your indexes and give all your imports an option to
work incrementally. Then, if something goes wrong, copy from
the backup and restart your import in incremental mode.
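
As a rough sketch of the backup step, assuming the index lives in an
ordinary directory on disk and that no process is writing to it while
the copy runs. The class and method names (IndexBackup, copyDirectory)
are made up for illustration and are not part of Lucene; the code uses
only java.nio.file from a current JDK:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene: copies an index directory file by file.
final class IndexBackup {

    /** Copies every file under src into dst, creating directories as needed. */
    static void copyDirectory(Path src, Path dst) throws IOException {
        // Files.walk visits a directory before its entries, so each target
        // directory exists by the time the files inside it are copied.
        try (Stream<Path> paths = Files.walk(src)) {
            paths.forEach(p -> {
                Path target = dst.resolve(src.relativize(p).toString());
                try {
                    if (Files.isDirectory(p)) {
                        Files.createDirectories(target);
                    } else {
                        Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Backup would be copyDirectory(liveIndex, backupDir) after a successful
import; recovery is the same call with the arguments swapped, into an
empty directory so that no stale files survive, followed by the import
in incremental mode.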

>> I could be making this up, though, so I suggest you search through
>> lucene-user and lucene-dev archives on www.mail-archive.com.
>> A search for "corrupt" should do it.
>> Once you figure things out maybe you can post a summary here.
>
>     I got a little curious, so I went and did the searches.  There is
>exactly one message in each list archive (dev and users) with the
>keyword "corrupt" in it.  The lucene-users instance is irrelevant:
>
>http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00557.html
>
>     The lucene-dev instance is more useful:
>
>http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00157.html
>
>     It's a post from Doug, dated Sept 27, 2001, about adding not just
>thread-safety but process-safety:
>
>  It should be impossible to corrupt an index through the Lucene API.
>  However if a Lucene process exits unexpectedly it can leave the index
>  locked.  The remedy is simply to, at a time when it is certain that no
>  processes are accessing the index, remove all lock files.
> 

Note that this assumes that your file system works as advertised
in the java.io API. If there are occasional moments when it doesn't,
you'll have to clean up the mess yourself.
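
For the lock-removal remedy Doug describes, something like the
following might do. It is only a sketch: it assumes the lock files sit
in the index directory itself and can be recognised by a ".lock"
suffix, which you should check against your own index. LockCleaner and
removeLockFiles are hypothetical names, not Lucene API. Run it only
when you are certain no process is accessing the index:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class LockCleaner {

    /**
     * Deletes regular files named *.lock directly inside indexDir and
     * returns the paths that were removed. The "*.lock" pattern is an
     * assumption about how the lock files are named.
     */
    static List<Path> removeLockFiles(Path indexDir) throws IOException {
        // Collect the matches first, then delete, so the directory is
        // not modified while it is still being listed.
        List<Path> locks = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir, "*.lock")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    locks.add(p);
                }
            }
        }
        for (Path lock : locks) {
            Files.delete(lock);
        }
        return locks;
    }
}
```

The returned list is worth logging, so you know afterwards that a
process did in fact exit while holding a lock.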

>     So it sounds like it's worth trying just removing the lock files.
>Hm, is there a way to come up with a "sanity check" you can run on an
>index to make sure it's not corrupted?  This might be an excellent
>thing to reassure yourself with: something went wrong?  Run a sanity
>check, if it fails just reindex.

One sanity check is to delete a document, add it again, and reoptimize.
I have had document ordering/numbering exceptions from the optimize() call,
so I concluded optimize() does at least some sanity checks
when it performs actual work.
This makes optimize() an even nicer preparation for backup.

Regards,
Ype
