List:       lucene-user
Subject:    Re: Syncing lucene index with a database
From:       "Matt Schraeder" <MSchraeder () btsb ! com>
Date:       2009-03-27 16:17:42
Message-ID: s9ccb625.016 () mail ! btsb ! com

Thanks a million. You really helped me get headed in the right
direction. I'm going to start building some test cases this afternoon
so I can really begin to get my hands dirty and see how everything
works.  It's good to know that there isn't one "right" way for my
purposes, which is mostly what I wanted to verify before I ran some
actual timed test cases.

>>> erickerickson@gmail.com 3/27/2009 10:11 AM >>>

Yes, updating a document in Lucene is "expensive" for two
reasons:
1> deleting and adding a document does mean there's internal
    work being done. But it's not all *that* expensive. So this really
    comes down to how many records you expect to update
    every 15 minutes. You've gotta try it.
2> whenever a Lucene index is altered, you have to close/reopen
     the index searcher to see the changes. That's expensive in
     the sense that the first few queries through the system will
     fill up internal caches etc, so will be slower. You can hide this
     from your users by altering the index, firing a few warmup
     searches at the modified index *then* switching searchers
     in your main app.
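
A minimal sketch of the warm-then-switch idea in point 2, in plain Java.
`Searcher` and `SearcherHolder` are hypothetical stand-ins, not Lucene
classes; the real open/close calls for your Lucene version would go behind
that interface. It also assumes no queries are still running against the
old searcher when it is closed, which a real app would have to guarantee.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for an index searcher; not a Lucene type.
interface Searcher {
    int search(String query);   // returns a hit count
    void close();
}

// Holds the searcher the main app currently uses.
class SearcherHolder {
    private final AtomicReference<Searcher> current = new AtomicReference<>();

    SearcherHolder(Searcher initial) {
        current.set(initial);
    }

    // The main app calls this for every query.
    Searcher get() {
        return current.get();
    }

    // Open a searcher over the modified index, warm it, then switch.
    void refresh(Supplier<Searcher> opener, List<String> warmupQueries) {
        Searcher fresh = opener.get();
        for (String q : warmupQueries) {
            fresh.search(q);    // results discarded; this only fills caches
        }
        Searcher old = current.getAndSet(fresh);
        if (old != null) {
            old.close();        // assumption: nothing is still using 'old'
        }
    }
}
```

Users keep hitting the old searcher until `refresh` swaps in the warmed
one, so the slow first queries are paid by the warmup loop rather than by
live traffic.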


I didn't see how many additions/deletions you're expecting every 15
minutes, but assuming it's not a huge amount, I'd take the simplest
approach.

The very first thing you need to do is build a stupid version of the
index and see how long it takes. Take another hour or two and try some
deletions/additions just to get a feel for how much time you're
talking about here. *Then* design in more detail.
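
For putting a number on each step, something as small as the following is
probably enough for a first pass. This is only a sketch: `Timing` is a
made-up helper name, and the work being timed (the full build, a batch of
deletions/additions) is whatever your prototype does.

```java
// Hypothetical helper for rough wall-clock timing of one prototype step.
class Timing {
    // Runs the step once and returns the elapsed time in milliseconds.
    static long millis(Runnable step) {
        long start = System.nanoTime();
        step.run();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Wrapping the full index build in one call and a 15-minute batch of updates
in another gives the two numbers to compare before designing anything
fancier.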

I think you're over-analyzing the problem, and worrying about
efficiency/speed too early in the process. You can generate some
prototypes in a very short time that will inform your design. Sure,
there are "best practices", none of which matter in *your* situation.
Unless and until you can put some numbers beside your concerns, you're
flying blind. Do you really care if you take 1.234 seconds to update
your index as opposed to 1.75 seconds? I have no clue whether the
actual numbers are in the milliseconds or hours. But neither do you
<G>. So I'd just jump in and prototype *just the indexing* portion...

Best
Erick

