List:       solr-user
Subject:    Re: What exactly happens to extant documents when the schema changes?
From:       Dotan Cohen <dotancohen () gmail ! com>
Date:       2013-05-30 12:34:42
Message-ID: CAKDXFkMw8kGxL7g9QtqZo_KDWr66f+0ucEOr2u4QV9iSRFnOiQ () mail ! gmail ! com

On Wed, May 29, 2013 at 5:09 PM, Shawn Heisey <solr@elyograg.org> wrote:
> I handle this in a very specific way with my sharded index.  This won't
> work for all designs, and the precise procedure won't work for SolrCloud.
>
> There is a 'live' and a 'build' core for each of my shards.  When I want
> to reindex, the program makes a note of my current position for deletes,
> reinserts, and new documents.  Then I use a DIH full-import from mysql
> into the build cores.  Once the import is done, I run the update cycle
> of deletes, reinserts, and new documents on those build cores, using the
> position information noted earlier.  Then I swap the cores so the new
> index is online.
>

I do need to look into sharding and multiple cores, thank you. By the
way, don't google for DIH! It took me some time to figure out that it
stands for DataImportHandler, as some people use the acronym for
something completely different.
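
For anyone reading this in the archive, here is a rough sketch of the
request sequence behind the live/build procedure quoted above. It only
builds the URLs and sends nothing. The host, port, core naming scheme
(shard name plus "_live" or "_build") and the /dataimport handler path
are assumptions for illustration, not details from Shawn's setup.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed host and port


def full_import_url(core, base=SOLR_BASE):
    """URL that asks DataImportHandler on `core` to start a full import."""
    # Assumes the handler is registered at /dataimport in solrconfig.xml.
    return f"{base}/{core}/dataimport?" + urlencode({"command": "full-import"})


def swap_url(live_core, build_core, base=SOLR_BASE):
    """URL for the CoreAdmin SWAP action, which exchanges the two cores."""
    params = {"action": "SWAP", "core": live_core, "other": build_core}
    return f"{base}/admin/cores?" + urlencode(params)


def rebuild_plan(shards):
    """Return the ordered request URLs for rebuilding every shard."""
    # Hypothetical naming: each shard has "<shard>_live" and "<shard>_build".
    plan = [full_import_url(f"{shard}_build") for shard in shards]
    # The catch-up cycle of deletes, reinserts and new documents is
    # application-specific and has to run at this point, before any swap.
    plan += [swap_url(f"{shard}_live", f"{shard}_build") for shard in shards]
    return plan


if __name__ == "__main__":
    for url in rebuild_plan(["s0", "s1"]):
        print(url)
```

A full-import request returns before the import finishes, so a real
script would poll the same handler with command=status before moving
on to the catch-up cycle and the swap.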


> To adapt this for SolrCloud, I would need to use two collections, and
> update a collection alias for what is considered live.
>
> To control the I/O and CPU usage, you might need some kind of throttling
> in your update/rebuild application.
>
> I don't need any throttling in my design.  Because I'm using DIH, the
> import only uses a single thread for each shard on the server.  I've got
> RAID10 for storage and half of the CPU cores are still available for
> queries, so it doesn't overwhelm the server.
>
> The rebuild does lower performance, so I have the other copy of the
> index handle queries while the rebuild is underway.  When the rebuild is
> done on one copy, I run it again on the other copy.  Right now I'm
> half-upgraded -- one copy of my index is version 3.5.0, the other is
> 4.2.1.  Switching to SolrCloud with sharding and replication would
> eliminate this flexibility, unless I maintained two separate clouds.
>

Thank you. I am not using SolrCloud, but if I ever consider it I will
keep this in mind.
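
As a note for later, the two ideas quoted above (a collection alias for
the live index, and throttling in the update application) might look
roughly like this. The alias and collection names, the base URL, the
batch size and the pause length are all made up for illustration, and
the throttle is just a simple fixed pause between batches, not anything
Shawn described.

```python
import time
from urllib.parse import urlencode


def create_alias_url(alias, collection, base="http://localhost:8983/solr"):
    """URL for the Collections API CREATEALIAS action."""
    # Assumption: issuing CREATEALIAS for an existing alias name re-points
    # it, which is how the "live" alias would move to the rebuilt collection.
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return f"{base}/admin/collections?" + urlencode(params)


def throttled_batches(docs, batch_size, pause_seconds, sleep=time.sleep):
    """Yield lists of up to `batch_size` docs, pausing after each full batch."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
            # Give the server time to serve queries before the next batch.
            sleep(pause_seconds)
    if batch:
        yield batch


if __name__ == "__main__":
    print(create_alias_url("live", "products_v2"))
    for batch in throttled_batches(range(5), batch_size=2, pause_seconds=0.1):
        print(batch)  # a real indexer would post this batch to Solr here
```

The `sleep` parameter is only there so the pause can be replaced in
tests. The right batch size and pause depend entirely on the hardware
and the query load.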

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com