'Re: SOLR scalability problem'

List:       solr-user
Subject:    Re: SOLR scalability problem
From:       Erick Erickson <erickerickson () gmail ! com>
Date:       2017-03-30 15:29:59
Message-ID: CAN4YXvcYp8HAX5y0zFUzLgcKyBrC6q01ECB4PSvcYC3=anXMLA () mail ! gmail ! com

I'm inferring that, at the end of the day, all your docs fit in a
single index, correct? SolrCloud won't be a magic bullet, and if you
_do_ go to SolrCloud I'd strongly advise using SolrJ or similar to
feed docs, since DIH (the DataImportHandler) runs on a single server.
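A quick back-of-the-envelope check of that inference, using the figures from the quoted question. It assumes every core holds ~25,000 docs and treats 7.5 GB as 7.5 x 10^9 bytes; `sizing_summary` is a hypothetical helper for illustration only. With those inputs it works out to about 2.5 million docs in total, ~75 MB per core and ~3 KB per doc on average.

```python
def sizing_summary(cores: int, docs_per_core: int, total_bytes: float) -> dict:
    """Aggregate per-core figures into whole-collection totals.

    Hypothetical helper: assumes docs are spread evenly across cores.
    """
    total_docs = cores * docs_per_core
    return {
        "total_docs": total_docs,
        "bytes_per_core": total_bytes / cores,
        "bytes_per_doc": total_bytes / total_docs,
    }


if __name__ == "__main__":
    # Figures from the quoted question; approximations, not measurements.
    summary = sizing_summary(cores=100, docs_per_core=25_000, total_bytes=7.5e9)
    print(summary)
```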

However, all that aside, let me restate your problem to be sure I
have it right: when you replicate out from your master to your 100
slaves, the master is overburdened.

Have you considered the "repeater" strategy? See:
https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-SettingUpaRepeaterwiththeReplicationHandler


The idea is that the master serves the index to, say, 10 slaves and
each of those slaves in turn serves 10 _other_ slaves.
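A rough way to see why this helps: the sketch below counts the replication pulls the master must answer per polling cycle, with and without a two-tier repeater layout. `master_pull_load` is a hypothetical helper, and the model is an assumption: each core replicates independently (one pull per core per polling server), and repeaters are themselves slaves that each feed up to `fanout` other slaves. It counts requests only, not bandwidth or index size.

```python
import math
from typing import Optional


def master_pull_load(num_slaves: int, cores: int, fanout: Optional[int] = None) -> int:
    """Estimate how many replication pulls hit the master per polling cycle.

    Hypothetical model (illustrative, not measured):
      * every core replicates independently, so a server that polls the
        master issues one pull per core;
      * fanout=None is the flat setup: every slave polls the master;
      * fanout=k is a two-tier repeater setup: some slaves act as
        repeaters, poll the master, and each serves up to k other slaves.
    """
    if num_slaves < 1 or cores < 1:
        raise ValueError("num_slaves and cores must be positive")
    if fanout is None:
        pollers = num_slaves  # every slave talks to the master directly
    else:
        if fanout < 0:
            raise ValueError("fanout must be non-negative")
        # Each repeater covers itself plus up to `fanout` downstream slaves.
        pollers = math.ceil(num_slaves / (fanout + 1))
    return pollers * cores


if __name__ == "__main__":
    # The example above: 10 repeaters, each feeding 10 other slaves.
    print(master_pull_load(110, cores=1), master_pull_load(110, cores=1, fanout=10))
    # Figures from the quoted question: 5 live servers x 100 cores.
    print(master_pull_load(5, cores=100), master_pull_load(5, cores=100, fanout=4))
```

Under this model the first line prints `110 10` and the second `500 100`: with one live server acting as a repeater for the other four, the staging server would answer 100 pulls per cycle instead of 500.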

Best,
Erick

On Thu, Mar 30, 2017 at 6:12 AM, santosh sidnal
<sidnal.santosh@gmail.com> wrote:
> Hi All,
> 
> I have a scalability problem on my project. We are running close to
> 100 cores, each with ~25,000 documents, and the total size of the
> index files is 7.5 GB.
> 
> 
> We also have a staging server where we build the index files using the
> data importer (DIH), and we use replication to push the data to the LIVE
> servers that serve the live application. Because index pulls are
> initiated from the live servers (currently 5 servers against one staging
> server), the staging server is overburdened and cannot respond properly
> either to the indexing job or to the other (very minimal) services.
> 
> 
> So the problem statement is this: I understand that my current SOLR
> architecture cannot handle all my needs, so we are thinking of upgrading
> to the next level, but I am confused about the following questions:
> 
> 
> 1. Can I use index sharding for my problem? It is recommended when a
> core has more than ~1 million docs, but I have only 25,000 in one
> core and 100 cores.
> 2. Can I consider SolrCloud? If yes, please let me know why.
> 3. How about using Apache ZooKeeper and maintaining only the LIVE servers,
> assigning 20 cores to each server? That way I would make only 25
> snap-pull requests to the staging server instead of the current 100
> snap-pullers per server.
> 
> Any new suggestions or replies to this email are greatly appreciated.
> Thanks in advance.
> 
> 
> 
> --
> Regards,
> Santosh Sidnal

