'Re: [EXTERNAL] colStatus response not as expected with Solr 8.1.1 in a distributed deployment'


List:       solr-user
Subject:    Re: [EXTERNAL] colStatus response not as expected with Solr 8.1.1 in a distributed deployment
From:       Erick Erickson <erickerickson () gmail ! com>
Date:       2019-10-30 11:48:21
Message-ID: 1CFB4D64-6B29-4EE1-BBC6-FB4A12C36FBB () gmail ! com

Exactly how did you kill the instance? If I stop Solr gracefully (bin/solr stop…),
it's fine. If I do a "kill -9" on it, I see the same thing you do on master.

It's a bit tricky. When a node goes away without a chance to shut down gracefully,
it never gets to update its state in the collection's "state.json" znode. However,
the node will be removed from the "live_nodes" list, and a replica is not truly active
unless its state is "active" in state.json _and_ its node appears in
live_nodes.

CLUSTERSTATUS pretty clearly understands this, but COLSTATUS apparently doesn't.
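
The rule above can be sketched roughly as follows. This is only an illustration, not Solr's actual implementation: the function name and the sample data are made up, and it assumes a CLUSTERSTATUS-style JSON response (a "cluster" object holding "live_nodes" and per-replica "state" and "node_name" fields) that has already been parsed into a dict.

```python
from collections import Counter


def effective_replica_counts(cluster_status, collection):
    """Count replicas per shard, treating a replica as active only if its
    recorded state is "active" AND its node is listed in live_nodes.

    Hypothetical helper; assumes a parsed CLUSTERSTATUS-style response.
    """
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    shards = cluster["collections"][collection]["shards"]

    result = {}
    for shard_name, shard in shards.items():
        counts = Counter()
        for replica in shard["replicas"].values():
            state = replica.get("state", "down")
            # A stale "active" in state.json does not count if the node is gone.
            if replica.get("node_name") not in live_nodes:
                state = "down"
            counts[state] += 1
        result[shard_name] = {
            "total": sum(counts.values()),
            "active": counts["active"],
            "down": counts["down"],
        }
    return result


# Made-up example: 5 replicas of shard1, 3 on node A and 2 on node B.
# Node B was killed with "kill -9", so state.json still says "active" for it.
sample = {
    "cluster": {
        "live_nodes": ["nodeA:8983_solr"],
        "collections": {
            "coll": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"state": "active", "node_name": "nodeA:8983_solr"},
                            "core_node2": {"state": "active", "node_name": "nodeA:8983_solr"},
                            "core_node3": {"state": "active", "node_name": "nodeA:8983_solr"},
                            "core_node4": {"state": "active", "node_name": "nodeB:8983_solr"},
                            "core_node5": {"state": "active", "node_name": "nodeB:8983_solr"},
                        }
                    }
                }
            }
        },
    }
}

print(effective_replica_counts(sample, "coll"))
```

With the made-up data above, shard1 comes out as 5 total, 3 active and 2 down, even though every replica still says "active" in state.json.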

I'll raise a JIRA.

Thanks for letting us know.

Erick

> On Oct 29, 2019, at 2:10 PM, Elizaveta Golova <EGolova@uk.ibm.com> wrote:
> 
> colStatus (and clusterStatus) from the Collections api.
> https://lucene.apache.org/solr/guide/8_1/collections-api.html#colstatus
> 
> 
> Running something like this in the browser, where the live Solr node is accessible
> on port 8983 (but points at a Docker container which is running the Solr node):
> http://localhost:8983/solr/admin/collections?action=COLSTATUS&collection=coll
> 
> 
> 
> -----Erick Erickson <erickerickson@gmail.com> wrote: -----
> To: solr-user@lucene.apache.org
> From: Erick Erickson <erickerickson@gmail.com>
> Date: 10/29/2019 05:39PM
> Subject: [EXTERNAL] Re: colStatus response not as expected with Solr 8.1.1 in a distributed deployment
> 
> Uhm, what is colStatus? You need to show us _exactly_ what Solr commands you're
> running for us to make any intelligent comments.
> 
> > On Oct 29, 2019, at 1:12 PM, Elizaveta Golova <EGolova@uk.ibm.com> wrote:
> > 
> > Hi,
> > 
> > We're seeing an issue with colStatus in a distributed Solr deployment.
> > 
> > Scenario:
> > Collection with:
> > - 1 zk
> > - 2 solr nodes on different boxes (simulated using Docker containers)
> > - replication factor 5
> > 
> > When we take down one node, our clusterStatus response is as expected (only
> > listing the live node as live, and anything on the "down" node shows the state as
> > down).
> > Our colStatus response, however, continues to show every shard as being "active",
> > with the replica breakdown on every shard as "total" == "active" and "down"
> > always being zero, i.e.
> > "shards":{
> > "shard1":{
> > "state":"active",
> > "range":"80000000-ffffffff",
> > "replicas":{
> > "total":5,
> > "active":5,
> > "down":0,
> > "recovering":0,
> > "recovery_failed":0},
> > 
> > This is even though we expect the "down" count to be either 3 or 2 depending on the
> > shard (and thus "active" to be 3 or 2 less than it is).
> > When testing this situation with both Solr nodes on the same box, the
> > colStatus response is as expected with regard to the replica counts.
> > Thanks!
> > 
> > Unless stated otherwise above:
> > IBM United Kingdom Limited - Registered in England and Wales with number 741598.
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
> > 
> 

