
List:       solr-user
Subject:    Re: Missing Cores on one node and error running SPLIT command?
From:       Jan Høydahl <jan.asf@cominvent.com>
Date:       2021-09-22 16:23:07
Message-ID: CE1A7AF8-772D-4B74-BFF2-E1D04ED7D644@cominvent.com

According to CLUSTERSTATUS, all the replicas/shards for all collections are
located on the 7574 node, and all of them are "down". So I suspect the reason
the SPLITSHARD command fails is that the node is not healthy. I would first
try to restart both nodes, and then see if they come up correctly.
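
If you want to verify node and collection health from the command line once they are back up, something like this should work on 8.9 (the collection name and ZooKeeper address below are just placeholders for your setup):

  # overall cluster state, same JSON as the CLUSTERSTATUS output you pasted
  curl "http://localhost:7574/solr/admin/collections?action=CLUSTERSTATUS&wt=json"

  # per-collection health check against ZooKeeper
  bin/solr healthcheck -c customer -z localhost:9983

Healthy replicas should report state "active" rather than "down".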

I suppose this is a test environment since you deploy two nodes on the same
physical box. Note that when doing so, you should ideally separate them into
two completely separate installs, or at least completely separate SOLR_HOME
folders, e.g. /var/solr1/data and /var/solr2/data. If you start two nodes
from the same install, you may end up in a bad state.
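
A rough sketch of what that could look like (the paths are just examples, and I'm assuming the second node joins the embedded ZooKeeper started by the first one on port 9983):

  # node 1 in cloud mode with its own solr home
  bin/solr start -c -p 8983 -s /var/solr1/data

  # node 2 with a separate solr home, pointed at the same ZooKeeper
  bin/solr start -c -p 7574 -s /var/solr2/data -z localhost:9983

That keeps each node's core directories and core.properties files apart, so they cannot overwrite each other.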

Jan

> On 22 Sep 2021, at 15:38, Charlie Hubbard <charlie.hubbard@gmail.com> wrote:
> 
> Hi Jan,
> 
> Here is a link to the image:
> 
> https://app.box.com/s/mxidpcbm2lezm9ts47k2wgqwm34tytce
> 
> From the logs:
> 
> Collection: customer operation: splitshard
> failed:org.apache.solr.common.SolrException: missing index size
> information for parent shard leader
> 	at org.apache.solr.cloud.api.collections.SplitShardCmd.checkDiskSpace(SplitShardCmd.java:657)
>   at org.apache.solr.cloud.api.collections.SplitShardCmd.split(SplitShardCmd.java:159)
>   at org.apache.solr.cloud.api.collections.SplitShardCmd.call(SplitShardCmd.java:102)
>   at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:270)
>   at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:524)
>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
> CLUSTERSTATUS output:
> 
> {
> "responseHeader": {
> "status": 0,
> "QTime": 78
> },
> "cluster": {
> "collections": {
> "ultipro_audit": {
> "pullReplicas": "0",
> "replicationFactor": "1",
> "shards": {
> "shard1": {
> "range": "80000000-ffffffff",
> "state": "active",
> "replicas": {
> "core_node3": {
> "core": "ultipro_audit_shard1_replica_n1",
> "node_name": "172.30.1.104:7574_solr",
> "base_url": "http://172.30.1.104:7574/solr",
> "state": "down",
> "type": "NRT",
> "force_set_state": "false",
> "leader": "true"
> }
> },
> "health": "RED"
> },
> "shard2": {
> "range": "0-7fffffff",
> "state": "active",
> "replicas": {
> "core_node4": {
> "core": "ultipro_audit_shard2_replica_n2",
> "node_name": "172.30.1.104:7574_solr",
> "base_url": "http://172.30.1.104:7574/solr",
> "state": "down",
> "type": "NRT",
> "force_set_state": "false",
> "leader": "true"
> }
> },
> "health": "RED"
> }
> },
> "router": {
> "name": "compositeId"
> },
> "maxShardsPerNode": "-1",
> "autoAddReplicas": "false",
> "nrtReplicas": "1",
> "tlogReplicas": "0",
> "health": "RED",
> "znodeVersion": 15,
> "configName": "ultipro_audit"
> },
> "audit": {
> "pullReplicas": "0",
> "replicationFactor": "1",
> "shards": {
> "shard1": {
> "range": "80000000-ffffffff",
> "state": "active",
> "replicas": {
> "core_node3": {
> "core": "audit_shard1_replica_n1",
> "node_name": "172.30.1.104:7574_solr",
> "base_url": "http://172.30.1.104:7574/solr",
> "state": "down",
> "type": "NRT",
> "force_set_state": "false",
> "leader": "true"
> }
> },
> "health": "RED"
> },
> "shard2": {
> "range": "0-7fffffff",
> "state": "active",
> "replicas": {
> "core_node4": {
> "core": "audit_shard2_replica_n2",
> "node_name": "172.30.1.104:7574_solr",
> "base_url": "http://172.30.1.104:7574/solr",
> "state": "down",
> "type": "NRT",
> "force_set_state": "false",
> "leader": "true"
> }
> },
> "health": "RED"
> }
> },
> "router": {
> "name": "compositeId"
> },
> "maxShardsPerNode": "-1",
> "autoAddReplicas": "false",
> "nrtReplicas": "1",
> "tlogReplicas": "0",
> "health": "RED",
> "znodeVersion": 13,
> "configName": "audit"
> },
> "customer": {
> "pullReplicas": "0",
> "replicationFactor": "1",
> "shards": {
> "shard1": {
> "range": "80000000-ffffffff",
> "state": "active",
> "replicas": {
> "core_node3": {
> "core": "customer_shard1_replica_n1",
> "node_name": "172.30.1.104:7574_solr",
> "base_url": "http://172.30.1.104:7574/solr",
> "state": "down",
> "type": "NRT",
> "force_set_state": "false",
> "leader": "true"
> }
> },
> "health": "RED"
> },
> "shard2": {
> "range": "0-7fffffff",
> "state": "active",
> "replicas": {
> "core_node4": {
> "core": "customer_shard2_replica_n2",
> "node_name": "172.30.1.104:7574_solr",
> "base_url": "http://172.30.1.104:7574/solr",
> "state": "down",
> "type": "NRT",
> "force_set_state": "false",
> "leader": "true"
> }
> },
> "health": "RED"
> }
> },
> "router": {
> "name": "compositeId"
> },
> "maxShardsPerNode": "-1",
> "autoAddReplicas": "false",
> "nrtReplicas": "1",
> "tlogReplicas": "0",
> "health": "RED",
> "znodeVersion": 15,
> "configName": "customer"
> },
> "fusearchiver": {
> "pullReplicas": "0",
> "replicationFactor": "1",
> "shards": {
> "shard1": {
> "range": "80000000-ffffffff",
> "state": "active",
> "replicas": {
> "core_node3": {
> "core": "fusearchiver_shard1_replica_n1",
> "node_name": "172.30.1.104:7574_solr",
> "base_url": "http://172.30.1.104:7574/solr",
> "state": "down",
> "type": "NRT",
> "force_set_state": "false",
> "leader": "true"
> }
> },
> "health": "RED"
> },
> "shard2": {
> "range": "0-7fffffff",
> "state": "active",
> "replicas": {
> "core_node4": {
> "core": "fusearchiver_shard2_replica_n2",
> "node_name": "172.30.1.104:7574_solr",
> "base_url": "http://172.30.1.104:7574/solr",
> "state": "down",
> "type": "NRT",
> "force_set_state": "false",
> "leader": "true"
> }
> },
> "health": "RED"
> }
> },
> "router": {
> "name": "compositeId"
> },
> "maxShardsPerNode": "-1",
> "autoAddReplicas": "false",
> "nrtReplicas": "1",
> "tlogReplicas": "0",
> "health": "RED",
> "znodeVersion": 12,
> "configName": "fusearchiver"
> }
> },
> "live_nodes": [
> "172.30.1.104:7574_solr",
> "172.30.1.104:8983_solr"
> ]
> }
> }
> 
> 
> On Wed, Sep 22, 2021 at 8:48 AM Jan Høydahl <jan.asf@cominvent.com> wrote:
> 
> > The image did not make it to the list, please try uploading it elsewhere
> > or copy the text only?
> > 
> > Can you check the solr.log and paste relevant section from it?
> > Was the "empty" node supposed to have a core from the collection? Can you
> > do a CLUSTERSTATUS command and paste the output here?
> > 
> > Jan
> > 
> > > On 22 Sep 2021, at 13:41, Charlie Hubbard <charlie.hubbard@gmail.com> wrote:
> > > 
> > > Hi,
> > > 
> > > I have a simple 2 node solr cluster running 8.9 (1 node on 8983 and 1
> > > node on 7574), but one of my nodes (8983) shows collections, but no cores
> > > in the admin interface.  When I run the following command I get an error:
> > > "missing index size information for parent shard leader" when I try to do
> > > curl
> > > http://localhost:7574/solr/admin/collections?action=SPLITSHARD&collection=customer&shard=shard1&wt=xml
> > > 
> > > Here is the error:
> > > 
> > > Any ideas why the cores are missing on my 8983 node?  I tried moving it
> > > to 8984 because I had some trouble with caching before, but it had the same
> > > results.
> > > TIA
> > > Charlie
> > 
> > 

