'RE: How to restore deleted collection from filesystem'

List:       solr-user
Subject:    RE: How to restore deleted collection from filesystem
From:       "Kommu, Vinodh K." <vkommu () dtcc ! com>
Date:       2020-05-26 8:33:21
Message-ID: SN6PR15MB2319636069FC16168DB373A7A2B00 () SN6PR15MB2319 ! namprd15 ! prod ! outlook ! com

Thanks, Erick.

We were able to successfully restore the deleted collection data as suggested. In \
fact, we tried both of the approaches below, and both worked fine:

1) Create collection with same number of shards and replication factor = 1
2) Create collection with same number of shards and same replication factor as \
deleted collection.

Because we create collections using the rule-based replica placement method, the \
first approach makes it a little difficult to work out on which nodes the replicas \
should be added manually. With the 2nd approach the replicas are already created, so \
we just copied the shard1 leader's index files from the restored data into the index \
directory of every corresponding shard1 replica in the newly created collection. Once \
the copy was done, we brought up the Solr nodes and everything was working fine.
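
A minimal Python sketch of the copy step in the 2nd approach, assuming Solr is stopped on the affected nodes and that each replica keeps its index under <core>/data/index. The function name and every path passed to it are hypothetical, not taken from this thread:

```python
import shutil
from pathlib import Path


def copy_index_to_replicas(restored_index, replica_data_dirs):
    """Copy one restored index directory into each replica's data/index.

    restored_index    -- index directory of one restored replica of a shard
    replica_data_dirs -- data directories of the new replicas of the SAME shard
    """
    src = Path(restored_index)
    if not src.is_dir():
        raise FileNotFoundError(src)
    copied = []
    for data_dir in replica_data_dirs:
        dst = Path(data_dir) / "index"
        if dst.exists():
            shutil.rmtree(dst)  # drop the empty index of the freshly created replica
        shutil.copytree(src, dst)
        copied.append(dst)
    return copied
```

The caller is responsible for passing only replicas of the matching shard; the sketch does not check shard names and does not touch core.properties or the tlog.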


Thanks & Regards,
Vinodh

-----Original Message-----
From: Erick Erickson <erickerickson@gmail.com> 
Sent: Thursday, May 21, 2020 11:09 PM
To: solr-user@lucene.apache.org
Subject: Re: How to restore deleted collection from filesystem

ATTENTION: External Email – Be Suspicious of Attachments, Links and Requests for \
Login Information.

See inline.

> On May 21, 2020, at 10:13 AM, Kommu, Vinodh K. <vkommu@dtcc.com> wrote:
> 
> Thanks, Erick, for the quick response.
> 
> Yes, our VMs are equipped with NetBackup, a file-based backup that can restore any \
> files or directories that were deleted, from the latest available full backup.
> 
> Can we create an empty collection with the same name as the deleted one, with the \
> same number of shards & replicas, and copy the content from each restored core to \
> the corresponding core?

Kind of. It is NOT necessary that it has the same name. There is no need at all to \
create the same number of replicas to start (and I do NOT recommend it). As I said \
earlier, create a single-replica (i.e. leader-only) collection with the same number \
of shards. Copy _one_ data dir (not everything under core) to that _one_ \
corresponding replica. It doesn't matter which replica you copy from.
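
For illustration, a leader-only collection can be requested through the Collections API CREATE action. The sketch below only builds the request URL; the base URL, the collection name and the helper name are assumptions, and any placement rules or other site-specific parameters are left out:

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, config_name=None):
    """Build a Collections API CREATE URL for a leader-only collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,   # must match the deleted collection
        "replicationFactor": 1,    # one replica (the leader) per shard
    }
    if config_name:
        params["collection.configName"] = config_name
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
url = create_collection_url("http://localhost:8983/solr", "restored_coll", 4)
```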

> I mean, copy all contents (directories & files) under the \
> Oldcollection_shard1_replica1 core from the old collection to the corresponding \
> Newcollection_shard1_replica1 core in the new collection. Would this approach work?

As above, do not do this. Just copy the data dir from one of your backup copies to \
the leader-only replica. It doesn't matter at all if the replica names are the same. \
The only thing that matters is that the shard number is identical. For instance, copy \
blah/blah/collection1_shard1_replica_57/data to \
blah/blah/collection1_shard1_replica_1/data if you want.
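
Since only the shard has to match, the pairing of backup copies to new replicas can be derived from the core directory names. This is a rough sketch that assumes names containing "_shardN_replica", as in the examples above; the helper names are hypothetical:

```python
import re

_SHARD = re.compile(r"_(shard\d+)_replica")


def shard_of(core_name):
    """Return the shard part (e.g. 'shard1') of a core directory name."""
    match = _SHARD.search(core_name)
    if match is None:
        raise ValueError(f"no shard in core name: {core_name!r}")
    return match.group(1)


def pair_by_shard(backup_cores, new_cores):
    """Map each new core to one backup core of the same shard."""
    by_shard = {}
    for name in backup_cores:
        # Any backup copy of a shard will do, so keep the first one seen.
        by_shard.setdefault(shard_of(name), name)
    return {new: by_shard[shard_of(new)] for new in new_cores}
```

A KeyError from pair_by_shard means there is no backup copy for one of the new shards, which is worth finding out before any files are moved.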

Once you have a one-replica collection with the data in it and you've done a bit of \
verification, use ADDREPLICA to build it out.
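
A hedged sketch of the build-out step: it only generates Collections API ADDREPLICA request URLs and groups them into small batches, in line with the earlier advice in this thread to add a few replicas at a time. The base URL, collection name and helper names are assumptions, and waiting for replicas to become active between batches is left to the operator:

```python
from urllib.parse import urlencode


def add_replica_urls(base_url, collection, shards, extra_per_shard):
    """Build one ADDREPLICA URL per additional replica, cycling over shards."""
    urls = []
    for _ in range(extra_per_shard):
        for shard in shards:
            query = urlencode(
                {"action": "ADDREPLICA", "collection": collection, "shard": shard}
            )
            urls.append(f"{base_url}/admin/collections?{query}")
    return urls


def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Each batch would be issued only after the replicas from the previous one report as active, so that full-index copies from the leaders do not all run at once.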

> Lastly, is there anything we need to be aware of in core.properties in the newly \
> created collection, or any reference that is specific to the new collection?

Do not copy or touch core.properties; you can mess this up thoroughly by \
hand-editing. The _only_ thing you copy is the data directory, which will contain a \
tlog and an index directory. And the tlog isn't even necessary.

Best,
Erick

> 
> 
> Thanks & Regards,
> Vinodh
> 
> -----Original Message-----
> From: Erick Erickson <erickerickson@gmail.com>
> Sent: Thursday, May 21, 2020 6:17 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to restore deleted collection from filesystem
> 
> So what I'm reading here is that you have the _data_ saved somewhere, right? By \
> "data" I just mean the data directories under the replica.
> 
> 1> Go ahead and recreate the collection. It _must_ have the same number of shards. \
> Make it leader-only, i.e. replicationFactor == 1.
> 2> The collection will be empty; now shut down the Solr instances hosting any of \
> the replicas.
> 3> Replace the data directory under each replica with the corresponding one from \
> the backup. "Corresponding" means from the same shard, which should be obvious from \
> the replica name.
> 4> Start your Solr instances back up and verify it's as you expect.
> 5> Use ADDREPLICA to build out your collection to have as many replicas of each \
> shard as you require. NOTE: I'd do this gradually, maybe 2-3 at a time, then wait \
> for them to become active before adding more. The point here is that each \
> ADDREPLICA will cause the entire index to be copied down from the leader, and with \
> that many documents you don't want to saturate your network.
> 
> Best,
> Erick
> 
> > On May 21, 2020, at 8:17 AM, Kommu, Vinodh K. <vkommu@dtcc.com> wrote:
> > 
> > Hi,
> > 
> > One of our largest collections, which holds 3.2 billion docs, was accidentally \
> > deleted in our QA environment. Unfortunately we don't have a recent Solr backup \
> > of this collection to restore from either. The only option left for us is to \
> > restore the deleted replica directories under the data directory using the \
> > NetBackup restore process. We haven't restored this way before, so the following \
> > things are not clear:
> > 
> > 1. As the collection was deleted (and not created again yet), if the necessary \
> > replica directories and files are restored to the same location, will the \
> > collection work without creating it again?
> > 2. If the above option doesn't work, obviously we have to create the collection, \
> > but the replica names and placement may not be the same as the deleted \
> > collection's (we create collections using rule-based replica placement), so in \
> > this case what needs to be done to restore the collection smoothly? Or are there \
> > any predefined steps available to handle this kind of scenario?
> > 
> > Any suggestions are greatly appreciated.
> > 
> > Thanks & Regards,
> > Vinodh
> > 
> > DTCC DISCLAIMER: This email and any files transmitted with it are confidential \
> > and intended solely for the use of the individual or entity to whom they are \
> > addressed. If you have received this email in error, please notify us immediately \
> > and delete the email and any attachments from your system. The recipient should \
> > check this email and any attachments for the presence of viruses. The company \
> > accepts no liability for any damage caused by any virus transmitted by this \
> > email.
> 


