
List:       ceph-users
Subject:    [ceph-users] Power outages!!! help!
From:       ronny+ceph-users@aasen.cx (Ronny Aasen)
Date:       2017-08-30 14:18:41
Message-ID: cf64e40a-8f33-8c09-1420-b5d502599b11@aasen.cx

On 30.08.2017 15:32, Steve Taylor wrote:
> I'm not familiar with dd_rescue, but I've just been reading about it. 
> I'm not seeing any features that would be beneficial in this scenario 
> that aren't also available in dd. What specific features give it 
> "really a far better chance of restoring a copy of your disk" than dd? 
> I'm always interested in learning about new recovery tools.

I see I wrote dd_rescue out of old habit, but the package one should use
on Debian is gddrescue, also called GNU ddrescue.

This page has some details on the differences between dd and the
ddrescue variants:
http://www.toad.com/gnu/sysadmin/index.html#ddrescue
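
For anyone following along, a minimal sketch of a GNU ddrescue run on a
failing disk; the device and output paths are placeholders for your setup:

    # the Debian package is gddrescue; the binary it installs is ddrescue
    apt-get install gddrescue

    # pass 1: copy everything that reads cleanly, tracking progress in a mapfile
    ddrescue -n /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.map

    # pass 2: go back and retry only the bad areas, with direct disk access
    ddrescue -d -r3 /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.map

The mapfile is what makes it resumable: you can stop and restart, and it
only revisits the regions it hasn't recovered yet.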

Kind regards
Ronny Aasen


> ------------------------------------------------------------------------
>
> *Steve Taylor* | Senior Software Engineer | *StorageCraft Technology
> Corporation* <https://storagecraft.com>
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> *Office:* 801.871.2799
>
> ------------------------------------------------------------------------
> If you are not the intended recipient of this message or received it
> erroneously, please notify the sender and delete it, together with any
> attachments, and be advised that any dissemination or copying of this
> message is prohibited.
>
> ------------------------------------------------------------------------
>
> On Tue, 2017-08-29 at 21:49 +0200, Willem Jan Withagen wrote:
>> On 29-8-2017 19:12, Steve Taylor wrote:
>>> Hong,
>>>
>>> Probably your best chance at recovering any data without special,
>>> expensive, forensic procedures is to perform a dd from /dev/sdb to
>>> somewhere else large enough to hold a full disk image and attempt to
>>> repair that. You'll want to use 'conv=noerror' with your dd command
>>> since your disk is failing. Then you could either re-attach the OSD
>>> from the new source or attempt to retrieve objects from the
>>> filestore on it.
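
For reference, a minimal sketch of that dd approach; the output path is a
placeholder, and 'noerror' is usually paired with 'sync' so failed reads
are zero-padded instead of shifting every later block in the image:

    # clone the failing disk, padding unreadable blocks with zeros
    dd if=/dev/sdb of=/mnt/backup/sdb.img bs=64K conv=noerror,sync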
>>
>>
>> Like somebody else already pointed out: in problem cases like this
>> disk, use dd_rescue. It has a far better chance of restoring a copy
>> of your disk.
>>
>> --WjW
>>
>>> I have actually done this before by creating an RBD that matches the
>>> disk size, performing the dd, running xfs_repair, and eventually
>>> adding it back to the cluster as an OSD. RBDs as OSDs is certainly a
>>> temporary arrangement for repair only, but I'm happy to report that
>>> it worked flawlessly in my case. I was able to weight the OSD to 0,
>>> offload all of its data, then remove it for a full recovery, at
>>> which point I just deleted the RBD. The possibilities afforded by
>>> Ceph inception are endless.
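
For reference, a rough sketch of that "inception" workflow; the pool,
image name, and OSD id are invented for illustration, and the exact
mount and activation steps depend on your Ceph release:

    # create and map an RBD large enough to hold the disk image
    rbd create recovery/osd2-rescue --size 2T
    rbd map recovery/osd2-rescue        # shows up as e.g. /dev/rbd0

    # image the failing disk onto it, then repair the filesystem
    ddrescue -d -r3 /dev/sdb /dev/rbd0 /root/sdb.map
    xfs_repair /dev/rbd0

    # mount it where the OSD lived, let the OSD rejoin, then drain it
    mount /dev/rbd0 /var/lib/ceph/osd/ceph-2
    ceph osd crush reweight osd.2 0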
>>>
>>> On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
>>>> Rule of thumb with batteries is:
>>>> - the more "proper temperature" you run them at, the more life you
>>>>   get out of them
>>>> - the more the battery is overpowered for your application, the
>>>>   longer it will survive.
>>>> Get yourself an LSI 94** controller and use it as an HBA and you
>>>> will be fine. But get MORE DRIVES !!!!!
>>>>> On 28 Aug 2017, at 23:10, hjcho616 <hjcho616 at yahoo.com> wrote:
>>>>>
>>>>> Thank you Tomasz and Ronny. I'll have to order some HDDs soon and
>>>>> try these out. The car battery idea is nice! I may try that.. =)
>>>>> Do they last longer? Ones that fit the UPS original battery spec
>>>>> didn't last very long... part of the reason why I gave up on
>>>>> them.. =P My wife probably won't like the idea of a car battery
>>>>> hanging out though, ha!
>>>>>
>>>>> The OSD1 (the one with mostly OK OSDs, except that SMART failure)
>>>>> motherboard doesn't have any additional SATA connectors available.
>>>>> Would it be safe to add another OSD host?
>>>>>
>>>>> Regards,
>>>>> Hong
>>>>>
>>>>> On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz
>>>>> <tom.kusmierz at gmail.com> wrote:
>>>>>
>>>>> Sorry for being brutal... anyway:
>>>>> 1. Get the battery for the UPS (a car battery will do as well;
>>>>>    I've modded a UPS in the past with a truck battery and it was
>>>>>    working like a charm :D ).
>>>>> 2. Get spare drives and put those in, because your cluster CANNOT
>>>>>    get out of error due to lack of space.
>>>>> 3. Follow the advice of Ronny Aasen on how to recover data from
>>>>>    the hard drives.
>>>>> 4. Get cooling to the drives or you will lose more!
>>>>>> On 28 Aug 2017, at 22:39, hjcho616 <hjcho616 at yahoo.com> wrote:
>>>>>>
>>>>>> Tomasz,
>>>>>>
>>>>>> Those machines are behind a surge protector. Doesn't appear to be
>>>>>> a good one! I do have a UPS... but it is my fault... no battery.
>>>>>> Power was pretty reliable for a while... and the UPS was just
>>>>>> beeping every chance it had, disrupting some sleep.. =P So it's
>>>>>> running on the surge protector only. I am running this in a home
>>>>>> environment. So far, HDD failures have been very rare for this
>>>>>> environment. =) It just doesn't get loaded as much! I am not sure
>>>>>> what to expect; seeing that "unfound" and just a feeling of the
>>>>>> possibility of maybe getting the OSD back made me excited about
>>>>>> it. =)
>>>>>>
>>>>>> Thanks for letting me know what should be the priority. I just
>>>>>> lack experience and knowledge in this. =) Please do continue to
>>>>>> guide me through this. Thank you for the decode of those SMART
>>>>>> messages! I do agree that it looks like it is on its way out. I
>>>>>> would like to know how to get a good portion of it back if
>>>>>> possible. =)
>>>>>>
>>>>>> I think I just set the size and min_size to 1.
>>>>>>
>>>>>> # ceph osd lspools
>>>>>> 0 data,1 metadata,2 rbd,
>>>>>> # ceph osd pool set rbd size 1
>>>>>> set pool 2 size to 1
>>>>>> # ceph osd pool set rbd min_size 1
>>>>>> set pool 2 min_size to 1
>>>>>>
>>>>>> Seems to be doing some backfilling work.
>>>>>>
>>>>>> # ceph health
>>>>>> HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds;
>>>>>> 2 pgs backfill_toofull; 74 pgs backfill_wait; 3 pgs backfilling;
>>>>>> 108 pgs degraded; 6 pgs down; 6 pgs inconsistent; 6 pgs peering;
>>>>>> 7 pgs recovery_wait; 16 pgs stale; 108 pgs stuck degraded; 6 pgs
>>>>>> stuck inactive; 16 pgs stuck stale; 130 pgs stuck unclean;
>>>>>> 101 pgs stuck undersized; 101 pgs undersized; 1 requests are
>>>>>> blocked > 32 sec; recovery 1790657/4502340 objects degraded
>>>>>> (39.772%); recovery 641906/4502340 objects misplaced (14.257%);
>>>>>> recovery 147/2251990 unfound (0.007%); 50 scrub errors; mds
>>>>>> cluster is degraded; no legacy OSD present but 'sortbitwise'
>>>>>> flag is not set
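
For reference, a rough sketch of chasing those unfound objects; the pg
id is a placeholder, and marking objects lost is strictly a last resort
once the failed OSDs are known to be unrecoverable:

    ceph health detail              # lists the pgs holding unfound objects
    ceph pg 2.5 query               # peering/recovery state for one pg
    ceph pg 2.5 list_unfound        # the objects that pg cannot locate
    # last resort: give up on them (revert to an older copy, or delete)
    ceph pg 2.5 mark_unfound_lost revert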
>>>>>> Regards,
>>>>>> Hong
>>>>>>
>>>>>> On Monday, August 28, 2017 4:18 PM, Tomasz Kusmierz
>>>>>> <tom.kusmierz at gmail.com> wrote:
>>>>>>
>>>>>> So to decode a few things about your disk:
>>>>>>
>>>>>>   1 Raw_Read_Error_Rate     0x002f  100  100  051  Pre-fail  Always  -  37
>>>>>> 37 read errors and only one sector marked as pending - fun disk :/
>>>>>>
>>>>>> 181 Program_Fail_Cnt_Total  0x0022  099  099  000  Old_age  Always  -  35325174
>>>>>> So the firmware has quite a few bugs, that's nice.
>>>>>>
>>>>>> 191 G-Sense_Error_Rate      0x0022  100  100  000  Old_age  Always  -  2855
>>>>>> The disk was thrown around while operational, even more nice.
>>>>>>
>>>>>> 194 Temperature_Celsius     0x0002  047  041  000  Old_age  Always  -  53 (Min/Max 15/59)
>>>>>> If your disk passes 50 you should not consider using it; high
>>>>>> temperatures demagnetise the platter layer and you will see more
>>>>>> errors in the very near future.
>>>>>>
>>>>>> 197 Current_Pending_Sector  0x0032  100  100  000  Old_age  Always  -  1
>>>>>> As mentioned before :)
>>>>>>
>>>>>> 200 Multi_Zone_Error_Rate   0x002a  100  100  000  Old_age  Always  -  4222
>>>>>> Your heads keep missing tracks... bent? I don't even know how to
>>>>>> comment here.
>>>>>>
>>>>>> Generally a fun drive you've got there... rescue as much as you
>>>>>> can and throw it away!!!
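
For reference, the attribute table decoded above is the kind of output
smartmontools prints; the device path is a placeholder:

    smartctl -A /dev/sdb         # vendor attribute table (the lines quoted above)
    smartctl -H /dev/sdb         # overall health verdict
    smartctl -t short /dev/sdb   # kick off a short self-test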
>>
>>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


