List:       ceph-users
Subject:    [ceph-users] Re: OSD crash with assertion
From:       Eugen Block <eblock@nde.ag>
Date:       2020-06-23 6:21:25
Message-ID: 20200623062125.Horde.g5br61wQPKhbmlBJHWbLsDb@webmail.nde.ag

Hi,

although changing an existing EC profile (by force) is possible (I
haven't tried it in Octopus yet), it won't have any effect on existing
pools [1]:

> Choosing the right profile is important because it cannot be  
> modified after the pool is created: a new pool with a different  
> profile needs to be created and all objects from the previous pool  
> moved to the new.

You can either change the crush_rule for that pool to get a different  
distribution (but it won't change k and m) or follow Sylvain's  
description to copy the pool content to a new pool with the desired EC  
profile.
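
For the crush_rule route, a minimal sketch could look like the
following (rule, profile and pool names are placeholders, untested
here):

# create a crush rule from a compatible EC profile, e.g. one with a
# different failure domain or device class (but the same k and m)
ceph osd crush rule create-erasure ecrule-new ec-profile-new

# point the existing pool at the new rule; data gets remapped in
# place, k and m stay as they are
ceph osd pool set cephfs_data crush_rule ecrule-new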

Regards,
Eugen

[1]  
https://docs.ceph.com/docs/master/rados/operations/erasure-code/#erasure-code-profiles


Quoting Michael Fladischer <michael@fladi.at>:

> Hi Sylvain,
>
> Yeah, that's the best and safest way to do it. The pool I wrecked was
> fortunately a dummy pool.
>
> The pool for which I want to change the EC profile is ~4PiB large, so
> moving all files (the pool is used in CephFS) to a new pool might
> take some time, and I was hoping for an in-place configuration
> change. But as demonstrated by my own recklessness, this does not
> work and will take most of the OSDs down with it.
>
> Regards,
> Michael
>
> On 22.06.2020 at 21:39, St-Germain, Sylvain (SSC/SPC) wrote:
>> The way I did it was to create a new pool, copy the data onto it and
>> put the new pool in place of the old one after deleting the former pool:
>>
>> echo "--------------------------------------------------------------------"
>> echo " Create a new pool with erasure coding"
>> echo "--------------------------------------------------------------------"
>> sudo ceph osd pool create $pool.new 64 64 erasure ecprofile-5-3
>>
>> echo "--------------------------------------------------------------------"
>> echo " Copy the original pool to the new pool"
>> echo "--------------------------------------------------------------------"
>> sudo rados cppool $pool $pool.new
>>
>> echo "--------------------------------------------------------------------"
>> echo " Rename the original pool to .old"
>> echo "--------------------------------------------------------------------"
>> sudo ceph osd pool rename $pool $pool.old
>>
>> echo "--------------------------------------------------------------------"
>> echo " Rename the new erasure coding pool to $pool"
>> echo "--------------------------------------------------------------------"
>> sudo ceph osd pool rename $pool.new $pool
>>
>> echo "--------------------------------------------------------------------"
>> echo " Set the pool: $pool  to autoscaling"
>> echo "--------------------------------------------------------------------"
>> sudo ceph osd pool set $pool pg_autoscale_mode on
>>
>> echo "--------------------------------------------------------------------"
>> echo " Show detail off the new create pool"
>> echo "--------------------------------------------------------------------"
>> sudo ceph osd pool get $pool all
>>
>> Sylvain
>>
>> -----Original Message-----
>> From: Michael Fladischer <michael@fladi.at>
>> Sent: 22 June 2020 15:23
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] Re: OSD crash with assertion
>>
>> It turns out I really messed up when changing the EC profile.
>> Removing the pool did not get rid of its PGs on the OSDs that had
>> crashed.
>>
>> To get my OSDs back up I used ceph-objectstore-tool like this:
>>
>> for PG in $(ceph-objectstore-tool --data-path $DIR --type=bluestore \
>>         --op=list-pgs | grep "^${POOL_ID}\."); do
>>     ceph-objectstore-tool --data-path $DIR --type=bluestore \
>>         --op=remove --force --pgid=$PG
>> done
>>
>> $DIR is the data path of the crashed OSD.
>> $POOL_ID is the ID of the pool with the messed up EC profile.
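>>
>> In case the pool ID isn't known anymore (the pool was already
>> deleted here), the prefix of the PG IDs in the list-pgs output gives
>> it away; something like this (just an illustration) prints the IDs
>> of all pools that still have PGs on the OSD:
>>
>> ceph-objectstore-tool --data-path $DIR --type=bluestore \
>>         --op=list-pgs | cut -d. -f1 | sort -u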
>>
>> I'm now curious: is there an easier way to do this?
>>
>> After getting rid of all these PGs, the OSDs were able to start
>> again. I hope this helps someone.
>>
>> Regards,
>> Michael
>>
>>
>> On 22.06.2020 at 19:46, Michael Fladischer wrote:
>>> Hi,
>>>
>>> a lot of our OSDs crashed a few hours ago because of a failed
>>> assertion:
>>>
>>> /build/ceph-15.2.3/src/osd/ECUtil.h: 34: FAILED
>>> ceph_assert(stripe_width % stripe_size == 0)
>>>
>>> Full output here:
>>> https://pastebin.com/D1SXzKsK
>>>
>>> All OSDs are on bluestore and run 15.2.3.
>>>
>>> I think I messed up when I tried to change an existing EC profile
>>> (using --force) for an active EC pool.
>>>
>>> I already tried deleting the pool and the EC profile and starting
>>> the OSDs, but they keep crashing with the same assertion.
>>>
>>> Is there a way to at least find out what the values are for
>>> stripe_width and stripe_size?
>>>
>>> Regards,
>>> Michael


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io
