
List:       ceph-users
Subject:    [ceph-users] Re: fault tolerant about erasure code pool
From:       Zhenshi Zhou <deaderzzs@gmail.com>
Date:       2020-06-26 10:04:48
Message-ID: CAJTsu_or9fJ6vLramKHPxkCkBvZpsHBcWbdQ_8_Dw0ASBJG7+g@mail.gmail.com

Hi Janne,

I use the default profile (k=2, m=1) and set failure-domain=host. Is that the
best practice for my setup?
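
To double-check the k/m values, the profile in use can be inspected with the
standard erasure-code-profile command (output varies by Ceph release):

    # show the parameters of the EC profile named "default" (k, m, plugin, failure domain)
    ceph osd erasure-code-profile get default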

Janne Johansson <icepic.dz@gmail.com> wrote on Fri, 26 Jun 2020 at 16:59:

> On Fri, 26 Jun 2020 at 10:32, Zhenshi Zhou <deaderzzs@gmail.com> wrote:
>
>> Hi all,
>>
>> I'm going to deploy a cluster with an erasure-coded pool for cold storage.
>> There are 3 servers available, with 12 OSDs on each server.
>> If I set the EC profile with k=4 and m=2, does that mean the data is still
>> safe when 1/3 of the cluster's OSDs are down, or only when 2 OSDs are down?
>>
>
> By default, CRUSH will want to place each chunk (6 in your case, for EC 4+2)
> on a host of its own, to maximize data safety. Since that is not possible
> with 3 hosts, you must make sure that no more than 2 chunks ever end up on a
> single host. You can't simply switch from failure-domain=host to
> failure-domain=osd, since that could place all 6 chunks on different OSDs of
> the same host, which would be bad.
>
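> As an illustration (the profile and pool names "ec42" and "ecpool" are just
> placeholders), a 4+2 profile with failure-domain=host would be set up like
> this, and with only 3 hosts CRUSH can then map just 3 of the 6 requested
> chunks, so the PGs of such a pool cannot go active:
>
>     # placeholder names; requests 6 chunks with one host per chunk
>     ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
>     ceph osd pool create ecpool 128 128 erasure ec42
>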
> You need to make the CRUSH rule pick exactly two different OSDs per host,
> but no more. One way is to build a tree in which each host has half of its
> OSDs in one branch and the other half in another (call them "subhosts" in
> this example). That gives you 3*2 = 6 subhosts; if you make CRUSH pick its
> placement from the subhosts, it will always put two chunks per physical
> host, never two on the same OSD, and it will allow one host to be down for
> a while.
>
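> An alternative that expresses the same constraint directly in the CRUSH
> rule, without rebuilding the tree, is to pick the 3 hosts first and then
> 2 OSDs inside each of them. An untested sketch (rule name and id are
> placeholders):
>
>     rule ec42_two_per_host {
>         id 99
>         type erasure
>         step set_chooseleaf_tries 5
>         step set_choose_tries 100
>         # start at the default root, pick 3 hosts, then 2 OSDs in each host
>         step take default
>         step choose indep 3 type host
>         step choose indep 2 type osd
>         step emit
>     }
>
> The pool is then created with this rule as its crush rule (it can be passed
> as the last argument to "ceph osd pool create ... erasure <profile> <rule>").
> Either way the outcome is the same: two chunks per host, so one host can be
> down and k=4 chunks remain readable.
>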
> I would like to add that data is not very secure when you have no
> redundancy left at all. Machines will crash, and they will need maintenance,
> patches, BIOS updates and the like. Having NO redundancy during planned or
> unplanned downtime puts the data at huge risk: _any_ surprise in that
> situation would immediately lead to data loss.
>
> Also, if one box dies, the cluster can't run properly and can't recover
> until you have a new host back in, so you are already running at the edge
> of data safety in your normal case. Even if this will "work", Ceph, being a
> cluster, really should have N+1 hosts or more if your data split
> (replication factor or EC k+m) is equal to N.
>
> --
> May the most significant bit of your life be positive.
>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io
