List:       ceph-users
Subject:    [ceph-users] Re: Missing OSD in SSD after disk failure
From:       Eric Fahnle <efahnle@nubi2go.com>
Date:       2021-08-31 18:10:02
Message-ID: RO1PR80MB018683D63005A9C0908BA244F2CC9@RO1PR80MB0186.lamprd80.prod.outlook.com

Hi David, no problem, thanks for your help!

I went through your commands; here are the results:

- 4 servers with OSDs
- Server "nubceph04" has 2 OSDs (osd.0 on /dev/sdb and osd.7 on /dev/sdc, with the db_device on /dev/sdd)
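
Just for safety, the same layout can be cross-checked with lsblk/lvs on the host before touching anything (a minimal sketch, run on nubceph04; the column selection is only a suggestion):

lsblk -o NAME,SIZE,TYPE /dev/sdb /dev/sdc /dev/sdd
lvs -o lv_name,vg_name,lv_size,devices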

# capture "db device" and raw device associated with OSD (just for safety)
"ceph-volume lvm list" shows, for each OSD, which disks and LVs are in use (snipped):
====== osd.0 =======

  [block]       /dev/ceph-block-b301ec31-5779-4834-9fb7-e45afa45f803/osd-block-79d89e54-4a4b-4e89-aea3-72fa6aa343a5

      db device                 /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      osd id                    0
      devices                   /dev/sdb

  [db]          /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224

      block device              /dev/ceph-block-b301ec31-5779-4834-9fb7-e45afa45f803/osd-block-79d89e54-4a4b-4e89-aea3-72fa6aa343a5
      db device                 /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      osd id                    0
      devices                   /dev/sdd

====== osd.7 =======

  [block]       /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad

      block device              /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
      osd id                    7
      devices                   /dev/sdc

  [db]          /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3

      block device              /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
      db device                 /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
      osd id                    7
      devices                   /dev/sdd


# drain drive if possible, do this when planning replacement, otherwise do once failure has occurred
# Try to remove osd.7

ceph orch osd rm 7 --replace
Scheduled OSD(s) for removal
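
While waiting, the removal queue can also be polled directly (a small extra check alongside "ceph -W cephadm"):

ceph orch osd rm status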

Waited until it finished rebalancing, monitoring with:
ceph -W cephadm
2021-08-30T18:05:32.280716-0300 mgr.nubvm02.viqmmr [INF] OSD <7> is not empty yet. Waiting a bit more
2021-08-30T18:06:03.374424-0300 mgr.nubvm02.viqmmr [INF] OSDs <[<OSD>(osd_id=7, is_draining=False)]> are now <down>

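Before zapping, it's worth confirming osd.7 really is out of service; since --replace was used, it should remain in the CRUSH tree, down and marked as destroyed (a quick check):

ceph osd tree | grep osd.7
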
# Once drained (or if failure has occurred) (we don't use the orch version
# yet because we've had issues with it)
ceph-volume lvm zap --osd-id 7 --destroy
--> Zapping: /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3 bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0853454 s, 123 MB/s
--> More than 1 LV left in VG, will proceed to destroy LV only
--> Removing LV because --destroy was given: /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
Running command: /usr/sbin/lvremove -v -f /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
 stdout: Logical volume "osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3" successfully removed
 stderr: Removing ceph--block--dbs--08ee3a44--8503--40dd--9bdd--ed9a8f674a54-osd--block--db--fd2bd125--3f22--40f1--8524--744a100236f3 (253:3)
 stderr: Archiving volume group "ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54" metadata (seqno 9).
 stderr: Releasing logical volume "osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3"
 stderr: Creating volume group backup "/etc/lvm/backup/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54" (seqno 10).
--> Zapping: /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.054587 s, 192 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume group ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3
Running command: /usr/sbin/vgremove -v -f ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3
 stderr: Removing ceph--block--c3d30e81--ff7d--4007--9ad4--c16f852466a3-osd--block--42278e28--5274--4167--a014--6a6a956110ad (253:2)
 stderr: Archiving volume group "ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3" metadata (seqno 5).
 stderr: Releasing logical volume "osd-block-42278e28-5274-4167-a014-6a6a956110ad"
 stderr: Creating volume group backup "/etc/lvm/backup/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3" (seqno 6).
 stdout: Logical volume "osd-block-42278e28-5274-4167-a014-6a6a956110ad" successfully removed
 stderr: Removing physical volume "/dev/sdc" from volume group "ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3"
 stdout: Volume group "ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3" successfully removed
--> Zapping successful for OSD: 7

After that, the command:

ceph-volume lvm list

shows the same output as above for osd.0, but nothing about osd.7.
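
On the host, lvs should confirm that only osd.7's DB LV is gone from the shared VG on /dev/sdd, while osd.0's DB LV is still there (a quick check; VG name as listed above):

lvs ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54
pvs /dev/sdd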

# refresh devices
ceph orch device ls --refresh

HOST       PATH      TYPE  SIZE   DEVICE_ID  MODEL         VENDOR  ROTATIONAL  AVAIL  REJECT REASONS
nubceph04  /dev/sda  hdd   19.0G             Virtual disk  VMware  1           False  locked
nubceph04  /dev/sdb  hdd   20.0G             Virtual disk  VMware  1           False  locked, Insufficient space (<5GB) on vgs, LVM detected
nubceph04  /dev/sdc  hdd   20.0G             Virtual disk  VMware  1           True
nubceph04  /dev/sdd  hdd   10.0G             Virtual disk  VMware  1           False  locked, LVM detected

After some time, cephadm recreates osd.7, but without the db_device.

# monitor ceph for replacement
ceph -W cephadm
..
2021-08-30T18:11:22.439190-0300 mgr.nubvm02.viqmmr [INF] Deploying daemon osd.7 on nubceph04
..

I waited until it finished rebalancing. If I run again:

ceph-volume lvm list

it shows, for each OSD, which disks and LVs are in use (snipped):
====== osd.0 =======

  [block]       /dev/ceph-block-b301ec31-5779-4834-9fb7-e45afa45f803/osd-block-79d89e54-4a4b-4e89-aea3-72fa6aa343a5

      db device                 /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      osd id                    0
      devices                   /dev/sdb

  [db]          /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224

      block device              /dev/ceph-block-b301ec31-5779-4834-9fb7-e45afa45f803/osd-block-79d89e54-4a4b-4e89-aea3-72fa6aa343a5
      db device                 /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      osd id                    0
      devices                   /dev/sdd

====== osd.7 =======

  [block]       /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad

      block device              /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
      osd id                    7
      devices                   /dev/sdc

It seems it didn't create the LV for the DB in "ceph-block-dbs" as it had before.
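
The same can be confirmed from the cluster side; the OSD's metadata should show whether it has a dedicated DB (a minimal check; the exact key names may vary between releases):

ceph osd metadata 7 | grep -i bluefs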

If I run everything again but with osd.0, it gets re-created correctly (with its DB), because when running:

ceph-volume lvm zap --osd-id 0 --destroy

it doesn't print this line:

--> More than 1 LV left in VG, will proceed to destroy LV only

but rather this one:

--> Only 1 LV left in VG, will proceed to destroy volume group

As far as I can tell, if the DB disk is not empty (LVM is still detected on it), cephadm just doesn't use it for the new OSD. Let me know if I wasn't clear enough.
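
For anyone reproducing this, the service spec can also be previewed before zapping, to see up front whether cephadm would pair the new data device with the existing DB VG (a sketch; deploy-osd.yaml is the spec quoted below, and --dry-run support depends on the Octopus minor version):

ceph orch apply -i deploy-osd.yaml --dry-run
ceph orch device ls nubceph04 --refresh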

Best regards,
Eric
________________________________
From: David Orman <ormandj@corenode.com>
Sent: Monday, August 30, 2021 1:14 PM
To: Eric Fahnle <efahnle@nubi2go.com>
Cc: ceph-users@ceph.io <ceph-users@ceph.io>
Subject: Re: [ceph-users] Missing OSD in SSD after disk failure

I may have misread your original email, for which I apologize. If you
do a 'ceph orch device ls' does the NVME in question show available?
On the host with the failed OSD, if you run lvs/lsblk, do you still
see the old DB on the NVME? I'm not sure if the replacement process
you followed will work. Here's what we do for OSD pre-failure as well
as failures on nodes with NVME backing the OSD for DB/WAL:

In a cephadm shell, on the host with the drive to replace (in this
example, let's say osd.391 on a node called ceph15):

# capture "db device" and raw device associated with OSD (just for safety)
ceph-volume lvm list | less

# drain drive if possible, do this when planning replacement,
# otherwise do once failure has occurred
ceph orch osd rm 391 --replace

# Once drained (or if failure has occurred) (we don't use the orch version
# yet because we've had issues with it)
ceph-volume lvm zap --osd-id 391 --destroy

# refresh devices
ceph orch device ls --refresh

# monitor ceph for replacement
ceph -W cephadm

# once the daemon has been deployed ("2021-03-25T18:03:16.742483+0000
# mgr.ceph02.duoetc [INF] Deploying daemon osd.391 on ceph15"), watch for
# rebalance to complete
ceph -s

# consider increasing max_backfills if it's just a single drive replacement:
ceph config set osd osd_max_backfills 10

# if you do, after backfilling is complete (validate with 'ceph -s'):
ceph config rm osd osd_max_backfills

The lvm zap cleans up the db/wal LV, which allows for the replacement
drive to rebuild with db/wal on the NVME.

Hope this helps,
David

On Fri, Aug 27, 2021 at 7:21 PM Eric Fahnle <efahnle@nubi2go.com> wrote:
> 
> Hi David! Very much appreciated your response.
> 
> I'm not sure that's the problem. I tried with the following (without using "rotational"):
> ...(snip)...
> data_devices:
>   size: "15G:"
> db_devices:
>   size: ":15G"
> filter_logic: AND
> placement:
>   label: "osdj2"
> service_id: test_db_device
> service_type: osd
> ...(snip)...
> 
> Without success. I also tried without "filter_logic: AND" in the yaml file, and the result was the same.
> Best regards,
> Eric
> 
> 
> -----Original Message-----
> From: David Orman [mailto:ormandj@corenode.com]
> Sent: 27 August 2021 14:56
> To: Eric Fahnle
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Missing OSD in SSD after disk failure
> 
> This was a bug in some versions of ceph, which has been fixed:
> 
> https://tracker.ceph.com/issues/49014
> https://github.com/ceph/ceph/pull/39083
> 
> You'll want to upgrade Ceph to resolve this behavior, or you can use size or something else to filter if that is not possible.
> David
> 
> On Thu, Aug 19, 2021 at 9:12 AM Eric Fahnle <efahnle@nubi2go.com> wrote:
> > 
> > Hi everyone!
> > I have a question; I tried searching for it in this list but didn't find an answer.
> > I've got 4 OSD servers. Each server has 4 HDDs and 1 NVMe SSD. The deployment was
> > done with "ceph orch apply deploy-osd.yaml", in which the file "deploy-osd.yaml"
> > contained the following:
> > ---
> > service_type: osd
> > service_id: default_drive_group
> > placement:
> >   label: "osd"
> > data_devices:
> >   rotational: 1
> > db_devices:
> >   rotational: 0
> > 
> > After the deployment, each HDD had an OSD, and the NVMe was shared by the 4 OSDs
> > for their DBs.
> > A few days ago, an HDD broke and got replaced. Ceph detected the new disk and
> > created a new OSD on the HDD, but didn't use the NVMe. The NVMe in that server
> > now backs 3 OSDs, but the new one wasn't added to it. I couldn't find out how to
> > re-create the OSD with the exact configuration it had before. The only "way" I
> > found was to delete all 4 OSDs and create everything from scratch (I didn't
> > actually do it, as I hope there is a better way).
> > Has anyone had this issue before? I'd be glad if someone pointed me in the right
> > direction.
> > Currently running:
> > Version
> > 15.2.8
> > octopus (stable)
> > 
> > Thank you in advance and best regards, Eric
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io

