List: ceph-users
Subject: [ceph-users] Re: Missing OSD in SSD after disk failure
From: Eric Fahnle <efahnle () nubi2go ! com>
Date: 2021-08-31 18:10:02
Message-ID: RO1PR80MB018683D63005A9C0908BA244F2CC9 () RO1PR80MB0186 ! lamprd80 ! prod ! outlook ! com
Hi David, no problem, thanks for your help!
Went through your commands, here are the results
- 4 servers with OSDs
- Server "nubceph04" has 2 OSDs (osd.0 and osd.7 on /dev/sdb and /dev/sdc respectively, with the db_device on /dev/sdd)
# capture "db device" and raw device associated with OSD (just for safety)
"ceph-volume lvm list" shows, for each OSD, which disks and LVs are in use (snipped):
====== osd.0 =======
  [block] /dev/ceph-block-b301ec31-5779-4834-9fb7-e45afa45f803/osd-block-79d89e54-4a4b-4e89-aea3-72fa6aa343a5
      db device    /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      osd id       0
      devices      /dev/sdb
  [db] /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      block device /dev/ceph-block-b301ec31-5779-4834-9fb7-e45afa45f803/osd-block-79d89e54-4a4b-4e89-aea3-72fa6aa343a5
      db device    /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      osd id       0
      devices      /dev/sdd
====== osd.7 =======
  [block] /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
      block device /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
      osd id       7
      devices      /dev/sdc
  [db] /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
      block device /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
      db device    /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
      osd id       7
      devices      /dev/sdd
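Before zapping, the db-device path from that listing can be captured in a variable for later reference. A minimal sketch (the heredoc stands in for live "ceph-volume lvm list" output; on a real host you would pipe the command itself):

```shell
# Stand-in for live "ceph-volume lvm list" output (snipped to the osd.7 [db] entry).
listing=$(cat <<'EOF'
====== osd.7 =======
  [db] /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
      db device    /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
      osd id       7
      devices      /dev/sdd
EOF
)
# "db device" lines carry the LV path in the third column.
db_dev=$(echo "$listing" | awk '/db device/ {print $3}')
echo "$db_dev"
```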
# drain drive if possible, do this when planning replacement, otherwise do once failure has occurred
Try to remove osd.7:
ceph orch osd rm 7 --replace
Scheduled OSD(s) for removal
Waited until it finished rebalancing, monitoring with:
ceph -W cephadm
2021-08-30T18:05:32.280716-0300 mgr.nubvm02.viqmmr [INF] OSD <7> is not empty yet. Waiting a bit more
2021-08-30T18:06:03.374424-0300 mgr.nubvm02.viqmmr [INF] OSDs <[<OSD>(osd_id=7, is_draining=False)]> are now <down>
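Rather than watching the cephadm log by eye, the drain can also be polled; a sketch, assuming the `ceph osd safe-to-destroy` subcommand (available in recent releases) as the check:

```shell
# Poll until the given OSD reports safe to destroy, then return.
# Assumes the `ceph osd safe-to-destroy` subcommand; the interval is arbitrary.
wait_for_drain() {
  osd_id=$1
  until ceph osd safe-to-destroy "osd.$osd_id" >/dev/null 2>&1; do
    echo "osd.$osd_id not yet safe to destroy, waiting..."
    sleep 60
  done
  echo "osd.$osd_id is safe to destroy"
}
# Usage on a live cluster: wait_for_drain 7
```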
# Once drained (or if failure occurred) (we don't use the orch version
# yet because we've had issues with it)
ceph-volume lvm zap --osd-id 7 --destroy
--> Zapping: /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3 bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0853454 s, 123 MB/s
--> More than 1 LV left in VG, will proceed to destroy LV only
--> Removing LV because --destroy was given: /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
Running command: /usr/sbin/lvremove -v -f /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
 stdout: Logical volume "osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3" successfully removed
 stderr: Removing ceph--block--dbs--08ee3a44--8503--40dd--9bdd--ed9a8f674a54-osd--block--db--fd2bd125--3f22--40f1--8524--744a100236f3 (253:3)
 stderr: Archiving volume group "ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54" metadata (seqno 9).
 stderr: Releasing logical volume "osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3"
 stderr: Creating volume group backup "/etc/lvm/backup/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54" (seqno 10).
--> Zapping: /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.054587 s, 192 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume group ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3
Running command: /usr/sbin/vgremove -v -f ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3
 stderr: Removing ceph--block--c3d30e81--ff7d--4007--9ad4--c16f852466a3-osd--block--42278e28--5274--4167--a014--6a6a956110ad (253:2)
 stderr: Archiving volume group "ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3" metadata (seqno 5). Releasing logical volume "osd-block-42278e28-5274-4167-a014-6a6a956110ad"
 stderr: Creating volume group backup "/etc/lvm/backup/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3" (seqno 6).
 stdout: Logical volume "osd-block-42278e28-5274-4167-a014-6a6a956110ad" successfully removed
 stderr: Removing physical volume "/dev/sdc" from volume group "ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3"
 stdout: Volume group "ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3" successfully removed
--> Zapping successful for OSD: 7
After that, the command:
ceph-volume lvm list
shows the same as above for osd.0, but nothing about osd.7.
# refresh devices
ceph orch device ls --refresh
HOST       PATH      TYPE  SIZE   DEVICE_ID  MODEL         VENDOR  ROTATIONAL  AVAIL  REJECT REASONS
nubceph04  /dev/sda  hdd   19.0G             Virtual disk  VMware  1           False  locked
nubceph04  /dev/sdb  hdd   20.0G             Virtual disk  VMware  1           False  locked, Insufficient space (<5GB) on vgs, LVM detected
nubceph04  /dev/sdc  hdd   20.0G             Virtual disk  VMware  1           True
nubceph04  /dev/sdd  hdd   10.0G             Virtual disk  VMware  1           False  locked, LVM detected
After some time, cephadm recreates osd.7, but without the db_device.
# monitor ceph for replacement
ceph -W cephadm
..
2021-08-30T18:11:22.439190-0300 mgr.nubvm02.viqmmr [INF] Deploying daemon osd.7 on nubceph04
..
Waited until it finished rebalancing. If I run again:
ceph-volume lvm list
it shows, for each OSD, which disks and LVs are in use (snipped):
====== osd.0 =======
  [block] /dev/ceph-block-b301ec31-5779-4834-9fb7-e45afa45f803/osd-block-79d89e54-4a4b-4e89-aea3-72fa6aa343a5
      db device    /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      osd id       0
      devices      /dev/sdb
  [db] /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      block device /dev/ceph-block-b301ec31-5779-4834-9fb7-e45afa45f803/osd-block-79d89e54-4a4b-4e89-aea3-72fa6aa343a5
      db device    /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      osd id       0
      devices      /dev/sdd
====== osd.7 =======
  [block] /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
      block device /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
      osd id       7
      devices      /dev/sdc
It seems it didn't create the LV for "ceph-block-dbs" as it had before.
If I run the whole procedure again with osd.0 instead, it is recreated correctly. In that case, running:
ceph-volume lvm zap --osd-id 0 --destroy
doesn't print this line:
--> More than 1 LV left in VG, will proceed to destroy LV only
but rather this:
--> Only 1 LV left in VG, will proceed to destroy volume group
As far as I can tell, if the db disk is not empty, the orchestrator simply doesn't use it. Let me know if I wasn't clear enough.
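One possible workaround (untested here, and purely a sketch: the LV name and size are hypothetical, and under cephadm the systemd unit name differs) would be to carve a new DB LV out of the surviving ceph-block-dbs VG and attach it to the rebuilt osd.7 with `ceph-bluestore-tool bluefs-bdev-new-db`. Printed as a dry run rather than executed:

```shell
# Dry-run sketch: print the commands that would attach a fresh DB LV to osd.7.
# The DB_LV name and the 4G size are hypothetical; adjust to the real VG free space.
OSD_ID=7
DB_VG=ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54
DB_LV=osd-block-db-new
cat <<EOF
lvcreate -L 4G -n $DB_LV $DB_VG
systemctl stop ceph-osd@$OSD_ID            # unit name varies under cephadm
ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-$OSD_ID --dev-target /dev/$DB_VG/$DB_LV
systemctl start ceph-osd@$OSD_ID
EOF
```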
Best regards,
Eric
________________________________
From: David Orman <ormandj@corenode.com>
Sent: Monday, August 30, 2021 1:14 PM
To: Eric Fahnle <efahnle@nubi2go.com>
Cc: ceph-users@ceph.io <ceph-users@ceph.io>
Subject: Re: [ceph-users] Missing OSD in SSD after disk failure
I may have misread your original email, for which I apologize. If you
do a 'ceph orch device ls', does the NVMe in question show as available?
On that host with the failed OSD, if you run lvs/lsblk, do you see the
old DB on the NVMe still? I'm not sure if the replacement process you
followed will work. Here's what we do on OSD pre-failure as well as
failures on nodes with NVMe backing the OSD for DB/WAL:
In cephadm shell, on host with drive to replace (in this example,
let's say 391 on a node called ceph15):
# capture "db device" and raw device associated with OSD (just for safety)
ceph-volume lvm list | less
# drain drive if possible, do this when planning replacement,
otherwise do once failure has occurred
ceph orch osd rm 391 --replace
# Once drained (or if failure occurred) (we don't use the orch version
yet because we've had issues with it)
ceph-volume lvm zap --osd-id 391 --destroy
# refresh devices
ceph orch device ls --refresh
# monitor ceph for replacement
ceph -W cephadm
# once daemon has been deployed "2021-03-25T18:03:16.742483+0000
mgr.ceph02.duoetc [INF] Deploying daemon osd.391 on ceph15", watch for
rebalance to complete
ceph -s
# consider increasing max_backfills if it's just a single drive replacement:
ceph config set osd osd_max_backfills 10
# if you do, after backfilling is complete (validate with 'ceph -s'):
ceph config rm osd osd_max_backfills
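The backfill override above can be wrapped so the set and the later removal stay paired; a minimal sketch using the same `ceph config` commands:

```shell
# Raise osd_max_backfills for a single-drive rebuild, and restore the default
# afterwards. Both helpers just wrap the commands shown above.
raise_backfills() {
  ceph config set osd osd_max_backfills 10
}
restore_backfills() {
  ceph config rm osd osd_max_backfills
}
# Usage: raise_backfills; ...wait for backfill to finish (ceph -s)...; restore_backfills
```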
The lvm zap cleans up the db/wal LV, which allows for the replacement
drive to rebuild with db/wal on the NVME.
Hope this helps,
David
On Fri, Aug 27, 2021 at 7:21 PM Eric Fahnle <efahnle@nubi2go.com> wrote:
>
> Hi David! Very much appreciated your response.
>
> I'm not sure that may be the problem. I tried with the following (without using "rotational"):
> ...(snip)...
> data_devices:
>   size: "15G:"
> db_devices:
>   size: ":15G"
> filter_logic: AND
> placement:
>   label: "osdj2"
> service_id: test_db_device
> service_type: osd
> ...(snip)...
>
> Without success. Also tried without the "filter_logic: AND" in the yaml file and the result was the same.
> Best regards,
> Eric
>
>
> -----Original Message-----
> From: David Orman [mailto:ormandj@corenode.com]
> Sent: 27 August 2021 14:56
> To: Eric Fahnle
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Missing OSD in SSD after disk failure
>
> This was a bug in some versions of ceph, which has been fixed:
>
> https://tracker.ceph.com/issues/49014
> https://github.com/ceph/ceph/pull/39083
>
> You'll want to upgrade Ceph to resolve this behavior, or you can use size or something else to filter if that is not possible.
> David
>
> On Thu, Aug 19, 2021 at 9:12 AM Eric Fahnle <efahnle@nubi2go.com> wrote:
> >
> > Hi everyone!
> > I've got a doubt; I tried searching for it in this list, but didn't find an answer.
> > I've got 4 OSD servers. Each server has 4 HDDs and 1 NVMe SSD disk. The deployment was done with "ceph orch apply deploy-osd.yaml", in which the file "deploy-osd.yaml" contained the following:
> > ---
> > service_type: osd
> > service_id: default_drive_group
> > placement:
> >   label: "osd"
> > data_devices:
> >   rotational: 1
> > db_devices:
> >   rotational: 0
> >
> > After the deployment, each HDD had an OSD and the NVMe shared the 4 OSDs, plus the DB.
> > A few days ago, an HDD broke and got replaced. Ceph detected the new disk and created a new OSD for the HDD but didn't use the NVMe. Now the NVMe in that server has 3 OSDs running but didn't add the new one. I couldn't find out how to re-create the OSD with the exact configuration it had before. The only "way" I found was to delete all 4 OSDs and create everything from scratch (I didn't actually do it, as I hope there is a better way).
> > Has anyone had this issue before? I'd be glad if someone pointed me in the right direction.
> > Currently running:
> > Version
> > 15.2.8
> > octopus (stable)
> >
> > Thank you in advance and best regards, Eric
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-leave@ceph.io
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io