List: ceph-users
Subject: [ceph-users] Re: [Quincy] Module 'devicehealth' has failed: disk I/O error
From: Patrick Donnelly <pdonnell@redhat.com>
Date: 2023-02-22 17:03:46
Message-ID: CA+2bHPa3s4BaVTzC9qG=xnf3fo=F-JGo8dhAAy+6q1BXVtMtzQ@mail.gmail.com
Hello Satish,
On Thu, Feb 9, 2023 at 11:52 AM Satish Patel <satish.txt@gmail.com> wrote:
>
> Folks,
>
> Any idea what is going on? I am running a 3-node Quincy cluster for
> OpenStack, and today I suddenly noticed the following error. I found a
> reference link but am not sure whether it matches my issue:
> https://tracker.ceph.com/issues/51974
>
> root@ceph1:~# ceph -s
>   cluster:
>     id:     cd748128-a3ea-11ed-9e46-c309158fad32
>     health: HEALTH_ERR
>             1 mgr modules have recently crashed
>
>   services:
>     mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 2d)
>     mgr: ceph1.ckfkeb(active, since 6h), standbys: ceph2.aaptny
>     osd: 9 osds: 9 up (since 2d), 9 in (since 2d)
>
>   data:
>     pools:   4 pools, 128 pgs
>     objects: 1.18k objects, 4.7 GiB
>     usage:   17 GiB used, 16 TiB / 16 TiB avail
>     pgs:     128 active+clean
>
> root@ceph1:~# ceph health
> HEALTH_ERR Module 'devicehealth' has failed: disk I/O error; 1 mgr
> modules have recently crashed
> root@ceph1:~# ceph crash ls
> ID                                                                ENTITY            NEW
> 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035  mgr.ceph1.ckfkeb   *
> root@ceph1:~# ceph crash info 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035
> {
>     "backtrace": [
>         "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 373, in serve\n    self.scrape_all()",
>         "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 425, in scrape_all\n    self.put_device_metrics(device, data)",
>         "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 500, in put_device_metrics\n    self._create_device(devid)",
>         "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 487, in _create_device\n    cursor = self.db.execute(SQL, (devid,))",
>         "sqlite3.OperationalError: disk I/O error"
>     ],
>     "ceph_version": "17.2.5",
>     "crash_id": "2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035",
>     "entity_name": "mgr.ceph1.ckfkeb",
>     "mgr_module": "devicehealth",
>     "mgr_module_caller": "PyModuleRunner::serve",
>     "mgr_python_exception": "OperationalError",
>     "os_id": "centos",
>     "os_name": "CentOS Stream",
>     "os_version": "8",
>     "os_version_id": "8",
>     "process_name": "ceph-mgr",
>     "stack_sig": "7e506cc2729d5a18403f0373447bb825b42aafa2405fb0e5cfffc2896b093ed8",
>     "timestamp": "2023-02-07T00:07:12.739187Z",
>     "utsname_hostname": "ceph1",
>     "utsname_machine": "x86_64",
>     "utsname_release": "5.15.0-58-generic",
>     "utsname_sysname": "Linux",
>     "utsname_version": "#64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023"
> }
It is probably this known issue: https://tracker.ceph.com/issues/55606

It is annoying but not serious. The mgr simply lost its lock on the
sqlite database backing the devicehealth module. You can work around it
by restarting the mgr:

ceph mgr fail
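
Once a standby mgr takes over, the devicehealth failure should clear
from ceph -s on its own. The separate "1 mgr modules have recently
crashed" warning lingers until the crash report is acknowledged (it
also ages out after two weeks by default); after reviewing it, you can
archive it using the ID from your ceph crash ls output:

ceph crash archive 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035
ceph crash archive-all   # or acknowledge all outstanding crashes at once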
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io