List: ceph-users
Subject: [ceph-users] Re: [Quincy] Module 'devicehealth' has failed: disk I/O error
From: Patrick Donnelly <pdonnell@redhat.com>
Date: 2023-02-22 17:03:46
Message-ID: CA+2bHPa3s4BaVTzC9qG=xnf3fo=F-JGo8dhAAy+6q1BXVtMtzQ@mail.gmail.com
Hello Satish,
On Thu, Feb 9, 2023 at 11:52 AM Satish Patel <satish.txt@gmail.com> wrote:
>
> Folks,
>
> Any idea what is going on? I am running a 3-node Quincy cluster for
> OpenStack, and today I suddenly noticed the following error. I found a
> reference link but am not sure whether it matches my issue:
> https://tracker.ceph.com/issues/51974
>
> root@ceph1:~# ceph -s
>   cluster:
>     id:     cd748128-a3ea-11ed-9e46-c309158fad32
>     health: HEALTH_ERR
>             1 mgr modules have recently crashed
>
>   services:
>     mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 2d)
>     mgr: ceph1.ckfkeb(active, since 6h), standbys: ceph2.aaptny
>     osd: 9 osds: 9 up (since 2d), 9 in (since 2d)
>
>   data:
>     pools:   4 pools, 128 pgs
>     objects: 1.18k objects, 4.7 GiB
>     usage:   17 GiB used, 16 TiB / 16 TiB avail
>     pgs:     128 active+clean
>
> root@ceph1:~# ceph health
> HEALTH_ERR Module 'devicehealth' has failed: disk I/O error; 1 mgr
> modules have recently crashed
> root@ceph1:~# ceph crash ls
> ID                                                                ENTITY            NEW
> 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035  mgr.ceph1.ckfkeb   *
> root@ceph1:~# ceph crash info 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035
> {
>     "backtrace": [
>         "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 373, in serve\n    self.scrape_all()",
>         "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 425, in scrape_all\n    self.put_device_metrics(device, data)",
>         "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 500, in put_device_metrics\n    self._create_device(devid)",
>         "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 487, in _create_device\n    cursor = self.db.execute(SQL, (devid,))",
>         "sqlite3.OperationalError: disk I/O error"
>     ],
>     "ceph_version": "17.2.5",
>     "crash_id": "2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035",
>     "entity_name": "mgr.ceph1.ckfkeb",
>     "mgr_module": "devicehealth",
>     "mgr_module_caller": "PyModuleRunner::serve",
>     "mgr_python_exception": "OperationalError",
>     "os_id": "centos",
>     "os_name": "CentOS Stream",
>     "os_version": "8",
>     "os_version_id": "8",
>     "process_name": "ceph-mgr",
>     "stack_sig": "7e506cc2729d5a18403f0373447bb825b42aafa2405fb0e5cfffc2896b093ed8",
>     "timestamp": "2023-02-07T00:07:12.739187Z",
>     "utsname_hostname": "ceph1",
>     "utsname_machine": "x86_64",
>     "utsname_release": "5.15.0-58-generic",
>     "utsname_sysname": "Linux",
>     "utsname_version": "#64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023"
> }
It is probably this known issue: https://tracker.ceph.com/issues/55606

It is annoying but not serious. The mgr simply lost its lock on the
sqlite database backing the devicehealth module. You can work around it
by restarting the mgr:

ceph mgr fail
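
Once a standby mgr takes over, the devicehealth failure should clear
from ceph -s on its own. The separate "1 mgr modules have recently
crashed" warning lingers until the crash report is acknowledged (it
also ages out after two weeks by default); after reviewing it, you can
archive it using the ID from your ceph crash ls output:

ceph crash archive 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035
ceph crash archive-all   # or acknowledge all outstanding crashes at once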
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io