Bug #65494

open

ceph-mgr critical error: "Module 'devicehealth' has failed: table Device already exists"

Added by Nir Soffer about 1 month ago. Updated 6 days ago.

Status: Pending Backport
Priority: Normal
Assignee: Patrick Donnelly
Target version: Ceph - v20.0.0
% Done:
Source: Community (dev)
Tags: backport_processed
Backport: squid,reef,quincy
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 56997
Crash signature (v1):
Crash signature (v2):

Description

We see an intermittent error (about 1 in 200 deploys). After successfully creating a
Rook cephcluster and cephblockpool, configuring rbd mirroring, and adding a
cephrbdmirror, the cephrbdmirror never becomes ready (we waited a few hours).

ceph status shows:

  cluster:
    id:     dbf6c8b8-dd8b-4117-933e-93778b1a7274
    health: HEALTH_ERR
            Module 'devicehealth' has failed: table Device already exists
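
To detect this state automatically in a test environment, the failed-module line in the status output above can be matched with a small parser. This is a minimal sketch: the `failed_modules` helper and the `STATUS_TEXT` sample are hypothetical names, and it assumes the plain-text `ceph status` layout shown above rather than any structured output format.

```python
import re

# Sample text copied from the status output in this report.
STATUS_TEXT = """\
  cluster:
    id:     dbf6c8b8-dd8b-4117-933e-93778b1a7274
    health: HEALTH_ERR
            Module 'devicehealth' has failed: table Device already exists
"""

# Matches lines of the form: Module '<name>' has failed: <reason>
FAILED_MODULE = re.compile(r"Module '([^']+)' has failed: (.+)")


def failed_modules(status_text):
    """Return (module, reason) pairs found in plain-text status output."""
    return [
        (m.group(1), m.group(2).strip())
        for m in FAILED_MODULE.finditer(status_text)
    ]


if __name__ == "__main__":
    for module, reason in failed_modules(STATUS_TEXT):
        print(f"{module}: {reason}")
```

A deploy script could call such a helper after setup and recreate the environment when it returns a non-empty list.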

In rook-ceph-mgr-a pod logs we see:

debug 2024-04-09T13:05:48.947+0000 7f0632607700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 524, in check
    return func(self, *args, **kwargs)
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 350, in _do_serve
    if self.db_ready() and self.enable_monitoring:
  File "/usr/share/ceph/mgr/mgr_module.py", line 1271, in db_ready
    return self.db is not None
  File "/usr/share/ceph/mgr/mgr_module.py", line 1283, in db
    self._db = self.open_db()
  File "/usr/share/ceph/mgr/mgr_module.py", line 1264, in open_db
    self.configure_db(db)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1241, in configure_db
    self.load_schema(db)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1230, in load_schema
    self.maybe_upgrade(db, int(row['value']))
  File "/usr/share/ceph/mgr/mgr_module.py", line 1207, in maybe_upgrade
    db.executescript(self.SCHEMA)
sqlite3.OperationalError: table Device already exists

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 394, in serve
    self._do_serve()
  File "/usr/share/ceph/mgr/mgr_module.py", line 532, in check
    self.open_db();
  File "/usr/share/ceph/mgr/mgr_module.py", line 1264, in open_db
    self.configure_db(db)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1241, in configure_db
    self.load_schema(db)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1230, in load_schema
    self.maybe_upgrade(db, int(row['value']))
  File "/usr/share/ceph/mgr/mgr_module.py", line 1207, in maybe_upgrade
    db.executescript(self.SCHEMA)
sqlite3.OperationalError: table Device already exists
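
The final frame of both tracebacks is `db.executescript(self.SCHEMA)` failing because the table is already present. The same error message can be reproduced with the standard `sqlite3` module by applying a `CREATE TABLE` script twice to one database. This is only a sketch: the `SCHEMA` string and the `apply_schema` helper below are hypothetical stand-ins, not the devicehealth schema or the `mgr_module.py` code, and the sketch does not explain why ceph-mgr reaches this state.

```python
import sqlite3

# Hypothetical one-table schema; the real devicehealth SCHEMA is not
# shown in this report.
SCHEMA = """
CREATE TABLE Device (
  devid TEXT PRIMARY KEY
);
"""


def apply_schema(db, script):
    """Run a schema script, as the last traceback frame does."""
    db.executescript(script)


db = sqlite3.connect(":memory:")
apply_schema(db, SCHEMA)  # first run creates the table

try:
    apply_schema(db, SCHEMA)  # second run hits the existing table
    second_run_error = None
except sqlite3.OperationalError as exc:
    second_run_error = str(exc)

print(second_run_error)
```

Because the table persists in the database, running the script again fails the same way every time, which is consistent with the observation that restarting the ceph-mgr pod does not help.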

Restarting the ceph-mgr pod does not help; rbd mirroring stays broken and
we have no workaround.

For testing ramen this is not that bad, since we can delete the environment and
recreate it in 10 minutes, but for a real deployment this looks bad.


Updated by Nir Soffer about 1 month ago

Updated by Yaarit Hatuka about 1 month ago

Updated by Nir Soffer about 1 month ago

Updated by Patrick Donnelly about 1 month ago

Updated by Patrick Donnelly about 1 month ago

Updated by Ilya Dryomov about 1 month ago

Updated by Nir Soffer about 1 month ago

Updated by Patrick Donnelly about 1 month ago

Updated by Patrick Donnelly 18 days ago

Updated by Backport Bot 18 days ago

Updated by Backport Bot 18 days ago

Updated by Backport Bot 18 days ago

Updated by Patrick Donnelly 18 days ago

Updated by Backport Bot 18 days ago

Updated by Backport Bot 18 days ago

Updated by Nir Soffer 6 days ago