Bug #64988
closed
qa: fs:workloads mgr client evicted indicated by "cluster [WRN] evicting unresponsive client smithi042:x (15288), after 303.306 seconds"
Added by Patrick Donnelly 2 months ago.
Updated about 1 month ago.
- Related to Bug #64985: qa: mgr logs do not include client debugging added
- Status changed from New to In Progress
- Assignee set to Patrick Donnelly
Okay, so as expected this is a non-issue:
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 -- 172.21.15.42:0/4057698876 <== mon.0 v2:172.21.15.42:3300/0 2621 ==== mgrmap(e 19) ==== 137871+0+0 (secure 0 0 0) 0x55bdef6bef00 con 0x55bdec7ec400
2024-03-20T18:59:44.324+0000 7ff1adba6700 10 mgr ms_dispatch2 active mgrmap(e 19)
2024-03-20T18:59:44.324+0000 7ff1adba6700 4 mgr handle_mgr_map received map epoch 19
2024-03-20T18:59:44.324+0000 7ff1adba6700 4 mgr handle_mgr_map active in map: 1 active is 14150
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr handle_mgr_map respawning because set of enabled modules changed!
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn e: '/usr/bin/ceph-mgr'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 0: '/usr/bin/ceph-mgr'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 1: '-n'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 2: 'mgr.x'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 3: '-f'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 4: '--setuser'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 5: 'ceph'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 6: '--setgroup'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 7: 'ceph'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 8: '--default-log-to-file=false'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 9: '--default-log-to-journald=true'
2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr respawn 10: '--default-log-to-stderr=false'
2024-03-20T18:59:44.325+0000 7ff1adba6700 1 mgr respawn respawning with exe /usr/bin/ceph-mgr
2024-03-20T18:59:44.325+0000 7ff1adba6700 1 mgr respawn exe_path /proc/self/exe
/teuthology/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612921/remote/smithi042/log/6efffee4-e6ea-11ee-95c9-87774f69a715/ceph-mgr.x.log.gz
The mgr modules changed so it rebooted and the client instance got evicted.
I'll work on a fix.
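For anyone triaging a similar eviction warning, a minimal sketch of scanning a decompressed mgr log for the respawn line shown above. The regular expression is an assumption based only on the line layout in this excerpt (timestamp, thread id, debug level, message), and `find_respawns` is a hypothetical helper, not part of Ceph or teuthology.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
#   <timestamp> <thread id> <debug level> mgr handle_mgr_map respawning because <reason>
RESPAWN_RE = re.compile(
    r"^(?P<ts>\S+) \S+ +\d+ mgr handle_mgr_map respawning because (?P<reason>.+)$"
)


def find_respawns(lines):
    """Return a list of (timestamp, reason) for each mgr respawn line found."""
    events = []
    for line in lines:
        match = RESPAWN_RE.match(line.strip())
        if match:
            # Timestamps in the excerpt look like 2024-03-20T18:59:44.324+0000
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
            events.append((ts, match.group("reason")))
    return events


if __name__ == "__main__":
    sample = [
        "2024-03-20T18:59:44.324+0000 7ff1adba6700 4 mgr handle_mgr_map received map epoch 19",
        "2024-03-20T18:59:44.324+0000 7ff1adba6700 1 mgr handle_mgr_map respawning because set of enabled modules changed!",
    ]
    for ts, reason in find_respawns(sample):
        print(ts.isoformat(), reason)
```

A respawn timestamp roughly 300 seconds before the "evicting unresponsive client" warning would be consistent with the explanation above.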
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 56354
The mgr modules changed so it rebooted and the client instance got evicted.
o_0
Shouldn’t we do a polite unmount when rebooting? Leaving a hanging client session from the manager seems real bad…
I guess when the monitor fails it over, it does a blocklist entry so the mds cleans up faster? Otherwise there’d be disasters there, too.
Greg Farnum wrote:
The mgr modules changed so it rebooted and the client instance got evicted.
o_0
Shouldn’t we do a polite unmount when rebooting? Leaving a hanging client session from the manager seems real bad…
I guess when the monitor fails it over, it does a blocklist entry so the mds cleans up faster? Otherwise there’d be disasters there, too.
It's not really a big deal and is unlikely to happen in production. Again, it only happens when a failover occurs between when the session is established and when the beacon with the client addr is sent to the mons. The mgr doesn't do anything with the mount until it has acknowledgement.
https://github.com/ceph/ceph/pull/51169/files#diff-50ab66411d9293d402a15e00ed6843a4d37889c616873e69534e609c210f72ec
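The race window described above can be sketched as a small timing model. This is illustrative only: the class, its field names, and the function are hypothetical and not taken from the Ceph source, and it assumes, as the thread suggests, that once the mons know the client addr they can blocklist it on failover so the MDS does not have to wait for the unresponsive-client timeout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MgrClientTimeline:
    """Hypothetical event times, in seconds, for one mgr client instance."""
    session_established: float         # mgr's client session opens
    addr_beacon_sent: Optional[float]  # beacon with the client addr sent to the mons; None if never sent
    failover: float                    # mgr respawns or fails over


def session_left_dangling(t: MgrClientTimeline) -> bool:
    """True when the failover lands inside the window described in the thread.

    Assumption: a session is only left for the MDS to evict when the failover
    happens after the session is established but before the client addr has
    been sent to the mons.
    """
    if t.failover < t.session_established:
        return False  # no session existed yet, nothing to clean up
    return t.addr_beacon_sent is None or t.failover < t.addr_beacon_sent


if __name__ == "__main__":
    # Failover between session setup and the addr beacon: the racy case.
    print(session_left_dangling(MgrClientTimeline(10.0, 12.0, 11.0)))
    # Failover after the beacon was sent: the addr is known to the mons.
    print(session_left_dangling(MgrClientTimeline(10.0, 12.0, 30.0)))
```

The model only captures ordering; it says nothing about how long the window is in practice, which is why the case is expected to be rare outside of QA runs that toggle mgr modules right after startup.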
- Status changed from Fix Under Review to Pending Backport
- Copied to Backport #65092: reef: qa: fs:workloads mgr client evicted indicated by "cluster [WRN] evicting unresponsive client smithi042:x (15288), after 303.306 seconds" added
- Copied to Backport #65093: squid: qa: fs:workloads mgr client evicted indicated by "cluster [WRN] evicting unresponsive client smithi042:x (15288), after 303.306 seconds" added
- Tags set to backport_processed
- Status changed from Pending Backport to Resolved