Bug #65388
The MDS_SLOW_REQUEST warning is flapping even though the slow requests don't go away
Status: open
Description
I caught a cluster in an unhealthy state, probably an MDS deadlock, in which requests have been blocked (deadlocked?) for multiple hours. I would expect the MDS_SLOW_REQUEST warning to stay present until the slow requests are somehow unblocked or the bad clients are gone. Instead, it is flapping: there are windows of HEALTH_OK that should not be there. These bogus HEALTH_OK windows (and not the deadlock itself) are the subject of this issue.
As an example, I attached a log file generated by this command:
while true ; do date ; ceph -s ; ceph tell mds.0 dump_ops_in_flight ; sleep 5 ; done | tee ceph-bug.log
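The bogus windows may be easier to spot if each sample is reduced to one line. The following is only a rough sketch, not something that was run for this report. It assumes that jq is installed, that `ceph health detail --format json` lists active checks under a `checks` key, that each entry in the `ops` array of `dump_ops_in_flight` has an `age` field in seconds, and that 30 seconds is the relevant complaint threshold. The function names `classify_sample` and `watch_health` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: flag samples where the MDS_SLOW_REQUEST check is absent
# although the MDS still reports old in-flight ops.
# Assumptions (not from the report): jq is available, the health JSON has a
# "checks" object, each dumped op has an "age" field in seconds, and 30 s
# is the threshold of interest.

# classify_sample WARNING_PRESENT OLD_OPS
#   WARNING_PRESENT: "true" or "false" (is MDS_SLOW_REQUEST among the checks?)
#   OLD_OPS: number of in-flight ops older than the threshold
classify_sample() {
    if [ "$1" = "false" ] && [ "${2:-0}" -gt 0 ]; then
        echo "bogus-ok"
    else
        echo "consistent"
    fi
}

# watch_health [INTERVAL] [THRESHOLD]: poll forever, one line per sample.
watch_health() {
    interval="${1:-5}"
    threshold="${2:-30}"
    while true; do
        # "true" if the health checks contain MDS_SLOW_REQUEST, else "false"
        warn=$(ceph health detail --format json |
            jq -r '.checks | has("MDS_SLOW_REQUEST")')
        # count in-flight ops whose age exceeds the threshold
        old=$(ceph tell mds.0 dump_ops_in_flight |
            jq --argjson t "$threshold" '[.ops[] | select(.age > $t)] | length')
        echo "$(date) warning=$warn old_ops=$old $(classify_sample "$warn" "$old")"
        sleep "$interval"
    done
}
```

After sourcing the file, `watch_health 5 | tee ceph-bug.log` would produce one line per sample, and lines ending in `bogus-ok` would mark the windows in question.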
Updated by Sebastian Wagner about 1 month ago
- Project changed from Ceph to CephFS
Updated by Venky Shankar about 1 month ago
- Category set to Correctness/Safety
- Assignee set to Leonid Usov
- Target version set to v20.0.0
- Source set to Community (user)
Updated by Leonid Usov 12 days ago
Venky, no, not yet. I haven't gotten back to this because of the quiesce work that keeps coming in. I'll try to continue where I left off: during my first review of the flows I couldn't spot any obvious problem, so it will require a deeper look. I also tried reproducing this issue in my dev environment, without success so far.