Bug #50389

closed

mds: "cluster [ERR] Error recovering journal 0x203: (2) No such file or directory" in cluster log"

Added by Patrick Donnelly about 3 years ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
% Done: 100%
Source: Q/A
Tags: backport_processed
Backport: pacific, octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS, osdc
Labels (FS): qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Symptom:

2021-04-15T16:51:03.445+0000 7faaa4574700  1 mds.j Updating MDS map to version 16 from mon.0
2021-04-15T16:51:03.446+0000 7faaa4574700 10 mds.j      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2021-04-15T16:51:03.446+0000 7faaa4574700 10 mds.j  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
2021-04-15T16:51:03.446+0000 7faaa4574700 10 mds.j my gid is 4329
2021-04-15T16:51:03.446+0000 7faaa4574700 10 mds.j map says I am mds.3.0 state up:standby-replay
2021-04-15T16:51:03.446+0000 7faaa4574700 10 mds.j msgr says I am [v2:172.21.15.39:6838/1945817279,v1:172.21.15.39:6839/1945817279]
2021-04-15T16:51:03.446+0000 7faaa4574700 10 mds.j handle_mds_map: handling map as rank 3
2021-04-15T16:51:03.446+0000 7faaa4574700 10 notify_mdsmap: mds.metrics
2021-04-15T16:51:03.446+0000 7faaa4574700 10 notify_mdsmap: mds.metrics: rank0 is mds.i
2021-04-15T16:51:03.446+0000 7faa9dd67700  1 -- [v2:172.21.15.39:6838/1945817279,v1:172.21.15.39:6839/1945817279] --> [v2:172.21.15.39:3300/0,v1:172.21.15.39:6789/0] -- mon_get_version(what=osdmap handle=4) v1 -- 0x5649ed383760 con 0x5649ed42e800
2021-04-15T16:51:03.446+0000 7faa9dd67700  4 mds.3.log Waiting for journal 0x203 to recover...
2021-04-15T16:51:03.448+0000 7faaa4574700  1 -- [v2:172.21.15.39:6838/1945817279,v1:172.21.15.39:6839/1945817279] <== mon.0 v2:172.21.15.39:3300/0 19 ==== mon_get_version_reply(handle=4 version=26) v2 ==== 24+0+0 (secure 0 0 0) 0x5649ed383760 con 0x5649ed42e800
2021-04-15T16:51:03.448+0000 7faa9e568700  0 mds.4329.journaler.mdlog(ro) error getting journal off disk
2021-04-15T16:51:03.448+0000 7faa9dd67700  4 mds.3.log Journal 0x203 recovered.
2021-04-15T16:51:03.448+0000 7faa9dd67700 -1 log_channel(cluster) log [ERR] : Error recovering journal 0x203: (2) No such file or directory

From: /ceph/teuthology-archive/pdonnell-2021-04-15_01:35:57-fs-wip-pdonnell-testing-20210414.230315-distro-basic-smithi/6047603/remote/smithi039/log/ceph-mds.j.log.gz

This one is weird. The standby-replay daemon had no traffic with the OSDs up to that point and yet concluded ENOENT. That usually means the objecter decided the pool doesn't exist after fetching the latest version of the OSDMap (the mon_get_version in the output above). I don't see how that could be the case here. It might be a bug in osdc.
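For context, here is a minimal C++ sketch, not the actual osdc/Objecter code, of the failure mode being described: if the freshest OSDMap the client can obtain has no entry for the target pool, the op is completed locally with -ENOENT and no OSD is ever contacted. All names below (FakeOSDMap, fetch_latest_map, submit_read) and the pool id used in main() are illustrative assumptions.

// Minimal sketch (assumed names, not Ceph's Objecter) of a read failing with
// ENOENT purely from the client's cached view of the OSDMap, with no OSD traffic.
#include <cerrno>
#include <cstdint>
#include <functional>
#include <iostream>
#include <set>
#include <string>

struct FakeOSDMap {
  std::set<int64_t> pools;                        // pool ids present in this epoch
  bool have_pool(int64_t id) const { return pools.count(id) > 0; }
};

// Stand-in for "ask the monitor for the newest map" (the mon_get_version
// round-trip visible in the log above).
FakeOSDMap fetch_latest_map() {
  return FakeOSDMap{{1, 2}};                      // suppose the metadata pool id is absent
}

void submit_read(int64_t pool_id, const std::string& oid,
                 const std::function<void(int)>& on_finish) {
  FakeOSDMap m = fetch_latest_map();
  if (!m.have_pool(pool_id)) {
    // Pool not present even in the newest map: fail the op locally.
    on_finish(-ENOENT);
    return;
  }
  on_finish(0);                                   // normal path: send to an OSD (elided)
}

int main() {
  // Reading the journal header object for journal 0x203 (conventionally named
  // "203.00000000") from a pool the client believes is gone returns -ENOENT
  // immediately, which Journaler would then surface as a recovery error.
  submit_read(3, "203.00000000", [](int r) {
    std::cout << "read completed with r=" << r << "\n";  // prints r=-2 (ENOENT)
  });
}

The open question in the report is why such a branch would be taken at all here, since the metadata pool clearly exists; hence the suspicion of a bug in osdc.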


Related issues 2 (0 open, 2 closed)

Copied to CephFS - Backport #50848: pacific: mds: "cluster [ERR] Error recovering journal 0x203: (2) No such file or directory" in cluster log (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #50849: octopus: mds: "cluster [ERR] Error recovering journal 0x203: (2) No such file or directory" in cluster log (Rejected)
