Bug #50389

closed

mds: "cluster [ERR] Error recovering journal 0x203: (2) No such file or directory" in cluster log"

Added by Patrick Donnelly about 3 years ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
% Done: 100%
Source: Q/A
Tags: backport_processed
Backport: pacific, octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS, osdc
Labels (FS): qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Symptom:

2021-04-15T16:51:03.445+0000 7faaa4574700  1 mds.j Updating MDS map to version 16 from mon.0
2021-04-15T16:51:03.446+0000 7faaa4574700 10 mds.j      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2021-04-15T16:51:03.446+0000 7faaa4574700 10 mds.j  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
2021-04-15T16:51:03.446+0000 7faaa4574700 10 mds.j my gid is 4329
2021-04-15T16:51:03.446+0000 7faaa4574700 10 mds.j map says I am mds.3.0 state up:standby-replay
2021-04-15T16:51:03.446+0000 7faaa4574700 10 mds.j msgr says I am [v2:172.21.15.39:6838/1945817279,v1:172.21.15.39:6839/1945817279]
2021-04-15T16:51:03.446+0000 7faaa4574700 10 mds.j handle_mds_map: handling map as rank 3
2021-04-15T16:51:03.446+0000 7faaa4574700 10 notify_mdsmap: mds.metrics
2021-04-15T16:51:03.446+0000 7faaa4574700 10 notify_mdsmap: mds.metrics: rank0 is mds.i
2021-04-15T16:51:03.446+0000 7faa9dd67700  1 -- [v2:172.21.15.39:6838/1945817279,v1:172.21.15.39:6839/1945817279] --> [v2:172.21.15.39:3300/0,v1:172.21.15.39:6789/0] -- mon_get_version(what=osdmap handle=4) v1 -- 0x5649ed383760 con 0x5649ed42e800
2021-04-15T16:51:03.446+0000 7faa9dd67700  4 mds.3.log Waiting for journal 0x203 to recover...
2021-04-15T16:51:03.448+0000 7faaa4574700  1 -- [v2:172.21.15.39:6838/1945817279,v1:172.21.15.39:6839/1945817279] <== mon.0 v2:172.21.15.39:3300/0 19 ==== mon_get_version_reply(handle=4 version=26) v2 ==== 24+0+0 (secure 0 0 0) 0x5649ed383760 con 0x5649ed42e800
2021-04-15T16:51:03.448+0000 7faa9e568700  0 mds.4329.journaler.mdlog(ro) error getting journal off disk
2021-04-15T16:51:03.448+0000 7faa9dd67700  4 mds.3.log Journal 0x203 recovered.
2021-04-15T16:51:03.448+0000 7faa9dd67700 -1 log_channel(cluster) log [ERR] : Error recovering journal 0x203: (2) No such file or directory

From: /ceph/teuthology-archive/pdonnell-2021-04-15_01:35:57-fs-wip-pdonnell-testing-20210414.230315-distro-basic-smithi/6047603/remote/smithi039/log/ceph-mds.j.log.gz

This one is weird. The standby-replay daemon had no traffic with the OSDs up to that point and yet concluded ENOENT. That usually means the objecter decided the pool doesn't exist after fetching the latest version of the OSDMap (the mon_get_version in the output above). I don't see how that could be the case here. It might be a bug in osdc.
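For context, here is a minimal C++ sketch, not the actual osdc/Objecter code, of the failure mode being described: if the freshest OSDMap the client can obtain has no entry for the target pool, the op is completed locally with -ENOENT and no OSD is ever contacted. All names below (FakeOSDMap, fetch_latest_map, submit_read) and the pool id used in main() are illustrative assumptions.

// Minimal sketch (assumed names, not Ceph's Objecter) of a read failing with
// ENOENT purely from the client's cached view of the OSDMap, with no OSD traffic.
#include <cerrno>
#include <cstdint>
#include <functional>
#include <iostream>
#include <set>
#include <string>

struct FakeOSDMap {
  std::set<int64_t> pools;                        // pool ids present in this epoch
  bool have_pool(int64_t id) const { return pools.count(id) > 0; }
};

// Stand-in for "ask the monitor for the newest map" (the mon_get_version
// round-trip visible in the log above).
FakeOSDMap fetch_latest_map() {
  return FakeOSDMap{{1, 2}};                      // suppose the metadata pool id is absent
}

void submit_read(int64_t pool_id, const std::string& oid,
                 const std::function<void(int)>& on_finish) {
  FakeOSDMap m = fetch_latest_map();
  if (!m.have_pool(pool_id)) {
    // Pool not present even in the newest map: fail the op locally.
    on_finish(-ENOENT);
    return;
  }
  on_finish(0);                                   // normal path: send to an OSD (elided)
}

int main() {
  // Reading the journal header object for journal 0x203 (conventionally named
  // "203.00000000") from a pool the client believes is gone returns -ENOENT
  // immediately, which Journaler would then surface as a recovery error.
  submit_read(3, "203.00000000", [](int r) {
    std::cout << "read completed with r=" << r << "\n";  // prints r=-2 (ENOENT)
  });
}

The open question in the report is why such a branch would be taken at all here, since the metadata pool clearly exists; hence the suspicion of a bug in osdc.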


Related issues 2 (0 open, 2 closed)

Copied to CephFS - Backport #50848: pacific: mds: "cluster [ERR] Error recovering journal 0x203: (2) No such file or directory" in cluster log (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #50849: octopus: mds: "cluster [ERR] Error recovering journal 0x203: (2) No such file or directory" in cluster log (Rejected)
