Bug #64717

open

MDS stuck in replay/resolve loop

Added by Abhishek Lekshmanan 2 months ago. Updated about 1 month ago.

Status:
New
Priority:
High
Category:
Performance/Resource Usage
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
quincy,reef,squid
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have a CephFS cluster where we ran a lot of metadata-intensive workloads with snapshots enabled. In our monitoring we saw the number of MDLog segments growing to the order of millions, after which the MDS daemon was killed. Since then the MDSs have been permanently stuck in an up:replay and up:resolve loop. Initially up:replay always ended up getting OOM-killed; this still happened after adding 1 TiB of swap space. After finally bumping up to 5 TiB of swap, the MDS eventually completed up:replay, but after a couple of days in up:resolve it finally crashed. During the up:resolve stage we took a few stacktraces of the MDSes to understand whether they were making any progress; except for one, all the threads were waiting on a mutex, e.g. the log thread:

TID 761439:
#0  0x00007f455144345c pthread_cond_wait@@GLIBC_2.3.2
#1  0x00007f455086d870 std::condition_variable::wait(std::unique_lock<std::mutex>&)
#2  0x00005594a654ac76 void std::_V2::condition_variable_any::wait<std::unique_lock<ceph::fair_mutex> >(std::unique_lock<ceph::fair_mutex>&)
#3  0x00005594a6542001 MDLog::_submit_thread()
#4  0x00005594a61f8081 MDLog::SubmitThread::entry()
#5  0x00007f455143d1ca start_thread
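
For context, a minimal sketch (not the actual Ceph code; ceph::fair_mutex is stood in by a plain wrapper and the entry type is a placeholder) of the wait pattern visible in frames #1-#3: the submit thread parks on a condition_variable_any guarding a fair mutex until new journal entries are queued, so while nothing is being submitted it simply sits in pthread_cond_wait, which is what the backtrace shows.

#include <condition_variable>
#include <deque>
#include <mutex>

// stand-in for ceph::fair_mutex; only the BasicLockable interface is assumed
struct FairMutex {
  std::mutex m;
  void lock()   { m.lock(); }
  void unlock() { m.unlock(); }
};

struct SubmitThreadSketch {
  FairMutex submit_mutex;
  std::condition_variable_any submit_cond;
  std::deque<int> pending_entries;   // placeholder for queued journal entries
  bool stopping = false;

  void entry() {
    std::unique_lock<FairMutex> lock(submit_mutex);
    while (!stopping) {
      if (pending_entries.empty()) {
        submit_cond.wait(lock);      // corresponds to frames #1/#2 above
        continue;
      }
      pending_entries.pop_front();   // flushing the entry to the journal elided
    }
  }
};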

The only thread making progress was this one:

#0  0x00007f455085f1e3 std::_Rb_tree_increment(std::_Rb_tree_node_base*)
#1  0x00005594a634803c MDCache::recalc_auth_bits(bool)
#2  0x00005594a635cd25 MDCache::maybe_resolve_finish()
#3  0x00005594a635ef0a MDCache::handle_resolve(boost::intrusive_ptr<MMDSResolve const> const&)
#4  0x00005594a636aaac MDCache::dispatch(boost::intrusive_ptr<Message const> const&)
#5  0x00005594a61de27b MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)
#6  0x00005594a61de8bc MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)
#7  0x00005594a61c7c2f MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)
#8  0x00007f45526cbf18 Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)
#9  0x00007f45526c935f DispatchQueue::entry()
#10 0x00007f45527933d1 DispatchQueue::DispatchThread::entry()
#11 0x00007f455143d1ca start_thread
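
The top frame being std::_Rb_tree_increment suggests the thread is mostly just iterating a very large red-black-tree container. A rough illustration (invented names, not the MDCache implementation) of why a cache with millions of entries turns recalc_auth_bits into a long single-threaded walk, with the iteration itself dominating the profile:

#include <cstdint>
#include <map>

struct CInodeStub { bool auth = false; };   // placeholder for a cached inode

// Walks every cached entry once; with millions of entries the per-item work is
// tiny, so most samples land in the std::map iterator (_Rb_tree_increment).
void recalc_auth_bits_sketch(std::map<uint64_t, CInodeStub>& cache, bool root_is_auth) {
  for (auto it = cache.begin(); it != cache.end(); ++it) {
    it->second.auth = root_is_auth;   // recompute the per-entry authority flag
  }
}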

Eventually the up:resolve stage also crashed with the following stacktrace:

     0> 2024-03-04T07:01:46.040+0100 7f454ba21700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f454ba21700 thread_name:admin_socket

 ceph version 17.2.7-1 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
 1: /lib64/libpthread.so.0(+0x12cf0) [0x7f4551447cf0]
 2: (MDSDaemon::dump_status(ceph::Formatter*)+0x1e0) [0x5594a61c00a0]
 3: (MDSDaemon::asok_command(std::basic_string_view<char, std::char_traits<char> >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&, ceph::Formatter*, ceph::buffer::v15_2_0::list const&, std::function<void (int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list&)>)+0x515) [0x5594a61c5e25]
 4: (MDSSocketHook::call_async(std::basic_string_view<char, std::char_traits<char> >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&, ceph::Formatter*, ceph::buffer::v15_2_0::list const&, std::function<void (int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list&)>)+0xa8) [0x5594a61cf258]
 5: (AdminSocket::execute_command(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, ceph::buffer::v15_2_0::list const&, std::function<void (int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list&)>)+0x889) [0x7f4552566459]
 6: (AdminSocket::execute_command(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, ceph::buffer::v15_2_0::list const&, std::ostream&, ceph::buffer::v15_2_0::list*)+0x11e) [0x7f4552566cce]
 7: (AdminSocket::do_accept()+0x443) [0x7f4552567203]
 8: (AdminSocket::entry()+0x500) [0x7f4552568970]
 9: /lib64/libstdc++.so.6(+0xc2b23) [0x7f4550873b23]
 10: /lib64/libpthread.so.0(+0x81ca) [0x7f455143d1ca]
 11: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
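
We do not know the root cause of the segfault; one failure mode that would at least be consistent with the trace (the admin_socket thread faulting inside MDSDaemon::dump_status while the dispatch thread is busy) is an unsynchronized read of state that another thread is replacing. A deliberately simplified sketch of that kind of hazard, with invented names and no claim that this is what the MDS actually does:

#include <string>

struct MapStub { std::string fs_name; };    // placeholder for an MDSMap-like object

struct DaemonStub {
  MapStub* current_map = nullptr;           // swapped by the dispatch thread

  // admin-socket thread: reads the pointer without a lock, so it can observe a
  // pointer that is being replaced/freed concurrently and fault
  std::string dump_status_sketch() const {
    if (current_map)
      return current_map->fs_name;          // potential use-after-free
    return "unknown";
  }

  // dispatch thread: installs a new map and frees the old one
  void handle_new_map(MapStub* next) {
    MapStub* old = current_map;
    current_map = next;
    delete old;                             // the reader above may still hold 'old'
  }
};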

Before this, the MDS was mostly logging:

2024-02-28T22:47:44.966+0100 7f4549a1d700  1 mds.1.794194 handle_mds_map state change up:replay --> up:resolve
2024-02-28T22:47:44.966+0100 7f4549a1d700  1 mds.1.794194 resolve_start
2024-02-28T22:47:44.966+0100 7f4549a1d700  1 mds.1.794194 reopen_log
2024-02-28T22:47:44.966+0100 7f4549a1d700  1 mds.1.794194  recovery set is 0
2024-02-28T22:47:44.966+0100 7f4549a1d700  1 mds.1.794194  recovery set is 0
2024-02-28T22:47:44.986+0100 7f4549a1d700  1 mds.miniflax-mds-fee831e8fb parse_caps: cannot decode auth caps buffer of length 0
2024-02-28T22:48:01.820+0100 7f454ba21700  1 mds.miniflax-mds-fee831e8fb asok_command: status {prefix=status} (starting...)
2024-02-28T22:52:52.011+0100 7f4547a19700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s
2024-02-28T22:52:52.012+0100 7f4547a19700  0 mds.beacon.miniflax-mds-fee831e8fb Skipping beacon heartbeat to monitors (last acked 4s ago); MDS internal heartbeat is not healthy!
2024-02-28T22:52:52.512+0100 7f4547a19700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s
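
For readers unfamiliar with these messages: the heartbeat_map warnings mean the MDSRank worker has not "touched" its heartbeat within the grace period, which in turn makes the beacon thread skip its reports to the monitors. A simplified sketch of that mechanism (invented names, not the Ceph implementation); a single dispatch call that runs for minutes, like the resolve-finish walk above, never gets a chance to touch the handle:

#include <chrono>

using Clock = std::chrono::steady_clock;

struct HeartbeatHandle {
  Clock::time_point deadline{};

  // called by the worker thread before/after each unit of work
  void touch(std::chrono::seconds grace) { deadline = Clock::now() + grace; }

  // called periodically by the checker/beacon thread
  bool is_healthy() const { return Clock::now() < deadline; }
};

// beacon-side logic: if the worker is stuck inside one long call, is_healthy()
// keeps returning false and the beacon to the monitors is skipped
bool should_send_beacon(const HeartbeatHandle& h) { return h.is_healthy(); }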

which continued with increasing timeouts, and eventually we also saw the following line:

2024-02-29T00:24:37.539+0100 7f4548a1b700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2024-02-28T23:24:37.541020+0100)

and towards the crash, the last few lines were:

    -6> 2024-03-04T07:01:46.007+0100 7f4549a1d700 10 monclient: _finish_auth 0
    -5> 2024-03-04T07:01:46.007+0100 7f4549a1d700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2024-03-04T07:01:16.007875+0100)
    -4> 2024-03-04T07:01:46.007+0100 7f4549a1d700 10 monclient: _finish_auth 0
    -3> 2024-03-04T07:01:46.007+0100 7f4549a1d700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2024-03-04T07:01:16.007888+0100)
    -2> 2024-03-04T07:01:46.007+0100 7f4549a1d700 10 monclient: _finish_auth 0
    -1> 2024-03-04T07:01:46.007+0100 7f4549a1d700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2024-03-04T07:01:16.007894+0100)


Related issues 1 (1 open, 0 closed)

Related to CephFS - Bug #64348: mds: possible memory leak in up:rejoin when opening cap inodes (from OFT) - Triaged - Leonid Usov
