Bug #58041
closed
mds: src/mds/Server.cc: 3231: FAILED ceph_assert(straydn->get_name() == straydname)
Description
Nov 6 07:26:27 host0 /builddir/build/BUILD/ceph-16.2.8/src/mds/Server.cc: In function 'CDentry* Server::prepare_stray_dentry(MDRequestRef&, CInode*)' thread 7feb58dcd700 time 2022-11-06T13:26:27.233738+0000
Nov 6 07:26:27 host0 : /builddir/build/BUILD/ceph-16.2.8/src/mds/Server.cc: 3231: FAILED ceph_assert(straydn->get_name() == straydname)
Nov 6 07:26:27 host0 : ceph version 16.2.8-84.el8cp (c2980f2fd700e979d41b4bad2939bb90f0fe435c) pacific (stable)
Nov 6 07:26:27 host0 : 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7feb617faaa8]
Nov 6 07:26:27 host0 : 2: /usr/lib64/ceph/libceph-common.so.2(+0x277cc2) [0x7feb617facc2]
Nov 6 07:26:27 host0 : 3: (Server::prepare_stray_dentry(boost::intrusive_ptr<MDRequestImpl>&, CInode*)+0x95) [0x55aee13049e5]
Nov 6 07:26:27 host0 : 4: (Server::handle_client_rename(boost::intrusive_ptr<MDRequestImpl>&)+0x1091) [0x55aee132bff1]
Nov 6 07:26:27 host0 : 5: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0xe9a) [0x55aee135360a]
Nov 6 07:26:27 host0 : 6: (MDCache::dispatch_request(boost::intrusive_ptr<MDRequestImpl>&)+0x33) [0x55aee13fb9b3]
Nov 6 07:26:27 host0 : 7: (MDSContext::complete(int)+0x203) [0x55aee15b7ca3]
Nov 6 07:26:27 host0 : 8: (MDSCacheObject::finish_waiting(unsigned long, int)+0xce) [0x55aee15d9b4e]
Nov 6 07:26:27 host0 : 9: (Locker::eval_gather(SimpleLock*, bool, bool*, std::vector<MDSContext*, std::allocator<MDSContext*> >*)+0x13d6) [0x55aee148ca86]
Nov 6 07:26:27 host0 : 10: (CDentry::remove_client_lease(ClientLease*, Locker*)+0x466) [0x55aee14f1a06]
Nov 6 07:26:27 host0 : 11: (Locker::handle_client_lease(boost::intrusive_ptr<MClientLease const> const&)+0xc6a) [0x55aee147d2ea]
Nov 6 07:26:27 host0 : 12: (Locker::dispatch(boost::intrusive_ptr<Message const> const&)+0x134) [0x55aee149f944]
Nov 6 07:26:27 host0 : 13: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&)+0xbcc) [0x55aee12aeb6c]
Nov 6 07:26:27 host0 : 14: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x7bb) [0x55aee12b150b]
Nov 6 07:26:27 host0 : 15: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x55) [0x55aee12b1b05]
Nov 6 07:26:27 host0 : 16: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x108) [0x55aee12a16f8]
Nov 6 07:26:27 host0 : 17: (DispatchQueue::entry()+0x126a) [0x7feb61a428ba]
Nov 6 07:26:27 host0 : 18: (DispatchQueue::DispatchThread::entry()+0x11) [0x7feb61af4b81]
Nov 6 07:26:27 host0 : 19: /lib64/libpthread.so.0(+0x81cf) [0x7feb607dd1cf]
Nov 6 07:26:27 host0 : 20: clone()
Nov 6 07:26:27 host0 : *** Caught signal (Aborted) **
Nov 6 07:26:27 host0 : in thread 7feb58dcd700 thread_name:ms_dispatch
Nov 6 07:26:27 host0 : debug 2022-11-06T13:26:27.300+0000 7feb58dcd700 -1 /builddir/build/BUILD/ceph-16.2.8/src/mds/Server.cc: In function 'CDentry* Server::prepare_stray_dentry(MDRequestRef&, CInode*)' thread 7feb58dcd700 time 2022-11-06T13:26:27.233738+0000
This looks like something raced with a rename. Going by the backtrace, `handle_client_rename` was put on a wait list (maybe to revoke caps from another client) and, when woken up, ran into the assert that verifies that the stray dentry name (from the MDRequest) matches the name generated from the inode number.
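The check that fired can be modelled in a few lines. The sketch below is a simplified Python model, not the Ceph source: `Request`, `stray_name`, and `replay` are hypothetical names, and the naming rule (inode number rendered as lowercase hex) is an assumption about how the MDS derives stray dentry names. It shows how a request that cached a stray dentry name on its first attempt would trip the check if, after being woken up, it is handed a different inode.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


def stray_name(ino: int) -> str:
    """Assumed naming rule: the inode number rendered as lowercase hex."""
    return format(ino, "x")


@dataclass
class Request:
    """Hypothetical stand-in for an MDS request that may be retried."""
    straydn: Optional[str] = None  # stray dentry name cached by an earlier attempt


def prepare_stray_dentry(req: Request, ino: int) -> str:
    """Return the stray dentry name for `ino`, reusing the cached one on retry."""
    expected = stray_name(ino)
    if req.straydn is not None:
        # Models the failing check: a retried request must still see the same inode.
        if req.straydn != expected:
            raise AssertionError(f"straydn {req.straydn!r} != straydname {expected!r}")
        return req.straydn
    req.straydn = expected
    return expected


def replay(inos: Iterable[int]) -> bool:
    """Run one request once per attempt; return False if the name check fails."""
    req = Request()
    try:
        for ino in inos:
            prepare_stray_dentry(req, ino)
    except AssertionError:
        return False
    return True
```

In this model `replay([0x1000, 0x1000])` passes, while `replay([0x1000, 0x1001])` fails: the second attempt sees a different inode than the one the cached stray dentry was named after, which is the kind of mismatch a racing operation could produce.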
Updated by Venky Shankar over 1 year ago
oh, and btw this was seen in ceph-16.2.8.
Updated by Venky Shankar over 1 year ago
- Labels (FS) multimds added
Another side note: the crash was seen when a directory pin was removed from the rank-0 MDS. Pinning the directory back stops the crash.
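For context, CephFS export pins are controlled through the `ceph.dir.pin` extended attribute on a directory, where a value of `-1` removes the pin. A minimal sketch of the pin and unpin steps mentioned here, assuming a Linux client with the file system mounted (the helper names are hypothetical, and the path in the usage note is only an example):

```python
import os
from typing import Optional


def pin_value(rank: Optional[int]) -> bytes:
    """Encode an export pin value; None means 'remove the pin' (-1)."""
    if rank is None:
        return b"-1"
    if rank < 0:
        raise ValueError("rank must be >= 0, or None to remove the pin")
    return str(rank).encode()


def set_export_pin(path: str, rank: Optional[int]) -> None:
    """Pin the directory at `path` to an MDS rank, or unpin it when rank is None."""
    # os.setxattr is available on Linux only; the value must be bytes.
    os.setxattr(path, "ceph.dir.pin", pin_value(rank))
```

For example, `set_export_pin("/mnt/cephfs/mydir", 0)` would pin the directory back to rank 0, and `set_export_pin("/mnt/cephfs/mydir", None)` would remove the pin again.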
Updated by Milind Changire over 1 year ago
Due to unavailability of debug logs, there has been some speculation about the issue during discussion with Venky.
The issue here is most likely due to a file create op racing with a lagging async unlink op.
This specific issue has been addressed by Xiubo in his PR: https://github.com/ceph/ceph/pull/47399
Updated by Venky Shankar over 1 year ago
- Status changed from New to Duplicate
Milind Changire wrote:
> Due to unavailability of debug logs, there has been some speculation about the issue during discussion with Venky.
> The issue here is most likely due to a file create op racing with a lagging async unlink op.

It does look related to async unlink.

> This specific issue has been addressed by Xiubo in his PR: https://github.com/ceph/ceph/pull/47399

I think this is the correct fix - https://github.com/ceph/ceph/pull/46331
Closing this as the pacific backport (https://github.com/ceph/ceph/pull/48453) is pending merge. Please reopen if it's seen again.
Updated by Venky Shankar over 1 year ago
- Related to Bug #55332: Failure in snaptest-git-ceph.sh (it's an async unlink/create bug) added