Bug #63830
MDS fails to start
Status: New
Priority: High
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash, multifs, multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
I have 2 filesystems, production and backup.
The backup filesystem is offline because none of its MDS daemons will go active.
Below I've included the Ceph version, the MDS service spec, the pool IDs and names, the MDS metadata for backup, one of the many crash reports, and the service log output generated when I reset-failed and start one of the MDS services.
I've also been made aware of https://access.redhat.com/solutions/6994879, but I'm not sure it's the same issue.
$ ceph version
ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
$ ceph orch ls --service_type mds --export
service_type: mds
service_id: Production
service_name: mds.Production
placement:
  count: 2
  label: mds
---
service_type: mds
service_id: backup
service_name: mds.backup
placement:
  count: 2
  label: mds_backup
$ ceph osd pool ls detail | grep cephfs | awk '{print $1" "$2" "$3}'
pool 24 'cephfs.backup.meta'
pool 25 'cephfs.backup.data'
pool 26 'cephfs.production.data'
pool 27 'cephfs.production.metadata'
$ ceph fs ls
name: backup, metadata pool: cephfs.backup.meta, data pools: [cephfs.backup.data ]
name: production, metadata pool: cephfs.production.metadata, data pools: [cephfs.production.data ]
$ ceph mds metadata | jq .[1]
{
  "name": "backup.ceph03.gcoisu",
  "addr": "[v2:10.1.0.34:6800/3795710591,v1:10.1.0.34:6801/3795710591]",
  "arch": "x86_64",
  "ceph_release": "quincy",
  "ceph_version": "ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)",
  "ceph_version_short": "17.2.7",
  "container_hostname": "ceph03",
  "container_image": "quay.io/ceph/ceph@sha256:1fcdbead4709a7182047f8ff9726e0f17b0b209aaa6656c5c8b2339b818e70bb",
  "cpu": "Intel(R) Celeron(R) J4115 CPU @ 1.80GHz",
  "distro": "centos",
  "distro_description": "CentOS Stream 8",
  "distro_version": "8",
  "hostname": "ceph03",
  "kernel_description": "#1 SMP PREEMPT_DYNAMIC Thu Sep 21 18:07:33 UTC 2023",
  "kernel_version": "5.14.0-368.el9.x86_64",
  "mem_swap_kb": "3055612",
  "mem_total_kb": "32410468",
  "os": "Linux"
}
$ ceph crash info 2023-12-14T12:08:09.595806Z_430af44c-1138-47fd-94c2-69cd6f82001e
{
  "backtrace": [
    "/lib64/libpthread.so.0(+0x12cf0) [0x7f4acf88acf0]",
    "gsignal()",
    "abort()",
    "/lib64/libstdc++.so.6(+0x9009b) [0x7f4acec8409b]",
    "/lib64/libstdc++.so.6(+0x9654c) [0x7f4acec8a54c]",
    "/lib64/libstdc++.so.6(+0x965a7) [0x7f4acec8a5a7]",
    "/lib64/libstdc++.so.6(+0x96808) [0x7f4acec8a808]",
    "(ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*)+0xa5) [0x7f4ad0c620e5]",
    "(compact_set_base<long, std::set<long, std::less<long>, mempool::pool_allocator<(mempool::pool_index_t)26, long> > >::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x15f) [0x55a2d20088df]",
    "(inode_t<mempool::mds_co::pool_allocator>::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x55b) [0x55a2d200903b]",
    "(old_inode_t<mempool::mds_co::pool_allocator>::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x123) [0x55a2d2009623]",
    "(EMetaBlob::fullbit::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x688) [0x55a2d20eb3f8]",
    "/usr/bin/ceph-mds(+0x592f2d) [0x55a2d20edf2d]",
    "(EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x7bf) [0x55a2d20f5bff]",
    "(EUpdate::replay(MDSRank*)+0x61) [0x55a2d20fdbd1]",
    "(MDLog::_replay_thread()+0x7bb) [0x55a2d208454b]",
    "(MDLog::ReplayThread::entry()+0x11) [0x55a2d1d37041]",
    "/lib64/libpthread.so.0(+0x81ca) [0x7f4acf8801ca]",
    "clone()"
  ],
  "ceph_version": "17.2.7",
  "crash_id": "2023-12-14T12:08:09.595806Z_430af44c-1138-47fd-94c2-69cd6f82001e",
  "entity_name": "mds.backup.ceph03.gcoisu",
  "os_id": "centos",
  "os_name": "CentOS Stream",
  "os_version": "8",
  "os_version_id": "8",
  "process_name": "ceph-mds",
  "stack_sig": "99cdac589b9de540dc8f5016618788241f1ac1c08b8c8bf453437e6cd9792d18",
  "timestamp": "2023-12-14T12:08:09.595806Z",
  "utsname_hostname": "ceph03",
  "utsname_machine": "x86_64",
  "utsname_release": "5.14.0-368.el9.x86_64",
  "utsname_sysname": "Linux",
  "utsname_version": "#1 SMP PREEMPT_DYNAMIC Thu Sep 21 18:07:33 UTC 2023"
}
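With many crash reports to go through, it can help to reduce each backtrace to its symbolized frames. The following is a minimal sketch, not part of the original report: `crash_frames` is a hypothetical helper name, and it assumes the `ceph crash info` output was saved to a file first. It keeps only frames of the form "(symbol+0xOFFSET) [0xADDRESS]" and strips the offset and address. For the report above, reading the result bottom-up suggests the abort happens while the journal replay thread (MDLog::_replay_thread) decodes an old_inode_t inside an EMetaBlob.

```shell
# Hypothetical helper (not a Ceph tool): list the symbolized frames from a
# saved crash report. Assumed usage:
#   ceph crash info <crash_id> > crash.json
#   crash_frames crash.json
crash_frames() {
    # Match only backtrace strings shaped like "(symbol+0xOFF) [0xADDR]";
    # frames without a symbol (e.g. "/lib64/...(+0x...)") are skipped.
    grep -o '"([^"]*) \[0x[0-9a-f]*]"' "$1" |
        # Drop the leading quote and parenthesis, then the offset and address.
        sed -e 's/^"(//' -e 's/+0x[0-9a-f]*) \[0x[0-9a-f]*]"$//'
}
```

The function uses only grep and sed so that it works on hosts where jq is not installed; it works on the text of the file and does not validate the JSON.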