Bug #63830

open

MDS fails to start

Added by Heðin Ejdesgaard Møller 6 months ago. Updated 25 days ago.

Status: New
Priority: High
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash, multifs, multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have two filesystems, production and backup.
The backup filesystem is offline because none of its MDS daemons will go active.

Below I've included the Ceph version, the MDS service specs, the pool IDs and names, the MDS metadata for backup, one of the many crash reports, and the service log output generated when I run reset-failed and then start one of the MDS services.

I've also been made aware of https://access.redhat.com/solutions/6994879, but I'm not sure it's the same issue.

$ ceph version
ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
$ ceph orch ls --service_type mds --export
service_type: mds
service_id: Production
service_name: mds.Production
placement:
  count: 2
  label: mds
---
service_type: mds
service_id: backup
service_name: mds.backup
placement:
  count: 2
  label: mds_backup
$ ceph osd pool ls detail | grep cephfs | awk '{print $1" "$2" "$3}'
pool 24 'cephfs.backup.meta'
pool 25 'cephfs.backup.data'
pool 26 'cephfs.production.data'
pool 27 'cephfs.production.metadata'
$ ceph fs ls
name: backup, metadata pool: cephfs.backup.meta, data pools: [cephfs.backup.data ]
name: production, metadata pool: cephfs.production.metadata, data pools: [cephfs.production.data ]
$ ceph mds metadata | jq .[1]
{
  "name": "backup.ceph03.gcoisu",
  "addr": "[v2:10.1.0.34:6800/3795710591,v1:10.1.0.34:6801/3795710591]",
  "arch": "x86_64",
  "ceph_release": "quincy",
  "ceph_version": "ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)",
  "ceph_version_short": "17.2.7",
  "container_hostname": "ceph03",
  "container_image": "quay.io/ceph/ceph@sha256:1fcdbead4709a7182047f8ff9726e0f17b0b209aaa6656c5c8b2339b818e70bb",
  "cpu": "Intel(R) Celeron(R) J4115 CPU @ 1.80GHz",
  "distro": "centos",
  "distro_description": "CentOS Stream 8",
  "distro_version": "8",
  "hostname": "ceph03",
  "kernel_description": "#1 SMP PREEMPT_DYNAMIC Thu Sep 21 18:07:33 UTC 2023",
  "kernel_version": "5.14.0-368.el9.x86_64",
  "mem_swap_kb": "3055612",
  "mem_total_kb": "32410468",
  "os": "Linux" 
}
$ ceph crash info 2023-12-14T12:08:09.595806Z_430af44c-1138-47fd-94c2-69cd6f82001e
{
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12cf0) [0x7f4acf88acf0]",
        "gsignal()",
        "abort()",
        "/lib64/libstdc++.so.6(+0x9009b) [0x7f4acec8409b]",
        "/lib64/libstdc++.so.6(+0x9654c) [0x7f4acec8a54c]",
        "/lib64/libstdc++.so.6(+0x965a7) [0x7f4acec8a5a7]",
        "/lib64/libstdc++.so.6(+0x96808) [0x7f4acec8a808]",
        "(ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*)+0xa5) [0x7f4ad0c620e5]",
        "(compact_set_base<long, std::set<long, std::less<long>, mempool::pool_allocator<(mempool::pool_index_t)26, long> > >::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x15f) [0x55a2d20088df]",
        "(inode_t<mempool::mds_co::pool_allocator>::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x55b) [0x55a2d200903b]",
        "(old_inode_t<mempool::mds_co::pool_allocator>::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x123) [0x55a2d2009623]",
        "(EMetaBlob::fullbit::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x688) [0x55a2d20eb3f8]",
        "/usr/bin/ceph-mds(+0x592f2d) [0x55a2d20edf2d]",
        "(EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x7bf) [0x55a2d20f5bff]",
        "(EUpdate::replay(MDSRank*)+0x61) [0x55a2d20fdbd1]",
        "(MDLog::_replay_thread()+0x7bb) [0x55a2d208454b]",
        "(MDLog::ReplayThread::entry()+0x11) [0x55a2d1d37041]",
        "/lib64/libpthread.so.0(+0x81ca) [0x7f4acf8801ca]",
        "clone()" 
    ],
    "ceph_version": "17.2.7",
    "crash_id": "2023-12-14T12:08:09.595806Z_430af44c-1138-47fd-94c2-69cd6f82001e",
    "entity_name": "mds.backup.ceph03.gcoisu",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mds",
    "stack_sig": "99cdac589b9de540dc8f5016618788241f1ac1c08b8c8bf453437e6cd9792d18",
    "timestamp": "2023-12-14T12:08:09.595806Z",
    "utsname_hostname": "ceph03",
    "utsname_machine": "x86_64",
    "utsname_release": "5.14.0-368.el9.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Thu Sep 21 18:07:33 UTC 2023" 
}
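
To make the crash report above easier to read, here is a minimal sketch that reduces the symbolized frames to a call path, outermost caller first. Read that way, the frames suggest the abort happens on the journal replay thread (MDLog::_replay_thread) while an EMetaBlob entry is being decoded, but that is only my reading of the backtrace, not a confirmed diagnosis. The helper names (strip_args, symbol_name, call_path) are made up for this illustration, and the frame list is an abbreviated copy of the "backtrace" array from the crash info.

```python
import re

# Abbreviated copy of the "backtrace" array from the crash report above.
BACKTRACE = [
    "/lib64/libpthread.so.0(+0x12cf0) [0x7f4acf88acf0]",
    "gsignal()",
    "abort()",
    "(ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*)+0xa5) [0x7f4ad0c620e5]",
    "(inode_t<mempool::mds_co::pool_allocator>::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x55b) [0x55a2d200903b]",
    "(old_inode_t<mempool::mds_co::pool_allocator>::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x123) [0x55a2d2009623]",
    "(EMetaBlob::fullbit::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x688) [0x55a2d20eb3f8]",
    "/usr/bin/ceph-mds(+0x592f2d) [0x55a2d20edf2d]",
    "(EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x7bf) [0x55a2d20f5bff]",
    "(EUpdate::replay(MDSRank*)+0x61) [0x55a2d20fdbd1]",
    "(MDLog::_replay_thread()+0x7bb) [0x55a2d208454b]",
    "(MDLog::ReplayThread::entry()+0x11) [0x55a2d1d37041]",
    "clone()",
]

# Symbolized frames look like "(<symbol>+0x<offset>) [0x<address>]".
FRAME_RE = re.compile(r"^\((?P<sym>.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def strip_args(sym):
    """Drop the trailing argument list, e.g. 'A::f(int)' -> 'A::f'."""
    if not sym.endswith(")"):
        return sym
    depth = 0
    # Walk backwards to find the '(' that matches the final ')'.
    for i in range(len(sym) - 1, -1, -1):
        if sym[i] == ")":
            depth += 1
        elif sym[i] == "(":
            depth -= 1
            if depth == 0:
                return sym[:i]
    return sym


def symbol_name(frame):
    """Return the function name of a frame, or None if it is unsymbolized."""
    match = FRAME_RE.match(frame)
    if match:
        return strip_args(match.group("sym"))
    if frame.startswith("/"):
        return None  # only a library path and offset, no symbol
    return strip_args(frame)  # bare frames such as "abort()"


def call_path(backtrace):
    """List symbolized frames from outermost caller to innermost callee."""
    names = [symbol_name(frame) for frame in reversed(backtrace)]
    return [name for name in names if name is not None]


if __name__ == "__main__":
    for depth, name in enumerate(call_path(BACKTRACE)):
        print("  " * depth + name)
```

To run it against a live cluster instead of the copied frames, the "backtrace" list could be loaded with json.loads from the output of the ceph crash info command shown above.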

Files

mds-systemd-journal.log (101 KB), Heðin Ejdesgaard Møller, 12/14/2023 11:33 PM