Bug #53907

closed

BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)

Added by Vikhyat Umrao over 2 years ago. Updated 2 months ago.

Status:
Resolved
Priority:
Immediate
Assignee:
Adam Kupczyk
Target version:
-
% Done:

100%

Source:
Tags:
backport_processed
Backport:
quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
44713
Crash signature (v1):
Crash signature (v2):

Description

2022-01-18T00:11:09.058+0000 7f904d00d700  4 rocksdb: (Original Log Time 2022/01/18-00:11:09.059551) [db/memtable_list.cc:631] [L] Level-0 commit table #2027: memtable #1 done
2022-01-18T00:11:09.058+0000 7f904d00d700  4 rocksdb: (Original Log Time 2022/01/18-00:11:09.059566) EVENT_LOG_v1 {"time_micros": 1642464669059560, "job": 635, "event": "flush_finished", "output_compression": "NoCompression", "lsm_state": [2, 1, 0, 0, 0, 0, 0], "immutable_memtables": 0}
2022-01-18T00:11:09.058+0000 7f904d00d700  4 rocksdb: (Original Log Time 2022/01/18-00:11:09.059592) [db/db_impl/db_impl_compaction_flush.cc:235] [L] Level summary: files[2 1 0 0 0 0 0] max score 0.50

2022-01-18T00:11:09.058+0000 7f904d00d700  4 rocksdb: [db/db_impl/db_impl_files.cc:420] [JOB 635] Try to delete WAL files size 409400591, prev total WAL file size 414051513, number of live WAL files 3.

2022-01-18T00:11:09.145+0000 7f90485f6700 -1 /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: In function 'virtual void RocksDBBlueFSVolumeSelector::sub_usage(void*, const bluefs_fnode_t&)' thread 7f904d00d700 time 2022-01-18T00:11:09.085371+0000
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)

 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x561a196bab8e]
 2: /usr/bin/ceph-osd(+0x5d5daf) [0x561a196badaf]
 3: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x16a) [0x561a19d7ecca]
 4: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x735) [0x561a19e1cb45]
 5: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xa9) [0x561a19e1d009]
 6: (BlueFS::fsync(BlueFS::FileWriter*)+0x18e) [0x561a19e386de]
 7: (BlueRocksWritableFile::Sync()+0x18) [0x561a19e48fb8]
 8: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x561a1a36c74f]
 9: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x662) [0x561a1a49cf22]
 10: (rocksdb::WritableFileWriter::Sync(bool)+0xf8) [0x561a1a49e8e8]
 11: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x341) [0x561a1a383701]
 12: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1c04) [0x561a1a38b454]
 13: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x561a1a38b5a1]
 14: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x561a1a325f84]
 15: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x561a1a32698a]
 16: (BlueStore::_kv_sync_thread()+0x3530) [0x561a19d7d390]
 17: (BlueStore::KVSyncThread::entry()+0x11) [0x561a19dac0b1]
 18: /lib64/libpthread.so.0(+0x817a) [0x7f905f01817a]
 19: clone()

2022-01-18T00:11:09.146+0000 7f904d00d700 -1 /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: In function 'virtual void RocksDBBlueFSVolumeSelector::sub_usage(void*, const bluefs_fnode_t&)' thread 7f904d00d700 time 2022-01-18T00:11:09.085371+0000
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)

 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x561a196bab8e]
 2: /usr/bin/ceph-osd(+0x5d5daf) [0x561a196badaf]
 3: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x16a) [0x561a19d7ecca]
 4: (BlueFS::_drop_link_D(boost::intrusive_ptr<BlueFS::File>)+0x5fd) [0x561a19e1514d]
 5: (BlueFS::unlink(std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >)+0x706) [0x561a19e23ed6]
 6: (BlueRocksEnv::DeleteFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x47) [0x561a19e47167]
 7: (rocksdb::DeleteDBFile(rocksdb::ImmutableDBOptions const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool)+0x9d) [0x561a1a492abd]
 8: (rocksdb::DBImpl::DeleteObsoleteFileImpl(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileType, unsigned long)+0x116) [0x561a1a3a37a6]
 9: (rocksdb::DBImpl::PurgeObsoleteFiles(rocksdb::JobContext&, bool)+0x14c4) [0x561a1a3a6974]
 10: (rocksdb::DBImpl::BackgroundCallFlush(rocksdb::Env::Priority)+0x16b) [0x561a1a39dd8b]
 11: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x24a) [0x561a1a56243a]
 12: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x5d) [0x561a1a5625dd]
 13: /lib64/libstdc++.so.6(+0xc2ba3) [0x7f905e66bba3]
 14: /lib64/libpthread.so.0(+0x817a) [0x7f905f01817a]
 15: clone()

2022-01-18T00:11:29.158+0000 7f42f7cf7240  0 set uid:gid to 167:167 (ceph:ceph)
2022-01-18T00:11:29.158+0000 7f42f7cf7240  0 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev), process ceph-osd, pid 8
2022-01-18T00:11:29.158+0000 7f42f7cf7240  0 pidfile_write: ignore empty --pid-file
2022-01-18T00:11:29.160+0000 7f42f7cf7240  1 bdev(0x55e9a1703400 /var/lib/ceph/osd/ceph-4/block) open path /var/lib/ceph/osd/ceph-4/block
2022-01-18T00:11:29.161+0000 7f42f7cf7240  1 bdev(0x55e9a1703400 /var/lib/ceph/osd/ceph-4/block) open size 1999839952896 (0x1d19fc00000, 1.8 TiB) block_size 4096 (4 KiB) rotational discard not supported
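
For context, the failing check is the accounting guard in BlueFS's volume selector: when BlueFS flushes or unlinks a file it calls sub_usage() so the selector can subtract the file's extent lengths from its per-device usage counters, and the assert fires when a counter is about to underflow, i.e. the selector believes fewer bytes are in use on that device than the fnode being subtracted. Below is only a minimal sketch of that bookkeeping to illustrate the invariant; the Extent, FNode and VolumeSelectorSketch names and the per_device_usage vector are invented for the example and are not the actual BlueStore.h code.

 // Minimal illustrative sketch of the invariant behind
 // "ceph_assert(cur >= p.length)"; not the real RocksDBBlueFSVolumeSelector.
 #include <cassert>
 #include <cstdint>
 #include <vector>

 struct Extent {                       // stand-in for bluefs_extent_t
   unsigned bdev = 0;                  // device the extent lives on
   uint64_t length = 0;                // extent length in bytes
 };

 struct FNode {                        // stand-in for bluefs_fnode_t
   std::vector<Extent> extents;
 };

 struct VolumeSelectorSketch {
   std::vector<uint64_t> per_device_usage;  // bytes accounted per device

   explicit VolumeSelectorSketch(size_t ndevs) : per_device_usage(ndevs, 0) {}

   void add_usage(const FNode& f) {
     for (const auto& p : f.extents)
       per_device_usage[p.bdev] += p.length;
   }

   void sub_usage(const FNode& f) {
     for (const auto& p : f.extents) {
       uint64_t& cur = per_device_usage[p.bdev];
       // The invariant whose violation aborts the OSD in this ticket:
       // at least p.length bytes must already be accounted on this device.
       assert(cur >= p.length);
       cur -= p.length;
     }
   }
 };

In the real selector the accounting is more fine-grained, but the failure mode reported here is exactly this underflow: the add/sub sides disagree about how much space a BlueFS file occupies, so the guard trips during fsync (first backtrace) or unlink (second backtrace).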


Files

20231016_osd6.zip (157 KB) - OSD log including recent events before crash - Zakhar Kirpichenko, 10/16/2023 08:22 AM

Related issues 7 (0 open, 7 closed)

Has duplicate bluestore - Bug #53906: BlueStore.h: 4158: FAILED ceph_assert(cur >= fnode.size) (Duplicate)
Is duplicate of bluestore - Bug #63161: bluestore: FAILED ceph_assert(cur2 >= p.length) (Duplicate)
Is duplicate of bluestore - Bug #63172: BlueStore: FAILED ceph_assert(cur >= fnode.size) (Duplicate)
Has duplicate bluestore - Bug #63110: Crash in RocksDBBlueFSVolumeSelector::sub_usage via BlueFS::fsync via WriteToWAL in KVSyncThread (Duplicate)
Has duplicate bluestore - Bug #63352: ceph-osd crashed with ceph_assert(cur2 >= p.length) error message (Duplicate)
Copied to bluestore - Backport #54209: quincy: BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length) (Resolved, Adam Kupczyk)
Copied to bluestore - Backport #62928: pacific: BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length) (Resolved, Igor Fedotov)
Actions #1

Updated by Vikhyat Umrao over 2 years ago

- After hitting this crash, systemd restarted the OSD container pod, and after the restart the OSD has been running fine.

Actions #2

Updated by Vikhyat Umrao over 2 years ago

On the same cluster, OSD.95 also hit the same assert and was restarted by systemd; it has been running fine since.


2022-01-17T22:56:40.077+0000 7f03e4a29700 -1 /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: In function 'virtual void RocksDBBlueFSVolumeSelector::sub_usage(void*, const bluefs_fnode_t&)' thread 7f03e4a29700 time 2022-01-17T22:56:40.066674+0000
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)

 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x561ec5505b8e]
 2: /usr/bin/ceph-osd(+0x5d5daf) [0x561ec5505daf]
 3: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x16a) [0x561ec5bc9cca]
 4: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x735) [0x561ec5c67b45]
 5: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xa9) [0x561ec5c68009]
 6: (BlueFS::fsync(BlueFS::FileWriter*)+0x18e) [0x561ec5c836de]
 7: (BlueRocksWritableFile::Sync()+0x18) [0x561ec5c93fb8]
 8: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x561ec61b774f]
 9: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x662) [0x561ec62e7f22]
 10: (rocksdb::WritableFileWriter::Sync(bool)+0xf8) [0x561ec62e98e8]
 11: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x341) [0x561ec61ce701]
 12: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1c04) [0x561ec61d6454]
 13: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x561ec61d65a1]
 14: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x561ec6170f84]
 15: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x561ec617198a]
 16: (BlueStore::_kv_sync_thread()+0x3530) [0x561ec5bc8390]
 17: (BlueStore::KVSyncThread::entry()+0x11) [0x561ec5bf70b1]
 18: /lib64/libpthread.so.0(+0x817a) [0x7f03fb44b17a]
 19: clone()

2022-01-17T22:57:03.972+0000 7f33a0448240  0 set uid:gid to 167:167 (ceph:ceph)
2022-01-17T22:57:03.972+0000 7f33a0448240  0 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev), process ceph-osd, pid 8
2022-01-17T22:57:03.972+0000 7f33a0448240  0 pidfile_write: ignore empty --pid-file
2022-01-17T22:57:03.974+0000 7f33a0448240  1 bdev(0x563d0b97d400 /var/lib/ceph/osd/ceph-95/block) open path /var/lib/ceph/osd/ceph-95/block
2022-01-17T22:57:03.975+0000 7f33a0448240  1 bdev(0x563d0b97d400 /var/lib/ceph/osd/ceph-95/block) open size 1999839952896 (0x1d19fc00000, 1.8 TiB) block_size 4096 (4 KiB) rotational discard not supported

Actions #3

Updated by Vikhyat Umrao over 2 years ago

- OSD.0 also had a similar crash:

2022-01-17T23:40:22.136+0000 7f32423b0700 -1 /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: In function 'virtual void RocksDBBlueFSVolumeSelector::sub_usage(void*, const bluefs_fnode_t&)' thread 7f32423b0700 time 2022-01-17T23:40:22.040838+0000
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)

 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55ffca66cb8e]
 2: /usr/bin/ceph-osd(+0x5d5daf) [0x55ffca66cdaf]
 3: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x16a) [0x55ffcad30cca]
 4: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x735) [0x55ffcadceb45]
 5: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xa9) [0x55ffcadcf009]
 6: (BlueFS::fsync(BlueFS::FileWriter*)+0x18e) [0x55ffcadea6de]
 7: (BlueRocksWritableFile::Sync()+0x18) [0x55ffcadfafb8]
 8: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x55ffcb31e74f]
 9: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x662) [0x55ffcb44ef22]
 10: (rocksdb::WritableFileWriter::Sync(bool)+0xf8) [0x55ffcb4508e8]
 11: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x341) [0x55ffcb335701]
 12: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1c04) [0x55ffcb33d454]
 13: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x55ffcb33d5a1]
 14: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x55ffcb2d7f84]
 15: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x55ffcb2d898a]
 16: (BlueStore::_kv_sync_thread()+0x3530) [0x55ffcad2f390]
 17: (BlueStore::KVSyncThread::entry()+0x11) [0x55ffcad5e0b1]
 18: /lib64/libpthread.so.0(+0x817a) [0x7f3258dd217a]
 19: clone()

2022-01-17T23:40:22.230+0000 7f32423b0700 -1 *** Caught signal (Aborted) **
 in thread 7f32423b0700 thread_name:bstore_kv_sync

 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
 1: /lib64/libpthread.so.0(+0x12c20) [0x7f3258ddcc20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x55ffca66cbec]
 5: /usr/bin/ceph-osd(+0x5d5daf) [0x55ffca66cdaf]
 6: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x16a) [0x55ffcad30cca]
 7: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x735) [0x55ffcadceb45]
 8: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xa9) [0x55ffcadcf009]
 9: (BlueFS::fsync(BlueFS::FileWriter*)+0x18e) [0x55ffcadea6de]
 10: (BlueRocksWritableFile::Sync()+0x18) [0x55ffcadfafb8]
 11: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x55ffcb31e74f]
 12: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x662) [0x55ffcb44ef22]
 13: (rocksdb::WritableFileWriter::Sync(bool)+0xf8) [0x55ffcb4508e8]
 14: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x341) [0x55ffcb335701]
 15: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1c04) [0x55ffcb33d454]
 16: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x55ffcb33d5a1]
 17: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x55ffcb2d7f84]
 18: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x55ffcb2d898a]
 19: (BlueStore::_kv_sync_thread()+0x3530) [0x55ffcad2f390]
 20: (BlueStore::KVSyncThread::entry()+0x11) [0x55ffcad5e0b1]
 21: /lib64/libpthread.so.0(+0x817a) [0x7f3258dd217a]
 22: clone()

2022-01-17T23:40:42.537+0000 7fa18eaa0240  0 set uid:gid to 167:167 (ceph:ceph)
2022-01-17T23:40:42.537+0000 7fa18eaa0240  0 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev), process ceph-osd, pid 7
2022-01-17T23:40:42.537+0000 7fa18eaa0240  0 pidfile_write: ignore empty --pid-file
2022-01-17T23:40:42.555+0000 7fa18eaa0240  1 bdev(0x559239489400 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2022-01-17T23:40:42.556+0000 7fa18eaa0240  1 bdev(0x559239489400 /var/lib/ceph/osd/ceph-0/block) open size 1999839952896 (0x1d19fc00000, 1.8 TiB) block_size 4096 (4 KiB) rotational discard not supported
2022-01-17T23:40:42.556+0000 7fa18eaa0240  1 bluestore(/var/lib/ceph/osd/ceph-0) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06
2022-01-17T23:40:42.556+0000 7fa18eaa0240  1 bdev(0x559239488c00 /var/lib/ceph/osd/ceph-0/block.db) open path /var/lib/ceph/osd/ceph-0/block.db

Actions #4

Updated by Neha Ojha over 2 years ago

  • Assignee set to Adam Kupczyk
Actions #5

Updated by Neha Ojha over 2 years ago

  • Has duplicate Bug #53906: BlueStore.h: 4158: FAILED ceph_assert(cur >= fnode.size) added
Actions #6

Updated by Neha Ojha over 2 years ago

  • Priority changed from Normal to Immediate
Actions #7

Updated by Adam Kupczyk over 2 years ago

  • Pull request ID set to 44713
Actions #8

Updated by Igor Fedotov over 2 years ago

  • Status changed from New to Fix Under Review
Actions #9

Updated by Neha Ojha over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to quincy
Actions #10

Updated by Backport Bot over 2 years ago

  • Copied to Backport #54209: quincy: BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length) added
Actions #11

Updated by Backport Bot almost 2 years ago

  • Tags set to backport_processed
Actions #12

Updated by Igor Fedotov over 1 year ago

  • Status changed from Pending Backport to Resolved
Actions #13

Updated by Igor Fedotov 8 months ago

  • Copied to Backport #62928: pacific: BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length) added
Actions #14

Updated by Igor Fedotov 8 months ago

  • Status changed from Resolved to Pending Backport
Actions #15

Updated by Maximilian Stinsky 8 months ago

Hello.

We just upgraded our Ceph cluster from 16.2.13 to 16.2.14 and saw an OSD crash with the same error message that this bug report describes:

ceph-16.2.14/src/os/bluestore/BlueStore.h: 3870: FAILED ceph_assert(cur >= p.length)

The OSD comes up without any problems after that.
We are not sure whether this is related to the minor upgrade, because before 16.2.14 we already had OSDs failing every day due to another bug.

Should we create a new bug report for this ceph_assert error, or could it be related to the same issue tracked here?

Actions #16

Updated by Zakhar Kirpichenko 8 months ago

We're also affected by this bug. It seems to have been introduced in 16.2.14, as the crashes were not happening before we upgraded to that version.

Actions #17

Updated by Igor Fedotov 8 months ago

@Maximilian @Zakhar Kirpichenko - the issue will be fixed in the next Pacific minor release. The relevant backport PR is https://github.com/ceph/ceph/pull/53587

Please also see more details on how to work around this in my post at https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/

Actions #18

Updated by Igor Fedotov 8 months ago

  • Is duplicate of Bug #63161: bluestore: FAILED ceph_assert(cur2 >= p.length) added
Actions #19

Updated by Igor Fedotov 8 months ago

  • Status changed from Pending Backport to Duplicate
Actions #20

Updated by Igor Fedotov 8 months ago

  • Is duplicate of Bug #63172: BlueStore: FAILED ceph_assert(cur >= fnode.size) added
Actions #21

Updated by Igor Fedotov 8 months ago

  • Status changed from Duplicate to Pending Backport
Actions #22

Updated by jinzhi zhang 8 months ago

We also ran into the same issue after upgrading from 16.2.13 to 16.2.14:

2023-10-15T20:08:11.719196Z_d95e3d5d-3cfe-4a53-9025-daf1745f345d  osd.6    *
2023-10-15T22:27:03.691912Z_1b30c7d8-98f7-448c-a4d1-035c03ee79e7  osd.17   *
2023-10-16T01:38:59.282064Z_0ce2a23a-fe39-4aae-aceb-6cb17864f593  osd.4    *
2023-10-16T04:49:14.343657Z_912bda6e-dc46-4dbb-bc7d-697a709f626c  osd.0    *
2023-10-16T04:56:47.076272Z_bbf9634b-c2ba-40ea-bd20-9deca41c60fd  osd.10   *
2023-10-16T10:21:53.195344Z_610d7e04-a997-4562-89e5-ac86bf78f9e5  osd.23   *
2023-10-16T13:40:57.835138Z_c16c472d-9699-49b1-87d3-d7e2177c2912  osd.9    *
2023-10-16T13:40:57.840230Z_78e52ea5-396a-46ae-bb91-1cece6f3473b  osd.9    *
2023-10-16T20:00:21.100773Z_83aba303-ddec-4ffa-aa95-c5eea6466165  osd.11   *
2023-10-16T21:58:17.103333Z_2d5f5025-95f5-44a6-bbc9-ea60fa9381ee  osd.7    *
2023-10-17T01:19:32.163215Z_8b418d76-3925-4818-8de5-3f5181b930be  osd.6    *
2023-10-17T01:19:32.178271Z_ef39c70c-cdef-4f7a-90ca-54938165b5df  osd.6    *
2023-10-17T01:40:06.571108Z_800aedb2-6092-4f01-89d9-cb214a9ec6e4  osd.21   *

and the crash info is
{
    "assert_condition": "cur2 >= p.length",
    "assert_file": "/root/rpmbuild/BUILD/ceph/src/os/bluestore/BlueStore.h",
    "assert_func": "virtual void RocksDBBlueFSVolumeSelector::sub_usage(void*, const bluefs_fnode_t&)",
    "assert_line": 3875,
    "assert_msg": "/root/rpmbuild/BUILD/ceph/src/os/bluestore/BlueStore.h: In function 'virtual void RocksDBBlueFSVolumeSelector::sub_usage(void*, const bluefs_fnode_t&)' thread 7fba288fb700 time 2023-10-17T01:40:06.473145+0000\n/root/rpmbuild/BUILD/ceph/src/os/bluestore/BlueStore.h: 3875: FAILED ceph_assert(cur2 >= p.length)\n",
    "assert_thread_name": "bstore_kv_sync",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12c20) [0x7fba3d320c20]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x55fc33a4d199]",
        "ceph-osd(+0x540362) [0x55fc33a4d362]",
        "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x15e) [0x55fc340fb6ee]",
        "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x74d) [0x55fc3418e5cd]",
        "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90) [0x55fc3418ea70]",
        "(BlueFS::fsync(BlueFS::FileWriter*)+0x181) [0x55fc341aa711]",
        "(BlueRocksWritableFile::Sync()+0x18) [0x55fc341bbb98]",
        "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x55fc3468537f]",
        "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402) [0x55fc347a2e92]",
        "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x55fc347a4558]",
        "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x30b) [0x55fc34696a1b]",
        "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x2687) [0x55fc3469f747]",
        "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x55fc3469f941]",
        "(RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x55fc3463e304]",
        "(RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x55fc3463ed0a]",
        "(BlueStore::_kv_sync_thread()+0x2f78) [0x55fc340f9c38]",
        "(BlueStore::KVSyncThread::entry()+0x11) [0x55fc34121ab1]",
        "(Thread::entry_wrapper()+0x53) [0x55fc3421f4e3]",
        "/lib64/libpthread.so.0(+0x817a) [0x7fba3d31617a]",
        "clone()" 
    ],
    "ceph_version": "16.2.14-5.0.1",
    "crash_id": "2023-10-17T01:40:06.571108Z_800aedb2-6092-4f01-89d9-cb214a9ec6e4",
    "entity_name": "osd.21",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-osd",
    "stack_sig": "d4133fd81fc283e10ef4edc13fc01336f8bd22579ecab0ea9c0653bda7f3ffca",
    "timestamp": "2023-10-17T01:40:06.571108Z",
    "utsname_hostname": "storage-001",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.61-050461-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#202008260931 SMP Wed Aug 26 09:34:29 UTC 2020" 
}
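
For reference, the crash records quoted above come from the manager's crash module; on a typical deployment the list can be reproduced with "ceph crash ls" and the full JSON with "ceph crash info <crash_id>".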
Actions #23

Updated by Igor Fedotov 7 months ago

  • Has duplicate Bug #63110: Crash in RocksDBBlueFSVolumeSelector::sub_usage via BlueFS::fsync via WriteToWAL in KVSyncThread added
Actions #24

Updated by Igor Fedotov 7 months ago

  • Has duplicate Bug #63352: ceph-osd crashed with ceph_assert(cur2 >= p.length) error message added
Actions #25

Updated by Konstantin Shalygin 7 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
Actions #26

Updated by Maximilian Stinsky 2 months ago

Just wanted to confirm that since we upgraded to 16.2.15 we don't see crashing OSDs anymore.

Actions #27

Updated by Igor Fedotov 2 months ago

Great, thanks for the update!
