Bug #1796

mds: exit cleanly on EBLACKLISTED

Added by Sage Weil over 12 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%


Description

2011-12-07 13:21:30.148211 7fb94c7af700 mds.0.37 beacon_kill last_acked_stamp 2011-12-07 13:21:10.148107, we are laggy!
2011-12-07 13:21:37.798034 7fb94e0b3700 mds.0.journaler(ro) _finish_read got error -108
2011-12-07 13:21:37.798091 7fb94e0b3700 mds.0.journaler(ro) _finish_read got error -108
2011-12-07 13:21:37.798117 7fb94e0b3700 mds.0.journaler(ro) _finish_read got error -108
2011-12-07 13:21:37.798172 7fb94bbaa700 mds.0.log _replay journaler got error -108, aborting
2011-12-07 13:21:37.798197 7fb94bbaa700 mds.0.37 replay_done
2011-12-07 13:21:37.798209 7fb94bbaa700 mds.0.37 making mds journal writeable
osdc/Journaler.cc: In function 'void Journaler::_prezeroed(int, uint64_t, uint64_t)', in thread '7fb94e0b3700'
osdc/Journaler.cc: 662: FAILED assert(r == 0 || r == -2)
 ceph version 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83)
 1: (Journaler::_prezeroed(int, unsigned long, unsigned long)+0x38c) [0x6f641c]
 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x89c) [0x6d940c]
 3: (MDS::handle_core_message(Message*)+0x9af) [0x4d846f]
 4: (MDS::_dispatch(Message*)+0x157) [0x4d86f7]
 5: (MDS::ms_dispatch(Message*)+0x86) [0x4d9cd6]
 6: (SimpleMessenger::dispatch_entry()+0x8a3) [0x774783]
 7: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4b417c]
 8: (()+0x6d8c) [0x7fb951b76d8c]
 9: (clone()+0x6d) [0x7fb9503b704d]
*** Caught signal (Aborted) **
 in thread 7fb94e0b3700
 ceph version 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83)
 1: /usr/bin/ceph-mds() [0x7e22b2]
 2: (()+0xfc60) [0x7fb951b7fc60]
 3: (gsignal()+0x35) [0x7fb950304d05]
 4: (abort()+0x186) [0x7fb950308ab6]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb950bbb6dd]
 6: (()+0xb9926) [0x7fb950bb9926]
 7: (()+0xb9953) [0x7fb950bb9953]
 8: (()+0xb9a5e) [0x7fb950bb9a5e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x396) [0x722856]
 10: (Journaler::_prezeroed(int, unsigned long, unsigned long)+0x38c) [0x6f641c]
 11: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x89c) [0x6d940c]
 12: (MDS::handle_core_message(Message*)+0x9af) [0x4d846f]
 13: (MDS::_dispatch(Message*)+0x157) [0x4d86f7]
 14: (MDS::ms_dispatch(Message*)+0x86) [0x4d9cd6]
 15: (SimpleMessenger::dispatch_entry()+0x8a3) [0x774783]
 16: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4b417c]
 17: (()+0x6d8c) [0x7fb951b76d8c]
 18: (clone()+0x6d) [0x7fb9503b704d]

Associated revisions

Revision 195301ef (diff)
Added by Sage Weil about 12 years ago

mds: respawn when blacklisted

If we are blacklisted by the OSD cluster, it's because we were too slow
and were replaced by another ceph-mds. Respawn and re-register as a
standby.

If we get some other write error, shut down.

Fixes: #1796
Signed-off-by: Sage Weil <>
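
The policy in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from revision 195301ef: `WriteErrorAction`, `kEBlacklisted`, `classify_write_error`, and `describe` are hypothetical names, and the sketch assumes that EBLACKLISTED is an alias for ESHUTDOWN (108 on Linux), which is consistent with the "got error -108" lines in the log above.

```cpp
#include <cerrno>
#include <string>

// Hypothetical names for illustration only; not the actual Ceph MDS API.
enum class WriteErrorAction { None, Respawn, Shutdown };

// Assumption: EBLACKLISTED aliases ESHUTDOWN (108 on Linux), matching the
// "-108" errors reported by the journaler in the log.
constexpr int kEBlacklisted = ESHUTDOWN;

// Map the result of a journal write (0 or a negative errno) to the action
// described in the commit message.
inline WriteErrorAction classify_write_error(int r) {
  if (r == 0) {
    return WriteErrorAction::None;      // write succeeded, nothing to do
  }
  if (r == -kEBlacklisted) {
    // Blacklisted by the OSD cluster: we were too slow and got replaced,
    // so restart and re-register as a standby.
    return WriteErrorAction::Respawn;
  }
  // Any other write error: shut down rather than hit an assert.
  return WriteErrorAction::Shutdown;
}

// Human-readable label for logging.
inline std::string describe(WriteErrorAction a) {
  switch (a) {
    case WriteErrorAction::None:     return "ok";
    case WriteErrorAction::Respawn:  return "respawn as standby";
    case WriteErrorAction::Shutdown: return "shut down";
  }
  return "unknown";
}
```

Under these assumptions, a callback such as `Journaler::_prezeroed` would check the result before asserting, so a return value of -108 leads to a respawn instead of the failed `assert(r == 0 || r == -2)` shown in the description.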

History

#1 Updated by Sage Weil over 12 years ago

  • Position set to 1042

#2 Updated by Sage Weil about 12 years ago

  • Target version deleted (v0.40)
  • Position deleted (1066)
  • Position set to 125

#3 Updated by Sage Weil about 12 years ago

  • Status changed from New to 4
  • Assignee set to Sage Weil
  • Target version set to v0.44

People hit this, and it's confusing when ceph-mds crashes...

wip-1796

#4 Updated by Sage Weil about 12 years ago

  • Status changed from 4 to Fix Under Review

#5 Updated by Sage Weil about 12 years ago

  • Status changed from Fix Under Review to Resolved

#6 Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
  • Target version deleted (v0.44)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
