Bug #1789
mon: failed assert(paxosv == pg_map.version)
Description
From teuthology:~teuthworker/archive/nightly_coverage_2011-12-02-b/3603/remote/ubuntu@sepia44.ceph.dreamhost.com/log/mon.2.log.gz:
mon/PGMonitor.cc: In function 'virtual bool PGMonitor::update_from_paxos()', in thread '7f9f51567700'
mon/PGMonitor.cc: 165: FAILED assert(paxosv == pg_map.version)
ceph version 0.38-278-g0622871 (commit:06228716e345a81ee2c93055a6a6133c540fbada)
1: (PGMonitor::update_from_paxos()+0x1145) [0x514fd5]
2: (PGMonitor::tick()+0x5e) [0x5071ae]
3: (Monitor::tick()+0x65) [0x473705]
4: (C_Mon_Tick::finish(int)+0x15) [0x496eb5]
5: (SafeTimer::timer_thread()+0x4b0) [0x5fa1d0]
6: (SafeTimerThread::entry()+0x15) [0x5fe475]
7: (Thread::_entry_func(void*)+0x12) [0x582852]
8: (()+0x7971) [0x7f9f557dd971]
9: (clone()+0x6d) [0x7f9f5406c92d]
Associated revisions
mon: fix slurp_latest to fill in any missing incrementals
Fixes #1789.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
History
#1 Updated by Sage Weil over 12 years ago
- Priority changed from Normal to High
#3 Updated by Sage Weil over 12 years ago
- Status changed from New to Need More Info
Have a core, but no matching binary. Not clear from code inspection what happened.
#5 Updated by Sage Weil about 12 years ago
- Priority changed from High to Normal
#6 Updated by Sage Weil about 12 years ago
- Target version deleted (v0.40)
#7 Updated by Anonymous about 12 years ago
We only saw this once, but we believe the bug is real and want to keep it open.
#8 Updated by Matthew Roy about 12 years ago
- File last_committed added
- File first_committed added
- File latest added
- File mon.c.log.head added
Crash occurred on the third monitor when starting after being down for several hours shortly after cluster creation. It's unclear to me whether this monitor ever came up after the cluster was initially created; I suspect it might not have. This cluster later had a bunch of authorization problems.
The assert occurs at line 965 in the attached log.
#9 Updated by Matthew Roy about 12 years ago
- File core.monAssert1435.gz added
Core dump attached. Dumb thought: could this be related to http://tracker.newdream.net/issues/2110? They happened within 5 minutes of each other on this cluster, but on different servers.
#10 Updated by Greg Farnum about 12 years ago
Shouldn't be related: this is a problem with a single monitor daemon, and the other is a write that an MDS is getting kicked back from an OSD.
#11 Updated by Greg Farnum about 12 years ago
- Status changed from Need More Info to In Progress
- Assignee set to Greg Farnum
Iiiinteresting. This assert is the post-update check, after loading and running through all the incrementals. (Meaning, it passed the pre-update checks.) And it's being called from slurp(). I wonder if we missed a problem there.
#12 Updated by Greg Farnum about 12 years ago
- Status changed from In Progress to 4
Okay, figured it out. Our current slurp code pulls in all the incrementals, then sends off a request for latest_stashed. BUT, it's possible (especially with the pgmap state) that latest_stashed is newer than the incrementals we already pulled in, or possibly even discontiguous.
Which means that when we pull in the latest stashed, we will get a map without all the surrounding incrementals, AND when we update_from_paxos() we'll load that map, but not set up the rest of the Paxos state properly, and fail this assert.
Fixed it by just adding any missing incrementals to the slurp_latest response, which we should have done in the first place since there's already space for it and everything.
Did basic testing on wip-1789, appears to work.
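For readers who want to see the race in isolation, here is a minimal toy model of it. This is a sketch under loud assumptions, not the Ceph code: ToyMon, SlurpLatestReply, commit(), make_reply(), handle_reply() and run_scenario() are all hypothetical names, the map payload is a single int, and update_from_paxos() here returns the result of the post-update check instead of asserting. It only illustrates why a stashed map that is newer than the slurped incrementals trips the check, and why attaching the missing incrementals to the reply avoids that.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using version_t = std::uint64_t;

// Hypothetical stand-in for one monitor's Paxos state (not Ceph's real types).
struct ToyMon {
  std::map<version_t, int> incrementals;  // committed version -> delta
  version_t last_committed = 0;           // plays the role of the Paxos version
  version_t stashed_version = 0;          // version of the full "latest" map, 0 if none
  int stashed_value = 0;                  // contents of the full map
  version_t map_version = 0;              // in-memory map version
  int map_value = 0;                      // in-memory map contents
};

// Hypothetical reply to a slurp_latest request.
struct SlurpLatestReply {
  version_t stashed_version = 0;
  int stashed_value = 0;
  std::map<version_t, int> incrementals;  // room for incrementals in the reply
};

// Leader: commit one incremental and re-stash the full map.
void commit(ToyMon& m, int delta) {
  ++m.last_committed;
  m.incrementals[m.last_committed] = delta;
  m.map_version = m.last_committed;
  m.map_value += delta;
  m.stashed_version = m.map_version;
  m.stashed_value = m.map_value;
}

// Peer, first phase: copy whatever incrementals the leader has right now.
void slurp_incrementals(ToyMon& peer, const ToyMon& leader) {
  for (auto it = leader.incrementals.upper_bound(peer.last_committed);
       it != leader.incrementals.end(); ++it) {
    peer.incrementals[it->first] = it->second;
    peer.last_committed = it->first;
  }
}

// Leader: build the slurp_latest reply. With include_missing set, attach
// every incremental newer than the peer's last_committed.
SlurpLatestReply make_reply(const ToyMon& leader, version_t peer_last_committed,
                            bool include_missing) {
  SlurpLatestReply r;
  r.stashed_version = leader.stashed_version;
  r.stashed_value = leader.stashed_value;
  if (include_missing) {
    r.incrementals.insert(leader.incrementals.upper_bound(peer_last_committed),
                          leader.incrementals.end());
  }
  return r;
}

// Peer: store the stashed map and any incrementals carried by the reply.
void handle_reply(ToyMon& peer, const SlurpLatestReply& r) {
  peer.stashed_version = r.stashed_version;
  peer.stashed_value = r.stashed_value;
  for (const auto& kv : r.incrementals) {
    assert(kv.first == peer.last_committed + 1);  // reply must be contiguous
    peer.incrementals[kv.first] = kv.second;
    peer.last_committed = kv.first;
  }
}

// Toy version of the update path: load the stashed map if it is newer,
// replay incrementals, then report whether paxosv == map version holds.
bool update_from_paxos(ToyMon& m) {
  const version_t paxosv = m.last_committed;
  if (m.stashed_version > m.map_version) {
    m.map_version = m.stashed_version;
    m.map_value = m.stashed_value;
  }
  while (paxosv > m.map_version) {
    auto it = m.incrementals.find(m.map_version + 1);
    if (it == m.incrementals.end()) return false;  // gap in incrementals
    m.map_value += it->second;
    ++m.map_version;
  }
  return paxosv == m.map_version;
}

// Leader commits 1..3, peer slurps them, leader commits 4..5, peer asks for latest.
bool run_scenario(bool include_missing) {
  ToyMon leader, peer;
  for (int d = 1; d <= 3; ++d) commit(leader, d);
  slurp_incrementals(peer, leader);
  for (int d = 4; d <= 5; ++d) commit(leader, d);
  handle_reply(peer, make_reply(leader, peer.last_committed, include_missing));
  return update_from_paxos(peer);
}
```

In this model, run_scenario(false) leaves the peer with a stashed map at version 5 but a Paxos version of 3, so the check fails; run_scenario(true) brings both to 5.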
#13 Updated by Greg Farnum about 12 years ago
- Status changed from 4 to Resolved
Pushed to master in d10e1f46df8cc252f2f1d57cf5e577ea38eee1ae