Bug #3252


mon: mixed mon cluster of argonaut + master eats cpu on new code

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status:
Resolved
Priority:
High
Category:
Monitor
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
argonaut
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

3-mon cluster, 2 running argonaut, one upgraded to the current 'next'. The new one eats CPU and spams the log with:

2012-10-02 15:29:50.359636 7f1ad0159700  1 mon.burnupi03@0(electing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2012-10-02 15:29:50.359698 7f1ad0159700  1 mon.burnupi03@0(electing) e1 discarding message auth(proto 0 30 bytes epoch 0) v1 and sending client elsewhere; we are not in quorum
2012-10-02 15:29:50.359786 7f1ad0159700  1 mon.burnupi03@0(electing) e1 discarding message auth(proto 0 30 bytes epoch 0) v1 and sending client elsewhere; we are not in quorum
2012-10-02 15:29:50.359873 7f1ad0159700  1 mon.burnupi03@0(electing) e1 discarding message auth(proto 0 30 bytes epoch 0) v1 and sending client elsewhere; we are not in quorum
2012-10-02 15:29:55.360164 7f1ad0159700  1 mon.burnupi03@0(electing) e1 discarding message auth(proto 0 28 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2012-10-02 15:29:55.360311 7f1ad0159700  1 mon.burnupi03@0(electing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2012-10-02 15:29:55.360379 7f1ad0159700  1 mon.burnupi03@0(electing) e1 discarding message auth(proto 0 30 bytes epoch 0) v1 and sending client elsewhere; we are not in quorum

Actions #1

Updated by Joao Eduardo Luis over 11 years ago

This has not been reproducible while running all three monitors on the same machine with the default configuration generated by vstart.sh.

So far, the approach taken to reproduce this bug has been as follows:

  • create a brand new cluster with vstart.sh using argonaut
  • kill a monitor (say, mon.c)
  • run mon.c using the same store and ceph.conf as before, but using a ceph-mon built from 'master'
  • nothing happens
  • repeat all the same steps, varying the monitor id (a through c)
  • still nothing happens

We are now going to try with the same conf file as seen on the burnupis, and on different machines.

Actions #2

Updated by Joao Eduardo Luis over 11 years ago

Was finally able to reproduce it. The trick mainly relies on not specifying the cluster's monmap during --mkfs.

Steps to reproduce, assuming three different machines with one monitor per machine (mon.a is the leader, the others are peons); also assuming 'ceph-mon' is the argonaut version (f031046c4d600ce0a0c2c9ad5856dc100b0901fb) and 'ceph-mon.master' is the 'master' branch version (commit:9078087aba8a174b897312ec9beda7f420a896d4).

  1. for i in a b c; do ./ceph-mon --mkfs -c ceph.conf -i $i ; done
  2. ./ceph-mon -i a -c ceph.conf
  3. ./ceph-mon -i b -c ceph.conf
  4. ./ceph-mon.master -i c -c ceph.conf

Everything should be running smoothly at this point.

  1. kill mon.c
  2. ./ceph-mon.master -i c -c ceph.conf

We should now see that mon.c is stuck in an infinite loop of calling elections:

2012-10-03 07:27:12.503701 7f94ba7fc700  0 log [INF] : mon.plana84 calling new monitor election
2012-10-03 07:27:12.503717 7f94ba7fc700  5 mon.plana84@-1(electing).elector(221) start -- can i be leader?
2012-10-03 07:27:12.503803 7f94ba7fc700  5 mon.plana84@-1(electing).elector(221) handle_propose from mon.1
2012-10-03 07:27:17.503891 7f94b9ffb700  5 mon.plana84@-1(electing).elector(221) election timer expired
2012-10-03 07:27:17.503898 7f94b9ffb700  5 mon.plana84@-1(electing).elector(221) start -- can i be leader?
2012-10-03 07:27:17.753671 7f94ba7fc700  5 mon.plana84@-1(electing).elector(221) handle_propose from mon.0
2012-10-03 07:27:17.753679 7f94ba7fc700 10 mon.plana84@-1(electing).elector(221) bump_epoch 221 to 223
2012-10-03 07:27:17.787529 7f94ba7fc700 10 mon.plana84@-1(electing) e1 reset
2012-10-03 07:27:17.787543 7f94ba7fc700 10 mon.plana84@-1(electing) e1 start_election
2012-10-03 07:27:17.787545 7f94ba7fc700 10 mon.plana84@-1(electing) e1 cancel_probe_timeout (none scheduled)

Meanwhile, mon.a and mon.b also keep calling frequent elections, apparently triggered by mon.c, although mon.a always ends up elected as leader and mon.b as peon:

2012-10-03 07:32:20.245086 7f13bce42700 10 mon.plana61@0(leader) e1 resend_routed_requests
2012-10-03 07:32:20.245092 7f13bce42700 10 mon.plana61@0(leader) e1 register_cluster_logger - already registered
2012-10-03 07:32:20.245281 7f13bd643700  5 mon.plana61@0(leader).elector(310) handle_propose from mon.2
2012-10-03 07:32:20.245287 7f13bd643700  5 mon.plana61@0(leader).elector(310)  got propose from old epoch, quorum is 0,1, mon.2 must have just started
2012-10-03 07:32:20.245292 7f13bd643700 10 mon.plana61@0(leader) e1 reset
2012-10-03 07:32:20.245304 7f13bd643700 10 mon.plana61@0(leader) e1 start_election
2012-10-03 07:32:20.245308 7f13bd643700 10 mon.plana61@0(leader) e1 cancel_probe_timeout (none scheduled)
2012-10-03 07:32:20.245317 7f13bd643700  0 log [INF] : mon.plana61 calling new monitor election
2012-10-03 07:32:20.245349 7f13bd643700  5 mon.plana61@0(electing).elector(310) start -- can i be leader?
2012-10-03 07:32:20.245354 7f13bd643700 10 mon.plana61@0(electing).elector(310) bump_epoch 310 to 311
2012-10-03 07:32:20.287268 7f13bd643700 10 mon.plana61@0(electing) e1 received forwarded message from mon.1 10.214.132.37:6789/0 via mon.1 10.214.132.37:6789/0
2012-10-03 07:32:20.287276 7f13bd643700 10 mon.plana61@0(electing) e1  mesg 0x7f139403ea80 from 10.214.132.37:6789/0
2012-10-03 07:32:20.287313 7f13bd643700  5 mon.plana61@0(electing).elector(311) handle_propose from mon.2
2012-10-03 07:32:20.287344 7f13bd643700 10 mon.plana61@0(electing) e1 received forwarded message from mon.1 10.214.132.37:6789/0 via mon.1 10.214.132.37:6789/0
2012-10-03 07:32:20.287352 7f13bd643700 10 mon.plana61@0(electing) e1  mesg 0x7f1394024bf0 from 10.214.132.37:6789/0
2012-10-03 07:32:20.287381 7f13bd643700 10 mon.plana61@0(electing) e1 received forwarded message from mon.1 10.214.132.37:6789/0 via mon.1 10.214.132.37:6789/0
2012-10-03 07:32:20.287394 7f13bd643700 10 mon.plana61@0(electing) e1  mesg 0x7f139402f8b0 from 10.214.132.37:6789/0
2012-10-03 07:32:20.287436 7f13bd643700 10 mon.plana61@0(electing) e1 received forwarded message from mon.1 10.214.132.37:6789/0 via mon.1 10.214.132.37:6789/0
2012-10-03 07:32:20.287450 7f13bd643700 10 mon.plana61@0(electing) e1  mesg 0x7f1394043c60 from 10.214.132.37:6789/0
2012-10-03 07:32:20.287506 7f13bd643700  5 mon.plana61@0(electing).elector(311) handle_propose from mon.1
2012-10-03 07:32:20.288079 7f13bd643700  5 mon.plana61@0(electing).elector(311) handle_ack from mon.1
2012-10-03 07:32:20.288084 7f13bd643700  5 mon.plana61@0(electing).elector(311)  so far i have {0=327679,1=327679}
2012-10-03 07:32:25.150468 7f13bd643700  5 mon.plana61@0(electing).elector(311) handle_propose from mon.2
2012-10-03 07:32:25.287398 7f13bce42700  5 mon.plana61@0(electing).elector(311) election timer expired
2012-10-03 07:32:25.287414 7f13bce42700 10 mon.plana61@0(electing).elector(311) bump_epoch 311 to 312
2012-10-03 07:32:25.320862 7f13bce42700 10 mon.plana61@0(leader) e1 win_election, epoch 312 quorum is 0,1 features are 327679
2012-10-03 07:32:25.320877 7f13bce42700  0 log [INF] : mon.plana61@0 won leader election with quorum 0,1

This is constantly repeated.

Actions #3

Updated by Joao Eduardo Luis over 11 years ago

Not sure how we reach this state, but with the debug levels cranked up, the leader (say, mon.a) shows the following whenever it receives an election message from mon.c:


2012-10-03 07:39:29.171307 7fcaa5ffb700  1 -- 10.214.132.17:6789/0 <== mon.2 10.214.133.32:6789/0 77 ==== election(ca6b70a6-4cf9-4ec0-81bb-ad54cde3cfa0 propose 387) v3 ==== 527+0+0 (1557536000 0 0) 0x7fca9000d8b0 con 0x7fca980019a0
2012-10-03 07:39:29.171331 7fcaa5ffb700 20 mon.plana61@0(electing) e1 have connection
2012-10-03 07:39:29.171331 7fcaac290700 10 -- 10.214.132.17:6789/0 >> 10.214.133.32:6789/0 pipe(0x7fca980015f0 sd=17 pgs=24346007 cs=1 l=0).write_ack 77
2012-10-03 07:39:29.171335 7fcaa5ffb700 20 mon.plana61@0(electing) e1 ms_dispatch existing session MonSession: mon.2 10.214.133.32:6789/0 is openallow * for mon.2 10.214.133.32:6789/0
2012-10-03 07:39:29.171344 7fcaa5ffb700 20 mon.plana61@0(electing) e1  caps allow *
2012-10-03 07:39:29.171347 7fcaa5ffb700  1 mon.plana61@0(electing).elector(387) discarding election message: 10.214.133.32:6789/0 not in my monmap e1: 3 mons at {plana41=10.214.132.37:6789/0,plana61=10.214.132.17:6789/0,plana84=0.0.0.0:0/1}

Looks like mon.c (in this case plana84) is not in the monmap of the leader (plana61). The same happens on mon.b (plana41). Neither displays the correct IP for mon.c (plana84) when queried through the admin socket, showing simply 0.0.0.0.

Restarting mon.c (plana84) after some time led to it slurping and then falling into the same behavior, trying to get elected without success. Curiously enough, its port changed from 6789 to 6800 after the restart. The other monitors acknowledged its existence, but somehow it was never aware of the end of an election.

As seen from the leader (plana61):


2012-10-03 07:46:18.770288 7ff0a3fff700  1 -- 10.214.132.17:6789/0 <== mon.2 10.214.133.32:6800/0 86 ==== election(ca6b70a6-4cf9-4ec0-81bb-ad54cde3cfa0 propose 455) v3 ==== 527+0+0 (2667890729 0 0) 0x7ff088009b30 con 0x7ff0b00031b0
2012-10-03 07:46:18.770333 7ff0a3fff700 20 mon.plana61@0(electing) e2 have connection
2012-10-03 07:46:18.770328 7ff0a2dfb700 10 -- 10.214.132.17:6789/0 >> 10.214.133.32:6800/0 pipe(0x7ff0b0002d90 sd=18 pgs=1 cs=1 l=0).write_ack 86
2012-10-03 07:46:18.770338 7ff0a3fff700 20 mon.plana61@0(electing) e2 ms_dispatch existing session MonSession: mon.2 10.214.133.32:6800/0 is openallow * for mon.2 10.214.133.32:6800/0
2012-10-03 07:46:18.770347 7ff0a3fff700 20 mon.plana61@0(electing) e2  caps allow *
2012-10-03 07:46:18.770360 7ff0a3fff700  5 mon.plana61@0(electing).elector(455) handle_propose from mon.2
2012-10-03 07:46:18.770357 7ff0a2dfb700 10 -- 10.214.132.17:6789/0 >> 10.214.133.32:6800/0 pipe(0x7ff0b0002d90 sd=18 pgs=1 cs=1 l=0).writer: state = 2 policy.server=0
2012-10-03 07:46:18.770364 7ff0a3fff700 10 -- 10.214.132.17:6789/0 dispatch_throttle_release 527 to dispatch throttler 527/104857600
2012-10-03 07:46:18.870168 7ff0a37fe700  5 mon.plana61@0(electing).elector(455) election timer expired
2012-10-03 07:46:18.870185 7ff0a37fe700 10 mon.plana61@0(electing).elector(455) bump_epoch 455 to 456
2012-10-03 07:46:18.870191 7ff0a37fe700 15 store(/home/ubuntu/cluster/mon.plana61) set_int election_epoch = 456
2012-10-03 07:46:18.903639 7ff0a37fe700  1 -- 10.214.132.17:6789/0 --> mon.1 10.214.132.37:6789/0 -- election(ca6b70a6-4cf9-4ec0-81bb-ad54cde3cfa0 victory 456) v2 -- ?+0 0x7ff0840283a0
2012-10-03 07:46:18.903675 7ff0a37fe700 10 mon.plana61@0(leader) e2 win_election, epoch 456 quorum is 0,1 features are 327679
2012-10-03 07:46:18.903689 7ff0a37fe700  0 log [INF] : mon.plana61@0 won leader election with quorum 0,1

After yet another restart of plana84's monitor, the following happened:


2012-10-03 07:46:20.943444 7fa7a37fe700 10 mon.plana84@-1(probing) e1  monmap is e1: 3 mons at {plana41=10.214.132.37:6789/0,plana61=10.214.132.17:6789/0,plana84=0.0.0.0:0/1}
2012-10-03 07:46:20.943475 7fa7a37fe700 10 mon.plana84@-1(probing) e1  got newer/committed monmap epoch 2, mine was 1
2012-10-03 07:46:20.943487 7fa7a37fe700 10 mon.plana84@-1(probing) e2 bootstrap
2012-10-03 07:46:20.943489 7fa7a37fe700 10 mon.plana84@-1(probing) e2 unregister_cluster_logger - not registered
2012-10-03 07:46:20.943491 7fa7a37fe700 10 mon.plana84@-1(probing) e2 cancel_probe_timeout 0x192b750
2012-10-03 07:46:20.943494 7fa7a37fe700 10 mon.plana84@-1(probing) e2 reset
2012-10-03 07:46:20.943498 7fa7a37fe700 10 mon.plana84@-1(probing) e2 cancel_probe_timeout (none scheduled)
2012-10-03 07:46:20.943507 7fa7a37fe700 10 mon.plana84@-1(probing) e2 reset_probe_timeout 0x7fa77c001880 after 2 seconds
2012-10-03 07:46:20.943519 7fa7a37fe700 10 mon.plana84@-1(probing) e2 probing other monitors
2012-10-03 07:46:20.943684 7fa7a37fe700 10 mon.plana84@-1(probing) e2 do not have session, making new one
2012-10-03 07:46:20.943693 7fa7a37fe700 10 mon.plana84@-1(probing) e2 ms_dispatch new session MonSession: mon.1 10.214.132.37:6789/0 is open for mon.1 10.214.132.37:6789/0
2012-10-03 07:46:20.943706 7fa7a37fe700  5 mon.plana84@-1(probing) e2 setting monitor caps on this connection
2012-10-03 07:46:20.943710 7fa7a37fe700 10 mon.plana84@-1(probing) e2 handle_probe mon_probe(reply ca6b70a6-4cf9-4ec0-81bb-ad54cde3cfa0 name plana41 quorum 0,1 versions {auth=6,logm=103,mdsmap=1,monmap=2,osdmap=1,pgmap=2}) v2
2012-10-03 07:46:20.943724 7fa7a37fe700 10 mon.plana84@-1(probing) e2 handle_probe_reply mon.1 10.214.132.37:6789/0mon_probe(reply ca6b70a6-4cf9-4ec0-81bb-ad54cde3cfa0 name plana41 quorum 0,1 versions {auth=6,logm=103,mdsmap=1,monmap=2,osdmap=1,pgmap=2}) v2
2012-10-03 07:46:20.943741 7fa7a37fe700 10 mon.plana84@-1(probing) e2  monmap is e2: 3 mons at {plana41=10.214.132.37:6789/0,plana61=10.214.132.17:6789/0,plana84=10.214.133.32:6800/0}
2012-10-03 07:46:20.943771 7fa7a37fe700 10 mon.plana84@-1(probing) e2  peer name is plana41
2012-10-03 07:46:20.943778 7fa7a37fe700 10 mon.plana84@-1(probing) e2  existing quorum 0,1
2012-10-03 07:46:20.943783 7fa7a37fe700 10 mon.plana84@-1(probing) e2  peer paxos machine auth v 6 vs my v 6 (ok)
2012-10-03 07:46:20.943787 7fa7a37fe700 10 mon.plana84@-1(probing) e2  peer paxos machine logm v 103 vs my v 91 (too far ahead)
2012-10-03 07:46:20.943796 7fa7a37fe700 10 mon.plana84@-1(probing) e2  peer paxos machine mdsmap v 1 vs my v 1 (ok)
2012-10-03 07:46:20.943802 7fa7a37fe700 10 mon.plana84@-1(probing) e2  peer paxos machine monmap v 2 vs my v 1 (ok)
2012-10-03 07:46:20.943806 7fa7a37fe700 10 mon.plana84@-1(probing) e2  peer paxos machine osdmap v 1 vs my v 1 (ok)
2012-10-03 07:46:20.943810 7fa7a37fe700 10 mon.plana84@-1(probing) e2  peer paxos machine pgmap v 2 vs my v 2 (ok)
2012-10-03 07:46:20.943818 7fa7a37fe700 10 mon.plana84@-1(probing) e2 slurp mon.1 10.214.132.37:6789/0 {auth=6,logm=103,mdsmap=1,monmap=2,osdmap=1,pgmap=2}
2012-10-03 07:46:20.943834 7fa7a37fe700 10 mon.plana84@-1(probing) e2 cancel_probe_timeout 0x7fa77c001880
2012-10-03 07:46:20.943849 7fa7a37fe700 10 mon.plana84@-1(probing) e2 reset_probe_timeout 0x7fa77c001880 after 2 seconds

Followed by a slurp, and then


2012-10-03 07:46:21.433194 7fa7a37fe700 10 mon.plana84@2(probing) e2 handle_probe_reply mon.0 10.214.132.17:6789/0mon_probe(reply ca6b70a6-4cf9-4ec0-81bb-ad54cde3cfa0 name plana61 quorum 0,1 versions {auth=6,logm=103,mdsmap=1,monmap=2,osdmap=1,pgmap=2}) v2
2012-10-03 07:46:21.433203 7fa7a37fe700 10 mon.plana84@2(probing) e2  monmap is e2: 3 mons at {plana41=10.214.132.37:6789/0,plana61=10.214.132.17:6789/0,plana84=0.0.0.0:6800/0}
2012-10-03 07:46:21.433226 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer name is plana61
2012-10-03 07:46:21.433230 7fa7a37fe700 10 mon.plana84@2(probing) e2  existing quorum 0,1
2012-10-03 07:46:21.433233 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer paxos machine auth v 6 vs my v 6 (ok)
2012-10-03 07:46:21.433236 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer paxos machine logm v 103 vs my v 103 (ok)
2012-10-03 07:46:21.433238 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer paxos machine mdsmap v 1 vs my v 1 (ok)
2012-10-03 07:46:21.433240 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer paxos machine monmap v 2 vs my v 2 (ok)
2012-10-03 07:46:21.433242 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer paxos machine osdmap v 1 vs my v 1 (ok)
2012-10-03 07:46:21.433244 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer paxos machine pgmap v 2 vs my v 2 (ok)
2012-10-03 07:46:21.433247 7fa7a37fe700 10 mon.plana84@2(probing) e2  ready to join, but i'm not in the monmap or my addr is blank, trying to join
2012-10-03 07:46:21.433552 7fa7a37fe700 10 mon.plana84@2(probing) e2 do not have session, making new one
2012-10-03 07:46:21.433558 7fa7a37fe700 10 mon.plana84@2(probing) e2 ms_dispatch new session MonSession: mon.1 10.214.132.37:6789/0 is open for mon.1 10.214.132.37:6789/0
2012-10-03 07:46:21.433570 7fa7a37fe700  5 mon.plana84@2(probing) e2 setting monitor caps on this connection
2012-10-03 07:46:21.433575 7fa7a37fe700 10 mon.plana84@2(probing) e2 handle_probe mon_probe(reply ca6b70a6-4cf9-4ec0-81bb-ad54cde3cfa0 name plana41 quorum 0,1 versions {auth=6,logm=103,mdsmap=1,monmap=2,osdmap=1,pgmap=2}) v2
2012-10-03 07:46:21.433588 7fa7a37fe700 10 mon.plana84@2(probing) e2 handle_probe_reply mon.1 10.214.132.37:6789/0mon_probe(reply ca6b70a6-4cf9-4ec0-81bb-ad54cde3cfa0 name plana41 quorum 0,1 versions {auth=6,logm=103,mdsmap=1,monmap=2,osdmap=1,pgmap=2}) v2
2012-10-03 07:46:21.433605 7fa7a37fe700 10 mon.plana84@2(probing) e2  monmap is e2: 3 mons at {plana41=10.214.132.37:6789/0,plana61=10.214.132.17:6789/0,plana84=0.0.0.0:6800/0}
2012-10-03 07:46:21.433637 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer name is plana41
2012-10-03 07:46:21.433643 7fa7a37fe700 10 mon.plana84@2(probing) e2  existing quorum 0,1
2012-10-03 07:46:21.433648 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer paxos machine auth v 6 vs my v 6 (ok)
2012-10-03 07:46:21.433652 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer paxos machine logm v 103 vs my v 103 (ok)
2012-10-03 07:46:21.433656 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer paxos machine mdsmap v 1 vs my v 1 (ok)
2012-10-03 07:46:21.433660 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer paxos machine monmap v 2 vs my v 2 (ok)
2012-10-03 07:46:21.433664 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer paxos machine osdmap v 1 vs my v 1 (ok)
2012-10-03 07:46:21.433668 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer paxos machine pgmap v 2 vs my v 2 (ok)
2012-10-03 07:46:21.433672 7fa7a37fe700 10 mon.plana84@2(probing) e2  ready to join, but i'm not in the monmap or my addr is blank, trying to join
2012-10-03 07:46:23.431082 7fa7a2ffd700  4 mon.plana84@2(probing) e2 probe_timeout 0x7fa77c001880
2012-10-03 07:46:23.431098 7fa7a2ffd700 10 mon.plana84@2(probing) e2 bootstrap
2012-10-03 07:46:23.431102 7fa7a2ffd700 10 mon.plana84@2(probing) e2 unregister_cluster_logger - not registered
2012-10-03 07:46:23.431104 7fa7a2ffd700 10 mon.plana84@2(probing) e2 cancel_probe_timeout (none scheduled)
2012-10-03 07:46:23.431106 7fa7a2ffd700 10 mon.plana84@2(probing) e2 reset
2012-10-03 07:46:23.431110 7fa7a2ffd700 10 mon.plana84@2(probing) e2 cancel_probe_timeout (none scheduled)
2012-10-03 07:46:23.431114 7fa7a2ffd700 10 mon.plana84@2(probing) e2 reset_probe_timeout 0x7fa7740014a0 after 2 seconds
2012-10-03 07:46:23.431129 7fa7a2ffd700 10 mon.plana84@2(probing) e2 probing other monitors

This was repeated for a long, long time, until at some point we killed the leader (plana61) and plana84 was able to rejoin the quorum:


2012-10-03 07:48:17.442397 7fa7a37fe700 10 mon.plana84@2(probing) e2 handle_probe_reply mon.1 10.214.132.37:6789/0mon_probe(reply ca6b70a6-4cf9-4ec0-81bb-ad54cde3cfa0 name plana41 versions {auth=6,logm=103,mdsmap=1,monmap=2,osdmap=1,pgmap=2}) v2
2012-10-03 07:48:17.442408 7fa7a37fe700 10 mon.plana84@2(probing) e2  monmap is e2: 3 mons at {plana41=10.214.132.37:6789/0,plana61=10.214.132.17:6789/0,plana84=0.0.0.0:6800/0}
2012-10-03 07:48:17.442430 7fa7a37fe700 10 mon.plana84@2(probing) e2  peer name is plana41
2012-10-03 07:48:17.442434 7fa7a37fe700 10 mon.plana84@2(probing) e2  outside_quorum now plana41,plana84, need 2
2012-10-03 07:48:17.442436 7fa7a37fe700 10 mon.plana84@2(probing) e2  that's enough to form a new quorum, calling election
2012-10-03 07:48:17.442438 7fa7a37fe700 10 mon.plana84@2(probing) e2 start_election
2012-10-03 07:48:17.442440 7fa7a37fe700 10 mon.plana84@2(probing) e2 cancel_probe_timeout 0x7fa774000e70
2012-10-03 07:48:17.442475 7fa7a37fe700  0 log [INF] : mon.plana84 calling new monitor election
2012-10-03 07:48:17.442501 7fa7a37fe700  5 mon.plana84@2(electing).elector(0) start -- can i be leader?
2012-10-03 07:48:17.442504 7fa7a37fe700 10 mon.plana84@2(electing).elector(0) bump_epoch 0 to 1
2012-10-03 07:48:17.472064 7fa7a37fe700 10 mon.plana84@2(electing) e2 do not have session, making new one
2012-10-03 07:48:17.472068 7fa7a37fe700 10 mon.plana84@2(electing) e2 ms_dispatch new session MonSession: mon.2 0.0.0.0:6800/0 is open for mon.2 0.0.0.0:6800/0
2012-10-03 07:48:17.472075 7fa7a37fe700  5 mon.plana84@2(electing) e2 setting monitor caps on this connection
2012-10-03 07:48:22.158403 7fa7a37fe700  5 mon.plana84@2(electing).elector(1) handle_propose from mon.1
2012-10-03 07:48:22.158413 7fa7a37fe700 10 mon.plana84@2(electing).elector(1) bump_epoch 1 to 457
2012-10-03 07:48:22.197194 7fa7a37fe700  5 mon.plana84@2(electing).elector(457) defer to 1
2012-10-03 07:48:27.200153 7fa7a37fe700  5 mon.plana84@2(electing).elector(457) handle_victory from mon.1 quorum_features 0
2012-10-03 07:48:27.200158 7fa7a37fe700 10 mon.plana84@2(electing).elector(457) bump_epoch 457 to 458
2012-10-03 07:48:27.255911 7fa7a37fe700 10 mon.plana84@2(peon) e2 lose_election, epoch 458 leader is mon1 quorum is 1,2 features are 0
Actions #4

Updated by Sage Weil over 11 years ago

It could be that in the burnupi environment one of the clients only has one of the mon addresses in its config, and is consequently in a tight loop trying to reconnect. That might explain what is going on. Look on burnupi03 to see which IP is connecting, and see what happens when you do something like 'ceph health --debug-ms 1 --debug-monc 20' from that node.

Actions #5

Updated by Joao Eduardo Luis over 11 years ago

  • Status changed from 12 to 7
  • Backport set to argonaut

Looks like this was caused by a couple of lingering bugs in mon/Elector.cc.

One bug was that we would not clean up the list of monitors that had acked the current monitor in a previous, expired election. This normally works if we specify a monmap during ceph-mon --mkfs (with --monmap <monmap>), since all monitors then keep their ranks between elections. When using the 'mon initial peers' feature, however, ranks are recalculated each time we learn about a new initial peer during the probe phase, so we would end up with invalid ranks in the acked_me list.
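
A toy Python sketch of that first bug (not Ceph's actual C++ code; the class structure and method names here are hypothetical, only acked_me comes from the description above): if acks from an expired election survive a rank reshuffle, the ack list can hold ranks that no longer exist.

```python
class Elector:
    """Toy model of the stale-ack bug; structure is hypothetical."""

    def __init__(self, my_rank, num_mons):
        self.my_rank = my_rank
        self.num_mons = num_mons
        self.acked_me = set()          # ranks that acked us this election

    def handle_ack(self, from_rank):
        self.acked_me.add(from_rank)

    def renumber(self, num_mons):
        # With 'mon initial peers', ranks are recalculated whenever a new
        # peer is learned during probing; old rank numbers become invalid.
        self.num_mons = num_mons

    def expire_election(self, clear_acks):
        # The fix: drop acked_me when the election expires or restarts.
        if clear_acks:
            self.acked_me.clear()

    def invalid_acks(self):
        # Ranks in acked_me that no longer exist after renumbering.
        return {r for r in self.acked_me if r >= self.num_mons}


# Buggy behavior: an ack from rank 2 survives a shrink to 2 ranks.
e = Elector(my_rank=0, num_mons=3)
e.handle_ack(2)
e.expire_election(clear_acks=False)    # buggy path: stale acks kept
e.renumber(num_mons=2)
print(sorted(e.invalid_acks()))        # prints [2]

# Fixed behavior: the expired election drops its acks.
e = Elector(my_rank=0, num_mons=3)
e.handle_ack(2)
e.expire_election(clear_acks=True)     # fixed path
e.renumber(num_mons=2)
print(sorted(e.invalid_acks()))        # prints []
```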

The other bug was triggered only under the same conditions, and was due to not bootstrapping the monitor when we learned about a new monmap at the beginning of an election. We must bootstrap the monitor to guarantee that we reset its state and start a new probe phase, as well as abort any election currently in progress, thus avoiding an election loop.
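
The bootstrap fix can be sketched the same way (again a toy Python model with hypothetical names, not the real code path): on seeing a newer monmap epoch during an election, reset to probing instead of continuing the stale election.

```python
def handle_propose(mon, msg_monmap_epoch):
    """Toy dispatch of an election propose (hypothetical structure).

    If the propose carries a newer monmap than ours, the fix is to
    bootstrap: adopt the new epoch, reset state, and re-enter the
    probing phase.  Continuing the stale election instead is what
    produced the endless election loop seen in the logs above.
    """
    if msg_monmap_epoch > mon["monmap_epoch"]:
        mon["monmap_epoch"] = msg_monmap_epoch
        mon["state"] = "probing"       # bootstrap: abort election, re-probe
        return "bootstrap"
    mon["state"] = "electing"          # monmap agrees; proceed normally
    return "elect"


mon = {"monmap_epoch": 1, "state": "electing"}
print(handle_propose(mon, 2))          # prints bootstrap
print(handle_propose(mon, 2))          # prints elect (epochs now match)
```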

The messages reported, however, have little to do with any of this. They are merely attempts from a client, running on each node of the burnupis, to directly contact the monitor affected by this problem. Because that monitor was not in the quorum, it would constantly output the "discarding message ... sending client elsewhere; we are not in quorum" messages; these would also be output in any other situation in which the monitor received a message that requires being in the quorum. It happened to be all 'auth' messages because the client was issuing auth operations in a tight loop.
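
A minimal sketch of that discard path (toy Python, hypothetical function and message-type names; the real dispatch lives in the monitor's C++ code): quorum-requiring messages from clients are dropped with a log line, so a client retrying auth in a tight loop yields one log line per attempt, which is the spam and CPU use seen in the description.

```python
def dispatch(in_quorum, msg_type, log):
    """Toy message dispatch: requests that require quorum are discarded
    and the client is redirected elsewhere, logging a line like the
    'discarding message ... we are not in quorum' messages above."""
    needs_quorum = {"auth", "mon_command"}   # illustrative set only
    if msg_type in needs_quorum and not in_quorum:
        log.append(f"discarding message {msg_type}; we are not in quorum")
        return "send_elsewhere"
    return "handled"


log = []
# A client looping on auth against a non-quorum monitor spams the log:
for _ in range(3):
    dispatch(False, "auth", log)
print(len(log))                              # prints 3
print(dispatch(True, "auth", log))           # prints handled
```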

The patch for this can be found on branch 'mon-elector-fixes', commit 6b83f6628b2b5effcd92e7dedf9de42281026c07 .

It should also be backported to argonaut, as any version using the 'mon initial peers' feature should be affected. Further testing should be done to guarantee that the bug is gone.

Actions #6

Updated by Sage Weil over 11 years ago

  • Status changed from 7 to Resolved