Bug #2382

closed

osd: unable to start due to 1 child already started

Added by Joao Eduardo Luis almost 12 years ago. Updated almost 12 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (dev)

Description

I had seen this bug a few days ago while setting up ceph on my desktop, but it went away by rerunning ./ceph-osd so I didn't give it a second thought.

jeffp has been seeing this bug quite often:

<jeffp> it's been intermittent but now it's happening every time
<gregaf> oh dear, we've got a race somewhere then...
<jeffp> yeah that's what i was thinking
<jeffp> i'm running a 5 node cluster running on centos inside virtualbox all on the same machine
<jeffp> maybe the slowness is exacerbating it

The error: -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
jeffp's log:

[root@osd0 ceph]# /etc/init.d/ceph -a start
=== mon.a ===
Starting Ceph mon.a on osd0...
starting mon.a rank 0 at 192.168.56.100:6789/0 mon_data /var/local/ceph/mon.a fsid abaf5302-13cc-4531-990f-c56935679649
=== mon.b ===
Starting Ceph mon.b on osd1...
starting mon.b rank 1 at 192.168.56.101:6789/0 mon_data /var/local/ceph/mon.b fsid abaf5302-13cc-4531-990f-c56935679649
=== mon.c ===
Starting Ceph mon.c on osd2...
starting mon.c rank 2 at 192.168.56.102:6789/0 mon_data /var/local/ceph/mon.c fsid abaf5302-13cc-4531-990f-c56935679649
=== osd.0 ===
Starting Ceph osd.0 on osd0...
starting osd.0 at :/0 osd_data /var/local/ceph/osd.0 /var/local/ceph/osd.0/journal
2012-05-03 17:36:31.005537 7f98df29f760 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
failed: ' /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf '

Actions #1

Updated by Jeff Plaisance almost 12 years ago

update: if i start the osd's with ceph-osd manually this doesn't seem to happen - just with /etc/init.d/ceph start and service ceph start

Actions #2

Updated by Sage Weil almost 12 years ago

  • Status changed from New to 12
  • Assignee set to Sage Weil
  • Priority changed from High to Urgent

i saw this on congress too. will reproduce on my burnupi cluster and investigate.

Actions #3

Updated by Joao Eduardo Luis almost 12 years ago

I re-triggered this using 'CEPH_NUM_OSD=1 CEPH_NUM_MDS=1 CEPH_NUM_MON=1 ./vstart.sh' on my desktop (granted, it's a desktop).

The bug can pop up not only on the osd but also on the mon and the mds.

Actions #4

Updated by Joao Eduardo Luis almost 12 years ago

This also happens occasionally with ./init-ceph, whether starting all services or each one individually. For instance, this happened 3 times in a row:

jecluis@Magrathea:~/Code/ceph/src$ ./init-ceph start mds
=== mds.a ===
Starting Ceph mds.a on Magrathea...
starting mds.a at :/0
2012-05-04 02:04:56.412819 7f9c5f5ae780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
failed: ' ./ceph-mds -i a --pid-file deploy/out/mds.a.pid -c ./ceph.conf '

It has never happened (if I recall correctly) when starting each service individually with ./ceph-{osd,mon,mds}.

Actions #5

Updated by Dennis Jacobfeuerborn almost 12 years ago

I just built fresh 0.46 rpms (ran 0.45 before) and now I'm seeing this too.
Notice the timestamps. I had to call this quite a few times in rapid succession until the daemon finally started:

[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:45.695992 7f0689416780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:56.127370 7f0cfc205780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:56.532800 7f7358edd780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:56.914539 7f7bf6424780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:57.265571 7f3f87f1d780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:57.652731 7f881ffbf780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:58.025576 7f95ad08e780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:58.398080 7face0770780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:58.758070 7fac62bb0780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:59.087255 7f28331ac780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:59.426267 7fe92a33a780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:59.776314 7fa4c3abf780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
[root@ceph1 ~]#

Actions #6

Updated by Sage Weil almost 12 years ago

ok, this is just a bad check. we're verifying there aren't threads because fork()/daemonize() will destroy them. the problem is we just stopped a thread (via join()) and then look in /proc to count threads, and the kernel is apparently removing the /proc entry asynchronously, so the count can still include the thread we just joined.

I'm inclined to just remove this check. If threads get wiped out at daemonize() time we'll just have to figure that out the hard way.
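
The race described above can be illustrated with a minimal Python sketch. This is not the Ceph code and not the fix that was merged (the check was simply removed); it only shows why a one-shot thread count taken from /proc right after join() may be unreliable, and what a polling variant of the check might look like. The function names, the /proc/self/task path, and the retry parameters are assumptions chosen for the illustration, and the /proc lookup is Linux-only.

```python
import os
import threading
import time


def count_threads(task_dir="/proc/self/task"):
    """Count this process's threads as listed under /proc (Linux only)."""
    return len(os.listdir(task_dir))


def wait_for_single_thread(count_fn=count_threads, retries=50, delay=0.01):
    """Poll until only one thread is reported, or give up after `retries`.

    Returns the last observed count. A one-shot check can report a stale
    count if the entry of a just-joined thread is still visible.
    """
    n = count_fn()
    for _ in range(retries):
        if n == 1:
            break
        time.sleep(delay)
        n = count_fn()
    return n


if __name__ == "__main__" and os.path.isdir("/proc/self/task"):
    worker = threading.Thread(target=lambda: None)
    worker.start()
    worker.join()  # finished from the caller's point of view
    # The one-shot count may or may not still include the joined thread.
    print("one-shot count:", count_threads())
    print("after polling:", wait_for_single_thread())
```

Whether the one-shot count is stale on a given run depends on timing, which matches the intermittent failures reported in this thread.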

Actions #7

Updated by Sage Weil almost 12 years ago

  • Status changed from 12 to Fix Under Review

wip-2382

Actions #8

Updated by Greg Farnum almost 12 years ago

  • Status changed from Fix Under Review to Resolved

Sounds good to me; I never liked depending on /proc for that anyway.
Merged into master. We probably want to put it in stable as well, but it was branched off master and I wasn't sure so I'll leave that for you.
(And I just put the first inktank email into the git history! Hurray!)
