Bug #2382
osd: unable to start due to 1 child already started
Status: Closed
Description
I had seen this bug a few days ago while setting up ceph on my desktop, but it went away by rerunning ./ceph-osd so I didn't give it a second thought.
jeffp has been seeing this bug quite a lot:
<jeffp> it's been intermittent but now it's happening every time
<gregaf> oh dear, we've got a race somewhere then...
<jeffp> yeah that's what i was thinking
<jeffp> i'm running a 5 node cluster running on centos inside virtualbox all on the same machine
<jeffp> maybe the slowness is exacerbating it
The error: -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
jeffp's log:
[root@osd0 ceph]# /etc/init.d/ceph -a start
=== mon.a ===
Starting Ceph mon.a on osd0...
starting mon.a rank 0 at 192.168.56.100:6789/0 mon_data /var/local/ceph/mon.a fsid abaf5302-13cc-4531-990f-c56935679649
=== mon.b ===
Starting Ceph mon.b on osd1...
starting mon.b rank 1 at 192.168.56.101:6789/0 mon_data /var/local/ceph/mon.b fsid abaf5302-13cc-4531-990f-c56935679649
=== mon.c ===
Starting Ceph mon.c on osd2...
starting mon.c rank 2 at 192.168.56.102:6789/0 mon_data /var/local/ceph/mon.c fsid abaf5302-13cc-4531-990f-c56935679649
=== osd.0 ===
Starting Ceph osd.0 on osd0...
starting osd.0 at :/0 osd_data /var/local/ceph/osd.0 /var/local/ceph/osd.0/journal
2012-05-03 17:36:31.005537 7f98df29f760 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
failed: ' /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf '
Updated by Jeff Plaisance almost 12 years ago
update: if I start the OSDs manually with ceph-osd this doesn't seem to happen - just with /etc/init.d/ceph start and service ceph start
Updated by Sage Weil almost 12 years ago
- Status changed from New to 12
- Assignee set to Sage Weil
- Priority changed from High to Urgent
i saw this on congress too. will reproduce on my burnupi cluster and investigate.
Updated by Joao Eduardo Luis almost 12 years ago
I re-triggered this using 'CEPH_NUM_OSD=1 CEPH_NUM_MDS=1 CEPH_NUM_MON=1 ./vstart.sh' on my desktop (granted, it's a desktop).
The bug can pop up not only on the osd but also on the mon and the mds.
Updated by Joao Eduardo Luis almost 12 years ago
Occasionally this also happens with ./init-ceph when starting all services, or each one individually. For instance, this happened 3 times in a row:
jecluis@Magrathea:~/Code/ceph/src$ ./init-ceph start mds
=== mds.a ===
Starting Ceph mds.a on Magrathea...
starting mds.a at :/0
2012-05-04 02:04:56.412819 7f9c5f5ae780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
failed: ' ./ceph-mds -i a --pid-file deploy/out/mds.a.pid -c ./ceph.conf '
It has never happened (if I recall correctly) when starting each service individually with ./ceph-{osd,mon,mds}.
Updated by Dennis Jacobfeuerborn almost 12 years ago
I just built fresh 0.46 rpms (ran 0.45 before) and now I'm seeing this too.
Notice the timestamps. I had to call this quite a few times in rapid succession until the daemon finally started:
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:45.695992 7f0689416780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:56.127370 7f0cfc205780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:56.532800 7f7358edd780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:56.914539 7f7bf6424780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:57.265571 7f3f87f1d780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:57.652731 7f881ffbf780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:58.025576 7f95ad08e780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:58.398080 7face0770780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:58.758070 7fac62bb0780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:59.087255 7f28331ac780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:59.426267 7fe92a33a780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:59.776314 7fa4c3abf780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
[root@ceph1 ~]#
Updated by Sage Weil almost 12 years ago
ok, this is just a bad check. we're verifying there are no child threads because fork()/daemonize() will destroy them. the problem is we just stopped a thread (via join()), then look in /proc to count threads... and the kernel apparently removes the /proc entry asynchronously.
I'm inclined to just remove this check. If threads get wiped out at daemonize() time we'll just have to figure that out the hard way.
Updated by Greg Farnum almost 12 years ago
- Status changed from Fix Under Review to Resolved
Sounds good to me; I never liked depending on /proc for that anyway.
Merged into master. We probably want to put it in stable as well, but it was branched off master and I wasn't sure so I'll leave that for you.
(And I just put the first inktank email into the git history! Hurray!)