Feature #49269
Status: open
cephadm: upgrade stuck in repeating sleep when a host is offline
% Done:
0%
Source:
Community (user)
Description
The documentation clearly states that all hosts should be online before you initiate an upgrade. I nevertheless wanted to see how cephadm reacts when a host is offline at that point, since a host might unexpectedly go offline during an upgrade, for example due to a hardware failure.
So I started the upgrade in a cluster with two monitor hosts and one OSD host offline. cephadm started as usual but then got stuck in a repeating sleep:
2021-02-12T05:52:07.943110+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 151 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-12T05:52:07.943532+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 152 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)
2021-02-12T06:02:18.676352+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 461 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.8 with id 5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185
2021-02-12T06:02:18.678718+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 462 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-12T06:02:18.679090+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 463 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)
2021-02-12T06:12:28.015778+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 769 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.8 with id 5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185
2021-02-12T06:12:28.018644+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 770 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-12T06:12:28.018973+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 771 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)
[...]
2021-02-12T09:30:07.704808+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 10423 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.8 with id 5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185
2021-02-12T09:30:07.707638+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 10425 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-12T09:30:07.708589+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 10428 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)
The debug log shows this:
2021-02-12T09:30:07.709513+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 10431 : cephadm [DBG] Opening connection to root@iz-ceph-v1-mon-04 with ssh options '-F /tmp/cephadm-conf-g22x2uth -i /tmp/cephadm-identity-ydwgvq24'
2021-02-12T09:30:10.790286+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 10433 : cephadm [DBG] Sleeping for 600 seconds
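To check that the repeating cycle lines up with the 600-second sleep, the gaps between consecutive "Need to upgrade myself" entries can be computed from the timestamps in the log above. This is only a minimal sketch: the timestamps are copied from the excerpt, and the function name cycle_gaps is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the "Need to upgrade myself" lines in the log above.
STAMPS = [
    "2021-02-12T05:52:07.943532+0000",
    "2021-02-12T06:02:18.679090+0000",
    "2021-02-12T06:12:28.018973+0000",
]


def cycle_gaps(stamps):
    """Return the seconds elapsed between consecutive timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f%z") for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


gaps = cycle_gaps(STAMPS)
for gap in gaps:
    print(f"{gap:.1f} s between upgrade attempts")
```

Both gaps come out at roughly 610 seconds, which is consistent with one 600-second sleep plus a few seconds of work per cycle. It suggests the upgrade loop restarts from the beginning after every sleep instead of making progress.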
A manual ssh connection attempt shows the following error:
ssh: connect to host iz-ceph-v1-mon-04 port 22: No route to host
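The manual check above could be automated before starting an upgrade. The following is a rough sketch that only tests whether a TCP connection to the SSH port can be opened; it is not how cephadm itself checks hosts, and the function names ssh_port_reachable and unreachable_hosts are hypothetical.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", refused connections, timeouts
        # and name resolution failures.
        return False


def unreachable_hosts(hosts, port=22, timeout=3.0):
    """Return the subset of hosts whose SSH port cannot be reached."""
    return [h for h in hosts if not ssh_port_reachable(h, port, timeout)]


# Example call, using a hostname from this report:
# unreachable_hosts(["iz-ceph-v1-mon-04"])
```

A non-empty result from unreachable_hosts would be a reason to postpone the upgrade. It only proves that the port accepts connections, not that SSH login or cephadm on that host actually works.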