Feature #49269

open

cephadm: upgrade stuck in repeating sleep when a host is offline

Added by Gunther Heinrich over 3 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: cephadm
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Even though the documentation clearly states that all hosts should be online before an upgrade is initiated, I wanted to see how cephadm reacts when a host is offline during that time (for example, a host might unexpectedly go offline during an upgrade due to a hardware failure).

I started the upgrade in a cluster with two monitor hosts and one OSD host offline. The upgrade began as usual, but cephadm then got stuck in a repeating sleep:

2021-02-12T05:52:07.943110+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 151 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-12T05:52:07.943532+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 152 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)
2021-02-12T06:02:18.676352+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 461 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.8 with id 5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185
2021-02-12T06:02:18.678718+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 462 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-12T06:02:18.679090+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 463 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)
2021-02-12T06:12:28.015778+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 769 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.8 with id 5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185
2021-02-12T06:12:28.018644+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 770 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-12T06:12:28.018973+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 771 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)
[...]
2021-02-12T09:30:07.704808+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 10423 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.8 with id 5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185
2021-02-12T09:30:07.707638+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 10425 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-12T09:30:07.708589+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 10428 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)
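
To make the repetition in the excerpt above easier to see, here is a minimal Python sketch that measures the gap between consecutive "Need to upgrade myself" lines. The function name `retry_intervals` and the constants are my own for illustration and are not part of cephadm. The sample lines are copied from the log above.

```python
from datetime import datetime

# Marker text and timestamp layout as they appear in the log excerpt above.
MARKER = "Upgrade: Need to upgrade myself"
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def retry_intervals(lines):
    """Return the seconds between consecutive log lines containing MARKER."""
    stamps = [
        datetime.strptime(line.split()[0], TS_FORMAT)
        for line in lines
        if MARKER in line
    ]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


# First three occurrences, copied verbatim from the excerpt.
SAMPLE = [
    "2021-02-12T05:52:07.943532+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 152 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)",
    "2021-02-12T06:02:18.679090+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 463 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)",
    "2021-02-12T06:12:28.018973+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 771 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)",
]

intervals = retry_intervals(SAMPLE)
for seconds in intervals:
    print(f"{seconds:.1f} s between attempts")
```

Both gaps come out at slightly more than 600 seconds, which lines up with the "Sleeping for 600 seconds" message in the debug output below.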

The debug log shows this:
2021-02-12T09:30:07.709513+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 10431 : cephadm [DBG] Opening connection to root@iz-ceph-v1-mon-04 with ssh options '-F /tmp/cephadm-conf-g22x2uth -i /tmp/cephadm-identity-ydwgvq24'
2021-02-12T09:30:10.790286+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214115) 10433 : cephadm [DBG] Sleeping for 600 seconds

A manual ssh connection attempt shows the following error:
ssh: connect to host iz-ceph-v1-mon-04 port 22: No route to host
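
The manual check above could be scripted as a pre-flight test before starting an upgrade. This is only a rough sketch: it tests whether TCP port 22 accepts a connection, not whether cephadm's SSH login would succeed. The helper names `ssh_port_reachable` and `unreachable_hosts` are hypothetical, and the host list would have to be supplied by hand or taken from `ceph orch host ls`.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", refused connections, timeouts and
        # name-resolution failures; all of them mean "do not upgrade yet".
        return False


def unreachable_hosts(hosts, port=22, timeout=3.0):
    """Return the subset of hosts whose SSH port cannot be reached."""
    return [h for h in hosts if not ssh_port_reachable(h, port, timeout)]


# Example usage (host names taken from the logs in this report):
#   offline = unreachable_hosts(["iz-ceph-v1-mon-02", "iz-ceph-v1-mon-04"])
#   if offline:
#       print("Not starting upgrade, unreachable:", ", ".join(offline))
```

A check like this only covers the moment before the upgrade starts. It does not help when a host goes offline partway through, which is the case this report is mainly about.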


Related issues: 2 (0 open, 2 closed)

Related to Orchestrator - Bug #49827: cephadm driven upgrade test takes 12 hours and still running (Duplicate)
Related to Orchestrator - Bug #46204: cephadm upgrade test: fail if upgrade status is set to error (Resolved, Adam King)
