Bug #64424
Ceph orch unsuitable for stateless / RAM-booted hosts
Status: open
Description
Hi, I'm extremely unhappy with how the new Ceph orchestrator handles node reboots, especially if those nodes are RAM-booted.
I have 78 hosts with 16-20 OSDs each. The hosts are all PXE-booted from a network-provisioned image, i.e. they are stateless and require redeployment of OSDs after each reboot. Ceph orch's reconciliation time after a reboot is around 25-30 minutes, which is absolutely unacceptable: I have to wait half an hour after rebooting a node before I can reboot the next one. That turns a rolling reboot, which usually takes a few hours, into a 2-day ordeal.
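To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the hosts are rebooted strictly one at a time and counts only the reconciliation wait quoted above, not the reboot itself; the function name is just for illustration.

```python
def rolling_reboot_hours(hosts: int, wait_minutes: float) -> float:
    """Total waiting time in hours if every host incurs `wait_minutes`
    of reconciliation before the next host can be rebooted."""
    return hosts * wait_minutes / 60


# Figures from this report: 78 hosts, 25-30 minutes of reconciliation each.
low = rolling_reboot_hours(78, 25)
high = rolling_reboot_hours(78, 30)
print(f"{low:.1f} to {high:.1f} hours")  # prints "32.5 to 39.0 hours"
```

That is 32.5 to 39 hours of pure waiting for one pass over the cluster.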
The only workaround I have found is to deploy all OSDs manually with a custom script using cephadm deploy. However, I then have to edit the unit.meta file to sort the OSDs into the appropriate orch service, and I have to replicate all the filter logic from my service YAML.
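For illustration, the unit.meta part of such a script might look roughly like the sketch below. It is not my actual script, and it rests on assumptions: that unit.meta is a JSON file in the daemon's data directory and that the orchestrator reads the owning service from a `service_name` key. The helper name and the path and service name in the usage comment are hypothetical.

```python
import json
from pathlib import Path


def set_service_name(unit_meta: Path, service_name: str) -> dict:
    """Set the (assumed) `service_name` key in a unit.meta JSON file.

    Other keys are left untouched. The file is written to a temporary
    sibling first and then renamed, so a crash cannot leave it half-written.
    """
    meta = json.loads(unit_meta.read_text()) if unit_meta.exists() else {}
    meta["service_name"] = service_name

    tmp = unit_meta.with_name(unit_meta.name + ".tmp")
    tmp.write_text(json.dumps(meta, indent=4, sort_keys=True) + "\n")
    tmp.replace(unit_meta)
    return meta


# Hypothetical usage; path layout and service name are placeholders:
# set_service_name(Path("/var/lib/ceph/<fsid>/osd.12/unit.meta"), "osd.my_spec")
```

Even with a helper like this, the device filter logic from the service YAML still has to be duplicated in the script to decide which service each OSD belongs to, which is the part that really hurts.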
One potential solution I see here is a local cephadm command that triggers an immediate reconciliation for the current host, redeploying all necessary services.
Since this is such a critical part of the whole system, I'm labelling this as a bug report rather than a feature request. Feel free to relabel.