Bug #65749

osd_max_pg_per_osd_hard_ratio 3 is set too low for real life

Added by Joshua Blanch 16 days ago. Updated 4 days ago.

Status: Fix Under Review
Priority: Normal
Category: Peering
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport: quincy, reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In the field this issue comes up very often. It is quite disruptive because PGs are stuck in activating state and the workaround is difficult to find.

Here is an example of how this can go wrong:

Yesterday I ran into a problem that seems like it could be a bug. When adding 9 new systems with 12x 18TB HDDs to an existing cluster of 80 systems with 5x 4TB HDDs, I started getting PGs stuck in activating, even when bringing in the OSDs with a CRUSH weight of 2. According to the logs, this is why they were stuck:
2024-04-30T14:30:31.026-0700 7f85f89a7700 1 osd.15 1425620 maybe_wait_for_max_pg withhold creation of pg 24.12df: 750 >= 750
2024-04-30T14:30:31.027-0700 7f85f89a7700 1 osd.15 1425620 maybe_wait_for_max_pg withhold creation of pg 36.15f: 750 >= 750
2024-04-30T14:30:31.034-0700 7f85f81a6700 1 osd.15 1425620 maybe_wait_for_max_pg withhold creation of pg 24.26a4: 750 >= 750
2024-04-30T14:30:31.034-0700 7f85f81a6700 1 osd.15 1425620 maybe_wait_for_max_pg withhold creation of pg 24.2c17: 750 >= 750
The problem is that these new drives aren't close to having 750 PGs yet. The existing drives have around 150 PGs each and the new drives might have seen upwards of 250 PGs, but not 750. This is the first expansion we've done on a cluster that has been upgraded to 16.2.15.
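The "750 >= 750" lines above come from the OSD's hard cap on PGs per OSD. As a minimal sketch, assuming the default mon_max_pg_per_osd of 250 together with the osd_max_pg_per_osd_hard_ratio of 3 named in this tracker's title:

```python
# Illustrative arithmetic only -- values assume the default
# mon_max_pg_per_osd = 250 and osd_max_pg_per_osd_hard_ratio = 3.

mon_max_pg_per_osd = 250           # per-OSD PG target
osd_max_pg_per_osd_hard_ratio = 3  # hard multiplier on that target

# maybe_wait_for_max_pg withholds PG creation once an OSD's PG count
# reaches this hard cap, matching the "750 >= 750" log lines:
hard_cap = mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio
print(hard_cap)  # 750
```

So the OSD refuses to instantiate further PGs once it believes it already holds 750, even though the reporter's drives should settle far below that.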

Actions #1

Updated by Dan van der Ster 16 days ago

  • Status changed from New to In Progress
  • Assignee set to Dan van der Ster
Actions #2

Updated by Dan van der Ster 16 days ago

  • Pull request ID set to 57217
Actions #3

Updated by Dan van der Ster 16 days ago

  • Status changed from In Progress to Fix Under Review
Actions #4

Updated by Radoslaw Zarzynski 11 days ago

Is there any trace of autoscaler-induced PG splitting visible during the situation?

Actions #5

Updated by Dan van der Ster 11 days ago

Radoslaw Zarzynski wrote in #note-4:

Is there any trace of autoscaler-induced PG splitting visible during the situation?

No PG splitting in the cases I've worked on related to this.

It feels like the cause is some temporary state while adding a new, empty host, during which many more PGs than expected are mapped to a single OSD.
E.g. consider that new OSDs don't all go up/in simultaneously, so the cause here could be that the first few "up/in" OSDs on the new host get mapped all of the PGs destined for that host, thereby exceeding the limit.
(Shortly afterwards, the other OSDs come up/in and the PGs/OSD count drops back below the limit -- however, because one OSD blocked the activation of some PGs, those PGs seem to stay stuck in activating forever.)
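This transient concentration can be illustrated with some hypothetical arithmetic, using the report's figures of 12 OSDs per new host and roughly 150 PGs per OSD at steady state, and the 750 hard cap from the log lines in the description:

```python
# Hypothetical illustration of the transient state described above:
# a new 12-OSD host whose OSDs come up/in a few at a time.

pgs_for_new_host = 150 * 12  # ~150 PGs/OSD steady state across 12 OSDs
hard_cap = 250 * 3           # 750, as in the "750 >= 750" log lines

for osds_up_in in (1, 2, 6, 12):
    pgs_per_up_osd = pgs_for_new_host // osds_up_in
    withheld = pgs_per_up_osd >= hard_cap
    print(osds_up_in, pgs_per_up_osd, withheld)

# With only 1 or 2 OSDs up/in, each transiently maps >= 750 PGs, so
# PG creation is withheld; once all 12 are in, ~150 PGs/OSD is fine.
```

Under these assumed numbers, the limit is only exceeded while a minority of the host's OSDs are up/in, which is consistent with the PGs becoming stuck during bring-up rather than at steady state.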

Actions #6

Updated by Laura Flores 4 days ago

Dan, let's bring this issue to the next CDM to discuss its impacts.
