Bug #65749

osd_max_pg_per_osd_hard_ratio 3 is set too low for real life

Added by Joshua Blanch 16 days ago. Updated 4 days ago.

Status: Fix Under Review
Priority: Normal
Category: Peering
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport: quincy, reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In the field this issue comes up very often. It is quite disruptive because PGs are stuck in activating state and the workaround is difficult to find.

Here is an example of how this can go wrong:

Yesterday I ran into a problem that seems like it could be a bug. When adding 9 new systems with 12x 18TB HDDs to an existing cluster of 80 systems with 5x 4TB HDDs, I started getting PGs stuck in activating, even when bringing in the OSDs with a CRUSH weight of 2. According to the logs, this is why they were stuck:
2024-04-30T14:30:31.026-0700 7f85f89a7700 1 osd.15 1425620 maybe_wait_for_max_pg withhold creation of pg 24.12df: 750 >= 750
2024-04-30T14:30:31.027-0700 7f85f89a7700 1 osd.15 1425620 maybe_wait_for_max_pg withhold creation of pg 36.15f: 750 >= 750
2024-04-30T14:30:31.034-0700 7f85f81a6700 1 osd.15 1425620 maybe_wait_for_max_pg withhold creation of pg 24.26a4: 750 >= 750
2024-04-30T14:30:31.034-0700 7f85f81a6700 1 osd.15 1425620 maybe_wait_for_max_pg withhold creation of pg 24.2c17: 750 >= 750
The problem is that these new drives aren't close to having 750 PGs yet. The existing drives have around 150 PGs each and the new drives might have seen upwards of 250 PGs, but not 750. This is the first expansion we've done on a cluster that has been upgraded to 16.2.15.
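The "750 >= 750" lines above come from the OSD's hard cap on PGs per OSD. As a minimal sketch, assuming the default mon_max_pg_per_osd of 250 together with the osd_max_pg_per_osd_hard_ratio of 3 named in this tracker's title:

```python
# Illustrative arithmetic only -- values assume the default
# mon_max_pg_per_osd = 250 and osd_max_pg_per_osd_hard_ratio = 3.

mon_max_pg_per_osd = 250           # per-OSD PG target
osd_max_pg_per_osd_hard_ratio = 3  # hard multiplier on that target

# maybe_wait_for_max_pg withholds PG creation once an OSD's PG count
# reaches this hard cap, matching the "750 >= 750" log lines:
hard_cap = mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio
print(hard_cap)  # 750
```

So the OSD refuses to instantiate further PGs once it believes it already holds 750, even though the reporter's drives should settle far below that.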

Actions #1

Updated by Dan van der Ster 16 days ago

  • Status changed from New to In Progress
  • Assignee set to Dan van der Ster
Actions #2

Updated by Dan van der Ster 16 days ago

  • Pull request ID set to 57217
Actions #3

Updated by Dan van der Ster 16 days ago

  • Status changed from In Progress to Fix Under Review
Actions #4

Updated by Radoslaw Zarzynski 11 days ago

Is there any trace of autoscaler-induced PG splitting visible during the situation?

Actions #5

Updated by Dan van der Ster 11 days ago

Radoslaw Zarzynski wrote in #note-4:

Is there any trace of autoscaler-induced PG splitting visible during the situation?

No PG splitting in the cases I've worked on related to this.

It feels like the cause is some temporary state while adding a new, empty host, during which many more PGs than expected are mapped to a single OSD.
E.g. consider that new OSDs don't all go up/in simultaneously, so the cause here could be that the first few "up/in" OSDs on the new host get mapped all of the PGs destined for that host, thereby exceeding the limit.
(Shortly afterwards, the other OSDs come up/in and the PGs/OSD count drops back below the limit -- however, because one OSD blocked the activation of some PGs, those PGs seem to stay stuck in activating forever.)
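This transient concentration can be illustrated with some hypothetical arithmetic, using the report's figures of 12 OSDs per new host and roughly 150 PGs per OSD at steady state, and the 750 hard cap from the log lines in the description:

```python
# Hypothetical illustration of the transient state described above:
# a new 12-OSD host whose OSDs come up/in a few at a time.

pgs_for_new_host = 150 * 12  # ~150 PGs/OSD steady state across 12 OSDs
hard_cap = 250 * 3           # 750, as in the "750 >= 750" log lines

for osds_up_in in (1, 2, 6, 12):
    pgs_per_up_osd = pgs_for_new_host // osds_up_in
    withheld = pgs_per_up_osd >= hard_cap
    print(osds_up_in, pgs_per_up_osd, withheld)

# With only 1 or 2 OSDs up/in, each transiently maps >= 750 PGs, so
# PG creation is withheld; once all 12 are in, ~150 PGs/OSD is fine.
```

Under these assumed numbers, the limit is only exceeded while a minority of the host's OSDs are up/in, which is consistent with the PGs becoming stuck during bring-up rather than at steady state.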

Actions #6

Updated by Laura Flores 4 days ago

Dan, let's bring this issue to the next CDM to discuss its impacts.
