Bug #2161

nonlinear scaling for PGMap::pg_stat encode

Added by Sage Weil about 12 years ago. Updated almost 12 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Monitor
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

> OSDs   size of latest   pg_stat_t encode time
>
>   48          2976397                0.323052
>   72          4472477                0.666633
>   96          5969461                1.159198
>  120          7466021                1.738096
>  144          8963141                2.428229
>  168         10460309                3.203832
>  192         11956709                4.083013
>  240         14950445                6.453171
>  288         17916589                9.462052
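One way to put a number on "nonlinear" is to fit an exponent k in time ≈ C·size^k between two rows of the table: k near 1 means linear scaling, k near 2 means quadratic. The C++ sketch below uses only the nine measurements quoted above; the names `Sample`, `kSamples` and `scaling_exponent` are hypothetical and exist only for this illustration. Taking the first and last rows gives k ≈ 1.88, far closer to quadratic than to linear.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One row of the quoted table: OSD count, encoded size (second column)
// and encode time (third column). Units are whatever the report used.
struct Sample {
    int osds;
    double size;
    double encode_time;
};

// The nine measurements from the bug report, copied as-is.
const std::vector<Sample> kSamples = {
    {48, 2976397, 0.323052},   {72, 4472477, 0.666633},
    {96, 5969461, 1.159198},   {120, 7466021, 1.738096},
    {144, 8963141, 2.428229},  {168, 10460309, 3.203832},
    {192, 11956709, 4.083013}, {240, 14950445, 6.453171},
    {288, 17916589, 9.462052},
};

// Empirical exponent k in  time ~ C * size^k  between two samples:
//   k = ln(t_b / t_a) / ln(s_b / s_a)
double scaling_exponent(const Sample& a, const Sample& b) {
    assert(a.size > 0 && b.size > 0 && a.size != b.size);
    assert(a.encode_time > 0 && b.encode_time > 0);
    return std::log(b.encode_time / a.encode_time) / std::log(b.size / a.size);
}
```

For example, `scaling_exponent(kSamples.front(), kSamples.back())` evaluates to about 1.88, while the size column itself grows almost exactly in proportion to the OSD count.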

My guesses are:
- something in bufferlist is doing something O(n) on the list<ptr>
- some map<> is getting hammered
?
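The first guess can be made concrete with a simple cost model (an assumption, not a measurement): suppose encoding the k-th entry appends roughly c segments to the bufferlist and then performs one operation that walks the list from its head to that entry, so the step costs about c·k. Summing over N entries:

$$
T(N) \approx \sum_{k=1}^{N} c\,k = c\,\frac{N(N+1)}{2} = \Theta(N^2)
$$

Because the encoded size in the table grows roughly linearly with the number of OSDs, this model predicts encode time roughly quadratic in the OSD count, which is consistent with the measured numbers.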

Associated revisions

Revision 98326968
Added by Sage Weil almost 12 years ago

encoding: use iterator to copy_in encoded length

This gives us a pointer to the position in the list where the final
length value will be copied. Previously we used bl.copy_in(), which takes
a byte offset and needs to iterate over the bufferlist to seek to the
correct position, resulting in O(n^2) encoding time for large structures.
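The difference the commit describes can be reproduced with a toy model. In the C++ sketch below, `SegList` is a hypothetical stand-in for a bufferlist (a linked list of byte segments, not Ceph's actual class), and `encode_items` frames each item as a 4-byte length placeholder followed by a payload, filling in the length afterwards. The offset-based path seeks from the head of the list for every item, so total segment visits grow as n(n-1); the iterator-based path remembers where the placeholder is and never seeks. In this model, holding the iterator is safe because `std::list` iterators stay valid when segments are appended.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iterator>
#include <list>
#include <string>

// Hypothetical, heavily simplified stand-in for a bufferlist: a linked list
// of byte segments. This is NOT Ceph's bufferlist API.
struct SegList {
    std::list<std::string> segs;
    std::size_t steps = 0;  // segments skipped while seeking to an offset

    void append(const std::string& s) { segs.push_back(s); }

    // Offset-based overwrite, analogous in spirit to copy_in(offset, ...):
    // walk from the head to the segment holding byte `off`. The cost grows
    // with the number of segments in front of that byte.
    void copy_in_at_offset(std::size_t off, const char* src, std::size_t len) {
        auto it = segs.begin();
        while (off >= it->size()) {
            off -= it->size();
            ++it;
            ++steps;
            assert(it != segs.end() && "offset past end of list");
        }
        assert(off + len <= it->size());  // sketch: write must fit one segment
        it->replace(off, len, src, len);
    }
};

// Encode n items, each framed as [4-byte length][payload], reserving the
// length field first and filling it in once the payload is written.
// Returns the number of segments skipped while seeking.
std::size_t encode_items(std::size_t n, bool use_iterator) {
    SegList bl;
    std::size_t total = 0;  // bytes appended so far
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t len_off = total;       // byte offset of the field
        bl.append(std::string(4, '\0'));         // length placeholder
        auto len_it = std::prev(bl.segs.end());  // iterator to the field
        total += 4;

        const std::string payload(16, 'x');      // hypothetical item body
        bl.append(payload);
        total += payload.size();

        const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
        char buf[4];
        std::memcpy(buf, &len, sizeof buf);      // host byte order (sketch)

        if (use_iterator) {
            len_it->replace(0, sizeof buf, buf, sizeof buf);  // no seek
        } else {
            bl.copy_in_at_offset(len_off, buf, sizeof buf);   // seeks from head
        }
    }
    return bl.steps;
}
```

With n = 1000 the offset-based path skips 999,000 segments and the iterator-based path skips none; doubling n to 2000 roughly quadruples the former (3,998,000).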

Fixes: #2161
Reported-by: Jim Schutt <>
Diagnosed-by: Ake van der Meer <>
Signed-off-by: Sage Weil <>

History

#1 Updated by Ake van der Meer almost 12 years ago

My ceph-osd processes run at 100% CPU for many minutes at a time doing this: http://pastebin.com/wYnPKWeJ

In src/include/buffer.h (version 0.44.1) I found the following comment about a set of operations including that copy_in():
// WARNING: this are horribly inefficient for large bufferlists.

Could the same thing be causing the encoding behaviour above?

#2 Updated by Sage Weil almost 12 years ago

Aha! That is indeed the problem. Working up a fix now.

Thanks!

#3 Updated by Sage Weil almost 12 years ago

  • Status changed from 12 to 7
  • Target version set to v0.46

wip-encoding

#4 Updated by Sage Weil almost 12 years ago

  • Status changed from 7 to Resolved
