Bug #2161
nonlinear scaling for PGMap::pg_stat encode
Description
> OSDs   size of latest   pg_stat_t encode time
>   48        2976397        0.323052
>   72        4472477        0.666633
>   96        5969461        1.159198
>  120        7466021        1.738096
>  144        8963141        2.428229
>  168       10460309        3.203832
>  192       11956709        4.083013
>  240       14950445        6.453171
>  288       17916589        9.462052
My guesses are:
- something in bufferlist is doing something O(n) on the list<ptr>
- some map<> is getting hammered
?
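A quick way to put a number on the "nonlinear" part is to fit a power law, time ~ size^k, between two rows of the table in the description. The helper below is only a sketch; `scaling_exponent` is a hypothetical name and is not part of Ceph.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of Ceph): estimate k in time ~ size^k
// from two (size, time) measurements by taking the log-log slope.
double scaling_exponent(double size1, double time1,
                        double size2, double time2) {
    assert(size1 > 0 && size2 > 0 && time1 > 0 && time2 > 0);
    assert(size1 != size2);
    return std::log(time2 / time1) / std::log(size2 / size1);
}
```

Calling `scaling_exponent(2976397, 0.323052, 17916589, 9.462052)` on the first and last rows gives an exponent of roughly 1.9. That is much closer to quadratic than to linear, and so points at some O(n) step being repeated once per encoded item.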
Associated revisions
encoding: use iterator to copy_in encoded length
This gives us a pointer to the position into the list where the final
length value will be copied. Previously we used bl.copy_in(), which takes
a byte offset and needs to iterate over the bufferlist to seek to the
correct position, resulting in O(n^2) encoding time for large structures.
Fixes: #2161
Reported-by: Jim Schutt <jaschut@sandia.gov>
Diagnosed-by: Ake van der Meer <petrabbit@xs4all.nl>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
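The effect described in the commit message can be reproduced with a toy model. This is a sketch under stated assumptions, not Ceph's code: `SegList`, `append_u32`, `copy_in_at_offset`, `copy_in_at` and `encode_records` are all hypothetical names, and every 32-bit value is stored as its own list segment to keep the model small. Each record reserves a length placeholder, appends its payload, then goes back and patches the length, either by seeking from the head by byte offset or through a remembered iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <list>
#include <utility>
#include <vector>

// Toy stand-in for a segmented buffer (NOT Ceph's bufferlist API).
struct SegList {
    using Seg = std::vector<unsigned char>;
    std::list<Seg> segs;
    std::size_t seek_steps = 0;  // segments walked by offset-based seeks

    // Append a 32-bit value as its own segment; return an iterator to it.
    std::list<Seg>::iterator append_u32(std::uint32_t v) {
        Seg s(sizeof v);
        std::memcpy(s.data(), &v, sizeof v);
        return segs.insert(segs.end(), std::move(s));
    }

    std::size_t length() const { return segs.size() * sizeof(std::uint32_t); }

    // Offset-based overwrite: must walk from the head, O(number of segments).
    void copy_in_at_offset(std::size_t off, std::uint32_t v) {
        auto it = segs.begin();
        while (off >= it->size()) { off -= it->size(); ++it; ++seek_steps; }
        assert(off == 0);  // in this model values always start a segment
        std::memcpy(it->data(), &v, sizeof v);
    }

    // Iterator-based overwrite: the position is already known, O(1).
    static void copy_in_at(std::list<Seg>::iterator it, std::uint32_t v) {
        std::memcpy(it->data(), &v, sizeof v);
    }
};

// Encode n records of {length prefix, two payload words} into `out`.
// Returns how many segments were walked while patching length prefixes.
std::size_t encode_records(std::size_t n, bool use_iterator, SegList& out) {
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t start = out.length();
        auto len_pos = out.append_u32(0);  // placeholder for the length
        out.append_u32(static_cast<std::uint32_t>(i));
        out.append_u32(static_cast<std::uint32_t>(i * 2));
        const auto payload_len = static_cast<std::uint32_t>(
            out.length() - start - sizeof(std::uint32_t));
        if (use_iterator)
            SegList::copy_in_at(len_pos, payload_len);
        else
            out.copy_in_at_offset(start, payload_len);
    }
    return out.seek_steps;
}
```

In this model the offset-based variant walks 3*i segments to patch record i, so the total is 3*n*(n-1)/2 steps for n records, while the iterator variant walks none and produces identical bytes. Doubling n therefore roughly quadruples the seek work, the same shape as the timings in the description.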
History
#1 Updated by Ake van der Meer almost 12 years ago
My ceph-osd processes run at 100% CPU for many minutes at a time doing this: http://pastebin.com/wYnPKWeJ
In src/include/buffer.h (version 0.44.1) I found the following comment about a set of operations including that copy_in():
// WARNING: this are horribly inefficient for large bufferlists.
The same may cause the above encoding behaviour?
#2 Updated by Sage Weil almost 12 years ago
Ake van der Meer wrote:
> My ceph-osd processes run at 100% CPU for many minutes at a time doing this: http://pastebin.com/wYnPKWeJ
> In src/include/buffer.h (version 0.44.1) I found the following comment about a set of operations including that copy_in():
> // WARNING: this are horribly inefficient for large bufferlists.
> The same may cause the above encoding behaviour?
Aha! That is indeed the problem. Working up a fix now.
Thanks!
#3 Updated by Sage Weil almost 12 years ago
- Status changed from 12 to 7
- Target version set to v0.46
wip-encoding
#4 Updated by Sage Weil almost 12 years ago
- Status changed from 7 to Resolved