Bug #1563
OSD isn't prioritizing data with waiting ops during transfer
Description
ajm reported on IRC that his MDS was stuck in replay; it turned out to be waiting for a read response from an OSD. The OSD in question had died and been replaced (with a completely empty store), but it should still have fetched the requested data early while the PG was being transferred over.
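To make the expected behavior concrete, here is a minimal sketch of "recover objects with waiting ops first". It is not Ceph's recovery code; the `RecoveryQueue` class and its method names are hypothetical, and the two-level priority scheme is an assumption made for illustration.

```python
import heapq
from itertools import count


class RecoveryQueue:
    """Toy recovery scheduler: objects with blocked client ops are pulled first."""

    HIGH, NORMAL = 0, 1  # lower number is popped first

    def __init__(self, missing):
        self._seq = count()  # tie-breaker that preserves insertion order
        self._heap = []
        self._done = set()
        for oid in missing:
            heapq.heappush(self._heap, (self.NORMAL, next(self._seq), oid))

    def op_waiting(self, oid):
        # A client op is blocked on oid: queue it again at high priority.
        if oid not in self._done:
            heapq.heappush(self._heap, (self.HIGH, next(self._seq), oid))

    def next_to_recover(self):
        # Pop until an object that has not been recovered yet turns up;
        # stale duplicate entries are skipped.
        while self._heap:
            _, _, oid = heapq.heappop(self._heap)
            if oid not in self._done:
                self._done.add(oid)
                return oid
        return None


q = RecoveryQueue(["a", "b", "200.0009335c"])
q.op_waiting("200.0009335c")       # the MDS read is blocked on this object
first = q.next_to_recover()        # the blocked object jumps the queue
```

With this ordering, the object the MDS is reading would be fetched before the rest of the PG's backlog, provided some peer actually has it.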
History
#1 Updated by Sage Weil over 12 years ago
- Position set to 1
#2 Updated by Adam Jacob Muller over 12 years ago
log file from the affected OSD: http://adam.gs/osd.5.log.1317056213.bz2
md5(osd.5.log.1317056213.bz2) = c162d9359f861f43fc63f38a8444052a
bytes(osd.5.log.1317056213.bz2) = 209269740
#3 Updated by Greg Farnum over 12 years ago
Copied locally to kai:~gregf/logs/ajm_osd_not_prioritizing_reads.log
#4 Updated by Sage Weil over 12 years ago
- Status changed from New to Closed
2011-09-26 12:36:16.150430 7f66b08b4700 -- 64.188.54.43:6800/4514 <== mds0 64.188.54.36:6800/10043 6 ==== osd_op(mds0.55:40 200.0009335c [read 0~4194304] 1.2363) v2 ==== 127+0+0 (4098147964 0 0) 0x7f6673ca7bb0 con 0x7606c1e0
is the only blocked read, because
2011-09-26 12:36:16.405345 7f66b08b4700 osd5 55528 pg[1.363( v 48421'44428 lc 0'0 (48419'44424,48421'44428]+backlog n=2090 ec=2 les/c 55528/51135 55516/55516/55516) [5,6] r=0 mlcod 0'0 !hml active m=2090 u=1] missing 200.0009335c/head v 48421'44428, is unfound.
and that object is never found. The problem isn't prioritization; it's in the handling of unfound/lost objects.
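The distinction drawn here, between an object that is merely missing locally and one that is unfound, can be sketched as follows. This is an illustration under assumptions, not Ceph's logic: `classify` and the `peers` mapping are hypothetical, and versions are compared as plain strings.

```python
def classify(oid, need_version, peers):
    """Decide whether a locally missing object has a known source.

    peers maps a peer name to {object id: version} for the objects that
    peer is known to hold (hypothetical structure).
    """
    sources = [name for name, objs in peers.items()
               if objs.get(oid) == need_version]
    if sources:
        return "recoverable", sources
    # The object is known to exist, but no known peer holds the needed version.
    return "unfound", []


# The replaced OSD is empty, and the only peer lacks the needed version.
peers = {"osd6": {"200.0009335c": "48419'44424"}}
state, sources = classify("200.0009335c", "48421'44428", peers)
```

No recovery ordering helps in the "unfound" case, since there is no source to pull from; the blocked read stays blocked until the object is located or declared lost.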