Bug #906
clustered mds: lchown not setting uid/gid
Description
This is from autotest ceph_pjd_fstest, job 257.
Saw failures on the client node:
http://autotest.ceph.newdream.net/results/257-tv/group0/sepia89.ceph.dreamhost.com/status
-------------------
/usr/local/autotest/tests/pjd_fstest/src/tests/chmod/00.t (Wstat: 0 Tests: 58 Failed: 2)
Failed tests: 27, 31
/usr/local/autotest/tests/pjd_fstest/src/tests/chown/00.t (Wstat: 0 Tests: 171 Failed: 7)
Failed tests: 97, 102, 112, 135-137, 153
Files=184, Tests=1957, 199 wallclock secs ( 1.05 usr 0.27 sys + 0.40 cusr 0.86 csys = 2.58 CPU)
Result: FAIL
- debugging chmod/00.t test 27
- test code
expect 0 create ${n0} 0644
ctime1=`${fstest} stat ${n0} ctime`
sleep 1
expect 0 chmod ${n0} 0111
ctime2=`${fstest} stat ${n0} ctime`
test_check $ctime1 -lt $ctime2
- TODO conclusion: chmod does not update ctime
- test code
- debugging chmod/00.t test 31
- test code
expect 0 mkdir ${n0} 0755
ctime1=`${fstest} stat ${n0} ctime`
sleep 1
expect 0 chmod ${n0} 0753
ctime2=`${fstest} stat ${n0} ctime`
test_check $ctime1 -lt $ctime2
- TODO conclusion: same as test 27, for directories
- test code
- debugging chown/00.t test 97, line 209
- test code
expect 0 create ${n0} 0644
ctime1=`${fstest} stat ${n0} ctime`
sleep 1
expect 0 chown ${n0} 65534 65533
expect 65534,65533 lstat ${n0} uid,gid
ctime2=`${fstest} stat ${n0} ctime`
test_check $ctime1 -lt $ctime2
- TODO conclusion: chown does not update ctime
- test code
- debugging chown/00.t test 102, line 218
- test code
expect 0 mkdir ${n0} 0755
ctime1=`${fstest} stat ${n0} ctime`
sleep 1
expect 0 chown ${n0} 65534 65533
expect 65534,65533 lstat ${n0} uid,gid
ctime2=`${fstest} stat ${n0} ctime`
test_check $ctime1 -lt $ctime2
- TODO conclusion: same as test 97, for directories
- test code
- debugging chown/00.t test 112, line 236
- test code
expect 0 symlink ${n1} ${n0}
ctime1=`${fstest} lstat ${n0} ctime`
sleep 1
expect 0 lchown ${n0} 65534 65533
expect 65534,65533 lstat ${n0} uid,gid
ctime2=`${fstest} lstat ${n0} ctime`
test_check $ctime1 -lt $ctime2
- TODO conclusion: lchown does not update ctime of the symlink
- test code
- debugging chown/00.t test 135-137, line 274-
- test code
expect 0 symlink ${n1} ${n0}
expect 0 lchown ${n0} 65534 65533
ctime1=`${fstest} lstat ${n0} ctime`
sleep 1
expect 0 -u 65534 -g 65532 lchown ${n0} 65534 65532
expect 65534,65532 lstat ${n0} uid,gid
ctime2=`${fstest} lstat ${n0} ctime`
test_check $ctime1 -lt $ctime2
- TODO conclusion 1: lchown does not change user/group of the symlink?
- TODO conclusion 2: lchown does not update ctime of the symlink
- test code
- debugging chown/00.t test 153, line 330-
- test code
expect 0 symlink ${n1} ${n0}
ctime1=`${fstest} lstat ${n0} ctime`
sleep 1
expect 0 -- lchown ${n0} -1 -1
ctime2=`${fstest} lstat ${n0} ctime`
case "${os}:${fs}" in
Linux:ext3)
    test_check $ctime1 -lt $ctime2
    ;;
*)
    test_check $ctime1 -eq $ctime2
    ;;
esac
- TODO conclusion: lchown does not update ctime of the symlink?
- test code
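The ctime checks above all follow one pattern: record ctime, sleep, change metadata, read ctime again, compare. A minimal sketch of that pattern outside pjd_fstest follows. It assumes GNU `stat` (`-c %Z` prints ctime in seconds and does not follow symlinks); the function name `ctime_advances` is made up for illustration and is not part of the test suite.

```shell
# ctime_advances PATH CMD [ARGS...]
# Record PATH's ctime, wait one second, run CMD, then check that ctime grew.
# Returns 0 if ctime advanced, 1 if it did not, 2 if stat or CMD failed.
ctime_advances() {
    ca_path=$1
    shift
    ca_before=$(stat -c %Z "$ca_path") || return 2
    sleep 1
    "$@" || return 2
    ca_after=$(stat -c %Z "$ca_path") || return 2
    [ "$ca_before" -lt "$ca_after" ]
}
```

For example, `ctime_advances somefile chmod 0111 somefile` mirrors chmod/00.t test 27. Because the comparison uses whole seconds, the result is only meaningful when both timestamps come from the same clock.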
History
#1 Updated by Greg Farnum about 13 years ago
It's possible this is correct, but have we checked that client and server times match? Otherwise this is probably a duplicate of #854.
#2 Updated by Anonymous about 13 years ago
Hmm. Looks like that explains the ctimes.
[0 tv@dreamer ~]$ dsh -m ubuntu@sepia14.ceph.dreamhost.com,ubuntu@sepia84.ceph.dreamhost.com,ubuntu@sepia85.ceph.dreamhost.com,ubuntu@sepia86.ceph.dreamhost.com,ubuntu@sepia88.ceph.dreamhost.com,ubuntu@sepia89.ceph.dreamhost.com date
ubuntu@sepia14.ceph.dreamhost.com: Mon Mar 21 14:16:09 PDT 2011
ubuntu@sepia84.ceph.dreamhost.com: Mon Mar 21 14:15:41 PDT 2011
ubuntu@sepia85.ceph.dreamhost.com: Mon Mar 21 14:15:39 PDT 2011
ubuntu@sepia86.ceph.dreamhost.com: Mon Mar 21 14:15:39 PDT 2011
ubuntu@sepia88.ceph.dreamhost.com: Mon Mar 21 14:15:21 PDT 2011
ubuntu@sepia89.ceph.dreamhost.com: Mon Mar 21 14:15:20 PDT 2011
[0 tv@dreamer ~]$
We need to get NTP or something similar running on all the test machines.
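To put a number on the skew in the `date` output above, something like the following could be used. It is only a sketch: `max_skew` is a hypothetical helper, and it assumes the readings have already been converted to integer seconds (for instance with `date +%s` on each host).

```shell
# max_skew T1 [T2...]
# Print the difference between the largest and smallest integer timestamp.
max_skew() {
    ms_min=$1
    ms_max=$1
    for ms_t in "$@"; do
        if [ "$ms_t" -lt "$ms_min" ]; then ms_min=$ms_t; fi
        if [ "$ms_t" -gt "$ms_max" ]; then ms_max=$ms_t; fi
    done
    echo $((ms_max - ms_min))
}
```

Taking the six readings above as seconds since midnight, `max_skew 51369 51341 51339 51339 51321 51320` prints 49, far more than the 1 second sleep the ctime tests rely on.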
That still doesn't explain this failing:
expect 65534,65532 lstat ${n0} uid,gid
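That check does not involve timestamps at all. A rough standalone equivalent, assuming GNU `stat` and using `chown -h` as the shell counterpart of lchown, might look like this (`lchown_sticks` is a hypothetical name, and unlike the original test it does not switch to uid 65534 before the call):

```shell
# lchown_sticks LINK UID GID
# Change ownership of the symlink itself (chown -h does not follow it),
# then read uid,gid back from the link and compare.
# Returns 0 on match, 1 on mismatch, 2 if chown or stat failed.
lchown_sticks() {
    ls_link=$1
    ls_uid=$2
    ls_gid=$3
    chown -h "$ls_uid:$ls_gid" "$ls_link" || return 2
    ls_got=$(stat -c '%u,%g' "$ls_link") || return 2
    [ "$ls_got" = "$ls_uid,$ls_gid" ]
}
```

Running it with 65534 and 65532 as in the failing test requires root; as an unprivileged user it can only be exercised with the caller's own uid and gid.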
#3 Updated by Sage Weil about 13 years ago
cfuse or kclient?
#4 Updated by Anonymous about 13 years ago
kclient by default now. I can rerun with cfuse if that helps.
#5 Updated by Sage Weil about 13 years ago
- Target version set to v0.27
#6 Updated by Sage Weil about 13 years ago
- Subject changed from ctime not updated, lchown not working to lchown not setting uid/gid
- Story points set to 2
#7 Updated by Anonymous almost 13 years ago
Re-running as job 409, clocks are in decent sync:
$ dsh -m ubuntu@sepia17.ceph.dreamhost.com,ubuntu@sepia20.ceph.dreamhost.com,ubuntu@sepia21.ceph.dreamhost.com,ubuntu@sepia23.ceph.dreamhost.com,ubuntu@sepia13.ceph.dreamhost.com,ubuntu@sepia14.ceph.dreamhost.com date
ubuntu@sepia17.ceph.dreamhost.com: Mon Apr 11 13:18:59 PDT 2011
ubuntu@sepia20.ceph.dreamhost.com: Mon Apr 11 13:19:00 PDT 2011
ubuntu@sepia21.ceph.dreamhost.com: Mon Apr 11 13:19:00 PDT 2011
ubuntu@sepia23.ceph.dreamhost.com: Mon Apr 11 13:19:00 PDT 2011
ubuntu@sepia13.ceph.dreamhost.com: Mon Apr 11 13:19:00 PDT 2011
ubuntu@sepia14.ceph.dreamhost.com: Mon Apr 11 13:19:00 PDT 2011
yet the logs still show failures (http://autotest.ceph.newdream.net/results/409-tv/group0/sepia23.ceph.dreamhost.com/debug/client.0.log):
13:11:20 DEBUG| [stdout] /usr/local/autotest/tests/pjd_fstest/src/tests/chmod/11.t ..... ok
13:11:43 DEBUG| [stdout] /usr/local/autotest/tests/pjd_fstest/src/tests/chown/00.t .....
13:11:43 DEBUG| [stdout] not ok 135
13:11:43 DEBUG| [stdout] not ok 136
13:11:43 DEBUG| [stdout] not ok 137
13:11:43 DEBUG| [stdout] Failed 3/171 subtests
#8 Updated by Sage Weil almost 13 years ago
- Target version changed from v0.27 to v0.28
#9 Updated by Sage Weil almost 13 years ago
- Subject changed from lchown not setting uid/gid to clustered mds: lchown not setting uid/gid
This isn't popping up with single mds... probably a clustering thing.
#10 Updated by Greg Farnum almost 13 years ago
#11 Updated by Greg Farnum almost 13 years ago
Still unable to reproduce this locally, and it didn't fail when I ran it again on the autotest cluster.
It's possible we fixed this as part of the rename stuff? Or else it was just an odd symptom of the generic kclient issues with multi-MDS clusters.
#12 Updated by Anonymous almost 13 years ago
Here's an idea: run the autotest, say, 10 times (after the test, ssh to the sepia machines and ensure they've rebooted and are not hanging on sync). If none of the runs fail, we'll call it resolved, and come back to it if it pops up again.
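The repeat-and-count part of that plan could be sketched as below. `count_failures` is a hypothetical helper; the command that actually launches the autotest job is not shown in this ticket, so it is left as a parameter, and the reboot and sync checks on the sepia machines are not covered.

```shell
# count_failures N CMD [ARGS...]
# Run CMD N times, discarding its output, and print how many runs
# exited with a non-zero status.
count_failures() {
    cf_runs=$1
    shift
    cf_failures=0
    cf_i=0
    while [ "$cf_i" -lt "$cf_runs" ]; do
        if ! "$@" >/dev/null 2>&1; then
            cf_failures=$((cf_failures + 1))
        fi
        cf_i=$((cf_i + 1))
    done
    echo "$cf_failures"
}
```

A result of 0 after 10 runs would match the "call it resolved" criterion. For an intermittent bug that is weak evidence, not proof.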
#13 Updated by Sage Weil almost 13 years ago
- Target version changed from v0.28 to v0.29
#14 Updated by Sage Weil almost 13 years ago
An audit of the uclient vs kclient code turned up one difference, but it was a bug fix in kclient that was missing from the uclient: a4bd854f86fe641207f83ab26a0f1b7fdd3ec4f0. Does the uclient still pass now, I wonder?
Also, are there logs of this happening with the kclient?
#15 Updated by Sage Weil almost 13 years ago
Greg, what did you do before to reproduce this?
#16 Updated by Sage Weil almost 13 years ago
- Target version changed from v0.29 to v0.30
#17 Updated by Greg Farnum almost 13 years ago
I don't think that I ever did manage to reproduce it.
I haven't thought it through much, but it's also possible this got fixed with some of the other caps changes that got made in the last month or so. I'm thinking specifically of a few bugs we had that occasionally directed cap updates to non-auth MDSes, though I don't remember the circumstances of those bugs off-hand.
#18 Updated by Sage Weil almost 13 years ago
- Status changed from New to Can't reproduce
#19 Updated by Sage Weil almost 13 years ago
- Target version deleted (v0.30)