Bug #906

clustered mds: lchown not setting uid/gid

Added by Anonymous about 13 years ago. Updated almost 13 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is from autotest ceph_pjd_fstest, job 257.

Saw a failure on the client node:
http://autotest.ceph.newdream.net/results/257-tv/group0/sepia89.ceph.dreamhost.com/status

Test Summary Report
-------------------
/usr/local/autotest/tests/pjd_fstest/src/tests/chmod/00.t (Wstat: 0 Tests: 58 Failed: 2)
Failed tests: 27, 31
/usr/local/autotest/tests/pjd_fstest/src/tests/chown/00.t (Wstat: 0 Tests: 171 Failed: 7)
Failed tests: 97, 102, 112, 135-137, 153
Files=184, Tests=1957, 199 wallclock secs ( 1.05 usr 0.27 sys + 0.40 cusr 0.86 csys = 2.58 CPU)
Result: FAIL
  • debugging chmod/00.t test 27
    • test code
      expect 0 create ${n0} 0644
      ctime1=`${fstest} stat ${n0} ctime`
      sleep 1
      expect 0 chmod ${n0} 0111
      ctime2=`${fstest} stat ${n0} ctime`
      test_check $ctime1 -lt $ctime2
    • TODO conclusion: chmod does not update ctime
  • debugging chmod/00.t test 31
    • test code
      expect 0 mkdir ${n0} 0755
      ctime1=`${fstest} stat ${n0} ctime`
      sleep 1
      expect 0 chmod ${n0} 0753
      ctime2=`${fstest} stat ${n0} ctime`
      test_check $ctime1 -lt $ctime2
    • TODO conclusion: same as test 27, for directories
  • debugging chown/00.t test 97, line 209
    • test code
      expect 0 create ${n0} 0644
      ctime1=`${fstest} stat ${n0} ctime`
      sleep 1
      expect 0 chown ${n0} 65534 65533
      expect 65534,65533 lstat ${n0} uid,gid
      ctime2=`${fstest} stat ${n0} ctime`
      test_check $ctime1 -lt $ctime2
    • TODO conclusion: chown does not update ctime
  • debugging chown/00.t test 102, line 218
    • test code
      expect 0 mkdir ${n0} 0755
      ctime1=`${fstest} stat ${n0} ctime`
      sleep 1
      expect 0 chown ${n0} 65534 65533
      expect 65534,65533 lstat ${n0} uid,gid
      ctime2=`${fstest} stat ${n0} ctime`
      test_check $ctime1 -lt $ctime2
    • TODO conclusion: same as test 97, for directories
  • debugging chown/00.t test 112, line 236
    • test code
      expect 0 symlink ${n1} ${n0}
      ctime1=`${fstest} lstat ${n0} ctime`
      sleep 1
      expect 0 lchown ${n0} 65534 65533
      expect 65534,65533 lstat ${n0} uid,gid
      ctime2=`${fstest} lstat ${n0} ctime`
      test_check $ctime1 -lt $ctime2
    • TODO conclusion: lchown does not update ctime of the symlink
  • debugging chown/00.t test 135-137, line 274-
    • test code
      expect 0 symlink ${n1} ${n0}
      expect 0 lchown ${n0} 65534 65533
      ctime1=`${fstest} lstat ${n0} ctime`
      sleep 1
      expect 0 -u 65534 -g 65532 lchown ${n0} 65534 65532
      expect 65534,65532 lstat ${n0} uid,gid
      ctime2=`${fstest} lstat ${n0} ctime`
      test_check $ctime1 -lt $ctime2
    • TODO conclusion 1: lchown does not change user/group of the symlink?
    • TODO conclusion 2: lchown does not update ctime of the symlink
  • debugging chown/00.t test 153, line 330-
    • test code
      expect 0 symlink ${n1} ${n0}
      ctime1=`${fstest} lstat ${n0} ctime`
      sleep 1
      expect 0 -- lchown ${n0} -1 -1
      ctime2=`${fstest} lstat ${n0} ctime`
      case "${os}:${fs}" in
      Linux:ext3)
        test_check $ctime1 -lt $ctime2
        ;;
      *)
        test_check $ctime1 -eq $ctime2
        ;;
      esac
    • TODO conclusion: lchown does not update ctime of the symlink?
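
The uid/gid part of tests 135-137 can be repeated by hand without the fstest binary. What follows is only a minimal sketch, assuming GNU coreutils (`chown -h` to act on the symlink itself, `stat -c` for lstat-style output); `check_lchown` is a hypothetical helper name, not part of pjd_fstest.

```shell
#!/bin/sh
# Sketch only: mimics "expect 0 lchown ..." followed by
# "expect UID,GID lstat ... uid,gid" from chown/00.t.
# Assumes GNU coreutils; check_lchown is a hypothetical helper.

# check_lchown LINK UID GID
check_lchown() {
    link=$1 uid=$2 gid=$3
    # -h changes the symlink itself rather than its target (lchown).
    chown -h "$uid:$gid" "$link" || return 1
    # GNU stat does not follow symlinks unless -L is given (lstat).
    actual=$(stat -c '%u,%g' "$link") || return 1
    if [ "$actual" = "$uid,$gid" ]; then
        echo "ok $actual"
    else
        echo "not ok: expected $uid,$gid got $actual"
        return 1
    fi
}
```

Run as root against a symlink on the mounted filesystem, for example `check_lchown /mnt/ceph/link 65534 65532` (the mount path is an assumption). A "not ok" line would correspond to the lstat mismatch seen in the report.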

History

#1 Updated by Greg Farnum about 13 years ago

It's possible this is correct, but have we checked that client and server times match? Otherwise this is probably a duplicate of #854.

#2 Updated by Anonymous about 13 years ago

Hmm. Looks like that explains the ctimes.

[0 tv@dreamer ~]$ dsh -m ubuntu@sepia14.ceph.dreamhost.com,ubuntu@sepia84.ceph.dreamhost.com,ubuntu@sepia85.ceph.dreamhost.com,ubuntu@sepia86.ceph.dreamhost.com,ubuntu@sepia88.ceph.dreamhost.com,ubuntu@sepia89.ceph.dreamhost.com date
ubuntu@sepia14.ceph.dreamhost.com: Mon Mar 21 14:16:09 PDT 2011
ubuntu@sepia84.ceph.dreamhost.com: Mon Mar 21 14:15:41 PDT 2011
ubuntu@sepia85.ceph.dreamhost.com: Mon Mar 21 14:15:39 PDT 2011
ubuntu@sepia86.ceph.dreamhost.com: Mon Mar 21 14:15:39 PDT 2011
ubuntu@sepia88.ceph.dreamhost.com: Mon Mar 21 14:15:21 PDT 2011
ubuntu@sepia89.ceph.dreamhost.com: Mon Mar 21 14:15:20 PDT 2011
[0 tv@dreamer ~]$ 

We need to get ntp or something running on all the test machines.

That still doesn't explain this failing:

expect 65534,65532 lstat ${n0} uid,gid
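
The clock comparison above can be reduced to a single number. This is a rough sketch that reads `dsh ... date` output in the format shown and prints the spread between the earliest and latest clock in seconds. It assumes all hosts report the same calendar day and ignores any delay dsh itself adds between hosts; `max_skew` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: reads lines like
#   host: Mon Mar 21 14:16:09 PDT 2011
# on stdin and prints (latest - earliest) time of day in seconds.
# Assumes every line is from the same day; max_skew is hypothetical.
max_skew() {
    awk '{
        # Field 5 is HH:MM:SS in the output format shown above.
        split($5, t, ":")
        s = t[1] * 3600 + t[2] * 60 + t[3]
        if (NR == 1 || s < min) min = s
        if (NR == 1 || s > max) max = s
    }
    END { if (NR > 0) print max - min }'
}
```

Fed the six lines from the first listing (sepia14 at 14:16:09 through sepia89 at 14:15:20), this prints 49, far more than the `sleep 1` the ctime tests rely on.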

#3 Updated by Sage Weil about 13 years ago

cfuse or kclient?

#4 Updated by Anonymous about 13 years ago

kclient by default now. I can rerun with cfuse if that helps.

#5 Updated by Sage Weil about 13 years ago

  • Target version set to v0.27

#6 Updated by Sage Weil about 13 years ago

  • Subject changed from ctime not updated, lchown not working to lchown not setting uid/gid
  • Story points set to 2

#7 Updated by Anonymous almost 13 years ago

Re-running as job 409, clocks are in decent sync:


$ dsh -m ubuntu@sepia17.ceph.dreamhost.com,ubuntu@sepia20.ceph.dreamhost.com,ubuntu@sepia21.ceph.dreamhost.com,ubuntu@sepia23.ceph.dreamhost.com,ubuntu@sepia13.ceph.dreamhost.com,ubuntu@sepia14.ceph.dreamhost.com date
ubuntu@sepia17.ceph.dreamhost.com: Mon Apr 11 13:18:59 PDT 2011
ubuntu@sepia20.ceph.dreamhost.com: Mon Apr 11 13:19:00 PDT 2011
ubuntu@sepia21.ceph.dreamhost.com: Mon Apr 11 13:19:00 PDT 2011
ubuntu@sepia23.ceph.dreamhost.com: Mon Apr 11 13:19:00 PDT 2011
ubuntu@sepia13.ceph.dreamhost.com: Mon Apr 11 13:19:00 PDT 2011
ubuntu@sepia14.ceph.dreamhost.com: Mon Apr 11 13:19:00 PDT 2011

Yet the logs at http://autotest.ceph.newdream.net/results/409-tv/group0/sepia23.ceph.dreamhost.com/debug/client.0.log show:

13:11:20 DEBUG| [stdout] /usr/local/autotest/tests/pjd_fstest/src/tests/chmod/11.t ..... ok
13:11:43 DEBUG| [stdout] /usr/local/autotest/tests/pjd_fstest/src/tests/chown/00.t ..... 
13:11:43 DEBUG| [stdout] not ok 135
13:11:43 DEBUG| [stdout] not ok 136
13:11:43 DEBUG| [stdout] not ok 137
13:11:43 DEBUG| [stdout] Failed 3/171 subtests 
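
For comparing runs, the failing subtest numbers can be pulled out of a client log in the format above. A small sketch, assuming the `[stdout] not ok N` line shape shown in this log; `failed_subtests` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: reads an autotest client log on stdin and prints the
# number of each failing subtest, one per line. Assumes lines of the
# form "... [stdout] not ok N"; failed_subtests is hypothetical.
failed_subtests() {
    sed -n 's/^.*\[stdout\] not ok \([0-9][0-9]*\).*$/\1/p'
}
```

Usage would be something like `failed_subtests < client.0.log`, where the file is a local copy of the log linked above.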

#8 Updated by Sage Weil almost 13 years ago

  • Target version changed from v0.27 to v0.28

#9 Updated by Sage Weil almost 13 years ago

  • Subject changed from lchown not setting uid/gid to clustered mds: lchown not setting uid/gid

This isn't popping up with single mds... probably a clustering thing.

#10 Updated by Greg Farnum almost 13 years ago


#11 Updated by Greg Farnum almost 13 years ago

Still unable to reproduce this locally, and when I ran it again on the autotest cluster it didn't fail.

It's possible we fixed this as part of the rename stuff? Or else it was just an odd symptom of the generic kclient issues with multi-MDS clusters.

#12 Updated by Anonymous almost 13 years ago

Here's an idea: run the autotest, say, 10 times (after each run, ssh to the sepia machines and make sure they've rebooted and aren't hanging on sync). If none of the runs fail, we'll call it resolved, and come back to it if it pops up again.
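
The repeat-and-count part of that idea could look roughly like this. It is a sketch only: `count_failures` is a hypothetical helper, the command that launches one autotest job is left to the caller, and the reboot/sync checks on the sepia machines are not covered.

```shell
#!/bin/sh
# Sketch only: runs CMD N times and prints how many runs exited
# non-zero. count_failures is a hypothetical helper name.

# count_failures N CMD [ARGS...]
count_failures() {
    n=$1; shift
    failures=0
    i=1
    while [ "$i" -le "$n" ]; do
        if ! "$@"; then
            failures=$((failures + 1))
            echo "run $i: FAIL" >&2   # progress goes to stderr
        fi
        i=$((i + 1))
    done
    echo "$failures"                  # only the count goes to stdout
}
```

For example, `count_failures 10 ./run_pjd_job.sh` with a hypothetical wrapper script that submits one job and exits non-zero on failure; a printed 0 would be the "call it resolved" case.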

#13 Updated by Sage Weil almost 13 years ago

  • Target version changed from v0.28 to v0.29

#14 Updated by Sage Weil almost 13 years ago

An audit of the uclient vs. kclient code turned up one difference, but it was a bug fix in the kclient that was missing from the uclient: a4bd854f86fe641207f83ab26a0f1b7fdd3ec4f0. Does the uclient still pass now, I wonder?

Also, are there logs of this happening with the kclient?

#15 Updated by Sage Weil almost 13 years ago

Greg, what did you do before to reproduce this?

#16 Updated by Sage Weil almost 13 years ago

  • Target version changed from v0.29 to v0.30

#17 Updated by Greg Farnum almost 13 years ago

I don't think that I ever did manage to reproduce it.

I haven't thought it through much, but it's also possible this got fixed with some of the other caps changes that got made in the last month or so. I'm thinking specifically of a few bugs we had that occasionally directed cap updates to non-auth MDSes, though I don't remember the circumstances of those bugs off-hand.

#18 Updated by Sage Weil almost 13 years ago

  • Status changed from New to Can't reproduce

#19 Updated by Sage Weil almost 13 years ago

  • Target version deleted (v0.30)
