Bug #56695
Status: Closed
[RHEL stock] pjd test failures (a bug that needs to wait for the unlink to finish)
Description
pdonnell@teuthology /ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi$ grep 'not ok' {'6945847','6945828','6945853','6945837','6945822','6945804','6945862','6945812'}/teu*
6945847/teuthology.log:2022-07-22T20:52:25.833 INFO:tasks.workunit.client.0.smithi135.stderr:+ echo 'not ok 4'
6945847/teuthology.log:2022-07-22T20:52:25.838 INFO:tasks.workunit.client.0.smithi135.stderr:+ echo 'not ok 6'
6945847/teuthology.log:2022-07-22T20:52:25.851 INFO:tasks.workunit.client.0.smithi135.stdout:not ok 4
6945847/teuthology.log:2022-07-22T20:52:25.852 INFO:tasks.workunit.client.0.smithi135.stdout:not ok 6
6945847/teuthology.log:2022-07-22T20:52:26.141 INFO:tasks.workunit.client.0.smithi135.stderr:+ echo 'not ok 8'
6945847/teuthology.log:2022-07-22T20:52:26.143 INFO:tasks.workunit.client.0.smithi135.stderr:+ echo 'not ok 9'
6945847/teuthology.log:2022-07-22T20:52:26.175 INFO:tasks.workunit.client.0.smithi135.stdout:not ok 8
6945847/teuthology.log:2022-07-22T20:52:26.175 INFO:tasks.workunit.client.0.smithi135.stdout:not ok 9
6945828/teuthology.log:2022-07-22T20:32:24.810 INFO:tasks.workunit.client.0.smithi100.stderr:+ echo 'not ok 4'
6945828/teuthology.log:2022-07-22T20:32:24.815 INFO:tasks.workunit.client.0.smithi100.stderr:+ echo 'not ok 6'
6945828/teuthology.log:2022-07-22T20:32:24.829 INFO:tasks.workunit.client.0.smithi100.stdout:not ok 4
6945828/teuthology.log:2022-07-22T20:32:24.829 INFO:tasks.workunit.client.0.smithi100.stdout:not ok 6
6945828/teuthology.log:2022-07-22T20:32:25.129 INFO:tasks.workunit.client.0.smithi100.stderr:+ echo 'not ok 8'
6945828/teuthology.log:2022-07-22T20:32:25.132 INFO:tasks.workunit.client.0.smithi100.stderr:+ echo 'not ok 9'
6945828/teuthology.log:2022-07-22T20:32:25.162 INFO:tasks.workunit.client.0.smithi100.stdout:not ok 8
6945828/teuthology.log:2022-07-22T20:32:25.162 INFO:tasks.workunit.client.0.smithi100.stdout:not ok 9
6945853/teuthology.log:2022-07-22T21:04:37.135 INFO:tasks.workunit.client.0.smithi107.stderr:+ echo 'not ok 4'
6945853/teuthology.log:2022-07-22T21:04:37.141 INFO:tasks.workunit.client.0.smithi107.stderr:+ echo 'not ok 6'
6945853/teuthology.log:2022-07-22T21:04:37.154 INFO:tasks.workunit.client.0.smithi107.stdout:not ok 4
6945853/teuthology.log:2022-07-22T21:04:37.155 INFO:tasks.workunit.client.0.smithi107.stdout:not ok 6
6945853/teuthology.log:2022-07-22T21:04:37.453 INFO:tasks.workunit.client.0.smithi107.stderr:+ echo 'not ok 8'
6945853/teuthology.log:2022-07-22T21:04:37.456 INFO:tasks.workunit.client.0.smithi107.stderr:+ echo 'not ok 9'
6945853/teuthology.log:2022-07-22T21:04:37.481 INFO:tasks.workunit.client.0.smithi107.stdout:not ok 8
6945853/teuthology.log:2022-07-22T21:04:37.481 INFO:tasks.workunit.client.0.smithi107.stdout:not ok 9
6945837/teuthology.log:2022-07-22T20:37:53.954 INFO:tasks.workunit.client.0.smithi017.stderr:+ echo 'not ok 8'
6945837/teuthology.log:2022-07-22T20:37:53.960 INFO:tasks.workunit.client.0.smithi017.stderr:+ echo 'not ok 9'
6945837/teuthology.log:2022-07-22T20:37:53.987 INFO:tasks.workunit.client.0.smithi017.stdout:not ok 8
6945837/teuthology.log:2022-07-22T20:37:53.987 INFO:tasks.workunit.client.0.smithi017.stdout:not ok 9
6945822/teuthology.log:2022-07-22T20:25:33.552 INFO:tasks.workunit.client.0.smithi029.stderr:+ echo 'not ok 8'
6945822/teuthology.log:2022-07-22T20:25:33.555 INFO:tasks.workunit.client.0.smithi029.stderr:+ echo 'not ok 9'
6945822/teuthology.log:2022-07-22T20:25:33.587 INFO:tasks.workunit.client.0.smithi029.stdout:not ok 8
6945822/teuthology.log:2022-07-22T20:25:33.587 INFO:tasks.workunit.client.0.smithi029.stdout:not ok 9
6945804/teuthology.log:2022-07-22T20:13:29.326 INFO:tasks.workunit.client.0.smithi006.stderr:+ echo 'not ok 4'
6945804/teuthology.log:2022-07-22T20:13:29.331 INFO:tasks.workunit.client.0.smithi006.stderr:+ echo 'not ok 6'
6945804/teuthology.log:2022-07-22T20:13:29.386 INFO:tasks.workunit.client.0.smithi006.stdout:not ok 4
6945804/teuthology.log:2022-07-22T20:13:29.387 INFO:tasks.workunit.client.0.smithi006.stdout:not ok 6
6945804/teuthology.log:2022-07-22T20:13:29.684 INFO:tasks.workunit.client.0.smithi006.stderr:+ echo 'not ok 8'
6945804/teuthology.log:2022-07-22T20:13:29.687 INFO:tasks.workunit.client.0.smithi006.stderr:+ echo 'not ok 9'
6945804/teuthology.log:2022-07-22T20:13:29.714 INFO:tasks.workunit.client.0.smithi006.stdout:not ok 8
6945804/teuthology.log:2022-07-22T20:13:29.714 INFO:tasks.workunit.client.0.smithi006.stdout:not ok 9
6945862/teuthology.log:2022-07-22T21:14:09.659 INFO:tasks.workunit.client.0.smithi027.stderr:+ echo 'not ok 8'
6945862/teuthology.log:2022-07-22T21:14:09.663 INFO:tasks.workunit.client.0.smithi027.stderr:+ echo 'not ok 9'
6945862/teuthology.log:2022-07-22T21:14:09.710 INFO:tasks.workunit.client.0.smithi027.stdout:not ok 8
6945862/teuthology.log:2022-07-22T21:14:09.710 INFO:tasks.workunit.client.0.smithi027.stdout:not ok 9
6945812/teuthology.log:2022-07-22T20:19:02.363 INFO:tasks.workunit.client.0.smithi064.stderr:+ echo 'not ok 4'
6945812/teuthology.log:2022-07-22T20:19:02.376 INFO:tasks.workunit.client.0.smithi064.stderr:+ echo 'not ok 6'
6945812/teuthology.log:2022-07-22T20:19:02.389 INFO:tasks.workunit.client.0.smithi064.stdout:not ok 4
6945812/teuthology.log:2022-07-22T20:19:02.390 INFO:tasks.workunit.client.0.smithi064.stdout:not ok 6
6945812/teuthology.log:2022-07-22T20:19:02.689 INFO:tasks.workunit.client.0.smithi064.stderr:+ echo 'not ok 8'
6945812/teuthology.log:2022-07-22T20:19:02.692 INFO:tasks.workunit.client.0.smithi064.stderr:+ echo 'not ok 9'
6945812/teuthology.log:2022-07-22T20:19:06.580 INFO:tasks.workunit.client.0.smithi064.stdout:not ok 8
6945812/teuthology.log:2022-07-22T20:19:06.581 INFO:tasks.workunit.client.0.smithi064.stdout:not ok 9
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945847/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945828/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945853/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945837/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945822/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945804/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945862/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945812/teuthology.log
Updated by Patrick Donnelly almost 2 years ago
- Related to Bug #55804: qa failure: pjd link tests failed added
Updated by Xiubo Li almost 2 years ago
- Status changed from New to In Progress
Currently the kclient's testing branch has merged the fscrypt filename patches, which limit MAX_NAME to 240 instead of the default 256.
I will find a way to fix it.
Updated by Xiubo Li almost 2 years ago
Xiubo Li wrote:
Currently the kclient's testing branch has merged the fscrypt filename patches, which limit MAX_NAME to 240 instead of the default 256.
I will find a way to fix it.
Sorry, it seems this was not introduced by that change.
I checked the first teuthology test link:
2022-07-22T20:31:40.242 INFO:teuthology.orchestra.run.smithi135.stdout:Package kernel-4.18.0-372.9.1.el8.x86_64 is already installed.
It was using the distro kernel.
Updated by Xiubo Li almost 2 years ago
I tested this with the latest testing kclient branch and couldn't reproduce it.
I will switch to the distro kernel and give it a try.
Updated by Xiubo Li almost 2 years ago
I tried 4.18.0-348.20.1.el8_5.x86_64 and couldn't reproduce it.
I will try kernel-4.18.0-372.9.1.el8.x86_64, the exact version from the failing QA runs.
Updated by Xiubo Li almost 2 years ago
Xiubo Li wrote:
I tried 4.18.0-348.20.1.el8_5.x86_64 and couldn't reproduce it.
I will try kernel-4.18.0-372.9.1.el8.x86_64, the exact version from the failing QA runs.
Still the same: I couldn't reproduce it either.
Updated by Xiubo Li almost 2 years ago
- Project changed from Linux kernel client to CephFS
- Component(FS) MDS added
Patrick Donnelly wrote:
[...]
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945847/teuthology.log
It's an MDS-side bug, not a kclient bug.
For example, for the following error the errno is EXDEV:
2022-07-22T20:52:25.830 INFO:tasks.workunit.client.0.smithi135.stderr:++ expr 3 + 1
2022-07-22T20:52:25.831 INFO:tasks.workunit.client.0.smithi135.stderr:+ ntest=4
2022-07-22T20:52:25.831 INFO:tasks.workunit.client.0.smithi135.stderr:+ expect 0 link fstest_d22bbf65b5ea67dbd4f2534c56018c49 _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
2022-07-22T20:52:25.831 INFO:tasks.workunit.client.0.smithi135.stderr:+ e=0
2022-07-22T20:52:25.831 INFO:tasks.workunit.client.0.smithi135.stderr:+ shift
2022-07-22T20:52:25.831 INFO:tasks.workunit.client.0.smithi135.stderr:++ /home/ubuntu/cephtest/mnt.0/client.0/tmp/tmp/../pjd-fstest-20090130-RC/tests/link/../../fstest link fstest_d22bbf65b5ea67dbd4f2534c56018c49 _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
2022-07-22T20:52:25.832 INFO:tasks.workunit.client.0.smithi135.stderr:++ tail -1
2022-07-22T20:52:25.832 INFO:tasks.workunit.client.0.smithi135.stderr:+ r=EXDEV
2022-07-22T20:52:25.832 INFO:tasks.workunit.client.0.smithi135.stderr:+ echo EXDEV
2022-07-22T20:52:25.832 INFO:tasks.workunit.client.0.smithi135.stderr:+ egrep '^0$'
2022-07-22T20:52:25.833 INFO:tasks.workunit.client.0.smithi135.stderr:+ '[' 1 -eq 0 ']'
2022-07-22T20:52:25.833 INFO:tasks.workunit.client.0.smithi135.stderr:+ echo 'not ok 4'
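The operations these tests perform can be sketched as follows. This is an illustrative Python sketch of pjd link tests #3 and #4 (not the actual pjd shell harness), run on a local filesystem where the MDS race does not exist:

```python
import os
import tempfile

# Illustrative sketch (not the actual pjd shell harness) of pjd link
# tests #3 and #4: unlink a 255-character name, then immediately hard-link
# another file to that same name. pjd expects both operations to return 0;
# the buggy MDS answered the link with EXDEV.
name = "_123456789" * 25 + "_1234"          # 255 characters, as in the logs
assert len(name) == 255

tmp = tempfile.mkdtemp()
open(os.path.join(tmp, name), "w").close()
open(os.path.join(tmp, "fstest_src"), "w").close()

os.unlink(os.path.join(tmp, name))           # test #3: expect 0
os.link(os.path.join(tmp, "fstest_src"),     # test #4: expect 0, but the
        os.path.join(tmp, name))             # buggy MDS replied EXDEV here
print("link created:", os.path.exists(os.path.join(tmp, name)))
```

On a healthy CephFS mount the same sequence should succeed too; with the buggy MDS the `os.link()` call raised OSError with errno EXDEV.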
And in the remote/smithi141/log/a190153a-09fe-11ed-842f-001a4aab830c/ceph-mds.b.log.gz, before pjd test #4 runs, test #3 unlinks the file _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234:
teuthology logs:
2022-07-22T20:52:25.828 INFO:tasks.workunit.client.0.smithi135.stderr:+ ntest=3
2022-07-22T20:52:25.828 INFO:tasks.workunit.client.0.smithi135.stderr:+ expect 0 unlink _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
mds.b logs:
2022-07-22T20:52:25.348+0000 7fe2bf6d4700 1 -- [v2:172.21.15.141:6838/1279389197,v1:172.21.15.141:6839/1279389197] <== client.24956 172.21.15.135:0/377103332 1199 ==== client_request(client.24956:592 unlink #0x10000000220/_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234 2022-07-22T20:52:25.348771+0000 caller_uid=0, caller_gid=0{0,}) v4 ==== 443+0+0 (crc 0 0 0) 0x55b102d34b00 con 0x55b10295e400
...
2022-07-22T20:52:25.348+0000 7fe2bf6d4700 10 mds.0.locker process_request_cap_release client.24956 pAsXsFscr on [inode 0x100000000d3 [2,head] /volumes/_nogroup/sv_1/6ae107c1-dbbf-417a-8a51-99df917679be/client.0/tmp/tmp/_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234 auth v455 pv458 ap=3 s=0 n(v0 1=1+0)->n(v0 rc2022-07-22T20:52:25.343772+0000 1=1+0) (ilink xlockdone x=1) (iversion lock w=1 last_client=24956) caps={24956=pAsLsXsFscr/pFscr@3},l=24956 | request=2 lock=3 caps=1 authpin=1 0x55b103050000]
And after that, inode 0x100000000d3 goes into the stray directory:
2022-07-22T20:52:25.349+0000 7fe2bf6d4700 10 mds.0.locker got xlock on (dn xlock x=1 by 0x55b1028aec00) [dentry #0x100/stray1/100000000d3 [2,head] auth NULL (dn xlock x=1 by 0x55b1028aec00) (dversion lock) pv=0 v=492 ap=2 ino=(nil) state=1342177281|new | request=1 lock=1 authpin=1 0x55b103059b80]
So when pjd test #4 began later:
2022-07-22T20:52:25.353+0000 7fe2bf6d4700 1 -- [v2:172.21.15.141:6838/1279389197,v1:172.21.15.141:6839/1279389197] <== client.24956 172.21.15.135:0/377103332 1200 ==== client_request(client.24956:593 link #0x10000000220/_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234 #0x10000000220/fstest_d22bbf65b5ea67dbd4f2534c56018c49 2022-07-22T20:52:25.353771+0000 caller_uid=0, caller_gid=0{0,}) v4 ==== 438+0+0 (crc 0 0 0) 0x55b102b4d8c0 con 0x55b10295e400
And when traversing path2, it finds the target inode in the stray directory:
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 7 mds.0.cache traverse: opening base ino 0x10000000220 snap head
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 10 mds.0.locker try_rdlock_snap_layout request(client.24956:593 nref=3 cr=0x55b102b4d8c0) [inode 0x10000000220 [...2,head] /volumes/_nogroup/sv_1/6ae107c1-dbbf-417a-8a51-99df917679be/client.0/tmp/tmp/ auth{1=1,2=1,3=1,4=1} v199 pv209 ap=6 DIRTYPARENT f(v0 m2022-07-22T20:52:25.274773+0000 1=0+1)->f(v0 m2022-07-22T20:52:25.348771+0000 1=1+0) n(v3 rc2022-07-22T20:52:25.274773+0000 2=0+2)->n(v3 rc2022-07-22T20:52:25.348771+0000 1=0+1) (iauth sync r=1) (isnap sync r=5) (inest mix w=4 dirty) (ipolicy sync r=1) (ifile excl w=4) (iversion lock w=4 last_client=24956) caps={24956=pAsLsXsFsxcr/pAsLsXsFsxcral@52},l=24956 | dirtyscattered=1 request=0 lock=6 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 replicated=1 dirty=1 authpin=1 scrubqueue=0 0x55b1030a2c00]
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 12 mds.0.cache traverse: path seg depth 0 'fstest_d22bbf65b5ea67dbd4f2534c56018c49' snapid head
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 20 mds.0.cache.dir(0x10000000220) lookup (fstest_d22bbf65b5ea67dbd4f2534c56018c49, 'head')
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 20 mds.0.cache.dir(0x10000000220) hit -> (fstest_d22bbf65b5ea67dbd4f2534c56018c49,head)
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 7 mds.0.cache linking in remote in [inode 0x100000000d3 [2,head] ~mds0/stray1/100000000d3 auth v455 pv493 ap=4 s=0 n(v0 1=1+0)->n(v0 rc2022-07-22T20:52:25.348771+0000 1=1+0) (ilink xlockdone x=2) (iversion lock w=2 last_client=24956) caps={24956=pAsXsFscr/pFscr@3},l=24956 | request=3 lock=5 caps=1 authpin=1 0x55b103050000]
The MDS then assumes the two paths are in different subvolumes and fails the request with EXDEV:
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 7 mds.0.server target is in different subvolume, failing...
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 7 mds.0.server reply_client_request -18 ((18) Invalid cross-device link) client_request(client.24956:593 link #0x10000000220/_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234 #0x10000000220/fstest_d22bbf65b5ea67dbd4f2534c56018c49 2022-07-22T20:52:25.353771+0000 caller_uid=0, caller_gid=0{0,}) v4
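The faulty decision can be modeled roughly like this (a toy Python model with hypothetical helper names, not the real MDS code): once the in-flight unlink has moved the target's primary dentry under a stray directory, the subvolume comparison no longer sees both paths in the same subvolume, so the MDS answers EXDEV.

```python
import errno

# Toy model (hypothetical names, not the actual MDS implementation) of the
# subvolume check that misfires. After test #3's unlink, the target inode's
# primary dentry lives under a stray directory (~mds0/strayN/...), so its
# subvolume can no longer be resolved to match the link source's subvolume.
def subvolume_of(path):
    parts = path.lstrip("/").split("/")
    if parts[0].startswith("~mds"):   # stray dir: no subvolume resolvable
        return None
    # e.g. /volumes/_nogroup/sv_1/... -> subvolume root volumes/_nogroup/sv_1
    return "/".join(parts[:3]) if parts[0] == "volumes" else None

def handle_link(target_path, source_path):
    if subvolume_of(target_path) != subvolume_of(source_path):
        # "target is in different subvolume, failing..."
        return -errno.EXDEV
    return 0

src = "/volumes/_nogroup/sv_1/6ae107c1-dbbf-417a-8a51-99df917679be/client.0/tmp/tmp/fstest_d22bbf65b5ea67dbd4f2534c56018c49"
# while the unlink is still in flight, the target already sits under stray1:
tgt = "~mds0/stray1/100000000d3"
print(handle_link(tgt, src))   # -18, i.e. -EXDEV, matching the log above
```

Once the unlink fully finishes, the stray entry is purged, the lookup of path2 finds nothing, and the link proceeds normally, which is why waiting for the unlink fixes the race.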
Updated by Xiubo Li almost 2 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 47399
Updated by Xiubo Li about 1 year ago
- Subject changed from [RHEL stock] pjd test failures to [RHEL stock] pjd test failures (a bug that needs to wait for the unlink to finish)
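The new subject summarizes the fix direction: instead of replying EXDEV, the racing link request should wait until the unlink finishes. A toy sketch of that idea (hypothetical names; not the actual change in PR 47399):

```python
import threading
import time

# Toy sketch (hypothetical names, not the code in PR 47399) of the fix
# direction: if the link target is found in a stray directory because an
# unlink is still in flight, park the request until the unlink completes
# and then proceed, instead of failing with EXDEV.
unlink_done = threading.Event()

def handle_link_fixed(target_in_stray):
    if target_in_stray():
        unlink_done.wait()   # wait for the in-flight unlink to commit
    return 0                 # the retried lookup now sees no stray target

def unlink_worker():
    time.sleep(0.01)         # pretend journaling the unlink takes a while
    unlink_done.set()

threading.Thread(target=unlink_worker).start()
print(handle_link_fixed(lambda: not unlink_done.is_set()))  # 0, not -EXDEV
```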
Updated by Venky Shankar about 1 year ago
- Category set to Correctness/Safety
- Status changed from Fix Under Review to Pending Backport
- Target version set to v19.0.0
- Backport changed from quincy,pacific to reef,quincy,pacific
Updated by Backport Bot about 1 year ago
- Copied to Backport #59384: reef: [RHEL stock] pjd test failures(a bug that need to wait the unlink to finish) added
Updated by Backport Bot about 1 year ago
- Copied to Backport #59385: quincy: [RHEL stock] pjd test failures(a bug that need to wait the unlink to finish) added
Updated by Backport Bot about 1 year ago
- Copied to Backport #59386: pacific: [RHEL stock] pjd test failures(a bug that need to wait the unlink to finish) added
Updated by Kotresh Hiremath Ravishankar about 1 year ago
The reef backport is not merged yet. Noting this here for the record.
Updated by Patrick Donnelly 9 months ago
- Related to Bug #58340: mds: fsstress.sh hangs with multimds (deadlock between unlink and reintegrate straydn(rename)) added
Updated by Patrick Donnelly 9 months ago
- Related to Bug #61818: mds: deadlock between unlink and linkmerge added