Bug #3087 (closed): Hung ceph_msg_kfree

Added by Matt Garner over 11 years ago. Updated over 9 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
libceph
Target version:
-
% Done:
0%

Source:
Community (user)

Description

Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.316059] INFO: task smbd:31483 blocked for more than 120 seconds.
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.316776] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317650] smbd D 0000000000000000 0 31483 995 0x00000000
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317657] ffff8800e5719ae8 0000000000000082 ffffffffa00bff1b 0000000000000000
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317664] ffff8800e5719fd8 ffff8800e5719fd8 ffff8800e5719fd8 0000000000013780
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317670] ffff8801083b96f0 ffff880107050000 ffff8800e5719ad8 ffff8800e22e3d10
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317676] Call Trace:
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317690] [<ffffffffa00bff1b>] ? ceph_msg_kfree+0x2b/0x50 [libceph]
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317697] [<ffffffff8165a55f>] schedule+0x3f/0x60
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317704] [<ffffffff8165b367>] __mutex_lock_slowpath+0xd7/0x150
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317708] [<ffffffff8165af7a>] mutex_lock+0x2a/0x50
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317713] [<ffffffff81184946>] do_lookup+0x1d6/0x310
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317718] [<ffffffff8129c82c>] ? security_inode_permission+0x1c/0x30
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317722] [<ffffffff81185278>] link_path_walk+0x138/0x870
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317726] [<ffffffff8165c46e>] ? _raw_spin_lock+0xe/0x20
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317730] [<ffffffff8119535e>] ? vfsmount_lock_local_unlock+0x1e/0x30
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317734] [<ffffffff8118633f>] ? path_init+0x16f/0x3c0
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317738] [<ffffffff8119724f>] ? mntput+0x1f/0x30
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317741] [<ffffffff811865e8>] path_lookupat+0x58/0x750
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317747] [<ffffffff81318c77>] ? __strncpy_from_user+0x27/0x60
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317751] [<ffffffff81186d11>] do_path_lookup+0x31/0xc0
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317754] [<ffffffff81187819>] user_path_at_empty+0x59/0xa0
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317759] [<ffffffff81177452>] ? do_sync_read+0xd2/0x110
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317762] [<ffffffff81187871>] user_path_at+0x11/0x20
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317767] [<ffffffff8117c8da>] vfs_fstatat+0x3a/0x70
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317770] [<ffffffff8117c92e>] vfs_lstat+0x1e/0x20
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317774] [<ffffffff8117caca>] sys_newlstat+0x1a/0x40
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317777] [<ffffffff81177e0d>] ? vfs_read+0x10d/0x180
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317781] [<ffffffff81177eca>] ? sys_read+0x4a/0x90
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317785] [<ffffffff8118bf36>] ? sys_poll+0x76/0x110
Sep 4 00:37:00 rmi-orem-ceph1-mds1 kernel: [233040.317789] [<ffffffff81664a82>] system_call_fastpath+0x16/0x1b
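
A note on reading the trace: on x86 kernels of this era the stack unwinder prefixes speculative frames with a question mark, so the ceph_msg_kfree entry in the title is likely stale stack data rather than the real caller; the reliable part of the chain is schedule -> __mutex_lock_slowpath -> mutex_lock -> do_lookup, i.e. smbd is waiting on a directory i_mutex during path lookup. A minimal sketch (not part of the original report) that separates the two kinds of frames:

# The unwinder prints '?' before addresses it merely found on the
# stack, so those frames may be stale; unmarked frames are the real
# call chain.
TRACE = """\
[<ffffffffa00bff1b>] ? ceph_msg_kfree+0x2b/0x50 [libceph]
[<ffffffff8165a55f>] schedule+0x3f/0x60
[<ffffffff8165b367>] __mutex_lock_slowpath+0xd7/0x150
[<ffffffff8165af7a>] mutex_lock+0x2a/0x50
[<ffffffff81184946>] do_lookup+0x1d6/0x310
"""

for line in TRACE.splitlines():
    frame = line.split("] ", 1)[1]          # drop the raw address
    kind = "speculative" if frame.startswith("? ") else "reliable"
    print(f"{kind:11} {frame.lstrip('? ')}")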

Running in a production environment with:
5 mons
1 mds
4 osd

All machines are vanilla Ubuntu Server 12.04
Linux rmi-orem-ceph1-mds1.readymicro.local 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Ceph from packages:
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)

The mds machine mounts the cephfs with the kernel driver and exports it via samba (smbd version 3.6.3).

Once this message appears, all access to the cephfs via the kernel driver blocks indefinitely.

This is repeatable on my system by transferring large volumes of data in via samba; it typically recurs after about 100 gigabytes, though the exact amount of data transferred varies (see the sketch below).
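
A hypothetical load-generator sketch of the reproduction described above; only the ~100 GB figure comes from the report, the mount point and file sizes are assumptions for illustration:

import os

MOUNT = "/mnt/ceph-share"       # hypothetical CIFS mount of the share
TOTAL = 100 * 1024**3           # ~100 GB, per the report
CHUNK = b"\0" * (4 * 1024**2)   # 4 MiB per write

written, i = 0, 0
while written < TOTAL:
    # one 1 GiB file per iteration: 256 chunks of 4 MiB
    with open(os.path.join(MOUNT, f"fill-{i:06d}.bin"), "wb") as f:
        for _ in range(256):
            f.write(CHUNK)
    written += 256 * len(CHUNK)
    i += 1
    print(f"wrote {written // 1024**3} GiB")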

Please suggest what other information and logs would be helpful.

This is on production equipment; however, I'm in the process of configuring a similar setup in my lab to try to recreate the issue where I can enable more logging.
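
When the hang recurs, the usual first diagnostics for a stuck kernel client are a blocked-task dump (SysRq 'w') and the in-flight request lists the ceph kernel client exposes under debugfs. A minimal collection sketch, assuming debugfs is mounted at /sys/kernel/debug and the kernel was built with CONFIG_MAGIC_SYSRQ; run as root while the hang is in progress:

import glob
import pathlib
import subprocess

def dump_blocked_tasks():
    # SysRq 'w' logs every task in uninterruptible sleep (state D,
    # like the smbd task above) to the kernel ring buffer.
    pathlib.Path("/proc/sysrq-trigger").write_text("w")
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    print(log[-8000:])  # tail of the ring buffer with the fresh dump

def dump_inflight_requests():
    # The ceph kernel client lists pending MDS/OSD requests under
    # debugfs; a request that never completes points at the stuck op.
    for path in glob.glob("/sys/kernel/debug/ceph/*/mdsc") + \
                glob.glob("/sys/kernel/debug/ceph/*/osdc"):
        print(path)
        print(pathlib.Path(path).read_text() or "(no pending requests)")

if __name__ == "__main__":
    dump_blocked_tasks()
    dump_inflight_requests()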

Actions #1

Updated by Sage Weil over 11 years ago

  • Status changed from New to Need More Info
Actions #2

Updated by Sage Weil over 9 years ago

  • Status changed from Need More Info to Can't reproduce
