linux-kernel - Is there any solution to the XFS lockup in linux?

Question

Apparently there is a known problem of XFS locking up the kernel/processes and corrupting volumes under heavy traffic. Some web pages talk about it, but I was not able to figure out which pages are new and may have a solution.

My company's deployments have Debian with kernel 3.4.107, xfsprogs 3.1.4, and large storage arrays. We have large data (PB) and high throughput (GB/sec) using async IO to several large volumes. We constantly experience these unpredictable lockups on several systems. Kernel logs/dmesg show something like the following:

2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986515] INFO: task Sr2dReceiver-5:46829 blocked for more than 120 seconds.
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986518] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986520] Sr2dReceiver-5  D ffffffff8105b39e     0 46829   7284 0x00000000
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986524]  ffff881e71f57b38 0000000000000082 000000000000000b ffff884066763180
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986529]  0000000000000000 ffff884066763180 0000000000011180 0000000000011180
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986532]  ffff881e71f57fd8 ffff881e71f56000 0000000000011180 ffff881e71f56000
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986536] Call Trace:
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986545]  [<ffffffff814ffe9f>] schedule+0x64/0x66
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986548]  [<ffffffff815005f3>] rwsem_down_failed_common+0xdb/0x10d
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986551]  [<ffffffff81500638>] rwsem_down_write_failed+0x13/0x15
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986555]  [<ffffffff8126b583>] call_rwsem_down_write_failed+0x13/0x20
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986558]  [<ffffffff814ff320>] ? down_write+0x25/0x27
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986572]  [<ffffffffa01f29e0>] xfs_ilock+0xbc/0x12e [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986580]  [<ffffffffa01eec71>] xfs_rw_ilock+0x2c/0x33 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986586]  [<ffffffffa01eec71>] ? xfs_rw_ilock+0x2c/0x33 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986593]  [<ffffffffa01ef234>] xfs_file_aio_write_checks+0x41/0xfe [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986600]  [<ffffffffa01ef358>] xfs_file_buffered_aio_write+0x67/0x179 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986603]  [<ffffffff8150099a>] ? _raw_spin_unlock_irqrestore+0x30/0x3d
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986611]  [<ffffffffa01ef81d>] xfs_file_aio_write+0x163/0x1b5 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986614]  [<ffffffff8106f1af>] ? futex_wait+0x22c/0x244
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986619]  [<ffffffff8110038e>] do_sync_write+0xd9/0x116
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986622]  [<ffffffff8150095f>] ? _raw_spin_unlock+0x26/0x31
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986634]  [<ffffffff8106f2f1>] ? futex_wake+0xe8/0xfa
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986637]  [<ffffffff81100d1d>] vfs_write+0xae/0x10a
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986639]  [<ffffffff811015b3>] ? fget_light+0xb0/0xbf
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986642]  [<ffffffff81100dd3>] sys_pwrite64+0x5a/0x79
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986645]  [<ffffffff81506912>] system_call_fastpath+0x16/0x1b

Lockups leave the system in a bad state. The hung processes sit in D state and cannot be killed, even with signal 9. The only way to resume operations is to reboot and repair XFS, after which the system works for a while. Occasionally, though, some volumes cannot be repaired after a lockup because they are totally corrupted, and we have to rebuild them with mkfs.

As a last resort, we now run xfs_repair periodically, which has reduced the frequency of lockups and data loss to some extent. But incidents still occur often enough that we need a real solution.

I was wondering if there is a solution for this with kernel 3.4.107, e.g. some patch that we may apply. Due to the large number of deployments and other software issues, we cannot upgrade the kernel in the near future.

However, we are working towards updating our applications so that we can run on kernel 3.16 in our next releases. Does anyone know if this XFS lockup problem was fixed in 3.16?

score 0 · Accepted Answer

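Some people have run into this, but it is not an XFS problem: it happens because the kernel is unable to flush the dirty pages within the 120-second period. Have a look here, but check the numbers they use as defaults against your own system.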

http://blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/

and here

http://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/

You can see what your dirty cache ratio is by running this

sysctl -a | grep dirty

or

cat /proc/sys/vm/dirty_ratio
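
To look at the related writeback settings and the current amount of dirty memory in one go, something like the following sketch may help. The function name `show_dirty_state` and its optional `proc_root` argument are made up for this example; the files it reads are the standard ones under /proc/sys/vm and /proc/meminfo.

```shell
#!/bin/sh
# Print the VM writeback tunables and the current amount of dirty memory.
# proc_root is a hypothetical parameter (defaults to /proc) so the function
# can also be pointed at a saved copy of the files.
show_dirty_state() {
    proc_root="${1:-/proc}"
    for knob in dirty_ratio dirty_background_ratio \
                dirty_expire_centisecs dirty_writeback_centisecs; do
        # Each tunable is a single integer stored in its own file.
        printf 'vm.%s = %s\n' "$knob" "$(cat "$proc_root/sys/vm/$knob")"
    done
    # Dirty: memory waiting to be written back; Writeback: being written now.
    grep -E '^(Dirty|Writeback):' "$proc_root/meminfo"
}
```

Calling `show_dirty_state` with no argument reads the live values. Running it repeatedly while the workload is active gives a rough idea of whether dirty memory keeps piling up.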

The best write-up I could find is here...

https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/

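Essentially, you need to tune your application so that it can write its dirty buffers to disk within the time period, or change the timer period, etc.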

You can also see some interesting parameters with the following

sysctl -a | grep hung

You can increase the timeout permanently in /etc/sysctl.conf like this...

kernel.hung_task_timeout_secs = 300
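
The same file can also carry the dirty-page thresholds discussed above. The fragment below is only a sketch: the two values are illustrative assumptions, not recommendations, and they need to be checked against your own defaults and workload. The idea is to start background writeback earlier and cap dirty memory lower, so that less data is queued up when a flush has to happen.

```
# /etc/sysctl.conf (illustrative values only, not a recommendation)
# Start background writeback once this percentage of memory is dirty.
vm.dirty_background_ratio = 5
# Make writers block and flush once this percentage of memory is dirty.
vm.dirty_ratio = 10
```

Running `sysctl -p` as root reloads the file without a reboot. Note that raising kernel.hung_task_timeout_secs only changes when the warning is printed; it does not make the blocked task finish any sooner.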

score 0 · Accepted Answer

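Does anyone know if this XFS lockup problem was fixed in 3.16?

A Short Guide to Kernel Debugging says:

Searching for “xfs splice deadlock” turns up an email thread from 2011 that describes this problem. However, bisecting the kernel source repository shows that the bug wasn't really addressed until April 2014 (8d02076), for release in Linux 3.16.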
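
If you want to check for yourself which release carries that commit, a sketch like this may do, assuming you have a local clone of the mainline kernel tree. The function name `release_containing` and the `repo_dir` argument are made up for this example; `git describe --contains` is a standard git command.

```shell
#!/bin/sh
# Name a commit relative to a tag that contains it.
# repo_dir is a hypothetical path to a local clone of the kernel tree.
release_containing() {
    repo_dir="$1"
    commit="$2"
    # --contains looks for a tag that comes after the commit,
    # i.e. a tag whose history includes it.
    git -C "$repo_dir" describe --contains "$commit"
}
```

For example, `release_containing /path/to/linux 8d02076` would be run against your own checkout, with the path replaced by wherever the clone lives.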
