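A workqueue hangs on my board (ARM Linux). At first I could log in to the board over ssh. Later, ssh still connects, but I never get a shell prompt. I captured some information with SysRq; the (partial) SysRq output looks like this: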
kworker/0:2H R running task 0 19783 2 0x00000028
Workqueue: kblockd blk_mq_run_work_fn
Call trace:
__switch_to+0xf4/0x120
__schedule+0x248/0x460
preempt_schedule_common+0x24/0x4c
preempt_schedule+0x28/0x30
_raw_spin_unlock_irqrestore+0x30/0x4c
__wake_up_common_lock+0x88/0xc4
__wake_up+0x14/0x1c
wake_up_bit+0x78/0xa0
end_buffer_read_sync+0x44/0xa4
end_bio_bh_io_sync+0x30/0x60
bio_endio+0xdc/0x110
blk_update_request+0xb8/0x250
mtd_blktrans_work+0xdc/0x1a0
mtd_queue_rq+0x50/0x84
blk_mq_dispatch_rq_list+0xa8/0x43c
blk_mq_do_dispatch_sched+0x78/0x110
blk_mq_sched_dispatch_requests+0x118/0x190
__blk_mq_run_hw_queue+0xc4/0x114
blk_mq_run_work_fn+0x1c/0x24
process_one_work+0x1c8/0x324
worker_thread+0x68/0x3ac
kthread+0x13c/0x150
ret_from_fork+0x10/0x1c
ipc_Session2 D 0 8552 8441 0x00000000
Call trace:
__switch_to+0xf4/0x120
__schedule+0x248/0x460
schedule+0x40/0xe0
squashfs_cache_get+0x2f8/0x340
squashfs_get_datablock+0x1c/0x24
squashfs_readpage_block+0x34/0x90
squashfs_readpage+0x240/0x27c
read_pages.isra.0+0x118/0x180
__do_page_cache_readahead+0x19c/0x1c0
do_sync_mmap_readahead+0xcc/0x174
filemap_fault+0x548/0x6e0
__do_fault+0x38/0xfc
do_fault+0xb4/0x1b0
handle_pte_fault+0x68/0x19c
__handle_mm_fault+0xcc/0x120
handle_mm_fault+0x8c/0xd4
do_page_fault+0x11c/0x3e0
do_translation_fault+0xa4/0xb0
do_mem_abort+0x3c/0xa0
do_el0_ia_bp_hardening+0x3c/0xb0
el0_ia+0x18/0x1c
ipc_Session3 D 0 8598 8441 0x00000000
Call trace:
__switch_to+0xf4/0x120
__schedule+0x248/0x460
schedule+0x40/0xe0
squashfs_cache_get+0x2f8/0x340
squashfs_get_datablock+0x1c/0x24
squashfs_readpage_block+0x34/0x90
squashfs_readpage+0x240/0x27c
read_pages.isra.0+0x118/0x180
__do_page_cache_readahead+0x19c/0x1c0
do_sync_mmap_readahead+0xcc/0x174
filemap_fault+0x548/0x6e0
__do_fault+0x38/0xfc
do_fault+0xb4/0x1b0
handle_pte_fault+0x68/0x19c
__handle_mm_fault+0xcc/0x120
handle_mm_fault+0x8c/0xd4
do_page_fault+0x11c/0x3e0
do_translation_fault+0xa4/0xb0
do_mem_abort+0x3c/0xa0
do_el0_ia_bp_hardening+0x3c/0xb0
el0_ia+0x18/0x1c
...
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x80
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=3/256 refcnt=4
pending: phy_state_machine, neigh_periodic_work, do_cache_clean
workqueue mm_percpu_wq: flags=0x8
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
pending: vmstat_update
workqueue writeback: flags=0x4a
pwq 4: cpus=0-1 flags=0x4 nice=0 active=2/256 refcnt=4
in-flight: 8294:wb_workfn wb_workfn
workqueue kblockd: flags=0x18
pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=2/256 refcnt=3
in-flight: 19783:blk_mq_run_work_fn
pending: blk_mq_run_work_fn
workqueue mmc_complete: flags=0x18
pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
pending: mmc_blk_mq_complete_work
pool 1: cpus=0 node=0 flags=0x0 nice=-20 hung=21394s workers=3 idle: 6724 1804
pool 4: cpus=0-1 flags=0x4 nice=0 hung=0s workers=3 idle: 19890 12972
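A long dump like this makes it easy to miss the stalled pool, so a small parser can help. This is a minimal sketch (Python 3.8+), a hypothetical helper of my own and not a kernel tool. It reads the text printed under "Showing busy workqueues and worker pools:", records the hung= value of each pool line, and attaches every in-flight/pending entry to a pool. It assumes that the number after "pwq" is the id of the worker pool that queue feeds. The dump is consistent with that: pwq 1 and pool 1 both show cpus=0 nice=-20, and pwq 4 and pool 4 both show cpus=0-1 flags=0x4.

```python
import re
from typing import Dict, List, Optional, Tuple

# Line shapes taken from the SysRq workqueue dump (leading spaces stripped):
#   workqueue kblockd: flags=0x18
#   pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=2/256 refcnt=3
#   in-flight: 19783:blk_mq_run_work_fn
#   pool 1: cpus=0 node=0 flags=0x0 nice=-20 hung=21394s workers=3 idle: 6724 1804
WQ_RE = re.compile(r"^workqueue ([^\s:]+):")
PWQ_RE = re.compile(r"^pwq (\d+):")
POOL_RE = re.compile(r"^pool (\d+):.*\bhung=(\d+)s")


def find_hung_pools(dump: str, min_hung_s: int = 1) -> List[Tuple[int, int, List[str]]]:
    """Return (pool_id, hung_seconds, work_items) for pools hung >= min_hung_s.

    Assumption: the number after "pwq" is the id of the worker pool that the
    pool_workqueue feeds, so work listed under "pwq N" belongs to "pool N".
    """
    work: Dict[int, List[str]] = {}  # pool id -> "workqueue: entry" strings
    hung: Dict[int, int] = {}        # pool id -> seconds reported by hung=
    wq: Optional[str] = None
    pool: Optional[int] = None
    for raw in dump.splitlines():
        line = raw.strip()
        if m := WQ_RE.match(line):
            wq, pool = m.group(1), None
        elif m := PWQ_RE.match(line):
            pool = int(m.group(1))
        elif m := POOL_RE.match(line):
            hung[int(m.group(1))] = int(m.group(2))
            pool = None
        elif pool is not None and line.startswith(("in-flight:", "pending:")):
            work.setdefault(pool, []).append(f"{wq}: {line}")
    result = [(pid, secs, work.get(pid, [])) for pid, secs in hung.items() if secs >= min_hung_s]
    # Longest-stalled pool first.
    return sorted(result, key=lambda t: t[1], reverse=True)
```

For example, with the SysRq output saved to a file (sysrq.txt is just an example name), `find_hung_pools(open("sysrq.txt").read())` on the dump above returns only pool 1 (21394 s; pool 4 reports hung=0s). It lists three work items for that pool: kblockd's in-flight 19783:blk_mq_run_work_fn, kblockd's pending blk_mq_run_work_fn, and mmc_complete's pending mmc_blk_mq_complete_work. The 19783 in the in-flight entry is the PID of the kworker/0:2H task whose call trace opens the dump.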
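As shown above, pool 1 has been hung for about 5.9 hours (hung=21394s). The stuck work item is probably blk_mq_run_work_fn (most likely) or mmc_blk_mq_complete_work. Many threads and processes are also in D state, as top shows: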
/usr/bin# top
Mem: 487160K used, 12976K free, 1172K shrd, 9344K buff, 51200K cached
CPU: 0% usr 54% sys 0% nic 0% idle 45% io 0% irq 0% sirq
Load average: 90.99 90.18 88.74 5/226 30760
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
8445 8441 root D 458m 94% 50% /usr/bin/app
30760 29801 root R 3300 1% 5% top
8444 8441 root D 9688 2% 0% /usr/bin/Daemon
329 1 root S 3724 1% 0% /sbin/logd -S 1024
384 1 root S 3440 1% 0% /usr/sbin/crond -f -c /etc/crontabs -
205 1 root S 3440 1% 0% /bin/ash --login
29592 1 root D 3440 1% 0% -ash
8939 1 root D 3440 1% 0% -ash
10680 1 root D 3440 1% 0% -ash
9210 1 root D 3440 1% 0% -ash
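Linux counts tasks in uninterruptible sleep (D state) toward the load average, which is why the load is around 91 even though only 5 of 226 tasks are runnable. To see which tasks are blocked and where, the following is a minimal sketch that scans /proc/<pid>/stat for state D and reads /proc/<pid>/wchan. It is a hypothetical helper and assumes a Python interpreter exists on the board, which a small embedded image may not have. Because the ipc_Session traces show page faults blocked in squashfs_cache_get, starting a new program from the same squashfs root may itself hang. The in-kernel alternative is SysRq-W (`echo w > /proc/sysrq-trigger`), which dumps blocked tasks to the kernel log.

```python
import os
from typing import List, Optional, Tuple


def parse_stat(stat_line: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' instead of on whitespace.
    """
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    return int(pid_str), comm, tail.split()[0]


def d_state_tasks(proc_root: str = "/proc") -> List[Tuple[int, str, Optional[str]]]:
    """Return sorted (pid, comm, wchan) for every process whose state is 'D'.

    Only top-level /proc/<pid> entries are scanned, so individual threads
    (listed under /proc/<pid>/task/) are not reported separately.
    """
    found = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        base = os.path.join(proc_root, name)
        try:
            with open(os.path.join(base, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state != "D":
            continue
        wchan = None
        try:
            # Kernel function the task is sleeping in, if the kernel exposes it.
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip() or None
        except OSError:
            pass
        found.append((pid, comm, wchan))
    return sorted(found)
```

Printing `d_state_tasks()` gives one (pid, comm, wchan) tuple per blocked process; tasks whose wchan is the same kernel function are most likely waiting on the same resource.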
Can anyone tell me why this happens and how to deal with it? Thanks.