c - Linux kernel module freezes the computer when iterating over running processes to access their open files

Question

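I am developing a kernel module against a 3.x kernel.

I have a function that determines whether a running process has a given file open.

Here is my code (see my comments after it):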

struct task_struct *    process = NULL;
struct files_struct *   task_files = NULL;
struct fdtable *        fdt = NULL;
int                     fd_i;
char                    tmpbuf[256];
char *                  process_path = "";

for_each_process(process)
{
    // Ignore processes without files
    if (process->files == NULL)
        continue;

    printk(KERN_INFO "task_lock()...\n");
    task_lock(process);
    printk(KERN_INFO "task_lock() DONE\n");
    task_files = process->files;
    printk(KERN_INFO "task_unlock()...\n");
    task_unlock(process);
    printk(KERN_INFO "task_unlock() DONE\n");

    printk(KERN_INFO "files_fdtable()...\n");
    fdt = files_fdtable(task_files);
    printk(KERN_INFO "files_fdtable() DONE\n");

    printk(KERN_INFO "Iterating files...\n");
    for (fd_i = 0; fd_i < fdt->max_fds; fd_i++)
    {
        if (fcheck_files(task_files, fd_i) == my_file)
        {
            if (process->mm)
            {
                if (process->mm->exe_file)
                {

                    process_path = d_path(&process->mm->exe_file->f_path, tmpbuf, sizeof(tmpbuf));
                    break;
                } else {
                    printk(KERN_INFO "process->mm->exe_file is NULL\n");
                }
            } else {
                printk(KERN_INFO "process->mm is NULL\n");
            }
        }
    }
    printk(KERN_INFO "Files iteration finished\n");
}

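This code works, and the variable process_path ends up containing the path of the process that has the given file open. But when the machine is under heavy load (so this code runs very often), the machine freezes after a while, and the last debug message printed is: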

task_unlock() DONE

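I don't see what I am doing wrong.

for_each_process does not call spin_lock and spin_unlock on the process, so I use task_lock and task_unlock.
files_fdtable calls spin_lock and spin_unlock, so I don't.
fcheck_files also calls spin_lock and spin_unlock, so I don't.
d_path takes care of locking, so I don't.

Can you explain why my code freezes the machine and how to fix it?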

score 1 · Accepted Answer

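The way you designed the module is what freezes the system. Note that you use for_each_process(), which walks every process on the system, so as you put load on the system and the number of processes grows, each pass does more work. Inside the for_each_process() loop you also call task_lock()/task_unlock() and perform several operations on each process, all of which are expensive because each has its own locks to take. Under low load the cost is not noticeable, but as the load increases the module's running time grows with it.

I suggest instrumenting your module with something like ftrace, and avoiding heavy use of printk (printk output also has to be scheduled; klogd is used for this). Then check how your module behaves in the kernel under low load. Measure how much time it spends in each loop iteration and you will see for yourself. The kernel is a big beast with a lot going on inside...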

score 0 · Accepted Answer

Try the following:

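struct task_struct *g, *p;

/* Hold the task list read lock for the whole walk. */
read_lock(&tasklist_lock);

do_each_thread(g, p) {
  task_lock(p);

  /* check_for_file() is a helper you write yourself; file is the file being searched for. */
  if (check_for_file(p, file)) {
    task_unlock(p);
    goto next;  /* leaves both loops hidden inside the macros */
  }

  task_unlock(p);
} while_each_thread(g, p);

next:

read_unlock(&tasklist_lock);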

score -1 · Accepted Answer

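I finally fixed my code.

I first implemented a kind of cache of the relationships between processes and open files, stored in a linked list, and refactored my code into several functions. After also fixing a memory leak, it is now working.

Thanks everyone for your help.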
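
The answer does not show the cache itself, so the following is only a minimal userspace sketch of the idea, not the author's code. All names (open_entry, open_cache, cache_add, cache_remove_pid, cache_find_opener, cache_clear) are hypothetical, and file_id is an assumed stand-in for whatever identifies the file in the real module. A kernel version would more likely use struct list_head with kmalloc/kfree and would need its own locking, which is omitted here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cache entry: one (pid, file) relation per node. */
struct open_entry {
    int pid;                  /* process that has the file open */
    unsigned long file_id;    /* assumed stand-in for a file identifier */
    struct open_entry *next;
};

struct open_cache {
    struct open_entry *head;
};

/* Record that pid has file_id open. Returns 0 on success, -1 if out of memory. */
static int cache_add(struct open_cache *cache, int pid, unsigned long file_id)
{
    struct open_entry *e;

    assert(cache != NULL);
    e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->pid = pid;
    e->file_id = file_id;
    e->next = cache->head;    /* push at the front of the list */
    cache->head = e;
    return 0;
}

/* Drop every entry that belongs to pid. Returns the number of entries freed. */
static int cache_remove_pid(struct open_cache *cache, int pid)
{
    struct open_entry **link = &cache->head;
    int removed = 0;

    while (*link != NULL) {
        struct open_entry *e = *link;

        if (e->pid == pid) {
            *link = e->next;  /* unlink, then free, so nothing leaks */
            free(e);
            removed++;
        } else {
            link = &e->next;
        }
    }
    return removed;
}

/* Return the pid of the most recently added opener of file_id, or -1 if none. */
static int cache_find_opener(const struct open_cache *cache, unsigned long file_id)
{
    const struct open_entry *e;

    for (e = cache->head; e != NULL; e = e->next)
        if (e->file_id == file_id)
            return e->pid;
    return -1;
}

/* Free the whole list, for example when the module unloads. */
static void cache_clear(struct open_cache *cache)
{
    while (cache->head != NULL) {
        struct open_entry *e = cache->head;

        cache->head = e->next;
        free(e);
    }
}
```

With a structure like this, the lookup walks only the cached relations instead of every process and every file descriptor. The cost moves to keeping the cache up to date when files are opened and closed and when processes exit, and every node that is removed has to be freed.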
