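Apologies for the long post; I couldn't formulate it any shorter. Also, maybe this is better suited for Unix & Linux Stack Exchange, but I'll try here first, since there is an ftrace tag.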

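Anyway - I'd like to observe when the machine instructions of a user program execute, within a function_graph capture made with ftrace. One problem is that I need this on an older kernel: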

$ uname -a
Linux mypc 2.6.38-16-generic #67-Ubuntu SMP Thu Sep 6 18:00:43 UTC 2012 i686 i686 i386 GNU/Linux

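...and in this version there is no UPROBES - which, as "Uprobes in 3.5" [LWN.net] notes, should be able to do something like that. (As long as I don't have to patch the original kernel, I'd be willing to try a kernel module built out of tree, as "User-Space Probes (Uprobes)" [chunghwan.com] seems to demonstrate; but as far as I can see from "0: Inode based uprobes" [LWN.net], 2.6 would probably need a full patch.)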

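However, in this version there is a /sys/kernel/debug/kprobes and a /sys/kernel/debug/tracing/kprobe_events; and Documentation/trace/kprobetrace.txt implies that a kprobe can be set directly on an address, even though I cannot find an example anywhere of how this is used.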

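In any case, I'm still not sure which addresses to use. As a small example, say I want to trace the start of the main function of the wtest.c program (included below). I can compile it and obtain a machine-instruction assembly listing like this: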

$ gcc -g -O0 wtest.c -o wtest
$ objdump -S wtest | less
...
08048474 <main>:
int main(void) {
 8048474:       55                      push   %ebp
 8048475:       89 e5                   mov    %esp,%ebp
 8048477:       83 e4 f0                and    $0xfffffff0,%esp
 804847a:       83 ec 30                sub    $0x30,%esp
 804847d:       65 a1 14 00 00 00       mov    %gs:0x14,%eax
 8048483:       89 44 24 2c             mov    %eax,0x2c(%esp)
 8048487:       31 c0                   xor    %eax,%eax
  char filename[] = "/tmp/wtest.txt";
...
  return 0;
 804850a:       b8 00 00 00 00          mov    $0x0,%eax
}
...

I would set up ftrace logging through this script:

sudo bash -c '
KDBGPATH="/sys/kernel/debug/tracing"
echo function_graph > $KDBGPATH/current_tracer
echo funcgraph-abstime > $KDBGPATH/trace_options
echo funcgraph-proc > $KDBGPATH/trace_options
echo 0 > $KDBGPATH/tracing_on
echo > $KDBGPATH/trace
echo 1 > $KDBGPATH/tracing_on ; ./wtest ; echo 0 > $KDBGPATH/tracing_on
cat $KDBGPATH/trace > wtest.ftrace
'

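You can see a portion of the (otherwise complex) resulting ftrace log in "debugging - Observing a hard-disk write in kernel space (with drivers/modules)" - Unix & Linux Stack Exchange (which is where I got the example from).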

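Basically, I'd like a printout in this ftrace log whenever the first instructions of main - say, the instructions at 0x8048474, 0x8048475, 0x8048477, 0x804847a, 0x804847d, 0x8048483 and 0x8048487 - are executed by (any) CPU. The problem is that, as far as I understand from "Anatomy of a Program in Memory" by Gustavo Duarte, these addresses are virtual addresses, as seen from the perspective of the process itself (the same perspective, I gather, that /proc/PID/maps shows)... And apparently, for kprobe_events I'd need a physical address?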

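So, my idea would be: if I can find the physical addresses that correspond to the virtual addresses of the program disassembly (say, by writing a kernel module that accepts a pid and an address and returns the physical address via procfs), I could set those addresses up as a kind of "tracepoint" via /sys/kernel/debug/tracing/kprobe_events in the script above - and hopefully get them into the ftrace log. Could this work, in principle?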

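One problem with this I found in "Linux (ubuntu), C language: Virtual to Physical Address Translation" - Code Log:

In user code, you can't know the physical address corresponding to a virtual address. This information is simply not exported outside the kernel. It could even change at any time, especially if the kernel decides to swap out part of your process's memory.
...
Pass the virtual address to the kernel using systemcall/procfs and use vmalloc_to_pfn. Return the physical address through procfs/registers.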

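However, vmalloc_to_pfn doesn't seem to be trivial either:

x86 64 - vmalloc_to_pfn returns 32 bit address on Linux 32 system. Why does it chop off higher bits of PAE physical address? - Stack Overflow

VA: 0xf8ab87fc PA using vmalloc_to_pfn: 0x36f7f7fc. But I'm actually expecting: 0x136f7f7fc.
...
The physical address falls between 4 and 5 GB. But I can't get the exact physical address, I only get the chopped-off 32-bit address. Is there another way to get the true physical address?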

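So, I'm not sure how reliably I could extract physical addresses so that they get traced by kprobes - especially since "it could even change at any time". But here I'd hope that, since the program is small and trivial, there is a reasonable chance it will not be swapped out while being traced, so that a proper capture can be obtained. (So even if I have to run the debug script above multiple times, as long as I can hope for a "proper" capture once in 10 (or even 100) runs, I'd be OK with it.)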

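Note that I'd like the output to go through ftrace, so that the timestamps are expressed in the same domain (see "Reliable Linux kernel timestamps (or adjustment thereof) with both usbmon and ftrace?" - Stack Overflow for an illustration of the timestamp problem). Thus, even if I could come up with, say, a gdb script that runs and traces the program from user space (while an ftrace capture is obtained at the same time), I'd like to avoid that, as the overhead of gdb itself would show up in the ftrace log.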

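So, in summary:

  • Is the approach of obtaining physical addresses (possibly through a separate kernel module) from virtual addresses (from the disassembly of an executable), so that they can be used to trigger a kprobe_event logged by ftrace, worth pursuing? If so, are there any examples of kernel modules that can be used for this address translation?
  • Could I otherwise use a kernel module to "register" a callback/handler function that runs when a particular memory address is executed? Then I could simply use trace_printk in that function to get an ftrace log entry (or, even without that, the handler function name itself should show up in the ftrace log), and it doesn't seem there would be too much overhead...

Actually, in this 2007 posting, "Jim Keniston - utrace-based uprobes" on the systemtap mailing list, there is an "11. Uprobes Example" (added to Documentation/uprobes.txt) which seems to be exactly that - a kernel module registering a handler function. Unfortunately, it uses linux/uprobes.h, and I only have kprobes.h in my /usr/src/linux-headers-2.6.38-16/include/linux/. Also, on my system, even systemtap complains that CONFIG_UTRACE is not enabled (see this comment)... So if there is any other approach I could use to obtain the debug trace I want, without having to recompile the kernel to get uprobes, it would be great to know...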


wtest.c

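#include <stdio.h>
#include <fcntl.h>     // open, O_CREAT, O_RDWR
#include <sys/stat.h>  // mode_t, S_IRUSR and the other permission bits
#include <unistd.h>    // write, close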

int main(void) {
  char filename[] = "/tmp/wtest.txt";
  char buffer[] = "abcd";
  int fd;
  mode_t perms = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH;

  fd = open(filename, O_RDWR|O_CREAT, perms);
  write(fd,buffer,4);
  close(fd);

  return 0;
}
1 Answer

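Obviously, this would have been much easier with the built-in uprobes on kernels 3.5+; but given that uprobes for my kernel 2.6.38 is a very deep-going patch (which I couldn't really isolate in a separate kernel module, so as to avoid patching the kernel), here is what I could note for a standalone module on 2.6.38. (Since I'm still unsure about many things, I'd still like to see an answer that corrects any misunderstandings in this post.)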

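I think I got somewhere, but not with kprobes. I'm not sure, but it seems I managed to get the physical addresses right; however, the kprobes documentation is specific that when using "@ADDR : fetch memory at ADDR (ADDR should be in kernel)", the address should be in the kernel, and the physical addresses I get are below the kernel boundary of 0xc0000000 (but then, 0xc0000000 usually belongs to the virtual memory layout?).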

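So I used a hardware breakpoint instead. The module is below, but be warned: it behaves randomly, and occasionally causes a kernel oops! By compiling the module and running, in bash: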

$ sudo bash -c 'KDBGPATH="/sys/kernel/debug/tracing" ;
echo function_graph > $KDBGPATH/current_tracer ; echo funcgraph-abstime > $KDBGPATH/trace_options
echo funcgraph-proc > $KDBGPATH/trace_options ; echo 8192 > $KDBGPATH/buffer_size_kb ;
echo 0 > $KDBGPATH/tracing_on ; echo > $KDBGPATH/trace'
$ sudo insmod ./callmodule.ko && sleep 0.1 && sudo rmmod callmodule && \
tail -n25 /var/log/syslog | tee log.txt && \
sudo cat /sys/kernel/debug/tracing/trace >> log.txt

...I get a log. I want to trace the first two instructions of main() of wtest, which for me are:

$ objdump -S wtest/wtest | grep -A3 'int main'
int main(void) {
 8048474:   55                      push   %ebp
 8048475:   89 e5                   mov    %esp,%ebp
 8048477:   83 e4 f0                and    $0xfffffff0,%esp

...at virtual addresses 0x08048474 and 0x08048475. In the syslog output, I could get, say:

...
[ 1106.383011] callmodule: parent task a: f40a9940 c: kworker/u:1 p: [14] s: stopped
[ 1106.383017] callmodule: - wtest [9404]
[ 1106.383023] callmodule: Trying to walk page table; addr task 0xEAE90CA0 ->mm ->start_code: 0x08048000 ->end_code: 0x080485F4
[ 1106.383029] callmodule: walk_ 0x8048000 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f63e5d80; *virtual (page_address) @   (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
[ 1106.383049] callmodule: walk_ 0x80483c0 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f63e5d80; *virtual (page_address) @   (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
[ 1106.383067] callmodule: walk_ 0x8048474 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f63e5d80; *virtual (page_address) @   (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
[ 1106.383083] callmodule: physaddr : (0x080483c0 ->) 0x639ec3c0 : (0x08048474 ->) 0x639ec474
[ 1106.383106] callmodule: 0x08048474 id [3]
[ 1106.383113] callmodule: 0x08048475 id [4]
[ 1106.383118] callmodule: (( 0x08048000 is_vmalloc_addr 0 virt_addr_valid 0 ))
[ 1106.383130] callmodule: cont pid task a: eae90ca0 c: wtest p: [9404] s: runnable
[ 1106.383147] initcall callmodule_init+0x0/0x1000 [callmodule] returned with preemption imbalance
[ 1106.518074] callmodule: < exit

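...meaning that it mapped the virtual address 0x08048474 to the physical address 0x639ec474. However, the physical address is not used for hardware breakpoints - there we can supply the virtual address directly to register_user_hw_breakpoint; we do, however, also need to supply the task_struct of the process. With that, I can get something like this in the ftrace output: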

...
  597.907256 |   1)   wtest-5339   |               |  handle_mm_fault() {
...
  597.907310 |   1)   wtest-5339   | + 35.627 us   |      }
  597.907311 |   1)   wtest-5339   | + 46.245 us   |    }
  597.907312 |   1)   wtest-5339   | + 56.143 us   |  }
  597.907313 |   1)   wtest-5339   |   1.039 us    |  up_read();
  597.907317 |   1)   wtest-5339   |   1.285 us    |  native_get_debugreg();
  597.907319 |   1)   wtest-5339   |   1.075 us    |  native_set_debugreg();
  597.907322 |   1)   wtest-5339   |   1.129 us    |  native_get_debugreg();
  597.907324 |   1)   wtest-5339   |   1.189 us    |  native_set_debugreg();
  597.907329 |   1)   wtest-5339   |               |  () {
  597.907333 |   1)   wtest-5339   |               |  /* callmodule: hwbp hit: id [3] */
  597.907334 |   1)   wtest-5339   |   5.567 us    |  }
  597.907336 |   1)   wtest-5339   |   1.123 us    |  native_set_debugreg();
  597.907339 |   1)   wtest-5339   |   1.130 us    |  native_get_debugreg();
  597.907341 |   1)   wtest-5339   |   1.075 us    |  native_set_debugreg();
  597.907343 |   1)   wtest-5339   |   1.075 us    |  native_get_debugreg();
  597.907345 |   1)   wtest-5339   |   1.081 us    |  native_set_debugreg();
  597.907348 |   1)   wtest-5339   |               |  () {
  597.907350 |   1)   wtest-5339   |               |  /* callmodule: hwbp hit: id [4] */
  597.907351 |   1)   wtest-5339   |   3.033 us    |  }
  597.907352 |   1)   wtest-5339   |   1.105 us    |  native_set_debugreg();
  597.907358 |   1)   wtest-5339   |   1.315 us    |  down_read_trylock();
  597.907360 |   1)   wtest-5339   |   1.123 us    |  _cond_resched();
  597.907362 |   1)   wtest-5339   |   1.027 us    |  find_vma();
  597.907364 |   1)   wtest-5339   |               |  handle_mm_fault() {
...

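...where the traces corresponding to the assembly instructions are marked by the breakpoint id. Thankfully, they come right after one another, as expected; however, ftrace has also captured some debug-register calls in between. Anyway, this is what I wanted to see.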

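Here are some notes about the module:

  • Most of the module is from "Executing/calling a user-space program, and getting its pid, from a kernel module", where a user process is started and its pid obtained
    • Since we have to get to the task_struct in order to get to the pid, here I save both (which is somewhat redundant)
  • Where function symbols are not exported: if the symbol is in kallsyms, I use a function pointer to its address; otherwise the other needed functions are copied from the kernel source
  • I didn't know how to start the user-space process in a stopped state, so after spawning I issue a SIGSTOP (which on its own seems somewhat unreliable at that point), and set the state to __TASK_STOPPED.
    • I may still sometimes get the "runnable" status where I don't expect it - however, if init exits early with an error, I've noticed wtest hanging in the process list long after it would have terminated naturally, so I guess that works.
  • To get absolute/physical addresses, I used "Walking page tables of a process in Linux" to get to the page corresponding to a virtual address, and then, digging through the kernel sources, I found page_to_phys() to get to the address (internally via the page frame number); LDD3 ch.15 helps with understanding the relationship between pfn and physical address.
    • Since here I expect to have the physical address of the page, I don't use PAGE_SHIFT, but calculate offsets directly from objdump's assembly output - I am not 100% sure this is correct, though.
    • Note (see also "How to get a struct page from any address in the Linux kernel") that the module output says the virtual address 0x08048000 is neither is_vmalloc_addr nor virt_addr_valid; I guess this should tell me that one could have used neither vmalloc_to_pfn() nor virt_to_page() to get to its physical address!?
  • Setting up kprobes for ftrace from kernel space is kind of tricky (functions need to be copied)
    • Trying to set a kprobe on the physical addresses I get (e.g. 0x639ec474) always results in "Could not insert probe(-22)"
    • Just to see whether the format is parsed, I'm trying with the kallsyms address of the tracing_on() function (0xc10bcf60) below; that seems to work - because it raises a fatal "BUG: scheduling while atomic" (apparently, we're not meant to set breakpoints in module_init?). The bug is fatal because it makes the kprobes directory disappear from the ftrace debug directory
    • Just creating the kprobe does not make it appear in the ftrace log - it also needs to be enabled; the necessary code for enabling is there, but I've never tried it, because of the previous bug
  • Finally, the breakpoint setup is from "Watch a variable (memory address) change in Linux kernel, and print stack trace when it changes?"
    • I've never seen an example of setting an executable hardware breakpoint; it kept failing for me until, through a kernel source search, I found that for HW_BREAKPOINT_X, attr.bp_len needs to be set to sizeof(long)
    • If I try to printk the attr variable(s) - from _init or from the handler - something gets seriously messed up, and whatever variable I try to print next, I get the value 0x5 (or 0x48) for it (?!)
    • Since I'm trying to use a single handler function for both breakpoints, the only reliable piece of information that survives from _init to the handler, and can differentiate between the two, seems to be bp->id
    • These ids are auto-assigned, and seem not to be reclaimed if you unregister the breakpoints (I do not unregister them, to avoid extra ftrace printouts).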

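As far as the randomness goes, I think it is because the process is not started in a stopped state; by the time it gets stopped, it ends up in a different state (or, quite possibly, I'm missing some locking somewhere). Anyway, you can also expect this in syslog: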

[ 1661.815114] callmodule: Trying to walk page table; addr task 0xEAF68CA0 ->mm ->start_code: 0x08048000 ->end_code: 0x080485F4
[ 1661.815319] callmodule: walk_ 0x8048000 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f5772000; *virtual (page_address) @ c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0
[ 1661.815837] callmodule: walk_ 0x80483c0 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f5772000; *virtual (page_address) @ c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0
[ 1661.816846] callmodule: walk_ 0x8048474 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f5772000; *virtual (page_address) @ c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0

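...that is, even with a proper task pointer (judging by start_code), only 0x0 is obtained as the physical address. Sometimes you get the same outcome, but with start_code: 0x00000000 ->end_code: 0x00000000. And sometimes a task_struct cannot be obtained, even if the pid can: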

[  833.380417] callmodule:c: pid 7663
[  833.380424] callmodule: everything all right; pid 7663 (7663)
[  833.380430] callmodule: p is NULL - exiting
[  833.516160] callmodule: < exit

Well, hopefully someone will comment and clarify some of the behavior of this module :)
Hope this helps someone,
Cheers!

Makefile

EXTRA_CFLAGS=-g -O0
obj-m += callmodule.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

callmodule.c

#include <linux/module.h>
#include <linux/slab.h> //kzalloc
#include <linux/syscalls.h> // SIGCHLD, ... sys_wait4, ...
#include <linux/kallsyms.h> // kallsyms_lookup, print_symbol
#include <linux/highmem.h> // ‘kmap_atomic’ (via pte_offset_map)
#include <asm/io.h> // page_to_phys (arch/x86/include/asm/io.h)

struct subprocess_infoB; // forward declare
// global variable - to avoid intervening too much in the return of call_usermodehelperB:
static int callmodule_pid;
static struct subprocess_infoB* callmodule_infoB;
#define TRY_USE_KPROBES 0 // 1 // enable/disable kprobes usage code
#include <linux/kprobes.h> // enable_kprobe
// for hardware breakpoint:
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>

// define a modified struct (with extra fields) here:
struct subprocess_infoB {
  struct work_struct work;
  struct completion *complete;
  char *path;
  char **argv;
  char **envp;
  int wait; //enum umh_wait wait;
  int retval;
  int (*init)(struct subprocess_info *info);
  void (*cleanup)(struct subprocess_info *info);
  void *data;
  pid_t pid;
  struct task_struct *task;
  unsigned long long last_page_physaddr;
};

struct subprocess_infoB *call_usermodehelper_setupB(char *path, char **argv,
                          char **envp, gfp_t gfp_mask);

static inline int
call_usermodehelper_fnsB(char *path, char **argv, char **envp,
            int wait, //enum umh_wait wait,
            int (*init)(struct subprocess_info *info),
            void (*cleanup)(struct subprocess_info *), void *data)
{
  struct subprocess_info *info;
  struct subprocess_infoB *infoB;
  gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
  int ret;

  populate_rootfs_wait();

  infoB = call_usermodehelper_setupB(path, argv, envp, gfp_mask);
  if (infoB == NULL) // check before infoB is dereferenced below
      return -ENOMEM;
  printk(KBUILD_MODNAME ":a: pid %d\n", infoB->pid);
  info = (struct subprocess_info *) infoB;

  call_usermodehelper_setfns(info, init, cleanup, data);
  printk(KBUILD_MODNAME ":b: pid %d\n", infoB->pid);

  // this must be called first, before infoB->pid is populated (by __call_usermodehelperB):
  ret = call_usermodehelper_exec(info, wait);

  // assign global pid (and infoB) here, so rest of the code has it:
  callmodule_pid = infoB->pid;
  callmodule_infoB = infoB;    
  printk(KBUILD_MODNAME ":c: pid %d\n", callmodule_pid);

  return ret;
}

static inline int
call_usermodehelperB(char *path, char **argv, char **envp, int wait) //enum umh_wait wait)
{
  return call_usermodehelper_fnsB(path, argv, envp, wait,
                     NULL, NULL, NULL);
}

static void __call_usermodehelperB(struct work_struct *work)
{
  struct subprocess_infoB *sub_infoB =
      container_of(work, struct subprocess_infoB, work);
  int wait = sub_infoB->wait; // enum umh_wait wait = sub_info->wait;
  pid_t pid;
  struct subprocess_info *sub_info;
  // hack - declare function pointers
  int (*ptrwait_for_helper)(void *data);
  int (*ptr____call_usermodehelper)(void *data);
  // assign function pointers to verbatim addresses as obtained from /proc/kallsyms
  int killret;
  struct task_struct *spawned_task;
  ptrwait_for_helper = (void *)0xc1065b60;
  ptr____call_usermodehelper = (void *)0xc1065ed0;

  sub_info = (struct subprocess_info *)sub_infoB;

  if (wait == UMH_WAIT_PROC)
      pid = kernel_thread((*ptrwait_for_helper), sub_info, //(wait_for_helper, sub_info,
                  CLONE_FS | CLONE_FILES | SIGCHLD);
  else
      pid = kernel_thread((*ptr____call_usermodehelper), sub_info, //(____call_usermodehelper, sub_info,
                  CLONE_VFORK | SIGCHLD);

  spawned_task = pid_task(find_vpid(pid), PIDTYPE_PID);

  // stop/suspend/pause task
  killret = kill_pid(find_vpid(pid), SIGSTOP, 1); 
  if (spawned_task!=NULL) {
    // does this stop the process really?
    spawned_task->state = __TASK_STOPPED;
    printk(KBUILD_MODNAME ": : exst %d exco %d exsi %d diex %d inex %d inio %d\n", spawned_task->exit_state, spawned_task->exit_code, spawned_task->exit_signal, spawned_task->did_exec, spawned_task->in_execve, spawned_task->in_iowait);
  }
  printk(KBUILD_MODNAME ": : (kr: %d)\n", killret);
  printk(KBUILD_MODNAME ": : pid %d (%p) (%s)\n", pid, spawned_task,
    (spawned_task!=NULL)?((spawned_task->state==-1)?"unrunnable":((spawned_task->state==0)?"runnable":"stopped")):"null" );
  // grab and save the pid (and task_struct) here:
  sub_infoB->pid = pid;
  sub_infoB->task = spawned_task;
    switch (wait) {
    case UMH_NO_WAIT:
        call_usermodehelper_freeinfo(sub_info);
        break;
    case UMH_WAIT_PROC:
        if (pid > 0)
            break;
        /* FALLTHROUGH */
    case UMH_WAIT_EXEC:
        if (pid < 0)
            sub_info->retval = pid;
        complete(sub_info->complete);
    }
}

struct subprocess_infoB *call_usermodehelper_setupB(char *path, char **argv,
                          char **envp, gfp_t gfp_mask)
{
    struct subprocess_infoB *sub_infoB;
    sub_infoB = kzalloc(sizeof(struct subprocess_infoB), gfp_mask);
    if (!sub_infoB)
        goto out;

    INIT_WORK(&sub_infoB->work, __call_usermodehelperB);
    sub_infoB->path = path;
    sub_infoB->argv = argv;
    sub_infoB->envp = envp;
  out:
    return sub_infoB;
}

#if TRY_USE_KPROBES
// copy from /kernel/trace/trace_probe.c (is unexported)
int traceprobe_command(const char *buf, int (*createfn)(int, char **))
{
  char **argv;
  int argc, ret;

  argc = 0;
  ret = 0;
  argv = argv_split(GFP_KERNEL, buf, &argc);
  if (!argv)
    return -ENOMEM;

  if (argc)
    ret = createfn(argc, argv);

  argv_free(argv);

  return ret;
}

// copy from kernel/trace/trace_kprobe.c?v=2.6.38 (is unexported)
#define TP_FLAG_TRACE   1
#define TP_FLAG_PROFILE 2
typedef void (*fetch_func_t)(struct pt_regs *, void *, void *);
struct fetch_param {
  fetch_func_t    fn;
  void *data;
};
typedef int (*print_type_func_t)(struct trace_seq *, const char *, void *, void *);
enum {
  FETCH_MTD_reg = 0,
  FETCH_MTD_stack,
  FETCH_MTD_retval,
  FETCH_MTD_memory,
  FETCH_MTD_symbol,
  FETCH_MTD_deref,
  FETCH_MTD_END,
};
// Fetch type information table * /
struct fetch_type {
  const char      *name;          /* Name of type */
  size_t          size;           /* Byte size of type */
  int             is_signed;      /* Signed flag */
  print_type_func_t       print;  /* Print functions */
  const char      *fmt;           /* Fromat string */
  const char      *fmttype;       /* Name in format file */
  // Fetch functions * /
  fetch_func_t    fetch[FETCH_MTD_END];
};
struct probe_arg {
  struct fetch_param      fetch;
  struct fetch_param      fetch_size;
  unsigned int            offset; /* Offset from argument entry */
  const char              *name;  /* Name of this argument */
  const char              *comm;  /* Command of this argument */
  const struct fetch_type *type;  /* Type of this argument */
};
struct trace_probe {
  struct list_head        list;
  struct kretprobe        rp;     /* Use rp.kp for kprobe use */
  unsigned long           nhit;
  unsigned int            flags;  /* For TP_FLAG_* */
  const char              *symbol;        /* symbol name */
  struct ftrace_event_class       class;
  struct ftrace_event_call        call;
  ssize_t                 size;           /* trace entry size */
  unsigned int            nr_args;
  struct probe_arg        args[];
};
static  int probe_is_return(struct trace_probe *tp)
{
  return tp->rp.handler != NULL;
}
static int probe_event_enable(struct ftrace_event_call *call)
{
  struct trace_probe *tp = (struct trace_probe *)call->data;

  tp->flags |= TP_FLAG_TRACE;
  if (probe_is_return(tp))
    return enable_kretprobe(&tp->rp);
  else
    return enable_kprobe(&tp->rp.kp);
}
#define KPROBE_EVENT_SYSTEM "kprobes"
#endif // TRY_USE_KPROBES

// <<<<<<<<<<<<<<<<<<<<<<

static struct page *walk_page_table(unsigned long addr, struct task_struct *intask)
{
  pgd_t *pgd;
  pte_t *ptep, pte;
  pud_t *pud;
  pmd_t *pmd;

  struct page *page = NULL;
  struct mm_struct *mm = intask->mm;

  callmodule_infoB->last_page_physaddr = 0ULL; // reset here, in case of early exit

  printk(KBUILD_MODNAME ": walk_ 0x%lx ", addr);

  pgd = pgd_offset(mm, addr);
  if (pgd_none(*pgd) || pgd_bad(*pgd))
    goto out;
  printk(KBUILD_MODNAME ": Valid pgd ");

  pud = pud_offset(pgd, addr);
  if (pud_none(*pud) || pud_bad(*pud))
    goto out;
  printk( ": Valid pud");

  pmd = pmd_offset(pud, addr);
  if (pmd_none(*pmd) || pmd_bad(*pmd))
    goto out;
  printk( ": Valid pmd");

  ptep = pte_offset_map(pmd, addr);
  if (!ptep)
    goto out;
  pte = *ptep;
  // pte_offset_map() may kmap_atomic() the page table (highmem), so undo it right away
  pte_unmap(ptep);

  // a pte that is not present (not faulted in yet, or swapped out) has no usable page frame
  if (!pte_present(pte))
    goto out;

  page = pte_page(pte);
  if (page) {
    callmodule_infoB->last_page_physaddr = (unsigned long long)page_to_phys(page);
    printk( ": page frame struct is @ %p; *virtual (page_address) @ %p (is_vmalloc_addr %d virt_addr_valid %d virt_to_phys 0x%llx) page_to_pfn %lx page_to_phys 0x%llx", page, page_address(page), is_vmalloc_addr((void*)page_address(page)), virt_addr_valid(page_address(page)), (unsigned long long)virt_to_phys(page_address(page)), page_to_pfn(page), callmodule_infoB->last_page_physaddr);
  }

out:
  printk("\n");
  return page;
}

static void sample_hbp_handler(struct perf_event *bp,
             struct perf_sample_data *data,
             struct pt_regs *regs)
{
  trace_printk(KBUILD_MODNAME ": hwbp hit: id [%llu]\n", bp->id );
  //~ unregister_hw_breakpoint(bp);
}

// ----------------------

static int __init callmodule_init(void)
{
  int ret = 0;
  char userprog[] = "/path/to/wtest";
  char *argv[] = {userprog, "2", NULL };
  char *envp[] = {"HOME=/", "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };
  struct task_struct *p;
  struct task_struct *par;
  struct task_struct *pc;
  struct list_head *children_list_head;
  struct list_head *cchildren_list_head;
  char *state_str;
  unsigned long offset, taddr;
  int (*ptr_create_trace_probe)(int argc, char **argv); 
  struct trace_probe* (*ptr_find_probe_event)(const char *event, const char *group);
  //int (*ptr_probe_event_enable)(struct ftrace_event_call *call); // not exported, copy
  #if TRY_USE_KPROBES
  char trcmd[256] = "";
  struct trace_probe *tp;
  #endif //TRY_USE_KPROBES
  struct perf_event *sample_hbp, *sample_hbpb;
  struct perf_event_attr attr, attrb;

  printk(KBUILD_MODNAME ": > init %s\n", userprog);

  ptr_create_trace_probe = (void *)0xc10d5120;
  ptr_find_probe_event = (void *)0xc10d41e0;
  print_symbol(KBUILD_MODNAME ": symbol @ 0xc1065b60 is %s\n", 0xc1065b60); // shows wait_for_helper+0x0/0xb0
  print_symbol(KBUILD_MODNAME ": symbol @ 0xc1065ed0 is %s\n", 0xc1065ed0); // shows ____call_usermodehelper+0x0/0x90
  print_symbol(KBUILD_MODNAME ": symbol @ 0xc10d5120 is %s\n", 0xc10d5120); // shows create_trace_probe+0x0/0x590
  ret = call_usermodehelperB(userprog, argv, envp, UMH_WAIT_EXEC); 
  if (ret != 0)
      printk(KBUILD_MODNAME ": error in call to usermodehelper: %i\n", ret);
  else
      printk(KBUILD_MODNAME ": everything all right; pid %d (%d)\n", callmodule_pid, callmodule_infoB->pid);
  tracing_on(); // earlier, so trace_printk of handler is caught!
  // find the task:
  rcu_read_lock();
  p = pid_task(find_vpid(callmodule_pid), PIDTYPE_PID);
  rcu_read_unlock();
  if (p == NULL) {
    printk(KBUILD_MODNAME ": p is NULL - exiting\n");
    return 0;
  }
  state_str = (p->state==-1)?"unrunnable":((p->state==0)?"runnable":"stopped");
  printk(KBUILD_MODNAME ": pid task a: %p c: %s p: [%d] s: %s\n",
    p, p->comm, p->pid, state_str);
  // find parent task:
  par = p->parent;
  if (par == NULL) {
    printk(KBUILD_MODNAME ": par is NULL - exiting\n");
    return 0;
  }
  state_str = (par->state==-1)?"unrunnable":((par->state==0)?"runnable":"stopped");
  printk(KBUILD_MODNAME ": parent task a: %p c: %s p: [%d] s: %s\n",
    par, par->comm, par->pid, state_str);

  // iterate through parent's (and our task's) child processes:
  rcu_read_lock(); // read_lock(&tasklist_lock);
  list_for_each(children_list_head, &par->children){
    p = list_entry(children_list_head, struct task_struct, sibling);
    printk(KBUILD_MODNAME ": - %s [%d] \n", p->comm, p->pid);
    if (p->pid == callmodule_pid) {
      list_for_each(cchildren_list_head, &p->children){
        pc = list_entry(cchildren_list_head, struct task_struct, sibling);
        printk(KBUILD_MODNAME ": - - %s [%d] \n", pc->comm, pc->pid);
      }
    }
  }
  rcu_read_unlock(); //~ read_unlock(&tasklist_lock);

  // NOTE: here p == callmodule_infoB->task !!
  printk(KBUILD_MODNAME ": Trying to walk page table; addr task 0x%X ->mm ->start_code: 0x%08lX ->end_code: 0x%08lX \n", (unsigned int) callmodule_infoB->task, callmodule_infoB->task->mm->start_code, callmodule_infoB->task->mm->end_code);
  walk_page_table(0x08048000, callmodule_infoB->task);
  // 080483c0 is start of .text; 08048474 start of main; for objdump -S wtest
  walk_page_table(0x080483c0, callmodule_infoB->task);
  walk_page_table(0x08048474, callmodule_infoB->task);

  if (callmodule_infoB->last_page_physaddr != 0ULL) {
    printk(KBUILD_MODNAME ": physaddr ");
    taddr = 0x080483c0; // .text
    offset = taddr - callmodule_infoB->task->mm->start_code;
    printk(": (0x%08lx ->) 0x%08llx ", taddr, callmodule_infoB->last_page_physaddr+offset);
    taddr = 0x08048474; // main
    offset = taddr - callmodule_infoB->task->mm->start_code;
    printk(": (0x%08lx ->) 0x%08llx ", taddr, callmodule_infoB->last_page_physaddr+offset);
    printk("\n");

    #if TRY_USE_KPROBES // can't use this here (BUG: scheduling while atomic, if probe inserts)
    //~ sprintf(trcmd, "p:myprobe 0x%08llx", callmodule_infoB->last_page_physaddr+offset);
    // try symbol for c10bcf60 - tracing_on
    sprintf(trcmd, "p:myprobe 0x%08llx", (unsigned long long)0xc10bcf60);
    ret = traceprobe_command(trcmd, ptr_create_trace_probe); //create_trace_probe);
    printk("%s -- ret: %d\n", trcmd, ret);
    // try find probe and enable it (compiles, but untested):
    tp = ptr_find_probe_event("myprobe", KPROBE_EVENT_SYSTEM);
    if (tp != NULL) probe_event_enable(&tp->call);
    #endif //TRY_USE_KPROBES
  }

  hw_breakpoint_init(&attr);
  attr.bp_len = sizeof(long); //HW_BREAKPOINT_LEN_1;
  attr.bp_type = HW_BREAKPOINT_X ;
  attr.bp_addr = 0x08048474; // main
  sample_hbp = register_user_hw_breakpoint(&attr, (perf_overflow_handler_t)sample_hbp_handler, p);
  if (IS_ERR((void __force *)sample_hbp)) {
    int ret = PTR_ERR((void __force *)sample_hbp);
    printk(KBUILD_MODNAME ": Breakpoint registration failed (%d)\n", ret);
    //~ return ret;
  } else {
    // only dereference the event once we know it is not an ERR_PTR value
    printk(KBUILD_MODNAME ": 0x08048474 id [%llu]\n", sample_hbp->id);
  }

  hw_breakpoint_init(&attrb);
  attrb.bp_len = sizeof(long);
  attrb.bp_type = HW_BREAKPOINT_X ;
  attrb.bp_addr = 0x08048475; // first instruction after main
  sample_hbpb = register_user_hw_breakpoint(&attrb, (perf_overflow_handler_t)sample_hbp_handler, p);
  if (IS_ERR((void __force *)sample_hbpb)) {
    int ret = PTR_ERR((void __force *)sample_hbpb);
    printk(KBUILD_MODNAME ": Breakpoint registration failed (%d)\n", ret);
    //~ return ret;
  } else {
    // only dereference the event once we know it is not an ERR_PTR value
    printk(KBUILD_MODNAME ": 0x08048475 id [%llu]\n", sample_hbpb->id);
  }

  printk(KBUILD_MODNAME ": (( 0x08048000 is_vmalloc_addr %d virt_addr_valid %d ))\n", is_vmalloc_addr((void*)0x08048000), virt_addr_valid(0x08048000));

  kill_pid(find_vpid(callmodule_pid), SIGCONT, 1); // resume/continue/restart task
  state_str = (p->state==-1)?"unrunnable":((p->state==0)?"runnable":"stopped");
  printk(KBUILD_MODNAME ": cont pid task a: %p c: %s p: [%d] s: %s\n",
    p, p->comm, p->pid, state_str);

  return 0;
}

static void __exit callmodule_exit(void)
{
  tracing_off(); //corresponds to the user space /sys/kernel/debug/tracing/tracing_on file
  printk(KBUILD_MODNAME ": < exit\n");
}

module_init(callmodule_init);
module_exit(callmodule_exit);
MODULE_LICENSE("GPL");
Answered 2014-02-26T20:20:29.853