linux - Can eBPF modify the return value or parameters of a syscall?

Question

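To simulate some behavior, I would like to attach a probe to a syscall and modify the return value when certain parameters are passed. Alternatively, it would also be enough to modify the parameters of the function before they are processed.

Is this possible with BPF?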

score 13 · Accepted Answer

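Within kernel probes (kprobes), the eBPF virtual machine has read-only access to the syscall parameters and return value.

However, the eBPF program has a return code of its own. It is possible to apply a seccomp profile that traps BPF (not eBPF; thanks @qeole) return codes and interrupts the system call during execution.

The allowed runtime modifications are:

SECCOMP_RET_KILL: Immediate kill with SIGSYS
SECCOMP_RET_TRAP: Send a catchable SIGSYS, giving a chance to emulate the syscall
SECCOMP_RET_ERRNO: Force an errno value
SECCOMP_RET_TRACE: Yield the decision to a ptracer, or set errno to -ENOSYS
SECCOMP_RET_ALLOW: Allow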

https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt
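As a rough sketch of the SECCOMP_RET_ERRNO action listed above, the following C function installs a classic BPF filter that forces one syscall to fail with a chosen errno. The helper name install_errno_filter is made up for illustration, and the filter skips the architecture check a real filter should perform, so treat it as a sketch under those assumptions and not as a hardened policy.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: make syscall number `nr` fail with errno `err`
 * for the calling thread; every other syscall is allowed.
 * Simplification: no check of seccomp_data.arch, so syscall numbers are
 * assumed to belong to the native ABI. */
int install_errno_filter(unsigned int nr, unsigned int err)
{
    struct sock_filter filter[] = {
        /* Load the syscall number from struct seccomp_data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* If it equals nr, run the next instruction; otherwise skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
        /* Matched: do not run the syscall, report `err` to the caller. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (err & SECCOMP_RET_DATA)),
        /* Not matched: let the syscall through. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };

    /* Unprivileged processes must set no_new_privs before adding a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

For example, once install_errno_filter(__NR_getppid, EPERM) has returned 0, a later syscall(__NR_getppid) in the same thread should return -1 with errno set to EPERM. The filter cannot be removed again and is inherited by child processes.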

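The SECCOMP_RET_TRACE method allows modifying the system call performed, its arguments, or its return value. This is architecture dependent, and modification of mandatory external references may cause an ENOSYS error.

It does so by passing execution up to a waiting userspace ptrace tracer, which has the ability to modify the traced process's memory, registers, and file descriptors.

The tracer needs to call ptrace and then waitpid. An example: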

ptrace(PTRACE_SETOPTIONS, tracee_pid, 0, PTRACE_O_TRACESECCOMP);
waitpid(tracee_pid, &status, 0);

http://man7.org/linux/man-pages/man2/ptrace.2.html

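When waitpid returns, depending on the contents of status, the seccomp return value can be retrieved with the PTRACE_GETEVENTMSG ptrace operation. This retrieves the seccomp SECCOMP_RET_DATA value, which is a 16-bit field set by the BPF program. Example: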

ptrace(PTRACE_GETEVENTMSG, tracee_pid, 0, &data);
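To make "depending on the contents of status" concrete, here is a small sketch of how a tracer might recognize a seccomp stop before calling PTRACE_GETEVENTMSG. The status encoding follows the ptrace(2) manual page; the function names is_seccomp_stop and seccomp_event_data are hypothetical.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Returns 1 if a waitpid() status describes a PTRACE_EVENT_SECCOMP stop,
 * i.e. status >> 8 == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8)), else 0. */
int is_seccomp_stop(int status)
{
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8));
}

/* Fetch the SECCOMP_RET_DATA value chosen by the BPF filter into *data.
 * Only meaningful right after is_seccomp_stop() returned 1 for `pid`.
 * Returns the ptrace() result: 0 on success, -1 on error. */
long seccomp_event_data(pid_t pid, unsigned long *data)
{
    return ptrace(PTRACE_GETEVENTMSG, pid, 0, data);
}
```

A tracer loop would typically call waitpid, test the status with is_seccomp_stop, and only then read the event message; other stops (plain signals, exits) need their own handling.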

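Syscall arguments can be modified in memory before continuing operation. You can step to a single syscall entry or exit with PTRACE_SYSCALL. Syscall return values can be modified in userspace before resuming execution; the underlying program will not be able to see that the return value has been modified.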

An example implementation: Filter and Modify System Calls with seccomp and ptrace
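The PTRACE_SYSCALL stepping described above can be sketched roughly as follows. This is a minimal demo and not the linked implementation: it assumes x86-64 (where orig_rax holds the syscall number and rax the return value), uses plain PTRACE_SYSCALL without a seccomp filter, and the function name run_with_fake_getppid is invented for illustration.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo (x86-64 only): run a child whose getppid() result is
 * replaced with `fake`. Returns 0 if the child observed the fake value,
 * a nonzero exit code if it did not, and -1 on error. */
int run_with_fake_getppid(long fake)
{
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Tracee: ask to be traced, then stop so the parent can set options. */
        if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0)
            _exit(2);
        raise(SIGSTOP);
        _exit(syscall(SYS_getppid) == fake ? 0 : 1);
    }

    /* Tracer: wait for the initial SIGSTOP. */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    /* Mark syscall stops with bit 7 so they differ from real SIGTRAPs. */
    ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESYSGOOD);

    int in_syscall = 0;
    for (;;) {
        /* Resume the tracee until its next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, pid, 0, 0) < 0)
            return -1;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return -1;
        if (WSTOPSIG(status) != (SIGTRAP | 0x80))
            continue;               /* simplification: other signals are dropped */

        in_syscall = !in_syscall;   /* entry and exit stops alternate */
        if (in_syscall)
            continue;               /* entry stop: leave the arguments alone */

        /* Exit stop: rewrite the return value of getppid only. */
        struct user_regs_struct regs;
        if (ptrace(PTRACE_GETREGS, pid, 0, &regs) < 0)
            return -1;
        if (regs.orig_rax == SYS_getppid) {
            regs.rax = fake;
            ptrace(PTRACE_SETREGS, pid, 0, &regs);
        }
    }
}
```

Calling run_with_fake_getppid(12345) from a small main is enough to try it out. Arguments could be changed the same way at the entry stop by editing rdi, rsi, and so on before resuming; the entry/exit toggle is a simplification that a more careful tracer would replace with PTRACE_GET_SYSCALL_INFO on newer kernels.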

score 5 · Accepted Answer

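I believe that attaching eBPF to kprobes/kretprobes gives you read access to function arguments and return values, but that you cannot tamper with them. I am not 100% sure; good places to ask for confirmation would be the IO Visor project mailing list or IRC channel (#iovisor at irc.oftc.net).

As an alternative solution, I know you can at least change the return value of a syscall with strace, using the -e option. Quoting the manual page: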

-e inject=set[:error=errno|:retval=value][:signal=sig][:when=expr]
       Perform syscall tampering for the specified set of syscalls.

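Also, there was a presentation on this, and on fault injection, at Fosdem 2017, if it is of any interest to you. Here is one example command from the slides: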

strace -P precious.txt -efault=unlink:retval=0 unlink precious.txt

Edit: As stated by Ben, eBPF on kprobes and tracepoints is definitively read-only, for tracing and monitoring use cases. I also got confirmation of this on IRC.

score 3 · Accepted Answer

It is possible to modify some user space memory using eBPF. As stated in the bpf.h header file:

 * int bpf_probe_write_user(void *dst, const void *src, u32 len)
 *  Description
 *      Attempt in a safe way to write *len* bytes from the buffer
 *      *src* to *dst* in memory. It only works for threads that are in
 *      user context, and *dst* must be a valid user space address.
 *
 *      This helper should not be used to implement any kind of
 *      security mechanism because of TOC-TOU attacks, but rather to
 *      debug, divert, and manipulate execution of semi-cooperative
 *      processes.
 *
 *      Keep in mind that this feature is meant for experiments, and it
 *      has a risk of crashing the system and running programs.
 *      Therefore, when an eBPF program using this helper is attached,
 *      a warning including PID and process name is printed to kernel
 *      logs.
 *  Return
 *      0 on success, or a negative error in case of failure.

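Also, to quote the BPF design Q&A:

Tracing BPF programs can overwrite the user memory of the current task with bpf_probe_write_user(). Every time such a program is loaded, the kernel prints a warning message, so this helper is only useful for experiments and prototypes. Tracing BPF programs are root only.

So your eBPF program may write data into user space memory locations. Note that you still cannot modify kernel structures from within your eBPF program.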

score 3 · Accepted Answer

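It is possible to inject errors into a system call invocation using eBPF: https://lwn.net/Articles/740146/

There is a BPF helper function called bpf_override_return(), which can override the return value of an invocation. Here is an example using bcc as the frontend: https://github.com/iovisor/bcc/blob/master/tools/inject.py

According to the Linux manual page:

bpf_override_return() is only available if the kernel was compiled with the CONFIG_BPF_KPROBE_OVERRIDE configuration option, and in this case it only works on functions tagged with ALLOW_ERROR_INJECTION in the kernel code.

Also, the helper is only available for the architectures having the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing, x86 architecture is the only one to support this feature.

It is possible to add a function to the error injection framework. More information can be found here: https://github.com/iovisor/bcc/issues/2485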
