c++ - How to compile and execute directly from memory?

Question

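Is it possible to compile a C++ (or similar) program without generating an executable file, writing it to memory and executing it directly from there instead?

For example, with GCC and clang, something with an effect similar to: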

c++ hello.cpp -o hello.x && ./hello.x $@ && rm -f hello.x

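on the command line.

But without the burden of writing an executable file to disk only to load and run it immediately.

(If possible, the procedure should use no disk space at all, or at least no space in the current directory, which might be read-only.)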

score 51 · Accepted Answer

Possible? Not the way you seem to wish. The task has two parts:

1) How to get the binary into memory

When we specify /dev/stdout as output file in Linux we can then pipe into our program x0 that reads an executable from stdin and executes it:

  gcc -pipe YourFiles1.cpp YourFile2.cpp -o/dev/stdout -Wall | ./x0

In x0 we can just read from stdin until reaching the end of the file:

#include <stdlib.h> /* realloc */
#include <unistd.h> /* read, STDIN_FILENO */

int main(int argc, const char ** argv)
{
    size_t ntotal = 0;
    char * buf = 0;
    while(true)
    {
        /* increasing buffer size dynamically since we do not know how many bytes to read */
        buf = (char*)realloc(buf, ntotal+4096*sizeof(char));
        ssize_t nread = read(STDIN_FILENO, buf+ntotal, 4096);
        /* read returns 0 at end of file and -1 on error; stop in both cases */
        if (nread<=0) break;
        ntotal += nread;
    }
    return memexec(buf, ntotal, argv);
}

It would also be possible for x0 to execute the compiler directly and read its output. This question has been answered here: Redirecting exec output to a buffer or file

Caveat: I just figured out that, for some strange reason, this does not work when I use a pipe (|) but does work when I use x0 < foo.

Note: If you are willing to modify your compiler, or if you do JIT compilation as LLVM, clang and other frameworks do, you could directly generate executable code. However, for the rest of this discussion I assume you want to use an existing compiler.

Note: Execution via temporary file

Other programs such as UPX achieve a similar behavior by executing a temporary file. This is easier and more portable than the approach outlined below. On systems where /tmp is mapped to a RAM disk (typical servers, for example), the temporary file will be memory based anyway.

#include <cstring> // size_t
#include <fcntl.h> // open, O_RDONLY
#include <stdio.h> // perror
#include <stdlib.h> // mkstemp
#include <sys/stat.h> // chmod, S_IRUSR, S_IXUSR
#include <unistd.h> // write, close, unlink, fexecve
int memexec(void * exe, size_t exe_size, const char ** argv)
{
    /* random temporary file name in /tmp */
    char name[15] = "/tmp/fooXXXXXX";
    /* creates temporary file, returns writeable file descriptor */
    int fd_wr = mkstemp(name);
    /* makes file executable and readonly */
    chmod(name, S_IRUSR | S_IXUSR);
    /* creates read-only file descriptor before deleting the file */
    int fd_ro = open(name, O_RDONLY);
    /* removes file from file system, kernel buffers content in memory until all fd closed */
    unlink(name);
    /* writes executable to file */
    write(fd_wr, exe, exe_size);
    /* fexecve will not work as long as there is an open writeable file descriptor */
    close(fd_wr);
    char *const newenviron[] = { NULL };
    /* fexecve expects char *const argv[], hence the cast */
    fexecve(fd_ro, (char *const *)argv, newenviron);
    /* only reached if fexecve failed */
    perror("failed");
    return -1;
}

Caveat: Error handling is left out for the sake of clarity and brevity.

Note: By combining main() and memexec() into a single function and using splice(2) to copy directly between stdin and fd_wr, the program could be significantly optimized.

2) Execution directly from memory

One does not simply load and execute an ELF binary from memory. Some preparation, mostly related to dynamic linking, has to happen. There is a lot of material explaining the various steps of the ELF linking process, and studying it makes me believe that it is theoretically possible. See for example this closely related question on SO; however, there does not seem to be a working solution.

Update: UserModeExec seems to come very close.

Writing a working implementation would be very time consuming, and would surely raise some interesting questions in its own right. I like to believe this is by design: for most applications it is strongly undesirable to (accidentally) execute their input data, because that allows code injection.

What exactly happens when an ELF is executed? Normally the kernel receives a file name and then creates a process, loads and maps the different sections of the executable into memory, performs a lot of sanity checks and marks it as executable before passing control and a file name back to the run-time linker ld-linux.so (part of libc). The run-time linker takes care of relocating functions, handling additional libraries, setting up global objects and jumping to the executable's entry point. AFAIU this heavy lifting is done by dl_main() (implemented in libc/elf/rtld.c).

Even fexecve is implemented using a file in /proc, and it is this need for a file name that leads us to reimplement parts of this linking process.

Libraries

UserModeExec
libelf -- read, modify, create ELF files
eresi -- play with elfes
OSKit (seems like a dead project though)

Reading

http://www.linuxjournal.com/article/1060?page=0,0 -- introduction
http://wiki.osdev.org/ELF -- good overview
http://s.eresi-project.org/inc/articles/elf-rtld.txt -- more detailed Linux-specific explanation
http://www.codeproject.com/Articles/33340/Code-Injection-into-Running-Linux-Application -- how to get to hello world
http://www.acsu.buffalo.edu/~charngda/elf.html -- nice reference of ELF structure
Loaders and Linkers by John Levine -- deeper explanation of linking

So it seems possible; you decide whether it is also practical.

score 24 · Accepted Answer

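Yes, though doing it properly requires designing significant parts of the compiler with this in mind. The LLVM guys have done this, first with a more or less separate JIT and later with the MC subproject. I don't think there is a ready-made tool that does it. But in principle, it is just a matter of linking to clang and llvm, passing the source to clang, and passing the IR it creates to MCJIT. Maybe a demo does this (I vaguely recall a basic C interpreter that worked like this, though I think it was based on the legacy JIT).

Edit: Found the demo I recalled. Also, there is cling, which seems to do basically what I described, but better.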

score 21 · Accepted Answer

Linux can create virtual file systems in RAM using tmpfs. For example, I have my /tmp directory set up in my file system table like so:

tmpfs       /tmp    tmpfs   nodev,nosuid    0   0

Using this, any files I put in /tmp are stored in my RAM.
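To check from a program whether a directory really is RAM-backed before relying on it, one option on Linux is statfs(2), which reports the file system type. This is a minimal sketch, assuming a Linux system with the kernel headers installed; is_tmpfs is a hypothetical helper name, not something from the answer.

```c
#include <linux/magic.h>  /* TMPFS_MAGIC */
#include <sys/vfs.h>      /* statfs (Linux-specific) */

/* Returns 1 if path lives on a tmpfs mount, 0 if it lives on some
 * other file system, and -1 if statfs fails (e.g. path is missing). */
int is_tmpfs(const char *path)
{
    struct statfs s;
    if (statfs(path, &s) != 0) return -1;
    return s.f_type == TMPFS_MAGIC ? 1 : 0;
}
```

With the fstab entry above in effect, is_tmpfs("/tmp") would be expected to return 1; on a system where /tmp is an ordinary disk directory it returns 0.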

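Windows doesn't seem to have any "official" way of doing this, but there are many third-party options.

Without this "RAM disk" concept, you would likely have to heavily modify a compiler and linker to make them operate completely in memory.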

score 8 · Accepted Answer

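If you are not specifically tied to C++, you may also consider other JIT-based solutions:

in Common Lisp, SBCL is able to generate machine code on the fly
you could use TinyCC and its libtcc.a, which quickly emits poor (i.e. unoptimized) machine code from C code in memory.
consider also any JITing library, e.g. libjit, GNU Lightning, LLVM, GCCJIT, asmjit
of course emitting C++ code on some tmpfs and compiling it...

But if you want good machine code, you will need it to be optimized, and that is not fast (so the time spent writing to a file system is negligible).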

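If you are tied to generated C++ code, you need a good optimizing C++ compiler (e.g. g++ or clang++). These take significant time to compile C++ code into an optimized binary, so you should generate some file foo.cc (perhaps in a RAM file system such as a tmpfs, but that would bring only a minor gain, since most of the time is spent inside the g++ or clang++ optimization passes, not reading from disk), then compile that foo.cc into foo.so (perhaps using make, or at least by forking g++ -Wall -shared -O2 foo.cc -o foo.so, perhaps with additional libraries). Finally, have your main program dlopen the generated foo.so. FWIW, MELT was doing exactly that, and on a Linux workstation the manydl.c program shows that a process can generate and then dlopen(3) many hundreds of thousands of temporary plugins, each one obtained by generating a temporary C file and compiling it. For C++, read the C++ dlopen mini HOWTO.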

(Alternatively, generate a self-contained source program foobar.cc, compile it into an executable foobarbin, e.g. with g++ -O2 foobar.cc -o foobarbin, and run that foobarbin executable binary with execve.)
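The generate, compile and dlopen cycle described above can be sketched in a few lines of C. This is an illustration under several assumptions, not the code MELT or manydl.c uses: the helper name compile_plugin and the file names gen.c and gen.so are made up, the compiler is invoked as cc through system(3), -fPIC is added because shared objects generally need position-independent code, and the generated code is plain C rather than C++ (for C++ the exported functions would need extern "C" so that dlsym can find them by name). On older glibc versions the program must be linked with -ldl.

```c
#include <dlfcn.h>   /* dlopen */
#include <stdio.h>   /* snprintf, fopen, fputs */
#include <stdlib.h>  /* mkdtemp, system */
#include <unistd.h>  /* unlink, rmdir */

/* Writes `source` to a temporary directory under /tmp, compiles it
 * into a shared object and loads it. Returns a dlopen handle, or
 * NULL on any failure. (Hypothetical helper for illustration.) */
void *compile_plugin(const char *source)
{
    char dir[] = "/tmp/pluginXXXXXX";
    if (mkdtemp(dir) == NULL) return NULL;

    char src[64], so[64], cmd[256];
    snprintf(src, sizeof src, "%s/gen.c", dir);
    snprintf(so, sizeof so, "%s/gen.so", dir);

    FILE *f = fopen(src, "w");
    if (f == NULL) { rmdir(dir); return NULL; }
    fputs(source, f);
    fclose(f);

    /* fork the compiler, as the paragraph above suggests */
    snprintf(cmd, sizeof cmd, "cc -shared -fPIC -O2 %s -o %s", src, so);
    void *handle = NULL;
    if (system(cmd) == 0)
        handle = dlopen(so, RTLD_NOW | RTLD_LOCAL);

    /* the loaded code stays mapped after the files are removed */
    unlink(src);
    unlink(so);
    rmdir(dir);
    return handle;
}
```

A caller would then look up the generated function with dlsym(handle, "name"), cast the result to the right function pointer type, call it, and release the plugin with dlclose(handle) when it is no longer needed.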

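When generating C++ code, you may want to avoid generating tiny C++ source files (e.g. only a dozen lines; if possible, generate C++ files of at least a few hundred lines, unless a lot of template expansion happens through extensive use of existing C++ containers, in which case generating a small C++ function that combines them makes sense). For instance, try if possible to put several generated C++ functions into the same generated C++ file (but avoid very big generated C++ functions, e.g. 10 KLOC in a single function; GCC takes a long time to compile them). If relevant, you could consider having only a single #include in the generated C++ file and precompiling that commonly included header.

Jacques Pitrat's book Artificial Beings, the conscience of a conscious machine (ISBN 9781848211018) explains in detail why generating code at runtime is useful (in symbolic artificial intelligence systems like his CAIA system). The RefPerSys project is trying to follow that idea and to generate some C++ code (and hopefully more and more of it) at runtime. Partial evaluation is a relevant concept.

Your software is likely to spend more CPU time generating the C++ code than GCC spends compiling it.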

score 1 · Accepted Answer

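One can easily modify the compiler itself. It sounds hard at first, but when you think about it, it seems obvious. So modifying the compiler sources to expose a library directly and turning it into a shared library should not take that much effort (depending on the actual implementation).

Just replace every file access with a memory-mapped file solution.

This is something I am about to do: transparently compiling something in the background to op codes and executing those from within Java.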

-

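But thinking about your original question, it seems you want to speed up compilation and your edit-and-run cycle. First of all, get an SSD; you get almost memory speed (use a PCI version). And let's say it is C we are talking about: C has this linking step, which results in very complex operations that are likely to take more time than reading from and writing to disk. So just put everything on an SSD and live with the lag.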

score 1 · Accepted Answer

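The tcc compiler's "-run" option allows exactly this: it compiles into memory, runs the code there, and finally discards the compiled result. No file system space is needed. "tcc -run" can be used in a shebang line to allow C scripts; from the tcc man page: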

#!/usr/local/bin/tcc -run
#include <stdio.h>

int main()
{
    printf("Hello World\n");
    return 0;
}

C scripts allow mixed bash/C scripts, and "tcc -run" does not need any temporary space:

#!/bin/bash

echo "foo"
sed -n "/^\/\*\*$/,\$p" $0 | tcc -run -

exit
/**
*/
#include <stdio.h>

int main()
{
    printf("bar\n");
    return 0;
}

Execution output:

$ ./shtcc2
foo
bar
$

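C scripts with gcc are possible as well, but as others have mentioned, they need temporary space to store the executable. This script produces the same output as the previous one: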

#!/bin/bash

exc=/tmp/`basename $0`
if [ $0 -nt $exc ]; then sed -n "/^\/\*\*$/,\$p" $0 | gcc -x c - -o $exc; fi

echo "foo"
$exc

exit
/**
*/
#include <stdio.h>

int main()
{
    printf("bar\n");
    return 0;
}

C scripts with the suffix ".c" are nice; headtail.c was my first ".c" file that needed to be executable:

$ echo -e "1\n2\n3\n4\n5\n6\n7" | ./headtail.c 
1
2
3
6
7
$

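I like C scripts because you have just a single file that you can easily move around, and changes to the bash or C part require no further action; they simply take effect on the next execution.

P.S.:
The "tcc -run" C script shown above has a problem: the C script's stdin is not available to the executed C code. The reason is that I passed the extracted C code to "tcc -run" through a pipe. The new gist run_from_memory_stdin.c does it correctly: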

...
echo "foo"
tcc -run <(sed -n "/^\/\*\*$/,\$p" $0) 42
...

"foo" is printed by the bash part and "bar 42" by the C part (42 is passed as argv[1]); then the input piped into the script is printed by the C code:

$ route -n | ./run_from_memory_stdin.c 
foo
bar 42
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.29.58.98    0.0.0.0         UG    306    0        0 wlan1
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 wlan0
169.254.0.0     0.0.0.0         255.255.0.0     U     303    0        0 wlan0
172.29.58.96    0.0.0.0         255.255.255.252 U     306    0        0 wlan1
$

score 0 · Accepted Answer

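Finally, the answer to the OP's question is yes!

I found the memrun repo from guitmz, which demonstrates running an (x86_64) ELF from memory using golang and assembler. I forked it and provided a C version of memrun that runs ELF binaries (verified on x86_64 and armv7l), either from standard input or via process substitution as the first argument. The repo contains demos and documentation (memrun.c is only 47 lines of code):
https://github.com/Hermann-SW/memrun/tree/master/C#memrun

Here is the simplest example: with "-o /dev/fd/1", the ELF compiled by gcc is sent to stdout and piped into memrun, which executes it: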

pi@raspberrypi400:~/memrun/C $ gcc info.c -o /dev/fd/1 | ./memrun
My process ID : 20043
argv[0] : ./memrun
no argv[1]
evecve --> /usr/bin/ls -l /proc/20043/fd
total 0
lr-x------ 1 pi pi 64 Sep 18 22:27 0 -> 'pipe:[1601148]'
lrwx------ 1 pi pi 64 Sep 18 22:27 1 -> /dev/pts/4
lrwx------ 1 pi pi 64 Sep 18 22:27 2 -> /dev/pts/4
lr-x------ 1 pi pi 64 Sep 18 22:27 3 -> /proc/20043/fd
pi@raspberrypi400:~/memrun/C $
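The source of memrun.c is not reproduced in this answer. One way such a tool can be built on Linux is memfd_create(2), which creates an anonymous file that lives in RAM, combined with fexecve(3). The following sketch assumes that approach and is not claimed to match memrun.c. The helper names memfd_from_buffer and memexec_fd are made up for illustration, and the raw syscall(2) form is used so that the sketch does not depend on a glibc wrapper being available.

```c
#include <linux/memfd.h>   /* MFD_CLOEXEC */
#include <stddef.h>        /* size_t */
#include <sys/syscall.h>   /* SYS_memfd_create */
#include <sys/types.h>     /* ssize_t */
#include <unistd.h>        /* syscall, write, close, fexecve */

/* Copies an in-memory image into an anonymous RAM-backed file and
 * returns its descriptor, or -1 on failure. (Hypothetical helper.) */
int memfd_from_buffer(const void *exe, size_t exe_size)
{
    /* MFD_CLOEXEC is fine for ELF images; the descriptor is closed
     * automatically once the new program has been loaded */
    int fd = (int)syscall(SYS_memfd_create, "memexec", MFD_CLOEXEC);
    if (fd < 0) return -1;

    const char *p = exe;
    size_t left = exe_size;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0) { close(fd); return -1; }
        p += n;
        left -= (size_t)n;
    }
    return fd;
}

/* Replaces the current process with the given ELF image.
 * Returns -1 only if something failed. (Hypothetical helper.) */
int memexec_fd(const void *exe, size_t exe_size,
               char *const argv[], char *const envp[])
{
    int fd = memfd_from_buffer(exe, exe_size);
    if (fd < 0) return -1;
    fexecve(fd, argv, envp);
    close(fd);   /* only reached if fexecve failed */
    return -1;
}
```

No path in any mounted file system is involved here: the file exists only as long as a descriptor (or the running program) refers to it.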

The reason I was interested in this topic is its use in "C scripts". run_from_memory_stdin.c demonstrates everything together:

pi@raspberrypi400:~/memrun/C $ wc memrun.c | ./run_from_memory_stdin.c 
foo
bar 42
  47  141 1005 memrun.c
pi@raspberrypi400:~/memrun/C $

The C script that produces the output shown is this small:

#!/bin/bash

echo "foo"
./memrun <(gcc -o /dev/fd/1 -x c <(sed -n "/^\/\*\*$/,\$p" $0)) 42

exit
/**
*/
#include <stdio.h>

int main(int argc, char *argv[])
{
  printf("bar %s\n", argc>1 ? argv[1] : "(undef)");

  for(int c=getchar(); EOF!=c; c=getchar())  { putchar(c); }

  return 0;
}

P.S.:
I added tcc's "-run" option to gcc and g++; for details see:
https://github.com/Hermann-SW/memrun/tree/master/C#adding-tcc--run-option-to-gcc-and-g

Nice, and nothing is stored in the file system:

pi@raspberrypi400:~/memrun/C $ uname -a | g++ -O3 -Wall -run demo.cpp 42
bar 42
Linux raspberrypi400 5.10.60-v7l+ #1449 SMP Wed Aug 25 15:00:44 BST 2021 armv7l GNU/Linux
pi@raspberrypi400:~/memrun/C $
