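I'm writing a multi-architecture assembler/disassembler in Common Lisp (SBCL 1.1.5 on 64-bit Debian GNU/Linux). Currently the assembler produces correct code for a subset of x86-64. To assemble x86-64 assembly code I use a hash table in which instruction mnemonics (strings) such as "jc-rel8" and "stosb" are keys that return a list of one or more encoding functions, like the ones below: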
(defparameter *emit-function-hash-table-x64* (make-hash-table :test 'equalp))
(setf (gethash "jc-rel8" *emit-function-hash-table-x64*) (list #'jc-rel8-x86))
(setf (gethash "stosb" *emit-function-hash-table-x64*) (list #'stosb-x86))
The encoding functions look like these (some are more complicated, though):

(defun jc-rel8-x86 (arg1 &rest args)
  (jcc-x64 #x72 arg1))

(defun stosb-x86 (&rest args)
  (list #xaa))
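For illustration, here is a minimal sketch of how such a table can be consulted at assembly time. The driver emit-instruction and the body of jcc-x64 are hypothetical stand-ins (the real jcc-x64 is not shown in this post), so this shows only the lookup-and-dispatch idea and not the actual encoder.

```lisp
;; Hypothetical stand-in for the real jcc-x64 helper (not shown in the post):
;; emits the opcode byte followed by an 8-bit displacement.
(defun jcc-x64 (opcode rel8)
  (list opcode (logand rel8 #xff)))

(defun jc-rel8-x86 (arg1 &rest args)
  (declare (ignore args))
  (jcc-x64 #x72 arg1))

(defun stosb-x86 (&rest args)
  (declare (ignore args))
  (list #xaa))

(defparameter *emit-function-hash-table-x64* (make-hash-table :test 'equalp))
(setf (gethash "jc-rel8" *emit-function-hash-table-x64*) (list #'jc-rel8-x86))
(setf (gethash "stosb" *emit-function-hash-table-x64*) (list #'stosb-x86))

;; Hypothetical driver: look up the mnemonic and call the first encoder
;; in the list with the given arguments.
(defun emit-instruction (mnemonic &rest args)
  (let ((encoders (gethash mnemonic *emit-function-hash-table-x64*)))
    (unless encoders
      (error "Unknown mnemonic: ~a" mnemonic))
    (apply (first encoders) args)))
```

With these definitions, (emit-instruction "stosb") returns a list holding the single byte #xaa. Because the table is created with :test 'equalp, the mnemonic lookup is case-insensitive.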
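Now I am trying to incorporate the complete x86-64 instruction set by using NASM's (NASM 2.11.06) instruction encoding data (file insns.dat) converted to Common Lisp CLOS syntax. This means replacing the regular functions used for emitting binary code (like the functions above) with instances of a custom x86-asm-instruction class (a very basic class so far, with some 20 slots with :initarg, :reader, :initform etc.), in which an emit method with arguments would be used to emit the binary code for a given instruction (mnemonic) and arguments. The converted instruction data looks like this (but it is more than 40,000 lines, with exactly 7193 make-instance forms and 7193 setf forms):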
;; first, mnemonic + operand combination instances (:is-variant t).
;; there are 4928 such instances for x86-64 generated from NASM's insns.dat.
(eval-when (:compile-toplevel :load-toplevel :execute)

(setf Jcc-imm-near (make-instance 'x86-asm-instruction
                                  :name "Jcc"
                                  :operands "imm|near"
                                  :code-string "[i: odf 0f 80+c rel]"
                                  :arch-flags (list "386" "BND")
                                  :is-variant t))

(setf STOSB-void (make-instance 'x86-asm-instruction
                                :name "STOSB"
                                :operands "void"
                                :code-string "[aa]"
                                :arch-flags (list "8086")
                                :is-variant t))

;; then, container instances which contain (or could refer to instead)
;; the possible variants of each instruction.
;; there are 2265 such instances for x86-64 generated from NASM's insns.dat.

(setf Jcc (make-instance 'x86-asm-instruction
                         :name "Jcc"
                         :is-container t
                         :variants (list Jcc-imm-near
                                         Jcc-imm64-near
                                         Jcc-imm-short
                                         Jcc-imm
                                         Jcc-imm
                                         Jcc-imm
                                         Jcc-imm)))

(setf STOSB (make-instance 'x86-asm-instruction
                           :name "STOSB"
                           :is-container t
                           :variants (list STOSB-void)))

;; thousands of objects more here...

) ; this bracket closes (eval-when (:compile-toplevel :load-toplevel :execute)
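The class definition itself is not shown in this post, so the following is only a minimal sketch of what such a class and a trivial emit method could look like. The slot list is inferred from the initargs used in the instruction data; the reader names, the helpers split-on-spaces and hex-byte-p, and the literal-bytes-only handling of :code-string are all assumptions made for illustration. The real class has around 20 slots, and a real emit would have to interpret the full NASM code-string syntax.

```lisp
;; Hypothetical, cut-down version of the class (the real one has ~20 slots).
(defclass x86-asm-instruction ()
  ((name         :initarg :name         :reader name         :initform nil)
   (operands     :initarg :operands     :reader operands     :initform nil)
   (code-string  :initarg :code-string  :reader code-string  :initform nil)
   (arch-flags   :initarg :arch-flags   :reader arch-flags   :initform nil)
   (is-variant   :initarg :is-variant   :reader is-variant   :initform nil)
   (is-container :initarg :is-container :reader is-container :initform nil)
   (variants     :initarg :variants     :reader variants     :initform nil)))

;; Hypothetical helper: split STRING on single spaces, dropping empty tokens.
(defun split-on-spaces (string)
  (loop with start = 0
        for end = (position #\Space string :start start)
        for token = (subseq string start end)
        unless (string= token "") collect token
        while end
        do (setf start (1+ end))))

;; Hypothetical helper: true if TOKEN is exactly two hexadecimal digits.
(defun hex-byte-p (token)
  (and (= (length token) 2)
       (every (lambda (c) (digit-char-p c 16)) token)))

(defgeneric emit (instruction &rest args))

;; Assumption: only code strings made of literal hex bytes, such as "[aa]",
;; are handled here; anything else signals an error.
(defmethod emit ((instruction x86-asm-instruction) &rest args)
  (declare (ignore args))
  (let* ((code (code-string instruction))
         (tokens (split-on-spaces (string-trim "[]" code))))
    (unless (every #'hex-byte-p tokens)
      (error "Code string ~s needs a real encoder" code))
    (mapcar (lambda (token) (parse-integer token :radix 16)) tokens)))
```

Under these assumptions, calling emit on an instance created with :code-string "[aa]" returns a list holding the byte #xaa, while a code string such as "[i: odf 0f 80+c rel]" signals an error because it needs operand-dependent encoding.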
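I have converted NASM's insns.dat to Common Lisp syntax (as above) with a simple Perl script (further below, though there is nothing of interest in the script itself), and in principle it works. However, compiling those 7193 objects is really slow and commonly causes heap exhaustion. On my Linux Core i7-2760QM laptop with 16 GB of memory, compiling an (eval-when (:compile-toplevel :load-toplevel :execute) ...) code block with 7193 objects like the ones above takes more than 7 minutes and sometimes exhausts the heap, like this: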
;; Swank started at port: 4005.
* Heap exhausted during garbage collection: 0 bytes available, 32 requested.
 Gen  StaPg  UbSta LaSta LUbSt Boxed Unboxed  LB LUB !move      Alloc    Waste       Trig  WP GCs Mem-age
  0:      0      0     0     0     0       0   0   0     0          0        0   41943040   0   0  0.0000
  1:      0      0     0     0     0       0   0   0     0          0        0   41943040   0   0  0.0000
  2:      0      0     0     0     0       0   0   0     0          0        0   41943040   0   0  0.0000
  3:  38805  38652     0     0 49474   15433 389 416     0 2144219760  9031056 1442579856   0   1  1.5255
  4: 127998 127996     0     0 45870   14828 106 143   199 1971682720 25428576    2000000   0   0  0.0000
  5:      0      0     0     0     0       0   0   0     0          0        0    2000000   0   0  0.0000
  6:      0      0     0     0  1178     163   0   0     0   43941888        0    2000000 985   0  0.0000
   Total bytes allocated    = 4159844368
   Dynamic-space-size bytes = 4194304000
GC control variables:
   *GC-INHIBIT* = true
   *GC-PENDING* = in progress
   *STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 9994(tid 46912556431104):
Heap exhausted, game over.

Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>
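I had to add the --dynamic-space-size 4000 parameter to SBCL to get it compiled at all, but even with 4 GB of dynamic space the heap still sometimes gets exhausted. Even if the heap exhaustion were solved, more than 7 minutes to compile 7193 instances after only adding a slot to the class (the x86-asm-instruction class used for these instances) is far too much for interactive development in the REPL (I use slimv, if that matters).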
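Here is the output of (time (compile-file ...)):

; caught 18636 WARNING conditions

; insns.fasl written
; compilation finished in 0:07:11.329
Evaluation took:
  431.329 seconds of real time
  238.317000 seconds of total run time (234.972000 user, 3.345000 system)
  [ Run times consist of 6.073 seconds GC time, and 232.244 seconds non-GC time. ]
  55.25% CPU
  50,367 forms interpreted
  784,044 lambdas converted
  1,031,842,900,608 processor cycles
  19,402,921,376 bytes consed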
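Using OOP (CLOS) would make it possible to combine the instruction mnemonic (such as jc or stosb above, :name), the allowed operands of the instruction (:operands), the instruction's binary encoding (such as #xaa for stosb, :code-string) and possible architecture limitations of the instruction (:arch-flags) in one object. But it seems that at least my 3-year-old computer is not fast enough to compile around 7000 CLOS object instances quickly.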
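My question is: is there some way to make SBCL's make-instance faster, or should I keep assembly code generation in regular functions like the examples above? I would also be very happy to hear about any other possible solutions.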
Here is the Perl script, just in case:
#!/usr/bin/env perl
use strict;
use warnings;
# this program converts NASM's `insns.dat` to Common Lisp Object System (CLOS) syntax.
my $firstchar;
my $line_length;
my $are_there_square_brackets;
my $mnemonic_and_operands;
my $mnemonic;
my $operands;
my $code_string;
my $flags;
my $mnemonic_of_current_mnemonic_array;
my $clos_object_name;
my $clos_mnemonic;
my $clos_operands;
my $clos_code_string;
my $clos_flags;
my @object_name_array = ();
my @mnemonic_array = ();
my @operands_array = ();
my @code_string_array = ();
my @flags_array = ();
my @each_mnemonic_only_once_array = ();
my @instruction_variants_array = ();
my @instruction_variants_for_current_instruction_array = ();
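# open the input file for reading; stop with a message if that fails.
open(FILE, '<', 'insns.dat') or die "cannot open insns.dat: $!";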
$mnemonic_of_current_mnemonic_array = "";
# read one line at once.
while (<FILE>)
{
    $firstchar = substr($_, 0, 1);
    $line_length = length($_);
    $are_there_square_brackets = ($_ =~ /\[.*\]/);
    chomp;
    if (($line_length > 1) && ($firstchar =~ /[^\t ;]/))
    {
        if ($are_there_square_brackets)
        {
            ($mnemonic_and_operands, $code_string, $flags) = split /[\[\]]+/, $_;
            $code_string = "[" . $code_string . "]";
            ($mnemonic, $operands) = split /[\t ]+/, $mnemonic_and_operands;
        }
        else
        {
            ($mnemonic, $operands, $code_string, $flags) = split /[\t ]+/, $_;
        }
        $mnemonic =~ s/[\t ]+/ /g;
        $operands =~ s/[\t ]+/ /g;
        $code_string =~ s/[\t ]+/ /g;
        $flags =~ s/[\t ]+//g;
        # we don't want non-x86-64 instructions here.
        unless ($flags =~ /NOLONG/)
        {
            # ok, the content of each field is now filtered,
            # let's convert them to a suitable Common Lisp format.
            $clos_object_name = $mnemonic . "-" . $operands;
            # in Common Lisp object names `|`, `,`, and `:` must be escaped with a backslash `\`,
            # but that would get too complicated.
            # so we'll simply replace them:
            # `|` -> `-`.
            # `,` -> `.`.
            # `:` -> `.`.
            $clos_object_name =~ s/\|/-/g;
            $clos_object_name =~ s/,/./g;
            $clos_object_name =~ s/:/./g;
            $clos_mnemonic = "\"" . $mnemonic . "\"";
            $clos_operands = "\"" . $operands . "\"";
            $clos_code_string = "\"" . $code_string . "\"";
            $clos_flags = "\"" . $flags . "\""; # add first and last double quotes.
            $clos_flags =~ s/,/" "/g; # make each flag its own Common Lisp string.
            $clos_flags = "(list " . $clos_flags. ")"; # convert to `list` syntax.
            push @object_name_array, $clos_object_name;
            push @mnemonic_array, $clos_mnemonic;
            push @operands_array, $clos_operands;
            push @code_string_array, $clos_code_string;
            push @flags_array, $clos_flags;
            if ($mnemonic eq $mnemonic_of_current_mnemonic_array)
            {
                # ok, same mnemonic as the previous one,
                # so the current object name goes to the list.
                push @instruction_variants_for_current_instruction_array, $clos_object_name;
            }
            else
            {
                # ok, this is a new mnemonic.
                # so we'll mark this as current mnemonic.
                $mnemonic_of_current_mnemonic_array = $mnemonic;
                push @each_mnemonic_only_once_array, $mnemonic;
                # we first push the old array (unless it's empty), then clear it,
                # and then push the current object name to the cleared array.
                if (@instruction_variants_for_current_instruction_array)
                {
                    # push the variants array, unless it's empty.
                    push @instruction_variants_array, [ @instruction_variants_for_current_instruction_array ];
                }
                @instruction_variants_for_current_instruction_array = ();
                push @instruction_variants_for_current_instruction_array, $clos_object_name;
            }
        }
    }
}
# the last instruction's instruction variants must be pushed too.
if (@instruction_variants_for_current_instruction_array)
{
    # push the variants array, unless it's empty.
    push @instruction_variants_array, [ @instruction_variants_for_current_instruction_array ];
}
close(FILE);
# these objects need to be created already during compilation.
printf("(eval-when (:compile-toplevel :load-toplevel :execute)\n");
# print the code to create each instruction + operands combination object.
for (my $i=0; $i <= $#mnemonic_array; $i++)
{
    $clos_object_name = $object_name_array[$i];
    $mnemonic = $mnemonic_array[$i];
    $operands = $operands_array[$i];
    $code_string = $code_string_array[$i];
    $flags = $flags_array[$i];
    # print the code to create a variant object.
    # each object here is a variant of a single instruction (or a single mnemonic).
    # actually printed as 6 lines to make it easier to read (for us humans, I mean), with an empty line in the end.
    printf("(setf %s (make-instance 'x86-asm-instruction\n:name %s\n:operands %s\n:code-string %s\n:arch-flags %s\n:is-variant t))",
        $clos_object_name,
        $mnemonic,
        $operands,
        $code_string,
        $flags);
    printf("\n\n");
}
# print the code to create each container object (one per mnemonic).
# for (my $i=0; $i <= $#each_mnemonic_only_once_array; $i++)
for my $i (0 .. $#instruction_variants_array)
{
    $mnemonic = $each_mnemonic_only_once_array[$i];
    # print the code to create a container object.
    printf("(setf %s (make-instance 'x86-asm-instruction :name \"%s\" :is-container t :variants (list \n", $mnemonic, $mnemonic);
    # copy the variants of this instruction (dereference the array reference).
    @instruction_variants_for_current_instruction_array = @{ $instruction_variants_array[$i] };
    # for (my $j=0; $j <= $#instruction_variants_for_current_instruction_array; $j++)
    for my $j (0 .. $#{$instruction_variants_array[$i]} )
    {
        printf("%s", $instruction_variants_array[$i][$j]);
        # print 3 closing brackets if this is the last variant.
        if ($j == $#{$instruction_variants_array[$i]})
        {
            printf(")))");
        }
        else
        {
            printf(" ");
        }
    }
    # if this is not the last instruction, print two newlines.
    if ($i < $#instruction_variants_array)
    {
        printf("\n\n");
    }
}
# print the closing bracket to close `eval-when`.
print(")");
exit;