
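I am writing a multi-architecture assembler/disassembler in Common Lisp (SBCL 1.1.5 on 64-bit Debian GNU/Linux). Currently the assembler produces correct code for a subset of x86-64. To assemble x86-64 assembly code I use a hash table in which assembly instruction mnemonics (strings) such as "jc-rel8" and "stosb" are keys that return a list of 1 or more encoding functions, as shown here: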

(defparameter *emit-function-hash-table-x64* (make-hash-table :test 'equalp))
(setf (gethash "jc-rel8" *emit-function-hash-table-x64*) (list #'jc-rel8-x86))
(setf (gethash "stosb" *emit-function-hash-table-x64*) (list #'stosb-x86))

The encoding functions look like this (some are more complicated, though):

(defun jc-rel8-x86 (arg1 &rest args)
  (jcc-x64 #x72 arg1))

(defun stosb-x86 (&rest args)
  (list #xaa))
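
To make the lookup step concrete, here is a minimal sketch of how such a table might be used. The names *emit-table-example*, stosb-example and assemble-one are hypothetical, and the rule "try each encoder in turn and take the first non-NIL result" is only an assumption about how a list of several encoding functions could be consumed.

```lisp
;; Hypothetical table, kept separate from the real one above.
(defparameter *emit-table-example* (make-hash-table :test 'equalp))

(defun stosb-example (&rest args)
  "Encoder for STOSB: takes no operands and emits the single byte #xAA."
  (declare (ignore args))
  (list #xaa))

(setf (gethash "stosb" *emit-table-example*) (list #'stosb-example))

(defun assemble-one (mnemonic &rest args)
  "Look up MNEMONIC and return the first non-NIL result of its encoders.
Assumption: an encoder returns NIL when it cannot handle ARGS."
  (let ((encoders (gethash mnemonic *emit-table-example*)))
    (unless encoders
      (error "Unknown mnemonic: ~a" mnemonic))
    (some (lambda (encoder) (apply encoder args)) encoders)))

(assemble-one "stosb")
```

Because the table uses the EQUALP test, string keys are compared case-insensitively, so "STOSB" and "stosb" find the same entry.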

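Now I am trying to incorporate the full x86-64 instruction set by using NASM's (NASM 2.11.06) instruction encoding data (file insns.dat) converted to Common Lisp CLOS syntax. This would mean replacing the regular functions used for emitting binary code (like the functions above) with instances of a custom x86-asm-instruction class (a very basic class so far, with some 20 slots with :initarg, :reader, :initform etc.), in which an emit method with arguments would be used to emit the binary code for a given instruction (mnemonic) and arguments. The converted instruction data looks like this (but it is more than 40'000 lines, with exactly 7193 make-instance forms and 7193 setf forms):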

;; First, an instance for each mnemonic + operands combination (:is-variant t).
;; There are 4928 such instances for x86-64 generated from NASM's insns.dat.

(eval-when (:compile-toplevel :load-toplevel :execute)

(setf Jcc-imm-near (make-instance 'x86-asm-instruction
:name "Jcc"
:operands "imm|near"
:code-string "[i: odf 0f 80+c rel]"
:arch-flags (list "386" "BND")
:is-variant t))

(setf STOSB-void (make-instance 'x86-asm-instruction
:name "STOSB"
:operands "void"
:code-string "[aa]"
:arch-flags (list "8086")
:is-variant t))

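;; Then, container instances that contain (or can be referenced from)
;; the possible variants of each instruction.
;; There are 2265 such instances for x86-64 generated from NASM's insns.dat.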

(setf Jcc (make-instance 'x86-asm-instruction
                         :name "Jcc"
                         :is-container t
                         :variants (list Jcc-imm-near
                                         Jcc-imm64-near
                                         Jcc-imm-short
                                         Jcc-imm
                                         Jcc-imm
                                         Jcc-imm
                                         Jcc-imm)))

(setf STOSB (make-instance 'x86-asm-instruction
                           :name "STOSB"
                           :is-container t
                           :variants (list STOSB-void)))

;; there are thousands more objects here...

) ; this bracket closes (eval-when (:compile-toplevel :load-toplevel :execute)
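
For orientation, here is a rough sketch of what a class like x86-asm-instruction could look like. It is reduced to the slots that appear in the instances above (the real class has about 20), and the emit method is only a placeholder: it assumes a code string made of literal hex bytes such as "[aa]" and does not handle NASM code strings like "[i: odf 0f 80+c rel]". The helper split-on-spaces and the variable *stosb-void-example* are hypothetical names.

```lisp
;; Hypothetical reduced class: only the slots used in the instances above.
(defclass x86-asm-instruction ()
  ((name         :initarg :name         :reader name         :initform nil)
   (operands     :initarg :operands     :reader operands     :initform nil)
   (code-string  :initarg :code-string  :reader code-string  :initform nil)
   (arch-flags   :initarg :arch-flags   :reader arch-flags   :initform nil)
   (is-variant   :initarg :is-variant   :reader is-variant   :initform nil)
   (is-container :initarg :is-container :reader is-container :initform nil)
   (variants     :initarg :variants     :reader variants     :initform nil)))

(defun split-on-spaces (string)
  "Return the non-empty, space-separated tokens of STRING as a list."
  (loop with start = 0
        for end = (position #\Space string :start start)
        for token = (subseq string start end)
        unless (string= token "") collect token
        while end
        do (setf start (1+ end))))

(defgeneric emit (instruction &rest args)
  (:documentation "Return the encoding of INSTRUCTION as a list of bytes."))

(defmethod emit ((instruction x86-asm-instruction) &rest args)
  (declare (ignore args))
  ;; Assumption: the code string holds only literal hex bytes, e.g. "[aa]".
  ;; Any other token makes PARSE-INTEGER signal an error.
  (mapcar (lambda (token) (parse-integer token :radix 16))
          (split-on-spaces (string-trim "[]" (code-string instruction)))))

(defparameter *stosb-void-example*
  (make-instance 'x86-asm-instruction
                 :name "STOSB" :operands "void" :code-string "[aa]"
                 :arch-flags (list "8086") :is-variant t))

(emit *stosb-void-example*)
```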

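I have converted NASM's insns.dat to Common Lisp syntax (as above) using a simple Perl script (included further below, although there is nothing of interest in the script itself). In principle it works, but compiling those 7193 objects is really slow and commonly causes heap exhaustion. On my Linux Core i7-2760QM laptop with 16G of memory, compiling an (eval-when (:compile-toplevel :load-toplevel :execute) block with 7193 objects like the ones above takes more than 7 minutes and sometimes causes heap exhaustion, like this: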

;; Swank started at port: 4005.
* Heap exhausted during garbage collection: 0 bytes available, 32 requested.
 Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age
   0:0 0 0 0 0 0 0 0 0 0 0 41943040 0 0 0.0000
   1:0 0 0 0 0 0 0 0 0 0 0 41943040 0 0 0.0000
   2:0 0 0 0 0 0 0 0 0 0 0 41943040 0 0 0.0000
   3:38805 38652 0 0 49474 15433 389 416 0 2144219760 9031056 1442579856 0 1 1.5255
   4:127998 127996 0 0 45870 14828 106 143 199 1971682720 25428576 2000000 0 0 0.0000
   5:0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
   6:0 0 0 0 1178 163 0 0 0 43941888 0 2000000 985 0 0.0000
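   Total bytes allocated    = 4159844368
   Dynamic-space-size bytes = 4194304000
GC control variables:
   *GC-INHIBIT* = true
   *GC-PENDING* = in progress
   *STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 9994(tid 46912556431104):
Heap exhausted, game over.

Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>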

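I have to pass the --dynamic-space-size 4000 parameter to SBCL to get the file compiled at all, and even with 4 GB of dynamic space the heap is still sometimes exhausted. Even if the heap exhaustion were solved, more than 7 minutes to compile 7193 instances after adding just one slot to the class ('x86-asm-instruction, the class used for these instances) is far too long for interactive development in the REPL (I use slimv, in case that matters).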

Here is the (time (compile-file output:

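; caught 18636 WARNING conditions


; insns.fasl written
; compilation finished in 0:07:11.329
Evaluation took:
  431.329 seconds of real time
  238.317000 seconds of total run time (234.972000 user, 3.345000 system)
  [ Run times consist of 6.073 seconds GC time, and 232.244 seconds non-GC time. ]
  55.25% CPU
  50,367 forms interpreted
  784,044 lambdas converted
  1,031,842,900,608 processor cycles
  19,402,921,376 bytes consed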

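Using OOP (CLOS) would make it possible to combine the instruction mnemonic (such as jc or stosb above, :name), the allowed operands of the instruction (:operands), the instruction's binary encoding (such as #xaa for stosb, :code-string) and possible architecture limitations of the instruction (:arch-flags) in one object. But it seems that at least my 3-year-old computer is not fast enough to compile around 7000 CLOS object instances quickly.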

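My question is: is there some way to make SBCL's make-instance faster, or should I keep the assembly code generation in regular functions like the examples above? I would also be very happy to hear about any other possible solutions.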

Here is the Perl script, just in case:

#!/usr/bin/env perl
use strict;
use warnings;

# this program converts NASM's `insns.dat` to Common Lisp Object System (CLOS) syntax.

my $firstchar;
my $line_length;
my $are_there_square_brackets;
my $mnemonic_and_operands;
my $mnemonic;
my $operands;
my $code_string;
my $flags;
my $mnemonic_of_current_mnemonic_array;

my $clos_object_name;
my $clos_mnemonic;
my $clos_operands;
my $clos_code_string;
my $clos_flags;

my @object_name_array = ();
my @mnemonic_array = ();
my @operands_array = ();
my @code_string_array = ();
my @flags_array = ();

my @each_mnemonic_only_once_array = ();

my @instruction_variants_array = ();
my @instruction_variants_for_current_instruction_array = ();

open(FILE, '<', 'insns.dat') or die "cannot open insns.dat: $!";

$mnemonic_of_current_mnemonic_array = "";

# read one line at once.
while (<FILE>)
{
    $firstchar = substr($_, 0, 1);
    $line_length = length($_);
    $are_there_square_brackets = ($_ =~ /\[.*\]/);
    chomp;
    if (($line_length > 1) && ($firstchar =~ /[^\t ;]/))
    {
        if ($are_there_square_brackets)
        {
            ($mnemonic_and_operands, $code_string, $flags) = split /[\[\]]+/, $_;
            $code_string = "[" . $code_string . "]";
            ($mnemonic, $operands) = split /[\t ]+/, $mnemonic_and_operands;
        }
        else
        {
            ($mnemonic, $operands, $code_string, $flags) = split /[\t ]+/, $_;
        }
        $mnemonic =~ s/[\t ]+/ /g;
        $operands =~ s/[\t ]+/ /g;
        $code_string =~ s/[\t ]+/ /g;
        $flags =~ s/[\t ]+//g;

        # we don't want non-x86-64 instructions here.
        unless ($flags =~ "NOLONG")
        {
            # ok, the content of each field is now filtered,
            # let's convert them to a suitable Common Lisp format.
            $clos_object_name = $mnemonic . "-" . $operands;

            # in Common Lisp object names `|`, `,`, and `:` must be escaped with a backslash `\`,
            # but that would get too complicated.
            # so we'll simply replace them:
            # `|` -> `-`.
            # `,` -> `.`.
            # `:` -> `.`.
            $clos_object_name =~ s/\|/-/g;              
            $clos_object_name =~ s/,/./g;              
            $clos_object_name =~ s/:/./g;              

            $clos_mnemonic    = "\"" . $mnemonic . "\"";
            $clos_operands    = "\"" . $operands . "\"";
            $clos_code_string = "\"" . $code_string . "\"";

            $clos_flags = "\"" . $flags . "\"";        # add first and last double quotes.
            $clos_flags =~ s/,/" "/g;                  # make each flag its own Common Lisp string.
            $clos_flags = "(list " . $clos_flags. ")"; # convert to `list` syntax.

            push @object_name_array, $clos_object_name;
            push @mnemonic_array, $clos_mnemonic;
            push @operands_array, $clos_operands;
            push @code_string_array, $clos_code_string;
            push @flags_array, $clos_flags;

            if ($mnemonic eq $mnemonic_of_current_mnemonic_array)
            {
                # ok, same mnemonic as the previous one,
                # so the current object name goes to the list.
                push @instruction_variants_for_current_instruction_array, $clos_object_name;
            }
            else
            {
                # ok, this is a new mnemonic.
                # so we'll mark this as current mnemonic.
                $mnemonic_of_current_mnemonic_array = $mnemonic;
                push @each_mnemonic_only_once_array, $mnemonic;

                # we first push the old array (unless it's empty), then clear it,
                # and then push the current object name to the cleared array.

                if (@instruction_variants_for_current_instruction_array)
                {
                    # push the variants array, unless it's empty.
                    push @instruction_variants_array, [ @instruction_variants_for_current_instruction_array ];
                }
                @instruction_variants_for_current_instruction_array = ();
                push @instruction_variants_for_current_instruction_array, $clos_object_name;
            }
        }
    }
}

# the last instruction's instruction variants must be pushed too.
if (@instruction_variants_for_current_instruction_array)
{
    # push the variants array, unless it's empty.
    push @instruction_variants_array, [ @instruction_variants_for_current_instruction_array ];
}

close(FILE);

# these objects need be created already during compilation.
printf("(eval-when (:compile-toplevel :load-toplevel :execute)\n");

# print the code to create each instruction + operands combination object.

for (my $i=0; $i <= $#mnemonic_array; $i++)
{
    $clos_object_name = $object_name_array[$i];
    $mnemonic         = $mnemonic_array[$i];
    $operands         = $operands_array[$i];
    $code_string      = $code_string_array[$i];
    $flags            = $flags_array[$i];

    # print the code to create a variant object.
    # each object here is a variant of a single instruction (or a single mnemonic).
    # actually printed as 6 lines to make it easier to read (for us humans, I mean), with an empty line in the end.
    printf("(setf %s (make-instance 'x86-asm-instruction\n:name %s\n:operands %s\n:code-string %s\n:arch-flags %s\n:is-variant t))",
        $clos_object_name,
        $mnemonic,
        $operands,
        $code_string,
        $flags);
    printf("\n\n");
}

# print the code to create each container object (one per mnemonic).

# for (my $i=0; $i <= $#each_mnemonic_only_once_array; $i++)
for my $i (0 .. $#instruction_variants_array)
{
    $mnemonic = $each_mnemonic_only_once_array[$i];

    # print the code to create a container object.
    printf("(setf %s (make-instance 'x86-asm-instruction :name \"%s\" :is-container t :variants (list \n", $mnemonic, $mnemonic);
    @instruction_variants_for_current_instruction_array = $instruction_variants_array[$i];

    # for (my $j=0; $j <= $#instruction_variants_for_current_instruction_array; $j++)
    for my $j (0 .. $#{$instruction_variants_array[$i]} )
    {
        printf("%s", $instruction_variants_array[$i][$j]);

        # print 3 closing brackets if this is the last variant.
        if ($j == $#{$instruction_variants_array[$i]})
        {
            printf(")))");
        }
        else
        {
            printf(" ");
        }
    }

    # if this is not the last instruction, print two newlines.
    if ($i < $#instruction_variants_array)
    {
        printf("\n\n");
    }
}

# print the closing bracket to close `eval-when`.
print(")");

exit;

1 Answer


18636 warnings looks really bad; start by getting rid of all the warnings.

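I would start by getting rid of the EVAL-WHEN around all of that. It does not make much sense to me. Either load the file directly, or compile the file and then load it.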

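Also note that SBCL does not like (setf STOSB-void ...) when the variable is undefined. New top-level variables are introduced with DEFVAR or DEFPARAMETER. SETF just sets them, but does not define them. That should help to get rid of the warnings.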
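
A minimal sketch of that change, using a reduced, hypothetical version of the x86-asm-instruction class (only five slots) so that the example stands on its own:

```lisp
;; Hypothetical reduced class; the real one has about 20 slots.
(defclass x86-asm-instruction ()
  ((name        :initarg :name        :reader name)
   (operands    :initarg :operands    :reader operands)
   (code-string :initarg :code-string :reader code-string)
   (arch-flags  :initarg :arch-flags  :reader arch-flags)
   (is-variant  :initarg :is-variant  :reader is-variant :initform nil)))

;; DEFPARAMETER defines the global variable and assigns it in one step,
;; so the compiler no longer sees an assignment to an undefined variable.
(defparameter STOSB-void
  (make-instance 'x86-asm-instruction
                 :name "STOSB"
                 :operands "void"
                 :code-string "[aa]"
                 :arch-flags (list "8086")
                 :is-variant t))

;; Once the variable is defined, SETF on it is fine.
(setf STOSB-void
      (make-instance 'x86-asm-instruction
                     :name "STOSB"
                     :operands "void"
                     :code-string "[aa]"
                     :arch-flags (list "8086")
                     :is-variant t))
```

DEFPARAMETER re-evaluates its value form every time the file is loaded, whereas DEFVAR leaves an already bound variable alone, so DEFPARAMETER is probably the closer match for generated data. Both make the variable special, which is why such names are conventionally written with earmuffs, for example *stosb-void*.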

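Also, :is-container t and :is-variant t smell like properties that should be converted into classes to inherit from (for example as mixins). A container has variants. A variant does not have variants.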
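
One way this could look, as a sketch only: the class names variant-mixin, container-mixin, x86-instruction-variant and x86-instruction-container, as well as the *...-example* variables, are made up for illustration, and the slot set is reduced to what the question shows.

```lisp
;; Shared base class: every instruction object has a name.
(defclass x86-asm-instruction ()
  ((name :initarg :name :reader name)))

;; A variant carries the operand and encoding data, but no VARIANTS slot.
(defclass variant-mixin ()
  ((operands    :initarg :operands    :reader operands)
   (code-string :initarg :code-string :reader code-string)
   (arch-flags  :initarg :arch-flags  :reader arch-flags :initform '())))

;; A container only holds the list of its variants.
(defclass container-mixin ()
  ((variants :initarg :variants :reader variants :initform '())))

(defclass x86-instruction-variant (variant-mixin x86-asm-instruction) ())
(defclass x86-instruction-container (container-mixin x86-asm-instruction) ())

(defparameter *stosb-void-example*
  (make-instance 'x86-instruction-variant
                 :name "STOSB"
                 :operands "void"
                 :code-string "[aa]"
                 :arch-flags (list "8086")))

(defparameter *stosb-example*
  (make-instance 'x86-instruction-container
                 :name "STOSB"
                 :variants (list *stosb-void-example*)))

;; The class now answers what the boolean flags used to answer.
(typep *stosb-example* 'container-mixin)      ; true for the container
(typep *stosb-void-example* 'container-mixin) ; false for the variant
```

With this layout, methods can specialize on the mixin classes instead of testing flags at run time, and calling variants on a variant object signals an error because no method applies.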

answered 2014-11-24T00:26:19.130