javascript - Manipulating the V8 AST

Question

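I'm planning to implement JS code coverage directly in the V8 code. My initial goal is to add a simple print for every statement in the abstract syntax tree. I saw that there is an AstVisitor class which allows you to traverse the AST. So my question is: how can I add a statement to the AST after the statement the visitor is currently visiting?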

score 6 · Accepted Answer

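OK, I'll summarize my experiments. First off, what I write applies to V8 as it was used in Chromium revision r157275, so things may no longer work, but I will nevertheless link to the places in the current version.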

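As said, you need your own AST visitor, say MyAstVisitor, which inherits from AstVisitor and must implement a bunch of VisitXYZ methods from there. The only one needed to instrument/inspect the executed code is VisitFunctionLiteral. The executed code is either a function or a set of loose statements in a source (file), which V8 wraps in a function that is then executed.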
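To make the Accept/VisitXYZ mechanism concrete, here is a minimal sketch of the double-dispatch pattern on a toy AST. All Toy* names and traceToyFunction are made up for illustration; this is not V8's AstVisitor API, only the general shape of the pattern it follows.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins (hypothetical); these are NOT the V8 classes.
struct ToyFunctionLiteral;
struct ToyExpressionStatement;

// One Visit method per node type, like the VisitXYZ methods in the answer.
struct ToyVisitor {
    virtual ~ToyVisitor() = default;
    virtual void VisitFunctionLiteral(ToyFunctionLiteral* node) = 0;
    virtual void VisitExpressionStatement(ToyExpressionStatement* node) = 0;
};

struct ToyNode {
    virtual ~ToyNode() = default;
    // Each node forwards to the Visit method matching its own type.
    virtual void Accept(ToyVisitor* v) = 0;
};

struct ToyExpressionStatement : ToyNode {
    std::string text;
    explicit ToyExpressionStatement(std::string t) : text(std::move(t)) {}
    void Accept(ToyVisitor* v) override { v->VisitExpressionStatement(this); }
};

struct ToyFunctionLiteral : ToyNode {
    std::vector<ToyNode*> body;
    void Accept(ToyVisitor* v) override { v->VisitFunctionLiteral(this); }
};

// A visitor that records the order in which its callbacks fire.
struct TraceVisitor : ToyVisitor {
    std::vector<std::string> trace;
    void VisitFunctionLiteral(ToyFunctionLiteral* node) override {
        trace.push_back("function");
        for (ToyNode* child : node->body) child->Accept(this);
    }
    void VisitExpressionStatement(ToyExpressionStatement* node) override {
        trace.push_back("stmt:" + node->text);
    }
};

// Builds a two-statement toy function and returns the visit trace.
std::vector<std::string> traceToyFunction() {
    ToyExpressionStatement a("a = 1");
    ToyExpressionStatement b("b = 2");
    ToyFunctionLiteral fn;
    fn.body = {&a, &b};
    TraceVisitor visitor;
    fn.Accept(&visitor);
    return visitor.trace;
}
```

Calling traceToyFunction() visits the function literal first and then each statement in its body, which is why hooking only the function-literal callback is enough to reach all top-level statements.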

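Then, right before the parsed AST is converted to code, here (compiling the function built from the loose statements) and there (compiling at runtime, when a predefined function is executed for the first time), you pass your visitor to the function literal, which will call VisitFunctionLiteral on the visitor: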

MyAstVisitor myAV(info);
info->function()->Accept(&myAV);
// next line is the V8 compile call
if (!MakeCode(info)) {

I passed the CompilationInfo pointer info to the custom visitor because it is needed to modify the AST. The constructor looks like this:

MyAstVisitor(CompilationInfo* compInfo) :
    _ci(compInfo), _nf(compInfo->isolate(), compInfo->zone()), _z(compInfo->zone()){};

_ci, _nf and _z hold the CompilationInfo pointer, an AstNodeFactory<AstNullVisitor>, and the Zone pointer, respectively.

Now, in VisitFunctionLiteral you can iterate over the function body and insert statements wherever you like.

void MyAstVisitor::VisitFunctionLiteral(FunctionLiteral* funLit){
    // fetch the function body
    ZoneList<Statement*>* body = funLit->body();
    // create a statement list used to collect the instrumented statements
    ZoneList<Statement*>* _stmts = new (_z) ZoneList<Statement*>(body->length(), _z);
    // iterate over the function body and rewrite each statement
    for (int i = 0; i < body->length(); i++) {
       // the rewritten statements are put into the collector
       rewriteStatement(body->at(i), _stmts);
    }
    // replace the original function body with the instrumented one
    body->Clear();
    body->AddAll(_stmts->ToVector(), _z);
}

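In the rewriteStatement method you can now inspect the statement. The _stmts pointer holds a list of statements that will eventually replace the original function body. So, to add a print statement after each statement, you first add the original statement and then your own print statement: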

void MyAstVisitor::rewriteStatement(Statement* stmt, ZoneList<Statement*>* collector){
    // add original statement
    collector->Add(stmt, _z);

    // create and add print statement, assuming you define print somewhere in JS:

    // 1) create handle (VariableProxy) for print function
    Vector<const char> fName("print", 5);
    Handle<String> fNameStr = Isolate::Current()->factory()->NewStringFromAscii(fName, TENURED);
    fNameStr = Isolate::Current()->factory()->SymbolFromString(fNameStr);
    // create the proxy - (it is vital to use _ci->function()->scope(), _ci->scope() crashes)
    VariableProxy* _printVP = _ci->function()->scope()->NewUnresolved(&_nf, fNameStr, Interface::NewUnknown(_z), 0);

    // 2) create message
    Vector<const char> tmp("Hello World!", 12);
    Handle<String> v8String = Isolate::Current()->factory()->NewStringFromAscii(tmp, TENURED);
    Literal* msg = _nf.NewLiteral(v8String);

    // 3) create argument list, call expression, expression statement and add the latter to the collector
    ZoneList<Expression*>* args = new (_z) ZoneList<Expression*>(1, _z);
    args->Add(msg, _z);
    Call* printCall = _nf.NewCall(_printVP, args, 0);
    ExpressionStatement* printStmt = _nf.NewExpressionStatement(printCall);
    collector->Add(printStmt, _z);   
}

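The last argument of NewCall and NewUnresolved is a number that specifies the position in the script. I assume this is used for debugging/error messages to tell where an error occurred. At least I never ran into problems setting it to 0 (there is also a constant kNoPosition somewhere).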

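Some final words: this will not actually add a print statement after every statement, because Blocks (e.g. loop bodies) are statements that represent a list of statements, and loops are statements with a condition expression and a body block. So you need to check which kind of statement is currently being handled and look into it recursively. Rewriting a block is pretty much the same as rewriting a function body.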
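The recursive descent into blocks and loop bodies can be sketched on a toy AST. This is only an illustration under assumptions: ToyStmt, makeExpr, makeContainer, instrument and countStatements are hypothetical names, std::vector stands in for ZoneList, and none of it is V8 code. The collector pattern is the same one used for the function body.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy AST (hypothetical, not V8's): a statement is a leaf expression
// or a container (block or loop) that owns a list of child statements.
struct ToyStmt {
    enum Kind { kExpression, kBlock, kLoop } kind;
    std::string text;                                // expression text or loop condition
    std::vector<std::shared_ptr<ToyStmt>> children;  // used by kBlock and kLoop
};
using StmtPtr = std::shared_ptr<ToyStmt>;
using StmtList = std::vector<StmtPtr>;

StmtPtr makeExpr(const std::string& text) {
    return std::make_shared<ToyStmt>(ToyStmt{ToyStmt::kExpression, text, {}});
}

StmtPtr makeContainer(ToyStmt::Kind kind, const std::string& text, StmtList children) {
    return std::make_shared<ToyStmt>(ToyStmt{kind, text, std::move(children)});
}

StmtList instrument(const StmtList& body);

// Adds the original statement (with its nested body instrumented first),
// then a print statement, to the collector.
void rewriteStatement(const StmtPtr& stmt, StmtList* collector) {
    if (stmt->kind != ToyStmt::kExpression) {
        // Same procedure as for a function body, applied to the nested list.
        stmt->children = instrument(stmt->children);
    }
    collector->push_back(stmt);
    collector->push_back(makeExpr("print('Hello World!')"));
}

// Returns a new statement list with a print after every statement.
StmtList instrument(const StmtList& body) {
    StmtList out;
    out.reserve(body.size() * 2);
    for (const StmtPtr& stmt : body) rewriteStatement(stmt, &out);
    return out;
}

// Counts all statements, including nested ones.
int countStatements(const StmtList& body) {
    int n = 0;
    for (const StmtPtr& stmt : body) n += 1 + countStatements(stmt->children);
    return n;
}
```

For a body consisting of one expression and one loop with two statements inside, instrument() doubles the statement count at every nesting level, so the prints also land inside the loop body rather than only after the loop.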

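But you will run into problems when you start replacing or modifying existing statements, because the AST also carries information about branching. So if you replace a statement that is the jump target of some branch, you break your code. I guess this could be covered by adding rewrite capabilities directly to the individual expression and statement types instead of creating new ones to replace them.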
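Why replacing a node is riskier than editing it in place can be shown with a small toy model. It assumes, purely for illustration, that a "break" node stores a raw pointer to the loop it exits; ToyNode, ToyProgram and the helper functions are hypothetical and are not V8 types.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy node (hypothetical): a "break" node points at the loop it exits.
struct ToyNode {
    std::vector<std::shared_ptr<ToyNode>> children;
    ToyNode* jumpTarget = nullptr;
};
using NodePtr = std::shared_ptr<ToyNode>;

// Returns true if target is reachable from root.
bool contains(const ToyNode* root, const ToyNode* target) {
    if (root == target) return true;
    for (const NodePtr& c : root->children)
        if (contains(c.get(), target)) return true;
    return false;
}

struct ToyProgram { NodePtr function, loop, brk; };

// Builds function -> loop -> break, with break.jumpTarget == loop.
ToyProgram buildProgram() {
    ToyProgram p;
    p.function = std::make_shared<ToyNode>();
    p.loop = std::make_shared<ToyNode>();
    p.brk = std::make_shared<ToyNode>();
    p.brk->jumpTarget = p.loop.get();
    p.loop->children.push_back(p.brk);
    p.function->children.push_back(p.loop);
    return p;
}

// Strategy A: swap the loop for a fresh node that takes over its children.
// The break still points at the old loop, which is no longer in the tree.
void replaceLoop(ToyProgram& p) {
    NodePtr fresh = std::make_shared<ToyNode>();
    fresh->children = p.loop->children;
    p.function->children[0] = fresh;
}

// Strategy B: keep the node and add an extra child in place.
void extendLoopInPlace(ToyProgram& p) {
    p.loop->children.push_back(std::make_shared<ToyNode>());
}

bool jumpTargetStillInTree(const ToyProgram& p) {
    return contains(p.function.get(), p.brk->jumpTarget);
}
```

After replaceLoop() the jump target is no longer part of the tree, while extendLoopInPlace() leaves it intact, which matches the intuition that rewriting statements in place is the safer route.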

That's it so far; I hope it helps.
