compiler-construction - 我的 AST 应该是什么样子才能轻松转换？

Question

我有一种类似于 javascript 的最小玩具语言。我生成了一个 AST 来尝试一些优化技术，比如逃逸分析、类型推断。我尝试了一些方法，例如概括运算符标记而不是每个类/函数，在每个节点上保留类型信息......但我仍然不觉得我要去任何地方。工作很快就会变得笨拙......

我研究了 lua5、neko、v8，但是......好吧......我肯定不是周围最聪明的人之一。

有没有人有设计 AST 和在 AST 上实现转换的经验，这很容易处理？我会很感激让您的生活更轻松的提示和技巧？

（请不要叫我去看龙书，我已经有了。）

score 3 · Accepted Answer

As Alan mentioned, the Appel books are great. I had Modern Compiler Implementation in ML for an undergraduate course on compilers.

I would personally avoid doing many transformations on an AST simply because of the number of different constructs you can have and the number of ways the same thing can be expressed. You will frequently have to write code that handles a large number of cases and sub-cases, and as you said, it gets unwieldly very quickly.

It is better to transform the AST to a more minimal representation such as basic blocks in a control flow graph. Each operation can then be written as a simple statement in a basic block. The set of possible operations should be kept small. Just make sure to keep enough information that you can still do all the transformations you want (in particular, don't throw away types). You can also use Single Static Assignment form, where each program variable gets assigned only once. This provides an invariant which simplifies a lot of transformations and analyses.

score 0 · Accepted Answer

0

我使用ANLTR.org，我觉得这很容易。

于 2009-01-06T15:05:22.250 回答

score 0 · Accepted Answer

ASTs represent the structure of program. For complex languages, your AST will necessarily be complicated. So don't assume this should be "easy".

Many folks assume that having an AST, life will be easier. But parsing is just the foothills of the Himalayas. ASTs don't represent the common inferences, such as the meaning of an identifier, what statement executes next, where this data is consumed. Unless you have all these available, you're not going to be able to do much with a real language, let alone do it easily.

It is best to make those inference results cached or explicit: symbol tables, control flows, data flows, ...

One can add pattern matching langauges to enable the recognition of syntax structures, or even to write transformation rules:

optimize_increment(x:exp):statement
= " \x=\x+1; " -->  " \x++ " if no_side_effects(x);

Such rules need to draw on the cached inferences (eg., "side_effects").

Trying to build all this is really hard. One way to make this practical is to amortize the infrastructure cost across many lanuages and transformation applications. Our DMS Software Reengineering Toolkit has all this machinery to varying degree for a wide variety of langauges, including C, C++, Java, C# and COBOL.

score 0 · Accepted Answer

Okay, I won't recommend the Dragon book, since you already have it. Can I recommend the Appel books? There is even some source code demonstrating the concepts, among them using the Visitor pattern for translating Abstract Syntax Trees to Concrete Syntax Trees. Good luck, and enjoy!

compiler-construction - 我的 AST 应该是什么样子才能轻松转换？

4 回答 4

Related

Reference