unix - Shell command for parsing/printing a stream

Question

Specific problem

What is a shell command that turns a string like this

class A(B, C):

into a set of strings like this

B -> A;
C -> A;

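where A, B, and C all have the form \w+, and where I wrote "B, C" I really mean any number of terms separated by a comma and a space. That is, "B, C" could equally be "B" or "B, C, D, E".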

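Big picture

I'm visualizing the class hierarchy of a Python project. I walk a directory for all .py files, grep for class declarations, and then convert them to DOT format. So far I've used find and grep to get a list of lines, and I've done the conversion above in a small Python script. If possible, I'd like to use only the standard unix toolchain. Ideally, I'd like to find another composable tool that I can pipe into and out of to complete the chain.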

score 1 · Accepted Answer

You want primitive? This sed script should run on every UNIX since V7 (but I haven't tested it on anything really old, so be careful). Run it as sed -n -f scriptfile infile > outfile

: loop
/^class [A-Za-z0-9_][A-Za-z0-9_]*(\([A-Za-z0-9_][A-Za-z0-9_]*, *\)*[A-Za-z0-9_][A-Za-z0-9_]*):$/{
h
s/^class \([A-Za-z0-9_][A-Za-z0-9_]*\)(\([A-Za-z0-9_][A-Za-z0-9_]*\)[,)].*/\2 -> \1;/
p
g
s/\(class [A-Za-z0-9_][A-Za-z0-9_]*(\)[A-Za-z0-9_][A-Za-z0-9_]*,* */\1/
b loop
}

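These are BREs (basic regular expressions). They have no + operator (that exists only in extended regular expressions), and they definitely have no \w (a Perl invention). So your simple \w+ becomes [A-Za-z0-9_][A-Za-z0-9_]*, and I had to use it several times, which makes the script seriously ugly.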
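To sanity-check that the spelled-out class really stands in for \w+, here is a minimal Python sketch. Python's re module is not a BRE engine, so this only compares the two spellings on ASCII input; the names BRE_STYLE_WORD, PERL_STYLE_WORD, and agree are made up for illustration.

```python
import re

# The long form used in the sed script, and the shorthand from the question.
BRE_STYLE_WORD = r"[A-Za-z0-9_][A-Za-z0-9_]*"
PERL_STYLE_WORD = r"\w+"


def agree(text):
    """Return True when both patterns accept, or both reject, the whole string."""
    spelled_out = re.fullmatch(BRE_STYLE_WORD, text) is not None
    # re.ASCII restricts \w to [A-Za-z0-9_]; without it, Python's \w also
    # accepts non-ASCII letters and the two patterns would differ.
    shorthand = re.fullmatch(PERL_STYLE_WORD, text, flags=re.ASCII) is not None
    return spelled_out == shorthand


if __name__ == "__main__":
    for sample in ["A", "Base_2", "", "a-b", "B, C"]:
        print(repr(sample), agree(sample))
```

The re.ASCII flag matters here: it is what makes the comparison fair for class names that contain only ASCII characters.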

In pseudocode form, what it does is:

while the line matches /^class \w+(comma-separated-list-of \w+):$/ {
    save the line in the hold space
    capture the outer \w+ and the first \w+ in the parentheses
    replace the entire line with the new string "\2 -> \1;" using the captures
    print the line
    retrieve the line from the hold space
    delete the first member of the comma-separated list
}
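For anyone who wants to step through that loop outside of sed, the pseudocode above can be sketched in Python roughly as follows. This is an illustration of the same peel-off-the-first-base idea, not a replacement for the sed script; the names CLASS_LINE and edges are invented here, and the variable held merely plays the role of sed's hold space.

```python
import re

# Same shape as the sed address: class NAME(BASE, BASE, ...):
CLASS_LINE = re.compile(r"^class (\w+)\((\w+(?:, *\w+)*)\):$", re.ASCII)


def edges(line):
    """Return the DOT edges 'Base -> Class;' for one class declaration line."""
    out = []
    held = line  # stands in for the hold space
    while True:
        match = CLASS_LINE.match(held)
        if match is None:
            break  # no bases left, or not a class line at all
        name, bases = match.group(1), match.group(2)
        first = re.match(r"\w+", bases, re.ASCII).group(0)
        out.append("%s -> %s;" % (first, name))
        # Delete the first member of the comma-separated list, then loop.
        rest = re.sub(r"^\w+,? *", "", bases, flags=re.ASCII)
        held = "class %s(%s):" % (name, rest)
    return out


if __name__ == "__main__":
    for edge in edges("class A(B, C):"):
        print(edge)
```

Called on the line from the question, edges("class A(B, C):") returns ["B -> A;", "C -> A;"]; a line that is not a class declaration with at least one base returns an empty list.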

score 0 · Accepted Answer

Using Python's ast module to parse Python is as easy as Python.

import ast

class ClassDumper(ast.NodeVisitor):
  def visit_ClassDef(self, clazz):
    def expand_name(expr):
      # Render a base-class expression as a dotted name where possible.
      if isinstance(expr, ast.Name):
        return expr.id
      if isinstance(expr, ast.Attribute):
        return '%s.%s' % (expand_name(expr.value), expr.attr)
      return ast.dump(expr)
    for base in clazz.bases:
      print('%s -> %s;' % (clazz.name, expand_name(base)))
    # Keep walking so classes nested in this one are visited too.
    self.generic_visit(clazz)

# Dump the classes declared in this very file.
with open(__file__) as source:
  ClassDumper().visit(ast.parse(source.read()))

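(This isn't quite right with respect to nesting, since it will output Inner -> Base; rather than Outer.Inner -> Base;, but you could fix that by tracking context in a manual traversal.)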
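One way the context tracking could look, as a rough sketch: instead of a fully manual traversal, this variant keeps a stack of enclosing class names inside a NodeVisitor. The names NestedClassDumper, enclosing, and class_edges are hypothetical, and it collects the edges in a list rather than printing them so they are easier to check.

```python
import ast


def expand_name(expr):
    """Render a base-class expression as a dotted name where possible."""
    if isinstance(expr, ast.Name):
        return expr.id
    if isinstance(expr, ast.Attribute):
        return "%s.%s" % (expand_name(expr.value), expr.attr)
    return ast.dump(expr)


class NestedClassDumper(ast.NodeVisitor):
    def __init__(self):
        self.enclosing = []  # names of the classes we are currently inside
        self.edges = []

    def visit_ClassDef(self, node):
        self.enclosing.append(node.name)
        qualified = ".".join(self.enclosing)
        for base in node.bases:
            self.edges.append("%s -> %s;" % (qualified, expand_name(base)))
        self.generic_visit(node)  # descend while this class is on the stack
        self.enclosing.pop()


def class_edges(source):
    """Return the edges for every class declared in a source string."""
    dumper = NestedClassDumper()
    dumper.visit(ast.parse(source))
    return dumper.edges


if __name__ == "__main__":
    sample = "class Base: pass\nclass Outer:\n    class Inner(Base): pass\n"
    for edge in class_edges(sample):
        print(edge)
```

Only enclosing classes are tracked in this sketch, so a class defined inside a function or method is qualified by its enclosing classes alone.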
