
I'm trying to run a Clojure regex over a Groovy source file to parse out the individual functions.

// gremlin.groovy

def warm_cache() {
  for (vertex in g.getVertices()) {
    vertex.getOutEdges()
  }
}

def clear() {
  g.clear()
}

This is the pattern I'm using in Clojure:

(def source (read-file "gremlin.groovy"))

(def pattern #"(?m)^def.*[^}]")   

(re-seq pattern source)

However, it only grabs the first line of each function, not the whole multi-line function.


3 Answers


As a demonstration of how to get an AST from GroovyRecognizer, and so avoid trying to parse a language with regular expressions, you can do this in Groovy:

import org.codehaus.groovy.antlr.*
import org.codehaus.groovy.antlr.parser.*

def code = '''
// gremlin.groovy

def warm_cache() {
  for (vertex in g.getVertices()) {
    vertex.getOutEdges()
  }
}

def clear() {
  g.clear()
}
'''


def ast = new GroovyRecognizer( new GroovyLexer( new StringReader( code ) ).plumb() ).with { p ->
  p.compilationUnit()
  p.AST
}


while( ast ) {
  println ast.toStringTree()
  ast = ast.nextSibling
}

This prints the tree for each GroovySourceAST node in the AST, giving you (for this example):

 ( METHOD_DEF MODIFIERS TYPE warm_cache PARAMETERS ( { ( for ( in vertex ( ( ( . g getVertices ) ELIST ) ) ( { ( EXPR ( ( ( . vertex getOutEdges ) ELIST ) ) ) ) ) )
 ( METHOD_DEF MODIFIERS TYPE clear PARAMETERS ( { ( EXPR ( ( ( . g clear ) ELIST ) ) ) )

You should be able to do the same thing from Clojure using its Java interop and the groovy-all jar file.


Edit

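To get more information, you just need to dig a little further into the AST and manipulate the input script a bit. Change the while loop in the code above to: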

while( ast ) {
  if( ast.type == GroovyTokenTypes.METHOD_DEF ) {
    println """Lines $ast.line to $ast.lineLast
              |  Name:  $ast.firstChild.nextSibling.nextSibling.text
              |  Code:  ${code.split('\n')[ (ast.line-1)..<ast.lineLast ]*.trim().join( ' ' )}
              |   AST:  ${ast.toStringTree()}""".stripMargin()
  }
  ast = ast.nextSibling
}

Which prints:

Lines 4 to 8
  Name:  warm_cache
  Code:  def warm_cache() { for (vertex in g.getVertices()) { vertex.getOutEdges() } }
   AST:   ( METHOD_DEF MODIFIERS TYPE warm_cache PARAMETERS ( { ( for ( in vertex ( ( ( . g getVertices ) ELIST ) ) ( { ( EXPR ( ( ( . vertex getOutEdges ) ELIST ) ) ) ) ) )
Lines 10 to 12
  Name:  clear
  Code:  def clear() { g.clear() }
   AST:   ( METHOD_DEF MODIFIERS TYPE clear PARAMETERS ( { ( EXPR ( ( ( . g clear ) ELIST ) ) ) )

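Obviously, the Code: section is just the original lines trimmed and joined back together, so it may not work if pasted back into Groovy, but it gives you an idea of the original code.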

answered 2012-05-09T11:27:43.233

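The problem is your regular expression, not Clojure. You ask to match def, then anything, then one character that is not a closing brace. That character can be anywhere, and since . does not match line terminators by default, .* never crosses a line break, so the match ends around the first line. What you want is (?sm)def.*?^} where the s flag lets . match line breaks and the lazy .*? stops at the first } that starts a line.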
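The difference between the two patterns can be checked directly against java.util.regex, which is what Clojure's #"..." literals compile to. The following is only a minimal sketch in Java: the class name DotallFunctionRegex, the helper findAll, and the inlined copy of the question's source are assumptions made for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
final class DotallFunctionRegex {
    // The Groovy source from the question, inlined so the sketch is self-contained.
    static final String SOURCE = String.join("\n",
        "// gremlin.groovy",
        "",
        "def warm_cache() {",
        "  for (vertex in g.getVertices()) {",
        "    vertex.getOutEdges()",
        "  }",
        "}",
        "",
        "def clear() {",
        "  g.clear()",
        "}",
        "");

    // Collects every match of regex in input, similar in spirit to Clojure's re-seq.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Pattern from the question: .* stops at the line break.
        System.out.println(findAll("(?m)^def.*[^}]", SOURCE));
        // Pattern from this answer: s (DOTALL) lets . cross line breaks,
        // and the lazy .*? stops at the first } at the start of a line.
        System.out.println(findAll("(?sm)def.*?^}", SOURCE));
    }
}
```

On this sample the second pattern should return each function as one match. It relies on every function's closing brace sitting alone at the start of a line, so it is a heuristic for consistently formatted files rather than a parser.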

answered 2012-05-09T10:33:40.383

Short answer

(re-seq (Pattern/compile "(?m)^def.*[^}]" Pattern/MULTILINE) source)

From http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html

By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input. When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence.

You need to be able to pass in

Pattern.MULTILINE

when the pattern is compiled. But there is no option for this on re-seq, so you'll probably need to drop down into Java interop to get this to work properly? Ideally, you really should be able to specify this in Clojure land... :(

UPDATE: Actually, it's not all that bad. Instead of using the literal expression for a regex, just use Java interop for your pattern. Use (re-seq (Pattern/compile "(?m)^def.*[^}]" Pattern/MULTILINE) source) instead (assuming that you've imported java.util.regex.Pattern). I haven't tested this, but I think that will do the trick for you.
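Since this suggestion is untested, here is one way it could be checked, sketched in plain Java against the same java.util.regex.Pattern class that Clojure uses. The class name MultilineFlagCheck, its helper methods, and the inlined sample source are assumptions made for illustration. The embedded (?m) flag and the Pattern.MULTILINE argument turn on the same mode, so the first two variants are expected to behave identically. The third variant adds Pattern.DOTALL and a lazy match up to a } at the start of a line, which is a variation not taken from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class MultilineFlagCheck {
    // The Groovy source from the question, inlined so the sketch is self-contained.
    static final String SOURCE = String.join("\n",
        "// gremlin.groovy",
        "",
        "def warm_cache() {",
        "  for (vertex in g.getVertices()) {",
        "    vertex.getOutEdges()",
        "  }",
        "}",
        "",
        "def clear() {",
        "  g.clear()",
        "}",
        "");

    // Collects every match of the compiled pattern in input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // MULTILINE requested through the embedded (?m) flag, as in the question.
    static List<String> withInlineFlag() {
        return findAll(Pattern.compile("(?m)^def.*[^}]"), SOURCE);
    }

    // MULTILINE requested through the compile-time flag, as in this answer.
    static List<String> withCompileFlag() {
        return findAll(Pattern.compile("^def.*[^}]", Pattern.MULTILINE), SOURCE);
    }

    // Assumed variation: DOTALL lets . cross line breaks; .*? stops at a line-initial }.
    static List<String> withDotall() {
        return findAll(
            Pattern.compile("^def.*?^}", Pattern.MULTILINE | Pattern.DOTALL), SOURCE);
    }

    public static void main(String[] args) {
        System.out.println(withInlineFlag().equals(withCompileFlag()));
        System.out.println(withCompileFlag());
        System.out.println(withDotall());
    }
}
```

If the first two variants agree and still stop at the first line of each function, the missing piece is DOTALL rather than MULTILINE. In Clojure the same flags can be written inline as (?sm) in a regex literal, so Java interop would not be strictly required.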

answered 2012-05-09T11:06:59.333