java - How can I identify references to Java classes using Perl?

Question

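I'm writing a Perl script and have reached the point where I need to parse a Java source file line by line, checking for references to a fully qualified Java class name. I know the class I'm looking for up front, as well as the fully qualified name of the source file being searched (based on its path).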

For example, find all valid references to foo.bar.Baz inside the com/bob/is/YourUncle.java file.

The cases I can think of so far that need to be accounted for are:

The file being parsed is in the same package as the class being searched for.

Find foo.bar.Baz references in foo/bar/Boing.java

It should ignore comments.

// this is a comment saying this method returns a foo.bar.Baz or Baz instance 
// it shouldn't count

/*   a multiline comment as well
 this shouldn't count
 if I put foo.bar.Baz or Baz in here either */

Inline fully qualified references.
```
foo.bar.Baz fb = new foo.bar.Baz();
```

References based on an import statement.

import foo.bar.Baz;
...
Baz b = new Baz();

What would be the most efficient way to do this in Perl 5.8? Some fancy regex, perhaps?

open F, $File::Find::name or die;
# these three things are already known
# $classToFind    looking for references of this class
# $pkgToFind      the package of the class you're finding references of
# $currentPkg     package name of the file being parsed
while(<F>){
  # ... do work here   
}
close F;
# the results are available here in some form

score 5 · Accepted Answer

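A regex is probably the best solution here, although I did find the following module on CPAN that you might be able to use:

Java::JVM::Classfile - parses compiled class files and returns information about them. You would have to compile the files before you could use it.

Also, keep in mind that catching every possible variant of a multi-line comment with a regex can be tricky.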

score 5 · Accepted Answer

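You also need to skip quoted strings (you can't even skip comments correctly if you don't deal with quoted strings).

I'd probably write a fairly simple, efficient, and incomplete tokenizer, very similar to the one I wrote in node 566467.

Based on that code, I'd probably just dig through the non-comment/non-string chunks looking for \bimport\b and \b\Q$toFind\E\b matches. Perhaps something like: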

if( m[
        \G
        (?:
            [^'"/]+
          | /(?![/*])
        )+
    ]xgc
) {
    my $code = substr( $_, $-[0], $+[0] - $-[0] );
    my $imported = 0;
    while( $code =~ /\b(import\s+)?\Q$package\E\b/g ) {
        if( $1 ) {
            ... # Found importing of package
            while( $code =~ /\b\Q$class\E\b/g ) {
                ... # Found mention of imported class
            }
            last;
        }
        ... # Found a package reference
    }
} elsif( m[ \G ' (?: [^'\\]+ | \\. )* ' ]xgc
    ||   m[ \G " (?: [^"\\]+ | \\. )* " ]xgc
) {
    # skip quoted strings
} elsif(  m[\G//.*]gc  ) {
    # skip C++ comments
}
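
For readers who want to try the tokenizer idea end to end, here is a minimal sketch that works on a whole file held in one string. The helper name `code_chunks`, the block-comment branch, and the fallback that steps over a stray character are assumptions added for illustration; they are not taken from node 566467.

```perl
use strict;
use warnings;

# Hypothetical helper: return the parts of a Java source string that are
# neither comments nor quoted literals, in order of appearance.
sub code_chunks {
    my ($src) = @_;
    my @chunks;
    pos($src) = 0;
    while ( pos($src) < length($src) ) {
        if ( $src =~ m{ \G ( (?: [^'"/]+ | /(?![/*]) )+ ) }xgc ) {
            push @chunks, $1;    # plain code: keep it
        }
        elsif ( $src =~ m{ \G ' (?: [^'\\]+ | \\. )* ' }xgcs
             || $src =~ m{ \G " (?: [^"\\]+ | \\. )* " }xgcs ) {
            # quoted character or string literal: skip it
        }
        elsif ( $src =~ m{ \G // [^\n]* }xgc ) {
            # line comment: skip to the end of the line
        }
        elsif ( $src =~ m{ \G /\* .*? \*/ }xgcs ) {
            # block comment (assumed branch): skip it, even across lines
        }
        else {
            # unterminated literal or comment: step over one character
            pos($src) = pos($src) + 1;
        }
    }
    return @chunks;
}

# Usage: count mentions of the class name in real code only.
my $java = join "\n",
    'import foo.bar.Baz; // Baz in a comment',
    '/* foo.bar.Baz in a block comment */',
    'String s = "Baz // not a comment";',
    'Baz b = new Baz();',
    '';
my $code = join ' ', code_chunks($java);
my $hits = () = $code =~ /\bBaz\b/g;
print "mentions in code: $hits\n";
```

Joining the chunks with a space keeps the tokens on either side of a removed comment from running together. Because the `/gc` flags leave `pos` untouched when a match fails, each branch gets a chance at the same position.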

score 2 · Accepted Answer

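Isn't this really just a straight grep for Baz (or for /(foo.bar.| )Baz/ if you're worried about false positives from some.other.Baz), but ignoring comments?

If so, I'd knock together a state engine to track whether you're inside a multi-line comment. The regexes needed are nothing special. Something along the lines of (untested code):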

my $in_comment;
my %matches;
my $line_num = 0;
my $full_target = 'foo.bar.Baz';
my $short_target = (split /\./, $full_target)[-1];  # segment after last . (Baz)

while (my $line = <F>) {
    $line_num++;
    if ($in_comment) {
        next unless $line =~ m|\*/|;  # ignore line unless it ends the comment
        $line =~ s|.*?\*/||;          # delete everything up to the end of the comment
        $in_comment = 0;              # the comment is closed, so resume normal scanning
    } elsif ($line =~ m|/\*|) {
        if ($line =~ m|\*/|) {        # catch /* and */ on same line
            $line =~ s|/\*.*\*/||;
        } else {
            $in_comment = 1;
            $line =~ s|/\*.*||;       # clear from start of comment to end of line
        }
    }

    $line =~ s|//.*||;     # remove single-line comments
    $matches{$line_num} = $line if $line =~ /\Q$full_target\E| \Q$short_target\E/;
}

for my $key (sort { $a <=> $b } keys %matches) {   # sort line numbers numerically
    print $key, ': ', $matches{$key}, "\n";
}

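It isn't perfect: nested multi-line comments, or several multi-line comments on the same line, can throw off the in/out-of-comment state, but it's probably good enough for most real-world cases.

To do this without a state engine, you'd need to slurp the file into a single string, strip out the /* ... */ comments, split it back into separate lines, and then grep those for hits that aren't inside // comments. But you wouldn't be able to include line numbers in the output that way.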
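
The slurp-and-strip variant described above could be sketched roughly as follows. The helper name `grep_without_comments` and the lookbehind used to reject some.other.Baz are assumptions for illustration, and string literals are deliberately not handled, so a `/*` inside a quoted string would confuse it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $src that mention the class
# once block comments and // comments have been removed.
# Assumption: no comment markers appear inside string literals.
sub grep_without_comments {
    my ($src, $full_target) = @_;
    my $short_target = (split /\./, $full_target)[-1];   # e.g. Baz

    $src =~ s{/\*.*?\*/}{}gs;          # drop /* ... */ comments, even multi-line ones
    my @lines = split /\n/, $src;      # back to individual lines
    s{//.*}{} for @lines;              # drop // comments from each line

    # Match the full name or the bare class name, but not when it is the
    # tail of a longer dotted name such as some.other.Baz.
    return grep { /(?<![\w.])(?:\Q$full_target\E|\Q$short_target\E)\b/ } @lines;
}

# Usage with an in-memory sample; a real script would slurp the file instead.
my $sample = <<'JAVA';
package com.bob.is;
/* foo.bar.Baz mentioned in a
   multi-line comment */
import foo.bar.Baz;      // Baz again, in a line comment
some.other.Baz x;
Baz b = new Baz();
JAVA

print "$_\n" for grep_without_comments($sample, 'foo.bar.Baz');
```

Since the block comments are removed before the split, the surviving lines no longer line up with the original file, which is the line-number limitation mentioned above.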

score 2 · Accepted Answer

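This is what I came up with; it works for all of the different cases I've thrown at it. I'm still a Perl noob and it's probably not the fastest thing in the world, but it should do what I need. Thanks for all the answers, which helped me look at the problem in different ways.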

  my $className = 'Baz';
  my $searchPkg = 'foo.bar';
  my (@potentialRefs, @confirmedRefs);
  my $samePkg = 0;
  my $imported = 0;
  my $currentPkg = 'com.bob';
  $currentPkg =~ s/\//\./g;
  if($currentPkg eq $searchPkg){
    $samePkg = 1;  
  }
  my $inMultiLineComment = 0;
  open F, $_ or die;
  my $lineNum = 0;
  while(<F>){
    $lineNum++;
    if($inMultiLineComment){
      if(m|^.*?\*/|){
        s|^.*?\*/||; #get rid of the closing part of the multiline comment we're in
        $inMultiLineComment = 0;
      }else{
        next;
      }
    }
    if(length($_) > 0){
      s|"([^"\\]*(\\.[^"\\]*)*)"||g; #remove strings first since java cannot have multiline string literals
      s|/\*.*?\*/||g;  #remove any multiline comments that start and end on the same line
      s|//.*$||;  #remove the // comments from what's left
      if (m|/\*.*$|){
        $inMultiLineComment = 1; # any /* still left means at least part of the next line is inside the multiline comment
        s|/\*.*$||g;
      }
    }else{
      next; #no sense continuing to process a blank string
    }

    if (/^\s*(import )?(\Q$searchPkg\E)?(.*)?\b\Q$className\E\b/){
      if($imported || $samePkg){
        push(@confirmedRefs, $lineNum);
      }else {
        push(@potentialRefs, $lineNum);
      }
      if($1){
        $imported = 1;
      } elsif($2){
        push(@confirmedRefs, $lineNum);
      }
    }
  }
  close F;      
  if($imported){
    push(@confirmedRefs,@potentialRefs);
  }

  for (@confirmedRefs){
    print "$_\n";
  }

score 1 · Accepted Answer

If you're feeling adventurous enough, you could take a look at Parse::RecDescent.

answered 2008-09-29T17:31:45.607
