scala - Extract source comments from a Scala source file

Question

I would like to programmatically extract the code comments from a Scala source file.

I have access to both the source file and objects of the classes whose comments I am interested in. I am also open to writing the comments in the Scala source file in a specific form to facilitate extraction (though still following Scaladoc conventions).

Specifically, I am not looking for HTML or similar output.

However, a JSON object I can then traverse to get the comments for each field would be perfectly fine (though JSON is not a requirement). Ideally, I would like to get a class or class member comment given its "fully qualified" name or an object of the class.

How do I best do this? I am hoping for a solution that is maintainable (without too much effort) from Scala 2.11 to Scala 3.

Appreciate all help!

score 1 · Accepted Answer

> I have access to both the source file

From this I assume you have the path to the file, which I will represent in my code as:

val pathToFile: String = ???

TL;DR

import scala.io.Source

def comments(pathToFile: String): List[String] = {
  // Read the file once and close it; a List can be traversed as often as needed
  val source = Source.fromFile(pathToFile)
  val lines: List[(String, Int)] =
    try source.getLines().toList.zipWithIndex
    finally source.close()

  // Block comments that open and close on the same line, e.g. /* like this */
  val singleLineJavaDocStartAndEnds = lines.collect {
    case (line, _) if line.contains("/*") && line.contains("*/") => line
  }

  // Lines with exactly one of /* or */ mark the start or end of a multi-line comment
  val javaDocComments = lines.filter {
    case (line, _) => line.contains("/*") != line.contains("*/")
  }
  .grouped(2).collect {
    // `collect` skips an unmatched trailing marker instead of throwing a MatchError
    case Seq((_, firstLineNumber), (_, secondLineNumber)) =>
      lines
        .map { case (line, _) => line }
        .slice(firstLineNumber, secondLineNumber + 1)
        .mkString("\n")
  }

  // Any line containing // (this also matches "//" inside string literals such as URLs)
  val slashSlashComments = lines.collect {
    case (line, _) if line.contains("//") => line
  }

  (singleLineJavaDocStartAndEnds ++ javaDocComments ++ slashSlashComments).toList
}

Full explanation

The first thing to do is read the contents of the file:

import scala.io.Source

def lines: Iterator[String] = Source.fromFile(pathToFile).getLines()

// getLines() strips the line terminators, so we re-join with "\n" (use "\r\n" if you need Windows line endings)
val content: String = lines.mkString("\n")
// where `content` is the whole file as a `String`

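I have made `lines` a `def` to prevent unexpected results if it is called multiple times. This is due to the return type of `Source.fromFile(...).getLines()`: it is an `Iterator`, which is consumed as it is traversed, so a `val` would be empty the second time it is used. A comment on this answer explains this in more detail. Since you are reading source code files, I think re-reading the file is a safe operation that will not cause memory or performance issues.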
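A small sketch of the difference, assuming in-memory text read through `Source.fromString` so it runs without a file; the object and method names are hypothetical. It also shows an alternative that reads the file once into a `List` and closes the `Source` with `try`/`finally`, which only uses APIs available from Scala 2.11 through Scala 3.

```scala
import scala.io.Source

// Hypothetical demo object: why `lines` is a `def` and not a `val`
object IteratorOnce {
  val text = "val a = 1 // one\nval b = 2\n/* two */"

  // A `val` holds a single Iterator; the first traversal consumes it
  def traverseValTwice(): (Int, Int) = {
    val lines: Iterator[String] = Source.fromString(text).getLines()
    val first = lines.size  // reads every line
    val second = lines.size // nothing left to read
    (first, second)
  }

  // A `def` builds a fresh Iterator on every call
  def linesDef: Iterator[String] = Source.fromString(text).getLines()
  def traverseDefTwice(): (Int, Int) = (linesDef.size, linesDef.size)

  // Alternative: read the file once into a List and always close the Source
  def readAll(pathToFile: String): List[String] = {
    val source = Source.fromFile(pathToFile)
    try source.getLines().toList
    finally source.close()
  }
}
```

With `readAll`, the resulting `List` can be filtered as many times as needed without reopening the file.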

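Now that we have the `content` of the file, we can start filtering out the lines we don't care about. Another way to look at the problem is that we only want to keep (filter for) the lines that are comments.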

Edit:

As @jwvh correctly pointed out, my use of `.trim.startsWith` ignored comments such as:

val x = 1 //mid-code-comments

/*fullLineComment*/

To fix this, I replaced `.trim.startsWith` with `.contains`.

For single-line comments, this is simple:

val slashComments: Iterator[String] = lines.filter(line => line.contains("//"))

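~~Note that the `.trim` call above is important, because developers often indent comments to match the surrounding code. `trim` removes any leading (and trailing) whitespace characters from the string.~~ The code now uses `.contains`, which catches any line with a comment starting anywhere on it.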
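To see the difference between the two filters, here is a sketch on a hard-coded list of lines (the sample lines and object name are made up for illustration). Note that `.contains("//")` also picks up a URL inside a string literal, which is a false positive this line-based approach cannot rule out.

```scala
// Hypothetical demo comparing the original and edited single-line filters
object SlashComments {
  val sample: List[String] = List(
    "// full-line comment",
    "    // indented comment",
    "val x = 1 //mid-code-comment",
    "val url = \"https://example.com\"", // not a comment, but contains //
    "val y = 2"
  )

  // Original approach: only lines that start with // once whitespace is trimmed
  def viaStartsWith(lines: List[String]): List[String] =
    lines.filter(_.trim.startsWith("//"))

  // Edited approach: any line that contains // anywhere
  def viaContains(lines: List[String]): List[String] =
    lines.filter(_.contains("//"))
}
```

`viaStartsWith` keeps only the first two sample lines, while `viaContains` also keeps the mid-code comment and the URL line.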

Now we will filter multi-line comments, or JavaDoc; for example (the content is not important):

/**
 * Class String is special cased within the Serialization Stream Protocol.
 *
 * A String instance is written into an ObjectOutputStream according to
 * .....
 * .....
 */

The safest approach is to pinpoint the lines where `/*` and `*/` appear, and include all of the lines in between:

def lines: Iterator[(String, Int)] = Source.fromFile(pathToFile).getLines().zipWithIndex

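val javaDocStartAndEnds: Iterator[(String, Int)] = lines.filter {
  // keep lines with exactly one marker; a one-line /* ... */ has both and would break the pairing below
  case (line, lineNumber) => line.contains("/*") != line.contains("*/")
}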

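`.zipWithIndex` gives us an incrementing number (starting at 0) next to each line. We can use these to represent the line numbers of the source file. At this point, this gives us a list of the lines containing `/*` and `*/`. We need to `group` them into groups of 2, because every comment of this type has a matching pair of `/*` and `*/`. Once we have these groups, we can use `slice` to select all of the `lines` from the first index to the last index. We want to include the last line, so we add `+1` to it.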

val javaDocComments = javaDocStartAndEnds.grouped(2).collect {
  // `collect` skips a trailing unmatched marker instead of throwing a MatchError
  case Seq((_, firstLineNumber), (_, secondLineNumber)) =>
    lines // re-calling `def lines: Iterator[(String, Int)]`
      .map { case (line, _) => line } // here we only care about the `line`, not the `lineNumber`
      .slice(firstLineNumber, secondLineNumber + 1)
      .mkString("\n")
}
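The pairing and slicing can be tried without a file. The sketch below assumes the source is already a `List[String]`; `BlockComments` and `blockComments` are hypothetical names. Like the code above, it assumes comment markers never appear inside string literals and that block comments are not nested (Scala does allow nested `/* */` comments, which would break the pairing).

```scala
// Hypothetical helper: pair up opening/closing marker lines, then slice between them
object BlockComments {
  def blockComments(lines: List[String]): List[String] = {
    val indexed = lines.zipWithIndex
    // Exactly one of the markers: the first or last line of a multi-line comment
    val markers = indexed.filter { case (line, _) =>
      line.contains("/*") != line.contains("*/")
    }
    markers.grouped(2).collect {
      // Include the closing line, hence end + 1
      case List((_, start), (_, end)) => lines.slice(start, end + 1).mkString("\n")
    }.toList
  }
}
```

For the lines `/**`, ` * Doc for Foo.`, ` */`, `class Foo {`, `  /* one-liner */`, ... the one-liner is skipped here (it contains both markers), and each multi-line comment comes back as a single string.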

Finally, we can combine `slashComments` and `javaDocComments`:

val comments: List[String] = (slashComments ++ javaDocComments).toList

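Whatever order we join them in, the comments will not come out in the order they appear in the file. An improvement that could be made here is to keep the `lineNumber` and sort by it at the end.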
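That improvement might look like the sketch below, which keeps each comment paired with the 0-based index of the line it starts on and sorts by that index. It works on an in-memory `List[String]` (for example from reading the file once), and `OrderedComments` is a hypothetical name. A line holding both a `//` and a one-line `/* */` comment would appear twice.

```scala
// Hypothetical sketch: keep line numbers so the result follows file order
object OrderedComments {
  def comments(lines: List[String]): List[(Int, String)] = {
    val indexed = lines.zipWithIndex
    val slash = indexed.collect { case (line, i) if line.contains("//") => (i, line) }
    val oneLineBlocks = indexed.collect {
      case (line, i) if line.contains("/*") && line.contains("*/") => (i, line)
    }
    val multiLineBlocks = indexed
      .filter { case (line, _) => line.contains("/*") != line.contains("*/") }
      .grouped(2)
      .collect { case List((_, start), (_, end)) =>
        (start, lines.slice(start, end + 1).mkString("\n"))
      }
      .toList
    // Sort by the starting line number
    (slash ++ oneLineBlocks ++ multiLineBlocks).sortBy(_._1)
  }
}
```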

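I have included a "too long; didn't read" (TL;DR) version at the top, so anyone can copy the complete code without going through the step-by-step explanation.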

> How do I best do this? I am hoping for a solution that is maintainable (without too much effort) from Scala 2.11 to Scala 3.

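I hope I have answered your question and provided a useful solution. You mentioned JSON as a possible output. What I have provided is an in-memory `List[String]` that you can process. If you need JSON output, I can update my answer with that.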
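If JSON is needed, one option without extra dependencies is a small hand-rolled encoder. The sketch below assumes comments are available as `(lineNumber, text)` pairs; `CommentsJson`, `escape`, and `toJson` are hypothetical names, and the field names `"line"` and `"comment"` are arbitrary choices. A JSON library would be the more robust choice for anything beyond this.

```scala
// Hypothetical minimal JSON encoder for (lineNumber, commentText) pairs
object CommentsJson {
  // Escape a string for use inside a JSON string literal
  def escape(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      // Other control characters become \u00XX escapes
      case c if c < ' ' => sb.append("\\").append("u%04x".format(c.toInt))
      case c => sb.append(c)
    }
    sb.toString
  }

  // Render the pairs as a JSON array of objects
  def toJson(comments: List[(Int, String)]): String =
    comments.map { case (line, text) =>
      s"""{"line": $line, "comment": "${escape(text)}"}"""
    }.mkString("[", ", ", "]")
}
```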
