I am experimenting with parser combinators and often run into what looks like infinite recursion. Here is the first case I hit:

import util.parsing.combinator.Parsers
import util.parsing.input.CharSequenceReader

class CombinatorParserTest extends Parsers {

  type Elem = Char

  def notComma = elem("not comma", _ != ',')

  def notEndLine = elem("not end line", x => x != '\r' && x != '\n')

  def text = rep(notComma | notEndLine)

}

object CombinatorParserTest {

  def main(args:Array[String]): Unit = {
    val p = new CombinatorParserTest()
    val r = p.text(new CharSequenceReader(","))
    // does not get here
    println(r)
  }

}

How can I print out what is happening? And why does this never finish?

3 Answers

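Logging the attempts to parse notComma and notEndLine shows that it is the end-of-file marker (displayed as CTRL-Z in the log(...)("mesg") output) that is being parsed over and over. Here is how I modified your parser for this purpose: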

def text = rep(log(notComma)("notComma") | log(notEndLine)("notEndLine"))

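I'm not entirely sure what is going on (I tried many variations on your grammar), but I think it is something like this: the EOF is not really a character artificially inserted into the input stream, but rather a perpetual condition at the end of the input. So this never-consumed EOF pseudo-character is repeatedly parsed as "either not a comma or not an end of line".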
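The perpetual end-of-input condition can be observed directly on the reader, without any parser involved. This is a minimal sketch, assuming the scala-parser-combinators module is on the classpath; the object name EofDemo is made up for illustration.

```scala
import scala.util.parsing.input.CharSequenceReader

// EofDemo is a hypothetical name used only for this illustration.
object EofDemo {
  val start = new CharSequenceReader(",")
  val end = start.rest // positioned just past the single ','

  def main(args: Array[String]): Unit = {
    println(end.atEnd)                             // the reader knows it is at the end
    println(end.first == CharSequenceReader.EofCh) // yet first still yields a character
    println(CharSequenceReader.EofCh.toInt)        // 26, i.e. '\032' in octal (CTRL-Z)
    println(end.rest.atEnd)                        // advancing again stays at the end
  }
}
```

A predicate that only inspects first, such as _ != ',', therefore keeps succeeding on the EOF marker unless something also checks atEnd or compares against EofCh.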

answered 2010-03-05T16:53:30.617

OK, I think I have figured it out. CharSequenceReader returns '\032' as the marker for the end of input. So if I modify my parser like this, it works:

import util.parsing.combinator.Parsers
import util.parsing.input.CharSequenceReader

class CombinatorParserTest extends Parsers {

  type Elem = Char

  import CharSequenceReader.EofCh

  def notComma = elem("not comma", x => x != ',' && x!=EofCh)

  def notEndLine = elem("not end line", x => x != '\r' && x != '\n' && x!=EofCh)

  //def text = rep(notComma | notEndLine)
  def text = rep(log(notComma)("notComma") | log(notEndLine)("notEndLine"))

}

object CombinatorParserTest {

  def main(args:Array[String]): Unit = {
    val p = new CombinatorParserTest()
    val r = p.text(new CharSequenceReader(","))
    println(r)
  }

}

See the source code of CharSequenceReader. If the scaladoc had mentioned this, it would have saved me a lot of time.
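To avoid repeating the EofCh comparison in every predicate, the check can be factored into one place. The following is only a sketch under the same assumptions as the code above: elemNotEof and EofSafeParser are hypothetical names, not part of the library.

```scala
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader
import scala.util.parsing.input.CharSequenceReader.EofCh

// EofSafeParser and elemNotEof are hypothetical names for this sketch.
object EofSafeParser extends Parsers {

  type Elem = Char

  // Like elem(kind, p), but never accepts the end-of-input marker.
  def elemNotEof(kind: String)(p: Char => Boolean): Parser[Char] =
    elem(kind, c => c != EofCh && p(c))

  def notComma: Parser[Char] = elemNotEof("not comma")(_ != ',')

  def notEndLine: Parser[Char] =
    elemNotEof("not end line")(c => c != '\r' && c != '\n')

  // Stops at the end of input because neither alternative accepts EofCh.
  def text: Parser[List[Char]] = rep(notComma | notEndLine)

  def main(args: Array[String]): Unit =
    println(text(new CharSequenceReader("a,b")))
}
```

Note that the grammar itself is unchanged: every real character is either not a comma or not a line ending, so text still consumes the whole input and only the EOF marker stops it.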

answered 2010-03-06T02:10:13.517

I find the logging function very awkward to type. Why do I have to write log(parser)("string")? Why not something as simple as parser.log("string")? Anyway, to get around that, I wrote this:

trait Logging { self: Parsers =>

    // Used to turn logging on or off
    val debug: Boolean

    // Much easier than having to wrap a parser with a log function and type a message
    // i.e. log(someParser)("Message") vs someParser.log("Message")
    implicit class Logged[+A](parser: Parser[A]) {
        def log(msg: String): Parser[A] =
            if (debug) self.log(parser)(msg) else parser
    }
}

Now you can mix this trait into your parser like so:

import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader


object CombinatorParserTest extends App with Parsers with Logging {

    type Elem = Char

    override val debug: Boolean = true

    def notComma: Parser[Char] = elem("not comma", _ != ',')
    def notEndLine: Parser[Char] = elem("not end line", x => x != '\r' && x != '\n')
    def text: Parser[List[Char]] = rep(notComma.log("notComma") | notEndLine.log("notEndLine"))

    val r = text(new CharSequenceReader(","))

    println(r)
}

If needed, you can also override the debug field to turn logging off.

Running it also shows that the second parser correctly parses the comma:

trying notComma at scala.util.parsing.input.CharSequenceReader@506e6d5e
notComma --> [1.1] failure: not comma expected

,
^
trying notEndLine at scala.util.parsing.input.CharSequenceReader@506e6d5e
notEndLine --> [1.2] parsed: ,
trying notComma at scala.util.parsing.input.CharSequenceReader@15975490
notComma --> [1.2] failure: end of input

,
 ^
trying notEndLine at scala.util.parsing.input.CharSequenceReader@15975490
notEndLine --> [1.2] failure: end of input

,
 ^
The result is List(,)

Process finished with exit code 0
answered 2017-04-28T17:31:03.357