基本上,它很慢并且消耗太多内存,因为您的语法效率非常低。
让我们考虑第二行:B = A:(1+2)
. 它将尝试像这样解析这一行:
- term4
OR
term4 然后是 term4。
- term3
AND
term3 然后是 term3。
- 术语 2
<>
术语 2,然后=
,然后NE
,EQ
然后是术语 2。
- term1 8 个不同的运算符 term1,然后是 term1。
- 术语
+
术语,术语-
术语,术语:
术语,然后术语。
- 因子
*
因子,因子/
因子,因子MOD
因子,再因子。
- 括号表达式、一元因子、函数、数字文字、字符串文字、标识。
第一次尝试是这样的:
ident * factor + term < term1 <> term2 AND term3 OR term4
我跳过括号、一元、函数、数字和字符串文字,因为它们不匹配A
——尽管function
可能匹配,但它的定义不可用。现在,由于:
不匹配*
,它将尝试下一个:
ident / factor + term < term1 <> term2 AND term3 OR term4
ident MOD factor + term < term1 <> term2 AND term3 OR term4
ident + term < term1 <> term2 AND term3 OR term4
现在它进入下一个term1
:
ident * factor - term < term1 <> term2 AND term3 OR term4
ident / factor - term < term1 <> term2 AND term3 OR term4
ident MOD factor - term < term1 <> term2 AND term3 OR term4
ident - term < term1 <> term2 AND term3 OR term4
接下来:
ident * factor : term < term1 <> term2 AND term3 OR term4
ident / factor : term < term1 <> term2 AND term3 OR term4
ident MOD factor : term < term1 <> term2 AND term3 OR term4
ident : term < term1 <> term2 AND term3 OR term4
啊哈!我们终于在第 1 学期得到了一场比赛!但是(
不匹配<
,所以它必须尝试下一个term2:
ident * factor + term > term1 <> term2 AND term3 OR term4
ETC...
这一切都是因为每个术语的每行中的第一个术语将始终匹配!要匹配一个简单的数字,它必须解析factor
2 * 2 * 5 * 9 * 4 * 4 = 2880 次!
但这还不是故事的一半!你看,因为 termX 重复了两次,它会在两边重复所有这些东西。例如,第一个匹配A:(1+2)
是这样的:
ident : term < term1 <> term2 AND term3 OR term4
where ident = A
and term = (1+2)
这是不正确的,所以它会尝试>
代替<
, 然后<=
, 等等等等。
我在下面放了这个解析器的日志版本。尝试运行它并查看它试图解析的所有内容。
同时,有很好的例子说明如何编写这些可用的解析器。使用sbaz
,尝试:
sbaz install scala-devel-docs
然后查看doc/scala-devel-docs/examples/parsing
Scala 发行版的目录,您会发现几个示例。
这是您的解析器的一个版本(没有function
),它记录了它尝试的所有内容:
sealed trait Expression
case class Variable(id: String) extends Expression
case class Literal(s: String) extends Expression
case class Number(x: String) extends Expression
case class UnaryOp(op: String, rhs: Expression) extends Expression
case class BinaryOp(op: String, lhs: Expression, rhs: Expression) extends Expression
object TestParser extends scala.util.parsing.combinator.syntactical.StdTokenParsers {
import scala.util.parsing.combinator.lexical.StdLexical
type Tokens = StdLexical
val lexical = new StdLexical
lexical.delimiters ++= List("(", ")", "+", "-", "*", "/", "=", "OR", "AND", "NE", "EQ", "LT", "GT", "LE", "GE", ":", "MOD")
def stmts: Parser[Any] = log(expr.*)("stmts")
def stmt: Parser[Expression] = log(expr <~ "\n")("stmt")
def expr: Parser[Expression] = log(term5)("expr")
def term5: Parser[Expression] = (
log((term4 ~ "OR" ~ term4) ^^ { case lhs~o~rhs => BinaryOp("OR", lhs, rhs) })("term5 OR")
| log(term4)("term5 term4")
)
def term4: Parser[Expression] = (
log((term3 ~ "AND" ~ term3) ^^ { case lhs~a~rhs => BinaryOp("AND", lhs, rhs) })("term4 AND")
| log(term3)("term4 term3")
)
def term3: Parser[Expression] = (
log((term2 ~ "<>" ~ term2) ^^ { case lhs~ne~rhs => BinaryOp("NE", lhs, rhs) })("term3 <>")
| log((term2 ~ "=" ~ term2) ^^ { case lhs~eq~rhs => BinaryOp("EQ", lhs, rhs) })("term3 =")
| log((term2 ~ "NE" ~ term2) ^^ { case lhs~ne~rhs => BinaryOp("NE", lhs, rhs) })("term3 NE")
| log((term2 ~ "EQ" ~ term2) ^^ { case lhs~eq~rhs => BinaryOp("EQ", lhs, rhs) })("term3 EQ")
| log(term2)("term3 term2")
)
def term2: Parser[Expression] = (
log((term1 ~ "<" ~ term1) ^^ { case lhs~lt~rhs => BinaryOp("LT", lhs, rhs) })("term2 <")
| log((term1 ~ ">" ~ term1) ^^ { case lhs~gt~rhs => BinaryOp("GT", lhs, rhs) })("term2 >")
| log((term1 ~ "<=" ~ term1) ^^ { case lhs~le~rhs => BinaryOp("LE", lhs, rhs) })("term2 <=")
| log((term1 ~ ">=" ~ term1) ^^ { case lhs~ge~rhs => BinaryOp("GE", lhs, rhs) })("term2 >=")
| log((term1 ~ "LT" ~ term1) ^^ { case lhs~lt~rhs => BinaryOp("LT", lhs, rhs) })("term2 LT")
| log((term1 ~ "GT" ~ term1) ^^ { case lhs~gt~rhs => BinaryOp("GT", lhs, rhs) })("term2 GT")
| log((term1 ~ "LE" ~ term1) ^^ { case lhs~le~rhs => BinaryOp("LE", lhs, rhs) })("term2 LE")
| log((term1 ~ "GE" ~ term1) ^^ { case lhs~ge~rhs => BinaryOp("GE", lhs, rhs) })("term2 GE")
| log(term1)("term2 term1")
)
def term1: Parser[Expression] = (
log((term ~ "+" ~ term) ^^ { case lhs~plus~rhs => BinaryOp("+", lhs, rhs) })("term1 +")
| log((term ~ "-" ~ term) ^^ { case lhs~minus~rhs => BinaryOp("-", lhs, rhs) })("term1 -")
| log((term ~ ":" ~ term) ^^ { case lhs~concat~rhs => BinaryOp(":", lhs, rhs) })("term1 :")
| log(term)("term1 term")
)
def term: Parser[Expression] = (
log((factor ~ "*" ~ factor) ^^ { case lhs~times~rhs => BinaryOp("*", lhs, rhs) })("term *")
| log((factor ~ "/" ~ factor) ^^ { case lhs~div~rhs => BinaryOp("/", lhs, rhs) })("term /")
| log((factor ~ "MOD" ~ factor) ^^ { case lhs~div~rhs => BinaryOp("MOD", lhs, rhs) })("term MOD")
| log(factor)("term factor")
)
def factor: Parser[Expression] = (
log("(" ~> expr <~ ")")("factor (expr)")
| log(("+" | "-") ~ factor ^^ { case op~rhs => UnaryOp(op, rhs) })("factor +-")
//| function |
| log(numericLit ^^ { case x => Number(x/*.toFloat*/) })("factor numericLit")
| log(stringLit ^^ { case s => Literal(s) })("factor stringLit")
| log(ident ^^ { case id => Variable(id) })("factor ident")
)
def parse(s: String) = stmts(new lexical.Scanner(s))
}