regex - Ignoring an arbitrary prefix in parser combinators

Question

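Having grown tired of regular expressions, I have been trying Scala's parser combinator library as a more intuitive replacement. However, I run into a problem when I want to search a string for a pattern and ignore whatever comes before it. For example, to check whether a string contains the word "octopus", I can do something like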

val r = "octopus".r
r.findFirstIn("www.octopus.com")

which correctly gives Some(octopus).

However, using parser combinators

import scala.util.parsing.combinator._
object OctopusParser extends RegexParsers {

  def any = regex(".".r)*
  def str = any ~> "octopus" <~ any

  def parse(s: String) = parseAll(str, s) 
}

OctopusParser.parse("www.octopus.com")

I get a failure:

scala> OctopusParser.parse("www.octopus.com")
res0: OctopusParser.ParseResult[String] = 
[1.16] failure: `octopus' expected but end of source found

www.octopus.com

Is there a good way to do this? From playing around, it seems that any is swallowing too much of the input.

score 3 · Accepted Answer

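The problem is that your "any" parser is greedy: it matches the whole line, so nothing is left for the rest of "str" to parse.

You might want to try something like the following: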
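To make the greediness visible, here is a minimal sketch that runs the same "any" parser on its own and counts how many characters it consumes. The names GreedyDemo and consumed are hypothetical and only serve this illustration; rep(p) is used as the prefix form of the postfix `*` from the question.

```scala
import scala.util.parsing.combinator._

// Hypothetical demo object, not part of the original question or answer.
object GreedyDemo extends RegexParsers {
  // Same parser as `any` in the question: one character, repeated as often as possible.
  def any: Parser[List[String]] = rep(regex(".".r))

  // Number of single-character matches `any` collects from s, or -1 if it does not succeed.
  def consumed(s: String): Int = parse(any, s) match {
    case Success(chars, _) => chars.length
    case _                 => -1
  }
}
```

GreedyDemo.consumed("www.octopus.com") should report every character of the input as consumed, which is why the following "octopus" literal sees only the end of the source. Unlike a regex engine, the combinators do not backtrack into a repetition once it has succeeded.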

import scala.util.parsing.combinator._

object OctopusParser extends RegexParsers {

  // Match anything other than a dot, followed by a single dot (only once)
  def prefix = regex("""[^\.]*\.""".r)
  // Grab any number of remaining ".xxx" blocks; rep(p) is the prefix form of p*
  def postfix = rep(regex("""\..*""".r))
  def str = prefix ~> "octopus" <~ postfix

  def parse(s: String) = parseAll(str, s)
}

That then gives me:

scala> OctopusParser.parse("www.octopus.com")
res0: OctopusParser.ParseResult[String] = [1.13] parsed: octopus

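You may need to adjust "prefix" to match the range of inputs you expect, and you might want to use the "?" lazy marker if it turns out to be too greedy.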
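If the prefix is not known to end in a dot, one possible alternative (a sketch, not the approach given in the answer above) is to skip one character at a time while the target word does not start at the current position, using the library's `not` combinator as a negative lookahead. The object name SkippingOctopusParser and the members skipped and rest are hypothetical names chosen for this illustration.

```scala
import scala.util.parsing.combinator._

// Hypothetical variant; assumes the target word is the literal "octopus".
object SkippingOctopusParser extends RegexParsers {
  // Consume a single character, but only if "octopus" does not begin here.
  def skipped: Parser[List[String]] = rep(not("octopus") ~> regex(".".r))

  // Whatever follows the match; (?s) lets '.' also match line breaks.
  def rest: Parser[String] = regex("(?s).*".r)

  def str: Parser[String] = skipped ~> "octopus" <~ rest

  def parse(s: String): ParseResult[String] = parseAll(str, s)
}
```

SkippingOctopusParser.parse("www.octopus.com") should succeed with the value "octopus", and an input without the word should produce a failure. Note that RegexParsers skips whitespace before each literal and regex by default, so inputs containing spaces may behave differently than a plain substring search.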
