5

I am reading lines from a file:

for (line <- Source.fromFile("test.txt").getLines) {
  ....
}

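I basically want to end up with a list of paragraphs. An empty line starts a new paragraph, and in the future I might also want to parse some keyword-value pairs.

The text file contains a list of entries like this (or something similar, like an Ini file):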

User=Hans
Project=Blow up the moon
The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs. 

User=....

I basically want a List[Project], where Project looks something like

class Project (val User: String, val Name:String, val Desc: String) {}

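The description is a big chunk of text that does not begin with <keyword>=, but it can span any number of lines.

I know how to do this in an iterative style: check each line against a list of keywords, populate an instance of the class, and add it to a list to return later.

But I think it should be possible to do this in proper functional style, possibly with match case, yield and recursion, resulting in a list of objects that have the fields User, Project and so on. The class used is known, as are all the keywords, and the file format is not set in stone either. I am mostly trying to learn better functional style.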

4

5 Answers

8

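You are obviously parsing something, so it might be time to use... a parser!

Since your language seems to treat line breaks as significant, you will need to refer to this question to tell the parser so.

Apart from that, a rather simple implementation would be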

import scala.util.parsing.combinator.RegexParsers

case class Project(user: String, name: String, description: String)

object ProjectParser extends RegexParsers {
  override val whiteSpace = """[ \t]+""".r

  def eol : Parser[String] = """\r?\n""".r

  def user: Parser[String] = "User=" ~> """[^\n]*""".r <~ eol
  def name: Parser[String] = "Project=" ~> """[^\n]*""".r <~ eol
  def description: Parser[String] = repsep("""[^\n]+""".r, eol) ^^ { case l => l.mkString("\n") }
  def project: Parser[Project] = user ~ name ~ description ^^ { case a ~ b ~ c => Project(a, b, c) }
  def projects: Parser[List[Project]] = repsep(project,eol ~ eol)
}

And how to use it:

val sample = """User=foo1
Project=bar1
desc1
desc2
desc3

User=foo
Project=bar
desc4 desc5 desc6
desc7 desc8 desc9"""

import scala.util.parsing.input._
val reader = new CharSequenceReader(sample)
val res = ProjectParser.parseAll(ProjectParser.projects, reader)
if(res.successful) {
    print("Found projects: " + res.get)
} else {
    print(res)
}
answered 2013-05-13T12:01:14.513
1

Another possible implementation (since this parser is rather simple), using recursion:

import scala.io.Source
case class Project(user: String, name: String, desc: String)
@scala.annotation.tailrec
def parse(source: Iterator[String], list: List[Project] = Nil): List[Project] = {
  val emptyProject = Project("", "", "")
  @scala.annotation.tailrec
  def parseProject(project: Option[Project] = None): Option[Project] = {
    if(source.hasNext) {
      val line = source.next
      if(!line.isEmpty) {
        val splitted = line.span(_ != '=')
        parseProject(splitted match {
          case (h, t) if h == "User" => project.orElse(Some(emptyProject)).map(_.copy(user = t.drop(1)))
          case (h, t) if h == "Project" => project.orElse(Some(emptyProject)).map(_.copy(name = t.drop(1)))
          case _ => project.orElse(Some(emptyProject)).map(project => project.copy(desc = (if(project.desc.isEmpty) "" else project.desc ++ "\n") ++ line))
        })
      } else project
    } else project
  }

  if(source.hasNext) {
    parse(source, parseProject().map(_ :: list).getOrElse(list))
  } else list.reverse
}

And the test:

object Test {
  def source = Source.fromString("""User=Hans
Project=Blow up the moon
The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs.

User=Plop
Project=SO
Some desc""")

  def test = println(parse(source.getLines))
}

This gives:

List(Project(Hans,Blow up the moon,The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs.), Project(Plop,SO,Some desc))
answered 2013-05-13T12:32:42.650
1

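To answer your question without dealing with keyword parsing: fold over the lines and aggregate them, unless a line is empty, in which case you start a new empty paragraph.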

lines.foldLeft(List("")) { (l, x) => 
    if (x.isEmpty) "" :: l else (l.head + "\n" + x) :: l.tail  
} reverse

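You will notice it has some wrinkles in how it handles zero lines, as well as multiple and trailing empty lines. Adapt it to your needs. Also, if you would rather avoid string concatenation, you can collect the lines in a nested list and flatten at the end (using .map(_.mkString)). This is just to show the basic technique of folding a sequence into a new sequence rather than into a scalar.

This builds the list in reverse order, because prepending to a list (::) is more efficient than appending to l at each step.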
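The nested-list variant mentioned above could be sketched as follows. This is only an illustration, not the answerer's code: the names `Paragraphs` and `paragraphs` are made up here, and the sketch assumes that any run of empty lines counts as a single separator.

```scala
object Paragraphs {
  // Groups lines into paragraphs. Each paragraph is kept as a list of lines,
  // so no string concatenation happens during the fold.
  def paragraphs(lines: Iterator[String]): List[List[String]] = {
    val (done, current) =
      lines.foldLeft((List.empty[List[String]], List.empty[String])) {
        case ((acc, cur), line) =>
          if (line.isEmpty) {
            // An empty line closes the current paragraph; empty paragraphs are skipped.
            if (cur.isEmpty) (acc, cur) else (cur.reverse :: acc, Nil)
          } else {
            // Prepend the line; the paragraph is reversed when it is closed.
            (acc, line :: cur)
          }
      }
    // Flush the last paragraph and restore the original order.
    (if (current.isEmpty) done else current.reverse :: done).reverse
  }
}
```

Calling `Paragraphs.paragraphs(lines).map(_.mkString("\n"))` then joins each paragraph into a single string at the end.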

answered 2013-05-14T14:15:30.530
1

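You are obviously building something, so you might want to try... a builder!

Like Jürgen, my first thought was to fold, since you are accumulating a result.

A mutable.Builder does the accumulating mutably, with a collection.generic.CanBuildFrom indicating which builder to use to produce a target collection from a source collection. You keep the mutable thing around just long enough to get the result. So that is my plug for localized mutability, lest anyone assume that the path from List[String] to List[Project] has to be immutable.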
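As a rough illustration of localized mutability, the sketch below accumulates paragraphs with a standard-library builder obtained from `List.newBuilder`. The object and method names are hypothetical, and it only groups lines into paragraphs; it does not parse keywords.

```scala
object BuilderSketch {
  // The builders are mutable, but they never escape this method:
  // callers only ever see the immutable List that comes out.
  def paragraphs(lines: Iterator[String]): List[String] = {
    val out = List.newBuilder[String] // a mutable.Builder[String, List[String]]
    val cur = new StringBuilder       // text of the paragraph being accumulated

    // Emits the current paragraph, if any, and resets the buffer.
    def flush(): Unit =
      if (cur.nonEmpty) {
        out += cur.result()
        cur.clear()
      }

    for (line <- lines) {
      if (line.isEmpty) flush()
      else {
        if (cur.nonEmpty) cur.append('\n')
        cur.append(line)
      }
    }
    flush() // the last paragraph has no trailing empty line
    out.result()
  }
}
```

The same shape would work for a List[Project] by swapping the StringBuilder for a partially filled Project.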

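To the other good answers (those with non-negative ratings), I would add that functional style means functional decomposition, usually into small functions.

If you are not using regex parsers, do not neglect regexes in your pattern matches.

And try to spare the dots. In fact, I believe that tomorrow is Spare the Dots Day, and people with a sensitivity to dots are advised to stay indoors.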

case class Project(user: String, name: String, description: String)

trait Sample {
  val sample = """
  |User=Hans
  |Project=Blow up the moon
  |The slugs are going to eat the mustard. // multiline possible!
  |They are sneaky bastards, those slugs. 
  |
  |User=Bob
  |I haven't thought up a project name yet.
  |
  |User=Greta
  |Project=Burn the witch
  |It's necessary to escape from the witch before
  |we blow up the moon.  I hope Hans sees it my way.
  |Once we burn the bitch, I mean witch, we can
  |wreak whatever havoc pleases us.
  |""".stripMargin
}

object Test extends App with Sample {
  val kv = "(.*?)=(.*)".r
  def nonnully(s: String) = if (s == null) "" else s + " "
  val empty = Project(null, null, null)
  val (res, dummy) = ((List.empty[Project], empty) /: sample.lines) { (acc, line) =>
    val (sofar, cur) = acc
    line match {
      case kv("User", u)    => (sofar, cur copy (user = u))
      case kv("Project", n) => (sofar, cur copy (name = n))
      case kv(k, _)         => sys error s"Bad keyword $k"
      case x if x.nonEmpty  => (sofar, cur copy (description = s"${nonnully(cur.description)}$x"))
      case _ if cur != empty => (cur :: sofar, empty)
      case _                => (sofar, empty)
    }
  }
  val ps = if (dummy == empty) res.reverse else (dummy :: res).reverse
  Console println ps
}

The match can also be mashed together like this:

  val (res, dummy) = ((List.empty[Project], empty) /: sample.lines) {
    case ((sofar, cur), kv("User", u))     => (sofar, cur copy (user = u))
    case ((sofar, cur), kv("Project", n))  => (sofar, cur copy (name = n))
    case ((sofar, cur), kv(k, _))          => sys error s"Bad keyword $k"
    case ((sofar, cur), x) if x.nonEmpty   => (sofar, cur copy (description = s"${nonnully(cur.description)}$x"))
    case ((sofar, cur), _) if cur != empty => (cur :: sofar, empty)
    case ((sofar, cur), _)                 => (sofar, empty)
  }

It seems simpler to split the input into paragraphs first, before folding. Is that imperative thinking?

object Test0 extends App with Sample {
  def grafs(ss: Iterator[String]): List[List[String]] = {
    val (g, rest) = ss dropWhile (_.isEmpty) span (_.nonEmpty)
    val others = if (rest.nonEmpty) grafs(rest) else Nil
    g.toList :: others
  }
  def toProject(ss: List[String]): Project = {
    var p = Project("", "", "")
    for (line <- ss; parts = line split '=') parts match {
      case Array("User", u)    => p = p.copy(user = u)
      case Array("Project", n) => p = p.copy(name = n)
      case Array(k, _)         => sys error s"Bad keyword $k"
      case Array(text)         => p = p.copy(description = s"${p.description} $text")
    }
    p
  }
  val ps = grafs(sample.lines) map toProject
  Console println ps
}
answered 2013-05-19T08:57:52.763
-1
import scala.io.Source

class Project (val User: String, val Name:String, val Desc: String) {}
object Project {
  def apply(str: String): Project = {
    val user = somehowFetchUserName(str)
    val name = somehowFetchProjectName(str)
    val desc = somehowFetchDescription(str)
    new Project(user, name, desc)
  }
}

val contents: Array[String] = Source.fromFile("test.txt").mkString.split("\\n\\n")
val list = contents map(Project(_))

You will end up with the list of projects.
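The somehowFetch* helpers above are left open. One possible way to fill them in, as a sketch only, is a single fold over the lines of each block. Everything here is an assumption for illustration: the names `BlockParser`, `parseBlock` and `parseAll` are hypothetical, Project is redefined as a case class so results can be compared, and line endings are assumed to be plain `\n`.

```scala
object BlockParser {
  case class Project(user: String, name: String, desc: String)

  // Matches a whole line of the form key=value.
  private val KeyValue = "([^=]+)=(.*)".r

  // Hypothetical stand-in for the three somehowFetch* helpers:
  // one pass over the block fills in all three fields.
  def parseBlock(block: String): Project =
    block.split("\n").foldLeft(Project("", "", "")) {
      case (p, KeyValue("User", u))       => p.copy(user = u)
      case (p, KeyValue("Project", n))    => p.copy(name = n)
      case (p, line) if line.trim.isEmpty => p
      // Anything else, including lines with an unknown key, is description text.
      case (p, line) =>
        p.copy(desc = if (p.desc.isEmpty) line else p.desc + "\n" + line)
    }

  // Splits the text on blank lines and parses each non-empty block.
  def parseAll(text: String): List[Project] =
    text.split("\n\\s*\n").toList.filter(_.trim.nonEmpty).map(parseBlock)
}
```

With this in place, `BlockParser.parseAll(Source.fromFile("test.txt").mkString)` would play the role of the `contents map(Project(_))` line.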

answered 2013-05-07T21:55:50.080