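I'm using scala.xml.pull to parse a large XML file. This works well for event handling, but what I want is for my parser to cough up a mini document for specific nodes, and I can't see an easy way to do that, or at least not a "Scala" way.

I'm thinking of building a seek function like this, which uses the iterator to find the EvElemStart event that matches my tag: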

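def seek(tag: String) = {
  while (it.hasNext) {
    it.next match {
      case EvElemStart(_, `tag`, _, _) => // matching start tag found; the rest is the open question
      case _ => // skip all other events
    }
  }
}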

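After that I'm not sure how to proceed. Is there a simple way to grab all of this tag's children into a document without having to walk through every event the XMLEventReader emits?

What I'm ultimately after is a process that scans the file and emits an XML element (an Elem?) for each instance of a particular tag or set of tags, which I can then handle with normal Scala XML processing.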

2 Answers

This is what I ended up doing. slurp(tag) seeks the next instance of the tag and returns the complete node tree for that tag.

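import scala.collection.mutable.ListBuffer
import scala.xml.{Elem, MetaData, Node, Text, TopScope}
import scala.xml.pull.{EvElemEnd, EvElemStart, EvText}

// `it` is the XMLEventReader being consumed, defined elsewhere.

// Skip ahead to the next <tag> start event and return that element's subtree.
def slurp(tag: String): Option[Node] = {
  while (it.hasNext) {
    it.next match {
      case EvElemStart(_, `tag`, attrs, _) => return Some(subTree(tag, attrs))
      case _ => // ignore everything outside the wanted element
    }
  }
  None
}

// Called right after an EvElemStart; consumes events through the matching EvElemEnd.
def subTree(tag: String, attrs: MetaData): Node = {
  val children = ListBuffer[Node]()

  while (it.hasNext) {
    it.next match {
      case EvElemStart(_, t, a, _) => children += subTree(t, a)
      case EvText(t) => children += Text(t)
      // The prefix and namespace scope are not carried over.
      case EvElemEnd(_, _) => return new Elem(null, tag, attrs, TopScope, true, children.toList: _*)
      case _ => // comments, entity references, etc. are dropped
    }
  }
  throw new IllegalStateException(s"input ended inside <$tag>") // only reachable with malformed XML
}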
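
To get an element for every instance of the tag, as the question asks, slurp can be called repeatedly until it returns None. The following is a minimal sketch of that idea, assuming scala-xml 1.x (where scala.xml.pull is available). The names Slurper, all and SlurpDemo, and the sample XML, are made up for illustration; the reader is passed in explicitly so the example is self-contained.

```scala
import scala.io.Source
import scala.xml.{Elem, MetaData, Node, Text, TopScope}
import scala.xml.pull.{EvElemEnd, EvElemStart, EvText, XMLEventReader}

// Hypothetical wrapper: bundles the reader with slurp so matches can be iterated.
class Slurper(it: XMLEventReader) {

  // Next <tag> element in the stream, or None once the input is exhausted.
  def slurp(tag: String): Option[Node] = {
    while (it.hasNext) {
      it.next() match {
        case EvElemStart(_, `tag`, attrs, _) => return Some(subTree(tag, attrs))
        case _ => // ignore events outside the wanted element
      }
    }
    None
  }

  // Builds the element whose start event was just consumed.
  private def subTree(tag: String, attrs: MetaData): Node = {
    var children = Vector.empty[Node]
    while (it.hasNext) {
      it.next() match {
        case EvElemStart(_, t, a, _) => children :+= subTree(t, a)
        case EvText(t)               => children :+= Text(t)
        case EvElemEnd(_, _)         => return Elem(null, tag, attrs, TopScope, true, children: _*)
        case _                       => // comments etc. are skipped
      }
    }
    sys.error(s"input ended inside <$tag>")
  }

  // Lazily yields every remaining <tag> element, one slurp call per element.
  def all(tag: String): Iterator[Node] =
    Iterator.continually(slurp(tag)).takeWhile(_.isDefined).map(_.get)
}

object SlurpDemo {
  def main(args: Array[String]): Unit = {
    // Sample input made up for this sketch.
    val xml = "<books><book><title>A book title</title></book><book><title>Another book title</title></book></books>"
    val books = new Slurper(new XMLEventReader(Source.fromString(xml))).all("book")
    books.foreach(book => println((book \ "title").text))
  }
}
```

Because the iterator is lazy, only one subtree is held in memory at a time, which is the point of using the pull parser on a large file.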
answered 2012-12-03T02:41:40.490

Based on Jim Baldwin's answer, I created an iterator that yields the nodes at a particular level (rather than those with a particular tag):

import scala.io.Source
import scala.xml.parsing.FatalError
import scala.xml.{Elem, MetaData, Node, Text, TopScope}
import scala.xml.pull.{EvElemEnd, EvElemStart, EvText, XMLEventReader}


/**
  * Streaming XML parser which yields Scala XML Nodes.
  *
  * Usage:
  *
  * val it = new StreamingXMLParser(pathToXML, 1)
  *
  * Will give you all book-nodes of
  *
  * <?xml version="1.0" encoding="UTF-8"?>
  * <books>
  *     <book>
  *         <title>A book title</title>
  *     </book>
  *     <book>
  *         <title>Another book title</title>
  *     </book>
  * </books>
  *
  */
class StreamingXMLParser(filename: String, wantedNodeLevel: Int) extends Iterator[Node] {
    val file = Source.fromFile(filename)
    val it = new XMLEventReader(file)
    var currentLevel = 0                      // depth of the next element start (root = 0)
    private var pending: Option[Node] = None  // look-ahead: the next node to hand out

    // Read events until a node at the wanted level is buffered or the input ends.
    // Buffering the node lets hasNext answer truthfully instead of guessing.
    private def advance(): Unit = {
        while (pending.isEmpty && it.hasNext) {
            it.next match {
                case EvElemStart(_, tag, attrs, _) =>
                    if (currentLevel == wantedNodeLevel) {
                        // The whole subtree is consumed here, so the level is unchanged afterwards.
                        pending = Some(getElemWithChildren(tag, attrs))
                    }
                    else {
                        currentLevel += 1
                    }
                case EvElemEnd(_, _) =>
                    currentLevel -= 1
                case _ => // noop
            }
        }
    }

    def hasNext: Boolean = {
        advance()
        pending.isDefined
    }

    def next(): Node = {
        if (!hasNext) throw new NoSuchElementException

        val node = pending.get
        pending = None
        node
    }

    // Called right after an EvElemStart; consumes events through the matching EvElemEnd.
    def getElemWithChildren(tag: String, attrs: MetaData): Node = {
        var children = List[Node]()

        while (it.hasNext) {
            it.next match {
                case EvElemStart(_, t, a, _) =>
                    children = children :+ getElemWithChildren(t, a)
                case EvText(t) =>
                    children = children :+ Text(t)
                case EvElemEnd(_, _) =>
                    return new Elem(null, tag, attrs, TopScope, true, children: _*)
                case _ =>
            }
        }
        throw new FatalError("Failed to parse XML.")
    }
}
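
As far as I know, scala.xml.pull was deprecated in scala-xml 1.1 and is no longer present in 2.x. On those versions the same level-based idea could be sketched on top of the JDK's StAX reader (javax.xml.stream) instead. This is a substitute technique, not the approach used in the answers here. StaxNodeIterator and StaxDemo are hypothetical names, attribute prefixes and namespaces are dropped, and the sample XML is taken from the usage comment above.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants, XMLStreamReader}
import scala.collection.mutable.ListBuffer
import scala.xml.{Elem, MetaData, Node, Text, TopScope, UnprefixedAttribute}

// Hypothetical class: yields every element found at depth `wantedLevel` (root = 0).
class StaxNodeIterator(reader: XMLStreamReader, wantedLevel: Int) extends Iterator[Node] {
  private var depth = 0                     // depth of the next element start
  private var pending: Option[Node] = None  // look-ahead buffer

  // Copy the attributes of the current START_ELEMENT (prefixes are dropped).
  private def attrs(): MetaData =
    (0 until reader.getAttributeCount).foldRight(scala.xml.Null: MetaData) { (i, rest) =>
      new UnprefixedAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i), rest)
    }

  // Called right after START_ELEMENT; consumes events up to the matching END_ELEMENT.
  private def readElem(): Elem = {
    val label = reader.getLocalName
    val attributes = attrs()
    val children = ListBuffer[Node]()
    var done = false
    while (!done && reader.hasNext) {
      reader.next() match {
        case XMLStreamConstants.START_ELEMENT => children += readElem()
        case XMLStreamConstants.CHARACTERS | XMLStreamConstants.CDATA =>
          children += Text(reader.getText)
        case XMLStreamConstants.END_ELEMENT => done = true
        case _ => // comments, processing instructions, etc. are skipped
      }
    }
    Elem(null, label, attributes, TopScope, true, children.toList: _*)
  }

  // Read until an element at the wanted depth is buffered or the input ends.
  private def advance(): Unit =
    while (pending.isEmpty && reader.hasNext) {
      reader.next() match {
        case XMLStreamConstants.START_ELEMENT =>
          if (depth == wantedLevel) pending = Some(readElem()) // subtree consumed, depth unchanged
          else depth += 1
        case XMLStreamConstants.END_ELEMENT => depth -= 1
        case _ => // ignore text and other events outside wanted elements
      }
    }

  def hasNext: Boolean = { advance(); pending.isDefined }

  def next(): Node = {
    if (!hasNext) throw new NoSuchElementException("no more elements at the wanted level")
    val node = pending.get
    pending = None
    node
  }
}

object StaxDemo {
  def main(args: Array[String]): Unit = {
    val xml = "<books><book><title>A book title</title></book><book><title>Another book title</title></book></books>"
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml))
    new StaxNodeIterator(reader, 1).foreach(book => println((book \ "title").text))
  }
}
```

For a real file, the StringReader would be replaced by a reader or input stream over that file; the iterator itself does not change.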
answered 2018-05-04T12:17:08.860