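I need to create an XML document from a piece of plain text plus the start and end offsets of each XML element that should be inserted. Here are some test cases I'd like it to pass: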

val text = "The dog chased the cat."
val spans = Seq(
    (0, 23, <xml/>),
    (4, 22, <phrase/>),
    (4, 7, <token/>))
val expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
assert(expected === spansToXML(text, spans))

val text = "aabbccdd"
val spans = Seq(
    (0, 8, <xml x="1"/>),
    (0, 4, <ab y="foo"/>),
    (4, 8, <cd z="42>3"/>))
val expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
assert(expected === spansToXML(text, spans))

val spans = Seq(
    (0, 1, <a/>),
    (0, 0, <b/>),
    (0, 0, <c/>),
    (1, 1, <d/>),
    (1, 1, <e/>))
assert(<a><b/><c/> <d/><e/></a> === spansToXML(" ", spans))

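My partial solution (see my answer below) works through string concatenation and XML.loadString. That seems hacky, and I'm also not 100% sure it works correctly in all corner cases...

Is there a better solution? (For what it's worth, I'd be happy to switch to anti-xml if that would make this task easier.)

Updated Aug 10, 2011 to add more test cases and provide a clearer specification.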

5 Answers

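Given the bounty you offered, I studied your problem for a while and came up with the following solution, which succeeds on all of your test cases. I'd really like my answer to be accepted, so please tell me if anything is wrong with my solution.

Some comments: I left the commented-out print statements in, in case you want to figure out what happens during execution. Beyond your specification, I preserve any children the elements already have; a comment marks where this is done.

I don't build the XML nodes manually; I modify the ones passed in. To avoid splitting the opening and closing tags I had to change the algorithm quite a lot, but the idea of sorting spans by begin and -end comes from your solution.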

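The code is somewhat advanced Scala, especially where I build the different Orderings I need. I did simplify it a little compared to the first version I got working.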
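As a rough illustration of how such a composite ordering can be built, here is a minimal sketch. It assumes a simplified span type with a String label in place of an Elem; the names `SpanOrderingSketch`, `Span`, and `innermostFirst` are hypothetical and not part of the answer.

```scala
object SpanOrderingSketch {
  // Hypothetical simplified span: (begin, end, label), with a String label instead of an Elem.
  type Span = (Int, Int, String)

  private val intOrdering = implicitly[Ordering[Int]]

  // Decreasing on begin, then increasing on end, then by label.
  val spanOrder: Ordering[Span] =
    Ordering.Tuple3(intOrdering.reverse, intOrdering, Ordering.String)

  // With this ordering a span always sorts before any span that encloses it.
  def innermostFirst(spans: Seq[Span]): Seq[Span] = spans.sorted(spanOrder)
}
```

For the first test case, `SpanOrderingSketch.innermostFirst(Seq((0, 23, "xml"), (4, 22, "phrase"), (4, 7, "token")))` puts the token span first and the enclosing xml span last.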

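I avoided creating a tree to represent the intervals by using a SortedMap and filtering the intervals after extracting them. This choice is somewhat suboptimal. I've heard there are "better" data structures for representing nested intervals, such as interval trees (they are studied in computational geometry), but they are quite complex to implement and I don't think one is needed here.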
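The "sorted map plus filter" idea can be sketched on its own as follows. This is a simplified illustration, not the answer's exact code: it uses only `filter` and excludes the queried interval explicitly, and the names `NestedIntervalSketch`, `Interval`, and `nestedIn` are hypothetical.

```scala
import scala.collection.immutable.SortedMap

object NestedIntervalSketch {
  type Interval = (Int, Int)

  // Same key ordering as described above: decreasing begin, then increasing end.
  val intervalOrder: Ordering[Interval] =
    Ordering.Tuple2(Ordering.Int.reverse, Ordering.Int)

  // Keep only the stored intervals contained in `outer`, other than `outer` itself.
  // Containment is a partial order, so it is applied as a filter rather than as the map's ordering.
  def nestedIn[A](outer: Interval, m: SortedMap[Interval, A]): SortedMap[Interval, A] = {
    val (start, end) = outer
    m.filter { case ((s, e), _) => (s, e) != outer && start <= s && e <= end }
  }
}
```

The filter visits every stored interval, which is the cost that a dedicated structure such as an interval tree would avoid.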

/**
 * User: pgiarrusso
 * Date: 12/8/2011
 */

import collection.mutable.ArrayBuffer
import collection.SortedMap
import scala.xml._

object SpansToXmlTest {
    def spansToXML(text: String, spans: Seq[(Int, Int, Elem)]) = {
        val intOrdering = implicitly[Ordering[Int]] // Retrieves the standard ordering on Ints.

        // Sort spans decreasingly on begin and increasingly on end and their label - this processes spans outwards.
        // The sorting on labels matches the given examples.
        val spanOrder = Ordering.Tuple3(intOrdering.reverse, intOrdering, Ordering.by((_: Elem).label))

        //Same sorting, excluding labels.
        val intervalOrder = Ordering.Tuple2(intOrdering.reverse, intOrdering)
        //Map intervals of the source string to the sequence of nodes which match them - it is a sequence because
        //multiple spans for the same interval are allowed.
        var intervalMap = SortedMap[(Int, Int), Seq[Node]]()(intervalOrder)

        for ((start, end, elem) <- spans.sorted(spanOrder)) {
            //Only nested intervals. Interval nesting is a partial order, therefore we cannot use the filter function as an ordering for intervalMap, even if it would be nice.
            val nestedIntervalsMap = intervalMap.until((start, end)).filter(_ match {
                case ((intStart, intEnd), _) => start <= intStart && intEnd <= end
            })
            //println("intervalMap: " + intervalMap)
            //println("beforeMap: " + nestedIntervalsMap)

            //We call sorted to use a standard ordering this time.
            val before = nestedIntervalsMap.keys.toSeq.sorted

            // text.slice(start, end) must be split into fragments, some of which are represented by text node, some by
            // already computed xml nodes.
            val intervals = start +: (for {
                (intStart, intEnd) <- before
                boundary <- Seq(intStart, intEnd)
            } yield boundary) :+ end

            var xmlChildren = ArrayBuffer[Node]()
            var useXmlNode = false

            for (interv <- intervals.sliding(2)) {
                val intervStart = interv(0)
                val intervEnd = interv(1)
                xmlChildren.++=(
                    if (useXmlNode)
                        intervalMap((intervStart, intervEnd)) //Precomputed nodes
                    else
                        Seq(Text(text.slice(intervStart, intervEnd))))
                useXmlNode = !useXmlNode //The next interval will be of the opposite kind.
            }
            //Remove intervals that we just processed
            intervalMap = intervalMap -- before

            // By using elem.child, you also preserve existing xml children. "elem.child ++" can be also commented out.
            var tree = elem.copy(child = elem.child ++ xmlChildren)
            intervalMap += (start, end) -> (intervalMap.getOrElse((start, end), Seq.empty) :+ tree)
            //println(tree)
        }
        intervalMap((0, text.length)).head
    }

    def test(text: String, spans: Seq[(Int, Int, Elem)], expected: Node) {
        val res = spansToXML(text, spans)
        print("Text: \"%s\", expected:\n%s\nResult:\n%s\n\n" format (text, expected, res))
        assert(expected == res)
    }
    def test1() =
        test(
            text = "The dog chased the cat.",
            spans = Seq(
                (0, 23, <xml/>),
                (4, 22, <phrase/>),
                (4, 7, <token/>)),
            expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
        )

    def test2() =
        test(
            text = "aabbccdd",
            spans = Seq(
                (0, 8, <xml x="1"/>),
                (0, 4, <ab y="foo"/>),
                (4, 8, <cd z="42>3"/>)),
            expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
        )

    def test3() =
        test(
            text = " ",
            spans = Seq(
                (0, 1, <a/>),
                (0, 0, <b/>),
                (0, 0, <c/>),
                (1, 1, <d/>),
                (1, 1, <e/>)),
            expected = <a><b/><c/> <d/><e/></a>
        )

    def main(args: Array[String]) {
        test1()
        test2()
        test3()
    }
}
answered 2011-08-12T22:08:28.093

This was fun!

I took an approach similar to Steve's: sort the elements into "start tags" and "end tags", then compute where to place them.
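The "compute where to place them" step can be sketched as a fold that keeps a running shift equal to the total length of the tags inserted so far. This is a minimal sketch under the assumption that the events are already in output order and that tags are given as literal strings; `TagInsertionSketch`, `Event`, and `insertTags` are hypothetical names, and the answer's own `adjustInset` works on Elem values instead.

```scala
object TagInsertionSketch {
  // Hypothetical event: (insertion index in the original text, literal tag string).
  type Event = (Int, String)

  // Insert each tag at its original index plus the total length of the tags inserted before it.
  // Assumes `events` is already in the order in which the tags should appear in the output.
  def insertTags(text: String, events: Seq[Event]): String = {
    val (result, _) = events.foldLeft((text, 0)) { case ((acc, shift), (index, tag)) =>
      val (before, after) = acc.splitAt(index + shift)
      (before + tag + after, shift + tag.length)
    }
    result
  }
}
```

Because the shift grows with every inserted tag, two tags with the same original index end up next to each other in event order.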

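I shamelessly stole the tests from Blaisorblade and added a few more that helped me develop the code.

Edited 2011-08-14

I was unhappy with the way the empty tag was inserted in test-5. However, that placement follows from how test-3 is formulated:

  • Even though the empty tags (c, d) come after the spanning tag (a) in the spans list, and the c, d tags have the same insertion point as a's end tag, the c, d tags go inside a. This makes it hard to place empty tags between two spanning tags, which could be useful.

So I changed some of the tests slightly and provide an alternative solution.

In the alternative solution I start the same way, but with 3 separate lists: start, empty, and end tags. Instead of only sorting, there is a third step that places the empty tags into the tag list.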
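The first step of the alternative, splitting the spans into three lists, might look like the sketch below. It assumes a simplified span with a String label instead of an Elem and uses Char markers rather than Symbols; `TagListsSketch` and `split` are hypothetical names.

```scala
object TagListsSketch {
  type Span = (Int, Int, String)   // hypothetical: (begin, end, label) instead of an Elem
  type Event = (Int, String, Char) // (insertion index, label, kind: 's', 'e' or 'c')

  // Returns (start tags, empty tags, end tags), each sorted by insertion index.
  def split(spans: Seq[Span]): (Seq[Event], Seq[Event], Seq[Event]) = {
    val (empty, nonEmpty) = spans.partition { case (b, e, _) => b == e }
    val starts = nonEmpty.sortBy(_._1).map { case (b, _, l) => (b, l, 's') }
    // Reverse first: sortBy is stable, so of two spans ending at the same index,
    // the one declared later closes first.
    val ends = nonEmpty.reverse.sortBy(_._2).map { case (_, e, l) => (e, l, 'c') }
    val empties = empty.sortBy(_._1).map { case (b, _, l) => (b, l, 'e') }
    (starts, empties, ends)
  }
}
```

The third step, deciding where each empty tag goes relative to the start and end tags at the same index, is the part that differs between the two solutions and is not covered by this sketch.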

First solution:

import xml.{XML, Elem, Node}
import annotation.tailrec

object SpanToXml {
    def spansToXML(text: String, spans: Seq[(Int, Int, Elem)]): Node = {
        // Create a Seq of elements, sorted by where it should be inserted
        //  differentiate start tags ('s) and empty tags ('e)
        val startElms = spans sorted Ordering[Int].on[(Int, _, _)](_._1) map {
            case e if e._1 != e._2 => (e._1, e._3, 's)
            case e => (e._1, e._3, 'e)
        }
        //Create a Seq of closing tags ('c), sorted by where they should be inserted
        // filter out all empty tags
        val endElms = spans.reverse.sorted(Ordering[Int].on[(_, Int, _)](_._2))
            .filter(e => e._1 != e._2)
            .map(e => (e._2, e._3, 'c))

        //Combine the Seq's and sort by insertion point
        val elms = startElms ++ endElms sorted Ordering[Int].on[(Int, _, _)](_._1)
        //The sorting needs to be refined
        // - end tags need to come before start tags if the insertion point is the same
        val sorted = elms.sortWith((a, b) => a._1 == b._1 && a._3 == 'c && b._3 == 's )

        //Adjust the insertion point to what it should be in the final string
        // then insert the tags into the text by folding left
        // - there are different rules depending on start, empty or close
        val txt = adjustInset(sorted).foldLeft(text)((tx, e) => {
            val s = tx.splitAt(e._1)
            e match {
                case (_, elem, 's) => s._1 + "<" + elem.label + elem.attributes + ">" + s._2
                case (_, elem, 'e) => s._1 + "<" + elem.label + elem.attributes + "/>" + s._2
                case (_, elem, 'c) => s._1 + "</" + elem.label + ">" + s._2
            }
        })
        //Sanity check
        //println(txt)

        //Convert to XML
        XML.loadString(txt)
    }

    def adjustInset(elems: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] = {
        @tailrec
        def adjIns(elems: Seq[(Int, Elem, Symbol)], tmp: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] =
            elems match {
                case t :: Nil => tmp :+ t
                case t :: ts => {
                    //calculate offset due to current element
                    val offset = t match {
                        case (_, e, 's) => e.label.size + e.attributes.toString.size + 2
                        case (_, e, 'e) => e.label.size + e.attributes.toString.size + 3
                        case (_, e, 'c) => e.label.size + 3
                    }
                    //add offset to all elm's in tail, and recurse
                    adjIns(ts.map(e => (e._1 + offset, e._2, e._3)), tmp :+ t)
                }
            }

            adjIns(elems, Nil)
    }

    def test(text: String, spans: Seq[(Int, Int, Elem)], expected: Node) {
        val res = spansToXML(text, spans)
        print("Text: \"%s\", expected:\n%s\nResult:\n%s\n\n" format (text, expected, res))
        assert(expected == res)
    }

    def test1() =
        test(
            text = "The dog chased the cat.",
            spans = Seq(
                (0, 23, <xml/>),
                (4, 22, <phrase/>),
                (4, 7, <token/>)),
            expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
        )

    def test2() =
        test(
            text = "aabbccdd",
            spans = Seq(
                (0, 8, <xml x="1"/>),
                (0, 4, <ab y="foo"/>),
                (4, 8, <cd z="42>3"/>)),
            expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
        )

    def test3() =
        test(
            text = " ",
            spans = Seq(
                (0, 1, <a/>),
                (0, 0, <b/>),
                (0, 0, <c/>),
                (1, 1, <d/>),
                (1, 1, <e/>)),
            expected = <a><b/><c/> <d/><e/></a>
        )

    def test4() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aabb</ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test5() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 4, <b/>),
                        (4, 4, <empty/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<b>bb<empty/></b></ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test6() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 4, <b/>),
                        (2, 4, <c/>),
                        (3, 4, <d/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<b><c>b<d>b</d></c></b></ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test7() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab a="a" b="b"/>),
                        (4, 8, <cd c="c" d="d"/>)),
            expected = <xml><ab a="a" b="b">aabb</ab><cd c="c" d="d">ccdd</cd></xml>
        )

    def invalidSpans() = {
        val text = "aabbccdd"
        val spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (4, 6, <err/>),
                        (4, 8, <cd/>))
        try {
            val res = spansToXML(text, spans)
            assert(false)
        } catch {
            case e => {
                println("This generate invalid XML:")
                println("<xml><ab>aabb</ab><err><cd>cc</err>dd</cd></xml>")
                println(e.getMessage)
            }
        }
    }

    def main(args: Array[String]) {
        test1()
        test2()
        test3()
        test4()
        test5()
        test6()
        test7()
        invalidSpans()
    }
}

SpanToXml.main(Array())

Alternative solution:

import xml.{XML, Elem, Node}
import annotation.tailrec

object SpanToXmlAlt {
    def spansToXML(text: String, spans: Seq[(Int, Int, Elem)]): Node = {
        // Create a Seq of start tags, sorted by where it should be inserted
        // filter out all empty tags
        val startElms = spans.sorted(Ordering[Int].on[(Int, _, _)](_._1))
            .filterNot(e => e._1 == e._2)
            .map(e => (e._1, e._3, 's))
        //Create a Seq of closing tags, sorted by where they should be inserted
        // filter out all empty tags
        val endElms = spans.reverse.sorted(Ordering[Int].on[(_, Int, _)](_._2))
            .filterNot(e => e._1 == e._2)
            .map(e => (e._2, e._3, 'c))

        //Create a Seq of empty tags, sorted by where they should be inserted
        val emptyElms = spans.sorted(Ordering[Int].on[(Int, _, _)](_._1))
            .filter(e => e._1 == e._2)
            .map(e => (e._1, e._3, 'e))

        //Combine the Seq's and sort by insertion point
        val elms = startElms ++ endElms sorted Ordering[Int].on[(Int, _, _)](_._1)
        //The sorting needs to be refined
        // - end tags need to come before start tags if the insertion point is the same
        val sorted = elms.sortWith((a, b) => a._1 == b._1 && a._3 == 'c && b._3 == 's )

        //Insert empty tags
        val allSorted = insertEmpyt(spans, sorted, emptyElms) sorted Ordering[Int].on[(Int, _, _)](_._1)
        //Adjust the insertion point to what it should be in the final string
        // then insert the tags into the text by folding left
        // - there are different rules depending on start, empty or close
        val str = adjustInset(allSorted).foldLeft(text)((tx, e) => {
            val s = tx.splitAt(e._1)
            e match {
                case (_, elem, 's) => s._1 + "<" + elem.label + elem.attributes + ">" + s._2
                case (_, elem, 'e) => s._1 + "<" + elem.label + elem.attributes + "/>" + s._2
                case (_, elem, 'c) => s._1 + "</" + elem.label + ">" + s._2
            }
        })
        //Sanity check
        //println(str)
        //Convert to XML
        XML.loadString(str)
    }

    def insertEmpyt(spans: Seq[(Int, Int, Elem)],
        sorted: Seq[(Int, Elem, Symbol)],
        emptys: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] = {

        //Find all tags that should be before the empty tag
        @tailrec
        def afterSpan(empty: (Int, Elem, Symbol),
            spans: Seq[(Int, Int, Elem)],
            after: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] = {
            var result = after
            spans match {
                case t :: _ if t._1 == empty._1 && t._2 == empty._1 && t._3 == empty._2 => after //break
                case t :: ts if t._1 == t._2 => afterSpan(empty, ts, after :+ (t._1, t._3, 'e))
                case t :: ts => {
                    if (t._1 <= empty._1) result = result :+ (t._1, t._3, 's)
                    if (t._2 <= empty._1) result = result :+ (t._2, t._3, 'c)
                    afterSpan(empty, ts, result)
                }
            }
        }

        //For each empty tag, insert it in the sorted list
        var result = sorted
        emptys.foreach(e => {
            val afterSpans = afterSpan(e, spans, Seq[(Int, Elem, Symbol)]())
            var emptyInserted = false
            result = result.foldLeft(Seq[(Int, Elem, Symbol)]())((res, s) => {
                if (afterSpans.contains(s) || emptyInserted) {
                    res :+ s
                } else {
                    emptyInserted = true
                    res :+ e :+ s
                }
            })
        })
        result
    }

    def adjustInset(elems: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] = {
        @tailrec
        def adjIns(elems: Seq[(Int, Elem, Symbol)], tmp: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] =
            elems match {
                case t :: Nil => tmp :+ t
                case t :: ts => {
                    //calculate offset due to current element
                    val offset = t match {
                        case (_, e, 's) => e.label.size + e.attributes.toString.size + 2
                        case (_, e, 'e) => e.label.size + e.attributes.toString.size + 3
                        case (_, e, 'c) => e.label.size + 3
                    }
                    //add offset to all elm's in tail, and recurse
                    adjIns(ts.map(e => (e._1 + offset, e._2, e._3)), tmp :+ t)
                }
            }

            adjIns(elems, Nil)
    }

    def test(text: String, spans: Seq[(Int, Int, Elem)], expected: Node) {
        val res = spansToXML(text, spans)
        print("Text: \"%s\", expected:\n%s\nResult:\n%s\n\n" format (text, expected, res))
        assert(expected == res)
    }

    def test1() =
        test(
            text = "The dog chased the cat.",
            spans = Seq(
                (0, 23, <xml/>),
                (4, 22, <phrase/>),
                (4, 7, <token/>)),
            expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
        )

    def test2() =
        test(
            text = "aabbccdd",
            spans = Seq(
                (0, 8, <xml x="1"/>),
                (0, 4, <ab y="foo"/>),
                (4, 8, <cd z="42>3"/>)),
            expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
        )

    def test3alt() =
        test(
            text = "  ",
            spans = Seq(
                (0, 2, <a/>),
                (0, 0, <b/>),
                (0, 0, <c/>),
                (1, 1, <d/>),
                (1, 1, <e/>)),
            expected = <a><b/><c/> <d/><e/> </a>
        )

    def test4() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aabb</ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test5alt() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 4, <b/>),
                        (4, 4, <empty/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<b>bb</b></ab><empty/><cd><ok>cc</ok>dd</cd></xml>
        )

    def test5b() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 2, <empty1/>),
                        (4, 4, <empty2/>),
                        (2, 4, <b/>),
                        (2, 2, <empty3/>),
                        (4, 4, <empty4/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<empty1/><b><empty3/>bb<empty2/></b></ab><empty4/><cd><ok>cc</ok>dd</cd></xml>
        )

    def test6() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 4, <b/>),
                        (2, 4, <c/>),
                        (3, 4, <d/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<b><c>b<d>b</d></c></b></ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test7() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab a="a" b="b"/>),
                        (4, 8, <cd c="c" d="d"/>)),
            expected = <xml><ab a="a" b="b">aabb</ab><cd c="c" d="d">ccdd</cd></xml>
        )

    def failedSpans() = {
        val text = "aabbccdd"
        val spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (4, 6, <err/>),
                        (4, 8, <cd/>))
        try {
            val res = spansToXML(text, spans)
            assert(false)
        } catch {
            case e => {
                println("This generate invalid XML:")
                println("<xml><ab>aabb</ab><err><cd>cc</err>dd</cd></xml>")
                println(e.getMessage)
            }
        }

    }

    def main(args: Array[String]) {
        test1()
        test2()
        test3alt()
        test4()
        test5alt()
        test5b()
        test6()
        test7()
        failedSpans()
    }
}

SpanToXmlAlt.main(Array())
answered 2011-08-13T23:59:12.180

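My solution is recursive. I sort the input Seq as needed and convert it to a List. After that it is basic pattern matching, as required by the specification. The biggest drawback of my solution is that, although .toString produces the same string, == in the test method does not yield true.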
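The sorting step can be looked at in isolation. The sketch below assumes a simplified span with a String label instead of an Elem; `RecursiveSortSketch` and `sortSpans` are hypothetical names, while the sort key itself is the one used in the code that follows.

```scala
object RecursiveSortSketch {
  type Span = (Int, Int, String) // hypothetical: (start, end, label) instead of an Elem

  // Increasing start, then decreasing end: an enclosing span comes before the spans it contains.
  // sortBy is stable, so spans with identical offsets keep their input order.
  def sortSpans(spans: Seq[Span]): List[Span] =
    spans.toList.sortBy { case (start, end, _) => (start, -end) }
}
```

With this order the recursion can always treat the head of the list as the outermost remaining span at the current position.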

import scala.xml.{NodeSeq, Elem, Text}

object SpansToXml {
  type NodeSpan = (Int, Int, Elem)

  def adjustIndices(offset: Int, spans: List[NodeSpan]) = spans.map {
    case (spanStart, spanEnd, spanNode) => (spanStart - offset, spanEnd - offset, spanNode)
  }

  def sortedSpansToXml(text: String, spans: List[NodeSpan]): NodeSeq = {
    spans match {
      // current span starts and ends at index 0, thus no inner text exists
      case (0, 0, node) :: rest => node +: sortedSpansToXml(text, rest)

      // current span starts at index 0 and ends somewhere greater than 0
      case (0, end, node) :: rest =>
        // partition the text and the remaining spans in inner and outer and process both independently
        val (innerSpans, outerSpans) = rest.partition {
          case (spanStart, spanEnd, spanNode) => spanStart <= end && spanEnd <= end
        }
        val (innerText, outerText) = text.splitAt(end)

        // prepend the generated node to the outer xml
        node.copy(child = node.child ++ sortedSpansToXml(innerText, innerSpans)) +: sortedSpansToXml(outerText, adjustIndices(end, outerSpans))

      // current span starts at an index larger than 0, convert text prefix to text node
      case (start, end, node) :: rest =>
        val (pre, spanned) = text.splitAt(start)
        Text(pre) +: sortedSpansToXml(spanned, adjustIndices(start, spans))

      // all spans consumed: we can just return the text as node
      case Nil =>
        Text(text)
    }
  }

  def spansToXml(xmlText: String, nodeSpans: Seq[NodeSpan]) = {
    val sortedSpans = nodeSpans.toList.sortBy {
      case (start, end, _) => (start, -end)
    }
    sortedSpansToXml(xmlText, sortedSpans)
  }

  // test code stolen from Blaisorblade and david.rosell

  def test(text: String, spans: Seq[(Int, Int, Elem)], expected: NodeSeq) {
    val res = spansToXml(text, spans)
    print("Text: \"%s\", expected:\n%s\nResult:\n%s\n\n" format (text, expected, res))
    // Had to resort to comparing toString output here.
    assert(expected.toString == res.toString)
  }

  def test1() =
        test(
            text = "The dog chased the cat.",
            spans = Seq((0, 23, <xml/>),(4, 22, <phrase/>),(4, 7, <token/>)),
            expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
        )

  def test2() =
        test(
            text = "aabbccdd",
            spans = Seq(
                (0, 8, <xml x="1"/>),
                (0, 4, <ab y="foo"/>),
                (4, 8, <cd z="42>3"/>)),
            expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
        )

  def test3() =
        test(
            text = " ",
            spans = Seq(
                (0, 1, <a/>),
                (0, 0, <b/>),
                (0, 0, <c/>),
                (1, 1, <d/>),
                (1, 1, <e/>)),
            expected = <a><b/><c/> <d/><e/></a>
        )

  def test4() =
      test(
          text = "aabbccdd",
          spans = Seq((0, 8, <xml/>),
                      (0, 4, <ab/>),
                      (4, 8, <cd/>),
                      (4, 6, <ok/>)),
          expected = <xml><ab>aabb</ab><cd><ok>cc</ok>dd</cd></xml>
      )

  def test5() =
      test(
          text = "aabbccdd",
          spans = Seq((0, 8, <xml/>),
                      (0, 4, <ab/>),
                      (2, 4, <b/>),
                      (4, 4, <empty/>),
                      (4, 8, <cd/>),
                      (4, 6, <ok/>)),
          expected = <xml><ab>aa<b>bb<empty/></b></ab><cd><ok>cc</ok>dd</cd></xml>
      )

  def test6() =
      test(
          text = "aabbccdd",
          spans = Seq((0, 8, <xml/>),
                      (0, 4, <ab/>),
                      (2, 4, <b/>),
                      (2, 4, <c/>),
                      (3, 4, <d/>),
                      (4, 8, <cd/>),
                      (4, 6, <ok/>)),
          expected = <xml><ab>aa<b><c>b<d>b</d></c></b></ab><cd><ok>cc</ok>dd</cd></xml>
      )

  def test7() =
      test(
          text = "aabbccdd",
          spans = Seq((0, 8, <xml/>),
                      (0, 4, <ab a="a" b="b"/>),
                      (4, 8, <cd c="c" d="d"/>)),
          expected = <xml><ab a="a" b="b">aabb</ab><cd c="c" d="d">ccdd</cd></xml>
      )

}
answered 2011-08-16T14:24:07.747

You can easily create XML nodes dynamically:

scala> import scala.xml._
import scala.xml._

scala> Elem(null, "AAA",xml.Null,xml.TopScope, Array[Node]():_*)
res2: scala.xml.Elem = <AAA></AAA>

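This is the Elem.apply signature: def apply (prefix: String, label: String, attributes: MetaData, scope: NamespaceBinding, child: Node*) : Elem

The only problem I see with this approach is that you need to build the inner nodes first.

Something to make things easier: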

scala> def elem(name:String, children:Node*) = Elem(null, name ,xml.Null,xml.TopScope, children:_*); def elem(name:String):Elem=elem(name, Array[Node]():_*);

scala> elem("A",elem("B"))
res11: scala.xml.Elem = <A><B></B></A>
answered 2010-11-09T18:04:13.433

Here is a solution that comes close to correct, using string concatenation and XML.loadString:

import scala.collection.mutable.{Buffer, Map}
import scala.xml.{Elem, Node, XML}

def spansToXML(text: String, spans: Seq[(Int, Int, Elem)]): Node = {
  // arrange items so that at each offset:
  //   closing tags sort before opening tags
  //   with two opening tags, the one with the later closing tag sorts first
  //   with two closing tags, the one with the later opening tag sorts first
  val items = Buffer[(Int, Int, Int, String)]()
  for ((begin, end, elem) <- spans) {
    val elemStr = elem.toString
    val splitIndex = elemStr.indexOf('>') + 1
    val beginTag = elemStr.substring(0, splitIndex)
    val endTag = elemStr.substring(splitIndex)
    items += ((begin, +1, -end, beginTag))
    items += ((end, -1, -begin, endTag))
  }
  // group tags to be inserted by index
  val inserts = Map[Int, Buffer[String]]()
  for ((index, _, _, tag) <- items.sorted) {
    inserts.getOrElseUpdate(index, Buffer[String]()) += tag
  }
  // put tags and characters into a buffer
  val result = Buffer[String]()
  for (i <- 0 until text.size + 1) {
    for (tags <- inserts.get(i); tag <- tags) {
      result += tag
    }
    result += text.slice(i, i + 1)
  }
  // create XML from the string buffer
  XML.loadString(result.mkString)
}

This passes the first two test cases but fails on the third.
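One plausible cause of the failure, sketched here under the assumption that an empty element serializes as an opening tag followed by a closing tag: for a zero-length span both items share the same index, and the -1 of the closing item sorts before the +1 of the opening item. The names `EmptySpanOrderSketch`, `itemsFor`, and `emitted` are hypothetical, and labels stand in for Elem values.

```scala
object EmptySpanOrderSketch {
  // (index, +1 for opening / -1 for closing, tie-breaker, tag), mirroring the `items` tuples above.
  type Item = (Int, Int, Int, String)

  def itemsFor(begin: Int, end: Int, label: String): Seq[Item] = Seq(
    (begin, +1, -end, "<" + label + ">"),
    (end, -1, -begin, "</" + label + ">"))

  // Concatenate the tags in the order the tuple sort produces (text characters are left out).
  def emitted(spans: Seq[(Int, Int, String)]): String =
    spans.flatMap { case (b, e, l) => itemsFor(b, e, l) }.sorted.map(_._4).mkString
}
```

For a non-empty span such as (0, 1, "a") the tags come out in the expected order, but for the zero-length span (0, 0, "b") the closing tag is emitted before the opening tag, which is not well-formed XML.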

answered 2010-11-09T22:05:41.723