
I want to replace text in an XML file while preserving any other formatting in the source file.

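For example, parsing it into a DOM, replacing the node using XPath, and writing the result out as a String may not do the job, because that reformats the whole file. (Pretty printing is probably fine in 99% of cases, but the requirement is to preserve the existing formatting, even if it is not "pretty".)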

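Is there a Java / Scala library that can do a "find and replace" on the string without parsing it into a DOM tree? Or one that can at least preserve the original formatting?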

Edit:

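I think the maven replacer plugin does something like this; it seems to preserve the original whitespace formatting by using setPreserveSpace (I think, I still need to try it):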

import java.io.StringWriter;

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
...
    private String writeXml(Document doc) throws Exception {
        // Ask the serializer to keep whitespace as-is instead of pretty-printing.
        OutputFormat of = new OutputFormat(doc);
        of.setPreserveSpace(true);
        of.setEncoding(doc.getXmlEncoding());

        StringWriter sw = new StringWriter();
        XMLSerializer serializer = new XMLSerializer(sw, of);
        serializer.serialize(doc);
        return sw.toString();
    }

So the question becomes: is there a (straightforward) way to do this without extra dependencies?

Edit 2:

The requirement is to use an externally supplied XPath query, i.e. one passed in as a String.
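To make the no-extra-dependency variant concrete, here is a minimal JDK-only sketch of the DOM plus XPath approach discussed above. The class name `XPathReplace` and the helper `replaceText` are made up for illustration. It assumes that an identity `Transformer` with indentation turned off writes whitespace text nodes back unchanged; other details (attribute quoting, the XML declaration, entity references, CDATA sections) may still be normalized, so this is not a byte-for-byte round trip.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathReplace {

    /** Hypothetical helper: sets the text of every node matched by the XPath string. */
    static String replaceText(String xml, String xpath, String replacement) {
        try {
            // A non-validating DOM parse keeps whitespace-only text nodes.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // The XPath expression arrives as a plain String.
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            for (int i = 0; i < hits.getLength(); i++) {
                hits.item(i).setTextContent(replacement);
            }

            // Identity transform; no indentation, no generated XML declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "no");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML rewrite failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc>\n  <foo bar=\"red\">   <bar>red</bar> </foo>\n</doc>";
        System.out.println(replaceText(xml, "//bar", "blue"));
    }
}
```

Whether this is close enough to "preserve the existing formatting" depends on the input, so it is worth diffing the output against the original file before relying on it.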


2 Answers


I was going to code up something quick to recall scala.xml and how much I dislike it; I haven't used it since I first learned some Scala.

You normally see text nodes of white space -- this is mentioned in PiS, in the "catalog" example here.

I did remember that it reverses attributes on load -- I vaguely remembered having to fix pretty printing.

But the compiler doesn't reverse attributes on xml literals. So given that you want to supply an xpath dynamically, you could use the compiler toolbox to compile the source document as a literal and also compile the xpath string, with / operators converted to \.

That's just a little out-of-the-box fun, but maybe it has a sweet spot of applicability, perhaps if you must use only the standard Scala distro.

I'll update later when I get a chance to try it out.

import scala.xml._
import java.io.File

object Test extends App {
  val src =
"""|<doc>
   |  <foo bar="red" baz="yellow"> <bar> red </bar> </foo>
   |  <baz><bar>red</bar></baz>
   |</doc>""".stripMargin

  val red = "(.*)red(.*)".r
  val sub = "blue"

val tmp =
<doc>
   <foo bar="red" baz="yellow"> <bar> red </bar> </foo>
   <baz><bar>red</bar></baz>
</doc>

  Console println tmp

  // replace "red" with "blue" in all bar text

  val root = XML loadString src
  Console println root
  val bars = root \\ "bar"
  val barbars =
    bars map (_ match {
      case <bar>{Text(red(prefix, suffix))}</bar> =>
           <bar>{Text(s"$prefix$sub$suffix")}</bar>
      case b => b
    })
  val m = (bars zip barbars).toMap
  val sb = serialize(root, m)
  Console println sb

  def serialize(x: Node, m: Map[Node, Node], sb: StringBuilder = new StringBuilder) = {
    def serialize0(x: Node): Unit = x match {
      case e0: Elem =>
        val e = if (m contains e0) m(e0) else e0
        sb append "<"
        e nameToString sb
        if (e.attributes ne null) e.attributes buildString sb
        if (e.child.isEmpty) sb append "/>"
        else {
          sb append ">"
          for (c <- e.child) serialize0(c)
          sb append "</"
          e nameToString sb
          sb append ">"
        }
      case Text(t) => sb append t
    }
    serialize0(x)
    sb
  }
}
answered 2013-08-14T17:53:57.290

You can try scala.xml.pull or Scales XML.

You can find working code for parsing a file here.

Scales XML can use the StAX API, which is a streaming API. So there is never a complete DOM, and you can usually reach the relevant parts of the XML without much preprocessing.
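To illustrate the streaming idea without Scales XML, here is a rough Java sketch that uses the JDK's own StAX event API (`javax.xml.stream`). It copies every event through unchanged and only rewrites character data. The class `StaxReplace` and the method `replaceInText` are hypothetical names. Note the assumptions: coalescing is switched on so that a text run arrives as a single event, which also turns CDATA sections into plain text, and the writer may still normalize small things such as the XML declaration or how empty elements are written. It also matches on text content only, so it does not cover the XPath requirement.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxReplace {

    /** Hypothetical helper: replaces target inside text nodes only, streaming event by event. */
    static String replaceInText(String xml, String target, String replacement) {
        try {
            XMLInputFactory inFactory = XMLInputFactory.newInstance();
            // Assumption: merge adjacent character events so a match is not split in two.
            inFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLEventReader reader = inFactory.createXMLEventReader(new StringReader(xml));

            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            XMLEventFactory events = XMLEventFactory.newInstance();

            while (reader.hasNext()) {
                XMLEvent e = reader.nextEvent();
                if (e.isCharacters()) {
                    String text = e.asCharacters().getData();
                    if (text.contains(target)) {
                        // Swap in a new character event; everything else passes through.
                        e = events.createCharacters(text.replace(target, replacement));
                    }
                }
                writer.add(e);
            }
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException ex) {
            throw new IllegalStateException("StAX rewrite failed", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc>\n  <foo bar=\"red\"><!-- keep me -->\n    <bar>red</bar>\n  </foo>\n</doc>";
        System.out.println(replaceInText(xml, "red", "blue"));
    }
}
```

Because attributes travel inside the start-element event, the `bar="red"` attribute is left alone while the text inside `<bar>` changes.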

Test it with your specially formatted XML file to see whether it works.

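I would not recommend a plain text search and replace on XML. The chance of a false match is high, and then you change the document in unpredictable ways. The resulting errors are often hard to find.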
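A tiny Java illustration of that risk (the class and method names are made up for the example): the intent is to change only the text inside `<bar>`, but a plain string replace also rewrites the attribute value and the comment.

```java
public class NaiveReplace {

    /** Hypothetical helper: plain string replacement with no knowledge of XML structure. */
    static String naive(String xml, String target, String replacement) {
        return xml.replace(target, replacement);
    }

    public static void main(String[] args) {
        String xml = "<foo bar=\"red\"><bar>red</bar><!-- red --></foo>";
        // Prints: <foo bar="blue"><bar>blue</bar><!-- blue --></foo>
        System.out.println(naive(xml, "red", "blue"));
    }
}
```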

I did a short experiment with Scales XML, and it looks promising:

    scala> import scales.utils._
    import scales.utils._
    scala> import ScalesUtils._
    import ScalesUtils._
    scala> import scales.xml._
    import scales.xml._
    scala> import ScalesXml._
    import ScalesXml._
    scala> import scales.xml.serializers.StreamSerializer
    import scales.xml.serializers.StreamSerializer
    scala> import java.io.StringReader
    import java.io.StringReader
    scala> import java.io.PrintWriter
    import java.io.PrintWriter

    scala> def xmlsrc=new StringReader("""
         | <a attr1="value1"> <b/>This
         | is some tex<xt/>
         |   <!-- A comment -->
         |   <c><d>
         |   </d>
         |   <removeme/>
         |   <changeme/>
         | </c>
         | </a>
         | """)
    xmlsrc: java.io.StringReader

    scala> def pull=pullXml(xmlsrc)
    pull: scales.xml.XmlPull with java.io.Closeable with scales.utils.IsClosed

    scala> writeTo(pull, new PrintWriter(System.out))
    <?xml version="1.0" encoding="UTF-8"?><a attr1="value1"> <b/>This
    is some tex<xt/>
      <!-- A comment -->
      <c><d>
      </d>
      <removeme/>
      <changeme/>
    </c>
    res0: Option[Throwable] = None

    scala> def filtered=pull flatMap {
         |   case Left(e : Elem) if e.name.local == "removeme" => Nil
         |   case Right(e : EndElem) if e.name.local == "removeme" => Nil
         |   case Left(e : Elem) if e.name.local == "changeme" => List(Left(Elem("x")), Left(Elem("y"
     Right(EndElem("x")))
         |   case Right(e : EndElem) if e.name.local == "changeme" => List(Right(EndElem("x")))
         |   case otherwise => List(otherwise)
         | }
    filtered: Iterator[scales.xml.PullType]

    scala> writeTo(filtered, new PrintWriter(System.out))
    <?xml version="1.0" encoding="UTF-8"?><a attr1="value1"> <b/>This
    is some tex<xt/>
      <!-- A comment -->
      <c><d>
      </d>

      <x><y/></x>
    </c>
    res1: Option[Throwable] = None

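The example first initializes the XML token stream and prints it unmodified. You can see that comments and formatting are preserved. It then modifies the token stream using the monadic Scala API and prints the result. You can see that most of the formatting is preserved; only the formatting of the changed parts differs.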

So it looks like Scales XML can solve your problem directly.

answered 2013-08-14T08:59:19.350