java - How to retrieve an element's mixed content as text (JDOM)

Question

I have XML like this:

<documentation>
    This value must be <i>bigger</i> than the other.
</documentation>

Using JDOM, I can get the following text representations:

Document d = new SAXBuilder().build( new StringReader( s ) );
System.out.printf( "getText:          '%s'%n", d.getRootElement().getText() );
System.out.printf( "getTextNormalize: '%s'%n", d.getRootElement().getTextNormalize() );
System.out.printf( "getTextTrim:      '%s'%n", d.getRootElement().getTextTrim() );
System.out.printf( "getValue:         '%s'%n", d.getRootElement().getValue() );

This gives me the following output:

getText:          '
    This value must be  than the other.
'
getTextNormalize: 'This value must be than the other.'
getTextTrim:      'This value must be  than the other.'
getValue:         '
    This value must be bigger than the other.
'

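What I really want is the element's content as a string, i.e. "This value must be <i>bigger</i> than the other." getValue() comes close, but it drops the <i> tags. I suppose I want something like innerHTML, but for XML documents.

Should I just run an XMLOutputter over the content? Or is there a better option?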

score 0 · Accepted Answer

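In JDOM (a sketch assuming JDOM 2, where getContent() returns List<Content>):

for (Content c : d.getRootElement().getContent()) {
    if (c instanceof Element) {
        // child element: re-wrap its direct text in its own tag name
        Element e = (Element) c;
        System.out.printf("<%1$s>%2$s</%1$s>", e.getName(), e.getText());
    } else if (c instanceof Text) {
        // plain text (or CDATA) node
        System.out.print(((Text) c).getText());
    }
}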

However, as Prashant Bhate wrote, content.getText() returns only the immediate text, so it is only useful for leaf elements with text content.
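
To get around that limitation, the children can be serialized as whole nodes rather than rebuilt from getName() and getText(). The sketch below shows the idea using only the JDK's built-in DOM and Transformer classes instead of JDOM, so it runs without extra jars; the names InnerXml, innerXml and parseRoot are made up for illustration. In JDOM itself, I believe XMLOutputter.outputElementContentString(Element) serves the same purpose, but check the Javadoc for your JDOM version.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; uses the JDK DOM API, not JDOM.
class InnerXml {

    /** Serializes every child node of the element, but not the element's own tags. */
    static String innerXml(Element element) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        // Each child (text or element) is written in document order,
        // including any nested elements and attributes it carries.
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            t.transform(new DOMSource(n), new StreamResult(out));
        }
        return out.toString();
    }

    /** Parses an XML string and returns its root element. */
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<documentation>\n"
                + "    This value must be <i>bigger</i> than the other.\n"
                + "</documentation>";
        System.out.println(innerXml(parseRoot(xml)).trim());
    }
}
```

Unlike rebuilding tags from the element name, this keeps attributes and nested elements of the children, and special characters in text nodes are escaped by the serializer.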

score -1 · Accepted Answer

Jericho HTML Parser is well suited to this kind of task. You can do what you want with a snippet like this:

String snippet = new Source(html).getFirstElement().getContent().toString();

It is also well suited to handling HTML in general, because it does not try to force it into being XML ... it treats it far more leniently.

score -2 · Accepted Answer

I would say you should change your document to

<documentation>
  <![CDATA[This value must be <i>bigger</i> than the other.]]>
</documentation>

in order to comply with the XML specification. Otherwise <i> will be treated as a child element of <documentation> rather than as content.
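
For what it's worth, the original mixed content is already well-formed XML; wrapping it in CDATA only changes how the parser reports it. A minimal sketch of that difference, again using the JDK's DOM API rather than JDOM (the names CdataText, rootText and countElements are hypothetical): with CDATA the markup comes back as plain text and no <i> element exists in the tree. In JDOM, getText() should behave analogously, since it includes CDATA sections.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class; uses the JDK DOM API, not JDOM.
class CdataText {

    private static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Returns the trimmed text content of the root element (CDATA included). */
    static String rootText(String xml) throws Exception {
        return parseRoot(xml).getTextContent().trim();
    }

    /** Counts descendant elements of the root with the given tag name. */
    static int countElements(String xml, String tagName) throws Exception {
        return parseRoot(xml).getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<documentation>\n"
                + "  <![CDATA[This value must be <i>bigger</i> than the other.]]>\n"
                + "</documentation>";
        System.out.println(rootText(xml));           // markup returned as literal text
        System.out.println(countElements(xml, "i")); // no real <i> element in the tree
    }
}
```

The trade-off is that consumers of the document can no longer treat <i> as structure; they get a string that they would have to parse again.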
