
I must be missing something obvious here.

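I'm feeding XML-wrapped HTML data into Atlassian Confluence. For <td> tags, I need to add a <span> tag. But no matter what I try, lxml converts my < and > to &lt; and &gt;. However, the conversion only happens to my new tags; any existing tags inside the cell are left untouched!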

Look at this Python code:

for x in doc.iter():
    if x.tag == "td":
        print x.text
        x.text = "no tags"
        print etree.dump(x)
        x.text = "<span>one tag</span>"
        print etree.dump(x)

For this input:

<tr>
  <td>apa</td>
  <td>1.2</td>
  <td>
    <a href="http://korv.com/apa.tar.gz">3.4</a>
  </td>
  <td>no</td>
</tr>
<tr>
  <td>coreutils</td>
  <td>6.12</td>
  <td>
    <a href="http://ftp.gnu.org/gnu/coreutils/coreutils-8.21.tar.xz">8.21</a>
  </td>
  <td>no</td>
</tr>

This is the output:

<td>no tags</td>None
<td>&lt;span&gt;one tag&lt;/span&gt;</td>None
1.2
<td>no tags</td>None
<td>&lt;span&gt;one tag&lt;/span&gt;</td>None
None
<td>no tags<a href="http://korv.com/apa.tar.gz">3.4</a></td>None
<td>&lt;span&gt;one tag&lt;/span&gt;<a href="http://korv.com/apa.tar.gz">3.4</a></td>None
no
<td>no tags</td>None
<td>&lt;span&gt;one tag&lt;/span&gt;</td>None
coreutils
<td>no tags</td>None
<td>&lt;span&gt;one tag&lt;/span&gt;</td>None
6.12
<td>no tags</td>None
<td>&lt;span&gt;one tag&lt;/span&gt;</td>None
None
<td>no tags<a href="http://ftp.gnu.org/gnu/coreutils/coreutils-8.21.tar.xz">8.21</a></td>None
<td>&lt;span&gt;one tag&lt;/span&gt;<a href="http://ftp.gnu.org/gnu/coreutils/coreutils-8.21.tar.xz">8.21</a></td>None
no
<td>no tags</td>None
<td>&lt;span&gt;one tag&lt;/span&gt;</td>None

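As you can see, the <a> tag inside the cell is left untouched, while my <span> gets converted. I can't make sense of this.

Why is my text converted, while the existing content stays unchanged?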


2 Answers


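You are inserting text into an XML element, and text is always escaped so that the document stays valid XML. The existing <a> is different: it is a child element in the tree, not text, so it is serialized as markup.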
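A minimal, self-contained sketch (Python 3, lxml) of what happens in the question. The cell below is a simplified copy of the third <td> from the input, parsed directly with etree.fromstring instead of coming from the asker's `doc`:

```python
from lxml import etree

# A cell that already contains a child element, like the third <td> in the question.
td = etree.fromstring('<td><a href="http://korv.com/apa.tar.gz">3.4</a></td>')

# Assigning a string to .text stores it as character data; the reserved
# characters are escaped when the tree is serialized.
td.text = "<span>one tag</span>"

print(etree.tostring(td).decode())
# <td>&lt;span&gt;one tag&lt;/span&gt;<a href="http://korv.com/apa.tar.gz">3.4</a></td>

# The <a> was never text: it is a real child element, so it stays markup.
print(td[0].tag)  # a
```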

If you want to add a new tag, create a new element; the etree.SubElement() factory is the easiest way:

from lxml import etree

# td is the <td> element to add the <span> to
etree.SubElement(td, 'span').text = 'one tag'
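A runnable version of that snippet, assuming a single cell parsed from the question's input rather than the asker's `doc`. Note that SubElement appends the new element after any existing text and children:

```python
from lxml import etree

td = etree.fromstring("<td>apa</td>")

# SubElement creates a real <span> element as the last child of td.
span = etree.SubElement(td, "span")
span.text = "one tag"

print(etree.tostring(td).decode())
# <td>apa<span>one tag</span></td>
```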

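If you want to wrap the contents of the td instead, move all child elements over (plus the .text attribute):

def wrap(parent, tagname, **kw):
    """Move parent's text and children into a new child element <tagname>."""
    sub = etree.SubElement(parent, tagname, **kw)
    # the leading text of parent becomes the leading text of the wrapper
    parent.text, sub.text = None, parent.text
    # move each original child (its .tail moves with it) into the wrapper;
    # list(parent) takes a snapshot so the loop is not disturbed by the moves
    for child in list(parent):
        if child is not sub:
            sub.append(child)
    return parent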

wrap(td, 'span')

Demo:

>>> etree.tostring(doc.findall('.//td')[2])
'<td>\n    <a href="http://korv.com/apa.tar.gz">3.4</a>\n  </td>\n  '
>>> etree.tostring(wrap(doc.findall('.//td')[2], 'span'))
'<td><span>\n    <a href="http://korv.com/apa.tar.gz">3.4</a>\n  </span></td>\n  '
answered 2013-07-08T10:00:59.790

When you write

x.text = "<span>one tag</span>"

you are setting the node's text content to that literal string. Since < and > are reserved characters in XML, they need to be escaped.
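The escaping is lossless: a small sketch (Python 3, lxml, with a freshly created <td> standing in for the asker's cell) showing that parsing the serialized output gives back exactly the string that was assigned, still as text and without any child elements:

```python
from lxml import etree

td = etree.Element("td")
td.text = "<span>one tag</span>"

serialized = etree.tostring(td)
print(serialized)  # b'<td>&lt;span&gt;one tag&lt;/span&gt;</td>'

# Parsing the escaped form restores the original string as text.
restored = etree.fromstring(serialized)
print(restored.text)  # <span>one tag</span>
print(len(restored))  # 0, no <span> element was ever created
```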

It looks like you are trying to create new <span> nodes; to do that, you have to create them as elements (for example with etree.SubElement()), not assign them as text.
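If the markup already exists as a string, one option (a sketch, not the only approach) is to parse it into an element with etree.fromstring and attach that. This assumes the fragment has a single root element; the cell here is a simplified copy of one from the question's input:

```python
from lxml import etree

td = etree.fromstring("<td>no</td>")

# Drop the old text, then parse the markup string into a real element
# and attach it as a child, instead of assigning it to .text.
td.text = None
td.append(etree.fromstring("<span>one tag</span>"))

print(etree.tostring(td).decode())
# <td><span>one tag</span></td>
```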

answered 2013-07-08T10:01:09.787