python - How to read a string containing tags from an XML file in Python 3

Question

What I have: lines in an XML file that contain <xliff:g> tags, for example:

<string name="AAAAAAA" msgid="XXXXXXX">"Activity <xliff:g id="BBBBBBB">%1$s</xliff:g> isn\'t responding."\n\n"Do you want to close it?"</string>

What I need: to read the whole string so that it comes out like this:

Activity %1$s isn't responding.\n\nDo you want to close it?

Can you help?

I tried using xml.dom.minidom:

dom = xml.dom.minidom.parse(xmlfile)
strings = dom.getElementsByTagName('string')
for string in strings:
    rText = string.childNodes[0].nodeValue
    print(rText)

The result is only: "Activity

score 0 · Accepted Answer

You can use an XML parser such as BeautifulSoup, which is very easy to use (in my opinion):

>>> myxml = "thexmlyouposted"
>>> from bs4 import BeautifulSoup as BS
>>> soup = BS(myxml, 'xml')
>>> print(soup.find('string').text)
"Activity %1$s isn't responding."

"Do you want to close it?"

score 0 · Accepted Answer

I will assume the element is part of a larger file. For example:

<strings xmlns:xliff="some-name-space">
  <string name="AAAAAAA" msgid="XXXXXXX">"Activity <xliff:g id="BBBBBBB">%1$s</xliff:g> isn\'t responding."\n\n"Do you want to close it?"</string>
  <string name="AAAAAAA" msgid="XXXXXXX">"Another <xliff:g id="BBBBBBB">%1$s</xliff:g>message</string>
</strings>

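minidom works as well as any other framework here. Open the file and iterate over all the string elements, calling the function get_text on each one. get_text, defined below, recursively collects the content (nodeValue) of every text node inside the element, including text nested in child elements.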

import xml.dom.minidom as md

def get_text(el):
    """Return the text of el, recursing into child elements.

    Text nodes contribute their nodeValue; element nodes are visited
    recursively so that nested tags such as <xliff:g> are included."""
    msg = ''
    for n in el.childNodes:
        if n.nodeType == n.TEXT_NODE:
            msg += n.nodeValue
        elif n.nodeType == n.ELEMENT_NODE:
            msg += get_text(n)
    return msg

# get_text must be defined before this loop runs.
dom = md.parse('wu.xml')
strings = dom.getElementsByTagName('string')
for string in strings:
    print(get_text(string))
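
To see why reading only childNodes[0] stops at "Activity, it may help to list the direct children of the string element. The following is a minimal sketch that parses the sample from the question with parseString instead of a file; the SAMPLE variable and the placeholder namespace URI are assumptions made for illustration.

```python
import xml.dom.minidom as md

# Sample from the question, wrapped in a root element that declares the
# xliff prefix. "some-name-space" is a placeholder URI, not the real one.
SAMPLE = r'''<strings xmlns:xliff="some-name-space">
  <string name="AAAAAAA" msgid="XXXXXXX">"Activity <xliff:g id="BBBBBBB">%1$s</xliff:g> isn\'t responding."\n\n"Do you want to close it?"</string>
</strings>'''

dom = md.parseString(SAMPLE)
string_el = dom.getElementsByTagName('string')[0]

# Collect (nodeName, nodeValue) for each direct child of <string>.
# Text nodes are named '#text'; element nodes have nodeValue None.
children = [(n.nodeName, n.nodeValue) for n in string_el.childNodes]
for name, value in children:
    print(name, repr(value))
```

The element has three children: a text node, the xliff:g element, and another text node. childNodes[0] is only the first text node, which is why the rest of the message is lost unless the children are walked recursively.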

There are many other ways to do this.
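
One such alternative, sketched here under a few assumptions, uses the standard library's xml.etree.ElementTree, whose itertext() method yields the text of an element and all of its descendants in document order. The clean_text helper is hypothetical and not part of any library: it assumes that the double quotes should simply be dropped and that \' should become ', which matches the output requested in the question but may not cover every escaping rule in such resource files.

```python
import xml.etree.ElementTree as ET

# Sample from the question; "some-name-space" is a placeholder URI.
SAMPLE = r'''<strings xmlns:xliff="some-name-space">
  <string name="AAAAAAA" msgid="XXXXXXX">"Activity <xliff:g id="BBBBBBB">%1$s</xliff:g> isn\'t responding."\n\n"Do you want to close it?"</string>
</strings>'''

def clean_text(text):
    """Hypothetical helper: drop double quotes and unescape \\' to '.

    The literal \\n sequences are left untouched, as in the question's
    desired output."""
    return text.replace('"', '').replace("\\'", "'")

root = ET.fromstring(SAMPLE)

results = []
for el in root.iter('string'):
    # itertext() walks the element and its descendants, so the text
    # inside <xliff:g> is included in the joined string.
    raw = ''.join(el.itertext())
    results.append(clean_text(raw))

for line in results:
    print(line)
```

To read from a file instead of a string, ET.parse(path).getroot() can replace ET.fromstring(SAMPLE).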
