
I am trying to merge multiple XML files using Python, without any external libraries. The XML files have nested elements.

Sample file 1:

<root>
  <element1>textA</element1>
  <elements>
    <nested1>text now</nested1>
  </elements>
</root>

Sample file 2:

<root>
  <element2>textB</element2>
  <elements>
    <nested1>text after</nested1>
    <nested2>new text</nested2>
  </elements>
</root>

What I want:

<root>
  <element1>textA</element1>    
  <element2>textB</element2>  
  <elements>
    <nested1>text after</nested1>
    <nested2>new text</nested2>
  </elements>  
</root>  

What I have tried:

The code from this answer:

from xml.etree import ElementTree as et
def combine_xml(files):
    first = None
    for filename in files:
        data = et.parse(filename).getroot()
        if first is None:
            first = data
        else:
            first.extend(data)
    if first is not None:
        return et.tostring(first)

What I get:

<root>
  <element1>textA</element1>
  <elements>
    <nested1>text now</nested1>
  </elements>
  <element2>textB</element2>
  <elements>
    <nested1>text after</nested1>
    <nested2>new text</nested2>
  </elements>
</root>

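I hope you can see and understand my problem. I am looking for a proper solution, and any guidance would be great.

To clarify the problem: with my current solution, the nested elements are not merged. The second file's <elements> block is appended next to the first one instead of being combined with it.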
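
The behavior described above can be reproduced without any files on disk. The following is a minimal sketch, assuming the two sample documents are held in the strings `FILE_1` and `FILE_2` (names chosen here for illustration only). It shows that `extend()` appends every child of the second root without comparing tags, which is why `elements` appears twice.

```python
from xml.etree import ElementTree as ET

# The two sample documents from the question, as in-memory strings (illustrative names)
FILE_1 = (
    "<root><element1>textA</element1>"
    "<elements><nested1>text now</nested1></elements></root>"
)
FILE_2 = (
    "<root><element2>textB</element2>"
    "<elements><nested1>text after</nested1><nested2>new text</nested2></elements></root>"
)

first = ET.fromstring(FILE_1)
second = ET.fromstring(FILE_2)

# extend() appends each child of `second` to `first`; tags are never compared
first.extend(second)

child_tags = [child.tag for child in first]
print(child_tags)  # ['element1', 'elements', 'element2', 'elements']
```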


3 Answers


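The code you posted combines all the elements, regardless of whether an element with the same tag already exists. So you need to iterate over the elements and manually check and combine them in whatever way you see fit, because this is not a standard way of handling XML files. I can't explain it better than code, so here it is, more or less commented: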

from xml.etree import ElementTree as et

class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # combine each element with the first one, and update that
            self.combine_element(self.roots[0], r)
        # return the string representation
        return et.tostring(self.roots[0])

    def combine_element(self, one, other):
        """
        This function recursively updates either the text or the children
        of an element if another element is found in `one`, or adds it
        from `other` if not found.
        """
        # Create a mapping from tag name to element, as that's what we are filtering with
        mapping = {el.tag: el for el in one}
        for el in other:
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[el.tag].text = el.text
                except KeyError:
                    # An element with this name is not in the mapping
                    mapping[el.tag] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[el.tag], el)
                except KeyError:
                    # Not in the mapping
                    mapping[el.tag] = el
                    # Just add it
                    one.append(el)

if __name__ == '__main__':
    r = XMLCombiner(('sample1.xml', 'sample2.xml')).combine()
    print('-' * 20)
    # et.tostring() returns bytes on Python 3, so decode before printing
    print(r.decode())
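
The same recursion can be tried without files on disk. This is a minimal sketch, not the class above: `combine_element` is rewritten as a plain function with the same tag-based matching, and `combine_strings` is a hypothetical helper that parses XML strings instead of filenames.

```python
from xml.etree import ElementTree as ET


def combine_element(one, other):
    """Merge the children of `other` into `one` in place, matching by tag."""
    mapping = {el.tag: el for el in one}
    for el in other:
        target = mapping.get(el.tag)
        if target is None:
            # No child with this tag yet: adopt the element as-is
            mapping[el.tag] = el
            one.append(el)
        elif len(el) == 0:
            # Leaf element: the text from the later document wins
            target.text = el.text
        else:
            # Nested element: merge its children recursively
            combine_element(target, el)


def combine_strings(documents):
    """Hypothetical helper: parse each XML string and merge it into the first."""
    roots = [ET.fromstring(doc) for doc in documents]
    for root in roots[1:]:
        combine_element(roots[0], root)
    return roots[0]


merged = combine_strings([
    "<root><element1>textA</element1>"
    "<elements><nested1>text now</nested1></elements></root>",
    "<root><element2>textB</element2>"
    "<elements><nested1>text after</nested1><nested2>new text</nested2></elements></root>",
])
print(ET.tostring(merged, encoding="unicode"))
```

Note that new elements are appended at the end of their parent, so in this run `element2` ends up after the merged `elements` block rather than before it as in the desired output. The content is the same; only the sibling order differs.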
answered 2013-02-14T16:23:46.240

Thanks, but my problem was to merge while also taking attributes into account. Here is the code after my patch:

import sys
from xml.etree import ElementTree as et


class hashabledict(dict):
    """A dict that can be used as part of a dictionary key."""
    def __hash__(self):
        return hash(tuple(sorted(self.items())))


class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # combine each element with the first one, and update that
            self.combine_element(self.roots[0], r)
        # return the merged tree
        return et.ElementTree(self.roots[0])

    def combine_element(self, one, other):
        """
        This function recursively updates either the text or the children
        of an element if another element is found in `one`, or adds it
        from `other` if not found.
        """
        # Create a mapping from (tag, attributes) to element, as that's what we are filtering with
        mapping = {(el.tag, hashabledict(el.attrib)): el for el in one}
        for el in other:
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[(el.tag, hashabledict(el.attrib))].text = el.text
                except KeyError:
                    # An element with this tag and these attributes is not in the mapping
                    mapping[(el.tag, hashabledict(el.attrib))] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[(el.tag, hashabledict(el.attrib))], el)
                except KeyError:
                    # Not in the mapping
                    mapping[(el.tag, hashabledict(el.attrib))] = el
                    # Just add it
                    one.append(el)


if __name__ == '__main__':
    # All arguments but the last are input files; the last one is the output file
    r = XMLCombiner(sys.argv[1:-1]).combine()
    print('-' * 20)
    # et.tostring() returns bytes on Python 3, so decode before printing
    print(et.tostring(r.getroot()).decode())
    r.write(sys.argv[-1], encoding="iso-8859-1", xml_declaration=True)
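
To illustrate what matching on tag plus attributes changes, here is a small sketch on in-memory strings. It is an assumption-laden variant rather than the code above: the `element_key` helper is hypothetical, and it uses a `frozenset` of the attribute items in place of `hashabledict`. The `item` documents are made up for the example.

```python
from xml.etree import ElementTree as ET


def element_key(el):
    """Hashable identity used for matching: the tag plus the full attribute set."""
    return (el.tag, frozenset(el.attrib.items()))


def combine_element(one, other):
    """Merge `other` into `one` in place, matching children by tag and attributes."""
    mapping = {element_key(el): el for el in one}
    for el in other:
        key = element_key(el)
        target = mapping.get(key)
        if target is None:
            # Same tag but different attributes counts as a different element
            mapping[key] = el
            one.append(el)
        elif len(el) == 0:
            target.text = el.text
        else:
            combine_element(target, el)


first = ET.fromstring('<root><item id="1">old</item><item id="2">keep</item></root>')
second = ET.fromstring('<root><item id="1">new</item><item id="3">added</item></root>')
combine_element(first, second)

result = [(el.get("id"), el.text) for el in first]
print(result)  # [('1', 'new'), ('2', 'keep'), ('3', 'added')]
```

With tag-only matching, all three `item` elements from the second document would have collapsed onto a single entry in the mapping.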
answered 2015-04-27T13:13:01.320

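Extending @jadkik94's answer to create a utility method that does not modify its arguments and also updates attributes:

Note that this code only works as written in Py2, because the Element class in Py3 does not support the copy() method. On Py3, copy.copy(element) from the standard library can be used in its place.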

from xml.etree.ElementTree import Element


def combine_xmltree_element(element_1, element_2):
    """
    Recursively combines the given two xmltree elements. Common properties will be overridden by values of those
    properties in element_2.
    
    :param element_1: A xml Element
    :type element_1: L{Element}
    
    :param element_2: A xml Element
    :type element_2: L{Element}
    
    :return: A xml element with properties combined.
    """

    if element_1 is None:
        return element_2.copy()

    if element_2 is None:
        return element_1.copy()

    if element_1.tag != element_2.tag:
        raise TypeError(
            "The two XMLtree elements of type {t1} and {t2} cannot be combined".format(
                t1=element_1.tag,
                t2=element_2.tag
            )
        )

    # Positional arguments work with both the pure-Python and the C implementation of Element
    combined_element = Element(element_1.tag, element_1.attrib)
    combined_element.attrib.update(element_2.attrib)

    # Create a mapping from tag name to child element
    element_1_child_mapping = {child.tag: child for child in element_1}
    element_2_child_mapping = {child.tag: child for child in element_2}

    for child in element_1:
        if child.tag not in element_2_child_mapping:
            combined_element.append(child.copy())

    for child in element_2:
        if child.tag not in element_1_child_mapping:
            combined_element.append(child.copy())

        else:
            if len(child) == 0:  # Leaf element
                combined_child = element_1_child_mapping[child.tag].copy()
                combined_child.text = child.text
                combined_child.attrib.update(child.attrib)

            else:
                # Recursively process the element, and update it in the same way
                combined_child = combine_xmltree_element(element_1_child_mapping[child.tag], child)

            combined_element.append(combined_child)

    return combined_element
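
For Python 3, the same idea can be sketched with `copy.deepcopy()` in place of `Element.copy()`. This is an illustrative variant under that assumption, not the code above; the name `combine_elements` is hypothetical, and the inputs are the two sample documents from the question.

```python
import copy
from xml.etree import ElementTree as ET


def combine_elements(element_1, element_2):
    """Return a new element merging element_2 into element_1; inputs are left untouched."""
    if element_1 is None:
        return copy.deepcopy(element_2)
    if element_2 is None:
        return copy.deepcopy(element_1)
    if element_1.tag != element_2.tag:
        raise TypeError(
            "Cannot combine elements {t1} and {t2}".format(t1=element_1.tag, t2=element_2.tag)
        )

    # Start from element_1's attributes, then let element_2 override them
    combined = ET.Element(element_1.tag, dict(element_1.attrib))
    combined.attrib.update(element_2.attrib)

    children_1 = {child.tag: child for child in element_1}
    children_2 = {child.tag: child for child in element_2}

    # Children that exist only in element_1 are copied over unchanged
    for child in element_1:
        if child.tag not in children_2:
            combined.append(copy.deepcopy(child))

    for child in element_2:
        match = children_1.get(child.tag)
        if match is None:
            combined.append(copy.deepcopy(child))
        elif len(child) == 0:
            # Leaf element: copy the original, then override text and attributes
            merged_child = copy.deepcopy(match)
            merged_child.text = child.text
            merged_child.attrib.update(child.attrib)
            combined.append(merged_child)
        else:
            combined.append(combine_elements(match, child))

    return combined


a = ET.fromstring(
    "<root><element1>textA</element1>"
    "<elements><nested1>text now</nested1></elements></root>"
)
b = ET.fromstring(
    "<root><element2>textB</element2>"
    "<elements><nested1>text after</nested1><nested2>new text</nested2></elements></root>"
)
merged = combine_elements(a, b)
print(ET.tostring(merged, encoding="unicode"))
```

Because every element placed in the result is a deep copy or a freshly built element, `a` and `b` can still be reused afterwards; the cost is the extra copying on large trees.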
 
answered 2020-08-28T14:11:49.130