
I want to compare these two XML files:

File1.xml:

<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
      <type st="9999" />
  </gastro_prelim_st>
 </results>
</ngs_sample>

File2.xml:

<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
   </gastro_prelim_st>
 </results>
</ngs_sample>

I used xmldiff to compare a.xml with b.xml:

from xmldiff import main, formatting

def compare_xmls(observed, expected):
    # DiffFormatter renders the edit script as plain text.
    formatter = formatting.DiffFormatter()
    diff = main.diff_files(observed, expected, formatter=formatter)
    return diff

# The file names must be passed as strings.
out = compare_xmls("a.xml", "b.xml")
print(out)

Output:

[delete, /ngs_sample/results/gastro_prelim_st/type[2]]

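Does anyone know how to identify the difference between the two XML files, i.e. what has been deleted compared with b.xml? Can anyone recommend another way to compare XML files in Python?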


3 Answers


Use xmldiff to perform this exact task.

Main file:

from xmldiff import main
diff = main.diff_files("file1.xml", "file2.xml")
print(diff)

Output:

[DeleteNode(node='/ngs_sample/results/gastro_prelim_st/type[2]')]
answered 2018-11-22T14:09:22.293

You can switch to XMLFormatter and filter the results manually:

...
# Change formatter:
formatter = formatting.XMLFormatter(normalize=formatting.WS_BOTH)

...

# after `out` has been retrieved:
import re
for i in out.splitlines():
  if re.search(r'\bdiff:\w+', i):
    print(i)

# Result:
#       <type st="9999" diff:delete=""/>
answered 2018-11-22T18:01:24.293

Another option is to use xml2 (https://github.com/clone/xml2), together with something like bash process substitution:

$ diff --color <(xml2 < File1.xml) <(xml2 < File2.xml)

7,8d6
< /ngs_sample/results/gastro_prelim_st/type
< /ngs_sample/results/gastro_prelim_st/type/@st=9999
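
If installing xml2 is not an option, the same flatten-then-diff idea can be roughly sketched in pure Python with the standard library. This is only an illustration, not a port of xml2: the helper names `flatten` and `flat_diff` are made up for this example, the line format only loosely mimics the xml2 output above, and the sketch assumes that document order matters and that whitespace-only text can be ignored.

```python
import difflib
import xml.etree.ElementTree as ET

FILE1 = """<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
      <type st="9999" />
  </gastro_prelim_st>
 </results>
</ngs_sample>"""

FILE2 = """<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
   </gastro_prelim_st>
 </results>
</ngs_sample>"""


def flatten(element, prefix=""):
    """Yield one line per element, attribute, and non-blank text node.

    Hypothetical helper: tail text is ignored, so mixed content is not
    fully represented.
    """
    path = f"{prefix}/{element.tag}"
    yield path
    for name, value in element.attrib.items():
        yield f"{path}/@{name}={value}"
    text = (element.text or "").strip()
    if text:
        yield f"{path}={text}"
    for child in element:
        yield from flatten(child, path)


def flat_diff(xml_a, xml_b):
    """Return only the added ('+') and removed ('-') flattened lines."""
    lines_a = list(flatten(ET.fromstring(xml_a)))
    lines_b = list(flatten(ET.fromstring(xml_b)))
    diff = difflib.unified_diff(lines_a, lines_b, lineterm="", n=0)
    return [
        line for line in diff
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))  # skip the file headers
    ]


changes = flat_diff(FILE1, FILE2)
print("\n".join(changes))
```

Lines starting with `-` exist only in the first document, and lines starting with `+` only in the second. To compare files on disk, `ET.parse(path).getroot()` could be passed to `flatten` instead of parsing strings.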
answered 2020-04-15T11:13:44.040