python - ElementTree handles similar files differently

Question

Here are two different files that my Python (2.6) script encounters. One parses and the other does not. I'm just curious why.

This XML file will not parse, and the script fails:

<Landfire_Feedback_Point_xlsform id="fbfm40v10" instanceID="uuid:9e062da6-b97b-4d40-b354-6eadf18a98ab" submissionDate="2013-04-30T23:03:32.881Z" isComplete="true" markedAsCompleteDate="2013-04-30T23:03:32.881Z" xmlns="http://opendatakit.org/submissions">
<date_test>2013-04-17</date_test>
<plot_number>10</plot_number>
<select_multiple_names>BillyBob</select_multiple_names>
<geopoint_plot>43.2452830500 -118.2149402900 210.3000030518 3.0000000000</geopoint_plot><fbfm40_new>GS2</fbfm40_new>
<select_grazing>NONE</select_grazing>
<image_close>1366230030355.jpg</image_close>
<plot_note>No road present.</plot_note>
<n0:meta xmlns:n0="http://openrosa.org/xforms">
<n0:instanceID>uuid:9e062da6-b97b-4d40-b354-6eadf18a98ab</n0:instanceID>
</n0:meta>
</Landfire_Feedback_Point_xlsform>

This XML file parses correctly, and the script succeeds:

<Landfire_Feedback_Point_xlsform id="fbfm40v10">
<date_test>2013-05-14</date_test>
<plot_number>010</plot_number>
<select_multiple_names>BillyBob</select_multiple_names>
<geopoint_plot>43.26630563 -118.39881809 351.70001220703125 5.0</geopoint_plot>
<fbfm40_new>GR1</fbfm40_new>
<select_grazing>HIGH</select_grazing>
<image_close>fbfm40v10_PLOT_010_ID_6.jpg</image_close>
<plot_note>Heavy grazing</plot_note>
<meta><instanceID>uuid:90e7d603-86c0-46fc-808f-ea0baabdc082</instanceID></meta>
</Landfire_Feedback_Point_xlsform>

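Here is a small Python script that demonstrates that one works and the other does not. I'm just looking for an explanation of why ElementTree treats one file as an XML file and not the other. Specifically, the one that doesn't parse seems to fail with "'NONE' type doesn't have a 'text' attribute" or something similar. But that would be because it doesn't seem to treat the file as XML, or because it can't see any elements beyond the opening line. Any explanation or guidance on this error would be greatly appreciated. Thanks in advance.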

Python script:

import os
from xml.etree import ElementTree


def replace_xml_attribute_in_file(original_file,element_name,attribute_value):
    #THIS FUNCTION ONLY WORKS ON XML FILES WITH UNIQUE ELEMENT NAMES
    #  -DUPLICATE ELEMENT NAMES WILL ONLY GET THE FIRST ELEMENT WITH A GIVEN NAME

    #split original filename and add tempfile name
    tempfilename="temp.xml"
    rootsplit = original_file.rsplit('\\')  #split the root directory on the backslash
    rootjoin = '\\'.join(rootsplit[:-1]) #rejoin the root directory parts with a backslash, minus the last one
    temp_file = os.path.join(rootjoin,tempfilename) 
    et = ElementTree.parse(original_file)
    author=et.find(element_name)
    author.text = attribute_value
    et.write(temp_file)
    if os.path.exists(temp_file) and os.path.exists(original_file): #if both the original and the temp files exist
        os.remove(original_file)                                    #erase the original
        os.rename(temp_file,original_file)                          #rename the new file
    else:
        print "Something went wrong."

replace_xml_attribute_in_file("testfile1.xml","image_close","whoopdeedoo.jpg");

score 0 · Accepted Answer

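Here is a small Python script that demonstrates that one works and the other does not. I'm just looking for an explanation of why ElementTree treats one file as an XML file and not the other.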

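Your code doesn't demonstrate that at all. It shows that ElementTree treats both of them as valid XML files full of nodes. Both parse just fine, both are read past the first line, and so on.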

The only problem is that the first one has no node named "image_close", so your code doesn't work.

You can see this pretty easily:

for node in et.getroot().getchildren():
    print node.tag

You'll get 9 children of the root for either version.

And the output should show you the problem. The node you want is actually named {http://opendatakit.org/submissions}image_close in the first example, rather than image_close as in the second.

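And, as you can probably guess, this is because of the xmlns="http://opendatakit.org/submissions" in the root node. ElementTree uses "James Clark notation" to map names in a namespace to universal names of the form {uri}local.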

Anyway, because no node is named image_close, et.find(element_name) returns None, so your code stores author=None, then tries to assign to author.text and gets an error.
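To make the failure concrete, here is a minimal sketch that reproduces it on a trimmed-down document. The `<form>` root and the inline string are made up for illustration; only the namespace URI and the `image_close` element come from the question. It is written to run on Python 3 and should behave the same on 2.6.

```python
from xml.etree import ElementTree

# Trimmed-down stand-in for the first file (hypothetical root name "form").
xml_text = ('<form xmlns="http://opendatakit.org/submissions">'
            '<image_close>1366230030355.jpg</image_close></form>')
root = ElementTree.fromstring(xml_text)

# The bare name does not match, because the element's real tag
# includes the namespace URI in braces.
bare = root.find("image_close")

# The fully qualified ("universal") name does match.
qualified = root.find("{http://opendatakit.org/submissions}image_close")

print(bare is None)
print(qualified.text)
```

Checking the result of find() against None before touching .text turns the confusing attribute error into a clear "element not found" condition.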

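As for how to fix it... well, you can learn how namespaces work by default in ElementTree, or you can upgrade to Python 2.7 or install a newer ElementTree for 2.6 that lets you customize things more easily. But if you want to do custom namespace handling and stick with your old version... I'd start with this article (and its two predecessors) and this one.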
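One possible workaround that needs no upgrade is to build the qualified name yourself. The sketch below is only an illustration, not the approach from the linked articles: `qualify` and `set_child_text` are hypothetical helpers, and it assumes the element you want lives in the same namespace as the root (true for image_close in the first file, but not for the n0:meta element, which uses a different namespace).

```python
from xml.etree import ElementTree


def qualify(root, local_name):
    """Prefix local_name with the root's namespace, if the root has one."""
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
        return "{%s}%s" % (namespace, local_name)
    return local_name


def set_child_text(xml_text, local_name, new_text):
    """Parse xml_text, set the text of the first matching child, return the root."""
    root = ElementTree.fromstring(xml_text)
    node = root.find(qualify(root, local_name))
    if node is None:
        # Fail with a clear message instead of an attribute error on None.
        raise ValueError("no element named %r" % local_name)
    node.text = new_text
    return root


# Trimmed-down stand-ins for the two files (hypothetical root name "form").
namespaced = ('<form xmlns="http://opendatakit.org/submissions">'
              '<image_close>old.jpg</image_close></form>')
plain = '<form><image_close>old.jpg</image_close></form>'

for doc in (namespaced, plain):
    updated = set_child_text(doc, "image_close", "whoopdeedoo.jpg")
    print("%s %s" % (updated[0].tag, updated[0].text))
```

The same qualify() call could be used in front of et.find(element_name) in the original function, so that both kinds of file go through one code path.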
