  <myroot>  <data txt="some0" txt1 = "some1" txt2 = "some2" >
                 <data2>
                        < bank = "SBI" bank2 = "SBI2" >
                 <data2>
                 <data3>
                        <branch = "bang1" branch = "bang2" >
                 <data3>
            </data>

            <data txt="some0" txt1 = "some1" txt2 = "some2" >
                 <data2>
                        < bank = "citi" bank2 = "citi2" >
                 <data2>
                 <data3>
                        <branch = "bang3" branch = "bang4" >
                 <data3>
            </data> </myroot>

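The data above is stored in a variable, not in an XML file, so I cannot parse it as one. Please help me convert the data to XML format (or an XML file) and parse it. Here is the script I am trying: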

stdout = "<myroot>%s</myroot>" % stdout
print'main data', stdout
tree = ElementTree.fromstring(stdout)
tree1 = ET.parse('tree')

In the first line of the script I wrap the data in a root tag. "main data" then holds the XML shown above. When I try to parse it, it raises an error.


2 Answers


It raises an error because your XML is not well-formed.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1301, in XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3, column 25

So look at line 3, column 25. Ta-da:

>>> stdout.split('\n')[2][25:]
' bank = "SBI" bank2 = "SBI2" >'
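The same lookup can be done programmatically instead of reading the traceback. This is a minimal sketch, not part of the original answer: `locate_parse_error` and the shortened `sample` string are hypothetical names and data made up for illustration. It relies on the `position` attribute of `ParseError`, a `(line, column)` tuple whose line number is 1-based.

```python
import xml.etree.ElementTree as ET


def locate_parse_error(text):
    """Return (line, column, rest_of_line) for the first parse error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based
        # Show what the parser was looking at when it gave up.
        return line, column, text.split('\n')[line - 1][column:]
    return None


# Hypothetical, shortened sample with the same kind of nameless tag.
sample = '<myroot>\n  <data2>\n    < bank = "SBI" >\n  </data2>\n</myroot>'
print(locate_parse_error(sample))
```

A return value of `None` means the string parsed cleanly. Anything else points at the first place the parser gave up, which here is the tag that has attributes but no element name.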
answered 2013-07-08T05:36:31.987

It parses fine with BeautifulSoup:

>>> s = """<myroot>  <data txt="some0" txt1 = "some1" txt2 = "some2" >
...                  <data2>
...                         < bank = "SBI" bank2 = "SBI2" >
...                  <data2>
...                  <data3>
...                         <branch = "bang1" branch = "bang2" >
...                  <data3>
...             </data>
... 
...             <data txt="some0" txt1 = "some1" txt2 = "some2" >
...                  <data2>
...                         < bank = "citi" bank2 = "citi2" >
...                  <data2>
...                  <data3>
...                         <branch = "bang3" branch = "bang4" >
...                  <data3>
...             </data> </myroot>"""

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(s)
>>> print soup.prettify()
<myroot>
 <data txt="some0" txt1="some1" txt2="some2">
  <data2>
   &lt; bank = "SBI" bank2 = "SBI2" &gt;
   <data2>
    <data3>
     <branch "bang1" = branch="bang2">
      <data3>
      </data3>
     </branch>
    </data3>
   </data2>
  </data2>
 </data>
 <data txt="some0" txt1="some1" txt2="some2">
  <data2>
   &lt; bank = "citi" bank2 = "citi2" &gt;
   <data2>
    <data3>
     <branch "bang3" = branch="bang4">
      <data3>
      </data3>
     </branch>
    </data3>
   </data2>
  </data2>
 </data>
</myroot>
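Note that in the prettified output the `bank` attributes end up as escaped text and the repeated `<data2>` / `<data3>` tags are nested rather than closed, so the values are still awkward to read back. If the goal is a real tree, one option is to rebuild it by hand. The sketch below is only a guess at the intended structure and rests on several assumptions that the question does not confirm: a tag with attributes but no name becomes a leaf element called `item` (a made-up name), a bare tag that repeats the name of the currently open element is meant as its closing tag, a repeated attribute name gets a numeric suffix (`branch_2`), and attribute values never contain `<` or `>`. `build_tree` is a hypothetical helper, and `raw` is a shortened copy of the data.

```python
import re
import xml.etree.ElementTree as ET

TAG = re.compile(r'<([^<>]*)>')               # anything between < and >
ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')   # name = "value" pairs


def build_tree(text, item_tag='item'):
    """Build an Element tree from the loosely tagged text (assumptions above)."""
    root = None
    stack = []  # currently open elements, innermost last
    for match in TAG.finditer(text):
        inner = match.group(1).strip()
        if not inner:
            continue
        if inner.startswith('/'):
            # A genuine closing tag: close the innermost open element.
            if stack:
                stack.pop()
            continue
        if re.match(r'\w+\s*=', inner):
            # Attributes but no element name: assumed to be a leaf.
            name, rest, leaf = item_tag, inner, True
        else:
            parts = inner.split(None, 1)
            name = parts[0]
            rest = parts[1] if len(parts) > 1 else ''
            leaf = False
        attrs = ATTR.findall(rest)
        if not leaf and not attrs and stack and stack[-1].tag == name:
            # Bare repeat of the open element's name: assumed to be its close.
            stack.pop()
            continue
        if stack:
            elem = ET.SubElement(stack[-1], name)
        else:
            elem = root = ET.Element(name)
        for key, value in attrs:
            # XML forbids duplicate attribute names, so suffix repeats.
            unique, n = key, 2
            while unique in elem.attrib:
                unique = '%s_%d' % (key, n)
                n += 1
            elem.set(unique, value)
        if not leaf:
            stack.append(elem)
    return root


raw = '''<myroot>  <data txt="some0" txt1 = "some1" txt2 = "some2" >
     <data2>
         < bank = "SBI" bank2 = "SBI2" >
     <data2>
     <data3>
         <branch = "bang1" branch = "bang2" >
     <data3>
 </data> </myroot>'''

root = build_tree(raw)
for item in root.iter('item'):
    print(sorted(item.attrib.items()))
```

From there, `ET.tostring(root)` gives well-formed XML as a string, and `ET.ElementTree(root).write('out.xml')` saves it to a file (the filename is just an example). Also note that `ET.fromstring` already returns the root element for a string, whereas `ET.parse` expects a filename or file object, so the extra `ET.parse('tree')` call in the question is not needed.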
answered 2013-07-08T05:48:36.223