Suppose my XML file contains the following tags:
<?xml version="1.0" encoding="utf-8"?>
<jobs>
<job>
<P class="Beaton"><FONT size=3><SPAN style="FONT-FAMILY: Symbol; COLOR: black; mso-ascii-font-family: 'Times New Roman'">�</SPAN><SPAN style="COLOR: black"><FONT face="Times New Roman"><SPAN style="mso-spacerun: yes"> </SPAN>Position accountability<o:p></o:p></FONT></SPAN></FONT></P>
<P class="Beaton"><FONT size=3><SPAN style="FONT-FAMILY: Symbol; COLOR: black; mso-ascii-font-family: 'Times New Roman'">�</SPAN><SPAN style="COLOR: black"><FONT face="Times New Roman"> <SPAN style="mso-spacerun: yes"> </SPAN>55 FTEs <o:p></o:p></FONT></SPAN></FONT></P>
</job>
</jobs>
Here is my code:
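from xml.sax.handler import ContentHandler
import xml.sax

xml_path = 'windows/xml_file.xml'

try:
    parser = xml.sax.make_parser()
    # A bare ContentHandler ignores every event, so this only checks well-formedness
    parser.setContentHandler(ContentHandler())
    # Passing the path lets the parser read raw bytes and honour the declared encoding
    parser.parse(xml_path)
except xml.sax.SAXParseException as e:
    print("*** PARSER error: %s" % e)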
Result:
*** PARSER error: windows/xml_file.xml:4:113: not well-formed <invalid token>
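One way to narrow this down is to feed small fragments to the same parser. The sketch below assumes Python 3 and tests two candidates that are common in HTML pasted from Word. The first is an unquoted attribute value such as size=3, which XML does not allow. The second is a raw byte that is not valid UTF-8; 0xB7 is used here only as an assumed stand-in for the Symbol-font bullet that shows up as � above. The post does not confirm which of these the real file contains, and the helper name check is hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def check(xml_bytes):
    """Hypothetical helper: return None if xml_bytes is well-formed,
    otherwise the text of the SAXParseException."""
    try:
        xml.sax.parseString(xml_bytes, ContentHandler())
    except xml.sax.SAXParseException as e:
        return str(e)
    return None


# Unquoted attribute value, as in <FONT size=3>
unquoted = b'<?xml version="1.0" encoding="utf-8"?><P><FONT size=3>x</FONT></P>'
# The same markup with the attribute value quoted
quoted = b'<?xml version="1.0" encoding="utf-8"?><P><FONT size="3">x</FONT></P>'
# Assumed stand-in for the bullet: 0xB7 is not valid UTF-8 on its own
bad_byte = b'<?xml version="1.0" encoding="utf-8"?><P>\xb7 x</P>'

print(check(unquoted))
print(check(quoted))
print(check(bad_byte))
```

If the quoted version parses but the other two fail, the fix is on the side that produces the file: quote every attribute value and save the file in the encoding named in the XML declaration.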
Can anyone tell me what is wrong with the P tags and how to avoid this error?
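If the HTML inside <job> is meant to be carried as opaque text rather than parsed as XML, one possible workaround is to wrap it in a CDATA section when the file is written. Whether the file can be generated this way is an assumption, not something stated above. The sketch below, again assuming Python 3, shows that the parser then reports the markup as character data instead of rejecting size=3. The class name JobTextHandler is hypothetical. A CDATA section does not fix invalid bytes, so the file must still be saved in the encoding its XML declaration names.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class JobTextHandler(ContentHandler):
    """Hypothetical handler: collects the character data inside each <job>."""

    def __init__(self):
        super().__init__()
        self.jobs = []
        self._in_job = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "job":
            self._in_job = True
            self._buf = []

    def characters(self, content):
        # CDATA content arrives here as plain text, possibly in several chunks
        if self._in_job:
            self._buf.append(content)

    def endElement(self, name):
        if name == "job":
            self.jobs.append("".join(self._buf).strip())
            self._in_job = False


# The HTML fragment is wrapped in CDATA, so size=3 is no longer parsed as markup
doc = b'''<?xml version="1.0" encoding="utf-8"?>
<jobs>
<job><![CDATA[<P class="Beaton"><FONT size=3>55 FTEs <o:p></o:p></FONT></P>]]></job>
</jobs>'''

handler = JobTextHandler()
xml.sax.parseString(doc, handler)
print(handler.jobs[0])
```

The extracted string can then be handed to an HTML parser if its structure is needed, since HTML parsers tolerate unquoted attribute values.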