I have the following XML:

<XMLResults><ConfMess><RCode>0</RCode><MId>0</MId></ConfMess><COURSE_DATA><THEHEADING>Review Engagements: Inquiry and Analytical Review Procedures and Reporting</THEHEADING><ABSTRACT><!--this file has been generated by v.3.2.1 8/9/2012 8:50:14 AM by JHancock (and called from 'A G&Q Database')--><html><head><title>Course Abstract</title><link rel='stylesheet' href='https://www.thelearningcenter.org/cserver/case1/css/theabstract.css' type='text/css'></head><body><div style='text-align: center;' class=h2banner>Course Abstract</div><div id="tableContainer" class="tableContainer"><table class="abstract"><tbody class="scrollContent"><tr class="abstract"><td class="abstractCaptions">Main Title</td><td class="abstract" id=courseAbstractTitle>Initial Review: Find Out About Additional Reporting Procedures</td></tr><tr class="abstract"><td class="abstractCaptions">Writer(s)</td><td class="abstract" id=authorsAbstract>Karl Booker<br>Harriet Johnson</td></tr><tr class="abstract"><td class="abstractCaptions">Current Field(s) of Study<sup>1</sup></td><td class="abstract" id=fosAbstract>4.0 study hours in 'History'</td></tr><tr class="abstract"><td class="abstractCaptions">Area Of Study</td><td class="abstract" id=courseLevelAbstract>Medium</td></tr><tr class="abstract"><td class="abstractCaptions">Value (30 min.sec.)<sup>1</sup></td><td class="abstract" id=creditHoursAbstract>3.5</td></tr><tr class="abstract"><td class="abstractCaptions">Must Haves</td><td class="abstract" id=prerequisitesAbstract>None</td></tr><tr class="abstract"><td class="abstractCaptions">Description</td><td class="abstract" id=descriptionAbstract>This topic revolves around discussing important topics in the history field and how they relate to our current situation.</td></tr><tr class="abstract"><td class="abstractCaptions">TheObjective</td><td class="abstract" id=objectivesAbstract><ul><li>Learn more about history and how our modern times have been shaped by it.<li>Plan for the future<li>Help mankind to 
learn from the past<li>Provide valuable input to others<li>Be greatful for what we have<li>Gain credit for all the hard work we put in<li>Pass this course and move on with our lives.<li>Get a good job and raise a family.<li>Get a vacation home and relax on the beach<li>Soak up the sun and get a tan</ul></td></tr><tr class="abstract" id=idExpirationRow><td class="abstractCaptions">Expires</td><td class="abstract" id=expirationAbstract>This topic is reviewed monthly for value and modified where needed.</td></tr><tr class="abstract"><td class="abstractCaptions">Item ID</td><td class="abstract" id=courseIDabstract>odt</td></tr></tbody></table></div><div id=footnote1ID class="sylFNote"><sup>1</sup>Consult your instructor for infornation on this particular topic</div><div id="idCopyright" class="copyright">© 2004 THIS SCHOOL BOARD</div></body></html></ABSTRACT></COURSE_DATA><STUDY_AREA><SUBJECT>AuditField</SUBJECT><NUMBER_HOURS>3.0</NUMBER_HOURS></FIELD_OF_STUDY></XMLResults>

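I can't seem to find a routine that can parse the "stuff" in the <ABSTRACT>stuff</ABSTRACT> part of the XML. I think this may be due to special characters or something similar. Can someone help me come up with a routine that handles this and doesn't fail?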

2 Answers

This is not XML. It is a pile of text with angle brackets.

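Not only do you have problems inside the <ABSTRACT> element (unquoted attribute values such as class=h2banner, and <link>, <br> and <li> tags that are never closed), you also have <STUDY_AREA></FIELD_OF_STUDY>, an element opened under one name and closed under another.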

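How do you fix it? You don't. You make whoever is sending you this garbage send you valid XML. It's not as if there aren't plenty of XML editors out there. They should be using such a tool to create and/or validate their "XML".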
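As a minimal sketch of what "validate" could look like on the receiving side, the snippet below checks well-formedness with Python's standard xml.etree.ElementTree. Python is an assumption here, since the question does not name a language, and check_well_formed is a hypothetical helper name. It only tests well-formedness, not validity against a schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        # The message includes the line and column where parsing stopped.
        return False, str(exc)
    return True, None


# The tail of the posted document: opened as STUDY_AREA, closed as FIELD_OF_STUDY.
bad = "<STUDY_AREA><SUBJECT>AuditField</SUBJECT></FIELD_OF_STUDY>"
# The same fragment with a matching close tag.
good = "<STUDY_AREA><SUBJECT>AuditField</SUBJECT></STUDY_AREA>"

print(check_well_formed(bad))
print(check_well_formed(good))
```

Running a check like this before any further processing gives a concrete error position to send back to whoever produces the file.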

answered 2012-09-12T19:33:57.030

This may be because <!-- --> is a comment in XML. A comment by itself does not make parsing fail.

Comments in XML

The syntax for writing comments in XML is similar to that of HTML.

<!-- This is a comment -->

Here is a reference link.

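How you solve this will depend on the library you are using. Some libraries may support getting the raw text of that element. They may also return a comment element.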
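To illustrate how library-dependent this is, here is a small sketch using Python's standard xml.etree.ElementTree. The library choice is an assumption, the input is a shortened, well-formed stand-in for the real <ABSTRACT> content, and the insert_comments option requires Python 3.8 or later. The default parser silently drops comments, while a TreeBuilder created with insert_comments=True keeps them as comment elements.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the ABSTRACT element (not the real data).
doc = "<ABSTRACT><!-- generated file --><title>Course Abstract</title></ABSTRACT>"

# Default parser: the comment is discarded, only <title> remains.
root_default = ET.fromstring(doc)
default_tags = [child.tag for child in root_default]

# Parser that keeps comments in the tree (Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root_with_comments = ET.fromstring(doc, parser=parser)

# Comment nodes are recognizable because their tag is the ET.Comment factory.
comments = [child.text for child in root_with_comments if child.tag is ET.Comment]

# Raw markup of the element, serialized back from the parsed tree.
raw = ET.tostring(root_default, encoding="unicode")

print(default_tags)
print(comments)
print(raw)
```

Either way the comment is not what breaks parsing; it only affects what the tree contains afterwards.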

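I would probably just grep the plain text for <ABSTRACT>(.*)</ABSTRACT>. That could be a problem if there are multiple records per document, because the greedy .* would match from the first <ABSTRACT> to the last </ABSTRACT>, so you may need to split the input into individual records first.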
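A minimal sketch of that plain-text approach in Python (the language and the extract_abstracts helper name are assumptions, not part of the original answer). It uses a non-greedy (.*?) so that each match stops at the nearest </ABSTRACT>, which sidesteps the multiple-records problem, and re.DOTALL so that an abstract containing line breaks still matches. The sample input is made up for illustration.

```python
import re

# Non-greedy capture between the tags; DOTALL lets "." match newlines too.
ABSTRACT_RE = re.compile(r"<ABSTRACT>(.*?)</ABSTRACT>", re.DOTALL)


def extract_abstracts(text):
    """Return the raw content of every <ABSTRACT>...</ABSTRACT> block in text."""
    return [match.group(1) for match in ABSTRACT_RE.finditer(text)]


# Hypothetical input with two records; the first abstract spans two lines.
sample = (
    "<XMLResults>"
    "<COURSE_DATA><ABSTRACT><html><body>First\ncourse</body></html></ABSTRACT></COURSE_DATA>"
    "<COURSE_DATA><ABSTRACT><html><body>Second course</body></html></ABSTRACT></COURSE_DATA>"
    "</XMLResults>"
)

abstracts = extract_abstracts(sample)
for number, abstract in enumerate(abstracts, start=1):
    print(number, repr(abstract))
```

This treats the abstract as opaque text, so the extracted HTML can then be handed to a lenient HTML parser rather than an XML one.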

answered 2012-09-12T18:55:24.850