python - How to get the content inside CDATA in an XML tag (Django)

Question

I am using urllib and BeautifulSoup to parse an XML file in Django. I can't parse the content of the description tag because it is wrapped in CDATA.

My XML tag:

<item>
    <title>EU Confronting US Over Surveillance</title>
    <description><![CDATA[Voice of America is an international news and broadcast organization serving Central and Eastern Europe, the Caucasus, Central Asia, Russia, the Middle East and Balkan countries]]></description>
    <guid>http://www.voanews.com/content/eu-confronting-us-over-surveillance/1778928.html</guid>
</item>

The description tag is inside the item tag. In views.py:

for i in soup.findAll('item'):
 print i.description.string
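For reference, a minimal Python 3 sketch using the standard library's xml.etree.ElementTree instead of BeautifulSoup (a swapped-in parser, not what the question uses). An XML parser unwraps the CDATA section itself, so the description comes back as ordinary text. The `<rss><channel>` envelope around the question's `<item>` and the helper name `item_descriptions` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The <item> from the question, wrapped in a minimal (assumed) <rss><channel> envelope.
FEED = """<rss><channel>
<item>
    <title>EU Confronting US Over Surveillance</title>
    <description><![CDATA[Voice of America is an international news and broadcast organization serving Central and Eastern Europe, the Caucasus, Central Asia, Russia, the Middle East and Balkan countries]]></description>
    <guid>http://www.voanews.com/content/eu-confronting-us-over-surveillance/1778928.html</guid>
</item>
</channel></rss>"""


def item_descriptions(xml_text):
    """Hypothetical helper: return (title, description) for every <item>."""
    root = ET.fromstring(xml_text)
    pairs = []
    for item in root.iter("item"):
        # findtext() returns the element's text; the XML parser has already
        # removed the <![CDATA[ ... ]]> wrapper, so nothing special is needed.
        pairs.append((item.findtext("title"), item.findtext("description")))
    return pairs


for title, description in item_descriptions(FEED):
    print(title)
    print(description)
```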

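Without CDATA I can parse the content inside the description tag, but with it I don't know how to get at the content. Please help. Also, how do I get the image inside a tag like this?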

<description>&lt;img src='http://static.ibnlive.in.com/ibnlive/pix/sitepix/10_2013/tony-abbott-visits-afghanistan-says-australias-war-is-over_291013013344_338x225.jpg' width='90' height='62'&gt;&lt;p&gt;"Australia's longest war" is ending and its defence forces mission in Afghanistan will be complete by 2013 end, Prime Minister Tony Abbott announced in a statement on Tuesday.&lt;/p&gt;</description>
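A possible approach for the image, sketched in Python 3 with standard-library parsers only (swapped in for BeautifulSoup): parsing the element as XML turns `&lt;img ...&gt;` back into a literal HTML string, which html.parser.HTMLParser can then scan for `<img src>`. The class `ImageAndTextExtractor` and the function `parse_description` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The <description> from the question; its HTML is entity-escaped.
DESCRIPTION_XML = """<description>&lt;img src='http://static.ibnlive.in.com/ibnlive/pix/sitepix/10_2013/tony-abbott-visits-afghanistan-says-australias-war-is-over_291013013344_338x225.jpg' width='90' height='62'&gt;&lt;p&gt;"Australia's longest war" is ending and its defence forces mission in Afghanistan will be complete by 2013 end, Prime Minister Tony Abbott announced in a statement on Tuesday.&lt;/p&gt;</description>"""


class ImageAndTextExtractor(HTMLParser):
    """Collects <img src> URLs and the text content of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_data(self, data):
        self.text_parts.append(data)


def parse_description(xml_text):
    # The XML parser unescapes &lt; and &gt;, giving the raw HTML fragment.
    html_fragment = ET.fromstring(xml_text).text or ""
    extractor = ImageAndTextExtractor()
    extractor.feed(html_fragment)
    extractor.close()  # flush any buffered text
    return extractor.images, "".join(extractor.text_parts).strip()


images, text = parse_description(DESCRIPTION_XML)
print(images)
print(text)
```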

score 0 · Accepted Answer

The CDATA content can be accessed like this:

>>> import BeautifulSoup
>>> txt = '''<description><![CDATA[Voice of America is an international news and broadcast organization serving Central and Eastern Europe, the Caucasus, Central Asia, Russia, the Middle East and Balkan countries]]></description>'''
>>> soup = BeautifulSoup.BeautifulSoup(txt)
>>> for cd in soup.findAll(text=True):
...   if isinstance(cd, BeautifulSoup.CData):
...     print 'CData value: %r' % cd
...
CData value: u'Voice of America is an international news and broadcast organization serving Central and Eastern Europe, the Caucasus, Central Asia, Russia, the Middle East and Balkan countries'
>>>
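If BeautifulSoup is not available, the same "find the CDATA nodes" idea can be sketched with the standard library's xml.dom.minidom, which (as far as I know) keeps `<![CDATA[...]]>` as CDATASection nodes, the counterpart of BeautifulSoup's CData class. This is a Python 3 sketch; `cdata_sections` is a hypothetical helper name.

```python
from xml.dom import minidom

# Same input as the interactive session above.
TXT = ("<description><![CDATA[Voice of America is an international news and "
       "broadcast organization serving Central and Eastern Europe, the Caucasus, "
       "Central Asia, Russia, the Middle East and Balkan countries]]></description>")


def cdata_sections(xml_text):
    """Return the text of every CDATA section in the document, in order."""
    doc = minidom.parseString(xml_text)
    found = []

    def walk(node):
        for child in node.childNodes:
            # Keep only CDATA nodes; ordinary text nodes are skipped.
            if child.nodeType == child.CDATA_SECTION_NODE:
                found.append(child.data)
            walk(child)

    walk(doc)
    return found


for value in cdata_sections(TXT):
    print("CData value: %r" % value)
```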

An edit based on your comment, which should help:

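from urllib.request import urlopen  # Python 3; on Python 2 this was urllib.urlopen

from bs4 import BeautifulSoup, CData

source_txt = urlopen("http://voanews.com/api/epiqq")
# bs4's "html.parser" builder turns <![CDATA[...]]> sections into CData objects
soup = BeautifulSoup(source_txt.read(), "html.parser")
for cd in soup.find_all(string=True):
    if isinstance(cd, CData):
        print('CData value: %r' % cd)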

Things to note:

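The import statement. The first snippet imports the whole BeautifulSoup (3) module and refers to `BeautifulSoup.CData`; with bs4, import `CData` alongside `BeautifulSoup` and call `BeautifulSoup(...)` directly.
The `urlopen` argument. It needs the full URL, including `http://`.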
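To illustrate the second note, a small Python 3 sketch: `urlopen()` raises `ValueError` ("unknown url type") for a string without a scheme, so a hypothetical helper `ensure_scheme` can add `http://` before fetching. Both `ensure_scheme` and `fetch` are illustrative names, not part of urllib.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def ensure_scheme(url, default="http"):
    """Hypothetical helper: prefix `default`:// when `url` is not absolute."""
    parts = urlsplit(url)
    if parts.scheme and parts.netloc:
        # Already absolute, e.g. "http://voanews.com/api/epiqq".
        return url
    return "%s://%s" % (default, url)


def fetch(url, timeout=10):
    """Download `url`; urlopen() itself rejects scheme-less strings with ValueError."""
    with urlopen(ensure_scheme(url), timeout=timeout) as response:
        return response.read()
```

For example, `fetch("voanews.com/api/epiqq")` would request `http://voanews.com/api/epiqq`; whether that endpoint still serves the feed is not something this sketch can confirm.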
