python - Parsing tags with BeautifulSoup

Question

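I have run into a Python programming problem involving BeautifulSoup.

To begin with, I need to create a function that extracts all the tags from the source of a web page. I did this as follows: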

    from bs4 import BeautifulSoup

    def parseUsingSoup(content):
        # Parse the page source passed in and return every <h3> tag
        soup = BeautifulSoup(content, 'html.parser')
        return soup.find_all('h3')

The website I have to parse is this one: http://www.auc.nl/news-events/events-and-lectures/events-and-lectures.html?page=1&pageSize=40

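It contains only one h3-tag. Now the assignment wants me to extend my function so that it also returns all the content in the p-tags related to it. It also asks for a list of events as four-tuples, each giving the date, title, type and description of the event.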

I really don't know how to do this. I have tried all sorts of things, but nothing gives me the right result. Thanks in advance.

score 4 · Accepted Answer

Here is one way to get all the <p> tags that follow an <h3>:

    from urllib.request import urlopen

    from bs4 import BeautifulSoup

    url = 'http://www.auc.nl/news-events/events-and-lectures/events-and-lectures.html?page=1&pageSize=40'

    # Download the page and parse it
    soup = BeautifulSoup(urlopen(url), 'html.parser')

    # For each <h3>, print the <p> tags that come after it in the document
    for h3 in soup.find_all('h3'):
        for p in h3.find_all_next('p'):
            print(p)

You can then parse this output into a list however you see fit.
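
As a rough sketch of that last step, the snippet below groups the <p> tags under each <h3> and packs them into (date, title, type, description) tuples. The markup in `SAMPLE_HTML` is made up for illustration, and so is the assumption that the three paragraphs after each heading hold the date, type and description in that order. The real page may be structured differently, so treat the helper `parse_events` as a starting point rather than a drop-in solution.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration; the real page may differ.
SAMPLE_HTML = """
<div>
  <h3>Open Day</h3>
  <p>12 March 2014</p>
  <p>Information event</p>
  <p>Meet students and staff.</p>
  <h3>Guest Lecture</h3>
  <p>20 March 2014</p>
  <p>Lecture</p>
  <p>A talk on climate policy.</p>
</div>
"""


def parse_events(html):
    """Return a list of (date, title, type, description) tuples."""
    soup = BeautifulSoup(html, 'html.parser')
    events = []
    for h3 in soup.find_all('h3'):
        paragraphs = []
        # Collect sibling <p> tags until the next <h3> starts a new event
        for sibling in h3.find_next_siblings():
            if sibling.name == 'h3':
                break
            if sibling.name == 'p':
                paragraphs.append(sibling.get_text(strip=True))
        # Assumed paragraph order: date, type, description
        if len(paragraphs) >= 3:
            date, kind, description = paragraphs[:3]
            events.append((date, h3.get_text(strip=True), kind, description))
    return events


events = parse_events(SAMPLE_HTML)
for event in events:
    print(event)
```

Stopping at the next <h3> keeps each event's paragraphs separate, which a plain `soup.find_all('p')` inside the loop would not do. If the real page nests the paragraphs in other elements, the sibling walk would need to be adapted.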
