python - Retrieving information from the web with Python using urllib and BeautifulSoup

Question

I can fetch an HTML page with urllib and parse HTML with BeautifulSoup, but it looks like I have to write the page to a file for BeautifulSoup to read.

import urllib
sock = urllib.urlopen("http://SOMEWHERE")
htmlSource = sock.read()
sock.close()
# --> write htmlSource to a file

Is there a way to call BeautifulSoup without first writing the urllib output to a file?

score 23 · Accepted Answer

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(htmlSource)

There is no need to write to a file: just pass in the HTML string. You can also pass the object returned by urlopen directly:

f = urllib.urlopen("http://SOMEWHERE") 
soup = BeautifulSoup(f)
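
The snippets above use Python 2's urllib.urlopen and the BeautifulSoup 3 import. A rough Python 3 equivalent might look like the sketch below. It assumes the bs4 package (BeautifulSoup 4) is installed. The helper name soup_from_url and the 10-second timeout are illustrative choices, not part of the original answer.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (the bs4 package)


def soup_from_url(url, timeout=10):
    """Fetch url and parse the response in memory, with no intermediate file.

    soup_from_url is a hypothetical helper name used only for this sketch.
    """
    with urlopen(url, timeout=timeout) as response:
        # The body is read into memory as bytes; BeautifulSoup accepts
        # bytes, str, or an open file-like object as its markup argument.
        html = response.read()
    # Naming the parser explicitly avoids bs4's "no parser specified" warning.
    return BeautifulSoup(html, "html.parser")
```

Calling soup_from_url("http://SOMEWHERE") would then return a parsed document. Passing the response object itself instead of response.read() should also work, since BeautifulSoup reads from file-like objects.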

score 0 · Answer

You can open the URL, download the HTML, and make it parseable in one step with gazpacho:

from gazpacho import Soup
soup = Soup.get("https://www.example.com/")
