python - Python web crawler

Question

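Is there any Python web crawler that can extract all the data from a web page, for example: http://www.bestbuy.com/site/HTC+-+One+S+4G+Mobile+Phone+-+Gradient+Blue+%28T-Mobile%29/4980512.p?id=1218587135819&skuId=4980512&contract_desc= On this page the customer reviews are split across two pages, 1 and 2. I want to crawl this URL and get the content of both pages. Is this possible with a Python crawler?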

Also, do Python crawlers support all the modern GET/POST techniques?

score 12 · Accepted Answer

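You can use Scrapy:

Scrapy is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.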

score 3 · Accepted Answer

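If you want to crawl a whole website, see this article. If you only want to process a few pages and analyze their content (meaning you already know the URLs to process), try BeautifulSoup, which lets you do something like this: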

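from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup

url = "http://example.com/"  # placeholder: a page you already know you want
page = urlopen(url)
soup = BeautifulSoup(page.read(), "html.parser")
for f in soup.find_all('form'):
    target_url = f.get('action')  # None if the form has no action attribute
    # do something with each one of the forms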
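
On the GET/POST part of the question: once the loop has a form's target_url, one possible "something" is to build a request that submits the form. The sketch below uses only the standard library and does not send anything. The helper name build_form_request, its parameters and the example.com values are all assumptions introduced for illustration.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_form_request(page_url, action, fields, method="get"):
    """Build (but do not send) a request that submits a form.

    Hypothetical helper: page_url is the page the form was found on,
    action is the form's action attribute, fields are the values to submit.
    """
    # A form's action is often relative, so resolve it against the page URL.
    target_url = urljoin(page_url, action or "")
    encoded = urlencode(fields)
    if method.lower() == "post":
        # POST: the encoded fields travel in the request body as bytes.
        return Request(target_url, data=encoded.encode("ascii"), method="POST")
    # GET: the encoded fields are appended to the URL as a query string.
    separator = "&" if "?" in target_url else "?"
    return Request(target_url + separator + encoded, method="GET")


# Made-up example values; nothing is sent over the network here.
get_req = build_form_request(
    "http://example.com/products/", "search", {"q": "phone", "page": 2}
)
post_req = build_form_request(
    "http://example.com/products/", "/login", {"user": "alice"}, method="post"
)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Passing either object to urllib.request.urlopen would actually send it. Keeping construction and sending separate makes it easy to inspect what a crawler is about to submit.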
