python - Google News scraper returning results with URL, title and brief

Question

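I'm new to web scraping and I'm using Python 3.X. I'm currently practicing by scraping Google News to get back into it, but I've run into a problem: the code runs but returns nothing. I want the code to search Google News for a query and return each result's URL, title and brief.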

Thanks a lot for your time. Here is my code:

import time
from random import randint  # randint was used below but never imported

import requests
from bs4 import BeautifulSoup

s = "Stack Overflow"
url = "http://www.google.com.sg/search?q=" + s + "&tbm=nws&tbs=qdr:y"
time.sleep(randint(0, 2))  # short random pause before sending the request
htmlpage = requests.get(url)
soup = BeautifulSoup(htmlpage.text, 'lxml')
# print(len(soup.findAll("table", {"class": "result"})))  # debug: number of matching tables
# Nothing is printed if no element on the page matches these class names
for result_table in soup.findAll("table", {"class": "result"}):
    a_click = result_table.find("a")
    # get_text() returns str; renderContents() returns bytes in Python 3,
    # which cannot be concatenated with a str
    print("-----Title----\n" + a_click.get_text())  # Title
    print("----URL----\n" + str(a_click.get("href")))  # URL
    print("----Brief----\n" + result_table.find("div", {"class": "c-abstract"}).get_text())  # Brief
    print("Done")
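The question builds the search URL by concatenating the raw query string. A query containing `&` or `#` would then cut the `q` parameter short, so it is safer to percent-encode each parameter. Below is a minimal standard-library sketch; `build_news_search_url` is a hypothetical helper, and the host and the `tbm`/`tbs` values are simply copied from the question's URL. `requests` can do the same encoding itself if the dict is passed as the `params` argument of `requests.get`.

```python
from urllib.parse import urlencode


def build_news_search_url(query):
    """Build a search URL like the question's, with the query safely encoded.

    Hypothetical helper: host and tbm/tbs values are copied from the question.
    """
    params = {"q": query, "tbm": "nws", "tbs": "qdr:y"}
    # urlencode escapes spaces, '&', '#' etc.; safe=":" keeps "qdr:y" as written
    return "http://www.google.com.sg/search?" + urlencode(params, safe=":")


print(build_news_search_url("Stack Overflow"))
# http://www.google.com.sg/search?q=Stack+Overflow&tbm=nws&tbs=qdr:y
```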

score 1 · Accepted Answer

This is how I got the results; hope it helps:

>>> for result_table in soup.findAll("div", {"class": "g"}):
...     a_click = result_table.find("a")
...     print ("-----Title----\n" + str(a_click.renderContents()))#Title
...     print ("----URL----\n" + str(a_click.get("href")))#URL
...     print ("----Brief----\n" + str(result_table.find("div", {"class": "st"}).renderContents()))#Brief
...     print ("Done")
... 
-----Title----
b"<b>Stack Overflow</b>: Like sleep? Don't code in C"
----URL----
/url?q=http://www.infoworld.com/article/3190701/application-development/stack-overflow-like-sleep-dont-code-in-c.html&sa=U&ved=0ahUKEwjc34W_3NLTAhVIMY8KHVu_BoUQqQIIFigAMAA&usg=AFQjCNE7xDqkg-kyFR65krfMIJqIchHFwg
----Brief----
b'In analysis of programming traffic on the <b>Stack Overflow</b> online community over for four weeks last August, <b>Stack Overflow</b> Insights data scientist David Robinson,\xc2\xa0...'
Done
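The URL printed above is a Google redirect link of the form `/url?q=<target>&sa=...`, and the article's real address is the `q` query parameter. A minimal standard-library sketch for extracting it, assuming links keep this `/url?q=` form (`unwrap_google_redirect` is a hypothetical name):

```python
from urllib.parse import parse_qs, urlparse


def unwrap_google_redirect(href):
    """Return the target of a '/url?q=...' redirect link, or href unchanged."""
    parsed = urlparse(href)
    if parsed.path == "/url":
        # parse_qs splits the query string on '&' and percent-decodes each value
        target = parse_qs(parsed.query).get("q")
        if target:
            return target[0]
    return href


href = ("/url?q=http://www.infoworld.com/article/3190701/application-development/"
        "stack-overflow-like-sleep-dont-code-in-c.html&sa=U&ved=0ahUKEwjc34W_3NLTAhVIMY8KHVu_BoUQqQIIFigAMAA"
        "&usg=AFQjCNE7xDqkg-kyFR65krfMIJqIchHFwg")
print(unwrap_google_redirect(href))
# http://www.infoworld.com/article/3190701/application-development/stack-overflow-like-sleep-dont-code-in-c.html
```

Links that are not redirects are returned unchanged, so the helper can be applied to every `href` in a loop.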

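The answer's loop can also be wrapped in a function that returns the results as data instead of printing them. This sketch assumes the markup the answer relied on: each result in a `div` with class `g`, the title and link in its first `<a>`, and the brief in a `div` with class `st`. Google changes its HTML over time, so these selectors may need updating. It uses `get_text()`, which returns `str`, instead of `renderContents()`, which returns `bytes` in Python 3 and causes the `b'...'` prefixes in the output above. It also uses Python's built-in `html.parser` so `lxml` is not required; `parse_news_results` is a hypothetical name.

```python
from bs4 import BeautifulSoup


def parse_news_results(html):
    """Return a list of {"title", "url", "brief"} dicts, one per result block.

    Assumes results are <div class="g"> blocks with the brief in <div class="st">.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in soup.find_all("div", class_="g"):
        link = block.find("a")
        if link is None:
            continue  # skip result blocks that contain no link
        brief_div = block.find("div", class_="st")
        results.append({
            "title": link.get_text(),  # str, with inner tags such as <b> stripped
            "url": link.get("href"),
            "brief": brief_div.get_text() if brief_div else "",
        })
    return results


sample = ('<div class="g"><a href="/url?q=http://example.com/story">'
          '<b>Stack Overflow</b>: Like sleep?</a>'
          '<div class="st">Traffic analysis of <b>Stack Overflow</b></div></div>')
for item in parse_news_results(sample):
    print(item["title"], item["url"], item["brief"], sep="\n")
```

In real use, `sample` would be replaced by `htmlpage.text` from the `requests.get` call in the question.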