0

I want to get Google search results with Python. So far I have the following script, which I put together from this article:

import urllib2
from bs4 import BeautifulSoup
import lxml
import sqlite3
import urllib
import json

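def showSome(searchFor):
    # Python 2 code: URL-encode the search term and build the request URL.
    query = urllib.urlencode({'q':searchFor})
    url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s'%query
    # Fetch the response and decode the JSON body.
    searchResponse = urllib.urlopen(url)
    searchResults = searchResponse.read()
    results = json.loads(searchResults)
    data = results['responseData']
    # estimatedResultCount is an estimate, not an exact total.
    print 'Total results: %s'%data['cursor']['estimatedResultCount']
    # Only the hits contained in this single response are listed.
    hits = data['results']
    print 'Top %d hits'%len(hits)
    for h in hits:
        print ' ', h['url']

showSome("site:www.hitmeister.de/shops/")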

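It reports 4380 results, but when I run the same query in a browser I get about 6650 results. How can I extract all of the results from Google? The script also returns only the first 4 results. How can I fetch all of them?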

4

2 Answers

2

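The issue here is that Google's estimated result count is always an estimate and nothing more. These estimates can vary with many factors, apparently including whether you search through the API or through a web browser. In fact, it is not unheard of for Google to return different estimates when you run the same query from different browsers on the same system. That could be explained by a different server answering your query, but I doubt it, and Google is known to take search context into account.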

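See also this short piece of Google documentation on the subject. Although that appendix seems to have been written specifically for Google Search Appliances, it describes the accuracy of these result counts well.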

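In practice, Google will not return more than 1,000 hits for a query anyway, so you will not get all results for the query regardless of the initial estimate. At least, I have not tried requesting more than 1,000 results from the API, but that is how the web interface behaves, and I assume the API has the same limit.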
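To make the paging idea concrete, here is a minimal Python 3 sketch that builds one request URL per result page and stops at the roughly 1,000-hit ceiling described above. The endpoint is the one from the question and may no longer respond. The `rsz` (page size) and `start` (offset) parameter names, the page size of 8, and the helper name `page_urls` are assumptions made for illustration, not something confirmed in this thread.

```python
from urllib.parse import urlencode

# Endpoint copied from the question; the service behind it may no longer respond.
BASE_URL = "http://ajax.googleapis.com/ajax/services/search/web"


def page_urls(search_for, estimated_count, page_size=8, cap=1000):
    """Build one request URL per result page, never paging past `cap`.

    `rsz` and `start` are assumed parameter names; `cap` mirrors the
    ceiling of about 1,000 hits mentioned in the answer.
    """
    limit = min(estimated_count, cap)
    urls = []
    for start in range(0, limit, page_size):
        query = urlencode({"v": "1.0", "q": search_for,
                           "rsz": page_size, "start": start})
        urls.append(BASE_URL + "?" + query)
    return urls


urls = page_urls("site:www.hitmeister.de/shops/", estimated_count=4380)
print(len(urls))  # number of pages needed before the cap is reached
print(urls[0])
```

Because the estimate is unreliable, a real loop should also stop as soon as a page comes back empty, rather than trusting `estimated_count`.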

Answered 2012-05-07T14:18:40.390
1

Google is very complex, and the results depend on many different parameters.

For example, if I search for a term on google.co.uk, I get different results than on google.com.

The same can happen with different user agents and cookies (e.g. because a different language is set in your cookie).

It is also important to note that the result count is not accurate. It is only Google's estimate. If you want to change this behaviour, I would try sending the same parameters with the AJAX request that a normal browser search sends (including cookies, etc.).
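As a rough illustration of sending browser-like parameters, the Python 3 sketch below builds (but does not send) a request that carries a User-Agent, an Accept-Language and, optionally, a Cookie header. The helper name `build_search_request` and all header values are placeholders chosen for this example. You would copy the real values from your own browser session, and there is no guarantee that matching them makes the counts agree.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(search_for, user_agent, accept_language, cookie=None):
    """Return an unsent Request that carries browser-like headers."""
    query = urlencode({"v": "1.0", "q": search_for})
    # Endpoint taken from the question; it may no longer respond.
    url = "http://ajax.googleapis.com/ajax/services/search/web?" + query
    headers = {"User-Agent": user_agent, "Accept-Language": accept_language}
    if cookie is not None:
        headers["Cookie"] = cookie
    return Request(url, headers=headers)


req = build_search_request(
    "site:www.hitmeister.de/shops/",
    user_agent="Mozilla/5.0 (placeholder; copy your browser's value)",
    accept_language="de-DE,de;q=0.9",
)
print(req.full_url)
print(req.header_items())
```

Passing the resulting object to `urllib.request.urlopen` would send it. Building it separately makes it easy to inspect exactly which headers go out.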

Ultimately, my counter-question would be: why do you need this? Most of the time the count is not accurate, because it is only an estimate. A much more important question is whether the top results are the same. If they are not, I think that would be a real problem.
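One simple way to check whether the top results match is to compare the first few URLs from each source. The Python 3 sketch below assumes you already have two ranked lists of URLs (for example, one from the API and one copied from the browser). The function name `top_overlap` and the example URLs are made up for illustration.

```python
def top_overlap(urls_a, urls_b, n=10):
    """Return (shared URLs in the rank order of urls_a, shared fraction).

    Only the first `n` entries of each list are compared.
    """
    top_a, top_b = urls_a[:n], urls_b[:n]
    in_b = set(top_b)
    shared = [u for u in top_a if u in in_b]
    longest = max(len(top_a), len(top_b))
    fraction = len(shared) / longest if longest else 1.0
    return shared, fraction


# Hypothetical result lists, for illustration only.
api_hits = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
browser_hits = ["http://example.com/a", "http://example.com/c", "http://example.com/d"]

shared, fraction = top_overlap(api_hits, browser_hits)
print(shared)
print(round(fraction, 2))
```

This ignores rank differences among the shared URLs. If ordering matters to you, compare the positions as well.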

Answered 2012-05-07T14:18:44.803