web-scraping - Scraping Google search results and getting the category Google uses to define a business

Question

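I am working on a project that needs the category Google uses to define a business or store. I am using the Beautiful Soup library, but I cannot extract the specific part of the results that I need. I will show what I mean with an example (a screenshot of the results page).

For example, when I search for "bell canada", I get the search results, and Google also shows a box on the right side of the page that summarizes the search. My task is to extract the subtitle shown under BELL CANADA, which reads "Telecommunications company". How can I do this with web scraping for any Google search?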

score 0 · Accepted Answer

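Make sure you send a browser-like user-agent header; by default, requests identifies itself as python-requests.
This only works if the search results display a knowledge graph.
If you need to extract just one element, you can use BeautifulSoup's select_one() method, which locates an element using a CSS selector. Sometimes it is more convenient than find() (check out SelectorGadget, a Chrome extension, for picking selectors).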
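To make the select_one() versus find() comparison concrete, here is a minimal sketch that runs against a static snippet instead of a live Google page. The HTML is a made-up, simplified stand-in for the knowledge graph markup; only the class name wwUB2c comes from this answer, and Google may change it at any time.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the knowledge graph markup.
html = """
<div class="wwUB2c">
  <span>Telecommunications company</span>
</div>
"""

# html.parser ships with Python, so this sketch does not need lxml.
soup = BeautifulSoup(html, "html.parser")

# CSS selector: the first <span> nested inside an element with class "wwUB2c".
via_select = soup.select_one(".wwUB2c span")

# The same lookup with find(): locate the container, then the span inside it.
container = soup.find(class_="wwUB2c")
via_find = container.find("span") if container else None

print(via_select.text)
print(via_find.text)

# Neither method raises when nothing matches; both return None.
missing = soup.select_one(".no-such-class span")
print(missing)
```

The selector form keeps the nesting in a single string, which is handy when you copy a selector straight from SelectorGadget.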

Code and full example:

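from bs4 import BeautifulSoup
import requests

# Browser-like User-Agent. The 'lxml' parser used below must be installed,
# but it does not need to be imported.
headers = {
  "User-Agent":
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Let requests URL-encode the query instead of putting a raw space in the URL.
response = requests.get(
  'https://www.google.com/search',
  params={'q': 'bell canada'},
  headers=headers).text

soup = BeautifulSoup(response, 'lxml')

# select_one() returns None when the page has no knowledge graph.
subtitle = soup.select_one('.wwUB2c span')

print(subtitle.text if subtitle else 'No knowledge graph found')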

Output:

Telecommunications company
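If you want to check what would actually be sent before hitting Google, the request can be built without sending it. This is a minimal sketch using requests' Request.prepare(); passing the query through params is an assumption of this sketch, not something the answer requires.

```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Build the request but do not send it, so nothing touches the network.
request = requests.Request(
    "GET",
    "https://www.google.com/search",
    params={"q": "bell canada"},  # requests URL-encodes the query string
    headers=headers,
)
prepared = request.prepare()

print(prepared.url)                    # https://www.google.com/search?q=bell+canada
print(prepared.headers["User-Agent"])  # the browser-like string set above
```

Printing the prepared URL and headers is a quick way to confirm that the space in the query is encoded and that the custom user-agent is in place.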

Alternatively, you can use the Google Knowledge Graph API from SerpApi. It is a paid API with a free trial.

Part of the JSON:

"knowledge_graph": {
  "title": "Bell Canada",
  "type": "Telecommunications company",
  "image": "https://serpapi.com/searches/6062ef864b91ebfc19196ae1/images/343f5783921730ee8c2f6f94ce7621d396fca86c6e023765.png",
  "website": "http://www.bell.ca/",
  "description": "Bell Canada is a Canadian telecommunications company headquartered at 1 Carrefour Alexander-Graham-Bell in the borough of Verdun in Montreal, Quebec, Canada. It is an ILEC in the provinces of Ontario and Quebec; as such, it was a founding member of the Stentor Alliance.",
  "source": {
    "name": "Wikipedia",
    "link": "https://en.wikipedia.org/wiki/Bell_Canada"
  }
}

Code to integrate:

import os
from serpapi import GoogleSearch

params = {
  "engine": "google",
  "q": "bell canada",
  "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

subtitle = results["knowledge_graph"]["type"]
print(subtitle)

Output:

Telecommunications company
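Not every search returns a knowledge graph, so indexing the result directly can raise a KeyError. Below is a minimal sketch of a more defensive lookup, run against a trimmed copy of the JSON excerpt shown above. The helper name get_category is hypothetical, and the sketch assumes that a response without a knowledge graph simply omits the "knowledge_graph" key.

```python
import json

# Trimmed copy of the JSON excerpt shown above.
raw = '{"knowledge_graph": {"title": "Bell Canada", "type": "Telecommunications company"}}'


def get_category(results):
    """Return the knowledge graph type, or None if the block is absent."""
    # Hypothetical helper: .get() with a default avoids a KeyError.
    return results.get("knowledge_graph", {}).get("type")


print(get_category(json.loads(raw)))  # Telecommunications company
print(get_category({}))               # None
```

The same helper can be applied to the dictionary returned by search.get_dict().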

Disclaimer: I work for SerpApi.
