
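I want to get the right-hand summary box that appears for a Google search. I have found a unique tag that identifies it, but when I fetch the results page for a query with Python, that box is not in Google's response.

Please help me get the full content of a Google query page.

Code that fetches the Google query page: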

import requests

url = 'https://www.google.co.in/search?q=dhoni'
r = requests.get(url)
content = r.text

# Save the fetched HTML so it can be opened in a browser
with open('query.html', 'w', encoding='utf-8') as f:
    f.write(content)

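PS: After running the code above and opening the saved file in a browser, the right-hand box does not appear, which shows that its content was not included in the fetched page.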

1 Answer

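This is not because of JavaScript, as sberry mentioned. It's because no user-agent is specified to make the request look like a "real" user visit. When you don't pass one, requests announces itself as python-requests, so Google can tell the request comes from a script and may return different HTML. A user-agent string is what a bot or browser sends to announce itself as a particular client, so sending a browser-like one makes the request look like it comes from that browser.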

You can read more about this in the blog post I wrote on how to reduce the chance of being blocked while web scraping.

Pass a user-agent:

headers = {
    'User-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582'
}

requests.get('URL', headers=headers)
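
To make the difference concrete, here is a minimal sketch that prepares two requests without sending them and prints the user-agent each one would carry. It assumes the `requests` library from the question; `Session.prepare_request` is used only so that nothing goes over the network, and the names `URL`, `BROWSER_UA`, `default_req`, and `custom_req` are illustrative, not part of the original answer.

```python
import requests

URL = "https://www.google.com/search"
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
)

session = requests.Session()

# Prepared with the library defaults: the header names the library itself.
default_req = session.prepare_request(
    requests.Request("GET", URL, params={"q": "dhoni"})
)

# Prepared with an explicit browser-like user-agent, as in the snippet above.
custom_req = session.prepare_request(
    requests.Request(
        "GET", URL, params={"q": "dhoni"}, headers={"User-agent": BROWSER_UA}
    )
)

print(default_req.headers["User-Agent"])  # begins with "python-requests/"
print(custom_req.headers["User-Agent"])   # the browser-like string
print(custom_req.url)                     # query string built from params
```

Header lookup in requests is case-insensitive, so `User-agent` and `User-Agent` refer to the same header.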

Code and example in the online IDE:

from bs4 import BeautifulSoup
import requests, lxml

headers = {
    "User-agent":
    # Adjacent string literals are concatenated, so the trailing space matters
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  'q': 'dhoni',
  'hl': 'en',
  'gl': 'uk'  # if set to "us" (United States), the HTML layout and CSS selectors would be different
}

html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')

title = soup.select_one('#rhs .mfMhoc span, .qrShPb').text
subtitle = soup.select_one('.wwUB2c span').text

try:
    snippet = soup.select_one('.zsYMMe+ span').text
except AttributeError:  # select_one() returned None, so there is no snippet
    snippet = None

print(f"{title}\n{subtitle}\n{snippet}\n")

for result in soup.select(".rVusze"):
    key_element = result.select_one(".w8qArf").text

    if result.select_one(".kno-fv"):
        value_element = result.select_one(".kno-fv").text.replace(": ", "")
    else:
        value_element = None

    key_link = f'https://www.google.com{result.select_one(".w8qArf a")["href"]}'

    try:
        key_value_link = f'https://www.google.com{result.select_one(".kno-fv a")["href"]}'
    except (TypeError, KeyError):  # no link inside the value element
        key_value_link = None

    print(f"{key_element}{value_element}\nkey_link: {key_link}\nkey_value_link: {key_value_link}")


--------------
# long output
'''
MS Dhoni
Indian cricketer
Mahendra Singh Dhoni, is a former Indian international cricketer who captained the Indian national team in limited-overs formats from 2007 to 2017 and in Test cricket from 2008 to 2014. He is widely regarded as one of the greatest in the history of cricket.

Born: 7 July 1981 (age 40 years), Ranchi, India
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+born&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWEstOttIvSM0vyEkFUkXF-XlWSflFeYtYeXOLFVIy8vMyFUB8ABdR4Gk1AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECF8QAg
key_value_link: https://www.google.com/search?hl=en&gl=uk&q=Ranchi&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdWAjMNS0pKzLTEspOt9AtS8wtyUoFUUXF-nlVSflHeIla2oMS85IzMHayMANdQE388AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4QmxMoAXoECF8QAw
Height: 1.8 m
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+height&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWks1OttIvLsgvKinWLyjKj08sychJLUm1ykjNTM8oWcTKn1uskJKRn5epABEBAAIai08-AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECHIQAg
key_value_link: None
Full name: Mahendra Singh Pansingh Dhoni
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+full+name&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWks5OttIvSM0vyEkFUkXF-XlWaaU5OQp5ibmpi1iFcosVUjLy8zIV4IIAqvu4TT8AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECHUQAg
key_value_link: None
Spouse: Sakshi Dhoni (m. 2010)
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+spouse&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWkshOttIvSM0vyEkFUkXF-XlWxQX5pcWpi1j5c4sVUjLy8zIVICIAPnRCyzkAAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECHQQAg
key_value_link: https://www.google.com/search?hl=en&gl=uk&q=Sakshi+Dhoni&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdW4gIxs0xSjIwytCSyk630C1LzC3JSgVRRcX6eVXFBfmlx6iJWnuDE7OKMTAWXjPy8zB2sjADGY2n9RQAAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4QmxMoAXoECHQQAw
Salary: 1.8 million USD (2016)
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+salary&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWks1OttIvLsgvKinWLyjKj08sychJLUm1Kk7MSSyqXMTKn1uskJKRn5epABEBAGZmveY-AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECG4QAg
key_value_link: None
Parents: Pan Singh, Devaki Devi
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+parents&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWksxOttIvSM0vyEkFUkXF-XlWBYlFqXklxYtYBXKLFVIy8vMyFaBCAFvhf287AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECGIQAg
key_value_link: https://www.google.com/search?hl=en&gl=uk&q=Pan+Singh&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdW4gIxsypNM0zNtSSzk630C1LzC3JSgVRRcX6eVUFiUWpeSfEiVs6AxDyF4My89IwdrIwAlGBEk0MAAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4QmxMoAXoECGIQAw
'''
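
Several of the lookups above can fail when a selector matches nothing, because `select_one()` then returns `None`. One way to handle that without `try`/`except` is a small helper, sketched below. The helper name `text_or_none` and the sample markup are hypothetical: the HTML is a heavily simplified stand-in that only reuses the class names from the code above, and Google's real markup is more complex and may change.

```python
from bs4 import BeautifulSoup


def text_or_none(node, selector):
    """Return the stripped text of the first match for selector, or None."""
    found = node.select_one(selector)
    return found.get_text(strip=True) if found is not None else None


# Hypothetical sample markup: the second row has a key but no value element.
sample_html = """
<div id="rhs">
  <div class="rVusze">
    <span class="w8qArf"><a href="/search?q=ms+dhoni+born">Born</a>: </span>
    <span class="kno-fv">7 July 1981</span>
  </div>
  <div class="rVusze">
    <span class="w8qArf"><a href="/search?q=ms+dhoni+height">Height</a>: </span>
  </div>
</div>
"""

# The built-in "html.parser" is used here so the sketch does not need lxml.
soup = BeautifulSoup(sample_html, "html.parser")

rows = [
    (text_or_none(row, ".w8qArf"), text_or_none(row, ".kno-fv"))
    for row in soup.select(".rVusze")
]
print(rows)
```

Rows without a value element simply yield `None` in the second position instead of raising.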

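Alternatively, you can use the Google Knowledge Graph API from SerpApi to achieve the same thing. It's a paid API with a free plan.

The difference in your case is that you don't have to figure things out and build everything from scratch, since that has already been done for the end user. The only thing that really needs to be done is to pick the data you want from the structured JSON.

Code to integrate: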

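import os
from serpapi import GoogleSearch  # pip install google-search-results

params = {
    "api_key": os.getenv("API_KEY"),  # API key read from the environment
    "engine": "google",
    "q": "dhoni",
    "hl": "en",
}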

search = GoogleSearch(params)
results = search.get_dict()

print(results['knowledge_graph'])

---------------
'''
{
  "title": "MS Dhoni",
  "description": "Mahendra Singh Dhoni, is a former Indian international cricketer who captained the Indian national team in limited-overs formats from 2007 to 2017 and in Test cricket from 2008 to 2014. He is widely regarded as one of the greatest in the history of cricket.",
  "source": {
    "name": "Wikipedia",
    "link": "https://en.wikipedia.org/wiki/MS_Dhoni"
  },
  "born": "July 7, 1981 (age 40 years), Ranchi, India",
  "born_links": [
    {
      "text": "Ranchi, India",
      "link": "https://www.google.com/search?q=Ranchi&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdWAjMNS0pKzLTEspOt9AtS8wtyUoFUUXF-nlVSflHeIla2oMS85IzMHayMANdQE388AAAA&sa=X&ved=2ahUKEwjOzafLmNvzAhWignIEHY5ZBMcQmxMoAHoFCJUBEAI"
    }
  ]
... # other data
}
'''
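
Once the dictionary is available, pulling out individual fields is ordinary dict access. The sketch below works on a hand-copied sample shaped like the output above and uses `.get()` so that a missing panel does not raise. It is written on the assumption that the `knowledge_graph` key may be absent for some queries; the function name `summarize_panel` and the choice of fields are illustrative only.

```python
def summarize_panel(results):
    """Pick a few fields from a result dict, tolerating missing keys."""
    panel = results.get("knowledge_graph") or {}
    source = panel.get("source") or {}
    return {
        "title": panel.get("title"),
        "born": panel.get("born"),
        "source_link": source.get("link"),
        "born_places": [item.get("text") for item in panel.get("born_links", [])],
    }


# Sample copied from the output above, trimmed to a few fields.
sample_results = {
    "knowledge_graph": {
        "title": "MS Dhoni",
        "source": {
            "name": "Wikipedia",
            "link": "https://en.wikipedia.org/wiki/MS_Dhoni",
        },
        "born": "July 7, 1981 (age 40 years), Ranchi, India",
        "born_links": [{"text": "Ranchi, India"}],
    }
}

print(summarize_panel(sample_results))
print(summarize_panel({}))  # no panel: every field falls back to None or []
```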

Disclaimer: I work for SerpApi.

Answered 2021-10-21T09:37:36.843