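This isn't because of JavaScript, as sberry suggested. It's because no user-agent is specified, so requests sends its default one (python-requests/...), which identifies the script as a bot. A user-agent is needed for the request to look like a "real" user visit; a bot or a browser can send a spoofed user-agent string to announce itself as a different client.
You can read more about this in a blog post I wrote on how to reduce the chance of being blocked while web scraping.
Pass a user-agent in the request headers: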
import requests

headers = {
    'User-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582'
}

# 'URL' is a placeholder for the page you want to request
requests.get('URL', headers=headers)
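To check what a script actually announces, one option is to build the request without sending it and inspect the prepared headers. This is only a minimal sketch, assuming the requests package is installed; the `prepared_user_agent` helper and the example.com URL are illustrative and not part of the original code.

```python
import requests

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
)


def prepared_user_agent(headers=None):
    """Return the User-Agent a requests.Session would send (hypothetical helper)."""
    session = requests.Session()
    # prepare_request() merges the session defaults with the given headers
    # and builds the request without touching the network.
    request = requests.Request("GET", "https://example.com", headers=headers)
    return session.prepare_request(request).headers["User-Agent"]


default_ua = prepared_user_agent()
custom_ua = prepared_user_agent({"User-agent": BROWSER_UA})

print(default_ua)  # the library default, which starts with "python-requests/"
print(custom_ua)   # the browser-like string passed in
```

Header names are matched case-insensitively, so passing 'User-agent' replaces the default 'User-Agent' rather than adding a second header.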
Code and example in the online IDE:
from bs4 import BeautifulSoup
import requests, lxml  # lxml is imported only to make sure the parser is installed

headers = {
    "User-agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
    'q': 'dhoni',
    'hl': 'en',
    'gl': 'uk'  # if set to "us" (United States), the HTML layout and CSS selectors would be different
}

html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')

title = soup.select_one('#rhs .mfMhoc span, .qrShPb').text
subtitle = soup.select_one('.wwUB2c span').text

try:
    snippet = soup.select_one('.zsYMMe+ span').text
except AttributeError:  # select_one() returned None: no snippet on the page
    snippet = None

print(f"{title}\n{subtitle}\n{snippet}\n")

for result in soup.select(".rVusze"):
    key_element = result.select_one(".w8qArf").text

    if result.select_one(".kno-fv"):
        value_element = result.select_one(".kno-fv").text.replace(": ", "")
    else:
        value_element = None  # or pass

    key_link = f'https://www.google.com{result.select_one(".w8qArf a")["href"]}'

    try:
        key_value_link = f'https://www.google.com{result.select_one(".kno-fv a")["href"]}'
    except TypeError:  # select_one() returned None: the value has no link
        key_value_link = None  # or pass

    print(f"{key_element}{value_element}\nkey_link: {key_link}\nkey_value_link: {key_value_link}")
--------------
# long output
'''
MS Dhoni
Indian cricketer
Mahendra Singh Dhoni, is a former Indian international cricketer who captained the Indian national team in limited-overs formats from 2007 to 2017 and in Test cricket from 2008 to 2014. He is widely regarded as one of the greatest in the history of cricket.
Born: 7 July 1981 (age 40 years), Ranchi, India
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+born&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWEstOttIvSM0vyEkFUkXF-XlWSflFeYtYeXOLFVIy8vMyFUB8ABdR4Gk1AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECF8QAg
key_value_link: https://www.google.com/search?hl=en&gl=uk&q=Ranchi&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdWAjMNS0pKzLTEspOt9AtS8wtyUoFUUXF-nlVSflHeIla2oMS85IzMHayMANdQE388AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4QmxMoAXoECF8QAw
Height: 1.8 m
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+height&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWks1OttIvLsgvKinWLyjKj08sychJLUm1ykjNTM8oWcTKn1uskJKRn5epABEBAAIai08-AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECHIQAg
key_value_link: None
Full name: Mahendra Singh Pansingh Dhoni
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+full+name&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWks5OttIvSM0vyEkFUkXF-XlWaaU5OQp5ibmpi1iFcosVUjLy8zIV4IIAqvu4TT8AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECHUQAg
key_value_link: None
Spouse: Sakshi Dhoni (m. 2010)
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+spouse&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWkshOttIvSM0vyEkFUkXF-XlWxQX5pcWpi1j5c4sVUjLy8zIVICIAPnRCyzkAAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECHQQAg
key_value_link: https://www.google.com/search?hl=en&gl=uk&q=Sakshi+Dhoni&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdW4gIxs0xSjIwytCSyk630C1LzC3JSgVRRcX6eVXFBfmlx6iJWnuDE7OKMTAWXjPy8zB2sjADGY2n9RQAAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4QmxMoAXoECHQQAw
Salary: 1.8 million USD (2016)
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+salary&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWks1OttIvLsgvKinWLyjKj08sychJLUm1Kk7MSSyqXMTKn1uskJKRn5epABEBAGZmveY-AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECG4QAg
key_value_link: None
Parents: Pan Singh, Devaki Devi
key_link: https://www.google.com/search?hl=en&gl=uk&q=ms+dhoni+parents&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDfWksxOttIvSM0vyEkFUkXF-XlWBYlFqXklxYtYBXKLFVIy8vMyFaBCAFvhf287AAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4Q6BMoAHoECGIQAg
key_value_link: https://www.google.com/search?hl=en&gl=uk&q=Pan+Singh&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdW4gIxsypNM0zNtSSzk630C1LzC3JSgVRRcX6eVUFiUWpeSfEiVs6AxDyF4My89IwdrIwAlGBEk0MAAAA&sa=X&ved=2ahUKEwjR4OPNl9vzAhXILs0KHU4yAf4QmxMoAXoECGIQAw
'''
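The repeated "select, then fall back to None" pattern in the script above could also be folded into small helpers. The following is a rough sketch, assuming beautifulsoup4 is installed; `text_or_none`, `link_or_none`, and the sample HTML are made up for illustration. The markup only mimics the structure the selectors expect and is not real Google output, whose class names change often.

```python
from bs4 import BeautifulSoup

# Made-up HTML that only mimics the structure targeted by the selectors.
SAMPLE_HTML = """
<div class="rVusze">
  <span class="w8qArf"><a href="/search?q=born">Born</a>: </span>
  <span class="kno-fv">7 July 1981, <a href="/search?q=Ranchi">Ranchi</a></span>
</div>
<div class="rVusze">
  <span class="w8qArf"><a href="/search?q=height">Height</a>: </span>
  <span class="kno-fv">1.8 m</span>
</div>
"""


def text_or_none(parent, selector):
    """Text of the first match, or None when nothing matches (hypothetical helper)."""
    node = parent.select_one(selector)
    return node.text if node is not None else None


def link_or_none(parent, selector, base="https://www.google.com"):
    """Absolute link of the first match, or None when there is no such link."""
    node = parent.select_one(selector)
    if node is None or not node.has_attr("href"):
        return None
    return f"{base}{node['href']}"


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = [
    (
        text_or_none(result, ".w8qArf"),
        text_or_none(result, ".kno-fv"),
        link_or_none(result, ".kno-fv a"),
    )
    for result in soup.select(".rVusze")
]

for key, value, link in rows:
    print(f"{key}{value}\nkey_value_link: {link}")
```

The helpers return None instead of raising, so a row without a link (like "Height" here) no longer needs its own try/except.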
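Alternatively, you can use the Google Knowledge Graph API from SerpApi to achieve the same thing. It's a paid API with a free plan.
The difference in your case is that you don't have to figure things out and build everything from scratch, since that's already done for the end user. The only thing that really needs to be done is to access the structured JSON and pick out the data you want.
Code to integrate: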
import os
from serpapi import GoogleSearch

params = {
    "api_key": os.getenv("API_KEY"),  # your SerpApi key, read from the environment
    "engine": "google",
    "q": "dhoni",
    "hl": "en",
}

search = GoogleSearch(params)
results = search.get_dict()

print(results['knowledge_graph'])
---------------
'''
{
  "title": "MS Dhoni",
  "description": "Mahendra Singh Dhoni, is a former Indian international cricketer who captained the Indian national team in limited-overs formats from 2007 to 2017 and in Test cricket from 2008 to 2014. He is widely regarded as one of the greatest in the history of cricket.",
  "source": {
    "name": "Wikipedia",
    "link": "https://en.wikipedia.org/wiki/MS_Dhoni"
  },
  "born": "July 7, 1981 (age 40 years), Ranchi, India",
  "born_links": [
    {
      "text": "Ranchi, India",
      "link": "https://www.google.com/search?q=Ranchi&stick=H4sIAAAAAAAAAOPgE-LUz9U3MC0wNDdWAjMNS0pKzLTEspOt9AtS8wtyUoFUUXF-nlVSflHeIla2oMS85IzMHayMANdQE388AAAA&sa=X&ved=2ahUKEwjOzafLmNvzAhWignIEHY5ZBMcQmxMoAHoFCJUBEAI"
    }
  ]
  ... # other data
}
'''
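Once the dictionary is available, picking out individual fields is plain dictionary access. Below is a small sketch that works on a trimmed copy of the output shown above (the long `stick`/`ved` query string in the link is shortened here); the `summarize` helper is hypothetical, and it uses `.get()` on the assumption that not every result carries every key.

```python
# Trimmed copy of the knowledge_graph output shown above; a real response has more keys.
knowledge_graph = {
    "title": "MS Dhoni",
    "source": {
        "name": "Wikipedia",
        "link": "https://en.wikipedia.org/wiki/MS_Dhoni",
    },
    "born": "July 7, 1981 (age 40 years), Ranchi, India",
    "born_links": [
        # link shortened for readability
        {"text": "Ranchi, India", "link": "https://www.google.com/search?q=Ranchi"},
    ],
}


def summarize(kg):
    """Pick a few fields, tolerating keys that a given result lacks (hypothetical helper)."""
    return {
        "title": kg.get("title"),
        "born": kg.get("born"),
        "source_link": kg.get("source", {}).get("link"),
        "born_places": [item["text"] for item in kg.get("born_links", [])],
    }


summary = summarize(knowledge_graph)
empty_summary = summarize({})  # no KeyError even when nothing is present

for field, value in summary.items():
    print(f"{field}: {value}")
```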
Disclaimer: I work for SerpApi.