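Ideally you would have good proxies (residential ones work best), which let you choose a specific location (country, city, or mobile carrier), together with a CAPTCHA-solving service.
As an alternative, you can use the Google Scholar API from SerpApi.
It's a paid API with a free plan that bypasses blocks from Google via proxies and CAPTCHA solving and can scale to enterprise level. You also don't have to build a parser from scratch and maintain it over time whenever the HTML changes.
It also supports cite, profile, and author results.
Example code to integrate and parse organic results: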
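As a rough illustration of the do-it-yourself route, here is a minimal sketch of routing requests through an authenticated proxy using only the Python standard library. The host, port, and credentials are hypothetical placeholders, and how you select a country, city, or carrier depends entirely on your proxy provider, so that part is not shown. CAPTCHA solving is also out of scope here.

```python
import urllib.request
from urllib.parse import quote


def proxy_url(host, port, username, password):
    """Build an authenticated HTTP proxy URL with percent-encoded credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}:{port}"


def make_opener(proxy):
    """Return an opener that sends both HTTP and HTTPS traffic through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    # A browser-like User-Agent; the exact value is an assumption.
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener


# Hypothetical endpoint and credentials; substitute your provider's values.
proxy = proxy_url("proxy.example.com", 8000, "user", "p@ss")
opener = make_opener(proxy)

# Uncomment to actually send a request through the proxy:
# html = opener.open("https://scholar.google.com/scholar?q=biology&hl=en", timeout=30).read()
```

Even with a working proxy you would still need to parse the returned HTML yourself and handle CAPTCHA pages when they appear.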
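import json
from serpapi import GoogleScholarSearch

params = {
    "api_key": "Your SerpApi API key",
    "engine": "google_scholar",  # which SerpApi engine to query
    "q": "biology",              # search query
    "hl": "en"                   # interface language
}

search = GoogleScholarSearch(params)
results = search.get_dict()      # response parsed into a Python dict

# print every organic result as pretty-printed JSON
for result in results["organic_results"]:
    print(json.dumps(result, indent=2))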
# first organic result output:
'''
{
  "position": 0,
  "title": "The biology of mycorrhiza.",
  "result_id": "6zRLFbcxtREJ",
  "link": "https://www.cabdirect.org/cabdirect/abstract/19690600367",
  "snippet": "In the second, revised and extended, edition of this work [cf. FA 20 No. 4264], two new chapters have been added (on carbohydrate physiology physiology Subject Category \u2026",
  "publication_info": {
    "summary": "JL Harley - The biology of mycorrhiza., 1969 - cabdirect.org"
  },
  "inline_links": {
    "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=6zRLFbcxtREJ",
    "cited_by": {
      "total": 704,
      "link": "https://scholar.google.com/scholar?cites=1275980731835430123&as_sdt=5,50&sciodt=0,50&hl=en",
      "cites_id": "1275980731835430123",
      "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=5%2C50&cites=1275980731835430123&engine=google_scholar&hl=en"
    },
    "related_pages_link": "https://scholar.google.com/scholar?q=related:6zRLFbcxtREJ:scholar.google.com/&scioq=biology&hl=en&as_sdt=0,50",
    "versions": {
      "total": 4,
      "link": "https://scholar.google.com/scholar?cluster=1275980731835430123&hl=en&as_sdt=0,50",
      "cluster_id": "1275980731835430123",
      "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=0%2C50&cluster=1275980731835430123&engine=google_scholar&hl=en"
    },
    "cached_page_link": "https://scholar.googleusercontent.com/scholar?q=cache:6zRLFbcxtREJ:scholar.google.com/+biology&hl=en&as_sdt=0,50"
  }
}
... other results
'''
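Once you have a result like the one above, you will usually want only a few fields from it. The following is a minimal sketch, not part of the SerpApi library: `summarize` and `cite_params` are hypothetical helper names, and the `google_scholar_cite` engine with `q` set to the `result_id` is taken from the `serpapi_cite_link` in the output above. The sample dict is a trimmed copy of that output, so the snippet runs without an API key.

```python
def summarize(result):
    """Pick a few commonly used fields out of one organic result."""
    inline = result.get("inline_links", {})
    return {
        "title": result.get("title"),
        "link": result.get("link"),
        "result_id": result.get("result_id"),
        # cited_by / versions may be missing for some results, so default to 0
        "cited_by": inline.get("cited_by", {}).get("total", 0),
        "versions": inline.get("versions", {}).get("total", 0),
    }


def cite_params(result, api_key):
    """Build search parameters for the cite engine from a result's ID."""
    return {
        "api_key": api_key,
        "engine": "google_scholar_cite",
        "q": result["result_id"],
    }


# Trimmed copy of the first organic result shown above.
sample = {
    "position": 0,
    "title": "The biology of mycorrhiza.",
    "result_id": "6zRLFbcxtREJ",
    "link": "https://www.cabdirect.org/cabdirect/abstract/19690600367",
    "inline_links": {
        "cited_by": {"total": 704, "cites_id": "1275980731835430123"},
        "versions": {"total": 4, "cluster_id": "1275980731835430123"},
    },
}

summary = summarize(sample)
print(summary["title"], summary["cited_by"], summary["versions"])
print(cite_params(sample, "Your SerpApi API key"))
```

The dict returned by `cite_params` has the same shape as `params` in the example code, so it could presumably be passed to a search object in the same way to fetch citation formats for that result.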
There's also a dedicated blog post of mine at SerpApi, "Scrape historic Google Scholar results using Python".
Disclaimer: I work for SerpApi.