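This is code I tested on several search results. To make it work for a different search, just change the URL passed to requests.get (stored in the response variable) to something like:
https://www.google.com/search?hl=en-US&q=best+cookies&tbm=nws
A shorter URL also works.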
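If you want to swap the query without editing the URL by hand, a minimal sketch could look like the following. The helper name build_news_url is made up for illustration; it only assumes the three query parameters visible in the URL above (hl, q, tbm).

```python
from urllib.parse import urlencode


def build_news_url(query: str, language: str = "en-US") -> str:
    """Hypothetical helper: build a short Google News search URL."""
    # tbm=nws selects the News tab; urlencode turns spaces into '+'
    params = {"hl": language, "q": query, "tbm": "nws"}
    return "https://www.google.com/search?" + urlencode(params)


url = build_news_url("best cookies")
print(url)
```

The resulting string can be passed to requests.get in place of the hard-coded URL.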
Code and full example:
from bs4 import BeautifulSoup
import requests

# Browser-like User-Agent so Google returns the desktop markup
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

response = requests.get(
    'https://www.google.com/search?hl=en-US&q=best+coockie&tbm=nws&sxsrf=ALeKk009n7GZbzUhUpsMTt89rigSAluBsQ%3A1616683043826&ei=I6BcYP_OMeGlrgTAwLpA&oq=best+coockie&gs_l=psy-ab.3...325216.326993.0.327292.12.12.0.0.0.0.163.1250.2j9.11.0....0...1c.1.64.psy-ab..1.0.0....0.305S8ngx0uo',
    headers=headers)
html = response.text
soup = BeautifulSoup(html, 'lxml')

# Each news result sits in a div with class "dbsr"
for heading in soup.find_all('div', class_='dbsr'):
    title = heading.find('div', class_='JheGif nDgy9d').text
    link = heading.a['href']
    print(title)
    print(link)
    print()
Output:
The BEST cookie on the planet (and the Village too!)
https://thecoastnews.com/the-best-cookie-on-the-planet-and-the-village-too/
Best baking kits for kids 2021: Cookie mixes to flapjack recipes
https://www.independent.co.uk/extras/indybest/food-drink/baking/best-kids-baking-kits-b1821245.html
The official Girl Scout cookie power rankings
https://www.latimes.com/food/story/2021-02-24/girl-scout-cookie-power-rankings
Girl Scout Cookie Taste Test: Little Brownie Bakers vs. ABC
https://www.thedailymeal.com/eat/girl-scout-cookie-taste-comparison-abc-little-brownie-bakers
Food Critic, Provocateur Definitively Ranks Girl Scout Cookies
https://www.npr.org/2021/03/07/974226510/food-critic-provocateur-definitively-ranks-girl-scout-cookies
Chef Magnus Nilsson Jam Shortbread Cookie Recipe From ...
https://www.bloomberg.com/news/articles/2021-02-26/chef-magnus-nilsson-jam-shortbread-cookie-recipe-from-faviken-breakfast
Top 10 Best Cookie Cutters 2021 – Bestgamingpro
https://bestgamingpro.com/cookie-cutters/
Learn to make a favorite Girl Scout cookie at home
https://www.latimes.com/food/story/2021-02-25/learn-to-make-the-best-girl-scout-cookie-at-home
The 5 Best Cookie Jars
https://www.elitedaily.com/p/the-5-best-cookie-jars-63505798
Ulker Biskuvi Turkey's Best Cookie Picked as Top Stock for 2021
https://www.bloomberg.com/news/articles/2021-02-25/cookie-maker-tops-turkey-s-best-stock-bets-amid-hunt-for-value
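Also, to get the .text or the URLs, you need to specify which element (a div or something else) to grab them from.
In your code you only specified a div and a class, and if you try to return .text from that, it gives you an error: AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
In that case, you can use a for loop and grab what you want inside it.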
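Here is a small offline sketch of that error and the for loop fix. The HTML snippet is made up for illustration (the "title" class and the example.com links are assumptions, not Google's real markup), and it uses the built-in html.parser so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a results page
html = """
<div class="dbsr"><a href="https://example.com/a"><div class="title">First story</div></a></div>
<div class="dbsr"><a href="https://example.com/b"><div class="title">Second story</div></a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of elements), not a single element
results = soup.find_all("div", class_="dbsr")

hit_error = False
try:
    results.text  # a ResultSet has no .text attribute
except AttributeError as error:
    hit_error = True
    print("AttributeError:", error)

# Loop over the ResultSet and read .text / href from each element instead
pairs = []
for result in results:
    title = result.find("div", class_="title").text
    link = result.a["href"]
    pairs.append((title, link))

print(pairs)
```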
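Sometimes find_all() / findAll() gives you an empty list because you didn't specify a user-agent. The default requests user-agent is different from a browser's (Google may treat it as a tablet or some other client), so the page comes back with different classes and selectors. Because of that, when you search for class_="bkWMgd", that class may simply not be in the response, since the page was generated for a different user-agent. Hope that makes sense.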
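To see the difference without sending anything to Google, you can inspect the headers requests would send. This is only a sketch: the URL is a placeholder, nothing is actually requested, and the browser string is the one from the example above.

```python
import requests

# What requests sends when you do not set a User-Agent yourself
default_ua = requests.utils.default_user_agent()
print(default_ua)  # starts with "python-requests/"

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
url = "https://www.google.com/search?q=best+cookies&tbm=nws"

# prepare_request() builds the outgoing request without sending it
with requests.Session() as session:
    default_prepared = session.prepare_request(requests.Request("GET", url))
    custom_prepared = session.prepare_request(
        requests.Request("GET", url, headers=headers))

print(default_prepared.headers["User-Agent"])
print(custom_prepared.headers["User-Agent"])
```

If the selectors return nothing, comparing these two values is a quick first check before digging into the HTML.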
I skipped the input element since it complicates things :)
Alternatively, you can use the SerpApi News Results API to get these results (and more).
Example of SerpApi JSON news results:
"news_results": [
  {
    "position": 1,
    "title": "Trump brushes aside environmental concerns, signs 2 executive ...",
    "link": "https://www.usatoday.com/story/news/nation/2019/04/10/president-trump-orders-speed-oil-gas-pipeline-projects/3431466002/",
    "source": "USA TODAY",
    "date": "6 hours ago",
    "snippet": "Aiming to streamline oil and gas pipeline projects, President Donald Trump on Wednesday signed two executive orders making it harder for ...",
    "category": "In-Depth",
    "thumbnail": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRQdBI3wIjf_BX3zfRRYJjTGRRF5CNNZvqWAuza8-4mVZ75iBjlwOVTxcfGtg6_hLyUbPQ9cFA"
  }
]
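As a rough sketch of how a response shaped like this could be processed, the snippet below parses a trimmed copy of the sample with the standard json module. The sample string is wrapped in braces to make it valid JSON, and falling back to "unknown" for a missing source is an assumption, not documented SerpApi behavior.

```python
import json

# Trimmed copy of the sample above, wrapped in {} so it parses as JSON
sample = """
{
  "news_results": [
    {
      "position": 1,
      "title": "Trump brushes aside environmental concerns, signs 2 executive ...",
      "link": "https://www.usatoday.com/story/news/nation/2019/04/10/president-trump-orders-speed-oil-gas-pipeline-projects/3431466002/",
      "source": "USA TODAY",
      "date": "6 hours ago"
    }
  ]
}
"""
data = json.loads(sample)

# .get() with a default avoids a KeyError when a field is absent
rows = [
    (item["position"], item.get("source", "unknown"), item["title"])
    for item in data.get("news_results", [])
]

for position, source, title in rows:
    print(f"{position}. [{source}] {title}")
```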
Code to integrate:
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "best cookies",
    "tbm": "nws",  # news results
    "api_key": os.getenv("API_KEY"),  # read the key from the environment
}

search = GoogleSearch(params)
results = search.get_dict()

for news_result in results["news_results"]:
    print(f"Title: {news_result['title']}\nLink: {news_result['link']}")
Disclaimer: I work for SerpApi.