你正在寻找这个:
# container with needed data: title, link, etc.
for result in soup.select('.tF2Cxc'):
link = result.select_one('.yuRUbf a')['href']
此外,在使用requests
库时,您可以像这样轻松传递 URL 参数:
# this:
main_site = 'https://www.google.com/'
search = 'search?q='
query = 'pillows'
full_url = main_site+search+query
# could be translated to this:
params = {
'q': 'minecraft',
'gl': 'us',
'hl': 'en',
}
html = requests.get('https://www.google.com/search', params=params)
在使用urllib
时,您可以这样做(在 python 3 中,这已移至urllib.parse.urlencode
):
# https://stackoverflow.com/a/54050957/15164646
# https://stackoverflow.com/a/2506425/15164646
url = "https://disc.gsfc.nasa.gov/SSW/#keywords="
params = {'keyword':"(GPM_3IMERGHHE)", 't1':"2019-01-02", 't2':"2019-01-03", 'bboxBbox':"3.52,32.34,16.88,42.89"}
quoted_params = urllib.parse.urlencode(params)
# 'bboxBbox=3.52%2C32.34%2C16.88%2C42.89&t2=2019-01-03&keyword=%28GPM_3IMERGHHE%29&t1=2019-01-02'
full_url = url + quoted_params
# 'https://disc.gsfc.nasa.gov/SSW/#keywords=bboxBbox=3.52%2C32.34%2C16.88%2C42.89&t2=2019-01-03&keyword=%28GPM_3IMERGHHE%29&t1=2019-01-02'
resp = urllib.urlopen(full_url).read()
在线IDE中的代码和示例:
from bs4 import BeautifulSoup
import requests, lxml
headers = {
'User-agent':
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582'
}
params = {
'q': 'minecraft',
'gl': 'us',
'hl': 'en',
}
html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')
for result in soup.select('.tF2Cxc'):
link = result.select_one('.yuRUbf a')['href']
print(link)
---------
'''
https://www.minecraft.net/en-us/
https://classic.minecraft.net/
https://play.google.com/store/apps/details?id=com.mojang.minecraftpe&hl=en_US&gl=US
https://en.wikipedia.org/wiki/Minecraft
'''
或者,您可以使用来自 SerpApi的Google Organic Results API来实现相同的目的。这是一个带有免费计划的付费 API。
您的情况的不同之处在于您不必从头开始制作所有内容,绕过块并随着时间的推移维护解析器。
要集成以实现您的目标的代码:
import os
from serpapi import GoogleSearch
params = {
"engine": "google",
"q": "minecraft",
"hl": "en",
"gl": "us",
"api_key": os.getenv("API_KEY"),
}
search = GoogleSearch(params)
results = search.get_dict()
for result in results["organic_results"]:
print(result['link'])
---------
'''
https://www.minecraft.net/en-us/
https://classic.minecraft.net/
https://play.google.com/store/apps/details?id=com.mojang.minecraftpe&hl=en_US&gl=US
https://en.wikipedia.org/wiki/Minecraft
'''
免责声明,我为 SerpApi 工作。