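Hi, I'm trying to find an element. I found the first element (soup1_title) easily, but I'm having a hard time finding the next one. I only need the book authors; for example, for the first result my preferred output is: "by J. G. Ballard and Martin Amis" (without the author). Here is the problem: this is the field I need.

Here is the code I wrote: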

import requests
from bs4 import BeautifulSoup
#Search_Text = input('Please Enter Search Query ')
#Search_Text = Search_Text.replace(' ','+')
url = 'https://www.amazon.com/s?k=j+g+ballard+short+stories'
#print(url)
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'lxml')
soup1_title = soup.select('.a-color-base.a-text-normal')
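# Each class in a compound selector needs its own leading dot;
# '.a-size-base'+'a-link-normal' concatenates to '.a-size-basea-link-normal'.
soup2_title = soup.select('.a-size-base.a-link-normal')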


for j in soup2_title:
    print(j.string)

The link is: https://www.amazon.com/s?k=j+g+ballard+short+stories

Guys, can you help me find the element above with Beautiful Soup?

1 Answer

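import requests
from bs4 import BeautifulSoup

# Let requests build and URL-encode the query string (?k=...)
params = {
    "k": "j g ballard short stories"
}

# Send a browser-like User-Agent header with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'
}


def main(url):
    r = requests.get(url, params=params, headers=headers)
    soup = BeautifulSoup(r.content, 'lxml')
    # Select <div> elements whose class attribute ends with "a-color-secondary"
    target = soup.select("div[class$=a-color-secondary]")
    for tar in target:
        # Keep only the blocks whose text contains "by" (the author bylines)
        if "by" in tar.text:
            print(tar.get_text(strip=True, separator=" "))


main("https://www.amazon.com/s")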

Output:

by J. G. Ballard and Martin Amis
by J. G. Ballard | Feb 1, 2010
by J. G. Ballard and Anthony Burgess
by J. G. Ballard , Ric Jerrrom , et al.
by J. G. Ballard | Sep 1, 2006
by J. G. Ballard and China Miéville
by J. G. Ballard
by J. G. Ballard
by J. G. Ballard
by J. G. Ballard
by J. G. Ballard
by J. G. Ballard
by J. G. Ballard
by J. G. Ballard
by J. G. Ballard
by J. G. Ballard
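
If only the author names are wanted, the bylines printed above could be cleaned up afterwards. The following is a minimal sketch, not part of the original answer: the helper name `extract_authors` is hypothetical, and it assumes every byline has the shape seen in the output (a leading "by", names separated by commas or "and", an optional "| date" suffix, and an optional "et al." placeholder).

```python
import re


def extract_authors(byline: str) -> list[str]:
    """Turn a byline such as 'by A and B | Feb 1, 2010' into ['A', 'B'].

    Hypothetical helper; assumes the byline shapes shown in the output above.
    """
    # Drop everything after the first '|' (a publication date in the output).
    text = byline.split("|", 1)[0].strip()
    # Remove the leading 'by' marker, if present.
    text = re.sub(r"^by\s+", "", text)
    # Split on commas and on the standalone word 'and'.
    parts = re.split(r"\s*,\s*|\s+and\s+", text)
    # Discard empty pieces and the 'et al.' placeholder.
    return [p.strip() for p in parts if p.strip() and p.strip() != "et al."]


# Sample bylines copied from the output above.
samples = [
    "by J. G. Ballard and Martin Amis",
    "by J. G. Ballard | Feb 1, 2010",
    "by J. G. Ballard , Ric Jerrrom , et al.",
]

for line in samples:
    print(extract_authors(line))
```

In `main`, the same helper could be applied to `tar.get_text(strip=True, separator=" ")` instead of printing the raw byline. Bylines in other formats would likely need additional rules.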
answered 2020-04-25T20:12:54.793