Unable to scrape @href tags from 'https://www.theaic.co.uk/aic/analyse-investment-companies'
I am using Python 3.7, scrapy and splash, and I have also tried selenium, but nothing has worked.
The table you see on the page is inside an <iframe>, so you have to load the iframe's source first:
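import requests
from bs4 import BeautifulSoup

url = 'https://www.theaic.co.uk/aic/analyse-investment-companies'
# Load the outer page, which only contains the <iframe> placeholder
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# The iframe src is treated as protocol-relative here, so prepend the scheme
soup = BeautifulSoup(requests.get('https:' + soup.article.iframe['src']).content, 'html.parser')

# Each fund-name cell holds a link; print its href
for a in soup.select('.gridFundName a'):
    print(a['href'])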
Prints:
http://www.theaic.co.uk/3IN
http://www.theaic.co.uk/AAIF
http://www.theaic.co.uk/ADIG
http://www.theaic.co.uk/AEMC
http://www.theaic.co.uk/AJIT
http://www.theaic.co.uk/ALAI
http://www.theaic.co.uk/ABD
http://www.theaic.co.uk/ANII
http://www.theaic.co.uk/ANW
http://www.theaic.co.uk/ASCI
http://www.theaic.co.uk/AASC
http://www.theaic.co.uk/AAS
http://www.theaic.co.uk/ASEI
http://www.theaic.co.uk/ASLI
http://www.theaic.co.uk/ASL
http://www.theaic.co.uk/ASIT
http://www.theaic.co.uk/ASIZ
http://www.theaic.co.uk/AIF
http://www.theaic.co.uk/AIFZ
http://www.theaic.co.uk/AEWU
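
The snippet above builds the iframe address by prepending 'https:' to the src attribute, which only works if that attribute is protocol-relative. As a minimal sketch of a more general way to resolve it, the following uses only the standard library: `html.parser` to find the iframe and `urllib.parse.urljoin` to turn the src into an absolute URL. The names `IframeSrcParser` and `iframe_urls`, and the sample markup with its example.com address, are hypothetical and not taken from the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_url, html):
    """Return absolute URLs for all iframes found in html."""
    parser = IframeSrcParser()
    parser.feed(html)
    # urljoin copes with protocol-relative ("//host/path"),
    # relative ("frame.html") and already absolute src values.
    return [urljoin(page_url, src) for src in parser.sources]


# Hypothetical markup standing in for the real page
sample = '<article><iframe src="//example.com/grid?x=1"></iframe></article>'
page = "https://www.theaic.co.uk/aic/analyse-investment-companies"
print(iframe_urls(page, sample))
```

The resulting URL can then be passed to requests.get in place of the string concatenation, so the code keeps working if the site later switches to a relative or fully qualified src.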
Answered 2020-09-13T08:58:46.313