
Can you help me with this problem? I'm trying to scrape the page https://industrydirectory.mjbizdaily.com/accounting/ and collect all of its links, such as https://industrydirectory.mjbizdaily.com/420-businesses/, but I can't figure out how.

from bs4 import BeautifulSoup
import requests

url = 'https://industrydirectory.mjbizdaily.com/accounting/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
test = soup.find_all('ul', class_='business-results')
print(test)

2 Answers


You can get all the URLs with the CSS selector #main a:

urls = [url["href"] for url in soup.select("#main a")]
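A self-contained sketch of the same selector idea, run against a small hypothetical HTML snippet instead of the live page (whose markup may have changed since 2019). It assumes some links could be relative, so it resolves them with urllib.parse.urljoin, and it uses #main a[href] so that anchors without an href are skipped instead of raising KeyError. SAMPLE_HTML, BASE_URL and extract_urls are illustrative names, not part of the original answer.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the directory page; the real page may differ.
SAMPLE_HTML = """
<div id="main">
  <ul class="business-results">
    <li><a href="https://industrydirectory.mjbizdaily.com/420-businesses/">420 Businesses</a></li>
    <li><a href="/another-business/">Another Business</a></li>
    <li><a name="anchor-without-href">No link here</a></li>
  </ul>
</div>
<a href="/outside-main/">Not in #main</a>
"""

BASE_URL = "https://industrydirectory.mjbizdaily.com/accounting/"


def extract_urls(html, base_url):
    """Return absolute URLs for every <a href> inside the element with id="main"."""
    soup = BeautifulSoup(html, "html.parser")
    # a[href] matches only anchors that actually carry an href attribute;
    # urljoin turns relative hrefs into absolute URLs and leaves absolute ones alone.
    return [urljoin(base_url, a["href"]) for a in soup.select("#main a[href]")]


if __name__ == "__main__":
    for link in extract_urls(SAMPLE_HTML, BASE_URL):
        print(link)
```

On the live site you would pass response.text and the page URL instead of the sample values.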

Or, to get a list of dictionaries with the link text as the key and the URL as the value:

urls = []
for link in soup.select("#main a"):
    # Each entry maps the link's visible text to its href
    print(link.text, link["href"])
    urls.append({link.text: link["href"]})
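If the goal is simply to look a URL up by its link text, a single dict may be easier to work with than a list of one-item dicts. A minimal sketch, assuming hypothetical markup (SAMPLE_HTML and links_by_text are illustrative names); note that two links with the same text would collapse into one entry, with the later one winning.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real directory page may be structured differently.
SAMPLE_HTML = """
<div id="main">
  <a href="https://industrydirectory.mjbizdaily.com/420-businesses/"> 420 Businesses </a>
  <a href="https://industrydirectory.mjbizdaily.com/accounting/">Accounting</a>
</div>
"""


def links_by_text(html):
    """Map each link's visible text (with surrounding whitespace stripped) to its href."""
    soup = BeautifulSoup(html, "html.parser")
    # A later link with the same text overwrites an earlier one in the dict.
    return {a.get_text(strip=True): a["href"] for a in soup.select("#main a[href]")}


if __name__ == "__main__":
    print(links_by_text(SAMPLE_HTML))
```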
answered 2019-10-23T09:18:38.067

This is what you're looking for. It loops over the test list your code already builds:

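# test is the list of <ul class="business-results"> elements from the question's code
for each in test:
    for li in each.find_all('li'):
        a = li.find('a')
        # Skip list items with no link (or a link without an href)
        if a is not None and a.has_attr('href'):
            print(a['href'])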
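The same nested loop can be expressed as one CSS selector, ul.business-results li a[href], which also skips items without a link. The sketch below additionally removes duplicate URLs while keeping their first-seen order; the deduplication step and the sample markup (SAMPLE_HTML, business_links) are hypothetical additions for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup using the business-results class from the question.
SAMPLE_HTML = """
<ul class="business-results">
  <li><a href="https://industrydirectory.mjbizdaily.com/420-businesses/">420 Businesses</a></li>
  <li>No link in this item</li>
  <li><a href="https://industrydirectory.mjbizdaily.com/another-business/">Another</a></li>
  <li><a href="https://industrydirectory.mjbizdaily.com/420-businesses/">Duplicate</a></li>
</ul>
<ul class="other-list">
  <li><a href="https://example.com/ignored/">Ignored</a></li>
</ul>
"""


def business_links(html):
    """Return unique hrefs from links inside <ul class="business-results"> list items."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = (a["href"] for a in soup.select("ul.business-results li a[href]"))
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return list(dict.fromkeys(hrefs))


if __name__ == "__main__":
    for href in business_links(SAMPLE_HTML):
        print(href)
```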
answered 2019-10-23T09:09:51.670