python - BeautifulSoup broken link checker/web crawler

Question

I am trying to build a broken link checker based on this how-to: https://dev.to/arvindmehairjan/build-a-web-crawler-to-check-for-broken-links-with-python-beautifulsoup-39mg

However, one line of my code has a problem: when I run the program, I get the following error message:

  File "/Users/Documents/brokenlinkchecker.py", line 26
    print(f"Url: {link.get('href')} " + f"| Status Code: {response_code}")
SyntaxError: invalid syntax

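I am stuck on what could be causing this syntax error. Does anyone have suggestions for what I can do to get this program working?

Thanks very much!

Here is the code: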

# Import libraries
from bs4 import BeautifulSoup, SoupStrainer
import requests

# Prompt user to enter the URL
url = input("Enter your url: ")

# Make a request to get the URL
page = requests.get(url)

# Get the response code of given URL
response_code = str(page.status_code)

# Display the text of the URL in str
data = page.text

# Use BeautifulSoup to use the built-in methods
soup = BeautifulSoup(data)

# Iterate over all links on the given URL with the response code next to it
for link in soup.find_all('a'):
    print(f"Url: {link.get('href')} " + f"| Status Code: {response_code}")

score 0 · Accepted Answer

You must pass the additional argument features="lxml" or features="html.parser" to the BeautifulSoup constructor.

soup = BeautifulSoup(data, features="html.parser")
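
As a rough illustration of the explicit-parser call above, here is a minimal sketch that collects the links from a page. The helper name extract_links and the sample_html string are hypothetical and not part of the original code; the sample string stands in for page.text so the sketch runs without a network request. The output uses str.format rather than an f-string so that it also runs on interpreters older than Python 3.6, where f-strings raise a SyntaxError. That may or may not be what the question ran into.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag in html that has an href."""
    # Naming the parser explicitly avoids BeautifulSoup's
    # "no parser was explicitly specified" warning.
    soup = BeautifulSoup(html, features="html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):  # skip <a> tags without href
        # Resolve relative paths such as "/about" against the page URL.
        links.append(urljoin(base_url, anchor["href"]))
    return links


# Hypothetical sample page (an assumption for illustration only).
sample_html = """
<html><body>
  <a href="/about">About</a>
  <a href="https://example.com/contact">Contact</a>
  <a name="top">No href here</a>
</body></html>
"""

for link in extract_links(sample_html, "https://example.com"):
    print("Url: {}".format(link))
```

Note that in the question's loop, response_code holds the status of the page itself, so every link is printed with the same code. To report whether each link is broken, each URL would need its own request, for example with requests.get or requests.head.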
