为什么我在尝试抓取 hacked.com 时会收到 403,我该如何绕过它?根据 doesitusecloudflare.com 的说法,没有 cloudflare 阻碍(http://www.doesitusecloudflare.com/?url=https%3A%2F%2Fhacked.com%2Fwp-login.php)robots.txt允许任何用户代理并且只禁止访问 wp-admin 登录。
>>> import mechanicalsoup
>>> browser = mechanicalsoup.StatefulBrowser()
>>> browser.get('https://google.com')
<Response [200]>
>>> browser.get('https://hacked.com')
<Response [403]>
>>> browser.get('https://hacked.com').content
b'<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body bgcolor="white">\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n'