python - Python: parsing a website gives

Question

I need to parse a website, but when I try to parse it, the response I get is <html></html>.

I tried changing the user agent and the cookies; neither helped.

from bs4 import BeautifulSoup
import httpx

response = httpx.get('https://lolz.guru/market/')
soup = BeautifulSoup(response.text, 'lxml')

print(response.text)

score 0 · Accepted Answer

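If the site requires a real browser, you can try using a real browser to retrieve the page and its data. Selenium is a tool designed for testing web applications, but in essence it runs scripts that simulate a user's interaction with a web browser in order to check the application.

There are good tutorials out there, and Selenium can be used from Python as well.

It also supports cookies: https://www.selenium.dev/documentation/webdriver/browser/cookies/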
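As a rough illustration of retrieving a page this way, here is a minimal sketch. It assumes Chrome and a matching driver are installed; the helper names has_visible_text and fetch_rendered_html and the 10-second timeout are hypothetical choices, not part of Selenium. It waits until the page source contains some visible text, which is one simple way to tell a rendered page apart from an empty <html></html> shell.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects non-blank text nodes, ignoring <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def has_visible_text(html):
    """Return True if the markup contains any text outside script/style."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return bool(parser.chunks)


def fetch_rendered_html(url, timeout=10.0):
    """Load url in Chrome and return the page source once text has appeared.

    Hypothetical helper: raises Selenium's TimeoutException if the page
    still has no visible text after `timeout` seconds.
    """
    # Imported here so has_visible_text() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(
            lambda d: has_visible_text(d.page_source)
        )
        return driver.page_source
    finally:
        driver.quit()
```

The returned string can be passed to BeautifulSoup in the same way as response.text in the question. Whether a given site serves real content to an automated browser is not guaranteed.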

from selenium import webdriver

driver = webdriver.Chrome()

driver.get("http://www.example.com")

# Adds the cookie into current browser context
driver.add_cookie({"name": "key", "value": "value"})
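If the cookies were copied from a browser as a single Cookie header string, they need to be split into one dictionary per cookie before being passed to add_cookie. A minimal sketch, assuming a header of the form "name=value; name2=value2"; the helper names cookie_header_to_dicts and add_cookies are hypothetical:

```python
from http.cookies import SimpleCookie


def cookie_header_to_dicts(header):
    """Turn 'a=1; b=2' into [{'name': 'a', 'value': '1'}, {'name': 'b', 'value': '2'}]."""
    jar = SimpleCookie()
    jar.load(header)
    return [{"name": morsel.key, "value": morsel.value} for morsel in jar.values()]


def add_cookies(driver, header):
    """Add every cookie from a Cookie header string to a Selenium driver.

    The driver must already have loaded a page on the cookies' domain,
    as in the driver.get(...) call above.
    """
    for cookie in cookie_header_to_dicts(header):
        driver.add_cookie(cookie)
```

After adding the cookies, calling driver.refresh() or driver.get(...) again sends them with the next request.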

score 0 · Accepted Answer

You can also use requests_html, which is able to render JavaScript:

from bs4 import BeautifulSoup
from requests_html import HTMLSession


session = HTMLSession()
resp = session.get('https://lolz.guru/market/')

resp.html.render(sleep=1, keep_page=True)
soup = BeautifulSoup(resp.html.html, "lxml")

# print the text of the rendered page
print(soup.text)

You can install it with pip: pip install requests-html
