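I am trying to parse a table from a dynamic web page into a Pandas DataFrame. Using the trick suggested in an answer to another Stack Overflow question (HTTP Error 403: Forbidden when reading HTML), I do get a response. However, when I try to convert it into a Pandas DataFrame, I get a ParserError. Any ideas?
The code, output, and error are below.
Code: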
import io

import pandas as pd
import requests

url = "https://www.barchart.com/options/volume-leaders/stocks"
# Browser-like headers, as suggested in the linked 403 answer
header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}
r = requests.get(url, headers=header)
print("Encoding of response object is: ", r.encoding)
r_content = r.content
# This is the call that raises the ParserError shown below
pd.read_csv(io.StringIO(r_content.decode('utf-8')))
Output of the print() statement:
Encoding of response object is: UTF-8
Error:
ParserError: Error tokenizing data. C error: Expected 12 fields in line 78, saw 219
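One possible reading of the error, sketched offline: if the response body is HTML rather than CSV (an assumption, since the URL is an ordinary web page), then each line of markup contains an arbitrary number of commas, and a CSV tokenizer would see a different field count on different lines. The snippet below uses only the standard library; `sample_html` is a made-up stand-in for the decoded response and `field_counts` is a hypothetical helper, neither comes from the real page.

```python
import csv
import io

# Made-up stand-in for r.content.decode("utf-8"); the real page is much larger.
sample_html = """<html>
<head><title>Options Volume Leaders</title></head>
<body>
<script>var ids = [1, 2, 3, 4];</script>
<table><tr><td>AAPL</td><td>100</td></tr></table>
</body>
</html>
"""


def field_counts(text):
    """Return how many comma-separated fields a CSV reader sees on each line."""
    return [len(row) for row in csv.reader(io.StringIO(text))]


counts = field_counts(sample_html)
print(counts)
```

The counts are not uniform across lines, which matches the shape of the message above ("Expected 12 fields in line 78, saw 219"): the parser settles on a column count from the early lines and then meets a line with many more commas.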