python - Getting data from an HTML frame

Question

I am trying to get the table data from the frame in this HTML page. I mean the table with these columns:

Year,Month,Oil Production m3,Gas Production Ksm3,...

Using BeautifulSoup, this is what I have tried so far:

from urllib.request import urlopen  # Python 3; in Python 2 this was urllib.urlopen

from bs4 import BeautifulSoup

url_base = 'https://www.og.decc.gov.uk/information/wells/pprs/Well_production_onshore_oil_fields/onshore_oil_fields_by_well/onshore_oil_fields_by_wel.html'
u = urlopen(url_base)
html = u.read().decode('utf-8')
u.close()
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')

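But this only retrieves the outer page, not the content of the frame. When I set url_base to the frame's link instead, I am told that the requested page is outdated.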

score 1 · Accepted Answer

I think you copied the wrong URL. It worked when I used the following:

url_base = 'https://www.og.decc.gov.uk/information/wells/pprs/Well_production_onshore_oil_fields/onshore_oil_fields_by_well/0.htm'

Note: the frame URL ends in .../onshore_oil_fields_by_well/0.htm,

not .../onshore_oil_fields_by_well/0.html
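Copying a frame URL by hand is easy to get wrong, so one option is to read the frame addresses out of the outer page itself. The following is only a minimal sketch of that idea, not the answerer's code: the helper names frame_urls and table_rows, the example.com base URL, and both inline HTML snippets are made up for illustration, and it assumes the frame is declared with an ordinary frame or iframe tag that has a src attribute.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def frame_urls(html, base_url):
    """Return the absolute URL of every <frame>/<iframe> src in the page."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        urljoin(base_url, tag["src"])  # resolve relative src against the outer page
        for tag in soup.find_all(["frame", "iframe"])
        if tag.get("src")
    ]


def table_rows(html):
    """Return each table row as a list of stripped cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


# Hypothetical frameset page standing in for the real outer page.
outer = '<frameset><frame src="0.htm"><frame src="nav.htm"></frameset>'
base = "https://example.com/reports/index.html"
print(frame_urls(outer, base))

# Hypothetical frame content with the kind of table the question describes.
inner = (
    "<table><tr><th>Year</th><th>Month</th></tr>"
    "<tr><td>2010</td><td>1</td></tr></table>"
)
print(table_rows(inner))
```

With a real site, each URL returned by frame_urls would be fetched with urlopen in the same way as the outer page, and the decoded response passed to table_rows.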
