我在尝试从此 Web 表中获取数据时遇到问题。我想知道是否有人可以阐明我的情况,谢谢!
HTML:
您想使用pandas.read_html()
以将 HTML 表收集到pandas
DataFrame 中。
用法:
>>> import pandas as pd
>>> pd.read_html(io)
来自文档的位置:
io : str 或类似文件
的URL、类似文件的对象或包含 HTML 的原始字符串。
例如,查看这个 Wikipedia URL:
In [0]: url = "https://en.wikipedia.org/wiki/List_of_Olympic_records_in_swimming"
In [1]: pd.read_html(url)[0]
Out[1]:
Event Time Name Nation Games Date Ref
0 50 m freestyle 21.30 César Cielo Brazil (BRA) 2008 Beijing 16 August 2008 [4]
1 100 m freestyle 47.05 Eamon Sullivan Australia (AUS) 2008 Beijing 13 August 2008 [5][A]
2 200 m freestyle 1:42.96 Michael Phelps United States (USA) 2008 Beijing 12 August 2008 [6]
3 400 m freestyle 3:40.14 Sun Yang China (CHN) 2012 London 28 July 2012 [7]
4 1500 m freestyle ♦14:31.02 Sun Yang China (CHN) 2012 London 4 August 2012 [8]
5 100 m backstroke ♦51.85 Ryan Murphy United States (USA) 2016 Rio de Janeiro 13 August 2016 [9][B]
6 200 m backstroke 1:53.41 Tyler Clary United States (USA) 2012 London 2 August 2012 [10]
7 100 m breaststroke 57.13 Adam Peaty Great Britain (GBR) 2016 Rio de Janeiro 7 August 2016 [11]
8 200 m breaststroke 2:07.22 Ippei Watanabe Japan (JPN) 2016 Rio de Janeiro 9 August 2016 [12][C]
9 100 m butterfly 50.39 Joseph Schooling Singapore (SGP) 2016 Rio de Janeiro 12 August 2016 [13]
10 200 m butterfly 1:52.03 Michael Phelps United States (USA) 2008 Beijing 13 August 2008 [14]
11 200 m individual medley 1:54.23 Michael Phelps United States (USA) 2008 Beijing 15 August 2008 [15]
12 400 m individual medley ♦4:03.84 Michael Phelps United States (USA) 2008 Beijing 10 August 2008 [16]
13 4×100 m freestyle relay ♦3:08.24 Michael Phelps (47.51)Garrett Weber-Gale (47.0... United States (USA) 2008 Beijing 11 August 2008 [17]
14 4×200 m freestyle relay 6:58.56 Michael Phelps (1:43.31)Ryan Lochte (1:44.28)R... United States (USA) 2008 Beijing 13 August 2008 [18]
15 4×100 m medley relay 3:27.95 Ryan Murphy (51.85)Cody Miller (59.03) Michael... United States (USA) 2016 Rio de Jainero 13 August 2016 [19]