
Using pd.read_html in Python, I am trying to copy a table from the following website: https://finance.naver.com/sise/investorDealTrendDay.nhn?bizdate=215600&sosok=&page=2

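import pandas as pd

pg_url = 'https://finance.naver.com/sise/investorDealTrendDay.nhn?bizdate=215600&sosok=&page=2'
# DataFrame.append was removed in pandas 2.0; take the first parsed table directly
df = pd.read_html(pg_url, header=0)[0]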

However, for some reason the numbers are not copied...

Thanks for any help figuring out what is going wrong.


1 Answer


It works fine for me. Remove header=0, then drop only the rows that are entirely NaN:

url = 'https://finance.naver.com/sise/investorDealTrendDay.nhn?bizdate=215600&sosok=&page=2'

df = pd.read_html(url)[0].dropna(how='all')
print(df)
          날짜       개인      외국인     기관계      기관                              \
          날짜       개인      외국인     기관계    금융투자     보험 투신(사모)     은행 기타금융기관   
0   20.08.06   -850.0   1638.0  -801.0  2247.0 -517.0 -993.0   46.0 -138.0   
1   20.08.05   4315.0   -516.0 -3666.0 -1277.0 -441.0 -871.0  -18.0  -30.0   
2   20.08.04   1844.0   -583.0 -1488.0   392.0 -493.0 -205.0   14.0  -54.0   
3   20.08.03   6237.0  -2687.0 -3795.0 -2841.0 -108.0 -411.0    0.0   -5.0   
4   20.07.31   4716.0   -556.0 -3861.0 -2659.0 -129.0 -709.0   -7.0   -4.0   
8   20.07.30     64.0   2247.0 -2342.0   423.0 -171.0 -428.0   -3.0  -13.0   
9   20.07.29    476.0   2936.0 -3368.0 -1346.0 -296.0 -698.0   -8.0  -92.0   
10  20.07.28 -10495.0  13060.0 -2220.0 -1440.0 -526.0  318.0   12.0  -76.0   
11  20.07.27  -2996.0   1584.0  1395.0  1968.0  -20.0  161.0 -179.0  -58.0   
12  20.07.24   2881.0    876.0 -3678.0 -1173.0 -545.0 -843.0  -43.0   -8.0   

             기타법인  
      연기금등   기타법인  
0  -1446.0   13.0  
1  -1029.0 -133.0  
2  -1142.0  227.0  
3   -429.0  246.0  
4   -352.0 -299.0  
8  -2151.0   30.0  
9   -929.0  -44.0  
10  -507.0 -345.0  
11  -476.0   16.0  
12 -1066.0  -79.0  
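
The index in the output above jumps from 4 to 8, which is consistent with rows 5 to 7 being all-NaN rows that `dropna(how='all')` removed. As a minimal sketch of that behaviour on made-up data (the column names `date` and `net` and the values are hypothetical, not taken from the page):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: row 1 is an all-NaN spacer row,
# row 2 has a date but is missing its number.
df = pd.DataFrame(
    {
        "date": ["20.08.06", np.nan, "20.08.05"],
        "net": [-850.0, np.nan, np.nan],
    }
)

# how="all" drops a row only when every column in it is NaN.
only_empty_removed = df.dropna(how="all")

# The default (how="any") drops a row as soon as one value is NaN.
any_nan_removed = df.dropna()

print(only_empty_removed.index.tolist())
print(any_nan_removed.index.tolist())
```

The original index labels are kept after dropping, so gaps remain. If consecutive labels are preferred, `reset_index(drop=True)` can be chained after `dropna`.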

If you need the first column as the index, converted to a DatetimeIndex:

url = 'https://finance.naver.com/sise/investorDealTrendDay.nhn?bizdate=215600&sosok=&page=2'

df = pd.read_html(url, index_col=0)[0].dropna(how='all')
df.index = pd.to_datetime(df.index, format='%y.%m.%d')
print(df)
날짜               개인      외국인     기관계      기관                              \
날짜               개인      외국인     기관계    금융투자     보험 투신(사모)     은행 기타금융기관   
2020-08-06   -850.0   1638.0  -801.0  2247.0 -517.0 -993.0   46.0 -138.0   
2020-08-05   4315.0   -516.0 -3666.0 -1277.0 -441.0 -871.0  -18.0  -30.0   
2020-08-04   1844.0   -583.0 -1488.0   392.0 -493.0 -205.0   14.0  -54.0   
2020-08-03   6237.0  -2687.0 -3795.0 -2841.0 -108.0 -411.0    0.0   -5.0   
2020-07-31   4716.0   -556.0 -3861.0 -2659.0 -129.0 -709.0   -7.0   -4.0   
2020-07-30     64.0   2247.0 -2342.0   423.0 -171.0 -428.0   -3.0  -13.0   
2020-07-29    476.0   2936.0 -3368.0 -1346.0 -296.0 -698.0   -8.0  -92.0   
2020-07-28 -10495.0  13060.0 -2220.0 -1440.0 -526.0  318.0   12.0  -76.0   
2020-07-27  -2996.0   1584.0  1395.0  1968.0  -20.0  161.0 -179.0  -58.0   
2020-07-24   2881.0    876.0 -3678.0 -1173.0 -545.0 -843.0  -43.0   -8.0   

날짜                   기타법인  
날짜            연기금등   기타법인  
2020-08-06 -1446.0   13.0  
2020-08-05 -1029.0 -133.0  
2020-08-04 -1142.0  227.0  
2020-08-03  -429.0  246.0  
2020-07-31  -352.0 -299.0  
2020-07-30 -2151.0   30.0  
2020-07-29  -929.0  -44.0  
2020-07-28  -507.0 -345.0  
2020-07-27  -476.0   16.0  
2020-07-24 -1066.0  -79.0  
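
The explicit `format='%y.%m.%d'` tells pandas that strings such as `20.08.06` are two-digit year, month, day, rather than leaving the order to be inferred. A small offline sketch of just that conversion, using two date strings copied from the output above (the variable names are illustrative):

```python
import pandas as pd

# Date strings in the same yy.mm.dd layout as the table's first column.
dates = pd.Index(["20.08.06", "20.07.31"])

# %y = two-digit year, %m = month, %d = day, separated by literal dots.
idx = pd.to_datetime(dates, format="%y.%m.%d")

print(idx)
```

Passing an `Index` to `pd.to_datetime` gives back a `DatetimeIndex`, which is why the result can be assigned straight to `df.index`.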
answered 2020-08-21T06:27:29.073