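I'd suggest that @Wouter put this into his (accepted) answer, since it uses the actual names from the WDI data and makes things more cut-and-paste for anyone else working with them. Sorry, I'm sure this isn't the right way to communicate that...
For any variable you want to keep/use, just give it a name in this dictionary (entries left as '' are dropped):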
WDIconversions={"Year":'year',
"YearCode":'',
"Country Name":'country_name_wb',
"Country Code":'countryCode_ISO3_WB',
"Inflation, consumer prices (annual %)":'',
"Inflation, GDP deflator (annual %)":'',
"GDP per capita, PPP (constant 2005 international $)":'GDPpc',
"Firms with female participation in ownership (% of firms)":'',
"Investment in energy with private participation (current US$)":'',
"Investment in telecoms with private participation (current US$)":'',
"Investment in transport with private participation (current US$)":'',
"Investment in water and sanitation with private participation (current US$)":'',
"Labor participation rate, female (% of female population ages 15+)":'',
"Labor participation rate, male (% of male population ages 15+)":'',
"Labor participation rate, total (% of total population ages 15+)":'',
"Ratio of female to male labor participation rate (%)":'',
"Life expectancy at birth, female (years)":'',
"Life expectancy at birth, male (years)":'',
"Life expectancy at birth, total (years)":'lifeExpectancy',
"Population, total":'nat_pop',
"GINI index":'GiniWB',
} # etc etc etc
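import pandas as pd
# WBDrawfile: path to the raw tab-delimited WDI download (define this yourself)
dfW = pd.read_table(WBDrawfile)
df = dfW.set_index(['Country Name', 'Country Code', 'Indicator Name'])
del df['Indicator Code']
df.columns.name = 'year'
# Move the year columns into rows, then spread the indicators out into columns
df = df.stack().unstack('Indicator Name')
# Keep only indicators that were given a non-empty name above, then rename them
keep = [kk for kk, ii in WDIconversions.items() if ii and kk in df]
df = df[keep].reset_index().rename(columns=WDIconversions)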
This results in:
df
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12983 entries, 0 to 12982
Data columns:
country_name_wb 12983 non-null values
countryCode_ISO3_WB 12983 non-null values
year 12983 non-null values
GiniWB 845 non-null values
nat_pop 12601 non-null values
GDPpc 6292 non-null values
educPrimary 4949 non-null values
lifeExpectancy 11077 non-null values
dtypes: float64(5), object(3)
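
To try the reshape without downloading the WDI file, here is a minimal sketch on a toy frame. The country names, codes, and values are invented for illustration, and `raw`, `conversions`, and `tidy` are hypothetical names, not part of the WDI data or of the code above. It assumes the raw file has one row per country/indicator pair and one column per year, which is the layout the code above relies on.

```python
import pandas as pd

# Toy stand-in for the raw WDI file: one row per (country, indicator),
# one column per year. All values here are made up.
raw = pd.DataFrame({
    "Country Name": ["Aland", "Aland", "Borduria", "Borduria"],
    "Country Code": ["ALD", "ALD", "BRD", "BRD"],
    "Indicator Name": ["Population, total", "GINI index"] * 2,
    "Indicator Code": ["SP.POP.TOTL", "SI.POV.GINI"] * 2,
    "2000": [100.0, 30.0, 200.0, 40.0],
    "2001": [110.0, None, 210.0, 41.0],
})

# Same idea as WDIconversions: an empty name means "do not keep".
conversions = {
    "Population, total": "nat_pop",
    "GINI index": "GiniWB",
    "Inflation, consumer prices (annual %)": "",
}

df = raw.drop(columns="Indicator Code").set_index(
    ["Country Name", "Country Code", "Indicator Name"])
df.columns.name = "year"

# stack() moves the year columns into the row index;
# unstack() then turns each indicator into its own column.
long_df = df.stack().unstack("Indicator Name")

# Keep only indicators with a non-empty new name that exist in the data.
keep = [k for k, v in conversions.items() if v and k in long_df]
tidy = long_df[keep].reset_index().rename(columns=conversions)

print(tidy)
```

The result has one row per country and year. Note that the `year` values are still strings, because they started out as column labels; something like `tidy["year"].astype(int)` is one option if numeric years are needed.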