I have a dataframe like this:         
     Country                           Energy Supply  Energy Supply per Capita
16   Afghanistan                       3.210000e+08   10.0
17   Albania                           1.020000e+08   35.0
18   Algeria                           1.959000e+09   51.0
19   American Samoa                    NaN            NaN
40   Bolivia (Plurinational State of)  3.360000e+08   32.0
...  ...                               ...            ...
213  Switzerland17                     1.113000e+09   136.0
214  Syrian Arab Republic              5.420000e+08   28.0
215  Tajikistan                        1.060000e+08   13.0
216  Thailand                          5.336000e+09   79.0
228  Ukraine18                         4.844000e+09   107.0
232  United States of America20        9.083800e+10   286.0

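I need to clean the name of every country whose name contains parentheses or digits. For example, "Bolivia (Plurinational State of)" should become "Bolivia", "Switzerland17" should become "Switzerland", and "United States of America20" should become "United States of America". I tried this with replace() and split(), but nothing worked for me.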

Can someone help me with this?


3 Answers


You can use str.replace with a regex that combines several patterns, like this:

Consider the following dataframe:

In [1431]: df 
Out[1431]: 
                            Country
0                       Afghanistan
1  Bolivia (Plurinational State of)
2                     Switzerland17

In [1433]: df['Country'] = df['Country'].str.replace(r"\(.*\)|\d+", '', regex=True)
In [1434]: df  
Out[1434]: 
         Country
0    Afghanistan
1       Bolivia 
2    Switzerland
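
The session above can also be written as a self-contained script. This is a minimal sketch rather than the answerer's exact code. It assumes a recent pandas, where `str.replace` treats the pattern as a literal string unless `regex=True` is passed. It also adds two things that are not in the original answer: a `\s*` in front of the parenthesis pattern and a final `str.strip()`, so that "Bolivia (Plurinational State of)" becomes "Bolivia" with no trailing space. The sample names are taken from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": [
        "Afghanistan",
        "Bolivia (Plurinational State of)",
        "Switzerland17",
        "United States of America20",
    ]
})

# \s*\(.*\)  matches a parenthesised part plus the whitespace before it
# \d+        matches any run of digits (footnote markers such as "17")
pattern = r"\s*\(.*\)|\d+"

# regex=True is needed on pandas >= 2.0, where the default is a literal match.
df["Country"] = df["Country"].str.replace(pattern, "", regex=True).str.strip()

print(df["Country"].tolist())
# ['Afghanistan', 'Bolivia', 'Switzerland', 'United States of America']
```

Because `str.replace` removes only the matched pieces, any text that follows a parenthesised part or a digit would be kept.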
answered 2020-06-02T18:38:46.067

You can use this regex pattern with str.extract:

df['Country'] = df.Country.str.extract(r'^([^\d\(]*)')[0]

Output:

                      Country  Energy Supply  Energy Supply per Capita
16                Afghanistan   3.210000e+08                      10.0
17                    Albania   1.020000e+08                      35.0
18                    Algeria   1.959000e+09                      51.0
19             American Samoa            NaN                       NaN
40                   Bolivia    3.360000e+08                      32.0
213               Switzerland   1.113000e+09                     136.0
214      Syrian Arab Republic   5.420000e+08                      28.0
215                Tajikistan   1.060000e+08                      13.0
216                  Thailand   5.336000e+09                      79.0
228                   Ukraine   4.844000e+09                     107.0
232  United States of America   9.083800e+10                     286.0
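
The pattern `^([^\d\(]*)` captures everything from the start of the string up to the first digit or opening parenthesis. Here is a minimal sketch of that idea on a few names from the question. The `expand=False` argument (which returns a Series instead of a one-column DataFrame) and the trailing `str.rstrip()` are additions that are not part of the original answer, and the `None` entry is a made-up stand-in for a missing name.

```python
import pandas as pd

names = pd.Series([
    "Albania",
    "Bolivia (Plurinational State of)",
    "Ukraine18",
    None,  # hypothetical missing value, to show it passes through
])

# Capture the leading run of characters that are neither digits nor "(".
# Inside a character class, "(" does not need to be escaped.
cleaned = names.str.extract(r"^([^\d(]*)", expand=False).str.rstrip()

print(cleaned.tolist())
```

The first three entries come out as "Albania", "Bolivia" and "Ukraine", and the missing value stays missing. Unlike `str.replace`, this approach keeps only the prefix, so anything after the first digit or parenthesis is dropped as well.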
answered 2020-06-02T18:38:32.970
df['Country'] = df['Country'].str.extract(r"([^(\d]+)", expand=False)
      Country              Energy Supply     Energy Supply per Capita
16   Afghanistan           3.210000e+08     10.0
17   Albania               1.020000e+08     35.0
18   Algeria               1.959000e+09     51.0
19   American Samoa                 NaN     NaN
40   Bolivia               3.360000e+08     32.0
213  Switzerland           1.113000e+09     136.0
214  Syrian Arab Republic  5.420000e+08     28.0
215  Tajikistan            1.060000e+08     13.0
216  Thailand              5.336000e+09     79.0
228  Ukraine               4.844000e+09     107.0
answered 2020-06-02T19:57:27.500