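All I did was swap the order of the 'years' and 'year' replacements: whichever was on the first line moved to the second, and vice versa.
Here is the original column: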
10+ years 653
< 1 year 249
2 years 243
3 years 235
5 years 202
4 years 191
1 year 177
6 years 163
7 years 127
8 years 108
9 years 72
. 2
Name: Employment.Length, dtype: int64
First example ('years' on the first line, 'year' on the second)
raw_data['Employment.Length'] = raw_data['Employment.Length'].str.replace('years',' ')
raw_data['Employment.Length'] = raw_data['Employment.Length'].str.replace('year',' ')
raw_data['Employment.Length'] = np.where(raw_data['Employment.Length'].str[:2]=='10',10,raw_data['Employment.Length'])
raw_data['Employment.Length'] = np.where(raw_data['Employment.Length'].str[0]=='<',0,raw_data['Employment.Length'])
raw_data['Employment.Length'] = pd.to_numeric(raw_data['Employment.Length'], errors = 'coerce')
Output
10.0 653
0.0 249
2.0 243
3.0 235
5.0 202
4.0 191
1.0 177
6.0 163
7.0 127
8.0 108
9.0 72
Name: Employment.Length, dtype: int64
Second example ('year' on the first line, 'years' on the second)
raw_data_copy['Employment.Length'] = raw_data_copy['Employment.Length'].str.replace('year',' ')
raw_data_copy['Employment.Length'] = raw_data_copy['Employment.Length'].str.replace('years',' ')
raw_data_copy['Employment.Length'] = np.where(raw_data_copy['Employment.Length'].str[:2]=='10',10, raw_data_copy['Employment.Length'])
raw_data_copy['Employment.Length'] = np.where(raw_data_copy['Employment.Length'].str[0]=='<',0,raw_data_copy['Employment.Length'])
raw_data_copy['Employment.Length'] = pd.to_numeric(raw_data_copy['Employment.Length'], errors = 'coerce')
Output
10.0 653
0.0 249
1.0 177
Name: Employment.Length, dtype: int64
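To make the difference between the two orderings easier to see, here is a minimal sketch that traces both on two sample values. It uses Python's built-in `str.replace` as a stand-in for pandas' `.str.replace` with a literal pattern, which performs the same substring substitution on each element. The `values` list and the names `order_a` and `order_b` are hypothetical and only serve the illustration.

```python
# Hypothetical sample values in the same format as the column above.
values = ["2 years", "1 year"]

# Order A: 'years' first, then 'year' (as in the first example).
order_a = [v.replace("years", " ").replace("year", " ") for v in values]

# Order B: 'year' first, then 'years' (as in the second example).
# 'year' is a substring of 'years', so the first replace leaves a stray 's'
# behind and the second replace no longer finds anything to match.
order_b = [v.replace("year", " ").replace("years", " ") for v in values]

print(order_a)  # ['2  ', '1  ']
print(order_b)  # ['2  s', '1  ']
```

A string such as `'2  s'` is not numeric, so `pd.to_numeric(..., errors='coerce')` turns it into NaN, and NaN values are not counted by default in the value counts. That would be consistent with only 10, 0 and 1 surviving in the second output.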
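One more thing: when I comment out the second line, the one with 'year', I get the same output as in the first example. When I comment out the second line with 'years', I get the same output as in the second example.
Third example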
raw_data_copy['Employment.Length'] = raw_data_copy['Employment.Length'].str.replace('years',' ')
#raw_data_copy['Employment.Length'] = raw_data_copy['Employment.Length'].str.replace('year',' ')
raw_data_copy['Employment.Length'] = np.where(raw_data_copy['Employment.Length'].str[:2]=='10',10, raw_data_copy['Employment.Length'])
raw_data_copy['Employment.Length'] = np.where(raw_data_copy['Employment.Length'].str[0]=='<',0,raw_data_copy['Employment.Length'])
raw_data_copy['Employment.Length'] = pd.to_numeric(raw_data_copy['Employment.Length'], errors = 'coerce')
Output
10.0 653
0.0 249
2.0 243
3.0 235
5.0 202
4.0 191
6.0 163
7.0 127
8.0 108
9.0 72
Name: Employment.Length, dtype: int64
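For comparison, here is a minimal sketch of one way to make the cleanup independent of replacement order: a single regular-expression pass in which the trailing 's' is optional. This is not the code from the examples above. The sample Series `s` and the names `cleaned` and `employment_length` are hypothetical, and the explicit mapping of '10+' and '< 1' is an assumption about how those labels should be encoded, based on the `np.where` lines in the examples.

```python
import pandas as pd

# Hypothetical sample mirroring the value formats in the original column.
s = pd.Series(["10+ years", "< 1 year", "2 years", "1 year", "."])

# One regex pass removes both 'year' and 'years' (the trailing 's' is
# optional), so there is no second replacement whose order could matter.
cleaned = s.str.replace(r"\s*years?", "", regex=True)

# Map the two special labels explicitly (assumed encoding: 10 and 0),
# then coerce everything else to numbers; '.' becomes NaN.
cleaned = cleaned.replace({"10+": "10", "< 1": "0"})
employment_length = pd.to_numeric(cleaned, errors="coerce")

print(employment_length.tolist())
```

With this approach "1 year" and "2 years" are both reduced to bare digits before the numeric conversion, so neither is lost to coercion.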