I have a DataFrame column with values like the following:
Salary Offered
----------------------
£18,323 per annum
£18,000 - £22,000 per annum
Salary not specified
£15,000 - £17,000 per annum, pro-rata
£37,000 - £45,000 per annum
£9,100 - £9,152 per annum, OTE
£9.25 - £10.15 per hour
£35,000 - £40,000 per annum
£23,000 - £26,600 per annum
£18,000 - £25,000 per annum, inc benefits
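So I ran the following command. It does a good job of replacing the pure-text values (e.g. "Salary not specified") with None, and I could replace those with random values, but I would still have to split the ranges on £ a second time: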
In[13]: df = pd.DataFrame(df.salary_offered.str.split('£', n=1).tolist(),
                      columns=['flips', 'row'])
In[14]: df['row']
Out[14]:
0 18,323 per annum
1 18,000 - £22,000 per annum
2 None
3 15,000 - £17,000 per annum, pro-rata
4 37,000 - £45,000 per annum
5 9,100 - £9,152 per annum, OTE
6 9.25 - £10.15 per hour
7 35,000 - £40,000 per annum
8 23,000 - £26,600 per annum
9 18,000 - £25,000 per annum, inc benefits
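One possible way to avoid splitting twice is a single regular expression with `Series.str.extract`, which captures the lower bound and an optional upper bound in one pass. This is a sketch that assumes every amount is written as `£` followed by digits, commas, or a decimal point. The column names `low` and `high` are illustrative choices, not anything from the original data.

```python
import pandas as pd

# A few sample values copied from the column shown above.
salaries = pd.Series([
    "£18,323 per annum",
    "£18,000 - £22,000 per annum",
    "Salary not specified",
    "£15,000 - £17,000 per annum, pro-rata",
    "£9.25 - £10.15 per hour",
])

# The first group captures the lower bound; the optional "- £..." group captures
# the upper bound, so no second split on "£" is needed. Rows with no "£" amount
# (e.g. "Salary not specified") come back as NaN in both columns.
bounds = salaries.str.extract(r"£([\d,.]+)(?:\s*-\s*£([\d,.]+))?")
bounds.columns = ["low", "high"]

# Strip the thousands separators and convert both columns to floats.
bounds = bounds.apply(lambda col: pd.to_numeric(col.str.replace(",", "", regex=False)))
print(bounds)
```

A single figure such as "£18,323 per annum" ends up with `high` set to NaN, which keeps it distinguishable from a range.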
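Also, a few rows give an hourly rate, so those need to be converted as well, which can be done by inspection. But I would like the ranges turned into a separate column holding their average, like this: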
Salary (£)
---------------
18323
20000
18000
16000
41000
...
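A minimal sketch of the averaging step, assuming the same `£`-amount pattern throughout. Single figures are kept as-is, ranges are replaced by their midpoint, and hourly rates are annualised with a hypothetical `HOURS_PER_YEAR` of 37.5 x 52. That conversion factor is an assumption, not something the data specifies. `average_salary` is likewise a hypothetical helper name. Rows without any figure stay missing, so you can fill them afterwards with whatever value you choose.

```python
import pandas as pd

# ASSUMPTION: hourly rates are annualised using a hypothetical full-time figure
# of 37.5 hours/week x 52 weeks. Adjust to whatever convention you need.
HOURS_PER_YEAR = 37.5 * 52


def average_salary(series: pd.Series) -> pd.Series:
    """Hypothetical helper: midpoint of each salary range in £ per annum."""
    bounds = series.str.extract(r"£([\d,.]+)(?:\s*-\s*£([\d,.]+))?")
    low = pd.to_numeric(bounds[0].str.replace(",", "", regex=False))
    high = pd.to_numeric(bounds[1].str.replace(",", "", regex=False))
    # A single figure has no upper bound, so its "midpoint" is the figure itself.
    mean = (low + high.fillna(low)) / 2
    # Scale hourly rates up to an annual figure; other rows are left unchanged.
    hourly = series.str.contains("per hour", na=False)
    mean = mean.where(~hourly, mean * HOURS_PER_YEAR)
    # Whole pounds, using the nullable Int64 dtype so missing rows stay <NA>.
    return mean.round().astype("Int64")


df = pd.DataFrame({"salary_offered": [
    "£18,323 per annum",
    "£18,000 - £22,000 per annum",
    "Salary not specified",
    "£15,000 - £17,000 per annum, pro-rata",
    "£37,000 - £45,000 per annum",
    "£9,100 - £9,152 per annum, OTE",
    "£9.25 - £10.15 per hour",
    "£35,000 - £40,000 per annum",
    "£23,000 - £26,600 per annum",
    "£18,000 - £25,000 per annum, inc benefits",
]})
df["Salary (£)"] = average_salary(df["salary_offered"])
print(df["Salary (£)"])
```

Using `Int64` rather than `float64` lets the column display whole pounds while "Salary not specified" rows remain `<NA>`; a later `fillna(...)` can replace them with your chosen value.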