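I need to encode numeric values by range: low: 0, medium: 1, high: 2, very high: 3. I am doing this by quartile, column by column. I have the following code: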
import pandas as pd
import numpy as np

def fun(df):
    table = df.copy()  # work on a copy of the pandas DataFrame
    N = int(table.shape[0])
    for header in list(table.columns):
        # Quartile boundaries for this column
        q1 = np.percentile(table[header], 25)
        q2 = np.percentile(table[header], 50)
        q3 = np.percentile(table[header], 75)
        for k in range(0, N):
            # .loc avoids chained assignment; assumes the default 0..N-1 index
            value = table.loc[k, header]
            if value < q1:
                table.loc[k, header] = 0
            elif q1 <= value < q2:
                table.loc[k, header] = 1
            elif q2 <= value < q3:
                table.loc[k, header] = 2
            else:
                table.loc[k, header] = 3
    table = table.astype(int)
    return table
Test:
df = pd.DataFrame( {
'A': [30, 28, 32, 25, 25, 25, 22, 24, 35, 40],
'B': [25, 30, 27, 40, 42, 40, 50, 45, 30, 25],
'C': [25.5, 30.1, 27.3, 40.77, 25.1, 25.34, 22.11, 23.81, 33.66, 38.56],
}, columns = [ 'A', 'B', 'C' ] )
Result:
A B C
2 0 1
2 1 2
3 0 2
1 2 3
1 3 0
1 2 1
0 3 0
0 3 0
3 1 3
3 0 3
Is there a more efficient way to do the same thing?
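
For reference, here is a minimal sketch of one vectorized direction, assuming `np.digitize` applied with the same three percentile edges reproduces the loop's comparisons (`< q1`, `>= q1 and < q2`, and so on). The names `encode_quartiles` and `encode_column` are hypothetical and used only for illustration; this has not been benchmarked.

```python
import numpy as np
import pandas as pd

def encode_quartiles(df):
    """Map every column's values to quartile codes 0..3 without a Python row loop."""
    def encode_column(col):
        # Same three boundaries as the loop version: q1, q2, q3.
        edges = np.percentile(col, [25, 50, 75])
        # With right=False (the default), digitize returns i such that
        # edges[i-1] <= x < edges[i], so a value equal to q1 gets code 1.
        codes = np.digitize(col.to_numpy(), edges)
        return pd.Series(codes, index=col.index)
    return df.apply(encode_column).astype(int)

df = pd.DataFrame({
    'A': [30, 28, 32, 25, 25, 25, 22, 24, 35, 40],
    'B': [25, 30, 27, 40, 42, 40, 50, 45, 30, 25],
    'C': [25.5, 30.1, 27.3, 40.77, 25.1, 25.34, 22.11, 23.81, 33.66, 38.56],
}, columns=['A', 'B', 'C'])
print(encode_quartiles(df))
```

The percentiles are computed once per column and the binning is done on the whole array at once, so the per-element `if`/`elif` chain disappears.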