A solution using df.iterrows:
First, set your parameters:
import pandas as pd
import numpy as np
#Sample:
df = pd.DataFrame({'series1':[78, 1, 3, 4, 5, 6, 7, 8, 99]})
#Parameters:
win_size = 9 #size of the rolling window
p = (5,85) #percentile (min,max) between (0,100)
Then iterate:
window = []  # the rolling window
output = []  # the output
# Iterate over your df
for index, row in df.iterrows():
    # Update your output
    output = np.append(output, row.series1)
    # Manage the window
    window = np.append(window, row.series1)  # append the element
    if len(window) > win_size:  # drop the oldest element once the window exceeds win_size
        window = np.delete(window, 0)
    # Winsorize
    if len(window) == win_size:
        ll = np.round(np.percentile(window, p[0]), 2)  # find the lower limit
        ul = np.round(np.percentile(window, p[1]), 2)  # find the upper limit
        window = np.clip(window, ll, ul)  # clip the window
        output[-win_size:] = window  # update your output with the winsorized data
df['winsorized'] = output  # append to your dataframe
print(df)
Result:
   series1  winsorized
0       78        64.0
1        1         1.8
2        3         3.0
3        4         4.0
4        5         5.0
5        6         6.0
6        7         7.0
7        8         8.0
8       99        64.0
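To see where the clipping limits come from, the two percentiles can be computed directly on the one full window of the sample. This is a small check, assuming NumPy's default (linear) interpolation for np.percentile; it repeats only the winsorize step of the loop.

```python
import numpy as np

# The single full window of the sample (win_size = 9 covers every row).
window = np.array([78, 1, 3, 4, 5, 6, 7, 8, 99], dtype=float)
p = (5, 85)  # same percentile pair as in the parameters

ll = np.round(np.percentile(window, p[0]), 2)  # lower limit
ul = np.round(np.percentile(window, p[1]), 2)  # upper limit
clipped = np.clip(window, ll, ul)              # values outside [ll, ul] are pulled to the limits

print(ll, ul)
print(clipped)
```

Only values below ll or above ul change; everything in between passes through untouched.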
If you remove the condition
if len(window) == win_size:
the first data points are winsorized as well, even while the window is not yet full.
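The effect of the len(window) == win_size check can be compared by wrapping the same loop in a function with a switch for it. This is a minimal sketch: the names rolling_winsorize and require_full are hypothetical, introduced only for illustration, and the logic otherwise mirrors the loop above.

```python
import numpy as np
import pandas as pd


def rolling_winsorize(values, win_size, p, require_full=True):
    """Winsorize values over a trailing window.

    rolling_winsorize and require_full are hypothetical names for this sketch.
    With require_full=False, clipping also happens while the window is filling up.
    """
    window = np.array([], dtype=float)
    output = np.array([], dtype=float)
    for value in values:
        output = np.append(output, value)
        window = np.append(window, value)
        if len(window) > win_size:
            window = np.delete(window, 0)  # drop the oldest element
        if len(window) == win_size or not require_full:
            ll = np.round(np.percentile(window, p[0]), 2)  # lower limit
            ul = np.round(np.percentile(window, p[1]), 2)  # upper limit
            window = np.clip(window, ll, ul)
            output[-len(window):] = window  # write the clipped window back
    return output


df = pd.DataFrame({'series1': [78, 1, 3, 4, 5, 6, 7, 8, 99]})
df['full_only'] = rolling_winsorize(df['series1'], win_size=9, p=(5, 85))
df['from_start'] = rolling_winsorize(df['series1'], win_size=9, p=(5, 85),
                                     require_full=False)
print(df)
```

Note that in both modes the clipped values stay in the window, so later limits are computed from data that has already been winsorized. That is how the original loop behaves too.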