I have a time-series pandas DataFrame, and I compute a new column:

df['std_series']= ( df['series1']-df['series1'].rolling(252).mean() )/ df['series1'].rolling(252).std()

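However, I want to winsorize at the 5% level on a rolling basis before standardizing. That is, for any data point, look back 252 days; if the point lies outside the 5% quantile of that window, clip it to the 5% quantile, and then standardize. I can't figure out how to make this work with rolling.apply.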

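For example (rolling over 10 elements):
df = pd.DataFrame({'series1':[78, 1, 3, 4, 5, 6, 7, 8, 99]})
and assume I clip at the 0.15 and 0.85 quantiles. The clip levels are then (min=3.2, max=64), and the expected winsorized window before standardizing would be
[ 64 3.2 3.2 4 5 6 7 8 64]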
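As a quick check of these numbers, here is a minimal sketch that assumes NumPy's default linear interpolation for percentiles. The names `values`, `lo`, `hi` and `clipped` are only illustrative.

```python
import numpy as np

values = np.array([78, 1, 3, 4, 5, 6, 7, 8, 99], dtype=float)

# 15th and 85th percentiles of the whole window (default linear interpolation)
lo, hi = np.percentile(values, [15, 85])

# Clip every element of the window to the interval [lo, hi]
clipped = np.clip(values, lo, hi)

print(lo, hi)
print(clipped)
```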

All the examples I have found winsorize the whole DataFrame or an entire column.

1 Answer

A solution using df.iterrows

First, set your parameters:

import pandas as pd
import numpy as np

#Sample:
df = pd.DataFrame({'series1':[78, 1, 3, 4, 5, 6, 7, 8, 99]})

#Parameters:
win_size = 9 #size of the rolling window
p = (15,85) #percentile (min,max) between (0,100); 15 reproduces the 3.2 lower limit shown in the result

Then iterate:

window = [] #the rolling window
output = [] #the output

# Iterate over your df
for index, row in df.iterrows():
    #Update your output
    output = np.append(output,row.series1)

    #Manage the window
    window = np.append(window,row.series1) #append the element
    if len(window) > win_size: #drop the oldest element once the window is over-full
        window = np.delete(window,0)

    #Winsorize
    if len(window) == win_size:
        ll = np.round(np.percentile(window,p[0]),2) #Find the lower limit
        ul = np.round(np.percentile(window,p[1]),2) #Find the upper limit

        window = np.clip(window, ll , ul) #Clip the window

    output[-win_size:] = window #Update your output with the winsorized data

df['winsorized'] = output #Append to your dataframe
print(df)

Result:

   series1  winsorized
0       78        64.0
1        1         3.2
2        3         3.2
3        4         4.0
4        5         5.0
5        6         6.0
6        7         7.0
7        8         8.0
8       99        64.0

You can remove the if len(window) == win_size: check to winsorize the first data points even when the window is not yet full.
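Since the question asks about rolling.apply, here is a hedged sketch of one way to winsorize and standardize in a single pass. It is not the iterrows loop: it assumes each point is standardized against its own trailing window after that window has been clipped to its own quantiles. The helper name `winsorized_zscore` and its parameters `lower_q` and `upper_q` are made up for illustration.

```python
import numpy as np
import pandas as pd

def winsorized_zscore(window, lower_q=0.05, upper_q=0.95):
    """Clip one window to its own quantiles, then z-score its last value."""
    lo, hi = np.quantile(window, [lower_q, upper_q])
    clipped = np.clip(window, lo, hi)
    std = clipped.std(ddof=1)  # sample std, the same default as pandas' rolling std
    if std == 0:
        return np.nan  # constant window: the z-score is undefined
    return (clipped[-1] - clipped.mean()) / std

df = pd.DataFrame({'series1': [78, 1, 3, 4, 5, 6, 7, 8, 99]})

# raw=True hands each window to the function as a NumPy array
df['std_series'] = df['series1'].rolling(9).apply(
    winsorized_zscore, raw=True, kwargs={'lower_q': 0.15, 'upper_q': 0.85}
)
print(df)
```

For the 252-day case in the question, the same call with rolling(252) and without kwargs would fall back to the 0.05/0.95 defaults, which are an assumption about what "5% level" means here. Rows before the first full window come out as NaN.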

answered 2018-01-18T08:23:20.937