python - Creating a rolling mean in a DataFrame up to a set point

Question

I have a DataFrame like this:

month val1 val2 val3
1      2    3    5
2      3    4    7
3      5    1    2
4      7    4    3
5      2    6    4
6      2    2    2

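The last month in my initial column is 6 here, but it could be any month from 1 to 12. For each val column, I want to compute a rolling mean based on the last 2 values, filling in every month up to month 12, to get something like this: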

month val1 val2 val3
1      2    3    5
2      3    4    7
3      5    1    2
4      7    4    3
5      2    6    4
6      2    2    2
7      2    4    3
8      2    3    2.5
9      2    3.5  2.75
10     2    3.25 2.63
11     2    3.38 2.69
12     2    3.32 2.66
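
The rule described above is a two-term recurrence: each new month is the mean of the two months before it. As a minimal sketch in plain Python (the function name `extend_with_mean_of_last_two` and its parameters are made up for illustration), applied to the val3 column only:

```python
def extend_with_mean_of_last_two(values, last_month, end_month=12):
    """Append the mean of the two most recent values until end_month is reached."""
    out = list(values)
    for _ in range(last_month, end_month):
        out.append((out[-1] + out[-2]) / 2)
    return out

# val3 from the example above, observed through month 6
val3 = extend_with_mean_of_last_two([5, 7, 2, 3, 4, 2], last_month=6)
print(val3[6:])  # [3.0, 2.5, 2.75, 2.625, 2.6875, 2.65625]
```

Rounded to two decimals, these are the val3 values shown for months 7 to 12.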

score 0 · Accepted Answer

Define the following function, which generates the rows for the rest of the year from the last 2 rows:

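import pandas as pd

def getRest(last2):
    # last2: the final two rows of the DataFrame, including the 'month' column
    last2 = last2.set_index('month')
    lastMonth = last2.index[1]
    rv = []
    for mnth in range(lastMonth, 12):
        newRow = last2.mean()      # column-wise mean of the two most recent rows
        newRow.name = mnth + 1     # the new row is labelled with the next month
        rv.append(newRow)
        # Slide the window: drop the older row and add the new one.
        # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
        last2 = pd.concat([last2.drop([mnth - 1]), newRow.to_frame().T])
    return rv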

Then call it and concatenate the result with the original DataFrame:

pd.concat([df, pd.concat(getRest(df.iloc[-2:]), axis=1).T.reset_index()
    .rename(columns={'index': 'month'})], ignore_index=True)

The result is:

    month  val1    val2     val3
0       1   2.0  3.0000  5.00000
1       2   3.0  4.0000  7.00000
2       3   5.0  1.0000  2.00000
3       4   7.0  4.0000  3.00000
4       5   2.0  6.0000  4.00000
5       6   2.0  2.0000  2.00000
6       7   2.0  4.0000  3.00000
7       8   2.0  3.0000  2.50000
8       9   2.0  3.5000  2.75000
9      10   2.0  3.2500  2.62500
10     11   2.0  3.3750  2.68750
11     12   2.0  3.3125  2.65625

If needed, save this result back to the original variable or to another one.
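
Since the question says the last month can be anywhere from 1 to 12, two edge cases are worth guarding: with `df` already ending at month 12, `getRest` returns an empty list and `pd.concat` raises a `ValueError`, and with a single row there are not two values to average. A minimal sketch of a guarded wrapper follows. The name `extendToMonth12` is hypothetical and not part of the answer; `getRest` is restated so the block is self-contained, using `pd.concat` on the assumption that pandas 2.x is installed (where `DataFrame.append` is not available).

```python
import pandas as pd

def getRest(last2):
    # Rows for the remaining months, each the mean of the two rows before it
    last2 = last2.set_index('month')
    lastMonth = last2.index[1]
    rv = []
    for mnth in range(lastMonth, 12):
        newRow = last2.mean()
        newRow.name = mnth + 1
        rv.append(newRow)
        # Keep only the newer of the two rows, then add the freshly computed one
        last2 = pd.concat([last2.iloc[1:], newRow.to_frame().T])
    return rv

def extendToMonth12(df):
    # Hypothetical wrapper, not part of the original answer
    if len(df) < 2:
        raise ValueError('need at least two rows to average')
    rest = getRest(df.iloc[-2:])
    if not rest:
        # df already ends at month 12, nothing to add
        return df.copy()
    extra = (pd.concat(rest, axis=1).T.reset_index()
             .rename(columns={'index': 'month'}))
    return pd.concat([df, extra], ignore_index=True)

df = pd.DataFrame({'month': [1, 2, 3, 4, 5, 6],
                   'val1': [2, 3, 5, 7, 2, 2],
                   'val2': [3, 4, 1, 4, 6, 2],
                   'val3': [5, 7, 2, 3, 4, 2]})
result = extendToMonth12(df)
```

Calling `extendToMonth12(result)` a second time returns an unchanged copy, because the frame already reaches month 12.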

score 0 · Accepted Answer

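The main problem is that appending rows to a DataFrame is very inefficient: creating a new Series on every iteration and appending it to the original DataFrame gets expensive.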

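Probably the best approach is to create an array from the DataFrame, do the rolling calculation there, and then convert the result into a new DataFrame.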

import pandas as pd
import numpy as np 

# create dataframe with the first month removed to show the solution is generalizable
df = pd.DataFrame({'month':[2,3,4,5,6],'val1':[3,5,7,2,2],'val2':[4,1,4,6,2],'val3':[7,2,3,4,2]})

df
   month  val1  val2  val3
0      2     3     4     7
1      3     5     1     2
2      4     7     4     3
3      5     2     6     4
4      6     2     2     2

# extract values of the dataframe as numpy and perform rolling operations
# separate out months from other columns
array_values = df.drop(columns = 'month').values

# loop from the most recent month up to month 12
for month in range(df.month.iloc[-1], 12):
    # mean of the last two rows, kept 2-D so it works for any number of value columns
    next_row = array_values[-2:].mean(axis=0, keepdims=True)
    array_values = np.append(array_values, next_row, axis=0)

array_months = np.append(df.month.values, np.arange(df.month.values[-1]+1,13,1))
array_months = array_months.reshape(len(array_months),1)
array_values = np.append(array_months, array_values, axis = 1)

new_df = pd.DataFrame(data = array_values, columns = df.columns)
new_df.month = new_df.month.astype('int')

Output:

new_df
    month  val1    val2     val3
0       2   3.0  4.0000  7.00000
1       3   5.0  1.0000  2.00000
2       4   7.0  4.0000  3.00000
3       5   2.0  6.0000  4.00000
4       6   2.0  2.0000  2.00000
5       7   2.0  4.0000  3.00000
6       8   2.0  3.0000  2.50000
7       9   2.0  3.5000  2.75000
8      10   2.0  3.2500  2.62500
9      11   2.0  3.3750  2.68750
10     12   2.0  3.3125  2.65625
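
One caveat with the loop above: `np.append` returns a new copy of the array on every call, so the array is still rebuilt once per added month. A minimal sketch of a variant that allocates the full result once and fills it in place is shown below. The function name `extend_rolling_mean` and its `end_month` and `window` parameters are assumptions made for illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

def extend_rolling_mean(df, end_month=12, window=2):
    """Hypothetical helper: pad df up to end_month, where each new row
    is the mean of the previous `window` rows."""
    n_old = len(df)
    if n_old < window:
        raise ValueError('need at least `window` rows to average')
    last_month = int(df['month'].iloc[-1])
    n_new = max(end_month - last_month, 0)
    value_cols = [c for c in df.columns if c != 'month']

    # Allocate the full result once instead of growing it on every iteration
    values = np.empty((n_old + n_new, len(value_cols)), dtype=float)
    values[:n_old] = df[value_cols].to_numpy()
    for i in range(n_old, n_old + n_new):
        values[i] = values[i - window:i].mean(axis=0)

    months = np.concatenate([df['month'].to_numpy(),
                             np.arange(last_month + 1, last_month + 1 + n_new)])
    out = pd.DataFrame(values, columns=value_cols)
    out.insert(0, 'month', months)
    return out

df = pd.DataFrame({'month': [2, 3, 4, 5, 6],
                   'val1': [3, 5, 7, 2, 2],
                   'val2': [4, 1, 4, 6, 2],
                   'val3': [7, 2, 3, 4, 2]})
new_df = extend_rolling_mean(df)
```

For twelve months the difference is negligible; preallocation mainly matters when the same pattern is applied to much longer series.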
