I am currently using the following function to compute the Pearson product-moment correlation coefficient in Python.
import math

def PearsonCoefficient(x, y):
    assert len(x) == len(y)
    n = len(x)
    assert n > 0
    # Plain (unweighted) means of both series
    avg_x = float(sum(x)) / n
    avg_y = float(sum(y)) / n
    diffprod = 0
    xdiff2 = 0
    ydiff2 = 0
    for idx in range(n):
        xdiff = x[idx] - avg_x
        ydiff = y[idx] - avg_y
        diffprod += xdiff * ydiff
        xdiff2 += xdiff * xdiff
        ydiff2 += ydiff * ydiff
    p = math.sqrt(xdiff2 * ydiff2)
    # Undefined when either series has zero variance
    if p == 0:
        return None
    return diffprod / p
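My data is a time series (x is time-based), and the y values are user scores. I group the time-series data by week and take the average score for each week. However, I want to weight data from the last three months more heavily than older data. I'm not sure how to generate a weight vector based on this assumption.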
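To make the idea concrete, here is a minimal sketch of one way such a weight vector could be built: a simple step function that gives each weekly bucket a higher weight if it starts within the last 90 days. The function name `recency_weights`, the 90-day cutoff, the factor of 2.0, and the year 2023 in the usage lines are all assumptions for illustration, not something I have settled on.

```python
from datetime import date, timedelta

def recency_weights(week_starts, today, recent_days=90,
                    recent_weight=2.0, base_weight=1.0):
    """Return one weight per weekly bucket (hypothetical helper).

    Buckets starting within `recent_days` of `today` get `recent_weight`;
    all older buckets get `base_weight`.
    """
    cutoff = today - timedelta(days=recent_days)
    return [recent_weight if d >= cutoff else base_weight
            for d in week_starts]

# Usage with made-up dates: 30 weekly buckets starting Jan 1st.
week_starts = [date(2023, 1, 1) + timedelta(weeks=i) for i in range(30)]
weights = recency_weights(week_starts, today=date(2023, 7, 30))
```

A step function is only one option; a smooth decay (for example, exponential in the age of the bucket) would avoid the hard jump at the three-month boundary.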
My data looks like this:
jan 1st - 0.4
jan 8th - 0.7
jan 15th - 0.55
jan 22nd - 0.75
jan 29th - 0.88
feb 5th - 0.91
feb 12th - 0.87
feb 19th - 0.89
feb 26th - 0.93
mar 5th - 0.56
...
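Given some weight vector, one common formulation of a weighted Pearson coefficient replaces the plain means and sums with weighted ones. The sketch below follows the structure of my function above; `weighted_pearson` is a hypothetical name, using the week index as x is an assumption, and the weights in the usage lines are made up.

```python
import math

def weighted_pearson(x, y, w):
    """Weighted Pearson correlation; returns None when it is undefined."""
    assert len(x) == len(y) == len(w)
    total = float(sum(w))
    assert total > 0
    # Weighted means replace the plain averages.
    avg_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    avg_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    diffprod = xdiff2 = ydiff2 = 0.0
    for xi, yi, wi in zip(x, y, w):
        xdiff = xi - avg_x
        ydiff = yi - avg_y
        # Each term is scaled by the weight of its observation.
        diffprod += wi * xdiff * ydiff
        xdiff2 += wi * xdiff * xdiff
        ydiff2 += wi * ydiff * ydiff
    p = math.sqrt(xdiff2 * ydiff2)
    if p == 0:
        return None
    return diffprod / p

# Scores from the sample above; x as a week index is an assumption.
scores = [0.4, 0.7, 0.55, 0.75, 0.88, 0.91, 0.87, 0.89, 0.93, 0.56]
weeks = list(range(len(scores)))
weights = [1.0] * 6 + [2.0] * 4  # hypothetical: last four weeks count double
r = weighted_pearson(weeks, scores, weights)
```

With all weights equal this reduces to the unweighted coefficient, so it can be checked against `PearsonCoefficient` on the same data.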