r - Weighted Pearson correlation?

Question

I have a 2396x34 double matrix named y, in which each of the 2396 rows represents a separate case made up of 34 consecutive time segments.

I also have a numeric[34] named x that represents a single case of 34 consecutive time segments.

Currently I calculate the correlation between each row of y and x like this:

crs[,2] <- cor(t(y),x)

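What I need now is to replace the cor function in the statement above with a weighted correlation. The weight vector xy.wt is 34 elements long, so a different weight can be assigned to each of the 34 consecutive time segments.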

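I found the weighted covariance matrix function cov.wt and thought that if I first scale the data it should work just like the cor function. In fact, you can also ask the function to return a correlation matrix. Unfortunately, it seems I cannot use it in the same way, because I cannot supply my two variables (x and y) separately.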

Does anyone know a way to get a weighted correlation as I described without sacrificing too much speed?

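Edit: Perhaps some mathematical transformation could be applied to y before calling cor to get the result I am looking for. Maybe if I multiply each element by xy.wt/sum(xy.wt)?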

Edit #2: I found another function, corr, in the boot package.

corr(d, w = rep(1, nrow(d))/nrow(d))

d   
A matrix with two columns corresponding to the two variables whose correlation we wish to calculate.

w   
A vector of weights to be applied to each pair of observations. The default is equal weights for each pair. Normalization takes place within the function so sum(w) need not equal 1.

This is not exactly what I need either, but it is closer.

Edit #3: Here is some code that generates the kind of data I am working with:

x<-cumsum(rnorm(34))
y<- t(sapply(1:2396,function(u) cumsum(rnorm(34))))
xy.wt<-1/(34:1)

crs<-cor(t(y),x) #this works but I want to use xy.wt as weight

score 23 · Answer

Unfortunately, the accepted answer is wrong when y is a matrix with more than one row. The error is in the line

vy <- rowSums( w * y * y )

We want to multiply the columns of y by w, but this instead multiplies the rows by the elements of w, recycled as necessary. Thus

> f(x, y[1, , drop = FALSE], xy.wt)
[1] 0.103021

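is correct, because in this case the multiplication is performed element-wise, which here is equivalent to column-wise multiplication, but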

> f(x, y, xy.wt)[1]
[1] 0.05463575

gives the wrong answer because of the row-wise multiplication.
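The recycling behaviour described above can be seen on a toy matrix. This is only an illustrative sketch: m and w are made-up stand-ins for y and xy.wt, and sweep and the double transpose are two common ways (not the only ones) to multiply column j by w[j].

```r
# Small stand-in for y: 2 cases (rows) by 3 time segments (columns)
m <- matrix(1, nrow = 2, ncol = 3)
w <- c(1, 2, 3)

# Plain `*` recycles w down the columns (column-major order),
# so the weights do not line up with the columns of m
wrong <- w * m
print(wrong)
#      [,1] [,2] [,3]
# [1,]    1    3    2
# [2,]    2    1    3

# Two ways to multiply column j by w[j]
right1 <- sweep(m, 2, w, `*`)
right2 <- t(t(m) * w)
print(right1)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    1    2    3
```

With a single row the two coincide, which is why the one-row call gives the right value.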

We can correct the function as follows

f2 <- function( x, y, w = rep(1,length(x))) {
  stopifnot(length(x) == dim(y)[2] )
  w <- w / sum(w)
  # Center x and y, using the weighted means
  x <- x - sum(x * w)
  ty <- t(y - colSums(t(y) * w))
  # Compute the variance
  vx <- sum(w * x * x)
  vy <- colSums(w * ty * ty)
  # Compute the covariance
  vxy <- colSums(ty * x * w)
  # Compute the correlation
  vxy / sqrt(vx * vy)
}

We can then check the results against those produced by corr from the boot package:

> res1 <- f2(x, y, xy.wt)
> res2 <- sapply(1:nrow(y), 
+                function(i, x, y, w) corr(cbind(x, y[i,]), w = w),
+                x = x, y = y, w = xy.wt)
> all.equal(res1, res2)
[1] TRUE

This in itself provides another way to solve the problem.
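If the boot package is not available, the same row-by-row idea can be sketched with cov.wt from base R, binding x to one row of y at a time. The helper name wcor_rows, the seed and the reduced number of rows (50 instead of 2396) are assumptions made for illustration; like the sapply version, this loops over rows and is likely slower than a vectorised function.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- cumsum(rnorm(34))
y <- t(sapply(1:50, function(u) cumsum(rnorm(34))))  # 50 rows instead of 2396
xy.wt <- 1 / (34:1)

# Hypothetical helper: weighted correlation of x with each row of y via cov.wt
wcor_rows <- function(x, y, w) {
  stopifnot(length(x) == ncol(y), length(w) == length(x))
  w <- w / sum(w)  # normalise the weights explicitly
  apply(y, 1, function(yi) {
    # cov.wt with cor = TRUE returns a 2x2 weighted correlation matrix;
    # the off-diagonal entry is the correlation between x and this row
    cov.wt(cbind(x, yi), wt = w, cor = TRUE)$cor[1, 2]
  })
}

res <- wcor_rows(x, y, xy.wt)
head(res)
```

With equal weights the result should agree with cor(t(y), x), which gives a quick sanity check.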

score 3 · Accepted Answer

You can go back to the definition of correlation.

f <- function( x, y, w = rep(1,length(x))) {
  stopifnot( length(x) == dim(y)[2] )
  w <- w / sum(w)
  # Center x and y, using the weighted means
  x <- x - sum(x*w)
  y <- y - apply( t(y) * w, 2, sum )
  # Compute the variance
  vx <- sum( w * x * x )
  vy <- rowSums( w * y * y ) # Incorrect: see Heather's remark, in the other answer
  # Compute the covariance
  vxy <- colSums( t(y) * x * w )
  # Compute the correlation
  vxy / sqrt(vx * vy)
}
f(x,y)[1]
cor(x,y[1,]) # Identical
f(x, y, xy.wt)
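Written out, the definition that the comments in the code follow (weighted means, weighted variances, weighted covariance) is the following. The notation is introduced here for illustration: t indexes the 34 time segments, i indexes the rows of y, and the weights are assumed to be normalised so that they sum to 1.

```latex
\[
m_x = \sum_{t} w_t x_t, \qquad
m_{y_i} = \sum_{t} w_t y_{it}, \qquad
\sum_{t} w_t = 1
\]
\[
r_i =
\frac{\sum_{t} w_t \,(x_t - m_x)\,(y_{it} - m_{y_i})}
     {\sqrt{\sum_{t} w_t \,(x_t - m_x)^2 \;\sum_{t} w_t \,(y_{it} - m_{y_i})^2}}
\]
```

With equal weights this reduces to the ordinary Pearson correlation, since the common factor cancels between numerator and denominator.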

score 3 · Answer

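Here is a generalization that computes the weighted Pearson correlation between two matrices (rather than between a vector and a matrix, as in the original question):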

matrix.corr <- function (a, b, w = rep(1, nrow(a))/nrow(a)) 
{
    # normalize weights
    w <- w / sum(w)

    # center matrices
    a <- sweep(a, 2, colSums(a * w))
    b <- sweep(b, 2, colSums(b * w))

    # compute weighted correlation
    t(w*a) %*% b / sqrt( colSums(w * a**2) %*% t(colSums(w * b**2)) )
}

Using the example above and Heather's correlation function, we can verify it:

> sum(matrix.corr(as.matrix(x, nrow=34),t(y),xy.wt) - f2(x,y,xy.wt))
[1] 1.537507e-15

In terms of calling syntax, this is similar to the unweighted cor:

> a <- matrix( c(1,2,3,1,3,2), nrow=3)
> b <- matrix( c(2,3,1,1,7,3,5,2,8,1,10,12), nrow=3)
> matrix.corr(a,b)
     [,1]      [,2] [,3]      [,4]
[1,] -0.5 0.3273268  0.5 0.9386522
[2,]  0.5 0.9819805 -0.5 0.7679882
> cor(a, b)
     [,1]      [,2] [,3]      [,4]
[1,] -0.5 0.3273268  0.5 0.9386522
[2,]  0.5 0.9819805 -0.5 0.7679882
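The comparison above uses the default equal weights. As a hedged sketch of a check with unequal weights, the block below compares matrix.corr against the weighted correlation matrix returned by cov.wt. The weight vector c(1, 2, 3) is an arbitrary illustrative choice, and the function definition is repeated only so that the snippet runs on its own.

```r
# Same definition as above, repeated so the snippet is self-contained
matrix.corr <- function(a, b, w = rep(1, nrow(a)) / nrow(a)) {
  w <- w / sum(w)                    # normalise weights
  a <- sweep(a, 2, colSums(a * w))   # centre columns on weighted means
  b <- sweep(b, 2, colSums(b * w))
  t(w * a) %*% b / sqrt(colSums(w * a^2) %*% t(colSums(w * b^2)))
}

a <- matrix(c(1, 2, 3, 1, 3, 2), nrow = 3)
b <- matrix(c(2, 3, 1, 1, 7, 3, 5, 2, 8, 1, 10, 12), nrow = 3)
w <- c(1, 2, 3)  # arbitrary illustrative weights, one per row

res <- matrix.corr(a, b, w)

# Reference: weighted correlation matrix of all six columns from cov.wt,
# keeping only the block that pairs columns of a with columns of b
ref <- cov.wt(cbind(a, b), wt = w / sum(w), cor = TRUE)$cor[1:2, 3:6]

all.equal(res, ref, check.attributes = FALSE)
```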
