r - Data frame (product) correlations in R

Question

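I have 2 data frames, each with 150 rows and 10 columns, plus column and row IDs. I want to correlate every row in one data frame with every row in the other (i.e. 150x150 correlations) and plot the distribution of the resulting 22500 values. (I then want to calculate p-values etc. from the distribution, but that is the next step.)

Frankly, I don't know where to start. I can read in my data, and I can see how to correlate vectors or match slices of two matrices, but I can't get a handle on what I'm trying to do here.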

score 2 · Accepted Answer

set.seed(42)
DF1 <- as.data.frame(matrix(rnorm(1500),150))
DF2 <- as.data.frame(matrix(runif(1500),150))

#transform to matrices for better performance
m1 <- as.matrix(DF1)
m2 <- as.matrix(DF2)

#use outer to get all combinations of row numbers and apply a function to them
#22500 combinations is small enough to fit into RAM
cors <- outer(seq_len(nrow(DF1)),seq_len(nrow(DF2)),
     #you need a vectorized function
     #Vectorize takes care of that, but is just a hidden loop (slow for huge row numbers)
     FUN=Vectorize(function(i,j) cor(m1[i,],m2[j,])))
hist(cors)


score 1 · Accepted Answer


You can call cor with two arguments:

cor( t(m1), t(m2) )
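
As a rough sketch (not part of the original answer): cor correlates the columns of its two arguments, so transposing both matrices makes each original row a column and gives all 150x150 row-by-row correlations in one call. The example below reuses the simulated data from the first answer and checks that both approaches agree; the names cors_fast and cors_loop are made up for illustration.

```r
set.seed(42)
m1 <- matrix(rnorm(1500), 150)   # 150 rows x 10 columns
m2 <- matrix(runif(1500), 150)

# cor() works column-wise, so transpose to turn each original row into a column
cors_fast <- cor(t(m1), t(m2))   # 150 x 150 matrix of correlations

# Explicit row-by-row version, as in the outer()/Vectorize() approach
cors_loop <- outer(seq_len(nrow(m1)), seq_len(nrow(m2)),
                   FUN = Vectorize(function(i, j) cor(m1[i, ], m2[j, ])))

# Both approaches should give the same values up to floating-point tolerance
stopifnot(isTRUE(all.equal(cors_fast, cors_loop, check.attributes = FALSE)))

# Distribution of all 22500 correlations
hist(cors_fast)
```

Entry [i, j] of cors_fast is the correlation between row i of m1 and row j of m2. The two-argument form avoids the hidden loop in Vectorize, which should matter more as the number of rows grows.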

Answered 2013-05-16T10:20:15.850
