r - Finding correlated pairs using R

Question

           VZ.Close CBOU.Close SBUX.Close   T.Close
VZ.Close   1.0000000  0.5804478  0.8872978 0.9480894
CBOU.Close 0.5804478  1.0000000  0.7876277 0.4988890
SBUX.Close 0.8872978  0.7876277  1.0000000 0.8143305
T.Close    0.9480894  0.4988890  0.8143305 1.0000000

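So, suppose I have these correlations between stock prices. I want to look at the first row and find the most highly correlated pair. That would be VZ and T. Then I want to remove those 2 stocks as options. Then, among the remaining stocks, find the most highly correlated pair. And so on, until all stocks are paired. In this example it is obviously CBOU and SBUX, since they are the only 2 left, but I want the code to accommodate any number of pairs.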

score 4 · Accepted Answer

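If you want to look at the maximum correlation at each step, here is a solution. The first step therefore does not look only at the first row, but at the whole matrix.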

Sample data:

d <- matrix(runif(36),ncol=6,nrow=6)
rownames(d) <- colnames(d) <- LETTERS[1:6]
diag(d) <- 1
d
           A          B         C          D         E          F
A 1.00000000 0.65209204 0.8520392 0.26980214 0.5844000 0.69335143
B 0.73531603 1.00000000 0.5499431 0.60511580 0.7483990 0.14788134
C 0.56433218 0.27242769 1.0000000 0.07952776 0.2147628 0.03711562
D 0.91756919 0.04853523 0.5554490 1.00000000 0.4344089 0.23381447
E 0.06897889 0.80740821 0.7974340 0.87425643 1.0000000 0.74546072
F 0.19961474 0.61665231 0.2829632 0.58110694 0.7433924 1.00000000

And the code:

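results <- data.frame(v1=character(0), v2=character(0), cor=numeric(0), stringsAsFactors=FALSE)
diag(d) <- 0  # ignore self-correlations
# Repeat while at least two positive entries remain
while (sum(d>0)>1) {
  maxval <- max(d)
  # Row/column index of the first cell holding the current maximum
  # (named idx rather than max to avoid shadowing the max() function)
  idx <- which(d==maxval, arr.ind=TRUE)[1,]
  results <- rbind(results, data.frame(v1=rownames(d)[idx[1]], v2=colnames(d)[idx[2]], cor=maxval, stringsAsFactors=FALSE))
  # Zero out both variables so they cannot be paired again
  d[idx[1],] <- 0
  d[,idx[1]] <- 0
  d[idx[2],] <- 0
  d[,idx[2]] <- 0
}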

This gives:

  v1 v2       cor
1  D  A 0.9175692
2  E  B 0.8074082
3  F  C 0.2829632
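As a rough sketch, the same greedy idea can be wrapped in a function and applied to the four-stock matrix from the question. The helper name greedy_pairs is hypothetical and not part of the answer above, and the sketch assumes a symmetric correlation matrix, so it only looks at the upper triangle and uses NA instead of 0 to mark removed entries (which also copes with negative correlations).

```r
# Hypothetical helper (an assumption, not the answer's code): greedy pairing
# on a symmetric correlation matrix with row and column names.
greedy_pairs <- function(cm) {
  # Keep each pair once by blanking the diagonal and the lower triangle
  cm[lower.tri(cm, diag = TRUE)] <- NA
  out <- data.frame(v1 = character(0), v2 = character(0), cor = numeric(0),
                    stringsAsFactors = FALSE)
  while (any(!is.na(cm))) {
    # Position of the first cell holding the current maximum
    idx <- which(cm == max(cm, na.rm = TRUE), arr.ind = TRUE)[1, ]
    out <- rbind(out, data.frame(v1 = rownames(cm)[idx[1]],
                                 v2 = colnames(cm)[idx[2]],
                                 cor = cm[idx[1], idx[2]],
                                 stringsAsFactors = FALSE))
    # Remove both paired variables from further consideration
    cm[idx, ] <- NA
    cm[, idx] <- NA
  }
  out
}

# Correlation matrix copied from the question
stocks <- c("VZ", "CBOU", "SBUX", "T")
cm <- matrix(c(1.0000000, 0.5804478, 0.8872978, 0.9480894,
               0.5804478, 1.0000000, 0.7876277, 0.4988890,
               0.8872978, 0.7876277, 1.0000000, 0.8143305,
               0.9480894, 0.4988890, 0.8143305, 1.0000000),
             nrow = 4, byrow = TRUE, dimnames = list(stocks, stocks))
pairs <- greedy_pairs(cm)
print(pairs)
```

For the question's data this should pair VZ with T first and then CBOU with SBUX. With an odd number of variables, the last one is simply left unpaired.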

score 0 · Accepted Answer

I think this answers your question, but I can't be sure because the original question is a bit ambiguous...

# Construct toy example of symmetrical matrix
# nc is number of rows/columns in matrix, in the problem above it was 4, but let's try with 6
nc <- 6
mat <- diag( 1 , nc )
# Create toy correlation data for matrix
dat <- runif( ( (nc^2-nc)/2 ) )
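# Fill both triangles of matrix with the same values
# (note: lower.tri and upper.tri enumerate cells in different orders, so the result is not exactly symmetric)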
mat[lower.tri( mat ) ] <- dat 
mat[upper.tri( mat ) ] <- dat

# Create vector of random string names for row/column names
names <- replicate( nc , expr = paste( sample( c( letters , LETTERS ) , 3 , replace = TRUE ) , collapse = "" ) )
dimnames(mat) <- list( names , names )

# Sanity check
mat
    SXK   llq   xFL   RVW   oYQ   Seb
SXK 1.000 0.973 0.499 0.585 0.813 0.751
llq 0.973 1.000 0.075 0.533 0.794 0.826
xFL 0.499 0.099 1.000 0.099 0.481 0.968
RVW 0.075 0.813 0.620 1.000 0.620 0.307
oYQ 0.585 0.794 0.751 0.968 1.000 0.682
Seb 0.533 0.481 0.826 0.307 0.682 1.000

# Ok - to problem at hand , you can just substitute your matrix into these lines:
# Clearly the diagonal in a correlation matrix will be 1 so this is excluded as per your problem
diag( mat ) <- NA
# Now find the next highest correlation in each row and set this to NA
mat <- t( apply( mat , 1 , function(x) { x[ which.max(x) ] <- NA ; return(x) } ) ) 

# Another sanity check...!
mat

      SXK   llq   xFL   RVW   oYQ   Seb
SXK    NA    NA 0.499 0.585 0.813 0.751
llq    NA    NA 0.075 0.533 0.794 0.826
xFL 0.499 0.099    NA 0.099 0.481    NA
RVW 0.075    NA 0.620    NA 0.620 0.307
oYQ 0.585 0.794 0.751    NA    NA 0.682
Seb 0.533 0.481    NA 0.307 0.682    NA


# Now return the two remaining columns with greatest correlation in that row
res <- t( apply( mat , 1 , function(x) { y <- names( sort(x , TRUE ) )[1:2] ; return( y ) } ) )

res


    [,1]  [,2] 
SXK "oYQ" "Seb"
llq "Seb" "oYQ"
xFL "SXK" "oYQ"
RVW "xFL" "oYQ"
oYQ "llq" "xFL"
Seb "oYQ" "SXK"
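The toy matrix printed earlier in this answer is not exactly symmetric (for example, the llq/xFL and xFL/llq entries differ), because the lower and upper triangles are filled in different cell orders. A minimal sketch of one way to build a truly symmetric toy matrix, assuming the same nc; the name sym and the seed are arbitrary choices for illustration:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
nc <- 6
sym <- diag(1, nc)
# Fill the lower triangle with random toy correlations
sym[lower.tri(sym)] <- runif((nc^2 - nc) / 2)
# Mirror the lower triangle into the upper triangle via the transpose
sym[upper.tri(sym)] <- t(sym)[upper.tri(sym)]
print(isSymmetric(sym))
```

The rest of the answer's steps can then be run on sym in place of mat.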

Does this answer your question?
