r - How to create a table for group comparisons in R?

Question

I am comparing a series of baseline and end-of-study differences across groups. For example, I might have the following dataset:

> baseline.comp
                             cluster 1970_pred 2008_pred  ratio   diff
 9  Many Transitions, Middle Income    0.1156    0.0248 4.6613 0.0908
10     Many Transitions, Low Income    0.1779    0.0389 4.5733 0.1390
 4       Dictatorships, High Income    0.1403    0.0307 4.5700 0.1096
 7    One Transition, Middle Income    0.0801    0.0219 3.6575 0.0582
 1         Democracies, High Income    0.0396    0.0116 3.4138 0.0280
 5     Dictatorships, Middle Income    0.1252    0.0399 3.1378 0.0853
 2       Democracies, Middle Income    0.0811    0.0291 2.7869 0.0520
 8       One Transition, Low Income    0.1912    0.0775 2.4671 0.1137
 3          Democracies, Low Income    0.1612    0.0698 2.3095 0.0914
 6        Dictatorships, Low Income    0.1854    0.0821 2.2582 0.1033

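In this example, I would like to compare the column 1970_pred against itself, so that I get a table showing how the baseline conditions differ between these clusters. This would be a 10 x 10 table, but only the cells below the diagonal would hold actual numbers, reflecting the differences in the initial conditions of these groups. I would like to know whether R already has a function that does this.

Thanks,

Antonio Pedro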

score 2 · Accepted Answer

Try the following:

# This part is just to create your data:

baseline.comp <- read.table(text="
                             cluster 1970_pred 2008_pred  ratio   diff
 9  'Many Transitions, Middle Income'    0.1156    0.0248 4.6613 0.0908
10     'Many Transitions, Low Income'    0.1779    0.0389 4.5733 0.1390
 4       'Dictatorships, High Income'    0.1403    0.0307 4.5700 0.1096
 7    'One Transition, Middle Income'    0.0801    0.0219 3.6575 0.0582
 1         'Democracies, High Income'    0.0396    0.0116 3.4138 0.0280
 5     'Dictatorships, Middle Income'    0.1252    0.0399 3.1378 0.0853
 2      'Democracies, Middle Income'    0.0811    0.0291 2.7869 0.0520
 8       'One Transition, Low Income'    0.1912    0.0775 2.4671 0.1137
 3          'Democracies, Low Income'    0.1612    0.0698 2.3095 0.0914
 6        'Dictatorships, Low Income'   0.1854    0.0821 2.2582 0.1033")

colnames(baseline.comp) <- c("cluster", "1970_pred", "2008_pred", "ratio", "diff")

# Now, we use outer

diff.1970 <- outer(baseline.comp$`1970_pred`, baseline.comp$`1970_pred`, "-")

# Just renaming the output matrix. I've used A through J to make 
# the output more readable.

#colnames(diff.1970) <- baseline.comp$cluster
colnames(diff.1970) <- LETTERS[1:10]
#rownames(diff.1970) <- baseline.comp$cluster
rownames(diff.1970) <- LETTERS[1:10]

# Make sure only the lower half of the result contains non-zero values

> diff.1970 * lower.tri(diff.1970)
        A       B       C       D      E       F      G       H      I J
A  0.0000  0.0000  0.0000  0.0000 0.0000  0.0000 0.0000  0.0000 0.0000 0
B  0.0623  0.0000  0.0000  0.0000 0.0000  0.0000 0.0000  0.0000 0.0000 0
C  0.0247 -0.0376  0.0000  0.0000 0.0000  0.0000 0.0000  0.0000 0.0000 0
D -0.0355 -0.0978 -0.0602  0.0000 0.0000  0.0000 0.0000  0.0000 0.0000 0
E -0.0760 -0.1383 -0.1007 -0.0405 0.0000  0.0000 0.0000  0.0000 0.0000 0
F  0.0096 -0.0527 -0.0151  0.0451 0.0856  0.0000 0.0000  0.0000 0.0000 0
G -0.0345 -0.0968 -0.0592  0.0010 0.0415 -0.0441 0.0000  0.0000 0.0000 0
H  0.0756  0.0133  0.0509  0.1111 0.1516  0.0660 0.1101  0.0000 0.0000 0
I  0.0456 -0.0167  0.0209  0.0811 0.1216  0.0360 0.0801 -0.0300 0.0000 0
J  0.0698  0.0075  0.0451  0.1053 0.1458  0.0602 0.1043 -0.0058 0.0242 0

A few notes on this:

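In general, it is not a good idea for variable (or column) names to start with a number. That is why we had to rename the columns after using read.table: R automatically puts an "X" in front of the number. Note that I had to use backticks when referring to these column names in the call to outer. It is best to avoid this situation entirely.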
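To see the renaming in isolation, here is a minimal sketch on a two-column toy table. The object names `txt`, `checked` and `unchecked` are made up for illustration. It shows the default behaviour of read.table, the `check.names = FALSE` alternative, and a rename to syntactic names that need no backticks.

```r
# Toy input whose column names start with a digit
txt <- "1970_pred 2008_pred
0.1156 0.0248
0.1779 0.0389"

# Default: check.names = TRUE passes the names through make.names()
checked <- read.table(text = txt, header = TRUE)
names(checked)            # "X1970_pred" "X2008_pred"

# check.names = FALSE keeps the names as written; backticks are then required
unchecked <- read.table(text = txt, header = TRUE, check.names = FALSE)
unchecked$`1970_pred`

# Renaming to syntactic names avoids the backticks altogether
names(unchecked) <- c("pred_1970", "pred_2008")
unchecked$pred_1970
```

Which of the two routes is preferable depends on whether the original labels have to be kept for display; for computation, syntactic names tend to be less error-prone.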

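As for the outer function, I used a slight variation. The usual call looks like x %o% y, which is the same as outer(x, y, "*"). In this case, however, we are interested in differences rather than products.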
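The relationship between %o% and outer can be checked on a short vector. This is only a sketch: `x` holds the first three 1970_pred values from the question, and `d` and `r` are illustrative names.

```r
x <- c(0.1156, 0.1779, 0.1403)   # first three 1970_pred values

# %o% is shorthand for outer() with the default FUN = "*"
all.equal(x %o% x, outer(x, x, "*"))   # TRUE

# With "-" the [i, j] entry is x[i] - x[j]
d <- outer(x, x, "-")
d[2, 1]    # x[2] - x[1]

# outer() accepts any vectorised function of two arguments, e.g. ratios
r <- outer(x, x, "/")
```

Since entry [i, j] is x[i] - x[j], the difference matrix is antisymmetric: the upper triangle is the negative of the lower one, which is why only one half needs to be shown.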

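The last step is to multiply by lower.tri, which returns a TRUE/FALSE matrix in which everything below the diagonal is TRUE and everything else is FALSE. If you had passed the diag = TRUE argument, the diagonal would also be TRUE, but that would make no difference here because the diagonal is always zero. Since R treats TRUE as 1 and FALSE as 0, multiplying the original matrix by lower.tri returns zero for every value except the ones we are interested in (those below the diagonal).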
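The masking step can be sketched on a small 3 x 3 case. The vector `v` and the names `mask_strict`, `mask_diag` and `masked` are illustrative only.

```r
v <- c(3, 1, 2)
m <- outer(v, v, "-")

mask_strict <- lower.tri(m)                # TRUE strictly below the diagonal
mask_diag   <- lower.tri(m, diag = TRUE)   # TRUE on the diagonal as well

# Logical values are coerced to 0/1 in arithmetic, so the product
# zeroes every cell where the mask is FALSE
masked <- m * mask_strict

# Because the diagonal of a self-difference matrix is already zero,
# both masks give the same result
all(masked == m * mask_diag)   # TRUE
```

One caveat of the multiplication approach is that a masked cell and a genuine zero difference below the diagonal look the same in the output.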

score 1 · Accepted Answer

outer is what you are looking for.

baseline_diff <- outer(baseline.comp[['1970_pred']],baseline.comp[['1970_pred']], '-')
## if you want to set the dimension names (but they will be very long!)
# dimnames(baseline_diff) <- list(baseline.comp[['cluster']],
#                                  baseline.comp[['cluster']])
 baseline_diff
          [,1]    [,2]    [,3]    [,4]   [,5]    [,6]    [,7]    [,8]    [,9]   [,10]
 [1,]  0.0000 -0.0623 -0.0247  0.0355 0.0760 -0.0096  0.0345 -0.0756 -0.0456 -0.0698
 [2,]  0.0623  0.0000  0.0376  0.0978 0.1383  0.0527  0.0968 -0.0133  0.0167 -0.0075
 [3,]  0.0247 -0.0376  0.0000  0.0602 0.1007  0.0151  0.0592 -0.0509 -0.0209 -0.0451
 [4,] -0.0355 -0.0978 -0.0602  0.0000 0.0405 -0.0451 -0.0010 -0.1111 -0.0811 -0.1053
 [5,] -0.0760 -0.1383 -0.1007 -0.0405 0.0000 -0.0856 -0.0415 -0.1516 -0.1216 -0.1458
 [6,]  0.0096 -0.0527 -0.0151  0.0451 0.0856  0.0000  0.0441 -0.0660 -0.0360 -0.0602
 [7,] -0.0345 -0.0968 -0.0592  0.0010 0.0415 -0.0441  0.0000 -0.1101 -0.0801 -0.1043
 [8,]  0.0756  0.0133  0.0509  0.1111 0.1516  0.0660  0.1101  0.0000  0.0300  0.0058
 [9,]  0.0456 -0.0167  0.0209  0.0811 0.1216  0.0360  0.0801 -0.0300  0.0000 -0.0242
[10,]  0.0698  0.0075  0.0451  0.1053 0.1458  0.0602  0.1043 -0.0058  0.0242  0.0000

To display only the lower (or upper) triangle, use tril or triu from the Matrix package:

library(Matrix)

tril(baseline_diff)

10 x 10 Matrix of class "dtrMatrix"
      [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]    [,8]    [,9]    [,10]  
 [1,]  0.0000       .       .       .       .       .       .       .       .       .
 [2,]  0.0623  0.0000       .       .       .       .       .       .       .       .
 [3,]  0.0247 -0.0376  0.0000       .       .       .       .       .       .       .
 [4,] -0.0355 -0.0978 -0.0602  0.0000       .       .       .       .       .       .
 [5,] -0.0760 -0.1383 -0.1007 -0.0405  0.0000       .       .       .       .       .
 [6,]  0.0096 -0.0527 -0.0151  0.0451  0.0856  0.0000       .       .       .       .
 [7,] -0.0345 -0.0968 -0.0592  0.0010  0.0415 -0.0441  0.0000       .       .       .
 [8,]  0.0756  0.0133  0.0509  0.1111  0.1516  0.0660  0.1101  0.0000       .       .
 [9,]  0.0456 -0.0167  0.0209  0.0811  0.1216  0.0360  0.0801 -0.0300  0.0000       .
[10,]  0.0698  0.0075  0.0451  0.1053  0.1458  0.0602  0.1043 -0.0058  0.0242  0.0000
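If loading the Matrix package is not desirable, a similar display can be approximated in base R. The sketch below is an assumption-laden alternative, not part of either answer: `pairwise_diff` is a hypothetical helper, and it blanks the unused cells with NA instead of the dots that Matrix prints.

```r
# Hypothetical helper: pairwise differences x[i] - x[j], lower triangle only
pairwise_diff <- function(x, labels = seq_along(x)) {
  d <- outer(x, x, "-")
  dimnames(d) <- list(as.character(labels), as.character(labels))
  d[upper.tri(d, diag = TRUE)] <- NA   # keep only cells below the diagonal
  d
}

pred_1970 <- c(0.1156, 0.1779, 0.1403, 0.0801)   # first four values from the question
tab <- pairwise_diff(pred_1970, LETTERS[1:4])

# na.print = "" leaves the NA cells empty when printing
print(tab, na.print = "")
```

Using NA rather than zero keeps a real zero difference distinguishable from a cell that was deliberately left out.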
