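This is a simplified example of my real data, which is an N x N matrix with N > 100. What I want is to find the top 3 values in each row, and then find which columns appear most often among those top 3 and for which rows they appear.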

        A   B   C   D   E   F   G
    A   0   70  5   73  96  46  58
    B   47  0   20  89  75  50  19
    C   42  98  0   30  30  22  76
    D   66  20  18  0   63  18  60
    E   73  0   63  51  0   23  7
    F   79  34  61  56  12  0   99
    G   25  26  41  86  51  30  0

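So far I have tried to get there by creating a binary matrix in which a cell is 1 if it is among the top 3 of its row and 0 otherwise. However, I am having great difficulty creating a table that is not rectangular.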

        A   B   C   D   E   F   G
    A   0   1   0   1   1   0   0
    B   0   0   0   1   1   1   0
    C   1   1   0   0   0   0   1
    D   1   0   0   0   1   0   1
    E   1   0   1   1   0   0   0
    F   1   0   1   0   0   0   1
    G   0   0   1   1   1   0   0

The final product should look like the following, where column A is in the top three for rows C, D, E, and F.

    A       D       E       C       G       B       F
    C (42)  A (73)  A (96)  E (63)  C (76)  A (70)  B (50)
    D (66)  B (89)  B (75)  F (61)  D (60)  C (98)  
    E (73)  E (51)  D (63)  G (41)  F (99)      
    F (79)  G (86)  G (51)      

Any suggestions or pointers in the right direction would be greatly appreciated.

Thanks!

4 Answers

Here is how I would do it:

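# Read the matrix from inline text. The header has one field fewer than the
# data rows, so read.table() treats the first column as row names.
X <- read.table(header = TRUE, text = "A   B   C   D   E   F   G
A   0   70  5   73  96  46  58
B   47  0   20  89  75  50  19
C   42  98  0   30  30  22  76
D   66  20  18  0   63  18  60
E   73  0   63  51  0   23  7
F   79  34  61  56  12  0   99
G   25  26  41  86  51  30  0")

# (row, col) positions of the cells that are at least as large as the
# third-largest value of their row
indices <- which(t(apply(X, 1, function(x) x >= sort(x, decreasing = TRUE)[3])), arr.ind = TRUE)
values  <- X[indices]

# Row and column labels for each position (row and column names coincide here)
letters <- matrix(rownames(X)[indices], ncol = 2)

data.frame(letters, values)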

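First, we read the data in as a table. Then we test each element against the third-largest element of its row. The rest just makes the output a bit nicer (X1 is the row, X2 the column):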

   X1 X2 values
1   C  A     42
2   D  A     66
3   E  A     73
4   F  A     79
5   A  B     70
6   C  B     98
7   E  C     63
8   F  C     61
9   G  C     41
10  A  D     73
11  B  D     89
12  E  D     51
13  G  D     86
14  A  E     96
15  B  E     75
16  D  E     63
17  G  E     51
18  B  F     50
19  C  G     76
20  D  G     60
21  F  G     99
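
The long table above can also be regrouped by column, which is closer to the layout requested in the question. The following is only a sketch, not part of the original answer: the names `top3`, `long` and `by_col` are made up for illustration, and the matrix is re-entered by hand so the block runs on its own.

```r
# Same 7 x 7 example matrix as in the question, entered row by row.
nm <- c("A", "B", "C", "D", "E", "F", "G")
X <- matrix(c( 0, 70,  5, 73, 96, 46, 58,
              47,  0, 20, 89, 75, 50, 19,
              42, 98,  0, 30, 30, 22, 76,
              66, 20, 18,  0, 63, 18, 60,
              73,  0, 63, 51,  0, 23,  7,
              79, 34, 61, 56, 12,  0, 99,
              25, 26, 41, 86, 51, 30,  0),
            nrow = 7, byrow = TRUE, dimnames = list(nm, nm))

# TRUE where a cell is at least as large as the third-largest value of its row.
top3 <- t(apply(X, 1, function(x) x >= sort(x, decreasing = TRUE)[3]))
idx  <- which(top3, arr.ind = TRUE)   # column 1 = row index, column 2 = column index

# Long format: one line per (row, column) pair that made the top 3.
long <- data.frame(row    = rownames(X)[idx[, 1]],
                   column = colnames(X)[idx[, 2]],
                   value  = X[idx],
                   stringsAsFactors = FALSE)

# Group the "row (value)" labels by column, most frequent column first.
by_col <- split(paste0(long$row, " (", long$value, ")"), long$column)
by_col <- by_col[order(-lengths(by_col))]
by_col
```

The result is a ragged list, so columns with fewer top-3 appearances simply have shorter entries.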
answered 2013-01-02T23:24:11.553

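I find the structure you show confusing. What if you instead filled the positions marked in your second matrix with the corresponding values: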

foo = structure(c(0L, 47L, 42L, 66L, 73L, 79L, 25L, 70L, 0L, 98L, 20L, 
0L, 34L, 26L, 5L, 20L, 0L, 18L, 63L, 61L, 41L, 73L, 89L, 30L, 
0L, 51L, 56L, 86L, 96L, 75L, 30L, 63L, 0L, 12L, 51L, 46L, 50L, 
22L, 18L, 23L, 0L, 30L, 58L, 19L, 76L, 60L, 7L, 99L, 0L), .Dim = c(7L, 
7L), .Dimnames = list(c("A", "B", "C", "D", "E", "F", "G"), c("A", 
"B", "C", "D", "E", "F", "G")))

bar = structure(c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 
1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
1L, 0L, 1L, 0L), .Dim = c(7L, 7L), .Dimnames = list(c("A", "B", 
"C", "D", "E", "F", "G"), c("A", "B", "C", "D", "E", "F", "G"
)))

bar[as.logical(bar)] = foo[as.logical(bar)]

> bar
   A  B  C  D  E  F  G
A  0 70  0 73 96  0  0
B  0  0  0 89 75 50  0
C 42 98  0  0  0  0 76
D 66  0  0  0 63  0 60
E 73  0 63 51  0  0  0
F 79  0 61  0  0  0 99
G  0  0 41 86 51  0  0

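The structure you show could be created as a list, but I don't think it displays the data clearly. Feel free to expand on or clarify your output requirements, though.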
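
As for the "which columns appear most often" part of the question, the counts can be read directly off the binary matrix. A minimal sketch follows, with the binary matrix from the question re-entered by hand; `bin` and `counts` are illustrative names, not part of the answer above.

```r
# Binary top-3 matrix from the question, entered row by row.
nm  <- c("A", "B", "C", "D", "E", "F", "G")
bin <- matrix(c(0, 1, 0, 1, 1, 0, 0,
                0, 0, 0, 1, 1, 1, 0,
                1, 1, 0, 0, 0, 0, 1,
                1, 0, 0, 0, 1, 0, 1,
                1, 0, 1, 1, 0, 0, 0,
                1, 0, 1, 0, 0, 0, 1,
                0, 0, 1, 1, 1, 0, 0),
              nrow = 7, byrow = TRUE, dimnames = list(nm, nm))

# Number of rows in which each column is among the top 3, largest count first.
counts <- sort(colSums(bin), decreasing = TRUE)
counts

# The rows behind a single column, here column "A".
rownames(bin)[bin[, "A"] == 1]
```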

answered 2013-01-02T22:26:08.873

foo <- structure(c(0L, 47L, 42L, 66L, 73L, 79L, 25L, 70L, 0L, 98L, 20L,
0L, 34L, 26L, 5L, 20L, 0L, 18L, 63L, 61L, 41L, 73L, 89L, 30L, 
0L, 51L, 56L, 86L, 96L, 75L, 30L, 63L, 0L, 12L, 51L, 46L, 50L, 
22L, 18L, 23L, 0L, 30L, 58L, 19L, 76L, 60L, 7L, 99L, 0L), .Dim = c(7L, 
7L), .Dimnames = list(c("A", "B", "C", "D", "E", "F", "G"), c("A", 
"B", "C", "D", "E", "F", "G")))

Your intermediate matrix can be computed by doing:

in.top3 <- t(apply(-foo, 1, rank)) <= 3

Then, for each column, you can build the list of rows in which it ranks in the top 3 by doing:

apply(in.top3, 2, function(x)names(which(x)))
# $A
# [1] "C" "D" "E" "F"
# 
# $B
# [1] "A" "C"
# 
# $C
# [1] "E" "F" "G"
# 
# $D
# [1] "A" "B" "E" "G"
# 
# $E
# [1] "A" "B" "D" "G"
# 
# $F
# [1] "B"
# 
# $G
# [1] "C" "D" "F"
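
One thing to watch with the rank-based test is ties at the cut-off: with the default `ties.method = "average"`, two values tied for third place both get rank 3.5 and both drop out. The sketch below is not part of the original answer; it wraps the same idea in a hypothetical helper, `top_k_mask`, that uses `ties.method = "min"` so every value tied with the k-th largest is kept. The toy matrix is made up for illustration.

```r
# Hypothetical helper: logical mask of the top-k cells in each row.
# ties.method = "min" gives tied values the same (best) rank, so every value
# tied with the k-th largest is kept.
top_k_mask <- function(m, k = 3, ties.method = "min") {
  t(apply(-m, 1, rank, ties.method = ties.method)) <= k
}

# Toy matrix (made up) with a tie at the cut-off in row r1.
m <- matrix(c(9, 5, 5, 1,
              4, 3, 2, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c", "d")))

top_k_mask(m, k = 2)                            # row r1 keeps a, b and c
top_k_mask(m, k = 2, ties.method = "average")   # row r1 keeps only a
```

Which behaviour is right depends on whether a tie should widen or narrow the top-k set. The example data in the question has no ties at the cut-off, so both variants agree there.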
answered 2013-01-02T23:24:07.497

inter <- 0 + t( apply(foo, 1, rank) >= 5)  # don't actually need it but shows how.

vals <- foo * ( 0+t(apply(foo, 1, rank)>=5))
#----------
   A  B  C  D  E  F  G
A  0 70  0 73 96  0  0
B  0  0  0 89 75 50  0
C 42 98  0  0  0  0 76
D 66  0  0  0 63  0 60
E 73  0 63 51  0  0  0
F 79  0 61  0  0  0 99
G  0  0 41 86 51  0  0

vals [ , rev( order( colSums(vals >0) )) ]  
 # Arrange columns in order of decreasing number of top-three appearances
#-------
   E  D  A  G  C  B  F
A 96 73  0  0  0 70  0
B 75 89  0  0  0  0 50
C  0  0 42 76  0 98  0
D 63  0 66 60  0  0  0
E  0 51 73  0 63  0  0
F  0  0 79 99 61  0  0
G 51 86  0  0 41  0  0

If you want column A to come first, you could use: vals[ , order( -colSums(vals > 0) ) ]. If you can be satisfied with a horizontal display:

> vals2 <- vals[ , order( -colSums(vals > 0) ) ]   # columns reordered so that A comes first
> apply( vals2 , 2, function(col) paste0(rownames(foo)[col>0], " (", col[col>0], ")" ) )
$A
[1] "C (42)" "D (66)" "E (73)" "F (79)"

$D
[1] "A (73)" "B (89)" "E (51)" "G (86)"

$E
[1] "A (96)" "B (75)" "D (63)" "G (51)"

$C
[1] "E (63)" "F (61)" "G (41)"

$G
[1] "C (76)" "D (60)" "F (99)"

$B
[1] "A (70)" "C (98)"

$F
[1] "B (50)"
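
If the vertical layout from the question is still wanted, one option is to pad the ragged list with empty strings so that it fits into an ordinary character matrix. This is only a sketch: the list is re-entered from the output above so the block runs on its own, and `top`, `n` and `tab` are illustrative names.

```r
# Ragged list as printed above (column -> "row (value)" labels).
top <- list(A = c("C (42)", "D (66)", "E (73)", "F (79)"),
            D = c("A (73)", "B (89)", "E (51)", "G (86)"),
            E = c("A (96)", "B (75)", "D (63)", "G (51)"),
            C = c("E (63)", "F (61)", "G (41)"),
            G = c("C (76)", "D (60)", "F (99)"),
            B = c("A (70)", "C (98)"),
            F = "B (50)")

# Pad every element with "" up to the longest length, then bind as columns.
n   <- max(lengths(top))
tab <- sapply(top, function(v) c(v, rep("", n - length(v))))

# Print without quotes so it reads like the table in the question.
print(noquote(tab))
```

The padding is purely cosmetic. For further computation the list form is usually easier to work with.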
answered 2013-01-02T23:58:26.317