r - Aligning rows in a character matrix in R

Question

I have a character matrix structured like this:

dog    1   cow    9     mouse  7 
bird   10  tiger  1     gnu    2
tiger  3   deer   7     deer   27
skunk  2   rat    50    NA     NA
mouse  8   snake  3     NA     NA 
cow    7   NA     NA    NA     NA
sheep  21  NA     NA    NA     NA 
gnu    5   NA     NA    NA     NA

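Imagine this is a matrix of animals by locale, where each locale's data is defined by a consecutive pair of columns. Some animals may be common across locales, but a locale can also have animals unique to it. Ultimately I want to make a heatmap of this data, so I need to reorder the matrix into a structure with a single column listing every type of animal, followed by one column per locale holding the corresponding numbers: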

dog    1    NA    NA 
tiger  3    1     NA 
skunk  2    NA    NA
mouse  8    NA    NA
cow    7    9     NA
sheep  21   NA    NA
gnu    5    NA    2
deer   NA   7     27
rat    NA   50    NA
snake  NA   3     NA
mouse  NA   NA    7
bird   10   NA    NA

In other words, I have

A1 <- c("dog", "bird", "tiger", "skunk", "mouse", "cow", "sheep", "gnu")
B1 <- as.character(c(1, 10, 3, 2, 8, 7, 21, 5))
A2 <- c("cow", "tiger", "deer", "rat", "snake", NA, NA, NA)
B2 <- as.character(c(9, 1, 7, 50, 3, NA, NA, NA))
A3 <- c("mouse", "gnu", "deer", NA, NA, NA, NA, NA)
B3 <- as.character(c(7, 2, 27, NA, NA, NA, NA, NA))
TheMatrix <- cbind(A1, B1, A2, B2, A3, B3)

and want

a1 <- c("dog", "tiger", "skunk", "mouse", "cow", "sheep", "gnu", "deer", "rat", "snake", "mouse", "bird")
b1 <- as.character(c(1, 3, 2, 8, 7, 21, 5, NA, NA, NA, NA, 10))
b2 <- as.character(c(NA, 1, NA, NA, 9, NA, NA, 7, 50, 3, NA, NA))
b3 <- as.character(c(NA, NA, NA, NA, NA, NA, 2, 27, NA, NA, 7, NA))
DesiredResult <- cbind(a1, b1, b2, b3)

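Any ideas on how to achieve this restructuring? It could be done with loops and bookkeeping, but surely there is a more elegant way that I'm missing.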

score 5 · Accepted Answer

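library(reshape2)

ncols = ncol(TheMatrix)
nrows = nrow(TheMatrix)

# Inside out: stack the animal columns (odd) and the count columns (even) into two long
# vectors, tag each value with the name of its count column (B1, B2, B3), drop the NA
# padding rows with na.omit(), then spread the locales back out as columns with dcast()
dcast(as.data.frame(na.omit(cbind(c(TheMatrix[,seq(1,ncols,2)]),
                                  c(TheMatrix[,seq(2,ncols,2)]),
                                  rep(colnames(TheMatrix)[seq(2,ncols,2)],
                                      each = nrows)))),
      V1 ~ V3, value.var = 'V2')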
#      V1   B1   B2   B3
#1   bird   10 <NA> <NA>
#2    cow    7    9 <NA>
#3   deer <NA>    7   27
#4    dog    1 <NA> <NA>
#5    gnu    5 <NA>    2
#6  mouse    8 <NA>    7
#7    rat <NA>   50 <NA>
#8  sheep   21 <NA> <NA>
#9  skunk    2 <NA> <NA>
#10 snake <NA>    3 <NA>
#11 tiger    3    1 <NA>

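There's a lot going on here (each piece is very simple). To understand it, just run each bit yourself, starting from the innermost call and working outward.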
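Following that advice, here is a minimal sketch of the same inner steps with the intermediate result given names. It assumes the question's TheMatrix, and at the last step it swaps in base R's tapply() for dcast() so that it runs without reshape2; the names long and wide are illustrative only.

```r
# TheMatrix exactly as defined in the question
A1 <- c("dog", "bird", "tiger", "skunk", "mouse", "cow", "sheep", "gnu")
B1 <- as.character(c(1, 10, 3, 2, 8, 7, 21, 5))
A2 <- c("cow", "tiger", "deer", "rat", "snake", NA, NA, NA)
B2 <- as.character(c(9, 1, 7, 50, 3, NA, NA, NA))
A3 <- c("mouse", "gnu", "deer", NA, NA, NA, NA, NA)
B3 <- as.character(c(7, 2, 27, NA, NA, NA, NA, NA))
TheMatrix <- cbind(A1, B1, A2, B2, A3, B3)

ncols <- ncol(TheMatrix)
nrows <- nrow(TheMatrix)

# Innermost step: c() flattens a column subset column by column, so every
# animal lines up with its count and with the name of its count column
long <- data.frame(
  animal = c(TheMatrix[, seq(1, ncols, 2)]),
  count  = as.numeric(c(TheMatrix[, seq(2, ncols, 2)])),
  locale = rep(colnames(TheMatrix)[seq(2, ncols, 2)], each = nrows),
  stringsAsFactors = FALSE
)

# Next step out: drop the rows that only exist as NA padding
long <- na.omit(long)

# Outermost step, with base R tapply() standing in for dcast(): one row per
# animal, one column per locale; combinations that never occur stay NA
wide <- tapply(long$count, list(long$animal, long$locale), sum)
print(wide)
```

Printing long between the steps shows the 16 non-padding animal/count/locale rows. Converting the counts with as.numeric() is an extra assumption, made because a heatmap needs a numeric matrix.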

score 2 · Accepted Answer

Here's my take:

> x <- read.table(text = "
+ dog    1   cow    9     mouse  7 
+ bird   10  tiger  1     gnu    2
+ tiger  3   deer   7     deer   27
+ skunk  2   rat    50    NA     NA
+ mouse  8   snake  3     NA     NA 
+ cow    7   NA     NA    NA     NA
+ sheep  21  NA     NA    NA     NA 
+ gnu    5   NA     NA    NA     NA ")

A. Convert your source data into a list of data frames with 3 columns: animal, count, and locale number:

> ll <- lapply(1:(ncol(x)/2), 
+              function(i) cbind(x[c(2*i-1, 2*i)], data.frame(locale = i)))
> ll
[[1]]
     V1 V2 locale
1   dog  1      1
2  bird 10      1
3 tiger  3      1
4 skunk  2      1
5 mouse  8      1
6   cow  7      1
7 sheep 21      1
8   gnu  5      1

[[2]]
     V3 V4 locale
1   cow  9      2
2 tiger  1      2
3  deer  7      2
4   rat 50      2
5 snake  3      2
6  <NA> NA      2
7  <NA> NA      2
8  <NA> NA      2

[[3]]
     V5 V6 locale
1 mouse  7      3
2   gnu  2      3
3  deer 27      3
4  <NA> NA      3
5  <NA> NA      3
6  <NA> NA      3
7  <NA> NA      3
8  <NA> NA      3

B. rbind these data frames together. You must first give the columns the same names in every data frame, otherwise rbind won't work:

> for (i in 1:(ncol(x)/2)) names(ll[[i]])[1:2] <- c("animal", "count")
> x <- Reduce(rbind, ll)
> x
   animal count locale
1     dog     1      1
2    bird    10      1
3   tiger     3      1
4   skunk     2      1
5   mouse     8      1
6     cow     7      1
7   sheep    21      1
8     gnu     5      1
9     cow     9      2
10  tiger     1      2
11   deer     7      2
12    rat    50      2
13  snake     3      2
14   <NA>    NA      2
15   <NA>    NA      2
16   <NA>    NA      2
17  mouse     7      3
18    gnu     2      3
19   deer    27      3
20   <NA>    NA      3
21   <NA>    NA      3
22   <NA>    NA      3
23   <NA>    NA      3
24   <NA>    NA      3
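A possible shortcut for steps A and B, sketched under the assumption that x is the six-column data frame read in above: name the columns while each per-locale data frame is built, so no renaming loop is needed, and combine the pieces with do.call(rbind, ...). The names ll and long are illustrative.

```r
# Source data read as above (read.table turns the literal NA strings into NA)
x <- read.table(text = "
dog    1   cow    9     mouse  7
bird   10  tiger  1     gnu    2
tiger  3   deer   7     deer   27
skunk  2   rat    50    NA     NA
mouse  8   snake  3     NA     NA
cow    7   NA     NA    NA     NA
sheep  21  NA     NA    NA     NA
gnu    5   NA     NA    NA     NA")

# Steps A and B in one pass: give every per-locale data frame the same
# column names as it is built, so rbind sees matching names everywhere
ll <- lapply(seq_len(ncol(x) / 2), function(i) {
  data.frame(animal = x[[2 * i - 1]],
             count  = x[[2 * i]],
             locale = i)
})

# do.call() passes all pieces to a single rbind() call
long <- do.call(rbind, ll)
print(long)
```

The result has the same 24 rows as the Reduce(rbind, ll) output, NA padding included; do.call makes one rbind call instead of the pairwise calls that Reduce makes.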

C. Finally, use dcast from the reshape2 package:

> library(reshape2)
> dcast(x, animal ~ locale, fun.aggregate = sum, value.var = "count")
   animal  1  2  3
1    bird 10  0  0
2     cow  7  9  0
3    deer  0  7 27
4     dog  1  0  0
5     gnu  5  0  2
6   mouse  8  0  7
7     rat  0 50  0
8   sheep 21  0  0
9   skunk  2  0  0
10  snake  0  3  0
11  tiger  3  1  0
12   <NA>  0 NA NA

D. The final step of cleaning up the output and replacing 0 with NA is left as an exercise for the reader :)
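One way to do step D, sketched in base R so it runs without reshape2: drop the NA padding rows before casting, and, since each animal/locale pair occurs at most once, let tapply() (used here as a stand-in for dcast) leave missing combinations as NA instead of summing them to 0. The long table is typed in by hand to match the output of step B; the name wide is illustrative.

```r
# Long table matching the output of step B (typed in so the sketch runs on its own)
x <- data.frame(
  animal = c("dog", "bird", "tiger", "skunk", "mouse", "cow", "sheep", "gnu",
             "cow", "tiger", "deer", "rat", "snake", NA, NA, NA,
             "mouse", "gnu", "deer", NA, NA, NA, NA, NA),
  count  = c(1, 10, 3, 2, 8, 7, 21, 5,
             9, 1, 7, 50, 3, NA, NA, NA,
             7, 2, 27, NA, NA, NA, NA, NA),
  locale = rep(1:3, each = 8),
  stringsAsFactors = FALSE
)

# Cleanup 1: drop the padding rows first, so no <NA> animal row appears
x <- x[!is.na(x$animal), ]

# Cleanup 2: each animal/locale pair occurs at most once, so sum() just returns
# the single value; pairs that never occur are left as NA rather than 0
wide <- tapply(x$count, list(x$animal, x$locale), sum)
print(wide)
```

Filtering before casting is safer than replacing 0 with NA afterwards, because an animal genuinely counted as 0 in some locale would otherwise be turned into NA. With reshape2, dropping the NA rows and calling dcast without fun.aggregate, as the first answer does, also leaves the missing cells as NA.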

score 0 · Accepted Answer

Here is a solution using Reduce:

#provide number of locales
max_locale=3
#this list contains the column numbers we want to use to split TheMatrix
split_list=split(1:(2*max_locale),sort(rep(1:max_locale,2)))

#this function will be used to re-merge the split matrix
my_locale_merge=function(x,y) {
    merge(x,y,by.x=colnames(x)[1],by.y=colnames(y)[1],all=TRUE)
}

#the outer subset is used to get rid of the NA animals
subset(
    #reduce subsequently applies my_locale_merge to the split matrix
    Reduce(
        "my_locale_merge",
        #lapply is used to split the matrix
        lapply(split_list,function(x) {
            as.data.frame(TheMatrix[,x,drop=FALSE],stringsAsFactors=FALSE)
            })
        ),
    !is.na(A1)
)

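As far as I understand, Reduce does not let you pass additional arguments (such as by.x) on to the function it applies. So I defined a new function, my_locale_merge, that supplies those arguments itself.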
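Instead of a named helper, an anonymous function can supply those extra merge() arguments directly in the Reduce() call. A minimal sketch, assuming the question's TheMatrix; pieces and merged are illustrative names, and the column pairs are picked with seq() rather than split().

```r
# TheMatrix exactly as defined in the question
A1 <- c("dog", "bird", "tiger", "skunk", "mouse", "cow", "sheep", "gnu")
B1 <- as.character(c(1, 10, 3, 2, 8, 7, 21, 5))
A2 <- c("cow", "tiger", "deer", "rat", "snake", NA, NA, NA)
B2 <- as.character(c(9, 1, 7, 50, 3, NA, NA, NA))
A3 <- c("mouse", "gnu", "deer", NA, NA, NA, NA, NA)
B3 <- as.character(c(7, 2, 27, NA, NA, NA, NA, NA))
TheMatrix <- cbind(A1, B1, A2, B2, A3, B3)

# One data frame per locale, built from column pairs (1,2), (3,4), (5,6)
pieces <- lapply(seq(1, ncol(TheMatrix), by = 2), function(j) {
  as.data.frame(TheMatrix[, c(j, j + 1), drop = FALSE], stringsAsFactors = FALSE)
})

# Reduce() has no ... argument, but the anonymous function fixes by.x, by.y
# and all = TRUE itself, just as my_locale_merge does
merged <- Reduce(function(x, y) {
  merge(x, y, by.x = names(x)[1], by.y = names(y)[1], all = TRUE)
}, pieces)

# Drop the rows produced by the NA padding (the key column keeps the name A1)
merged <- merged[!is.na(merged$A1), ]
print(merged)
```

Both versions keep the counts as character strings, as in TheMatrix; they would need as.numeric() before plotting.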
