r - Rearranging a data frame - R

Question

I have a data frame that looks like this:

     a        b        c        d
ab   0        0        1        0
cd  -0.415    1.415    0        0
ef   0        0        0.0811   0.918

Is there a simple way to convert this table into:

     a        b        c        d
ab   0        0        1        0
cd  -0.415    0        0        0
cd   0        1.415    0        0
ef   0        0        0.0811   0
ef   0        0        0        0.918

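If a row of the original table contains two or more non-zero numbers, I want to split it into that many rows, one per number. I don't know how to do this, so any help would be much appreciated.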

score 4 · Accepted Answer

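Here is one approach, using matrix indexing. (The data is converted to a matrix, so it works best if all your data is of a single type, as it is in your example.)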

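reformat.dat <- function(dat) {
  tdat <- t(dat)                         # transpose so values are visited row by row
  nz <- tdat != 0                        # logical mask of the non-zero cells
  i <- col(tdat)[nz]                     # original row index of each non-zero cell
  j <- row(tdat)[nz]                     # original column index of each non-zero cell
  out <- matrix(0, sum(nz), ncol(dat))   # one output row per non-zero cell
  out[cbind(seq_len(sum(nz)), j)] <- tdat[nz]   # place each value in its own row
  rownames(out) <- rownames(dat)[i]
  colnames(out) <- colnames(dat)
  out
}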

reformat.dat(dat)
#         a     b      c     d
# ab  0.000 0.000 1.0000 0.000
# cd -0.415 0.000 0.0000 0.000
# cd  0.000 1.415 0.0000 0.000
# ef  0.000 0.000 0.0811 0.000
# ef  0.000 0.000 0.0000 0.918
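
The key step in reformat.dat is indexing a matrix with a two-column matrix of (row, column) pairs, which assigns exactly one cell per pair. A minimal sketch of that idiom on its own, with made-up values (the names out and idx are illustrative only):

```r
# Start from an all-zero 3 x 4 matrix.
out <- matrix(0, nrow = 3, ncol = 4)

# Each row of idx is one (row, column) pair.
idx <- cbind(1:3, c(2, 4, 1))

# Assign one value per pair: out[1, 2], out[2, 4], out[3, 1].
out[idx] <- c(10, 20, 30)
out
```

The transpose in reformat.dat matters because R stores matrices column by column, so tdat[nz] walks the original data one row at a time, which keeps the output rows grouped by their original row name.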

score 4 · Accepted Answer

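Borrowing a little from @AnandaMahto, and melting as you asked. Think of the dcast formula this way: whatever unique combinations you want to check against go on the left-hand side of ~, and the values of the variables go on the right. In this case, the variable names become the values (after melt, the original column names a to d are stored in the variable column).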

library(reshape2)
mydf <- structure(list(a = c(0, -0.415, 0), b = c(0, 1.415, 0), 
                       c = c(1, 0, 0.0811), d = c(0, 0, 0.918)), 
                  .Names = c("a", "b", "c", "d"), 
                  class = "data.frame", row.names = c("ab", "cd", "ef"))
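mydf$rows <- rownames(mydf)                     # keep the row names as an id column
m1 <- melt(mydf, id.vars = "rows")              # long format: rows, variable, value
m2 <- dcast(m1, rows + value ~ ..., fill = 0)   # one row per (rows, value) pair
m2 <- m2[m2$value != 0, ]                       # drop the rows that came from zero cells
m2$value <- NULL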

#  rows      a     b      c     d
#2   ab  0.000 0.000 1.0000 0.000
#3   cd -0.415 0.000 0.0000 0.000
#5   cd  0.000 1.415 0.0000 0.000
#7   ef  0.000 0.000 0.0811 0.000
#8   ef  0.000 0.000 0.0000 0.918

score 2 · Accepted Answer

Here is a simple solution using diag:

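o <- apply(df, 1, function(x) {
    t <- diag(x)                  # put each value of the row on the diagonal
    colnames(t) <- names(x)
    # drop the rows of the diagonal matrix that are entirely zero
    t <- t[rowSums(t == 0) != length(x), ,drop = FALSE]
    t
})
ids <- rep(names(o), sapply(o, nrow))   # repeat each row name once per kept row
o <- do.call(rbind, o)
row.names(o) <- ids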

#         a     b      c     d
# ab  0.000 0.000 1.0000 0.000
# cd -0.415 0.000 0.0000 0.000
# cd  0.000 1.415 0.0000 0.000
# ef  0.000 0.000 0.0811 0.000
# ef  0.000 0.000 0.0000 0.918

This is a matrix. Use as.data.frame(.) if you need a data.frame.
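
One thing to watch when converting: a data.frame requires unique row names, so the repeated labels (cd, cd) cannot be kept as row names as they are. A minimal sketch, assuming it is acceptable to move the labels into an ordinary column (the column name id and the small matrix m are illustrative, not part of the answer):

```r
# A small stand-in for the matrix o, with a repeated row name.
m <- matrix(c(-0.415, 0, 0, 1.415), nrow = 2,
            dimnames = list(c("cd", "cd"), c("a", "b")))

ids <- rownames(m)    # save the labels before dropping them
rownames(m) <- NULL   # avoid duplicate row names during conversion
res <- data.frame(id = ids, m)
res
```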

score 1 · Accepted Answer

Here is one approach, but you will need to follow up with some cosmetic changes to fix the row names.

Your data in a reproducible form:

mydf <- structure(list(a = c(0, -0.415, 0), b = c(0, 1.415, 0), 
                       c = c(1, 0, 0.0811), d = c(0, 0, 0.918)), 
                  .Names = c("a", "b", "c", "d"), 
                  class = "data.frame", row.names = c("ab", "cd", "ef"))

Replace the zeros with NAs:

mydf[mydf == 0] <- NA

stack your data.frame to turn it into a "long" data.frame:

mydf1 <- data.frame(Rows = rownames(mydf), stack(mydf))

Generate unique values for "Rows":

mydf1$Rows <- make.unique(as.character(mydf1$Rows))
# Let's see what we have so far....
mydf1
#    Rows  values ind
# 1    ab      NA   a
# 2    cd -0.4150   a
# 3    ef      NA   a
# 4  ab.1      NA   b
# 5  cd.1  1.4150   b
# 6  ef.1      NA   b
# 7  ab.2  1.0000   c
# 8  cd.2      NA   c
# 9  ef.2  0.0811   c
# 10 ab.3      NA   d
# 11 cd.3      NA   d
# 12 ef.3  0.9180   d

Now, just use xtabs to get the output you are looking for. Wrap it in as.data.frame.matrix if you want a data.frame, and clean up the row names if needed.

as.data.frame.matrix(xtabs(values ~ Rows + ind, mydf1))
#           a     b      c     d
# ab.2  0.000 0.000 1.0000 0.000
# cd   -0.415 0.000 0.0000 0.000
# cd.1  0.000 1.415 0.0000 0.000
# ef.2  0.000 0.000 0.0811 0.000
# ef.3  0.000 0.000 0.0000 0.918
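
For the row-name cleanup, one possible sketch is to strip the numeric suffix that make.unique appended. This assumes the original row names do not themselves end in a dot followed by digits; the variable names rn and clean are illustrative.

```r
# Row names as produced by the xtabs() step.
rn <- c("ab.2", "cd", "cd.1", "ef.2", "ef.3")

# Remove a trailing ".<digits>" suffix, if present.
clean <- sub("\\.[0-9]+$", "", rn)
clean
```

Because the cleaned labels repeat, they would have to be stored in a regular column rather than as data.frame row names.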

score -1 · Accepted Answer

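I don't think there is an elegant version of what you are asking for, but perhaps you could use melt from reshape2 instead? It gives you one row per row/column pair: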

> library(reshape2) 
> # add row names as column
> df <- cbind(df, names=rownames(df))
> df <- melt(df,id.var="names")
Using  as id variables
> df[df$value != 0,]
   names variable   value
2     cd        a -0.4150
5     cd        b  1.4150
7     ab        c  1.0000
9     ef        c  0.0811
12    ef        d  0.9180
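
If reshape2 is not installed, a similar long table can be sketched in base R by treating the matrix as a table. This is a substitute technique, not what the answer uses; the column names names, variable and value are set by hand to mimic the melt output.

```r
mydf <- data.frame(a = c(0, -0.415, 0), b = c(0, 1.415, 0),
                   c = c(1, 0, 0.0811), d = c(0, 0, 0.918),
                   row.names = c("ab", "cd", "ef"))

# as.table() + as.data.frame() yields one row per (row, column) cell.
long <- as.data.frame(as.table(as.matrix(mydf)), responseName = "value")
names(long)[1:2] <- c("names", "variable")

# Keep only the non-zero cells.
long[long$value != 0, ]
```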
