r - R - Transposing part of a data frame

Question

I have a .dbf file exported from ArcGIS 10.1 that I need to reorganize. A sample of the data looks like this:

        V1                 V2
40.000000000000000 41.000000000000000
40.000000000000000 42.000000000000000
41.000000000000000 40.000000000000000
41.000000000000000 42.000000000000000
41.000000000000000 43.000000000000000
42.000000000000000 40.000000000000000
42.000000000000000 41.000000000000000
42.000000000000000 43.000000000000000
43.000000000000000 41.000000000000000
43.000000000000000 42.000000000000000

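I need the data in a format where each unique value in the first column has exactly one row, and all the corresponding values from the second column appear in that row, for example: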

        V1                 V2                 V3                 V4
40.000000000000000 41.000000000000000 42.000000000000000
41.000000000000000 40.000000000000000 42.000000000000000 43.000000000000000
42.000000000000000 40.000000000000000 41.000000000000000 43.000000000000000
43.000000000000000 41.000000000000000 42.000000000000000

If anyone can help me with this, I would greatly appreciate it. Thanks!

score 2 · Accepted Answer

You can also do this with dplyr and tidyr:

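library(dplyr)
library(tidyr)

# dat is a data frame with columns X1 and X2 (the sample data from the question)
# Number the rows within each X1 group, then spread X2 into one column per index
dat %>%
  group_by(X1) %>%
  mutate(Time = seq_along(X1)) %>%
  spread(Time, X2)
# Source: local data frame [4 x 4]
#
#   X1  1  2  3
# 1 40 41 42 NA
# 2 41 40 42 43
# 3 42 40 41 43
# 4 43 41 42 NA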
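
In current tidyr releases, `spread()` is superseded by `pivot_wider()`. The following is a minimal sketch of the same idea with that function, assuming tidyr 1.0.0 or later is installed. The sample `dat` is the one used in the other answers, and the name `wide` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data with the same values as in the question
dat <- data.frame(X1 = c(40, 40, 41, 41, 41, 42, 42, 42, 43, 43),
                  X2 = c(41, 42, 40, 42, 43, 40, 41, 43, 41, 42))

wide <- dat %>%
  group_by(X1) %>%
  mutate(Time = row_number()) %>%   # position of each row within its X1 group
  ungroup() %>%
  pivot_wider(names_from = Time, values_from = X2)

wide
```

Groups with fewer values than the longest group are filled with `NA` in the trailing columns, as with `spread()`.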

score 2 · Accepted Answer

You can split the data frame on the first column with the split function and then use lapply to extract the vectors:

dat = data.frame(X1=c(40, 40, 41, 41, 41, 42, 42, 42, 43, 43),
                 X2=c(41, 42, 40, 42, 43, 40, 41, 43, 41, 42))
res <- lapply(split(dat, dat[,1]), function(d) c(d[1,1], sort(unique(d[,2]))))
res
# $`40`
# [1] 40 41 42
# 
# $`41`
# [1] 41 40 42 43
# 
# $`42`
# [1] 42 40 41 43
# 
# $`43`
# [1] 43 41 42

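Most people would probably prefer to keep the data in this format, but you can also combine the list into a matrix, right-padding the vectors with NA values: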

max.len <- max(unlist(lapply(res, length)))
do.call(rbind, lapply(res, function(x) { length(x) <- max.len ; x }))
#    [,1] [,2] [,3] [,4]
# 40   40   41   42   NA
# 41   41   40   42   43
# 42   42   40   41   43
# 43   43   41   42   NA
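
The padding step relies on the fact that assigning a larger value to `length(x)` extends a vector with `NA`. A minimal illustration on a single vector (`x` is a made-up example, not part of the original answer):

```r
x <- c(40, 41, 42)

# Growing the length pads the new positions with NA
length(x) <- 4
x
# [1] 40 41 42 NA
```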

score 1 · Accepted Answer

This is essentially a reshape problem, but you don't have a "time" variable.

You can easily create a "time" variable as follows:

dat$time <- with(dat, ave(X1, X1, FUN = seq_along))
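
To make the effect of that line concrete, here is a small sketch using the sample `dat` from the split/lapply answer (the values are copied from there). The index restarts at 1 for each distinct `X1`:

```r
# Sample data, copied from the split/lapply answer
dat <- data.frame(X1 = c(40, 40, 41, 41, 41, 42, 42, 42, 43, 43),
                  X2 = c(41, 42, 40, 42, 43, 40, 41, 43, 41, 42))

# Number the rows within each X1 group: 1, 2, ... restarting per group
dat$time <- with(dat, ave(X1, X1, FUN = seq_along))
dat$time
# [1] 1 2 1 2 3 1 2 3 1 2
```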

From there, use reshape from base R...

reshape(dat, direction = "wide", idvar="X1", timevar="time")
#   X1 X2.1 X2.2 X2.3
# 1 40   41   42   NA
# 3 41   40   42   43
# 6 42   40   41   43
# 9 43   41   42   NA

...or dcast from "reshape2"...

library(reshape2)
dcast(dat, X1 ~ time, value.var="X2")
#   X1  1  2  3
# 1 40 41 42 NA
# 2 41 40 42 43
# 3 42 40 41 43
# 4 43 41 42 NA
