I have a data frame like the following:

mydf <- read.csv(text="num,placed,recovered
1,2013-02-22 12:14:00,2013-02-27 15:14:00
1,2013-03-03 17:32:00,2013-03-07 17:32:00
1,2013-04-24 10:13:00,2013-04-26 07:47:00
1,2013-04-15 14:51:00,2013-04-19 09:36:00
1,2013-04-11 11:56:00,2013-04-15 12:52:00
10,2013-02-22 07:30:00,2013-02-27 14:55:00
10,2013-03-03 17:20:00,2013-03-07 17:20:00
10,2013-04-15 15:22:00,2013-04-19 09:48:00
10,2013-02-17 10:38:00,2013-02-22 07:18:00
10,2013-04-11 10:09:00,2013-04-15 13:21:00
10,2013-04-24 10:07:00,2013-04-26 08:23:00
11,2013-02-22 14:23:00,2013-02-27 15:50:00
11,2013-04-11 12:51:00,2013-04-14 09:40:00
11,2013-04-15 14:45:00,2013-04-19 08:28:00
11,2013-04-19 10:13:00,2013-04-23 12:01:00
14,2013-03-01 13:45:00,2013-03-08 14:28:00
14,2013-02-22 13:22:00,2013-02-27 15:24:00
14,2013-04-04 15:36:00,2013-04-17 15:04:00",header=TRUE)

I would like to rearrange it so that each unique num appears once, with all of its placed and recovered values in a single row. Here is an example row:

num           placed1          recovered1             placed2          recovered2             placed3          recovered3             placed4          recovered4             placed5          recovered5
1 2013-02-22 12:14:00 2013-02-27 15:14:00 2013-03-03 17:32:00 2013-03-07 17:32:00 2013-04-24 10:13:00 2013-04-26 07:47:00 2013-04-15 14:51:00 2013-04-19 09:36:00 2013-04-11 11:56:00 2013-04-15 12:52:00

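Some rows will have different numbers of placed and recovered values. It is fine for NAs to appear in those places. I have tried using the reshape function, but can't seem to get what I want.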

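I am doing this as a step toward subsetting a dataset that I am cleaning. Another dataset records measurements over time, along with the time each one was collected. The device that took each measurement is stored in the num column. I want to subset that data frame to keep only the intervals during which the device was deployed (the time between each pair of placed and recovered values). So the other data frame would look like this: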

num  temp time
1    5    2013-02-22 12:13:50
1    6    2013-02-22 12:14:00
1    4    2013-02-22 12:14:10
1    9    2013-04-24 09:45:20
1    7    2013-04-24 11:45:50
10   23   2013-03-03 19:23:40

If I were able to subset it successfully, the result would look like this:

num  temp time
1    6    2013-02-22 12:14:00
1    4    2013-02-22 12:14:10
1    7    2013-04-24 11:45:50
10   23   2013-03-03 19:23:40

1 Answer

You just need to include a "time" variable in your dataset for reshape to work:

mydf$time <- with(mydf, ave(num, num, FUN = seq_along))
head(mydf)
#   num              placed           recovered time
# 1   1 2013-02-22 12:14:00 2013-02-27 15:14:00    1
# 2   1 2013-03-03 17:32:00 2013-03-07 17:32:00    2
# 3   1 2013-04-24 10:13:00 2013-04-26 07:47:00    3
# 4   1 2013-04-15 14:51:00 2013-04-19 09:36:00    4
# 5   1 2013-04-11 11:56:00 2013-04-15 12:52:00    5
# 6  10 2013-02-22 07:30:00 2013-02-27 14:55:00    1
reshape(mydf, idvar="num", timevar="time", direction = "wide")
#    num            placed.1         recovered.1            placed.2         recovered.2
# 1    1 2013-02-22 12:14:00 2013-02-27 15:14:00 2013-03-03 17:32:00 2013-03-07 17:32:00
# 6   10 2013-02-22 07:30:00 2013-02-27 14:55:00 2013-03-03 17:20:00 2013-03-07 17:20:00
# 12  11 2013-02-22 14:23:00 2013-02-27 15:50:00 2013-04-11 12:51:00 2013-04-14 09:40:00
# 16  14 2013-03-01 13:45:00 2013-03-08 14:28:00 2013-02-22 13:22:00 2013-02-27 15:24:00
#               placed.3         recovered.3            placed.4         recovered.4
# 1  2013-04-24 10:13:00 2013-04-26 07:47:00 2013-04-15 14:51:00 2013-04-19 09:36:00
# 6  2013-04-15 15:22:00 2013-04-19 09:48:00 2013-02-17 10:38:00 2013-02-22 07:18:00
# 12 2013-04-15 14:45:00 2013-04-19 08:28:00 2013-04-19 10:13:00 2013-04-23 12:01:00
# 16 2013-04-04 15:36:00 2013-04-17 15:04:00                <NA>                <NA>
#               placed.5         recovered.5            placed.6         recovered.6
# 1  2013-04-11 11:56:00 2013-04-15 12:52:00                <NA>                <NA>
# 6  2013-04-11 10:09:00 2013-04-15 13:21:00 2013-04-24 10:07:00 2013-04-26 08:23:00
# 12                <NA>                <NA>                <NA>                <NA>
# 16                <NA>                <NA>                <NA>                <NA>
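
Note that the index produced by ave() follows the row order of the data, so placed.1 is simply the first row for each num, not necessarily the earliest deployment. If chronological numbering matters, one option is to sort before numbering. The following is a minimal sketch on made-up toy data; the names demo and demo.wide are illustrative and not part of the original post, and it assumes the timestamps are ISO-formatted strings (so text order equals time order):

```r
# Hypothetical toy data, deliberately out of chronological order for num 1.
demo <- data.frame(
  num = c(1, 1, 2),
  placed = c("2013-04-24 10:13:00", "2013-02-22 12:14:00", "2013-03-01 13:45:00"),
  recovered = c("2013-04-26 07:47:00", "2013-02-27 15:14:00", "2013-03-08 14:28:00"),
  stringsAsFactors = FALSE
)

# Sort by device, then by placement time.
demo <- demo[order(demo$num, demo$placed), ]

# Number the deployments within each device, as in the answer above.
demo$time <- with(demo, ave(num, num, FUN = seq_along))

# Spread to one row per device; devices with fewer deployments get NA.
demo.wide <- reshape(demo, idvar = "num", timevar = "time", direction = "wide")
demo.wide
```

With this ordering, placed.1 for device 1 should be the February deployment rather than the April one.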

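If you have added a "time" variable as I did above, you can also use the "reshape2" package after first making the dataset longer. That long dataset (which I call "mydf.l" below) may be more useful for subsetting than the wide dataset: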

library(reshape2)
mydf.l <- melt(mydf, id.vars=c("num", "time"))
head(mydf.l)
#   num time variable               value
# 1   1    1   placed 2013-02-22 12:14:00
# 2   1    2   placed 2013-03-03 17:32:00
# 3   1    3   placed 2013-04-24 10:13:00
# 4   1    4   placed 2013-04-15 14:51:00
# 5   1    5   placed 2013-04-11 11:56:00
# 6  10    1   placed 2013-02-22 07:30:00
dcast(mydf.l, num ~ variable + time)
#   num            placed_1            placed_2            placed_3            placed_4
# 1   1 2013-02-22 12:14:00 2013-03-03 17:32:00 2013-04-24 10:13:00 2013-04-15 14:51:00
# 2  10 2013-02-22 07:30:00 2013-03-03 17:20:00 2013-04-15 15:22:00 2013-02-17 10:38:00
# 3  11 2013-02-22 14:23:00 2013-04-11 12:51:00 2013-04-15 14:45:00 2013-04-19 10:13:00
# 4  14 2013-03-01 13:45:00 2013-02-22 13:22:00 2013-04-04 15:36:00                <NA>
#              placed_5            placed_6         recovered_1         recovered_2
# 1 2013-04-11 11:56:00                <NA> 2013-02-27 15:14:00 2013-03-07 17:32:00
# 2 2013-04-11 10:09:00 2013-04-24 10:07:00 2013-02-27 14:55:00 2013-03-07 17:20:00
# 3                <NA>                <NA> 2013-02-27 15:50:00 2013-04-14 09:40:00
# 4                <NA>                <NA> 2013-03-08 14:28:00 2013-02-27 15:24:00
#           recovered_3         recovered_4         recovered_5         recovered_6
# 1 2013-04-26 07:47:00 2013-04-19 09:36:00 2013-04-15 12:52:00                <NA>
# 2 2013-04-19 09:48:00 2013-02-22 07:18:00 2013-04-15 13:21:00 2013-04-26 08:23:00
# 3 2013-04-19 08:28:00 2013-04-23 12:01:00                <NA>                <NA>
# 4 2013-04-17 15:04:00                <NA>                <NA>                <NA>
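
For the subsetting goal described in the question, keeping one row per deployment may be enough on its own. Here is a rough sketch in base R that pairs each reading with the deployments of the same device and keeps those falling between placed and recovered. The names intervals, readings, paired and kept are illustrative, the rows are a small sample taken from the question, and the UTC time zone is an assumption:

```r
# Deployment intervals (a few rows from the question's data).
intervals <- data.frame(
  num = c(1, 1, 10),
  placed = as.POSIXct(c("2013-02-22 12:14:00", "2013-04-24 10:13:00",
                        "2013-03-03 17:20:00"), tz = "UTC"),
  recovered = as.POSIXct(c("2013-02-27 15:14:00", "2013-04-26 07:47:00",
                           "2013-03-07 17:20:00"), tz = "UTC")
)

# Measurements, as in the question's second data frame.
readings <- data.frame(
  num = c(1, 1, 1, 1, 1, 10),
  temp = c(5, 6, 4, 9, 7, 23),
  time = as.POSIXct(c("2013-02-22 12:13:50", "2013-02-22 12:14:00",
                      "2013-02-22 12:14:10", "2013-04-24 09:45:20",
                      "2013-04-24 11:45:50", "2013-03-03 19:23:40"), tz = "UTC")
)

# Pair every reading with every deployment of the same device.
paired <- merge(readings, intervals, by = "num")

# Keep only readings taken between placement and recovery (inclusive).
inside <- paired$time >= paired$placed & paired$time <= paired$recovered
kept <- paired[inside, c("num", "temp", "time")]
kept <- kept[order(kept$num, kept$time), ]
kept
```

This assumes the deployments of a given device do not overlap; if they did, a reading could match more than one interval and appear twice. The merge also creates one row per reading-deployment pair, so it may become slow or memory-hungry on large datasets.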
Answered 2013-06-14T01:44:28.473