I have a data.frame like the one shown below.

toolid          startdate       enddate         stage
abc                 1-Jan-13    5-Jan-13    production
abc                 6-Jan-13    10-Jan-13   down
xyz                 3-Jan-13    8-Jan-13    production
xyz                 9-Jan-13    15-Jan-13   down

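I would like to convert the data.frame to the following format. I am trying to combine the 'startdate' and 'enddate' columns from the data.frame above into a single column called 'date', as shown below. My original data has a few thousand rows, covering many toolids and many stages. I have already found a way to do this in SQL, but I would prefer an R solution. I have started by melting the data, as shown in the code below.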

toolid  date            stage
abc     1-Jan-13    production
abc     2-Jan-13    production
abc     3-Jan-13    production
abc     4-Jan-13    production
abc     5-Jan-13    production
abc     6-Jan-13    down
abc     7-Jan-13    down
abc     8-Jan-13    down
abc     9-Jan-13    down
abc     10-Jan-13   down
xyz     3-Jan-13    production
xyz     4-Jan-13    production
xyz     5-Jan-13    production
xyz     6-Jan-13    production
xyz     7-Jan-13    production
xyz     8-Jan-13    production
xyz     9-Jan-13    down
xyz     10-Jan-13   down
xyz     11-Jan-13   down
xyz     12-Jan-13   down
xyz     13-Jan-13   down
xyz     14-Jan-13   down
xyz     15-Jan-13   down

R code

startdate <- c('1-Jan-13', '6-Jan-13', '3-Jan-13', '9-Jan-13')
enddate <- c('5-Jan-13', '10-Jan-13', '8-Jan-13', '15-Jan-13')
toolid <- c('abc', 'abc', 'xyz', 'xyz')
stage <- c('production', 'down', 'production', 'down')
data <- data.frame(toolid, startdate, enddate, stage)
library(reshape2)
newdata <- melt(data, id.vars = c('toolid', 'stage'))

Update: code adapted from @Ananda Mahto's answer, with a few lines added to produce pivot-table-style output

## Convert "startdate" and "enddate" to date objects
data$startdate <- as.Date(data$startdate, format="%d-%b-%y")
data$enddate <- as.Date(data$enddate, format="%d-%b-%y")


## Use `seq` to create the date sequence, and manually recreate
##   your data.frame. Use `do.call(rbind, ...)` to put it back together
ddd=do.call(rbind, lapply(sequence(nrow(data)), function(x) {
  data.frame(toolid = data$toolid[x], 
             date = seq(data$startdate[x], data$enddate[x], by = 1),
             stage = data$stage[x])
}))

ddd


   toolid       date      stage
1     abc 2013-01-01 production
2     abc 2013-01-02 production
3     abc 2013-01-03 production
4     abc 2013-01-04 production
5     abc 2013-01-05 production
6     abc 2013-01-06       down
7     abc 2013-01-07       down
8     abc 2013-01-08       down
9     abc 2013-01-09       down
10    abc 2013-01-10       down
11    xyz 2013-01-03 production
12    xyz 2013-01-04 production
13    xyz 2013-01-05 production
14    xyz 2013-01-06 production
15    xyz 2013-01-07 production
16    xyz 2013-01-08 production
17    xyz 2013-01-09       down
18    xyz 2013-01-10       down
19    xyz 2013-01-11       down
20    xyz 2013-01-12       down
21    xyz 2013-01-13       down
22    xyz 2013-01-14       down
23    xyz 2013-01-15       down

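## Count rows per date and stage; naming the aggregation function
##   avoids relying on dcast's default of falling back to length
ddd1 <- dcast(ddd, date ~ stage, value.var = "toolid", fun.aggregate = length)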


ddd1
         date down production
1  2013-01-01    0          1
2  2013-01-02    0          1
3  2013-01-03    0          2
4  2013-01-04    0          2
5  2013-01-05    0          2
6  2013-01-06    1          1
7  2013-01-07    1          1
8  2013-01-08    1          1
9  2013-01-09    2          0
10 2013-01-10    2          0
11 2013-01-11    1          0
12 2013-01-12    1          0
13 2013-01-13    1          0
14 2013-01-14    1          0
15 2013-01-15    1          0

1 Answer

I'm sure there are more "proper" ways to do this, but here is what I came up with quickly.

First, convert "startdate" and "enddate" to date objects:

data$startdate <- as.Date(data$startdate, format="%d-%b-%y")
data$enddate <- as.Date(data$enddate, format="%d-%b-%y")
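
As a small illustration of what that format string does, here is a minimal sketch on two of the question's date strings. The vector names `x` and `d` are only for illustration. Note that `%b` matches abbreviated month names in the current locale, so this assumes an English locale; in other locales the result may be `NA`.

```r
# Parse day-month-year strings such as "1-Jan-13".
# %d = day of month, %b = abbreviated month name (locale-dependent),
# %y = two-digit year (13 is read as 2013).
x <- c("1-Jan-13", "15-Jan-13")
d <- as.Date(x, format = "%d-%b-%y")

format(d)                  # ISO representation of the parsed dates
as.integer(d[2] - d[1])    # number of days between the two dates
```

Once the columns are real `Date` objects, date arithmetic and `seq` work on them directly, which is what the next step relies on.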

Then, use `seq` to create the date sequence, and manually recreate your data.frame. Use `do.call(rbind, ...)` to put it back together.

ddd <- do.call(rbind, lapply(sequence(nrow(data)), function(x) {
  data.frame(toolid = data$toolid[x], 
             date = seq(data$startdate[x], data$enddate[x], by = 1),
             stage = data$stage[x])
}))
ddd
#    toolid       date      stage
# 1     abc 2013-01-01 production
# 2     abc 2013-01-02 production
# 3     abc 2013-01-03 production
# 4     abc 2013-01-04 production
# 5     abc 2013-01-05 production
# 6     abc 2013-01-06       down
# 7     abc 2013-01-07       down
# 8     abc 2013-01-08       down
# 9     abc 2013-01-09       down
# 10    abc 2013-01-10       down
# 11    xyz 2013-01-03 production
# 12    xyz 2013-01-04 production
# 13    xyz 2013-01-05 production
# 14    xyz 2013-01-06 production
# 15    xyz 2013-01-07 production
# 16    xyz 2013-01-08 production
# 17    xyz 2013-01-09       down
# 18    xyz 2013-01-10       down
# 19    xyz 2013-01-11       down
# 20    xyz 2013-01-12       down
# 21    xyz 2013-01-13       down
# 22    xyz 2013-01-14       down
# 23    xyz 2013-01-15       down
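
Since the question mentions a few thousand rows, building one small data.frame per row and binding them may become slow. The following is a sketch of a vectorized variant, not part of the original answer: it repeats each row once per day and adds a day offset to the start date. The name `ddd2` is hypothetical, and the input is re-created here with ISO dates so that the example does not depend on the locale.

```r
# Hypothetical re-creation of the question's input, with ISO dates.
data <- data.frame(
  toolid    = c("abc", "abc", "xyz", "xyz"),
  startdate = as.Date(c("2013-01-01", "2013-01-06", "2013-01-03", "2013-01-09")),
  enddate   = as.Date(c("2013-01-05", "2013-01-10", "2013-01-08", "2013-01-15")),
  stage     = c("production", "down", "production", "down"),
  stringsAsFactors = FALSE
)

# Number of days covered by each row (both endpoints included).
n <- as.integer(data$enddate - data$startdate) + 1L

# Repeat each row n times, then add 0, 1, ..., n - 1 days to its start date.
ddd2 <- data.frame(
  toolid = rep(data$toolid, times = n),
  date   = rep(data$startdate, times = n) + (sequence(n) - 1L),
  stage  = rep(data$stage, times = n),
  stringsAsFactors = FALSE
)

nrow(ddd2)
```

This should give the same rows in the same order as `ddd` above, assuming every `enddate` is on or after its `startdate`.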

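Finally, looking at where you say you want to end up, you can stick with base R all the way and use `table`. I wrapped it in `as.data.frame.matrix()` because I assume you want a data.frame as the result: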

as.data.frame.matrix(table(ddd[-1]))
#            down production
# 2013-01-01    0          1
# 2013-01-02    0          1
# 2013-01-03    0          2
# 2013-01-04    0          2
# 2013-01-05    0          2
# 2013-01-06    1          1
# 2013-01-07    1          1
# 2013-01-08    1          1
# 2013-01-09    2          0
# 2013-01-10    2          0
# 2013-01-11    1          0
# 2013-01-12    1          0
# 2013-01-13    1          0
# 2013-01-14    1          0
# 2013-01-15    1          0
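
In the result above the dates end up as row names rather than as a column. If a `date` column like the one in the question's `dcast` output is wanted, one possible follow-up is sketched below. The name `counts` is hypothetical, and `ddd` is rebuilt here in a compact way only so that the example runs on its own.

```r
# Hypothetical stand-in for `ddd`: one row per tool per day.
start <- as.Date(c("2013-01-01", "2013-01-06", "2013-01-03", "2013-01-09"))
n     <- c(5L, 5L, 6L, 7L)   # days covered by each original row
ddd <- data.frame(
  toolid = rep(c("abc", "abc", "xyz", "xyz"), times = n),
  date   = rep(start, times = n) + (sequence(n) - 1L),
  stage  = rep(c("production", "down", "production", "down"), times = n),
  stringsAsFactors = FALSE
)

# Cross-tabulate date by stage, as in the answer.
counts <- as.data.frame.matrix(table(ddd[-1]))

# Move the dates out of the row names into a proper Date column.
counts <- data.frame(date = as.Date(rownames(counts)), counts, row.names = NULL)

head(counts, 3)
```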
answered 2013-09-06T16:28:25.630