r - Getting the difftime for a data set

Question

I have a question about diff or difftime.

Equip <- c(1001,1001,1001,1002,1002,1002,1003,1003,1003,1003,1003,1003,1003,1003)
Notif <- c(321,322,322,319,319,345,495,495,495,441,441,441,471,471)
Job <- c("01.01.2011","05.01.2011","05.01.2011","05.01.2011","05.01.2011",
"15.01.2011","23.03.2011","23.03.2011","23.03.2011","27.03.2011","27.03.2011",
"27.03.2011","29.03.2011",
"29.03.2011")
Job <- as.Date(Job,format="%d.%m.%Y")
df <- data.frame(Equip,Notif,Job)

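I would like a new column in the data.frame that holds the time difference [in days].

The condition for computing the time difference is as follows: if the Equip number is the same but the Notif number is different, I want the time difference between the Job dates.

The output should look like this: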

df$dd <- c(0,4,4,0,0,10,0,0,0,4,4,4,2,2)

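(For the first Notif number within an Equip number, dd is 0, because it is the first visit.)

I hope you can help me. I tried, but I could not get it to work the way I want.

I can only use standard R without any packages...

Based on the given link, I created the following example, which also doesn't work:

Maybe you can help me: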

Equips <- c(10006250,10006252,10006252,10006252,10006252,10006252,10006252,
10006252,10006252,10006252,10006252,10006252,10006252,10006252,10006252,10006777)
Notifs <- c(306863771,306862774,306862774,306862774,306933440,
306933440,306998451,306998451,307024311,307024311,
307024311,307024311,307033136,307033136,307128754,307158697)
Jobs <- c("25.01.2011","23.06.2011","23.06.2011","23.06.2011","28.06.2011",
"28.06.2011","02.07.2011","02.07.2011","03.09.2011","03.09.2011",
"03.09.2011","03.09.2011","05.09.2011","05.09.2011","02.11.2011","05.05.2011")
Comps <- c("Service Boiler","General Boiler Components","Ignition and Flame Detection",
"Service Boiler!!!","Electrical Components","Gas Train Assembly",
"Control Box"," Ignition and Flame Detection","CH Components Active",
"CH Components Passive","CH Components Passive","DHW Components",
"DHW Components","Internal Pipeworks and Connections","not grouped in WCC",
"Service Boiler")
Category <- c("service_repair","service_repair","service_repair",
"service_repair","repair","repair","repair","repair","repair","repair",
"repair","repair","repair","repair","repair","service_repair")
Job <- as.Date(Job,format="%d. %m. %Y")
df <- data.frame(Equips,Notifs,Jobs,Comps,Category)

I really don't know why it doesn't work for this data when it does for the data in the first post. Maybe you can help me.

score 3 · Accepted Answer

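Here is a somewhat long and possibly convoluted answer that uses only the base package. Someone with better knowledge of plyr might be able to provide a more elegant solution.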

> df
   Equip Notif        Job
1   1001   321 2011-01-01
2   1001   322 2011-01-05
3   1001   322 2011-01-05
4   1002   319 2011-01-05
5   1002   319 2011-01-05
6   1002   345 2011-01-15
7   1003   495 2011-03-23
8   1003   495 2011-03-23
9   1003   495 2011-03-23
10  1003   441 2011-03-27
11  1003   441 2011-03-27
12  1003   441 2011-03-27
13  1003   471 2011-03-29
14  1003   471 2011-03-29

First, take the diff of the dates unconditionally

> df$diff <- c(0,diff(df$Job))
> df
   Equip Notif        Job diff
1   1001   321 2011-01-01    0
2   1001   322 2011-01-05    4
3   1001   322 2011-01-05    0
4   1002   319 2011-01-05    0
5   1002   319 2011-01-05    0
6   1002   345 2011-01-15   10
7   1003   495 2011-03-23   67
8   1003   495 2011-03-23    0
9   1003   495 2011-03-23    0
10  1003   441 2011-03-27    4
11  1003   441 2011-03-27    0
12  1003   441 2011-03-27    0
13  1003   471 2011-03-29    2
14  1003   471 2011-03-29    0
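
As a side note on the step above, here is a minimal sketch of what diff() returns for Date values. The vector name job and the four dates are made up for illustration. diff() on a Date vector gives a difftime object, and converting it with an explicit unit makes it clear that the new column holds days.

```r
# Small illustration of how diff() behaves on Date values
# (job is a hypothetical toy vector, not the full data set)
job <- as.Date(c("01.01.2011", "05.01.2011", "05.01.2011", "15.01.2011"),
               format = "%d.%m.%Y")

gaps <- diff(job)            # a difftime object of length 3, in days
class(gaps)                  # "difftime"

# Prepending 0 gives the first row a gap of 0; as.numeric() with an
# explicit unit turns the difftime into a plain number of days
days <- c(0, as.numeric(gaps, units = "days"))
days                         # 0 4 0 10
```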

Create a new column diff1, which is 1 if your condition is true and 0 if it is false

> df$diff1 <- c(0, ifelse(diff(df$Equip) == 0 & diff(df$Notif) != 0, 1, 0))
> df
   Equip Notif        Job diff diff1
1   1001   321 2011-01-01    0     0
2   1001   322 2011-01-05    4     1
3   1001   322 2011-01-05    0     0
4   1002   319 2011-01-05    0     0
5   1002   319 2011-01-05    0     0
6   1002   345 2011-01-15   10     1
7   1003   495 2011-03-23   67     0
8   1003   495 2011-03-23    0     0
9   1003   495 2011-03-23    0     0
10  1003   441 2011-03-27    4     1
11  1003   441 2011-03-27    0     0
12  1003   441 2011-03-27    0     0
13  1003   471 2011-03-29    2     1
14  1003   471 2011-03-29    0     0
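
The same flag can also be written by comparing each row with the one before it, which may be easier to read and would also work if the identifiers were character strings rather than numbers. This is only a sketch on hypothetical toy vectors (equip, notif), not a replacement for the step above.

```r
# Hypothetical toy vectors, mirroring the Equip/Notif columns above
equip <- c(1001, 1001, 1001, 1002, 1002)
notif <- c(321, 322, 322, 319, 345)
n <- length(equip)

# Compare each row (2..n) with the row before it (1..n-1)
same_equip <- equip[-1] == equip[-n]
new_notif  <- notif[-1] != notif[-n]

# The first row has no predecessor, so its flag is 0
flag <- c(0, as.integer(same_equip & new_notif))
flag                         # 0 1 0 0 1
```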

Multiply the two columns so that diff keeps its value only where the condition is true

> df$diff <- df$diff * df$diff1
> df$diff1 <- NULL
> df
   Equip Notif        Job diff
1   1001   321 2011-01-01    0
2   1001   322 2011-01-05    4
3   1001   322 2011-01-05    0
4   1002   319 2011-01-05    0
5   1002   319 2011-01-05    0
6   1002   345 2011-01-15   10
7   1003   495 2011-03-23    0
8   1003   495 2011-03-23    0
9   1003   495 2011-03-23    0
10  1003   441 2011-03-27    4
11  1003   441 2011-03-27    0
12  1003   441 2011-03-27    0
13  1003   471 2011-03-29    2
14  1003   471 2011-03-29    0

To repeat the value across repeated readings, merge the data with itself. (This step may need to change if the data set has other columns.)

> res <- merge(df[,1:3], df[df$diff!=0,], all.x=T)
> res
   Equip Notif        Job diff
1   1001   321 2011-01-01   NA
2   1001   322 2011-01-05    4
3   1001   322 2011-01-05    4
4   1002   319 2011-01-05   NA
5   1002   319 2011-01-05   NA
6   1002   345 2011-01-15   10
7   1003   441 2011-03-27    4
8   1003   441 2011-03-27    4
9   1003   441 2011-03-27    4
10  1003   471 2011-03-29    2
11  1003   471 2011-03-29    2
12  1003   495 2011-03-23   NA
13  1003   495 2011-03-23   NA
14  1003   495 2011-03-23   NA

Replace NA with 0

> res[is.na(res)] <- 0
> res
   Equip Notif        Job diff
1   1001   321 2011-01-01    0
2   1001   322 2011-01-05    4
3   1001   322 2011-01-05    4
4   1002   319 2011-01-05    0
5   1002   319 2011-01-05    0
6   1002   345 2011-01-15   10
7   1003   441 2011-03-27    4
8   1003   441 2011-03-27    4
9   1003   441 2011-03-27    4
10  1003   471 2011-03-29    2
11  1003   471 2011-03-29    2
12  1003   495 2011-03-23    0
13  1003   495 2011-03-23    0
14  1003   495 2011-03-23    0
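
Note that merge() returns the rows sorted by the key columns, so the original row order is lost. If the order matters, one possible alternative to the merge step is ave(), which is also in base R. The sketch below is an assumption-laden variant, not the method used above: it uses a shortened copy of the data, builds a helper key from Equip and Notif, and relies on the gaps never being negative (rows sorted by date within each Equip).

```r
# Shortened copy of the first example data
df <- data.frame(
  Equip = c(1001, 1001, 1001, 1002, 1002, 1002),
  Notif = c(321, 322, 322, 319, 319, 345),
  Job   = as.Date(c("2011-01-01", "2011-01-05", "2011-01-05",
                    "2011-01-05", "2011-01-05", "2011-01-15"))
)

# Steps from above: gap to the previous row, kept only where the
# Equip is unchanged and the Notif is new
gap  <- c(0, as.numeric(diff(df$Job), units = "days"))
keep <- c(0, diff(df$Equip) == 0 & diff(df$Notif) != 0)
df$diff <- gap * keep

# Spread each group's single non-zero gap to all rows of the group.
# Assumes gaps are never negative, i.e. rows are sorted by date.
key <- paste(df$Equip, df$Notif)   # hypothetical helper key
df$dd <- ave(df$diff, key, FUN = max)
df$dd                        # 0 4 4 0 0 10
```

Because ave() returns a vector aligned with the input rows, no NA replacement is needed afterwards.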

For the second example data, which has more columns, replace the last 2 steps with

res <- merge(df[, c('Equip', 'Notif', 'Job', 'Comps', 'Category')], df[df$diff != 0, c('Equip', 'Notif', 'Job', 'diff')], all.x=TRUE)
res[is.na(res)] <- 0
res
      Equip     Notif        Job                              Comps       Category diff
1  10006250 306863771 2011-01-25                     Service Boiler service_repair    0
2  10006252 306862774 2011-06-23          General Boiler Components service_repair    0
3  10006252 306862774 2011-06-23       Ignition and Flame Detection service_repair    0
4  10006252 306862774 2011-06-23                  Service Boiler!!! service_repair    0
5  10006252 306933440 2011-06-28              Electrical Components         repair    5
6  10006252 306933440 2011-06-28                 Gas Train Assembly         repair    5
7  10006252 306998451 2011-07-02                        Control Box         repair    4
8  10006252 306998451 2011-07-02       Ignition and Flame Detection         repair    4
9  10006252 307024311 2011-09-03               CH Components Active         repair   63
10 10006252 307024311 2011-09-03              CH Components Passive         repair   63
11 10006252 307024311 2011-09-03              CH Components Passive         repair   63
12 10006252 307024311 2011-09-03                     DHW Components         repair   63
13 10006252 307033136 2011-09-05                     DHW Components         repair    2
14 10006252 307033136 2011-09-05 Internal Pipeworks and Connections         repair    2
15 10006252 307128754 2011-11-02                 not grouped in WCC         repair   58
16 10006777 307158697 2011-05-05                     Service Boiler service_repair    0
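
One thing to watch in the second example from the question: the converted dates are assigned to Job, while the data frame is built from Jobs, which is still a vector of character strings. That may be why diff() did not work there. The sketch below runs the whole pipeline on a subset of that data, assuming the columns are named Equip, Notif and Job as in the output above and that the dates are parsed with "%d.%m.%Y".

```r
# Subset of the second example, with the date strings converted to Date
# and the columns named as in the output above (an assumption)
Equip <- c(10006250, 10006252, 10006252, 10006252, 10006252, 10006777)
Notif <- c(306863771, 306862774, 306862774, 306933440, 306933440, 307158697)
Jobs  <- c("25.01.2011", "23.06.2011", "23.06.2011",
           "28.06.2011", "28.06.2011", "05.05.2011")
Comps <- c("Service Boiler", "General Boiler Components",
           "Ignition and Flame Detection", "Electrical Components",
           "Gas Train Assembly", "Service Boiler")
df <- data.frame(Equip, Notif, Job = as.Date(Jobs, format = "%d.%m.%Y"), Comps)

# Same steps as above: gap to the previous row, zeroed unless the
# Equip is unchanged and the Notif is new
df$diff <- c(0, diff(df$Job)) *
  c(0, diff(df$Equip) == 0 & diff(df$Notif) != 0)

# Take only the key columns plus diff from the right-hand side, so that
# Comps is not used as a join key
res <- merge(df[, c("Equip", "Notif", "Job", "Comps")],
             df[df$diff != 0, c("Equip", "Notif", "Job", "diff")],
             all.x = TRUE)
res$diff[is.na(res$diff)] <- 0
res$diff                     # 0 0 0 5 5 0
```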
