r - R plyr apply to fill in missing values

Question

I have a data frame containing person-year observations on many variables. It looks like this:

   year     serial moved urban.rural.code   
15 1982     1000_1     0                0
16 1983     1000_1     0                0
17 1984     1000_1     0                0
18 1985     1000_1     1                0
19 1986     1000_1     1                1
20 1981     1000_2     0                1
21 1982     1000_2     0                1
22 1983     1000_2     0                1
23 1984     1000_2     0                0
24 1985     1000_2     0                9   
25 1996     1000_2     0                1
26 1993     1000_3     0                1
27 1994     1000_3     0                1
28 1984     1000_4     0                0
29 1985     1000_4     0                7  
30 1987     1000_5     0                1
31 1984     1000_6     0                0
32 1999     1000_6     0                8

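Within each serial number, if there is an observation recorded in 1985 and moved equals 0 in 1985, I want to assign the 1985 value of urban.rural.code to the 1984 observation. In the example above, only rows 23 and 28 should change: their urban.rural.code should be set to 9 and 7, respectively.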

I used a combination of ddply and a helper function, as follows:

fill1984 <- function(group) {
    if((1984 %in% group$year) & (group[group$year == 1985, 'moved'] == 0)) {
        group[group$year == 1984, 'urban.rural.code'] <- group[group$year == 1985, 'urban.rural.code']
    }
    return(group)
}

data <- ddply(data, 'serial', fill1984, .parallel=TRUE)

I get the following error:

Error in do.ply(i) : task 2 failed - "argument is of length zero"
In addition: Warning message:
In setup_parallel() : No parallel backend registered

I don't know where I went wrong. How can I edit urban.rural.code within each serial group?

score 0 · Accepted Answer

This uses dplyr and could probably be cleaned up a bit, but it appears to work:

library(dplyr)
newdf <- data %>%
          group_by(serial) %>%
          mutate(
            cidx = year == 1985 & moved == 0,
            urban.rural.code = ifelse(year == 1984 & isTRUE(cidx[year==1985]),
                                      urban.rural.code[year == 1985],
                                      urban.rural.code)
          )
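
As for the original error: in a serial group with no 1985 row, group[group$year == 1985, 'moved'] == 0 has length zero, and because & does not short-circuit, if() ends up with a zero-length condition, which is what "argument is of length zero" reports. The warning most likely appears because .parallel=TRUE was used without a registered foreach backend. Below is a minimal base-R sketch of a guarded helper. It assumes each serial has at most one row per year, and the small data frame is only a subset of the question's rows, rebuilt here for illustration:

```r
# Illustrative subset of the question's data (assumption: one row per serial/year)
data <- data.frame(
  year   = c(1984, 1985, 1984, 1985, 1984, 1985, 1984, 1999),
  serial = c("1000_1", "1000_1", "1000_2", "1000_2",
             "1000_4", "1000_4", "1000_6", "1000_6"),
  moved  = c(0, 1, 0, 0, 0, 0, 0, 0),
  urban.rural.code = c(0, 0, 0, 9, 0, 7, 0, 8)
)

fill1984 <- function(group) {
  # which() returns integer(0) when a year is absent and drops NA years
  i84 <- which(group$year == 1984)
  i85 <- which(group$year == 1985)
  # Check the lengths first so the condition passed to if() is always
  # a single TRUE or FALSE, never zero-length
  if (length(i84) == 1 && length(i85) == 1 && isTRUE(group$moved[i85] == 0)) {
    group$urban.rural.code[i84] <- group$urban.rural.code[i85]
  }
  group
}

# Apply the helper per serial and stitch the pieces back together
filled <- do.call(rbind, lapply(split(data, data$serial), fill1984))
rownames(filled) <- NULL
print(filled)
```

The same helper should also work with the original call, ddply(data, 'serial', fill1984), once .parallel=TRUE is dropped or a backend is registered first.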
