
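I have panel data in an R data.frame with the years in which countries experienced armed conflict between 1989 and 2008. However, the data only include observations for countries that experienced armed conflict in a given year.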

The dataset looks similar to this:

df <- data.frame(year        = c("1989", "1993", "1998",
                                 "1990", "1995", "1997"),
                 countrycode = rep(c(750, 135), c(3, 3)),
                 conflict    = rep(1, 6))
print(df)

  year countrycode conflict
1 1989         750        1
2 1993         750        1
3 1998         750        1
4 1990         135        1
5 1995         135        1
6 1997         135        1

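I now want to fill the gaps in the panel data, but only gaps of at most three years, setting `conflict` to 0 in the added rows. For example, I want to add rows between rows 1 and 2 and between rows 5 and 6 (gaps of 3 and 1 years, respectively), but neither between rows 2 and 3 nor between rows 4 and 5 (gaps of 4 years each). After this procedure, the data.frame above would look like this: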
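To make the rule concrete: the number of missing years between two consecutive observations of a country is the difference in years minus one, and a gap should be filled only if that number is between 1 and 3. A minimal base-R sketch that computes this per country, assuming the years are already sorted within each country as in the example (the names `obs`, `gap`, and `fill` are illustrative only):

```r
# Example data from the question, with `year` stored as integer
obs <- data.frame(year        = c(1989L, 1993L, 1998L, 1990L, 1995L, 1997L),
                  countrycode = rep(c(750, 135), c(3, 3)),
                  conflict    = rep(1, 6))

# Missing years between each observation and the next one of the same country;
# each country's last observation has no successor, hence NA
obs$gap <- ave(obs$year, obs$countrycode,
               FUN = function(y) c(diff(y) - 1L, NA))

# Only gaps of 1 to 3 missing years should be filled
obs$fill <- !is.na(obs$gap) & obs$gap >= 1 & obs$gap <= 3
print(obs)
```

Here `fill` is `TRUE` exactly for rows 1 and 5, i.e. the gaps after 1989 (country 750) and after 1995 (country 135).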

df2 <- data.frame(year        = c("1989", "1990", "1991", "1992", "1993", "1998",
                                  "1990", "1995", "1996", "1997"),
                  countrycode = rep(c(750, 135), c(6, 4)),
                  conflict    = c(1, 0, 0, 0, 1, 1, 1, 1, 0, 1))
print(df2)
   year countrycode conflict
1  1989         750        1
2  1990         750        0
3  1991         750        0
4  1992         750        0
5  1993         750        1
6  1998         750        1
7  1990         135        1
8  1995         135        1
9  1996         135        0
10 1997         135        1

I looked at the plm package (see here), but couldn't find an answer there. Also, I am relatively new to R, so I'd be glad for any hints.


2 Answers


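Here's a solution using data.table. The idea is to first create a data.table containing only the missing entries (dt.rest) and then rbind the two. I've written it so that the output of each line (copy, paste, and print it) should be fairly easy to follow. Let me know if anything is unclear.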

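library(data.table)
dt <- data.table(df, key = "countrycode")
dt[, year := as.numeric(as.character(year))]
# Next observed year of the same country (NA for each country's last row)
dt[, year2 := c(tail(year, -1), NA), by = countrycode]
dt.rest <- dt[, { gap <- year2 - year - 1           # number of missing years
                  tt <- which(gap >= 1 & gap <= 3)  # gaps to fill
                  if (length(tt) == 0L) NULL else
                  list(year = unlist(lapply(tt, function(x)
                              seq(year[x] + 1, year2[x] - 1, by = 1))),
                       conflict = 0)
                }, by = countrycode]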
setcolorder(dt.rest, c("year", "countrycode", "conflict"))

#    year countrycode conflict
# 1: 1996         135        0
# 2: 1990         750        0
# 3: 1991         750        0
# 4: 1992         750        0
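The `year2` column is just the next observed year of the same country. As an alternative sketch, assuming data.table 1.9.6 or later (which provides `shift()`), the same step can be written as a grouped lead; the `n_missing` column is an illustrative name, not part of the answer:

```r
library(data.table)

dt <- data.table(year        = c(1989, 1993, 1998, 1990, 1995, 1997),
                 countrycode = rep(c(750, 135), c(3, 3)),
                 conflict    = 1)
# Sort by country, then year, so that "next row" means "next observed year"
setkey(dt, countrycode, year)

# Next observed year within each country (NA for each country's last row)
dt[, year2 := shift(year, type = "lead"), by = countrycode]

# Number of missing years in each gap
dt[, n_missing := year2 - year - 1]
print(dt)
```

After sorting, country 135 comes first, so `n_missing` reads 4, 1, NA for 135 and 3, 4, NA for 750; the rows with 1 to 3 missing years are exactly the gaps that `dt.rest` fills.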

Now we just have to bind the two tables together. This uses data.table's `rbindlist` function, which binds a list of data.frames or data.tables much more efficiently than `rbind`.

dt[, year2 := NULL]
dt <- rbindlist(list(dt, dt.rest))
setkey(dt, "countrycode", "year")

dt
#     year countrycode conflict
#  1: 1990         135        1
#  2: 1995         135        1
#  3: 1996         135        0
#  4: 1997         135        1
#  5: 1989         750        1
#  6: 1990         750        0
#  7: 1991         750        0
#  8: 1992         750        0
#  9: 1993         750        1
# 10: 1998         750        1
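A note on the `rbindlist` step: in recent data.table versions the default `use.names = "check"` binds columns by position (and warns if the names are not in the same order), which is why `setcolorder` is applied to `dt.rest` first. Passing `use.names = TRUE` matches columns by name instead. A small sketch using rows for country 750 from the example above:

```r
library(data.table)

observed <- data.table(year = c(1989, 1993), countrycode = 750, conflict = 1)
# Same columns as `observed`, but in a different order
filled   <- data.table(countrycode = 750, conflict = 0, year = c(1990, 1991, 1992))

# use.names = TRUE matches columns by name, so the column order of `filled` does not matter
combined <- rbindlist(list(observed, filled), use.names = TRUE)
setkey(combined, countrycode, year)
print(combined)
```

The result keeps the column order of the first table (`year`, `countrycode`, `conflict`) and, after `setkey`, lists 1989 to 1993 with `conflict` equal to 1, 0, 0, 0, 1.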
answered 2013-03-30T12:31:57.977

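To a beginner this solution may look messy and hard to digest, but since this is a very specific and unusual problem (at least to me), I couldn't think of anything more basic.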

# Convert the `year` column to integer in case it is a factor
df$year <- as.integer(as.character(df$year))

df.country <- lapply(
    # Split `df` by `countrycode` to make one data frame per country
    split(df, df$countrycode),

    # Apply the following function to each country's data frame
    function(tab){
        # Send the start and end years of each gap to the following function
        imputed.yr <- mapply(function(start, end)
            # If the gap is small enough (at most 3 missing years), add all
            # years in between; otherwise just return the start and end years
            if(end - start < 5) start:end else c(start, end),
        tab$year[-nrow(tab)], tab$year[-1],
        # Always return a list, even if every gap yields a vector of equal length
        SIMPLIFY = FALSE)

        # Remove duplicate years
        imputed.yr <- unique(unlist(imputed.yr))
        # Pack up and return a new data frame
        data.frame(year = imputed.yr,
                   countrycode = tab$countrycode[1],
                   conflict = imputed.yr %in% tab$year)
    })

# Paste all the imputed country-specific data frames together
do.call(rbind, df.country)

The code above produces the following output, which is essentially what you asked for (except that `conflict` is logical `TRUE`/`FALSE` rather than 1/0).

      year countrycode conflict
135.1 1990         135     TRUE
135.2 1995         135     TRUE
135.3 1996         135    FALSE
135.4 1997         135     TRUE
750.1 1989         750     TRUE
750.2 1990         750    FALSE
750.3 1991         750    FALSE
750.4 1992         750    FALSE
750.5 1993         750     TRUE
750.6 1998         750     TRUE
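The core of this answer is the per-gap rule inside `mapply`: expand `start:end` when the gap is short, otherwise keep only the endpoints. As a minimal sketch, that rule can be pulled out into a standalone function; the name `fill_small_gaps` and its `max_missing` parameter are hypothetical, and the function assumes the years of one country are sorted and free of duplicates:

```r
# Hypothetical helper: given the sorted observed years of ONE country, return
# those years plus every year inside a gap of at most `max_missing` missing years
fill_small_gaps <- function(years, max_missing = 3L) {
  if (length(years) < 2L) return(years)  # no gaps to fill
  filled <- mapply(function(start, end) {
    # end - start - 1 is the number of missing years in this gap
    if (end - start - 1 <= max_missing) start:end else c(start, end)
  }, head(years, -1L), tail(years, -1L), SIMPLIFY = FALSE)
  unique(unlist(filled))
}

# Usage on the two countries from the question
years750 <- fill_small_gaps(c(1989L, 1993L, 1998L))
years135 <- fill_small_gaps(c(1990L, 1995L, 1997L))
print(years750)
print(years135)

# The conflict indicator is 1 for observed years and 0 for filled-in ones
conflict750 <- as.integer(years750 %in% c(1989L, 1993L, 1998L))
print(conflict750)
```

With the default `max_missing = 3L`, country 750 gets 1989 to 1993 plus 1998, and country 135 gets 1990, 1995, 1996, 1997, matching the desired output; the `length(years) < 2L` guard also keeps countries with a single observation instead of failing on them.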
answered 2013-03-30T12:15:57.863