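I want to convert a data frame with start-year and end-year variables into a complete time series that (1) includes every year between the start year and the end year, and (2) fills in the values of all other variables for the years in between.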

This is what the original data looks like:

data_original <- data.frame(name = c("peter", "peter", "eric", "denisse"), lastname = c("smith", "smith", "jordan", "williams"), age = c(54, 54, 48, 40), start_year = c(1980,1986, 1990, 2000), end_year = c(1984, 1988, 1993, 2001))

data_original
#>      name lastname age start_year end_year
#> 1   peter    smith  54       1980     1984
#> 2   peter    smith  54       1986     1988
#> 3    eric   jordan  48       1990     1993
#> 4 denisse williams  40       2000     2001

This is what I would like the data to look like:

data_final <- data.frame(name = c("peter", "peter", "peter", "peter", "peter", "peter", "peter", "peter", "eric", "eric", "eric", "eric", "denisse", "denisse"), lastname = c("smith", "smith", "smith", "smith", "smith", "smith", "smith", "smith", "jordan", "jordan", "jordan", "jordan", "williams", "williams"), age = c(54, 54, 54, 54, 54, 54, 54, 54, 48, 48, 48, 48, 40, 40), year = c(1980, 1981, 1982, 1983, 1984, 1986, 1987, 1988, 1990, 1991, 1992, 1993, 2000, 2001))

data_final
#>       name lastname age year
#> 1    peter    smith  54 1980
#> 2    peter    smith  54 1981
#> 3    peter    smith  54 1982
#> 4    peter    smith  54 1983
#> 5    peter    smith  54 1984
#> 6    peter    smith  54 1986
#> 7    peter    smith  54 1987
#> 8    peter    smith  54 1988
#> 9     eric   jordan  48 1990
#> 10    eric   jordan  48 1991
#> 11    eric   jordan  48 1992
#> 12    eric   jordan  48 1993
#> 13 denisse williams  40 2000
#> 14 denisse williams  40 2001

Thanks a lot for your help with this!

2 Answers

Here is an option with tidyverse. We build the sequence from 'start_year' to 'end_year' for each row with map2, select the relevant columns, and unnest.

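library(tidyverse)
data_original %>% 
    # build one start:end sequence per row as a list column
    mutate(year = map2(start_year, end_year, `:`)) %>% 
    select(-start_year, -end_year) %>% 
    # expand the list column into one row per year
    unnest(year)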
#      name lastname age year
#1    peter    smith  54 1980
#2    peter    smith  54 1981
#3    peter    smith  54 1982
#4    peter    smith  54 1983
#5    peter    smith  54 1984
#6    peter    smith  54 1986
#7    peter    smith  54 1987
#8    peter    smith  54 1988
#9     eric   jordan  48 1990
#10    eric   jordan  48 1991
#11    eric   jordan  48 1992
#12    eric   jordan  48 1993
#13 denisse williams  40 2000
#14 denisse williams  40 2001
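
To make the intermediate step concrete, here is a minimal sketch of the per-row sequences before they are expanded. It uses base R's `Map` as a stand-in for `purrr::map2` (an assumption made only to keep the example dependency-free), and the vectors simply repeat the start and end years from the question.

```r
# Start and end years copied from the question's data_original
start_year <- c(1980, 1986, 1990, 2000)
end_year   <- c(1984, 1988, 1993, 2001)

# One integer sequence per row; base Map stands in for purrr::map2 here
years <- Map(`:`, start_year, end_year)

# Each list element holds every year of that row's range, endpoints included
years[[2]]
#> [1] 1986 1987 1988

# The element lengths give the number of output rows each input row produces
lengths(years)
#> [1] 5 3 4 2
```

The lengths sum to 14, which matches the number of rows in the desired result.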

Or another option is data.table

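library(data.table)
# setDT() converts data_original to a data.table by reference;
# grouping by row number lets seq() expand each row into one row per year
setDT(data_original)[, .(name, lastname, age, year = seq(start_year, end_year, by = 1)),
          by = .(grp = seq_len(nrow(data_original)))][, grp := NULL][]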

Or we can also use Map in base R

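# assumes data_original is still a plain data.frame (setDT() above changes how `[` behaves)
lst <- do.call(Map, c(f = `:`, data_original[4:5]))
out <- data_original[1:3][rep(seq_len(nrow(data_original)), lengths(lst)),]
# attach the expanded years; without this the result has no year column
out$year <- unlist(lst)
row.names(out) <- NULL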
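
The base R steps above can also be written without a list, as a rough sketch: repeat each row once per year in its range, then compute the years arithmetically. This is a variant for illustration, not the answer's own code; the helper name `n_years` is hypothetical, and the data frame is copied from the question so the block runs on its own.

```r
# Data copied from the question
data_original <- data.frame(
  name = c("peter", "peter", "eric", "denisse"),
  lastname = c("smith", "smith", "jordan", "williams"),
  age = c(54, 54, 48, 40),
  start_year = c(1980, 1986, 1990, 2000),
  end_year = c(1984, 1988, 1993, 2001)
)

# Number of years covered by each row, counting both endpoints
n_years <- data_original$end_year - data_original$start_year + 1

# Repeat each row n_years times, keeping only the descriptive columns
out <- data_original[rep(seq_len(nrow(data_original)), n_years),
                     c("name", "lastname", "age")]

# sequence() yields 1..n for each row; shift it so counting starts at start_year
out$year <- rep(data_original$start_year, n_years) + sequence(n_years) - 1
row.names(out) <- NULL

# Sanity checks against the desired result in the question
stopifnot(nrow(out) == 14)
stopifnot(identical(out$year, c(1980:1984, 1986:1988, 1990:1993, 2000:2001) + 0))
```

Note that the gap between 1984 and 1986 for peter is preserved, because each input row is expanded independently.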
answered 2018-04-26T03:31:29.363

Here is another tidyverse approach using seq and unnest:

data_original %>%
    rowwise() %>%
    mutate(year = list(seq(start_year, end_year, 1))) %>%
    ungroup() %>%
    select(-start_year, -end_year) %>%
    unnest(year)
## A tibble: 14 x 4
#   name    lastname   age  year
#   <fct>   <fct>    <dbl> <dbl>
# 1 peter   smith      54. 1980.
# 2 peter   smith      54. 1981.
# 3 peter   smith      54. 1982.
# 4 peter   smith      54. 1983.
# 5 peter   smith      54. 1984.
# 6 peter   smith      54. 1986.
# 7 peter   smith      54. 1987.
# 8 peter   smith      54. 1988.
# 9 eric    jordan     48. 1990.
#10 eric    jordan     48. 1991.
#11 eric    jordan     48. 1992.
#12 eric    jordan     48. 1993.
#13 denisse williams   40. 2000.
#14 denisse williams   40. 2001.

PS. In hindsight, @akrun's approach using purrr::map2 is much cleaner; it avoids the need to explicitly group by row and then ungroup.

answered 2018-04-26T03:35:39.487