r - Expand ranges defined by "from" and "to" columns

Question

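I have a data frame containing the "name" of US presidents and the years in which they started and ended their terms (the "from" and "to" columns). Here is a sample: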

name           from  to
Bill Clinton   1993 2001
George W. Bush 2001 2009
Barack Obama   2009 2012

...and the output from dput:

dput(tail(presidents, 3))
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name", 
"from", "to"), row.names = 42:44, class = "data.frame")

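I want to create a data frame with two columns ("name" and "year"), with one row for every year that a president was in office. So I need to create a regular sequence of years from "from" to "to". Here is my expected output: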

name           year
Bill Clinton   1993
Bill Clinton   1994
...
Bill Clinton   2000
Bill Clinton   2001
George W. Bush 2001
George W. Bush 2002
... 
George W. Bush 2008
George W. Bush 2009
Barack Obama   2009
Barack Obama   2010
Barack Obama   2011
Barack Obama   2012

I know that I can use data.frame(name = "Bill Clinton", year = seq(1993, 2001)) to expand things for a single president, but I can't figure out how to iterate over every president.

How do I do this? I feel that I should know this, but I'm drawing a blank.

Update 1

OK, I've tried both solutions, and I'm getting an error:

foo<-structure(list(name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"), from = c(1885, 1889, 1893), to = c(1889, 1893, 1897)), .Names = c("name", "from", "to"), row.names = 22:24, class = "data.frame")
ddply(foo, "name", summarise, year = seq(from, to))
Error in seq.default(from, to) : 'from' must be of length 1

score 17 · Accepted Answer

Here's a data.table solution. It has the nice (if minor) feature of leaving the presidents in the order in which they were supplied:

library(data.table)
dt <- data.table(presidents)
dt[, list(year = seq(from, to)), by = name]
#               name year
#  1:   Bill Clinton 1993
#  2:   Bill Clinton 1994
#  ...
#  ...
# 21:   Barack Obama 2011
# 22:   Barack Obama 2012

Edit: To handle presidents with non-consecutive terms, use this instead:

dt[, list(year = seq(from, to)), by = c("name", "from")]
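Why the extra grouping key matters can be checked in base R with the foo data from the question's "Update 1". This is only a sketch; the names cleveland, groups and rows_per_group are made up for illustration:

```r
# foo from the question's "Update 1": Grover Cleveland served two separate terms
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897)
)

# Grouping by name alone hands seq() two start years for Cleveland
cleveland <- foo[foo$name == "Grover Cleveland", ]
length(cleveland$from)   # 2, so seq(from, to) fails for this group

# Grouping by name and from leaves exactly one row per group,
# so each seq() call receives scalar endpoints
groups <- split(foo, list(foo$name, foo$from), drop = TRUE)
rows_per_group <- vapply(groups, nrow, integer(1))
all(rows_per_group == 1L)
```

Any key that is unique per row would serve the same purpose; from is simply the one already present in the data.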

score 16 · Accepted Answer

You can use the plyr package:

library(plyr)
ddply(presidents, "name", summarise, year = seq(from, to))
#              name year
# 1    Barack Obama 2009
# 2    Barack Obama 2010
# 3    Barack Obama 2011
# 4    Barack Obama 2012
# 5    Bill Clinton 1993
# 6    Bill Clinton 1994
# [...]

If it is important that the data be sorted by year, you can use the arrange function:

df <- ddply(presidents, "name", summarise, year = seq(from, to))
arrange(df, df$year)
#              name year
# 1    Bill Clinton 1993
# 2    Bill Clinton 1994
# 3    Bill Clinton 1995
# [...]
# 21   Barack Obama 2011
# 22   Barack Obama 2012

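Edit 1: Following @edgester's "Update 1", a more appropriate approach is to use adply, which works row by row and so accounts for presidents with non-consecutive terms: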

adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]

score 9 · Accepted Answer

An alternative tidyverse approach using unnest and map2.

library(tidyverse)

presidents %>%
  unnest(year = map2(from, to, seq)) %>%
  select(-from, -to)

#              name  year
# 1    Bill Clinton  1993
# 2    Bill Clinton  1994
# ...
# 21   Barack Obama  2011
# 22   Barack Obama  2012

Edit: As of tidyr v1.0.0, new variables can no longer be created as part of unnest(). Create the list column with mutate() first:

presidents %>%
  mutate(year = map2(from, to, seq)) %>%
  unnest(year) %>%
  select(-from, -to)

score 7 · Accepted Answer

Here's a dplyr solution:

library(dplyr)

# the data
presidents <- 
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name", 
"from", "to"), row.names = 42:44, class = "data.frame")

# the expansion of the table
presidents %>%
    rowwise() %>%
    do(data.frame(name = .$name, year = seq(.$from, .$to, by = 1)))

# the output
Source: local data frame [22 x 2]
Groups: <by row>

             name  year
            (chr) (dbl)
1    Bill Clinton  1993
2    Bill Clinton  1994
3    Bill Clinton  1995
4    Bill Clinton  1996
5    Bill Clinton  1997
6    Bill Clinton  1998
7    Bill Clinton  1999
8    Bill Clinton  2000
9    Bill Clinton  2001
10 George W. Bush  2001
..            ...   ...

h/t: https://stackoverflow.com/a/24804470/1036500

score 3 · Accepted Answer

Another base solution:

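# d is the data frame (presidents in the question).
# Map() always returns a list; mapply() simplifies to a matrix when all ranges have equal length.
l <- Map(`:`, d$from, d$to)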
data.frame(name = d$name[rep(1:nrow(d), lengths(l))], year = unlist(l))
#              name year
# 1    Bill Clinton 1993
# 2    Bill Clinton 1994
# ...snip
# 8    Bill Clinton 2000
# 9    Bill Clinton 2001
# 10 George W. Bush 2001
# 11 George W. Bush 2002
# ...snip
# 17 George W. Bush 2008
# 18 George W. Bush 2009
# 19   Barack Obama 2009
# 20   Barack Obama 2010
# 21   Barack Obama 2011
# 22   Barack Obama 2012
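One caveat worth checking when using mapply() for this: it simplifies its result to a matrix whenever every range has the same length, and lengths() then no longer counts years per row. A small sketch with the foo data from the question's "Update 1", where every term spans five years (the names m and expanded are illustrative):

```r
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897)
)

# All three ranges have length 5, so mapply() returns a 5 x 3 matrix
m <- mapply(`:`, foo$from, foo$to)
is.matrix(m)

# Map() (or mapply(..., SIMPLIFY = FALSE)) keeps one vector per row
l <- Map(`:`, foo$from, foo$to)
expanded <- data.frame(name = foo$name[rep(seq_len(nrow(foo)), lengths(l))],
                       year = unlist(l))
nrow(expanded)   # 15
```

With the three presidents in the question the ranges differ in length, so the issue does not show up there.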

score 2 · Accepted Answer

Here's a quick base R solution, where Df is your data.frame:

do.call(rbind, apply(Df, 1, function(x) {
  data.frame(name=x[1], year=seq(x[2], x[3]))}))

It gives some warnings about row names, but appears to return the correct data.frame.
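The warnings appear to come from x[1] carrying a name, which data.frame() tries to reuse as row names. One way to sidestep them, sketched here on the assumption that Df has the name, from and to columns from the question, is to walk the columns in parallel with Map() instead of apply(); this also keeps from and to numeric rather than coercing each row to character. The names pieces and expanded are illustrative:

```r
Df <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012)
)

# Build one small data frame per president
pieces <- Map(function(name, from, to) data.frame(name = name, year = seq(from, to)),
              Df$name, Df$from, Df$to)

# unname() drops the list names so rbind() yields plain 1..n row names
expanded <- do.call(rbind, unname(pieces))
nrow(expanded)   # 22
```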

score 1 · Accepted Answer

Another option using the tidyverse could be to gather the data into long format, group_by name, and create a sequence between the from and to dates.

library(tidyverse)

presidents %>%
  gather(key, date, -name) %>%
  group_by(name) %>%
  complete(date = seq(date[1], date[2]))%>%
  select(-key) 

# A tibble: 22 x 2
# Groups:   name [3]
#   name          date
#   <chr>        <dbl>
# 1 Barack Obama  2009
# 2 Barack Obama  2010
# 3 Barack Obama  2011
# 4 Barack Obama  2012
# 5 Bill Clinton  1993
# 6 Bill Clinton  1994
# 7 Bill Clinton  1995
# 8 Bill Clinton  1996
# 9 Bill Clinton  1997
#10 Bill Clinton  1998
# … with 12 more rows

score 0 · Accepted Answer

Use by to create a by list L of data.frames, one data.frame per president, and then rbind them together. No packages are used.

L <- by(presidents, presidents$name, with, data.frame(name, year = from:to))
do.call("rbind", setNames(L, NULL))

If you don't mind the row names, then the last line can be reduced to:

do.call("rbind", L)

score 0 · Accepted Answer

Another solution using dplyr and tidyr:

library(magrittr) # for pipes
df <- data.frame(tata = c('toto1', 'toto2'), from = c(2000, 2004), to = c(2001, 2009))

#    tata from   to
# 1 toto1 2000 2001
# 2 toto2 2004 2009

df %>% 
  dplyr::as.tbl() %>%
  dplyr::rowwise() %>%
  dplyr::mutate(combined = list(seq(from, to))) %>%
  dplyr::select(-from, -to) %>%
  tidyr::unnest(combined)

#   tata  combined
#   <fct>    <int>
# 1 toto1     2000
# 2 toto1     2001
# 3 toto2     2004
# 4 toto2     2005
# 5 toto2     2006
# 6 toto2     2007
# 7 toto2     2008
# 8 toto2     2009
