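I have been struggling with my time-series cross-sectional dataset, specifically with finding a way to determine the maximum value of a column for each country and year. I have tried different versions of for and if/else loops, without real success. Could you give me any pointers?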

Here is a small reproducible example of my data structure:

country <- c("a","a","a","a","a","a","b","b","b","b","b","b","c","c","c","c","c","c")
year <- c(2002, 2003, 2004, 2005, 2006, 2007, 2002, 2003, 2004, 2005, 2006, 2007, 2002, 2003, 2004, 2005, 2006, 2007)
topic <- c("u", "v", "w", "x", "y", "z", "u", "v", "w", "x", "y", "z", "u", "v", "w", "x", "y", "z")
perc <- c(0.3, 0.4, 0.1, 0.2, 0, 0, 0.2, 0.3, 0.1, 0.1, 0.1, 0.2, 0.1, 0.2, 0.2, 0.3, 0, 0.2)
dta <- data.frame(country, year, topic, perc)

In the end, I would like to create a new variable that states the topic with the highest percentage for a given year and country:

topicmax <-c("v","v","v","v","v","v","v","v","v","v","v","v","x","x","x","x","x","x")

Ideally, I would also generate another variable that gives the exact percentage of the topic with the highest perc value.

Any help would be much appreciated. None of the tutorials on loops that I found address the time-series cross-section case... Thank you!

1 Answer

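One way to solve this is with which.max, which returns the index of the first maximum value. That index can then be used to subset topic, grouped by country: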

library(data.table)
setDT(dta)[, topicmax := topic[which.max(perc)], by=country]
#     country year topic perc topicmax
#  1:       a 2002     u  0.3        v
#  2:       a 2003     v  0.4        v
#  3:       a 2004     w  0.1        v
#  4:       a 2005     x  0.2        v
#  5:       a 2006     y  0.0        v
#  6:       a 2007     z  0.0        v
#  7:       b 2002     u  0.2        v
#  8:       b 2003     v  0.3        v
#  9:       b 2004     w  0.1        v
# 10:       b 2005     x  0.1        v
# 11:       b 2006     y  0.1        v
# 12:       b 2007     z  0.2        v
# 13:       c 2002     u  0.1        x
# 14:       c 2003     v  0.2        x
# 15:       c 2004     w  0.2        x
# 16:       c 2005     x  0.3        x
# 17:       c 2006     y  0.0        x
# 18:       c 2007     z  0.2        x
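
The question also asks for a second variable holding the exact percentage of the top topic. A minimal sketch of one way to get both columns, assuming the maximum is taken per country as in the expected topicmax above. It swaps in base R's ave instead of data.table so it runs without extra packages; the column name percmax and the helper idx are illustrative names, not from the question.

```r
# Rebuild the example data from the question
country <- rep(c("a", "b", "c"), each = 6)
year    <- rep(2002:2007, times = 3)
topic   <- rep(c("u", "v", "w", "x", "y", "z"), times = 3)
perc    <- c(0.3, 0.4, 0.1, 0.2, 0, 0,
             0.2, 0.3, 0.1, 0.1, 0.1, 0.2,
             0.1, 0.2, 0.2, 0.3, 0, 0.2)
dta <- data.frame(country, year, topic, perc, stringsAsFactors = FALSE)

# percmax (hypothetical name): highest perc within each country,
# repeated on every row of that country
dta$percmax <- ave(dta$perc, dta$country, FUN = max)

# idx (hypothetical helper): for each country, the row number of its
# first maximum; ave() hands FUN the row numbers belonging to one group
idx <- ave(seq_len(nrow(dta)), dta$country,
           FUN = function(i) i[which.max(dta$perc[i])])

# topicmax: topic found on the row that holds the group maximum
dta$topicmax <- dta$topic[idx]
```

With data.table, the same second column could presumably be added next to the line above as setDT(dta)[, percmax := max(perc), by = country]. Note that which.max keeps only the first maximum, so ties within a country resolve to the earliest row.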
Answered 2015-11-30T21:22:01.337