I have the following data frame:
df<-structure(list(totprivland = c(175L, 50L, 100L, 14L, 4L, 240L,
10L, 20L, 20L, 58L), ncushr8d1 = c(0L, 0L, 0L, 0L, 0L, 30L, 5L,
0L, 0L, 50L), ncu_CENREG1 = structure(c(4L, 4L, 4L, 4L, 1L, 3L,
3L, 3L, 4L, 4L), .Label = c("Northeast", "Midwest", "South",
"West"), class = "factor"), ncushr8d2 = c(75L, 50L, 100L, 14L,
2L, 30L, 5L, 20L, 20L, 8L), ncu_CENREG2 = structure(c(4L, 4L,
4L, 4L, 1L, 2L, 1L, 4L, 3L, 4L), .Label = c("Northeast", "Midwest",
"South", "West"), class = "factor"), ncushr8d3 = c(100L, NA,
NA, NA, 2L, 180L, 0L, NA, NA, NA), ncu_CENREG3 = structure(c(4L,
NA, NA, NA, 1L, 1L, 3L, NA, NA, NA), .Label = c("Northeast",
"Midwest", "South", "West"), class = "factor"), ncushr8d4 = c(NA,
NA, NA, NA, 0L, NA, NA, NA, NA, NA), ncu_CENREG4 = structure(c(NA,
NA, NA, NA, 1L, NA, NA, NA, NA, NA), .Label = c("Northeast",
"Midwest", "South", "West"), class = "factor")), .Names = c("totprivland",
"ncushr8d1", "ncu_CENREG1", "ncushr8d2", "ncu_CENREG2", "ncushr8d3",
"ncu_CENREG3", "ncushr8d4", "ncu_CENREG4"), row.names = c(27404L,
27525L, 27576L, 27822L, 28099L, 28238L, 28306L, 28312L, 28348L,
28379L), class = "data.frame")
That is the dput output. The basic idea is as follows:
Total  VariableA  LocationA  VariableB  LocationB
30     20         East       10         East
20     20         South      NA         West
115    15         East       100        South
100    50         West       50         West
35     10         East       25         South
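Total (totprivland in the dput example) is the sum of the variables (ncushr8d1, ncushr8d2, ncushr8d3 and ncushr8d4), and each of those variables has a corresponding factor location variable (ncu_CENREG1 and so on). There are 6 additional variables and locations following this same pattern. The location is often the same for several of the numeric variables (for example, multiple "East" location values, as in the first row of the example).
I would like to sum the values in each row by common location factor, creating a new column for each location's sum. It would look like this, with NA values ignored: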
Total  VariableA  LocationA  VariableB  LocationB  TotalWest  TotalEast  TotalSouth
30     20         East       10         East       0          30         0
20     20         South      NA         NA         0          0          20
115    15         East       100        South      0          15         100
100    50         West       50         West       100        0          0
35     10         East       25         South      0          10         25
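I have looked into aggregate and split, but I can't figure out how to make them work across this many columns. I have also considered a lengthy "if" statement that would loop through all 8 variables and their corresponding locations, but I feel there must be a better solution. The observations are weighted for use with the survey package, so I would like to avoid duplicating observations by making them "long" with the reshape package, although perhaps I could recombine them afterwards. Any suggestions are appreciated!
Many thanks, Luke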
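For reference, here is a minimal base R sketch of the per-row, per-location sums described above. It is one possible approach, not necessarily the best one. It uses a toy data frame that mirrors the small example table rather than the real survey data, and the names toy, value_cols, location_cols, regions and totals are made up for illustration. The data stay wide, with one row per observation.

```r
# Toy data mirroring the small example table (an assumption, not the survey data).
toy <- data.frame(
  Total     = c(30, 20, 115, 100, 35),
  VariableA = c(20, 20, 15, 50, 10),
  LocationA = factor(c("East", "South", "East", "West", "East"),
                     levels = c("East", "South", "West")),
  VariableB = c(10, NA, 100, 50, 25),
  LocationB = factor(c("East", NA, "South", "West", "South"),
                     levels = c("East", "South", "West"))
)

# Numeric columns and their matching location columns, listed in the same order.
value_cols    <- c("VariableA", "VariableB")
location_cols <- c("LocationA", "LocationB")

vals <- as.matrix(toy[value_cols])                 # numeric matrix, one column per variable
locs <- sapply(toy[location_cols], as.character)   # character matrix of the same shape

regions <- c("West", "East", "South")

# For each region, zero out the values located elsewhere, then sum across each row.
# NA values and NA locations are dropped by na.rm = TRUE.
totals <- sapply(regions, function(r) rowSums(vals * (locs == r), na.rm = TRUE))
colnames(totals) <- paste0("Total", regions)

result <- cbind(toy, totals)
print(result)
```

For the data frame in the dput, the same idea should carry over by setting value_cols <- paste0("ncushr8d", 1:4), location_cols <- paste0("ncu_CENREG", 1:4) and regions <- levels(df$ncu_CENREG1), then extending the index range to cover the additional variable/location pairs.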