r - R: Partially recode a variable by merging in values from another dataset

Question

I'm not even sure how to ask this, so please bear with me.

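I noticed an error in the dataset I'm working with, the ANES Cumulative File. In one year of the data (2004), the values of one variable (which I've renamed "grewup") were accidentally left out, so that year shows only NA. The values for the other years are there, so the dataset basically looks like this: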

id   year   grewup
1    2002   127
2    2002   310
3    2004   NA
4    2004   NA
5    2008   332
6    2008   614

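I do have another dataset that contains only 2004 and does have the values of "grewup". What I want to do is recode the 2004 NAs using the values from the second dataset. How can I do this? Again, the values for the other years are already in the cumulative dataset; I only want to recode 2004 and leave the rest of the values alone.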
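To make the question reproducible, here is a minimal sketch of the two data frames. It assumes the names dat1 (the cumulative file) and dat2 (the 2004-only file), which are the names the accepted answer uses; the 2004 values 438 and 834 are taken from the answer's output, not from the real ANES data. The last line counts the missing "grewup" values per year, as a quick check that the gap really is confined to 2004.

```r
# Hypothetical toy versions of the two datasets (not the real ANES files)
dat1 <- data.frame(
  id     = 1:6,
  year   = c(2002, 2002, 2004, 2004, 2008, 2008),
  grewup = c(127, 310, NA, NA, 332, 614)
)
dat2 <- data.frame(
  id     = 3:4,
  year   = c(2004, 2004),
  grewup = c(438, 834)  # values copied from the accepted answer's output
)

# Number of missing grewup values in each year: only 2004 should be non-zero
tapply(is.na(dat1$grewup), dat1$year, sum)
```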

Thanks.

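Some clarifications and additions:

I only want to bring this one variable in from the second dataset, to avoid making the first dataset even larger and more memory-hungry than it already is (951 columns). In fact, I already have many of the other variables.
Also, while all of the 2004 values are NA, not every NA in the dataset is from 2004. Some values in other years are legitimately missing.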
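Both clarifications can be handled directly in the merge. The following is a minimal sketch on hypothetical toy data: only the join keys and "grewup" are taken from dat2 (other_var is a made-up column standing in for the rest of the 2004 file), and only rows from 2004 are filled, so a legitimate NA in another year (here a hypothetical missing value for id 6 in 2008) stays NA.

```r
# Hypothetical toy data; dat2 has an extra column we do not want to carry over
dat1 <- data.frame(
  id     = 1:6,
  year   = c(2002, 2002, 2004, 2004, 2008, 2008),
  grewup = c(127, 310, NA, NA, 332, NA)  # the 2008 NA is a legitimate missing value
)
dat2 <- data.frame(
  id        = 3:4,
  year      = c(2004, 2004),
  grewup    = c(438, 834),
  other_var = c("a", "b")                # hypothetical extra column
)

# Keep only the join keys and the one variable that is needed
dat2_small <- dat2[, c("id", "year", "grewup")]

datm <- merge(dat1, dat2_small, by = c("id", "year"), all.x = TRUE)

# Fill only 2004 rows; %in% returns FALSE rather than NA if year is ever missing
fill <- datm$year %in% 2004 & is.na(datm$grewup.x)
datm$grewup.x[fill] <- datm$grewup.y[fill]

# Drop the helper column and restore the original variable name
datm$grewup.y <- NULL
names(datm)[names(datm) == "grewup.x"] <- "grewup"
datm
```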

score 0 · Accepted Answer

You should be able to merge these data frames by id and year:

 datm <- merge(dat1, dat2, by = c("id", "year"), all.x = TRUE)  # a left outer join
 datm
  id year grewup.x grewup.y
1  1 2002      127       NA
2  2 2002      310       NA
3  3 2004       NA      438
4  4 2004       NA      834
5  5 2008      332       NA
6  6 2008      614       NA

 # Now "fill in the blanks"
 datm[is.na(datm$grewup.x), "grewup.x"] <- datm[is.na(datm$grewup.x), "grewup.y"]
 # Notice that the logical index is the same on both sides of the assignment.
 # Because dat2 only contains 2004 rows, grewup.y is NA for every other year,
 # so legitimate NAs in those years stay NA.

 datm <- datm[ ! names(datm) %in% 'grewup.y' ]  # drop the supplementary column
 datm
  id year grewup.x
1  1 2002      127
2  2 2002      310
3  3 2004      438
4  4 2004      834
5  5 2008      332
6  6 2008      614
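An alternative that the answer does not use is to look the values up with match() instead of merge(). This is a sketch under the assumptions that id and year together identify at most one row in dat2 and that neither contains the space used as a separator in the composite key; the data are the same hypothetical toy frames as in the question. Unlike merge(), which by default sorts the result on the key columns, this keeps dat1's row order and the column name "grewup", and never creates the .x/.y columns.

```r
# Hypothetical toy data with the same shape as in the question
dat1 <- data.frame(
  id     = 1:6,
  year   = c(2002, 2002, 2004, 2004, 2008, 2008),
  grewup = c(127, 310, NA, NA, 332, 614)
)
dat2 <- data.frame(id = 3:4, year = c(2004, 2004), grewup = c(438, 834))

# Composite key built from id and year in each data frame
key1 <- paste(dat1$id, dat1$year)
key2 <- paste(dat2$id, dat2$year)

# For each row of dat1, the matching row of dat2 (NA when there is none)
idx <- match(key1, key2)

# Overwrite only rows that are missing, belong to 2004, and have a match
fill <- is.na(dat1$grewup) & dat1$year %in% 2004 & !is.na(idx)
dat1$grewup[fill] <- dat2$grewup[idx[fill]]
dat1
```

Because match() returns only the first match, duplicated id/year pairs in dat2 would be silently ignored, so it may be worth checking that anyDuplicated(key2) is 0 before filling.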
