Consider the following data:
Country1 = c("Brazil", "India", "China","China","Brazil")
Date1<-as.Date(c("2001-01-21", "2002-04-13","2003-06-19","2006-06-19","2007-06-19"))
Name1<-c("B","C","A","A","A")
Data1<-data.frame(Country1,Date1,Name1)
Name2<-c("B","B","C","C","C","A","A","A")
Quality2<-c("good","good","medium","good","good","bad","good","good")
Country2<-c("China","Brazil","Taiwan","India","India","United States","China","Brazil")
Date2<-as.Date(c("2002-02-21", "1999-03-13","1998-08-19", "1996-09-13","2000-12-12","1998-07-21","2005-03-22","2003-06-19"))
Data2<-data.frame(Name2,Quality2,Country2,Date2)
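In Data1, I want to add a column named "Result". For each row of Data1, "Result" should be the number of rows of Data2 that meet four conditions: (1) Data2$Name2 matches that row's Data1$Name1, (2) Data2$Country2 matches that row's Data1$Country1, (3) Data2$Quality2 is "good", and (4) Data2$Date2 is earlier than that row's Data1$Date1. Therefore, Data1$Result should be 1, 2, 0, 1, and 1.
For example, for the first row, Data1$Result should be 1, because only 1 row of Data2 meets these conditions: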
sum(Data2$Name2==as.character(Data1$Name1)[1] & Data2$Country2==as.character(Data1$Country1)[1] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[1])
Or, in other words,
sum(Data2$Name2=="B" & Data2$Country2=="Brazil" & Data2$Quality2=="good" & Data2$Date2 < "2001-01-21")
Similarly, for the second row, Data1$Result should be 2, because 2 rows of Data2 meet these conditions:
sum(Data2$Name2==as.character(Data1$Name1)[2] & Data2$Country2==as.character(Data1$Country1)[2] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[2])
Or,
sum(Data2$Name2=="C" & Data2$Country2=="India" & Data2$Quality2=="good" & Data2$Date2 < "2002-04-13")
For the third row, Data1$Result should be 0, because no row of Data2 meets these conditions:
sum(Data2$Name2==as.character(Data1$Name1)[3] & Data2$Country2==as.character(Data1$Country1)[3] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[3])
Or,
sum(Data2$Name2=="A" & Data2$Country2=="China" & Data2$Quality2=="good" & Data2$Date2 < "2003-06-19")
The same applies to rows 4 and 5:
sum(Data2$Name2==as.character(Data1$Name1)[4] & Data2$Country2==as.character(Data1$Country1)[4] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[4])
sum(Data2$Name2==as.character(Data1$Name1)[5] & Data2$Country2==as.character(Data1$Country1)[5] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[5])
As a beginner in R, I wrote the following code:
sum(Data2$Name2==as.character(Data1$Name1)[1:nrow(Data1)] & Data2$Country2==as.character(Data1$Country1)[1:nrow(Data1)] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[1:nrow(Data1)])
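However, it does not return the desired result. I would like code that adapts to the number of rows in Data1. In my actual data, each data frame has about 100,000 observations.
Ideally, I am looking for code that R evaluates once for each row n of Data1.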
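A likely reason the attempt above falls short, sketched here with only the two name vectors from the data: `==` compares element by element and recycles the shorter vector, so the comparison pairs Data2 rows with Data1 rows by position rather than testing every Data2 row against one Data1 row. This is an illustration of the recycling behaviour, not a full diagnosis.

```r
# Only the name vectors from the question are needed to show the effect
Name1 <- c("B", "C", "A", "A", "A")                 # length 5 (Data1)
Name2 <- c("B", "B", "C", "C", "C", "A", "A", "A")  # length 8 (Data2)

# `==` recycles the shorter vector: Name2[6] is compared with Name1[1],
# Name2[7] with Name1[2], and so on. R warns here because 8 is not a
# multiple of 5; the warning is silenced only to keep the sketch quiet.
cmp <- suppressWarnings(Name2 == Name1)

length(cmp)  # 8: one value per row of Data2, not one per row of Data1
```

Wrapping such a vector in sum() then collapses it to a single number, which is why no per-row result comes back.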
For example, for the first row, R should evaluate
sum(Data2$Name2==as.character(Data1$Name1)[1] & Data2$Country2==as.character(Data1$Country1)[1] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[1])
For the second row,
sum(Data2$Name2==as.character(Data1$Name1)[2] & Data2$Country2==as.character(Data1$Country1)[2] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[2])
For (say) row 54,342,
sum(Data2$Name2==as.character(Data1$Name1)[54342] & Data2$Country2==as.character(Data1$Country1)[54342] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[54342])
And for row n,
sum(Data2$Name2==as.character(Data1$Name1)[n] & Data2$Country2==as.character(Data1$Country1)[n] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[n])
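One way the row-n expression might be applied to every row is sketched below. It wraps the expression in a function and calls it once per row index with vapply(). The helper name `count_matches` is hypothetical, and this is only a straightforward sketch: it performs roughly nrow(Data1) × nrow(Data2) comparisons, so it may be slow with about 100,000 rows on each side.

```r
# Sample data from the question
Data1 <- data.frame(
  Country1 = c("Brazil", "India", "China", "China", "Brazil"),
  Date1 = as.Date(c("2001-01-21", "2002-04-13", "2003-06-19",
                    "2006-06-19", "2007-06-19")),
  Name1 = c("B", "C", "A", "A", "A")
)
Data2 <- data.frame(
  Name2 = c("B", "B", "C", "C", "C", "A", "A", "A"),
  Quality2 = c("good", "good", "medium", "good", "good", "bad", "good", "good"),
  Country2 = c("China", "Brazil", "Taiwan", "India", "India",
               "United States", "China", "Brazil"),
  Date2 = as.Date(c("2002-02-21", "1999-03-13", "1998-08-19", "1996-09-13",
                    "2000-12-12", "1998-07-21", "2005-03-22", "2003-06-19"))
)

# Hypothetical helper: number of Data2 rows meeting the four conditions
# for row n of Data1 (same expression as the row-n formula in the question)
count_matches <- function(n) {
  sum(Data2$Name2 == as.character(Data1$Name1)[n] &
        Data2$Country2 == as.character(Data1$Country1)[n] &
        Data2$Quality2 == "good" &
        Data2$Date2 < Data1$Date1[n])
}

# Evaluate the helper for n = 1, 2, ..., nrow(Data1)
Data1$Result <- vapply(seq_len(nrow(Data1)), count_matches, integer(1))
Data1$Result
```

On the sample data this gives the values listed earlier: 1, 2, 0, 1, 1.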
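In addition, I would like to add another column to Data1, named "Min.Date.Result", that gives the minimum (oldest) value of Data2$Date2 among the rows meeting the same four conditions. So Data1$Min.Date.Result should be "1999-03-13", "1996-09-13", NA, "2005-03-22", and "2003-06-19".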
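The same per-row idea could be sketched for the minimum date. The helper name `min_date` is hypothetical; it returns NA explicitly when no row of Data2 qualifies, since min() on an empty vector would otherwise produce a warning rather than a clean NA.

```r
# Sample data from the question
Data1 <- data.frame(
  Country1 = c("Brazil", "India", "China", "China", "Brazil"),
  Date1 = as.Date(c("2001-01-21", "2002-04-13", "2003-06-19",
                    "2006-06-19", "2007-06-19")),
  Name1 = c("B", "C", "A", "A", "A")
)
Data2 <- data.frame(
  Name2 = c("B", "B", "C", "C", "C", "A", "A", "A"),
  Quality2 = c("good", "good", "medium", "good", "good", "bad", "good", "good"),
  Country2 = c("China", "Brazil", "Taiwan", "India", "India",
               "United States", "China", "Brazil"),
  Date2 = as.Date(c("2002-02-21", "1999-03-13", "1998-08-19", "1996-09-13",
                    "2000-12-12", "1998-07-21", "2005-03-22", "2003-06-19"))
)

# Hypothetical helper: oldest Data2$Date2 among the rows meeting the
# four conditions for row n of Data1, or NA when there are none
min_date <- function(n) {
  hits <- Data2$Date2[Data2$Name2 == as.character(Data1$Name1)[n] &
                        Data2$Country2 == as.character(Data1$Country1)[n] &
                        Data2$Quality2 == "good" &
                        Data2$Date2 < Data1$Date1[n]]
  if (length(hits) == 0) as.Date(NA) else min(hits)
}

# lapply() keeps each result as a Date; c() then joins them into one Date vector
Data1$Min.Date.Result <- do.call(c, lapply(seq_len(nrow(Data1)), min_date))
Data1$Min.Date.Result
```

Collecting the results with lapply() and c() rather than sapply() is deliberate here, because sapply() would drop the Date class and return plain numbers.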