I have a "big" data frame and a "small" data frame. In both, var1 is an ID and var2 is some value.

df1 <- data.frame(row.names=1:10, var1=c("A","B","C","D","E","F","G","H","I","J"), var2=runif(10))
df2 <- data.frame(row.names=1:4, var1=c("B","D","K","A"), var2=runif(4))

I want to compare the two data frames and get a new data frame, DF, that looks like this (head(DF)):

  var1      var2     Compare
1    A 0.7145085           1
2    B 0.9966129           1
3    C 0.5062709           0
4    D 0.4899432           1
5    E 0.6491614           0
6    F 0.8308064           0

I only want to compare df1$var1 with df2$var1.

The goal is to compute the sum of var2 (from df1) over the rows where Compare is 1.

I thought about using a logical comparison, but that only checks row by row, so as you can see I would get all FALSE.
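
To make the row-by-row problem concrete, here is a minimal sketch. The vector names ids_big and ids_small are made up for illustration, and the IDs are stored as plain character vectors (an assumption, so the result does not depend on how the data frame stores var1).

```r
# Same IDs as in df1$var1 and df2$var1, as plain character vectors
# (illustrative names, not part of the original data frames).
ids_big   <- c("A","B","C","D","E","F","G","H","I","J")
ids_small <- c("B","D","K","A")

# `==` compares position by position and recycles the shorter vector,
# so "A" is checked against "B", "B" against "D", and so on.
# R also warns that 10 is not a multiple of 4; the warning is silenced here.
rowwise <- suppressWarnings(ids_big == ids_small)

any(rowwise)  # FALSE: no position happens to hold the same ID in both vectors
```

So a positional comparison cannot answer "does this ID appear anywhere in the other data frame?".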

1 Answer

You could certainly improve this question (also, notice that I use set.seed). Here is one approach using merge and apply, but I'm certain there are better ways:

set.seed(10)
df1 <- data.frame(row.names=1:10, var1=c("A","B","C","D","E","F","G","H","I","J"), var2=runif(10))
df2 <- data.frame(row.names=1:4, var1=c("B","D","K","A"), var2=runif(4))


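# Full outer join on the ID; the two value columns become var2.x and var2.y
df3 <- merge(df1, df2, by="var1", all=TRUE)
# Non-NA value columns per row, minus 1: 1 if the ID is in both data frames, else 0
df3$Compare <- rowSums(apply(df3[, -1], 2, function(x) !is.na(x))) - 1
# Note: this adds var2 from both data frames together (NA counts as 0)
df3$var2 <- apply(df3[, 2:3], 1, sum, na.rm=TRUE)
# Show var1, the summed var2, and Compare
df3[, c(1, 5, 4)]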

##    var1       var2 Compare
## 1     A 1.10340351       1
## 2     B 0.95842417       1
## 3     C 0.42690767       0
## 4     D 1.26083983       1
## 5     E 0.08513597       0
## 6     F 0.22543662       0
## 7     G 0.27453052       0
## 8     H 0.27230507       0
## 9     I 0.61582931       0
## 10    J 0.42967153       0
## 11    K 0.11350898       0
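
One possibly simpler alternative is a membership test with %in%. This is only a sketch and it assumes, from the desired output in the question, that DF should keep just the rows of df1 with their original var2. That differs from the merge result above, which adds var2 from both data frames and includes the extra row K. stringsAsFactors = FALSE and the name matched_sum are my own choices for the example.

```r
set.seed(10)
df1 <- data.frame(var1 = c("A","B","C","D","E","F","G","H","I","J"),
                  var2 = runif(10), stringsAsFactors = FALSE)
df2 <- data.frame(var1 = c("B","D","K","A"),
                  var2 = runif(4), stringsAsFactors = FALSE)

# Flag each row of df1 whose ID occurs anywhere in df2$var1 (1 = found, 0 = not).
DF <- df1
DF$Compare <- as.integer(DF$var1 %in% df2$var1)

# Sum df1's var2 over the flagged rows only.
matched_sum <- sum(DF$var2[DF$Compare == 1])

head(DF)
matched_sum
```

Here %in% checks every ID in df1 against the whole of df2$var1, so the order of the rows in df2 does not matter.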
Answered 2013-09-08T13:25:36.407