Given two data frames

old.df = data.frame(SampleNo=c('A1', 'B4', 'C5', 'D4'), Result=c(rep("Successful",4)), NoUnit = c(rep(4,4)))
new.df = data.frame(SampleNo=c('A1', 'C5', 'D4', 'E4'), Result=c(rep("Successful",2),rep( "Failure",2)),State=c(rep("California",2),rep("New York",2)))

which look like this:

> old.df
  SampleNo     Result NoUnit
1       A1 Successful      4
2       B4 Successful      4
3       C5 Successful      4
4       D4 Successful      4


> new.df
  SampleNo     Result      State
1       A1 Successful California
2       C5 Successful California
3       D4    Failure   New York
4       E4    Failure   New York

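I want to update the contents of old.df with the new data from new.df, keeping the rows of old.df in order and adding the new columns from new.df. The resulting data.frame would be: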

  SampleNo     Result NoUnit      State
1       A1 Successful      4 California
2       B4 Successful      4       <NA>
3       C5 Successful      4 California
4       D4    Failure      4   New York
5       E4    Failure     NA   New York

2 Answers

merge(old.df,new.df,all=TRUE)

  SampleNo     Result NoUnit      State
1       A1 Successful      4 California
2       B4 Successful      4       <NA>
3       C5 Successful      4 California
4       D4    Failure      4   New York
5       E4    Failure     NA   New York

Edit after the OP changed the rules:

df <- merge(old.df,new.df,all=TRUE,by="SampleNo")
df$Result <- with(df,factor(ifelse(is.na(Result.y),
                             as.character(Result.x),as.character(Result.y))))
df$Result.x <- NULL; df$Result.y <- NULL

  SampleNo NoUnit      State     Result
1       A1      4 California Successful
2       B4      4       <NA> Successful
3       C5      4 California Successful
4       D4      4   New York    Failure
5       E4     NA   New York    Failure
answered 2012-11-23T08:09:08.297
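merge() will not do this on its own. You do not actually want to merge on the "Result" column, only on "SampleNo", and then combine the two "Result" values, using the new value where one is available and the old value otherwise.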
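To make the problem concrete, the sketch below rebuilds the question's data and compares the default merge() with one restricted to SampleNo. It uses base R only. The variable names shared, both.keys and one.key are illustrative and not part of the original answer.

```r
old.df <- data.frame(SampleNo = c("A1", "B4", "C5", "D4"),
                     Result   = rep("Successful", 4),
                     NoUnit   = rep(4, 4))
new.df <- data.frame(SampleNo = c("A1", "C5", "D4", "E4"),
                     Result   = c(rep("Successful", 2), rep("Failure", 2)),
                     State    = c(rep("California", 2), rep("New York", 2)))

# By default merge() joins on every column name the two data frames share.
shared <- intersect(names(old.df), names(new.df))
print(shared)

# With both SampleNo and Result as keys, D4/Successful (old) and D4/Failure
# (new) do not match each other, so D4 ends up on two separate rows.
both.keys <- merge(old.df, new.df, all = TRUE)
print(sum(both.keys$SampleNo == "D4"))

# With SampleNo as the only key there is one row per sample, but the two
# Result columns come back as Result.x (old) and Result.y (new).
one.key <- merge(old.df, new.df, all = TRUE, by = "SampleNo")
print(names(one.key))
```

Merging on SampleNo alone therefore gives the right rows, but Result.x and Result.y still have to be combined into a single Result column.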

Here is some code that does this for every column the two data frames have in common, other than "SampleNo":

merge.by.sample <- function(old.df, new.df, by='SampleNo') {
  # Full outer join on the key column(s); shared non-key columns get .x/.y suffixes
  r <- merge(old.df, new.df, all=TRUE, by=by)

  merge.col <- function(r, col) {
    xname <- paste0(col, '.x')
    yname <- paste0(col, '.y')

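    # Use the new (.y) value where present, otherwise keep the old (.x) value.
    # Factors are combined as character so no level is lost; this also works for
    # plain character columns (the data.frame() default since R 4.0.0).
    old.val <- r[[xname]]
    new.val <- r[[yname]]
    was.factor <- is.factor(old.val) || is.factor(new.val)
    if (was.factor) {
      old.val <- as.character(old.val)
      new.val <- as.character(new.val)
    }
    combined <- ifelse(is.na(new.val), old.val, new.val)
    r[[col]] <- if (was.factor) factor(combined) else combined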
    r[!(names(r) %in% c(xname, yname))]
  }

  i <- intersect(names(old.df), names(new.df))
  i <- i[!i %in% by]

  for (col in i) {
    r <- merge.col(r, col)
  }
  r
}
answered 2012-11-24T03:17:56.480