I couldn't figure this out....

I have a data frame that looks like this (only the first 10 rows are shown):

Value   Type
NA       3      
23       2
54       1
45       1
21       2
55       3
67       3
78       1
10       1
NA       2

Task:

Replace each NA with the mean Value of its Type. For example, the first NA is in Type 3, so I'd like to replace it with the average of the Type 3 values, that is, (55+67)/2 = 61.
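
To make the target values concrete, here is a minimal sketch that rebuilds only the ten rows shown above (the rest of the data frame is not shown, so the real means may differ) and computes the mean per Type. The name type_means is just illustrative.

```r
# Sample data: only the ten rows shown above, not the full data frame.
df <- data.frame(
  Value = c(NA, 23, 54, 45, 21, 55, 67, 78, 10, NA),
  Type  = c(3, 2, 1, 1, 2, 3, 3, 1, 1, 2)
)

# Mean of Value within each Type, ignoring NA values.
type_means <- tapply(df$Value, df$Type, mean, na.rm = TRUE)
print(type_means)
```

On these ten rows the Type 3 mean is 61, matching the hand calculation.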

My code:

for (i in 1:nrow(df)){
  if(is.na(df[i,"Value"])==TRUE & Type==1){
    df[i,"Value"] = mean(with(df, subset(Value, Type==1)))
  }
  else if (is.na(df[i,"Value"])==TRUE & Type==2){
    df[i,"Value"] = mean(with(df, subset(Value, Type==2)))
  }
  else if (is.na(df[i,"Value"])==TRUE & Type==3){
    df[i,"Value"] = mean(with(df, subset(Value, Type==3)))
  }
  else (df[i,"Value"] = df[i,"Value"])
}

Result

NAs still appear in the Value column; they are not replaced by the mean value of their Type.
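
A likely cause, sketched here on the ten sample rows only: mean() returns NA whenever its input contains an NA, unless na.rm = TRUE is given, so each NA is overwritten with another NA. In addition, the bare Type in each condition is not the Type of row i. The loop below is one possible correction under those assumptions, not the only way to write it; same_type is an illustrative name.

```r
# Same ten sample rows as shown in the question.
df <- data.frame(
  Value = c(NA, 23, 54, 45, 21, 55, 67, 78, 10, NA),
  Type  = c(3, 2, 1, 1, 2, 3, 3, 1, 1, 2)
)

# mean() propagates NA unless na.rm = TRUE is given.
mean(df$Value[df$Type == 3])                # NA
mean(df$Value[df$Type == 3], na.rm = TRUE)  # 61

# Loop with both fixes: use the Type of row i, and drop NAs from the mean.
for (i in seq_len(nrow(df))) {
  if (is.na(df[i, "Value"])) {
    same_type <- df$Type == df[i, "Type"]
    df[i, "Value"] <- mean(df$Value[same_type], na.rm = TRUE)
  }
}
```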

Any help is appreciated!

2 Answers

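library(plyr)

# Split dat by Type, fill each group's NAs with that group's mean, then recombine.
ddply(dat, .(Type), function(df){
  m <- mean(df$Value, na.rm=TRUE)  # group mean, ignoring NAs
  df$Value[is.na(df$Value)] <- m   # overwrite only the missing entries
  df
})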
answered 2013-09-20T18:58:40.997

Here are two lines of code in base R, assuming X is your data.frame:

Means <- tapply(X$Value, X$Type, mean, na.rm=TRUE)
X$Value <- apply(X, 1, function(r) ifelse(is.na(r[1]), Means[r[2]], r[1]))

For large datasets, this may be faster than using ddply, although the plyr and data.table packages are more general and certainly worth learning.
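
Note that the apply() call above indexes Means by position, which works when the Type codes are 1, 2, 3 as in the sample. As a sketch of a variant that does not depend on that, base R's ave() can return the group mean for every row directly. The sample X below is an assumption built from the ten rows in the question, and group_mean is an illustrative name.

```r
# Assumed sample data: the ten rows shown in the question.
X <- data.frame(
  Value = c(NA, 23, 54, 45, 21, 55, 67, 78, 10, NA),
  Type  = c(3, 2, 1, 1, 2, 3, 3, 1, 1, 2)
)

# For every row, the mean of Value within that row's Type (NAs ignored).
group_mean <- ave(X$Value, X$Type, FUN = function(v) mean(v, na.rm = TRUE))

# Keep observed values; fill missing ones with their group mean.
X$Value <- ifelse(is.na(X$Value), group_mean, X$Value)
print(X)
```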

answered 2013-09-22T01:11:30.013