r - Using apply function on a matrix with NA entries

Question

I read data from a CSV file. When I view it in R, I have:

  V1 V2  V3 V4  V5 V6 V7
1 14 25  83 64 987 45 78
2 15 65 789 32  14 NA NA
3 14 67  89 14  NA NA NA

If I want the maximum value in each column, I use this:

apply(df,2,max)

and this is the result:

 V1  V2  V3  V4  V5  V6  V7 
 15  67 789  64  NA  NA  NA

but it only works on the columns that have no NA. How can I change my code to handle the columns that contain NA too?

score 36 · Accepted Answer

You just need to add na.rm=TRUE to your apply call.

apply(df,2,max,na.rm=TRUE)

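Note: this assumes that every column has at least one non-NA data point. If a column has none, sum will return 0, and max will return -Inf with a warning.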
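As a minimal sketch of the na.rm=TRUE approach, the snippet below rebuilds the question's data by hand (the original CSV is not available, so the data frame construction is an assumption) and then checks the edge case of a column with no data at all. The `empty` vector is purely illustrative.

```r
# Data from the question, typed in by hand because the CSV is not available.
df <- data.frame(
  V1 = c(14, 15, 14),
  V2 = c(25, 65, 67),
  V3 = c(83, 789, 89),
  V4 = c(64, 32, 14),
  V5 = c(987, 14, NA),
  V6 = c(45, NA, NA),
  V7 = c(78, NA, NA)
)

# Extra arguments to apply() are passed on to max(),
# so the NAs are dropped column by column.
col_max <- apply(df, 2, max, na.rm = TRUE)
print(col_max)

# Illustrative edge case: a vector that contains no data at all.
empty <- c(NA_real_, NA_real_)
empty_sum <- sum(empty, na.rm = TRUE)                    # 0
empty_max <- suppressWarnings(max(empty, na.rm = TRUE))  # -Inf (max() also warns)
```

Here every column has at least one value, so the result is 15, 67, 789, 64, 987, 45, 78 for V1 to V7.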

Edit, based on the comments

fft does not have an na.rm argument, so you will need to write your own function.

apply(df,2,function(x){fft(x[!is.na(x)])})

For example:

df <- data.frame(matrix(5,5,5))
df[,3] <- NA

> df
  X1 X2 X3 X4 X5
1  5  5 NA  5  5
2  5  5 NA  5  5
3  5  5 NA  5  5
4  5  5 NA  5  5
5  5  5 NA  5  5

> apply(df,2,function(x){fft(x[!is.na(x)])})
$X1
[1] 2.500000e+01+0i 1.776357e-15+0i 1.776357e-15+0i 1.776357e-15+0i
[5] 1.776357e-15+0i

$X2
[1] 2.500000e+01+0i 1.776357e-15+0i 1.776357e-15+0i 1.776357e-15+0i
[5] 1.776357e-15+0i

$X3
complex(0)

$X4
[1] 2.500000e+01+0i 1.776357e-15+0i 1.776357e-15+0i 1.776357e-15+0i
[5] 1.776357e-15+0i

$X5
[1] 2.500000e+01+0i 1.776357e-15+0i 1.776357e-15+0i 1.776357e-15+0i
[5] 1.776357e-15+0i

score 5 · Accepted Answer

Another option:

sapply(apply(df,2,na.exclude), fft)

Edit: the code above may fail if apply() returns a matrix instead of a list. This happens, for example, when there are no NAs. The code below fixes that:

sapply(tapply(m, col(m), na.exclude), max)

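Interestingly, there is no need to set simplify=FALSE: tapply() simplifies its result only when a scalar is returned for every column, and in that case sapply() works the same way on the simplified result.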
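To make the list-versus-matrix issue concrete, here is a small sketch. The matrices `m_na` and `m_full` are hypothetical examples, not data from the question.

```r
# Hypothetical matrices for illustration only.
m_na   <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 3)  # column 1 contains an NA
m_full <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)   # no NAs

# Columns shrink to different lengths, so apply() has to return a list.
from_na <- apply(m_na, 2, na.exclude)
is.list(from_na)

# Every column keeps its length, so apply() simplifies to a matrix,
# and sapply() would then visit single cells instead of whole columns.
from_full <- apply(m_full, 2, na.exclude)
is.matrix(from_full)

# The tapply() variant splits the values by column index
# and gives per-column results in both cases.
max_na   <- sapply(tapply(m_na, col(m_na), na.exclude), max)
max_full <- sapply(tapply(m_full, col(m_full), na.exclude), max)
```

Both `max_na` and `max_full` hold the column maxima 3 and 6, whether or not the matrix contains NAs.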

score 2 · Accepted Answer

Another option is to use the following:

apply(na.omit(df),2,max)

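na.omit(df) drops every row of the data frame df that contains an NA (the incomplete cases), and apply() then returns the maximum of each column over the remaining rows. Note that this can differ from the per-column maximum obtained with na.rm=TRUE, because the values in a dropped row are ignored in every column.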
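A short sketch of how na.omit() behaves on data shaped like the question's. Only three of the question's columns are reproduced here, typed in by hand, so treat the data frame as an assumption.

```r
# Three columns from the question's data, typed in by hand.
df <- data.frame(
  V1 = c(14, 15, 14),
  V5 = c(987, 14, NA),
  V6 = c(45, NA, NA)
)

# na.omit() works row-wise: only row 1 has no NA, so only row 1 survives.
complete_rows <- nrow(na.omit(df))

# Maxima over the complete rows only.
max_complete <- apply(na.omit(df), 2, max)

# Per-column maxima with NAs ignored inside each column.
max_per_col <- apply(df, 2, max, na.rm = TRUE)
```

For V1 the first approach gives 14 while the second gives 15, because row 2 is discarded entirely by na.omit() even though its V1 value is present.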

score 1 · Accepted Answer

Another option, which returns -Inf if all elements of a column are NA:

df<-structure(list(x = c(10, 12, 13), y = c(12, 13, NA), z = c(NA_real_, 
NA_real_, NA_real_)), .Names = c("x", "y", "z"), row.names = c(NA, 
-3L), class = "data.frame")

kk<-Map(function(x) max(na.omit(df[,x])),as.list(names(df)))
ll<-do.call(rbind,kk)
rownames(ll)<-names(df)

> ll
  [,1]
x   13
y   13
z -Inf
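If -Inf is not wanted for an all-NA column, one possible variation is to return NA instead. The helper `safe_max` below is a hypothetical name introduced for this sketch, not a base R function, and it assumes numeric columns.

```r
# safe_max is a hypothetical helper, not part of base R.
# It returns NA (rather than -Inf plus a warning) when no data are left.
safe_max <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else max(x)
}

# Same data as in the answer above.
df <- data.frame(
  x = c(10, 12, 13),
  y = c(12, 13, NA),
  z = c(NA_real_, NA_real_, NA_real_)
)

res <- sapply(df, safe_max)
print(res)
```

With this data, `res` is 13 for x, 13 for y, and NA for z.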

score 1 · Accepted Answer

This may be thanks to later versions of R, but you can actually do this:

apply(df,2,function(x) max(x,na.rm=T))

This returns a vector. Or, equivalently:

lapply(df,function(x) max(x,na.rm=T))

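This returns a list. Note that the result will not be what you expect if any column of df is character, because apply() first coerces the data frame to a character matrix. In that case, you may want to select your target variables beforehand.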
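One way to select the target variables beforehand is to keep only the numeric columns. This is a sketch under assumptions: the data frame and its column names are made up for illustration.

```r
# Hypothetical data frame with one character column.
df <- data.frame(
  x  = c(10, 12, NA),
  y  = c(3, NA, 7),
  id = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# With a character column present, apply() works on a character matrix,
# so the "maxima" come back as strings.
mixed <- apply(df, 2, max, na.rm = TRUE)
is.character(mixed)

# Keep only the numeric columns, then take the maximum of each one.
num_cols <- vapply(df, is.numeric, logical(1))
num_max  <- sapply(df[num_cols], max, na.rm = TRUE)
print(num_max)
```

Here `num_max` is a named numeric vector with 12 for x and 7 for y, and the id column is left out.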
