I have a data frame:
mat=data.frame(A=c(12,10,0,14,0,60),B=c(0,0,0,0,13,65))
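The question is: how do I filter out columns that contain too many zeros (e.g. more than 50%)? In this example, column B must be removed.
Ideally, I would set a threshold with nrow(mat) * 0.5, then drop every column whose count of zeros exceeds that threshold.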
Here is one way to do it:
> mat <- data.frame(A=c(12,10,0,14,0,60),B=c(0,0,0,0,13,65))
>
> keep <- (colSums(mat > 0) / nrow(mat)) > 0.5
> keep
    A     B 
 TRUE FALSE 
>
> mat[, keep, drop = FALSE]
   A
1 12
2 10
3  0
4 14
5  0
6 60
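The threshold idea from the question (nrow(mat) * 0.5) can also be written by counting the zeros directly rather than the positive values. This is only a minimal sketch: the names `threshold`, `zero_counts` and `filtered` are illustrative and not from the original post, and it assumes the columns are numeric and contain no NA values.

```r
mat <- data.frame(A = c(12, 10, 0, 14, 0, 60), B = c(0, 0, 0, 0, 13, 65))

# Maximum number of zeros a column may contain (here 50% of the rows)
threshold <- nrow(mat) * 0.5

# Count the zeros in each column
zero_counts <- colSums(mat == 0)

# Keep only the columns whose zero count does not exceed the threshold;
# drop = FALSE keeps the result a data frame even if one column remains
filtered <- mat[, zero_counts <= threshold, drop = FALSE]
```

The two versions differ in two edge cases. A column with exactly 50% zeros is kept here but dropped by the `> 0.5` test on the share of positive values. And `mat == 0` counts only true zeros, whereas `mat > 0` also treats negative values as if they were zeros.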