A dplyr
solution to get the count could be:
summarise_all(df, ~sum(is.na(.)))
Or to get a percentage:
summarise_all(df, ~(sum(is_missing(.) / nrow(df))))
Maybe also worth noting that missing data can be ugly, inconsistent, and not always coded as NA
depending on the source or how it's handled when imported. The following function could be tweaked depending on your data and what you want to consider missing:
is_missing <- function(x){
missing_strs <- c('', 'null', 'na', 'nan', 'inf', '-inf', '-9', 'unknown', 'missing')
ifelse((is.na(x) | is.nan(x) | is.infinite(x)), TRUE,
ifelse(trimws(tolower(x)) %in% missing_strs, TRUE, FALSE))
}
# sample ugly data
df <- data.frame(a = c(NA, '1', ' ', 'missing'),
b = c(0, 2, NaN, 4),
c = c('NA', 'b', '-9', 'null'),
d = 1:4,
e = c(1, Inf, -Inf, 0))
# counts:
> summarise_all(df, ~sum(is_missing(.)))
a b c d e
1 3 1 3 0 2
# percentage:
> summarise_all(df, ~(sum(is_missing(.) / nrow(df))))
a b c d e
1 0.75 0.25 0.75 0 0.5