Perhaps this helps:
df <- data.frame(group=c("a","b","c","a","b","c","a","b","c"),
var1=1:9, var2=c(1,2,3,NA,5,6,7,8,9))
with(df, length(cbind(var1, var2)))
> with(df, length(cbind(var1, var2)))
[1] 18
length() treats cbind(var1, var2) as a matrix, which is just a vector with dimensions, hence you get the length reported as prod(nrow(mat), ncol(mat)), where mat is the resulting matrix (here 9 rows x 2 columns = 18).
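As a quick check of that claim, here is a minimal sketch that rebuilds the same df so it runs on its own; the name mat is just the illustrative name used in the sentence above:

```r
# Same data as above, rebuilt so the snippet is self-contained
df <- data.frame(group = rep(c("a", "b", "c"), 3),
                 var1 = 1:9, var2 = c(1, 2, 3, NA, 5, 6, 7, 8, 9))

mat <- with(df, cbind(var1, var2))

dim(mat)                        # number of rows, then number of columns
length(mat)                     # counts every cell, not the rows
length(mat) == prod(dim(mat))   # length is rows times columns
```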
Ideally you'd use nrow() instead of length(), but perhaps more widely applicable is the NROW() function, which will treat a vector as a 1-column matrix for purposes of evaluating the function. nrow() won't work for a vector input:
> nrow(1:10)
NULL
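For comparison, a small sketch of how the two functions behave on a plain vector and on a matrix (x and m are throwaway names for illustration):

```r
x <- 1:10                    # plain vector, no dim attribute
m <- matrix(1:6, nrow = 3)   # 3 x 2 matrix

is.null(nrow(x))     # nrow() has no dimensions to report for a vector
NROW(x)              # the vector is treated as a 1-column matrix
NCOL(x)
NROW(m) == nrow(m)   # for a real matrix the two functions agree
```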
E.g. try these:
aggregate(cbind(var1,var2) ~ group, df, NROW)
aggregate(var1 ~ group, df, NROW)
> aggregate(cbind(var1,var2) ~ group, df, NROW)
  group var1 var2
1     a    2    2
2     b    3    3
3     c    3    3
> aggregate(var1 ~ group, df, NROW)
  group var1
1     a    3
2     b    3
3     c    3
As you have an NA in var2, you probably don't want the incomplete cases removed, which is what happens by default. That is why the first call above reports 2 rows for group a. To keep those rows, add na.action = na.pass to the call:
aggregate(cbind(var1,var2) ~ group, df, NROW, na.action = na.pass)
> aggregate(cbind(var1,var2) ~ group, df, NROW, na.action = na.pass)
  group var1 var2
1     a    3    3
2     b    3    3
3     c    3    3
The issue is that, in building up the data frame to pass to aggregate.data.frame, the usual model frame generation process takes place, and aggregate.formula has its na.action argument set to na.omit by default, which is standard behaviour in modelling functions that use formula interfaces.
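You can see the effect of na.action on the model frame directly. This is only a sketch of the idea, calling model.frame() by hand with the two na.action settings rather than reproducing exactly what aggregate.formula does internally; mf_omit and mf_pass are illustrative names:

```r
# Same data as above, rebuilt so the snippet is self-contained
df <- data.frame(group = rep(c("a", "b", "c"), 3),
                 var1 = 1:9, var2 = c(1, 2, 3, NA, 5, 6, 7, 8, 9))

f <- cbind(var1, var2) ~ group

# na.omit drops any row with an NA in any variable of the formula
mf_omit <- model.frame(f, data = df, na.action = na.omit)
# na.pass keeps every row, NA included
mf_pass <- model.frame(f, data = df, na.action = na.pass)

nrow(mf_omit)   # one row fewer: the row with the NA in var2 is gone
nrow(mf_pass)   # all rows retained
```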
If you want to count the number of non-NA values per variable then you need a completely different approach, perhaps using is.na(), as in:
foo <- function(x) sum(!is.na(x))
aggregate(cbind(var1,var2) ~ group, df, foo, na.action = na.pass)
> aggregate(cbind(var1,var2) ~ group, df, foo, na.action = na.pass)
  group var1 var2
1     a    3    2
2     b    3    3
3     c    3    3
This works by counting the non-NA values: is.na() returns TRUE for each NA, ! flips every TRUE to FALSE and vice versa so that TRUE now marks the non-NA values, and sum() then coerces each TRUE to 1 and each FALSE to 0 and adds them up for us.
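Step by step on a toy vector (x is a made-up example, not part of the data above), the pieces look like this:

```r
x <- c(1, NA, 7)

is.na(x)         # TRUE marks the missing value
!is.na(x)        # negated: TRUE now marks the non-missing values
sum(!is.na(x))   # TRUE counts as 1, FALSE as 0, so this counts non-NA values
```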