I have a data frame made by row-binding many data frames, each identified by a unique key. I want to calculate correlation coefficients between columns within each subset of the big data frame, where the subsets are defined by that key. For example, using the mtcars data, I might want the correlation between columns hp and wt for each unique value in column cyl. I could do it in a loop:
data("mtcars")
for (i in c(4, 6, 8)) {
  temp <- subset(mtcars, cyl == i)
  # Values are not auto-printed inside a for loop, so print explicitly
  print(cor(temp$hp, temp$wt))
}
I think aggregate would be better, but this code doesn't work:
data("mtcars")
aggregate(mtcars, by = mtcars$cyl, cor)
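For what it's worth, I suspect part of the problem is that aggregate expects by to be a list and applies the function to each column separately, so cor only ever receives one vector per group and never a second one to correlate it with. A minimal sketch of what I am after, using base R split and sapply instead of aggregate (the names by_cyl and cors are just placeholders I made up, and this may not be the best approach):

```r
data("mtcars")

# Split the data frame into one piece per value of cyl
by_cyl <- split(mtcars, mtcars$cyl)

# Within each piece, correlate hp with wt; the result is a named
# numeric vector with one entry per cyl value
cors <- sapply(by_cyl, function(d) cor(d$hp, d$wt))
print(cors)
```

This gives one coefficient per group, which is the shape of output I would like to get without the explicit loop.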