I'm working on a large dataset in R with three factors: FY (6 levels), Region (10 levels), and Service (24 levels). I need to sum my numeric column, SumOfUnits, within every combination of the three factors. The only way I can think of to do this is to split the data frame into 6 data frames by FY, split each of those into 10 by Region, split each of those into 24 by Service, take the sum of SumOfUnits in each piece, and finally recombine all of the pieces into one data frame. That data frame would then have 6*10*24 (1440) rows and 4 columns.
My current approach involves a lot of splitting, so I thought there might be a function I could write and reuse at each level of the split, but I haven't used "function" very much in R, so I'm not sure what to write (if there even is something). I also imagine there is a more efficient way to get the formatted data set, so I welcome all suggestions.
Here are a few lines from my data frame:
    FY Region                Service SumOfUnits
1 2006      1             Medication         13
2 2006      1             Medication          1
3 2006      1 Screening & Assessment         38
4 2006      1 Screening & Assessment         13
5 2006      1 Screening & Assessment         41
6 2006      1 Screening & Assessment         67
7 2006      1 Screening & Assessment        222
8 2006      1  Residential Treatment         38
9 2006      1  Residential Treatment       1558
This is the code I've been using for my splits:
# Creating a data frame by year
X <- split(MIC, MIC$FY)
Y <- lapply(seq_along(X), function(x) as.data.frame(X[[x]])[, ])
# Assign the data frames in the list Y to individual objects
A <- Y[[1]]
B <- Y[[2]]
C <- Y[[3]]
D <- Y[[4]]
E <- Y[[5]]
Q <- Y[[6]]
# Creating 10 data frames from 2006, split by region
X <- split(A, A$Region)
Y <- lapply(seq_along(X), function(x) as.data.frame(X[[x]])[, ])
Reg1 <- Y[[1]]
Reg2 <- Y[[2]]
Reg3 <- Y[[3]]
Reg4 <- Y[[4]]
Reg5 <- Y[[5]]
Reg6 <- Y[[6]]
Reg7 <- Y[[7]]
Reg8 <- Y[[8]]
Reg9 <- Y[[9]]
Reg10 <- Y[[10]]
# Creating 24 data frames for 2006, region 1, split by service
X <- split(Reg1, Reg1$Service)
Y <- lapply(seq_along(X), function(x) as.data.frame(X[[x]])[, ])
Serv1 <- Y[[1]]
Serv2 <- Y[[2]]
Serv3 <- Y[[3]]
Serv4 <- Y[[4]]
Serv5 <- Y[[5]]
# etc...
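The three nested splits above can probably be collapsed into a single split on all three factors at once, with one small function applied to every piece. The following is only a sketch: the `MIC` built here is a stand-in made from the sample rows shown earlier, and `sum_units` is a hypothetical helper name, not something from the original code.

```r
# Stand-in for the real MIC data frame, using the sample rows shown above
MIC <- data.frame(
  FY = 2006,
  Region = 1,
  Service = c(rep("Medication", 2),
              rep("Screening & Assessment", 5),
              rep("Residential Treatment", 2)),
  SumOfUnits = c(13, 1, 38, 13, 41, 67, 222, 38, 1558)
)

# Hypothetical helper: total the units in one piece of the split
sum_units <- function(df) sum(df$SumOfUnits)

# Split on every FY/Region/Service combination in one step;
# drop = TRUE discards combinations that have no rows
pieces <- split(MIC, list(MIC$FY, MIC$Region, MIC$Service), drop = TRUE)

# Apply the helper to each piece; the result is a named numeric vector
sums <- sapply(pieces, sum_units)
```

The names of `sums` are the factor levels joined with a dot (for example `2006.1.Medication`), so this form still needs a further step to get back to separate FY, Region, and Service columns.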
I'd like the resulting data frame to look something like this:
  FY Region    Service SumOfUnits
2006      1 Medication       4300
2006      2 Medication       3299
2006      3 Medication       2198
2007      1 Medication       5467
2007      2 Medication       3214
2007      3 Medication       9807
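A table of this shape can likely be produced without any manual splitting by using base R's `aggregate()` with a formula. This is a minimal sketch, assuming the column names shown above; the `MIC` defined here is only a stand-in built from the sample rows, and `totals` is an illustrative name.

```r
# Stand-in for the real MIC data frame, using the sample rows shown above
MIC <- data.frame(
  FY = 2006,
  Region = 1,
  Service = c(rep("Medication", 2),
              rep("Screening & Assessment", 5),
              rep("Residential Treatment", 2)),
  SumOfUnits = c(13, 1, 38, 13, 41, 67, 222, 38, 1558)
)

# Sum SumOfUnits within each FY/Region/Service combination;
# the result is a data frame with one row per combination present in the data
totals <- aggregate(SumOfUnits ~ FY + Region + Service, data = MIC, FUN = sum)
```

Note that `aggregate()` only returns combinations that actually occur in the data, so the result may have fewer than 1440 rows if some FY/Region/Service combinations have no records.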