All,
The company where I work gave me this data to work with. In short, it's time-series cross-sectional (TSCS) data with the firm as the cross-sectional unit and fiscal years as the time units. Each firm has various accounts. I'm interested in creating a total of the money spent across all accounts for a given firm in each fiscal year.
I can provide a simple illustration of the data below. Let firm be the cross-sectional unit of interest. Each firm has various accounts on which the company spends money. Some accounts are common to all firms, others are unique. Not every firm had money spent on an account in a given year. In fact, some were not eligible for accounts until later on in the data, and others drop out (so the panel can be considered unbalanced). The NAs in the data I was provided could therefore be treated as 0s, though that is a little problematic because an NA can mean two different things: some firms are eligible in a given year but don't receive money in an account, while other firms are ineligible because of drop-out or late entry.
The data look like this, and they were given to me in wide format. This is a simplified version for illustration. In this illustration, firm=B wasn't eligible for an account in FY1990 and firm=C drops out in FY1992.
firm account FY1990 FY1991 FY1992
A Account 1 500 900 1000
A Account 2 30 40 40
A Account 3 NA 60 20
A Account 4 NA 35 NA
B Account 1 NA 340 60
B Account 2 NA 500 800
B Account 3 NA 800 NA
B Account 4 NA 60 1000
C Account 1 1000 400 NA
C Account 5 500 60 NA
C Account 8 60 1000 NA
D Account 1 400 400 400
D Account 2 NA 1000 1000
D Account 3 300 40 300
D Account 6 NA 300 300
D Account 7 900 900 1000
D Account 8 1000 1200 1500
What I'd like to do (and was told to do) is amend this data so that it looks like this:
firm account FY1990 FY1991 FY1992
A Account 1 500 900 1000
A Account 2 30 40 40
A Account 3 NA 60 20
A Account 4 NA 35 NA
A TOTAL 530 1035 1060
B Account 1 NA 340 60
B Account 2 NA 500 800
B Account 3 NA 800 NA
B Account 4 NA 60 1000
B TOTAL NA 1700 1860
C Account 1 1000 400 NA
C Account 5 500 60 NA
C Account 8 60 1000 NA
C TOTAL 1560 1460 NA
D Account 1 400 400 400
D Account 2 NA 1000 1000
D Account 3 300 40 300
D Account 6 NA 300 300
D Account 7 900 900 1000
D Account 8 1000 1200 1500
D TOTAL 2600 3840 4500
I could just as easily do this in Excel or some other spreadsheet program, but that would be tedious and it invites more human error than if I were to use R to program this. I'm not against creating a new data frame with the totals rather than trying to add a row underneath all the accounts for a given firm. It might be easier to just put a 0 for the total for a given firm ineligible for an account in a given fiscal year. I can always recode some zeroes as NAs next and automate that process as well.
My assumption is this would require a loop, but I'm a novice in R programming. Any input would be greatly appreciated.
Reproducible code for this illustration is below.
firm <- c("A","A","A","A","B","B","B","B","C","C","C","D","D","D","D","D","D")
account <- c("Account 1","Account 2","Account 3","Account 4","Account 1","Account 2","Account 3","Account 4","Account 1","Account 5","Account 8","Account 1","Account 2","Account 3","Account 6","Account 7","Account 8")
FY1990 <- c(500,30,NA,NA,NA,NA,NA,NA,1000,500,60,400,NA,300,NA,900,1000)
FY1991 <- c(900,40,60,35,340,500,800,60,400,60,1000,400,1000,40,300,900,1200)
FY1992 <- c(1000,40,20,NA,60,800,NA,1000,NA,NA,NA,400,1000,300,300,1000,1500)
Data <- data.frame(firm=firm, account=account, FY1990=FY1990, FY1991=FY1991, FY1992=FY1992)
summary(Data)
Data
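For what it's worth, here is a rough sketch of how the totals might be built without an explicit loop, using base R's aggregate() and rbind(). It rebuilds the same illustration data so that it runs on its own. The helper sum_or_na and the objects fy_cols, totals and combined are names made up for this sketch. It assumes that a firm-year in which every account is NA means the firm was ineligible, so the total stays NA rather than becoming 0. That assumption may not hold in the real data, where an eligible firm could also have all NAs.

```r
# Rebuild the illustration data so this block runs on its own.
Data <- data.frame(
  firm = rep(c("A", "B", "C", "D"), times = c(4, 4, 3, 6)),
  account = paste("Account", c(1:4, 1:4, 1, 5, 8, 1, 2, 3, 6, 7, 8)),
  FY1990 = c(500, 30, NA, NA, NA, NA, NA, NA, 1000, 500, 60, 400, NA, 300, NA, 900, 1000),
  FY1991 = c(900, 40, 60, 35, 340, 500, 800, 60, 400, 60, 1000, 400, 1000, 40, 300, 900, 1200),
  FY1992 = c(1000, 40, 20, NA, 60, 800, NA, 1000, NA, NA, NA, 400, 1000, 300, 300, 1000, 1500),
  stringsAsFactors = FALSE
)

# Pick up every fiscal-year column by its "FY" prefix.
fy_cols <- grep("^FY", names(Data), value = TRUE)

# Hypothetical helper: sum while ignoring NAs, but return NA when every
# value is missing (assumption: all-NA means the firm was ineligible).
sum_or_na <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)

# One row of totals per firm, computed column by column.
totals <- aggregate(Data[fy_cols], by = list(firm = Data$firm), FUN = sum_or_na)
totals$account <- "TOTAL"
totals <- totals[names(Data)]  # same column order as Data

# Stack the totals under the accounts, then sort so that each TOTAL row
# comes right after its firm's accounts.
combined <- rbind(Data, totals)
combined <- combined[order(combined$firm, combined$account == "TOTAL"), ]
rownames(combined) <- NULL

combined
```

If a separate data frame is good enough, totals on its own already holds one row per firm. To get 0 instead of NA for ineligible firm-years, sum_or_na could be swapped for function(x) sum(x, na.rm = TRUE).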