r - How do I refer to rows/columns in data frames that are within a list?

Question

I'm looking to work with the data frames I have within a list. I'm getting 'incorrect number of subscripts' and similar errors, despite my current best efforts. Here's my code:

folder = 'C:/Path to csv files-071813/'
symbs = c('SPX', 'XLF', 'XLY', 'XLV', 'XLI', 'IYZ', 'XLP', 'XLE', 'XLK', 'XLB', 'XLU', 'SHV')
importData = vector('list', length(symbs))
names(importData) = symbs

for (sIdx in 1:length(symbs)){
    #Import the data for each symbol into the list.
    importData[sIdx] = read.csv(paste(folder, symbs[sIdx], '.csv', sep = ''), header = TRUE)

}

Each csv file has thousands of rows and 7 columns. I'm assuming the code above reads each csv file into a data frame and stores it in my list. I'd like to enter:

importData[[1]][, 1]

to work with the first column of the first data frame in my list. Am I close? I can't find a resolution despite all my searching. Many thanks in advance...

score 4 · Accepted Answer

The apply family of functions is going to be your friend here, specifically lapply, which, given a list (or vector) and a function, applies the function to every element and returns the results as elements of a new list.

folder = 'C:/Path to csv files-071813/'
symbs = c('SPX', 'XLF', 'XLY', 'XLV', 'XLI', 'IYZ', 'XLP', 'XLE', 'XLK', 'XLB', 'XLU', 'SHV')
filenames = paste0(folder,symbs,'.csv')
listOfDataframes=lapply(filenames,read.csv,header=TRUE)

Now if you want the first column from all the dataframes you could do something like

listOfFirstCols=lapply(listOfDataframes,"[",,1)

Or more explicitly

listOfFirstCols=lapply(listOfDataframes,function(x)x[,1])
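A self-contained sketch of the same pattern, assuming two small made-up csv files written to a temporary directory (the column names and values are hypothetical, not the asker's data). It also names the list elements by symbol, so a data frame can be pulled out by name rather than by position:

```r
# Hypothetical stand-in data: two tiny csv files in a temporary directory.
folder <- tempfile("csvdemo")
dir.create(folder)
symbs <- c("SPX", "XLF")
for (s in symbs) {
    demo <- data.frame(Date = c("2013-07-17", "2013-07-18"), Close = c(1, 2))
    write.csv(demo, file.path(folder, paste0(s, ".csv")), row.names = FALSE)
}

# Read every file into a list of data frames, then name the list by symbol.
filenames <- file.path(folder, paste0(symbs, ".csv"))
listOfDataframes <- lapply(filenames, read.csv, header = TRUE)
names(listOfDataframes) <- symbs

# First column of every data frame, then of one data frame picked by name.
listOfFirstCols <- lapply(listOfDataframes, function(x) x[, 1])
spxFirstCol <- listOfDataframes[["SPX"]][, 1]
```

Because lapply keeps the names of its input, listOfFirstCols is also named by symbol.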

score 1 · Accepted Answer

Yes, you are close. You need

importData[[sIdx]] <- read.csv(....)

(i.e. [[) as you want to assign a data frame inside the sIdxth component. Single brackets [ would require a list to be assigned.

importData[[1]] returns the object inside importData[1]. This is a subtle difference, with the latter returning a list containing the first component, whereas the former returns the object inside that list.
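A small sketch of that difference, using a hypothetical two-element list of made-up data frames in place of the imported csv files:

```r
# Hypothetical list of small data frames standing in for the csv imports.
importData <- list(SPX = data.frame(Date = 1:3, Close = c(10, 11, 12)),
                   XLF = data.frame(Date = 1:3, Close = c(20, 21, 22)))

class(importData[1])      # "list": a one-element list holding the data frame
class(importData[[1]])    # "data.frame": the data frame itself

# Only the [[ form can be indexed like a data frame;
# importData[1][, 1] would fail because a plain list has no dimensions.
firstCol <- importData[[1]][, 1]

# Assigning with [[ stores a whole data frame as a single list component.
importData[["SHV"]] <- data.frame(Date = 1:3, Close = c(30, 31, 32))
```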

As importData[[sIdx]] is a data frame, you can index it as you would any other data frame. It might help to think of importData[[sIdx]] as a data frame df, add on to that what you would normally use to index the first column, i.e. df[, 1] (or alternatively df[[1]]), and then substitute the real object back in for df:

                df[, 1]
importData[[sIdx]][, 1] ## substitute back in the real object for `df`

If you want to extract each first column in turn, then

lapply(importData, `[`, , 1) ## matches df[, 1]

or

lapply(importData, `[[`, 1)  ## matches df[[1]]

will return them as a list, with versions using sapply() instead of lapply() simplifying the result to an array where possible.

Note that in the first example

lapply(importData, `[`, , 1)

the empty argument (the gap between the two commas in `, , 1`) is important: it stands for the empty row index in df[ , 1], i.e. the bit before the comma. The second option, using [[ in the lapply() call, may therefore be less error-prone, which is why I mentioned it.
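To check that the two forms agree, and to see what sapply() does with the result, here is a sketch on a hypothetical list of made-up data frames (the second column is used so the values differ between symbols):

```r
# Hypothetical list of small data frames standing in for the csv imports.
importData <- list(SPX = data.frame(Date = 1:3, Close = c(10, 11, 12)),
                   XLF = data.frame(Date = 1:3, Close = c(20, 21, 22)))

viaBracket <- lapply(importData, `[`, , 2)   # each df[, 2]
viaDouble  <- lapply(importData, `[[`, 2)    # each df[[2]]
sameResult <- identical(viaBracket, viaDouble)

# With equal-length columns, sapply() simplifies the list to a matrix
# with one column per list element.
closeMatrix <- sapply(importData, `[[`, 2)
```

If the data frames had different numbers of rows, sapply() could not simplify and would return a list, just like lapply().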

score 0 · Accepted Answer

myfunc <- function(a, b) {
    # a: numeric vector of symbol indices to include
    # b: vector of column indices to include
    if (length(a) == 0) stop('Must select at least one symbol')
    # Read the selected columns of one symbol's csv file.
    readOne <- function(i) read.csv(paste(folder, symbs[i], '.csv', sep = ''), header = TRUE)[b]
    importalldata <- readOne(a[1])
    if (length(a) > 1) {
        for (i in 2:length(a)) {
            # Append the rows of each further symbol.
            importalldata <- rbind(importalldata, readOne(a[i]))
        }
    }
    return(importalldata)
}

to load your data for one symbol, do:

importalldata<-myfunc(1,1)

for multiple symbols:

importalldata<-myfunc(c(1,3,4),1)

for multiple columns:

importalldata<-myfunc(c(1,3,4),1:3)

I think that is what you want? Or are you trying to get all column 1's for each file into 1 dataframe? If you include reproducible data, you will get a better answer.

That said, thousands of rows isn't much, and you would likely be better off combining ('stacking') your data into 1 csv with your symbols as a factor, and then using subset or the data.table package to select the data you want. Check out

?stack
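One possible way to do the stacking, sketched on a hypothetical list of made-up data frames. This uses base R's rbind() rather than stack(), and the Symbol column name is an assumption chosen for illustration:

```r
# Hypothetical list of small data frames, one per symbol.
importData <- list(SPX = data.frame(Date = 1:2, Close = c(10, 11)),
                   XLF = data.frame(Date = 1:2, Close = c(20, 21)))

# Tag each data frame with its symbol, then stack all rows into one data frame.
tagged <- Map(function(df, s) data.frame(Symbol = s, df),
              importData, names(importData))
stacked <- do.call(rbind, tagged)
stacked$Symbol <- factor(stacked$Symbol)

# Select the rows for one symbol.
xlf <- subset(stacked, Symbol == "XLF")
```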
