I have a set of dataframes that look like this (they have the same columns, but not the same number of rows):
df1 <- data.frame(v = c("banana", "apple", "orange", "grape", "kiwi fruit", "pear"), x = rnorm(6, 0.06, 0.01))
df2 <- data.frame(v = c("table", "chair", "couch", "dresser", "night stand"), x = rnorm(5, 0.06, 0.01))
df3 <- data.frame(v = c("white", "blue", "pink", "bright red", "orange", "dark green", "black"), x = rnorm(7, 0.06, 0.01))
I have a range of operations (counting things about the words in df1$v, df2$v, df3$v) that I would like to perform on these dataframes. One solution I found is to put the dataframes in a list, and then use lapply to apply a function to every dataframe in the list:
ls <- list(df1, df2, df3)
func1 <- function(dat){
  dat$complex <- sapply(strsplit(as.character(dat$v), " "), length)
}
ls_func1 <- lapply(ls, FUN = func1)
ls_func1
[[1]]
[1] 1 1 1 1 2 1
[[2]]
[1] 1 1 1 1 2
[[3]]
[1] 1 1 1 2 1 2 1
At least this gets me the counts of the number of words in v, which I can then combine again into a dataframe or whatever.
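To illustrate the "combine again" step mentioned above, here is a minimal sketch that attaches each count vector back onto its dataframe with Map. The names `dfs`, `count_words`, and `with_counts` are made up for this example, and the inputs are shortened versions of the dataframes above with fixed values instead of rnorm draws:

```r
# Small stand-in inputs so the sketch is self-contained (values are illustrative).
df1 <- data.frame(v = c("banana", "kiwi fruit", "pear"), x = c(0.05, 0.06, 0.07))
df2 <- data.frame(v = c("table", "night stand"), x = c(0.06, 0.05))
dfs <- list(df1, df2)  # hypothetical name; avoids masking base::ls

# Same word count as func1, returned as a plain integer vector.
count_words <- function(dat) {
  lengths(strsplit(as.character(dat$v), " "))
}

counts <- lapply(dfs, count_words)

# Map() pairs each dataframe with its count vector and adds the column.
with_counts <- Map(function(dat, n) {
  dat$complex <- n
  dat
}, dfs, counts)

with_counts[[1]]
```

This keeps the counting and the column assignment as two separate steps, which is only one of several possible arrangements.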
The problem is that this approach does not seem to work for every function. The following loop, for instance, works fine when run on a single dataframe:
for(i in 1:length(df1$v)){
  string <- strsplit(as.character(df1$v[i]), "")
  counter <- 0
  for(j in 1:length(string[[1]])){
    if(grepl("a|b|c|d|e", string[[1]][j])){
      counter <- counter + 1
    }
  }
  df1$length[i] <- counter
}
df1
           v          x length
1     banana 0.05233752      4
2      apple 0.08564292      2
3     orange 0.04679124      2
4      grape 0.06655950      2
5 kiwi fruit 0.05684803      0
6       pear 0.07654617      2
But when I turn it into a function, it does not work:
func2 <- function(dat){
  for(i in 1:length(dat$v)){
    string <- strsplit(as.character(dat$v[i]), "")
    counter <- 0
    for(j in 1:length(string[[1]])){
      if(grepl("a|b|c|d|e", string[[1]][j])){
        counter <- counter + 1
      }
    }
    dat$length[i] <- counter
  }
}
ls_func2 <- lapply(ls, FUN = func2)
ls_func2
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
What am I doing wrong here? And is there any way to create new columns in my existing dataframes using these functions and lapply? In other words, I would like to produce the following by applying the first function and then the second:
ls
[[1]]
           v          x complex length
1     banana 0.05233752       1      4
2      apple 0.08564292       1      2
3     orange 0.04679124       1      2
4      grape 0.06655950       1      2
5 kiwi fruit 0.05684803       2      0
6       pear 0.07654617       1      2
[[2]]
      v          x complex length
1 table 0.65790811       1      2
....
[[3]]
....
etc.?
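
For comparison, here is a minimal sketch of one way to get output of that shape. It rests on one fact about R: a function returns the value of its last expression, and a for loop evaluates to NULL, so a function that ends with a loop returns NULL unless the dataframe is returned explicitly. The names `dfs`, `add_complex`, `add_length`, and `result` are hypothetical, the inputs use fixed values instead of rnorm draws, and the inner loop is swapped for a vectorised count, which is a substitution rather than the original method:

```r
# Small stand-in inputs so the sketch is self-contained (values are illustrative).
df1 <- data.frame(v = c("banana", "apple", "kiwi fruit"), x = c(0.05, 0.08, 0.06))
df2 <- data.frame(v = c("table", "night stand"), x = c(0.06, 0.05))
dfs <- list(df1, df2)  # hypothetical name; avoids masking base::ls

add_complex <- function(dat) {
  # Number of space-separated words in each entry of v.
  dat$complex <- lengths(strsplit(as.character(dat$v), " "))
  dat  # return the whole dataframe, not just the new column
}

add_length <- function(dat) {
  chars <- strsplit(as.character(dat$v), "")
  # Count the characters a-e in each entry of v.
  dat$length <- vapply(chars, function(ch) sum(grepl("[a-e]", ch)), integer(1))
  dat  # explicit return; ending on a loop or an assignment would not return dat
}

# Apply the first function to every dataframe, then the second.
result <- lapply(lapply(dfs, add_complex), add_length)
result[[1]]
```

Because lapply works on copies, the original dataframes are left unchanged; the new columns exist only in `result` unless the list is reassigned.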