
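I am trying to strip trailing spaces from the factor variables in a data frame. However, the levels assignment doesn't work inside my lapply function.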

rm.space<-function(x){
    a<-gsub(" ","",x)
    return(a)}


lapply(names(barn),function(x){
    levels(barn[,x])<-rm.space(levels(barn[,x]))
    })

Any ideas how to assign levels inside an lapply function?

//M


3 Answers

6

R is vectorized, so you don't need apply():

> f <- as.factor(sample(c("  a", " b", "c", "  d"), 10, replace=TRUE))
> levels(f)
[1] "  a" " b"  "c"   "  d"
> levels(f) <- gsub(" +", "", levels(f), perl=TRUE)
> levels(f)
[1] "a" "b" "c" "d"
> f
 [1] d a c b c d d a a a
Levels: a b c d
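The patterns above remove every space in a level, not only the trailing ones the question mentions. A minimal sketch of the difference, using a made-up factor `f` whose levels contain an internal space (the data is purely illustrative):

```r
# Hypothetical factor with an internal space and trailing spaces in its levels
f <- factor(c("New York  ", "Oslo ", "New York  "))

# gsub(" ", "", ...) deletes every space, including the one inside "New York"
all_removed <- gsub(" ", "", levels(f))

# sub(" +$", "", ...) deletes only the run of spaces at the end of each level
levels(f) <- sub(" +$", "", levels(f))

print(all_removed)
print(levels(f))
```

If leading whitespace should go as well, base R's trimws() (available since R 3.2.0) is probably the simpler choice.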
answered 2010-09-09T00:32:07.820
1

From your code I read that the lapply is used to loop over different variables, not over the levels of the factor. So then you do need some kind of looping structure, but lapply is a bad choice:

  • you loop over a character vector, names(barn), so sapply would be the more natural choice
  • the apply family returns the result of every iteration, which you don't want here, so you are using memory for no purpose.
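Both effects can be seen in a small sketch. The data frame `barn` below is a made-up stand-in for the asker's data, and `res` is just an illustrative name: the assignment inside the function changes a local copy only, while lapply collects the value of each assignment.

```r
# Hypothetical stand-in for the asker's data frame
barn <- data.frame(a = factor(c("x ", "y ")), b = factor(c("p ", "q ")))

res <- lapply(names(barn), function(x) {
    # `<-` creates a modified copy of barn that is local to this function call
    levels(barn[, x]) <- gsub(" ", "", levels(barn[, x]))
})

print(res)             # the cleaned level vectors, returned but not stored in barn
print(levels(barn$a))  # the global barn still has the trailing spaces
```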

Anyway, if you need to assign to a variable in your global environment from inside an lapply call, you need the <<- operator. Say you have selected a number of variables from which the spaces have to be removed:

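f <- paste("", letters[1:5])  # " a", " b", ...: values with a leading space

Df <- data.frame(
    X1 = sample(f, 10, replace = TRUE),
    X2 = sample(f, 10, replace = TRUE),
    X3 = sample(f, 10, replace = TRUE),
    # needed in R >= 4.0.0, where strings are no longer converted to factors by default
    stringsAsFactors = TRUE
    )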

# Bad example :   
lapply(c("X1","X3"),function(x){
    levels(Df[,x])<<-gsub(" +","",levels(Df[,x]))
    })

gives

> str(Df)
'data.frame':   10 obs. of  3 variables:
 $ X1: Factor w/ 3 levels "a","b","c": 2 3 1 1 1 2 3 2 2 2
 $ X2: Factor w/ 5 levels " a"," b"," c",..: 4 5 4 2 5 5 1 2 5 3
 $ X3: Factor w/ 5 levels "a","b","c","d",..: 2 3 4 1 4 1 3 3 5 4

It is better to use a for loop:

for( i in c("X1","X3")){
    levels(Df[,i])<-gsub(" +","",levels(Df[,i]))
}

This does what you need without the hassle of the <<- operator and without holding memory unnecessarily.

answered 2010-09-09T08:23:11.750
0

As Joris said, lapply works on a local copy of the data.frame, so it doesn't modify your original data. But you can use its result to replace your data:

barn[] <- lapply(barn, function(x) {
    levels(x) <- rm.space(levels(x))
    x
    })

This is useful when you have columns of different types and want to modify only the factors, e.g.:

factors <- sapply(barn, is.factor)
barn[factors] <- lapply(barn[factors], function(x) {
                    levels(x) <- rm.space(levels(x))
                    x
                 })
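The same pattern can be wrapped in a small helper. This is only a sketch: `trim_factor_levels` is a hypothetical name, trimws() is swapped in for rm.space so that only leading and trailing whitespace is removed, and the example data frame is made up.

```r
# Hypothetical helper: trims whitespace from the levels of every factor column
trim_factor_levels <- function(df) {
    is_fac <- vapply(df, is.factor, logical(1))   # TRUE for factor columns
    df[is_fac] <- lapply(df[is_fac], function(x) {
        levels(x) <- trimws(levels(x))            # strip leading/trailing whitespace
        x                                         # return the modified factor
    })
    df
}

# Made-up data frame with one integer column and one factor column
barn <- data.frame(id = 1:3, grp = factor(c("a ", "b ", "a ")))
barn <- trim_factor_levels(barn)

print(levels(barn$grp))
print(barn$id)          # the non-factor column is left untouched
```

Note that if trimming makes two levels identical (say "a" and "a "), assigning them through levels<- merges them into a single level, which is usually the desired outcome here.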
answered 2010-09-09T10:14:08.340