I need to replace empty cells with zero (0) in R. I have a data frame like this:

Input (df)

structure(list(CHANNEL = structure(c(1L, 1L, 1L), .Label = "Native BlackBerry App", class = "factor"), 
    DATE = structure(c(1L, 1L, 1L), .Label = "01/01/2011", class = "factor"), 
    HOUR = structure(c(3L, 1L, 2L), .Label = c("1:00am-2:00am", 
    "2:00am-3:00am", "Midnight-1:00am"), class = "factor"), UNIQUE_USERS = structure(c(1L, 
    1L, 1L), .Label = "", class = "factor"), LOGON_VOLUME = structure(c(1L, 
    1L, 1L), .Label = "", class = "factor")), .Names = c("CHANNEL", 
"DATE", "HOUR", "UNIQUE_USERS", "LOGON_VOLUME"), row.names = c(NA, 
-3L), class = "data.frame")

I have this function:

sapply(df, function (x) 
     as.numeric(gsub("(^ +)|( +$)", "0", x))) 

I get these warnings, and it does not work as intended.

[ reached getOption("max.print") -- omitted 422793 rows ]
Warning messages:
1: In FUN(X[[4L]], ...) : NAs introduced by coercion
2: In FUN(X[[4L]], ...) : NAs introduced by coercion
3: In FUN(X[[4L]], ...) : NAs introduced by coercion
4: In FUN(X[[4L]], ...) : NAs introduced by coercion

Update: when I apply this function to df:

sapply(df, function (x) gsub("(^ +)|( +$)", "0", x) )

I get:

  CHANNEL                 DATE         HOUR              UNIQUE_USERS LOGON_VOLUME
[1,] "Native BlackBerry App" "01/01/2011" "Midnight-1:00am" ""           ""          
[2,] "Native BlackBerry App" "01/01/2011" "1:00am-2:00am"   ""           ""          
[3,] "Native BlackBerry App" "01/01/2011" "2:00am-3:00am"   ""           ""  

1 Answer

You define an anonymous function inside sapply but then never use that function's argument.

sapply(df, function (x) gsub("(^ +)|( +$)", "0", x) ) #===> change df to x

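You also coerce everything to numeric, which produces NA for any value that is a string containing non-numeric characters. Each column of a data.frame is an atomic vector, so it can hold only one type of data. Here, the only type common to all the elements is character.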
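
To make the coercion point concrete, here is a minimal sketch. The values are hypothetical and chosen only for illustration; they are not taken from the asker's data.

```r
# Hypothetical values, chosen only to illustrate coercion
x <- c("12", "abc", "")

# "abc" cannot be parsed as a number and "" is blank, so both become NA
nums <- suppressWarnings(as.numeric(x))
is.na(nums)

# An atomic vector holds a single type: mixing a number with a string
# promotes every element to character
mixed <- c(1, "a")
class(mixed)
```

This is why wrapping the whole result in as.numeric turns the text columns (CHANNEL, DATE, HOUR) into NA.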

Perhaps you meant to do this...

sapply( df , gsub , pattern = "^\\s*$" , replacement = 0 )

     CHANNEL                 DATE         HOUR              UNIQUE_USERS LOGON_VOLUME
[1,] "Native BlackBerry App" "01/01/2011" "Midnight-1:00am" "0"          "0"         
[2,] "Native BlackBerry App" "01/01/2011" "1:00am-2:00am"   "0"          "0"         
[3,] "Native BlackBerry App" "01/01/2011" "2:00am-3:00am"   "0"          "0"  
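
Note that the result above is a character matrix rather than a data frame. If the data frame shape matters, one possible approach is to assign the result of lapply back into the existing object. The sketch below uses a small hypothetical data frame d (not the asker's df) to keep it self-contained.

```r
# Hypothetical stand-in data, not the asker's data frame
d <- data.frame(a = c("x", "y"), b = c("", "  "), stringsAsFactors = FALSE)

# sapply simplifies the per-column results into a character matrix
m <- sapply(d, gsub, pattern = "^\\s*$", replacement = "0")
is.matrix(m)

# Assigning into d[] keeps the data.frame structure;
# the columns are still character at this point
d[] <- lapply(d, gsub, pattern = "^\\s*$", replacement = "0")
class(d)
```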

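After using gsub you still have to convert the result to integer, and you will get NA in any column that contains anything other than character representations of numbers. If you need to change whole columns, you can check whether an entire column is blank and, if so, replace it with zeros. A single column cannot hold both character and numeric elements.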

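# Count the blank (empty or whitespace-only) cells in each column
len <- colSums( sapply( df , grepl , pattern = "^\\s*$" ) )
# Overwrite only the columns in which every cell is blank
df[ , len == nrow(df) ] <- rep( 0 , nrow(df) )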
#                CHANNEL       DATE            HOUR UNIQUE_USERS LOGON_VOLUME
#1 Native BlackBerry App 01/01/2011 Midnight-1:00am            0            0
#2 Native BlackBerry App 01/01/2011   1:00am-2:00am            0            0
#3 Native BlackBerry App 01/01/2011   2:00am-3:00am            0            0
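
If only some cells in a column are blank, a cell-by-cell variant may be more useful. The following is a rough sketch, not part of the approach above: the helper blank_to_zero and the data frame d are hypothetical names introduced for illustration. It replaces blank cells with "0" and converts a column to numeric only when every value in it then parses as a number.

```r
# Hypothetical stand-in data; column names are illustrative only
d <- data.frame(label = c("a", "b", "c"),
                users = c("5", "", " "),
                stringsAsFactors = FALSE)

# Hypothetical helper: blank cells become "0"; the column becomes numeric
# only when every resulting value parses as a number
blank_to_zero <- function(x) {
  x <- as.character(x)                    # also handles factor columns
  x[grepl("^\\s*$", x)] <- "0"            # empty or whitespace-only cells
  num <- suppressWarnings(as.numeric(x))
  if (anyNA(num)) x else num              # otherwise leave as character
}

d[] <- lapply(d, blank_to_zero)
str(d)
```

Columns that already contain NA stay character under this sketch, since the helper treats any unparseable value as a reason not to convert.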
answered 2013-09-11T20:50:04.193