3

我有一个我正在discretized使用的数据框RWeka。RWeka 的离散化创建了带有单引号的 bin。尽管它们没有引起任何问题,但在绘制带有'All'类别的变量时看起来很难看。

这是离散化的数据框:

structure(list(outlook = structure(c(1L, 1L, 2L, 3L, 3L, 3L, 
2L, 1L, 1L, 3L, 1L, 2L, 2L, 3L), .Label = c("sunny", "overcast", 
"rainy"), class = "factor"), temperature = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"), 
humidity = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"), 
windy = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, 
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE), play = structure(c(2L, 
2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("yes", 
"no"), class = "factor")), .Names = c("outlook", "temperature", 
"humidity", "windy", "play"), row.names = c(NA, -14L), class = "data.frame")

如何从数据中删除单引号并重新创建因子?

4

1 回答 1

3

这应该这样做:

df$temperature <- gsub("\\'", "", df$temperature)
df$humidity <- gsub("\\'", "", df$humidity)
> df
    outlook temperature humidity windy play
1     sunny         All      All FALSE   no
2     sunny         All      All  TRUE   no
3  overcast         All      All FALSE  yes
4     rainy         All      All FALSE  yes
5     rainy         All      All FALSE  yes
6     rainy         All      All  TRUE   no
7  overcast         All      All  TRUE  yes
8     sunny         All      All FALSE   no
9     sunny         All      All FALSE  yes
10    rainy         All      All FALSE  yes
11    sunny         All      All  TRUE  yes
12 overcast         All      All  TRUE  yes
13 overcast         All      All FALSE  yes
14    rainy         All      All  TRUE   no

如果您需要在多个列上执行相同的操作,这可能会更有效。

df[, 2:3] <- apply(df[, 2:3], 2, function(x) { 
    gsub("\\'", "", x)
    })
于 2012-10-16T19:10:50.863 回答