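I use cut() a lot. Because cut() does not understand that clock time wraps around at zero, I first split the times into three groups (the night on either side of the day) and then merge the two "night" factor levels. This can be done by assigning the same "night" value twice in levels(). For example: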

x <- c(4, 10, 23) # i.e. 4 am, 10 am, 11 pm
x <- cut(x
         , breaks = c(0, 6, 22, 23)
         , include.lowest = FALSE
         , labels = c("night2", "day", "night1"))
# [1] night2 day    night1
# Levels: night2 day night1

levels(x) <- c("night", "day", "night")
x
# [1] night day   night
# Levels: night day
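
To make the merge step explicit, here is a rough sketch of what assigning duplicate labels amounts to: each old level index is mapped to a merged level index. This is only an illustration in base R, not how R implements `levels<-` internally; `f`, `code_map` and `merged` are names made up for the example.

```r
# Start from a factor with the three intermediate levels
f <- factor(c("night2", "day", "night1"),
            levels = c("night2", "day", "night1"))

old_codes <- as.integer(f)   # underlying integer codes: 1 2 3

# Hypothetical lookup table: old level index -> merged level index
# (night2 -> 1, day -> 2, night1 -> 1)
code_map <- c(1L, 2L, 1L)

# Rebuild a factor from the remapped codes with only two levels
merged <- factor(code_map[old_codes], levels = 1:2,
                 labels = c("night", "day"))
merged
# [1] night day   night
# Levels: night day
```

The result has two levels and the codes 1 2 1, which is the state I would like the ff version to end up in as well.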

Now I am trying to do the same thing with a huge data set held in an ff object:

require(ff)
require(ffbase)

y <- ff(c(4, 10, 23))
y <- ff(cut(y
            , breaks = c(0, 6, 22, 23)
            , include.lowest = FALSE
            , labels = c("night2", "day", "night1")))
y
# ff (open) integer length=3 (3) levels: night2 day night1
#    [1]    [2]    [3] 
# night2 day    night1 

levels(y) <- c("night", "day", "night")
y
# ff (open) integer length=3 (3) levels: night day night
#  [1]   [2]   [3] 
# night day   night

Note that in this case levels() keeps three factor levels, two of which share the same label. recodeLevels looked promising, but it does not quite do the same thing:

y <- recodeLevels(y, c("night", "day", "night"))
y
# ff (open) integer length=3 (3) levels: night day night
# [1] [2] [3] 
# NA  day NA  

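I have also tried repeating the "night" label inside cut() (actually cut.ff()), but it still returns three levels, plus a warning that duplicated levels in factors are deprecated.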

Thanks for any suggestions.

2 Answers

This may be too simple, but why not just do:

x <- c(4, 10, 23)
y = c("day", "night")[(x <= 6 | x > 22) + 1]
y
# [1] "night" "day"   "night"
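
If a factor is needed rather than a character vector, a minimal sketch along the same lines might look like this. It assumes a plain numeric vector in base R, not an ff object, and `is_night` is just an illustrative name.

```r
x <- c(4, 10, 23)              # hours of the day, as in the question

# TRUE for hours outside the 6-22 daytime window
is_night <- x <= 6 | x > 22

# Build the factor directly with the two final levels, so no merge step is needed
y <- factor(ifelse(is_night, "night", "day"), levels = c("night", "day"))
y
# [1] night day   night
# Levels: night day
```

Fixing the levels up front sidesteps the duplicate-label problem, because only two levels ever exist.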
answered 2014-02-03T22:44:02.133

This may be what you are looking for. Use recodeLevels from package ff:

require(ff)
y <- c(4, 10, 23)
y <- ff(cut(y, breaks = c(0, 6, 22, 23), include.lowest = FALSE, 
            labels = c("night2", "day", "night1")))
levels(y) <- c("night", "day", "night")
alllevs <- c("night", "day")
y <- recodeLevels(y, alllevs)
levels(y) <- alllevs
y
# ff (open) integer length=3 (3) levels: night day
#   [1]   [2]   [3] 
# night day   night 
answered 2014-02-05T10:01:01.677