I'm trying to split a data set into deciles. I gave every row an id (1:nrow(dataset)) and then used the cut() function to assign each row to a decile.

> df1 <- data.frame(id = 1:1000, cutter1 = NA)
> head(df1)
  id cutter1
1  1      NA
2  2      NA
3  3      NA
4  4      NA
5  5      NA
6  6      NA
> df1$cutter1 <- cut(df1$id,10, labels = F)
> table(df1$cutter1)

  1   2   3   4   5   6   7   8   9  10 
100 100 100 100 100 100 100 100 100 100 

The above is what I expect. However, when I increase the number of rows to 100000, I see some odd behavior in deciles 1 and 10.

> df1 <- data.frame(id = 1:100000, cutter1 = NA)
> head(df1)
  id cutter1
1  1      NA
2  2      NA
3  3      NA
4  4      NA
5  5      NA
6  6      NA
> df1$cutter1 <- cut(df1$id,10, labels = F)
> table(df1$cutter1)

    1     2     3     4     5     6     7     8     9    10 
 9920 10020 10020 10020 10020 10020 10020 10020 10020  9920

I played with the include.lowest and right parameters, but they didn't fix anything. Any idea why this is happening?

1 Answer

I'm not sure whether the output format suits you, but this may be another solution:

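# Use the deciles of id as break points so each bin holds the same number of rows
decile <- with(df1, cut(id, breaks = quantile(id, probs = seq(0, 1, by = 0.1)), include.lowest = TRUE))
res <- table(decile)
# Relabel the interval names as decile numbers 1..10
names(res) <- as.character(1:10)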

It works correctly even in the following case: df1 <- data.frame(id = 1:100000, cutter1 = NA)
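
As for why the counts in the question come out uneven, here is a rough sketch of what is probably going on. It assumes that cut(x, 10) pads the range of the data by 0.1% on each side and then splits the padded range into 10 equal-width bins. That assumption reproduces the 9920 / 10020 counts from the question, but other R versions may place the breaks differently. The variable names (padded, equal_width, equal_count) are only for illustration.

```r
n  <- 100000
id <- 1:n

# Assumption: mimic cut(id, 10) by padding the range by 0.1% on each side
# and cutting the padded range into 10 equal-width bins.
rx <- range(id)
dx <- diff(rx)
padded <- seq(rx[1] - dx / 1000, rx[2] + dx / 1000, length.out = 11)
equal_width <- table(cut(id, breaks = padded, labels = FALSE))
print(equal_width)  # outer bins: 9920 rows, inner bins: 10020 rows

# Quantile-based breaks split by row count rather than by bin width.
# labels = FALSE returns integer decile codes, like the question's cutter1.
q <- quantile(id, probs = seq(0, 1, by = 0.1))
deciles <- cut(id, breaks = q, include.lowest = TRUE, labels = FALSE)
equal_count <- table(deciles)
print(equal_count)  # 10000 rows in every decile
```

Under that assumption the padding lands entirely in the first and last bins, so those two cover less of the actual data than the inner bins. Breaks taken from quantile() avoid this because they are chosen from the data rather than from the width of the range.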

Answered 2013-08-19T17:57:37.673