I'm trying to split a data set into deciles. I give every row an id (1:nrow(dataset)) and then use the cut() function to assign each row to a decile.
> df1 <- data.frame(id = 1:1000, cutter1 = NA)
> head(df1)
  id cutter1
1  1      NA
2  2      NA
3  3      NA
4  4      NA
5  5      NA
6  6      NA
> df1$cutter1 <- cut(df1$id, 10, labels = FALSE)
> table(df1$cutter1)
  1   2   3   4   5   6   7   8   9  10
100 100 100 100 100 100 100 100 100 100
That is what I expect. However, when I increase the number of rows to 100000, the first and tenth deciles each get 9920 rows while the other eight get 10020.
> df1 <- data.frame(id = 1:100000, cutter1 = NA)
> head(df1)
  id cutter1
1  1      NA
2  2      NA
3  3      NA
4  4      NA
5  5      NA
6  6      NA
> df1$cutter1 <- cut(df1$id, 10, labels = FALSE)
> table(df1$cutter1)
    1     2     3     4     5     6     7     8     9    10
 9920 10020 10020 10020 10020 10020 10020 10020 10020  9920
I played with the include.lowest and right parameters, but they didn't fix anything. Any idea why this is happening?
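A rough way to check where the 9920/10020 counts could come from, assuming (this is an assumption, not verified against the cut() source) that with a single number for breaks, cut() places equally spaced cut points over the data range widened by 0.1% on each side. The variable names below are made up for illustration, and the internals may differ between R versions. The second part passes explicit cut points at exact multiples of n / 10 instead, which avoids any widening.

```r
n  <- 100000
id <- seq_len(n)

# Assumption: 11 equally spaced cut points over the range of id,
# widened by 0.1% of the range on each side.
rx <- range(id)
dx <- diff(rx)
widened <- seq(rx[1] - dx / 1000, rx[2] + dx / 1000, length.out = 11)
widened_counts <- as.vector(table(cut(id, breaks = widened, labels = FALSE)))
print(widened_counts)  # 9920 in the two outer bins, 10020 in the eight inner bins

# Explicit cut points at 0, 10000, ..., 100000. With the default right = TRUE
# the bins are (0, 10000], (10000, 20000], ..., so every bin holds n / 10 ids.
exact <- seq(0, n, length.out = 11)
exact_counts <- as.vector(table(cut(id, breaks = exact, labels = FALSE)))
print(exact_counts)    # 10000 in every bin
```

Under that assumption the widened range is about 100199 units long, so each bin spans roughly 10019.9 ids. The two outer bins lose the part of their width that falls outside 1:100000, which would explain why only the first and last deciles come up short.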