r - Receiving NA when adding a decile column with cut()

Question

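New R user here. I am trying to split a dataset into deciles with cut, following the process in this question. I want to add the decile values as a new column in a data frame, but when I do, the lowest value is listed as NA for some reason. This happens whether include.lowest is TRUE or FALSE. Does anyone know why?

It also happens with this sample set, so it is not specific to my data.

> data <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)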

> decile <- cut(data, quantile(data, (0:10)/10, labels=TRUE, include.lowest=FALSE))

> df <- cbind(data, decile)

> df

      data decile
 [1,]    1     NA
 [2,]    2      1
 [3,]    3      2
 [4,]    4      2
 [5,]    5      3
 [6,]    6      3
 [7,]    7      4
 [8,]    8      4
 [9,]    9      5
[10,]   10      5
[11,]   11      6
[12,]   12      6
[13,]   13      7
[14,]   14      7
[15,]   15      8
[16,]   16      8
[17,]   17      9
[18,]   18      9
[19,]   19     10
[20,]   20     10

score 4 · Accepted Answer

There are two problems. First, your cut call has a couple of things wrong with it. I think you meant

cut(data, quantile(data, (0:10)/10), include.lowest=FALSE)
##                                ^missing parenthesis

Also, labels should be FALSE, NULL, or a vector containing the desired labels, one per interval (that is, of length length(breaks) - 1).
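As a rough illustration of the labels options, here is a minimal sketch using the sample data from the question. The label names "D1" to "D10" are made up for the example, and include.lowest=TRUE is set only so that every value receives a bin.

```r
data <- 1:20
breaks <- quantile(data, (0:10)/10)

# labels = FALSE returns plain integer bin codes instead of a factor
codes <- cut(data, breaks, labels = FALSE, include.lowest = TRUE)

# A custom label vector needs one entry per interval: length(breaks) - 1
# ("D1".."D10" are hypothetical names chosen for this sketch)
named <- cut(data, breaks, labels = paste0("D", 1:10), include.lowest = TRUE)

codes
levels(named)
```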

Second, the main problem is that you set include.lowest=FALSE and data[1] is 1, which coincides with the first break:

> quantile(data, (0:10)/10)
  0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
 1.0  2.9  4.8  6.7  8.6 10.5 12.4 14.3 16.2 18.1 20.0

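The value 1 therefore does not fall into any category. By default cut() builds intervals that are open on the left, so 1 sits exactly on the excluded lower bound of the first interval defined by your breaks.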
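This probably also explains why toggling include.lowest made no difference in the question: the argument was written inside the quantile() call, where it appears to be silently absorbed, so cut() never saw it. A small sketch under that assumption, using the same sample data:

```r
data <- 1:20
breaks <- quantile(data, (0:10)/10)

# Default intervals are (a, b], so the minimum sits on the open left end of bin 1
without_lowest <- cut(data, breaks, labels = FALSE)
is.na(without_lowest[1])

# include.lowest placed inside quantile() never reaches cut(), so nothing changes
misplaced <- cut(data, quantile(data, (0:10)/10, include.lowest = TRUE),
                 labels = FALSE)
is.na(misplaced[1])
```

Both checks should report that the first element is NA, while every other value gets a bin.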

I'm not sure which result you want, but you can try one of the following two options, depending on which class you want 1 to fall into:

> cut(data, quantile(data, (0:10)/10), include.lowest=TRUE)
 [1] [1,2.9]     [1,2.9]     (2.9,4.8]   (2.9,4.8]   (4.8,6.7]   (4.8,6.7]  
 [7] (6.7,8.6]   (6.7,8.6]   (8.6,10.5]  (8.6,10.5]  (10.5,12.4] (10.5,12.4]
[13] (12.4,14.3] (12.4,14.3] (14.3,16.2] (14.3,16.2] (16.2,18.1] (16.2,18.1]
[19] (18.1,20]   (18.1,20]  
10 Levels: [1,2.9] (2.9,4.8] (4.8,6.7] (6.7,8.6] (8.6,10.5] ... (18.1,20]
> cut(data, c(0, quantile(data, (0:10)/10)), include.lowest=FALSE)
 [1] (0,1]       (1,2.9]     (2.9,4.8]   (2.9,4.8]   (4.8,6.7]   (4.8,6.7]  
 [7] (6.7,8.6]   (6.7,8.6]   (8.6,10.5]  (8.6,10.5]  (10.5,12.4] (10.5,12.4]
[13] (12.4,14.3] (12.4,14.3] (14.3,16.2] (14.3,16.2] (16.2,18.1] (16.2,18.1]
[19] (18.1,20]   (18.1,20]  
11 Levels: (0,1] (1,2.9] (2.9,4.8] (4.8,6.7] (6.7,8.6] ... (18.1,20]
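To get back to the original goal of a decile column, one possible approach (a sketch, not the only way) is to build a data.frame and store the integer bin codes from the first option. The column name decile is taken from the question. The choice of labels=FALSE is an assumption about the desired output.

```r
data <- 1:20
df <- data.frame(data = data)

# include.lowest = TRUE closes the first interval, so the minimum lands in decile 1;
# labels = FALSE stores the decile as an integer code from 1 to 10
df$decile <- cut(df$data, quantile(df$data, (0:10)/10),
                 labels = FALSE, include.lowest = TRUE)

head(df, 3)
```

One caveat: if the data contain many tied values, the quantile breaks may not be unique, in which case cut() stops with an error and the breaks would need to be handled differently.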
