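OK, this one has me very confused and worried: as part of a routine, I have been classifying individual observations of a variable as TRUE or FALSE (labeled "High" and "Low" below) based on whether their value is above or below/equal to the median. However, I am getting behavior in R that is largely unexpected for such a simple test.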
So take this set of observations:
data=c(0.6666667, 0.8333, 0.6666667, 0.8333, 0.8333, 0.75, 0.9999, 0.7499667, 0.25, 0.6666667, 0.1667, 0.7499667, 0.5, 0.2500333, 0.3333667, 0.0834, 0.0001, 0.2500333, 0.8333, 0.9999, 0.9999, 0.2500333, 0.2500333, 0.3333667, 0.9166, 0.5, 0.2500333, 0.4166667, 0.0001, 0.1667333, 0.6666333, 0.0834, 0.1667, 0.6666333, 0.9166, 0.1667, 0.7499333, 0.9166, 0.9166, 0.9166, 0.7499667, 0.7499667, 0.4166667, 0.5, 0.2500333, 0.9166, 0.6666667, 0.1667333, 0.25, 0.0001, 0.3333667, 0.0001, 0.25, 0.0834, 0.9999, 0.0834, 0.1667, 0.5, 0.2500333, 0.3333667, 0.9166, 0.9166, 0.8333, 0.9166, 0.75, 0.0834, 0.4166667, 0.5, 0.0001, 0.9999, 0.8333, 0.6666667, 0.9166)
To classify these values, I did:
data_med=median(data)
quant_data=data
quant_data[quant_data>data_med]="High"
quant_data[quant_data<=data_med]="Low"
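For comparison, here is a minimal sketch of a one-step version that runs the test against the untouched numeric vector instead of the vector being overwritten. The small vector and the names x, x_med and labels are made up for illustration and are not part of my routine:

```r
# Illustrative data only; x, x_med and labels are hypothetical names.
x <- c(0.0001, 0.25, 0.5, 0.75, 0.9999)
x_med <- median(x)

# The comparison is evaluated once, on the numeric vector,
# and the labels are written to a separate result vector.
labels <- ifelse(x > x_med, "High", "Low")
print(labels)
```

Because the result goes into a new vector, the values being compared are never modified between the two branches of the test.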
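I know there are 100 million ways to do this more efficiently, but what worries me is that the output makes no sense. Since there are no NaNs in the set and the test is all-inclusive (> or <=), I should end up with a vector containing only "High"/"Low" values, but instead I get: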
[1] "High" "High" "High" "High" "High" "High" "High" "High" "Low" "High" "Low" "High" "Low" "Low" "Low" "Low" "1e-04"
[18] "Low" "High" "High" "High" "Low" "Low" "Low" "High" "Low" "Low" "Low" "1e-04" "Low" "High" "Low" "Low" "High"
[35] "High" "Low" "High" "High" "High" "High" "High" "High" "Low" "Low" "Low" "High" "High" "Low" "Low" "1e-04" "Low"
[52] "1e-04" "Low" "Low" "High" "Low" "Low" "Low" "Low" "Low" "High" "High" "High" "High" "High" "Low" "Low" "Low"
[69] "1e-04" "High" "High" "High" "High"
See the "1e-04" entries? Even weirder, let's pick value 69, which is one of those that returned the odd result:
data[69]
>1e-04
If I test this value individually, I get the result I would expect:
data[69]<=data_med
TRUE
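One further check that may help narrow this down is to print the class of the vector before and after the first assignment, since the second comparison runs on whatever type the vector has by then. This is only a sketch on a small made-up vector (x, x_med and q are hypothetical names), repeating the same two assignment steps as above:

```r
# Small illustrative vector; x, x_med and q are hypothetical names.
x <- c(0.0001, 0.25, 0.75)
x_med <- median(x)

q <- x
class_before <- class(q)   # type before any assignment
q[q > x_med] <- "High"     # same first step as in my routine
class_after <- class(q)    # type after a string was written into the vector

print(class_before)
print(class_after)
print(q)                   # shows how the remaining numbers are now stored

q[q <= x_med] <- "Low"     # same second step as in my routine
print(q)
```

If the class changes between the two prints, then the second test is no longer comparing numbers with numbers, which would be a different situation from testing data[69] on its own.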
Can anybody explain this behavior? It just seems very dangerous...