r - How to select n rows such that the values of a given column are all distinct?

Question

Good afternoon!

Suppose I have the following matrix:

 Sensor location  Target location detection Probability
1                 7              13             0.2943036
2                21              15             0.2943036
3                16              13             0.2943036
4                18              15             0.2943036
5                21              15             0.2943036
6                 1               2             0.2943036
7                16              22             0.2943036
8                10               4             0.2943036
9                16              17             0.2943036
10                2               5             0.2943036
11               13              16             0.2943036
12                9              12             0.2943036
13                2               8             0.2943036
14                7               1             0.2943036
15                7              10             0.2943036
16                1               2             0.2943036
17               18              12             0.2943036
18               23              17             0.2943036
19               21              15             0.2943036
20               20              21             0.2943036
21                2               1             0.2943036
22               12              18             0.2943036
23               24              21             0.2943036
24               22              23             0.2943036
25                2               3             0.2943036
26               11              10             0.2943036
27                7              10             0.2943036
28                2               3             0.2943036
29               12               6             0.2943036
30                2               1             0.2943036
31               24              21             0.2943036
32               14               8             0.2943036

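How can I sample n rows from this matrix such that the values in the second column are all distinct?

Example of the desired output (the values in the Target location column must be unique):

With n=4: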

         Sensor location  Target location detection Probability 
4                18              15             0.2943036
7                16              22             0.2943036
8                10               4             0.2943036
9                16              17             0.2943036

Undesired output (the value 15 appears more than once in the second column):

      Sensor location  Target location detection Probability 
4                18              15             0.2943036
2                21              15             0.2943036
8                10               4             0.2943036
9                16              17             0.2943036

I know that dplyr has functions such as sample_n() and dplyr::distinct, and I have tried:

data %>% distinct("Target location")

I hope my question is clear. Thank you very much for your help!

score 2 · Accepted Answer

You can do the following:

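n <- 4  # rows to sample; must not exceed the number of distinct values (17 in the data above)
# Row indices of the first occurrence of each distinct "Target location" value
indicesToSampleFrom <- which(!duplicated(data[["Target location"]]))
# Draw n of those indices without replacement and keep the matching rows
data[sample(indicesToSampleFrom, n), ]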
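To make the intermediate steps visible, here is a minimal sketch on the first eight rows of the question's data. It assumes `data` is a data frame (for a true matrix, `data[, "Target location"]` would be needed instead of `[[ ]]`), and the `set.seed()` call is only there to make the draw repeatable.

```r
# First eight rows of the question's data; check.names = FALSE keeps
# the spaces in the column names.
data <- data.frame(
  "Sensor location"       = c(7, 21, 16, 18, 21, 1, 16, 10),
  "Target location"       = c(13, 15, 13, 15, 15, 2, 22, 4),
  "detection Probability" = 0.2943036,
  check.names = FALSE
)

# TRUE for every row whose target value already appeared in an earlier row.
dup <- duplicated(data[["Target location"]])

# Indices of the first occurrence of each distinct target value.
indicesToSampleFrom <- which(!dup)

set.seed(1)  # assumption: fixed seed, only for reproducibility
n <- 4
result <- data[sample(indicesToSampleFrom, n), ]
print(result)
```

Because every index in `indicesToSampleFrom` points to a different target value, any sample drawn from it without replacement has unique values in that column.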

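Edit: if you want to apply this logic to other columns, it is best to check that there are enough distinct values. So, rather than sampling n rows, sample min(n, length(indicesToSampleFrom)).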
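The guard described in the edit can be wrapped in a small helper. This is a sketch, not part of the original answer: the name `sample_distinct_rows` and its arguments are hypothetical. It also shuffles the rows before deduplicating, so that for a repeated value the surviving row is random rather than always the first one, and it uses `sample.int()` on row counts, which avoids the `sample()` pitfall where a single number `x` is treated as `1:x`.

```r
# Hypothetical helper: return up to n rows of `data` whose values in
# column `col` are pairwise distinct.
sample_distinct_rows <- function(data, col, n) {
  # Shuffle rows so the representative kept for a repeated value is random.
  shuffled <- data[sample.int(nrow(data)), , drop = FALSE]
  # Keep one row per distinct value of `col`.
  unique_rows <- shuffled[!duplicated(shuffled[[col]]), , drop = FALSE]
  # Guard: never request more rows than there are distinct values.
  k <- min(n, nrow(unique_rows))
  # Draw k of the deduplicated rows without replacement.
  unique_rows[sample.int(nrow(unique_rows), k), , drop = FALSE]
}

# Usage on a toy data frame with only three distinct target values.
data <- data.frame(
  "Sensor location" = c(7, 21, 16, 18, 21, 1),
  "Target location" = c(13, 15, 13, 15, 15, 2),
  check.names = FALSE
)
res <- sample_distinct_rows(data, "Target location", 4)
print(res)  # 3 rows, because only 3 distinct values exist
```

Asking for 4 rows here returns 3 instead of raising the "cannot take a sample larger than the population" error that an unguarded `sample()` call would produce.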
