I have a dataset of 150 numbers, from which I sampled 100. How can I identify the remaining 50 (and put them into a new matrix)?

X <- runif(150)
Combined <- sample(X, 100)

3 Answers

Create your sample as a separate vector of indices:

using <- sample(1:150, 100)          # positions of the 100 sampled entries

Entries     <- All.Entries[using]    # the sampled values
Non.Entries <- All.Entries[-using]   # the remaining 50 values
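
Applied to the vector from the question, the same idea might look like the sketch below. It assumes `X` plays the role of `All.Entries`; the `set.seed` call and the name `Remaining` are only there for illustration.

```r
set.seed(1)                          # only so the sketch is reproducible
X <- runif(150)                      # the full data set from the question

using <- sample(seq_along(X), 100)   # sample 100 positions, not values

Combined  <- X[using]                # the 100 sampled values
Remaining <- X[-using]               # the 50 values that were not sampled

length(Remaining)
# [1] 50
```

Because the positions are kept, this works whether or not `X` contains repeated values.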
answered 2012-11-12T02:28:09.287

All numbers:

x <- sample(10, 150, TRUE) # as an example

A random sample:

Combined <- sample(x,100)

The remaining numbers:

xs <- sort(x) # sort the values of x
tab <- table(match(Combined, xs))
Remaining <- xs[-unlist(mapply(function(n, start) seq(start, length.out = n),
                               tab, as.numeric(names(tab))))]

Note: this solution also works if x has duplicated values. Remaining comes back in sorted order, not in the original order of x.
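
One way to convince yourself that this handles duplicates is to check that the sample and the remainder together reproduce the original values. The following is a minimal sketch of such a check; the seed is arbitrary and the argument names `n` and `start` are chosen here for readability.

```r
set.seed(1)                               # arbitrary seed, for reproducibility
x <- sample(10, 150, TRUE)                # 150 values with many duplicates
Combined <- sample(x, 100)                # a random sample of 100 values

xs  <- sort(x)
tab <- table(match(Combined, xs))         # count per first position in xs
drop_idx <- unlist(mapply(function(n, start) seq(start, length.out = n),
                          tab, as.numeric(names(tab))))
Remaining <- xs[-drop_idx]

length(Remaining)
# [1] 50

# Sample plus remainder should contain exactly the original values
identical(sort(c(Combined, Remaining)), xs)
# [1] TRUE
```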

answered 2012-11-11T23:54:24.493

Updated based on your comments.

If Combined is a subset of X, you can find the elements of X that are not in Combined with:

    X[ !(X %in% Combined) ] 

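X %in% Combined gives you a logical vector of the same length as X, with the value TRUE where the element is in Combined and FALSE where it is not.

By way of explanation: this logical vector can be used as an index. X[ X %in% Combined ] gives you all elements of X that are in Combined.

Since you are looking for the opposite, negate the logical vector: X[ !(X %in% Combined) ] gives you all elements of X that are not in Combined.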
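
A small worked example may make the indexing clearer. The values below are made up for illustration, and `in_sample` is just a name chosen here for the logical vector.

```r
X        <- c(0.5, 1.5, 2.5, 3.5, 4.5)   # made-up data, no duplicates
Combined <- c(3.5, 0.5)                  # pretend this is the sample

in_sample <- X %in% Combined             # one TRUE/FALSE per element of X
in_sample
# [1]  TRUE FALSE FALSE  TRUE FALSE

X[in_sample]                             # elements of X that are in Combined
# [1] 0.5 3.5

X[!in_sample]                            # elements of X that are not
# [1] 1.5 2.5 4.5
```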


If X contains duplicates, you can filter on names instead (assuming, of course, that the names are unique):

X[ !(names(X) %in% names(Combined)) ] 

# or if sampling by rows
X[ !(rownames(X) %in% rownames(Combined)) ] 

You can easily assign names to X (do this before sampling, so that Combined carries the names too):

names(X) <- seq_along(X)

# or for multi-dimensional
rownames(X)  <- seq_len(nrow(X))
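
Put together, the name-based approach might look like the sketch below. It assumes a vector with many repeated values, which is exactly the case where filtering on the values themselves would remove too much; the seed and the name `Remaining` are illustrative only.

```r
set.seed(7)                               # arbitrary seed, for reproducibility
X <- sample(10, 150, replace = TRUE)      # 150 values, heavily duplicated
names(X) <- seq_along(X)                  # unique names, assigned before sampling

Combined <- sample(X, 100)                # sample() keeps the names

# Filter on the unique names rather than on the duplicated values
Remaining <- X[!(names(X) %in% names(Combined))]

length(Remaining)
# [1] 50
```

The names act as stable identifiers, so each sampled element removes exactly one element of `X`.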

See also the help pages:

?"%in%"  # note the quotes
?which
?match 


Alternatively, you can sample the indices and then use a negative sign, as in mat[-indices,]. Example:

    # Create a sample matrix of 150 rows, 3 columns
    mat <- matrix(rnorm(450), ncol=3)

    # Take a sampling of indices to the rows
    indices <- sample(nrow(mat), 100, replace = FALSE)

    # Split the matrix into the sampled rows and the leftover rows
    mat.included <- mat[indices,]
    mat.leftover <- mat[-indices,]

    # Confirm everything is of proper size
    dim(mat)
    # [1] 150   3
    dim(mat.included)
    # [1] 100   3
    dim(mat.leftover)
    # [1] 50  3
answered 2012-11-11T22:35:20.297