r - Remaining variables of a dataset

Question

I have a dataset of 150 numbers, from which I sampled 100. How can I identify the remaining 50 (and put them into a new matrix)?

X <- runif(150)
Combined <- sample(X, 100)

score 2 · Accepted Answer

Create your sample as a separate vector of indices:

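All.Entries <- runif(150)            # the full data set (X in the question)
using <- sample(1:150, 100)          # positions of the 100 sampled entries

Entries <- All.Entries[using]        # the sampled values
Non.Entries <- All.Entries[-using]   # the remaining 50 values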
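A minimal sketch of the same idea applied to the question's own X. The set.seed() call, the names idx, Remaining and Remaining.mat, and the choice of a one-column matrix for the leftovers are assumptions made for illustration:

```r
set.seed(1)                     # assumption: fixed seed so the sample is reproducible
X <- runif(150)                 # the full data set from the question

idx <- sample(length(X), 100)   # sample 100 positions, not 100 values
Combined  <- X[idx]             # the 100 sampled numbers
Remaining <- X[-idx]            # the 50 numbers that were not sampled

# Put the leftovers into a new one-column matrix, as the question asks
Remaining.mat <- matrix(Remaining, ncol = 1)

dim(Remaining.mat)              # [1] 50  1
```

Sampling positions rather than values is what makes the "remaining" part trivial: negative indexing with the same positions returns exactly the complement.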

score 0 · Accepted Answer

All numbers:

x <- sample(10, 150, TRUE) # as an example

A random sample:

Combined <- sample(x,100)

The remaining numbers:

xs <- sort(x) # sort the values of x
tab <- table(match(Combined, xs))
Remaining <- xs[-unlist(mapply(function(x, y) seq(y, length = x),
                               tab, as.numeric(names(tab))))]

Note: this solution also works if x contains duplicated values. The remaining values are returned in sorted order rather than in their original order.
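One way to check the approach above is to confirm that the sampled and remaining values together reproduce x exactly, duplicates included. This sketch reuses the answer's code (with the anonymous function's arguments renamed for readability); the seed and the final checks are additions for illustration only:

```r
set.seed(42)                       # assumption: seed only for reproducibility
x <- sample(10, 150, TRUE)         # 150 values drawn from 1..10, with duplicates
Combined <- sample(x, 100)         # 100 of them, sampled without replacement

xs  <- sort(x)
tab <- table(match(Combined, xs))  # how often each distinct value was sampled,
                                   # keyed by its first position in xs
# For each value, drop that many consecutive positions starting at its first position
Remaining <- xs[-unlist(mapply(function(n, start) seq(start, length.out = n),
                               tab, as.numeric(names(tab))))]

# Every original value is accounted for exactly once
length(Remaining)                                  # [1] 50
identical(sort(c(Combined, Remaining)), sort(x))   # [1] TRUE
```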

score 0 · Accepted Answer

Updated based on your comment.

If Combined is a subset of X, you can find the elements of X that are not in Combined with:

    X[ !(X %in% Combined) ]

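X %in% Combined gives you a logical vector the same length as X, with the value TRUE where the element of X is in Combined and FALSE where it is not.

By way of explanation: this logical vector can be used as an index. X[ X %in% Combined ] gives you all elements of X that are in Combined.

Since you are looking for the opposite, negate the logical vector, X[ !(X %in% Combined) ], to get all elements of X that are NOT in Combined.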
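A tiny worked example of the logical vector and its negation (the values are made up for illustration):

```r
X <- c(5, 3, 8, 1)
Combined <- c(3, 1)

X %in% Combined        # [1] FALSE  TRUE FALSE  TRUE
X[X %in% Combined]     # [1] 3 1   (elements of X that are in Combined)
X[!(X %in% Combined)]  # [1] 5 8   (elements of X that are not in Combined)
```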

If X contains duplicates, you can filter by names instead (assuming, of course, that the names are unique):

X[ !(names(X) %in% names(Combined)) ] 

# or if sampling by rows
X[ !(rownames(X) %in% rownames(Combined)) ]

You can easily assign names to X:

names(X) <- 1:length(X)

# or for multi-dimensional
rownames(X)  <- 1:nrow(X)
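A small illustration of why the name-based filter matters when X has duplicates. The values are made up, and a fixed subset stands in for sample() so the result is predictable; sample(X, 2) would keep the names in the same way:

```r
X <- c(2, 7, 7, 4)            # note the duplicated 7
names(X) <- seq_along(X)      # unique names "1".."4"

Combined <- X[c(1, 2)]        # stand-in for sample(X, 2); names are kept

# Value-based: both 7s match Combined, so only the 4 remains
X[!(X %in% Combined)]

# Name-based: only the sampled positions are removed, leaving 7 and 4
X[!(names(X) %in% names(Combined))]
```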

See also the help pages:

?"%in%"  # note the quotes
?which
?match

Alternatively, you can sample the indices and use a negative sign, as in mat[-indices,]. For example:

    # Create a sample matrix of 150 rows, 3 columns
    mat <- matrix(rnorm(450), ncol=3)

    # Take a sampling of indices to the rows
    indices <- sample(nrow(mat), 100, replace = FALSE)

    # Splice the matrix
    mat.included <- mat[indices,]
    mat.leftover <- mat[-indices,]

    # Confirm everything is of proper size
    dim(mat)
    # [1] 150   3
    dim(mat.included)
    # [1] 100   3
    dim(mat.leftover)
    # [1] 50  3
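The same index-based idea can be wrapped in a small helper that works for both vectors and matrix rows. split_sample() is a hypothetical name, not an existing function, and this is only a sketch:

```r
# Hypothetical helper: split a vector (or the rows of a matrix) into a random
# sample of size n and the leftover part
split_sample <- function(data, n) {
  total <- NROW(data)                  # rows for a matrix, length for a vector
  idx <- sample(total, n)              # sampled positions, without replacement
  if (is.null(dim(data))) {
    list(included = data[idx], leftover = data[-idx])
  } else {
    list(included = data[idx, , drop = FALSE],
         leftover = data[-idx, , drop = FALSE])
  }
}

mat <- matrix(rnorm(450), ncol = 3)
parts <- split_sample(mat, 100)
dim(parts$included)                    # [1] 100   3
dim(parts$leftover)                    # [1] 50  3

v <- runif(150)
length(split_sample(v, 100)$leftover)  # [1] 50
```

drop = FALSE keeps the result a matrix even when only one row is selected. One caveat with negative indexing in general: if the index vector is empty (for example n = 0), data[-idx] returns nothing rather than everything.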
