I have a data set of 150 numbers, from which I sampled 100. How can I identify the remaining 50 (and put them into a new matrix)?
X <- runif(150)
Combined <- sample(X, 100)
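Create your sample as a separate vector of indices, then use negative indexing to get the rest:
All.Entries <- runif(150)                          # the full data set (X in the question)
using <- sample(seq_along(All.Entries), 100)       # sample positions, not values
Entries <- All.Entries[using]                      # the 100 sampled values
Non.Entries <- All.Entries[-using]                 # the 50 values that were not sampled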
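A minimal sketch of the same idea, using X from the question and an explicit complement of the index set via setdiff(). It also shows one caveat of negative indexing: if the index vector happens to be empty, X[-idx] returns nothing rather than everything, while the setdiff() version still behaves. The seed and variable names are illustrative assumptions.

```r
set.seed(1)                                  # reproducible example
X <- runif(150)
idx <- sample(seq_along(X), 100)             # positions of the sampled values
keep <- setdiff(seq_along(X), idx)           # positions that were NOT sampled
Combined <- X[idx]
Remaining <- X[keep]
length(Remaining)                            # 50

# Caveat: negating an empty index vector selects nothing
X[-integer(0)]                               # numeric(0), not X
length(X[setdiff(seq_along(X), integer(0))]) # 150
```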
All numbers:
x <- sample(10, 150, TRUE) # as an example
A random sample:
Combined <- sample(x,100)
The remaining numbers:
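xs <- sort(x)                        # sorted copy: equal values are now adjacent
tab <- table(match(Combined, xs))    # first position in xs of each sampled value -> times sampled
# drop that many consecutive copies of each sampled value from xs
Remaining <- xs[-unlist(mapply(function(n, start) seq(start, length.out = n),
                               tab, as.numeric(names(tab))))]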
Note: this solution also works when x contains duplicated values. The remaining values are returned in sorted order, not in their original order.
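An alternative sketch counts copies instead of matching positions: for each distinct value, it keeps as many copies as appear in x minus those that appear in Combined. multiset_diff is a hypothetical helper name, not a base R function; like the code above, it returns the leftovers in sorted order and assumes Combined was drawn from x.

```r
# Hypothetical helper: multiset difference x minus s, where s was sampled from x
multiset_diff <- function(x, s) {
  ux  <- sort(unique(x))                             # distinct values, sorted
  n_x <- tabulate(match(x, ux), nbins = length(ux))  # copies of each value in x
  n_s <- tabulate(match(s, ux), nbins = length(ux))  # copies of each value in s
  rep(ux, n_x - n_s)                                 # leftover copies of each value
}

set.seed(42)
x <- sample(10, 150, TRUE)
Combined <- sample(x, 100)
Remaining <- multiset_diff(x, Combined)
length(Remaining)                                    # 50
```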
If Combined is a subset of X, you can find the elements of X that are not in Combined with:
X[ !(X %in% Combined) ]
X %in% Combined
gives you a logical vector the same length as X, which is TRUE where an element of X is in Combined and FALSE where it is not.
This logical vector can be used as an index: X[ X %in% Combined ] gives you all elements of X that are in Combined.
Since you are asking for the opposite, negate the logical vector, X[ !(X %in% Combined) ], to get all elements of X that are not in Combined.
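A small sketch with made-up numbers makes the indexing concrete. Note that both copies of 2 are dropped even though Combined contains 2 only once, because %in% compares values, not positions.

```r
X <- c(5, 1, 2, 2, 3)
Combined <- c(2, 5)
X %in% Combined            # TRUE FALSE TRUE TRUE FALSE
X[X %in% Combined]         # 5 2 2
X[!(X %in% Combined)]      # 1 3
```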
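If X contains duplicates, %in% removes every copy of a value that appears in Combined, so filter by names instead (assuming, of course, that the names are unique):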
X[ !(names(X) %in% names(Combined)) ]
# or if sampling by rows
X[ !(rownames(X) %in% rownames(Combined)), ]   # note the comma: select rows, keep all columns
You can easily assign names to X before sampling (sample() keeps the names of the elements it draws):
names(X) <- 1:length(X)
# or for multi-dimensional
rownames(X) <- 1:nrow(X)
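A minimal sketch of the name-based approach on a vector that certainly has duplicates. The sizes (20 values, 12 sampled) and the seed are illustrative assumptions.

```r
set.seed(7)
X <- sample(5, 20, replace = TRUE)   # only 5 distinct values, so duplicates are certain
names(X) <- seq_along(X)             # unique names "1".."20"
Combined <- sample(X, 12)            # the sampled elements keep their names
Remaining <- X[!(names(X) %in% names(Combined))]
length(Remaining)                    # 8
```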
See also the help pages:
?"%in%" # note the quotes
?which
?match
Use negative row indexing:
mat[-indices, ]
Example:
# Create a sample matrix of 150 rows, 3 columns
mat <- matrix(rnorm(450), ncol=3)
# Take a sampling of indices to the rows
indices <- sample(nrow(mat), 100, replace = FALSE)
# Splice the matrix
mat.included <- mat[indices,]
mat.leftover <- mat[-indices,]
# Confirm everything is of proper size
dim(mat)
# [1] 150 3
dim(mat.included)
# [1] 100 3
dim(mat.leftover)
# [1] 50 3
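The same row indexing works for a data frame. As a sketch, a logical mask built with %in% can replace the negative indices; the included and leftover parts are then exact complements by construction. The data frame df and its columns id and value are hypothetical.

```r
set.seed(3)
df <- data.frame(id = 1:150, value = rnorm(150))  # hypothetical data
indices <- sample(nrow(df), 100)
in_sample <- seq_len(nrow(df)) %in% indices       # TRUE for sampled rows
df.included <- df[in_sample, ]
df.leftover <- df[!in_sample, ]
nrow(df.included)   # 100
nrow(df.leftover)   # 50
```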