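I have a large matrix with about 300 rows and 200000 columns. I want to shrink it by selecting every whole column that contains at least one value greater than 0.5 or less than -0.5 (the entire column, not just that particular value). I want to keep the row names and column names. I was able to get a TRUE/FALSE matrix with tmp <- mymat > 0.5 | mymat < -0.5. I want to extract all the columns that have at least one TRUE in them. I tried simply mymat[tmp], but that just returns a vector of the values that meet the condition. How can I get the actual columns of the original matrix? Thanks.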

2 Answers

Try this:

> set.seed(007) # make the example reproducible
> X <- matrix(rnorm(100), 20) # generate some data: 20 rows, 5 columns
> X <- cbind(X, runif(20, max=.48)) # add a column whose values are all < 0.5
> colnames(X) <- paste('col', 1:ncol(X), sep='') # some column names
> X # this is what the matrix looks like
              col1        col2         col3        col4         col5        col6
 [1,]  2.287247161  0.83975036  1.218550535  0.07637147  0.342585350 0.335107187
 [2,] -1.196771682  0.70534183 -0.699317079  0.15915528  0.004248236 0.419502015
 [3,] -0.694292510  1.30596472 -0.285432752  0.54367418  0.029219842 0.346358090
 [4,] -0.412292951 -1.38799622 -1.311552673  0.70480735 -0.393423429 0.212185020
 [5,] -0.970673341  1.27291686 -0.391012431  0.31896914 -0.792704563 0.224824248
 [6,] -0.947279945  0.18419277 -0.401526613  1.10924979 -0.311701865 0.415837389
 [7,]  0.748139340  0.75227990  1.350517581  0.76915419 -0.346068592 0.057660111
 [8,] -0.116955226  0.59174505  0.591190027  1.15347367 -0.304607588 0.007812921
 [9,]  0.152657626 -0.98305260  0.100525456  1.26068350 -1.785893487 0.298192099
[10,]  2.189978107 -0.27606396  0.931071996  0.70062351  0.587274672 0.216225091
[11,]  0.356986230 -0.87085102 -0.262742349  0.43262716  1.635794434 0.026097800
[12,]  2.716751783  0.71871055 -0.007668105 -0.92260172 -0.645423474 0.190567072
[13,]  2.281451926  0.11065288  0.367153007 -0.61558421  0.618992169 0.402829397
[14,]  0.324020540 -0.07846677  1.707162545 -0.86665969  0.236393598 0.248196976
[15,]  1.896067067 -0.42049046  0.723740263 -1.63951709  0.846500899 0.406511129
[16,]  0.467680511 -0.56212588  0.481036049 -1.32583924 -0.573645739 0.162457572
[17,] -0.893800723  0.99751344 -1.567868244 -0.88903673  1.117993204 0.383801555
[18,] -0.307328300 -1.10513006  0.318250283 -0.55760233 -1.540001132 0.347037954
[19,] -0.004822422 -0.14228783  0.165991451 -0.06240231 -0.438123899 0.262938992
[20,]  0.988164149  0.31499490 -0.899907630  2.42269298 -0.150672971 0.139233120
> 
> # define a logical index: TRUE for each column that meets the condition
> ind <- apply(X, 2, function(X) any(abs(X)>0.5))
> X[,ind] # col6 is not selected because all of its values are below 0.5
              col1        col2         col3        col4         col5
 [1,]  2.287247161  0.83975036  1.218550535  0.07637147  0.342585350
 [2,] -1.196771682  0.70534183 -0.699317079  0.15915528  0.004248236
 [3,] -0.694292510  1.30596472 -0.285432752  0.54367418  0.029219842
 [4,] -0.412292951 -1.38799622 -1.311552673  0.70480735 -0.393423429
 [5,] -0.970673341  1.27291686 -0.391012431  0.31896914 -0.792704563
 [6,] -0.947279945  0.18419277 -0.401526613  1.10924979 -0.311701865
 [7,]  0.748139340  0.75227990  1.350517581  0.76915419 -0.346068592
 [8,] -0.116955226  0.59174505  0.591190027  1.15347367 -0.304607588
 [9,]  0.152657626 -0.98305260  0.100525456  1.26068350 -1.785893487
[10,]  2.189978107 -0.27606396  0.931071996  0.70062351  0.587274672
[11,]  0.356986230 -0.87085102 -0.262742349  0.43262716  1.635794434
[12,]  2.716751783  0.71871055 -0.007668105 -0.92260172 -0.645423474
[13,]  2.281451926  0.11065288  0.367153007 -0.61558421  0.618992169
[14,]  0.324020540 -0.07846677  1.707162545 -0.86665969  0.236393598
[15,]  1.896067067 -0.42049046  0.723740263 -1.63951709  0.846500899
[16,]  0.467680511 -0.56212588  0.481036049 -1.32583924 -0.573645739
[17,] -0.893800723  0.99751344 -1.567868244 -0.88903673  1.117993204
[18,] -0.307328300 -1.10513006  0.318250283 -0.55760233 -1.540001132
[19,] -0.004822422 -0.14228783  0.165991451 -0.06240231 -0.438123899
[20,]  0.988164149  0.31499490 -0.899907630  2.42269298 -0.150672971

# This can also be done in one step, without 'ind'
X[, apply(X, 2, function(X) any(abs(X)>0.5))]
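
For a matrix as wide as the one in the question, a vectorized variant may be worth trying: colSums() over the logical matrix counts the TRUE values in each column, so no per-column function call is needed. This is a minimal sketch and an alternative to the apply() call above, not part of the original answer. The small mymat and its dimnames are made up for illustration, and it assumes the matrix contains no NA values.

```r
# Hypothetical 3 x 3 stand-in for the real 300 x 200000 matrix
mymat <- matrix(c(0.1, -0.2, 0.7,    # col1: contains 0.7
                  0.3, -0.9, 0.4,    # col2: contains -0.9
                  0.2,  0.1, 0.0),   # col3: nothing beyond +/- 0.5
                nrow = 3,
                dimnames = list(paste0("row", 1:3), paste0("col", 1:3)))

# Same TRUE/FALSE matrix as in the question, written with abs()
tmp <- abs(mymat) > 0.5

# A column is kept if it holds at least one TRUE
keep <- colSums(tmp) > 0

# drop = FALSE keeps the result a matrix even if one column survives
result <- mymat[, keep, drop = FALSE]
colnames(result)  # "col1" "col2"
```

Row and column names carry over because the subsetting is done on the original matrix, with one logical flag per column.
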
answered 2012-07-25T15:37:10.313

For the case where only one column is left after filtering, Jilber's answer becomes:

X[, apply(X, 2, function(X) any(abs(X)>0.5)), drop=FALSE]

Without the drop=FALSE argument, the single remaining column is converted to a vector and you lose the column name information.
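
To see the difference, here is a small sketch with a made-up 2 x 2 matrix (the values and names are only for illustration) in which exactly one column passes the filter:

```r
# Hypothetical matrix: only col1 has a value beyond +/- 0.5
X <- matrix(c(0.9, 0.1, 0.2, 0.3), nrow = 2,
            dimnames = list(c("r1", "r2"), c("col1", "col2")))

ind <- apply(X, 2, function(x) any(abs(x) > 0.5))

dropped <- X[, ind]                # default drop = TRUE
kept    <- X[, ind, drop = FALSE]  # stays a matrix

is.null(dim(dropped))  # TRUE: a plain vector, the column name is gone
dim(kept)              # 2 1
colnames(kept)         # "col1"
```
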

answered 2014-03-04T23:25:52.160