
This question is similar to my previous one (Loop for applying a mask and creating a vector of means).

I want to read 45 data frames, each 259,200 x 21, and apply 2 masks to every column. Each of my masks is 259,200 x 1.

pdt <- (theData * mask1)*mask2
Error in Ops.data.frame(theData, mask1) : 
* only defined for equally-sized data frames

How do I apply the masks to each individual column?
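
A likely cause of the error above, assuming the masks were read with read.table() and are therefore one-column data frames: `*` between two data frames requires identical dimensions. Below is a minimal sketch with toy data (the sizes and mask values are made up for illustration). It pulls each mask out as a plain vector, which R then recycles down every column of the data frame.

```r
# Toy stand-ins for the real data: 6 rows x 3 columns instead of 259,200 x 21.
theData <- data.frame(V1 = 1:6, V2 = 11:16, V3 = 21:26)

# Hypothetical 0/1 masks stored as one-column data frames (assumption:
# this is how they arrive when read with read.table()).
mask1 <- data.frame(V1 = c(1, 1, 0, 1, 0, 1))
mask2 <- data.frame(V1 = c(1, 0, 1, 1, 0, 1))

# Extract each mask as a numeric vector and combine them once.
m <- mask1[[1]] * mask2[[1]]

# A data frame times a vector of length nrow() recycles the vector
# down every column, so no size mismatch occurs.
pdt <- theData * m
pdt[pdt == 0] <- NA  # masked-out cells become NA

maskedMeans <- colMeans(pdt, na.rm = TRUE)
maskedMeans
```

The same idea should carry over to the 259,200 x 21 case as long as each mask holds one value per row.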

Then I want a data frame holding the mean of each masked column, which should give me a 45 x 21 data frame. Here is the full code:

dataDir <-"C:\\dir\\"
patternC <-"pattern_"
filesSizeC = sort(list.files(dataDir,patternC))
#(filesSizeC)

for (i in 1:length(filesSizeC)) {
  theData<-read.table(paste(dataDir,filesSizeC[i],sep=""),header=F,sep="\t")  
  theData
  pdt <- (theData*mask1)*mask2
  pdt[pdt == 0] <- NA #all zeros become NA's


  if (i>1) {
    theMeanValues <- c(theMeanValues, mean(pdt$V1:pdt$V21, na.rm=T))

  } else {
    theMeanValues <- c(mean(pdt$V1:pdt$V21, na.rm=T))
  }
}

Thanks

Edit - 13-8-13

OK, I have now been able to apply both masks, as follows:

pdt <- theData * rep(mask_1, ncol(theData))
pdt <- pdt * rep(mask_2, ncol(pdt))
pdt[pdt == 0] <- NA #all zeros become NA's

This now gives me:

> summary(pdt)
       V1                  V2                  V3                  V4                           
 Min.   : 20261945   Min.   : 21312164   Min.   : 22243882   Min.   : 23064587  
 1st Qu.: 91201092   1st Qu.: 95889488   1st Qu.:100047585   1st Qu.:103709299   
 Median :205769790   Median :216624073   Median :226261360   Median :234756158   
 Mean   :231083595   Mean   :242654479   Mean   :252906061   Mean   :261926034   
 3rd Qu.:345700883   3rd Qu.:363602884   3rd Qu.:379489788   3rd Qu.:393487592   
 Max.   :741504636   Max.   :776855896   Max.   :808103971   Max.   :835543870   
 NA's   :259065      NA's   :259065      NA's   :259065      NA's   :259065      
...
 V21           
 Min.   : 27844725  
 1st Qu.:124843018  
 Median :284331924  
 Mean   :314292645  
 3rd Qu.:475087713  
 Max.   :993931538  
 NA's   :259065 

I want to take the mean of each column, ignoring the NAs.

In this simpler example, I want a loop that builds a 1 x 21 data frame of the column means.

mat1 <- matrix(rnorm(10), nrow=5, ncol=21)
mat1 <- data.frame(mat1)
mat1

         X1         X2          X3         X4   ......
1  0.56660450  0.1690268  0.56660450  0.1690268
2  0.01571945  1.1650268  0.01571945  1.1650268
3  0.38305734 -0.0442040  0.38305734 -0.0442040
4 -0.04513712 -0.1003684 -0.04513712 -0.1003684
5  0.03435191 -0.2834446  0.03435191 -0.2834446

 for (i in 1:length(mat1)) {
  if (i>1) {
    theMeanValues <- c(themeanvalues, mean(mat1$[i]), na.rm=T)

  } else {
    theMeanValues <- c(mean(mat1$[i]), na.rm=T)
 }
}

The code doesn't work. I think I need to change the syntax at mean(mat1$[i]), but I'm not sure to what.


1 Answer


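You are not using the correct syntax for selecting the columns of mat1, and the brackets are not in the right places: mat1$[i] is not valid R (use mat1[, i] or mat1[[i]]), and na.rm = TRUE has to go inside mean(), not inside c(). Using a loop for this is also slow and cumbersome. Use the colMeans() function instead.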
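
To make the syntax points concrete, here is a small sketch on a 5 x 21 stand-in for mat1 (the seed and the injected NA are assumptions added for illustration). It shows the corrected column selection inside a loop, then turns the colMeans() result into the 1 x 21 data frame the question asks for.

```r
# Small stand-in for mat1: 5 rows x 21 columns, with one NA to exercise na.rm.
set.seed(42)
mat1 <- data.frame(matrix(rnorm(5 * 21), nrow = 5, ncol = 21))
mat1[2, 3] <- NA

# mat1[[i]] (or mat1[, i]) selects column i of a data frame.
# mat1$[i] is a syntax error because $ expects a column name right after it.
theMeanValues <- numeric(ncol(mat1))  # preallocate instead of growing with c()
for (i in seq_len(ncol(mat1))) {
  # na.rm is an argument of mean(), so it goes inside the mean() call.
  theMeanValues[i] <- mean(mat1[[i]], na.rm = TRUE)
}

# colMeans() returns a named vector; t() makes it a 1 x 21 matrix,
# and as.data.frame() turns that into a one-row data frame.
meansDf <- as.data.frame(t(colMeans(mat1, na.rm = TRUE)))
dim(meansDf)
```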

> mat1 <- matrix(rnorm(21 * 1e6), ncol = 21)
> mat1 <- data.frame(mat1)
> 
> system.time({
+   for (i in seq_len(ncol(mat1))) {
+     if (i>1) {
+       theMeanValues <- c(theMeanValues, mean(mat1[, i], na.rm = TRUE))
+     } else {
+       theMeanValues <- mean(mat1[, i], na.rm = TRUE)
+     }
+   } 
+ })
   user  system elapsed 
   0.53    0.05    0.58 
> system.time({
+   theMeanValues2 <- colMeans(mat1, na.rm = TRUE)
+ })
   user  system elapsed 
   0.16    0.09    0.25 
> names(theMeanValues2) <- NULL
> all.equal(theMeanValues, theMeanValues2)
[1] TRUE
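
For the original task (one row of masked column means per file), the same colMeans() call can be applied to each file in turn. The sketch below is only an illustration under stated assumptions: it writes three small hypothetical files to a temporary directory so that it runs on its own, and it uses made-up 0/1 mask vectors. With the real data, dataDir, the file pattern and the masks would be the ones from the question, and the result would be 45 x 21.

```r
# Hypothetical setup: three small tab-separated files in a temp directory.
set.seed(1)
dataDir <- file.path(tempdir(), "masked_means_demo")
dir.create(dataDir, showWarnings = FALSE)
for (k in 1:3) {
  toy <- matrix(runif(8 * 4, min = 1, max = 10), nrow = 8, ncol = 4)
  write.table(toy, file.path(dataDir, sprintf("pattern_%02d.txt", k)),
              sep = "\t", row.names = FALSE, col.names = FALSE)
}

# Hypothetical 0/1 masks as plain vectors, one value per row.
mask1 <- c(1, 1, 0, 1, 1, 0, 1, 1)
mask2 <- c(1, 0, 1, 1, 1, 1, 0, 1)
m <- mask1 * mask2

files <- sort(list.files(dataDir, pattern = "^pattern_", full.names = TRUE))

# One vector of column means per file, following the question's approach:
# multiply by the masks, turn zeros into NA, then average each column.
meansList <- lapply(files, function(f) {
  theData <- read.table(f, header = FALSE, sep = "\t")
  pdt <- theData * m
  pdt[pdt == 0] <- NA
  colMeans(pdt, na.rm = TRUE)
})

# Stack the per-file vectors into a files x columns data frame.
theMeanValues <- as.data.frame(do.call(rbind, meansList))
rownames(theMeanValues) <- basename(files)
dim(theMeanValues)
```

Collecting the per-file results in a list and binding them once avoids the if (i > 1) branch and the repeated c() calls of the original loop.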
answered 2013-08-13T09:57:13.417