r - Error when trying to extract rows from a table with a condition in R

Question

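I am trying to select, for each gene_id, the row with the smallest start coordinate (href_pos$start). Why do I get this error even though my memory limit is ~16 GB? Or am I doing something wrong? I have the following code: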

head(href_pos, 5)

    chr      region    start      end strand nu   gene_id
1  chr1 start_codon 67000042 67000044      +  . NM_032291
2  chr1         CDS 67000042 67000051      +  0 NM_032291
3  chr1        exon 66999825 67000051      +  . NM_032291
4  chr1         CDS 67091530 67091593      +  2 NM_032291
5  chr1        exon 67091530 67091593      +  . NM_032291

d1 <- ddply(as.data.frame(href_pos), "gene_id", function(href_pos) href_pos[which.min(href_pos$start), ])
Error: cannot allocate vector of size 283 Kb
In addition: Warning messages:
1: In lapply(dfs, function(df) levels(df[[var]])) :
  Reached total allocation of 16383Mb: see help(memory.size)

score 0 · Accepted Answer

To show that your syntax is fine:

# Create a minimal, reproducible example
library(plyr)  # provides ddply
gene_id <- gl(3, 3, 9, labels = letters[1:3])
start <- rep(1:3, 3)
href_pos <- data.frame(gene_id=gene_id, start=start)

d1 <- ddply(as.data.frame(href_pos), "gene_id", function(href_pos) href_pos[which.min(href_pos$start), ])
d1
  gene_id start
1       a     1
2       b     1
3       c     1

As Chase suggested, data.table should work here:

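library(data.table)
HREF_POS <- as.data.table(href_pos)
setkey(HREF_POS, gene_id)
# Row index of the smallest start within each gene_id, then subset by those indices.
# (Matching start against the pooled minima with %in% would also keep rows whose
# start merely equals another gene's minimum.)
MINS <- HREF_POS[HREF_POS[, .I[which.min(start)], by = gene_id]$V1]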
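
If neither plyr nor data.table is an option, the same per-gene minimum can probably be had in base R by sorting once and keeping the first row of each gene_id. This is only a sketch and not part of the original answer: the toy href_pos below is made up for illustration (same two columns as the example above, different start values), and ties are resolved by keeping whichever row comes first after sorting.

```r
# Toy data (assumed for illustration, not the real table)
href_pos <- data.frame(
  gene_id = gl(3, 3, labels = letters[1:3]),
  start   = c(3, 1, 2, 5, 4, 6, 9, 8, 7)
)

# Sort by gene_id, then by start, so each gene's smallest start comes first
sorted <- href_pos[order(href_pos$gene_id, href_pos$start), ]

# Keep only the first row encountered for each gene_id
mins <- sorted[!duplicated(sorted$gene_id), ]
```

This does a single sort over the whole table instead of building one small data frame per gene, so it may be gentler on memory when there are many distinct gene_id values.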
