r - Filter a data frame by row names derived from column values

Question

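Basically I have a data frame with two columns (target_id and fpkm). I want to keep only the rows whose names in the first column are not duplicated.
For example, in the data frame below there are two rows with (almost) the same name, comp267138_c0_seq1 and comp267138_c0_seq2. Of the two, I want to keep only comp267138_c0_seq2, because it has the higher value in the second column.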

       target_id        fpkm
comp247393_c0_seq1    3.197885
comp257058_c0_seq4    1.624577
comp242590_c0_seq1    1.750319
comp77911_c0_seq1     1.293059
comp241426_c0_seq1    1.626589
comp288413_c0_seq1   14.828853
comp294436_c0_seq1   11.555596
comp63603_c0_seq1     1.982386
comp267138_c0_seq1    8.594494
comp267138_c0_seq2   11.134958
comp321623_c0_seq1    6.934149

score 1 · Accepted Answer

It appears you only want to consider part of the target_id (the first two components when splitting on _).

If your data.frame is called DT:

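library(plyr)

# create a new_id column without the _seqN part (the first two "_"-separated components)
DT$new_id <- sapply(lapply(strsplit(as.character(DT[['target_id']]), '_'), head, 2),
                    paste, collapse = '_')

# within each new_id group, keep only the row with the highest fpkm
ddply(DT, .(new_id), function(x) x[which.max(x$fpkm), ])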
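
For comparison, here is a minimal base R sketch of the same idea that does not need plyr. It assumes every target_id ends in a _seq<digits> suffix, which holds for the sample data in the question but may not hold for a real data set. The names `ordered` and `best` are only illustrative.

```r
# Sample data copied from the question
DT <- data.frame(
  target_id = c("comp247393_c0_seq1", "comp257058_c0_seq4", "comp242590_c0_seq1",
                "comp77911_c0_seq1", "comp241426_c0_seq1", "comp288413_c0_seq1",
                "comp294436_c0_seq1", "comp63603_c0_seq1", "comp267138_c0_seq1",
                "comp267138_c0_seq2", "comp321623_c0_seq1"),
  fpkm = c(3.197885, 1.624577, 1.750319, 1.293059, 1.626589, 14.828853,
           11.555596, 1.982386, 8.594494, 11.134958, 6.934149),
  stringsAsFactors = FALSE
)

# Strip the trailing _seqN suffix (assumption: ids always end in _seq<digits>)
DT$new_id <- sub("_seq[0-9]+$", "", DT$target_id)

# Sort by fpkm, highest first, then keep the first row seen for each new_id
ordered <- DT[order(-DT$fpkm), ]
best <- ordered[!duplicated(ordered$new_id), ]

print(best)
```

Sorting before calling duplicated() means the row kept for each group is the one with the largest fpkm. Unlike the ddply result, the output here is ordered by fpkm rather than by new_id.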
