r - Extracting specific rows with R

Question

I have a vector of IDs that describes membership in a group. Each ID appears only once in the list.

Example:

GO:0006169
GO:0032238
GO:0046086
GO:0006154
GO:0046085
GO:0004001

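I also have a table (3 columns, 74985 rows, no header) that contains the individual's ID (recorded as a number) in V1, the group ID in V2, and a short description of the group in V3.

Example: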

1 GO:0003674                                           molecular_function
1 GO:0005576                                         extracellular region
1 GO:0008150                                           biological_process
2 GO:0001869 negative regulation of complement activation, lectin pathway
2 GO:0004867                 serine-type endopeptidase inhibitor activity
2 GO:0005515                                              protein binding

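Each individual can belong to multiple groups, and each group can contain multiple individuals. In the example, individual 1 is in groups GO:0003674, GO:0005576 and GO:0008150.

I want to extract and keep every row of the table whose group ID matches an ID in my vector (i.e., every group in the vector). Some IDs in the first vector have no match in the table. I tried using the merge function, but without success: it seems to include the same individual multiple times within a group.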

score 3 · Accepted Answer

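I assume that by table you mean a data frame. If not, just convert it, and possibly adjust the column names with names() or use indexing.

Find the matching rows in df with %in% (wrap the test in which() if you need the numeric indices), then use the result to extract the appropriate rows: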

> df <- data.frame(g=1:10,v=1:10)
> v <- c(3,4,7,33)
> df[df$g %in% v,]
  g v
3 3 3
4 4 4
7 7 7
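
Applied to data shaped like the question's, the same idea might look like the sketch below. The names go_ids, tab, hits and unmatched are made up for illustration, the table is a four-row stand-in rather than the real 74985-row file, and V1..V3 are assumed to be the default column names of a headerless read.table() import.

```r
# Hypothetical stand-in for the vector of group IDs from the question
go_ids <- c("GO:0006169", "GO:0032238", "GO:0046086",
            "GO:0006154", "GO:0046085", "GO:0004001")

# Hypothetical stand-in for the 3-column table (V1 = individual, V2 = group ID,
# V3 = description); the descriptions of the matching rows are invented
tab <- data.frame(
  V1 = c(1, 1, 2, 2),
  V2 = c("GO:0003674", "GO:0046086", "GO:0046085", "GO:0005515"),
  V3 = c("molecular_function", "made-up description A",
         "made-up description B", "protein binding"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose group ID appears in the vector
hits <- tab[tab$V2 %in% go_ids, ]

# IDs from the vector that have no match in the table
unmatched <- setdiff(go_ids, tab$V2)
```

Because %in% only tests each table row for membership, a row is kept at most once, no matter how the ID vector is built.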

Another option is to use sqldf and then work with the data frame as if it were a table in SQL.

score 2 · Accepted Answer

Use merge:

#dummy - GO dataframe
df1 <- read.table(text="GO:0006169
GO:0032238
GO:0046086
GO:0006154
GO:0046085
GO:0004001",col.names=c("GO_ID"))

#dummy - sample
df2 <- read.table(text="
1 GO:0003674 molecular_function
1 GO:0046086 extracellular_region
1 GO:0008150 biological_process
1 GO:0046085 xxx
2 GO:0046085 negative_xx_lectinpathway
2 GO:0004867 serine-type_endopeptidase_inhibitor
2 GO:0005515 protein_binding",col.names=c("Sample_ID","GO_ID","Description"))

#output
merge(df1,df2)
#       GO_ID Sample_ID               Description
#1 GO:0046085         1                       xxx
#2 GO:0046085         2 negative_xx_lectinpathway
#3 GO:0046086         1      extracellular_region
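
On the repeated rows mentioned in the question: one possible cause (an assumption, since the real data is not shown) is that a key value occurs more than once on one side of the join, because merge() returns one row for every matching pair. A minimal sketch with made-up data, where ids, tab, dup and clean are hypothetical names:

```r
# Hypothetical ID vector in which one ID is accidentally repeated
ids <- c("GO:0046085", "GO:0046086", "GO:0046085")

# Hypothetical table of individuals and their group IDs
tab <- data.frame(
  Sample_ID = c(1, 1, 2),
  GO_ID = c("GO:0046086", "GO:0046085", "GO:0003674"),
  stringsAsFactors = FALSE
)

# merge() pairs every matching key, so the repeated ID doubles its table row
dup <- merge(data.frame(GO_ID = ids, stringsAsFactors = FALSE), tab)

# De-duplicating the IDs first yields one output row per matching table row
clean <- merge(data.frame(GO_ID = unique(ids), stringsAsFactors = FALSE), tab)
```

Here dup has three rows while clean has two, so calling unique() on the IDs before merging is a cheap safeguard.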
