r - Subset after matching

Question

I have a data frame like this (the real data.frame has 50 columns):

  "G1"            "G2"  
  SEP11          ABCC1   
  0.1365         0.1858   
  214223_at      ADAM19     
  0.1305         0.131   
  COPS4          BIK 
  0.1271         0.1143
  ACE            ALG3
  0.1333         0.119
  EMP3           GGH
  0.1246         0.1214

And another data.frame like this (it also has 50 columns):

   "G1"           "G2"  
  0.1365         0.1858   
  0.1271         0.1143    
  0.1246         0.1214

I would like the following output:

  "G1"           "G2"  
 SEP11          ABCC1  
 0.1365         0.1858  
 COPS4          BIK     
 0.1271         0.1143   
 EMP3           GGH
 0.1246         0.1214

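Can anyone help me?

Basically, once R finds a match between "0.1365" in data.frame 1 and "0.1365" in data.frame 2, it should extract from data.frame 1 the name associated with the matched number. The question I want to answer is: which element of data.frame 1 is associated with that number?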

score 1 · Accepted Answer

df1 <- read.table(text=" G1            G2  
  SEP11          ABCC1   
  0.1365         0.1858   
  214223_at      ADAM19     
  0.1305         0.131   
  COPS4          BIK 
  0.1271         0.1143
  ACE            ALG3
  0.1333         0.119
  EMP3           GGH
  0.1246         0.1214",header=TRUE,stringsAsFactors=FALSE)

df2 <- read.table(text="G1           G2  
      0.1365         0.1858   
      0.1271         0.1143    
      0.1246         0.1214 
 ",header=TRUE,stringsAsFactors=FALSE)

#separate names and numbers
df1a <- df1[seq(from=1,to=nrow(df1)-1,by=2),]
df1b <- df1[seq(from=2,to=nrow(df1),by=2),]

#look up and merge again
df <- rbind(df1b[apply(df1b,1,paste,collapse=",") %in% apply(df2,1,paste,collapse=","),],
            df1a[apply(df1b,1,paste,collapse=",") %in% apply(df2,1,paste,collapse=","),])
df <- df[order(as.numeric(rownames(df))),]
#       G1     G2
#1   SEP11  ABCC1
#2  0.1365 0.1858
#5   COPS4    BIK
#6  0.1271 0.1143
#9    EMP3    GGH
#10 0.1246 0.1214
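
The lookup step above builds one string key per row and tests which keys from df1 also occur in df2. Here is a minimal sketch of that idea on its own, using the example data from the question. `row_key` is a hypothetical helper name introduced only for illustration, and the sketch assumes the values are written identically in both data frames, since the comparison is done on text:

```r
# Re-create the example data (same values as in the question).
df1 <- read.table(text = "G1 G2
SEP11 ABCC1
0.1365 0.1858
214223_at ADAM19
0.1305 0.131
COPS4 BIK
0.1271 0.1143
ACE ALG3
0.1333 0.119
EMP3 GGH
0.1246 0.1214", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "G1 G2
0.1365 0.1858
0.1271 0.1143
0.1246 0.1214", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: collapse each row into one comma-separated key.
row_key <- function(d) do.call(paste, c(unname(as.list(d)), sep = ","))

# Even-numbered rows of df1 hold the numbers.
value_rows <- seq(2, nrow(df1), by = 2)
hit <- row_key(df1[value_rows, ]) %in% row_key(df2)

# Keep each matching value row together with the name row just above it.
keep <- sort(c(value_rows[hit] - 1, value_rows[hit]))
result <- df1[keep, ]
```

Because the keys are strings, a value stored as "0.1310" in df1 would not match 0.131 in df2; if that can happen in the real data, comparing numeric values is probably safer.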

score 0 · Accepted Answer

Assuming your data come in pairs of rows, this should work:

Your data:

df1 <- read.table(header = TRUE, text = '  "G1"            "G2"
                  SEP11          ABCC1
                  0.1365         0.1858
                  214223_at      ADAM19
                  0.1305         0.131
                  COPS4          BIK
                  0.1271         0.1143
                  ACE            ALG3
                  0.1333         0.119
                  EMP3           GGH
                  0.1246         0.1214')
df2 <- read.table(header = TRUE, text = ' "G1"           "G2"
                  0.1365         0.1858
                  0.1271         0.1143
                  0.1246         0.1214 ')

Match the specified values, and also pick up the row just above each match:

myMatch <- which(df1$G1 %in% df2$G1)
myMatch <- sort(c(myMatch, myMatch-1))

Subset:

df1[myMatch, ]
#        G1     G2
# 1   SEP11  ABCC1
# 2  0.1365 0.1858
# 5   COPS4    BIK
# 6  0.1271 0.1143
# 9    EMP3    GGH
# 10 0.1246 0.1214
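
Since this approach relies on the data coming in pairs of rows (name row, then value row), it may be worth checking that assumption before subsetting. The following is only a sketch: `all_numeric` and `pairs_ok` are hypothetical names, and the check only confirms that the row count is even and that every even-numbered row parses as numbers.

```r
# Re-create df1 from the question.
df1 <- read.table(text = "G1 G2
SEP11 ABCC1
0.1365 0.1858
214223_at ADAM19
0.1305 0.131
COPS4 BIK
0.1271 0.1143
ACE ALG3
0.1333 0.119
EMP3 GGH
0.1246 0.1214", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if every entry of a column parses as a number.
all_numeric <- function(col) {
  !anyNA(suppressWarnings(as.numeric(as.character(col))))
}

# Even-numbered rows are expected to hold the values.
value_rows <- df1[seq(2, nrow(df1), by = 2), , drop = FALSE]

# The layout looks paired if the row count is even and all value rows are numeric.
pairs_ok <- nrow(df1) %% 2 == 0 &&
  all(vapply(value_rows, all_numeric, logical(1)))
```

If `pairs_ok` is FALSE, the `myMatch - 1` step could pull in the wrong rows, so the data would need to be inspected first.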

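Update

Borrowing a bit from Roland's approach: if you are trying to match across multiple columns, then merge may indeed be the more appropriate approach. Unfortunately, your data are not currently in a form that is easy to merge, but that is easy to fix:

"Fix" your "df1" data.frame by separating the names from the values and cbinding the output.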

df1.new <- cbind(df1[seq(from = 1, to = nrow(df1), by = 2), ], 
                 df1[seq(from = 2, to = nrow(df1), by = 2), ])

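Rename the columns in the first half of the data to indicate that they are names. The columns in the second half keep their names so they can be used for merging.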

names(df1.new)[1:(ncol(df1.new)/2)] <- 
  paste(names(df1.new[1:(ncol(df1.new)/2)]), "Name", sep = ".")
df1.new
#     G1.Name G2.Name     G1     G2
# 1     SEP11   ABCC1 0.1365 0.1858
# 3 214223_at  ADAM19 0.1305  0.131
# 5     COPS4     BIK 0.1271 0.1143
# 7       ACE    ALG3 0.1333  0.119
# 9      EMP3     GGH 0.1246 0.1214

Use merge() to get the "subset" of the data.

merge(df1.new, df2)
#       G1     G2 G1.Name G2.Name
# 1 0.1246 0.1214    EMP3     GGH
# 2 0.1271 0.1143   COPS4     BIK
# 3 0.1365 0.1858   SEP11   ABCC1

In general, this "wider" data.frame is probably more convenient for you to work with.
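
One reason the wider layout can be more convenient is that the value columns no longer share a column with the names, so they can be stored as numbers. A minimal sketch of that variation, assuming the example data from the question; the names `name_rows`, `value_rows`, `wide` and `matched` are illustrative and not part of the original answer:

```r
# Re-create the example data (same values as in the question).
df1 <- read.table(text = "G1 G2
SEP11 ABCC1
0.1365 0.1858
214223_at ADAM19
0.1305 0.131
COPS4 BIK
0.1271 0.1143
ACE ALG3
0.1333 0.119
EMP3 GGH
0.1246 0.1214", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "G1 G2
0.1365 0.1858
0.1271 0.1143
0.1246 0.1214", header = TRUE, stringsAsFactors = FALSE)

# Split the paired rows: odd rows are names, even rows are values.
name_rows  <- df1[seq(1, nrow(df1), by = 2), , drop = FALSE]
value_rows <- df1[seq(2, nrow(df1), by = 2), , drop = FALSE]

# The values were read as text because each column mixes names and numbers,
# so convert them to numeric now that they are separated.
value_rows[] <- lapply(value_rows, as.numeric)
names(name_rows) <- paste(names(name_rows), "Name", sep = ".")

# One row per name/value pair, then merge on the shared G1 and G2 columns.
wide <- cbind(name_rows, value_rows)
matched <- merge(wide, df2)
```

With numeric value columns, ordinary comparisons such as `wide[wide$G1 > 0.13, ]` also become possible.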
