r - Create a new row that assigns M/F to columns based on their headers, referencing a second table?

Question

I'm new to R (and to coding in general), and I'd really like to figure out how to solve this problem.

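I have a very large dataset; the columns are sample IDs (about 7,000 samples) and the rows are gene expression values (about 20,000 genes). The column headers are BIOPSY1-A, BIOPSY1-B, BIOPSY1-C, ..., BIOPSY200-Z. Each number (1-200) is a different patient, and each letter (-A through -Z) is a different sample from that patient.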

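I want to run some comparisons between samples from males and females. Sex is not included in this gene expression table. I have a separate file that lists the patient IDs (BIOPSY1 through BIOPSY200) and their sex, M/F.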

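I'd like to write some code that looks at a column ID (for example, BIOPSY7-A), recognizes that it contains "BIOPSY7" (but is not == BIOPSY7, since there are BIOPSY7-A through BIOPSY7-Z), finds "BIOPSY7" in the reference file, infers M/F, and creates a new row holding the M/F label.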

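Honestly, I'm so overwhelmed by the coding that I tried opening the file in Excel to type in M/F for the 7,000 columns by hand, since that might be faster. However, the file is so large that Excel crashes when opening it.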

Any input or resources that could put me on the right path would be greatly appreciated!

score 0 · Accepted Answer

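I don't know exactly what your data looks like, so I built mine from your description. I'm sure you can adapt this answer to your needs and the structure of your dataset: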

library(data.table)
genderfile <- data.frame(ID = c("BIOPSY1", "BIOPSY2", "BIOPSY3", "BIOPSY4", "BIOPSY5"),
                         Gender = c("F", "M", "M", "F", "M"))

# you can read your own gender file into R with the line below
# genderfile <- read.csv("~/gender file.csv")
View(genderfile)

# toy expression table: 3 genes (rows) x 15 samples (columns)
df <- matrix(rnorm(45, mean = 10, sd = 5), nrow = 3)
colnames(df) <- c("BIOPSY1-A", "BIOPSY1-B", "BIOPSY1-C", "BIOPSY2-A", "BIOPSY2-B", "BIOPSY2-C", "BIOPSY3-A", "BIOPSY3-B", "BIOPSY3-C", "BIOPSY4-A", "BIOPSY4-B", "BIOPSY4-C", "BIOPSY5-A", "BIOPSY5-B", "BIOPSY5-C")
df <- cbind(Gene = 1:3, df)
df <- as.data.frame(df)
# you can read your own main table into R with the line below; fread keeps the dashes
# in the column names instead of turning them into periods (the data.table package
# must be installed and loaded)
# df <- fread("~/first file.csv")
View(df)

Note that the following line strips the dash and the letter from the column names of df (I drop the first column with df[,-c(1)] because it is the Gene id):

substr(x=names(df[,-c(1)]),start=1,stop=nchar(names(df[,-c(1)]))-2)
#[1] "BIOPSY1" "BIOPSY1" "BIOPSY1" "BIOPSY2" "BIOPSY2" "BIOPSY2" "BIOPSY3" "BIOPSY3" "BIOPSY3" "BIOPSY4" "BIOPSY4"
#[12] "BIOPSY4" "BIOPSY5" "BIOPSY5" "BIOPSY5"
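
The substr call above assumes every suffix is exactly two characters long (a dash plus one letter). As a hedged alternative, a regular expression can strip the last dash and whatever follows it, regardless of length. The vector sample_ids below is made up for illustration, and "BIOPSY12-AA" is an invented two-letter suffix that may not occur in the real data:

```r
# Hypothetical column names; "BIOPSY12-AA" is an invented two-letter suffix
sample_ids <- c("BIOPSY7-A", "BIOPSY7-Z", "BIOPSY12-AA", "BIOPSY200-B")

# Remove the final dash and everything after it
patient_ids <- sub("-[^-]*$", "", sample_ids)
patient_ids
#[1] "BIOPSY7"   "BIOPSY7"   "BIOPSY12"  "BIOPSY200"
```

On the real table the same call would be applied to names(df)[-1] instead of sample_ids.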

Now we are ready to match the columns of df against the IDs in genderfile to get the Gender column:

Gender <- genderfile[, "Gender"][match(substr(x = names(df[, -c(1)]), start = 1, stop = nchar(names(df[, -c(1)])) - 2), genderfile[, "ID"])]
Gender
 #[1] F F F M M M M M M F F F M M M
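
With about 200 patients it may be worth checking that every sample actually found a partner in the gender file, since match() returns NA for IDs it cannot find. A minimal sketch on toy values follows; the names lookup, patient_ids, sex and missing_ids are hypothetical, and "BIOPSY9" is deliberately left out of the lookup table:

```r
# Toy lookup table and per-column patient IDs (illustrative values only)
lookup <- data.frame(ID = c("BIOPSY1", "BIOPSY2"),
                     Gender = c("F", "M"),
                     stringsAsFactors = FALSE)
patient_ids <- c("BIOPSY1", "BIOPSY1", "BIOPSY2", "BIOPSY9")

# match() gives the row of each ID in the lookup table, or NA if it is absent
idx <- match(patient_ids, lookup$ID)
sex <- lookup$Gender[idx]

# Patient IDs that have no entry in the lookup table
missing_ids <- unique(patient_ids[is.na(idx)])
missing_ids
#[1] "BIOPSY9"
```

If missing_ids is not empty, the usual suspects are differences in spelling, case or whitespace between the two files.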

The last step is to add the Gender defined above to df as a row:

df_withGender <- rbind(c("Gender", as.character(Gender)), df)
View(df_withGender)
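
One thing to keep in mind: a data frame column holds a single type, so a row of "M"/"F" labels forces the expression values in those columns to be stored as text. For the male versus female comparisons themselves, it may be simpler to keep the labels as a separate vector and use it to select columns. This is only a sketch on invented data; expr, Gender, female_mean and male_mean are hypothetical names, and the F/M assignment is assumed:

```r
set.seed(1)  # reproducible toy data
expr <- matrix(rnorm(12, mean = 10, sd = 5), nrow = 3)
colnames(expr) <- c("BIOPSY1-A", "BIOPSY1-B", "BIOPSY2-A", "BIOPSY2-B")
rownames(expr) <- paste0("Gene", 1:3)

# One label per column, in column order (assumed: BIOPSY1 is F, BIOPSY2 is M)
Gender <- c("F", "F", "M", "M")

# Per-gene mean expression within each group; the values stay numeric
female_mean <- rowMeans(expr[, Gender == "F", drop = FALSE])
male_mean   <- rowMeans(expr[, Gender == "M", drop = FALSE])
data.frame(female_mean, male_mean)
```

The same logical indexing works with the Gender vector produced by the match() step, as long as it is in the same order as the sample columns.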
