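I successfully completed the DADA2 pipeline tutorial ( https://benjjneb.github.io/dada2/tutorial.html ) with my own data, but I am having trouble with the hand-off to phyloseq. I need to construct a simple data.frame from information encoded in the file names. This is the code provided in the tutorial.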
#Make a data.frame holding the sample data
samples.out <- rownames(seqtab.nochim)
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
gender <- substr(subject,1,1)
subject <- substr(subject,2,999)
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))
samdf <- data.frame(Subject=subject, Gender=gender, Day=day)
samdf$When <- "Early"
samdf$When[samdf$Day>100] <- "Late"
rownames(samdf) <- samples.out
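My case should be simpler than the tutorial's, because time is not a factor for me. I only have six treatment groups.
Here is what I am trying to work out.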
#Make a data.frame holding the sample data
samples.out <- rownames(seqtab.nochim)
#create vector with the treatments
trtmt <- c("EM", "EP", "EM", "AR37", "NEA2", "AR1", "AR37", "NEA2", "EP", "NEA2", "EP", "EM", "AR37", "EP", "NEA2", "Ctrl", "Ctrl", "AR37", "EP", "AR37", "AR37", "EP", "AR1", "AR1", "EP", "EM", "EM", "AR37", "AR1", "EM", "AR37", "NEA2", "AR1", "Ctrl", "EP", "Ctrl", "EP", "AR37", "AR37")
#Combine the sample names with the treatments
#(samples.out is a character vector, so cbind() returns a character matrix, not a data.frame)
samples.out_2 <- cbind(samples.out, new_col = trtmt)
#Rename columns
colnames(samples.out_2)[colnames(samples.out_2) == "samples.out"] <- "Sample"
colnames(samples.out_2)[colnames(samples.out_2) == "new_col"] <- "Treatment"
#Head of samples.out_2 (I have a total of 39 samples and 6 treatment groups)
Sample Treatment
193 EM
194 EP
196 EM
197 AR37
198 NEA2
#Still stuck on how to make this relevant to my metadata!
#What does the "D" mean? (I think it has to do with the mouse dataset used in the tutorial.)
#I am also not sure what I need to pull from my data.frame.
#And what does '[' mean? I know the meanings of operators like [], (), etc., but not of a single one in quotes.
sample <- sapply(strsplit(samples.out_2, "D"), `[`, 1)
treatment <- substr(sample,1,39) #I don't understand what I am trying to extract or change
sample <- substr(sample,2,999) #I don't understand what I am trying to extract or change
samdf <- data.frame(Sample=sample, Treatment=treatment)
rownames(samdf) <- samples.out
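On the "D" and `[` questions, a minimal sketch of what the tutorial lines appear to do. It assumes sample names shaped like `F3D141` (gender letter, subject number, a literal "D", then the day). The three names below are illustrative stand-ins, not real data. In R, `[` in backticks is the subsetting operator used as an ordinary function, so `sapply(x, `[`, 1)` takes the first element of each list entry.

```r
# Hypothetical sample names in the style of the tutorial's mouse data:
# gender letter + subject number + "D" + day number.
samples.out <- c("F3D0", "F3D141", "M1D9")

# strsplit() cuts each name at the literal "D" and returns a list of pieces.
pieces <- strsplit(samples.out, "D")
print(pieces[[2]])  # the pieces of "F3D141"

# `[` is the subsetting function itself: x[1] is the same as `[`(x, 1).
# sapply(pieces, `[`, 1) therefore takes the first piece of every element.
subject <- sapply(pieces, `[`, 1)
day <- as.integer(sapply(pieces, `[`, 2))

gender <- substr(subject, 1, 1)     # first character of the first piece
subject <- substr(subject, 2, 999)  # everything after the first character

print(data.frame(Subject = subject, Gender = gender, Day = day,
                 row.names = samples.out))
```

If the sample names contain no "D" and carry no encoded metadata, as with purely numeric names, none of this parsing has anything to extract.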
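If anyone has completed this tutorial with their own data and understands this transition, I would greatly appreciate your insight. Thanks.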
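One possible way to build the sample table for a treatment-only design, sketched under assumptions: `seqtab.nochim` below is a made-up stand-in whose counts and ASV column names are invented, and only the five sample names and treatments shown in the head above are used. The idea is that the treatments are already known in row order, so they can go straight into a data.frame without any `strsplit()` or `substr()` parsing.

```r
# Stand-in for the real seqtab.nochim: only its row names matter here.
# The counts and the ASV column names are invented for illustration.
seqtab.nochim <- matrix(1:10, nrow = 5,
                        dimnames = list(c("193", "194", "196", "197", "198"),
                                        c("ASV1", "ASV2")))
samples.out <- rownames(seqtab.nochim)

# Treatments listed in the same order as the rows of seqtab.nochim.
trtmt <- c("EM", "EP", "EM", "AR37", "NEA2")

# Guard against a mismatch between the number of samples and treatments.
stopifnot(length(trtmt) == length(samples.out))

# Nothing has to be parsed out of the names, so build the table directly.
samdf <- data.frame(Sample = samples.out,
                    Treatment = factor(trtmt),
                    stringsAsFactors = FALSE)

# Row names must match the sample names so the two tables can be aligned.
rownames(samdf) <- samples.out

print(samdf)
```

With the real data, `trtmt` would be the full 39-element vector, which must be in the same order as `rownames(seqtab.nochim)`. The tutorial then passes `samdf` to `sample_data()` when constructing the phyloseq object.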