我已经阅读了很多与数据争吵和“重复”t 检验相关的帖子,但我无法弄清楚在我的情况下实现它的方法。
您可以在此处获取 StackOverflow 的示例数据集:https ://www.dropbox.com/s/0b618fs1jjnuzbg/dataset.example.stckovflw.txt?dl=0
我有一个 gen 表达式的大数据框,例如:
> b<-read.delim("dataset.example.stckovflw.txt")
> head(b)
animal gen condition tissue LogFC
1 animalcontrol1 kjhss1 control brain 7.129283
2 animalcontrol1 sdth2 control brain 7.179909
3 animalcontrol1 sgdhstjh20 control brain 9.353147
4 animalcontrol1 jdygfjgdkydg21 control brain 6.459432
5 animalcontrol1 shfjdfyjydg22 control brain 9.372865
6 animalcontrol1 jdyjkdg23 control brain 9.541097
> str(b)
'data.frame': 21507 obs. of 5 variables:
$ animal : Factor w/ 25 levels "animalcontrol1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ gen : Factor w/ 1131 levels "dghwg1041","dghwg1086",..: 480 761 787 360 863 385 133 888 563 738 ...
$ condition: Factor w/ 5 levels "control","treatmentA",..: 1 1 1 1 1 1 1 1 1 1 ...
$ tissue : Factor w/ 2 levels "brain","heart": 1 1 1 1 1 1 1 1 1 1 ...
$ LogFC : num 7.13 7.18 9.35 6.46 9.37 ...
每组有5只动物,每只动物有许多量化的基因。(然而,每只动物可能有一组不同的量化基因组,而且许多基因组在动物和群体之间是共同的)。
我想对我的治疗组(A、B、C 或 D)和对照组之间的每一代进行 t 检验。数据应以表格形式呈现,其中包含每组中每个基因的 p 值。
因为我有这么多的基因(千),所以我不能对每个基因进行子集化。
你知道我怎样才能自动化这个过程吗?
我正在考虑一个循环,但我绝对不确定它能否实现我想要的以及如何进行。
此外,我正在使用以下apply
函数查看这些帖子:Apply t-test on many columns in a dataframe split by factor and Looping through t.tests for data frame subsets in r
@andrew_reece:非常感谢你。这几乎正是我想要的。但是,我找不到使用 t 检验的方法。ANOVA 是有趣的信息,但我需要知道哪些治疗组与我的对照组有显着差异。我还需要知道哪个治疗组彼此之间存在显着差异,“两个两个”。
我一直在尝试通过更改“t.test(...)”中的“aov(..)”来使用您的代码。为此,首先我实现了一个子集(b, condition == "control" | condition == "treatmentA" ),以便仅比较两组。但是,在 csv 文件中导出结果表时,该表是无法理解的(没有生成名称、没有 p 值等,只有数字)。我将继续寻找一种方法来正确地做到这一点,但直到现在我被困住了。
@42:
非常感谢您提供这些提示。这只是一个数据集示例,假设我们必须使用单独的 t 检验。
这是探索我的数据的非常有用的开始。例如,我一直在尝试用 Venndiagrams 表示我的数据。我可以编写我的代码,但这有点超出最初的主题。另外,我不知道如何以不那么挑剔的方式总结在每种条件组合中检测到的共享“基因”,所以我只用 3 个条件进行了简化。
# Visualisation of shared genes by VennDiagrams :
# let's simplify and consider only 3 conditions :
b<-read.delim("dataset.example.stckovflw.txt")
b<- subset(b, condition == "control" | condition == "treatmentA" | condition == "treatmentB")
b1<-table(b$gen, b$condition)
b1
b2<-subset(data.frame(b1, "control" > 2
|"treatmentA" > 2
|"treatmentB" > 2 ))
b3<-subset(b2, Freq>2) # select only genes that have been quantified in at least 2 animals per group
b3
b4 = within(b3, {
Freq = ifelse(Freq > 1, 1, 0)
}) # for those observations, we consider the gene has been detected so we change the value 0 regardless the freq of occurence (>2)
b4
b5<-table(b4$Var1, b4$Var2)
write.csv(b5, file = "b5.csv")
# make an intermediate file .txt (just add manually the name of the cfirst column title)
# so now we have info
bb5<-read.delim("bb5.txt")
nrow(subset(bb5, control == 1))
nrow(subset(bb5, treatmentA == 1))
nrow(subset(bb5, treatmentB == 1))
nrow(subset(bb5, control == 1 & treatmentA == 1))
nrow(subset(bb5, control == 1 & treatmentB == 1))
nrow(subset(bb5, treatmentA == 1 & treatmentB == 1))
nrow(subset(bb5, control == 1 & treatmentA == 1 & treatmentB == 1))
library(grid)
library(futile.logger)
library(VennDiagram)
venn.plot <- draw.triple.venn(area1 = 1005,
area2 = 927,
area3 = 943,
n12 = 843,
n23 = 861,
n13 = 866,
n123 = 794,
category = c("controls", "treatmentA", "treatmentB"),
fill = c("red", "yellow", "blue"),
cex = 2,
cat.cex = 2,
lwd = 6,
lty = 'dashed',
fontface = "bold",
fontfamily = "sans",
cat.fontface = "bold",
cat.default.pos = "outer",
cat.pos = c(-27, 27, 135),
cat.dist = c(0.055, 0.055, 0.085),
cat.fontfamily = "sans",
rotation = 1);