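I am building an R function that performs a Spearman rank correlation analysis between a clinical feature of interest and each individual gene of interest across all patients. I have successfully created a function called "computeC" that does this. You can try it here: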

  1. Dummy data:

a, clinical data:

df=structure(list(lymph = c(1L, 5L, 8L, 1L, 0L), npi = c(4.036, 
6.032, 6.03, 5.042, 3.046), stage = c(2L, 2L, 3L, 2L, 2L)), row.names = c("MB-0362", 
"MB-0346", "MB-0386", "MB-0574", "MB-0503"), class = "data.frame")

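Note: the number of clinical features (columns) of interest is not fixed and depends on what the user needs. Here I chose three clinical features, "lymph", "stage" and "npi", as an example.

b, expression levels of the individual genes across all patients: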

df1=structure(list(NCOR1 = c(0.6488, 0.3312, -0.3336, 0.2663, -1.3986
), ZFP36L1 = c(-1.4278, -1.9684, -1.4047, -1.1984, 0.397), SMAD4 = c(-0.5692, 
-2.5897, -1.4175, -2.2613, 0.6804), CDKN1B = c(-0.9829, -1.7246, 
-1.1409, -1.5033, -0.8475), CDH1 = c(-0.1387, 1.5924, -0.7637, 
1.2737, 0.5298), PIK3R1 = c(0.2649, -0.2267, -0.6875, -0.8364, 
1.3622), BRCA2 = c(0.6442, 1.2209, -0.6712, -1.0785, -0.296), 
    KMT2C = c(-0.8759, -0.327, -0.0154, -0.7076, -0.0817), KRAS = c(0.5975, 
    -0.0729, 0.0069, -1.3664, -0.9904), MUC16 = c(0.4375, -0.7318, 
    -0.5569, -0.8224, -0.3882)), row.names = c("MB-0362", "MB-0346", 
"MB-0386", "MB-0574", "MB-0503"), class = "data.frame")
  2. Here is the code that builds "computeC":

computeC = function(data, var, x){

computeQ <- function(x){(x$P.value*nrow(x))/(x$rank)}

#missing input
if(missing(data)){
  stop("Error: omics input is missing \n")
}

if(missing(var)){
  stop("Error: clinical data is missing \n")
}

if(missing(x)){
  stop("Error: clinical feature column in clinical data is missing \n")
}

#implementation
cc1 <- data.frame(name=paste("Site", 1:ncol(data)),Estimate=NA ,P.value=NA)
estimates = numeric(ncol(data))
pvalues = numeric(ncol(data))
for (i in c(1:ncol(data))) {
  cc=cor.test(data[,i],var[,x],
              method = "spearman")
  cc1$Estimate[i]=cc$estimate
  cc1$P.value[i]=cc$p.value
  rownames(cc1) = colnames(data)[1:ncol(data)]
}
cc1 = cc1[,-1]
order.pvalue = order(cc1$P.value)
cc1 = cc1[order.pvalue,] #order rows following p-value
cc1$rank = rank(cc1$P.value) #re-order
cc1$Q.value = computeQ(cc1) #compute Q-value
cc1 = cc1 %>% subset(P.value <= 0.05) #only retain Genes with P <=0.05
cc1 = cc1 %>% subset(Q.value <= 0.05) #only retain Genes with Q <=0.05
cc1 = dplyr::select(cc1, -rank)
return(cc1)}
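
For reference, the inner helper computeQ evaluates p × n / rank, which is the unadjusted Benjamini-Hochberg quantity. Base R's p.adjust(p, method = "BH") starts from the same quantity but also enforces monotonicity across ranks and caps the result at 1, so the two can differ. A minimal sketch with made-up p-values (the vector `p` is an assumption for illustration, not taken from the data above):

```r
# Made-up p-values, already sorted in increasing order (illustration only)
p <- c(0.01, 0.04, 0.041, 0.042, 0.9)
n <- length(p)

# Same formula as computeQ: p-value * number of tests / rank
q_raw <- p * n / rank(p)

# Base R Benjamini-Hochberg adjustment: same starting quantity,
# followed by a cumulative minimum from the largest p-value down, capped at 1
q_bh <- p.adjust(p, method = "BH")

# Side-by-side comparison of the two versions
print(data.frame(p = p, q_raw = q_raw, q_bh = q_bh))
```

Because of the extra monotonicity step, the BH-adjusted value is never larger than the raw one, so filtering on q_raw <= 0.05 can retain fewer genes than filtering on the p.adjust result.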

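You can try running the function on its own, as shown below. Because the dummy data is small, you will not see real results, so I have posted the results as figures to make it easier to picture what the output looks like: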

library(dplyr)
computeC(df1,df,"lymph")

Figure 1

computeC(df1,df,"stage")

Figure 2

computeC(df1,df,"npi")

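Figure 3

Now I want to run computeC automatically on each clinical feature in turn and store each result in the corresponding element of a list. Here is what I tried: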

listCC=list();
for (i in length(names(df))){
    listCC[[i]] = computeC(df1,df, names(df)[i]}

But it does not work as expected. Please help me.

1 Answer

The function returns an empty data frame and gives me some warnings for the data you shared, but perhaps you can use lapply:

lapply(names(df), computeC, data = df1, var = df)
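
As for why the loop in the question fills only one slot: for (i in length(names(df))) iterates over the single value 3 rather than 1, 2, 3, and the call is also missing a closing parenthesis. Using seq_along(names(df)) gives the full index sequence. A minimal sketch, assuming a hypothetical stand-in `toy_stat` in place of computeC and a made-up table `clin`, so that it runs on its own:

```r
# Made-up clinical table (illustration only)
clin <- data.frame(lymph = c(1, 5, 8, 1, 0),
                   npi   = c(4.0, 6.0, 6.1, 5.0, 3.0),
                   stage = c(2, 2, 3, 2, 2))

# Hypothetical stand-in for computeC: takes the clinical table and a column name
toy_stat <- function(var, x) {
  data.frame(feature = x, mean = mean(var[, x]))
}

# Corrected loop: seq_along() yields 1, 2, 3; length() alone yields only 3
res_loop <- list()
for (i in seq_along(names(clin))) {
  res_loop[[names(clin)[i]]] <- toy_stat(clin, names(clin)[i])
}

# Equivalent lapply call; each column name is passed as the unnamed argument x
res_apply <- lapply(names(clin), toy_stat, var = clin)
names(res_apply) <- names(clin)   # name each element after its clinical feature

print(names(res_loop))
```

Naming the list elements after the clinical features makes each result easy to retrieve afterwards, for example res_apply[["stage"]].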
Answered 2020-07-01T05:19:45.283