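I am running into a performance problem when I try to parallelize a computation in R. I have a matrix with 8 columns (conditions) and 12000 rows (genes), and I want to determine the optimal number of clusters for the gene expression data (grouping genes that share the same expression pattern). To do this, I followed this tutorial, using the clusGap function with partitioning around medoids (PAM).

Since the computation takes a long time and I have access to a compute cluster, I plan to parallelize it.

I want to use the snow package. To evaluate the speedup, I extracted a submatrix and ran a first test on my own computer.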

library(snow)
cl <- makeCluster(8)                 # start 8 local workers
clusterEvalQ(cl, library(cluster))   # load the cluster package on every worker
clusterExport(cl, "df")              # copy the data to every worker
T1 <- Sys.time()
results <- clusterCall(cl, function(x) clusGap(df, FUN = pam, K.max = 20, B = 500, verbose = TRUE))
T2 <- Sys.time()
difftime(T2, T1)

Time difference of 14.59781 secs

T3 <- Sys.time()
clusGap(df, FUN = pam, K.max = 20, B = 500, verbose = TRUE)
T4 <- Sys.time()
difftime(T4, T3)

Time difference of 8.251367 secs

So I am a bit stuck with this test, because the computation on 1 core appears to be faster than on 8 cores :o
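One sanity check that may help narrow this down is to look at how many times the function is actually evaluated. According to the snow documentation, clusterCall calls the same function with identical arguments on every node and returns a list with one result per node. The sketch below only illustrates that behaviour; it uses the parallel package (which ships with R and offers the same makeCluster/clusterCall interface) and a trivial stand-in function instead of clusGap, both of which are assumptions made to keep the example small.

```r
library(parallel)  # assumption: used in place of snow, same makeCluster/clusterCall interface

cl <- makeCluster(2)  # two local workers are enough for the illustration

# clusterCall evaluates the same function once on EVERY worker.
# Sys.getpid() is a stand-in for the real work; it shows which process ran the call.
res <- clusterCall(cl, function() Sys.getpid())

length(res)  # one result per worker, so 2 here

stopCluster(cl)
```

If the same holds for the call above, each of the 8 workers would run the full clusGap computation with B = 500, so the total work would be multiplied by 8 rather than divided among the workers.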

Does anyone know what I am missing in this computation?

Many thanks,

sessionInfo()

> R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS  10.13.4

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
 [1] splines   stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] snow_0.4-2          cluster_2.0.6       DDRTree_0.1.5       irlba_2.3.1         VGAM_1.0-3          ggplot2_2.2.1      
[7] Biobase_2.34.0      BiocGenerics_0.20.0 Matrix_1.2-12      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17           lattice_0.20-35        tidyr_0.8.1            GO.db_3.4.0            assertthat_0.2.0      
 [6] digest_0.6.15          slam_0.1-40            R6_2.2.2               plyr_1.8.4             RSQLite_2.1.1         
[11] pillar_1.2.3           rlang_0.2.1            lazyeval_0.2.1         data.table_1.11.4      blob_1.1.1            
[16] S4Vectors_0.12.2       combinat_0.0-8         qvalue_2.6.0           BiocParallel_1.8.2     stringr_1.3.1         
[21] igraph_1.1.2           pheatmap_1.0.10        bit_1.1-14             munsell_0.5.0          fgsea_1.0.2           
[26] pkgconfig_2.0.1        tidyselect_0.2.4       tibble_1.4.2           gridExtra_2.3          matrixStats_0.53.1    
[31] IRanges_2.8.2          dplyr_0.7.5            grid_3.3.3             gtable_0.2.0           DBI_1.0.0             
[36] magrittr_1.5           scales_0.5.0           stringi_1.2.3          GOSemSim_2.0.4         reshape2_1.4.3        
[41] bindrcpp_0.2.2         limma_3.30.13          DO.db_2.9              clusterProfiler_3.2.14 fastmatch_1.1-0       
[46] fastICA_1.2-1          RColorBrewer_1.1-2     tools_3.3.3            bit64_0.9-7            glue_1.2.0            
[51] purrr_0.2.5            HSMMSingleCell_0.108.0 AnnotationDbi_1.36.2   colorspace_1.3-2       DOSE_3.0.10           
[56] memoise_1.1.0          bindr_0.1.1        