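I ran into a performance problem when trying to parallelize a computation in R. I have a matrix with 8 columns (conditions) and 12,000 rows (genes), and I want to determine the optimal number of clusters to represent gene expression (grouping genes that share the same expression pattern). To do this, I followed this tutorial, using the clusGap function with partitioning around medoids (PAM).
Since the computation is long and I have access to a compute cluster, I planned to parallelize it.
I wanted to use the snow package, so to evaluate the speed I extracted a sub-matrix and ran a first test on my computer.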
library(snow)
cl<-makeCluster(8)
clusterEvalQ(cl, library(cluster))
clusterExport(cl,"df")
T1<-Sys.time()
results <-clusterCall(cl,function(x) clusGap(df, FUN = pam, K.max = 20, B= 500,verbose=TRUE))
T2<-Sys.time()
difftime(T2, T1)
Time difference of 14.59781 secs
T3<-Sys.time()
clusGap(df, FUN = pam, K.max = 20, B = 500,verbose=TRUE)
T4<-Sys.time()
difftime(T4, T3)
Time difference of 8.251367 secs
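So I'm a bit stuck with this test, because the computation on 1 core appears to be faster than on 8 cores :o
Does anyone know what I am missing in this computation?
Many thanks,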
sessionInfo()
> R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS 10.13.4
locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8
attached base packages:
[1] splines stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] snow_0.4-2 cluster_2.0.6 DDRTree_0.1.5 irlba_2.3.1 VGAM_1.0-3 ggplot2_2.2.1
[7] Biobase_2.34.0 BiocGenerics_0.20.0 Matrix_1.2-12
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 lattice_0.20-35 tidyr_0.8.1 GO.db_3.4.0 assertthat_0.2.0
[6] digest_0.6.15 slam_0.1-40 R6_2.2.2 plyr_1.8.4 RSQLite_2.1.1
[11] pillar_1.2.3 rlang_0.2.1 lazyeval_0.2.1 data.table_1.11.4 blob_1.1.1
[16] S4Vectors_0.12.2 combinat_0.0-8 qvalue_2.6.0 BiocParallel_1.8.2 stringr_1.3.1
[21] igraph_1.1.2 pheatmap_1.0.10 bit_1.1-14 munsell_0.5.0 fgsea_1.0.2
[26] pkgconfig_2.0.1 tidyselect_0.2.4 tibble_1.4.2 gridExtra_2.3 matrixStats_0.53.1
[31] IRanges_2.8.2 dplyr_0.7.5 grid_3.3.3 gtable_0.2.0 DBI_1.0.0
[36] magrittr_1.5 scales_0.5.0 stringi_1.2.3 GOSemSim_2.0.4 reshape2_1.4.3
[41] bindrcpp_0.2.2 limma_3.30.13 DO.db_2.9 clusterProfiler_3.2.14 fastmatch_1.1-0
[46] fastICA_1.2-1 RColorBrewer_1.1-2 tools_3.3.3 bit64_0.9-7 glue_1.2.0
[51] purrr_0.2.5 HSMMSingleCell_0.108.0 AnnotationDbi_1.36.2 colorspace_1.3-2 DOSE_3.0.10
[56] memoise_1.1.0 bindr_0.1.1
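One likely explanation for the timings above: clusterCall evaluates the same function call on every node, so each of the 8 workers runs the full clusGap with B = 500 instead of a share of it, and the wall-clock time cannot drop below that of a single run (worker start-up and data transfer are added on top). A minimal sketch of one way to actually divide the work is to give each worker a fraction of the B reference data sets and average the results. It is only a sketch under assumptions: it uses the base parallel package (which offers the same snow-style functions) rather than snow itself, `df_demo`, `n_workers`, `B_total` and `K_max` are hypothetical stand-ins with tiny values so it runs quickly, and it recombines only the gap values, not SE.sim.

```r
library(parallel)  # base R package with the snow-style cluster functions
library(cluster)

# Hypothetical stand-in for the real expression sub-matrix `df`
set.seed(1)
df_demo <- matrix(rnorm(60 * 4), nrow = 60, ncol = 4)

n_workers <- 2   # assumption: set to the number of cores available
B_total   <- 8   # total number of reference data sets (tiny, for illustration)
K_max     <- 4

cl <- makeCluster(n_workers)
clusterEvalQ(cl, library(cluster))
clusterSetRNGStream(cl, 123)  # give each worker its own random stream

# Each worker simulates only its share of the B reference data sets;
# the data and K.max are passed as arguments, so no clusterExport is needed
B_per_worker <- rep(B_total / n_workers, n_workers)
parts <- parLapply(
  cl, B_per_worker,
  function(b, x, k_max) {
    clusGap(x, FUN = pam, K.max = k_max, B = b, verbose = FALSE)$Tab
  },
  x = df_demo, k_max = K_max
)
stopCluster(cl)

# logW depends only on the observed data, so it is the same in every part.
# E.logW is a mean over reference sets, so equal-sized parts can be averaged.
logW   <- parts[[1]][, "logW"]
E_logW <- Reduce(`+`, lapply(parts, function(tab) tab[, "E.logW"])) / length(parts)
gap    <- E_logW - logW
print(gap)
```

Whether this pays off depends on the size of the job: for a run that takes only a few seconds on one core, the cost of starting the workers and shipping the matrix to them may still dominate. Obtaining a combined SE.sim would additionally require pooling the per-worker simulation variances, which this sketch does not attempt.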