我在 R 中构建了一个函数(在 Ubuntu 12.04 LTS 64 位、具有多线程和 6gb ram 的 4 核 i7 服务器上运行),我使用标准包安装了 R:
sudo apt-get install r-base r-recommended r-base-dev
sudo apt-get install r-cran-multicore r-cran-iterators r-cran-foreach r-cran-domc
注意:我还在 R 中安装了foreach
& doMC
(这也没有帮助),就像我安装了deldir
软件包一样:
install.packages(c("deldir"), dependencies = TRUE)
我的函数运行良好,但它不使用并行内核(仅最大化 8 个内核中的 1 个):
library(deldir)
library(foreach)
library(doMC)
registerDoMC(cores=8)
#getDoParWorkers()
#getDoParName()
#getDoParVersion()
# loop through files
inputfiles <- dir(path="/home/geoadmin/data/objects/", pattern='.txt')
for( inputfilenr in 1:length(inputfiles))
{
# set file variables
curinputfile = paste("/home/geoadmin/data/objects/",inputfiles[[inputfilenr]], sep = "", collapse = NULL)
print (curinputfile)
curoutputfile = paste("/home/geoadmin/data/objects/",substr(inputfiles[[inputfilenr]], start=1, stop=10), '.out', sep = "", collapse = NULL)
# select the point x/y coordinates into a data frame...
points <- read.csv(curinputfile, header = TRUE, sep = ",", dec=".", fill = TRUE)
# set calculation variables, precision on 3 digits only because of the RDW coordinate system
voro = deldir(points$x, points$y, digits=3, list(ndx=2,ndy=2), rw=c(min(points$x)-abs(min(points$x)-max(points$x)), max(points$x)+abs(min(points$x)-max(points$x)), min(points$y)-abs(min(points$y)-max(points$y)), max(points$y)+abs(min(points$y)-max(points$y))))
tiles = tile.list(voro)
poly = array()
# start loop
poly <- foreach (i=1:length(tiles), .combine=cbind) %dopar%
{
# load tile info
tile = tiles[[i]]
# start with EWKB notation
curpoly = "POLYGON(("
# add list of coordinates by looping through the points in tile
for (j in 1:length(tiles[[i]]$x)) { curpoly = sprintf("%s %.6f %.6f,",curpoly,tile$x[[j]],tile$y[[j]]) }
# then again the first point to close the polygon and end the EWKB notation, adding that to the poly array
sprintf("%s %.6f %.6f))",curpoly,tile$x[[1]],tile$y[[1]])
}
write.csv(t(poly), file = curoutputfile, row.names = FALSE)
}
所以结果很好,但是没有并行性...
doMC
确实注册正确:
> getDoParWorkers()
[1] 8
> getDoParName()
[1] "doMC"
> getDoParVersion()
[1] "1.2.5"
如果我看一下用法(带top
):
top - 01:03:19 up 9 min, 3 users, load average: 1.02, 0.86, 0.45
Tasks: 131 total, 2 running, 127 sleeping, 0 stopped, 2 zombie
Cpu(s): 12.5%us, 0.0%sy, 0.0%ni, 87.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 6104932k total, 1240512k used, 4864420k free, 16656k buffers
Swap: 6283260k total, 0k used, 6283260k free, 141996k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1553 zzzzzzzz 20 0 913m 850m 3716 R 100 14.3 8:22.03 R
因此,只需最大化一个核心。有谁知道什么可能导致foreach
/doMC
不使用多个核心?
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] doMC_1.2.5 multicore_0.1-7 iterators_1.0.6 foreach_1.4.0
[5] deldir_0.0-19
loaded via a namespace (and not attached):
[1] codetools_0.2-8