r - Use a loop or a function to run a script over multiple files using R

Question

Possible Duplicate:
Which is the best method to apply a script repetitively to n .csv files in R?

I have a script that calculates kernel utilization distributions and generates a PDF of the summary plots. My script currently split the database by each "period" and uses a for loop to generate the KUD plots. I also have multiple csv files store in a directory that correspond to individual fish. What I would like is to use a function or loop to read all the files in the directory and run the script that will generate PDF plots for each individual fish and each period.

My script looks like this:

library(ks)

dd<-read.table(text="period  dist   depth
            1   4916.64 8.661827
            1   4916.64 14.789091
            1   4916.64 13.555909
            1   4916.64 12.92816
            1   4916.64 11.708774
            1   4916.64 15.28
            1   4916.64 13.369875
            1   4916.64 14.039655
            1   4916.64 13.454545
            1   4916.64 12.638261
            1   4916.64 13.251081
            1   4916.64 14.006341
            1   4916.64 12.64
            1   4916.64 15.521818
            1   4916.64 10.202121
            1   4916.64 14.816667
            1   4916.64 15.504
            1   9674.844    23.93
            1   11000.151   22.157143
            1   11414.31    22.72
            1   11414.31    25.7
            1   11414.31    19.07
            1   11414.31    23.085714
            1   9481.57 17.266667
            1   11414.31    26.8
            1   11414.31    19.382222
            1   5616.09 12.016667
            1   10658.02    18.873913
            1   11414.31    25.2
            1   11414.31    20.9
            1   11414.31    27.65
            1   11414.31    22.133333
            1   11414.31    30.9
            1   5616.09 23.3
            2   11172.718   20.391667
            2   9964.755    23.51
            2   5616.09 19.43
            2   5616.09 19.1
            2   4916.64 18.42
            2   8515.2  17.683333
            2   11414.31    22.128571
            2   11414.31    22.8608
            2   10391.095   24.955882
            2   10931.125   25.225
            2   6444.407    20.228571
            2   11276.257   23.77619
            2   10585.993   23.285714
            2   10641.214   20.653333
            2   9757.676    24.007143
            2   11414.31    18.817
            2   11414.31    23.525
            2   11414.31    22.873684
            2   11414.31    26.15
            2   10486.595   21.9
            2   11000.151   24.142857
            2   11414.31    24.3875
            2   10819.621   20.569231
            2   10360.088   29.345455
            2   9708.951    21.488235
            2   11414.31    30.775
            2   11414.31    25.5
            2   11414.31    18.477917
            2   10327.144   26.8625
            2   11414.31    26.12963
            2   11414.31    29.28125
            2   11414.31    23.166667
            2   10689.532   21.8625
            2   11414.31    28.328571
            2   11414.31    22.563158
            2   11414.31    25.490909
            2   11414.31    26.0625
            2   11414.31    34.5
            2   11414.31    17.375294
            ",header=T)

dd1<-data.frame(dd$period,dd$dist,dd$depth)

# split database for each period
period<-dd1$dd.period
M<-split(dd1,period)
l<-length(M)

# run loop through each tag and create a PDF file with all KUD plots
pdf("KUD plot.pdf",width=11,height=8,paper="a4r")
par(mfcol=c(1,1))

for(j in 1:l){

# calculate the 2D kernel
dd2<-data.frame(M[[j]]$dd.dist,M[[j]]$dd.depth)

## auto bandwidth selection
H.pi2<-Hpi(dd2,binned=TRUE)*1
ddhat<-kde(dd2,H=H.pi2)

# Kernel contour plot
plot(ddhat,cont=c(95),drawpoints=TRUE,col="black",xlab="Distance (m)",lwd=2.5, 
    ylab="Depth (m)",ptcol="grey15",cex=0.7,
    xlim=c(min(dd2[,1]-dd2[,1]*0.4),max(dd2[,1]+dd2[,1]*0.4)),ylim=c(45,-1),
    main=paste("Period"," - ",M[[j]]$dd.period[1])) 

plot(ddhat,cont=c(25),add=TRUE,col="red",lwd=2.4)
plot(ddhat,cont=c(50),add=TRUE,col="seagreen2",lwd=2.4)
plot(ddhat,cont=c(75),add=TRUE,col="royalblue",lty=5,lwd=2.5)  

}

dev.off()

Any suggestions or ideas of how to improve the script will be appreciated. Thanks in advance!

score 0 · Accepted Answer

@RomanLustrik 链接的答案不包含apply基于解决方案的解决方案，因此我将在这里介绍一个。您可能想要的功能d*ply来自plyr包。plyr做所有splitting- applying a function to each split- and combining it back together again。例如，要获得每个周期的平均深度：

library(plyr)
ddply(dd, .(period), summarise, mn = mean(depth))

您可以使用相同的想法来执行分析：

res = dlply(dd, .(period), function(x) {
   dd2 = x[-1]
   H.pi2<-Hpi(dd2,binned=TRUE)*1
   ddhat<-kde(dd2,H=H.pi2)
 })

其中res是包含分析结果的长度为 2 的列表。您可以使用类似的方法从中提取信息res并绘制它。我会用ggplot2它。

score 0 · Accepted Answer

简化代码的一种方法是将for循环替换为ddply对plyr包的调用。例如：

library(plyr)

pdf("test.pdf",width=11,height=8,paper="a4r")
invisible(ddply(dd1, .(period), function(df) {
  period <- df$dd.period[1]
  df <- df[,-1]
  ## auto bandwidth selection
  H.pi2<-Hpi(df,binned=TRUE)*1
  ddhat<-kde(df,H=H.pi2)

  ## Kernel contour plot
  plot(ddhat,cont=c(95),drawpoints=TRUE,col="black",xlab="Distance (m)",lwd=2.5, 
       ylab="Depth (m)",ptcol="grey15",cex=0.7,
       xlim=c(min(df$dd.dist-df$dd.dist*0.4),max(df$dd.dist+df$dd.dist*0.4)),ylim=c(45,-1),
       main=paste("Period"," - ",period)) 

  plot(ddhat,cont=c(25),add=TRUE,col="red",lwd=2.4)
  plot(ddhat,cont=c(50),add=TRUE,col="seagreen2",lwd=2.4)
  plot(ddhat,cont=c(75),add=TRUE,col="royalblue",lty=5,lwd=2.5)  

}))
dev.off()

然后，您可以将此代码包装到您自己的函数中，并将数据框和 pdf 文件名作为参数：

pdfks <- function(dd1, filename) {
  pdf(filename, width=11, height=8, paper="a4r")
  ...
  dev.off()
}

然后，您可以创建一个循环，依次加载每个 csv 文件并将此函数应用于每个文件。

r - Use a loop or a function to run a script over multiple files using R

2 回答 2

Related

Reference