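I am trying to convert two nested for loops into two nested foreach loops in order to change values in a data frame based on a matching condition. The reason is that I believe I can speed the process up significantly. Below is an example of my code: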

 library(foreach) # for loop to parallelize
 library(doMC) # create the number of cores to use

 # set the number of cores to use
 registerDoMC(22)  # number of CPU cores

 file_list <- c("a", "b", "c")
 ldf <- c(data.frame(Date = c("2016-10-01", "2016-10-02", "2016-10-03", "2016-10-04")),
     data.frame(Date = c("2016-10-07", "2016-10-08", "2016-10-09")),
     data.frame(Date = c("2016-10-15", "2016-10-16", "2016-10-17", "2016-10-18", "2016-10-19")))

 DF <- data.frame(Date = seq(as.POSIXct("2016-10-01", tz = "UTC"), as.POSIXct("2016-10-31", tz = "UTC"), by = 'day'),
             A = 0,
             B = 0,
             C = 0)

 DF2 <- DF # DF2 is used to compare my attempt result


 for (i in 1:length(file_list))
 {
   Date <- ldf[[i]]
   Date <- as.POSIXct(Date, tz = "UTC")

   for (j in 1:length(Date))
   {
     ROW <- which(DF$Date == Date[j])
     DF[ROW,i+1] <- 1
   }

 }

 throwaway <- foreach (i = 1:length(file_list)) %dopar%
 {
   Date <- ldf[[i]]
   Date <- as.POSIXct(Date, tz = "UTC")

   foreach (j = 1:length(Date)) %do%
   {
     ROW <- which(DF2$Date == Date[j])
     DF2[ROW,i+1] <- 1
     return(NULL)
   }
 }

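file_list is the list of files I am reading in.

ldf is the variable used to store the files that were read in.

Both variables are made up for this example, just to have something reproducible.

DF is where I want to store the changes in values made by the foreach loops.

DF2 is my attempt and where its result is stored.

The output I am looking for is DF, but DF2 remains unchanged. I understand that foreach loops are designed for their return values, but how can I get the return values to line up with the positions in the data frame whose values should change? Those positions are where the dates in each file read in from file_list match the dates in DF2. Where they match, a 1 is placed at that row (date) and column (file name). Thanks in advance for your help!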

The desired output is:

 > DF
          Date A B C
 1  2016-10-01 1 0 0
 2  2016-10-02 1 0 0
 3  2016-10-03 1 0 0
 4  2016-10-04 1 0 0
 5  2016-10-05 0 0 0
 6  2016-10-06 0 0 0
 7  2016-10-07 0 1 0
 8  2016-10-08 0 1 0
 9  2016-10-09 0 1 0
 10 2016-10-10 0 0 0
 11 2016-10-11 0 0 0
 12 2016-10-12 0 0 0
 13 2016-10-13 0 0 0
 14 2016-10-14 0 0 0
 15 2016-10-15 0 0 1
 16 2016-10-16 0 0 1
 17 2016-10-17 0 0 1
 18 2016-10-18 0 0 1
 19 2016-10-19 0 0 1
 20 2016-10-20 0 0 0
 21 2016-10-21 0 0 0
 22 2016-10-22 0 0 0
 23 2016-10-23 0 0 0
 24 2016-10-24 0 0 0
 25 2016-10-25 0 0 0
 26 2016-10-26 0 0 0
 27 2016-10-27 0 0 0
 28 2016-10-28 0 0 0
 29 2016-10-29 0 0 0
 30 2016-10-30 0 0 0
 31 2016-10-31 0 0 0

1 Answer

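Consider using no loops at all and instead applying Reduce() with merge across all items in the list of data frames. However, you will need to set up your data frame and list slightly differently.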

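First, add the sequential Date data frame as the first element of the list. Then, in each of the files you read in, add a second column corresponding to A, B, or C, with every value equal to 1 (this can be done in the lapply or for loop used in the read-in process; post that part if you would like a demonstration). Altogether, the result matches the original DF exactly, as shown with all.equal: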

# INITIALIZE LIST WITH DATE SEQUENCE DF
newldf <- list(data.frame(Date = as.factor(seq(as.POSIXct("2016-10-01", tz = "UTC"), 
                                  as.POSIXct("2016-10-31", tz = "UTC"), 
                                  by = 'day'))))

# APPEND LIST OF DATA FRAMES THAT ARE READ IN, EACH WITH SECOND COL = 1
newldf <- append(newldf,
                list(data.frame(Date = c("2016-10-01", "2016-10-02", 
                                         "2016-10-03", "2016-10-04"), A = 1),
                     data.frame(Date = c("2016-10-07", "2016-10-08", 
                                         "2016-10-09"), B = 1),
                     data.frame(Date = c("2016-10-15", "2016-10-16", 
                                         "2016-10-17", "2016-10-18", "2016-10-19"), C=1)))

# MERGE ALL DATA FRAMES TOGETHER
newDF <- Reduce(function(...) merge(..., by=c("Date"), all=TRUE), newldf)
newDF[is.na(newDF)] <- 0                                # CONVERT NAs TO ZEROs
newDF$Date <- as.POSIXct(newDF$Date, tz = "UTC")        # CONVERT DATE TO POSIXct
str(newDF)
# 'data.frame': 31 obs. of  4 variables:
#  $ Date: POSIXct, format: "2016-10-01" "2016-10-02" ...
#  $ A   : num  1 1 1 0 0 0 0 0 0 0 ...
#  $ B   : num  0 0 0 0 0 0 1 1 1 0 ...
#  $ C   : num  0 0 0 0 0 0 0 0 0 0 ...

str(DF)
# 'data.frame': 31 obs. of  4 variables:
#  $ Date: POSIXct, format: "2016-10-01" "2016-10-02" ...
#  $ A   : num  1 1 1 0 0 0 0 0 0 0 ...
#  $ B   : num  0 0 0 0 0 0 1 1 1 0 ...
#  $ C   : num  0 0 0 0 0 0 0 0 0 0 ...

all.equal(DF, newDF)
# [1] TRUE
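
The step of adding the indicator column during read-in is only described above, so here is a minimal base R sketch of one way it might look. It assumes the files have already been read into a plain list of data frames (the ldf below is a hypothetical stand-in built with list() rather than c()), and that the column names are simply the upper-cased entries of file_list. The names flagged, dates and merged are illustrative only.

```r
# Hypothetical stand-in for the data frames read from the files in file_list
file_list <- c("a", "b", "c")
ldf <- list(data.frame(Date = c("2016-10-01", "2016-10-02", "2016-10-03", "2016-10-04")),
            data.frame(Date = c("2016-10-07", "2016-10-08", "2016-10-09")),
            data.frame(Date = c("2016-10-15", "2016-10-16", "2016-10-17",
                                "2016-10-18", "2016-10-19")))

# Add an indicator column named after each file (A, B, C), set to 1 on every row
flagged <- Map(function(df, nm) {
  df[[nm]] <- 1
  df
}, ldf, toupper(file_list))

# Full date sequence kept as character so it matches the Date columns above
dates <- data.frame(Date = format(seq(as.Date("2016-10-01"),
                                      as.Date("2016-10-31"), by = "day")),
                    stringsAsFactors = FALSE)

# Left-join each flagged data frame onto the date sequence, one at a time
merged <- Reduce(function(x, y) merge(x, y, by = "Date", all.x = TRUE),
                 flagged, dates)

# Dates absent from a file come back as NA; turn them into 0
merged[is.na(merged)] <- 0
```

Starting Reduce from the date sequence and merging with all.x = TRUE keeps exactly one row per calendar day, so dates that appear in no file still show up with zeros. The Date column stays character here; convert it with as.POSIXct(merged$Date, tz = "UTC") if it needs to be compared against DF.
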
answered 2016-10-31T01:41:30.243