I am trying to create a data frame based on information in another data frame.
The first data frame (base_mar_bop) has data like this:
201301|ABC|4
201302|DEF|12
From this I want to create a data frame with 16 rows:
4 times: 201301|ABC|1
12 times: 201302|DEF|1
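I wrote a script that takes a very long time to run. To give an idea of scale: the final data frame has about 2 million rows and the source data frame has about 10k rows. Because the data is confidential, I cannot post the source file for the data frame.
Because this code takes so long to run, I decided to do it in PHP instead. That ran in under a minute and got the job done, writing the result to a txt file, which I then imported into R.
I have no idea why R takes so long. Is it the function call? Is it the nested for loop? From my point of view there are not that many computationally intensive steps in there.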
# first create a dataframe called base_eop that will hold each subscriber on a row,
# identified by CED, RATEPLAN and 1,
# where 1 is the count and the sum of 1 should end up with the base
base_eop <- base_mar_bop[1,]
# let's give some logical names to the columns in the df
names(base_eop) <- c('CED','RATEPLAN','BASE')
# define the function that enables us to insert a row at the bottom of the dataframe
insertRow <- function(existingDF, newrow, r) {
  existingDF[seq(r+1,nrow(existingDF)+1),] <- existingDF[seq(r,nrow(existingDF)),]
  existingDF[r,] <- newrow
  existingDF
}
# now loop through the bop base for march, each row contains the ced, rateplan and number of subs
# we need to insert a row for each individual sub
for (i in 1:nrow(base_mar_bop)) {
  # we go through every row in the dataframe
  for (j in 1:base_mar_bop[i,3]) {
    # we insert a row for each CED, rateplan combination and set the base value to 1
    base_eop <- insertRow(base_eop,c(base_mar_bop[i,1:2],1),nrow(base_eop))
  }
}
# since the dataframe was created using the first row of base_mar_bop we need to remove this first row
base_eop <- base_eop[-1,]
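For comparison, here is a minimal sketch of the same expansion done in one step instead of growing the data frame row by row. It assumes the three columns are named CED, RATEPLAN and BASE as above, and that BASE holds non-negative whole numbers. The toy input only mirrors the two sample rows shown earlier, so this is a sketch under those assumptions and not a tested replacement for the full script.

```r
# Toy input mirroring the two sample rows (illustrative values, not the real data).
base_mar_bop <- data.frame(
  CED      = c(201301, 201302),
  RATEPLAN = c("ABC", "DEF"),
  BASE     = c(4, 12),
  stringsAsFactors = FALSE
)

# Build a vector of row indices in which row i appears BASE[i] times.
idx <- rep(seq_len(nrow(base_mar_bop)), times = base_mar_bop$BASE)

# Index the data frame once with that vector, then set the count to 1 on every row.
base_eop <- base_mar_bop[idx, c("CED", "RATEPLAN")]
base_eop$BASE <- 1
rownames(base_eop) <- NULL

# The sum of the 1s should equal the original base total.
stopifnot(sum(base_eop$BASE) == sum(base_mar_bop$BASE))
```

The loop version copies the whole data frame on every inserted row, whereas this allocates the result once, which may be where most of the running time goes.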