I have a lot of CSV files that need to be standardized. I created a dictionary for doing so, and so far the function I have looks like this:

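library(readr)    # read_delim(), write_delim()
library(stringr)  # str_replace()

inputpath <- "input"

# pattern is a regular expression, not a glob; full.names = TRUE keeps the directory
files <- list.files(path = inputpath, pattern = "\\.gz$", full.names = TRUE)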

standardizefunctiontofiles <- lapply(files, function(x) {
    DF <- read_delim(x, delim = "|", na = "")
    names(DF) <- dictionary$final_name[match(names(DF), dictionary$old_name)]
    DF  # return the renamed data frame rather than the value of the assignment
})

The issue is that once I read the CSV files into data frames they lose their path, so I can't write each of them out as a CSV whose name matches the input name. What I would normally do is:

output_name <- str_replace(x, "input", "output")
write_delim(DF, output_name, delim = "|")

I was thinking that one way to solve this would be to change this step:

DF <- read_delim(x, delim = "|",  na="")

so that DF gets the path as its name, but I haven't found any solution for that.

Any ideas on how to solve this, so that I can apply the function and write each file out as a standardized CSV?

1 Answer

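I don't fully understand the question, but as far as I can tell you want to write out new CSV files containing the modified (and corrected) data frames, named after the CSV files that were read.

I think you have a few options.

Option 1) When reading the data, store the CSV as a data frame and the path as a string, together in a list.

That would look like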

file_list <- list()

for (i in seq_along(files)) {
  file_list[[i]] <- list(df = read_delim(files[[i]], delim = "|",  na = ""),
                         path = files[[i]])
}

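Then, when you write out the corrected data frames, you can use the path stored in the second element of each inner list of file_list. Note that to get the path as a string you need to do something like file_list[[1]][["path"]].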
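
To make Option 1 concrete, here is a minimal sketch of the write-back step. It assumes the dictionary data frame from the question (columns old_name and final_name), an output directory next to input, and that readr is installed. The temporary folder, the sample file and the column names are made up for illustration.

```r
library(readr)

# Hypothetical setup: a temporary "input" folder with one pipe-delimited .gz file
root <- tempfile("demo_")
dir.create(file.path(root, "input"), recursive = TRUE)
dir.create(file.path(root, "output"))
write_delim(data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE),
            file.path(root, "input", "one.csv.gz"), delim = "|")

# Hypothetical dictionary mapping old column names to standardized ones
dictionary <- data.frame(old_name = c("a", "b"),
                         final_name = c("id", "label"),
                         stringsAsFactors = FALSE)

files <- list.files(file.path(root, "input"), pattern = "\\.gz$",
                    full.names = TRUE)

# Option 1: keep each data frame together with the path it came from
file_list <- lapply(files, function(p) {
  list(df = read_delim(p, delim = "|", na = ""), path = p)
})

for (item in file_list) {
  df <- item$df
  names(df) <- dictionary$final_name[match(names(df), dictionary$old_name)]
  # Reuse only the file name, so the output mirrors the input
  out_path <- file.path(root, "output", basename(item$path))
  write_delim(df, out_path, delim = "|")
}
```

Building the output path from basename() instead of replacing "input" in the whole string avoids surprises when "input" also appears elsewhere in the path. Columns missing from the dictionary come back from match() as NA, so they may need separate handling.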

Option 2) Use assign

for (i in seq_along(files)) {
   assign(files[[i]], read_delim(files[[i]], delim = "|",  na = ""))
}
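
With assign, each data frame ends up in a variable whose name is the path, so the path can be recovered from the variable name and the data with get(). A small sketch of that lookup, using in-memory stand-ins instead of real files; the paths are hypothetical, and the separate environment is my own choice to keep the workspace tidy (the loop above assigns into the current environment instead).

```r
# Hypothetical paths; the data frames stand in for the result of read_delim()
files <- c("input/a.csv.gz", "input/b.csv.gz")
store <- new.env()  # separate environment, so the workspace is not cluttered

for (i in seq_along(files)) {
  assign(files[[i]], data.frame(id = i), envir = store)
}

# The variable names are the paths themselves
stored_paths <- ls(store)

# Retrieve one data frame by its path and derive the output name from it
path <- stored_paths[[1]]
df <- get(path, envir = store)
out_path <- sub("^input", "output", path)
```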

Option 3) Use do.call and the fact that <- is a function!

for (i in seq_along(files)) {
   do.call("<-", list(files[[i]], read_delim(files[[i]], delim = "|",  na = "")))
}
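
For readers who have not seen the assignment operator used this way, here is a short sketch of why the do.call form works. The path and the data frame are hypothetical stand-ins.

```r
# The assignment operator can be called like any other function
`<-`(x, 5)  # equivalent to x <- 5

# do.call() builds the same call from a function name and an argument list;
# a length-one character string is accepted as the assignment target
target <- "input/a.csv.gz"  # hypothetical path
do.call("<-", list(target, data.frame(id = 1)))

# The object now exists under the path as its name
df <- get(target)
```

As with assign, a name that is not syntactically valid (such as a path) has to be fetched with get() or quoted in backticks.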

I hope this is useful!

NB) None of these snippets is implemented as efficiently as possible. They are only meant to introduce the idea.
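
If keeping every data frame in memory is not needed, one possible alternative is to read, rename and write each file in a single pass, since the path is still available inside the function at that point. This is only a sketch: standardize_file is a hypothetical helper, the demo data is made up, and readr is assumed to be installed.

```r
library(readr)

# Hypothetical helper: read one file, rename its columns, write it to out_dir
standardize_file <- function(path, dictionary, out_dir) {
  df <- read_delim(path, delim = "|", na = "")
  names(df) <- dictionary$final_name[match(names(df), dictionary$old_name)]
  out_path <- file.path(out_dir, basename(path))
  write_delim(df, out_path, delim = "|")
  out_path  # return the path so the caller can see what was written
}

# Hypothetical demo data in a temporary folder
in_dir <- file.path(tempfile("demo_"), "input")
out_dir <- file.path(dirname(in_dir), "output")
dir.create(in_dir, recursive = TRUE)
dir.create(out_dir)
write_delim(data.frame(a = 1, b = 2),
            file.path(in_dir, "one.csv.gz"), delim = "|")
dictionary <- data.frame(old_name = c("a", "b"),
                         final_name = c("id", "value"),
                         stringsAsFactors = FALSE)

files <- list.files(in_dir, pattern = "\\.gz$", full.names = TRUE)
written <- vapply(files, standardize_file, character(1),
                  dictionary = dictionary, out_dir = out_dir)
```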

Answered 2019-12-12T02:17:05.717