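Three quick ways to read multiple files and put them into a single data frame or data table
First, get a list of all txt files (including those in subfolders):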
list_of_files <- list.files(path = ".", recursive = TRUE,
pattern = "\\.txt$",
full.names = TRUE)
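To see what this call picks up, here is a minimal sketch run against a throwaway directory. The directory layout and the file names (a.txt, sub/b.txt, notes.csv) are made up for illustration.

```r
# Build a small, hypothetical directory tree in a temp location
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("x y", "1 2"), file.path(demo_dir, "a.txt"))
writeLines(c("x y", "3 4"), file.path(demo_dir, "sub", "b.txt"))
writeLines(c("x,y", "5,6"), file.path(demo_dir, "notes.csv"))

# recursive = TRUE descends into sub/, the pattern keeps only names ending
# in .txt, and full.names = TRUE prepends the path so each result can be
# handed directly to a reader function
found <- list.files(path = demo_dir, recursive = TRUE,
                    pattern = "\\.txt$", full.names = TRUE)
basename(found)
```

The csv file is left out because the pattern is anchored with `$` to the `.txt` extension.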
1) Use fread() w/ rbindlist() from the data.table package
#install.packages("data.table", repos = "https://cran.rstudio.com")
library(data.table)
# Read all the files and create a FileName column to store filenames
DT <- rbindlist(sapply(list_of_files, fread, simplify = FALSE),
use.names = TRUE, idcol = "FileName")
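The FileName column is filled in because sapply(..., simplify = FALSE) on a character vector returns a list named by that vector, and rbindlist() uses the names of a named list for its idcol. A minimal base R sketch of the naming step only, with made-up paths and toupper standing in for fread:

```r
# Hypothetical paths; nothing is read from disk here
paths <- c("./a.txt", "./sub/b.txt")

# toupper is only a stand-in for fread, to show how the result is named
res <- sapply(paths, toupper, simplify = FALSE)

# The list names are the original paths; these are what rbindlist()
# would copy into the id column
names(res)
```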
2) Use readr::read_table2() w/ purrr::map_df() from the tidyverse framework
(In readr 2.0.0 and later, read_table2() is deprecated in favor of read_table().)
#install.packages("tidyverse",
# dependencies = TRUE, repos = "https://cran.rstudio.com")
library(tidyverse)
# Read all the files and create a FileName column to store filenames
df <- list_of_files %>%
set_names(.) %>%
map_df(read_table2, .id = "FileName")
3) (Probably the fastest of the three) Use vroom::vroom():
#install.packages("vroom",
# dependencies = TRUE, repos = "https://cran.rstudio.com")
library(vroom)
# Read all the files and create a FileName column to store filenames
df <- vroom(list_of_files, id = "FileName")
Note: to clean up the file names, use the basename or gsub functions.
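As a rough illustration of that cleanup, the sketch below strips the directory and the extension from a couple of made-up paths. The paths themselves are hypothetical; the same calls could be applied to the FileName column after reading.

```r
# Hypothetical paths as list.files(full.names = TRUE) might return them
paths <- c("./data/run1.txt", "./data/sub/run2.txt")

# Drop the directory part
file_names <- basename(paths)

# Drop the extension as well, either with a helper from tools ...
stems_tools <- tools::file_path_sans_ext(file_names)

# ... or with a regular expression
stems_gsub <- gsub("\\.txt$", "", file_names)
```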
Benchmark: readr vs data.table vs vroom for big data
Edit 1: To read multiple csv files and skip the header, use readr::read_csv
list_of_files <- list.files(path = ".", recursive = TRUE,
pattern = "\\.csv$",
full.names = TRUE)
df <- list_of_files %>%
purrr::set_names(nm = (basename(.) %>% tools::file_path_sans_ext())) %>%
purrr::map_df(read_csv,
col_names = FALSE,
skip = 1,
.id = "FileName")
Edit 2: To convert a pattern containing wildcards into the equivalent regular expression, use glob2rx()
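A short sketch of how that might look. The wildcard pattern and the sample file names are only examples; glob2rx() comes from the utils package, which is attached by default.

```r
# Convert a shell-style wildcard into a regular expression
rx <- glob2rx("*.txt")
rx

# Check the converted pattern against a few hypothetical file names
candidates <- c("a.txt", "sub/b.txt", "a.txt.bak", "notes.csv")
matches <- grepl(rx, candidates)
candidates[matches]
```

The converted pattern can then be passed as the pattern argument of list.files() in place of a hand-written regular expression.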