r - Problem merging multiple .xlsx files (column-wise) that have junk text before the header, using the R package readxl and writing to csv

Question

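I am very new to R and to programming in general, and I need help arranging data contained in ~2000 .xls and .xlsx files. Each file begins with 34 to 40 rows of "junk" text before the header; all the data below the header have the same dimensions.

The first approach I tried adds the data to a list; the vertical format is not useful.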

library(readxl)
file.list <- list.files(pattern='*.xls')
dm.list <- lapply(file.list, read_excel)

I am currently trying to read one file at a time, delete the "junk" text, and then write to a .csv file (appending the data column-wise).

library(readxl)
file.list <- list.files(pattern='*.xls')

for(i in 1:dim.data.frame(file.list))

store.matrix <-  read_excel((paste0("C:\\Users\\jlmine\\Desktop\\qPCRextData\\", file.list[i])), sheet = "Results")

while (store.matrix[1,1] != "Well") #search for header
{  store.matrix <- store.matrix[-c(1)] } #delete non-header rows

write.csv(store.matrix, file = "qPCRdataanalysis.csv", append = TRUE)

The following line throws an error:

store.matrix <- read_excel((paste0("C:\\Users\\jlmine\\Desktop\\qPCRextData\\", file.list[i])), sheet = "Results")

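Error: 'C:\Users\jlmine\Desktop\qPCRextData\' does not exist. In addition: Warning message: In 1:dim.data.frame(file.list) :
numerical expression has 2 elements: only the first used

"C:\Users\jlmine\Desktop\qPCRextData\" is set as my working directory. Any ideas would be greatly appreciated.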

score 0 · Accepted Answer

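I can't be sure without seeing some of your data, but it looks like you could read each file, find the row where the "real" data starts, and then remove the "junk" rows. For example: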

library(readxl)

df.list = lapply(file.list, function(f) {

  # Read file
  tmp = read_excel(f, sheet="Results")

  # Find highest index of row containing "Well" and add 1 (assuming here
  # that a row containing "Well" will come before the header row).
  s = which(apply(tmp, 1, function(x) any(grepl("Well", x))))
  s = ifelse(length(s) > 0, max(s) + 1, 0)

  # Reset column names to the values in row s (the actual header row)
  # Remove rows 1 through s (the "junk" text plus the header row) from the data frame
  if(s > 0) {
    names(tmp) = as.character(unlist(tmp[s, ]))
    tmp = tmp[-(1:s), ]
  }

  # Return the data frame (unchanged if no row containing "Well" was found)
  tmp

})
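
To illustrate the trimming step on its own, here is a minimal sketch that runs on a toy data frame, with no Excel file involved. It assumes, as the loop in the question does, that the header row is the first row whose first cell is exactly "Well". The helper name trim_junk and the toy values are made up for the example.

```r
# Hypothetical helper: drop everything above the header row and promote
# the header row to column names. Assumes the header row is the first row
# whose first cell is exactly "Well" (an assumption taken from the question).
trim_junk <- function(raw) {
  first_col <- as.character(raw[[1]])
  # which() drops NA comparisons; [1] yields NA when nothing matches
  header_row <- which(first_col == "Well")[1]
  if (is.na(header_row)) {
    stop("no header row starting with 'Well' found")
  }
  out <- raw[-seq_len(header_row), , drop = FALSE]
  names(out) <- as.character(unlist(raw[header_row, ]))
  rownames(out) <- NULL
  out
}

# Toy input standing in for one sheet that was read without column names
raw <- data.frame(
  X1 = c("Instrument: demo", NA, "Well", "A1", "A2"),
  X2 = c("junk", NA, "Ct", "21.5", "22.1"),
  stringsAsFactors = FALSE
)
clean <- trim_junk(raw)
```

With real files, one would probably read each sheet with read_excel(f, sheet = "Results", col_names = FALSE) first, so that the first junk row is not consumed as column names, and then pass the result to a helper like this one.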

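You will now have df.list, a list in which each element is one of the xls/xlsx files you just loaded. You said you want to combine the data column-wise, but if every data frame has the same columns, wouldn't you rather stack the data frames? To do that, you can run: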

df.list = do.call(rbind, df.list)

You now have a single data frame, which you can save as a csv file.
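
The difference between stacking and combining column-wise, and the final write, can be sketched with two toy data frames standing in for cleaned files. The data values and the output path are made up. Note that write.csv() ignores the append argument (with a warning), so this sketch combines everything first and writes once.

```r
# Toy stand-ins for two cleaned files with identical columns
df1 <- data.frame(Well = c("A1", "A2"), Ct = c(21.5, 22.1))
df2 <- data.frame(Well = c("A1", "A2"), Ct = c(30.2, 29.8))
cleaned <- list(df1, df2)

# Stack by rows: same columns, more rows
stacked <- do.call(rbind, cleaned)

# Combine by columns: same rows, more columns; names are made unique
# here only so the repeated columns can be told apart
side_by_side <- do.call(cbind, cleaned)
names(side_by_side) <- make.unique(names(side_by_side))

# Write once at the end instead of appending file by file
out_file <- file.path(tempdir(), "combined.csv")  # hypothetical output path
write.csv(stacked, out_file, row.names = FALSE)
```

If the column-wise layout is really what is needed, writing side_by_side instead of stacked is the only change.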

score 0 · Accepted Answer

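Without access to your .xlsx files, the problem appears to be in your for loop statement. list.files returns a character vector of the files in the specified directory. Using dim.data.frame on a vector x of length 5 gives:

#[1] 0 5

From your warning message, you know that only the first element, 0, was used in the for loop. The loop therefore runs over 1:0 (i = 1, then i = 0) instead of over every file. When i = 0, file.list[i] is empty, so paste0 returns only the directory path, which is the path named in the error.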
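
The behaviour can be reproduced without any Excel files. This is a small sketch using made-up file names and a made-up directory prefix:

```r
x <- c("a.xlsx", "b.xlsx", "c.xlsx", "d.xlsx", "e.xlsx")  # made-up file names

# On a plain character vector this returns two numbers: 0 and 5
d <- dim.data.frame(x)

# Only d[1] is used (R warns about this), so the sequence is 1:0
idx <- suppressWarnings(1:d)

# When the index is 0, x[0] is an empty vector and paste0() returns
# just the prefix, i.e. a directory path with no file name attached
bare <- paste0("some/dir/", x[0])
```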

So, if you want to loop over all the files more gracefully, you could...

for (i in seq_along(file.list)) {
  # process file.list[i] here
}
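
A related sketch, under the assumption that building paths by hand is part of the trouble: list.files() can return full paths itself via full.names = TRUE, and its pattern argument is a regular expression rather than a glob. The directory and file names below are invented for the demonstration.

```r
# Set up a throwaway directory with made-up file names for the demo
demo_dir <- file.path(tempdir(), "qpcr_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("run1.xls", "run2.xlsx", "notes.txt")))

# `pattern` is a regular expression, not a glob; full.names = TRUE
# returns paths that already include the directory
paths <- list.files(demo_dir, pattern = "\\.xlsx?$", full.names = TRUE)

for (i in seq_along(paths)) {
  # a real script would read paths[i] here, e.g. with read_excel()
  message("would read: ", basename(paths[i]))
}
```

seq_along() also behaves sensibly when no files match: the loop body simply never runs.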
