I have a number of .csv files saved in a folder on my PC. I then create a list of these datasets as follows:

> file_list <- list.files()
> file_list
 [1] "ABWAbwut50.csv"        "ABWEinfam50.csv"       "ABWFeldwaldasph50.csv" "ABWGarage50.csv"      
 [5] "ABWGemeindestr50.csv"  "ABWHotel50.csv"        "ABWInd50.csv"          "ABWIntflaechen50.csv" 
 [9] "ABWKantonsstr50.csv"   "ABWMehrfam50.csv"      "ABWNutzwald50.csv"     "ABWSchutzwald50.csv"  
[13] "ABWstahlmitvieh50.csv" "ABWStromut50.csv"      "ABWWeideland50.csv"   

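The .csv files contain the same columns; decimals use . and columns are separated by ;. I tried to combine these datasets with the following code: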

for (file in file_list){
  if (!exists("dataset")){
    dataset <- read_delim(file, ";", escape_double = FALSE, trim_ws = TRUE)
  }
}
dataset

But it only reads the first file. How can I combine all 15 .csv files into one data frame?

When I run a different piece of code, I get the following error message:

> View(dataset)
> dataset <- do.call("rbind",lapply(file_list,
+                                   FUN=function(files){read.table(files,
+                                                                  header=TRUE, sep=";")}))
 Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 103 did not have 8 elements 

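I assume something is wrong and one of the files (actually, I know it is only a few lines in one file) has only 7 columns instead of 8. I don't want to look through every file individually to find the anomalies. How can I have the lines that don't follow the pattern removed automatically?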

My data files look like this:

> dput(dataset[1:10,])
structure(list(Berechnung = c("EconoMe original", "Berechnung 1", 
"Berechnung 2", "Berechnung 3", "Berechnung 4", "Berechnung 5", 
"Berechnung 6", "Berechnung 7", "Berechnung 8", "Berechnung 9"
), Situation = c("Nach Massnahme Neue Gerinnefuehrung Gafenbach", 
"Nach Massnahme Neue Gerinnefuehrung Gafenbach", "Nach Massnahme Neue Gerinnefuehrung Gafenbach", 
"Nach Massnahme Neue Gerinnefuehrung Gafenbach", "Nach Massnahme Neue Gerinnefuehrung Gafenbach", 
"Nach Massnahme Neue Gerinnefuehrung Gafenbach", "Nach Massnahme Neue Gerinnefuehrung Gafenbach", 
"Nach Massnahme Neue Gerinnefuehrung Gafenbach", "Nach Massnahme Neue Gerinnefuehrung Gafenbach", 
"Nach Massnahme Neue Gerinnefuehrung Gafenbach"), NK = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0), PID = c(2639L, 2639L, 2639L, 2639L, 
2639L, 2639L, 2639L, 2639L, 2639L, 2639L), Case = c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), Differenz = c(0, 0, 0, 0, 0, 0, 
0, 0, 0, 0), Prozess = c("Murgang", "Murgang", "Murgang", "Murgang", 
"Murgang", "Murgang", "Murgang", "Murgang", "Murgang", "Murgang"
), Objektart = c("Abwasser unter Terrain", "Abwasser unter Terrain", 
"Abwasser unter Terrain", "Abwasser unter Terrain", "Abwasser unter Terrain", 
"Abwasser unter Terrain", "Abwasser unter Terrain", "Abwasser unter Terrain", 
"Abwasser unter Terrain", "Abwasser unter Terrain")), .Names = c("Berechnung", 
"Situation", "NK", "PID", "Case", "Differenz", "Prozess", "Objektart"
), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

1 Answer

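One of the files probably contains a ; inside a text field. This solution takes your first code example and modifies it to check which files contain the problem.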

file_list <- list.files()
# setup the dataset
dataset <- read.table(file_list[1], sep = ";", header = TRUE)

# cycle through all other files
for (file in file_list[-1]){
    temp <- try(read.table(file, sep = ";", header = TRUE))
    # check if the file can be read as a table
    if (inherits(temp, "try-error")) {
        message(paste("One file skipped. Correct mistakes in file", file))
        print(temp)
        next
    }
    dataset <- rbind(dataset, temp)
}
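
If you would rather have the malformed lines dropped automatically than fix the files by hand, something along the following lines might work. It is only a sketch: read_well_formed is a made-up helper name, and it assumes that the header line of every file is correct, that fields are not quoted, and that no field legitimately contains a ; (so a line with a different number of separators than the header really is broken). The demo files and their contents are invented for illustration.

```r
# Hypothetical helper (not part of base R or readr): read one
# semicolon-separated file and drop every line whose number of
# fields differs from that of the header line.
read_well_formed <- function(path, sep = ";") {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]  # ignore blank lines

  # Number of fields = number of separators + 1.
  # Assumption: fields are unquoted and never contain the separator.
  n_fields <- lengths(regmatches(lines, gregexpr(sep, lines, fixed = TRUE))) + 1L

  keep <- n_fields == n_fields[1]        # compare each line with the header
  if (any(!keep)) {
    message(sum(!keep), " malformed line(s) dropped from ", path)
  }

  read.table(text = lines[keep], sep = sep, header = TRUE,
             quote = "", comment.char = "", stringsAsFactors = FALSE)
}

# Demo on two throwaway files; the second one has a line that is too short.
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
writeLines(c("a;b;c", "1;2;3", "4;5;6"), file.path(demo_dir, "one.csv"))
writeLines(c("a;b;c", "7;8", "9;10;11"), file.path(demo_dir, "two.csv"))

demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
demo <- do.call(rbind, lapply(demo_files, read_well_formed))
print(demo)
```

For the real folder, the last two lines would become file_list <- list.files(pattern = "\\.csv$") followed by dataset <- do.call(rbind, lapply(file_list, read_well_formed)). The message tells you which file lost lines, so you can still inspect it afterwards.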
answered 2017-04-27T10:41:43.660