r - Count the number of rows in a series of csv files

Question

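I'm working through an R tutorial and suspect I need to use one of these functions, but I'm not sure which (yes, I have looked into them, but until I become more fluent in R terminology they remain quite confusing).

In my working directory there is a folder "specdata". Specdata contains hundreds of CSV files named 001.csv - 300.csv.

The function I'm working on must count the total number of rows across the csv files that are passed in. So, if the argument to the function is 1:10 and each file has ten rows, it returns 100.

Here is what I have so far: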

complete <- function(directory,id = 1:332) {
    setpath <- paste("/Users/gcameron/Desktop",directory,sep="/")
    setwd(setpath)
    csvfile <- sprintf("%03d.csv", id)
    file <- read.csv(csvfile)
    nrow(file)
 }

This works when the id argument is a single number, say 17. But if I pass 10:50 as the argument, I get an error:

Error in file(file, "rt") : invalid 'description' argument

What should I do to count the total number of rows for the id argument that is passed in?

score 9 · Accepted Answer

read.csv expects to read just one file, so you need to loop over the files. The idiomatic R way to do that is to use sapply:

nrows <- sapply( csvfile, function(f) nrow(read.csv(f)) )
sum(nrows)

For example, here is a rewrite of your complete function:

complete <- function(directory,id = 1:332) {
    csvfiles <- sprintf("/Users/gcameron/Desktop/%s/%03d.csv", directory, id)
    nrows <- sapply( csvfiles, function(f) nrow(read.csv(f)) )
    sum(nrows)
}
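
To try this approach without the real data, here is a minimal sketch that writes a few small CSV files to a temporary directory and sums their row counts. The function name count_rows, the base_dir parameter, and the sample data are assumptions made for illustration; they are not part of the original question.

```r
# Hypothetical helper: same idea as the rewrite above, but the base folder
# is a parameter instead of a hard-coded path.
count_rows <- function(directory, id, base_dir = ".") {
    csvfiles <- file.path(base_dir, directory, sprintf("%03d.csv", id))
    # vapply works like sapply but checks that each result is one integer
    nrows <- vapply(csvfiles, function(f) nrow(read.csv(f)), integer(1))
    sum(nrows)
}

# Made-up sample data: three files with ten rows each
base_dir <- tempdir()
dir.create(file.path(base_dir, "specdata"), showWarnings = FALSE)
for (i in 1:3) {
    sample_data <- data.frame(x = 1:10, y = rnorm(10))
    write.csv(sample_data,
              file.path(base_dir, "specdata", sprintf("%03d.csv", i)),
              row.names = FALSE)
}

total <- count_rows("specdata", 1:3, base_dir)
print(total)  # [1] 30
```

Using file.path here avoids depending on one machine's folder layout, and vapply is swapped in for sapply only to make the expected result type explicit.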

score 1 · Accepted Answer

Homework questions are usually tagged as such. I don't know whether that is required, but this is clearly homework.

The function you wrote expects id not to be a vector (even though the default value is an integer vector).
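
A short sketch of why that matters: sprintf is vectorized, so a vector of ids produces a vector of file names, while read.csv accepts only a single file. The variable names one_file and many_files are chosen here for illustration.

```r
# A single id gives a single file name
one_file <- sprintf("%03d.csv", 17)
print(one_file)            # [1] "017.csv"

# A vector of ids gives one file name per id
many_files <- sprintf("%03d.csv", 10:12)
print(many_files)          # [1] "010.csv" "011.csv" "012.csv"
print(length(many_files))  # [1] 3
```

Passing a character vector like many_files straight to read.csv is what triggers the "invalid 'description' argument" error from the question.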

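Change it to use one of the *apply functions (more concise and general), or even an explicit loop. For each element of the id vector, you have to call a function that opens that file and counts its observations.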

This Stack Overflow post explains the differences between the *apply functions well.
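
The explicit-loop option mentioned above could look roughly like the sketch below. The helper name count_rows_loop and the two temporary sample files are assumptions for illustration, not code from the question.

```r
# Hypothetical helper: open each file in turn and add up the row counts
count_rows_loop <- function(files) {
    total <- 0L
    for (f in files) {
        total <- total + nrow(read.csv(f))
    }
    total
}

# Made-up sample data: two temporary files with 3 and 5 rows
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3), f1, row.names = FALSE)
write.csv(data.frame(x = 1:5), f2, row.names = FALSE)

loop_total <- count_rows_loop(c(f1, f2))
print(loop_total)  # [1] 8
```

The loop does the same work as an *apply call; it is longer, but each step is visible.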

score 0 · Accepted Answer

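id <- 1:332
# "source_path" stands for the folder that holds the csv files
filenames <- list.files(path = "source_path", full.names = TRUE)

df <- data.frame()  # collects one row per file

for (a in id) {

    dataset <- read.csv(filenames[a])

    res <- nrow(na.exclude(dataset))  # number of rows with no missing values

    # append to df rather than overwriting it on every iteration
    df <- rbind(df, data.frame(
        id = a,
        nobs = res,
        stringsAsFactors = FALSE))
}

df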

score 0 · Accepted Answer

complete <- function(directory, id = 1:332){
  # full.names = TRUE returns paths that already include the directory;
  # "\\.csv$" matches only names that end in .csv
  mylist <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
  result <- data.frame()
  for(i in id){
    my_data <- read.csv(mylist[i])
    res <- nrow(na.exclude(my_data))  # number of rows with no missing values
    df <- data.frame("id" = i, "nobs" = res, stringsAsFactors = FALSE)
    result <- rbind(result, df)
  }
  return(result)
}
