
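I know this may be a very silly question, but I have already spent hours on it.

I want to read a .csv file whose full path I don't have (*/*data.csv). I know the following reads a matching file from the current directory, but I don't know how to adapt it: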

Marks <- read.csv(dir(path = '.', full.names=T, pattern='^data.*\\.csv'))

I also tried this, but it didn't work:

Marks <- read.csv(file = "*/*/data.csv", sep = ",", header=FALSE)

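I can't specify a particular path, because this will be used on different machines with different paths, but I am certain of the subfolders of the main directory, since they are created by a bash script.

I plan to call it from Unix, which defines the working directory.

My data structure is: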

lecture01/test/data.csv
lecture02/test/data.csv
lecture03/test/data.csv

2 Answers

2

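Your comment (though not currently your question itself) indicates that you want to run your code in a working directory containing some number of subdirectories (lecture01, lecture02, etc.), each of which contains a subdirectory "marks" that in turn contains a data.csv file. If that is so, and your goal is to read the csv from within each subdirectory, then you have a few options, depending on the remaining details.

Case 1: Specify the top-level directory names directly, if you know them all and they are potentially idiosyncratic: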

dirs <- c("lecture01", "lecture02", "some_other_dir")
paths <- file.path(dirs, "marks/data.csv")

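Case 2: Construct the top-level directory names, e.g. if they all start with "lecture" followed by a two-digit number, and you are able to (or specifically wish to) specify a numeric range, e.g. 01 through 15:

dirs <- sprintf("lecture%02d", 1:15)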
paths <- file.path(dirs, "marks/data.csv")

Case 3: Determine the top-level directory names by matching a pattern, e.g. if you want to read data from within every directory starting with the string "lecture":

matched.names <- list.files(".", pattern="^lecture")
dirs <- matched.names[file.info(matched.names)$isdir]
paths <- file.path(dirs, "marks/data.csv")

Once you have a vector of the paths, I'd probably use lapply to read the data into a list for further processing, naming each one with the base directory name:

csv.data <- lapply(paths, read.csv)
names(csv.data) <- dirs

Alternatively, if whatever processing you do on each individual CSV is done just for its side effects, such as modifying the data and writing out a new version, and especially if you don't ever want all of them to be in memory at the same time, then use a loop.
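A minimal sketch of the loop variant, under stated assumptions: the temporary directory tree, the `student` and `mark` columns, the scaling step, and the output name `data_scaled.csv` are all hypothetical, made up only so the example runs on its own. In real use, `paths` would come from one of the three cases above.

```r
# Hypothetical setup so the sketch runs on its own: build a throwaway
# directory tree of the form lectureNN/marks/data.csv under tempdir().
root <- file.path(tempdir(), "loop_demo")
dirs <- sprintf("lecture%02d", 1:3)
for (d in dirs) {
  dir.create(file.path(root, d, "marks"), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(student = c("a", "b"), mark = c(50, 70)),
            file.path(root, d, "marks", "data.csv"), row.names = FALSE)
}

paths <- file.path(root, dirs, "marks", "data.csv")

# Process one file at a time: each iteration reads a single CSV,
# modifies it, and writes the result next to the original.
for (p in paths) {
  marks <- read.csv(p)
  marks$scaled <- marks$mark / 100                  # placeholder processing step
  out <- file.path(dirname(p), "data_scaled.csv")   # hypothetical output name
  write.csv(marks, out, row.names = FALSE)
}
```

Because `marks` is overwritten on every pass, only one data frame is held at a time, which is the main reason to prefer the loop over `lapply` here.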

If this answer misses the mark, or even if it doesn't, it would be great if you could clarify the question accordingly.

answered 2013-02-27T03:25:52.447
0

I don't have code, but I would do a recursive glob from the root and a preg_match to find the .csv file (using glob brace).
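In R, the same idea might be sketched as follows. `read.csv` does not expand wildcards itself, so a pattern has to be turned into real paths first, either with `Sys.glob` or with `list.files(recursive = TRUE)` and a regular expression. The temporary tree and its contents below are hypothetical and exist only to make the example self-contained; the lectureNN/marks/data.csv layout is assumed from the other answer.

```r
# Hypothetical setup: a throwaway tree lectureNN/marks/data.csv under tempdir().
root <- file.path(tempdir(), "glob_demo")
for (d in sprintf("lecture%02d", 1:3)) {
  dir.create(file.path(root, d, "marks"), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(mark = 1:2), file.path(root, d, "marks", "data.csv"),
            row.names = FALSE)
}

# Option A: wildcard expansion, matching exactly two directory levels.
glob.paths <- Sys.glob(file.path(root, "*", "*", "data.csv"))

# Option B: recursive search at any depth, filtering file names with a regex.
found.paths <- list.files(root, pattern = "^data\\.csv$",
                          recursive = TRUE, full.names = TRUE)

# Either vector of paths can then be read into a list of data frames.
csv.data <- lapply(glob.paths, read.csv)
```

Option A is closest to the `*/*/data.csv` pattern in the question; Option B is the more forgiving choice if the nesting depth is not fixed.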

answered 2013-02-27T00:22:45.453