r - Plotting data from multiple large data files in ggplot

Question

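I have several data files (numeric), each with about 150,000 rows and 25 columns. Until now I used gnuplot to plot the data (where the script lines are proportional to the plot objects), but since I now have to do some additional analysis on it, I have moved to R and ggplot2.

How should I organize the data? Is one big data.frame with an extra column marking which file each row came from really the only option? Or is there a way around that?

Edit: To be more precise, here is an example of the form in which I currently have the data: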

filelst=c("filea.dat", "fileb.dat", "filec.dat")
dat=list()
for(i in seq_along(filelst)) {
    dat[[i]]=read.table(filelst[i])
}

score 2 · Accepted Answer

Assuming your file names end in ".dat", here is a mock example of the strategy proposed by Chase:

require(plyr)
require(ggplot2)

# list the files (the dot is escaped and the pattern anchored at the end)
lf = list.files(pattern = "\\.dat$")
str(lf)

# 1. read the files into a data.frame
names(lf) = lf # naming the input lets ldply add an ID column
d = ldply(lf, read.table, header = TRUE, skip = 1, .id = "L1") # or whatever options to read; .id needs a recent plyr
str(d) # should contain all the data, and an ID column called L1

# use the data, e.g. plot
pdf("all.pdf")
d_ply(d, "L1", plot, t="l")
dev.off()
# or using ggplot2 (assuming the files have columns named x and y)
ggplot(d, aes(x, y, colour=L1)) + geom_line()

# 2. read the files into a list

ld = lapply(lf, read.table, header = TRUE, skip = 1) # or whatever options to read
names(ld) = gsub("\\.dat$", "", lf) # strip the file extension
str(ld)

# use the data, e.g. plot
pdf("all2.pdf")
lapply(names(ld), function(ii) plot(ld[[ii]], main=ii, t="l"))
dev.off()

# 3. is not fun
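
For reference, the single data.frame layout from the question can also be built with base R alone. The following is a minimal sketch, not the answerer's code: the file names, the columns x and y, and the helper read_with_id are made up for illustration, and the demo writes small temporary files so that it runs on its own.

```r
# Hypothetical demo data: three small files standing in for the real .dat files
tmp <- tempfile("datdemo")
dir.create(tmp)
set.seed(1)
for (nm in c("filea.dat", "fileb.dat", "filec.dat")) {
  demo <- data.frame(x = 1:50, y = cumsum(rnorm(50)))
  write.table(demo, file.path(tmp, nm), row.names = FALSE)
}

# Read one file and tag every row with the name of the file it came from
# (read_with_id is an illustrative helper, not part of any package)
read_with_id <- function(path) {
  df <- read.table(path, header = TRUE)
  df$file <- sub("\\.dat$", "", basename(path))
  df
}

files <- list.files(tmp, pattern = "\\.dat$", full.names = TRUE)
d <- do.call(rbind, lapply(files, read_with_id))
str(d)  # columns x, y and the identifying column `file`

# Draw one line per file, but only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(d, ggplot2::aes(x, y, colour = file)) +
    ggplot2::geom_line()
  print(p)
}
```

At roughly 150,000 rows by 25 numeric columns, each file takes about 30 MB when stored as doubles, so a handful of files stacked this way should normally fit in memory.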

score 1 · Accepted Answer

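Your question is a bit vague. If I follow you correctly, I think you have three main options:

1. Do as you suggest, then use any of the "split-apply-combine" functions that exist in R to analyze the data by group. These include by, aggregate, ave, package(plyr), package(data.table), and many others.
2. Store your data objects as separate elements in a list(). Then use lapply() and friends to work on them.
3. Keep everything separate in different data objects and process them individually. This is probably the least efficient way to do things, unless you have memory constraints or the like.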
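
The first two options can be sketched side by side. This is a minimal illustration on simulated data, assuming hypothetical columns x and y and a grouping column named file; it uses only base R (aggregate and sapply), and plyr or data.table offer comparable calls.

```r
set.seed(42)

# Option 1: one long data.frame with a column identifying the source file
d <- data.frame(
  file = rep(c("filea", "fileb", "filec"), each = 100),
  x    = rep(1:100, times = 3),
  y    = rnorm(300)
)
# split-apply-combine: mean of y within each file
means_df <- aggregate(y ~ file, data = d, FUN = mean)
print(means_df)

# Option 2: a named list holding one data.frame per file
ld <- split(d[c("x", "y")], d$file)
means_list <- sapply(ld, function(df) mean(df$y))
print(means_list)

# Both layouts give the same per-file summaries
stopifnot(isTRUE(all.equal(means_df$y, unname(means_list))))
```

Going from the list to the long form is a single do.call(rbind, ld), and split() goes the other way, so choosing one layout does not lock you in.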
