
I'm trying to figure out how to do this in R. I have data like the following from a set of about 50 csv files, each one listing individual book sale transactions:

**week 1**
**author** **title** **customerID**
author1 title1 1
author1 title2 2
author2 title3 3
author3 title4 3

**week 2**
**author** **title** **customerID**
author1 title1 4
author3 title4 5
author4 title5 1
author5 title6 6

... ~ 50 weeks, each from a separate csv file

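I'd like to end up with a new table that has one row for each author appearing anywhere in the full data set, and one column for each of the ~50 weeks I have data for. Each cell should hold the number of books that author sold in that week, which is simply the number of rows for that author in that week's sales file. So it should look something like this: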

**author** **week1** **week2** ... **week50**
author1 2 1 ...
author2 1 0 ...
author3 1 1 ...
author4 0 1 ...
author5 0 1 ...
...

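Any ideas? I know how to get the list of unique authors to make the first column, and I know how to load each week's sales data into a data frame. But I need help automating this process: 1) iterate over the unique authors, 2) iterate over each week's csv file or data frame, 3) count that author's sales for that week, 4) store the count as the value of that cell.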

Can anyone help? Thanks :-)


1 Answer

text1<-"**week 1**
**author** **title** **customerID**
author1 title1 1
author1 title2 2
author2 title3 3
author3 title4 3
"

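# read the sales rows, skipping the "**week 1**" title line
df1<-read.table(header=TRUE,skip=1,stringsAsFactors=FALSE,text=text1)
# read the title line on its own and strip the surrounding "**" to get the week label
week1<-read.table(header=FALSE,nrows=1,stringsAsFactors=FALSE,text=text1,sep=";")
week1<-substr(week1,3,nchar(week1)-2)
df1$week<-rep(week1,nrow(df1))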

text2<-"**week 2**
**author** **title** **customerID**
author1 title1 4
author3 title4 5
author4 title5 1
author5 title6 6
"

# same steps for the second week
df2<-read.table(header=TRUE,skip=1,stringsAsFactors=FALSE,text=text2)
week2<-read.table(header=FALSE,nrows=1,stringsAsFactors=FALSE,text=text2,sep=";")
week2<-substr(week2,3,nchar(week2)-2)
df2$week<-rep(week2,nrow(df2))

df<-rbind(df1,df2)
names(df)<-c("author","title","customerID","week")

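library(plyr)
# count the rows (sales) for each author/week combination; the count ends up in column V1
agg<-ddply(df,~author*week,function(d) nrow(d))

library(reshape)
# spread the weeks into columns, filling missing author/week pairs with 0
res<-cast(agg,author~week,value="V1",fill=0)
res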

   author week 1 week 2
1 author1      2      1
2 author2      1      0
3 author3      1      1
4 author4      0      1
5 author5      0      1
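
For comparison, the same author-by-week counts can be sketched in base R with `table()`, without plyr or reshape. This is a swapped-in alternative, not the method used above. It assumes a long data frame with `author` and `week` columns like `df`; the `sales` data frame below is a hypothetical stand-in rebuilt from the two example weeks, and the names `counts`, `wide` and `res_base` are illustrative only.

```r
# Hypothetical stand-in for the combined data frame `df` built above
sales <- data.frame(
  author = c("author1", "author1", "author2", "author3",
             "author1", "author3", "author4", "author5"),
  week   = rep(c("week 1", "week 2"), each = 4),
  stringsAsFactors = FALSE
)

# table() counts the rows for every author/week pair; absent pairs get 0
counts <- table(sales$author, sales$week)

# Turn the contingency table into a data frame with one column per week
wide <- as.data.frame.matrix(counts)

# Move the authors from the row names into a regular column
res_base <- cbind(author = rownames(wide), wide)
rownames(res_base) <- NULL
res_base
```

Because `table()` fills missing combinations with zero by itself, no separate `fill=0` step is needed in this variant.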

You only need a loop to read in your data. For that, you could use something like

ff<-list.files(pattern="\\.[Cc][Ss][Vv]$")  # pattern is a regular expression, not a glob
for (i in seq_along(ff)){
  # code for reading the data from ff[i]
  # and constructing the data.frame
}
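
As a rough illustration, the loop body could be filled in along the following lines. This is only a sketch under assumptions that are not in the question: each file is taken to be a regular comma-separated file with an `author,title,customerID` header row, and the week label is taken from the file name. The file names `week01.csv` and `week02.csv`, the temporary directory, and the names `sales_dir`, `weekly` and `all_sales` are all hypothetical.

```r
# Hypothetical setup: write two small weekly files to a temporary directory
# (the real file names and column layout are assumptions)
sales_dir <- tempfile("sales")
dir.create(sales_dir)
write.csv(data.frame(author = c("author1", "author1", "author2", "author3"),
                     title = c("title1", "title2", "title3", "title4"),
                     customerID = c(1, 2, 3, 3)),
          file.path(sales_dir, "week01.csv"), row.names = FALSE)
write.csv(data.frame(author = c("author1", "author3", "author4", "author5"),
                     title = c("title1", "title4", "title5", "title6"),
                     customerID = c(4, 5, 1, 6)),
          file.path(sales_dir, "week02.csv"), row.names = FALSE)

# Find all csv files; full.names = TRUE keeps the directory in each path
ff <- list.files(sales_dir, pattern = "\\.[Cc][Ss][Vv]$", full.names = TRUE)

# Read each file and tag its rows with the week taken from the file name
weekly <- vector("list", length(ff))
for (i in seq_along(ff)) {
  d <- read.csv(ff[i], stringsAsFactors = FALSE)
  d$week <- sub("\\.[Cc][Ss][Vv]$", "", basename(ff[i]))
  weekly[[i]] <- d
}

# Stack the weekly data frames into one long data frame
all_sales <- do.call(rbind, weekly)
```

Here `all_sales` plays the role of `df` above, so it can be passed to the same `ddply` and `cast` steps. Collecting the pieces in a list and calling `rbind` once avoids growing a data frame inside the loop.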
answered 2012-07-10T11:24:18.767