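I am trying to run some basic statistics (and more in-depth ones later) on a data frame of sales with categorical variables. Besides sales, it tracks the area (where the merchant is located), the day of the week, the time of day (lunch, after work, etc.), and various other pieces of information.
Here is a small random subset of the data. (Note that this is a simplified representation: the actual data frame has 38 columns, and I stripped out most of the ones that don't apply here.)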
structure(list(dayofweek = structure(c(4L, 7L, 3L, 7L, 3L, 2L,
2L, 7L, 3L, 3L, 2L, 7L, 5L, 5L, 2L, 5L, 1L, 3L, 7L, 3L, 4L, 1L,
3L, 5L, 7L), .Label = c("Friday", "Monday", "Saturday", "Sunday",
"Thursday", "Tuesday", "Wednesday"), class = "factor"), timeofday = structure(c(6L,
4L, 5L, 5L, 2L, 6L, 6L, 5L, 6L, 3L, 6L, 3L, 5L, 4L, 1L, 3L, 5L,
6L, 5L, 4L, 6L, 6L, 3L, 2L, 5L), .Label = c("After Work", "Early AM",
"Evening", "Late AM", "Lunch", "MidAfternoon", "Overnight"), class = "factor"),
area = c(6L, 4L, 4L, 5L, 5L, 1L, 4L, 2L, 3L, 2L, 7L, 3L,
7L, 5L, 7L, 4L, 1L, 4L, 1L, 4L, 5L, 7L, 1L, 3L, 7L), totsales = c(40,
6, 5, 10, 1, 0, 0, 3, 5, 3, 10, 30, 2, 1, 2, 22, 8, 1, 1,
5, 11, 20, 0, 1, 5)), .Names = c("dayofweek", "timeofday",
"area", "totsales"), class = "data.frame", row.names = c(192278L,
140773L, 121051L, 157984L, 154299L, 258034L, 108031L, 43760L,
78005L, 42103L, 95603L, 98431L, 30252L, 165303L, 40713L, 108252L,
304549L, 137041L, 268473L, 124599L, 161253L, 12897L, 240815L,
89439L, 21032L))
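The first thing I want to do is get the mean and median sales for each area and each time of day. I would like R to go through each list and return all the values. I tried this: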
vallist<-list(a= c("Early AM", "Late AM", "Lunch", "MidAfternoon", "After Work",
"Evening", "Overnight"),
b= c(1,2,3,4,5,6,7))
sapply(vallist[['b']], function(x)
mapply(function(a,b) mean(data$totsales[which(data$timeofday==a & data$area==b)]),
vallist[['a']], vallist[['b']])
)
However, it only applies the mean to each time of day in area 1, rather than to each time of day in areas 1 through 7. So my result looks like this:
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
Early AM 9.192847 9.192847 9.192847 9.192847 9.192847 9.192847 9.192847
Late AM 8.020678 8.020678 8.020678 8.020678 8.020678 8.020678 8.020678
Lunch 10.096277 10.096277 10.096277 10.096277 10.096277 10.096277 10.096277
MidAfternoon 11.503961 11.503961 11.503961 11.503961 11.503961 11.503961 11.503961
After Work 8.206124 8.206124 8.206124 8.206124 8.206124 8.206124 8.206124
Evening 11.457599 11.457599 11.457599 11.457599 11.457599 11.457599 11.457599
Overnight 11.415667 11.415667 11.415667 11.415667 11.415667 11.415667 11.415667
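These are the correct answers for area 1, but you can see that the values are identical for every area. How can I get R to apply the function across multiple lists and return the value for every combination?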
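For what it's worth, here is a minimal sketch of one possible approach, using a small made-up data frame (`toy` and its values are illustrative assumptions, not the real data). In the attempt above, the outer `sapply` never uses its argument `x`, and `mapply` walks `a` and `b` in parallel (first with first, second with second), so the same seven pairs are recomputed for every column. Base R's `tapply` with two grouping vectors computes the statistic for each cell of the cross-tabulation instead:

```r
# Toy stand-in for the real data frame (values are made up for illustration).
toy <- data.frame(
  timeofday = c("Lunch", "Lunch", "Evening", "Evening", "Lunch", "Evening"),
  area      = c(1, 1, 1, 2, 2, 2),
  totsales  = c(10, 20, 5, 7, 4, 9)
)

# Mean of totsales for each timeofday x area combination.
# Rows are time-of-day values, columns are areas;
# combinations with no observations come back as NA.
mean_by_cell <- with(toy, tapply(totsales, list(timeofday, area), mean))
print(mean_by_cell)
```

Swapping `mean` for `median` in the same call should give the median table with the same layout.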
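The next steps will be to apply the median, and to evaluate at the area level and across the different days of the week, but I assume the same idea will carry over to all of those combinations.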
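As a rough sketch of how those follow-up groupings might look, assuming a small made-up data frame (`toy` and its values are hypothetical, not the real data), base R's `aggregate` with a formula returns one row per combination and accepts any summary function:

```r
# Toy stand-in for the real data frame (values are made up for illustration).
toy <- data.frame(
  dayofweek = c("Monday", "Monday", "Friday", "Friday", "Monday", "Friday"),
  area      = c(1, 1, 1, 2, 2, 2),
  totsales  = c(10, 20, 5, 7, 4, 9)
)

# Median sales for each area x dayofweek combination, in long format
# (one row per combination that actually occurs in the data).
med_by_cell <- aggregate(totsales ~ area + dayofweek, data = toy, FUN = median)

# Mean and median per area in one pass: FUN returns a named vector,
# so the totsales column of the result is a two-column matrix.
stats_by_area <- aggregate(
  totsales ~ area, data = toy,
  FUN = function(v) c(mean = mean(v), median = median(v))
)

print(med_by_cell)
print(stats_by_area)
```

Changing the right-hand side of the formula (for example `area + timeofday`) changes the grouping without touching the rest of the call.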