我正在使用ddply
insidesubset
来计算一些指标并根据需要汇总表格。我要计算的一些指标需要使用作为ddply
操作结果创建的汇总列。
这是具有简单计算列的函数:
subset_by_market <- function (q, marketname, dp) {
subset(ddply(df, .(quarter, R.DMA.NAMES, daypart, station), summarise,
spot.count = length(spot.id),
station.investment = sum(rate),
nullspots.male = sum(nullspot.male),
nullspots.allpersons = sum(nullspot.allpersons),
total.male.imp = sum(male.imp),
total.allpersons.imp = sum(allpersons.imp),
spotvalue.male = sum(spotvalue.male),
spotvalue.allpersons = sum(spotvalue.allpersons)),
quarter == q & R.DMA.NAMES == marketname & daypart == dp)
}
我subset_by_market ("Q32013" , "Columbus.OH", "primetime")
用来总结创建一个子集。我的结果表如下所示:
quarter R.DMA.NAMES daypart station spot.count station.investment nullspots.male nullspots.allpersons
10186 Q32013 Columbus.OH primetime ADSM COLUMBUS, OH 103 5150 67 61
10187 Q32013 Columbus.OH primetime ESYX 49 0 49 49
10188 Q32013 Columbus.OH primetime MTV COLUMBUS, OH 61 4500 7 1
10189 Q32013 Columbus.OH primetime WCMH-Retro TV 94 564 93 93
10190 Q32013 Columbus.OH primetime WTTE 1 0 0 0
10191 Q32013 Columbus.OH primetime WWHO 9 0 2 2
total.male.imp total.allpersons.imp spotvalue.male spotvalue.allpersons
10186 47.2 127.7 4830.409 4775.1068
10187 0.0 0.0 NaN NaN
10188 157.9 371.1 4649.746 4505.2608
10189 0.3 0.3 3162.000 3162.0000
10190 3.5 10.3 570.166 591.0231
10191 3.9 15.8 7155.000 4356.4162
问题 1:我想添加到相同的数据框中,例如:spot.count 的百分比值。= spot.count / sum(spot.count) (ii)percent.nullspots.male = nullspots.male / sum(nullspots.male)
但是,当我将其添加到ddply
参数中时,我在结果列中得到 1 (100%)。该值除以自身,而不是除以列的总和。
问题 2:这是缓慢而谦虚的我接受这可能不是最佳编码。我正在使用带有 16Gb ddr3 RAM 和 64 位操作系统的 i5-2.6GHz PC。数据集是 1M 行。
system.time(subset_by_market ("Q32013" , "Albuquerque.Santa.Fe", "late fringe"))
user system elapsed
228.13 176.84 416.12
目的是在在线仪表板上可视化所有计算的指标,并允许用户选择subset_by_market (q , marketname, dp)
使用下拉菜单。我怎样才能让它更快?
添加样本数据:
`> structure(list(market = c("Local", "Local", "Local", "Local",
"Local", "Local", "Local", "NATIONAL CABLE", "Local", "Local"
), spot.id = c(11248955L, 11262196L, 11946349L, 11625265L, 12929889L,
11259758L, 11517638L, 11599834L, 12527365L, 12930259L), date = structure(c(1375675200,
1376625600, 1390280400, 1383627600, 1401249600, 1375848000, 1380772800,
1383019200, 1397102400, 1401163200), class = c("POSIXct", "POSIXt"
), tzone = ""), hour = c(15, 17, 11, 18, 19, 1, 13, 14, 16, 22
), time = structure(c(0.642361111111111, 0.749305555555556, 0.481944444444444,
0.770138888888889, 0.830555555555556, 0.0597222222222222, 0.582638888888889,
0.597222222222222, 0.675694444444444, 0.930555555555556), format = "h:m:s", class = "times"),
local.date = structure(c(1375675200, 1376625600, 1390280400,
1383627600, 1401249600, 1375848000, 1380772800, 1383019200,
1397102400, 1401163200), class = c("POSIXct", "POSIXt"), tzone = ""),
local.hour = c(15, 17, 11, 18, 18, 0, 13, 14, 15, 22), local.time = structure(c(0.642361111111111,
0.749305555555556, 0.481944444444444, 0.770138888888889,
0.788888888888889, 0.0180555555555556, 0.582638888888889,
0.597222222222222, 0.634027777777778, 0.930555555555556), format = "h:m:s", class = "times"),
vendor = c("Time Warner - Myrtle Beach", "WMYD", "WSBK",
"WDCA", "Comcast - Memphis", "Charter Media - Birmingham",
"WBNA", "G4", "Comcast - Houston", "Comcast - Youngstown"
), station = c("VH-1 MYRTLE BEACH", "WMYD", "WSBK", "WDCA",
"COM MEMPHIS", "FX BIRMINGHAM", "WBNA", "G4", "SPK HOUSTON",
"COM YOUNGSTOWN CC"), male.imp = c(0, 2, 0, 0, 0.6, 0.4,
0, 0, 3.9, 0), women.imp = c(0, 2.5, 0, 2.5, 0.2, 0.6, 0,
0, 4.6, 0.6), allpersons.imp = c(0, 3.5, 0, 2.5, 0.8, 0.8,
0, 0, 7.8, 0.6), hh.imp = c(0, 8.5, 8, 64.5, 1.3, 2.9, 1.3,
15, 13.7, 1), isci = c("IT6140MB", "ITCD78DT", "IT6192BS",
"IT6170WD", "IT6173ME", "IT6162BI", "IT6155LO", "ITES13410",
"IT3917", "IT3921"), creative = c("Eugene Elbert (Bach. Tcom Eng. Tech) :60",
"The Problem Solvers (revised) - IET :60", "Murtech/Kinetic/Integra :60",
"Kevin Bumper/NTSG/Lifetime :60", "NCR/Schlumberger/Sprint (revised) (Bach) :60",
"Skills Gap (revised) /Kevin :60", "Rising Costs60 (Opportunity Scholar - No Nursing)",
"Irina Lund (Bach. ISS) :60", "Augustine Lopez (A. CEET) :30 (no loc)",
"John Ryan Ellis (B. PM/A. CDD) :30 (no loc)"), program = c(NA,
"TYLER PERRY'S MEET THE BROWNS", "THE PEOPLE'S COURT", "Judge Judy",
NA, NA, "Meet the Browns/Are We There Yet/News/Wendy Willia",
"HEROES", "Spike EF Rotator", NA), rate = c(5, 230, 100,
625, 40, 0, 15, 40, 110, 7), R.DMA.NAMES = c("Myrtle.Beach.Florence",
"Detroit", "Boston.Manchester.", "Washington.DC.Hagrstwn.",
"Memphis", "Birmingham.Ann.and.Tusc.", "Louisville", "NATIONAL CABLE",
"Houston", "Youngstown"), date.time = c("2013-08-05 15:25:00",
"2013-08-16 17:59:00", "2014-01-21 11:34:00", "2013-11-05 18:29:00",
"2014-05-28 19:56:00", "2013-08-07 01:26:00", "2013-10-03 13:59:00",
"2013-10-29 14:20:00", "2014-04-10 16:13:00", "2014-05-27 22:20:00"
), daypart = c("afternoon", "evening", "morning", "evening",
"evening", "late fringe", "afternoon", "afternoon", "afternoon",
"primetime"), quarter = structure(c(4L, 4L, 1L, 6L, 3L, 4L,
6L, 6L, 3L, 3L), .Label = c("Q12014", "Q22013", "Q22014",
"Q32013", "Q32014", "Q42013"), class = "factor"), cpi.allpersons = c(96.2179487179487,
79.0114068441065, 35.1219512195122, 82.3322348711803, 30,
0, 138.721804511278, 28.3135215453195, 28.2384088854449,
86.6666666666667), cpi.male = c(750.5, 188.882673751923,
115.959004392387, 144.492639327024, 38.9847715736041, 0,
595.161290322581, 34.7402005469462, 62.010777084515, 156.712328767123
), spotvalue.allpersons = c(0, 276.539923954373, 0, 205.830587177951,
24, 0, 0, 0, 220.25958930647, 52), spotvalue.male = c(0,
377.765347503846, 0, 0, 23.3908629441624, 0, 0, 0, 241.842030629609,
0), nullspot.allpersons = c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0),
nullspot.male = c(1, 0, 1, 1, 0, 0, 1, 1, 0, 1)), .Names = c("market",
"spot.id", "date", "hour", "time", "local.date", "local.hour",
"local.time", "vendor", "station", "male.imp", "women.imp", "allpersons.imp",
"hh.imp", "isci", "creative", "program", "rate", "R.DMA.NAMES",
"date.time", "daypart", "quarter", "cpi.allpersons", "cpi.male",
"spotvalue.allpersons", "spotvalue.male", "nullspot.allpersons",
"nullspot.male"), row.names = c(561147L, 261262L, 89888L, 941010L,
500366L, 65954L, 484053L, 598996L, 380976L, 968615L), class = "data.frame")`
为丑陋道歉dput
。