Sample data:
product_id <- c("1000","1000","1000","1000","1000","1000", "1002","1002","1002","1002","1002","1002")
qty_ordered <- c(1,2,1,1,1,1,1,2,1,2,1,1)
price <- c(2.49,2.49,2.49,1.743,2.49,2.49, 2.093,2.093,2.11,2.11,2.11, 2.97)
date <- c("2/23/15","2/23/15", '3/16/15','3/16/15','5/16/15', "6/18/15", "2/19/15","3/19/15","3/19/15","3/19/15","3/19/15","4/19/15")
sampleData <- data.frame(product_id, qty_ordered, price, date)
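I want to identify each time the price changes. I also want to sum() the total qty_ordered between two consecutive price-change dates. For example, for product_id == "1000", the price changed from $2.49 to $1.743 on 3/16/15. The total qty_ordered before that change is 1+2+1=4, and the difference between those two earliest price-change dates, 2/23/15 to 3/16/15, is 21 days.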
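As a quick check of the date arithmetic above, here is a minimal sketch that parses the four dates at which product "1000" starts or ends a price run and takes the day differences. It assumes the dates are in m/d/yy format, as in the sample data; the variable names `d` and `day_gaps` are only illustrative.

```r
# Dates for product "1000": start of each price run, plus the last order date
d <- as.Date(c("2/23/15", "3/16/15", "5/16/15", "6/18/15"), format = "%m/%d/%y")

# Differences in days between consecutive dates
day_gaps <- as.numeric(diff(d))
day_gaps
```

Note that the dates in sampleData are stored as strings, so they need this kind of conversion before any subtraction or min()/max() comparison is meaningful.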
So the new data frame should be:
product_id  sum_qty_ordered  price  date_diff
1000        4                2.490  21
1000        1                1.743  61
1000        2                2.490  33
Here is what I have tried:
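**Note:** a plain `dplyr::group_by` will not work in this case, because it ignores the effect of the dates: grouping by product_id and price would merge the two separate periods in which product "1000" cost 2.49.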
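To make the problem with plain grouping concrete, here is a small sketch. It uses base R's `aggregate()` as a stand-in for `dplyr::group_by` plus `summarise` (my substitution, so that the example needs no packages); the name `pooled` is only illustrative.

```r
sampleData <- data.frame(
  product_id  = rep(c("1000", "1002"), each = 6),
  qty_ordered = c(1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1),
  price       = c(2.49, 2.49, 2.49, 1.743, 2.49, 2.49,
                  2.093, 2.093, 2.11, 2.11, 2.11, 2.97),
  date        = c("2/23/15", "2/23/15", "3/16/15", "3/16/15", "5/16/15", "6/18/15",
                  "2/19/15", "3/19/15", "3/19/15", "3/19/15", "3/19/15", "4/19/15"),
  stringsAsFactors = FALSE
)

# Grouping by product_id and price pools every row that shares a price,
# even when a different price occurred in between.
pooled <- aggregate(qty_ordered ~ product_id + price, data = sampleData, FUN = sum)
pooled[pooled$product_id == "1000", ]
```

For product "1000" this gives only two rows, with the 2.49 rows from February/March and May/June added together, instead of the three rows I am after.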
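1) I found this code in "Determine when columns of a data.frame change value and return indices of the change". It identifies each time the price changes, that is, the first row (and therefore the first date) of each new price for every product.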
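# Row indices where the price differs from the previous row (row 1 starts the first run).
# This compares prices only, so a product boundary with identical prices would be missed.
IndexedChanged <- c(1, which(diff(sampleData$price) != 0) + 1)
sampleData[IndexedChanged, ]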
However, if I use that code, I am not sure how to compute sum(qty_ordered) and the date difference for each of those entries.
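One possible way to get from change indices to per-period sums is sketched below. The idea is to flag the rows where a new run starts and let `cumsum()` turn those flags into a run number for every row. The names `is_start`, `run_id` and `sum_qty` are hypothetical, and the sketch assumes the rows are already sorted by product and then by date, as in the sample data.

```r
sampleData <- data.frame(
  product_id  = rep(c("1000", "1002"), each = 6),
  qty_ordered = c(1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1),
  price       = c(2.49, 2.49, 2.49, 1.743, 2.49, 2.49,
                  2.093, 2.093, 2.11, 2.11, 2.11, 2.97),
  stringsAsFactors = FALSE
)
n <- nrow(sampleData)

# TRUE where a row starts a new run: the first row, a price change, or a product change
is_start <- c(TRUE, sampleData$price[-1] != sampleData$price[-n] |
                    sampleData$product_id[-1] != sampleData$product_id[-n])
IndexedChanged <- which(is_start)

# cumsum() of the start flags gives every row the number of the run it belongs to
run_id <- cumsum(is_start)

# Sum the quantities within each run and attach them to the first row of the run
sum_qty <- as.vector(tapply(sampleData$qty_ordered, run_id, sum))
cbind(sampleData[IndexedChanged, c("product_id", "price")], sum_qty_ordered = sum_qty)
```

Because the run number also changes at a product boundary, two products that happen to share a price are still kept apart.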
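2) I tried to write a WHILE loop that temporarily stores each batch of product_id, price, and date range (for example, a subset of the data frame with one product_id and one price, whose entries run from the earliest date of that price to the last date before the price changes), and then summarises that subset to get the total qty_ordered and the date difference. However, I always get confused by WHILE and FOR, so I suspect my code has problems. Here is my code: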
# Create an empty data frame to store the results later
NewData_Ready <- data.frame(
product_id = character(),
price = double(),
early_date = as.Date(character()),
last_date=as.Date(character()),
total_qty_demanded = double(),
stringsAsFactors=FALSE)
# Create a temporary table to hold the order entries of one price batch
temp_dataset <- data.frame(
product_id = character(),
qty_ordered = double(),
price = double(),
date=as.Date(character()),
stringsAsFactors=FALSE)
The loop. This part is messy and may not be the right approach, so I would really appreciate help here.
for (i in unique(sampleData$product_id)) {
  # Subset to ONE product_id (compare against the variable i, not the string "i")
  temp_table <- sampleData[sampleData$product_id == i, ]
  # Parse the dates so that min(), max() and differences work on dates, not strings
  temp_table$date <- as.Date(temp_table$date, format = "%m/%d/%y")
  p <- 1  # row where the current price run starts
  while (p <= nrow(temp_table)) {
    current_price <- temp_table$price[p]
    # Move q forward while the next row still has the same price;
    # the q < nrow() check stops us from reading past the last row
    q <- p
    while (q < nrow(temp_table) && temp_table$price[q + 1] == current_price) {
      q <- q + 1
    }
    # Rows p..q are one batch: same product, same price, consecutive entries
    temp_dataset <- temp_table[p:q, ]
    NewRow <- data.frame(
      product_id = as.character(i),
      price = current_price,
      early_date = min(temp_dataset$date),
      last_date = max(temp_dataset$date),
      total_qty_demanded = sum(temp_dataset$qty_ordered),
      stringsAsFactors = FALSE)
    NewData_Ready <- rbind(NewData_Ready, NewRow)
    # The next run starts right after the last row of this one
    p <- q + 1
  }
}
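For comparison, the same idea can be written without explicit loops. This is only a sketch under two assumptions of mine: the rows are sorted by product and then by date, and date_diff is the number of days from the start of one price run to the start of the next run of the same product, falling back to the last order date of the run when the product has no later price change (which is how I read the 33 in the expected output). The names `runs`, `is_start`, `run_id` and `end_date` are hypothetical.

```r
sampleData <- data.frame(
  product_id  = rep(c("1000", "1002"), each = 6),
  qty_ordered = c(1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1),
  price       = c(2.49, 2.49, 2.49, 1.743, 2.49, 2.49,
                  2.093, 2.093, 2.11, 2.11, 2.11, 2.97),
  date        = c("2/23/15", "2/23/15", "3/16/15", "3/16/15", "5/16/15", "6/18/15",
                  "2/19/15", "3/19/15", "3/19/15", "3/19/15", "3/19/15", "4/19/15"),
  stringsAsFactors = FALSE
)
sampleData$date <- as.Date(sampleData$date, format = "%m/%d/%y")
n <- nrow(sampleData)

# Flag the first row of every run (price change or product change) and number the runs
is_start <- c(TRUE, sampleData$price[-1] != sampleData$price[-n] |
                    sampleData$product_id[-1] != sampleData$product_id[-n])
run_id <- cumsum(is_start)

# One row per run; the last row of a run is the row just before the next start
runs <- data.frame(
  product_id      = sampleData$product_id[is_start],
  sum_qty_ordered = as.vector(tapply(sampleData$qty_ordered, run_id, sum)),
  price           = sampleData$price[is_start],
  first_date      = sampleData$date[is_start],
  last_date       = sampleData$date[c(is_start[-1], TRUE)],
  stringsAsFactors = FALSE
)

# End of each period: start of the next run if it belongs to the same product,
# otherwise the last order date of the run itself (assumed rule, see above)
k <- nrow(runs)
has_next <- which(c(runs$product_id[-1] == runs$product_id[-k], FALSE))
end_date <- runs$last_date
end_date[has_next] <- runs$first_date[has_next + 1]
runs$date_diff <- as.numeric(end_date - runs$first_date)

runs[, c("product_id", "sum_qty_ordered", "price", "date_diff")]
```

For product "1000" this reproduces the three rows of the expected output. Whether the fallback rule for the last run is what I actually want for every product is something I am not sure about.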
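I have searched through many related questions, but I have not found anything that addresses this problem. Any suggestions or advice on how to solve it would be appreciated. Thank you very much for your time and help!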
Here is my R version:
platform x86_64-apple-darwin13.4.0
arch x86_64
os darwin13.4.0
system x86_64, darwin13.4.0
status
major 3
minor 3.1
year 2016
month 06
day 21
svn rev 70800
language R
version.string R version 3.3.1 (2016-06-21)
nickname Bug in Your Hair