I have a data.frame with two columns:
category quantity
a 20
b 30
c 100
d 10
e 1
f 23
g 3
h 200
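For anyone who wants to try this, the input above can be rebuilt in R as follows. This is a minimal sketch; the object name df is an assumption, and the values are copied from the table.

```r
# Reproducible version of the example input (object name df is hypothetical)
df <- data.frame(
  category = letters[1:8],                          # "a" through "h"
  quantity = c(20, 30, 100, 10, 1, 23, 3, 200),
  stringsAsFactors = FALSE                          # keep category as character on older R
)
```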
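I need to write a function with two arguments: dataframe and bin_size.
It runs a cumsum over the quantity column. Whenever the cumsum would exceed bin_size, it splits that row so the current bin is filled exactly to bin_size, carries the remainder into the following bin(s), restarts the cumsum for each new bin, and adds a running bin number as an extra column.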
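Because each bin is filled exactly to bin_size before the next one starts, every bin boundary sits at a multiple of bin_size on the overall running total. That suggests a loop-free sketch in base R: every output piece ends either at a row boundary (a value of cumsum(quantity)) or at a bin boundary (a multiple of bin_size), so sorting those cut points yields all pieces at once, and findInterval() maps each piece back to its source row. This is a minimal sketch, assuming non-negative integer quantities and a positive bin_size; the function name split_by_cumsum is hypothetical.

```r
# Hypothetical helper: split rows into bins of size bin_size using cut points
split_by_cumsum <- function(dataframe, bin_size) {
  q <- dataframe$quantity
  ends <- cumsum(q)                       # running total at the end of each row
  total <- sum(q)
  # Every piece ends at a row boundary or at a multiple of bin_size
  cuts <- sort(unique(c(0, ends, bin_size * seq_len(total %/% bin_size))))
  starts <- head(cuts, -1)                # start of each piece on the running total
  stops <- tail(cuts, -1)                 # end of each piece on the running total
  row <- findInterval(starts, c(0, ends)) # source row of each piece
  bin <- starts %/% bin_size + 1          # bin each piece falls into
  data.frame(
    category = as.character(dataframe$category[row]),
    quantity = stops - starts,
    cumsum = stops - (bin - 1) * bin_size, # running total within the bin
    bin_nbr = as.integer(bin),
    stringsAsFactors = FALSE
  )
}
```

Usage would be split_by_cumsum(df, 50). Rows with a quantity of 0 produce no pieces and are dropped from the result; non-integer quantities may create tiny spurious pieces from floating-point rounding.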
For example, calling:
function(dataframe, 50)
on the example above should give me:
category quantity cumsum bin_nbr
a 20 20 1
b 30 50 1
c 50 50 2
c 50 50 3
d 10 10 4
e 1 11 4
f 23 34 4
g 3 37 4
h 13 50 4
h 50 50 5
h 50 50 6
h 50 50 7
h 37 37 8
Explanation:
Rows a + b sum to 50 -> bin_nbr 1
Row c is 100 -> split into 2 rows of 50 -> bin_nbr 2 and bin_nbr 3
Rows d, e, f, g sum to 37 -> bin_nbr 4
I need another 13 from row h to fill bin_nbr 4 up to 50
The remaining 187 from row h is split into 4 bins (50 + 50 + 50 + 37) -> bin_nbr 5, 6, 7, 8
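The walk-through above is a greedy fill: take as much of each row as fits in the current bin, close the bin when it reaches bin_size, and carry whatever is left into the next bin. A minimal base-R sketch of that loop follows, assuming numeric (ideally integer) quantities and a positive bin_size; split_into_bins is a hypothetical name. It collects one-row data frames in a list and binds them at the end, which is easy to follow but slow for very large inputs.

```r
# Hypothetical helper: greedily fill bins of size bin_size, row by row
split_into_bins <- function(dataframe, bin_size) {
  pieces <- list()
  bin <- 1L      # current bin number
  filled <- 0    # amount already placed in the current bin
  for (i in seq_len(nrow(dataframe))) {
    remaining <- dataframe$quantity[i]
    while (remaining > 0) {
      take <- min(remaining, bin_size - filled)  # what still fits in this bin
      filled <- filled + take
      remaining <- remaining - take
      pieces[[length(pieces) + 1]] <- data.frame(
        category = as.character(dataframe$category[i]),
        quantity = take,
        cumsum = filled,       # running total within the current bin
        bin_nbr = bin,
        stringsAsFactors = FALSE
      )
      if (filled == bin_size) {  # bin is full: start the next one
        bin <- bin + 1L
        filled <- 0
      }
    }
  }
  do.call(rbind, pieces)
}
```

With the example data, split_into_bins(df, 50) should reproduce the expected table above. The exact comparison filled == bin_size relies on integer quantities; with fractional values a tolerance would be needed.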