pandas - Pandas Cumsum in expanded rows

Question

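I'd like to learn a more elegant way to write this solution. I need to split a set of rows into smaller parts, track the utilization of each part, and calculate a balance. My current solution does not produce the balance correctly.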

import pandas as pd
import numpy as np

box_list = [['Box0', 0.2],
               ['Box1', 1.0],
               ['Box2', 1.8],
               ['Box4', 2.0],
               ['Box8', 4.01],]
  
sdf = pd.DataFrame(box_list, columns = ['Name', 'Size'])

print(sdf)

	Name	Size
0	Box0	0.20
1	Box1	1.00
2	Box2	1.80
3	Box4	2.00
4	Box8	4.01

df = pd.DataFrame({'Name': np.repeat(sdf['Name'], sdf['Size'].apply(np.ceil)),
                    'Size': np.repeat(sdf['Size'], sdf['Size'].apply(np.ceil)),})

df['Max_Units']=df['Size'].apply(lambda x: np.ceil(x) if x>1.0 else 1.0) 
df = df.reset_index()
df['Utilization'] =df['Size'].apply(lambda x: x-int(x) if x>1.0 else (x if x<1.0 else 1.0))  
df['Balance'] =df['Max_Units'] 

g = df.groupby(['index'], as_index=0, group_keys=0)

df['Utilization'] = g.apply(lambda x: 
                           pd.Series(np.where((x.Balance.shift(1) >= 1.0), 
                           1.0, 
                           x.Utilization))).values
df.loc[(df.Utilization == 0.0), ['Utilization']] = 1.0

df['Balance'] = g.apply(lambda x: 
                           pd.Series(np.where((x.Balance.shift(1) >= 1.0), 
                           x.Max_Units-x.Utilization, 
                           0))).values
print(df)

	index	Name	Size	Max_Units	Utilization	Balance
0	0	Box0	0.20	1.0	0.20	0.0
1	1	Box1	1.00	1.0	1.00	0.0
2	2	Box2	1.80	2.0	0.80	0.0
3	2	Box2	1.80	2.0	1.00	1.0
4	3	Box4	2.00	2.0	1.00	0.0
5	3	Box4	2.00	2.0	1.00	1.0
6	4	Box8	4.01	5.0	0.01	0.0
7	4	Box8	4.01	5.0	1.00	4.0
8	4	Box8	4.01	5.0	1.00	4.0
9	4	Box8	4.01	5.0	1.00	4.0
10	4	Box8	4.01	5.0	1.00	4.0
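
In the output above, every Box8 row after the first shows a Balance of 4.0, because `x.Max_Units - x.Utilization` evaluates to the same 5 - 1 for each of them. A value that grows row by row within a box needs a cumulative operation per group. The following is a minimal sketch of that building block, assuming the intended balance is a running counter inside each box (the question does not define it precisely); the `demo` frame and its `Position` and `Running` columns are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical miniature of the expanded frame: one row per unit,
# keyed by the label of the source row it was expanded from.
demo = pd.DataFrame({
    "index": [2, 2, 4, 4, 4, 4, 4],
    "Name": ["Box2"] * 2 + ["Box8"] * 5,
})

# cumcount numbers the rows 0, 1, 2, ... inside each group.
demo["Position"] = demo.groupby("index").cumcount()

# The same counter expressed as a cumulative sum over a column of ones,
# shifted so that the first row of each group starts at 0.
demo["Running"] = demo.assign(one=1).groupby("index")["one"].cumsum() - 1

print(demo)
```

Both columns restart at 0 for each box, which is the shape of the Balance values the question appears to be after.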

score 0 · Accepted Answer

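I'm not sure I fully understand what all of these values are supposed to represent.

However, I have achieved the correct expected output for your sample set in a more direct way: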

import pandas as pd
import numpy as np

box_list = [['Box0', 0.2],
            ['Box1', 1.0],
            ['Box2', 1.8],
            ['Box4', 2.0],
            ['Box8', 4.01], ]

df = pd.DataFrame(box_list, columns=['Name', 'Size'])

# Set ceil column to ceil of size since it's used more than once
df['ceil'] = df['Size'].apply(np.ceil)

# Duplicate Rows based on Ceil of Size
df = df.loc[df.index.repeat(df['ceil'])]

# Get Max Units by comparing it to the ceil column
df['Max_Units'] = df.apply(lambda s: max(s['ceil'], 1), axis=1)

# Extract Decimal Portion By Using % 1 (Catch Special Case of x == 1)
df['Utilization'] = df['Size'].apply(lambda x: 1 if x == 1 else x % 1)

# Everywhere Max_Units cumcount is not 0 set Utilization to 1
df.loc[df.groupby(df['Max_Units']).cumcount().ne(0), 'Utilization'] = 1

# Set Balance to index cumcount as float
df['Balance'] = df.groupby(df.index).cumcount().astype(float)

# Drop Unnecessary Column and reset index for output
df = df.drop(columns=['ceil']).reset_index()

# For Display
print(df)

Output:

	index	Name	Size	Max_Units	Utilization	Balance
0	0	Box0	0.20	1.0	0.20	0.0
1	1	Box1	1.00	1.0	1.00	0.0
2	2	Box2	1.80	2.0	0.80	0.0
3	2	Box2	1.80	2.0	1.00	1.0
4	3	Box4	2.00	2.0	1.00	0.0
5	3	Box4	2.00	2.0	1.00	1.0
6	4	Box8	4.01	5.0	0.01	0.0
7	4	Box8	4.01	5.0	1.00	1.0
8	4	Box8	4.01	5.0	1.00	2.0
9	4	Box8	4.01	5.0	1.00	3.0
10	4	Box8	4.01	5.0	1.00	4.0
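
One thing to be aware of in the code above: the Utilization step groups by `Max_Units`, not by the original row. For this sample that happens to work, because Box4 shares `Max_Units == 2.0` with Box2, so both Box4 rows get a non-zero cumcount and are set to 1. With different data, a whole-number size other than 1 could be left with a Utilization of 0. The following is a possible vectorized variant that groups by the original row instead. It is only a sketch, under the assumption that the first expanded row of each box should carry the fractional part of Size (or a full unit when Size is a whole number) and that Balance is the position within the box; the names `units`, `pos`, `frac`, and `out` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

sdf = pd.DataFrame({
    "Name": ["Box0", "Box1", "Box2", "Box4", "Box8"],
    "Size": [0.2, 1.0, 1.8, 2.0, 4.01],
})

# Number of unit rows each box expands into (at least one).
units = np.ceil(sdf["Size"]).clip(lower=1).astype(int)

# Repeat each source row; reset_index keeps the source label in "index".
out = sdf.loc[sdf.index.repeat(units)].reset_index()
out["Max_Units"] = out["index"].map(units).astype(float)

# Position of each expanded row within its source row: 0, 1, 2, ...
pos = out.groupby("index").cumcount()

# Fractional part of Size; a whole-number Size counts as a full unit.
frac = out["Size"] % 1
frac = frac.where(frac != 0, 1.0)

# The first row of each box carries the fractional part, the rest are full.
out["Utilization"] = np.where(pos == 0, frac, 1.0)
out["Balance"] = pos.astype(float)

print(out)
```

Note that `4.01 % 1` is not exactly `0.01` in floating point; it only prints as 0.01 at two decimals, so round the column if exact values matter downstream.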
