I've been struggling with this problem and can't solve it. I have the following dataframe:
import databricks.koalas as ks
from pandas import Timestamp
x = ks.DataFrame.from_records(
{'ds': {0: Timestamp('2018-10-06 00:00:00'),
1: Timestamp('2017-06-08 00:00:00'),
2: Timestamp('2018-10-22 00:00:00'),
3: Timestamp('2017-02-08 00:00:00'),
4: Timestamp('2019-02-03 00:00:00'),
5: Timestamp('2019-02-26 00:00:00'),
6: Timestamp('2017-04-15 00:00:00'),
7: Timestamp('2017-07-02 00:00:00'),
8: Timestamp('2017-04-04 00:00:00'),
9: Timestamp('2017-03-20 00:00:00'),
10: Timestamp('2018-06-09 00:00:00'),
11: Timestamp('2017-01-15 00:00:00'),
12: Timestamp('2018-05-07 00:00:00'),
13: Timestamp('2018-01-17 00:00:00'),
14: Timestamp('2017-07-11 00:00:00'),
15: Timestamp('2018-12-17 00:00:00'),
16: Timestamp('2018-12-05 00:00:00'),
17: Timestamp('2017-05-22 00:00:00'),
18: Timestamp('2017-08-13 00:00:00'),
19: Timestamp('2018-05-21 00:00:00')},
'store': {0: 81,
1: 128,
2: 81,
3: 128,
4: 25,
5: 128,
6: 11,
7: 124,
8: 43,
9: 25,
10: 25,
11: 124,
12: 124,
13: 128,
14: 81,
15: 11,
16: 124,
17: 11,
18: 167,
19: 128},
'stock': {0: 1,
1: 236,
2: 3,
3: 9,
4: 36,
5: 78,
6: 146,
7: 20,
8: 12,
9: 12,
10: 15,
11: 25,
12: 10,
13: 7,
14: 0,
15: 230,
16: 80,
17: 6,
18: 110,
19: 8},
'sells': {0: 1.0,
1: 17.0,
2: 1.0,
3: 2.0,
4: 1.0,
5: 2.0,
6: 7.0,
7: 1.0,
8: 1.0,
9: 1.0,
10: 2.0,
11: 1.0,
12: 1.0,
13: 1.0,
14: 1.0,
15: 1.0,
16: 1.0,
17: 3.0,
18: 2.0,
19: 1.0}}
)
and this function, which I want to use in a groupby-apply:
import numpy as np
def compute_indicator(df):
    return (
        df.copy()
        .assign(
            indicator=lambda x: x['a'] < np.percentile(x['b'], 80)
        )
        .astype(int)
        .fillna(1)
    )
where df is a pandas dataframe. If I do the groupby-apply with pandas, the code runs as expected:
import pandas as pd
# This runs
a = pd.DataFrame.from_dict(x.to_dict()).groupby('store').apply(compute_indicator)
But when I try to run the same thing in Koalas, it gives me the following error: ValueError: cannot insert store, already exists
x.groupby('store').apply(compute_indicator)
# ValueError: cannot insert store, already exists
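I can't use type annotations on compute_indicator, because some columns are not fixed (they travel along with the dataframe and are intended for use by other transformations).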
What should I do to run this code in Koalas?