Suppose I want to run this code with type hints:
import numpy as np
import pandas as pd
import databricks.koalas as ks

def foo(df):
    """A very simple function that only adds 3 days to one
    of the dataframe's datetime columns.
    """
    df['time'] = df['col1'] + pd.Timedelta('3D')
    return df
# Creating a dummy dataframe
n_cols = 3
df = pd.concat(
    [pd.Series(pd.date_range('20200101', '20200105')) for i in range(n_cols)],
    keys=[f'col{i}' for i in range(n_cols)],
    axis=1,
)
df['group'] = [0, 0, 0, 1, 1]
df['name'] = ['s', 'dfgdfgg', 'd', 'd', 's']
# Using koalas groupby.apply mechanism without type hinting
res = ks.DataFrame(df).groupby('group').apply(foo)
The original dtypes:
>>> ks.DataFrame(df).dtypes
col0 datetime64[ns]
col1 datetime64[ns]
col2 datetime64[ns]
group int64
name object
If I run it as is, the dtypes remain unchanged after the groupby.apply step:
>>> res.dtypes
col0 datetime64[ns]
col1 datetime64[ns]
col2 datetime64[ns]
group int64
name object
time datetime64[ns]
My best working version with type hints so far is:
def foo(df) -> pd.DataFrame['col1': np.datetime64, 'col2': np.datetime64,
                            'col3': np.datetime64, 'group': int, 'name': str]:
    df['time'] = df['col1'] + pd.Timedelta('3D')
    return df
res = ks.DataFrame(df).groupby('group').apply(foo)
But the returned dtypes are slightly different:
>>> res.dtypes
col1 datetime64
col2 datetime64
col3 datetime64
group int64
name <U0
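As far as I can tell, the two odd entries match what NumPy itself produces when the bare types from my hint are turned into dtypes: np.datetime64 carries no time unit, and str becomes a zero-length unicode dtype. Below is a minimal check that uses only NumPy. Whether koalas resolves the hint in exactly this way is my assumption, not something I have confirmed in its source.

```python
import numpy as np

# Bare types, exactly as written in the return type hint above
generic_dt = np.dtype(np.datetime64)  # datetime64 with no time unit attached
str_dtype = np.dtype(str)             # unicode dtype with zero length

# Explicit dtypes, as reported by the run without type hints
ns_dt = np.dtype('datetime64[ns]')
obj_dtype = np.dtype(object)

print(generic_dt.name, '|', ns_dt.name)
print(str_dtype.kind, str_dtype.itemsize, '|', obj_dtype.name)
```

So the unit-less datetime64 and the nanosecond datetime64[ns] are distinct dtypes, and str does not map to object.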
Is there a way to get exactly the "datetime64[ns]" and "object" dtypes?