Suppose I want to run this code with type hints:

import numpy as np
import pandas as pd
import databricks.koalas as ks

def foo(df):
    """A very simple function which only adds 3 days to one
    of the dataframe's datetime columns.
    """
    df['time'] = df['col1'] + pd.Timedelta('3D')
    return df

# Creating a dummy dataframe
n_cols = 3
df = pd.concat(
    [pd.Series(pd.date_range('20200101', '20200105')) for i in range(n_cols)],
    keys=[f'col{i}' for i in range(n_cols)],
    axis=1,
)
df['group'] = [0, 0, 0, 1, 1]
df['name'] = ['s', 'dfgdfgg', 'd', 'd', 's']

# Using koalas groupby.apply mechanism without type hinting
res = ks.DataFrame(df).groupby('group').apply(foo)

The original dtypes:

>>> ks.DataFrame(df).dtypes

col0     datetime64[ns]
col1     datetime64[ns]
col2     datetime64[ns]
group             int64
name             object

If I run it as is, the dtypes of the existing columns are unchanged after the groupby.apply step:

>>> res.dtypes

col0     datetime64[ns]
col1     datetime64[ns]
col2     datetime64[ns]
group             int64
name             object
time     datetime64[ns]

My best working version with type hints so far is:

def foo(df) -> pd.DataFrame['col1': np.datetime64, 'col2': np.datetime64, 'col3': 
    np.datetime64, 'group': int, 'name': str]:
    df['time'] = df['col1'] + pd.Timedelta('3D')
    return df

res = ks.DataFrame(df).groupby('group').apply(foo)

But the returned dtypes are slightly different:

>>> res.dtypes

col1      datetime64
col2      datetime64
col3      datetime64
group          int64
name             <U0

Is there a way to get the exact "datetime64[ns]" and "object" dtypes?
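For what it's worth, the two odd dtypes look like what NumPy itself produces when the hinted types are converted to dtypes directly. This is only a minimal sketch in plain NumPy, and it assumes the hints are turned into dtypes in roughly this way; I have not confirmed that in the Koalas source. The variable names are just for illustration.

```python
import numpy as np

# What NumPy makes of the types used in the return type hint
generic_dt = np.dtype(np.datetime64)  # datetime64 with no time unit attached
hinted_str = np.dtype(str)            # zero-length unicode string dtype

# The dtypes I would like to get back instead
wanted_dt = np.dtype('datetime64[ns]')
wanted_obj = np.dtype(object)

# Compare the hinted dtypes with the wanted ones
print(generic_dt, hinted_str)
print(wanted_dt, wanted_obj)
print(generic_dt == wanted_dt, hinted_str == wanted_obj)
```

So the unit-less datetime64 and the unicode dtype are not equal to the dtypes I started with, which would explain the difference if the hints are converted like this.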
