
I have a DataFrame of minute data for multiple stocks, and each stock has multiple sessions. See the sample below:

         Symbol            Time     Open    High    Low  Close  Volume  LOD
2724312   AEHR 2019-09-23 09:31:00   1.42   1.42   1.42   1.42     200  NaN
2724313   AEHR 2019-09-23 09:43:00   1.35   1.35   1.34   1.34    6062  NaN
2724314   AEHR 2019-09-23 09:58:00   1.35   1.35   1.29   1.30    8665  NaN
2724315   AEHR 2019-09-23 09:59:00   1.32   1.32   1.32   1.32     100  NaN
2724316   AEHR 2019-09-23 10:00:00   1.35   1.35   1.35   1.35     400  NaN
...        ...                 ...    ...    ...    ...    ...     ...  ...
4266341     ZI 2021-09-10 15:56:00  63.08  63.16  63.08  63.15   18205  NaN
4266342     ZI 2021-09-10 15:57:00  63.14  63.14  63.07  63.07   19355  NaN
4266343     ZI 2021-09-10 15:58:00  63.07  63.12  63.07  63.10   16650  NaN
4266344     ZI 2021-09-10 15:59:00  63.09  63.12  63.06  63.11   25775  NaN
4266345     ZI 2021-09-10 16:00:00  63.11  63.17  63.11  63.17   28578  NaN

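For each row, I need the low of day (LOD): the minimum Low of the current session (9:30 am to 4 pm), from the start of the session up to and including that row's time.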

The finished df should look like this:

         Symbol            Time     Open    High    Low  Close  Volume  LOD
2724312   AEHR 2019-09-23 09:31:00   1.42   1.42   1.42   1.42     200  1.42   
2724313   AEHR 2019-09-23 09:43:00   1.35   1.35   1.34   1.34    6062  1.34   
2724314   AEHR 2019-09-23 09:58:00   1.35   1.35   1.29   1.30    8665  1.29   
2724315   AEHR 2019-09-23 09:59:00   1.32   1.32   1.32   1.32     100  1.29   
2724316   AEHR 2019-09-23 10:00:00   1.35   1.35   1.35   1.35     400  1.29   
...        ...                 ...    ...    ...    ...    ...     ...  ...
4266341     ZI 2021-09-10 15:56:00  63.08  63.16  63.08  63.15   18205  63.08  
4266342     ZI 2021-09-10 15:57:00  63.14  63.14  63.07  63.07   19355  63.07  
4266343     ZI 2021-09-10 15:58:00  63.07  63.12  63.07  63.10   16650  63.07  
4266344     ZI 2021-09-10 15:59:00  63.09  63.12  63.06  63.11   25775  63.06  
4266345     ZI 2021-09-10 16:00:00  63.11  63.17  63.11  63.17   28578  63.06 
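The question defines a session as 9:30 am to 4 pm. Below is a minimal sketch of turning that definition into code. It assumes `Time` is datetime64 and that the raw data may also contain pre-market or after-hours rows, which the sample shown does not. It keeps only rows whose time of day falls between 09:30 and 16:00 inclusive and derives a session key from the calendar date. The sample frame and the names `in_session`, `session_df` and `Session` are hypothetical, chosen for illustration.

```python
import datetime

import pandas as pd

# Hypothetical sample: one symbol over two dates, including rows outside the session
df = pd.DataFrame({
    "Symbol": ["AEHR"] * 6,
    "Time": pd.to_datetime([
        "2019-09-23 09:15:00",  # pre-market, outside the session
        "2019-09-23 09:31:00",
        "2019-09-23 09:43:00",
        "2019-09-23 16:30:00",  # after-hours, outside the session
        "2019-09-24 09:31:00",
        "2019-09-24 09:32:00",
    ]),
    "Low": [1.20, 1.42, 1.34, 1.10, 1.50, 1.45],
})

# Keep only rows whose time of day is within 09:30 to 16:00 (inclusive)
t = df["Time"].dt.time
in_session = (t >= datetime.time(9, 30)) & (t <= datetime.time(16, 0))
session_df = df[in_session].copy()

# Hypothetical session key: one session per symbol per calendar date
session_df["Session"] = session_df["Time"].dt.date
print(session_df)
```

Grouping by `Symbol` together with `Session` then gives one group per trading session, so a running minimum computed per group resets every day and ignores the excluded pre-market and after-hours lows.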

My current solution:

# assumes df["Time"] is datetime64 and df is sorted by Symbol, then Time, with an increasing index
prev_symbol = None
prev_session = None
session_start = None

for i, row in df.iterrows():
    # i is the index label (e.g. 2724312), not a position, so use label-based .loc/.at
    current_session = row['Time'].date()  # a session is one calendar date
    current_symbol = row['Symbol']
    if current_symbol == prev_symbol:
        if current_session == prev_session:
            # .loc label slicing is inclusive, so the current row's Low is included
            sesh_low = df.loc[session_start:i, 'Low'].min()
            df.at[i, 'LOD'] = sesh_low
        else:
            # first row of a new session
            df.at[i, 'LOD'] = row['Low']
            prev_session = current_session
            session_start = i
    else:
        # first row of a new symbol
        df.at[i, 'LOD'] = row['Low']
        prev_symbol = current_symbol
        prev_session = current_session
        session_start = i

This raises a SettingWithCopyWarning. Please help.
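The question does not show how `df` was created, so the following is an assumption. A common cause of SettingWithCopyWarning is that `df` was produced by filtering a larger frame, and pandas then cannot tell whether writes into it are meant to reach the parent. A minimal sketch of the usual remedy is to take an explicit `.copy()` when creating the subset. The parent frame `all_data` is hypothetical.

```python
import pandas as pd

# Hypothetical parent frame; the question does not show how df was built
all_data = pd.DataFrame({
    "Symbol": ["AEHR", "AEHR", "ZI"],
    "Low": [1.42, 1.34, 63.08],
})

# A boolean-filtered frame may be flagged by pandas as derived from all_data,
# and writing into it is a common source of SettingWithCopyWarning.
# .copy() makes df an independent frame, so the writes below target df only.
df = all_data[all_data["Symbol"] == "AEHR"].copy()
df["LOD"] = float("nan")

for i, row in df.iterrows():
    df.at[i, "LOD"] = row["Low"]  # label-based write into the independent copy

print(df)
```

The copy only addresses the warning. A row-by-row loop with `iterrows()` remains slow on millions of rows, which is why a vectorized groupby is usually preferred.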


1 Answer


You can try .groupby() + .expanding():

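import pandas as pd

# skip these two lines if Time is already datetime and df is already sorted:
# df["Time"] = pd.to_datetime(df["Time"])
# df = df.sort_values(by=["Symbol", "Time"])

# group by Symbol *and* session date so the running minimum resets each session;
# .values relies on df being sorted by Symbol, then Time
df["LOD"] = df.groupby(["Symbol", df["Time"].dt.date])["Low"].expanding().min().values
print(df)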

Prints:

        Symbol                 Time   Open   High    Low  Close  Volume    LOD
2724312   AEHR  2019-09-23 09:31:00   1.42   1.42   1.42   1.42     200   1.42
2724313   AEHR  2019-09-23 09:43:00   1.35   1.35   1.34   1.34    6062   1.34
2724314   AEHR  2019-09-23 09:58:00   1.35   1.35   1.29   1.30    8665   1.29
2724315   AEHR  2019-09-23 09:59:00   1.32   1.32   1.32   1.32     100   1.29
2724316   AEHR  2019-09-23 10:00:00   1.35   1.35   1.35   1.35     400   1.29
4266341     ZI  2021-09-10 15:56:00  63.08  63.16  63.08  63.15   18205  63.08
4266342     ZI  2021-09-10 15:57:00  63.14  63.14  63.07  63.07   19355  63.07
4266343     ZI  2021-09-10 15:58:00  63.07  63.12  63.07  63.10   16650  63.07
4266344     ZI  2021-09-10 15:59:00  63.09  63.12  63.06  63.11   25775  63.06
4266345     ZI  2021-09-10 16:00:00  63.11  63.17  63.11  63.17   28578  63.06
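A related approach, sketched here under the assumption that `Time` is datetime64, replaces `.expanding().min()` with `GroupBy.cummin()`, which computes the same running minimum. `cummin()` returns a Series aligned on the original index, so the assignment does not depend on the row order matching the group order the way `.values` does. The sketch groups by `Symbol` and calendar date so the minimum resets each session, and it shows that grouping by `Symbol` alone would carry the previous day's low into the next session. The sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample: one symbol over two sessions (two calendar dates)
df = pd.DataFrame({
    "Symbol": ["AEHR", "AEHR", "AEHR", "AEHR"],
    "Time": pd.to_datetime([
        "2019-09-23 09:31:00", "2019-09-23 09:43:00",
        "2019-09-24 09:31:00", "2019-09-24 09:32:00",
    ]),
    "Low": [1.42, 1.34, 1.50, 1.45],
})

# Rows must be in time order within each session for a running minimum
df = df.sort_values(["Symbol", "Time"])

# Running minimum of Low within each (Symbol, date) group, aligned on df's index
df["LOD"] = df.groupby(["Symbol", df["Time"].dt.date])["Low"].cummin()

# For comparison: grouping by Symbol only lets day 1's low leak into day 2
symbol_only = df.groupby("Symbol")["Low"].cummin()
print(df)
```

With the per-session key the second day starts again at 1.50, whereas `symbol_only` stays at 1.34 for both rows of 2019-09-24.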
answered 2021-09-11T18:56:42.067