python - Associate entries from one pandas DataFrame with a second DataFrame based on time

Question

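I have two pandas DataFrames. One contains my usual measurements (time-indexed). The second frame comes from a different source and contains the system states. It is also time-indexed, but the times in the state frame do not match the times in my measurement frame. What I want is for each row in the measurement frame to also contain the last state that appeared in the state frame before the measurement time.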

For example, I have a state frame like this:

                                          state
time                                           
2013-02-14 12:29:37.101000          SystemReset
2013-02-14 12:29:39.103000             WaitFace
2013-02-14 12:29:39.103000      NormalExecution
2013-02-14 12:29:39.166000        GreetVisitors
2013-02-14 12:29:46.879000  AskForParticipation
2013-02-14 12:29:56.807000  IntroduceVernissage
2013-02-14 12:30:07.275000      PictureQuestion

My measurements look like this:

                            utime
time
2013-02-14 12:29:38.697038      0
2013-02-14 12:29:38.710432      1
2013-02-14 12:29:39.106475      2
2013-02-14 12:29:39.200701      3
2013-02-14 12:29:40.197014      0
2013-02-14 12:29:42.217976      5
2013-02-14 12:29:57.460601      7
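
A minimal sketch for rebuilding the two example frames so the approaches in this thread can be tried directly. The names `states` and `measurements` come from the question's own loop; the column names and the index name `time` are read off the printed frames, and the exact dtypes pandas chooses are an assumption.

```python
import pandas as pd

# State changes; note that two entries share the timestamp 12:29:39.103000
states = pd.DataFrame(
    {"state": ["SystemReset", "WaitFace", "NormalExecution", "GreetVisitors",
               "AskForParticipation", "IntroduceVernissage", "PictureQuestion"]},
    index=pd.DatetimeIndex(
        ["2013-02-14 12:29:37.101000", "2013-02-14 12:29:39.103000",
         "2013-02-14 12:29:39.103000", "2013-02-14 12:29:39.166000",
         "2013-02-14 12:29:46.879000", "2013-02-14 12:29:56.807000",
         "2013-02-14 12:30:07.275000"],
        name="time",
    ),
)

# Measurements, taken at timestamps that never coincide with a state change
measurements = pd.DataFrame(
    {"utime": [0, 1, 2, 3, 0, 5, 7]},
    index=pd.DatetimeIndex(
        ["2013-02-14 12:29:38.697038", "2013-02-14 12:29:38.710432",
         "2013-02-14 12:29:39.106475", "2013-02-14 12:29:39.200701",
         "2013-02-14 12:29:40.197014", "2013-02-14 12:29:42.217976",
         "2013-02-14 12:29:57.460601"],
        name="time",
    ),
)
```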

I'd like to end up with a DataFrame like this:

                            utime                 state
time
2013-02-14 12:29:38.697038      0           SystemReset
2013-02-14 12:29:38.710432      1           SystemReset
2013-02-14 12:29:39.106475      2       NormalExecution
2013-02-14 12:29:39.200701      3         GreetVisitors
2013-02-14 12:29:40.197014      0         GreetVisitors
2013-02-14 12:29:42.217976      5         GreetVisitors
2013-02-14 12:29:57.460601      7   IntroduceVernissage

I found a very inefficient solution, which looks like this:

result = measurements.copy()
stateList = []
for timestamp, _ in measurements.iterrows():
    # All states up to and including this timestamp; keep only the most recent one
    candidateStates = states.truncate(after=timestamp).tail(1)
    if len(candidateStates) > 0:
        stateList.append(candidateStates['state'].values[0])
    else:
        # No state was recorded before this measurement
        stateList.append("unknown")

result['state'] = stateList
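
One way to remove the per-row loop without changing its semantics, sketched under the assumption that both indexes are sorted: `Index.searchsorted(..., side="right")` finds, for every measurement at once, how many states occurred at or before it. `attach_last_state` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

# Sample data from the question
states = pd.DataFrame(
    {"state": ["SystemReset", "WaitFace", "NormalExecution", "GreetVisitors",
               "AskForParticipation", "IntroduceVernissage", "PictureQuestion"]},
    index=pd.DatetimeIndex(
        ["2013-02-14 12:29:37.101000", "2013-02-14 12:29:39.103000",
         "2013-02-14 12:29:39.103000", "2013-02-14 12:29:39.166000",
         "2013-02-14 12:29:46.879000", "2013-02-14 12:29:56.807000",
         "2013-02-14 12:30:07.275000"], name="time"),
)
measurements = pd.DataFrame(
    {"utime": [0, 1, 2, 3, 0, 5, 7]},
    index=pd.DatetimeIndex(
        ["2013-02-14 12:29:38.697038", "2013-02-14 12:29:38.710432",
         "2013-02-14 12:29:39.106475", "2013-02-14 12:29:39.200701",
         "2013-02-14 12:29:40.197014", "2013-02-14 12:29:42.217976",
         "2013-02-14 12:29:57.460601"], name="time"),
)


def attach_last_state(measurements, states, missing="unknown"):
    """Hypothetical helper: vectorised form of the truncate(...).tail(1) loop."""
    # Assumes both indexes are sorted in ascending time order.
    # side="right" counts states with time <= measurement time (like truncate(after=...)),
    # so subtracting 1 gives the position of the last such state.
    pos = states.index.searchsorted(measurements.index, side="right") - 1
    values = states["state"].to_numpy()
    result = measurements.copy()
    # pos == -1 means no state precedes the measurement; use the placeholder instead.
    result["state"] = np.where(pos >= 0, values[pos], missing)
    return result


result = attach_last_state(measurements, states)
```

Because `side="right"` counts equal timestamps, the last of several states sharing a timestamp wins, which is what `truncate(after=timestamp).tail(1)` does in the loop.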

Is there any way to optimize this?

score 2 · Accepted Answer

Maybe something like

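df = states.join(measurements, how='outer')  # interleave both frames in time order
df['state'] = df['state'].ffill()  # carry each state forward to later rows
# keep only measurement rows; subset= keeps a measurement that precedes every state
df = df.dropna(subset=['utime'])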
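
A self-contained version of the join/ffill idea, written against the current pandas API (a sketch; the sample data is copied from the question). `Series.ffill()` replaces the older `fillna(method='ffill')` spelling, and each result is assigned back instead of being modified in place.

```python
import pandas as pd

# Sample data from the question
states = pd.DataFrame(
    {"state": ["SystemReset", "WaitFace", "NormalExecution", "GreetVisitors",
               "AskForParticipation", "IntroduceVernissage", "PictureQuestion"]},
    index=pd.DatetimeIndex(
        ["2013-02-14 12:29:37.101000", "2013-02-14 12:29:39.103000",
         "2013-02-14 12:29:39.103000", "2013-02-14 12:29:39.166000",
         "2013-02-14 12:29:46.879000", "2013-02-14 12:29:56.807000",
         "2013-02-14 12:30:07.275000"], name="time"),
)
measurements = pd.DataFrame(
    {"utime": [0, 1, 2, 3, 0, 5, 7]},
    index=pd.DatetimeIndex(
        ["2013-02-14 12:29:38.697038", "2013-02-14 12:29:38.710432",
         "2013-02-14 12:29:39.106475", "2013-02-14 12:29:39.200701",
         "2013-02-14 12:29:40.197014", "2013-02-14 12:29:42.217976",
         "2013-02-14 12:29:57.460601"], name="time"),
)

# Outer join on the shared time index: state rows and measurement rows end up
# interleaved in time order (utime is NaN on state rows, state is NaN on measurement rows).
df = states.join(measurements, how="outer")

# Forward-fill so every row carries the most recent state at or before its timestamp.
df["state"] = df["state"].ffill()

# Keep only the rows that came from the measurement frame.
# utime is now float64 because the join introduced NaNs.
result = df.dropna(subset=["utime"])
```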

would work? The join produces:

>>> df
                                          state  utime
time                                                  
2013-02-14 12:29:37.101000          SystemReset    NaN
2013-02-14 12:29:38.697038                  NaN      0
2013-02-14 12:29:38.710432                  NaN      1
2013-02-14 12:29:39.103000             WaitFace    NaN
2013-02-14 12:29:39.103000      NormalExecution    NaN
2013-02-14 12:29:39.106475                  NaN      2
2013-02-14 12:29:39.166000        GreetVisitors    NaN
2013-02-14 12:29:39.200701                  NaN      3
2013-02-14 12:29:40.197014                  NaN      0
2013-02-14 12:29:42.217976                  NaN      5
2013-02-14 12:29:46.879000  AskForParticipation    NaN
2013-02-14 12:29:56.807000  IntroduceVernissage    NaN
2013-02-14 12:29:57.460601                  NaN      7
2013-02-14 12:30:07.275000      PictureQuestion    NaN

Then we can forward-fill the state column:

>>> df['state'].fillna(method='ffill',inplace=True)
time
2013-02-14 12:29:37.101000            SystemReset
2013-02-14 12:29:38.697038            SystemReset
2013-02-14 12:29:38.710432            SystemReset
2013-02-14 12:29:39.103000               WaitFace
2013-02-14 12:29:39.103000        NormalExecution
2013-02-14 12:29:39.106475        NormalExecution
2013-02-14 12:29:39.166000          GreetVisitors
2013-02-14 12:29:39.200701          GreetVisitors
2013-02-14 12:29:40.197014          GreetVisitors
2013-02-14 12:29:42.217976          GreetVisitors
2013-02-14 12:29:46.879000    AskForParticipation
2013-02-14 12:29:56.807000    IntroduceVernissage
2013-02-14 12:29:57.460601    IntroduceVernissage
2013-02-14 12:30:07.275000        PictureQuestion
Name: state

Then drop the rows without a utime:

>>> df.dropna()
                                          state  utime
time                                                  
2013-02-14 12:29:38.697038          SystemReset      0
2013-02-14 12:29:38.710432          SystemReset      1
2013-02-14 12:29:39.106475      NormalExecution      2
2013-02-14 12:29:39.200701        GreetVisitors      3
2013-02-14 12:29:40.197014        GreetVisitors      0
2013-02-14 12:29:42.217976        GreetVisitors      5
2013-02-14 12:29:57.460601  IntroduceVernissage      7
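
In pandas 0.19 and later, `pd.merge_asof` performs this "last state at or before each timestamp" lookup directly, without the intermediate outer join. This is an alternative to the answer above, not part of it; the sketch assumes both indexes are sorted, which `merge_asof` requires, and uses the question's `"unknown"` placeholder for measurements with no earlier state.

```python
import pandas as pd

# Sample data from the question
states = pd.DataFrame(
    {"state": ["SystemReset", "WaitFace", "NormalExecution", "GreetVisitors",
               "AskForParticipation", "IntroduceVernissage", "PictureQuestion"]},
    index=pd.DatetimeIndex(
        ["2013-02-14 12:29:37.101000", "2013-02-14 12:29:39.103000",
         "2013-02-14 12:29:39.103000", "2013-02-14 12:29:39.166000",
         "2013-02-14 12:29:46.879000", "2013-02-14 12:29:56.807000",
         "2013-02-14 12:30:07.275000"], name="time"),
)
measurements = pd.DataFrame(
    {"utime": [0, 1, 2, 3, 0, 5, 7]},
    index=pd.DatetimeIndex(
        ["2013-02-14 12:29:38.697038", "2013-02-14 12:29:38.710432",
         "2013-02-14 12:29:39.106475", "2013-02-14 12:29:39.200701",
         "2013-02-14 12:29:40.197014", "2013-02-14 12:29:42.217976",
         "2013-02-14 12:29:57.460601"], name="time"),
)

# For each measurement, take the last state whose timestamp is <= the measurement time.
result = pd.merge_asof(
    measurements,
    states,
    left_index=True,
    right_index=True,
    direction="backward",
)
# Measurements that precede every state get NaN; mirror the question's placeholder.
result["state"] = result["state"].fillna("unknown")
```

Since this is a left-style join on the measurements, `utime` keeps its integer dtype.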

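You may need to adjust this to handle the case where a utime and one or more states share the same timestamp. Probably something with drop_duplicates and take_last=True (keep='last' in current pandas). You'll also have to think about the < vs <= question harder than I can before my morning coffee.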
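
For the `<` vs `<=` question, `pd.merge_asof` exposes the choice as `allow_exact_matches`: `True` (the default) lets a state stamped at exactly the measurement time match, while `False` requires it to be strictly earlier. The two measurement timestamps below are hypothetical, chosen to hit the "before any state" and "exact tie" cases; only the first four states from the question are used.

```python
import pandas as pd

# First four states from the question; two of them share 12:29:39.103000
states = pd.DataFrame(
    {"state": ["SystemReset", "WaitFace", "NormalExecution", "GreetVisitors"]},
    index=pd.DatetimeIndex(
        ["2013-02-14 12:29:37.101000", "2013-02-14 12:29:39.103000",
         "2013-02-14 12:29:39.103000", "2013-02-14 12:29:39.166000"], name="time"),
)

# Hypothetical measurements: one before any state, one exactly on a state timestamp
measurements = pd.DataFrame(
    {"utime": [10, 11]},
    index=pd.DatetimeIndex(
        ["2013-02-14 12:29:36.000000", "2013-02-14 12:29:39.103000"], name="time"),
)

# "<=": a state recorded at exactly the measurement time counts. With several states
# on that timestamp, the backward search ends on the last one (NormalExecution).
inclusive = pd.merge_asof(measurements, states, left_index=True, right_index=True,
                          allow_exact_matches=True)

# "<": only states strictly earlier than the measurement count (SystemReset here).
strict = pd.merge_asof(measurements, states, left_index=True, right_index=True,
                       allow_exact_matches=False)
```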
