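I have two Pandas TimeSeries, x and y, which I would like to sync "as of". For every element in x, I would like to find the latest (by index) element in y that precedes it (by index value). For example, I would like to compute this new_x: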

x       new_x
----    -----
13:01   13:00  
14:02   14:00

y
----
13:00
13:01
13:30
14:00

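I am looking for a vectorized solution, not a Python loop. The time values are based on Numpy datetime64. The y array has a length on the order of millions, so O(n^2) solutions are probably not practical.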

2 Answers

In some circles this operation is called an "asof" join. Here is an implementation:

def diffCols(df1, df2):
    """ Find columns in df1 not present in df2
    Return df1.columns  - df2.columns maintaining the order which the resulting
    columns appears in df1.

    Parameters:
    ----------
    df1 : pandas dataframe object
    df2 : pandas dataframe object
    df1.columns.difference(df2.columns) also computes this set difference, but
    by default it sorts the result, so the original column order is not maintained.
    """
    return [i for i in df1.columns if i not in df2.columns]


def aj(df1, df2, overwriteColumns=True, inplace=False):
    """ KDB+ like asof join.
    Finds prevailing values of df2 asof df1's index. The resulting dataframe
    will have same number of rows as df1.

    Parameters
    ----------
    df1 : Pandas dataframe
    df2 : Pandas dataframe
    overwriteColumns : boolean, default True
         The columns of df2 will overwrite the columns of df1 if they have the same
         name unless overwriteColumns is set to False. In that case, this function
         will only join columns of df2 which are not present in df1.
    inplace : boolean, default False.
        If True, adds columns of df2 to df1. Otherwise, create a new dataframe with
        columns of both df1 and df2.

    *Assumes both df1 and df2 have datetime64 index. """
    joiner = lambda x : x.asof(df1.index)
    if not overwriteColumns:
        # Get columns of df2 not present in df1
        cols = diffCols(df2, df1)
        if len(cols) > 0:
            # .ix has been removed from pandas; .loc selects the same columns by label
            df2 = df2.loc[:, cols]
    result = df2.apply(joiner)
    if inplace:
        for i in result.columns:
            df1[i] = result[i]
        return df1
    else:
        return result

Internally, this uses pandas.Series.asof().
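As a rough sketch of how this behaves on the question's data: Series.asof() picks the last entry whose index is less than or equal to each lookup time, so an exact match (13:01 here) is returned rather than the preceding entry. If the strictly earlier entry is wanted, as in the question's new_x column, pd.merge_asof with allow_exact_matches=False is one option in newer pandas versions. Note that merge_asof is a different function from the one used in the answer above, and the dates and the integer values stored in y are made up for illustration.

```python
import pandas as pd

# Timestamps from the question; the calendar date is an arbitrary assumption.
y_times = pd.to_datetime([
    "2013-01-24 13:00", "2013-01-24 13:01",
    "2013-01-24 13:30", "2013-01-24 14:00",
])
x_times = pd.to_datetime(["2013-01-24 13:01", "2013-01-24 14:02"])

# Store each row's position as the value so the chosen row is easy to see.
y = pd.Series([0, 1, 2, 3], index=y_times)

# Series.asof: last value whose index is <= the lookup time (exact matches count).
inclusive = y.asof(x_times)

# merge_asof with allow_exact_matches=False: last y strictly before each x.
left = pd.DataFrame({"x": x_times})
right = pd.DataFrame({"y": y_times})
strict = pd.merge_asof(left, right, left_on="x", right_on="y",
                       allow_exact_matches=False)

print(inclusive)
print(strict)
```

Both inputs to merge_asof have to be sorted on their key columns, which is already the case for a time-ordered series.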

Answered 2013-01-24T09:34:03.850

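How about using Series.searchsorted() to return the positions in y where the elements of x would be inserted? You can then subtract one from those positions and use the result to index y: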

In [1]: x
Out[1]:
0    1301
1    1402

In [2]: y
Out[2]:
0    1300
1    1301
2    1330
3    1400

In [3]: y.iloc[y.searchsorted(x) - 1]
Out[3]:
0    1300
3    1400

Note: the example above uses int64 Series.
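The same idea should carry over to datetime64 values. Below is a minimal sketch using np.searchsorted on plain NumPy arrays (for example, what Series.to_numpy() would return), assuming y is already sorted. The extra 12:00 lookup and the calendar date are made up for illustration. They show one edge case: when an x value is earlier than every y value, the position becomes -1, which would silently wrap around to the last element of y if used directly as an index.

```python
import numpy as np

# Sorted y values and the x lookups from the question, plus one x that is
# earlier than everything in y (an assumption added to show the edge case).
y_values = np.array(
    ["2013-01-24T13:00", "2013-01-24T13:01",
     "2013-01-24T13:30", "2013-01-24T14:00"],
    dtype="datetime64[ns]",
)
x_values = np.array(
    ["2013-01-24T12:00", "2013-01-24T13:01", "2013-01-24T14:02"],
    dtype="datetime64[ns]",
)

# side="left" gives the position of the first y >= x, so subtracting one
# gives the position of the last y strictly before x.
pos = np.searchsorted(y_values, x_values, side="left") - 1

# Clip so that -1 does not wrap around, then mark those entries as missing.
new_x = y_values[np.clip(pos, 0, None)]
new_x[pos < 0] = np.datetime64("NaT")

print(new_x)
```

Each lookup is a binary search, so the cost is roughly O(m log n) for m lookups against n sorted values, which avoids the O(n^2) concern raised in the question.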

Answered 2013-01-24T14:08:42.303