python - Efficient way in pandas to get the first filtered row for each DatetimeIndex entry

Question

I have a DataFrame with the following structure:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3333 entries, 2000-01-03 00:00:00+00:00 to 2012-11-21 00:00:00+00:00
Data columns:
open          3333  non-null values
high          3333  non-null values
low           3333  non-null values
close         3333  non-null values
volume        3333  non-null values
amount        3333  non-null values
pct_change    3332  non-null values
dtypes: float64(7)

The pct_change column contains percentage change data.

Given a filtered DatetimeIndex taken from the DataFrame above:

<class 'pandas.tseries.index.DatetimeIndex'>
[2000-03-01 00:00:00, ..., 2012-11-01 00:00:00]
Length: 195, Freq: None, Timezone: UTC

Starting from each date entry, I want to filter the DataFrame and return the first row where the pct_change column is below -0.015.

I came up with this solution, but it is slow:

stops = []
# dates is the filtered DatetimeIndex
for d in dates:
    # check if pct_change is below -0.015 starting from the date of the signal; return the date of the first match
    match = df[df["pct_change"] < -0.015].loc[d:][:1].index

    stops.append([df.loc[d]["close"], df.loc[match]["close"].values[0]])

Any suggestions on how to improve this?

score 2 · Accepted Answer

You may find it faster to extract the index as a column and use apply and bfill.
Something like this:

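import numpy as np

df['datetime'] = df.index
# keep the timestamp on rows where the condition holds, NaN elsewhere
df['stops'] = df.apply(lambda x: x['datetime']
                                 if x['pct_change'] < -0.015
                                 else np.nan,
                        axis=1)
# back-fill so every row carries the date of the next match
df['stops'] = df['stops'].bfill()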
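
The same idea can probably be written without a row-wise apply, which tends to be the slow part. The following is only a sketch under assumptions: the small DataFrame is made-up sample data standing in for the one in the question, and `is_stop` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame.
idx = pd.date_range("2000-01-03", periods=6, freq="D", tz="UTC")
df = pd.DataFrame(
    {"pct_change": [np.nan, 0.010, -0.020, 0.005, -0.030, 0.002]},
    index=idx,
)

# Boolean mask of rows that satisfy the condition (NaN compares as False).
is_stop = df["pct_change"] < -0.015

# Keep the timestamp where the condition holds, NaT elsewhere,
# then back-fill so each row carries the date of the next match.
df["stops"] = df.index.to_series().where(is_stop).bfill()
```

Rows after the last match have no later match to borrow from, so their `stops` value stays NaT.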

score 2 · Accepted Answer

How about this:

result = df[df['pct_change'] < -0.015].reindex(filtered_dates, method='bfill')

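The only problem is that if an interval contains no value below -0.015, it will retrieve one from a later interval. If you add a column containing the date, you can see which timestamp each row came from, and set the row to NA if the retrieved timestamp falls beyond the next "bin edge".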
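
A minimal sketch of that masking step, under some assumptions: the data is invented for illustration, `hit_date` and `next_edge` are names made up here, and each bin is taken to run from one signal date up to (but not including) the next, with the last bin left open-ended.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; filtered_dates plays the role of the signal dates.
idx = pd.date_range("2000-01-03", periods=8, freq="D", tz="UTC")
df = pd.DataFrame(
    {
        "close": [10.0, 10.1, 9.9, 10.0, 10.2, 10.3, 10.0, 10.1],
        "pct_change": [np.nan, 0.010, -0.020, 0.010, 0.020, 0.010, -0.030, 0.010],
    },
    index=idx,
)
filtered_dates = idx[[0, 3, 5]]

# Record each matching row's own timestamp so it survives the reindex.
hits = df[df["pct_change"] < -0.015].copy()
hits["hit_date"] = hits.index

# For each signal date, pull the first matching row at or after it.
result = hits.reindex(filtered_dates, method="bfill")

# The next signal date acts as the "bin edge"; the last bin has none (NaT).
next_edge = filtered_dates.to_series().shift(-1)

# Blank out rows whose match lies at or beyond the next bin edge.
beyond = result["hit_date"] >= next_edge
result.loc[beyond, ["close", "pct_change"]] = np.nan
result.loc[beyond, "hit_date"] = pd.NaT
```

Here the second signal date has no match inside its own interval, so its row ends up as NA instead of borrowing the match that belongs to the third interval. If a match anywhere after the signal date is acceptable, as in the loop from the question, the masking step can simply be skipped.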
