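I have a DataFrame with a DatetimeIndex. I only need the rows whose index falls on the days of the week given in a list, e.g. [1,2] for Monday and Tuesday. Can this be done in pandas in one line of code?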

2 Answers

IIUC, the following should work:

df[df.index.to_series().dt.dayofweek.isin([0,1])]

Example:

In [9]:
import datetime as dt
import pandas as pd

df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2015,1,1), end=dt.datetime(2015,2,1)))
df[df.index.to_series().dt.dayofweek.isin([0,1])]

Out[9]:
Empty DataFrame
Columns: []
Index: [2015-01-05 00:00:00, 2015-01-06 00:00:00, 2015-01-12 00:00:00, 2015-01-13 00:00:00, 2015-01-19 00:00:00, 2015-01-20 00:00:00, 2015-01-26 00:00:00, 2015-01-27 00:00:00]

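This converts the DatetimeIndex to a Series so that we can call isin to test for membership. .dt.dayofweek gives the day of the week for each row, and we pass 0 and 1 (which correspond to Monday and Tuesday). The resulting boolean mask is then used to filter the rows.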
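
The steps in that explanation can be spelled out one at a time. This is only a minimal sketch, assuming a hypothetical frame with a single `value` column (the frame in the answer has no columns) covering January 2015:

```python
import pandas as pd

# Hypothetical sample frame: one row per day in January 2015.
df = pd.DataFrame(
    {"value": range(31)},
    index=pd.date_range("2015-01-01", "2015-01-31", freq="D"),
)

# Step 1: turn the index into a Series so the .dt accessor is available.
as_series = df.index.to_series()

# Step 2: day of week per row, with Monday = 0 and Sunday = 6.
weekday = as_series.dt.dayofweek

# Step 3: boolean Series, True where the weekday is Monday or Tuesday.
mask = weekday.isin([0, 1])

# Step 4: boolean indexing keeps only the rows where the mask is True.
result = df[mask]
print(result)
```

Chaining the four steps gives the one-liner from the answer.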

Another approach is to construct the boolean mask without converting to a Series:

In [12]:
df[(df.index.dayofweek == 0) | (df.index.dayofweek == 1)]

Out[12]:
Empty DataFrame
Columns: []
Index: [2015-01-05 00:00:00, 2015-01-06 00:00:00, 2015-01-12 00:00:00, 2015-01-13 00:00:00, 2015-01-19 00:00:00, 2015-01-20 00:00:00, 2015-01-26 00:00:00, 2015-01-27 00:00:00]

Or, in fact, this would work:

In [13]:
df[df.index.dayofweek < 2]

Out[13]:
Empty DataFrame
Columns: []
Index: [2015-01-05 00:00:00, 2015-01-06 00:00:00, 2015-01-12 00:00:00, 2015-01-13 00:00:00, 2015-01-19 00:00:00, 2015-01-20 00:00:00, 2015-01-26 00:00:00, 2015-01-27 00:00:00]
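
The `< 2` comparison relies on Monday and Tuesday being the two lowest weekday numbers. For an arbitrary list of days, one option is to call `isin` directly on the index returned by `dayofweek`, with no Series conversion. A minimal sketch, assuming a hypothetical `value` column and a `wanted` list chosen only for illustration:

```python
import pandas as pd

# Hypothetical sample frame: one row per day in January 2015.
df = pd.DataFrame(
    {"value": range(31)},
    index=pd.date_range("2015-01-01", "2015-01-31", freq="D"),
)

# Non-contiguous days, which a single comparison cannot express:
# 0 = Monday, 3 = Thursday.
wanted = [0, 3]

# Index.isin returns a boolean array that can be used as a row mask.
subset = df[df.index.dayofweek.isin(wanted)]
print(subset)
```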

Timings

In [14]:
%timeit df[df.index.dayofweek < 2]
%timeit df[np.in1d(df.index.dayofweek, [1, 2])]

1000 loops, best of 3: 464 µs per loop
1000 loops, best of 3: 521 µs per loop

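So my last method is slightly faster than the np method here. (Note that the np.in1d call above tests for [1, 2], i.e. Tuesday and Wednesday, so the two lines do not select the same rows; this should not materially change the timing.)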

Answered 2015-11-23T20:01:02.013

You can try this:

In [3]: import pandas as pd
In [4]: import numpy as np

In [5]: index = pd.date_range('11/23/2015', end = '11/30/2015', freq='d')
In [6]: df = pd.DataFrame(np.random.randn(len(index),2),columns=list('AB'),index=index)

In [7]: df
Out[7]:
                   A         B
2015-11-23 -0.673626 -1.009921
2015-11-24 -1.288852 -0.338795
2015-11-25 -1.414042 -0.767050
2015-11-26  0.018223 -0.726230
2015-11-27 -1.288709 -1.144437
2015-11-28  0.121093  1.396825
2015-11-29 -0.791611 -1.014375
2015-11-30  1.223220 -1.223499


In [8]: df[np.in1d(df.index.dayofweek, [1, 2])]
Out[8]:
                   A         B
2015-11-24  0.116678 -0.715655
2015-11-25 -1.494921  0.218176

Note that 1 is actually Tuesday, because dayofweek counts from Monday = 0. This should be easy to account for if needed.
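
One way to account for the offset is to shift the requested day numbers before testing membership. The sketch below assumes the question's convention (1 = Monday, 2 = Tuesday); the names `days_one_based` and `days_zero_based` are hypothetical. It uses `np.isin`, since `np.in1d` is deprecated as of NumPy 2.0.

```python
import numpy as np
import pandas as pd

# Same date range as in the answer, with a simple hypothetical column.
index = pd.date_range("2015-11-23", "2015-11-30", freq="D")
df = pd.DataFrame({"A": range(len(index))}, index=index)

# Hypothetical input using the question's convention: 1 = Monday, 2 = Tuesday.
days_one_based = [1, 2]

# Shift to pandas' convention (Monday = 0) before testing membership.
days_zero_based = [d - 1 for d in days_one_based]

selected = df[np.isin(df.index.dayofweek, days_zero_based)]
print(selected)
```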

The previous answer was posted while I was writing this one. For comparison:

In [15]: %timeit df.loc[df.index.to_series().dt.dayofweek.isin([0,1]).values]
100 loops, best of 3: 2.01 ms per loop

In [16]: %timeit df[np.in1d(df.index.dayofweek, [0, 1])]
1000 loops, best of 3: 393 µs per loop

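Note that this comparison was done on the small test DataFrame I created. I don't know how it scales to larger DataFrames, although the relative performance should be consistent.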
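
To check how the approaches compare on a larger frame, something like the following could be run outside IPython. It is only a sketch: the frame size, the column `A`, and the helper names `via_series`, `via_numpy` and `via_comparison` are assumptions made for illustration, and the timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: 100,000 rows at one-minute spacing.
index = pd.date_range("2015-01-01", periods=100_000, freq="min")
big = pd.DataFrame({"A": np.arange(len(index))}, index=index)


def via_series():
    # Convert the index to a Series, then use Series.isin.
    return big[big.index.to_series().dt.dayofweek.isin([0, 1])]


def via_numpy():
    # Membership test on the weekday values with NumPy.
    return big[np.isin(big.index.dayofweek, [0, 1])]


def via_comparison():
    # Plain comparison; valid only because Monday and Tuesday are 0 and 1.
    return big[big.index.dayofweek < 2]


# All three approaches should select exactly the same rows.
assert via_series().equals(via_numpy())
assert via_numpy().equals(via_comparison())

for fn in (via_series, via_numpy, via_comparison):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```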

Answered 2015-11-23T20:05:54.490