python - Vectorized ranges of "True" values in a boolean column of a Pandas DataFrame

Question

So I have a pandas DataFrame with a boolean column that looks like this:

    IS_TRUE
           
0      True
1      True
2      True
3      True
4      True
5     False
6     False
7     False
8      True
9      True
10    False
11    False
12     True
13     True
14     True
15     False
.
.
.
9000   False

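I would like to know whether there is a vectorized way to get the ranges where the IS_TRUE column is True. In this case that would be something like [(0,4),(8,9),(12,14)] (inclusive). Exclusive end points would be fine too; I don't think that matters much.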

I could of course run a for loop over the column... but I'm just curious whether there is a faster way.

score 1 · Accepted Answer

Let's use cumsum:

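# Move the row labels into an 'index' column so they can be aggregated
df = df.reset_index()
# Each False increments the counter, so every run of True shares one group key
s = (~df['IS_TRUE']).cumsum()
# Keep only the True rows; min/max of 'index' per group are the run boundaries
out = df[df['IS_TRUE']].groupby(s)['index'].agg(['min','max'])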
Out[16]: 
         min  max
IS_TRUE          
0          0    4
3          8    9
5         12   14

l = out.values.tolist()
Out[18]: [[0, 4], [8, 9], [12, 14]]
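
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same cumsum idea. The helper name `true_ranges_cumsum` is made up for illustration, and the sample data is only the first 16 rows shown in the question, not the full 9000-row frame.

```python
import pandas as pd


def true_ranges_cumsum(flags: pd.Series) -> list:
    """Return inclusive [start, end] positions of each run of True values."""
    # Work on positions 0..n-1 regardless of the original index (an assumption).
    flags = flags.reset_index(drop=True).astype(bool)
    # Every False bumps the counter, so all Trues in one run share a group id.
    group_id = (~flags).cumsum()
    positions = pd.Series(flags.index, index=flags.index)
    # Keep only the True rows, then take the first and last position per run.
    runs = positions[flags].groupby(group_id[flags]).agg(["min", "max"])
    return runs.values.tolist()


# First 16 rows from the question.
sample = pd.Series(
    [True] * 5 + [False] * 3 + [True] * 2 + [False] * 2 + [True] * 3 + [False]
)
print(true_ranges_cumsum(sample))
```

The group keys themselves (0, 3, 5 in the output above) carry no meaning; they only need to be constant within a run and different between runs.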

score 1 · Accepted Answer

You can use diff() to identify where your series switches between True and False. The resulting index contains the switch points.

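# diff() flags every row whose value differs from the previous one; the first row counts only if it is True
a = iter(df[df['IS_TRUE'].diff().fillna(df['IS_TRUE'][0])].index.tolist() + [len(df)])
# Switch points alternate: start of a True run, then start of the following False run
print([(el1, el2 - 1) for el1, el2 in zip(a, a)])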

Output:

[(0, 4), (8, 9), (12, 14)]
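
The same edge-detection idea can also be sketched directly in NumPy. This is a variation, not the code from the answer above: it pads the array with False on both sides so that runs touching either end still produce a start and an end edge. The function name `true_ranges_edges` is hypothetical.

```python
import numpy as np


def true_ranges_edges(flags) -> list:
    """Return inclusive (start, end) positions of each run of True values."""
    values = np.asarray(flags, dtype=bool)
    # Pad with False on both sides so runs at either end still have two edges.
    padded = np.concatenate(([False], values, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)      # False -> True transitions
    ends = np.flatnonzero(edges == -1) - 1   # True -> False transitions, made inclusive
    return list(zip(starts.tolist(), ends.tolist()))


# First 16 rows from the question; a pandas Series such as df['IS_TRUE'] works too.
sample_flags = [True] * 5 + [False] * 3 + [True] * 2 + [False] * 2 + [True] * 3 + [False]
print(true_ranges_edges(sample_flags))
```

Dropping the `- 1` on `ends` gives exclusive end points, which the question says would also be acceptable.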
