python - pandas: identifying "portions" in a DataFrame

Question

I have a DataFrame that looks more or less like this:

import pandas as pd
df = pd.DataFrame([list('AAABBBBAAAB')]).T
df.columns = [ 'type']
print(df)

   type
0     A
1     A
2     A
3     B
4     B
5     B
6     B
7     A
8     A
9     A
10    B

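Assuming my DataFrame is already sorted, my goal is to identify the contiguous runs along the "type" column. I would be happy with something like this: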

   type     portion_ID
0     A             A0
1     A             A0
2     A             A0
3     B             B0
4     B             B0
5     B             B0
6     B             B0
7     A             A1
8     A             A1
9     A             A1
10    B             B1

I imagine something like

df['portion_ID'] = g['type'].apply(lambda s: s + some_magic())

would do the trick, but I haven't found "some_magic()" anywhere :-)

Thanks in advance

score 2 · Accepted Answer

The first thing that comes to mind is that you can keep the state in an object:

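class State:
    def __init__(self):
        self.current = None        # type seen on the previous row
        self.current_label = None  # label of the run we are currently in
        self.types = {}            # type -> index of its latest run

def func(row, state):
    t = row['type']
    if state.current != t:
        # A new run starts: bump the run counter for this type
        state.current = t
        state.types[t] = state.types.get(t, -1) + 1
        state.current_label = t + str(state.types[t])
    return state.current_label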

>>> df.apply(func, args=(State(),), axis=1)
0     A0
1     A0
2     A0
3     B0
4     B0
5     B0
6     B0
7     A1
8     A1
9     A1
10    B1
dtype: object

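Alternatively, you can first compute a column that says whether the state should change on that row, and then pass just a dictionary as the state:

# True on the first row of every run (the first row compares against NaN)
df['change'] = df['type'] != df['type'].shift()
def func(row, state):
    t = row['type']
    if row['change']:
        # A new run of this type starts here
        state[t] = state.get(t, -1) + 1
    return t + str(state[t])
df.apply(func, args=({},), axis=1)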
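
As a side note, the same labels can probably be obtained without a row-wise apply. The following is only a minimal sketch of a different technique from the one above (shift, then a per-type cumulative sum), assuming the frame from the question. The names `change` and `run_no` are illustrative, not part of the original code.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'type': list('AAABBBBAAAB')})

# True on the first row of every run of identical values
change = df['type'] != df['type'].shift()

# Within each type, count how many runs have started so far (0-based)
run_no = change.astype(int).groupby(df['type']).cumsum() - 1

# Build the label, e.g. 'A0', 'B0', 'A1', ...
df['portion_ID'] = df['type'] + run_no.astype(str)
print(df)
```

Like the stateful versions, this relies on the rows already being in the desired order, since a run is defined only by comparing each row with the one before it.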
