I need to fill a string across every row of a group, based on the one non-null value in that group. An example:
ID name surname prsn_id
A john smith prsn_01
A john smith NaN
A john smith NaN
A john smith NaN
B mary jane prsn_02
B mary jane NaN
B mary jane NaN
B mary jane NaN
B mary jane NaN
B mary jane NaN
B mary jane NaN
C Barry willis prsn_03
C Barry willis NaN
C Barry willis NaN
C Barry willis NaN
C Barry willis NaN
The output should be:
ID name surname prsn_id
A john smith prsn_01
A john smith prsn_01
A john smith prsn_01
A john smith prsn_01
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
C Barry willis prsn_03
C Barry willis prsn_03
C Barry willis prsn_03
C Barry willis prsn_03
C Barry willis prsn_03
Or:
ID name surname prsn_id prsn_id_2
A john smith prsn_01 NaN
A john smith NaN prsn_01
A john smith NaN prsn_01
A john smith NaN prsn_01
B mary jane prsn_02 NaN
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
C Barry willis prsn_03 NaN
C Barry willis NaN prsn_03
C Barry willis NaN prsn_03
C Barry willis NaN prsn_03
C Barry willis NaN prsn_03
I tried:
df['prsn_id_2'] = (df
                   .groupby(['ID', 'name', 'surname'])['prsn_id']
                   .fillna(method='ffill'))
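This may work, but it takes a long time to run, so it will not be practical going forward. I need a different solution that is vectorized and relatively fast.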
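One possible direction, as a minimal sketch rather than a definitive answer: GroupBy.transform('first') takes the first non-null value in each group and broadcasts it to every row of that group. The small DataFrame below is a shortened, made-up version of the sample data, and the sketch assumes each group holds exactly one distinct non-null prsn_id. Whether it is actually faster than the forward fill on the real data has not been measured here.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the sample data (assumption: one non-null prsn_id per group)
df = pd.DataFrame({
    "ID": ["A"] * 4 + ["B"] * 3,
    "name": ["john"] * 4 + ["mary"] * 3,
    "surname": ["smith"] * 4 + ["jane"] * 3,
    "prsn_id": ["prsn_01", np.nan, np.nan, np.nan,
                "prsn_02", np.nan, np.nan],
})

keys = ["ID", "name", "surname"]

# "first" skips NaN within each group; transform aligns the result
# back to the original row index, so it can be assigned directly.
df["prsn_id_2"] = df.groupby(keys)["prsn_id"].transform("first")

print(df)
```

Assigning the result back to df["prsn_id"] instead of a new column would give the first desired output. Unlike the forward fill, this also fills rows that come before the non-null value in a group, and it fills the row that holds the value itself, so it does not reproduce the NaN shown in the first row of each group in the second desired output.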