
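I have a pandas DataFrame like the one below. I want to iterate over a list called "fields_list" and, for each entry, build a list of the DataFrame columns that end with that entry's suffix, so that the columns are separated out by suffix.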

import pandas as pd
import numpy as np
import sys
df = pd.DataFrame({'a_balance': [3,4,5,6], 'b_balance': [5,1,1,1]})
df['ah_balance'] = 0
df['a_agg_balance'] = 0
df['b_agg_balance'] = 0
df

a_balance   b_balance   ah_balance  a_agg_balance   b_agg_balance
3           5           0           0               0
4           1           0           0               0
5           1           0           0               0
6           1           0           0               0

fields_list =   [ ['<val>','_balance'],['<val_class>','_agg_balance']]
fields_list
[['<val>', '_balance'], ['<val_class>', '_agg_balance']]

for i,field in fields_list:
    df_final= [col for col in df if col.endswith(field)]
    print("df_final" ,df_final)

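I tried the code above, but when it processes the first element of fields_list (i.e. ['<val>', '_balance']), the result also includes the columns ending in '_agg_balance', so I get the following: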

df_final ['a_balance', 'b_balance', 'ah_balance', 'a_agg_balance', 'b_agg_balance']
df_final ['a_agg_balance', 'b_agg_balance']
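The over-matching comes from str.endswith being a plain suffix test: the string '_agg_balance' itself ends with '_balance', so every '_agg_balance' column also passes the '_balance' check. A minimal check, using a hard-coded copy of the column names (an assumption, so the snippet runs without pandas):

```python
# Hard-coded copy of the DataFrame's column names (assumption, for illustration)
columns = ["a_balance", "b_balance", "ah_balance", "a_agg_balance", "b_agg_balance"]

# The longer suffix contains the shorter one at its end
print("_agg_balance".endswith("_balance"))  # True

# So a plain endswith("_balance") keeps every column, including the agg ones
balance_cols = [c for c in columns if c.endswith("_balance")]
print(balance_cols)

# Excluding the longer suffix explicitly gives the expected first list
plain_balance = [
    c for c in columns
    if c.endswith("_balance") and not c.endswith("_agg_balance")
]
print(plain_balance)
```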

My expected output is:

df_final ['a_balance', 'b_balance', 'ah_balance']
df_final ['a_agg_balance', 'b_agg_balance']

1 Answer


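You can sort the suffixes by length and check the longest one first. Whenever columns match a suffix, remove them from the set of columns still to be checked, so a shorter suffix such as '_balance' can no longer claim the '_agg_balance' columns: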

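fields_list = [['<val>', '_balance'], ['<val_class>', '_agg_balance']]

# Check the longest suffixes first so '_agg_balance' is matched before '_balance'
sorted_list = sorted(fields_list, key=lambda x: len(x[1]), reverse=True)
sorted_suffixes = [x[1] for x in sorted_list]

# Columns not yet assigned to a suffix (uses the df from the question)
col_list = set(df.columns)
for suffix in sorted_suffixes:
    forecast_final_fields = [col for col in col_list if col.endswith(suffix)]
    # Drop the matched columns so shorter suffixes cannot claim them again
    col_list.difference_update(forecast_final_fields)
    print("forecast_final_fields", forecast_final_fields)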

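The result is (because col_list is a set, the order of the columns within each list is arbitrary and may differ between runs):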

forecast_final_fields ['a_agg_balance', 'b_agg_balance']
forecast_final_fields ['ah_balance', 'a_balance', 'b_balance']
answered 2020-05-26T19:54:31.313
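A variant of the same longest-suffix-first idea, sketched below under the assumption that each column should belong to exactly one suffix group, iterates over df.columns instead of a set, so each list keeps the DataFrame's column order and the output matches the expected output in the question. group_columns_by_suffix is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# DataFrame from the question
df = pd.DataFrame({'a_balance': [3, 4, 5, 6], 'b_balance': [5, 1, 1, 1]})
df['ah_balance'] = 0
df['a_agg_balance'] = 0
df['b_agg_balance'] = 0

fields_list = [['<val>', '_balance'], ['<val_class>', '_agg_balance']]


def group_columns_by_suffix(columns, suffixes):
    """Hypothetical helper: map each suffix to the columns whose longest
    matching suffix it is, keeping the original column order."""
    # Try longer suffixes first so '_agg_balance' wins over '_balance'
    ordered = sorted(suffixes, key=len, reverse=True)
    groups = {suffix: [] for suffix in suffixes}
    for col in columns:
        for suffix in ordered:
            if col.endswith(suffix):
                groups[suffix].append(col)
                break  # each column goes into at most one group
    return groups


suffixes = [suffix for _, suffix in fields_list]
groups = group_columns_by_suffix(df.columns, suffixes)
for suffix in suffixes:
    print("df_final", groups[suffix])
```

Columns that end with none of the suffixes are simply left out of every group.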