Imagine there is a dataframe in which many transaction_total, balance_total, and date values are missing:
id,date,transaction_total,balance_total
1,01/01/2019,-1,102
1,01/02/2019,-2,100
1,01/03/2019,-3,
1,01/04/2019,,
1,01/05/2019,-4,
2,01/01/2019,-2,200
2,01/02/2019,-2,100
2,01/04/2019,,
2,01/05/2019,-4,
Here is a script that creates the input:
import pandas as pd
import numpy as np
users=pd.DataFrame(
[
{'id':1,'date':'01/01/2019', 'transaction_total':-1, 'balance_total':102},
{'id':1,'date':'01/02/2019', 'transaction_total':-2, 'balance_total':100},
{'id':1,'date':'01/03/2019', 'transaction_total':-3, 'balance_total':''},
{'id':1,'date':'01/04/2019', 'transaction_total':'', 'balance_total':''},
{'id':1,'date':'01/05/2019', 'transaction_total':-4, 'balance_total':''},
{'id':2,'date':'01/01/2019', 'transaction_total':-2, 'balance_total':200},
{'id':2,'date':'01/02/2019', 'transaction_total':-2, 'balance_total':100},
{'id':2,'date':'01/04/2019', 'transaction_total':'', 'balance_total':''},
{'id':2,'date':'01/05/2019', 'transaction_total':-4, 'balance_total':''}
]
)
The goal is to achieve the following.
Desired final output:
id,date,balance_total
1,01/01/2019,102
1,01/02/2019,100
1,01/03/2019,97
1,01/04/2019,97
1,01/05/2019,93
2,01/01/2019,200
2,01/02/2019,100
2,01/03/2019,97
2,01/04/2019,97
2,01/05/2019,93
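(1) If a date is missing, add a row for that date and fill it with the previous date's balance (I think the reindex solution in this link might work: Pandas filling missing dates and values within group)
(2) If balance_total is missing but there is a valid 'date' and 'transaction_total', fill 'balance_total' with the previous date's balance_total plus the transaction_total of the day on which balance_total is missing (the case of row 3: 100 + (-3) = 97)
(3) If there is a valid date but both transaction_total and balance_total are NaN, just fill in the previous date's balance_total (e.g. row 4: since by the earlier calculation the balance_total for 01/03/2019 is 97, the balance for 01/04/2019 will also be 97, because there is no transaction_total.)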
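One way the three rules above might be combined is sketched below. This is only a sketch under stated assumptions, not a definitive solution: `fill_balances` is a hypothetical helper name, dates are assumed to be in month/day/year format, each id is assumed to have at most one row per date, and a missing transaction_total is treated as 0. The sample frame repeats the input data from the question so the block runs on its own.

```python
import pandas as pd


def fill_balances(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill missing dates and balances per id."""
    df = df.copy()
    # Assumption: dates are month/day/year strings.
    df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
    # The input script stores missing values as '', so coerce them to NaN.
    for col in ("transaction_total", "balance_total"):
        df[col] = pd.to_numeric(df[col], errors="coerce")

    pieces = []
    for uid, g in df.groupby("id"):
        g = g.drop(columns="id").set_index("date").sort_index()
        # Rule (1): add a row for every calendar day missing inside the range.
        days = pd.date_range(g.index.min(), g.index.max(), freq="D", name="date")
        g = g.reindex(days)
        # Rules (1) and (3): no transaction means the balance does not move.
        tx = g["transaction_total"].fillna(0)
        known = g["balance_total"].notna()
        # Rule (2): every known balance starts a block; later rows in the block
        # add their transactions cumulatively on top of that balance.
        change = tx.where(~known, 0).groupby(known.cumsum()).cumsum()
        g["balance_total"] = g["balance_total"].ffill() + change
        g["transaction_total"] = tx
        g.insert(0, "id", uid)
        pieces.append(g)

    out = pd.concat(pieces).reset_index()
    out["date"] = out["date"].dt.strftime("%m/%d/%Y")
    return out[["id", "date", "transaction_total", "balance_total"]]


# Same data as the input in the question.
users = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "date": ["01/01/2019", "01/02/2019", "01/03/2019", "01/04/2019", "01/05/2019",
             "01/01/2019", "01/02/2019", "01/04/2019", "01/05/2019"],
    "transaction_total": [-1, -2, -3, "", -4, -2, -2, "", -4],
    "balance_total": [102, 100, "", "", "", 200, 100, "", ""],
})

result = fill_balances(users)
print(result)
```

For id 1 this gives 102, 100, 97, 97, 93, which matches the desired output. For id 2, applying the rules literally gives 100, 100, 96 for 01/03/2019 through 01/05/2019, so if different values are wanted for id 2 the rules would need adjusting. Dropping the transaction_total column from `result` leaves just id, date, and balance_total.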
Desired intermediate output:
id,date,transaction_total,balance_total
1,01/01/2019,-1,102
1,01/02/2019,-2,100
1,01/03/2019,-3,97
1,01/04/2019,0,97
1,01/05/2019,-4,93
2,01/01/2019,-2,200
2,01/02/2019,-2,100
2,01/03/2019,-3,97
2,01/04/2019,,97
2,01/05/2019,-4,93