I'm looking for a Pythonic way to process a Pandas DataFrame. Suppose my DataFrame looks like this:
Account | Stage | Outstanding | Installment | EIR |
---|---|---|---|---|
A | 1 | 10000 | 100 | 0.07 |
B | 2 | 50000 | 500 | 0.04 |
C | 3 | 10000 | 100 | 0.07 |
I'm trying to build an amortization table whose length depends on each account's stage. For example:
Account A Stage 1 will be amortized for 12 months
Account B Stage 2 will be amortized until Outstanding = 0 (or close to 0)
Account C Stage 3 will NOT be amortized
I have SAS code that implements this logic:
data want;
    set have;
    if Stage = 1 then do;
        /* Stage 1: amortize for a fixed 12 months */
        do Term = 1 to 12;
            Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
            if Outstanding < 0 then delete;  /* stop once the balance goes negative */
            output;
        end;
    end;
    else if Stage = 2 then do;
        /* Stage 2: amortize until the balance reaches (about) 0 */
        do Term = 1 by 1 until (Outstanding <= 0);
            Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
            if Outstanding < 0 then delete;
            output;
        end;
    end;
    else if Stage = 3 then do;
        /* Stage 3: not amortized, keep a single row */
        Term = 1;
        output;
    end;
run;
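For comparison, a row-by-row Python counterpart of this data step's logic might look like the sketch below. It is a translation made under stated assumptions, not a drop-in replacement:

- Stage only takes the values 1, 2 and 3.
- The column names come from the table above.
- Stage 3 rows get Term = 1, as in the output table.

Walking the frame once with itertuples mirrors SAS's implicit loop over `have`, so no per-account filtering is needed. `amortize_rows` is a hypothetical helper name.

```python
import pandas as pd


def balances(rate, payment, os):
    """One month: add interest at rate / 100 / 12, then subtract the payment."""
    return os + os * rate / 100 / 12 - payment


def amortize_rows(df, stage1_terms=12):
    """Yield one dict per output row, mirroring the SAS data step (assumes Stage is 1, 2 or 3)."""
    for row in df.itertuples(index=False):
        base = row._asdict()
        if row.Stage == 3:
            # Stage 3: not amortized, a single row with Term = 1
            yield {**base, 'Term': 1}
            continue
        payment = abs(row.Installment)
        if row.Stage == 2 and payment <= row.Outstanding * row.EIR / 100 / 12:
            # Guard: an "until the balance is 0" loop would never finish here
            raise ValueError(f'Account {row.Account}: installment does not cover interest')
        amount, term = row.Outstanding, 0
        while row.Stage == 2 or term < stage1_terms:
            amount = balances(row.EIR, payment, amount)
            term += 1
            if amount < 0:  # like SAS 'delete': drop this row and stop
                break
            yield {**base, 'Outstanding': amount, 'Term': term}
            if row.Stage == 2 and amount == 0:  # like 'until (Outstanding <= 0)'
                break


df = pd.DataFrame({
    'Account': ['A', 'B', 'C'],
    'Stage': [1, 2, 3],
    'Outstanding': [10000, 50000, 10000],
    'Installment': [100, 500, 100],
    'EIR': [0.07, 0.04, 0.07],
})
want = pd.DataFrame(list(amortize_rows(df)))
print(want.groupby('Account').size())
```

This is still a Python-level loop, but it touches each input row once instead of re-filtering the whole DataFrame for every account.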
After running, the code produces an output table like this (the numbers are only mock-ups):
Account | Stage | Outstanding | Installment | EIR | Term |
---|---|---|---|---|---|
A | 1 | 10000 | 100 | 0.07 | 1 |
A | 1 | 9000 | 100 | 0.07 | 2 |
A | 1 | 8000 | 100 | 0.07 | 3 |
A | 1 | ... | ... | ... | ... |
A | 1 | 2000 | 100 | 0.07 | 12 |
B | 2 | 50000 | 500 | 0.04 | 1 |
B | 2 | 49000 | 500 | 0.04 | 2 |
B | 2 | 48000 | 500 | 0.04 | 3 |
B | 2 | ... | ... | ... | ... |
B | 2 | 125 | 500 | 0.04 | 48 |
C | 3 | 10000 | 100 | 0.07 | 1 |
I have equivalent Python code, but I don't think it is efficient. It looks something like this:
import numpy as np
import pandas as pd

# Amortization function: add one month of interest, then subtract the payment
def balances(rate, payment, os):
    interestAmount = os * rate / 100 / 12
    nextBalance = os + interestAmount - payment
    return nextBalance
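The update in balances is a linear recurrence, so the balance after t payments also has a closed form. That closed form is what makes a loop-free version possible. The symbols are introduced for this note only: r is the monthly rate EIR/100/12, P the installment and B_0 the starting Outstanding.

```latex
\begin{aligned}
B_t &= B_{t-1}(1+r) - P, \qquad r = \frac{\mathrm{EIR}}{100 \cdot 12},\\
B_t &= B_0(1+r)^t - P\,\frac{(1+r)^t - 1}{r} \qquad (r > 0),\\
B_t \le 0 &\iff t \ge t^{*} = \frac{\ln\bigl(P / (P - rB_0)\bigr)}{\ln(1+r)} \qquad (P > rB_0).
\end{aligned}
```

When P <= rB_0 the balance never decreases, so a Stage 2 loop that waits for the balance to reach 0 cannot terminate. For r = 0 the closed form reduces to B_t = B_0 - tP.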
Then I call the function account by account in a for loop, and use np.repeat() to repeat the information I need:
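The np.repeat idea pays off most when it is applied once to the whole frame rather than once per account. The minimal sketch below assumes the number of rows per account is already known; `n_terms` is a hypothetical array used only for illustration. It repeats the row positions in a single call, then numbers the terms with groupby().cumcount().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Account': ['A', 'B', 'C'],
    'Stage': [1, 2, 3],
    'Outstanding': [10000, 50000, 10000],
    'Installment': [100, 500, 100],
    'EIR': [0.07, 0.04, 0.07],
})
# Hypothetical row counts per account; in practice they come from the stage rules
n_terms = np.array([3, 2, 1])

# One np.repeat over row positions replaces a filter + repeat per account
pos = np.repeat(np.arange(len(df)), n_terms)
expanded = df.iloc[pos].reset_index(drop=True)
# Term restarts at 1 for each original row (grouping by the original row position)
expanded['Term'] = expanded.groupby(pos).cumcount().to_numpy() + 1
print(expanded[['Account', 'Term']])
```

Grouping by the original row position, rather than by Account, keeps the numbering correct even if an account appears on more than one input row.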
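result = []
for i, account in enumerate(df['Account']):
    if i % 5000 == 0:
        print(f'Calculating account: {i}')
    # Boolean filtering scans the whole frame for every account (slow for many accounts)
    accountTable = df[df['Account'] == account]
    rate = float(accountTable['EIR'].iloc[0])
    payment = float(accountTable['Installment'].iloc[0])
    amount = float(accountTable['Outstanding'].iloc[0])
    stage = int(accountTable['Stage'].iloc[0])
    amortization = []  # start a fresh schedule for each account
    if stage <= 2:
        # Stage 1 stops after 12 terms; Stage 2 runs until the balance reaches 0
        # (assumes the installment exceeds the monthly interest)
        while amount > 0 and (stage == 2 or len(amortization) < 12):
            amount = balances(rate, payment, amount)
            amortization.append(amount)
        if amortization and amortization[-1] <= 0:
            amortization.pop(-1)
    else:
        amortization.append(amount)  # Stage 3: a single, unamortized row
    amortizationTable = pd.DataFrame(np.repeat(accountTable.values, len(amortization), axis=0), columns=accountTable.columns)
    amortizationTable['Outstanding'] = amortization
    amortizationTable['Term'] = amortizationTable.index + 1
    result.append(amortizationTable)
final = pd.concat(result, ignore_index=True)  # combine all accounts into one table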
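The balance has a closed form: B_t = B_0(1+r)^t - P((1+r)^t - 1)/r, with r = EIR/100/12. Because of that, the whole table can be built without a Python-level loop in three steps:

1. Bound the number of rows each account needs.
2. Expand all rows at once.
3. Evaluate the balance for every (account, Term) pair with NumPy.

This is a minimal sketch under assumptions: Stage is 1, 2 or 3, the column names are those in the table, and only positive balances are kept for stages 1 and 2, as the pop of the last non-positive value does. `amortize_vectorized` is a hypothetical name. The results match the iterative loop only up to floating-point rounding.

```python
import numpy as np
import pandas as pd


def amortize_vectorized(df, stage1_terms=12):
    """Build the full schedule with NumPy (sketch; assumes Stage is 1, 2 or 3)."""
    r = df['EIR'].to_numpy(dtype=float) / 100 / 12  # monthly rate
    b0 = df['Outstanding'].to_numpy(dtype=float)
    p = np.abs(df['Installment'].to_numpy(dtype=float))
    stage = df['Stage'].to_numpy()

    pays_off = p > r * b0  # otherwise the balance never falls
    if np.any((stage == 2) & ~pays_off):
        raise ValueError('Stage 2 installment does not cover interest')

    # Payoff time t* from the closed form (r == 0 falls back to b0 / p);
    # ceil(t*) + 1 is an upper bound on the number of positive balances
    with np.errstate(divide='ignore', invalid='ignore'):
        t_star = np.where(r == 0, b0 / p, np.log(p / (p - r * b0)) / np.log1p(r))
        bound = np.where(pays_off, np.ceil(t_star) + 1, np.inf)
    n_rows = np.select([stage == 1, stage == 2],
                       [np.minimum(bound, stage1_terms), bound],
                       default=1).astype(int)

    # Expand every account at once and number its terms 1..n
    pos = np.repeat(np.arange(len(df)), n_rows)
    term = np.arange(n_rows.sum()) - np.repeat(np.cumsum(n_rows) - n_rows, n_rows) + 1
    out = df.iloc[pos].copy()
    out['Term'] = term

    # Closed-form balance after `term` payments, for all rows in one pass
    rr, bb, pp = r[pos], b0[pos], p[pos]
    growth = (1 + rr) ** term
    factor = np.where(rr == 0, term, (growth - 1) / np.where(rr == 0, 1.0, rr))
    balance = bb * growth - pp * factor

    is_stage3 = stage[pos] == 3
    out['Outstanding'] = np.where(is_stage3, bb, balance)
    # Keep only positive balances for stages 1 and 2 (Stage 3 keeps its single row)
    return out[is_stage3 | (balance > 0)].reset_index(drop=True)


df = pd.DataFrame({
    'Account': ['A', 'B', 'C'],
    'Stage': [1, 2, 3],
    'Outstanding': [10000, 50000, 10000],
    'Installment': [100, 500, 100],
    'EIR': [0.07, 0.04, 0.07],
})
schedule = amortize_vectorized(df)
print(schedule.groupby('Account').size())
```

The row bound deliberately overshoots by one term so that rounding in t* can never cut a schedule short; the final `balance > 0` filter trims the extra rows.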
Compared with the SAS version, this is very slow. Any suggestions for making it faster or more Pythonic?
Thank you.