I'm looking for a Pythonic way to process a Pandas DataFrame. Suppose my DataFrame looks like this:

Account  Stage  Outstanding  Installment  EIR
A        1      10000        100          0.07
B        2      50000        500          0.04
C        3      10000        100          0.07

I'm trying to build an amortization table by stage from the given information. For example:

Account A Stage 1 will be amortized for 12 months
Account B Stage 2 will be amortized until Outstanding = 0 (or close to 0)
Account C Stage 3 will NOT be amortized

I have SAS code that implements the logic explained above:

data want;
set have;

if Stage = 1 then do;
    do Term = 1 to 12;
        Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
        if Outstanding < 0 then delete;
        output;
        end;
    end;

else if Stage = 2 then do;
    do Term = 1 to Term;
        Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
        if Outstanding < 0 then delete;
        output;
        end;
    end;

else if Stage = 3 then do;
    Outstanding = Outstanding;
    output;
    end;

run;

After running, the code produces an output table like the following (the numbers are only a mock-up):

Account  Stage  Outstanding  Installment  EIR   Term
A        1      10000        100          0.07  1
A        1      9000         100          0.07  2
A        1      8000         100          0.07  3
A        1      ...          ...          ...   ...
A        1      2000         100          0.07  12
B        2      50000        500          0.04  1
B        2      49000        500          0.04  2
B        2      48000        500          0.04  3
B        2      ...          ...          ...   ...
B        2      125          500          0.04  48
C        3      10000        100          0.07  1

I have equivalent Python code, but I don't think it is efficient. I have something like this:

# Amortization function
def balances(rate, payment, os):
    interestAmount = os * rate / 100 / 12   
    nextBalance = os + interestAmount - payment
    return nextBalance
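
As a quick sanity check of the function above, one or two steps for Account A can be worked out by hand. This is only a sketch using the sample values from the table (Outstanding 10000, Installment 100, EIR 0.07 read as a percent per year); the variable names `first` and `second` are illustrative.

```python
def balances(rate, payment, os):
    # Monthly interest: EIR is a percent per year, so divide by 100 and by 12
    interest_amount = os * rate / 100 / 12
    return os + interest_amount - payment

# Account A from the sample table
first = balances(0.07, 100, 10000)   # 10000 + 0.5833... - 100
second = balances(0.07, 100, first)  # the next month starts from the new balance

print(round(first, 2))   # 9900.58
print(round(second, 2))  # 9801.16
```

Each month depends on the previous balance, which is why the table is built with a loop per account.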

Then I use a for loop to call the function account by account, and use np.repeat() to repeat the information I need.

import numpy as np
import pandas as pd

result = []
for i, account in enumerate(df['Account']):
    if i % 5000 == 0:
        print(f'Calculating account: {i}')
    accountTable = df[df['Account'] == account]
    rate = float(accountTable['EIR'].iloc[0])
    payment = float(accountTable['Installment'].iloc[0])
    amount = float(accountTable['Outstanding'].iloc[0])

    if int(accountTable['Stage'].iloc[0]) <= 2:
        amortization = []  # balances for this account only
        while amount > 0:
            amount = balances(rate, payment, amount)
            if amount > 0:  # drop the final non-positive balance
                amortization.append(amount)
        # Build the per-account table once, after the loop has finished
        amortizationTable = pd.DataFrame(np.repeat(accountTable.values, len(amortization), axis=0), columns=accountTable.columns)
        amortizationTable['Outstanding'] = amortization
        amortizationTable['Term'] = amortizationTable.index + 1
        result.append(amortizationTable)

I find this very slow compared with the SAS program. Any suggestions for making it faster or more Pythonic would be appreciated.

Thank you.


1 Answer


Try this:

import pandas as pd
df = pd.DataFrame({'acc': ['a', 'b', 'c'],
                   'stage': [1, 2, 3],
                   'bal': [10000, 50000, 10000],
                   'installment': [100, 500, 100],
                   'eir': [0.07, 0.04, 0.07],
                   })


def computeBal(bal, eir, installment):
    intt = bal * eir / 12 / 100
    next_bal = bal + intt - installment
    return next_bal


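def processAccount(df_acc):
    # df_acc holds exactly one row; .iloc[0] extracts each scalar explicitly
    # (calling int()/float() on a one-element Series is deprecated in recent pandas)
    acc = df_acc['acc'].iloc[0]
    stg = int(df_acc['stage'].iloc[0])
    bal = float(df_acc['bal'].iloc[0])
    eir = float(df_acc['eir'].iloc[0])
    installment = float(df_acc['installment'].iloc[0])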

    amort = []
    amort.append(bal)
    if stg == 1:
        for _ in range(1, 12):
            bal = computeBal(bal, eir, installment)
            amort.append(round(bal, 2))
    elif stg == 2:
        while bal > 0:
            bal = computeBal(bal, eir, installment)
            if bal > 0:
                amort.append(round(bal, 2))

    out = pd.DataFrame(amort)
    out['acc'] = acc
    out['stage'] = stg
    out['installment'] = installment
    out['eir'] = eir
    out.reset_index(inplace=True)
    out.rename(columns={0: 'bal', 'index': 'term'}, inplace=True)
    out['term'] += 1

    return out[['acc', 'stage', 'bal', 'installment', 'eir', 'term']]


result = dict()
for acc in df['acc'].unique():
    df_acc = df.loc[df['acc'] == acc, :].copy()
    result[acc] = processAccount(df_acc)


out = pd.concat(result).reset_index(drop=True)
out

answered 2022-01-01T13:47:52.620
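
A possible further speed-up, offered as a sketch rather than a definitive implementation: filtering with `df[df['acc'] == acc]` scans the whole frame once per account, and a small DataFrame is built for every account. The version below walks the rows once with `itertuples`, collects plain tuples, and builds a single DataFrame at the end. The names `next_balance`, `amortize`, `stage1_terms` and `max_terms` are hypothetical. Two assumptions differ slightly from the code above: every stage stops as soon as the balance would reach zero or below (as the SAS `delete` does), and `max_terms` caps stage 2 so that an installment smaller than the monthly interest cannot loop forever.

```python
import pandas as pd


def next_balance(bal, eir, installment):
    """One month: add interest (EIR is a percent per year), subtract the payment."""
    return bal + bal * eir / 100 / 12 - installment


def amortize(df, stage1_terms=12, max_terms=1200):
    """Build the amortization table in one pass over the rows.

    stage1_terms and max_terms are assumed defaults, not values from the question.
    """
    cols = ['acc', 'stage', 'bal', 'installment', 'eir']
    rows = []
    for acc, stage, bal, installment, eir in df[cols].itertuples(index=False, name=None):
        term = 1
        rows.append((acc, stage, bal, installment, eir, term))  # term 1 = opening balance
        if stage == 1:
            limit = stage1_terms   # fixed 12-month schedule
        elif stage == 2:
            limit = max_terms      # until paid off, with a safety cap
        else:
            limit = 1              # stage 3: not amortized
        while term < limit:
            bal = next_balance(bal, eir, installment)
            if bal <= 0:           # assumption: stop once the balance is exhausted
                break
            term += 1
            rows.append((acc, stage, round(bal, 2), installment, eir, term))
    return pd.DataFrame(rows, columns=cols + ['term'])


df = pd.DataFrame({'acc': ['a', 'b', 'c'],
                   'stage': [1, 2, 3],
                   'bal': [10000, 50000, 10000],
                   'installment': [100, 500, 100],
                   'eir': [0.07, 0.04, 0.07]})

out = amortize(df)
print(out.groupby('acc')['term'].max())
```

With the sample data, account `a` gets 12 rows and account `c` a single row; account `b` runs until its balance is paid off.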