python - How to replicate rows in a Pandas DataFrame

Question

I am looking for a Pythonic way to work with a Pandas DataFrame. Suppose my DataFrame looks like this:

Account	Stage	Outstanding	Installment	EIR
A	1	10000	100	0.07
B	2	50000	500	0.04
C	3	10000	100	0.07

I am trying to build an amortization table by stage from the given information. For example:

Account A Stage 1 will be amortized for 12 months
Account B Stage 2 will be amortized until Outstanding = 0 (or close to 0)
Account C Stage 3 will NOT be amortized

I have SAS code that implements the logic described above:

data want;
set have;

if Stage = 1 then do;
    do Term = 1 to 12;
        Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
        if Outstanding < 0 then delete;
        output;
        end;
    end;

else if Stage = 2 then do;
    do Term = 1 to Term;
        Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
        if Outstanding < 0 then delete;
        output;
        end;
    end;

else if Stage = 3 then do;
    Outstanding = Outstanding;
    output;
    end;

run;

Running the code produces an output table like the following (the numbers are only a mock-up):

Account	Stage	Outstanding	Installment	EIR	Term
A	1	10000	100	0.07	1
A	1	9000	100	0.07	2
A	1	8000	100	0.07	3
A	1	...	...	...	...
A	1	2000	100	0.07	12
B	2	50000	500	0.04	1
B	2	49000	500	0.04	2
B	2	48000	500	0.04	3
B	2	...	...	...	...
B	2	125	500	0.04	48
C	3	10000	100	0.07	1

I have equivalent Python code, but I don't think it is efficient. I have something like this:

# Amortization function
def balances(rate, payment, os):
    interestAmount = os * rate / 100 / 12   
    nextBalance = os + interestAmount - payment
    return nextBalance
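
As a quick sanity check of the function above, here is one step for Account A. This is a minimal sketch: the snake_case names are illustrative, and it assumes EIR is an annual rate expressed in percent, which is what the `/ 100 / 12` implies.

```python
def balances(rate, payment, os):
    # Monthly interest on the current balance (rate assumed to be annual, in percent)
    interest_amount = os * rate / 100 / 12
    # New balance = old balance + interest - installment
    return os + interest_amount - payment

# Account A: Outstanding 10000, Installment 100, EIR 0.07
next_balance = balances(0.07, 100, 10000)
print(round(next_balance, 2))  # 9900.58
```

With an EIR this small the monthly interest is under one unit, so the balance drops by almost the full installment each term.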

Then I use a for loop to call the function account by account, and use np.repeat() to repeat the information I need.

result = []
for i, account in enumerate(df['Account']):
    if i % 5000 == 0:
        print(f'Calculating account: {i}')
    accountTable = df[df['Account'] == account]
    rate = float(accountTable['EIR'].iloc[0])
    payment = float(accountTable['Installment'].iloc[0])
    amount = float(accountTable['Outstanding'].iloc[0])

    if int(accountTable['Stage'].iloc[0]) <= 2:
        amortization = []  # balances for this account only
        while amount > 0:
            amount = balances(rate, payment, amount)
            amortization.append(amount)
            if amortization[-1] <= 0:
                amortization.pop(-1)
        # Build the per-account table once, after the balance loop has finished
        amortizationTable = pd.DataFrame(np.repeat(accountTable.values, len(amortization), axis = 0), columns = accountTable.columns)
        amortizationTable['Outstanding'] = amortization
        amortizationTable['Term'] = amortizationTable.index + 1
        result.append(amortizationTable)

Compared with the SAS program, I find it very slow. Any suggestions for speeding it up or making it more Pythonic would be appreciated.

Thank you.

score 0 · Accepted Answer

Try this:

import pandas as pd
df = pd.DataFrame({'acc': ['a', 'b', 'c'],
                   'stage': [1, 2, 3],
                   'bal': [10000, 50000, 10000],
                   'installment': [100, 500, 100],
                   'eir': [0.07, 0.04, 0.07],
                   })


def computeBal(bal, eir, installment):
    intt = bal * eir / 12 / 100
    next_bal = bal + intt - installment
    return next_bal


def processAccount(df_acc):
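    # df_acc holds exactly one row; take scalars explicitly instead of calling
    # int()/float() on a one-element Series, which newer pandas deprecates
    acc = df_acc['acc'].iloc[0]
    stg = int(df_acc['stage'].iloc[0])
    bal = float(df_acc['bal'].iloc[0])
    eir = float(df_acc['eir'].iloc[0])
    installment = float(df_acc['installment'].iloc[0])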

    amort = []
    amort.append(bal)
    if stg == 1:
        for _ in range(1, 12):
            bal = computeBal(bal, eir, installment)
            amort.append(round(bal, 2))
    elif stg == 2:
        while bal > 0:
            bal = computeBal(bal, eir, installment)
            if bal > 0:
                amort.append(round(bal, 2))

    out = pd.DataFrame(amort)
    out['acc'] = acc
    out['stage'] = stg
    out['installment'] = installment
    out['eir'] = eir
    out.reset_index(inplace=True)
    out.rename(columns={0: 'bal', 'index': 'term'}, inplace=True)
    out['term'] += 1

    return out[['acc', 'stage', 'bal', 'installment', 'eir', 'term']]


result = dict()
for acc in df['acc'].unique():
    df_acc = df.loc[df['acc'] == acc, :].copy()
    result[acc] = processAccount(df_acc)


out = pd.concat(result).reset_index(drop=True)
out
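
If the per-account loop is still too slow on a large table, one possible alternative is to replicate each row with `Index.repeat` and fill in the balances from the closed-form solution of the same recurrence, bal_k = bal_0 * (1 + r)^k - installment * ((1 + r)^k - 1) / r with r = eir / 100 / 12. The following is only a sketch under assumptions the question does not state: every `eir` is greater than 0, the DataFrame index is unique, every stage 2 installment exceeds the first month's interest (otherwise the balance never reaches 0), and stage 1 always gets 12 rows as in the code above. The names `n_stage2`, `n_rows`, `k` and `growth` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'acc': ['a', 'b', 'c'],
                   'stage': [1, 2, 3],
                   'bal': [10000, 50000, 10000],
                   'installment': [100, 500, 100],
                   'eir': [0.07, 0.04, 0.07]})

stage = df['stage'].to_numpy()
bal0 = df['bal'].to_numpy(dtype=float)
pay = df['installment'].to_numpy(dtype=float)
r = df['eir'].to_numpy(dtype=float) / 100 / 12  # monthly rate, assumed > 0

# Stage 2: number of rows whose balance is still positive, from solving
# bal_k > 0 for k. Only meaningful when pay > r * bal0 (assumption).
with np.errstate(divide='ignore', invalid='ignore'):
    n_stage2 = np.ceil(np.log(pay / (pay - r * bal0)) / np.log1p(r))

# Rows per account: 12 for stage 1, computed for stage 2, 1 otherwise
n_rows = np.select([stage == 1, stage == 2], [12, n_stage2], default=1).astype(int)

# Replicate each row n_rows times, then number the copies 0, 1, 2, ...
out = df.loc[df.index.repeat(n_rows)]
k = out.groupby(level=0).cumcount().to_numpy()
out = out.reset_index(drop=True)

# Closed-form balance after k payments (k = 0 keeps the opening balance)
r_k = out['eir'].to_numpy() / 100 / 12
growth = (1 + r_k) ** k
out['bal'] = (out['bal'] * growth - out['installment'] * (growth - 1) / r_k).round(2)
out['term'] = k + 1
```

On the three sample accounts this gives 12 rows for `a`, 101 rows for `b` and 1 row for `c`, with no Python-level loop over accounts. Because the balances come from a formula rather than repeated addition, individual values may differ from the loop version by a cent at rounding boundaries.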
