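I have a df with more than 250,000 rows. Some of its fields depend on the values at t-1. This is trivial in Excel, but I'm not sure what the most efficient way to do it in pandas is. Currently I set the values at t[0] and then use a for loop for the rest, but this is very slow. Is there a faster way to do this?

Any help would be greatly appreciated!

Code below: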

import pandas as pd
import numpy as np
import math
import datetime
from scipy.optimize import minimize

df = pd.DataFrame({
    'Time': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Price': [44, 100, 40, 110, 77, 109, 65, 93, 89, 73]})

# Create empty (zero-filled) columns
df = df.assign(Qty=0.0, Buy=0.0, Sell=0.0, Cost=0.0, Rev=0.0)

# Initial Values
buy_price = 50
sell_price = 100

# Set Values at Time 0
df.at[0, 'Qty'] = 0
df.at[0, 'Buy'] = np.where(df.at[0, 'Price'] < buy_price, min(30 - df.at[0, 'Qty'], 10), 0)
df.at[0, 'Sell'] = np.where(df.at[0, 'Price'] > sell_price, min(df.at[0, 'Qty'], 10), 0)
df.at[0, 'Cost'] = df.at[0, 'Buy'] * df.at[0, 'Price']
df.at[0, 'Rev'] = df.at[0, 'Sell'] * df.at[0, 'Price']

# Set Remaining Values
for t in range(1, len(df)):
    df.at[t, 'Qty'] = df.at[t-1, 'Qty'] + df.at[t-1, 'Buy'] - df.at[t-1, 'Sell']
    df.at[t, 'Buy'] = np.where(df.at[t, 'Price'] < buy_price, min(30 - df.at[t, 'Qty'], 10), 0)
    df.at[t, 'Sell'] = np.where(df.at[t, 'Price'] > sell_price, min(df.at[t, 'Qty'], 10), 0)
    df.at[t, 'Cost'] = df.at[t, 'Buy'] * df.at[t, 'Price']
    df.at[t, 'Rev'] = df.at[t, 'Sell'] * df.at[t, 'Price']

I looked at this earlier post, which is similar, but I don't think cumsum() will work in this case, because all three main fields (Qty, Buy, Sell) depend on one another.
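
Written out, the recurrence the loop above implements looks like this. The symbols Q, B, S and P are shorthand introduced here for Qty, Buy, Sell and Price; the constants 30, 10, 50 and 100 are the ones in the code.

```latex
\begin{aligned}
Q_0 &= 0,\\
Q_t &= Q_{t-1} + B_{t-1} - S_{t-1}, \quad t \ge 1,\\
B_t &= \begin{cases}\min(30 - Q_t,\ 10) & \text{if } P_t < 50\\ 0 & \text{otherwise}\end{cases}\\
S_t &= \begin{cases}\min(Q_t,\ 10) & \text{if } P_t > 100\\ 0 & \text{otherwise}\end{cases}\\
\text{Cost}_t &= B_t P_t, \qquad \text{Rev}_t = S_t P_t.
\end{aligned}
```

Because B_t and S_t are clipped by Q_t, which in turn depends on every earlier B and S, a plain running sum is not obviously enough.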

3 Answers

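pandas DataFrames aren't meant to be looped over row by row. I'd suggest reading up on what they are for and what they can do. In the meantime, this should help with what you need (I wrote it on the fly, so let me know if there are any errors):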

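# shift() moves each column down one row, so row t sees the t-1 values;
# the first row has no predecessor, so its NaN is filled with 0 as in the question
df['Qty'] = (df['Qty'].shift() + df['Buy'].shift() - df['Sell'].shift()).fillna(0)
# axis=1 passes each row (not each column) to the lambda
df['Buy'] = df.apply(lambda x: 0 if x['Price'] >= buy_price else min(30 - x['Qty'], 10), axis=1)
df['Sell'] = df.apply(lambda x: 0 if x['Price'] <= sell_price else min(x['Qty'], 10), axis=1)
df['Cost'] = df['Buy'] * df['Price']
df['Rev'] = df['Sell'] * df['Price']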
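
In case shift() is unfamiliar, here is a tiny sketch of what it does on a made-up Series (the values are illustrative only, not taken from the question). By default the first row becomes NaN because it has no previous row; passing fill_value=0 is one way to avoid that.

```python
import pandas as pd

# Illustrative values only
s = pd.Series([10, 0, 10])

shifted = s.shift()                     # first row has no predecessor, so it is NaN
shifted_filled = s.shift(fill_value=0)  # same shift, but the gap is filled with 0

print(shifted.tolist())         # [nan, 10.0, 0.0]
print(shifted_filled.tolist())  # [0, 10, 0]
```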
answered 2019-10-28T06:42:08.573

Use cumsum and np.where instead of apply:

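# Buy: min(30 - Qty, 10) when the price is below 50, else 0
df["Buy"] = np.where(df["Price"]<50, np.where((30 - df["Qty"]) > 10, 10, 30 - df["Qty"]), 0)
# Sell: as written this takes the larger of Qty and 10; the question's rule is min(Qty, 10)
df["Sell"] = np.where(df["Price"]>100, np.where(df["Qty"] > 10, df["Qty"], 10), 0)
# Qty: running total of the previous row's net purchases (NaN in the first row)
df["Qty"] = (df["Buy"].shift()-df["Sell"].shift()).cumsum()
df['Cost'] = df['Buy'] * df['Price']
df['Rev'] = df['Sell'] * df['Price']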

print (df)
#
   Time  Price   Qty   Buy  Sell   Cost     Rev
0     0     44   NaN  10.0   0.0  440.0     0.0
1     1    100  10.0   0.0   0.0    0.0     0.0
2     2     40  10.0  10.0   0.0  400.0     0.0
3     3    110  20.0   0.0  10.0    0.0  1100.0
4     4     77  10.0   0.0   0.0    0.0     0.0
5     5    109  10.0   0.0  10.0    0.0  1090.0
6     6     65   0.0   0.0   0.0    0.0     0.0
7     7     93   0.0   0.0   0.0    0.0     0.0
8     8     89   0.0   0.0   0.0    0.0     0.0
9     9     73   0.0   0.0   0.0    0.0     0.0
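
One thing worth checking before relying on this for the full 250,000 rows: Buy and Sell are computed in a single pass from whatever Qty holds at that moment, so the cap of 30 is not re-applied as the position grows. The sketch below uses a made-up price series that always stays below 50 (an assumption for illustration), np.minimum in place of the nested np.where, and shift(fill_value=0) so the first row is 0 rather than NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: always below the buy threshold of 50
df = pd.DataFrame({"Price": [40, 40, 40, 40, 40]})
df["Qty"] = 0.0

# Single vectorized pass: Buy only ever sees the initial Qty (all zeros)
df["Buy"] = np.where(df["Price"] < 50, np.minimum(30 - df["Qty"], 10), 0)
df["Sell"] = 0.0  # no price exceeds 100, so nothing is sold
df["Qty"] = (df["Buy"].shift(fill_value=0) - df["Sell"].shift(fill_value=0)).cumsum()

print(df["Qty"].tolist())  # [0.0, 10.0, 20.0, 30.0, 40.0]
```

With the row-by-row loop from the question, Qty would stop at 30 for these prices. The 10-row sample happens not to hit the cap, which is why the difference does not show up there.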
answered 2019-10-28T07:26:08.717

A cleaner approach is to write a predicate that can hold state and then call apply once. Define the predicate like this:

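class Predicate():
    """Callable that carries the previous row's Qty, Buy and Sell between calls."""
    def __init__(self):
        self.buy_price = 50
        self.sell_price = 100
        self.prev_qty = 0
        self.prev_buy = 0   # quantity bought on the previous row
        self.prev_sell = 0  # quantity sold on the previous row
    def __call__(self, x):
        x.Qty = self.prev_qty + self.prev_buy - self.prev_sell
        # Use the thresholds stored on the instance rather than module-level globals
        x.Buy = np.where(x.Price < self.buy_price, min(30 - x.Qty, 10), 0)
        x.Sell = np.where(x.Price > self.sell_price, min(x.Qty, 10), 0)
        x.Cost = x.Buy * x.Price
        x.Rev = x.Sell * x.Price
        # Remember this row's values for the next call
        self.prev_buy = x.Buy
        self.prev_qty = x.Qty
        self.prev_sell = x.Sell
        return x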

and apply the predicate as:

p = Predicate()
df.apply(p, axis=1)

which gives the following result:

   Time  Price   Qty   Buy  Sell   Cost     Rev
0   0.0   44.0  10.0  10.0   0.0  440.0     0.0
1   1.0  100.0  20.0   0.0   0.0    0.0     0.0
2   2.0   40.0  20.0  10.0   0.0  400.0     0.0
3   3.0  110.0  30.0   0.0  10.0    0.0  1100.0
4   4.0   77.0  20.0   0.0   0.0    0.0     0.0
5   5.0  109.0  20.0   0.0  10.0    0.0  1090.0
6   6.0   65.0  10.0   0.0   0.0    0.0     0.0
7   7.0   93.0  10.0   0.0   0.0    0.0     0.0
8   8.0   89.0  10.0   0.0   0.0    0.0     0.0
9   9.0   73.0  10.0   0.0   0.0    0.0     0.0
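
A variation on the same state-carrying idea, offered as a sketch rather than a tested drop-in: pull Price out into a NumPy array, keep the previous Qty, Buy and Sell in local variables, and write the results back in one go. This is still a Python loop, not a vectorized solution, but it avoids per-row DataFrame access, which is usually the slow part. The function name simulate and the parameters max_qty and lot are made up here to stand for the constants 30 and 10 in the question.

```python
import numpy as np
import pandas as pd


def simulate(df, buy_price=50, sell_price=100, max_qty=30, lot=10):
    """Run the Qty/Buy/Sell recurrence over plain NumPy arrays."""
    price = df["Price"].to_numpy(dtype=float)
    n = len(price)
    qty = np.zeros(n)
    buy = np.zeros(n)
    sell = np.zeros(n)

    prev_qty = prev_buy = prev_sell = 0.0
    for t in range(n):
        # Quantity carried in from the previous step (0 at t = 0)
        q = prev_qty + prev_buy - prev_sell
        b = min(max_qty - q, lot) if price[t] < buy_price else 0.0
        s = min(q, lot) if price[t] > sell_price else 0.0
        qty[t], buy[t], sell[t] = q, b, s
        prev_qty, prev_buy, prev_sell = q, b, s

    out = df.copy()
    out["Qty"], out["Buy"], out["Sell"] = qty, buy, sell
    # Cost and Rev do not depend on earlier rows, so they can be vectorized
    out["Cost"] = out["Buy"] * out["Price"]
    out["Rev"] = out["Sell"] * out["Price"]
    return out


df = pd.DataFrame({
    "Time": list(range(10)),
    "Price": [44, 100, 40, 110, 77, 109, 65, 93, 89, 73],
})
result = simulate(df)
print(result)
```

Unlike the apply version, this returns a new DataFrame and leaves Time and Price with their original dtypes.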
answered 2019-10-28T08:43:17.107