
I have a Pandas DataFrame containing the following information:

index       year  month  day  symbol  transaction  nr_shares
2011-01-10  2011  1      10   AAPL    Buy          1500
2011-01-13  2011  1      13   GOOG    Sell         1000

I want to populate a second, zero-filled Pandas DataFrame

index        AAPL  GOOG
2011-01-10     0     0
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0     0

using the information from the first DataFrame, so that I get

index        AAPL  GOOG
2011-01-10   1500    0
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0  -1000

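As you can see, on each relevant date the number of shares bought or sold has been entered in the corresponding symbol column, positive for buy orders and negative for sell orders.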

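How can I achieve this? Do I have to iterate over the first DataFrame's index and check the symbol and transaction columns with nested "if" statements before writing to the second DataFrame, or is there a more elegant DataFrame method I can use?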


2 Answers


You can use pivot_table. Starting from (edited to be slightly more complicated):

>>> df1
        index  year  month  day symbol transaction  nr_shares
0  2011-01-10  2011      1   10   AAPL         Buy       1500
1  2011-01-10  2011      1   10   AAPL        Sell        200
2  2011-01-10  2011      1   10   GOOG        Sell        500
3  2011-01-10  2011      1   10   GOOG         Buy        600
4  2011-01-13  2011      1   13   GOOG        Sell       1000
>>> df2
        index  AAPL  GOOG
0  2011-01-10     0     0
1  2011-01-11     0     0
2  2011-01-12     0     0
3  2011-01-13     0     0

We can give the share counts a sign, so that sells become negative:

>>> df1["nr_shares"] = df1.apply(lambda row: row["nr_shares"] * (-1 if row["transaction"] == "Sell" else 1), axis=1)
>>> df1
        index  year  month  day symbol transaction  nr_shares
0  2011-01-10  2011      1   10   AAPL         Buy       1500
1  2011-01-10  2011      1   10   AAPL        Sell       -200
2  2011-01-10  2011      1   10   GOOG        Sell       -500
3  2011-01-10  2011      1   10   GOOG         Buy        600
4  2011-01-13  2011      1   13   GOOG        Sell      -1000
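
The row-wise apply above works, but the same sign rule can also be applied to the whole column at once. A minimal sketch, assuming NumPy is available; the variable names sign and signed are made up for illustration, and the frame is rebuilt here only so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

# Sample transactions mirroring df1 above (before signing).
df1 = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-10", "2011-01-10", "2011-01-10", "2011-01-13"],
    "symbol": ["AAPL", "AAPL", "GOOG", "GOOG", "GOOG"],
    "transaction": ["Buy", "Sell", "Sell", "Buy", "Sell"],
    "nr_shares": [1500, 200, 500, 600, 1000],
})

# -1 for sells, +1 for everything else (the same rule as the lambda above).
sign = np.where(df1["transaction"] == "Sell", -1, 1)

# Element-wise product; assign it back to df1["nr_shares"] to overwrite in place.
signed = df1["nr_shares"] * sign
```

On a frame this small the difference is only stylistic; on large frames the column-wise form is usually faster because it avoids calling a Python function per row.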

Then you can pivot df1. By default pivot_table aggregates with the mean, but we want the sum:

>>> # older pandas releases spelled these keywords rows= and cols=
>>> a = df1.pivot_table(values="nr_shares", index="index", columns="symbol",
...                     aggfunc="sum")
>>> a
symbol      AAPL  GOOG
index                 
2011-01-10  1300   100
2011-01-13   NaN -1000

Give b the same index:

>>> b = df2.set_index("index")
>>> b
            AAPL  GOOG
index                 
2011-01-10     0     0
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0     0

Then add them; the dates missing from a come out as NaN, which fillna turns into 0:

>>> (a+b).fillna(0)
symbol      AAPL  GOOG
index                 
2011-01-10  1300   100
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0 -1000
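
The steps above can also be collapsed into one chain that skips the zero-filled helper frame. This is a sketch rather than the method used above: it assumes a pandas version that accepts the index=/columns= keywords, and the names dates and result are made up for illustration.

```python
import pandas as pd

# Signed transactions, as df1 looks after the signing step above.
df1 = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-10", "2011-01-10", "2011-01-10", "2011-01-13"],
    "symbol": ["AAPL", "AAPL", "GOOG", "GOOG", "GOOG"],
    "nr_shares": [1500, -200, -500, 600, -1000],
})

# The dates the result should cover (the index column of df2 above).
dates = ["2011-01-10", "2011-01-11", "2011-01-12", "2011-01-13"]

result = (
    # Sum the signed shares per (date, symbol); empty cells become 0.
    df1.pivot_table(values="nr_shares", index="index", columns="symbol",
                    aggfunc="sum", fill_value=0)
       # Add the dates that had no transactions, also filled with 0.
       .reindex(dates, fill_value=0)
)
```

Here the dates are plain strings, matching df1 and df2 above; with real timestamps the same chain should work with a pd.date_range passed to reindex.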
answered 2013-03-29T18:35:53.487

First, using apply, you can add a column of signed shares (positive for Buy, negative for Sell):

In [11]: df['signed_shares'] = df.apply(lambda row: row['nr_shares']
                                                    if row['transaction'] == 'Buy'
                                                    else -row['nr_shares'],
                                        axis=1)

In [12]: df
Out[12]: 
            year  month  day symbol transaction  nr_shares  signed_shares
index                                                                    
2011-01-10  2011      1   10   AAPL         Buy       1500           1500
2011-01-13  2011      1   13   GOOG        Sell       1000          -1000

Take just the columns you are interested in and unstack them:

In [13]: df[['symbol', 'signed_shares']].set_index('symbol', append=True)
Out[13]: 
                   signed_shares
index      symbol               
2011-01-10 AAPL             1500
2011-01-13 GOOG            -1000

In [14]: a = df[['symbol', 'signed_shares']].set_index('symbol', append=True).unstack()

In [15]: a
Out[15]: 
            signed_shares      
symbol               AAPL  GOOG
index                          
2011-01-10           1500   NaN
2011-01-13            NaN -1000

Reindex over whatever date range you like:

In [16]: rng = pd.date_range('2011-01-10', periods=4)

In [17]: a.reindex(rng)
Out[17]: 
            signed_shares      
symbol               AAPL  GOOG
2011-01-10           1500   NaN
2011-01-11            NaN   NaN
2011-01-12            NaN   NaN
2011-01-13            NaN -1000

Finally, fill the NaNs with 0 using fillna:

In [18]: a.reindex(rng).fillna(0)
Out[18]: 
            signed_shares      
symbol               AAPL  GOOG
2011-01-10           1500     0
2011-01-11              0     0
2011-01-12              0     0
2011-01-13              0 -1000
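
One caveat with the unstack route: if the same date and symbol appear more than once (say a Buy and a Sell of AAPL on the same day), unstack raises an error about duplicate index entries. A possible workaround, sketched on hypothetical data with such a duplicate, is to sum per date and symbol first; the name totals is made up for illustration.

```python
import pandas as pd

# Hypothetical data with two AAPL trades on the same day.
df = pd.DataFrame(
    {
        "symbol": ["AAPL", "AAPL", "GOOG"],
        "signed_shares": [1500, -200, -1000],
    },
    index=pd.to_datetime(["2011-01-10", "2011-01-10", "2011-01-13"]),
)
df.index.name = "index"

rng = pd.date_range("2011-01-10", periods=4)

totals = (
    # Sum per (date, symbol) first so every pair is unique.
    df.reset_index()
      .groupby(["index", "symbol"])["signed_shares"].sum()
      # Move the symbols into columns; missing pairs become 0.
      .unstack(fill_value=0)
      # Add the days without any transaction, also as 0.
      .reindex(rng, fill_value=0)
)
```

Passing fill_value to unstack and reindex means no NaN is ever introduced, so the separate fillna step is not needed in this variant.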

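As @DSM points out, you can do steps [13]-[15] more neatly with pivot_table. Note that its default aggregation is the mean, so pass aggfunc='sum' if a date and symbol pair can occur more than once: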

In [20]: df.reset_index().pivot_table('signed_shares', 'index', 'symbol')
Out[20]: 
symbol      AAPL  GOOG
index                 
2011-01-10  1500   NaN
2011-01-13   NaN -1000
answered 2013-03-29T18:36:40.890