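I have some stock data in a pandas DataFrame that is grouped by ticker. I want to change the table format so that there is one row for each date and each ticker.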

import yfinance as yf

def get_all_data(tickers, start_date="2009-01-01", end_date="2020-09-25"):
    # group_by='ticker' yields MultiIndex columns: (ticker, field)
    df_orig = yf.download(tickers, start=start_date, end=end_date, group_by='ticker')
    # forward-fill gaps; .ffill() replaces the deprecated fillna(method='ffill')
    df_orig = df_orig.ffill()
    return df_orig


sec_list = ['ULVR.L', 'MSFT', 'ABF.L']
df_orig = get_all_data(sec_list, start_date="2020-09-21", end_date="2020-09-25")
display(df_orig)  # display() is available in IPython/Jupyter

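I currently get the wide output of the code above, but I want the data in the following format (i.e. 7 columns / 12 rows rather than 15 columns / 4 rows).

How can I do that?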


| Date       | Ticker | Open   |
| ---------- | ------ | ------ |
| 2020-09-21 | MSFT   | 197.19 |
| 2020-09-21 | ABF.L  | 1903.5 |
| 2020-09-21 | ULVR.L | 4706   |
| 2020-09-22 | MSFT   | 205.06 |
| 2020-09-22 | ABF.L  | 1855   |
| 2020-09-22 | ULVR.L | 4671   |
| 2020-09-23 | MSFT   | 207.9  |
| 2020-09-23 | ABF.L  | 1870.5 |
| 2020-09-23 | ULVR.L | 4766   |
| 2020-09-24 | MSFT   | 199.85 |
| 2020-09-24 | ABF.L  | 1847   |
| 2020-09-24 | ULVR.L | 4743   |

1 Answer

You can use stack for this; it is also easier if you name the column/index levels:

In [26]: df
Out[26]:
              a          b
           open close open close
2020-10-10    1     2    3     4
2020-10-10    5     6    7     8
2020-10-10    1     2    3     4
2020-10-10    6     7    8     9

In [27]: df.columns.names = ["ticker", "metric"]

In [28]: df.index.name = "date"

In [29]: df.stack("ticker")
Out[29]:
metric             close  open
date       ticker
2020-10-10 a           2     1
           b           4     3
           a           6     5
           b           8     7
           a           2     1
           b           4     3
           a           7     6
           b           9     8

Or, if you don't care about naming things, just use stack with an int:

In [46]: df
Out[46]:
              a          b
           open close open close
2020-10-10    1     2    3     4
2020-10-10    5     6    7     8
2020-10-10    1     2    3     4
2020-10-10    6     7    8     9

In [47]: df.stack(0)
Out[47]:
              close  open
2020-10-10 a      2     1
           b      4     3
           a      6     5
           b      8     7
           a      2     1
           b      4     3
           a      7     6
           b      9     8

# to set index names:

In [56]: gf = df.stack(0)

In [57]: gf.index = gf.index.set_names(["date", "ticker"])

In [58]: gf
Out[58]:
                   close  open
date       ticker
2020-10-10 a           2     1
           b           4     3
           a           6     5
           b           8     7
           a           2     1
           b           4     3
           a           7     6
           b           9     8
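
As a rough sketch of how this might carry over to the frame in the question: the snippet below assumes the download returns columns as a (ticker, field) MultiIndex, and it uses synthetic numbers instead of real prices so it runs without network access. The names `wide`, `tidy` and the level label "Field" are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the frame returned by
# yf.download(..., group_by='ticker'): columns are a (ticker, field) MultiIndex.
dates = pd.date_range("2020-09-21", periods=4, freq="D", name="Date")
tickers = ["ULVR.L", "MSFT", "ABF.L"]
fields = ["Open", "Close"]
columns = pd.MultiIndex.from_product([tickers, fields])
values = np.arange(len(dates) * len(columns), dtype=float).reshape(len(dates), -1)
wide = pd.DataFrame(values, index=dates, columns=columns)

# Name the column levels, move the ticker level into the index,
# then flatten the index into ordinary columns.
wide.columns.names = ["Ticker", "Field"]
tidy = wide.stack("Ticker").reset_index()
tidy.columns.name = None  # drop the leftover "Field" label on the column axis

print(tidy.shape)  # (12, 4): 4 dates x 3 tickers, Date + Ticker + 2 fields
```

With the real download there would be more field columns per ticker, so the column count would be larger; `reset_index()` is only needed if Date and Ticker should be regular columns rather than index levels. Recent pandas versions may emit a FutureWarning about the stack implementation here.
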
answered 2020-09-28T13:51:18.717