I have a large DataFrame (Pair3):

    DatetimeIndex: 4062 entries, 1997-06-06 00:00:00 to 2013-09-13 00:00:00
    Data columns (total 3 columns):
    A         4062  non-null values
    G         4062  non-null values
    S         4062  non-null values
    etc.

I want to compute the correlation and the rolling correlation for different pairs of columns. So I did:

   pairs = ([Pair3.A, Pair3.G], [Pair3.A, Pair3.S])

I computed the correlation of these pairs with this function:

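   import numpy as np

   tresults = []
   def correlation(pairs):
       # np.corrcoef returns a 2x2 matrix; [1][0] is the off-diagonal entry
       for i in pairs:
           tresults.append(np.corrcoef(i)[1][0])
   correlation(pairs)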

which gives:

   tresults
   Out[161]: [0.94756275037713467, 0.91061348701825506]
                   (Correlation AG , Correlation AS)

My question:

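  1. I would like to create a DataFrame, named Correlation, whose columns are automatically named after the pairs considered (e.g. Correlation AG, Correlation AS, etc.) and which holds the corresponding result values.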

A table like this:

    Correlation AG     ,  Correlation AS
    0.94756275037713467,  0.91061348701825506

Do I need to do this manually?
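
For reference, the table described above could be built along these lines. This is only a minimal sketch: the `Pair3` frame here is a hypothetical random stand-in for the real data, and `correlation_table` is an illustrative helper name, not part of pandas. It uses `Series.corr`, which computes the Pearson correlation by default.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical stand-in for Pair3: random data with the same column names.
rng = np.random.default_rng(0)
Pair3 = pd.DataFrame(
    rng.standard_normal((200, 3)),
    columns=["A", "G", "S"],
    index=pd.date_range("1997-06-06", periods=200, freq="B"),
)


def correlation_table(df, pairs):
    """Return a one-row DataFrame with one 'Correlation XY' column per pair."""
    values = {
        f"Correlation {x}{y}": df[x].corr(df[y])  # Pearson by default
        for x, y in pairs
    }
    return pd.DataFrame(values, index=["corr"])


# Only the pairs named in the question.
Correlation = correlation_table(Pair3, [("A", "G"), ("A", "S")])

# Every unordered pair of columns, without listing them by hand.
AllPairs = correlation_table(Pair3, itertools.combinations(Pair3.columns, 2))
```

Passing column names rather than Series objects is what lets the helper derive the column labels automatically.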

1 Answer

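This computes the rolling correlation for every pair of columns and returns the results as a Panel. See the pandas documentation for details.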

In [18]: df = DataFrame(randn(100,3),columns=list('ABC'),index=date_range('20130101',periods=100))

In [19]: pd.rolling_corr_pairwise(df,50,10)
Out[19]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 100 (items) x 3 (major_axis) x 3 (minor_axis)
Items axis: 2013-01-01 00:00:00 to 2013-04-10 00:00:00
Major_axis axis: A to C
Minor_axis axis: A to C

In [20]: pd.rolling_corr_pairwise(df,50,10).loc[:,'A','C']
Out[20]: 
2013-01-01         NaN
2013-01-02         NaN
2013-01-03         NaN
2013-01-04         NaN
2013-01-05         NaN
2013-01-06         NaN
2013-01-07         NaN
2013-01-08         NaN
2013-01-09         NaN
2013-01-10   -0.380174
2013-01-11   -0.368027
2013-01-12   -0.256105
2013-01-13   -0.208781
2013-01-14   -0.209550
2013-01-15   -0.188442
...
2013-03-27   -0.147510
2013-03-28   -0.130810
2013-03-29   -0.139143
2013-03-30   -0.149664
2013-03-31   -0.117451
2013-04-01   -0.129279
2013-04-02   -0.119471
2013-04-03   -0.040025
2013-04-04   -0.045022
2013-04-05   -0.025215
2013-04-06   -0.048226
2013-04-07   -0.048213
2013-04-08   -0.046223
2013-04-09   -0.060886
2013-04-10   -0.032557
Freq: D, Name: C, Length: 100, dtype: float64
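
Later pandas releases removed both `pd.rolling_corr_pairwise` and `Panel`. A rough equivalent of the session above with the `DataFrame.rolling` API might look like the sketch below. It assumes a reasonably recent pandas, where pairwise rolling correlation returns a DataFrame with a (date, column) MultiIndex on the rows; the random data is again only a placeholder.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.default_rng(0).standard_normal((100, 3)),
    columns=list("ABC"),
    index=pd.date_range("20130101", periods=100),
)

# Window of 50 rows, with at least 10 observations required per window.
# Calling corr() with no arguments on a DataFrame computes all column pairs.
pairwise = df.rolling(window=50, min_periods=10).corr()

# Rows are indexed by (date, column); select the A-vs-C series.
a_c = pairwise["C"].xs("A", level=1)

# The same single pair computed directly, without the full pairwise result.
a_c_direct = df["A"].rolling(window=50, min_periods=10).corr(df["C"])
```

As in the original output, the first nine values are NaN because fewer than `min_periods` observations are available.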
answered 2013-10-02T13:00:59.933