
I am trying to estimate a panel regression (see: https://bashtage.github.io/linearmodels/doc/panel/examples/examples.html).

My data is formatted like this (this is only a sample snippet; the original file has 11 columns plus the timestamp, and thousands of rows):

What I have

Timestamp   Country Dummy   Pre Post    All_Countries   Timestamp
1993-11-01  1               0   1       6.18    1993-11-01
1993-11-02  1               0   1       6.18    1993-11-02
1993-11-03  1               0   1       6.17    1993-11-03
1993-11-04  1               1   0       6.17    1993-11-04
1993-11-15  1               1   0       6.40    1993-11-15
1993-11-01  2               0   1       7.05    1993-11-01
1993-11-02  2               0   1       7.05    1993-11-02
1993-11-03  2               0   1       7.20    1993-11-03
1993-11-04  2               1   0       7.50    1993-11-04
1993-11-15  2               1   0       7.60    1993-11-15
1993-11-01  3               0   1       7.69    1993-11-01
1993-11-02  3               0   1       7.61    1993-11-02
1993-11-03  3               0   1       7.67    1993-11-03
1993-11-04  3               1   0       7.91    1993-11-04
1993-11-15  3               1   0       8.61    1993-11-15

How to recreate it

import numpy as np
import pandas as pd
df = pd.DataFrame({"Timestamp" : ['1993-11-01' ,'1993-11-02', '1993-11-03', '1993-11-04','1993-11-15'], "Pre" : [0 ,0, 0, 1, 1], "Post" : [1 ,1, 1, 0, 0],  "Austria" : [6.18 ,6.18, 6.17, 6.17, 6.40],"Belgium" : [7.05, 7.05, 7.2, 7.5, 7.6],"France" : [7.69, 7.61, 7.67, 7.91, 8.61]},index = [1, 2, 3,4,5])
df


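# Reshape from wide (one column per country) to long (one row per country and date)
index_data = df.melt(['Timestamp','Pre','Post'], var_name='Country Dummy', value_name='All_Countries')

# Encode the country names as integers 1, 2, 3 in order of appearance
index_data['Country Dummy'] = index_data['Country Dummy'].factorize()[0] + 1
# alternative: pd.Categorical(index_data['Country Dummy']).codes + 1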
timestamp = pd.Categorical(index_data['Timestamp'])
index_data = index_data.set_index(['Timestamp', 'Country Dummy'])
index_data['Timestamp'] = timestamp
index_data

**What I did**

!pip install linearmodels
from linearmodels.panel import PooledOLS
import statsmodels.api as sm
exog_vars = ['Pre','Post']
exog = sm.add_constant(index_data[exog_vars])
mod = PooledOLS(index_data.All_Countries, exog)
pooled_res = mod.fit()
print(pooled_res)

**What I got**

"ValueError: exog does not have full column rank."

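Question

Does anyone know what might be causing this?

Idea

Is it because my data should be formatted like this instead (see the example at the link at the top)? If so, how can I get it into that shape?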

Timestamp   Country Dummy   Pre Post    All_Countries   Timestamp
1993-11-01  1               0   1       6.18    1993-11-01
1993-11-02                  0   1       6.18    1993-11-02
1993-11-03                  0   1       6.17    1993-11-03
1993-11-04                  1   0       6.17    1993-11-04
1993-11-15                  1   0       6.40    1993-11-15
1993-11-01  2               0   1       7.05    1993-11-01
1993-11-02                  0   1       7.05    1993-11-02
1993-11-03                  0   1       7.20    1993-11-03
1993-11-04                  1   0       7.50    1993-11-04
1993-11-15                  1   0       7.60    1993-11-15
1993-11-01  3               0   1       7.69    1993-11-01
1993-11-02                  0   1       7.61    1993-11-02
1993-11-03                  0   1       7.67    1993-11-03
1993-11-04                  1   0       7.91    1993-11-04
1993-11-15                  1   0       8.61    1993-11-15

1 Answer


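The error occurs because `Pre` is a linear combination of `Post` and the constant. You should use only one of the two columns, since the other adds no information (and breaks the algebra behind the model). In this case: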

Pre = 1 - Post

This is the same reason you drop one dummy, which then serves as the baseline, when you run an OLS model.
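To see the collinearity directly, here is a minimal sketch that checks the rank of the design matrix with NumPy. It rebuilds `Pre` and `Post` from the five sample dates in the question; the variable names (`X_full`, `X_reduced`, and so on) are only for illustration and are not part of linearmodels.

```python
import numpy as np

# Pre/Post dummies for the five sample dates in the question
pre = np.array([0, 0, 0, 1, 1])
post = 1 - pre                      # Post is fully determined by Pre
const = np.ones_like(pre)           # the column that sm.add_constant adds

# Design matrix with constant, Pre and Post: three columns
X_full = np.column_stack([const, pre, post])
# Design matrix with constant and Post only: two columns
X_reduced = np.column_stack([const, post])

rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)

print("full:   ", rank_full, "of", X_full.shape[1], "columns")
print("reduced:", rank_reduced, "of", X_reduced.shape[1], "columns")
```

The full matrix has three columns but only rank 2, which is the condition the error message complains about. Dropping either `Pre` or `Post` leaves a matrix whose rank equals its number of columns.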

This should work:

exog_vars = ['Post']
exog = sm.add_constant(index_data[exog_vars])
mod = PooledOLS(index_data.All_Countries, exog)
pooled_res = mod.fit()
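On the formatting idea in the question: the layout with the country shown only once per block is just how pandas prints a MultiIndex whose outer level is the country, so it is not what triggers the rank error. As far as I know, the linearmodels documentation describes panel data as a MultiIndex with the entity as the outer level and time as the inner level, so you may still want that order. A rough sketch with pandas only, assuming the country name is used as the entity label (the names `long_df` and `panel` are mine):

```python
import pandas as pd

# Same sample data as in the question, in wide format
df = pd.DataFrame({
    "Timestamp": ["1993-11-01", "1993-11-02", "1993-11-03", "1993-11-04", "1993-11-15"],
    "Pre": [0, 0, 0, 1, 1],
    "Post": [1, 1, 1, 0, 0],
    "Austria": [6.18, 6.18, 6.17, 6.17, 6.40],
    "Belgium": [7.05, 7.05, 7.2, 7.5, 7.6],
    "France": [7.69, 7.61, 7.67, 7.91, 8.61],
})

# Wide to long: one row per (country, date)
long_df = df.melt(["Timestamp", "Pre", "Post"],
                  var_name="Country", value_name="All_Countries")

# Parse the dates so the time level is a real datetime, not a string
long_df["Timestamp"] = pd.to_datetime(long_df["Timestamp"])

# Entity (country) as the outer level, time as the inner level
panel = long_df.set_index(["Country", "Timestamp"]).sort_index()
print(panel)
```

With this index, `panel.All_Countries` and `sm.add_constant(panel[['Post']])` could be passed to `PooledOLS` in the same way as above.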
Answered 2020-11-18T16:33:47.650