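I want to run a panel regression (fixed-effects model) on a set of individuals that are identified by province together with the unique identifier city, observed over time t.
Here is the code that creates the dataframe and runs the regression: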
import numpy as np
import pandas as pd
from linearmodels import PanelOLS
data = {'y': [1, 2, 3, 1, 0, 3],
        'x1': [0, 1, 2, 3, 0, 2],
        'x2': [1, 1, 3, 2, 1, 0],
        't': ['2020-02-18', '2020-02-18', '2020-02-17', '2020-02-18', '2020-02-18', '2020-02-17'],
        'province': ['A', 'A', 'A', 'B', 'B', 'B'],
        'city': ['a', 'b', 'a', 'a', 'c', 'a']}
dataframe = pd.DataFrame(data, columns=['y', 'x1', 'x2', 't', 'province', 'city'])
dataframe=dataframe.set_index(['t','province','city'], append=True)
mod = PanelOLS(dataframe.y, dataframe[['x1','x2']], entity_effects=True)
But I get an error message saying "DataFrame input must have a MultiIndex with 2 levels".
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-33-eb5264bfefc9> in <module>
1 dataframe=dataframe.set_index(['t','province','city'], append=True)
----> 2 mod = PanelOLS(dataframe.y, dataframe[['x1','x2']], entity_effects=True)
C:\ProgramData\Anaconda3\lib\site-packages\linearmodels\panel\model.py in __init__(self, dependent, exog, weights, entity_effects, time_effects, other_effects, singletons, drop_absorbed)
1038 drop_absorbed: bool = False,
1039 ) -> None:
-> 1040 super(PanelOLS, self).__init__(dependent, exog, weights=weights)
1041
1042 self._entity_effects = entity_effects
C:\ProgramData\Anaconda3\lib\site-packages\linearmodels\panel\model.py in __init__(self, dependent, exog, weights)
224 weights: Optional[PanelDataLike] = None,
225 ) -> None:
--> 226 self.dependent = PanelData(dependent, "Dep")
227 self.exog = PanelData(exog, "Exog")
228 self._original_shape = self.dependent.shape
C:\ProgramData\Anaconda3\lib\site-packages\linearmodels\panel\data.py in __init__(self, x, var_name, convert_dummies, drop_first, copy)
198 if len(x.index.levels) != 2:
199 raise ValueError(
--> 200 "DataFrame input must have a " "MultiIndex with 2 levels"
201 )
202 if isinstance(self._original, (DataFrame, PanelData, Series)):
ValueError: DataFrame input must have a MultiIndex with 2 levels
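As a workaround, instead of doing
dataframe=dataframe.set_index(['t','province','city'], append=True)
I do
dataframe=dataframe.set_index(['t'], append=True)
This lets the model run, but I don't know why. In this case I am using two columns to identify a group. What if I needed three columns to identify my groups? How does Python tell the ID columns apart from the x variables?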
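For what it is worth, here is a minimal pandas-only sketch of what the two set_index calls do to the index, plus one possible way to end up with exactly two levels when several columns identify a group. As far as I can tell from the linearmodels documentation, the first index level is treated as the entity and the second as time, and the x variables are simply whichever columns are passed as exog. The column name entity and the '_' separator are my own assumptions for illustration, not anything the library requires, and I have not checked that this is the recommended approach.

```python
import pandas as pd

data = {
    'y': [1, 2, 3, 1, 0, 3],
    'x1': [0, 1, 2, 3, 0, 2],
    'x2': [1, 1, 3, 2, 1, 0],
    't': ['2020-02-18', '2020-02-18', '2020-02-17',
          '2020-02-18', '2020-02-18', '2020-02-17'],
    'province': ['A', 'A', 'A', 'B', 'B', 'B'],
    'city': ['a', 'b', 'a', 'a', 'c', 'a'],
}
df = pd.DataFrame(data)

# append=True keeps the default RangeIndex as the first level, so the
# number of levels is 1 plus the number of columns passed to set_index.
four_levels = df.set_index(['t', 'province', 'city'], append=True)
two_levels = df.set_index(['t'], append=True)
print(four_levels.index.nlevels)  # 4
print(two_levels.index.nlevels)   # 2

# Hypothetical helper column 'entity': one label per (province, city) pair.
panel = df.copy()
panel['entity'] = panel['province'] + '_' + panel['city']

# Assumption: the time level should be date-like (or numeric), so the
# date strings are converted before being moved into the index.
panel['t'] = pd.to_datetime(panel['t'])

# Without append=True the index has exactly two levels: (entity, t).
panel = panel.set_index(['entity', 't']).sort_index()
print(panel.index.nlevels)  # 2
print(panel.index.get_level_values('entity').nunique())  # 4
```

If this reading is right, the workaround only passes because the row number ends up acting as the entity, so every "entity" has a single observation. With the combined column, panel.y and panel[['x1', 'x2']] would be the inputs to PanelOLS, although this toy data set is probably too small to estimate entity effects in a meaningful way.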