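I'm running pandas OLS on a dataset. If I run it with fewer than 20 time series in the x values, everything works. Is there a maximum number of dependants (x series) that pandas.ols can handle? This is what I'm doing, except that I have the data in files instead of fetching it with DataReader: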
from pandas import Series, DataFrame, ols
from pandas.io.data import DataReader
from DataContainer import DataContainer
import random
window = 21
basic = DataReader("BHI", "yahoo")
print len(basic)
dependance = 15
sp100 = [
"AAPL", "ABT", "ACN", "AEP", "ALL", "AMGN", "AMZN", "APC",
"AXP", "BA", "BAC", "BAX", "BK", "BMY", "BRK.B", "CAT", "C", "CL",
"CMCSA", "COF", "COP", "COST", "CPB", "CSCO", "CVS", "CVX", "DD", "DELL",
"DIS", "DOW", "DVN", "EBAY", "EMC", "EXC", "F", "FCX", "FDX", "GD", "GE",
"GILD", "GOOG", "GS", "HAL", "HD", "HNZ", "HON", "HPQ", "IBM", "INTC",
"JNJ", "JPM_1", "KFT", "KO", "LLY", "LMT", "LOW", "MA", "MCD", "MDT", "MET",
"MMM", "MO", "MON", "MRK", "MS", "MSFT", "NKE", "NOV", "NSC", "NWSA",
"NYX", "ORCL", "OXY", "PEP", "PFE", "PG", "PM", "QCOM", "RF", "RTN",
"SBUX", "SLB", "SLE", "SO", "SPG", "T", "TGT", "TWX", "TXN", "UNH", "UPS",
"USB", "UTX", "VZ", "WAG", "WFC", "WMB", "WMT", "XOM"
]
keys = random.sample(sp100, dependance)
data = {key: DataReader(key, "yahoo") for key in keys}
vals = {key: DataFrame(data=Series(data[key], name=key), index=basic.index) for key in data}
model = ols(y=basic, x=vals, window=window)
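The error occurs as soon as dependance >= 20, but never when dependance < 20. The vals dictionary is only there because my local data structure gives every DataFrame the same name, which ols doesn't like, and I haven't found a better way to rename a DataFrame.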
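One thing that may be worth checking is whether the threshold of 20 is tied to window = 21 rather than to a hard limit in pandas.ols. A rolling fit over 21 rows with 20 x series plus an intercept has 21 parameters, which leaves no residual degrees of freedom. The sketch below only works through that arithmetic. residual_dof is a hypothetical helper, not part of pandas, and the sketch assumes an intercept is fitted in each window.

```python
def residual_dof(window, n_regressors, intercept=True):
    """Residual degrees of freedom for a single rolling-window fit.

    Hypothetical helper for illustration; it is not a pandas function.
    """
    # Each x series contributes one coefficient; the intercept adds one more.
    n_params = n_regressors + (1 if intercept else 0)
    return window - n_params


window = 21
for dependance in (15, 19, 20, 25):
    dof = residual_dof(window, dependance)
    status = "ok" if dof > 0 else "no degrees of freedom left"
    print(dependance, dof, status)
```

If this is what is happening, raising window above dependance + 1 should let the 20-series case run. This has not been verified against pandas.ols itself.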