
I'm trying to use polyfit to find the best-fit line for a set of data, but I also need to know the uncertainty in the parameters, so I want the covariance matrix as well. The online documentation suggests I should write:

polyfit(x, y, 2, cov=True)

But this gives the error:

TypeError: polyfit() got an unexpected keyword argument 'cov'

And sure enough, help(polyfit) doesn't show the keyword argument 'cov'.

So is the online documentation referring to a different release of numpy? (I have 1.6.1, the newest one.) I could write my own polyfit function, but does anyone have a suggestion for why my polyfit doesn't have the covariance option?

Thanks


1 Answer


For a solution that comes from a library, I find that using scikits.statsmodels is a convenient choice. In statsmodels, regression objects have callable attributes that return the parameters and their standard errors. I put an example of how this would work for you below:

# Imports, I assume NumPy for forming your data.
import numpy as np
import scikits.statsmodels.api as sm

# Form the data here; X should be a column of explanatory values and
# Y the matching observations.
(X, Y) = ....

# Desired polynomial degree
deg = 2

reg_x_data = np.ones(X.shape)                        # 0th degree term.
for ii in range(1, deg+1):
    reg_x_data = np.hstack((reg_x_data, X**(ii)))    # Append the ii^th degree term.
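# (Equivalently, assuming X is a 1-D array and a NumPy new enough to have
#  the `increasing` flag on np.vander (1.9+), the same design matrix can be
#  built in one call: reg_x_data = np.vander(X, deg + 1, increasing=True).)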


# Store OLS regression results into `result`
result = sm.OLS(Y,reg_x_data).fit()


# Print the estimated coefficients
print(result.params)

# Print the basic OLS standard errors of the coefficients
print(result.bse)

# Print the estimated basic OLS covariance matrix
print(result.cov_params())   # <-- Notice, this one is a function by convention.

# Print the heteroskedasticity-consistent standard errors
print(result.HC0_se)

# Print the heteroskedasticity-consistent covariance matrix
print(result.cov_HC0)

There are other powerful covariance attributes in the result object as well; you can see them by printing dir(result). Also, by convention, the covariance matrices for the heteroskedasticity-consistent estimators are only available after you have already requested the corresponding standard errors; for example, you must reference result.HC0_se before result.cov_HC0, because the first reference causes the second one to be computed and stored.
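As a minimal sketch of that calling order (assuming result is the fitted OLS object from the snippet above):

# Reference the robust standard errors first; per the convention described
# above, this triggers computation of the underlying covariance estimate...
robust_se = result.HC0_se

# ...after which the corresponding covariance matrix is available directly.
robust_cov = result.cov_HC0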

Pandas is another library that probably provides more advanced support for these operations.

Non-library function

This can be useful when you don't want to, or can't, rely on an extra library function.

Below is a function I wrote to return the OLS regression coefficients, as well as a bunch of other stuff. It returns the residuals, the regression variance and standard error (standard error of the residuals squared), the asymptotic formula for large-sample variance, the OLS covariance matrix, the heteroskedasticity-consistent "robust" covariance estimate (which is the OLS covariance but weighted according to the residuals), and the "White" or "bias-corrected" heteroskedasticity-consistent covariance.
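For reference, the "robust" covariance computed below is the standard White/HC0 sandwich estimator; writing $\hat e_i$ for the OLS residuals and $x_i$ for the $i$-th row of $X$:

$$\hat V_{\text{robust}} \;=\; (X^\top X)^{-1}\left(\sum_{i=1}^{N}\hat e_i^{\,2}\, x_i x_i^\top\right)(X^\top X)^{-1}$$

The bias-corrected variant replaces $\hat e_i^{\,2}$ with $\hat e_i^{\,2}/(1-h_{ii})$, where the $h_{ii}$ are the diagonal entries of the projection (hat) matrix $P_X = X(X^\top X)^{-1}X^\top$; that is exactly the role of PX in the code.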

import numpy as np

###
# Regression and standard error estimation functions
###
def ols_linreg(X, Y):
    """ ols_linreg(X,Y) 

        Ordinary least squares regression estimator given explanatory variables
        matrix X and observations matrix Y. The length of the first dimension of
        X and Y must be the same (equal to the number of samples in the data
        set). Y is expected to be a 2-D column of shape (N, 1).

        Note: these methods should be adapted if you need to use this for large data.
        This is mostly for illustrating what to do for calculating the different
        classical standard errors. You would never really want to compute the inverse
        matrices for large problems.

        This was developed with NumPy 1.5.1.
    """
    (N, K) = X.shape

    # OLS estimator: beta = (X'X)^{-1} X'Y
    t1 = np.linalg.inv( (np.transpose(X)).dot(X) )
    t2 = (np.transpose(X)).dot(Y)

    beta = t1.dot(t2)
    residuals = Y - X.dot(beta)

    # Unbiased residual variance, its large-sample variance, and the
    # 'standard error of the regression', sqrt(sig_hat).
    sig_hat = (1.0/(N-K))*np.sum(residuals**2)
    sig_hat_asymptotic_variance = 2*sig_hat**2/N
    conv_st_err = np.sqrt(sig_hat)

    # Average outer product of the regressors, (1/N) * sum_i x_i x_i',
    # which feeds the covariance matrix under the basic OLS assumptions.
    sum1 = 0.0
    for ii in range(N):
        sum1 = sum1 + np.outer(X[ii,:],X[ii,:])

    sum1 = (1.0/N)*sum1
    ols_cov = (sig_hat/N)*np.linalg.inv(sum1)

    # Projection ('hat') matrix; its diagonal holds the leverages h_ii.
    PX = X.dot(  np.linalg.inv(np.transpose(X).dot(X)).dot(np.transpose(X))   )

    # Sandwich pieces: the bread (X'X)^{-1} and two meat matrices, one
    # weighted by the squared residuals (HC0) and one additionally
    # corrected by (1 - h_ii).
    robust_se_mat1 = np.linalg.inv(np.transpose(X).dot(X))
    robust_se_mat2 = np.transpose(X).dot(np.diag(residuals[:,0]**(2.0)).dot(X))
    robust_se_mat3 = np.transpose(X).dot(np.diag(residuals[:,0]**(2.0)/(1.0-np.diag(PX))).dot(X))

    v_robust = robust_se_mat1.dot(robust_se_mat2.dot(robust_se_mat1))
    v_modified_robust = robust_se_mat1.dot(robust_se_mat3.dot(robust_se_mat1))

    """ Returns:
        beta -- The vector of coefficient estimates, ordered on the columns of X.
        residuals -- The vector of residuals, Y - X.beta
        sig_hat -- The sample variance of the residuals.
        conv_st_err -- The 'standard error of the regression', sqrt(sig_hat).
        sig_hat_asymptotic_variance -- The analytic formula for the large-sample variance.
        ols_cov -- The covariance matrix under the basic OLS assumptions.
        v_robust -- The "robust" covariance matrix, weighted to account for the residuals and heteroskedasticity.
        v_modified_robust -- The bias-corrected and heteroskedasticity-consistent covariance matrix.
    """
    return beta, residuals, sig_hat, conv_st_err, sig_hat_asymptotic_variance, ols_cov, v_robust, v_modified_robust

For your problem, you would use it like this:

import numpy as np

# Define or load your data:
(Y, X) = ....

# Desired polynomial degree
deg = 2

reg_x_data = np.ones(X.shape)                        # 0th degree term.
for ii in range(1, deg+1):
    reg_x_data = np.hstack((reg_x_data, X**(ii)))    # Append the ii^th degree term.


# Get all of the regression data.
beta, residuals, sig_hat, conv_st_error, sig_hat_asymptotic_variance, ols_cov, v_robust, v_modified_robust = ols_linreg(reg_x_data,Y)

# Print the covariance matrix:
print(ols_cov)
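The standard errors of the individual coefficients are then just the square roots of the diagonal of that covariance matrix, which makes for a quick sanity check against the statsmodels output above:

# Per-coefficient standard errors from the OLS covariance matrix.
print(np.sqrt(np.diag(ols_cov)))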

If you find any errors in my computations (especially the heteroskedasticity-consistent estimators), please let me know and I'll fix them as soon as possible.
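Note also that the cov keyword the online documentation describes was added to numpy.polyfit in a release newer than 1.6.1 (it ships with NumPy 1.7), so once you can upgrade, the original one-liner works directly. A minimal sketch, assuming 1-D data arrays x and y:

import numpy as np

# With NumPy >= 1.7, polyfit returns the coefficient estimates together
# with their covariance matrix when cov=True is passed.
coeffs, cov = np.polyfit(x, y, 2, cov=True)
print(np.sqrt(np.diag(cov)))   # standard errors of the fitted coefficients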

Answered 2012-04-21T19:24:01.770