I'm new to R and don't know how to resolve an error I'm running into.
Here is a summary of my data:

> summary(data)
        Metro                          MrktRgn     MedAge     numHmSales   
     Abilene  : 1   Austin-Waco-Hill Country  : 6   20-25: 3   Min.   :  302  
     Amarillo : 1   Far West Texas            : 1   25-30: 6   1st Qu.: 1057  
     Arlington: 1   Gulf Coast - Brazos Bottom:10   30-35:28   Median : 2098  
     Austin   : 1   Northeast Texas           :14   35-40: 6   Mean   : 7278  
     Bay Area : 1   Panhandle and South Plains: 5   45-50: 2   3rd Qu.: 5086  
     Beaumont : 1   South Texas               : 7   50-55: 1   Max.   :83174  
     (Other)  :40   West Texas                : 3                             
        AvgSlPr          totNumLs         MedHHInc          Pop         
     Min.   :123833   Min.   :  1257   Min.   :37300   Min.   :   2899  
     1st Qu.:149117   1st Qu.:  6028   1st Qu.:53100   1st Qu.:  56876  
     Median :171667   Median : 11106   Median :57000   Median : 126482  
     Mean   :188637   Mean   : 24302   Mean   :60478   Mean   : 296529  
     3rd Qu.:215175   3rd Qu.: 25472   3rd Qu.:66200   3rd Qu.: 299321  
     Max.   :303475   Max.   :224230   Max.   :99205   Max.   :2196000  
     NA's   :1 

I then fit a model with AvgSlPr as the y variable and all the other variables as x variables:

> model1 = lm(AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales + totNumLs + MedHHInc + Pop)

But when I summarize the model, I get NA for the Std. Error, t value, and p-value of every coefficient.

> summary(model1)

Call:
lm(formula = AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales + 
    totNumLs + MedHHInc + Pop)

Residuals:
ALL 45 residuals are 0: no residual degrees of freedom!

Coefficients: (15 not defined because of singularities)
                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                         143175         NA      NA       NA
MetroAmarillo                        24925         NA      NA       NA
MetroArlington                       35258         NA      NA       NA
MetroAustin                         160300         NA      NA       NA
MetroBay Area                        68642         NA      NA       NA
MetroBeaumont                         5942         NA      NA       NA
...
MrktRgnWest Texas                       NA         NA      NA       NA
MedAge25-30                             NA         NA      NA       NA
MedAge30-35                             NA         NA      NA       NA
MedAge35-40                             NA         NA      NA       NA
MedAge45-50                             NA         NA      NA       NA
MedAge50-55                             NA         NA      NA       NA
numHmSales                              NA         NA      NA       NA
totNumLs                                NA         NA      NA       NA
MedHHInc                                NA         NA      NA       NA
Pop                                     NA         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:      1,     Adjusted R-squared:    NaN 
F-statistic:   NaN on 44 and 0 DF,  p-value: NA

Does anyone know what is going wrong and how I can fix it? Also, I'm not supposed to use dummy variables.

1 Answer

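Your Metro variable has exactly one row per factor level. You need at least two points to fit a line. Here, the intercept plus the Metro coefficients already fit all 45 usable observations exactly, so no residual degrees of freedom are left to estimate the other coefficients or any standard errors. Let me demonstrate with an example: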

dat = data.frame(AvgSlPr=runif(4), Metro = factor(LETTERS[1:4]), MrktRgn = runif(4))
model1 = lm(AvgSlPr ~ Metro + MrktRgn, data = dat)
summary(model1)

#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)

#Residuals:
#ALL 4 residuals are 0: no residual degrees of freedom!

#Coefficients: (1 not defined because of singularities)
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  0.33801         NA      NA       NA
#MetroB       0.47350         NA      NA       NA
#MetroC      -0.04118         NA      NA       NA
#MetroD       0.20047         NA      NA       NA
#MrktRgn           NA         NA      NA       NA

#Residual standard error: NaN on 0 degrees of freedom
#Multiple R-squared:      1,    Adjusted R-squared:    NaN 
#F-statistic:   NaN on 3 and 0 DF,  p-value: NA
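
A quick way to spot this situation before fitting is to count the rows per factor level and compare the number of requested coefficients with the number of observations. The following is only a minimal sketch on toy data shaped like the example above; `dat_check`, `fit_check`, and the other object names are hypothetical and not part of the original question.

```r
# Hypothetical toy data mirroring the example: one row per Metro level.
set.seed(1)
dat_check <- data.frame(
  AvgSlPr = runif(4),
  Metro   = factor(LETTERS[1:4]),
  MrktRgn = runif(4)
)

# Rows per factor level; a table of all ones means each level is
# effectively a row identifier.
rows_per_level <- table(dat_check$Metro)
print(rows_per_level)

fit_check <- lm(AvgSlPr ~ Metro + MrktRgn, data = dat_check)

# Compare observations, requested coefficients, and residual degrees of freedom.
n_obs   <- nrow(model.matrix(fit_check))
n_coef  <- ncol(model.matrix(fit_check))
df_left <- df.residual(fit_check)
cat("observations:", n_obs,
    " coefficients requested:", n_coef,
    " residual df:", df_left, "\n")

# Coefficients that could not be estimated come back as NA.
print(names(which(is.na(coef(fit_check)))))
```

If the model asks for at least as many coefficients as there are observations, the residual degrees of freedom drop to zero and the standard errors cannot be computed.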

But if we add more data, so that at least some factor levels have more than one row, the linear model can be estimated:

dat = rbind(dat, data.frame(AvgSlPr=2:4, Metro=factor(LETTERS[2:4]), MrktRgn = 3:5))
model2 = lm(AvgSlPr ~ Metro + MrktRgn, data=dat)
summary(model2)

#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)

#Residuals:
#         1          2          3          4          5          6          7 
# 9.021e-17  2.643e-01  7.304e-03 -1.498e-01 -2.643e-01 -7.304e-03  1.498e-01 

#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)   
#(Intercept)  0.24279    0.30406   0.798  0.50834   
#MetroB      -0.10207    0.38858  -0.263  0.81739   
#MetroC      -0.06696    0.39471  -0.170  0.88090   
#MetroD       0.06804    0.41243   0.165  0.88413   
#MrktRgn      0.70787    0.06747  10.491  0.00896 **
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#Residual standard error: 0.3039 on 2 degrees of freedom
#Multiple R-squared:  0.9857,   Adjusted R-squared:  0.9571 
#F-statistic: 34.45 on 4 and 2 DF,  p-value: 0.02841

You need to rethink which data you use to fit the model. What is the goal of the analysis? What data do you need to reach that goal?
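
One possible direction, offered only as a sketch: if Metro is merely a label that identifies each row, leaving it out of the formula frees the degrees of freedom needed for standard errors. Whether that is appropriate depends on the goal of the analysis. The data below is simulated; the column names are borrowed from the question, but the values and the object names (`sim`, `fit_with_metro`, `fit_without_metro`) are made up for illustration.

```r
# Simulated stand-in for the real data (all values are made up).
set.seed(42)
n <- 45
sim <- data.frame(
  Metro    = factor(paste0("Metro", seq_len(n))),  # unique per row, as in the question
  MedHHInc = runif(n, 37000, 99000),
  Pop      = runif(n, 3000, 2e6)
)
sim$AvgSlPr <- 50000 + 2 * sim$MedHHInc + rnorm(n, sd = 20000)

# Including Metro uses up every degree of freedom.
fit_with_metro <- lm(AvgSlPr ~ Metro + MedHHInc + Pop, data = sim)

# Leaving Metro out keeps residual degrees of freedom for standard errors.
fit_without_metro <- lm(AvgSlPr ~ MedHHInc + Pop, data = sim)

print(df.residual(fit_with_metro))
print(df.residual(fit_without_metro))
print(summary(fit_without_metro)$coefficients)
```

With Metro removed, the coefficient table contains standard errors, t values, and p-values instead of NA.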

answered 2017-11-20T07:17:05.470