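My current dataset oversampled women, so they make up 74% of the total sample of 411, when the split should be 50/50. How can I use my post-stratification output to adjust my (logistic regression) predictive model?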

Here is what I did to get the new mean and coefficients for Support after adjusting for the number of women surveyed:

> library(foreign)
> library(survey)
> 
> mydata <- read.csv("~/Desktop/R/mydata.csv")
> 
> #Enter Actual Population Size
> mydata$fpc <- 1200
> 
> #Enter ID Column Name
> id <- mydata$My.ID
> 
> #Enter Column to Post-Stratify
> type <- mydata$Male
> 
> #Enter Column Variables
> x1 <- 0
> y1 <- 1
> 
> #Enter Corresponding Frequencies
> x2 <- 600
> y2 <- 600
> 
> #Enter the Variable of Interest
> mydata$interest <- mydata$Support
> 
> preliminary.design <- svydesign(id = ~1, data = mydata, fpc = ~fpc)
> 
> ps.weights <- data.frame(type = c(x1,y1), Freq = c(x2, y2))
> 
> mydesign <- postStratify(preliminary.design, ~type, ps.weights)
> 
> #Print Original Mean of Variable of Interest
> mean(mydata$Support)
[1] 0.6666666667
> 
> #Total Actual Population Size
> sum(ps.weights$Freq)
[1] 1200
> 
> #Unweighted Observations Where the Variable of Interest is Not Missing
> unwtd.count(~interest, mydesign)
       counts SE
counts    411  0
> 
> #Print the Post-Stratified Mean and SE of the Variable
> svymean(~interest, mydesign)
               mean      SE
interest 0.71077946 0.01935
> 
> #Print the Weighted Total and SE of the Variable
> svytotal(~interest, mydesign)
             total       SE
interest 852.93535 23.21552
> 
> #Print the Mean and SE of the Interest Variable, by Type
> svyby(~interest, ~type, mydesign, svymean)
  type     interest            se
0    0 0.6196721311 0.02256768435
1    1 0.8018867925 0.03142947839
> 
> mysvyby <- svyby(~interest, ~type, mydesign, svytotal)
> 
> #Print the Coefficients of each Type
> coef(mysvyby)
          0           1 
371.8032787 481.1320755 
> 
> #Print the Standard Error of each Type
> SE(mysvyby)
[1] 13.54061061 18.85768704
> 
> #Print Confidence Intervals for the Coefficient Estimates
> confint(mysvyby)
        2.5 %      97.5 %
0 345.2641696 398.3423878
1 444.1716880 518.0924629
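
As a check on the output above, the post-stratified mean can be reproduced by hand from the `svyby` group means and the population shares (600 of 1200 in each group). The unweighted comparison assumes 305 women and 106 men, which is what the `Male` mean of 0.2579075 over 411 rows implies; the symbols $N_h$, $n_h$ and $\bar{y}_h$ are notation introduced here, not part of the original script.

```latex
\begin{align*}
\hat{\bar{y}}_{\text{ps}}
  &= \sum_{h \in \{0,1\}} \frac{N_h}{N}\,\bar{y}_h
   = \tfrac{600}{1200}(0.6196721311) + \tfrac{600}{1200}(0.8018867925)
   \approx 0.7107795, \\
\bar{y}_{\text{unweighted}}
  &= \sum_{h \in \{0,1\}} \frac{n_h}{n}\,\bar{y}_h
   = \tfrac{305}{411}(0.6196721311) + \tfrac{106}{411}(0.8018867925)
   \approx 0.6666667 .
\end{align*}
```

The weighted totals follow the same way: $600 \times 0.6196721311 \approx 371.80$ and $600 \times 0.8018867925 \approx 481.13$, which sum to about $852.94$.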

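All of the output above appears to be correct, but I don't know how to use it to adjust the output of my logistic regression model. Here is the code without any post-stratification adjustment: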

> mydata <- read.csv("~/Desktop/R/mydata.csv")
> 
> attach(mydata) 
> 
> # Define variables 
> 
> Y <- cbind(Support)
> X <- cbind(Black, vote, Male) 
> 
> # Descriptive statistics 
> 
> summary(Y) 
    Support         
 Min.   :0.0000000  
 1st Qu.:0.0000000  
 Median :1.0000000  
 Mean   :0.6666667  
 3rd Qu.:1.0000000  
 Max.   :1.0000000  
> 
> summary(X) 
     Black            vote                   Male          
 Min.   :0.0000000   Min.   : 0.8100   Min.   :0.0000000  
 1st Qu.:0.0000000   1st Qu.:24.0350   1st Qu.:0.0000000  
 Median :0.0000000   Median :47.6300   Median :0.0000000  
 Mean   :0.4355231   Mean   :48.0447   Mean   :0.2579075  
 3rd Qu.:1.0000000   3rd Qu.:72.1300   3rd Qu.:1.0000000  
 Max.   :1.0000000   Max.   :91.3200   Max.   :1.0000000  
> 
> table(Y) 
Y
  0   1 
137 274 
> 
> table(Y)/sum(table(Y)) 
Y
           0            1 
0.3333333333 0.6666666667 
> 
> 
> # Logit model coefficients 
> 
> logit<- glm(Y ~ X, family=binomial (link = "logit")) 
> 
> summary(logit) 

Call:
glm(formula = Y ~ X, family = binomial(link = "logit"))

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-2.1658288  -1.1277933   0.5904486   0.9190314   1.3256407  

Coefficients:
                  Estimate   Std. Error  z value   Pr(>|z|)    
(Intercept)    0.462496014  0.265017604  1.74515  0.0809584 .  
XBlack         1.329633506  0.244053422  5.44812 5.0904e-08 ***
Xvote         -0.008839950  0.004262016 -2.07412  0.0380678 *  
XMale          0.781144950  0.283218355  2.75810  0.0058138 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 523.21465  on 410  degrees of freedom
Residual deviance: 469.48706  on 407  degrees of freedom
AIC: 477.48706

Number of Fisher Scoring iterations: 4

> 
> # Logit model odds ratios 
> 
> exp(logit$coefficients) 
  (Intercept)        XBlack         Xvote         XMale 
 1.5880327947  3.7796579101  0.9911990073  2.1839713716 

Is there a way in R to combine these two scripts and update my logit model so that my predictions treat gender as 50/50 instead of 74% female / 26% male?

Thanks!

1 Answer

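Since you want to create predictions from the model, here is one possible solution: (1) fit the logistic regression model to the data you have (i.e., 74% women and 26% men), then (2) extract predicted probabilities from the model with the gender variable set to 0.5. See `?predict.glm` for more information.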
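
The two steps might look roughly like the sketch below. It is not run on the real `mydata.csv`: the data frame `sim` is simulated, using coefficients rounded from the question's output, purely so the example runs on its own. The names `sim`, `fit`, `nd_half`, `p_half`, `p_female`, `p_male` and `p_5050` are made up for illustration. The model is fitted with a formula and `data =` rather than `attach()` and `cbind()`, because `predict()` matches `newdata` columns by name.

```r
# Simulated stand-in for mydata.csv (assumption: the real file is not available here).
set.seed(1)
n <- 411
sim <- data.frame(
  Black = rbinom(n, 1, 0.44),
  vote  = runif(n, 0, 92),
  Male  = rbinom(n, 1, 0.26)   # men under-represented, as in the question
)
# Generate Support from coefficients rounded from the question's fitted model.
eta <- 0.46 + 1.33 * sim$Black - 0.009 * sim$vote + 0.78 * sim$Male
sim$Support <- rbinom(n, 1, plogis(eta))

# (1) Fit the logit model on the data as collected.
fit <- glm(Support ~ Black + vote + Male,
           family = binomial(link = "logit"), data = sim)

# (2a) Predict with Male fixed at 0.5 for every row, as suggested above.
nd_half <- transform(sim, Male = 0.5)
p_half  <- predict(fit, newdata = nd_half, type = "response")

# (2b) Alternative: predict for Male = 0 and Male = 1 separately,
#      then average the two probabilities with equal (50/50) weight.
p_female <- predict(fit, newdata = transform(sim, Male = 0), type = "response")
p_male   <- predict(fit, newdata = transform(sim, Male = 1), type = "response")
p_5050   <- 0.5 * p_female + 0.5 * p_male

mean(p_half)   # average predicted support with Male = 0.5 on the logit scale
mean(p_5050)   # average predicted support over a 50/50 gender split
```

Variants (2a) and (2b) usually give close but not identical numbers, because the logistic function is nonlinear: plugging in the midpoint of the predictor is not the same as averaging the two predicted probabilities. If the goal is a population that is half men and half women, (2b) is arguably the more literal reading.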

Answered 2014-04-20T17:21:15.887