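My current dataset oversampled women, so they make up 74% of the total sample of 411, whereas the split should be 50/50. How can I use my post-stratification output to inform my (logistic regression) prediction model?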
> library(foreign)
> library(survey)
> mydata <- read.csv("~/Desktop/R/mydata.csv")
> #Enter Actual Population Size
> mydata$fpc <- 1200
> #Enter ID Column Name
> id <- mydata$My.ID
> #Enter Column to Post-Stratify
> mydata$type <- mydata$Male
> #Enter Column Variables
> x1 <- 0
> y1 <- 1
> #Enter Corresponding Frequencies
> x2 <- 600
> y2 <- 600
> #Enter the Variable of Interest
> mydata$interest <- mydata$Support
> preliminary.design <- svydesign(id = ~1, data = mydata, fpc = ~fpc)
> ps.weights <- data.frame(type = c(x1,y1), Freq = c(x2, y2))
> mydesign <- postStratify(preliminary.design, ~type, ps.weights)
> #Print Original Mean of Variable of Interest
> mean(mydata$Support)
[1] 0.6666666667
> #Total Actual Population Size
> sum(ps.weights$Freq)
[1] 1200
> #Unweighted Observations Where the Variable of Interest is Not Missing
> unwtd.count(~interest, mydesign)
       counts SE
counts    411  0
> #Print the Post-Stratified Mean and SE of the Variable
> svymean(~interest, mydesign)
               mean      SE
interest 0.71077946 0.01935
> #Print the Weighted Total and SE of the Variable
> svytotal(~interest, mydesign)
             total       SE
interest 852.93535 23.21552
> #Print the Mean and SE of the Interest Variable, by Type
> svyby(~interest, ~type, mydesign, svymean)
  type     interest            se
0    0 0.6196721311 0.02256768435
1    1 0.8018867925 0.03142947839
> mysvyby <- svyby(~interest, ~type, mydesign, svytotal)
> #Print the Estimated Total for each Type
> coef(mysvyby)
          0           1
371.8032787 481.1320755
> #Print the Standard Error of each Type
> SE(mysvyby)
[1] 13.54061061 18.85768704
> #Print Confidence Intervals for the Estimated Totals
> confint(mysvyby)
        2.5 %      97.5 %
0 345.2641696 398.3423878
1 444.1716880 518.0924629
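For reference, the weights implied by this post-stratification can be reconstructed by hand from the printed output. This is only a worked check, assuming the sample mean of Male (0.2579075, in the summary further down) means 106 men and 305 women out of 411. Each stratum weight is the population count divided by the sample count, and the post-stratified mean is the equally weighted average of the two group means:

$$w_{\text{female}} = \frac{600}{305} \approx 1.967, \qquad w_{\text{male}} = \frac{600}{106} \approx 5.660$$

$$\hat{\bar{y}}_{\text{ps}} = \frac{600 \times 0.6196721 + 600 \times 0.8018868}{1200} \approx 0.7108$$

This matches the `svymean` result of 0.71077946 above.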
> mydata <- read.csv("~/Desktop/R/mydata.csv")
> attach(mydata)
> # Define variables
> Y <- cbind(Support)
> X <- cbind(Black, vote, Male)
> # Descriptive statistics
> summary(Y)
Min. :0.0000000
1st Qu.:0.0000000
Median :1.0000000
Mean :0.6666667
3rd Qu.:1.0000000
Max. :1.0000000
> summary(X)
     Black                 vote               Male
 Min.   :0.0000000   Min.   : 0.8100   Min.   :0.0000000
 1st Qu.:0.0000000   1st Qu.:24.0350   1st Qu.:0.0000000
 Median :0.0000000   Median :47.6300   Median :0.0000000
 Mean   :0.4355231   Mean   :48.0447   Mean   :0.2579075
 3rd Qu.:1.0000000   3rd Qu.:72.1300   3rd Qu.:1.0000000
 Max.   :1.0000000   Max.   :91.3200   Max.   :1.0000000
> table(Y)
  0   1
137 274
> table(Y)/sum(table(Y))
           0            1
0.3333333333 0.6666666667
> # Logit model coefficients
> logit<- glm(Y ~ X, family=binomial (link = "logit"))
> summary(logit)
glm(formula = Y ~ X, family = binomial(link = "logit"))
Deviance Residuals:
       Min          1Q      Median          3Q         Max
-2.1658288  -1.1277933   0.5904486   0.9190314   1.3256407
                Estimate  Std. Error  z value   Pr(>|z|)
(Intercept)  0.462496014 0.265017604  1.74515  0.0809584 .
XBlack       1.329633506 0.244053422  5.44812 5.0904e-08 ***
Xvote       -0.008839950 0.004262016 -2.07412  0.0380678 *
XMale        0.781144950 0.283218355  2.75810  0.0058138 **
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 523.21465 on 410 degrees of freedom
Residual deviance: 469.48706 on 407 degrees of freedom
AIC: 477.48706
Number of Fisher Scoring iterations: 4
> # Logit model odds ratios
> exp(logit$coefficients)
 (Intercept)       XBlack        Xvote        XMale
1.5880327947 3.7796579101 0.9911990073 2.1839713716
Is there a way to combine these two scripts in R to update my logit model so that, when I predict, sex is treated as 50/50 rather than 74% female / 26% male?
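One possible way to combine the two scripts, sketched here under assumptions rather than offered as a definitive answer, is to fit the logit with `svyglm()` from the `survey` package on the post-stratified design, so the 600/600 weights enter the estimation. The data frame below is a synthetic stand-in for `mydata.csv` (made-up values, same column names as above), and the `newdata` rows are hypothetical respondents. The design post-stratifies on `Male` directly instead of a copied `type` column.

```r
library(survey)

# Synthetic stand-in for mydata.csv: values are invented for illustration,
# only the column names (Support, Black, vote, Male) come from the question.
set.seed(1)
n <- 411
mydata <- data.frame(
  Male  = rbinom(n, 1, 0.26),
  Black = rbinom(n, 1, 0.44),
  vote  = runif(n, 0, 92)
)
mydata$Support <- rbinom(
  n, 1,
  plogis(0.4 + 1.3 * mydata$Black - 0.009 * mydata$vote + 0.8 * mydata$Male)
)
mydata$fpc <- 1200

# Same design and post-stratification targets as in the first script
preliminary.design <- svydesign(id = ~1, data = mydata, fpc = ~fpc)
ps.weights <- data.frame(Male = c(0, 1), Freq = c(600, 600))
mydesign <- postStratify(preliminary.design, ~Male, ps.weights)

# Weighted logit on the post-stratified design; quasibinomial() avoids the
# "non-integer #successes" warning that binomial() gives with weights
wlogit <- svyglm(Support ~ Black + vote + Male,
                 design = mydesign,
                 family = quasibinomial(link = "logit"))
summary(wlogit)
exp(coef(wlogit))  # odds ratios under the 50/50 weighting

# Predicted probabilities for hypothetical new respondents
newdata <- data.frame(Black = c(0, 1), vote = c(50, 50), Male = c(1, 0))
pred <- predict(wlogit, newdata = newdata, type = "response")
pred
```

Because `Male` is already a covariate in the model, the weighted coefficients may differ only modestly from the unweighted `glm()` fit. The weights matter most for population-level summaries such as `svymean(~Support, mydesign)`.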