I am trying to compute marginal effects, following this post: http://andrewgelman.com/2016/01/14/rstanarm-and-more/

td <- readRDS("some data")

CHAINS <- 1
CORES <- 1
SEED <- 42
ITERATIONS <- 2000
MAX_TREEDEPTH <- 9

md <- td[,.(y,x1,x2)] # select the columns I need; y is binary


glm1 <- stan_glm(y~x1+x2,
                 data = md,
                 family = binomial(link="logit"),
                 prior = NULL,
                 prior_intercept = NULL,
                 chains = CHAINS,
                 cores = CORES,
                 seed = SEED,
                 iter = ITERATIONS,
                 control=list(max_treedepth=MAX_TREEDEPTH)
)

# launch_shinystan(glm1) 


tmp <- posterior_predict(glm1,newdata=md[,.(x1,x2)])

Issue

After running this code I get an error saying that y was not found, which means I would also have to pass y in newdata. According to ?posterior_predict, that should not be necessary.

Reasoning

I need tmp <- posterior_predict(glm1,newdata=md[,.(x1,x2)]) because, as far as I understand the post above, the marginal effect of x1 (assuming x1 is binary) would be calculated as follows:

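temp <- copy(md) # copy first: := modifies a data.table by reference
temp[,x1:=0]
temp[,x2:=mean(x2)]
number_0 <- posterior_predict(glm1,newdata=temp)

temp <- copy(md) # fresh copy so md itself is left unchanged
temp[,x1:=1]
temp[,x2:=mean(x2)]
number_1 <- posterior_predict(glm1,newdata=temp)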

marginal_effect_x1 <- number_1 - number_0
1 Answer

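For a binary logit model, the marginal effect of a continuous variable is the derivative of the probability of success with respect to that variable. By the chain rule, this is the logistic density (evaluated at some values of the predictors, usually the observed values) multiplied by the coefficient of the variable in question. In your case, that would be

df <- as.data.frame(glm1)
ME <- df$x2 * dlogis(posterior_linpred(glm1))

Since this depends on the observed values of the predictors, it is common to average over the data with

AME <- rowMeans(ME)

For a binary predictor, you can simply subtract the probability of success when x1 = 0 from the probability of success when x1 = 1 via

nd <- md
nd$x1 <- 0
p0 <- posterior_linpred(glm1, newdata = nd, transform = TRUE)
nd$x1 <- 1
p1 <- posterior_linpred(glm1, newdata = nd, transform = TRUE)
ME <- p1 - p0
AME <- rowMeans(ME)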
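For reference, the chain-rule argument in this answer can be written out explicitly. The symbols here are notation assumed for illustration and do not appear in the original post: $\alpha$ is the intercept, $\beta_1$ and $\beta_2$ are the coefficients on x1 and x2, $\eta_i$ is the linear predictor for observation $i$, and $\Lambda$ is the logistic CDF (R's plogis), whose density is R's dlogis.

```latex
\begin{align*}
\eta_i &= \alpha + \beta_1 x_{1i} + \beta_2 x_{2i}, \qquad
p_i = \Lambda(\eta_i) = \frac{1}{1 + e^{-\eta_i}} \\[4pt]
\frac{\partial p_i}{\partial x_{2i}}
  &= \Lambda'(\eta_i)\,\frac{\partial \eta_i}{\partial x_{2i}}
   = \beta_2\,\frac{e^{-\eta_i}}{\left(1 + e^{-\eta_i}\right)^2}
   = \beta_2\, p_i\,(1 - p_i) \\[4pt]
\mathrm{AME}_{x_2} &= \frac{1}{N}\sum_{i=1}^{N} \beta_2\,\Lambda'(\eta_i) \\[4pt]
\mathrm{AME}_{x_1} &= \frac{1}{N}\sum_{i=1}^{N}
  \left[\Lambda(\alpha + \beta_1 + \beta_2 x_{2i}) - \Lambda(\alpha + \beta_2 x_{2i})\right]
\end{align*}
```

Each quantity is evaluated once per posterior draw of $(\alpha, \beta_1, \beta_2)$, which is why the average is taken over rows (draws) of a draws-by-observations matrix.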

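The same calculation can be sketched in base R without fitting a model, which makes it easier to see what each matrix operation does. Everything in this sketch is an assumption for illustration: the data and the "posterior draws" are simulated stand-ins, not output from glm1, and the names S, N, draws, eta, AME_x1 and AME_x2 are hypothetical.

```r
# Minimal sketch with simulated (hypothetical) draws; not output from glm1.
set.seed(1)
S <- 1000  # number of posterior draws (assumed)
N <- 50    # number of observations (assumed)

# Simulated predictors standing in for md$x1 and md$x2
x1 <- rbinom(N, 1, 0.5)
x2 <- rnorm(N)

# Stand-in for as.data.frame(glm1): one row per draw, one column per coefficient
draws <- data.frame(
  `(Intercept)` = rnorm(S, -0.5, 0.1),
  x1 = rnorm(S, 1.0, 0.1),
  x2 = rnorm(S, 0.8, 0.1),
  check.names = FALSE
)
B <- as.matrix(draws)

# Linear predictor: S x N matrix, the same shape posterior_linpred() returns
eta <- B %*% t(cbind(1, x1, x2))

# Continuous predictor: coefficient times logistic density.
# draws$x2 has length S, so it is recycled down each column and
# row s of the matrix is scaled by the s-th draw of the x2 coefficient.
ME_x2 <- draws$x2 * dlogis(eta)
AME_x2 <- rowMeans(ME_x2)

# Binary predictor: difference in success probabilities with x1 set to 1 and 0
p1 <- plogis(B %*% t(cbind(1, 1, x2)))
p0 <- plogis(B %*% t(cbind(1, 0, x2)))
AME_x1 <- rowMeans(p1 - p0)

# Each AME is a vector of S draws and can be summarised like any posterior
print(quantile(AME_x1, c(0.025, 0.5, 0.975)))
print(quantile(AME_x2, c(0.025, 0.5, 0.975)))
```

With a real fit, as.data.frame(glm1) and posterior_linpred(glm1) would take the place of draws and eta, as in the answer above.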
Answered 2017-07-11T18:46:24.190