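I was considering posting my question on Cross-Validated, but decided to come here instead. I am using the multinom() function from the nnet package to estimate the odds of being employed, unemployed, or not in the labor force, conditional on age and education. I need some help with the interpretation.

I have the following dataset, which contains one categorical dependent variable, employment status (EmpSt), and two categorical independent variables: age (Age) and education level (Education).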

>head(df)
               EmpSt   Age                         Education
1           Employed   61+   Less than a high school diploma
2           Employed 50-60 High school graduates, no college
3 Not in labor force 50-60   Less than a high school diploma
4           Employed 30-39       Bachelor's degree or higher
5           Employed 20-29  Some college or associate degree
6           Employed 20-29  Some college or associate degree

Here is a summary of the levels:

>summary(df)
                EmpSt          Age                                    Education    
 Not in universe   :    0   16-19: 6530   Less than a high school diploma  :14686  
 Employed          :61478   20-29:16031   High school graduates, no college:30716  
 Unemployed        : 3940   30-39:16520   Some college or associate degree :28525  
 Not in labor force:38508   40-49:17403   Bachelor's degree or higher      :29999  
                            50-60:20779                                            
                            61+  :26663                                    
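  • I. What is the estimated equation (model)?

I want to determine what the estimated equation (model) is for the call

library(nnet)
df$EmpSt <- relevel(df$EmpSt, ref = "Employed")
multinom(EmpSt ~ Age + Education, data = df)

so that I can write it up in my research paper. As I understand it, Employed is the base level, and the logit model for this call is:

[Two equation images; not preserved in this copy]

where i and n are the categories of the variables Age and Education, respectively (sorry for the messy notation). Please correct me if my understanding of the logit model produced by multinom() is wrong. I am not going to include the summary of test because it has a lot of output, so below I include only the output of printing test: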

> test
Call:
multinom(formula = EmpSt ~ Age + Education, data = ml)

Coefficients:
                   (Intercept)   Age20-29   Age30-39   Age40-49   Age50-60     Age61+
Unemployed           -1.334734 -0.3395987 -0.7104361 -0.8848517 -0.9358338 -0.9319822
Not in labor force    1.180028 -1.2531405 -1.6711616 -1.6579095 -1.2579600  0.8197373
                   EducationHigh school graduates, no college EducationSome college or associate degree
Unemployed                                         -0.4255369                                 -0.781474
Not in labor force                                 -0.8125016                                 -1.004423
                   EducationBachelor's degree or higher
Unemployed                                    -1.351119
Not in labor force                            -1.580418

Residual Deviance: 137662.6 
AIC: 137698.6 

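Assuming my understanding of the logit model produced by multinom() is correct, the coefficients are log odds relative to the base level. To get the actual odds, I take the antilog with the call exp(coef(test)), which gives me the actual odds: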

> exp(coef(test))
                   (Intercept)  Age20-29  Age30-39  Age40-49  Age50-60    Age61+
Unemployed           0.2632281 0.7120560 0.4914298 0.4127754 0.3922587 0.3937724
Not in labor force   3.2544655 0.2856064 0.1880285 0.1905369 0.2842333 2.2699035
                   EducationHigh school graduates, no college EducationSome college or associate degree
Unemployed                                          0.6534189                                 0.4577308
Not in labor force                                  0.4437466                                 0.3662560
                   EducationBachelor's degree or higher
Unemployed                                    0.2589504
Not in labor force                            0.2058891
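One way to read this table, assuming the fit is a baseline-category logit with Employed as the reference and R's default dummy coding: the exponentiated intercept is the odds (versus Employed) for the reference profile, Age 16-19 with less than a high school diploma, and every other entry is an odds ratio that multiplies those baseline odds. The sketch below only multiplies numbers copied from the output above; the object name exp_coef is made up for illustration.

```r
# Exponentiated coefficients copied from the exp(coef(test)) output above.
# Only the columns needed for one profile are kept.
exp_coef <- rbind(
  "Unemployed"         = c(0.2632281, 0.7120560, 0.6534189),
  "Not in labor force" = c(3.2544655, 0.2856064, 0.4437466)
)
colnames(exp_coef) <- c("(Intercept)",
                        "Age20-29",
                        "EducationHigh school graduates, no college")

# Odds versus Employed for someone aged 20-29 who is a high school graduate:
# baseline odds times the odds ratio for each dummy that is switched on.
odds <- apply(exp_coef, 1, prod)
print(round(odds, 4))
```

These are odds relative to Employed, not probabilities; turning them into probabilities requires normalising over all three outcomes.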

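This brings me to my next question.

  • II. Probability

I would like to know whether there is a way to get the actual probability of being unemployed versus employed based on age and education. For example, if I am 22 and have a high school diploma, what is the probability that I am unemployed? Sorry for the lengthy question. Thanks for your help. Let me know if further clarification is needed.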

1 Answer

About your first question, I'm also having some doubts about multinom with categorical variables (here is my question: Multinom with Matrix of Counts as Response).

From what a user replied to that question and the output of > test you posted, I guess that the math you wrote is partially right: a multinomial model should work only if the predictor variables are continuous or dichotomous (i.e., with values of only 0 or 1), and it seems that when multinom gets categorical variables as predictors, as in your example, R automatically converts them to dummy variables (only 0 or 1), one for each level other than the reference level.

With reference to your example, considering only the Age predictor, we should have

$$\ln\left(\frac{\Pr(\text{unemployed})}{\Pr(\text{employed})}\right) = \beta_0 + \beta_1 \cdot \text{Age20-29} + \beta_2 \cdot \text{Age30-39} + \dots$$

and an analogous formula for Pr(not in labor force), but with different $\beta$ coefficients.
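Extending this to both predictors, the full model implied by the printed coefficient table could be written as below. This is a sketch under the assumption that multinom used its default dummy coding, with Age 16-19 and "Less than a high school diploma" as reference levels (they are the only levels without a coefficient in the output). The superscript $(k)$, the symbol $\gamma$ for the education coefficients, and $\eta_k$ for the linear predictor are notation introduced here, not taken from the question.

```latex
% One equation per non-reference outcome k
\ln\frac{\Pr(Y = k)}{\Pr(Y = \text{Employed})}
  = \eta_k
  = \beta_0^{(k)}
  + \sum_{a} \beta_a^{(k)} \, \mathbf{1}[\text{Age} = a]
  + \sum_{e} \gamma_e^{(k)} \, \mathbf{1}[\text{Education} = e],
\qquad k \in \{\text{Unemployed}, \text{Not in labor force}\}

% Inverting the two log-odds equations gives the probabilities
\Pr(Y = k) = \frac{e^{\eta_k}}{1 + \sum_{j} e^{\eta_j}},
\qquad
\Pr(Y = \text{Employed}) = \frac{1}{1 + \sum_{j} e^{\eta_j}}
```

Here $a$ runs over the five non-reference age groups, $e$ over the three non-reference education levels, and $j$ over the two non-reference outcomes.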

About your second question: yes, there is a way. Use predict(test, newdata, type = "probs"), where newdata is a data frame with columns Age and Education holding the levels of interest, here "20-29" and "High school graduates, no college" (given your example).
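A minimal sketch of that call follows. The real survey data are not available here, so the data frame is simulated (hypothetical values, with only the level names taken from the question), and the fitted numbers will not match the output posted above. The helper names age_levels, edu_levels, emp_levels, eta and p_hand are made up for illustration. The last lines recompute the same probabilities by hand from coef(test), as a check that predict does what the log-odds equations imply.

```r
library(nnet)  # recommended package that normally ships with R

set.seed(42)
age_levels <- c("16-19", "20-29", "30-39", "40-49", "50-60", "61+")
edu_levels <- c("Less than a high school diploma",
                "High school graduates, no college",
                "Some college or associate degree",
                "Bachelor's degree or higher")
emp_levels <- c("Employed", "Unemployed", "Not in labor force")

# Simulated stand-in for df (NOT the asker's data)
n <- 2000
df <- data.frame(
  EmpSt     = factor(sample(emp_levels, n, replace = TRUE,
                            prob = c(0.60, 0.05, 0.35)),
                     levels = emp_levels),
  Age       = factor(sample(age_levels, n, replace = TRUE), levels = age_levels),
  Education = factor(sample(edu_levels, n, replace = TRUE), levels = edu_levels)
)
df$EmpSt <- relevel(df$EmpSt, ref = "Employed")

test <- multinom(EmpSt ~ Age + Education, data = df, trace = FALSE)

# One row per profile to predict; column names must match the model formula
newdata <- data.frame(
  Age       = factor("20-29", levels = age_levels),
  Education = factor("High school graduates, no college", levels = edu_levels)
)
p <- predict(test, newdata = newdata, type = "probs")
print(p)

# Same probabilities by hand: sum the relevant coefficients per outcome,
# exponentiate, and normalise (Employed is the reference, so its odds are 1)
b   <- coef(test)
eta <- b[, "(Intercept)"] + b[, "Age20-29"] +
       b[, "EducationHigh school graduates, no college"]
p_hand <- c(Employed = 1, exp(eta)) / (1 + sum(exp(eta)))
print(p_hand)
```

With a single-row newdata, predict returns a named vector with one probability per employment status; with several rows it returns a matrix with one row per profile.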

Answered 2014-03-10T07:59:58.493