r - Setting up an mlogit in R with many observations per category

Question

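I'm trying to use mlogit in R. I'm somewhat new to logits and am having trouble setting up my problem in the mlogit framework. I'm actually not entirely sure mlogit is the right approach. Here is an analogous problem.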

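Consider a baseball dataset whose outcome variable takes the values "out", "single", "double", "triple", and "home run". For explanatory variables, we have the hitter's name, the pitcher's name, and the stadium. There are hundreds of observations for each hitter, including many in which the hitter faces the same pitcher.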

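I think this is definitely a multinomial logit, since I have multiple categorical outcomes, but I'm not sure, because all the documentation seems to deal with a "choice" between alternatives, which is not the case here. I tried to start my logit model by setting up one factor variable for the hitter, another for the pitcher, and another for the stadium. When I try this in R, I get: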

Error in `row.names<-.data.frame`(`*tmp*`, value = value) : invalid 'row.names' length

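After some googling, I think it may be expecting only one observation for each combination of hitter, pitcher, and park? Or maybe not? What exactly am I doing wrong, and how should I set this up?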

Edit: sample data here

https://docs.google.com/spreadsheets/d/19fiq_QEMj4nAPcTqIRxeaYNPgqeHxKAEuPrfHMeIJ7o/edit?usp=sharing

score 1 · Accepted Answer

Here are some suggestions on how to start analyzing your data.

# Your dataset
dts <- structure(list(outcome = c(1L, 1L, 2L, 3L, 1L, 3L, 2L, 3L, 3L, 
3L, 3L, 1L, 2L, 2L, 2L, 1L, 3L, 2L, 2L, 2L, 1L, 2L, 3L, 2L, 2L, 
2L, 2L, 1L, 1L, 2L, 3L, 2L, 3L, 1L, 2L, 2L, 3L, 2L, 3L, 3L, 3L, 
2L, 1L, 1L, 1L, 2L, 3L, 2L, 1L), hitter = structure(c(3L, 3L, 
3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("james", 
"jill", "john"), class = "factor"), pitcher = structure(c(3L, 
3L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 1L, 1L, 
2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 2L, 1L, 2L, 3L, 2L, 
3L, 2L, 1L, 1L, 2L, 2L, 1L, 3L, 3L, 1L, 2L, 2L, 1L, 1L, 2L, 2L
), .Label = c("bill", "bob", "brett"), class = "factor"), place = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 
5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L
), .Label = c("ca", "co", "dc", "ny", "tn"), class = "factor")), .Names = c("outcome", 
"hitter", "pitcher", "place"), class = "data.frame", row.names = c(NA, 
-49L))

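# Estimation of a multinomial logistic regression model
library(mlogit)
# shape="wide": one row per observation; "outcome" holds the observed category
dts.wide <- mlogit.data(dts, choice="outcome", shape="wide")
# Only the intercept before "|" (no alternative-specific variables);
# individual-specific variables (hitter, pitcher, place) go after "|"
fit.mlogit <- mlogit(outcome ~ 1 | hitter+pitcher+place, data=dts.wide)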

# Results
library(stargazer)
stargazer(fit.mlogit, type="text")

# Model coefficients with standard errors and statistical significance (stars)
==========================================
                   Dependent variable:    
               ---------------------------
                         outcome          
------------------------------------------
2:(intercept)            19.456           
                       (3,056.626)        

3:(intercept)            35.179           
                       (4,172.540)        

2:hitterjill             -17.543          
                       (3,056.625)        

3:hitterjill             -33.117          
                       (4,172.540)        

2:hitterjohn             -0.188           
                         (0.996)          

3:hitterjohn             -1.410           
                         (1.056)          

2:pitcherbob             -0.070           
                         (1.005)          

3:pitcherbob             -1.270           
                         (1.091)          

2:pitcherbrett           -0.908           
                         (1.063)          

3:pitcherbrett           -2.284*          
                         (1.257)          

2:placeco                -1.655           
                         (1.557)          

3:placeco                -17.688          
                       (2,840.270)        

2:placedc                -19.428          
                       (3,056.626)        

3:placedc                -34.479          
                       (4,172.540)        

2:placeny                -18.802          
                       (3,056.625)        

3:placeny                -32.873          
                       (4,172.540)        

2:placetn                -18.885          
                       (3,056.626)        

3:placetn                -32.140          
                       (4,172.540)        

------------------------------------------
Observations               49             
R2                        0.155           
Log Likelihood           -44.605          
LR Test             16.388 (df = 18)      
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
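
Several terms in the table above have very large estimates and standard errors (for example `2:placedc` at -19.428 with a standard error of 3,056.626). This pattern can be a symptom of sparse data, where some combinations of factor levels are never observed and the likelihood is nearly flat in those directions. One way to look for this is to cross-tabulate the predictors against each other and against the outcome. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the asker's data); with the real data, `dts` would take its place.

```r
# Hypothetical toy data, for illustration only (not the asker's data).
toy <- data.frame(
  outcome = factor(c(1, 1, 2, 3, 2, 3, 1, 2)),
  hitter  = factor(c("james", "james", "james", "jill", "jill", "jill", "john", "john")),
  place   = factor(c("dc", "dc", "ny", "ca", "ca", "ca", "dc", "ny"))
)

# Cross-tabulate two predictors: zero cells are combinations never observed.
hitter_by_place <- table(toy$hitter, toy$place)
print(hitter_by_place)

# Cross-tabulate a predictor against the outcome: a zero cell means that
# level never produced that outcome in the sample.
place_by_outcome <- table(toy$place, toy$outcome)
print(place_by_outcome)

# Count the empty cells in each table.
n_empty_hp <- sum(hitter_by_place == 0)
n_empty_po <- sum(place_by_outcome == 0)
```

If many cells are empty, collecting more observations per level, merging rare levels, or dropping a predictor may give more stable estimates; which of these is appropriate depends on the data.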

For more details on estimating multinomial logistic models in R, see here.
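
Since the question asks whether mlogit is the right tool, here is a sketch of one alternative that is not part of the answer above: `multinom()` from the `nnet` package fits a multinomial logit directly from one row per observation when all predictors are individual-specific, with no reshaping step. The data frame `toy`, its sample size, and the outcome probabilities are made up for illustration.

```r
library(nnet)  # recommended package, ships with standard R distributions

set.seed(1)
# Hypothetical toy data: one row per plate appearance (not the asker's data).
n <- 300
toy <- data.frame(
  hitter  = factor(sample(c("james", "jill", "john"), n, replace = TRUE)),
  pitcher = factor(sample(c("bill", "bob", "brett"), n, replace = TRUE))
)
toy$outcome <- factor(
  sample(c("out", "single", "double"), n, replace = TRUE, prob = c(0.7, 0.2, 0.1)),
  levels = c("out", "single", "double")  # first level is the reference category
)

# Multinomial logit with individual-specific predictors only.
fit <- multinom(outcome ~ hitter + pitcher, data = toy, trace = FALSE)

# Predicted outcome probabilities for every hitter/pitcher combination.
new <- expand.grid(hitter = levels(toy$hitter), pitcher = levels(toy$pitcher))
probs <- predict(fit, newdata = new, type = "probs")
print(cbind(new, round(probs, 3)))
```

Each row of `probs` sums to 1 across the outcome categories, and the coefficients from `coef(fit)` are log-odds relative to the reference category, which parallels the `2:` and `3:` terms in the mlogit output.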
