
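My main question is: what probabilities does the `predict()` function return for `mnlogit()`, and how do they differ from the probabilities from the packages `nnet` and `mlogit`?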

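Some background: I am trying to model an outcome based only on individual-specific variables, because I don't know the alternatives available to my choosers. For the given model I get the same predicted probabilities from all three packages, but `mnlogit` gives several sets of probabilities, of which the first set is similar to those given by the other packages. Looking at the `mnlogit` vignette, I understand that I can get individual-specific probabilities, but I don't think those are the ones I'm extracting (?), and I don't think I specified the model to get those either.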
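
For reference, here is a minimal base-R sketch of what a model with only individual-specific covariates (like `Choice ~ 1 | urban + income`) computes: every non-reference alternative gets its own intercept and slopes, the reference alternative's utility is fixed at 0, and the probabilities are a softmax over the alternatives. The coefficient values and the `mnl_probs()` helper are hypothetical, made up for illustration, and are not output from `nnet`, `mlogit` or `mnlogit`.

```r
# Hypothetical coefficients (NOT estimated from ModeCanada): one intercept and
# one slope per covariate for each non-reference alternative. "air" is the
# reference alternative, so its utility is fixed at 0.
coefs <- rbind(
  car   = c(intercept = 1.0, urban = -0.5, income = -0.02),
  train = c(intercept = 0.5, urban =  0.3, income = -0.03)
)

# Individual-specific covariates only: one row per chooser
choosers <- data.frame(urban = c(0, 0, 1), income = c(45, 45, 45))

# Hypothetical helper: multinomial logit probabilities for each chooser
mnl_probs <- function(coefs, X) {
  Z <- cbind(intercept = 1, as.matrix(X[, c("urban", "income")]))
  # Utilities: 0 for the reference alternative, Z %*% beta_j for the others
  V <- cbind(air = 0, Z %*% t(coefs))
  # Softmax across alternatives (subtract each row's max for numerical stability)
  V <- V - apply(V, 1, max)
  expV <- exp(V)
  expV / rowSums(expV)
}

p <- mnl_probs(coefs, choosers)
print(round(p, 3))
```

Because the inputs are attributes of the chooser only, the result is one row of probabilities per chooser, and two choosers with the same `urban` and `income` values get identical rows.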

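Looking at the example below (not the most compact example, but the one I used while learning these functions), you can see that `mnlogit` gives several sets of probabilities.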

```r
library(data.table); library(stringr); library(nnet); library(mlogit); library(mnlogit)

# Drop bus users and the bus alternative; keep cases with four alternatives
data("ModeCanada", package = "mlogit")
bususers <- with(ModeCanada, case[choice == 1 & alt == "bus"])
ModeCanada <- subset(ModeCanada, !case %in% bususers)
ModeCanada <- subset(ModeCanada, nchoice == 4)
ModeCanada <- subset(ModeCanada, alt != "bus")
ModeCanada$alt <- ModeCanada$alt[drop = TRUE]
KoppWen00 <- mlogit.data(ModeCanada, shape='long', chid.var = 'case',
                         alt.var = 'alt', choice='choice',
                         drop.index=TRUE)

# The same subset again, built in a single step
data("ModeCanada", package = "mlogit")
busUsers <- with(ModeCanada, case[choice == 1 & alt == "bus"])
Bhat <- subset(ModeCanada, !case %in% busUsers & alt != "bus" &
                     nchoice == 4)
Bhat$alt <- Bhat$alt[drop = TRUE]
head(ModeCanada)
Mode = data.table(ModeCanada)

# Some additional editing in order to make it more similar to the typical data sets I work with:
# extract the alternative label and an id from the row names, then keep
# only the chosen rows (one row per chooser)
Bhat2 = data.table(KoppWen00)
Bhat2[,Choice:=gsub("\\.","",str_sub(row.names(KoppWen00),5,-1))][,id:=as.character(as.numeric(str_sub(row.names(Bhat),1,4)))]
Bhat2 = Bhat2[choice=="TRUE"][,c("Choice","urban","income","id"),with=F]

# nnet package
ml.nn<- multinom(Choice ~ urban + income,
                 Bhat2)
tmp = data.table(cbind(Bhat2, predict(ml.nn, type="probs", newdata=Bhat2)))
# nnet predictions
tmp[urban=="0" & income==45 & Choice=="air"][1,c("Choice", "urban", "income" , "air","car","train"),with=F]

# mlogit package
ml <- mlogit(Choice ~ 1| urban + income,shape="wide",
                Bhat2)
pml = data.table(cbind(Bhat2, predict(ml,mlogit.data(Bhat2, shape="wide", choice="Choice"))))
# mlogit predictions
unique(pml[Choice=="air" & urban=="0" & income==45 ][,c("Choice", "urban", "income" , "air","car","train"),with=F])

# mnlogit package
mln.MC <- mnlogit(Choice ~ 1| urban + income, mlogit.data(Bhat2,choice = "Choice",shape="wide"))
preddata = data.table(cbind(mlogit.data(Bhat2,choice = "Choice",shape="wide"), predict(mln.MC)))
# mnlogit predictions, returns several probabilities for each outcome
preddata[Choice==TRUE & urban=="0" & income==45 & alt == "air"]
```

PS: feel free to add the tag "mnlogit"!


1 Answer


I'll use a simpler example than yours, but the idea is the same:

```r
library(mnlogit)
data(Fish, package = "mnlogit")
# mnlogit formula parts: alternative-specific variables with generic coefficients |
# individual-specific variables | alternative-specific variables with alternative-specific coefficients
fm <- formula(mode ~ price | income | catch)
fit <- mnlogit(fm, Fish, choiceVar="alt", ncores = 2)
p <- predict(fit)
```

```
R> head(p)
             beach      boat   charter       pier
1.beach 0.09299770 0.5011740 0.3114002 0.09442818
2.beach 0.09151069 0.2749292 0.4537956 0.17976449
3.beach 0.01410359 0.4567631 0.5125571 0.01657626
4.beach 0.17065867 0.1947959 0.2643696 0.37017583
5.beach 0.02858216 0.4763721 0.4543225 0.04072325
6.beach 0.01029792 0.5572462 0.4216448 0.01081103

R> summary(apply(p,1,sum))
Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1       1       1       1       1       1 
```

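As you can see, the probabilities returned by `predict.mnlogit` are exactly what you would expect: the predicted probability that an observation belongs to a given class, i.e. `P(Y_i = y_j | X_i)` for j = 1, 2, ..., k over the k classes. As noted in the comments below, the probabilities also depend on the model, so the more complete notation is `P(Y_i = y_j | X_i, \theta)`, where `\theta` denotes the model's estimated parameters.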
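
For the multinomial logit models that `nnet::multinom`, `mlogit` and `mnlogit` all fit, this conditional probability takes the standard softmax (logit) form below, where `V_ij` stands for the fitted linear predictor (utility) of class j for observation i, built from `X_i` and the parameters `\theta`. Writing it out also shows directly why each row of predictions sums to 1.

```latex
P(Y_i = y_j \mid X_i, \theta) = \frac{\exp(V_{ij})}{\sum_{l=1}^{k} \exp(V_{il})},
\qquad j = 1, \dots, k,
\qquad\text{so}\qquad
\sum_{j=1}^{k} P(Y_i = y_j \mid X_i, \theta) = 1.
```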

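In this case, for observation 1: beach 9%, boat 50%, charter 31%, pier 9%. Whichever classification method you choose (`nnet`, `mlogit`, etc.) should give its predicted probabilities the same interpretation. Likewise, the predicted probabilities have the same interpretation for any data set.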

You can also see that, for a multinomial prediction, the probabilities over all possible classes sum to 1.
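
To make these two points concrete, the sketch below copies the six rows printed by `head(p)` into a plain matrix (a hand-typed re-creation, so it runs without `mnlogit` or the `Fish` data) and checks the row totals, the percentages for observation 1, and the most likely mode per row using base R's `rowSums()` and `max.col()`.

```r
# Hand-typed re-creation of the six rows printed by head(p) above
p <- matrix(
  c(0.09299770, 0.5011740, 0.3114002, 0.09442818,
    0.09151069, 0.2749292, 0.4537956, 0.17976449,
    0.01410359, 0.4567631, 0.5125571, 0.01657626,
    0.17065867, 0.1947959, 0.2643696, 0.37017583,
    0.02858216, 0.4763721, 0.4543225, 0.04072325,
    0.01029792, 0.5572462, 0.4216448, 0.01081103),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0(1:6, ".beach"), c("beach", "boat", "charter", "pier"))
)

# Each row is a full distribution over the four modes, so it sums to 1
# (up to the rounding of the printed values)
row_totals <- rowSums(p)

# Observation 1 expressed in percent
obs1_pct <- round(100 * p[1, ])

# Most likely mode per observation (what a hard classification would report)
predicted <- colnames(p)[max.col(p, ties.method = "first")]
```

For observation 1 this gives roughly 9% beach, 50% boat, 31% charter and 9% pier, so boat is its most likely mode.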

answered 2015-11-25T16:24:50.943