
EDIT: As Dwin pointed out in the comments, the code below does not produce an ROC curve. An ROC curve must be indexed over variation in the cut-point t, not over lambda (as I do below). I will edit the code when I get the chance (a threshold-sweep sketch appears after the plotting code below).

Below is my attempt to create an ROC curve for glmnet predicting a binary outcome. I've simulated a matrix that approximates glmnet results in the code below. As some of you know, given an n x p matrix of inputs, glmnet outputs an n x 100 matrix of predicted probabilities [$\Pr(y_i = 1)$], one column per value of lambda (up to 100). The output has fewer than 100 columns if the lambda path terminates early because further changes in lambda stop improving the fit. The simulated matrix of glmnet predicted probabilities below is a 250x69 matrix.
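
For concreteness, here is a minimal sketch (with simulated data; the variable names are my own, not from a real dataset) of how such a matrix of predicted probabilities comes out of glmnet itself:

library(glmnet)

# Simulated inputs: n = 250 observations, p = 10 predictors (hypothetical example data)
x = matrix(rnorm(250 * 10), nrow = 250)
y = rbinom(250, size = 1, prob = plogis(x[, 1] - x[, 2]))

fit = glmnet(x, y, family = "binomial")

# One column of Pr(y_i = 1) per lambda value on the fitted path
phat_path = predict(fit, newx = x, type = "response")
dim(phat_path)  # 250 x (number of lambda values actually fit, at most 100)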

First, is there an easier way to plot a glmnet ROC curve? Second, if not, does the below approach seem correct? Third, do I care about plotting (1) the probability of false/true positives OR (2) simply the observed rate of false/true positives?

set.seed(06511)

# Simulate predictions matrix
phat = as.matrix(rnorm(250,mean=0.35, sd = 0.12))
lambda_effect = as.matrix(seq(from = 1.01, to = 1.35, by = 0.005))
phat = phat %*% t(lambda_effect)


#Choose a cut-point
t = 0.5

#Define a predictions matrix
predictions = ifelse(phat >= t, 1, 0)

##Simulate y matrix
y_phat = apply(phat, 1, mean) + rnorm(250,0.05,0.10)
y_obs = ifelse(y_phat >= 0.55, 1, 0)

#percentage of 1 observations in the validation set, 
p = length(which(y_obs==1))/length(y_obs)


# Sanity check: for each observation, the number of lambda values at which it is predicted 1
apply(predictions, 1, sum)

## Count false positives for each model
## False pos ==1, correct == 0, false neg == -1
error_mat = predictions - y_obs
## Define a matrix that isolates false positives
error_mat_fp = ifelse(error_mat ==1, 1, 0)
false_pos_rate = apply(error_mat_fp, 2,  sum)/length(y_obs)

# Count true positives for each model
## True pos == 2, mistakes == 1, true neg == 0
error_mat2 = predictions + y_obs
## Isolate true positives
error_mat_tp = ifelse(error_mat2 ==2, 1, 0)
true_pos_rate = apply(error_mat_tp, 2,  sum)/length(y_obs)


## Do I care about (1) this probability OR (2) simply the observed rate?
## (1)
#probability of false-positive, 
p_fp = false_pos_rate/(1-p)
#probability of true-positive, 
p_tp = true_pos_rate/p

#plot the ROC, 
plot(p_fp, p_tp)


## (2)
plot(false_pos_rate, true_pos_rate)
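
For reference, and per the EDIT at the top, the ROC curve should come from sweeping the cut-point t at a fixed lambda rather than sweeping lambda at a fixed t. A rough sketch using the simulated phat and y_obs from above (the choice of column is arbitrary):

## Sweep the cut-point t for a single lambda (column 35 chosen arbitrarily)
phat_1 = phat[, 35]
t_grid = seq(0, 1, by = 0.01)

true_pos_rate_t  = sapply(t_grid, function(t) sum(phat_1 >= t & y_obs == 1) / sum(y_obs == 1))
false_pos_rate_t = sapply(t_grid, function(t) sum(phat_1 >= t & y_obs == 0) / sum(y_obs == 0))

plot(false_pos_rate_t, true_pos_rate_t, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")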

There's one question about this on SO, but the answer was rough and not quite right: glmnet lasso ROC charts


2 Answers


An option using ROCR to calculate the AUC and plot the ROC curve:

library(ROCR)
library(glmnet)
library(caret)

df <- data.matrix(… ) # dataframe w/ predictor variables & a response variable
                      # col1 = response var; # cols 2:10 = predictor vars

# Create training subset for model development & testing set for model performance testing
inTrain <- createDataPartition(df[, "ResponseVar"], p = .75, list = FALSE)
Train <- df[ inTrain, ]
Test <- df[ -inTrain, ]

# Run model over training dataset
lasso.model <- cv.glmnet(x = Train[,2:10], y = Train[,1], 
                         family = 'binomial', type.measure = 'auc')

# Apply model to testing dataset
lasso.prob <- predict(lasso.model, type = "response",
                      newx = Test[, 2:10], s = 'lambda.min')
pred <- prediction(as.vector(lasso.prob), Test[, "ResponseVar"])

# calculate TPR/FPR pairs for the predictions
perf <- performance(pred,"tpr","fpr")
performance(pred,"auc") # shows calculated AUC for model
plot(perf,colorize=FALSE, col="black") # plot ROC curve
lines(c(0,1),c(0,1),col = "gray", lty = 4 )

In the predict() call above, you can plug in different values of lambda (via s) to test the predictive power at each value.
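
For example, a minimal sketch (assuming the lasso.model and Test objects from the code above) that loops over a few lambda values from the fitted path and reports the test-set AUC for each:

# Evaluate test-set AUC at a few lambda values from the path (picks are arbitrary)
lambdas <- lasso.model$lambda[round(seq(1, length(lasso.model$lambda), length.out = 3))]
for (lam in lambdas) {
  prob.lam <- predict(lasso.model, type = "response", newx = Test[, 2:10], s = lam)
  pred.lam <- prediction(as.vector(prob.lam), Test[, "ResponseVar"])
  auc.lam  <- performance(pred.lam, "auc")@y.values[[1]]
  cat("lambda =", signif(lam, 3), "AUC =", round(auc.lam, 3), "\n")
}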

Answered 2014-08-04T14:29:40.747

Given predictions and labels, here is how to create a basic ROC curve:

# randomly generated data for example, binary outcome
predictions = runif(100, min=0, max=1) 
labels = as.numeric(predictions > 0.5) 
labels[1:10] = abs(labels[1:10] - 1) # flip the first 10 labels so some don't match the predictions

# source: https://blog.revolutionanalytics.com/2016/08/roc-curves-in-two-lines-of-code.html
labels_reordered = labels[order(predictions, decreasing=TRUE)]
roc_dat = data.frame(TPR = cumsum(labels_reordered) / sum(labels_reordered),
                     FPR = cumsum(!labels_reordered) / sum(!labels_reordered))

# plot the roc curve
plot(roc_dat$FPR, roc_dat$TPR)
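
As an optional follow-up (not part of the original answer), the AUC can be approximated from the same roc_dat frame with the trapezoidal rule:

# approximate AUC by the trapezoidal rule, prepending the (0, 0) corner
fpr = c(0, roc_dat$FPR)
tpr = c(0, roc_dat$TPR)
auc = sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc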

The resulting plot

Answered 2019-03-26T22:43:33.027