r - Custom AUC in R with different thresholds and binary predictions

Question

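I want to plot FPR against TPR points on an AUC plot for different thresholds.

For example, if my data frame has a true response column data$C2 (0 or 1), I want to create a vector of predicted values (0 or 1) according to whether data$C1 (a different measurement column) is above or below a specified threshold. Here is my attempt at a function using the ROCR package.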

 fun <- function (data, col1, col2){

   perfc <- NULL    #Create null vectors for prediction and performance
   perfs <- NULL
   temp <- NULL

 d <- seq(0.10,0.30,0.01)    ##Various thresholds to be tested

  for (i in length(d){

   temp <- ifelse(data[,col1] > d, 1 , 0)  ##Create predicted responses 
   pred <- prediction(temp, data[,col2])  #Predict responses over true values
   perf <- performance(pred, "tpr","fpr") #Store performance information

    predc[i] <- pred #Do this i times for every d in the sequence
    perfc[i] <- perf

   preds <- prediction.class(predc, col2)  #Combine to make prediction class
   perfs <- performance.class(preds, "tpr","fpr") #Combine to make performance class
}

  plot(perfs) #Plot TPR against FPR 
}

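Is the problem that temp is a list vector while the true labels come from a matrix? Am I applying the for loop incorrectly?

Thanks in advance!

Edit: Here is my attempt to do this manually, without an ROC package.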

for(t in seq(0.40,0.60,0.01))  #I want to do this for every t in the sequence
{
  t <- t
  TP <- 0
  FP <- 0
  p <- sum(data$C2==1, na.rm=TRUE)  #Total number of true positives
  n <- sum(data$C2==0, na.rm=TRUE)   #Total number of true negatives
  list <- data$C1 #Column to vector 
  test <- ifelse(list > t, 1, 0)  #Make prediction vector

 for(i in 1:nrow(data))
    {if(test==1 & data$C2==1)
      {TP <- TP + 1}  #Count number of correct predictions
   if(test==1 & data$C2==0) 
      {FP <- FP + 1}   #Count number of false positives
     }
  plot(x=(FP/n),y=(TP/p))    #Plot every FP,TP pair
 }

score 0 · Accepted Answer

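I hope I understand your question correctly, but I take "AUC plot" to mean the ROC curve. An ROC curve already sweeps over different thresholds to make these classification decisions, so you do not need to threshold the scores yourself before passing them to ROCR. See this Wikipedia page. I found this picture especially helpful.

If the above is correct, then all you need to do in your code is: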

pred <- prediction(data[,col1], data[,col2])  
perf <- performance(pred, "tpr","fpr")  
plot(perf)
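
As a rough illustration of the point above, the sketch below runs the same three calls on simulated data and then reads the thresholds back out of the performance object. The data frame, its column values, and the 0.40 to 0.60 range are made up for the example; the `x.values`, `y.values`, and `alpha.values` slots are where ROCR stores the FPR, the TPR, and the corresponding cutoffs.

```r
library(ROCR)

set.seed(1)
# Simulated stand-in for the asker's data: C1 is a continuous score, C2 the true 0/1 label
labels <- rbinom(200, 1, 0.5)
scores <- ifelse(labels == 1, rnorm(200, 0.6, 0.15), rnorm(200, 0.4, 0.15))
data <- data.frame(C1 = scores, C2 = labels)

# Pass the raw scores, not thresholded 0/1 predictions
pred <- prediction(data$C1, data$C2)
perf <- performance(pred, "tpr", "fpr")

fpr     <- perf@x.values[[1]]      # false positive rate at each cutoff
tpr     <- perf@y.values[[1]]      # true positive rate at each cutoff
cutoffs <- perf@alpha.values[[1]]  # the cutoffs themselves (the first one is Inf)

plot(perf)
# Highlight the operating points whose cutoff lies in an assumed range of interest
keep <- cutoffs >= 0.40 & cutoffs <= 0.60
points(fpr[keep], tpr[keep], col = "red", pch = 19)
```

Each highlighted point is the (FPR, TPR) pair you would get by classifying `C1 > cutoff` as 1 for that cutoff.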

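If you want to "add" a different curve to that plot, perhaps because you used a different classification technique (e.g. a decision tree instead of logistic regression), use plot(perf2, add=TRUE), where perf2 is created the same way as perf. See the documentation.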
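
If you do want to compute the points by hand for a chosen set of thresholds, as in the manual attempt in the question, a vectorised version might look like the sketch below. It is only a sketch: `roc_points` is a hypothetical helper name, and the scores and labels are made-up toy values. It counts false and true positives with `sum()` over whole vectors instead of an inner `for`/`if` loop, collects one row per threshold, and plots all points in a single call.

```r
# Hypothetical helper: one (threshold, FPR, TPR) row per threshold
roc_points <- function(score, label, thresholds) {
  ok <- !is.na(score) & !is.na(label)   # drop incomplete rows
  score <- score[ok]
  label <- label[ok]
  p <- sum(label == 1)                  # number of actual positives
  n <- sum(label == 0)                  # number of actual negatives
  t(sapply(thresholds, function(th) {
    predicted <- score > th             # predicted positive above the threshold
    c(threshold = th,
      fpr = sum(predicted & label == 0) / n,
      tpr = sum(predicted & label == 1) / p)
  }))
}

# Toy data, made up for illustration
score <- c(0.35, 0.42, 0.48, 0.51, 0.55, 0.58, 0.62, 0.70)
label <- c(0,    0,    1,    0,    1,    0,    1,    1)
pts <- roc_points(score, label, seq(0.40, 0.60, 0.01))

# Plot every (FPR, TPR) pair at once, with the chance diagonal for reference
plot(pts[, "fpr"], pts[, "tpr"], xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)
```

Calling `plot()` once after the loop matters: calling it inside the loop, as in the question, starts a new plot for every threshold, so only the last point remains visible.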
