r - R package caret confusionMatrix with missing categories

Question

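I am using the `confusionMatrix` function from the R package `caret` to calculate some statistics for some data I have. I have been passing my predictions and my actual values to the `table` function to get the table to use in `confusionMatrix`, like so: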

table(predicted,actual)

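However, there are multiple possible outcomes (e.g. A, B, C, D), and my predictions do not always cover all of them (e.g. only A, B, D). The output of the `table` function then omits the missing outcome and looks like this: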

    A    B    C    D
A  n1   n2   n2   n4  
B  n5   n6   n7   n8  
D  n9  n10  n11  n12
# Note how there is no corresponding row for `C`.

The `confusionMatrix` function can't handle the missing outcome and gives this error:

Error in !all.equal(nrow(data), ncol(data)) : invalid argument type

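Is there a way I can use the `table` function differently to get the missing rows filled with zeros, or use the `confusionMatrix` function differently so that it treats missing outcomes as zero?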

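Note: Since I am randomly selecting the data to test with, there are times when a category is also not represented in the actual results, not just in the predictions. I don't believe this changes the solution.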

score 27 · Accepted Answer

You can use `union` to ensure that both factors have the same levels:

library(caret)

# Sample Data
predicted <- c(1,2,1,2,1,2,1,2,3,4,3,4,6,5) # Levels 1,2,3,4,5,6
reference <- c(1,2,1,2,1,2,1,2,1,2,1,3,3,4) # Levels 1,2,3,4

u <- union(predicted, reference)
t <- table(factor(predicted, u), factor(reference, u))
confusionMatrix(t)
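
The same idea carries over to character labels such as the A to D classes in the question. Here is a minimal sketch using base R only; the vectors `predicted_lbl` and `actual_lbl` and the helper `square_table` are made-up names for illustration, not part of `caret`:

```r
# Hypothetical example data: class "C" is never predicted.
predicted_lbl <- c("A", "B", "D", "A", "B", "D", "A")
actual_lbl    <- c("A", "B", "C", "D", "B", "D", "C")

# Build a square table by giving both factors the same, sorted level set.
square_table <- function(pred, ref) {
  lv <- sort(union(pred, ref))
  table(predicted = factor(pred, levels = lv),
        actual    = factor(ref,  levels = lv))
}

tab <- square_table(predicted_lbl, actual_lbl)
print(tab)
```

The row for `C` is now present and filled with zeros, so the table is square and should be usable as input to `confusionMatrix(tab)`.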

score 5 · Accepted Answer

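First note that, besides being called with a `table` object, `confusionMatrix` can also be called as `confusionMatrix(predicted, actual)`. However, the function throws an error if `predicted` and `actual` (both regarded as `factor`s) do not have the same number of levels.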
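
A minimal sketch of that call form, assuming `caret` is installed; the helper name `align_levels` and the sample vectors are illustrative only. Both vectors are converted to factors that share one level set before they are passed on:

```r
# Hypothetical sample data; class 3 never appears among the predictions.
predicted <- c(1, 2, 4, 1, 2, 4, 1, 2)
actual    <- c(1, 2, 3, 4, 1, 2, 3, 4)

# Convert both vectors to factors with one shared, sorted level set.
align_levels <- function(pred, ref) {
  lv <- sort(union(pred, ref))
  list(pred = factor(pred, levels = lv),
       ref  = factor(ref,  levels = lv))
}

f <- align_levels(predicted, actual)

# Only call caret if it is available, so the sketch also runs without it.
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::confusionMatrix(f$pred, f$ref))
}
```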

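This (and the fact that the `caret` package spat an error at me because it doesn't get its dependencies right in the first place) is why I'd suggest creating your own function: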

# Create a confusion matrix from the given outcomes, whose rows correspond
# to the predicted and the columns to the actual classes.
createConfusionMatrix <- function(act, pred) {
  # You've mentioned that neither actual nor predicted may give a complete
  # picture of the available classes, hence:
  numClasses <- max(act, pred)
  # Sort predicted and actual as it simplifies what's next. You can make this
  # faster by storing `order(act)` in a temporary variable.
  pred <- pred[order(act)]
  act  <- act[order(act)]
  sapply(split(pred, act), tabulate, nbins=numClasses)
}

# Generate random data since you've not provided an actual example.
actual    <- sample(1:4, 1000, replace=TRUE)
predicted <- sample(c(1L,2L,4L), 1000, replace=TRUE)

print( createConfusionMatrix(actual, predicted) )

This will give you:

      1  2  3  4
[1,] 85 87 90 77
[2,] 78 78 79 95
[3,]  0  0  0  0
[4,] 89 77 82 83
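
One caveat: `split()` only creates a group for each actual class that occurs in the data, so a class missing from `act` would lose its column. A possible variant, not part of the original answer, fixes the grouping levels explicitly. The name `createSquareConfusionMatrix` is made up here, and classes are assumed to be positive integers `1..k`:

```r
createSquareConfusionMatrix <- function(act, pred) {
  numClasses <- max(act, pred)
  classes <- seq_len(numClasses)
  # Grouping by a factor with explicit levels keeps empty classes as groups.
  groups <- split(pred, factor(act, levels = classes))
  # One column of predicted-class counts per actual class.
  counts <- lapply(groups, tabulate, nbins = numClasses)
  matrix(unlist(counts), nrow = numClasses,
         dimnames = list(predicted = as.character(classes),
                         actual    = as.character(classes)))
}

# Hypothetical data: class 3 never occurs in `actual`, class 2 never in `predicted`.
actual    <- c(1L, 2L, 4L, 4L, 1L, 2L)
predicted <- c(1L, 1L, 4L, 3L, 1L, 4L)
cm <- createSquareConfusionMatrix(actual, predicted)
print(cm)
```

The result stays square even when a class is absent on either side; the empty class shows up as an all-zero row or column.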

score 0 · Accepted Answer

I had the same problem, and here is my solution:

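tab <- table(my_prediction, my_real_label)
if (nrow(tab) != ncol(tab)) {
  # Classes that occur in the real labels but were never predicted
  missings <- setdiff(colnames(tab), rownames(tab))

  # One all-zero row per missing class, labeled with that class
  missing_mat <- matrix(0, nrow = length(missings), ncol = ncol(tab),
                        dimnames = list(missings, colnames(tab)))
  tab <- rbind(as.matrix(tab), missing_mat)
  # Reorder the rows to match the column order so the labels line up
  tab <- as.table(tab[colnames(tab), , drop = FALSE])
}

my_conf <- confusionMatrix(tab)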
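
If classes can be missing on either side, padding rows alone is not enough. Here is a minimal sketch of a more general variant, assuming the input comes from `table(prediction, reference)`; the helper name `pad_to_square` and the sample labels are hypothetical:

```r
pad_to_square <- function(tab) {
  # All classes seen on either axis, in a fixed sorted order.
  classes <- sort(union(rownames(tab), colnames(tab)))
  out <- matrix(0L, nrow = length(classes), ncol = length(classes),
                dimnames = list(classes, classes))
  # Copy the observed counts into the matching cells by name.
  out[rownames(tab), colnames(tab)] <- as.vector(tab)
  as.table(out)
}

# Hypothetical labels: "C" is never predicted, "A" never occurs in the reference.
my_prediction <- c("A", "B", "D", "B", "D")
my_real_label <- c("B", "B", "C", "D", "D")
tab <- pad_to_square(table(my_prediction, my_real_label))
print(tab)
```

Because cells are filled by name, the order in which classes happen to appear does not affect the labeling.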

Cheers, Cankut
