python - How to log a confusion matrix to the azureml platform using Python

Question

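Hello Stackoverflowers,

I am working with azureml and I would like to know whether it is possible to log the confusion matrix of the xgboost model I am training, alongside the other metrics I am already logging. Here is a sample of the code I am using: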

from azureml.core.model import Model
from azureml.core import Workspace
from azureml.core.experiment import Experiment
from azureml.core.authentication import ServicePrincipalAuthentication
import json

with open('./azureml.config', 'r') as f:
    config = json.load(f)

svc_pr = ServicePrincipalAuthentication(
   tenant_id=config['tenant_id'],
   service_principal_id=config['svc_pr_id'],
   service_principal_password=config['svc_pr_password'])


ws = Workspace(workspace_name=config['workspace_name'],
                        subscription_id=config['subscription_id'],
                        resource_group=config['resource_group'],
                        auth=svc_pr)

y_pred = model.predict(dtest)

acc = metrics.accuracy_score(y_test, (y_pred>.5).astype(int))
run.log("accuracy",  acc)
f1 = metrics.f1_score(y_test, (y_pred>.5).astype(int), average='binary')
run.log("f1 score",  f1)


cmtx = metrics.confusion_matrix(y_test,(y_pred>.5).astype(int))
run.log_confusion_matrix('Confusion matrix', cmtx)

The code above raises this error:

TypeError: Object of type ndarray is not JSON serializable

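I have already tried converting the matrix to a simpler one, but another error occurred, and it occurred before I got to logging a "manual" version of it (cmtx = [[30000, 50],[40, 2000]]).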

run.log_confusion_matrix('Confusion matrix', [list([int(y) for y in x]) for x in cmtx])

AzureMLException: AzureMLException:
    Message: UserError: Resource Conflict: ArtifactId ExperimentRun/dcid.3196bf92-4952-4850-9a8a-c5103b205379/Confusion matrix already exists.
    InnerException None
    ErrorResponse 
{
    "error": {
        "message": "UserError: Resource Conflict: ArtifactId ExperimentRun/dcid.3196bf92-4952-4850-9a8a-c5103b205379/Confusion matrix already exists."
    }
}

This makes me think I am not using the run.log_confusion_matrix() command correctly. So, once again, what is the best way to log a confusion matrix to my azureml experiment?

score 4 · Accepted Answer

Thanks to my colleague, I eventually found the solution. I am therefore answering my own question, to close it and, perhaps, help someone else.

You can find the correct function at this link: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run.run?view=azure-ml-py#log-confusion-matrix-name--value--description----.

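In any case, you also have to take into account that, apparently, Azure does not work with the standard confusion matrix format returned by sklearn. It accepts only a list of lists, not a numpy array filled with numpy.int64 elements. So you also have to convert the matrix to a simpler format (for simplicity, I used a nested list comprehension in the command below):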

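# y_test, y_pred, run and params are assumed to be defined earlier in the training script
cmtx = metrics.confusion_matrix(y_test, (y_pred > .5).astype(int))
cmtx = {
    "schema_type": "confusion_matrix",
    "parameters": params,
    "data": {"class_labels": ["0", "1"],
             # cast numpy.int64 values to plain int so the payload is JSON serializable
             "matrix": [[int(y) for y in x] for x in cmtx]}
}
run.log_confusion_matrix('Confusion matrix - error rate', cmtx)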
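For anyone who wants to check the payload before sending it to Azure, here is a minimal sketch of the same conversion as a reusable helper. The function name to_confusion_matrix_payload is hypothetical (it is not part of azureml), and the dictionary only uses the "schema_type" and "data" keys shown in the snippet above; the json.dumps call is just a local check that nothing numpy-specific is left in the structure.

```python
import json


def to_confusion_matrix_payload(matrix, class_labels):
    """Build a JSON-serializable confusion-matrix dict from any 2-D iterable.

    Hypothetical helper: `matrix` can be a nested list or anything that
    iterates row by row (a numpy array does), and every cell is cast to a
    plain Python int.
    """
    rows = [[int(value) for value in row] for row in matrix]
    n = len(class_labels)
    if len(rows) != n or any(len(row) != n for row in rows):
        raise ValueError("matrix must be square and match the number of class labels")
    return {
        "schema_type": "confusion_matrix",
        "data": {
            "class_labels": [str(label) for label in class_labels],
            "matrix": rows,
        },
    }


# Same numbers as the "manual" matrix from the question.
payload = to_confusion_matrix_payload([[30000, 50], [40, 2000]], ["0", "1"])

# Local sanity check: json.dumps raises TypeError if any value is not serializable.
serialized = json.dumps(payload)
```

In the training script this would presumably be used as run.log_confusion_matrix('Confusion matrix - error rate', to_confusion_matrix_payload(cmtx, ["0", "1"])). The Resource Conflict error in the question suggests that an artifact name can only be used once per run, which may be why a name that has not been logged yet is needed after a failed attempt.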
