
Permanently deleting an experiment isn't documented anywhere. I'm using MLflow with a Postgres database as the backend store.

This is what I ran:

from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri=server)  # server: URI of the tracking server
client.delete_experiment(1)

This deletes the experiment, but when I run a new experiment with the same name as the one I just deleted, it returns this error:

mlflow.exceptions.MlflowException: Cannot set a deleted experiment 'cross-sell' as the active experiment. You can restore the experiment, or permanently delete the  experiment to create a new one.

I can't find anywhere in the documentation that shows how to permanently delete everything.


5 Answers

19

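Unfortunately, it seems there is currently no way to do this via the UI or the CLI :-/

How to do it depends on the type of backend store you are using.

File store:

If you are using the filesystem as the storage mechanism (the default), this is easy. "Deleted" experiments are moved to a .trash folder. You just need to clear it out: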

rm -rf mlruns/.trash/*

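As of the current version of the documentation (1.7.2), it says:

It is recommended to use a cron job or an alternate workflow mechanism to clear the .trash folder.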
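
For a scheduled job, the same cleanup can be sketched in Python. This is a minimal sketch, assuming the default file store layout described above; the function name `purge_trash` and its `mlruns_dir` parameter are hypothetical. Unlike the shell glob, it also removes dot-prefixed entries inside .trash.

```python
import shutil
from pathlib import Path


def purge_trash(mlruns_dir: str) -> int:
    """Remove every entry under <mlruns_dir>/.trash.

    Returns the number of top-level entries removed. `mlruns_dir` is
    assumed to be the root of an MLflow file store (e.g. "mlruns").
    """
    trash = Path(mlruns_dir) / ".trash"
    if not trash.is_dir():
        return 0  # nothing has been deleted yet, or this is not a file store

    removed = 0
    for entry in trash.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # a deleted experiment directory
        else:
            entry.unlink()  # a stray file or symlink
        removed += 1
    return removed
```

Calling `purge_trash("mlruns")` from a cron-driven script would be one way to follow the documentation's recommendation.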

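SQL database:

This is trickier, because there are dependent rows that need to be deleted first. I am using MySQL, and these commands work for me: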

USE mlflow_db;  # the name of your database
DELETE FROM experiment_tags WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM tags WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM runs WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM experiments where lifecycle_stage="deleted";
answered 2020-03-26T14:05:13.453
13

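As of mlflow 1.11.0, the recommended way to permanently delete runs within an experiment is: mlflow gc [OPTIONS].

From the documentation for mlflow gc:

Permanently delete runs in the deleted lifecycle stage from the specified backend store. This command deletes all artifacts and metadata associated with the specified runs.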
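
If the cleanup needs to be triggered from a script, the command line can be assembled and run from Python. This is a minimal sketch: the helper names `build_gc_command` and `run_gc` are hypothetical, and the `--backend-store-uri` and `--run-ids` options are taken from the mlflow gc documentation, so it is worth confirming them with `mlflow gc --help` for the installed version.

```python
import subprocess
from typing import List, Optional, Sequence


def build_gc_command(
    backend_store_uri: str, run_ids: Optional[Sequence[str]] = None
) -> List[str]:
    """Build the argument list for `mlflow gc`.

    When `run_ids` is omitted, no --run-ids option is passed and the
    command is left to act on all runs in the deleted lifecycle stage.
    """
    cmd = ["mlflow", "gc", "--backend-store-uri", backend_store_uri]
    if run_ids:
        # The CLI expects a single comma-separated list of run IDs.
        cmd += ["--run-ids", ",".join(run_ids)]
    return cmd


def run_gc(backend_store_uri: str, run_ids: Optional[Sequence[str]] = None) -> None:
    """Run `mlflow gc`; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_gc_command(backend_store_uri, run_ids), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the store URI contains credentials or special characters.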

answered 2020-08-04T12:51:21.620
9

If you are using PostgreSQL as the backend store, here are the SQL commands to permanently delete MLflow's trash.

Switch to your MLflow database, e.g. with \c mlflow, and then run:

DELETE FROM experiment_tags WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM tags WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM params WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs where experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';

The difference is that I added the SQL delete command for the "params" table.
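
Because the statements follow one pattern, they can also be generated from a list of tables, so that covering another run-level table is a one-line change. This is a minimal sketch using only the table and column names from the commands above; the names `RUN_CHILD_TABLES` and `build_purge_statements` are hypothetical, and `IN` is used instead of `=ANY` on the assumption that it is accepted by more database engines.

```python
# Tables whose rows reference runs.run_uuid (taken from the commands above).
RUN_CHILD_TABLES = ["latest_metrics", "metrics", "tags", "params"]

DELETED_EXPERIMENTS = (
    "SELECT experiment_id FROM experiments WHERE lifecycle_stage = 'deleted'"
)
DELETED_RUNS = (
    f"SELECT run_uuid FROM runs WHERE experiment_id IN ({DELETED_EXPERIMENTS})"
)


def build_purge_statements(run_child_tables=RUN_CHILD_TABLES):
    """Return DELETE statements ordered so that child rows go first."""
    statements = [
        f"DELETE FROM experiment_tags WHERE experiment_id IN ({DELETED_EXPERIMENTS})"
    ]
    # Rows that depend on runs must be removed before the runs themselves.
    statements += [
        f"DELETE FROM {table} WHERE run_uuid IN ({DELETED_RUNS})"
        for table in run_child_tables
    ]
    statements.append(
        f"DELETE FROM runs WHERE experiment_id IN ({DELETED_EXPERIMENTS})"
    )
    # The experiments table goes last, since every subquery above reads it.
    statements.append("DELETE FROM experiments WHERE lifecycle_stage = 'deleted'")
    return statements
```

Newer MLflow schemas may contain additional tables that reference runs or experiments; those would need to be appended to the list before the statements are executed.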

answered 2020-11-24T07:57:35.963
2

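Extending @Lee Netherton's answer: after deleting the experiment through the MLflow tracking client, you can use PyMySQL to execute these queries and remove all of its metadata from the MLflow tracking server.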

import pymysql

def perm_delete_exp():
    connection = pymysql.connect(
        host='localhost',
        user='user',
        password='password',
        db='mlflow',
        cursorclass=pymysql.cursors.DictCursor)
    with connection.cursor() as cursor:
        queries = """
            USE mlflow;
            DELETE FROM experiment_tags WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
            DELETE FROM latest_metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
            DELETE FROM metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
            DELETE FROM tags WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
            DELETE FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
            DELETE FROM experiments where lifecycle_stage="deleted";
        """
        for query in queries.splitlines()[1:-1]:
            cursor.execute(query.strip())
    connection.commit()
    connection.close()

You could (and probably should) execute the whole query at once, but I find it easier to debug this way.

answered 2020-11-15T14:00:54.600
1

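Unfortunately, in my case the SQL commands above did not work with SQLite. Here is a version of the SQL that works with SQLite in a database IDE, obtained by replacing the "any" operator with "in":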

DELETE FROM experiment_tags WHERE experiment_id in (
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    );
DELETE FROM latest_metrics WHERE run_uuid in (
    SELECT run_uuid FROM runs WHERE experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM metrics WHERE run_uuid in (
    SELECT run_uuid FROM runs WHERE experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM tags WHERE run_uuid in (
    SELECT run_uuid FROM runs WHERE experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM params WHERE run_uuid in (
    SELECT run_uuid FROM runs where experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id in (
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';
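
The same statements can be run from Python with the standard sqlite3 module, inside one transaction so that a failure part-way through leaves the database unchanged. This is a minimal sketch; the name `purge_deleted` is hypothetical, and it assumes the connection points at the MLflow SQLite file and that the tracking server is not writing to it at the same time.

```python
import sqlite3

_EXPS = "SELECT experiment_id FROM experiments WHERE lifecycle_stage='deleted'"
_RUNS = f"SELECT run_uuid FROM runs WHERE experiment_id IN ({_EXPS})"

# Same order as the SQL above: dependent rows first, experiments last.
PURGE_STATEMENTS = [
    f"DELETE FROM experiment_tags WHERE experiment_id IN ({_EXPS})",
    f"DELETE FROM latest_metrics WHERE run_uuid IN ({_RUNS})",
    f"DELETE FROM metrics WHERE run_uuid IN ({_RUNS})",
    f"DELETE FROM tags WHERE run_uuid IN ({_RUNS})",
    f"DELETE FROM params WHERE run_uuid IN ({_RUNS})",
    f"DELETE FROM runs WHERE experiment_id IN ({_EXPS})",
    "DELETE FROM experiments WHERE lifecycle_stage='deleted'",
]


def purge_deleted(conn: sqlite3.Connection) -> int:
    """Run all purge statements in one transaction.

    Returns the number of rows removed from the experiments table.
    """
    with conn:  # commits on success, rolls back if any statement raises
        cursor = None
        for statement in PURGE_STATEMENTS:
            cursor = conn.execute(statement)
    # The last statement is the DELETE on experiments.
    return cursor.rowcount
```

Taking a copy of the database file before running this is a cheap safeguard, since the deletion cannot be undone afterwards.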
answered 2021-07-18T18:26:34.577