
I tried the following:

# Generate data
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(100, 5), columns=['a', 'b', 'c', 'd', 'e'])
df["y"] = (df['a'] > 0.5).astype(int)
df.head()

from mleap.sklearn.ensemble.forest import RandomForestClassifier

forestModel = RandomForestClassifier()
forestModel.mlinit(input_features='a',
                   feature_names='a',
                   prediction_column='e_binary')


forestModel.fit(df[['a']], df[['y']])

forestModel.serialize_to_bundle("jar:file:/dbfs/FileStore/tables/mleaptestmodelforestpysparkzip", "randomforest.zip")

I got this error:

No such file or directory: 'jar:file:/dbfs/FileStore/tables/mleaptestmodelforestpysparkzip/randomforest.zip.node'

I also tried: forestModel.serialize_to_bundle("jar:file:/dbfs/FileStore/tables/mleaptestmodelforestpysparkzip/randomforest.zip")

and got an error message saying that the "model_name" attribute is missing.

Could you help me, please?


Here is everything I tried and the result of each attempt:

Pipeline to Zip:

1.

pipeline.serialize_to_bundle("jar:file:/dbfs/FileStore/tables/mleap/pipeline_zip/1/model.zip", model_name="forest")

=> FileNotFoundError: [Errno 2] No such file or directory: 'jar:file:/dbfs/FileStore/tables/mleap/pipeline_zip/1/model.zip/model.json'

2.

pipeline.serialize_to_bundle("jar:file:/dbfs/FileStore/tables/mleap/pipeline_zip/1/model.zip", model_name="forest", init=True)

=> FileNotFoundError: [Errno 2] No such file or directory: 'jar:file:/dbfs/FileStore/tables/mleap/pipeline_zip/1/model.zip/forest'

3.

pipeline.serialize_to_bundle("jar:file:/dbfs/FileStore/tables/mleap/pipeline_zip/1/model.zip", model_name="forest", init=True) and creating "/dbfs/FileStore/tables/mleap/pipeline_zip/1/model.zip/forest"

=> FileNotFoundError: [Errno 2] No such file or directory: 'jar:file:/dbfs/FileStore/tables/mleap/pipeline_zip/1/model.zip/forest'

4.

pipeline.serialize_to_bundle("/dbfs/FileStore/tables/mleap/pipeline_zip/1/model.zip", model_name="forest", init=True)

=> FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/mleap/pipeline_zip/1/model.zip/forest'

5.

pipeline.serialize_to_bundle("/dbfs/FileStore/tables/mleap/pipeline_zip/1/model.zip", model_name="forest", init=True)

=> OSError: [Errno 95] Operation not supported (but something does get saved)

6.

pipeline.serialize_to_bundle("jar:dbfs:/dbfs/FileStore/tables/mleap/pipeline_zip/1/model.zip", model_name="forest", init=True)

=> FileNotFoundError: [Errno 2] No such file or directory: 'jar:dbfs:/dbfs/FileStore/tables/mleap/pipeline_zip/1/model.zip/forest'

7.

pipeline.serialize_to_bundle("jar:dbfs:/FileStore/tables/lifttruck_mleap/pipeline_zip2/1/model.zip", model_name="forest", init=True)

=> FileNotFoundError: [Errno 2] No such file or directory: 'jar:dbfs:/FileStore/tables/mleap/pipeline_zip/1/model.zip/forest'

8.

pipeline.serialize_to_bundle("dbfs:/FileStore/tables/lifttruck_mleap/pipeline_zip2/1/model.zip", model_name="forest", init=True)

=> FileNotFoundError: [Errno 2] No such file or directory: 'dbfs:/FileStore/tables/mleap/pipeline_zip2/1/model.zip/forest'


Model to Zip:

1.

forest.serialize_to_bundle("jar:file:/dbfs/FileStore/tables/mleap/random_forest_zip/1/model.zip", model_name="forest")

=> FileNotFoundError: [Errno 2] No such file or directory: 'jar:file:/dbfs/FileStore/tables/mleap/random_forest_zip/1/model.zip/forest.node'

2.

forest.serialize_to_bundle("jar:file:/dbfs/FileStore/tables/mleap/random_forest_zip/1", model_name="model.zip")

=> FileNotFoundError: [Errno 2] No such file or directory: 'jar:file:/dbfs/FileStore/tables/mleap/random_forest_zip/1/model.zip.node'

3.

forest.serialize_to_bundle("/dbfs/FileStore/tables/mleap/random_forest_zip/1", model_name="model.zip")

=> Does not save a zip; it saves a bundle instead.


1 Answer


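I found the problem and a workaround.

It is no longer possible to do random writes on DBFS from Databricks, as explained here: https://docs.databricks.com/data/databricks-file-system.html?_ga=2.197884399.1151871582.1592826411-509486897.1589442523#local-file-apis

A workaround is to write the zip file to the local file system first and then copy it to DBFS. So:

  1. Serialize the model in a pipeline with "init=True" and save it in a local directory
  2. Copy it to your data lake with "dbutils.fs.cp(source, destination)"

dbutils.fs.cp(source, destination)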
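
The two steps above can be sketched as a small helper. This is only an illustration, not MLeap's or Databricks' own API: the function name serialize_then_copy, its parameters, and the default file name "model.zip" are hypothetical, and the helper takes the serialization and copy steps as callables so that it does not depend on how MLeap names its output file.

```python
import os
import shutil
import tempfile


def serialize_then_copy(serialize, copy, destination, bundle_name="model.zip"):
    """Write a bundle to local disk first, then copy the finished file.

    serialize(local_dir) -- hypothetical callable that writes `bundle_name`
                            into `local_dir` (e.g. a wrapper around
                            pipeline.serialize_to_bundle).
    copy(src, dst)       -- callable that copies the finished file
                            (e.g. a wrapper around dbutils.fs.cp).
    """
    # The local temp directory supports random writes, unlike the DBFS mount.
    local_dir = tempfile.mkdtemp()
    try:
        serialize(local_dir)
        local_file = os.path.join(local_dir, bundle_name)
        if not os.path.isfile(local_file):
            # Fail early if the serializer used a different file name.
            raise FileNotFoundError(local_file)
        # Only the finished file is sent to the destination, in one copy.
        copy(local_file, destination)
    finally:
        shutil.rmtree(local_dir, ignore_errors=True)
    return destination
```

In a Databricks notebook, serialize could be something like lambda d: pipeline.serialize_to_bundle(d, model_name="model", init=True) and copy something like lambda src, dst: dbutils.fs.cp("file:" + src, dst), with a "dbfs:/..." destination. Whether MLeap then names the archive "model.zip" is an assumption; check the local directory and adjust bundle_name if needed.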

answered 2020-06-29T09:56:32.977