scala - Op name not available for a Transformer when setting opName

Question

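I created a custom transformer (a simple model that appends a string to a column's values) to test MLeap serialization, but when writing my Op files for MLeap and Spark serialization, I have no way of getting my transformer's name.

My reference.conf file looks like this: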

my.domain.mleap.spark.ops = ["spark_side.CustomTransformerOp"]

// include the custom transformers ops we have defined to the default Spark registries
ml.combust.mleap.spark.registry.v20.ops += my.domain.mleap.spark.ops
ml.combust.mleap.spark.registry.v21.ops += my.domain.mleap.spark.ops
ml.combust.mleap.spark.registry.v22.ops += my.domain.mleap.spark.ops
ml.combust.mleap.spark.registry.v23.ops += my.domain.mleap.spark.ops

my.domain.mleap.ops = ["mleap_side.CustomTransformerOp"]

// include the custom transformers we have defined to the default MLeap registry
ml.combust.mleap.registry.default.ops += my.domain.mleap.ops

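When I run a pipeline containing only that stage on my dataset, it works fine, and I can even save the pipeline if I set opName to some string or to one of the Bundle.BuiltinOps members.

If I put in an arbitrary string, an error message pops up: "unable to find key: thatString". If I use another member instead, the error says it cannot find the key for that member (which makes perfect sense, and I understand why it happens).

My question is: how do I make my transformer's name available when declaring opName in my Op file?

(It would be great if someone could reach Hollin Wilkins :D)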

score 0 · Accepted Answer

I had the same problem. According to this link:

https://github.com/combust/mleap/wiki/Adding-an-MLeap-Spark-Transformer

you need to add it to ml.combust.bundle.dsl.Bundle.BuiltinOps yourself.

From section 3, "Implement Bundle.ML Serialization for MLeap":

NOTE: if implementing a vanilla Spark transformer, make sure to add the opName to ml.combust.bundle.dsl.Bundle.BuiltinOps.
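To illustrate why an unregistered name produces a "key not found" style error, here is a minimal, self-contained sketch. It does not use MLeap at all: `ToyOp`, `ToyRegistry`, and `lookup` are hypothetical stand-ins, and the assumption is only that a registry resolves an op through a name-keyed map, so the name stored in the bundle has to match a name some registered op declares.

```scala
import scala.util.Try

// Hypothetical stand-in for a serialization op; NOT an MLeap class.
final case class ToyOp(opName: String)

// Hypothetical registry that indexes ops by their declared name.
final class ToyRegistry(ops: Seq[ToyOp]) {
  private val byName: Map[String, ToyOp] =
    ops.map(op => op.opName -> op).toMap

  // Map#apply throws NoSuchElementException("key not found: <name>")
  // when no op was registered under that name.
  def lookup(name: String): ToyOp = byName(name)
}

object OpNameDemo {
  // The name written into the saved pipeline (the arbitrary string from the question).
  val savedOpName = "thatString"

  // Registry that only knows a different name: the lookup fails.
  val withoutCustomOp = new ToyRegistry(Seq(ToyOp("string_map")))

  // Registry that also has an op declaring the same name: the lookup succeeds.
  val withCustomOp = new ToyRegistry(Seq(ToyOp("string_map"), ToyOp("thatString")))

  def main(args: Array[String]): Unit = {
    println(Try(withoutCustomOp.lookup(savedOpName)).isSuccess) // prints "false"
    println(Try(withCustomOp.lookup(savedOpName)).isSuccess)    // prints "true"
  }
}
```

Under that assumption, the takeaway is that the op name has to be declared consistently wherever the pipeline is written and read, and that every op registered in reference.conf must report the same name that ends up in the bundle.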
