Here is the sample DataFrame (df) I'm working with:
+---+----+--------+
| id|orig|scrubbed|
+---+----+--------+
|  1|   a|       a|
|  2|   B|       b|
|  3|   c|       c|
|  4|   D|       d|
|  5|   *|      XX|
|  6|   $|      XX|
|  7|  ZZ|      ZZ|
|  8|  XX|      XX|
|  9|   y|       y|
| 10|   Z|       z|
+---+----+--------+
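I want to run a check that tells me whether the proportion of items that are "populated" after scrubbing (i.e. not "XX" or "ZZ") is at least 80%. (This check should fail.) I can add a Compliance analyzer to the VerificationRunBuilder to compute the metric, like this: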
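import com.amazon.deequ.{VerificationResult, VerificationRunBuilder}
import com.amazon.deequ.analyzers.Compliance
import com.amazon.deequ.checks.{Check, CheckLevel}

val myVerificationResult: VerificationResult = new VerificationRunBuilder(df).
  // Metric: rows still populated after scrubbing, restricted to rows whose orig was populated
  addRequiredAnalyzer(
    Compliance(
      "populatedAfterScrubbing",
      "`scrubbed` NOT IN ('ZZ', 'XX') AND `scrubbed` IS NOT NULL",
      Some("`orig` NOT IN ('ZZ', 'XX') AND `orig` IS NOT NULL")
    )
  ).
  // The only constraint so far: the data must contain at least one row
  addCheck(
    Check(CheckLevel.Error, "Review Check").
      hasSize(_ >= 1)
  ).
  run()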
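This code runs and successfully checks the data using the hasSize constraint, but I can't figure out how to add a constraint based on my custom Compliance analyzer. Is this possible?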
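To make the expected outcome concrete, here is a minimal plain-Scala sketch (no Spark or Deequ involved) that recomputes the ratio by hand from the ten sample rows. The names `Row`, `ExpectedCompliance`, and `isPopulated` are hypothetical and exist only for this illustration. It assumes the analyzer's `where` clause first restricts the data to rows whose `orig` is populated and then takes the share of those rows whose `scrubbed` is populated; under that assumption 6 of the 8 eligible rows qualify, which is why I expect an 80% threshold to fail.

```scala
// Hypothetical stand-in for one row of the sample DataFrame.
final case class Row(id: Int, orig: String, scrubbed: String)

object ExpectedCompliance {
  // The ten sample rows from the table above.
  val rows: Seq[Row] = Seq(
    Row(1, "a", "a"), Row(2, "B", "b"), Row(3, "c", "c"), Row(4, "D", "d"),
    Row(5, "*", "XX"), Row(6, "$", "XX"), Row(7, "ZZ", "ZZ"), Row(8, "XX", "XX"),
    Row(9, "y", "y"), Row(10, "Z", "z")
  )

  // Values that count as "not populated".
  private val placeholders = Set("XX", "ZZ")

  // Mirrors the SQL predicate: NOT IN ('ZZ', 'XX') AND IS NOT NULL.
  def isPopulated(value: String): Boolean =
    value != null && !placeholders.contains(value)

  // Assumed effect of the `where` clause: keep rows whose orig is populated.
  val eligible: Seq[Row] = rows.filter(r => isPopulated(r.orig))

  // Share of eligible rows whose scrubbed value is still populated.
  val ratio: Double =
    eligible.count(r => isPopulated(r.scrubbed)).toDouble / eligible.size

  def main(args: Array[String]): Unit =
    println(s"populatedAfterScrubbing = $ratio, meets 0.8 threshold: ${ratio >= 0.8}")
}
```

What I am looking for is a way to put an assertion like `_ >= 0.8` on that same metric inside the Check.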