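I want to add an additional target ("outputState") to my PMML regression model.
- outputState = 0: no missing/invalid input values (-> no imputation in the regression model)
- outputState = 1: missing/invalid input values are present (-> imputation in the regression model)
I tried using multiple models, but I don't know how to handle multiple models/targets/outputs correctly.
Example (explained below):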
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.3">
  <Header>
    <Application name="JPMML-R" version="1.3.14"/>
    <Timestamp>2020-01-07T15:56:07Z</Timestamp>
  </Header>
  <DataDictionary>
    <DataField name="outputState" optype="categorical" dataType="integer"/>
    <DataField name="outputResult" optype="continuous" dataType="double"/>
    <DataField name="inputA" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="-1" rightMargin="1"/>
      <Value property="missing" value="NA"/>
    </DataField>
    <DataField name="inputB" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="-1" rightMargin="1"/>
      <Value property="missing" value="NA"/>
    </DataField>
    <DataField name="inputC" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="-1" rightMargin="1"/>
      <Value property="missing" value="NA"/>
    </DataField>
  </DataDictionary>
  <TransformationDictionary/>
  <MiningModel functionName="mixed">
    <MiningSchema>
      <MiningField name="outputState" usageType="target"/>
      <MiningField name="outputResult" usageType="target"/>
      <MiningField name="inputA"/>
      <MiningField name="inputB"/>
      <MiningField name="inputC"/>
    </MiningSchema>
    <Output>
      <OutputField name="outputState" optype="categorical" dataType="integer" targetField="outputState"/>
      <OutputField name="outputResult" optype="continuous" dataType="double" targetField="outputResult"/>
    </Output>
    <Segmentation multipleModelMethod="selectAll">
      <Segment id="1">
        <True/>
        <TreeModel modelName="TEST" functionName="classification" noTrueChildStrategy="returnLastPrediction">
          <MiningSchema>
            <MiningField name="outputState" usageType="target"/>
            <MiningField name="inputA" invalidValueTreatment="asMissing"/>
            <MiningField name="inputB" invalidValueTreatment="asMissing"/>
            <MiningField name="inputC" invalidValueTreatment="asMissing"/>
          </MiningSchema>
          <Node score="0">
            <True/>
            <Node score="1">
              <CompoundPredicate booleanOperator="or">
                <SimplePredicate field="inputA" operator="isMissing"/>
                <SimplePredicate field="inputB" operator="isMissing"/>
                <SimplePredicate field="inputC" operator="isMissing"/>
              </CompoundPredicate>
            </Node>
          </Node>
        </TreeModel>
      </Segment>
      <Segment id="2">
        <True/>
        <RegressionModel functionName="regression">
          <MiningSchema>
            <MiningField name="outputResult" usageType="target"/>
            <MiningField name="inputA" missingValueReplacement="0" missingValueTreatment="asMean" invalidValueTreatment="asMissing"/>
            <MiningField name="inputB" missingValueReplacement="0" missingValueTreatment="asMean" invalidValueTreatment="asMissing"/>
            <MiningField name="inputC" missingValueReplacement="0" missingValueTreatment="asMean" invalidValueTreatment="asMissing"/>
          </MiningSchema>
          <RegressionTable intercept="2">
            <NumericPredictor name="inputA" coefficient="1"/>
            <NumericPredictor name="inputB" coefficient="2"/>
            <NumericPredictor name="inputC" coefficient="3"/>
          </RegressionTable>
        </RegressionModel>
      </Segment>
    </Segmentation>
  </MiningModel>
</PMML>
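Explanation:
- DataDictionary: each input declares its valid range through an Interval (leftMargin/rightMargin)
- MiningModel (functionName="mixed" seems wrong? Segmentation multipleModelMethod="selectAll" also wrong?):
- Output definition (this also seems wrong, perhaps because the targets differ?)
- Simple classification tree model (detects missing/imputed values) -> target: outputState
- Simple regression model -> target: outputResult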
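
To make the intended behaviour concrete, here is a minimal Python sketch of the two outputs I expect for a given record. It is only a reference for the expected results, not PMML evaluator code. The function names `is_missing_or_invalid` and `score` are made up for illustration, and the sketch assumes that "invalid" means outside the declared [-1, 1] interval and that missing values arrive as `None`, NaN, or the string "NA".

```python
import math

# Hypothetical reference for the intended behaviour; not part of any PMML library.
VALID_MIN, VALID_MAX = -1.0, 1.0  # from the Interval elements in the DataDictionary
REPLACEMENT = 0.0                 # from missingValueReplacement="0"
INTERCEPT = 2.0                   # from RegressionTable intercept="2"
COEFFICIENTS = {"inputA": 1.0, "inputB": 2.0, "inputC": 3.0}


def is_missing_or_invalid(value):
    """Treat None, "NA", NaN, and values outside [-1, 1] as missing."""
    if value is None or value == "NA":
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return not (VALID_MIN <= value <= VALID_MAX)


def score(row):
    """Return (outputState, outputResult) for a dict of raw input values."""
    output_state = 0
    output_result = INTERCEPT
    for name, coefficient in COEFFICIENTS.items():
        value = row.get(name)
        if is_missing_or_invalid(value):
            output_state = 1     # at least one input had to be imputed
            value = REPLACEMENT  # impute with the replacement value
        output_result += coefficient * value
    return output_state, output_result
```

For example, `score({"inputA": 0.5, "inputB": "NA", "inputC": 5.0})` imputes both inputB (missing) and inputC (out of range), so outputState is 1 and outputResult is computed from the intercept and inputA only.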
Does anyone have ideas or better suggestions?