c - MLP training: how to handle unknown feature values

Translated from: https://stackoverflow.com/questions/24157317 2014-06-11T07:44:36.580

Viewed 197 times

1

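Suppose we have an MLP that we want to train with a set of feature vectors, some of which contain unknown values. How should I handle this? Can an MLP cope with it at all?

Suppose the training vectors are: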

(1.0, 3.4, unknown, 2.0), (3.1, unknown, 1.2, 0.1), (2.1,3.4,1.2,4.5), ...

I am using FANN.

1 Answer

0

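Missing data

You are describing the missing data problem (Little and Rubin 1987). A neural network classifier cannot handle this on its own. You should preprocess your data and fill in the missing values, either with a simple statistical estimate based on the variable values of the known instances (1) or with a more advanced algorithm (2).

(1) Example: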

instance1 = 0, 0, 1, 0, 1
instance2 = 0, 0, 1, 0, 1
instance3 = 1, 1, 1, 0, 0
instanceX = 1, 1, 1, 0, ?

# The statistical approach
We can see that instanceX shares most of its features with instance3,
so we set the unknown variable according to instance3's defined value: 0
# The mean
We could calculate the dataset's mean value for this variable
(2/3 over instance1 to instance3) and use it rounded to the nearest valid value: 1
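The two strategies above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the original answer: unknown values are represented as `None`, similarity is measured by squared distance over the features the incomplete row does have, and the names `MISSING`, `impute_mean`, and `impute_nearest` are hypothetical. The filled-in vectors would then be written out as ordinary training data for FANN.

```python
from statistics import mean

MISSING = None  # assumed marker for an unknown feature value


def impute_mean(rows):
    """Replace each missing value with the mean of the known values in its column.

    Assumes every column has at least one known value.
    """
    n_cols = len(rows[0])
    col_means = [
        mean(r[j] for r in rows if r[j] is not MISSING) for j in range(n_cols)
    ]
    return [
        [col_means[j] if v is MISSING else v for j, v in enumerate(r)]
        for r in rows
    ]


def impute_nearest(rows):
    """Copy each missing value from the most similar complete row.

    Similarity is the squared distance over the features that are known
    in the incomplete row. Assumes at least one complete row exists.
    """
    complete = [r for r in rows if MISSING not in r]
    result = []
    for r in rows:
        known = [j for j, v in enumerate(r) if v is not MISSING]
        nearest = min(
            complete, key=lambda c: sum((c[j] - r[j]) ** 2 for j in known)
        )
        result.append(
            [nearest[j] if v is MISSING else v for j, v in enumerate(r)]
        )
    return result


rows = [
    [0, 0, 1, 0, 1],        # instance1
    [0, 0, 1, 0, 1],        # instance2
    [1, 1, 1, 0, 0],        # instance3
    [1, 1, 1, 0, MISSING],  # instanceX
]
print(impute_nearest(rows)[3])  # instanceX completed from its closest row
print(impute_mean(rows)[3])     # instanceX completed with the column mean
```

For a binary variable the column mean would still need rounding to 0 or 1, which is why the two strategies can disagree on this example.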

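(2) EM algorithm

This is a more sophisticated algorithm for finding approximate estimates of the missing data. Read an introduction to the algorithm here (link in the original answer).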
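To give a rough idea of how EM fills in missing values, here is a deliberately small sketch. It is not the general algorithm and not taken from the original answer: it assumes only two features that are jointly Gaussian, where `x` is always known and `y` is sometimes `None`. The function name `em_impute_bivariate` and the fixed iteration count are hypothetical choices for illustration.

```python
def em_impute_bivariate(xs, ys, iterations=200):
    """EM sketch for a bivariate Gaussian with x fully known and y partly missing.

    Returns a copy of ys where each None is replaced by the expected value
    of y given x under the fitted model. Assumes at least one known y and
    that the xs are not all identical.
    """
    n = len(xs)
    observed = [i for i, y in enumerate(ys) if y is not None]
    m = len(observed)

    # x is fully observed, so its mean and variance never change.
    mu_x = sum(xs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n

    # Initialise the y statistics from the complete cases only.
    mu_y = sum(ys[i] for i in observed) / m
    mu_x_obs = sum(xs[i] for i in observed) / m
    s_yy = sum((ys[i] - mu_y) ** 2 for i in observed) / m
    s_xy = sum((xs[i] - mu_x_obs) * (ys[i] - mu_y) for i in observed) / m

    filled = list(ys)
    for _ in range(iterations):
        beta = s_xy / s_xx               # regression slope of y on x
        resid_var = s_yy - beta * s_xy   # variance of y given x

        # E-step: expected sufficient statistics for every row.
        sum_y = sum_yy = sum_xy = 0.0
        for i in range(n):
            if ys[i] is None:
                ey = mu_y + beta * (xs[i] - mu_x)
                eyy = ey * ey + resid_var
                filled[i] = ey
            else:
                ey = ys[i]
                eyy = ey * ey
            sum_y += ey
            sum_yy += eyy
            sum_xy += xs[i] * ey

        # M-step: re-estimate the parameters from the expected statistics.
        mu_y = sum_y / n
        s_yy = sum_yy / n - mu_y ** 2
        s_xy = sum_xy / n - mu_x * mu_y
    return filled


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, None]
print(em_impute_bivariate(xs, ys))
```

With more than two features and arbitrary missing patterns, the same idea applies, but the E-step needs the full conditional Gaussian for each row, which requires matrix operations that are omitted here.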

Answered 2014-06-20T07:32:23.460