python - Undo the transform after OneHotEncoder

Question

I'm using sklearn's OneHotEncoder, but I want to undo the transform on my data. Any idea how to do that?

>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])  
>>> enc.n_values_
array([2, 3, 4])
>>> enc.feature_indices_
array([0, 2, 5, 9])
>>> enc.transform([[0, 1, 1]]).toarray()
array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]])

But I'd like to be able to do the following:

>>> enc.untransform(array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]]))
[[0, 1, 1]]

How can I do this?

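For context: I built a neural network that learns in the one-hot encoded space, and now I want to use the network to make real predictions, which need to be in the original data format.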

score 1 · Accepted Answer

To invert a single one-hot encoded column,
see: https://stackoverflow.com/a/39686443/7671913

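from sklearn.preprocessing import OneHotEncoder
import numpy as np

# Note: active_features_ only exists in scikit-learn < 0.22 (deprecated in 0.20).
orig = np.array([6, 9, 8, 2, 5, 4, 5, 3, 3, 6])

ohe = OneHotEncoder()
encoded = ohe.fit_transform(orig.reshape(-1, 1))  # input needs to be column-wise

# With a single input column, active_features_ lists the value behind each output
# column, so the dot product of each one-hot row with it yields that row's value.
decoded = encoded.dot(ohe.active_features_).astype(int)
assert np.allclose(orig, decoded)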
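On scikit-learn 0.20 or later the active_features_ approach no longer applies (the attribute was deprecated in 0.20 and removed in 0.22), but OneHotEncoder gained an inverse_transform method that does what the question asks for. A minimal sketch, assuming a scikit-learn version of at least 0.20 and reusing the data from the question; categories='auto' is passed explicitly so that 0.20/0.21 do not fall back to the legacy integer handling.

```python
# Assumes scikit-learn >= 0.20, where OneHotEncoder.inverse_transform exists.
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])

enc = OneHotEncoder(categories='auto')  # categories learned from the data
enc.fit(X)

encoded = enc.transform([[0, 1, 1]]).toarray()
# enc.categories_ lists, per feature, the value behind each column of its block.
decoded = enc.inverse_transform(encoded)
print(decoded)  # [[0 1 1]]

# Round trip on the whole training set.
restored = enc.inverse_transform(enc.transform(X))
```

Because inverse_transform works from categories_ rather than from raw column offsets, it also handles string categories and features whose values are not 0..n-1.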

To invert an array of one-hot encoded items, see (as mentioned in the comments):
How to reverse sklearn.OneHotEncoder transform to recover original data?

Given an sklearn.OneHotEncoder instance named ohc, the encoded data out (a scipy.sparse.csr_matrix returned by ohc.fit_transform or ohc.transform), and the shape of the original data (n_samples, n_features), recover the original data X with:

# Requires scikit-learn < 0.22 (active_features_ and feature_indices_ were removed).
recovered_X = (np.array([ohc.active_features_[col] for col in out.sorted_indices().indices])
               .reshape(n_samples, n_features) - ohc.feature_indices_[:-1])
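The formula can be checked without an old scikit-learn install by hard-coding the attributes from the question's example. This is a sketch under stated assumptions: feature_indices is the question's feature_indices_ ([0, 2, 5, 9]), and active_features is assumed to be arange(9) because every value of every feature appears in the question's fit data, so no output columns were dropped and the columns line up with feature_indices. The helper decode_blocks is hypothetical: it takes the argmax inside each feature's column block instead of relying on exact 0/1 entries, which is one way to map soft neural-network outputs back to category indices.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Values from the question: 3 features with 2, 3 and 4 categories.
# Assumption: all 9 columns were active during fit, so active_features_ == arange(9).
feature_indices = np.array([0, 2, 5, 9])
active_features = np.arange(9)

# One-hot row for the sample [0, 1, 1], as printed in the question.
out = csr_matrix(np.array([[1., 0., 0., 1., 0., 0., 1., 0., 0.]]))
n_samples, n_features = 1, 3

# Same formula as above: map each stored column to its global index, reshape to
# one index per (sample, feature), then subtract each feature's block offset.
recovered_X = (active_features[out.sorted_indices().indices]
               .reshape(n_samples, n_features) - feature_indices[:-1])
print(recovered_X)  # [[0 1 1]]


def decode_blocks(scores, feature_indices):
    """Hypothetical helper: argmax within each feature's block of columns."""
    scores = np.asarray(scores)
    return np.stack(
        [scores[:, start:stop].argmax(axis=1)
         for start, stop in zip(feature_indices[:-1], feature_indices[1:])],
        axis=1,
    )


# Hypothetical network output: soft scores rather than exact 0/1 values.
nn_output = np.array([[0.9, 0.1, 0.2, 0.7, 0.1, 0.05, 0.8, 0.1, 0.05]])
print(decode_blocks(nn_output, feature_indices))  # [[0 1 1]]
```

The sparse formula assumes exactly one nonzero entry per feature in every row; the argmax variant drops that requirement, which matters when the encoded rows come from a model rather than from transform.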
