parallel-processing - Difference between feature selection, feature extraction, and feature weights

Question

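I am a little confused about what "feature selection / extractor / weights" mean and how they differ. When I read the literature I sometimes feel lost, because the terms are used quite loosely. My main concerns are:

When people talk about feature frequency or feature presence, is that feature selection?
When people talk about algorithms such as information gain or maximum entropy, is that still feature selection?
If I train a classifier with a feature set that asks the classifier to note the position of a word within a document, for example, would people still call that feature selection?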

Thanks, Rahul Dighe

score 18 · Accepted Answer

Rahul-

All of these are good answers. The one thing I would mention is that the fundamental difference between selection and extraction has to do with how you are treating the data.

Feature extraction methods are transformative: you apply a transformation to your data to project it into a new feature space of lower dimension. PCA and SVD are examples of this.
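To make the "transformation" idea concrete, here is a minimal sketch of extraction in plain Python. It assumes a tiny made-up 2-D dataset and finds only the first principal component by power iteration; the function names (`first_principal_component`, `extract`) are hypothetical, and a real project would more likely use a linear algebra library.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return column means and the unit direction of greatest variance."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    v = [1.0] * dim  # arbitrary non-zero start vector (an assumption of this sketch)
    for _ in range(iterations):
        # Power iteration: multiply by the covariance matrix, then renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def extract(rows):
    """Project each D-dimensional row onto one new feature (d = 1)."""
    means, v = first_principal_component(rows)
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]

# Made-up data lying roughly along the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
projected = extract(data)  # one number per row instead of two
```

Note that the new feature is a mixture of both original columns; neither original column survives as-is, which is what separates extraction from selection.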

Feature selection methods choose features from the original set based on some criterion. Information gain, correlation, and mutual information are just criteria used to filter out unimportant or redundant features. Embedded or wrapper methods, as they are called, can use specialized classifiers to achieve feature selection and classify the dataset at the same time.
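As a rough illustration of a filter criterion, the sketch below ranks binary features by information gain and keeps the best ones. The feature names, labels, and the helper `select_top_k` are invented for the example; only the entropy and information gain formulas are standard.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(columns,
                    key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]

# Hypothetical toy data: one list of 0/1 values per feature, one label per document.
labels = ["spam", "spam", "ham", "ham"]
columns = {
    "contains_offer": [1, 1, 0, 0],  # perfectly predicts the label
    "contains_hello": [1, 0, 1, 0],  # independent of the label
    "contains_free":  [1, 1, 1, 0],  # partly informative
}
selected = select_top_k(columns, labels, k=2)
```

The selected features are original columns, untouched; the criterion only decides which ones to keep.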

A really nice overview of the problem space is given here.

Good Luck!

score 8 · Accepted Answer

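Feature extraction: reduces dimensionality by projecting (linearly or non-linearly) a D-dimensional vector onto a d-dimensional vector (d < D). Example: principal component analysis.

Feature selection: reduces dimensionality by selecting a subset of the original variables. Example: forward or backward feature selection.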
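For forward selection specifically, a minimal sketch might look like the following. It assumes leave-one-out accuracy of a 1-nearest-neighbour classifier as the score, which is just one convenient choice; the data and the function names `loo_accuracy` and `forward_selection` are made up for illustration.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the chosen columns."""
    correct = 0
    for i, row in enumerate(rows):
        # Nearest other row, measured only on the selected columns.
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[f] - rows[j][f]) ** 2 for f in features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)

def forward_selection(rows, labels, max_features):
    """Greedily add the column that most improves the score; stop when nothing helps."""
    selected, best_score = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(rows, labels, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score

# Made-up data: only column 1 separates the two classes.
rows = [[5.0, 0.1, 3.0], [1.0, 0.2, 9.0], [4.0, 0.9, 2.0], [2.0, 1.0, 8.0]]
labels = ["a", "a", "b", "b"]
selected, score = forward_selection(rows, labels, max_features=2)
```

Backward selection would run the same loop in reverse, starting from all columns and dropping the one whose removal hurts the score least.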

score 6 · Accepted Answer

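Feature selection is the process of choosing "interesting" features from your set for further processing.

Feature frequency is just that: the frequency with which a feature appears.

Information gain, maximum entropy, and the like are weighting methods that use feature frequency, which in turn lets you perform feature selection.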

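Think of it like this:

You parse a corpus and create a term/document matrix. This matrix starts out as a count of the terms and which documents they appear in (simple frequency).

To make that matrix more meaningful, you weight the terms using some function that includes frequency (such as term frequency-inverse document frequency, information gain, or maximum entropy). Now the matrix contains the weights, or importance, of each term relative to the other terms in the matrix.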
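The two steps so far (counting, then weighting) can be sketched roughly as below. The three documents are invented, whitespace tokenization is a simplifying assumption, and the weighting shown is one common tf-idf variant (raw count times log of N over document frequency) rather than the only definition.

```python
import math
from collections import Counter

# Hypothetical three-document corpus.
docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog sat on the log",
    "d3": "the cat chased the dog",
}

# Step 1: raw counts, one row of the term/document matrix per document.
counts = {name: Counter(text.split()) for name, text in docs.items()}

# Step 2: document frequency = number of documents containing each term.
doc_freq = Counter(term for row in counts.values() for term in row)

def tf_idf(term, doc_name):
    """Raw term count times log(N / document frequency)."""
    idf = math.log(len(docs) / doc_freq[term])
    return counts[doc_name][term] * idf

# The weighted matrix: same shape as `counts`, but holding weights.
weights = {name: {term: tf_idf(term, name) for term in row}
           for name, row in counts.items()}
```

Here "the" appears in every document, so its weight drops to zero even though its raw count is the highest, while a term like "mat" that appears in a single document gets the largest weight.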

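Once you have that, you can use feature selection to keep only the most important terms (if you are doing something like classification or categorization) and perform further analysis.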
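A minimal sketch of that last step, assuming the weighted matrix is stored as a dict of dicts and that "most important" means highest total weight across documents (other rankings are equally plausible). The weights are rounded, made-up numbers and `keep_top_terms` is a hypothetical helper.

```python
def keep_top_terms(weights, k):
    """Rank terms by total weight across documents and keep only the top k."""
    totals = {}
    for row in weights.values():
        for term, w in row.items():
            totals[term] = totals.get(term, 0.0) + w
    kept = set(sorted(totals, key=totals.get, reverse=True)[:k])
    # Drop every column (term) that did not make the cut.
    return {doc: {t: w for t, w in row.items() if t in kept}
            for doc, row in weights.items()}

# Hypothetical weighted term/document matrix.
weights = {
    "d1": {"the": 0.0, "cat": 0.4, "mat": 1.1},
    "d2": {"the": 0.0, "dog": 0.4, "log": 1.1},
    "d3": {"the": 0.0, "cat": 0.4, "chased": 1.1},
}
reduced = keep_top_terms(weights, k=3)
```

The reduced matrix is what would be handed to the classifier or to any further analysis.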
