What is the correct way to normalize feature vectors for use in a linear-kernel SVM?

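Looking at LIBSVM, it appears this is done simply by rescaling each feature to a standard upper/lower bound. PyML, however, does not seem to provide a way to scale the data like this. Instead, there are options to normalize each vector by its length, to shift each feature value by its mean while rescaling by its standard deviation, and so on.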

I am dealing with a case where most of the features are binary, except for a few that are numeric.

1 Answer

I am not an expert in this, but I believe a typical way to normalize features for use with SVMs is to center and scale each feature: subtract its mean, then divide by its standard deviation. In R, this can be done with the scale function.
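As a rough sketch of the centering and scaling described above, here it is in plain Python using only the standard library. The function name `standardize_columns` and the sample data are made up for illustration. It uses the sample standard deviation (n - 1 denominator), which is also what R's scale uses by default.

```python
import statistics

def standardize_columns(rows):
    """Center each feature (column) to mean 0 and scale it to unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Sample standard deviation (n - 1 denominator).
    stds = [statistics.stdev(col) for col in columns]
    return [
        # A constant feature has zero spread; map it to 0.0 instead of dividing by zero.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical data: three samples, two numeric features on different scales.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = standardize_columns(data)
```

After scaling, both columns hold the same values even though the raw features differed by a factor of ten. When training a model, the means and standard deviations should be computed on the training set and reused for the test set.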

Another way is to rescale each feature to the [0,1] range:

(x - min(x)) / (max(x) - min(x))
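The formula above could look like this in Python; this is a minimal sketch and `min_max_scale` is just an illustrative name. Note that a feature that already takes only the values 0 and 1 comes out unchanged, which is relevant when most features are binary.

```python
def min_max_scale(values):
    """Rescale one feature to the [0, 1] range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

# Hypothetical numeric feature.
ages = [18, 30, 42, 66]
scaled_ages = min_max_scale(ages)

# A binary feature is left as it was.
flags = [0, 1, 1, 0]
scaled_flags = min_max_scale(flags)
```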

Some features might benefit from a log transformation if their distribution is very skewed, but this changes the shape of the distribution rather than only shifting and rescaling it.
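To illustrate the point about changing the shape, here is a small sketch of a log transformation. It assumes the feature is non-negative and uses log(1 + x) so that zeros are allowed; that choice, the name `log_transform`, and the sample values are assumptions for the example, not something a particular library prescribes.

```python
import math

def log_transform(values):
    """Apply log(1 + x) to a non-negative feature to compress a long right tail."""
    if any(x < 0 for x in values):
        raise ValueError("log1p transform here assumes non-negative values")
    return [math.log1p(x) for x in values]

# Hypothetical right-skewed feature spanning several orders of magnitude.
counts = [0, 9, 99, 999]
logged = log_transform(counts)

# The gaps between neighbouring raw values grow quickly (9, 90, 900),
# while the gaps between the transformed values stay roughly equal.
raw_gaps = [b - a for a, b in zip(counts, counts[1:])]
log_gaps = [b - a for a, b in zip(logged, logged[1:])]
```

The transformed feature can still be centered and scaled, or mapped to [0,1], afterwards.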

I am not sure what you gain in an SVM setting by normalizing each vector by its L1 or L2 norm, as PyML does with its normalize method. I guess binary features (0 or 1) don't need to be normalized.
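For comparison, this is roughly what normalizing by the L2 norm means. It works on each sample (row) rather than on each feature (column). The sketch below is plain Python and does not use PyML; `l2_normalize` and `dot` are illustrative names. One known effect is that, for unit-length vectors, the linear kernel (dot product) equals the cosine of the angle between the two samples.

```python
import math

def l2_normalize(row):
    """Divide one sample vector by its Euclidean length so that it has norm 1."""
    norm = math.sqrt(sum(x * x for x in row))
    if norm == 0:
        # An all-zero vector has no direction; return it unchanged.
        return list(row)
    return [x / norm for x in row]

def dot(u, v):
    """Linear kernel: plain dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical samples pointing in the same direction but with different lengths.
a = l2_normalize([3.0, 4.0])
b = l2_normalize([6.0, 8.0])
similarity = dot(a, b)
```

Here `a` and `b` end up as the same unit vector, so their dot product is 1 (up to floating-point rounding) regardless of the original lengths.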

answered 2011-08-22T10:02:59.983