What is the correct way to normalize feature vectors for use with a linear-kernel SVM?
Looking at LIBSVM, it appears this is done simply by rescaling each feature to a standard upper/lower bound. However, PyML doesn't seem to provide a way to scale the data this way. Instead, it offers options to normalize each vector by its length, to shift each feature value by its mean while rescaling by the standard deviation, and so on.
I am dealing with a case where most of the features are binary, except for a few that are numeric.
I am not an expert in this, but I believe centering and scaling each feature by subtracting its mean and then dividing by its standard deviation is a typical way to normalize feature vectors for use with SVMs. In R, this can be done with the scale function.
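A minimal sketch of that centering-and-scaling step in Python with NumPy (the toy matrix is made up for illustration; this mirrors what R's scale does with its defaults):

```python
import numpy as np

# Hypothetical toy data: rows are samples, columns are features.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Center each feature (column) to zero mean, then rescale to unit
# variance -- analogous to scale(X) in R with default arguments.
mu = X.mean(axis=0)
sigma = X.std(axis=0, ddof=1)  # sample standard deviation, as R uses
X_scaled = (X - mu) / sigma
```

Note that the statistics are computed per column (per feature), not per row, and that the same mu and sigma from the training data should be reused when scaling test data.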
Another way is to transform each feature vector to the [0,1] range:
(x - min(x)) / (max(x) - min(x))
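The same [0,1] rescaling, applied per feature in NumPy (again with a made-up toy matrix):

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Rescale each feature (column) to [0, 1] using its own min and max:
# (x - min(x)) / (max(x) - min(x))
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_01 = (X - X_min) / (X_max - X_min)
```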
Some features might also benefit from a log-transformation if their distribution is very skewed, but keep in mind that this changes the shape of the distribution as well, rather than just shifting it.
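For instance, with a strictly non-negative, heavily skewed feature (the values here are invented), log1p compresses the large values while remaining defined at zero:

```python
import numpy as np

# Hypothetical skewed feature: mostly small values plus one large outlier.
x = np.array([1.0, 2.0, 3.0, 1000.0])

# log1p(x) = log(1 + x); the outlier is pulled in from 1000 to about 6.9.
x_log = np.log1p(x)
```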
I am not sure what you gain in an SVM setting by normalizing the vectors by their L1 or L2 norm, as PyML does with its normalize method. I would guess binary features (0 or 1) don't need to be normalized.
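For clarity, normalizing by vector length operates on each sample (row) rather than each feature (column). A sketch of per-sample L2 normalization, which I assume is what PyML's normalize method does:

```python
import numpy as np

X = np.array([[3.0, 4.0],
              [1.0, 0.0]])

# Divide each sample (row) by its own Euclidean length, so every
# row ends up with unit L2 norm.
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms
```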