如果您检查文件svm-scale.c,您会发现缩放数据的公式是:
value = y_lower + (y_upper-y_lower) * (value - y_min)/(y_max-y_min);
缩放限制y_lower
y_upper
在哪里y
因此,您可以看到缩放值没有像您假设的那样计算出来“从最小值中减去该值,然后除以特定特征的范围”。如果你想恢复真正的价值,你只需要撤消公式。
例子:
如果您以 libSVM 站点中可用的众多数据集之一为例,例如这个:covtype dataset
,然后打开它,您将看到这样一个文件:
1 1:2596 2:51 3:3 4:258 6:510 7:221 8:232 9:148 10:6279 11:1 43:1
1 1:2590 2:56 3:2 4:212 5:-6 6:390 7:220 8:235 9:151 10:6225 11:1 43:1
2 1:2804 2:139 3:9 4:268 5:65 6:3180 7:234 8:238 9:135 10:6121 11:1 26:1
2 1:2785 2:155 3:18 4:242 5:118 6:3090 7:238 8:238 9:122 10:6211 11:1 44:1
1 1:2595 2:45 3:2 4:153 5:-1 6:391 7:220 8:234 9:150 10:6172 11:1 43:1
...
现在让我们使用以下方法对其进行缩放:
./svm-scale -s covtype.libsvm.binary.range covtype.libsvm.binary > covtype.libsvm.binary.scale
这将生成两个文件,该.range
文件将包含与缩放过程相关的所有信息(每列的最大值和最小值),以及.scale
作为输出的文件,如下所示:
1 1:-0.262631 2:-0.716667 3:-0.909091 4:-0.630637 5:-0.552972 6:-0.856681 7:0.740157 8:0.826772 9:0.165354 10:0.750732 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1
1 1:-0.268634 2:-0.688889 3:-0.939394 4:-0.696492 5:-0.568475 6:-0.890403 7:0.732283 8:0.850394 9:0.188976 10:0.735675 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1
2 1:-0.0545273 2:-0.227778 3:-0.727273 4:-0.616321 5:-0.385013 6:-0.106365 7:0.84252 8:0.874016 9:0.0629921 10:0.706678 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:-1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1
2 1:-0.0735368 2:-0.138889 3:-0.454545 4:-0.653543 5:-0.248062 6:-0.131657 7:0.874016 8:0.874016 9:-0.0393701 10:0.731772 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:-1 44:1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1
1 1:-0.263632 2:-0.75 3:-0.939394 4:-0.780959 5:-0.555556 6:-0.890122 7:0.732283 8:0.84252 9:0.181102 10:0.720898 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1
...
该.range
文件如下所示:
x
-1 1
1 1859 3858
2 0 360
3 0 66
4 0 1397
...
因此,考虑到这一点y_lower = -1
,y_upper = 1
您可以验证第一个元素2596
的转换:
value = -1 + (1 - (-1)) * (2596 - 1859) / (3858 - 1859) = -0.26263131565782893
这是预期值:)
提示:
通常,您使用 缩放您的训练集svm-scale
,获取您的模型(使用 k 折交叉验证),最后使用从训练中获得的值 (y_max
和) 执行测试缩放数据。y_min
您可以在文件中看到该过程tools/easy.py
。