python - liblinear memory cost is too high

Question

I have run liblinear to train a model and save it to a model file.

Here is the Python code:

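from liblinearutil import *  # newer packaged releases: from liblinear.liblinearutil import *

y, x = svm_read_problem(vector_file)   # parse the LIBSVM-format file into Python lists/dicts
prob = problem(y, x)
param = parameter('-s 2 -c 1')
m = train(prob, param)
save_model(model_file, m)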

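The problem is that when vector_file is about 247 MB, the total memory used while running liblinear is about 3.08 GB. Why does it use so much memory?

In my project, vector_file will be about 2 GB. How can I use liblinear to train on it and get a model file?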

score 1 · Accepted Answer

OK, I found where the problem is.

When reading the problem, liblinear's Python interface uses:

def svm_read_problem(data_file_name):
    prob_y = []
    prob_x = []

    for line in open(data_file_name):
        line = line.split(None, 1)
        # In case an instance with all zero features
        if len(line) == 1: line += ['']
        label, features = line
        xi = {}
        for e in features.split():
            ind, val = e.split(":")
            xi[int(ind)] = float(val)
        prob_y += [float(label)]
        prob_x += [xi]

    return (prob_y, prob_x)

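In Python (64-bit CPython), an int object takes 28 bytes and a float object takes 24 bytes, and every feature is also stored as an entry in a per-row dict. That is far more than I expected, and it is why the in-memory data is so much larger than the file.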
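To make the overhead concrete, here is a small sketch (my own illustration, not part of liblinear) that compares one row of sparse features stored as a dict, the way the reader above stores it, against the same data packed into two `array` buffers. The exact byte counts depend on the Python build, so treat the printed numbers as indicative only.

```python
import sys
from array import array

# One sparse feature "index:value" as the dict-based reader stores it:
# a boxed int key and a boxed float value.
index, value = 12345, 0.5
per_entry_objects = sys.getsizeof(index) + sys.getsizeof(value)

# A row with n features held in a dict (illustrative data).
n = 1000
row_dict = {i + 1000: float(i) + 0.5 for i in range(n)}
dict_bytes = sys.getsizeof(row_dict) + sum(
    sys.getsizeof(k) + sys.getsizeof(v) for k, v in row_dict.items()
)

# The same row packed into C-typed arrays.
indices = array("i", row_dict.keys())    # C int, typically 4 bytes each
values = array("d", row_dict.values())   # C double, 8 bytes each
packed_bytes = sys.getsizeof(indices) + sys.getsizeof(values)

print("int + float objects per feature:", per_entry_objects, "bytes")
print("dict row:  ", dict_bytes, "bytes")
print("packed row:", packed_bytes, "bytes")
```

The dict row also pays for the hash table itself, which is why the gap is larger than the int and float sizes alone suggest.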

I will report this case to the authors.
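For the 2 GB case, one possible direction is to avoid the per-feature Python objects while parsing. The following is only a sketch under my own assumptions: `read_problem_compact` is a hypothetical helper, not a liblinear function, and it stores the data in a CSR-style layout (row pointers, column indices, values) using the standard `array` module. Unlike the reader quoted above, it skips blank lines.

```python
from array import array

def read_problem_compact(lines):
    """Parse LIBSVM-format lines into packed CSR-style arrays.

    Hypothetical helper for illustration; not part of liblinear.
    Returns (labels, indptr, indices, values), where row i occupies
    indices[indptr[i]:indptr[i + 1]] and the matching slice of values.
    """
    labels = array("d")
    indptr = array("q", [0])
    indices = array("i")
    values = array("d")
    for line in lines:
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        labels.append(float(parts[0]))
        if len(parts) == 2:  # a row may have no features at all
            for item in parts[1].split():
                ind, val = item.split(":")
                indices.append(int(ind))
                values.append(float(val))
        indptr.append(len(indices))
    return labels, indptr, indices, values

# Usage with a few in-memory lines; a real file object can be passed the same way.
sample = ["1 1:0.5 3:2.0", "-1", "1 2:1.5"]
labels, indptr, indices, values = read_problem_compact(sample)
print(list(labels), list(indptr), list(indices), list(values))
```

This only addresses the parsing step. The `problem(y, x)` call shown in the question is fed a list of dicts, so whether your liblinear version can train from a compact representation like this (for example a SciPy sparse matrix, which uses the same three-array layout) is something to check in its README.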
