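Background: I've just started using scikit-learn, and at the bottom of the page I read about joblib versus pickle.
It may be more interesting to use joblib's replacement of pickle (joblib.dump & joblib.load), which is more efficient on big data, but can only pickle to the disk and not to a string
I read this Q&A on pickle, Common use-cases for pickle in Python, and wonder if the community here can share the differences between joblib and pickle. When should one be used over the other?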
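To make the "disk versus string" distinction in the quoted passage concrete, here is a minimal sketch. The `model` dictionary is a hypothetical stand-in for a trained estimator, the file names are made up for illustration, and joblib is treated as optional because it is a third-party package.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model; any picklable object works.
model = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}

# pickle can serialize to an in-memory bytes object ...
blob = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)
restored_from_bytes = pickle.loads(blob)

# ... or to a file on disk.
tmp_dir = tempfile.mkdtemp()
pickle_path = os.path.join(tmp_dir, "model.pkl")
with open(pickle_path, "wb") as fh:
    pickle.dump(model, fh, protocol=pickle.HIGHEST_PROTOCOL)
with open(pickle_path, "rb") as fh:
    restored_from_disk = pickle.load(fh)

# joblib offers the file-based pair joblib.dump / joblib.load.
# It is not part of the standard library, so skip it if it is missing.
restored_from_joblib = None
try:
    import joblib
except ImportError:
    joblib = None

if joblib is not None:
    joblib_path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, joblib_path)
    restored_from_joblib = joblib.load(joblib_path)
```

For a small plain-Python object like this one, both libraries round-trip the same data; the efficiency claim in the quoted passage concerns large data.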
mmap_mode="r"
Thanks to Gunjan for giving us this script! I modified it for Python 3; the script and its results follow.
# compare pickle loaders
from time import time
import os
import pickle
import _pickle as cPickle  # C accelerator; on CPython 3, pickle already uses it when available
import joblib  # sklearn.externals.joblib was deprecated in scikit-learn 0.21 and later removed

file = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'database.clf')
# note: os.path.getsize returns bytes, so the "KB" label in the output is really a byte count

t1 = time()
with open(file, "rb") as f:
    d = pickle.load(f)
print("time for loading file size with pickle", os.path.getsize(file), "KB =>", time() - t1)

t1 = time()
with open(file, "rb") as f:
    cPickle.load(f)
print("time for loading file size with cpickle", os.path.getsize(file), "KB =>", time() - t1)

t1 = time()
joblib.load(file)
print("time for loading file size joblib", os.path.getsize(file), "KB =>", time() - t1)
time for loading file size with pickle 79708 KB => 0.16768312454223633
time for loading file size with cpickle 79708 KB => 0.0002372264862060547
time for loading file size joblib 79708 KB => 0.0006849765777587891
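A single timed load per library, as in the script above, is sensitive to ordering: the first read pays for pulling the file from disk, and later reads may be served from the OS cache. Also, on CPython 3 the standard pickle module already uses the C accelerator _pickle when it is available, so the pickle versus cPickle gap above may say more about caching than about the libraries. A minimal sketch of a steadier comparison, assuming a hypothetical payload and a helper named `best_of` (neither is from the original script):

```python
import os
import pickle
import tempfile
from time import perf_counter

REPEATS = 5  # arbitrary choice for illustration


def best_of(load_fn, path, repeats=REPEATS):
    """Return the fastest wall-clock time in seconds over `repeats` calls of load_fn(path)."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        load_fn(path)
        timings.append(perf_counter() - start)
    return min(timings)


def load_with_pickle(path):
    """Load a pickle file, closing the file handle afterwards."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical test payload, not the author's database.clf.
payload = {"values": list(range(100000))}
path = os.path.join(tempfile.mkdtemp(), "payload.pkl")
with open(path, "wb") as fh:
    pickle.dump(payload, fh, protocol=pickle.HIGHEST_PROTOCOL)

# Map a label to each loader under test.
loaders = {"pickle": load_with_pickle}

results = {name: best_of(fn, path) for name, fn in loaders.items()}
for name, seconds in results.items():
    size = os.path.getsize(path)  # size in bytes
    print(f"{name}: best of {REPEATS} loads = {seconds:.6f} s for {size} bytes")
```

If joblib is installed, adding an entry such as `loaders["joblib"] = joblib.load` would time it through the same helper. Taking the minimum over several runs is one common convention; it does not reproduce a cold-cache first load.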
I ran into the same question, so I tried this (using Python 2.7), because I needed to load a large pickle file
# compare pickle loaders (Python 2.7)
from time import time
import pickle
import os
try:
    import cPickle
except ImportError:
    print "Cannot import cPickle"
import joblib

# note: os.path.getsize returns bytes, so the "KB" label in the output is really a byte count
# files are opened in binary mode so the pickle data is read unchanged on every platform

t1 = time()
with open("classi.pickle", "rb") as f:
    d = pickle.load(f)
print "time for loading file size with pickle", os.path.getsize("classi.pickle"), "KB =>", time() - t1

t1 = time()
with open("classi.pickle", "rb") as f:
    cPickle.load(f)
print "time for loading file size with cpickle", os.path.getsize("classi.pickle"), "KB =>", time() - t1

t1 = time()
joblib.load("classi.pickle")
print "time for loading file size joblib", os.path.getsize("classi.pickle"), "KB =>", time() - t1
The output is:
time for loading file size with pickle 1154320653 KB => 6.75876188278
time for loading file size with cpickle 1154320653 KB => 52.6876490116
time for loading file size joblib 1154320653 KB => 6.27503800392
Based on this run, joblib loaded the file faster than both the cPickle and pickle modules. Thanks
Just a humble note... pickle is better suited to fitted scikit-learn estimators / trained models. In ML applications, trained models are mostly saved and loaded for prediction.