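Which tool (Mallet or Weka) is better suited to text classification tasks, in terms of:
- Simpler training
- Better results
- Documentation
I'm new to this problem, so any comments would be great.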
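MALLET is easier to use, and most of the work happens behind the scenes. You don't have to convert anything into a special format either: you just give it text files and it gives you results back.
Weka requires you to convert your text into a specific format (the Weka script that does this is very slow and inefficient, so I'd recommend writing your own).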
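For anyone wondering what "writing your own" converter might look like, here is a minimal sketch in Python that writes labelled documents out as an ARFF file with one string attribute and one nominal class attribute. The function names (`arff_quote`, `write_arff`), the relation name, and the sample documents are all made up for illustration; this is not the script that ships with Weka, just one way to produce the format under those assumptions.

```python
import io

def arff_quote(value):
    # Hypothetical helper: backslash-escape special characters,
    # then wrap the value in single quotes so spaces and commas are safe.
    escaped = (value.replace("\\", "\\\\")
                    .replace("'", "\\'")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r")
                    .replace("\t", "\\t"))
    return "'" + escaped + "'"

def write_arff(docs, out, relation="documents"):
    # docs is a list of (text, label) pairs; out is any writable text stream.
    labels = sorted({label for _, label in docs})
    out.write("@relation " + relation + "\n\n")
    out.write("@attribute text string\n")
    out.write("@attribute class {" + ",".join(arff_quote(l) for l in labels) + "}\n\n")
    out.write("@data\n")
    for text, label in docs:
        out.write(arff_quote(text) + "," + arff_quote(label) + "\n")

# Made-up sample data, written to an in-memory buffer instead of a file.
docs = [("cheap pills, buy now", "spam"), ("it's the meeting agenda", "ham")]
buf = io.StringIO()
write_arff(docs, buf)
arff_text = buf.getvalue()
```

Once loaded in Weka, the raw `text` attribute would still need to be turned into word features (for example with the `StringToWordVector` filter) before most classifiers can use it.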
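The problem with MALLET is that training uses gigabytes of memory, and with a large training set it can take hours.
Weka has more documentation, but most of it doesn't make much sense. MALLET has very little documentation, but it's very simple to use.
Honestly, after testing both of them, I opted to write my own classifier.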
I'm really enjoying Weka more than Mallet. Maybe I don't know enough yet, but doing machine learning with a GUI is awesome. You can tweak parameters and run different experiments (keeping the results of past experiments in front of you, too) very easily. I'm new to Weka, so take this FWIW.
As far as which one is simpler to train, I find Weka simpler. I don't know what kind of control you can have over your feature space by just pointing Mallet at some text (maybe it's good enough), but my experience with Mallet was comparable to Weka... writing scripts to get the input in the proper format, with the caveat that I had to do multiple steps to utilize some kind of serialized version of the data in Mallet.
Regarding your other questions, I can't really answer them right now, but am hoping this answer doesn't get downvoted 'cause it's good information to be out there, anyway.