我正在尝试使用 Mallet 在 ~1GB 文本文件上运行主题建模,该文件有 11403956 行。从 mallet 目录中,我cd
将bin
内存要求升级到 1024GB:
set MALLET_MEMORY=1024G
然后我尝试运行命令:
bin/mallet import-file --input combined_bios.txt --output dh_size.mallet --keep-sequence --remove-stopwords
但是,这会引发内存错误:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at gnu.trove.TObjectIntHashMap.rehash(TObjectIntHashMap.java:170)
at gnu.trove.THash.postInsertHook(THash.java:359)
at gnu.trove.TObjectIntHashMap.put(TObjectIntHashMap.java:155)
at cc.mallet.types.Alphabet.lookupIndex(Alphabet.java:115)
at cc.mallet.types.Alphabet.lookupIndex(Alphabet.java:123)
at cc.mallet.types.FeatureSequence.add(FeatureSequence.java:131)
at cc.mallet.pipe.TokenSequence2FeatureSequence.pipe(TokenSequence2FeatureSequence.java:44)
at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:294)
at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:282)
at cc.mallet.types.InstanceList.addThruPipe(InstanceList.java:267)
at cc.mallet.classify.tui.Csv2Vectors.main(Csv2Vectors.java:290)
这种情况有解决方法吗?其他人可以提供的任何帮助将不胜感激!