1

I have 3 million abstracts and I would like to extract 4-grams from them. I want to build a language model, so I need to find the frequencies of these 4-grams.

My problem is that I can't hold all of these 4-grams in memory. How can I implement a system that can estimate the frequencies of all these 4-grams?

4

1 Answer

0

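It sounds like you need to store the intermediate frequency counts on disk rather than in memory. Fortunately, most databases can do this, and Python can talk to most databases.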
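As a rough illustration of that idea, here is a minimal sketch that keeps the counts in an SQLite file through Python's built-in `sqlite3` module. SQLite is only one possible choice of database. The table name `ngram_counts`, the helper names, the whitespace tokenization, and the batch size are all assumptions made for the example, not something from the question. Counts are accumulated in a small in-memory `Counter` and merged into the database every `batch_size` abstracts, so memory use stays bounded.

```python
import sqlite3
from collections import Counter


def four_grams(text, n=4):
    """Yield word n-grams from a text (naive whitespace tokenization, an assumption)."""
    tokens = text.lower().split()
    return zip(*(tokens[i:] for i in range(n)))


def flush(conn, buffer):
    """Merge the in-memory counts into the on-disk table, then empty the buffer."""
    with conn:  # one transaction per batch
        conn.executemany(
            "INSERT OR IGNORE INTO ngram_counts (ngram, freq) VALUES (?, 0)",
            ((gram,) for gram in buffer),
        )
        conn.executemany(
            "UPDATE ngram_counts SET freq = freq + ? WHERE ngram = ?",
            ((freq, gram) for gram, freq in buffer.items()),
        )
    buffer.clear()


def count_ngrams(abstracts, conn, batch_size=10000):
    """Count 4-grams over an iterable of abstracts, spilling to the database in batches."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ngram_counts ("
        "ngram TEXT PRIMARY KEY, freq INTEGER NOT NULL)"
    )
    buffer = Counter()
    for i, abstract in enumerate(abstracts, 1):
        buffer.update(" ".join(gram) for gram in four_grams(abstract))
        if i % batch_size == 0:
            flush(conn, buffer)
    flush(conn, buffer)  # write whatever is left in the last partial batch
```

To use it, open a connection to a file, for example `conn = sqlite3.connect("ngrams.db")`, and pass any iterable of abstract strings (ideally a generator that reads them from disk one at a time). The frequencies can then be read back with an ordinary query such as `SELECT ngram, freq FROM ngram_counts ORDER BY freq DESC LIMIT 20`.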

Answered 2016-09-21T10:20:25.507