
I noticed that DocumentTermMatrix(myCorpus, control=list(dictionary=myDict)) consumes much more memory than DocumentTermMatrix(myCorpus).

Why does this happen?

Does anyone have any clues?

Here is the code snippet:

library(tm)
library(XML)
source("MyXMLReader.r") # contains the myXML reader code 
myCorpus <- Corpus(DirSource(paste(basepath, "corpus", sep = "")),
                   readerControl = list(reader = myXMLReader))
myDict = unlist(readLines("some-file-containing-a-fixed-vocab"))

Now here is my problem:

dtm = DocumentTermMatrix(myCorpus) # takes very little extra RAM to do this
dtm = DocumentTermMatrix(myCorpus, control=list(dictionary=myDict)) # takes a whole lot of RAM, which is not even released after dtm is formed

I suspect there is a memory leak, and possibly a bug.
