I noticed that DocumentTermMatrix(myCorpus, control=list(dictionary=myDict))
consumes far more memory than DocumentTermMatrix(myCorpus).
Why is that?
Any clues?
Here is the code snippet:
library(tm)
library(XML)
source("MyXMLReader.r") # contains the myXML reader code
myCorpus <- Corpus(DirSource(paste(basepath,"corpus",sep="")),
                   readerControl = list(reader = myXMLReader))
myDict = unlist(readLines("some-file-containing-a-fixed-vocab"))
Now here is my problem:
dtm = DocumentTermMatrix(myCorpus) # takes very little extra RAM to do this
dtm = DocumentTermMatrix(myCorpus, control=list(dictionary=myDict)) # takes a whole lot of RAM, which is not even released after dtm is formed...
My guess is that there is a memory leak, and possibly a bug.