I'm training a spam detector using the MultinomialNB model in scikit-learn. I use the DictVectorizer class to transform tokens into word counts (i.e. features). I would like to keep training the model over time as new data arrives (in this case, chat messages coming into our app server). For this, the partial_fit method looks like it will be useful.
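Here is a rough sketch of what the first training pass looks like (the messages and token counts are made up for illustration):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up token counts per message, with labels (1 = spam, 0 = ham)
messages = [{"free": 2, "money": 1}, {"hi": 1, "lunch": 1}]
labels = [1, 0]

vec = DictVectorizer()
X = vec.fit_transform(messages)             # vocabulary is fixed by this first batch

clf = MultinomialNB()
clf.partial_fit(X, labels, classes=[0, 1])  # classes must be passed on the first call
```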
However, what I can't figure out is how to enlarge the DictVectorizer after it has initially been fitted: if new features/words arrive that it has never seen, they are simply ignored. What I would like to do is pickle the current model and the DictVectorizer, then update both each time we run a new training session. Is this possible?
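To make the problem concrete, this is roughly what happens on a later batch (again with made-up data):

```python
# A later batch containing a token ("prize") the vectorizer has never seen
new_messages = [{"free": 1, "prize": 3}]
X_new = vec.transform(new_messages)   # "prize" is silently dropped from the features

# Re-fitting the vectorizer (vec.fit_transform(new_messages)) would rebuild the
# vocabulary, but then the feature count no longer matches what the model expects.
clf.partial_fit(X_new, [1])           # only the already-known columns are updated
```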