
I have the following code (based on the example here), but it doesn't work:

[...]
def my_analyzer(s):
    return s.split()
my_vectorizer = CountVectorizer(analyzer=my_analyzer)
X_train = my_vectorizer.fit_transform(traindata)

ch2 = SelectKBest(chi2,k=1)
X_train = ch2.fit_transform(X_train,Y_train)
[...]

Calling fit_transform raises the following error:

AttributeError: 'function' object has no attribute 'analyze'

According to the documentation, a CountVectorizer should be created like this: vectorizer = CountVectorizer(tokenizer=my_tokenizer). However, if I do that, I get the following error: "got an unexpected keyword argument 'tokenizer'".

My installed scikit-learn version is 0.10.


1 Answer


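You're looking at the documentation for 0.11 (to be released soon), in which the vectorizers have been overhauled. Check the documentation for 0.10, where there is no tokenizer parameter and analyzer should be an object that implements an analyze method: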

class MyAnalyzer(object):
    @staticmethod
    def analyze(s):
        return s.split()

v = CountVectorizer(analyzer=MyAnalyzer())
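If you would rather keep the plain function from the question, a small adapter is one option. The following is a minimal sketch using only the standard library; the class name FunctionAnalyzer is hypothetical, and the assumption that 0.10 only ever calls the analyze method is taken from the description above rather than checked against that release.

```python
def my_analyzer(s):
    # Same whitespace tokenizer as in the question.
    return s.split()


class FunctionAnalyzer(object):
    """Hypothetical adapter: exposes a plain function through an analyze method."""

    def __init__(self, func):
        self.func = func

    def analyze(self, text):
        # Entry point that a 0.10-style analyzer object is expected to provide.
        return self.func(text)

    def __call__(self, text):
        # Also callable, in case the installed version expects a callable analyzer.
        return self.func(text)


wrapped = FunctionAnalyzer(my_analyzer)

# A bare function has no analyze attribute, which matches the AttributeError
# reported in the question.
print(hasattr(my_analyzer, "analyze"))  # False
print(wrapped.analyze("a b c"))         # ['a', 'b', 'c']
```

With scikit-learn installed, the wrapped object would presumably be passed as CountVectorizer(analyzer=FunctionAnalyzer(my_analyzer)); that call is not exercised here.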

http://scikit-learn.org/dev is the documentation for the upcoming release (which may change at any time), while http://scikit-learn.org/stable has the documentation for the current stable version.

answered 2012-04-29T16:14:55.343