I am trying to use soundex to convert each word of a line into its hashed form, and then run some machine learning on the result with scikit-learn.

The code is as follows:

train = []
for line in text:
    # Encode every word with soundex, then rejoin into one space-separated string
    sound = [soundex(word) for word in line.split()]
    train.append(' '.join(sound))

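from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer()
# Note: the loop above builds `train`; `real_train` is not defined in this snippet
X_train_counts = count_vect.fit_transform(real_train)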
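
The snippet above does not show how soundex is defined. For anyone trying to reproduce the preprocessing step, here is a minimal sketch that assumes the standard American Soundex rules; both the `soundex` body and the `encode_line` helper are illustrative and may differ from the implementation actually used in the question.

```python
# Hypothetical stand-in for the soundex function used in the question.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return a 4-character American Soundex code, or '' if no letters."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            # h and w do not separate letters that share a code
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        # Vowels reset prev to '', so a repeated code after a vowel is kept
        prev = code
    return (result + "000")[:4]


def encode_line(line):
    """Replace every whitespace-separated word with its soundex code."""
    return " ".join(soundex(word) for word in line.split())
```

With this sketch, `encode_line("Robert Rupert")` gives `"R163 R163"`, which is the kind of string each element of the training list would hold.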

But when I do this, I get an error:

    X_train_counts = count_vect.fit_transform(real_train)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py", line 780, in fit_transform
    vocabulary, X = self._count_vocab(raw_documents, self.fixed_vocabulary)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py", line 710, in _count_vocab
    analyze = self.build_analyzer()
  File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py", line 226, in build_analyzer
    tokenize = self.build_tokenizer()
  File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py", line 203, in build_tokenizer
    token_pattern = re.compile(self.token_pattern)
  File "/usr/lib/python2.7/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.7/re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: unexpected end of pattern
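
The last sklearn frame in the traceback is `re.compile(self.token_pattern)`, so the failure occurs while the vectorizer's token pattern is being compiled, before any document is tokenized. A small check like the one below may help narrow this down. It is only a sketch: `is_valid_pattern` is a hypothetical helper, and `"(?"` is just one example of a malformed pattern, not necessarily the one involved here.

```python
import re

# The default token_pattern documented for CountVectorizer.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def is_valid_pattern(pattern):
    """Return True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(is_valid_pattern(DEFAULT_TOKEN_PATTERN))  # the default pattern compiles
print(is_valid_pattern("(?"))                   # a truncated group does not
```

Running the same check on `count_vect.token_pattern` would show whether the pattern held by the vectorizer instance is the one that fails to compile.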