Hi, I'm trying to understand how scikit-learn computes the TF-IDF score at document 1, feature 6 ('wine') in the matrix below:
from sklearn.feature_extraction.text import TfidfVectorizer

test_doc = ['The wine was lovely', 'The red was delightful',
            'Terrible choice of wine', 'We had a bottle of red']

# Create vectorizer (norm='l2' is the default)
vec = TfidfVectorizer(stop_words='english')

# Feature matrix
tfidf = vec.fit_transform(test_doc)
feature_names = vec.get_feature_names()  # get_feature_names_out() on scikit-learn >= 1.0
feature_matrix = tfidf.todense()
['bottle', 'choice', 'delightful', 'lovely', 'red', 'terrible', 'wine']
[[ 0. 0. 0. 0.78528828 0. 0. 0.6191303 ]
[ 0. 0. 0.78528828 0. 0.6191303 0. 0. ]
[ 0. 0.61761437 0. 0. 0. 0.61761437 0.48693426]
[ 0.78528828 0. 0. 0. 0.6191303 0. 0. ]]
I'm working from the answer to a very similar question to compute the scores myself: "How are TF-IDF computed by the scikit-learn TfidfVectorizer". However, in that answer the TfidfVectorizer uses norm=None. When I use the default setting norm='l2', how does the result differ from norm=None, and how can I compute the values above by hand?
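For reference, here is a sketch of how I understand it so far, assuming scikit-learn's defaults (smooth_idf=True, sublinear_tf=False): norm=None gives the raw tf * idf scores, and norm='l2' then divides each document row by its Euclidean length, so every row becomes a unit vector. I can reproduce the matrix this way:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

test_doc = ['The wine was lovely', 'The red was delightful',
            'Terrible choice of wine', 'We had a bottle of red']

# Unnormalised scores (norm=None): each entry is tf * idf, where with the
# default smooth_idf=True, idf(t) = ln((1 + n_docs) / (1 + df(t))) + 1
raw = TfidfVectorizer(stop_words='english', norm=None).fit_transform(test_doc).toarray()

# norm='l2' (the default) just divides each row by its Euclidean length,
# so every document vector ends up with unit length
manual_l2 = raw / np.linalg.norm(raw, axis=1, keepdims=True)

vec = TfidfVectorizer(stop_words='english')   # norm='l2' by default
auto_l2 = vec.fit_transform(test_doc).toarray()
print(np.allclose(manual_l2, auto_l2))        # True

# Checking document 1 by hand (after stop-word removal it is ['wine', 'lovely']):
#   'wine':   tf = 1, df = 2  ->  idf = ln(5/3) + 1 ~ 1.5108
#   'lovely': tf = 1, df = 1  ->  idf = ln(5/2) + 1 ~ 1.9163
#   row length = sqrt(1.5108**2 + 1.9163**2) ~ 2.4402
#   1.5108 / 2.4402 ~ 0.6191, matching the 0.6191303 in the matrix above
print(manual_l2[0, vec.vocabulary_['wine']])
```

So the only difference is the final per-row rescaling; the relative weights within a document are unchanged, which is why cosine similarity between documents is the same either way.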