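I am trying to implement an online classifier using scikit-learn's PassiveAggressiveClassifier on the 20 newsgroups dataset. I am new to this, so I am not sure whether I have implemented it correctly. I wrote a small script, but when I run it I keep getting this error: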
Traceback (most recent call last):
  File "/home/suleka/Documents/RNN models/passiveagressive.py", line 100, in <module>
    clf.fit(X, y)
  File "/home/suleka/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/passive_aggressive.py", line 225, in fit
    coef_init=coef_init, intercept_init=intercept_init)
  File "/home/suleka/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/stochastic_gradient.py", line 444, in _fit
    classes, sample_weight, coef_init, intercept_init)
  File "/home/suleka/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/stochastic_gradient.py", line 407, in _partial_fit
    raise ValueError("The number of class labels must be "
ValueError: The number of class labels must be greater than one.
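I went through most of the related posts on Stack Overflow, and they say this error appears when the labels contain only one unique class. So I ran np.unique(labels), and it shows 20 classes (the 20 newsgroups):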
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
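One thing I have not ruled out is that the check above looks at the whole label array, while each call to fit in my loop only receives a one-row slice. A minimal sketch of that comparison follows; `unique_counts` is a hypothetical helper name, and the array is a stand-in for the real targets, not the newsgroups labels themselves:

```python
import numpy as np


def unique_counts(labels, start, stop):
    """Return (#classes in the whole array, #classes in labels[start:stop])."""
    labels = np.asarray(labels)
    return np.unique(labels).size, np.unique(labels[start:stop]).size


if __name__ == "__main__":
    # Hypothetical stand-in for newsgroups_data.target: one label per class.
    stand_in = np.arange(20)
    # Compare the full array against a one-row slice such as y_train[i:i + 1].
    print(unique_counts(stand_in, 0, 1))
```

If the second number is 1, then a single-row slice can never contain more than one class, no matter how many classes the full dataset has.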
Can anyone help me resolve this error? And if my implementation is wrong, please let me know.
My code is shown below:
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.datasets import make_classification
from string import punctuation
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from collections import Counter
from sklearn.preprocessing import MinMaxScaler, LabelBinarizer
from sklearn.utils import shuffle
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
import nltk

nltk.download('stopwords')

seed = 42
np.random.seed(seed)


def preProcess():
    newsgroups_data = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
    vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5,
                                 stop_words='english')
    features = vectorizer.fit_transform(newsgroups_data.data)
    labels = newsgroups_data.target
    return features, labels


if __name__ == '__main__':
    features, labels = preProcess()
    X_train, y_train = shuffle(features, labels, random_state=seed)
    clf = PassiveAggressiveClassifier(random_state=seed)
    n, d = X_train.shape
    print(np.unique(labels))
    error = 0
    iteration = 0
    for i in range(n):
        print(iteration)
        X, y = X_train[i:i + 1], y_train[i:i + 1]
        clf.fit(X, y)
        pred = clf.predict(X)
        print(pred)
        print(y)
        if y - pred != 0:
            error += 1
        iteration += iteration
    print(error)
    print(np.divide(error, n, dtype=np.float))
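For reference, this is the per-sample behaviour I am trying to get. It is only a sketch under some assumptions: that `partial_fit`, called with the full `classes` array, is the intended way to update the model one sample at a time, and that the model should predict each sample before training on it. The helper name `online_error_rate` is hypothetical, and the data comes from `make_classification` rather than the newsgroups corpus, so it runs without a download:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import PassiveAggressiveClassifier


def online_error_rate(X, y, seed=42):
    """One pass over the data: predict each sample, then update on it.

    Returns the fitted classifier and the fraction of wrong predictions.
    The first sample is only used for training, since an unfitted model
    cannot predict, so the rate is computed over the remaining n - 1 samples.
    """
    # Assumption: all labels are known up front, so they can be declared
    # to partial_fit even though each call only sees one sample.
    classes = np.unique(y)
    clf = PassiveAggressiveClassifier(random_state=seed)
    n = X.shape[0]
    errors = 0
    for i in range(n):
        xi, yi = X[i:i + 1], y[i:i + 1]
        if i > 0:
            # Score the sample before the model has been updated on it.
            errors += int(clf.predict(xi)[0] != yi[0])
        # Incremental update on a single sample.
        clf.partial_fit(xi, yi, classes=classes)
    return clf, errors / max(n - 1, 1)


if __name__ == "__main__":
    # Synthetic 4-class problem standing in for the TF-IDF features.
    X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                               n_classes=4, random_state=42)
    clf, rate = online_error_rate(X, y)
    print(clf.classes_, rate)
```

Unlike `fit`, which restarts training from scratch on whatever it is given, `partial_fit` keeps the existing weights between calls, which is what I understand "online" to mean here.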
Thank you in advance!