When I load the 20newsgroups_vectorized dataset with
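import pandas as pd
from sklearn.datasets import fetch_20newsgroups_vectorized

newsgroups = fetch_20newsgroups_vectorized(subset='all')
labels = newsgroups.target_names  # list of newsgroup names
target = newsgroups.target  # integer class id for each document
# Map each integer id to its newsgroup name
target = pd.DataFrame([labels[i] for i in target], columns=['label'])
data = newsgroups.data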
data is a <class 'scipy.sparse.csr.csr_matrix'> with shape (18846, 130107).
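How can I subset the data by target name (for example, extract only 'rec.sport.baseball') and apply vector operations to those sparse row vectors (for example, compute the mean vector or distances)?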
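To make the goal concrete, here is a minimal sketch of the kind of operation I have in mind. It uses a tiny hand-made sparse matrix as a stand-in for newsgroups.data so that it runs without downloading anything; the helper name rows_for_label and the toy values are my own assumptions for illustration, not part of scikit-learn. It selects rows with a boolean mask and computes the mean vector and Euclidean distances without converting the subset to a dense array.

```python
import numpy as np
from scipy import sparse

# Toy stand-ins (assumed values) for newsgroups.data, newsgroups.target
# and newsgroups.target_names.
data = sparse.csr_matrix(np.array([
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
]))
target = np.array([1, 0, 1, 2])
target_names = ['alt.atheism', 'rec.sport.baseball', 'sci.space']


def rows_for_label(matrix, target, target_names, name):
    """Hypothetical helper: return the sparse rows whose label is `name`."""
    label_id = list(target_names).index(name)
    mask = np.asarray(target) == label_id  # boolean mask over rows
    return matrix[mask]                    # CSR supports boolean row indexing


subset = rows_for_label(data, target, target_names, 'rec.sport.baseball')

# Mean over rows; the result is dense, so flatten it to a 1-D array.
mean_vec = np.asarray(subset.mean(axis=0)).ravel()

# Euclidean distance of every row to the mean, keeping `subset` sparse:
# ||x - m||^2 = ||x||^2 - 2 * x.m + ||m||^2
row_sq = np.asarray(subset.multiply(subset).sum(axis=1)).ravel()
cross = subset @ mean_vec
dist = np.sqrt(np.maximum(row_sq - 2.0 * cross + mean_vec @ mean_vec, 0.0))
```

With the real dataset, the same helper would presumably be called with the original integer newsgroups.target (before it is wrapped in a DataFrame) and newsgroups.target_names.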