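I'm simply following the code here (with a few small modifications for sklearn 0.17). In that example, the data are just lists or numpy arrays. Now I'd like to prepare a toy training dataset on disk and load it with datasets.load_files for multilabel classification. However, simply following the load_files convention and copying the same file into multiple folders does not yield multiple labels in dataset.target.

So what is the correct way to prepare a dataset for multilabel classification?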

1 Answer

I don't think load_files supports multilabel classes. To be honest, I've never used scikit-learn to load data; I always do my initial data load and preprocessing with pandas. One option for your case would be to store your data as CSV, serializing each sample's labels as a pipe-delimited list.

For example, your file data.csv might be:

recipe_name,classes
stir fried broccoli,chinese|vegetarian
kung po chicken,chinese|meat
sauerkraut salad,vegetarian|polish

And you would load it as follows:

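import pandas as pd

df = pd.read_csv('data.csv')
X_train = df['recipe_name']
# Split each pipe-delimited string into a list of labels,
# e.g. 'chinese|vegetarian' -> ['chinese', 'vegetarian']
y_train = df['classes'].str.split('|')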
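
The lists of labels still need to be turned into a binary indicator matrix before most scikit-learn multilabel estimators will accept them. A minimal sketch of that step, assuming scikit-learn's MultiLabelBinarizer is available in your version; the variable names label_lists, mlb and Y are only illustrative, and the label lists are written out by hand here to mirror what the split above would produce:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hand-written stand-in for the per-sample label lists
# (the same labels as in the toy data.csv above).
label_lists = [
    ["chinese", "vegetarian"],
    ["chinese", "meat"],
    ["vegetarian", "polish"],
]

mlb = MultiLabelBinarizer()
# Y has one row per sample and one column per distinct label;
# an entry is 1 when the sample carries that label, else 0.
Y = mlb.fit_transform(label_lists)

print(mlb.classes_)  # column order of Y (labels sorted alphabetically)
print(Y)
```

With the pandas code above, passing y_train in place of label_lists should work the same way, since each element is a list of strings.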
Answered 2016-05-02T04:58:28.493