I'm new to fastText and have read through the tutorial: https://fasttext.cc/docs/en/supervised-tutorial.html.

I downloaded the sample data and noticed that the labels are plain strings.

$ head cooking.stackexchange.txt
__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?
__label__food-safety __label__acidity Dangerous pathogens capable of growing in acidic environments
__label__cast-iron __label__stove How do I cover up the white spots on my cast iron stove?
__label__restaurant Michelin Three Star Restaurant; but if the chef is not there
__label__knife-skills __label__dicing Without knife skills, how can I quickly and accurately dice vegetables?
__label__storage-method __label__equipment __label__bread What's the purpose of a bread box?
__label__baking __label__food-safety __label__substitutions __label__peanuts how to seperate peanut oil from roasted peanuts at home?
__label__chocolate American equivalent for British chocolate terms
__label__baking __label__oven __label__convection Fan bake vs bake
__label__sauce __label__storage-lifetime __label__acidity __label__mayonnaise Regulation and balancing of readymade packed mayonnaise and other sauces

Here is the training and testing code from the tutorial.

>>> model = fasttext.train_supervised(input="cooking.train", lr=1.0)
Read 0M words
Number of words:  9012
Number of labels: 734
Progress: 100.0%  words/sec/thread: 81469  lr: 0.000000  loss: 6.405640  eta: 0h0m

>>> model.test("cooking.valid")
(3000L, 0.563, 0.245)

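My question is: why aren't the labels encoded first, for example with sklearn's LabelEncoder? I ran the example as-is and it works fine, which is what confuses me.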
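To make the question concrete, here is a rough sketch of how a string-to-integer label mapping can be built directly from `__label__`-prefixed lines, which is presumably why the training log above can report "Number of labels: 734" without any preprocessing. This is only an illustration in plain Python and not fastText's actual implementation; the helper names `split_line` and `build_label_index` and the frequency-based ordering are my own assumptions.

```python
from collections import Counter

LABEL_PREFIX = "__label__"


def split_line(line):
    """Split one training line into (labels, words) based on the label prefix."""
    tokens = line.split()
    labels = [t for t in tokens if t.startswith(LABEL_PREFIX)]
    words = [t for t in tokens if not t.startswith(LABEL_PREFIX)]
    return labels, words


def build_label_index(lines):
    """Assign an integer id to every distinct label string (hypothetical helper).

    Ordering is an assumption made for this sketch: most frequent label first,
    ties broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for line in lines:
        labels, _ = split_line(line)
        counts.update(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


# Three lines copied from the sample data shown earlier in the question.
sample = [
    "__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?",
    "__label__restaurant Michelin Three Star Restaurant; but if the chef is not there",
    "__label__sauce __label__storage-lifetime __label__acidity __label__mayonnaise "
    "Regulation and balancing of readymade packed mayonnaise and other sauces",
]

label_index = build_label_index(sample)
for label, idx in label_index.items():
    print(idx, label)
```

If something along these lines happens inside the library, an external LabelEncoder step would only duplicate the mapping, and the string labels stay readable in the training file.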

[Update]

IMO, the code would look like this:

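import fasttext
from sklearn import preprocessing

# load_dataset() is not shown here; it is assumed to return two parallel
# lists: the texts and one string label per text.
texts_train, labels_train = load_dataset()

# Map each string label to an integer id.
label_encoder = preprocessing.LabelEncoder()
labels_train = label_encoder.fit_transform(labels_train)

# Write one example per line, with the label carrying the __label__ prefix.
with open('cooking.train.2', 'w') as f:
    for text, label in zip(texts_train, labels_train):
        f.write('%s __label__%d\n' % (text, label))

model = fasttext.train_supervised('cooking.train.2', lr=1.0)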
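One consequence of the integer-encoded version is that every prediction comes back as something like `__label__2` and has to be mapped back to a name by hand. The sketch below shows that round trip using only the standard library; it imitates LabelEncoder by sorting the distinct labels, but it is a stand-in rather than sklearn itself, and the label values and the helper names (`fit_classes`, `encode`, `decode_prediction`) are made up for illustration.

```python
LABEL_PREFIX = "__label__"


def fit_classes(labels):
    """Return the sorted distinct labels; position in the list is the integer id."""
    return sorted(set(labels))


def encode(labels, classes):
    """Replace each string label by its integer id."""
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels]


def decode_prediction(predicted, classes):
    """Turn a predicted '__label__<id>' string back into the original label name."""
    return classes[int(predicted[len(LABEL_PREFIX):])]


# Hypothetical labels, one per training text.
labels_train = ["sauce", "bread", "sauce", "chocolate"]

classes = fit_classes(labels_train)
encoded = encode(labels_train, classes)
decoded = decode_prediction("__label__2", classes)

print(classes)   # ['bread', 'chocolate', 'sauce']
print(encoded)   # [2, 0, 2, 1]
print(decoded)   # sauce
```

With the original string labels this extra decoding step is not needed, since the label in the file already is the name.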