I have a DataFrame like this:

         mid   value  label
    ID
    192    3   176.6  [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
    192    4    73.6  [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
    192    5    15.8  [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
    194    3  9603.2  [0, 0, 0, 0, 0, 9, 6, 1, 8, ...

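I want to apply MultiLabelBinarizer after removing the duplicate values from each list in the label column.

I tried looping over the DataFrame and removing the duplicates, but MultiLabelBinarizer still does not work and raises an exception: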

    from sklearn.preprocessing import MultiLabelBinarizer
    mlb = MultiLabelBinarizer()
    mlb.fit(y_train.data)

Here X_train contains the mid and value columns, y_train contains the label values, and ID is the index.

I expect to be able to make predictions from the values above once the duplicates have been removed from each list in the label column.

1 Answer

Assuming your DataFrame is named df:

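    import pandas as pd
    from sklearn.preprocessing import MultiLabelBinarizer

    # Collect the labels of each (ID, mid, value) group into a single tuple
    df2 = pd.DataFrame(df.groupby(['ID', 'mid', 'value'])['label'].apply(lambda x: tuple(x.values)))
    df2.reset_index(inplace=True)

    # Fit the binarizer on the grouped labels, then encode them
    mlb = MultiLabelBinarizer()
    mlb.fit(df2['label'])
    mlb.transform(df2['label'])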
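
If each cell of the label column already holds a whole list, as the printed frame in the question suggests, the duplicates can be dropped per row before binarizing. The following is only a minimal sketch under that assumption: the sample values are invented for illustration (the question's lists are truncated), and the names unique_labels, encoded and y are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical sample data shaped like the question's frame;
# the label lists are made up because the originals are truncated.
df = pd.DataFrame(
    {
        "mid": [3, 4, 3],
        "value": [176.6, 73.6, 9603.2],
        "label": [[9, 6, 8, 0, 8, 8], [9, 6, 8, 0, 8, 8], [0, 0, 9, 6, 1]],
    },
    index=pd.Index([192, 192, 194], name="ID"),
)

# Remove duplicates inside each list; sorting makes the order deterministic.
unique_labels = df["label"].apply(lambda labels: sorted(set(labels)))

# Learn the set of classes and build the 0/1 indicator matrix in one step.
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(unique_labels)

# Wrap the result in a DataFrame: one column per class, same index as df.
y = pd.DataFrame(encoded, columns=mlb.classes_, index=df.index)
print(y)
```

Since the indicator matrix only records whether a label is present, the explicit deduplication mainly keeps the intermediate data tidy; y can then be used as the target alongside the mid and value columns.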
answered 2019-11-11T14:15:25.980