I have a DataFrame like this:
     mid   value  label
ID
192    3   176.6  [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
192    4    73.6  [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
192    5    15.8  [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
194    3  9603.2  [0, 0, 0, 0, 0, 9, 6, 1, 8, ...
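I want to apply MultiLabelBinarizer after removing the duplicate values from each list in the label column.
I tried looping over the frame and removing the duplicates. However, MultiLabelBinarizer does not work and raises an exception: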
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
mlb.fit(y_train.data)
X_train contains the mid and value columns.
y_train contains the label values.
ID is the index.
I expect to be able to make predictions from these values once the duplicates have been removed from each list in the label column.
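
For reference, a minimal sketch of the step described above: deduplicating each label list and then binarizing. It assumes the label column holds plain Python lists of integers. The frame built here is a hypothetical stand-in that uses only the visible prefix of each truncated list from the printout, and the names df and y_bin are illustrative. It also passes the Series itself to the binarizer rather than y_train.data, on the assumption that the .data attribute is what triggers the exception.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical stand-in for the real frame; the actual label lists are
# truncated in the printout, so only their visible prefixes are used here.
df = pd.DataFrame(
    {
        "mid": [3, 4, 5, 3],
        "value": [176.6, 73.6, 15.8, 9603.2],
        "label": [
            [9, 6, 8, 0, 8, 8, 7, 9, 2],
            [9, 6, 8, 0, 8, 8, 7, 9, 2],
            [9, 6, 8, 0, 8, 8, 7, 9, 2],
            [0, 0, 0, 0, 0, 9, 6, 1, 8],
        ],
    },
    index=pd.Index([192, 192, 192, 194], name="ID"),
)

# Drop duplicates inside each list while keeping first-seen order.
y_train = df["label"].apply(lambda labels: list(dict.fromkeys(labels)))

# Fit on the Series of lists (one iterable of labels per row) and
# produce one indicator column per distinct label.
mlb = MultiLabelBinarizer()
y_bin = mlb.fit_transform(y_train)

print(mlb.classes_)
print(y_bin)
```

Each row of y_bin marks which of the classes in mlb.classes_ appear in that row's list. As far as I can tell, MultiLabelBinarizer already treats each row as a set of labels, so the deduplication step may not be strictly required for the encoding itself.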