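I would like to know whether the text2vec package can be used for multi-label classification, like Python's BinaryRelevance in skmultilearn.problem_transform. I am currently working from the pipeline documented at http://text2vec.org/vectorization.html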

1 Answer

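You can use text2vec to create a document-term matrix (DTM). To build the DTM, follow http://text2vec.org/vectorization.html. Once your DTM matrices are ready, you can use them for multi-label classification. For the classification step, xgboost is one of the good models; it is discussed at https://rpubs.com/mharris/multiclass_xgboost.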
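As a rough sketch of the vectorization step described above, the following builds a training and a test DTM with text2vec. The toy documents and the names `train_text` and `test_text` are assumptions made for illustration; only the text2vec functions themselves come from the linked vectorization guide.

```r
library(text2vec)

# Hypothetical toy documents; replace with your own text columns.
train_text <- c("the cat sat on the mat",
                "dogs chase cats",
                "stock prices rose today")
test_text  <- c("the dog sat", "prices fell")

# Tokenize the training documents and build the vocabulary from them only.
it_train <- itoken(train_text,
                   preprocessor = tolower,
                   tokenizer = word_tokenizer,
                   progressbar = FALSE)
vocab      <- create_vocabulary(it_train)
vectorizer <- vocab_vectorizer(vocab)

# Sparse document-term matrix for the training set.
dtm_train <- create_dtm(it_train, vectorizer)

# Reuse the same vectorizer so the test matrix has the same columns.
it_test <- itoken(test_text,
                  preprocessor = tolower,
                  tokenizer = word_tokenizer,
                  progressbar = FALSE)
dtm_test <- create_dtm(it_test, vectorizer)
```

Reusing the training vectorizer for the test set keeps both matrices in the same feature space, which the model needs at prediction time.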

library(xgboost)

# dtm_train is the training matrix obtained with text2vec
# dtm_test is the test matrix obtained with text2vec
# label_train holds the labels for dtm_train; it should be a factor
# label_train <- factor(label_train, labels = classes)

nclass <- 3  # how many classes you have
param       <- list("objective" = "multi:softmax", # multi class classification
               "num_class"= nclass ,          # Number of classes
               "eval_metric" = "mlogloss",    # evaluation metric 
               "nthread" = 8,                # number of threads to be used 
               "max_depth" = 16,             # maximum depth of tree 
               "eta" = 0.3,                  # step size shrinkage 
               "gamma" = 0,                  # minimum loss reduction 
               "subsample" = 0.7,            # part of data instances 
               "colsample_bytree" = 1,       # subsample ratio 
               "min_child_weight" = 12       # minimum sum of instance weight 
)

bst <- xgboost(
  params = param,
  data = as.matrix(dtm_train),
  label = as.integer(label_train) - 1,  # multi:softmax expects integer labels 0..(nclass - 1)
  nrounds = 200)

# Make predictions on the test data.
# With multi:softmax, each prediction is a 0-based class index.
pred <- predict(bst, as.matrix(dtm_test))
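Note that the multi:softmax objective used above assigns exactly one class per document, which is multi-class rather than multi-label classification. For multi-label problems in the BinaryRelevance sense, one option is to train an independent binary classifier per label. The following is a minimal sketch of that idea, not a text2vec or xgboost feature: the helpers `fit_binary_relevance` and `predict_binary_relevance`, the label names, and the random toy data are all hypothetical, and a small dense matrix stands in for the DTM.

```r
library(xgboost)

# Hypothetical toy data standing in for a DTM and a 0/1 label indicator matrix
# (one column per label; a document may have several labels set to 1).
set.seed(1)
x_train <- matrix(as.numeric(rbinom(40 * 6, 1, 0.4)), nrow = 40)
x_test  <- matrix(as.numeric(rbinom(10 * 6, 1, 0.4)), nrow = 10)
y_train <- cbind(sports   = rbinom(40, 1, 0.5),
                 politics = rbinom(40, 1, 0.5))

# Binary relevance: fit one binary:logistic model per label column.
fit_binary_relevance <- function(x, y, nrounds = 20) {
  lapply(seq_len(ncol(y)), function(j) {
    dtrain <- xgb.DMatrix(data = x, label = y[, j])
    xgb.train(params = list(objective = "binary:logistic",
                            max_depth = 3,
                            eta = 0.3,
                            nthread = 1),
              data = dtrain,
              nrounds = nrounds)
  })
}

# Predict each label independently and threshold the probabilities.
predict_binary_relevance <- function(models, x, label_names, threshold = 0.5) {
  dnew <- xgb.DMatrix(data = x)
  prob <- do.call(cbind, lapply(models, function(m) predict(m, dnew)))
  colnames(prob) <- label_names
  (prob >= threshold) * 1L
}

models <- fit_binary_relevance(x_train, y_train)
pred_labels <- predict_binary_relevance(models, x_test, colnames(y_train))
```

Each row of `pred_labels` can contain zero, one, or several 1s, which is what distinguishes this setup from the single-class output of multi:softmax. The 0.5 threshold is an arbitrary default and would normally be tuned per label.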

Hope that helps.

Let me know if you need further explanation.

answered 2018-10-29T22:34:15.023