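I would like to know whether the text2vec package can be used for multi-label classification, similar to BinaryRelevance from Python's skmultilearn.problem_transform. I am currently working from the pipeline documented at http://text2vec.org/vectorization.html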
1 Answer
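You can use text2vec to create a document-term matrix (DTM); the steps are described at http://text2vec.org/vectorization.html. Once your DTMs are ready, you can use them as input for multi-label classification. For the classification step, xgboost is one good choice of model; it is discussed at https://rpubs.com/mharris/multiclass_xgboost.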
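As a rough illustration of the first step, here is a minimal sketch of building matching training and test DTMs with text2vec. The toy texts and the variable names `train_texts` and `test_texts` are assumptions made for illustration; the calls follow the pattern shown on the vectorization page.

```r
library(text2vec)

# Hypothetical toy corpus; replace with your own documents.
train_texts <- c("the cat sat on the mat",
                 "dogs chase cats",
                 "stock prices fell today")
test_texts  <- c("the dog sat", "prices rose")

# Tokenize the training texts and build a vocabulary from them.
it_train <- itoken(train_texts, preprocessor = tolower,
                   tokenizer = word_tokenizer, progressbar = FALSE)
vocab <- create_vocabulary(it_train)
vectorizer <- vocab_vectorizer(vocab)

# Training document-term matrix (a sparse matrix, one row per document).
dtm_train <- create_dtm(it_train, vectorizer)

# Reuse the same vectorizer so the test columns match the training columns.
it_test <- itoken(test_texts, preprocessor = tolower,
                  tokenizer = word_tokenizer, progressbar = FALSE)
dtm_test <- create_dtm(it_test, vectorizer)
```

Building the vocabulary from the training texts only, and reusing the vectorizer for the test texts, keeps both matrices in the same feature space.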
    library(xgboost)

    # dtm_train is the training matrix obtained by text2vec
    # dtm_test is the testing matrix obtained by text2vec
    # label_train holds the labels for dtm_train as a factor, e.g.
    # label_train <- factor(label_train, labels = classes)

    # xgboost expects integer class ids starting at 0, not a factor
    label_int <- as.integer(label_train) - 1L

    nclass <- 3  # how many classes you have; must match nlevels(label_train)

    param <- list(
      objective        = "multi:softmax", # multi-class classification
      num_class        = nclass,          # number of classes
      eval_metric      = "mlogloss",      # evaluation metric
      nthread          = 8,               # number of threads to be used
      max_depth        = 16,              # maximum depth of a tree
      eta              = 0.3,             # step size shrinkage (learning rate)
      gamma            = 0,               # minimum loss reduction required to split
      subsample        = 0.7,             # fraction of rows sampled per tree
      colsample_bytree = 1,               # fraction of columns sampled per tree
      min_child_weight = 12               # minimum sum of instance weight in a child
    )

    bst <- xgboost(
      params  = param,
      data    = as.matrix(dtm_train),
      label   = label_int,
      nrounds = 200
    )

    # Make predictions on the testing data.
    # With multi:softmax, predict() returns the predicted class id (0-based) per row.
    pred <- predict(bst, as.matrix(dtm_test))
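The model above assigns exactly one class per document. For multi-label data in the sense of BinaryRelevance from the question, one possible approach is to fit an independent binary classifier per label on the same DTM. The sketch below is only an illustration under stated assumptions: `X` is a small random matrix standing in for `dtm_train`, `Y` is a hypothetical 0/1 indicator matrix with one column per label, and the label names are made up. This is not functionality provided by text2vec itself.

```r
library(xgboost)

# Toy stand-ins (assumptions): X plays the role of dtm_train, and Y is a
# 0/1 indicator matrix with one column per label.
set.seed(42)
X <- matrix(as.numeric(rbinom(200 * 5, 1, 0.5)), nrow = 200,
            dimnames = list(NULL, paste0("term", 1:5)))
Y <- cbind(sports = X[, 1], politics = X[, 2])

# Binary relevance: fit one independent binary classifier per label.
models <- lapply(colnames(Y), function(lbl) {
  dtrain <- xgb.DMatrix(data = X, label = Y[, lbl])
  xgb.train(
    params  = list(objective = "binary:logistic",
                   max_depth = 2, eta = 0.3, nthread = 1),
    data    = dtrain,
    nrounds = 20
  )
})
names(models) <- colnames(Y)

# Predict a probability per label, then threshold at 0.5 to get 0/1 labels.
prob <- sapply(models, function(m) predict(m, xgb.DMatrix(data = X)))
pred_labels <- (prob > 0.5) * 1L
```

Each row of `pred_labels` can then contain zero, one, or several labels, which is the difference from the single-class output of `multi:softmax`. The 0.5 threshold is an arbitrary choice and could be tuned per label.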
I hope this helps.
Let me know if you need further explanation.
answered 2018-10-29T22:34:15.023