I am using lime on a supervised text model, following this example: https://rdrr.io/github/thomasp85/lime/man/lime.html

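The only thing I changed is the get_matrix function, which now creates the DTM. The new function works on the data from the linked example, but not on my real data. I get this error: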

Error in glmnet(x[, c(features, j), drop = FALSE], y, weights = weights,  : x should be a matrix with 2 or more columns

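The code I am using is below. The data and analysis are made up for this question, but they reproduce the error I get on my real data (where I have 1000 text documents instead of 10):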

    library(text2vec)  # itoken, create_vocabulary, vocab_vectorizer, create_dtm
    library(xgboost)   # xgb.train, xgb.DMatrix
    library(lime)      # lime, explain
    # Assumption: stopwords() comes from the stopwords package (tm::stopwords() is similar)
    library(stopwords)

    data <- data.frame(articles = c("Prince Harry proposed to Meghan", "Football transfer rumours Chelsea David Luiz", "Football transfer rumours Chelsea David Luiz",
                                    "World Cup team by team guide", "Destiny free trial goes live today", "What happens today ahead of crucial vote",
                                    "Story image for sport news football from BBC Sport", "Premier League news conferences", "What is Meghan Markles engagement ring", "Harry and Megan"),
                       topic = c("other", "sport", "sport", "sport", "other", "other", "sport", "sport", "other", "other"))

    data$articles <- as.character(data$articles)
    data$topic <- as.character(data$topic)
    data_train <- data[1:6, ]
    data_test <- data[6:10, ]

    my_stop_word <- c(stopwords(), "one", "two", "three")

    # Builds a vocabulary from whatever text is passed in, then returns its document-term matrix
    get_matrix <- function(text) {
      it <- itoken(text, tolower, progressbar = FALSE)
      vocab2 <- create_vocabulary(it, stopwords = my_stop_word)
      vectorizer <- vocab_vectorizer(vocab2)
      create_dtm(it, vectorizer = vectorizer)
    }

    dtm_train <- get_matrix(data_train$articles)
    xgb_model <- xgb.train(list(max_depth = 7, eta = 0.1, objective = "binary:logistic",
                                eval_metric = "error", nthread = 1),
                           xgb.DMatrix(dtm_train, label = data_train$topic == "sport"),
                           nrounds = 50)

    # Explain the first "sport" sentence in the test set
    sentences <- head(data_test[data_test$topic == "sport", "articles"], 1)
    explainer <- lime(data_test$articles, xgb_model, get_matrix)
    explanations <- explain(sentences, explainer, n_labels = 1, n_features = 2)

Error: Error in glmnet(x[, c(features, j), drop = FALSE], y, weights = weights,  : x should be a matrix with 2 or more columns

Thanks!

1 Answer

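I had the same problem. The error seems to come from n_features: when I increased n_features, it worked for me. Try a different value for n_features, for example one greater than 6, because lime then uses a different feature selection method, as discussed at https://shiring.github.io/machine_learning/2017/04/23/lime.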

n_features is the number of features used for each explanation. See https://www.rdocumentation.org/packages/lime/versions/0.4.0/topics/explain
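To make the threshold of 6 concrete, here is a small base-R sketch of the rule that the lime documentation describes for `feature_select = "auto"`: forward selection when `n_features <= 6`, otherwise `highest_weights`. The helper `auto_feature_select()` is hypothetical and exists only for illustration; it is not part of lime, and it does not reproduce lime's internals.

```r
# Hypothetical helper (NOT part of lime): mirrors the documented behaviour
# of feature_select = "auto" in lime::explain().
auto_feature_select <- function(n_features) {
  if (n_features <= 6) "forward_selection" else "highest_weights"
}

# The value used in the question falls on the forward-selection side
method_for_question <- auto_feature_select(2)

# Any value above 6 switches to the highest-weights method
method_for_larger_n <- auto_feature_select(7)

print(method_for_question)
print(method_for_larger_n)
```

If you want to keep a small n_features, `explain()` also accepts the method by name, so something like `explain(sentences, explainer, n_labels = 1, n_features = 2, feature_select = "highest_weights")` may avoid the forward-selection step altogether. That call is untested against the data in the question.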

Answered 2018-11-05T04:42:42.307