I am trying to predict sentiment with a quanteda Naive Bayes model, using the following code:

library(quanteda)
X_train <-c( "I love this sandwich.",
             "This is an amazing place!",
             "I feel very good about these beers.",
             "This is my best work.",
             "What an awesome view",
             "I do not like this restaurant",
             "I am tired of this stuff.",
             "I can't deal with this",
             "He is my sworn enemy!",
             "this guy is horrible.")

Y_train <- c( 1,1,1,1,1,0,0,0,0,0)

X_test <- c( "The beer was good.",
             "I do not enjoy my job",
             "I ain't feeling dandy today.",
             "I feel amazing!   pos",
             "Gary is a friend of mine.",
             "I can't believ I'm doing this.",
             "very sad about Iran",
             "You're the only one who can see this cause no one else is following me this is for you because you're pretty awesome",
             "ok thats it you win.",
             "My horsie is moving on Saturday morning.",
             "times by like a million",
             "but i'm proud.",
             "i want a hug)")
Y_test <- c(1,0,0,1,1,0,0,1,1,0,1,1,1) 
dfm_mat <- dfm( X_train)
tfidf_mat <- tfidf( dfm_mat, normalize = TRUE)
model <- textmodel_NB( tfidf_mat, Y_train, distribution = "multinomial")

predict( model, X_test)

I get the following error message:

Error in newdata %*% t(log(object$PwGc)) : not-yet-implemented method for <character> %*% <dgeMatrix>
5.stop(gettextf("not-yet-implemented method for <%s> %%*%% <%s>", class(x), class(y)), domain = NA)
4.newdata %*% t(log(object$PwGc))
3.newdata %*% t(log(object$PwGc))
2.predict.textmodel_NB_fitted(model, X_test)
1.predict(model, X_test)

Running: quanteda_0.9.8.5
Matrix_1.2-7.1
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.10

Any ideas?

1 Answer

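The problem is that you are passing a character vector as the new data to predict() for the fitted Naive Bayes model, and (as the error message says, though admittedly not in the clearest way) the method is not defined for character vectors.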

The solution is to predict on a dfm object instead, one whose features have been matched to those of the training dfm.

# this creates a test dfm, and matches its features to the training dfm
dfm_test <- dfm_select(dfm(X_test), dfm_mat) 
## found 15 features from 36 supplied types in a dfm, padding 0s for another 21 

The predict() method then works as expected:

predict(model, dfm_test)
## Predicted textmodel of type: Naive Bayes
## 
##              lp(1)       lp(0)     Pr(1)  Pr(0) Predicted
## text1   -4.2419639  -4.3728368    0.5327 0.4673         1
## text2  -15.1799166 -14.8238632    0.4119 0.5881         0
## text3   -4.2637198  -4.2239433    0.4901 0.5099         0
## text4  -11.3125631 -11.5833225    0.5673 0.4327         1
## text5   -7.9101340  -7.7336472    0.4560 0.5440         0
## text6  -11.5324821 -11.2864767    0.4388 0.5612         0
## text7   -7.7907806  -8.0525264    0.5651 0.4349         1
## text8  -18.3944576 -18.5330895    0.5346 0.4654         1
## text9   -0.6931472  -0.6931472    0.5000 0.5000         1
## text10  -7.7792864  -7.7569503    0.4944 0.5056         0
## text11  -4.3754953  -4.2186861    0.4609 0.5391         0
## text12  -0.6931472  -0.6931472    0.5000 0.5000         1
## text13  -4.2637198  -4.2239433    0.4901 0.5099         0
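
To make the feature-matching step more concrete, here is a minimal base-R sketch of the idea on plain count matrices. It is only an illustration, not how quanteda implements dfm_select(): the helper match_features() and the toy data are hypothetical. Test features that were never seen in training are dropped, and training features absent from the test data are padded with zeros, so the test matrix ends up with the same columns the model was fitted on.

```r
# Hypothetical helper (not part of quanteda): align the columns of a test
# count matrix to the feature set seen in training.
match_features <- function(test_mat, train_features) {
  # Start from an all-zero matrix with one column per training feature
  out <- matrix(0, nrow = nrow(test_mat), ncol = length(train_features),
                dimnames = list(rownames(test_mat), train_features))
  # Copy counts for features present in both; unseen test features are dropped
  shared <- intersect(colnames(test_mat), train_features)
  out[, shared] <- test_mat[, shared, drop = FALSE]
  out
}

# Toy data (assumed for illustration only)
train_features <- c("i", "love", "this", "good")
test_mat <- matrix(c(1, 1, 0,
                     0, 1, 2),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("text1", "text2"),
                                   c("good", "beer", "this")))

aligned <- match_features(test_mat, train_features)
print(aligned)
```

Here "beer" is dropped because it never appeared in training, while "i" and "love" become all-zero columns. With matching columns, the matrix product that failed in the traceback is well defined.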
answered 2016-11-18T13:50:20.110