R: Error when creating a TermDocumentMatrix() object

Question

This is the code I use to create a TermDocumentMatrix object for the training data:

text_train = iconv(data_train$SentimentText, "UTF-8", "ASCII", sub = "")
corpus_train = Corpus(VectorSource(text_train))
tdm_train = TermDocumentMatrix(
  corpus_train,
  control = list(
    removePunctuation = TRUE,
    removestopWords   = TRUE,
    stemming = FALSE,
    removeNumbers = TRUE, 
    tolower = TRUE,
    weighting = weightTfIdf)
)

It works! R does not complain.

However, when I use the same approach to create one for the validation set, R complains!

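Here is the code I use to create the TermDocumentMatrix object for the validation set. Note that the only difference is that I added the "dictionary" argument to the control list: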

text_val = iconv(data_val$SentimentText, "UTF-8", "ASCII", sub = "")
corpus_val = Corpus(VectorSource(text_val))
tdm_val = TermDocumentMatrix(
  corpus_val,
  control = list(
    removePunctuation = TRUE,
    removestopWords   = TRUE,
    stemming = FALSE,
    removeNumbers = TRUE, 
    tolower = TRUE,
    weighting = weightTfIdf,
    dictionary = tdm_train$dimnames$Terms
  )
)

However, I keep getting the following error message:

Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), : 'i, j, v' different lengths
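A minimal sketch of one commonly suggested workaround, not a confirmed fix for this exact case. It assumes a recent tm version, where Corpus(VectorSource(...)) returns a SimpleCorpus that has its own TermDocumentMatrix code path. The sketch builds both corpora with VCorpus instead, so the matrices go through the general termFreq() pipeline, and reuses the training vocabulary via Terms(). The two short text vectors are hypothetical stand-ins for data_train$SentimentText and data_val$SentimentText. The sketch also uses "stopwords", which is the option name tm documents, in place of "removestopWords".

```r
library(tm)

# Hypothetical stand-ins for data_train$SentimentText and data_val$SentimentText
text_train <- c("I had a great day today! :>", "Worst service ever, never again")
text_val   <- c("What a great service", "Another day, another tweet")

# Shared preprocessing options; tm documents the stopword option as `stopwords`
ctrl <- list(
  removePunctuation = TRUE,
  stopwords         = TRUE,
  stemming          = FALSE,
  removeNumbers     = TRUE,
  tolower           = TRUE,
  weighting         = weightTfIdf
)

# VCorpus instead of Corpus, so the SimpleCorpus code path is not used
corpus_train <- VCorpus(VectorSource(text_train))
tdm_train <- TermDocumentMatrix(corpus_train, control = ctrl)

# Same options plus the training vocabulary as the dictionary
corpus_val <- VCorpus(VectorSource(text_val))
tdm_val <- TermDocumentMatrix(
  corpus_val,
  control = c(ctrl, list(dictionary = Terms(tdm_train)))
)

# Every validation term should come from the training vocabulary
stopifnot(all(Terms(tdm_val) %in% Terms(tdm_train)))
```

With a dictionary, weightTfIdf may warn about unreferenced terms (training terms that never occur in the validation tweets); that is a warning, not an error.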

I have read many posts about this error, including:

I tried all of the solutions they suggested, but none of them worked.

One more thing: the problem only appears when I use more than 2000 tweets.
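Because the failure only shows up above roughly 2000 tweets, it may be triggered by particular tweets rather than by the code itself. A hedged debugging sketch: the hypothetical helper find_failing_chunks() builds a TermDocumentMatrix for each chunk of the input with a given control list and returns the starting index of every chunk whose construction throws an error. The helper name, the chunk_size default and the toy call are assumptions to adapt.

```r
library(tm)

# Hypothetical helper: build a TermDocumentMatrix for each chunk of `texts`
# with the given control list and return the start index of every chunk
# whose construction throws an error.
find_failing_chunks <- function(texts, control, chunk_size = 500) {
  failed <- numeric(0)
  for (start in seq(1, length(texts), by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1, length(texts))
    ok <- tryCatch({
      # Corpus() is kept so the check follows the same code path as the question
      TermDocumentMatrix(Corpus(VectorSource(texts[idx])), control = control)
      TRUE
    }, error = function(e) FALSE)
    if (!ok) failed <- c(failed, start)
  }
  failed
}

# Toy call: three short texts in chunks of two
res <- find_failing_chunks(c("good day", "bad day", "great day"),
                           control = list(tolower = TRUE), chunk_size = 2)
print(res)
```

Calling it on text_val with the question's own control list (including the dictionary), and then shrinking chunk_size inside a failing chunk, would narrow the search down to individual tweets.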

A note on the input data:

The input data is a data table with two columns, one of which is named "SentimentText" (the column used in the code above).

In this column, each row is one tweet, and each tweet is a text string, i.e. a character().

A sample tweet, i.e. one row of the data, looks like this: "I had a great day today! :>"
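To see what the control options do to a single tweet, termFreq() from tm can be applied to the sample tweet above. This is a minimal sketch. It uses tm's documented "stopwords" option in place of "removestopWords" and leaves out "weighting", which applies to whole matrices rather than single documents.

```r
library(tm)

# The sample tweet from the question
tweet <- "I had a great day today! :>"

# Per-document term frequencies after the same kind of preprocessing;
# the ":>" emoticon and short or stopword tokens should not survive
tf <- termFreq(
  PlainTextDocument(tweet),
  control = list(removePunctuation = TRUE, stopwords = TRUE,
                 removeNumbers = TRUE, tolower = TRUE)
)
print(tf)
```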

Any help is much appreciated!
