r - R's text-mining package... adding a new function to getTransformation

Question

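I am trying to add a new stemmer that works with a table-lookup approach. If h is a hash that holds the stemming operations, it is encoded as follows: each key is a word before stemming, and its value is the word after stemming.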
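The question does not say which hash implementation h uses. As an assumption, the sketch below builds it as a base-R environment with `list2env(..., hash = TRUE)`; the words and stems are made-up examples. Looking up a missing key with `[[` on an environment returns `NULL`, which is what the `length(h[[x]]) > 0` test in the code below relies on.

```r
# Hypothetical lookup table: keys are unstemmed words, values are their stems.
# A base-R environment is used as the hash here (an assumption; the question
# does not name a hash package).
words <- c("running", "runs", "cats", "studies")
stems <- c("run", "run", "cat", "studi")

h <- list2env(as.list(setNames(stems, words)), hash = TRUE)

h[["running"]]  # stem for a known word
h[["dog"]]      # NULL, because "dog" is not a key
```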

Ideally, I would like to plug in a custom hash so that I can do the following:

myCorpus = tm_map(myCorpus, replaceWords, h)

The replaceWords function is applied to each document in myCorpus and uses the hash to stem the document's contents.

Here is sample code for my replaceWords function:

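# Look up one word in the hash; fall back to the word itself if it is absent
hash_replace <- function(x, h) {
  if (length(h[[x]]) > 0) {
    return(h[[x]])
  } else {
    return(x)
  }
}

# Lowercase, split on spaces, drop empty tokens, then replace each token
replaceWords <- function(x, h) {
  y <- tolower(unlist(strsplit(x, " ")))
  y <- y[nchar(y) > 0]
  z <- unlist(lapply(y, hash_replace, h))
  return(paste(z, collapse = " "))
}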
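As a design alternative to calling hash_replace once per word, the same replacement can be done in one vectorised step with a named character vector. This is a hypothetical variant (`replace_words_vec` and `lookup` are illustrative names, not part of tm or the original code); it keeps the same tokenise, lowercase, and rejoin behaviour.

```r
# Hypothetical vectorised variant of replaceWords: look up all tokens at once
# in a named character vector (names = unstemmed words, values = stems).
replace_words_vec <- function(x, lookup) {
  y <- tolower(unlist(strsplit(x, " ", fixed = TRUE)))
  y <- y[nchar(y) > 0]                   # drop empty tokens from repeated spaces
  hit <- y %in% names(lookup)            # which tokens have an entry
  y[hit] <- unname(lookup[y[hit]])       # replace only those tokens
  paste(y, collapse = " ")
}

lookup <- c(running = "run", cats = "cat")
replace_words_vec("The Cats were running", lookup)  # "the cat were run"
```

Words without an entry pass through unchanged, matching the fallback in hash_replace.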

Although this works, the transformed corpus no longer contains documents of type "TextDocument" or "PlainTextDocument"; its elements are now of type "character".

I tried using

return(as.PlainTextDocument(paste(unlist(z),collapse=' ')))

but this gave me an error when I tried to run it.

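In earlier versions of R's tm package I did see a replaceWords function that supported synonym and WordNet-based replacement. However, I no longer see it in the current version of tm (in particular, it does not appear when I call getTransformations()).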

Does anyone have an idea of how I could accomplish this?

Any help is greatly appreciated.

Cheers, Shivani

Thanks, Shivani Rao

score 1 · Accepted Answer

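You just need to use the PlainTextDocument function instead of as.PlainTextDocument. R automatically returns the value of the last expression in a function, so if you write the last line as follows, it will work: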

PlainTextDocument(paste(unlist(z),collapse=' '))
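The answer above targets the older tm API. In tm 0.6 and later, the usual way to apply a character-to-character function is to wrap it in `content_transformer()`, which replaces each document's content and keeps the document class intact. The sketch below assumes a recent tm is installed (it is skipped otherwise); the sample texts and the named-list hash are hypothetical.

```r
# Assumes tm >= 0.6; the whole block is skipped if tm is not installed.
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)

  # Hypothetical lookup table (named list): unstemmed word -> stem
  h <- list(running = "run", cats = "cat")

  # Same idea as replaceWords: tokenise, look up each token, rejoin
  replaceWords <- function(x, h) {
    y <- tolower(unlist(strsplit(x, " ", fixed = TRUE)))
    y <- y[nchar(y) > 0]
    z <- vapply(y, function(w) if (is.null(h[[w]])) w else h[[w]], character(1))
    paste(z, collapse = " ")
  }

  myCorpus <- VCorpus(VectorSource(c("The cats were running", "Cats sleep")))

  # content_transformer() passes content(doc) and h to replaceWords and stores
  # the returned string back into the document, so it stays a document object
  myCorpus <- tm_map(myCorpus, content_transformer(replaceWords), h)

  print(class(myCorpus[[1]]))
  print(content(myCorpus[[1]]))
}
```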
