Suppose ABC is a data frame like the following:

ABC <- data.frame(Column1 = c(1.222, 3.445, 5.621, 8.501, 9.302), 
                  Column2 = c(654231, 12347, -2365, 90000, 12897), 
                  Column3 = c('A1', 'B2', 'E3', 'C1', 'F5'), 
                  Column4 = c('I bought it', 'The flower has a beautiful fragrance', 'It was bought by me', 'I have bought it', 'The flower smells good'), 
                  Column5 = c('Good', 'Bad', 'Ok', 'Moderate', 'Perfect'))

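My goal is to find synonymous strings in Column4. In this case, 'I bought it', 'It was bought by me' and 'I have bought it' are synonymous or similar strings, while 'The flower has a beautiful fragrance' and 'The flower smells good' convey a similar meaning.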

I tried IVR's approach from the following thread and got stuck: Find similar texts based on paraphrase detection

When I run the HLS.Extract code block, I get the following error message:

Error in strsplit(PlainTextDocument(synonyms(word)), ",") : non-character Argument

Wrapping word in as.character does not solve the problem either:

Syns = function(word){  
    word <- as.character(word) ###
    wl    =   gsub("(.*[[:space:]].*)","",      
                   gsub("^c\\(|[[:punct:]]+|^[[:space:]]+|[[:space:]]+$","",  
                        unlist(strsplit(PlainTextDocument(synonyms(word)),","))))
    wl = wl[wl!=""] 
    return(wl)     
  }  
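1. What is going wrong?

2. Is there a better way to code this in R that also creates a new column, e.g. with the number 1 as the entry for the first set of synonymous strings, 2 for the next set, and so on?

3. Does it work for German text?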

1 Answer

The problem was solved by converting PlainTextDocument(synonyms(word)) to character, as follows:

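Syns = function(word){
    # Convert the document object to character first: strsplit() only accepts character vectors
    wl = gsub("(.*[[:space:]].*)", "",
              gsub("^c\\(|[[:punct:]]+|^[[:space:]]+|[[:space:]]+$", "",
                   unlist(strsplit(as.character(PlainTextDocument(synonyms(word))), ","))))
    wl = wl[wl != ""]   # drop entries left empty by the clean-up
    return(wl)
}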
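
To illustrate why the conversion has to wrap the whole PlainTextDocument(...) call rather than word, here is a minimal base R sketch. It assumes a plain list as a stand-in for the document object (the real object comes from PlainTextDocument(), so this is only an analogy), and the names doc_like, failed and fixed are made up for the example.

```r
# A list stands in for the document object here (assumption for illustration only).
doc_like <- list(content = "buy, purchase, acquire")

# strsplit() rejects anything that is not a character vector,
# so this call fails; tryCatch() captures the error message as a string.
failed <- tryCatch(strsplit(doc_like, ","),
                   error = function(e) conditionMessage(e))

# Converting to character first gives strsplit() what it needs.
fixed <- trimws(unlist(strsplit(as.character(doc_like), ",")))

print(failed)
print(fixed)
```

Calling as.character(word) inside Syns changes only the input to synonyms(); the object handed to strsplit() is still not a character vector, which would explain why that attempt made no difference.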
Answered 2021-01-25T12:17:15.963
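
On question 2 (a new column with a group number), one possible starting point is sketched below. It is not synonym detection: it swaps in plain word overlap (Jaccard distance) with single-linkage clustering from base R, so it only groups sentences that share words. The helpers tokenize and group_similar and the cut_height threshold are hypothetical names and values chosen for this example, and the threshold would need tuning on real data. For German text the tokenization depends on the locale handling of letters, and shared meaning without shared words will not be caught.

```r
# Hypothetical helper: split one sentence into a set of lowercase word tokens.
tokenize <- function(x) {
  words <- strsplit(tolower(x), "[^[:alnum:]]+")[[1]]
  unique(words[words != ""])
}

# Hypothetical helper: give each string a group number based on word overlap.
# cut_height is an assumed threshold on the Jaccard distance (0 = same words, 1 = none shared).
group_similar <- function(texts, cut_height = 0.9) {
  tokens <- lapply(as.character(texts), tokenize)
  n <- length(tokens)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      shared <- length(intersect(tokens[[i]], tokens[[j]]))
      total  <- length(union(tokens[[i]], tokens[[j]]))
      d[i, j] <- if (total == 0) 0 else 1 - shared / total
    }
  }
  # Single linkage: a string joins a group if it is close to any member of it.
  tree <- hclust(as.dist(d), method = "single")
  cutree(tree, h = cut_height)
}

ABC <- data.frame(Column4 = c('I bought it',
                              'The flower has a beautiful fragrance',
                              'It was bought by me',
                              'I have bought it',
                              'The flower smells good'))
ABC$Group <- group_similar(ABC$Column4)
print(ABC)
```

With this threshold the three "bought" sentences end up in one group and the two "flower" sentences in another, because the two sets share no words at all. Sentences that mean the same thing but use different vocabulary would still need a synonym lookup such as the Syns function above.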