r - grepl to find words

Question

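I am trying to find Spanish words among a lot of words in R. I have all the Spanish words in an Excel file that I don't know how to attach to the post (it has more than 80,000 words), and I want to check whether some words are in it.

For example: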

words = c("Silla", "Sillas", "Perro", "asdfg")

I tried this solution:

grepl(paste(spanish_words, collapse = "|"), words)

But there are too many Spanish words, and it gives me this error:

[image: error message]

So... what can I do? I also tried this:

toupper(words) %in% toupper(spanish_words)

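[image: result]

As you can see, this option returns TRUE only for exact matches, and I need "Sillas" to come back TRUE as well (it is the plural of silla). That is why I tried grepl first: so that plurals would match too.

Any ideas?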

score 1 · Accepted Answer

As a df:

library(dplyr)    # tibble(), mutate(), %>%
library(stringr)  # str_detect()

df <- tibble(text = c("some words", 
                      "more words", 
                      "Perro", 
                      "And asdfg", 
                      "Comb perro and asdfg"))

The vector of words:

words <- c("Silla", "Sillas", "Perro", "asdfg")
words <- tolower(paste(words, collapse = "|"))
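To make the collapsed pattern concrete, here is a small base R sketch of what the string looks like and how it behaves. The `bounded` variant and the test string "ensillar" are assumptions added for illustration, not part of the original answer; they show that a plain alternation also matches inside longer words, which may or may not be what you want.

```r
# Input copied from the question
words <- c("Silla", "Sillas", "Perro", "asdfg")

# Collapse into one lower-cased alternation pattern
pattern <- tolower(paste(words, collapse = "|"))
print(pattern)

# Assumption: if only whole words should match, wrap the pattern in \b ... \b
bounded <- paste0("\\b(", pattern, ")\\b")

# "ensillar" contains "silla" as a substring
substring_hit <- grepl(pattern, "ensillar")
whole_word_hit <- grepl(bounded, "ensillar", perl = TRUE)
print(c(substring_hit, whole_word_hit))
```

With the unbounded pattern, "ensillar" counts as a hit because it contains "silla"; the bounded pattern rejects it.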

Then use mutate and str_detect:

df %>% 
  mutate(
    text = tolower(text),                    # lower-case to match the lower-cased pattern
    spanish_word = str_detect(text, words)   # TRUE if any word occurs in the text
  )

Returns:

  text                 spanish_word
  <chr>                <lgl>
1 some words           FALSE
2 more words           FALSE
3 perro                TRUE
4 and asdfg            TRUE
5 comb perro and asdfg TRUE
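For a dictionary of 80,000+ words, a single alternation pattern can hit the regex size limit described in the question. One possible alternative, sketched below in base R, avoids the giant regex by using exact lookups with `%in%` on a few candidate forms of each word. The helper `is_spanish()` and the three-word `spanish_words` vector are hypothetical stand-ins, and the plural rule (strip a trailing "s" or "es") is a naive assumption that will miss irregular plurals such as "lápices".

```r
# Hypothetical stand-in for the 80,000-word dictionary from the question
spanish_words <- c("silla", "perro", "papel")
words <- c("Silla", "Sillas", "Perro", "asdfg", "Papeles")

# Hypothetical helper: TRUE if the word itself, or the word without a
# trailing "s" or "es", appears in the dictionary (naive plural rule)
is_spanish <- function(x, dict) {
  x <- tolower(x)
  dict <- tolower(dict)
  x %in% dict |
    sub("s$", "", x) %in% dict |
    sub("es$", "", x) %in% dict
}

result <- is_spanish(words, spanish_words)
print(result)
# TRUE TRUE TRUE FALSE TRUE
```

Because `%in%` does a hashed lookup rather than compiling a pattern, the size of the dictionary should not trigger the regex error, at the cost of only recognizing the plural forms the rule anticipates.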
