r - Combining two words in a corpus with R

Question

So here is my code:

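library(text2vec)

ny <- read.csv2("nyt.csv", sep = "\t", header = TRUE)
# as.vector() on a data frame returns a list of columns, but itoken() needs a
# character vector with one document per element (assumes the article text
# is in the first column)
ny_texte <- as.character(ny[[1]])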

iterator <- itoken(ny_texte,
                   preprocessor=tolower, 
                   tokenizer=word_tokenizer, 
                   progressbar=FALSE)

vocabulary <- create_vocabulary(iterator)

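My .csv contains New York Times articles. I want the vocabulary to treat phrases such as "New York", "South Africa" and "Ellis Island" as single terms, instead of only having separate tokens like "new", "york", and so on.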

How can I do this?

Thank you.

To be more precise, I am using these libraries:

library(text2vec)
library(stopwords)
library(tm)
library(dplyr)
library(readr)

Here is an example of my data:

ny[1]

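1 " Possible Presidential Campaign Waits in the Wings LEAD: Governor Cuomo sworn in on New Year's Eve for a second term as New York's chief executive LEAD: Governor Cuomo, with a possible presidential campaign waiting in the wings...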

vocabulary [screenshot of the vocabulary output, not available]

score 0 · Accepted Answer

Your question is still a bit hard to answer: we can't run your code because we don't have "nyt.csv". But it looks like gsub() will do what you want:

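ny <- read.csv2("nyt.csv", sep = "\t", header = TRUE)
# Work on the text column itself: gsub() on a whole data frame would coerce
# it to deparsed strings (assumes the article text is in the first column)
ny_texte <- as.character(ny[[1]])
ny_texte <- gsub("new york", "newyork", ny_texte, ignore.case = TRUE)
ny_texte <- gsub("south africa", "southafrica", ny_texte, ignore.case = TRUE)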

(Then run the itoken() and create_vocabulary() commands from your example.)
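If there are many phrases, the gsub() calls above can be generalized into a loop. The sketch below is one way to do it, assuming the phrases are plain words (no regex metacharacters). `merge_phrases()` is a hypothetical helper, not part of base R or text2vec. It matches each phrase as whole words, tolerates extra whitespace between the words, and joins them with an underscore (use `collapse = ""` instead to get "newyork" as in the answer above).

```r
# Hypothetical helper (not part of base R or text2vec): replace each
# multi-word phrase with a single underscore-joined, lower-case token so the
# tokenizer sees it as one word. Assumes phrases contain no regex metacharacters.
merge_phrases <- function(texts, phrases) {
  for (phrase in phrases) {
    words <- strsplit(phrase, "\\s+")[[1]]
    # \b anchors keep "New York" from matching inside "New Yorker";
    # \s+ allows any run of whitespace between the words
    pattern <- paste0("\\b", paste(words, collapse = "\\s+"), "\\b")
    replacement <- tolower(paste(words, collapse = "_"))
    texts <- gsub(pattern, replacement, texts, ignore.case = TRUE, perl = TRUE)
  }
  texts
}

# Small in-memory example instead of nyt.csv
texts <- c("Governor Cuomo was sworn in as New York chief executive.",
           "She visited South   Africa and Ellis Island.")
phrases <- c("New York", "South Africa", "Ellis Island")
result <- merge_phrases(texts, phrases)
print(result)
```

With real data you would call `ny_texte <- merge_phrases(ny_texte, phrases)` before building the itoken() iterator.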
