I'm using the function keywords_rake from the udpipe package (for R) to extract keywords from a collection of documents.
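library(udpipe)
# dl is assumed to come from udpipe_download_model(); it was not defined in the original snippet
dl <- udpipe_download_model(language = "english")
udmodel_en <- udpipe_load_model(file = dl$file_model)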
x <- udpipe_annotate(udmodel_en, x = data$text)
x <- as.data.frame(x)
keywords <- keywords_rake(x = x, term = "lemma", group = "doc_id",
                          relevant = x$xpos %in% c("NN", "JJ"), ngram_max = 2)
The data looks like this:
Text
"cats are nice but dogs are better..."
"I really like dogs..."
"red flowers are pretty, especially roses..."
"once I saw a blue whale ..."
....
(Each row is a separate document.)
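However, the output doesn't say which document each keyword came from; it just gives a single keyword list for all documents combined.
How can I link these keywords back to the documents they came from, so that each document has its own list of keywords?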
Something like this:
keywords
doc1 dog, cat, blue whale
doc2 dog
doc3 red flower, tower, Donald Trump
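One possible approach, sketched under assumptions: keywords_rake() returns one corpus-wide table (with keyword and ngram columns), so you can go back to the annotated data frame and, for each doc_id, check which runs of one or two consecutive lemmas appear in keywords$keyword. The helper keywords_by_doc() and the toy x and keywords data frames below are hypothetical stand-ins, not part of udpipe. As a simplification, n-grams are also formed across sentence boundaries within a document. udpipe's txt_recode_ngram() may offer a more direct way to merge multi-word keywords back into the token table; check its documentation.

```r
# Hypothetical toy stand-in for as.data.frame(udpipe_annotate(...)); only doc_id and lemma are used
x <- data.frame(
  doc_id = c("doc1", "doc1", "doc1", "doc1", "doc2", "doc2", "doc3", "doc3"),
  lemma  = c("cat", "be", "nice", "dog", "like", "dog", "blue", "whale"),
  stringsAsFactors = FALSE
)
# Hypothetical toy stand-in for keywords_rake() output (only the keyword column is used)
keywords <- data.frame(
  keyword = c("cat", "dog", "blue whale"),
  ngram   = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each document, build all n-grams (n = 1..ngram_max)
# of consecutive lemmas and keep those that are RAKE keywords
keywords_by_doc <- function(x, keywords, ngram_max = 2) {
  # split() keeps the lemmas of each document in their original order
  per_doc <- lapply(split(x$lemma, x$doc_id), function(lemmas) {
    n <- length(lemmas)
    candidates <- lemmas  # unigrams
    if (n >= 2 && ngram_max >= 2) {
      for (k in 2:min(ngram_max, n)) {
        starts <- seq_len(n - k + 1)
        kgrams <- vapply(starts, function(i) paste(lemmas[i:(i + k - 1)], collapse = " "),
                         character(1))
        candidates <- c(candidates, kgrams)
      }
    }
    unique(candidates[candidates %in% keywords$keyword])
  })
  # One row per document, keywords collapsed into a comma-separated string
  data.frame(
    doc_id   = names(per_doc),
    keywords = vapply(per_doc, paste, character(1), collapse = ", ", USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

res <- keywords_by_doc(x, keywords, ngram_max = 2)
print(res)
```

With real data you would pass the annotated data frame and the actual keywords_rake() result, using the same ngram_max as in the keywords_rake() call.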