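Let's start with a reproducible example: `key`, an object
with 8 columns and 3 rows (strictly speaking a character matrix rather than a data frame):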
key <- structure(c("Make Professional Maps with QGIS and Inkscape",
"Gain the skills to produce original, professional, and aesthetically pleasing maps using free software",
"English", "Inkscape 101 for Beginners - Design Vector Graphics",
"Learn how to create and design vector graphics for free!", "English",
"Design & Create Vector Graphics With Inkscape 2016", "The Beginners Guide to designing and creating Vector Graphics with Inkscape. No Experience needed!",
"English", "Design a Logo for Free in Inkscape", "Learn from an award winning, published logo design professional!",
"English", "Inkscape - Beginner to Pro", "If you want to have a decent learning curve, you are new to the program or even in design, this course is for you.",
"English", "Creating 2D Textures in Inkscape", "A guide to creating colorful and interesting textures in inkscape.",
"English", "Vector Art in Inkscape - Icon Design | Make Vector Graphics",
"Learn Icon Design by creating Vector Graphics using the .SVG and PNG format with the Free Software Inkscape!",
"English", "Inkscape and Bootstrap 3 -> Responsive Web Design!",
"Design responsive websites using Free tools Inkscape and Bootstrap 3! Mood Boards and Style Tiles to Mobile First!",
"English"), .Dim = c(3L, 8L), .Dimnames = list(c("Title", "Short_Description",
"Language"), c("1", "2", "4", "5", "6", "9", "13", "15")))
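I want to extract keywords from each column independently. To do this I am using the udpipe
package in R.
Since I want to run the function on every column, I run a for
loop.
Before starting, we create the model with English as the reference language (see this link for more information):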
library(udpipe)
ud_model <- udpipe_download_model(language = "english")
ud_model <- udpipe_load_model(ud_model$file_model)
Ideally, my final output would be a data frame with 8 columns and as many rows as there are extracted keywords.
I tried two approaches:
Approach 1: using dplyr
library(dplyr)
keywords <- list()
for(i in ncol(keywords_en_t)){
  keywords[[i]] <- keywords_en_t %>%
    udpipe_annotate(ud_model,s)
  as.data.frame()
}
Approach 2:
key <- list()
stats <- list()
for(i in ncol(keywords_en_t)){
  key[[i]] <- as.data.frame(udpipe_annotate(ud_model, x = keywords_en_t[,i]))
  stats[[i]] <- subset(key[[i]], upos %in% "NOUN")
  stats <- txt_freq(x = stats$lemma)
}
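One thing that may be worth checking in both loops is the iteration range. In base R, `ncol(x)` returns a single number, so `for (i in ncol(x))` runs only once, with `i` equal to the last column index, whereas `seq_len(ncol(x))` walks over every column. The sketch below shows the difference on a made-up 3 x 8 matrix `m` (a hypothetical stand-in for the real data, not part of the original example); it does not touch udpipe at all.

```r
# Hypothetical stand-in for the real data: a 3 x 8 character matrix
m <- matrix(as.character(1:24), nrow = 3, ncol = 8)

# Looping over ncol(m) visits a single value: the number of columns
visited_once <- integer(0)
for (i in ncol(m)) {
  visited_once <- c(visited_once, i)
}

# Looping over seq_len(ncol(m)) visits every column index
visited_all <- integer(0)
for (i in seq_len(ncol(m))) {
  visited_all <- c(visited_all, i)
}

print(visited_once)  # 8
print(visited_all)   # 1 2 3 4 5 6 7 8
```

This probably explains only part of the problem, but it would account for lists in which every element except the last one is empty.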
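Output
In both cases I either get errors or the output is not what I expected.
As mentioned above, my expected output is a data frame with 8 columns whose rows contain the keywords.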
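To make the target shape concrete, here is a minimal sketch of the kind of result I have in mind. It assumes the number of keywords will differ from column to column, so shorter columns are padded with `NA`. The list `kw` and the keywords in it are invented for illustration and are not real udpipe output.

```r
# Hypothetical keywords per column (invented, not produced by udpipe)
kw <- list(
  "1" = c("map", "skill"),
  "2" = c("graphic"),
  "4" = c("guide", "experience", "graphic")
)

# Pad every vector with NA up to the length of the longest one
n <- max(lengths(kw))
padded <- lapply(kw, function(v) {
  length(v) <- n  # extending a vector fills the new slots with NA
  v
})

# One column per original column, one row per keyword slot
out <- data.frame(padded, check.names = FALSE, stringsAsFactors = FALSE)
print(out)
```

With the real data, this would have 8 columns instead of 3.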
Any ideas?