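Create a data.frame with 2 columns and store it somewhere: as an rds file, as a database object, or in Excel. That way you can load it whenever you need it.
Once the data is in a data.frame, you can use joins / dictionaries to match it against the words in your text corpus. In the scoring data.frame I use 1 and 2 to represent the sectors, but you could also use words.
See the example using tidytext, but do read up on sentiment analysis and use whichever package you need.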
library(tidytext)
library(dplyr)
text_df <- data.frame(id = 1:2,
                      text = c("The business is in the mining industry and has a settlement.",
                               "The court ordered the business owner to settle the lawsuit."))
text_df %>%
  unnest_tokens(word, text) %>%
  inner_join(my_scoring_df)
Joining, by = "word"
  id       word sector
1  1   business      1
2  1   industry      1
3  1 settlement      2
4  2      court      2
5  2   business      1
6  2    lawsuit      2
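If you prefer words over 1 and 2 for the sectors, and want one summary per document rather than a row per matched word, something along these lines might work. It is only a sketch: the labels "economy" and "law" are my own guesses at what the two sectors stand for, and `scoring` is a shortened stand-in for the full scoring table.

```r
library(tidytext)
library(dplyr)

# Shortened stand-in for the scoring table; the sector labels are
# hypothetical names chosen for illustration only.
scoring <- data.frame(
  word   = c("business", "industry", "settlement", "court", "lawsuit"),
  sector = c("economy", "economy", "law", "law", "law"),
  stringsAsFactors = FALSE
)

text_df <- data.frame(
  id   = 1:2,
  text = c("The business is in the mining industry and has a settlement.",
           "The court ordered the business owner to settle the lawsuit."),
  stringsAsFactors = FALSE
)

# Tokenize, keep only words found in the scoring table,
# then count the matched words per document and sector.
sector_counts <- text_df %>%
  unnest_tokens(word, text) %>%
  inner_join(scoring, by = "word") %>%
  count(id, sector)

print(sector_counts)
```

Passing `by = "word"` explicitly also silences the "Joining, by" message.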
Data:
my_scoring_df <- structure(list(word = c("business", "exchange", "industry", "rule",
"settlement", "umpire", "court", "tribunal", "lawsuit", "bench",
"courthouse", "courtroom"), sector = c(1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA,
-12L))
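For the "store it somewhere" part, a minimal sketch with base R's `saveRDS()` and `readRDS()` could look like this. The file path is a throwaway `tempfile()` used only for illustration (in practice you would pick a permanent location), and `scoring` is a shortened stand-in for the table above.

```r
# Shortened stand-in for the scoring table
scoring <- data.frame(
  word   = c("business", "industry", "court", "lawsuit"),
  sector = c(1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical location; replace with a path of your own
path <- tempfile(fileext = ".rds")

# Write the table once ...
saveRDS(scoring, path)

# ... and load it again in any later session
loaded <- readRDS(path)

print(identical(loaded, scoring))
```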