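I have a file with several string (text) variables, where each respondent has written a sentence or two for each variable. I want to be able to find the frequency of each combination of words (i.e. how often "capability" occurs together with "performance"). My code so far is: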
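# Set up the data file: read each line of the CSV as one string
data.text <- scan("C:/temp/tester.csv", what="char", sep="\n")
# Change everything to lower case
data.text <- tolower(data.text)
# Split the strings into separate words
data.words.list <- strsplit(data.text, "\\W+", perl=TRUE)
data.words.vector <- unlist(data.words.list)
# Drop empty tokens left when a line starts with a non-word character such as a quote
data.words.vector <- data.words.vector[data.words.vector != ""]
# List each word and its frequency
data.freq.list <- table(data.words.vector)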
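This gives me a list of every word and how often it appears across the string variables. Now I would like to see the frequency of every 2-word combination. Is this possible?
Thanks!
Sample of the string data: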
ID Reason_for_Dissatisfaction Reason_for_Likelihood_to_Switch
1 "not happy with the service" "better value at other place"
2 "poor customer service" "tired of same old thing"
3 "they are overchanging me" "bad service"
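To make the goal concrete, here is a rough sketch of the kind of count I am after, run on the sample strings above rather than on the real file. It assumes that a "combination" means an unordered pair of distinct words appearing in the same response (not necessarily next to each other); if adjacent word pairs were meant instead, the pairing step would need to change. The names `responses`, `words.list`, `pairs.list` and `pair.freq` are made up for this illustration.

```r
# Sample responses copied from the example data above
responses <- c("not happy with the service",
               "poor customer service",
               "they are overchanging me",
               "better value at other place",
               "tired of same old thing",
               "bad service")

# Lower-case each response and split it into its unique, sorted words
# (sorting makes "poor service" and "service poor" count as the same pair)
words.list <- lapply(strsplit(tolower(responses), "\\W+", perl = TRUE),
                     function(w) sort(unique(w[w != ""])))

# Build every unordered pair of distinct words within one response
pairs.list <- lapply(words.list, function(w) {
  if (length(w) < 2) return(character(0))  # combn() needs at least 2 words
  apply(combn(w, 2), 2, paste, collapse = " ")
})

# Count how many responses contain each pair, most frequent first
pair.freq <- sort(table(unlist(pairs.list)), decreasing = TRUE)
head(pair.freq)
```

Because each response contributes a pair at most once, `pair.freq` counts responses containing both words rather than raw word repetitions; whether that is the right definition for my data is part of what I am asking.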