I have this data frame:
> str(final)
'data.frame': 112 obs. of 3 variables:
$ FAO_CountryName: chr Algeria Egypt Libya Morocco ...
$ FAO_CountryURL : chr "http://www.fao.org/giews/countrybrief/country.jsp?code=DZA" "http://www.fao.org/giews/countrybrief/country.jsp?code=EGY" "http://www.fao.org/giews/countrybrief/country.jsp?code=LBY" "http://www.fao.org/giews/countrybrief/country.jsp?code=MAR" ...
$ Text : chr "\r\n Reference Date: 24-November-2016\r\n \r\n \r\n FOOD SECURITY SNAPSHOT\r\n \r\n "| __truncated__ "\r\n Reference Date: 28-November-2016\r\n \r\n \r\n FOOD SECURITY SNAPSHOT\r\n \r\n "| __truncated__ "\r\n Reference Date: 15-November-2016\r\n \r\n \r\n FOOD SECURITY SNAPSHOT\r\n \r\n "| __truncated__ "\r\n Reference Date: 21-September-2016\r\n \r\n \r\n FOOD SECURITY SNAPSHOT\r\n \r\n "| __truncated__ ...
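I would like to process the Text variable so that I can, for example, count how many times each word occurs in it, row by row. In other words, I would like to end up with a data frame like this: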
> head(final, n=2)
FAO_CountryName FAO_CountryURL Text WordCount
Algeria http://www.fao.org… Algeria is nice… Algeria 1
is 1
...
Egypt http://www.fao.org… Egypt is nice too… Egypt 1
is 5
...
So far, I have done this:
library(dplyr)     # %>%, count(), group_by(), summarize()
library(tidytext)  # unnest_tokens()

## Counting the words included in the textual dataset.
keywords <- text_df %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE) %>%
  ungroup()

## Scoring the textual frequencies into the textual dataset (i.e. how many times the words are present)
total_words <- keywords %>%
  group_by(word) %>%
  summarize(total = sum(n))
However, this only gives me the word counts across the entire column, not ROW BY ROW. Any suggestions?
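For reference, here is a minimal sketch of the kind of per-row count I am after, written against a made-up toy data frame rather than the real `final`. The name `final_toy` and its texts are purely illustrative assumptions. The idea is to keep the country column while tokenising and then count words within each country instead of over the whole column.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for `final`; these texts are invented for illustration only.
final_toy <- data.frame(
  FAO_CountryName = c("Algeria", "Egypt"),
  Text = c("Algeria is nice", "Egypt is nice too, is it not"),
  stringsAsFactors = FALSE
)

# Split Text into one word per row (the other columns are carried along),
# then count each word within each country, i.e. per original row.
word_counts <- final_toy %>%
  unnest_tokens(word, Text) %>%
  count(FAO_CountryName, word, sort = TRUE)

word_counts
```

Note that `unnest_tokens()` lowercases words and strips punctuation by default, so "Algeria" is counted as "algeria". If country names were not unique, a row identifier (for example one added with `mutate(row_id = row_number())` before tokenising) could be used as the grouping column instead.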