r - Unable to plot Zipf's law in R

Question

I loaded a large number of terms and their frequencies from a text file and converted them into a table:

myTbl = read.table("word_count.txt")  # read text file 

colnames(myTbl)<-c("term", "frequency")
head(myTbl, n = 10)

> head(myTbl, n = 10)
    term frequency
1     de     35945
2      i     34850
3  \xe3n     19936
4      s     15348
5     cu     13722
6     la     13505
7     se     13364
8     pe     13361
9     nu     12693
10     o     11995

I should probably add a column with each word's rank and then plot rank against frequency, but how do I do that?

score 4 · Accepted Answer

Rather than computing it yourself, it is easier to use the tm package. Convert myTbl into a term-document matrix (tdm):

library(tm)
tdm <- TermDocumentMatrix(myTbl) # there are many more clean up steps, but I am simplifying

Then you can display and plot not only Zipf's law but Heaps' law as well.

Zipf_plot(tdm) 
Heaps_plot(tdm) # how vocabulary grows as size of text grows

Alternatively, you can use the qdap package and its rank-frequency plots. Here is a quote from the vignette:

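Rank-frequency plots are a way of visualizing word rank versus frequency as related to Zipf's law, which states that the rank of a word is inversely proportional to its frequency. rank_freq_mplot and rank_freq_plot provide the means to plot the ranks and frequencies of words (rank_freq_mplot plots by grouping variable).
rank_freq_mplot uses the ggplot2 package, whereas rank_freq_plot uses base graphics.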
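
For comparison with the packages above, the rank column the question asks about can also be built by hand in base R. The following is only a minimal sketch: the small data frame is a hypothetical stand-in for myTbl, using a few of the rows shown in the question, and the fitted line is an illustrative addition rather than something either package is known to draw this way.

```r
# Hypothetical stand-in for myTbl: a few of the rows printed in the question.
myTbl <- data.frame(
  term      = c("de", "cu", "la", "se", "pe", "nu", "o"),
  frequency = c(35945, 13722, 13505, 13364, 13361, 12693, 11995),
  stringsAsFactors = FALSE
)

# Sort by descending frequency, then number the rows to obtain the rank.
myTbl <- myTbl[order(-myTbl$frequency), ]
myTbl$rank <- seq_len(nrow(myTbl))

# Zipf's law predicts a roughly straight line on log-log axes.
plot(myTbl$rank, myTbl$frequency, log = "xy",
     xlab = "rank", ylab = "frequency", main = "Rank vs. frequency")

# Least-squares fit of log(frequency) against log(rank);
# a slope close to -1 would match the classic form of the law.
fit <- lm(log(frequency) ~ log(rank), data = myTbl)
lines(myTbl$rank, exp(fitted(fit)), col = "red")
```

With the real data, drop the stand-in data frame and run the remaining lines on the myTbl read from word_count.txt. coef(fit) then gives the estimated intercept and slope.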
