r - Comparing frequency tables in R

Question

I have two frequency tables created in R with the table() function:

freq1 <- table(unlist(strsplit(topic_list1, split=";")))
freq2 <- table(unlist(strsplit(topic_list2, split=";")))

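topic_list1 and topic_list2 are strings containing textual representations of topics, separated by ;.

If possible, I would like a way to compare the two frequency tables.

So, if both lists contain the same topics with different frequencies, I would like to be able to see that. The same goes for topics that appear in one frequency table but not in the other.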

score 2 · Accepted Answer

There may be a more elegant way to do this, but this should work:

# here I'm generating some example data
set.seed(5)
topic_list1 <- paste(sample(letters, 20, replace=TRUE), collapse=";")
topic_list2 <- paste(sample(letters, 15, replace=TRUE), collapse=";")

# I don't make the tables right away
tl1      <- unlist(strsplit(topic_list1, split=";"))
tl2      <- unlist(strsplit(topic_list2, split=";"))
big_list <- unique(c(tl1, tl2))

# this computes your frequencies
lbl         <- length(big_list)
tMat1       <- matrix(rep(tl1, lbl), byrow=T, nrow=lbl)
tMat2       <- matrix(rep(tl2, lbl), byrow=T, nrow=lbl)
tMat1       <- cbind(big_list, tMat1)
tMat2       <- cbind(big_list, tMat2)
counts1     <- apply(tMat1, 1, function(x){sum(x[1]==x[2:length(x)])})
counts2     <- apply(tMat2, 1, function(x){sum(x[1]==x[2:length(x)])})
total_freqs <- rbind(counts1, counts2, counts1-counts2)

# this makes it nice looking & user friendly
colnames(total_freqs) <- big_list
rownames(total_freqs) <- c("topics1", "topics2", "difference")
total_freqs           <- total_freqs[ ,order(total_freqs[3,])]
total_freqs
            d  l  a  z  b f s y m r x h n i g k c v o
topics1     0  0  0  0  0 2 1 1 1 1 2 2 1 1 1 1 2 2 2
topics2     2  2  2  1  1 2 1 1 1 0 1 1 0 0 0 0 0 0 0
difference -2 -2 -2 -1 -1 0 0 0 0 1 1 1 1 1 1 1 2 2 2

From there you can just use the raw numbers or visualize them however you like (e.g., a dot plot). Here is a simple dot chart:

dev.new()  # opens a new graphics device; windows() is available on Windows only
  dotchart(t(total_freqs)[,3], main="Frequencies of topics1 - topics2")
  abline(v=0)
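
As a possibly more compact variant of the counting step above, the same kind of table can be built by giving both vectors a shared set of factor levels before calling table(), so that topics missing from one list show up as 0. This is only a sketch: the two input strings and the names all_topics and comparison are made up for illustration.

```r
# Hypothetical example input: two ";"-separated topic strings
topic_list1 <- "a;b;b;c;d;d"
topic_list2 <- "b;c;c;e"

tl1 <- unlist(strsplit(topic_list1, split = ";"))
tl2 <- unlist(strsplit(topic_list2, split = ";"))

# A shared set of levels makes both tables cover the same topics,
# with a count of 0 for topics that are absent from one list
all_topics <- sort(unique(c(tl1, tl2)))
counts1 <- as.vector(table(factor(tl1, levels = all_topics)))
counts2 <- as.vector(table(factor(tl2, levels = all_topics)))

comparison <- rbind(topics1    = counts1,
                    topics2    = counts2,
                    difference = counts1 - counts2)
colnames(comparison) <- all_topics

# Sort the columns by the difference row, as in the answer above
comparison <- comparison[, order(comparison["difference", ])]
comparison
```

A nonzero value in the difference row marks a topic whose frequency differs between the lists, and a 0 in topics1 or topics2 marks a topic that appears in only one of them.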


score 0 · Accepted Answer

You can simply draw a bar plot of them (using the beside=T argument), which gives you a way to visually compare the counts at each level. Here is an example:

counts <- table(mtcars$vs, mtcars$gear)
barplot(counts, col=c("darkblue","red"), legend=rownames(counts), beside=T)
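
Applied to the topic lists from the question, the same idea might look like the sketch below. The input strings and the name all_topics are assumptions for illustration; the two count vectors are stacked into a two-row matrix so that barplot() draws one pair of bars per topic.

```r
# Hypothetical example input: two ";"-separated topic strings
topic_list1 <- "a;b;b;c;d;d"
topic_list2 <- "b;c;c;e"

tl1 <- unlist(strsplit(topic_list1, split = ";"))
tl2 <- unlist(strsplit(topic_list2, split = ";"))

# Count both lists over the same set of topics (0 where a topic is absent)
all_topics <- sort(unique(c(tl1, tl2)))
counts <- rbind(
  topics1 = as.vector(table(factor(tl1, levels = all_topics))),
  topics2 = as.vector(table(factor(tl2, levels = all_topics)))
)
colnames(counts) <- all_topics

# One group of two bars per topic, side by side
barplot(counts, beside = TRUE, col = c("darkblue", "red"),
        legend.text = rownames(counts))
```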
