I need to compute frequencies of individuals by age and marital status, so normally I would use:
table(age, marital_status)
However, after sampling, each individual has a different weight. How can I incorporate the weights into my frequency table?
You can use the function svytable from the survey package, or wtd.table from the rgrs package.
Edit: rgrs is now called questionr:
df <- data.frame(var = c("A", "A", "B", "B"), wt = c(30, 10, 20, 40))
library(questionr)
wtd.table(x = df$var, weights = df$wt)
# A B
# 40 60
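The svytable route mentioned above is not shown in this answer, so here is a minimal sketch of how it might look on the same toy data. It assumes a simple design with no clusters or strata (ids = ~1), which is an assumption on my part; a real survey design would need its actual sampling structure passed to svydesign.

```r
library(survey)

# Same toy data as in the questionr example
df <- data.frame(var = c("A", "A", "B", "B"), wt = c(30, 10, 20, 40))

# Assumed design: no clustering or stratification, only sampling weights
des <- svydesign(ids = ~1, weights = ~wt, data = df)

# Weighted frequency table; a two-way table would use ~var1 + var2
tab <- svytable(~var, design = des)
print(tab)
```

The result is a table object holding the sum of weights per level, so it can be passed on to functions such as prop.table.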
This is also possible with dplyr:
library(dplyr)
count(x = df, var, wt = wt)
# # A tibble: 2 x 2
# var n
# <fctr> <dbl>
# 1 A 40
# 2 B 60
Just for completeness, using base R:
df <- data.frame(var = c("A", "A", "B", "B"), wt = c(30, 10, 20, 40))
aggregate(x = list("wt" = df$wt), by = list("var" = df$var), FUN = sum)
var wt
1 A 40
2 B 60
Or, using the less cumbersome formula notation:
aggregate(wt ~ var, data = df, FUN = sum)
var wt
1 A 40
2 B 60
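Since the question asks for a two-way table (age by marital status), base R's xtabs may be the closest drop-in for table(): with the weight on the left-hand side of the formula it sums the weights within each cell. The data frame below, including its column names and values, is hypothetical and only there to make the sketch runnable.

```r
# Hypothetical sample data: names and values are illustrative only
people <- data.frame(
  age            = c("18-34", "18-34", "35-64", "35-64", "35-64"),
  marital_status = c("single", "married", "single", "married", "married"),
  wt             = c(30, 10, 20, 40, 5)
)

# xtabs() sums the left-hand side (wt) within each age x marital_status cell
weighted_tab <- xtabs(wt ~ age + marital_status, data = people)
print(weighted_tab)

# Weighted proportions over the whole table
print(prop.table(weighted_tab))
```

Unlike aggregate, this returns a contingency table with one dimension per factor, in the same shape as the table(age, marital_status) call in the question.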
Using data.table, you can do:
library(data.table)
# using the same data as Victorp
setDT(df)[, .(n = sum(wt)), by = var]
var n
1: A 40
2: B 60
Another solution, from the expss package:
df <- data.frame(var = c("A", "A", "B", "B"), wt = c(30, 10, 20, 40))
library(expss)
fre(df$var, weight = df$wt)
| df$var | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
| ------ | ----- | ------------- | ------- | ------------ | ----------------------- |
| A | 40 | 40 | 40 | 40 | 40 |
| B | 60 | 60 | 60 | 60 | 100 |
| #Total | 100 | 100 | 100 | 100 | |
| <NA> | 0 | | 0 | | |
You can also use tablefreq from the freqweights package:
df <- data.frame(var = c("A", "A", "B", "B"), wt = c(30, 10, 20, 40))
library(freqweights)
tablefreq(df, "var", "wt")
# A tibble: 2 x 2
var freq
<fct> <dbl>
1 A 40
2 B 60
Using the weights package and its wpct function, which returns weighted proportions rather than counts:
require(weights)
df <- data.frame(var = c("A", "A", "B", "B"), wt = c(30, 10, 20, 40))
wpct(df$var, df$wt)
A B
0.4 0.6