I have a data frame like this:
df = data.frame(Gender = c("F", "M", "M", "F"),
                cat_age = c("] 10-15]", "] 10, 15]", "] 20 -25] ", "] 55-60] "),
                frequency = c(2, 6, 8, 7))
I would like to reshape it into this:
F; M; cat_age
2; 6; ] 10, 15]
0; 8; ] 20, 25]
7; 0; ] 55, 60]
Your data.frame has an oddity: if "] 10-15]" and "] 10, 15]" are meant to be the same category, they need to be spelled identically in the data.frame. For example:
df = data.frame(Gender = c("F", "M", "M", "F"),
                cat_age = c("] 10-15]", "] 10-15]", "] 20 -25] ", "] 55-60] "),
                frequency = c(2, 6, 8, 7))
Then you can use pivot_wider() from tidyr:
library(tidyr)
pivot_wider(df,values_from=frequency,names_from=Gender,values_fill=0)
# A tibble: 3 x 3
cat_age F M
<fct> <dbl> <dbl>
1 "] 10-15]" 2 6
2 "] 20 -25] " 0 8
3 "] 55-60] " 7 0
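If editing the labels by hand is impractical, they could also be normalised programmatically before pivoting. The following is only a sketch in base R: `normalize_age` is a hypothetical helper name, and the regular expression assumes every label contains exactly two numbers separated by some mix of spaces, hyphens, or commas.

```r
df <- data.frame(
  Gender = c("F", "M", "M", "F"),
  cat_age = c("] 10-15]", "] 10, 15]", "] 20 -25] ", "] 55-60] "),
  frequency = c(2, 6, 8, 7)
)

# Hypothetical helper: trim surrounding whitespace, then rewrite whatever
# separates the two numbers (spaces, commas, hyphens) as ", ".
normalize_age <- function(x) {
  x <- trimws(as.character(x))
  gsub("(\\d+)[ ,-]+(\\d+)", "\\1, \\2", x)
}

df$cat_age <- normalize_age(df$cat_age)
unique(df$cat_age)
```

After this step the two spellings of the 10 to 15 class share one label, so the same pivot_wider() call should produce a single row for that class.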
Here is a base R option using reshape:
dfout <- reshape(
transform(df,
cat_age = sapply(
regmatches(cat_age, gregexpr("\\d+", cat_age)),
function(x) paste0("]", paste0(x, collapse = ","), "]")
)
),
direction = "wide",
idvar = "cat_age",
timevar = "Gender"
)
This gives
> dfout
cat_age frequency.F frequency.M
1 ]10,15] 2 6
3 ]20,25] NA 8
4 ]55,60] 7 NA
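To see what the regmatches()/gregexpr() step inside transform() is doing, it may help to run it on its own. This is just an illustration on the question's original labels; the variable names `nums` and `labels` are made up for the example.

```r
cat_age <- c("] 10-15]", "] 10, 15]", "] 20 -25] ", "] 55-60] ")

# gregexpr() locates every run of digits; regmatches() extracts them,
# giving one character vector of numbers per label.
nums <- regmatches(cat_age, gregexpr("\\d+", cat_age))

# Rebuild each label in one canonical form, e.g. "]10,15]".
labels <- sapply(nums, function(x) paste0("]", paste0(x, collapse = ","), "]"))
labels
```

Because only the digits are kept, the differently spelled labels collapse to the same string, which is why this approach does not require fixing the data.frame by hand first.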
If you want to replace NA with 0, you can add one more line
replace(dfout, is.na(dfout), 0)
which gives
> replace(dfout,is.na(dfout),0)
cat_age frequency.F frequency.M
1 ]10,15] 2 6
3 ]20,25] 0 8
4 ]55,60] 7 0
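To get closer to the layout asked for in the question (columns F, M, cat_age), the "frequency." prefix can be stripped and the columns reordered. A minimal sketch, assuming the labels have already been normalised as above; it rebuilds dfout from scratch so that it runs on its own.

```r
df <- data.frame(
  Gender = c("F", "M", "M", "F"),
  cat_age = c("]10,15]", "]10,15]", "]20,25]", "]55,60]"),
  frequency = c(2, 6, 8, 7)
)

dfout <- reshape(df, direction = "wide", idvar = "cat_age", timevar = "Gender")

# Fill the missing Gender/cat_age combinations with 0.
dfout[is.na(dfout)] <- 0

# Drop the "frequency." prefix that reshape() adds, then reorder the columns.
names(dfout) <- sub("^frequency\\.", "", names(dfout))
dfout <- dfout[, c("F", "M", "cat_age")]
rownames(dfout) <- NULL
dfout
```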