我的数据框中有一个汇总统计数据:
war_3 a1_1_area_mean a1_2_area_mean a1_3_area_mean a1_4_area_mean a1_5_area_mean a1_6_area_mean
1 1 0.23827851 0.07843460 0.02531607 0.1193928 0.7635068 0.02333938
2 2 0.23162416 0.05949285 0.01422585 0.3565457 0.8593997 0.06895526
3 3 0.09187454 0.07274503 0.10357251 0.2821142 0.5929178 0.02455053
a1_7_area_mean a1_8_area_mean a1_t_area_mean a2_1_area_mean a2_2_area_mean a2_3_area_mean
1 0.005387169 0.2725867 1.526242 0.107725394 0.19406917 0.02213419
2 0.016701786 0.2222106 1.829156 0.073991405 0.03504120 0.00815826
3 0.028382414 0.1997225 1.395880 0.003634443 0.03508602 0.00000000
a2_4_area_mean a2_5_area_mean a2_t_area_mean a1_1_area_var a1_2_area_var a1_3_area_var a1_4_area_var
1 0.02024704 0.0040841950 0.34826000 1.2730028 0.13048871 0.05165589 0.1851353
2 0.07621595 0.0005078053 0.19391462 0.6114136 0.09287735 0.05697542 0.7284144
3 0.00000000 0.0000000000 0.03872046 0.1171754 0.07581946 0.35349703 0.3883895
a1_5_area_var a1_6_area_var a1_7_area_var a1_8_area_var a1_t_area_var a2_1_area_var a2_2_area_var
1 2.7640424 0.01688505 0.001459156 0.8844626 7.940393 0.57992528 1.41104857
2 2.6797714 0.05490461 0.003428341 0.5725653 8.190389 0.18087732 0.11406984
3 0.9938991 0.01801805 0.006360622 0.3405592 3.460435 0.00306776 0.06579978
a2_3_area_var a2_4_area_var a2_5_area_var a2_t_area_var a1_1_area_sd a1_2_area_sd a1_3_area_sd
1 0.067049470 0.06260921 0.0045015472 2.10734089 1.1282743 0.3612322 0.2272793
2 0.009580693 0.29505206 0.0005616327 0.85060972 0.7819294 0.3047579 0.2386952
3 0.000000000 0.00000000 0.0000000000 0.06861217 0.3423089 0.2753533 0.5945562
a1_4_area_sd a1_5_area_sd a1_6_area_sd a1_7_area_sd a1_8_area_sd a1_t_area_sd a2_1_area_sd
1 0.4302735 1.6625410 0.1299425 0.03819890 0.9404587 2.817870 0.76152825
2 0.8534719 1.6370007 0.2343173 0.05855204 0.7566805 2.861886 0.42529674
3 0.6232090 0.9969449 0.1342313 0.07975351 0.5835745 1.860224 0.05538736
a2_2_area_sd a2_3_area_sd a2_4_area_sd a2_5_area_sd a2_t_area_sd
1 1.1878757 0.25893912 0.2502183 0.06709357 1.4516683
2 0.3377423 0.09788102 0.5431869 0.02369879 0.9222851
3 0.2565147 0.00000000 0.0000000 0.00000000 0.2619392
以上汇总表来自以下脚本和原始数据框,如下所示:
uid war_3 a1_1_area a1_2_area a1_3_area a1_4_area a1_5_area a1_6_area a1_7_area a1_8_area a1_t_area
1 1001 1 0 0.00000 0 0.67048 0.0000 0.02088 0 0.00000 0.69136
2 1002 2 0 0.00000 0 0.00000 0.9019 0.14493 0 0.00000 1.04683
3 1003 2 0 0.00000 0 0.00000 0.9019 0.00000 0 0.00000 0.90190
4 1004 2 0 1.09322 0 0.00000 0.0000 0.00000 0 0.00000 1.09322
5 1005 3 0 1.75000 0 0.00000 0.0000 0.00000 0 0.00000 1.75000
6 1006 2 0 2.43442 0 0.32223 0.0000 0.00000 0 0.76801 3.52466
a2_1_area a2_2_area a2_3_area a2_4_area a2_5_area a2_t_area
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
6 0 0 0 0 0 0
summary <- df.anov %>% select(-uid) %>% group_by(war_3,) %>%
summarize_each(funs(min,max,mean,median,var,sd)))
但是,由于很难war_3
通过均值、var 和 sd 来比较(组)对中的每个值,我想将其转换为以下格式:
variable war_3 mean variance s.d.
a1_1_area, 1 , x , x , x
a1_1_area, 2 , x , x , x
a1_1_area, 3 , x , x , x
a1_2_area, 1 , x , x , x
a1_2_area, 2 , x , x , x
a1_2_area, 3 , x , x , x
a1_3_area, 1 , x , x , x
a1_3_area, 2 , x , x , x
a1_3_area, 3 , x , x , x
a1_4_area, 1 , x , x , x
a1_4_area, 2 , x , x , x
a1_4_area, 3 , x , x , x
(it continues until `a2_5_area` in `variable`)
我曾经使用gather
indplyr
将简单数据帧的宽格式重新排列为长格式,但是这个数据帧需要更复杂的操作,可能需要重复操作select(matches())
。
变量是:
war_3
对每条记录进行分组的变量(group_by(war_3) %>% summarize_each(funs(mean,var,sd))
在前面的操作中已经分组)
aX_Y_area_Z
: whereX
有两个值,分别为 1 和 2,Y
分别1-8
为X=1
和1-5
for X=2
。Z
有 3 个统计量mean, variance and s.d.
。
你能帮我实现吗?
我更喜欢使用dplyr
管道而不是data.table()
解决方案。
以下脚本是非常手动的方式,但在每个脚本中都会产生重复的记录,gather()
我不想手动指定每个列号或名称。
summary %>%
gather(key1,mean,
a1_1_area_mean,a1_2_area_mean,a1_3_area_mean,a1_4_area_mean,
a1_5_area_mean,a1_6_area_mean,a1_7_area_mean,a1_8_area_mean,
a1_t_area_mean,a2_1_area_mean,a2_2_area_mean,a2_3_area_mean,
a2_4_area_mean,a2_5_area_mean,a2_t_area_mean) %>%
gather(key2,var,
a1_1_area_var,a1_2_area_var,a1_3_area_var,a1_4_area_var,
a1_5_area_var,a1_6_area_var,a1_7_area_var,a1_8_area_var,
a1_t_area_var,a2_1_area_var,a2_2_area_var,a2_3_area_var,
a2_4_area_var,a2_5_area_var,a2_t_area_var) %>%
gather(key3,sd,
a1_1_area_sd,a1_2_area_sd,a1_3_area_sd,a1_4_area_sd,
a1_5_area_sd,a1_6_area_sd,a1_7_area_sd,a1_8_area_sd,
a1_t_area_sd,a2_1_area_sd,a2_2_area_sd,a2_3_area_sd,
a2_4_area_sd,a2_5_area_sd,a2_t_area_sd) %>%
mutate_at(vars(key1),funs(str_sub(.,1,9))) %>% select(-key2,-key3) %>%
rename(key=key1) -> summary2