I'm just learning R and trying to find ways to modify my grouped data.frame
in order to get a mean of a variable value
(x+y/2), and standard deviation (sd
) sqrt((x^2+y^2)/2) of cohesive observations. Other (equal) variables (sequence
, value1
) should not change.
I used subset()
and rowMeans()
, but I wonder if there is a nicer way using dplyr
and tidyr
(probably using a nested dataframe?)
My test data.frame looks like:
id location value sd sequence value1
"anon1" "nose" 5 0.2 "a" 1
"anon2" "body" 4 0.4 "a" 2
"anon3" "left_arm" 3 0.3 "a" 3
"anon3" "right_arm" 5 0.6 "a" 3
"anon4" "head" 4 0.3 "a" 4
"anon5" "left_leg" 2 0.2 "a" 5
"anon5" "right_leg" 1 0.1 "a" 5
dput output of my test data.frame:
myData <- structure(list(ï..id = structure(c(1L, 2L, 3L, 3L, 4L, 5L, 5L
), .Label = c("anon1", "anon2", "anon3", "anon4", "anon5"), class = "factor"),
location = structure(c(5L, 1L, 3L, 6L, 2L, 4L, 7L), .Label = c("body",
"head", "left_arm", "left_leg", "nose", "right_arm", "right_leg"
), class = "factor"), value = c(5L, 4L, 3L, 5L, 4L, 2L, 1L
), sd = c(0.2, 0.4, 0.3, 0.6, 0.3, 0.2, 0.1), sequence = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = "a", class = "factor"),
value1 = c(1L, 2L, 3L, 3L, 4L, 5L, 5L)), .Names = c("ï..id",
"location", "value", "sd", "sequence", "value1"), class = "data.frame", row.names = c(NA,
-7L))
how it should look:
id location value sd sequence value1
"anon1" "nose" 5 0.2 "a" 1
"anon2" "body" 4 0.4 "a" 2
"anon3" "arm" 4 0.47 "a" 3
"anon4" "head" 4 0.3 "a" 4
"anon5" "leg" 1.5 0.15 "a" 5