I have many different tables, and I want to write a function in R that does the following.
Table 1:
         coordinates var1.pred  var1.var observed    residual      zscore fold
1 (2579410, 1079720)  5.057024 0.4325275    5.468  0.41097625  0.62489903    1
2 (2579330, 1079730)  5.329797 0.3945041    4.498 -0.83179667 -1.32431534    2
3 (2579260, 1079770)  4.788211 0.5576228    5.114  0.32578861  0.43628035    3
4 (2579930, 1080030)  5.067753 0.4972365    4.764 -0.30375347 -0.43076434    4
5 (2579700, 1079770)  5.116632 0.5792768    4.626 -0.49063190 -0.64463327    5
6 (2579540, 1079640)  4.865667 0.6122453    6.522  1.65633254  2.11682434    6
7 (2579860, 1079880)  5.139779 0.4655840    4.856 -0.28377887 -0.41589245    7
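An "observed" value should be flagged as an outlier if it falls outside the tolerance given by these two bounds:
var1.pred+(1.96*sqrt(var1.var))
var1.pred-(1.96*sqrt(var1.var))
In other words:
if
var1.pred-(1.96*sqrt(var1.var)) < observed < var1.pred+(1.96*sqrt(var1.var))
the result is normal; otherwise it is an outlier.
Also, the column names are the same in every table, and the tables are named 1, 2, 3, ....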
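One way the rule above might be written as an R function is sketched here. The function name `flag_outliers`, its `z` argument, and the added `outlier` column are hypothetical names chosen for illustration; the three example rows are copied from rows 1, 2 and 6 of table 1.

```r
# Hypothetical helper: a row is normal only when
# var1.pred - z*sqrt(var1.var) < observed < var1.pred + z*sqrt(var1.var)
flag_outliers <- function(df, z = 1.96) {
  half_width <- z * sqrt(df$var1.var)
  lower <- df$var1.pred - half_width
  upper <- df$var1.pred + half_width
  # TRUE when observed is not strictly inside (lower, upper)
  df$outlier <- !(df$observed > lower & df$observed < upper)
  df
}

# Rows 1, 2 and 6 of table 1 (only the columns the rule needs)
example <- data.frame(
  var1.pred = c(5.057024, 5.329797, 4.865667),
  var1.var  = c(0.4325275, 0.3945041, 0.6122453),
  observed  = c(5.468, 4.498, 6.522)
)

result <- flag_outliers(example)
print(result)
```

For these rows only the third one (row 6 of table 1) is flagged, which agrees with its zscore of about 2.12 being larger than 1.96.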
dat <- structure(list(coordinates = structure(c(3L, 2L, 1L, 7L, 5L,
4L, 6L), .Label = c("(2579260, 1079770)", "(2579330, 1079730)",
"(2579410, 1079720)", "(2579540, 1079640)", "(2579700, 1079770)",
"(2579860, 1079880)", "(2579930, 1080030)"), class = "factor"),
var1.pred = c(5.057024, 5.329797, 4.788211, 5.067753, 5.116632,
4.865667, 5.139779), var1.var = c(0.4325275, 0.3945041, 0.5576228,
0.4972365, 0.5792768, 0.6122453, 0.465584), observed = c(5.468,
4.498, 5.114, 4.764, 4.626, 6.522, 4.856), residual = c(0.41097625,
-0.83179667, 0.32578861, -0.30375347, -0.4906319, 1.65633254,
-0.28377887), zscore = c(0.62489903, -1.32431534, 0.43628035,
-0.43076434, -0.64463327, 2.11682434, -0.41589245), fold = 1:7), .Names = c("coordinates",
"var1.pred", "var1.var", "observed", "residual", "zscore", "fold"
), row.names = c(NA, -7L), class = "data.frame")
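Since all tables share the same column names, the same check could be applied to each of them in one pass. The sketch below assumes the tables are collected in a list whose elements are named "1", "2", ...; the list `tables`, the helper `is_outlier`, and the `status` column are hypothetical, and the second table simply reuses rows 2 and 3 of table 1 for illustration.

```r
# Hypothetical helper: TRUE when observed is not strictly inside
# var1.pred +/- z * sqrt(var1.var)
is_outlier <- function(df, z = 1.96) {
  half_width <- z * sqrt(df$var1.var)
  !(df$observed > df$var1.pred - half_width &
      df$observed < df$var1.pred + half_width)
}

# Assumption: the tables live in a named list (illustrative data only)
tables <- list(
  "1" = data.frame(var1.pred = c(5.057024, 4.865667),
                   var1.var  = c(0.4325275, 0.6122453),
                   observed  = c(5.468, 6.522)),
  "2" = data.frame(var1.pred = c(5.329797, 4.788211),
                   var1.var  = c(0.3945041, 0.5576228),
                   observed  = c(4.498, 5.114))
)

# Add a status column to every table
flagged <- lapply(tables, function(df) {
  df$status <- ifelse(is_outlier(df), "outlier", "normal")
  df
})

# Number of outliers found in each table
outlier_counts <- sapply(flagged, function(df) sum(df$status == "outlier"))
print(outlier_counts)
```

Keeping the tables in a list avoids having to refer to objects with non-syntactic names such as `1` or `2`, and `lapply` returns a list with the same names, so each result can still be matched to its table.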