我有一个所有案例之间的相似矩阵,并且在一个单独的数据框中,这些案例的类别。我想计算来自同一类的案例之间的平均相似度,这是来自 j 类的示例 n 的等式:
我们必须计算 n 和与 n 来自同一类的所有案例 k 之间的所有平方接近的总和。链接: http: //www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#outliers
我用 2 个 for 循环实现了它,但它真的很慢。有没有更快的方法在 R 中做这样的事情?
谢谢。
//数据(输入)
带类的数据框:
structure(list(class = structure(c(1L, 2L, 2L, 1L, 3L, 3L, 1L,
1L, 2L, 3L), .Label = c("1", "2", "3", "5", "6", "7"), class = "factor")), .Names = "class", row.names = c(NA,
-10L), class = "data.frame")
接近矩阵(m行m列对应上面数据框m行中的类):
structure(c(1, 0.60996875, 0.51775, 0.70571875, 0.581375, 0.42578125,
0.6595, 0.7134375, 0.645375, 0.468875, 0.60996875, 1, 0.77021875,
0.55171875, 0.540375, 0.53084375, 0.4943125, 0.462625, 0.7910625,
0.56321875, 0.51775, 0.77021875, 1, 0.451375, 0.60353125, 0.62353125,
0.5203125, 0.43934375, 0.6909375, 0.57159375, 0.70571875, 0.55171875,
0.451375, 1, 0.69196875, 0.59390625, 0.660375, 0.76834375, 0.606875,
0.65834375, 0.581375, 0.540375, 0.60353125, 0.69196875, 1, 0.7194375,
0.684, 0.68090625, 0.50553125, 0.60234375, 0.42578125, 0.53084375,
0.62353125, 0.59390625, 0.7194375, 1, 0.53665625, 0.553125, 0.513,
0.801625, 0.6595, 0.4943125, 0.5203125, 0.660375, 0.684, 0.53665625,
1, 0.8456875, 0.52878125, 0.65303125, 0.7134375, 0.462625, 0.43934375,
0.76834375, 0.68090625, 0.553125, 0.8456875, 1, 0.503, 0.6215,
0.645375, 0.7910625, 0.6909375, 0.606875, 0.50553125, 0.513,
0.52878125, 0.503, 1, 0.60653125, 0.468875, 0.56321875, 0.57159375,
0.65834375, 0.60234375, 0.801625, 0.65303125, 0.6215, 0.60653125,
1), .Dim = c(10L, 10L))
正确结果:
c(2.44197227050781, 2.21901680175781, 2.07063155175781, 2.52448621289062,
1.88040830957031, 2.16019295703125, 2.58622273828125, 2.81453253222656,
2.1031745078125, 2.00542063378906)