I have a function that computes the simple matching distance between the rows of a matrix of ordinal data:
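library(proxy)
# `test` is the example data frame defined further down
m <- test
# proportion of positions at which the two rows agree
f <- function(x, y) sum(x == y) / NROW(x)
matches <- as.matrix(dist(m, f, upper = TRUE))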
The problem is that this function stops working as soon as there are missing values, as in the following data:
test <- structure(list(
    X1 = c(1, 2, 3, 4, 2, NA),
    X2 = c(2, 3, 4, 5, 3, 6),
    X3 = c(3, 4, NA, 5, 3, 2),
    X4 = c(2, 4, 6, 5, 3, 8),
    X5 = c(1, 3, 2, 4, 6, 4)),
  .Names = c("X1", "X2", "X3", "X4", "X5"),
  row.names = c(NA, 6L), class = "data.frame")
The resulting distance matrix is:
> matches
    1   2  3  4   5  6
1 0.0 0.0 NA  0 0.2 NA
2 0.0 0.0 NA  0 0.4 NA
3  NA  NA  0 NA  NA NA
4 0.0 0.0 NA  0 0.0 NA
5 0.2 0.4 NA  0 0.0 NA
6  NA  NA NA NA  NA  0
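The NA entries all involve row 3 or row 6, the two rows that contain a missing value. A minimal base-R sketch of what seems to be happening, using rows 1 and 3 of `test` typed out by hand (the vector names `x` and `y` are only for illustration): comparing against NA yields NA rather than FALSE, and `sum()` then returns NA unless told to drop missing values.

```r
# Rows 1 and 3 of `test`; row 3 has a missing value in X3
x <- c(1, 2, 3, 2, 1)
y <- c(3, 4, NA, 6, 2)

x == y                      # the third comparison is NA, not FALSE
sum(x == y)                 # NA: a single missing comparison makes the sum NA
sum(x == y, na.rm = TRUE)   # 0: the missing comparison is dropped
```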
How can I adapt this function so that it still computes the matching distance when missing values are present?
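One possible direction, sketched here under assumptions rather than offered as the definitive answer: compare only the positions observed in both rows and divide by the number of such positions. The helper name `f_na` and the result name `matches_na` are hypothetical, and the sketch fills the matrix with a plain base-R loop over a matrix copy of the data so that it runs without `proxy`; the same function could presumably be passed to `dist()` in place of `f`.

```r
# Example data from the question, rebuilt as a plain data frame
test <- data.frame(X1 = c(1, 2, 3, 4, 2, NA),
                   X2 = c(2, 3, 4, 5, 3, 6),
                   X3 = c(3, 4, NA, 5, 3, 2),
                   X4 = c(2, 4, 6, 5, 3, 8),
                   X5 = c(1, 3, 2, 4, 6, 4))

# Hypothetical helper (not from the original post): simple matching
# restricted to the positions where both rows are observed
f_na <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)        # positions observed in both rows
  if (!any(ok)) return(NA_real_)     # nothing left to compare
  sum(x[ok] == y[ok]) / sum(ok)      # matches / comparable positions
}

m <- as.matrix(test)
n <- nrow(m)
# Diagonal is left at 0 to mirror the output shown in the question
matches_na <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) matches_na[i, j] <- f_na(m[i, ], m[j, ])
  }
}

matches_na[1, 5]   # 0.2, same as before (no missing values involved)
matches_na[4, 6]   # 0.25: one match among the four comparable positions
```

Dividing by `sum(ok)` is a design choice. An alternative is `sum(x == y, na.rm = TRUE) / length(x)`, which keeps the original denominator and in effect treats every missing comparison as a mismatch.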