r - Using k-NN in R with categorical values

Question

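I want to classify data that has mostly categorical features. Euclidean distance (or any other distance that assumes numeric values) is not suitable for this.

I am looking for a kNN implementation for [R] that lets me choose a different distance method, such as Hamming distance. Is there a way to use a common kNN implementation, like the one in {class}, with a different distance metric function?

I am using R 2.15.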

score 8 · Accepted Answer

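As long as you can calculate a distance/dissimilarity matrix (in whatever way you like), you can easily perform kNN classification without the need of any special package.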

# Generate dummy data
y <- rep(1:2, each=50)                          # True class memberships
x <- y %*% t(rep(1, 20)) + rnorm(100*20) < 1.5  # Dataset with 20 variables
design.set <- sample(length(y), 50)
test.set <- setdiff(1:100, design.set)

# Calculate distance and nearest neighbors
library(e1071)
d <- hamming.distance(x)
# apply() returns one column per test case, so transpose to get one row per test case
NN <- t(apply(d[test.set, design.set], 1, order))

# Predict class membership of the test set
k <- 5
pred <- apply(NN[, 1:k, drop=FALSE], 1, function(nn){
    tab <- table(y[design.set][nn])
    as.integer(names(tab)[which.max(tab)])      # This is a pretty dirty line
})

# Inspect the results
table(pred, y[test.set])
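
The dummy data above is binary, but the same idea should carry over to features with more than two levels. Below is a minimal sketch of a Hamming dissimilarity matrix in base R. The helper name `hamming_matrix` and the toy data frame are made up for illustration and are not part of the original answer, and the sketch assumes there are no missing values.

```r
# Hypothetical helper (not from the original answer): pairwise Hamming
# distance between the rows of a data frame of categorical columns.
# Assumes no NA values.
hamming_matrix <- function(df) {
  # Turn every column into integer level codes so rows compare elementwise
  m <- sapply(df, function(col) as.integer(as.factor(col)))
  n <- nrow(m)
  d <- matrix(0L, n, n)
  for (j in seq_len(ncol(m))) {
    # outer() flags every pair of rows that disagree in column j
    d <- d + outer(m[, j], m[, j], "!=")
  }
  d
}

# Toy data, invented for this example
toy <- data.frame(
  colour = c("red", "red", "blue", "green"),
  shape  = c("round", "square", "round", "square"),
  size   = c("S", "S", "L", "S")
)
d_toy <- hamming_matrix(toy)
d_toy[1, 2]   # 1: rows 1 and 2 differ only in shape
d_toy[2, 3]   # 3: rows 2 and 3 differ in every column
```

A matrix built this way can take the place of `d` in the nearest-neighbour step, since that step only needs pairwise dissimilarities between rows.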

If anybody knows a better way of finding the most common value in a vector than the dirty line above, I'd be happy to know.
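
One possible alternative, offered as a sketch rather than the definitive way: count occurrences with `tabulate()` on the positions returned by `match()`, which keeps the original type of the values and avoids the round trip through `names()`. The helper name `most_common` is hypothetical. Note that ties here go to the value that appears first in the vector, whereas the `table()` version breaks ties by sorted label.

```r
# Hypothetical helper: most frequent value of a vector,
# ties broken by first appearance
most_common <- function(v) {
  u <- unique(v)                                      # distinct values, in order of appearance
  counts <- tabulate(match(v, u), nbins = length(u))  # how often each one occurs
  u[which.max(counts)]                                # which.max() picks the first maximum
}

most_common(c(2L, 1L, 2L, 1L, 2L))   # 2
most_common(c("b", "a", "a", "b"))   # "b" (tie, "b" appears first)
```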

The drop=FALSE argument is needed to keep the subset of NN as a matrix in the case k=1. Without it, the subset is converted to a vector and apply throws an error.
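
A small illustration of that behaviour, using a made-up stand-in matrix (`NN_demo` is not part of the answer's code):

```r
NN_demo <- matrix(1:6, nrow = 3)         # stand-in for a neighbour matrix
k <- 1

dropped <- NN_demo[, 1:k]                # collapses to a plain vector
kept    <- NN_demo[, 1:k, drop = FALSE]  # stays a 3 x 1 matrix

is.null(dim(dropped))   # TRUE: the dim attribute is gone
dim(kept)               # 3 1

# apply() needs a dim attribute, so the dropped version fails
res <- tryCatch(apply(dropped, 1, sum), error = function(e) "error")
res                     # "error"
apply(kept, 1, sum)     # 1 2 3
```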
