bootstrapping - Why does R tell me there is an NA in my probability vector when I call sample()?

Question

I'm having a problem when I try to run the following function. The exact error I get is: `Error in sample.int(length(x), size, replace, prob) : NA in probability vector`.
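A minimal reproduction of the same message, assuming an English-locale R session (the message text is translated in other locales). `msg_na` and `msg_nan` are illustrative names only. The second call shows that a `NaN`, such as the result of `0 / 0`, triggers the same message as a literal `NA`.

```r
# Capture the message sample() raises for a missing probability
msg_na <- tryCatch(
  sample(1:3, size = 1, prob = c(0.5, NA, 0.5)),
  error = function(e) conditionMessage(e)
)

# NaN (here produced by 0 / 0) is rejected by the same check
msg_nan <- tryCatch(
  sample(1:3, size = 1, prob = c(0.5, 0 / 0, 0.5)),
  error = function(e) conditionMessage(e)
)

print(msg_na)
print(msg_nan)
```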

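I used the `print(t)` line to see where it stops, and it seems to fail around the 10th iteration. At that point I checked whether my probability vector `w` contains any `NA` values, but it doesn't. The smallest value is on the order of 10e-5.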

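Does anyone know what is causing this error? Is it possible that the values in the probability vector are so small that R interprets them as `NA`?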
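A quick way to test the "too small" hypothesis, as a sketch with made-up weights (the vector `p` is illustrative, not taken from the original data): `sample()` accepts weights far below 1e-5 as long as they are finite and non-negative, and per its documentation the weights need not sum to one. Small magnitudes by themselves should therefore not produce this error.

```r
# Tiny but finite, positive weights are valid input for sample();
# prob is rescaled internally, so it does not have to sum to 1.
set.seed(42)
p <- c(1e-300, 1e-10, 1e-5)
draws <- sample(1:3, size = 10, replace = TRUE, prob = p)
print(draws)

# Small values stay ordinary finite numbers; nothing is converted to NA
print(anyNA(p))
```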

My call to the function:

```r
boosted_prediction <- boost_LS(x_train, y_train, x_test, 1500)
```

My function:

```r
boost_LS <- function (x, y, x_test, ts) {
  n <- nrow(x)
  w <- matrix(rep(1 / n, n), n, 1)
  boost_pred <- matrix(0, nrow(x_test), 1)
  for (t in 1:ts) {
    bootstrap_index <- sample(1:n, size = n, replace = TRUE, prob = w)
    bootstrap_x <- as.matrix(x[bootstrap_index, ])
    bootstrap_y <- as.matrix(y[bootstrap_index])
    ls_w <- solve(t(bootstrap_x) %*% bootstrap_x) %*% t(bootstrap_x) %*% bootstrap_y
    pred <- sign(bootstrap_x %*% ls_w)
    e_t <- sum(w[bootstrap_y != pred])
    a_t <- 0.5 * log((1 - e_t) / e_t)
    w_hat <- matrix(0, n, 1)
    for (i in 1:n) {
      w_hat[i, 1] <- w[i, 1] * exp(-a_t * bootstrap_y[i, 1] * pred[i, 1])
    }
    w <- w_hat / sum(w_hat)
    boost_pred <- boost_pred + (a_t * (x_test %*% ls_w))
  #  print(t)
  }
  return(sign(boost_pred))
}
```

Edit: I found that my error rate (`e_t`) drops to 0 after 6-7 iterations, so my new learner weight (`a_t`) becomes `Inf`, which messes up my probability vector...
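The arithmetic behind this edit can be replayed on a toy three-point example (the weights and margins below are made up for illustration). With `e_t = 0`, `a_t` is `Inf`, the update `exp(-a_t * y * pred)` produces only `0` and `Inf`, and normalising by the sum then yields `0 / 0` or `Inf / Inf`, both of which are `NaN`. That `NaN` is what the next `sample()` call rejects.

```r
# Replay one boost_LS weight update with a weighted error of exactly 0
e_t <- 0
a_t <- 0.5 * log((1 - e_t) / e_t)    # (1 - 0) / 0 = Inf, log(Inf) = Inf
w <- rep(1 / 3, 3)

# Case 1: every point classified correctly, so y * pred = +1 everywhere
w_hat_ok <- w * exp(-a_t * c(1, 1, 1))       # exp(-Inf) = 0 for every point
w_ok <- w_hat_ok / sum(w_hat_ok)             # 0 / 0 = NaN

# Case 2: one point with y * pred = -1 still reaches the update, which can
# happen when e_t and the update are computed on differently indexed vectors
w_hat_mixed <- w * exp(-a_t * c(1, 1, -1))   # 0, 0, Inf
w_mixed <- w_hat_mixed / sum(w_hat_mixed)    # 0 / Inf = 0, Inf / Inf = NaN

print(a_t)
print(w_ok)
print(w_mixed)
```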

So this is no longer a debugging question but a question about the logic of the AdaBoost algorithm. Any tips would be greatly appreciated!
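One possible direction, sketched under assumptions rather than offered as a definitive fix: labels `y` are coded as -1/+1, and `x`/`x_test` are numeric matrices with full column rank. `boost_LS_fixed` and its `eps` argument are hypothetical names. Compared with `boost_LS`, this sketch (1) evaluates each least-squares learner on the full training set, so `pred`, `y`, and `w` all refer to the same rows (in `boost_LS`, `w` is indexed by original rows while `bootstrap_y` and `pred` are indexed by bootstrap draws); (2) floors `e_t` at a small `eps` so `a_t` stays finite; (3) stops once the weighted error reaches 0.5; and (4) adds the sign of each learner's output, as in the usual AdaBoost combination, rather than the raw score.

```r
# Sketch: AdaBoost by resampling with a least-squares weak learner.
# Assumes y is coded as -1/+1 and x / x_test are numeric matrices.
boost_LS_fixed <- function(x, y, x_test, ts, eps = 1e-10) {
  x <- as.matrix(x)
  x_test <- as.matrix(x_test)
  y <- as.numeric(y)
  n <- nrow(x)
  w <- rep(1 / n, n)                       # one weight per ORIGINAL training row
  boost_pred <- numeric(nrow(x_test))
  for (iter in seq_len(ts)) {              # 'iter' avoids masking base::t()
    idx <- sample.int(n, size = n, replace = TRUE, prob = w)
    bx <- x[idx, , drop = FALSE]
    by <- y[idx]
    # Least-squares fit on the bootstrap sample; qr.solve() returns the
    # least-squares solution for an overdetermined system
    ls_w <- qr.solve(bx, by)
    # Evaluate on the FULL training set so predictions line up with w and y
    pred <- sign(x %*% ls_w)[, 1]
    pred[pred == 0] <- 1                   # treat points on the boundary as +1
    e_t <- sum(w[y != pred])
    if (e_t >= 0.5) break                  # learner no better than chance: stop
    e_t <- max(e_t, eps)                   # keep a_t finite when e_t == 0
    a_t <- 0.5 * log((1 - e_t) / e_t)
    w <- w * exp(-a_t * y * pred)          # vectorised weight update
    w <- w / sum(w)                        # renormalise to a probability vector
    boost_pred <- boost_pred + a_t * sign(x_test %*% ls_w)[, 1]
  }
  sign(boost_pred)
}
```

It would be called the same way as the original, e.g. `boosted_prediction <- boost_LS_fixed(x_train, y_train, x_test, 1500)`. Flooring `e_t` is only one option; stopping the loop as soon as a learner classifies every weighted point correctly is another common choice.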
