Suppose we have a data frame like this:

dat <- data.frame(
    a = rnorm(1000),
    b = 1/(rnorm(1000))^2,
    c = 1/rnorm(1000),
    d = as.factor(sample(c(0, 1, 2), 1000, replace=TRUE)),
    e = as.factor(sample(c('X', 'Y'), 1000, replace=TRUE))
)

We would like to compute a histogram of this data over all dimensions (i.e., a, b, c, d, e), with specified breaks in each dimension. Factor dimensions already imply their own breaks. The final result should look like a data.frame in which each row holds one combination of breaks across all dimensions, together with the number of observations that fall into that combination. Python's NumPy has histogramdd (see "Multidimension histogram in python"). Is there something similar in R? What is the best way to do this in R? Thank you.

I ended up using the following, where the number of bins for each numeric column is passed to the function as the last row of the data frame:

dat <- data.frame(
    a = rnorm(1000),
    b = 1/(rnorm(1000))^2,
    c = 1/rnorm(1000),
    d = as.factor(sample(c(0, 1, 2), 1000, replace=TRUE)),
    e = as.factor(sample(c('X', 'Y'), 1000, replace=TRUE))
)

dat[nrow(dat)+1,] <- c(10,10,10,NaN,NaN)

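histnd <- function(df) {
  res <- lapply(df, function(x) {
    # The last element of each column holds the requested bin count, not data
    bin_idx <- length(x)
    if (is.factor(x) || is.character(x)) {
      return(x[-bin_idx])
    }
    # n equal-width bins need n + 1 breaks spanning the observed range
    x_min <- min(x[-bin_idx])
    x_max <- max(x[-bin_idx])
    breaks <- seq(x_min, x_max, length.out = x[bin_idx] + 1)
    # include.lowest = TRUE keeps the minimum, which cut() would otherwise turn into NA
    cut(x[-bin_idx], breaks, include.lowest = TRUE)
  })
  res <- do.call(data.frame, res)
  # Dummy column so aggregate() can count the rows in each combination of bins
  res$FR <- 0
  aggregate(FR ~ ., res, length)
}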

h <- histnd(dat)
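
A note on cut(): with its defaults the intervals are right-closed and the lowest break is excluded, so a value equal to the first break comes back as NA unless include.lowest = TRUE is passed. This matters when the breaks start exactly at the column minimum, because rows with NA bins are then left out of the counts. A minimal illustration with made-up values (the names x, dropped and kept are only for this sketch):

```r
# Made-up values; the breaks start exactly at min(x)
x <- c(1, 2, 3, 4, 5)
breaks <- seq(min(x), max(x), length.out = 3)  # 1, 3, 5

# Default: intervals are (1,3] and (3,5], so the value 1 falls in no bin
dropped <- cut(x, breaks)

# With include.lowest = TRUE the first interval becomes [1,3]
kept <- cut(x, breaks, include.lowest = TRUE)

print(data.frame(x, dropped, kept))
```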

1 Answer

I don't know what the expected result is, but this should provide a starting point:

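histnd <- function(DF) {
  res <- lapply(DF, function(x) {
    # Factor and character columns already define their own bins
    if (is.factor(x) || is.character(x)) return(x)
    # Choose "pretty" breaks, using Sturges' rule for the number of bins
    breaks <- pretty(range(x), n = nclass.Sturges(x), min.n = 1)
    # include.lowest = TRUE keeps a value that equals the first break
    cut(x, breaks, include.lowest = TRUE)
  })
  res <- do.call(data.frame, res)
  # Cross-tabulate all columns; empty combinations appear with Freq = 0
  as.data.frame(table(res))
}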

h <- histnd(dat)
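
If the breaks need to be specified per dimension, as the question asks, the same idea can be adapted. The following is only a sketch under assumptions: the function name histnd_breaks and its breaks argument (a named list with one break vector per numeric column) are hypothetical, and the toy data frame is made up for illustration.

```r
# Hypothetical variant: `breaks` is a named list of break vectors,
# one entry per numeric column of DF.
histnd_breaks <- function(DF, breaks) {
  res <- Map(function(x, nm) {
    # Factor and character columns already define their own bins
    if (is.factor(x) || is.character(x)) return(x)
    # Values outside the supplied breaks become NA and are not counted
    cut(x, breaks[[nm]], include.lowest = TRUE)
  }, DF, names(DF))
  # table() accepts a list of factors and keeps empty combinations (Freq = 0)
  as.data.frame(table(res))
}

# Made-up example: one numeric column in (0, 1) and one two-level factor
set.seed(1)
toy <- data.frame(
  a = runif(100),
  e = factor(rep(c("X", "Y"), 50))
)
h2 <- histnd_breaks(toy, list(a = c(0, 0.5, 1)))
print(h2)
```

Unlike aggregate(), which only returns the combinations that actually occur, table() also lists combinations with a count of zero; filter on Freq > 0 if those rows are not wanted.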
Answered 2015-09-16T17:31:47.020