r - Hexbin: apply a function to every bin

Question

I would like to build a hexbin plot where each bin shows the ratio between the class 1 and class 2 points that fall into that bin (either on a log scale or not).

library(hexbin)
x <- rnorm(10000)
y <- rnorm(10000)
h <- hexbin(x,y)
plot(h)
l <- as.factor(c( rep(1,2000), rep(2,8000) ))

Any suggestions on how to implement this? Is there a way to apply a function to every bin based on the bin's statistics?

score 3 · Accepted Answer

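@cryo111's answer has the most important ingredient: IDs = TRUE. After that, it's just a matter of deciding what you want to do with the Inf's, and how much you need to scale the ratios to get integers that produce a pretty plot.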

library(hexbin)
library(data.table)

set.seed(1)
x = rnorm(10000)
y = rnorm(10000)

h = hexbin(x, y, IDs = TRUE)

# put all the relevant data in a data.table
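# (the labels repeat the pattern 1,1,1,2; rep() makes the recycling explicit,
#  since recent data.table versions may refuse to recycle a length-4 vector)
dt = data.table(x, y, l = rep(c(1,1,1,2), length.out = length(x)), cID = h@cID)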

# group by cID and calculate whatever statistic you like
# in this case, ratio of 1's to 2's,
# and then Inf's are set to be equal to the largest ratio
dt[, list(ratio = sum(l == 1)/sum(l == 2)), keyby = cID][,
     ratio := ifelse(ratio == Inf, max(ratio[is.finite(ratio)]), ratio)][,
     # scale up (I chose a scaling manually to get a prettier graph)
     # and convert to integer and change h
     as.integer(ratio*10)] -> h@count

plot(h)
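
For readers who would rather not pull in data.table, here is a minimal sketch of the same per-bin grouping using `hexTapply()`, which ships with hexbin and (as far as I know) is a thin wrapper around `tapply()` over the `cID` slot. The label vector `l` uses the same 1,1,1,2 pattern as above. The variable names `n1`, `n2` and `ratio` are only illustrative.

```r
library(hexbin)

set.seed(1)
x <- rnorm(10000)
y <- rnorm(10000)
# class labels, same repeating pattern as in the answer above
l <- rep(c(1, 1, 1, 2), length.out = length(x))

# IDs = TRUE is required so that each point remembers its cell
h <- hexbin(x, y, IDs = TRUE)

# apply a function to the values that fall into each bin
n1 <- hexTapply(h, l == 1, sum)   # class 1 points per bin
n2 <- hexTapply(h, l == 2, sum)   # class 2 points per bin

ratio <- n1 / n2
# bins without class 2 points give Inf; cap them at the largest finite ratio
ratio[is.infinite(ratio)] <- max(ratio[is.finite(ratio)])
```

The result is ordered by cell ID, which matches the order of `h@cell`, so it can be scaled and written into `h@count` in the same way as the data.table version.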


score 1 · Accepted Answer

You can determine the number of class 1 and class 2 points in each bin with:

library(hexbin)
library(plyr)
x=rnorm(10000)
y=rnorm(10000)
#generate hexbin object with IDs=TRUE
#the object includes then a slot with a vector cID
#cID maps point (x[i],y[i]) to cell number cID[i]
HexObj=hexbin(x,y,IDs = TRUE)

#find count statistics for first 2000 points (class 1) and the rest (class 2)
CountDF=merge(count(HexObj@cID[1:2000]),
              count(HexObj@cID[2001:length(x)]),
              by="x",
              all=TRUE
             )
#replace NAs by 0
CountDF[is.na(CountDF)]=0
#check if all points are included
sum(CountDF$freq.x)+sum(CountDF$freq.y)

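But plotting them is another matter. For instance, what if a bin contains no class 2 points? The ratio is not defined then. In addition, as far as I understand, hexbin is just a two-dimensional histogram: it counts the number of points that fall into each bin. I do not think it can handle non-integer data like yours.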
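
One possible way around both concerns, offered only as a sketch: add a small pseudocount to each class count so the log ratio is always finite, then draw the per-bin values yourself at the cell centres returned by `hcell2xy()` instead of storing them in the integer `count` slot. The pseudocount of 0.5 and the ten-colour palette are arbitrary assumptions, and the class labels follow the question (first 2000 points are class 1).

```r
library(hexbin)

set.seed(1)
x <- rnorm(10000)
y <- rnorm(10000)
cls <- c(rep(1, 2000), rep(2, 8000))   # labels as in the question

h <- hexbin(x, y, IDs = TRUE)

# per-bin class counts, ordered by cell ID (same order as h@cell)
n1 <- tapply(cls == 1, h@cID, sum)
n2 <- tapply(cls == 2, h@cID, sum)

# assumed pseudocount: keeps the ratio defined when a class is absent
pseudo <- 0.5
log_ratio <- log((n1 + pseudo) / (n2 + pseudo))

# cell centres, one per non-empty bin
xy <- hcell2xy(h)

# map the real-valued statistic onto a small colour palette
pal <- heat.colors(10)
cols <- pal[as.integer(cut(as.vector(log_ratio), breaks = 10))]

plot(xy$x, xy$y, col = cols, pch = 16, asp = 1, xlab = "x", ylab = "y")
```

This draws filled points rather than true hexagons, so it is a rough visual check, not a replacement for `plot(h)`.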
