I want to make a hexbin plot of food poisoning data. I can do this easily with ggplot2 and geom_hex:

ggplot(df) + geom_hex(aes(x=longitude, y=latitude))

See the code here: https://gist.github.com/corynissen/5823114

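However, this only plots the frequency of food poisoning reports, which is misleading because areas with more restaurants will naturally have more reports. I would therefore like to normalize it using restaurant license data.

Basically, for each bin, I want the count from df divided by the count from lic (see the data and code in the link).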

1 Answer

If a heat map is also acceptable, here is my quick-and-dirty solution:

# we don't want missing values in lat or lon
lic <- subset(lic, !is.na(longitude) & !is.na(latitude))

# get the x and y ranges for the union of both data sets
xmin <- min(c(df$longitude, lic$longitude))
xmax <- max(c(df$longitude, lic$longitude))
ymin <- min(c(df$latitude,  lic$latitude))
ymax <- max(c(df$latitude,  lic$latitude))

# set the number of bins and get x and y break points
n_bins  <- 30
xbreaks <- seq(xmin, xmax, length=(n_bins+1))
ybreaks <- seq(ymin, ymax, length=(n_bins+1))

# get the 2d histogram of the food inspections set
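# include.lowest=TRUE keeps the point at the minimum, which cut() would otherwise turn into NA
v1 <- cut(df$longitude, breaks=xbreaks, include.lowest=TRUE)  # factor of length nrow(df)
v2 <- cut(df$latitude,  breaks=ybreaks, include.lowest=TRUE)  # factor of length nrow(df)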
A1 <- as.numeric(table(v1,v2))           # of length n_bins*n_bins

# get the 2d histogram of the business licenses set
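# include.lowest=TRUE keeps the point at the minimum, which cut() would otherwise turn into NA
v1 <- cut(lic$longitude, breaks=xbreaks, include.lowest=TRUE) # factor of length nrow(lic)
v2 <- cut(lic$latitude,  breaks=ybreaks, include.lowest=TRUE) # factor of length nrow(lic)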
A2 <- as.numeric(table(v1,v2))           # of length n_bins*n_bins

# let's normalize the data
A3 <- A1 / A2
# Inf means a bin has reports but no licenses (2 bins here); NaN means 0/0, an empty bin
A3[is.infinite(A3) | is.na(A3)] <- 0

# create the final data set: one row per bin, located at the bin centre
# (table() output is column-major, so longitude varies fastest)
xmid <- (head(xbreaks, -1) + tail(xbreaks, -1)) / 2
ymid <- (head(ybreaks, -1) + tail(ybreaks, -1)) / 2
df2 <- data.frame(longitude = rep(xmid, times=n_bins),
                  latitude  = rep(ymid, each=n_bins),
                  count     = A3)

# ...and visualize it
ggplot() +
   geom_tile(data=df2, mapping=aes(x=longitude, y=latitude, fill=count))
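
The same steps can be wrapped in a small function, which also makes the behaviour easy to check on toy data. This is only a sketch in base R: `binned_ratio()` and its argument names are made up for illustration and are not part of the gist, and it assumes both data frames have numeric `longitude` and `latitude` columns. Unlike the code above, it marks bins without any licenses as `NA` rather than 0, so they are not confused with bins that truly have no reports.

```r
# Sketch: ratio of two 2D histograms on a shared rectangular grid.
# binned_ratio() is a hypothetical helper, not from the original gist.
binned_ratio <- function(num, den, n_bins = 30) {
  # drop rows with missing coordinates in either data set
  num <- num[complete.cases(num[, c("longitude", "latitude")]), ]
  den <- den[complete.cases(den[, c("longitude", "latitude")]), ]

  # shared break points, so both sets are counted in identical bins
  xbreaks <- seq(min(num$longitude, den$longitude),
                 max(num$longitude, den$longitude), length.out = n_bins + 1)
  ybreaks <- seq(min(num$latitude, den$latitude),
                 max(num$latitude, den$latitude), length.out = n_bins + 1)

  # 2D histogram; table() keeps empty factor levels, so the result is
  # always n_bins x n_bins
  count2d <- function(d) {
    table(cut(d$longitude, xbreaks, include.lowest = TRUE),
          cut(d$latitude,  ybreaks, include.lowest = TRUE))
  }
  n_num <- as.numeric(count2d(num))  # column-major: longitude varies fastest
  n_den <- as.numeric(count2d(den))

  # bin centres
  xmid <- (head(xbreaks, -1) + tail(xbreaks, -1)) / 2
  ymid <- (head(ybreaks, -1) + tail(ybreaks, -1)) / 2

  data.frame(
    longitude = rep(xmid, times = n_bins),
    latitude  = rep(ymid, each  = n_bins),
    # NA where the denominator is zero (no licenses in the bin)
    ratio     = ifelse(n_den > 0, n_num / n_den, NA_real_)
  )
}
```

With the data from the question this would be called as `df2 <- binned_ratio(df, lic)` and plotted with the same `geom_tile` call, mapping `fill=ratio` instead of `fill=count`.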
answered 2013-11-18T11:08:50.220