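I would like help generating a 2D histogram of frequencies, where the frequencies are computed within each column. My main problem: converting from counts to column-based frequencies.
Here is my starting code: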
# expected packages
library(ggplot2)
library(plyr)
# generate example data corresponding to expected data input
x_data = sample(101:200,10000, replace = TRUE)
y_data = sample(1:100,10000, replace = TRUE)
my_set = data.frame(x_data,y_data)
# define x and y interval cut points
x_seq = seq(100,200,10)
y_seq = seq(0,100,10)
# label samples as belonging within x and y intervals
my_set$x_interval = cut(my_set$x_data,x_seq)
my_set$y_interval = cut(my_set$y_data,y_seq)
# determine count for each x,y block
xy_df = ddply(my_set, c("x_interval","y_interval"),"nrow") # still need to convert for use with dplyr
# convert from count to frequency based on formula: freq = count/sum(count in given x interval)
################ TRYING TO FIGURE OUT #################
# plot results
fig_count <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) + geom_tile(aes(fill = nrow)) # count
fig_freq <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) + geom_tile(aes(fill = freq)) # frequency
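I would appreciate any help on how to compute the within-column frequencies.
Thanks! 江淮
Edit: I think the solution will need the following steps: 1) compute and store the total count for each x-interval factor; 2) divide each individual bin count by the total count of its corresponding x interval to get the frequency.
I am not sure how to carry this out.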
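For what it is worth, the two steps in the edit could be sketched in base R roughly as follows. This is only a minimal sketch, not necessarily the best approach: the small `xy_df` below is a made-up stand-in for the `ddply()` result above, and the `x_total` column is a name introduced here purely for illustration.

```r
# Hypothetical stand-in for the xy_df produced by ddply() in the code above
# (same column names: x_interval, y_interval, nrow; the counts are invented)
xy_df <- data.frame(
  x_interval = factor(c("(100,110]", "(100,110]", "(110,120]", "(110,120]")),
  y_interval = factor(c("(0,10]", "(10,20]", "(0,10]", "(10,20]")),
  nrow = c(30, 70, 20, 20)
)

# Step 1: total count for each x interval, repeated on every row of that interval
# (x_total is an illustrative column name, not part of the original code)
xy_df$x_total <- ave(xy_df$nrow, xy_df$x_interval, FUN = sum)

# Step 2: freq = count / sum(count in given x interval)
xy_df$freq <- xy_df$nrow / xy_df$x_total

print(xy_df)
```

With the real `xy_df`, only the two assignment lines should be needed before the `fig_freq` plot, and the frequencies within each x interval should then sum to 1.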