I am trying to reproduce what the `smoothScatter` function does, but in ggplot2. I have figured out everything except plotting the N sparsest points. Can anyone help with this?
```r
library(grDevices)
library(ggplot2)
# Make two new devices
dev.new()
dev1 <- dev.cur()
dev.new()
dev2 <- dev.cur()
# Make some data that needs to be plotted on log scales
mydata <- data.frame(x=exp(rnorm(10000)), y=exp(rnorm(10000)))
# Plot the smoothScatter version
dev.set(dev1)
with(mydata, smoothScatter(log10(y)~log10(x)))
# Plot the ggplot version
dev.set(dev2)
ggplot(mydata) + aes(x=x, y=y) + scale_x_log10() + scale_y_log10() +
  stat_density2d(geom="tile", aes(fill=..density..^0.25), contour=FALSE) +
  scale_fill_gradientn(colours = colorRampPalette(c("white", blues9))(256))
```
Notice that in the base graphics version, the 100 sparsest points (the default `nrpoints = 100`) are drawn on top of the smoothed density. A point's sparseness is the value of the kernel density estimate at its coordinates, and, importantly, that estimate is computed after the log transform (or whatever other coordinate transform is applied). I can plot all of the points by adding `+ geom_point(size = 0.5)`, but I only want the sparse ones.
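The selection rule described above (estimate the density on the transformed coordinates, look it up at each point, keep the N lowest) can be sketched directly. This is a minimal sketch, not the actual `smoothScatter` internals: `sparsest_points` is a hypothetical helper, and it swaps in `MASS::kde2d` with its default bandwidth, whereas `smoothScatter` uses a binned estimator from KernSmooth, so the chosen points will be similar but not necessarily identical.

```r
library(MASS)  # kde2d(); MASS is a recommended package shipped with R

# Hypothetical helper: row indices of the n points in the lowest-density
# regions, with the density estimated AFTER applying `trans` (log10 here).
sparsest_points <- function(x, y, n = 100, trans = log10, grid = 128) {
  tx <- trans(x)
  ty <- trans(y)
  # Kernel density estimate on an evenly spaced grid x grid lattice that
  # spans the range of the transformed data (kde2d's default limits)
  d <- kde2d(tx, ty, n = grid)
  # Index of the nearest grid node for every point
  ix <- 1L + round((grid - 1) * (tx - d$x[1]) / (d$x[grid] - d$x[1]))
  iy <- 1L + round((grid - 1) * (ty - d$y[1]) / (d$y[grid] - d$y[1]))
  dens <- d$z[cbind(ix, iy)]
  # Lowest density first; keep the first n rows
  head(order(dens), n)
}

set.seed(1)
mydata <- data.frame(x = exp(rnorm(10000)), y = exp(rnorm(10000)))
sparse_idx <- sparsest_points(mydata$x, mydata$y, n = 100)
```

Because the ggplot version keeps the data in original units and lets `scale_x_log10()`/`scale_y_log10()` do the transform, the selected rows can be overlaid unchanged with `+ geom_point(data = mydata[sparse_idx, ], size = 0.5)`.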
Is there any way to accomplish this with ggplot2? There are really two parts to this: first, identifying the outliers after the coordinate transform, and second, plotting only those points.
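One way to split the two parts is to let base R do the first and ggplot2 the second. The sketch below assumes an R version whose `smoothScatter()` has the `ret.selection` argument, which returns the indices of the low-density points it would draw (lowest density first); it draws to a null PDF device so the base plot is discarded. Those indices refer to the finite points `smoothScatter` keeps; with this data every value is finite, so they match the rows of `mydata`. It also uses `after_stat()`, which assumes ggplot2 3.3.0 or later in place of the older `..density..` notation.

```r
library(ggplot2)

set.seed(1)
mydata <- data.frame(x = exp(rnorm(10000)), y = exp(rnorm(10000)))

# Part 1: let smoothScatter() find the sparse points on the log10 scale.
# ret.selection = TRUE returns their indices; pdf(NULL) discards the drawing.
pdf(NULL)
sel <- with(mydata, smoothScatter(log10(x), log10(y), nrpoints = 100,
                                  ret.selection = TRUE))
dev.off()

# Part 2: draw the density with ggplot2 and overlay only the selected rows.
# The log10 scales transform the data, so the subset stays in original units.
p <- ggplot(mydata, aes(x = x, y = y)) +
  scale_x_log10() + scale_y_log10() +
  stat_density_2d(geom = "tile", aes(fill = after_stat(density)^0.25),
                  contour = FALSE) +
  scale_fill_gradientn(colours = colorRampPalette(c("white", blues9))(256)) +
  geom_point(data = mydata[sel, ], size = 0.5)
print(p)
```

The design choice here is to reuse `smoothScatter`'s own selection rather than re-implement it, so the overlaid points match the base graphics version; the cost is one throwaway base plot.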