r - Conditionally using jitter with geom_point in ggplot2

Question

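I have a plot with 12 variables, divided into two groups. I can't use facets, but with colour and shape I have been able to make the visualization easy to understand. However, some points overlap, either partially or completely. I'm using jitter to deal with these, but as you can see from the attached plot, this moves all of the points, not just the ones that overlap.

Is there a way to use jitter or dodge conditionally? Even better, is there a way to place the partially overlapping points side by side? As you can see, my x-axis holds discrete categories, so a slight shift to the left or right doesn't matter. I tried using dotplot with binaxis='y', but that completely destroys the x-axis.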

Edit: This graph manages to do what I'm looking for.

Further edit: adding the code behind this visualization.

library(ggplot2)
library(reshape2)  # melt() is assumed to come from reshape2

disciplines <- c("Comp. Sc.\n(17.2%)", "Physics\n(19.6%)", "Maths\n(29.4%)", "Pol.Sc.\n(40.4%)", "Psychology\n(69.8%)")

# To stop ggplot from imposing alphabetical ordering on x-axis
disciplines <- factor(disciplines, levels=disciplines, ordered=TRUE)

# involved aspects
intensive   <- c( 0.660,  0.438,  0.515,  0.028,  0.443)
comparative <- c( 0.361,  0.928,  0.270,  0.285,  0.311)
wh_adverbs  <- c( 0.431,  0.454,  0.069,  0.330,  0.577)
past_tense    <- c(0.334, 0.229, 0.668, 0.566, 0.838)
present_tense <- c(0.680, 0.408, 0.432, 0.009, 0.996)
conjunctions <- c( 0.928,  0.207,  0.162, -0.299, -0.045)
personal      <- c(0.498, 0.521, 0.332, 0.01, 0.01)
interrogative <- c(0.266, 0.202, 0.236, 0.02, 0.02)
sbj_objective <- c(0.913, 0.755, 0.863, 0.803, 0.913)
possessive    <- c(0.896, 0.802, 0.960, 0.611, 0.994)
thrd_person <- c(-0.244, -0.265, -0.310, -0.008, -0.384)
nouns       <- c(-0.602, -0.519, -0.388, -0.244, -0.196)

df1 <- data.frame(disciplines,
                 "Intensive Adverbs"=intensive,
                 "Comparative Adverbs"=comparative,
                 "Wh-adverbs (WRB)"=wh_adverbs,
                 "Verb: Past Tense"=past_tense,
                 "Verb: Present Tense"=present_tense,
                 "Conjunctions"=conjunctions,
                 "Personal Pronouns"=personal,
                 "Interrogative Pronouns"=interrogative,
                 "Subjective/Objective Pronouns"=sbj_objective,
                 "Possessive Pronouns"=possessive,
                 "3rd-person verbs"=thrd_person,
                 "Nouns"=nouns,
                 check.names=FALSE)

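# Long format: one row per (discipline, variable) pair
df1.m <- melt(df1, id.vars="disciplines")
# Two groups of features, used for shape and for the smoothing groups
grp <- ifelse(df1.m$variable %in% c('3rd-person verbs','Nouns'), 'Informational Features', 'Involved Features')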
g <- ggplot(df1.m, aes(group=grp, disciplines, value, shape=grp, colour=variable))
g <- g + geom_hline(yintercept=0, size=9, color="white")
g <- g + geom_smooth(method=loess, span=0.75, level=0.95, alpha=I(0.16), linetype="dashed")
g <- g + geom_point(size=4,  alpha=I(0.7), position=position_jitter(width=0.1, height=0))
g <- g + scale_shape_manual(values=c(17,19))

score 3 · Accepted Answer

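I'm curious what others might suggest, but to get the side-by-side effect you could code your main x-axis category as a number (10, 20, ..., 50) plus or minus a small fraction, such as (0..10)/2, based on the category you use for colour. You would then have x values of 9.6, 9.8, 10.0, 10.2, ..., then 20.0, 20.2, 20.4. This creates an organized plot instead of assigning these fractional adjustments at random.

Here is a quick implementation of that idea for your dataset. It offsets the main x variable, disciplines, by 1/6 of the subcategory, variable, and uses the result as the x value with no jitter...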

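M <- df1.m
ScaleFactor <- 6
# Offset per variable: index of the factor level divided by ScaleFactor
xadj <- as.numeric(M$variable)/ScaleFactor
xadj <- xadj - mean(xadj)   # shift it to center around zero
# Main category positions: 10, 20, ..., 50
x10  <- as.numeric(M$disciplines) * 10
M$x <- x10 + xadj
g <- ggplot(M, aes(group=grp, x, value, shape=grp, colour=variable))
# x is numeric now, so label it through a continuous scale, one break per discipline
g + geom_point(size=4,alpha=I(0.7)) +
    scale_x_continuous(breaks=seq_along(levels(M$disciplines))*10, labels=levels(M$disciplines))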

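Note that the values within each category are evenly spaced and appear in the same order. (This code does not include the curve fitting and other layers shown in your plot.)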


Variation: you can see the effect more clearly if you "quantize" your y values, so that more of them are plotted side by side.

M$valmod = M$value - M$value %% 0.2 + .1

Then use valmod instead of value in the aes() statement to see the effect.
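To see what the quantization expression does on its own, here is a minimal sketch on a handful of made-up numbers (the sample values are hypothetical, picked only to show the binning; Qfact stands in for the 0.2 above). Each value is moved to the midpoint of its 0.2-wide bin, and negative values are handled the same way because %% in R returns a non-negative remainder for a positive divisor.

```r
# Hypothetical sample values, chosen only to illustrate the binning
value <- c(0.03, 0.15, 0.33, 0.39, -0.05, -0.25)
Qfact <- 0.2  # bin width, same role as the 0.2 in the line above

# value - value %% Qfact is the lower edge of the bin (floor behaviour),
# adding Qfact / 2 moves the point to the middle of that bin
valmod <- value - value %% Qfact + Qfact / 2

print(round(valmod, 2))
# bin midpoints: 0.1 0.1 0.3 0.3 -0.1 -0.3
```

Values that land on the same midpoint are drawn at exactly the same height, which is why the horizontal offsets then line them up side by side.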

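To get the category labels back, set the breaks and labels of the x scale manually. This version uses a different ScaleFactor for wider spacing, together with a quantized y-axis: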

M <- df1.m
ScaleFactor <- 3
# Note this could just be xadj instead of adding to data frame
M$xadj <- as.numeric(M$variable)/ScaleFactor
M$xadj <- M$xadj - mean(M$xadj)   # shift it to center around zero
M$x10  <- as.numeric(M$disciplines) * 10
M$x <- M$x10 + M$xadj

Qfact <- 0.2  # resolution to quantize y values
M$valmod <- M$value - M$value %% Qfact + Qfact/2  # clump y to given resolution

# x is numeric, so use a continuous scale with one labelled break per discipline
g <- ggplot(M, aes(group=grp, x, valmod, shape=grp, colour=variable)) +
    scale_x_continuous(breaks=unique(M$x10), labels=levels(M$disciplines))
g + geom_point(size=3,alpha=I(0.7))

[Figure: quantized version of the plot]
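If the goal is to move only the points that actually collide, one possible variation on the same idea is to apply an offset only within clusters of points that share a category and a y bin. The sketch below is not part of the code above. The toy data frame d, its column names and the step value are all assumptions made for illustration, and it uses base R only for the offset calculation.

```r
# Toy data (hypothetical): three series measured in two categories
d <- data.frame(
  category = factor(c("A", "A", "A", "B", "B", "B")),
  series   = factor(c("s1", "s2", "s3", "s1", "s2", "s3")),
  value    = c(0.51, 0.55, 0.90, 0.10, 0.45, 0.85)
)

Qfact <- 0.2   # y resolution below which points are treated as overlapping
step  <- 0.15  # horizontal distance between neighbours in a cluster (assumed)

# Assign each point to a y bin of width Qfact
d$bin <- floor(d$value / Qfact)

# A cluster is a set of points in the same category and the same y bin
cluster <- interaction(d$category, d$bin, drop = TRUE)

# Position of each point within its cluster, and the cluster size
idx <- ave(seq_along(cluster), cluster, FUN = seq_along)
n   <- ave(seq_along(cluster), cluster, FUN = length)

# Centered offset: exactly 0 for isolated points, symmetric around 0 otherwise
d$x <- as.numeric(d$category) + (idx - (n + 1) / 2) * step

# Optional plot, only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(d, ggplot2::aes(x, value, colour = series)) +
    ggplot2::geom_point(size = 4) +
    ggplot2::scale_x_continuous(breaks = seq_along(levels(d$category)),
                                labels = levels(d$category))
}
```

In this toy data only the two points at 0.51 and 0.55 in category A share a bin, so only those two are shifted. One caveat of binning is that two close values on either side of a bin edge (0.39 and 0.41, say) are not detected as overlapping, so a distance-based grouping may suit some data better.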
