r - Scatter plot in ggplot, one numeric variable across two groups

Question

I would like to create a scatter plot in ggplot2 which displays male test_scores on the x-axis and female test_scores on the y-axis using the dataset below. I can easily create a geom_line plot splitting male and female and putting the date ("dts") on the x-axis.

library(tidyverse)

#create data

dts <- c("2011-01-02","2011-01-02","2011-01-03","2011-01-04","2011-01-05",
"2011-01-02","2011-01-02","2011-01-03","2011-01-04","2011-01-05")

sex <- c("M","F","M","F","M","F","M","F","M","F")

test <- round(runif(10,.5,1),2)

semester <- data.frame(dts = as.Date(dts), sex = sex, test_scores = test)

#show the geom_line plot
ggplot(semester, aes(x = dts, y = test_scores, color = sex)) + geom_line()

It seems with only one time series, ggplot2 does better with the data in wide format than long format. For instance, I could easily create two columns, "male_scores" and "female_scores" and plot those against each other, but I would like to keep my data tidy and in long format.

Cheers and thank you.

score 3 · Accepted Answer

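You are tidying too much. Tidying data is not just a mechanical process of making it as long as possible; it also means making it as wide as it needs to be.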

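For example, if you had locations as X and Y for animal sightings, you would not store two rows per sighting, one with "X" in a "label" column and the X coordinate in a "value" column, and another with "Y" in the "label" column and the Y coordinate in the "value" column. The exception is if you really were storing the data in a key-value store, but that's another story...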

Widen your data and put the male and female test scores into test_score_male and test_score_female; those columns then become the x and y aesthetics of the scatter plot.
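A minimal sketch of that widening step, using the question's data. Two things here are assumptions and not part of the original post: `set.seed()` is added only so the example is reproducible, and the `obs` helper column is a hypothetical pairing rule (the k-th male score on a date is matched with the k-th female score on that date). It is needed because 2011-01-02 appears twice for each sex, so date alone does not identify a pair.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Recreate the question's data (set.seed is added only for reproducibility)
set.seed(1)
dts <- c("2011-01-02", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05",
         "2011-01-02", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05")
sex <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
semester <- data.frame(dts = as.Date(dts), sex = sex,
                       test_scores = round(runif(10, .5, 1), 2))

wide <- semester %>%
  # Spell out the labels so the new columns get the names used in the answer
  mutate(sex = if_else(sex == "M", "male", "female")) %>%
  group_by(dts, sex) %>%
  # Assumption: pair the k-th male score with the k-th female score per date
  mutate(obs = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = sex, values_from = test_scores,
              names_prefix = "test_score_")

# One point per (date, obs) pair: male score on x, female score on y
p <- ggplot(wide, aes(x = test_score_male, y = test_score_female)) +
  geom_point()
p
```

The long `semester` data frame is left untouched, so the geom_line plot from the question still works on it; `wide` exists only for the scatter plot.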

score 0 · Accepted Answer

The problem with keeping the data long is that a given Y value has no corresponding X value. The reason lies in the structure of the dataset:

         dts  sex  test_scores
1 2011-01-02   M        0.67
2 2011-01-02   F        0.78
3 2011-01-03   M        0.58
4 2011-01-04   F        0.58
5 2011-01-05   M        0.51

If you were to use this code:

# Errors when drawn: x and y each hold only 5 values, but the data (and color = sex) has 10 rows
ggplot(semester, aes(x = semester$test_scores[semester$sex == 'M'],
                     y = semester$test_scores[semester$sex == 'F'],
                     color = sex)) + geom_point()

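ggplot will throw an error. The main reason is that once you subset the male scores, that subset has no corresponding female scores in the same rows. You first need to collapse the data to the date level. As you correctly point out, the result is no longer in long format.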

I would suggest creating a wide dataset just for this one plot. There are several ways to do that, but that is a separate topic.
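For illustration only, here is one possible way to collapse to the date level. It assumes that averaging repeated scores for the same date and sex is acceptable, which the question does not state; the name `by_date` and the `set.seed()` call are likewise additions for this sketch.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Recreate the question's data (set.seed is added only for reproducibility)
set.seed(1)
dts <- c("2011-01-02", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05",
         "2011-01-02", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05")
sex <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
semester <- data.frame(dts = as.Date(dts), sex = sex,
                       test_scores = round(runif(10, .5, 1), 2))

by_date <- semester %>%
  group_by(dts, sex) %>%
  # Assumption: repeated scores for one date and sex are averaged
  summarise(test_scores = mean(test_scores), .groups = "drop") %>%
  # One row per date, with columns test_score_M and test_score_F
  pivot_wider(names_from = sex, values_from = test_scores,
              names_prefix = "test_score_")

p_by_date <- ggplot(by_date, aes(x = test_score_M, y = test_score_F)) +
  geom_point()
p_by_date
```

This yields one point per date, so every male score on the x-axis has a female score on the y-axis to go with it.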
