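I'm new to R and am having some trouble creating a heatmap with geom_raster(). I'm working on this week's tidytuesday challenge, and I want to create a heatmap that shows whether hosting a race benefits the host team. I'm looking at team_name and pole, which are the x and y values respectively. I then fill the plot with the host variable to see whether there is any trend between each team, its finishing position, and whether it hosted the race.
Below are the code snippet I used to create the heatmap and the heatmap itself. I had already tidied the data by this point, which is the reason for the funky data name.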
library(ggplot2)
# Axis order for the finish placements: "P1", "P2", ..., "P16"
pole_position <- paste0("P", 1:16)
ggplot(data = clean_marbles_2, mapping = aes(x = team_name, y = pole, fill = host)) +
  geom_raster() +
  scale_y_discrete(limits = pole_position) +
  coord_flip() +
  labs(x = "Team name", y = "Finish placement", title = "Does hosting the race affect finish placement?")
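At first I thought this was a pretty cool graphic, but I quickly realized that it is missing some of the "Yes" host tiles. There should be 16 separate teal boxes in this plot, but there are only 11.
I then faceted the plot to check whether it was picking up the data I gave it. Below are the code and a picture of the resulting plot. The value of pole_position did not change between the two plots.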
ggplot(data = clean_marbles_2, mapping = aes(x = team_name, y = pole, fill = host)) +
  geom_raster() +
  scale_y_discrete(limits = pole_position) +
  coord_flip() +
  labs(x = "Team name", y = "Finish placement", title = "Does hosting the race affect finish placement?") +
  facet_wrap(~host)
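As you can see, all sixteen blue tiles appear in the "Yes" panel. I don't understand why the previous plot only shows 11 of the 16 blue tiles.
My question is: why don't all of the blue tiles show up in the first plot?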
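One thing that might be worth checking, and this is only an assumption rather than something confirmed above: if several rows share the same team_name and pole combination (for example one row per race), their tiles land on the same cell and only one of them stays visible, so a "No" tile could hide a "Yes" tile. Faceting by host would separate those rows, which would be consistent with all sixteen showing up in the "Yes" panel. A minimal sketch of that check, using a made-up toy data frame in place of clean_marbles_2 (the values are hypothetical):

```r
# Toy stand-in for clean_marbles_2 (hypothetical values, not the real data)
toy <- data.frame(
  team_name = c("A", "A", "B", "B", "B"),
  pole      = c("P1", "P1", "P2", "P3", "P3"),
  host      = c("Yes", "No", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Count how many rows fall on each team_name/pole cell
cell_counts <- aggregate(n ~ team_name + pole,
                         data = transform(toy, n = 1L),
                         FUN = sum)
overlapping <- cell_counts[cell_counts$n > 1, ]

# Count how many distinct host values each cell holds
mixed <- aggregate(host ~ team_name + pole,
                   data = toy,
                   FUN = function(h) length(unique(h)))
names(mixed)[3] <- "n_host_values"

# Cells containing both "Yes" and "No": candidates for a hidden tile
mixed_cells <- mixed[mixed$n_host_values > 1, ]

print(overlapping)
print(mixed_cells)
```

Running the same two aggregate() calls on the real data would show whether any team_name/pole cell holds more than one row, and whether any of those cells mixes "Yes" and "No".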
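Any help and/or constructive criticism is appreciated. Thanks!
Here is the link to the tidytuesday GitHub repository: here.
Edit:
Here is what I did to tidy the data. Please don't flame me if I did something wrong; I'm keen to learn any ways to make my code more efficient.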
# Read in the data from the github repo
marbles <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-06-02/marbles.csv')
# Set the correct point & pole values
marbles$points[marbles$pole == 'P1'] = 25
marbles$pole[marbles$points == 25] = 'P1'
marbles$points[marbles$pole == 'P2'] = 18
marbles$pole[marbles$points == 18] = 'P2'
marbles$points[marbles$pole == 'P3'] = 15
marbles$pole[marbles$points == 15] = 'P3'
marbles$points[marbles$pole == 'P4'] = 12
marbles$pole[marbles$points == 12] = 'P4'
marbles$points[marbles$pole == 'P5'] = 10
marbles$pole[marbles$points == 10] = 'P5'
marbles$points[marbles$pole == 'P6'] = 8
marbles$pole[marbles$points == 8] = 'P6'
marbles$points[marbles$pole == 'P7'] = 6
marbles$pole[marbles$points == 6] = 'P7'
marbles$points[marbles$pole == 'P8'] = 4
marbles$pole[marbles$points == 4] = 'P8'
marbles$points[marbles$pole == 'P9'] = 2
marbles$pole[marbles$points == 2] = 'P9'
marbles$points[marbles$pole == 'P10'] = 1
marbles$pole[marbles$points == 1] = 'P10'
marbles$points[marbles$pole == 'P11'] = 0
marbles$pole[marbles$points == 0] = 'P11'
# replace any excess and incorrect pole/point values to align with my scale.
marbles[186, 8] = 'P10'
marbles[186, 9] = 1
# Replace the pole values for the 0 point scores
# This was done for many more values than what is seen here.
marbles[252,8] = 'P12'
marbles[253,8] = 'P13'
marbles[254,8] = 'P14'
marbles[255,8] = 'P15'
marbles[256,8] = 'P16'
# Remove the notes and source sections of the tidy data
clean_marbles = subset(marbles, select = -c(notes, source))
# Create a clean subset without any NA values
clean_marbles_2 = na.omit(clean_marbles)
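I know this is very tedious. You can see the corresponding points and pole values in the code I included above. I was trying to make the data more uniform, thinking it would be easier to visualize afterwards, but I guess not.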
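The points/pole correspondence in the code above could probably be written more compactly with a named lookup vector. This is a minimal sketch under stated assumptions: it runs on a made-up toy data frame rather than the real marbles data, the name points_by_pole is hypothetical, and unlike the original assignments it only fills in pole where pole is missing instead of overwriting existing values.

```r
# Lookup taken from the assignments above: pole label -> points
points_by_pole <- c(P1 = 25, P2 = 18, P3 = 15, P4 = 12, P5 = 10, P6 = 8,
                    P7 = 6, P8 = 4, P9 = 2, P10 = 1, P11 = 0)

# Toy stand-in for the marbles data (hypothetical values)
toy <- data.frame(
  pole   = c("P1", NA, "P3", NA, "P11"),
  points = c(NA, 18, NA, 0, NA),
  stringsAsFactors = FALSE
)

# Step 1: set points from pole wherever pole is a known label
known <- toy$pole %in% names(points_by_pole)  # NA %in% ... is FALSE
toy$points[known] <- unname(points_by_pole[toy$pole[known]])

# Step 2: fill in missing pole labels from points
# (assumption: existing pole values are left untouched)
m <- match(toy$points, points_by_pole)
missing_pole <- is.na(toy$pole) & !is.na(m)
toy$pole[missing_pole] <- names(points_by_pole)[m[missing_pole]]

print(toy)
```

The manual fixes for individual rows (such as the P12 to P16 placements) would still need to be applied separately, since zero points maps to more than one placement.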