Many thanks in advance. I am new to R and working on charts.

I have three charts

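Random bars

Plot ordered by frequency

Bar chart ordered by frequency

Plot with Pareto overlay

Pareto overlay: if you look closely, you can see the ordered frequency plot, drawn to scale, along the bottom.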

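```{r}
library(dplyr)

# Drop rows without an end station, then count rides per station
df <- filter(df_clean_distances, end_station_name != "NA")
d <- df %>%
  select(end_station_name) %>%
  group_by(end_station_name) %>%
  summarize(freq = n())
head(d$freq)
dput(head(d))

# Sort stations by descending frequency
d2 <- d[order(-d$freq), ]
d2
```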

Random-order plot

```{r}
ggplot(d2, aes( x=end_station_name, y= freq)) + 
geom_bar( stat = "identity") + 
theme( axis.text.x = element_blank()) +
  ylim( c(0,40000))
```

Plot ordered by frequency

```{r}
ggplot(d2, aes(x = reorder(end_station_name, -freq), y = freq)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_blank()) +
  ylim(c(0, 40000)) +
  labs(title = "end station by freq", x = "Station Name")
```

Plot with Pareto overlay

```{r}
ggplot(d2, aes(x = reorder(end_station_name, -freq), y = freq)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_blank()) +
  ggQC::stat_pareto(point.color = "red", point.size = 0.5) +
  labs(title = "end station by freq", x = "Station Name")
```

dput(head) output

```{r}
# Output of dput(head(d, n = 20))
structure(list(end_station_name = c("2112 W Peterson Ave", "63rd St Beach",
  "900 W Harrison St", "Aberdeen St & Jackson Blvd", "Aberdeen St & Monroe St",
  "Aberdeen St & Randolph St", "Ada St & 113th St", "Ada St & Washington Blvd",
  "Adler Planetarium", "Albany Ave & 26th St", "Albany Ave & Bloomingdale Ave",
  "Albany Ave & Montrose Ave", "Archer (Damen) Ave & 37th St",
  "Artesian Ave & Hubbard St", "Ashland Ave & 13th St", "Ashland Ave & 50th St",
  "Ashland Ave & 63rd St", "Ashland Ave & 66th St", "Ashland Ave & 69th St",
  "Ashland Ave & 73rd St"), freq = c(1032L, 2524L, 3836L, 8383L,
  6587L, 6136L, 18L, 6281L, 12050L, 397L, 2833L, 1875L, 710L, 1879L,
  2659L, 151L, 112L, 102L, 78L, 8L)), row.names = c(NA, -20L), class =
  c("tbl_df", "tbl", "data.frame"))
```

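As you can see, the Pareto plot works for the right-hand scale, but the left-hand scale is badly off. There are about 3 million rows, and the y-axis scaling has squashed the frequencies into a very small curve along the bottom that is hard to read against the left axis.

How can I fix the left y-axis at about 40,000 so that the frequency curve is displayed properly?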

1 Answer

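Here is a solution, not with package `ggQC` but with `sec_axis`.
The trick is to pre-compute `max(freq)` and then use it as a scale factor to align the two axes. The data preparation code is inspired by this rstudio-pubs blog post.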

```{r}
library(ggplot2)
library(dplyr)

# Scale factor: the tallest bar, so that 100% lines up with max(freq)
M <- max(d$freq)

d %>%
  arrange(desc(freq)) %>%
  mutate(cum_freq = cumsum(freq/sum(freq))) %>%
  ggplot(aes(x = reorder(end_station_name, -freq), y = freq)) +
  geom_bar(stat = "identity") +
  # Cumulative share, rescaled into the units of the left axis
  geom_line(mapping = aes(y = cum_freq*M, group = 1)) +
  geom_point(
    mapping = aes(y = cum_freq*M),
    color = "red",
    size = 0.5
  ) +
  # The secondary axis undoes the rescaling and shows percentages
  scale_y_continuous(
    sec.axis = sec_axis(~ ./M,
                        labels = scales::percent,
                        name = "Cumulative percentage")) +
  labs(title = "end station by freq", x = "Station Name") +
  theme(axis.text.x = element_blank())
```
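If the left axis should be pinned at roughly 40,000, as the question asks, one possible variant is to use a fixed cap as the scale factor instead of `max(freq)`. The sketch below is only an illustration under assumptions: `y_top` and `d_small` are names introduced here, and `d_small` holds just the first five rows of the `dput` output from the question. Note that setting `limits` on the scale drops any value above the cap, so this only makes sense if no station exceeds `y_top`.

```r
library(ggplot2)
library(dplyr)

# Assumption: first five rows of the dput output, used as stand-in data
d_small <- tibble(
  end_station_name = c("2112 W Peterson Ave", "63rd St Beach",
                       "900 W Harrison St", "Aberdeen St & Jackson Blvd",
                       "Aberdeen St & Monroe St"),
  freq = c(1032L, 2524L, 3836L, 8383L, 6587L)
)

# Assumption: cap for the left axis, taken from the question's "about 40,000"
y_top <- 40000

pareto <- d_small %>%
  arrange(desc(freq)) %>%
  mutate(
    cum_share  = cumsum(freq) / sum(freq),  # cumulative share, ends at 1
    cum_scaled = cum_share * y_top          # same curve in left-axis units
  )

p <- ggplot(pareto, aes(x = reorder(end_station_name, -freq), y = freq)) +
  geom_col() +
  geom_line(aes(y = cum_scaled, group = 1)) +
  geom_point(aes(y = cum_scaled), color = "red", size = 0.5) +
  scale_y_continuous(
    limits = c(0, y_top),
    # Dividing by y_top maps the left axis back to a 0-100% range
    sec.axis = sec_axis(~ . / y_top,
                        labels = scales::percent,
                        name = "Cumulative percentage")
  ) +
  labs(title = "end station by freq", x = "Station Name") +
  theme(axis.text.x = element_blank())

print(p)
```

With this choice 100% on the right axis lines up with 40,000 on the left, so the bars keep their own scale while the cumulative curve spans the full height of the panel.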

Answered 2021-04-30T16:49:22.690