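I'm working through an exercise in R for Data Science (7.5.2.1, #2): Use geom_tile() together with dplyr to explore how average flight delays vary by destination and month of year. What makes the plot difficult to read? How could you improve it?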

First, I transform the columns.

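library(nycflights13)
library(dplyr)  # provides %>%, transmute(), filter(), group_by(), summarise()

# Average total delay (departure + arrival) per month and destination
foo <- nycflights13::flights %>%
  transmute(tot_delay = dep_delay + arr_delay, m = month, d = dest) %>%
  filter(!is.na(tot_delay)) %>%
  group_by(m, d) %>%
  summarise(avg_delay = mean(tot_delay))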

Now foo appears to be a data frame, judging by the "Source" line in the output.

> foo
Source: local data frame [1,112 x 3]
Groups: m [?]

       m     d avg_delay
   <int> <chr>     <dbl>
1      1   ALB 76.571429
2      1   ATL  8.567982
3      1   AUS 19.017751
4      1   AVL 49.000000
5      1   BDL 32.081081
6      1   BHM 47.043478
7      1   BNA 25.930233
8      1   BOS  2.698517
9      1   BQN  8.516129
10     1   BTV 18.393665
# ... with 1,102 more rows

as_tibble doesn't seem to work. What am I doing wrong?

> as_tibble(foo)
Source: local data frame [1,112 x 3]
Groups: m [?]

       m     d avg_delay
   <int> <chr>     <dbl>
1      1   ALB 76.571429
2      1   ATL  8.567982
3      1   AUS 19.017751
4      1   AVL 49.000000
5      1   BDL 32.081081
6      1   BHM 47.043478
7      1   BNA 25.930233
8      1   BOS  2.698517
9      1   BQN  8.516129
10     1   BTV 18.393665
# ... with 1,102 more rows

Shouldn't the internal structure of a tibble be different?

> str(foo)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1112 obs. of  3 variables:
 $ m        : int  1 1 1 1 1 1 1 1 1 1 ...
 $ d        : chr  "ALB" "ATL" "AUS" "AVL" ...
 $ avg_delay: num  76.57 8.57 19.02 49 32.08 ...
 - attr(*, "vars")=List of 1
  ..$ : symbol m
 - attr(*, "drop")= logi TRUE
> str(as_tibble(foo))
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1112 obs. of  3 variables:
 $ m        : int  1 1 1 1 1 1 1 1 1 1 ...
 $ d        : chr  "ALB" "ATL" "AUS" "AVL" ...
 $ avg_delay: num  76.57 8.57 19.02 49 32.08 ...
 - attr(*, "vars")=List of 1
  ..$ : symbol m
 - attr(*, "drop")= logi TRUE
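
For what it's worth, the str() output above already lists tbl_df among the classes, and tbl_df is the tibble class. The extra grouped_df class comes from group_by(), and summarise() only peels off the last grouping variable. A minimal sketch of that, assuming a small made-up table (toy and its values are hypothetical, not taken from flights) and using ungroup() to drop the remaining grouping:

```r
library(dplyr)

# Hypothetical stand-in for the flights summary; the values are made up.
toy <- tibble::tibble(
  m = c(1L, 1L, 2L, 2L),
  d = c("ALB", "ATL", "ALB", "ATL"),
  delay = c(10, 20, 30, 40)
)

# Same shape as the pipeline in the question: two grouping variables,
# one of which (d) is dropped by summarise(), leaving the result grouped by m.
grouped <- toy %>%
  group_by(m, d) %>%
  summarise(avg_delay = mean(delay))

class(grouped)               # inspect: grouped_df sits on top of tbl_df
tibble::is_tibble(grouped)   # a grouped result still counts as a tibble

# ungroup() removes the grouping and with it the grouped_df class.
ungrouped <- ungroup(grouped)
class(ungrouped)
```

So a "Groups:" line in the printout does not mean the object failed to become a tibble; it only says the tibble is still grouped.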

Note that as_tibble() works as expected:

> packageDescription("tibble")
Package: tibble
Encoding: UTF-8
Version: 1.3.0

> is_tibble(foo)
[1] TRUE

1 Answer

Works for me: foo is a tibble, announced in the printed output as "A tibble: 1,112 x 3":

> foo
Source: local data frame [1,112 x 3]
Groups: m [?]

# A tibble: 1,112 x 3
       m     d avg_delay
   <int> <chr>     <dbl>
 1     1   ALB 76.571429
 2     1   ATL  8.567982

So maybe you have an old version of dplyr. Mine is:

> packageDescription("dplyr")
Package: dplyr
Type: Package
Version: 0.5.0
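
A quicker way to compare installed versions than reading the full packageDescription() output is packageVersion() from utils. This is only a sketch: the 0.5.0 threshold is simply the version shown above, not a documented minimum, and dplyr_ok is a hypothetical name.

```r
# Installed versions of the two packages involved.
dplyr_version  <- packageVersion("dplyr")
tibble_version <- packageVersion("tibble")

print(dplyr_version)
print(tibble_version)

# package_version objects can be compared against a version string.
# "0.5.0" is just the version from this answer, used for illustration.
dplyr_ok <- dplyr_version >= "0.5.0"
print(dplyr_ok)
```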

And everything else:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.5.0  tibble_1.3.1

loaded via a namespace (and not attached):
[1] magrittr_1.5     R6_2.2.0         assertthat_0.2.0 DBI_0.5-1       
[5] tools_3.3.1      Rcpp_0.12.11     rlang_0.1.1     
answered 2017-06-18T21:24:14.933