分组新变量_CDA答疑社区

啊啊啊啊啊吖

2018-11-03 阅读量: 738

分组新变量

虽然与 summarize() 函数结合起来使用是最有效的，但分组也可以与 mutate() 和 filter()

函数结合，以完成非常便捷的操作。

• 找出每个分组中最差的成员：

flights_sml %>%
group_by(year, month, day) %>%
filter(rank(desc(arr_delay)) < 10)
#> Source: local data frame [3,306 x 7]
#> Groups: year, month, day [365]
#>
#> year month day dep_delay arr_delay distance
#> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 2013 1 1 853 851 184
#> 2 2013 1 1 290 338 1134
#> 3 2013 1 1 260 263 266
#> 4 2013 1 1 157 174 213
#> 5 2013 1 1 216 222 708
#> 6 2013 1 1 255 250 589
#> # ... with 3,300 more rows, and 1 more variables:
#> # air_time <dbl>

• 找出大于某个阈值的所有分组：

popular_dests <- flights %>%
group_by(dest) %>%
filter(n() > 365)
popular_dests
#> Source: local data frame [332,577 x 19]
#> Groups: dest [77]
#>
#> year month day dep_time sched_dep_time dep_delay
#> <int> <int> <int> <int> <int> <dbl>
#> 1 2013 1 1 517 515 2
#> 2 2013 1 1 533 529 4
#> 3 2013 1 1 542 540 2
#> 4 2013 1 1 544 545 -1
#> 5 2013 1 1 554 600 -6
#> 6 2013 1 1 554 558 -4
#> # ... with 3.326e+05 more rows, and 13 more variables:
#> # arr_time <int>, sched_arr_time <int>,
#> # arr_delay <dbl>, carrier <chr>, flight <int>,
#> # tailnum <chr>, origin <chr>, dest <chr>,
#> # air_time <dbl>, distance <dbl>, hour <dbl>,
#> # minute <dbl>, time_hour <dttm>

6.8974

关注作者

发表评论

暂无数据

CDA考试动态

CDA报考指南

推荐帖子