2018-11-03
阅读量:
1033
添加新变量,我用mutate()
除了选择现有的列,我们在用R语言做数据分析时还时常需要添加新的列,心裂是现有列的函数,mutate的存在感很强!
mutate() 总是将新列添加在数据集的最后,因此我们需要先创建一个更狭窄的数据集,以
便能够看到新变量。记住,当使用 RStudio 时,查看所有列的最简单的方法就是使用 View()
函数:
flights_sml <- select(flights,
year:day,
ends_with("delay"),
distance,
air_time
)
mutate(flights_sml,
gain = arr_delay - dep_delay,
speed = distance / air_time * 60
)
#> # A tibble: 336,776 × 9
#> year month day dep_delay arr_delay distance air_time
#> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2013 1 1 2 11 1400 227
#> 2 2013 1 1 4 20 1416 227
#> 3 2013 1 1 2 33 1089 160
#> 4 2013 1 1 -1 -18 1576 183
#> 5 2013 1 1 -6 -25 762 116
#> 6 2013 1 1 -4 12 719 150
#> # ... with 3.368e+05 more rows, and 2 more variables:
#> # gain <dbl>, speed <dbl>
一旦创建,新列就可以立即使用:
mutate(flights_sml,
gain = arr_delay - dep_delay,
hours = air_time / 60,
gain_per_hour = gain / hours
)
#> # A tibble: 336,776 × 10
#> year month day dep_delay arr_delay distance air_time
#> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2013 1 1 2 11 1400 227
#> 2 2013 1 1 4 20 1416 227
#> 3 2013 1 1 2 33 1089 160
#> 4 2013 1 1 -1 -18 1576 183
#> 5 2013 1 1 -6 -25 762 116
#> 6 2013 1 1 -4 12 719 150
#> # ... with 3.368e+05 more rows, and 3 more variables:
#> # gain <dbl>, hours <dbl>, gain_per_hour <dbl>
如果只想保留新变量,可以使用transmute()函数:
transmute(flights,
gain = arr_delay - dep_delay,
hours = air_time / 60,
gain_per_hour = gain / hours
)
#> # A tibble: 336,776 × 3
#> gain hours gain_per_hour
#> <dbl> <dbl> <dbl>
#> 1 9 3.78 2.38
#> 2 16 3.78 4.23
#> 3 31 2.67 11.62
#> 4 -17 3.05 -5.57
#> 5 -19 1.93 -9.83
#> 6 16 2.50 6.40
#> # ... with 3.368e+05 more rows
0.0000
0
2
关注作者
收藏
评论(0)
发表评论
暂无数据
推荐帖子
0条评论
0条评论
1条评论