An orderly approach to regression models, ideally with dplyr
source link: https://www.codesd.com/item/an-orderly-approach-to-regression-models-ideally-with-dplyr.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
An orderly approach to regression models, ideally with dplyr
Reading the documentation for do()
in dplyr, I've been impressed by the ability to create regression models for groups of data and was wondering whether it would be possible to replicate it using different independent variables rather than groups of data.
So far I've tried
require(dplyr)
data(mtcars)
models <- data.frame(var = c("cyl", "hp", "wt"))
models <- models %>% do(mod = lm(mpg ~ as.name(var), data = mtcars))
Error in as.vector(x, "symbol") :
cannot coerce type 'closure' to vector of type 'symbol'
models <- models %>% do(mod = lm(substitute(mpg ~ i, as.name(.$var)), data = mtcars))
Error in substitute(mpg ~ i, as.name(.$var)) :
invalid environment specified
The desired final output would be something like
var slope standard_error_slope
1 cyl -2.87 0.32
2 hp -0.07 0.01
3 wt -5.34 0.56
I'm aware that something similar is possible using a lapply approach, but find the apply family largely inscrutable. Is there a dplyr solution?
This isn't pure "dplyr", but rather, "dplyr" + "tidyr" + "data.table". Still, I think it should be pretty easily readable.
library(data.table)
library(dplyr)
library(tidyr)
mtcars %>%
gather(var, val, cyl:carb) %>%
as.data.table %>%
.[, as.list(summary(lm(mpg ~ val))$coefficients[2, 1:2]), by = var]
# var Estimate Std. Error
# 1: cyl -2.87579014 0.322408883
# 2: disp -0.04121512 0.004711833
# 3: hp -0.06822828 0.010119304
# 4: drat 7.67823260 1.506705108
# 5: wt -5.34447157 0.559101045
# 6: qsec 1.41212484 0.559210130
# 7: vs 7.94047619 1.632370025
# 8: am 7.24493927 1.764421632
# 9: gear 3.92333333 1.308130699
# 10: carb -2.05571870 0.568545640
If you really just wanted a few variables, start with a vector, not a data.frame
.
models <- c("cyl", "hp", "wt")
mtcars %>%
select_(.dots = c("mpg", models)) %>%
gather(var, val, -mpg) %>%
as.data.table %>%
.[, as.list(summary(lm(mpg ~ val))$coefficients[2, 1:2]), by = var]
# var Estimate Std. Error
# 1: cyl -2.87579014 0.3224089
# 2: hp -0.06822828 0.0101193
# 3: wt -5.34447157 0.5591010
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK