An orderly approach to regression models, ideally with dplyr

 3 years ago
source link: https://www.codesd.com/item/an-orderly-approach-to-regression-models-ideally-with-dplyr.html

Reading the documentation for do() in dplyr, I've been impressed by the ability to fit a regression model for each group of data, and I was wondering whether the same approach could instead fit one model per independent variable.

So far I've tried

require(dplyr)
data(mtcars)

models <- data.frame(var = c("cyl", "hp", "wt"))

models <- models %>% do(mod = lm(mpg ~ as.name(var), data = mtcars))
Error in as.vector(x, "symbol") :
  cannot coerce type 'closure' to vector of type 'symbol'

models <- models %>% do(mod = lm(substitute(mpg ~ i, as.name(.$var)), data = mtcars))
Error in substitute(mpg ~ i, as.name(.$var)) :
  invalid environment specified
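For reference, both errors share a root cause: lm() never receives a formula with the column name substituted in. In the first attempt, as.name(var) stays unevaluated inside the formula. When lm() builds the model frame, var is not a column of mtcars, so R finds the base function stats::var (a closure), and as.name() cannot turn a function into a symbol. In the second attempt, substitute() expects a list or environment as its second argument, not a symbol. A minimal sketch of the usual fix builds the formula from the character name with base R's reformulate(). The name var_name is illustrative only. Note that reformulate() requires a character vector, so a factor column (the data.frame() default before R 4.0) would need as.character() first.

```r
# Build "mpg ~ cyl" from a character string instead of as.name()/substitute().
# var_name is a hypothetical illustrative name, not from the original code.
var_name <- "cyl"
f <- reformulate(var_name, response = "mpg")  # stats::reformulate, base R
print(f)

fit <- lm(f, data = mtcars)
# The slope coefficient is named after the predictor column.
slope <- coef(fit)[[var_name]]
slope
```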

The desired final output would be something like

  var slope standard_error_slope
1 cyl -2.87                 0.32
2  hp -0.07                 0.01
3  wt -5.34                 0.56

I'm aware that something similar is possible with an lapply approach, but I find the apply family largely inscrutable. Is there a dplyr solution?


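This isn't pure "dplyr", but rather "dplyr" + "tidyr" + "data.table". Still, I think it is fairly readable: gather() reshapes the predictors into long format, then data.table fits one regression per variable with by = var and keeps row 2 (the slope) of each coefficient matrix.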

library(data.table)
library(dplyr)
library(tidyr)

mtcars %>%
  gather(var, val, cyl:carb) %>%
  as.data.table %>%
  .[, as.list(summary(lm(mpg ~ val))$coefficients[2, 1:2]), by = var]
#      var    Estimate  Std. Error
#  1:  cyl -2.87579014 0.322408883
#  2: disp -0.04121512 0.004711833
#  3:   hp -0.06822828 0.010119304
#  4: drat  7.67823260 1.506705108
#  5:   wt -5.34447157 0.559101045
#  6: qsec  1.41212484 0.559210130
#  7:   vs  7.94047619 1.632370025
#  8:   am  7.24493927 1.764421632
#  9: gear  3.92333333 1.308130699
# 10: carb -2.05571870 0.568545640
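Since the question asks for a dplyr solution, here is a sketch that drops data.table and does the per-variable fit with group_by() and summarise(). It assumes a reasonably recent dplyr and tidyr, where lm(mpg ~ val) inside summarise() sees only the current group's columns. It fits each model twice for readability. The column names slope and standard_error_slope are borrowed from the question's desired output.

```r
library(dplyr)
library(tidyr)

# Same long-format reshape as above, then one regression per predictor using
# dplyr's group_by()/summarise() instead of data.table's `by =`.
slopes <- mtcars %>%
  gather(var, val, cyl:carb) %>%
  group_by(var) %>%
  summarise(
    # Row 2 of the coefficient matrix is the slope on `val` (row 1 is the intercept).
    slope = coef(summary(lm(mpg ~ val)))[2, "Estimate"],
    standard_error_slope = coef(summary(lm(mpg ~ val)))[2, "Std. Error"]
  )

slopes
```

The result is a tibble with one row per predictor, ordered by var.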

If you really just wanted a few variables, start with a vector, not a data.frame.

models <- c("cyl", "hp", "wt")

mtcars %>%
  # select_() is deprecated in current dplyr; all_of() selects columns named in a character vector
  select(all_of(c("mpg", models))) %>%
  gather(var, val, -mpg) %>%
  as.data.table %>%
  .[, as.list(summary(lm(mpg ~ val))$coefficients[2, 1:2]), by = var]
#    var    Estimate Std. Error
# 1: cyl -2.87579014  0.3224089
# 2:  hp -0.06822828  0.0101193
# 3:  wt -5.34447157  0.5591010
