Calculate fully standardised effects (model coefficients) in standard deviation units, adjusted for multicollinearity.
Usage
stdEff(
mod,
weights = NULL,
data = NULL,
term.names = NULL,
unique.eff = TRUE,
cen.x = TRUE,
cen.y = TRUE,
std.x = TRUE,
std.y = TRUE,
refit.x = TRUE,
incl.raw = FALSE,
R.squared = FALSE,
R2.arg = NULL,
env = NULL
)
Arguments
- mod
A fitted model object, or a list or nested list of such objects.
- weights
An optional numeric vector of weights to use for model averaging, or a named list of such vectors. The former should be supplied when
mod
is a list, and the latter when it is a nested list (with matching list names). If set to"equal"
, a simple average is calculated instead.- data
An optional dataset, used to first refit the model(s).
- term.names
An optional vector of names used to extract and/or sort effects from the output.
- unique.eff
Logical, whether unique effects should be calculated (adjusted for multicollinearity among predictors).
- cen.x, cen.y
Logical, whether effects should be calculated as if from mean-centred variables.
- std.x, std.y
Logical, whether effects should be scaled by the standard deviations of variables.
- refit.x
Logical, whether the model should be refit with mean-centred predictor variables (see Details).
- incl.raw
Logical, whether to append the raw (unstandardised) effects to the output.
- R.squared
Logical, whether R-squared values should also be calculated (via
R2()
).- R2.arg
A named list of additional arguments to
R2()
(where applicable), excepting argumentenv
. Ignored ifR.squared = FALSE
.- env
Environment in which to look for model data (if none supplied). Defaults to the
formula()
environment.
Details
stdEff()
will calculate fully standardised effects (coefficients)
in standard deviation units for a fitted model or list of models. It
achieves this via adjusting the 'raw' model coefficients, so no
standardisation of input variables is required beforehand. Users can simply
specify the model with all variables in their original units and the
function will do the rest. However, the user is free to scale and/or centre
any input variables should they choose, which should not affect the outcome
of standardisation (provided any scaling is by standard deviations). This
may be desirable in some cases, such as to increase numerical stability
during model fitting when variables are on widely different scales.
If arguments cen.x
or cen.y
are TRUE
, effects will be calculated as
if all predictors (x) and/or the response variable (y) were mean-centred
prior to model-fitting (including any dummy variables arising from
categorical predictors). Thus, for an ordinary linear model where centring
of x and y is specified, the intercept will be zero – the mean (or weighted
mean) of y. In addition, if cen.x = TRUE
and there are interacting terms
in the model, all effects for lower order terms of the interaction are
adjusted using an expression which ensures that each main effect or lower
order term is estimated at the mean values of the terms they interact with
(zero in a 'centred' model) – typically improving the interpretation of
effects. The expression used comprises a weighted sum of all the effects
that contain the lower order term, with the weight for the term itself
being zero and those for 'containing' terms being the product of the means
of the other variables involved in that term (i.e. those not in the lower
order term itself). For example, for a three-way interaction (x1 * x2 *
x3), the expression for main effect \(\beta1\) would be:
$$\beta_{1} + \beta_{12}\bar{x}_{2} + \beta_{13}\bar{x}_{3} + \beta_{123}\bar{x}_{2}\bar{x}_{3}$$ (adapted from here)
In addition, if std.x = TRUE
or unique.eff = TRUE
(see below), product
terms for interactive effects will be recalculated using mean-centred
variables, to ensure that standard deviations and variance inflation
factors (VIF) for predictors are calculated correctly (the model must be
refit for this latter purpose, to recalculate the variance-covariance
matrix).
If std.x = TRUE
, effects are scaled by multiplying by the standard
deviations of predictor variables (or terms), while if std.y = TRUE
they
are divided by the standard deviation of the response variable (minus any
offsets). If the model is a GLM, this latter is calculated using the
link-transformed response (or an estimate of same) generated using the
function glt()
. If both arguments are true, the effects are regarded as
'fully' standardised in the traditional sense, often referred to as
'betas'.
If unique.eff = TRUE
(default), effects are adjusted for
multicollinearity among predictors by dividing by the square root of the
VIFs (Dudgeon, 2016; Thompson et al., 2017; RVIF()
). If they have also
been scaled by the standard deviations of x and y, this converts them to
semipartial correlations, i.e. the correlation between the unique
components of predictors (residualised on other predictors) and the
response variable. This measure of effect size is arguably much more
interpretable and useful than the traditional standardised coefficient, as
it always represents the unique effects of predictors and so can more
readily be compared both within and across models. Values range from zero
to +/- one rather than +/- infinity (as in the case of betas) – putting
them on the same scale as the bivariate correlation between predictor and
response. In the case of GLMs however, the measure is analogous but not
exactly equal to the semipartial correlation, so its values may not always
be bound between +/- one (such cases are likely rare). Importantly, for
ordinary linear models, the square of the semipartial correlation equals
the increase in R-squared when that variable is included last in the model
– directly linking the measure to unique variance explained. See
here
for additional arguments in favour of the use of semipartial correlations.
If refit.x
, cen.x
, and unique.eff
are TRUE
and there are
interaction terms in the model, the model will be refit with any
(newly-)centred continuous predictors, in order to calculate correct VIFs
from the variance-covariance matrix. However, refitting may not be
necessary in some circumstances, for example where predictors have already
been mean-centred, and whose values will not subsequently be resampled
(e.g. parametric bootstrap). Setting refit.x = FALSE
in such cases will
save time, especially with larger/more complex models and/or bootstrap
runs.
If incl.raw = TRUE
, raw (unstandardised) effects can also be appended,
i.e. those with all centring and scaling options set to FALSE
(though
still adjusted for multicollinearity, where applicable). These may be of
interest in some cases, for example to compare their bootstrapped
distributions with those of standardised effects.
If R.squared = TRUE
, model R-squared values are appended to effects via
the R2()
function, with any additional arguments passed via R2.arg
.
Finally, if weights
are specified, the function calculates a weighted
average of standardised effects across a set (or sets) of different
candidate models for a particular response variable(s) (Burnham & Anderson,
2002), via the avgEst()
function.
References
Burnham, K. P., & Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (2nd ed.). Springer-Verlag. https://link.springer.com/book/10.1007/b97636
Dudgeon, P. (2016). A Comparative Investigation of Confidence Intervals for Independent Variables in Linear Regression. Multivariate Behavioral Research, 51(2-3), 139-153. doi:10/gfww3f
Thompson, C. G., Kim, R. S., Aloe, A. M., & Becker, B. J. (2017). Extracting the Variance Inflation Factor and Other Multicollinearity Diagnostics from Typical Regression Results. Basic and Applied Social Psychology, 39(2), 81-90. doi:10/gfww2w
Examples
library(lme4)
#> Loading required package: Matrix
#>
#> Attaching package: ‘lme4’
#> The following object is masked from ‘package:semEff’:
#>
#> getData
# Standardised (direct) effects for SEM
m <- shipley.sem
stdEff(m)
#> $DD
#> (Intercept) lat
#> -0.05600661 -0.68772025
#>
#> $Date
#> (Intercept) DD
#> -0.01493651 -0.62813666
#>
#> $Growth
#> (Intercept) Date
#> -0.2917507 0.3824224
#>
#> $Live
#> (Intercept) Growth
#> 0.3105220 0.3681961
#>
stdEff(m, cen.y = FALSE, std.y = FALSE) # x-only
#> $DD
#> (Intercept) lat
#> 143.183138 -7.066909
#>
#> $Date
#> (Intercept) DD
#> 126.79703 -5.11375
#>
#> $Growth
#> (Intercept) Date
#> 48.955514 2.448161
#>
#> $Live
#> (Intercept) Growth
#> 5.452982 2.226862
#>
stdEff(m, std.x = FALSE, std.y = FALSE) # centred only
#> $DD
#> (Intercept) lat
#> -0.5755154 -0.8354729
#>
#> $Date
#> (Intercept) DD
#> -0.1216002 -0.4976475
#>
#> $Growth
#> (Intercept) Date
#> -1.8677067 0.3007147
#>
#> $Live
#> (Intercept) Growth
#> 1.8780467 0.3478536
#>
stdEff(m, cen.x = FALSE, cen.y = FALSE) # scaled only
#> $DD
#> (Intercept) lat
#> 19.1373370 -0.6877202
#>
#> $Date
#> (Intercept) DD
#> 24.3624481 -0.6281367
#>
#> $Growth
#> (Intercept) Date
#> 1.6853620 0.3824224
#>
#> $Live
#> (Intercept) Growth
#> -2.0214940 0.3681961
#>
stdEff(m, unique.eff = FALSE) # include multicollinearity
#> $DD
#> (Intercept) lat
#> -0.05600661 -0.68772025
#>
#> $Date
#> (Intercept) DD
#> -0.01493651 -0.62813666
#>
#> $Growth
#> (Intercept) Date
#> -0.2917507 0.3824224
#>
#> $Live
#> (Intercept) Growth
#> 0.3105220 0.3681961
#>
stdEff(m, R.squared = TRUE) # add R-squared
#> $DD
#> (Intercept) lat (R.squared) (R.squared.adj)
#> -0.05600661 -0.68772025 0.70800826 0.70747906
#> (R.squared.pred)
#> 0.68356361
#>
#> $Date
#> (Intercept) DD (R.squared) (R.squared.adj)
#> -0.01493651 -0.62813666 0.98555565 0.98553510
#> (R.squared.pred)
#> 0.98202414
#>
#> $Growth
#> (Intercept) Date (R.squared) (R.squared.adj)
#> -0.2917507 0.3824224 0.7938367 0.7934879
#> (R.squared.pred)
#> 0.7552854
#>
#> $Live
#> (Intercept) Growth (R.squared) (R.squared.adj)
#> 0.3105220 0.3681961 0.2668652 0.2655971
#> (R.squared.pred)
#> NA
#>
stdEff(m, incl.raw = TRUE) # add unstandardised
#> $DD
#> (Intercept) lat (raw)_(Intercept) (raw)_lat
#> -0.05600661 -0.68772025 196.65237838 -0.83547294
#>
#> $Date
#> (Intercept) DD (raw)_(Intercept) (raw)_DD
#> -0.01493651 -0.62813666 198.33816379 -0.49764747
#>
#> $Growth
#> (Intercept) Date (raw)_(Intercept) (raw)_Date
#> -0.2917507 0.3824224 10.7892162 0.3007147
#>
#> $Live
#> (Intercept) Growth (raw)_(Intercept) (raw)_Growth
#> 0.3105220 0.3681961 -12.2260588 0.3478536
#>
# Demonstrate equality with effects from manually-standardised variables
# (gaussian models only)
m <- shipley.growth[[3]]
d <- data.frame(scale(na.omit(shipley)))
e1 <- stdEff(m, unique.eff = FALSE)
e2 <- coef(summary(update(m, data = d)))[, 1]
stopifnot(all.equal(e1, e2))
# Demonstrate equality with square root of increment in R-squared
# (ordinary linear models only)
m <- lm(Growth ~ Date + DD + lat, data = shipley)
r2 <- summary(m)$r.squared
e1 <- stdEff(m)[-1]
en <- names(e1)
e2 <- sapply(en, function(i) {
f <- reformulate(en[!en %in% i])
r2i <- summary(update(m, f))$r.squared
sqrt(r2 - r2i)
})
stopifnot(all.equal(e1, e2))
# Model-averaged standardised effects
m <- shipley.growth # candidate models
w <- runif(length(m), 0, 1) # weights
stdEff(m, w)
#> (Intercept) DD Date lat
#> -0.294231350 -0.003638724 0.248803366 -0.039021926