semEff • semEff

semEff automates calculation of effects for structural equation models (SEM) which use local estimation (‘piecewise’; see Lefcheck (2016)). Here’s a short example using an SEM testing fire impacts on plant species richness (Grace & Keeley (2006); data provided with the piecewiseSEM package).

# install.packages(c("semEff", "piecewiseSEM"))
library(semEff)
data(keeley, package = "piecewiseSEM")

Let’s have a look at the data. Observations represent a range of vegetation parameters measured in 90 post-fire study plots after wildfires in southern California in 1993:

knitr::kable(head(keeley))

distance	elev	abiotic	age	hetero	firesev	cover	rich
53.40900	1225	60.67103	40	0.757065	3.50	1.0387974	51
37.03745	60	40.94291	25	0.491340	4.05	0.4775924	31
53.69565	200	50.98805	15	0.844485	2.60	0.9489357	71
53.69565	200	61.15633	15	0.690847	2.90	1.1949002	64
51.95985	970	46.66807	23	0.545628	4.30	1.2981890	68
51.95985	970	39.82357	24	0.652895	4.00	1.1734866	34

An SEM fit to these data in Grace & Keeley (2006) tests the influence of landscape vs. local factors in the response of plants to fire, or more specifically, the direct vs. indirect effects of landscape location (distance from coast) on plant species richness via other biotic and abiotic mediators (see the paper for further details). We can reproduce this SEM (roughly) in a piecewise fashion by fitting a separate linear model for each observed response variable:

keeley.sem <- list(
  lm(age ~ distance, data = keeley),
  lm(hetero ~ distance, data = keeley),
  lm(abiotic ~ distance, data = keeley),
  lm(firesev ~ age, data = keeley),
  lm(cover ~ firesev, data = keeley),
  lm(rich ~ distance + hetero + abiotic + cover, data = keeley)
)

And visualise it using piecewiseSEM:::plot.psem() (wrapper for DiagrammeR::render_graph()), which helps to clarify the directions of causal pathways in the system:

piecewiseSEM:::plot.psem(
  piecewiseSEM::as.psem(keeley.sem),
  node_attrs = data.frame(shape = "rectangle", color = "black", fillcolor = "grey"),
  layout = "tree"
)

Now, to calculate effects, we first bootstrap and save the standardised direct effects:

system.time(
  keeley.sem.boot <- bootEff(keeley.sem, R = 100, seed = 13, parallel = "no")
)
#>    user  system elapsed 
#>   4.394   0.037   4.431

Note that here we use only 100 reps to save time, where typically we should use a lot more for greater accuracy of confidence intervals (e.g. 1,000–10,000+). Using the bootstrapped estimates we can now calculate direct, indirect, and total effects:

(keeley.sem.eff <- semEff(keeley.sem.boot))
#> 
#> Piecewise SEM with:
#>   * 1 exogenous vs. 6 endogenous variable(s)
#>   * 9 direct vs. 8 indirect effect(s)
#> 
#> Variables:
#>             Category   Predictor Mediator Response   Dir. Eff. Ind. Eff.  
#>             --------   --------- -------- --------   --------- ---------  
#>  distance | Exog.    |     x        -        -     |         -         - |
#>  age      | Endog.   |     x        x        x     |         1         0 |
#>  hetero   | Endog.   |     x        x        x     |         1         0 |
#>  abiotic  | Endog.   |     x        x        x     |         1         0 |
#>  firesev  | Endog.   |     x        x        x     |         1         1 |
#>  cover    | Endog.   |     x        x        x     |         1         2 |
#>  rich     | Endog.   |     -        -        x     |         4         5 |
#> 
#> Use summary() for effects and confidence intervals for endogenous variables.
#> See ?getEff() for extracting (unformatted) effects.

Let’s examine effects for the final response variable, plant species richness, which has four direct and five indirect paths leading to it:

summary(keeley.sem.eff, response = "rich")
#> 
#> SEM direct, summed indirect, total, and mediator effects:
#> 
#> Response 'rich' (6/6):
#>                        Effect     Bias   Std. Err.   Lower CI Upper CI    
#>                        ------   ------   ---------   -------- --------    
#>  DIRECT:    distance |  0.233 |  0.016 |     0.065 |    0.089    0.351 | *
#>             hetero   |  0.301 | -0.009 |     0.072 |    0.168    0.447 | *
#>             abiotic  |  0.219 | -0.014 |     0.078 |    0.080    0.401 | *
#>             cover    |  0.267 | -0.015 |     0.082 |    0.085    0.446 | *
#>                      |        |        |           |                   |  
#>  INDIRECT:  distance |  0.220 | -0.012 |     0.047 |    0.139    0.349 | *
#>             age      | -0.053 |  0.002 |     0.025 |   -0.145   -0.012 | *
#>             firesev  | -0.117 |  0.007 |     0.045 |   -0.224   -0.029 | *
#>                      |        |        |           |                   |  
#>  TOTAL:     distance |  0.453 |  0.004 |     0.045 |    0.340    0.524 | *
#>             age      | -0.053 |  0.002 |     0.025 |   -0.145   -0.012 | *
#>             hetero   |  0.301 | -0.009 |     0.072 |    0.168    0.447 | *
#>             abiotic  |  0.219 | -0.014 |     0.078 |    0.080    0.401 | *
#>             firesev  | -0.117 |  0.007 |     0.045 |   -0.224   -0.029 | *
#>             cover    |  0.267 | -0.015 |     0.082 |    0.085    0.446 | *
#>                      |        |        |           |                   |  
#>  MEDIATORS: age      |  0.015 | -0.001 |     0.009 |    0.004    0.056 | *
#>             hetero   |  0.104 | -0.003 |     0.031 |    0.046    0.186 | *
#>             abiotic  |  0.101 | -0.008 |     0.037 |    0.032    0.184 | *
#>             firesev  | -0.038 |  0.001 |     0.017 |   -0.083   -0.008 | *
#>             cover    | -0.155 |  0.008 |     0.061 |   -0.302   -0.037 | *

We can see that the standardised total effect of distance from coast on plant species richness is moderately high at 0.453 (CIs: 0.373, 0.526), and is split almost equally between direct (0.233) and indirect (0.220) effects (involving five mediators). The mediator effects listed here are the sum of all the indirect paths from all predictors which involve each mediator, and can be useful as a rough indicator of the overall importance and net direction of effect of each. hetero, abiotic, and cover appear to be relatively important, with cover having the strongest influence (-0.155). However, this considers all predictors in the system collectively – meaning that mediators are themselves counted as predictors. To focus only on the exogenous variable distance, let’s recalculate:

summary(
  semEff(keeley.sem.boot, predictor = "distance"),
  response = "rich"
)
#> 
#> SEM direct, summed indirect, total, and mediator effects:
#> 
#> Response 'rich' (6/6):
#>                        Effect     Bias   Std. Err.   Lower CI Upper CI    
#>                        ------   ------   ---------   -------- --------    
#>  DIRECT:    distance |  0.233 |  0.016 |     0.065 |    0.089    0.351 | *
#>                      |        |        |           |                   |  
#>  INDIRECT:  distance |  0.220 | -0.012 |     0.047 |    0.139    0.349 | *
#>                      |        |        |           |                   |  
#>  TOTAL:     distance |  0.453 |  0.004 |     0.045 |    0.340    0.524 | *
#>                      |        |        |           |                   |  
#>  MEDIATORS: age      |  0.015 | -0.001 |     0.009 |    0.004    0.056 | *
#>             hetero   |  0.104 | -0.003 |     0.031 |    0.046    0.186 | *
#>             abiotic  |  0.101 | -0.008 |     0.037 |    0.032    0.184 | *
#>             firesev  |  0.015 | -0.001 |     0.009 |    0.004    0.056 | *
#>             cover    |  0.015 | -0.001 |     0.009 |    0.004    0.056 | *

Now we can see that hetero and abiotic are the important mediators for distance, while the relative influence of cover is much lower – weakened due to being part of a single longer chain of effect (involving age and firesev). The total indirect effect of 0.220 can thus be broken down into three individual paths:

distance -> age -> firesev -> cover -> rich (0.015)
distance -> hetero -> rich (0.104)
distance -> abiotic -> rich (0.101)

This is only one potential way of analysing paths in this SEM. Depending on the size of the system and the particular research questions, any number of predictors, mediators and/or response variables can be investigated separately or collectively for direct vs. indirect effects. If there are multiple hypotheses (competing or complementary), these can all be tested in a single SEM by comparing the relative importance of different variables and pathways of effect.

NOTE: All effects presented here are standardised unique effects (i.e. adjusted for multicollinearity, a.k.a. semipartial correlations), which is the only way to fully partition effects in the system. These will usually be a bit smaller than the unadjusted standardised coefficients, since most variables are correlated to some degree. If there are large differences between the two, consideration should be given to the impact (and relevance) of multicollinearity in the system. In this particular case, it’s minimal, with standard errors of coefficients less than twice as large as they should be (checked using RVIF() for the species richness model):

RVIF(keeley.sem[[6]])
#> distance   hetero  abiotic    cover 
#> 1.214376 1.122211 1.139111 1.074784

To use original standardised coefficients instead, specify unique.eff = FALSE, which can be passed to stdEff(), bootEff(), and/or semEff(). These can be preferred if prediction is the primary goal rather than inference, for comparison with the unique effects or with other standardised coefficients, or for other reasons.

References

Grace, J. B., & Keeley, J. E. (2006). A structural equation model analysis of postfire plant diversity in California shrublands. Ecological Applications: A Publication of the Ecological Society of America, 16(2), 503–514. https://doi.org/10.1890/1051-0761(2006)016%5B0503:ASEMAO%5D2.0.CO;2

Lefcheck, J. S. (2016). piecewiseSEM: Piecewise structural equation modelling in R for ecology, evolution, and systematics. Methods in Ecology and Evolution, 7(5), 573–579. https://doi.org/10.1111/2041-210X.12512