Bootstrap model effects (standardised coefficients) and optional SEM correlated errors.

bootEff(
mod,
R,
seed = NULL,
type = c("nonparametric", "parametric", "semiparametric"),
ran.eff = NULL,
cor.err = NULL,
catch.err = TRUE,
parallel = c("snow", "multicore", "no"),
ncpus = NULL,
cl = NULL,
bM.arg = NULL,
...
)

## Arguments

mod A fitted model object, or a list or nested list of such objects. Alternatively, a "psem" object from piecewiseSEM::psem(). If model lists are unnamed, response variable names will be used. Number of bootstrap resamples to generate. Seed for the random number generator. If not provided, a random five-digit integer is used (see Details). The type of bootstrapping to perform. Can be "nonparametric" (default), "parametric", or "semiparametric" (the last two currently only for mixed models, via bootMer()). For nonparametric bootstrapping of mixed models, the name of the (highest-level) random effect to resample (see Details). Optional, names of SEM correlated errors to be bootstrapped (ignored if mod is a "psem" object). Should be of the form: c("var1 ~~ var2", "var3 ~~ var4", ...) (spaces optional), using model/response variable names. Logical, should errors generated during model fitting or estimation be caught and NA returned for estimates? If FALSE, any such errors will cause the function to exit. The type of parallel processing to use. Can be one of "snow", "multicore", or "no" (for none). Number of system cores to use for parallel processing. If NULL (default), all available cores are used. Optional cluster to use if parallel = "snow". If NULL (default), a local cluster is created using the specified number of cores. A named list of any additional arguments to bootMer(). Arguments to stdEff().

## Value

An object of class "boot" containing the bootstrapped effects, or a (named) list/nested list of such objects.

## Details

bootEff() uses boot::boot() (primarily) to bootstrap standardised effects from a fitted model or list of models (calculated using stdEff()). Bootstrapping is typically nonparametric, i.e. model effects are calculated from data where the rows have been randomly sampled with replacement. 10,000 such resamples should provide accurate coverage for confidence intervals in most situations, with fewer sufficing in some cases. To ensure that data is resampled in the same way across individual bootstrap operations within the same run (e.g. models in a list), the same seed is set per operation, with the value saved as an attribute to the matrix of bootstrapped values (for reproducibility). The seed can either be user-supplied or a randomly-generated five-digit number (default), and is always re-initialised on exit (i.e. set.seed(NULL)).

Where weights are specified, bootstrapped effects will be a weighted average across the set of candidate models for each response variable, calculated after each model is first refit to the resampled dataset (specifying weights = "equal" will use a simple average instead – see avgEst()). If no weights are specified and mod is a nested list of models, the function will throw an error, as it will be expecting weights for a presumed model averaging scenario. If instead the user wishes to bootstrap each individual model, they should recursively apply the function using rMapply() (remember to set a seed).

Where names of response variables with correlated errors are specified to cor.err, the function will also return bootstrapped Pearson correlated errors (weighted residuals) for those models. If weights are supplied and mod is a nested list, residuals will first be averaged across candidate models. If any two models (or candidate sets) with correlated errors were fit to different subsets of data observations, both models/sets are first refit to data containing only the observations in common.

For nonparametric bootstrapping of mixed models, resampling should occur at the group-level, as individual observations are not independent. The name of the random effect to resample must be supplied to ran.eff. For nested random effects, this should be the highest-level group (Davison & Hinkley, 1997; Ren et al., 2010). This form of resampling will result in datasets of different sizes if observations are unbalanced across groups; however this should not generally be an issue, as the number of independent units (groups), and hence the 'degrees of freedom', remains unchanged.

For mixed models with non-nested random effects, nonparametric resampling will not be appropriate. In these cases, parametric or semiparametric bootstrapping can be performed instead via lme4::bootMer() (with additional arguments passed to that function as necessary). NOTE: As bootMer() takes only a fitted model as its first argument (i.e. no lists), any model averaging is calculated 'post-hoc' using the estimates in boot objects for each candidate model, rather than during the bootstrapping process itself (i.e. the default procedure via boot()). Results are then returned in a new boot object for each response variable or correlated error estimate.

If supplied a list containing both mixed and non-mixed models, bootEff() with nonparametric bootstrapping will still work and will treat all models as mixed models for resampling (with a warning). This is likely a relatively rare scenario, but may occur where the user decides that non-mixed models perform similarly and/or cause less fitting issues than their mixed counterparts for at least some response variables (e.g. where random effect variance estimates are at or near zero). The data will be resampled on the supplied random effect for all models. If nonparametric bootstrapping is not used in this scenario however, an error will occur, as bootMer() will only accept mixed models.

Parallel processing is used by default via the parallel package and option parallel = "snow" (and is generally recommended), but users can specify the type of parallel processing to use, or none. If "snow", a cluster of workers is created using makeCluster(), and the user can specify the number of system cores to incorporate in the cluster (defaults to all available). bootEff() then exports all required objects and functions to this cluster using clusterExport(), after performing a (rough) match of all objects and functions in the current global environment to those referenced in the model call(s). Users should load any required external packages prior to calling the function.

## Note

Bootstrapping mixed (or indeed any other) models may take a very long time when the number of replicates, observations, parameters, and/or models is high. To decrease processing time, it may be worth trying different optimisers and/or other options to generate faster estimates (always check results).

## References

Burnham, K. P., & Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (2nd ed.). Springer-Verlag. https://www.springer.com/gb/book/9780387953649

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application. Cambridge University Press.

Ren, S., Lai, H., Tong, W., Aminzadeh, M., Hou, X., & Lai, S. (2010). Nonparametric bootstrapping for hierarchical data. Journal of Applied Statistics, 37(9), 1487–1498. doi: 10/dvfzcn

## Examples

# Bootstrap Shipley SEM (test – 1 rep)
# (set 'site' as group for resampling – highest-level random effect)
bootEff(shipley.sem, R = 1, ran.eff = "site", parallel = "no")
#> Warning: Model failed to converge with max|grad| = 0.00475537 (tol = 0.002, component 1)#> $DD #> #> ORDINARY NONPARAMETRIC BOOTSTRAP #> #> #> Call: #> boot::boot(data = x, statistic = s, R = R, parallel = parallel, #> ncpus = nc, cl = cl) #> #> #> Bootstrap Statistics : #> original bias std. error #> t1* -0.05600661 0.01150877 NA #> t2* -0.68772025 -0.08480588 NA #> #>$Date
#>
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#>
#>
#> Call:
#> boot::boot(data = x, statistic = s, R = R, parallel = parallel,
#>     ncpus = nc, cl = cl)
#>
#>
#> Bootstrap Statistics :
#>        original       bias    std. error
#> t1* -0.01493651  0.002680086          NA
#> t2* -0.62813666 -0.015181882          NA
#>
#> $Growth #> #> ORDINARY NONPARAMETRIC BOOTSTRAP #> #> #> Call: #> boot::boot(data = x, statistic = s, R = R, parallel = parallel, #> ncpus = nc, cl = cl) #> #> #> Bootstrap Statistics : #> original bias std. error #> t1* -0.2917507 -0.14076466 NA #> t2* 0.3824224 0.01448792 NA #> #>$Live
#>
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#>
#>
#> Call:
#> boot::boot(data = x, statistic = s, R = R, parallel = parallel,
#>     ncpus = nc, cl = cl)
#>
#>
#> Bootstrap Statistics :
#>      original    bias    std. error
#> t1* 0.3105220 0.2188220          NA
#> t2* 0.3681961 0.1213242          NA
#>
# Check estimates (use saved boot object – 1000 reps)
lapply(shipley.sem.boot, "[[", 1)  # original
#> $DD #> (Intercept) lat #> -0.05600661 -0.68772025 #> #>$Date
#> (Intercept)          DD
#> -0.01493651 -0.62813666
#>
#> $Growth #> (Intercept) Date #> -0.2917507 0.3824224 #> #>$Live
#> (Intercept)      Growth
#>   0.3105220   0.3681961
#> lapply(shipley.sem.boot, function(i) head(i$t)) # bootstrapped #>$DD
#>      (Intercept)        lat
#> [1,]  0.07884187 -0.5481916
#> [2,] -0.09289124 -0.6231252
#> [3,] -0.20894499 -0.6936512
#> [4,] -0.13368868 -0.4139394
#> [5,] -0.10229480 -0.7023361
#> [6,] -0.01969415 -0.6358960
#>
#> $Date #> (Intercept) DD #> [1,] -0.11082251 -0.6196655 #> [2,] -0.12908598 -0.6238294 #> [3,] 0.01454823 -0.6291431 #> [4,] -0.03500544 -0.5212996 #> [5,] -0.04554961 -0.5774458 #> [6,] -0.02636631 -0.5555745 #> #>$Growth
#>      (Intercept)      Date
#> [1,]  -0.2900700 0.3256320
#> [2,]  -0.2223630 0.3988001
#> [3,]  -0.3721685 0.4173523
#> [4,]  -0.2221964 0.3551045
#> [5,]  -0.1850172 0.4199339
#> [6,]  -0.3353260 0.3531018
#>
#> \$Live
#>      (Intercept)    Growth
#> [1,]   1.0030671 0.8118620
#> [2,]   0.3839991 0.3822085
#> [3,]   0.4223174 0.4173702
#> [4,]   0.3149438 0.3793002
#> [5,]   0.6149997 0.5659556
#> [6,]   0.6737008 0.5710483
#>