The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
Many abiotic and ecological processes are a result of variables
operating at multiple scales. Such multiscale ecological processes can
be challenging to quantify, especially when the surrounding landscape
contributes to the process. The extent and magnitude of contribution
from the surrounding landscape is often referred to as the ‘scale of
effect’. Determining the scales of effect has long been the focus of
ecologists seeking to address multiscale ecological questions, but until
recently, methods to rigorously and can identify such scales have been
lacking. With advanced coding skills, Bayesian analytical frameworks can
be used (e.g., Stuber & Gruber 2020; Amirkhiz et al. 2023). More
recently, user-friendly methods, implemented in R, have been developed
(Siland
: Carpentier & Martin 2021;
Scalescape
: Lowe et al. 2022). Both of these packages, like
multiScaleR
, estimate distance-weighted landscape effects
(i.e. scales of effect). All take a similar approach to achieving this
goal, but each works with different model classes and have different
functions available to users.
multiScaleR
is fully supported in handling models of
class glm
, glm.nb
, nlme
(including gls
), lme4
, and
unmarked
. It should work with any model class that has an
update
function and that is a supported class in the R
package insight. This
package has been developed to provide an efficient modeling workflow,
facilitating scaling of rasters and spatial projections of optimized
models. Let me know if you encounter issues or would like to see support
for other model classes.
References
Amirkhiz, R. G., R. John, and D. L. Swanson. 2023. A Bayesian approach for multiscale modeling of the influence of seasonal and annual habitat variation on relative abundance of ring-necked pheasant roosters. Ecological Informatics 75:102003.
Carpentier, F., and O. Martin. 2021. Siland a R package for estimating the spatial influence of landscape. Scientific Reports 11:7488.
Lowe, E. B., B. Iuliano, C. Gratton, and A. R. Ives. 2022. ‘Scalescape’: an R package for estimating distance-weighted landscape effects on an environmental response. Landscape Ecology 37:1771–1785.
Stuber, E. F., and L. F. Gruber. 2020. Recent methodological solutions to identifying scales of effect in multi-scale modeling. Current Landscape Ecology Reports 5:127–139.
There are many different kernel transformations that could
be used to quantify the contribution of environmental variables as a
function of distance. Four kernel functions are implemented in
multiScaleR
. Experience with simulated data suggests that
the Exponential Power kernel, while flexible, is prone to over fitting
and optimization often fails to converge. The Gaussian kernel is the
default with multiScaleR
. Note: Figures in this document
rendered poorly. Re-running code on your local machine will provide much
better results!
Gaussian (gaus
) – Decay in space governed by a
single parameter (sigma)
Negative Exponential (exp
) – Decay in space governed
by a single parameter (sigma)
Exponential Power (expow
) – Decay in space governed
by a scale parameter (sigma), and shape parameter (beta)
Fixed width buffer (fixed
) – Effect does not decay
with distance
Prior to using multiScaleR
to identify distance-weighted
scales of effect, data must be appropriately formatted. These steps will
be demonstrated with sample, simulated data provided with the
package.
library(multiScaleR)
## Read in data
data("landscape_counts")
dat <- landscape_counts
data("surv_pts")
pts <- vect(surv_pts)
land_rast <- terra::rast(system.file("extdata",
"landscape.tif",
package = 'multiScaleR'))
The landscape_counts
data frame contains simulated
counts from 100 spatial point locations as well as a scaled and centered
site-level covariate (‘site’). The landscape.tif
file is a
spatRaster
object consisting of three surfaces (land1 =
binary habitat; land2 = continuous habitat / environment; land3 =
continuous / environment). The counts were simulated using the following
parameters:
Poisson process
site
land1
Effect –> -0.5
Gaussian sigma –> 250
land2
Effect –> 0.7
Gaussian sigma –> 500
land3
## counts site
## Min. : 0 Min. :-1.7129
## 1st Qu.: 0 1st Qu.:-0.6480
## Median : 1 Median :-0.1697
## Mean : 2 Mean : 0.0000
## 3rd Qu.: 3 3rd Qu.: 0.5189
## Max. :10 Max. : 2.9984
## class : SpatVector
## geometry : points
## dimensions : 100, 1 (geometries, attributes)
## extent : 755.3902, 3552.415, 755.3902, 3552.415 (xmin, xmax, ymin, ymax)
## coord. ref. :
## names : obs
## type : <int>
## values : 0
## 3
## 7
## class : SpatRaster
## size : 200, 200, 3 (nrow, ncol, nlyr)
## resolution : 20, 20 (x, y)
## extent : 0, 4000, 0, 4000 (xmin, xmax, ymin, ymax)
## coord. ref. :
## source : landscape.tif
## names : land1, land2, land3
## min values : 0, 0, 0
## max values : 1, 1, 1
kernel_prep
To begin an analysis, the kernel_prep
function must be
run with the spatRaster
layers and spatial point data.
Additionally, it is necessary to specify the maximum distance (in raster
map units) that you want to consider in the analysis. The greater the
distance considered, the more computationally intensive the analysis
will be, so some consideration and discretion is needed. If the maximum
distance appears to be constraining the optimization results, you will
receive a warning.
kernel_inputs <- kernel_prep(pts = pts,
raster_stack = land_rast,
max_D = 1700,
kernel = 'gaussian',
verbose = FALSE)
kernel_inputs
##
## There are 100 observations at 3 spatial covariate(s):
## land1 land2 land3
##
## The specified kernel is:
## gaussian
##
## Number of elements:
## 13111
## Minimum Distance:
## 20
## Maximum Distance:
## 1700
## Unit Conversion:
## 1700
Next, you need to fit the preliminary model that will serve as the
starting point for optimizing scales of effect. We will pull raster
values from the newly created kernel_inputs
then fit a GLM.
These raster value are scaled and centered weighted mean values for each
layer, at each point.
## 'data.frame': 100 obs. of 5 variables:
## $ counts: int 0 2 7 3 2 0 2 1 1 1 ...
## $ site : num 0.992 1.992 0.3 -0.623 1.891 ...
## $ land1 : num 0.959 0.565 -0.132 -0.73 1.238 ...
## $ land2 : num -0.15 0.726 0.513 0.623 0.958 ...
## $ land3 : num 0.6937 0.9235 0.455 -0.0789 1.0492 ...
Fit Poisson GLM
##
## Call:
## glm(formula = counts ~ site + land1 + land2 + land3, family = poisson(),
## data = df)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.52794 0.08307 6.355 2.08e-10 ***
## site -0.02191 0.11116 -0.197 0.844
## land1 -0.38930 0.08881 -4.383 1.17e-05 ***
## land2 0.77890 0.14288 5.451 5.00e-08 ***
## land3 0.09565 0.13016 0.735 0.462
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 224.24 on 99 degrees of freedom
## Residual deviance: 165.40 on 95 degrees of freedom
## AIC: 372.08
##
## Number of Fisher Scoring iterations: 5
multiScale_optim
We are now ready to use the multiScale_optim
function.
Note: for purposes of this vignette, the function is not being run in
parallel. Optimization will be much quicker when parallelized by
specifying the number of cores to use with n_cores
. The
optimization below takes ~50 seconds to complete.
We get some helpful warning messages indicating the estimated scale
of effect for one of our raster variables exceeds the max_D
threshold we specified with the kernel_prep
function and we
get a suggestion of how to correct this. We also get a warning that the
precision (standard error) for one or more sigma terms is large relative
to the mean estimate. Let’s first take a look at our results.
##
## Call:
## multiScale_optim(fitted_mod = mod0, kernel_inputs = kernel_inputs)
##
##
## Kernel used:
## gaussian
##
## ***** Optimized Scale of Effect -- Sigma *****
##
## Mean SE 2.5% 97.5%
## land1 248.1714 34.19032 180.2951 316.0478
## land2 538.4724 50.76299 437.6951 639.2496
## land3 538.9632 309.97857 20.0000 1154.3485
##
##
## ====================================
##
## ***** Optimized Scale of Effect -- Distance *****
## 90% Kernel Weight
##
## Mean 2.5% 97.5%
## land1 408.21 296.56 519.85
## land2 885.71 719.94 1051.47
## land3 886.52 32.90 1898.73
##
##
## ====================================
##
## ***** Fitted Model Summary *****
##
##
## Call:
## glm(formula = counts ~ site + land1 + land2 + land3, family = poisson(),
## data = dat)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.37648 0.09260 4.066 4.79e-05 ***
## site 0.06417 0.09891 0.649 0.5165
## land1 -0.50803 0.08013 -6.340 2.30e-10 ***
## land2 0.75707 0.09939 7.617 2.60e-14 ***
## land3 0.18076 0.09994 1.809 0.0705 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 224.24 on 99 degrees of freedom
## Residual deviance: 106.99 on 95 degrees of freedom
## AIC: 313.66
##
## Number of Fisher Scoring iterations: 5
##
##
## WARNING!!!
## The estimated scale of effect extends beyond the maximum distance specified.
## Consider increasing max_D in `kernel_prep` to ensure accurate estimation of scale.
##
##
## WARNING!!!
## The standard error of one or more `sigma` estimates is >= 50% of the estimated mean value.
## Carefully assess whether or not this variable is meaningful in your analysis and interpret with caution.
At the bottom of the output is the standard GLM
model
summary output. We see that site
appears to have no effect,
land1
a negative effect, land2
a positive
effect and land3
a weak positive effect on the observed
counts. At the top of the summary output are the estimated sigma terms
related to the Gaussian kernel for each raster layer. The standard error
and 95% confidence intervals are also reported. Below this, the actual
distance effect is calculated. By default, the distance that encompasses
90% of the kernel weight is identified. From these summaries, we see
that land3
is the variable that may have a larger scale of
effect than we anticipated, but that it is also being imprecisely
estimated. This may be a red flag that the variable is not relevant to
the analysis and/or should not be scaled.
We could update max_D
by re-running
kernel_prep
, but we won’t do that here. One thing to be
aware of when optimizing scales of effect is that it is possible to have
overfit models with parameters that don’t have a strong relationship
with your response. With this simulated data, we know that
land3
actually has no effect. Let’s fit another model
without this variable.
No warnings! Let’s look at our results.
##
## Call:
## multiScale_optim(fitted_mod = mod0_2, kernel_inputs = kernel_inputs)
##
##
## Kernel used:
## gaussian
##
## ***** Optimized Scale of Effect -- Sigma *****
##
## Mean SE 2.5% 97.5%
## land1 238.4505 33.14563 172.6569 304.2441
## land2 499.5337 47.47022 405.3061 593.7614
##
##
## ====================================
##
## ***** Optimized Scale of Effect -- Distance *****
## 90% Kernel Weight
##
## Mean 2.5% 97.5%
## land1 392.22 284.00 500.44
## land2 821.66 666.67 976.65
##
##
## ====================================
##
## ***** Fitted Model Summary *****
##
##
## Call:
## glm(formula = counts ~ site + land1 + land2, family = poisson(),
## data = dat)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.39293 0.09098 4.319 1.57e-05 ***
## site 0.17476 0.08246 2.119 0.0341 *
## land1 -0.48620 0.07922 -6.137 8.40e-10 ***
## land2 0.62567 0.07245 8.636 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 224.24 on 99 degrees of freedom
## Residual deviance: 109.57 on 96 degrees of freedom
## AIC: 314.24
##
## Number of Fisher Scoring iterations: 5
Overall, we did a pretty good job recovering the data generating
parameter values within the GLM as well as the sigma values for scaling
the raster surfaces. We can further explore and visualize our results.
Using the plot
function, we can visualize how the weighted
contribution of each variable decrease with distance. By default, the
mean and 95% confidence interval of the 90% cumulative kernel weight is
identified on the plot. This can be modified by changing
prob
within the plot function.
In addition to the plotting the kernel function, effects plots from
the fitted model object can also be generated using
plot_marginal_effects
We can also apply the optimized kernel to the raster surfaces.
##
## Smoothing spatRaster 1 of 2: land1 at sigma = 238
##
## Smoothing spatRaster 2 of 2: land2 at sigma = 499
Data were simulated from a Gaussian kernel, but we can optimize using other kernels and compare model performance. Starting values had to be specified when using the exponential power kernel due to convergence issues.
## Negative Exponential
exp_inputs <- kernel_prep(pts = pts,
raster_stack = land_rast,
max_D = 1700,
kernel = 'exp',
verbose = FALSE)
## Exponential Power
expow_inputs <- kernel_prep(pts = pts,
raster_stack = land_rast,
max_D = 1700,
kernel = 'expow',
verbose = FALSE)
## Fixed width buffer
fixed_inputs <- kernel_prep(pts = pts,
raster_stack = land_rast,
max_D = 1700,
kernel = 'fixed',
verbose = FALSE)
Optimize model with alternative kernels
opt_exp <- multiScale_optim(fitted_mod = mod0_2,
kernel_inputs = exp_inputs)
## Starting values needed
opt_expow <- multiScale_optim(fitted_mod = mod0_2,
kernel_inputs = expow_inputs,
par = c(500/expow_inputs$unit_conv,
500/exp_inputs$unit_conv,
5,10))
opt_fixed <- multiScale_optim(fitted_mod = mod0_2,
kernel_inputs = fixed_inputs)
If you explore each of the outputs, you’ll see that we generally arrive at similar results. There are some warnings, but we won’t worry about those right now. Next, we’ll try to compare the relative support of these models using different weighted kernels.
With multiScaleR
it is possible to create AIC(c) or BIC
tables from lists of fitted models.
##
## Model selection based on AICc:
##
## K AICc Delta_AICc AICcWt Cum.Wt LL
## [expow]~site + land1 + land2 6 316.27 0.00 0.47 0.47 -151.68
## [gaussian]~site + land1 + land2 5 316.88 0.61 0.34 0.81 -153.12
## [fixed]~site + land1 + land2 5 318.10 1.83 0.19 1.00 -153.73
## [exp]~site + land1 + land2 5 327.39 11.12 0.00 1.00 -158.38
##
## Model selection based on BIC:
##
## K BIC Delta_BIC BICWt Cum.Wt LL
## [gaussian]~site + land1 + land2 5 329.27 0.00 0.51 0.51 -153.12
## [fixed]~site + land1 + land2 5 330.48 1.22 0.28 0.78 -153.73
## [expow]~site + land1 + land2 6 331.00 1.73 0.21 1.00 -151.68
## [exp]~site + land1 + land2 5 339.78 10.51 0.00 1.00 -158.38
The questionably-fitting exponential power model is best-supported by AICc ranking while the Gaussian model is best-supported by BIC. Because of the added complexity (extra parameter) with the exponential power kernel, uncertain optimization, and general equivalency to other models, we should probably lean toward the simpler Gaussian kernel model. This example highlights the challenges that may be encountered when trying to parse different kernels for estimating scales of effect. From observations of performance with simulated data, use of different kernels tends to result in a similar final model and interpretation of variable effects. This, in part, is why the Gaussian kernel is the default, and likely will meet most researcher needs.
Of greater interest is using model selection to identify the best supported model in terms of parameterization (not kernel used).
## Landscape only effect
mod0_3 <- glm(counts ~ land1 + land2,
family = poisson(),
data = df)
## Landscape 1 only effect
mod0_4 <- glm(counts ~ land1,
family = poisson(),
data = df)
## Landscape 2 only effect
mod0_5 <- glm(counts ~ land2,
family = poisson(),
data = df)
## Landscape 3 only effect
mod0_6 <- glm(counts ~ land3,
family = poisson(),
data = df)
## Site only effect
## No multiScaleR optimization
mod0_7 <- glm(counts ~ site,
family = poisson(),
data = df)
Optimize scale for each alternative model
opt3 <- multiScale_optim(fitted_mod = mod0_3,
kernel_inputs = kernel_inputs)
opt4 <- multiScale_optim(fitted_mod = mod0_4,
kernel_inputs = kernel_inputs)
opt5 <- multiScale_optim(fitted_mod = mod0_5,
kernel_inputs = kernel_inputs)
opt6 <- multiScale_optim(fitted_mod = mod0_6,
kernel_inputs = kernel_inputs)
Put models into list and assess.
##
## Model selection based on AICc:
##
## K AICc Delta_AICc AICcWt Cum.Wt
## [gaussian]~site + land1 + land2 + land3 6 316.56 0.00 0.46 0.46
## [gaussian]~site + land1 + land2 5 316.88 0.31 0.40 0.86
## [gaussian]~land1 + land2 4 318.96 2.40 0.14 1.00
## [gaussian]~land2 3 349.41 32.85 0.00 1.00
## [gaussian]~land1 3 393.73 77.17 0.00 1.00
## [gaussian]~land3 3 418.64 102.08 0.00 1.00
## [NA]~site 2 422.96 106.40 0.00 1.00
## LL
## [gaussian]~site + land1 + land2 + land3 -151.83
## [gaussian]~site + land1 + land2 -153.12
## [gaussian]~land1 + land2 -155.27
## [gaussian]~land2 -171.58
## [gaussian]~land1 -193.74
## [gaussian]~land3 -206.20
## [NA]~site -209.42
##
## Model selection based on BIC:
##
## K BIC Delta_BIC BICWt Cum.Wt LL
## [gaussian]~land1 + land2 4 328.96 0.00 0.46 0.46 -155.27
## [gaussian]~site + land1 + land2 5 329.27 0.31 0.40 0.86 -153.12
## [gaussian]~site + land1 + land2 + land3 6 331.29 2.33 0.14 1.00 -151.83
## [gaussian]~land2 3 356.98 28.02 0.00 1.00 -171.58
## [gaussian]~land1 3 401.30 72.34 0.00 1.00 -193.74
## [gaussian]~land3 3 426.21 97.25 0.00 1.00 -206.20
## [NA]~site 2 428.05 99.09 0.00 1.00 -209.42
Model selection tables clearly show that the inclusion of scaled landscape variables is important, however information criterion may not have the greatest resolution or power to differentiate among competing models. Analyses will likely require a critical, holistic assessment of information criterion, parameter effect sizes, and precision in scale of effect estimates (i.e. sigma).
The challenge of identifying ‘significant’ scale relationships was
also noted by Lowe et al., and the R package scalescape
has
a bootstrap function to determine the significance of parameters within
the regression model. Such procedures are not currently implemented in
multiScaleR
.
unmarked
Among the model classes that multiScaleR
can be used to
optimize are those from unmarked
. We will use the
simulation functions from the package to create data for this analysis.
First, we will simulate and visualize raster surfaces.
We’ll use the bin1
and cont2
surfaces for
our data simulation. First, we’ll simulate count data with a Poisson
distribution. For the unmarked
data simulation, we need to
specify the count model intercept (alpha
), and regression
coefficients that describe the effect of the raster layers
(beta
), the scale of effect (sigma
), number of
survey points on the landscape (n_points
), the number of
replicate surveys conducted (n_surv
), the detection
probability (det
; here, the probability of detecting an
individual), and the maximum distance to consider in the analysis
(max_D
).
rs <- terra::subset(rs, c(1,4))
s_dat <- sim_dat_unmarked(alpha = 1,
beta = c(0.75,-0.75),
kernel = 'gaussian',
sigma = c(75, 150),
n_points = 75,
n_surv = 5,
det = 0.5,
type = 'count',
raster_stack = rs,
max_D = 550,
user_seed = 555)
plot(s_dat$df$y ~ s_dat$df$bin1)
We can see that we have simulated a positive effect of
bin1
on counts and negative effect of cont2
on
counts. We can now prepare data for unmarked
and optimize
scale of effect with multiScaleR
.
library(unmarked)
kernel_inputs <- kernel_prep(pts = s_dat$pts,
raster_stack = rs,
max_D = 550,
kernel = 'gaus',
verbose = FALSE)
umf <- unmarkedFramePCount(y = s_dat$y,
siteCovs = kernel_inputs$kernel_dat)
## Base unmarked model
mod0_umf.p <- unmarked::pcount(~1 ~bin1 + cont2,
data = umf,
K = 100)
Compare optimized results with simulated values. Note that the
detection reported by unmarked
is on the logit scale, so
must be back-transformed. Overall, both the scale parameters and the
Poisson count model parameters were accurately recovered.
##
## Call:
## multiScale_optim(fitted_mod = mod0_umf.p, kernel_inputs = kernel_inputs,
## n_cores = 8)
##
##
## Kernel used:
## gaussian
##
## ***** Optimized Scale of Effect -- Sigma *****
##
## Mean SE 2.5% 97.5%
## bin1 70.37381 10.62560 49.20652 91.5411
## cont2 145.11999 25.01268 95.29217 194.9478
##
##
## ====================================
##
## ***** Optimized Scale of Effect -- Distance *****
## 90% Kernel Weight
##
## Mean 2.5% 97.5%
## bin1 115.75 80.94 150.57
## cont2 238.70 156.74 320.66
##
##
## ====================================
##
## ***** Fitted Model Summary *****
##
##
## Call:
## unmarked::pcount(formula = ~1 ~ bin1 + cont2, data = inputs$data,
## K = 100, mixture = "P")
##
## Abundance (log-scale):
## Estimate SE z P(>|z|)
## (Intercept) 1.070 0.0996 10.74 6.37e-27
## bin1 0.626 0.0710 8.81 1.30e-18
## cont2 -0.753 0.0716 -10.51 7.70e-26
##
## Detection (logit-scale):
## Estimate SE z P(>|z|)
## -0.000922 0.147 -0.00627 0.995
##
## AIC: 1016.524
## Number of sites: 75
## [1] 0.4997696
Assess kernel and effects plots
Finally, we may want to project our fitted model to the landscape to
make spatial predictions. To do this, we need to scale and center our
raster surfaces after applying our kernel transformations. The
spatRaster
object can then be used with the
terra::predict
function. Depending upon the spatial
coverage of your sample points, you may find predicted values across a
broader region unrealistic. In these instances, you may need to clamp
the scaled rasters to be within a reasonable range of those observed in
your sample data.
## Project model
rast_scale.center <- kernel_scale.raster(raster_stack = rs,
multiScaleR = opt_umf.p,
scale_center = TRUE,
clamp = TRUE,
pct_mx = 0.00)
##
## Smoothing spatRaster 1 of 2: bin1 at sigma = 70
##
## Smoothing spatRaster 2 of 2: cont2 at sigma = 145
ab.mod_pred <- terra::predict(rast_scale.center,
opt_umf.p$opt_mod,
type = 'state')
plot(ab.mod_pred)
Now we’ll simulate occurrence data suitable for analysis with
unmarked
. Preliminary experience has shown that simulation
parameters can be harder to recover when using an occupancy model in
unmarked
with multiScaleR
. We’ll use the same
raster surfaces.
s_dat.occ <- sim_dat_unmarked(alpha = 0.25,
beta = c(-0.75,1),
kernel = 'gaussian',
sigma = c(225, 75),
n_points = 250,
n_surv = 5,
det = 0.5,
type = 'occ',
raster_stack = rs,
max_D = 800,
user_seed = 555)
plot(s_dat.occ$df$y ~ s_dat.occ$df$bin1)
Prepare inputs for use with multiScaleR
kernel_inputs <- kernel_prep(pts = s_dat.occ$pts,
raster_stack = rs,
max_D = 800,
kernel = 'gaus',
verbose = FALSE)
## Occupancy frame
umf <- unmarkedFrameOccu(y = s_dat.occ$y,
siteCovs = kernel_inputs$kernel_dat)
## Base unmarked model
(mod0_umf.occ <- unmarked::occu(~1 ~bin1 + cont2,
data = umf))
##
## Call:
## unmarked::occu(formula = ~1 ~ bin1 + cont2, data = umf)
##
## Occupancy (logit-scale):
## Estimate SE z P(>|z|)
## (Intercept) 0.546 0.159 3.44 5.76e-04
## bin1 -0.829 0.175 -4.72 2.33e-06
## cont2 0.841 0.173 4.86 1.17e-06
##
## Detection (logit-scale):
## Estimate SE z P(>|z|)
## 0.048 0.0785 0.612 0.541
##
## AIC: 1307.227
## Number of sites: 250
opt_umf.occ <- multiScale_optim(fitted_mod = mod0_umf.occ,
kernel_inputs = kernel_inputs,
n_cores = 8)
##
## Call:
## multiScale_optim(fitted_mod = mod0_umf.occ, kernel_inputs = kernel_inputs,
## n_cores = 8)
##
##
## Kernel used:
## gaussian
##
## ***** Optimized Scale of Effect -- Sigma *****
##
## Mean SE 2.5% 97.5%
## bin1 183.3404 60.39596 64.39061 302.2901
## cont2 139.1945 59.22802 22.54502 255.8440
##
##
## ====================================
##
## ***** Optimized Scale of Effect -- Distance *****
## 90% Kernel Weight
##
## Mean 2.5% 97.5%
## bin1 301.57 105.91 497.22
## cont2 228.95 37.08 420.83
##
##
## ====================================
##
## ***** Fitted Model Summary *****
##
##
## Call:
## unmarked::occu(formula = ~1 ~ bin1 + cont2, data = inputs$data,
## knownOcc = object@knownOcc)
##
## Occupancy (logit-scale):
## Estimate SE z P(>|z|)
## (Intercept) 0.636 0.173 3.68 2.32e-04
## bin1 -0.983 0.173 -5.67 1.41e-08
## cont2 1.009 0.200 5.04 4.78e-07
##
## Detection (logit-scale):
## Estimate SE z P(>|z|)
## 0.0498 0.0783 0.637 0.524
##
## AIC: 1293.509
## Number of sites: 250
## [1] 0.5124544
We did a decent job of recovering simulated values. Note that the
sample size was increased for this simulation. Previous explorations
suggested that smaller sample sizes (e.g., 100) did not contain enough
information to optimize the scales of effect (sigma
). There
is opportunity for further exploration of data needs / limitations for
estimating scales of effect, especially related to more complex models
such those fit with unmarked
.
## Project model
rast_scale.center <- kernel_scale.raster(raster_stack = rs,
multiScaleR = opt_umf.occ,
scale_center = TRUE,
clamp = T)
##
## Smoothing spatRaster 1 of 2: bin1 at sigma = 183
##
## Smoothing spatRaster 2 of 2: cont2 at sigma = 139
occ.mod_pred <- terra::predict(rast_scale.center,
opt_umf.occ$opt_mod,
type = 'state')
plot(occ.mod_pred)
multiScaleR
has been developed to work with any model
class that has an update function. It has been tested with models fit
using several different packages. If a particular model class does not
work, please let me know so I can trouble shoot and try to incorporate
functionality for it. In the example below, I demonstrate the fitting of
a zero-inflated model with the pscl
package. Notably, the
zero-inflated term is a spatial variable that is also smoothed during
the analysis. This requires some manual processing to simulate data.
set.seed(12345)
rs <- sim_rast(user_seed = 999,
dim = 500, resolution = 30)
rs <- terra::subset(rs, c(1,3))
## Zero-inflated parameters
zi_alpha <- -0.25
zi_beta <- -1
n_points <- 400
alpha <- 0.5
beta <- 0.75
kernel <- 'gaussian'
sigma <- c(400, # For main effect, bin1
200) # For zero-inflated, cont1
StDev <- 2 # Negative binomial dispersion
## Generate random points
bnd <- buffer(vect(ext(rs[[1]])), -1000)
pts <- spatSample(x = rs[[1]],
size = n_points,
method = 'random',
ext = ext(bnd),
replace = F,
as.points = T)
kernel_out <- kernel_prep(pts = pts,
raster_stack = rs,
max_D = 1500,
kernel = kernel,
sigma = sigma,
verbose = FALSE)
## ZINB simulation
zi_prob <- plogis(zi_alpha + zi_beta*kernel_out$kernel_dat$cont1)
# Simulate zero-inflated counts
# 1 = excess zero, 0 = from Poisson
is_zero <- rbinom(length(zi_prob), size = 1, prob = zi_prob)
obs <- exp(alpha + beta*kernel_out$kernel_dat$bin1)
obs.zinb <- ifelse(is_zero, 0,
rnbinom(length(is_zero),
mu = obs,
size = StDev))
dat <- data.frame(cnt = obs.zinb,
kernel_out$kernel_dat)
with(dat, plot(cnt ~ bin1))
With data generated, we can now fit our zero-inflated model and
kernel_prep
and optimize with multiScaleR
.
## Classes and Methods for R originally developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University (2002-2015),
## by and under the direction of Simon Jackman.
## hurdle and zeroinfl functions by Achim Zeileis.
zinb_mod <- pscl::zeroinfl(cnt ~ bin1 | cont1,
dist = 'negbin',
data = dat)
kernel_inputs <- kernel_prep(pts = pts,
kernel = kernel,
max_D = 1500,
raster_stack = rs,
verbose = FALSE)
NOTE: It is highly encouraged to fit your model as
pscl::zeroinfl
.
We did a pretty good job in this simple scenario. Note the larger
sample size. This was necessary to get good estimates for all
parameters. Also, while cont1
was estimated relatively
accurately, there is much greater uncertainty when compared to
bin1
.
##
## Call:
## multiScale_optim(fitted_mod = zinb_mod, kernel_inputs = kernel_inputs,
## n_cores = 4)
##
##
## Kernel used:
## gaussian
##
## ***** Optimized Scale of Effect -- Sigma *****
##
## Mean SE 2.5% 97.5%
## bin1 340.6336 45.13498 251.89872 429.3684
## cont1 180.5352 63.92097 54.86737 306.2031
##
##
## ====================================
##
## ***** Optimized Scale of Effect -- Distance *****
## 90% Kernel Weight
##
## Mean 2.5% 97.5%
## bin1 560.29 414.34 706.25
## cont1 296.95 90.25 503.66
##
##
## ====================================
##
## ***** Fitted Model Summary *****
##
##
## Call:
## pscl::zeroinfl(formula = cnt ~ bin1 | cont1, data = dat, dist = "negbin")
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -1.1449 -0.6028 -0.4006 0.3115 5.7773
##
## Count model coefficients (negbin with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.51167 0.09169 5.581 2.4e-08 ***
## bin1 0.88011 0.06789 12.965 < 2e-16 ***
## Log(theta) 0.86488 0.27752 3.116 0.00183 **
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3549 0.2033 -1.746 0.0809 .
## cont1 -1.0701 0.2028 -5.276 1.32e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta = 2.3747
## Number of iterations in BFGS optimization: 12
## Log-likelihood: -522.8 on 5 Df
Below is a brief walk through the other functions related to scales
of effect that are included with the multiScaleR
package.
kernel_dist
In some cases, a direct assessment of scale of effect distance is desired.
## Mean 2.5% 97.5%
## land1 392.22 284.00 500.44
## land2 821.66 666.67 976.65
## Mean 2.5% 97.5%
## land1 467.35 338.40 596.31
## land2 979.07 794.39 1163.75
It’s also possible to calculate the distance encompassed by specified kernels without a fitted model.
## [1] 164.49
## [1] 230.26
## [1] 90.44
plot_kernel
This is a generic function to visualize how kernel weight decays with distance
sim_rast
Raster surfaces can be simulated that are amenable for use the
sim_dat
function.
r_sim1 <- sim_rast(dim = 100,
resolution = 30,
user_seed = 555)
r_sim2 <- sim_rast(dim = 100,
resolution = 30,
autocorr_range1 = 100,
autocorr_range2 = 1,
sill = 10,
user_seed = 555)
plot(r_sim1)
sim_dat
This function will simulate data from scaled raster surfaces.
s_dat <- sim_dat(alpha = 0.25,
beta = c(0.75, -0.75),
kernel = 'gaussian',
sigma = c(350, 200),
type = 'count',
n_points = 100,
raster_stack = terra::subset(r_sim1, c(1,4)),
min_D = 250,
user_seed = 999)
Look at the simulated covariate relationships with counts
Use the simulated data with multiScaleR
sim_mod <- glm(y ~ bin1 + cont2,
family = 'poisson',
data = s_dat$df)
kernel_inputs <- kernel_prep(pts = s_dat$pts,
raster_stack = r_sim1,
max_D = 1500,
kernel = 'gaussian',
verbose = FALSE)
Check results
##
## Call:
## multiScale_optim(fitted_mod = sim_mod, kernel_inputs = kernel_inputs)
##
##
## Kernel used:
## gaussian
##
## ***** Optimized Scale of Effect -- Sigma *****
##
## Mean SE 2.5% 97.5%
## bin1 373.8632 35.31925 303.76423 443.9621
## cont2 144.6731 36.36662 72.49548 216.8508
##
##
## ====================================
##
## ***** Optimized Scale of Effect -- Distance *****
## 90% Kernel Weight
##
## Mean 2.5% 97.5%
## bin1 614.95 499.65 730.25
## cont2 237.97 119.24 356.69
##
##
## ====================================
##
## ***** Fitted Model Summary *****
##
##
## Call:
## glm(formula = y ~ bin1 + cont2, family = "poisson", data = dat)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.24979 0.09909 2.521 0.0117 *
## bin1 0.75973 0.05153 14.745 <2e-16 ***
## cont2 -0.65317 0.05669 -11.521 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 439.901 on 99 degrees of freedom
## Residual deviance: 95.606 on 97 degrees of freedom
## AIC: 286.02
##
## Number of Fisher Scoring iterations: 5
##
## Smoothing spatRaster 1 of 2: bin1 at sigma = 373
##
## Smoothing spatRaster 2 of 2: cont2 at sigma = 144
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.