| Title: | Simulated Sampling Procedure for Community Ecology |
| Version: | 1.0.2 |
| Date: | 2025-04-23 |
| Maintainer: | Edlin Guerra-Castro <edlinguerra@gmail.com> |
| Description: | The Simulation-based Sampling Protocol (SSP) is an R package designed to estimate sampling effort in studies of ecological communities. It is based on the concept of pseudo-multivariate standard error (MultSE) (Anderson & Santana-Garcon, 2015, <doi:10.1111/ele.12385>) and the simulation of ecological data. The theoretical background is described in Guerra-Castro et al. (2020, <doi:10.1111/ecog.05284>). |
| Depends: | R (≥ 3.5.0) |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.2 |
| Suggests: | knitr, rmarkdown, testthat, roxygen2 |
| VignetteBuilder: | knitr |
| URL: | https://github.com/edlinguerra/SSP |
| BugReports: | https://github.com/edlinguerra/SSP/issues |
| Imports: | vegan, stats, sampling, ggplot2 |
| NeedsCompilation: | no |
| Packaged: | 2025-04-24 22:53:35 UTC; edlin |
| Author: | Edlin Guerra-Castro [aut, cre], Maite Mascaro [aut], Nuno Simoes [aut], Juan Cruz-Motta [aut], Juan Cajas [aut] |
| Repository: | CRAN |
| Date/Publication: | 2025-04-24 23:20:02 UTC |
SSP: Simulated Sampling Procedure for Community Ecology
Description
SSP is an R package designed to estimate sampling effort in studies of ecological communities, based on the definition of the pseudo-multivariate standard error (MultSE) (Anderson & Santana-Garcon 2015) and the simulation of data (Guerra-Castro et al. 2021).
Details
The protocol in SSP consists of simulating several extensive data matrices that mimic some of the relevant ecological features of the community of interest, using a pilot data set. For each simulated data set, several sampling efforts are repeatedly executed and MultSE is calculated for each one. The mean value and the 0.025 and 0.975 quantiles of MultSE for each sampling effort across all simulated data sets are then estimated and plotted. The mean values are standardized in relation to the lowest sampling effort (and consequently the worst precision), and an optimal sampling effort can be identified as the one beyond which further increases in sample size do not improve precision by more than a threshold value (e.g. 2.5%).
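A minimal base R sketch of this summary step follows. It assumes a hypothetical long-format table of MultSE values with columns `dat.sim`, `n` and `MultSE` (filled here with made-up numbers), which is not necessarily the layout returned by sampsd. For each effort it pools all simulated data sets and repetitions, computes the mean and the 0.025 and 0.975 quantiles, and divides each mean by the mean at the lowest effort.

```r
# Hypothetical MultSE results: 3 simulated data sets x efforts 2..6 x 4 repetitions.
set.seed(1)
res <- expand.grid(dat.sim = 1:3, n = 2:6, rep = 1:4)
# Made-up MultSE that shrinks roughly like 1/sqrt(n), plus noise (illustration only).
res$MultSE <- 0.4 / sqrt(res$n) + rnorm(nrow(res), sd = 0.01)

# Mean and 0.025 / 0.975 quantiles of MultSE for each sampling effort,
# pooled across all simulated data sets and repetitions.
summarise_multse <- function(res) {
  efforts <- sort(unique(res$n))
  out <- data.frame(
    n     = efforts,
    mean  = sapply(efforts, function(e) mean(res$MultSE[res$n == e])),
    lower = sapply(efforts, function(e) quantile(res$MultSE[res$n == e], 0.025, names = FALSE)),
    upper = sapply(efforts, function(e) quantile(res$MultSE[res$n == e], 0.975, names = FALSE))
  )
  # Standardize the means by the mean at the lowest effort (worst precision).
  out$rel <- out$mean / out$mean[1]
  out
}

summ <- summarise_multse(res)
print(summ)
```

The `rel` column expresses each mean as a fraction of the worst precision, which is the scale on which a threshold such as 2.5% would be judged.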
SSP includes seven functions: assempar, for extrapolation of assemblage parameters using pilot data; simdata, for simulation of several data sets based on the extrapolated parameters; datquality, for evaluation of the plausibility of the simulated data; sampsd, for repeated estimation of MultSE for different sampling designs in the simulated data sets; summary_ssp, for summarizing the behavior of MultSE for each sampling design across all simulated data sets; ioptimum, for identification of the optimal sampling effort; and plot_ssp, to plot sampling effort vs MultSE of the simulated data.
The SSP package is developed at GitHub (https://github.com/edlinguerra/SSP/).
Author(s)
The SSP development team is Edlin Guerra-Castro, Maite Mascaro, Nuno Simoes, Juan Cruz-Motta and Juan Cajas.
References
-Anderson, M.J., & Santana-Garcon, J. (2015). Measures of precision for dissimilarity-based multivariate analysis of ecological communities. Ecology Letters 18(1), 66-73. doi:10.1111/ele.12385
-Guerra-Castro, E.J., Cajas, J.C., Simões, N., Cruz-Motta, J.J., & Mascaró, M. (2021). SSP: an R package to estimate sampling effort in studies of ecological communities. Ecography 44(4), 561-573. doi:10.1111/ecog.05284
Examples
###To speed up the simulation of these examples, cases, sites and N were set to small values.
##Single site: micromollusk from Cayo Nuevo (Yucatan, Mexico)
data(micromollusk)
#Estimation of parameters of pilot data
par.mic <- assempar(data = micromollusk,
type = "P/A",
Sest.method = "average")
#Simulation of 3 data sets, each one with 20 potential sampling units from a single site
sim.mic <- simdata(par.mic, cases = 3, N = 20, sites = 1)
#Sampling and estimation of MultSE for each sample size (few repetitions
#to speed up the example)
sam.mic <- sampsd(dat.sim = sim.mic,
Par = par.mic,
transformation = "P/A",
method = "jaccard",
n = 10,
m = 1,
k = 3)
#Summary of MultSE for each sampling effort
summ.mic <- summary_ssp(results = sam.mic, multi.site = FALSE)
#Cut-off points to identify optimal sampling effort
opt.mic <- ioptimum(xx = summ.mic, multi.site = FALSE)
#Plot
plot_ssp(xx = summ.mic, opt = opt.mic, multi.site = FALSE)
##Multiple sites: Sponges from Alacranes National Park (Yucatan, Mexico).
data(sponges)
#Estimation of parameters of pilot data
par.spo <- assempar(data = sponges,
type = "counts",
Sest.method = "average")
#Simulation of 3 data sets, each one with 10 potential sampling units in 3 sites.
sim.spo <- simdata(par.spo, cases = 3, N = 10, sites = 3)
#Sampling and estimation of MultSE for each sampling design (few repetitions
#to speed up the example)
sam.spo <- sampsd(dat.sim = sim.spo,
Par = par.spo,
transformation = "square root",
method = "bray",
n = 10,
m = 3,
k = 3)
#Summary of MultSE for each sampling effort
summ.spo <- summary_ssp(results = sam.spo, multi.site = TRUE)
#Cut-off points to identify optimal sampling effort
opt.spo <- ioptimum(xx = summ.spo, multi.site = TRUE)
#Plot
plot_ssp(xx = summ.spo, opt = opt.spo, multi.site = TRUE)
Estimation of Ecological Parameters of the Assemblage
Description
This function extracts the main parameters of the pilot data using base R functions,
as well as the functions specpool and dispweight from the vegan package.
Usage
assempar(data, type = c("P/A", "counts", "cover"), Sest.method = "average")
Arguments
data |
Data frame with species names (columns) and samples (rows). The first column should indicate the site to which the sample belongs, regardless of whether a single site has been sampled. |
type |
Nature of the data to be processed. It may be presence/absence ("P/A"), counts of individuals ("counts"), or coverage ("cover"). |
Sest.method |
Method for estimating species richness. The function |
Details
The expected number of species in the assemblage is estimated using non-parametric methods (Gotelli & Colwell 2011). Because the estimates from each approach vary (Reese et al. 2014), we recommend using their average. The probability of detection of each species is estimated among and within sites. Among-site detection is calculated as the frequency of occurrence of each species across sampled sites; within-site detection is calculated as the weighted average of frequencies in the sites where the species is present. Spatial aggregation (only for count data) is evaluated using the index of dispersion D (Clarke et al. 2006). Properties of unseen species are approximated using information from observed species, assuming that their detection probabilities match those of the rarest observed species. Abundance distributions are simulated using random Poisson values, with lambda set to the overall mean of the observed abundances.
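The detection frequencies and the dispersion index described above can be sketched in base R as follows. The toy count matrix, the helper name `detection_params`, and the use of the number of samples per site as weights for the within-site average are assumptions for illustration; this is not the internal code of assempar (which relies on specpool and dispweight). D is computed here simply as the variance-to-mean ratio of the counts of each species.

```r
# Toy pilot data: first column = site, remaining columns = species counts.
pilot <- data.frame(
  site = c(1, 1, 1, 2, 2, 2),
  sp1  = c(3, 0, 5, 1, 2, 0),
  sp2  = c(0, 0, 0, 4, 6, 9),
  sp3  = c(1, 1, 0, 0, 0, 0)
)

detection_params <- function(pilot) {
  site <- pilot[[1]]
  counts <- as.matrix(pilot[, -1])
  # Proportion of samples in each site where each species occurs (sites x species).
  freq_by_site <- apply(counts > 0, 2, function(x) tapply(x, site, mean))
  # Among-site detection: fraction of sites in which the species was found.
  among <- colMeans(freq_by_site > 0)
  # Within-site detection: average of the site frequencies, restricted to the
  # sites where the species is present, weighted by samples per site (assumption).
  w <- as.numeric(table(site))
  within <- apply(freq_by_site, 2, function(f) {
    present <- f > 0
    sum(f[present] * w[present]) / sum(w[present])
  })
  # Index of dispersion D = variance / mean of counts (D > 1 suggests aggregation).
  D <- apply(counts, 2, function(x) var(x) / mean(x))
  data.frame(species = colnames(counts), among = among, within = within, D = D,
             row.names = NULL)
}

params <- detection_params(pilot)
print(params)
```

In this toy example sp2 is confined to one site and its counts are clumped, so it gets an among-site detection of 0.5 and D well above 1.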
Value
A list (class list) containing the estimated parameters of the assemblage, to be used by simdata.
Note
Important: The first column should indicate the site ID of each sample (as character or numeric), even when only a single site was sampled.
References
Clarke, K. R., Chapman, M. G., Somerfield, P. J., & Needham, H. R. (2006). Dispersion-based weighting of species counts in assemblage analyses. Journal of Experimental Marine Biology and Ecology, 320, 11–27.
Gotelli, N. J., & Colwell, R. K. (2011). Estimating species richness. In A. E. Magurran & B. J. McGill (Eds.), Biological diversity: frontiers in measurement and assessment (pp. 39–54). Oxford University Press.
Guerra-Castro, E.J., Cajas, J.C., Simões, N., Cruz-Motta, J.J., & Mascaró, M. (2021). SSP: an R package to estimate sampling effort in studies of ecological communities. Ecography 44(4), 561-573. doi:10.1111/ecog.05284
Reese, G. C., Wilson, K. R., & Flather, C. H. (2014). Performance of species richness estimators across assemblage types and survey parameters. Global Ecology and Biogeography, 23(5), 585–594.
Examples
## Single site: micromollusk from Cayo Nuevo (Yucatan, Mexico)
data(micromollusk)
par.mic <- assempar(data = micromollusk, type = "P/A", Sest.method = "average")
par.mic
## Multiple sites: Sponges from Alacranes National Park (Yucatan, Mexico)
data(sponges)
par.spo <- assempar(data = sponges, type = "counts", Sest.method = "average")
par.spo
Diversity Metrics of Simulated and Original Data
Description
Estimates the average number of species and the Simpson diversity index per sampling unit, as well as the total multivariate dispersion of both the original (pilot) and simulated datasets.
Usage
datquality(data, dat.sim, Par, transformation, method)
Arguments
data |
Data frame with species as columns and samples as rows. The first column should indicate the site to which the sample belongs, regardless of whether a single site was sampled. |
dat.sim |
List of simulated data sets generated by |
Par |
List of parameters generated by |
transformation |
Mathematical transformation to reduce the weight of dominant species: one of "square root", "fourth root", "Log (X+1)", "P/A", or "none". |
method |
Dissimilarity metric used for multivariate dispersion, passed to |
Details
The quality of the simulated data sets is evaluated by their statistical similarity to the pilot data in terms of (i) the average number of species per sampling unit, (ii) the average Simpson diversity index, and (iii) the multivariate dispersion (MVD), defined as the average dissimilarity of each sampling unit to the group centroid in the dissimilarity space (Anderson 2006). For the simulated data sets, the mean and standard deviation are reported for (i) and (ii), and the 0.95 quantile of the MVD distribution is used to describe its variability.
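The three quantities above can be sketched on a toy abundance matrix in base R. Assumptions: Simpson diversity is taken in the 1 - sum(p^2) form (the form returned by vegan's `diversity(index = "simpson")`), and MVD is computed with Euclidean distances on square-root transformed data, for which the distance to the centroid can be obtained exactly from the distance matrix. For non-Euclidean measures such as Bray-Curtis, the centroid is usually located after a principal coordinates analysis (as in vegan's `betadisper`), which this sketch does not do. The helper `mvd_from_dist` is hypothetical.

```r
# Toy abundance matrix: rows = sampling units, columns = species.
x <- matrix(c(2, 0, 1, 4,
              0, 3, 1, 0,
              1, 1, 0, 2,
              5, 0, 0, 1), nrow = 4, byrow = TRUE)

# (i) Number of species per sampling unit.
richness <- rowSums(x > 0)

# (ii) Simpson diversity per unit, in the 1 - sum(p^2) form.
simpson <- apply(x, 1, function(r) { p <- r / sum(r); 1 - sum(p^2) })

# (iii) MVD: mean distance of units to their centroid, computed from the
# distance matrix alone (exact for Euclidean distances).
mvd_from_dist <- function(d) {
  D2 <- as.matrix(d)^2
  n <- nrow(D2)
  # Squared distance of unit i to the centroid:
  # (1/n) * sum_j d_ij^2 - (1/(2 n^2)) * sum_j sum_k d_jk^2
  z2 <- rowSums(D2) / n - sum(D2) / (2 * n^2)
  mean(sqrt(pmax(z2, 0)))
}

mvd <- mvd_from_dist(dist(sqrt(x)))   # square-root transform, Euclidean distance
c(mean_S = mean(richness), sd_S = sd(richness),
  mean_simpson = mean(simpson), MVD = mvd)
```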
Value
A data frame containing the mean and standard deviation of richness and diversity per sampling unit, and the MVD for original data, as well as the 0.95 quantile of MVD from the simulated data.
Note
It is desirable that simulated data resemble observed data in species richness and diversity per sampling unit.
References
Anderson, M. J. (2006). Distance-based tests for homogeneity of multivariate dispersions. Biometrics, 62, 245–253.
Guerra-Castro, E.J., Cajas, J.C., Simões, N., Cruz-Motta, J.J., & Mascaró, M. (2021). SSP: an R package to estimate sampling effort in studies of ecological communities. Ecography 44(4), 561-573. doi:10.1111/ecog.05284
Examples
## Single site: micromollusk from Cayo Nuevo (Yucatan, Mexico)
data(micromollusk)
par.mic <- assempar(data = micromollusk, type = "P/A", Sest.method = "average")
sim.mic <- simdata(par.mic, cases = 3, N = 10, sites = 1)
qua.mic <- datquality(data = micromollusk, dat.sim = sim.mic, Par = par.mic,
transformation = "none", method = "jaccard")
qua.mic
## Multiple sites: Sponges from Alacranes National Park (Yucatan, Mexico)
data(sponges)
par.spo <- assempar(data = sponges, type = "counts", Sest.method = "average")
sim.spo <- simdata(par.spo, cases = 3, N = 10, sites = 3)
qua.spo <- datquality(data = sponges, dat.sim = sim.spo, Par = par.spo,
transformation = "square root", method = "bray")
qua.spo
Epibionts on Caribbean mangrove roots
Description
The data correspond to epibenthic organisms on mangrove roots from Laguna de La Restinga National Park, Venezuela (Guerra-Castro et al. 2016).
Usage
data("epibionts")
Format
A data frame with 96 observations on the following 152 variables.
sector: a factor with levels E, I, M
site: a numeric vector
Aaptos.sp: a numeric vector
Acanthophora.spicifera: a numeric vector
Acetabularia.crenulata: a numeric vector
Aglaothamnion.sp: a numeric vector
Amathia.sp: a numeric vector
Amorphinopsis.atlantica: a numeric vector
Amphimedon.erina: a numeric vector
Anemonia.sargassensis: a numeric vector
Aplidium.accarense: a numeric vector
Aplysilla.glacialis: a numeric vector
Ascidia.curvata: a numeric vector
Ascidia.sp: a numeric vector
Ascidia.sydneiensis: a numeric vector
Balanus.sp: a numeric vector
Bartholomea.annulata: a numeric vector
Biemna.caribea: a numeric vector
Bostrychia.tenella: a numeric vector
Botrylloides.nigrum: a numeric vector
Botrylloides.sp.1: a numeric vector
Botrylloides.sp.2: a numeric vector
Brachidontes.exustus: a numeric vector
Branchiomma.conspersum: a numeric vector
Branchiomma.nigromaculatum: a numeric vector
Bryopsis.sp: a numeric vector
Bugula.neritina: a numeric vector
Bugula.sp: a numeric vector
Calliactis.tricolor: a numeric vector
Callyspongia..Callyspongia..pallida: a numeric vector
Carijoa.riisei: a numeric vector
Caulerpa.racemosa: a numeric vector
Caulerpa.racemosa.var.peltata: a numeric vector
Caulerpa.sertularioides: a numeric vector
Caulerpa.verticillata: a numeric vector
Caulibugula.sp: a numeric vector
Celleporaria.sp: a numeric vector
Ceramium.diaphanum: a numeric vector
Chaetomorpha.sp.1: a numeric vector
Chaetomorpha.sp.2: a numeric vector
Chalinula.molitba: a numeric vector
Chelonaplysilla.erecta: a numeric vector
Chondrilla.nucula: a numeric vector
Chthamalus.sp: a numeric vector
Clathria..Clathria..microchela: a numeric vector
Clathria.sp: a numeric vector
Clavelina.oblonga: a numeric vector
Clavelina.picta: a numeric vector
Complejo.Cliona.celata: a numeric vector
Crassostrea.rhizophorae: a numeric vector
Dictyota.sp: a numeric vector
Didemnum.cineraceum: a numeric vector
Didemnum.perlucidum: a numeric vector
Didemnum.sp: a numeric vector
Diplosoma.listerianum: a numeric vector
Distaplia.bermudensis: a numeric vector
Distaplia.stylifera: a numeric vector
Dynamena.sp: a numeric vector
Dysidea.etheria: a numeric vector
Dysidea.sp: a numeric vector
Ecteinascidia.sp: a numeric vector
Ecteinascidia.styeloides: a numeric vector
Ecteinascidia.turbinata: a numeric vector
Eudistoma.olivaceum: a numeric vector
Eusynstyela.tincta: a numeric vector
Exaiptasia.pallida: a numeric vector
Ficopomatus.sp: a numeric vector
Geodia.papyracea: a numeric vector
Halichondria..Halichondria..magniconulosa: a numeric vector
Halichondria..Halichondria..melanadocia: a numeric vector
Haliclona..Halichoclona..magnifica: a numeric vector
Haliclona..Reniera..implexiformis: a numeric vector
Haliclona..Reniera..manglaris: a numeric vector
Haliclona..Reniera..ruetzleri: a numeric vector
Haliclona..Reniera..tubifera: a numeric vector
Haliclona..Rhizoniera..curacaoensis: a numeric vector
Haliclona..Soestella..caerulea: a numeric vector
Haliclona..Soestella..smithae: a numeric vector
Haliclona..Soestella..twincayensis: a numeric vector
Halimeda.sp: a numeric vector
Halisarca.sp: a numeric vector
Halopteris.sp: a numeric vector
Herdmania.pallida: a numeric vector
Hippopodina.feegeensis: a numeric vector
Hydroides.sp: a numeric vector
Hyrtios.proteus: a numeric vector
Iotrochota.birotulata: a numeric vector
Ircinia.felix: a numeric vector
Ircinia.sp: a numeric vector
Isognomon.alatus: a numeric vector
Kirchenpaueria.sp: a numeric vector
Lissoclinum.sp: a numeric vector
Lissodendoryx..Lissodendoryx..isodictyalis: a numeric vector
Lithophyllum.pustulatum: a numeric vector
Microcosmus.exasperatus: a numeric vector
Molgula.occidentalis: a numeric vector
Murrayella.periclados: a numeric vector
Mycale..Aegogropila..carmigropila: a numeric vector
Mycale..Aegogropila..citrina: a numeric vector
Mycale..Carmia..magnirhaphidifera: a numeric vector
Mycale..Carmia..microsigmatosa: a numeric vector
Mycale..Mycale..laevis: a numeric vector
Mycale..Zygomycale..angulosa: a numeric vector
Mycale.sp: a numeric vector
Nemalecium.sp: a numeric vector
Notaulax.nudicollis: a numeric vector
Obelia.sp: a numeric vector
Oceanapia.nodosa: a numeric vector
Padina.sp: a numeric vector
Perna.viridis: a numeric vector
Perophora.viridis: a numeric vector
Phaeophyceae: a numeric vector
Phallusia.nigra: a numeric vector
Phyllangia.americana: a numeric vector
Pinctada.imbricata: a numeric vector
Plakortis.angulospiculatus: a numeric vector
Polyclinum.constellatum: a numeric vector
Polysiphonia.sp.1: a numeric vector
Polysiphonia.sp.3: a numeric vector
Polysiphonia.subtilissima: a numeric vector
Pteria.colymbus: a numeric vector
Pyura.sp..1: a numeric vector
Pyura.sp..2: a numeric vector
Pyura.vittata: a numeric vector
Rhizoclonium.sp: a numeric vector
Rhodosoma.turcicum: a numeric vector
Sabella.sp: a numeric vector
Sabellastarte.magnifica: a numeric vector
Schizoporella.pungens: a numeric vector
Scopalina.ruetzleri: a numeric vector
Scopalina.sp: a numeric vector
Scrupocellaria.sp: a numeric vector
Sphacelaria.rigidula: a numeric vector
Spongia..Spongia..pertusa: a numeric vector
Spongia..Spongia..tubulifera: a numeric vector
Sporolithon.episporum: a numeric vector
Spyridia.hypnoides: a numeric vector
Styela.canopus: a numeric vector
Styela.sp.1: a numeric vector
Styela.sp.2: a numeric vector
Suberites.aurantiacus: a numeric vector
Symplegma.brakenhielmi: a numeric vector
Symplegma.rubra: a numeric vector
Synnotum.circinatum: a numeric vector
Tedania..Tedania..ignis: a numeric vector
Terpios.manglaris: a numeric vector
Tethya.actinia: a numeric vector
Tethya.sp: a numeric vector
Trididemnum.orbiculatum: a numeric vector
Ulva.sp: a numeric vector
Viatrix.globulifera: a numeric vector
Zoobotryon.verticillatum: a numeric vector
Details
The data consist of the cover (estimated by point-intercept) of 110 taxa identified on 240 mangrove roots, sampled under a hierarchically nested spatial design that included four random sites within each of three sectors of the lagoon system, corresponding to a strong environmental gradient: external (E), intermediate (M), and internal (I). The abundance of epibenthic organisms was recorded on 8 roots within each site, giving a total of 32 roots in each sector. This spatial protocol was repeated five times over a period of 14 months. For demonstration purposes, data from the fourth sampling period were randomly chosen for this package.
Source
https://doi.org/10.3354/meps11693
References
Guerra-Castro, E. J., J. E. Conde, and J. J. Cruz-Motta. (2016). Scales of spatial variation in tropical benthic assemblages and their ecological relevance: epibionts on Caribbean mangrove roots as a model system. Marine Ecology Progress Series 548:97-110.
Examples
data(epibionts)
str(epibionts)
Identification of the Optimal Sampling Effort
Description
Estimates the sampling effort at which the improvement in precision (MultSE) per additional sampling unit becomes sub-optimal or redundant, based on predefined cut-off thresholds.
Usage
ioptimum(xx, multi.site = TRUE, c1 = 10, c2 = 5, c3 = 2.5)
Arguments
xx |
A data frame generated by |
multi.site |
Logical. Indicates whether multiple sites were simulated. |
c1 |
First cut threshold. Default is 10% improvement over the highest MultSE. |
c2 |
Second cut threshold. Default is 5% improvement over the highest MultSE. |
c3 |
Third cut threshold. Default is 2.5% improvement over the highest MultSE. |
Details
Sampling efforts between the minimum (e.g. 2 samples) and c1 represent the necessary effort to achieve acceptable precision.
Efforts between c1 and c2 reflect sub-optimal gains, and those between c2 and c3 are considered optimal.
Beyond c3, any additional effort results in marginal improvements in MultSE and may be considered redundant.
This classification helps support cost-benefit decisions in ecological survey design (see Underwood, 1990).
If c3 is not reached within the simulated range, the maximum available effort is returned with a warning.
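The cut-off logic can be sketched under an assumed definition of improvement: the drop in mean MultSE gained by each additional sampling unit, expressed as a percentage of the highest MultSE (the one at the lowest effort). The MultSE curve below is invented for illustration and the helper `first_below` is hypothetical. Unlike ioptimum, which is documented to return the maximum available effort with a warning when c3 is not reached, this sketch simply returns NA in that case.

```r
# Hypothetical mean MultSE for efforts 2..10 (e.g. taken from a summary table).
n <- 2:10
multse <- 0.4 / sqrt(n)

# Percentage improvement in MultSE gained by each additional sampling unit,
# relative to the highest MultSE (assumed definition, for illustration).
improvement <- c(NA, -diff(multse)) / max(multse) * 100

# First effort whose marginal improvement falls below a cut-off;
# NA if the cut-off is never reached within the simulated range.
first_below <- function(cut) {
  idx <- which(improvement < cut)
  if (length(idx) == 0) NA else n[min(idx)]
}

cuts <- sapply(c(c1 = 10, c2 = 5, c3 = 2.5), first_below)
print(cuts)
```

With this curve, efforts below c1 still gain more than 10% per unit, and from c3 onward each extra unit adds less than 2.5%, which matches the "redundant" zone described above.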
Value
A vector or matrix indicating the sampling sizes corresponding to each cut-off point.
Note
The cut-off thresholds are arbitrary and should be adjusted based on the ecological question and resource availability.
In some cases, c3 may not be reached within the range of simulated sampling efforts.
References
Underwood, A. J. (1990). Experiments in ecology and management: Their logics, functions and interpretations. Australian Journal of Ecology, 15, 365–389.
Guerra-Castro, E.J., Cajas, J.C., Simões, N., Cruz-Motta, J.J., & Mascaró, M. (2021). SSP: an R package to estimate sampling effort in studies of ecological communities. Ecography 44(4), 561-573. doi:10.1111/ecog.05284
Examples
## Single site example
data(micromollusk)
par.mic <- assempar(data = micromollusk, type = "P/A", Sest.method = "average")
sim.mic <- simdata(par.mic, cases = 3, N = 20, sites = 1)
sam.mic <- sampsd(dat.sim = sim.mic,
Par = par.mic,
transformation = "P/A",
method = "jaccard",
n = 10,
m = 1,
k = 3)
summ.mic <- summary_ssp(results = sam.mic, multi.site = FALSE)
opt.mic <- ioptimum(xx = summ.mic, multi.site = FALSE)
## Multiple sites example
data(sponges)
par.spo <- assempar(data = sponges, type = "counts", Sest.method = "average")
sim.spo <- simdata(par.spo, cases = 3, N = 10, sites = 3)
sam.spo <- sampsd(dat.sim = sim.spo,
Par = par.spo,
transformation = "square root",
method = "bray",
n = 10,
m = 3,
k = 3)
summ.spo <- summary_ssp(results = sam.spo, multi.site = TRUE)
opt.spo <- ioptimum(xx = summ.spo, multi.site = TRUE)
Micromollusks of marine shallow sandy bottoms around Cayo Nuevo, Gulf of Mexico, Mexico
Description
Presence/absence of 68 species recorded in six cores (4 cm diameter, 10 cm depth) taken in sandy bottoms around Cayo Nuevo, Gulf of Mexico, Mexico.
Usage
data("micromollusk")
Format
A data frame with 6 observations on the following 69 variables.
site: a numeric vector
Leptochiton.sp.: a numeric vector
Ischnochiton..Ischnochiton..erythronotus: a numeric vector
Arcidae.sp.: a numeric vector
Arca.imbricata: a numeric vector
Barbatia.domingensis: a numeric vector
Bentharca.sp.: a numeric vector
Arcopsis.adamsi: a numeric vector
Crenella.sp.: a numeric vector
Anomia.sp..: a numeric vector
Carditopsis.smithii: a numeric vector
Lucinidae..: a numeric vector
Chama.sinuosa: a numeric vector
Chama.sp.: a numeric vector
Galeommatidae.sp.: a numeric vector
Chione.elevata: a numeric vector
Semele.bellastriata: a numeric vector
Gastropoda.sp..1..: a numeric vector
Gastropoda.sp..2..: a numeric vector
Gastropoda.sp..3..: a numeric vector
Diodora.minuta: a numeric vector
Diodora.sp...: a numeric vector
Scissurella.redferni: a numeric vector
Synaptocochlea.picta: a numeric vector
Lodderena.ornata: a numeric vector
Cerithium.sp...: a numeric vector
Sansonia.tuberculata: a numeric vector
Iniforis.turristhomae: a numeric vector
Metaxia.rugulosa: a numeric vector
Cerithiopsis.cf..iuxtafuniculata: a numeric vector
Cerithiopsis.sp.: a numeric vector
Vermetidae.incertae.sedis.irregularis: a numeric vector
Dendropoma.corrodens: a numeric vector
Vermetid.sp..C: a numeric vector
Petaloconchus.mcgintyi: a numeric vector
Thylacodes.sp.: a numeric vector
Alvania.auberiana: a numeric vector
Alvania.colombiana: a numeric vector
Alvania.sp.: a numeric vector
Simulamerelina.caribaea: a numeric vector
Schwartziella.fischeri: a numeric vector
Zebina.browniana: a numeric vector
Zebina.sp.: a numeric vector
Caecum.circumvolutum: a numeric vector
Caecum.donmoorei: a numeric vector
Caecum.floridanum: a numeric vector
Caecum.johnsoni: a numeric vector
Caecum.pulchellum: a numeric vector
Caecum.textile: a numeric vector
Caecum.sp..B: a numeric vector
Meioceras.nitidum: a numeric vector
Cochliolepis.striata: a numeric vector
Parviturboides.interruptus: a numeric vector
Vitrinella.sp.: a numeric vector
Gibberula.lavalleeana: a numeric vector
Prunum.apicinum: a numeric vector
Volvarina.avena: a numeric vector
Astyris.lunata: a numeric vector
Phrontis.albus: a numeric vector
Phrontis.sp.: a numeric vector
Trachypollia.sp...: a numeric vector
Turridae.sp..1: a numeric vector
Turridae.sp..2..: a numeric vector
Turridae.sp..3..: a numeric vector
Ammonicera.lineofuscata: a numeric vector
Ammonicera.minortalis: a numeric vector
Rissoella.galba: a numeric vector
Pyramidellidae.sp.: a numeric vector
Pseudoscilla.babylonia: a numeric vector
Details
Cayo Nuevo is a small reef cay located 240 km off the north-western coast of Yucatan. The data come from a study of the biodiversity of marine benthic reef habitats off the Yucatan shelf (Ortigosa et al. 2018).
Source
https://doi.org/10.3897/zookeys.779.24562
References
Ortigosa, D., Suarez-Mozo, N. Y., Barrera, N. C., & Simoes, N. (2018). First survey of interstitial molluscs from Cayo Nuevo, Campeche Bank, Gulf of Mexico. ZooKeys, 779. doi:10.3897/zookeys.779.24562
Examples
data(micromollusk)
Epibionts on Caribbean mangrove roots: pilot data
Description
The data correspond to a pilot study of epibenthic organisms on mangrove roots from Laguna de La Restinga National Park, Venezuela (Guerra-Castro et al. 2011).
Usage
data("pilot")
Format
A data frame with 180 observations on the following 118 variables.
Sector: a factor with levels E, I, M
Site: a numeric vector
sp1: a numeric vector
sp2: a numeric vector
sp3: a numeric vector
sp4: a numeric vector
sp5: a numeric vector
sp6: a numeric vector
sp7: a numeric vector
sp8: a numeric vector
sp9: a numeric vector
sp10: a numeric vector
sp11: a numeric vector
sp12: a numeric vector
sp13: a numeric vector
sp14: a numeric vector
sp15: a numeric vector
sp16: a numeric vector
sp17: a numeric vector
sp18: a numeric vector
sp19: a numeric vector
sp20: a numeric vector
sp21: a numeric vector
sp22: a numeric vector
sp23: a numeric vector
sp24: a numeric vector
sp25: a numeric vector
sp26: a numeric vector
sp27: a numeric vector
sp28: a numeric vector
sp29: a numeric vector
sp30: a numeric vector
sp31: a numeric vector
sp32: a numeric vector
sp33: a numeric vector
sp34: a numeric vector
sp35: a numeric vector
sp36: a numeric vector
sp37: a numeric vector
sp38: a numeric vector
sp39: a numeric vector
sp40: a numeric vector
sp41: a numeric vector
sp42: a numeric vector
sp43: a numeric vector
sp44: a numeric vector
sp45: a numeric vector
sp46: a numeric vector
sp47: a numeric vector
sp48: a numeric vector
sp49: a numeric vector
sp50: a numeric vector
sp51: a numeric vector
sp52: a numeric vector
sp53: a numeric vector
sp54: a numeric vector
sp55: a numeric vector
sp56: a numeric vector
sp57: a numeric vector
sp58: a numeric vector
sp59: a numeric vector
sp60: a numeric vector
sp61: a numeric vector
sp62: a numeric vector
sp63: a numeric vector
sp64: a numeric vector
sp65: a numeric vector
sp66: a numeric vector
sp67: a numeric vector
sp68: a numeric vector
sp69: a numeric vector
sp70: a numeric vector
sp71: a numeric vector
sp72: a numeric vector
sp73: a numeric vector
sp74: a numeric vector
sp75: a numeric vector
sp76: a numeric vector
sp77: a numeric vector
sp78: a numeric vector
sp79: a numeric vector
sp80: a numeric vector
sp81: a numeric vector
sp82: a numeric vector
sp83: a numeric vector
sp84: a numeric vector
sp85: a numeric vector
sp86: a numeric vector
sp87: a numeric vector
sp88: a numeric vector
sp89: a numeric vector
sp90: a numeric vector
sp91: a numeric vector
sp92: a numeric vector
sp93: a numeric vector
sp94: a numeric vector
sp95: a numeric vector
sp96: a numeric vector
sp97: a numeric vector
sp98: a numeric vector
sp99: a numeric vector
sp100: a numeric vector
sp101: a numeric vector
sp102: a numeric vector
sp103: a numeric vector
sp104: a numeric vector
sp105: a numeric vector
sp106: a numeric vector
sp107: a numeric vector
sp108: a numeric vector
sp109: a numeric vector
sp110: a numeric vector
sp111: a numeric vector
sp112: a numeric vector
sp113: a numeric vector
sp114: a numeric vector
sp115: a numeric vector
sp116: a numeric vector
Details
The data consist of the cover (estimated by point-intercept) of 116 taxa identified on 180 mangrove roots, sampled under a hierarchically nested spatial design that included six random sites within each of three sectors of the lagoon system, corresponding to a strong environmental gradient: external (E), intermediate (M), and internal (I). The abundance of epibenthic organisms was recorded on 10 roots within each site, giving a total of 60 roots in each sector. The analysis of these pilot data defined the sampling design used by Guerra-Castro et al. (2016).
Source
https://www.interciencia.net/wp-content/uploads/2018/01/923-GUERRA-8.pdf
References
Guerra-Castro, E., J. J. Cruz-Motta, and J. E. Conde. 2011. Cuantificación de la diversidad de especies incrustantes asociadas a las raíces de Rhizophora mangle L. en el Parque Nacional Laguna de La Restinga. Interciencia 36:923-930.
Guerra-Castro, E. J., J. E. Conde, and J. J. Cruz-Motta. (2016). Scales of spatial variation in tropical benthic assemblages and their ecological relevance: epibionts on Caribbean mangrove roots as a model system. Marine Ecology Progress Series 548:97-110.
Examples
data(pilot)
str(pilot)
SSP Plot: Visualization of MultSE and Sampling Effort
Description
Plots the relationship between MultSE and sampling effort using results from SSP simulations.
Usage
plot_ssp(xx, opt, multi.site)
Arguments
xx |
A data frame generated by |
opt |
A vector or data matrix generated by |
multi.site |
Logical. Indicates whether several sites were simulated. |
Details
This function visualizes the behavior of MultSE (pseudo-multivariate standard error) as sampling effort increases. If simulations involve two sampling scales (e.g., sites and samples), separate graphs are generated. Two shaded bands highlight sub-optimal (light grey) and optimal (dark grey) improvements in precision. The graph also displays the relative gain in precision (as cumulative percentage) for each level of sampling effort, compared to the lowest.
This visualization helps identify when additional sampling effort results in diminishing returns.
The plot is generated using ggplot2 and can be further customized.
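The relative gain annotated on the plot can be sketched as the cumulative percentage reduction of the mean MultSE with respect to the value at the lowest effort. The MultSE values below are invented for illustration, and this is only a reading of the description above, not the code used by plot_ssp.

```r
# Hypothetical mean MultSE for efforts 2..8.
gain_tab <- data.frame(n = 2:8, mean = 0.4 / sqrt(2:8))

# Cumulative gain in precision (%) at each effort, relative to the lowest effort:
# 0 at the lowest effort, increasing as MultSE decreases.
gain_tab$gain <- (gain_tab$mean[1] - gain_tab$mean) / gain_tab$mean[1] * 100
round(gain_tab$gain, 1)
```

Because each step adds a smaller increment to this cumulative curve, the curve flattens exactly where the shaded bands indicate diminishing returns.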
Value
A ggplot2 object.
Note
This is an exploratory plot and can be edited or extended using standard ggplot2 functions.
References
Guerra-Castro, E.J., Cajas, J.C., Simões, N., Cruz-Motta, J.J., & Mascaró, M. (2021). SSP: an R package to estimate sampling effort in studies of ecological communities. Ecography 44(4), 561-573. doi:10.1111/ecog.05284
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer.
Examples
## Single site: micromollusk from Cayo Nuevo (Yucatan, Mexico)
data(micromollusk)
par.mic <- assempar(data = micromollusk, type = "P/A", Sest.method = "average")
sim.mic <- simdata(par.mic, cases = 3, N = 20, sites = 1)
sam.mic <- sampsd(dat.sim = sim.mic, Par = par.mic, transformation = "P/A",
method = "jaccard", n = 10, m = 1, k = 3)
summ.mic <- summary_ssp(results = sam.mic, multi.site = FALSE)
opt.mic <- ioptimum(xx = summ.mic, multi.site = FALSE)
plot_ssp(xx = summ.mic, opt = opt.mic, multi.site = FALSE)
## Multiple sites: Sponges from Alacranes National Park (Yucatan, Mexico)
data(sponges)
par.spo <- assempar(data = sponges, type = "counts", Sest.method = "average")
sim.spo <- simdata(par.spo, cases = 3, N = 10, sites = 3)
sam.spo <- sampsd(dat.sim = sim.spo, Par = par.spo, transformation = "square root",
method = "bray", n = 10, m = 3, k = 3)
summ.spo <- summary_ssp(results = sam.spo, multi.site = TRUE)
opt.spo <- ioptimum(xx = summ.spo, multi.site = TRUE)
plot_ssp(xx = summ.spo, opt = opt.spo, multi.site = TRUE)
Sampling Simulated Data and Estimation of Multivariate Standard Errors
Description
For each simulated data set, this function performs repeated sampling across a range of effort levels and estimates the corresponding MultSE (pseudo-multivariate standard error) using dissimilarity-based methods.
Usage
sampsd(dat.sim, Par, transformation, method, n, m, k)
Arguments
dat.sim |
A list of simulated data sets generated by |
Par |
A list of parameters estimated by |
transformation |
Mathematical transformation to reduce the influence of dominant species: one of "square root", "fourth root", "Log (X+1)", "P/A", or "none". |
method |
Dissimilarity metric to use, passed to |
n |
Maximum number of sampling units per site (must be <= total units available). |
m |
Maximum number of sites to sample per data set (must be <= total number of sites). |
k |
Number of repetitions of each sampling configuration (samples × sites) for each data set. |
Details
For multi-site simulations, the function selects subsets of sites (from 2 to m) and then draws from 2 to n samples per site
using a two-stage sampling method with inclusion probabilities (Tillé, 2006). For single-site simulations, repeated samples of size
2 to n are taken without replacement.
Each sample undergoes the selected transformation and a dissimilarity matrix is computed. MultSE is estimated using:
- Single site: the pseudo-variance $V$, with $\mathrm{MultSE} = \sqrt{V/n}$.
- Multiple sites: mean squares from a PERMANOVA model (residual and site effects).
This procedure is computationally intensive, especially with large k. Start with low values for exploration.
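The single-site case can be made concrete with the pseudo-variance of Anderson & Santana-Garcon (2015): SS = (1/n) times the sum of squared dissimilarities over all pairs i < j, V = SS/(n - 1), and MultSE = sqrt(V/n). The base R sketch below applies that formula inside a repeated-subsampling loop. It is an illustration under stated assumptions, not sampsd itself. Euclidean distance from base `dist` stands in for the `vegdist` measures, and the two-stage multi-site design and the PERMANOVA mean squares are not covered. The helper names `multse` and `subsample_multse` are hypothetical.

```r
# Pseudo-variance and MultSE for a single site (Anderson & Santana-Garcon 2015):
#   SS = (1/n) * sum_{i<j} d_ij^2,  V = SS / (n - 1),  MultSE = sqrt(V / n)
multse <- function(d) {
  d <- as.matrix(d)
  n <- nrow(d)
  ss <- sum(d[upper.tri(d)]^2) / n
  v <- ss / (n - 1)
  sqrt(v / n)
}

# Repeated subsampling without replacement for sizes 2..n_max, k repetitions
# each, from one simulated site (rows = potential sampling units).
subsample_multse <- function(site_data, n_max, k, dist_fun = dist) {
  out <- expand.grid(rep = seq_len(k), n = 2:n_max)
  out$MultSE <- sapply(out$n, function(size) {
    rows <- sample(nrow(site_data), size)   # draw without replacement
    multse(dist_fun(site_data[rows, , drop = FALSE]))
  })
  out
}

set.seed(42)
site <- matrix(rpois(20 * 5, lambda = 3), nrow = 20)   # 20 units, 5 species
res <- subsample_multse(sqrt(site), n_max = 10, k = 3)  # square-root transform
aggregate(MultSE ~ n, data = res, FUN = mean)
```

A useful sanity check: with a single variable and Euclidean distance, V reduces to the ordinary sample variance, so MultSE equals the usual standard error sd(x)/sqrt(n).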
Value
A matrix containing the estimated MultSE values for each simulated data set, sampling effort combination,
and repetition. This matrix is used by summary_ssp.
Note
For quick exploratory analysis, use a small k. Once a plausible range of optimal sampling effort has been identified, rerun with a larger k (e.g. 100); computation time increases accordingly.
References
Anderson, M. J., & Santana-Garcon, J. (2015). Measures of precision for dissimilarity-based multivariate analysis of ecological communities. Ecology Letters, 18(1), 66–73.
Guerra-Castro, E. J., Cajas, J. C., Simões, N., Cruz-Motta, J. J., & Mascaró, M. (2021). SSP: An R package to estimate sampling effort in studies of ecological communities. Ecography, 44(4), 561–573. doi:10.1111/ecog.05284
Tillé, Y. (2006). Sampling Algorithms. Springer, New York.
See Also
assempar, simdata, summary_ssp, vegdist
Examples
## Single site example
data(micromollusk)
par.mic <- assempar(data = micromollusk, type = "P/A", Sest.method = "average")
sim.mic <- simdata(par.mic, cases = 3, N = 20, sites = 1)
sam.mic <- sampsd(dat.sim = sim.mic, Par = par.mic, transformation = "P/A",
method = "jaccard", n = 10, m = 1, k = 3)
## Multiple site example
data(sponges)
par.spo <- assempar(data = sponges, type = "counts", Sest.method = "average")
sim.spo <- simdata(par.spo, cases = 3, N = 20, sites = 3)
sam.spo <- sampsd(dat.sim = sim.spo, Par = par.spo, transformation = "square root",
method = "bray", n = 10, m = 3, k = 3)
Simulation of Ecological Data Sets
Description
Simulates multiple ecological data sets using parameters estimated from a pilot study. The output can be used in downstream SSP functions for quality evaluation and sampling effort estimation.
Usage
simdata(Par, cases, N, sites)
Arguments
Par |
A list of parameters estimated by |
cases |
Number of data sets to simulate. |
N |
Number of samples to simulate in each site. |
sites |
Number of sites to simulate in each data set. |
Details
Presence/absence data are simulated with Bernoulli trials based on the empirical frequencies of occurrence among sites (site-level presence) and within sites (local occurrence patterns). These presence/absence matrices are then converted into abundance matrices using values drawn from Poisson or negative binomial distributions (for count data) or from log-normal distributions (for continuous data such as cover or biomass), depending on the aggregation properties estimated from the pilot data.
This process is repeated `cases` times, producing a list of simulated data sets that reflect the statistical properties of the original assemblage, but without incorporating environmental constraints or species co-occurrence structures.
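The two-step logic (Bernoulli presence, then abundance from a Poisson or negative binomial draw) can be sketched for a single site. This is a minimal illustration under stated assumptions: the frequencies, means, and variances are invented values rather than output of assempar, the negative binomial size is obtained by the method of moments, site-level presence and the log-normal branch for continuous data are omitted, and a present species can still receive a zero count in this simplification.

```r
# Hypothetical single-site simulation sketch; parameter values are invented.
set.seed(1)
N    <- 10                                 # samples to simulate
freq <- c(sp1 = 0.8, sp2 = 0.3, sp3 = 0.5)  # within-site frequency of occurrence
mu   <- c(sp1 = 4,   sp2 = 2,   sp3 = 6)    # mean count
vr   <- c(sp1 = 4,   sp2 = 2,   sp3 = 30)   # variance of counts (sp3 aggregated)

sim <- sapply(names(freq), function(sp) {
  present <- rbinom(N, size = 1, prob = freq[[sp]])  # Bernoulli trials
  counts <- if (vr[[sp]] > mu[[sp]]) {
    # overdispersed: negative binomial, size from the method of moments
    rnbinom(N, size = mu[[sp]]^2 / (vr[[sp]] - mu[[sp]]), mu = mu[[sp]])
  } else {
    rpois(N, lambda = mu[[sp]])                      # Poisson otherwise
  }
  present * counts  # absent species get zero abundance
})
dim(sim)
```

Repeating this block `cases` times would yield a list of matrices analogous in shape to the output of simdata for one site.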
Value
A list of simulated community data sets, to be used by datquality and sampsd.
Note
This simulation assumes that differences in composition or abundance are due to spatial aggregation, as captured by the pilot data. It does not incorporate environmental gradients or species associations. For more advanced modeling of species associations, copula-based approaches as suggested by Anderson et al. (2019) may be integrated in future versions of SSP.
References
Anderson, M. J., & Walsh, D. C. I. (2013). PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: What null hypothesis are you testing? Ecological Monographs, 83(4), 557–574.
Anderson, M. J., de Valpine, P., Punnett, A., & Miller, A. E. (2019). A pathway for multivariate analysis of ecological communities using copulas. Ecology and Evolution, 9, 3276–3294.
Guerra-Castro, E. J., Cajas, J. C., Simões, N., Cruz-Motta, J. J., & Mascaró, M. (2021). SSP: An R package to estimate sampling effort in studies of ecological communities. Ecography, 44(4), 561–573. doi:10.1111/ecog.05284
McArdle, B. H., & Anderson, M. J. (2004). Variance heterogeneity, transformations, and models of species abundance: a cautionary tale. Canadian Journal of Fisheries and Aquatic Sciences, 61, 1294–1302.
See Also
Examples
## Single site simulation
data(micromollusk)
par.mic <- assempar(data = micromollusk, type = "P/A", Sest.method = "average")
sim.mic <- simdata(par.mic, cases = 3, N = 10, sites = 1)
## Multiple site simulation
data(sponges)
par.spo <- assempar(data = sponges, type = "counts", Sest.method = "average")
sim.spo <- simdata(par.spo, cases = 3, N = 10, sites = 3)
Sponges in Alacranes Reef National Park (ARNP), Gulf of Mexico, Mexico
Description
Counts of 41 species of sponges in 36 transects of 20 m × 1 m across 8 sites around ARNP.
Usage
data("sponges")
Format
A data frame with 36 observations on the following 42 variables.
site: a factor with 6 levels
Agelas.clathrodes: a numeric vector
Agelas.dispar: a numeric vector
Agelas.tubulata: a numeric vector
Agelas.wiedenmayeri: a numeric vector
Aiolocroia.crassa: a numeric vector
Amphimedon.copressa: a numeric vector
Aplysina.archeri: a numeric vector
Aplysina.cauliformis: a numeric vector
Aplysina.fistularis: a numeric vector
Aplysina.fulva: a numeric vector
Aplysina.insularis: a numeric vector
Aplysina.lacunosa: a numeric vector
Callyspongia.plicifera: a numeric vector
Callyspongia.vaginalis: a numeric vector
Callispongia.fallax: a numeric vector
Callispongia.armigera: a numeric vector
Cliona.delitrix: a numeric vector
Cliona.varians: a numeric vector
Cribochalina.vascolum: a numeric vector
Dragmacidon.sp.: a numeric vector
Dysidea.variabilis: a numeric vector
Ectyoplasia.ferox: a numeric vector
Geodia.neptuni: a numeric vector
Hymeniacidon.caerulea: a numeric vector
Iotrochota.birotulata: a numeric vector
Igernella.notabilis: a numeric vector
Ircinia.felix: a numeric vector
Ircinia.strobilina: a numeric vector
Monanchora.arbuscula: a numeric vector
Mycale.laxissima: a numeric vector
Mycale.laevis: a numeric vector
Nipahtes.amorpha: a numeric vector
Niphates.erecta: a numeric vector
Niphathes.digitalis: a numeric vector
Phorbas.amaranthus: a numeric vector
Scopalina.rutzleri: a numeric vector
Svenezea.flava: a numeric vector
Spirastrella.coccinea: a numeric vector
Verongula.reswigui: a numeric vector
Verongula.rigida: a numeric vector
Xestospongia.muta: a numeric vector
Details
These data come from a pilot study of sponge biodiversity in reef habitats of the Yucatán shelf (Ugalde et al., 2015).
Source
https://biotaxa.org/Zootaxa/article/view/zootaxa.3911.2.1
References
Ugalde, D., Gomez, P., & Simoes, N. (2015). Marine sponges (Porifera: Demospongiae) from the Gulf of Mexico, new records and redescription of Erylus trisphaerus (de Laubenfels, 1953). Zootaxa, 3911(2), 151-183.
Examples
data(sponges)
str(sponges)
Summary of MultSE for Each Sampling Effort in Simulated Data Sets
Description
Computes the average MultSE (pseudo-multivariate standard error) for each sampling effort across simulated datasets, and estimates associated variation and rate of change.
Usage
summary_ssp(results, multi.site)
Arguments
| results | A matrix generated by sampsd. |
| multi.site | Logical. Indicates whether multiple sites were simulated. |
Details
For each sampling effort in each simulated data set, MultSE is averaged across the k repetitions (Anderson & Santana-Garcon, 2015). The function then calculates the overall mean of these averages and their lower (0.025) and upper (0.975) quantiles. To evaluate how precision improves with effort, the mean MultSE values are relativized to the maximum (typically the value at the lowest effort), and the rate of change is approximated with a numerical forward finite-difference derivative.
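The summary steps described above can be sketched in base R. This is a minimal illustration, not the summary_ssp source: the MultSE values are invented, each row is one sampling effort (n = 2 to 6), each column is one simulated data set already averaged over its k repetitions, and the derivative is taken as the change in relativized MultSE per additional sampling unit.

```r
# Toy MultSE averages: rows = sampling effort (n = 2..6),
# columns = simulated data sets. Numbers are invented for illustration only.
multse <- rbind(
  c(0.30, 0.32, 0.31),
  c(0.24, 0.25, 0.23),
  c(0.20, 0.21, 0.20),
  c(0.18, 0.18, 0.17),
  c(0.16, 0.17, 0.16)
)
effort <- 2:6

smry <- data.frame(
  n     = effort,
  mean  = rowMeans(multse),                             # overall mean per effort
  lower = apply(multse, 1, quantile, probs = 0.025),    # lower quantile
  upper = apply(multse, 1, quantile, probs = 0.975)     # upper quantile
)
smry$rel <- smry$mean / max(smry$mean)  # relativized to the maximum (lowest effort)
# Forward finite difference; undefined for the largest effort.
smry$der <- c(diff(smry$rel) / diff(smry$n), NA)
smry
```

Derivatives close to zero indicate efforts at which additional samples add little precision.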
This output is used to support the identification of optimal and redundant sampling efforts based on precision gain.
Value
A data frame summarizing MultSE for each sampling effort, including the mean, quantiles, relativized values, and estimated derivative.
Note
This data frame can be used to plot MultSE versus sampling effort and to apply cutoff rules using ioptimum.
References
Anderson, M. J., & Santana-Garcon, J. (2015). Measures of precision for dissimilarity-based multivariate analysis of ecological communities. Ecology Letters, 18(1), 66–73.
Guerra-Castro, E. J., Cajas, J. C., Simões, N., Cruz-Motta, J. J., & Mascaró, M. (2021). SSP: An R package to estimate sampling effort in studies of ecological communities. Ecography, 44(4), 561–573. doi:10.1111/ecog.05284
See Also
Examples
## Single site example
data(micromollusk)
par.mic <- assempar(data = micromollusk, type = "P/A", Sest.method = "average")
sim.mic <- simdata(par.mic, cases = 3, N = 10, sites = 1)
sam.mic <- sampsd(dat.sim = sim.mic, Par = par.mic, transformation = "P/A",
method = "jaccard", n = 10, m = 1, k = 3)
summ.mic <- summary_ssp(results = sam.mic, multi.site = FALSE)
## Multiple site example
data(sponges)
par.spo <- assempar(data = sponges, type = "counts", Sest.method = "average")
sim.spo <- simdata(par.spo, cases = 3, N = 20, sites = 3)
sam.spo <- sampsd(dat.sim = sim.spo, Par = par.spo, transformation = "square root",
method = "bray", n = 10, m = 3, k = 3)
summ.spo <- summary_ssp(results = sam.spo, multi.site = TRUE)