The following gives a description of SimMultiCorrData’s functions by topic. The user should visit the appropriate help page for more information.
nonnormvar1
simulates one non-normal continuous variable using either Fleishman’s (1978) third-order (method = “Fleishman”) or Headrick’s (2002) fifth-order (method = “Polynomial”) approximation. See the Comparison of Simulated Distribution to Theoretical Distribution or Empirical Data vignette for an example.
rcorrvar
simulates k_cat ordinal (\(\Large r \ge 2\) categories), k_cont continuous, k_pois Poisson, and/or k_nb Negative Binomial variables with a specified correlation matrix rho using Correlation Method 1. The variables are generated from multivariate normal variables with intermediate correlation matrix Sigma, calculated by findintercorr, and then transformed appropriately. The ordering of the variables in rho must be ordinal, continuous, Poisson, and Negative Binomial (note that it is possible for k_cat, k_cont, k_pois, and/or k_nb to be 0).
rcorrvar2
simulates k_cat ordinal (\(\Large r \ge 2\) categories), k_cont continuous, k_pois Poisson, and/or k_nb Negative Binomial variables with a specified correlation matrix rho using Correlation Method 2. The variables are generated from multivariate normal variables with intermediate correlation matrix Sigma, calculated by findintercorr2, and then transformed appropriately. The ordering of the variables in rho must be ordinal, continuous, Poisson, and Negative Binomial (note that it is possible for k_cat, k_cont, k_pois, and/or k_nb to be 0).
Please see the Comparison of Correlation Method 1 and Correlation Method 2 vignette for more information about the two different simulation pathways.
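The shared idea behind both pathways (correlated normal variables first, marginal transformations second) can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the matrix Sigma is a hypothetical intermediate correlation matrix written by hand rather than computed by findintercorr, the polynomial constants are illustrative values rather than output of find_constants, and the names ordinal, continuous, and counts are invented for the example.

```r
set.seed(1234)
n <- 10000

# Hypothetical intermediate correlation matrix (assumed, not from findintercorr)
Sigma <- matrix(c(1.0, 0.4, 0.3,
                  0.4, 1.0, 0.5,
                  0.3, 0.5, 1.0), nrow = 3, byrow = TRUE)

# Step 1: standard normal draws with correlation Sigma, via the Cholesky factor
Z <- matrix(rnorm(n * 3), nrow = n) %*% chol(Sigma)

# Step 2a: ordinal variable with 3 categories (cumulative probabilities 0.3, 0.7, 1)
ordinal <- cut(Z[, 1], breaks = c(-Inf, qnorm(c(0.3, 0.7)), Inf), labels = FALSE)

# Step 2b: continuous variable as a third-order polynomial of the normal variable
cons <- c(-0.1, 0.95, 0.1, 0.01)  # illustrative constants c0..c3, not solved for
continuous <- cons[1] + cons[2] * Z[, 2] + cons[3] * Z[, 2]^2 + cons[4] * Z[, 2]^3

# Step 2c: Poisson variable via the Poisson quantile function of the normal cdf
counts <- qpois(pnorm(Z[, 3]), lambda = 5)

sim <- data.frame(ordinal, continuous, counts)
round(cor(sim), 2)  # final correlations generally differ from Sigma
```

Because the transformations change the correlations, the final correlation matrix differs from Sigma; this is why the intermediate matrix has to be calculated from the target rho before simulating.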
find_constants
calculates the constants used to generate continuous variables via either Fleishman’s third-order (using fleish equations) or Headrick’s fifth-order (using poly equations) polynomial transformation. It attempts to find constants that generate a valid power method pdf. When using Headrick’s method, if no solutions converge or no valid pdf solutions can be found, and a vector of sixth cumulant correction values (Six) is provided, the function will attempt to find the smallest correction value that generates a valid power method pdf. Otherwise, constants that yield an invalid pdf are given.
fleish
contains Fleishman’s third-order polynomial transformation equations.
poly
contains Headrick’s fifth-order polynomial transformation equations.
calc_fisherk
uses Fisher’s k-statistics to calculate the mean, standard deviation, skewness, standardized kurtosis, and standardized fifth and sixth cumulants given a vector of data.
calc_moments
uses the method of moments to calculate the mean, standard deviation, skewness, standardized kurtosis, and standardized fifth and sixth cumulants given a vector of data.
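As a rough illustration of the method-of-moments quantities listed above, the following base R sketch computes them from central moments. It assumes population-style moments (dividing by n), which may differ from the convention calc_moments itself uses; the function name moments_sketch is hypothetical.

```r
# Method-of-moments summary of a numeric vector (hypothetical helper)
moments_sketch <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  s <- sqrt(mean(d^2))          # standard deviation from the second central moment
  g1 <- mean(d^3) / s^3         # skewness
  g2 <- mean(d^4) / s^4 - 3     # standardized (excess) kurtosis
  g3 <- mean(d^5) / s^5 - 10 * g1                      # standardized fifth cumulant
  g4 <- mean(d^6) / s^6 - 15 * g2 - 10 * g1^2 - 15     # standardized sixth cumulant
  c(mean = mean(x), sd = s, skew = g1, skurtosis = g2, fifth = g3, sixth = g4)
}

res <- moments_sketch(c(1, 2, 3, 4, 5))
res
```

For a standard normal sample all four shape measures should be close to 0, which is a convenient sanity check.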
calc_theory
calculates the mean, standard deviation, skewness, standardized kurtosis, and standardized fifth and sixth cumulants given either a distribution name with up to 4 associated parameters or a pdf function fx with lower and upper support bounds. There are 39 available distributions by name. Please see the appropriate help pages for information regarding parameter inputs (in the VGAM (Yee 2018), triangle (Carnell 2016), or stats (R Core Team 2018) packages).
cdf_prob
calculates a cumulative probability using the theoretical power method cdf \(\Large F_{p(Z)}(p(z)) = F_{p(Z)}(p(z), F_Z(z))\) up to \(\Large \sigma y + \mu = \delta\), where \(\Large y = p(z)\), after using pdf_check to verify that the given constants produce a valid pdf. If the given constants do not produce a valid power method pdf, a warning is given.
power_norm_corr
calculates the correlation between a continuous variable produced using a polynomial transformation and the generating standard normal variable. If the correlation is <= 0, the signs of c1 and c3 (for method = “Fleishman”), or of c1, c3, and c5 (for method = “Polynomial”), should be reversed. These sign changes have no effect on the cumulants of the resulting distribution.
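The effect of the sign reversal can be checked numerically. The sketch below uses a third-order polynomial with illustrative constants (not solved for any particular distribution) and a symmetric grid of normal quantiles in place of random draws; the helper p3 is a hypothetical name. Reversing the signs of c1 and c3 is the same as evaluating the polynomial at -z, so the set of generated values is unchanged while the correlation with z changes sign.

```r
# Third-order polynomial y = c0 + c1*z + c2*z^2 + c3*z^3 (hypothetical helper)
p3 <- function(z, cons) cons[1] + cons[2] * z + cons[3] * z^2 + cons[4] * z^3

# Symmetric grid of standard normal quantiles stands in for the generating variable
z <- qnorm(ppoints(10001))

cons <- c(-0.1, -0.95, 0.1, -0.01)       # illustrative constants with negative c1, c3
cons_flipped <- cons * c(1, -1, 1, -1)   # reverse the signs of c1 and c3

y <- p3(z, cons)
y_flipped <- p3(z, cons_flipped)

cor(y, z)                            # negative correlation
cor(y_flipped, z)                    # same magnitude, positive
all.equal(sort(y), sort(y_flipped))  # same values, hence the same cumulants
```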
pdf_check
determines whether a given set of constants generates a valid power method pdf. A valid pdf requires that the constants yield a continuous variable that is positively correlated with the generating standard normal variable and that they satisfy certain constraints that vary by approximation method (see Headrick and Kowalchuk (2007)).
sim_cdf_prob
calculates the simulated (empirical) cumulative probability up to a given y-value (delta). It uses Martin Maechler’s stats::ecdf function to find the empirical cdf \(\Large F_n\). \(\Large F_n\) is a step function with jumps \(\Large i/n\) at observation values, where \(\Large i\) is the number of tied observations at that value. Missing values are ignored. For observations \(\Large y = (y_1, y_2, ..., y_n)\), \(\Large F_n(t)\) is the fraction of observations less than or equal to \(\Large t\), i.e., \(\Large F_n(t) = \#[y_i \le t]/n\).
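The empirical cdf calculation described above can be reproduced directly with stats::ecdf. The data vector and delta below are made-up values for illustration.

```r
y <- c(0.2, 1.5, 1.5, 3.1, NA, -0.7)  # made-up data with a tie and a missing value
delta <- 1.5

Fn <- stats::ecdf(y)   # missing values are dropped when the step function is built
p_ecdf <- Fn(delta)    # fraction of non-missing observations <= delta

# Equivalent direct count: #[y_i <= delta] / n over the non-missing values
p_direct <- mean(y <= delta, na.rm = TRUE)

c(p_ecdf, p_direct)
```

Four of the five non-missing observations are at or below 1.5, so both calculations give 0.8.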
stats_pdf
calculates the \(\Large 100\alpha\%\) symmetric trimmed mean (\(\Large 0 < \alpha < 0.50\)), median, mode, and maximum height of a valid power method pdf using the equations given by Headrick and Kowalchuk (2007).
calc_lower_skurt
determines the lower standardized kurtosis boundary for a continuous variable generated using the power method transformation. This boundary depends on skewness (for Fleishman’s third-order method, see Headrick and Sawilowsky (2002)) or skewness and standardized fifth and sixth cumulants (for Headrick’s fifth-order method, see Headrick (2002)).
fleish_Hessian
calculates the Fleishman transformation Hessian matrix and its determinant, which are used in finding the lower kurtosis boundary for asymmetric distributions.
fleish_skurt_check
contains the Fleishman transformation Lagrangean constraints which are used in finding the lower kurtosis boundary for asymmetric distributions.
poly_skurt_check
contains the Headrick transformation Lagrangean constraints which are used in finding the lower kurtosis boundary.
valid_corr
(correlation method 1) and valid_corr2 (correlation method 2) determine the feasible correlation bounds for ordinal, continuous, Poisson, and/or Negative Binomial variables. If a target correlation matrix rho is specified, the functions check each pairwise correlation to see if it falls within the bounds. The indices of any variable pair with a target correlation that is outside the bounds are given. If continuous variables are required, the functions return the calculated constants, the required sixth cumulant correction (if a Six vector of possible values was given), and whether each set of constants generates a valid power method pdf.
findintercorr
(correlation method 1) and findintercorr2 (correlation method 2) are the two main intermediate correlation calculation functions. These functions call the following functions:
chat_nb
calculates the upper Frechet-Hoeffding correlation bound for Negative Binomial - Normal variable pairs used to determine the intermediate correlation for Negative Binomial - Continuous variable pairs in method 1.
chat_pois
calculates the upper Frechet-Hoeffding correlation bound for Poisson - Normal variable pairs used to determine the intermediate correlation for Poisson - Continuous variable pairs in method 1.
denom_corr_cat
is used in intermediate correlation calculations involving ordinal variables (or variables treated as ordinal, as in method 2).
findintercorr_cat_nb
calculates the intermediate correlation for ordinal - Negative Binomial variables in method 1.
findintercorr_cat_pois
calculates the intermediate correlation for ordinal - Poisson variables in method 1.
findintercorr_cont
calculates the intermediate correlation for continuous variables based on either Fleishman’s third-order or Headrick’s fifth-order approximation.
findintercorr_cont_cat
calculates the intermediate correlation for continuous - ordinal variables.
findintercorr_cont_nb
and findintercorr_cont_nb2 calculate the intermediate correlations for continuous - Negative Binomial variables in method 1 or 2 (respectively).
findintercorr_cont_pois
and findintercorr_cont_pois2 calculate the intermediate correlations for continuous - Poisson variables in method 1 or 2 (respectively).
findintercorr_nb
calculates the intermediate correlation for Negative Binomial variables in method 1.
findintercorr_pois
calculates the intermediate correlation for Poisson variables in method 1.
findintercorr_pois_nb
calculates the intermediate correlation for Poisson - Negative Binomial variables in method 1.
intercorr_fleish
contains Fleishman’s third-order polynomial transformation intercorrelation equations.
intercorr_poly
contains Headrick’s fifth-order polynomial transformation intercorrelation equations.
max_count_support
calculates the maximum support value for count variables by extending the method of Barbiero and Ferrari (2015) to include Negative Binomial variables. It is used in method 2.
ordnorm
calculates the intermediate correlation for ordinal variables or variables treated as ordinal (as in method 2). It is based on GenOrd::ordcont with some important corrections.
var_cat
is used in intermediate correlation calculations involving ordinal variables (or variables treated as ordinal, as in method 2) to calculate the variance.
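The upper Frechet-Hoeffding bound used by chat_pois and chat_nb is the largest correlation a count variable can have with a normal variable, attained when both are increasing functions of the same uniform variable. A minimal sketch for the Poisson case follows; it assumes the bound is approximated by applying both quantile functions to the same uniform draws, and the number of draws and the value of lam are illustrative choices rather than package defaults.

```r
set.seed(1234)
u <- runif(10000)   # shared uniform draws (sample size chosen for illustration)
lam <- 5            # illustrative Poisson mean

# Comonotone pair: both variables are nondecreasing functions of the same u
y_pois <- qpois(u, lambda = lam)
z_norm <- qnorm(u)

chat <- cor(y_pois, z_norm)
chat  # below 1 because the Poisson variable is discrete and skewed
```

The same construction with qnbinom in place of qpois gives the corresponding sketch for a Negative Binomial variable.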
error_loop
is the main error loop function called by rcorrvar or rcorrvar2.
error_vars
is used to generate variable pairs within the error loop.
The 8 graphing functions use either simulated data or a set of constants (found by find_constants or from simulation) as input. In the first case, the empirical cdf or pdf is found. In the second case, the theoretical cdf or pdf is found using the equations from Headrick and Kowalchuk (2007). Three functions (plot_cdf, plot_pdf_ext, plot_pdf_theory) work only for continuous variable inputs. The other graphing functions work for continuous or count variable inputs. The graphs display either data values, pdfs, or cdfs. In the case of cdfs of continuous variables, the cumulative probability up to a given y-value (delta) can be calculated and displayed on the graph (using cdf_prob for a set of constants or sim_cdf_prob for a vector of simulated data). The empirical cdf can also be graphed for ordinal data. In the case of pdfs or actual data values, the target distribution can be overlaid on the graph. This target distribution can be an empirical data set, a distribution specified by name (Dist plus up to 4 parameters), or a user-supplied pdf fx with support bounds. See plot_sim_pdf_theory for names of Dist inputs. The graphing functions work for invalid or valid power method pdfs. They return ggplot2 objects, so designated graphing parameters (e.g., line color and type, title) can be specified by the user, and the results can be further modified as necessary.
plot_cdf
plots the theoretical power method cumulative distribution function \(\Large F_{p(Z)}(p(z)) = F_{p(Z)}(p(z), F_Z(z))\), given a set of constants. If calc_prob = TRUE, it will also calculate the cumulative probability up to a user-specified delta value, where \(\Large \sigma y + \mu = \delta\) and \(\Large y = p(z)\).
plot_sim_cdf
plots the empirical cdf \(\Large F_n\) of simulated continuous, ordinal, or count data (see ggplot2::stat_ecdf). If calc_cprob = TRUE and the variable is continuous, the cumulative probability up to a user-specified y-value (delta) is calculated (see sim_cdf_prob), the region on the plot is filled, and a dashed horizontal line is drawn at \(\Large F_n(\delta)\).
plot_pdf_ext
plots the theoretical probability density function \(\Large f_{p(Z)}(p(z)) = f_{p(Z)}(p(z), f_Z(z)/p'(z))\), given a set of constants, and the target pdf calculated from a vector of external data. Unlike in plot_pdf_theory, the vector of external data is required. If the user wants to plot only the theoretical pdf, plot_pdf_theory should be used with overlay = FALSE.
plot_pdf_theory
plots the theoretical probability density function \(\Large f_{p(Z)}(p(z)) = f_{p(Z)}(p(z), f_Z(z)/p'(z))\), given a set of constants, and the target pdf (if overlay = TRUE), given either a continuous distribution name and parameters or a user-supplied pdf fx (bounds set equal to bounds of simulated data).
plot_sim_ext
plots simulated continuous or count data and overlays external data (both as histograms). Unlike in plot_sim_theory, the vector of external data is required. If the user wants to plot only the simulated data, plot_sim_theory should be used with overlay = FALSE.
plot_sim_pdf_ext
plots the pdf of simulated continuous or count data and overlays the target pdf computed from a vector of external data. Unlike in plot_sim_pdf_theory, the vector of external data is required. If the user wants to plot only the pdf of simulated data, plot_sim_pdf_theory should be used with overlay = FALSE.
plot_sim_pdf_theory
plots the pdf of simulated continuous or count data and overlays the target pdf (if overlay = TRUE) specified by distribution name and parameters or by pdf fx (bounds set equal to bounds of simulated data).
plot_sim_theory
plots simulated continuous or count data and overlays data (if overlay = TRUE) randomly generated from a target distribution specified by name and parameters or by pdf fx (bounds set equal to bounds of simulated data). Both distributions are plotted as histograms. If the target distribution is specified by a function fx, it must be continuous.
These would not ordinarily be called by the user.
calc_final_corr
calculates the final correlation matrix.
separate_rho
separates a target correlation matrix by variable type.
Barbiero, A, and P A Ferrari. 2015. “Simulation of Correlated Poisson Variables.” Applied Stochastic Models in Business and Industry 31: 669–80. doi:10.1002/asmb.2072.
Carnell, Rob. 2016. Triangle: Provides the Standard Distribution Functions for the Triangle Distribution. https://CRAN.R-project.org/package=triangle.
Fleishman, A I. 1978. “A Method for Simulating Non-Normal Distributions.” Psychometrika 43: 521–32. doi:10.1007/BF02293811.
Headrick, T C. 2002. “Fast Fifth-Order Polynomial Transforms for Generating Univariate and Multivariate Non-Normal Distributions.” Computational Statistics and Data Analysis 40 (4): 685–711. doi:10.1016/S0167-9473(02)00072-5.
Headrick, T C, and R K Kowalchuk. 2007. “The Power Method Transformation: Its Probability Density Function, Distribution Function, and Its Further Use for Fitting Data.” Journal of Statistical Computation and Simulation 77: 229–49. doi:10.1080/10629360600605065.
Headrick, T C, and S S Sawilowsky. 2002. “Weighted Simplex Procedures for Determining Boundary Points and Constants for the Univariate and Multivariate Power Methods.” Journal of Educational and Behavioral Statistics 25: 417–36. doi:10.3102/10769986025004417.
R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Yee, T W. 2018. VGAM: Vector Generalized Linear and Additive Models. https://CRAN.R-project.org/package=VGAM.