Simulation and Estimation for each group

Kazi Tanvir Hasan and Dr. Gabriel Odom

Introduction

This vignette demonstrates how to simulate multivariate normal data and multivariate skewed Gamma data using pre-estimated statistics or datasets.

In this guide, we cover the following steps:

  1. Simulate Multivariate Normal Data: Use pre-estimated statistics (mean vector and covariance matrix) to generate multivariate normal data by passing the MASS::mvrnorm() data-generation function to simulate_group_data().

  2. Estimate Multivariate Moments: Use an existing dataset to estimate key statistics (mean, standard deviation, correlation, and skewness) with the calculate_stats_gaussian() function.

  3. Simulate Multivariate Skewed Gamma Data: Use pre-estimated statistics to generate multivariate skewed Gamma data by passing the generate_mvGamma_data() data-generation function to simulate_group_data().

  4. Estimate Multivariate Skewed Gamma Parameters: Use an existing dataset to estimate the skewed Gamma distribution parameters (shape and rate) with the calculate_stats_gamma() function, choosing either the Method of Moments (MoM) or Generalized Maximum Likelihood Estimation (gMLE).

Setup

We will load the necessary libraries and assume that we already have pre-estimated statistics for the multivariate normal and skewed Gamma data generation.

# Load necessary libraries
library(MASS)         # For generating multivariate normal data
library(simBKMRdata)  # For generating skewed Gamma data and estimating moments

1. Simulate Multivariate Normal Data (Using Pre-Estimated Statistics)

Given a pre-estimated mean vector and covariance matrix for each group, we can generate multivariate normal data by passing the MASS::mvrnorm() function to simulate_group_data().

# Example using MASS::mvrnorm for normal distribution
param_list <- list(
  Group1 = list(
    mean_vec = c(1, 2), 
    sampCorr_mat = matrix(c(1, 0.5, 0.5, 1), 2, 2), 
    sampSize = 100
  ),
  Group2 = list(
    mean_vec = c(2, 3), 
    sampCorr_mat = matrix(c(1, 0.3, 0.3, 1), 2, 2), 
    sampSize = 150
  )
)

mvnorm_samples <- simulate_group_data(param_list, MASS::mvrnorm, "Group")
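
The text above speaks of a covariance matrix, while the parameter list supplies a correlation matrix (sampCorr_mat). As a minimal sketch of how the two relate, the code below builds a covariance matrix from a correlation matrix and a vector of standard deviations, then calls MASS::mvrnorm() directly. The values of sd_vec are hypothetical, and whether simulate_group_data() performs any such conversion internally is not stated in this vignette.

```r
library(MASS)

set.seed(1)

# Hypothetical summary statistics for one group (illustrative values only)
mean_vec <- c(1, 2)
sd_vec   <- c(0.5, 2)
corr_mat <- matrix(c(1, 0.5, 0.5, 1), nrow = 2, ncol = 2)

# Covariance = D %*% R %*% D, where D is the diagonal matrix of standard deviations
cov_mat <- diag(sd_vec) %*% corr_mat %*% diag(sd_vec)

# empirical = TRUE makes the sample mean and covariance match mu and Sigma exactly
draws <- MASS::mvrnorm(n = 100, mu = mean_vec, Sigma = cov_mat, empirical = TRUE)

colMeans(draws)      # matches mean_vec
cov2cor(cov(draws))  # matches corr_mat
```

With the default empirical = FALSE, the sample moments would only approximate mean_vec and cov_mat, which is the usual setting for a simulation study.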

Visualizing the Multivariate Normal Data

Let’s visualize the first two variables of the generated multivariate normal data.

# Plot the first two variables of the multivariate normal data
plot(
  mvnorm_samples[, 1], mvnorm_samples[, 2],
  main = "Scatterplot: MV Normal Data",
  xlab = "Variable 1", ylab = "Variable 2",
  pch = 19, col = "blue"
)

2. Estimate Multivariate Moments (Using an Existing Dataset)

Suppose we already have a dataset and need to estimate key multivariate moments for each group, such as the mean, standard deviation, skewness, and correlation. We can use the calculate_stats_gaussian() function to calculate these statistics.

Example: Estimating Moments from an Existing Dataset

myData <- data.frame(
  GENDER = c('Male', 'Female', 'Male', 'Female', 'Male', 'Female'),
  VALUE1 = c(1.2, 2.3, 1.5, 2.7, 1.35, 2.5),
  VALUE2 = c(3.4, 4.5, 3.8, 4.2, 3.6, 4.35)
)
calculate_stats_gaussian(data_df = myData, group_col = "GENDER")
$Female
$Female$sampSize
[1] 3

$Female$mean_vec
VALUE1 VALUE2 
  2.50   4.35 

$Female$sampSD
VALUE1 VALUE2 
  0.20   0.15 

$Female$sampCorr_mat
       VALUE1 VALUE2
VALUE1      1     -1
VALUE2     -1      1

$Female$sampSkew
[1] 0.00000e+00 5.91091e-15


$Male
$Male$sampSize
[1] 3

$Male$mean_vec
VALUE1 VALUE2 
  1.35   3.60 

$Male$sampSD
VALUE1 VALUE2 
  0.15   0.20 

$Male$sampCorr_mat
       VALUE1 VALUE2
VALUE1      1      1
VALUE2      1      1

$Male$sampSkew
[1] -4.411766e-15  2.240684e-15

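Output Explanation

The returned list has one element per group (here, Female and Male), and each element contains:

  - sampSize: the number of observations in the group.

  - mean_vec: the sample mean of each variable.

  - sampSD: the sample standard deviation of each variable.

  - sampCorr_mat: the sample correlation matrix.

  - sampSkew: the sample skewness of each variable.

In this toy dataset each group has only three observations that lie on a straight line, so the correlations are exactly 1 or -1 and the skewness values are zero up to floating-point error.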
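
As a cross-check, the per-group values printed above can be approximated with base R alone. This is a sketch, not the implementation of calculate_stats_gaussian(): the helper skew_fun() and the names value_cols, by_group, and stats_ls are hypothetical, and the moment-based skewness formula is an assumption about the definition being used.

```r
myData <- data.frame(
  GENDER = c("Male", "Female", "Male", "Female", "Male", "Female"),
  VALUE1 = c(1.2, 2.3, 1.5, 2.7, 1.35, 2.5),
  VALUE2 = c(3.4, 4.5, 3.8, 4.2, 3.6, 4.35)
)

# Moment-based skewness, m3 / m2^(3/2); an assumed definition for illustration
skew_fun <- function(x) {
  dev <- x - mean(x)
  mean(dev^3) / mean(dev^2)^(3 / 2)
}

# Split the numeric columns by group, then summarise each piece
value_cols <- c("VALUE1", "VALUE2")
by_group <- split(myData[value_cols], myData$GENDER)

stats_ls <- lapply(by_group, function(group_df) {
  list(
    sampSize     = nrow(group_df),
    mean_vec     = colMeans(group_df),
    sampSD       = sapply(group_df, sd),
    sampCorr_mat = cor(group_df),
    sampSkew     = sapply(group_df, skew_fun)
  )
})

stats_ls$Female$mean_vec
stats_ls$Female$sampSD
```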

3. Simulate Multivariate Skewed Gamma Data (Using Pre-Estimated Statistics)

We can now generate multivariate skewed Gamma data based on pre-estimated shape and rate parameters for the Gamma distribution, together with a correlation matrix and a sample size for each group. This is done by passing the generate_mvGamma_data() function to simulate_group_data().

# Example using generate_mvGamma_data for Gamma distribution
param_list <- list(
  Group1 = list(
    sampCorr_mat = matrix(c(1, 0.5, 0.5, 1), 2, 2),
    shape_num = c(2, 2),
    rate_num = c(1, 1),
    sampSize = 100
  ),
  Group2 = list(
    sampCorr_mat = matrix(c(1, 0.3, 0.3, 1), 2, 2),
    shape_num = c(2, 2),
    rate_num = c(1, 1),
    sampSize = 150
  )
)

gamma_samples <- simulate_group_data(
  param_list, generate_mvGamma_data, "Group"
)
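
One common way to obtain correlated Gamma variables from a correlation matrix plus shape and rate parameters is a Gaussian copula: draw correlated standard normals, convert them to uniforms with the normal CDF, and convert the uniforms to Gamma values with the Gamma quantile function. The base-R sketch below illustrates that general technique under these assumptions; it is not presented as the internal algorithm of generate_mvGamma_data(), and the names n_obs, z_mat, u_mat, and gamma_mat are hypothetical.

```r
set.seed(2024)

n_obs     <- 5000
corr_mat  <- matrix(c(1, 0.5, 0.5, 1), nrow = 2, ncol = 2)
shape_num <- c(2, 2)
rate_num  <- c(1, 1)

# Step 1: correlated standard normals via the Cholesky factor,
# where corr_mat = t(U) %*% U
U <- chol(corr_mat)
z_mat <- matrix(rnorm(n_obs * 2), nrow = n_obs, ncol = 2) %*% U

# Step 2: map each column to Uniform(0, 1) with the normal CDF
u_mat <- pnorm(z_mat)

# Step 3: map each uniform column to its Gamma marginal with the quantile function
gamma_mat <- sapply(seq_along(shape_num), function(j) {
  qgamma(u_mat[, j], shape = shape_num[j], rate = rate_num[j])
})

colMeans(gamma_mat)  # should be close to shape_num / rate_num
cor(gamma_mat)       # positive, but generally not equal to the normal-scale 0.5
```

The marginal distributions are exactly Gamma under this construction, while the Pearson correlation of the transformed variables generally differs somewhat from the correlation specified on the normal scale.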

Visualizing the Skewed Gamma Data

Let’s plot the density of the first two variables of the generated skewed Gamma data.

# Plot the densities of the first and second variables of the Gamma data
# Save the current panel layout so that it can be restored afterwards
old_par_mfrow <- par()[["mfrow"]]
par(mfrow = c(2, 1))
plot(density(gamma_samples[, 1]), main = "Gamma Variable 1", col = "blue")
plot(density(gamma_samples[, 2]), main = "Gamma Variable 2", col = "blue")

# Restore the original panel layout
par(mfrow = old_par_mfrow)

4. Estimate Multivariate Skewed Gamma Parameters (Using an Existing Dataset)

If we have an existing dataset, we can estimate the parameters of the multivariate skewed Gamma distribution using methods such as Method of Moments (MoM) or Generalized Maximum Likelihood Estimation (gMLE).

Example: Estimating Skewed Gamma Parameters Using MoM

myData <- data.frame(
   GENDER = c('Male', 'Female', 'Male', 'Female', 'Male', 'Female'),
   VALUE1 = c(1.2, 2.3, 1.5, 2.7, 1.35, 2.5),
   VALUE2 = c(3.4, 4.5, 3.8, 4.2, 3.6, 4.35)
)
calculate_stats_gamma(data_df = myData, group_col= "GENDER", using = "MoM")
$Female
$Female$sampSize
[1] 3

$Female$mean_vec
VALUE1 VALUE2 
  2.50   4.35 

$Female$sampCorr_mat
       VALUE1 VALUE2
VALUE1      1     -1
VALUE2     -1      1

$Female$shape_num
VALUE1 VALUE2 
156.25 841.00 

$Female$rate_num
  VALUE1   VALUE2 
 62.5000 193.3333 


$Male
$Male$sampSize
[1] 3

$Male$mean_vec
VALUE1 VALUE2 
  1.35   3.60 

$Male$sampCorr_mat
       VALUE1 VALUE2
VALUE1      1      1
VALUE2      1      1

$Male$shape_num
VALUE1 VALUE2 
    81    324 

$Male$rate_num
VALUE1 VALUE2 
    60     90 

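Output Explanation

The returned list has one element per group, and each element contains:

  - sampSize: the number of observations in the group.

  - mean_vec: the sample mean of each variable.

  - sampCorr_mat: the sample correlation matrix.

  - shape_num: the estimated Gamma shape parameter of each variable.

  - rate_num: the estimated Gamma rate parameter of each variable.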
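
For a Gamma distribution with shape a and rate b, the mean is a / b and the variance is a / b^2, so the standard method-of-moments estimates are shape = mean^2 / variance and rate = mean / variance. The short sketch below applies these textbook formulas to the Female VALUE1 observations and arrives at the shape and rate printed above (156.25 and 62.5). That calculate_stats_gamma() computes them in exactly this way, with the n - 1 sample variance, is an inference from the output rather than something stated in this vignette; the variable names are hypothetical.

```r
# Female observations of VALUE1 from the example dataset
female_value1 <- c(2.3, 2.7, 2.5)

x_mean <- mean(female_value1)
x_var  <- var(female_value1)  # sample variance with the n - 1 denominator

# Method-of-moments estimates for the Gamma shape and rate
shape_hat <- x_mean^2 / x_var
rate_hat  <- x_mean / x_var

c(shape = shape_hat, rate = rate_hat)  # approximately 156.25 and 62.5
```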

Conclusion

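In this vignette, we demonstrated how to:

  1. Simulate multivariate normal data by group with simulate_group_data() and MASS::mvrnorm().

  2. Estimate group-wise multivariate moments from an existing dataset with calculate_stats_gaussian().

  3. Simulate multivariate skewed Gamma data by group with simulate_group_data() and generate_mvGamma_data().

  4. Estimate group-wise skewed Gamma parameters (shape and rate) from an existing dataset with calculate_stats_gamma().

These methods let you generate and analyze synthetic multivariate datasets whose properties are set by pre-estimated statistics or estimated from available data. This is useful for simulation studies and statistical analysis in domains such as finance, healthcare, and engineering.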