README

The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

defm: Fit and simulate discrete binary exponential family models

Discrete exponential family models (DEFM) have a long tradition with extensive development rooted in exponential random graph models (ERGMs.) Applicable to any form o data that can be represented as binary arrays, DEFMs provide a way to model jointly distributed binary variables.

This package, built on top of the C++ library barry, provides a computationally efficient implementation of this family of models.

Installation

remotes::install_github("UofUEpi/defm")

Examples

Example 1: Four outcomes

In this example, we will simulate a dataset that contains 1,000 individuals with four different outcomes. The outcomes, 0/1 vectors, will be modeled as Markov processes of order one. Future states of this 0/1 vector are a function of the previous point in time. The following lines simulate the baseline data:

library(defm)
#> Loading required package: stats4

# Simulation parameters
set.seed(1231)
n   <- 5000L # Count of individuals
n_y <- 3L    # Number of dependent variables
n_x <- 2L    # Number of independent variables

# Simulating how many observations we will have per individuals
n_reps <- sample(3:10, n, replace = TRUE)

# Final number of rows in the data
n_t <- sum(n_reps)

# Simulating the data
Y <- matrix(as.integer(runif(n_y * n_t) < .1), ncol = n_y)
colnames(Y) <- paste0("y", 1:n_y - 1)
X <- matrix(rnorm(n_x * n_t), ncol = n_x)
colnames(X) <- paste0("x", 1:n_x - 1L)
id <- rep(1L:n, n_reps)
time <- unlist(sapply(n_reps, \(x) 1:x))

Here is a brief look at the data structure. Remember, we still have not actually simulated data WITH THE MODEL.

id	time	y0	y1	y2	x0	x1
1	1	0	0	0	0.51	0.95
1	2	0	0	0	0.16	0.25
1	3	0	0	0	1.20	-1.72
1	4	0	1	0	-0.20	1.55
2	1	0	1	0	-0.15	-0.68
2	2	1	0	0	1.19	0.92
2	3	0	0	1	-0.65	1.16
2	4	0	0	0	-0.99	0.21
2	5	0	0	0	0.76	1.45
2	6	0	0	0	-0.68	-1.34

In defm, transition statistics can be represented using matrices. In this case, the transition can be written as:

> matrix(c(1,NA_integer_,NA_integer_,1,1,NA_integer_), nrow = 2, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1   NA   NA
# [2,]    1    1   NA

The NA entries in the matrix mean that those can be either zero or one. In other words, only values different from NA will be considered for specifying the terms. Here is the factory function:

# Creating the model and adding a couple of terms
build_model <- function(id., Y., X., order. = 1, par. = par.) {
  
  # Mapping the data to the C++ wrapper
  d_model. <- new_defm(id., Y., X., order = order.)

  # Adding the model terms
  td_ones(d_model.)
  td_ones(d_model., covar = "x1")
  
  transition <- matrix(NA_integer_, nrow = order. + 1, ncol = ncol(Y.))
  transition[c(1,2,4)] <- 1
  
  td_generic(d_model., transition)
  
  # Initializing the model
  init_defm(d_model.)
  
  # Returning
  d_model.
  
}

With this factory function, we will use it to simulate some data with the same dimensions of the original dataset. In this case, the parameters used for the simulation will be:

sim_par <- c(-2, 2, 5)
d_model <- build_model(id, Y, X, order = 1L, par. = sim_par)
simulated_Y <- sim_defm(d_model, sim_par)
head(cbind(id, simulated_Y))
#>      id y0 y1 y2
#> [1,]  1  0  0  0
#> [2,]  1  0  0  1
#> [3,]  1  0  0  0
#> [4,]  1  1  1  1
#> [5,]  2  0  1  0
#> [6,]  2  1  1  0

colnames(simulated_Y) <- paste0("y", 1:n_y - 1)
d_model_sim <- build_model(id, simulated_Y, X, order = 1, par. = sim_par)
ans <- defm_mle(d_model_sim)

summary(ans)
#> Maximum likelihood estimation
#> 
#> Call:
#> stats4::mle(minuslogl = minuslog, start = start, method = "L-BFGS-B", 
#>     nobs = nrow_defm(object) + ifelse(morder_defm(object) > 0, 
#>         -nobs_defm(object), 0L), lower = lower, upper = upper)
#> 
#> Coefficients:
#>                         Estimate Std. Error
#> Num. of ones           -2.009506 0.01435063
#> Num. of ones x x1       2.020209 0.01592262
#> Motif {y0+}>{y0+, y1+}  5.051076 0.04649922
#> 
#> -2 log L: 54960.47

Statistical models
	Model 1
Num. of ones	-2.01 (0.01)^***
Num. of ones x x1	2.02 (0.02)^***
Motif {y0+}>{y0+, y1+}	5.05 (0.05)^***
AIC	54966.47
BIC	54991.16
N	27777
^*p < 0.001; ^p < 0.01; ^*p < 0.05

id	y0	y1	y2	x0	x1	Num. of ones	Num. of ones x x1	Motif {y0+}>{y0+, y1+}
1	0	0	0	0.51	0.95	NA	NA	NA
1	0	0	1	0.16	0.25	1	0.25	0
1	0	0	0	1.20	-1.72	0	0.00	0
1	1	1	1	-0.20	1.55	3	4.66	0
2	0	1	0	-0.15	-0.68	NA	NA	NA
2	1	1	0	1.19	0.92	2	1.84	0
2	1	1	1	-0.65	1.16	3	3.48	1
2	1	1	0	-0.99	0.21	2	0.43	1
2	1	1	1	0.76	1.45	3	4.35	1
2	0	0	0	-0.68	-1.34	0	0.00	0

Finally, we can also take a look at the distribution of the log odds. We calculate this by looking at changes in a single entry of the array. For example, the log-odds of having

, which are equivalent to

lo <- logodds(d_model_sim, coef(ans), 1,  2) 
hist(lo, main = "Distribution of Log-odds for term Y(1,2)",
     col="tomato", border = "transparent", breaks = 100)
abline(v=0, lwd = 2, lty = 2, col = "steelblue")

Example 2: A fun model

For fun, imagine that we want to describe a process in which an individual moves sequentially through a set of states. In this example, there are ten different y variables, but the person can only have one of them as active (equal to one.) We can simulate such data using DEFM.

We first need to generate the baseline data for the simulation. This involves creating a matrix of size 20x10 (so we have 20 time points,) filled with zeros in all but the first entry:

n   <- 20L
n_y <- 10L
id <- rep(1L, n)
Y <- matrix(0L, nrow = n, ncol = n_y)
Y[1] <- 1L
X <- matrix(0.0, nrow = n, ncol = 1)

With the data in hand, we can now simulate the process. First, we need to build the model. The key component of the model will be the transition matrices:

# Creating a new instance of a DEFM object
colnames(Y) <- paste0("y", 0:(n_y - 1))
colnames(X) <- "x0"
d_model <- new_defm(id = id, Y = Y, X = X, order = 1)

# Creating the transition terms, these
for (i in (1:(n_y - 1) - 1)) {
  transition <- matrix(NA_integer_, nrow = 2, ncol = n_y)
  transition[c(1:4) + 2 * i] <- c(1,0,0,1)
  td_generic(d_model, transition)
}

# Here is the last transition term
transition <- matrix(NA_integer_, nrow = 2, ncol = n_y)
transition[c(n_y * 2 - 1, n_y * 2, 1, 2)] <- c(1,0,0,1)
td_generic(d_model, transition)

# Adding a term of ones
td_ones(d_model)

# Initializing and simulating
init_defm(d_model)
set.seed(33)
(Y_sim <-sim_defm(d_model, par = c(rep(200, n_y), -5)))
#>       y0 y1 y2 y3 y4 y5 y6 y7 y8 y9
#>  [1,]  1  0  0  0  0  0  0  0  0  0
#>  [2,]  0  1  0  0  0  0  0  0  0  0
#>  [3,]  0  0  1  0  0  0  0  0  0  0
#>  [4,]  0  0  0  1  0  0  0  0  0  0
#>  [5,]  0  0  0  0  1  0  0  0  0  0
#>  [6,]  0  0  0  0  0  1  0  0  0  0
#>  [7,]  0  0  0  0  0  0  1  0  0  0
#>  [8,]  0  0  0  0  0  0  0  1  0  0
#>  [9,]  0  0  0  0  0  0  0  0  1  0
#> [10,]  0  0  0  0  0  0  0  0  0  1
#> [11,]  1  0  0  0  0  0  0  0  0  0
#> [12,]  0  1  0  0  0  0  0  0  0  0
#> [13,]  0  0  0  0  0  0  0  0  0  0
#> [14,]  0  0  0  0  0  0  0  0  0  0
#> [15,]  0  0  0  0  0  0  0  0  0  0
#> [16,]  0  0  0  0  0  0  0  0  0  0
#> [17,]  0  0  0  0  0  0  0  0  0  0
#> [18,]  0  0  0  0  0  0  0  0  0  0
#> [19,]  0  0  0  0  0  0  1  0  0  0
#> [20,]  0  0  0  0  0  0  0  0  0  0

Example 3: Using formulas for transitions

In this example, we will redo the previous model, but now use formulas for specifying the transitions:

d_model_formula <- new_defm(id = id, Y = Y, X = X, order = 1)

# We can use text formulas to add transition terms
d_model_formula <- d_model_formula + 
  "{y0, 0y1} > {0y0, y1}" +
  "{y1, 0y2} > {0y1, y2}" +
  "{y2, 0y3} > {0y2, y3}" +
  "{y3, 0y4} > {0y3, y4}" +
  "{y4, 0y5} > {0y4, y5}" +
  "{y5, 0y6} > {0y5, y6}" +
  "{y6, 0y7} > {0y6, y7}" +
  "{y7, 0y8} > {0y7, y8}" +
  "{y8, 0y9} > {0y8, y9}" +
  "{0y0, y9} > {y0, 0y9}" 

d_model_formula |>
  td_ones()

init_defm(d_model_formula)

# Inspecting
d_model_formula
#> Num. of Arrays       : 19
#> Support size         : 2
#> Support size range   : [11, 20]
#> Arrays in powerset   : 2048
#> Transform. Fun.      : no
#> Model terms (11)    :
#>  - Motif {y0+, y1-}>{y0-, y1+}
#>  - Motif {y1+, y2-}>{y1-, y2+}
#>  - Motif {y2+, y3-}>{y2-, y3+}
#>  - Motif {y3+, y4-}>{y3-, y4+}
#>  - Motif {y4+, y5-}>{y4-, y5+}
#>  - Motif {y5+, y6-}>{y5-, y6+}
#>  - Motif {y6+, y7-}>{y6-, y7+}
#>  - Motif {y7+, y8-}>{y7-, y8+}
#>  - Motif {y8+, y9-}>{y8-, y9+}
#>  - Motif {y0-, y9+}>{y0+, y9-}
#>  - Num. of ones
#> Model rules (1)     :
#>  - Markov model of order 1
#> Model Y variables (10):
#>   0) y0
#>   1) y1
#>   2) y2
#>   3) y3
#>   4) y4
#>   5) y5
#>   6) y6
#>   7) y7
#>   8) y8
#>   9) y9

# Simulating
set.seed(33)
(Y_sim_formula <- sim_defm(d_model_formula, par = c(rep(200, n_y), -5)))
#>       y0 y1 y2 y3 y4 y5 y6 y7 y8 y9
#>  [1,]  1  0  0  0  0  0  0  0  0  0
#>  [2,]  0  1  0  0  0  0  0  0  0  0
#>  [3,]  0  0  1  0  0  0  0  0  0  0
#>  [4,]  0  0  0  1  0  0  0  0  0  0
#>  [5,]  0  0  0  0  1  0  0  0  0  0
#>  [6,]  0  0  0  0  0  1  0  0  0  0
#>  [7,]  0  0  0  0  0  0  1  0  0  0
#>  [8,]  0  0  0  0  0  0  0  1  0  0
#>  [9,]  0  0  0  0  0  0  0  0  1  0
#> [10,]  0  0  0  0  0  0  0  0  0  1
#> [11,]  1  0  0  0  0  0  0  0  0  0
#> [12,]  0  1  0  0  0  0  0  0  0  0
#> [13,]  0  0  0  0  0  0  0  0  0  0
#> [14,]  0  0  0  0  0  0  0  0  0  0
#> [15,]  0  0  0  0  0  0  0  0  0  0
#> [16,]  0  0  0  0  0  0  0  0  0  0
#> [17,]  0  0  0  0  0  0  0  0  0  0
#> [18,]  0  0  0  0  0  0  0  0  0  0
#> [19,]  0  0  0  0  0  0  1  0  0  0
#> [20,]  0  0  0  0  0  0  0  0  0  0

Acknowledgement

This work was supported by the Assistant Secretary of Defense for Health Affairs endorsed by the Department of Defense, through the Psychological Health/Traumatic Brain Injury Research Program Long-Term Impact of Military-Relevant Brain Injury Consortium (LIMBIC) Award/W81XWH-18-PH/TBIRP-LIMBIC under Award No. I01 RX003443. The U.S. Army Medical Research Acquisition Activity, 839 Chandler Street, Fort Detrick MD 21702-5014 is the awarding and administering acquisition office. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the Department of Defense. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the U.S. Government, the U.S. Department of Veterans Affairs or the Department of Defense and no official endorsement should be inferred.

The package is part of the ORION project, supported by the U.S. Army Medical Research Acquisition Activity Award, Contact #W81XWH1910615. This study was approved by the University of Utah Institutional Review Board (IRB #137948) and at each of the military treatment facility sites.

Code of Conduct

Please note that the defm project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.