library(delaydiscount)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.3.3
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, unionThe simulate_dataset function can be used to simulate
data from the hierarchical linearized hyperbolic model.
The hierarchical linearized hyperboloid model works as follows.
The hyperbolic model for the discounting curve is \(D(t) = \frac{1}{1+kt}\). However, actual discounting rates reported by subjects may not fit this curve. Actual discounting proportions are assumed to be between 0 and 1, exclusive. They are transformed from the \((0,1)\) scale to a \((-\infty, \infty)\) scale by the function \(\phi(x) = \log(\frac{1}{x}-1)\).
The model discounting curve with the transformation applied is \(\log(\frac{1}{D(t)} - 1) = \log(k) + \log(t)\). The transformed curve provides a mean structure for the data, conditional on the subject’s \(\log(k)\) parameter.
We assume that the transformed observed indifference point \(\tilde D(t_{ijc})\) will follow the model
\(\log(\frac{1}{\tilde D(t_{ijc})} - 1) = \log(k_{ic}) + \log(t_{ijc}) + \epsilon_{ijc},\)
where \(\epsilon_{ijc} \sim N(0, \sigma^2)\), \(i\) is an index for subject, \(j\) an index for time point, and \(c\) is an index for group/condition.
\(\log(k_{ic}) \sim N(\theta_c, \frac{g\sigma^2}{T})\), where \(\theta_c\) is the population mean of the \(\log(k_{ic})\) for subjects in group \(c\), and \(T\) is the number of time points observed for each subject.
The model is hierarchical, as the \(\log(k_{ic})\) parameter for each subject within a group is assumed to be a realization from a normal distribution with some population mean \(\theta_c\) for its subjects, and variance \(\frac{g\sigma^2}{T}\), i.e. the variance is constant across subjects, across groups.
A call to simulate_dataset is shown below.
set.seed(8678)
sim_data <- simulate_dataset(groups = c("EFT", "HIT", "NCC"),
num_subj = c(120, 142, 172),
time_points = c(30, 90, 180, 365, 1095, 1825, 3650),
mean_ln_k = c(-6.877, -5.861, -6.078),
sigma_sq = 1.978, g = 10.407)The groups vector gives the name of the groups in the
simulated study.
The num_subj vector gives the sample size for each
group.
The time_points vector gives the surveyed delay
lengths.
The mean_ln_k vector gives the group mean
hyperparameters.
The value sigma_sq gives the parameter \(\sigma^2\), which represents the
conditional variance of an observed transformed indifference point \(\log(\frac{1}{\tilde D(t_{ijc})} - 1)\)
given the subject’s \(\log(k_{ic})\)
parameter.
The value g gives the ratio of the variance subject
\(\log(k_{ic})\) parameter to the
conditional variance of the least squares (on the transformed scale)
estimator of a subject’s \(\log(k_{ic})\) parameter given the true
\(\log(k_{ic})\). The result is that
the true \(\log(k_{ic})\) parameters
have variance \(\frac{g\sigma^2}{T}\).
(Recall \(T\) is the length of the
time_points vector.)
The data frame output by sim_data can be analyzed using
our analysis methods. In particular, prepare_data_frame and
jb_rule_check are ready to be run on the data frame.
However, dd_hyperbolic_model and
get_subj_est_ln_k will still need to be run on the output
from prepare_data_frame.
# These methods can be run on the output from simulate_dataset
prep_data <- prepare_data_frame(sim_data)
rule_check_results <- jb_rule_check(sim_data)
# These methods need to be run on the output from prepare_data_frame
subj_est_ln_k <- get_subj_est_ln_k(prep_data)
sim_model <- dd_hyperbolic_model(prep_data)The output from simulate_dataset has one observation per
simulated subject per delay for which an indifference point was
observed. It has columns subj, true_ln_k,
group, delay, and indiff.
The columns other than true_ln_k are necessary to run
the analysis function on the dataset. The subj column
identifies the subject, the group column identifies the
group, the delay column identifies the length of time for
which the observed indifference point is measured, and the
indiff column contains the subject’s indifference point for
the delay expressed as a proportion of the maximum reward.
The true_ln_k column contains the true \(\log(k_{ic})\) parameter of the subject
which the observation corresponds to. This can be used to compare
estimates of the subject \(\log(k_{ic})\) parameters with the true
values.
For example, the following code produces a plot of the true vs estimated subject \(\log(k_{ic})\) parameters, with points colored by rule check result.
# Get a data frame consisting of one observation per subject, containing each
# subject's true ln(k)
subj_true_ln_k <- sim_data %>%
select(subj, true_ln_k, group) %>%
unique()
# Merge the data frames containing the true subject ln_k values, estimated
# subject ln_k values, and rule check results.
subj_ln_k_comp <- merge(subj_est_ln_k, subj_true_ln_k)
subj_ln_k_comp_wrc <- merge(subj_ln_k_comp, rule_check_results)
# Get the overall rule check result for each subject
subj_ln_k_comp_wrc$rc_pass <- subj_ln_k_comp_wrc$C1 & subj_ln_k_comp_wrc$C2
# Make a plot showing true vs estimated ln(k) for each subject,
# coloring the points based on whether or not the corresponding subject
# passed the rule check.
plot(x = subj_ln_k_comp_wrc$true_ln_k, y = subj_ln_k_comp_wrc$ln_k,
col = ifelse(subj_ln_k_comp_wrc$rc_pass, "green", "blue"),
main = "Subject True vs Maximum Likelihood Estimated ln(k)",
xlab = "True subject ln(k)", ylab = "Estimated subject ln(k)")
abline(a=0, b=1)
legend("topleft",
legend = c(
"Passes rule check",
"Fails rule check"),
col = c("green", "blue"), pch = 1)