‘reslife’ Package Demo

Load in the Package

library(reslife)
#> Loading required package: pracma
#> Loading required package: flexsurv
#> Loading required package: survival
#> Loading required package: gsl
#> 
#> Attaching package: 'gsl'
#> The following objects are masked from 'package:pracma':
#> 
#>     Ci, Si, erf, erfc, eta, expint_E1, expint_Ei, fact, hypot, psi,
#>     sinc, zeta

Demonstration of the ‘reslifefsr’ function

This function exclusively uses flexsurvreg() objects as the source of the data. We will show how to use the function below. First, we’ll start off with a simple example and show how the mean produced is equal to the integration.

We run a flexsurvreg() with a single covariate for the ‘gengamma.orig’ distribution. We then run the reslifefsr() function, using the flexsurvreg() object. We have a life of 1, which is a scalar value at which we calculate the residual life, and specify that we only want the mean.

library(flexsurv)
fs1 <- flexsurvreg(Surv(recyrs, censrec) ~ 1, data = bc,dist = "gengamma.orig")
r <- reslifefsr(obj = fs1, life = 1, type = 'mean')
r
#> [1] 7.194142

We can verify the accuracy of our derivation using the integrate function.


b <- exp(as.numeric(fs1$coefficients[1]))
a <- exp(as.numeric(fs1$coefficients[2]))
k <- exp(as.numeric(fs1$coefficients[3]))

# numerical integration
f <- function(x) {
  return(pgengamma.orig(x, shape = b, scale = a, k = k, lower.tail = FALSE))
}

mx_interal <- integrate(f, 1, Inf)$value/pgengamma.orig(1, shape = b, scale = a, k = k, lower.tail = FALSE)
mx_interal
#> [1] 7.194142

As you can see, the mean values are the same.

Demonstration with multiple covariates

What if the flexsurvreg() object has multiple covariates? We will see this below. We will use the ‘bc’ dataset contained within ‘flexsurv’ with an added covariate ‘age’.

data(bc)
newbc <- bc
newbc$age <- rnorm(dim(bc)[1], mean = 65-scale(newbc$recyrs, scale=FALSE),sd = 5)

Generating a flexsurvreg object:

fsr2 =  flexsurvreg(Surv(recyrs, censrec) ~ group+age, data=newbc, dist = 'weibull')

We can now use the reslifefsr() function to generate residual life values. There are 686 values, since the length of the input is 686. We are only going to show 10, in order to save space.

head(reslifefsr(obj = fsr2, life= 4), 10)
#>  [1] 12.132845 12.049737 11.203199 11.069652  9.281034  7.688503  6.642161
#>  [8]  6.013555  6.614462  6.883661

If we want to look at the median, we can specify that.

head(reslifefsr(obj = fsr2,life= 4, type = 'median'),10)
#>  [1] 9.953571 9.882971 9.164317 9.051030 7.536493 6.193590 5.315346 4.789800
#>  [9] 5.292152 5.517696

We can look at all 3 types now (mean, median, and percentile).

head(reslifefsr(obj = fsr2, life= 4, p = .8, type = 'all'),10)
#>         mean   median percentile
#> 1  12.132845 9.953571  19.276992
#> 2  12.049737 9.882971  19.147244
#> 3  11.203199 9.164317  17.824679
#> 4  11.069652 9.051030  17.615869
#> 5   9.281034 7.536493  14.814315
#> 6   7.688503 6.193590  12.310697
#> 7   6.642161 5.315346  10.659819
#> 8   6.013555 4.789800   9.665322
#> 9   6.614462 5.292152  10.616041
#> 10  6.883661 5.517696  11.041322

Notice how when ‘all’ is used, the output is a dataframe with the name of each value as the column names.

But what if we want to supply our own data and make predictions? Since the ‘flexsurv’ regression object generates parameters, we can add our own data points and get the residual life values for each.

Let’s start by generating some data. This data has the same two levels as ‘fsr2’ above (group, age). Note that the data is placed in a data frame. The new data must be inputted as a data frame for the function to work.

group = c("Medium", 'Good', "Poor")
age = c(43, 35, 39)
newdata = data.frame(age, group)

We run that through the reslifefsr() function and get a 3 row data frame

reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata)
#>        mean    median percentile
#> 1 14.478463 11.949026   14.88343
#> 2 42.297923 35.756944   43.86586
#> 3  9.985935  8.132717   10.20265

Now we can show off some features of the function, specifically how it handles different input data.

Lets change the order of the variables in the input data and rerun. It will show the exact same results as above.

newdata = data.frame(group, age)
reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata)
#>        mean    median percentile
#> 1 14.478463 11.949026   14.88343
#> 2 42.297923 35.756944   43.86586
#> 3  9.985935  8.132717   10.20265

This is a handy feature of the function that can enhance the user experience. Another feature of this is the ability to handle data with extra columns. Let’s show that feature below.

extra = c(100, 100, 100)
newdata2 = data.frame(age, group, extra)
reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata2)
#>        mean    median percentile
#> 1 14.478463 11.949026   14.88343
#> 2 42.297923 35.756944   43.86586
#> 3  9.985935  8.132717   10.20265

The extra column ‘extra’ is ignored by the function.

Are there scenarios where the package would return an error? Yes, in fact, there are a few.

The first is if not enough data is provided, which is shown below.

newdata3 = data.frame(group)
reslifefsr(obj = fsr2, life = 4,p = .6, type = 'all', newdata= newdata3)
#> Warning in if (fsoutput$covdata$covnames != colnames(newdata)) {: the condition
#> has length > 1 and only the first element will be used
#>        mean   median percentile
#> 1 156.02073 133.3303   162.1598
#> 2 278.00800 238.0170   288.9824
#> 3  90.03585  76.7077    93.5399

Since only ‘group’ is provided, predictions cannot be generated. The function needs values for ‘age’ in order to make predictions.

Another scenario could happen with factor data. The levels for ‘group’ are ‘Good’, ‘Medium’, and ‘Poor’. But what if a level was inputted that isn’t one of those three? Let’s see.

group = c("Medium", 'Good', "Terrible")
newdata = data.frame(age, group)
reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata)
#> Error in weibull_mlr(obj, life, p, type, newdata): Incorrect Level Entered

Since ‘Terrible’ is not a valid level, the function errors and returns a message notifying the user of this error.

It is also possible to plot the mean residual life for a set of values from a ‘flexsurv’ object.

We take the first line from newbc and create a blank vector of length 10. We then loop the function to create the MRLs for the first 10 integers, which gives us the expected survival time given the patient has survived y years.

We can then plot these values.

newdata4 = newbc[1,]
mrl = rep(0,10)
for (i in 1:10){
  mrl[i] = reslifefsr(obj = fsr2,life= i,p = .6, type = 'mean', newdata= newdata4)
}
plot(x=c(1:10), y=mrl,type="l", xlab = 'Years Survived', ylab = 'Mean Residual Life',  main = 'MRL over Time')

That is an overview of how to use the flexsurvreg() dependent function. Next, we will show how to get residual life values by inputting your own parameters.

Demonstration of the ‘residlife’ function

This function works similarly to the reslifefsr() function, except it doesn’t rely on a flexsurvreg() input. Instead, the user inputs the name of the distribution and the parameters. The distribution must be one contained within ‘flexsurv’ (‘gamma’, ‘gengamma.orig’, ‘weibull’, ‘gompertz’, ‘exponential’, ‘lnorm’, ‘llogis’, ‘gengamma’, ‘genf’, ‘genf.orig’), and the names of the parameters must match those specified for each distribution.

We will show a demonstration below.

The inputs for the function are simple. Values is the first argument, which are the values for which the residual life will be calculated. Values is usually a vector or a seq function. The next argument is the distribution name, ‘weibull’. The last argument are the parameters, inputted as a vector. The names, shape and scale, are specified in the vector.

For the examples in this vignette, we will use the output from a flexsurvreg() using the bc dataset.

fsr3 = flexsurvreg(Surv(recyrs, censrec) ~1, data=newbc, dist = 'weibull')
fsr3$res
#>            est     L95%     U95%         se
#> shape 1.271519 1.153372 1.401770 0.06326773
#> scale 6.191377 5.604203 6.840071 0.31475730

We use a sequence of values from 1 to 10, counting up by .5, for this example.

residlife(values = seq(1,10,.5), distribution = 'weibull', parameters= c(shape= 1.272, scale = 6.191))
#>  [1] 5.280618 5.125825 4.994025 4.878801 4.776287 4.683907 4.599837 4.522725
#>  [9] 4.451537 4.385457 4.323832 4.266128 4.211904 4.160787 4.112464 4.066666
#> [17] 4.023161 3.981746 3.942246

What if the parameter names are incorrect? The function will return an error.

residlife(values = seq(1,10,.5), distribution = 'weibull', parameters= c(shape= 1.272, not_scale = 6.191))
#> Error in residlife(values = seq(1, 10, 0.5), distribution = "weibull", : incorrect parameters entered. Parameters for weibull are shape and scale

For distributions where there are different parameterizations, the function can be flexible.

For example, ‘gamma’ has both shape/rate and shape/scale parameterizations. The function works for both, as long as the parameters are correctly specified in the input.

residlife(values = seq(1,10,.5), distribution = 'gamma', parameters= c(shape= 1.272, scale = 6.191))
#>  [1] 7.498217 7.389908 7.302911 7.230487 7.168736 7.115159 7.068047 7.026172
#>  [9] 6.988622 6.954700 6.923859 6.895665 6.869766 6.845874 6.823747 6.803186
#> [17] 6.784020 6.766103 6.749311
residlife(values = seq(1,10,.5), distribution = 'gamma', parameters= c(shape= 1.272, rate = 1/6.191))
#>  [1] 7.498217 7.389908 7.302911 7.230487 7.168736 7.115159 7.068047 7.026172
#>  [9] 6.988622 6.954700 6.923859 6.895665 6.869766 6.845874 6.823747 6.803186
#> [17] 6.784020 6.766103 6.749311

Much like the reslifefsr() function, the reslife() function has the ability to show mean, median, percentile, or all three.

residlife(values = seq(1,10,.5), distribution = 'weibull', parameters= c(shape= 1.272, scale = 6.191), p = .7, type = 'all')
#>        mean   median percentile
#> 1  5.280618 4.151524   6.619995
#> 2  5.125825 3.988283   6.423757
#> 3  4.994025 3.851217   6.253251
#> 4  4.878801 3.733262   6.102230
#> 5  4.776287 3.629997   5.966703
#> 6  4.683907 3.538406   5.843883
#> 7  4.599837 3.456318   5.731711
#> 8  4.522725 3.382114   5.628606
#> 9  4.451537 3.314545   5.533322
#> 10 4.385457 3.252635   5.444853
#> 11 4.323832 3.195598   5.362376
#> 12 4.266128 3.142799   5.285206
#> 13 4.211904 3.093714   5.212767
#> 14 4.160787 3.047908   5.144570
#> 15 4.112464 3.005015   5.080196
#> 16 4.066666 2.964725   5.019284
#> 17 4.023161 2.926773   4.961518
#> 18 3.981746 2.890929   4.906624
#> 19 3.942246 2.856997   4.854361

The outputs of this function can then easily be taken and used in a plot or any other type of analysis tool.

Let’s look at an example of such a plot.

Say we want to plot the mean residual life over time. First, we would create a vector for time, run it through the residlife() function with type = ‘mean’, and plot the output against time.

After extracting the parameters, we can run residlife() and generate an output vector. Note that since residlife() can take a vector input, we don’t have to use a loop.

time= seq(1,10,.5)
life = residlife(values = time, distribution = 'weibull',parameters= c(shape= 1.272, scale = 6.191))
plot(time, life, xlab = 'Years Survived', ylab = 'Mean Residual Life', main = 'MRL over Time', type="l")