psrwe: Propensity Score-Integrated Methods for Incorporating Real-World Evidence in Clinical Studies

Chenguang Wang

2026-01-10

Introduction

In the R package psrwe, we implement a series of approaches for leveraging real-world evidence in clinical study design and analysis.

Propensity score estimation

The approaches implemented in psrwe are mostly based on propensity score (PS) adjustment. Propensity scores are estimated with the function psrwe_est, which in the example below fits a logistic model on covariates V1 to V7 and forms five PS strata.

library(psrwe)
data(ex_dta)
dta_ps <- psrwe_est(ex_dta,
                     v_covs = paste("V", 1:7, sep = ""),
                     v_grp = "Group",
                     cur_grp_level = "current",
                     nstrata = 5,
                     ps_method = "logistic")
dta_ps

## This is a sing-arm study. A total of 1031 RWD subjects and 
## 200 current study subjects are used to estimate propensity 
## scores by logistic model. A total of 5 RWD subjects are 
## trimmed (trim_ab=both) and excluded from the final 
## analysis. (PS values of 5 and 0 RWD are below and above 
## current study) The following covariates are adjusted in the 
## propensity score model: V1, V2, V3, V4, V5, V6, V7.
## 
## The following table summarizes the number of subjects in 
## each stratum, and the distance in PS distributions 
## calculated by overlapping area:
## 
##     Stratum N_RWD N_Current  Distance
## 1 Stratum 1   729        40 0.5608752
## 2 Stratum 2   156        40 0.7201481
## 3 Stratum 3    78        40 0.8035261
## 4 Stratum 4    50        40 0.8097207
## 5 Stratum 5    13        40 0.7952280
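To make the stratification step concrete, the following base R sketch mimics the general idea on simulated data: fit a logistic model for membership in the current study, cut the current-study PS values into five quantile-based strata, and treat RWD subjects whose PS falls outside the current-study range as trimmed. The simulated data, the column names and the trimming rule are illustrative assumptions and are not taken from the internals of psrwe_est.

```r
# Simulated data (hypothetical): 500 RWD subjects and 100 current-study subjects
set.seed(1)
n_rwd <- 500
n_cur <- 100
dta <- data.frame(
  Group = rep(c("rwd", "current"), c(n_rwd, n_cur)),
  V1    = c(rnorm(n_rwd, 0), rnorm(n_cur, 0.5)),
  V2    = c(rbinom(n_rwd, 1, 0.4), rbinom(n_cur, 1, 0.6))
)
dta$is_cur <- as.integer(dta$Group == "current")

# Propensity score: probability of belonging to the current study
fit <- glm(is_cur ~ V1 + V2, data = dta, family = binomial())
dta$ps <- fitted(fit)

# Strata are defined by quantiles of the current-study PS values
ps_cur <- dta$ps[dta$is_cur == 1]
cuts   <- quantile(ps_cur, probs = seq(0, 1, length.out = 6))
dta$stratum <- cut(dta$ps, breaks = cuts, include.lowest = TRUE,
                   labels = paste("Stratum", 1:5))

# Subjects outside the current-study PS range get NA and count as trimmed
n_trimmed  <- sum(is.na(dta$stratum))
strata_tab <- table(dta$stratum, dta$Group)
strata_tab
```

Because the cut points come from the current study, each stratum holds the same number of current-study subjects, which is the pattern visible in the N_Current column above.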

It is extremely important to evaluate the propensity score adjustment results. In psrwe, functions are provided to visualize the balance in covariate distributions and propensity score distributions based on propensity score stratification.

plot(dta_ps, plot_type = "balance")

plot(dta_ps, plot_type = "ps")
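The Distance column in the psrwe_est summary is described as an overlapping area. As a rough illustration of that kind of measure, the sketch below integrates the pointwise minimum of two kernel density estimates. The helper overlap_area and the simulated PS values are hypothetical, and psrwe may compute its distance differently.

```r
# Overlapping area of two kernel density estimates (hypothetical helper)
overlap_area <- function(x, y, n_grid = 512) {
  # Evaluate both densities on one common grid that covers both samples
  pad <- 3 * max(bw.nrd0(x), bw.nrd0(y))
  lo  <- min(x, y) - pad
  hi  <- max(x, y) + pad
  dx  <- density(x, from = lo, to = hi, n = n_grid)
  dy  <- density(y, from = lo, to = hi, n = n_grid)
  # Riemann sum of the pointwise minimum of the two curves
  step <- dx$x[2] - dx$x[1]
  sum(pmin(dx$y, dy$y)) * step
}

set.seed(2024)
ps_rwd <- rbeta(300, 2, 6)   # simulated PS values for RWD subjects
ps_cur <- rbeta(100, 3, 5)   # simulated PS values for current-study subjects

ovl_same  <- overlap_area(ps_cur, ps_cur)  # identical samples: close to 1
ovl_mixed <- overlap_area(ps_rwd, ps_cur)  # different samples: below 1
c(same = ovl_same, mixed = ovl_mixed)
```

A value near 1 indicates that the two PS distributions are nearly indistinguishable within a stratum, and a value near 0 indicates almost no common support.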

PS-integrated power prior approach for single arm studies

For single arm studies when there is one external data source, the function psrwe_powerp allows one to conduct the analysis proposed in Wang et al. (2019). The method uses propensity scores to pre-select a subset of real-world data containing patients that are similar to those in the current study in terms of covariates, and to stratify the selected patients together with those in the current study into more homogeneous strata. The power prior approach is then applied in each stratum to obtain stratum-specific posterior distributions, which are combined to complete the Bayesian inference for the parameters of interest.

ps_bor <- psrwe_borrow(dta_ps, total_borrow = 40,
                        method = "distance")
rst_pp <- psrwe_powerp(ps_bor, v_outcome = "Y_Bin",
                        outcome_type = "binary",
                        seed = 1234)

## 
## SAMPLING FOR MODEL 'powerpsbinary' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 4.3e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.43 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.149 seconds (Warm-up)
## Chain 1:                0.11 seconds (Sampling)
## Chain 1:                0.259 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'powerpsbinary' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 1.7e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.145 seconds (Warm-up)
## Chain 2:                0.106 seconds (Sampling)
## Chain 2:                0.251 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'powerpsbinary' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 1.6e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.14 seconds (Warm-up)
## Chain 3:                0.109 seconds (Sampling)
## Chain 3:                0.249 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'powerpsbinary' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 1.6e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.163 seconds (Warm-up)
## Chain 4:                0.118 seconds (Sampling)
## Chain 4:                0.281 seconds (Total)
## Chain 4:

Results can be further summarized as:

summary(rst_pp)

## $Overall
##         Type      Mean     StdErr
## Mean Control 0.3134685 0.02878146
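psrwe_powerp draws from the posterior with Stan, as the sampling log shows. For a binary outcome with a fixed discount parameter, the same idea can be illustrated with a conjugate Beta posterior in base R. The sketch below is a simplified stand-in, not the package's model: the per-stratum counts are hypothetical, and the Beta(1, 1) initial prior, the fixed discount and the weighting by current-study stratum size are assumptions made for illustration.

```r
set.seed(1234)

# Hypothetical per-stratum counts (not taken from ex_dta)
n_cur <- c(40, 40, 40, 40, 40)     # current-study subjects
y_cur <- c(10, 12, 14, 13, 15)     # current-study responders
n_rwd <- c(729, 156, 78, 50, 13)   # RWD subjects
y_rwd <- c(200, 50, 27, 16, 5)     # RWD responders
a_borrow <- rep(8, 5)              # nominal number of subjects to borrow

# Discount parameter: each RWD subject contributes a fraction alpha of a subject
alpha <- pmin(1, a_borrow / n_rwd)

# Stratum-specific posterior draws: Beta(1, 1) prior, RWD likelihood raised to alpha
n_draw <- 4000
draws <- sapply(seq_along(n_cur), function(s) {
  rbeta(n_draw,
        1 + y_cur[s] + alpha[s] * y_rwd[s],
        1 + (n_cur[s] - y_cur[s]) + alpha[s] * (n_rwd[s] - y_rwd[s]))
})

# Combine strata draw by draw, weighting by current-study stratum size
w <- n_cur / sum(n_cur)
theta_overall <- as.vector(draws %*% w)
pp_summary <- c(Mean = mean(theta_overall), StdErr = sd(theta_overall))
pp_summary
```

With alpha equal to 0 the RWD are ignored, and with alpha equal to 1 they are pooled with the current study at full weight; the borrowing step chooses values in between.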

PS-integrated composite likelihood approach for single arm studies

For single arm studies when there is one external data source, the function psrwe_compl allows one to conduct the analysis proposed in Wang et al. (2020). In this approach, within each propensity score stratum, a composite likelihood function is specified and used to down-weight the information contributed by the external data source. Estimates of the stratum-specific parameters are obtained by maximizing the composite likelihood function. These stratum-specific estimates are then combined to obtain an overall population-level estimate of the parameter of interest.

rst_cl <- psrwe_compl(ps_bor, v_outcome = "Y_Bin",
                       outcome_type = "binary")
summary(rst_cl)

## $Overall
##      Type      Mean     StdErr
## 1 Control 0.3057532 0.02786997
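The down-weighting can be made concrete for a single stratum with a binary outcome. In the sketch below, the log-likelihood contribution of each RWD subject is multiplied by a weight alpha, and the maximizer is compared with the closed-form weighted proportion. The simulated outcomes, the weight and the helper cl_loglik are hypothetical, and the sketch covers only the point estimate, not the standard error reported by psrwe_compl.

```r
# Weighted (composite) log-likelihood for a Bernoulli response rate theta
cl_loglik <- function(theta, y_cur, y_rwd, alpha) {
  sum(dbinom(y_cur, 1, theta, log = TRUE)) +
    alpha * sum(dbinom(y_rwd, 1, theta, log = TRUE))
}

set.seed(42)
y_cur <- rbinom(40, 1, 0.30)    # simulated current-study outcomes
y_rwd <- rbinom(150, 1, 0.35)   # simulated RWD outcomes
alpha <- min(1, 8 / length(y_rwd))  # borrow roughly 8 subjects' worth of information

# Numerical maximization over the open unit interval
fit <- optimize(cl_loglik, interval = c(1e-6, 1 - 1e-6),
                y_cur = y_cur, y_rwd = y_rwd, alpha = alpha,
                maximum = TRUE, tol = 1e-10)
theta_hat <- fit$maximum

# Closed form for this model: a weighted proportion of responders
theta_closed <- (sum(y_cur) + alpha * sum(y_rwd)) /
  (length(y_cur) + alpha * length(y_rwd))

c(numerical = theta_hat, closed_form = theta_closed)
```

The closed form shows that the estimate is a compromise between the current-study response rate and the RWD response rate, with the RWD counted as alpha times their sample size.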

PS-integrated composite likelihood approach for randomized studies

For randomized studies when there is one external data source that contains control subjects, the function psrwe_compl also allows one to conduct the analysis proposed in Chen et al. (2020). This propensity score-integrated composite likelihood approach augments the control arm of a two-arm randomized controlled trial with patients from the external data source. An example is given below.

data(ex_dta_rct)
dta_ps_rct <- psrwe_est(ex_dta_rct, v_covs = paste("V", 1:7, sep = ""),
                         v_grp = "Group", cur_grp_level = "current",
                         v_arm = "Arm", ctl_arm_level = "control")
dta_ps_rct

## This is a randomized study. A total of 1031 RWD subjects 
## and 200 current study subjects are used to estimate 
## propensity scores by logistic model. A total of 25 RWD 
## subjects are trimmed (trim_ab=both) and excluded from the 
## final analysis. (PS values of 1 and 0 RWD are below and 
## above current study) The following covariates are adjusted 
## in the propensity score model: V1, V2, V3, V4, V5, V6, V7.
## 
## The following table summarizes the number of subjects in 
## each stratum, and the distance in PS distributions 
## calculated by overlapping area:
## 
##     Stratum N_RWD N_RWD_CTL N_Current N_Cur_CTL N_Cur_TRT  Distance
## 1 Stratum 1   703       703        41        20        21 0.7205846
## 2 Stratum 2   120       120        34        20        14 0.7190201
## 3 Stratum 3    93        93        38        20        18 0.7666712
## 4 Stratum 4    72        72        43        20        23 0.6971793
## 5 Stratum 5    18        18        44        20        24 0.6175830

ps_bor_rct <- psrwe_borrow(dta_ps_rct, total_borrow = 30,
                            method = "distance")
ps_bor_rct

## A total of 30 subjects will be borrowed from the RWD. The 
## number 30 is split proportional to the distance in PS 
## distributions in each stratum. The following table 
## summarizes the number of subjects to be borrowed and the 
## weight parameter in each stratum:
## 
##     Stratum N_RWD N_RWD_CTL N_RWD_TRT N_Current N_Cur_CTL N_Cur_TRT  Distance
## 1 Stratum 1   703       703         0        41        20        21 0.7205846
## 2 Stratum 2   120       120         0        34        20        14 0.7190201
## 3 Stratum 3    93        93         0        38        20        18 0.7666712
## 4 Stratum 4    72        72         0        43        20        23 0.6971793
## 5 Stratum 5    18        18         0        44        20        24 0.6175830
##   Proportion N_Borrow       Alpha
## 1  0.2046512 6.139535 0.008733336
## 2  0.2042068 6.126205 0.051051710
## 3  0.2177401 6.532203 0.070238741
## 4  0.1980039 5.940117 0.082501625
## 5  0.1753980 5.261940 0.292329972
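The Proportion, N_Borrow and Alpha columns in the table above can be reproduced by hand from Distance and N_RWD_CTL. The sketch below is a check against the printed output rather than a description of the internals of psrwe_borrow; the cap of Alpha at 1 is an assumption that does not come into play for these numbers.

```r
# Values copied from the printed table above
borrow_tab <- data.frame(
  Stratum   = paste("Stratum", 1:5),
  N_RWD_CTL = c(703, 120, 93, 72, 18),
  Distance  = c(0.7205846, 0.7190201, 0.7666712, 0.6971793, 0.6175830)
)
total_borrow <- 30

# Split the total proportionally to the distance in each stratum
borrow_tab$Proportion <- borrow_tab$Distance / sum(borrow_tab$Distance)
borrow_tab$N_Borrow   <- total_borrow * borrow_tab$Proportion

# Weight per RWD subject: borrowed amount relative to available RWD subjects
# (capping at 1 is an assumption, not confirmed by the output above)
borrow_tab$Alpha <- pmin(1, borrow_tab$N_Borrow / borrow_tab$N_RWD_CTL)

print(borrow_tab, digits = 7)
```

Stratum 5 has the smallest share of the 30 borrowed subjects but the largest Alpha, because only 18 RWD subjects are available there.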

rst_cl_rct <- psrwe_compl(ps_bor_rct, v_outcome = "Y_Con",
                           outcome_type = "continuous")

rst_cl_rct$Effect

## $Stratum_Estimate
##        Mean    StdErr
## 1 13.826269 13.003388
## 2 -7.758159 10.346000
## 3 15.811210 15.199217
## 4 15.536261  9.432971
## 5 12.630545 11.726722
## 
## $Overall_Estimate
##       Mean   StdErr
## 1 10.63864 5.413684
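For this output, the overall estimate is consistent with a weighted average of the stratum-specific estimates, with weights proportional to the number of current-study subjects in each stratum, and with a standard error that treats the strata as independent. The sketch below recomputes both numbers from the printed values; it is a hand check on this example, not a statement about how psrwe_compl is implemented.

```r
# Stratum estimates and current-study stratum sizes copied from the output above
stratum_est <- data.frame(
  Mean      = c(13.826269, -7.758159, 15.811210, 15.536261, 12.630545),
  StdErr    = c(13.003388, 10.346000, 15.199217, 9.432971, 11.726722),
  N_Current = c(41, 34, 38, 43, 44)
)

# Weights proportional to the current-study stratum sizes
w <- stratum_est$N_Current / sum(stratum_est$N_Current)

# Weighted mean, and standard error assuming independent strata
overall_mean <- sum(w * stratum_est$Mean)
overall_se   <- sqrt(sum(w^2 * stratum_est$StdErr^2))

round(c(Mean = overall_mean, StdErr = overall_se), 5)
```

The recomputed values agree with the printed Overall_Estimate up to rounding of the inputs.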

Demo examples

Full examples for the PS-integrated power prior (PSPP) and PS-integrated composite likelihood (PSCL) approaches can be run with demo("sec_4_1_ex", package = "psrwe") and demo("sec_4_2_ex", package = "psrwe").

References

Chen, W. C., Wang, C., Li, H., Lu, N., Tiwari, R., Xu, Y., & Yue, L. Q. (2020). Propensity score-integrated composite likelihood approach for augmenting the control arm of a randomized controlled trial by incorporating real-world data. Journal of Biopharmaceutical Statistics, 30(3), 508-520.

Wang, C., Lu, N., Chen, W. C., Li, H., Tiwari, R., Xu, Y., & Yue, L. Q. (2020). Propensity score-integrated composite likelihood approach for incorporating real-world evidence in single-arm clinical studies. Journal of Biopharmaceutical Statistics, 30(3), 495-507.

Wang, C., Li, H., Chen, W. C., Lu, N., Tiwari, R., Xu, Y., & Yue, L. Q. (2019). Propensity score-integrated power prior approach for incorporating real-world evidence in single-arm clinical studies. Journal of Biopharmaceutical Statistics, 29(5), 731-748.
