This package makes it easy to meta-analyze multiple genetic association studies that may be correlated, which we established occurs when there is cryptic relatedness between the studies, though it can also arise from other forms of confounding. The standard method for genetic association meta-analysis, METAL, assumes that the studies are independent; when they are not, the meta-analyzed statistics are inflated even if the input studies are well calibrated.
At the end we present the distributional modeling of products of correlated standard normal variables that enables estimating correlations from medians in this setting, where large outliers are expected.
Now we perform the meta-analysis! Once your studies are loaded into data frames or tibbles, the process is quite simple.
library(metalcor)
# gather studies into a list
studies <- list( study1, study2, study3 )
# pass to metalcor!
out <- metalcor( studies )
# you can quickly inspect the return value, it is a list with two named elements:
# - assoc: the meta-analyzed association table
# - R: the estimated covariance matrix between studies
out
#> $assoc
#> # A tibble: 9,986 × 9
#> id chr pos n n_studies beta se z p
#> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 11000 2 0.00157 0.00802 0.196 0.845
#> 2 2 1 2 13000 3 0.000109 0.000301 0.363 0.716
#> 3 3 1 3 13000 3 0.000129 0.00218 0.0589 0.953
#> 4 4 1 4 13000 3 -0.00841 0.00749 -1.12 0.261
#> 5 5 1 5 13000 3 -0.000256 0.00137 -0.187 0.852
#> 6 6 1 6 13000 3 -0.00966 0.00718 -1.35 0.179
#> 7 8 1 8 13000 3 0.0141 0.00671 2.10 0.0355
#> 8 9 1 9 11000 2 -0.00144 0.00598 -0.241 0.809
#> 9 10 1 10 11000 2 0.00128 0.00188 0.681 0.496
#> 10 11 1 11 13000 3 -0.00157 0.00356 -0.440 0.660
#> # ℹ 9,976 more rows
#>
#> $R
#> [,1] [,2] [,3]
#> [1,] 1.03832115 0.22578469 -0.02655883
#> [2,] 0.22578469 0.99500557 0.08044017
#> [3,] -0.02655883 0.08044017 0.99910255

There are several things to note. First, the meta-analyzed table has every locus observed in at least one study (the number of studies for each locus is in the column n_studies; only a few loci were removed in all three studies). These loci are automatically sorted by chr and pos (that is their only use here; loci were matched by id). For loci observed in a single study, the meta-analysis is just a copy of the input data. Lastly, note that the estimated R matrix is close to the true matrix of the simulation (in the previous section). If the input studies had been inflated, their diagonal values would exceed 1, and in fact they ought to equal their inflation factors (in that case R is technically no longer a correlation matrix but a covariance matrix).
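For intuition, per-locus meta-analysis in the independent-studies case is the classical fixed-effects inverse-variance scheme (the model METAL implements, which metalcor generalizes to correlated studies). The sketch below is illustrative, not metalcor's internal code; `ivw_meta` is a hypothetical helper name:

```r
# Classical fixed-effects inverse-variance meta-analysis for ONE locus,
# assuming independent studies (METAL-style).  This is a hedged sketch for
# intuition only; metalcor additionally accounts for the between-study
# covariance R.
ivw_meta <- function( beta, se ) {
    w <- 1 / se^2                          # inverse-variance weights
    beta_meta <- sum( w * beta ) / sum( w )  # weighted mean effect size
    se_meta <- sqrt( 1 / sum( w ) )          # standard error of the combined effect
    c( beta = beta_meta, se = se_meta, z = beta_meta / se_meta )
}

# with a single study, the "meta-analysis" simply returns that study's values
ivw_meta( beta = 0.00157, se = 0.00802 )
```

This also makes concrete the earlier observation that loci observed in a single study are just copied through: with one study, the weighted mean is the study's own estimate.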
For most users, that's it! The meta-analyzed association table is the primary output. However, since correlation between studies had not been studied before our work, it is important to report the estimated R matrix, to identify which studies are not independent, and perhaps to investigate what causes dependence between studies more broadly.
Let's also demonstrate quickly that, despite the relatively high correlations between these studies, the resulting meta-analysis is not inflated. The following value is ideally close to 1, and exceeds 1 under inflation/confounding:
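One standard way to measure this is the genomic-control inflation factor: the median chi-squared statistic divided by its null expectation. The sketch below is a generic genomic-control calculation, not a metalcor function; apply it to the z column of the meta-analyzed table (e.g. `out$assoc$z` above):

```r
# Genomic-control inflation factor: median observed chi-squared statistic
# divided by the median of the null chi-squared(1) distribution.
# This is a standard check, not a metalcor function.
inflation_factor <- function( z )
    median( z^2, na.rm = TRUE ) / qchisq( 0.5, df = 1 )

# under the null (independent standard normal z-scores) this is close to 1
set.seed( 1 )
inflation_factor( rnorm( 1e5 ) )
```

The median (rather than mean) of the chi-squared statistics is used so that a handful of true strong associations do not dominate the calibration check.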
If for whatever reason you want to skip the meta-analysis and only study the correlation structure, you can do that! With the starting Z-score matrix we simulated earlier, you can get the covariance estimate directly using the function below. Both estimate_R and metalcor above have a parameter median, which if FALSE uses a mean-based estimator instead. Compare those two estimates here, as well as the most naive estimate:
# default median estimate
estimate_R( Z )
#> [,1] [,2] [,3]
#> [1,] 1.03707004 0.22526404 -0.01709117
#> [2,] 0.22526404 1.00146031 0.08800936
#> [3,] -0.01709117 0.08800936 0.99877092
# mean estimate
estimate_R( Z, median = FALSE )
#> [,1] [,2] [,3]
#> [1,] 1.036232988 0.21293328 0.005285136
#> [2,] 0.212933275 1.01117457 0.099509762
#> [3,] 0.005285136 0.09950976 0.986847500
# standard sample covariance estimate is similar but not identical
cov( Z, use = 'pairwise.complete.obs' )
#> [,1] [,2] [,3]
#> [1,] 1.036281303 0.21290535 0.005181264
#> [2,] 0.212905351 1.01123191 0.099426824
#> [3,] 0.005181264 0.09942682 0.986749164

(This Z matrix doesn't have missingness, so the estimates differ slightly from the earlier ones computed after we added missingness, but this function handles missingness too.) The median estimator is preferred for real data because it is robust to outliers, which in this case correspond to strong associations. We want to ignore strong associations because they are of course correlated between studies, but not correlated due to confounding. The mean estimator is more efficient when there are no outliers (since this simulation has no outliers, the mean estimator will appear better here, but real data is different!). The last estimate above, the standard sample covariance, is most similar to the mean estimator except that it also centers the z-scores first, which is theoretically unnecessary since z-scores should have zero mean; for that reason we don't use it either.
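To see concretely why the median is preferred, compare the mean and median of pairwise z-score products when a few strong shared associations contaminate otherwise null z-scores. This is an illustrative sketch with made-up numbers, not package code; note also that the raw median of a product of standard normals is not itself the correlation, which is why estimate_R relies on the distributional modeling presented later rather than the raw median:

```r
# Illustrative sketch (not metalcor code): the mean of pairwise products
# estimates the correlation of two standard normal z-score vectors, but is
# pulled away by shared outliers; the median barely moves.
set.seed( 1 )
n <- 1e5
z1 <- rnorm( n )
z2 <- 0.2 * z1 + sqrt( 1 - 0.2^2 ) * rnorm( n )  # true correlation 0.2

mean( z1 * z2 )    # close to the true correlation 0.2

# contaminate with a handful of strong shared associations (large z in both)
z1[ 1:10 ] <- z2[ 1:10 ] <- 30

mean( z1 * z2 )    # noticeably inflated by just 10 outliers
median( z1 * z2 )  # essentially unchanged
```

Ten outliers out of 100,000 loci are enough to bias the mean-based estimate, while the median-based estimate is essentially unaffected, mirroring the behavior described above for real data with true associations.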