Correlatio

This R package can help visualize what is summarized in Pearson’s correlation coefficient r, a single number that ranges between -1 and 1. In a way, it makes the main part of r visible, namely the pairwise products of each value’s deviation from its mean. This may be particularly helpful for people who do not fully understand Pearson’s correlation coefficient unless they can also look at this main part of r, in addition to what is usually presented only verbally and with formulas, numbers, and heat plots.

Why correlatio?

Main reason: The etymology of ‘correlation’ is ‘correlatio’. First part: ‘cor-’, a form of ‘com-’ (from Latin cum), meaning ‘with, together, together with, in combination’. Second part: ‘relation’ (from Latin relatio), meaning ‘a bringing back, restoring’.

Source: https://www.etymonline.com/word/correlation

Other reason: There are plenty of R packages that provide all sorts of possibilities regarding correlation, except for what this correlatio package offers. The packages I was able to find are: correlation, corrr, simstudy (see the simstudy vignette ‘Correlated data’), ppcor, corrmorant, corrgrapher, linkspotter.

Download of this correlatio R package

This correlatio R package can be downloaded from CRAN. Alternatively, it can be installed from GitHub, e.g., by running these lines of code in R:

# Note: The R package devtools must be installed for this line of code to work:
devtools::install_github(repo="https://github.com/mmiche/correlatio",
                      dependencies = "Imports", build_vignettes = TRUE)

Correlatio uses the R packages base and stats, and:

library(ggplot2) # version 3.5.1 (Download from CRAN)
library(tibble) # version 3.2.1 (Download from CRAN)

Figure 1: Two variables, each having 6 observations. In the legend, s means that both values of a pair go in the same direction (relative to their respective means); c stands for contrary.

Figure 1 shows what the etymology of the word correlation says: the values are taken together, i.e., in pairwise combination, and each value is brought back from its tip (turquoise or red) to the respective mean.

Why would anybody want to see Figure 1?

First, why not? Some people might benefit, while nobody gets harmed. Second, in some scientific fields, a visualization of this kind may contribute to better, i.e., more thoroughly informed, scientific decisions. It may also contribute to fruitful discussions about whether or not assuming a continuous scale for one or more theoretical constructs can be defended (Heine and Heene, 2024; Feuerstahler, 2023).

Furthermore, Pearson’s correlation coefficient is the basis upon which every version of linear regression modeling rests. That is, if computing correlations cannot be defended, then using linear regression models cannot be defended either. At the very least, visualizing the most important part of r may increase researchers’ awareness of limitations regarding the real-world validity of constructs, which can otherwise easily be overlooked or forgotten (Zyphur and Pierides, 2020a, 2020b; Zorich, 2025; Gardner and Neufeld, 2013).

The simulated data of Figure 1

# Set seed to guarantee reproducibility.
set.seed(13)
testSim <- correlatio::simcor(obs=6, rhos = c(.5, .6, .7))
test1 <- correlatio::corrio(data=testSim[[1]], visualize = TRUE)
test1$dat

##            x           y   x-mean(x)   y-mean(y)      covVec
## 1  0.5543269  1.34194738 -0.07810486  0.79851173 -0.06236764
## 2 -0.2802719  0.06483464 -0.91270374 -0.47860102  0.43682094
## 3  1.7751634  0.57115092  1.14273157  0.02771527  0.03167111
## 4  0.1873201  1.05074308 -0.44511167  0.50730742 -0.22580845
## 5  1.1425261 -0.37581709  0.51009435 -0.91925274 -0.46890563
## 6  0.4155261  0.60775501 -0.21690566  0.06431935 -0.01395123

The output element ‘details’ contains many more details, such as the means of x and y.

test1$details

## [[1]]
## [1] 0.6324318
## attr(,"Explanation")
## [1] "Mean of variable 1 (variable 1 = x)."
## 
## [[2]]
## [1] 0.5434357
## attr(,"Explanation")
## [1] "Mean of variable 2 (variable 2 = y)."
## 
## [[3]]
## [1] -0.771033
## attr(,"Explanation")
## [1] "Sum of all negative products (negSum): (x-mean(x)) * (y-mean(y))."
## 
## [[4]]
## [1] 0.468492
## attr(,"Explanation")
## [1] "Sum of all positive products (posSum): (x-mean(x)) * (y-mean(y))."
## 
## [[5]]
## [1] -0.3025409
## attr(,"Explanation")
## [1] "Numerator of covariance formula: Sum of negSum and posSum."
## 
## [[6]]
## [1] 5
## attr(,"Explanation")
## [1] "Denominator of covariance formula: n - 1."
## 
## [[7]]
## [1] -0.06050818
## attr(,"Explanation")
## [1] "Covariance: numeratorCov/denominatorCov."
## 
## [[8]]
## [1] 0.7280567
## attr(,"Explanation")
## [1] "Standard deviation of variable 1 (i.e., x): R command sd()."
## 
## [[9]]
## [1] 0.6283266
## attr(,"Explanation")
## [1] "Standard deviation of variable 2 (i.e., y): R command sd()."
## 
## [[10]]
## [1] 0.4574574
## attr(,"Explanation")
## [1] "Product of standard deviations (prodSD) of variables 1 and 2 (i.e., x and y)."
## 
## [[11]]
## [1] -0.1322707
## attr(,"Explanation")
## [1] "Correlation: Covariance/prodSD."
## 
## [[12]]
##        s        c 
## 33.33333 66.66667 
## attr(,"Explanation")
## [1] "Percentages of pairwise directions of s, c, n (s = same, c = contrary, n = no)"
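
The entries of test1$details can be retraced step by step in base R. The following is a minimal sketch, not the package's internal code. It assumes the x and y values as printed in test1$dat above (which are rounded), so the results agree with the printed output only up to rounding. The variable names (devX, prods, negSum, and so on) are chosen here for illustration.

```r
# x and y copied from the printed test1$dat (rounded values).
x <- c(0.5543269, -0.2802719, 1.7751634, 0.1873201, 1.1425261, 0.4155261)
y <- c(1.34194738, 0.06483464, 0.57115092, 1.05074308, -0.37581709, 0.60775501)

# Deviations from the respective mean, and their pairwise products.
devX <- x - mean(x)
devY <- y - mean(y)
prods <- devX * devY

# Split the products by sign, then rebuild covariance and correlation.
negSum <- sum(prods[prods < 0])
posSum <- sum(prods[prods > 0])
covXY <- (negSum + posSum) / (length(x) - 1)
rXY <- covXY / (sd(x) * sd(y))

# Both hand-computed values should agree with the built-in functions.
print(all.equal(covXY, cov(x, y)))
print(all.equal(rXY, cor(x, y)))
print(round(c(negSum = negSum, posSum = posSum, cov = covXY, r = rXY), 4))
```

Because negSum is larger in absolute value than posSum, the numerator of the covariance is negative, and so is r.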

Linearly transform the simulated data

In order to obtain values that more closely resemble data from a specific research field, the function ‘lineartransform’ can be used. Let’s apply this function to the three datasets in testSim. For example, the values in each variable shall range between 1 and 5, with 0 decimal digits, i.e., only integer values. Because the transformed values are rounded, the previous correlation will change somewhat.

testSimTransformed <- lapply(testSim, function(x) {
    apply(x, 2, function(y) {
        correlatio::lineartransform(futureRange = c(1, 5), vec = y, digits = 0)
    })
})

Now we run the ‘corrio’ function again, this time with the first of the three datasets of testSimTransformed. This will be followed by the numeric output and the visualization.

test2 <- correlatio::corrio(data=testSimTransformed[[1]])
test2$dat

##   x y  x-mean(x) y-mean(y)     covVec
## 1 3 5  0.1666667         2  0.3333333
## 2 1 2 -1.8333333        -1  1.8333333
## 3 5 3  2.1666667         0  0.0000000
## 4 2 4 -0.8333333         1 -0.8333333
## 5 4 1  1.1666667        -2 -2.3333333
## 6 2 3 -0.8333333         0  0.0000000
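
What such a linear transformation does can be sketched in base R as a min-max rescaling followed by rounding. The helper rescaleLinear below is hypothetical and only an assumption about the kind of computation involved; it is not a copy of correlatio::lineartransform. The input values are the rounded x and y values printed in test1$dat.

```r
# Hypothetical helper (an assumption, not the package's implementation):
# min-max rescaling to a target range, followed by rounding.
rescaleLinear <- function(vec, futureRange = c(1, 5), digits = 0) {
    scaled01 <- (vec - min(vec)) / (max(vec) - min(vec))
    round(scaled01 * diff(futureRange) + futureRange[1], digits = digits)
}

# x and y copied from the printed test1$dat (rounded values).
x <- c(0.5543269, -0.2802719, 1.7751634, 0.1873201, 1.1425261, 0.4155261)
y <- c(1.34194738, 0.06483464, 0.57115092, 1.05074308, -0.37581709, 0.60775501)

xT <- rescaleLinear(x)
yT <- rescaleLinear(y)
print(cbind(x = xT, y = yT))

# Rounding to integers changes the correlation somewhat.
print(c(before = cor(x, y), after = cor(xT, yT)))
```

For these six observations, the sketch yields the same integer values as shown in test2$dat.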

The visualization can be seen in Figure 2.

Figure 2: Two variables, each having 6 observations. In the legend, s means that both values of a pair go in the same direction (relative to their respective means); c stands for contrary.

In Figure 2 we see that the third and the sixth value of the y variable are identical to the mean of y, namely 3. Therefore, the legend of Figure 2 shows n for ‘no direction’: for these pairs, the pairwise direction cannot be determined, because one value of the pair deviates in no direction from its mean.
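
The labels s, c, and n can be derived from the sign of each pairwise product of deviations. The following base R sketch assumes this sign rule; it illustrates the idea and is not taken from the package's source code. The values are those printed in test2$dat.

```r
# Transformed values as printed in test2$dat.
x <- c(3, 1, 5, 2, 4, 2)
y <- c(5, 2, 3, 4, 1, 3)

# Pairwise products of the deviations from the respective mean.
prods <- (x - mean(x)) * (y - mean(y))

# Assumed rule: positive product = same direction (s),
# negative product = contrary direction (c), zero product = no direction (n).
direction <- ifelse(prods > 0, "s", ifelse(prods < 0, "c", "n"))
direction <- factor(direction, levels = c("s", "c", "n"))

# Percentages of pairwise directions.
pct <- 100 * prop.table(table(direction))
print(data.frame(x, y, prods, direction))
print(pct)
```

Observations 3 and 6 receive the label n, because their y value equals the mean of y, which makes the product exactly zero.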

Conclusion

The visualizations of this correlatio package may appear uninformative once each variable contains 10, 20, or far more values. However, even then someone might come up with interesting ideas of how to benefit from this type of visualization. For example, assume you have 200 observations for a correlated pair of variables. You might want to randomly sample a subset of 10 or 20 of the 200 observations and look at the resulting plot while trying to figure out why it looks the way it does.
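
The subsampling idea can be sketched in base R as follows. The data are simulated here purely for illustration (200 observations with a population correlation of .5, an arbitrary choice); with real data, only the sampling step would be needed.

```r
set.seed(1)
n <- 200

# Simulate two variables with a population correlation of .5 (illustrative).
x <- rnorm(n)
y <- 0.5 * x + sqrt(1 - 0.5^2) * rnorm(n)
full <- data.frame(x = x, y = y)

# Randomly sample 10 of the 200 observations (rows), without replacement.
idx <- sample(seq_len(n), size = 10)
subsample <- full[idx, ]

# The correlation in a small subsample can differ considerably from the
# correlation in the full data.
print(c(full = cor(full$x, full$y), subsample = cor(subsample$x, subsample$y)))

# The subsample could then be visualized, e.g.:
# correlatio::corrio(data = subsample, visualize = TRUE)
```

Repeating the sampling step with different seeds shows how much the picture varies from one small subset to the next.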

What if?

What if you wanted to visualize two correlated variables that have extremely different values? Example pair of such variables:

v1 <- c(6.58, 7.02, 6.95, 8.6, 6.81, 6.75, 7.65)
v2 <- c(176, 302, 194, 325, 318, 309, 275)

If these two variables were visualized in the same plot, the distances between the values of v1 and the mean of v1 would not be visible, because the much larger values of v2 determine the y-axis of the plot. In such a case, z-transform both variables before visualizing them. The correlation between v1 and v2 is .44; the z-transformation does not change it.

test3 <- correlatio::corrio(data=data.frame(z1=scale(v1), z2=scale(v2)), visualize = TRUE)
test3$dat

##            x          y  x-mean(x)  y-mean(y)      covVec
## 1 -0.8692861 -1.5560290 -0.8692861 -1.5560290  1.35263427
## 2 -0.2466346  0.5015686 -0.2466346  0.5015686 -0.12370418
## 3 -0.3456928 -1.2620865 -0.3456928 -1.2620865  0.43629423
## 4  1.9892499  0.8771618  1.9892499  0.8771618  1.74489396
## 5 -0.5438092  0.7628508 -0.5438092  0.7628508 -0.41484526
## 6 -0.6287162  0.6158795 -0.6287162  0.6158795 -0.38721343
## 7  0.6448890  0.0606548  0.6448890  0.0606548  0.03911561
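
A short base R check, given here as an illustrative sketch, shows why the columns x and x-mean(x) are identical in this output: after the z-transformation, each variable has a mean of zero. It also shows that the correlation is unaffected by the transformation.

```r
v1 <- c(6.58, 7.02, 6.95, 8.6, 6.81, 6.75, 7.65)
v2 <- c(176, 302, 194, 325, 318, 309, 275)

# scale() returns a one-column matrix; convert it to a plain numeric vector.
z1 <- as.numeric(scale(v1))
z2 <- as.numeric(scale(v2))

# After z-transformation: mean 0 (up to floating point error) and sd 1.
print(round(c(mean(z1), mean(z2)), 10))
print(c(sd(z1), sd(z2)))

# The correlation is the same before and after the transformation.
print(all.equal(cor(v1, v2), cor(z1, z2)))

# For z-scores, r is the sum of the pairwise products divided by n - 1.
rFromZ <- sum(z1 * z2) / (length(z1) - 1)
print(round(rFromZ, 2))
```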

The visualization can be seen in Figure 3.

Figure 3: Two variables z1 and z2. In the legend, s means that both values of a pair go in the same direction (relative to their respective means); c stands for contrary.

From another angle

A correlation between two continuous variables can also be visualized geometrically. That is, the Pearson correlation coefficient is the cosine of an angle in the unit circle (see the online illustration by Elton, no date). The cosine ranges from minus one to plus one, which is the reason why the Pearson correlation coefficient always lies within this range.
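
The relation between r and an angle can be checked in base R: r equals the cosine of the angle between the two mean-centered variables, treated as vectors. The sketch below uses the same ten value pairs as the corvisualize example further down; the variable names are chosen here for illustration.

```r
x <- c(5, 9, 3, 6, 2, 9, 3, 7, 2, 8)
y <- c(2, 6, 7, 8, 3, 5, 5, 8, 3, 9)

# Center both variables at their mean.
xc <- x - mean(x)
yc <- y - mean(y)

# Cosine of the angle between the centered vectors:
# dot product divided by the product of the vector lengths.
cosTheta <- sum(xc * yc) / (sqrt(sum(xc^2)) * sqrt(sum(yc^2)))

# The angle itself, converted from radians to degrees.
angleDeg <- acos(cosTheta) * 180 / pi

print(all.equal(cosTheta, cor(x, y)))
print(c(r = cosTheta, angle = angleDeg))
```

An angle of 90 degrees corresponds to r = 0, an angle of 0 degrees to r = 1, and an angle of 180 degrees to r = -1.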

The function ‘corvisualize’ returns the geometric visualization of a bivariate correlation between two continuous variables. It shows the perpendicular from the outcome vector onto the predictor vector in (two-dimensional) linear space. This way, the regression weight of the simple linear regression between predictor and outcome can be seen in the plot, namely where the perpendicular meets the vector that represents the predictor (x-axis). BEWARE: The spread of the predictor vector (square root of the variance of the predictor) is what determines the predictor unit. See the example below.

positiveCorDat <- data.frame(x1=c(5,9,3,6,2,9,3,7,2,8),
                             x2=c(2,6,7,8,3,5,5,8,3,9))
correlatio::corvisualize(data=positiveCorDat, x="x1", y="x2", visualize=TRUE)

## $covMat
##          x        y
## x 7.822222 3.400000
## y 3.400000 5.822222
## 
## $covPredMat
##          x        y
## x 7.822222 3.400000
## y 3.400000 1.477841
## 
## $corMat
##           x         y
## x 1.0000000 0.5038131
## y 0.5038131 1.0000000
## 
## $spreadMat
##        x        y 
## 2.796824 2.412928 
## 
## $angle
## [1] 59.74741
## 
## $rsquared
## [1] 0.2538276
## 
## $errorVariance
## [1] 4.344381
## 
## $errorSpread
## [1] 2.084318
## 
## $observedSpread
##        y 
## 2.412928 
## 
## $yhatSpread
## [1] 1.215665
## 
## $bWeight
##         x 
## 0.4346591 
## 
## $betaWeight
##         y 
## 0.5038131 
## 
## $anglePlot

The predictor’s length is the square root of the variance of column x1, which is 2.796824 (= one unit). The perpendicular (vertical dashed line) meets the predictor vector at 1.215665. Before computing anything, we can see that the crossing occurs somewhere between one third and one half of the (one unit) length of the predictor vector (along the x-axis). Dividing 1.215665 by 2.796824 results in 0.4346591, which is the regression weight (usually denoted by the letter b).
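
The numbers in this paragraph can be retraced in base R. The sketch below is an illustration of the geometry, not the package's code; the names spreadX, spreadY, yhatSpread, bWeight, and errorSpread mirror the output labels above but are computed here independently.

```r
x <- c(5, 9, 3, 6, 2, 9, 3, 7, 2, 8)
y <- c(2, 6, 7, 8, 3, 5, 5, 8, 3, 9)

r <- cor(x, y)
spreadX <- sd(x)  # length of the predictor vector (= one predictor unit)
spreadY <- sd(y)  # length of the outcome vector

# Point where the perpendicular from the outcome vector meets the predictor
# vector: length of the outcome vector times the cosine of the angle.
yhatSpread <- r * spreadY

# Regression weight b: that point, expressed in predictor units.
bWeight <- yhatSpread / spreadX

# Length of the perpendicular itself (Pythagoras): the error spread.
errorSpread <- sqrt(spreadY^2 - yhatSpread^2)

print(c(spreadX = spreadX, yhatSpread = yhatSpread,
        bWeight = bWeight, errorSpread = errorSpread))

# b also equals the covariance divided by the variance of the predictor.
print(all.equal(bWeight, cov(x, y) / var(x)))
```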

When both the predictor and the outcome are scaled (mean = 0, standard deviation = 1), the b and the beta regression weights are identical; in this example, both are 0.5038131:

positiveCorDat <- data.frame(x1=c(5,9,3,6,2,9,3,7,2,8),
                             x2=c(2,6,7,8,3,5,5,8,3,9))
positiveCorDat.z <- data.frame(scale(positiveCorDat))
correlatio::corvisualize(data=positiveCorDat.z, x="x1", y="x2", visualize=TRUE)

## $covMat
##           x         y
## x 1.0000000 0.5038131
## y 0.5038131 1.0000000
## 
## $covPredMat
##           x         y
## x 1.0000000 0.5038131
## y 0.5038131 0.2538276
## 
## $corMat
##           x         y
## x 1.0000000 0.5038131
## y 0.5038131 1.0000000
## 
## $spreadMat
## x y 
## 1 1 
## 
## $angle
## [1] 59.74741
## 
## $rsquared
## [1] 0.2538276
## 
## $errorVariance
## [1] 0.7461724
## 
## $errorSpread
## [1] 0.8638127
## 
## $observedSpread
## y 
## 1 
## 
## $yhatSpread
## [1] 0.5038131
## 
## $bWeight
##         x 
## 0.5038131 
## 
## $betaWeight
##         y 
## 0.5038131 
## 
## $anglePlot
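
As a cross-check that does not rely on the geometric view, both regression weights can be obtained with lm() from the stats package. This is an illustrative sketch; the object names bRaw, bZ, and betaFromB are chosen here.

```r
positiveCorDat <- data.frame(x1 = c(5, 9, 3, 6, 2, 9, 3, 7, 2, 8),
                             x2 = c(2, 6, 7, 8, 3, 5, 5, 8, 3, 9))

# Unstandardized regression weight b (raw data).
bRaw <- unname(coef(lm(x2 ~ x1, data = positiveCorDat))["x1"])

# Regression weight after z-transforming predictor and outcome.
positiveCorDat.z <- data.frame(scale(positiveCorDat))
bZ <- unname(coef(lm(x2 ~ x1, data = positiveCorDat.z))["x1"])

# beta can also be obtained from b by rescaling with the standard deviations.
betaFromB <- bRaw * sd(positiveCorDat$x1) / sd(positiveCorDat$x2)

print(c(bRaw = bRaw, bZ = bZ, betaFromB = betaFromB,
        r = cor(positiveCorDat$x1, positiveCorDat$x2)))
```

In simple linear regression with one predictor, the standardized weight equals the Pearson correlation, which is why bZ, betaFromB, and r coincide.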

# sessionInfo() # Today: 2025-05-23
R version 4.4.0 (2024-04-24)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] correlatio_0.2.0

loaded via a namespace (and not attached):
 [1] generics_0.1.4     stringi_1.8.7      digest_0.6.37     
 [4] magrittr_2.0.3     grid_4.4.0         RColorBrewer_1.1-3
 [7] pkgload_1.4.0      fastmap_1.2.0      processx_3.8.6    
[10] pkgbuild_1.4.7     sessioninfo_1.2.2  ps_1.9.1          
[13] urlchecker_1.0.1   promises_1.3.2     purrr_1.0.4       
[16] scales_1.4.0       Rdpack_2.6.4       cli_3.6.5         
[19] shiny_1.10.0       rlang_1.1.6        rbibutils_2.3     
[22] ellipsis_0.3.2     remotes_2.4.2      cachem_1.1.0      
[25] devtools_2.4.5     tools_4.4.0        memoise_2.0.1     
[28] dplyr_1.1.4        ggplot2_3.5.2      httpuv_1.6.16     
[31] curl_6.2.2         vctrs_0.6.5        R6_2.6.1          
[34] mime_0.13          lifecycle_1.0.4    stringr_1.5.1     
[37] fs_1.6.6           htmlwidgets_1.6.4  usethis_2.1.6     
[40] miniUI_0.1.2       pkgconfig_2.0.3    desc_1.4.3        
[43] callr_3.7.6        pillar_1.10.2      later_1.4.2       
[46] gtable_0.3.6       glue_1.8.0         profvis_0.3.7     
[49] Rcpp_1.0.14        tibble_3.2.1       tidyselect_1.2.1  
[52] rstudioapi_0.17.1  dichromat_2.0-0.1  farver_2.1.2      
[55] xtable_1.8-4       htmltools_0.5.8.1  compiler_4.4.0

References

Elton, T. (no date). Sin cos and Tan animated from the Unit Circle, GeoGebra. Available at: https://www.geogebra.org/m/cNEtsbvC (Accessed: 23 May 2025).

Feuerstahler, L. (2023). Scale type revisited: Some misconceptions, misinterpretations, and recommendations. Psych.

Gardner, R. C., & Neufeld, R. W. (2013). What the correlation coefficient really tells us about the individual. Canadian Journal of Behavioural Science/Revue canadienne des sciences du comportement, 45(4), 313-319.

Heine, J.-H., & Heene, M. (2024). Measurement and mind: Unveiling the self-delusion of metrification in psychology. Measurement: Interdisciplinary Research and Perspectives.

Zorich, J. N. (2025). The History of Correlation. Taylor & Francis.

Zyphur, M. J., & Pierides, D. C. (2020a). Statistics and probability have always been value-laden: An historical ontology of quantitative research methods. Journal of Business Ethics, 167(1), 1-18.

Zyphur, M. J., & Pierides, D. C. (2020b). Making quantitative research work: From positivist dogma to actual social scientific inquiry. Journal of Business Ethics, 167, 49-62.

correlatio

Marcel Miché

2025-05-23