library(joyn)
#>
#> Attaching package: 'joyn'
#> The following object is masked from 'package:base':
#>
#> merge
library(data.table)
x <- data.table(id      = c(1, 4, 2, 3, NA),
                t       = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id      = c(1, 2, 5, 6, 3),
                gdp     = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)
This vignette explores some additional features of joyn through an example use case.
Suppose you want to join tables x and y, where the variable country is available in both. You could do one of five things:
1. If you don’t use the argument by, joyn will consider country and id as key variables by default, given that they are common to x and y.
# The variables with the same name, `id` and `country`, are used as key
# variables.
joyn(x = x,
y = y)
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> 1 x 4 44.4%
#> 2 y 4 44.4%
#> 3 x & y 1 11.1%
#> 4 total 9 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id and country from id, gdp, and country
#> id t country gdp .joyn
#> <num> <int> <num> <int> <fctr>
#> 1: 1 1 16 11 x & y
#> 2: 4 2 12 NA x
#> 3: 2 1 3 NA x
#> 4: 3 2 NA NA x
#> 5: NA NA 15 NA x
#> 6: 2 NA 17 15 y
#> 7: 5 NA 18 20 y
#> 8: 6 NA 19 13 y
#> 9: 3 NA 20 10 y
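To see which columns end up as keys when by is omitted, you can list the variables the two tables share. The sketch below uses data.table’s merge() rather than joyn, and assumes a full join (all = TRUE) because the output above keeps every row of both tables; it reproduces the row count, not joyn’s report.

```r
library(data.table)

# Same example tables as above
x <- data.table(id      = c(1, 4, 2, 3, NA),
                t       = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id      = c(1, 2, 5, 6, 3),
                gdp     = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Variables present in both tables: the default keys when `by` is omitted
common <- intersect(names(x), names(y))
common   # "id", "country"

# Full join on all common variables: 1 matched row, 4 rows only in x, 4 only in y
m <- merge(x, y, by = common, all = TRUE)
nrow(m)  # 9
```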
Alternatively, you can join by country only:
# Joining by country
joyn(x = x,
y = y,
by = "country")
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> 1 x 4 44.4%
#> 2 y 4 44.4%
#> 3 x & y 1 11.1%
#> 4 total 9 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables country from id, gdp, and country
#> id t country gdp .joyn
#> <num> <int> <num> <int> <fctr>
#> 1: 1 1 16 11 x & y
#> 2: 4 2 12 NA x
#> 3: 2 1 3 NA x
#> 4: 3 2 NA NA x
#> 5: NA NA 15 NA x
#> 6: NA NA 17 15 y
#> 7: NA NA 18 20 y
#> 8: NA NA 19 13 y
#> 9: NA NA 20 10 y
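The counts in the JOYn report can be checked by hand. The base R sketch below assumes each key value appears at most once per table (true here, so rows and key values coincide); it classifies the values of country as found only in x, only in y, or in both, and computes the same percentages as the report for the join by country.

```r
library(data.table)

# Same example tables as above
x <- data.table(id      = c(1, 4, 2, 3, NA),
                t       = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id      = c(1, 2, 5, 6, 3),
                gdp     = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Classify key values (assumes keys are unique within each table)
only_x <- setdiff(x$country, y$country)   # 12, 3, NA, 15
only_y <- setdiff(y$country, x$country)   # 17, 18, 19, 20
both   <- intersect(x$country, y$country) # 16

report <- data.frame(status = c("x", "y", "x & y"),
                     n      = c(length(only_x), length(only_y), length(both)))
report$percent <- round(100 * report$n / sum(report$n), 1)
report
```

With duplicated keys this shortcut no longer matches the report, which counts rows of the joined table.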
2. Don’t bring country from y into the resulting table. This is the default if you do not include country among the key variables in argument by.
joyn(x = x,
y = y,
by = "id")
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> 1 x 2 28.6%
#> 2 y 2 28.6%
#> 3 x & y 3 42.9%
#> 4 total 7 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, gdp, and country
#> id t country gdp .joyn
#> <num> <int> <num> <int> <fctr>
#> 1: 1 1 16 11 x & y
#> 2: 4 2 12 NA x
#> 3: 2 1 3 15 x & y
#> 4: 3 2 NA 10 x & y
#> 5: NA NA 15 NA x
#> 6: 5 NA NA 20 y
#> 7: 6 NA NA 13 y
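Leaving country from y out of the result is equivalent to dropping it from y before joining. The data.table sketch below (an illustration, not joyn’s implementation) keeps only the key id and the non-common variable gdp from y, then performs a full join on id, assumed here because the output above keeps unmatched rows from both tables.

```r
library(data.table)

# Same example tables as above
x <- data.table(id      = c(1, 4, 2, 3, NA),
                t       = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id      = c(1, 2, 5, 6, 3),
                gdp     = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Keep only the key and the non-common variables of y,
# so the country column in the result comes from x alone
m <- merge(x, y[, .(id, gdp)], by = "id", all = TRUE)
m
```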
3. Another possibility is to use the update_NAs argument of joyn(). It updates the NA values of variable country in table x with the actual values of the matching observations in country from table y. Actual (non-missing) values of country in x remain unchanged.
joyn(x = x,
y = y,
by = "id",
update_NAs = TRUE)
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> <char> <int> <char>
#> 1: x 2 28.6%
#> 2: x & y 2 28.6%
#> 3: NA updated 3 42.9%
#> 4: total 7 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, gdp, and country
#> id t country gdp .joyn
#> <num> <int> <num> <int> <fctr>
#> 1: 1 1 16 11 x & y
#> 2: 4 2 12 NA x
#> 3: 2 1 3 15 x & y
#> 4: 3 2 20 10 NA updated
#> 5: NA NA 15 NA x
#> 6: 5 NA 18 20 NA updated
#> 7: 6 NA 19 13 NA updated
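A rough equivalent of update_NAs = TRUE can be sketched with data.table instead of joyn: after a full join on id, data.table::fcoalesce() keeps the value from x and falls back to the value from y only where x is NA. The sketch reproduces the country column above but not the .joyn status labels; the full join and the default .x/.y suffixes are assumptions of the sketch.

```r
library(data.table)

# Same example tables as above
x <- data.table(id      = c(1, 4, 2, 3, NA),
                t       = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id      = c(1, 2, 5, 6, 3),
                gdp     = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Full join; the common non-key variable becomes country.x / country.y
m <- merge(x, y, by = "id", all = TRUE)

# x has priority: fill from y only where country.x is NA
# (country.y is integer, so it is converted to match the double country.x)
m[, country := fcoalesce(country.x, as.numeric(country.y))]
m[, c("country.x", "country.y") := NULL]
m
```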
4. You can also update all the values of variable country in table x, both NAs and actual values, with the actual values of the matching observations in country from y. This is done by setting update_values = TRUE.
Notice that the reporting variable (reportvar, .joyn in the output below) lets you keep track of how the update worked. In this case, value updated means that only the values that differ between country from x and country from y are updated. However, consider other possible cases:
- If, for the same matching observations, the values of the two country variables were the same, the reporting variable would report x & y instead, so you know that there is no update to make.
- If there are NAs in country from y, the actual values in x remain unchanged, and the reporting variable shows a not updated status.
Nevertheless, there is another way to bring country from y into x: the argument keep_common_vars (see 5. below ⬇️).
# Update all values of country in x with the matching values from y
joyn(x = x,
y = y,
by = "id",
update_values = TRUE)
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> <char> <int> <char>
#> 1: NA updated 3 42.9%
#> 2: value updated 2 28.6%
#> 3: not updated 2 28.6%
#> 4: total 7 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, gdp, and country
#> id t country gdp .joyn
#> <num> <int> <num> <int> <fctr>
#> 1: 1 1 16 11 value updated
#> 2: 4 2 12 NA not updated
#> 3: 2 1 17 15 value updated
#> 4: 3 2 20 10 NA updated
#> 5: NA NA 15 NA not updated
#> 6: 5 NA 18 20 NA updated
#> 7: 6 NA 19 13 NA updated
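Under the same assumptions as a plain data.table full join, update_values = TRUE can be approximated by swapping the argument order of fcoalesce(): country from y takes priority, and country from x is kept only where y has no value (unmatched ids or NAs in y). This sketch reproduces the country column above, not the report labels, and is not joyn’s implementation.

```r
library(data.table)

# Same example tables as above
x <- data.table(id      = c(1, 4, 2, 3, NA),
                t       = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id      = c(1, 2, 5, 6, 3),
                gdp     = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

m <- merge(x, y, by = "id", all = TRUE)

# y has priority: use country.y wherever it exists, otherwise keep country.x
m[, country := fcoalesce(as.numeric(country.y), country.x)]
m[, c("country.x", "country.y") := NULL]
m
```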
5. Another option is to bring the original variable country from y into the resulting table without using it to update the values in x. To distinguish country from x and country from y, joyn appends a suffix to the variable’s name, so that you get country.x and country.y. All of this can be done by specifying keep_common_vars = TRUE.
joyn(x = x,
y = y,
by = "id",
keep_common_vars = TRUE)
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> 1 x 2 28.6%
#> 2 y 2 28.6%
#> 3 x & y 3 42.9%
#> 4 total 7 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, gdp, and country
#> id t country.x gdp country.y .joyn
#> <num> <int> <num> <int> <int> <fctr>
#> 1: 1 1 16 11 16 x & y
#> 2: 4 2 12 NA NA x
#> 3: 2 1 3 15 17 x & y
#> 4: 3 2 NA 10 20 x & y
#> 5: NA NA 15 NA NA x
#> 6: 5 NA NA 20 18 y
#> 7: 6 NA NA 13 19 y
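The column layout produced by keep_common_vars = TRUE matches what data.table’s own merge() returns: clashing non-key columns are renamed with its suffixes argument, c(".x", ".y") by default. The sketch below shows that layout with a full join on id; whether joyn uses merge() internally is not claimed here.

```r
library(data.table)

# Same example tables as above
x <- data.table(id      = c(1, 4, 2, 3, NA),
                t       = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id      = c(1, 2, 5, 6, 3),
                gdp     = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Keep both versions of the common variable, distinguished by suffixes
m <- merge(x, y, by = "id", all = TRUE, suffixes = c(".x", ".y"))
names(m)  # "id", "t", "country.x", "gdp", "country.y"
```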
In joyn, you can also bring non-common variables from y into the resulting table by listing them in y_vars_to_keep, as shown in the example below:
# Keeping variable gdp
joyn(x = x,
y = y,
by = "id",
y_vars_to_keep = "gdp")
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> 1 x 2 28.6%
#> 2 y 2 28.6%
#> 3 x & y 3 42.9%
#> 4 total 7 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> id t country gdp .joyn
#> <num> <int> <num> <int> <fctr>
#> 1: 1 1 16 11 x & y
#> 2: 4 2 12 NA x
#> 3: 2 1 3 15 x & y
#> 4: 3 2 NA 10 x & y
#> 5: NA NA 15 NA x
#> 6: 5 NA NA 20 y
#> 7: 6 NA NA 13 y
Notice that if you set y_vars_to_keep = FALSE or y_vars_to_keep = NULL, joyn won’t bring any variable from y into the resulting table.
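The column selection done by y_vars_to_keep can be mimicked by subsetting y before the join. y_subset() below is a hypothetical helper, not part of joyn: it keeps the key plus the requested variables and treats FALSE or NULL as “keep nothing but the key”. The full join is again an assumption based on the outputs above.

```r
library(data.table)

# Same example tables as above
x <- data.table(id      = c(1, 4, 2, 3, NA),
                t       = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id      = c(1, 2, 5, 6, 3),
                gdp     = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Hypothetical helper (not a joyn function): select key + requested columns of y
y_subset <- function(y, by, keep) {
  if (is.null(keep) || isFALSE(keep)) keep <- character(0)
  y[, c(by, keep), with = FALSE]
}

m_gdp  <- merge(x, y_subset(y, "id", "gdp"), by = "id", all = TRUE)
m_none <- merge(x, y_subset(y, "id", NULL),  by = "id", all = TRUE)
names(m_gdp)   # "id", "t", "country", "gdp"
names(m_none)  # "id", "t", "country"
```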