This vignette shows you how to upload and prepare any dataset for use
with finalfit. The demonstration will use the boot::melanoma dataset.
Use ?boot::melanoma to see the help page with the data description.
I will use library(tidyverse) methods. First I’ll write_csv() the data
just to demonstrate reading it.
Note the various options in read_csv(), including providing column
names, variable types, the missing data identifier etc.
library(readr)
# Save example
write_csv(boot::melanoma, "boot.csv")
# Read data
melanoma = read_csv("boot.csv")
#> Rows: 205 Columns: 7
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (7): time, status, sex, age, year, thickness, ulcer
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Note the output shows how the columns/variables have been parsed. For
full details see ?readr::cols().
col_integer()
col_double()
col_factor()
col_character()
col_logical()
col_date()
col_time()
col_datetime()
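The column parsers listed above can be passed to read_csv() through its col_types argument. The following is a minimal sketch rather than a recommended specification: the choice of integer versus double for each column, the temporary file path and the object name melanoma_typed are assumptions made for illustration. The na argument is shown with its default values only to make the missing data identifier explicit.

```r
library(readr)

# Write the example data to a temporary file (path is illustrative)
csv_path <- tempfile(fileext = ".csv")
write_csv(boot::melanoma, csv_path)

# Declare each column type explicitly instead of relying on guessing.
# The integer/double choices below are assumptions for this example.
melanoma_typed <- read_csv(
  csv_path,
  col_types = cols(
    time      = col_double(),
    status    = col_integer(),
    sex       = col_integer(),
    age       = col_integer(),
    year      = col_integer(),
    thickness = col_double(),
    ulcer     = col_integer()
  ),
  na = c("", "NA")  # strings treated as missing (the readr default)
)

# Retrieve the column specification that was actually applied
spec(melanoma_typed)
```

Because col_types is supplied, read_csv() should no longer print the column specification message, and any value that cannot be parsed as the declared type can be inspected with problems().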
ff_glimpse() provides a convenient overview of all data in a tibble or
data frame. It is particularly important that factors are correctly
specified. Hence, ff_glimpse() separates variables into continuous and
categorical. As expected, no factors are yet specified in the melanoma
dataset.
library(finalfit)
ff_glimpse(melanoma)
#> $Continuous
#> label var_type n missing_n missing_percent mean sd min
#> time time <dbl> 205 0 0.0 2152.8 1122.1 10.0
#> status status <dbl> 205 0 0.0 1.8 0.6 1.0
#> sex sex <dbl> 205 0 0.0 0.4 0.5 0.0
#> age age <dbl> 205 0 0.0 52.5 16.7 4.0
#> year year <dbl> 205 0 0.0 1969.9 2.6 1962.0
#> thickness thickness <dbl> 205 0 0.0 2.9 3.0 0.1
#> ulcer ulcer <dbl> 205 0 0.0 0.4 0.5 0.0
#> quartile_25 median quartile_75 max
#> time 1525.0 2005.0 3042.0 5565.0
#> status 1.0 2.0 2.0 3.0
#> sex 0.0 0.0 1.0 1.0
#> age 42.0 54.0 65.0 95.0
#> year 1968.0 1970.0 1972.0 1977.0
#> thickness 1.0 1.9 3.6 17.4
#> ulcer 0.0 0.0 1.0 1.0
#>
#> $Categorical
#> data frame with 0 columns and 205 rows
If you wish to see the variables in the order in which they appear in
the data frame or tibble, missing_glimpse() or tibble::glimpse() are
useful.
missing_glimpse(melanoma)
#> label var_type n missing_n missing_percent
#> time time <dbl> 205 0 0.0
#> status status <dbl> 205 0 0.0
#> sex sex <dbl> 205 0 0.0
#> age age <dbl> 205 0 0.0
#> year year <dbl> 205 0 0.0
#> thickness thickness <dbl> 205 0 0.0
#> ulcer ulcer <dbl> 205 0 0.0
Use an original description of the data (often called a data dictionary) to correctly assign and label any factor variables. This can be done in a single pipe.
library(dplyr)
melanoma %>%
  mutate(
    status.factor = factor(status, levels = c(1, 2, 3),
      labels = c("Died from melanoma", "Alive", "Died from other causes")) %>%
      ff_label("Status"),
    sex.factor = factor(sex, levels = c(1, 0),
      labels = c("Male", "Female")) %>%
      ff_label("Sex"),
    ulcer.factor = factor(ulcer, levels = c(1, 0),
      labels = c("Present", "Absent")) %>%
      ff_label("Ulcer")
  ) -> melanoma
ff_glimpse(melanoma)
#> $Continuous
#> label var_type n missing_n missing_percent mean sd min
#> time time <dbl> 205 0 0.0 2152.8 1122.1 10.0
#> status status <dbl> 205 0 0.0 1.8 0.6 1.0
#> sex sex <dbl> 205 0 0.0 0.4 0.5 0.0
#> age age <dbl> 205 0 0.0 52.5 16.7 4.0
#> year year <dbl> 205 0 0.0 1969.9 2.6 1962.0
#> thickness thickness <dbl> 205 0 0.0 2.9 3.0 0.1
#> ulcer ulcer <dbl> 205 0 0.0 0.4 0.5 0.0
#> quartile_25 median quartile_75 max
#> time 1525.0 2005.0 3042.0 5565.0
#> status 1.0 2.0 2.0 3.0
#> sex 0.0 0.0 1.0 1.0
#> age 42.0 54.0 65.0 95.0
#> year 1968.0 1970.0 1972.0 1977.0
#> thickness 1.0 1.9 3.6 17.4
#> ulcer 0.0 0.0 1.0 1.0
#>
#> $Categorical
#> label var_type n missing_n missing_percent levels_n
#> status.factor Status <fct> 205 0 0.0 3
#> sex.factor Sex <fct> 205 0 0.0 2
#> ulcer.factor Ulcer <fct> 205 0 0.0 2
#> levels
#> status.factor "Died from melanoma", "Alive", "Died from other causes", "(Missing)"
#> sex.factor "Male", "Female", "(Missing)"
#> ulcer.factor "Present", "Absent", "(Missing)"
#> levels_count levels_percent
#> status.factor 57, 134, 14 27.8, 65.4, 6.8
#> sex.factor 79, 126 39, 61
#> ulcer.factor 90, 115 44, 56
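One further check that may be worth making is to cross-tabulate an original numeric code against its new factor, to confirm that every code has been mapped to the intended label. The sketch below is an optional addition and not part of the finalfit workflow itself; it rebuilds status.factor from boot::melanoma so that it runs independently, and the object names melanoma_check and status_check are assumptions.

```r
library(dplyr)

# Recreate the status factor using the same levels and labels as above
melanoma_check <- boot::melanoma %>%
  mutate(
    status.factor = factor(status, levels = c(1, 2, 3),
      labels = c("Died from melanoma", "Alive", "Died from other causes"))
  )

# One row per code/label pair; an NA label would flag an unmapped code
status_check <- melanoma_check %>%
  count(status, status.factor)

status_check
```

Each numeric code should appear against exactly one label. A code listed in the data dictionary but omitted from levels would show up here with a missing factor value.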
Everything looks good and you are ready to start analysis.