10. Make Datasets Documentation with write_man()

Introduction

R packages that contain datasets need to document them. The roxygen2 package helps write R manual pages, but there are many details to get right. The write_man() function examines a dataset and writes an R file containing roxygen2 comments that document it. The documentation uses Markdown to organize details, including the number of rows and columns in the dataset and the name, label (if any), and type of each variable. The levels of categorical variables are also included.
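To make that list of details concrete, here is a minimal base R sketch of the kind of inspection involved. It is not the code inside write_man(): the helper name summarise_dataset() and the toy data frame are hypothetical, and the sketch assumes that variable labels live in each column's "label" attribute, which is the convention the labelled package uses.

```r
# Hypothetical helper (not part of rUM): gathers the details that the
# generated documentation reports for a dataset.
summarise_dataset <- function(data) {
  list(
    n_rows = nrow(data),
    n_cols = ncol(data),
    variables = lapply(data, function(column) {
      list(
        type   = class(column)[1],
        label  = attr(column, "label", exact = TRUE),  # NULL when unlabelled
        levels = levels(column)                        # NULL unless a factor
      )
    })
  )
}

# Toy data, made up for illustration
toy <- data.frame(
  age  = c(34, 51, 29),
  sex2 = factor(c("Male", "Female", "Male"), levels = c("Male", "Female"))
)
attr(toy$age, "label") <- "Age in Years"

details <- summarise_dataset(toy)
str(details)
```

The first level of a factor is the reference level in most modelling functions, which is presumably why the generated documentation calls it out separately.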

Getting Started

Load the dataset you want to document into the global environment using a tool such as tidyREDCap::import_instruments(), the_data <- readr::read_csv("an_csv_file.csv"), or the_data <- readxl::read_excel("an_excel_file.xlsx"). Then call write_man("the_data"). For example:

library(tidyverse)
library(conflicted)
library(labelled)  # for set_variable_labels()
demographics <- 
  readxl::read_excel("demographics.xlsx") |> 
  mutate(sex2 = as_factor(sex), .keep="unused") |> 
  labelled::set_variable_labels(
    age = "Age in Years",
    sex2 = "Sex assigned at Birth"
  )

rUM::write_man("demographics")

This produces an R file in the R folder, named after the dataset, with content like:

#' demographics dataset
#'
#' @description Description of the demographics dataset goes here
#'
#' @format A tibble with 3 rows and 2 variables:
#' \describe{
#'   \item{age}{
#'
#' | *Type:*        | numeric       |
#' | -------------- | ------------- |
#' |                |               |
#' | *Description:* | Age in Years |
#'
#'   }
#'   \item{sex2}{
#'
#' | *Type:*        | factor (First/Reference level = `Male`) |
#' | -------------- | ---------------------------------------------------- |
#' |                |                                                      |
#' | *Description:* | Sex assigned at Birth |
#' |                |                                                      |
#' | *Levels:*      | `Male, Female`           |
#'
#'   }
#' }
#' @source Where the data came from
"demographics"

If your dataset does not have labels, you will need to modify the
#' | *Description:* | Description for *thingy* goes here |
line to give a useful description.
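
Before running write_man(), it can help to see which variables have no label, since those are the ones whose Description lines will need hand editing. The following is a small base R sketch, again assuming that labels are stored in the "label" attribute as the labelled package does; the toy data frame is made up for illustration.

```r
# Toy data, made up for illustration: only `age` carries a label
toy <- data.frame(age = c(34, 51, 29), height = c(170, 182, 165))
attr(toy$age, "label") <- "Age in Years"

# TRUE for every column that has no "label" attribute
is_unlabelled <- vapply(
  toy,
  function(column) is.null(attr(column, "label", exact = TRUE)),
  logical(1)
)

# Names of the variables that still need a description
unlabelled <- names(toy)[is_unlabelled]
unlabelled
```

Adding labels to those variables first, for example with labelled::set_variable_labels() as in the example above, means there is less to edit by hand afterwards.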

You will also want to modify the
#' @source Where the data came from
line to properly cite the source of the data.

Conclusion

If you follow this workflow for each of your datasets, you will have user-friendly documentation that is ready for CRAN.
