
Title: Data Quality in Epidemiological Research
Version: 2.5.1
Description: Data quality assessments guided by the data quality framework introduced by Schmidt and colleagues, 2021 <doi:10.1186/s12874-021-01252-7> target the data quality dimensions integrity, completeness, consistency, and accuracy. The scope of applicable functions rests on the availability of extensive metadata, which can be provided in spreadsheet tables. Either standardized (e.g., as 'html5' reports) or individually tailored reports can be generated. For an introduction to the specification of the corresponding metadata, please refer to the package website: https://dataquality.qihs.uni-greifswald.de/VIN_Annotation_of_Metadata.html.
License: BSD_2_clause + file LICENSE
URL: https://dataquality.qihs.uni-greifswald.de/
BugReports: https://gitlab.com/libreumg/dataquier/-/issues
Depends: R (≥ 3.6.0)
Imports: dplyr (≥ 1.0.2), emmeans, ggplot2 (≥ 3.5.0), lme4, lubridate, MASS, MultinomialCI, parallelMap, patchwork (≥ 1.3.0), R.devices, rlang, robustbase, qmrparser, utils, rio, readr, scales, withr, lifecycle, units, methods
Suggests: openxlsx2, GGally, grDevices, jsonlite, cli, whoami, anytime, cowplot (≥ 0.9.4), digest, DT (≥ 0.23), flexdashboard, flexsiteboard, htmltools, knitr, markdown, parallel, parallelly, rJava, rmarkdown, rstudioapi, testthat (≥ 3.1.9), tibble, vdiffr, pkgload, Rdpack, callr, colorspace, plotly, ggvenn, htmlwidgets, future, processx, R6, shiny, xml2, mgcv, rvest, textutils, dbx, ggpubr, grImport2, rsvg, stringdist, rankICC, nnet, ordinal, storr, reticulate
VignetteBuilder: knitr
Encoding: UTF-8
KeepSource: FALSE
Language: en-US
RoxygenNote: 7.3.2
Config/testthat/parallel: true
Config/testthat/edition: 3
Config/testthat/start-first: dq_report_by_sm, dq_report2, dq_report_by_arguments, dq_report_by_s, int_encoding_errors, dq_report_by_pipesymbol_list, dq_report_by_m, plots, acc_loess, com_item_missingness, dq_report_by_na, dq_report_by_directories, con_limit_deviations, con_contradictions_redcap, com_segment_missingness, util_correct_variable_use
BuildManual: TRUE
NeedsCompilation: no
Packaged: 2025-03-05 17:44:09 UTC; struckmanns
Author: University Medicine Greifswald [cph], Elisa Kasbohm [aut], Elena Salogni [aut], Joany Marino [aut], Adrian Richter [aut], Carsten Oliver Schmidt [aut], Stephan Struckmann [aut, cre], German Research Foundation (DFG SCHM 2744/3-1, SCHM 2744/9-1, SCHM 2744/3-4) [fnd], National Research Data Infrastructure for Personal Health Data (NFDI 13/1) [fnd], European Union's Horizon 2020 programme (euCanSHare, grant agreement No. 825903) [fnd]
Maintainer: Stephan Struckmann <stephan.struckmann@uni-greifswald.de>
Repository: CRAN
Date/Publication: 2025-03-05 18:10:02 UTC

The dataquieR package about Data Quality in Epidemiological Research

Description

For a quick start please read dq_report2 and maybe the vignettes or the package's website.
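A minimal quick-start sketch (the file names below are hypothetical placeholders, not files shipped with the package):

```r
library(dataquieR)

# Hypothetical file names -- replace with your own study data and metadata.
study_data <- rio::import("study_data.xlsx")

# Run a standard data quality report; meta_data_v2 points to a
# workbook-like metadata file (see prep_load_workbook_like_file).
report <- dq_report2(study_data = study_data,
                     meta_data_v2 = "meta_data_v2.xlsx")
```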

Options

This package features the following options():

Author(s)

Maintainer: Stephan Struckmann stephan.struckmann@uni-greifswald.de (ORCID)

Authors:

Other contributors:

References

See Also

Useful links:

Other options: dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
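Such options are set with base R's options() before generating a report; for example (the value shown is only illustrative, consult each option's documentation for its accepted values):

```r
# Illustrative: control multivariate outlier checks for subsequent reports.
options(dataquieR.MULTIVARIATE_OUTLIER_CHECK = FALSE)
getOption("dataquieR.MULTIVARIATE_OUTLIER_CHECK")
```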


Write single results from a dataquieR_resultset2 report

Description

Write single results from a dataquieR_resultset2 report

Usage

## S3 replacement method for class 'dataquieR_resultset2'
x$el <- value

Arguments

x

the report

el

the index

value

the single result

Value

the dataquieR result object


Extract elements of a dataquieR Result Object

Description

Extract elements of a dataquieR Result Object

Usage

## S3 method for class 'dataquieR_result'
x$...

Arguments

x

the dataquieR result object

...

arguments passed to the implementation for lists.

Value

the element of the dataquieR result object with all messages still attached

See Also

base::Extract


Access single results from a dataquieR_resultset2 report

Description

Access single results from a dataquieR_resultset2 report

Usage

## S3 method for class 'dataquieR_resultset2'
x$el

Arguments

x

the report

el

the index

Value

the dataquieR result object


Holds Indicator / Descriptor assignments from the manual at run-time

Description

Holds Indicator / Descriptor assignments from the manual at run-time

Usage

..indicator_or_descriptor

Format

An object of class environment of length 0.


Holds parts of the manual at run-time

Description

Holds parts of the manual at run-time

Usage

..manual

Format

An object of class environment of length 0.


Access elements from a dataquieR_resultset2

Description

Accesses elements of a dataquieR_resultset2, similar to [ for lists.

Usage

.access_dq_rs2(x, els)

Arguments

x

the dataquieR_resultset2

els

the selector (character, number or logical)

Value

the sub-list of x


Write elements from a dataquieR_resultset2

Description

Writes elements of a dataquieR_resultset2, similar to [ for lists.

Usage

.access_dq_rs2(x, els) <- value

Arguments

x

the dataquieR_resultset2

els

the selector (character, number or logical)

value

dataquieR_result to write

Value

the modified x


Get Access to Utility Functions

Description

[Experimental]

Usage

.get_internal_api(fkt, version = API_VERSION, or_newer = TRUE)

Arguments

fkt

function name

version

version number to get

Value

an API object


Roxygen-Template for indicator functions

Description

Roxygen-Template for indicator functions

Usage

.template_function_indicator(
  resp_vars,
  study_data,
  label_col,
  item_level,
  meta_data,
  meta_data_v2,
  meta_data_dataframe,
  meta_data_segment,
  dataframe_level,
  segment_level
)

Arguments

resp_vars

variable the names of the measurement variables, if missing or NULL, all variables will be checked

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

meta_data_dataframe

data.frame the data frame that contains the metadata for the data frame level

meta_data_segment

data.frame – optional: Segment level metadata

dataframe_level

data.frame alias for meta_data_dataframe

segment_level

data.frame alias for meta_data_segment

Value

invisible(NULL)


Make normalizations of v2.0 item_level metadata.

Description

Requires referred missing-tables being available by prep_get_data_frame.

Usage

.util_internal_normalize_meta_data(
  meta_data = "item_level",
  label_col = LABEL,
  verbose = TRUE
)

Arguments

meta_data

data.frame old name for item_level

label_col

variable attribute the name of the column in the metadata with labels of variables

verbose

logical display all estimated decisions, defaults to TRUE, except if called in a dq_report2 pipeline.


Variable-argument roles

Description

A variable-argument role is the intended use of an argument of an indicator function, i.e., an argument that refers to variables. In general, in the table .variable_arg_roles, the suffix _var means that one variable is allowed, while _vars means that more than one is. The default sets of arguments for util_correct_variable_use/util_correct_variable_use2 are defined from the point of usage, e.g., if NAs could occur in the list of variable names, the function should be able to remove certain response variables from the output rather than disallow them by setting allow_na to FALSE.

Usage

.variable_arg_roles

Format

An object of class tbl_df (inherits from tbl, data.frame) with 14 rows and 9 columns.

See Also

util_correct_variable_use()

util_correct_variable_use2()


Version of the API

Description

Version of the API

Usage

API_VERSION

Format

An object of class package_version (inherits from numeric_version) of length 1.

See Also

.get_internal_api()


Cross-item level metadata attribute name

Description

The allowable direction of an association. The input is a string that can be either "positive" or "negative".

Usage

ASSOCIATION_DIRECTION

Format

An object of class character of length 1.

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


Cross-item level metadata attribute name

Description

The allowable form of association. The string specifies the form based on a selected list.

Usage

ASSOCIATION_FORM

Format

An object of class character of length 1.

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


Cross-item level metadata attribute name

Description

The metric underlying the association in ASSOCIATION_RANGE. The input is a string that specifies the analysis algorithm to be used.

Usage

ASSOCIATION_METRIC

Format

An object of class character of length 1.

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


Cross-item level metadata attribute name

Description

Specifies the allowable range of an association. The inclusion of the endpoints follows standard mathematical notation using round brackets for open intervals and square brackets for closed intervals. Values must be separated by a semicolon.
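For example (illustrative values):

```text
[0; 0.5]   closed interval: both endpoints allowed
(0; 0.5)   open interval: both endpoints excluded
[0; 0.5)   half-open: 0 allowed, 0.5 excluded
```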

Usage

ASSOCIATION_RANGE

Format

An object of class character of length 1.

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


Cross-item level metadata attribute name

Description

Specifies the unique IDs for cross-item level metadata records

Usage

CHECK_ID

Format

An object of class character of length 1.

Details

if missing, dataquieR will create such IDs

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


Cross-item level metadata attribute name

Description

Specifies the unique labels for cross-item level metadata records

Usage

CHECK_LABEL

Format

An object of class character of length 1.

Details

if missing, dataquieR will create such labels

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


types of value codes

Description

types of value codes

Usage

CODE_CLASSES

Format

An object of class list of length 3.


Default Name of the Table featuring Code Lists

Description

Default Name of the Table featuring Code Lists

Metadata sheet name containing VALUE_LABEL_TABLES. This metadata sheet can contain both the value labels of several VALUE_LABEL_TABLE entries and also MISSING and JUMP tables.

Usage

CODE_LIST_TABLE


Format

An object of class character of length 1.



Only existence is checked, order not yet used

Description

Only existence is checked, order not yet used

Usage

CODE_ORDER

Format

An object of class character of length 1.


Cross-item level metadata attribute name

Description

Note: in some prep_-functions, this field is named RULE

Usage

CONTRADICTION_TERM

Format

An object of class character of length 1.

Details

Specifies a contradiction rule. Use REDCap-like syntax; see the online vignette.
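An example of such a rule in REDCap-like syntax (the variable names and the code value are hypothetical):

```text
[AGE_0] < 18 and [MARITAL_STATUS] = '2'
```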

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


Cross-item level metadata attribute name

Description

Specifies the type of a contradiction. According to the data quality concept, there are logical and empirical contradictions, see online vignette

Usage

CONTRADICTION_TYPE

Format

An object of class character of length 1.

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


Cross-item level metadata attribute name

Description

For contradiction rules, specifies the required pre-processing steps. TODO JM: MISSING_LABEL will not work for non-factor variables

Usage

DATA_PREPARATION

Format

An object of class character of length 1.

Details

LABEL, LIMITS, MISSING_NA, MISSING_LABEL, MISSING_INTERPRET

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


Data Types

Description

Data Types of Study Data

In the metadata, the following entries are allowed for the variable attribute DATA_TYPE:

Usage

DATA_TYPES

Format

An object of class list of length 4.

Details

Data Types of Function Arguments

As function arguments, dataquieR uses additional type specifications:

See Also

integer string


All available data types, mapped from their respective R types

Description

All available data types, mapped from their respective R types

Usage

DATA_TYPES_OF_R_TYPE

Format

An object of class list of length 14.

See Also

prep_dq_data_type_of


Data frame level metadata attribute name

Description

Name of the data frame

Usage

DF_CODE

Format

An object of class character of length 1.

See Also

meta_data_dataframe


Data frame level metadata attribute name

Description

Number of expected data elements in a data frame (numeric). The check is only conducted if a number is entered.

Usage

DF_ELEMENT_COUNT

Format

An object of class character of length 1.

See Also

meta_data_dataframe


Data frame level metadata attribute name

Description

The name of the data frame containing the reference IDs to be compared with the IDs in the study data set.

Usage

DF_ID_REF_TABLE

Format

An object of class character of length 1.

See Also

meta_data_dataframe


Data frame level metadata attribute name

Description

All variables that are to be used as one single ID variable (combined key) in a data frame.

Usage

DF_ID_VARS

Format

An object of class character of length 1.

See Also

meta_data_dataframe


Data frame level metadata attribute name

Description

Name of the data frame

Usage

DF_NAME

Format

An object of class character of length 1.

See Also

meta_data_dataframe


Data frame level metadata attribute name

Description

The type of check to be conducted when comparing the reference ID table with the IDs delivered in the study data files.

Usage

DF_RECORD_CHECK

Format

An object of class character of length 1.

See Also

meta_data_dataframe


Data frame level metadata attribute name

Description

Number of expected data records in a data frame (numeric). The check is only conducted if a number is entered.

Usage

DF_RECORD_COUNT

Format

An object of class character of length 1.

See Also

meta_data_dataframe


Data frame level metadata attribute name

Description

Defines expectancies on the uniqueness of the IDs across the rows of a data frame, or the number of times some ID can be repeated.

Usage

DF_UNIQUE_ID

Format

An object of class character of length 1.

See Also

meta_data_dataframe


Data frame level metadata attribute name

Description

Specifies whether identical data is permitted across rows in a data frame (excluding ID variables)

Usage

DF_UNIQUE_ROWS

Format

An object of class character of length 1.

See Also

meta_data_dataframe


All available probability distributions for acc_shape_or_scale

Description

Usage

DISTRIBUTIONS

Format

An object of class list of length 3.


Descriptor Function

Description

A function that returns some figure or table to assess data quality, but it does not return a value correlating with the magnitude of a data quality problem. It's the opposite of an Indicator.

The object Descriptor only contains the name used internally to tag such functions.

Usage

Descriptor

Format

An object of class character of length 1.

See Also

Indicator


Cross-item level metadata attribute name

Description

Defines the measurement variable to be used as a known gold standard. Only one variable can be defined as the gold standard.

Usage

GOLDSTANDARD

Format

An object of class character of length 1.

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


Indicator Function

Description

A function that returns some value that correlates with the magnitude of a certain class of data quality problems. Typically, in dataquieR, such functions return a SummaryTable featuring columns whose names start with a short abbreviation describing the specific semantics of the value (e.g., PCT for a percentage or COR for a correlation), followed by the public name of the indicator according to the data quality concept DQ_OBS, e.g., com_qum_nonresp for the item-non-response rate. A name could therefore be PCT_com_qum_nonresp.

The object Indicator only contains the name used internally to tag such functions.

Usage

Indicator

Format

An object of class character of length 1.

See Also

Descriptor


An exception class assigned for exceptions caused by long variable labels

Description

An exception class assigned for exceptions caused by long variable labels

Usage

LONG_LABEL_EXCEPTION

Format

An object of class character of length 1.


Cross-item level metadata attribute name

Description

Select, whether to compute acc_multivariate_outlier.

Usage

MULTIVARIATE_OUTLIER_CHECK

Format

An object of class character of length 1.

Details

You can leave the cell empty; then the behavior depends on the setting of the option dataquieR.MULTIVARIATE_OUTLIER_CHECK. If this column is missing, this is the same as having all cells empty and dataquieR.MULTIVARIATE_OUTLIER_CHECK set to "auto".

See also MULTIVARIATE_OUTLIER_CHECKTYPE.

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


Cross-item level metadata attribute name

Description

Select, which outlier criteria to compute, see acc_multivariate_outlier.

Usage

MULTIVARIATE_OUTLIER_CHECKTYPE

Format

An object of class character of length 1.

Details

You can leave the cell empty; then all checks will apply. If you enter a set of methods, the maximum for N_RULES changes. See also UNIVARIATE_OUTLIER_CHECKTYPE.

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


Cross-item and item level metadata attribute name

Description

Select, how many violated outlier criteria make an observation an outlier, see acc_multivariate_outlier.

Usage

N_RULES

Format

An object of class character of length 1.

Details

You can leave the cell empty; then all applied checks must deem an observation an outlier for it to be flagged. See UNIVARIATE_OUTLIER_CHECKTYPE and MULTIVARIATE_OUTLIER_CHECKTYPE for the selected outlier criteria.

See Also

meta_data_cross

meta_data

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


Cross-item level metadata attribute name

Description

Specifies the type of reliability or validity analysis. The string specifies the analysis algorithm to be used, and can be either "inter-class" or "intra-class".

Usage

REL_VAL

Format

An object of class character of length 1.

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()


Scale Levels

Description

Scale Levels of Study Data according to Stevens's Typology

In the metadata, the following entries are allowed for the variable attribute SCALE_LEVEL:

Usage

SCALE_LEVELS

Format

An object of class list of length 5.

Details

Examples

See Also

Wikipedia


Segment level metadata attribute name

Description

The name of the data frame containing the reference IDs to be compared with the IDs in the targeted segment.

Usage

SEGMENT_ID_REF_TABLE

Format

An object of class character of length 1.

See Also

meta_data_segment


Deprecated segment level metadata attribute name

Description

The name of the data frame containing the reference IDs to be compared with the IDs in the targeted segment.

Usage

SEGMENT_ID_TABLE

Format

An object of class character of length 1.

Details

Please use SEGMENT_ID_REF_TABLE


Segment level metadata attribute name

Description

All variables that are to be used as one single ID variable (combined key) in a segment.

Usage

SEGMENT_ID_VARS

Format

An object of class character of length 1.

See Also

meta_data_segment


Segment level metadata attribute name

Description

true or false, to suppress the crude segment missingness output (Completeness/Misg. Segments in the report). Defaults to computing the output if more than one segment is available in the item-level metadata.

Usage

SEGMENT_MISS

Format

An object of class character of length 1.

See Also

meta_data_segment


Segment level metadata attribute name

Description

The name of the segment participation status variable

Usage

SEGMENT_PART_VARS

Format

An object of class character of length 1.

See Also

meta_data_segment


Segment level metadata attribute name

Description

The type of check to be conducted when comparing the reference ID table with the IDs in a segment.

Usage

SEGMENT_RECORD_CHECK

Format

An object of class character of length 1.

See Also

meta_data_segment


Segment level metadata attribute name

Description

Number of expected data records in each segment (numeric). The check is only conducted if a number is entered.

Usage

SEGMENT_RECORD_COUNT

Format

An object of class character of length 1.

See Also

meta_data_segment


Segment level metadata attribute name

Description

Segment level metadata attribute name

Usage

SEGMENT_UNIQUE_ID

Format

An object of class character of length 1.

See Also

DF_UNIQUE_ID

meta_data_segment


Segment level metadata attribute name

Description

Specifies whether identical data is permitted across rows in a segment (excluding ID variables)

Usage

SEGMENT_UNIQUE_ROWS

Format

An object of class character of length 1.

See Also

meta_data_segment


Character used by default as a separator in metadata such as missing codes

Description

According to our metadata concept, this single character is "|".

Usage

SPLIT_CHAR

Format

An object of class character of length 1.
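For example, several missing codes can be listed in one metadata cell separated by this character (the codes shown are illustrative):

```text
99980|99981|99982
```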


Valid unit symbols according to units::valid_udunits()

Description

like m, g, N, ...

See Also

Other UNITS: UNIT_IS_COUNT, UNIT_PREFIXES, UNIT_SOURCES, WELL_KNOWN_META_VARIABLE_NAMES


Is a unit a count according to units::valid_udunits()

Description

see the column def therein

Details

like %, ppt, ppm

See Also

Other UNITS: UNITS, UNIT_PREFIXES, UNIT_SOURCES, WELL_KNOWN_META_VARIABLE_NAMES


Valid unit prefixes according to units::valid_udunits_prefixes()

Description

like k, m, M, c, ...

See Also

Other UNITS: UNITS, UNIT_IS_COUNT, UNIT_SOURCES, WELL_KNOWN_META_VARIABLE_NAMES


Maturity stage of a unit according to units::valid_udunits()

Description

see column source_xml therein, i.e., base, derived, accepted, or common

See Also

Other UNITS: UNITS, UNIT_IS_COUNT, UNIT_PREFIXES, WELL_KNOWN_META_VARIABLE_NAMES


Item level metadata attribute name

Description

Select, which outlier criteria to compute, see acc_univariate_outlier.

Usage

UNIVARIATE_OUTLIER_CHECKTYPE

Format

An object of class character of length 1.

Details

You can leave the cell empty; then all checks will apply. If you enter a set of methods, the maximum for N_RULES changes. See also MULTIVARIATE_OUTLIER_CHECKTYPE.

See Also

WELL_KNOWN_META_VARIABLE_NAMES


Requirement levels of certain metadata columns

Description

These levels are cumulatively used by the function prep_create_meta and related in the argument level therein.

Usage

VARATT_REQUIRE_LEVELS

Format

An object of class list of length 5.

Details

currently available:


Cross-item level metadata attribute name

Description

Specifies a group of variables for multivariate analyses. Separated by |, please use variable names from VAR_NAMES or a label as specified in label_col, usually LABEL or LONG_LABEL.

Usage

VARIABLE_LIST

Format

An object of class character of length 1.

Details

if missing, dataquieR will derive the variable list from CONTRADICTION_TERM, if specified.

See Also

meta_data_cross

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, meta_data_cross, util_normalize_cross_item()


Variable roles can be one of the following:

Description

Usage

VARIABLE_ROLES

Format

An object of class list of length 5.


Well-known metadata column names, names of metadata columns

Description

Names of the variable attributes in the metadata frame holding: the names of the respective observers and devices; lower and upper limits for plausible values; lower and upper limits for allowed values; the variable name (column name, e.g., v0020349) used in the study data; the variable name used for processing (readable name, e.g., RR_DIAST_1) and in parameters of the QA functions; the variable label, long label, and short label; the variable data type (see also DATA_TYPES); re-codes for the definition of lists of event categories; and missing lists and jump lists as CSV strings. For valid units, see UNITS.

Usage

WELL_KNOWN_META_VARIABLE_NAMES

Format

An object of class list of length 58.

Details

all entries of this list will be mapped to the package's exported NAMESPACE environment directly, i.e., they are also available directly by their names.

See Also

meta_data_segment for STUDY_SEGMENT

Other UNITS: UNITS, UNIT_IS_COUNT, UNIT_PREFIXES, UNIT_SOURCES

Examples

print(WELL_KNOWN_META_VARIABLE_NAMES$VAR_NAMES)
# print(VAR_NAMES) # should usually also work

Write to a report

Description

Overwriting of elements only list-wise supported

Usage

## S3 replacement method for class 'dataquieR_resultset2'
x[...] <- value

Arguments

x

a dataquieR_resultset2

...

if this contains only one entry, and this entry is not named or its name is els, then the report will be accessed in list mode.

value

new value to write

Value

nothing; the function stops with an error


Extract Parts of a dataquieR Result Object

Description

Extract Parts of a dataquieR Result Object

Usage

## S3 method for class 'dataquieR_result'
x[...]

Arguments

x

the dataquieR result object

...

arguments passed to the implementation for lists.

Value

the sub-list of the dataquieR result object with all messages still attached

See Also

base::Extract


Get a subset of a dataquieR dq_report2 report

Description

Get a subset of a dataquieR dq_report2 report

Usage

## S3 method for class 'dataquieR_resultset2'
x[row, col, res, drop = FALSE, els = row]

Arguments

x

the report

row

the variable names, must be unique

col

the function-call-names, must be unique

res

the result slot, must be unique

drop

drop, if length is 1

els

used, if in list-mode with named argument

Value

a list with results, depending on drop and the number of results, the list may contain all requested results in sub-lists. The order of the results follows the order of the row/column/result-names given
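A hypothetical call (the report object and variable labels are illustrative):

```r
# Select one result slot for two variables; drop = FALSE keeps the list
# structure even if only a single result matches.
sub_results <- report[c("SBP_0", "DBP_0"), "acc_margins", drop = FALSE]
```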


Set a single result from a dataquieR 2 report

Description

Set a single result from a ⁠dataquieR 2⁠ report

Usage

## S3 replacement method for class 'dataquieR_resultset2'
x[[el]] <- value

Arguments

x

the report

el

the index

value

the single result

Value

the dataquieR result object


Extract Elements of a dataquieR Result Object

Description

Extract Elements of a dataquieR Result Object

Usage

## S3 method for class 'dataquieR_result'
x[[...]]

Arguments

x

the dataquieR result object

...

arguments passed to the implementation for lists.

Value

the element of the dataquieR result object with all messages still attached

See Also

base::Extract


Get a single result from a ⁠dataquieR 2⁠ report

Description

Get a single result from a ⁠dataquieR 2⁠ report

Usage

## S3 method for class 'dataquieR_resultset2'
x[[el]]

Arguments

x

the report

el

the index

Value

the dataquieR result object


Plots and checks for distributions for categorical variables

Description

To complete

Descriptor

Usage

acc_cat_distributions(
  resp_vars = NULL,
  group_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable the name of the measurement variable

group_vars

variable the name of the observer, device or reader variable

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

To complete

Value

A list with:

See Also

Online Documentation


Plots and checks for distributions

Description

Data quality indicator checks "Unexpected location" and "Unexpected proportion" with histograms.

Indicator

Usage

acc_distributions(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  check_param = c("any", "location", "proportion"),
  plot_ranges = TRUE,
  flip_mode = "noflip",
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable list the names of the measurement variables

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

check_param

enum any | location | proportion. Which type of check should be conducted (if possible): a check on the location of the mean or median value of the study data, a check on proportions of categories, or either of them if the necessary metadata is available.

plot_ranges

logical Should the plot show ranges and results from the data quality checks? (default: TRUE)

flip_mode

enum default | flip | noflip | auto. Should the plot be in default orientation, flipped, not flipped, or auto-flipped? Not all options are always supported. In general, this can be controlled by setting the R option dataquieR.flip_mode, i.e., options(dataquieR.flip_mode = ...). If called from dq_report, you can also pass flip_mode to all function calls or set them specifically using specific_args.

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Value

A list with:

Algorithm of this implementation:

See Also

Online Documentation
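
Examples

A hypothetical call sketch; the variable names and file paths are placeholders and must be replaced by your own study files:

```r
library(dataquieR)

# The study data as a data.frame; the file name is a placeholder
sd1 <- rio::import("study_data.xlsx")

# Plot orientation can be controlled globally via the R option:
options(dataquieR.flip_mode = "auto")

r <- acc_distributions(
  resp_vars    = c("SBP_0", "DBP_0"),   # hypothetical measurement variables
  study_data   = sd1,
  meta_data_v2 = "meta_data_v2.xlsx",   # workbook-like metadata file
  check_param  = "any",                 # location and/or proportion check
  plot_ranges  = TRUE
)
names(r)   # inspect the returned result slots
```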


ECDF plots for distribution checks

Description

Data quality indicator checks "Unexpected location" and "Unexpected proportion" if a grouping variable is included: Plots of empirical cumulative distributions for the subgroups.

Descriptor

Usage

acc_distributions_ecdf(
  resp_vars = NULL,
  group_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2,
  n_group_max = getOption("dataquieR.max_group_var_levels_in_plot",
    dataquieR.max_group_var_levels_in_plot_default),
  n_obs_per_group_min = getOption("dataquieR.min_obs_per_group_var_in_plot",
    dataquieR.min_obs_per_group_var_in_plot_default)
)

Arguments

resp_vars

variable list the names of the measurement variables

group_vars

variable list the name of the observer, device or reader variable

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

n_group_max

maximum number of categories to be displayed individually for the grouping variable (group_vars, devices / examiners)

n_obs_per_group_min

minimum number of data points per group to create a graph for an individual category of the group_vars variable

Value

A list with:

See Also

Online Documentation
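
Examples

A hypothetical sketch with a grouping variable; "USR_BP_0" stands for an observer/device variable and all names and files are placeholders:

```r
library(dataquieR)

r <- acc_distributions_ecdf(
  resp_vars    = "SBP_0",                       # measurement variable
  group_vars   = "USR_BP_0",                    # observer/device variable
  study_data   = rio::import("study_data.xlsx"),
  meta_data_v2 = "meta_data_v2.xlsx",
  n_group_max  = 10,            # display at most 10 groups individually
  n_obs_per_group_min = 30      # skip groups with fewer than 30 data points
)
```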


Plots and checks for distributions – Location

Description

Data quality indicator checks "Unexpected location" and "Unexpected proportion" with histograms.

Indicator

Usage

acc_distributions_loc(
  resp_vars = NULL,
  study_data,
  label_col = VAR_NAMES,
  item_level = "item_level",
  check_param = "location",
  plot_ranges = TRUE,
  flip_mode = "noflip",
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable list the names of the measurement variables

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

check_param

enum any | location | proportion. Which type of check should be conducted (if possible): a check on the location of the mean or median value of the study data, a check on proportions of categories, or either of them if the necessary metadata is available.

plot_ranges

logical Should the plot show ranges and results from the data quality checks? (default: TRUE)

flip_mode

enum default | flip | noflip | auto. Should the plot be in default orientation, flipped, not flipped, or auto-flipped? Not all options are always supported. In general, this can be controlled by setting the R option dataquieR.flip_mode, i.e., options(dataquieR.flip_mode = ...). If called from dq_report, you can also pass flip_mode to all function calls or set them specifically using specific_args.

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Value

A list with:

Algorithm of this implementation:

See Also


Plots and checks for distributions – only

Description

Descriptor

Usage

acc_distributions_only(
  resp_vars = NULL,
  study_data,
  label_col = VAR_NAMES,
  item_level = "item_level",
  flip_mode = "noflip",
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable list the names of the measurement variables

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

flip_mode

enum default | flip | noflip | auto. Should the plot be in default orientation, flipped, not flipped, or auto-flipped? Not all options are always supported. In general, this can be controlled by setting the R option dataquieR.flip_mode, i.e., options(dataquieR.flip_mode = ...). If called from dq_report, you can also pass flip_mode to all function calls or set them specifically using specific_args.

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Value

A list with:

Algorithm of this implementation:

See Also


Plots and checks for distributions – Proportion

Description

Data quality indicator checks "Unexpected location" and "Unexpected proportion" with histograms.

Indicator

Usage

acc_distributions_prop(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  check_param = "proportion",
  plot_ranges = TRUE,
  flip_mode = "noflip",
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable list the names of the measurement variables

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

check_param

enum any | location | proportion. Which type of check should be conducted (if possible): a check on the location of the mean or median value of the study data, a check on proportions of categories, or either of them if the necessary metadata is available.

plot_ranges

logical Should the plot show ranges and results from the data quality checks? (default: TRUE)

flip_mode

enum default | flip | noflip | auto. Should the plot be in default orientation, flipped, not flipped, or auto-flipped? Not all options are always supported. In general, this can be controlled by setting the R option dataquieR.flip_mode, i.e., options(dataquieR.flip_mode = ...). If called from dq_report, you can also pass flip_mode to all function calls or set them specifically using specific_args.

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Value

A list with:

Algorithm of this implementation:

See Also


Extension of acc_shape_or_scale to examine uniform distributions of end digits

Description

This implementation contrasts the empirical distribution of a measurement variable against assumed distributions. The approach is adapted from the idea of rootograms (Tukey 1977), which is also applicable to count data (Kleiber and Zeileis 2016).

Indicator

Usage

acc_end_digits(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable the names of the measurement variables, mandatory

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Value

a list with:

ALGORITHM OF THIS IMPLEMENTATION:

See Also

Online Documentation
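
Examples

A hypothetical call sketch; all variable and file names are placeholders. Note that resp_vars is mandatory for this function:

```r
library(dataquieR)

r <- acc_end_digits(
  resp_vars    = "SBP_0",                        # measurement variable
  study_data   = rio::import("study_data.xlsx"), # placeholder file
  meta_data_v2 = "meta_data_v2.xlsx"             # workbook-like metadata
)
names(r)   # inspect the returned result slots
```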


Smooths and plots adjusted longitudinal measurements and longitudinal trends from logistic regression models

Description

The following R implementation executes calculations for the quality indicator "Unexpected location" (see here). Local regression (LOESS) is a versatile statistical method to explore an averaged course of time series measurements (Cleveland, Devlin, and Grosse 1988). In the context of epidemiological data, repeated measurements using the same measurement device or by the same examiner can be considered a time series. LOESS allows exploring changes in these measurements over time.

Descriptor

Usage

acc_loess(
  resp_vars,
  group_vars = NULL,
  time_vars,
  co_vars = NULL,
  study_data,
  label_col = VAR_NAMES,
  item_level = "item_level",
  min_obs_in_subgroup = 30,
  resolution = 80,
  comparison_lines = list(type = c("mean/sd", "quartiles"), color = "grey30", linetype =
    2, sd_factor = 0.5),
  mark_time_points = getOption("dataquieR.acc_loess.mark_time_points",
    dataquieR.acc_loess.mark_time_points_default),
  plot_observations = getOption("dataquieR.acc_loess.plot_observations",
    dataquieR.acc_loess.plot_observations_default),
  plot_format = getOption("dataquieR.acc_loess.plot_format",
    dataquieR.acc_loess.plot_format_default),
  meta_data = item_level,
  meta_data_v2,
  n_group_max = getOption("dataquieR.max_group_var_levels_in_plot",
    dataquieR.max_group_var_levels_in_plot_default),
  enable_GAM = getOption("dataquieR.GAM_for_LOESS", dataquieR.GAM_for_LOESS.default),
  exclude_constant_subgroups =
    getOption("dataquieR.acc_loess.exclude_constant_subgroups",
    dataquieR.acc_loess.exclude_constant_subgroups.default),
  min_bandwidth = getOption("dataquieR.acc_loess.min_bw",
    dataquieR.acc_loess.min_bw.default),
  min_proportion = getOption("dataquieR.acc_loess.min_proportion",
    dataquieR.acc_loess.min_proportion.default)
)

Arguments

resp_vars

variable the name of the continuous measurement variable

group_vars

variable the name of the observer, device or reader variable

time_vars

variable the name of the variable giving the time of measurement

co_vars

variable list a vector of covariables for adjustment, for example age and sex. Can be NULL (default) for no adjustment.

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

min_obs_in_subgroup

integer (optional argument) If group_vars is specified, this argument can be used to specify the minimum number of observations required for each of the subgroups. Subgroups with fewer observations are excluded. The default number is 30.

resolution

numeric the maximum number of time points used for plotting the trend lines

comparison_lines

list type and style of lines with which trend lines are to be compared. Can be mean +/- 0.5 standard deviation (the factor can be specified differently in sd_factor) or quartiles (Q1, Q2, and Q3). Arguments color and linetype are passed to ggplot2::geom_line().

mark_time_points

logical mark time points with observations (caution, there may be many marks)

plot_observations

logical show observations as scatter plot in the background. If there are co_vars specified, the values of the observations in the plot will also be adjusted for the specified covariables.

plot_format

enum AUTO | COMBINED | FACETS | BOTH. Return the plot as one combined plot for all groups or as facet plots (one figure per group). BOTH will return both variants, AUTO will decide based on the number of observers.

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

n_group_max

integer maximum number of categories to be displayed individually for the grouping variable (group_vars, devices / examiners)

enable_GAM

logical Can LOESS computations be replaced by general additive models to reduce memory consumption for large datasets?

exclude_constant_subgroups

logical Should subgroups with constant values be excluded?

min_bandwidth

numeric lower limit for the LOESS bandwidth, should be greater than 0 and less than or equal to 1. In general, increasing the bandwidth leads to a smoother trend line.

min_proportion

numeric lower limit for the proportion of the smaller group (cases or controls) for creating a LOESS figure, should be greater than 0 and less than 0.4.

Details

If mark_time_points or plot_observations is selected, but would result in plotting more than 400 points, only a sample of the data will be displayed.

Limitations

The application of LOESS requires model fitting, i.e., the smoothness of a model is subject to a smoothing parameter (span). Particularly in the presence of interval-based missing data, high variability of measurements combined with a low number of observations in one level of the group_vars may distort the fit. Since our approach handles data without knowledge of such underlying characteristics, finding the best fit is complicated if computational costs should be minimal. The default span of LOESS in R is 0.75, which in most cases provides reasonable fits. The function acc_loess adapts the span for each level of the group_vars (with at least as many observations as specified in min_obs_in_subgroup and with at least three time points) based on the respective number of observations. LOESS consumes a lot of memory for larger datasets. That is why acc_loess switches to a generalized additive model with integrated smoothness estimation (gam from mgcv) if there are 1000 observations or more for at least one level of the group_vars (similar to geom_smooth from ggplot2).

Value

a list with:

See Also

Online Documentation
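
Examples

A hypothetical call sketch combining grouping, time, and adjustment variables; all names and file paths are placeholders:

```r
library(dataquieR)

r <- acc_loess(
  resp_vars    = "SBP_0",              # continuous measurement variable
  group_vars   = "USR_BP_0",           # examiner/device variable
  time_vars    = "EXAM_DT_0",          # date/time of measurement
  co_vars      = c("AGE_0", "SEX_0"),  # adjust for age and sex
  study_data   = rio::import("study_data.xlsx"),
  meta_data_v2 = "meta_data_v2.xlsx",
  min_obs_in_subgroup = 30,            # drop sparser examiner subgroups
  plot_format  = "AUTO"                # combined vs. facet plot by group count
)
```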


Estimate marginal means, see emmeans::emmeans

Description

This function examines the impact of so-called process variables on a measurement variable. This implementation combines a descriptive and a model-based approach. Process variables that can be considered in this implementation must be categorical. It is currently not possible to consider more than one process variable within one function call. The measurement variable can be adjusted for (multiple) covariables, such as age or sex, for example.

Marginal means rest on model-based results, i.e., a significantly different marginal mean depends on sample size. Particularly in large studies, small and irrelevant differences may become significant. The contrary holds if the sample size is low.

Indicator

Usage

acc_margins(
  resp_vars = NULL,
  group_vars = NULL,
  co_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  threshold_type = "empirical",
  threshold_value,
  min_obs_in_subgroup = 5,
  min_obs_in_cat = 5,
  dichotomize_categorical_resp = TRUE,
  cut_off_linear_model_for_ord = 10,
  meta_data = item_level,
  meta_data_v2,
  sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
    dataquieR.acc_margins_sort_default),
  include_numbers_in_figures = getOption("dataquieR.acc_margins_num",
    dataquieR.acc_margins_num_default),
  n_violin_max = getOption("dataquieR.max_group_var_levels_with_violins",
    dataquieR.max_group_var_levels_with_violins_default)
)

Arguments

resp_vars

variable the name of the measurement variable

group_vars

variable list len=1-1. the name of the observer, device or reader variable

co_vars

variable list a vector of covariables, e.g. age and sex for adjustment

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

threshold_type

enum empirical | user | none. In case empirical is chosen, a multiplier of the scale measure is used. In case of user, a value of the mean or probability (binary data) has to be defined (see "Implementation and use of thresholds" in the online documentation). In case of none, no thresholds are displayed and no flagging of unusual group levels is applied.

threshold_value

numeric a multiplier or absolute value (see ⁠Implementation and use of thresholds⁠ in the online documentation).

min_obs_in_subgroup

integer from=0. This optional argument specifies the minimum number of observations that is required to include a subgroup (level) of the group_var in the analysis. Subgroups with fewer observations are excluded.

min_obs_in_cat

integer This optional argument specifies the minimum number of observations that is required to include a category (level) of the outcome (resp_vars) in the analysis. Categories with fewer observations are combined into one group. If the collapsed category contains fewer observations than required, it will be excluded from the analysis.

dichotomize_categorical_resp

logical Should nominal response variables always be transformed to binary variables?

cut_off_linear_model_for_ord

integer from=0. This optional argument specifies the minimum number of observations for individual levels of an ordinal outcome (resp_var) that is required to run a linear model instead of an ordered regression (i.e., a cut-off value above which linear models are considered a good approximation). The argument can be set to NULL if ordered regression models are preferred for ordinal data in any case.

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

sort_group_var_levels

logical Should the levels of the grouping variable be sorted descending by the number of observations? Note that ordinal grouping variables will not be reordered.

include_numbers_in_figures

logical Should the figure report the number of observations for each level of the grouping variable?

n_violin_max

integer from=0. This optional argument specifies the maximum number of levels of the group_var for which violin plots will be shown in the figure.

Details

Limitations

Selecting the appropriate distribution is complex. Dozens of continuous, discrete, or mixed distributions are conceivable in the context of epidemiological data. Their exact exploration is beyond the scope of this data quality approach. The present function uses the helper function util_dist_selection, the assigned SCALE_LEVEL, and the DATA_TYPE to discriminate the following cases:

Continuous data and count data with more than 20 distinct values are analyzed by linear models. Count data with up to 20 distinct values are modeled by a Poisson regression. For binary data, the implementation uses logistic regression. Nominal response variables will either be transformed to binary variables or analyzed by multinomial logistic regression models. The latter option is only available if the argument dichotomize_categorical_resp is set to FALSE and if the package nnet is installed. The transformation to a binary variable can be user-specified using the metadata columns RECODE_CASES and/or RECODE_CONTROL. Otherwise, the most frequent category will be assigned to cases and the remaining categories to controls. For ordinal response variables, the argument cut_off_linear_model_for_ord controls whether the data is analyzed in the same way as continuous data: If every level of the variable has at least as many observations as specified in the argument, the data will be analyzed by a linear model. Otherwise, the data will be modeled by an ordered regression if the package ordinal is installed.

Value

a list with:

See Also

Online Documentation
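
Examples

A hypothetical call sketch; variable names, the metadata file, and the threshold choice are placeholders:

```r
library(dataquieR)

r <- acc_margins(
  resp_vars      = "SBP_0",              # measurement variable
  group_vars     = "USR_BP_0",           # observer/device variable (exactly one)
  co_vars        = c("AGE_0", "SEX_0"),  # covariables for adjustment
  study_data     = rio::import("study_data.xlsx"),
  meta_data_v2   = "meta_data_v2.xlsx",
  threshold_type = "empirical",          # flag levels deviating by a scale multiple
  min_obs_in_subgroup = 5                # drop levels with fewer observations
)
```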


Calculate and plot Mahalanobis distances

Description

A standard tool to detect multivariate outliers is the Mahalanobis distance. This approach is very helpful for the interpretation of the plausibility of a measurement given the value of another. In this approach the Mahalanobis distance is used as a univariate measure itself. We apply the same rules for the identification of outliers as in univariate outliers:

For further details, please see the vignette for univariate outlier.

Indicator

Usage

acc_multivariate_outlier(
  variable_group = NULL,
  id_vars = NULL,
  label_col = VAR_NAMES,
  study_data,
  item_level = "item_level",
  n_rules = 4,
  max_non_outliers_plot = 10000,
  criteria = c("tukey", "3sd", "hubert", "sigmagap"),
  meta_data = item_level,
  meta_data_v2,
  scale = getOption("dataquieR.acc_multivariate_outlier.scale",
    dataquieR.acc_multivariate_outlier.scale_default),
  multivariate_outlier_check = TRUE
)

Arguments

variable_group

variable list the names of the continuous measurement variables that form a group for which multivariate outliers are meaningful.

id_vars

variable optional, an ID variable of the study data. If not specified, row numbers are used.

label_col

variable attribute the name of the column in the metadata with labels of variables

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

n_rules

numeric from=1 to=4. the number of rules that must be violated to classify a value as an outlier

max_non_outliers_plot

integer from=0. Maximum number of non-outlier points to be plotted. If more points exist, only a subsample will be plotted. Note that sampling is not deterministic.

criteria

set tukey | 3SD | hubert | sigmagap. a vector with methods to be used for detecting outliers.

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

scale

logical Should min-max-scaling be applied per variable?

multivariate_outlier_check

logical whether to really perform the check; intended for pipeline use only.

Value

a list with:

ALGORITHM OF THIS IMPLEMENTATION:

List function.

See Also

Online Documentation
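
Examples

A hypothetical call sketch; the variable group below is a placeholder for physiologically related variables from your own study:

```r
library(dataquieR)

r <- acc_multivariate_outlier(
  variable_group = c("SBP_0", "DBP_0"),   # variables forming the group
  study_data     = rio::import("study_data.xlsx"),
  meta_data_v2   = "meta_data_v2.xlsx",
  criteria       = c("tukey", "3sd", "hubert", "sigmagap"),
  n_rules        = 4    # all four criteria must flag a point as an outlier
)
```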


Identify univariate outliers by four different approaches

Description

A classical but still popular approach to detect univariate outliers is the boxplot method introduced by Tukey 1977. The boxplot is a simple graphical tool to display information about continuous univariate data (e.g., median, lower and upper quartile). Outliers are defined as values deviating more than 1.5 × IQR from the 1st (Q25) or 3rd (Q75) quartile. The strength of Tukey's method is that it makes no distributional assumptions and is thus also applicable to skewed or non-mound-shaped data (Marsh and Seo, 2006). Nevertheless, this method tends to identify frequent measurements that are falsely interpreted as true outliers.

A somewhat more conservative approach in terms of symmetric and/or normal distributions is the 3SD approach, i.e., any measurement outside the interval mean(x) ± 3σ is considered an outlier.

Both methods mentioned above are not ideally suited to skewed distributions. Since many biomarkers, such as laboratory measurements, present skewed distributions, the methods above may be insufficient. The approach of Hubert and Vandervieren 2008 adjusts the boxplot for the skewness of the distribution. This approach is implemented in several R packages, such as robustbase::mc, which is used in this implementation of dataquieR.

Another, completely heuristic approach is also included to identify outliers. It is based on the assumption that the distances between measurements of the same underlying distribution should be homogeneous. For comprehension of this approach:

Note that the plots are not deterministic because they use ggplot2::geom_jitter.

Indicator

Usage

acc_robust_univariate_outlier(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  exclude_roles,
  n_rules = length(unique(criteria)),
  max_non_outliers_plot = 10000,
  criteria = c("tukey", "3sd", "hubert", "sigmagap"),
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable list the name of the continuous measurement variable

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

exclude_roles

variable roles a character vector of variable roles that should not be included

n_rules

integer from=1 to=4. the number of rules that must be violated to flag a variable as containing outliers. The default is 4, i.e., all of them.

max_non_outliers_plot

integer from=0. Maximum number of non-outlier points to be plotted. If more points exist, only a subsample will be plotted. Note that sampling is not deterministic.

criteria

set tukey | 3SD | hubert | sigmagap. a vector with methods to be used for detecting outliers.

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

Hint: The function is designed for unimodal data only.

Value

a list with:

ALGORITHM OF THIS IMPLEMENTATION:

See Also

acc_univariate_outlier


Compare observed versus expected distributions

Description

This implementation contrasts the empirical distribution of a measurement variable against assumed distributions. The approach is adapted from the idea of rootograms (Tukey 1977), which is also applicable to count data (Kleiber and Zeileis 2016).

Indicator

Usage

acc_shape_or_scale(
  resp_vars,
  study_data,
  label_col,
  item_level = "item_level",
  dist_col,
  guess,
  par1,
  par2,
  end_digits,
  flip_mode = "noflip",
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable the name of the continuous measurement variable

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

dist_col

variable attribute the name of the variable attribute in meta_data that provides the expected distribution of a study variable

guess

logical estimate parameters

par1

numeric first parameter of the distribution if applicable

par2

numeric second parameter of the distribution if applicable

end_digits

logical internal use; check for end-digit preferences

flip_mode

enum default | flip | noflip | auto. Should the plot be in default orientation, flipped, not flipped, or auto-flipped? Not all options are always supported. In general, this can be controlled by setting the R option dataquieR.flip_mode, i.e., options(dataquieR.flip_mode = ...). If called from dq_report, you can also pass flip_mode to all function calls or set them specifically using specific_args.

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Value

a list with:

ALGORITHM OF THIS IMPLEMENTATION:

See Also

Online Documentation
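
Examples

A hypothetical call sketch; it assumes the item-level metadata contains a column (here named "DISTRIBUTION" as a placeholder) giving the expected distribution per variable:

```r
library(dataquieR)

r <- acc_shape_or_scale(
  resp_vars    = "SBP_0",                        # continuous variable
  study_data   = rio::import("study_data.xlsx"), # placeholder file
  meta_data_v2 = "meta_data_v2.xlsx",
  dist_col     = "DISTRIBUTION",   # metadata column with expected distribution
  guess        = TRUE              # estimate distribution parameters from data
)
```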


Identify univariate outliers by four different approaches

Description

A classical but still popular approach to detect univariate outliers is the boxplot method introduced by Tukey 1977. The boxplot is a simple graphical tool to display information about continuous univariate data (e.g., median, lower and upper quartile). Outliers are defined as values deviating more than 1.5 × IQR from the 1st (Q25) or 3rd (Q75) quartile. The strength of Tukey's method is that it makes no distributional assumptions and is thus also applicable to skewed or non-mound-shaped data (Marsh and Seo, 2006). Nevertheless, this method tends to identify frequent measurements that are falsely interpreted as true outliers.

A somewhat more conservative approach in terms of symmetric and/or normal distributions is the 3SD approach, i.e., any measurement outside the interval mean(x) ± 3σ is considered an outlier.

Both methods mentioned above are not ideally suited to skewed distributions. Since many biomarkers, such as laboratory measurements, present skewed distributions, the methods above may be insufficient. The approach of Hubert and Vandervieren 2008 adjusts the boxplot for the skewness of the distribution. This approach is implemented in several R packages, such as robustbase::mc, which is used in this implementation of dataquieR.

Another, completely heuristic approach is also included to identify outliers. It is based on the assumption that the distances between measurements of the same underlying distribution should be homogeneous. For comprehension of this approach:

Note that the plots are not deterministic, because they use ggplot2::geom_jitter.
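The Tukey and 3SD criteria described above can be sketched in base R. This is a minimal illustration, not the dataquieR implementation (the skewness-adjusted boxplot additionally relies on the medcouple from robustbase):

```r
# Minimal sketch of two of the outlier criteria described above
# (illustrative only, not the dataquieR implementation).
tukey_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}
sd3_outliers <- function(x) {
  abs(x - mean(x, na.rm = TRUE)) > 3 * sd(x, na.rm = TRUE)
}
set.seed(1)
x <- c(rnorm(100), 42)        # one clear outlier appended
which(tukey_outliers(x))      # flags the extreme value (and possibly more)
which(sd3_outliers(x))        # the more conservative criterion
```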

Indicator

Usage

acc_univariate_outlier(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  exclude_roles,
  n_rules = length(unique(criteria)),
  max_non_outliers_plot = 10000,
  criteria = c("tukey", "3sd", "hubert", "sigmagap"),
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable list the name of the continuous measurement variable

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

exclude_roles

variable roles a character (vector) of variable roles not included

n_rules

integer from=1 to=4. the number of rules that must be violated to flag a variable as containing outliers. The default is 4, i.e., all.

max_non_outliers_plot

integer from=0. Maximum number of non-outlier points to be plotted. If more points exist, only a subsample will be plotted. Note that sampling is not deterministic.

criteria

set tukey | 3SD | hubert | sigmagap. a vector with methods to be used for detecting outliers.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

Hint: The function is designed for unimodal data only.

Value

a list with:

ALGORITHM OF THIS IMPLEMENTATION:

See Also


Utility function to compute model-based ICC depending on the (statistical) data type

Description

This function is still under construction. It is designed to run for any statistical data type as follows:

Indicator

Usage

acc_varcomp(
  resp_vars = NULL,
  group_vars = NULL,
  co_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  min_obs_in_subgroup = 10,
  min_subgroups = 5,
  cut_off_linear_model_for_ord = 10,
  threshold_value = lifecycle::deprecated(),
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable the name of the measurement variable

group_vars

variable the name of the examiner, device or reader variable

co_vars

variable list a vector of covariables, e.g. age and sex, for adjustment

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

min_obs_in_subgroup

integer from=0. This optional argument specifies the minimum number of observations that is required to include a subgroup (level) of the group_var in the analysis. Subgroups with fewer observations are excluded.

min_subgroups

integer from=0. This optional argument specifies the minimum number of subgroups (levels) of the group_var that is required to run the analysis. If there are fewer subgroups, the analysis is not conducted.

cut_off_linear_model_for_ord

integer from=0. This optional argument specifies the minimum number of observations for individual levels of an ordinal outcome (resp_var) that is required to run a linear mixed effects model instead of a mixed effects ordered regression (i.e., a cut-off value above which linear models are considered a good approximation). The argument can be set to NULL if ordered regression models are preferred for ordinal data in any case.

threshold_value

Deprecated.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

Not yet described

Value

The function returns two data frames, 'SummaryTable' and 'SummaryData', that differ only in the names of the columns.


Convert a full dataquieR report to a data.frame

Description

Deprecated

Usage

## S3 method for class 'dataquieR_resultset'
as.data.frame(x, ...)

Arguments

x

Deprecated

...

Deprecated

Value

Deprecated


Convert a full dataquieR report to a list

Description

Deprecated

Usage

## S3 method for class 'dataquieR_resultset'
as.list(x, ...)

Arguments

x

Deprecated

...

Deprecated

Value

Deprecated


Inefficient way to convert a report to a list. Try prep_set_backend() instead.

Description

Inefficient way to convert a report to a list. Try prep_set_backend() instead.

Usage

## S3 method for class 'dataquieR_resultset2'
as.list(x, ...)

Arguments

x

dataquieR_resultset2

...

not used

Value

list


Data frame with contradiction rules

Description

Two versions exist: the newer one, used by con_contradictions_redcap, is described here; the older one, used by con_contradictions, is described here.

See Also

meta_data_cross


Summarize missingness columnwise (in variable)

Description

Item missingness (also referred to as item nonresponse, De Leeuw et al. 2003) describes the missingness of single values, e.g., blanks or empty data cells in a data set. Item missingness occurs, for example, if a respondent does not provide information for a certain question, a question is overlooked by accident, a programming failure occurs, or a provided answer was missed while entering the data.
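The counting behind item missingness can be sketched in base R. This is illustrative only; in the package, missing codes and their labels come from the missing-code metadata, and the example data and codes below are made up:

```r
# Sketch of item missingness per variable: system missings (NA) and
# coded missings (e.g., 99999 = "refused") both count as missing.
missing_codes <- c(99999, 88888)                 # hypothetical jump/missing codes
sd0 <- data.frame(AGE = c(44, NA, 99999, 51),
                  SEX = c(1, 2, 88888, NA))
item_missing_pct <- sapply(sd0, function(x) {
  100 * mean(is.na(x) | x %in% missing_codes)    # percentage per variable
})
item_missing_pct  # AGE and SEX: 50% each
```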

Indicator

Usage

com_item_missingness(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  show_causes = TRUE,
  cause_label_df,
  include_sysmiss = TRUE,
  threshold_value,
  suppressWarnings = FALSE,
  assume_consistent_codes = TRUE,
  expand_codes = assume_consistent_codes,
  drop_levels = FALSE,
  expected_observations = c("HIERARCHY", "ALL", "SEGMENT"),
  pretty_print = lifecycle::deprecated(),
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable list the name of the measurement variables

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

show_causes

logical if TRUE, then the distribution of missing codes is shown

cause_label_df

data.frame missing code table. If missing codes have labels the respective data frame can be specified here or in the metadata as assignments, see cause_label_df

include_sysmiss

logical Optional, if TRUE system missingness (NAs) is evaluated in the summary plot

threshold_value

numeric from=0 to=100. a numerical value ranging from 0-100

suppressWarnings

logical warn about consistency issues with missing and jump lists

assume_consistent_codes

logical if TRUE and no labels are given and the same missing/jump code is used for more than one variable, the labels assigned for this code are treated as being the same for all variables.

expand_codes

logical if TRUE, code labels are copied from other variables, if the code is the same and the label is set somewhere

drop_levels

logical if TRUE, do not display unused missing codes in the figure legend.

expected_observations

enum HIERARCHY | ALL | SEGMENT. If ALL, all observations are expected to comprise all study segments. If SEGMENT, PART_VAR is expected to point to a variable with values of 0 and 1, indicating for each data row whether the variable was expected to be observed. If HIERARCHY, this is also checked recursively: if a variable points to such a participation variable, and that variable in turn has a PART_VAR entry pointing to another variable, an observation of the initial variable is only expected if both segment variables are 1.
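The recursive HIERARCHY rule can be sketched as follows. This is a hypothetical helper for illustration; expected_hierarchy, part_var, and the variable names are not part of the package:

```r
# Hypothetical sketch of the HIERARCHY rule: an observation is expected
# only if the variable's participation variable and all participation
# variables further up the chain equal 1. part_var maps each variable to
# its participation variable (NA if none); row is one data row.
expected_hierarchy <- function(var, part_var, row) {
  p <- part_var[[var]]
  if (is.na(p)) return(TRUE)            # no PART_VAR entry: always expected
  isTRUE(row[[p]] == 1) && expected_hierarchy(p, part_var, row)
}
part_var <- c(SBP = "EXAM_PART", EXAM_PART = "STUDY_PART", STUDY_PART = NA)
row <- list(EXAM_PART = 1, STUDY_PART = 0)
expected_hierarchy("SBP", part_var, row)  # FALSE: study segment not attended
```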

pretty_print

logical deprecated. If you want to have a human readable output, use SummaryData instead of SummaryTable

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Value

a list with:

ALGORITHM OF THIS IMPLEMENTATION:

See Also

Online Documentation


Compute Indicators for Qualified Item Missingness

Description

Indicator

Usage

com_qualified_item_missingness(
  resp_vars,
  study_data,
  label_col = NULL,
  item_level = "item_level",
  expected_observations = c("HIERARCHY", "ALL", "SEGMENT"),
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable list the name of the measurement variables

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

expected_observations

enum HIERARCHY | ALL | SEGMENT. Report the number of observations expected using the old PART_VAR concept. See com_item_missingness for an explanation.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Value

A list with:

Examples

## Not run: 
prep_load_workbook_like_file("inst/extdata/Metadata_example_v3-6.xlsx")
clean <- prep_get_data_frame("item_level")
clean <- subset(clean, `Metadata name` == "Example" &
  !dataquieR:::util_empty(VAR_NAMES))
clean$`Metadata name` <- NULL
clean[, "MISSING_LIST_TABLE"] <- "missing_matchtable1"
prep_add_data_frames(item_level = clean)
clean <- prep_get_data_frame("missing_matchtable1")
clean <- clean[clean$`Metadata name` == "Example", , FALSE]
clean <-
  clean[suppressWarnings(as.character(as.integer(clean$CODE_VALUE)) ==
    as.character(clean$CODE_VALUE)), , FALSE]
clean$CODE_VALUE <- as.integer(clean$CODE_VALUE)
clean <- clean[!is.na(clean$`Metadata name`), , FALSE]
clean$`Metadata name` <- NULL
prep_add_data_frames(missing_matchtable1 = clean)
ship <- prep_get_data_frame("ship")
number_of_mis <- ceiling(nrow(ship) / 20)
resp_vars <- sample(colnames(ship), ceiling(ncol(ship) / 20), FALSE)
mistab <- prep_get_data_frame("missing_matchtable1")
valid_replacement_codes <-
  mistab[mistab$CODE_INTERPRET != "I", CODE_VALUE,
    drop =
    TRUE] # sample only replacement codes on item level. I uses the actual
          # values
for (rv in resp_vars) {
  values <- sample(as.numeric(valid_replacement_codes), number_of_mis,
    replace = TRUE)
  if (inherits(ship[[rv]], "POSIXct")) {
    values <- as.POSIXct(values, origin = min(as.POSIXct(Sys.Date()), 0))
  }
  ship[sample(seq_len(nrow(ship)), number_of_mis, replace = FALSE), rv] <-
    values
}
com_qualified_item_missingness(resp_vars = NULL, ship, "item_level", LABEL)
com_qualified_item_missingness(resp_vars = "Diabetes Age onset", ship,
  "item_level", LABEL)
com_qualified_item_missingness(resp_vars = NULL, "study_data", "meta_data",
  LABEL)
study_data <- ship
meta_data <- prep_get_data_frame("item_level")
label <- LABEL

## End(Not run)

Compute Indicators for Qualified Segment Missingness

Description

Indicator

Usage

com_qualified_segment_missingness(
  label_col = NULL,
  study_data,
  item_level = "item_level",
  expected_observations = c("HIERARCHY", "ALL", "SEGMENT"),
  meta_data = item_level,
  meta_data_v2,
  meta_data_segment,
  segment_level
)

Arguments

label_col

variable attribute the name of the column in the metadata with labels of variables

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

expected_observations

enum HIERARCHY | ALL | SEGMENT. Report the number of observations expected using the old PART_VAR concept. See com_item_missingness for an explanation.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

meta_data_segment

data.frame Segment level metadata

segment_level

data.frame alias for meta_data_segment

Value

A list with:


Summarizes missingness for individuals in specific segments

Description

This implementation can be applied in two use cases:

  1. participation in study segments is not recorded by respective variables, e.g. a participant's refusal to attend a specific examination is not recorded.

  2. participation in study segments is recorded by respective variables.

Use case (1) will be common in smaller studies. For the calculation of segment missingness it is assumed that study variables are nested in respective segments. This structure must be specified in the static metadata. The R function identifies all variables within each segment and returns TRUE if all variables within a segment are missing, FALSE otherwise.

Use case (2) assumes a more complex structure of study data and metadata. The study data comprise so-called intro-variables (either TRUE/FALSE or codes for non-participation). The column PART_VAR in the metadata is filled with variable IDs indicating for each variable the respective intro-variable. This structure has the benefit that the subsequent calculation of item missingness obtains correct denominators for the calculation of missingness rates.
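Use case (1) can be sketched in base R. This is illustrative only; the segments mapping below is a hypothetical stand-in for the segment assignment in the static metadata:

```r
# Sketch of use case (1): a segment counts as missing for a participant
# if all variables assigned to that segment are NA.
sd0 <- data.frame(SBP  = c(120, NA, NA),
                  DBP  = c(80,  NA, 70),
                  LAB1 = c(5.1, NA, NA))
segments <- list(EXAM = c("SBP", "DBP"), LAB = "LAB1")
# one logical column per segment, one row per participant
sapply(segments, function(vars) {
  apply(is.na(sd0[, vars, drop = FALSE]), 1, all)
})
```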

Descriptor

Usage

com_segment_missingness(
  study_data,
  item_level = "item_level",
  strata_vars = NULL,
  group_vars = NULL,
  label_col,
  threshold_value,
  direction,
  color_gradient_direction,
  expected_observations = c("HIERARCHY", "ALL", "SEGMENT"),
  exclude_roles = c(VARIABLE_ROLES$PROCESS),
  meta_data = item_level,
  meta_data_v2,
  segment_level,
  meta_data_segment
)

Arguments

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

strata_vars

variable the name of a variable used for stratification; defaults to NULL, i.e., no stratified output

group_vars

variable the name of a variable used for grouping; defaults to NULL, i.e., no grouped output

label_col

variable attribute the name of the column in the metadata with labels of variables

threshold_value

numeric from=0 to=100. a numerical value ranging from 0-100

direction

enum low | high. "high" or "low", i.e. are deviations above/below the threshold critical. This argument is deprecated and replaced by color_gradient_direction.

color_gradient_direction

enum above | below. "above" or "below", i.e. are deviations above or below the threshold critical? (default: above)

expected_observations

enum HIERARCHY | ALL | SEGMENT. If ALL, all observations are expected to comprise all study segments. If SEGMENT, PART_VAR is expected to point to a variable with values of 0 and 1, indicating for each data row whether the variable was expected to be observed. If HIERARCHY, this is also checked recursively: if a variable points to such a participation variable, and that variable in turn has a PART_VAR entry pointing to another variable, an observation of the initial variable is only expected if both segment variables are 1.

exclude_roles

variable roles a character (vector) of variable roles not included

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

segment_level

data.frame alias for meta_data_segment

meta_data_segment

data.frame Segment level metadata. Optional.

Details

Implementation and use of thresholds

This implementation uses one threshold to discriminate critical from non-critical values. If direction is above, then all values below the threshold_value are normal (displayed in dark blue in the plot and flagged with GRADING = 0 in the data frame). All values above the threshold_value are considered critical. The more they deviate from the threshold, the more the displayed color shifts to dark red. All critical values are highlighted with GRADING = 1 in the summary data frame. By default, the highest values are always shown in dark red, irrespective of the absolute deviation.

If direction is below, then all values above the threshold_value are normal (displayed in dark blue, GRADING = 0).
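The flagging rule can be sketched as follows (a minimal illustration of the GRADING logic described above, not the package code):

```r
# Sketch of the GRADING logic for direction "above": segment missingness
# percentages above the threshold are critical (GRADING = 1).
threshold_value <- 10
missingness_pct <- c(2, 8, 15, 40)
GRADING <- as.integer(missingness_pct > threshold_value)
GRADING  # 0 0 1 1
```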

Hint

This function does not support a resp_vars argument; instead, exclude_roles can be used to specify variables that are not relevant for detecting a missing segment.

List function.

Value

a list with:

See Also

Online Documentation


Counts all individuals with no measurements at all

Description

This implementation examines a crude version of unit missingness or unit-nonresponse (Kalton and Kasprzyk 1986), i.e. if all measurement variables in the study data are missing for an observation it has unit missingness.

The function can be applied on stratified data. In this case strata_vars must be specified.
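The crude rule described above can be sketched in base R (illustrative only, not the package implementation):

```r
# Sketch of crude unit missingness: a row is unit-missing if all
# measurement variables (everything except the ID variables) are NA.
sd0 <- data.frame(ID  = 1:4,
                  SBP = c(120, NA, NA, 130),
                  DBP = c(80,  NA, 75, NA))
id_vars <- "ID"
meas <- setdiff(names(sd0), id_vars)
unit_missing <- apply(is.na(sd0[, meas, drop = FALSE]), 1, all)
sum(unit_missing)  # 1: only row 2 has no measurements at all
```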

Descriptor

Usage

com_unit_missingness(
  id_vars = NULL,
  strata_vars = NULL,
  label_col,
  study_data,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2
)

Arguments

id_vars

variable list optional, a vector of ID variables that should not be considered in the calculation of unit missingness

strata_vars

variable optional, a string or integer variable used for stratification

label_col

variable attribute the name of the column in the metadata with labels of variables

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

This implementation calculates a crude rate of unit missingness. This type of missingness may have several causes and is an important research outcome. For example, unit nonresponse may be selective regarding the targeted study population, or technical reasons such as failed record linkage may cause unit missingness.

It has to be distinguished from segment and item missingness, since different causes and mechanisms may be the reason for unit missingness.

Hint

This function does not support a resp_vars argument but id_vars, which follow a roughly inverse logic: values in id_vars do not prevent a row from being considered missing, because an ID is the only hint of a unit that otherwise would not occur in the data at all.

List function.

Value

A list with:

See Also

Online Documentation


Checks user-defined contradictions in study data

Description

This approach considers it a contradiction if impossible combinations of data are observed for one participant. For example, if the age of a participant is recorded repeatedly, the value of age is (unfortunately) not able to decline. Most checks for contradictions rest on the comparison of two variables.

Important to note: each value that is used for comparison may represent a possible characteristic on its own, but the combination of these two values is considered to be impossible. The approach does not consider implausible or inadmissible values.
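A minimal sketch of such a two-variable check (illustrative only; real rules are defined in the check_table, and the variable names below are made up):

```r
# Sketch of a two-variable contradiction check as described above:
# age at follow-up must not be lower than age at baseline.
sd0 <- data.frame(AGE_0 = c(50, 61, 47),   # baseline age
                  AGE_1 = c(52, 60, 49))   # follow-up age
contradiction <- sd0$AGE_1 < sd0$AGE_0
which(contradiction)       # participant 2 is contradictory
100 * mean(contradiction)  # rate in percent, compared to threshold_value
```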

Descriptor

Usage

con_contradictions(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  threshold_value,
  check_table,
  summarize_categories = FALSE,
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable list the name of the measurement variables

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

threshold_value

numeric from=0 to=100. a numerical value ranging from 0-100

check_table

data.frame contradiction rules table. Table defining contradictions. See details for its required structure.

summarize_categories

logical Needs a column 'tag' in the check_table. If set, a summary output is generated for the defined categories plus one plot per category.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

Algorithm of this implementation:

List function.

Value

If summarize_categories is FALSE: A list with:

if summarize_categories is TRUE, other objects are returned: one per category named by that category (e.g. "Empirical") containing a result for contradictions within that category only. Additionally, in the slot all_checks a result as it would have been returned with summarize_categories set to FALSE. Finally, a slot SummaryData is returned containing sums per Category and an according ggplot2::ggplot in SummaryPlot.

See Also

Online Documentation


Checks user-defined contradictions in study data

Description

This approach considers it a contradiction if impossible combinations of data are observed for one participant. For example, if the age of a participant is recorded repeatedly, the value of age is (unfortunately) not able to decline. Most checks for contradictions rest on the comparison of two variables.

Important to note: each value that is used for comparison may represent a possible characteristic on its own, but the combination of these two values is considered to be impossible. The approach does not consider implausible or inadmissible values.

Indicator

Usage

con_contradictions_redcap(
  study_data,
  item_level = "item_level",
  label_col,
  threshold_value,
  meta_data_cross_item = "cross-item_level",
  use_value_labels,
  summarize_categories = FALSE,
  meta_data = item_level,
  cross_item_level,
  `cross-item_level`,
  meta_data_v2
)

Arguments

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

threshold_value

numeric from=0 to=100. a numerical value ranging from 0-100

meta_data_cross_item

data.frame contradiction rules table. Table defining contradictions. See online documentation for its required structure.

use_value_labels

logical Deprecated in favor of DATA_PREPARATION. If set to TRUE, labels can be used in the REDCap syntax to specify contradiction checks for categorical variables. If set to FALSE, contradictions have to be specified using the coded values. If this argument is not set in the function call, it will be set to TRUE if the metadata contains a non-empty column VALUE_LABELS.

summarize_categories

logical Needs a column CONTRADICTION_TYPE in the meta_data_cross_item. If set, a summary output is generated for the defined categories plus one plot per category. TODO: Not yet controllable by metadata.

meta_data

data.frame old name for item_level

cross_item_level

data.frame alias for meta_data_cross_item

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

`cross-item_level`

data.frame alias for meta_data_cross_item

Details

Algorithm of this implementation:

List function.

Value

If summarize_categories is FALSE: A list with:

If summarize_categories is TRUE, other objects are returned: A list with one element Other, a list with the following entries: One per category named by that category (e.g. "Empirical") containing a result for contradiction checks within that category only. Additionally, in the slot all_checks, a result as it would have been returned with summarize_categories set to FALSE. Finally, in the top-level list, a slot SummaryData is returned containing sums per Category and an according ggplot2::ggplot in SummaryPlot.

See Also

Online Documentation for the function meta_data_cross Online Documentation for the required cross-item-level metadata


Detects variable levels not specified in metadata

Description

For each categorical variable, value lists should be defined in the metadata. This implementation examines whether all observed levels in the study data are valid.
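The core of this check can be sketched in base R (illustrative only; in the package, the admissible values come from the value lists in the metadata):

```r
# Sketch of the check described above: observed levels that are not
# contained in the admissible value list are inadmissible.
admissible <- c(0, 1, 2)            # from the metadata value list
observed   <- c(0, 1, 2, 1, 9, 0)   # levels seen in the study data
inadmissible <- setdiff(observed, admissible)
inadmissible  # 9
```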

Indicator

Usage

con_inadmissible_categorical(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  threshold_value = 0,
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable list the name of the measurement variables

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

threshold_value

numeric from=0 to=100. a numerical value ranging from 0-100.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

Algorithm of this implementation:

Value

a list with:

See Also

Online Documentation


Detects variable levels not specified in standardized vocabulary

Description

For each categorical variable, value lists should be defined in the metadata. This implementation examines whether all observed levels in the study data are valid.

Indicator

Usage

con_inadmissible_vocabulary(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  threshold_value = 0,
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable list the name of the measurement variables

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

threshold_value

numeric from=0 to=100. a numerical value ranging from 0-100.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

Algorithm of this implementation:

Value

a list with:

See Also

Online Documentation

Examples

## Not run: 
sdt <- data.frame(DIAG = c("B050", "B051", "B052", "B999"),
                  MED0 = c("S01XA28", "N07XX18", "ABC", NA), stringsAsFactors = FALSE)
mdt <- tibble::tribble(
~ VAR_NAMES, ~ DATA_TYPE, ~ STANDARDIZED_VOCABULARY_TABLE, ~ SCALE_LEVEL, ~ LABEL,
"DIAG", "string", "<ICD10>", "nominal", "Diagnosis",
"MED0", "string", "<ATC>", "nominal", "Medication"
)
con_inadmissible_vocabulary(NULL, sdt, mdt, label_col = LABEL)
prep_load_workbook_like_file("meta_data_v2")
il <- prep_get_data_frame("item_level")
il$STANDARDIZED_VOCABULARY_TABLE[[11]] <- "<ICD10GM>"
il$DATA_TYPE[[11]] <- DATA_TYPES$INTEGER
il$SCALE_LEVEL[[11]] <- SCALE_LEVELS$NOMINAL
prep_add_data_frames(item_level = il)
r <- dq_report2("study_data", dimensions = "con")
r <- dq_report2("study_data", dimensions = "con",
     advanced_options = list(dataquieR.non_disclosure = TRUE))
r

## End(Not run)

Detects variable values exceeding limits defined in metadata

Description

Inadmissible numerical values can be of type integer or float. This implementation requires the definition of intervals in the metadata to examine the admissibility of numerical study data.

This helps identify inadmissible measurements according to hard limits (for multiple variables).
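The hard-limit check can be sketched in base R. This is a minimal illustration; in the package, the interval comes from the metadata (e.g., a HARD_LIMITS entry) and is parsed there, whereas here the bounds are written out by hand:

```r
# Sketch of a hard-limit check as described above: flag values outside
# the admissible interval defined in the metadata.
hard_limits <- c(low = 0, high = 300)   # hypothetical limits for blood pressure
sbp <- c(120, -1, 250, 310, NA)
deviates <- !is.na(sbp) &
  (sbp < hard_limits["low"] | sbp > hard_limits["high"])
which(deviates)  # flags -1 and 310
```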

Indicator

Usage

con_limit_deviations(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  limits = NULL,
  flip_mode = "noflip",
  return_flagged_study_data = FALSE,
  return_limit_categorical = TRUE,
  meta_data = item_level,
  meta_data_v2,
  show_obs = TRUE
)

Arguments

resp_vars

variable list the name of the measurement variables

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

limits

enum HARD_LIMITS | SOFT_LIMITS | DETECTION_LIMITS. what limits from metadata to check for

flip_mode

enum default | flip | noflip | auto. Should the plot be in default orientation, flipped, not flipped, or auto-flipped? Not all options are always supported. In general, this can be controlled by setting the R option options(dataquieR.flip_mode = ...). If called from dq_report, you can also pass flip_mode to all function calls or set it specifically using specific_args.

return_flagged_study_data

logical return FlaggedStudyData in the result

return_limit_categorical

logical if TRUE return limit deviations also for categorical variables

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

show_obs

logical Should (selected) individual observations be marked in the figure for continuous variables?

Details

Algorithm of this implementation:

Value

a list with:

See Also


contradiction_functions

Description

Detect abnormalities help functions

Usage

contradiction_functions

Format

An object of class list of length 11.

Details

2 variables:


description of the contradiction functions

Description

description of the contradiction functions

Usage

contradiction_functions_descriptions

Format

An object of class list of length 11.


Log Level

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Add stack-trace in condition messages (to be deprecated)

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Metadata describes more than the current study data

Description

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Set caller for error conditions (to be deprecated)

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Enable switching to a generalized additive model instead of LOESS

Description

If this option is set to TRUE, time course plots will use generalized additive models (GAMs) instead of LOESS when the number of observations exceeds a specified threshold. LOESS computations have a high memory consumption for large datasets.
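
A minimal sketch of toggling this behavior with base R's options() mechanism (TRUE is an illustrative value, not a documented default):

```r
# Illustrative: prefer GAM over LOESS smoothing for large datasets.
# The option name is taken from this manual; TRUE is an assumed setting.
old <- options(dataquieR.GAM_for_LOESS = TRUE)
stopifnot(isTRUE(getOption("dataquieR.GAM_for_LOESS")))
options(old)  # restore the previous value
```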

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Maximum length for variable labels

Description

All variable labels will be shortened to fit this maximum length. For technical reasons, the value cannot exceed 200.
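
For example, a sketch that caps label length via base R's options() (the value 60 is an arbitrary example; the only documented constraint is the upper bound of 200):

```r
# Illustrative: cap variable labels at 60 characters; 60 is an example
# value, not a documented default.
old <- options(dataquieR.MAX_LABEL_LEN = 60)
stopifnot(getOption("dataquieR.MAX_LABEL_LEN") <= 200)
options(old)
```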

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Maximum length for value labels

Description

Value labels are shortened to fit this maximum length.

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Set caller for message conditions (to be deprecated)

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Default availability of multivariate outlier checks in reports

Description

can be

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Assume that all VALUE_LABELS are HTML-escaped

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Set caller for warning conditions (to be deprecated)

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Exclude subgroups with constant values from LOESS figure

Description

If this option is set to TRUE, time course plots will only show subgroups with more than one distinct value. This might improve the readability of the figure.
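
A sketch of enabling this filter via options() (TRUE is an illustrative choice, not a documented default):

```r
# Illustrative: drop constant-valued subgroups from time course figures
# to improve readability.
old <- options(dataquieR.acc_loess.exclude_constant_subgroups = TRUE)
stopifnot(isTRUE(getOption("dataquieR.acc_loess.exclude_constant_subgroups")))
options(old)
```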

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Display time points in LOESS plots

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Lower limit for the LOESS bandwidth

Description

The value should be greater than 0 and less than or equal to 1. In general, increasing the bandwidth leads to a smoother trend line.
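
As an example, setting the bandwidth floor within the documented range (0, 1]; the value 0.2 is illustrative:

```r
# Illustrative: raise the bandwidth floor to 0.2 for smoother trend
# lines; 0.2 is an example value within the documented range (0, 1].
old <- options(dataquieR.acc_loess.min_bw = 0.2)
bw <- getOption("dataquieR.acc_loess.min_bw")
stopifnot(bw > 0, bw <= 1)
options(old)
```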

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Lower limit for the proportion of cases or controls required to create a smoothed time trend figure

Description

The value should be greater than 0 and less than 0.4. If the proportion of cases or controls is lower than the specified value, the LOESS figure will not be created for the specified binary outcome.
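
For instance, a sketch requiring at least 10% cases or controls before the figure is drawn (0.1 is an example value within the documented range, not a default):

```r
# Illustrative: skip the LOESS figure for binary outcomes with fewer
# than 10% cases or controls; 0.1 is an example within (0, 0.4).
old <- options(dataquieR.acc_loess.min_proportion = 0.1)
p <- getOption("dataquieR.acc_loess.min_proportion")
stopifnot(p > 0, p < 0.4)
options(old)
```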

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Default plot format for acc_loess()

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Display observations in LOESS plots

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Include number of observations for each level of the grouping variable in the 'margins' figure

Description

If this option is set to FALSE, the figures created by acc_margins will not include the number of observations for each level of the grouping variable. This can be used to obtain clean static plots.
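
A sketch of disabling the counts for a clean static plot (FALSE is the documented way to suppress them):

```r
# Illustrative: hide per-level observation counts in margins figures
# to obtain clean static plots.
old <- options(dataquieR.acc_margins_num = FALSE)
stopifnot(isFALSE(getOption("dataquieR.acc_margins_num")))
options(old)
```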

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Sort levels of the grouping variable in the 'margins' figures

Description

If this option is set to TRUE, the levels of the grouping variable in the figure are sorted in descending order according to the number of observations so that levels with more observations are easier to identify. Otherwise, the original order of the levels is retained.
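
A sketch of turning the sorting on (TRUE enables the descending order described above; whether it is the default is not stated here):

```r
# Illustrative: sort grouping-variable levels by descending number of
# observations in margins figures.
old <- options(dataquieR.acc_margins_sort = TRUE)
stopifnot(isTRUE(getOption("dataquieR.acc_margins_sort")))
options(old)
```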

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Apply min-max scaling in the parallel coordinates figure used to inspect multivariate outliers

Description

Logical: TRUE or FALSE.

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Color for empirical contradictions

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Color for logical contradictions

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Call browser() on errors

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Removal of hard limits from data before calculating descriptive statistics.

Description

If set to TRUE, values outside hard limits are removed from the data before descriptive statistics are calculated. The default is FALSE.
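Like all dataquieR options, this can be set with base R's options() mechanism; a minimal sketch (the option name is taken from this page, the logical value is an example):

```r
# Remove values outside hard limits before computing descriptive statistics
options(dataquieR.des_summary_hard_lim_remove = TRUE)

# Query the current setting, defaulting to FALSE if unset
getOption("dataquieR.des_summary_hard_lim_remove", FALSE)
```

This mirrors the default shown in the Usage section of des_summary(), which reads the option via getOption().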

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Disable automatic post-processing of dataquieR function results

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Try to avoid fallback to string columns when reading files

Description

If a file does not specify column data types, or specifies data types per cell, choose the type that matches the majority of the sampled cells of a column as that column's data type.

Details

This may hide data type problems, but it can also fix them, so that prep_get_data_frame() works better.
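A sketch of enabling this behavior before reading a file (the option name is from this page; the logical value and the data frame name "study_data" are example assumptions):

```r
# Pick the majority data type per column instead of falling back
# to string columns when the file lacks column types
options(dataquieR.fix_column_type_on_read = TRUE)

# Subsequent reads honor the option (example data frame name)
dat <- prep_get_data_frame("study_data")
```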

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Flip mode to use for figures

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


When converting MISSING_LIST/JUMP_LIST to a MISSING_LIST_TABLE, create one list per item

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Control how the label_col argument is used.

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Name of the data.frame that provides a format for grading values

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Name of the data.frame featuring GRADING_RULESET

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Control whether dataquieR tries to guess missing codes from the study data in the absence of metadata

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Language suffix for metadata label columns

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Maximum number of levels of the grouping variable shown individually in figures

Description

If there are more examiners or devices than can be shown individually, they will be collapsed into "other".
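A sketch of setting this limit (the option name is from this page; the numeric value 15 is an arbitrary example):

```r
# Show at most 15 levels of the grouping variable individually;
# additional examiners/devices are collapsed into "other"
options(dataquieR.max_group_var_levels_in_plot = 15)
```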

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Maximum number of levels of the grouping variable shown with individual histograms ('violins') in 'margins' figures

Description

If there are more examiners or devices, the figure will be reduced to box-plots to save space.

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Minimum number of observations per level of the grouping variable required to show that level individually in a figure

Description

Levels of the grouping variable with fewer observations than specified here will be excluded from the figure.
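A sketch of setting this threshold (the option name is from this page; the numeric value 30 is an arbitrary example):

```r
# Exclude grouping-variable levels with fewer than 30 observations
# from figures
options(dataquieR.min_obs_per_group_var_in_plot = 30)
```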

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Remove all observation-level real data from reports

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Function to call on progress increase

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Function to call on progress message update

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Number of levels to consider a variable ordinal in the absence of SCALE_LEVEL

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug


Number of levels to consider a variable metric in the absence of SCALE_LEVEL

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.testdebug


Disable all interactively used metadata-based function argument provision

Description

TODO

See Also

Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels


Internal constructor for the class dataquieR_resultset.

Description

Creates an object of class dataquieR_resultset.

Usage

dataquieR_resultset(...)

Arguments

...

properties stored in the object

Details

The class features the following methods:

Value

An object of class dataquieR_resultset.

See Also

dq_report


Class dataquieR_resultset2.

Description

Class dataquieR_resultset2.

See Also

dq_report2


Verify an object of class dataquieR_resultset

Description

Deprecated

Usage

dataquieR_resultset_verify(...)

Arguments

...

Deprecated

Value

Deprecated


Compute Pairwise Correlations

Description

Works on variable groups (cross-item_level) that are expected to show a Pearson correlation.

Usage

des_scatterplot_matrix(
  label_col,
  study_data,
  item_level = "item_level",
  meta_data_cross_item = "cross-item_level",
  meta_data = item_level,
  meta_data_v2,
  cross_item_level,
  `cross-item_level`
)

Arguments

label_col

variable attribute the name of the column in the metadata with labels of variables

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data_cross_item

data.frame the data frame that contains cross-item-level metadata

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

cross_item_level

data.frame alias for meta_data_cross_item

`cross-item_level`

data.frame alias for meta_data_cross_item

Details

Descriptor # TODO: This can be an indicator

Value

a list with the slots:

Examples

## Not run: 
devtools::load_all()
prep_load_workbook_like_file("meta_data_v2")
des_scatterplot_matrix("study_data")

## End(Not run)

Compute Descriptive Statistics

Description

Generates a descriptive overview of the variables in resp_vars.

Descriptor

Usage

des_summary(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2,
  hard_limits_removal = getOption("dataquieR.des_summary_hard_lim_remove",
    dataquieR.des_summary_hard_lim_remove_default),
  ...
)

Arguments

resp_vars

variable the name of the measurement variable

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

hard_limits_removal

logical. If TRUE, values outside hard limits are removed from the data before descriptive statistics are calculated. The default is FALSE

...

arguments to be passed to all called indicator functions if applicable.

Details

TODO

Value

a list with:

See Also

Online Documentation

Examples

## Not run: 
prep_load_workbook_like_file("meta_data_v2")
xx <- des_summary(study_data = "study_data", meta_data =
                   prep_get_data_frame("item_level"))
util_html_table(xx$SummaryData)
util_html_table(des_summary(study_data = prep_get_data_frame("study_data"),
                   meta_data = prep_get_data_frame("item_level"))$SummaryData)


## End(Not run)

Compute Descriptive Statistics - categorical variables

Description

generates a descriptive overview of the categorical variables (nominal and ordinal) in resp_vars.

Descriptor

Usage

des_summary_categorical(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2,
  hard_limits_removal = getOption("dataquieR.des_summary_hard_lim_remove",
    dataquieR.des_summary_hard_lim_remove_default),
  ...
)

Arguments

resp_vars

variable the name of the categorical measurement variable

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

hard_limits_removal

logical if TRUE, values outside hard limits are removed from the data before calculating descriptive statistics. The default is FALSE.

...

arguments to be passed to all called indicator functions if applicable.

Details

TODO

Value

a list with:

See Also

Online Documentation

Examples

## Not run: 
prep_load_workbook_like_file("meta_data_v2")
xx <- des_summary_categorical(study_data = "study_data", meta_data =
                              prep_get_data_frame("item_level"))
util_html_table(xx$SummaryData)
util_html_table(des_summary_categorical(study_data = prep_get_data_frame("study_data"),
                   meta_data = prep_get_data_frame("item_level"))$SummaryData)

## End(Not run)

Compute Descriptive Statistics - continuous variables

Description

generates a descriptive overview of continuous variables (ratio and interval) in resp_vars.

Descriptor

Usage

des_summary_continuous(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2,
  hard_limits_removal = getOption("dataquieR.des_summary_hard_lim_remove",
    dataquieR.des_summary_hard_lim_remove_default),
  ...
)

Arguments

resp_vars

variable the name of the continuous measurement variable

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

hard_limits_removal

logical if TRUE, values outside hard limits are removed from the data before calculating descriptive statistics. The default is FALSE.

...

arguments to be passed to all called indicator functions if applicable.

Details

TODO

Value

a list with:

See Also

Online Documentation

Examples

## Not run: 
prep_load_workbook_like_file("meta_data_v2")
xx <- des_summary_continuous(study_data = "study_data", meta_data =
                              prep_get_data_frame("item_level"))
util_html_table(xx$SummaryData)
util_html_table(des_summary_continuous(study_data = prep_get_data_frame("study_data"),
                   meta_data = prep_get_data_frame("item_level"))$SummaryData)

## End(Not run)

Get the dimensions of a dq_report2 result

Description

Get the dimensions of a dq_report2 result

Usage

## S3 method for class 'dataquieR_resultset2'
dim(x)

Arguments

x

a dataquieR_resultset2 result

Value

dimensions
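
Examples

A hedged sketch (the report construction follows the dq_report2 examples elsewhere in this manual; the interpretation of the two extents as variables and indicator functions is an assumption):

```r
## Not run: 
# Build a report from the example data shipped with dataquieR, then
# query its dimensions; the extents are assumed to correspond to
# variables and indicator functions.
prep_load_workbook_like_file("meta_data_v2")
r <- dq_report2("study_data", dimensions = NULL, label_col = "LABEL")
dim(r)

## End(Not run)
```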


Names of DQ dimensions

Description

a vector of data quality dimensions. The supported dimensions are Completeness, Consistency and Accuracy.

Usage

dimensions

Format

An object of class character of length 3.

Value

Only a definition, not a function, so no return value

See Also

Data Quality Concept
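
Examples

A hedged sketch showing how the exported dimensions vector can be passed to dq_report2 to request all three supported dimensions:

```r
## Not run: 
# `dimensions` holds the three supported dimension names (Completeness,
# Consistency, Accuracy), so this requests a report covering all of them;
# note that Accuracy is computationally expensive.
prep_load_workbook_like_file("meta_data_v2")
r <- dq_report2("study_data", dimensions = dimensions)

## End(Not run)
```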


Names of a dataquieR report object (v2.0)

Description

Names of a dataquieR report object (v2.0)

Usage

## S3 method for class 'dataquieR_resultset2'
dimnames(x)

Arguments

x

the result object

Value

the names


Dimension Titles for Prefixes

Description

The order matters, because it defines the order in the dq_report2 report.

Usage

dims

Format

An object of class character of length 5.

See Also

util_html_for_var()

util_html_for_dims()


Generate a full DQ report

Description

Deprecated

Usage

dq_report(...)

Arguments

...

Deprecated

Value

Deprecated


Generate a full DQ report, v2

Description

Generate a full DQ report, v2

Usage

dq_report2(
  study_data,
  item_level = "item_level",
  label_col = LABEL,
  meta_data_segment = "segment_level",
  meta_data_dataframe = "dataframe_level",
  meta_data_cross_item = "cross-item_level",
  meta_data_item_computation = "item_computation_level",
  meta_data = item_level,
  meta_data_v2,
  ...,
  dimensions = c("Completeness", "Consistency"),
  cores = list(mode = "socket", logging = FALSE, cpus = util_detect_cores(),
    load.balancing = TRUE),
  specific_args = list(),
  advanced_options = list(),
  author = prep_get_user_name(),
  title = "Data quality report",
  subtitle = as.character(Sys.Date()),
  user_info = NULL,
  debug_parallel = FALSE,
  resp_vars = character(0),
  filter_indicator_functions = character(0),
  filter_result_slots = c("^Summary", "^Segment", "^DataTypePlotList",
    "^ReportSummaryTable", "^Dataframe", "^Result", "^VariableGroup"),
  mode = c("default", "futures", "queue", "parallel"),
  mode_args = list(),
  notes_from_wrapper = list(),
  storr_factory = NULL,
  amend = FALSE,
  cross_item_level,
  `cross-item_level`,
  segment_level,
  dataframe_level,
  item_computation_level,
  .internal = rlang::env_inherits(rlang::caller_env(), parent.env(environment()))
)

Arguments

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data_segment

data.frame – optional: Segment level metadata

meta_data_dataframe

data.frame – optional: Data frame level metadata

meta_data_cross_item

data.frame – optional: Cross-item level metadata

meta_data_item_computation

data.frame optional. computation rules for computed variables.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

...

arguments to be passed to all called indicator functions if applicable.

dimensions

dimensions Vector of dimensions to address in the report. Allowed values in the vector are Completeness, Consistency, and Accuracy. The generated report will only cover the listed data quality dimensions. Accuracy is computationally expensive, so this dimension is not enabled by default. Completeness should be included if Consistency is included, and Consistency should be included if Accuracy is included, to avoid misleading detections of, e.g., missing codes as outliers; please refer to the data quality concept for more details. Integrity is always included.

cores

integer number of cpu cores to use or a named list with arguments for parallelMap::parallelStart or NULL, if parallel has already been started by the caller. Can also be a cluster.

specific_args

list named list of arguments specifically for one of the called functions. The names of the list elements correspond to the indicator functions whose calls should be modified. The elements are lists of arguments.

advanced_options

list options to set during report computation, see options()

author

character author for the report documents.

title

character optional argument to specify the title for the data quality report

subtitle

character optional argument to specify a subtitle for the data quality report

user_info

list additional info stored with the report, e.g., comments, title, ...

debug_parallel

logical print blocks currently evaluated in parallel

resp_vars

variable list the name of the measurement variables for the report. If missing, all variables will be used. Only item level indicator functions are filtered, so far.

filter_indicator_functions

character regular expressions, only if an indicator function's name matches one of these, it'll be used for the report. If of length zero, no filtering is performed.

filter_result_slots

character regular expressions, only if an indicator function's result's name matches one of these, it'll be used for the report. If of length zero, no filtering is performed.

mode

character work mode for parallel execution. The default is "default". The values mean:

- default: use queue, except if cores has been set explicitly
- futures: use the future package
- queue: use a queue as described in the examples from the callr package by Csárdi and Chang, and start sub-processes as workers that evaluate the queue
- parallel: use the cluster from cores to evaluate all calls of indicator functions using the classic R parallel back-ends

mode_args

list of arguments for the selected mode. As of writing this manual, only the mode queue supports an argument, step, which gives the number of function calls that are run by one worker at a time. The default is 15, which on most of the tested systems gives a good balance between synchronization overhead and idling workers.

notes_from_wrapper

list a list containing notes about changed labels by dq_report_by (otherwise NULL)

storr_factory

function NULL, or a function returning a storr object as back-end for the report's results. If used with cores > 1, the storage must be accessible from all cores and capable of concurrent writing according to storr. Hint: dataquieR currently only supports storr::storr_rds() officially; other back-ends may nevertheless work, but they are not tested.

amend

logical if there is already data in storr_factory, use it anyway – unsupported, so far!

cross_item_level

data.frame alias for meta_data_cross_item

segment_level

data.frame alias for meta_data_segment

dataframe_level

data.frame alias for meta_data_dataframe

item_computation_level

data.frame alias for meta_data_item_computation

.internal

logical internal use, only.

`cross-item_level`

data.frame alias for meta_data_cross_item

Details

See dq_report_by for a way to easily generate stratified or split reports.

Value

a dataquieR_resultset2 that can be printed, creating an HTML report.

See Also

Examples

## Not run: 
prep_load_workbook_like_file("inst/extdata/meta_data_v2.xlsx")
meta_data <- prep_get_data_frame("item_level")
meta_data_cross <- prep_get_data_frame("cross-item_level")
x <- dq_report2("study_data", dimensions = NULL, label_col = "LABEL")
xx <- pbapply::pblapply(x, util_eval_to_dataquieR_result, env = environment())
xx <- pbapply::pblapply(tail(x), util_eval_to_dataquieR_result, env = environment())
cat(vapply(x, deparse1, FUN.VALUE = character(1)), sep = "\n", file = "all_calls.txt")
rstudioapi::navigateToFile("all_calls.txt")
eval(x$`acc_multivariate_outlier.Blood pressure checks`)

prep_load_workbook_like_file("meta_data_v2")
rules <- tibble::tribble(
  ~resp_vars,  ~RULE,
  "BMI", '[BODY_WEIGHT_0]/(([BODY_HEIGHT_0]/100)^2)',
  "R", '[WAIST_CIRC_0]/2/[pi]', # in m^3
  "VOL_EST", '[pi]*([WAIST_CIRC_0]/2/[pi])^2*[BODY_HEIGHT_0] / 1000', # in l
 )
prep_load_workbook_like_file("ship_meta_v2")
prep_add_data_frames(computed_items = rules)
r <- dq_report2("ship", dimensions = NULL, label_col = "LABEL")

## End(Not run)

Generate a stratified full DQ report

Description

Generate a stratified full DQ report

Usage

dq_report_by(
  study_data,
  item_level = "item_level",
  meta_data_segment = "segment_level",
  meta_data_dataframe = "dataframe_level",
  meta_data_cross_item = "cross-item_level",
  meta_data_item_computation = "item_computation_level",
  missing_tables = NULL,
  label_col,
  meta_data_v2,
  segment_column = NULL,
  strata_column = NULL,
  strata_select = NULL,
  selection_type = NULL,
  segment_select = NULL,
  segment_exclude = NULL,
  strata_exclude = NULL,
  subgroup = NULL,
  resp_vars = character(0),
  id_vars = NULL,
  advanced_options = list(),
  storr_factory = NULL,
  amend = FALSE,
  ...,
  output_dir = NULL,
  input_dir = NULL,
  also_print = FALSE,
  disable_plotly = FALSE,
  view = TRUE,
  meta_data = item_level,
  cross_item_level,
  `cross-item_level`,
  segment_level,
  dataframe_level,
  item_computation_level
)

Arguments

study_data

data.frame the data frame that contains the measurements: it can be an R object (e.g., bia), a path to a data file (e.g., "C:/Users/data/bia.dta"), a vector of data file paths (e.g., c("C:/Users/data/bia.dta", "C:/Users/data/biames.dta")), or it can be left empty, in which case the data frames are provided in the data frame level metadata. If only the file name without a path is provided (e.g., "bia.dta"), the file name needs the extension, and the path must be provided in the argument input_dir. It can also contain only the file name in the case of example data from the package dataquieR (e.g., "study_data" or "ship")

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data_segment

data.frame – optional: Segment level metadata

meta_data_dataframe

data.frame – optional if study_data is present: Data frame level metadata

meta_data_cross_item

data.frame – optional: Cross-item level metadata

meta_data_item_computation

data.frame – optional: Computed items metadata

missing_tables

character the name of the data frame containing the missing codes, it can be a vector if more than one table is provided. Example: c("missing_table1", "missing_table2")

label_col

variable attribute the name of the column in the metadata containing the labels of the variables

meta_data_v2

character path or file name of the workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2

segment_column

variable attribute name of a metadata attribute usable to split the report into sections of variables, e.g., all blood-pressure-related variables. By default, reports are split by STUDY_SEGMENT if available and neither segment_column nor strata_column nor subgroup is defined. To create an un-split report, please write the argument explicitly as 'segment_column = NULL'

strata_column

variable name of a study variable to stratify the report by, e.g., the study centers. Both labels and VAR_NAMES are accepted. In case of NAs in the selected variable, a separate report containing the NA subset will be created

strata_select

character if given, the strata of strata_column are limited to the content of this vector. A character vector or a regular expression can be provided (e.g., "^a.*$"). This argument cannot be used if no strata_column is provided

selection_type

character optional, can only be specified if strata_select or strata_exclude is specified. If not present, the function tries to guess what the user typed in strata_select or strata_exclude. There are 3 options:

- value: the specified stratum is a value and not a value label, for example "0"
- v_label: the specified stratum is a label, for example "male"
- regex: the strata are specified using a regular expression, for example "^Ber" to select all strata starting with those letters

segment_select

character if given, the levels of segment_column are limited to the content of this vector. A character vector or a regular expression (e.g., ".*_EXAM$") can be provided. This argument cannot be used if no segment_column is provided.

segment_exclude

character optional, can only be specified if a segment_column is specified. The levels of segment_column will not include the content of this argument. A character vector or a regular expression can be provided (e.g., "^STU").

strata_exclude

character optional, can only be specified if a strata_column is specified. The strata of strata_column will not include the content of this argument. A character vector or a regular expression can be provided (e.g., "^STU").

subgroup

character optional, to define subgroups of cases. Rules are to be written as REDCap rules. Only VAR_NAMES are accepted in the rules.

resp_vars

variable the names of the measurement variables, if missing or NULL, all variables will be included

id_vars

variable a vector containing the name(s) of the variables containing IDs, to be used to merge multiple data frames if provided in study_data, and to be added to the referred variables

advanced_options

list options to set during report computation, see options()

storr_factory

function NULL, or a function returning a storr object as back-end for the report's results. If used with cores > 1, the storage must be accessible from all cores and capable of concurrent writing according to storr. Hint: dataquieR currently only supports storr::storr_rds() officially; other back-ends may nevertheless work, but they are not tested.

amend

logical if there is already data in storr_factory, use it anyway – unsupported, so far!

...

arguments to be passed through to dq_report or dq_report2

output_dir

character if given, the output is not returned but saved in this directory

input_dir

character if given, the study data files that have no path and that are not URLs are searched for in this directory. Also, meta_data_v2 is searched for in this directory if no path is provided

also_print

logical if output_dir is not NULL, also create HTML output for each report using print.dataquieR_resultset2() written to the path output_dir

disable_plotly

logical do not use plotly, even if installed

view

logical open the returned report

meta_data

data.frame old name for item_level

cross_item_level

data.frame alias for meta_data_cross_item

segment_level

data.frame alias for meta_data_segment

dataframe_level

data.frame alias for meta_data_dataframe

item_computation_level

data.frame alias for meta_data_item_computation

`cross-item_level`

data.frame alias for meta_data_cross_item

Value

invisibly, a named list of named lists of dq_report2 reports or, if output_dir has been specified, invisible(NULL)

See Also

dq_report

Examples

## Not run:  # really long-running example.
prep_load_workbook_like_file("meta_data_v2")
rep <- dq_report_by("study_data", label_col =
  LABEL, strata_column = "CENTER_0")
rep <- dq_report_by("study_data",
  label_col = LABEL, strata_column = "CENTER_0",
  segment_column = NULL
)
unlink("/tmp/testRep/", force = TRUE, recursive = TRUE)
dq_report_by("study_data",
  label_col = LABEL, strata_column = "CENTER_0",
  segment_column = STUDY_SEGMENT, output_dir = "/tmp/testRep"
)
unlink("/tmp/testRep/", force = TRUE, recursive = TRUE)
dq_report_by("study_data",
  label_col = LABEL, strata_column = "CENTER_0",
  segment_column = NULL, output_dir = "/tmp/testRep"
)
dq_report_by("study_data",
  label_col = LABEL,
  segment_column = STUDY_SEGMENT, output_dir = "/tmp/testRep"
)
dq_report_by("study_data",
  label_col = LABEL,
  segment_column = STUDY_SEGMENT, output_dir = "/tmp/testRep",
  also_print = TRUE
)
dq_report_by(study_data = "study_data", meta_data_v2 = "meta_data_v2",
  advanced_options = list(dataquieR.study_data_cache_max = 0,
  dataquieR.study_data_cache_metrics = TRUE,
  dataquieR.study_data_cache_metrics_env = environment()),
  cores = NULL, dimensions = "int")
dq_report_by(study_data = "study_data", meta_data_v2 = "meta_data_v2",
  advanced_options = list(dataquieR.study_data_cache_max = 0),
  cores = NULL, dimensions = "int")

## End(Not run)

HTML Dependency for report headers in clipboard

Description

HTML Dependency for report headers in clipboard

Usage

html_dependency_clipboard()

Value

the dependency


HTML Dependency for dataquieR

Description

generate all dependencies used in static dataquieR reports

Usage

html_dependency_dataquieR(iframe = FALSE)

Arguments

iframe

logical(1) if TRUE, create the dependency used in figure iframes.

Value

the dependency


HTML Dependency for report headers in DT::datatable

Description

HTML Dependency for report headers in DT::datatable

Usage

html_dependency_report_dt()

Value

the dependency


HTML Dependency for tippy

Description

HTML Dependency for tippy

Usage

html_dependency_tippy()

Value

the dependency


HTML Dependency for vertical headers in DT::datatable

Description

HTML Dependency for vertical headers in DT::datatable

Usage

html_dependency_vert_dt()

Value

the dependency


Wrapper function to check for studies data structure

Description

This function tests for unexpected elements and records, as well as duplicated identifiers and content. The unexpected element record check can be conducted by providing the number of expected records or an additional table with the expected records. It is possible to conduct the checks by study segments or to consider only selected segments.

Indicator

Usage

int_all_datastructure_dataframe(
  meta_data_dataframe = "dataframe_level",
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2,
  dataframe_level
)

Arguments

meta_data_dataframe

data.frame the data frame that contains the metadata for the data frame level

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

dataframe_level

data.frame alias for meta_data_dataframe

Value

a list with

Examples

## Not run: 
out_dataframe <- int_all_datastructure_dataframe(
  meta_data_dataframe = "meta_data_dataframe",
  meta_data = "ship_meta"
)
md0 <- prep_get_data_frame("ship_meta")
md0
md0$VAR_NAMES
md0$VAR_NAMES[[1]] <- "Id" # is this mismatch reported -- is the data frame
                           # also reported, if nothing is wrong with it
out_dataframe <- int_all_datastructure_dataframe(
  meta_data_dataframe = "meta_data_dataframe",
  meta_data = md0
)

# This is the "normal" procedure for inside the pipeline,
# but outside this function, checktype is exact by default
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "subset_u")
lapply(setNames(nm = prep_get_data_frame("meta_data_dataframe")$DF_NAME),
  int_sts_element_dataframe, meta_data = md0)
md0$VAR_NAMES[[1]] <-
  "id" # is this mismatch reported -- is the data frame also reported,
       # if nothing is wrong with it
lapply(setNames(nm = prep_get_data_frame("meta_data_dataframe")$DF_NAME),
  int_sts_element_dataframe, meta_data = md0)
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "exact")

## End(Not run)


Wrapper function to check for segment data structure

Description

This function tests for unexpected elements and records, as well as duplicated identifiers and content. The unexpected element record check can be conducted by providing the number of expected records or an additional table with the expected records. It is possible to conduct the checks by study segments or to consider only selected segments.

Indicator

Usage

int_all_datastructure_segment(
  study_data,
  label_col,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2,
  segment_level,
  meta_data_segment = "segment_level"
)

Arguments

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

segment_level

data.frame alias for meta_data_segment

meta_data_segment

data.frame the data frame that contains the metadata for the segment level, mandatory

Value

a list with

Examples

## Not run: 
out_segment <- int_all_datastructure_segment(
  meta_data_segment = "meta_data_segment",
  study_data = "ship",
  meta_data = "ship_meta"
)

study_data <- cars
meta_data <- dataquieR::prep_create_meta(VAR_NAMES = c("speedx", "distx"),
  DATA_TYPE = c("integer", "integer"), MISSING_LIST = "|", JUMP_LIST = "|",
  STUDY_SEGMENT = c("Intro", "Ex"))

out_segment <- int_all_datastructure_segment(
  meta_data_segment = "meta_data_segment",
  study_data = study_data,
  meta_data = meta_data
)

## End(Not run)

Check declared data types of metadata in study data

Description

Checks the data types of the study data against the data types declared in the metadata

Indicator

Usage

int_datatype_matrix(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  split_segments = FALSE,
  max_vars_per_plot = 20,
  threshold_value = 0,
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable the names of the measurement variables, if missing or NULL, all variables will be checked

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

split_segments

logical return one matrix per study segment

max_vars_per_plot

integer from=0. The maximum number of variables per single plot.

threshold_value

numeric from=0 to=100. The percentage of failing conversions allowed to still classify a study variable as convertible.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

This is a preparatory support function that compares study data with associated metadata. A prerequisite of this function is that the number of columns in the study data matches the number of rows in the metadata.

For each study variable, the function searches for its data type declared in static metadata and returns a heatmap like matrix indicating data type mismatches in the study data.

List function.

Value

a list with:
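
Examples

A hedged sketch following the call pattern of the other examples in this manual; the SummaryTable slot name is an assumption based on sibling indicator functions:

```r
## Not run: 
# Compare declared and observed data types for the example data
# shipped with dataquieR.
prep_load_workbook_like_file("meta_data_v2")
mtx <- int_datatype_matrix(study_data = "study_data",
                           meta_data = prep_get_data_frame("item_level"),
                           label_col = "LABEL",
                           threshold_value = 0)
mtx$SummaryTable  # assumed slot: one row per checked variable

## End(Not run)
```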


Check for duplicated content

Description

This function tests for duplicated entries in the data set. It is possible to check for duplicated entries by study segments or to consider only selected segments.

Indicator

Usage

int_duplicate_content(
  level = c("dataframe", "segment"),
  study_data,
  item_level = "item_level",
  label_col,
  meta_data = item_level,
  meta_data_v2,
  ...
)

Arguments

level

character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment").

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

...

Depending on level, passed to either util_int_duplicate_content_segment or util_int_duplicate_content_dataframe

Value

a list. Depending on level, see util_int_duplicate_content_segment or util_int_duplicate_content_dataframe for a description of the outputs.
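
Examples

A hedged sketch checking the whole data frame for duplicated rows, mirroring the call patterns used elsewhere in this manual:

```r
## Not run: 
# Check the example data at the data frame level; level = "segment"
# would instead check within each study segment.
prep_load_workbook_like_file("meta_data_v2")
dup <- int_duplicate_content(level = "dataframe",
                             study_data = "study_data",
                             meta_data = prep_get_data_frame("item_level"))

## End(Not run)
```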


Check for duplicated IDs

Description

This function tests for duplicated entries in identifiers. It is possible to check for duplicated identifiers by study segments or to consider only selected segments.

Indicator

Usage

int_duplicate_ids(
  level = c("dataframe", "segment"),
  study_data,
  item_level = "item_level",
  label_col,
  meta_data = item_level,
  meta_data_v2,
  ...
)

Arguments

level

character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment").

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

...

Depending on level, passed to either util_int_duplicate_ids_segment or util_int_duplicate_ids_dataframe

Value

a list. Depending on level, see util_int_duplicate_ids_segment or util_int_duplicate_ids_dataframe for a description of the outputs.
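
Examples

A hedged sketch checking identifiers per segment; any additional arguments (e.g., for selecting ID variables) would be passed through ... to the respective util_int_duplicate_ids_* function:

```r
## Not run: 
# Check the example data for duplicated identifiers within each segment.
prep_load_workbook_like_file("meta_data_v2")
dup_ids <- int_duplicate_ids(level = "segment",
                             study_data = "study_data",
                             meta_data = prep_get_data_frame("item_level"))

## End(Not run)
```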


Encoding Errors

Description

Detects errors in the character encoding of string variables

Indicator

Usage

int_encoding_errors(
  resp_vars = NULL,
  study_data,
  label_col,
  meta_data_dataframe = "dataframe_level",
  item_level = "item_level",
  ref_encs,
  meta_data = item_level,
  meta_data_v2,
  dataframe_level
)

Arguments

resp_vars

variable the names of the measurement variables, if missing or NULL, all variables will be checked

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data_dataframe

data.frame the data frame that contains the metadata for the data frame level

item_level

data.frame the data frame that contains metadata attributes of study data

ref_encs

reference encodings (names are resp_vars)

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

dataframe_level

data.frame alias for meta_data_dataframe

Details

Strings are stored based on code tables, nowadays typically as UTF-8. However, other code systems are still in use, so, sometimes, strings from different systems are mixed in the data. This indicator checks for such problems and returns the count of entries per variable that do not match the reference coding system, which is estimated from the study data (the addition of a metadata field is planned).

If not specified in the metadata (column ENCODING on item- or data-frame-level), the encoding is guessed from the data. Otherwise, it may be any supported encoding as returned by iconvlist().

Value

a list with:
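
Examples

A hedged sketch; without ref_encs, the reference encoding is estimated from the study data as described in the Details:

```r
## Not run: 
# Check string variables in the example data for encoding mismatches.
prep_load_workbook_like_file("meta_data_v2")
enc <- int_encoding_errors(study_data = "study_data",
                           meta_data = prep_get_data_frame("item_level"),
                           label_col = "LABEL")

## End(Not run)
```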


Detect Expected Observations

Description

For each participant, check, if an observation was expected, given the PART_VARS from item-level metadata

Usage

int_part_vars_structure(
  label_col,
  study_data,
  item_level = "item_level",
  expected_observations = c("HIERARCHY", "SEGMENT"),
  disclose_problem_paprt_var_data = FALSE,
  meta_data = item_level,
  meta_data_v2
)

Arguments

label_col

character mapping attribute colnames(study_data) vs. meta_data[label_col]

study_data

study_data must contain all relevant PART_VARS to avoid false positives caused by PART_VARS missing from study_data

item_level

meta_data must be complete to avoid false positives on non-existing PART_VARS

expected_observations

enum HIERARCHY | SEGMENT. How should PART_VARS be handled:

  • SEGMENT: if the PART_VAR is 1, an observation is expected.

  • HIERARCHY (the default): if the PART_VAR is 1 for this variable and also for all PART_VARS further up the hierarchy, an observation is expected.

disclose_problem_paprt_var_data

logical show the problematic data (PART_VAR only)

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

Descriptor

Value

empty list, so far – the function only warns.
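The HIERARCHY rule above can be illustrated with a minimal base-R sketch (the participation variables are hypothetical, and this is not dataquieR code):

```r
# An observation is expected only if the item's PART_VAR and all ancestor
# PART_VARS up the hierarchy are 1:
part_status <- c(PART_STUDY = 1, PART_SEGMENT = 1, PART_ITEM = 0)
hierarchy <- c("PART_ITEM", "PART_SEGMENT", "PART_STUDY")  # item upwards
all(part_status[hierarchy] == 1)  # FALSE: PART_ITEM is 0, no observation expected
```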


Determine missing and/or superfluous data elements

Description

Depends on the dataquieR.ELEMENT_MISSMATCH_CHECKTYPE option, see there

Usage

int_sts_element_dataframe(
  item_level = "item_level",
  meta_data_dataframe = "dataframe_level",
  meta_data = item_level,
  meta_data_v2,
  check_type = getOption("dataquieR.ELEMENT_MISSMATCH_CHECKTYPE",
    dataquieR.ELEMENT_MISSMATCH_CHECKTYPE_default),
  dataframe_level
)

Arguments

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data_dataframe

data.frame the data frame that contains the metadata for the data frame level

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

check_type

enum none | exact | subset_u | subset_m. See dataquieR.ELEMENT_MISSMATCH_CHECKTYPE

dataframe_level

data.frame alias for meta_data_dataframe

Details

Indicator

Value

a list with named slots:

Examples

## Not run: 
prep_load_workbook_like_file("~/tmp/df_level_test.xlsx")
meta_data_dataframe <- "dataframe_level"
meta_data <- "item_level"

## End(Not run)

Checks for element set

Description

Depends on the dataquieR.ELEMENT_MISSMATCH_CHECKTYPE option, see there.

Usage

int_sts_element_segment(
  study_data,
  item_level = "item_level",
  label_col,
  meta_data = item_level,
  meta_data_v2
)

Arguments

study_data

data.frame the data frame that contains the measurements, mandatory.

item_level

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

Indicator

Value

a list with

Examples

## Not run: 
study_data <- cars
meta_data <- dataquieR::prep_create_meta(VAR_NAMES = c("speedx", "distx"),
  DATA_TYPE = c("integer", "integer"), MISSING_LIST = "|", JUMP_LIST = "|",
  STUDY_SEGMENT = c("Intro", "Ex"))
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "none")
int_sts_element_segment(study_data, meta_data)
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "exact")
int_sts_element_segment(study_data, meta_data)
study_data <- cars
meta_data <- dataquieR::prep_create_meta(VAR_NAMES = c("speedx", "distx"),
  DATA_TYPE = c("integer", "integer"), MISSING_LIST = "|", JUMP_LIST = "|",
  STUDY_SEGMENT = c("Intro", "Intro"))
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "none")
int_sts_element_segment(study_data, meta_data)
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "exact")
int_sts_element_segment(study_data, meta_data)
study_data <- cars
meta_data <- dataquieR::prep_create_meta(VAR_NAMES = c("speed", "distx"),
  DATA_TYPE = c("integer", "integer"), MISSING_LIST = "|", JUMP_LIST = "|",
  STUDY_SEGMENT = c("Intro", "Intro"))
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "none")
int_sts_element_segment(study_data, meta_data)
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "exact")
int_sts_element_segment(study_data, meta_data)

## End(Not run)

Check for unexpected data element count

Description

This function contrasts the expected element number in each study in the metadata with the actual element number in each study data frame.

Indicator

Usage

int_unexp_elements(
  identifier_name_list,
  data_element_count,
  meta_data_dataframe = "dataframe_level",
  meta_data_v2,
  dataframe_level
)

Arguments

identifier_name_list

character a character vector indicating the name of each study data frame, mandatory.

data_element_count

integer an integer vector with the number of expected data elements, mandatory.

meta_data_dataframe

data.frame the data frame that contains the metadata for the data frame level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

dataframe_level

data.frame alias for meta_data_dataframe

Value

a list with
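The contrast described above amounts to comparing expected and observed column counts per table. A base-R sketch with invented table names and counts (not the dataquieR implementation):

```r
# Compare the expected number of data elements with the actual column count:
tables <- list(baseline = data.frame(a = 1, b = 2), followup = data.frame(x = 1))
expected <- c(baseline = 2L, followup = 2L)
actual <- vapply(tables, ncol, integer(1))
actual == expected  # baseline TRUE, followup FALSE
```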


Check for unexpected data record count at the data frame level

Description

This function contrasts the expected record number in each study in the metadata with the actual record number in each study data frame.

Indicator

Usage

int_unexp_records_dataframe(
  identifier_name_list,
  data_record_count,
  meta_data_dataframe = "dataframe_level",
  meta_data_v2,
  dataframe_level
)

Arguments

identifier_name_list

character a character vector indicating the name of each study data frame, mandatory.

data_record_count

integer an integer vector with the number of expected data records per study data frame, mandatory.

meta_data_dataframe

data.frame the data frame that contains the metadata for the data frame level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

dataframe_level

data.frame alias for meta_data_dataframe

Value

a list with


Check for unexpected data record count within segments

Description

This function contrasts the expected record number in each study segment in the metadata with the actual record number in each segment data frame.

Indicator

Usage

int_unexp_records_segment(
  study_segment,
  study_data,
  label_col,
  item_level = "item_level",
  data_record_count,
  meta_data = item_level,
  meta_data_segment = "segment_level",
  meta_data_v2,
  segment_level
)

Arguments

study_segment

character a character vector indicating the name of each study segment, mandatory.

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

data_record_count

integer an integer vector with the number of expected data records, mandatory.

meta_data

data.frame old name for item_level

meta_data_segment

data.frame – optional: Segment level metadata

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

segment_level

data.frame alias for meta_data_segment

Details

The current implementation does not take into account jump or missing codes, the function is rather based on checking whether NAs are present in the study data
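The NA-based idea from the Details can be sketched in base R (illustration only; the segment variables and data are invented):

```r
# Count records in a segment as rows with at least one non-NA value among
# the segment's variables:
sd0 <- data.frame(v1 = c(1, NA, 3), v2 = c(NA, NA, 2))
seg_vars <- c("v1", "v2")
sum(rowSums(!is.na(sd0[, seg_vars, drop = FALSE])) > 0)  # 2 observed records
```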

Value

a list with


Check for unexpected data record set

Description

This function tests that the identifiers match a provided record set. It is possible to check for unexpected data record sets by study segments or to consider only selected segments.

Indicator

Usage

int_unexp_records_set(
  level = c("dataframe", "segment"),
  study_data,
  item_level = "item_level",
  label_col,
  meta_data = item_level,
  meta_data_v2,
  ...
)

Arguments

level

character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment").

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

...

Depending on level, passed to either util_int_unexp_records_set_segment or util_int_unexp_records_set_dataframe

Value

a list. Depending on level, see util_int_unexp_records_set_segment or util_int_unexp_records_set_dataframe for a description of the outputs.


Description

used by the dq_report2-pipeline

Usage

.menu_env

Format

An object of class environment of length 3.


Description

Generate the menu for a report

Arguments

pages

encapsulated list with report pages as tagList objects, its names are the desired file names

Value

the html-taglist for the menu


Description

Creates a drop-down menu

Arguments

title

name of the entry in the main menu

menu_description

description, displayed, if the main menu entry itself is clicked

...

the sub-menu-entries

id

id for the entry, defaults to modified title

Value

html div object


Create a single menu entry

Description

Create a single menu entry

Arguments

title

of the entry

id

linked href, defaults to modified title. can be a word, then a single-page-link with an anchor tag is created.

...

additional arguments for the menu link

Value

html-a-tag object


Data frame with metadata about the study data on variable level

Description

Variable level metadata.

See Also

further details on variable level metadata.

meta_data_segment

meta_data_dataframe


Well known columns on the meta_data_cross-item sheet

Description

Metadata describing groups of variables, e.g., for their multivariate distribution or for defining contradiction rules.

See Also

check_table

Online Documentation

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, util_normalize_cross_item()


Well known columns on the meta_data_dataframe sheet

Description

Metadata describing data delivered on one data frame/table sheet, e.g., a full questionnaire, not its items.


.meta_data_env – an environment for easy metadata access

Description

used by the dq_report2-pipeline

Usage

.meta_data_env

Format

An object of class environment of length 8.

See Also

meta_data_env_id_vars meta_data_env_co_vars meta_data_env_time_vars meta_data_env_group_vars


Extract co-variables for a given item

Description

Extract co-variables for a given item

Arguments

entity

vector of item-identifiers

Value

a vector with co-variables for each entity-entry, having the explode attribute set to FALSE

See Also

meta_data_env


Extract MULTIVARIATE_OUTLIER_CHECK for variable group

Description

Extract MULTIVARIATE_OUTLIER_CHECK for variable group

Extract selected outlier criteria for a given item or variable group

Arguments

entity

vector of item- or variable group identifiers

Details

In the environment, target_meta_data should be set either to item_level or to cross-item_level.

Value

a vector with id-variables for each entity-entry, having the explode attribute set to FALSE

See Also

meta_data_env


Extract group variables for a given item

Description

Extract group variables for a given item

Arguments

entity

vector of item-identifiers

Value

a vector with possible group-variables (can be more than one per item) for each entity-entry, having the explode attribute set to TRUE

See Also

meta_data_env


Extract id variables for a given item or variable group

Description

Extract id variables for a given item or variable group

Arguments

entity

vector of item- or variable group identifiers

Details

In the environment, target_meta_data should be set either to item_level or to cross-item_level.

Value

a vector with id-variables for each entity-entry, having the explode attribute set to FALSE

See Also

meta_data_env


Extract outlier rules-number-threshold for a given item or variable group

Description

Extract outlier rules-number-threshold for a given item or variable group

Arguments

entity

vector of item- or variable group identifiers

Details

In the environment, target_meta_data should be set either to item_level or to cross-item_level.

Value

a vector with id-variables for each entity-entry, having the explode attribute set to FALSE

See Also

meta_data_env


Extract measurement time variable for a given item

Description

Extract measurement time variable for a given item

Arguments

entity

vector of item-identifiers

Value

a vector with time-variables (usually one per item) for each entity-entry, having the explode attribute set to TRUE

See Also

meta_data_env


Well known columns on the meta_data_segment sheet

Description

Metadata describing study segments, e.g., a full questionnaire, not its items.


return the number of result slots in a report

Description

return the number of result slots in a report

Usage

nres(x)

Arguments

x

the dataquieR report (v2.0)

Value

the number of used result slots


Convert a pipeline result data frame to named encapsulated lists

Description

Deprecated

Usage

pipeline_recursive_result(...)

Arguments

...

Deprecated

Value

Deprecated


Call (nearly) one "Accuracy" function with many parameterizations at once automatically

Description

Deprecated

Usage

pipeline_vectorized(...)

Arguments

...

Deprecated

Value

Deprecated


Plot a dataquieR summary

Description

Plot a dataquieR summary

Usage

## S3 method for class 'dataquieR_summary'
plot(x, y, ..., filter, dont_plot = FALSE, stratify_by)

Arguments

x

the dataquieR summary, see summary() and dq_report2()

y

not yet used

...

not yet used

filter

if given, this filters the summary, e.g., filter = call_names == "com_qualified_item_missingness"

dont_plot

suppress the actual plotting, just return a printable object derived from x

stratify_by

column to stratify the summary by, a single string.

Value

invisible html object


Utility function to plot a combined figure for distribution checks

Description

Data quality indicator checks "Unexpected location" with histograms and plots of empirical cumulative distributions for the subgroups.

Usage

prep_acc_distributions_with_ecdf(
  resp_vars = NULL,
  group_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2,
  n_group_max = getOption("dataquieR.max_group_var_levels_in_plot",
    dataquieR.max_group_var_levels_in_plot_default),
  n_obs_per_group_min = getOption("dataquieR.min_obs_per_group_var_in_plot",
    dataquieR.min_obs_per_group_var_in_plot_default)
)

Arguments

resp_vars

variable list the name of the measurement variable

group_vars

variable list the name of the observer, device or reader variable

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

n_group_max

maximum number of categories to be displayed individually for the grouping variable (group_vars, devices / examiners)

n_obs_per_group_min

minimum number of data points per group to create a graph for an individual category of the group_vars variable

Value

A SummaryPlot.


Convert missing codes in metadata format v1.0 and a missing-cause-table to v2.0 missing list / jump list assignments

Description

The function has two working modes. If replace_meta_data is TRUE (the default if cause_label_df contains a column named resp_vars), the missing/jump codes in meta_data[, c(MISSING_CODES, JUMP_CODES)] will be overwritten; otherwise, they will be labeled using the cause_label_df.

Usage

prep_add_cause_label_df(
  item_level = "item_level",
  cause_label_df,
  label_col = VAR_NAMES,
  assume_consistent_codes = TRUE,
  replace_meta_data = ("resp_vars" %in% colnames(cause_label_df)),
  meta_data = item_level,
  meta_data_v2
)

Arguments

item_level

data.frame the data frame that contains metadata attributes of study data

cause_label_df

data.frame missing code table. If missing codes have labels the respective data frame can be specified here, see cause_label_df

label_col

variable attribute the name of the column in the metadata with labels of variables

assume_consistent_codes

logical if TRUE and no labels are given and the same missing/jump code is used for more than one variable, the labels assigned for this code will be the same for all variables.

replace_meta_data

logical if TRUE, ignore existing missing codes and jump codes and replace them with data from the cause_label_df. Otherwise, copy the labels from cause_label_df to the existing code columns.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

If a column resp_vars exists, then rows with a value in resp_vars will only be used for the corresponding variable.

Value

data.frame updated metadata including all the code labels in missing/jump lists

See Also

prep_extract_cause_label_df
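A hypothetical cause_label_df could look as follows (column names as used in this documentation; the codes and labels are invented):

```r
# A missing code table with an optional resp_vars column (see Details above):
cause_label_df <- data.frame(
  CODE_VALUE = c(99980, 99981),
  CODE_LABEL = c("Refused", "Not applicable"),
  resp_vars  = c(NA, "SBP_0")  # row 2 applies to variable SBP_0 only
)
cause_label_df
```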


Insert missing codes for NAs based on rules

Description

Insert missing codes for NAs based on rules

Usage

prep_add_computed_variables(
  study_data,
  meta_data,
  label_col,
  rules,
  use_value_labels
)

Arguments

study_data

data.frame the data frame that contains the measurements

meta_data

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

rules

data.frame with the columns:

use_value_labels

logical In rules for factors, use the value labels, not the codes. Defaults to TRUE, if any VALUE_LABELS are given in the metadata.

Value

a list with the entry:

Examples

## Not run: 
study_data <- prep_get_data_frame("ship")
prep_load_workbook_like_file("ship_meta_v2")
meta_data <- prep_get_data_frame("item_level")
rules <- tibble::tribble(
  ~VAR_NAMES,  ~RULE,
  "BMI", '[BODY_WEIGHT_0]/(([BODY_HEIGHT_0]/100)^2)',
  "R", '[WAIST_CIRC_0]/2/[pi]', # in m^3
  "VOL_EST", '[pi]*([WAIST_CIRC_0]/2/[pi])^2*[BODY_HEIGHT_0] / 1000', # in l
 )
 r <- prep_add_computed_variables(study_data, meta_data,
   label_col = "LABEL", rules, use_value_labels = FALSE)

## End(Not run)

Add data frames to the pre-loaded / cache data frame environment

Description

These can then be referred to by their names: wherever dataquieR expects a data.frame, just pass a character instead. If this character is not found in the cache, dataquieR additionally looks for files with that name and for URLs. You can also refer to a specific sheet of a workbook or a specific object from an RData file by appending a pipe symbol and its name. A second pipe symbol allows extracting certain columns from such sheets (the result remains a data frame).

Usage

prep_add_data_frames(..., data_frame_list = list())

Arguments

...

data frames, if passed with names, these will be the names of these tables in the data frame environment. If not, then the names in the calling environment will be used.

data_frame_list

a named list with data frames. Also these will be added and names will be handled as for the ... argument.

Value

data.frame invisible(the cache environment)

See Also

prep_load_workbook_like_file

prep_get_data_frame

Other data-frame-cache: prep_get_data_frame(), prep_list_dataframes(), prep_load_folder_with_metadata(), prep_load_workbook_like_file(), prep_purge_data_frame_cache(), prep_remove_from_cache()
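The pipe syntax described above could be used like this (a sketch; the file name and object names are invented):

```r
## Not run (requires dataquieR):
# prep_add_data_frames(item_level = my_item_level)   # register under a name
# prep_get_data_frame("meta.xlsx|item_level")        # one sheet of a workbook
# prep_get_data_frame("meta.xlsx|item_level|LABEL")  # one column, still a data frame
```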


Insert missing codes for NAs based on rules

Description

Insert missing codes for NAs based on rules

Usage

prep_add_missing_codes(
  resp_vars,
  study_data,
  meta_data_v2,
  item_level = "item_level",
  label_col,
  rules,
  use_value_labels,
  overwrite = FALSE,
  meta_data = item_level
)

Arguments

resp_vars

variable list the name of the measurement variables to be modified, all from rules, if omitted

study_data

data.frame the data frame that contains the measurements

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

item_level

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

rules

data.frame with the columns:

  • resp_vars: the variable whose NA values should be replaced by missing/jump codes

  • CODE_CLASS: Either MISSING or JUMP: Is the currently described case an expected missing value (JUMP) or not (MISSING)

  • CODE_VALUE: The jump code or missing code

  • CODE_LABEL: A label describing the reason for the missing value

  • RULE: A rule in REDcap style (see, e.g., the REDcap help and the REDcap how-to on branching logic) that describes the cases in which the value is missing

use_value_labels

logical In rules for factors, use the value labels, not the codes. Defaults to TRUE, if any VALUE_LABELS are given in the metadata.

overwrite

logical Also insert missing codes, if the values are not NA

meta_data

data.frame old name for item_level

Value

a list with the entries:


Support function to augment metadata during data quality reporting

Description

adds an annotation to static metadata

Usage

prep_add_to_meta(
  VAR_NAMES,
  DATA_TYPE,
  LABEL,
  VALUE_LABELS,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2,
  ...
)

Arguments

VAR_NAMES

character Names of the Variables to add

DATA_TYPE

character Data type for the added variables

LABEL

character Labels for these variables

VALUE_LABELS

character Value labels for the values of the variables as usually pipe separated and assigned with =: 1 = male | 2 = female

item_level

data.frame the metadata to extend

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

...

Further defined variable attributes, see prep_create_meta

Details

Adds metadata, e.g., of transformed/new variables. This function is not yet considered stable, but we already export it because it could be helpful. Therefore, there are still some inconsistencies in the formals.

Value

a data frame with amended metadata.
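The pipe-separated VALUE_LABELS format shown above can be parsed with plain base R, e.g.:

```r
# Split "1 = male | 2 = female" into a named lookup vector:
vl <- "1 = male | 2 = female"
parts <- trimws(strsplit(vl, "|", fixed = TRUE)[[1]])
kv <- do.call(rbind, strsplit(parts, "\\s*=\\s*"))
setNames(kv[, 2], kv[, 1])  # c("1" = "male", "2" = "female")
```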


Re-Code labels with their respective codes according to the meta_data

Description

Re-Code labels with their respective codes according to the meta_data

Usage

prep_apply_coding(
  study_data,
  meta_data_v2,
  item_level = "item_level",
  meta_data = item_level
)

Arguments

study_data

data.frame the data frame that contains the measurements

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

Value

data.frame modified study data with labels replaced by the codes
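The re-coding can be pictured as a base-R lookup (dataquieR derives the mapping from VALUE_LABELS; the mapping here is invented):

```r
# Replace labels by their codes via a named vector:
codes <- c(male = 1, female = 2)
x <- c("male", "female", "male")
unname(codes[x])  # 1 2 1
```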


Check for package updates

Description

Check for package updates

Usage

prep_check_for_dataquieR_updates(
  beta = FALSE,
  deps = TRUE,
  ask = interactive()
)

Arguments

beta

logical check for beta version too

deps

logical check for missing (optional) dependencies

ask

logical ask for updates

Value

invisible(NULL)


Verify and normalize metadata on data frame level

Description

If possible, mismatching data types are converted ("true" becomes TRUE).

Usage

prep_check_meta_data_dataframe(
  meta_data_dataframe = "dataframe_level",
  meta_data_v2,
  dataframe_level
)

Arguments

meta_data_dataframe

data.frame data frame or path/url of a metadata sheet for the data frame level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

dataframe_level

data.frame alias for meta_data_dataframe

Details

Missing columns are added and filled with NA, where this is valid; it is not, e.g., for the key column DF_NAME.

Value

standardized metadata sheet as data frame

Examples

## Not run: 
mds <- prep_check_meta_data_dataframe("ship_meta_dataframe|dataframe_level") # also converts
print(mds)
prep_check_meta_data_dataframe(mds)
mds1 <- mds
mds1$DF_RECORD_COUNT <- NULL
print(prep_check_meta_data_dataframe(mds1)) # fixes the missing column by NAs
mds1 <- mds
mds1$DF_UNIQUE_ROWS[[2]] <- "xxx" # not convertible
# print(prep_check_meta_data_dataframe(mds1)) # fail
mds1 <- mds
mds1$DF_UNIQUE_ID[[2]] <- 12
# print(prep_check_meta_data_dataframe(mds1)) # fail

## End(Not run)

Verify and normalize metadata on segment level

Description

If possible, mismatching data types are converted ("true" becomes TRUE).

Usage

prep_check_meta_data_segment(
  meta_data_segment = "segment_level",
  meta_data_v2,
  segment_level
)

Arguments

meta_data_segment

data.frame data frame or path/url of a metadata sheet for the segment level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

segment_level

data.frame alias for meta_data_segment

Details

Missing columns are added and filled with NA, where this is valid; it is not, e.g., for the key column STUDY_SEGMENT.

Value

standardized metadata sheet as data frame

Examples

## Not run: 
mds <- prep_check_meta_data_segment("ship_meta_v2|segment_level") # also converts
print(mds)
prep_check_meta_data_segment(mds)
mds1 <- mds
mds1$SEGMENT_RECORD_COUNT <- NULL
print(prep_check_meta_data_segment(mds1)) # fixes the missing column by NAs
mds1 <- mds
mds1$SEGMENT_UNIQUE_ROWS[[2]] <- "xxx" # not convertible
# print(prep_check_meta_data_segment(mds1)) # fail

## End(Not run)

Checks the validity of metadata w.r.t. the provided column names

Description

This function verifies whether a data frame complies with metadata conventions and provides the richness of meta information specified by level.

Usage

prep_check_meta_names(
  item_level = "item_level",
  level,
  character.only = FALSE,
  meta_data = item_level,
  meta_data_v2
)

Arguments

item_level

data.frame the data frame that contains metadata attributes of study data

level

enum level of requirement (see also VARATT_REQUIRE_LEVELS). set to NULL to deactivate the check of richness.

character.only

logical a logical indicating whether level can be assumed to be character strings.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

Note that only the given level is checked, even though levels are to some extent hierarchical.

Value

a logical with:

Examples

## Not run: 
prep_check_meta_names(data.frame(VAR_NAMES = 1, DATA_TYPE = 2,
                      MISSING_LIST = 3))

prep_check_meta_names(
  data.frame(
    VAR_NAMES = 1, DATA_TYPE = 2, MISSING_LIST = 3,
    LABEL = "LABEL", VALUE_LABELS = "VALUE_LABELS",
    JUMP_LIST = "JUMP_LIST", HARD_LIMITS = "HARD_LIMITS",
    GROUP_VAR_OBSERVER = "GROUP_VAR_OBSERVER",
    GROUP_VAR_DEVICE = "GROUP_VAR_DEVICE",
    TIME_VAR = "TIME_VAR",
    PART_VAR = "PART_VAR",
    STUDY_SEGMENT = "STUDY_SEGMENT",
    LOCATION_RANGE = "LOCATION_RANGE",
    LOCATION_METRIC = "LOCATION_METRIC",
    PROPORTION_RANGE = "PROPORTION_RANGE",
    MISSING_LIST_TABLE = "MISSING_LIST_TABLE",
    CO_VARS = "CO_VARS",
    LONG_LABEL = "LONG_LABEL"
  ),
  RECOMMENDED
)

prep_check_meta_names(
  data.frame(
    VAR_NAMES = 1, DATA_TYPE = 2, MISSING_LIST = 3,
    LABEL = "LABEL", VALUE_LABELS = "VALUE_LABELS",
    JUMP_LIST = "JUMP_LIST", HARD_LIMITS = "HARD_LIMITS",
    GROUP_VAR_OBSERVER = "GROUP_VAR_OBSERVER",
    GROUP_VAR_DEVICE = "GROUP_VAR_DEVICE",
    TIME_VAR = "TIME_VAR",
    PART_VAR = "PART_VAR",
    STUDY_SEGMENT = "STUDY_SEGMENT",
    LOCATION_RANGE = "LOCATION_RANGE",
    LOCATION_METRIC = "LOCATION_METRIC",
    PROPORTION_RANGE = "PROPORTION_RANGE",
    DETECTION_LIMITS = "DETECTION_LIMITS", SOFT_LIMITS = "SOFT_LIMITS",
    CONTRADICTIONS = "CONTRADICTIONS", DISTRIBUTION = "DISTRIBUTION",
    DECIMALS = "DECIMALS", VARIABLE_ROLE = "VARIABLE_ROLE",
    DATA_ENTRY_TYPE = "DATA_ENTRY_TYPE",
    CO_VARS = "CO_VARS",
    END_DIGIT_CHECK = "END_DIGIT_CHECK",
    VARIABLE_ORDER = "VARIABLE_ORDER", LONG_LABEL =
      "LONG_LABEL", recode = "recode",
      MISSING_LIST_TABLE = "MISSING_LIST_TABLE"
  ),
  OPTIONAL
)

# Next one will fail
try(
  prep_check_meta_names(data.frame(VAR_NAMES = 1, DATA_TYPE = 2,
    MISSING_LIST = 3), TECHNICAL)
)

## End(Not run)

Support function to scan variable labels for applicability

Description

Adjust labels in meta_data to be valid variable names in formulas for diverse r functions, such as glm or lme4::lmer.

Usage

prep_clean_labels(
  label_col,
  item_level = "item_level",
  no_dups = FALSE,
  meta_data = item_level,
  meta_data_v2
)

Arguments

label_col

character the label attribute to adjust or a character vector to adjust, depending on whether the meta_data argument is given or missing.

item_level

data.frame metadata data frame: If label_col is a label attribute to adjust, this is the metadata table to process on. If missing, label_col must be a character vector with values to adjust.

no_dups

logical disallow duplicates in input or output vectors of the function; if TRUE, prep_clean_labels calls stop() on duplicated labels.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Details

Hint: The following is still true, but the functions should meanwhile be capable of applying potentially needed fixes on the fly, so you will likely not need this function any more.

Currently, labels as given by the label_col argument of most functions are used directly in formulas, so that they become a natural part of the outputs. However, different models expect differently strict syntax for such formulas, especially for valid variable names. prep_clean_labels removes all potentially inadmissible characters from variable names (there is no guarantee that no exotic model still rejects the names, but the number of exotic characters is minimized). Note that variable names are modified by this and may become unreadable or indistinguishable from other variable names. For the latter case, a stop call is possible, controlled by the no_dups argument.

A warning is emitted if modifications were necessary.

Value

a data.frame with:

Examples

## Not run: 
meta_data1 <- data.frame(
  LABEL =
    c(
      "syst. Blood pressure (mmHg) 1",
      "1st heart frequency in MHz",
      "body surface (\\u33A1)"
    )
)
print(meta_data1)
print(prep_clean_labels(meta_data1$LABEL))
meta_data1 <- prep_clean_labels("LABEL", meta_data1)
print(meta_data1)

## End(Not run)

Combine two report summaries

Description

Combine two report summaries

Usage

prep_combine_report_summaries(..., summaries_list, amend_segment_names = FALSE)

Arguments

...

objects returned by prep_extract_summary

summaries_list

if given, list of objects returned by prep_extract_summary

amend_segment_names

logical use names of the summaries_list and argument names as segment prefixes

Value

combined summaries

See Also

Other summary_functions: prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()
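Examples

A hypothetical sketch (report1 and report2 stand for two dq_report2 objects and are not shipped with the package):

## Not run: 
sum1 <- prep_extract_summary(report1)
sum2 <- prep_extract_summary(report2)
# argument names become segment prefixes in the combined summary
combined <- prep_combine_report_summaries(
  baseline = sum1, followup = sum2,
  amend_segment_names = TRUE
)

## End(Not run)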


Verify item-level metadata

Description

Are the provided item-level meta_data plausible given the study_data?

Usage

prep_compare_meta_with_study(
  study_data,
  label_col,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2
)

Arguments

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Value

an invisible() list with the entries.
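Examples

A minimal sketch, assuming the example data frames "study_data" and "meta_data" are available in the data frame cache:

## Not run: 
study_data <- prep_get_data_frame("study_data")
meta_data <- prep_get_data_frame("meta_data")
# check the metadata for plausibility given the study data
chk <- prep_compare_meta_with_study(study_data = study_data,
                                    item_level = meta_data)
str(chk)

## End(Not run)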


Support function to create data.frames of metadata

Description

Create a metadata data frame and map names. Generally, this function only creates a data.frame, but by using this constructor instead of calling data.frame(..., stringsAsFactors = FALSE), it becomes possible to adapt the metadata data.frame in later developments, e.g., if we decide to use classes for the metadata, or if certain standard names of variable attributes change. Also, a validity check could be implemented here.

Usage

prep_create_meta(..., stringsAsFactors = FALSE, level, character.only = FALSE)

Arguments

...

named column vectors; names will be mapped using WELL_KNOWN_META_VARIABLE_NAMES, if included therein. Can also be a data frame; then, its column names will be mapped using WELL_KNOWN_META_VARIABLE_NAMES.

stringsAsFactors

logical if the argument is a list of vectors, a data frame will be created. In this case, stringsAsFactors controls whether character vectors will be auto-converted to factors; here, this always defaults to FALSE, independent of default.stringsAsFactors.

level

enum level of requirement (see also VARATT_REQUIRE_LEVELS). Set to NULL, if no complete metadata frame is created.

character.only

logical a logical indicating whether level can be assumed to be character strings.

Details

For now, this calls data.frame, but it already renames variable attributes, if they have a different name assigned in WELL_KNOWN_META_VARIABLE_NAMES, e.g. WELL_KNOWN_META_VARIABLE_NAMES$RECODE maps to recode in lower case.

NB: dataquieR exports all names from WELL_KNOWN_META_VARIABLE_NAMES as symbols, so RECODE also contains "recode".

Value

a data frame with:

See Also

WELL_KNOWN_META_VARIABLE_NAMES
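Examples

A minimal sketch, mirroring the constructor call used in the prep_map_labels example:

## Not run: 
meta_data <- prep_create_meta(
  VAR_NAMES = c("ID", "SEX"),
  LABEL = c("Pseudo-ID", "Gender"),
  DATA_TYPE = c(DATA_TYPES$INTEGER, DATA_TYPES$INTEGER),
  MISSING_LIST = ""
)
print(meta_data)

## End(Not run)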


Instantiate a new metadata file

Description

Instantiate a new metadata file

Usage

prep_create_meta_data_file(
  file_name,
  study_data,
  open = TRUE,
  overwrite = FALSE
)

Arguments

file_name

character file path to write to

study_data

data.frame optional, study data to guess metadata from

open

logical open the file after creation

overwrite

logical overwrite file, if exists

Value

invisible(NULL)
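Examples

A minimal sketch; the target path is illustrative only:

## Not run: 
# write a metadata skeleton guessed from study data
prep_create_meta_data_file(
  file_name = file.path(tempdir(), "meta_data.xlsx"),
  study_data = prep_get_data_frame("study_data"),
  open = FALSE,
  overwrite = TRUE
)

## End(Not run)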


Create a factory function for storr objects for backing a dataquieR_resultset2

Description

Create a factory function for storr objects for backing a dataquieR_resultset2

Usage

prep_create_storr_factory(db_dir = tempfile(), namespace = "objects")

Arguments

db_dir

character path to the directory for the back-end, if one is created on the fly.

namespace

character namespace for the report, so that one back-end can back several reports

Details

The returned function will try to create a storr object using a temporary folder or the folder given in db_dir, if specified. The database back-end will be storr_rds.

Value

storr object or NULL, if package storr is not available
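Examples

A minimal sketch (see also the example of prep_load_report_from_backend):

## Not run: 
# factory backed by a temporary folder
fac <- prep_create_storr_factory(namespace = "objects")
st <- fac() # a storr object, or NULL, if storr is not installed

## End(Not run)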


Get data types from data

Description

Get data types from data

Usage

prep_datatype_from_data(
  resp_vars = colnames(study_data),
  study_data,
  .dont_cast_off_cols = FALSE
)

Arguments

resp_vars

variable names of the variables to fetch the data type from the data

study_data

data.frame the data frame that contains the measurements. Hint: Only data frames are supported, no URLs or file names.

.dont_cast_off_cols

logical internal use, only

Value

vector of data types

Examples

## Not run: 
dataquieR::prep_datatype_from_data(cars)

## End(Not run)

Convert two vectors from a code-value-table to a key-value list

Description

Convert two vectors from a code-value-table to a key-value list

Usage

prep_deparse_assignments(
  codes,
  labels = codes,
  split_char = SPLIT_CHAR,
  mode = c("numeric_codes", "string_codes")
)

Arguments

codes

codes, numeric or dates (the default; string codes can be enabled using the mode argument, see below)

labels

character labels, same length as codes

split_char

character the character used to split code assignments

mode

character one of two options to insist on numeric or datetime codes (default) or to allow for string codes

Value

a vector with assignment strings for each row of cbind(codes, labels)
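Examples

A minimal sketch; the codes and labels are illustrative:

## Not run: 
prep_deparse_assignments(
  codes = c(99980, 99981),
  labels = c("refused", "not applicable")
)
# string codes need mode = "string_codes"
prep_deparse_assignments(codes = c("A", "B"),
                         labels = c("refused", "not applicable"),
                         mode = "string_codes")

## End(Not run)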


Get the dataquieR DATA_TYPE of x

Description

Get the dataquieR DATA_TYPE of x

Usage

prep_dq_data_type_of(x)

Arguments

x

object to define the dataquieR data type of

Value

the dataquieR data type as listed in DATA_TYPES

See Also

DATA_TYPES_OF_R_TYPE
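Examples

A minimal sketch:

## Not run: 
prep_dq_data_type_of(42L)        # an integer
prep_dq_data_type_of("male")     # a string
prep_dq_data_type_of(Sys.time()) # a datetime
# each call returns one of the entries of DATA_TYPES

## End(Not run)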


Expand code labels across variables

Description

Code labels are copied from other variables, if the code is the same and the label is set only for some variables

Usage

prep_expand_codes(
  item_level = "item_level",
  suppressWarnings = FALSE,
  mix_jumps_and_missings = FALSE,
  meta_data_v2,
  meta_data = item_level
)

Arguments

item_level

data.frame the data frame that contains metadata attributes of study data

suppressWarnings

logical if FALSE, warnings are shown, if labels are expanded

mix_jumps_and_missings

logical ignore the class of the codes for label expansion, i.e., use missing code labels as jump code labels, if the values are the same.

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

meta_data

data.frame old name for item_level

Value

data.frame an updated metadata data frame.

Examples

## Not run: 
meta_data <- prep_get_data_frame("meta_data")
meta_data$JUMP_LIST[meta_data$VAR_NAMES == "v00003"] <- "99980 = NOOP"
md <- prep_expand_codes(meta_data)
md$JUMP_LIST
md$MISSING_LIST
md <- prep_expand_codes(meta_data, mix_jumps_and_missings = TRUE)
md$JUMP_LIST
md$MISSING_LIST
meta_data <- prep_get_data_frame("meta_data")
meta_data$MISSING_LIST[meta_data$VAR_NAMES == "v00003"] <- "99980 = NOOP"
md <- prep_expand_codes(meta_data)
md$JUMP_LIST
md$MISSING_LIST

## End(Not run)

Extract all missing/jump codes from metadata and export a cause-label-data-frame

Description

Extract all missing/jump codes from metadata and export a cause-label-data-frame

Usage

prep_extract_cause_label_df(
  item_level = "item_level",
  label_col = VAR_NAMES,
  meta_data_v2,
  meta_data = item_level
)

Arguments

item_level

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

meta_data

data.frame old name for item_level

Value

list with the entries

See Also

prep_add_cause_label_df
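Examples

A minimal sketch, assuming the example metadata "meta_data" is available in the data frame cache:

## Not run: 
meta_data <- prep_get_data_frame("meta_data")
cause_labels <- prep_extract_cause_label_df(item_level = meta_data)
# the extracted table can be re-attached using prep_add_cause_label_df

## End(Not run)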


Extract old function based summary from data quality results

Description

Extract old function based summary from data quality results

Usage

prep_extract_classes_by_functions(r)

Arguments

r

dq_report2

Value

data.frame long format, compatible with prep_summary_to_classes()

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Extract summary from data quality results

Description

Generic function, currently supports dq_report2 and dataquieR_result

Usage

prep_extract_summary(r, ...)

Arguments

r

dq_report2 or dataquieR_result object

...

further arguments, maybe needed for some implementations

Value

list with two slots Data and Table with data.frames featuring all metrics columns from the report or result in x, the STUDY_SEGMENT and the VAR_NAMES. In case of Data, the columns are formatted nicely but still with the standardized column names – use util_translate_indicator_metrics() to rename them nicely. In case of Table, just as they are.

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Extract report summary from reports

Description

Extract report summary from reports

Usage

## S3 method for class 'dataquieR_result'
prep_extract_summary(r, ...)

Arguments

r

dataquieR_result a result from a dq_report2 report

...

not used

Value

list with two slots Data and Table with data.frames featuring all metrics columns from the report r, the STUDY_SEGMENT and the VAR_NAMES. In case of Data, the columns are formatted nicely but still with the standardized column names – use util_translate_indicator_metrics() to rename them nicely. In case of Table, just as they are.

See Also

prep_combine_report_summaries()

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Extract report summary from reports

Description

Extract report summary from reports

Usage

## S3 method for class 'dataquieR_resultset2'
prep_extract_summary(r, ...)

Arguments

r

dq_report2 a dq_report2 report

...

not used

Value

list with two slots Data and Table with data.frames featuring all metrics columns from the report r, the STUDY_SEGMENT and the VAR_NAMES. In case of Data, the columns are formatted nicely but still with the standardized column names – use util_translate_indicator_metrics() to rename them nicely. In case of Table, just as they are.

See Also

prep_combine_report_summaries()

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Read data from files/URLs

Description

data_frame_name can be a file path or a URL. You can append a pipe plus a sheet name (for Excel files) or an object name (e.g., for RData files); numbers may also work. All file formats supported by your rio installation will work.

Usage

prep_get_data_frame(
  data_frame_name,
  .data_frame_list = .dataframe_environment(),
  keep_types = FALSE,
  column_names_only = FALSE
)

Arguments

data_frame_name

character name of the data frame to read, see details

.data_frame_list

environment cache for loaded data frames

keep_types

logical keep types as possibly defined in a file, if the data frame is loaded from one. Set to TRUE for study data.

column_names_only

logical if TRUE imports only headers (column names) of the data frame and no content (an empty data frame)

Details

The data frames will be cached automatically; you can define an alternative environment for this using the argument .data_frame_list, and you can purge the cache using prep_purge_data_frame_cache.

Use prep_add_data_frames to manually add data frames to the cache, e.g., if you have loaded them from more complex sources before.

Value

data.frame a data frame

See Also

prep_add_data_frames

prep_load_workbook_like_file

Other data-frame-cache: prep_add_data_frames(), prep_list_dataframes(), prep_load_folder_with_metadata(), prep_load_workbook_like_file(), prep_purge_data_frame_cache(), prep_remove_from_cache()

Examples

## Not run: 
bl <- as.factor(prep_get_data_frame(
  paste0("https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus",
    "/Projekte_RKI/COVID-19_Todesfaelle.xlsx?__blob=",
    "publicationFile|COVID_Todesfälle_BL|Bundesland"))[[1]])

n <- as.numeric(prep_get_data_frame(paste0(
  "https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/",
  "Projekte_RKI/COVID-19_Todesfaelle.xlsx?__blob=",
  "publicationFile|COVID_Todesfälle_BL|Anzahl verstorbene",
  " COVID-19 Fälle"))[[1]])
plot(bl, n)
# Working names would be to date (2022-10-21), e.g.:
#
# https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/ \
#    Projekte_RKI/COVID-19_Todesfaelle.xlsx?__blob=publicationFile
# https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/  \
#    Projekte_RKI/COVID-19_Todesfaelle.xlsx?__blob=publicationFile|2
# https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/ \
#    Projekte_RKI/COVID-19_Todesfaelle.xlsx?__blob=publicationFile|name
# study_data
# ship
# meta_data
# ship_meta
#
prep_get_data_frame("meta_data | meta_data")

## End(Not run)

Fetch a label for a variable based on its purpose

Description

Fetch a label for a variable based on its purpose

Usage

prep_get_labels(
  resp_vars,
  item_level = "item_level",
  label_col,
  max_len = MAX_LABEL_LEN,
  label_class = c("SHORT", "LONG"),
  label_lang = getOption("dataquieR.lang", ""),
  resp_vars_are_var_names_only = FALSE,
  resp_vars_match_label_col_only = FALSE,
  meta_data = item_level,
  meta_data_v2,
  force_label_col = getOption("dataquieR.force_label_col",
    dataquieR.force_label_col_default)
)

Arguments

resp_vars

variable list the variable names to fetch for

item_level

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

max_len

integer the maximum label length to return; if this is not possible without causing ambiguous labels, the labels may still be longer

label_class

enum SHORT | LONG. Which sort of label according to the metadata model should be returned

label_lang

character optional language suffix, if available in the metadata. Can be controlled by the option dataquieR.lang.

resp_vars_are_var_names_only

logical If TRUE, do not use other labels than VAR_NAMES for finding resp_vars in meta_data

resp_vars_match_label_col_only

logical If TRUE, do not use other labels than those, referred by label_col for finding resp_vars in meta_data

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

force_label_col

enum auto | FALSE | TRUE. If TRUE, always use labels according to label_col; FALSE means use the labels that best match the function's requirements; auto means FALSE, if in a dq_report(), and TRUE, otherwise.

Value

character suitable labels for each resp_vars, names of this vector are VAR_NAMES

Examples

## Not run: 
prep_load_workbook_like_file("meta_data_v2")
prep_get_labels("SEX_0", label_class = "SHORT", max_len = 2)

## End(Not run)

Get data frame for a given segment

Description

Get data frame for a given segment

Usage

prep_get_study_data_segment(
  segment,
  study_data,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2,
  segment_level,
  meta_data_segment = "segment_level"
)

Arguments

segment

character name of the segment to return data for

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

segment_level

data.frame alias for meta_data_segment

meta_data_segment

data.frame – optional: Segment level metadata

Value

data.frame the data for the segment
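Examples

A minimal sketch; the segment name is illustrative and depends on the segment-level metadata:

## Not run: 
prep_load_workbook_like_file("meta_data_v2")
seg <- prep_get_study_data_segment(
  segment = "PART_STUDY", # illustrative segment name
  study_data = prep_get_data_frame("study_data")
)

## End(Not run)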


Return the logged-in User's Full Name

Description

If whoami is not installed, the user name from Sys.info() is returned.

Usage

prep_get_user_name()

Details

Can be overridden by options or environment:

options(FULLNAME = "Stephan Struckmann")

Sys.setenv(FULLNAME = "Stephan Struckmann")

Value

character the user's name
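Examples

A minimal sketch:

## Not run: 
prep_get_user_name()
# override auto-detection, e.g., for reproducible reports
options(FULLNAME = "Jane Doe")
prep_get_user_name()

## End(Not run)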


Get machine variant for snapshot tests

Description

Get machine variant for snapshot tests

Usage

prep_get_variant()

Value

character the variant


Guess encoding of text or text files

Description

Guess encoding of text or text files

Usage

prep_guess_encoding(x, file)

Arguments

x

character string to guess encoding for

file

character file to guess encoding for

Value

encoding
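Examples

A minimal sketch; the file path is illustrative:

## Not run: 
prep_guess_encoding(x = "Gr\xfc\xdfe")       # guess from a string
prep_guess_encoding(file = "study_data.csv") # guess from a file

## End(Not run)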


Prepare a label as part of a link for RMD files

Description

Prepare a label as part of a link for RMD files

Usage

prep_link_escape(s, html = FALSE)

Arguments

s

the label

html

prepare the label for direct HTML output instead of RMD

Value

the escaped label
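Examples

A minimal sketch, re-using a label from the prep_clean_labels example:

## Not run: 
prep_link_escape("syst. Blood pressure (mmHg) 1")
prep_link_escape("syst. Blood pressure (mmHg) 1", html = TRUE)

## End(Not run)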


List Loaded Data Frames

Description

List Loaded Data Frames

Usage

prep_list_dataframes()

Value

names of all loaded data frames

See Also

Other data-frame-cache: prep_add_data_frames(), prep_get_data_frame(), prep_load_folder_with_metadata(), prep_load_workbook_like_file(), prep_purge_data_frame_cache(), prep_remove_from_cache()


All valid ⁠voc:⁠ vocabularies

Description

All valid ⁠voc:⁠ vocabularies

Usage

prep_list_voc()

Value

character() all ⁠voc:⁠ suffixes allowed for prep_get_data_frame().

Examples

## Not run: 
prep_list_dataframes()
prep_list_voc()
prep_get_data_frame("<ICD10>")
my_voc <-
  tibble::tribble(
    ~ voc, ~ url,
    "test", "data:datasets|iris|Species+Sepal.Length")
prep_add_data_frames(`<>` = my_voc)
prep_list_dataframes()
prep_list_voc()
prep_get_data_frame("<test>")
prep_get_data_frame("<ICD10>")
my_voc <-
  tibble::tribble(
    ~ voc, ~ url,
    "ICD10", "data:datasets|iris|Species+Sepal.Length")
prep_add_data_frames(`<>` = my_voc)
prep_list_dataframes()
prep_list_voc()
prep_get_data_frame("<ICD10>")

## End(Not run)


Pre-load a folder with one or (usually) more named tables

Description

These can thereafter be referred to by their names only. Such files are, e.g., spreadsheet-workbooks or RData-files.

Usage

prep_load_folder_with_metadata(folder, keep_types = FALSE, ...)

Arguments

folder

the folder name to load.

keep_types

logical keep types as possibly defined in the file. Set to TRUE for study data.

...

arguments passed to []

Details

Note that this function, in contrast to prep_get_data_frame, does not support selecting specific sheets/columns from a file.

Value

⁠invisible(the cache environment)⁠

See Also

prep_add_data_frames

prep_get_data_frame

Other data-frame-cache: prep_add_data_frames(), prep_get_data_frame(), prep_list_dataframes(), prep_load_workbook_like_file(), prep_purge_data_frame_cache(), prep_remove_from_cache()


Load a dq_report2

Description

Load a dq_report2

Usage

prep_load_report(file)

Arguments

file

character the file name to load from

Value

dataquieR_resultset2 the report


Load a report from a back-end

Description

Load a report from a back-end

Usage

prep_load_report_from_backend(
  namespace = "objects",
  db_dir,
  storr_factory = prep_create_storr_factory(namespace = namespace, db_dir = db_dir)
)

Arguments

namespace

the namespace to read the report's results from

db_dir

character path to the directory for the back-end, if a storr_rds or storr_torr is used.

storr_factory

a function returning a storr object holding the report

Value

dataquieR_resultset2 the report

See Also

prep_create_storr_factory()

Examples

## Not run: 
r <- dataquieR::dq_report2("study_data", meta_data_v2 = "meta_data_v2",
                           dimensions = NULL)
storr_factory <- prep_create_storr_factory()
r_storr <- prep_set_backend(r, storr_factory)
r_restorr <- prep_set_backend(r_storr, NULL)
r_loaded <- prep_load_report_from_backend(storr_factory = storr_factory)

## End(Not run)

Pre-load a file with one or (usually) more named tables

Description

These can thereafter be referred to by their names only. Such files are, e.g., spreadsheet-workbooks or RData-files.

Usage

prep_load_workbook_like_file(file, keep_types = FALSE)

Arguments

file

the file name to load.

keep_types

logical keep types as possibly defined in the file. Set to TRUE for study data.

Details

Note that this function, in contrast to prep_get_data_frame, does not support selecting specific sheets/columns from a file.

Value

⁠invisible(the cache environment)⁠

See Also

prep_add_data_frames

prep_get_data_frame

Other data-frame-cache: prep_add_data_frames(), prep_get_data_frame(), prep_list_dataframes(), prep_load_folder_with_metadata(), prep_purge_data_frame_cache(), prep_remove_from_cache()


Support function to allocate labels to variables

Description

Map variables to certain attributes, e.g. by default their labels.

Usage

prep_map_labels(
  x,
  item_level = "item_level",
  to = LABEL,
  from = VAR_NAMES,
  ifnotfound,
  warn_ambiguous = FALSE,
  meta_data_v2,
  meta_data = item_level
)

Arguments

x

character variable names, character vector, see parameter from

item_level

data.frame metadata data frame. If, as a dataquieR developer, you do not have item-level metadata, you should use util_map_labels instead to avoid consistency checks for item-level meta_data.

to

character variable attribute to map to

from

character variable identifier to map from

ifnotfound

list A list of values to be used if the item is not found: it will be coerced to a list if necessary.

warn_ambiguous

logical print a warning if mapping variables from from to to produces ambiguous identifiers.

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

meta_data

data.frame old name for item_level

Details

This function basically calls colnames(study_data) <- meta_data$LABEL, ensuring correct merging/joining of study data columns to the corresponding metadata rows, even if the orders differ. If a variable/study_data-column name is not found in meta_data[[from]] (default from = VAR_NAMES), either stop is called or, if ifnotfound has been assigned a value, that value is returned. See mget, which is internally used by this function.

The function not only maps to the LABEL column: to can be any metadata variable attribute, so the function can also be used to get, e.g., all HARD_LIMITS from the metadata.

Value

a character vector with:

Examples

## Not run: 
meta_data <- prep_create_meta(
  VAR_NAMES = c("ID", "SEX", "AGE", "DOE"),
  LABEL = c("Pseudo-ID", "Gender", "Age", "Examination Date"),
  DATA_TYPE = c(DATA_TYPES$INTEGER, DATA_TYPES$INTEGER, DATA_TYPES$INTEGER,
                 DATA_TYPES$DATETIME),
  MISSING_LIST = ""
)
stopifnot(all(prep_map_labels(c("AGE", "DOE"), meta_data) == c("Age",
                                                 "Examination Date")))

## End(Not run)

Merge a list of study data frames to one (sparse) study data frame

Description

Merge a list of study data frames to one (sparse) study data frame

Usage

prep_merge_study_data(study_data_list)

Arguments

study_data_list

list the list

Value

data.frame study_data
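Examples

A minimal sketch; the variable names are illustrative:

## Not run: 
sd_list <- list(
  data.frame(v00000 = 1:3, v00001 = c("a", "b", "c")),
  data.frame(v00000 = 4:5, v00002 = c(1.2, 3.4))
)
# columns missing in one of the inputs are filled sparsely with NA
merged <- prep_merge_study_data(sd_list)

## End(Not run)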


Convert item-level metadata from v1.0 to v2.0

Description

This function is idempotent.

Usage

prep_meta_data_v1_to_item_level_meta_data(
  item_level = "item_level",
  verbose = TRUE,
  label_col = LABEL,
  cause_label_df,
  meta_data = item_level
)

Arguments

item_level

data.frame the old item-level-metadata

verbose

logical display all estimated decisions, defaults to TRUE, except if called in a dq_report2 pipeline.

label_col

variable attribute the name of the column in the metadata with labels of variables

cause_label_df

data.frame missing code table, see cause_label_df. Optional. If this argument is given, you can add missing code tables.

meta_data

data.frame old name for item_level

Details

The option "dataquieR.force_item_specific_missing_codes" (default FALSE) tells the system to always fill in res_vars columns of the MISSING_LIST_TABLE, even if the column already exists but is empty.

Value

data.frame the updated metadata
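Examples

A minimal sketch, assuming v1.0-style metadata "meta_data" is available in the data frame cache:

## Not run: 
old_md <- prep_get_data_frame("meta_data")
new_md <- prep_meta_data_v1_to_item_level_meta_data(item_level = old_md)
# idempotent: converting the result again does not change it further
new_md2 <- prep_meta_data_v1_to_item_level_meta_data(item_level = new_md)

## End(Not run)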


Support function to identify the levels of a process variable with minimum number of observations

Description

utility function to subset data based on minimum number of observation per level

Usage

prep_min_obs_level(study_data, group_vars, min_obs_in_subgroup)

Arguments

study_data

data.frame the data frame that contains the measurements

group_vars

variable list the name of the grouping variable

min_obs_in_subgroup

integer optional argument if a "group_var" is used. This argument specifies the minimum no. of observations that is required to include a subgroup (level) of the "group_var" in the analysis. Subgroups with less observations are excluded. The default is 30.

Details

This function removes observations belonging to levels of a group variable with fewer than min_obs_in_subgroup observations, e.g., blood pressure measurements performed by an examiner who did fewer than, e.g., 50 measurements. It displays a warning, if samples/rows are removed, and returns the modified study data frame.

Value

a data frame with:
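Examples

A minimal sketch; the grouping variable name is illustrative:

## Not run: 
study_data <- prep_get_data_frame("study_data")
# keep only levels of the grouping variable with at least 50 observations
reduced <- prep_min_obs_level(study_data,
                              group_vars = "v00016", # illustrative
                              min_obs_in_subgroup = 50)

## End(Not run)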


Open a data frame in Excel

Description

Open a data frame in Excel

Usage

prep_open_in_excel(dfr)

Arguments

dfr

the data frame

Details

if the file cannot be read on function exit, NULL will be returned

Value

potentially modified data frame after dialog was closed


Support function for a parallel pmap

Description

parallel version of purrr::pmap

Usage

prep_pmap(.l, .f, ..., cores = 0)

Arguments

.l

data.frame with one call per line and one function argument per column

.f

function to call with the arguments from .l

...

additional, static arguments for calling .f

cores

number of cpu cores to use or a (named) list with arguments for parallelMap::parallelStart or NULL, if parallel has already been started by the caller. Set to 0 to run without parallelization.

Value

list of results of the function calls

Author(s)

Aurèle

S Struckmann

See Also

purrr::pmap

Stack Overflow post
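Examples

A minimal sketch:

## Not run: 
# one call per row: .f(x = 1, y = 4, offset = 10), ...
prep_pmap(
  .l = data.frame(x = 1:3, y = 4:6),
  .f = function(x, y, offset) x + y + offset,
  offset = 10, # static argument passed via ...
  cores = 0    # 0 = no parallelization
)

## End(Not run)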


Prepare and verify study data with metadata

Description

This function ensures that a data frame ds1 with suitable variable names as well as study_data and meta_data exist as base data.frames.

Usage

prep_prepare_dataframes(
  .study_data,
  .meta_data,
  .label_col,
  .replace_hard_limits,
  .replace_missings,
  .sm_code = NULL,
  .allow_empty = FALSE,
  .adjust_data_type = TRUE,
  .amend_scale_level = TRUE,
  .apply_factor_metadata = FALSE,
  .apply_factor_metadata_inadm = FALSE,
  .internal = rlang::env_inherits(rlang::caller_env(), parent.env(environment()))
)

Arguments

.study_data

if provided, use this data set as study_data

.meta_data

if provided, use this data set as meta_data

.label_col

if provided, use this as label_col

.replace_hard_limits

replace HARD_LIMIT violations by NA, defaults to FALSE.

.replace_missings

replace missing codes, defaults to TRUE

.sm_code

missing code for NAs, if they have been re-coded by util_combine_missing_lists

.allow_empty

allow ds1 to be empty, i.e., 0 rows and/or 0 columns

.adjust_data_type

ensure that the data type of variables in the study data corresponds to their data type specified in the metadata

.amend_scale_level

ensure that SCALE_LEVEL is available in the item-level meta_data. internally used to prevent recursion, if called from prep_scalelevel_from_data_and_metadata().

.apply_factor_metadata

logical convert categorical variables to labeled factors.

.apply_factor_metadata_inadm

logical convert categorical variables to labeled factors keeping inadmissible values. Implies, that .apply_factor_metadata will be set to TRUE, too.

.internal

logical internally called, modify caller's environment.

Details

This function defines ds1 and modifies study_data and meta_data in the environment of its caller (see eval.parent). It also defines or modifies the object label_col in the calling environment. Almost all functions exported by dataquieR call this function initially, so that aspects common to all functions live here, e.g., testing whether an argument meta_data has been given and really is a data.frame. It verifies the existence of required metadata attributes (VARATT_REQUIRE_LEVELS). It can also replace missing codes by NAs, and it calls prep_study2meta to generate a minimum set of metadata from the study data on the fly (this should be amended, so on-the-fly calling is not recommended for an instructive use of dataquieR).

The function also detects tibbles, which are then converted to base-R data.frames, which are expected by dataquieR.

If .internal is TRUE, differently from the other utility functions that work in their caller's environment, this function modifies objects in the calling function's environment: it defines a new object ds1, and it modifies study_data and/or meta_data and label_col.

Value

ds1 the study data with mapped column names

See Also

acc_margins

Examples

## Not run: 
acc_test1 <- function(resp_variable, aux_variable,
                      time_variable, co_variables,
                      group_vars, study_data, meta_data) {
  prep_prepare_dataframes()
  invisible(ds1)
}
acc_test2 <- function(resp_variable, aux_variable,
                      time_variable, co_variables,
                      group_vars, study_data, meta_data, label_col) {
  ds1 <- prep_prepare_dataframes(study_data, meta_data)
  invisible(ds1)
}
environment(acc_test1) <- asNamespace("dataquieR")
# perform this inside the package (not needed for functions that have been
# integrated with the package already)

environment(acc_test2) <- asNamespace("dataquieR")
# perform this inside the package (not needed for functions that have been
# integrated with the package already)
acc_test3 <- function(resp_variable, aux_variable, time_variable,
                      co_variables, group_vars, study_data, meta_data,
                      label_col) {
  prep_prepare_dataframes()
  invisible(ds1)
}
acc_test4 <- function(resp_variable, aux_variable, time_variable,
                      co_variables, group_vars, study_data, meta_data,
                      label_col) {
  ds1 <- prep_prepare_dataframes(study_data, meta_data)
  invisible(ds1)
}
environment(acc_test3) <- asNamespace("dataquieR")
# perform this inside the package (not needed for functions that have been
# integrated with the package already)

environment(acc_test4) <- asNamespace("dataquieR")
# perform this inside the package (not needed for functions that have been
# integrated with the package already)
meta_data <- prep_get_data_frame("meta_data")
study_data <- prep_get_data_frame("study_data")
try(acc_test1())
try(acc_test2())
acc_test1(study_data = study_data)
try(acc_test1(meta_data = meta_data))
try(acc_test2(study_data = 12, meta_data = meta_data))
print(head(acc_test1(study_data = study_data, meta_data = meta_data)))
print(head(acc_test2(study_data = study_data, meta_data = meta_data)))
print(head(acc_test3(study_data = study_data, meta_data = meta_data)))
print(head(acc_test3(study_data = study_data, meta_data = meta_data,
  label_col = LABEL)))
print(head(acc_test4(study_data = study_data, meta_data = meta_data)))
print(head(acc_test4(study_data = study_data, meta_data = meta_data,
  label_col = LABEL)))
try(acc_test2(study_data = NULL, meta_data = meta_data))

## End(Not run)


Clear data frame cache

Description

Clear data frame cache

Usage

prep_purge_data_frame_cache()

Value

nothing

See Also

Other data-frame-cache: prep_add_data_frames(), prep_get_data_frame(), prep_list_dataframes(), prep_load_folder_with_metadata(), prep_load_workbook_like_file(), prep_remove_from_cache()


Remove a specified element from the data frame cache

Description

Remove a specified element from the data frame cache

Usage

prep_remove_from_cache(object_to_remove)

Arguments

object_to_remove

character the name of the object to remove (as a quoted character string), or a character vector with the names of several objects to remove from the cache

Value

nothing

See Also

Other data-frame-cache: prep_add_data_frames(), prep_get_data_frame(), prep_list_dataframes(), prep_load_folder_with_metadata(), prep_load_workbook_like_file(), prep_purge_data_frame_cache()

Examples

## Not run: 
prep_load_workbook_like_file("meta_data_v2") #load metadata in the cache
ls(.dataframe_environment()) #get the list of dataframes in the cache

#remove cross-item_level from the cache
prep_remove_from_cache("cross-item_level")

#remove dataframe_level and expected_id from the cache
prep_remove_from_cache(c("dataframe_level", "expected_id"))

#remove missing_table and segment_level from the cache
x <- c("missing_table", "segment_level")
prep_remove_from_cache(x)

## End(Not run)


Create a ggplot2 pie chart

Description

Create a ggplot2 pie chart

Usage

prep_render_pie_chart_from_summaryclasses_ggplot2(
  data,
  meta_data = "item_level"
)

Arguments

data

data as returned by prep_summary_to_classes but summarized by one column (currently, we support indicator_metric, STUDY_SEGMENT, and VAR_NAMES)

meta_data

meta_data

Value

a ggplot2::ggplot2 plot

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Create a plotly pie chart

Description

Create a plotly pie chart

Usage

prep_render_pie_chart_from_summaryclasses_plotly(
  data,
  meta_data = "item_level"
)

Arguments

data

data as returned by prep_summary_to_classes but summarized by one column (currently, we support indicator_metric, call_names, STUDY_SEGMENT, and VAR_NAMES)

meta_data

meta_data

Value

a htmltools compatible object

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Guess the data type of a vector

Description

Guess the data type of a vector

Usage

prep_robust_guess_data_type(x, k = 50, it = 200)

Arguments

x

a vector with characters

k

numeric sample size per iteration; limited to at most ⁠floor(length(x) / (it/20))⁠, with a minimum sample size of 1.

it

integer number of iterations when taking samples

Value

a guess of the data type of x. An attribute orig_type is also attached to give the more detailed guess returned by readr::guess_parser().

Algorithm

This function takes x and tries to guess the data type of random subsets of this vector using readr::guess_parser(). The RNG is initialized with a constant, so the function stays deterministic. It performs such sub-sample-based checks it times; the majority among the detected data types determines the guessed data type.
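
The majority-vote scheme can be sketched in base R; guess_one() below is a toy stand-in for readr::guess_parser(), and the names and constants are illustrative only:

```r
# Toy sketch of the majority-vote type guessing described above.
guess_one <- function(v) {
  # stand-in for readr::guess_parser(): numeric-looking or not
  if (all(!is.na(suppressWarnings(as.numeric(v))))) "double" else "character"
}
robust_guess <- function(x, k = 50, it = 200) {
  set.seed(42)                               # constant seed keeps it deterministic
  k <- max(1, min(k, floor(length(x) / (it / 20))))
  guesses <- replicate(it, guess_one(sample(x, k, replace = TRUE)))
  names(which.max(table(guesses)))           # majority vote over the it sub-samples
}
robust_guess(as.character(1:100))
```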


Save a dq_report2

Description

Save a dq_report2

Usage

prep_save_report(report, file, compression_level = 3)

Arguments

report

dataquieR_resultset2 the report

file

character the file name to write to

compression_level

integer from=0 to=9. Compression level. 9 is very slow.

Value

invisible(NULL)


Heuristics to amend a SCALE_LEVEL column and a UNIT column in the metadata

Description

...if missing

Usage

prep_scalelevel_from_data_and_metadata(
  resp_vars = lifecycle::deprecated(),
  study_data,
  item_level = "item_level",
  label_col = LABEL,
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable list deprecated, the function always addresses all variables.

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Value

data.frame modified metadata

Examples

## Not run: 
  prep_load_workbook_like_file("meta_data_v2")
  prep_scalelevel_from_data_and_metadata(study_data = "study_data")

## End(Not run)

Change the back-end of a report

Description

with this function, you can move a report from/to a storr storage.

Usage

prep_set_backend(r, storr_factory = NULL, amend = FALSE)

Arguments

r

dataquieR_resultset2 the report

storr_factory

storr the storr storage or NULL, to move the report fully back into the RAM.

amend

logical if there is already data in storr_factory, use it anyway – unsupported, so far!

Value

dataquieR_resultset2 but now with the desired back-end


Guess a metadata data frame from study data.

Description

Guess a minimum metadata data frame from study data. Minimum required variable attributes are:

Usage

prep_study2meta(
  study_data,
  level = c(VARATT_REQUIRE_LEVELS$REQUIRED, VARATT_REQUIRE_LEVELS$RECOMMENDED),
  cumulative = TRUE,
  convert_factors = FALSE,
  guess_missing_codes = getOption("dataquieR.guess_missing_codes",
    dataquieR.guess_missing_codes_default)
)

Arguments

study_data

data.frame the data frame that contains the measurements

level

enum levels to provide (see also VARATT_REQUIRE_LEVELS)

cumulative

logical include attributes of all levels up to level

convert_factors

logical convert factor columns to coded integers. If selected, the study data will also be updated and returned.

guess_missing_codes

logical try to guess missing codes from the data

Details

dataquieR:::util_get_var_att_names_of_level(VARATT_REQUIRE_LEVELS$REQUIRED)
#>            VAR_NAMES            DATA_TYPE   MISSING_LIST_TABLE 
#>          "VAR_NAMES"          "DATA_TYPE" "MISSING_LIST_TABLE"

The function also tries to detect missing codes.
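
A hand-rolled sketch of this guessing step (guess_type() and minimal_meta() are hypothetical helpers, not package functions; the DATA_TYPE labels follow dataquieR's conventions):

```r
# Illustrative sketch of deriving VAR_NAMES and DATA_TYPE from study data,
# roughly what prep_study2meta() does for the REQUIRED level.
guess_type <- function(col) {
  if (inherits(col, "Date") || inherits(col, "POSIXt")) "datetime"
  else if (is.integer(col)) "integer"
  else if (is.numeric(col)) "float"
  else "string"
}
minimal_meta <- function(study_data) {
  data.frame(VAR_NAMES = colnames(study_data),
             DATA_TYPE = unname(vapply(study_data, guess_type, character(1))),
             row.names = NULL)
}
minimal_meta(data.frame(id = 1:3, bmi = c(21.2, 25.0, NA), sex = c("f", "m", "f")))
```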

Value

a meta_data data frame or a list with study data and metadata, if convert_factors == TRUE.

Examples

## Not run: 
dataquieR::prep_study2meta(Orange, convert_factors = FALSE)

## End(Not run)

Classify metrics from a report summary table

Description

Classify metrics from a report summary table

Usage

prep_summary_to_classes(report_summary)

Arguments

report_summary

list() as returned by prep_extract_summary()

Value

data.frame classes for the report summary table, long format

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Prepare a label as part of a title text for RMD files

Description

Prepare a label as part of a title text for RMD files

Usage

prep_title_escape(s, html = FALSE)

Arguments

s

the label

html

prepare the label for direct HTML output instead of RMD

Value

the escaped label


Remove data disclosing details

Description

new function: no warranty, so far.

Usage

prep_undisclose(x)

Arguments

x

an object to un-disclose

Value

undisclosed object


Combine all missing and value lists to one big table

Description

Combine all missing and value lists to one big table

Usage

prep_unsplit_val_tabs(meta_data = "item_level", val_tab = NULL)

Arguments

meta_data

data.frame item level meta data to be used, defaults to "item_level"

val_tab

character name of the table being created: This table will be added to the data frame cache (or overwritten). If NULL, the table will only be returned

Value

data.frame the combined table


Get value labels from data

Description

Detects factors and converts them to compatible metadata/study data.

Usage

prep_valuelabels_from_data(resp_vars = colnames(study_data), study_data)

Arguments

resp_vars

variable list the names of the variables for which to fetch the value labels from the data

study_data

data.frame the data frame that contains the measurements

Value

a list with:

Examples

## Not run: 
dataquieR::prep_valuelabels_from_data(study_data = iris)

## End(Not run)

Print a DataSlot object

Description

Print a DataSlot object

Usage

## S3 method for class 'DataSlot'
print(x, ...)

Arguments

x

the object

...

not used

Value

see print


print implementation for the class ReportSummaryTable

Description

Use this function to print results objects of the class ReportSummaryTable.

Usage

## S3 method for class 'ReportSummaryTable'
print(
  x,
  relative = lifecycle::deprecated(),
  dt = FALSE,
  fillContainer = FALSE,
  displayValues = FALSE,
  view = TRUE,
  ...,
  flip_mode = "auto"
)

Arguments

x

ReportSummaryTable objects to print

relative

deprecated

dt

logical use DT::datatables, if installed

fillContainer

logical if dt is TRUE, control table size, see DT::datatables.

displayValues

logical if dt is TRUE, also display the actual values

view

logical if view is FALSE, do not print the output but only return it

...

not used, yet

flip_mode

enum default | flip | noflip | auto. Should the plot be in default orientation, flipped, not flipped or auto-flipped. Not all options are always supported. In general, this can be controlled by setting the R option options(dataquieR.flip_mode = ...). If called from dq_report, you can also pass flip_mode to all function calls or set them specifically using specific_args.

Value

the printed object

See Also

base::print


Print a Slot object

Description

Displays all warnings and messages, then prints x.

Usage

## S3 method for class 'Slot'
print(x, ...)

Arguments

x

the object

...

not used

Value

calls the next print method


Print a StudyDataSlot object

Description

Print a StudyDataSlot object

Usage

## S3 method for class 'StudyDataSlot'
print(x, ...)

Arguments

x

the object

...

not used

Value

see print


Print a TableSlot object

Description

Print a TableSlot object

Usage

## S3 method for class 'TableSlot'
print(x, ...)

Arguments

x

the object

...

not used

Value

see print


Print a dataquieR result returned by dq_report2

Description

Print a dataquieR result returned by dq_report2

Usage

## S3 method for class 'dataquieR_result'
print(x, ...)

Arguments

x

list a dataquieR result from dq_report2 or util_eval_to_dataquieR_result

...

passed to print. Additionally, the argument slot may be passed to print only specific sub-results.

Value

see print

See Also

util_pretty_print()


Generate a RMarkdown-based report from a dataquieR report

Description

Generate a RMarkdown-based report from a dataquieR report

Usage

## S3 method for class 'dataquieR_resultset'
print(...)

Arguments

...

deprecated

Value

deprecated


Generate a HTML-based report from a dataquieR report

Description

Generate a HTML-based report from a dataquieR report

Usage

## S3 method for class 'dataquieR_resultset2'
print(
  x,
  dir,
  view = TRUE,
  disable_plotly = FALSE,
  block_load_factor = 4,
  advanced_options = list(),
  dashboard = NA,
  ...
)

Arguments

x

dataquieR report v2.

dir

character directory to store the rendered report's files; a temporary one, if omitted. The directory will be created if missing; files inside that directory may be overwritten.

view

logical display the report

disable_plotly

logical do not use plotly, even if installed

block_load_factor

numeric multiply size of parallel compute blocks by this factor.

advanced_options

list options to set during report computation, see options()

dashboard

logical dashboard mode: TRUE: create a dashboard only, FALSE: don't create a dashboard at all, NA or missing: create a "normal" report with a dashboard included.

...

additional arguments:

Value

file names of the generated report's HTML files


Print a dataquieR summary

Description

Print a dataquieR summary

Usage

## S3 method for class 'dataquieR_summary'
print(
  x,
  ...,
  grouped_by = c("call_names", "indicator_metric"),
  dont_print = FALSE,
  folder_of_report = NULL
)

Arguments

x

the dataquieR summary, see summary() and dq_report2()

...

not yet used

grouped_by

define the columns of the resulting matrix. It can be either "call_names", one column per function, or "indicator_metric", one column per indicator or both c("call_names", "indicator_metric"). The last combination is the default

dont_print

suppress the actual printing, just return a printable object derived from x

folder_of_report

a named vector with the location of variable and call_names

Value

invisible html object


print implementation for the class interval

Description

Such objects, for now, only occur in REDCap rules, so this function is mostly meant for internal use – for now.

Usage

## S3 method for class 'interval'
print(x, ...)

Arguments

x

interval objects to print

...

not used yet

Value

the printed object

See Also

base::print


print a list of dataquieR_result objects

Description

print a list of dataquieR_result objects

Usage

## S3 method for class 'list'
print(x, ...)

Arguments

x

list() this implementation runs only if all elements inherit from dataquieR_result

...

passed to other implementations

Value

undefined


Print a master_result object

Description

Print a master_result object

Usage

## S3 method for class 'master_result'
print(x, ...)

Arguments

x

the object

...

not used

Value

invisible(NULL)


Check applicability of DQ functions on study data

Description

Checks applicability of DQ functions based on study data and metadata characteristics

Usage

pro_applicability_matrix(
  study_data,
  item_level = "item_level",
  split_segments = FALSE,
  label_col,
  max_vars_per_plot = 20,
  meta_data_segment,
  meta_data_dataframe,
  flip_mode = "noflip",
  meta_data_v2,
  meta_data = item_level,
  segment_level,
  dataframe_level
)

Arguments

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

split_segments

logical return one matrix per study segment

label_col

variable attribute the name of the column in the metadata with labels of variables

max_vars_per_plot

integer from=0. The maximum number of variables per single plot.

meta_data_segment

data.frame – optional: Segment level metadata

meta_data_dataframe

data.frame – optional: Data frame level metadata

flip_mode

enum default | flip | noflip | auto. Should the plot be in default orientation, flipped, not flipped or auto-flipped. Not all options are always supported. In general, this can be controlled by setting the R option options(dataquieR.flip_mode = ...). If called from dq_report, you can also pass flip_mode to all function calls or set them specifically using specific_args.

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

meta_data

data.frame old name for item_level

segment_level

data.frame alias for meta_data_segment

dataframe_level

data.frame alias for meta_data_dataframe

Details

This is a preparatory support function that compares study data with associated metadata. A prerequisite of this function is that the number of columns in the study data matches the number of rows in the metadata.

For each existing R-implementation, the function searches for necessary static metadata and returns a heatmap like matrix indicating the applicability of each data quality implementation.

In addition, the data type defined in the metadata is compared with the observed data type in the study data.

Value

a list with:


Combine ReportSummaryTable outputs

Description

Using this rbind implementation, you can combine different heatmap-like results of the class ReportSummaryTable.

Usage

## S3 method for class 'ReportSummaryTable'
rbind(...)

Arguments

...

ReportSummaryTable objects to combine.

See Also

base::rbind.data.frame


Return names of result slots (e.g., 3rd dimension of dataquieR results)

Description

Return names of result slots (e.g., 3rd dimension of dataquieR results)

Usage

resnames(x)

Arguments

x

the objects

Value

character vector with names


Return names of result slots (e.g., 3rd dimension of dataquieR results)

Description

Return names of result slots (e.g., 3rd dimension of dataquieR results)

Usage

## S3 method for class 'dataquieR_resultset2'
resnames(x)

Arguments

x

the objects

Value

character vector with names


Data frame with the study data whose quality is being assessed

Description

Study data is expected in wide format. It should contain all variables for all segments in one large table, even if some variables are not measured for all observational units (study participants).


Summarize a dataquieR report

Description

Deprecated

Usage

## S3 method for class 'dataquieR_resultset'
summary(...)

Arguments

...

Deprecated

Value

Deprecated


Generate a report summary table

Description

Generate a report summary table

Usage

## S3 method for class 'dataquieR_resultset2'
summary(
  object,
  aspect = c("applicability", "error", "anamat", "indicator_or_descriptor"),
  FUN,
  collapse = "\n<br />\n",
  ...
)

Arguments

object

a square result set

aspect

an aspect/problem category of results

FUN

function to apply to the cells of the result table

collapse

passed to FUN

...

not used

Value

a summary of a dataquieR report

Examples

## Not run: 
  util_html_table(summary(report),
       filter = "top", options = list(scrollCollapse = TRUE, scrollY = "75vh"),
       is_matrix_table = TRUE, rotate_headers = TRUE, output_format = "HTML"
  )

## End(Not run)

Utility function for 3SD deviations rule

Description

This function calculates outliers according to the rule of 3SD deviations.

Usage

util_3SD(x)

Arguments

x

numeric data to check for outliers

Value

binary vector

See Also

Other outlier_functions: util_hubert(), util_sigmagap(), util_tukey()
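
The rule itself is easy to sketch (assumption: plain mean ± 3·SD flagging; util_3SD() may differ in details such as NA handling):

```r
# Flag values farther than 3 standard deviations from the mean.
three_sd_outliers <- function(x) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  as.integer(abs(x - m) > 3 * s)  # binary vector: 1 = outlier, 0 = inlier
}
three_sd_outliers(c(rep(0, 99), 100))  # flags only the extreme value
```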


Abbreviate snake_case function names to shortened CamelCase

Description

Abbreviate snake_case function names to shortened CamelCase

Usage

util_abbreviate(x)

Arguments

x

a vector of indicator function names

Value

abbreviations

See Also

base::abbreviate

Other process_functions: util_all_is_integer(), util_attach_attr(), util_bQuote(), util_backtickQuote(), util_coord_flip(), util_extract_matches(), util_par_pmap(), util_setup_rstudio_job(), util_suppress_output()


Abbreviate a vector of strings

Description

Abbreviate a vector of strings

Usage

util_abbreviate_unique(initial, max_value_label_len)

Arguments

initial

character vector with stuff to abbreviate

max_value_label_len

integer maximum length (may not be strictly met if that would break the uniqueness detected in initial)

Value

character uniquely abbreviated initial

See Also

Other string_functions: util_filter_names_by_regexps(), util_pretty_vector_string(), util_set_dQuoteString(), util_set_sQuoteString(), util_sub_string_left_from_.(), util_sub_string_right_from_.(), util_translate()


Utility function for smoothed longitudinal trends from logistic regression models

Description

This function is under development. It computes a logistic regression for binary variables and visualizes smoothed time trends of the residuals by LOESS or GAM. The function can also be called for non-binary outcome variables. These will be transformed to binary variables, either using user-specified groups in the metadata columns RECODE_CASES and/or RECODE_CONTROL (see util_dichotomize), or the function will attempt to recode the variables automatically: For nominal variables with more than two categories, it will consider the most frequent category as 'cases' and every other category as 'control'. Nominal variables with only two distinct values will be transformed by assigning the less frequent category to 'cases' and the more frequent category to 'control'. For variables of other statistical data types, values inside the interquartile range are considered as 'control', values outside this range as 'cases'. Variables with few distinct values are transformed in a simplified way to obtain two groups.
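
The IQR-based recoding for continuous variables can be sketched as follows (a simplified illustration of the rule described above; iqr_dichotomize() is a hypothetical helper, not util_dichotomize() itself):

```r
# Values outside the interquartile range become 'cases' (1),
# values inside it 'control' (0).
iqr_dichotomize <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  as.integer(x < q[1] | x > q[2])  # 1 = case, 0 = control
}
table(iqr_dichotomize(1:100))
```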

Usage

util_acc_loess_bin(
  resp_vars,
  label_col = NULL,
  study_data,
  item_level = "item_level",
  group_vars = NULL,
  time_vars,
  co_vars = NULL,
  min_obs_in_subgroup = 30,
  resolution = 80,
  plot_format = getOption("dataquieR.acc_loess.plot_format",
    dataquieR.acc_loess.plot_format_default),
  meta_data = item_level,
  n_group_max = getOption("dataquieR.max_group_var_levels_in_plot",
    dataquieR.max_group_var_levels_in_plot_default),
  enable_GAM = getOption("dataquieR.GAM_for_LOESS", dataquieR.GAM_for_LOESS.default),
  exclude_constant_subgroups =
    getOption("dataquieR.acc_loess.exclude_constant_subgroups",
    dataquieR.acc_loess.exclude_constant_subgroups.default),
  min_bandwidth = getOption("dataquieR.acc_loess.min_bw",
    dataquieR.acc_loess.min_bw.default),
  min_proportion = getOption("dataquieR.acc_loess.min_proportion",
    dataquieR.acc_loess.min_proportion.default)
)

Arguments

resp_vars

variable the name of the (binary) measurement variable

label_col

variable attribute the name of the column in the metadata with labels of variables

study_data

data.frame the data frame that contains the measurements

group_vars

variable the name of the observer, device or reader variable

time_vars

variable the name of the variable giving the time of measurement

co_vars

variable list a vector of co-variables, e.g. age and sex for adjustment

min_obs_in_subgroup

integer from=0. This optional argument specifies the minimum number of observations that is required to include a subgroup (level) of the group_var in the analysis. Subgroups with less observations are excluded.

resolution

integer the maximum number of time points used for plotting the trend lines

plot_format

enum AUTO | COMBINED | FACETS | BOTH. Return the plot as one combined plot for all groups or as facet plots (one figure per group). BOTH will return both variants, AUTO will decide based on the number of observers.

meta_data

data.frame the data frame that contains metadata attributes of study data

n_group_max

integer maximum number of categories to be displayed individually for the grouping variable (group_vars, devices / examiners)

enable_GAM

logical Can LOESS computations be replaced by generalized additive models to reduce memory consumption for large datasets?

exclude_constant_subgroups

logical Should subgroups with constant values be excluded?

min_bandwidth

numeric lower limit for the LOESS bandwidth, should be greater than 0 and less than or equal to 1. In general, increasing the bandwidth leads to a smoother trend line.

min_proportion

numeric lower limit for the proportion of the smaller group (cases or controls) for creating a LOESS figure, should be greater than 0 and less than 0.4.

Details

Descriptor

Value

a list with:


Utility function that smooths and plots adjusted longitudinal measurements

Description

The following R implementation executes calculations for the quality indicator "Unexpected location" (see here). Local regression (LOESS) is a versatile statistical method to explore an averaged course of time series measurements (Cleveland, Devlin, and Grosse 1988). In the context of epidemiological data, repeated measurements using the same measurement device or by the same examiner can be considered a time series. LOESS makes it possible to explore changes in these measurements over time.

Descriptor

Usage

util_acc_loess_continuous(
  resp_vars,
  label_col = NULL,
  study_data,
  item_level = "item_level",
  group_vars = NULL,
  time_vars,
  co_vars = NULL,
  min_obs_in_subgroup = 30,
  resolution = 80,
  comparison_lines = list(type = c("mean/sd", "quartiles"), color = "grey30", linetype =
    2, sd_factor = 0.5),
  mark_time_points = getOption("dataquieR.acc_loess.mark_time_points",
    dataquieR.acc_loess.mark_time_points_default),
  plot_observations = getOption("dataquieR.acc_loess.plot_observations",
    dataquieR.acc_loess.plot_observations_default),
  plot_format = getOption("dataquieR.acc_loess.plot_format",
    dataquieR.acc_loess.plot_format_default),
  meta_data = item_level,
  n_group_max = getOption("dataquieR.max_group_var_levels_in_plot",
    dataquieR.max_group_var_levels_in_plot_default),
  enable_GAM = getOption("dataquieR.GAM_for_LOESS", dataquieR.GAM_for_LOESS.default),
  exclude_constant_subgroups =
    getOption("dataquieR.acc_loess.exclude_constant_subgroups",
    dataquieR.acc_loess.exclude_constant_subgroups.default),
  min_bandwidth = getOption("dataquieR.acc_loess.min_bw",
    dataquieR.acc_loess.min_bw.default)
)

Arguments

resp_vars

variable the name of the continuous (or binary) measurement variable

label_col

variable attribute the name of the column in the metadata with labels of variables

study_data

data.frame the data frame that contains the measurements

group_vars

variable the name of the observer, device or reader variable

time_vars

variable the name of the variable giving the time of measurement

co_vars

variable list a vector of co-variables for adjustment, for example age and sex. Can be NULL (default) for no adjustment.

min_obs_in_subgroup

integer (optional argument) If group_vars is specified, this argument can be used to specify the minimum number of observations required for each of the subgroups. Subgroups with fewer observations are excluded. The default number is 30.

resolution

integer the maximum number of time points used for plotting the trend lines

comparison_lines

list type and style of lines with which trend lines are to be compared. Can be mean +/- 0.5 standard deviation (the factor can be specified differently in sd_factor) or quartiles (Q1, Q2, and Q3). Arguments color and linetype are passed to ggplot2::geom_line().

mark_time_points

logical mark time points with observations (caution, there may be many marks)

plot_observations

logical show observations as scatter plot in the background. If there are co_vars specified, the values of the observations in the plot will also be adjusted for the specified covariables.

plot_format

enum AUTO | COMBINED | FACETS | BOTH. Return the plot as one combined plot for all groups or as facet plots (one figure per group). BOTH will return both variants, AUTO will decide based on the number of observers.

meta_data

data.frame the data frame that contains metadata attributes of study data

n_group_max

integer maximum number of categories to be displayed individually for the grouping variable (group_vars, devices / examiners)

enable_GAM

logical Can LOESS computations be replaced by generalized additive models to reduce memory consumption for large datasets?

exclude_constant_subgroups

logical Should subgroups with constant values be excluded?

min_bandwidth

numeric lower limit for the LOESS bandwidth, should be greater than 0 and less than or equal to 1. In general, increasing the bandwidth leads to a smoother trend line.

Details

If mark_time_points or plot_observations is selected, but would result in plotting more than 400 points, only a sample of the data will be displayed.

Limitations

The application of LOESS requires model fitting, i.e. the smoothness of a model is subject to a smoothing parameter (span). Particularly in the presence of interval-based missing data, high variability of measurements combined with a low number of observations in one level of the group_vars may distort the fit. Since our approach handles data without knowledge of such underlying characteristics, finding the best fit is complicated if computational costs should be minimal. The default of LOESS in R uses a span of 0.75, which provides in most cases reasonable fits. The function util_acc_loess_continuous adapts the span for each level of the group_vars (with at least as many observations as specified in min_obs_in_subgroup and with at least three time points) based on the respective number of observations. LOESS consumes a lot of memory for larger datasets. That is why util_acc_loess_continuous switches to a generalized additive model with integrated smoothness estimation (gam by mgcv) if there are 1000 observations or more for at least one level of the group_vars (similar to geom_smooth from ggplot2).
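
A base-R sketch of per-group LOESS trends with a sample-size-dependent span (the span formula below is purely illustrative; util_acc_loess_continuous() uses its own internal adaptation):

```r
# Fit one LOESS trend per group level, widening the span for small groups.
set.seed(1)
d <- data.frame(t = rep(1:100, 2),
                g = rep(c("obs1", "obs2"), each = 100),
                y = c(rnorm(100, mean = 0), rnorm(100, mean = 0.5)))
fit_group <- function(sub) {
  # fewer observations -> larger span (smoother fit); formula is illustrative
  span <- max(0.3, min(1, 50 / nrow(sub)))
  predict(loess(y ~ t, data = sub, span = span))
}
trends <- lapply(split(d, d$g), fit_group)
lengths(trends)  # one fitted trend value per time point and group
```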

Value

a list with:

See Also

Online Documentation


Estimates variance components

Description

Variance based models and intraclass correlations (ICC) are approaches to examine the impact of so-called process variables on the measurements. This implementation is model-based.

NB: The term ICC is frequently used to describe the agreement between different observers, examiners or even devices. In respective settings a good agreement is pursued. ICC-values can vary between ⁠[-1;1]⁠ and an ICC close to 1 is desired (Koo and Li 2016, Müller and Büttner 1994).

However, in multi-level analysis the ICC is interpreted differently; see Snijders and Bosker (1999). In this context, the proportion of variance explained by the respective group levels indicates an influence of (at least one) level of the respective group_vars. An ICC close to 0 is desired.
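The multi-level reading of the ICC can be illustrated with a simplified one-way ANOVA decomposition in base R. Note that util_acc_varcomp itself is model-based (mixed models, with optional adjustment for co_vars), so treat this as an illustration of the quantity only:

```r
# 10 "examiners" (group levels) with 40 observations each
set.seed(2)
g <- factor(rep(1:10, each = 40))
y <- rnorm(10, sd = 2)[g] + rnorm(400, sd = 8)  # small between-group variance

ms <- summary(aov(y ~ g))[[1]][["Mean Sq"]]
k  <- 40                                        # balanced group size
# ICC(1): share of total variance attributable to the grouping
icc <- (ms[1] - ms[2]) / (ms[1] + (k - 1) * ms[2])
icc  # close to 0: little influence of the examiner levels
```

An ICC near 1 would instead signal that measurements depend strongly on the observer, device, or reader.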

Usage

util_acc_varcomp(
  resp_vars = NULL,
  label_col = NULL,
  study_data,
  item_level = "item_level",
  group_vars,
  co_vars = NULL,
  min_obs_in_subgroup = 30,
  min_subgroups = 5,
  meta_data = item_level,
  meta_data_v2
)

Arguments

resp_vars

variable list the names of the continuous measurement variables

label_col

variable attribute the name of the column in the metadata with labels of variables

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

group_vars

variable list the names of the resp. observer, device or reader variables

co_vars

variable list a vector of covariables, e.g. age and sex for adjustment

min_obs_in_subgroup

integer from=0. optional argument if a "group_var" is used. This argument specifies the minimum number of observations required to include a subgroup (level) of the "group_var" in the analysis. Subgroups with fewer observations are excluded. The default is 30.

min_subgroups

integer from=0. optional argument if a "group_var" is used. This argument specifies the minimum number of subgroups (levels) of the "group_var". If the variable defined in "group_var" has fewer subgroups, it is not used for the analysis. The default is 5.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

Value

a list with:

ALGORITHM OF THIS IMPLEMENTATION:

See Also

Online Documentation


Adjust the data types of study data, if needed

Description

Adjust the data types of study data, if needed

Usage

util_adjust_data_type(study_data, meta_data, relevant_vars_for_warnings)

Arguments

study_data

data.frame the study data

meta_data

data.frame the data frame that contains metadata attributes of the study data

relevant_vars_for_warnings

character VAR_NAMES relevant for warnings about conversion errors

Value

data.frame modified study data


Place all geom_text labels to the right of their x position, also in plotly

Description

Place all geom_text labels to the right of their x position, also in plotly

Usage

util_adjust_geom_text_for_plotly(plotly)

Arguments

plotly

the plotly

Value

modified plotly-built object


Create a caption from an alias name of a dq_report2 result

Description

Create a caption from an alias name of a dq_report2 result

Usage

util_alias2caption(alias, long = FALSE)

Arguments

alias

alias name

long

logical if TRUE, return the result based on menu_title_report; otherwise, based on matrix_column_title_report

Value

caption

See Also

util_html_table

Other reporting_functions: util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()


All indicator functions of dataquieR

Description

All indicator functions of dataquieR

Usage

util_all_ind_functions()

Value

character names of all indicator functions


Get all PART_VARS for a response variable (from item-level metadata)

Description

Get all PART_VARS for a response variable (from item-level metadata)

Usage

util_all_intro_vars_for_rv(
  rv,
  study_data,
  meta_data,
  label_col = LABEL,
  expected_observations = c("HIERARCHY", "ALL", "SEGMENT")
)

Arguments

rv

character the response variable's name

study_data

study_data

meta_data

meta_data

label_col

character the metadata attribute to map meta_data on study_data based on colnames(study_data)

expected_observations

enum HIERARCHY | ALL | SEGMENT. How should PART_VARS be handled:

- ALL: ignore, all observations are expected
- SEGMENT: if the PART_VAR is 1, an observation is expected
- HIERARCHY: the default; if the PART_VAR is 1 for this variable and also for all PART_VARS of PART_VARS up in the hierarchy, an observation is expected

Value

character all PART_VARS for rv from item-level metadata. For expected_observations = HIERARCHY, the more general PART_VARS (i.e., further up in the hierarchy) appear further to the left in the vector, e.g.: PART_STUDY, PART_PHYSICAL_EXAMINATIONS, PART_BLOODPRESSURE

See Also

Other missing_functions: util_count_expected_observations(), util_filter_missing_list_table_for_rv(), util_get_code_list(), util_is_na_0_empty_or_false(), util_observation_expected(), util_remove_empty_rows(), util_replace_codes_by_NA()


convenience function to abbreviate all(util_is_integer(...))

Description

convenience function to abbreviate all(util_is_integer(...))

Usage

util_all_is_integer(x)

Arguments

x

the object to test

Value

TRUE, if all entries are integer-like, FALSE otherwise

See Also

util_is_integer

Other process_functions: util_abbreviate(), util_attach_attr(), util_bQuote(), util_backtickQuote(), util_coord_flip(), util_extract_matches(), util_par_pmap(), util_setup_rstudio_job(), util_suppress_output()


Test, if package anytime is installed

Description

Test, if package anytime is installed

Usage

util_anytime_installed()

Value

TRUE if anytime is installed.

See Also

requireNamespace

https://forum.posit.co/t/how-can-i-make-testthat-think-i-dont-have-a-package-installed/33441/2

util_ensure_suggested


utility function for the applicability of contradiction checks

Description

Test for applicability of contradiction checks

Usage

util_app_cd(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function for the applicability of contradiction checks

Description

Test for applicability of contradiction checks

Usage

util_app_con_contradictions_redcap(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function for the applicability of distribution plots

Description

Test for applicability of distribution plots

Usage

util_app_dc(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function to test for applicability of detection limits checks

Description

Test for applicability of detection limits checks

Usage

util_app_dl(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function for the applicability of end digits preferences checks

Description

Test for applicability of end digits preferences checks

Usage

util_app_ed(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function to test for applicability of hard limits checks

Description

Test for applicability of hard limits checks

Usage

util_app_hl(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function for the applicability of categorical admissibility

Description

Test for applicability of categorical admissibility

Usage

util_app_iac(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function for the applicability of numeric admissibility

Description

Test for applicability of numeric admissibility

Usage

util_app_iav(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function for the applicability of item missingness

Description

Test for applicability of item missingness

Usage

util_app_im(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function for applicability of LOESS smoothed time course plots

Description

Test for applicability of LOESS smoothed time course plots

Usage

util_app_loess(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function to test for applicability of marginal means plots

Description

Test for applicability of marginal means plots

Usage

util_app_mar(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1 = matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function for the applicability of multivariate outlier detection

Description

Test for applicability of multivariate outlier detection

Usage

util_app_mol(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function for the applicability of outlier detection

Description

Test for applicability of univariate outlier detection

Usage

util_app_ol(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function to test for applicability of soft limits checks

Description

Test for applicability of soft limits checks

Usage

util_app_sl(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function for the applicability of segment missingness

Description

Test for applicability of segment missingness

Usage

util_app_sm(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function for the applicability of the distribution function's shape or scale check

Description

Test for applicability of checks for deviation from expected probability distribution shapes/scales

Usage

util_app_sos(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


utility function for the applicability of variance components

Description

Test for applicability of ICC

Usage

util_app_vc(x, dta)

Arguments

x

data.frame metadata

dta

logical vector, 1=matching data type, 0 = non-matching data type

Value

factor 0-3 for each variable in metadata

See Also

pro_applicability_matrix


Convert a category to an ordered factor (1:5)

Description

Convert a category to an ordered factor (1:5)

Usage

util_as_cat(category)

Arguments

category

vector with categories

Value

an ordered factor

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Convert a category to a number (1:5)

Description

Convert a category to a number (1:5)

Usage

util_as_integer_cat(category)

Arguments

category

vector with categories

Value

an integer

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Convert factors to label-corresponding numeric values

Description

Converts a factor to numeric while preserving the numeric values encoded in its labels, avoiding the pitfall that as.numeric() on a factor returns the internal level codes.

Usage

util_as_numeric(v, warn)

Arguments

v

the vector

warn

if not missing: character with error message stating conversion error

Value

the converted vector
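The pitfall this helper guards against is a classic base-R behavior, shown here in a minimal sketch (util_as_numeric itself also handles warnings and edge cases not covered below):

```r
# as.numeric() on a factor returns the internal level codes,
# not the numeric values in the labels:
f <- factor(c("10", "20", "10", "30"))
codes  <- as.numeric(f)               # 1 2 1 3 -- scrambled codes
values <- as.numeric(as.character(f)) # 10 20 10 30 -- values preserved
```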


Return the pre-computed plotly from a dataquieR result

Description

Return the pre-computed plotly from a dataquieR result

Usage

util_as_plotly_from_res(res, ...)

Arguments

res

the dataquieR result

...

not used

Value

a plotly object


Convert x to valid missing codes

Description

Convert x to valid missing codes

Usage

util_as_valid_missing_codes(x)

Arguments

x

character a vector of values

Value

converted x

See Also

Other robustness_functions: util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()


utility function to assign labels to levels

Description

function to assign labels to levels of a variable

Usage

util_assign_levlabs(
  variable,
  string_of_levlabs,
  splitchar,
  assignchar,
  ordered = TRUE,
  variable_name = "",
  warn_if_inadmissible = TRUE
)

Arguments

variable

vector vector with values of a study variable

string_of_levlabs

character len=1. value labels, e.g. 1 = no | 2 = yes

splitchar

character len=1. splitting character(s) in string_of_levlabs, usually SPLIT_CHAR

assignchar

character len=1. assignment operator character(s) in string_of_levlabs, usually = or ⁠: ⁠

ordered

the function converts variable to a factor, by default to an ordered factor, assuming that the left-hand sides of the assignments are meaningfully ordered numbers, e.g., 1 = low | 2 = medium | 3 = high. If no particular order is given, set ordered to FALSE, e.g., for 1 = male | 2 = female or 1 = low | 2 = high | 3 = medium.

variable_name

character the name of the variable being converted for warning messages

warn_if_inadmissible

logical warn on con_inadmissible_categorical values

Details

DEPRECATED from v2.5.0

Value

a factor with labels assigned to categorical variables (if available)

See Also

Other data_management: util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_fix_merge_dups(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_study_var2factor(), util_table_of_vct()
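The parsing idea behind string_of_levlabs can be sketched in base R. Splitting on splitchar and assignchar (here "|" and "=") is an illustration of the described interface, not the deprecated helper's actual internals:

```r
v    <- c(1, 2, 2, 1)
labs <- "1 = no | 2 = yes"

# split "1 = no | 2 = yes" into code/label pairs
parts  <- strsplit(strsplit(labs, "|", fixed = TRUE)[[1]], "=", fixed = TRUE)
codes  <- trimws(vapply(parts, `[`, character(1), 1))
labels <- trimws(vapply(parts, `[`, character(1), 2))

lbl <- factor(v, levels = codes, labels = labels, ordered = TRUE)
lbl  # no yes yes no, with ordered levels no < yes
```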


Attach attributes to an object and return it

Description

Attach attributes to an object and return it

Usage

util_attach_attr(x, ...)

Arguments

x

the object

...

named arguments; each becomes an attribute

Value

x, having the desired attributes attached

See Also

Other process_functions: util_abbreviate(), util_all_is_integer(), util_bQuote(), util_backtickQuote(), util_coord_flip(), util_extract_matches(), util_par_pmap(), util_setup_rstudio_job(), util_suppress_output()


Put in back-ticks

Description

also escape potential back-ticks in x

Usage

util_bQuote(x)

Arguments

x

a string

Value

x in back-ticks

See Also

util_backtickQuote

Other process_functions: util_abbreviate(), util_all_is_integer(), util_attach_attr(), util_backtickQuote(), util_coord_flip(), util_extract_matches(), util_par_pmap(), util_setup_rstudio_job(), util_suppress_output()


utility function to set string in backticks

Description

Quote a set of variable names with backticks

Usage

util_backtickQuote(x)

Arguments

x

variable names

Value

quoted variable names

See Also

util_bQuote

Other process_functions: util_abbreviate(), util_all_is_integer(), util_attach_attr(), util_bQuote(), util_coord_flip(), util_extract_matches(), util_par_pmap(), util_setup_rstudio_job(), util_suppress_output()


Utility function to create bar plots

Description

A helper function for simple bar plots. The layout is intended for data with positive numbers only (e.g., counts/frequencies).

Usage

util_bar_plot(
  plot_data,
  cat_var,
  num_var,
  relative = FALSE,
  show_numbers = TRUE,
  fill_var = NULL,
  colors = "#2166AC",
  show_color_legend = FALSE,
  flip = FALSE
)

Arguments

plot_data

the data for the plot. It should consist of one column specifying the categories, and a second column giving the respective numbers / counts per category. It may contain another column to specify the coloring of the bars (fill_var).

cat_var

column name of the categorical variable in plot_data

num_var

column name of the numerical variable in plot_data

relative

if TRUE, numbers will be interpreted as percentages (values in num_var should lie within ⁠[0,1]⁠)

show_numbers

if TRUE, numbers will be displayed on top of the bars

fill_var

column name of the variable in plot_data which will be used to color the bars individually

colors

vector of colors, or a single color

show_color_legend

if TRUE, a legend for the colors will be displayed

flip

if TRUE, bars will be oriented horizontally

Value

a bar plot
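A base-R sketch of the layout described above (util_bar_plot itself builds a ggplot2 object; the column names and default color here mirror the documented interface but are otherwise illustrative):

```r
# counts per category, numbers displayed on top of the bars
plot_data <- data.frame(cat = c("A", "B", "C"), n = c(10, 25, 7))

mids <- barplot(plot_data$n, names.arg = plot_data$cat,
                col = "#2166AC", ylim = c(0, 30))
text(mids, plot_data$n, labels = plot_data$n, pos = 3, xpd = NA)
```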


Data frame leaves haven

Description

if df is/contains a haven labelled or tibble object, convert it to a base R data frame

Usage

util_cast_off(df, symb, .dont_cast_off_cols = FALSE)

Arguments

df

data.frame may have or contain non-standard classes

symb

character name of the data frame for error messages

.dont_cast_off_cols

logical internal use, only.

Value

data.frame having all known special things removed


Verify the data type of a value

Description

Function to verify the data type of a value.

Usage

util_check_data_type(
  x,
  type,
  check_convertible = FALSE,
  threshold_value = 0,
  return_percentages = FALSE,
  check_conversion_stable = FALSE,
  robust_na = FALSE
)

Arguments

x

the value

type

expected data type

check_convertible

logical also check whether a conversion to the declared data type would work.

threshold_value

numeric from=0 to=100. percentage of failing conversions allowed.

return_percentages

logical return the percentage of mismatches.

check_conversion_stable

logical do not distinguish between "convertible" and "convertible, but with issues"

robust_na

logical treat white-space-only-values as NA

Value

If return_percentages is TRUE: if check_convertible is FALSE, the percentage of mismatches instead of a logical value; if check_convertible is TRUE, a named vector with the percentages of all cases (names of the vector are match, convertible_mismatch_stable, convertible_mismatch_unstable, nonconvertible_mismatch).

If return_percentages is FALSE: if check_convertible is FALSE, a logical stating whether x is of the expected type; if check_convertible is TRUE, an integer with the states 0, 1, 2, 3:

- 0 = Mismatch, not convertible
- 1 = Match
- 2 = Mismatch, but convertible
- 3 = Mismatch, convertible, but with issues (e.g., loss of decimal places)

See Also

Other data_management: util_assign_levlabs(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_fix_merge_dups(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_study_var2factor(), util_table_of_vct()
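The core of such a convertibility check can be sketched in base R: count how many values fail a conversion to the declared type. The package additionally distinguishes lossy ("unstable") conversions, which this sketch omits:

```r
x    <- c("1", "2.5", "3", "abc")
conv <- suppressWarnings(as.numeric(x))

# share of values that are not convertible to numeric
mismatch_rate <- mean(is.na(conv) & !is.na(x))
mismatch_rate  # one of four values fails
```

Comparing mismatch_rate against a threshold_value is then a simple percentage comparison.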


Check data for observer levels

Description

Check data for observer levels

Usage

util_check_group_levels(
  study_data,
  group_vars,
  min_obs_in_subgroup = -Inf,
  max_obs_in_subgroup = +Inf,
  min_subgroups = -Inf,
  max_subgroups = +Inf
)

Arguments

study_data

data.frame the data frame that contains the measurements

group_vars

variable the name of the observer, device or reader variable

min_obs_in_subgroup

integer from=0. optional argument if group_vars are used. This argument specifies the minimum number of observations that is required to include a subgroup (level) of the group variable named by group_vars in the analysis. Subgroups with fewer observations are excluded.

max_obs_in_subgroup

integer from=0. optional argument if group_vars are used. This argument specifies the maximum number of observations allowed in a subgroup (level) of the group variable named by group_vars. Subgroups with more observations are excluded.

min_subgroups

integer from=0. optional argument if a "group_var" is used. This argument specifies the minimum number of subgroups (levels) of the "group_var". If the variable defined in "group_var" has fewer subgroups, it is not used for the analysis.

max_subgroups

integer from=0. optional argument if a "group_var" is used. This argument specifies the maximum number of subgroups (levels) of the "group_var". If the variable defined in "group_var" has more subgroups, it is split for the analysis.

Value

modified study data frame

See Also

prep_min_obs_level

Other data_management: util_assign_levlabs(), util_check_data_type(), util_compare_meta_with_study(), util_dichotomize(), util_fix_merge_dups(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_study_var2factor(), util_table_of_vct()

Examples

## Not run: 
study_data <- prep_get_data_frame("study_data")
meta_data <- prep_get_data_frame("meta_data")
prep_prepare_dataframes(.label_col = LABEL)
util_check_group_levels(ds1, "CENTER_0")
dim(util_check_group_levels(ds1, "USR_BP_0", min_obs_in_subgroup = 400))

## End(Not run)



Check for one value only

Description

utility function to identify variables with one value only.

Usage

util_check_one_unique_value(x)

Arguments

x

vector with values

Value

logical(1): TRUE if, apart from NA, exactly one distinct value is observed in x; FALSE otherwise

See Also

Other robustness_functions: util_as_valid_missing_codes(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()


Get Function called for a Call Name

Description

get aliases from report attributes and then replace them by the actual function name

Usage

util_cll_nm2fkt_nm(cll_names, report)

Arguments

cll_names

character the systematic function call name for which to fetch the function name

report

dataquieR_resultset2 the report

Value

character the function name


Return hex code colors from color names or STATAReporter syntax

Description

Return hex code colors from color names or STATAReporter syntax

Usage

util_col2rgb(colors)

Arguments

colors

the colors, e.g., "255 0 0" or "red" or "#ff0000"

Value

character vector with colors using HTML hexadecimal encoding, e.g., "#ff0000" for "red"


Get description for a call

Description

Get description for a call

Usage

util_col_description(cn)

Arguments

cn

the call name

Value

the description


Collect all errors, warnings, or messages so that they are combined for a combined result

Description

Collect all errors, warnings, or messages so that they are combined for a combined result

Usage

util_collapse_msgs(class, all_of_f)

Create a data frame containing all the results from summaries of reports

Description

Create a data frame containing all the results from summaries of reports

Usage

util_combine_list_report_summaries(
  to_combine,
  type = c("unique_vars", "repeated_vars")
)

Arguments

to_combine

vector a list containing the summaries of reports obtained with summary(report)

type

character if type is unique_vars, the variable names are unique and there is no need to add a prefix to the variables and labels to specify the report of origin. If type is repeated_vars, a prefix will be used to specify the report of origin of each variable.

Value

a summary of summaries of dataquieR reports


Combine results for Single Variables

Description

Combine results for single variables into, e.g., a data frame with one row per variable or a similar heat map; see print.ReportSummaryTable().

Usage

util_combine_res(all_of_f)

Arguments

all_of_f

all results of a function

Value

row-bound combined results


Combine two value lists

Description

Combine two value lists

Usage

util_combine_value_label_tables(vlt1, vlt2)

Arguments

vlt1

value_label_table

vlt2

value_label_table

Value

value_label_table

Examples

## Not run: 
util_combine_value_label_tables(
  tibble::tribble(~ CODE_VALUE, ~ CODE_LABEL, 17L, "Test", 19L, "Test", 17L, "TestX"),
  tibble::tribble(~ CODE_VALUE, ~ CODE_LABEL, 17L, "Test", 19L, "Test", 17L, "TestX"))

## End(Not run)

Compares study data data types with the ones expected according to the metadata

Description

Utility function to compare data type of study data with those defined in metadata

Usage

util_compare_meta_with_study(
  sdf,
  mdf,
  label_col,
  check_convertible = FALSE,
  threshold_value = 0,
  return_percentages = FALSE,
  check_conversion_stable = FALSE
)

Arguments

sdf

the data.frame of study data

mdf

the data.frame of associated static metadata

label_col

variable attribute the name of the column in the metadata with labels of variables

check_convertible

logical also check whether a conversion to the declared data type would work.

threshold_value

numeric from=0 to=100. percentage of failing conversions allowed if check_convertible is TRUE.

return_percentages

logical return the percentage of mismatches.

check_conversion_stable

logical do not distinguish between "convertible" and "convertible, but with issues"

Value

For return_percentages == FALSE: if check_convertible is FALSE, a binary vector (0, 1) stating whether the data type applies; if check_convertible is TRUE, a vector with the states 0, 1, 2, 3: 0 = Mismatch, not convertible; 1 = Match; 2 = Mismatch, but convertible; 3 = Mismatch, convertible, but with issues (e.g., loss of decimal places). For return_percentages == TRUE: a data frame with the percentages of non-matching data types; each column is a variable, and the rows follow the vectors returned by util_check_data_type.

See Also

prep_dq_data_type_of

prep_datatype_from_data

Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_dichotomize(), util_fix_merge_dups(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_study_var2factor(), util_table_of_vct()


Remove specific classes from a ggplot plot_env environment

Description

Useful to remove large objects before writing to disk with qs or rds. Also deletes the parent environment of the plot environment as well as unneeded variables.

Usage

util_compress_ggplots_in_res(r)

Arguments

r

the object

See Also

HERE


Compute SE.Skewness

Description

Compute SE.Skewness

Usage

util_compute_SE_skewness(x, skewness = util_compute_skewness(x))

Arguments

x

data

skewness

if already known

Value

the standard error of skewness


Compute Kurtosis

Description

Compute Kurtosis

Usage

util_compute_kurtosis(x)

Arguments

x

data

Value

the Kurtosis


Compute the Skewness

Description

Compute the Skewness

Usage

util_compute_skewness(x)

Arguments

x

data

Value

the Skewness
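A textbook moment-based sample skewness can serve as a sketch of what such a helper computes; the exact estimator used by util_compute_skewness (bias correction, denominators) is not documented here and may differ:

```r
# third standardized moment: E[(x - mean)^3] / sd^3 (population form)
skew <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

skew(c(1, 2, 3, 10))  # right-skewed: positive value
skew(c(-1, 0, 1))     # symmetric: zero
```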


Produce a condition function

Description

Produce a condition function

Usage

util_condition_constructor_factory(
  .condition_type = c("error", "warning", "message")
)

Arguments

.condition_type

character the type of the conditions being created and signaled by the function, "error", "warning", or "message"

See Also

Other condition_functions: util_deparse1(), util_error(), util_find_external_functions_in_stacktrace(), util_find_first_externally_called_functions_in_stacktrace(), util_find_indicator_function_in_callers(), util_message(), util_suppress_warnings(), util_warning()


Extract condition from try error

Description

Extract condition from try error

Usage

util_condition_from_try_error(x)

Arguments

x

the try-error object

Value

condition of the try-error


Can a vector be converted to a defined DATA_TYPE

Description

The function also checks whether the conversion is perfect, whether something is lost (e.g., decimal places), or whether something is strange (like arbitrary suffixes in a date; just note that as.POSIXct("2020-01-01 12:00:00 CET asdf") does not fail in R, while util_conversion_stable("2020-01-01 12:00:00 CET asdf", DATA_TYPES$DATETIME) will).

Usage

util_conversion_stable(vector, data_type, return_percentages = FALSE)

Arguments

vector

vector input vector,

data_type

enum The type, to what the conversion should be tried.

return_percentages

logical return the percentage of stable conversions or matches.

Details

HINT: util_conversion_stable(.Machine$integer.max + 1, DATA_TYPES$INTEGER) seems to work correctly, although is.integer(.Machine$integer.max + 1) returns FALSE.
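The base-R behavior that motivates this stability check can be reproduced directly: datetime parsing silently drops trailing garbage, so a plain conversion reports success where a stability check should not:

```r
# base R parses this despite the trailing "asdf":
x <- as.POSIXct("2020-01-01 12:00:00 CET asdf")
x                  # a valid POSIXct, suffix silently ignored
format(x, "%H:%M") # the clock time that survived the parse
```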

Value

numeric ratio of convertible entries in vector


return a flip term for ggplot2 plots, if desired.

Description

return a flip term for ggplot2 plots, if desired.

Usage

util_coord_flip(w, h, p, ref_env, ...)

Arguments

w

width of the plot to determine its aspect ratio

h

height of the plot to determine its aspect ratio

p

the ggplot2 object, so far. If w or h are missing, p is used for an estimate on w and h, if both axes are discrete.

ref_env

environment of the actual entry function, so that the correct formals can be detected.

...

additional arguments for coord_flip or coord_cartesian

Value

coord_flip or coord_cartesian

See Also

Other process_functions: util_abbreviate(), util_all_is_integer(), util_attach_attr(), util_bQuote(), util_backtickQuote(), util_extract_matches(), util_par_pmap(), util_setup_rstudio_job(), util_suppress_output()


Copy default dependencies to the report's lib directory

Description

Copy default dependencies to the report's lib directory

Usage

util_copy_all_deps(dir, pages, ...)

Arguments

dir

report directory

pages

all pages to write

...

additional htmltools::htmlDependency objects to be added to all pages, also

Value

invisible(NULL)

See Also

Other reporting_functions: util_alias2caption(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()


Check referred variables

Description

This function operates in the environment of its caller (using eval.parent, similar to a C-preprocessor macro). Unlike other utility functions that work in the caller's environment (such as prep_prepare_dataframes), it has no side effects except that the argument of the calling function specified in arg_name is normalized (set to its default or to a general default, if missing; variable names consisting only of whitespace are replaced by NA). It expects two objects in the caller's environment: ds1 and meta_data. meta_data is the metadata data frame and ds1 is produced by a preceding call of prep_prepare_dataframes using meta_data and study_data. Therefore, this function can only be used after calling prep_prepare_dataframes.

Usage

util_correct_variable_use(
  arg_name,
  allow_na,
  allow_more_than_one,
  allow_null,
  allow_all_obs_na,
  allow_any_obs_na,
  min_distinct_values,
  need_type,
  need_scale,
  role = "",
  overwrite = TRUE,
  do_not_stop = FALSE,
  remove_not_found = TRUE
)

util_correct_variable_use2(
  arg_name,
  allow_na,
  allow_more_than_one,
  allow_null,
  allow_all_obs_na,
  allow_any_obs_na,
  min_distinct_values,
  need_type,
  need_scale,
  role = arg_name,
  overwrite = TRUE,
  do_not_stop = FALSE,
  remove_not_found = TRUE
)

Arguments

arg_name

character Name of a function argument of the caller of util_correct_variable_use

allow_na

logical default = FALSE. allow NAs in the variable names argument given in arg_name

allow_more_than_one

logical default = FALSE. allow more than one variable name in arg_name

allow_null

logical default = FALSE. allow an empty variable name vector in the argument arg_name

allow_all_obs_na

logical default = TRUE. if FALSE, check that the observations are not all NA

allow_any_obs_na

logical default = TRUE. if FALSE, check that the observations are complete without any NA

min_distinct_values

integer Minimum number of distinct observed values of a study variable

need_type

character if not NA, variables must be of data type need_type according to the metadata, can be a pipe (|) separated list of allowed data types. Use ! to exclude a type. See DATA_TYPES for the predefined variable types of the dataquieR concept.

need_scale

character if not NA, variables must be of scale level need_scale according to the metadata, can be a pipe (|) separated list of allowed scale levels. Use ! to exclude a level. See SCALE_LEVELS for the predefined scale levels of the dataquieR concept.

role

character variable-argument role. Sets different defaults for all allow-arguments and need_type of this util_correct_variable_use call. If given, it defines the intended use of the verified argument. For typical arguments and typical use cases, roles are predefined in .variable_arg_roles. The role's defaults can be overwritten by the arguments. If role is "" (default), the standards are allow_na = FALSE, allow_more_than_one = FALSE, allow_null = FALSE, allow_all_obs_na = TRUE, allow_any_obs_na = TRUE, and need_type = NA. Use util_correct_variable_use2 to use the arg_name as default for role. See .variable_arg_roles for currently available variable-argument roles.

overwrite

logical overwrite vector of variable names to match the labels given in label_col.

do_not_stop

logical do not throw an error, if one of the variables violates allow_all_obs_na, allow_any_obs_na or min_distinct_values. Instead, the variable will be removed from arg_name in the parent environment with a warning. This is helpful for functions which work with multiple variables.

remove_not_found

TODO: Not yet implemented

Details

util_correct_variable_use and util_correct_variable_use2 differ only in the default of the argument role.

util_correct_variable_use and util_correct_variable_use2 put strong effort into producing comprehensible error messages for the caller's caller (who is typically an end user of a dataquieR function).

The function ensures that a specified argument of its caller that refers to variable names (one or more, as a character vector) matches some expectations.

This function accesses the caller's environment!

See Also

.variable_arg_roles

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()
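
Examples

A sketch of the typical use inside a dataquieR indicator function (not run; the argument names and the type/scale strings are illustrative only):

```r
## Not run:  # internal use, only
acc_example <- function(resp_vars, group_vars = NULL,
                        study_data, meta_data, label_col = VAR_NAMES) {
  # creates ds1 in this environment and normalizes study_data/meta_data
  prep_prepare_dataframes()
  # verify the variable-name arguments of this function
  util_correct_variable_use("resp_vars",
                            allow_more_than_one = TRUE,
                            need_type = "integer|float")
  util_correct_variable_use("group_vars",
                            allow_null = TRUE,
                            need_scale = "nominal|ordinal")
  # ... computations on ds1 ...
}

## End(Not run)
```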


Count Expected Observations

Description

Count participants, if an observation was expected, given the PART_VARS from item-level metadata

Usage

util_count_expected_observations(
  resp_vars,
  study_data,
  meta_data,
  label_col = LABEL,
  expected_observations = c("HIERARCHY", "ALL", "SEGMENT")
)

Arguments

resp_vars

character the response variables, for that a value may be expected

study_data

study_data

meta_data

meta_data

label_col

character mapping attribute colnames(study_data) vs. meta_data[label_col]

expected_observations

enum HIERARCHY | ALL | SEGMENT. How PART_VARS should be handled:
- ALL: ignore, all observations are expected
- SEGMENT: if the PART_VAR is 1, an observation is expected
- HIERARCHY: the default; if the PART_VAR is 1 for this variable and also for all PART_VARS further up in the hierarchy, an observation is expected.

Value

a vector with the number of expected observations for each resp_vars.

See Also

Other missing_functions: util_all_intro_vars_for_rv(), util_filter_missing_list_table_for_rv(), util_get_code_list(), util_is_na_0_empty_or_false(), util_observation_expected(), util_remove_empty_rows(), util_replace_codes_by_NA()
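
Examples

A usage sketch (not run; the variable labels and data frame names are hypothetical):

```r
## Not run:  # internal use, only
util_count_expected_observations <- get("util_count_expected_observations",
  asNamespace("dataquieR"))
util_count_expected_observations(
  resp_vars = c("SBP_0", "DBP_0"),              # hypothetical labels
  study_data = prep_get_data_frame("study_data"),
  meta_data = prep_get_data_frame("item_level"),
  label_col = LABEL,
  expected_observations = "HIERARCHY"
)

## End(Not run)
```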


Create an HTML file for the dq_report2

Description

Create an HTML file for the dq_report2

Usage

util_create_page_file(
  page_nr,
  pages,
  rendered_pages,
  dir,
  template_file,
  report,
  logo,
  loading,
  packageName,
  deps,
  progress_msg,
  progress,
  title,
  by_report
)

Arguments

page_nr

the number of the page being created

pages

list with all page-contents named by their desired file names

rendered_pages

list with all rendered (htmltools::renderTags) page-contents named by their desired file names

dir

target directory

template_file

the report template file to use

report

the output of dq_report2

logo

PNG file

loading

loading animation div

packageName

the name of the current package

deps

dependencies, as pre-processed by htmltools::copyDependencyToDir and htmltools::renderDependencies

progress_msg

closure to call with progress information

progress

closure to call with progress information

title

character the web browser's window name

by_report

logical if TRUE, this report HTML file is part of a set of reports; add a back-link

Value

invisible(file_name)

See Also

Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()


Create an overview of the reports created with dq_report_by

Description

Create an overview of the reports created with dq_report_by

Usage

util_create_report_by_overview(
  output_dir,
  strata_column,
  segment_column,
  strata_column_label,
  subgroup,
  mod_label
)

Arguments

output_dir

character the directory in which all reports are searched and the overview is saved

strata_column

character name of a study variable to stratify the report by. It can be NULL

segment_column

character name of a metadata attribute usable to split the report into sections of variables. It can be NULL

strata_column_label

character the label of the variable used as strata_column

subgroup

character optional, to define subgroups of cases

mod_label

list util_ensure_label() info

Value

an overview of all dataquieR reports created with dq_report_by


Create a dashboard-table from a report summary

Description

Create a dashboard-table from a report summary

Usage

util_dashboard_table(repsum)

Arguments

repsum

a report summary from summary(report)

See Also

Other html: util_extract_all_ids(), util_generate_pages_from_report(), util_get_hovertext()


Data type conversion

Description

Utility function to convert a study variable to match the data type given in the metadata, if possible.

Usage

util_data_type_conversion(x, type)

Arguments

x

the value

type

expected data type

Value

the transformed values (if possible)


Expression De-Parsing

Description

Turn unevaluated expressions into character strings.

Arguments

expr

any R expression.

collapse

a string, passed to paste()

width.cutoff

integer in [20, 500] determining the cutoff (in bytes) at which line-breaking is tried.

...

further arguments passed to deparse().

Details

This is a simple utility function for R < 4.0.0 to ensure a string result (character vector of length one), typically used in name construction, as util_deparse1(substitute(.)).

This avoids a dependency on backports and on R >= 4.0.0.

Value

the deparsed expression

See Also

Other condition_functions: util_condition_constructor_factory(), util_error(), util_find_external_functions_in_stacktrace(), util_find_first_externally_called_functions_in_stacktrace(), util_find_indicator_function_in_callers(), util_message(), util_suppress_warnings(), util_warning()
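
Examples

A usage sketch (not run; the function mirrors base R's deparse1, which was introduced in R 4.0.0):

```r
## Not run:  # internal use, only
util_deparse1 <- get("util_deparse1", asNamespace("dataquieR"))
f <- function(x) util_deparse1(substitute(x))
f(some_var + 1)   # "some_var + 1", always a single string
## End(Not run)
```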


Detect cores

Description

See parallel::detectCores for further details.

Usage

util_detect_cores()

Value

number of available CPU cores.

See Also

Other system_functions: util_user_hint(), util_view_file()


Escape characters for HTML in a data frame

Description

Escape characters for HTML in a data frame

Usage

util_df_escape(x)

Arguments

x

data.frame to be escaped

Value

data.frame with html escaped content
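
Examples

A usage sketch (not run; util_df_escape is internal):

```r
## Not run:  # internal use, only
util_df_escape <- get("util_df_escape", asNamespace("dataquieR"))
util_df_escape(data.frame(x = "<b>5 < 6 & 7 > 6</b>",
                          stringsAsFactors = FALSE))

## End(Not run)
```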


Utility function to dichotomize variables

Description

This function uses the metadata attributes RECODE_CASES and/or RECODE_CONTROL to dichotomize the data. 'Cases' will be recoded to 1, 'controls' to 0. The recoding can be specified by an interval (for metric variables) or by a list of categories separated by the 'SPLIT_CHAR'. Recoding will be used for data quality checks that include a regression model.

Usage

util_dichotomize(study_data, meta_data, label_col = VAR_NAMES)

Arguments

study_data

study data without jump/missing codes as specified in the code conventions

meta_data

metadata as specified in the code conventions

label_col

variable attribute the name of the column in the metadata with labels of variables

See Also

Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_fix_merge_dups(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_study_var2factor(), util_table_of_vct()
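
Examples

A sketch of the metadata-driven recoding (not run; the metadata attribute values shown are illustrative only):

```r
## Not run:  # internal use, only
# Suppose item-level metadata declares, e.g.,
#   RECODE_CASES = "[140; Inf)"            for a metric variable, or
#   RECODE_CASES = "heavy smoker | smoker" for a categorical variable
# (categories separated by SPLIT_CHAR). Then, with ds1 and meta_data
# prepared by prep_prepare_dataframes():
dichotomized <- util_dichotomize(ds1, meta_data, label_col = VAR_NAMES)
# cases are recoded to 1, controls to 0

## End(Not run)
```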


Utility function to characterize study variables

Description

This function summarizes some properties of measurement variables.

Usage

util_dist_selection(study_data, val_lab = lifecycle::deprecated())

Arguments

study_data

study data, pre-processed with prep_prepare_dataframes to replace missing value codes by NA

val_lab

deprecated

Value

data frame with one row for each variable in the study data and the following columns:
- Variables: the names of the variables
- IsInteger: whether the variable contains integer values only (variables coded as factor will be converted to integers)
- IsMultCat: for variables with integer or string values, whether there are more than two categories
- NCategory: the number of distinct values for variables with values coded as integers or strings (excluding NA and empty entries)
- AnyNegative: whether the variable contains any negative values
- NDistinct: the number of distinct values
- PropZeroes: the proportion of zeroes

See Also

Other metadata_management: util_find_free_missing_code(), util_find_var_by_meta(), util_get_var_att_names_of_level(), util_get_vars_in_segment(), util_looks_like_missing(), util_no_value_labels(), util_validate_known_meta(), util_validate_missing_lists()


Create an environment with several alias names for the study data variables

Description

generates an environment similar to as.environment(ds1), but makes variables available by their VAR_NAMES, LABEL, and label_col names.

Usage

util_ds1_eval_env(study_data, meta_data = "item_level", label_col = LABEL)

Arguments

study_data

data.frame the data frame that contains the measurements

meta_data

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables. If study_data has already been mapped, i.e., util_ds1_eval_env(ds1, ...) is called, this will work too


Test, if values of x are empty, i.e. NA or whitespace characters

Description

Test, if values of x are empty, i.e. NA or whitespace characters

Usage

util_empty(x)

Arguments

x

the vector to test

Value

a logical vector, same length as x; TRUE, if resp. element in x is "empty"

See Also

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()
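
Examples

A usage sketch (not run; the expected result follows from the Description above):

```r
## Not run:  # internal use, only
util_empty <- get("util_empty", asNamespace("dataquieR"))
util_empty(c("a", "", "   ", NA, "b"))
# per the description: FALSE TRUE TRUE TRUE FALSE

## End(Not run)
```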


convert a value to character

Description

convert a value to character

Usage

util_ensure_character(x, error = FALSE, error_msg, ...)

Arguments

x

the value

error

logical if TRUE, an error is thrown in case of a conversion error; otherwise, a warning

error_msg

error message to be displayed, if conversion was not possible

...

additional arguments passed to util_error or util_warning respectively in case of an error, and if an error_msg has been passed

Value

as.character(x)

See Also

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()


similar to match.arg

Description

will only warn and return a cleaned x.

Usage

util_ensure_in(x, set, err_msg, error = FALSE, applicability_problem = NA)

Arguments

x

character vector of needles

set

character vector representing the haystack

err_msg

character optional error message. Use %s twice, once for the missing elements and once for proposals

error

logical if TRUE, the execution will stop with an error, if not all x are elements of set, otherwise, it will throw a warning and "clean" the vector x from unexpected elements.

applicability_problem

logical error indicates unsuitable resp_vars

Value

character invisible(intersect(x, set))

See Also

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()
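
Examples

A usage sketch (not run):

```r
## Not run:  # internal use, only
util_ensure_in <- get("util_ensure_in", asNamespace("dataquieR"))
# warns about the misspelled element (with a proposal) and
# returns the cleaned vector, invisibly
x <- util_ensure_in(c("Sepal.Length", "Sepal.Legnth"), colnames(iris))
x  # "Sepal.Length"

## End(Not run)
```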


Utility function ensuring valid labels and variable names

Description

Valid labels must not be empty, must be unique, and must not exceed a certain length.

Usage

util_ensure_label(meta_data, label_col, max_label_len = MAX_LABEL_LEN)

Arguments

meta_data

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

max_label_len

integer maximum length for the labels, defaults to 30.

Value

a list containing the study data, possibly with adapted column names, the metadata, possibly with adapted labels, and a string and a table informing about the changes


Support function to stop, if an optional package is not installed

Description

This function stops, if a package is not installed but needed for using an optional feature of dataquieR.

Usage

util_ensure_suggested(
  pkg,
  goal = ifelse(is.null(rlang::caller_call()), "work", paste("call",
    sQuote(rlang::call_name(rlang::caller_call())))),
  err = TRUE,
  and_import = c()
)

Arguments

pkg

needed package

goal

feature description for error message.

err

logical Should the function throw an error (default) or a warning?

and_import

import the listed functions into the caller's environment

Value

TRUE if all packages in pkg are available, FALSE if at least one of the packages is missing.

See Also

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()

Examples

## Not run:  # internal use, only
f <- function() {
  util_ensure_suggested <- get("util_ensure_suggested",
    asNamespace("dataquieR"))
  util_ensure_suggested("ggplot2", "Test",
      and_import = "(ggplot|geom_.*|aes)")
  print(ggplot(cars, aes(x = speed)) + geom_histogram())
}
f()

## End(Not run)


Produce an error message with a useful short stack trace. Then it stops the execution.

Description

Produce an error message with a useful short stack trace. Then it stops the execution.

Usage

util_error(
  m,
  ...,
  applicability_problem = NA,
  intrinsic_applicability_problem = NA,
  integrity_indicator = "none",
  level = 0,
  immediate,
  title = "",
  additional_classes = c()
)

Arguments

m

error message or a condition

...

arguments for sprintf on m, if m is a character

applicability_problem

logical TRUE, if an applicability issue, that is, information needed for the computation is missing (an error that indicates missing metadata) or the requirements of the stopped function were not met, e.g., a bar plot was called for metric data. Applicability problems can be logical or empirical; empirical is the default, if the argument intrinsic_applicability_problem is left unset or set to FALSE.

intrinsic_applicability_problem

logical TRUE, if this is a logical applicability issue, that is, the computation makes no sense (for example, an error of unsuitable resp_vars). Intrinsic/logical applicability problems are also applicability problems. Non-logical applicability problems are called empirical applicability problems.

integrity_indicator

character if the message is an integrity problem, this is the indicator abbreviation.

level

integer level of the error message (defaults to 0). Higher levels are more severe.

immediate

logical not used.

additional_classes

character additional classes the thrown condition object should inherit from, first.

Value

nothing, its purpose is to stop.

See Also

Other condition_functions: util_condition_constructor_factory(), util_deparse1(), util_find_external_functions_in_stacktrace(), util_find_first_externally_called_functions_in_stacktrace(), util_find_indicator_function_in_callers(), util_message(), util_suppress_warnings(), util_warning()


Evaluate a parsed redcap rule for given study data

Description

also allows using VAR_NAMES in the rules, if other labels have been selected

Usage

util_eval_rule(
  rule,
  ds1,
  meta_data = "item_level",
  use_value_labels,
  replace_missing_by = "NA",
  replace_limits = TRUE
)

Arguments

rule

the redcap rule (parsed, already)

ds1

the study data as prepared by prep_prepare_dataframes

meta_data

the metadata

use_value_labels

map columns with VALUE_LABELS as factor variables

replace_missing_by

enum LABEL | INTERPRET | NA. Whether missing codes should be replaced by the missing labels, by the AAPOR codes from the missing table, or by NA. Can also be an empty string to keep the codes.

replace_limits

logical replace hard limit violations by NA

Value

the result of the parsed rule

See Also

Other redcap: util_get_redcap_rule_env()
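
Examples

A sketch (not run; it assumes a parser such as util_parse_redcap_rule to obtain a parsed rule, and ds1 as prepared by prep_prepare_dataframes; the variable names are hypothetical):

```r
## Not run:  # internal use, only
# a REDCap-style rule on two hypothetical variables:
rule <- util_parse_redcap_rule('[AGE_0] > 50 and [SEX_0] = "m"')
util_eval_rule(rule, ds1, meta_data = "item_level",
               replace_missing_by = "NA")

## End(Not run)
```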


Evaluate an expression and create a dataquieR_result object from its evaluated value

Description

If an error occurs, the function will return a corresponding object representing that error. All conditions will be recorded and replayed whenever the result is printed by print.dataquieR_result.

Usage

util_eval_to_dataquieR_result(
  expression,
  env = parent.frame(),
  filter_result_slots,
  nm,
  function_name,
  my_call = expression,
  my_storr_object = NULL,
  init = FALSE,
  called_in_pipeline = TRUE
)

Arguments

expression

the expression

env

the environment to evaluate the expression in

filter_result_slots

character regular expressions, only if an indicator function's result's name matches one of these, it'll be used for the report. If of length zero, no filtering is performed.

nm

character name for the computed result

function_name

character name of the function to be executed

my_call

the call being executed (equivalent to expression)

my_storr_object

a storr object to store the result in

init

logical is this an initial call to compute dummy results?

called_in_pipeline

logical if the evaluation should be considered as part of a pipeline.

Value

a dataquieR_result object

See Also

Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()


Generate a full DQ report, v2

Description

Generate a full DQ report, v2

Usage

util_evaluate_calls(
  all_calls,
  study_data,
  meta_data,
  label_col,
  meta_data_segment,
  meta_data_dataframe,
  meta_data_cross_item,
  resp_vars,
  filter_result_slots,
  cores,
  debug_parallel,
  mode = c("default", "futures", "queue", "parallel"),
  mode_args,
  my_storr_object = NULL
)

Arguments

all_calls

list a list of calls

study_data

data.frame the data frame that contains the measurements

meta_data

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data_segment

data.frame – optional: Segment level metadata

meta_data_dataframe

data.frame – optional: Data frame level metadata

meta_data_cross_item

data.frame – optional: cross-item level metadata

resp_vars

variable list the name of the measurement variables for the report.

filter_result_slots

character regular expressions, only if an indicator function's result's name matches one of these, it'll be used for the report. If of length zero, no filtering is performed.

cores

integer number of cpu cores to use or a named list with arguments for parallelMap::parallelStart or NULL, if parallel has already been started by the caller. Can also be a cluster.

debug_parallel

logical print blocks currently evaluated in parallel

mode

character work mode for parallel execution. The default is "default". The values mean:
- default: use queue, unless cores has been set explicitly
- futures: use the future package
- queue: use a queue as described in the examples from the callr package by Csárdi and Chang, and start sub-processes as workers that evaluate the queue
- parallel: use the cluster from cores to evaluate all calls of indicator functions using the classic R parallel back-ends

mode_args

list of arguments for the selected mode. As of writing this manual, only the mode queue supports an argument: step, which gives the number of function calls that are run by one worker at a time. The default is 15, which on most of the tested systems gives a good balance between synchronization overhead and idling workers.

Value

a dataquieR_resultset2. Printing it creates an RMarkdown report.

See Also

Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()


Verify, that argument is a data frame

Description

stops with an error, if not. Will add the columns and return the resulting extended data frame, also updating the original data frame in the calling environment, if x is empty (data frames easily break down to 0 columns in R, if they have no rows, e.g., after some split/rbind pattern).

Usage

util_expect_data_frame(
  x,
  col_names,
  convert_if_possible,
  custom_errors,
  dont_assign,
  keep_types = FALSE
)

Arguments

x

an object that is verified to be a data.frame.

col_names

column names x must contain or named list of predicates to check the columns (e.g., list(AGE=is.numeric, SEX=is.character))

convert_if_possible

if given, for each column, a lambda can be given, similar to the col_names check functions. This lambda is used to try a conversion. If the conversion fails (returns NA where the input was not util_empty), an error is still thrown; otherwise, the data is converted.

custom_errors

list with error messages, specifically per column. names of the list are column names, values are messages (character).

dont_assign

set TRUE to keep x in the caller environment untouched

keep_types

logical keep types as possibly defined in a file, if the data frame is loaded from one. set TRUE for study data.

Value

invisible data frame
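
Examples

A usage sketch (not run; it uses the predicate-list form of col_names documented above):

```r
## Not run:  # internal use, only
util_expect_data_frame <- get("util_expect_data_frame",
  asNamespace("dataquieR"))
f <- function(md) {
  util_expect_data_frame(md, col_names = list(AGE = is.numeric,
                                              SEX = is.character))
  md
}
f(data.frame(AGE = c(42, 50), SEX = c("f", "m")))  # passes
try(f(data.frame(AGE = "old", SEX = 1)))           # stops with an error

## End(Not run)
```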


check, if a scalar/vector function argument matches expectations

Description

check, if a scalar/vector function argument matches expectations

Usage

util_expect_scalar(
  arg_name,
  allow_more_than_one = FALSE,
  allow_null = FALSE,
  allow_na = FALSE,
  min_length = -Inf,
  max_length = Inf,
  check_type,
  convert_if_possible,
  conversion_may_replace_NA = FALSE,
  dont_assign = FALSE,
  error_message
)

Arguments

arg_name

the argument

allow_more_than_one

allow vectors

allow_null

allow NULL

allow_na

allow NAs

min_length

minimum length of the argument's value

max_length

maximum length of the argument's value

check_type

a predicate function, that must return TRUE on the argument's value.

convert_if_possible

if given, a lambda can be given, similar to check_type. This lambda is used to try a conversion. If the conversion fails (returns NA where the input was not util_empty), an error is still thrown; otherwise, the data is converted.

conversion_may_replace_NA

if set to TRUE, the function given in convert_if_possible may replace NA values without causing a warning. This option is FALSE by default to catch possible conversion problems (use it with caution).

dont_assign

set TRUE to keep x in the caller environment untouched

error_message

if check_type() returned FALSE, show this instead of a default error message.

Value

the value of arg_name – but this is updated in the calling frame anyway.

See Also

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()

Examples

## Not run: 
f <- function(x) {
  util_expect_scalar(x, check_type = is.integer)
}
f(42L)
try(f(42))
g <- function(x) {
  util_expect_scalar(x, check_type = is.integer, convert_if_possible =
          as.integer)
}
g(42L)
g(42)

## End(Not run)


Extract all ids from a list of htmltools objects

Description

Extract all ids from a list of htmltools objects

Usage

util_extract_all_ids(pages)

Arguments

pages

the list of objects

Value

a character vector with valid targets

See Also

Other html: util_dashboard_table(), util_generate_pages_from_report(), util_get_hovertext()


Extract columns of a SummaryTable (or Segment, ...)

Description

Extract columns of a SummaryTable (or Segment, ...)

Usage

util_extract_indicator_metrics(Table)

Arguments

Table

data.frame, a table

Value

data.frame columns with indicator metrics from Table

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


return all matches of an expression

Description

return all matches of an expression

Usage

util_extract_matches(data, pattern)

Arguments

data

a character vector

pattern

a character string containing a regular expression

Value

A list with matching elements, or NULL in case of non-matching elements

Author(s)

Josh O'Brien

See Also

Stack Overflow

Other process_functions: util_abbreviate(), util_all_is_integer(), util_attach_attr(), util_bQuote(), util_backtickQuote(), util_coord_flip(), util_par_pmap(), util_setup_rstudio_job(), util_suppress_output()

Examples

## Not run:  # not exported, so not tested
dat0 <- list("a sentence with citation (Ref. 12), (Ref. 13), and then (Ref. 14)",
  "another sentence without reference")
pat <- "Ref. (\\d+)"
util_extract_matches(dat0, pat)

## End(Not run)


Filter a MISSING_LIST_TABLE for rows matching the variable rv

Description

In MISSING_LIST_TABLE, a column resp_vars may be specified. If so, and if, for a row, this column is not empty, then that row only affects the one variable specified in that cell

Usage

util_filter_missing_list_table_for_rv(table, rv, rv2 = rv)

Arguments

table

cause_label_df a data frame with missing codes and optionally resp_vars. It also comprises labels and optionally an interpretation column with AAPOR codes. Must already cover the variable rv, i.e., item level metadata is not checked to find the suitable missing table for rv.

rv

variable the response variable to filter the missing list for specified by a label.

rv2

variable the response variable to filter the missing list for specified by a VAR_NAMES-name.

Value

data.frame the row-wise bound data frames as one data frame

See Also

Other missing_functions: util_all_intro_vars_for_rv(), util_count_expected_observations(), util_get_code_list(), util_is_na_0_empty_or_false(), util_observation_expected(), util_remove_empty_rows(), util_replace_codes_by_NA()


Filter collection based on its names() using regular expressions

Description

Filter collection based on its names() using regular expressions

Usage

util_filter_names_by_regexps(collection, regexps)

Arguments

collection

a named collection (list, vector, ...)

regexps

character a vector of regular expressions

Value

collection reduced to those entries whose names match at least one expression from regexps

See Also

Other string_functions: util_abbreviate_unique(), util_pretty_vector_string(), util_set_dQuoteString(), util_set_sQuoteString(), util_sub_string_left_from_.(), util_sub_string_right_from_.(), util_translate()

Examples

## Not run:  # internal function
util_filter_names_by_regexps(iris, c("epa", "eta"))

## End(Not run)


Function that calculates height and width values for script_iframe

Description

Function that calculates height and width values for script_iframe

Usage

util_finalize_sizing_hints(sizing_hints)

Arguments

sizing_hints

list containing information for setting the size of the iframe

Value

a list with figure_type_id, w, and h; sizes are given as CSS values and existing elements are kept; w_in_cm and h_in_cm are estimates of the size in centimeters on a typical computer display (as of 2024)


Find externally called function in the stack trace

Description

intended use: error messages for the user

Usage

util_find_external_functions_in_stacktrace(
  sfs = rev(sys.frames()),
  cls = rev(sys.calls())
)

Arguments

sfs

reverse sys.frames to search in

cls

reverse sys.calls to search in

Value

vector of logicals stating, for each index, whether it had been called externally

See Also

Other condition_functions: util_condition_constructor_factory(), util_deparse1(), util_error(), util_find_first_externally_called_functions_in_stacktrace(), util_find_indicator_function_in_callers(), util_message(), util_suppress_warnings(), util_warning()


Find first externally called function in the stack trace

Description

intended use: error messages for the user

Usage

util_find_first_externally_called_functions_in_stacktrace(
  sfs = rev(sys.frames()),
  cls = rev(sys.calls())
)

Arguments

sfs

reverse sys.frames to search in

cls

reverse sys.calls to search in

Value

index into the reversed sys.frames of the first non-dataquieR function in this stack

See Also

Other condition_functions: util_condition_constructor_factory(), util_deparse1(), util_error(), util_find_external_functions_in_stacktrace(), util_find_indicator_function_in_callers(), util_message(), util_suppress_warnings(), util_warning()


Find a free missing code not contained in x

Description

Find a free missing code not contained in x

Usage

util_find_free_missing_code(x)

Arguments

x

a vector of missing codes

Value

a missing code not in x

See Also

Other metadata_management: util_dist_selection(), util_find_var_by_meta(), util_get_var_att_names_of_level(), util_get_vars_in_segment(), util_looks_like_missing(), util_no_value_labels(), util_validate_known_meta(), util_validate_missing_lists()
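The Value section above describes returning a missing code that is not in x. A hedged sketch of one conceivable strategy (the actual internal implementation of util_find_free_missing_code may choose codes differently; the 99999 seed value is an assumption based on the code examples elsewhere in this manual):

```r
# Hedged sketch: return a missing code that is not contained in x.
# The real internal strategy of util_find_free_missing_code may differ.
find_free_missing_code <- function(x) {
  candidate <- max(c(x, 99999), na.rm = TRUE) + 1  # start above the largest known code
  while (candidate %in% x) {
    candidate <- candidate + 1
  }
  candidate
}
find_free_missing_code(c(99997, 99998, 99999))  # 100000
```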


Search for a formal in the stack trace

Description

Similar to dynGet(), find a symbol in the closest data quality indicator function and return its value. Can stop(), if symbol evaluation causes a stop.

Usage

util_find_indicator_function_in_callers(symbol = "resp_vars")

Arguments

symbol

symbol to find

Value

value of the symbol, if available, NULL otherwise

See Also

Other condition_functions: util_condition_constructor_factory(), util_deparse1(), util_error(), util_find_external_functions_in_stacktrace(), util_find_first_externally_called_functions_in_stacktrace(), util_message(), util_suppress_warnings(), util_warning()


Try hard to map a variable

Description

Does not warn on ambiguities, nor if a variable is not found (but in the latter case, it returns ifnotfound).

Usage

util_find_var_by_meta(
  resp_vars,
  meta_data = "item_level",
  label_col = LABEL,
  allowed_sources = c(VAR_NAMES, label_col, LABEL, LONG_LABEL, "ORIGINAL_VAR_NAMES",
    "ORIGINAL_LABEL"),
  target = VAR_NAMES,
  ifnotfound = NA_character_
)

Arguments

resp_vars

variables to map from

meta_data

metadata

label_col

label column to map from; if not given, allowed_sources should be passed entirely

allowed_sources

allowed names to map from (as metadata columns)

target

metadata attribute to map to

ifnotfound

list A list of values to be used if the item is not found: it will be coerced to a list if necessary.

Value

vector of mapped target names of resp_vars

See Also

Other metadata_management: util_dist_selection(), util_find_free_missing_code(), util_get_var_att_names_of_level(), util_get_vars_in_segment(), util_looks_like_missing(), util_no_value_labels(), util_validate_known_meta(), util_validate_missing_lists()
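The mapping described above can be sketched in base R for the default case of mapping labels to VAR_NAMES; util_find_var_by_meta itself is internal and additionally tries the several allowed_sources columns in order (the meta_data columns used below follow the conventions in this manual):

```r
# Conceptual sketch: map labels to VAR_NAMES via item-level metadata.
# The internal function tries several source columns; only LABEL is shown here.
meta_data <- data.frame(VAR_NAMES = c("v001", "v002"),
                        LABEL     = c("AGE_0", "SEX_0"))
meta_data$VAR_NAMES[match(c("SEX_0", "UNKNOWN"), meta_data$LABEL)]
# "v002" for the matched label, NA (the ifnotfound default) for the unmatched one
```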


Move the first row of a data frame to its column names

Description

Move the first row of a data frame to its column names

Usage

util_first_row_to_colnames(dfr)

Arguments

dfr

data.frame

Value

data.frame with first row as column names
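Since util_first_row_to_colnames() is internal, here is a hedged base-R sketch of the described behavior (the implementation shown is an assumption, not the package's code):

```r
# Sketch: move the first row of a data frame to its column names
dfr <- data.frame(V1 = c("id", "1", "2"),
                  V2 = c("age", "42", "37"))
res <- dfr[-1, , drop = FALSE]     # drop the first row
colnames(res) <- unlist(dfr[1, ])  # use it as column names
res                                # columns are now named "id" and "age"
```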


Fix results from merge

Description

This function handles the result of merge() calls, if no.dups = TRUE and suffixes = c("", "") were used.

Usage

util_fix_merge_dups(dfr, stop_if_incompatible = TRUE)

Arguments

dfr

data frame to fix

stop_if_incompatible

logical stop, if the data frame cannot be fixed

See Also

Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_study_var2factor(), util_table_of_vct()


Work around RStudio crashes on parallel calls in some versions on Darwin-based operating systems with R 4

Description

Works around RStudio crashes that occur on parallel calls in some versions on Darwin-based operating systems with R 4.

Usage

util_fix_rstudio_bugs()

Value

invisible null

See Also

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()


Ensure that the sizing hint sticks to the dqr only

Description

Ensure that the sizing hint sticks to the dqr only

Usage

util_fix_sizing_hints(dqr, x)

Arguments

dqr

a dataquieR result

x

a plot object

Value

a list with dqr and x, but fixed


Fix a storr object, if it features the factory attribute

Description

Fix a storr object, if it features the factory attribute

Usage

util_fix_storr_object(my_storr_object)

Arguments

my_storr_object

a storr-object

Value

a (hopefully) working storr_object

See Also

util_storr_factory()


Return a single-page navigation menu floating on the right

Description

if displayed in a dq_report2

Usage

util_float_index_menu(index_menu_table, object)

Arguments

index_menu_table

data.frame columns: links, hovers, texts

object

htmltools tag list; used instead of index_menu_table, if passed

See Also

Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()

Examples

## Not run: 
util_float_index_menu(tibble::tribble(
   ~ links, ~ hovers, ~ texts,
   "http://www.google.de/#xxx", "This is Google", "to Google",
   "http://www.uni-giessen.de/#xxx", "This is Gießen", "cruising on the A45"
))

## End(Not run)



Plots simple HTML tables with background color scale

Description

Plots simple HTML tables with background color scale

Usage

util_formattable(
  tb,
  min_val = min(tb, na.rm = TRUE),
  max_val = max(tb, na.rm = TRUE),
  min_color = c(0, 0, 255),
  max_color = c(255, 0, 0),
  soften = function(x) stats::plogis(x, location = 0.5, scale = 0.1),
  style_header = "font-weight: bold;",
  text_color_mode = c("bw", "gs"),
  hover_texts = NULL,
  escape_all_content = TRUE
)

Arguments

tb

data.frame the table as data.frame with mostly numbers

min_val

numeric minimum value for the numbers in tb

max_val

numeric maximum value for the numbers in tb

min_color

numeric vector with the RGB color values for the minimum color, values between 0 and 255

max_color

numeric vector with the RGB color values for the maximum color, values between 0 and 255

soften

function to be applied to the relative values between 0 and 1 before mapping them to a color

style_header

character to be applied to style the HTML header of the table

text_color_mode

enum bw | gs. Should the text be displayed in black and white or using a grey scale? In both cases, the color will be adapted to the background.

hover_texts

data.frame if not NULL, this data frame contains HTML code displayed when the user's mouse pointer hovers over the corresponding cells of tb.

escape_all_content

logical if TRUE, apply HTML escaping to tb and hover_texts

Value

htmltools compatible object

See Also

util_html_table()

Examples

## Not run: 

tb <- as.data.frame(matrix(ncol = 5, nrow = 5))
tb[] <- sample(1:100, prod(dim(tb)), replace = TRUE)
tb[, 1] <- paste("case", 1:nrow(tb))
htmltools::browsable(util_formattable(tb))
htmltools::browsable(util_formattable(tb[, -1]))

## End(Not run)


Get description for an indicator function

Description

Get description for an indicator function

Usage

util_function_description(fname)

Arguments

fname

the function name

Value

the description


Generate a link for a specific result

Description

for dq_report2

Usage

util_generate_anchor_link(
  varname,
  callname,
  order_context = c("variable", "indicator"),
  name,
  title
)

Arguments

varname

variable to create a link to

callname

function call to create a link to

order_context

link created to variable overview or indicator overview page

name

replaces varname and callname; in that case, it must contain the . separator

title

optional, replaces auto-generated link title

Value

the htmltools tag

See Also

Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()


Generate a tag for a specific result

Description

for dq_report2

Usage

util_generate_anchor_tag(
  varname,
  callname,
  order_context = c("variable", "indicator"),
  name
)

Arguments

varname

variable to create an anchor for

callname

function call to create an anchor for

order_context

anchor created on variable overview or indicator overview page

name

replaces varname and callname; in that case, it must contain the . separator

Value

the htmltools tag

See Also

Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()


Generate an execution/calling plan for computing a report from the metadata

Description

Generate an execution/calling plan for computing a report from the metadata

Usage

util_generate_calls(
  dimensions,
  meta_data,
  label_col,
  meta_data_segment,
  meta_data_dataframe,
  meta_data_cross_item,
  specific_args,
  arg_overrides,
  resp_vars,
  filter_indicator_functions
)

Arguments

dimensions

dimensions Vector of dimensions to address in the report. Allowed values in the vector are Completeness, Consistency, and Accuracy. The generated report will only cover the listed data quality dimensions. Accuracy is computationally expensive, so this dimension is not enabled by default. Completeness should be included if Consistency is included, and Consistency should be included if Accuracy is included, to avoid misleading detections of, e.g., missing codes as outliers; please refer to the data quality concept for more details. Integrity is always included.

meta_data

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data_segment

data.frame – optional: Segment level metadata

meta_data_dataframe

data.frame – optional: Data frame level metadata

meta_data_cross_item

data.frame – optional: Cross-item level metadata

specific_args

list named list of arguments specifically for one of the called functions; the names of the list elements correspond to the indicator functions whose calls should be modified. The elements are lists of arguments.

arg_overrides

list arguments to be passed to all called indicator functions if applicable.

resp_vars

variables to be respected, NULL means to use all.

filter_indicator_functions

character regular expressions; an indicator function is only used for the report, if its name matches at least one of these. If of length zero, no filtering is performed.

Value

a list of calls

See Also

Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()


Generate function calls for a given indicator function

Description

new reporting pipeline v2.0

Usage

util_generate_calls_for_function(
  fkt,
  meta_data,
  label_col,
  meta_data_segment,
  meta_data_dataframe,
  meta_data_cross_item,
  specific_args,
  arg_overrides,
  resp_vars
)

Arguments

fkt

the indicator function's name

meta_data

the item level metadata data frame

label_col

the label column

meta_data_segment

segment level metadata

meta_data_dataframe

data frame level metadata

meta_data_cross_item

cross-item level metadata

specific_args

argument overrides for specific functions

arg_overrides

general argument overrides

resp_vars

variables to be respected

Value

function calls for the given function

See Also

Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()


Convert a dataquieR report v2 to a named list of web pages

Description

Convert a dataquieR report v2 to a named list of web pages

Usage

util_generate_pages_from_report(
  report,
  template,
  disable_plotly,
  progress = progress,
  progress_msg = progress_msg,
  block_load_factor,
  dir,
  my_dashboard
)

Arguments

report

dataquieR report v2.

template

character template to use, only the name, not the path

disable_plotly

logical do not use plotly, even if installed

progress

function lambda for progress in percent – 1-100

progress_msg

function lambda for progress messages

block_load_factor

numeric multiply size of parallel compute blocks by this factor.

dir

character output directory for potential iframes.

my_dashboard

list of class shiny.tag.list featuring a dashboard or missing or NULL

Value

named list, each entry becomes a file with the name of the entry. The contents are HTML objects as used by htmltools.

See Also

Other html: util_dashboard_table(), util_extract_all_ids(), util_get_hovertext()

Examples

## Not run: 
devtools::load_all()
prep_load_workbook_like_file("meta_data_v2")
report <- dq_report2("study_data", dimensions = NULL, label_col = "LABEL");
save(report, file = "report_v2.RData")
report <- dq_report2("study_data", label_col = "LABEL");
save(report, file = "report_v2_short.RData")

## End(Not run)


Create a table summarizing the number of indicators and descriptors in the report

Description

Create a table summarizing the number of indicators and descriptors in the report

Usage

util_generate_table_indicators_descriptors(report)

Arguments

report

a report

Value

a table containing the number of indicators and descriptors created in the report, separated by data quality dimension.


Return the category for a result

Description

messages do not cause any category, warnings are cat3, errors are cat5

Usage

util_get_category_for_result(
  result,
  aspect = c("applicability", "error", "anamat", "indicator_or_descriptor"),
  ...
)

Arguments

result

a dataquieR_resultset2 result

aspect

an aspect/problem category of results (error, applicability error)

...

not used

Value

a category, see util_as_cat()

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Fetch a missing code list from the metadata

Description

get missing codes from metadata (e.g. MISSING_LIST or JUMP_LIST)

Usage

util_get_code_list(
  x,
  code_name,
  split_char = SPLIT_CHAR,
  mdf,
  label_col = VAR_NAMES,
  warning_if_no_list = TRUE,
  warning_if_unsuitable_list = TRUE
)

Arguments

x

variable the name of the variable to retrieve code lists for. Only one variable at a time is supported; this function is not vectorized.

code_name

variable attribute JUMP_LIST or MISSING_LIST: Which codes to retrieve.

split_char

character len = 1. Character(s) used to separate different codes in the metadata, usually |, as in 99999|99998|99997.

mdf

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

warning_if_no_list

logical len = 1. If TRUE, a warning is displayed, if no missing codes are available for a variable.

warning_if_unsuitable_list

logical len = 1. If TRUE, a warning is displayed, if missing codes do not match a variable's data type.

Value

numeric vector of missing codes.

See Also

Other missing_functions: util_all_intro_vars_for_rv(), util_count_expected_observations(), util_filter_missing_list_table_for_rv(), util_is_na_0_empty_or_false(), util_observation_expected(), util_remove_empty_rows(), util_replace_codes_by_NA()
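The split_char parsing described above can be illustrated with plain base R; util_get_code_list itself is internal and additionally validates the codes against the variable's data type and emits the warnings controlled by the two logical arguments:

```r
# Parse a pipe-separated missing-code list as stored in the metadata,
# using the default SPLIT_CHAR "|" mentioned in the split_char argument above.
codes_raw <- "99999|99998|99997"
as.numeric(strsplit(codes_raw, "|", fixed = TRUE)[[1]])
# numeric vector of the three codes
```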


Get colors for each ruleset DQ category

Description

Get colors for each ruleset DQ category

Usage

util_get_colors()

Value

named vector of colors; names are categories (e.g., "1" to "5"), values are colors as HTML RGB hexadecimal strings

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Read additional concept tables

Description

Read additional concept tables

Usage

util_get_concept_info(filename, ...)

Arguments

filename

RDS-file name without extension to read from

...

passed to subset

Value

a data frame


Get encoding from metadata or guess it from data

Description

Get encoding from metadata or guess it from data

Usage

util_get_encoding(
  resp_vars = colnames(study_data),
  study_data,
  label_col,
  meta_data,
  meta_data_dataframe
)

Arguments

resp_vars

variable the names of the measurement variables, if missing or NULL, all variables will be checked

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data

data.frame old name for item_level

meta_data_dataframe

data.frame the data frame that contains the metadata for the data frame level

Value

named vector of valid encoding strings matching resp_vars


Find a foreground color for a background

Description

black or white

Usage

util_get_fg_color(cl)

Arguments

cl

colors

Value

black or white for each cl

See Also

stackoverflow.com
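A hedged sketch of the common luminance heuristic referenced above (the internal implementation of util_get_fg_color may differ in its exact coefficients or threshold):

```r
# Choose black or white text depending on background luminance
# (classic ITU-R BT.601 weights, as in the Stack Overflow reference).
fg_for_bg <- function(cl) {
  rgb <- grDevices::col2rgb(cl)  # 3 x n matrix: rows red, green, blue
  lum <- 0.299 * rgb[1, ] + 0.587 * rgb[2, ] + 0.114 * rgb[3, ]
  ifelse(lum > 128, "black", "white")
}
fg_for_bg(c("#FFFFFF", "#000000", "#2166AC"))  # "black" "white" "white"
```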


Import vector of hover text for tables in the report

Description

Import vector of hover text for tables in the report

Usage

util_get_hovertext(x)

Arguments

x

name of the tables. They are meta_data, meta_data_segment, meta_data_dataframe, meta_data_cross_item, meta_data_item_computation, com_item_missingness, int_datatype_matrix, con_inadmissible_categorical, rulesetformat, gradingruleset

Value

named vector containing the hover text from the file metadata-hovertext.rds in the inst folder. Names correspond to column names in the metadata tables

See Also

Other html: util_dashboard_table(), util_extract_all_ids(), util_generate_pages_from_report()


Get labels for each ruleset DQ category

Description

Get labels for each ruleset DQ category

Usage

util_get_labels_grading_class()

Value

named vector of labels; names are categories (e.g., "1" to "5"), values are labels

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Return messages/warnings/notes/error messages for a result

Description

Return messages/warnings/notes/error messages for a result

Usage

util_get_message_for_result(
  result,
  aspect = c("applicability", "error", "anamat", "indicator_or_descriptor"),
  collapse = "\n<br />\n",
  ...
)

Arguments

result

a dataquieR_resultset2 result

aspect

an aspect/problem category of results

collapse

either a lambda function or a separator for combining multiple messages for the same result

...

not used

Value

hover texts for results with data quality issues, run-time errors, warnings or notes (aka messages)

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


an environment with functions available for REDcap rules

Description

an environment with functions available for REDcap rules

Usage

util_get_redcap_rule_env()

Value

environment

See Also

Other redcap: util_eval_rule()


Get rule sets for DQ grading

Description

Get rule sets for DQ grading

Usage

util_get_rule_sets()

Value

named list; names are the ruleset names, values are data.frames featuring the columns GRADING_RULESET, dqi_parameterstub, indicator_metric, dqi_catnum and dqi_cat_1 to ⁠dqi_cat_<dqi_catnum>⁠

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Get formats for DQ categories

Description

Get formats for DQ categories

Usage

util_get_ruleset_formats()

Value

data.frame columns: categories (e.g., "1" to "5"), color (e.g., "33 102 172", "67 147 195", "227 186 20", "214 96 77", "178 23 43"), label (e.g., "OK", "unclear", "moderate", "important", "critical")

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_thresholds(), util_html_table(), util_sort_by_order()


Get namespace for attributes

Description

Get namespace for attributes

Usage

util_get_storr_att_namespace(my_storr_object)

Arguments

my_storr_object

the storr object

Value

the namespace name


Get the storr object backing a report

Description

Get the storr object backing a report

Usage

util_get_storr_object_from_report(r)

Arguments

r

the dataquieR_resultset2 / report

Value

the storr object holding the results, or NULL, if the report lives in memory only


Get namespace specifically for summary attributes for speed-up

Description

Get namespace specifically for summary attributes for speed-up

Usage

util_get_storr_summ_namespace(my_storr_object)

Arguments

my_storr_object

the storr object

Value

the namespace name


Get the thresholds for grading

Description

Get the thresholds for grading

Usage

util_get_thresholds(indicator_metric, meta_data)

Arguments

indicator_metric

which indicator metric to be classified

meta_data

the item level metadata

Value

named list (names are VAR_NAMES, values are named vectors of intervals, names in the vectors are the category numbers)

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_html_table(), util_sort_by_order()


Get variable attributes of a certain provision level

Description

This function returns all variable attribute names of a certain metadata provision level or of more than one level.

Usage

util_get_var_att_names_of_level(level, cumulative = TRUE)

Arguments

level

level(s) of requirement

cumulative

include all names from more basic levels

Value

all matching variable attribute names

See Also

Other metadata_management: util_dist_selection(), util_find_free_missing_code(), util_find_var_by_meta(), util_get_vars_in_segment(), util_looks_like_missing(), util_no_value_labels(), util_validate_known_meta(), util_validate_missing_lists()


Return all variables in the segment given by segment

Description

Return all variables in the segment given by segment

Usage

util_get_vars_in_segment(segment, meta_data = "item_level", label_col = LABEL)

Arguments

segment

character the segment as specified in STUDY_SEGMENT

meta_data

data.frame the metadata

label_col

character the metadata attribute used for naming the variables

Value

vector of variable names

See Also

Other metadata_management: util_dist_selection(), util_find_free_missing_code(), util_find_var_by_meta(), util_get_var_att_names_of_level(), util_looks_like_missing(), util_no_value_labels(), util_validate_known_meta(), util_validate_missing_lists()


Get the Table with Known Vocabularies

Description

Get the Table with Known Vocabularies

Usage

util_get_voc_tab(.data_frame_list = .dataframe_environment())

Arguments

.data_frame_list

environment cache for loaded data frames

Value

data.frame the (combined) table with known vocabularies


Add labels to ggplot

Description

EXPERIMENTAL

Usage

util_gg_var_label(
  ...,
  meta_data = get("meta_data", parent.frame()),
  label_col = get("label_col", parent.frame())
)

Arguments

...

EXPERIMENTAL

meta_data

the metadata

label_col

the label column

Value

a modified ggplot


Utility function to check whether a variable has no grouping variable assigned

Description

Utility function to check whether a variable has no grouping variable assigned

Usage

util_has_no_group_vars(resp_vars, label_col = LABEL, meta_data = "item_level")

Arguments

resp_vars

variable list the name of a measurement variable

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data

data.frame old name for item_level

Value

boolean


Utility Function Heatmap with 1 Threshold

Description

Function to create a heatmap-like plot given one threshold; currently, it works for percentages only.

Usage

util_heatmap_1th(
  df,
  cat_vars,
  values,
  threshold,
  right_intv,
  invert,
  cols,
  strata
)

Arguments

df

data.frame with data to display as a heatmap.

cat_vars

variable list len=1-2. Variables to group by. Up to 2 group levels supported.

values

variable the name of the percentage variable

threshold

numeric lowest acceptable value

right_intv

logical len=1. If FALSE (default), intervals used to define color ranges in the heatmap are closed on the left side, if TRUE on the right side, respectively.

invert

logical len=1. If TRUE, high values are better and warning colors are used for low values; if FALSE, vice versa.

cols

deprecated, ignored.

strata

variable optional, the name of a variable used for stratification

Value

a list with:

See Also

Other figure_functions: util_optimize_histogram_bins()


If on Windows, hide a file

Description

If on Windows, hide a file

Usage

util_hide_file_windows(fn)

Arguments

fn

the file path + name

Value

invisible(NULL)


Utility function to create histograms

Description

A helper function for simple histograms.

Usage

util_histogram(
  plot_data,
  num_var = colnames(plot_data)[1],
  fill_var = NULL,
  facet_var = NULL,
  nbins_max = 100,
  colors = "#2166AC",
  is_datetime = FALSE
)

Arguments

plot_data

a data.frame without missing values

num_var

column name of the numerical or datetime variable in plot_data (if omitted, the first column is assumed to contain this variable)

fill_var

column name of the categorical variable in plot_data which will be used for coloring stacked histograms

facet_var

column name of the categorical variable in plot_data which will be used to create facets

nbins_max

the maximum number of bins for the histogram (see util_optimize_histogram_bins)

colors

vector of colors, or a single color

is_datetime

if TRUE, the x-axis will be adapted for the datetime format

Value

a histogram
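For orientation, a comparable simple histogram can be built directly with ggplot2 (an Import of this package); the internal util_histogram additionally handles stacked fills, facets, bin optimization, and datetime axes:

```r
# Hedged sketch of a plain histogram similar to what util_histogram produces;
# the default fill color "#2166AC" is taken from the colors argument above.
library(ggplot2)
plot_data <- data.frame(x = rnorm(100))
ggplot(plot_data, aes(x = x)) +
  geom_histogram(bins = 30, fill = "#2166AC")
```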


escape ⁠"⁠

Description

escape ⁠"⁠

Usage

util_html_attr_quote_escape(s)

Arguments

s

haystack

Value

s with " replaced by &quot;
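
The escaping described here amounts to a fixed-string substitution; a minimal base-R sketch (function name hypothetical):

```r
# Replace every double quote by its HTML attribute entity
escape_attr_quote <- function(s) {
  gsub('"', "&quot;", s, fixed = TRUE)
}
```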


Create a dynamic dimension related page for the report

Description

Create a dynamic dimension related page for the report

Usage

util_html_for_dims(
  report,
  use_plot_ly,
  template,
  block_load_factor,
  repsum,
  dir
)

Arguments

report

dataquieR_resultset2 a dq_report2 report

use_plot_ly

logical use plotly, if available.

template

character template to use for the dq_report2 report.

block_load_factor

numeric multiply size of parallel compute blocks by this factor.

repsum

the dataquieR summary, see summary() and dq_report2()

dir

character output directory for potential iframes.

Value

list of arguments for append_single_page() defined locally in util_generate_pages_from_report().


Create a dynamic single variable page for the report

Description

Create a dynamic single variable page for the report

Usage

util_html_for_var(
  report,
  cur_var,
  use_plot_ly,
  template,
  note_meta = c(),
  rendered_repsum,
  dir
)

Arguments

report

dataquieR_resultset2 a dq_report2 report

cur_var

character variable name for single variable pages

use_plot_ly

logical use plotly, if available.

template

character template to use for the dq_report2 report.

note_meta

character notes on the metadata for a single variable (if needed)

rendered_repsum

the dataquieR summary, see summary(), dq_report2() and print.dataquieR_summary()

dir

character output directory for potential iframes.

Value

list of arguments for append_single_page() defined locally in util_generate_pages_from_report().


The jack of all trades device for tables

Description

The jack of all trades device for tables

Usage

util_html_table(
  tb,
  filter = "top",
  columnDefs = NULL,
  autoWidth = FALSE,
  hideCols = character(0),
  rowCallback = DT::JS("function(r,d) {$(r).attr('height', '2em')}"),
  copy_row_names_to_column = !is.null(tb) && length(rownames(tb)) == nrow(tb) &&
    !is.integer(attr(tb, "row.names")) && !all(seq_len(nrow(tb)) == rownames(tb)),
  link_variables = TRUE,
  tb_rownames = FALSE,
  meta_data,
  rotate_headers = FALSE,
  fillContainer = TRUE,
  ...,
  colnames,
  descs,
  options = list(),
  is_matrix_table = FALSE,
  colnames_aliases2acronyms = is_matrix_table && !cols_are_indicatormetrics,
  cols_are_indicatormetrics = FALSE,
  label_col = LABEL,
  output_format = c("RMD", "HTML"),
  dl_fn = "*",
  rotate_for_one_row = FALSE,
  title = dl_fn,
  messageTop = NULL,
  messageBottom = NULL,
  col_tags = NULL,
  searchBuilder = FALSE,
  initial_col_tag,
  init_search,
  additional_init_args,
  additional_columnDefs
)

Arguments

tb

the table as data.frame

filter

passed to DT::datatable

columnDefs

column specifications for the datatables JavaScript object

autoWidth

passed to the datatables JavaScript library

hideCols

columns to hide (by name)

rowCallback

passed to the datatables JavaScript library (with default)

copy_row_names_to_column

add a column 0 with rownames

link_variables

if row names are variable names, convert them to links to the variable-specific reports

tb_rownames

number of columns from the left considered as row-names

meta_data

the data dictionary for labels and similar stuff

rotate_headers

rotate headers by 90 degrees

fillContainer

see DT::datatable

...

passed to DT::datatable

colnames

column names for the table (defaults to colnames(tb))

descs

character descriptions of the columns, shown in the hover box over the column names. If not missing, this overrides the existing descriptions for known column names. If tb has an attribute "description", that attribute overrides everything and appears as the hover text.

options

individually overwrites defaults in options passed to DT::datatable

is_matrix_table

create a heat map like table without padding

colnames_aliases2acronyms

abbreviate column names that represent analysis matrix columns by their acronyms as defined in square.

cols_are_indicatormetrics

logical cannot be TRUE if colnames_aliases2acronyms is TRUE. cols_are_indicatormetrics controls whether the columns are function calls or, if set to TRUE, indicator metrics.

label_col

label column used for mapping labels if link_variables is TRUE and Variables or VAR_NAMES occur in meta_data

output_format

target format, RMD or HTML. For RMD, markdown is used in the output; for HTML, only HTML code is generated.

dl_fn

file name for downloaded table – see https://datatables.net/reference/button/excel

rotate_for_one_row

logical rotate one-row-tables

title

character title for download formats, see https://datatables.net/extensions/buttons/examples/html5/titleMessage.html

messageTop

character subtitle for download formats, see https://datatables.net/extensions/buttons/examples/html5/titleMessage.html

messageBottom

character footer for download formats, see https://datatables.net/extensions/buttons/examples/html5/titleMessage.html

col_tags

list if not NULL, a named list(), names are names used to name a newly created column-group hide/show button, elements are column names belonging to each column groups as defined by colnames

searchBuilder

logical if TRUE, display a searchBuilder-Button.

initial_col_tag

character col_tags entry to activate initially

init_search

list object to initialize searchBuilder, see datatables.net

additional_init_args

list if not missing or NULL, arguments passed to JavaScript, if searchBuilder == TRUE

additional_columnDefs

list additional columnDefs, can be missing or NULL

Value

the table to be added to an RMD/HTML file as an htmlwidgets widget

See Also

util_formattable()

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_sort_by_order()


Utility function for the outlier rule of Hubert and Vandervieren (2008)

Description

function to calculate outliers according to the rule of Hubert and Vandervieren. This function requires the package robustbase

Usage

util_hubert(x)

Arguments

x

numeric data to check for outliers

Value

binary vector

See Also

Other outlier_functions: util_3SD(), util_sigmagap(), util_tukey()
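
The adjusted boxplot rule of Hubert and Vandervieren skews Tukey's fences by the medcouple, a robust skewness measure. util_hubert() delegates this to robustbase; the following self-contained sketch (naive O(n^2) medcouple, no tie handling at the median) only illustrates the idea:

```r
# Sketch of the Hubert & Vandervieren adjusted boxplot rule.
# Returns a logical vector over the finite values of x; TRUE marks outliers.
adj_boxplot_outliers <- function(x) {
  x <- x[is.finite(x)]
  m <- median(x)
  lo <- x[x <= m]; hi <- x[x >= m]
  # medcouple: median of the kernel h over all pairs xi <= m <= xj
  h <- outer(hi, lo, function(xj, xi) ((xj - m) - (m - xi)) / (xj - xi))
  mc <- median(h)
  q <- quantile(x, c(0.25, 0.75)); iqr <- q[2] - q[1]
  if (mc >= 0) {
    fences <- c(q[1] - 1.5 * exp(-4 * mc) * iqr,
                q[2] + 1.5 * exp( 3 * mc) * iqr)
  } else {
    fences <- c(q[1] - 1.5 * exp(-3 * mc) * iqr,
                q[2] + 1.5 * exp( 4 * mc) * iqr)
  }
  x < fences[1] | x > fences[2]
}
```

For symmetric data the medcouple is 0 and the fences reduce to Tukey's classical 1.5 × IQR rule.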


Make it scalable, if it is a figure

Description

This function writes figures to helper files and embeds them in the returned object, which is a scalable iframe. It does not change other objects.

Usage

util_iframe_it_if_needed(it, dir, nm, fkt, sizing_hints, ggthumb)

Arguments

it

htmltools::tag() compatible object

dir

character output directory for potential iframe.

nm

character name for the iframed file, if one is created

fkt

character function name of the indicator function that created it.

sizing_hints

object additional metadata about the natural figure size

ggthumb

ggplot2::ggplot() optional, underlying ggplot2 object for a preview

Value

htmltools::tag() compatible object, maybe now in an iframe


Extract all properties of a ReportSummaryTable

Description

Extract all properties of a ReportSummaryTable

Usage

util_init_respum_tab(x)

Arguments

x

ReportSummaryTable object

Value

list with all properties


Integer breaks for ggplot2

Description

creates integer-only breaks

Usage

util_int_breaks_rounded(x, n = 5)

Arguments

x

the values

n

integer giving the desired number of intervals. Non-integer values are rounded down.

Value

breaks suitable for the breaks argument of scale_*_continuous

Author(s)

Sarah

See Also

StackOverflow

Examples

## Not run: 
big_numbers1 <- data.frame(x = 1:5, y = c(0:1, 0, 1, 0))
big_numbers2 <- data.frame(x = 1:5, y = c(0:1, 0, 1, 0) + 1000000)

big_numbers_plot1 <- ggplot(big_numbers1, aes(x = x, y = y)) +
  geom_point()

big_numbers_plot2 <- ggplot(big_numbers2, aes(x = x, y = y)) +
  geom_point()

big_numbers_plot1 + scale_y_continuous()
big_numbers_plot1 + scale_y_continuous(breaks = util_int_breaks_rounded)

big_numbers_plot2 + scale_y_continuous()
big_numbers_plot2 + scale_y_continuous(breaks = util_int_breaks_rounded)

## End(Not run)


Check for duplicated content

Description

This function tests for duplicate entries in the data set. It is possible to check for duplicated entries by study segment or to consider only selected segments.

Usage

util_int_duplicate_content_dataframe(
  level = c("dataframe"),
  identifier_name_list,
  id_vars_list,
  unique_rows,
  meta_data_dataframe = "dataframe_level",
  ...,
  dataframe_level
)

Arguments

level

character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment").

identifier_name_list

vector the vector that contains the name of the identifier to be used in the assessment. For the study level, corresponds to the names of the different data frames. For the segment level, indicates the name of the segments.

id_vars_list

list the list containing the identifier variables names to be used in the assessment.

unique_rows

vector named. For each data frame, either TRUE/FALSE, or "no_id" to exclude ID variables from the check

meta_data_dataframe

data.frame the data frame that contains the metadata for the data frame level

...

Not used.

dataframe_level

data.frame alias for meta_data_dataframe

Value

a list with

See Also

Other integrity_indicator_functions: util_int_duplicate_content_segment(), util_int_duplicate_ids_dataframe(), util_int_duplicate_ids_segment(), util_int_unexp_records_set_dataframe(), util_int_unexp_records_set_segment()
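
Conceptually, the content check boils down to flagging fully duplicated rows, optionally ignoring ID columns (the no_id option of unique_rows); a base-R sketch with a hypothetical function name:

```r
# Flag all rows whose content (excluding ignore_cols) occurs more than once;
# both the first and later occurrences are marked.
flag_duplicate_rows <- function(df, ignore_cols = character(0)) {
  cols <- setdiff(names(df), ignore_cols)
  dup <- df[, cols, drop = FALSE]
  duplicated(dup) | duplicated(dup, fromLast = TRUE)
}
```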


Check for duplicated content

Description

This function tests for duplicate entries in the data set. It is possible to check for duplicated entries by study segment or to consider only selected segments.

Usage

util_int_duplicate_content_segment(
  level = c("segment"),
  identifier_name_list,
  id_vars_list,
  unique_rows,
  study_data,
  meta_data,
  meta_data_segment = "segment_level",
  segment_level
)

Arguments

level

character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment").

identifier_name_list

vector the vector that contains the name of the identifier to be used in the assessment. For the study level, corresponds to the names of the different data frames. For the segment level, indicates the name of the segments.

id_vars_list

list the list containing the identifier variables names to be used in the assessment.

unique_rows

vector named. For each segment, either TRUE/FALSE, or "no_id" to exclude ID variables from the check

study_data

data.frame the data frame that contains the measurements, mandatory.

meta_data

data.frame the data frame that contains metadata attributes of the study data, mandatory.

meta_data_segment

data.frame – optional: Segment level metadata

segment_level

data.frame alias for meta_data_segment

Value

a list with

See Also

Other integrity_indicator_functions: util_int_duplicate_content_dataframe(), util_int_duplicate_ids_dataframe(), util_int_duplicate_ids_segment(), util_int_unexp_records_set_dataframe(), util_int_unexp_records_set_segment()


Check for duplicated IDs

Description

This function tests for duplicate entries in identifiers. It is possible to check for duplicated identifiers by study segment or to consider only selected segments.

Usage

util_int_duplicate_ids_dataframe(
  level = c("dataframe"),
  id_vars_list,
  identifier_name_list,
  repetitions,
  meta_data_dataframe = "dataframe_level",
  ...,
  dataframe_level
)

Arguments

level

character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment").

id_vars_list

list id variable names for each segment or data frame

identifier_name_list

vector the segments or data frame names being assessed

repetitions

vector an integer vector indicating the number of allowed repetitions in the id_vars.

meta_data_dataframe

data.frame the data frame that contains the metadata for the data frame level

...

not used.

dataframe_level

data.frame alias for meta_data_dataframe

Value

a list with

See Also

Other integrity_indicator_functions: util_int_duplicate_content_dataframe(), util_int_duplicate_content_segment(), util_int_duplicate_ids_segment(), util_int_unexp_records_set_dataframe(), util_int_unexp_records_set_segment()


Check for duplicated IDs

Description

This function tests for duplicate entries in identifiers. It is possible to check for duplicated identifiers by study segment or to consider only selected segments.

Usage

util_int_duplicate_ids_segment(
  level = c("segment"),
  id_vars_list,
  study_segment,
  repetitions,
  study_data,
  meta_data,
  meta_data_segment = "segment_level",
  segment_level
)

Arguments

level

character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment").

id_vars_list

list id variable names for each segment or data frame

study_segment

vector the segments or data frame names being assessed

repetitions

vector an integer vector indicating the number of allowed repetitions in the id_vars. Currently, no repetitions are supported.

study_data

data.frame the data frame that contains the measurements, mandatory.

meta_data

data.frame the data frame that contains metadata attributes of the study data, mandatory.

meta_data_segment

data.frame – optional: Segment level metadata

segment_level

data.frame alias for meta_data_segment

Value

a list with

See Also

Other integrity_indicator_functions: util_int_duplicate_content_dataframe(), util_int_duplicate_content_segment(), util_int_duplicate_ids_dataframe(), util_int_unexp_records_set_dataframe(), util_int_unexp_records_set_segment()
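
At its core, the ID check flags all rows sharing the same combination of id_vars values; a base-R sketch (function name hypothetical):

```r
# Flag every row whose id_vars combination occurs more than once,
# marking both the first and later occurrences.
flag_duplicate_ids <- function(df, id_vars) {
  ids <- df[, id_vars, drop = FALSE]
  duplicated(ids) | duplicated(ids, fromLast = TRUE)
}
```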


Check for unexpected data record set

Description

This function tests that the identifiers match a provided record set. It is possible to check for unexpected data record sets by study segments or to consider only selected segments.

Usage

util_int_unexp_records_set_dataframe(
  level = c("dataframe"),
  id_vars_list,
  identifier_name_list,
  valid_id_table_list,
  meta_data_record_check_list,
  meta_data_dataframe = "dataframe_level",
  ...,
  dataframe_level
)

Arguments

level

character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment").

id_vars_list

list the list containing the identifier variables names to be used in the assessment.

identifier_name_list

list the list that contains the name of the identifier to be used in the assessment. For the study level, corresponds to the names of the different data frames. For the segment level, indicates the name of the segments.

valid_id_table_list

list the reference list with the identifier variable values.

meta_data_record_check_list

character a character vector indicating the type of check to conduct, either "subset" or "exact".

meta_data_dataframe

data.frame the data frame that contains the metadata for the data frame level

...

not used

dataframe_level

data.frame alias for meta_data_dataframe

Value

a list with

See Also

Other integrity_indicator_functions: util_int_duplicate_content_dataframe(), util_int_duplicate_content_segment(), util_int_duplicate_ids_dataframe(), util_int_duplicate_ids_segment(), util_int_unexp_records_set_segment()
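
The two check modes can be sketched with set operations: under "subset", all observed IDs must be expected; under "exact", observed and expected IDs must coincide (function name hypothetical):

```r
# Compare observed against expected identifier values.
check_record_set <- function(observed_ids, expected_ids,
                             mode = c("subset", "exact")) {
  mode <- match.arg(mode)
  unexpected <- setdiff(observed_ids, expected_ids)
  missing    <- setdiff(expected_ids, observed_ids)
  ok <- length(unexpected) == 0 &&
    (mode == "subset" || length(missing) == 0)
  list(ok = ok, unexpected = unexpected, missing = missing)
}
```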


Check for unexpected data record set

Description

This function tests that the identifiers match a provided record set. It is possible to check for unexpected data record sets by study segments or to consider only selected segments.

Usage

util_int_unexp_records_set_segment(
  level = c("segment"),
  id_vars_list,
  identifier_name_list,
  valid_id_table_list,
  meta_data_record_check_list,
  study_data,
  label_col,
  meta_data,
  item_level,
  meta_data_segment = "segment_level",
  segment_level
)

Arguments

level

character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment").

id_vars_list

list the list containing the identifier variables names to be used in the assessment.

identifier_name_list

list the list that contains the name of the identifier to be used in the assessment. For the study level, corresponds to the names of the different data frames. For the segment level, indicates the name of the segments.

valid_id_table_list

list the reference list with the identifier variable values.

meta_data_record_check_list

character a character vector indicating the type of check to conduct, either "subset" or "exact".

study_data

data.frame the data frame that contains the measurements, mandatory.

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data

data.frame the data frame that contains metadata attributes of the study data, mandatory.

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data_segment

data.frame – optional: Segment level metadata

segment_level

data.frame alias for meta_data_segment

Value

a list with

See Also

Other integrity_indicator_functions: util_int_duplicate_content_dataframe(), util_int_duplicate_content_segment(), util_int_duplicate_ids_dataframe(), util_int_duplicate_ids_segment(), util_int_unexp_records_set_dataframe()


Utility function to interpret mathematical interval notation

Description

Utility function to split limit definitions into interpretable elements

Usage

util_interpret_limits(mdata)

Arguments

mdata

data.frame the data frame that contains metadata attributes of study data

Value

augments metadata by interpretable limit columns

See Also

util_validate_known_meta

Other parser_functions: util_parse_assignments(), util_parse_interval(), util_parse_redcap_rule()
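
For illustration, parsing a simple interval string such as "[0;100)" can be sketched with a regular expression; the parser used by the package supports far more (function name and return structure hypothetical):

```r
# Split "[low;upp)"-style notation into limits and closedness.
parse_interval <- function(s) {
  m <- regmatches(s, regexec("^([[(])([^;]+);([^])]+)([])])$", s))[[1]]
  stopifnot(length(m) == 5)  # fail loudly on malformed input
  list(
    incl_low = m[2] == "[",
    low      = as.numeric(m[3]),
    upp      = as.numeric(m[4]),
    incl_upp = m[5] == "]"
  )
}
```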


Check for integer values

Description

This function checks if a variable is integer.

Usage

util_is_integer(x, tol = .Machine$double.eps^0.5)

Arguments

x

the object to test

tol

precision of the detection. Values deviating more than tol from their closest integer value will not be deemed integer.

Value

TRUE or FALSE

See Also

is.integer

Copied from the documentation of is.integer

is.integer detects whether the storage mode of an R object is integer. Usually, users want to know whether the values are integers. As suggested by the documentation of is.integer, is.wholenumber does so.

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()
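
The value-wise check suggested in the documentation of is.integer can be written as:

```r
# From the 'Examples' of ?is.integer: check values, not storage mode
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol
}
```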


Detect falsish values

Description

Detect falsish values

Usage

util_is_na_0_empty_or_false(x)

Arguments

x

a value/vector of values

Value

vector of logical values: TRUE, wherever x is somehow empty

See Also

Other missing_functions: util_all_intro_vars_for_rv(), util_count_expected_observations(), util_filter_missing_list_table_for_rv(), util_get_code_list(), util_observation_expected(), util_remove_empty_rows(), util_replace_codes_by_NA()
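
A rough base-R approximation of this "falsish" test (function name hypothetical; unlike, possibly, the original, the character string "0" also matches in this sketch):

```r
# TRUE for NA, 0, "" and FALSE; FALSE otherwise.
falsish <- function(x) {
  cmp <- x == 0 | x == ""   # NA where x is NA
  is.na(x) | (!is.na(cmp) & cmp)
}
```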


Create a predicate function to check for certain numeric properties

Description

Useful, e.g., for util_expect_data_frame and util_expect_scalar. The generated function returns one TRUE or FALSE, even if called with a vector.

Usage

util_is_numeric_in(
  min = -Inf,
  max = +Inf,
  whole_num = FALSE,
  finite = FALSE,
  set = NULL
)

Arguments

min

if given, minimum for numeric values

max

if given, maximum for numeric values

whole_num

if TRUE, expect a whole number

finite

Are Inf and -Inf invalid values? (FALSE by default)

set

if given, a set, the value must be in (see util_match_arg)

Value

a function that checks an x for the properties.

See Also

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()

Examples

## Not run: 
util_is_numeric_in(min = 0)(42)
util_is_numeric_in(min = 43)(42)
util_is_numeric_in(max = 3)(42)
util_is_numeric_in(whole_num = TRUE)(42)
util_is_numeric_in(whole_num = TRUE)(42.1)
util_is_numeric_in(set = c(1, 3, 5))(1)
util_is_numeric_in(set = c(1, 3, 5))(2)

## End(Not run)


Detect un-disclosed ggplot

Description

Detect un-disclosed ggplot

Usage

util_is_svg_object(x)

Arguments

x

the object to check

Value

TRUE or FALSE


Check, if x is a try-error

Description

Check, if x is a try-error

Usage

util_is_try_error(x)

Arguments

x

Value

logical(1) indicating whether x is a try-error
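
Such a check boils down to class inheritance (function name hypothetical):

```r
# try() tags failed evaluations with the S3 class "try-error"
is_try_error <- function(x) inherits(x, "try-error")
```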


Check, if x contains valid missing codes

Description

Check, if x contains valid missing codes

Usage

util_is_valid_missing_codes(x)

Arguments

x

a vector of values

Value

TRUE or FALSE

See Also

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()


Called by the active binding function for .manual

Description

Called by the active binding function for .manual

Usage

util_load_manual(
  rebuild = FALSE,
  target = "inst/manual.RData",
  target2 = "inst/indicator_or_descriptor.RData",
  man_hash = ""
)

Arguments

rebuild

rebuild the cache

target

file for ..manual

target2

file for ..indicator_or_descriptor

man_hash

internal use: hash sum over the manual to prevent a rebuild if nothing changed.

See Also

Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()


Check for repetitive values using the digits 8 or 9 only

Description

Non-finite values (see is.finite) are also reported as missing codes. Additionally, all missing codes must be composed of the digits 8 and 9 only, and they must be the largest values of a variable.

Usage

util_looks_like_missing(x, n_rules = 1)

Arguments

x

numeric vector to test

n_rules

numeric Only outlying values can be missing codes; at least n_rules rules from acc_univariate_outlier must match

Value

logical indicates for each value in x, if it looks like a missing code

See Also

acc_univariate_outlier

Other metadata_management: util_dist_selection(), util_find_free_missing_code(), util_find_var_by_meta(), util_get_var_att_names_of_level(), util_get_vars_in_segment(), util_no_value_labels(), util_validate_known_meta(), util_validate_missing_lists()
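
The digit rule alone (ignoring the outlier requirement controlled by n_rules) can be sketched as (function name hypothetical):

```r
# TRUE where a value is non-finite or written using only the digits 8 and 9
looks_like_missing_code <- function(x) {
  !is.finite(x) | grepl("^[89]+$", as.character(x))
}
```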


Rename columns of a SummaryTable (or Segment, ...) to look nice

Description

Rename columns of a SummaryTable (or Segment, ...) to look nice

Usage

util_make_data_slot_from_table_slot(Table)

Arguments

Table

data.frame, a table

Value

renamed table

See Also

Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_order_by_order(), util_set_size()


Maps label column metadata on study data variable names

Description

Maps a certain label column from the metadata to the study data frame.

Usage

util_map_all(label_col = VAR_NAMES, study_data, meta_data)

Arguments

label_col

the variable of the metadata that contains the variable names of the study data

study_data

the name of the data frame that contains the measurements

meta_data

the name of the data frame that contains metadata attributes of study data

Value

list with slot df with a study data frame with mapped column names

See Also

Other mapping: util_map_by_largest_prefix(), util_map_labels(), util_recode()
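
The core renaming step can be sketched with match(), which keeps the mapping correct even if metadata rows and study-data columns are ordered differently (function name hypothetical):

```r
# Rename study-data columns from one metadata attribute to another.
map_labels_onto <- function(study_data, meta_data,
                            from = "VAR_NAMES", to = "LABEL") {
  idx <- match(colnames(study_data), meta_data[[from]])
  stopifnot(!anyNA(idx))  # every column must be described in the metadata
  colnames(study_data) <- meta_data[[to]][idx]
  study_data
}
```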


Map based on largest common prefix

Description

Map based on largest common prefix

Usage

util_map_by_largest_prefix(
  needle,
  haystack,
  split_char = "_",
  remove_var_suffix = TRUE
)

Arguments

needle

character(1) item to search

haystack

character items to find the entry sharing the largest prefix with needle

split_char

character(1) to split entries to atomic words (like letters, if "" or snake_elements, if "_")

remove_var_suffix

logical(1) remove potential suffix after the first dot ., before finding needle in haystack.

Value

character(1) with the fitting function name or NA_character_

See Also

Other mapping: util_map_all(), util_map_labels(), util_recode()

Examples

## Not run:  # internal function
util_map_by_largest_prefix(
  "acc_distributions_loc_ecdf_observer_time",
  names(dataquieR:::.manual$titles)
)
util_map_by_largest_prefix(
  "acc_distributions_loc_observer_time",
  names(dataquieR:::.manual$titles)
)
util_map_by_largest_prefix(
  "acc_distributions_loc_ecdf",
  names(dataquieR:::.manual$titles)
)
util_map_by_largest_prefix(
  "acc_distributions_loc",
  names(dataquieR:::.manual$titles)
)

## End(Not run)


Support function to allocate labels to variables

Description

Map variables to certain attributes, e.g. by default their labels.

Usage

util_map_labels(
  x,
  meta_data = "item_level",
  to = LABEL,
  from = VAR_NAMES,
  ifnotfound,
  warn_ambiguous = FALSE
)

Arguments

x

character variable names, character vector, see parameter from

meta_data

data.frame old name for item_level

to

character variable attribute to map to

from

character variable identifier to map from

ifnotfound

list A list of values to be used if the item is not found: it will be coerced to a list if necessary.

warn_ambiguous

logical print a warning if mapping variables from from to to produces ambiguous identifiers.

Details

This function basically calls colnames(study_data) <- meta_data$LABEL, ensuring correct merging/joining of study data columns to the corresponding metadata rows, even if the orders differ. If a variable/study_data-column name is not found in meta_data[[from]] (default from = VAR_NAMES), either stop is called or, if ifnotfound has been assigned a value, that value is returned. See mget, which is internally used by this function.

The function not only maps to the LABEL column; to can be any metadata variable attribute, so the function can also be used to get, e.g., all HARD_LIMITS from the metadata.

Value

a character vector with:

See Also

Other mapping: util_map_all(), util_map_by_largest_prefix(), util_recode()

Examples

## Not run: 
meta_data <- prep_create_meta(
  VAR_NAMES = c("ID", "SEX", "AGE", "DOE"),
  LABEL = c("Pseudo-ID", "Gender", "Age", "Examination Date"),
  DATA_TYPE = c(DATA_TYPES$INTEGER, DATA_TYPES$INTEGER, DATA_TYPES$INTEGER,
                 DATA_TYPES$DATETIME),
  MISSING_LIST = ""
)
stopifnot(all(prep_map_labels(c("AGE", "DOE"), meta_data) == c("Age",
                                                 "Examination Date")))

## End(Not run)

Utility function to create a margins plot for binary variables

Description

Utility function to create a margins plot for binary variables

Usage

util_margins_bin(
  resp_vars = NULL,
  group_vars = NULL,
  co_vars = NULL,
  threshold_type = NULL,
  threshold_value,
  min_obs_in_subgroup = 5,
  min_obs_in_cat = 5,
  caption = NULL,
  ds1,
  label_col,
  adjusted_hint = "",
  title = "",
  sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
    dataquieR.acc_margins_sort_default),
  include_numbers_in_figures = getOption("dataquieR.acc_margins_num",
    dataquieR.acc_margins_num_default)
)

Arguments

resp_vars

variable the name of the binary measurement variable

group_vars

variable the name of the observer, device or reader variable

co_vars

variable list a vector of covariables, e.g. age and sex for adjustment

threshold_type

enum empirical | user | none. See acc_margins.

threshold_value

numeric see acc_margins

min_obs_in_subgroup

integer from=0. This optional argument specifies the minimum number of observations that is required to include a subgroup (level) of the group_var in the analysis.

min_obs_in_cat

integer This optional argument specifies the minimum number of observations that is required to include a category (level) of the outcome (resp_vars) in the analysis.

caption

string a caption for the plot (optional, typically used to report the coding of cases and control group)

ds1

data.frame the data frame that contains the measurements, after replacing missing value codes by NA, excluding inadmissible values and transforming categorical variables to factors.

label_col

variable attribute the name of the column in the metadata with labels of variables

adjusted_hint

character hint, if adjusted for co_vars

title

character title for the plot

sort_group_var_levels

logical Should the levels of the grouping variable be sorted descending by the number of observations (in the figure)?

include_numbers_in_figures

logical Should the figure report the number of observations for each level of the grouping variable?

Value

A table and a matching plot.


Utility function to create a margins plot from linear regression models

Description

Utility function to create a margins plot from linear regression models

Usage

util_margins_lm(
  resp_vars = NULL,
  group_vars = NULL,
  co_vars = NULL,
  threshold_type = NULL,
  threshold_value,
  min_obs_in_subgroup = 5,
  ds1,
  label_col,
  levels = NULL,
  adjusted_hint = "",
  title = "",
  n_violin_max = getOption("dataquieR.max_group_var_levels_with_violins",
    dataquieR.max_group_var_levels_with_violins_default),
  sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
    dataquieR.acc_margins_sort_default),
  include_numbers_in_figures = getOption("dataquieR.acc_margins_num",
    dataquieR.acc_margins_num_default)
)

Arguments

resp_vars

variable the name of the measurement variable

group_vars

variable the name of the observer, device or reader variable

co_vars

variable list a vector of covariables, e.g. age and sex for adjustment

threshold_type

enum empirical | user | none. See acc_margins.

threshold_value

numeric see acc_margins

min_obs_in_subgroup

integer from=0. This optional argument specifies the minimum number of observations that is required to include a subgroup (level) of the group_var in the analysis.

ds1

data.frame the data frame that contains the measurements, after replacing missing value codes by NA, excluding inadmissible values and transforming categorical variables to factors.

label_col

variable attribute the name of the column in the metadata with labels of variables

levels

levels() of the original ordinal variable, if applicable. Used for axis tick labels.

adjusted_hint

character hint, if adjusted for co_vars

title

character title for the plot

n_violin_max

integer from=0. This optional argument specifies the maximum number of levels of the group_var for which violin plots will be shown in the figure.

sort_group_var_levels

logical Should the levels of the grouping variable be sorted descending by the number of observations (in the figure)?

include_numbers_in_figures

logical Should the figure report the number of observations for each level of the grouping variable?

Value

A table and a matching plot.


Utility function to create a plot similar to the margins plots for nominal variables

Description

This function is still under development. It uses the nnet package to compute multinomial logistic regression models.

Usage

util_margins_nom(
  resp_vars = NULL,
  group_vars = NULL,
  co_vars = NULL,
  min_obs_in_subgroup = 5,
  min_obs_in_cat = 5,
  ds1,
  label_col,
  adjusted_hint = "",
  title = "",
  sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
    dataquieR.acc_margins_sort_default)
)

Arguments

resp_vars

variable the name of the nominal measurement variable

group_vars

variable the name of the observer, device or reader variable

co_vars

variable list a vector of covariables, e.g. age and sex for adjustment

min_obs_in_subgroup

integer from=0. This optional argument specifies the minimum number of observations that is required to include a subgroup (level) of the group_var in the analysis.

min_obs_in_cat

integer This optional argument specifies the minimum number of observations that is required to include a category (level) of the outcome (resp_vars) in the analysis.

ds1

data.frame the data frame that contains the measurements, after replacing missing value codes by NA, excluding inadmissible values and transforming categorical variables to factors.

label_col

variable attribute the name of the column in the metadata with labels of variables

adjusted_hint

character hint, if adjusted for co_vars

title

character title for the plot

sort_group_var_levels

logical Should the levels of the grouping variable be sorted descending by the number of observations (in the figure)?

Value

A table and a matching plot.


Utility function to create a plot similar to the margins plots for ordinal variables

Description

This function is still under development. It uses the ordinal package to compute ordered regression models.

Usage

util_margins_ord(
  resp_vars = NULL,
  group_vars = NULL,
  co_vars = NULL,
  min_obs_in_subgroup = 5,
  min_subgroups = 5,
  ds1,
  label_col,
  adjusted_hint = "",
  title = "",
  sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
    dataquieR.acc_margins_sort_default)
)

Arguments

resp_vars

variable the name of the ordinal measurement variable

group_vars

variable the name of the observer, device or reader variable

co_vars

variable list a vector of covariables, e.g. age and sex for adjustment

min_obs_in_subgroup

integer from=0. This optional argument specifies the minimum number of observations that is required to include a subgroup (level) of the group_var in the analysis.

min_subgroups

integer from=3. The model provided by the ordinal package requires at least three different subgroups (levels) of the group_var. Users might want to increase this threshold to obtain results only for variables with a sufficient number of group_var levels (observers, devices, etc.).

ds1

data.frame the data frame that contains the measurements, after replacing missing value codes by NA, excluding inadmissible values and transforming categorical variables to factors.

label_col

variable attribute the name of the column in the metadata with labels of variables

adjusted_hint

character hint, if adjusted for co_vars

title

character title for the plot

sort_group_var_levels

logical Should the levels of the grouping variable be sorted descending by the number of observations (in the figure)?

Value

A table and a matching plot.


Utility function to create a margins plot from Poisson regression models

Description

Utility function to create a margins plot from Poisson regression models

Usage

util_margins_poi(
  resp_vars = NULL,
  group_vars = NULL,
  co_vars = NULL,
  threshold_type = NULL,
  threshold_value,
  min_obs_in_subgroup = 5,
  ds1,
  label_col,
  adjusted_hint = "",
  title = "",
  sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
    dataquieR.acc_margins_sort_default),
  include_numbers_in_figures = getOption("dataquieR.acc_margins_num",
    dataquieR.acc_margins_num_default)
)

Arguments

resp_vars

variable the name of the measurement variable

group_vars

variable the name of the observer, device or reader variable

co_vars

variable list a vector of covariables, e.g. age and sex for adjustment

threshold_type

enum empirical | user | none. See acc_margins.

threshold_value

numeric see acc_margins

min_obs_in_subgroup

integer from=0. This optional argument specifies the minimum number of observations that is required to include a subgroup (level) of the group_var in the analysis.

ds1

data.frame the data frame that contains the measurements, after replacing missing value codes by NA, excluding inadmissible values and transforming categorical variables to factors.

label_col

variable attribute the name of the column in the metadata with labels of variables

adjusted_hint

character hint, if adjusted for co_vars

title

character title for the plot

sort_group_var_levels

logical Should the levels of the grouping variable be sorted descending by the number of observations (in the figure)?

include_numbers_in_figures

logical Should the figure report the number of observations for each level of the grouping variable?

Value

A table and a matching plot.


dataquieR version of match.arg

Description

does not support partial matching, but will display the most likely match as a warning/error.

Usage

util_match_arg(arg, choices, several_ok = FALSE, error = TRUE)

Arguments

arg

the argument

choices

the choices

several_ok

allow more than one entry in arg

error

stop(), if arg is not in choices (warns and cleans arg, otherwise)

Value

"cleaned" arg

See Also

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()
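The behavior described above can be sketched in a few lines of base R: exact matching only, with the closest choice (found via utils::adist) reported in the warning or error. This is an illustrative sketch, not the package's implementation; match_arg_strict is a made-up name.

```r
# Illustrative sketch (not the package's implementation): no partial
# matching, but report the most likely intended choice on mismatch.
match_arg_strict <- function(arg, choices, error = TRUE) {
  bad <- setdiff(arg, choices)
  if (length(bad) > 0) {
    # find the choice with the smallest edit distance to the bad entry
    closest <- choices[which.min(utils::adist(bad[[1]], choices))]
    msg <- sprintf("%s is not allowed, did you mean %s?",
                   dQuote(bad[[1]]), dQuote(closest))
    if (error) stop(msg) else warning(msg)
  }
  intersect(arg, choices)  # the "cleaned" arg
}
ok <- match_arg_strict("barplot", c("histogram", "barplot"))
cleaned <- suppressWarnings(
  match_arg_strict("histgram", c("histogram", "barplot"), error = FALSE))
```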


Combine data frames by merging

Description

This is an extension of merge working for a list of data frames.

Usage

util_merge_data_frame_list(data_frames, id_vars)

Arguments

data_frames

list of data.frames

id_vars

character the variable(s) to merge the data frames by. Each of them must exist in all data frames.

Value

data.frame combination of data frames

See Also

prep_merge_study_data

Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_fix_merge_dups(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_study_var2factor(), util_table_of_vct()
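The core of merging a list of data frames on shared id columns can be sketched with base R's Reduce() and merge(); util_merge_data_frame_list adds further checks. merge_df_list is an illustrative name, not the package's code.

```r
# Hedged sketch: fold a list of data frames into one by repeated merge().
merge_df_list <- function(data_frames, id_vars) {
  # each id variable must exist in every data frame
  stopifnot(all(vapply(data_frames,
                       function(df) all(id_vars %in% colnames(df)),
                       logical(1))))
  Reduce(function(x, y) merge(x, y, by = id_vars, all = TRUE), data_frames)
}
a <- data.frame(id = 1:3, x = c(10, 20, 30))
b <- data.frame(id = 2:4, y = c("b", "c", "d"))
m <- merge_df_list(list(a, b), id_vars = "id")
```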


Produce a condition message with a useful short stack trace.

Description

Produce a condition message with a useful short stack trace.

Usage

util_message(
  m,
  ...,
  applicability_problem = NA,
  intrinsic_applicability_problem = NA,
  integrity_indicator = "none",
  level = 0,
  immediate,
  title = "",
  additional_classes = c()
)

Arguments

m

a message or a condition

...

arguments for sprintf on m, if m is a character

applicability_problem

logical TRUE, if this is an applicability issue, that is, the information needed for the computation is missing (an error that indicates missing metadata), or the requirements of the stopped function were not met, e.g., a bar plot was called for metric data. Applicability problems can be logical or empirical; empirical is the default, if the argument intrinsic_applicability_problem is left unset or set to FALSE.

intrinsic_applicability_problem

logical TRUE, if this is a logical applicability issue, that is, the computation makes no sense (for example, an error of unsuitable resp_vars). Intrinsic/logical applicability problems are also applicability problems. Non-logical applicability problems are called empirical applicability problems.

integrity_indicator

character if the message reports an integrity problem, the abbreviation of the corresponding indicator.

level

integer level of the message (defaults to 0). Higher levels are more severe.

immediate

logical not used.

additional_classes

character additional classes the thrown condition object should inherit from, first.

Value

condition the condition object, if the execution is not stopped

See Also

Other condition_functions: util_condition_constructor_factory(), util_deparse1(), util_error(), util_find_external_functions_in_stacktrace(), util_find_first_externally_called_functions_in_stacktrace(), util_find_indicator_function_in_callers(), util_suppress_warnings(), util_warning()


Select really numeric variables

Description

Reduce resp_vars to those that are either float or integer without VALUE_LABELS, i.e., likely numeric and not a factor

Usage

util_no_value_labels(resp_vars, meta_data, label_col, warn = TRUE, stop = TRUE)

Arguments

resp_vars

variable list len=1-2. the name of the continuous measurement variable

meta_data

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

warn

logical warn about removed variable names

stop

logical stop on no matching resp_var

Value

character vector of matching resp_vars.

See Also

Other metadata_management: util_dist_selection(), util_find_free_missing_code(), util_find_var_by_meta(), util_get_var_att_names_of_level(), util_get_vars_in_segment(), util_looks_like_missing(), util_validate_known_meta(), util_validate_missing_lists()


Distribute CODE_LIST_TABLE in item level metadata

Description

fills the columns MISSING_LIST_TABLE and VALUE_LABEL_TABLE from CODE_LIST_TABLE, if applicable

Usage

util_normalize_clt(meta_data)

Arguments

meta_data

data.frame old name for item_level

Value

meta_data, but CODE_LIST_TABLE column is distributed to the columns VALUE_LABEL_TABLE and MISSING_LIST_TABLE, respectively.


Normalize and check cross-item-level metadata

Description

Normalize and check cross-item-level metadata

Usage

util_normalize_cross_item(
  meta_data = "item_level",
  meta_data_cross_item = "cross-item_level",
  label_col = LABEL
)

Arguments

meta_data

meta_data

meta_data_cross_item

cross-item-level metadata

label_col

character label column to use for variable naming

Value

normalized and checked cross-item-level metadata

See Also

meta_data_cross()

Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross


Convert VALUE_LABELS to separate tables

Description

Convert VALUE_LABELS to separate tables

Usage

util_normalize_value_labels(
  meta_data = "item_level",
  max_value_label_len = getOption("dataquieR.MAX_VALUE_LABEL_LEN",
    dataquieR.MAX_VALUE_LABEL_LEN_default)
)

Arguments

meta_data

data.frame old name for item_level

max_value_label_len

integer maximum length for value labels

Value

data.frame metadata with VALUE_LABEL_TABLE instead of VALUE_LABELS (or none of these, if absent)

Examples

## Not run: 
prep_purge_data_frame_cache()
prep_load_workbook_like_file("meta_data_v2")
util_normalize_value_labels()
prep_add_data_frames(test_labs =
  tibble::tribble(~ CODE_VALUE, ~ CODE_LABEL, 17L, "Test", 19L, "Test",
    17L, "TestX"))
il <- prep_get_data_frame("item_level")
if (!VALUE_LABEL_TABLE %in% colnames(il)) {
  il$VALUE_LABEL_TABLE <- NA_character_
}
il$VALUE_LABEL_TABLE[[1]] <- "test_labs"
il$VALUE_LABELS[[1]] <- "17 = TestY"
prep_add_data_frames(item_level = il)
util_normalize_value_labels()

## End(Not run)


Detect Expected Observations

Description

For each participant, check, if an observation was expected, given the PART_VARS from item-level metadata

Usage

util_observation_expected(
  rv,
  study_data,
  meta_data,
  label_col = LABEL,
  expected_observations = c("HIERARCHY", "ALL", "SEGMENT")
)

Arguments

rv

character the response variable for which a value may be expected

study_data

study_data

meta_data

meta_data

label_col

character mapping attribute colnames(study_data) vs. meta_data[label_col]

expected_observations

enum HIERARCHY | ALL | SEGMENT. How should PART_VARS be handled:

- ALL: ignore; all observations are expected
- SEGMENT: if the PART_VAR is 1, an observation is expected
- HIERARCHY (the default): if the PART_VAR is 1 for this variable and also for all PART_VARS further up in the hierarchy, an observation is expected

Value

a vector with TRUE or FALSE for each row of study_data, if for study_data[rv] a value is expected.

See Also

Other missing_functions: util_all_intro_vars_for_rv(), util_count_expected_observations(), util_filter_missing_list_table_for_rv(), util_get_code_list(), util_is_na_0_empty_or_false(), util_remove_empty_rows(), util_replace_codes_by_NA()


Utility function observations in subgroups

Description

This function uses !is.na to count the number of non-missing observations in subgroups of the data and in a set of user-defined response variables. Some applications require that the number of observations per subgroup (e.g., per factor level) exceeds a user-defined minimum.

Usage

util_observations_in_subgroups(x, rvs)

Arguments

x

data frame

rvs

variable names

Value

matrix of flags

See Also

prep_min_obs_level

util_check_group_levels

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_stop_if_not(), util_warn_unordered()
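The counting idea can be sketched with base R's tapply(); obs_in_subgroups is an illustrative stand-in, not the package's implementation (which returns a matrix of flags for several response variables at once).

```r
# Sketch: count non-missing observations (!is.na) per subgroup and flag
# whether each level reaches a user-defined minimum count.
obs_in_subgroups <- function(x, rv, group, min_obs = 5) {
  counts <- tapply(!is.na(x[[rv]]), x[[group]], sum)
  counts >= min_obs
}
d <- data.frame(val = c(1, NA, 3, 4, NA, 6, 7, 8),
                grp = rep(c("A", "B"), each = 4))
flags <- obs_in_subgroups(d, "val", "grp", min_obs = 3)
```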


Creates a Link to our Website

Description

i.e., to a vignette on the website

Usage

util_online_ref(fkt_name)

Arguments

fkt_name

character function name to generate a link for

Value

character the link


Utility function to compute and optimize bin breaks for histograms

Description

Utility function to compute and optimize bin breaks for histograms

Usage

util_optimize_histogram_bins(
  x,
  interval_freedman_diaconis = NULL,
  nbins_max = 100,
  cuts = NULL
)

Arguments

x

a vector of data values (numeric or datetime)

interval_freedman_diaconis

range of values which should be included to calculate the Freedman-Diaconis bandwidth (e.g., for con_limit_deviations only values within limits), given in interval notation (e.g., [0;100])

nbins_max

the maximum number of bins for the histogram. Strong outliers can cause many narrow bins, which might even be too narrow to be plotted. This also results in large files and rendering problems, so it is sensible to limit the number of bins. The function will produce a message if it reduces the number of bins in such a case. Possible reasons are unspecified missing value codes, minimum or maximum values far away from most of the data values, a small number of unique values, or (for con_limit_deviations) no or few values within limits.

cuts

a vector of values at which breaks between bins should occur

Value

a list with bin breaks, if needed separated for each segment of the plot

See Also

Other figure_functions: util_heatmap_1th()
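A minimal sketch of the Freedman-Diaconis rule that such bin optimization typically starts from, with a cap corresponding to nbins_max; the actual function performs additional tuning (e.g., for datetimes and given cuts). fd_breaks is a hypothetical helper name.

```r
# Sketch: Freedman-Diaconis bin width, capped at a maximum bin count so
# that strong outliers cannot inflate the number of bins.
fd_breaks <- function(x, nbins_max = 100) {
  x <- x[is.finite(x)]
  bw <- 2 * IQR(x) / length(x)^(1/3)        # Freedman-Diaconis bandwidth
  nbins <- ceiling(diff(range(x)) / bw)
  if (nbins > nbins_max) nbins <- nbins_max  # avoid too many narrow bins
  seq(min(x), max(x), length.out = nbins + 1)
}
# a single strong outlier (500) would otherwise force thousands of bins
br <- fd_breaks(c(seq(0, 1, length.out = 1000), 500))
```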


Utility function to distribute points across a time variable

Description

Utility function to distribute points across a time variable

Usage

util_optimize_sequence_across_time_var(
  time_var_data,
  n_points,
  prop_grid = 0.5
)

Arguments

time_var_data

vector of the data points of the time variable

n_points

maximum number of points to distribute across the time variable (minimum: 3)

prop_grid

proportion of points given in n_points that should be distributed in an equally spaced grid across the time variable (minimum: 0.1, maximum: 1). The remaining proportion of points will be spaced according to the distribution of the time variable's data points.

Value

a sequence of points in datetime format


Get the order of a vector with general order given in some other vector

Description

Get the order of a vector with general order given in some other vector

Usage

util_order_by_order(x, order, ...)

Arguments

x

the vector

order

the "order" vector, i.e., the template giving the general order

...

additional arguments passed to order

See Also

Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_set_size()

Examples

## Not run: 
util_order_by_order(c("a", "b", "a", "c", "d"), letters)

## End(Not run)
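An equivalent one-liner in base R shows the idea behind the example above: match() maps each element of x onto its position in the template vector, and order() sorts by those positions. This is a sketch of the concept, not the package's code.

```r
# Sketch: order x according to the general order given in a template.
order_by_order <- function(x, order, ...) {
  base::order(match(x, order), ...)
}
x <- c("b", "d", "a", "c")
sorted_x <- x[order_by_order(x, letters)]
```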

Utility function parallel version of purrr::pmap

Description

Parallel version of purrr::pmap.

Usage

util_par_pmap(
  .l,
  .f,
  ...,
  cores = list(mode = "socket", cpus = util_detect_cores(), logging = FALSE,
    load.balancing = TRUE),
  use_cache = FALSE
)

Arguments

.l

data.frame with one call per line and one function argument per column

.f

function to call with the arguments from .l

...

additional, static arguments for calling .f

cores

number of cpu cores to use or a (named) list with arguments for parallelMap::parallelStart or NULL, if parallel has already been started by the caller.

use_cache

logical set to FALSE to avoid re-using study data and metadata already distributed to a parallel cluster

Value

list of results of the function calls

Author(s)

Aurèle

S Struckmann

See Also

purrr::pmap

Stack Overflow post

Other process_functions: util_abbreviate(), util_all_is_integer(), util_attach_attr(), util_bQuote(), util_backtickQuote(), util_coord_flip(), util_extract_matches(), util_setup_rstudio_job(), util_suppress_output()


Utility function to parse assignments

Description

This function parses labels and level assignments in the format 1 = male | 2 = female. The function also handles m = male | f = female, but this would not match the metadata concept. A split character other than the default SPLIT_CHAR can be given, but this would also violate the metadata concept.

Usage

util_parse_assignments(
  text,
  split_char = SPLIT_CHAR,
  multi_variate_text = FALSE,
  split_on_any_split_char = FALSE
)

Arguments

text

Text to be parsed

split_char

Character separating assignments; may be a vector, then all will be tried and the most likely matching one will be returned as attribute split_char of the result.

multi_variate_text

don't paste text but parse element-wise

split_on_any_split_char

split on any of the given split characters, if more than one is given.

Value

the parsed assignments as a named list

See Also

Other parser_functions: util_interpret_limits(), util_parse_interval(), util_parse_redcap_rule()

Examples

## Not run: 
md <- prep_get_data_frame("meta_data")
vl <- md$VALUE_LABELS
vl[[50]] <- "low<medium < high"
a <- util_parse_assignments(vl, split_char = c(SPLIT_CHAR, "<"),
  multi_variate_text = TRUE)
b <- util_parse_assignments(vl, split_char = c(SPLIT_CHAR, "<"),
  split_on_any_split_char = TRUE, multi_variate_text = TRUE)
is_ordered <- vapply(a, attr, "split_char", FUN.VALUE = character(1)) == "<"
md$VALUE_LABELS[[50]] <- "low<medium < high"
md$VALUE_LABELS[[51]] <- "1 = low< 2=medium < 3=high"
md$VALUE_LABELS[[49]] <- "2 = medium< 1=low < 3=high" # counter intuitive
with_sl <- prep_scalelevel_from_data_and_metadata(study_data = "study_data",
  meta_data = md)
View(with_sl[, union(SCALE_LEVEL, colnames(with_sl))])

## End(Not run)
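The basic parsing step for assignments like 1 = male | 2 = female can be sketched with strsplit(); the package's parser additionally handles multiple candidate split characters, multivariate text, and other edge cases. parse_assignments here is a simplified stand-in.

```r
# Sketch: split "1 = male | 2 = female" into a named list.
parse_assignments <- function(text, split_char = "|") {
  parts <- strsplit(text, split_char, fixed = TRUE)[[1]]
  kv <- strsplit(parts, "=", fixed = TRUE)   # "code = label" pairs
  setNames(as.list(trimws(vapply(kv, `[`, character(1), 2))),
           trimws(vapply(kv, `[`, character(1), 1)))
}
labs <- parse_assignments("1 = male | 2 = female")
```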


Utility function to parse intervals

Description

Utility function to parse intervals

Usage

util_parse_interval(int)

Arguments

int

an interval as string, e.g., "[0;Inf)"

Value

the parsed interval with elements inc_l (Is the lower limit included?), low (the value of the lower limit), inc_u (Is the upper limit included?), upp (the value of the upper limit)

See Also

Other parser_functions: util_interpret_limits(), util_parse_assignments(), util_parse_redcap_rule()
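A regex-based sketch of the parsing described above, assuming purely numeric limits (the real util_parse_interval also copes with, e.g., datetime limits). parse_interval is an illustrative name.

```r
# Sketch: parse "[0;Inf)" into inc_l / low / upp / inc_u.
parse_interval <- function(int) {
  m <- regmatches(int, regexec("^([[(])([^;]+);([^])]+)([])])$", int))[[1]]
  stopifnot(length(m) == 5)  # full match plus the four capture groups
  list(inc_l = m[2] == "[",            # lower limit included?
       low   = as.numeric(m[3]),
       inc_u = m[5] == "]",            # upper limit included?
       upp   = as.numeric(m[4]))
}
iv <- parse_interval("[0;Inf)")
```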


Interpret a REDcap-style rule and create an expression, that represents this rule

Description

Interpret a REDcap-style rule and create an expression, that represents this rule

Usage

util_parse_redcap_rule(
  rule,
  debug = 0,
  entry_pred = "REDcapPred",
  must_eof = FALSE
)

Arguments

rule

character REDcap style rule

debug

integer debug level (0 = off, 1 = log, 2 = breakpoints)

entry_pred

character for debugging reasons: The production rule used entry point for the parser

must_eof

logical if TRUE, expect the input to be fully consumed (eof) when the parser succeeds, and fail if it is not.

Value

expression the interpreted rule

REDcap rules 1 REDcap rules 2 REDcap rules 3

For resolving left-recursive rules, a StackOverflow post helps in understanding the grammar below, in case theoretical computer science is not fresh in your mind.

See Also

Other parser_functions: util_interpret_limits(), util_parse_assignments(), util_parse_interval()

Examples

## Not run: 
#  rules:
# pregnancies <- 9999 ~ SEX == 'm' |  is.na(SEX)
# pregnancies <- 9998 ~ AGE < 12 |  is.na(AGE)
# pregnancies = 9999 ~ dist > 2 |  speed == 0

data.frame(target = "SEX_0",
  rule = '[speed] > 5 and [dist] > 42 or 1 = "2"',
  CODE = 99999, LABEL = "PREGNANCIES_NOT_ASSESSED FOR MALES",
  class = "JUMP")
# ModifiedStudyData: replace values in SEX_0 where SEX_0 is empty, if the rule fits
# ModifiedMetaData: add missing codes with labels and class here

subset(study_data, eval(pregnancies[[3]]))

rule <-
 paste0('[con_consentdt] <> "" and [sda_osd1dt] <> "" and',
 ' datediff([con_consentdt],[sda_osd1dt],"d",true) < 0')

x <- data.frame(con_consentdt = c(as.POSIXct("2020-01-01"),
                as.POSIXct("2020-10-20")),
                sda_osd1dt = c(as.POSIXct("2020-01-20"),
                as.POSIXct("2020-10-01")))
eval(util_parse_redcap_rule(paste0(
  '[con_consentdt] <> "" and [sda_osd1dt] <> "" and ',
  'datediff([con_consentdt],[sda_osd1dt],"d", "Y-M-D",true) < 10')),
  x, util_get_redcap_rule_env())

util_parse_redcap_rule("[a] = 12 or [b] = 13")
cars[eval(util_parse_redcap_rule(
  rule = '[speed] > 5 and [dist] > 42 or 1 = "2"'), cars,
  util_get_redcap_rule_env()), ]
cars[eval(util_parse_redcap_rule(
  rule = '[speed] > 5 and [dist] > 42 or 2 = "2"'), cars,
  util_get_redcap_rule_env()), ]
cars[eval(util_parse_redcap_rule(
  rule = '[speed] > 5 or [dist] > 42 and 1 = "2"'), cars,
  util_get_redcap_rule_env()), ]
cars[eval(util_parse_redcap_rule(
  rule = '[speed] > 5 or [dist] > 42 and 2 = "2"'), cars,
  util_get_redcap_rule_env()), ]
util_parse_redcap_rule(rule = '(1 = "2" or true) and (false)')
eval(util_parse_redcap_rule(rule =
  '[dist] > sum(1, +(2, [dist] + 5), [speed]) + 3 + [dist]'),
cars, util_get_redcap_rule_env())

## End(Not run)


Paste strings but keep NA (paste0)

Description

Paste strings but keep NA (paste0)

Usage

util_paste0_with_na(...)

Arguments

...

other arguments passed to paste0

Value

character pasted strings


Paste strings but keep NA

Description

Paste strings but keep NA

Usage

util_paste_with_na(...)

Arguments

...

other arguments passed to paste

Value

character pasted strings
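The difference to plain paste0() can be sketched as follows; paste0_with_na here is a simplified stand-in illustrating the documented behavior, not the package's implementation.

```r
# Sketch: base paste0() turns NA into the string "NA"; an NA-preserving
# variant propagates missingness instead.
paste0_with_na <- function(...) {
  res <- paste0(...)
  # positions where any input argument was NA
  any_na <- Reduce(`|`, lapply(list(...), is.na))
  res[any_na] <- NA_character_
  res
}
plain <- paste0("id_", c(1, NA))          # "id_1" "id_NA"
keep  <- paste0_with_na("id_", c(1, NA))  # "id_1" NA
```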


Plot to un-disclosed ggplot object

Description

Plot to un-disclosed ggplot object

Usage

util_plot2svg_object(expr, w = 21.2, h = 15.9, sizing_hints)

Arguments

expr

plot expression

w

width in cm

h

height in cm

sizing_hints

object additional metadata about the natural figure size

Value

ggplot object, but rendered (no original data included)


Utility function to create plots for categorical variables

Description

Depending on the required level of complexity, this helper function creates various plots for categorical variables. Next to basic bar plots, it also enables group comparisons (for example for device/examiner effects) and longitudinal views.

Usage

util_plot_categorical_vars(
  resp_vars,
  group_vars = NULL,
  time_vars = NULL,
  study_data,
  meta_data,
  n_cat_max = 6,
  n_group_max = getOption("dataquieR.max_group_var_levels_in_plot", 20),
  n_data_min = 20
)

Arguments

resp_vars

name of the categorical variable

group_vars

name of the grouping variable

time_vars

name of the time variable

study_data

the data frame that contains the measurements

meta_data

the data frame that contains metadata attributes of study data

n_cat_max

maximum number of categories to be displayed individually for the categorical variable (resp_vars)

n_group_max

maximum number of categories to be displayed individually for the grouping variable (group_vars, devices / examiners)

n_data_min

minimum number of data points to create a time course plot for an individual category of the resp_vars variable

Value

a figure


Plot a ggplot2 figure without plotly

Description

Plot a ggplot2 figure without plotly

Usage

util_plot_figure_no_plotly(x, sizing_hints = NULL)

Arguments

x

ggplot2::ggplot2 object

sizing_hints

object additional metadata about the natural figure size

Value

htmltools compatible object


Plot a ggplot2 figure using plotly

Description

Plot a ggplot2 figure using plotly

Usage

util_plot_figure_plotly(x, sizing_hints = NULL)

Arguments

x

ggplot2::ggplot2 object

sizing_hints

object additional metadata about the natural figure size

Value

htmltools compatible object


Replacement for htmltools::plotTag

Description

The function is specifically designed for fully scalable SVG figures.

Usage

util_plot_svg_to_uri(expr, w = 800, h = 600)

Arguments

expr

plot expression

w

width

h

height

w and h are mostly used for the relation of fixed text sizes to the figure size.

Value

htmltools compatible object


Plotly to un-disclosed ggplot object

Description

Plotly to un-disclosed ggplot object

Usage

util_plotly2svg_object(plotly, w = 21.2, h = 15.9, sizing_hints)

Arguments

plotly

the object

w

width in cm

h

height in cm

sizing_hints

object additional metadata about the natural figure size

Value

ggplot object, but rendered (no original data included)


Utility function to prepare the metadata for location checks

Description

Utility function to prepare the metadata for location checks

Usage

util_prep_location_check(
  resp_vars,
  meta_data,
  report_problems = c("error", "warning", "message"),
  label_col = VAR_NAMES
)

Arguments

resp_vars

variable list the names of the measurement variables

meta_data

data.frame the data frame that contains metadata attributes of study data

report_problems

enum Should missing metadata information be reported as error, warning or message?

Value

a list with the location metric (mean or median) and expected range for the location check

See Also

Other lookup_functions: util_prep_proportion_check(), util_variable_references()


Utility function to prepare the metadata for proportion checks

Description

Utility function to prepare the metadata for proportion checks

Usage

util_prep_proportion_check(
  resp_vars,
  meta_data,
  ds1,
  report_problems = c("error", "warning", "message"),
  label_col = attr(ds1, "label_col")
)

Arguments

resp_vars

variable list the names of the measurement variables

meta_data

data.frame the data frame that contains metadata attributes of study data

ds1

data.frame the data frame that contains the measurements (hint: missing value codes should be excluded, so the function should be called with ds1, if available)

report_problems

enum Should missing metadata information be reported as error, warning or message?

label_col

variable attribute the name of the column in the metadata with labels of variables

Value

a list with the expected range for the proportion check

See Also

Other lookup_functions: util_prep_location_check(), util_variable_references()


Convert single dataquieR result to an htmltools compatible object

Description

Convert single dataquieR result to an htmltools compatible object

Usage

util_pretty_print(
  dqr,
  nm,
  is_single_var,
  meta_data,
  label_col,
  use_plot_ly,
  dir,
  ...
)

Arguments

dqr

dataquieR_result an output (indicator) from dataquieR

nm

character the name used in the report, the alias name of the function call plus the variable name

is_single_var

logical we are creating a single variable overview page or an indicator summary page

meta_data

meta_data the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

use_plot_ly

logical use plotly

dir

character output directory for potential iframes.

...

further arguments passed through, if applicable

Value

htmltools compatible object with rendered dqr


Prepare a vector for output

Description

Prepare a vector for output

Usage

util_pretty_vector_string(v, quote = dQuote, n_max = length(v))

Arguments

v

the vector

quote

function, used for quoting – sQuote or dQuote

n_max

maximum number of elements of v to display, if not missing.

Value

the "pretty" collapsed vector as a string.

See Also

Other string_functions: util_abbreviate_unique(), util_filter_names_by_regexps(), util_set_dQuoteString(), util_set_sQuoteString(), util_sub_string_left_from_.(), util_sub_string_right_from_.(), util_translate()


Bind data frames row-based

Description

if not all data frames share all columns, missing columns will be filled with NAs.

Usage

util_rbind(..., data_frames_list = list())

Arguments

...

data.frame one or more data frames

data_frames_list

list optional, a list of data frames

Value

data.frame all data frames appended

See Also

Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_fix_merge_dups(), util_merge_data_frame_list(), util_remove_na_records(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_study_var2factor(), util_table_of_vct()

Examples

## Not run: 
util_rbind(head(cars), head(iris))
util_rbind(head(cars), tail(cars))
util_rbind(head(cars)[, "dist", FALSE], tail(cars)[, "speed", FALSE])

## End(Not run)
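The NA-filling behavior shown in the examples can be sketched in base R; rbind_fill is a simplified stand-in for util_rbind (which also accepts a list via data_frames_list and performs more checks).

```r
# Sketch: row-bind data frames, filling columns missing in some of them
# with NA instead of failing like plain rbind().
rbind_fill <- function(...) {
  dfs <- list(...)
  all_cols <- Reduce(union, lapply(dfs, colnames))
  filled <- lapply(dfs, function(df) {
    df[setdiff(all_cols, colnames(df))] <- NA  # add missing columns
    df[all_cols]                               # align column order
  })
  do.call(rbind, filled)
}
res <- rbind_fill(head(cars, 2), data.frame(speed = 99, extra = "x"))
```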


Can we really be sure to run RStudio

Description

JetBrains' IDEA and other IDEs fake being RStudio by having RStudio in .Platform$GUI.

Usage

util_really_rstudio()

Value

TRUE, if really sure to be RStudio, FALSE, otherwise.


Map a vector of values based on an assignment table

Description

Map a vector of values based on an assignment table

Usage

util_recode(values, mapping_table, from, to, default = NULL)

Arguments

values

vector the vector

mapping_table

data.frame a table with the mapping table

from

character the name of the column with the "old values"

to

character the name of the column with the "new values"

default

character either one character or one character per value, used if an entry from values is not found in the from column of mapping_table

Value

the mapped values

See Also

dplyr::recode

Other mapping: util_map_all(), util_map_by_largest_prefix(), util_map_labels()
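The mapping with a default can be sketched with base R's match(); recode_via_table is an illustrative stand-in, not the package's implementation.

```r
# Sketch: map values via a lookup table, with a default for unmapped entries.
recode_via_table <- function(values, mapping_table, from, to, default = NULL) {
  idx <- match(values, mapping_table[[from]])
  out <- mapping_table[[to]][idx]           # NA where no mapping was found
  if (!is.null(default)) out[is.na(idx)] <- default
  out
}
map <- data.frame(old = c("m", "f"), new = c("male", "female"))
mapped <- recode_via_table(c("m", "x", "f"), map, from = "old", to = "new",
                           default = "unknown")
```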


For a group of variables (original) the function provides all original plus referred variables in the metadata and a new item_level metadata including information on the original variables and the referred variables

Description

For a group of variables (original) the function provides all original plus referred variables in the metadata and a new item_level metadata including information on the original variables and the referred variables

Usage

util_referred_vars(
  resp_vars,
  id_vars = character(0),
  vars_in_subgroup = character(0),
  meta_data,
  meta_data_segment = NULL,
  meta_data_dataframe = NULL,
  meta_data_cross_item = NULL,
  meta_data_item_computation = NULL,
  strata_column = NULL
)

Arguments

resp_vars

variable list the name of the original variables.

id_vars

variable a vector containing the name(s) of the variables containing IDs

vars_in_subgroup

variable a vector containing the name(s) of the variable(s) mentioned inside the subgroup rule

meta_data

data.frame old name for item_level

meta_data_segment

data.frame – optional: Segment level metadata

meta_data_dataframe

data.frame – optional if study_data is present: Data frame level metadata

meta_data_cross_item

data.frame – optional: Cross-item level metadata

meta_data_item_computation

data.frame – optional: Computed items metadata

strata_column

variable name of a study variable used to stratify the report by and to add as referred variable

Value

a named list containing the referred variables and a new item_level metadata including information on the original variables and the referred variables


Removes empty rows from x

Description

Removes empty rows from x

Usage

util_remove_empty_rows(x, id_vars = character(0))

Arguments

x

data.frame a data frame to be cleaned

id_vars

character column names that will be treated as if they were empty, i.e., an ID alone does not prevent a row from being removed

Value

data.frame reduced x

See Also

Other missing_functions: util_all_intro_vars_for_rv(), util_count_expected_observations(), util_filter_missing_list_table_for_rv(), util_get_code_list(), util_is_na_0_empty_or_false(), util_observation_expected(), util_replace_codes_by_NA()
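A minimal sketch of what such a cleanup amounts to, assuming id_vars are ignored when judging emptiness (the helper name and the empty-string handling are illustrative):

```r
# Sketch: drop rows where every non-id column is NA or blank.
remove_empty_rows <- function(x, id_vars = character(0)) {
  check_cols <- setdiff(colnames(x), id_vars)
  empty <- apply(x[, check_cols, drop = FALSE], 1,
                 function(r) all(is.na(r) | trimws(as.character(r)) == ""))
  x[!empty, , drop = FALSE]
}

dta <- data.frame(id = 1:3,
                  a = c("x", NA, ""),
                  b = c(1, NA, NA))
remove_empty_rows(dta, id_vars = "id")  # keeps only the first row
```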


Remove all records that have at least one NA in any of the given variables

Description

Remove all records that have at least one NA in any of the given variables

Usage

util_remove_na_records(study_data, vars = colnames(study_data))

Arguments

study_data

the study data frame

vars

the variables being checked for NAs

Value

modified study_data data frame

See Also

Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_fix_merge_dups(), util_merge_data_frame_list(), util_rbind(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_study_var2factor(), util_table_of_vct()

Examples

## Not run: 
dta <- iris
dim(util_remove_na_records(dta))
dta$Species[4:6] <- NA
dim(util_remove_na_records(dta))
dim(util_remove_na_records(dta, c("Sepal.Length", "Petal.Length")))

## End(Not run)



Render a table summarizing dataquieR results

Description

Render a table summarizing dataquieR results

Usage

util_render_table_dataquieR_summary(
  x,
  grouped_by = c("call_names", "indicator_metric"),
  folder_of_report = NULL,
  var_uniquenames = NULL
)

Arguments

x

a report summary (summary(r))

grouped_by

defines the columns of the resulting matrix. It can be "call_names" (one column per function), "indicator_metric" (one column per indicator), or both, c("call_names", "indicator_metric"); the latter combination is the default

folder_of_report

a named vector with the location of variable and call_names

var_uniquenames

a data frame with the original variable names and the unique names in case of reports created with dq_report_by containing the same variable in several reports (e.g., creation of reports by sex)

Value

something that htmltools can render


Utility function to replace missing codes by NAs

Description

Substitute all missing codes in a data.frame by NA.

Usage

util_replace_codes_by_NA(
  study_data,
  meta_data = "item_level",
  split_char = SPLIT_CHAR,
  sm_code = NULL
)

Arguments

study_data

Study data including jump/missing codes as specified in the code conventions

meta_data

Metadata as specified in the code conventions

split_char

Character separating missing codes

sm_code

missing code for NAs, if they have been re-coded by util_combine_missing_lists

Codes are expected to be numeric.

Value

a list with a modified data frame and some counts

See Also

Other missing_functions: util_all_intro_vars_for_rv(), util_count_expected_observations(), util_filter_missing_list_table_for_rv(), util_get_code_list(), util_is_na_0_empty_or_false(), util_observation_expected(), util_remove_empty_rows()


Replace limit violations (HARD_LIMITS) by NAs

Description

Replace limit violations (HARD_LIMITS) by NAs

Usage

util_replace_hard_limit_violations(study_data, meta_data, label_col)

Arguments

study_data

data.frame the study data frame

meta_data

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

Value

modified study_data

See Also

Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_fix_merge_dups(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_round_to_decimal_places(), util_study_var2factor(), util_table_of_vct()


Import a data frame

Description

See rio::import, but with the additional argument keep_types and modified error handling.

Usage

util_rio_import(fn, keep_types, ...)

Arguments

fn

the file name to load.

keep_types

logical keep types as possibly defined in the file. Set to TRUE for study data.

...

additional arguments for rio::import

Value

data.frame as in rio::import


Import list of data frames

Description

See rio::import_list, but with the additional argument keep_types and modified error handling.

Usage

util_rio_import_list(fn, keep_types, ...)

Arguments

fn

the file name to load.

keep_types

logical keep types as possibly defined in the file. Set to TRUE for study data.

...

additional arguments for rio::import_list

Value

list as in rio::import_list


Round to 3 decimal places if all values lie between 0.001 and 9999.999; otherwise (if at least one value of the vector is outside these limits), use scientific notation for all the values in the vector

Description

Round to 3 decimal places if all values lie between 0.001 and 9999.999; otherwise (if at least one value of the vector is outside these limits), use scientific notation for all the values in the vector

Usage

util_round_to_decimal_places(x, digits = 3)

Arguments

x

a numeric vector to be rounded

digits

a numeric value indicating the number of desired decimal places

See Also

Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_fix_merge_dups(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_study_var2factor(), util_table_of_vct()
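The described all-or-nothing rule can be sketched as follows; this is an assumed reading of the description, with an illustrative helper name, not the package's exact implementation:

```r
# Sketch: round the whole vector only if every value fits the readable range,
# otherwise switch the whole vector to scientific notation.
round_or_scientific <- function(x, digits = 3) {
  in_range <- abs(x) >= 0.001 & abs(x) <= 9999.999
  if (all(in_range)) {
    round(x, digits)
  } else {
    format(x, scientific = TRUE, digits = digits)
  }
}

round_or_scientific(c(0.1234, 12.3456))   # 0.123 12.346
round_or_scientific(c(0.0001, 12.3456))   # scientific notation for both
```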


Utility function to put strings in quotes

Description

This function generates usual double-quotes for each element of the character vector

Usage

util_set_dQuoteString(string)

Arguments

string

Character vector

Value

quoted string

See Also

Other string_functions: util_abbreviate_unique(), util_filter_names_by_regexps(), util_pretty_vector_string(), util_set_sQuoteString(), util_sub_string_left_from_.(), util_sub_string_right_from_.(), util_translate()


Utility function single quote string

Description

This function generates usual single-quotes for each element of the character vector.

Usage

util_set_sQuoteString(string)

Arguments

string

Character vector

Value

quoted string

See Also

Other string_functions: util_abbreviate_unique(), util_filter_names_by_regexps(), util_pretty_vector_string(), util_set_dQuoteString(), util_sub_string_left_from_.(), util_sub_string_right_from_.(), util_translate()


Attaches attributes about the recommended minimum absolute sizes to the plot p

Description

Attaches attributes about the recommended minimum absolute sizes to the plot p

Usage

util_set_size(p, width_em = NA_integer_, height_em = NA_integer_)

Arguments

p

ggplot2::ggplot the plot

width_em

numeric len=1. the minimum width hint in em

height_em

numeric len=1. the minimum height in em

Value

p the modified plot

See Also

Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order()


Set up an RStudio job

Description

Also defines a progress function and a progress_msg function in the caller's environment.

Usage

util_setup_rstudio_job(job_name = "Job")

Arguments

job_name

a name for the job

Details

In RStudio, its job system will be used. For shiny::withProgress-based calls, this requires min and max to be set to 0 and 1 (the defaults). If cli is available, it will be used; in all other cases, plain messages will be created.

Value

list: the progress function and the progress_msg function

See Also

Other process_functions: util_abbreviate(), util_all_is_integer(), util_attach_attr(), util_bQuote(), util_backtickQuote(), util_coord_flip(), util_extract_matches(), util_par_pmap(), util_suppress_output()

Examples

## Not run: 
  test <- function() {
    util_setup_rstudio_job("xx")
    Sys.sleep(5)
    progress(50)
    progress_msg("halfway through")
    Sys.sleep(5)
    progress(100)
    Sys.sleep(1)
  }
  test()

## End(Not run)


Utility function for outliers according to the rule of Huber et al.

Description

This function calculates outliers according to the rule of Huber et al.

Usage

util_sigmagap(x)

Arguments

x

numeric data to check for outliers

Value

binary vector

See Also

Other outlier_functions: util_3SD(), util_hubert(), util_tukey()


Sort a vector by order given in some other vector

Description

Sort a vector by order given in some other vector

Usage

util_sort_by_order(x, order, ...)

Arguments

x

the vector

order

the "order" vector

...

additional arguments passed to sort

See Also

Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table()

Examples

## Not run: 
util_sort_by_order(c("a", "b", "a", "c", "d"), letters)

## End(Not run)


Split table with mixed code/missing lists to single tables

Description

The resulting tables are populated to the data frame cache.

Usage

util_split_val_tab(val_tab = CODE_LIST_TABLE)

Arguments

val_tab

data.frame tables in one long data frame.

Value

invisible(NULL)


Compute something comparable from an ordinal variable

Description

interpolates categories of an ordinal variable

Usage

util_standardise_ordinal_codes(codes, maxlevel_old, maxlevel_new)

Arguments

codes

numeric() n values

maxlevel_old

integer() number of categories of codes

maxlevel_new

integer() number of categories for output

Value

integer() n values in ⁠{1, ..., maxlevel_new}⁠
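One plausible way to interpolate codes onto a new number of categories is a linear rescale; the formula and helper name below are assumptions for illustration, and the package's exact interpolation may differ:

```r
# Sketch: linearly rescale codes from {1, ..., maxlevel_old}
# to {1, ..., maxlevel_new} and round to the nearest category.
standardise_ordinal <- function(codes, maxlevel_old, maxlevel_new) {
  round((codes - 1) / (maxlevel_old - 1) * (maxlevel_new - 1)) + 1
}

standardise_ordinal(c(1, 3, 5), maxlevel_old = 5, maxlevel_new = 3)
# 1 2 3
```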


String check for results/combined results

Description

Detect whether x starts with ⁠<prefix>.⁠ or equals ⁠<prefix>⁠ (the former occurs if results have been combined)

Usage

util_startsWith_prefix._or_equals_prefix(x, prefix, sep = ".")

Arguments

x

character haystack

prefix

character needle

sep

character separation string

Value

logical whether each entry in x starts with prefix followed by sep or equals prefix
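The check is equivalent to the following base-R sketch (the helper name is illustrative):

```r
# Sketch: TRUE where x starts with "<prefix><sep>" or equals "<prefix>".
starts_or_equals <- function(x, prefix, sep = ".") {
  startsWith(x, paste0(prefix, sep)) | x == prefix
}

starts_or_equals(c("acc", "acc.var1", "con_limits"), "acc")
# TRUE TRUE FALSE
```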


Verify assumptions made by the code that must be TRUE

Description

Verify assumptions made by the code that must be TRUE

Usage

util_stop_if_not(..., label, label_only)

Arguments

...

see stopifnot

label

character a label for the assumptions, can be missing

label_only

logical if TRUE and label is given, only the label will be displayed, not the failing condition

Value

invisible(FALSE), if not stopped.

See Also

stopifnot

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_warn_unordered()


Create a storr object with a storr_factory attribute

Description

Also performs basic validity checks.

Usage

util_storr_factory(my_storr_object, my_storr_factory)

Arguments

my_storr_object

a storr-object

my_storr_factory

a function creating the/a storr_object

Value

a storr object with the factory attribute attached and (hopefully) valid.


Create a storr-object using the factory

Description

Also performs checks.

Usage

util_storr_object(
  my_storr_factory = function() {
     storr::storr_environment()
 }
)

Arguments

my_storr_factory

a function returning a storr object

Value

a storr object


Utility function for judging whether a character vector does not appear to be a categorical variable

Description

The function considers the following properties:

Usage

util_string_is_not_categorical(vec)

Arguments

vec

a character vector

Value

TRUE or FALSE


Convert a study variable to a factor

Description

Convert a study variable to a factor

Usage

util_study_var2factor(
  resp_vars = NULL,
  study_data,
  meta_data = "item_level",
  label_col = LABEL,
  assume_consistent_codes = TRUE,
  have_cause_label_df = FALSE,
  code_name = c(JUMP_LIST, MISSING_LIST),
  include_sysmiss = TRUE
)

Arguments

resp_vars

variable list the name of the measurement variables

study_data

data.frame the data frame that contains the measurements

meta_data

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

assume_consistent_codes

logical assume, that missing codes are consistent for all variables

have_cause_label_df

logical is a missing-code table available

code_name

character all lists from the meta_data to use for the coding.

include_sysmiss

logical add also a factor level for data values that were NA in the original study data (system missingness).

Value

study_data converted to factors using the coding provided in code_name

See Also

Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_fix_merge_dups(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_table_of_vct()


Get sub-string left from first .

Description

Get sub-string left from first .

Usage

util_sub_string_left_from_.(x)

Arguments

x

the string with at least one .

See Also

Other string_functions: util_abbreviate_unique(), util_filter_names_by_regexps(), util_pretty_vector_string(), util_set_dQuoteString(), util_set_sQuoteString(), util_sub_string_right_from_.(), util_translate()

Examples

## Not run: 
util_sub_string_left_from_.(c("a.b", "asdf.xyz", "asdf.jkl.zuio"))

## End(Not run)


Get sub-string right from first .

Description

Get sub-string right from first .

Usage

util_sub_string_right_from_.(x)

Arguments

x

the string with at least one .

See Also

Other string_functions: util_abbreviate_unique(), util_filter_names_by_regexps(), util_pretty_vector_string(), util_set_dQuoteString(), util_set_sQuoteString(), util_sub_string_left_from_.(), util_translate()

Examples

## Not run: 
util_sub_string_right_from_.(c("a.b", "asdf.xyz"))
util_sub_string_right_from_.(c("a.b", "asdf.xy.z"))
util_sub_string_right_from_.(c("ab", "asdxy.z"))

## End(Not run)


Suppress any output to stdout using sink()

Description

Suppress any output to stdout using sink()

Usage

util_suppress_output(expr)

Arguments

expr

expression()

Value

invisible() result of expr

See Also

Other process_functions: util_abbreviate(), util_all_is_integer(), util_attach_attr(), util_bQuote(), util_backtickQuote(), util_coord_flip(), util_extract_matches(), util_par_pmap(), util_setup_rstudio_job()


Suppress warnings conditionally

Description

Suppress warnings conditionally

Usage

util_suppress_warnings(expr, classes = "warning")

Arguments

expr

expression to evaluate

classes

character classes of warning-conditions to suppress

Value

the result of expr

See Also

Other condition_functions: util_condition_constructor_factory(), util_deparse1(), util_error(), util_find_external_functions_in_stacktrace(), util_find_first_externally_called_functions_in_stacktrace(), util_find_indicator_function_in_callers(), util_message(), util_warning()


Tabulate a vector

Description

Does the same as as.data.frame(table(x)), but guarantees that a data frame with two columns is returned

Usage

util_table_of_vct(Var1)

Arguments

Var1

vector to tabulate

Value

a data frame with columns Var1 and Freq

See Also

Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_fix_merge_dups(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_study_var2factor()


Rotate 1-row data frames to key-value data frames

Description

If nrow(tb) > 1, util_table_rotator just returns tb.

Usage

util_table_rotator(tb)

Arguments

tb

data.frame a data frame

Value

data.frame tb transposed into key-value form (or tb unchanged, if it has more than one row)
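A minimal sketch of the rotation, assuming values are rendered as character; the helper and column names are illustrative, not the internal implementation:

```r
# Sketch: turn a 1-row data frame into a two-column key/value data frame;
# data frames with more than one row pass through unchanged.
rotate_1row <- function(tb) {
  if (nrow(tb) != 1) return(tb)
  data.frame(Name  = colnames(tb),
             Value = unname(vapply(tb, as.character, character(1))),
             row.names = NULL)
}

rotate_1row(data.frame(n = 10L, mean = 3.5))
# one row per original column: n -> "10", mean -> "3.5"
```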


Get a translation

Description

Get a translation

Usage

util_translate(keys, ns = "general", lang = getOption("dataquieR.lang", ""))

Arguments

keys

character translation keys

ns

character translation namespace

lang

character language to translate to

Value

character translations

See Also

Other string_functions: util_abbreviate_unique(), util_filter_names_by_regexps(), util_pretty_vector_string(), util_set_dQuoteString(), util_set_sQuoteString(), util_sub_string_left_from_.(), util_sub_string_right_from_.()


Translate standard column names to readable ones

Description

TODO: Duplicate of util_make_data_slot_from_table_slot ??

Usage

util_translate_indicator_metrics(
  colnames,
  short = FALSE,
  long = TRUE,
  ignore_unknown = FALSE
)

Arguments

colnames

character the names to translate

short

logical include unit letter in output

long

logical include unit description in output

ignore_unknown

logical do not replace unknown indicator metrics by NA, keep them

Value

translated names


Utility function for the Tukey outlier rule

Description

This function calculates outliers according to the rule of Tukey.

Usage

util_tukey(x)

Arguments

x

numeric data to check for outliers

Value

binary vector

See Also

Other outlier_functions: util_3SD(), util_hubert(), util_sigmagap()
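The classic Tukey fences flag values outside [Q1 - 1.5·IQR, Q3 + 1.5·IQR]; a sketch of what such a rule computes (the helper name is illustrative, and the package's variant may differ in details such as quantile type):

```r
# Sketch: flag values outside the Tukey fences as outliers (1) vs. not (0).
tukey_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  as.integer(x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr)
}

tukey_outlier(c(1, 2, 3, 4, 100))
# 0 0 0 0 1
```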


Remove tables referred to by metadata and use SVG for most figures

Description

Remove tables referred to by metadata and use SVG for most figures

Usage

util_undisclose(x, ...)

Arguments

x

an object to un-disclose

...

further arguments, used for pointing to the dataquieR_result object, if called recursively

Value

undisclosed object


Detect base unit from composite units

Description

Detect base unit from composite units

Usage

util_unit2baseunit(
  unit,
  warn_ambiguities = !exists("warn_ambiguities", .unit2baseunitenv),
  unique = TRUE
)

Arguments

unit

character a unit

warn_ambiguities

logical warn about all ambiguous units

unique

logical choose the more SI-like unit in case of ambiguities

Value

character all possible base units, or the preferable one if unique is set to TRUE. Can be character(0) if unit is invalid, or if uniqueness was requested but even the precedence rules of SI-closeness did not help to select the most suitable unit.

Examples

## Not run: 
util_unit2baseunit("%")
util_unit2baseunit("d%")

# Invalid unit
util_unit2baseunit("aa%")
util_unit2baseunit("aa%", unique = FALSE)

util_unit2baseunit("a%")

# Invalid unit
util_unit2baseunit("e%")
util_unit2baseunit("e%", unique = FALSE)

util_unit2baseunit("E%")
util_unit2baseunit("Eg")

# Invalid unit
util_unit2baseunit("E")
util_unit2baseunit("E", unique = FALSE)

util_unit2baseunit("EC")
util_unit2baseunit("EK")
util_unit2baseunit("µg")
util_unit2baseunit("mg")
util_unit2baseunit("°C")
util_unit2baseunit("k°C")
util_unit2baseunit("kK")
util_unit2baseunit("nK")

# Ambiguous units, if used with unique = FALSE
util_unit2baseunit("kg")
util_unit2baseunit("cd")
util_unit2baseunit("Pa")
util_unit2baseunit("kat")
util_unit2baseunit("min")

# atto atom units or astronomical units, both in state "accepted"
util_unit2baseunit("au")
util_unit2baseunit("au", unique = FALSE)

# astronomical units or micro are, both in state "accepted"
util_unit2baseunit("ua")
util_unit2baseunit("ua", unique = FALSE)

util_unit2baseunit("kt")

# parts per trillion or pico US_liquid_pint, both in state "common",
# but in this case, plain count units will be preferred
util_unit2baseunit("ppt")
util_unit2baseunit("ppt", unique = FALSE)

util_unit2baseunit("ft")
util_unit2baseunit("yd")
util_unit2baseunit("pt")

# actually the same, but both only common, and to my knowledge not-so-common
# gram-force vs. kilogram-force (kilo pond)
util_unit2baseunit("kgf")
util_unit2baseunit("kgf", unique = FALSE)

util_unit2baseunit("at")
util_unit2baseunit("ph")
util_unit2baseunit("nt")

## End(Not run)

Save a hint to the user during package load

Description

Save a hint to the user during package load

Usage

util_user_hint(x)

Arguments

x

character the hint

Value

invisible(NULL)

See Also

Other system_functions: util_detect_cores(), util_view_file()


Utility function verifying syntax of known metadata columns

Description

This function goes through metadata columns, dataquieR supports and verifies for these, that they follow its metadata conventions.

Usage

util_validate_known_meta(meta_data)

Arguments

meta_data

data.frame the data frame that contains metadata attributes of study data

Value

data.frame possibly modified meta_data, invisible()

See Also

Other metadata_management: util_dist_selection(), util_find_free_missing_code(), util_find_var_by_meta(), util_get_var_att_names_of_level(), util_get_vars_in_segment(), util_looks_like_missing(), util_no_value_labels(), util_validate_missing_lists()


Validate code lists for missing and/or jump codes

Description

will warn/stop on problems

Usage

util_validate_missing_lists(
  meta_data,
  cause_label_df,
  assume_consistent_codes = FALSE,
  expand_codes = FALSE,
  suppressWarnings = FALSE,
  label_col
)

Arguments

meta_data

data.frame the data frame that contains metadata attributes of study data

cause_label_df

data.frame missing code table. If missing codes have labels the respective data frame can be specified here, see cause_label_df

assume_consistent_codes

logical if TRUE and no labels are given and the same missing/jump code is used for more than one variable, the labels assigned for this code will be the same for all variables.

expand_codes

logical if TRUE, code labels are copied from other variables, if the code is the same and the label is set somewhere

suppressWarnings

logical if TRUE, suppress warnings about consistency issues with missing and jump lists

label_col

variable attribute the name of the column in the metadata with labels of variables

Value

list with entries:

See Also

Other metadata_management: util_dist_selection(), util_find_free_missing_code(), util_find_var_by_meta(), util_get_var_att_names_of_level(), util_get_vars_in_segment(), util_looks_like_missing(), util_no_value_labels(), util_validate_known_meta()


Verify the class ReportSummaryTable

Description

Verify the class ReportSummaryTable

Usage

util_validate_report_summary_table(tb, meta_data, label_col)

Arguments

tb

data.frame object to be a ReportSummaryTable

meta_data

data.frame the data frame that contains metadata attributes of study data. Used to translate variable names, if given.

label_col

variable attribute the name of the column in the metadata with labels of variables

Value

data.frame maybe fixed ReportSummaryTable


Utility function to compute the rank intraclass correlation

Description

This implementation uses the package rankICC to compute the rank intraclass correlation, a nonparametric version of the ICC (Tu et al., 2023). In contrast to model-based ICC approaches, it is less sensitive to outliers and skewed distributions. It can be applied to variables with an ordinal, interval or ratio scale. However, it is not possible to adjust for covariables with this approach. The calculated ICC can become negative, like Fisher's ICC.

Usage

util_varcomp_robust(
  resp_vars = NULL,
  group_vars = NULL,
  study_data = study_data,
  meta_data = meta_data,
  min_obs_in_subgroup = 10,
  min_subgroups = 5,
  label_col = NULL
)

Arguments

resp_vars

the name of the response variable

group_vars

the name of the grouping variable

study_data

the data frame that contains the measurements

meta_data

the data frame that contains metadata attributes of study data

min_obs_in_subgroup

the minimum number of observations that is required to include a subgroup (level) of the grouping variable (group_vars) in the analysis. Subgroups with fewer observations are excluded.

min_subgroups

the minimum number of subgroups (levels) of the grouping variable (group_vars). If the variable has fewer subgroups, the analysis is not performed.

label_col

the name of the column in the metadata with labels of variables

Value

a vector from rankICC::rankICC


Find all columns in item-level-metadata, that refer to some other variable

Description

Find all columns in item-level-metadata, that refer to some other variable

Usage

util_variable_references(meta_data = "item_level")

Arguments

meta_data

data.frame the metadata

Value

character all column names referring to variables from item-level metadata

See Also

Other lookup_functions: util_prep_location_check(), util_prep_proportion_check()


Verify encoding

Description

Verify encoding

Usage

util_verify_encoding(dt0, ref_encs)

Arguments

dt0

data.frame data to verify

ref_encs

character a named vector: names are column names of dt0, values are their expected encodings; can be missing.

Examples

## Not run: 
  dt0 <-
    prep_get_data_frame(
    file.path("~",
      "rsync", "nako_mrt_qs$", "exporte", "NAKO_Datensatz_bereinigte_Daten",
      "NatCoEdc_Export", "export_mannheim_30.csv"))
  util_verify_encoding(dt0)
  dt0$mrt_note[[1]] <- iconv("Härbärt", "UTF-8", "cp1252")
  util_verify_encoding(dt0)
  dt0$mrt_note[[15]] <- iconv("Härbärt", "UTF-8", "cp1252")
  util_verify_encoding(dt0)
  dt0$mrt_note[[1]] <- "Härbärt"
  util_verify_encoding(dt0)
  dt0$mrt_note[[17]] <- iconv("Härbärt", "UTF-8", "latin3")
  util_verify_encoding(dt0)

## End(Not run)

Test for likely misspelled data frame references

Description

checks, if some data frame names may have typos in their names

Usage

util_verify_names(name_of_study_data = character(0))

Arguments

name_of_study_data

character names of study data frames that are expected

Value

invisible(NULL), messages / warns only.


View a file in most suitable viewer

Description

View a file in most suitable viewer

Usage

util_view_file(file)

Arguments

file

the file to view

Value

invisible(file)

See Also

Other system_functions: util_detect_cores(), util_user_hint()


Warn about a problem in varname, if x has no natural order

Description

Also warns, if R does not have a comparison operator for x.

Usage

util_warn_unordered(x, varname)

Arguments

x

vector of data

varname

character len=1. Variable name for warning messages

Value

invisible(NULL)

See Also

Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not()


Produce a warning message with a useful short stack trace.

Description

Produce a warning message with a useful short stack trace.

Usage

util_warning(
  m,
  ...,
  applicability_problem = NA,
  intrinsic_applicability_problem = NA,
  integrity_indicator = "none",
  level = 0,
  immediate,
  title = "",
  additional_classes = c()
)

Arguments

m

warning message or a condition

...

arguments for sprintf on m, if m is a character

applicability_problem

logical TRUE, if an applicability issue, that is, the information for computation is missing (that is, an error that indicates missing metadata) or an error because the requirements of the stopped function were not met, e.g., a barplot was called for metric data. We can have logical or empirical applicability problems. empirical is the default, if the argument intrinsic_applicability_problem is left unset or set to FALSE.

intrinsic_applicability_problem

logical TRUE, if this is a logical applicability issue, that is, the computation makes no sense (for example, an error of unsuitable resp_vars). Intrinsic/logical applicability problems are also applicability problems. Non-logical applicability problems are called empirical applicability problems.

integrity_indicator

character the warning is an integrity problem, here is the indicator abbreviation.

level

integer level of the warning message (defaults to 0). Higher levels are more severe.

immediate

logical Display the warning immediately, not only when the interactive session comes back.

additional_classes

character additional classes the thrown condition object should inherit from, first.

Value

condition the condition object, if the execution is not stopped

See Also

Other condition_functions: util_condition_constructor_factory(), util_deparse1(), util_error(), util_find_external_functions_in_stacktrace(), util_find_first_externally_called_functions_in_stacktrace(), util_find_indicator_function_in_callers(), util_message(), util_suppress_warnings()


Data frame with labels for missing and jump codes: metadata about value and missing codes

Description

data.frame with the following columns:

See Also

Online

com_item_missingness()

com_segment_missingness()

com_qualified_item_missingness()

com_qualified_segment_missingness()

con_inadmissible_categorical()

con_inadmissible_vocabulary()

MISSING_LIST_TABLE

VALUE_LABEL_TABLE

STANDARDIZED_VOCABULARY_TABLE

cause_label_df
