Example data

Cormac Monaghan

2024-11-19

Example dataset

The lwc2022 package includes a small example dataset called cog_data. The dataset simulates cognitive scores following the methodology used in the Health and Retirement Study (HRS), focusing on three tasks: word recall, serial subtraction, and backwards counting. These tasks are the core of the Langa-Weir classification system used to assess cognitive function.

The simulated dataset contains 10 observations and follows the structure expected by the functions in the package (extract(), score(), and classify()). Below, we detail the steps taken to simulate the dataset.

Structure of the simulated data

The cog_data dataset contains 35 variables. A summary of its structure is presented below:

# Load the package
library(lwc2022)

# Load the example dataset
data(cog_data)

# Display the structure of cog_data
str(cog_data)
#> 'data.frame':    10 obs. of  35 variables:
#>  $ HHID    : int  288941 234057 224021 785284 326317 465208 748794 293626 669691 689448
#>  $ PN      : int  93 99 72 26 7 42 9 83 36 78
#>  $ SD182M1 : num  17 53 39 63 12 15 32 52 55 7
#>  $ SD182M2 : num  9 51 10 23 27 99 63 7 63 27
#>  $ SD182M3 : num  32 38 25 34 29 5 8 12 13 18
#>  $ SD182M4 : num  33 67 27 25 38 21 15 51 57 26
#>  $ SD182M5 : num  99 31 16 62 30 6 53 8 22 22
#>  $ SD182M6 : num  39 31 58 17 64 60 59 34 4 13
#>  $ SD182M7 : num  5 64 61 25 62 22 25 32 56 25
#>  $ SD182M8 : num  23 35 40 58 30 12 31 67 56 30
#>  $ SD182M9 : num  35 14 29 32 7 3 23 64 96 15
#>  $ SD182M10: num  21 37 8 61 10 60 52 54 34 10
#>  $ SD183M1 : num  22 12 20 56 17 56 64 35 40 56
#>  $ SD183M2 : num  61 30 15 24 59 23 53 7 29 15
#>  $ SD183M3 : num  23 26 38 56 32 7 27 52 5 6
#>  $ SD183M4 : num  16 24 32 21 65 11 36 54 56 99
#>  $ SD183M5 : num  19 25 39 64 26 9 7 34 58 13
#>  $ SD183M6 : num  19 66 62 57 39 4 1 40 30 30
#>  $ SD183M7 : num  62 25 16 24 64 11 58 20 40 3
#>  $ SD183M8 : num  29 36 62 54 22 59 52 98 20 11
#>  $ SD183M9 : num  67 65 8 56 21 55 2 53 13 56
#>  $ SD183M10: num  6 67 8 54 32 96 36 55 14 63
#>  $ SD142   : int  96 90 97 97 99 98 97 91 94 98
#>  $ SD143   : int  86 86 89 90 80 98 89 92 90 90
#>  $ SD144   : int  89 76 89 78 78 74 83 83 75 70
#>  $ SD145   : int  69 76 76 66 68 79 65 77 76 64
#>  $ SD146   : int  69 52 63 50 51 53 59 50 54 57
#>  $ SD124   : int  0 0 0 0 1 1 0 1 0 0
#>  $ SD129   : int  0 1 0 0 0 1 0 0 1 0
#>  $ SD237WA : num  -8 -8 -9 1 0 0 0 1 0 1
#>  $ SD237WC : int  13 17 3 18 2 5 12 13 10 6
#>  $ SD237WT : int  42 42 38 60 48 16 35 36 27 27
#>  $ SD238WA : num  -8 0 -8 -8 -8 -9 1 -8 -8 -8
#>  $ SD238WC : int  9 7 9 4 2 12 9 11 7 13
#>  $ SD238WT : int  37 43 33 19 12 34 21 17 12 30

The dataset contains variables for individual identifiers, cognition-related tasks (immediate/delayed word recall, serial subtraction, and backwards counting), and other variables necessary for scoring and classification.

Variable breakdown

HHID: A unique household identifier.
PN: A unique personal identifier.
SD182M1-SD182M10: Responses for the Immediate Word Recall task.
SD183M1-SD183M10: Responses for the Delayed Word Recall task.
SD142-SD146: Responses for the Serial Subtraction task, where participants are asked to subtract 7 from 100 iteratively five times.
SD124 and SD129: Responses for the Backwards Counting task, where participants count backwards from 20. SD124 represents the first attempt, and SD129 represents the second attempt.
SD237WA, SD237WC, SD237WT and SD238WA, SD238WC, SD238WT: Responses to a mouse-clicking test, measuring accuracy (WA), click count (WC), and click time (WT) for the first (SD237) and second (SD238) attempts.
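The grouping above can be expressed as a set of name patterns. The following is a minimal sketch, not part of lwc2022: the object names cols, task_patterns, and task_columns are hypothetical, and the column names are rebuilt by hand so the snippet runs without the package installed.

```r
# Column names as listed in the variable breakdown (rebuilt by hand here)
cols <- c(
  "HHID", "PN",
  paste0("SD182M", 1:10),            # immediate word recall
  paste0("SD183M", 1:10),            # delayed word recall
  paste0("SD", 142:146),             # serial subtraction
  "SD124", "SD129",                  # backwards counting
  paste0("SD237W", c("A", "C", "T")),  # mouse clicking, first attempt
  paste0("SD238W", c("A", "C", "T"))   # mouse clicking, second attempt
)

# One regular expression per task (hypothetical helper object)
task_patterns <- c(
  identifiers        = "^(HHID|PN)$",
  immediate_recall   = "^SD182M[0-9]+$",
  delayed_recall     = "^SD183M[0-9]+$",
  serial_subtraction = "^SD14[2-6]$",
  backwards_counting = "^SD(124|129)$",
  mouse_clicking     = "^SD23[78]W[ACT]$"
)

# Match each pattern against the column names
task_columns <- lapply(task_patterns, grep, x = cols, value = TRUE)

# Number of columns per task; the counts sum to the 35 variables in cog_data
lengths(task_columns)
```

With the package loaded, the same patterns could be applied to names(cog_data) to pull out the columns for a single task.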

Generating the data

The generate_example_data() function generates a dataset with \(n\) observations (\(n = 10\) by default), producing a set of cognitive test variables along with household and person identifiers. The output dataset is structured similarly to the cognitive assessment data collected in the HRS.

# Simulated dataset
generate_example_data <- function(n = 10) {
  data.frame(
    # Identifiers
    HHID = sample(100000:999999, n, replace = TRUE),   # Random household ID
    PN = sample(1:99, n, replace = TRUE),              # Random person number

    # THESE ARE THE VARIABLES USED IN THE LW CLASSIFICATIONS
    # Immediate word recall (10 items)
    SD182M1 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M2 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M3 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M4 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M5 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M6 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M7 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M8 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M9 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M10 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),

    # Delayed word recall (10 items)
    SD183M1 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M2 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M3 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M4 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M5 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M6 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M7 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M8 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M9 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M10 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),

    # Serial subtraction (Subtracting 7 from 100 five times)
    SD142 = sample(90:100, n, replace = TRUE),  # First subtraction value
    SD143 = sample(80:99, n, replace = TRUE),   # Second subtraction
    SD144 = sample(70:89, n, replace = TRUE),   # Third subtraction
    SD145 = sample(60:79, n, replace = TRUE),   # Fourth subtraction
    SD146 = sample(50:69, n, replace = TRUE),   # Fifth subtraction

    # Backwards counting
    SD124 = sample(0:1, n, replace = TRUE),  # Success on first try (1 = success, 0 = fail)
    SD129 = sample(0:1, n, replace = TRUE),  # Success on second try (1 = success, 0 = fail)

    # RANDOM VARIABLES NOT USED IN LW CLASSIFICATIONS
    # Speed Test (Mouse clicking)
    SD237WA = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD237WC = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD237WT = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD238WA = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD238WC = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD238WT = sample(c(0, 1, -8, -9), n, replace = TRUE)
  )
}
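The twenty word recall columns all draw from the same set of codes, so they can also be produced in a loop. The following is a minimal sketch of that idea, not the function shipped with the package: simulate_recall_block() and recall_codes are hypothetical names introduced here for illustration.

```r
# Codes sampled for every word recall item (copied from the generator above)
recall_codes <- c(1:40, 51:67, 96, 98, 99)

# Hypothetical helper: build `items` recall columns named <prefix>1..<prefix>items
simulate_recall_block <- function(prefix, n, items = 10) {
  draws <- lapply(seq_len(items), function(i) {
    sample(recall_codes, n, replace = TRUE)
  })
  names(draws) <- paste0(prefix, seq_len(items))
  as.data.frame(draws)
}

set.seed(123)
immediate <- simulate_recall_block("SD182M", n = 10)
delayed   <- simulate_recall_block("SD183M", n = 10)

dim(immediate)    # 10 rows, 10 columns
names(delayed)    # SD183M1 ... SD183M10
```

The two blocks could then be combined with the remaining columns using cbind(). Because this sketch calls sample() in a different order from the full generator, it should not be expected to reproduce the packaged cog_data values.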

Parameters

\(n\): The number of observations to generate (default \(n = 10\))

Output

The function returns a dataframe with \(n\) rows and the following columns:

HHID: A randomly generated six-digit household identifier. Values are sampled with replacement, so uniqueness is likely but not guaranteed.
PN: A randomly generated person number (1 to 99) for each individual.
SD182M1 - SD182M10: Responses for Immediate Word Recall, where values are simulated from a set of codes representing different recall categories.
SD183M1 - SD183M10: Responses for Delayed Word Recall, with values similarly simulated as above.
SD142 - SD146: Values from a serial subtraction task, representing five rounds of subtracting 7 from 100. Each value is drawn at random from a range around the expected answer, so responses may contain errors.
SD124 and SD129: Binary responses representing success (1) or failure (0) on two attempts at backwards counting.
SD237WA and SD238WA: Accuracy responses for a mouse-clicking test. Responses are represented as success (1), failure (0), non-participation due to technical reasons (-6), or refusal to participate (-8). The generator shown above draws these values from 0, 1, -8, and -9. SD237WA indicates the first attempt, while SD238WA indicates the second attempt.
SD237WC and SD238WC: Responses representing the total number of clicks for a mouse clicking test. SD237WC indicates the first attempt while SD238WC indicates the second attempt.
SD237WT and SD238WT: Responses representing the total amount of time (in seconds) spent on a mouse-clicking test. SD237WT indicates the time for the first attempt, while SD238WT indicates the time for the second attempt.
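To make the serial subtraction and backwards counting columns concrete, the sketch below scores them on a toy data frame. It is an illustration under stated assumptions, not the package's score() function: score_serial7_sketch() and score_backwards_sketch() are hypothetical helpers, each subtraction is assumed to be judged against the respondent's previous answer (100 for the first step), and backwards counting is assumed to earn 2 points for success on the first attempt, 1 for success on the second, and 0 otherwise.

```r
# Hypothetical helper: number of correct serial-7 subtractions per respondent.
# Assumption: each answer is correct if it equals the previous answer minus 7.
score_serial7_sketch <- function(df) {
  responses <- as.matrix(df[, c("SD142", "SD143", "SD144", "SD145", "SD146")])
  # Reference values: 100 for the first step, then the respondent's own answers
  previous <- cbind(100, responses[, 1:4, drop = FALSE])
  unname(rowSums(responses == previous - 7, na.rm = TRUE))
}

# Hypothetical helper: backwards counting score.
# Assumption: 2 = success on first attempt, 1 = success on second, 0 = otherwise.
score_backwards_sketch <- function(df) {
  ifelse(df$SD124 == 1, 2, ifelse(df$SD129 == 1, 1, 0))
}

# Toy responses (made up for illustration, not taken from cog_data)
toy <- data.frame(
  SD142 = c(93, 93, 92),
  SD143 = c(86, 85, 85),
  SD144 = c(79, 78, 78),
  SD145 = c(72, 71, 70),
  SD146 = c(65, 64, 63),
  SD124 = c(1, 0, 0),
  SD129 = c(0, 1, 0)
)

score_serial7_sketch(toy)    # 5 4 3
score_backwards_sketch(toy)  # 2 1 0
```

In the second toy row only the second answer is wrong (85 instead of 86); the later answers still count as correct because each continues from the previous response.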

Example

set.seed(123)

cog_data <- generate_example_data()

knitr::kable(head(cog_data), caption = "Example of generated cognition data")

Example of generated cognition data
HHID	PN	SD182M1	SD182M2	SD182M3	SD182M4	SD182M5	SD182M6	SD182M7	SD182M8	SD182M9	SD182M10	SD183M1	SD183M2	SD183M3	SD183M4	SD183M5	SD183M6	SD183M7	SD183M8	SD183M9	SD183M10	SD142	SD143	SD144	SD145	SD146	SD124	SD129	SD237WA	SD237WC	SD237WT	SD238WA	SD238WC	SD238WT
288941	93	17	9	32	33	99	39	5	23	35	21	22	61	23	16	19	19	62	29	67	6	96	86	89	69	69	0	0	-8	0	0	-9	-8	-8
234057	99	53	51	38	67	31	31	64	35	14	37	12	30	26	24	25	66	25	36	65	67	90	86	76	76	52	0	1	-8	-9	0	-9	-9	1
224021	72	39	10	25	27	16	58	61	40	29	8	20	15	38	32	39	62	16	62	8	8	97	89	89	76	63	0	0	-9	0	1	-8	1	0
785284	26	63	23	34	25	62	17	25	58	32	61	56	24	56	21	64	57	24	54	56	54	97	90	78	66	50	0	0	1	-8	0	-9	-8	-9
326317	7	12	27	29	38	30	64	62	30	7	10	17	59	32	65	26	39	64	22	21	32	99	80	78	68	51	1	0	0	-8	1	-8	-8	1
465208	42	15	99	5	21	6	60	22	12	3	60	56	23	7	11	9	4	11	59	55	96	98	98	74	79	53	1	1	0	1	1	-8	-8	1
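The call to set.seed() is what makes the example reproducible. A minimal sketch of that behaviour, using a hypothetical stand-in function draw_ids() rather than the full generator:

```r
# Hypothetical stand-in for one column of the generator
draw_ids <- function(n) sample(100000:999999, n, replace = TRUE)

set.seed(123)
first <- draw_ids(10)

set.seed(123)
second <- draw_ids(10)

# Resetting the seed before each call reproduces the same draws
identical(first, second)  # TRUE
```

Without resetting the seed, successive calls continue the random number stream and will generally return different values.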
