This vignette demonstrates some basic usage of the
codyna package. First, we load the package.
We also load the engagement data available in the
package (see ?engagement for further information).
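A minimal sketch of these two steps, assuming the package is installed and that the dataset can be loaded by name with data(); the exact calls are an assumption based on standard R usage.

```r
# Attach the package (assumes codyna is already installed).
library("codyna")

# Load the bundled dataset by name; the data() call is an assumption.
data("engagement", package = "codyna")
```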
The codyna package provides an extensive set of features
for discovering patterns in sequence data, such as n-grams, gapped
patterns, or repeated sequences of the same state, using the function
discover_patterns. The argument len specifies the pattern lengths to
look for. Similarly, the argument gap specifies the gap sizes for
gapped patterns.
discover_patterns(engagement, type = "ngram", len = 2:3)
#> # A tibble: 36 × 6
#>    pattern                          length count proportion contained_in support
#>    <chr>                             <int> <int>      <dbl>        <int>   <dbl>
#>  1 Active->Active                        2 10218     0.434           969   0.969
#>  2 Active->Active->Active                3  8386     0.372           931   0.931
#>  3 Disengaged->Disengaged                2  5186     0.220           811   0.811
#>  4 Disengaged->Disengaged->Disenga…      3  3925     0.174           706   0.706
#>  5 Average->Average                      2  2774     0.118           789   0.789
#>  6 Average->Active                       2  1545     0.0656          853   0.853
#>  7 Average->Average->Average             3  1439     0.0638          545   0.545
#>  8 Average->Active->Active               3  1265     0.0561          806   0.806
#>  9 Disengaged->Average                   2  1092     0.0464          709   0.709
#> 10 Active->Average                       2  1071     0.0455          695   0.695
#> # ℹ 26 more rows
discover_patterns(engagement, type = "gapped", gap = 1)
#> # A tibble: 9 × 6
#>   pattern                   length count proportion contained_in support
#>   <chr>                      <dbl> <int>      <dbl>        <int>   <dbl>
#> 1 Active->*->Active              3  8718     0.387           934   0.934
#> 2 Disengaged->*->Disengaged      3  4063     0.180           722   0.722
#> 3 Average->*->Active             3  2129     0.0944          850   0.85
#> 4 Average->*->Average            3  1712     0.0759          611   0.611
#> 5 Active->*->Average             3  1534     0.0680          719   0.719
#> 6 Disengaged->*->Average         3  1412     0.0626          677   0.677
#> 7 Active->*->Disengaged          3  1126     0.0499          611   0.611
#> 8 Average->*->Disengaged         3  1021     0.0453          533   0.533
#> 9 Disengaged->*->Active          3   840     0.0372          529   0.529
discover_patterns(engagement, type = "repeated", len = 2:3)
#> # A tibble: 6 × 6
#>   pattern                           length count proportion contained_in support
#>   <chr>                              <int> <int>      <dbl>        <int>   <dbl>
#> 1 Active->Active                         2 10218      0.562          969   0.969
#> 2 Active->Active->Active                 3  8386      0.610          931   0.931
#> 3 Disengaged->Disengaged                 2  5186      0.285          811   0.811
#> 4 Disengaged->Disengaged->Disengag…      3  3925      0.285          706   0.706
#> 5 Average->Average                       2  2774      0.153          789   0.789
#> 6 Average->Average->Average              3  1439      0.105          545   0.545
The returned data frames show the length of the pattern, the number
of times it occurred across all sequences, its proportion among patterns
of the same length, the number of sequences that contained the pattern, and
the proportion of sequences that contained the pattern (support). The
function discover_patterns can also be used to look for
specific patterns, for example:
discover_patterns(engagement, pattern = "Active->*")
#> # A tibble: 3 × 6
#>   pattern            length count proportion contained_in support
#>   <chr>               <int> <int>      <dbl>        <int>   <dbl>
#> 1 Active->Active          2 10218     0.859           969   0.969
#> 2 Active->Average         2  1071     0.0900          695   0.695
#> 3 Active->Disengaged      2   605     0.0509          508   0.508
Here, the wildcard * matches any state, so we are
looking for two-state patterns that start with the Active state and
are followed by any state.
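To make the count, proportion, contained_in and support columns concrete, the following base R sketch computes them by hand for 2-grams on a small set of made-up sequences. It does not use codyna and is not how discover_patterns is implemented; the toy data and the helper name bigrams are hypothetical.

```r
# Toy sequences (made up for illustration; not the engagement data).
seqs <- list(
  c("Active", "Active", "Average"),
  c("Average", "Active", "Active", "Active"),
  c("Disengaged", "Disengaged")
)

# Hypothetical helper: all adjacent pairs of one sequence as "a->b" strings.
bigrams <- function(x) {
  if (length(x) < 2) return(character(0))
  paste(head(x, -1), tail(x, -1), sep = "->")
}

per_seq <- lapply(seqs, bigrams)

# count: number of occurrences across all sequences.
count <- table(unlist(per_seq))

# contained_in: number of sequences with at least one occurrence.
contained_in <- table(unlist(lapply(per_seq, unique)))

result <- data.frame(
  pattern = names(count),
  count = as.integer(count),
  # proportion: share among all 2-grams found.
  proportion = as.numeric(count) / sum(count),
  contained_in = as.integer(contained_in[names(count)]),
  stringsAsFactors = FALSE
)

# support: share of sequences that contain the pattern.
result$support <- result$contained_in / length(seqs)

result[order(-result$count), ]
```

In this toy example, Active->Active occurs three times but in only two of the three sequences, which is why count and contained_in can differ.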
We can also compute various sequence indices:
sequence_indices(engagement)
#> # A tibble: 1,000 × 23
#>    valid_n valid_proportion unique_states mean_spell_duration max_spell_duration
#>      <int>            <dbl>         <int>               <dbl>              <dbl>
#>  1      23                1             3                3.83                 11
#>  2      23                1             3                3.29                 11
#>  3      24                1             3                3.43                  8
#>  4      24                1             3                4                     9
#>  5      24                1             3                3.43                 12
#>  6      23                1             3                5.75                 13
#>  7      23                1             3                2.88                  7
#>  8      23                1             3                3.29                  8
#>  9      23                1             3                2.88                  7
#> 10      24                1             3                8                    20
#> # ℹ 990 more rows
#> # ℹ 18 more variables: longitudinal_entropy <dbl>, simpson_diversity <dbl>,
#> # self_loop_tendency <dbl>, transition_rate <dbl>,
#> # transition_complexity <dbl>, initial_state_persistence <dbl>,
#> # initial_state_proportion <dbl>, initial_state_influence_decay <dbl>,
#> # cyclic_feedback_strength <dbl>, first_state <chr>, last_state <chr>,
#> # dominant_state <chr>, dominant_proportion <dbl>, …
The codyna package provides methods for the detection of
early warning signals (EWS). These methods have been adapted from the
EWSmethods package with a focus on high performance. Instead of
explicit rolling window calculations, codyna implements the
measures using update formulas, resulting in up to a 1000-fold reduction in
computation time in some instances. First, we prepare some simple time
series data for analysis.
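The data-preparation step can be sketched as follows. This is only a hypothetical stand-in, a seeded Gaussian random walk built in base R. It is an assumption that the functions used below accept a plain numeric vector, and the values printed later in this vignette come from the vignette's own data, so this series will not reproduce them.

```r
# Hypothetical stand-in data: a seeded random walk with 201 points
# that starts at 0 (not the series used for the printed results).
set.seed(1)
ts_data <- cumsum(c(0, rnorm(200)))
```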
Both rolling window and expanding window methods are supported.
ews_roll <- detect_warnings(ts_data, method = "rolling")
ews_exp <- detect_warnings(ts_data, method = "expanding")
The function detect_warnings returns an object of class
ews, and the results can be easily visualized with the plot
method of this class.
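The idea behind the update formulas mentioned earlier can be illustrated with a rolling variance. The sketch below is plain base R and is not codyna's implementation: the function names are hypothetical, and the text does not say which formulas the package uses. It only shows how running sums let each new window be computed in constant time instead of rescanning the whole window.

```r
# Rolling variance by recomputing every window from scratch.
rolling_var_explicit <- function(x, w) {
  n <- length(x)
  vapply(seq_len(n - w + 1), function(i) var(x[i:(i + w - 1)]), numeric(1))
}

# Rolling variance with O(1) work per step via running sums.
# Assumes 2 <= w <= length(x).
rolling_var_update <- function(x, w) {
  n <- length(x)
  out <- numeric(n - w + 1)
  s1 <- sum(x[1:w])    # running sum of the window
  s2 <- sum(x[1:w]^2)  # running sum of squares of the window
  out[1] <- (s2 - s1^2 / w) / (w - 1)
  for (i in seq_len(n - w)) {
    # Slide the window by one step: drop x[i], add x[i + w].
    s1 <- s1 + x[i + w] - x[i]
    s2 <- s2 + x[i + w]^2 - x[i]^2
    out[i + 1] <- (s2 - s1^2 / w) / (w - 1)
  }
  out
}

set.seed(42)
x <- cumsum(rnorm(500))
same <- all.equal(rolling_var_explicit(x, 50), rolling_var_update(x, 50))
same
```

The sum-of-squares form is the simplest update to write down, but it can lose precision when the values are large relative to their spread; a production implementation would likely use a more stable recurrence.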
One of the core features of codyna is regime detection
for time series data. Various methods are included with a user-friendly
interface and automated parameter selection based on sensitivity. We
continue with the example time series data.
regimes <- detect_regimes(
  data = ts_data,
  method = "threshold",
  sensitivity = "medium"
)
regimes
#> # A tibble: 201 × 9
#>     value  time change    id type          magnitude confidence stability  score
#>  *  <dbl> <dbl> <lgl>  <int> <chr>             <dbl> <lgl>      <chr>      <dbl>
#>  1  0         1 FALSE      1 none               0    NA         Initial   NA
#>  2  0.623     2 TRUE       2 threshold_me…      0.25 NA         Unstable   0.225
#>  3  0.441     3 FALSE      2 none               0    NA         Transiti…  0.35
#>  4  2.12      4 FALSE      2 none               0    NA         Transiti…  0.475
#>  5  3.62      5 FALSE      2 none               0    NA         Transiti…  0.6
#>  6  2.56      6 FALSE      2 none               0    NA         Transiti…  0.725
#>  7  2.62      7 FALSE      2 none               0    NA         Transiti…  0.6
#>  8  2.19      8 FALSE      2 none               0    NA         Transiti…  0.475
#>  9  0.858     9 FALSE      2 none               0    NA         Transiti…  0.35
#> 10 -0.158    10 FALSE      2 none               0    NA         Unstable   0.225
#> # ℹ 191 more rows
The columns value and time list the
original time series values and time points. The column
change shows when regime changes occur, and
type describes the type of regime change (which depends on
the applied method). The id column provides the regime
identifiers. The column magnitude quantifies the magnitude
of the regime shift, and confidence is a method-dependent
measure of the likelihood of an actual regime shift. In addition, regime
stability is described by the stability column, along with a stability score
provided in the score column. The resulting object is of
class regimes, which has a customized plot method for
visualizing the stability of the regimes along the original time series
data.
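Because the result prints as a tibble, regime boundaries can be pulled out with ordinary subsetting. The sketch below uses a hand-made data frame that mimics the time, value, change and id columns; the values are made up and are not output from detect_regimes. The relationship between id and change is an assumption read off the printed rows above, where id increases by one at the row where change is TRUE.

```r
# Toy data frame mimicking part of the output (values are made up).
toy <- data.frame(
  time   = 1:8,
  value  = c(0.1, 0.3, 2.4, 2.6, 2.5, -1.0, -1.2, -0.9),
  change = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

# Assumed relationship: a new regime id starts wherever change is TRUE.
toy$id <- cumsum(toy$change) + 1L

# Time points at which a new regime starts.
change_points <- toy$time[toy$change]

# Number of observations and mean value within each regime.
regime_length <- table(toy$id)
regime_mean <- tapply(toy$value, toy$id, mean)

change_points
regime_length
regime_mean
```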