discovr:
Resources for Discovering Statistics Using R and RStudio (Field,
2023)
NOTE: This package is incomplete but under active
development. Check back here for updates/new tutorials.
The discovr package contains resources for my 2023
textbook Discovering Statistics
Using R and RStudio. There are
tutorials written using learnr. Once a tutorial is
running it’s a bit like reading a book but with places where you can
practice the code
that you have just been taught. The discovr package is free
and offered to support tutors and students using my textbook who want to
learn R.
Installing discovr
To use discovr you first need to install R and RStudio and
familiarise yourself with R, RStudio and good
workflow practice. You can do this using this
interactive tutorial. Once you have installed R and RStudio you can
install discovr. The package is in development so you have
to install it from GitHub, from within R.
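The install command is not reproduced in this text. A minimal sketch, assuming the development version is hosted in the GitHub repository profandyfield/discovr and that the remotes package is used to fetch it (both are assumptions, not stated above):

```r
# Assumption: the development version lives at github.com/profandyfield/discovr.
repo <- "profandyfield/discovr"

# Only attempt the install in an interactive session (it needs internet access).
if (interactive()) {
  # remotes provides install_github(); install it first if it is missing.
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github(repo)
}
```

Run this once from the R console in RStudio. After it finishes, library(discovr) should load the package.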
If you are trying to install on a networked computer the install
might fail (it’s to do with install.packages not liking UNC paths, which I’m
not even going to pretend to understand). The solution is to specify the
location of your R
library at the point of install. Most networks will map network
locations to a drive name (for example, at my own University, user
accounts are on the ‘N’ drive). Find the location of your R
library (e.g.,
N:/Documents/R/win-library/3.5), possibly executing
.libPaths() to help you, and specify this location using
the lib argument when you install.
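A minimal sketch of the networked install, assuming the same remotes-based route and GitHub repository (profandyfield/discovr) as a normal install. The library path is the example from the text and will differ on your network:

```r
# Example location from the text; replace it with a path reported by .libPaths().
lib_path <- "N:/Documents/R/win-library/3.5"

# Show the library locations R currently knows about.
print(.libPaths())

# Only attempt the install in an interactive session (it needs internet access).
if (interactive()) {
  # Extra arguments such as lib are passed on to the underlying install step.
  remotes::install_github("profandyfield/discovr", lib = lib_path)
}
```

Use forward slashes in the path, as in the example above, even on Windows.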
I recommend working through this
playlist of tutorials on how to install, set up and work within
R and RStudio before
starting the interactive tutorials.
List of tutorials
discovr_01: Key concepts in R (functions and
objects, packages and functions, style, data types, tidyverse,
tibbles)
discovr_02: Summarizing data (frequency
distributions, grouped frequency distributions, relative frequencies,
histograms, mean, median, variance, standard deviation, interquartile
range)
discovr_03: Confidence intervals: interactive app
demonstrating what a confidence interval is, computing normal and
bootstrap confidence intervals using R, adding confidence
intervals to data summaries.
discovr_05: Visualizing data. The ggplot2 package,
boxplots, plotting means, violin plots, scatterplots, grouping by
colour, grouping using facets, adjusting scales, adjusting
positions.
discovr_06: The beast of bias. Restructuring data
from messy to tidy format (and back). Spotting outliers using histograms
and boxplots. Calculating z-scores (standardizing scores).
Writing your own function. Using z-scores to detect outliers. Q-Q plots.
Calculating skewness, kurtosis and the number of valid cases. Grouping
summary statistics by multiple categorical/grouping variables.
discovr_07: Associations. Plotting data with
GGally. Pearson’s r, Spearman’s Rho, Kendall’s tau, robust
correlations.
discovr_08: The general linear model (GLM).
Visualizing the data, fitting GLMs with one and two predictors. Viewing
model parameters with broom, model parameters, standard errors,
confidence intervals, fit statistics, significance, Bayes factors and
Bayesian estimates (using default priors).
discovr_09: Categorical predictors with two
categories (comparing two means). Comparing two independent means,
comparing two related means, effect sizes, robust comparisons of means
(independent and related), Bayes factors and estimation (independent and
related means).
discovr_10: Moderation and mediation. Centring
variables (grand mean centring), specifying interaction terms,
moderation analysis, simple slopes analysis, Johnson-Neyman intervals,
mediation with one predictor, direct and indirect effects, mediation
using lavaan.
discovr_11: Comparing several means. Essentially
‘One-way independent ANOVA’ but taught using a general linear model
framework. Covers setting contrasts (dummy coding, contrast coding, and
linear and quadratic trends), the F-statistic and Welch’s
robust F, robust parameter estimation,
heteroscedasticity-consistent tests of parameters, robust tests of means
based on trimmed data, post hoc tests, Bayes factors.
discovr_12: Comparing means adjusted for other
variables. Essentially ‘Analysis of Covariance (ANCOVA)’ designs but
taught using a general linear model framework. Covers setting contrasts,
Type III sums of squares, the F-statistic, robust parameter
estimation, heteroscedasticity-consistent tests of parameters, robust
tests of adjusted means, post hoc tests, Bayes factors.
discovr_13: Factorial designs. Fitting models for
two-way factorial designs (independent measures) using both
lm() and the afex package. This tutorial
builds on previous ones to show how models can be fit with two
categorical predictors to look at the interaction between them. We look
at fitting the models, setting contrasts for the two categorical
predictors, obtaining estimated marginal means, interaction plots,
simple effects analysis, diagnostic plots, partial eta-squared and
partial omega-squared, robust models and Bayes factors.
discovr_14: Multilevel models. This tutorial looks
at fitting multilevel models using the lme4 package. It
begins with an optional section on checking and coding categorical
variables before moving on to show you how to fit and interpret a
multilevel model. We also briefly look at the purrr
package.
discovr_15: Repeated measures designs. Fitting
models for one- and two-way repeated measures designs using the
afex package. This tutorial builds on previous ones to show
how models can be fit with one or two categorical predictors when these
variables have been manipulated within the same entities. We look at
fitting the models, setting contrasts for the categorical predictors,
obtaining estimated marginal means, interaction plots, simple effects
analysis, diagnostic plots and robust models.
discovr_15_mlm: Repeated measures designs as
multilevel models. Fitting models for one- and two-way repeated measures
designs using a multilevel model framework. This tutorial builds on
previous ones to show how models can be fit with one or two categorical
predictors when these variables have been manipulated within the same
entities. We look at fitting the models, setting contrasts for the
categorical predictors and diagnostic plots.
discovr_15_growth: Modelling change over time.
Growth models using multilevel modelling.
discovr_16: Mixed designs. Fitting models for mixed
designs using the afex package. This tutorial builds on
previous ones to show how models can be fit with one or two categorical
predictors when at least one of these variables has been manipulated
within the same entities and at least one other has been manipulated
using different entities. We look at fitting the models, setting
contrasts for the categorical predictors, obtaining estimated marginal
means, interaction plots, simple effects analysis, diagnostic plots,
robust models and Bayes factors.
discovr_18: Exploratory Factor Analysis (EFA).
Applying factor analysis using the psych package. This
tutorial uses a fictitious questionnaire (the
R Anxiety Scale, RAQ)
with 23 items to show how EFA can be used to identify clusters of items
that may, or may not, represent constructs associated with anxiety about
using R. We look at
inspecting the correlation matrix, obtaining the Bartlett test and KMO
statistics, using parallel analysis to determine the number of factors
to extract, extracting factors, rotating the solution and interpretation
of the factors. We also learn to obtain Cronbach’s alpha on each of the
subscales.
discovr_19: Categorical variables. Entering
categorical data, contingency tables, associations between categorical
variables, the chi-square test, standardized residuals, Fisher’s exact
test.
discovr_20: Categorical outcomes (logistic
regression). This tutorial builds on previous ones to show how the
general linear model extends to situations where you want to
predict a binary outcome (logistic regression). We look at fitting the
models and interpreting the odds ratio.
Running a tutorial
In RStudio Version
1.3 onwards there is a tutorial pane. Having executed
library(discovr)
a list of tutorials appears in this pane. Scroll through them and
click on the button next to a tutorial to run it.
Alternatively, you can run a particular tutorial from the console by
name. Each tutorial’s name is given at the start of its entry in the
list above. Once the command to run the tutorial is executed it will
spring to life in a web browser.
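A minimal sketch of the console route, assuming learnr’s run_tutorial() function is the launcher (the original command is not reproduced in this text) and using discovr_02 purely as an example name:

```r
# Name of the tutorial to run, taken from the list of tutorials above.
tutorial_name <- "discovr_02"

# Launch only in an interactive session and only if discovr is installed.
if (interactive() && requireNamespace("discovr", quietly = TRUE)) {
  # Assumption: learnr::run_tutorial() starts the tutorial in a web browser.
  learnr::run_tutorial(tutorial_name, package = "discovr")
}
```

Swap the value of tutorial_name for any other name in the list, for example discovr_15_mlm.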
Suggested workflow
The tutorials are self-contained (you practice code in code boxes) so
you don’t need to use RStudio at the same
time. However, to get the most from them I would recommend that you
create an RStudio
project and within that open (and save) a new RMarkdown file each time
to work through a tutorial. Within that Markdown file, replicate parts
of the code from the tutorial (in code chunks) and use Markdown to write
notes about what you have done, and to reflect on things that you have
struggled with, or note useful tips to help you remember things.
Basically, write a learning journal. This workflow has the advantage of
not just teaching you the code that you need to do certain things, but
also provides practice in using RStudio
itself.
See this video explaining my suggested workflow:
Colour palettes
Inspired by the rockthemes
package and adapting code from that package I have come up with a bunch
of colour themes based around the studio albums of my favourite band Iron Maiden. Full disclosure, I’m
not a designer, so this largely involved uploading images of their
sleeves to colorpalettefromimage.com
and seeing what happened. If you have a better palette design send me
the hex codes for the colours! If you’re wondering why some albums are
missing, here’s the explanation: X Factor (would basically be 8 shades
of gray), Fear of the Dark (shit album), The Book of Souls (would
basically be 8 shades of black).
There is also a colourblind-friendly colour palette from Okabe and Ito.
The following palettes exist.
amolad_pal(): Colour palette (8 colour) based on Iron
Maiden’s A
Matter of Life and Death album sleeve. In ggplot2 use
scale_color_amolad() and
scale_fill_amolad().
bnw_pal(): Colour palette (8 colour) based on Iron
Maiden’s Brave
New World album sleeve. In ggplot2 use
scale_color_bnw() and scale_fill_bnw().
dod_pal(): Colour palette (8 colour) based on Iron
Maiden’s Dance of
Death album sleeve. In ggplot2 use
scale_color_dod() and scale_fill_dod().
frontier_pal(): Colour palette (8 colour) based on Iron
Maiden’s The
Final Frontier album sleeve. In ggplot2 use
scale_color_frontier() and
scale_fill_frontier().
im_pal(): Colour palette (8 colour) based on Iron
Maiden’s eponymous
album sleeve. In ggplot2 use scale_color_im()
and scale_fill_im().
killers_pal(): Colour palette (8 colour) based on Iron
Maiden’s Killers
album sleeve. In ggplot2 use
scale_color_killers() and
scale_fill_killers().
nob_pal(): Colour palette (8 colour) based on Iron
Maiden’s The
Number of the Beast album sleeve. In ggplot2 use
scale_color_nob() and scale_fill_nob().
okabe_ito_pal(): Colourblind-friendly palette (8 colour)
from Okabe and Ito. In
ggplot2 use scale_color_oi() and
scale_fill_oi().
pom_pal(): Colour palette (8 colour) based on Iron
Maiden’s Piece of
Mind album sleeve. In ggplot2 use
scale_color_pom() and scale_fill_pom().
power_pal(): Colour palette (8 colour) based on Iron
Maiden’s Powerslave
album sleeve. In ggplot2 use
scale_color_power() and
scale_fill_power().
prayer_pal(): Colour palette (8 colour) based on Iron
Maiden’s No
Prayer for the Dying album sleeve. In ggplot2 use
scale_color_prayer() and
scale_fill_prayer().
senjutsu_pal(): Colour palette (10 colour) based on the
inner gatefold image of Iron Maiden’s Senjutsu
album sleeve. In ggplot2 use
scale_color_senjutsu() and
scale_fill_senjutsu().
sit_pal(): Colour palette (8 colour) based on Iron
Maiden’s Somewhere
in Time album sleeve. In ggplot2 use
scale_color_sit() and scale_fill_sit().
ssoass_pal(): Colour palette (8 colour) based on Iron
Maiden’s Seventh
Son of a Seventh Son album sleeve. In ggplot2 use
scale_color_ssoass() and
scale_fill_ssoass().
virtual_pal(): Colour palette (8 colour) based on Iron
Maiden’s Virtual
IX album sleeve. In ggplot2 use
scale_color_virtual() and
scale_fill_virtual().
To view the palette execute
scales::show_col(name_of_palette()(8))
Replacing name_of_palette() with the name, for
example
scales::show_col(pom_pal()(8))
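The double call in name_of_palette()(8) reads as follows: the first call returns a function, and the second call asks that function for 8 colours. A minimal sketch of the pattern using a hypothetical generator, demo_pal(), with placeholder hex codes (neither comes from discovr, and this is not necessarily how the package implements its palettes):

```r
# Hypothetical palette generator mirroring the name_of_palette()(n) call pattern.
# The hex codes are placeholders, not colours taken from discovr.
demo_pal <- function() {
  colours <- c("#000000", "#E69F00", "#56B4E9", "#009E73")
  # Return a function that hands back the first n colours.
  function(n) {
    stopifnot(n <= length(colours))
    colours[seq_len(n)]
  }
}

# First call builds the palette function, second call requests 3 colours.
first_three <- demo_pal()(3)
print(first_three)
```

The character vector of hex codes returned by the second call is what scales::show_col() displays.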
To apply, for example, the Powerslave palette to the colours of a
ggplot2 plot add scale_color_power() as a
layer:
library(ggplot2)

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.
classic_era <- subset(discovr::eddiefy, year < 1992)
#> Loading required package: lubridate
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#>     date, intersect, setdiff, union

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  discovr::scale_color_power() +
  theme_minimal()
Similarly to apply the Powerslave palette to the fill of objects in a
ggplot add scale_fill_power() as a layer:
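# theme_minimal() is a complete theme, so it goes before theme();
# otherwise it would reset the rotated axis labels.
ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  discovr::scale_fill_power() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45))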
Datasets
See the book or data descriptions for more details. This is a list of
available datasets within the package. Raw CSV files are available from
the book’s website.
acdc: Data about whether Bon Scott or Brian Johnson
is the best singer of AC/DC. For details execute
?acdc.
album_sales: fictitious data about predicting album
sales from advertising, airplay and the band’s image. For details
execute ?album_sales.
alien_scents: fictitious data about training
sniffer dogs to detect alien space lizards when they try to mask their
identity with different scents. For details execute
?alien_scents.
animal_bride: fictitious data about life
satisfaction when married to a dog or a goat. For details execute
?animal_bride.
angry_pigs: fictitious data about whether playing
the video game angry pigs makes people more aggressive towards pigs. For
details execute ?angry_pigs.
angry_real: fictitious data about whether playing
the video game angry pigs makes people more aggressive in everyday life.
For details execute ?angry_real.
animal_dance: Fictitious data about training cats
and dogs to dance. For details execute ?animal_dance.
beckham_1929: Data from a study by Beckham (1929).
For details execute ?beckham_1929.
big_hairy_spider: fictitious data about whether
anxiety is greater after exposure to real spiders or pictures of
spiders. For details execute ?big_hairy_spider.
biggest_liar: fictitious data about creativity and
telling lies. For details execute ?biggest_liar.
bronstein_2019: Data about whether delusion
proneness predicts belief in fake news because of less analytic
thinking. For details execute ?bronstein_2019.
bronstein_miss_2019: The data in bronstein_2019
but with missing values inserted using MCAR amputation. For details
execute ?bronstein_miss_2019.
catterplot: fictitious data for plotting a
catterplot. For details execute ?catterplot.
cat_dance: fictitious data about training cats to
dance. For details execute ?cat_dance.
cat_reg: fictitious data about training cats to
dance. For details execute ?cat_reg.
cetinkaya_2006: data from a study by Cetinkaya and
Domjan (2006) about quails with sexual fetishes. Seriously. For details
execute ?cetinkaya_2006.
chamorro_premuzic: Data about what students want
(personality wise) from their lecturers. For details execute
?chamorro_premuzic.
child_aggression: fictitious data (based on real
research) about predicting aggression in children. For details execute
?child_aggression.
coldwell_2006: Data predicting childhood adjustment
from various parenting variables. For details execute
?coldwell_2006.
cosmetic: Fictitious multilevel data predicting
quality of life from cosmetic surgery. For details execute
?cosmetic.
daniels_2012: Data about the effects of sexualised
sports images on self-image. For details execute
?daniels_2012.
dark_lord: fictitious data about the subliminal
messages in songs. For details execute ?dark_lord.
davey_2003: Data about the effects of mood and stop
rules on checking behaviour. For details execute
?davey_2003.
download: fictitious data about the download music
festival and being smelly. For details execute
?download.
df_beta: fictitious data used to illustrate the DF
Beta statistic. For details execute ?df_beta.
eel: Fictitious data about a randomized control
trial to test whether eel therapy is an effective treatment of
constipation. For details execute ?eel.
elephooty: Fictitious data about elephants playing
football (soccer). For details execute ?elephooty.
escape: Fictitious data about whether I’m a better
songwriter than my old bandmate Malcolm. For details execute ?escape.
essay_marks: fictitious data about essay marking.
For details execute ?essay_marks.
exam_anxiety: fictitious data about exam
performance, anxiety and revision. For details execute
?exam_anxiety.
field_2006: Data that tests a hypothesis that
threat information affects children’s avoidance of novel animals. For
details execute ?field_2006.
gallup_2003: Data that tests a hypothesis about why
penises have a bell end. For details execute
?gallup_2003.
gelman_2009: Data used to critically evaluate the
explanations (and claim) that there are more beautiful women than men in
the world. For details execute ?gelman_2009.
glastonbury: More fictitious data about music
festivals and being smelly. For details execute
?glastonbury.
goggles: fictitious data about whether alcohol
affects perception of physical attractiveness. For details execute
?goggles.
goggles_lighting: fictitious data about the
moderating effect of lighting on the ratings of attractiveness of
faces after different doses of alcohol. For details execute
?goggles_lighting.
grades: fictitious data about statistics grades.
For details execute ?grades.
hangover: fictitious data about the efficacy of
different drinks as cures for a hangover. For details execute
?hangover.
hiccups: fictitious data on digital rectal
stimulation and hiccups. For details execute ?hiccups.
hill_2007: Data from Hill et al. (2007) testing the
effect of different forms of psychoeducation on exercise behaviour. For
details execute ?hill_2007.
honesty_lab: fictitious data about perceptions of
honesty. For details execute ?honesty_lab.
ice_bucket: Data about the ice bucket challenge.
For details execute ?ice_bucket.
invisibility_base: Fictitious data about how much
mischief people would get up to if they had an invisibility cloak using
a pre-post study design. For details execute ?invisibility_base.
invisibility_cloak: fictitious data about how much
mischief people would get up to if they had an invisibility cloak using
an independent design. For details execute
?invisibility_cloak.
invisibility_rm: fictitious data about how much
mischief people would get up to if they had an invisibility cloak but
using a repeated measures design. For details execute
?invisibility_rm.
jiminy_cricket: fictitious data about whether
wishing on a star makes you successful. For details execute
?jiminy_cricket.
johns_2012: Data about whether the colour red is a
mating signal to men. For details execute ?johns_2012.
lambert_2012: Data about whether pornography use is
related to relationship commitment and infidelity. For details execute
?lambert_2012.
massar_2012: Data about whether gossiping has an
evolutionary function. For details execute
?massar_2012.
mcnulty_2008: Simulated data to match the results
of a study about whether attractiveness is linked to the support given
within a relationship. For details execute
?mcnulty_2008.
men_dogs: fictitious data about whether men exhibit
dog-like behaviours (compared to dogs). For details execute
?men_dogs.
metal: Fictitious data about whether listening to
metal music makes you angry. For details execute ?metal.
metal_health: fictitious data about whether
listening to heavy metal negatively affects mental health. For details
execute ?metal_health.
metallica: Data about thrash metal band, Metallica.
For details execute ?metallica.
miller_2007: Data from Miller et al. (2007) testing
the hidden-estrus theory. For details execute
?miller_2007.
mixed_attitude: Fictitious data about whether
different types of imagery in advertising affect ratings of different
types of drinks based on the gender identity of the participant. For
details execute ?mixed_attitude.
murder: fictitious data about the number of murders
each month at three street locations (Ruskin Avenue, Acacia Avenue and
Rue Morgue). For details execute ?murder.
muris_2008: Data about whether you can train
children to interpret ambiguous situations in a particular way. For
details execute ?muris_2008.
nichols_2004: Data from the development of the
Internet Addiction Scale, IAS (Nichols & Nicki, 2004). For details
execute ?nichols_2004.
notebook: fictitious data about whether watching
the film The Notebook is emotionally arousing. For details execute
?notebook.
ocd: Fictitious data about interventions for
obsessive compulsive disorder. For details execute
?ocd.
ong_2011: Data about social media profile pictures
and personality traits. For details execute ?ong_2011.
ong_tidy: Data about social media profile pictures
and personality traits. For details execute ?ong_tidy.
penalty: Fictitious data about predictors of
penalty kick success in soccer (or whatever sport you enjoy). For
details execute ?penalty.
profile_pic: Fictitious data related to whether the
number of friend requests from random people on social media is affected
by whether your profile picture depicts you as single or part of a
romantic couple. For details execute ?profile_pic.
pubs: Data illustrating the difference between an
outlier and an influential case. For details execute
?pubs.
puppies: Fictitious data related to whether puppy
therapy works. For details execute ?puppies.
puppy_love: Fictitious data related to whether
puppy therapy works when you adjust for a person’s love of puppies. For
details execute ?puppy_love.
r_exam: Fictitious data relating to an R exam at
two universities. For details execute ?r_exam.
reality_tv: Fictitious data relating to whether
being on a reality TV show exacerbates personality disorder traits. For
details execute ?reality_tv.
raq: Fictitious data relating to a fictional
questionnaire about R anxiety that is not an actual questionnaire. For
details execute ?raq.
roaming_cats: fictitious data about how far cats
roam from their homes. For details execute
?roaming_cats.
rollercoaster: Fictitious data about how
roller-coaster induced fear affects attractiveness ratings. For details
execute ?rollercoaster.
santas_log: Fictitious data related to whether the
type and quantity of treat consumed on Christmas night affects whether
elves successfully deliver presents. For details execute
?santas_log.
self_help: fictitious data about whether self-help
books improve relationship satisfaction. For details execute
?self_help.
self_help_dsur: fictitious data about whether
self-help books improve relationship satisfaction compared to statistics
books. For details execute ?self_help_dsur.
sharman_2015: Data from Sharman & Dingle (2015)
about whether listening to metal music increases anger. For details
execute ?sharman_2015.
shopping: fictitious data about shopping. For
details execute ?shopping_exercise.
sniffer_dogs: fictitious data about training
sniffer dogs to detect alien space lizards. For details execute
?sniffer_dogs.
social_anxiety: fictitious (I think) data about
whether social anxiety symptoms are specific to social anxiety. For
details execute ?social_anxiety.
social_media: fictitious data about the effects of
social media on grammar. For details execute
?social_media.
soya: fictitious data about the effects of eating
soya on sperm count. For details execute ?soya.
speed_date: Fictitious data related to the extent
to which interest in dating someone is affected by their looks,
personality or the dating strategy they adopt. For details execute
?speed_date.
stalker: fictitious data about therapy for
stalking. For details execute ?stalker.
students: I can’t even remember what this data file
contains. For details execute ?student.
superhero: fictitious data about whether wearing
different superhero costumes leads to more severe physical injuries. For
details execute ?superhero.
supermodel: fictitious data about supermodel
salaries. For details execute ?supermodel.
switch: Fictitious data relating to whether
injuries from playing video console games can be mitigated by a warm
up. For details execute ?switch.
tablets: fictitious data about predicting the
desirability of computing tablets. For details execute
?tablets.
tea_15: fictitious data based on real data about
cognitive functioning and drinking tea. For details execute
?tea_makes_you_brainy_15.
tea_716: fictitious data based on real data about
cognitive functioning and drinking tea. For details execute
?tea_makes_you_brainy_716.
teaching: fictitious data about the success of
different methods of teaching. For details execute
?teaching.
teach_method: more fictitious data about the
success of different methods of teaching. For details execute
?teach_method.
text_messages: fictitious data about whether use of
messaging apps ruins your grammar. For details execute
?text_messages.
tosser: Fictitious data relating to a fictional
questionnaire about The Teaching of Statistics for Scientific
Experiments, which is fictional. For details execute
?tosser.
tuk_2011: Data about whether needing to urinate
helps decision making. For details execute ?tuk_2011.
tumour: fictitious data about mobile phone use and
brain tumours. For details execute ?tumour.
tutor_marks: fictitious data comparing 4 tutors’
marks of the same essays. For details execute
?tutor_marks.
van_bourg_2020: Data from van Bourg et al. (2020)
relating to whether dogs would release their distressed owners from a
box. For details execute ?van_bourg_2020.
video_games: fictitious data about the relationship
between video game use, callous unemotional traits and aggression. For
details execute ?video_games.
williams: Data relating to the development of a
questionnaire to measure organizational ability. For details execute
?williams.
xbox: Fictitious data relating injuries to the type
of video console game played and the console it was played on. For
details execute ?xbox.
zhang_sample: Data about whether performing a maths
test under a different name assists performance. For details execute
?zhang_2013_subsample.
zibarras_2008: Data from Zibarras, Port, and Woods
(2008) relating to the relationship between personality and creativity.
For details execute ?zibarras_2008.
zombie_growth: fictitious data that mimics a
randomised control trial over time testing an intervention to transform
zombies back to their pre-zombified state. For details execute
?zombie_growth.
zombie_rehab: fictitious data that mimics a
randomised control trial testing an intervention to transform zombies
back to their pre-zombified state in different clinics. For details
execute ?zombie_rehab.
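A minimal sketch of how the datasets listed above might be explored, assuming discovr is installed. It uses acdc as the example, and the has_discovr and acdc_data names are introduced here for illustration:

```r
# Check whether discovr is installed before touching its data.
has_discovr <- requireNamespace("discovr", quietly = TRUE)

if (has_discovr) {
  # Names of all datasets bundled with the package.
  print(data(package = "discovr")$results[, "Item"])

  # Access one dataset without attaching the package, then inspect it.
  acdc_data <- discovr::acdc
  print(head(acdc_data))
  str(acdc_data)
}
```

From the console, help("acdc", package = "discovr") opens the same documentation page as ?acdc.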
Smart Alex solutions
Solutions for end of chapter tasks are available at www.discovr.rocks.
Labcoat Leni solutions
Solutions for the Labcoat Leni tasks are available at www.discovr.rocks.
Chapter code
Although I recommend working through the interactive solutions, each
book chapter has online code and a downloadable R Markdown file
available from www.discovr.rocks.