Our Experiment: Each eyetrackingR vignette uses the eyetrackingR package to analyze real data from a simple 2-alternative forced choice (2AFC) word recognition task administered to 19- and 24-month-olds. On each trial, infants were shown a picture of an animate object (e.g., a horse) and an inanimate object (e.g., a spoon). After the infants had inspected the images, the images disappeared and the infants heard a label referring to one of them (e.g., “The horse is nearby!”). Finally, the objects re-appeared on the screen and the infants were prompted to look at the target (e.g., “Look at the horse!”).
In this vignette, we want to ascertain when a predictor had a significant effect during a trial. Analyses that aggregate over the trial window tell us whether an effect was significant, growth curve analyses tell us the trajectory of our effect over the course of the trial, and onset-contingent analyses can tell us reaction times for certain experimental designs. But none of these approaches allows us to ask: What is the onset of some predictor’s effect, and how long does the effect last? eyetrackingR includes two types of analyses for answering these questions, both of which we cover here.
Before performing this analysis, we’ll need to prepare and clean our dataset. Here we will do this quickly and with few notes, but for more information, see the vignette on preparing your data.
set.seed(42)
library("Matrix")
library("lme4")
library("ggplot2")
library("eyetrackingR")
data("word_recognition")
data <- make_eyetrackingr_data(word_recognition,
                               participant_column = "ParticipantName",
                               trial_column = "Trial",
                               time_column = "TimeFromTrialOnset",
                               trackloss_column = "TrackLoss",
                               aoi_columns = c('Animate', 'Inanimate'),
                               treat_non_aoi_looks_as_missing = TRUE)
# subset to response window post word-onset
response_window <- subset_by_window(data,
                                    window_start_time = 15500,
                                    window_end_time = 21000,
                                    rezero = FALSE)
## Avg. window length in new data will be 5500
# analyze amount of trackloss by subjects and trials
(trackloss <- trackloss_analysis(data = response_window))
# remove trials with > 25% trackloss
response_window_clean <- clean_by_trackloss(data = response_window,
                                            trial_prop_thresh = .25)
## Performing Trackloss Analysis...
## Will exclude trials whose trackloss proportion is greater than : 0.25
## ...removed 33 trials.
# create Target condition column
response_window_clean$Target <- as.factor(ifelse(test = grepl('(Spoon|Bottle)', response_window_clean$Trial),
                                                 yes = 'Inanimate',
                                                 no = 'Animate'))
Our first approach is to look for runs of significant differences between our conditions after smoothing the data in a series of time bins (similar to Wendt et al., 2014). This involves resampling the data with replacement (bootstrapping) and fitting a curve to each resample (using smooth.spline(), loess(), or no smoother). This is a useful technique for estimating the timepoints of divergence between two conditions, while the smoothing helps remove minor deviations that might disrupt what would otherwise be considered a single divergent period. This can be especially helpful in infant data, which can be extremely noisy. Note that this approach does not explicitly control for Type-I error rates (i.e., it’s not a replacement for something like Bonferroni correction), and it can only deal with two-level factors.
This method returns a list of divergences between your two conditions based on time windows in which the 95% confidence intervals did not include 0 (i.e., p < .05).
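To make the logic concrete, here is a minimal, self-contained sketch of the bootstrap-and-smooth idea run on simulated data. It is only an illustration of the general approach, not eyetrackingR’s implementation; the number of participants, effect size, noise level, and the helper sim_participant() are all invented for the example.
# Illustrative only: bootstrap-resample simulated participant difference
# curves, smooth each resampled mean with smooth.spline(), and find time
# bins whose 95% CI excludes zero.
set.seed(1)
times <- seq(0, 5500, by = 100)   # time bins (ms)
n_sub <- 20                       # simulated participants
# one difference curve (condition A minus condition B) per participant:
# no effect before 1000 ms, a modest positive effect afterwards, plus noise
sim_participant <- function() {
  ifelse(times > 1000, 0.15, 0) + rnorm(length(times), sd = 0.1)
}
diffs <- t(replicate(n_sub, sim_participant()))   # participants x time bins
n_boot <- 500
boot_curves <- matrix(NA, n_boot, length(times))
for (i in seq_len(n_boot)) {
  resampled <- diffs[sample(n_sub, replace = TRUE), ]   # resample participants
  avg <- colMeans(resampled)                            # mean difference per bin
  boot_curves[i, ] <- predict(smooth.spline(times, avg), times)$y   # smooth it
}
# pointwise 95% CI of the smoothed difference curve; bins whose CI excludes
# zero count as "significant", and consecutive runs of them are divergences
ci_lo <- apply(boot_curves, 2, quantile, probs = .025)
ci_hi <- apply(boot_curves, 2, quantile, probs = .975)
significant <- ci_lo > 0 | ci_hi < 0
rle(significant)   # runs of TRUE are candidate divergence windows
On real data, eyetrackingR’s make_boot_splines_data and analyze_boot_splines (used below) take care of the resampling, smoothing, and confidence-interval bookkeeping for you.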
To begin, we need to use make_time_sequence_data to generate a time-binned dataframe. The bootstrap analysis we’ll be doing requires that we summarize our data; traditionally, we’d want a by-participants summary (but in some circumstances, you might want a by-items summary as well).
response_time <- make_time_sequence_data(response_window_clean,
                                         time_bin_size = 100,
                                         predictor_columns = c("Target"),
                                         aois = "Animate",
                                         summarize_by = "ParticipantName")
# visualize timecourse
plot(response_time, predictor_column = "Target") +
  theme_light() +
  coord_cartesian(ylim = c(0, 1))
We can then use make_boot_splines_data to resample from this dataset (samples times) and fit a smoother to each sample:
bootstrapped_familiar <- make_boot_splines_data(response_time,
                                                predictor_column = 'Target',
                                                within_subj = TRUE,
                                                samples = 1000,
                                                alpha = .05,
                                                smoother = "smooth.spline")
We can then plot this curve. Because this is a within-subjects design, we get a single curve (and CI) corresponding to the difference between conditions. With between-subjects designs, you’ll get two curves (and CIs), one corresponding to the estimate for each condition.
plot(bootstrapped_familiar)
## Plotting within-subjects differences...
Finally, we can look at each timepoint to see whether the CIs include 0 and look for runs of significant timepoints – i.e., divergences.
bootstrap_analysis_familiar <- analyze_boot_splines(bootstrapped_familiar)
summary(bootstrap_analysis_familiar)
## Divergences:
## 1: 15900 - 21000
This analysis suggests that the effect is significant for virtually the entire time-window, starting as early as 15900ms.
Our second approach is to perform a different type of bootstrapping analysis, referred to as a cluster-based permutation analysis (Maris & Oostenveld, 2007). This analysis takes a summed statistic for each cluster of time bins that pass some level of significance, and compares each to the “null” distribution of sum statistics (obtained by bootstrap resampling data within the largest of the clusters).
This type of analysis should often give similar results to the above, but it has two main advantages. First, it controls the false-alarm (Type-I error) rate across the many time bins, something the bootstrapped-splines approach above does not do explicitly. Second, it works with a variety of statistical tests (t.test, wilcox.test, lm, and lmer), so that continuous predictors, covariates, etc. can also be included in the model being tested.
Here’s the procedure eyetrackingR implements under the hood (a toy sketch of this logic follows the list):
1. Run a statistical test on each time bin.
2. Take the time bins whose statistic exceeds some threshold and group adjacent ones into clusters.
3. For each cluster, sum the statistics of the time bins inside it; this is the cluster’s “sum statistic”.
4. Shuffle/resample the data many times, recomputing the clusters and their sum statistics each time, to build a “null” distribution.
5. Compare each original cluster’s sum statistic against this null distribution to obtain its p-value.
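Before turning to the eyetrackingR functions, here is a toy base-R sketch of the cluster-mass permutation logic in steps 1-5 above, run on simulated paired differences. It is a sketch under invented assumptions (the simulated data, the sign-flipping permutation scheme, and the helpers bin_t() and cluster_masses() are made up for illustration), not eyetrackingR’s implementation.
# Toy illustration of cluster-mass permutation on simulated paired differences
set.seed(2)
times <- seq(0, 5500, by = 100)
n_sub <- 20
# participants x time-bins matrix of condition differences, with a real
# effect only in the 1000-3000 ms region
diffs <- t(replicate(n_sub,
                     ifelse(times > 1000 & times < 3000, 0.15, 0) +
                       rnorm(length(times), sd = 0.1)))
threshold_t <- qt(.975, df = n_sub - 1)
# steps 1-3: per-bin t statistics, adjacency-based clusters, summed statistics
bin_t <- function(mat) apply(mat, 2, function(x) t.test(x)$statistic)
cluster_masses <- function(tvals, thresh) {
  runs   <- rle(abs(tvals) > thresh)
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  mapply(function(s, e) sum(tvals[s:e]), starts[runs$values], ends[runs$values])
}
obs_masses <- cluster_masses(bin_t(diffs), threshold_t)
# step 4: null distribution: flip each participant's sign at random (a
# simple permutation for paired data) and keep the largest cluster mass
null_max <- replicate(500, {
  flipped <- diffs * sample(c(-1, 1), n_sub, replace = TRUE)
  masses  <- cluster_masses(bin_t(flipped), threshold_t)
  if (length(masses) == 0) 0 else max(abs(masses))
})
# step 5: p-value for each observed cluster
sapply(obs_masses, function(m) mean(null_max >= abs(m)))
In practice you would not write this by hand; make_time_cluster_data and analyze_time_clusters (used below) perform the clustering and resampling, and support the other test types listed above.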
To perform this procedure using eyetrackingR, we’ll need a time_sequence_data dataframe. In this case, we already made one above (response_time).
Next, we’ll set the threshold for the t-statistic that will be considered a divergence. Note that this is just a threshold for grouping samples into initial clusters; lowering this threshold parameter won’t make your divergences any more “significant” or increase your Type I error rate. Here we’ll set the threshold based on the appropriate t-statistic for a sample of this size and an alpha of .05.
alpha <- .05
num_sub <- length(unique(response_window_clean$ParticipantName))
threshold_t <- qt(p = 1 - alpha / 2,
                  df = num_sub - 1)  # pick threshold t based on alpha = .05, two-tailed
We can then look for initial clusters:
df_timeclust <- make_time_cluster_data(response_time,
                                       test = "t.test",
                                       paired = TRUE,
                                       predictor_column = "Target",
                                       threshold = threshold_t)
## Computing t.test for each time bin...
plot(df_timeclust) +
  ylab("T-Statistic")
summary(df_timeclust)
## Test Type: t.test
## Predictor: Target
## Formula: Prop ~ Target
## Cluster 1 =====
## Time: 16100 - 19300
## Sum Statistic: 132.299
## Cluster 2 =====
## Time: 19400 - 20800
## Sum Statistic: 42.31067
Finally, we bootstrap p-values for each of these clusters by asking how likely we are to see a cluster with that “Sum Statistic” given a permutation-based null distribution derived from our largest cluster.
clust_analysis <- analyze_time_clusters(df_timeclust, within_subj = TRUE, paired = TRUE,
                                        samples = 100)  # in practice, you should use a lot more
## Using formula that was supplied in 'make_time_cluster_data'.
summary(clust_analysis)
## Test Type: t.test
## Predictor: Target
## Formula: Prop ~ Target
## Null Distribution =====
## Mean: -3.8066
## SD: 25.6008
## Cluster 1 =====
## Time: 16100 - 19300
## Sum Statistic: 132.299
## Probability: 0
## Cluster 2 =====
## Time: 19400 - 20800
## Sum Statistic: 42.3107
## Probability: 0.01
plot(clust_analysis)
Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177–190.
Wendt, D., Brand, T., & Kollmeier, B. (2014). An Eye-Tracking Paradigm for Analyzing the Processing Time of Sentences with Different Linguistic Complexities. PLoS ONE, 9(6), e100186. http://doi.org/10.1371/journal.pone.0100186.t003