Our Experiment: Each eyetrackingR vignette uses the eyetrackingR package to analyze real data from a simple 2-alternative forced choice (2AFC) word recognition task administered to 19- and 24-month-olds. On each trial, infants were shown a picture of an animate object (e.g., a horse) and an inanimate object (e.g., a spoon). After the infants had inspected the images, the images disappeared and the infants heard a label referring to one of them (e.g., “The horse is nearby!”). Finally, the objects re-appeared on the screen and the infants were prompted to look at the target (e.g., “Look at the horse!”).

Overview of this vignette

In this vignette, we want to ascertain when a predictor had a significant effect during a trial. Analyses that aggregate over the trial window tell us whether an effect was significant, growth curve analyses tell us the trajectory of our effect over the course of the trial, and onset-contingent analyses can tell us reaction times for certain experimental designs. But none of these approaches allows us to ask: What is the onset of some predictor’s effect, and how long does the effect last? eyetrackingR includes two types of analyses for answering these questions, both of which we cover here.

Prerequisites

Before performing this analysis, we’ll need to prepare and clean our dataset. Here we do this quickly and with few notes; for more information, see the vignette on preparing your data.

set.seed(42)

library("Matrix")
library("lme4")
library("ggplot2")
library("eyetrackingR")

data("word_recognition")
data <- make_eyetrackingr_data(word_recognition, 
                       participant_column = "ParticipantName",
                       trial_column = "Trial",
                       time_column = "TimeFromTrialOnset",
                       trackloss_column = "TrackLoss",
                       aoi_columns = c('Animate','Inanimate'),
                       treat_non_aoi_looks_as_missing = TRUE
                )

# subset to response window post word-onset
response_window <- subset_by_window(data, 
                                    window_start_time = 15500, 
                                    window_end_time = 21000, 
                                    rezero = FALSE)
## Avg. window length in new data will be 5500
# analyze amount of trackloss by subjects and trials
(trackloss <- trackloss_analysis(data = response_window))

# remove trials with > 25% trackloss
response_window_clean <- clean_by_trackloss(data = response_window,
                                            trial_prop_thresh = .25)
## Performing Trackloss Analysis...
## Will exclude trials whose trackloss proportion is greater than : 0.25
##  ...removed  33  trials.
# create Target condition column
response_window_clean$Target <- as.factor( ifelse(test = grepl('(Spoon|Bottle)', response_window_clean$Trial), 
                                       yes = 'Inanimate', 
                                       no  = 'Animate') )

Bootstrapped smoothed divergence analysis

Our first approach is to look for runs of significant differences between our conditions after smoothing the data in a series of time bins (similar to Wendt et al., 2014). This involves:

  1. Fitting a smoothing curve to the data (using smooth.spline(), loess(), or no smoother)
  2. Bootstrap-resampling these fits to build a distribution
  3. Obtaining the 95% confidence intervals of this distribution in order to determine which time-bins differ significantly.

This is a useful technique for estimating the timepoints of divergence between two conditions, while the smoothing helps remove minor deviations that might disrupt what would otherwise be considered a single divergent period. This can be especially helpful in infant data, which can be extremely noisy. Note that this approach does not explicitly control for Type-I error rates (i.e., it’s not a replacement for something like Bonferroni correction), and it can only deal with two-level factors.

This method returns a list of divergences between your two conditions based on time windows in which the 95% confidence intervals did not include 0 (i.e., p < .05).
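
To make the resampling logic concrete, here is a minimal sketch of the idea in plain R. This is purely illustrative, not how eyetrackingR implements the analysis, and it assumes a hypothetical data frame d with columns ParticipantName, Time, and PropDiff (the within-subject difference in looking proportion between conditions at each time bin):

# Illustrative sketch only; not eyetrackingR's internals. Bootstrap-resample
# participants, smooth each resample, and flag time bins whose pointwise 95%
# CI excludes zero. Assumes a hypothetical data frame 'd' with columns
# ParticipantName, Time, and PropDiff (within-subject condition difference).
times <- sort(unique(d$Time))
subs  <- unique(d$ParticipantName)

boot_curve <- function() {
  # resample participants with replacement, keeping all of their time bins
  resampled <- do.call(rbind, lapply(sample(subs, replace = TRUE),
                                     function(s) d[d$ParticipantName == s, ]))
  fit <- smooth.spline(x = resampled$Time, y = resampled$PropDiff)
  predict(fit, x = times)$y
}

boots <- replicate(1000, boot_curve())                     # time bins x samples
ci    <- apply(boots, 1, quantile, probs = c(.025, .975))  # pointwise 95% CI
divergent_bins <- times[ci[1, ] > 0 | ci[2, ] < 0]         # CI excludes zero

Consecutive bins in divergent_bins would then be grouped into runs to report divergences like the one summarized below.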

To begin, we need to use make_time_sequence_data to generate a time-binned dataframe. The bootstrap analysis we’ll be doing requires that we summarize our data. Traditionally, we’d want a by-participants summary (but in some circumstances, we might want a by-items summary as well).

response_time <- make_time_sequence_data(response_window_clean,
                                  time_bin_size = 100, 
                                  predictor_columns = c("Target"),
                                  aois = "Animate",
                                  summarize_by = "ParticipantName"
                            )

# visualize timecourse
plot(response_time, predictor_column = "Target") + 
  theme_light() +
  coord_cartesian(ylim = c(0,1))

We can then use make_boot_splines_data to resample from this dataset (the number of resamples is set by the samples argument) and fit a smoother to each sample:

bootstrapped_familiar <- make_boot_splines_data(response_time, 
                                              predictor_column = 'Target', 
                                              within_subj = TRUE, 
                                              samples = 1000, 
                                              alpha = .05,
                                              smoother = "smooth.spline") 

We can then plot this curve. Because this is a within-subjects design, we get a single curve (and CI) corresponding to the difference between conditions. In between-subjects designs, you’ll get two curves (and CIs) corresponding to the estimate for each condition.

plot(bootstrapped_familiar)
## Plotting within-subjects differences...

Finally, we can look at each timepoint to see whether the CIs include 0 and look for runs of significant timepoints – i.e., divergences.

bootstrap_analysis_familiar <- analyze_boot_splines(bootstrapped_familiar)
summary(bootstrap_analysis_familiar)
## Divergences:
## 1:   15900 - 21000

This analysis suggests that the effect is significant for virtually the entire time-window, starting as early as 15900 ms.

Bootstrapped cluster-based permutation analysis

Our second approach is to perform a different type of bootstrapping analysis, referred to as a cluster-based permutation analysis (Maris & Oostenveld, 2007). This analysis takes a summed statistic for each cluster of time bins that passes some level of significance, and compares each to the “null” distribution of sum statistics (obtained by bootstrap-resampling data within the largest of the clusters).

This type of analysis should often give similar results to the above, but it has two main advantages:

  1. It naturally controls the false-alarm rate while sacrificing little sensitivity.
  2. The implementation in eyetrackingR allows you to use this method with a variety of statistical techniques (t.test, wilcox.test, lm, and lmer), so that continuous predictors, covariates, etc. can also be included in the model being tested.

Here’s the procedure eyetrackingR implements under the hood (a simplified sketch of the permutation step, in plain R, follows the list):

  1. Run a statistical test on each time-bin of your data.
  2. Take the time-bins whose test statistic passed a threshold (e.g., t > 2.26), and group them by adjacency. We will call these time-clusters.
  3. For each time-cluster, calculate the sum of the statistics for the time-bins inside it.
  4. Take the data inside the largest of these clusters. Randomly shuffle it.
  5. Recalculate the sum-statistic on this shuffled dataset.
  6. Repeat steps (4) and (5) hundreds of times. This will lead to a distribution of summed-statistics, each representing the results of a statistical test on shuffled data. Intuitively, this distribution represents what kind of sum-statistics we would expect to get in a cluster by chance, if no effect were present (i.e., if the data were just randomly shuffled).
  7. Compare the cluster-statistics from step (3) to the distribution found in (6) to obtain a p-value. For example, if 6.4% of the sum-statistics in this distribution are greater than the sum-statistic of our cluster, then the p-value for that cluster is p = .064.
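
To make steps (4) through (7) concrete, here is a heavily simplified sketch of the permutation step in plain R. This is illustrative only, not eyetrackingR’s implementation; it assumes a hypothetical matrix diffs of within-subject differences (participants in rows, time bins in columns) and an index vector largest giving the time bins of the largest cluster:

# Illustrative sketch only; not eyetrackingR's implementation.
# 'diffs' is a hypothetical participants x time-bins matrix of within-subject
# differences; 'largest' indexes the time bins of the largest cluster.
sum_stat <- function(mat, bins) {
  # sum of one-sample t statistics over the time bins in a cluster
  sum(apply(mat[, bins, drop = FALSE], 2, function(x) t.test(x)$statistic))
}

null_dist <- replicate(1000, {
  # randomly flipping the sign of each participant's difference scores is
  # equivalent to shuffling the condition labels within that participant
  flips <- sample(c(-1, 1), nrow(diffs), replace = TRUE)
  sum_stat(diffs * flips, largest)
})

# p-value: proportion of null sum-statistics at least as extreme as a cluster's
mean(abs(null_dist) >= abs(sum_stat(diffs, largest)))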

To perform this procedure using eyetrackingR, we’ll need a time_sequence_data dataframe. In this case, we already made one above (response_time).

Next, we’ll set the threshold for the t-statistic that will be considered a divergence. Note that this is just a threshold for grouping samples into initial clusters – lowering this threshold parameter won’t make your divergences any more “significant” or increase your Type I error rate. Here we’ll set the threshold based on the appropriate t-statistic for a sample of this size and an alpha of .05.

alpha = .05
num_sub = length(unique((response_window_clean$ParticipantName)))
threshold_t = qt(p = 1 - alpha/2, 
                 df = num_sub-1) # pick threshold t based on alpha = .05 two tailed

We can then look for initial clusters:

df_timeclust <- make_time_cluster_data(response_time, 
                                      test= "t.test",
                                      paired = TRUE,
                                      predictor_column = "Target", 
                                      threshold = threshold_t) 
## Computing t.test for each time bin...
plot(df_timeclust) +
  ylab("T-Statistic")

summary(df_timeclust)
## Test Type:    t.test 
## Predictor:    Target 
## Formula:  Prop ~ Target 
## Cluster 1  ===== 
##  Time:        16100 - 19300 
##  Sum Statistic:   132.299 
## Cluster 2  ===== 
##  Time:        19400 - 20800 
##  Sum Statistic:   42.31067

Finally, we bootstrap p-values for each of these clusters by asking how likely we are to see a cluster with that “Sum Statistic” under a permutation-based null distribution derived from our largest cluster.

clust_analysis <- analyze_time_clusters(df_timeclust, within_subj = TRUE, paired=TRUE, 
                                        samples=100) # in practice, you should use a lot more
## Using formula that was supplied in 'make_time_cluster_data'.
summary(clust_analysis)
## Test Type:    t.test 
## Predictor:    Target 
## Formula:  Prop ~ Target 
## Null Distribution ===== 
##  Mean:        -3.8066 
##  SD:      25.6008 
## Cluster 1  ===== 
##  Time:        16100 - 19300 
##  Sum Statistic:   132.299 
##  Probability:     0 
## Cluster 2  ===== 
##  Time:        19400 - 20800 
##  Sum Statistic:   42.3107 
##  Probability:     0.01
plot(clust_analysis)

References

Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177–190.

Wendt, D., Brand, T., & Kollmeier, B. (2014). An Eye-Tracking Paradigm for Analyzing the Processing Time of Sentences with Different Linguistic Complexities. PLoS ONE, 9(6), e100186. http://doi.org/10.1371/journal.pone.0100186.t003