When working with large eyeris databases containing millions of eye-tracking data points, traditional export methods can run into memory limitations or create unwieldy files. The chunked database export functionality in eyeris provides an out-of-the-box solution for handling very large eyerisdb databases by:

- processing data in configurable chunks rather than loading everything into memory at once
- automatically splitting large exports into multiple size-capped files
- supporting both CSV and Parquet formats for optimal performance

This vignette walks through how to use these features after you've created an eyerisdb database using bidsify(db_enabled = TRUE).
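If you have not yet created the database, the call looks roughly like the sketch below. It assumes you already have a preprocessed eyeris object (here called eyeris_preproc); the bids_dir and db_path arguments are shown to mirror the export examples later in this vignette, so check ?bidsify for the actual argument names and full set of options.

# assumes `eyeris_preproc` is the output of the eyeris preprocessing pipeline
bidsify(
  eyeris_preproc,
  bids_dir = "/path/to/your/bids/directory",
  db_enabled = TRUE, # write results into an eyerisdb database
  db_path = "my-project" # database name reused in the export examples below
)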
Before using the chunked export functions, you need:

- an eyerisdb database created with bidsify(db_enabled = TRUE)
- the arrow package installed (for Parquet support): install.packages("arrow") (arrow is included when installing eyeris from CRAN)

The easiest way to export your entire database is with eyeris_db_to_chunked_files():
result <- eyeris_db_to_chunked_files(
bids_dir = "/path/to/your/bids/directory",
db_path = "my-project" # your database name
)
# view what was exported
print(result)
Using the eyeris_db_to_chunked_files() function defaults, this will:

- process 1 million rows at a time (the default chunk_size)
- create files up to 500MB each (the default max_file_size_mb)
- export all data types found in your database
- save files to bids_dir/derivatives/eyerisdb_export/my-project/
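For reference, the export location is simply composed from bids_dir and db_path. The snippet below is a plain base-R illustration of that layout, not an eyeris function:

bids_dir <- "/path/to/your/bids/directory"
db_path <- "my-project"

# chunked files are written under derivatives/eyerisdb_export/<db_path>/
export_dir <- file.path(bids_dir, "derivatives", "eyerisdb_export", db_path)
export_dir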
The function creates organized output files:
derivatives/eyerisdb_export/my-project/
├── my-project_timeseries_chunked_01.csv # Single file (< 500MB)
├── my-project_events_chunked_01-of-02.csv # Multiple files due to size
├── my-project_events_chunked_02-of-02.csv
├── my-project_confounds_summary_goal_chunked_01.csv # Grouped by schema
├── my-project_confounds_summary_stim_chunked_01.csv # Different column structure
├── my-project_confounds_events_chunked_01.csv
├── my-project_epoch_summary_chunked_01.csv
└── my-project_epochs_pregoal_chunked_01-of-03.csv # Epoch-specific data
You can customize the maximum file size to create smaller, more manageable files:
# Create smaller files for easy distribution
result <- eyeris_db_to_chunked_files(
bids_dir = "/path/to/bids",
db_path = "large-project",
max_file_size_mb = 100, # 100MB files instead of 500MB
chunk_size = 500000 # Process 500k rows at a time
)
This is particularly useful when:

- uploading to cloud storage with size or bandwidth limits
- sharing data via email or file transfer services
- working with limited storage space
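After exporting with a smaller max_file_size_mb, you can sanity-check the resulting chunk sizes with base R. The directory below is illustrative and follows the export layout shown earlier:

# list the exported chunks and report their sizes in MB
export_dir <- "/path/to/bids/derivatives/eyerisdb_export/large-project"
files <- list.files(export_dir, full.names = TRUE)

data.frame(
  file = basename(files),
  size_mb = round(file.size(files) / 1024^2, 1)
)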
For large databases, you may only need certain types of data:
# Export only pupil timeseries and events
result <- eyeris_db_to_chunked_files(
bids_dir = "/path/to/bids",
db_path = "large-project",
data_types = c("timeseries", "events"),
subjects = c("sub-001", "sub-002", "sub-003") # Specific subjects only
)
Available data types typically include:

- timeseries - Preprocessed eye-tracking pupil data
- events - Experimental events
- epochs - Epoched data around events
- confounds_summary - Confound variables by epoch
- blinks - Detected blinks
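If you are not sure which data types your database actually contains, you can inspect it before exporting. This sketch assumes eyeris_db_list_tables() returns the table names for an open connection (both helpers are referenced again in the troubleshooting notes at the end of this vignette):

# open a connection, list the available tables, then clean up
con <- eyeris_db_connect("/path/to/bids", "large-project")
tables <- eyeris_db_list_tables(con)
head(tables)
eyeris_db_disconnect(con)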
For better performance and compression, use Parquet format:
result <- eyeris_db_to_chunked_files(
bids_dir = "/path/to/bids",
db_path = "large-project",
file_format = "parquet",
max_file_size_mb = 200
)
Parquet advantages:

- Smaller file sizes (often 50-80% smaller than CSV)
- Faster reading with arrow::read_parquet()
- Better data types (preserves numeric precision)
- Column-oriented storage for analytics
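Reading an individual Parquet chunk back into R is straightforward with arrow. The file name below is an assumption that follows the naming pattern shown for the CSV export above:

library(arrow)

# read a single exported chunk back into a data frame
chunk <- read_parquet(
  "/path/to/bids/derivatives/eyerisdb_export/large-project/large-project_timeseries_chunked_01.parquet"
)
str(chunk)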
When files are split due to size limits, you can recombine them:
# Find all parts of a split dataset
files <- list.files(
"path/to/eyerisdb_export/my-project/",
pattern = "timeseries_chunked_.*\\.csv$",
full.names = TRUE
)
# Read and combine all parts
combined_data <- do.call(rbind, lapply(files, read.csv))
# Or, for Parquet exports, use the built-in helper function
combined_data <- read_eyeris_parquet(
parquet_dir = "path/to/eyerisdb_export/my-project/",
data_type = "timeseries"
)
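If you exported to Parquet, another option is to treat all chunks as a single lazy dataset with arrow::open_dataset(), which avoids loading everything into memory at once. The subject_id filter is purely illustrative, using a column that appears elsewhere in this vignette:

library(arrow)
library(dplyr)

# point arrow at the export directory; all parquet chunks behave as one dataset
ds <- open_dataset("path/to/eyerisdb_export/my-project/", format = "parquet")

# filter lazily, then pull only the needed rows into memory
ds |>
  filter(subject_id == "sub-001") |>
  collect()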
For specialized analysis, you can process chunks with custom functions:
# Connect to database directly
con <- eyeris_db_connect("/path/to/bids", "large-project")
# Define custom analysis function for pupil data
analyze_chunk <- function(chunk) {
# Calculate summary statistics for this chunk
stats <- data.frame(
n_rows = nrow(chunk),
subjects = length(unique(chunk$subject_id)),
mean_eye_x = mean(chunk$eye_x, na.rm = TRUE),
mean_eye_y = mean(chunk$eye_y, na.rm = TRUE),
mean_pupil_raw = mean(chunk$pupil_raw, na.rm = TRUE),
mean_pupil_processed = mean(chunk$pupil_raw_deblink_detransient_interpolate_lpfilt_z, na.rm = TRUE),
missing_pupil_pct = sum(is.na(chunk$pupil_raw)) / nrow(chunk) * 100,
hz_modes = paste(unique(chunk$hz), collapse = ",")
)
  # Save the chunk summary, appending to a growing file
  # (write.csv() ignores `append`, so use write.table() instead)
  write.table(stats, "chunk_summaries.csv", sep = ",", row.names = FALSE,
              col.names = !file.exists("chunk_summaries.csv"),
              append = file.exists("chunk_summaries.csv"))
return(TRUE) # Indicate success
}
# Hypothetical example: process large timeseries dataset in chunks
result <- process_chunked_query(
con = con,
query = "
SELECT subject_id, session_id, time_secs, eye_x, eye_y,
pupil_raw, pupil_raw_deblink_detransient_interpolate_lpfilt_z, hz
FROM timeseries_01_enc_clamp_run01
WHERE pupil_raw > 0 AND eye_x IS NOT NULL
ORDER BY time_secs
",
chunk_size = 100000,
process_chunk = analyze_chunk
)
eyeris_db_disconnect(con)
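Because analyze_chunk() appends one summary row per chunk to chunk_summaries.csv, you can combine the per-chunk results afterwards, for example into a row-count-weighted mean of the raw pupil signal:

# read back the per-chunk summaries written above
chunk_stats <- read.csv("chunk_summaries.csv")

# total rows processed and an overall weighted mean across chunks
sum(chunk_stats$n_rows)
weighted.mean(chunk_stats$mean_pupil_raw, w = chunk_stats$n_rows, na.rm = TRUE)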
For databases with hundreds of millions of rows:
# Optimize for very large datasets
result <- eyeris_db_to_chunked_files(
bids_dir = "/path/to/bids",
db_path = "massive-project",
chunk_size = 2000000, # 2M rows per chunk for efficiency
max_file_size_mb = 1000, # 1GB files (larger but fewer files)
file_format = "parquet", # Better compression
data_types = "timeseries" # Focus on primary data type for analysis
)
If you encounter out-of-memory errors: the function automatically handles this by processing tables in batches, but if you still run into issues, try lowering chunk_size, restricting data_types, or exporting only a subset of subjects.

When you see "Set operations can only apply to expressions with the same number of result columns": this indicates that tables of the same data type have different column structures; the export groups such tables by schema and writes each group to its own set of files (as in the confounds_summary_goal vs. confounds_summary_stim example above).

If files are locked or in use: make sure no other R session or process still holds a connection to the eyerisdb database file, and call eyeris_db_disconnect() on any open connections before exporting.

For additional help:

- see ?eyeris_db_to_chunked_files for full documentation of the export arguments
- use eyeris_db_summary(bids_dir, db_path) for an overview of the database contents
- use eyeris_db_list_tables(con) to list the tables available on an open connection
- re-run the export with verbose = TRUE for more detailed progress output
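When debugging any of the issues above, it often helps to re-run the export with verbose output enabled (the help list above suggests verbose = TRUE is accepted by the export function):

# re-run the export with detailed progress messages
result <- eyeris_db_to_chunked_files(
  bids_dir = "/path/to/bids",
  db_path = "large-project",
  verbose = TRUE
)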
The built-in chunked eyerisdb database export functionality provides a robust solution for working with large eyerisdb databases. Key benefits include:

- memory-efficient processing of millions of rows in configurable chunks
- automatic splitting of large exports into manageable, size-capped files
- support for both CSV and Parquet output formats
- schema-aware grouping of tables with different column structures

This makes it possible to work with even the largest eye-tracking/pupillometry datasets while maintaining performance and reliability, without sacrificing the ability to share high-quality, reproducible datasets that support collaborative and open research.