Tidy R access to the Anthropic Economic Index dataset.
The Anthropic Economic Index (AEI) is a recurring open dataset that maps real Claude conversations to occupations and tasks. Anthropic classifies millions of conversations against the U.S. Department of Labor’s O*NET task taxonomy and the Standard Occupational Classification (SOC) system, and publishes the resulting usage shares on Hugging Face under CC-BY-4.0. Each release also splits conversations into automation-style interactions (the user delegates to Claude) and augmentation-style interactions (the user works through a task with Claude). From the September 2025 release onwards, the data is broken down by country and US state. Methodology is documented in Handa et al. (2025); the privacy-preserving classification pipeline is described in Tamkin et al. (2024).
Five releases have shipped between February 2025 and March 2026,
covering Claude 3.5 Sonnet through Opus 4.5/4.6.
aieconindex lists releases, fetches raw and enriched usage
tables, retrieves task statements and request hierarchies, exposes
country and US-state slices, caches downloads, and produces ready-made
citations. Schema differences across releases are handled internally.
Three runtime dependencies (cli, httr2,
jsonlite) plus base R. No API key needed.
install.packages("aieconindex")
# or the development version
# install.packages("remotes")
remotes::install_github("charlescoverdale/aieconindex")

Requires R 4.1.0 or later.
library(aieconindex)
# 1. See what's available
aei_releases()
#> # AEI: releases · 5 rows
#> release_id release_date model
#> 1 release_2026_03_24 2026-03-24 Claude Opus 4.5/4.6
#> 2 release_2026_01_15 2026-01-15 Claude Sonnet 4.5
#> 3 release_2025_09_15 2025-09-15 Claude Sonnet 4
#> 4 release_2025_03_27 2025-03-27 Claude 3.7 Sonnet
#> 5 release_2025_02_10 2025-02-10 Claude 3.5 Sonnet
#> ...
# 2. Look inside a release
aei_files("2025-09-15", recursive = TRUE)
# 3. Fetch the canonical usage table
df <- aei_index("2025-09-15", source = "claude_ai", variant = "enriched")
# 4. Slice to a country
uk <- aei_geography("2025-09-15", country = "GBR")
# 5. Cite the dataset
aei_cite("2025-09-15", format = "bibtex")

| Function | Returns |
|---|---|
| aei_releases(live = TRUE) | Available releases (live + bundled metadata) as an aei_tbl |
| aei_files(release, recursive = TRUE) | Recursive file tree for a release as an aei_tbl with path, type, size_bytes |

aei_releases(live = FALSE) # offline-safe (uses bundled metadata)
aei_files("latest") # tree of the most recent release
aei_files("2025-03-27", recursive = FALSE) # top-level only

| Function | Returns |
|---|---|
| aei_index(release, source, variant) | Canonical usage table as an aei_tbl |
| aei_download(release, path) | CSVs as aei_tbl, JSON as parsed list, other extensions as local path |

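The return-type rule for aei_download() in the table above can be pictured as a dispatch on file extension. The following is a minimal sketch of that idea, not the package's implementation: read_by_extension() is a hypothetical helper, it returns a plain data.frame rather than an aei_tbl, and it assumes jsonlite (one of the listed dependencies) is installed for the JSON branch.

```r
# Hypothetical helper: illustrates extension-based dispatch only.
read_by_extension <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv  = utils::read.csv(path, stringsAsFactors = FALSE),
    json = jsonlite::fromJSON(path, simplifyVector = FALSE),
    path # any other extension: hand back the local path unchanged
  )
}

# Throwaway files standing in for downloaded release files
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(code = c("a", "b"), n = c(2, 5)), csv_path, row.names = FALSE)
pdf_path <- tempfile(fileext = ".pdf")
file.create(pdf_path)

tbl <- read_by_extension(csv_path)   # data frame with 2 rows
other <- read_by_extension(pdf_path) # the path itself
```

Returning the path for unknown extensions keeps the function total: callers decide how to open PDFs, notebooks, or anything else.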
aei_index() locates the canonical usage CSV by
file-pattern matching. Arguments:

- source: "claude_ai" (consumer product traffic) or "1p_api" (first-party API). Not all releases include both.
- variant: "raw" (counts and percentages from Anthropic’s pipeline) or "enriched" (joined to O*NET / SOC metadata, with derived per-capita and tier metrics). Older releases may only ship one variant.

df_raw <- aei_index("2026-03-24", source = "claude_ai", variant = "raw")
df_enriched <- aei_index("2025-09-15", source = "claude_ai", variant = "enriched")
df_api <- aei_index("2026-03-24", source = "1p_api", variant = "raw")

aei_download() fetches any path returned by
aei_files():
soc <- aei_download("2025-03-27", "SOC_Structure.csv")
hierarchy <- aei_download("2025-09-15",
                          "data/output/request_hierarchy_tree_claude_ai.json")
report <- aei_download("2026-01-15", "aei_v4_appendix.pdf") # returns local path

| Function | Returns |
|---|---|
| aei_clusters(release, source) | Request-hierarchy tree (Clio output) as a parsed nested list |
| aei_tasks(release) | O*NET task statements bundled with the release as an aei_tbl |
| aei_geography(release, country, geography) | Country or US-state filter on the enriched table |

# Clio-derived request hierarchy (from 2025-09-15 onwards)
tree <- aei_clusters("2025-09-15", source = "claude_ai")
# Bundled O*NET task statements (ships in 2025-03-27)
tasks <- aei_tasks("2025-03-27")
# UK country slice (geographic facets ship from 2025-09-15 onwards)
uk <- aei_geography("2025-09-15", country = "GBR")
# Australia country slice
au <- aei_geography("2025-09-15", country = "AUS")
# US state-level breakdown
us_states <- aei_geography("2025-09-15", geography = "state_us")

Country codes are ISO 3166-1 alpha-3 ("GBR", "AUS",
"USA"). Releases before 2025-09-15 have no geographic data;
for those, aei_geography() stops with an informative error.
| Function | Returns |
|---|---|
| aei_compare(release_a, release_b, ...) | Release-on-release diff with value_a, value_b, delta, pct_change |
| aei_link(x, y, by, type) | Generic merge that preserves the aei_tbl class; for splicing AEI to user-supplied data on a shared key |
| aei_concentration(x, share_col, group_cols, top_n) | HHI, top-N concentration ratio, Shannon entropy on usage shares |

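For readers unfamiliar with the three concentration measures named in the table, here is a minimal base-R sketch of the textbook definitions. It is not the package's code: concentration_sketch() is a hypothetical function, and it assumes shares are rescaled to sum to 1 and that entropy uses the natural logarithm; aei_concentration() may scale or report these differently.

```r
# Hypothetical helper: textbook definitions on a numeric vector of shares.
concentration_sketch <- function(share, top_n = 5) {
  share <- share[!is.na(share) & share > 0] # zero shares contribute nothing
  s <- share / sum(share)                   # rescale so shares sum to 1
  list(
    hhi         = sum(s^2),                                    # Herfindahl-Hirschman index
    top_n_ratio = sum(head(sort(s, decreasing = TRUE), top_n)), # share held by the top N
    entropy     = -sum(s * log(s))                             # Shannon entropy (nats)
  )
}

# Shares given as percentages: 40%, 30%, 20%, 10%
res <- concentration_sketch(c(40, 30, 20, 10), top_n = 2)
res$hhi         # 0.30
res$top_n_ratio # 0.70
res$entropy     # about 1.28
```

Under these definitions, HHI ranges from 1/n (perfectly even usage across n categories) to 1 (all usage in one category), while entropy moves in the opposite direction, from log(n) down to 0.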
# How did the cluster shares move between Sept 2025 and March 2026?
diff <- aei_compare("2025-09-15", "2026-03-24")
head(diff[order(-abs(diff$delta)), ])
# Splice AEI country shares to your own GDP-per-capita table
overlay <- data.frame(
  geo_id = c("GBR", "AUS", "USA"),
  gdp_pc = c(48000, 65000, 80000)
)
joined <- aei_link(aei_geography("2025-09-15"), overlay, by = "geo_id")
# How concentrated is UK Claude.ai usage across O*NET tasks?
uk <- aei_geography("2025-09-15", country = "GBR")
uk_tasks <- uk[uk$facet == "onet_task" & uk$variable == "onet_task_pct", ]
aei_concentration(uk_tasks)

aei_link() is a thin wrapper over
base::merge() that preserves the aei_tbl class
and provenance metadata, supports left / inner / full joins, and warns
when a join produces zero rows. Use it to attach occupational crosswalks
(SOC, ANZSCO, ISCO, SOC2020 UK), national labour-force data (ONS, BLS
OEWS, ABS), or anything else keyed on country code or task
identifier.
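The behaviour described above (a base::merge() wrapper that keeps class and provenance, maps join types, and warns on empty results) can be sketched in a few lines. This is an illustration under assumptions, not the package source: link_sketch() is a hypothetical name, the mapping of left / inner / full onto all.x / all.y is the usual base-R idiom, and the usage shares below are made-up numbers.

```r
# Hypothetical wrapper around base::merge(); not the package's aei_link().
link_sketch <- function(x, y, by, type = c("left", "inner", "full")) {
  type <- match.arg(type)
  out <- merge(
    x, y, by = by,
    all.x = type %in% c("left", "full"),
    all.y = type == "full"
  )
  if (nrow(out) == 0) {
    warning("join produced zero rows; check the key column '", by, "'")
  }
  # Restore the class and provenance attribute of the left-hand table
  class(out) <- class(x)
  attr(out, "aei_query") <- attr(x, "aei_query")
  out
}

# Made-up usage shares carrying a provenance attribute
shares <- structure(
  data.frame(geo_id = c("GBR", "AUS", "NZL"), value = c(3.1, 2.4, 0.5)),
  class = c("aei_tbl", "data.frame"),
  aei_query = list(release = "release_2025_09_15")
)
gdp <- data.frame(geo_id = c("GBR", "AUS"), gdp_pc = c(48000, 65000))

joined <- link_sketch(shares, gdp, by = "geo_id")                 # NZL kept, gdp_pc is NA
inner <- link_sketch(shares, gdp, by = "geo_id", type = "inner")  # NZL dropped
```

A left join as the default keeps every AEI row, so gaps in the user-supplied table show up as NA rather than silently shrinking the result.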
| Function | Returns |
|---|---|
| aei_cite(release, format, method = TRUE) | Citation in plain text, BibTeX, or bibentry form |

By default aei_cite() returns both the dataset citation
and Handa et al. (2025).
Set method = FALSE for the dataset only.
aei_cite() # text, project-wide, with paper
aei_cite("2025-09-15", format = "bibtex") # BibTeX, both refs
aei_cite("2026-03-24", format = "bibentry") # bibentry object (multi-entry)
aei_cite(format = "text", method = FALSE) # dataset only

| Function | Returns |
|---|---|
| aei_cache_dir() | Path of the cache directory (override-aware) |
| aei_cache_info() | List with dir, n_files, size_bytes, size_human, files |
| aei_cache_clear() | Clears the cache; returns invisible NULL |

All data-returning functions emit an aei_tbl: a
data.frame subclass with provenance metadata stored in the
aei_query attribute. The metadata carries
endpoint, the resolved release identifier, the source URL,
and the fetch timestamp; it is preserved across row and column
subsetting.
df <- aei_index("2025-09-15")
attr(df, "aei_query")
#> $endpoint "index"
#> $release "release_2025_09_15"
#> $facet "raw/claude_ai"
#> $source_url "https://huggingface.co/datasets/Anthropic/EconomicIndex/.../aei_raw_claude_ai_*.csv"
#> $fetched_at "2026-04-28 18:34:00 BST"
# Custom print method shows the provenance header
print(df)
#> # AEI: index · release=release_2025_09_15 · facet=raw/claude_ai · 12345 rows
#> ...
# Subsetting preserves the class and attribute
sub <- df[df$value > 1, ]
class(sub)
#> [1] "aei_tbl" "data.frame"

The class inherits from data.frame, so any function that
takes a data frame works without conversion. Drop the class with
as.data.frame() if you need a plain frame.
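Base data-frame subsetting does not reliably carry custom attributes through a row filter, so a data.frame subclass that wants to keep provenance usually defines its own `[` method. The sketch below shows one common way to do that. It uses a hypothetical demo_tbl class and demo_query attribute so it cannot clash with the package's own methods, and it is not taken from the package source.

```r
# Hypothetical constructor for a data.frame subclass with a provenance attribute
new_demo_tbl <- function(df, query) {
  structure(df, class = c("demo_tbl", "data.frame"), demo_query = query)
}

# Subsetting method: delegate to the data.frame method, then re-attach provenance
`[.demo_tbl` <- function(x, ...) {
  out <- NextMethod()
  if (is.data.frame(out)) {
    attr(out, "demo_query") <- attr(x, "demo_query")
  }
  out
}

df <- new_demo_tbl(
  data.frame(id = 1:4, value = c(0.5, 1.5, 2.5, 0.1)),
  query = list(endpoint = "index", release = "release_2025_09_15")
)

sub <- df[df$value > 1, ]
class(sub)                      # "demo_tbl" "data.frame"
attr(sub, "demo_query")$release # "release_2025_09_15"
```

The is.data.frame() guard matters: a call such as df[, "value"] drops to a plain vector, which should not be given table-level metadata.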
Pin a release for production. The default
release = "latest" resolves to the most recent release at
call time, which is fine for exploration but unsuitable for reproducible
pipelines. Pin a release identifier explicitly:
RELEASE <- "2025-09-15" # or "release_2025_09_15"
df <- aei_index(RELEASE, source = "claude_ai", variant = "enriched")

Replicate an Anthropic figure. Anthropic ships
Python replication notebooks (v2_report_replication.ipynb)
inside several releases. To replicate the augmentation-vs-automation
headline figure in R:
df <- aei_download("2025-03-27", "automation_vs_augmentation_v2.csv")
df$family <- ifelse(df$interaction_type %in% c("directive", "feedback loop"),
                    "Automation", "Augmentation")

Country exposure ranking. Top O*NET tasks for the UK by share of Claude.ai usage:
uk <- aei_geography("2025-09-15", country = "GBR")
top <- subset(uk, facet == "onet_task" & variable == "onet_task_pct")
top <- top[order(-top$value), ][1:15, c("cluster_name", "value")]

Cross-country comparison. Per-capita usage index for selected economies:
df <- aei_index("2025-09-15", source = "claude_ai", variant = "enriched")
country_overall <- subset(df,
  geography == "country" &
    variable == "usage_per_capita_index" &
    cluster_name == "not_classified" &
    level == 0
)
country_overall <- country_overall[order(-country_overall$value), ]

Cite in a paper. Drop the BibTeX form straight into
your .bib:
cat(aei_cite("2025-09-15", format = "bibtex"), file = "refs.bib", append = TRUE)

The package recognises every release published to Hugging Face up to 2026-03-24 and discovers any newer releases automatically via the Hugging Face tree API.

| Release | Headline model | Notes |
|---|---|---|
| release_2025_02_10 | Claude 3.5 Sonnet | Initial release; O*NET task mappings; automation vs augmentation |
| release_2025_03_27 | Claude 3.7 Sonnet | Cluster-level insights; v2 report replication notebook |
| release_2025_09_15 | Claude Sonnet 4 | Geographic + first-party API data added; long-format schema |
| release_2026_01_15 | Claude Sonnet 4.5 | Economic primitives added |
| release_2026_03_24 | Claude Opus 4.5/4.6 | Learning curves added |

Each release ships its own data_documentation.md on
Hugging Face. The package’s aei_releases() blends bundled
metadata (model, report URL) with a live Hugging Face listing.
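The examples in this README pass releases in three forms: a date ("2025-09-15"), a full identifier ("release_2025_09_15"), and "latest". A minimal sketch of how those forms could be normalised is shown below. resolve_release_sketch() is a hypothetical function and the lookup against a fixed vector is an assumption; the package itself also consults a live Hugging Face listing, which this sketch does not attempt.

```r
# The five release identifiers listed in the table above
known_releases <- c(
  "release_2025_02_10", "release_2025_03_27", "release_2025_09_15",
  "release_2026_01_15", "release_2026_03_24"
)

# Hypothetical resolver: accepts "latest", "YYYY-MM-DD", or "release_YYYY_MM_DD"
resolve_release_sketch <- function(release = "latest", available = known_releases) {
  if (identical(release, "latest")) {
    # Identifiers embed a year-month-day date, so string order is date order
    return(max(available))
  }
  id <- if (grepl("^release_", release)) {
    release
  } else {
    paste0("release_", gsub("-", "_", release, fixed = TRUE))
  }
  if (!id %in% available) {
    stop("unknown release: ", release)
  }
  id
}

resolve_release_sketch("2025-09-15")         # "release_2025_09_15"
resolve_release_sketch("release_2025_09_15") # "release_2025_09_15"
resolve_release_sketch()                     # "release_2026_03_24"
```

Because "latest" depends on what is available at call time, a pinned date or identifier is the safer input for anything that must be reproducible.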
Downloaded files are cached under the path returned by
aei_cache_dir(), which defaults to
tools::R_user_dir("aieconindex", "cache"). Override before
the first call:
options(aieconindex.cache_dir = "/your/preferred/path")

The cache is keyed by release identifier and relative path, so a repeated request for the same file is served from the cached copy, byte-identical to the original download.
aei_cache_info()
#> $dir "/Users/.../aieconindex/cache"
#> $n_files 3
#> $size_bytes 126839425
#> $size_human "121.0 MB"
#> $files <data.frame: 3 rows>
aei_cache_clear() # removes all cached files

The usage CSVs in the latest release are around 100 MB each, so the first call against a fresh release is bandwidth-heavy. Subsequent calls are served from disk.
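The caching behaviour described in this section (an overridable directory, files keyed by release and relative path, downloads skipped on a cache hit) can be sketched with base R alone. All three function names below are hypothetical, and the on-disk layout of release directory plus relative path is an assumption rather than the package's documented structure. The download step is a caller-supplied stand-in, so the sketch makes no network requests.

```r
# Hypothetical: honour the option from this README, else fall back to the user cache dir
cache_dir_sketch <- function() {
  getOption("aieconindex.cache_dir",
            default = tools::R_user_dir("aieconindex", which = "cache"))
}

# Assumed layout: <cache dir>/<release id>/<relative path>
cache_path_sketch <- function(release_id, rel_path) {
  file.path(cache_dir_sketch(), release_id, rel_path)
}

# Download only when the file is not already on disk
fetch_cached_sketch <- function(release_id, rel_path, download) {
  dest <- cache_path_sketch(release_id, rel_path)
  if (!file.exists(dest)) {
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    download(dest) # caller-supplied function that writes the file to `dest`
  }
  dest
}

# Demo: point the cache at a temporary directory and count "downloads"
old <- options(aieconindex.cache_dir = file.path(tempdir(), "aei-demo-cache"))
calls <- 0
fake_download <- function(dest) {
  calls <<- calls + 1
  writeLines("a,b", dest)
}
p1 <- fetch_cached_sketch("release_2025_03_27", "SOC_Structure.csv", fake_download)
p2 <- fetch_cached_sketch("release_2025_03_27", "SOC_Structure.csv", fake_download)
calls # 1: the second call was served from disk
options(old) # restore the previous setting
```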
Anthropic ships its own replication code as Jupyter notebooks inside
several releases
(e.g. release_2025_03_27/v2_report_replication.ipynb). For
exact figure replication, use those. aieconindex is the
R-side equivalent of Hugging Face’s Python datasets loader:
typed, cached access to the same source files, with downstream analysis
left to you.
| Package | Description |
|---|---|
| inequality | Inequality and poverty measurement (labour-market distributional context) |
| ons | UK labour market data (employment, wages by occupation) |
| fred | US labour market data (employment, productivity, occupational wages) |
| readoecd | OECD international labour and skills data |

Cite both the package and the underlying dataset:
citation("aieconindex")
aei_cite("2025-09-15", format = "bibtex")

aei_cite() returns the dataset citation alongside Handa et al. (2025), the
methodological source paper.
Issues and pull requests welcome at https://github.com/charlescoverdale/aieconindex/issues. Useful contributions for v0.2 include:
For Anthropic-introduced schema changes that break
aei_index() or aei_geography(), please open an
issue with a sample of the new file structure (output of
aei_files(<new_release>)).
This package is released under the MIT License.
The underlying Anthropic Economic Index dataset is released by
Anthropic under Creative Commons
Attribution 4.0 International (CC-BY-4.0). When using this package
to retrieve or redistribute that data, attribution to Anthropic and to
Handa et al. (2025) is
required. Use aei_cite() for ready-made citation
strings.
The bundled O*NET and SOC reference data (when accessed through the AEI) inherit their respective licences. See the O*NET licensing page and the BLS Standard Occupational Classification documentation.
This product uses the Anthropic Economic Index data but is not endorsed or certified by Anthropic.