This package provides the header-only ‘jsoncons’ library for manipulating JSON objects. Use rjsoncons for querying JSON or R objects using JMESpath, JSONpath, or JSONpointer. Link to the package for direct access to the ‘jsoncons’ C++ library.
Install the released package version from CRAN
install.packages("rjsoncons", repos = "https://CRAN.R-project.org")
Install the development version with
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes", repos = "https://CRAN.R-project.org")
remotes::install_github("mtmorgan/rjsoncons")
Attach the installed package to your R session, and check the version of the C++ library in use
library(rjsoncons)
rjsoncons::version()
## [1] "0.173.2"
j_query()
Here is a simple JSON example document
json <- '{
  "locations": [
    {"name": "Seattle", "state": "WA"},
    {"name": "New York", "state": "NY"},
    {"name": "Bellevue", "state": "WA"},
    {"name": "Olympia", "state": "WA"}
  ]
}'
There are several common use cases. Use rjsoncons to query the JSON string using JSONpath, JMESpath or JSONpointer syntax to filter larger documents to records of interest, e.g., only cities in New York state, using ‘JMESpath’ syntax.
j_query(json, "locations[?state == 'NY']") |>
cat("\n")
## [{"name":"New York","state":"NY"}]
Use the as = "R" argument to extract deeply nested elements as R objects, e.g., a character vector of city names in Washington state.
j_query(json, "locations[?state == 'WA'].name", as = "R")
## [1] "Seattle" "Bellevue" "Olympia"
The JSON Pointer specification is simpler, indexing a single object in the document. JSON arrays are 0-based.
j_query(json, "/locations/0/state")
## [1] "WA"
The examples above use j_query(), which automatically infers the query specification from the form of path using j_path_type(). It may be useful to indicate the query specification more explicitly using jsonpointer(), jsonpath(), or jmespath(); examples illustrating features available for each query specification are on the help pages ?jsonpointer, ?jsonpath, and ?jmespath.
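As a small sketch of the explicit interface, the calls below name the query specification rather than relying on inference. The document `doc` and the JSONpath expression are illustrative assumptions, not taken from the text above; the filter form `[?(@.state == 'WA')]` is conventional JSONpath syntax and is assumed to be accepted by the underlying library.

```r
library(rjsoncons)

## a small illustrative document (an assumption for this sketch)
doc <- '{"locations": [
    {"name": "Seattle", "state": "WA"},
    {"name": "New York", "state": "NY"}
]}'

## JMESpath: filter on 'state', then project the 'name' member
wa_jmespath <- jmespath(doc, "locations[?state == 'WA'].name")

## JSONpath: the same selection written in JSONpath syntax
wa_jsonpath <- jsonpath(doc, "$.locations[?(@.state == 'WA')].name")

## JSONpointer: a single element, using 0-based array indexing
first_name <- jsonpointer(doc, "/locations/0/name")

## j_path_type() reports the specification that j_query() would infer
j_path_type("locations[].name")
j_path_type("$.locations[*].name")
j_path_type("/locations/0/name")
```

Naming the specification can help when a path is ambiguous or when a reader of the code should not have to guess which syntax is in use.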
j_pivot()
The following transforms a nested JSON document into a format that can be incorporated directly in R as a data.frame.
path <- '{
    name: locations[].name,
    state: locations[].state
}'
j_query(json, path, as = "R") |>
    data.frame()
## name state
## 1 Seattle WA
## 2 New York NY
## 3 Bellevue WA
## 4 Olympia WA
The transformation from JSON ‘array-of-objects’ to ‘object-of-arrays’ suitable for direct representation as a data.frame is common, and is implemented directly as j_pivot().
j_pivot(json, "locations", as = "data.frame")
## name state
## 1 Seattle WA
## 2 New York NY
## 3 Bellevue WA
## 4 Olympia WA
j_pivot() also supports as = "tibble" when the dplyr package is installed.
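To make the ‘array-of-objects’ to ‘object-of-arrays’ idea concrete, here is a minimal base R sketch of the same reshaping on an ordinary R list. It is only an illustration of the concept, not how j_pivot() is implemented; the `records` data and variable names are made up for the example, and it assumes every record has the same keys with scalar string values.

```r
## array-of-objects: one list element per record (illustrative data)
records <- list(
    list(name = "Seattle", state = "WA"),
    list(name = "New York", state = "NY"),
    list(name = "Bellevue", state = "WA")
)

## object-of-arrays: one character vector per key
keys <- names(records[[1]])
columns <- lapply(
    setNames(keys, keys),
    function(key) vapply(records, function(record) record[[key]], character(1))
)

## equal-length vectors map directly onto data.frame columns
df <- as.data.frame(columns, stringsAsFactors = FALSE)
df
```

Once each key holds a vector of equal length, the conversion to a data.frame is a direct column-by-column mapping, which is why this shape is convenient in R.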
rjsoncons can filter and transform R objects. These are converted to JSON using jsonlite::toJSON() before queries are made; toJSON() arguments like auto_unbox = TRUE can be added to the function call.
## `lst` is an *R* list
lst <- jsonlite::fromJSON(json, simplifyVector = FALSE)
j_query(lst, "locations[?state == 'WA'].name | sort(@)", auto_unbox = TRUE) |>
cat("\n")
## ["Bellevue","Olympia","Seattle"]
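The effect of auto_unbox = TRUE is easiest to see by calling jsonlite::toJSON() directly. The sketch below uses a made-up list `x`; it suggests why the argument matters for queries such as the one above, since without unboxing each length-1 R vector becomes a JSON array and a comparison like state == 'WA' would be made against ["WA"] rather than "WA".

```r
library(jsonlite)

## an R list with length-1 character vectors (illustrative data)
x <- list(name = "Seattle", state = "WA")

## default: length-1 vectors are serialized as JSON arrays
boxed <- toJSON(x)
boxed
## {"name":["Seattle"],"state":["WA"]}

## auto_unbox = TRUE: length-1 vectors are serialized as JSON scalars
unboxed <- toJSON(x, auto_unbox = TRUE)
unboxed
## {"name":"Seattle","state":"WA"}
```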
rjsoncons supports NDJSON (new-line delimited JSON). NDJSON consists of a file or character vector where each line / element represents a JSON record. This example uses data from the GitHub Archive project recording all actions on public GitHub repositories. The data included in the package are the first 10 lines of https://data.gharchive.org/2023-02-08-0.json.gz.
ndjson_file <-
system.file(package = "rjsoncons", "extdata", "2023-02-08-0.json")
NDJSON can be read into R (ndjson <- readLines(ndjson_file)) and used in j_query() / j_pivot(), but it is often better to leave full NDJSON files on disk. Thus the first argument to j_query() or j_pivot() is usually a (text or gz-compressed) file path or URL.
Two additional options are available when working with NDJSON. n_records limits the number of records processed. Using n_records can be very useful when exploring the data. For instance, the first record of a file can be viewed interactively with
j_query(ndjson_file, n_records = 1) |>
listviewer::jsonedit()
The option verbose = TRUE adds a progress indicator, which provides confidence that progress is being made while parsing large files. The progress bar requires the cli package.
j_query() provides a one-to-one mapping of NDJSON lines / elements to the return value, e.g., j_query(ndjson_file, "@", as = "string") on an NDJSON file with 1000 lines will return a character vector of 1000 elements, or with j_query(ndjson, "@", as = "R") an R list with length 1000.
j_query(ndjson_file, "{id: id, type: type}", n_records = 5)
## [1] "{\"id\":\"26939254345\",\"type\":\"DeleteEvent\"}"
## [2] "{\"id\":\"26939254358\",\"type\":\"PushEvent\"}"
## [3] "{\"id\":\"26939254361\",\"type\":\"CreateEvent\"}"
## [4] "{\"id\":\"26939254365\",\"type\":\"CreateEvent\"}"
## [5] "{\"id\":\"26939254366\",\"type\":\"PushEvent\"}"
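The one-to-one mapping can also be checked on a small in-memory example. This sketch assumes, as described above, that a character vector with one JSON record per element is treated as NDJSON; the `ndjson_small` records are invented for illustration.

```r
library(rjsoncons)

## three NDJSON records held in a character vector (illustrative data)
ndjson_small <- c(
    '{"id": "1", "type": "PushEvent"}',
    '{"id": "2", "type": "CreateEvent"}',
    '{"id": "3", "type": "PushEvent"}'
)

## one JSON string per input record
types_json <- j_query(ndjson_small, "type")

## one R list element per input record
types_r <- j_query(ndjson_small, "type", as = "R")

## both results have the same length as the input
length(types_json) == length(ndjson_small)
length(types_r) == length(ndjson_small)
```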
j_pivot() transforms an NDJSON file or character vector of objects into a format convenient for input in R. j_pivot() works particularly well with NDJSON files and JMESpath paths, because JMESpath provides flexibility in creating JSON objects to be pivoted.
j_pivot(ndjson_file, "{id: id, type: type}", as = "data.frame")
## id type
## 1 26939254345 DeleteEvent
## 2 26939254358 PushEvent
## 3 26939254361 CreateEvent
## 4 26939254365 CreateEvent
## 5 26939254366 PushEvent
## 6 26939254367 PushEvent
## 7 26939254379 PushEvent
## 8 26939254380 IssuesEvent
## 9 26939254382 PushEvent
## 10 26939254383 PushEvent
Filtering NDJSON files can require relatively more complicated paths. For example, to filter ‘PushEvent’ types from organizations, construct a query that acts on each NDJSON record to return an array of a single object, then apply a filter to replace uninteresting elements with 0-length arrays. Using as = "tibble" often transforms the R list-of-vectors to a tibble in a more pleasing and robust manner compared to as = "data.frame".
path <-
"[{id: id, type: type, org: org}]
[?@.type == 'PushEvent' && @.org != null]"
j_pivot(ndjson_file, path, as = "data.frame")
## id type org.id org.login org.gravatar_id
## 1 26939254358 PushEvent 123667276 johnbieren-testing
## 2 26939254382 PushEvent 123667276 johnbieren-testing
## org.url
## 1 https://api.github.com/orgs/johnbieren-testing
## 2 https://api.github.com/orgs/johnbieren-testing
## org.avatar_url org.id.1 org.login.1
## 1 https://avatars.githubusercontent.com/u/123667276? 120284018 mornystannit
## 2 https://avatars.githubusercontent.com/u/123667276? 120284018 mornystannit
## org.gravatar_id.1 org.url.1
## 1 https://api.github.com/orgs/mornystannit
## 2 https://api.github.com/orgs/mornystannit
## org.avatar_url.1
## 1 https://avatars.githubusercontent.com/u/120284018?
## 2 https://avatars.githubusercontent.com/u/120284018?
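The wrap-then-filter idiom is easier to follow when the intermediate result is inspected with j_query(). The sketch below applies the same path to three invented records held in a character vector (assumed to be treated as NDJSON); only the first is a ‘PushEvent’ with a non-null org, so only its array should keep an element, while the others should reduce to 0-length arrays.

```r
library(rjsoncons)

## illustrative records: only the first is a PushEvent with an 'org'
events <- c(
    '{"id": "1", "type": "PushEvent", "org": {"login": "a"}}',
    '{"id": "2", "type": "PushEvent"}',
    '{"id": "3", "type": "CreateEvent", "org": {"login": "b"}}'
)

## step 1: wrap each record in a single-element array
## step 2: filter that array, leaving it empty for unwanted records
path <-
    "[{id: id, type: type, org: org}]
         [?@.type == 'PushEvent' && @.org != null]"

## one result per record; inspect which arrays still hold an element
filtered <- j_query(events, path)
filtered
```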
A more complete example is used in the NDJSON extended vignette.
The package includes a JSON parser, used with the argument as = "R" or directly with as_r()
as_r('{"a": 1.0, "b": [2, 3, 4]}') |>
    str()
## List of 2
## $ a: num 1
## $ b: int [1:3] 2 3 4
The main rules of this transformation are outlined here. JSON arrays of a single type (boolean, integer, double, string) are transformed to R vectors of the same length and corresponding type.
as_r('[true, false, true]') # boolean -> logical
## [1] TRUE FALSE TRUE
as_r('[1, 2, 3]') # integer -> integer
## [1] 1 2 3
as_r('[1.0, 2.0, 3.0]') # double -> numeric
## [1] 1 2 3
as_r('["a", "b", "c"]') # string -> character
## [1] "a" "b" "c"
JSON arrays mixing integer and double values are transformed to R numeric vectors.
as_r('[1, 2.0]') |> class() # numeric
## [1] "numeric"
If a JSON integer array contains a value larger than R’s 32-bit integer representation, the array is transformed to an R numeric vector. NOTE that this results in loss of precision for JSON integer values greater than 2^53.
as_r('[1, 2147483648]') |> class() # 64-bit integers -> numeric
## [1] "numeric"
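The two limits mentioned above can be checked in base R, independently of the parser. This short sketch shows the largest value an R integer can hold and the point at which a double can no longer distinguish adjacent whole numbers.

```r
## largest value representable as an R (32-bit) integer: 2^31 - 1
int_max <- .Machine$integer.max
int_max == 2^31 - 1
## [1] TRUE

## one more than the integer maximum only fits in a double ("numeric")
class(int_max + 1)
## [1] "numeric"

## doubles represent whole numbers exactly up to 2^53; beyond that,
## adjacent integers collapse to the same value
precision_lost <- 2^53 == 2^53 + 1
precision_lost
## [1] TRUE
```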
JSON objects are transformed to R named lists.
as_r('{}')
## named list()
as_r('{"a": 1.0, "b": [2, 3, 4]}') |> str()
## List of 2
## $ a: num 1
## $ b: int [1:3] 2 3 4
There are several additional details. A JSON scalar and a JSON vector of length 1 are represented in the same way in R.
identical(as_r("3.14"), as_r("[3.14]"))
## [1] TRUE
JSON arrays mixing types other than integer and double are transformed to R lists
as_r('[true, 1, "a"]') |> str()
## List of 3
## $ : logi TRUE
## $ : int 1
## $ : chr "a"
JSON null values are represented as R NULL values; arrays of null are transformed to lists.
as_r('null') # NULL
## NULL
as_r('[null]') |> str() # list(NULL)
## List of 1
## $ : NULL
as_r('[null, null]') |> str() # list(NULL, NULL)
## List of 2
## $ : NULL
## $ : NULL
Ordering of object members is controlled by the object_names= argument. The default preserves names as they appear in the JSON definition; use "sort" to sort names alphabetically. This argument is applied recursively.
json <- '{"b": 1, "a": {"d": 2, "c": 3}}'
as_r(json) |> str()
## List of 2
## $ b: int 1
## $ a:List of 2
## ..$ d: int 2
## ..$ c: int 3
as_r(json, object_names = "sort") |> str()
## List of 2
## $ a:List of 2
## ..$ c: int 3
## ..$ d: int 2
## $ b: int 1
The parser corresponds approximately to jsonlite::fromJSON() with arguments simplifyVector = TRUE, simplifyDataFrame = FALSE, simplifyMatrix = FALSE. Unit tests (using the tinytest framework) providing additional details are available at
jsonlite::fromJSON()
The built-in parser can be replaced by alternative parsers by returning the query as a JSON string, e.g., using fromJSON() in the jsonlite package.
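## `json` was reassigned in the as_r() examples; restore the locations document
json <- '{"locations": [{"name": "Seattle", "state": "WA"}, {"name": "New York", "state": "NY"}, {"name": "Bellevue", "state": "WA"}, {"name": "Olympia", "state": "WA"}]}'
j_query(json, "locations[?state == 'WA']") |>
    ## `fromJSON()` simplifies list-of-objects to data.frame
    jsonlite::fromJSON()
## name state
## 1 Seattle WA
## 2 Bellevue WA
## 3 Olympia WA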
The rjsoncons package is particularly useful when accessing elements that might otherwise require complicated application of nested lapply(), purrr expressions, or tidyr unnest_*() (see R for Data Science chapter ‘Hierarchical data’).
The package includes the complete ‘jsoncons’ C++ header-only library, available to other R packages by adding
LinkingTo: rjsoncons
SystemRequirements: C++11
to the DESCRIPTION file. Typical use in an R package would also include LinkingTo: specifications for the cpp11 or Rcpp packages (this package uses cpp11) to provide a C / C++ interface between R and the C++ ‘jsoncons’ library.
This vignette was compiled using the following software versions
sessionInfo()
## R Under development (unstable) (2024-01-11 r85801)
## Platform: aarch64-apple-darwin23.2.0
## Running under: macOS Sonoma 14.2.1
##
## Matrix products: default
## BLAS: /Users/ma38727/bin/R-devel/lib/libRblas.dylib
## LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rjsoncons_1.2.0 BiocStyle_2.31.0
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.34 R6_2.5.1 bookdown_0.37
## [4] fastmap_1.1.1 xfun_0.41 cachem_1.0.8
## [7] knitr_1.45 htmltools_0.5.7 rmarkdown_2.25
## [10] lifecycle_1.0.4 cli_3.6.2 sass_0.4.8
## [13] jquerylib_0.1.4 compiler_4.4.0 tools_4.4.0
## [16] evaluate_0.23 bslib_0.6.1 yaml_2.3.8
## [19] BiocManager_1.30.22.3 jsonlite_1.8.8 rlang_1.1.3