nanoparquet is a reader and writer for a common subset of Parquet files.
FLOAT16, INTERVAL, UNKNOWN.
Install the R package from CRAN:
install.packages("nanoparquet")
Call read_parquet() to read a Parquet file:
df <- nanoparquet::read_parquet("example.parquet")
To see the columns of a Parquet file and how their types are mapped to R types by read_parquet(), call parquet_column_types() first:
nanoparquet::parquet_column_types("example.parquet")
Folders of similarly structured Parquet files (e.g. produced by Spark) can be read like this:
df <- data.table::rbindlist(lapply(
  Sys.glob("some-folder/part-*.parquet"),
  nanoparquet::read_parquet
))
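If data.table is not installed, the same idea can be sketched with base R. The temporary folder and the two part files below are made up for illustration only, so that the example runs on its own; in practice the glob pattern would point at an existing folder.

```r
# Create two small part files in a temporary folder (illustrative data only).
folder <- file.path(tempdir(), "some-folder")
dir.create(folder, showWarnings = FALSE)
nanoparquet::write_parquet(mtcars[1:16, ], file.path(folder, "part-0001.parquet"))
nanoparquet::write_parquet(mtcars[17:32, ], file.path(folder, "part-0002.parquet"))

# Read every part file, then stack the resulting data frames row-wise.
files <- Sys.glob(file.path(folder, "part-*.parquet"))
parts <- lapply(files, nanoparquet::read_parquet)
df <- do.call(rbind, parts)

dim(df)
```

do.call(rbind, ...) requires all parts to have the same columns, and it is usually slower than data.table::rbindlist() when there are many files.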
Call write_parquet() to write a data frame to a Parquet file:
nanoparquet::write_parquet(mtcars, "mtcars.parquet")
To see how the columns of the data frame will be mapped to Parquet types by write_parquet(), call parquet_column_types() first:
nanoparquet::parquet_column_types(mtcars)
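A quick way to check a write is to read the file back and compare it with the original. The following is a minimal sketch that uses a temporary file instead of a fixed path. It compares values only, on the assumption that attributes such as row names and the extra class added by read_parquet() may differ.

```r
# Write mtcars to a temporary Parquet file and read it back.
path <- tempfile(fileext = ".parquet")
nanoparquet::write_parquet(mtcars, path)
back <- nanoparquet::read_parquet(path)

# Same shape and column names as the original data frame?
identical(dim(back), dim(mtcars))
identical(names(back), names(mtcars))

# Compare the values, ignoring attributes such as row names and classes.
all.equal(as.data.frame(back), mtcars, check.attributes = FALSE)
```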
Call parquet_info(), parquet_column_types(), parquet_schema() or parquet_metadata() to see various kinds of metadata from a Parquet file:
parquet_info() shows a basic summary of the file.
parquet_column_types() shows the leaf columns; these are the ones that read_parquet() reads into R.
parquet_schema() shows all columns, including non-leaf columns.
parquet_metadata() shows the most complete metadata information: file metadata, the schema, the row groups and column chunks of the file.
nanoparquet::parquet_info("mtcars.parquet")
nanoparquet::parquet_column_types("mtcars.parquet")
nanoparquet::parquet_schema("mtcars.parquet")
nanoparquet::parquet_metadata("mtcars.parquet")
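Because parquet_metadata() returns the most detailed result, it can help to look at its top-level structure before printing everything. This is a small sketch using a temporary file. It does not assume particular component names and relies only on base R's str().

```r
# Write a small file so the example is self-contained.
path <- tempfile(fileext = ".parquet")
nanoparquet::write_parquet(mtcars, path)

# Fetch the full metadata and show only its top-level components.
md <- nanoparquet::parquet_metadata(path)
str(md, max.level = 1)
```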
If you find a file that should be supported but isn’t, please open an issue here with a link to the file.
See also ?parquet_options().
nanoparquet.class: extra class to add to data frames returned by read_parquet(). If it is not defined, the default is "tbl", which changes how the data frame is printed if the pillar package is loaded.
nanoparquet.use_arrow_metadata: unless this is set to FALSE, read_parquet() will make use of Arrow metadata in the Parquet file. Currently this is used to detect factor columns.
nanoparquet.write_arrow_metadata: unless this is set to FALSE, write_parquet() will add Arrow metadata to the Parquet file. This helps preserve the classes of columns, e.g. factors will be read back as factors, both by nanoparquet and Arrow.
License: MIT