The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
Data frames are the workhorse of data analysis in R. In HDF5, data frames are stored as Compound Datasets. This allows different columns to have different data types (e.g., integer, float, string) within the same dataset, much like a SQL table.
This vignette explains how h5lite handles data frames,
including row names, factors, and missing values.
Writing a data frame is as simple as writing any other object.
h5lite automatically maps each column to its appropriate
HDF5 type.
# Create a standard data frame
df <- data.frame(
id = 1:5,
group = c("A", "A", "B", "B", "C"),
score = c(10.5, 9.2, 8.4, 7.1, 6.0),
passed = c(TRUE, TRUE, TRUE, FALSE, FALSE),
stringsAsFactors = FALSE
)
# Write to HDF5
h5_write(df, file, "study_data/results")
# Fetch the column names
h5_names(file, "study_data/results")
#> [1] "id" "group" "score" "passed"
# Read back
df_in <- h5_read(file, "study_data/results")
head(df_in)
#> id group score passed
#> 1 1 A 10.5 1
#> 2 2 A 9.2 1
#> 3 3 B 8.4 1
#> 4 4 B 7.1 0
#> 5 5 C 6.0 0You can use the as argument to control the storage type
for specific columns. This is passed as a named vector where the names
correspond to the column names.
This is particularly useful for optimizing storage (e.g., saving
space by storing small integers as int8 or single
characters as ascii[1]).
Standard HDF5 Compound Datasets do not have a concept of “row names”.
However, h5lite preserves them using Dimension
Scales.
When you write a data frame with row names, h5lite
creates a separate dataset (usually named _rownames) and
links it to the main table. When reading, h5lite
automatically restores these as the row.names of the data
frame.
mtcars_subset <- head(mtcars, 3)
h5_write(mtcars_subset, file, "cars")
h5_str(file)
#> /
#> ├── study_data/
#> │ └── results <compound[4] × 5>
#> │ ├── $id <uint8>
#> │ ├── $group <utf8[1]>
#> │ ├── $score <float64>
#> │ └── $passed <uint8>
#> ├── custom_df <compound[2] × 10>
#> │ ├── $id <uint16>
#> │ └── $code <ascii[1]>
#> ├── cars <compound[11] × 3>
#> │ ├── @DIMENSION_LIST <vlen × 1>
#> │ ├── $mpg <float64>
#> │ ├── $cyl <uint8>
#> │ ├── $disp <uint8>
#> │ ├── $hp <uint8>
#> │ ├── $drat <float64>
#> │ ├── $wt <float64>
#> │ ├── $qsec <float64>
#> │ ├── $vs <uint8>
#> │ ├── $am <uint8>
#> │ ├── $gear <uint8>
#> │ └── $carb <uint8>
#> └── cars_rownames <utf8 × 3>
#> ├── @CLASS <ascii[16] scalar>
#> └── @REFERENCE_LIST <compound[2] × 1>
#> ├── $dataset <reference>
#> └── $dimension <uint32>
# Read back
result <- h5_read(file, "cars")
print(row.names(result))
#> [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.