Encoding, Decoding, and Cross-Language Data Transfer

Introduction

Data pipelines in {rixpress} often require controlling how objects are stored and restored, especially when dealing with:

  1. Non-standard R objects (e.g., machine learning models, large tables).
  2. Multiple file formats (CSV, {qs} compressed files, etc.).
  3. Cross-language workflows mixing R and Python.

This vignette focuses on encoding and decoding in R, and on transferring data between R and Python using rxp_py2r() and rxp_r2py().

Custom Encoding and Decoding in R

By default, {rixpress} saves each output with saveRDS() and reads upstream inputs with readRDS(). You can override these defaults with the encoder and decoder arguments to handle other formats or complex objects:

library(rixpress)

# Encode output as CSV instead of RDS
d2 <- rxp_r(
  mtcars_head,
  my_head(mtcars_am, 100),
  user_functions = "my_head.R",
  nix_env = "default.nix",
  encoder = write.csv
)

# Encode as qs, decode input from CSV
d3 <- rxp_r(
  mtcars_tail,
  my_tail(mtcars_head),
  user_functions = "my_tail.R",
  nix_env = "default2.nix",
  encoder = qs::qsave,
  decoder = read.csv
)

# Decode multiple upstream objects with different decoders
d4 <- rxp_r(
  mtcars_mpg,
  full_join(mtcars_tail, mtcars_head),
  nix_env = "default2.nix",
  decoder = c(
    mtcars_tail = "qs::qread",
    mtcars_head = "read.csv"
  )
)
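The idea of one decoder per upstream object can be illustrated in plain R, outside of any pipeline. The sketch below is not how {rixpress} resolves decoders internally; resolve_function(), the temporary files, and the use of readRDS() in place of qs::qread (to stay within base R) are all assumptions made for illustration:

```r
# Two hypothetical upstream artifacts, each written with a different encoder.
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
utils::write.csv(head(mtcars), csv_path, row.names = FALSE)
saveRDS(tail(mtcars), rds_path)

# One decoder per upstream object, given as strings in a named vector.
decoders <- c(mtcars_head = "utils::read.csv", mtcars_tail = "readRDS")
paths <- c(mtcars_head = csv_path, mtcars_tail = rds_path)

# Hypothetical helper: turn a string such as "pkg::fun" into the function it names.
resolve_function <- function(fn_name) eval(parse(text = fn_name))

# Read each input with the decoder registered under its name.
inputs <- lapply(names(decoders), function(nm) {
  resolve_function(decoders[[nm]])(paths[[nm]])
})
names(inputs) <- names(decoders)

str(inputs, max.level = 1)
```

Each element of inputs is read with its own decoder, which mirrors what the named decoder vector expresses for mtcars_tail and mtcars_head.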

Key points:

As shown in the examples above, you can pass either a function or the name of a function as a string to encoder and decoder.
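When a stock function does not behave quite as needed, a small wrapper can serve as the encoder or decoder. The sketch below assumes that an encoder is called with the object first and the output path second, and that a decoder is called with the path only; check the rxp_r() documentation for the exact contract. The names write_csv_plain() and read_csv_plain() are hypothetical. It also shows one reason to wrap write.csv(): by default it writes row names, which read.csv() then returns as an extra column.

```r
# Hypothetical encoder: takes (object, path) and omits row names.
write_csv_plain <- function(object, path) {
  utils::write.csv(object, file = path, row.names = FALSE)
}

# Hypothetical matching decoder: takes only the path.
read_csv_plain <- function(path) {
  utils::read.csv(path, stringsAsFactors = FALSE)
}

original <- head(mtcars, 5)

# Round trip through the wrappers: column names are preserved.
path <- tempfile(fileext = ".csv")
write_csv_plain(original, path)
restored <- read_csv_plain(path)

# Round trip through plain write.csv()/read.csv(): the row names
# come back as an additional column.
path_default <- tempfile(fileext = ".csv")
utils::write.csv(original, path_default)
extra <- setdiff(names(utils::read.csv(path_default)), names(original))

print(names(restored))
print(extra)
```

In a pipeline, such wrappers would live in a script passed through user_functions and be referenced by name, as the examples above do with my_head.R and my_tail.R.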

By encoding an object in a cross-language format, you can pass it to another language. For example, read a CSV file in Julia, encode it as Arrow, and read it back in R:

library(rixpress)

list(
  rxp_jl_file(
    mtcars,
    # Assume here that mtcars.csv is separated by "|" instead of ","
    path = "data/mtcars.csv",
    read_function = "read_csv",
    user_functions = "functions.jl",
    encoder = "write_arrow"
    # read_csv and write_arrow are both
    # defined in the functions.jl script
    # and look like this:

    # function write_arrow(df::DataFrame, filename::String)
    #     Arrow.write(filename, df)
    # end

    # function read_csv(path::String)
    #     df = CSV.read(path, DataFrame; delim="|")
    #     return df
    # end

  ),

  rxp_r(
    mtcars2,
    select(mtcars, am, cyl, mpg),
    decoder = "read_feather"
  )
) |>
  rxp_populate()

You can find this example here. You can use the same approach to transfer data to and from Python, or between any of the three supported languages (R, Python, and Julia).

Cross-Language Data Transfer: R ↔︎ Python

In the specific case of transferring objects (data frames, lists, vectors, arrays, etc.) between R and Python, it is also possible to rely on {reticulate}’s built-in conversion through rxp_py2r() and rxp_r2py(). These functions move objects between R and Python without a custom encoder or decoder:

library(rixpress)

# Python step producing pandas DataFrame
d1 <- rxp_py(
  name = mtcars_pl_am,
  expr = "mtcars_pl.filter(polars.col('am') == 1).to_pandas()"
)

# Transfer Python -> R
d2 <- rxp_py2r(
  name = mtcars_am,
  expr = mtcars_pl_am
)

# R step processing the data
d3 <- rxp_r(
  name = mtcars_head,
  expr = my_head(mtcars_am),
  user_functions = "functions.R"
)

# Transfer R -> Python
d3_1 <- rxp_r2py(
  name = mtcars_head_py,
  expr = mtcars_head
)

For this to work, you need to add {reticulate} to the pipeline’s execution environment.

Summary
