s3fs provides a file-system like interface into Amazon Web Services for R. It utilizes the paws SDK and R6 for its core design. This repo has been inspired by Python's s3fs; however, its API and implementation have been developed to follow R's fs.
You can install the released version of s3fs from CRAN with:
install.packages('s3fs')
r-universe installation:
# Enable repository from dyfanjones
options(repos = c(
dyfanjones = 'https://dyfanjones.r-universe.dev',
CRAN = 'https://cloud.r-project.org')
)
# Download and install s3fs in R
install.packages('s3fs')
Github installation:
remotes::install_github("dyfanjones/s3fs")
Package dependencies:
paws: connection with AWS S3
R6: set up core class
data.table: wrangle lists into data.frames
fs: file system on local files
lgr: set up logging
future: set up async functionality
future.apply: set up parallel looping
s3fs attempts to give the same interface as fs when handling files on AWS S3 from R.
s3fs functions are vectorized, accepting multiple path inputs similar to fs (a short sketch follows the copy example below).
Asynchronous functions return a future object of their non-async counterpart; s3_stream_in instead returns a list of raw objects.
s3fs follows fs naming conventions with dir_*, file_* and path_*, but with the prefix s3_ in front, i.e. s3_dir_*, s3_file_* and s3_path_* etc.
As with fs, if a failure happens it will be raised and not masked with a warning.
s3fs functions are designed to have the option to run in parallel through the use of future and future.apply.

For example, copying a large file from one location to the next:
library(s3fs)
library(future)
plan("multisession")
s3_file_copy("s3://mybucket/multipart/large_file.csv", "s3://mybucket/new_location/large_file.csv")
When s3fs copies a large file (> 5GB) it uses multipart uploads; future allows each part to be processed in parallel, speeding up the process.
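As a short sketch of the vectorized interface mentioned above (the bucket name below is a placeholder), a single call can create or delete several objects at once:

library(s3fs)

# construct several uris in one call (mybucket is a made-up bucket)
paths <- s3_path("mybucket", "data", letters[1:3], ext = "csv")

# s3fs functions accept vectors of paths, mirroring fs
s3_file_create(paths)
s3_file_delete(paths)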
s3fs uses future to create a few key async functions. This is more focused on functions that might be moving large files to and from R and AWS S3.

For example, copying a large file from AWS S3 to R:
library(s3fs)
library(future)
plan("multisession")
s3_file_copy_async("s3://mybucket/multipart/large_file.csv", "large_file.csv")
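Since the *_async functions return a future object, the result can be checked and collected with the future package; a minimal sketch (paths are placeholders):

library(s3fs)
library(future)
plan("multisession")

# the async copy returns a future of its non-async counterpart's result
fut <- s3_file_copy_async("s3://mybucket/multipart/large_file.csv", "large_file.csv")

resolved(fut)  # non-blocking check on whether the copy has finished
value(fut)     # block until the copy completes and return its result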
fs has a straightforward API with 4 core themes:

path_ for manipulating and constructing paths
file_ for files
dir_ for directories
link_ for links

s3fs follows these themes with the following:

s3_path_ for manipulating and constructing s3 uri paths (see the path sketch after the usage example below)
s3_file_ for s3 files
s3_dir_ for s3 directories

NOTE: link_ is currently not supported.
library(s3fs)
# Construct a path to a file with `path()`
s3_path("foo", "bar", letters[1:3], ext = "txt")
#> [1] "s3://foo/bar/a.txt" "s3://foo/bar/b.txt" "s3://foo/bar/c.txt"
# list buckets
s3_dir_ls()
#> [1] "s3://MyBucket1"
#> [2] "s3://MyBucket2"
#> [3] "s3://MyBucket3"
#> [4] "s3://MyBucket4"
#> [5] "s3://MyBucket5"
# list files in bucket
s3_dir_ls("s3://MyBucket5")
#> [1] "s3://MyBucket5/iris.json" "s3://MyBucket5/athena-query/"
#> [3] "s3://MyBucket5/data/" "s3://MyBucket5/default/"
#> [5] "s3://MyBucket5/iris/" "s3://MyBucket5/made-up/"
#> [7] "s3://MyBucket5/test_df/"
# create a new directory
tmp <- s3_dir_create(s3_file_temp(tmp_dir = "MyBucket5"))
tmp
#> [1] "s3://MyBucket5/filezwkcxx9q5562"
# create new files in that directory
s3_file_create(s3_path(tmp, "my-file.txt"))
#> [1] "s3://MyBucket5/filezwkcxx9q5562/my-file.txt"
s3_dir_ls(tmp)
#> [1] "s3://MyBucket5/filezwkcxx9q5562/my-file.txt"
# remove files from the directory
s3_file_delete(s3_path(tmp, "my-file.txt"))
s3_dir_ls(tmp)
#> character(0)
# remove the directory
s3_dir_delete(tmp)
Created on 2022-06-21 by the reprex package (v2.0.1)
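The usage above only shows s3_path() itself; assuming s3fs mirrors fs's other path helpers under the s3_ prefix (the helper names below are assumptions based on that convention, not taken from this page), manipulating uris might look like:

library(s3fs)

# build an s3 uri (bucket and keys are placeholders)
uri <- s3_path("MyBucket5", "data", "iris", ext = "csv")

# assumed helpers mirroring fs::path_file(), fs::path_dir() and fs::path_ext();
# check the s3fs reference for the exact names before relying on them
s3_path_file(uri)  # file name component of the uri
s3_path_dir(uri)   # parent "directory" of the uri
s3_path_ext(uri)   # file extension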
Similar to fs, s3fs is designed to work well with the pipe.
library(s3fs)
paths <- s3_file_temp(tmp_dir = "MyBucket") |>
  s3_dir_create() |>
  s3_path(letters[1:5]) |>
  s3_file_create()
paths
#> [1] "s3://MyBucket/fileazqpwujaydqg/a"
#> [2] "s3://MyBucket/fileazqpwujaydqg/b"
#> [3] "s3://MyBucket/fileazqpwujaydqg/c"
#> [4] "s3://MyBucket/fileazqpwujaydqg/d"
#> [5] "s3://MyBucket/fileazqpwujaydqg/e"
paths |> s3_file_delete()
#> [1] "s3://MyBucket/fileazqpwujaydqg/a"
#> [2] "s3://MyBucket/fileazqpwujaydqg/b"
#> [3] "s3://MyBucket/fileazqpwujaydqg/c"
#> [4] "s3://MyBucket/fileazqpwujaydqg/d"
#> [5] "s3://MyBucket/fileazqpwujaydqg/e"
Created on 2022-06-22 by the reprex package (v2.0.1)
NOTE: all examples have been developed from fs.
s3fs allows you to connect to file systems that provide an S3-compatible interface. For example, MinIO offers high-performance, S3-compatible object storage. You can connect to your MinIO server using s3fs::s3_file_system:
library(s3fs)
s3_file_system(
aws_access_key_id = "minioadmin",
aws_secret_access_key = "minioadmin",
endpoint = "http://localhost:9000"
)
s3_dir_ls()
#> [1] ""
s3_bucket_create("s3://testbucket")
#> [1] "s3://testbucket"
# refresh cache
s3_dir_ls(refresh = T)
#> [1] "s3://testbucket"
s3_bucket_delete("s3://testbucket")
#> [1] "s3://testbucket"
# refresh cache
s3_dir_ls(refresh = T)
#> [1] ""
Created on 2022-12-14 with reprex v2.0.2
NOTE: if you want to change from AWS S3 to MinIO in the same R session, you will need to set the parameter refresh = TRUE when calling s3_file_system again. You can use multiple sessions by using the R6 class S3FileSystem directly.
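For example, a minimal sketch of pointing an existing session at MinIO after having used AWS S3 (credentials and endpoint are placeholders matching the example above):

library(s3fs)

# re-point s3fs at a MinIO server within the same R session;
# refresh = TRUE rebuilds the cached connection
s3_file_system(
  aws_access_key_id = "minioadmin",
  aws_secret_access_key = "minioadmin",
  endpoint = "http://localhost:9000",
  refresh = TRUE
)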
Please open a GitHub ticket to raise any issues or feature requests.