The spark.sas7bdat package allows R users to easily import large SAS datasets (.sas7bdat files) into Spark tables in parallel.
The package uses the spark-sas7bdat Spark package to read a SAS dataset into Spark. That Spark package imports the data in parallel on the Spark cluster using the Parso library, and the process is launched from R through sparklyr.
More information about the spark-sas7bdat Spark package and sparklyr can be found at https://github.com/saurfang/spark-sas7bdat and https://spark.rstudio.com.
The following example reads the file iris.sas7bdat in parallel into a Spark table called sas_example. Do try this with bigger data on your cluster, and see the sparklyr documentation for how to connect to your Spark cluster.
library(sparklyr)
library(spark.sas7bdat)
mysasfile <- system.file("extdata", "iris.sas7bdat", package = "spark.sas7bdat")
sc <- spark_connect(master = "local")
x <- spark_read_sas(sc, path = mysasfile, table = "sas_example")
The resulting handle to the Spark table can be used in further dplyr statements. These are executed in parallel on Spark, using the functionality provided by the spark-sas7bdat package.
library(dplyr)
library(magrittr)
x %>%
  group_by(Species) %>%
  summarise(count = n(), length = mean(Sepal_Length), width = mean(Sepal_Width))
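Once the aggregation has been computed in Spark, the (typically small) result can be pulled into an ordinary R data frame with dplyr's collect(). A minimal sketch, assuming the connection sc and the table handle x from the example above:

```r
library(dplyr)

# The aggregation runs in Spark; collect() brings the result into R as a tibble
result <- x %>%
  group_by(Species) %>%
  summarise(count = n()) %>%
  collect()

print(result)

# Close the Spark connection when done
spark_disconnect(sc)
```

Keep filtering and aggregation on the Spark side and collect() only the final result; pulling a full large table into the R session defeats the purpose of the parallel import.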
Need support in big data and Spark analysis? Contact BNOSAC: http://www.bnosac.be