Apache Sedona is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark and Apache Flink, with a set of out-of-the-box distributed Spatial Datasets and Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines.
The apache.sedona R package exposes an interface to Apache Sedona through {sparklyr}, enabling higher-level access through a {dplyr} backend and familiar R functions.
To use Apache Sedona from R, you just need to install the apache.sedona package; Spark dependencies are managed directly by the package.
# Install released version from CRAN
install.packages("apache.sedona")
To use the development version, you will need both the latest version of the package and of the Apache Sedona jars.
To get the latest R package from GitHub:
# Install development version from GitHub
devtools::install_github("apache/sedona/R")
To get the latest Sedona jars you can:
The path to the sedona-spark-shaded jars must be set in the SEDONA_JAR_FILES environment variable (see below).
The spark_read_* functions read geospatial data into Spark DataFrames. The resulting Spark DataFrame object can then be modified using the dplyr verbs familiar to many R users. In addition, spatial UDFs supported by Sedona inter-operate seamlessly with the other functions supported in sparklyr's dbplyr SQL translation environment. For example, the code below finds the average area of all polygons in polygon_sdf:
The first time you load Sedona, Spark downloads all the dependent jars, which can take a few minutes and cause the connection to time out. You can either retry (some jars will already be downloaded and cached) or increase the "sparklyr.connect.timeout" parameter in the sparklyr config.
library(sparklyr)
library(apache.sedona)
## Only if using development version:
Sys.setenv("SEDONA_JAR_FILES" = "<path to sedona-spark-shaded jar>")
sc <- spark_connect(master = "local")
polygon_sdf <- spark_read_geojson(sc, location = "/tmp/polygon.json")
mean_area_sdf <- polygon_sdf %>%
  dplyr::summarize(mean_area = mean(ST_Area(geometry)))
print(mean_area_sdf)
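If the first connection times out while the jars are downloading, the timeout mentioned earlier can be raised through a sparklyr config object. The fragment below is a minimal sketch: the value of 600 seconds is an arbitrary assumption, not a recommended setting, and should be adjusted to your network speed.

```r
library(sparklyr)
library(apache.sedona)

# Start from the default sparklyr configuration.
config <- spark_config()

# Allow more time for the initial connection while Spark fetches the jars.
# 600 (seconds) is an illustrative value chosen for this sketch.
config[["sparklyr.connect.timeout"]] <- 600

# Pass the modified configuration when connecting.
sc <- spark_connect(master = "local", config = config)
```

Once the jars are cached, later connections should start faster, so the higher timeout mostly matters for the first run.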
Taken together, these features open up many interesting possibilities. For example, one can extract ML features from geospatial data in Spark DataFrames, build an ML pipeline using the ml_* family of functions in {sparklyr} to work with those features, and, if the output of an ML model happens to be a geospatial object as well, apply the visualization routines in {apache.sedona} to visualize the difference between any predicted geometry and the corresponding ground truth.
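As a rough illustration of the feature-extraction idea, the sketch below derives two numeric features from each polygon and clusters them with k-means. It is not taken from the package documentation. It assumes the same /tmp/polygon.json file and geometry column used earlier. The choice of ST_Area and ST_NPoints as features, the column names area and n_points, and k = 2 are all arbitrary choices made for this example.

```r
library(sparklyr)
library(apache.sedona)
library(dplyr)

sc <- spark_connect(master = "local")
polygon_sdf <- spark_read_geojson(sc, location = "/tmp/polygon.json")

# Derive numeric features from each geometry using Sedona spatial SQL functions.
# `area` and `n_points` are hypothetical column names chosen for this sketch.
features_sdf <- polygon_sdf %>%
  mutate(
    area = ST_Area(geometry),
    n_points = ST_NPoints(geometry)
  ) %>%
  select(area, n_points)

# Fit a k-means model on the derived features (k = 2 is arbitrary).
kmeans_model <- ml_kmeans(features_sdf, formula = ~ area + n_points, k = 2)

# Attach the predicted cluster to each row.
clustered_sdf <- ml_predict(kmeans_model, features_sdf)
print(clustered_sdf)
```

Any other ml_* estimator could be substituted for ml_kmeans. The point is only that columns produced by spatial functions behave like ordinary numeric columns once they are materialized in the Spark DataFrame.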