Repository Mirror for your Cloud Server and Webhosting

Title:

Spatial Parallel Computing by Hierarchical Data Partitioning

Version:

0.9.9

Description:

Geospatial data computation is parallelized by grid, hierarchy, or raster files. Based on 'future' (Bengtsson, 2024 <doi:10.32614/CRAN.package.future>) and 'mirai' (Gao et al., 2025 <doi:10.32614/CRAN.package.mirai>) parallel back-ends, 'terra' (Hijmans et al., 2025 <doi:10.32614/CRAN.package.terra>) and 'sf' (Pebesma et al., 2024 <doi:10.32614/CRAN.package.sf>) functions as well as convenience functions in the package can be distributed over multiple threads. The simplest way of parallelizing generic geospatial computation is to start from par_pad_*() functions to par_grid(), par_hierarchy(), or par_multirasters() functions. Virtually any functions accepting classes in 'terra' or 'sf' packages can be used in the three parallelization functions. A common raster-vector overlay operation is provided as a function extract_at(), which uses 'exactextractr' (Baston, 2023 <doi:10.32614/CRAN.package.exactextractr>), with options for kernel weights for summarizing raster values at vector geometries. Other convenience functions for vector-vector operations including simple areal interpolation (summarize_aw()) and summation of exponentially decaying weights (summarize_sedc()) are also provided.

License:

MIT + file LICENSE

URL:

https://docs.ropensci.org/chopin/, https://github.com/ropensci/chopin

BugReports:

https://github.com/ropensci/chopin/issues

Depends:

R (≥ 4.1)

SystemRequirements:

netcdf

Encoding:

UTF-8

LazyData:

true

LazyDataCompression:

RoxygenNote:

7.3.2

Imports:

anticlust, cli, dplyr (≥ 1.1.0), exactextractr (≥ 0.8.2), future, future.apply, igraph, methods, rlang, sf (≥ 1.0-10), sfheaders, stars (≥ 0.6-0), terra (≥ 1.7-18), mirai (≥ 1.3.0), collapse

Suggests:

covr, devtools, targets, DiagrammeR, future.mirai, knitr, lifecycle, rmarkdown, spatstat.random, testthat (≥ 3.0.0), units, withr, dggridR, h3r

VignetteBuilder:

knitr

Config/testthat/edition:

NeedsCompilation:

Packaged:

2025-09-10 13:08:18 UTC; isong

Author:

Insang Song

[aut, cre], Kyle Messier

[aut, ctb], Alec L. Robitaille [rev] (Alec reviewed the package version 0.6.3 for rOpenSci, see <https://github.com/ropensci/software-review/issues/638>), Eric R. Scott [rev] (Eric reviewed the package version 0.6.3 for rOpenSci, see <https://github.com/ropensci/software-review/issues/638>)

Maintainer:

Insang Song <geoissong@gmail.com>

Repository:

CRAN

Date/Publication:

2025-09-10 15:40:02 UTC

Computation of spatial data by hierarchical and objective partitioning of inputs for parallel processing

Description

The chopin package provides a set of functions to compute on divided geospatial data.

Basic functionalities

Distribute terra, sf, and chopin functions to parallel workers set by future or mirai
Set parallelization strategies based on artificial grids, equal-size clusters, hierarchy, and multiple raster files
Convenience functions for raster-vector overlay and weighted summary from vector dataset

`chopin` workflow

The simplest way of parallelizing generic geospatial computation is to start from ⁠par_pad_*⁠ functions to par_grid, or running par_hierarchy, or par_multirasters functions at once.

library(chopin)
library(terra)
library(sf)
library(collapse)
library(dplyr)
library(future)
library(future.mirai)
library(future.apply)
lastpar <- par(mfrow = c(1, 1))
options(sf_use_s2 = FALSE)

Example data

North Carolinian counties and a raster of elevation data are used as example data.

temp_dir <- tempdir(check = TRUE)

url_nccnty <-
  paste0(
   "https://raw.githubusercontent.com/",
   "ropensci/chopin/refs/heads/main/",
   "tests/testdata/nc_hierarchy.gpkg"
 )  

url_ncelev <-
  paste0(
   "https://raw.githubusercontent.com/",
   "ropensci/chopin/refs/heads/main/",
   "tests/testdata/nc_srtm15_otm.tif"
 )

nccnty_path <- file.path(temp_dir, "nc_hierarchy.gpkg")
ncelev_path <- file.path(temp_dir, "nc_srtm15_otm.tif")

# download data
download.file(url_nccnty, nccnty_path, quiet = TRUE)
download.file(url_ncelev, ncelev_path, quiet = TRUE)
nccnty <- terra::vect(nccnty_path)
ncelev <- terra::rast(ncelev_path)

To demonstrate chopin functions, we generate 10,000 random points in North Carolina

ncsamp <-
  terra::spatSample(
    nccnty,
    1e4L
  )
ncsamp$pid <- 1:nrow(ncsamp)

Creating grids

The example below will generate a regular grid from the random point data.

ncgrid <- par_pad_grid(ncsamp, mode = "grid", nx = 4L, ny = 2L, padding = 10000)
plot(ncgrid$original)

Extracting values from raster

Since all ⁠par_*⁠ functions operate on future backends, users should define the future plan before running the functions. multicore plan supports terra objects which may lead to faster computation, but it is not supported in Windows. An alternative is future.mirai's mirai_multisession plan, which is supported in many platforms and generally faster than plain future multisession plan.
workers argument should be defined with an integer value to specify the number of threads to be used.

future::plan(future.mirai::mirai_multisession, workers = 2L)

Then we dispatch multiple extract_at runs on the grid polygons.
Before we proceed, the terra object should be converted to sf object.

pg <-
  par_grid(
    grids = ncgrid,
    pad_y = FALSE,
    .debug = TRUE,
    fun_dist = extract_at,
    x = ncelev_path,
    y = sf::st_as_sf(ncsamp),
    id = "pid",
    radius = 1e4,
    func = "mean"
  )

Hierarchical processing

Here we demonstrate hierarchical processing of the random points using census tract polygons.

nccnty <- sf::st_read(nccnty_path, layer = "county")
nctrct <- sf::st_read(nccnty_path, layer = "tracts")

The example below will parallelize summarizing mean elevation at 10 kilometers circular buffers of random sample points by the first five characters of census tract unique identifiers, which are county codes.
This example demonstrates the hierarchy can be defined from any given polygons if the unique identifiers are suitably formatted for defining the hierarchy.

px <-
  par_hierarchy(
    # from here the par_hierarchy-specific arguments
    regions = nctrct,
    regions_id = "GEOID",
    length_left = 5,
    pad = 10000,
    pad_y = FALSE,
    .debug = TRUE,
    # from here are the dispatched function definition
    # for parallel workers
    fun_dist = extract_at,
    # below should follow the arguments of the dispatched function
    x = ncelev,
    y = sf::st_as_sf(ncsamp),
    id = "pid",
    radius = 1e4,
    func = "mean"
  )

Multiraster processing

Here we demonstrate multiraster processing of the random points using multiple rasters.

ncelev <- terra::rast(ncelev_path)
tdir <- tempdir(check = TRUE)
terra::writeRaster(ncelev, file.path(tdir, "test1.tif"), overwrite = TRUE)
terra::writeRaster(ncelev, file.path(tdir, "test2.tif"), overwrite = TRUE)
terra::writeRaster(ncelev, file.path(tdir, "test3.tif"), overwrite = TRUE)
terra::writeRaster(ncelev, file.path(tdir, "test4.tif"), overwrite = TRUE)
terra::writeRaster(ncelev, file.path(tdir, "test5.tif"), overwrite = TRUE)
rasts <- list.files(tdir, pattern = "tif$", full.names = TRUE)

pm <-
  par_multirasters(
    filenames = rasts,
    fun_dist = extract_at,
    x = NA,
    y = sf::st_as_sf(ncsamp)[1:500, ],
    id = "pid",
    radius = 1e4,
    func = "mean",
    .debug = TRUE
  )
par(lastpar)

Function selection guide for `⁠par_*()⁠`

We provide two flowcharts to help users choose the right function for parallel processing. The raster-oriented flowchart is for users who want to start with raster data, and the vector-oriented flowchart is for users with large vector data.

In raster-oriented selection, we suggest four factors to consider:

Number of raster files: for multiple files, par_multirasters is recommended. When there are multiple rasters that share the same extent and resolution, consider stacking the rasters into multilayer SpatRaster object by calling terra::rast(filenames).
Raster resolution: We suggest 100 meters as a threshold. Rasters with resolution coarser than 100 meters and a few layers would be better for the direct call of exactextractr::exact_extract().
Raster extent: Using SpatRaster in exactextractr::exact_extract() is often minimally affected by the raster extent.
Memory size: max_cells_in_memory argument value of exactextractr::exact_extract(), raster resolution, and the number of layers in SpatRaster are multiplicatively related to the memory usage.

For vector-oriented selection, we suggest three factors to consider:

Number of features: When the number of features is over 100,000, consider using par_grid or par_hierarchy to split the data into smaller chunks.
Hierarchical structure: If the data has a hierarchical structure, consider using par_hierarchy to parallelize the operation.
Data grouping: If the data needs to be grouped in similar sizes, consider using par_pad_balanced or par_pad_grid with mode = "grid_quantile".

Caveats

Why parallelization is slower than the ordinary function run? Parallelization may underperform when the datasets are too small to take advantage of divide-and-compute approach, where parallelization overhead is involved. Overhead here refers to the required amount of computational resources for transferring objects to multiple processes. Since the demonstrations above use quite small datasets, the advantage of parallelization was not as noticeable as it was expected. Should a large amount of data (spatial/temporal resolution or number of files, for example) be processed, users could find the efficiency of this package. A vignette in this package demonstrates use cases extracting various climate/weather datasets.

Notes on data restrictions

chopin works best with two-dimensional (planar) geometries. Users should disable s2 spherical geometry mode in sf by setting sf::sf_use_s2(FALSE). Running any chopin functions at spherical or three-dimensional (e.g., including M/Z dimensions) geometries may produce incorrect or unexpected results.

Author(s)

Maintainer: Insang Song geoissong@gmail.com (ORCID)

Authors:

Kyle Messier (ORCID) [contributor]

Other contributors:

Alec L. Robitaille (Alec reviewed the package version 0.6.3 for rOpenSci, see <https://github.com/ropensci/software-review/issues/638>) [reviewer]
Eric R. Scott (Eric reviewed the package version 0.6.3 for rOpenSci, see <https://github.com/ropensci/software-review/issues/638>) [reviewer]

Check the class of an input object

Description

This function checks the class of an input object and returns "raster" if it is a raster object, or "vector" if it is a vector object.

Usage

.check_character(input)

Arguments

input

The input object to be checked

Value

A character string indicating the class of the input object ("raster" or "vector")

Check Raster Input

Description

This function checks the input object to ensure it is a valid raster object or a character path to a raster file. It also provides warnings and informative messages based on the input type.

Usage

.check_raster(input, extent = NULL, ...)

Arguments

input

The input object to be checked. It can be either a SpatRaster object or a character path to a raster file.

extent

The extent of the raster. Defaults to NULL. Numeric vector should be put in order of c(xmin, xmax, ymin, ymax).

...

Placeholder.

Value

The validated input object.

Check the subject object and perform necessary conversions if needed.

Description

This function checks the class of the input object and performs necessary conversions if needed.

Usage

.check_vector(
  input,
  input_id = NULL,
  extent = NULL,
  out_class = character(1),
  ...
)

Arguments

input

sf/SpatVector/character. The input object to be checked.

input_id

character(1). ID field of the subject object.

extent

numeric(4). The extent of the subject object. Numeric vector should be put in order of c(xmin, xmax, ymin, ymax).

out_class

character(1). The class of the output object. Should be one of c("sf", "terra").

...

Placeholder.

Value

The checked and converted subject object.

Get intersection extent

Description

Get intersection extent

Usage

.intersect_extent(input, out_class, ...)

## S4 method for signature 'sf,ANY'
.intersect_extent(input, out_class = NULL, ...)

## S4 method for signature 'SpatExtent,ANY'
.intersect_extent(input, out_class = NULL, ...)

## S4 method for signature 'SpatVector,ANY'
.intersect_extent(input, out_class = NULL, ...)

## S4 method for signature 'numeric,character'
.intersect_extent(input, out_class = NULL, ...)

Arguments

input

sf/SpatExtent/SpatVector/numeric

out_class

character(1). "sf" or "terra"

...

other arguments. Placeholder.

Prescreen input data for parallelization

Description

This function takes input object and type character to ingest the input object to return the object in the desired class.

Usage

.par_screen(type, input, input_id = NULL, out_class = "terra", .window = NULL)

Arguments

type

character(1). "raster" or "vector".

input

object. Input object.

input_id

character(1). Default is NULL. If NULL, the function will not check the object with an ID column.

out_class

character(1). Default is NULL, but should be one of c("sf", "terra"). Default is "terra".

.window

numeric(4)/SpatExtent/st_bbox object. Loading window.

Return the input's GIS data model type

Description

This function returns one of 'vector' or 'raster' depending on the input class.

Usage

datamod(input)

Arguments

input

Spat*/sf/stars object.

Value

character(1). One of "vector" or "raster".

Note

Although stars object is a little ambiguous whether to classify vector or raster, it will be considered raster in this package.

Author(s)

Insang Song

Return the package the input object is based on

Description

Detect whether the input object is sf or Spat* object.

Usage

dep_check(input)

Arguments

input

Spat* in terra or sf object.

Value

A character object; one of "character", "terra" and "sf"

Author(s)

Insang Song

Switch spatial data class

Description

Convert class between sf/stars-terra

Usage

dep_switch(input)

Arguments

input

Spat* in terra or sf object.

Value

Data converted to the other package class (if sf, terra; if terra, sf)

Author(s)

Insang Song

Extract raster values with point buffers or polygons

Description

Extract raster values with point buffers or polygons

Usage

extract_at(x, y, ...)

## S4 method for signature 'SpatRaster,sf'
extract_at(
  x = NULL,
  y = NULL,
  id = NULL,
  func = "mean",
  extent = NULL,
  radius = NULL,
  out_class = "sf",
  kernel = NULL,
  kernel_func = stats::weighted.mean,
  bandwidth = NULL,
  max_cells = 3e+07,
  .standalone = TRUE,
  ...
)

## S4 method for signature 'character,character'
extract_at(
  x = NULL,
  y = NULL,
  id = NULL,
  func = "mean",
  extent = NULL,
  radius = NULL,
  out_class = "sf",
  kernel = NULL,
  kernel_func = stats::weighted.mean,
  bandwidth = NULL,
  max_cells = 3e+07,
  .standalone = TRUE,
  ...
)

## S4 method for signature 'SpatRaster,character'
extract_at(
  x = NULL,
  y = NULL,
  id = NULL,
  func = "mean",
  extent = NULL,
  radius = NULL,
  out_class = "sf",
  kernel = NULL,
  kernel_func = stats::weighted.mean,
  bandwidth = NULL,
  max_cells = 3e+07,
  .standalone = TRUE,
  ...
)

## S4 method for signature 'SpatRaster,SpatVector'
extract_at(
  x = NULL,
  y = NULL,
  id = NULL,
  func = "mean",
  extent = NULL,
  radius = NULL,
  out_class = "sf",
  kernel = NULL,
  kernel_func = stats::weighted.mean,
  bandwidth = NULL,
  max_cells = 3e+07,
  .standalone = TRUE,
  ...
)

## S4 method for signature 'character,sf'
extract_at(
  x = NULL,
  y = NULL,
  id = NULL,
  func = "mean",
  extent = NULL,
  radius = NULL,
  out_class = "sf",
  kernel = NULL,
  kernel_func = stats::weighted.mean,
  bandwidth = NULL,
  max_cells = 3e+07,
  .standalone = TRUE,
  ...
)

## S4 method for signature 'character,SpatVector'
extract_at(
  x = NULL,
  y = NULL,
  id = NULL,
  func = "mean",
  extent = NULL,
  radius = NULL,
  out_class = "sf",
  kernel = NULL,
  kernel_func = stats::weighted.mean,
  bandwidth = NULL,
  max_cells = 3e+07,
  .standalone = TRUE,
  ...
)

Arguments

x

SpatRaster object or file path(s) with extensions that are GDAL-compatible. When multiple file paths are used, the rasters must have the same extent and resolution.

y

sf/SpatVector object or file path.

...

Placeholder.

id

character(1). Unique identifier of each point.

func

function taking one numeric vector argument. Default is "mean" for all supported signatures in arguments x and y.

extent

numeric(4) or SpatExtent. Extent of clipping vector. It only works with points of character(1) file path.

radius

numeric(1). Buffer radius.

out_class

character(1). Output class. One of sf or terra.

kernel

character(1). Name of a kernel function One of "uniform", "triweight", "quartic", and "epanechnikov"

kernel_func

function. Kernel function to apply to the extracted values. Default is stats::weighted.mean()

bandwidth

numeric(1). Kernel bandwidth.

max_cells

integer(1). Maximum number of cells in memory.

.standalone

logical(1). Default is TRUE, which means that the function will be executed in a standalone mode. When using this function in ⁠par_*⁠ functions, set this to FALSE.

Details

Inputs are preprocessed in different ways depending on the class.

Vector inputs in y: sf is preferred, thus character and SpatVector inputs will be converted to sf object. If radius is not NULL, sf::st_buffer is used to generate circular buffers as subsequent raster-vector overlay is done with exactextractr::exact_extract.
Raster input in x: SpatRaster is preferred. If the input is not SpatRaster, it will be converted to SpatRaster object.

Value

A data.frame object with summarized raster values with respect to the mode (polygon or buffer) and the function.

Author(s)

Insang Song geoissong@gmail.com

Examples

ncpath <- system.file("gpkg/nc.gpkg", package = "sf")
rastpath <- file.path(tempdir(), "test.tif")

nc <- terra::vect(ncpath)
nc <- terra::project(nc, "EPSG:5070")
rrast <- terra::rast(nc, nrow = 300, ncol = 660)
terra::values(rrast) <- rgamma(1.98e5, 4, 2)
rpnt <- terra::spatSample(rrast, 16L, as.points = TRUE)
rpnt$pid <- sprintf("ID-%02d", seq(1, 16))

extract_at(rrast, rpnt, "pid", "mean", radius = 1000)
extract_at(rrast, nc, "NAME", "mean")
extract_at(rrast, ncpath, "NAME", "mean")
# Using SpatRaster object
suppressWarnings(
  extract_at(
    rrast, ncpath, "NAME", "mean",
    kernel = "epanechnikov",
    bandwidth = 1e5
  )
)
# Using raster path
terra::writeRaster(rrast, rastpath, overwrite = TRUE)
suppressWarnings(
  extract_at(
    rastpath, ncpath, "NAME", "mean",
    kernel = "epanechnikov",
    bandwidth = 1e5
  )
)

Setting the clipping extent

Description

Return clipping extent with buffer radius. It assumes the input CRS is projected and linear unit is meters.

Usage

get_clip_ext(pnts, radius, extrusion = 1.1)

Arguments

pnts

One of sf or SpatVector object. Target points of computation.

radius

numeric(1). Buffer radius. It is assumed to be in meters

extrusion

numeric(1). The extent extrusion factor. Default is 1.1, meaning that the actual padding is 10 percent wider than radius.

Value

A terra::ext or sfc_POLYGON object of the computation extent.

Author(s)

Insang Song

Subset for nonidentical package class objects

Description

Subset for nonidentical package class objects

Intersect different data model objects

Usage

## S4 method for signature 'SpatVector,bbox,missing,ANY'
x[i, j]

## S4 method for signature 'SpatVector,sf,missing,ANY'
x[i, j]

## S4 method for signature 'SpatVector,sfc,missing,ANY'
x[i, j]

## S4 method for signature 'SpatVector,SpatExtent,missing,ANY'
x[i, j]

.intersect(x, y)

Arguments

x

SpatVector/sf/SpatRaster object to be intersected.

i

Dataset used to subset x.

j

Column indices or names.

y

SpatVector/sf object. Intersecting object.

Kernel functions

Description

Kernel functions

Usage

kernelfunction(
  d,
  bw,
  kernel = c("uniform", "quartic", "triweight", "epanechnikov")
)

Arguments

d

Distance

bw

Bandwidth of a kernel

kernel

Kernel type. One of "uniform", "quartic", "triweight", and "epanechnikov"

Value

numeric. Kernel weights.

References

https://github.com/JanCaha/SpatialKDE

Examples

v_dist <- c(1, 10, 100, 25, 50, 0.1)
bw_dist1 <- 1
bw_dist2 <- 10
kernelfunction(v_dist, bw_dist1, "uniform")
kernelfunction(v_dist, bw_dist1, "quartic")
kernelfunction(v_dist, bw_dist1, "triweight")
kernelfunction(v_dist, bw_dist1, "epanechnikov")
kernelfunction(v_dist, bw_dist2, "uniform")
kernelfunction(v_dist, bw_dist2, "quartic")
kernelfunction(v_dist, bw_dist2, "triweight")
kernelfunction(v_dist, bw_dist2, "epanechnikov")

Mildly clustered points in North Carolina, United States

Description

Mildly clustered points in North Carolina, United States

Usage

ncpoints

Format

A data frame with 2,304 rows and two variables:

X: X coordinate
Y: Y coordinate

Note

Coordinates are in EPSG:5070 (Conus Albers Equal Area)

Source

sf package data nc

Examples

data("ncpoints", package = "chopin")

Map specified arguments to others in literals

Description

This function creates a new function that wraps an existing function, remapping the argument names based on a user-specified literal mapping. Specifically, arguments passed to the new function with names on the left-hand side of the mapping are renamed to the corresponding right-hand side names before being passed to the original function. Users map two arguments without x and/or y to standardize the argument names, x and y, to the target function. This function is particularly useful to parallelize functions for spatial data outside sf and terra packages that do not have arguments named x and/or y. ⁠par_*⁠ functions could detect such functions by wrapping nonstandardized functions to parallelize the computation.

Usage

par_convert_f(fun, ...)

Arguments

fun

A function to be wrapped.

...

A set of named arguments representing the mapping from the new function's argument names (left-hand side) to the original function's argument names (right-hand side). For example, x = group, y = score maps argument x to group and y to score.

Value

A new function that accepts the remapped arguments and calls the original function.

Examples

# Define an original function that expects arguments 'group' and 'score'
original_fun <- function(group, score, home = FALSE) {
  list(group = group, score = score, home = home)
}

# Create a new function that maps 'x' to 'group' and 'y' to 'score'
new_fun <- par_convert_f(original_fun, x = group, y = score)

# Call the new function using the new argument names
result <- new_fun(x = 10, y = 20)
print(result)

Partition coordinates into quantile polygons

Description

Partition coordinates into quantile polygons

Usage

par_cut_coords(x = NULL, y = NULL, quantiles)

Arguments

x

numeric/sf/SpatVector. x-coordinates (if numeric).

y

numeric. y-coordinates.

quantiles

numeric vector. Quantiles.

Value

A SpatVector object with field CGRIDID.

Note

This function is only for two-dimensional points.

Quantile definition

Description

Quantile definition

Usage

par_def_q(steps = 4L)

Arguments

steps

integer(1). The number of quantiles.

Value

numeric vector of quantiles.

Parallelize spatial computation over the computational grids

Description

future::multicore, future::multisession, future::cluster future.mirai::mirai_multisession in future::plan will parallelize the work in each grid. For details of the terminology in future package, refer to future::plan. This function assumes that users have one raster file and a sizable and spatially distributed target locations. Each thread will process the nearest integer of $|N_g| / |N_t|$ grids where $|N_g|$ denotes the number of grids and $|N_t|$ denotes the number of threads.

Usage

par_grid(grids, fun_dist, ..., pad_y = FALSE, .debug = FALSE)

Arguments

grids

List of two sf/SpatVector objects. Computational grids. It takes a strict assumption that the grid input is an output of par_pad_grid.

fun_dist

sf, terra or chopin functions. This function should have x and y arguments.

...

Arguments passed to the argument fun_dist.

pad_y

logical(1). Whether to filter y with the padded grid. Should be TRUE when x is where the values are calculated. Default is FALSE. In the reverse case, like terra::extent or exactextractr::exact_extract, the raster (x) extent should be set with the padded grid.

.debug

logical(1). Default is FALSE. Otherwise, if a unit computation fails, the error message and the CGRIDID value where the error occurred will be included in the output.

Value

a data.frame object with computation results. For entries of the results, consult the documentation of the function put in fun_dist argument.

Note

In dynamic dots (...), fun_dist arguments should include x and y where sf/terra class objects or file paths are accepted. Virtually any sf/terra functions that accept two arguments can be put in fun_dist; however, be advised that some spatial operations do not necessarily give the exact result from what would have been done with one thread. For example, distance calculated through this function may return the lower value than actual because the computational region was reduced. This would be the case especially where the target features are spatially sparsely distributed.

Author(s)

Insang Song geoissong@gmail.com

Examples


lastpar <- par(mfrow = c(1, 1))
library(sf)
library(future)
library(future.mirai)
options(sf_use_s2 = FALSE)
plan(mirai_multisession, workers = 2)
ncpath <- system.file("shape/nc.shp", package = "sf")
ncpoly <- sf::st_read(ncpath)
ncpoly <- sf::st_transform(ncpoly, "EPSG:5070")

# sf object
ncpnts <- sf::st_sample(ncpoly, 2000)
ncpnts <- sf::st_as_sf(ncpnts)
ncpnts$pid <- seq_len(nrow(ncpnts))

# file path
rrast <- terra::rast(ncpoly, nrow = 600, ncol = 1320)
terra::values(rrast) <- rgamma(7.92e5, 4, 2)

# Using raster path
rastpath <- file.path(tempdir(), "ncelev.tif")
terra::writeRaster(rrast, rastpath, overwrite = TRUE)

nccompreg <-
  chopin::par_pad_grid(
    input = ncpnts,
    mode = "grid",
    nx = 4L,
    ny = 2L,
    padding = 5e3L
  )
res <-
  par_grid(
    grids = nccompreg,
    fun_dist = extract_at,
    x = rastpath,
    y = ncpnts,
    qsegs = 90L,
    radius = 5e3L,
    id = "pid"
  )
future::plan(future::sequential)
mirai::daemons(0)
par(lastpar)

Parallelize spatial computation over the computational grids

Description

mirai::daemons will set the parallel backend then mirai::mirai_map will parallelize the work in each grid. For details of the terminology in mirai package, refer to mirai::mirai. This function assumes that users have one raster file and a sizable and spatially distributed target locations. Each thread will process the nearest integer of $|N_g| / |N_t|$ grids where $|N_g|$ denotes the number of grids and $|N_t|$ denotes the number of threads.

Usage

par_grid_mirai(grids, fun_dist, ..., pad_y = FALSE, .debug = TRUE)

Arguments

grids

List of two sf/SpatVector objects. Computational grids. It takes a strict assumption that the grid input is an output of par_pad_grid.

fun_dist

sf, terra or chopin functions. This function should have x and y arguments.

...

Arguments passed to the argument fun_dist.

pad_y

.debug

logical(1). Default is FALSE. Otherwise, if a unit computation fails, the error message and the CGRIDID value where the error occurred will be included in the output.

Value

a data.frame object with computation results. For entries of the results, consult the documentation of the function put in fun_dist argument.

Note

In dynamic dots (...), fun_dist arguments should include x and y where sf/terra class objects or file paths are accepted. Virtually any sf/terra functions that accept two arguments can be put in fun_dist; however be advised that some spatial operations do not necessarily give the exact result from what would have been done with one thread. For example, distance calculated through this function may return the lower value than actual because the computational region was reduced. This would be the case especially where the target features are spatially sparsely distributed.

Author(s)

Insang Song geoissong@gmail.com

Examples


lastpar <- par(mfrow = c(1, 1))
library(sf)
library(mirai)
options(sf_use_s2 = FALSE)
daemons(4)
ncpath <- system.file("shape/nc.shp", package = "sf")
ncpoly <- sf::st_read(ncpath)
ncpoly <- sf::st_transform(ncpoly, "EPSG:5070")

# sf object
ncpnts <-
  sf::st_sample(ncpoly, 2000)
ncpnts <- sf::st_as_sf(ncpnts)
ncpnts$pid <- seq_len(nrow(ncpnts))

# file path
rrast <- terra::rast(ncpoly, nrow = 600, ncol = 1320)
terra::values(rrast) <- rgamma(7.92e5, 4, 2)
# Using raster path
rastpath <- file.path(tempdir(), "ncelev.tif")
terra::writeRaster(rrast, rastpath, overwrite = TRUE)

nccompreg <-
  chopin::par_pad_grid(
    input = ncpnts,
    mode = "grid",
    nx = 4L,
    ny = 2L,
    padding = 5e3L
  )
res <-
  par_grid_mirai(
    grids = nccompreg,
    fun_dist = extract_at,
    x = rastpath,
    y = ncpnts,
    qsegs = 90L,
    radius = 5e3L,
    id = "pid"
  )
mirai::daemons(0L)
par(lastpar)

Parallelize spatial computation by hierarchy in input data

Description

"Hierarchy" refers to a system, which divides the entire study region into multiple subregions. It is oftentimes reflected in an area code system (e.g., FIPS for US Census geographies and Nomenclature of Territorial Units for Statistics (NUTS), etc.). future::multisession, future::multicore, future::cluster, future.mirai::mirai_multisession in future::plan will parallelize the work by splitting lower level features into several higher level feature group. For details of the terminology in future package, please refer to future::plan documentation. Each thread will process the number of lower level features in each higher level feature. Be advised that accessing the same file simultaneously with multiple processes may result in errors.

Usage

par_hierarchy(
  regions,
  regions_id = NULL,
  length_left = NULL,
  pad = 0,
  pad_y = FALSE,
  fun_dist,
  ...,
  .debug = FALSE
)

Arguments

regions

sf/SpatVector object. Computational regions. Only polygons are accepted.

regions_id

character(1). Name of unique ID field in regions. The regions will be split by the common level value.

length_left

integer(1). Length of the first characters of the regions_id values. Default is NULL, which will not manipulate the regions_id values. If the number of characters is not consistent (for example, numerics), the function will alert the user.

pad

numeric(1). Padding distance for each subregion defined by regions_id or trimmed regions_id values. in linear unit of coordinate system. Default is 0, which means each subregion is used as is. If the value is greater than 0, the subregion will be buffered by the value. The padding distance will be applied to x (pad_y = FALSE) or y (pad_y = TRUE) to filter the data.

pad_y

fun_dist

sf, terra, or chopin functions. This function should have x and y arguments.

...

Arguments passed to the argument fun_dist.

.debug

logical(1). Default is FALSE If a unit computation fails, the error message and the regions_id value where the error occurred will be included in the output.

Details

In dynamic dots (...), fun_dist arguments should include x and y where sf/terra class objects or file paths are accepted. Hierarchy is interpreted by the regions_id argument first. regions_id is assumed to be a field name in the x or y argument object. It is expected that regions represents the higher level boundaries and x or y in fun_dist is the lower level boundaries. However, if that is not the case, with trim argument, the function will generate the higher level codes from regions_id by extracting left-t Whether x or y is searched is determined by pad_y value. pad_y = TRUE will make the function attempt to find regions_id in x, whereas pad_y = FALSE will look for regions_id at y. If the regions_id doesn't exist in x or y, the function will utilize spatial relationship (intersects) to filter the data. Note that dispatching computation by subregions based on the spatial relationship may lead to a slight discrepancy in the result. For example, if the higher and lower level features are not perfectly aligned, there may be some features that are not included or duplicated in the subregions. The function will alert the user if spatial relation- ship is used to filter the data.

Value

a data.frame object with computation results. For entries of the results, consult the function used in fun_dist argument.

Note

Virtually any sf/terra functions that accept two arguments can be put in fun_dist; however, be advised that some spatial operations do not necessarily give the exact result from what would have been done with single thread. For example, distance calculated through this function may return the lower value than actual because the computational region was reduced. This would be the case especially where the target features are spatially sparsely distributed.

Author(s)

Insang Song geoissong@gmail.com

Examples


lastpar <- par(mfrow = c(1, 1))
library(terra)
library(sf)
library(future)
library(future.mirai)
options(sf_use_s2 = FALSE)
future::plan(future.mirai::mirai_multisession, workers = 2)

nccnty <- sf::st_read(
  system.file("shape/nc.shp", package = "sf")
)
nccnty <- sf::st_transform(nccnty, "EPSG:5070")
nccnty <- nccnty[seq_len(30L), ]

nccntygrid <- sf::st_make_grid(nccnty, n = c(200, 100))
nccntygrid <- sf::st_as_sf(nccntygrid)
nccntygrid$GEOID <- sprintf("%05d", seq_len(nrow(nccntygrid)))
nccntygrid <- sf::st_intersection(nccntygrid, nccnty)

rrast <- terra::rast(nccnty, nrow = 600, ncol = 1320)
terra::values(rrast) <- rgamma(7.92e5, 4, 2)

# Using raster path
rastpath <- file.path(tempdir(), "ncelev.tif")
terra::writeRaster(rrast, rastpath, overwrite = TRUE)

ncsamp <-
  sf::st_sample(
    nccnty,
    size = 1e4L
  )
# sfc to sf
ncsamp <- sf::st_as_sf(ncsamp)
# assign ID
ncsamp$kid <- sprintf("K-%05d", seq_len(nrow(ncsamp)))
res <-
  par_hierarchy(
    regions = nccnty,
    regions_id = "FIPS",
    fun_dist = extract_at,
    y = nccntygrid,
    x = rastpath,
    id = "GEOID",
    func = "mean"
  )
future::plan(future::sequential)
mirai::daemons(0)
par(lastpar)

Parallelize spatial computation by hierarchy in input data

Description

"Hierarchy" refers to a system, which divides the entire study region into multiple subregions. It is usually reflected in an area code system (e.g., FIPS for US Census geographies and Nomenclature of Territorial Units for Statistics (NUTS), etc.). mirai::daemons will set the parallel backend then mirai::mirai_map will the work by splitting lower level features into several higher level feature group. For details of the terminology in mirai package, refer to mirai::mirai. Each thread will process the number of lower level features in each higher level feature. Be advised that accessing the same file simultaneously with multiple processes may result in errors.

Usage

par_hierarchy_mirai(
  regions,
  regions_id = NULL,
  length_left = NULL,
  pad = 0,
  pad_y = FALSE,
  fun_dist,
  ...,
  .debug = TRUE
)

Arguments

regions

sf/SpatVector object. Computational regions. Only polygons are accepted.

regions_id

character(1). Name of unique ID field in regions. The regions will be split by the common level value.

length_left

pad

pad_y

fun_dist

sf, terra, or chopin functions. This function should have x and y arguments.

...

Arguments passed to the argument fun_dist.

.debug

logical(1). Default is FALSE If a unit computation fails, the error message and the regions_id value where the error occurred will be included in the output.

Details

In dynamic dots (...), fun_dist arguments should include x and y where sf/terra class objects or file paths are accepted. Hierarchy is interpreted by the regions_id argument first. regions_id is assumed to be a field name in the x or y argument object. It is expected that regions represents the higher level boundaries and x or y in fun_dist is the lower level boundaries. However, if that is not the case, with trim argument, the function will generate the higher level codes from regions_id by extracting the code from the left end (controlled by length_left). Whether x or y is searched is determined by pad_y value. pad_y = TRUE will make the function attempt to find regions_id in x, whereas pad_y = FALSE will look for regions_id at y. If the regions_id doesn't exist in x or y, the function will utilize spatial relationship (intersects) to filter the data. Note that dispatching computation by subregions based on the spatial relationship may lead to a slight discrepancy in the result. For example, if the higher and lower level features are not perfectly aligned, there may be some features that are not included or duplicated in the subregions. The function will alert the user if spatial relation- ship is used to filter the data.

Value

a data.frame object with computation results. For entries of the results, consult the function used in fun_dist argument.

Note

Virtually any sf/terra functions that accept two arguments can be put in fun_dist; however, be advised that some spatial operations do not necessarily give the exact result from what would have been done with one thread. For example, distance calculated through this function may return the lower value than actual because the computational region was reduced. This would be the case especially where the target features are spatially sparsely distributed.

Author(s)

Insang Song geoissong@gmail.com

Examples


lastpar <- par(mfrow = c(1, 1))
library(terra)
library(sf)
library(mirai)
options(sf_use_s2 = FALSE)
mirai::daemons(4)

nccnty <- sf::st_read(
  system.file("shape/nc.shp", package = "sf")
)
nccnty <- sf::st_transform(nccnty, "EPSG:5070")

nccntygrid <- sf::st_make_grid(nccnty, n = c(200, 100))
nccntygrid <- sf::st_as_sf(nccntygrid)
nccntygrid$GEOID <- sprintf("%05d", seq_len(nrow(nccntygrid)))
nccntygrid <- sf::st_intersection(nccntygrid, nccnty)

rrast <- terra::rast(nccnty, nrow = 600, ncol = 1320)
terra::values(rrast) <- rgamma(7.92e5, 4, 2)

# Using raster path
rastpath <- file.path(tempdir(), "ncelev.tif")
terra::writeRaster(rrast, rastpath, overwrite = TRUE)

ncsamp <-
  sf::st_sample(
    nccnty,
    size = 1e4L
  )
# sfc to sf
ncsamp <- sf::st_as_sf(ncsamp)
# assign ID
ncsamp$kid <- sprintf("K-%05d", seq_len(nrow(ncsamp)))
res <-
  par_hierarchy_mirai(
    regions = nccnty,
    regions_id = "FIPS",
    fun_dist = extract_at,
    y = nccntygrid,
    x = rastpath,
    id = "GEOID",
    func = "mean",
    .debug = TRUE
  )
mirai::daemons(0L)
par(lastpar)

Generate groups based on balanced clustering

Description

For balancing computational loads, the function uses the anticlust package to cluster the input points. The number of clusters is determined by the num_cluster argument. Each cluster will have equal number of points. Grids will be generated based on the cluster extents. At the lower level, the function uses terra::distance() function to calculate the Euclidean distance between points.

Usage

par_make_balanced(points_in = NULL, n_clusters = NULL)

Arguments

points_in

sf or SpatVector object. Target points of computation.

n_clusters

integer(1). The number of clusters.

Value

SpatVector object with a field "CGRIDID".

Note

This function is only for two-dimensional points. The results will be irregular grids with or without overlapping parts.

Author(s)

Insang Song

Convert DGGRID indices to sf object

Description

This function converts DGGRID indices to an sf object. It requires the dggridR package to be installed.

Usage

par_make_dggrid(x, res = 8L, topology = "HEXAGON")

Arguments

x

sf object.

res

integer(1). DGGRID resolution. Default is 8L.

topology

character(1). Topology type, either "HEXAGON" or "SQUARE". Default is "HEXAGON".

Details

dggridR::dgconstruct is used to create a DGGRID object with the specified resolution. All arguments in this function are used as default values other than res and topology.

Value

An sf object with polygons representing the DGGRID indices.

Author(s)

Insang Song

Examples

lastpar <- par(mfrow = c(1, 1))
library(sf)
if (rlang::is_installed("dggridR")) {
  library(dggridR)
  options(sf_use_s2 = FALSE)
  ncpath <- system.file("shape/nc.shp", package = "sf")
  nc <- read_sf(ncpath)
  nc <- st_transform(nc, "EPSG:4326")
  nc_comp_region_dggrid <-
    par_make_dggrid(
      nc,
      res = 8L,
      topology = "HEXAGON"
    )
  plot(sf::st_geometry(nc_comp_region_dggrid))
}
par(lastpar)

Generate grid polygons

Description

Returns a sf object that includes x- and y- index by using two inputs ncutsx and ncutsy, which are x- and y-directional splits, respectively.

Usage

par_make_grid(points_in = NULL, ncutsx = NULL, ncutsy = NULL)

Arguments

points_in

sf or SpatVector object. Target points of computation. character(1) of file path is also acceptable.

ncutsx

integer(1). The number of splits along x-axis.

ncutsy

integer(1). The number of splits along y-axis.

Value

A sf or SpatVector object of computation grids with unique grid id (CGRIDID).

Note

Grids are generated based on the extent of points_in first, then exhaustive grids will be filtered by the intersection between these and points_in. Thus, the number of generated grids may be smaller than ncutsx * ncutsy.

Author(s)

Insang Song

Convert H3 indices to sf object

Description

This function converts an input sf to an sf object with H3 hexagons. It requires the h3r package to be installed.

Usage

par_make_h3(x, res = 5L)

Arguments

x

sf object.

res

integer(1). H3 resolution. Default is 5L.

Details

Non-polygon x will be converted to polygons using sf::st_concave_hull. If the input is not convertible to polygons, the function will throw an error.

Value

An sf object with polygons representing the H3 indices.

Author(s)

Insang Song

Examples

lastpar <- par(mfrow = c(1, 1))
library(sf)
if (rlang::is_installed("h3r")) {
library(h3r)
options(sf_use_s2 = FALSE)
ncpath <- system.file("shape/nc.shp", package = "sf")
nc <- read_sf(ncpath)
nc <- st_transform(nc, "EPSG:4326")
# note that it will throw a warning if
# the input is MULTIPOLYGON.
nc_comp_region_h3 <-
  suppressWarnings(
    par_make_h3(
      nc,
      res = 5L
    )
  )
plot(sf::st_geometry(nc_comp_region_h3))
}
par(lastpar)

Merge adjacent grid polygons with given rules

Description

Merge boundary-sharing (in "Rook" contiguity) grids with fewer target features than the threshold. This function strongly assumes that the input is returned from the internal function par_make_grid, which has "CGRIDID" as the unique id field.

Usage

par_merge_grid(
  points_in = NULL,
  grid_in = NULL,
  grid_min_features = NULL,
  merge_max = 4L
)

Arguments

points_in

sf or SpatVector object. Target points of computation.

grid_in

sf or SpatVector object. The grid generated by the internal function par_make_grid.

grid_min_features

integer(1). Threshold to merge adjacent grids.

merge_max

integer(1). Maximum number of grids to merge per merged set. Default is 4. For example, if the number of grids to merge is 20 and merge_max is 10, the function will split the 20 grids into two sets of 10 grids.

Value

A sf or SpatVector object of computation grids.

Note

This function will not work properly if grid_in has more than one million grids.

Author(s)

Insang Song

References

Polsby DD, Popper FJ. (1991). The Third Criterion: Compactness as a Procedural Safeguard Against Partisan Gerrymandering. Yale Law & Policy Review, 9(2), 301–353.

Examples

lastpar <- par(mfrow = c(1, 1))
library(sf)
library(igraph)
library(dplyr)
library(spatstat.random)
options(sf_use_s2 = FALSE)

dg <- sf::st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 8e5, ymax = 6e5)))
sf::st_crs(dg) <- 5070
dgs <- sf::st_as_sf(st_make_grid(dg, n = c(20, 15)))
dgs$CGRIDID <- seq(1, nrow(dgs))

dg_sample <- sf::st_sample(dg,
  kappa = 5e-9, mu = 15,
  scale = 15000, type = "Thomas"
)
sf::st_crs(dg_sample) <- sf::st_crs(dg)
dg_merged <- par_merge_grid(sf::st_as_sf(dg_sample), dgs, 100)

plot(sf::st_geometry(dg_merged))
par(lastpar)

Parallelize spatial computation over multiple raster files

Description

Large raster files usually exceed the memory capacity in size. This function can be helpful to process heterogenous raster files with homogeneous summary functions. Heterogenous raster files refer to rasters with different spatial extents and resolutions. Cropping a large raster into a small subset even consumes a lot of memory and adds processing time. This function leverages terra SpatRaster to distribute computation jobs over multiple threads. It is assumed that users have multiple large raster files in their disk, then each file path is assigned to a thread. Each thread will directly read raster values from the disk using C++ pointers that operate in terra functions. For use, it is strongly recommended to use vector data with small and confined spatial extent for computation to avoid out-of-memory error. y argument in fun_dist will be used as-is. That means no preprocessing or subsetting will be applied. Please be aware of the spatial extent and size of the inputs.

Usage

par_multirasters(filenames, fun_dist, ..., .debug = FALSE)

Arguments

filenames

character. A vector or list of full file paths of raster files. n is the total number of raster files.

fun_dist

terra or chopin functions that accept SpatRaster object in an argument. In particular, x and y arguments should be present and x should be a SpatRaster.

...

Arguments passed to the argument fun_dist.

.debug

logical(1). Default is FALSE. If TRUE and a unit computation fails, the error message and the file path where the error occurred will be included in the output.

Value

a data.frame object with computation results. For entries of the results, consult the function used in fun_dist argument.

Author(s)

Insang Song geoissong@gmail.com

Examples


lastpar <- par(mfrow = c(1, 1))
library(terra)
library(sf)
library(future)
library(future.mirai)
options(sf_use_s2 = FALSE)
future::plan(future.mirai::mirai_multisession, workers = 2)

nccnty <- sf::st_read(
  system.file("shape/nc.shp", package = "sf")
)
nccnty <- sf::st_transform(nccnty, "EPSG:5070")

nccntygrid <- sf::st_make_grid(nccnty, n = c(200, 100))
nccntygrid <- sf::st_as_sf(nccntygrid)
nccntygrid$GEOID <- sprintf("%05d", seq_len(nrow(nccntygrid)))
nccntygrid <- sf::st_intersection(nccntygrid, nccnty)

rrast <- terra::rast(nccnty, nrow = 600, ncol = 1320)
terra::values(rrast) <- rgamma(7.92e5, 4, 2)

tdir <- tempdir(check = TRUE)
testpath1 <- file.path(tdir, "test1.tif")
testpath2 <- file.path(tdir, "test2.tif")
terra::writeRaster(rrast, testpath1, overwrite = TRUE)
terra::writeRaster(rrast, testpath2, overwrite = TRUE)
testfiles <- list.files(tdir, pattern = "tif$", full.names = TRUE)

res <- par_multirasters(
  filenames = testfiles,
  fun_dist = extract_at,
  x = testpath1,
  y = nccnty,
  id = "GEOID",
  func = "mean"
)

future::plan(future::sequential)
mirai::daemons(0)
par(lastpar)

Parallelize spatial computation over multiple raster files

Description

Usage

par_multirasters_mirai(filenames, fun_dist, ..., .debug = TRUE)

Arguments

filenames

character. A vector or list of full file paths of raster files. n is the total number of raster files.

fun_dist

terra or chopin functions that accept SpatRaster object in an argument. In particular, x and y arguments should be present and x should be a SpatRaster.

...

Arguments passed to the argument fun_dist.

.debug

logical(1). Default is FALSE. If TRUE and a unit computation fails, the error message and the file path where the error occurred will be included in the output.

Value

a data.frame object with computation results. For entries of the results, consult the function used in fun_dist argument.

Author(s)

Insang Song geoissong@gmail.com

Examples


lastpar <- par(mfrow = c(1, 1))
library(terra)
library(sf)
library(mirai)
options(sf_use_s2 = FALSE)
mirai::daemons(4)

nccnty <- sf::st_read(
  system.file("shape/nc.shp", package = "sf")
)
nccnty <- sf::st_transform(nccnty, "EPSG:5070")
nccnty <- nccnty[seq_len(30L), ]

nccntygrid <- sf::st_make_grid(nccnty, n = c(200, 100))
nccntygrid <- sf::st_as_sf(nccntygrid)
nccntygrid$GEOID <- sprintf("%05d", seq_len(nrow(nccntygrid)))
nccntygrid <- sf::st_intersection(nccntygrid, nccnty)

rrast <- terra::rast(nccnty, nrow = 600, ncol = 1320)
terra::values(rrast) <- rgamma(7.92e5, 4, 2)

tdir <- tempdir(check = TRUE)
terra::writeRaster(rrast, file.path(tdir, "test1.tif"), overwrite = TRUE)
terra::writeRaster(rrast, file.path(tdir, "test2.tif"), overwrite = TRUE)
testfiles <- list.files(tdir, pattern = "tif$", full.names = TRUE)

res <- par_multirasters_mirai(
  filenames = testfiles,
  fun_dist = extract_at,
  x = rrast,
  y = nccnty,
  id = "GEOID",
  func = "mean"
)
mirai::daemons(0L)
par(lastpar)

Extension of par_make_balanced for padded grids

Description

This function utilizes anticlust::balanced_clustering() to split the input into equal size subgroups then transform the data to be compatible with the output of par_pad_grid, for which a set of padded grids of the extent of input point subsets (as recorded in the field named "CGRIDID") is generated out of input points.

Usage

par_pad_balanced(points_in = NULL, ngroups, padding)

Arguments

points_in

sf or SpatVector object. Point geometries. Default is NULL.

ngroups

integer(1). The number of groups.

padding

numeric(1). A extrusion factor to make buffer to clip actual datasets. Depending on the length unit of the CRS of input.

Value

A list of two,

original: exhaustive and non-overlapping grid polygons in the class of input
padded: a square buffer of each polygon in original. Used for computation.

Author(s)

Insang Song

Examples

lastpar <- par(mfrow = c(1, 1))
library(terra)
library(sf)
options(sf_use_s2 = FALSE)

ncpath <- system.file("gpkg/nc.gpkg", package = "sf")
nc <- terra::vect(ncpath)
nc_rp <- terra::spatSample(nc, 1000)

nc_gr <- par_pad_balanced(nc_rp, 10L, 1000)
nc_gr
par(lastpar)

Get a set of computational grids

Description

Using input points, the bounding box is split to the predefined numbers of columns and rows. Each grid will be buffered by the radius.

Usage

par_pad_grid(
  input,
  mode = c("grid", "grid_advanced", "grid_quantile", "h3", "dggrid"),
  nx = 10L,
  ny = 10L,
  grid_min_features = 30L,
  padding = NULL,
  unit = NULL,
  quantiles = NULL,
  merge_max = NULL,
  return_wkt = FALSE,
  res = 8L,
  ...
)

Arguments

input

sf or Spat* object.

mode

character(1). Mode of region construction. One of

"grid" (simple grid regardless of the number of features in each grid)
"grid_advanced" (merging adjacent grids with smaller number of features than grid_min_features). The argument grid_min_features should be specified.
"grid_quantile" (x and y quantiles): an argument quantiles should be specified.
"h3" (H3 hexagons): an argument res should be specified.
"dggrid" (DGG grids): an argument res should be specified.

nx

integer(1). The number of grids along x-axis.

ny

integer(1). The number of grids along y-axis.

grid_min_features

integer(1). A threshold to merging adjacent grids

padding

numeric(1). A extrusion factor to make buffer to clip actual datasets. Depending on the length unit of the CRS of input.

unit

character(1). The length unit for padding (optional). units::set_units is used for padding when sf object is used. See link for the list of acceptable unit forms.

quantiles

numeric. Quantiles for grid_quantile mode.

merge_max

integer(1). Maximum number of grids to merge per merged set.

return_wkt

logical(1). Return WKT format. When TRUE, the return value will be a list of two WKT strings.

res

integer(1). Resolution for h3 and dggrid modes.

...

arguments passed to the internal function

Value

A list of two,

original: exhaustive (filling completely) and non-overlapping grid polygons in the class of input
padded: a square buffer of each polygon in original. Used for computation.

Author(s)

Insang Song

Examples

lastpar <- par(mfrow = c(1, 1))
# data
library(sf)
options(sf_use_s2 = FALSE)
ncpath <- system.file("shape/nc.shp", package = "sf")
nc <- read_sf(ncpath)
nc <- st_transform(nc, "EPSG:5070")

# run: nx and ny should strictly be integers
nc_comp_region <-
  par_pad_grid(
    nc,
    mode = "grid",
    nx = 4L, ny = 2L,
    padding = 10000
  )
par(mfcol = c(2, 3))
plot(nc_comp_region$original$geometry, main = "Original grid")
plot(nc_comp_region$padded$geometry, main = "Padded grid")

nc_comp_region_wkt <-
  par_pad_grid(
    nc,
    mode = "grid",
    nx = 4L, ny = 2L,
    padding = 10000,
    return_wkt = TRUE
  )
nc_comp_region_wkt$original
nc_comp_region_wkt$padded

if (rlang::is_installed("h3r")) {
  suppressWarnings(
    nc_comp_region_h3 <-
      par_pad_grid(
        nc,
        mode = "h3",
        res = 4L,
        padding = 10000
      )
  )
  plot(nc_comp_region_h3$original$geometry, main = "H3 grid (lv.4)")
  plot(nc_comp_region_h3$padded$geometry, main = "H3 padded grid (lv.4)")
}
if (rlang::is_installed("dggridR")) {
  nc_comp_region_dggrid <-
  par_pad_grid(
    nc,
    mode = "dggrid",
    res = 7L,
    padding = 10000
  )
  plot(nc_comp_region_dggrid$original$geometry, main = "DGGRID (lv.7)")
  plot(nc_comp_region_dggrid$padded$geometry, main = "Padded DGGRID (lv.7)")
}
par(lastpar)

Split grid list to a nested list of row-wise data frames

Description

Split grid list to a nested list of row-wise data frames

Usage

par_split_list(gridlist)

Arguments

gridlist

list. Output of par_pad_grid or par_pad_balanced

Details

If the input is a data frame, the function will return a list of two data frames: original and padded. If the input is a WKT vector, the function will return a list of two WKT strings: original and padded.

Value

A nested list of data frames or WKT strings.

Examples

lastpar <- par(mfrow = c(1, 1))

library(sf)
library(terra)
options(sf_use_s2 = FALSE)

ncpath <- system.file("shape/nc.shp", package = "sf")
nc <- read_sf(ncpath)
nc <- st_transform(nc, "EPSG:5070")
nc_comp_region <-
  par_pad_grid(
    nc,
    mode = "grid",
    nx = 4L, ny = 2L,
    padding = 10000
  )
par_split_list(nc_comp_region)

par(lastpar)

Regular grid points in the mainland United States at 1km spatial resolution

Description

Regular grid points in the mainland United States at 1km spatial resolution

Usage

prediction_grid

Format

A data frame with 8,092,995 rows and three variables:

site_id: Unique point identifier. Arbitrarily generated.
lon: Longitude
lat: Latitude

Note

Coordinates are in EPSG:5070 (Conus Albers Equal Area)

Source

Mainland United States polygon was obtained from the US Census Bureau.

Examples

data("prediction_grid", package = "chopin")

Check coordinate system then reproject

Description

The input is checked whether its coordinate system is present. If not, it is reprojected to the CRS specified in crs_standard.

Usage

reproject_std(input, crs_standard = "EPSG:4326")

Arguments

input

Input object one of sf or terra::Spat* object

crs_standard

character(1). A standard definition of coordinate reference system. Default is "EPSG:4326" Consult epsg.io for details of other CRS.

Value

A (reprojected) sf or SpatVector object.

Note

This function works well with EPSG codes.

Author(s)

Insang Song

Align vector CRS to raster's

Description

Align vector CRS to raster's

Usage

reproject_to_raster(vector = NULL, raster = NULL)

Arguments

vector

sf/stars/SpatVector/SpatRaster object

raster

SpatRaster object

Value

Reprojected object in the same class as vector

Author(s)

Insang Song

Area weighted summary using two polygon objects

Description

When x and y are different classes, poly_weight will be converted to the class of x.

Usage

summarize_aw(x, y, ...)

## S4 method for signature 'SpatVector,SpatVector'
summarize_aw(
  x,
  y,
  target_fields = NULL,
  id_x = "ID",
  fun = stats::weighted.mean,
  extent = NULL,
  ...
)

## S4 method for signature 'character,character'
summarize_aw(
  x,
  y,
  target_fields = NULL,
  id_x = "ID",
  fun = stats::weighted.mean,
  out_class = "terra",
  extent = NULL,
  ...
)

## S4 method for signature 'sf,sf'
summarize_aw(
  x,
  y,
  target_fields = NULL,
  id_x = "ID",
  fun = NULL,
  extent = NULL,
  ...
)

Arguments

x

A sf/SpatVector object or file path of polygons detectable with GDAL driver at weighted means will be calculated.

y

A sf/SpatVector object or file path of polygons from which weighted means will be calculated.

...

Additional arguments depending on class of x and y.

target_fields

character. Field names to calculate area-weighted.

id_x

character(1). The unique identifier of each polygon in x. Default is "ID".

fun

function(1)/character(1). The function to calculate the weighted summary. Default is stats::weighted.mean. The function must have a w argument. If both x and y are sf, it should be one of c("sum", "mean"). It will determine extensive argument in sf::st_interpolate_aw.

extent

numeric(4) or SpatExtent object. Extent of clipping x. It only works with x of character(1) file path. See terra::ext for more details. Coordinate systems should match.

out_class

character(1). "sf" or "terra". Output class.

Value

A data.frame with all numeric fields of area-weighted means.

Note

x and y classes should match. If x and y are characters, they will be read as sf objects.

Author(s)

Insang Song geoissong@gmail.com

Examples

lastpar <- par(mfrow = c(1, 1))
# package
library(sf)
options(sf_use_s2 = FALSE)
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"))
nc <- sf::st_transform(nc, "EPSG:5070")
pp <- sf::st_sample(nc, size = 300)
pp <- sf::st_as_sf(pp)
pp[["id"]] <- seq(1, nrow(pp))
sf::st_crs(pp) <- "EPSG:5070"
ppb <- sf::st_buffer(pp, nQuadSegs=180, dist = units::set_units(20, "km"))

suppressWarnings(
  ppb_nc_aw <-
    summarize_aw(
      ppb, nc, c("BIR74", "BIR79"),
      "id", fun = "sum"
    )
)
summary(ppb_nc_aw)

# terra examples
library(terra)
ncpath <- system.file("gpkg/nc.gpkg", package = "sf")
nc <- terra::vect(ncpath)
pp <- terra::spatSample(nc, size = 300)
pp[["id"]] <- seq(1, nrow(pp))
ppb <- terra::buffer(pp, 20000)

suppressWarnings(
  ppb_nc_aw <-
    summarize_aw(
      ppb, nc, c("BIR74", "BIR79"), "id",
      fun = sum
    )
)
summary(ppb_nc_aw)
par(lastpar)

Calculate Sum of Exponentially Decaying Contributions (SEDC) covariates

Description

Calculate Sum of Exponentially Decaying Contributions (SEDC) covariates

Usage

summarize_sedc(
  point_from = NULL,
  point_to = NULL,
  id = NULL,
  sedc_bandwidth = NULL,
  threshold = NULL,
  target_fields = NULL,
  extent_from = NULL,
  extent_to = NULL,
  ...
)

Arguments

point_from

SpatVector object. Locations where the sum of SEDCs are calculated.

point_to

SpatVector object. Locations where each SEDC is calculated.

id

character(1). Name of the unique id field in point_to.

sedc_bandwidth

numeric(1). Distance at which the source concentration is reduced to exp(-3) (approximately -95 %)

threshold

numeric(1). For computational efficiency, the nearest points in threshold will be selected. 2 * sedc_bandwidth is applied if this value remains NULL.

target_fields

character. Field names to calculate SEDC.

extent_from

numeric(4) or SpatExtent. Extent of clipping point_from. It only works with point_from of character(1) file path. See terra::ext for more details. Coordinate systems should match.

extent_to

numeric(4) or SpatExtent. Extent of clipping point_to.

...

Placeholder.

Details

The SEDC is specialized in vector to vector summary of covariates with exponential decay. Decaying slope will be defined by sedc_bandwidth, where the concentration of the source is reduced to $\exp(-3)$ (approximately 5 \ of the attenuating concentration with the distance from the sources. It can be thought of as a fixed bandwidth kernel weighted sum of covariates, which encapsulates three steps:

Calculate the distance between each source and target points.
Calculate the weight of each source point with the exponential decay.
Summarize the weighted covariates.

Value

data.frame object with input field names with a suffix "_sedc" where the sums of EDC are stored. Additional attributes are attached for the EDC information.

attr(result, "sedc_bandwidth"): the bandwidth where concentration reduces to approximately five percent
attr(result, "sedc_threshold"): the threshold distance at which emission source points are excluded beyond that

Note

Distance calculation is done with terra functions internally. Thus, the function internally converts sf objects in ⁠point_*⁠ arguments to terra. Please note that any NA values in the input will be ignored in SEDC calculation.

Author(s)

Insang Song

References

Messier KP, Akita Y, Serre ML. (2012). Integrating Address Geocoding, Land Use Regression, and Spatiotemporal Geostatistical Estimation for Groundwater Tetrachloroethylene. Environmental Science & Technology 46(5), 2772-2780.(doi:10.1021/es203152a)
Wiesner C. (n.d.). Euclidean Sum of Exponentially Decaying Contributions Tutorial.

Examples

library(terra)
library(sf)
set.seed(101)
ncpath <- system.file("gpkg/nc.gpkg", package = "sf")
nc <- terra::vect(ncpath)
nc <- terra::project(nc, "EPSG:5070")
pnt_from <- terra::centroids(nc, inside = TRUE)
pnt_from <- pnt_from[, "NAME"]
pnt_to <- terra::spatSample(nc, 100L)
pnt_to$pid <- seq(1, 100)
pnt_to <- pnt_to[, "pid"]
pnt_to$val1 <- rgamma(100L, 1, 0.05)
pnt_to$val2 <- rgamma(100L, 2, 1)

vals <- c("val1", "val2")
suppressWarnings(
  summarize_sedc(pnt_from, pnt_to, "NAME", 1e5, 2e5, vals)
)

Computation of spatial data by hierarchical and objective partitioning of inputs for parallel processing

Description

Basic functionalities

chopin workflow

Function selection guide for ⁠par_*()⁠

Caveats

Author(s)

See Also

Check the class of an input object

Description

Usage

Arguments

Value

Check Raster Input

Description

Usage

Arguments

Value

Check the subject object and perform necessary conversions if needed.

Description

Usage

Arguments

Value

Get intersection extent

Description

Usage

Arguments

Prescreen input data for parallelization

Description

Usage

Arguments

Return the input's GIS data model type

Description

Usage

Arguments

Value

Note

Author(s)

See Also

Return the package the input object is based on

Description

Usage

Arguments

Value

Author(s)

See Also

Switch spatial data class

Description

Usage

Arguments

Value

Author(s)

See Also

Extract raster values with point buffers or polygons

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Setting the clipping extent

Description

Usage

Arguments

Value

Author(s)

See Also

Subset for nonidentical package class objects

Description

Usage

Arguments

Kernel functions

Description

Usage

Arguments

Value

References

See Also

`chopin` workflow

Function selection guide for `⁠par_*()⁠`