The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
What is the shortest path from Mombasa, Kenya to Marseille, France by
boat? It is not the straight line between those locations, as it cuts
through land. This package applies graph theory to gridded spatial data,
e.g. deriving shortest paths between places taking barriers or cost
surfaces into account. In the mentioned example, this would be
delineating a path along the East African coast, through the Red Sea,
the Suez Canal, and the Mediterranean. Apart from relating locations on
Earth, spaths
can also compute shortest paths more
generally on spheres and planes.
spaths
originates from a research project with larger data
and more extensive computations than earlier R packages in this field
could handle. Like other packages, previous versions of
spaths
used the igraph
package, a wrapper for
a C library, for graph theoretical algorithms. To further optimize
computational performance and tailor the code to the gridded, spatial
use case, spaths
dropped the igraph
dependency
and now comes with its own C++ graph theory implementation.
Apart from computational performance, spaths
is geared
towards user friendliness. Users do not have to understand how
graph-theoretical algorithms, transition functions, or spatial distance
functions work. And via a set of parameters, they can choose an
implementation that fits their machine’s capabilities and the
application’s prerequisites. Another aspect of user friendliness is the
ease of installation. Linux users without administrator rights often
struggle to install geospatial or other packages that build on external
system libraries. To prevent such issues, spaths
comes with
few dependencies. Apart from core packages by default installed with R,
users only have to install Rcpp
and
data.table
. Installing terra
is optional and
only required, when using terra
objects as inputs.
The package is written to be a foundation upon which other packages
build. As igraph
has become a dependency for
graph-theoretical software more broadly, spaths
aims to be
the dependency for applications connecting points in grids. The
extensions of spaths
can be in the geospatial domain,
computing e.g. sailing vessel routes taking wind patterns and ocean
currents into account, animal migration influenced by terrain topology
and vegetation, or optimal paths for Mars explorations. However, they
can also target biomedical studies or any other question connecting
points in a grid. All you have to do for most extensions is to pass a
custom transition function to the tr_fun
parameter that
tells spaths
how to compute transitions costs between cells
from one or more grid layers.
Before spaths
applies graph-theoretical algorithms, like
shortest paths identifications, it converts the spatial data into a
graph. A graph consists of vertices, also called nodes, that are
connected via edges. In spaths
, grid cell centroids are
vertices linked to neighboring cells’ centroids through edges. Which
neighbors are connected depends on the contiguity
argument.
Figure 1 illustrates the parameter’s two options: rook’s case contiguity
and queen’s case contiguity. In the former model, a cell is directly
connected to its four horizontally and vertically adjacent neighbors.
With the latter structure, a cell is also directly linked to the four
diagonally adjacent neighbors, implying a total of eight direct
neighbors.
spaths
uses queen’s case contiguity by default as it often
produces more appropriate shortest paths than rook’s case contiguity
does. The advantages of rook’s case contiguity are that the fewer edges
imply lower RAM requirements and shorter computational times of shortest
path algorithms than with queen’s case contiguity.
There are other types of contiguity that directly connect a cell to
second order neighbors. This can produce paths that look smoother, as it
permits the algorithm to traverse the grid at a larger set of angles.
Yet, spaths
does currently not implement such type of
contiguity because it would allow the algorithm to jump over barriers
that are one cell wide. In an application deriving ship routes, the
algorithm could jump over e.g. islands and peninsulas.
All of the input grid’s, i.e. rst
argument’s,
non-NA
pixels become part of the graph. In the example of
ship route estimations, you would set all land pixels to
NA
, restricting ships to only traverse water cells. The
number of of non-NA
cells and thereby size of the graph is
the most important determinant of RAM requirements and execution time.
Crop the grid to the relevant area or set irrelevant pixels to
NA
to boost efficiency.
Shortest paths algorithms, like the here used Dijkstra’s (1959) algorithm, minimize the sum of edge
weights along the paths between origins and destinations. By default,
the edge weight in spaths
is the geographic distance
between the centroids of the respective neighboring cells. Figure 2
illustrates edge weights as distance in kilometers between a cell at 1°N
1°E an its neighboring cells under queen’s case contiguity in an
unprojected, i.e. lonlat, grid of a one degree resolution. The use of
kilometers is for visualization purposes and deviates from the actual
implementation which uses meters in most cases, as the documentation
states.
In calculating distances between locations distributed around the globe,
it is usually best to use unprojected data. Projections,
i.e. transfering coordinates from a spheroid or ellipsoid onto a two
dimensional plane, necessarily distort the input. These distortions can
severely bias distances between cells. Projections are designed for
specific, often regional, applications. If you are not sure whether a
projection allows for correct distances in your case, use unprojected
data. This phenomenon is not specific to spaths
, but holds
for any GIS software. Another caveat of using projections is that
spaths
, like terra
, does not connect cells
across the projected grid’s edges, even if the data is global. Meaning
it does not connect the left most cells to the right most cells in the
grid. However, when the data is unprojected and global, it does connect
them. This applies to any rst
input class, including a
SpatRaster, a RasterLayer, and a matrix or a list of matrices with
spherical = TRUE
.
spaths
defaults to the fastest ways of deriving geographic
distances. It uses optimized Haversine distance computations for
unprojected data on Earth or other spherical objects and optimized
Euclidean distance computations for planar data. If you want account for
Earth’s ellipsoid shape rather than working with spherical distances and
computational efficiency is not a top priority, set
dist_comp
to "terra"
, which derives distances
via terra::distance
. terra
is a great and
performant package. What makes the "terra"
option slower
than "spaths"
in dist_comp
is that it triggers
the function to derive distances between all neighboring cells
separately and export them from R to C++ while the latter choice
computes them in a way tailored to this application in the graph
construction phase in C++. This implementation e.g. leverages the
occurrence of repeated distance values in gridded data.
Edge weights do not have to be geographic straight line distances
between cell centroids. With a custom transition function passed to
tr_fun
, you can freely define them in a different way. In
this function, you may use the parameters d
(distance
between the pixel centroids), x1
(x coordinate or longitude
of the first cell), x2
(x coordinate or longitude of the
second cell), y1
(y coordinate or latitude of the first
cell), y2
(y coordinate or latitude of the second cell),
v1
(rst
layers’ values from the first cell),
v2
(rst
layers’ values from the second cell),
and nc
(number of CPU cores set by ncores
).
The algorithm moves from the first to the neighboring second cell. If
the data is unprojected, i.e. lonlat, or if
dist_comp = "terra"
, d
is measured in meters.
Otherwise, it uses the units of the CRS. If rst
has one
layer, the values are passed to v1
and v2
as
vectors. Otherwise they are passed as a data table, if
v_matrix
is FALSE
(default), and as a matrix,
if v_matrix
is TRUE
. The function has to be
callable from R, but does not have to be written in R. It can e.g. be a
C++ function implemented via the Rcpp
package.
A common transition function is Tobler’s (1993) hiking function. It estimates the travel
time between locations conditional on terrain topography. To illustrate
this use case, imagine that the input grid is a digital elevation model.
It consists of one layer and the pixel values document the location’s
elevation in meters. This tr_fun
function then expresses
the transition cost in terms of the hours it takes to travel between
cell centroids according to Tobler:
function(d, v1, v2) d / (6000 * exp(-3.5 * abs((v2 - v1) / d + 0.05)))
.
It combines the straight line distance in meters d
with the
altitude difference, i.e. the value of the second cell v2
minus the value of the first cell v1
. Tobler’s (1993) hiking function produces asymmetric edge
weights. Walking uphill is not equally fast as walking downhill. This
results in a directed graph, meaning transition cost depends on the
direction in which the algorithm traverses between grid cells. So, the
tr_directed
argument has to be TRUE
.
With transition functions generating symmetric edge weights,
tr_directed
should always be set to FALSE
,
improving computational performance. Tobler’s (1993) function would e.g. produce symmetric
transition costs in the absence of the + 0.05
. The
tr_directed
parameter only has an effect in the presence of
custom transition functions, i.e. if tr_fun
is not
NULL
. Otherwise, the graph is always undirected.
As already mentioned, spaths
is not restricted to
geographic applications. Here is an arbitrary transition function
combining various layers of rst
:
function(v1, v2) v1[[1L]] * v2[[1L]] + v1[[2L]] * v2[[2L]]
.
This multiplies the first cell’s value in the first layer with the
second cell’s value in the first layer and adds it to the cells’ second
layer product.
It is recommended to use vectorized R functions like the ones above. It
computes all edge weights without explicitly calling an iterative
structure like for
, *apply
, or
foreach
loops. R loops are slow. If your function requires
a loop, write it in C++. Here is an example with
v_matrix = TRUE
, the Armadillo C++ library, C++ 20, and
...
as a placeholder for the function body:
Rcpp::sourceCpp(code = '
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp20)]]
// [[Rcpp::export]]
Rcpp::NumericVector example_tr_fun(arma::mat &v1, arma::mat &v2, Rcpp::NumericVector &d) {
...
}
')
By default, shortest_paths
returns distances. These
distances are the sum of edge weights along the shortest paths between
points. Without a custom transition function, it is the geographic
distance between the centroids of the cells on a path between two
points. It is the length of the path in meters, unless you use a
projection with different units. With a custom transition function, the
distance or length of the path is expressed in whatever units the
transition function returns, such as hours of travel time with Tobler’s
(1993) hiking function.
set.seed(3L)
input_grid <- terra::rast(crs = "epsg:4326", resolution = 2, vals = sample(c(1L, NA_integer_), 16200L,
TRUE, c(0.8, 0.2)))
origin_pts <- rnd_locations(3L, output_type = "SpatVector")
destination_pts <- rnd_locations(3L, output_type = "SpatVector")
shortest_paths(input_grid, origin_pts)
origins destinations distances
<int> <int> <num>
1: 1 2 19627694
2: 1 3 7290325
3: 2 3 14467797
shortest_paths(input_grid, origin_pts, destination_pts)
origins destinations distances
<int> <int> <num>
1: 1 1 13313293
2: 1 2 6158046
3: 1 3 15837664
4: 2 1 9137621
5: 2 2 16130624
6: 2 3 4810903
7: 3 1 15919393
8: 3 2 10787554
9: 3 3 19275995
Instead of distances, shortest_paths
can output the path
lines or lines and distances jointly.
shortest_paths(input_grid, origin_pts, output = "lines")
class : SpatVector
geometry : lines
dimensions : 3, 3 (geometries, attributes)
extent : -179, 179, -63, -1 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
names : origins destinations connected
type : <int> <int> <logical>
values : 1 2 TRUE
1 3 TRUE
2 3 TRUE
shortest_paths(input_grid, origin_pts, output = "both")
class : SpatVector
geometry : lines
dimensions : 3, 3 (geometries, attributes)
extent : -179, 179, -63, -1 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
names : origins destinations distances
type : <int> <int> <num>
values : 1 2 1.963e+07
1 3 7.29e+06
2 3 1.447e+07
When no destinations are specified, shortest_paths
computes
the paths between all origin combinations. If transition costs are
symmetric, i.e. traveling from cell A to neighboring cell B is as
expensive as traveling from B to A, the function by default only returns
distances in one direction to boost computational efficiency and lower
the RAM requirements of the return object. If you would like the output
to report on both directions, set bidirectional
to
TRUE
.
shortest_paths(input_grid, origin_pts)
origins destinations distances
<int> <int> <num>
1: 1 2 19627694
2: 1 3 7290325
3: 2 3 14467797
shortest_paths(input_grid, origin_pts, bidirectional = TRUE)
origins destinations distances
<int> <int> <num>
1: 1 2 19627694
2: 1 3 7290325
3: 2 3 14467797
4: 2 1 19627694
5: 3 1 7290325
6: 3 2 14467797
The distance from a point to itself is zero. So, irrespective of what
arguments you pass to shortest_paths
, the function never
returns paths from points to themselves.
If you do not want to connect all origins to all destinations, specify
pairwise = TRUE
. This connects the first origin to the
first destination, the second origin to the second destination, etc. For
computational optimization, the function can change the order of the
results. In the example below, the first row contains the distance
between the second origin and the second destination. Always check the
output’s origins and destinations variables regarding which points an
estimate refers to.
shortest_paths(input_grid, origin_pts, destination_pts, pairwise = TRUE)
origins destinations distances
<int> <int> <num>
1: 2 2 16130624
2: 1 1 13313293
3: 3 3 19275995
The origins and destinations variables by default refer to the row
numbers in the origins
(and destinations
)
inputs. You can make shortest_paths
to utilize other names
by specifying a column in origins
(and
destinations
) containing point names. These names can be of
types character
, integer
, and
numeric
.
origin_pts$name <- letters[1:3]
shortest_paths(input_grid, origin_pts, origin_names = "name")
origins destinations distances
<char> <char> <num>
1: a b 19627694
2: a c 7290325
3: b c 14467797
Unconnected points are marked with Inf
in the
distances
variable, if distance_type
is
"double"
or "float"
, with NA
, if
distance_type
is "int"
or
"unsigned short int"
, and with a connected
variable, if distances are not returned. Integers use NA
rather than Inf
because Inf
is a numeric, not
an integer, value.
If output = "distances"
, the output is by default returned
as a data table. Data tables are data frames and you can use them in
methods expecting data frames. If you want the result to be a data frame
only, not a data table, set output_class
to
"data.frame"
.
shortest_paths(input_grid, origin_pts, output_class = "data.frame")
origins destinations distances
1 1 2 19627694
2 1 3 7290325
3 2 3 14467797
If output
is "lines"
or "both"
,
the the function returns a SpatVector, if rst
is a
SpatRaster or a RasterLayer, and a list, if rst
is a matrix
or a list of matrices. Explicitly setting output_class
to
"list"
returns a list in any case.
output_class = "SpatVector"
, however, returns a SpatVector
only if rst
is a SpatRaster or a RasterLayer.
NA
cells in rst
act as barriers. They mark the
cells which the algorithm must not travel through. What if these
barriers move? In the example of ships moving between ports, these
moving barriers could be Caribbean hurricanes. Ships do not go through
the storms, but around them.
Assume each hurricane is documented with a separate polygon. You could
mask the rst
grid with the different polygons, create one
grid per hurricane, and loop over these grids with
shortest_paths
. This would reestimate all shipping routes
in each call of shortest_paths
. Even lines not passing
through the Caribbean, e.g. routes between India and Australia, would be
recomputed. On top of that, each iteration would check the inputs and
convert them into the format used by the algorithm. It is an inefficient
strategy.
shortest_paths
comes with an efficient solution to the
moving barrier case. You pass the hurricane polygons to
update_rst
and the function computes the shortest paths in
a hurricane-free grid and the grids subject to hurricanes. It just
recomputes paths affected by a hurricane and is much more efficient than
looping over shortest_paths
is.
barrier <- terra::vect("POLYGON ((-179 -25, 100 -25, 100 -26, -179 -26, -179 -25))", crs = "epsg:4326")
shortest_paths(input_grid, origin_pts, update_rst = barrier)
origins destinations distances layer
<int> <int> <num> <int>
1: 1 2 19627694 0
2: 1 3 7290325 0
3: 2 3 14467797 0
4: 1 2 19627694 1
5: 1 3 13207350 1
6: 2 3 15465933 1
barriers <- list(barrier, terra::vect("POLYGON ((0 20, 1 20, 1 -20, 0 -20, 0 20))", crs = "epsg:4326"))
shortest_paths(input_grid, origin_pts, update_rst = barriers)
origins destinations distances layer
<int> <int> <num> <int>
1: 1 2 19627694 0
2: 1 3 7290325 0
3: 2 3 14467797 0
4: 1 2 19627694 1
5: 1 3 13207350 1
6: 2 3 15465933 1
7: 1 2 19813077 2
8: 1 3 7290325 2
9: 2 3 14467797 2
Layer 0 is the hurricane-free base grid, layer 1 is the hurricane-free
base grid updated with the first polygon, and layer 2 is the
hurricane-free base grid updated with the second polygon. Each layer
updates the unmodified rst
, not the grid already updated by
another polygon. update_rst
sets cells covered by the
respective polygon to NA
. It never sets cells to any other
value than NA
.
The actual implementation does not truly update rst
or the
graph, but marks the respective cells as blocked in a more efficient
way. The internal strategy obtains the same results, but is much faster
than physically updating the grid would be. So, you should treat
update_rst
as a method of setting any rst
cells that it intersects with to NA
, irrespective of the
optimized implementation.
spaths
is not limited to geographic locations on Earth. The
functions can be applied to other scenarios connecting points in grids.
These can be of a geographic nature, like other planets, or
non-geographic subjects.
As an example, we consider astronauts walking on Mars. Non-Earth
applications must provide rst
as a matrix or a list of
matrices. The SpatRaster and RasterLayer inputs are restricted to
evaluations on Earth. If rst
is a matrix or a list of
matrices, the parameters spherical
, radius
,
and extent
define what that matrix refers to. In our
Martian example, we define rst
to be a global, unprojected
grid of a two degree resolution.
Because the data is unprojected, meaning it is expressed in degrees on a
sphere, not points on a plane, we specify spherical = TRUE
.
Mars’ radius is 3389500
meters and the grid’s global nature
implies an extent
of c(-180, 180, -90, 90)
. It
stretches from -180 to 180 in the x dimension and -90 to 90 in the y
dimension.
When rst
is a matrix or a list of matrices,
origins
(and destinations
) are supplied as a
matrix, data frame, or data table of coordinates, with columns named
x
and y
. The Martian example uses a data
table.
set.seed(3L)
origin_pts <- rnd_locations(3L)
shortest_paths(input_grid, origin_pts, spherical = TRUE, radius = 3389500, extent = c(-180, 180, -90, 90))
origins destinations distances
<int> <int> <num>
1: 1 2 8229741
2: 1 3 5372309
3: 2 3 9088755
dist_comp = "terra"
is not available in non-Earth-based
scenarios. Unless, you compute edge weights through a custom transition
function, they are derived through the dist_comp = "spaths"
methods. Custom transition functions work essentially like they do for
Earth. A difference is that grid layers are not passed as layers of a
single SpatRaster, but as matrices in a list. As in the SpatRaster case,
the layers can be accessed by index and name.
set.seed(3L)
input_grid <- list(even_terrain = input_grid, temperature = matrix(stats::rnorm(16200L, 20, 5), nrow = 90))
custom_tr <- function(v1, v2) v1[[1L]] * v2[[1L]] + v1[[2L]] * v2[[2L]]
custom_tr <- function(v1, v2) v1[["even_terrain"]] * v2[["even_terrain"]] + v1[["temperature"]] * v2[["temperature"]]
shortest_paths(input_grid, origin_pts, spherical = TRUE, radius = 3389500, extent = c(-180, 180, -90, 90), tr_fun = custom_tr)
origins destinations distances
<int> <int> <num>
1: 1 2 15973.37
2: 1 3 10254.19
3: 2 1 15973.37
4: 2 3 17432.90
5: 3 1 10254.19
6: 3 2 Inf
Grid updating is not done with a SpatVector, but a vector of cell
numbers, a matrix, or a list of either. The vector’s cell numbers mark
the cells to set to NA
. spaths
counts cells
like terra
does, starting with one in the top left, then
increasing from left to right and afterwards from top to bottom. This
differs from how R base matrices enumerate cells, which iterate first
from top to bottom and second from left to right. If
update_rst
is a matrix, it must be of the same dimensions
as rst
and marks cells to be updated using NA
values. Cells with non-NA
values in an
update_rst
matrix are not updated.
set.seed(3L)
input_grid <- matrix(sample(c(1L, NA_integer_), 16200L, TRUE, c(0.8, 0.2)), nrow = 90)
barrier_vector <- sample.int(16200L, 10L)
shortest_paths(input_grid, origin_pts, spherical = TRUE, radius = 3389500, extent = c(-180, 180, -90, 90), update_rst = barrier_vector)
origins destinations distances layer
<int> <int> <num> <int>
1: 1 2 8229741 0
2: 1 3 5372309 0
3: 2 3 9088755 0
4: 1 2 8229741 1
5: 1 3 5372309 1
6: 2 3 9088755 1
barrier_matrix <- matrix(rep.int(1L, 16200L), nrow = 90)
barrier_matrix[sample.int(16200L, 10L)] <- NA_integer_
shortest_paths(input_grid, origin_pts, spherical = TRUE, radius = 3389500, extent = c(-180, 180, -90, 90), update_rst = barrier_matrix)
origins destinations distances layer
<int> <int> <num> <int>
1: 1 2 8229741 0
2: 1 3 5372309 0
3: 2 3 9088755 0
4: 1 2 8229741 1
5: 1 3 5372309 1
6: 2 3 9088755 1
In each of the two cases, update_rst
marks ten cells in
rst
to be set to NA
, once using a vector and
once using a matrix. In either approach, the ten cells do not affect the
paths and accordingly the results are the same for both layers.
Performance optimization is a central design principle of
spaths
. shortest_paths
’ runs with defaults
that are optimal for the average use case. Nonetheless, there are some
performance considerations that should generally be accounted for and
various parameters that can help in tailoring the function execution to
the application.
The number of non-NA
pixels in rst
has the
largest influence on both computational time and RAM demand. Crop the
rst
to the relevant area and set any cells that the
shortest paths are do surely no pass through to NA
. The
quantity of non-NA
cells does not only determine the size
of the graph, but the size of multiple intermediate objects, the number
of edge weights to derive, and the number of places to consider in the
shortest path algorithm.
To be more precise, the primary contributor to graph size is the number
of edges: the quantity of links between cells. Of course, the number of
edges increases in the grid cell count. There is an option to influence
this link though. shortest_paths
builds on the above
described queen’s case contiguity with up to eight edges per vertex.
Changing contiguity
to "root"
cuts RAM
consumption and computational time. Yet, it may induce less desireable
paths than contiguity = "queen"
does.
You can track the function’s progress with
show_progress = TRUE
. This prints messages on stages of
shortest_paths
, including a progress bar, if the number of
paths or update_rst
elements is less than or equal to
bar_limit
. bar_limit
defaults to 150. In the
example below, the function prints three =
, one per path.
show_progress = TRUE
is meant for testing purposes.
Especially printing a progress bar from a parallelized function
execution can prolong runtimes because the program limits writing to
output to one thread at a time. If you choose to print a progress bar,
do not set bar_limit
too high, which can cause the output
buffer to overflow and crash the program.
shortest_paths(input_grid, origin_pts, show_progress = TRUE)
Checking arguments
Converting spatial inputs
Preparing algoritm inputs
Starting distances calculation
|---|
|===|
Generating output object
origins destinations distances
<int> <int> <num>
1: 1 2 19627694
2: 1 3 7290325
3: 2 3 14467797
Deriving lines is computationally more expensive and requires more RAM than deriving distances. Storing the coordinates of 10,000 lines comprised of 500,000 cells on average each, requires 74.5 GB RAM. Storing the distances associated with those 10,000 paths requires 78.1 KB RAM, or 0.0001 percent as much as storing the coordinates. This is, of course, not the only information that the function holds. There are intermediate objects etc. Yet, the computational requirements of assembling path lines should not be underestimated.
By default, ncores
is NULL
and
shortest_paths
parallelizes across all of the machine’s CPU
cores. It is implemented with OpenMP in C++. OpenMP is much more
efficient than R level parallelism and scales quasi linearly in the
number of cores. Each thread runs an iteration of the shortest paths
algorithm. How many iterations there are depends on the number of origin
and destination points. The software utilizes shared memory parallelism.
Hence, objects that are shared across executions of the
graph-theoretical algorithm, like the graph and update_rst
,
are not copied. Instead, all threads share the same representation. This
does not mean that the RAM use does not increase in the number of cores.
Each execution of the shortest path algorithm comes with its own
objects, such as the priority queue managing the order of the cells to
be visited, which need allocation. So, explicitly setting the number of
cores to a value below the default is a way of reducing RAM consumption.
OpenMP is commonly not available on MacOS by default. This is not
specific to spaths
, but affects many modern R packages that
are geared towards performance. It implies that the package may be much
faster on Windows and Linux than on MacOS.
shortest_paths
runs Dijsktra’s algorithm to identify
shortest paths between points. It runs the algorithm once per origin. If
you provide one origin and ten destinations, the function produces ten
paths and runs the algorithm once. If you provide two origins and five
destinations, the function also produces ten paths, but runs the
algorithm twice. The second example, therefore, takes longer than the
first one. If the generated graph is undirected, it does not matter, if
you pass two origins and five destinations or five origins and two
destinations. shortest_paths
automatically adjusts the
direction to minimize the frequency with which Dijkstra’s (1959) algorithm is called. In the example, it
would be called twice in either case.
The algorithm by default derives the shortest paths from an origin to
other cells of the same graph until all destination cells have been
visited. This technique allows the algorithm to potentially stop before
having visited each vertex. Yet, it comes at the cost of checking for
each visited cell whether it is in the set of destinations. Hence, the
default early_stopping = TRUE
is efficient when point pairs
are close to each other compared to the entirety of cells. If at least
one points pair in an execution of the shortest paths algorithm is far
from each other, the alternative early_stopping = FALSE
can
be faster. It derives the distance to all other cells and then picks the
destinations cells from the result, avoiding the check for destination
cells while iterating through the graph.
early_stopping = TRUE
and
early_stopping = FALSE
produce the same results, but differ
in computational performance.
In a grid, the vector of distances between neighboring cells is made up
of not that many unique and repeating values. That makes it efficient to
precompute edge weights for the entire graph, and is the behavior which
the default pre = TRUE
induces. Computing individual edge
weights while Dijkstra’s (1959) algorithm
runs, pre = FALSE
, requires less RAM, as the function only
stores one edge weight at a time, but is almost always much slower than
precomputing them all jointly. So, setting pre
to
FALSE
is one of the last options to consider, when the
machine has insufficient RAM.
When update_rst
is a list, there are two potential
dimensions for parallelism. In iterating over the updated grids, the
function could parallelize at the level of point connections, like in
the base grid, or it could parallelize across grids, i.e. list elements
of update_rst
. By default the function chooses the latter,
meaning par_lvl
is "update_rst"
. All
connections in the grid updated with the first element of
update_rst
are handled by one core, all connections in the
grid updated with the second element of update_rst
are
handled by another core, etc. If par_lvl
is
"points"
, the function instead calls the former option. It
computes all connections in the grid updated with the first element of
update_rst
in parallel, then computes all connections in
the grid updated with the second element of update_rst
in
parallel, etc. The par_lvl
argument only affects the grids
updated with update_rst
list elements. The unupdated base
grid always uses the "points"
strategy. Which
par_lvl
option is preferred depends on the number of
recomputed paths and the number of update_rst
list
elements. Assume you run shortest_paths
with 8 CPU cores,
pass update_rst
of two elements, and run the shortest paths
algorithm 16 times per updated grid. par_lvl = "update_rst"
would utilize two cores and leave six cores idle.
par_lvl = "points"
, in contrast, would use all 8 cores,
commonly running the algorithm twice per core. It does not matter how
many connections there are in total and how often Dijkstra’s (1959) algorithm is called in the base grid.
Only paths that are affected by the grid updating, i.e. where
update_rst
sets at least one cell on the path to
NA
, are recomputed.
Internally, shortest_paths
stores paths in the form of cell
numbers. By default, these are four byte signed integers, the same type
R uses for integers. Only if output
is "lines"
or "both"
, are these cell numbers converted to coordinates,
usually two 8 byte double precision floating point numbers per cell,
before the function returns. If rst
has less than 65,535
non-NA
cells, you can make use of 2 byte unsigned short
integers instead. The path_type = "unsigned short int"
option requires half as much RAM to store cell numbers as
path_type = "int"
demands, but is slower as it comes with
type conversions from 4 byte signed integers. The results are the same,
irrespective of which option you employ.
Another data type selection regards distances.
distance_type
defaults to the fastest and most precise
option: double precision floating point numbers. "double"
corresponds to the numeric
type in R. It is an 8 byte type.
Alternatively, you can choose 4 byte single precision floating point
numbers ("float"
), 4 byte signed integers
("int"
), and 2 byte unsigned short integers
("unsigned short int"
). Unlike path_type
,
distance_type
changes the results. "float"
stores distances between cells with lower precision than
"double"
does and "int"
and
"unsigned short int"
round distances to integers. These
deviations accumulate over a path and can induce marked differences in
the result. Always test the effect in your application before choosing
one of the less RAM demanding options. If distance_type
is
"int"
, the distance between any cells, i.e. the sum of edge
weights along any path, not just the returned ones, must not exceed
2,147,483,647. With "unsigned short int"
that limit is
65,535, making it not applicable in most scenarios.
shortest_paths
does not check, if you meet these numerical
constraints. The results are simply wrong, if you violate them.
"float"
, "int"
, and
"unsigned short int"
are slower than "double"
because they require type conversions.
Contributions to this package are highly welcome. You can submit them as
a pull request to the ChristianDueben/spaths
GitHub
repository or via email to the maintainer email mentioned in the
DESCRIPTION
file. You may also build a package on top of
spaths
, as movecost
and
leastcostpath
did with gdistance
.
This is because of the contiguity
options that
spaths
implements: queen’s and rook’s case contiguity. As
clarified above, these rules restrict the algorithm to traverse between
directly adjacent cells only. Rook’s case contiguity provides access to
four neighbors spaced at 90° angles. Queen’s case contiguity provides
access to eight neighbors spaced at 45° angles. Incorporating types of
contiguity that allow for traversing between second order neighbors
would introduce a larger set of angles for the algorithm to choose from.
However, this lets the function to skip over first degree neighbors. In
the ship route example above, the ship could skip over land cells, such
as islands or peninsulas. Hence, the lines are the shortest paths given
the angles the algorithm may travel at.
The recommendation is to use SpatRaster rst
, SpatVector
origins
(and SpatVector destinations
) objects
when handling locations on Earth. The other classes also work, but the
recommended classes tend to be the most appropriate in that they require
little conversion and return output in a convenient format. If the data
do not represent locations on Earth, you need to use a matrix or list of
matrices as rst
with a matrix or a data frame as
origins
(and destinations
). Unlike a matrix, a
data frame accepts origin_names
and
destination_names
columns of a different type than the
coordinates.
The next features include alternative shortest paths algorithms, beyond Dijkstra’s algorithm, and centrality measures. In the respective categories, the A* algorithm and closeness centrality will be first. How long it will take to publish these updates depends on the maintainer’s academic career prospects.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.