Title: Import and Manipulate 'ForestGEO' Data
Version: 1.2.10
Description: To help you access, transform, analyze, and visualize 'ForestGEO' data, we developed a collection of R packages (https://forestgeo.github.io/fgeo/). This package, in particular, helps you to easily import, filter, and modify 'ForestGEO' data. To learn more about 'ForestGEO' visit https://forestgeo.si.edu/.
License: GPL-3
URL: https://forestgeo.github.io/fgeo.tool/, https://github.com/forestgeo/fgeo.tool
BugReports: https://github.com/forestgeo/fgeo.tool/issues
Depends: R (≥ 3.2)
Imports: dplyr (≥ 0.8.0.1), glue (≥ 1.3.1), magrittr (≥ 1.5), purrr (≥ 0.3.2), readr (≥ 1.3.1), rlang (≥ 0.4.11), tibble (≥ 2.1.1), tidyselect (≥ 0.2.5)
Suggests: covr (≥ 3.2.1), fgeo.x (≥ 1.1.3), knitr (≥ 1.22), roxygen2 (≥ 6.1.1), spelling (≥ 2.1), stringr (≥ 1.4.0), testthat (≥ 2.1.1), tidyr (≥ 0.8.3)
Encoding: UTF-8
Language: en-US
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2025-04-03 17:11:30 UTC; rstudio
Author: Mauro Lepore
Maintainer: Mauro Lepore <maurolepore@gmail.com>
Repository: CRAN
Date/Publication: 2025-04-03 17:30:02 UTC
fgeo.tool: Import and Manipulate 'ForestGEO' Data
Description
To help you access, transform, analyze, and visualize 'ForestGEO' data, we developed a collection of R packages (https://forestgeo.github.io/fgeo/). This package, in particular, helps you to easily import, filter, and modify 'ForestGEO' data. To learn more about 'ForestGEO' visit https://forestgeo.si.edu/.
Author(s)
Maintainer: Mauro Lepore maurolepore@gmail.com (ORCID) [contractor]
Authors:
Richard Condit richardcondit@gmail.com
Suzanne Lao laoz@si.edu
Anudeep Singh anudeep7@gmail.com
Other contributors:
CTFS-ForestGEO ForestGEO@si.edu [copyright holder, funder]
See Also
Useful links:
Report bugs at https://github.com/forestgeo/fgeo.tool/issues
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Add column status_tree based on the status of all stems of each tree.
Description
Add column status_tree based on the status of all stems of each tree.
Usage
add_status_tree(data, status_a = "A", status_d = "D")
Arguments
data |
A ForestGEO-like dataframe: A ViewFullTable, tree or stem table. |
status_a , status_d |
String to match alive and dead stems; it corresponds
to the values of the variable |
Value
The input data set with the additional variable status_tree.
See Also
Other functions to add columns to dataframes: add_subquad(), add_var()
Other functions for ForestGEO data: add_subquad(), add_var()
Other functions for fgeo census: add_var(), guess_plotdim(), pick_drop
Other functions for fgeo vft: add_subquad(), add_var(), guess_plotdim(), pick_drop
Examples
# styler: off
stem <- tribble(
~CensusID, ~treeID, ~stemID, ~status,
1, 1, 1, "A",
1, 1, 2, "D",
1, 2, 3, "D",
1, 2, 4, "D",
2, 1, 1, "A",
2, 1, 2, "G",
2, 2, 3, "D",
2, 2, 4, "G"
)
# styler: on
add_status_tree(stem)
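The rule behind status_tree can be sketched in base R. This is a minimal sketch, not the package's implementation: it assumes that a tree counts as dead in a census only when all of its stems are dead, and the helper name status_tree_sketch() is hypothetical.

```r
# Same toy data as the example above, built with base R only.
stem <- data.frame(
  CensusID = c(1, 1, 1, 1, 2, 2, 2, 2),
  treeID   = c(1, 1, 2, 2, 1, 1, 2, 2),
  stemID   = c(1, 2, 3, 4, 1, 2, 3, 4),
  status   = c("A", "D", "D", "D", "A", "G", "D", "G"),
  stringsAsFactors = FALSE
)

# Hypothetical helper. Assumption: a tree is dead only if every stem of that
# tree is dead within the same census; otherwise it is labelled alive.
status_tree_sketch <- function(data, status_a = "A", status_d = "D") {
  all_dead <- ave(
    data$status == status_d,
    data$CensusID, data$treeID,
    FUN = all
  )
  data$status_tree <- ifelse(all_dead, status_d, status_a)
  data
}

out <- status_tree_sketch(stem)
out
```

Under this assumption, only tree 2 in census 1 is labelled dead, because it is the only tree whose stems all have status "D".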
Add column subquadrat based on QX and QY coordinates.
Description
Add column subquadrat based on QX and QY coordinates.
Usage
add_subquad(data, x_q, y_q = x_q, x_sq, y_sq = x_sq, subquad_offset = NULL)
Arguments
data |
A dataframe with quadrat coordinates |
x_q , y_q |
Size in meters of a quadrat's side. For ForestGEO sites, a common value is 20. |
x_sq , y_sq |
Size in meters of a subquadrat's side. For ForestGEO sites, a common value is 5. |
subquad_offset |
Either
First column is 0      First column is 1
-----------------      -----------------
04  14  24  34         14  24  34  44
03  13  23  33         13  23  33  43
02  12  22  32         12  22  32  42
01  11  21  31         11  21  31  41 |
Value
Returns data with the additional variable subquadrat.
Author(s)
Anudeep Singh and Mauro Lepore.
See Also
Other functions to add columns to dataframes: add_status_tree(), add_var()
Other functions for ForestGEO data: add_status_tree(), add_var()
Other functions for fgeo vft: add_status_tree(), add_var(), guess_plotdim(), pick_drop
Examples
# styler: off
vft <- tribble(
~QX, ~QY,
17.9, 0,
4.1, 15,
6.1, 17.3,
3.8, 5.9,
4.5, 12.4,
4.9, 9.3,
9.8, 3.2,
18.6, 1.1,
17.3, 4.1,
1.5, 16.3
)
# styler: on
add_subquad(vft, 20, 20, 5, 5)
add_subquad(vft, 20, 20, 5, 5, subquad_offset = -1)
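The labelling scheme in the subquad_offset table can be reproduced with simple arithmetic. The sketch below is an illustration under assumptions, not the package's code: subquad_sketch() and its first_col argument are hypothetical, and points lying exactly on the upper edge of a quadrat are not handled.

```r
# Hypothetical helper: the first digit is the subquadrat column, the second
# digit is the subquadrat row. `first_col` is the label of the first column
# (1 or 0), matching the two schemes in the table above.
subquad_sketch <- function(qx, qy, x_sq = 5, y_sq = x_sq, first_col = 1) {
  col <- floor(qx / x_sq) + first_col
  row <- floor(qy / y_sq) + 1
  paste0(col, row)
}

first_col_1 <- subquad_sketch(c(17.9, 4.1), c(0, 15))
first_col_0 <- subquad_sketch(c(17.9, 4.1), c(0, 15), first_col = 0)
first_col_1
first_col_0
```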
Add columns lx/ly, QX/QY, index, col/row, hectindex, quad, gx/gy.
Description
These functions add columns to position trees in a forest plot. They work with ViewFullTable, tree and stem tables. From the input table, most functions use only the gx and gy columns (or equivalent columns). The exception is the function add_gxgy(), which takes quadrat information as input. If your data lacks an important column, an error message tells you which column is missing.
Usage
add_lxly(data, gridsize = 20, plotdim = NULL)
add_qxqy(data, gridsize = 20, plotdim = NULL)
add_index(data, gridsize = 20, plotdim = NULL)
add_col_row(data, gridsize = 20, plotdim = NULL)
add_hectindex(data, gridsize = 20, plotdim = NULL)
add_quad(data, gridsize = 20, plotdim = NULL, start = NULL, width = 2)
add_gxgy(data, gridsize = 20, start = 0)
Arguments
data |
A ForestGEO-like dataframe: A ViewFullTable, tree or stem table. |
gridsize |
The gridsize of the census plot (commonly 20 m). |
plotdim |
The global dimensions of the census plot (i.e. the
maximum possible values of |
start |
Defaults to label the first quadrat as "0101". Use |
width |
Number; width to pad the labels of plot-columns and -rows. |
Details
These functions are adapted from the CTFS R Package.
Value
For any given var, a function add_var() returns a modified version of the input dataframe, with the additional variable(s) var.
See Also
Other functions to add columns to dataframes: add_status_tree(), add_subquad()
Other functions for ForestGEO data: add_status_tree(), add_subquad()
Other functions for fgeo census: add_status_tree(), guess_plotdim(), pick_drop
Other functions for fgeo vft: add_status_tree(), add_subquad(), guess_plotdim(), pick_drop
Examples
# styler: off
x <- tribble(
~gx, ~gy,
0, 0,
50, 25,
999.9, 499.95,
1000, 500
)
# styler: on
# `gridsize` has a common default; `plotdim` is guessed from the data
add_lxly(x)
gridsize <- 20
plotdim <- c(1000, 500)
add_qxqy(x, gridsize, plotdim)
add_index(x, gridsize, plotdim)
add_hectindex(x, gridsize, plotdim)
add_quad(x, gridsize, plotdim)
add_quad(x, gridsize, plotdim, start = 0)
# `width` gives the number of digits to pad the labels of plot-rows and
# plot-columns, e.g. `width = 3` pads plot-rows to three digits and
# plot-columns to another three digits, giving six-digit labels.
add_quad(x, gridsize, plotdim, start = 0, width = 3)
add_col_row(x, gridsize, plotdim)
# From `quadrat` or `QuadratName` --------------------------------------
# styler: off
x <- tribble(
~QuadratName,
"0001",
"0011",
"0101",
"1001"
)
# styler: on
# Output `gx` and `gy` ---------------
add_gxgy(x)
assert_is_installed("fgeo.x")
# Warning: The data may already have `gx` and `gy` columns
gxgy <- add_gxgy(fgeo.x::tree5)
select(gxgy, matches("gx|gy"))
# Output `col` and `row` -------------
# Create columns `col` and `row` from `QuadratName` with `tidyr::separate()`
# The argument `sep` lets you separate `QuadratName` at any position
## Not run:
tidyr_is_installed <- requireNamespace("tidyr", quietly = TRUE)
stringr_is_installed <- requireNamespace("stringr", quietly = TRUE)
if (tidyr_is_installed && stringr_is_installed) {
library(tidyr)
library(stringr)
vft <- tibble(QuadratName = c("0001", "0011"))
vft
separate(
vft,
QuadratName,
into = c("col", "row"),
sep = 2
)
census <- select(fgeo.x::tree5, quadrat)
census
census$quadrat <- str_pad(census$quadrat, width = 4, pad = "0")
separate(
census,
quadrat,
into = c("col", "row"),
sep = 2,
remove = FALSE
)
}
## End(Not run)
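How a quadrat label relates to gx, gy, start and width can be sketched in base R. This is a hedged illustration: quad_label_sketch() is a hypothetical helper, its numeric start argument (1 for labels beginning at "0101", 0 for "0000") is an assumption made for the sketch, and it does no bounds checking against plotdim.

```r
# Hypothetical helper: the plot-column comes first in the label, then the
# plot-row, each padded with zeros to `width` digits.
quad_label_sketch <- function(gx, gy, gridsize = 20, start = 1, width = 2) {
  col <- as.integer(floor(gx / gridsize)) + as.integer(start)
  row <- as.integer(floor(gy / gridsize)) + as.integer(start)
  paste0(
    formatC(col, width = width, flag = "0"),
    formatC(row, width = width, flag = "0")
  )
}

labels_default <- quad_label_sketch(c(0, 50), c(0, 25))
labels_start_0 <- quad_label_sketch(c(0, 50), c(0, 25), start = 0)
labels_width_3 <- quad_label_sketch(c(0, 50), c(0, 25), width = 3)
```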
Assert a package is installed.
Description
Assert a package is installed.
Usage
assert_is_installed(pkg)
Arguments
pkg |
Character vector giving the name of a package. |
Value
An error if pkg is not installed; otherwise pkg, invisibly.
Examples
assert_is_installed("base")
## Not run:
try(assert_is_installed("bad"))
## End(Not run)
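A check of this kind can be written with base R's requireNamespace(). The sketch below only illustrates the contract described under Value; assert_installed_sketch() and its error message are hypothetical, not the package's code.

```r
# Hypothetical helper: error if the package is missing, otherwise return the
# package name invisibly.
assert_installed_sketch <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Please install the '", pkg, "' package.", call. = FALSE)
  }
  invisible(pkg)
}

ok <- assert_installed_sketch("stats")
msg <- tryCatch(
  assert_installed_sketch("notARealPkg12345"),
  error = function(e) conditionMessage(e)
)
```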
Check if an object contains specific names.
Description
Check if an object contains specific names.
Usage
check_crucial_names(x, nms)
Arguments
x |
A named object. |
nms |
String; names expected to be found in |
Value
Invisible x, or an error with informative message.
See Also
Other functions to check inputs: flag_if_group(), is_multiple()
Other functions for developers: extract_insensitive(), flag_if_group(), is_multiple(), nms_try_rename(), rename_matches(), type_ensure()
Examples
v <- c(x = 1)
check_crucial_names(v, "x")
dfm <- data.frame(x = 1)
check_crucial_names(dfm, "x")
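The same kind of check can be sketched in base R with setdiff(). The helper check_names_sketch() and the wording of its error message are hypothetical; the sketch only mirrors the documented contract (invisible x, or an informative error).

```r
# Hypothetical helper: list the expected names that are absent from `x`.
check_names_sketch <- function(x, nms) {
  missing_nms <- setdiff(nms, names(x))
  if (length(missing_nms) > 0) {
    stop(
      "Expected names not found: ", paste(missing_nms, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(x)
}

dfm <- data.frame(x = 1)
passed <- check_names_sketch(dfm, "x")
failed <- tryCatch(
  check_names_sketch(dfm, c("x", "y")),
  error = function(e) conditionMessage(e)
)
```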
Drop if missing values.
Description
Valuable mostly for its warning.
Usage
drop_if_na(dfm, x)
Arguments
dfm |
A dataframe. |
x |
String giving a column name of |
Value
A dataframe.
See Also
tidyr::drop_na().
Examples
dfm <- data.frame(a = 1, b = NA)
drop_if_na(dfm, "b")
drop_if_na(dfm, "a")
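A base-R sketch of the idea, warning first and then dropping the rows, is shown below. drop_if_na_sketch() and its warning text are hypothetical and only illustrate why the warning is the valuable part.

```r
# Hypothetical helper: warn about how many rows are dropped, then drop them.
drop_if_na_sketch <- function(dfm, x) {
  is_missing <- is.na(dfm[[x]])
  if (any(is_missing)) {
    warning(
      "Dropping ", sum(is_missing), " row(s) with missing `", x, "` values.",
      call. = FALSE
    )
  }
  dfm[!is_missing, , drop = FALSE]
}

dfm <- data.frame(a = c(1, 2), b = c(NA, 3))
kept_b <- suppressWarnings(drop_if_na_sketch(dfm, "b"))  # one row dropped
kept_a <- drop_if_na_sketch(dfm, "a")                    # nothing to drop
```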
Extract plot dimensions from habitat data.
Description
Extract plot dimensions from habitat data.
Usage
extract_gridsize(habitats)
extract_plotdim(habitats)
Arguments
habitats |
Data frame giving the habitat designation for each 20x20 quadrat. |
Value
- extract_plotdim(): plotdim (vector of length 2).
- extract_gridsize(): gridsize (scalar).
Examples
assert_is_installed("fgeo.x")
habitat <- fgeo.x::habitat
extract_plotdim(habitat)
extract_gridsize(habitat)
Detect and extract matching strings – ignoring case.
Description
Detect and extract matching strings – ignoring case.
Return TRUE in position where name of x is in y; FALSE otherwise.
Usage
extract_insensitive(x, y)
detect_insensitive(x, y)
Arguments
x |
A string to be muted as in |
y |
A string to use as a reference to match |
Value
detect_* and extract_* return a logical vector and a string, respectively.
See Also
Other functions for developers: check_crucial_names(), flag_if_group(), is_multiple(), nms_try_rename(), rename_matches(), type_ensure()
Other general functions to deal with names: rename_matches()
Examples
x <- c("stemid", "n")
y <- c("StemID", "treeID")
detect_insensitive(x, y)
extract_insensitive(x, y)
vft <- data.frame(TreeID = 1, Status = 1)
extract_insensitive(tolower(names(vft)), names(vft))
extract_insensitive(names(vft), tolower(names(vft)))
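Case-insensitive matching of this kind can be approximated by lowercasing both sides. The helpers below are hypothetical sketches and may differ from the package's behaviour in edge cases such as duplicated names.

```r
# Hypothetical helpers: compare lowercased copies, return values from `y`.
detect_insensitive_sketch <- function(x, y) tolower(x) %in% tolower(y)
extract_insensitive_sketch <- function(x, y) y[tolower(y) %in% tolower(x)]

x <- c("stemid", "n")
y <- c("StemID", "treeID")
detected <- detect_insensitive_sketch(x, y)
extracted <- extract_insensitive_sketch(x, y)
```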
Create elevation data.
Description
This function constructs an object of class "fgeo_elevation". It standardizes the structure of elevation data to always output a dataframe with names gx, gy and elev.
Usage
fgeo_elevation(elev)
Arguments
elev |
One of these:
|
Value
A dataframe with names x/gx, y/gy and elev.
Acknowledgments
This function was inspired by David Kenfack.
Examples
assert_is_installed("fgeo.x")
# Input: Elevation dataframe
elevation_df <- fgeo.x::elevation$col
fgeo_elevation(elevation_df)
class(elevation_df)
class(fgeo_elevation(elevation_df))
names(elevation_df)
names(fgeo_elevation(elevation_df))
# Input: Elevation list
elevation_ls <- fgeo.x::elevation
fgeo_elevation(elevation_ls)
class(elevation_ls)
class(fgeo_elevation(elevation_ls))
names(elevation_ls)
names(fgeo_elevation(elevation_ls))
Flag if a vector or dataframe-column meets a condition.
Description
This function returns a condition (error, warning, or message) and its first argument, invisibly. It is a generic. If the first input is a vector, it evaluates it directly; if it is a dataframe, it evaluates a given column.
Usage
flag_if(.data, ...)
## Default S3 method:
flag_if(.data, predicate, condition = warning, msg = NULL, ...)
## S3 method for class 'data.frame'
flag_if(.data, name, predicate, condition = warning, msg = NULL, ...)
Arguments
.data |
Vector. |
... |
Other arguments passed to methods. |
predicate |
A predicate function. |
condition |
A condition function (e.g. |
msg |
String. An optional custom message. |
name |
String. The name of a column of a dataframe. |
Value
A condition (and .data invisibly).
See Also
Other functions for internal use in other fgeo packages: guess_plotdim(), is_multiple()
Examples
# WITH VECTORS
dupl <- c(1, 1)
flag_if(dupl, is_duplicated)
# Silent
flag_if(dupl, is_multiple)
mult <- c(1, 2)
flag_if(mult, is_multiple, message, "Custom")
# Silent
flag_if(mult, is_duplicated)
# Both silent
flag_if(c(1, NA), is_multiple)
flag_if(c(1, NA), is_duplicated)
# WITH DATAFRAMES
.df <- data.frame(a = 1:3, b = 1, stringsAsFactors = FALSE)
flag_if(.df, "b", is_multiple)
flag_if(.df, "a", is_multiple)
flag_if(.df, "a", is_multiple, message, "Custom")
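The pattern of applying a predicate, signalling a condition, and returning the input invisibly can be sketched for the vector case as follows. flag_if_sketch(), has_duplicates() and the default message are hypothetical names for this illustration.

```r
# Hypothetical helper: signal `condition` when `predicate(x)` is TRUE.
flag_if_sketch <- function(x, predicate, condition = warning,
                           msg = "Flagged values were detected.") {
  if (predicate(x)) condition(msg)
  invisible(x)
}

has_duplicates <- function(x) any(duplicated(x))

# Catch the warning so its message can be inspected.
caught <- tryCatch(
  flag_if_sketch(c(1, 1), has_duplicates),
  warning = function(w) conditionMessage(w)
)
# No duplicates: nothing is signalled and the input comes back unchanged.
silent <- flag_if_sketch(c(1, 2), has_duplicates)
```

Because condition is just a function, passing message or stop changes the severity without changing the rest of the code.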
Detect and flag based on a predicate applied to a variable by groups.
Description
These functions extend flag_if() and detect_if() to work by groups defined with dplyr::group_by().
Usage
flag_if_group(.data, name, predicate, condition = warn, msg = NULL)
detect_if_group(.data, name, predicate)
Arguments
.data |
A dataframe. |
name |
String. The name of a column of the dataframe. |
predicate |
A predicate function, e.g. |
condition |
A condition function, e.g. |
msg |
String to customize the returned message. |
Value
- flag_if_group(): A condition and its first input, invisibly.
- detect_if_group(): Logical of length 1.
See Also
Other functions to check inputs: check_crucial_names(), is_multiple()
Other functions for developers: check_crucial_names(), extract_insensitive(), is_multiple(), nms_try_rename(), rename_matches(), type_ensure()
Examples
tree <- tibble(CensusID = c(1, 2), treeID = c(1, 2))
detect_if_group(tree, "treeID", is_multiple)
flag_if_group(tree, "treeID", is_multiple)
by_censusid <- group_by(tree, CensusID)
detect_if_group(by_censusid, "treeID", is_multiple)
flag_if_group(by_censusid, "treeID", is_multiple)
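Why the grouped and ungrouped calls can differ is easier to see in a base-R sketch that applies the predicate within each group. The helper detect_if_group_sketch() and its by argument are hypothetical; the package itself works with dplyr groups.

```r
# Hypothetical predicate: more than one distinct non-missing value.
is_multiple_sketch <- function(x) length(unique(x[!is.na(x)])) > 1

# Hypothetical helper: apply `predicate` to the column within each group and
# report whether any group is flagged.
detect_if_group_sketch <- function(data, name, predicate, by = NULL) {
  groups <- if (is.null(by)) {
    list(data[[name]])
  } else {
    split(data[[name]], data[[by]])
  }
  any(vapply(groups, predicate, logical(1)))
}

tree <- data.frame(CensusID = c(1, 2), treeID = c(1, 2))
ungrouped <- detect_if_group_sketch(tree, "treeID", is_multiple_sketch)
grouped <- detect_if_group_sketch(tree, "treeID", is_multiple_sketch,
                                  by = "CensusID")
```

Across the whole table there are two distinct treeID values, but within each census there is only one, so only the ungrouped check is flagged.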
Functions to get variables from other variables.
Description
These functions wrap their corresponding functions from the CTFS R Package, but these versions are stricter. The main differences are these:
- Names use "_" not ".".
- Argument gridsize defaults to missing to force the user to provide it.
- If the argument plotdim is missing from functions gxgy_fun(), its value will be guessed and notified.
Usage
rowcol_to_index(rowno, colno, gridsize, plotdim)
index_to_rowcol(index, gridsize, plotdim)
gxgy_to_index(gx, gy, gridsize, plotdim)
gxgy_to_lxly(gx, gy, gridsize, plotdim)
gxgy_to_qxqy(gx, gy, gridsize, plotdim)
gxgy_to_rowcol(gx, gy, gridsize, plotdim)
gxgy_to_hectindex(gx, gy, plotdim)
index_to_gxgy(index, gridsize, plotdim)
Arguments
rowno , colno |
Row and column number – as defined in a census plot. |
gridsize |
The gridsize of the census plot (commonly 20 m). |
plotdim |
The global dimensions of the census plot (i.e. the
maximum possible values of |
index |
Index number as defined for a census plot. |
gx , gy |
A number; global x and y position in a census plot. |
Details
gxgy_to_qxqy() didn't exist in the original CTFS R Package. It was added for consistency.
Value
A vector or dataframe (see examples).
Author(s)
Rick Condit, Suzanne Lao.
Examples
gxgy_to_index(c(0, 400, 990), c(0, 200, 490), gridsize = 20)
gridsize <- 20
plotdim <- c(1000, 500)
x <- gxgy_to_hectindex(1:3, 1:3, plotdim)
x
typeof(x)
is.data.frame(x)
is.vector(x)
x <- gxgy_to_index(1:3, 1:3, gridsize, plotdim)
x
typeof(x)
is.data.frame(x)
is.vector(x)
x <- gxgy_to_lxly(1:3, 1:3, gridsize, plotdim)
x
typeof(x)
is.data.frame(x)
is.vector(x)
x <- gxgy_to_rowcol(1:3, 1:3, gridsize, plotdim)
x
typeof(x)
is.data.frame(x)
is.vector(x)
x <- index_to_rowcol(1:3, gridsize, plotdim)
x
typeof(x)
is.data.frame(x)
is.vector(x)
x <- rowcol_to_index(1:3, 1:3, gridsize, plotdim)
x
typeof(x)
is.data.frame(x)
is.vector(x)
index_to_gxgy(1:3, gridsize, plotdim)
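The arithmetic that relates gx/gy to a quadrat index and to local coordinates can be sketched in a few lines. This follows the usual CTFS convention as an assumption (quadrats are numbered down each plot-column first, and points outside the plot get NA); the helper names are hypothetical.

```r
# Hypothetical helper: quadrat index from global coordinates.
gxgy_to_index_sketch <- function(gx, gy, gridsize, plotdim) {
  col <- floor(gx / gridsize) + 1
  row <- floor(gy / gridsize) + 1
  rows_per_col <- plotdim[2] / gridsize
  index <- (col - 1) * rows_per_col + row
  # Assumption: coordinates outside [0, plotdim) have no valid index.
  outside <- gx < 0 | gy < 0 | gx >= plotdim[1] | gy >= plotdim[2]
  index[outside] <- NA
  index
}

# Hypothetical helper: position within the quadrat.
gxgy_to_lxly_sketch <- function(gx, gy, gridsize) {
  data.frame(lx = gx %% gridsize, ly = gy %% gridsize)
}

index <- gxgy_to_index_sketch(c(0, 50, 1000), c(0, 25, 500), 20, c(1000, 500))
local <- gxgy_to_lxly_sketch(c(0, 50), c(0, 25), 20)
```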
Guess plot dimensions.
Description
Guess plot dimensions.
Usage
guess_plotdim(x, accuracy = 20)
Arguments
x |
A ForestGEO-like dataframe: A ViewFullTable, tree or stem table. |
accuracy |
A number giving the accuracy with which to round |
Value
A numeric vector of length 2.
See Also
Other functions for fgeo census and vft: pick_drop
Other functions for fgeo census: add_status_tree(), add_var(), pick_drop
Other functions for fgeo vft: add_status_tree(), add_subquad(), add_var(), pick_drop
Other functions for internal use in other fgeo packages: flag_if(), is_multiple()
Examples
x <- data.frame(
gx = c(0, 300, 979),
gy = c(0, 300, 481)
)
guess_plotdim(x)
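One plausible way to guess plot dimensions is to round the largest gx and gy up to the next multiple of accuracy. The sketch below assumes that rule; guess_plotdim_sketch() is a hypothetical helper and may not match the package in every detail.

```r
# Hypothetical helper: round the maximum of each coordinate up to the next
# multiple of `accuracy`.
guess_plotdim_sketch <- function(x, accuracy = 20) {
  maxima <- c(max(x$gx, na.rm = TRUE), max(x$gy, na.rm = TRUE))
  ceiling(maxima / accuracy) * accuracy
}

x <- data.frame(
  gx = c(0, 300, 979),
  gy = c(0, 300, 481)
)
guessed <- guess_plotdim_sketch(x)
guessed
```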
Predicates to detect and flag duplicated and multiple values of a variable.
Description
is_multiple() and is_duplicated() return TRUE if they detect, respectively, multiple different values of a variable (e.g. c(1, 2)), or duplicated values of a variable (e.g. c(1, 1)).
Usage
is_multiple(.data)
is_duplicated(.data)
Arguments
.data |
A vector. |
Value
Logical.
See Also
Other functions for internal use in other fgeo packages: flag_if(), guess_plotdim()
Other functions to check inputs: check_crucial_names(), flag_if_group()
Other functions for developers: check_crucial_names(), extract_insensitive(), flag_if_group(), nms_try_rename(), rename_matches(), type_ensure()
Examples
is_multiple(c(1, 2))
is_multiple(c(1, 1))
is_multiple(c(1, NA))
is_duplicated(c(1, 2))
is_duplicated(c(1, 1))
is_duplicated(c(1, NA))
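Both predicates are easy to express in base R. The definitions below are sketches with hypothetical names; they assume that missing values are ignored when counting distinct values.

```r
# Hypothetical predicates: "multiple" means more than one distinct non-missing
# value; "duplicated" means at least one repeated value.
is_multiple_sketch <- function(x) length(unique(x[!is.na(x)])) > 1
is_duplicated_sketch <- function(x) any(duplicated(x))

multiple <- c(
  is_multiple_sketch(c(1, 2)),
  is_multiple_sketch(c(1, 1)),
  is_multiple_sketch(c(1, NA))
)
duplicated_any <- c(
  is_duplicated_sketch(c(1, 2)),
  is_duplicated_sketch(c(1, 1)),
  is_duplicated_sketch(c(1, NA))
)
```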
Try to rename an object.
Description
Given a name you want and a possible alternative, this function renames an object as you want or errs with an informative message.
Usage
nms_try_rename(x, want, try)
Arguments
x |
A named object. |
want |
String of length 1 giving the name you want the object to have. |
try |
String of length 1 giving the name the object might have. |
See Also
nms
Other functions for developers: check_crucial_names(), extract_insensitive(), flag_if_group(), is_multiple(), rename_matches(), type_ensure()
Examples
nms_try_rename(c(a = 1), "A", "a")
nms_try_rename(data.frame(a = 1), "A", "a")
# Passes
nms_try_rename(c(a = 1, 1), "A", "a")
## Not run:
# Errs
# nms_try_rename(1, "A", "A")
## End(Not run)
Pick and drop rows from ViewFullTable, tree, and stem tables.
Description
These functions provide an expressive and convenient way to pick specific rows from ForestGEO datasets. They allow you to remove missing values (with na.rm = TRUE) but conservatively default to preserving them. This behavior differs from both base::subset() and dplyr::filter(), which drop rows where the condition evaluates to NA. This conservative default is important because you may want to include missing trees in your analysis.
Usage
pick_dbh_min(data, value, na.rm = FALSE)
pick_dbh_max(data, value, na.rm = FALSE)
pick_dbh_under(data, value, na.rm = FALSE)
pick_dbh_over(data, value, na.rm = FALSE)
pick_status(data, value, na.rm = FALSE)
drop_status(data, value, na.rm = FALSE)
Arguments
data |
A ForestGEO-like dataframe: A ViewFullTable, tree or stem table. |
value |
An atomic vector; a single value against which to compare each value of the variable encoded in the function's name. |
na.rm |
Set to |
Value
A dataframe similar to data but including only the rows that match the condition.
See Also
dplyr::filter(), Extract ([).
Other functions for fgeo census and vft: guess_plotdim()
Other functions for fgeo census: add_status_tree(), add_var(), guess_plotdim()
Other functions for fgeo vft: add_status_tree(), add_subquad(), add_var(), guess_plotdim()
Other functions to pick or drop rows of a ForestGEO dataframe: pick_main_stem()
Examples
# styler: off
census <- tribble(
~dbh, ~status,
0, "A",
50, "A",
100, "A",
150, "A",
NA, "M",
NA, "D",
NA, NA
)
# styler: on
# <=
pick_dbh_max(census, 100)
pick_dbh_max(census, 100, na.rm = TRUE)
# >=
pick_dbh_min(census, 100)
pick_dbh_min(census, 100, na.rm = TRUE)
# <
pick_dbh_under(census, 100)
pick_dbh_under(census, 100, na.rm = TRUE)
# >
pick_dbh_over(census, 100)
pick_dbh_over(census, 100, na.rm = TRUE)
# Same, but `subset()` does not let you keep NAs.
subset(census, dbh > 100)
# ==
pick_status(census, "A")
pick_status(census, "A", na.rm = TRUE)
# !=
drop_status(census, "D")
drop_status(census, "D", na.rm = TRUE)
# Compose
pick_dbh_over(
drop_status(census, "D", na.rm = TRUE),
100
)
# More readable as a pipeline
census %>%
drop_status("D", na.rm = TRUE) %>%
pick_dbh_over(100)
# Also works with ViewFullTables
# styler: off
vft <- tribble(
~DBH, ~Status,
0, "alive",
50, "alive",
100, "alive",
150, "alive",
NA, "missing",
NA, "dead",
NA, NA
)
# styler: on
pick_dbh_max(vft, 100)
pick_status(vft, "alive", na.rm = TRUE)
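The effect of na.rm can be sketched in base R: a comparison with a missing dbh yields NA, and the only question is whether such rows are kept or dropped. pick_dbh_over_sketch() is a hypothetical helper that assumes a lowercase dbh column.

```r
# Hypothetical helper: keep rows with dbh > value; rows with missing dbh are
# kept unless na.rm = TRUE.
pick_dbh_over_sketch <- function(data, value, na.rm = FALSE) {
  keep <- data$dbh > value       # NA where dbh is missing
  keep[is.na(keep)] <- !na.rm    # preserve or drop the missing rows
  data[keep, , drop = FALSE]
}

census <- data.frame(dbh = c(0, 50, 100, 150, NA))
with_na <- pick_dbh_over_sketch(census, 100)
without_na <- pick_dbh_over_sketch(census, 100, na.rm = TRUE)
```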
Pick the main stem or main stemid(s) of each tree in each census.
Description
- pick_main_stem() picks a unique row for each treeID per census.
- pick_main_stemid() picks a unique row for each stemID per census. It is only useful when a single stem was measured twice in the same census, which sometimes happens to correct for the effect of large buttresses.
Usage
pick_main_stem(data)
pick_main_stemid(data)
Arguments
data |
A ForestGEO-like dataframe: A ViewFullTable, tree or stem table. |
Details
- pick_main_stem() picks the main stem of each tree in each census. It collapses data of multi-stem trees by picking a single stem per treeid per censusid. From this group, it picks the stem at the top of a list sorted first by descending order of hom and then by descending order of dbh. This corrects the effect of buttresses and picks the main stem. It ignores groups of grouped data and rejects data with multiple plots.
- pick_main_stemid() does one step less than pick_main_stem(). It only picks the main stemid(s) of each tree in each census and keeps all stems per treeid. This is useful when calculating the total basal area of a tree, because you need to sum the basal area of each individual stem as well as sum only one of the potentially multiple measurements of each buttressed stem per census.
Value
A dataframe with a single plotname, and one row per treeid per censusid.
Warning
These functions may be slow. They are fastest if the data already has a single stem per treeid. They are slower with data containing multiple stems per treeid (per censusid), which is the main reason for using this function. The slowest scenario is when data also contains duplicated values of stemid per treeid (per censusid). This may happen if trees have buttresses, in which case these functions check every stem for potential duplicates and pick the one with the largest hom value.
For example, on a Windows computer with 32 GB of RAM, a dataset with 2 million rows with multiple stems and buttresses took about 3 minutes to run, whereas a dataset with 2 million rows made up entirely of main stems took about ten seconds.
See Also
Other functions to pick or drop rows of a ForestGEO dataframe:
pick_drop
Examples
# One `treeID` with multiple stems.
# `stemID == 1.1` has two measurements (due to buttresses).
# `stemID == 1.2` has a single measurement.
# styler: off
census <- tribble(
~sp, ~treeID, ~stemID, ~hom, ~dbh, ~CensusID,
"sp1", "1", "1.1", 140, 40, 1, # main stemID (max `hom`)
"sp1", "1", "1.1", 130, 60, 1,
"sp1", "1", "1.2", 130, 55, 1 # main stemID (only one)
)
# styler: on
# Picks a unique row per unique `treeID`
pick_main_stem(census)
# Picks a unique row per unique `stemID`
pick_main_stemid(census)
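The sorting rule described in Details (descending hom, then descending dbh, keep the first row per group) can be sketched in base R. pick_top_sketch() is a hypothetical helper; it skips the package's checks for grouped data and multiple plots.

```r
census <- data.frame(
  treeID = c("1", "1", "1"),
  stemID = c("1.1", "1.1", "1.2"),
  hom = c(140, 130, 130),
  dbh = c(40, 60, 55),
  CensusID = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort by descending hom, then descending dbh, and keep
# the first row of each census-by-id group.
pick_top_sketch <- function(data, id) {
  sorted <- data[order(-data$hom, -data$dbh), , drop = FALSE]
  key <- paste(sorted$CensusID, sorted[[id]])
  sorted[!duplicated(key), , drop = FALSE]
}

main_stem <- pick_top_sketch(census, "treeID")    # one row per tree
main_stemid <- pick_top_sketch(census, "stemID")  # one row per stem
```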
Import ViewFullTable or ViewTaxonomy data from a .tsv or .csv file.
Description
read_vft() and read_taxa() help you to read ViewFullTable and ViewTaxonomy data from text files delivered by the ForestGEO database. These functions avoid common problems with column separators, missing values, column names, and column types.
Usage
read_vft(file, delim = NULL, na = c("", "NA", "NULL"), ...)
read_taxa(file, delim = NULL, na = c("", "NA", "NULL"), ...)
Arguments
file |
A path to a file. |
delim |
Single character used to separate fields within a record. The
default ( |
na |
Character vector of strings to interpret as missing values. Set this
option to |
... |
Other arguments passed to |
Value
A tibble.
Acknowledgments
Thanks to Shameema Jafferjee Esufali for inspiring the feature that automatically detects delim (issue 65).
See Also
readr::read_delim(), type_vft(), type_taxa().
Other functions to read text files delivered by ForestGEO's database: type_vft()
Examples
assert_is_installed("fgeo.x")
library(fgeo.x)
example_path()
file_vft <- example_path("view/vft_4quad.csv")
read_vft(file_vft)
file_taxa <- example_path("view/taxa.csv")
read_taxa(file_taxa)
Recode subquadrat.
Description
Recode subquadrat.
Usage
recode_subquad(data, offset = -1)
Arguments
data |
A dataframe with the variable |
offset |
A number; either -1 or 1, to subtract or add one unit to the column number of each subquadrat.
First column is 0      First column is 1
-----------------      -----------------
04  14  24  34         14  24  34  44
03  13  23  33         13  23  33  43
02  12  22  32         12  22  32  42
01  11  21  31         11  21  31  41 |
Value
A modified version of the input.
Examples
first_subquad_11 <- tibble(subquadrat = c("11", "12", "22"))
first_subquad_11
first_subquad_01 <- recode_subquad(first_subquad_11, offset = -1)
first_subquad_01
first_subquad_11 <- recode_subquad(first_subquad_01, offset = 1)
first_subquad_11
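The recoding amounts to shifting the first digit of each label, which is the subquadrat column, by offset. The sketch below works on a plain character vector and assumes single-digit columns and rows; recode_subquad_sketch() is a hypothetical helper.

```r
# Hypothetical helper: shift the column digit, keep the row digit.
recode_subquad_sketch <- function(subquadrat, offset = -1) {
  col <- as.integer(substr(subquadrat, 1, 1)) + offset
  row <- substr(subquadrat, 2, 2)
  paste0(col, row)
}

first_col_0 <- recode_subquad_sketch(c("11", "12", "22"), offset = -1)
round_trip <- recode_subquad_sketch(first_col_0, offset = 1)
```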
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- dplyr: add_count, arrange, count, filter, group_by, mutate, select, summarise, summarize, ungroup
- rlang
- tibble
- tidyselect: contains, ends_with, everything, last_col, matches, num_range, one_of, starts_with
Rename an object based on case-insensitive match of the names of a reference.
Description
Rename an object based on case-insensitive match of the names of a reference.
Usage
rename_matches(x, y)
Arguments
x |
Object whose names are to be restored if they match the reference. |
y |
Named object to use as reference. |
Value
The output is x with as many names changed as case-insensitive matches there are with the reference.
See Also
Other functions for developers: check_crucial_names(), extract_insensitive(), flag_if_group(), is_multiple(), nms_try_rename(), type_ensure()
Other general functions to deal with names: extract_insensitive()
Examples
ref <- data.frame(COL1 = 1, COL2 = 1)
x <- data.frame(col1 = 5, col2 = 1, n = 5)
rename_matches(x, ref)
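The renaming step can be sketched with match() on lowercased names. rename_matches_sketch() is a hypothetical helper that assumes names are unique once lowercased.

```r
# Hypothetical helper: where a name of `x` matches a name of `y` ignoring
# case, take the spelling used in `y`; leave other names untouched.
rename_matches_sketch <- function(x, y) {
  idx <- match(tolower(names(x)), tolower(names(y)))
  found <- !is.na(idx)
  names(x)[found] <- names(y)[idx[found]]
  x
}

ref <- data.frame(COL1 = 1, COL2 = 1)
x <- data.frame(col1 = 5, col2 = 1, n = 5)
renamed <- rename_matches_sketch(x, ref)
names(renamed)
```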
Fix common problems in ViewFullTable and ViewTaxonomy data.
Description
These functions fix common problems of ViewFullTable and ViewTaxonomy data:
- Ensure that each column has the correct type.
- Ensure that missing values are represented with NAs – not with the literal string "NULL".
Usage
sanitize_vft(.data, na = c("", "NA", "NULL"), ...)
sanitize_taxa(.data, na = c("", "NA", "NULL"), ...)
Arguments
.data |
A dataframe; either a ForestGEO ViewFullTable
( |
na |
Character vector of strings to interpret as missing values. Set this
option to |
... |
Arguments passed to |
Value
A dataframe.
Acknowledgments
Thanks to Shameema Jafferjee Esufali for motivating these functions.
Examples
assert_is_installed("fgeo.x")
vft <- fgeo.x::vft_4quad
# Introduce problems to show how to fix them
# Bad column types
vft[] <- lapply(vft, as.character)
# Bad representation of missing values
vft$PlotName <- "NULL"
# "NULL" should be replaced by `NA` and `DBH` should be numeric
str(vft[c("PlotName", "DBH")])
# Fix
vft_sane <- sanitize_vft(vft)
str(vft_sane[c("PlotName", "DBH")])
taxa <- read.csv(fgeo.x::example_path("taxa.csv"))
# E.g. inserting bad column types
taxa[] <- lapply(taxa, as.character)
# E.g. inserting bad representation of missing values
taxa$SubspeciesID <- "NULL"
# "NULL" should be replaced by `NA` and `ViewID` should be integer
str(taxa[c("SubspeciesID", "ViewID")])
# Fix
taxa_sane <- sanitize_taxa(taxa)
str(taxa_sane[c("SubspeciesID", "ViewID")])
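The two fixes (recoding strings such as "NULL" to NA and restoring column types) can be sketched in base R on a toy table. sanitize_sketch() and its numeric_cols argument are hypothetical; the package functions instead rely on the type specifications of the known ForestGEO tables.

```r
# Hypothetical helper: first recode the `na` strings as NA, then convert the
# named columns to numeric.
sanitize_sketch <- function(data, numeric_cols, na = c("", "NA", "NULL")) {
  data[] <- lapply(data, function(col) {
    col[col %in% na] <- NA
    col
  })
  data[numeric_cols] <- lapply(data[numeric_cols], as.numeric)
  data
}

vft <- data.frame(
  PlotName = c("NULL", "NULL"),
  DBH = c("10", "NULL"),
  stringsAsFactors = FALSE
)
sane <- sanitize_sketch(vft, numeric_cols = "DBH")
str(sane)
```

Recoding missing values before converting types avoids the coercion warnings that as.numeric("NULL") would otherwise raise.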
Tidy eval helpers
Description
This page lists the tidy eval tools reexported in this package from rlang. To learn about using tidy eval in scripts and packages at a high level, see the dplyr programming vignette and the ggplot2 in packages vignette. The Metaprogramming section of Advanced R may also be useful for a deeper dive.
- The tidy eval operators {{, !!, and !!! are syntactic constructs which are specially interpreted by tidy eval functions. You will mostly need {{, as !! and !!! are more advanced operators which you should not have to use in simple cases.
The curly-curly operator {{ allows you to tunnel data-variables passed from function arguments inside other tidy eval functions. {{ is designed for individual arguments. To pass multiple arguments contained in dots, use ... in the normal way.
my_function <- function(data, var, ...) {
  data %>%
    group_by(...) %>%
    summarise(mean = mean({{ var }}))
}
- rlang::enquo() and rlang::enquos() delay the execution of one or several function arguments. The former returns a single expression, the latter returns a list of expressions. Once defused, expressions will no longer evaluate on their own. They must be injected back into an evaluation context with !! (for a single expression) and !!! (for a list of expressions).
my_function <- function(data, var, ...) {
  # Defuse
  var <- enquo(var)
  dots <- enquos(...)
  # Inject
  data %>%
    group_by(!!!dots) %>%
    summarise(mean = mean(!!var))
}
In this simple case, the code is equivalent to the usage of {{ and ... above. Defusing with enquo() or enquos() is only needed in more complex cases, for instance if you need to inspect or modify the expressions in some way.
- The .data pronoun is an object that represents the current slice of data. If you have a variable name in a string, use the .data pronoun to subset that variable with [[.
my_var <- "disp"
mtcars %>% summarise(mean = mean(.data[[my_var]]))
- Another tidy eval operator is :=. It makes it possible to use glue and curly-curly syntax on the LHS of =. For technical reasons, the R language doesn't support complex expressions on the left of =, so we use := as a workaround.
my_function <- function(data, var, suffix = "foo") {
  # Use `{{` to tunnel function arguments and the usual glue
  # operator `{` to interpolate plain strings.
  data %>%
    summarise("{{ var }}_mean_{suffix}" := mean({{ var }}))
}
- Many tidy eval functions like dplyr::mutate() or dplyr::summarise() give an automatic name to unnamed inputs. If you need to create the same sort of automatic names by yourself, use as_label(). For instance, the glue-tunnelling syntax above can be reproduced manually with:
my_function <- function(data, var, suffix = "foo") {
  var <- enquo(var)
  prefix <- as_label(var)
  data %>%
    summarise("{prefix}_mean_{suffix}" := mean(!!var))
}
Expressions defused with enquo() (or tunnelled with {{) need not be simple column names, they can be arbitrarily complex. as_label() handles those cases gracefully. If your code assumes a simple column name, use as_name() instead. This is safer because it throws an error if the input is not a name as expected.
Ensure the specific columns of a dataframe have a particular type.
Description
Ensure the specific columns of a dataframe have a particular type.
Usage
type_ensure(df, ensure_nms, type = "numeric")
Arguments
df |
A dataframe. |
ensure_nms |
Character vector giving names of |
type |
A string giving the type to ensure in columns |
Value
A modified version of df, with columns (specified in ensure_nms) of type type.
See Also
Other functions to operate on column types: type_vft()
Other functions for developers: check_crucial_names(), extract_insensitive(), flag_if_group(), is_multiple(), nms_try_rename(), rename_matches()
Examples
dfm <- tibble(
w = c(NA, 1, 2),
x = 1:3,
y = as.character(1:3),
z = letters[1:3]
)
dfm
type_ensure(dfm, c("w", "x", "y"), "numeric")
type_ensure(dfm, c("w", "x", "y", "z"), "character")
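The coercion itself can be sketched in base R by looking up the matching as.*() function. type_ensure_sketch() is a hypothetical helper; unlike the sketch, the package function may also report which columns it changed.

```r
# Hypothetical helper: coerce the named columns with as.<type>().
type_ensure_sketch <- function(df, ensure_nms, type = "numeric") {
  as_type <- match.fun(paste0("as.", type))
  df[ensure_nms] <- lapply(df[ensure_nms], as_type)
  df
}

dfm <- data.frame(
  x = 1:3,
  y = as.character(1:3),
  z = letters[1:3],
  stringsAsFactors = FALSE
)
as_num <- type_ensure_sketch(dfm, c("x", "y"), "numeric")
as_chr <- type_ensure_sketch(dfm, c("x", "y", "z"), "character")
```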
Help to read ForestGEO data safely, with consistent column types.
Description
A common cause of problems is feeding functions with data whose columns are not all of the expected type. The problem often begins when reading data from a text file with functions such as utils::read.csv(), utils::read.delim(), and friends, which often guess a column type other than the one you expect. These common offenders are strongly discouraged; instead consider using readr::read_csv(), readr::read_tsv(), and friends, which guess column types correctly much more often than their analogs from the utils package.
type_vft() and type_taxa() help you to read data more safely by explicitly specifying what type to expect from each column of known datasets. These functions output the specification of column types used internally by read_vft() and read_taxa():
- type_vft(): Type specification for ViewFullTable.
- type_taxa(): Type specification for ViewTaxonomy.
Usage
type_vft()
type_taxa()
Details
Types reference (for more details see readr::read_delim()):
- c = character,
- i = integer,
- n = number,
- d = double,
- l = logical,
- D = date,
- T = date time,
- t = time,
- ? = guess,
- or _/- to skip the column.
Value
A list.
See Also
Other functions to operate on column types: type_ensure()
Other functions to read text files delivered by ForestGEO's database: read_vft()
Examples
assert_is_installed("fgeo.x")
library(fgeo.x)
library(readr)
str(type_vft())
read_csv(example_path("view/vft_4quad.csv"), col_types = type_vft())
str(type_taxa())
read_csv(example_path("view/taxa.csv"), col_types = type_taxa())