Version 0.0.0.9003
Welcome to climate modeler in R, climodr for short. This package uses point data from climate stations, spectral imagery, and elevation models to automatically create ready-to-use climate maps.
The idea of climodr is to deliver an easy-to-use method for creating high-quality climate maps, like the one we create in this vignette:
Note: This example is built with dummy data and will not produce a good model; it is for educational purposes only. Let's take a look at the basic structure of climodr:
Climodr is split into four main steps: Environment, Pre-Processing, Processing, and Plotting. With only these four steps, climodr creates basic but robust climate models and climate maps, without making things too complicated for the user.
This vignette guides you through the package, explains its functions, and gives you an idea of how to use climodr. The example contains a few dummy climate stations, a metadata file for the climate stations, a vector file containing our area of interest, a small multi-band satellite image, and a digital elevation model (DEM). That is everything one needs to run climodr!
This example will create climate maps for our climate sensor Ta_200, which is the air temperature measured at a height of 2 m above ground.
The idea of climodr is to speed up climate-modelling workflows and make them easier to use. The package expects you to store all relevant input data in one folder structure, and it does the rest using the example workflow provided in this vignette. The functions remain modifiable, so you can adjust the model workflow to your liking.
To get started with climodr, first download and install the package from CRAN.
You may be asked to install all packages climodr needs to execute its functions. Installing these packages is mandatory; otherwise climodr won't be able to run, as its comprehensive workflow draws on many of their functions.
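A minimal CRAN installation, including all required dependencies, looks like this:
# install climodr from CRAN together with the packages it depends on
install.packages("climodr", dependencies = TRUE)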
You can also install the latest development version from the Environmental Informatics Lab (envima) at Marburg University from GitHub. To do so, you need devtools installed. Once devtools is installed, you can add climodr with the following commands.
# install climodr (may take a bit, but shouldn't take longer than 5-10 minutes)
devtools::install_github(
  "https://github.com/envima/climodr.git",
  dependencies = TRUE,
  build_vignettes = TRUE)
Setting up climodr requires just one step before you can get started.
Attaching the package will print some conflict messages originating from the tidyverse. This is completely fine, as the masked functions are not affected. You can still use the original functions by addressing the package explicitly, e.g. stats::filter().
With the envi.create() function, you specify a path where the package should store all its data. There is also the memfrac argument, which sets the fraction of your RAM the terra package is allowed to use. By default this number is rather low, so raising it can speed up processing.
library(climodr)
# setting up the environment for climodr
envrmt <- envi.create(tempdir(),
memfrac = 0.8)
## Succesfully created an environment in C:\Users\ALEXAN~1\AppData\Local\Temp\RtmpAf8TXY.
## Loading example data for the climodr example..
## Saved climodr example dependency files to {C:\Users\ALEXAN~1\AppData\Local\Temp\RtmpAf8TXY/input/dep}.
## Saved climodr example raster files to {C:\Users\ALEXAN~1\AppData\Local\Temp\RtmpAf8TXY/input/raster}.
## Saved climodr example tabular files to {C:\Users\ALEXAN~1\AppData\Local\Temp\RtmpAf8TXY/input/tabular}.
## Saved climodr example vector files to {C:\Users\ALEXAN~1\AppData\Local\Temp\RtmpAf8TXY/input/vector}.
## Done loading all the example files. You are ready to continue.
# remove everything in the global environment except our environment path list
rm(list = setdiff(ls(), "envrmt"))
Climodr then creates an environment with three main folders:
- Input (for all necessary data the user must bring)
- Output (for ready-to-use data created by climodr)
- Workflow (for climodr to store data during the process)
The Input-Directory is where all data to be used for modelling should be saved beforehand. It consists of four folders:
- dep (dependencies, like a resolution image or metadata)
- raster (raster data, work in progress)
- tabular (tabular data, containing climate data from the climate stations)
- vector (vector data, like the study area or climate station point data)
See the list of possible inputs for further details on what kind of input data can be used.
The Output-Folder is where all final data created by the package is stored. It consists of three folders:
- maps (basic ready-to-use maps)
- predictions (plain prediction imagery)
- statistics (performance of the predictions and other statistics)
The Output-Directory contains all the ready-to-use data in some basic formats, which should be publication-ready unless you have other needs or requirements.
The Workflow-Directory contains all intermediate steps between the Input and the Output: models, test and training data, clean tabular data, and so on.
Note:
- Do not delete any of these folders; climodr requires them to run properly!
- The higher you set the fraction of RAM that climodr may use, the slower the PC will become while climodr is running, in case you want to work on something else in parallel. With a fraction > 0.8 it can even be hard to use a browser while climodr is running.
A small showcase dataset has been prepared for this package and ships with climodr. It is a small scene located in Hainich National Park in Germany, with ten climate stations.
Note: This is just an example with very few stations and a very small scene; it will not result in a good model and is only used for educational purposes.
You do not have to start from raw, unprocessed data. If your data already matches one of the stages of this pre-processing workflow, you can step in at the corresponding stage. For now, you need to make sure the data matches the patterns climodr produces in this workflow.
First, we have to prepare the raw tabular data for further use. The prep.csv function cleans up the data and removes all NA values.
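The prep.csv call itself is not printed in this section; a minimal sketch, assuming a save_output argument analogous to the other climodr calls in this vignette, could look like this:
# clean the raw station data and drop NA values
# (save_output = TRUE is an assumption, mirroring the later calls)
prep.csv(envrmt = envrmt,
         save_output = TRUE)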
## Removed NAs from 10 stations
# check the created csv files
csv_files <- grep("_no_NAs.csv$",
                  list.files(envrmt$path_tworkflow),
                  value = TRUE)
csv_files
## [1] "Station_G06_no_NAs.csv" "Station_G17_no_NAs.csv" "Station_G20_no_NAs.csv"
## [4] "Station_G21_no_NAs.csv" "Station_G25_no_NAs.csv" "Station_G48_no_NAs.csv"
## [7] "Station_W10_no_NAs.csv" "Station_W11_no_NAs.csv" "Station_W19_no_NAs.csv"
## [10] "Station_W20_no_NAs.csv"
Next, the data needs to be aggregated to the desired time steps. In this version one can aggregate data into “monthly” and “annual” data.
The rbind argument stores all climate station data in one file. This step is recommended, since the data usually becomes much shorter after the time aggregation and is easier to process further this way.
csv_data <- proc.csv(envrmt = envrmt,
method = "monthly",
rbind = TRUE,
save_output = TRUE)
head(csv_data)
## plot datetime year month monthly_mean_Ta_200
## 1 Station_G06 2017-07-01 12:00:00 2017 7 17.32016
## 2 Station_G17 2017-07-01 12:00:00 2017 7 17.21474
## 3 Station_G20 2017-07-01 12:00:00 2017 7 17.47536
## 4 Station_G21 2017-07-01 12:00:00 2017 7 17.32417
## 5 Station_G25 2017-07-01 12:00:00 2017 7 17.48638
## 6 Station_G48 2017-07-01 12:00:00 2017 7 17.91024
Next, the stations have to be spatially located in a coordinate system. This step is crucial for processing the data in the modelling steps.
csv_spat <- spat.csv(envrmt = envrmt,
method = "monthly",
des_file = "plot_description.csv",
save_output = TRUE)
head(csv_spat)
## plot datetime year month Ta_200 x y
## 1 Station_G06 2017-07-01 12:00:00 2017 7 17.320 810379.0 5894758
## 2 Station_G17 2017-07-01 12:00:00 2017 7 17.215 811096.8 5893045
## 3 Station_G20 2017-07-01 12:00:00 2017 7 17.475 807298.3 5892820
## 4 Station_G21 2017-07-01 12:00:00 2017 7 17.324 806897.6 5894563
## 5 Station_G25 2017-07-01 12:00:00 2017 7 17.486 811406.0 5891638
## 6 Station_G48 2017-07-01 12:00:00 2017 7 17.910 807097.4 5894840
Now that we have spatial points for our stations, we can continue with our raster data. The preferred method here is "MB_Timeseries", which stands for multi-band time series. Use this method if you provide multiple single-band rasters or raster stacks with different time stamps (YYYYMMDD…) in the file names per scene. The function sorts them by date and crops the data to our study area.
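The cropping call itself is omitted above. As a hypothetical sketch (the function name prep.raster and its arguments are assumptions; check the package help for the actual raster preparation function):
# hypothetical: crop and date-sort the input rasters for the study area
prep.raster(envrmt = envrmt,
            method = "MB_Timeseries", # multi-band time series input
            overwrite = TRUE)         # argument name is an assumption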
## Reading in raster 1/2.
## Cropping raster...
## Saving raster to rworkflow [C:\Users\ALEXAN~1\AppData\Local\Temp\RtmpAf8TXY/workflow/rworkflow].
## Reading in raster 2/2.
## Cropping raster...
## Saving DGM to rfinal [C:\Users\ALEXAN~1\AppData\Local\Temp\RtmpAf8TXY/output/rfinal].
One important step in climate modelling is having good predictor variables, which we extract from our spatial raster imagery. In this case we have the 10 spectral bands from the spectral raster stack and one additional elevation layer. This data is already usable for prediction, but may not be sufficient. We can enhance our predictors by creating even more spatial raster layers: by calculating spectral indices, which are derived from different spectral bands, we gain additional information from these layers. For example, we can calculate the NDVI, which indicates the presence or absence of chlorophyll and thus of vegetation. These layers can also be fed into our model as new predictor variables.
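As a conceptual illustration (this is not climodr's internal code), the NDVI is the normalized difference of the near-infrared and red bands, which terra computes with simple raster algebra:
library(terra)
# toy two-layer raster standing in for the red and NIR reflectance bands
r <- rast(ncols = 10, nrows = 10, nlyrs = 2)
values(r) <- cbind(runif(ncell(r), 0.0, 0.3),  # red
                   runif(ncell(r), 0.4, 0.9))  # near infrared
names(r) <- c("red", "nir")
# NDVI = (NIR - red) / (NIR + red); values close to 1 indicate dense vegetation
ndvi <- (r[["nir"]] - r[["red"]]) / (r[["nir"]] + r[["red"]])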
Next, we calculate some basic indices to create more predictor variables for our models. The vi argument selects the vegetation indices to create. You can either list the desired indices in a vector, or simply use "all" to generate all available indices. For more detailed information, run ?calc.indices in the console.
calc.indices(envrmt = envrmt,
vi = "all",
bands = c("blue", "green", "red",
"nir", "nirb",
"re1", "re2", "re3",
"swir1", "swir2"),
overwrite = TRUE)
## Number of Scenes to calculate Indices for: 1
## Saving rasterstack with indices to rfinal [C:\Users\ALEXAN~1\AppData\Local\Temp\RtmpAf8TXY/output/rfinal].
Now that we have spatial points as well as raster data, we can extract additional predictor variables at the station points from the spatial raster data. For this we use the fin.csv function. It uses the positions we added to our climate station data with spat.csv to extract the raster values at these positions from every layer of the spatial raster data and adds them to each corresponding climate station. During the modelling steps, this climate station csv-file will be used to generate spatial points, which are then used to train our models.
Reminder: You can check your data at any step; just go into the Workflow folder of the project directory you defined in the beginning with `envi.create` and take a look at the data. Just make sure not to alter it, as this may cause climodr not to run subsequent functions correctly.
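The fin.csv call producing the table below is not shown; based on the other calls in this vignette, it presumably looks like the following (the exact argument names besides envrmt are assumptions):
# extract raster values at the station positions and append them as predictors
csv_fin <- fin.csv(envrmt = envrmt,
                   method = "monthly",  # same aggregation as before (assumption)
                   save_output = TRUE)  # assumption, as in earlier calls
head(csv_fin)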
## plot datetime year month Ta_200 x y elevation
## 1 Station_G06 2017-07-01 12:00:00 2017 7 17.320 810379.0 5894758 59.55454
## 2 Station_G17 2017-07-01 12:00:00 2017 7 17.215 811096.8 5893045 59.61478
## 3 Station_G20 2017-07-01 12:00:00 2017 7 17.475 807298.3 5892820 59.18333
## 4 Station_G21 2017-07-01 12:00:00 2017 7 17.324 806897.6 5894563 54.01669
## 5 Station_G25 2017-07-01 12:00:00 2017 7 17.486 811406.0 5891638 60.84670
## 6 Station_G48 2017-07-01 12:00:00 2017 7 17.910 807097.4 5894840 54.34514
## blue green red nir nirb re1 re2 re3
## 1 381.8609 594.5724 494.7336 2659.411 2539.976 845.2596 2039.955 2437.255
## 2 272.3525 466.0617 326.9346 2654.324 2504.981 696.6929 1987.608 2377.730
## 3 208.2174 400.8459 248.1707 2307.992 2201.704 604.3890 1759.978 2074.325
## 4 360.4428 687.6474 510.7201 3604.908 3454.017 1061.7035 2734.079 3258.767
## 5 263.9718 407.4745 293.0677 2107.948 1994.610 579.5432 1579.203 1893.036
## 6 373.8912 632.2380 528.5673 2631.460 2502.235 928.6302 2023.512 2378.249
## swir1 swir2 NDVI NDWI NDBI NDBSI MSAVI
## 1 1425.205 872.6232 0.6862962 -0.6345572 -0.3021595 -0.2260200 0.8139402
## 2 1360.370 683.8118 0.7806734 -0.7012794 -0.3223044 -0.2686122 0.8768091
## 3 1068.956 494.7866 0.8058256 -0.7040459 -0.3669099 -0.3128039 0.8924525
## 4 2054.615 1044.7113 0.7518142 -0.6796093 -0.2739265 -0.2143751 0.8583096
## 5 1066.651 528.1791 0.7558803 -0.6760191 -0.3280091 -0.2712485 0.8609418
## 6 1668.608 918.8693 0.6654666 -0.6125634 -0.2239154 -0.1553429 0.7991048
Now the data is ready for further modelling.
# Processing
At this point the spatial raster data and the climate station data are ready to use. If your data isn’t, check out the ‘Pre-Processing’ chapter.
First, one tests the data for autocorrelation. The evaluation vector contains all columns with the sensor data and predictor variables that will be tested for autocorrelation. This creates the first outputs of the package: one tabular file per sensor, listing all columns that should be excluded from the modelling because they autocorrelate. It also creates a visualization of the autocorrelation for the user if plot.corrplot is set to TRUE.
Note: The visualization can get quite messy when there are a lot of predictors.
autocorr(
envrmt = envrmt,
method = "monthly",
resp = 5,
pred = c(8:23),
plot.corrplot = TRUE,
corrplot = "coef"
)
## Calculated autocorrelations for these sensors [Ta_200].
Now that we’ve done all the necessary previous steps, one can start modelling. This is by far the function with the most arguments; here we give a quick overview of what they do. For more detailed information, take a look at the associated paper for this package at (to be published).
- timespan = Vector of years to build models from (in this example 2017).
- climresp = Vector of the column positions to create models for (in this example Ta_200).
- classifier = Vector of all model variants to be used. In this case:
  - random forest = “rf”
  - partial-least-squares = “pls”
  - neural networks = “nnet”
  - linear regression = “lm”
- seed = Number to “pick randomness”. With the seed one can reproduce random draws.
- p = Fraction of the full data randomly drawn as training data.
- folds = Character or vector. Method to create spacetime folds: “all”, “LLO”, “LTO” or “LLTO”.
- mnote = Character, the “model note”. Marks the different model runs in a project.
- predrows = Vector with the positions (columns) of the variables used as predictors.
- tc_method = Train control method. Default is cross-validation, “cv”.
- metric = Summary metric to select the optimal model. Default: root mean square error, “RMSE”.
- autocorrelation = Logical. TRUE if the results of the autocorrelation test should be considered.
- doParallel = Logical. When set to TRUE, the model process will parallelize on all cores except two, so your PC will slow down a lot. Only recommended for PCs with at least 8 cores. Warning: your PC won’t be able to process other tasks efficiently during parallelization.
Once all parameters are set, one can run the model workflow with calc.model like this:
calc.model(
envrmt = envrmt,
method = "monthly",
timespan = c(2017),
climresp = c(5),
classifier = c(
"rf",
"pls",
"lm"),
seed = 707,
p = 0.8,
folds = "LLO",
mnote = "vignette",
predrows = c(8:23),
tc_method = "cv",
metric = "RMSE",
autocorrelation = TRUE,
doParallel = FALSE)
## Training models for 2017. Year-Nr.: 1/1
## Training monthly models for 2017. Month-Nr.: 7
## Use autocorellation data for filtering..
## Run with spatial folds for cross validation. Fold-Nr.: 1/1
## Calculate models for sensor: Ta_200
## Next model: Random Forest. 1/3
## Computing model...
## Loading required package: ggplot2
## Loading required package: lattice
## Note: No increase in performance found using more than 2 variables
## Next model: Partial-Least-Squares. 2/3
## Computing model...
## Note: No increase in performance found using more than 3 variables
## Next model: Linear Regression. 3/3
## Computing model...
## Note: No increase in performance found using more than 2 variables
## Done! Saving evaluation data frame.
## year_month classifier accuracy Nrmse Rsqrd sensor modeltype note
## 1 201707 raf 0.2325688 0.1778049 NaN Ta_200 LLO vignette
## 2 201707 pls 0.1762902 0.1347784 NaN Ta_200 LLO vignette
## 3 201707 lim 0.1516702 0.1159558 NaN Ta_200 LLO vignette
## variables
## 1 NDBI, NDBSI
## 2 NDWI, NDBI, blue
## 3 re2, swir1
Congratulations, you have created your first models using climodr!
Climodr also creates an evaluation data frame, which is saved in the statistics folder. This performance information is later used to select the best models from this model run for prediction.
Next, we can predict the scenes from our example with the climate models from our spatial station data using the climpred function. climpred also calculates the area of applicability if the AOA argument is set to TRUE. In this example the AOA will flag a lot of dissimilarities, because the sample data is dummy data and far from the reality of the new data. Keep in mind to use the same mnote as in your model run, so climodr predicts with the models you created in that run.
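The climpred call producing the output below is omitted; a sketch, assuming arguments consistent with the rest of this vignette (only AOA and mnote are documented above, the others are assumptions):
# predict the scenes with the models from the "vignette" run
climpred(envrmt = envrmt,
         method = "monthly",  # assumption, as in earlier steps
         mnote = "vignette",  # must match the model run
         AOA = TRUE)          # also compute the area of applicability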
## Making Ta_200 lim-prediction for 201707.
## |======================================================================| 100%
Let's show the list of predictions:
## [1] "aoa/vignette_Ta_200_201707_LLO_lim_aoa.tif"
## [2] "vignette_Ta_200_201707_LLO_lim_prediction.tif"
For easier navigation and searching, the names of the predictions, as well as all other names created by climodr, follow a pattern. They always consist of: the mnote - the sensor name - the date of the scene - the folds used during the model run - the model classifier - and the word prediction.
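For example, splitting one of the prediction file names from above at the underscores recovers these components (plain R; note that the sensor name Ta_200 itself contains an underscore and therefore falls apart in this naive split):
# decompose a prediction file name into the climodr naming components
fname <- "vignette_Ta_200_201707_LLO_lim_prediction.tif"
strsplit(tools::file_path_sans_ext(fname), "_")[[1]]
## [1] "vignette"   "Ta"         "200"        "201707"     "LLO"        "lim"        "prediction"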
The climplot function finally plots your predictions and saves them in the output folder. It uses the plotting functions from the terra package. These plots are very simple, but they contain all the important information you'll need in a map.
You can create plots of your predictions like this:
climplot(
envrmt = envrmt,
mnote = "vignette",
sensor = "Ta_200",
aoa = TRUE,
mapcolors = rev(heat.colors(50)),
scale_position = "bottomleft",
north_position = "topright"
)
## Creating maps for sensor [Ta_200].
So in the end you receive a map like the one we saw in the beginning.