Spatial Machine Learning with mlr3

Author

Marvin Ludwig

Published

Last Updated: March 2025

1 Aims of this post

This post aims to give a minimal example of how to use mlr3 for a spatial prediction task. We want to get from measurements of temperature at specific locations in Spain to a spatially continuous map of temperature for all of Spain.

Such a spatial prediction task is often done by applying machine learning algorithms that were not necessarily developed for spatial tasks specifically and hence do not consider problems we might encounter in the spatial world, e.g., spatial autocorrelation or map extrapolation. In the last decade, various research groups have made many methodological developments to deal with such specialties of spatial mapping. Many of these have found their way into software packages such as mlr3.

2 Setup

install.packages("mlr3verse")
install.packages("mlr3spatiotempcv")
install.packages("sf")
install.packages("terra")
library(sf)
library(terra)

2.1 Case study data

covariates <- terra::rast("https://github.com/LOEK-RS/FOSSGIS2025-examples/raw/refs/heads/main/data/predictors.tif")
temperature <- sf::read_sf("https://github.com/LOEK-RS/FOSSGIS2025-examples/raw/refs/heads/main/data/temp_train.gpkg")

spain <- sf::read_sf("https://github.com/LOEK-RS/FOSSGIS2025-examples/raw/refs/heads/main/data/spain.gpkg") |>
    st_cast("POLYGON") |>
    st_transform(st_crs(temperature))

temperature <- terra::extract(covariates, temperature, bind = TRUE) |>
    sf::st_as_sf()

# the sf object cannot contain columns named "X" or "Y" - otherwise the task creation
# fails with "Assertion on 'data' failed: Must have unique colnames"
temperature$X <- NULL
temperature$Y <- NULL

Some words about the terminology that is specific to the example data:

  • spain is the region outline - used for visualization purposes and the kNNDM setup
  • covariates are the spatially explicit data to predict on (called covariates here to avoid confusion with the predict function or the prediction)
  • temperature contains the measured temperature data (i.e., the response variable, the ground truth) along with the covariates at the measurement locations
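
A quick visual check of the case-study data (a minimal sketch using base sf plotting, assuming the objects loaded above):

plot(sf::st_geometry(spain))
plot(temperature["temp"], add = TRUE, pch = 16)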

2.2 Official Documentation

The best general introduction to mlr3 is probably the official mlr3 book, which you can access for free here: https://mlr3book.mlr-org.com/. The homepage https://mlr-org.com/ also gives a very detailed overview of all the associated packages and current developments.
The individual packages also have their own documentation pages, where you will find more specific topics. The “spatial” packages are the following:

library(mlr3verse)
library(mlr3spatiotempcv)
library(mlr3viz)

mlr3 functions can be very verbose. For this blog post, I turned off messages for a less overwhelming tutorial.

lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")

3 Getting started with mlr3

First of all, mlr3 uses R6 classes and therefore follows an object-oriented design paradigm. This might not be intuitive for many R users and is more in line with something like scikit-learn in Python. In general, with R6 classes you define an object first, and this object contains all the methods and parameters you can access.

The R6 approach has a significant drawback when it comes to RStudio’s convenience features: there is no in-line popup help for the available parameters of a model. This means you have to manually look up the parameters, e.g., those needed for the ranger() function.
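
One workaround is to query the object itself from the console (a minimal sketch; $param_set and $help() are standard fields/methods of mlr3 learners):

learner <- mlr3::lrn("regr.ranger")
learner$param_set$ids() # names of all available hyperparameters
learner$help()          # open the help page of this learner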

Let’s define a minimal spatial prediction task example with mlr3 classes.

mlr3 uses its own terminology, and you have to know a number of specific terms to use the functions comfortably. I try to use the mlr3 terminology here and give some alternative terms in parentheses.

3.1 Task

The task defines general information about the data we have at hand. From our point observations we create the task and define which column is the target (response / dependent) variable, which column contains the geometry (coordinates), and whether we want to use the coordinates as features (predictors). In this example, we also split the data in our task into a training and a test partition.

task_spain <- mlr3spatial::as_task_regr_st(
    temperature,
    target = "temp",
    coordinate_names = "geometry",
    coords_as_features = TRUE,
    crs = st_crs(temperature)
)

train_test_split <- mlr3::partition(task_spain, ratio = 0.7)
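
A quick sanity check of the split (a sketch; partition() returns a list of row id vectors):

lengths(train_test_split) # number of observations per partition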

In caret, if we want to use spatial data, we have to explicitly exclude the geometry column of the sf object, so we lose the spatial information during model training and prediction. However, this information can be critical, e.g., if we want to use functions such as CAST::errorProfiles() or CAST::knndm(). In mlr3 we can keep the geometry and directly define whether we want to use the coordinates as predictors.
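
The retained geometry is easy to verify (a quick check; $coordinates() is provided by the spatial task classes):

task_spain$coordinates() # data.table with the point coordinates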

3.2 Learner

The learner is the model type or algorithm we want to use, with all its necessary or optional parameters. Here I define a random forest regression from the ranger package with 100 decision trees and an mtry of 4.

rfmodel <- mlr3::lrn("regr.ranger", num.trees = 100, mtry = 4)

3.3 Measure

In order to evaluate a score (model performance), e.g., on the previously defined test data partition, we need to define a measure. Here I use the root mean squared error (RMSE).

measure_rmse <- mlr3::msr("regr.rmse")

4 Training, Validation and Prediction

We now have everything defined to actually do something. To train the model, we use the train method of our defined learner. We have to tell the train method what to fit the model on – in this case the data in task_spain, but only the subset we sampled as training data.

rfmodel$train(task_spain, row_ids = train_test_split$train)
rfmodel$model
Ranger result

Call:
 ranger::ranger(dependent.variable.name = task$target_names, data = task$data(),      case.weights = task$weights$weight, mtry = 4L, num.threads = 1L,      num.trees = 100L) 

Type:                             Regression 
Number of trees:                  100 
Sample size:                      136 
Number of independent variables:  22 
Mtry:                             4 
Target node size:                 5 
Variable importance mode:         none 
Splitrule:                        variance 
OOB prediction error (MSE):       1.040724 
R squared (OOB):                  0.8595686 

To test how well our model performs on unseen data, we use the predict method of our fitted learner. Again, we tell the predict method what data to predict on. We can then calculate the score of this test prediction, i.e., the RMSE we defined earlier as our measure.

test_prediction <- rfmodel$predict(task_spain, row_ids = train_test_split$test)
test_prediction$score(measure_rmse)
regr.rmse 
 1.016704 
plot(test_prediction)

Finally, we can also use the model to predict temperatures for the whole raster with the predict() function from terra. Alternatively, we could use predict_spatial() from the mlr3spatial package.

prediction <- terra::predict(covariates, rfmodel)
plot(prediction)
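
The mlr3spatial variant would look roughly like this (a sketch; predict_spatial() takes the covariate raster and the trained learner and returns a SpatRaster):

prediction_mlr3 <- mlr3spatial::predict_spatial(covariates, rfmodel)
plot(prediction_mlr3)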

5 Spatial cross-validation hyperparameter tuning

In many cases, the ideal model hyperparameters are not obvious, but we can tune them based on the data at hand.

5.1 Define the cross-validation strategy and hyperparameters

Again, we first define the necessary mlr3 objects for our tuning process. With rsmp() we decide on a resampling strategy, i.e., the data partitions we use as cross-validation folds. The package mlr3spatiotempcv contains the popular spatial resampling strategies such as block CV (spcv_block, used in the example below) or kNNDM.

For the learner, instead of predefined hyperparameters, we use the function to_tune(), where we can specify different values for the hyperparameters we want to test.

library(mlr3spatiotempcv)

resampling_blockcv <- rsmp("spcv_block", folds = 5, range = 5000)

rfmodel <- lrn(
    "regr.ranger",
    num.trees = 100,
    mtry = to_tune(c(2, 4, 6, 10, 12)),
    min.node.size = to_tune(c(5, 10, 15)),
    importance = "impurity"
)

Now we can assemble everything in a tuning instance object with the ti() function. The first four arguments should be self-explanatory. The terminator determines whether we want to stop the search for the best hyperparameter combination once some criterion is met. store_benchmark_result and store_models regulate whether we want to keep the measures and models for all the hyperparameter combinations we test.

Finally, we have to decide on a search strategy via the tuner object. Here I use grid_search, the brute-force method that tests every possible combination once. We then execute the cross-validation with the optimize method of the tuner object.

tuning_blockcv <- ti(
    task = task_spain, # the data
    resampling = resampling_blockcv, # the folds
    learner = rfmodel, # the rfmodel with mtry and min.node.size to tune
    measures = measure_rmse, # how to measure performance
    terminator = trm("none"),
    store_benchmark_result = TRUE,
    store_models = TRUE
)

tuner_grid_search <- mlr3tuning::tnr("grid_search")

tuner_grid_search$optimize(tuning_blockcv)
   min.node.size   mtry learner_param_vals  x_domain regr.rmse
          <char> <char>             <list>    <list>     <num>
1:            15     12          <list[5]> <list[2]> 0.8520169

The results are stored in the defined tuning instance object. We also have some nice plotting options with the mlr3viz package.

tuning_blockcv$result_learner_param_vals
$importance
[1] "impurity"

$num.threads
[1] 1

$num.trees
[1] 100

$min.node.size
[1] 15

$mtry
[1] 12
tuning_blockcv$result_y
regr.rmse 
0.8520169 
autoplot(tuning_blockcv, type = "parallel")

Because we set store_benchmark_result = TRUE and store_models = TRUE, we also have an archive where all the other results are stored.

tuning_blockcv$archive$data
    min.node.size   mtry regr.rmse warnings errors runtime_learners
           <char> <char>     <num>    <int>  <int>            <num>
 1:             5     12 0.8657247        0      0            0.153
 2:            10      6 0.9098643        0      0            0.106
 3:             5      6 0.8953927        0      0            0.111
 4:            15      6 0.9194065        0      0            0.076
 5:             5      2 1.1126819        0      0            0.075
 6:            10     10 0.8602758        0      0            0.111
 7:             5      4 0.9483508        0      0            0.087
 8:            15      2 1.2024457        0      0            0.055
 9:            15     12 0.8520169        0      0            0.103
10:            15     10 0.8758340        0      0            0.115
11:            10      2 1.1484717        0      0            0.064
12:            15      4 0.9986427        0      0            0.071
13:             5     10 0.8696826        0      0            0.139
14:            10     12 0.8588805        0      0            0.113
15:            10      4 0.9687282        0      0            0.073
                                   uhash  x_domain           timestamp batch_nr
                                  <char>    <list>              <POSc>    <int>
 1: f746d058-03dd-48ad-b1a5-6da70cc71c39 <list[2]> 2025-03-17 15:50:04        1
 2: 40f80983-3657-4845-9a59-d7fc85fdf6b0 <list[2]> 2025-03-17 15:50:04        2
 3: 32acbb67-833d-4c78-b135-1ea60ca52021 <list[2]> 2025-03-17 15:50:04        3
 4: 5b8d2e6a-608a-4caf-974a-424319bcb60a <list[2]> 2025-03-17 15:50:04        4
 5: bf116d61-c38b-46ec-8be5-ca42ab97b781 <list[2]> 2025-03-17 15:50:04        5
 6: 113e3414-881c-479b-8dd6-e942116fc9b6 <list[2]> 2025-03-17 15:50:05        6
 7: 3442f3ca-bcb5-4f28-a262-9a23470c01d7 <list[2]> 2025-03-17 15:50:05        7
 8: a9ab533a-0e98-419e-bfc7-e7dbdb2d0959 <list[2]> 2025-03-17 15:50:05        8
 9: dcdb50c6-df5d-40c7-ac4b-a065bd25414c <list[2]> 2025-03-17 15:50:05        9
10: 74510c03-3d05-4972-aea6-207ac92d7b29 <list[2]> 2025-03-17 15:50:05       10
11: 8feb5dc5-9c1d-4a79-9dac-ff90b3cadbd6 <list[2]> 2025-03-17 15:50:06       11
12: 88f2921f-2f54-4492-9466-ee329b35d2f7 <list[2]> 2025-03-17 15:50:06       12
13: db5cd572-cc23-4d5e-9bcb-0eeac93028fd <list[2]> 2025-03-17 15:50:06       13
14: 7221b6c4-6b04-4ba4-818d-00618199a778 <list[2]> 2025-03-17 15:50:06       14
15: b6197cbf-266f-4aa6-b4f8-7e53e3829b28 <list[2]> 2025-03-17 15:50:06       15

5.2 Final model with optimized hyperparameter

Once we have found the optimal hyperparameter combination, we can use it to train a final model on all the data. We define a new learner and assign the parameters from our tuning instance to it. Then we can train and predict again.

tuned_rfmodel <- lrn("regr.ranger")
tuned_rfmodel$param_set$values <- tuning_blockcv$result_learner_param_vals

tuned_rfmodel$train(task_spain)
tuned_prediction <- terra::predict(covariates, tuned_rfmodel)
plot(tuned_prediction)

6 Feature Selection

6.1 Define the feature selection strategy

Similar to the hyperparameter tuning, we can use the cross-validation strategy to test whether we can build a model with a reduced set of predictors / covariates / features / variables. Here I demonstrate a forward feature selection, which starts out with the best-performing pair of variables and appends additional variables as long as the cross-validation performance estimate still improves.

First we define the feature selection strategy with fs(). I use a learner with fixed hyperparameters here. Everything is put together in the fselect() function, which creates a feature selection instance in which all the information, methods and results are stored.

library(mlr3fselect)
library(mlr3filters)

select_mode <- fs("sequential", min_features = 2)

fs_rfmodel <- lrn(
    "regr.ranger",
    num.trees = 50,
    mtry = 2,
    min.node.size = 5,
    importance = "impurity"
)

set.seed(20)
feature_selection <- fselect(
    fselector = select_mode,
    task = task_spain,
    learner = fs_rfmodel,
    resampling = resampling_blockcv,
    measure = measure_rmse
)
# selected variables
feature_selection$result_feature_set
[1] "Y"         "dem"       "lst_day"   "lst_night" "ntl"       "popdens"  
# CV RMSE with the selected variables
feature_selection$result$regr.rmse
[1] 0.7795219

6.2 Final model with selected features

Once we have found the ideal combination of variables, we can reduce our task to those predictors, train a model, and predict.

# note: $select() modifies the task in place (R6 semantics) and returns it
fs_task <- task_spain$select(feature_selection$result_feature_set)

fs_rfmodel$train(fs_task)

fs_prediction <- terra::predict(covariates, fs_rfmodel)
plot(fs_prediction)
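
Since the learner was defined with importance = "impurity", we can also check which of the selected variables drive the model (a sketch; $importance() works for learners trained with an importance mode):

fs_rfmodel$importance() # named vector of impurity importance, sorted decreasingly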

7 Conclusions

  • mlr3 utilizes an object-oriented approach with R6 classes, offering flexibility but requiring familiarity with this system in R.
  • Possible target audience: machine learning experts and Python users transitioning to R, especially those familiar with object-oriented programming.

7.1 Strengths

  • Spatially explicit methods are directly implemented: there is no need to exclude the geometry column of the sf object, allowing for the retention of spatial information.
  • Active development: continuously updated with improvements and bug fixes.
  • Very good introduction to the framework through the book: The book offers a well-structured guide for learning the mlr3 framework.
  • Well-documented functions: Clear and comprehensive documentation helps users understand the functions easily.

7.2 Weaknesses

  • Steep learning curve: The framework’s complexity can be challenging for beginners.
  • Overwhelming number of methods and options: The extensive methods can be difficult to navigate for new users.
  • Confusing help pages due to R6 logic: Understanding the R6 logic in help files can be tough for those unfamiliar with the system.
  • No in-line help in RStudio: The lack of in-line help for R6 methods in RStudio makes it harder to access function details quickly.