Setup and the Basics

Building and running custom mechanistic eco-evolutionary biodiversity models

First things first, let’s get you up and running with gen3sis. In this setup guide, we will:

  1. Set up the R environment
  2. Understand the basics
  3. Run a basic simulation
  4. Customize simulations
  5. Troubleshoot

1. Setting up the R environment

We’ll need the gen3sis R package along with configuration and landscape files from the course repository. Please download the repo and create a new R script in the same directory.

  1. Go to https://github.com/ohagen/course_MEEM_SoSe26 and download the repository:

  1. Put it somewhere on your machine that you can readily access

  2. Unzip the folder

  3. Open up RStudio from the .Rproj

  4. Setting the needed packages

You can copy and paste code from the sections below, or use myscript.R.

# Function to check and install packages if they are not already installed
load_install_pkgs <- function(package_names) {
  for (package in package_names) {
    if (!require(package, character.only = TRUE)) {
      install.packages(package)
      print(paste("Installing", package))
      library(package, character.only = TRUE)
    }
  }
}

# Vector of packages to load and install if necessary
packages_to_load <- c("gen3sis", "here", "terra", "ape")

# Call the function with the vector of package names
load_install_pkgs(packages_to_load)

# print gen3sis version
print(paste("gen3sis version:", packageVersion("gen3sis")))
[1] "gen3sis version: 1.6.0"

You could also install the most recent version of the package from GitHub using devtools. But hold off on this today so we’re all working with the same version.

#install.packages("devtools")
#devtools::install_github(repo = "project-gen3sis/R-package", dependencies = TRUE)
#packageVersion("gen3sis")

Landscape and configuration files are bundled in this repository under data/space/ and data/config/. If you have an optional distances_full folder, place it inside data/space/ alongside distances_local to speed up some simulations.

# set path to directory containing the landscape
landscape_dir <- here("data/space")
# look at folder structure
list.files(landscape_dir)
[1] "distances_local" "landscapes.rds" 
# set path to config_file
config_file <- here("data/config/config_simple.R")

2. Understanding the basics

Gen3sis is an engine that simulates eco-evolutionary processes at the population level. It utilizes a landscape containing environmental variables that evolve over time, alongside a configuration file housing the eco-evolutionary rules. Read more here

2.1 Landscape

This tutorial utilizes a theoretical archipelago system. Each site 1x1 km, features temperature (i.e. mean, minimum and maximum temperature) and has a landscape structure that is generated by approximating topography, uplift dynamics, and lapse rate. Additionally, it incorporates global temperature and sea level changes dating back to the past 5 million years (Ma).
# load landscapes
lc <- readRDS(file.path(landscape_dir,"landscapes.rds"))
# class(lc) "list"
# get names of landscape variables
names(lc)
[1] "elevation" "mean_temp" "min_temp"  "max_temp"  "patch"    
# get first time step
first_step_pos <- ncol(lc$mean_temp)
# get first 10 sites of mean temperature for the 2 last time steps and the first (oldest) time step
lc$mean_temp[100:110, c(1:4, first_step_pos)]
       x    y        0        1      500
100 39.5 58.5       NA       NA       NA
101 40.5 58.5       NA       NA       NA
102 41.5 58.5       NA       NA       NA
103 42.5 58.5       NA       NA       NA
104 43.5 58.5       NA       NA       NA
105 44.5 58.5       NA       NA       NA
106 45.5 58.5       NA       NA       NA
107 46.5 58.5       NA       NA       NA
108 47.5 58.5       NA 23.82152       NA
109 48.5 58.5       NA 23.56207       NA
110 49.5 58.5 24.89379 23.39746 24.85717
# plot mean_temp for first and last time step
plot(terra::rast(lc$mean_temp[ ,c(1:3, first_step_pos)]))

🏋💻 Exercise [max 10 min] Create a way to visualize the last 100 time steps of this gen3sis input. If possible think of abstracting for any x,y,z1,z2,z3… temporal data-frame. If possible, try to make a function.

Answer:

# solution 1
for (ti in 100:0) {
  ri <- terra::rast(lc$elevation[,c("x", "y", as.character(ti))], type="xyz")
  plot(ri, main=paste(ti/100, "Ma")) # divide by 100 to get Ma since 1 time-step =10 kyr
  Sys.sleep(0.1)
}
# solution 2
# define animation function
plot_landenv <-  function(df, times=100:0, reverse=F, speed=0.1){
  # df is a data frame with x, y coordinates and time steps as columns with environmental variables
  # reverse is a boolean. Reverse the order of time steps?
  # times is either a vector of time steps or a character "all"
  # speed in seconds
  #df <- lc$temp
  if (times[1]=="all"){
   times <- names(df)[!names(df)%in%c("x", "y")] 
  }
  if (reverse){
    times <- rev(times)
  }
  for (ti in times){
    ri <- terra::rast(df[,c("x", "y", ti)], type="xyz")
    plot(ri, main=paste(as.numeric(ti)/100, "Ma"))
    Sys.sleep(speed)
  }
}
# call animation function
plot_landenv(lc$elevation, times=c(100:0), speed=0.1)
# example of plotting entire time series
# plot_landenv(lc$min_temp, times="all", reverse=T, speed=0.03)

2.2 Config

Open this config file in Rstudio or any text editor to see the rules that will be applied to the landscape.

# load config
cf <- create_input_config(config_file = here("data/config/config_simple.R"))
# list all main elements of the config file
names(cf$gen3sis)
[1] "general"        "initialization" "dispersal"      "speciation"    
[5] "mutation"       "ecology"       
# list all elements of the general section, i.e. the main settings and not so much on the eco-evolutionary processes
names(cf$gen3sis$general)
[1] "random_seed"                      "start_time"                      
[3] "end_time"                         "max_number_of_species"           
[5] "max_number_of_coexisting_species" "end_of_timestep_observer"        
[7] "trait_names"                      "environmental_ranges"            
[9] "verbose"                         

🏋💻 Exercise [max 10 min] Open the config file config_simple and go though the eco-evolutionary rules that will be applied to the landscape.

Answer:

Settings

We start our simulation at the latest avaiable time, i.e. 5Ma, which corresponds to time step 500 since 1 time-step = 10 kyr. We end it a the latest available time-step. Sometimes, simulations can go forever, to avoid this we limit the maximum total number of species alive in a simulation to 50000 with max_number_of_species and the maximum number of species within one site to 20000 with max_number_of_coexisting_species. This stops and flags simulations that generate too many species (time constrain). We define which traits we will consider in our simulation with traits_names, in our example, species have a dispersal ability a optimum temperature and a temperature width trait.

Observer

The observer function saves and plot changes (real-time during model execution) over time in the conditions of the virtual world (biotic and abiotic) by saving custom information at designated time steps.

Initialization

The create_ancestor_species function creates the first specie(s) in the simulation. In this case, we create one species spread across all avaiable sites with low dispersal ability, optimum temperature at 20 degrees C and a temperature width (+-1).

Dispersal

The dispersal function iterates over all species populations and determines the connectivity between sites, crucial for trait evolution and specialization, as well as the colonization of new sites. In our example, species dispersal is stochastic and uniform across all species. I.e. an exponential distribution with the rate denominator as the dispersal trait value.

Speciation

The speciation iterates over every species separately, registers populations’ geographic occupancy (species range), and determines when geographic isolation between population clusters is higher than a user-defined threshold, triggering a lineage splitting event of cladogenesis. The clustering of occupied sites is based on the species’ dispersal capacity and the landscape connection costs. Over time, disconnected clusters gradually accumulate incompatibility, analogous to genetic differentiation. When the divergence between clusters is above the speciation threshold, those clusters become two or more distinct species, and a divergence matrix reset follows. On the other hand, if geographic clusters come into secondary contact before the speciation occurs, they coalesce and incompatibilities are gradually reduced to zero.

Evolution

Think of it as trait evolution. Clustered populations (exchanging genes) have their trait homogenized. If weighted by abundance a trait of a population that is doing well in a site, as dictated by the ecology function, will contribute more to the average trait of a cluster. Populations mutate based on a normal distribution with standard deviation 0.001, possibly increasing or decreasing species optimum temperature.

Ecology

The ecology function determines the abundance or presence of the populations in occupied sites of each species. The function iterates over all occupied sites and updates the species population abundances or presences on the basis of local environmental values, updated co-occurrence patterns and species traits. In this example we use only presence/absence data, i.e. abundances 0 or 1.

3. Run a basic simulation

It’s time to run a simulation, we use the config_simple and archipelago system we are familiar with and set verbose to 2 in order to get more information on the progress of the simulation.

# run simulation
sim <- run_simulation(config = here("data/config/config_simple.R"), 
                      landscape = here("data/space"), 
                      output_directory = here("output"))

On sim we store the simulation output summary, while most of the information and large data is/should be stored in the output directory according to the observer function. If you don’t want to run the simulation, you can load the sim object from the repo.

sim_path <- here("output/config_simple/sgen3sis.rds")

if (file.exists(sim_path)) {
  sim <- readRDS(sim_path)
} else {
  message("Precomputed output not found at output/config_simple. Running the simple simulation now...")
  sim <- run_simulation(
    config = here("data/config/config_simple.R"),
    landscape = here("data/space"),
    output_directory = here("output"),
    verbose = 0
  )
}

In addition to the simulation output summary, we include a flag indicating whether the simulation completed successfully or the criteria for stopping, such as an excessive number of species. This section also provides system information and simulation-specific parameters.

#check elements inside the sim object
names(sim)
[1] "summary"    "flag"       "system"     "parameters"
# visualize the outputs
plot_summary(sim)

# plot richness from summary in custom fashion
na_mask <- is.na(lc$elevation[,"0"])
rich <- sim$summary$`richness-final`
rich[na_mask,3] <- NA
plot(terra::rast(rich, type="xyz"), col=c("grey", gen3sis::color_richness_non_CVDCBP(max(rich, na.rm=T))), main="Richness")

# plot richness at time step 32 using saved data
sps32 <- readRDS(here("output/config_simple/species/species_t_32.rds"))
lc32 <- readRDS(here("output/config_simple/landscapes/landscape_t_32.rds"))
plot_richness(sps32, lc32)

Beyond the custom storage mediated over the observer function, we have a standard phylogeny stored as phy.nex as well as a copy of the config used for the simulation.

phy <- read.nexus(file.path(here("output/config_simple/phy.nex")))
plot(phy)

🏋💻 Exercise [max 2 min] Think of 3 ways to change extinction rates?

4. Customize simulations

For an example of a more complex simulation, with species abundances, traits trading off and evolving check the config_M2_TH.R.

We will modify it so that allopatric speciation is faster.

conf <- create_input_config(here("data/config/config_M2_TH.R"))
conf$gen3sis$speciation$divergence_threshold = 30

We will also change the observer function to save the presence/absence matrix for each time step.

conf$gen3sis$general$end_of_timestep_observer = function(data, vars, config){
  plot_richness(data$all_species, data$landscape)
  # make p/a matrices
  out_dir <- config$directories$output
  if(!file.exists(file.path(out_dir,"occs"))){
    dir.create(file.path(out_dir, "occs"))
    }
  # cell names
  all_cells <- rownames(data$landscape$coordinates)
  # get 0 for absence and 1 for presence in each grid cell
  asp <- do.call(cbind,
                 lapply(data$all_species, FUN = function(x) {
                   ifelse(all_cells %in% names(x$abundance), 1, 0)
                  }))
  # colnames are species names
  colnames(asp ) <- unlist(lapply(data$all_species, function(x){x$id}))
  # column bind with x/y coordinates
  presence_absence_matrix <- cbind(data$landscape$coordinates, asp)
  saveRDS(presence_absence_matrix, 
          file=file.path(out_dir,"occs", paste0("pa_t_",vars$ti, ".rds")))
}
simod <- run_simulation(config=conf, 
                      landscape=here("data/space"), 
                      output_directory=tempdir())

The dynamics is different from the simple simulation, with more species and a some extinction events.

🏋💻 Exercise [15 min] Review the config_M2_TH.R file and try to understand what it’s doing. Consider how you might modify the configuration or apply it to a specific research question.

5. Troubleshoot

Creating or modifying a gen3sis configuration can definitely lead to some weird errors, especially since it’s so flexible. That’s the downside, along with the steep learning curve. But if you’re not too overwhelmed, you’re doing great!

Here are some handy debugging tips for when you run into those pesky errors:

browser(): This function lets you pause the execution and explore what’s going on. It’s like a pit stop where you can check out variables and step through the code.

To make your R session enter browser mode whenever you hit an error, you can use: options(error = recover). This will help you diagnose and fix issues more easily.

You can also condition browser calls, which is very useful when you want to stop execution at a specific time step or when a certain condition is met. Here’s an example:

# Use 'stop_time' to halt execution at a specific timestep in the landscape object:
stop_time <- "62"

get_dispersal_values <- function(n, species, landscape, config) {
    if (landscape$timestep == stop_time) {
        browser()
    }

    # You can also check the 'vars' object for the current timestep:
    vars <- dynGet("vars", inherits = TRUE)
    if (vars$ti == stop_time) {
        browser()
    }
}