Skip to contents

A Novel Framework to visualise trait dispersion and assess species invasiveness or site invasibility


1. Introduction

Biological invasions are a major driver of biodiversity loss. Invasive alien species can spread quickly, alter ecosystem processes, and displace native taxa. Because invasion outcomes depend jointly on functional traits, resident communities, and local environments, ad-hoc analyses are not enough. We need a transparent, reproducible way to quantify establishment potential at specific sites and to compare that potential across species and landscapes.

invasimapr fills this gap. It is a trait-based, site-specific R package that estimates invasion fitness for candidate invaders, assembles the resident community context that constrains establishment, and turns these quantities into mappable indicators for decision making. The workflow links three pillars: a functional trait space that governs competitive overlap, environmental suitability that determines how well species perform at a site, and biotic competition from residents that reduces the chance of establishment. By fitting a single trait–environment model and reusing it to derive both invader growth and resident context, the package keeps assumptions coherent and results easy to interpret.

This milestone report describes the rationale, model components, and software design of invasimapr. We define the ecological quantities we estimate, show how raw data are prepared and validated, and explain how outputs are summarised as invasibility (how open a site is to newcomers) and invasiveness (how prone a species is to establish across sites). The implementation relies on widely used methods such as GLMM or GAM for trait–environment responses and standard distance measures such as Gower for trait and environmental dissimilarities, which keeps the approach accessible and extensible.

Practically, the package is modular and reproducible. Each step can be run independently or as a full pipeline, inputs and assumptions are explicit, and all intermediate objects can be inspected. The end result is a consistent set of site-level maps and species-level rankings that support surveillance, prioritisation, and scenario testing under changing environments.


2. Context and model overview

The figure below summarizes the invasimapr pipeline from inputs to decision-ready outputs. Starting with species traits, site environments \(\mathbf{E}_s\), and resident communities, the workflow derives intermediate quantities in two coordinated tracks: a trait track that yields pairwise distances and a competition kernel \(\alpha_{ij}\), and an environment track that yields resident optima \(\theta_j\), site-resident mismatch \(\Delta_{js}\), an environmental kernel \(K_e\), and resident context \(N^{*}_{js}\). These parts are assembled into an impact tensor \(I_{ijs}\), aggregated to a penalty \(C^{\mathrm{(raw)}}_{is}\), and combined with predicted intrinsic growth \(r_{is}\) to produce invasion fitness \(\lambda_{is}\). Site- and species-level summaries (invasibility \(V_s\) and invasiveness \(V_i\)) support mapping and prioritisation.
 

Figure 1: Traits and environment feed two tracks - (1) competition via trait distances → \(\alpha_{ij}\), and (2) environmental filtering via optima and mismatch → \(K_e(\Delta_{js};\sigma_e)\) plus resident context \(N^{*}_{js}\). These combine into an impact tensor \(I_{ijs}\), which sums to the penalty \(C^{\mathrm{(raw)}}_{is}\). Subtracting this from the intrinsic growth \(r_{is}\) gives invasion fitness \(\lambda_{is}\), which aggregates to invasibility \(V_s\) and invasiveness \(V_i\) and can be mapped.

2.1. Inputs

  • Species traits: measurable characteristics (e.g., height, seed mass, leaf area, diet) that shape how species use resources and interact. Traits let us compare species on a common ecological scale.
  • Site environment \(\mathbf{E}_s\): local conditions (climate, soils, habitat structure) that determine whether a species can survive and grow at a place.
  • Residents: the species already established at a site (the resident community). Their presence sets the biotic backdrop into which newcomers arrive.
  • Candidate invaders: the species being evaluated. They may be real introductions or simulated species drawn from trait distributions to explore “what-if” scenarios.

The trait-environment response model ties traits and environments to observed abundances to generate consistent predictions for both invaders and residents.


2.2. Invasion fitness

Invasion fitness is the net potential for a newcomer to increase when rare. It combines how well the site suits the species (its predicted growth) with how much the resident community pushes back (competitive penalty). Positive values mean establishment is feasible; negative values indicate likely exclusion.

\[ \mathbf{\lambda_{is} \;=\; r_{is} \;-\; C^{\mathrm{(raw)}}_{is}} \]


2.3. Components of \(\lambda_{is}\)

The invader fitness \(\lambda_{is}\) at site \(s\) is built from three modules:

  1. Intrinsic growth \(r_{is}\): the abiotic match of species \(i\) to site \(s\).
  2. Competitive pressure \(C^{\mathrm{(raw)}}_{is}\): suppression exerted by residents.
  3. Interaction tensor \(I_{ijs}\): trait overlap): pairwise impact combining trait similarity, environmental suitability, and resident abundance.

2.3.1. Intrinsic growth (predicted growth potential)

This is the growth or abundance a species is expected to achieve at site \(s\) without competitors, derived from the trait-environment model. It captures environmental suitability and any trait effects independent of competition.

\(r_{is}\)

Thus a high \(r_{is}\) signals a good abiotic match, while a low \(r_{is}\) signals environmental mismatch.


2.3.2. Competitive pressure (penalty from residents)

The penalty sums how strongly all residents suppress the invader at a site:

\[ C^{\mathrm{(raw)}}_{is} \;=\; \sum_{j \neq i} I_{ijs} \]

This penalty integrates competition (trait similarity), environmental suitability/filtering (resident suitability), and resident abundance (how common residents are).

i) Functional trait space → competition

Species are positioned in a functional trait space (e.g. PCoA) where distances (e.g. Euclidean) reflect ecological similarity. Two summaries help interpret residents:

Pairwise similarity:

\[ \alpha_{ij} \;=\; \exp\!\left(-\frac{d_{ij}^2}{2\sigma_t^2}\right), \quad d_{ij}=\|\mathbf{z}_i-\mathbf{z}_j\|_2. \]

In this way, similarity is translated into a competition coefficient with a Gaussian kernel. Here \(d_{ij}\) is trait dissimilarity (0 = identical, 1 = very different) and \(\sigma_t\) controls how quickly competition fades as species diverge. The general relation \(g^{\mathrm{(all)}}_{ij}\) is computed over the full species set to provide consistent distances.

Centrality (formal definition):

Let \(p_{i}\) be normalised abundances, \(\mathbf{z}_i\) the coordinates in PCoA space, and

\[ \pmb{\mu} = \sum_i p_i \mathbf{z}_i, \quad c_i = \|\mathbf{z}_i - \pmb{\mu}\|_2. \]

Thus trait centrality is how close a species is to the centre of the resident trait cloud; central residents typically overlap more with others.

Dispersion metrics (community-level):

\[ \mathrm{FDis} = \sum_i p_i c_i, \]

\[ \mathrm{FRic} = \mathrm{Vol}(\mathrm{ConvHull}\{\mathbf{z}_i\}), \]

\[ \mathrm{RaoQ} = \tfrac{1}{2}\sum_i\sum_j p_i p_j\, d_{ij}. \]

Thus trait dispersion is how spread out residents are. Greater spread often means less average overlap and weaker competition.

All together, similarity, trait centrality and dispersion, quantify the “shape” of the trait cloud influencing overlap and competition.

ii) Environmental filtering of residents

Residents suppress invaders most where they perform well. We quantify the match between site conditions and each resident’s optimum with a kernel in environmental space:

\[ K_e(\Delta_{js}; \sigma_e) \;=\; \exp\!\left(-\frac{\Delta_{js}^2}{2\sigma_e^2}\right), \quad \Delta_{js} \;=\; \mathrm{Gower}\!\big(\mathbf{E}_s, \theta_j\big), \]

where \(\theta_j\) is resident \(j\)’s abundance-weighted environmental optimum. Small \(\Delta_{js}\) (good match) yields large \(K_e\); large mismatch down-weights that resident’s effect.

iii) Resident abundance (context)

Abundant residents exert more pressure. Typical abundance of resident \(j\) at site \(s\) is predicted from the trait-environment model:

\[ N^{*}_{js} \;=\; f_{\text{env}}\!\left(\mathbf{E}_s, \theta_j, \mathbf{z}_j\right), \]

where \(f_{\text{env}}\) is a fitted response (e.g., GLMM/GAM) and \(\mathbf{z}_j\) are any extra predictors.


2.3.3. Interaction strength (impact tensor)

The impact tensor combines trait overlap, resident suitability, and resident abundance into pairwise effects:

\[ I_{ijs} \;=\; \alpha_{ij} \; K_e(\Delta_{js}; \sigma_e) \; N^{*}_{js}. \]

Thus the suppression of invader \(j\) at site \(s\) will increase when: 1. Species \(i\) and \(j\) are very similar = large \(\alpha_{ij}\) 2. Residents \(j\) are well-matched with environment \(\mathbf{E}_s\) = large \(K_e\) 3. Residents \(j\) are very common/thriving = large \(N^{*}_{js}\)

Note: Summing \(I_{ijs}\) over residents \(j\) creates the site-level penalty \(C^{\mathrm{(raw)}}_{is}\).


2.4. Aggregated outputs

2.4.1. Invasiveness of species (propensity to invade)

How readily a species establishes across sites. We aggregate fitness over space to rank potential invaders.

\[ V_i \;=\; \sum_{s} \lambda_{is} \]

Thus higher \(V_i\) suggests species \(i\) tends to be more invasive across all sites i.e. many sites are suitable and/or less resistant to invasion by species \(i\).

2.4.2. Invasibility of sites (openness)

How open a site is to new species. We aggregate the fitness matrix over invaders to a site-level score that can be mapped or tracked through time.

\[ V_s \;=\; \frac{1}{I}\sum_{i} \mathbb{I}\{\lambda_{is} > 0\} \]

Thus higher \(V_s\) suggests more invaders are expected to establish at site \(s\).


3. Overview of the invasimapr R package conceptual workflow

This tutorial walks through the full invasimapr workflow for quantifying, mapping, and interpreting invasion fitness \(\lambda_{is}\). The pipeline integrates information from traits, environments, and resident communities, returning explicit intermediate objects so you can inspect, audit, and reuse results.

Figure 2: Conceptual overview of the invasimapr framework: Intrinsic growth potential (\(r_{is}\)) derives from trait-environment matching in the absence of competitors. Resident communities impose competitive pressure (\(C_{is}\)) through trait similarity, resident environmental matching, and resident abundance. These components combine into invasion fitness (\(\lambda_{is}\)), which aggregates to species-level invasiveness (\(V_i\)) and site-level invasibility (\(V_s\)).


  1. Inputs and setup
    Provide site environments \(\mathbf{E}_s\), resident occurrence or abundance, species traits, and consistent species and site identifiers. Load packages, set seeds, and ensure that tables align on IDs so later steps can estimate intrinsic growth \(\;r_{is}\), trait similarity \(\;d_{ij}\), and resident context \(\;N^{*}_{js}\).

  2. Data preparation
    Use get_trait_data() to collect, clean and standardise trait variables, harmonise units, resolve missing values, and align dimensions across sites and species. The output is a tidy trait table and matched site × species matrices ready for analysis.

  3. Trait space to competition
    Use compute_trait_space() to build a functional trait space and compute pairwise trait distances \(\;d_{ij}\). Convert distances to competition coefficients \(\alpha_{ij}\) with a Gaussian kernel whose bandwidth \(\sigma_t\) can be estimated from resident dispersion or set explicitly. The result is an invader × resident matrix of \(\alpha_{ij}\) that reflects how strongly species compete as a function of trait similarity.

  4. Environment and resident context
    Use build_glmm_formula() to first construct the model formula. Simulate hypothetical invaders with traits with simulate_invaders(). Estimate each resident’s environmental optimum \(\theta_j\), compute site-resident mismatch \(\Delta_{js} = \mathrm{Gower}(\mathbf{E}_s, \theta_j)\) (see \(\mathbf{E}_s\) and \(\theta_j\)), and transform mismatch into an environmental weight \(K_e(\Delta_{js};\sigma_e)\) with bandwidth \(\sigma_e\) using compute_environment_kernel(). Obtain resident context \(N^{*}_{js}\) as predicted or typical abundance per site, so resident effects are strongest where they are well matched and common.

  5. Trait-environment response for intrinsic growth
    Fit a trait-environment model to predict intrinsic growth \(r_{is}\) or an abundance proxy for candidate invaders at each site with predict_invader_response(). This yields the invader × site matrix \(\;r_{is}\), representing expected performance in the absence of competition.

  6. Assemble impacts and penalties
    Estimate pairwise biotic influence potentials by computing interaction strengths \(I_{ijs}\) with compute_interaction_strength(). Then combine competition \(\alpha_{ij}\), environmental weight \(K_e\), and resident context \(\;N^{*}_{js}\) into the impact tensor \(I_{ijs}\). Sum impacts over residents to obtain the total competitive penalty \(C^{\mathrm{(raw)}}_{is}\) for each invader at each site.

  7. Compute invasion fitness and variants
    Calculate invasion fitness as intrinsic growth minus penalty, \(\lambda_{is} = r_{is} - C^{\mathrm{(raw)}}_{is}\) using compute_invasion_fitness(). Optionally report scaled fitness that divides the penalty by richness, a relative-abundance version that emphasises composition, and a logistic-capped version that smoothly limits extreme penalties.

  8. Summaries, maps, and interpretation
    Aggregate the fitness matrix to site-level invasibility \(V_s\) and species-level invasiveness \(V_i\). Map \(\lambda_{is}\), \(V_s\), and \(V_i\), explore uncertainty and sensitivity, and create concise products for management and reporting.

Note. Different summaries can be used. For example, \(V_s\) can be the mean of \(\lambda_{is}\) over species \(i\); and \(V_i\) can be the mean across sites or a sum over positive values only, depending on management goals.

Step-by-step Workflow

1. Install and load invasimapr

Install and load the invasimapr package from GitHub, ensuring all functions are available for use in the workflow.

# # install remotes if needed
# install.packages("remotes")
# remotes::install_github("b-cubed-eu/invasimapr")

# Ensure the package is loaded when knitting
library(invasimapr)
sessionInfo()$otherPkgs$invasimapr$Version

# Make sure all the functions are loaded
devtools::load_all() # alternative during local development

2. Load other R libraries

Load core libraries for spatial processing, biodiversity modelling, and visualization required across the invasimapr analysis pipeline.

# Load essential packages
# --- Data Wrangling and Manipulation ---
library(dplyr)          # Tidy data manipulation (mutate, select, filter, etc.)
library(tidyr)          # Reshape data (wide - long, pivot functions)
library(tibble)         # Modern lightweight data frames (tibble objects)
library(purrr)          # Functional iteration (e.g. map())

# --- String and Factor Utilities ---
library(stringr)        # String pattern manipulation (e.g. str_detect())
library(fastDummies)    # Quickly create dummy/one-hot variables for factors

# --- Data Visualization ---
library(ggplot2)        # Grammar-of-graphics plotting
library(viridis)        # Colorblind-friendly palettes for ggplot2
library(factoextra)     # Visualize clustering e.g. fviz_nbclust, silhouettes
library(pheatmap)       # Pretty Heatmaps
library(grid)           # Grid Graphics Package

# --- Spatial Data ---
library(sf)             # Spatial vector data (simple features)
library(terra)          # Raster and spatial data operations

# --- Statistical and Ecological Modelling ---
library(glmmTMB)        # Fit GLMMs (Generalized Linear Mixed Models)
library(MASS)           # Kernel density estimation (kde2d, etc.)
library(cluster)        # Clustering algorithms, Gower distance, diagnostics
library(geometry)       # Convex hulls, volumes
library(ClustGeo)       # Spatially constrained clustering

# --- Model Performance and Diagnostics ---
# library(performance)  # Model checking and diagnostics
# options(warn = -1)

3. Data access and preparation using dissmapr

To acquire and prepare species occurrence data for biodiversity modelling using the dissmapr package, a series of modular functions streamline the workflow from raw observations to spatially aligned environmental predictors.

3.1. Install dissmapr

Install and load the dissmapr package from GitHub, ensuring all functions are available for use in the workflow.

# # install remotes if needed
# install.packages("remotes")
# remotes::install_github("b-cubed-eu/dissmapr")

# Ensure the package is loaded
library(dissmapr)
sessionInfo()$otherPkgs$dissmapr$Version
#> [1] "0.1.0"

3.2. Import and harmonise biodiversity-occurrence data

The process begins with dissmapr::get_occurrence_data(), which imports biodiversity records, such as a GBIF butterfly dataset for South Africa, and harmonizes them into standardised formats. Input sources can include local CSV files, URLs, or zipped GBIF downloads. The function filters data by taxon and region, returning both raw records and site-by-species matrices in presence-absence or abundance form.

# Use local GBIF data
bfly_data = dissmapr::get_occurrence_data(
  data = system.file("extdata", "gbif_butterflies.csv", package = "invasimapr"),
  source_type = "local_csv",
  sep = "\t"
)

# Check results but only a subset of columns to fit in console
dim(bfly_data)
#> [1] 81825    52
# str(bfly_data[,c(51,52,22,23,1,14,16,17,30)])
head(bfly_data[, c(51, 52, 22, 23, 1, 14, 16, 17, 30)])
#>   site_id pa         y        x    gbifID             verbatimScientificName countryCode
#> 1       1  1 -34.42086 19.24410 923051749                   Pieris brassicae          ZA
#> 2       2  1 -33.96044 18.75564 922985630                   Pieris brassicae          ZA
#> 3       3  1 -33.91651 18.40321 922619348 Papilio demodocus subsp. demodocus          ZA
#> 4       1  1 -34.42086 19.24410 922426210 Mylothris agathina subsp. agathina          ZA
#> 5       4  1 -34.35024 18.47488 921650584                  Eutricha capensis          ZA
#> 6       5  1 -33.58570 25.65097 921485695            Drepanogynis bifasciata          ZA
#>                                            locality        eventDate
#> 1                                          Hermanus 2012-10-13T00:00
#> 2                                   Polkadraai Road 2012-11-01T00:00
#> 3                                       Signal Hill 2012-10-31T00:00
#> 4                                          Hermanus 2012-10-13T00:00
#> 5 Cape of Good Hope / Cape Point Area, South Africa 2012-10-30T00:00
#> 6                             Kudu Ridge Game Lodge 2012-10-23T00:00

3.3. Format biodiversity records to long/wide formats

Next, dissmapr::format_df() restructures the raw records into tidy long and wide formats. This assigns unique site IDs, extracts key fields (coordinates, species names, observation values), and prepares two main outputs: site_obs (long format for mapping) and site_spp (wide format for species-level analysis).

# Continue from GBIF data
bfly_result = dissmapr::format_df(
  data        = bfly_data, # A `data.frame` of biodiversity records
  species_col = "verbatimScientificName", # Name of species column (required for `"long"`)
  value_col   = "pa", # Name of value column (e.g. presence/abundance; for `"long"`)
  extra_cols  = NULL, # Character vector of other columns to keep
  format      = "long" # Either`"long"` or `"wide"`
)

# Check `bfly_result` structure
str(bfly_result, max.level = 1)
#> List of 2
#>  $ site_obs:'data.frame':    79953 obs. of  5 variables:
#>  $ site_spp: tibble [56,090 × 2,871] (S3: tbl_df/tbl/data.frame)

# Optional: Create new objects from list items
site_obs = bfly_result$site_obs
site_spp = bfly_result$site_spp

# Check results
dim(site_obs)
#> [1] 79953     5
head(site_obs)
#>   site_id        x         y                            species value
#> 1       1 19.24410 -34.42086                   Pieris brassicae     1
#> 2       2 18.75564 -33.96044                   Pieris brassicae     1
#> 3       3 18.40321 -33.91651 Papilio demodocus subsp. demodocus     1
#> 4       1 19.24410 -34.42086 Mylothris agathina subsp. agathina     1
#> 5       4 18.47488 -34.35024                  Eutricha capensis     1
#> 6       5 25.65097 -33.58570            Drepanogynis bifasciata     1

dim(site_spp)
#> [1] 56090  2871
head(site_spp[, 1:6])
#> # A tibble: 6 × 6
#>   site_id     x     y `Mylothris agathina subsp. agathina` `Pieris brassicae` `Tarucus thespis`
#>     <int> <dbl> <dbl>                                <dbl>              <dbl>             <dbl>
#> 1       1  19.2 -34.4                                    1                  1                 1
#> 2       2  18.8 -34.0                                    0                  1                 0
#> 3       3  18.4 -33.9                                    0                  0                 0
#> 4       4  18.5 -34.4                                    0                  0                 0
#> 5       5  25.7 -33.6                                    0                  0                 0
#> 6       6  22.2 -33.6                                    0                  0                 0

#### Get parameters from processed data to use later
# Number of species
(n_sp = dim(site_spp)[2] - 3)
#> [1] 2868

# Species names
sp_cols = names(site_spp)[-c(1:3)]
sp_cols[1:10]
#>  [1] "Mylothris agathina subsp. agathina" "Pieris brassicae"                  
#>  [3] "Tarucus thespis"                    "Acraea horta"                      
#>  [5] "Danaus chrysippus"                  "Papilio demodocus subsp. demodocus"
#>  [7] "Eutricha capensis"                  "Mesocelis monticola"               
#>  [9] "Vanessa cardui"                     "Cuneisigna obstans"

3.4. Generate spatial grid and gridded summaries

To integrate the data spatially, dissmapr::generate_grid() overlays a user-defined spatial lattice (e.g. 0.5° grid), aggregates biodiversity observations per grid cell, and computes standardised metrics such as species richness and observation effort. Outputs include gridded species matrices (grid_spp, grid_spp_pa), a spatial polygon (grid_sf), and raster layers (grid_r), enabling downstream spatial modelling.

# Load the national boundary
rsa = sf::st_read(system.file("extdata", "rsa.shp", package = "invasimapr"))
#> Reading layer `rsa' from data source 
#>   `D:\Methods\R\myR_Packages\b-cubed-versions\invasimapr\inst\extdata\rsa.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 11 features and 8 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 16.45189 ymin: -34.83417 xmax: 32.94498 ymax: -22.12503
#> Geodetic CRS:  WGS 84

# Choose a working resolution
res = 0.5 # decimal degrees° (≈ 55 km at the equator)

# Convert the AoI to a 'terra' vector
rsa_vect = terra::vect(rsa)

# Initialise a blank raster template
grid = terra::rast(rsa_vect, resolution = res, crs = terra::crs(rsa_vect))

# Populate the raster with placeholder values
terra::values(grid) = 1

# Clip the raster to the AoI
grid_masked = terra::mask(grid, rsa_vect)

# Generate a 0.5° grid summary for the point dataset `site_spp`
grid_list = dissmapr::generate_grid(
  data          = site_spp, # point data with x/y + species columns
  x_col         = "x", # longitude column
  y_col         = "y", # latitude  column
  grid_size     = 0.5, # cell size in degrees
  sum_cols      = 4:ncol(site_spp), # columns to aggregate * could also use `names(site_spp)[4:ncol(site_spp)]`
  crs_epsg      = 4326 # WGS84
)

# Inspect the returned list
str(grid_list, max.level = 1)
#> List of 4
#>  $ grid_r     :S4 class 'SpatRaster' [package "terra"]
#>  $ grid_sf    :Classes 'sf' and 'data.frame':    1110 obs. of  8 variables:
#>   ..- attr(*, "sf_column")= chr "geometry"
#>   ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA
#>   .. ..- attr(*, "names")= chr [1:7] "centroid_lon" "centroid_lat" "grid_id" "mapsheet" ...
#>  $ grid_spp   : tibble [415 × 2,874] (S3: tbl_df/tbl/data.frame)
#>  $ grid_spp_pa: tibble [415 × 2,874] (S3: tbl_df/tbl/data.frame)

# (Optional) Promote list items to named objects
grid_r = grid_list$grid_r$grid_id # raster
grid_sf = grid_list$grid_sf # polygons for mapping or joins
grid_spp = grid_list$grid_spp # tabular summary per cell
grid_spp_pa = grid_list$grid_spp_pa # presence/absence summary

# Quick checks
dim(grid_sf) # ; head(grid_sf)
#> [1] 1110    8
dim(grid_spp) # ; head(grid_spp[, 1:8])
#> [1]  415 2874
dim(grid_spp_pa) # ; head(grid_spp_pa[, 1:8])
#> [1]  415 2874

Visualise species richess per grid cell
Plot square root transformed sampling effort and species richness, in a 1x2 side by side. Each map uses the viridisLite::turbo palette and is overlaid with the study area outline for spatial context.

# Extract & stretch the layers
effRich_r = sqrt(grid_list$grid_r[[c("obs_sum", "spp_rich")]])

# Open a 1×2 layout and plot each layer + outline
old_par = par(
  mfrow = c(1, 2), # multi‐figure by row: 1 row and 2 columns
  mar = c(1, 1, 1, 2)
) # margins sizes: bottom (1 lines)|left (1)|top (1)|right (2)

for (i in 1:2) {
  plot(effRich_r[[i]],
    col = viridisLite::turbo(100),
    colNA = NA,
    axes = FALSE,
    main = c(
      "Sampling effort (√obs count)",
      "Species richness (√unique count)"
    )[i],
    cex.main = 0.8
  ) # ← smaller title)
  plot(terra::vect(rsa), add = TRUE, border = "black", lwd = 0.4)
}

# Reset default plot parameters
par(old_par) # reset plotting parameters

Figure 3: Spatial distribution of sampling effort and species richness across the study area. Each grid cell shows the square root–transformed values for (left) total observation counts (sampling effort) and (right) unique species counts (species richness). Both maps use the viridisLite::turbo color palette for comparability and are overlaid with the study area outline to provide spatial context.

Environmental predictors are appended using dissmapr::get_enviro_data(), which buffers the grid, downloads raster data (e.g. WorldClim bioclimatic variables), resamples it, and links values to grid-cell centroids. This produces both a site-by-environment data frame (env_df) and a SpatRaster object (env_r), aligning biological and environmental data.

Begin by reading in a predefined target species list, then filter a site-by-species dataset (grid_spp) to retain only relevant species observations, and reshape the data for further analysis. This produces both a filtered long-format dataset (grid_obs) and a cleaned wide-format site-by-species matrix (grid_spp).

# Read in target species list
species = read.csv(system.file("extdata",
  "rsa_butterfly_species_names_n27_100plus.csv",
  package = "invasimapr"
), stringsAsFactors = FALSE)$species

# Filter `grid_spp` and convert to long-format
grid_obs = grid_spp %>%
  dplyr::select(-mapsheet) %>% # Drop mapsheet metadata
  pivot_longer(
    cols = -c(grid_id, centroid_lon, centroid_lat, obs_sum, spp_rich), # Keep core metadata columns only
    names_to = "species",
    values_to = "count",
    values_drop_na = TRUE
  ) %>%
  filter(
    # obs_sum > 100,                                   # Only high-observation sites
    count > 0, # Remove absent species
    species %in% !!species # Keep only target species
  ) %>%
  rename(
    site_id = grid_id, # Change 'grid_id' to 'site_id'
    x = centroid_lon, # Change 'centroid_lon' to 'x'
    y = centroid_lat # Change 'centroid_lat' to 'y'
  ) %>%
  relocate(site_id, x, y, obs_sum, spp_rich, species, count)

dim(grid_obs)
#> [1] 1737    7
head(grid_obs)
#> # A tibble: 6 × 7
#>   site_id     x     y obs_sum spp_rich species                    count
#>   <chr>   <dbl> <dbl>   <dbl>    <dbl> <chr>                      <dbl>
#> 1 1027     29.2 -22.3      41       31 Utetheisa pulchella            1
#> 2 1029     30.3 -22.3       7        7 Danaus chrysippus orientis     1
#> 3 1029     30.3 -22.3       7        7 Telchinia serena               1
#> 4 1031     31.3 -22.3     107       76 Vanessa cardui                 1
#> 5 1031     31.3 -22.3     107       76 Utetheisa pulchella            2
#> 6 1031     31.3 -22.3     107       76 Hypolimnas misippus            2
length(unique(grid_obs$species))
#> [1] 27
length(unique(grid_obs$site_id))
#> [1] 314

# Reshape site-by-species matrix to wide format and clean
grid_spp = grid_obs %>%
  pivot_wider(
    names_from = species,
    values_from = count,
    values_fill = 0 # Replace missing counts with 0
  )

dim(grid_spp)
#> [1] 314  32
# head(grid_spp)

Then proceed to retrieve and process environmental data using dissmapr::get_enviro_data(). In the example below, 19 bioclimatic variables are downloaded from WorldClim v2.1 (≈10 km resolution) for all site centroids in the grid_spp dataset. It performs the following steps:

  1. Retrieves WorldClim “bio” variables via the geodata interface.
  2. Buffers the area of interest (AOI) by 10 km.
  3. Retains site-level metadata (obs_sum, spp_rich) and excludes species columns.
# Retrieve 19 bioclim layers (≈10-km, WorldClim v2.1) for all grid centroids
data_path = "inst/extdata" # cache folder for rasters
enviro_list = dissmapr::get_enviro_data(
  data       = grid_spp, # centroids + obs_sum + spp_rich
  buffer_km  = 10, # pad the AOI slightly
  source     = "geodata", # WorldClim/SoilGrids interface
  var        = "bio", # bioclim variable set
  res        = 5, # 5-arc-min ≈ 10 km
  grid_r     = grid_r, # To set resampling resolution, if necessary
  path       = data_path,
  sp_cols    = 7:ncol(grid_spp), # ignore species columns
  ext_cols   = c("obs_sum", "spp_rich") # carry effort & richness through
)

# Quick checks
str(enviro_list, max.level = 1)
#> List of 3
#>  $ env_rast:S4 class 'SpatRaster' [package "terra"]
#>  $ sites_sf: sf [314 × 2] (S3: sf/tbl_df/tbl/data.frame)
#>   ..- attr(*, "sf_column")= chr "geometry"
#>   ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA
#>   .. ..- attr(*, "names")= chr "site_id"
#>  $ env_df  : tibble [314 × 24] (S3: tbl_df/tbl/data.frame)

# (Optional) Assign concise layer names for readability
# Find names here https://www.worldclim.org/data/bioclim.html
names_env = c(
  "temp_mean", "mdr", "iso", "temp_sea", "temp_max", "temp_min",
  "temp_range", "temp_wetQ", "temp_dryQ", "temp_warmQ",
  "temp_coldQ", "rain_mean", "rain_wet", "rain_dry",
  "rain_sea", "rain_wetQ", "rain_dryQ", "rain_warmQ", "rain_coldQ"
)
names(enviro_list$env_rast) = names_env

# (Optional) Promote frequently-used objects
env_r = enviro_list$env_rast # cropped climate stack
env_df = enviro_list$env_df # site × environment data-frame

# Quick checks
env_r
#> class       : SpatRaster 
#> size        : 30, 37, 19  (nrow, ncol, nlyr)
#> resolution  : 0.5, 0.5  (x, y)
#> extent      : 15.5, 34, -36, -21  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source(s)   : memory
#> names       : temp_mean,       mdr,      iso, temp_sea, temp_max,  temp_min, ... 
#> min values  :  9.779773,  8.977007, 47.10606, 228.9986, 19.92147, -4.110302, ... 
#> max values  : 24.406433, 18.352308, 64.92966, 653.4167, 36.19497, 12.005042, ...
dim(env_df)
#> [1] 314  24
head(env_df)
#> # A tibble: 6 × 24
#>   site_id     x     y bio01 bio02 bio03 bio04 bio05 bio06 bio07 bio08 bio09 bio10 bio11 bio12 bio13
#>   <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1027     29.2 -22.3  21.8 14.5   55.1  430.  32.6  6.30  26.3  26.2  15.9  26.2  15.9  358.  74.4
#> 2 1029     30.3 -22.3  22.8 13.9   58.0  359.  32.7  8.79  23.9  26.5  17.8  26.5  17.8  451.  94.4
#> 3 1031     31.3 -22.3  24.2 14.2   61.3  326.  34.2 10.9   23.2  27.5  19.7  27.5  19.7  467.  97.6
#> 4 117      18.2 -34.3  20.2 11.8   56.8  317.  29.8  9.29  20.5  19.9  19.8  23.7  16.0  583. 104. 
#> 5 118      18.7 -34.3  16.2  9.28  52.3  309.  25.4  7.65  17.7  12.4  19.8  19.9  12.4  700. 110. 
#> 6 119      19.3 -34.3  15.8 10.2   53.5  321.  25.8  6.67  19.2  11.8  19.6  19.6  11.8  701. 101. 
#> # ℹ 8 more variables: bio14 <dbl>, bio15 <dbl>, bio16 <dbl>, bio17 <dbl>, bio18 <dbl>, bio19 <dbl>,
#> #   obs_sum <dbl>, spp_rich <dbl>

# Build the final site × environment table
grid_env = env_df %>%
  dplyr::select(
    site_id, x, y,
    obs_sum, spp_rich, dplyr::everything()
  ) %>%
  mutate(across(
    .cols = -c(site_id, x, y, obs_sum, spp_rich), # all other columns
    .fns = ~ as.numeric(scale(.x)), # Scale bio
    .names = "{.col}" # keep same names
  ))

str(grid_env, max.level = 1)
#> tibble [314 × 24] (S3: tbl_df/tbl/data.frame)
head(grid_env)
#> # A tibble: 6 × 24
#>   site_id     x     y obs_sum spp_rich  bio01   bio02  bio03  bio04  bio05 bio06  bio07   bio08
#>   <chr>   <dbl> <dbl>   <dbl>    <dbl>  <dbl>   <dbl>  <dbl>  <dbl>  <dbl> <dbl>  <dbl>   <dbl>
#> 1 1027     29.2 -22.3      41       31  1.75   0.274  -0.309  0.272  1.24  0.605  0.292  1.51  
#> 2 1029     30.3 -22.3       7        7  2.20  -0.0315  0.616 -0.450  1.27  1.28  -0.239  1.58  
#> 3 1031     31.3 -22.3     107       76  2.78   0.150   1.68  -0.792  1.79  1.87  -0.391  1.83  
#> 4 117      18.2 -34.3    4246      231  1.08  -1.05    0.236 -0.883  0.241 1.42  -1.00  -0.0540
#> 5 118      18.7 -34.3    2202      215 -0.628 -2.25   -1.21  -0.975 -1.30  0.973 -1.61  -1.94  
#> 6 119      19.3 -34.3     989      173 -0.799 -1.79   -0.838 -0.842 -1.15  0.706 -1.30  -2.10  
#> # ℹ 11 more variables: bio09 <dbl>, bio10 <dbl>, bio11 <dbl>, bio12 <dbl>, bio13 <dbl>,
#> #   bio14 <dbl>, bio15 <dbl>, bio16 <dbl>, bio17 <dbl>, bio18 <dbl>, bio19 <dbl>

3.6. Remove highly correlated predictors (optional)

Finally, dissmapr::rm_correlated() optionally reduces multicollinearity by filtering out highly correlated predictors based on a threshold (e.g. r > 0.70), improving model stability and interpretability. Together, these functions provide a reproducible and scalable pipeline for preparing ecological datasets for spatial analysis.

# # (Optional) Rename BIO
# names(env_df) = c("grid_id", "centroid_lon", "centroid_lat", names_env, "obs_sum", "spp_rich")
#
# # Run the filter and compare dimensions
# # Filter environmental predictors for |r| > 0.70
# env_vars_reduced = dissmapr::rm_correlated(
#   data       = env_df[, 4:23],  # drop ID + coord columns
#   cols       = NULL,                  # infer all numeric cols
#   threshold  = 0.70,
#   plot       = TRUE                   # show heat-map of retained vars
# )
#
# # Before vs after
# c(original = ncol(env_df[, c(4, 6:24)]),
#   reduced  = ncol(env_vars_reduced))

4. Data access and preparation using invasimapr

This utility provides an automated pipeline for extracting and joining both biological trait data and rich metadata for any focal species. The function integrates several steps:

  1. Trait Table Lookup: Retrieves species’ trait data from a local trait table (CSV) or a TRY (Kattge et al 2020) style database, using fuzzy matching to ensure robust linkage even when there are minor naming inconsistencies.
  2. Wikipedia Metadata Scraping: Optionally augments each species entry with a taxonomic summary, higher taxonomy, and representative images scraped directly from Wikipedia.
  3. Image-based Color Palette Extraction: If enabled, downloads and processes public domain images to extract the most frequent colors, optionally removing green/white backgrounds to focus on diagnostic features.
  4. Flexible Output: Returns a single-row tibble with the species name, trait data, taxonomic metadata, image URL, and color palette - all harmonized for downstream analyses or visualization.

This function greatly simplifies the assembly of a unified species-trait-metadata table, which is essential for trait-based community ecology, macroecology, and biodiversity informatics projects.

# Fetch local trait data.frame
btfly_traits = read.csv(system.file("extdata", "species_traits.csv", package = "invasimapr"))
str(btfly_traits)
#> 'data.frame':    27 obs. of  21 variables:
#>  $ species     : chr  "Acraea horta" "Amata cerbera" "Bicyclus safitza safitza" "Cacyreus lingeus" ...
#>  $ trait_cont1 : num  0.83 0.874 -0.428 0.661 0.283 ...
#>  $ trait_cont2 : num  0.811 -0.106 0.672 0.475 0.622 ...
#>  $ trait_cont3 : num  -0.922 0.498 0.355 -0.657 -0.478 ...
#>  $ trait_cont4 : num  -0.684 -0.282 0.291 0.552 0.127 ...
#>  $ trait_cont5 : num  0.0715 -0.9955 0.2179 0.6736 0.503 ...
#>  $ trait_cont6 : num  0.16 0.643 -0.773 0.529 0.247 ...
#>  $ trait_cont7 : num  0.2035 -0.606 0.0705 -0.6409 -0.0962 ...
#>  $ trait_cont8 : num  -0.425 -0.611 0.568 -0.742 -0.742 ...
#>  $ trait_cont9 : num  0.1493 -0.2933 0.0949 0.7854 -0.02 ...
#>  $ trait_cont10: num  -0.5772 0.0992 -0.036 -0.6811 -0.7008 ...
#>  $ trait_cat11 : chr  "wetland" "forest" "wetland" "wetland" ...
#>  $ trait_cat12 : chr  "diurnal" "nocturnal" "diurnal" "nocturnal" ...
#>  $ trait_cat13 : chr  "bivoltine" "multivoltine" "univoltine" "multivoltine" ...
#>  $ trait_cat14 : chr  "detritivore" "detritivore" "generalist" "nectarivore" ...
#>  $ trait_cat15 : chr  "migratory" "resident" "resident" "migratory" ...
#>  $ trait_ord16 : int  4 1 4 3 4 1 1 4 1 1 ...
#>  $ trait_ord17 : int  1 4 4 3 2 4 3 5 4 3 ...
#>  $ trait_bin18 : int  1 1 1 0 1 1 1 1 0 0 ...
#>  $ trait_bin19 : int  1 0 1 0 0 1 1 1 0 1 ...
#>  $ trait_ord20 : chr  "medium" "large" "medium" "medium" ...

# Retrieve and join trait/metadata for all species in the observation set
spp_traits = purrr::map_dfr(
  unique(grid_obs$species),
  ~ get_trait_data(
    species = .x,
    n_palette = 5,
    preview = FALSE,
    do_summary = TRUE,
    do_taxonomy = TRUE,
    do_image = TRUE,
    do_palette = TRUE,
    local_trait_df = btfly_traits,
    local_species_col = "species",
    max_dist = 1
  )
)
# The final output combines trait data, taxonomic info, Wikipedia summary, images, and color palette for each species.
# This integrated dataset supports multi-faceted biodiversity, trait, and visualization analyses.

# Check output
str(spp_traits)
#> tibble [27 × 29] (S3: tbl_df/tbl/data.frame)
#>  $ species     : chr [1:27] "Utetheisa pulchella" "Danaus chrysippus orientis" "Telchinia serena" "Vanessa cardui" ...
#>  $ summary     : chr [1:27] "Utetheisa pulchella, the crimson-speckled flunkey, crimson-speckled footman, or crimson-speckled moth, is a mot"| __truncated__ NA "Acraea serena, the dancing acraea, is a butterfly of the family Nymphalidae. It is found throughout Africa sout"| __truncated__ "Vanessa cardui is the most widespread of all butterfly species. It is commonly called the painted lady, or form"| __truncated__ ...
#>  $ Kingdom     : Named chr [1:27] "Animalia" NA "Animalia" "Animalia" ...
#>   ..- attr(*, "names")= chr [1:27] "Kingdom" "Kingdom" "Kingdom" "Kingdom" ...
#>  $ Phylum      : Named chr [1:27] "Arthropoda" NA "Arthropoda" "Arthropoda" ...
#>   ..- attr(*, "names")= chr [1:27] "Phylum" "Phylum" "Phylum" "Phylum" ...
#>  $ Class       : Named chr [1:27] "Insecta" NA "Insecta" "Insecta" ...
#>   ..- attr(*, "names")= chr [1:27] "Class" "Class" "Class" "Class" ...
#>  $ Order       : Named chr [1:27] "Lepidoptera" NA "Lepidoptera" "Lepidoptera" ...
#>   ..- attr(*, "names")= chr [1:27] "Order" "Order" "Order" "Order" ...
#>  $ Family      : Named chr [1:27] "Erebidae" NA "Nymphalidae" "Nymphalidae" ...
#>   ..- attr(*, "names")= chr [1:27] "Family" "Family" "Family" "Family" ...
#>  $ img_url     : chr [1:27] "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a6/Arctiidae_-_Utetheisa_pulchella.JPG/250px-Arctiidae_-"| __truncated__ NA "https://upload.wikimedia.org/wikipedia/commons/thumb/2/2a/Dancing_acraea_%28Acraea_serena%29_underside_Maputo.j"| __truncated__ "https://upload.wikimedia.org/wikipedia/commons/thumb/c/c8/0_Belle-dame_%28Vanessa_cardui%29_-_Echinacea_purpure"| __truncated__ ...
#>  $ palette     : chr [1:27] "#1D220C, #535509, #A59A47, #CCBF98, #8C8012" NA "#736E50, #B88C34, #969769, #403420, #E6E4CB" "#695C41, #3A311F, #CC8242, #EA8FD3, #C55C9B" ...
#>  $ trait_cont1 : num [1:27] 0.0284 0.0382 -0.8351 -0.2196 0.314 ...
#>  $ trait_cont2 : num [1:27] -0.203 -0.224 -0.307 0.569 0.666 ...
#>  $ trait_cont3 : num [1:27] -0.9969 0.0288 0.0288 0.1632 0.5191 ...
#>  $ trait_cont4 : num [1:27] 0.48 -0.533 0.925 0.466 -0.39 ...
#>  $ trait_cont5 : num [1:27] 0.8748 -0.0945 0.263 0.701 -0.9972 ...
#>  $ trait_cont6 : num [1:27] 0.87 -0.703 0.36 0.101 0.559 ...
#>  $ trait_cont7 : num [1:27] 0.586 -0.366 0.836 -0.733 0.459 ...
#>  $ trait_cont8 : num [1:27] -0.0974 -0.8555 -0.8781 0.6775 -0.7754 ...
#>  $ trait_cont9 : num [1:27] 0.123 -0.657 -0.865 -0.859 -0.373 ...
#>  $ trait_cont10: num [1:27] 0.85009 -0.00145 -0.5899 0.77351 -0.62313 ...
#>  $ trait_cat11 : chr [1:27] "wetland" "grassland" "forest" "forest" ...
#>  $ trait_cat12 : chr [1:27] "diurnal" "diurnal" "nocturnal" "diurnal" ...
#>  $ trait_cat13 : chr [1:27] "multivoltine" "univoltine" "multivoltine" "univoltine" ...
#>  $ trait_cat14 : chr [1:27] "detritivore" "detritivore" "generalist" "generalist" ...
#>  $ trait_cat15 : chr [1:27] "migratory" "migratory" "migratory" "resident" ...
#>  $ trait_ord16 : int [1:27] 3 1 2 3 1 1 4 4 2 2 ...
#>  $ trait_ord17 : int [1:27] 4 4 5 3 4 2 1 5 2 5 ...
#>  $ trait_bin18 : int [1:27] 1 1 1 0 0 0 1 1 0 0 ...
#>  $ trait_bin19 : int [1:27] 1 1 1 0 0 0 1 1 1 1 ...
#>  $ trait_ord20 : chr [1:27] "small" "medium" "large" "large" ...
head(spp_traits)
#> # A tibble: 6 × 29
#>   species          summary Kingdom Phylum Class Order Family img_url palette trait_cont1 trait_cont2
#>   <chr>            <chr>   <chr>   <chr>  <chr> <chr> <chr>  <chr>   <chr>         <dbl>       <dbl>
#> 1 Utetheisa pulch… Utethe… Animal… Arthr… Inse… Lepi… Erebi… https:… #1D220…      0.0284      -0.203
#> 2 Danaus chrysipp… <NA>    <NA>    <NA>   <NA>  <NA>  <NA>   <NA>    <NA>         0.0382      -0.224
#> 3 Telchinia serena Acraea… Animal… Arthr… Inse… Lepi… Nymph… https:… #736E5…     -0.835       -0.307
#> 4 Vanessa cardui   Vaness… Animal… Arthr… Inse… Lepi… Nymph… https:… #695C4…     -0.220        0.569
#> 5 Hypolimnas misi… Hypoli… Animal… Arthr… Inse… Lepi… Nymph… https:… #584E4…      0.314        0.666
#> 6 Pieris brassicae Pieris… Animal… Arthr… Inse… Lepi… Pieri… https:… #ECEFE…     -0.765       -0.136
#> # ℹ 18 more variables: trait_cont3 <dbl>, trait_cont4 <dbl>, trait_cont5 <dbl>, trait_cont6 <dbl>,
#> #   trait_cont7 <dbl>, trait_cont8 <dbl>, trait_cont9 <dbl>, trait_cont10 <dbl>, trait_cat11 <chr>,
#> #   trait_cat12 <chr>, trait_cat13 <chr>, trait_cat14 <chr>, trait_cat15 <chr>, trait_ord16 <int>,
#> #   trait_ord17 <int>, trait_bin18 <int>, trait_bin19 <int>, trait_ord20 <chr>

4.2. Alternatively, load local combined site, environment, and trait data

# Read GBIF species occurrence with simulated traits and enviro data
# One row per site-species combination
site_env_spp = read.csv(system.file("extdata", "site_env_spp_simulated.csv", package = "invasimapr"))
# site_env_spp = read.csv(system.file("extdata", "site_env_spp_trt_sim.csv", package = "invasimapr"))

# Check the results
names(site_env_spp)
#>  [1] "site_id"      "x"            "y"            "species"      "count"        "trait_cont1" 
#>  [7] "trait_cont2"  "trait_cont3"  "trait_cont4"  "trait_cont5"  "trait_cont6"  "trait_cont7" 
#> [13] "trait_cont8"  "trait_cont9"  "trait_cont10" "trait_cat11"  "trait_cat12"  "trait_cat13" 
#> [19] "trait_cat14"  "trait_cat15"  "trait_ord16"  "trait_ord17"  "trait_bin18"  "trait_bin19" 
#> [25] "trait_ord20"  "env1"         "env2"         "env3"         "env4"         "env5"        
#> [31] "env6"         "env7"         "env8"         "env9"         "env10"        "cat11_num"
dim(site_env_spp)
#> [1] 11205    36

5. Model Inputs

Shape your data so every row is “one species at one site,” with that species’ traits and that site’s environment.

5.1. Format site-locations

This section isolates the unique spatial coordinates for each sampling site. The resulting table (site_xy) will be used for spatial mapping, distance calculations, and for merging environmental and biodiversity metrics with precise locations.

# Create site coordinate table i.e. # Unique site coordinates
site_xy = site_env_spp %>%
  dplyr::select(site_id, x, y) %>%
  distinct() %>%
  mutate(.site_id_rn = site_id) %>%
  column_to_rownames(var = ".site_id_rn")
head(site_xy)
#>      site_id     x         y
#> 1026    1026 28.75 -22.25004
#> 1027    1027 29.25 -22.25004
#> 1028    1028 29.75 -22.25004
#> 1029    1029 30.25 -22.25004
#> 1030    1030 30.75 -22.25004
#> 1031    1031 31.25 -22.25004

5.2. Format site-environment variables

Here, we extract a site-by-environment matrix containing the values of all measured environmental covariates at each sampling site. This matrix (site_env) enables analyses of environmental gradients, spatial drivers of community composition, and covariate modeling.

# Site-by-environment matrix
site_env = site_env_spp %>%
  dplyr::select(
    site_id, x, y,
    env1:env10
  ) %>%
  mutate(site_id = as.character(site_id)) %>% # ensure character
  distinct() %>%
  mutate(.site_id_rn = site_id) %>%
  column_to_rownames(var = ".site_id_rn")
dim(site_env)
#> [1] 415  13
head(site_env[1:6, 1:6])
#>      site_id     x         y     env1      env2       env3
#> 1026    1026 28.75 -22.25004 2.203029 0.6471631 -0.4910981
#> 1027    1027 29.25 -22.25004 2.086006 1.4025519 -0.4471106
#> 1028    1028 29.75 -22.25004 2.233508 0.8008731 -0.5405243
#> 1029    1029 30.25 -22.25004 2.333375 1.1607272 -0.4405388
#> 1030    1030 30.75 -22.25004 2.153073 1.2747649 -0.2945477
#> 1031    1031 31.25 -22.25004 2.046307 1.4773531 -0.4125693

5.3. Format site-species abundances and presence-absence

This section generates two site-by-species matrices: one containing abundances (site_spp_ab), and one indicating presence-absence (site_spp_pa). These matrices are fundamental for calculating community diversity, richness, and for modeling occupancy and abundance patterns.

# Site-by-species abundance matrix (wide format)
site_spp_ab = site_env_spp %>% #
  dplyr::select(site_id, x, y, species, count) %>%
  pivot_wider(
    names_from  = species,
    values_from = count,
    values_fill = list(count = 0)
  ) %>%
  mutate(.site_id_rn = site_id) %>%
  column_to_rownames(var = ".site_id_rn")
dim(site_spp_ab)
#> [1] 415  30
head(site_spp_ab[1:6, 1:6])
#>      site_id     x         y Acraea horta Amata cerbera Bicyclus safitza safitza
#> 1026    1026 28.75 -22.25004           10             0                        0
#> 1027    1027 29.25 -22.25004            0             7                        0
#> 1028    1028 29.75 -22.25004            0             0                        0
#> 1029    1029 30.25 -22.25004            0            31                        0
#> 1030    1030 30.75 -22.25004            0            12                        3
#> 1031    1031 31.25 -22.25004            0             7                        0

# Site-by-species presence/absence matrix (wide format)
site_spp_pa = site_env_spp %>%
  mutate(pa = as.integer(count > 0)) %>%
  dplyr::select(site_id, x, y, species, pa) %>%
  pivot_wider(
    names_from  = species,
    values_from = pa,
    values_fill = list(pa = 0)
  ) %>%
  mutate(.site_id_rn = site_id) %>%
  column_to_rownames(var = ".site_id_rn")
dim(site_spp_pa)
#> [1] 415  30
head(site_spp_pa[1:6, 1:6])
#>      site_id     x         y Acraea horta Amata cerbera Bicyclus safitza safitza
#> 1026    1026 28.75 -22.25004            1             0                        0
#> 1027    1027 29.25 -22.25004            0             1                        0
#> 1028    1028 29.75 -22.25004            0             0                        0
#> 1029    1029 30.25 -22.25004            0             1                        0
#> 1030    1030 30.75 -22.25004            0             1                        1
#> 1031    1031 31.25 -22.25004            0             1                        0

5.4. Format species-trait values

Here we build the species-by-trait matrix (spp_trait), including all measured continuous, categorical, and ordinal traits for each species. This structure is central for trait-based analyses of community assembly, functional diversity, and invasion processes.

# Species-by-trait matrix (wide)
# Extract and process continuous, categorical, and ordinal trait data
spp_trait = spp_traits %>% # site_env_spp
  dplyr::select(
    species, trait_cont1:trait_cont10,
    trait_cat11:trait_cat15,
    trait_ord16:trait_ord20
  ) %>%
  distinct() %>%
  mutate(.species_rn = species) %>%
  column_to_rownames(var = ".species_rn") %>%
  mutate(across(where(is.character), as.factor))

# Check results
dim(spp_trait)
#> [1] 27 21
head(spp_trait[1:6, 1:6])
#>                                               species trait_cont1 trait_cont2 trait_cont3
#> Utetheisa pulchella               Utetheisa pulchella  0.02842357  -0.2030292 -0.99685889
#> Danaus chrysippus orientis Danaus chrysippus orientis  0.03819190  -0.2237834  0.02882587
#> Telchinia serena                     Telchinia serena -0.83512488  -0.3065035  0.02881542
#> Vanessa cardui                         Vanessa cardui -0.21959307   0.5693856  0.16320801
#> Hypolimnas misippus               Hypolimnas misippus  0.31398458   0.6658322  0.51908854
#> Pieris brassicae                     Pieris brassicae -0.76502528  -0.1364975 -0.71904181
#>                            trait_cont4 trait_cont5
#> Utetheisa pulchella          0.4797106  0.87477170
#> Danaus chrysippus orientis  -0.5325932 -0.09453685
#> Telchinia serena             0.9252160  0.26301460
#> Vanessa cardui               0.4664918  0.70096550
#> Hypolimnas misippus         -0.3895633 -0.99723831
#> Pieris brassicae             0.4879493 -0.21005391

6. Data summaries and visualisation

6.1. Summarise site-level diversity

This section quantifies and visualizes site-level biodiversity, focusing on local species richness and abundance. Calculating these metrics is essential for mapping alpha diversity, assessing community structure, and identifying spatial patterns of biodiversity hotspots and low-diversity areas across the study landscape.

  • Species richness (spp_richs): the number of species present (non-zero counts) at site \(s\).
  • Total abundance (obs_sums): the sum of all individual counts across species at site \(s\) (a proxy for sampling effort).
  • Mean abundance per species (obs_means): total abundance at site \(s\) divided by the number of species columns (N); effectively the average count per species regardless of whether it is present.
# Calculate site-level diversity metrics from the species-by-abundance matrix:
spp_rich_obs = site_spp_ab %>%
  mutate(
    spp_rich = rowSums(dplyr::select(., -site_id, -x, -y) > 0), # Species richness: number of species present
    obs_sum = rowSums(dplyr::select(., -site_id, -x, -y)), # Total abundance: sum of all individuals
    obs_mean = rowMeans(dplyr::select(., -site_id, -x, -y)) # Mean abundance per species
  ) %>%
  # Keep summary metrics and site coordinates
  dplyr::select(site_id, x, y, spp_rich, obs_sum, obs_mean) %>%
  mutate(site_id = as.character(site_id)) # Ensure site_id` is a
head(spp_rich_obs)
#>      site_id     x         y spp_rich obs_sum obs_mean
#> 1026    1026 28.75 -22.25004       17     172 6.370370
#> 1027    1027 29.25 -22.25004        9      92 3.407407
#> 1028    1028 29.75 -22.25004       11     131 4.851852
#> 1029    1029 30.25 -22.25004       12     129 4.777778
#> 1030    1030 30.75 -22.25004       13     136 5.037037
#> 1031    1031 31.25 -22.25004       13      99 3.666667

# Visualize spatial distribution of site-level species richness
# First define a custom color palette for richness mapping (blue = low, dark red = high)
col_pal = colorRampPalette(c("blue", "green", "yellow", "orange", "red", "darkred"))

# Plot
ggplot(spp_rich_obs, aes(x = x, y = y, fill = sqrt(spp_rich))) +
  geom_tile() +
  # Use custom color gradient, reversed so high richness is warm/dark, low is cool/blue
  scale_fill_gradientn(colors = rev(col_pal(10)), name = "√(Richness)") +
  geom_text(aes(label = spp_rich), color = "grey80", size = 2) + # Overlay actual richness values
  geom_sf(data = rsa, inherit.aes = FALSE, fill = NA, color = "black", size = 0.4) + # Plot boundary
  labs(
    x = "Longitude",
    y = "Latitude",
    title = "Spatial Distribution of Species Richness"
  ) +
  theme(panel.grid = element_blank())

Figure 4: Spatial distribution of site-level species richness across South Africa, derived from a species-by-abundance matrix. Each grid cell indicates the number of species observed (numeric labels), with colours representing the square-root–transformed richness (√richness) to emphasize variation across the landscape. Warmer colours denote higher richness, while cooler colours denote lower richness. National boundaries are shown for geographic reference.


7. Functional Trait Space

7.1. Basic trait similarity

To diagnose which functional dimensions are more conserved versus variable across the metacommunity, we compute trait-level similarity for each trait across all species. This allows identification of traits that might constrain or facilitate invasion and coexistence (e.g., highly conserved traits might reflect strong filtering, while highly variable traits may be axes of ecological opportunity).

We use the compute_trait_similarity() function, which calculates similarity as follows:

  • Numeric traits: Scaled to [0,1], pairwise Euclidean distances are computed, and similarity is 1 - mean(distance). If all values are identical or only one value is present, similarity is 100%.
  • Categorical traits: Similarity is the proportion of all possible species pairs that share the same category (level).

The output is a table of percent similarity for each trait, allowing direct comparison of conservation vs. lability across traits.

# Compute Trait Similarity for Numeric and Categorical Variables
df_traits = compute_trait_similarity(spp_trait[, -1])
head(df_traits)
#> # A tibble: 6 × 2
#>   Trait       Similarity
#>   <chr>            <dbl>
#> 1 trait_cont1       61.8
#> 2 trait_cont2       63.9
#> 3 trait_cont3       65.1
#> 4 trait_cont4       64.1
#> 5 trait_cont5       69.5
#> 6 trait_cont6       62.5

# Barplot: trait-level similarity (percent identity or scaled distance)
ggplot(df_traits, aes(x = reorder(Trait, Similarity), y = Similarity, fill = Similarity)) +
  geom_col(show.legend = FALSE) +
  scale_fill_viridis_c(option = "inferno") + # ramp color scale
  # ylim(0,100) +
  labs(
    title = "Average Trait Similarity (%)",
    y = "Similarity (%)",
    x = NULL
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Trait-level functional similarity across species.

Trait-level functional similarity across species.

Figure 5: Average trait similarity across species, measured as percent similarity for each categorical or continuous trait. Bars represent trait-level functional similarity, ordered from lowest to highest, with colour intensity indicating the degree of similarity. Traits with higher similarity values reflect more conserved characteristics across species, whereas lower values indicate greater differentiation.

7.2. Gower distance, clustering, and trait space mapping (PCoA)

7.2.1. Compute Gower distance (handles mixed trait types)

Trait-based approaches require robust dissimilarity measures for mixed data types (continuous, categorical, ordinal). Here, we compute pairwise Gower distances among species, which accommodates all variable types, and use hierarchical clustering to visualize functional similarity structure within the community.

# Compute Gower dissimilarity matrix (excluding species column)
sbt_gower = cluster::daisy(spp_trait[, -1], metric = "gower")
trait_dist = as.matrix(sbt_gower)

7.2.2. Hierarchical clustering (visualise trait-based groupings)

# Hierarchical clustering and dendrogram visualization of functional similarity
gower_hc = hclust(as.dist(sbt_gower))
# Dendrogram
fviz_dend(
  gower_hc,
  k = 4,
  cex = 0.5,
  k_colors = viridis(4, option = "D"), # k_colors = c("red","blue","green","purple"),
  color_labels_by_k = TRUE,
  rect = TRUE,
  rect_border = "grey40",
  main = "Gower Cluster Dendrogram") + 
  guides(scale = "none")
Species clustering by functional traits (Gower distance)

Species clustering by functional traits (Gower distance)

Figure 6: Hierarchical clustering of species based on functional traits using Gower dissimilarity. The dendrogram partitions species into four distinct functional groups (coloured branches), reflecting shared trait combinations that may underlie ecological similarity.

7.2.3. PCoA ordination (map species into a reduced-dimensional trait space)

Principal Coordinates Analysis (PCoA) enables ordination of species in a reduced, low-dimensional trait space, preserving pairwise dissimilarities. This is used to visualize the overall structure of the functional trait space and examine density and clustering patterns.

# PCoA ordination
pcoa = cmdscale(sbt_gower, eig = TRUE)
scores_species = as.data.frame(pcoa$points)[, 1:2]
colnames(scores_species) = c("PCoA1", "PCoA2")

# Visualize trait space density using kernel density estimation
xlims = range(scores_species$PCoA1) + c(-1, 1) * 0.1 * diff(range(scores_species$PCoA1))
ylims = range(scores_species$PCoA2) + c(-1, 1) * 0.1 * diff(range(scores_species$PCoA2))
grid_density = MASS::kde2d(scores_species$PCoA1,
  scores_species$PCoA2,
  n = 100,
  lims = c(xlims, ylims)
)
filled.contour(
  grid_density,
  color.palette = viridis,
  xlim = xlims, ylim = ylims,
  plot.title = title(
    main = "Trait Space Density Contours",
    xlab = "PCoA1",
    ylab = "PCoA2"
  ),
  plot.axes = {
    axis(1)
    axis(2)
    points(scores_species, pch = 19, cex = 0.5)
    # Draw all contours (thin)
    contour(
      x = grid_density$x, y = grid_density$y, z = grid_density$z,
      add = TRUE, drawlabels = FALSE, lwd = 0.7, col = "grey60"
    )
    # Highlight the major contour (e.g. highest density level)
    contour(
      x = grid_density$x, y = grid_density$y, z = grid_density$z,
      add = TRUE, drawlabels = FALSE,
      levels = max(grid_density$z) * 0.5, # 50% of max density
      lwd = 2, col = "black"
    )
  },
  key.title = title(main = "Density")
)
Kernel density in trait space (PCoA axes 1-2).

Kernel density in trait space (PCoA axes 1-2).

Figure 7: Principal Coordinates Analysis (PCoA) of species in reduced-dimensional trait space, based on Gower dissimilarity. Kernel density contours highlight regions of high species density (hotspots of trait similarity) and illustrate the overall structure of the functional trait space.

7.3. Trait centrality

Trait centrality quantifies how close each species is to the “core” of the community’s trait space. Peripheral species may be ecologically distinct and potentially more likely to become invaders or to escape biotic resistance.

# Calculate the community trait centroid in reduced trait-space (PCoA axes)
# First define abundance weights (uniform if none provided)
# Provide a numeric vector `abundance` aligned with rows(scores_species) if available
n = nrow(scores_species)
if (!exists("abundance") || is.null(abundance)) {
  p = rep(1 / n, n)
} else {
  stopifnot(is.numeric(abundance), length(abundance) == n, all(abundance >= 0))
  s = sum(abundance)
  if (s == 0) stop("sum(abundance) == 0")
  p = as.numeric(abundance) / s
}

# Weighted centroid and centrality (distance to centroid)
centroid = colSums(scores_species * p) / sum(p)
scores_species$centrality = sqrt(rowSums((as.matrix(scores_species) -
  matrix(centroid, n, 2, byrow = TRUE))^2))

# Store centrality alongside original trait table (for later use/plots) ---
spp_trt_cent = spp_trait
spp_trt_cent$centrality = scores_species$centrality

# Histogram of distribution of trait centrality (core vs peripheral species)
ggplot(spp_trt_cent, aes(x = centrality)) +
  geom_histogram(bins = 20, fill = "steelblue", color = "white") +
  theme_bw() +
  labs(
    x = "Distance to community-centroid", y = "Number of species",
    title = "Trait Centrality (Community Edge vs Core)"
  )
Distribution of trait centrality (distance to centroid) among species.

Distribution of trait centrality (distance to centroid) among species.

Figure 8: Distribution of trait centrality (distance to the community centroid) among species. The histogram shows how far each species lies from the centroid, representing the average trait combination of the community. The x-axis indicates distance to the centroid, while the y-axis gives the number of species at each distance. Most species are clustered at intermediate distances (~0.18–0.22), reflecting moderately typical trait values. A few species occur very close to the centroid, representing “core” species with common trait profiles, whereas others are more distant “peripheral” species with unusual trait combinations that may indicate specialized ecological roles. Overall, the community is centred on a typical trait set but also contains species that are either strongly representative or markedly distinct from that average.  

7.4. Trait dispersion

We calculate key functional diversity metrics at the community scale:

  • FDis: functional dispersion (average distance to centroid)
  • FRic: functional richness (trait-space convex hull volume)
  • RaoQ: Rao’s quadratic entropy (total abundance-weighted trait dissimilarity)

These summarize the functional structure and ecological breadth of the community.

# FDis: functional dispersion (weighted mean distance to centroid) 
FDis = sum(p * scores_species$centrality)

# FRic: functional richness (convex hull area/volume on retained axes) ---
FRic = NA_real_
if (n >= 3L) {  # need pcoa_dims + 1 points with pcoa_dims = 2
  hull = geometry::convhulln(as.matrix(scores_species[, c("PCoA1", "PCoA2")]), options = "FA")
  FRic = hull$vol
}

# RaoQ: abundance-weighted trait dissimilarity (Rao's quadratic entropy)
dmat = as.matrix(dist(scores_species[, c("PCoA1", "PCoA2")]))
RaoQ = 0.5 * sum(outer(p, p) * dmat)

# Assemble and plot
dispersion_df = data.frame(
  Metric = c("FDis", "FRic", "RaoQ"),
  Value  = c(FDis, FRic, RaoQ)
)

# Bar plot: community-level trait dispersion metrics
ggplot(dispersion_df, aes(x = Metric, y = Value)) +
  geom_col(width = 0.6, fill = "firebrick") +
  theme_classic() +
  labs(title = "Community-Level Trait Dispersion", y = "Metric value")
Community-level functional diversity metrics.

Community-level functional diversity metrics.

Figure 9: Community-level trait dispersion. Bar chart of three functional diversity metrics summarising the community’s functional structure. FDis (Functional Dispersion; ~0.20) indicates moderate spread of species around the trait centroid, reflecting variation in trait combinations. FRic (Functional Richness; ~0.21) is highest, showing that the community spans a broad portion of trait space, i.e. species collectively cover a wide range of trait combinations. RaoQ (Rao’s Quadratic Entropy; ~0.13) is lowest, suggesting that although trait space is well filled, community abundance is concentrated in functionally similar species. Together, these metrics suggest a community with broad trait coverage and moderate dispersion but dominated by clusters of functionally alike species.  

7.5. Combined functional trait space workflow

Trait-based community analyses often require multiple sequential steps: computing per-trait similarity, calculating trait dissimilarities across species, ordination, mapping trait space, quantifying species centrality, and summarising community-level diversity metrics.

The compute_trait_space() function unifies these steps into a single call, producing both per-trait similarity summaries and community-level dispersion outputs as follows:

  • Per-Trait Similarity Table: out$similarity holds percentage similarity (0-100) for each trait column, computed as either i) Numeric traits as the scaled mean pairwise similarity; or -Vi) Categorical traits as the proportion of identical pairs.

  • Trait Dissimilarity & Clustering: out$dispersion$plots$dend contains hierarchical clustering results based on pairwise Gower distances across all traits, highlighting functional similarity groupings.

  • Trait Space Ordination (PCoA): out$dispersion$plots$density_gg shows species positioned in reduced-dimensional trait space using PCoA, with kernel density contours to reveal clusters and gaps.

  • Species Centrality: out$dispersion$plots$centrality_hist displays each species’ Euclidean distance from the trait-space centroid. Shorter distances indicate core species; longer distances indicate peripheral, functionally distinct species.

  • Functional Diversity Metrics Table: out$dispersion$metrics_df provides tabulated values for all computed community-level functional diversity indices.

  • Functional Diversity Summary Plots: out$dispersion$plots$metrics_bar presents bar plots for three key community-level metrics:

    • FDis - functional dispersion (mean distance to centroid);
    • FRic - functional richness (convex hull volume/area);
    • RaoQ - abundance-weighted trait dissimilarity (Rao’s quadratic entropy).
# Same inputs and space as your manual pipeline
res = compute_trait_space(
  trait_df = spp_trait,
  species_col = 1,
  do_similarity = FALSE, # you don't need per-trait similarity here
  k = 4, # only affects dendrogram; metrics don’t use k
  pcoa_dims = 2, # keep 2 axes
  abundance = rep(1, nrow(spp_trait)), # equal weights/unweighted centroid
  show_density_plot = FALSE,
  show_plots = TRUE
)

Figure 10: Combined functional trait space workflow. Outputs from the compute_trait_space() function, which integrates multiple trait-based analyses into a unified framework. Shown are: (top left) hierarchical clustering of species by Gower trait distance; (top right) community-level functional diversity metrics (FDis, FRic, RaoQ); (bottom left) species centrality histogram (distance to centroid); and (bottom right) trait space ordination (PCoA with kernel density). Together, these visualisations capture trait similarity, trait-space structure, species’ functional positions, and overall community-level functional diversity.

# Check results/outputs
str(res, max.level = 1)
#> List of 1
#>  $ dispersion:List of 7
head(res$similarity)
#> NULL
str(res$dispersion, max.level = 1)
#> List of 7
#>  $ distance_matrix: num [1:27, 1:27] 0 0.398 0.416 0.534 0.607 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ hc             :List of 7
#>   ..- attr(*, "class")= chr "hclust"
#>  $ pcoa           :List of 5
#>  $ scores         :'data.frame': 27 obs. of  3 variables:
#>  $ centroid       : num [1:2] -1.44e-17 -5.42e-20
#>  $ metrics_df     :'data.frame': 3 obs. of  2 variables:
#>  $ plots          :List of 4

Note: By pairing compute_trait_similarity() (per-trait conservation/lability) with compute_trait_dispersion() (whole-community functional structure), you get a complete, repeatable workflow for trait-based invasion and coexistence studies. All plots are stored in res$plots for flexible reuse, while res$metrics_df provides ready-to-use numerical summaries for statistical modelling.  

You can also rearrange or customise the plots in res$plots using patchwork or other layout tools. For example, the code below recreates the combined layout used when show_plot = TRUE, but omits the base filled.contour() plot for simplicity.

str(res$dispersion, max.level = 1)
#> List of 7
#>  $ distance_matrix: num [1:27, 1:27] 0 0.398 0.416 0.534 0.607 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ hc             :List of 7
#>   ..- attr(*, "class")= chr "hclust"
#>  $ pcoa           :List of 5
#>  $ scores         :'data.frame': 27 obs. of  3 variables:
#>  $ centroid       : num [1:2] -1.44e-17 -5.42e-20
#>  $ metrics_df     :'data.frame': 3 obs. of  2 variables:
#>  $ plots          :List of 4

# # Custom combined layout without base filled.contour
# combined = res$dispersion$plots$dend /
#   (res$dispersion$plots$centrality_hist | res$dispersion$plots$metrics_bar) /
#   res$dispersion$plots$density_gg +
#   patchwork::plot_layout(heights = c(1, 2, 1))
#
# print(combined)  # display in console

Note: In this way you can change the order or arrangement of panels, replace individual plots with customised versions (e.g., change themes or colours), and/or combine them with other figures in your workflow  


8. Trait-Environment Response

To evaluate how species functional traits, environmental conditions, and their interactions influence species abundances, we use a Generalized Linear Mixed Model (GLMM) to:

  • Quantify the separate and combined effects of traits and environment.
  • Control for repeated observations of the same species and sites via random intercepts, accounting for non-independence and spatial structure.
  • Allow the flexibility to predict how hypothetical (invader) species might perform in new environments.

We use the Tweedie error distribution, which is well-suited to ecological count data because it handles overdispersion and many zeros.

8.1. Prepare the long-format dataset

We first create a long-format table, where each row is a single species-at-site observation, with all associated predictors attached:

  • Site metadata: site_id, spatial coordinates (x, y)
  • Species ID and count/abundance
  • Environmental predictors: e.g., env1-env10
  • Species traits: continuous (trait_cont1-trait_cont10), categorical (trait_cat11-trait_cat15), and ordinal (trait_ord16-trait_ord20)

We then convert all character variables to factors so they’re correctly handled by the model.

# Prepare long-format data
# Use simulated traits
longDF = site_env_spp %>%
  dplyr::select(
    site_id, x, y, species, count, # Metadata + response
    env1:env10, # Environment variables
    trait_cont1:trait_cont10, # Continuous traits
    trait_cat11:trait_cat15, # Categorical traits
    trait_ord16:trait_ord20 # Ordinal traits
  ) %>%
  mutate(across(where(is.character), as.factor))
head(longDF)
#>   site_id     x         y                    species count     env1      env2       env3       env4
#> 1    1026 28.75 -22.25004               Acraea horta    10 2.203029 0.6471631 -0.4910981 -0.7934531
#> 2    1026 28.75 -22.25004              Amata cerbera     0 2.203029 0.6471631 -0.4910981 -0.7934531
#> 3    1026 28.75 -22.25004   Bicyclus safitza safitza     0 2.203029 0.6471631 -0.4910981 -0.7934531
#> 4    1026 28.75 -22.25004           Cacyreus lingeus     0 2.203029 0.6471631 -0.4910981 -0.7934531
#> 5    1026 28.75 -22.25004        Charaxes wakefieldi     9 2.203029 0.6471631 -0.4910981 -0.7934531
#> 6    1026 28.75 -22.25004 Danaus chrysippus orientis     8 2.203029 0.6471631 -0.4910981 -0.7934531
#>        env5     env6      env7      env8        env9     env10 trait_cont1 trait_cont2 trait_cont3
#> 1 0.8216381 1.545075 0.4185999 -1.050379 -0.05366469 0.9329782   0.8296121   0.8114763 -0.92212702
#> 2 0.8216381 1.545075 0.4185999 -1.050379 -0.05366469 0.9329782   0.8741508  -0.1060607  0.49759077
#> 3 0.8216381 1.545075 0.4185999 -1.050379 -0.05366469 0.9329782  -0.4277209   0.6720085  0.35455366
#> 4 0.8216381 1.545075 0.4185999 -1.050379 -0.05366469 0.9329782   0.6608953   0.4751912 -0.65747134
#> 5 0.8216381 1.545075 0.4185999 -1.050379 -0.05366469 0.9329782   0.2834910   0.6221103 -0.47782407
#> 6 0.8216381 1.545075 0.4185999 -1.050379 -0.05366469 0.9329782   0.0381919  -0.2237834  0.02882587
#>   trait_cont4 trait_cont5 trait_cont6 trait_cont7 trait_cont8 trait_cont9 trait_cont10 trait_cat11
#> 1  -0.6841896  0.07152258   0.1596418  0.20353247  -0.4245005  0.14927467 -0.577216122     wetland
#> 2  -0.2819434 -0.99545407   0.6428078 -0.60601102  -0.6106477 -0.29329924  0.099240827      forest
#> 3   0.2912638  0.21787491  -0.7725628  0.07047322   0.5682188  0.09485216 -0.036037103     wetland
#> 4   0.5516467  0.67360312   0.5290155 -0.64088852  -0.7422557  0.78543719 -0.681060291     wetland
#> 5   0.1272937  0.50304512   0.2472269 -0.09622701  -0.7418214 -0.02001886 -0.700842010      forest
#> 6  -0.5325932 -0.09453686  -0.7031068 -0.36589330  -0.8554938 -0.65673577 -0.001454239   grassland
#>   trait_cat12  trait_cat13 trait_cat14 trait_cat15 trait_ord16 trait_ord17 trait_bin18 trait_bin19
#> 1     diurnal    bivoltine detritivore   migratory           4           1           1           1
#> 2   nocturnal multivoltine detritivore    resident           1           4           1           0
#> 3     diurnal   univoltine  generalist    resident           4           4           1           1
#> 4   nocturnal multivoltine nectarivore   migratory           3           3           0           0
#> 5   nocturnal    bivoltine  generalist   migratory           4           2           1           0
#> 6     diurnal   univoltine detritivore   migratory           1           4           1           1
#>   trait_ord20
#> 1      medium
#> 2       large
#> 3      medium
#> 4      medium
#> 5      medium
#> 6      medium

8.2. Build the model formula

We use build_glmm_formula() to automatically detect trait and environmental predictor columns based on naming conventions (prefixes like “trait_” or “env_”) or by excluding known metadata columns (site_id, x, y, species, count).

The function then:

  • Generates main effect terms for all detected traits and all detected environment variables.
  • Optionally expands these into all pairwise trait × environment interactions (e.g., trait_cont1:env3), capturing environment-dependent trait effects.
  • Appends random intercepts for the grouping variables specified in random_effects (default: (1 | species) and (1 | site_id)).
  • Returns a valid R formula object ready for model fitting.

This automatic approach ensures that the model always reflects the full trait-environment structure without hard-coding variable names.

# Automatically build the GLMM formula
fml = build_glmm_formula(
  data                 = longDF,
  response             = "count", # Response variable
  species_col          = "species", # Random effect grouping
  site_col             = "site_id", # Random effect grouping
  # env_cols             = names(longDF[,6:24]),
  include_interactions = TRUE, # Add all trait × environment terms
  random_effects       = c("(1 | species)", "(1 | site_id)")
)

# Check model formula structure
fml
#> count ~ trait_cont1 + trait_cont2 + trait_cont3 + trait_cont4 + 
#>     trait_cont5 + trait_cont6 + trait_cont7 + trait_cont8 + trait_cont9 + 
#>     trait_cont10 + trait_cat11 + trait_cat12 + trait_cat13 + 
#>     trait_cat14 + trait_cat15 + trait_ord16 + trait_ord17 + trait_bin18 + 
#>     trait_bin19 + trait_ord20 + env1 + env2 + env3 + env4 + env5 + 
#>     env6 + env7 + env8 + env9 + env10 + (trait_cont1 + trait_cont2 + 
#>     trait_cont3 + trait_cont4 + trait_cont5 + trait_cont6 + trait_cont7 + 
#>     trait_cont8 + trait_cont9 + trait_cont10 + trait_cat11 + 
#>     trait_cat12 + trait_cat13 + trait_cat14 + trait_cat15 + trait_ord16 + 
#>     trait_ord17 + trait_bin18 + trait_bin19 + trait_ord20):(env1 + 
#>     env2 + env3 + env4 + env5 + env6 + env7 + env8 + env9 + env10) + 
#>     (1 | species) + (1 | site_id)
#> <environment: 0x000001e927422080>

8.3. Fit the GLMM

We fit the model using glmmTMB::glmmTMB(), which supports a wide range of distributions and correlation structures. In the example below:

  • Family: tweedie(link = "log")
    • Handles overdispersed count data and zero-inflation without requiring a separate zero-inflated component.
    • The log link models multiplicative effects of predictors on expected abundance.
  • Data: The prepared long-format table (longDF).
  • Formula: The formula, automatically generated fml from build_glmm_formula() or user customised.

The model is fit via maximum likelihood, estimating both fixed-effect coefficients (for traits, environments, and their interactions) and variance components for the random effects.

# Fit Tweedie GLMM
set.seed(123)
mod = glmmTMB::glmmTMB(
  formula = fml,
  data    = longDF,
  family  = glmmTMB::tweedie(link = "log")
)
# Print model summary
summary(mod)
#>  Family: tweedie  ( log )
#> Formula:          count ~ trait_cont1 + trait_cont2 + trait_cont3 + trait_cont4 +  
#>     trait_cont5 + trait_cont6 + trait_cont7 + trait_cont8 + trait_cont9 +  
#>     trait_cont10 + trait_cat11 + trait_cat12 + trait_cat13 +  
#>     trait_cat14 + trait_cat15 + trait_ord16 + trait_ord17 + trait_bin18 +  
#>     trait_bin19 + trait_ord20 + env1 + env2 + env3 + env4 + env5 +  
#>     env6 + env7 + env8 + env9 + env10 + (trait_cont1 + trait_cont2 +  
#>     trait_cont3 + trait_cont4 + trait_cont5 + trait_cont6 + trait_cont7 +  
#>     trait_cont8 + trait_cont9 + trait_cont10 + trait_cat11 +  
#>     trait_cat12 + trait_cat13 + trait_cat14 + trait_cat15 + trait_ord16 +  
#>     trait_ord17 + trait_bin18 + trait_bin19 + trait_ord20):(env1 +  
#>     env2 + env3 + env4 + env5 + env6 + env7 + env8 + env9 + env10) +  
#>     (1 | species) + (1 | site_id)
#> Data: longDF
#> 
#>       AIC       BIC    logLik -2*log(L)  df.resid 
#>   38196.0   40239.4  -18819.0   37638.0     10926 
#> 
#> Random effects:
#> 
#> Conditional model:
#>  Groups  Name        Variance  Std.Dev.
#>  species (Intercept) 0.0007666 0.02769 
#>  site_id (Intercept) 0.0067449 0.08213 
#> Number of obs: 11205, groups:  species, 27; site_id, 415
#> 
#> Dispersion parameter for tweedie family (): 7.98 
#> 
#> Conditional model:
#>                                 Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)                    1.4704424  0.2628156   5.595 2.21e-08 ***
#> trait_cont1                   -0.5101186  0.1596020  -3.196 0.001393 ** 
#> trait_cont2                   -0.2327202  0.0678723  -3.429 0.000606 ***
#> trait_cont3                   -0.0341002  0.1094644  -0.312 0.755407    
#> trait_cont4                    0.0130361  0.0849067   0.154 0.877977    
#> trait_cont5                    0.0729463  0.0994374   0.734 0.463199    
#> trait_cont6                    0.0968116  0.1181499   0.819 0.412560    
#> trait_cont7                   -0.0688003  0.0874939  -0.786 0.431666    
#> trait_cont8                    0.1176131  0.1168984   1.006 0.314361    
#> trait_cont9                   -0.0103170  0.0824478  -0.125 0.900418    
#> trait_cont10                   0.1304327  0.0805313   1.620 0.105307    
#> trait_cat11grassland           0.0306508  0.1759612   0.174 0.861716    
#> trait_cat11wetland            -0.1600557  0.2324683  -0.689 0.491134    
#> trait_cat12nocturnal          -0.2304255  0.1110776  -2.074 0.038037 *  
#> trait_cat13multivoltine       -0.0010464  0.0923656  -0.011 0.990961    
#> trait_cat13univoltine         -0.2102297  0.1253302  -1.677 0.093463 .  
#> trait_cat14generalist          0.0071383  0.2479466   0.029 0.977032    
#> trait_cat14nectarivore         0.2505427  0.1645857   1.522 0.127943    
#> trait_cat15resident           -0.0296779  0.1293586  -0.229 0.818540    
#> trait_ord16                    0.0100260  0.0466039   0.215 0.829665    
#> trait_ord17                   -0.0374465  0.0314953  -1.189 0.234457    
#> trait_bin18                    0.0873029  0.0915280   0.954 0.340165    
#> trait_bin19                    0.0315366  0.1155415   0.273 0.784895    
#> trait_ord20medium              0.1233020  0.2515604   0.490 0.624029    
#> trait_ord20small              -0.1603479  0.1475234  -1.087 0.277067    
#> env1                           0.5422358  0.5433334   0.998 0.318289    
#> env2                          -1.0286903  0.5629959  -1.827 0.067674 .  
#> env3                          -0.4640513  0.8122554  -0.571 0.567788    
#> env4                           0.5304439  0.6134961   0.865 0.387245    
#> env5                           0.4719702  0.6761339   0.698 0.485151    
#> env6                           0.3931038  0.6982169   0.563 0.573427    
#> env7                           0.3775495  0.9974476   0.379 0.705048    
#> env8                          -0.2439993  1.1758269  -0.208 0.835609    
#> env9                          -0.6138702  1.2123491  -0.506 0.612613    
#> env10                         -0.4518295  1.2848177  -0.352 0.725087    
#> trait_cont1:env1               0.4542883  0.3307548   1.373 0.169600    
#> trait_cont1:env2              -0.1657942  0.3429966  -0.483 0.628833    
#> trait_cont1:env3              -0.0672174  0.4913060  -0.137 0.891178    
#> trait_cont1:env4               0.3310830  0.3695503   0.896 0.370302    
#> trait_cont1:env5              -0.0086909  0.4129877  -0.021 0.983211    
#> trait_cont1:env6              -0.0522244  0.4194517  -0.125 0.900914    
#> trait_cont1:env7              -0.2542360  0.6089449  -0.418 0.676311    
#> trait_cont1:env8              -0.1987654  0.7093235  -0.280 0.779310    
#> trait_cont1:env9               0.0132204  0.7344460   0.018 0.985638    
#> trait_cont1:env10              0.2976046  0.7825323   0.380 0.703716    
#> trait_cont2:env1               0.1224603  0.1401736   0.874 0.382318    
#> trait_cont2:env2               0.1257619  0.1457562   0.863 0.388234    
#> trait_cont2:env3               0.0510393  0.2097336   0.243 0.807732    
#> trait_cont2:env4              -0.0545704  0.1554956  -0.351 0.725630    
#> trait_cont2:env5              -0.0523391  0.1758109  -0.298 0.765931    
#> trait_cont2:env6              -0.0236931  0.1776132  -0.133 0.893879    
#> trait_cont2:env7               0.0193329  0.2592569   0.075 0.940556    
#> trait_cont2:env8              -0.0035217  0.3032631  -0.012 0.990735    
#> trait_cont2:env9               0.0114218  0.3123773   0.037 0.970833    
#> trait_cont2:env10             -0.1187163  0.3333231  -0.356 0.721721    
#> trait_cont3:env1               0.0224383  0.2268712   0.099 0.921215    
#> trait_cont3:env2               0.0197380  0.2373843   0.083 0.933734    
#> trait_cont3:env3               0.1063384  0.3360710   0.316 0.751686    
#> trait_cont3:env4               0.2250068  0.2545347   0.884 0.376700    
#> trait_cont3:env5               0.0438643  0.2815770   0.156 0.876206    
#> trait_cont3:env6               0.0931172  0.2868353   0.325 0.745456    
#> trait_cont3:env7              -0.3698838  0.4171191  -0.887 0.375209    
#> trait_cont3:env8              -0.2242540  0.4868664  -0.461 0.645081    
#> trait_cont3:env9              -0.1435396  0.5033282  -0.285 0.775506    
#> trait_cont3:env10              0.1991328  0.5354910   0.372 0.709990    
#> trait_cont4:env1               0.0662273  0.1761968   0.376 0.707013    
#> trait_cont4:env2               0.0482631  0.1848803   0.261 0.794054    
#> trait_cont4:env3               0.0475713  0.2618060   0.182 0.855815    
#> trait_cont4:env4               0.0558466  0.1984742   0.281 0.778419    
#> trait_cont4:env5              -0.0079881  0.2180135  -0.037 0.970772    
#> trait_cont4:env6               0.1322373  0.2240845   0.590 0.555109    
#> trait_cont4:env7               0.0030004  0.3240575   0.009 0.992613    
#> trait_cont4:env8              -0.0559403  0.3797565  -0.147 0.882891    
#> trait_cont4:env9              -0.0373883  0.3919177  -0.095 0.923998    
#> trait_cont4:env10             -0.2119408  0.4155868  -0.510 0.610066    
#> trait_cont5:env1              -0.3157162  0.2076197  -1.521 0.128348    
#> trait_cont5:env2              -0.0020636  0.2157308  -0.010 0.992368    
#> trait_cont5:env3              -0.0658514  0.3077739  -0.214 0.830578    
#> trait_cont5:env4               0.0134922  0.2354324   0.057 0.954300    
#> trait_cont5:env5               0.0261295  0.2553548   0.102 0.918498    
#> trait_cont5:env6               0.2519034  0.2647715   0.951 0.341402    
#> trait_cont5:env7              -0.4533944  0.3789051  -1.197 0.231466    
#> trait_cont5:env8               0.0661443  0.4466947   0.148 0.882284    
#> trait_cont5:env9               0.0510576  0.4586971   0.111 0.911371    
#> trait_cont5:env10              0.5789141  0.4873776   1.188 0.234907    
#> trait_cont6:env1              -0.1881089  0.2447584  -0.769 0.442161    
#> trait_cont6:env2               0.1347617  0.2537602   0.531 0.595378    
#> trait_cont6:env3               0.1762242  0.3631814   0.485 0.627518    
#> trait_cont6:env4              -0.1655427  0.2731632  -0.606 0.544501    
#> trait_cont6:env5              -0.0584509  0.3052173  -0.192 0.848129    
#> trait_cont6:env6               0.1809909  0.3105673   0.583 0.560045    
#> trait_cont6:env7               0.0136916  0.4483815   0.031 0.975640    
#> trait_cont6:env8              -0.0287384  0.5247439  -0.055 0.956324    
#> trait_cont6:env9               0.1670829  0.5425820   0.308 0.758128    
#> trait_cont6:env10             -0.0216889  0.5775002  -0.038 0.970041    
#> trait_cont7:env1               0.0117718  0.1832154   0.064 0.948770    
#> trait_cont7:env2              -0.2494385  0.1908231  -1.307 0.191155    
#> trait_cont7:env3              -0.2403364  0.2736175  -0.878 0.379745    
#> trait_cont7:env4               0.1897891  0.2073779   0.915 0.360095    
#> trait_cont7:env5               0.0307101  0.2285526   0.134 0.893112    
#> trait_cont7:env6               0.2824628  0.2343997   1.205 0.228185    
#> trait_cont7:env7              -0.1028858  0.3374557  -0.305 0.760452    
#> trait_cont7:env8              -0.0941266  0.3973214  -0.237 0.812732    
#> trait_cont7:env9               0.0188233  0.4101149   0.046 0.963392    
#> trait_cont7:env10              0.0654068  0.4330113   0.151 0.879936    
#> trait_cont8:env1              -0.1719497  0.2425198  -0.709 0.478316    
#> trait_cont8:env2              -0.2490949  0.2505595  -0.994 0.320148    
#> trait_cont8:env3              -0.1298487  0.3597634  -0.361 0.718153    
#> trait_cont8:env4              -0.0541162  0.2704139  -0.200 0.841384    
#> trait_cont8:env5               0.1029516  0.3019067   0.341 0.733100    
#> trait_cont8:env6               0.2066006  0.3083429   0.670 0.502835    
#> trait_cont8:env7               0.0301432  0.4444334   0.068 0.945926    
#> trait_cont8:env8               0.0388618  0.5189531   0.075 0.940306    
#> trait_cont8:env9               0.0657501  0.5389646   0.122 0.902904    
#> trait_cont8:env10              0.1524384  0.5715950   0.267 0.789708    
#> trait_cont9:env1               0.0491704  0.1729237   0.284 0.776144    
#> trait_cont9:env2              -0.1516373  0.1800992  -0.842 0.399807    
#> trait_cont9:env3               0.0404439  0.2559539   0.158 0.874447    
#> trait_cont9:env4               0.1468507  0.1926070   0.762 0.445799    
#> trait_cont9:env5               0.1078422  0.2144027   0.503 0.614972    
#> trait_cont9:env6               0.1373539  0.2175442   0.631 0.527790    
#> trait_cont9:env7               0.0709711  0.3175835   0.223 0.823168    
#> trait_cont9:env8              -0.1044847  0.3705180  -0.282 0.777946    
#> trait_cont9:env9              -0.0241682  0.3826458  -0.063 0.949638    
#> trait_cont9:env10             -0.0707903  0.4081366  -0.173 0.862300    
#> trait_cont10:env1             -0.0161651  0.1675005  -0.097 0.923117    
#> trait_cont10:env2             -0.0123918  0.1748189  -0.071 0.943490    
#> trait_cont10:env3             -0.1351959  0.2499217  -0.541 0.588540    
#> trait_cont10:env4             -0.0179706  0.1863072  -0.096 0.923158    
#> trait_cont10:env5             -0.0960179  0.2093890  -0.459 0.646548    
#> trait_cont10:env6              0.0523432  0.2122691   0.247 0.805227    
#> trait_cont10:env7              0.1417525  0.3084044   0.460 0.645780    
#> trait_cont10:env8              0.0911406  0.3615158   0.252 0.800959    
#> trait_cont10:env9              0.1438915  0.3728547   0.386 0.699557    
#> trait_cont10:env10            -0.1892620  0.3969231  -0.477 0.633488    
#> trait_cat11grassland:env1     -0.2166084  0.3633703  -0.596 0.551102    
#> trait_cat11wetland:env1       -0.2864278  0.4824768  -0.594 0.552739    
#> trait_cat11grassland:env2      0.3342700  0.3784489   0.883 0.377094    
#> trait_cat11wetland:env2       -0.0904607  0.5010602  -0.181 0.856730    
#> trait_cat11grassland:env3      0.0548656  0.5379108   0.102 0.918759    
#> trait_cat11wetland:env3        0.0121837  0.7162052   0.017 0.986427    
#> trait_cat11grassland:env4     -0.1520046  0.4004652  -0.380 0.704265    
#> trait_cat11wetland:env4        0.4100571  0.5381755   0.762 0.446096    
#> trait_cat11grassland:env5     -0.0669323  0.4505183  -0.149 0.881895    
#> trait_cat11wetland:env5       -0.0382453  0.6020639  -0.064 0.949350    
#> trait_cat11grassland:env6     -0.4388071  0.4552669  -0.964 0.335123    
#> trait_cat11wetland:env6       -0.1781153  0.6077280  -0.293 0.769458    
#> trait_cat11grassland:env7     -0.1155986  0.6691581  -0.173 0.862846    
#> trait_cat11wetland:env7        0.0118782  0.8885123   0.013 0.989334    
#> trait_cat11grassland:env8      0.1636730  0.7758684   0.211 0.832923    
#> trait_cat11wetland:env8       -0.1067334  1.0347932  -0.103 0.917848    
#> trait_cat11grassland:env9     -0.1814162  0.8019406  -0.226 0.821029    
#> trait_cat11wetland:env9       -0.1069967  1.0669591  -0.100 0.920120    
#> trait_cat11grassland:env10     0.0621179  0.8608062   0.072 0.942473    
#> trait_cat11wetland:env10       0.1178661  1.1409306   0.103 0.917719    
#> trait_cat12nocturnal:env1     -0.0754443  0.2309471  -0.327 0.743915    
#> trait_cat12nocturnal:env2     -0.0296988  0.2400987  -0.124 0.901557    
#> trait_cat12nocturnal:env3      0.0047181  0.3406933   0.014 0.988951    
#> trait_cat12nocturnal:env4      0.0006826  0.2569039   0.003 0.997880    
#> trait_cat12nocturnal:env5     -0.1068315  0.2872327  -0.372 0.709942    
#> trait_cat12nocturnal:env6      0.2186171  0.2904297   0.753 0.451608    
#> trait_cat12nocturnal:env7     -0.4323136  0.4256436  -1.016 0.309786    
#> trait_cat12nocturnal:env8      0.0687424  0.4939589   0.139 0.889319    
#> trait_cat12nocturnal:env9      0.1660581  0.5108386   0.325 0.745128    
#> trait_cat12nocturnal:env10     0.5392907  0.5458439   0.988 0.323155    
#> trait_cat13multivoltine:env1   0.1368508  0.1909488   0.717 0.473566    
#> trait_cat13univoltine:env1     0.2841505  0.2577352   1.102 0.270249    
#> trait_cat13multivoltine:env2   0.0681931  0.1971593   0.346 0.729434    
#> trait_cat13univoltine:env2    -0.0341924  0.2653602  -0.129 0.897474    
#> trait_cat13multivoltine:env3  -0.0409377  0.2844874  -0.144 0.885580    
#> trait_cat13univoltine:env3    -0.0574466  0.3794755  -0.151 0.879673    
#> trait_cat13multivoltine:env4  -0.0495623  0.2158698  -0.230 0.818408    
#> trait_cat13univoltine:env4     0.1227831  0.2886493   0.425 0.670566    
#> trait_cat13multivoltine:env5  -0.1020816  0.2363995  -0.432 0.665873    
#> trait_cat13univoltine:env5    -0.2213424  0.3186085  -0.695 0.487233    
#> trait_cat13multivoltine:env6  -0.2882216  0.2432146  -1.185 0.235997    
#> trait_cat13univoltine:env6     0.2284586  0.3278184   0.697 0.485862    
#> trait_cat13multivoltine:env7  -0.0209723  0.3503083  -0.060 0.952261    
#> trait_cat13univoltine:env7     0.0771967  0.4670323   0.165 0.868714    
#> trait_cat13multivoltine:env8   0.0419555  0.4116171   0.102 0.918814    
#> trait_cat13univoltine:env8     0.0414558  0.5478425   0.076 0.939681    
#> trait_cat13multivoltine:env9  -0.0118302  0.4233286  -0.028 0.977705    
#> trait_cat13univoltine:env9     0.3152771  0.5656912   0.557 0.577301    
#> trait_cat13multivoltine:env10  0.0253027  0.4497710   0.056 0.955137    
#> trait_cat13univoltine:env10   -0.2688425  0.6025053  -0.446 0.655447    
#> trait_cat14generalist:env1     0.2606976  0.5164928   0.505 0.613737    
#> trait_cat14nectarivore:env1   -0.2360857  0.3408356  -0.693 0.488518    
#> trait_cat14generalist:env2     0.0418664  0.5331459   0.079 0.937409    
#> trait_cat14nectarivore:env2   -0.0044629  0.3533997  -0.013 0.989924    
#> trait_cat14generalist:env3    -0.0139439  0.7682749  -0.018 0.985519    
#> trait_cat14nectarivore:env3    0.1527119  0.5020463   0.304 0.760992    
#> trait_cat14generalist:env4     0.1510262  0.5776250   0.261 0.793737    
#> trait_cat14nectarivore:env4    0.0284546  0.3790514   0.075 0.940161    
#> trait_cat14generalist:env5    -0.2015317  0.6417709  -0.314 0.753503    
#> trait_cat14nectarivore:env5   -0.0326672  0.4232302  -0.077 0.938476    
#> trait_cat14generalist:env6    -0.3943444  0.6552377  -0.602 0.547285    
#> trait_cat14nectarivore:env6   -0.0958633  0.4293834  -0.223 0.823335    
#> trait_cat14generalist:env7     0.3523372  0.9512120   0.370 0.711078    
#> trait_cat14nectarivore:env7    0.2099320  0.6256045   0.336 0.737198    
#> trait_cat14generalist:env8    -0.0441667  1.1089876  -0.040 0.968232    
#>  [ reached 'max' / getOption("max.print") -- omitted 75 rows ]
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note: This model forms the core of the invasion fitness workflow, allowing us to predict how novel trait-environment combinations are likely to perform (abundance-wise) in different sites, and enables robust estimation of both direct and interactive drivers of community assembly.  


9. Simulate & Predict Invaders

This section demonstrates how to simulate trait profiles for hypothetical invaders and then use the fitted abundance model to predict their expected performance across all sites. This forms a crucial part of modern invasion biology workflows: moving from theory-driven invader trait generation to data-driven, site-level predictions of establishment and performance.

9.1. Simulate hypothetical invaders

To evaluate how potential invaders might perform, we generate novel species by resampling trait values from the empirical distribution of the resident pool using the simulate_invaders() function. This approach allows you to:

  • Preserve realism by keeping trait ranges and levels identical to those observed in the resident community.
  • Explore unobserved combinations (when using column-wise sampling) or maintain natural trait correlations (when using row-wise sampling).
  • Control how numeric traits are drawn - bootstrap from observed values, draw from a truncated normal distribution, or sample from a uniform distribution within observed bounds.

By default, traits are sampled independently by column (mode = "columnwise"), which generates novel combinations. Alternatively, setting mode = "rowwise" samples entire resident profiles, preserving observed cross-trait covariance.

# Simulate 10 invaders from the resident trait table
set.seed(42)
inv_traits = simulate_invaders(
  resident_traits = spp_trait, # your resident species traits table
  n_inv           = 10, # number of invaders to create
  species_col     = "species", # ID column for species
  mode            = "columnwise", # or "rowwise"
  numeric_method  = "bootstrap", # or "normal", "uniform"
  seed            = 42
)
# Check results
head(inv_traits)
#>      species trait_cont1 trait_cont2 trait_cont3 trait_cont4 trait_cont5 trait_cont6 trait_cont7
#> inv1    inv1  0.47317663  -0.9853317  0.02881542  -0.8287759  0.07475339  -0.6596750  -0.3658933
#> inv2    inv2  0.31398458   0.8114763  0.65631697   0.9252160  0.70096550   0.3603284   0.8361128
#> inv3    inv3  0.02842357   0.5693856  0.69937944   0.1272937  0.70096550   0.4670559   0.8358081
#> inv4    inv4 -0.08451645  -0.5846821  0.43871168   0.4797106  0.20694817   0.3571857   0.4947222
#> inv5    inv5  0.86934449   0.6658322  0.16320801   0.8660683  0.65788426  -0.8394711   0.4845329
#> inv6    inv6 -0.21959307  -0.1060607  0.51908854   0.3348530  0.14695180  -0.9418284  -0.3658933
#>      trait_cont8 trait_cont9 trait_cont10 trait_cat11 trait_cat12  trait_cat13 trait_cat14
#> inv1  0.06374887   0.9931054 -0.001454239      forest   nocturnal multivoltine detritivore
#> inv2 -0.77538352  -0.5086517 -0.957099543      forest   nocturnal   univoltine detritivore
#> inv3 -0.09737830  -0.8268388 -0.460567643     wetland   nocturnal    bivoltine nectarivore
#> inv4 -0.89374103  -0.8647264  0.881129756   grassland   nocturnal multivoltine  generalist
#> inv5 -0.42450050  -0.6819552 -0.957099543   grassland     diurnal    bivoltine nectarivore
#> inv6  0.67751007  -0.3726324 -0.623131341     wetland     diurnal   univoltine  generalist
#>      trait_cat15 trait_ord16 trait_ord17 trait_bin18 trait_bin19 trait_ord20
#> inv1   migratory           1           2           1           1       large
#> inv2   migratory           3           2           0           1       small
#> inv3   migratory           1           2           1           1      medium
#> inv4   migratory           1           2           1           1      medium
#> inv5    resident           2           1           0           1      medium
#> inv6   migratory           1           3           0           1       large

Note: Use mode = "rowwise" when you want to preserve realistic trait correlations OR Use mode = "columnwise" when exploring hypothetical trait combinations beyond those observed.  

9.2. Predict abundances across all sites

With simulated invaders in hand, we use the predict_invader_response() function to generate expected abundances (our performance proxy) for every species-site combination as follows:

  • Takes a fitted model (e.g., from glmmTMB, lme4, glm, or gam)
  • Crosses the species traits table (residents + simulated invaders) with the site environment table
  • Harmonizes factor levels with the model’s training data
  • Calls the model’s native predict() method to return:
    • The full prediction grid (newdata)
    • A long table of predictions (predictions)
    • A wide site × species prediction matrix (prediction_matrix)

Note: Why use population-level (fixed-effects) predictions?

  • For novel invaders, the random-effect levels (e.g., species- or site-specific intercepts) are unknown.
  • Accordingly, predict_invader_response() defaults to setting random effects to zero, producing population-level predictions driven solely by traits, environmental covariates, and their interactions. This is the appropriate scale for invasion screening and comparative risk assessment.
  • Set include_random = TRUE only if you require conditional predictions that incorporate known random-effect levels from the fitted model (and, where supported, allow estimation for new levels).
     
# Combine residents and invaders into a single trait table
all_traits = bind_rows(
  spp_trait, # Resident species traits (must include 'species' column)
  inv_traits # Output from simulate_invaders()
)
head(all_traits[1:4,1:4])
#>                                               species trait_cont1 trait_cont2 trait_cont3
#> Utetheisa pulchella               Utetheisa pulchella  0.02842357  -0.2030292 -0.99685889
#> Danaus chrysippus orientis Danaus chrysippus orientis  0.03819190  -0.2237834  0.02882587
#> Telchinia serena                     Telchinia serena -0.83512488  -0.3065035  0.02881542
#> Vanessa cardui                         Vanessa cardui -0.21959307   0.5693856  0.16320801
tail(all_traits[1:4,1:4])
#>                                               species trait_cont1 trait_cont2 trait_cont3
#> Utetheisa pulchella               Utetheisa pulchella  0.02842357  -0.2030292 -0.99685889
#> Danaus chrysippus orientis Danaus chrysippus orientis  0.03819190  -0.2237834  0.02882587
#> Telchinia serena                     Telchinia serena -0.83512488  -0.3065035  0.02881542
#> Vanessa cardui                         Vanessa cardui -0.21959307   0.5693856  0.16320801

# Predict across all sites × all species using predict_invader_response()
pred = predict_invader_response(
  model          = mod,
  species_traits = all_traits, # residents + invaders (traits)
  site_env       = site_env, # base site predictors
  species_col    = "species",
  site_col       = "site_id",
  response_type  = "response",
  include_random = FALSE,
  site_aug       = dplyr::select(spp_rich_obs, site_id, obs_sum, spp_rich)
)

# Inspect outputs
str(pred, max.level = 1)
#> List of 3
#>  $ newdata          : tibble [15,355 × 36] (S3: tbl_df/tbl/data.frame)
#>  $ predictions      :'data.frame':   15355 obs. of  3 variables:
#>  $ prediction_matrix: num [1:415, 1:37] 2.27 1.77 2.99 2.08 1.78 ...
#>   ..- attr(*, "dimnames")=List of 2
names(pred$newdata) # long table: site_id, species, pred
#>  [1] "species"      "site_id"      "trait_cont1"  "trait_cont2"  "trait_cont3"  "trait_cont4" 
#>  [7] "trait_cont5"  "trait_cont6"  "trait_cont7"  "trait_cont8"  "trait_cont9"  "trait_cont10"
#> [13] "trait_cat11"  "trait_cat12"  "trait_cat13"  "trait_cat14"  "trait_cat15"  "trait_ord16" 
#> [19] "trait_ord17"  "trait_bin18"  "trait_bin19"  "trait_ord20"  "x"            "y"           
#> [25] "env1"         "env2"         "env3"         "env4"         "env5"         "env6"        
#> [31] "env7"         "env8"         "env9"         "env10"        "obs_sum"      "spp_rich"
head(pred$predictions) # long table: site_id, species, pred
#>   site_id             species     pred
#> 1    1026 Utetheisa pulchella 1.761536
#> 2    1027 Utetheisa pulchella 1.561990
#> 3    1028 Utetheisa pulchella 1.771892
#> 4    1029 Utetheisa pulchella 1.364692
#> 5    1030 Utetheisa pulchella 1.843295
#> 6    1031 Utetheisa pulchella 1.359066
dim(pred$prediction_matrix) # sites × species matrix (for fitness/impact/risk calcs)
#> [1] 415  37

10. Biotic & Abiotic Constraints

Understanding how trait similarity (biotic constraint) and environmental matching (abiotic constraint) shape invasion outcomes requires explicit, quantitative representations of both processes. This section details three linked steps:

  1. Constructing the general trait-based interaction strength matrix \(g^{\mathrm{(all)}}_{ij}\)
  2. Deriving competition coefficients \(\alpha_{ij}\) from the underlying trait dissimilarities \(d_{ij}\)
  3. Quantifying environmental matching via species’ optima \(K_e(\Delta_{js};\sigma_e)\)

10.1 Interaction Strength

The general trait-based interaction strength \(g^{\mathrm{(all)}}_{ij}\) (g_all) quantifies the dissimilarity between the trait vectors \(\mathbf{t}_i\) and \(\mathbf{t}_j\) for any two species \(i\) and \(j\). When \(g^{\mathrm{(all)}}\) is computed as a Gower distance, it is numerically identical to \(d_{ij}\). We also compute the resident × site abundance matrix \(N^{*}_{js}\) (Nstar), representing the predicted equilibrium abundance of resident species \(j\) at site \(s\).

compute_interaction_strength() outputs:

# Identify all resident species (excluding invaders)
residents = rownames(spp_trait)

# Compute interaction strengths and resident abundance matrix
cis = compute_interaction_strength(
  traits      = all_traits, # residents + invaders
  predDF      = pred$predictions, # predictions from predict_invader_response()
  method      = "gower", # mixed traits
  kernel      = "distance", # keep as dissimilarity
  standardise = FALSE, # Gower is already [0,1]
  sparsify_k  = NULL # no sparsification for now
)

# Site × resident abundance matrix
# Note: Nstar is built from predictions and restricted to resident species
Nstar = cis$Nstar # site × residents
head(Nstar[1:2, 1:4])
#>                                1026     1027     1028     1029
#> Utetheisa pulchella        1.761536 1.561990 1.771892 1.364692
#> Danaus chrysippus orientis 3.239507 3.001678 3.677266 3.446281

# trait-based distances/dissimilarities (interaction strengths)
# Note: g_all[i, j] is the trait dissimilarity between species i and j
g_all = cis$g_all # species × species
tail(g_all[1:2, 1:4])
#>                            Utetheisa pulchella Danaus chrysippus orientis Telchinia serena
#> Utetheisa pulchella                   0.000000                   0.398072        0.4164589
#> Danaus chrysippus orientis            0.398072                   0.000000        0.4409018
#>                            Vanessa cardui
#> Utetheisa pulchella             0.5341222
#> Danaus chrysippus orientis      0.5248909

10.2 Competition

From the trait dissimilarities \(d_{ij}\) (obtained from \(g^{\mathrm{(all)}}_{ij}\) when it is a distance matrix), we derive trait-based competition coefficients \(\alpha_{ij}\) using the Gaussian trait kernel.

compute_competition_kernel() outputs:

# Competition coefficients from trait distances
comp = compute_competition_kernel(
  g_all = g_all, # trait distance matrix (e.g., Gower)
  residents = residents, # resident IDs
  # invaders = paste0("inv", 1:n_inv),  # optional; default = non-residents
  # sigma_t  = NULL,             # optional; auto from resident-resident distances
  sigma_method = "sd" # or "median", "iqr"
)

# Optional: New named objects
sigma_t = comp$sigma_t # bandwidth used
d_ij = comp$d_ij # invader-resident distances
a_ij = comp$a_ij # Gaussian competition coefficients

# Check results
sigma_t
#> [1] 0.08758267
head(d_ij[1:4, 1:4])
#>      Utetheisa pulchella Danaus chrysippus orientis Telchinia serena Vanessa cardui
#> inv1           0.4428054                  0.3371734        0.3626310      0.5756327
#> inv2           0.4224789                  0.4655103        0.3639053      0.4594073
#> inv3           0.4136122                  0.4075364        0.3784909      0.5894313
#> inv4           0.3899103                  0.3182808        0.2564511      0.5408265
head(a_ij[1:4, 1:4])
#>      Utetheisa pulchella Danaus chrysippus orientis Telchinia serena Vanessa cardui
#> inv1        2.814174e-06               6.049429e-04     1.894032e-04   4.167436e-10
#> inv2        8.856219e-06               7.337331e-07     1.783117e-04   1.060064e-06
#> inv3        1.435855e-05               1.987671e-05     8.803305e-05   1.461412e-10
#> inv4        4.968745e-05               1.356017e-03     1.374756e-02   5.247268e-09

# Example: 
# For invader i at site s, total expected competitive pressure (scalar) could be:
# `pressure_{i,s} = sum_j a_{ij} * N*_j,s`
# which combines trait overlap (a_ij) with resident context (N*).

Note: - High trait similarity (low \(d_{ij}\)) → strong competition (high \(\alpha_{ij}\)) - High trait difference (high \(d_{ij}\)) → weak competition (low \(\alpha_{ij}\)) - Lowering \(\sigma_t\) sharpens niche separation; useful for sensitivity tests - Ensure g_all is a distance matrix before applying the Gaussian kernel; if it’s already kernelised, retrieve the original \(d_{ij}\) first  

10.3 Environmental Filtering

We estimate each species’ environmental optimum from predicted abundances and site-level environmental variables, then compute site-species distances \(\Delta_{js}\). These are converted to the Gaussian environmental similarity kernel \(K_e(\Delta_{js};\sigma_e)\) using bandwidth \(\sigma_e\).

compute_environment_kernel() outputs:

  • env_opt — species × environmental optima \(\theta_j\)
  • env_dist — site × species environmental distances \(\Delta_{js}\)
  • K_env — Gaussian environmental similarity kernel \(K_e\)
# Build environmental kernel and distances
ek = compute_environment_kernel(
  site_env = site_env, # site_id, x, y, env1:env10, ...
  predictions = pred$predictions, # species, site_id, pred (from predict_invader_response)
  site_col = "site_id",
  method = "gower",
  kernel = "gaussian", # "distance" or "similarity" also supported
  sigma_method = "sd" # or "median", "iqr"
)

# Optional: New named objects
sigma_e = ek$sigma_e # bandwidth used for Gaussian kernel
env_opt = ek$env_opt # species × env (optima)
env_dist = ek$env_dist # sites × species (distance)
g_env = ek$K_env # sites × species (Gaussian similarity), if requested

# Check results
sigma_e
#> [1] 0.0495915
head(env_opt[1:4, 1:4])
#>                                   env1        env2        env3       env4
#> Utetheisa pulchella        -0.45572568 -0.25326724 -0.05813116  0.1956375
#> Danaus chrysippus orientis -0.09139371 -0.19942460  0.22554528  0.1762052
#> Telchinia serena           -0.15295732 -0.10760847  0.16373671  0.2630910
#> Vanessa cardui              0.17546879 -0.03901351  0.02578274 -0.1113054
head(env_dist[1:4, 1:4])
#>      Utetheisa pulchella Danaus chrysippus orientis Telchinia serena Vanessa cardui
#> 1026           0.2919083                  0.2901035        0.2891394      0.2443649
#> 1027           0.3004139                  0.2986090        0.2976450      0.2528705
#> 1028           0.3051308                  0.3033259        0.3023619      0.2575874
#> 1029           0.3230038                  0.3211989        0.3202349      0.2754604
head(g_env[1:4, 1:4])
#>      Utetheisa pulchella Danaus chrysippus orientis Telchinia serena Vanessa cardui
#> 1026        2.994181e-08               3.707048e-08     4.152730e-08   5.339339e-06
#> 1027        1.075079e-08               1.339372e-08     1.505410e-08   2.259750e-06
#> 1028        6.015103e-09               7.519820e-09     8.467670e-09   1.385049e-06
#> 1029        6.137287e-10               7.773871e-10     8.815289e-10   1.996438e-07

11. Invasion Fitness

The invasion fitness of each hypothetical invader at each site is determined by combining predicted growth potential \(r_{is}\), the summed effect of trait-based competition \(C^{\mathrm{(raw)}}_{is}\), and the degree of environmental matching \(K_e(\Delta_{js};\sigma_e)\).

11.1 Competitive Pressure with Environmental Context

For site \(s\), invader \(i\), resident \(j\), the impact tensor \(I_{ijs}\) combines:

assemble_matrices() outputs:

# Assemble site- and species-specific competition/impact matrices
am = assemble_matrices(
  a_ij            = comp$a_ij, # invader × resident competition coefficients
  Nstar           = cis$Nstar, # resident abundances (residents × sites)
  predictions     = pred$predictions, # provides invader r_is (sites × invaders) if needed
  K_env           = ek$K_env # environmental kernel (sites × residents), or env_dist + sigma_e
  # env_dist        = ek$env_dist,      # OR Pass distances and σ to compute the Gaussian kernel internally
  # sigma_e         = ek$sigma_e
)

# Check results
str(am, max.level = 1)
#> List of 3
#>  $ I_raw            : num [1:10, 1:27, 1:415] 1.35e-12 1.06e-11 3.40e-12 7.14e-12 2.01e-16 ...
#>   ..- attr(*, "dimnames")=List of 3
#>  $ pressure_inv_site: num [1:10, 1:415] 5.39e-07 1.45e-06 1.23e-06 5.49e-07 1.57e-05 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ meta             :List of 6

# 3D impact tensor: [invader, resident, site]
dim(am$I_raw)
#> [1]  10  27 415

# Total competitive pressure per invader × site (sum over residents)
pressure = am$pressure_inv_site
dim(pressure) # invaders × sites
#> [1]  10 415

11.2 Invasion Fitness Calculation

Given \(C^{\mathrm{(raw)}}_{is}\) and predicted growth \(r_{is}\), invasion fitness \(\lambda_{is}\) can be computed with different scaling or transformation variants:

  • Raw penalty:
    \[\lambda^{(\mathrm{raw})}_{is} = r_{is} - C^{(\mathrm{raw})}_{is}\]

  • Richness-scaled: \[\lambda^{(\mathrm{scaled})}_{is} = r_{is} - \frac{C^{(\mathrm{raw})}_{is}}{J_s}\] where \(J_s\) is resident richness at site \(s\).  

  • Relative-abundance weighted: \[\lambda^{(\mathrm{rel})}_{is} = r_{is} - \left( \sum_j a_{ij} N^{(\mathrm{rel})}_{js} \right)\]

  • Logistic penalty (bounded): \[\lambda^{(\mathrm{logis})}_{is} = r_{is} - \mathrm{inv\_logit}\!\left(k \left[C_{\mathrm{target},is} - x_0\right]\right)\]

Outputs of compute_invasion_fitness(), depending on variant specified:

# Compute invasion fitness (uses am$I_raw from assemble_matrices and pred$predictions)
fitness = compute_invasion_fitness(
  I_raw        = am$I_raw, # or pressure_inv_site = am$pressure_inv_site
  predictions  = pred$predictions, # builds r_mat internally
  a_ij         = comp$a_ij, # needed for lambda_rel / logistic_on = "rel"
  Nstar        = cis$Nstar,
  logistic_on  = "rel", # cap the A %*% N_rel penalty OR "raw"
  k            = 1,
  x0           = NULL, # auto: median penalty
  prefer       = "logis" # returned as $lambda OR c("logis", "rel", "raw", "scaled")
)

# Check results
str(fitness, max.level = 1)
#> List of 8
#>  $ C_raw        : num [1:10, 1:415] 5.39e-07 1.45e-06 1.23e-06 5.49e-07 1.57e-05 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ r_mat        : num [1:10, 1:415] 9.09 22.79 4.49 2.73 7.63 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ lambda_raw   : num [1:10, 1:415] 9.09 22.79 4.49 2.73 7.63 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ lambda_scaled: num [1:10, 1:415] 9.09 22.79 4.49 2.73 7.63 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ lambda_rel   : num [1:10, 1:415] 9.09 22.79 4.49 2.72 7.63 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ lambda_logis : num [1:10, 1:415] 8.59 22.29 3.99 2.23 7.13 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ lambda       : num [1:10, 1:415] 8.59 22.29 3.99 2.23 7.13 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ meta         :List of 5

# Key outputs
C_raw = fitness$C_raw # invader × site penalty
lambda_mat = fitness$lambda # final invasion fitness (invader × site)

# Inspect invasion fitness matrix
dim(lambda_mat) # invaders × sites
#> [1]  10 415

12. Visualisation, Ranking & Clustering

This section summarises invasion fitness outcomes using plots and rankings that diagnose which invaders and locations pose the greatest establishment risk, show how that risk is distributed across the study region, and support both hypothesis generation and management prioritisation.

12.1 Invasion Fitness Matrix (invader × site)

We visualise the invasion fitness matrix lambda_mat, where each entry \(\lambda_{is}\) represents the low-density per-capita growth rate of invader \(i\) in site \(s\) as computed by compute_invasion_fitness().

This matrix is central to the invasion framework because it enables:

  • Direct comparison of all simulated invaders across all sites.
  • Identification of susceptible sites (rows/columns with consistently high \(\lambda_{is}\)).
  • Detection of broadly adapted invaders versus specialists.

Ecological interpretation:

  • Higher \(\lambda_{is}\) (lighter colours) → favourable conditions and/or weak competition.
  • Lower \(\lambda_{is}\) (darker colours) → strong competitive exclusion and/or poor environmental match.
# Convert the invasion fitness matrix to long format for ggplot and attach spatial coordinates
lambda_df = as_tibble(lambda_mat, rownames = "invader") %>%
  # Reshape from wide (invader × site) to long (invader, site, fitness value)
  pivot_longer(-invader, names_to = "site_id", values_to = "lambda") %>%
  # Add spatial coordinates for each site, matched by site_id
  mutate(
    x = site_xy$x[match(site_id, site_xy$site_id)], # Longitude/Easting
    y = site_xy$y[match(site_id, site_xy$site_id)] # Latitude/Northing
  )

# Order sites by space; invaders by mean lambda
site_order = site_xy$site_id[order(site_xy$x, site_xy$y)]
inv_order = lambda_df |>
  group_by(invader) |>
  summarise(mu = mean(lambda, na.rm = TRUE), .groups = "drop") |>
  arrange(desc(mu)) |>
  pull(invader)

# Format data.frame for ggplot
lambda_df = lambda_df |>
  mutate(
    site_id = factor(site_id, levels = site_order),
    invader = factor(invader, levels = inv_order)
  )

# Create color scaling: winsorize high tail (or set trans = "sqrt")
cap = quantile(lambda_df$lambda, 0.99, na.rm = TRUE)

# Plot invasion fitness matrix
ggplot(lambda_df, aes(x = site_id, y = invader, fill = pmin(lambda, cap))) +
  geom_raster() + # faster, cleaner than geom_tile with borders
  scale_fill_viridis_c(
    option = "magma", direction = -1,
    limits = c(0, cap), oob = scales::squish,
    name = expression("Invasion fitness" ~ lambda[i * k])
  ) +
  labs(
    title = "Invasion Fitness Matrix: Invaders Across Sites",
    x = "Site", y = "Invader"
  ) +
  theme_minimal(base_size = 10) +
  theme(
    panel.grid = element_blank(),
    axis.text.y = element_text(size = 8),
    axis.text.x = element_text(angle = 90, vjust = 0.5, size = 6)
  ) +
  # 3) show every Nth site label to avoid overlap
  scale_x_discrete(
    breaks = function(x) x[seq(1, length(x), by = 10)]
  ) +
  guides(fill = guide_colorbar(
    title.position = "top",
    barheight = grid::unit(40, "mm"),
    barwidth = grid::unit(4, "mm")
  ))

Figure 11: Invasion fitness matrix (invader × site). Heatmap of low-density per-capita growth rates (\(\lambda_{is}\)) for each invader (rows) across sites (columns). Higher fitness values (yellow-orange) indicate favourable conditions or weak competitive exclusion, while lower values (dark purple) reflect strong exclusion or poor environmental match. This matrix enables direct comparison of invader performance, highlighting both broadly adapted invaders (e.g. inv3, inv4) and more restricted specialists (e.g. inv7, inv10). The heterogeneous site-wise patterns underscore spatial variation in invasion potential, while inter-row differences reveal trait-driven variation in invasiveness. This unclustered baseline provides the foundation for subsequent ranking and clustering analyses.


12.2 Invader Invasiveness Ranking

From the invasion fitness matrix lambda_mat, we derive the species-level invasiveness \(V_i\), defined as the mean (or sum) of \(\lambda_{is}\) over all sites for invader \(i\). This integrates both biotic resistance and environmental filtering across the full landscape, producing a single quantitative measure of establishment potential.

Ranking invaders by \(V_i\) reveals:

  • High-risk “super-invaders” — species with high and geographically broad \(\lambda_{is}\), posing widespread establishment potential.
  • Low-risk invaders — species predicted to be strongly excluded in most or all sites.

Such rankings are valuable for risk prioritisation, focusing management and surveillance on species most likely to succeed under current conditions. In plots, bar height represents cumulative \(\lambda_{is}\) (summed over sites), ordered from highest to lowest \(V_i\).

# Sum invasion fitness across sites for each invader and order from highest to lowest
rank_df = lambda_df %>%
  group_by(invader) %>%
  summarise(total_lambda = sum(lambda, na.rm = TRUE)) %>%
  arrange(desc(total_lambda)) %>%
  mutate(invader = factor(invader, levels = invader))

# Bar plot of total invasion fitness by invader
ggplot(
  rank_df,
  aes(x = invader, y = sqrt(total_lambda), fill = sqrt(total_lambda))
) +
  geom_col(width = 0.7, color = "grey50") +
  scale_fill_viridis_c(
    option = "inferno",
    direction =1,
    name = expression("Total fitness" ~ sum(lambda[i]))
  ) +
  labs(
    title = "Invader Ranking by Total Growth Potential",
    x = "Invader",
    y = expression("Total invasion fitness " * sum(lambda[i * k]))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
    panel.grid.major.x = element_blank(),
    legend.position = "top"
  )

> Figure 12: Invader invasiveness ranking by total growth potential. Bar plot of total invasion fitness ($ _{is}$) across all sites, providing a landscape-integrated measure of each invader’s establishment potential. Higher bars and lighter colours indicate stronger overall invasiveness. Invaders such as inv3 and inv4 emerge as “super-invaders,” with consistently high growth potential across the landscape, while inv7 and inv10 show the lowest cumulative fitness, suggesting they are unlikely to establish broadly. This ranking highlights which species pose the greatest invasion risk and thus merit prioritisation in monitoring and management.

12.3 Site-level Invasibility Metrics

We summarise lambda_mat across invaders to quantify site-level invasibility \(V_s\) — the openness of each community to establishment. For each site \(s\), we compute:

  • Proportion of invaders with positive fitness — fraction of invaders with \(\lambda_{is} > 0\). This metric only has ecological meaning if some \(\lambda_{is}\) can be negative (i.e. invaders that fail to establish). If all values are positive, the proportion saturates at 1 and becomes redundant.
  • Count of successful invaders — number of \(i\) with \(\lambda_{is} > 0\).
  • Mean invasion fitness — average \(\lambda_{is}\) across invaders, indicating general establishment potential.

Spatial mapping of \(V_s\) reveals:

  • Hotspots — sites with high proportions of positive \(\lambda_{is}\), indicating low biotic resistance and/or favourable abiotic conditions.
  • Potential refugia — sites with low proportions, where community structure or environmental context limits establishment.

Tile labels often show counts of successful invaders to complement proportional risk with absolute numbers.

# Summarize invasion fitness at each site:
site_sum = lambda_df %>%
  group_by(site_id, x, y) %>%
  summarize(
    mean_l   = mean(lambda), # Mean invasion fitness across invaders
    prop_pos = mean(lambda > 0), # Proportion of invaders with positive fitness
    cnt_pos  = sum(lambda > 0), # Number of successful invaders
    cnt_neg  = sum(lambda < 0) # Number of unsuccessful invaders
  ) %>%
  ungroup()

# Map proportion of successful invaders at each site
ggplot(site_sum, aes(x = x, y = y, fill = prop_pos)) +
  geom_tile(color = NA) +
  geom_sf(data = rsa, inherit.aes = FALSE, fill = NA, color = "black", size = 0.5) +
  scale_fill_viridis_c(
    option = "inferno",
    direction = 1,                     # ← reverse the palette
    labels = scales::percent_format(accuracy = 1),
    name = "% Invaders with\nPositive Fitness"
  ) +
  geom_text(aes(label = cnt_pos), color = "white", size = 2.5) +
  coord_sf() +
  labs(
    title = "Site-level Invasibility (Proportion of Successful Invaders)",
    x = "Longitude", y = "Latitude"
  ) +
  theme_minimal(base_size = 12) +
  theme(panel.grid = element_blank())

Figure 13: Site-level Invasibility Metrics - This map depicts the site-level invasibility \(V_s\), defined as the proportion of invaders with positive invasion fitness (\(\lambda_{is} > 0\)) at each site. Darker cells indicate invasion “hotspots” where a high fraction of invaders succeed, reflecting weak biotic resistance and/or favourable abiotic conditions. Lighter cells indicate sites with stronger resistance or limiting environments, where fewer invaders can establish. Numbers within cells denote the absolute count of successful invaders, complementing proportional risk with absolute richness of potential invaders.

Important note: The proportion of positive fitness values is only ecologically meaningful if some \(\lambda_{is}\) are negative (i.e. invaders can fail to establish). If all \(\lambda_{is} > 0\), the metric saturates at 1 and loses discriminatory power. Here we use the logis variant of invasion fitness, which yields both positive and negative values and thus provides informative spatial variation.


12.4 Mean Invasion Fitness by Site

We map the mean invasion fitness across invaders at each site i.e. the average \(\lambda_{is}\) over all \(i\) for a given site \(s\). This provides a continuous index of community openness, complementing the proportion-based \(V_s\):

  • High mean values — communities generally favourable to a broad range of potential invaders.
  • Low mean values — communities broadly resistant due to environmental mismatch, strong resident competition, or both.
  • Note: unlike the proportion metric, the mean remains informative even if all $_{is}$ are positive, because it reflects the magnitude of establishment success rather than just its occurrence.

Sites with moderately high mean values but low \(V_s\) may be vulnerable to a few highly adapted invaders.

# Map the mean invasion fitness of all invaders at each site
ggplot(site_sum, aes(x = x, y = y, fill = mean_l)) +
  geom_tile(color = NA) +
  geom_sf(data = rsa, inherit.aes = FALSE, fill = NA, color = "black", size = 0.5) +
  scale_fill_viridis_c(option = "inferno", name = "Mean invasion\nfitness") +
  # Optional: annotate with number of successful invaders
  # geom_text(aes(label = cnt_pos), color = "white", size = 1.5) +
  coord_sf() +
  labs(
    title = "Mean Species Invasiveness by Site (All Invaders)",
    x = "Longitude",
    y = "Latitude"
  ) +
  theme_minimal(base_size = 12) +
  theme(panel.grid = element_blank())

Figure 14: Mean invasion fitness by site - This map shows the mean invasion fitness (\(\overline{\lambda}_{\cdot s}\)) across all invaders at each site, providing a continuous measure of community openness to invasion. Warmer colours (yellow–orange) indicate sites where invaders tend to achieve higher fitness, reflecting low biotic resistance and/or favourable abiotic conditions. Cooler colours (purple–black) mark sites where invaders perform poorly on average, indicating stronger resistance or environmental filtering.

Important note: Unlike the proportion-based invasibility metric (\(V_s\)), which saturates at 1 if all \(\lambda_{is} > 0\), the mean remains informative even in such cases because it reflects magnitude rather than mere occurrence of establishment. Importantly, sites with moderately high mean fitness but low proportions of successful invaders may signal selective vulnerability—open to only a few highly adapted invaders rather than broadly permissive.


12.5 Clustering and Risk Scenarios

We identify clusters of sites and invaders with similar invasion fitness patterns using hierarchical clustering of \(\lambda_{is}\) from lambda_mat. This reveals ecological profiles or invasion signatures — types of sites most at risk, or types of invaders most threatening — enabling targeted surveillance and intervention.

12.5.1 Hierarchical Clustering of Invaders and Sites

Hierarchical clustering groups:

  • Rows (invaders) by similarity of their \(\lambda_{is}\) profiles across sites.
  • Columns (sites) by similarity of their \(\lambda_{is}\) profiles across invaders.

The heatmap displays \(\lambda_{is}\) values as cells, with dendrograms indicating clusters. This approach distinguishes:

  • Invader types — Broad-spectrum high-risk vs. site-specialists.
  • Site types — Broadly open vs. strongly resistant communities.

These groups can be used for categorical risk classification and scenario exploration in downstream analyses.

# Remove invaders (rows) and sites (columns) with all NA values (if present)
# This ensures only meaningful data are visualized and clustered
lambda_mat_noNA = lambda_mat[
  rowSums(is.na(lambda_mat)) < ncol(lambda_mat),
  colSums(is.na(lambda_mat)) < nrow(lambda_mat)
]

# Assign invaders and sites to discrete clusters (profiles) for downstream visualization
# Sites: cluster by invasion fitness profile (columns)
site_dist = dist(t(scale(lambda_mat_noNA))) # standardize first then compute pairwise distances between sites
site_clust = hclust(site_dist, method = "ward.D2") # Hierarchical clustering
kj = factoextra::fviz_nbclust(t(scale(lambda_mat_noNA)), kmeans, method = "silhouette") # or gap
site_groups = cutree(site_clust, k = 5) # Assign each site to one of 5 clusters

# Invaders: cluster by fitness profile across sites (rows)
invader_dist = dist(scale(lambda_mat_noNA)) # Scale then compute pairwise distances between invaders
invader_clust = hclust(invader_dist, method = "ward.D2")
ki = factoextra::fviz_nbclust(t(scale(lambda_mat_noNA)), kmeans, method = "silhouette") # or gap
invader_groups = cutree(invader_clust, k = 5) # Assign each invader to one of 5 clusters

# Attach cluster labels to site and invader summary dataframes
# Add site cluster group to site summary dataframe (site_sum)
# site_id must match the column names of lambda_mat_noNA
site_sum$site_cluster = factor(site_groups[site_sum$site_id])

# Add invader cluster group to long-form fitness dataframe (lambda_df)
# invader must match the row names of lambda_mat_noNA
lambda_df$invader_cluster = factor(invader_groups[lambda_df$invader])
# Clustered heatmap of invasion fitness matrix (invader × site)
# - Each cell shows the invasion fitness (lambda) for a given invader-site pair
# - Hierarchical clustering groups similar invaders and similar sites
pheatmap(lambda_mat_noNA,
  color = rev(viridis::viridis(100, option = "viridis")),
  clustering_distance_rows = "euclidean",
  clustering_distance_cols = "euclidean",
  clustering_method = "complete",
  fontsize_row = 8, fontsize_col = 8,
  main = "Clustered Invasion Fitness Matrix (Invader × Site)",
  angle_col = 45
)


# These cluster assignments can be used for:
# - Color-coding sites and invaders in subsequent plots
# - Group-based analyses or statistical comparisons
# - Mapping spatial patterns of invasion risk by ecological “scenario”

Figure 15: Clustered invasion fitness matrix (invader × site) - This clustered heatmap displays invasion fitness values (\(\lambda_{is}\)) for each invader–site pair, with hierarchical clustering applied to both invaders (rows) and sites (columns). The colour scale ranges from low (purple) to high (yellow) invasion fitness, while dendrograms identify groups of invaders with similar performance profiles and sites with similar resistance/openness syndromes. The analysis distinguishes: i) Invader types: Broad-spectrum, high-fitness invaders (e.g., inv3, inv4) versus specialists with patchy or constrained success (e.g., inv7, inv10); and ii) Site types: Broadly open communities permissive to many invaders versus strongly resistant communities that suppress invasion success.

Important note: This two-way clustering highlights joint structure in invader performance and community susceptibility, providing a basis for categorical risk classification and ecological “scenario” analysis. These groupings can inform both strategic monitoring (e.g., targeting clusters of high-risk invaders) and management planning (e.g., identifying site clusters most vulnerable to invasion).

12.5.2 Mapping Site-level Risk Categories

Cluster assignments are converted into descriptive invasibility categories (e.g., very-high, high, medium, low, very-low) based on mean \(\lambda_{is}\) within each cluster. Using the mean ensures that clusters capture not only whether invaders establish, but also the overall strength of establishment success across sites.

  • High-risk site categories — clusters with generally high \(V_s\) and high mean \(\lambda_{is}\).
  • Low-risk site categories — clusters with low \(V_s\) and consistently negative or low \(\lambda_{is}\).

Mapping these categories provides a spatial overview of invasibility profiles, highlighting geographic concentrations of risk. Parallel invader-category assignments allow alignment of high-risk invaders with high-risk sites for targeted management scenarios.

# ------------------------------- 
# 1) INVADER CATEGORIES
# ------------------------------- 
invader_summary = lambda_df %>%
  dplyr::group_by(invader, invader_cluster) %>%
  dplyr::summarise(mean_lambda = mean(lambda, na.rm = TRUE), .groups = "drop")

cluster_means = invader_summary %>%
  dplyr::group_by(invader_cluster) %>%
  dplyr::summarise(cluster_mean = mean(mean_lambda, na.rm = TRUE)) %>%
  dplyr::arrange(dplyr::desc(cluster_mean))

category_labels = c("very-high", "high", "medium", "low", "very-low")[1:nrow(cluster_means)]
cluster_map = stats::setNames(category_labels, cluster_means$invader_cluster)

invader_summary = invader_summary %>%
  dplyr::mutate(invader_category = factor(cluster_map[as.character(invader_cluster)],
                                          levels = category_labels))

lambda_df = lambda_df %>%
  dplyr::mutate(invader_category = factor(cluster_map[as.character(invader_cluster)],
                                          levels = category_labels))

# -------------------------------------------------------------
# 2) SITE GROUPING: ClustGeo ONLY (spatially-constrained Ward)
# -------------------------------------------------------------
if (!requireNamespace("ClustGeo", quietly = TRUE)) {
  stop("ClustGeo not installed. install.packages('ClustGeo')")
}

# Build λ matrix with aligned columns = site_id, rows = invaders
lambda_mat_noNA = lambda_mat[
  rowSums(is.na(lambda_mat)) < ncol(lambda_mat),
  colSums(is.na(lambda_mat)) < nrow(lambda_mat),
  drop = FALSE
]

# Match coordinates to kept sites
site_ids = colnames(lambda_mat_noNA)
site_coords = site_sum[match(site_ids, site_sum$site_id), c("x", "y")]
ok = stats::complete.cases(site_coords)

# Distances: λ-profile (D1) and geographic (D0)
X  = scale(t(lambda_mat_noNA[, ok, drop = FALSE]))      # sites × invaders
D1 = dist(X)
D0 = dist(as.matrix(site_coords[ok, , drop = FALSE]))

alpha = 0.3                                             # 0 = spatial only, 1 = lambda only
tree  = ClustGeo::hclustgeo(D0, D1, alpha = alpha)

# Choose k (fixed here; adjust as needed)
k = 5
site_groups_ok = stats::cutree(tree, k = k)             # named by kept site_ids

# Reinsert NA for unmatched coords and attach to site_sum
site_groups = stats::setNames(rep(NA_integer_, length(site_ids)), site_ids)
site_groups[names(site_groups_ok)] = site_groups_ok

# Order clusters by mean site fitness and map to ordered risk labels
tmp = data.frame(site_id = names(site_groups), cluster = site_groups) |>
  dplyr::left_join(site_sum[, c("site_id", "mean_l")], by = "site_id") |>
  dplyr::group_by(cluster) |>
  dplyr::summarise(mu = mean(mean_l, na.rm = TRUE), .groups = "drop") |>
  dplyr::arrange(dplyr::desc(mu))

site_labels = c("very-high", "high", "medium", "low", "very-low")[seq_len(nrow(tmp))]
remap = stats::setNames(site_labels, tmp$cluster)

site_sum = site_sum %>%
  dplyr::mutate(
    site_cluster  = remap[as.character(site_groups[match(site_id, names(site_groups))])],
    site_category = factor(site_cluster, levels = site_labels)
  )

# -----------------------------
# 3) Map the site categories
# -----------------------------
ggplot(site_sum, aes(x = x, y = y, fill = site_category)) +
  geom_tile(color = NA) +
  geom_sf(data = rsa, inherit.aes = FALSE, fill = NA, color = "black", size = 0.5) +
  scale_fill_brewer(palette = "RdYlBu", direction = 1, name = "Site invasibility") +
  coord_sf() +
  labs(
    title = "Spatial Invasion Risk (Spatially Constrained ClustGeo)",
    x = "Longitude", y = "Latitude"
  ) +
  theme_minimal(base_size = 12) +
  theme(panel.grid = element_blank())

Figure 16: Mapping site-level invasion risk categories (spatially constrained ClustGeo clustering) Sites are assigned to invasibility categories (very-high, high, medium, low, very-low) based on the mean invasion fitness (\(\lambda_{is}\)) of invaders establishing within each cluster. By using the mean, categories capture both the likelihood of establishment and the overall strength of invasion success across sites. High-risk categories (very-high, high) reflect communities with consistently high site openness (\(V_s\)) and elevated mean \(\lambda_{is}\), indicating broad vulnerability to invasion. Low-risk categories (low, very-low) are characterised by low \(V_s\) and suppressed \(\lambda_{is}\), reflecting resistant communities. The spatially constrained clustering (ClustGeo) ensures that these risk categories respect both ecological similarity in invasion profiles and geographic proximity. This mapping provides a spatial overview of invasibility syndromes across South Africa, highlighting regional hotspots of invasion risk. When coupled with invader-category assignments, the framework enables alignment of high-risk invaders with high-risk sites, supporting targeted monitoring and management scenario development.


13. Synthesising Invasion Fitness Insights

This section distills the spatial, clustering, and trait-based analyses into targeted summaries of invasion risk patterns. All outputs are derived from the invasion fitness matrix \(\lambda_{is}\) computed in compute_invasion_fitness(), where \(i\) indexes invaders and \(s\) indexes sites.

13.1 Distribution of Invasion Fitness Values

A histogram of all \(\lambda_{is}\) values provides a system-wide view of community openness to invasion:

  • Right-skewed distributions → prevalence of favourable establishment conditions.
  • Left-skewed distributions → dominance of exclusionary environments.

This complements the site- and invader-specific patterns in Section 12.

# Plot the distribution of invasion fitness values (lambda) across all invader-site combinations,
# using a histogram colored by lambda value for visual emphasis on fitness extremes
ggplot(lambda_df, aes(x = lambda, fill = ..x..)) +
  geom_histogram(bins = 40, color = "grey30") +
  scale_fill_viridis_c(option = "magma", guide = "none") +
  labs(
    title = "Distribution of Invasion Fitness Values (All Invader × Site)",
    x = expression("Invasion Fitness" ~ lambda[i * k]),
    y = "Frequency"
  ) +
  theme_minimal(base_size = 12)

Figure 17: Distribution of invasion fitness values across all invader–site combinations Histogram of invasion fitness (\(\lambda_{is}\)) values summarising the system-wide distribution of establishment success. The distribution is strongly right-skewed, indicating that while many invader-site combinations experience exclusionary or marginal conditions, a substantial subset encounter highly favourable environments for establishment. This pattern complements site- and invader- level clustering (Figures 12-15), providing an overall view of community openness to invasion and highlighting the prevalence of rare but extreme invasion opportunities.

13.2 Top and Bottom Invaders and Sites

For rapid prioritisation, we rank invaders and sites by their mean \(\lambda_{is}\):

  • Top-ranked invaders — consistently high predicted establishment success across sites.
  • Top-ranked sites — consistently permissive to a wide range of invaders.
  • Bottom-ranked invaders/sites — consistently low \(\lambda_{is}\), indicating strong biotic or abiotic resistance.

These rankings are direct, interpretable summaries of the invasion fitness landscape.

#> ==== Top 3 Invaders by Mean Invasion Fitness ====
#> 1. inv3: 5.703
#> 2. inv4: 5.611
#> 3. inv9: 5.14
#> ==== Bottom 3 Invaders by Mean Invasion Fitness ====
#> 1. inv2: 2.816
#> 2. inv10: 2.3
#> 3. inv7: 2.289
#> ==== Top 3 Sites by Mean Invasion Fitness ====
#> 1. 558: 8.559
#> 2. 823: 8.338
#> 3. 1026: 8.173
#> ==== Bottom 3 Sites by Mean Invasion Fitness ====
#> 1. 729: 1.046
#> 2. 614: 0.952
#> 3. 802: 0.927

13.3 Functional Correlates of Invasion Success

To explore trait–fitness relationships, we join \(\lambda_{is}\) summaries with invader trait data:

  • Continuous traits — correlation coefficients with mean \(\lambda_{is}\) identify traits most positively or negatively associated with invasion success.
  • Categorical traits — mean \(\lambda_{is}\) per trait level identifies functional profiles linked to high or low establishment.

This supports hypothesis generation about mechanistic drivers of invasion potential, complementing the purely spatial and compositional analyses of Section 12.

#> ==== Top 3 Continuous Traits by Correlation with Mean Invasion Fitness ====
#> 1. trait_cont10: 0.403
#> 2. trait_cont6: 0.321
#> 3. trait_cont5: 0.302
#> 
#> ==== Bottom 3 Continuous Traits by Correlation with Mean Invasion Fitness ====
#> 1. trait_cont2: -0.278
#> 2. trait_cont9: -0.459
#> 3. trait_cont1: -0.505
#> ==== Categorical Traits: Top Value per Trait by Mean Invasion Fitness ====
#> trait_cat11: grassland (4.53)
#> trait_cat12: nocturnal (4.01)
#> trait_cat13: bivoltine (4.25)
#> trait_cat14: nectarivore (4.56)
#> trait_cat15: migratory (4.36)

13.4 Faceted Maps for Key Invaders

We map \(\lambda_{is}\) for the top 3 and bottom 3 invaders from Section 13.2:

  • Top invaders — spatially explicit “hotspots” of establishment potential.
  • Bottom invaders — regions where their traits or niches are mismatched to the community or environment.

These maps link functional traits, site conditions, and predicted invasion risk in an interpretable, management-relevant format.

# Identify the top 3 and bottom 3 invaders by mean fitness
key_invaders = c(top3_inv$invader, bottom3_inv$invader) # select 3 best/worst

# Filter df amd ensure facet order matches ranking (top first, then bottom)
lambda_key = lambda_df %>%
  filter(invader %in% key_invaders) %>%
  mutate(invader = factor(invader, levels = key_invaders)) # enforce desired order

# Faceted map
# Spatial invasion fitness for each of the top/bottom invaders, enabling direct visual comparison
ggplot(lambda_key, aes(x = x, y = y, fill = lambda)) +
  geom_tile() +
  geom_sf(data = rsa, inherit.aes = FALSE, fill = NA, color = "black", size = 0.5) +
  facet_wrap(~invader, ncol = 3) +
  scale_fill_viridis_c(option = "viridis", direction = -1) +
  labs(
    title = "Spatial Summaries of Invasion Fitness\nfor Top-3 and Bottom-3 Invaders",
    x = "Longitude", y = "Latitude"
  ) +
  theme_minimal(base_size = 12) +
  theme(panel.grid = element_blank())

Figure 18: Spatial invasion fitness patterns for top- and bottom-ranked invaders Faceted maps of invasion fitness (\(\lambda_{is}\)) for the three invaders with the highest mean fitness (top row) and the three with the lowest (bottom row). Top-performing invaders show clear spatial “hotspots” of establishment potential, often corresponding to particular regions with trait–environment matches. In contrast, poorly performing invaders show consistently low fitness values, indicating mismatches between their traits and local community or environmental conditions. Together, these maps link trait-based predictions to spatial risk assessments, providing an interpretable basis for identifying regions of high invasion susceptibility and guiding management prioritisation.

Note: Together, sections 12–13 move from raw \(\lambda_{is}\) values → clustered patternsfunctional correlatessite- and invader- specific rankings. This progression enables scaling from broad ecological profiles to actionable risk profiles for specific species and sites.  


14. Discussion and Conclusion

The code test demonstrates that the invasimapr package provides a practical and transparent framework for quantifying both trait dispersion and site-level invasibility. By linking species-level traits with community-level invasion outcomes, the framework moves beyond binary assessments of invasion success and instead quantifies the relative strength of establishment. This refinement improves ecological interpretation by allowing users to identify not only whether invasion is possible, but also how strongly invaders are likely to establish under given trait and environmental contexts. Importantly, the integration of trait-based analyses with invasion fitness metrics creates a mechanistic and scalable approach that can be applied in conservation management and policy settings. The findings indicate that trait dispersion may function as an early-warning signal of invasion risk, while site-level invasibility metrics highlight areas of ecological susceptibility. Such outputs have clear conservation value, providing managers with repeatable and evidence-based tools to prioritise surveillance, allocate resources, and design interventions before invasive species impacts become entrenched. Some limitations remain. The utility of the package is currently constrained by the availability and consistency of trait data across species and sites, and by assumptions embedded within the modelling framework that may simplify complex ecological interactions. Further development will focus on incorporating phylogenetic relationships, environmental gradients, and spatial heterogeneity to improve predictive accuracy. In addition, broader testing across ecosystems with different invasion histories will be essential for evaluating transferability. However, with continued refinement, invasimapr has strong potential to become a central tool for proactive biodiversity management. Its ability to deliver rapid, repeatable assessments of invasibility is particularly relevant for large, protected landscapes where early detection and prevention of invasive species are critical for long-term conservation outcomes.


15. References

  • Hui, C., Pyšek, P., & Richardson, D.M. (2023). Disentangling the relationships among abundance, invasiveness, and invasibility in trait space. npj Biodiversity, 2, 13. https://doi.org/10.1038/s44185-023-00019-1
  • Hui, C., Richardson, D.M., Landi, P., Minoarivelo, H.O., & Roy, H.E. (2016). Defining invasiveness and invasibility in ecological networks. Biological Invasions. https://doi.org/10.1007/s10530-016-1076-7
  • Hui, C., Richardson, D.M., Landi, P., Minoarivelo, H.O., & Roy, H.E. (2021). Trait positions for elevated invasiveness in adaptive ecological networks. Biological Invasions, 23, 1965–1985. https://doi.org/10.1007/s10530-021-02484-w
  • Kattge, J., Bönisch, G., Díaz, S., Lavorel, S., Prentice, I. C., Leadley, P., Tautenhahn, S., Werner, G., et al. (2020). TRY plant trait database - enhanced coverage and open access. Global Change Biology, 26(1), 119-188. doi:10.1111/gcb.14904)
  • Wang, Y, U Naumann, ST Wright & DI Warton (2012) mvabund - an R package for model-based analysis of multivariate abundance data. Methods in Ecology and Evolution 3: 471-474. (help: Multivariate Analysis with mvabund :: Environmental Computing)
  • Hui C, Richardson DM (2017). Invasion Dynamics. Oxford University Press.
  • Hui C et al. (2021). Trait-mediated ecological networks: mechanisms and consequences of invasion. Trends in Ecology & Evolution.

16. Glossary of definitions

Term Symbol Definition
Resident Abundance Equilibrium \(N^{*}_{js}\) Equilibrium abundance of resident species \(j\) at site \(s\) providing resident abundance context for interaction strength calculations.
Functional Trait Space \(d_{ij}\); \(g^{\mathrm{(all)}}_{ij}\) The multidimensional “map” of traits that describe how species use resources and interact with their environment.
Trait Centrality \(\pmb{\mu}\) The position of a species’ traits relative to the community average, indicating similarity or distinctiveness. We compute abundance-weighted centroid \(\pmb{\mu}\) and species centrality \(c_i\) in the same PCoA space used for \(d_{ij}\).
Trait Dispersion \(\mathrm{FDis}\); \(\mathrm{FRic}\); \(\mathrm{RaoQ}\) The degree of spread in trait space among species, reflecting ecological diversity and functional complementarity. Community dispersion is summarised by \(\mathrm{FDis}\), \(\mathrm{FRic}\) (convex hull volume), and \(\mathrm{RaoQ}\) based on the Euclidean distances underlying \(d_{ij}\).
Hypothetical Invaders \(\tilde{\mathbf{T}}\) Simulated species introduced to test how traits influence establishment success. Hypothetical invaders \(\tilde{\mathbf{T}}\) are sampled from resident trait distributions and evaluated via \(r_{is}\) and \(\lambda_{is}\).
Predicted Intrinsic Growth \(r_{is}\) Estimated growth rate of an invader under specific trait-environment conditions.
Interaction Strength \(I_{ijs}\) A measure of how strongly species affect each other’s growth rates, capturing competition or facilitation.
Competition Pressure \(\alpha_{ij}\) The penalty a species experiences due to trait overlap with resident species.
Environmental Filtering \(K_e(\Delta_{js};\sigma_e)\) The effect of environmental conditions in favouring or excluding certain traits.
Context-Dependent Competition \(C^{\mathrm{(raw)}}_{is}\) Competitive effects modified by environmental context.
Invasion Fitness \(\lambda_{is}\) A measure of an invader’s likelihood of successful establishment relative to residents.
Invasiveness Ranking \(V_i\) Ordering of species by their relative propensity to invade.
Site Invasibility Metrics \(V_s\) Measures of how open a community is to invasion, reflecting ecological “vacancies.”
  • a_ij[i, j] = invader × resident
  • K_env[s, j] = sites × residents
  • Nstar[j, s] = residents × sites
  • r_mat[i, s] = invader × site


Appendices

S1. Glossary table of symbols

Symbol & Link Equation Definition R Object / Slot(s) Function(s) Producing It
\(d_{ij}\) Pairwise trait dissimilarity (e.g. Gower) Functional trait distance between species \(i\) and \(j\); input to Gaussian trait kernel \(K_t\) comp$d_ij[i, j] or cis$g_all[i, j] if g_all computed as distance compute_interaction_strength() / compute_competition_kernel()
\(g^{\mathrm{(all)}}_{ij}\) Generalised trait distance/similarity depending on kernel choice Distance/similarity for all species pairs, used in interaction strength; for competition, must be a distance cis$g_all[i, j] compute_interaction_strength()
\(N^{*}_{js}\) Equilibrium abundance of resident species \(j\) at site \(s\) Resident abundance context for interaction strength calculations cis$Nstar[j, s] compute_interaction_strength()
\(\alpha_{ij}\) \(\alpha_{ij} = K_t(d_{ij}; \sigma_t)\) Competition coefficient between invader \(i\) and resident \(j\) based on trait similarity comp$a_ij[i, j] compute_competition_kernel()
\(\sigma_t\) Trait kernel bandwidth Controls width of Gaussian trait kernel \(K_t\) comp$sigma_t User-defined or estimated
\(\theta_j\) Environmental optimum of species \(j\) Trait–environment centroid for species \(j\) ek$env_opt[j, ] compute_environment_kernel()
\(\Delta_{js}\) \(\Delta_{js} = \| \mathbf{E}_s - \theta_j \|\) Environmental mismatch between site \(s\) and species \(j\) ek$env_dist[j, s] compute_environment_kernel()
\(K_e(\Delta_{js};\sigma_e)\) \(K_e = \exp\!\left( -\frac{\Delta_{js}^2}{2\sigma_e^2} \right)\) Environmental filtering weight ek$K_env[j, s] compute_environment_kernel()
\(\sigma_e\) Environmental kernel bandwidth Controls width of Gaussian environment kernel \(K_e\) ek$sigma_e User-defined or estimated
\(r_{is}\) Predicted intrinsic growth rate Growth rate of invader \(i\) at site \(s\) without biotic/abiotic penalties fitness$r_mat[i, s] predict_invader_response()
\(I_{ijs}\) \(I_{ijs} = \alpha_{ij} \cdot K_e(\Delta_{js}) \cdot N^{*}_{js}\) Impact tensor: per-site, per-resident contribution to invader’s penalty am$I_raw[i, j, s] assemble_matrices() / compute_interaction_strength()
\(C^{\mathrm{(raw)}}_{is}\) \(C^{\mathrm{(raw)}}*{is} = \sum*{j} I_{ijs}\) Total competitive + environmental penalty experienced by invader \(i\) at site \(s\) fitness$C_raw[i, s] compute_invasion_fitness()
\(\lambda_{is}\) \(\lambda_{is} = r_{is} - C^{\mathrm{(raw)}}_{is}\) Low-density per-capita growth rate of invader \(i\) at site \(s\) fitness$lambda[i, s] (also in lambda_mat) compute_invasion_fitness()
\(V_s\) \(V_s = \frac{1}{n_\mathrm{inv}} \sum_{i} \mathbb{1}[\lambda_{is} > 0]\) Invasibility: mean or proportion of invaders with positive \(\lambda_{is}\) at site \(s\) summary$Vs[s] Post-processing / summary functions
\(V_i\) \(V_i = \frac{1}{n_\mathrm{sites}} \sum_{s} \lambda_{is}\) Invasiveness: mean or sum of \(\lambda_{is}\) over sites for species \(i\) summary$-Vi[i] Post-processing / summary functions

Note: * a_ij[i, j] = invader × resident
* K_env[s, j] = sites × residents
* Nstar[j, s] = residents × sites
* r_mat[i, s] = invader × site