Skip to contents

Apart from simulating occurrence cubes from scratch, gcube can also be used to create cubes for virtual species generated using bioclimatic variables. This tutorial demonstrates how to generate a virtual species, apply a geographical limit, sample occurrences, and convert them into an occurrence cube.

Create virtual species

Virtual species can be created using the virtualspecies R package (Leroy et al., 2015). Below is a basic example of how to create a virtual species. For additional options and a more comprehensive guide, refer to the package documentation and the tutorial available here.

# Load packages
library(gcube)

library(geodata)        # get bioclimatic data
library(virtualspecies) # generate virtual species

library(sf)             # work with spatial objects
library(terra)          # work with raster data
library(ggplot2)        # data visualisation

Step 1: Download WorldClim Data

We first obtain WorldClim bioclimatic data at a 10-minute resolution.

# Download bioclimatic data
worldclim <- worldclim_global(var = "bio", res = 10, path = tempdir())

Step 2: Define the region of interest

We define the extent for the Western Palearctic and crop the climate data.

# Western Palearctic
wp <- terra::ext(-15, 65, 30, 75)

# Crop climate data
worldclim <- terra::crop(worldclim, wp)

Step 3: Select relevant bioclimatic variables

We select a subset of bioclimatic variables: bio2, bio5, bio6, bio13, bio14, and bio15.

# Specify variables
bio_vars <- c("bio2", "bio5", "bio6", "bio13", "bio14", "bio15")

# Select variables
bio_string <- paste0("bio_", sub("bio", "", bio_vars), collapse = "|")
bio_vars_selected <- names(worldclim)[grepl(bio_string, names(worldclim))]

# Subset climate data
worldclim_vars <- worldclim[[bio_vars_selected]]

Step 4: Generate a virtual species

We create a random virtual species based on the selected bioclimatic variables.

set.seed(123)

# Generate random virtual species
virtual_species <- generateRandomSp(worldclim_vars)
#> Reading raster values. If it fails for very large rasters, use arguments 'sample.points = TRUE' and define a number of points to sample with 'nb.point'.
#>  - Perfoming the pca
#>  - Defining the response of the species along PCA axes
#>  - Calculating suitability values
#>    The final environmental suitability was rescaled between 0 and 1.
#>                   To disable, set argument rescale = FALSE
#>  - Converting into Presence - Absence
#>    --- Generating a random value of beta for the logistic conversion
#>    --- Determing species.prevalence automatically according to alpha and beta
#>    Logistic conversion finished:
#>               
#> - beta = 0.194194194194194
#> - alpha = -0.1
#> - species prevalence =0.770632849948426

Distribution of random virtual species

Step 5: Apply a geographical limit

We limit the species distribution to the United Kingdom and Ireland.

# Limit distribution
virtual_distribution <- limitDistribution(
  x = virtual_species,
  geographical.limit = "country",
  area = c("United Kingdom", "Ireland"))
#> Warning: [rasterize] unexpected additional argument(s): silent

Limited distribution of random virtual species

Step 6: Sample occurrences

We sample 150 occurrences from the restricted distribution with a detection probability of 0.8 using the virtualspecies::sampleOccurrences() function. Detection probability and sampling bias can be introduced similar to sample_observations().

?virtualspecies::sampleOccurrences
# Sample from virtual species distribution
virtual_sample <- sampleOccurrences(
  virtual_distribution,
  n = 150,
  detection.probability = 0.8)

Samples of random virtual species

Grid designation with gcube

Now we can create an occurrence cube for this virtual species The most important step, is to make the virtualspecies output compatible with gcube input. This is done using the sf package. For grid designation, see also the tutorial 3. Specifying the grid designation process.

Step 1: Convert sampled points to sf format

We transform the sample points into an sf object to the LAEA Europe CRS (EPSG:3035).

# Create reference grid
detections_df_raw <- st_as_sf(
    virtual_sample$sample.points,
    coords = c("x", "y"),
    crs = terra::crs(virtual_sample$original.distribution.raster))
detections_df_raw <- st_transform(detections_df_raw, crs = 3035)

Step 2: Add coordinate uncertainty

We only keep the detected occurrences.

detections_df <- detections_df_raw[!is.na(detections_df_raw$Observed), ]

Optionally, we can add coordinate uncertainty using the add_coordinate_uncertainty() function. See also the tutorial 2. Simulating the detection process. We add 25 m uncertainty for each observation.

# Add coordinate uncertainty
observations_df <- add_coordinate_uncertainty(
  observations = detections_df,
  coords_uncertainty_meters = 25)

Step 3: Specify reference grid

We also need a grid. Each observation will be designated to a grid cell. You can provide your own grid (e.g. EEA reference grid for Europe). For this example we create a simple grid around the sampled points.

# Create reference grid
buffered_observations <- st_buffer(observations_df, 25)
cube_grid <- st_make_grid(
  buffered_observations,
  n = c(20, 20),
  square = TRUE) %>%
  st_sf()

The grid looks like this.

ggplot() +
  geom_sf(data = observations_df) +
  geom_sf(data = cube_grid, alpha = 0) +
  coord_sf(datum = 3035) +
  theme_minimal()

Grid used for creating data cube.

Step 4: Perform grid designation

We perform grid designation both in aggregated version and not for visualisation purposes.

# Perform grid designation
occurrence_cube_df <- grid_designation(
  observations_df,
  cube_grid,
  seed = 123)
#> Warning in sample_from_uniform_circle(observations, seed): No column `time_point` present!
#> Assuming only a single time point.
# Get sampled points from uncertainty circle
sampled_points <- grid_designation(
  observations_df,
  cube_grid,
  seed = 123,
  aggregate = FALSE)
#> Warning in sample_from_uniform_circle(observations, seed): No column `time_point` present!
#> Assuming only a single time point.

Lets visualise were the samples were taken. Note that no distinction is made between zeroes and NA values. Every absence gets a zero value.

ggplot() +
  geom_sf(data = occurrence_cube_df,
          alpha = 0) +
  geom_sf_text(data = occurrence_cube_df,
               aes(label = n)) +
  geom_sf(data = sampled_points,
          colour = "blue") +
  coord_sf(datum = 3035) +
  theme_minimal()

Distribution of random samples within uncertainty circle