Sample observations from a larger occurrence dataset
Source: R/sample_observations.R
The function samples observations from occurrences based on detection probability and sampling bias by performing a Bernoulli trial for each occurrence.
Usage
sample_observations(
  occurrences,
  detection_probability = 1,
  sampling_bias = c("no_bias", "polygon", "manual"),
  bias_area = NA,
  bias_strength = 1,
  bias_weights = NA,
  seed = NA
)
Arguments
- occurrences
An sf object with POINT geometry representing the occurrences.
- detection_probability
A numeric value between 0 and 1 representing the probability of detecting the species.
- sampling_bias
A character string specifying the method to generate a sampling bias. Options are "no_bias", "polygon", or "manual".
"no_bias"
No bias is applied (default).
"polygon"
Bias the sampling within a polygon. Provide the polygon to bias_area and the bias strength to bias_strength.
"manual"
Bias the sampling manually using a grid. Provide the grid layer, in which each cell contains the probability of being sampled, to bias_weights.
- bias_area
An sf object with POLYGON geometry, or NA. Only used if sampling_bias = "polygon". This defines the area in which the sampling will be biased.
- bias_strength
A positive numeric value, or NA. Only used if sampling_bias = "polygon". The value represents the strength of the bias to be applied within the bias_area. Values greater than 1 will increase the sampling probability within the polygon relative to outside (oversampling), while values between 0 and 1 will decrease it (undersampling). For instance, a value of 50 will make the probability 50 times higher within the bias_area compared to outside, whereas a value of 0.5 will make it half as likely.
- bias_weights
A grid layer (an sf object with POLYGON geometry), or NA. Only used if sampling_bias = "manual". The grid of bias weights to be applied. This sf object should contain a bias_weight column with the weights per grid cell. Higher weights increase the probability of sampling. Weights can be numeric values between 0 and 1 or positive integers, which will be rescaled to values between 0 and 1.
- seed
A positive numeric value setting the seed for random number generation to ensure reproducibility. If NA (default), then set.seed() is not called at all. If not NA, then the random number generator state is reset (to the state before calling this function) upon exiting this function.
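The seed behaviour described for the seed argument can be sketched in base R as follows. This is a minimal illustration of the save-and-restore pattern, not the package's actual implementation; with_local_seed() is a hypothetical helper name introduced only for this example.

```r
# Hypothetical helper (not part of the package): run `code` with a temporary
# seed and restore the previous RNG state when the function exits.
with_local_seed <- function(seed, code) {
  if (is.na(seed)) {
    # No seed given: set.seed() is never called
    return(code())
  }

  # Remember the current RNG state, if one exists yet
  had_state <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_state) {
    old_state <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  }

  # Restore (or remove) the state on exit, even if `code` fails
  on.exit({
    if (had_state) {
      assign(".Random.seed", old_state, envir = globalenv())
    } else {
      rm(".Random.seed", envir = globalenv())
    }
  })

  set.seed(seed)
  code()
}

# Usage: the same seed gives the same draws, and the caller's RNG state
# is left as it was before the calls.
set.seed(1)
before <- .Random.seed
draw_a <- with_local_seed(123, function() runif(3))
draw_b <- with_local_seed(123, function() runif(3))
identical(draw_a, draw_b)
identical(before, .Random.seed)
```

Restoring the state on exit means that setting a seed inside the function does not change the random numbers the caller draws afterwards.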
Value
An sf object with POINT geometry containing the locations of the occurrences with their detection status. The object includes the following columns:
detection_probability
The detection probability for each occurrence (will be the same for all).
bias_weight
The sampling probability based on sampling bias for each occurrence.
sampling_probability
The combined sampling probability from detection probability and sampling bias for each occurrence.
sampling_status
Indicates whether the occurrence was detected ("detected") or not ("undetected"). Detected occurrences are called observations.
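The relationship between these columns can be sketched in base R. The example output on this page is consistent with sampling_probability being the product of detection_probability and bias_weight. Drawing the status with rbinom() is an assumption about how the Bernoulli trial might be carried out, not the package's confirmed code, and the input values are illustrative.

```r
# Illustrative values, not taken from a real dataset
detection_probability <- 0.8
bias_weight <- c(1 / 3, 2 / 3, 2 / 3, 1 / 3)

# Combined probability per occurrence
sampling_probability <- detection_probability * bias_weight

# One Bernoulli trial per occurrence (assumed: rbinom() with size = 1)
set.seed(123)
detected <- rbinom(
  length(sampling_probability),
  size = 1,
  prob = sampling_probability
)

# Translate the 0/1 outcome into the status labels used in the output
sampling_status <- ifelse(detected == 1, "detected", "undetected")

data.frame(
  detection_probability,
  bias_weight,
  sampling_probability,
  sampling_status
)
```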
See also
Other main: add_coordinate_uncertainty(), filter_observations(), grid_designation(), simulate_occurrences()
Examples
# Load packages
library(sf)
library(dplyr)
# Simulate some occurrence data with coordinates and time points
num_points <- 10
occurrences <- data.frame(
  lon = runif(num_points, min = -180, max = 180),
  lat = runif(num_points, min = -90, max = 90),
  time_point = 0
)
# Convert the occurrence data to an sf object
occurrences_sf <- st_as_sf(occurrences, coords = c("lon", "lat"))
# 1. Sample observations without sampling bias
sample_observations(
  occurrences_sf,
  detection_probability = 0.8,
  sampling_bias = "no_bias",
  seed = 123
)
#> Simple feature collection with 10 features and 5 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -176.3391 ymin: -71.5952 xmax: 26.60207 ymax: 85.49863
#> CRS: NA
#> # A tibble: 10 × 6
#>    time_point detection_probability bias_weight sampling_probability
#>         <dbl>                 <dbl>       <dbl>                <dbl>
#>  1          0                   0.8           1                  0.8
#>  2          0                   0.8           1                  0.8
#>  3          0                   0.8           1                  0.8
#>  4          0                   0.8           1                  0.8
#>  5          0                   0.8           1                  0.8
#>  6          0                   0.8           1                  0.8
#>  7          0                   0.8           1                  0.8
#>  8          0                   0.8           1                  0.8
#>  9          0                   0.8           1                  0.8
#> 10          0                   0.8           1                  0.8
#> # ℹ 2 more variables: sampling_status <chr>, geometry <POINT>
# 2. Sample observations with sampling bias in a polygon
# Create bias_area polygon overlapping two of the points
selected_observations <- st_union(occurrences_sf[2:3, ])
bias_area <- st_convex_hull(selected_observations) %>%
  st_buffer(dist = 50) %>%
  st_as_sf()
sample_observations(
  occurrences_sf,
  detection_probability = 0.8,
  sampling_bias = "polygon",
  bias_area = bias_area,
  bias_strength = 2,
  seed = 123
)
#> Simple feature collection with 10 features and 5 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -176.3391 ymin: -71.5952 xmax: 26.60207 ymax: 85.49863
#> CRS: NA
#> # A tibble: 10 × 6
#>    time_point detection_probability bias_weight sampling_probability
#>         <dbl>                 <dbl>       <dbl>                <dbl>
#>  1          0                   0.8       0.333                0.267
#>  2          0                   0.8       0.667                0.533
#>  3          0                   0.8       0.667                0.533
#>  4          0                   0.8       0.667                0.533
#>  5          0                   0.8       0.333                0.267
#>  6          0                   0.8       0.667                0.533
#>  7          0                   0.8       0.333                0.267
#>  8          0                   0.8       0.333                0.267
#>  9          0                   0.8       0.333                0.267
#> 10          0                   0.8       0.667                0.533
#> # ℹ 2 more variables: sampling_status <chr>, geometry <POINT>
# 3. Sample observations with sampling bias given manually in a grid
# Create a polygon grid with random bias weights between 0 and 1
grid <- st_make_grid(occurrences_sf) %>%
  st_sf() %>%
  mutate(bias_weight = runif(n(), min = 0, max = 1))
sample_observations(
  occurrences_sf,
  detection_probability = 0.8,
  sampling_bias = "manual",
  bias_weights = grid,
  seed = 123
)
#> Simple feature collection with 10 features and 5 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -176.3391 ymin: -71.5952 xmax: 26.60207 ymax: 85.49863
#> CRS: NA
#> # A tibble: 10 × 6
#>    time_point detection_probability bias_weight sampling_probability
#>         <dbl>                 <dbl>       <dbl>                <dbl>
#>  1          0                   0.8      0.895                0.716
#>  2          0                   0.8      0.782                0.626
#>  3          0                   0.8      0.929                0.743
#>  4          0                   0.8      0.161                0.129
#>  5          0                   0.8      0.835                0.668
#>  6          0                   0.8      0.0860               0.0688
#>  7          0                   0.8      0.755                0.604
#>  8          0                   0.8      0.755                0.604
#>  9          0                   0.8      0.279                0.223
#> 10          0                   0.8      0.0781               0.0625
#> # ℹ 2 more variables: sampling_status <chr>, geometry <POINT>
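The bias_weight values printed in example 2 (0.667 and 0.333 for bias_strength = 2) are consistent with weights of s / (s + 1) inside the polygon and 1 / (s + 1) outside it, where s is the bias strength. The sketch below works through that arithmetic under this assumption. The formula is inferred from the printed output rather than taken from the package source, and polygon_bias_weights() is a hypothetical helper name.

```r
# Hypothetical helper (not part of the package): weights inside and outside
# the bias area for a given bias strength, assuming the weights sum to 1.
polygon_bias_weights <- function(bias_strength) {
  stopifnot(is.numeric(bias_strength), bias_strength > 0)
  c(
    inside = bias_strength / (bias_strength + 1),
    outside = 1 / (bias_strength + 1)
  )
}

w <- polygon_bias_weights(2)

# inside 0.667, outside 0.333, as in the output of example 2
round(w, 3)

# The ratio of the two weights recovers the bias strength
w[["inside"]] / w[["outside"]]

# Combined with detection_probability = 0.8, this gives the
# sampling_probability values 0.533 and 0.267 from example 2
round(0.8 * w, 3)
```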