
This function calculates acceleration values, which quantify the sensitivity of a statistic’s variability to changes in the dataset. Acceleration is used for bias-corrected and accelerated (BCa) confidence intervals in calculate_bootstrap_ci().

Usage

calculate_acceleration(
  data_cube,
  fun,
  ...,
  grouping_var,
  ref_group = NA,
  influence_method = "usual",
  progress = FALSE
)

Arguments

data_cube

A data cube object (class 'processed_cube' or 'sim_cube', see b3gbi::process_cube()) or a dataframe (from the $data slot of a 'processed_cube' or 'sim_cube' object). As used by bootstrap_cube(). To limit runtime, we recommend using a dataframe together with a custom function as fun.

fun

A function which, when applied to data_cube, returns the statistic(s) of interest. This function must return a dataframe with a column diversity_val containing the statistic of interest. As used by bootstrap_cube().

...

Additional arguments passed on to fun.

grouping_var

A character vector specifying the grouping variable(s) for the bootstrap analysis. The function fun(data_cube, ...) should return one row per group. The specified variables must not be redundant, meaning they should not contain the same information. For example, "time_point" (1, 2, 3) and "year" (2000, 2001, 2002) should not be used together if "time_point" is just an alternative encoding of "year". This variable is used to split the dataset into groups for separate acceleration calculations.
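To make the role of grouping_var concrete, the base R sketch below splits a small hypothetical dataframe by year and computes a negative-jackknife acceleration for the mean within each year. It only illustrates the idea of separate per-group calculations; the toy data, the helper acceleration_from_influence(), and the per-group loop are assumptions for illustration, not the package's implementation.

```r
# Hypothetical toy data: one row per observation, grouped by year
df <- data.frame(
  year = rep(c(2019, 2020), each = 5),
  obs = c(1, 2, 3, 4, 5, 1, 1, 2, 3, 9)
)

# Hypothetical helper (not part of the package): acceleration from influence values
acceleration_from_influence <- function(influence) {
  sum(influence^3) / (6 * sum(influence^2)^(3 / 2))
}

# Split obs by the grouping variable and run a negative jackknife
# for the mean within each group separately
acc <- vapply(split(df$obs, df$year), function(x) {
  n <- length(x)
  theta_loo <- vapply(seq_len(n), function(i) mean(x[-i]), numeric(1))
  acceleration_from_influence((n - 1) * (mean(x) - theta_loo))
}, numeric(1))

# One row per group
data.frame(year = as.numeric(names(acc)), acceleration = unname(acc))
```

The symmetric 2019 group gives an acceleration of 0, while the right-skewed 2020 group gives a positive value.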

ref_group

A string indicating the reference group to compare the statistic with. Default is NA, meaning no reference group is used. As used by bootstrap_cube().

influence_method

A string specifying the method used for calculating the influence values.

  • "usual": Negative jackknife (default).

  • "pos": Positive jackknife.

progress

Logical. Whether to show a progress bar for jackknifing. Set to TRUE to display a progress bar, FALSE (default) to suppress it.

Value

A dataframe containing the acceleration values per grouping_var.

Details

Acceleration quantifies how sensitive the variability of a statistic \(\theta\) is to changes in the data.

  • \(a=0\): The statistic's variability does not depend on the data (e.g., a symmetric distribution).

  • \(a>0\): Small changes in the data have a large effect on the statistic's variability (e.g., positive skew).

  • \(a<0\): Small changes in the data have a smaller effect on the statistic's variability (e.g., negative skew).

It is used to calculate BCa confidence intervals, which adjust for bias and skewness in bootstrapped distributions (Davison & Hinkley, 1997, Chapter 5). See also the empinf() function of the boot package in R (Canty & Ripley, 1999). The acceleration is calculated as follows:

$$\hat{a} = \frac{1}{6} \frac{\sum_{i = 1}^{n}(I_i^3)}{\left( \sum_{i = 1}^{n}(I_i^2) \right)^{3/2}}$$
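The formula translates directly into code. The sketch below is a minimal R version, assuming the influence values \(I_i\) are already available as a numeric vector; the function name acceleration_from_influence() is hypothetical and not part of the package. As a worked example, for \(I = (-1, -1, 2)\) we get \(\sum I_i^3 = 6\) and \(\sum I_i^2 = 6\), so \(\hat{a} = 6 / (6 \cdot 6^{3/2}) = 6^{-3/2} \approx 0.068\).

```r
# Hypothetical helper (not part of the package): acceleration from a
# vector of influence values, a = sum(I^3) / (6 * (sum(I^2))^(3/2))
acceleration_from_influence <- function(influence) {
  stopifnot(is.numeric(influence), length(influence) > 1)
  sum(influence^3) / (6 * sum(influence^2)^(3 / 2))
}

acceleration_from_influence(c(-1, -1, 2))  # 6^(-3/2), about 0.068
acceleration_from_influence(c(-2, 0, 2))   # symmetric influence values give 0
```

Multiplying all \(I_i\) by the same positive constant leaves \(\hat{a}\) unchanged, because the constant cancels between numerator and denominator.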

where \(I_i\) denotes the influence of data point \(x_i\) on the estimation of \(\theta\). \(I_i\) can be estimated using jackknifing. Examples are (1) the negative jackknife: \(I_i = (n-1)(\hat{\theta} - \hat{\theta}_{-i})\), and (2) the positive jackknife: \(I_i = (n+1)(\hat{\theta}_{-i} - \hat{\theta})\) (Frangos & Schucany, 1990). Here, \(\hat{\theta}_{-i}\) is the estimate obtained by leaving out the \(i\)th data point \(x_i\). The boot package also offers the infinitesimal jackknife and regression estimation of influence values. Implementing these methods could be explored in the future.
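The negative jackknife can be sketched in a few lines of base R. The helpers jackknife_influence() and acceleration_from_influence() below are hypothetical and only cover the negative ("usual") jackknife for a statistic of a numeric vector. For the mean, the influence values reduce to \(I_i = x_i - \bar{x}\), so a symmetric sample gives \(\hat{a} = 0\) and a right-skewed sample gives \(\hat{a} > 0\).

```r
# Hypothetical helper: negative ("usual") jackknife influence values,
# I_i = (n - 1) * (theta_hat - theta_hat_{-i})
jackknife_influence <- function(x, statistic) {
  n <- length(x)
  theta_hat <- statistic(x)
  # Leave-one-out estimates theta_hat_{-i}
  theta_loo <- vapply(seq_len(n), function(i) statistic(x[-i]), numeric(1))
  (n - 1) * (theta_hat - theta_loo)
}

# Hypothetical helper: acceleration from influence values
acceleration_from_influence <- function(influence) {
  sum(influence^3) / (6 * sum(influence^2)^(3 / 2))
}

# For the mean, I_i equals x_i - mean(x)
x <- c(2, 4, 6, 8, 10)
jackknife_influence(x, mean)  # -4 -2 0 2 4

# Symmetric sample: acceleration is zero
acceleration_from_influence(jackknife_influence(x, mean))  # 0

# Right-skewed sample: acceleration is positive
y <- c(1, 1, 2, 2, 3, 10)
acceleration_from_influence(jackknife_influence(y, mean))
```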

If a reference group is used, jackknifing is implemented in a different way. Consider \(\hat{\theta} = \hat{\theta}_1 - \hat{\theta}_2\) where \(\hat{\theta}_1\) is the estimate for the indicator value of a non-reference period (sample size \(n_1\)) and \(\hat{\theta}_2\) is the estimate for the indicator value of a reference period (sample size \(n_2\)). The acceleration is now calculated as follows:

$$\hat{a} = \frac{1}{6} \frac{\sum_{i = 1}^{n_1 + n_2}(I_i^3)}{\left( \sum_{i = 1}^{n_1 + n_2}(I_i^2) \right)^{3/2}}$$

\(I_i\) can be calculated using the negative or positive jackknife, where the leave-one-out estimates are

\(\hat{\theta}_{-i} = \hat{\theta}_{1,-i} - \hat{\theta}_2 \text{ for } i = 1, \ldots, n_1\), and

\(\hat{\theta}_{-i} = \hat{\theta}_{1} - \hat{\theta}_{2,-i} \text{ for } i = n_1 + 1, \ldots, n_1 + n_2\).
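A minimal sketch of the reference-group case with the negative jackknife, for a statistic of two numeric vectors, is given below. It assumes one common factor \((n - 1)\) with \(n = n_1 + n_2\) for all \(I_i\), and the helper names are hypothetical. Because \(\hat{a}\) is unchanged when every \(I_i\) is multiplied by the same positive constant, the choice of common factor does not affect the result, whereas group-specific factors would.

```r
# Hypothetical sketch: negative jackknife for theta = stat(x1) - stat(x2),
# with x1 the non-reference group and x2 the reference group
jackknife_influence_ref <- function(x1, x2, statistic) {
  n1 <- length(x1)
  n2 <- length(x2)
  n <- n1 + n2  # assumption: one common factor (n - 1) for all I_i
  theta_hat <- statistic(x1) - statistic(x2)
  # Leave out x1[i]; the reference estimate stays fixed
  theta_loo_1 <- vapply(seq_len(n1),
                        function(i) statistic(x1[-i]) - statistic(x2),
                        numeric(1))
  # Leave out x2[i]; the non-reference estimate stays fixed
  theta_loo_2 <- vapply(seq_len(n2),
                        function(i) statistic(x1) - statistic(x2[-i]),
                        numeric(1))
  (n - 1) * (theta_hat - c(theta_loo_1, theta_loo_2))
}

# Hypothetical helper: acceleration from influence values
acceleration_from_influence <- function(influence) {
  sum(influence^3) / (6 * sum(influence^2)^(3 / 2))
}

x1 <- c(5, 7, 9, 30)  # non-reference group (right-skewed)
x2 <- c(4, 5, 6)      # reference group
infl <- jackknife_influence_ref(x1, x2, mean)
length(infl)          # 7, i.e. n1 + n2
acceleration_from_influence(infl)
```

With a symmetric pair such as x1 = c(1, 2, 3) and x2 = c(4, 5, 6), the cubed influence values cancel and the sketch returns 0.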

References

Canty, A., & Ripley, B. (1999). boot: Bootstrap Functions (Originally by Angelo Canty for S) [Computer software]. https://CRAN.R-project.org/package=boot

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application (1st ed.). Cambridge University Press. doi:10.1017/CBO9780511802843

Frangos, C. C., & Schucany, W. R. (1990). Jackknife estimation of the bootstrap acceleration constant. Computational Statistics & Data Analysis, 9(3), 271–281. doi:10.1016/0167-9473(90)90109-U


Examples

# Get example data
# install.packages("b3gbi", repos = "https://b-cubed-eu.r-universe.dev")
library(b3gbi)
cube_path <- system.file(
  "extdata", "denmark_mammals_cube_eqdgc.csv",
  package = "b3gbi")
denmark_cube <- process_cube(
  cube_path,
  first_year = 2014,
  last_year = 2020)

# Function to calculate statistic of interest
# Mean observations per year
mean_obs <- function(data) {
  out_df <- aggregate(obs ~ year, data, mean) # Calculate mean obs per year
  names(out_df) <- c("year", "diversity_val") # Rename columns
  return(out_df)
}
mean_obs(denmark_cube$data)
#>   year diversity_val
#> 1 2014     11.553740
#> 2 2015     11.532206
#> 3 2016      5.532491
#> 4 2017      5.703888
#> 5 2018      5.598413
#> 6 2019      4.802676
#> 7 2020      4.972163

# Perform bootstrapping
# \donttest{
bootstrap_mean_obs <- bootstrap_cube(
  data_cube = denmark_cube$data,
  fun = mean_obs,
  grouping_var = "year",
  samples = 1000,
  seed = 123,
  progress = FALSE)
head(bootstrap_mean_obs)
#>   sample year est_original rep_boot est_boot  se_boot     bias_boot
#> 1      1 2014     11.55374 11.08362 11.55352 1.484546 -0.0002192649
#> 2      2 2014     11.55374 12.50984 11.55352 1.484546 -0.0002192649
#> 3      3 2014     11.55374 11.00085 11.55352 1.484546 -0.0002192649
#> 4      4 2014     11.55374 11.62364 11.55352 1.484546 -0.0002192649
#> 5      5 2014     11.55374 13.37887 11.55352 1.484546 -0.0002192649
#> 6      6 2014     11.55374 11.70199 11.55352 1.484546 -0.0002192649

# Calculate acceleration
acceleration_df <- calculate_acceleration(
  data_cube = denmark_cube$data,
  fun = mean_obs,
  grouping_var = "year",
  progress = FALSE)
acceleration_df
#>   year acceleration
#> 1 2014   0.06931908
#> 2 2015   0.03320883
#> 3 2016   0.05700415
#> 4 2017   0.06025669
#> 5 2018   0.09607958
#> 6 2019   0.10540064
#> 7 2020   0.10579425
# }