This function calculates acceleration values, which quantify the sensitivity of a statistic’s variability to changes in the dataset. Acceleration is used for bias-corrected and accelerated (BCa) confidence intervals in calculate_bootstrap_ci().

Usage

calculate_acceleration(
  data_cube,
  fun,
  ...,
  grouping_var,
  ref_group = NA,
  influence_method = "usual",
  progress = FALSE
)

Arguments

data_cube

A data cube object (class 'processed_cube' or 'sim_cube', see b3gbi::process_cube()) or a dataframe (from the $data slot of a 'processed_cube' or 'sim_cube'), as used by bootstrap_cube(). To limit runtime, we recommend passing a dataframe together with a custom function as fun.

fun

A function which, when applied to data_cube, returns the statistic(s) of interest. This function must return a dataframe with a column diversity_val containing the statistic of interest. As used by bootstrap_cube().

...

Additional arguments passed on to fun.

grouping_var

A character vector specifying the grouping variable(s) for the bootstrap analysis. The function fun(data_cube, ...) should return a row per group. The specified variables must not be redundant, meaning they should not contain the same information (e.g., "time_point" (1, 2, 3) and "year" (2000, 2001, 2002) should not be used together if "time_point" is just an alternative encoding of "year"). This variable is used to split the dataset into groups for separate acceleration calculations.
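The redundancy requirement can be checked before calling the function. The following is a minimal base R sketch, not part of the package: the helper `is_redundant()` and the toy dataframes are hypothetical and only illustrate the idea that two columns are redundant when their values map one-to-one.

```r
# Toy data: `time_point` is just a re-encoding of `year`
df_redundant <- data.frame(
  year = c(2000, 2000, 2001, 2002),
  time_point = c(1, 1, 2, 3)
)

# Toy data: `taxon` carries information that `year` does not
df_distinct <- data.frame(
  year = c(2000, 2000, 2001),
  taxon = c("a", "b", "a")
)

# Hypothetical helper: TRUE when the two columns map one-to-one,
# i.e. the number of unique pairs equals the number of unique values
# in each column separately
is_redundant <- function(data, var1, var2) {
  n_pairs <- nrow(unique(data[c(var1, var2)]))
  n_pairs == length(unique(data[[var1]])) &&
    n_pairs == length(unique(data[[var2]]))
}

is_redundant(df_redundant, "year", "time_point")
is_redundant(df_distinct, "year", "taxon")
```

In this sketch only the first pair would be flagged, so "year" and "taxon" could be combined as grouping variables while "year" and "time_point" should not.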

ref_group

A string indicating the reference group to compare the statistic with. Default is NA, meaning no reference group is used. As used by bootstrap_cube().

influence_method

A string specifying the method used for calculating the influence values.

  • "usual": Negative jackknife (default if BCa is selected).

  • "pos": Positive jackknife.

progress

Logical. Whether to show a progress bar for jackknifing. Set to TRUE to display a progress bar, FALSE (default) to suppress it.

Value

A dataframe containing the acceleration values per grouping_var.

Details

Acceleration quantifies how sensitive the variability of a statistic \(\theta\) is to changes in the data.

  • \(a=0\): The statistic's variability does not depend on the data (e.g., symmetric distribution).

  • \(a>0\): Small changes in the data have a large effect on the statistic's variability (e.g., positive skew).

  • \(a<0\): Small changes in the data have a small effect on the statistic's variability (e.g., negative skew).

It is used for BCa confidence interval calculation, which adjusts for bias and skewness in bootstrapped distributions (Davison & Hinkley, 1997, Chapter 5). See also the empinf() function of the boot package in R (Canty & Ripley, 1999). The acceleration is calculated as follows:

$$\hat{a} = \frac{1}{6} \frac{\sum_{i = 1}^{n}(I_i^3)}{\left( \sum_{i = 1}^{n}(I_i^2) \right)^{3/2}}$$

where \(I_i\) denotes the influence of data point \(x_i\) on the estimation of \(\theta\). \(I_i\) can be estimated using jackknifing. Examples are (1) the negative jackknife: \(I_i = (n-1)(\hat{\theta} - \hat{\theta}_{-i})\), and (2) the positive jackknife: \(I_i = (n+1)(\hat{\theta}_{-i} - \hat{\theta})\) (Frangos & Schucany, 1990). Here, \(\hat{\theta}_{-i}\) is the estimated value leaving out the \(i\)-th data point \(x_i\). The boot package also offers infinitesimal jackknife and regression estimation. Implementation of these additional methods can be explored in the future.
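The formula can be illustrated on a plain numeric vector. This is a minimal base R sketch under stated assumptions, not the package's implementation: the helpers `jackknife_influence()` and `acceleration_from_influence()` are hypothetical names, the statistic is the sample mean, and only the negative jackknife is shown.

```r
# Hypothetical helper: negative jackknife influence values
# I_i = (n - 1) * (theta_hat - theta_hat_{-i})
jackknife_influence <- function(x, stat) {
  n <- length(x)
  theta_hat <- stat(x)
  # Leave-one-out estimates: statistic recomputed without observation i
  theta_loo <- vapply(seq_len(n), function(i) stat(x[-i]), numeric(1))
  (n - 1) * (theta_hat - theta_loo)
}

# Hypothetical helper: acceleration from a vector of influence values
acceleration_from_influence <- function(influence) {
  sum(influence^3) / (6 * sum(influence^2)^(3 / 2))
}

x_skewed <- c(1, 2, 3, 4, 20) # right-skewed toy sample
x_symmetric <- 1:5            # symmetric toy sample

a_skewed <- acceleration_from_influence(jackknife_influence(x_skewed, mean))
a_symmetric <- acceleration_from_influence(jackknife_influence(x_symmetric, mean))

a_skewed
a_symmetric
```

For the mean, the negative jackknife influence values reduce to the deviations \(x_i - \bar{x}\), so the right-skewed sample gives a positive acceleration and the symmetric sample gives an acceleration of zero.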

If a reference group is used, jackknifing is implemented in a different way. Consider \(\hat{\theta} = \hat{\theta}_1 - \hat{\theta}_2\) where \(\hat{\theta}_1\) is the estimate for the indicator value of a non-reference period (sample size \(n_1\)) and \(\hat{\theta}_2\) is the estimate for the indicator value of a reference period (sample size \(n_2\)). The acceleration is now calculated as follows:

$$\hat{a} = \frac{1}{6} \frac{\sum_{i = 1}^{n_1 + n_2}(I_i^3)}{\left( \sum_{i = 1}^{n_1 + n_2}(I_i^2) \right)^{3/2}}$$

\(I_i\) can be calculated using the negative or positive jackknife, such that

\(\hat{\theta}_{-i} = \hat{\theta}_{1,-i} - \hat{\theta}_2 \text{ for } i = 1, \ldots, n_1\), and

\(\hat{\theta}_{-i} = \hat{\theta}_{1} - \hat{\theta}_{2,-i} \text{ for } i = n_1 + 1, \ldots, n_1 + n_2\).
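The reference group case can be sketched in the same way. The helper `acceleration_diff()` below is hypothetical and only mirrors the formulas in this section for two numeric vectors; it assumes the negative jackknife with \(n = n_1 + n_2\) as scaling constant, which the text does not state explicitly. Any positive constant gives the same result because it cancels between numerator and denominator.

```r
# Hypothetical helper: acceleration for theta = stat(x1) - stat(x2),
# where x2 plays the role of the reference group
acceleration_diff <- function(x1, x2, stat) {
  n1 <- length(x1)
  n2 <- length(x2)
  n <- n1 + n2 # assumed scaling constant; it cancels in the final ratio
  theta_hat <- stat(x1) - stat(x2)

  # Leave out observations of the non-reference group (i = 1, ..., n1)
  theta_loo_1 <- vapply(
    seq_len(n1), function(i) stat(x1[-i]) - stat(x2), numeric(1)
  )
  # Leave out observations of the reference group (i = n1 + 1, ..., n1 + n2)
  theta_loo_2 <- vapply(
    seq_len(n2), function(i) stat(x1) - stat(x2[-i]), numeric(1)
  )

  # Negative jackknife influence values over all n1 + n2 observations
  influence <- (n - 1) * (theta_hat - c(theta_loo_1, theta_loo_2))
  sum(influence^3) / (6 * sum(influence^2)^(3 / 2))
}

x_period <- c(1, 2, 3, 4, 20)   # right-skewed toy sample
x_reference <- c(5, 6, 7, 8, 9) # symmetric toy reference sample

a_diff <- acceleration_diff(x_period, x_reference, mean)
a_diff_swapped <- acceleration_diff(x_reference, x_period, mean)

a_diff
a_diff_swapped
```

With the mean as statistic, swapping the roles of the two groups flips the sign of every influence value, so the two results have equal magnitude and opposite sign.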

References

Canty, A., & Ripley, B. (1999). boot: Bootstrap Functions (Originally by Angelo Canty for S) [Computer software]. https://CRAN.R-project.org/package=boot

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application (1st ed.). Cambridge University Press. doi:10.1017/CBO9780511802843

Frangos, C. C., & Schucany, W. R. (1990). Jackknife estimation of the bootstrap acceleration constant. Computational Statistics & Data Analysis, 9(3), 271–281. doi:10.1016/0167-9473(90)90109-U

Examples

if (FALSE) { # \dontrun{
# After processing a data cube with b3gbi::process_cube()

# Function to calculate statistic of interest
# Mean observations per year
mean_obs <- function(data) {
  out_df <- aggregate(obs ~ year, data, mean) # Calculate mean obs per year
  names(out_df) <- c("year", "diversity_val") # Rename columns
  return(out_df)
}
mean_obs(processed_cube$data)

# Perform bootstrapping
bootstrap_mean_obs <- bootstrap_cube(
  data_cube = processed_cube$data,
  fun = mean_obs,
  grouping_var = "year",
  samples = 1000,
  seed = 123,
  progress = FALSE
)
head(bootstrap_mean_obs)

# Calculate acceleration
acceleration_df <- calculate_acceleration(
  data_cube = processed_cube$data,
  fun = mean_obs,
  grouping_var = "year",
  progress = FALSE
)
acceleration_df
} # }