Skip to contents

This function calculates confidence intervals for a dataframe containing bootstrap replicates based on different methods, including percentile (perc), bias-corrected and accelerated (bca), normal (norm), and basic (basic).

Usage

calculate_bootstrap_ci(
  bootstrap_samples_df,
  grouping_var,
  type = c("perc", "bca", "norm", "basic"),
  conf = 0.95,
  aggregate = TRUE,
  data_cube = NA,
  fun = NA,
  ...,
  ref_group = NA,
  jackknife = ifelse(is.element("bca", type), "usual", NA),
  progress = FALSE
)

Arguments

bootstrap_samples_df

A dataframe containing the bootstrap replicates, where each row represents a bootstrap sample. As returned by bootstrap_cube(). Apart from the grouping_var column, the following columns should be present:

  • est_original: The statistic based on the full dataset per group

  • rep_boot: The statistic based on a bootstrapped dataset (bootstrap replicate)

grouping_var

A string specifying the grouping variable(s) used for the bootstrap analysis. This variable is used to split the dataset into groups for separate confidence interval calculations.

type

A character vector specifying the type(s) of confidence intervals to compute. Options include:

  • "perc": Percentile interval

  • "bca": Bias-corrected and accelerated interval

  • "norm": Normal interval

  • "basic": Basic interval

  • "all": Compute all available interval types (default)

conf

A numeric value specifying the confidence level of the intervals. Default is 0.95 (95 % confidence level).

aggregate

Logical. If TRUE (default), the function returns distinct confidence limits per group. If FALSE, the confidence limits are added to the original bootstrap dataframe bootstrap_samples_df.

data_cube

Only used when type = "bca". A data cube object (class 'processed_cube' or 'sim_cube', see b3gbi::process_cube()) or a dataframe (from $data slot of 'processed_cube' or 'sim_cube'). As used by bootstrap_cube(). To limit runtime, we recommend using a dataframe with custom function as fun.

fun

Only used when type = "bca". A function which, when applied to data_cube returns the statistic(s) of interest. This function must return a dataframe with a column diversity_val containing the statistic of interest. As used by bootstrap_cube().

...

Additional arguments passed on to fun.

ref_group

Only used when type = "bca". A string indicating the reference group to compare the statistic with. Default is NA, meaning no reference group is used. As used by bootstrap_cube().

jackknife

Only used when type = "bca". A string specifying the jackknife resampling method for BCa intervals.

  • "usual": Negative jackknife (default if BCa is selected).

  • "pos": Positive jackknife

progress

Logical. Whether to show a progress bar for jackknifing. Set to TRUE to display a progress bar, FALSE (default) to suppress it.

Value

A dataframe containing the bootstrap results with the following columns:

  • est_original: The statistic based on the full dataset per group

  • rep_boo

  • est_boot: The bootstrap estimate (mean of bootstrap replicates per group)

  • se_boot: The standard error of the bootstrap estimate (standard deviation of the bootstrap replicates per group)

  • bias_boot: The bias of the bootstrap estimate per group

  • int_type: The interval type

  • ll: The lower limit of the confidence interval

  • ul: The upper limit of the confidence interval

  • conf: The confidence level of the interval When aggregate = FALSE, the dataframe contains the columns from bootstrap_samples_df with one row per bootstrap replicate.

Details

We consider four different types of intervals (with confidence level \(\alpha\)). The choice for confidence interval types and their calculation is in line with the boot package in R (Canty & Ripley, 1999) to ensure ease of implementation. They are based on the definitions provided by Davison & Hinkley (1997, Chapter 5) (see also DiCiccio & Efron, 1996; Efron, 1987).

  1. Percentile: Uses the percentiles of the bootstrap distribution.

    $$CI_{perc} = \left[ \hat{\theta}^*_{(\alpha/2)}, \hat{\theta}^*_{(1-\alpha/2)} \right]$$

    where \(\hat{\theta}^*_{(\alpha/2)}\) and \(\hat{\theta}^*_{(1-\alpha/2)}\) are the \(\alpha/2\) and \(1-\alpha/2\) percentiles of the bootstrap distribution, respectively.

  2. Bias-Corrected and Accelerated (BCa): Adjusts for bias and acceleration

    Bias refers to the systematic difference between the observed statistic from the original dataset and the center of the bootstrap distribution of the statistic. The bias correction term is calculated as follows:

    $$\hat{z}_0 = \Phi^{-1}\left(\frac{\#(\hat{\theta}^*_b < \hat{\theta})}{B}\right)$$

    where \(\#\) is the counting operator and \(\Phi^{-1}\) the inverse cumulative density function of the standard normal distribution.

    Acceleration quantifies how sensitive the variability of the statistic is to changes in the data.

    • \(a=0\): The statistic's variability does not depend on the data (e.g., symmetric distribution)

    • \(a>0\): Small changes in the data have a large effect on the statistic's variability (e.g., positive skew)

    • \(a<0\): Small changes in the data have a smaller effect on the statistic's variability (e.g., negative skew).

    The acceleration term is calculated as follows:

    $$\hat{a} = \frac{1}{6} \frac{\sum_{i = 1}^{n}(I_i^3)}{\left( \sum_{i = 1}^{n}(I_i^2) \right)^{3/2}}$$

    where \(I_i\) denotes the influence of data point \(x_i\) on the estimation of \(\theta\). \(I_i\) can be estimated using jackknifing. Examples are (1) the negative jackknife: \(I_i = (n-1)(\hat{\theta} - \hat{\theta}_{-i})\), and (2) the positive jackknife \(I_i = (n+1)(\hat{\theta}_{-i} - \hat{\theta})\) (Frangos & Schucany, 1990). Here, \(\hat{\theta}_{-i}\) is the estimated value leaving out the \(i\)’th data point \(x_i\). The boot package also offers infinitesimal jackknife and regression estimation. Implementation of these jackknife algorithms can be explored in the future.

    The bias and acceleration estimates are then used to calculate adjusted percentiles.

    \(\alpha_1 = \Phi\left( \hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})} \right)\), \(\alpha_2 = \Phi\left( \hat{z}_0 + \frac{\hat{z}_0 + z_{1 - \alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1 - \alpha/2})} \right)\)

    So, we get

    $$CI_{bca} = \left[ \hat{\theta}^*_{(\alpha_1)}, \hat{\theta}^*_{(\alpha_2)} \right]$$

  3. Normal: Assumes the bootstrap distribution of the statistic is approximately normal

    $$CI_{norm} = \left[\hat{\theta} - \text{Bias}_{\text{boot}} - \text{SE}_{\text{boot}} \times z_{1-\alpha/2}, \hat{\theta} - \text{Bias}_{\text{boot}} + \text{SE}_{\text{boot}} \times z_{1-\alpha/2} \right]$$

    where \(z_{1-\alpha/2}\) is the \(1-\alpha/2\) quantile of the standard normal distribution.

  4. Basic: Centers the interval using percentiles

    $$CI_{basic} = \left[ 2\hat{\theta} - \hat{\theta}^*_{(1-\alpha/2)}, 2\hat{\theta} - \hat{\theta}^*_{(\alpha/2)} \right]$$

    where \(\hat{\theta}^*_{(\alpha/2)}\) and \(\hat{\theta}^*_{(1-\alpha/2)}\) are the \(\alpha/2\) and \(1-\alpha/2\) percentiles of the bootstrap distribution, respectively.

References

Canty, A., & Ripley, B. (1999). boot: Bootstrap Functions (Originally by Angelo Canty for S) [Computer software]. https://CRAN.R-project.org/package=boot

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application (1st ed.). Cambridge University Press. doi:10.1017/CBO9780511802843

DiCiccio, T. J., & Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, 11(3). doi:10.1214/ss/1032280214

Efron, B. (1987). Better Bootstrap Confidence Intervals. Journal of the American Statistical Association, 82(397), 171–185. doi:10.1080/01621459.1987.10478410

Efron, B., & Tibshirani, R. J. (1994). An Introduction to the Bootstrap (1st ed.). Chapman and Hall/CRC. doi:10.1201/9780429246593

Frangos, C. C., & Schucany, W. R. (1990). Jackknife estimation of the bootstrap acceleration constant. Computational Statistics & Data Analysis, 9(3), 271–281. doi:10.1016/0167-9473(90)90109-U

See also

Examples

# Get example data
# install.packages("remotes")
# remotes::install_github("b-cubed-eu/b3gbi")
library(b3gbi)
cube_path <- system.file(
  "extdata", "denmark_mammals_cube_eqdgc.csv",
  package = "b3gbi")
denmark_cube <- process_cube(
  cube_path,
  first_year = 2014,
  last_year = 2020)

# Function to calculate statistic of interest
# Mean observations per year
mean_obs <- function(data) {
  out_df <- aggregate(obs ~ year, data, mean) # Calculate mean obs per year
  names(out_df) <- c("year", "diversity_val") # Rename columns
  return(out_df)
}
mean_obs(denmark_cube$data)
#>   year diversity_val
#> 1 2014     11.553740
#> 2 2015     11.532206
#> 3 2016      5.532491
#> 4 2017      5.703888
#> 5 2018      5.598413
#> 6 2019      4.802676
#> 7 2020      4.972163

# Perform bootstrapping
# \donttest{
bootstrap_mean_obs <- bootstrap_cube(
  data_cube = denmark_cube$data,
  fun = mean_obs,
  grouping_var = "year",
  samples = 1000,
  seed = 123,
  progress = FALSE)
head(bootstrap_mean_obs)
#>   sample year est_original rep_boot est_boot  se_boot     bias_boot
#> 1      1 2014     11.55374 11.08362 11.55352 1.484546 -0.0002192649
#> 2      2 2014     11.55374 12.50984 11.55352 1.484546 -0.0002192649
#> 3      3 2014     11.55374 11.00085 11.55352 1.484546 -0.0002192649
#> 4      4 2014     11.55374 11.62364 11.55352 1.484546 -0.0002192649
#> 5      5 2014     11.55374 13.37887 11.55352 1.484546 -0.0002192649
#> 6      6 2014     11.55374 11.70199 11.55352 1.484546 -0.0002192649

# Calculate confidence limits
# Percentile interval
ci_mean_obs1 <- calculate_bootstrap_ci(
  bootstrap_samples_df = bootstrap_mean_obs,
  grouping_var = "year",
  type = "perc",
  conf = 0.95,
  aggregate = TRUE)
ci_mean_obs1
#>   year est_original  est_boot   se_boot     bias_boot int_type        ll
#> 1 2014    11.553740 11.553521 1.4845460 -0.0002192649     perc  9.049016
#> 2 2015    11.532206 11.552332 0.8048513  0.0201261450     perc 10.021277
#> 3 2016     5.532491  5.533230 0.4938168  0.0007393573     perc  4.669296
#> 4 2017     5.703888  5.697402 0.4791600 -0.0064853139     perc  4.850461
#> 5 2018     5.598413  5.555698 0.6210297 -0.0427145143     perc  4.530512
#> 6 2019     4.802676  4.800357 0.5382311 -0.0023182359     perc  3.897418
#> 7 2020     4.972163  4.964706 0.4742473 -0.0074570833     perc  4.199119
#>          ul conf
#> 1 14.803713 0.95
#> 2 13.171339 0.95
#> 3  6.641566 0.95
#> 4  6.770726 0.95
#> 5  6.927914 0.95
#> 6  5.994246 0.95
#> 7  6.022978 0.95

# All intervals
ci_mean_obs2 <- calculate_bootstrap_ci(
  bootstrap_samples_df = bootstrap_mean_obs,
  grouping_var = "year",
  type = c("perc", "bca", "norm", "basic"),
  conf = 0.95,
  aggregate = TRUE,
  data_cube = denmark_cube$data, # Required for BCa
  fun = mean_obs,                # Required for BCa
  progress = FALSE)
ci_mean_obs2
#>    year est_original  est_boot   se_boot     bias_boot int_type        ll
#> 1  2014    11.553740 11.553521 1.4845460 -0.0002192649     perc  9.049016
#> 2  2015    11.532206 11.552332 0.8048513  0.0201261450     perc 10.021277
#> 3  2016     5.532491  5.533230 0.4938168  0.0007393573     perc  4.669296
#> 4  2017     5.703888  5.697402 0.4791600 -0.0064853139     perc  4.850461
#> 5  2018     5.598413  5.555698 0.6210297 -0.0427145143     perc  4.530512
#> 6  2019     4.802676  4.800357 0.5382311 -0.0023182359     perc  3.897418
#> 7  2020     4.972163  4.964706 0.4742473 -0.0074570833     perc  4.199119
#> 8  2014    11.553740 11.553521 1.4845460 -0.0002192649      bca  9.445556
#> 9  2015    11.532206 11.552332 0.8048513  0.0201261450      bca 10.067004
#> 10 2016     5.532491  5.533230 0.4938168  0.0007393573      bca  4.793565
#> 11 2017     5.703888  5.697402 0.4791600 -0.0064853139      bca  4.963608
#> 12 2018     5.598413  5.555698 0.6210297 -0.0427145143      bca  4.750383
#> 13 2019     4.802676  4.800357 0.5382311 -0.0023182359      bca  4.063479
#> 14 2020     4.972163  4.964706 0.4742473 -0.0074570833      bca  4.356893
#> 15 2014    11.553740 11.553521 1.4845460 -0.0002192649     norm  8.644303
#> 16 2015    11.532206 11.552332 0.8048513  0.0201261450     norm  9.934600
#> 17 2016     5.532491  5.533230 0.4938168  0.0007393573     norm  4.563889
#> 18 2017     5.703888  5.697402 0.4791600 -0.0064853139     norm  4.771236
#> 19 2018     5.598413  5.555698 0.6210297 -0.0427145143     norm  4.423931
#> 20 2019     4.802676  4.800357 0.5382311 -0.0023182359     norm  3.750080
#> 21 2020     4.972163  4.964706 0.4742473 -0.0074570833     norm  4.050112
#> 22 2014    11.553740 11.553521 1.4845460 -0.0002192649    basic  8.303768
#> 23 2015    11.532206 11.552332 0.8048513  0.0201261450    basic  9.893072
#> 24 2016     5.532491  5.533230 0.4938168  0.0007393573    basic  4.423416
#> 25 2017     5.703888  5.697402 0.4791600 -0.0064853139    basic  4.637049
#> 26 2018     5.598413  5.555698 0.6210297 -0.0427145143    basic  4.268912
#> 27 2019     4.802676  4.800357 0.5382311 -0.0023182359    basic  3.611105
#> 28 2020     4.972163  4.964706 0.4742473 -0.0074570833    basic  3.921348
#>           ul conf
#> 1  14.803713 0.95
#> 2  13.171339 0.95
#> 3   6.641566 0.95
#> 4   6.770726 0.95
#> 5   6.927914 0.95
#> 6   5.994246 0.95
#> 7   6.022978 0.95
#> 8  16.274751 0.95
#> 9  13.226708 0.95
#> 10  7.054737 0.95
#> 11  7.077113 0.95
#> 12  7.646333 0.95
#> 13  6.882690 0.95
#> 14  6.452541 0.95
#> 15 14.463616 0.95
#> 16 13.089559 0.95
#> 17  6.499615 0.95
#> 18  6.649509 0.95
#> 19  6.858323 0.95
#> 20  5.859907 0.95
#> 21  5.909127 0.95
#> 22 14.058465 0.95
#> 23 13.043135 0.95
#> 24  6.395686 0.95
#> 25  6.557314 0.95
#> 26  6.666313 0.95
#> 27  5.707933 0.95
#> 28  5.745206 0.95
# }