data:image/s3,"s3://crabby-images/9c2d4/9c2d4857c3b6b752727f16e5c58cb9dd1d537e9b" alt=""
Calculate confidence intervals for a dataframe with bootstrap replicates
Source:R/calculate_bootstrap_ci.R
calculate_bootstrap_ci.Rd
This function calculates confidence intervals for a dataframe containing
bootstrap replicates based on different methods, including percentile
(perc
), bias-corrected and accelerated (bca
), normal (norm
), and basic
(basic
).
Usage
calculate_bootstrap_ci(
bootstrap_samples_df,
grouping_var,
type = c("perc", "bca", "norm", "basic"),
conf = 0.95,
aggregate = TRUE,
data_cube = NA,
fun = NA,
...,
ref_group = NA,
jackknife = ifelse(is.element("bca", type), "usual", NA),
progress = FALSE
)
Arguments
- bootstrap_samples_df
A dataframe containing the bootstrap replicates, where each row represents a bootstrap sample. As returned by
bootstrap_cube()
. Apart from thegrouping_var
column, the following columns should be present:est_original
: The statistic based on the full dataset per grouprep_boot
: The statistic based on a bootstrapped dataset (bootstrap replicate)
- grouping_var
A string specifying the grouping variable(s) used for the bootstrap analysis. This variable is used to split the dataset into groups for separate confidence interval calculations.
- type
A character vector specifying the type(s) of confidence intervals to compute. Options include:
"perc"
: Percentile interval"bca"
: Bias-corrected and accelerated interval"norm"
: Normal interval"basic"
: Basic interval"all"
: Compute all available interval types (default)
- conf
A numeric value specifying the confidence level of the intervals. Default is
0.95
(95 % confidence level).- aggregate
Logical. If
TRUE
(default), the function returns distinct confidence limits per group. IfFALSE
, the confidence limits are added to the original bootstrap dataframebootstrap_samples_df
.- data_cube
Only used when
type = "bca"
. A data cube object (class 'processed_cube' or 'sim_cube', seeb3gbi::process_cube()
) or a dataframe (from$data
slot of 'processed_cube' or 'sim_cube'). As used bybootstrap_cube()
. To limit runtime, we recommend using a dataframe with custom function asfun
.- fun
Only used when
type = "bca"
. A function which, when applied todata_cube
returns the statistic(s) of interest. This function must return a dataframe with a columndiversity_val
containing the statistic of interest. As used bybootstrap_cube()
.- ...
Additional arguments passed on to
fun
.- ref_group
Only used when
type = "bca"
. A string indicating the reference group to compare the statistic with. Default isNA
, meaning no reference group is used. As used bybootstrap_cube()
.- jackknife
Only used when
type = "bca"
. A string specifying the jackknife resampling method for BCa intervals."usual"
: Negative jackknife (default if BCa is selected)."pos"
: Positive jackknife
- progress
Logical. Whether to show a progress bar for jackknifing. Set to
TRUE
to display a progress bar,FALSE
(default) to suppress it.
Value
A dataframe containing the bootstrap results with the following columns:
est_original
: The statistic based on the full dataset per grouprep_boo
est_boot
: The bootstrap estimate (mean of bootstrap replicates per group)se_boot
: The standard error of the bootstrap estimate (standard deviation of the bootstrap replicates per group)bias_boot
: The bias of the bootstrap estimate per groupint_type
: The interval typell
: The lower limit of the confidence intervalul
: The upper limit of the confidence intervalconf
: The confidence level of the interval Whenaggregate = FALSE
, the dataframe contains the columns frombootstrap_samples_df
with one row per bootstrap replicate.
Details
We consider four different types of intervals (with confidence level \(\alpha\)). The choice for confidence interval types and their calculation is in line with the boot package in R (Canty & Ripley, 1999) to ensure ease of implementation. They are based on the definitions provided by Davison & Hinkley (1997, Chapter 5) (see also DiCiccio & Efron, 1996; Efron, 1987).
Percentile: Uses the percentiles of the bootstrap distribution.
$$CI_{perc} = \left[ \hat{\theta}^*_{(\alpha/2)}, \hat{\theta}^*_{(1-\alpha/2)} \right]$$
where \(\hat{\theta}^*_{(\alpha/2)}\) and \(\hat{\theta}^*_{(1-\alpha/2)}\) are the \(\alpha/2\) and \(1-\alpha/2\) percentiles of the bootstrap distribution, respectively.
Bias-Corrected and Accelerated (BCa): Adjusts for bias and acceleration
Bias refers to the systematic difference between the observed statistic from the original dataset and the center of the bootstrap distribution of the statistic. The bias correction term is calculated as follows:
$$\hat{z}_0 = \Phi^{-1}\left(\frac{\#(\hat{\theta}^*_b < \hat{\theta})}{B}\right)$$
where \(\#\) is the counting operator and \(\Phi^{-1}\) the inverse cumulative density function of the standard normal distribution.
Acceleration quantifies how sensitive the variability of the statistic is to changes in the data.
\(a=0\): The statistic's variability does not depend on the data (e.g., symmetric distribution)
\(a>0\): Small changes in the data have a large effect on the statistic's variability (e.g., positive skew)
\(a<0\): Small changes in the data have a smaller effect on the statistic's variability (e.g., negative skew).
The acceleration term is calculated as follows:
$$\hat{a} = \frac{1}{6} \frac{\sum_{i = 1}^{n}(I_i^3)}{\left( \sum_{i = 1}^{n}(I_i^2) \right)^{3/2}}$$
where \(I_i\) denotes the influence of data point \(x_i\) on the estimation of \(\theta\). \(I_i\) can be estimated using jackknifing. Examples are (1) the negative jackknife: \(I_i = (n-1)(\hat{\theta} - \hat{\theta}_{-i})\), and (2) the positive jackknife \(I_i = (n+1)(\hat{\theta}_{-i} - \hat{\theta})\) (Frangos & Schucany, 1990). Here, \(\hat{\theta}_{-i}\) is the estimated value leaving out the \(i\)’th data point \(x_i\). The boot package also offers infinitesimal jackknife and regression estimation. Implementation of these jackknife algorithms can be explored in the future.
The bias and acceleration estimates are then used to calculate adjusted percentiles.
\(\alpha_1 = \Phi\left( \hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})} \right)\), \(\alpha_2 = \Phi\left( \hat{z}_0 + \frac{\hat{z}_0 + z_{1 - \alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1 - \alpha/2})} \right)\)
So, we get
$$CI_{bca} = \left[ \hat{\theta}^*_{(\alpha_1)}, \hat{\theta}^*_{(\alpha_2)} \right]$$
Normal: Assumes the bootstrap distribution of the statistic is approximately normal
$$CI_{norm} = \left[\hat{\theta} - \text{Bias}_{\text{boot}} - \text{SE}_{\text{boot}} \times z_{1-\alpha/2}, \hat{\theta} - \text{Bias}_{\text{boot}} + \text{SE}_{\text{boot}} \times z_{1-\alpha/2} \right]$$
where \(z_{1-\alpha/2}\) is the \(1-\alpha/2\) quantile of the standard normal distribution.
Basic: Centers the interval using percentiles
$$CI_{basic} = \left[ 2\hat{\theta} - \hat{\theta}^*_{(1-\alpha/2)}, 2\hat{\theta} - \hat{\theta}^*_{(\alpha/2)} \right]$$
where \(\hat{\theta}^*_{(\alpha/2)}\) and \(\hat{\theta}^*_{(1-\alpha/2)}\) are the \(\alpha/2\) and \(1-\alpha/2\) percentiles of the bootstrap distribution, respectively.
References
Canty, A., & Ripley, B. (1999). boot: Bootstrap Functions (Originally by Angelo Canty for S) [Computer software]. https://CRAN.R-project.org/package=boot
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application (1st ed.). Cambridge University Press. doi:10.1017/CBO9780511802843
DiCiccio, T. J., & Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, 11(3). doi:10.1214/ss/1032280214
Efron, B. (1987). Better Bootstrap Confidence Intervals. Journal of the American Statistical Association, 82(397), 171–185. doi:10.1080/01621459.1987.10478410
Efron, B., & Tibshirani, R. J. (1994). An Introduction to the Bootstrap (1st ed.). Chapman and Hall/CRC. doi:10.1201/9780429246593
Frangos, C. C., & Schucany, W. R. (1990). Jackknife estimation of the bootstrap acceleration constant. Computational Statistics & Data Analysis, 9(3), 271–281. doi:10.1016/0167-9473(90)90109-U
See also
Other uncertainty:
add_effect_classification()
,
bootstrap_cube()
Examples
# Get example data
# install.packages("remotes")
# remotes::install_github("b-cubed-eu/b3gbi")
library(b3gbi)
cube_path <- system.file(
"extdata", "denmark_mammals_cube_eqdgc.csv",
package = "b3gbi")
denmark_cube <- process_cube(
cube_path,
first_year = 2014,
last_year = 2020)
# Function to calculate statistic of interest
# Mean observations per year
mean_obs <- function(data) {
out_df <- aggregate(obs ~ year, data, mean) # Calculate mean obs per year
names(out_df) <- c("year", "diversity_val") # Rename columns
return(out_df)
}
mean_obs(denmark_cube$data)
#> year diversity_val
#> 1 2014 11.553740
#> 2 2015 11.532206
#> 3 2016 5.532491
#> 4 2017 5.703888
#> 5 2018 5.598413
#> 6 2019 4.802676
#> 7 2020 4.972163
# Perform bootstrapping
# \donttest{
bootstrap_mean_obs <- bootstrap_cube(
data_cube = denmark_cube$data,
fun = mean_obs,
grouping_var = "year",
samples = 1000,
seed = 123,
progress = FALSE)
head(bootstrap_mean_obs)
#> sample year est_original rep_boot est_boot se_boot bias_boot
#> 1 1 2014 11.55374 11.08362 11.55352 1.484546 -0.0002192649
#> 2 2 2014 11.55374 12.50984 11.55352 1.484546 -0.0002192649
#> 3 3 2014 11.55374 11.00085 11.55352 1.484546 -0.0002192649
#> 4 4 2014 11.55374 11.62364 11.55352 1.484546 -0.0002192649
#> 5 5 2014 11.55374 13.37887 11.55352 1.484546 -0.0002192649
#> 6 6 2014 11.55374 11.70199 11.55352 1.484546 -0.0002192649
# Calculate confidence limits
# Percentile interval
ci_mean_obs1 <- calculate_bootstrap_ci(
bootstrap_samples_df = bootstrap_mean_obs,
grouping_var = "year",
type = "perc",
conf = 0.95,
aggregate = TRUE)
ci_mean_obs1
#> year est_original est_boot se_boot bias_boot int_type ll
#> 1 2014 11.553740 11.553521 1.4845460 -0.0002192649 perc 9.049016
#> 2 2015 11.532206 11.552332 0.8048513 0.0201261450 perc 10.021277
#> 3 2016 5.532491 5.533230 0.4938168 0.0007393573 perc 4.669296
#> 4 2017 5.703888 5.697402 0.4791600 -0.0064853139 perc 4.850461
#> 5 2018 5.598413 5.555698 0.6210297 -0.0427145143 perc 4.530512
#> 6 2019 4.802676 4.800357 0.5382311 -0.0023182359 perc 3.897418
#> 7 2020 4.972163 4.964706 0.4742473 -0.0074570833 perc 4.199119
#> ul conf
#> 1 14.803713 0.95
#> 2 13.171339 0.95
#> 3 6.641566 0.95
#> 4 6.770726 0.95
#> 5 6.927914 0.95
#> 6 5.994246 0.95
#> 7 6.022978 0.95
# All intervals
ci_mean_obs2 <- calculate_bootstrap_ci(
bootstrap_samples_df = bootstrap_mean_obs,
grouping_var = "year",
type = c("perc", "bca", "norm", "basic"),
conf = 0.95,
aggregate = TRUE,
data_cube = denmark_cube$data, # Required for BCa
fun = mean_obs, # Required for BCa
progress = FALSE)
ci_mean_obs2
#> year est_original est_boot se_boot bias_boot int_type ll
#> 1 2014 11.553740 11.553521 1.4845460 -0.0002192649 perc 9.049016
#> 2 2015 11.532206 11.552332 0.8048513 0.0201261450 perc 10.021277
#> 3 2016 5.532491 5.533230 0.4938168 0.0007393573 perc 4.669296
#> 4 2017 5.703888 5.697402 0.4791600 -0.0064853139 perc 4.850461
#> 5 2018 5.598413 5.555698 0.6210297 -0.0427145143 perc 4.530512
#> 6 2019 4.802676 4.800357 0.5382311 -0.0023182359 perc 3.897418
#> 7 2020 4.972163 4.964706 0.4742473 -0.0074570833 perc 4.199119
#> 8 2014 11.553740 11.553521 1.4845460 -0.0002192649 bca 9.445556
#> 9 2015 11.532206 11.552332 0.8048513 0.0201261450 bca 10.067004
#> 10 2016 5.532491 5.533230 0.4938168 0.0007393573 bca 4.793565
#> 11 2017 5.703888 5.697402 0.4791600 -0.0064853139 bca 4.963608
#> 12 2018 5.598413 5.555698 0.6210297 -0.0427145143 bca 4.750383
#> 13 2019 4.802676 4.800357 0.5382311 -0.0023182359 bca 4.063479
#> 14 2020 4.972163 4.964706 0.4742473 -0.0074570833 bca 4.356893
#> 15 2014 11.553740 11.553521 1.4845460 -0.0002192649 norm 8.644303
#> 16 2015 11.532206 11.552332 0.8048513 0.0201261450 norm 9.934600
#> 17 2016 5.532491 5.533230 0.4938168 0.0007393573 norm 4.563889
#> 18 2017 5.703888 5.697402 0.4791600 -0.0064853139 norm 4.771236
#> 19 2018 5.598413 5.555698 0.6210297 -0.0427145143 norm 4.423931
#> 20 2019 4.802676 4.800357 0.5382311 -0.0023182359 norm 3.750080
#> 21 2020 4.972163 4.964706 0.4742473 -0.0074570833 norm 4.050112
#> 22 2014 11.553740 11.553521 1.4845460 -0.0002192649 basic 8.303768
#> 23 2015 11.532206 11.552332 0.8048513 0.0201261450 basic 9.893072
#> 24 2016 5.532491 5.533230 0.4938168 0.0007393573 basic 4.423416
#> 25 2017 5.703888 5.697402 0.4791600 -0.0064853139 basic 4.637049
#> 26 2018 5.598413 5.555698 0.6210297 -0.0427145143 basic 4.268912
#> 27 2019 4.802676 4.800357 0.5382311 -0.0023182359 basic 3.611105
#> 28 2020 4.972163 4.964706 0.4742473 -0.0074570833 basic 3.921348
#> ul conf
#> 1 14.803713 0.95
#> 2 13.171339 0.95
#> 3 6.641566 0.95
#> 4 6.770726 0.95
#> 5 6.927914 0.95
#> 6 5.994246 0.95
#> 7 6.022978 0.95
#> 8 16.274751 0.95
#> 9 13.226708 0.95
#> 10 7.054737 0.95
#> 11 7.077113 0.95
#> 12 7.646333 0.95
#> 13 6.882690 0.95
#> 14 6.452541 0.95
#> 15 14.463616 0.95
#> 16 13.089559 0.95
#> 17 6.499615 0.95
#> 18 6.649509 0.95
#> 19 6.858323 0.95
#> 20 5.859907 0.95
#> 21 5.909127 0.95
#> 22 14.058465 0.95
#> 23 13.043135 0.95
#> 24 6.395686 0.95
#> 25 6.557314 0.95
#> 26 6.666313 0.95
#> 27 5.707933 0.95
#> 28 5.745206 0.95
# }