
Calculate acceleration for a statistic in a dataframe
Source: R/calculate_acceleration.R
This function calculates acceleration values, which quantify the sensitivity of a statistic's variability to changes in the dataset. Acceleration is used for bias-corrected and accelerated (BCa) confidence intervals in calculate_bootstrap_ci().
Usage
calculate_acceleration(
data_cube,
fun,
...,
grouping_var,
ref_group = NA,
influence_method = "usual",
progress = FALSE
)
Arguments
- data_cube
A data cube object (class 'processed_cube' or 'sim_cube', see b3gbi::process_cube()) or a dataframe (the $data slot of a 'processed_cube' or 'sim_cube'). As used by bootstrap_cube(). To limit runtime, we recommend using a dataframe with a custom function as fun.
- fun
A function which, when applied to data_cube, returns the statistic(s) of interest. This function must return a dataframe with a column diversity_val containing the statistic of interest. As used by bootstrap_cube().
- ...
Additional arguments passed on to fun.
- grouping_var
A character vector specifying the grouping variable(s) for the bootstrap analysis. The function fun(data_cube, ...) should return one row per group. The specified variables must not be redundant, meaning they should not contain the same information (e.g., "time_point" (1, 2, 3) and "year" (2000, 2001, 2002) should not be used together if "time_point" is just an alternative encoding of "year"). This variable is used to split the dataset into groups for separate acceleration calculations.
- ref_group
A string indicating the reference group to compare the statistic with. Default is NA, meaning no reference group is used. As used by bootstrap_cube().
- influence_method
A string specifying the method used for calculating the influence values.
"usual": Negative jackknife (default if BCa is selected).
"pos": Positive jackknife.
- progress
Logical. Whether to show a progress bar for jackknifing. Set to TRUE to display a progress bar, FALSE (default) to suppress it.
Details
Acceleration quantifies how sensitive the variability of a statistic \(\theta\) is to changes in the data.
\(a = 0\): The statistic's variability does not depend on the data (e.g., symmetric distribution).
\(a > 0\): Small changes in the data have a large effect on the statistic's variability (e.g., positive skew).
\(a < 0\): Small changes in the data have a smaller effect on the statistic's variability (e.g., negative skew).
Acceleration is used to calculate BCa confidence intervals, which adjust for bias and skewness in bootstrapped distributions (Davison & Hinkley, 1997, Chapter 5). See also the empinf() function of the boot package in R (Canty & Ripley, 1999). The acceleration is calculated as follows:
$$\hat{a} = \frac{1}{6} \frac{\sum_{i = 1}^{n}(I_i^3)}{\left( \sum_{i = 1}^{n}(I_i^2) \right)^{3/2}}$$
where \(I_i\) denotes the influence of data point \(x_i\) on the estimation of \(\theta\). \(I_i\) can be estimated using jackknifing. Examples are (1) the negative jackknife: \(I_i = (n-1)(\hat{\theta} - \hat{\theta}_{-i})\), and (2) the positive jackknife: \(I_i = (n+1)(\hat{\theta}_{-i} - \hat{\theta})\) (Frangos & Schucany, 1990). Here, \(\hat{\theta}_{-i}\) is the estimate obtained when leaving out the \(i\)-th data point \(x_i\). The boot package also offers the infinitesimal jackknife and regression estimation of influence values. Implementation of these methods can be explored in the future.
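As a minimal sketch of the formula above with the negative jackknife (not the package's implementation), the hypothetical helper jackknife_acceleration() below takes a numeric vector x and a statistic stat that returns a single number. For the mean, the negative-jackknife influence values reduce to \(I_i = x_i - \bar{x}\), so symmetric data gives \(\hat{a} = 0\) and right-skewed data gives \(\hat{a} > 0\).

```r
# Hypothetical helper (not part of the package): acceleration from
# negative-jackknife influence values for a statistic of a numeric vector.
jackknife_acceleration <- function(x, stat = mean) {
  n <- length(x)
  theta_hat <- stat(x)
  # Leave-one-out estimates theta_hat_{-i}
  theta_minus <- vapply(seq_len(n), function(i) stat(x[-i]), numeric(1))
  # Negative jackknife: I_i = (n - 1) * (theta_hat - theta_hat_{-i})
  infl <- (n - 1) * (theta_hat - theta_minus)
  # a_hat = (1/6) * sum(I^3) / (sum(I^2))^(3/2)
  sum(infl^3) / (6 * sum(infl^2)^1.5)
}

jackknife_acceleration(c(1, 2, 3))  # symmetric data
jackknife_acceleration(c(0, 0, 3))  # right-skewed data
```

The first call returns 0 and the second returns 1/6^1.5 (about 0.068). If all influence values are zero (e.g., constant data), the ratio is 0/0 and R returns NaN.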
If a reference group is used, jackknifing is implemented differently. Consider \(\hat{\theta} = \hat{\theta}_1 - \hat{\theta}_2\), where \(\hat{\theta}_1\) is the estimated indicator value for a non-reference period (sample size \(n_1\)) and \(\hat{\theta}_2\) is the estimated indicator value for the reference period (sample size \(n_2\)). The acceleration is then calculated as follows:
$$\hat{a} = \frac{1}{6} \frac{\sum_{i = 1}^{n_1 + n_2}(I_i^3)}{\left( \sum_{i = 1}^{n_1 + n_2}(I_i^2) \right)^{3/2}}$$
Here, \(I_i\) can be calculated using the negative or positive jackknife, with
\(\hat{\theta}_{-i} = \hat{\theta}_{1,-i} - \hat{\theta}_2 \text{ for } i = 1, \ldots, n_1\), and
\(\hat{\theta}_{-i} = \hat{\theta}_{1} - \hat{\theta}_{2,-i} \text{ for } i = n_1 + 1, \ldots, n_1 + n_2\).
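The reference-group case can be sketched in the same way, again assuming a statistic of a numeric vector and the negative jackknife. jackknife_acceleration_ref() is a hypothetical helper; x1 holds the non-reference data and x2 the reference data. The text above does not state which n enters the (n - 1) factor here; this sketch assumes one common factor, \(n_1 + n_2 - 1\), for every \(I_i\). Any common constant cancels in the ratio, so its exact value does not change the result.

```r
# Hypothetical helper (not part of the package): acceleration for
# theta_hat = theta_hat_1 - theta_hat_2 with the negative jackknife.
# Assumption: one common factor (n1 + n2 - 1) for all I_i.
jackknife_acceleration_ref <- function(x1, x2, stat = mean) {
  n1 <- length(x1)
  n2 <- length(x2)
  theta_hat <- stat(x1) - stat(x2)
  # i = 1, ..., n1: leave out one point of the non-reference group
  theta_minus_1 <- vapply(seq_len(n1),
                          function(i) stat(x1[-i]) - stat(x2), numeric(1))
  # i = n1 + 1, ..., n1 + n2: leave out one point of the reference group
  theta_minus_2 <- vapply(seq_len(n2),
                          function(i) stat(x1) - stat(x2[-i]), numeric(1))
  infl <- (n1 + n2 - 1) * (theta_hat - c(theta_minus_1, theta_minus_2))
  # a_hat = (1/6) * sum(I^3) / (sum(I^2))^(3/2) over all n1 + n2 points
  sum(infl^3) / (6 * sum(infl^2)^1.5)
}

jackknife_acceleration_ref(c(0, 0, 3), c(1, 2, 3))
```

Because \(\hat{\theta} = \hat{\theta}_1 - \hat{\theta}_2\), skewness in the reference group enters with the opposite sign: swapping the two vectors in the call above turns a positive acceleration into a negative one.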
References
Canty, A., & Ripley, B. (1999). boot: Bootstrap Functions (Originally by Angelo Canty for S) [Computer software]. https://CRAN.R-project.org/package=boot
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application (1st ed.). Cambridge University Press. doi:10.1017/CBO9780511802843
Frangos, C. C., & Schucany, W. R. (1990). Jackknife estimation of the bootstrap acceleration constant. Computational Statistics & Data Analysis, 9(3), 271–281. doi:10.1016/0167-9473(90)90109-U
See also
Other indicator_uncertainty: add_effect_classification(), bootstrap_cube(), calculate_bootstrap_ci()
Examples
# Get example data
# install.packages("b3gbi", repos = "https://b-cubed-eu.r-universe.dev")
library(b3gbi)
cube_path <- system.file(
"extdata", "denmark_mammals_cube_eqdgc.csv",
package = "b3gbi")
denmark_cube <- process_cube(
cube_path,
first_year = 2014,
last_year = 2020)
# Function to calculate statistic of interest
# Mean observations per year
mean_obs <- function(data) {
out_df <- aggregate(obs ~ year, data, mean) # Calculate mean obs per year
names(out_df) <- c("year", "diversity_val") # Rename columns
return(out_df)
}
mean_obs(denmark_cube$data)
#> year diversity_val
#> 1 2014 11.553740
#> 2 2015 11.532206
#> 3 2016 5.532491
#> 4 2017 5.703888
#> 5 2018 5.598413
#> 6 2019 4.802676
#> 7 2020 4.972163
# Perform bootstrapping
# \donttest{
bootstrap_mean_obs <- bootstrap_cube(
data_cube = denmark_cube$data,
fun = mean_obs,
grouping_var = "year",
samples = 1000,
seed = 123,
progress = FALSE)
head(bootstrap_mean_obs)
#> sample year est_original rep_boot est_boot se_boot bias_boot
#> 1 1 2014 11.55374 11.08362 11.55352 1.484546 -0.0002192649
#> 2 2 2014 11.55374 12.50984 11.55352 1.484546 -0.0002192649
#> 3 3 2014 11.55374 11.00085 11.55352 1.484546 -0.0002192649
#> 4 4 2014 11.55374 11.62364 11.55352 1.484546 -0.0002192649
#> 5 5 2014 11.55374 13.37887 11.55352 1.484546 -0.0002192649
#> 6 6 2014 11.55374 11.70199 11.55352 1.484546 -0.0002192649
# Calculate acceleration
acceleration_df <- calculate_acceleration(
data_cube = denmark_cube$data,
fun = mean_obs,
grouping_var = "year",
progress = FALSE)
acceleration_df
#> year acceleration
#> 1 2014 0.06931908
#> 2 2015 0.03320883
#> 3 2016 0.05700415
#> 4 2017 0.06025669
#> 5 2018 0.09607958
#> 6 2019 0.10540064
#> 7 2020 0.10579425
# }
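To illustrate how grouping_var splits the data into separate acceleration calculations without installing b3gbi, the per-group logic can be sketched in base R on made-up data. acceleration_by_group() is a hypothetical helper and not the package's implementation; in particular, it assumes that rows are left out one at a time within each group, which is an assumption rather than documented behaviour.

```r
# Hypothetical helper (not part of the package): per-group acceleration
# using the negative jackknife over the rows of each group.
acceleration_by_group <- function(df, fun, grouping_var) {
  groups <- split(df, df[[grouping_var]])
  acc <- vapply(groups, function(g) {
    n <- nrow(g)
    theta_hat <- fun(g)$diversity_val
    # Leave-one-row-out estimates within the group
    theta_minus <- vapply(
      seq_len(n),
      function(i) fun(g[-i, , drop = FALSE])$diversity_val,
      numeric(1)
    )
    infl <- (n - 1) * (theta_hat - theta_minus)
    sum(infl^3) / (6 * sum(infl^2)^1.5)
  }, numeric(1))
  out <- data.frame(names(groups), unname(acc))
  names(out) <- c(grouping_var, "acceleration")
  out
}

# Same statistic as in the example above: mean observations per year
mean_obs <- function(data) {
  out_df <- aggregate(obs ~ year, data, mean)
  names(out_df) <- c("year", "diversity_val")
  out_df
}

# Toy data, made up for illustration
toy <- data.frame(
  year = c(2014, 2014, 2014, 2015, 2015, 2015),
  obs  = c(1, 2, 3, 0, 0, 3)
)
acceleration_by_group(toy, mean_obs, "year")
```

The symmetric 2014 observations give an acceleration of 0, while the right-skewed 2015 observations give a positive value (1/6^1.5, about 0.068).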