Based on the observation that frequentist confidence intervals and Bayesian credible intervals sometimes happen to have the same numerical boundaries (under very specific conditions), Albers et al. (2018) suggested that confidence intervals can be interpreted as credible intervals.

The main line of reasoning of Albers et al. (2018) is that, because the two kinds of intervals sometimes coincide numerically, one may be used and interpreted in place of the other.

While we agree with their main observation (i.e., that confidence intervals and credible intervals obtained with uninformative priors might sometimes coincide), we disagree with their main conclusion (i.e., that confidence intervals can be interpreted as credible intervals). We think the examples presented in Albers et al. (2018) are special cases from which no such general interpretation rule should be derived.
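The numerical coincidence in question is easy to reproduce. The following sketch (our own illustration, not code from Albers et al.; the data and parameter values are made up) computes a 95% confidence interval for a normal mean with known standard deviation, and the central 95% credible interval under a flat prior, for which the posterior is Normal with the same centre and scale, so the two intervals have identical boundaries:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sigma, n = 2.0, 50
x = rng.normal(loc=1.0, scale=sigma, size=n)
xbar, se = x.mean(), sigma / np.sqrt(n)

# Frequentist 95% confidence interval for the mean (sigma known)
z = stats.norm.ppf(0.975)
ci = (xbar - z * se, xbar + z * se)

# Bayesian 95% credible interval under a flat (improper uniform) prior:
# the posterior of the mean is Normal(xbar, se^2), so the central
# interval coincides with the confidence interval
cri = stats.norm.interval(0.95, loc=xbar, scale=se)

print(ci, cri)  # numerically identical boundaries
```

The coincidence holds only in this conjugate, flat-prior setting; as argued below, it breaks down as soon as the prior carries information or the sampling model is less well-behaved.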

The debate between the frequentist and the Bayesian schools of inference has been raging for many decades and we do not wish to reiterate all the arguments here (we refer the interested reader to the introduction of

In other words, the posterior probability of some parameter (or vector of parameters)

This highlights a first undesirable consequence of Albers et al.'s (2018) suggestion.

Further, there are examples where numerically equivalent intervals do not necessarily reflect the most probable parameter values (given all available information), but could still have valid frequentist properties. Indeed, while both Bayesian and frequentist intervals could have nominal coverage probabilities (i.e., contain the population value in the advertised long-run proportion of repeated samples), this property alone does not guarantee that a given interval contains the most probable parameter values.

In Figure

Coverage properties of Bayesian credible intervals when using weakly informative priors. Blue vertical credible intervals represent intervals that “missed” the population value of the parameter (whose value is represented by the horizontal dashed line), while grey intervals represent intervals that contained the population value. Note: For readability, only the first 100 simulations are plotted.
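The kind of coverage simulation summarised in the figure can be sketched as follows. This is a simplified stand-in using a conjugate normal model with a known residual standard deviation and a weakly informative prior on the mean; the parameter values are hypothetical and the actual simulations reported in this paper may differ:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu_true, sigma, n = 1.0, 1.0, 20
prior_mean, prior_sd = 0.0, 5.0  # weakly informative prior on the mean
n_sims, covered = 2000, 0

for _ in range(n_sims):
    x = rng.normal(mu_true, sigma, size=n)
    # Conjugate normal-normal update (sigma assumed known)
    post_prec = n / sigma**2 + 1 / prior_sd**2
    post_mean = (x.mean() * n / sigma**2 + prior_mean / prior_sd**2) / post_prec
    post_sd = np.sqrt(1 / post_prec)
    # Central 95% credible interval from the analytic posterior
    lo, hi = stats.norm.interval(0.95, loc=post_mean, scale=post_sd)
    covered += lo <= mu_true <= hi

print(covered / n_sims)  # empirical coverage, close to 0.95 in this setting
```

In this benign setting the weakly informative prior barely shrinks the estimate, so the empirical coverage stays near the nominal level; the interesting cases are those where prior information or model structure pulls the two notions of an interval apart.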

Bayesian credible intervals with non-informative or weakly informative priors may have the same frequentist characteristics as confidence intervals, but also allow for conditional probability statements (e.g., given the prior and the information contained in the data, we can say that there is an X% probability that the population value of the parameter lies within the interval).

In this section, we report simulation results on the coverage properties of both confidence and credible intervals around the amount of heterogeneity in random-effects meta-analysis models.

The effect sizes to be combined in meta-analyses are often found to be more variable than would be expected from sampling error alone. The usual way to take this heterogeneity into account is to use random-effects models (also known as multilevel models). Several methods have been proposed to obtain confidence intervals around the point estimate of this heterogeneity.
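As a rough illustration of how such simulated meta-analytic data and heterogeneity estimates are obtained (hypothetical parameter values; the DerSimonian-Laird estimator is used here as one of the several proposed methods, not necessarily the one underlying our simulations):

```python
import numpy as np

rng = np.random.default_rng(7)
k, mu, tau = 20, 0.3, 0.2  # number of studies, mean effect, heterogeneity SD

# Simulate one meta-analytic data set: study-specific true effects vary by tau
theta = rng.normal(mu, tau, size=k)      # true effect of each study
v = rng.uniform(0.01, 0.05, size=k)      # within-study sampling variances
y = rng.normal(theta, np.sqrt(v))        # observed effect sizes

# DerSimonian-Laird moment estimate of the heterogeneity variance tau^2
w = 1 / v
ybar = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - ybar) ** 2)          # Cochran's Q statistic
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2_hat = max(0.0, (Q - (k - 1)) / c)   # truncated at zero

print(tau2_hat)  # point estimate of tau^2 for this simulated data set
```

Repeating this simulation many times and wrapping the point estimate in a confidence or credible interval is what allows the coverage comparison reported below.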

As shown in Figure

Coverage properties of 95% confidence intervals and 95% credible intervals for recovering the amount of heterogeneity in random-effects meta-analysis models. Note: For readability, only the first 100 simulations are plotted.

Thus, even when using non-informative priors (we used

Albers et al. (2018) postulate that differences between confidence intervals and credible intervals are only observable in a restricted set of situations.

Contrary to what the authors postulate, differences between confidence intervals and credible intervals are observable in a large variety of situations (actually, all but one). For instance (but non-exhaustively): i) when samples are small, ii) when the space of the outcome is multi-modal or non-continuous, iii) when the range of the outcome is restricted, or iv) when the prior is at least weakly informative. Combining these four possibilities, we argue that confidence intervals and credible intervals actually almost never give similar results. Moreover, as we previously demonstrated, even when the numerical estimates are similar, it does not follow that the conclusions we can draw from them (i.e., the inferences being made) should be similar.

In the previous sections, we discussed why we think the heuristic suggested by Albers et al. (2018) is misleading.

Albers et al. (

Confidence intervals can sometimes (i.e., under specific conditions) be identified with a special case of credible intervals for which priors are non-informative. Thus, one could ask, in light of the parsimony principle: why report redundant statistics? Would it not be easier to use the more general and flexible tool? The parsimonious stance that we adopt here leads to the conclusion that a researcher interested in one specific interpretation should report the statistic that corresponds to this goal.

Albers et al. (

We could not agree more with this statement. In addition, we recognise that both statistical traditions have their own advantages and drawbacks, and have been built to answer somewhat different questions. Therefore, pretending that a statistic produced by one school of inference can be interpreted as a statistic produced by another school, because they sometimes (under very restricted conditions) give the same numerical estimates, is confusing and misleading.

To sum up, we feel that every proposal that introduces more fuzziness into the distinction between different kinds of intervals is misleading and should be rejected. Using a confidence interval as a credible interval, or a credible interval as a confidence interval, seems inappropriate to us, as it tends to blur the distinction between essentially different statistical tools. Instead, we prefer to emphatically teach and discuss the differences between these tools and their domains of application. As Hoekstra, Morey, and Wagenmakers (

Given the limitations of the pragmatic perspective offered by Albers et al. (2018), we recommend instead that researchers report and interpret the kind of interval that matches their inferential goal.

Reproducible code and figures are available at:

Which is given by the intercept of the model, if no predictor is included, or if these predictors have been contrast-coded.

Although, as we discussed earlier, this probability statement, while valid, makes little sense knowing that it is conditional on all possible values being equally likely a priori.

Obviously, it is perfectly legitimate to be interested in several goals, but these goals should be clearly stated as such, and pursued using appropriate tools.

We thank Antonio Schettino and Ivan Grahek for helpful comments on a previous version of this manuscript, as well as the original authors and one anonymous reviewer for their suggestions during the review process.

The authors have no competing interests to declare.

LN wrote a first version of the manuscript and conducted the simulations for the regression example. DW wrote a part of the paper and conducted the simulations for the meta-analysis example. DW and PB critically commented on various versions of the manuscript. All authors contributed to the writing of the final manuscript.