The debate between Bayesian and frequentist statisticians has been going on for decades. Whilst there are fundamental theoretical and philosophical differences between the two schools of thought, we argue that in the two most common situations the practical differences are negligible when off-the-shelf Bayesian analysis (i.e., using ‘objective’ priors) is used. We make this point by focusing on interval estimates: confidence intervals and credible intervals. We show that this is the case for the most common empirical situations in the social sciences: the estimation of a binomial proportion and the estimation of the mean of a unimodal distribution. Numerical differences between the two approaches are small, sometimes even smaller than those between two competing frequentist or two competing Bayesian approaches. We outline the ramifications of this for scientific practice.

The exchange of arguments between frequentist and Bayesian statisticians goes back many decades. Frequentists rely on the work of classical statisticians such as Fisher, Pearson and Neyman, and apply the lines of thought of these scholars in estimation and inference, most notably in their approach to null hypothesis significance testing (NHST) and the construction of confidence intervals. Bayesians, on the other hand, rely on Bayes’ theorem for conditional probability and update (subjective) a priori beliefs about the truth – formalized by a probability distribution – into a posteriori statements after observing data.

For many years, the Bayesian approach had two practical disadvantages: (i) many types of models needed a vast amount of computing time, e.g. for estimation through Markov Chain Monte Carlo methods (see, e.g.

The frequentist and Bayesian approaches have fundamental philosophical differences as to how to describe Nature in the form of probability statements. It is obviously important to discuss these differences and the consequences of the choices that both sides make, and this has been done extensively in the (mathematical) statistical literature (cf.

However, too often in our view, the debate is harsh, with Bayesians claiming that all frequentist methods are useless, or vice versa. This style of debating is not new. For instance, over four decades ago Lindley already stated that “the only good statistics is Bayesian statistics” (

At the core, frequentist and Bayesian approaches have the same goal: proper statistical inference. Philosophical differences in how best to conduct such inference seem less important than the merits of what both approaches have in common. As we will show in this paper, in practice the overlap in uncertainty intervals produced for parameter estimates by both schools is often very large.

Occasionally, the Bayesian and frequentist approach yield substantially different inferences. Usually this occurs when the sample size is very small (see Morey et al. (

Previous work has examined the relationship between the frequentist

We shall motivate our opinion on the basis of a series of typical examples from social research. The structure of the paper is as follows. In the next section, we discuss estimation of a population proportion and of a population mean in the form of interval estimates. In the section thereafter, we outline, through simulation techniques, the consequences of moving away from the ‘regular situation’ of normally distributed values around a group mean. We end with a discussion including practical recommendations.

Suppose the interest lies in estimating the proportion of a given population that holds a specific property. This is a very general research question, applicable to many areas: the proportion of diabetes patients that respond positively to a certain treatment method, the proportion of voters expected to vote for a certain political party, the proportion of students passing an exam, etc.

To express the statistical uncertainty about the population proportion, a point estimate alone is not sufficient and an estimate in the form of an interval is preferred. Frequentists call such an interval a confidence interval; Bayesians call it a credible interval. These two types of intervals are, from a theoretical/philosophical point of view, fundamentally different. From a practical point of view, however, both intervals share a common feature: the interval is preferred over the point estimate to express uncertainty. Suppose, then, that one estimates a population proportion p.

There are different frequentist and Bayesian approaches to generating such intervals, all based on a random sample of size n in which m units possess the property of interest.

When n is large, the standard asymptotic (Wald) interval is

p̂ ± z_{0.975} √(p̂(1 – p̂)/n),

where p̂ = m/n denotes the sample proportion and z_{0.975} ≈ 1.96 is the 97.5th percentile of the standard normal distribution.

This asymptotic approach can be improved upon through the so-called plus-four method, which adds two artificial successes and two artificial failures, i.e. replaces p̂ by p̃ = (m + 2)/(n + 4) and n by n + 4 in the Wald formula (
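As an illustrative sketch (not the code used in the paper; function names are ours), the asymptotic interval and its plus-four variant can be computed as follows:

```python
from scipy.stats import norm

def wald_interval(m, n, level=0.95):
    """Asymptotic (Wald) interval for a binomial proportion (approach F1)."""
    z = norm.ppf(1 - (1 - level) / 2)  # e.g. 1.96 for a 95% interval
    p_hat = m / n
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half

def plus_four_interval(m, n, level=0.95):
    """Plus-four correction: add two successes and two failures, then apply Wald."""
    return wald_interval(m + 2, n + 4, level)

print(wald_interval(40, 100))       # ≈ (0.304, 0.496)
print(plus_four_interval(40, 100))  # ≈ (0.310, 0.498)
```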

Approach F1 is asymptotic and – even with the “plus four”-correction outlined – does not necessarily work well for small samples. However, it is frequently used, mainly because of its simplicity and the lack of alternative methods available in common software packages. Blyth (

with F_{0.025; 2m, 2(n – m + 1)} and F_{0.975; 2(m + 1), 2(n – m)} being percentiles from the F-distribution with the indicated degrees of freedom.
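These F-percentile bounds correspond to the exact (Clopper–Pearson) limits, which can equivalently be computed from Beta quantiles. A minimal sketch (our own illustration, not the paper's code):

```python
from scipy.stats import beta

def exact_interval(m, n, level=0.95):
    """Exact binomial interval; the Beta quantiles equal the F-percentile bounds."""
    alpha = 1 - level
    lower = beta.ppf(alpha / 2, m, n - m + 1) if m > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, m + 1, n - m) if m < n else 1.0
    return lower, upper

print(exact_interval(40, 100))  # slightly wider than the Wald interval
```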

This approach is based on the approximation (cf.,

which, after some derivations, leads to the interval

One of the instances where this approach is used is in the computation of Cohen’s

Bayesian approaches are specified through their prior distribution. The Beta distribution is a so-called conjugate prior of the binomial distribution, which means that the posterior distribution is also a Beta distribution. In general, when using a Beta(a, b) prior and observing m successes in n trials, the posterior is Beta(a + m, b + n – m).

Approach B1 is based on the prior assertion that all values for the proportion are a priori equally likely, i.e. the uniform Beta(1, 1) prior, which yields the Beta(1 + m, 1 + n – m) posterior.

The Jeffreys prior is a so-called non-informative prior that is invariant under reparametrization of the problem space, which is a desirable property of a prior. The Jeffreys prior for the current setting is the Beta(½, ½) distribution, yielding the Beta(½ + m, ½ + n – m) posterior.
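Both Bayesian intervals follow directly from Beta posterior quantiles; a sketch under the same illustrative conventions as above:

```python
from scipy.stats import beta

def beta_credible_interval(m, n, a=1.0, b=1.0, level=0.95):
    """Equal-tailed credible interval: Beta(a, b) prior gives a
    Beta(a + m, b + n - m) posterior."""
    alpha = 1 - level
    posterior = beta(a + m, b + n - m)
    return posterior.ppf(alpha / 2), posterior.ppf(1 - alpha / 2)

b1 = beta_credible_interval(40, 100, a=1.0, b=1.0)  # B1: uniform prior
b2 = beta_credible_interval(40, 100, a=0.5, b=0.5)  # B2: Jeffreys prior
```

For moderate n the two priors give nearly identical intervals, consistent with the high B1–B2 overlap reported in the tables.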

Table

95% confidence/credible intervals for the five methods for various settings of the sample size n and observed count m.

Method | | | | |
---|---|---|---|---|
F1 | (.134, .410) | (.345, .655) | (.390, .610) | (.456, .544) |
F2 | (.127, .412) | (.338, .662) | (.386, .614) | (.455, .545) |
F3 | (.119, .429) | (.305, .743) | (.357, .667) | (.440, .564) |
B1 | (.142, .403) | (.351, .649) | (.393, .607) | (.456, .544) |
B2 | (.136, .398) | (.350, .650) | (.392, .608) | (.456, .544) |

Overlap between methods. Overlap between each pair of approaches, for various sample sizes n.

n | F1–F2 | F1–F3 | F2–F3 | F1–B1 | F1–B2 | F2–B1 | F2–B2 | F3–B1 | F3–B2 | B1–B2 |
---|---|---|---|---|---|---|---|---|---|---|
10 | .978 | .924 | .935 | .913 | .916 | .930 | .933 | .890 | .903 | .970 |
25 | .978 | .893 | .909 | .950 | .948 | .950 | .947 | .873 | .885 | .971 |
50 | .978 | .879 | .896 | .969 | .965 | .962 | .958 | .867 | .877 | .975 |
100 | .980 | .869 | .884 | .981 | .977 | .971 | .968 | .862 | .869 | .980 |
500 | .987 | .852 | .861 | .994 | .989 | .985 | .984 | .850 | .853 | .990 |

For continuous data, the central limit theorem states that for any reasonable population distribution, the sample mean is approximately normally distributed when the sample is sufficiently large. The standard 95% confidence interval for the population mean is then

x̄ ± t_{n – 1} · s/√n,

where t_{n – 1} is the corresponding critical value from a t-distribution with n – 1 degrees of freedom, x̄ is the sample mean, and s the sample standard deviation.
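As a sketch (illustrative code, not the paper's), this interval can be computed directly:

```python
import math
from scipy import stats

def t_interval(xs, level=0.95):
    """Classical t-based confidence interval for a population mean."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))  # sample sd
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)
    half = t_crit * s / math.sqrt(n)
    return mean - half, mean + half

print(t_interval([1, 2, 3, 4, 5]))  # ≈ (1.04, 4.96)
```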

A prior is constructed for the population effect size delta, such that delta a priori follows a zero-centred (default) Cauchy distribution.

With these restrictions in place, we conducted two sets of simulations. In the first set, we generated normally distributed data for a single group, varied along the following two dimensions:

Corresponding

Number of participants: 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, and 30.

Subsequently, we calculated 95% confidence and credible intervals for the resulting data.

In the second set of simulations, the data are artificially constructed such that they vary in how skewed the underlying population distribution is. This was done by simulating data using the rsn function in R (from package sn, see
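A Python analogue of this setup uses scipy.stats.skewnorm in place of R's rsn; the slant value and sample size below are our own illustrative choices, not the paper's settings:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
alpha = 10  # slant parameter; larger values give a more right-skewed population
n = 20      # illustrative sample size

# draw one right-skewed sample (analogue of R's sn::rsn)
sample = stats.skewnorm.rvs(a=alpha, size=n, random_state=rng)

# classical 95% confidence interval for the mean of the skewed sample
lo, hi = stats.t.interval(0.95, df=n - 1, loc=sample.mean(), scale=stats.sem(sample))
```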

Results for the first set of simulations, based on normally distributed data, are shown in Figure

Comparison of 95% confidence intervals (black) to 95% credible intervals, based on the default Cauchy prior (red) for Normally distributed data. Results show intervals are nearly identical.

Results for the second set of simulations, based on right-skewed data, are shown in Figure

Comparison of 95% confidence intervals (black) to 95% credible intervals, based on the default Cauchy prior (red) for right-skewed data. Results show intervals are nearly identical.

In the present paper, we have demonstrated by means of a series of examples that confidence intervals and credible intervals are, in many practical situations, very similar and will lead to the same conclusions for most practical purposes when relatively uninformative priors are used. The examples used here are based on small samples but are otherwise well behaved and could easily occur in practice. When sample size increases, the numerical difference between both types of interval will (usually) decrease.

So in what situations do the approaches yield more substantial differences? There are two main examples: (1) restriction of range of the data; (2) Bayesian methods based on a considerably more informative prior. As an example of the first point, consider 15 scores on a Likert scale ranging from 1 to 5. Suppose that ten scores are 1, four scores are 2, and one score is 5. Construction of a classical 95% confidence interval results in the interval (0.95, 2.12), an interval that includes values below the minimum possible value of 1. The Bayesian 95% credible interval is bounded by definition to not include values beyond the range of the parameter space. For a uniform prior on this interval, combined with the assumption that the sample standard deviation equals the population standard deviation, the resulting 95% credible interval is (1.08, 2.07) (see Figure

Posterior density, credible interval (red) and confidence interval (blue) for the example with 15 measurements on a Likert-scale.
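The numbers in this example can be reproduced with a short sketch, under the stated assumptions (uniform prior on the [1, 5] range and the sample standard deviation treated as the known population value):

```python
import math
from scipy import stats

data = [1] * 10 + [2] * 4 + [5]  # the 15 Likert scores from the example
n = len(data)
mean = sum(data) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
se = sd / math.sqrt(n)

# classical 95% confidence interval: dips below the scale minimum of 1
t_lo, t_hi = stats.t.interval(0.95, df=n - 1, loc=mean, scale=se)

# credible interval: posterior for the mean is a normal truncated to [1, 5]
a, b = (1 - mean) / se, (5 - mean) / se
c_lo, c_hi = stats.truncnorm.interval(0.95, a, b, loc=mean, scale=se)

print(round(t_lo, 2), round(t_hi, 2))  # 0.95 2.12
print(round(c_lo, 2), round(c_hi, 2))  # 1.08 2.07
```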

The second point highlights the scope of our present findings: we have shown numerical similarities between frequentist and Bayesian methods for (relatively) uninformative priors. Depending on the research context, vastly different intervals can be obtained if one chooses a specific informative prior. Our paper is meant to highlight similarities when relatively standard, off-the-shelf methods are used for constructing intervals under both regimes, using ‘objective’ or fairly uninformative priors, in the simple common contexts of estimating proportions and means.

Why then, in cases with little or no prior information, bother with Bayesian approaches, and not stick to the more traditional frequentist confidence interval? A good reason is that a Bayesian analysis is more in line with the way researchers actually interpret their results (whether frequentist or not). That is, researchers tend to interpret their results in explicit or implicit terminology indicating how certain they are about what the effect size truly is (i.e. in the population). As many papers and textbooks emphasize, frequentist approaches cannot warrant such statements, but Bayesian approaches can: one can claim that there is a 95% chance that the true effect size lies in the credible interval. Stronger still, one can accompany the credible interval with a

The frequentist approach works from the premise that only the data are prone to random fluctuations, while the true effect is fixed; hence it makes no sense to specify probabilities for the (fixed) population effect size, only for whether confidence intervals estimated from the data will cover the true effect size. This is a subtle difference from the Bayesian credible-interval interpretation, but as the way people like to interpret results is more in line with the latter, the Bayesian approach serves researchers' intuitions better. This comes at a price, however: the statements are always conditional upon the prior that one has specified. Fortunately, the exact location of credible intervals does not appear to vary strongly with variations in the prior. Indeed, in the case where the population variance is assumed known, the frequentist confidence interval for the mean can be recovered as a credible interval under a particular choice of prior, namely the (improper) uniform prior. Such a prior is implausible in practice, but can be seen as the limiting case of increasingly flat priors. And, as we have now seen, it does not lead to very different intervals than does the more realistic Cauchy prior.

For us the main message of our paper is as follows. Frequentist confidence intervals can be interpreted as a reasonable approximation to a Bayesian credible interval (with uninformative prior). This is reassuring for those who struggle with the formally correct interpretation of frequentist intervals. Additional insight can be obtained when these intervals are complemented (or replaced) by a full posterior distribution for the effect size measure under study. The posterior distribution will, conditionally upon a chosen prior, give the full picture of the uncertainty around its possible value. It can provide information on skewness, bimodality, and other properties – or the lack thereof, such as in Figure

All computations have been performed using

The preprint of this paper has also been published on OSF.

With this specification, for each data set the interval x̄ ± t_{n – 1} · s/√n was computed.

We replicated this simulation study 25 times, and only found negligible differences in lower and upper bounds, see supplementary materials.

We also replicated this simulation study 25 times and again only found negligible differences in the lower and upper bounds, see supplementary materials.

The authors have no competing interests to declare.

CJA conducted the analysis of intervals for discrete data and wrote the first version of the manuscript; HALK wrote part of the paper and critically commented on various versions of the manuscript; DvR conducted the simulations of intervals for continuous data and critically commented on various versions of the manuscript. All authors contributed to the writing of the final manuscript.