Does a hypothesis test ever prove the null hypothesis?

The traditional view that a statistical analysis cannot prove the null hypothesis is a consequence of the way Ronald Fisher structured the problem of probabilistic inference. Fisher argued that if one wants to determine whether an experimental manipulation (e.g., a drug treatment, or a treatment applied to some crop land) had an effect (e.g., on recovery rate or crop yield), one should compute the probability of obtaining an effect (that is, a difference between the means of two samples, a control sample and an experimental sample) as big as or bigger than the one actually obtained, if both samples were in fact drawn from the same distribution (that is, if there were in fact no effect). In other words, how likely is it that one would see an effect that big or bigger by chance alone? If this probability is sufficiently low (say, less than one chance in 20), then one is justified in concluding that the manipulation had an effect (or that the average values differ in the populations from which the two samples were drawn). Thus, if the difference between the two sample means is sufficiently larger than expected by chance, one is justified in concluding that something more than chance was at work. This way of framing the problem has come to be called Null Hypothesis Significance Testing (NHST, for short). In this formulation, you cannot prove the null hypothesis, because failing to reject it is not the same as accepting it, any more than a verdict of "not proven" is the same as a verdict of "not guilty." Thus, if you think this is the right way to formulate the problem of probabilistic inference, then you cannot prove the null.
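Fisher's question can be sketched in a few lines of Python. This is a minimal illustration that assumes a permutation test, which is one of several ways to obtain the chance probability (a t-test is the more traditional choice). The function name permutation_p_value and the sample numbers are hypothetical, not data from any real experiment. If both samples come from the same distribution, the group labels are interchangeable, so reshuffling them shows how often chance alone produces a difference at least as large as the observed one.

```python
import random
from statistics import mean


def permutation_p_value(control, treated, n_resamples=10_000, seed=0):
    """Two-sided permutation p-value for a difference in sample means.

    Hypothetical helper. Under the null hypothesis both samples come from the
    same distribution, so the labels "control" and "treated" are exchangeable.
    """
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(control) + list(treated)
    n_control = len(control)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:  # "as big as or bigger than" the observed effect
            extreme += 1
    # Add-one correction so the estimated probability is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


control = [4.1, 3.9, 4.4, 4.0, 4.2, 3.8]  # hypothetical yields, untreated plots
treated = [4.9, 5.1, 4.7, 5.3, 4.8, 5.0]  # hypothetical yields, treated plots
print(permutation_p_value(control, treated))
```

A value below the chosen threshold (e.g., 0.05) licenses rejecting the null. A large value licenses only the verdict "not proven"; nothing in the procedure can return "no effect."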

But this way of formulating the problem flies in the face of common sense. We all draw a strong and clear distinction between "not proven" and "not guilty," yet in this formulation the only possible verdict on the null hypothesis is "not proven." That might be acceptable if null hypotheses were of no scientific or practical importance. In fact, they are of profound scientific and practical importance. The conservation laws, which lie at the foundation of modern physics, are null hypotheses: each asserts that under no circumstance does some quantity change. And for most people it matters whether a generic drug costing one-tenth as much as the brand-name drug really has "the same effect" as the brand-name drug, or merely "has not been proven to have a different effect." If you frame it the latter way, many more people will opt to pay the extra cost than if you say that "the evidence shows that the effects of the generic drug and the brand-name drug do not differ" (a null hypothesis).

Moreover, and this is more technical, the traditional formulation violates basic mathematical and logical considerations. One of these is consistency: a rational treatment of the evidence for and against any hypothesis should have the property that, as the number of observations "goes to infinity" (becomes arbitrarily large), the probability of drawing the correct conclusion goes to 1. But under the NHST formulation, when the null hypothesis is true, the probability of rejecting it remains .05 or .01 (whatever one regards as the crucial degree of improbability) no matter how many observations there are. Another curious aspect of the traditional formulation is that it licenses concluding that one's own (non-null) hypothesis is correct because the null hypothesis appears to fail, even though one's own hypothesis is never tested against the data, whereas the null hypothesis is. This is a little like concluding that one could climb a formidable mountain oneself just because someone else has failed to climb it. Fairness would seem to require that one's own hypothesis, whatever it may be, undergo the same test as the null hypothesis. At a simpler level: how can a statistical method for drawing conclusions be valid if it prohibits ever drawing a conclusion in favor of some hypotheses (null hypotheses) that are of fundamental scientific and practical importance?
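The consistency complaint can be checked by simulation. The sketch below assumes normally distributed data with a known standard deviation of 1 and a two-sided z-test; rejection_rate_under_null is a hypothetical helper written for this illustration. Because every simulated data set is generated with a true mean of 0, every rejection is an error, and the rate of such errors should hover near alpha however large n becomes.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate_under_null(n, alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated experiments in which a TRUE null is rejected.

    Hypothetical helper. Each trial draws n observations from N(0, 1), so the
    null hypothesis (population mean = 0) holds by construction. A two-sided
    z-test with known standard deviation 1 is applied to the sample mean.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    rejections = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(xs) * n ** 0.5  # sample mean divided by its standard error 1/sqrt(n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


for n in (10, 100, 1000):
    print(n, rejection_rate_under_null(n))
```

More observations make the test better at detecting real effects, but they never push the probability of wrongly rejecting a true null toward zero, which is the sense in which the procedure is not consistent when the null is true.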

There is an alternative to the NHST formulation of the problem of probabilistic inference that dates back to the work of the Reverend Thomas Bayes in the 18th century. In the Bayesian formulation, both the null hypothesis and one or more alternatives to it are tested against the data. Each hypothesis places a bet on where the data from the experimental treatment (or other condition of observation) will fall. The hypothesis that does the best job of anticipating where the data actually fall obtains the greatest odds of being correct (or, at least, of being more valid than the alternatives that have been proposed). In this formulation, it is perfectly possible for the null hypothesis to be the odds-on favorite. In other words, in this conception of probabilistic inference it is possible to prove the null, in the sense that the null may have arbitrarily greater odds than any of the proposed alternatives to it. The null is thus no different from any other hypothesis. This approach to probabilistic inference has gained considerable currency in recent years. According to its advocates, it is the only "normative" (mathematically correct) formulation because, among other things, it does not prohibit any hypothesis from being accepted, and because it is consistent: as the data (number of observations) become arbitrarily large, the odds in favor of the true hypothesis increase without bound, regardless of whether the true hypothesis is the null or an alternative to it. In short, the Bayesian formulation places the null hypothesis on the same footing as any other hypothesis, so it is just as susceptible to proof as any other hypothesis.
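The "betting" idea can be made concrete with a Bayes factor. This is a minimal sketch assuming the simplest textbook case: observations from a normal distribution with known variance 1, a point null (mean exactly 0), and an alternative that spreads its bet over possible effect sizes using a N(0, tau^2) prior. The function bayes_factor_01, the choice tau = 1, and the simulated effect of 0.5 are illustrative assumptions; the size of the Bayes factor depends on how the alternative's prior is chosen. The Bayes factor is the ratio of how well each hypothesis predicted the observed sample mean; with equal prior odds it equals the posterior odds in favor of the null.

```python
import math
import random
from statistics import NormalDist, fmean


def bayes_factor_01(xbar, n, tau=1.0):
    """Bayes factor for H0 (mu = 0) over H1 (mu ~ N(0, tau^2)).

    Hypothetical helper. Assumes n observations from N(mu, 1), so the sample
    mean xbar summarizes the data. Each hypothesis "bets" on where xbar falls:
      H0 predicts xbar ~ N(0, 1/n)            (a sharp, narrow bet)
      H1 predicts xbar ~ N(0, tau^2 + 1/n)    (a bet spread over many effects)
    The ratio of the two predictive densities at the observed xbar is BF01.
    """
    se = 1.0 / math.sqrt(n)  # standard error of the sample mean
    p_h0 = NormalDist(0.0, se).pdf(xbar)
    p_h1 = NormalDist(0.0, math.sqrt(tau ** 2 + se ** 2)).pdf(xbar)
    return p_h0 / p_h1


rng = random.Random(2)
for n in (10, 100, 1000, 10000):
    null_xbar = fmean([rng.gauss(0.0, 1.0) for _ in range(n)])  # data generated with mu = 0
    alt_xbar = fmean([rng.gauss(0.5, 1.0) for _ in range(n)])   # hypothetical true effect mu = 0.5
    print(n, bayes_factor_01(null_xbar, n), bayes_factor_01(alt_xbar, n))
```

When the null is true, the Bayes factor in its favor tends to grow (roughly in proportion to the square root of n), and when the alternative is true it shrinks toward zero. This is the consistency property described above, applied to the null and the alternative alike.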