Null Hypothesis: Y U NO good enough for scientific articles?

If you’ve ever been involved in a scientific endeavour, there is a good chance you are familiar with the null hypothesis (which I’ll call H0). Basically, it is the opposite of the “real” hypothesis of a study. Say you want to demonstrate the following effect: chocolate consumption improves memorising skills. Your corresponding H0 would be the absence of such an effect.

In the ensuing statistical analyses, you’ll probably want to reject the H0 in favour of your alternative hypothesis, thus showing a significant effect of chocolate on memory.

However, finding this Holy Grail of inferential statistics is not the easiest thing. I won’t go into what influences it here, since that is nowhere near my area of expertise and I’d rather not make a fool of myself. Instead, I’d like to briefly discuss the overwhelming discrimination against unrejected H0s in the scientific literature.

You see? Source: xkcd

In my school projects so far, I have NEVER found ANY significant effect. EVER. It is disappointing. Most of all, my apparently consistent inability to reject the H0 made me think that, further in my academic career, I’d never be able to publish an article.

Indeed, most scientific journals almost exclusively accept articles that report significant effects (I don’t have numbers on this phenomenon, sorry). This attitude suggests that unrejected H0s somehow signify a lack of (convenient?) information.

But don’t they say that absence of evidence is not evidence of absence? Just because one team couldn’t reject the H0 doesn’t mean that their results are devoid of interest.

My point exactly! Source: muddylemon

For one thing, publishing non-significant results would be like taking antimatter into account in addition to matter (i.e., significant results). They are just as relevant a piece of information. Choosing to communicate them, instead of concealing them, would help increase transparency in science.

Secondly, researchers interested in replicating the experiment could focus on improving the methods rather than on inventing a whole procedure from scratch. This would mean saved time, saved money, collaboration opportunities and possibly less frustrating research.

Finally, and perhaps most importantly, information on “failed” experiments could help prevent unnecessary research from happening. Steven Reysen, from the Journal of Articles in Support of the Null Hypothesis, explains it better than I do:

The file-drawer problem is that psychologists, and scientists in general, will not report research that does not meet traditional levels of significance. If a study has null results psychologists will often abandon the research to move on to other ideas and not report the findings. The result is that the journals are filled with studies that reached significance. For example, there may have been 20 null studies conducted on a topic but one significant study reported in the literature. If I then try to research the same topic I may be wasting time and money on that idea.
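To put a rough number on that example, here is a minimal sketch. It rests on assumptions of mine that are not in the quote: the 21 studies are independent, the true effect is zero, and each one is tested at the conventional 5% significance level. The function name is made up for illustration.

```python
def prob_at_least_one_significant(n_studies: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent studies of a true null
    effect comes out "significant" purely by chance.

    Assumes every study uses the same significance level `alpha` and that
    the studies are independent of each other (both are simplifications).
    """
    # Probability that a single study correctly fails to reject H0.
    prob_single_null_result = 1.0 - alpha
    # Probability that all n studies fail to reject H0, then the complement.
    return 1.0 - prob_single_null_result ** n_studies


if __name__ == "__main__":
    # 20 unpublished null studies plus 1 published significant one.
    print(prob_at_least_one_significant(21))
```

Under these assumptions the chance comes out at roughly two in three, so a lone significant study sitting on top of a full file drawer is not surprising at all, which is the worry the quote describes.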

Clearly, I am in favour of the scientific community paying more attention to the H0/null hypothesis than it does at the moment, and not only because this could potentially give me a better shot at publishing my work.

What do you think? Publishing articles without significant results: yay or nay?


11 thoughts on “Null Hypothesis: Y U NO good enough for scientific articles?”

  1. Falko Buschke

    I don’t think publishing studies with negative results is worthwhile… well, at least not in the current form of publication. If you can’t refute the null hypothesis, then you can’t make any concrete conclusions or add any information to the current knowledge of the field.

    However, I do agree that these (failed?) studies can be useful; if only to prevent others from following similar dead ends.

    I think a simple database with short reports of failed experiments will be sufficient. These reports would state: (1) why we expected the alternative hypothesis; (2) what our methods were; (3) what we found. Nothing more, nothing less. Any interpretation of these studies would be speculation and they could easily be twisted, misinterpreted and miscited in some inappropriate – possibly counter-productive – way.

    Journals are already encouraging authors to be as clear and concise as possible, even when the null hypothesis is rejected. Wasting pages (and reviewers’ time) on verbose introductions and discussion of inconclusive data is pointless and, frankly, pseudo-science: it is impossible (in all but the simplest systems) to extract anything meaningful about the study system from a failed experiment.

    1. Sam Hardman

      I disagree: the aim of conducting a study shouldn’t be to reach a predetermined and desired result but should be to test a hypothesis. As long as the study design is rigorous, a null result in itself can be useful and interesting data. No significant result is in itself a result, even if it is not as desirable.

      1. Ria Pi Post author

        Indeed. I feel like in burgeoning areas of scientific study, there is an even bigger chance of this outcome – it would be a shame to disregard the results.

      2. Falko Buschke

        Sam, I agree that a rigorous study can produce valuable data even with negative results. But I don’t think the goal of the current publication system should be showcasing these promising data. Publication (and peer-reviewers’ time) should be spared for studies that make an active contribution to the current knowledge of the field.

        I think that you and I fundamentally disagree on what should be considered an active contribution to the current knowledge of the field. In my opinion, data alone is not worthy of publication; it should also be interpreted in a meaningful way.

        You wrote:
        “…a study shouldn’t be to reach a predetermined and desired result but should be to test a hypothesis”

        True, but we can argue about whether you can prove anything without falsification – I agree with Karl Popper that you can’t. So, you can’t conclusively test a hypothesis without rejecting the null expectation. But we’re heading down a slippery slope of a full-blown debate on the philosophy of science, which I’d prefer to avoid.

        Maybe we can agree to disagree?

    2. Ria Pi Post author

      “Any interpretation of these studies would be speculation and they could easily be twisted, misinterpreted and miscited in some inappropriate – possibly counter-productive – way.”

      I agree. I realise now I may have implied that non-significant results may have the same interpretative value as significant ones. Indeed I think they don’t, if only because of the way ‘traditional’ statistics are built: we don’t usually look for the absence of an effect, but even if we did, it wouldn’t make sense to try to prove the null hypothesis to do so. Maybe other types of inferential statistics (e.g., Bayesian) could work in such cases? I don’t know.

      While I see your point, I still maintain that a “failed?” experiment (if it was done well, of course) is only failed insofar as it doesn’t support the researchers’ hypothesis, but it is potentially useful data nonetheless. I imagine this could be especially true if the predictions made aren’t based on solid theory and backed up by years and years of published research, which occurs relatively frequently as far as I know.

      1. Falko Buschke

        I thought about this for a while and I still maintain that negative results are not useful unless you can also explain why the experiment “failed”… and the only way to conclusively identify why an experiment failed is to rectify its weakness and actually get positive results (which should then be published instead).

        Alternatively, if a benchmark study found outcome X during an experiment and I replicate the experiment and fail to find outcome X, it doesn’t necessarily mean that the original benchmark experiment was faulty. No amount of negative experiments will be able to disprove the original study. It is an example of Karl Popper’s solution to the problem of induction.

        Of course, if you can prove why the original experiment was faulty, then it will be a valuable contribution. However, to do this you should first replicate outcome X using the faulty experiment, then correct the flaw and show that outcome X disappears. In this way, you clearly show that outcome X depends on the presence of the flaw. This is a super valuable contribution to science but it doesn’t count as a negative result (or a “failed” experiment).

      2. Ria Pi Post author

        I think you are right about that.

        I’m just not sure I understand the following: “No amount of negative experiments will be able to disprove the original study. It is an example of Karl Popper’s solution to the problem of induction.” Isn’t the point of Popper’s solution that a single observation can falsify a statement? Yet you say that a potentially large number of negative experiments couldn’t do so?

        Granted, it’s not a super important part of your argument, but I’d like to understand.

  2. Falko Buschke

    I just realised that I misread your original comment, so my response was slightly out of context. Nevertheless, what I wrote is still conceptually sound (I think).

    Imagine this silly example: researchers found that the lesser-spotted genet wags its tail when it sees a bright light. You want to replicate this study, so you formulate the null hypothesis that the lesser-spotted genet does NOT wag its tail in response to light. You run the experiment and fail to reject the null hypothesis (i.e. a negative result). Does this mean that the original research was wrong? You repeat the experiment and, again, you cannot reject the null because the genet just isn’t wagging its tail.

    You can repeat your experiment 100 times and fail to reject the null hypothesis 100 times. Although the evidence suggests that the lesser-spotted genet doesn’t wag its tail in bright light, you cannot conclusively – with 100% certainty – say that it will NEVER wag its tail when it sees bright light. No statement was falsified yet.

    On the flip-side: the original researchers found that the lesser-spotted genet wagged its tail. Granted, they couldn’t say, with complete certainty, that it will ALWAYS wag its tail but they can conclusively state that it SOMETIMES wags its tail (or it will not ALWAYS have a motionless tail). They falsified the null hypothesis.

    A simpler – but overused – example is the black swan problem. I make the statement that ALL swans are white. I go out and see one white swan (so far so good). Hours later I have seen 174 white swans. I feel very confident that my statement is true. The next day, feeling very self-assured that all swans are white, I see a black blip on the horizon. It turns out to be a black swan! That single individual falsified my statement despite the 174 white swans I had seen the day before.

    In other words, the failure to reject the null hypothesis (falsify a statement) does not, by default, mean that the alternative hypothesis is definitely false. But the rejection of the null leaves no doubt whatsoever that the null hypothesis is false.

    It might all just be a bit of academic philosophising though. I think the newer Bayesian and information-theoretic approaches are less pedantic about the black-and-white realm of hypothesis testing and are more pragmatic about the relative importance of evidence (but I’m by no means an expert, so feel free to correct me if I am misguided).

