The p-value controversy

I was first introduced to the p-value controversy by an epidemiologist in 2014. The controversy concerns the use and practice of null hypothesis significance testing (Trafimow, 2014; Wasserstein, 2016; Wellek, 2017). It stems largely from the prevalence of mindless hypothesis testing, also known as the null ritual (Gigerenzer, 2004).

Why does the controversy exist? Ultimately, it’s because statistical reasoning and inference, as well as the hypothesis testing procedure, are difficult to understand (del Mas, 2004). So widespread is the confusion that the null ritual has even been labelled tyrannical (England, 1991; Stang, Poole, & Kuss, 2010).

There’s so much confusion that even the controversy itself can be misunderstood as an indictment of p-values and the hypothesis testing procedure, whereas it is simply a recommendation to be thoughtful and not use statistical tools blindly (Wasserstein, Schirm, & Lazar, 2019).

Surely, one part of the problem, and one part of any solution, is the statistics classroom. If people fail to understand statistics, is it not the responsibility of statisticians, as stewards of the field, to help remedy the situation?

I think understanding the origins and evolution of the p-value in hypothesis testing can go a long way toward helping. Most people would be surprised to hear that the three people credited with its development would likely balk at its current practice (Gigerenzer, 2004).

There are many articles that discuss the origins of the procedure, but one of my favourites is by Biau, Jolles, & Porcher (citation listed below). I strongly recommend that everyone read that paper (and I require it of my students): it is short, very accessible, and at the very least should prove illuminating.

References and Further Reading:

Biau, D. J., Jolles, B. M., & Porcher, R. (2010). P value and the theory of hypothesis testing: an explanation for new researchers. Clinical Orthopaedics and Related Research, 468(3), 885-892.

del Mas, R. C. (2004). A comparison of mathematical and statistical reasoning. In The challenge of developing statistical literacy, reasoning and thinking (pp. 79-95). Springer, Dordrecht.

England, C. (1991). On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology, 36(2), 102-105.

Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-Economics, 33(5), 587-606.

Stang, A., Poole, C., & Kuss, O. (2010). The ongoing tyranny of statistical significance testing in biomedical research. European Journal of Epidemiology, 25(4), 225-230.

Trafimow, D. (2014). Editorial. Basic and Applied Social Psychology, 36(1), 1-2.

Wasserstein, R. L. (2016). ASA statement on statistical significance and P-values.

Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a World Beyond “p < 0.05”. The American Statistician, 73(S1), 1-19.

Wellek, S. (2017). Author response to the contributors to the discussion on “A critical evaluation of the current ‘p‐value controversy’”. Biometrical Journal, 59(5), 897-900.
