A common adage recited by many statistics instructors is that correlation is not causation. I am concerned that many students hear the adage, misinterpret it, and leave thinking that correlation, and by extension all of statistics, is useless in the real world.
A more precise wording is that correlation does not imply causation. The distinction is worth stating because the converse usually holds: a causal relationship typically produces a correlation. My guess is that we don’t say this in classrooms because we believe students don’t yet understand what “imply” means.
I believe a better way to translate this would be to say, “Correlation may be causation, but it also may not be.” So, what can be done to determine whether a correlation is a sign of causation?
One answer lies in introducing students to causal inference (Neyman, 1923; Rubin, 1974). Causal inference fundamentally differs from traditional statistical inference. While statistical inference is limited to inferring relationships that exist under the specific conditions through which data were collected, causal inference explicitly attempts to infer relationships amidst changing conditions (Pearl, 2009).
Causal inference is still gaining early traction in the statistics education community. Oftentimes, teachers are unfamiliar with it and thus hesitate to teach it. I believe that counterfactual reasoning, taught through causal inference, is an important skill to include when teaching statistical literacy (Gal, 2002).
One key visual tool in causal inference is the exploration of relationships between variables using a directed acyclic graph (DAG).
For example, we may think that a person’s weight affects their blood cholesterol level.
But perhaps we realize that diet may also affect both characteristics.
Here, diet is explicitly acknowledged as a confounding variable. It must be considered in any research design attempting to speak to the relationship between weight and cholesterol level. Furthermore, we have a clear visual to help students understand why weight and cholesterol may be correlated, but may not have a causal relationship.
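To make the confounding idea concrete, here is a minimal simulation sketch in Python (using NumPy). Everything in it is an assumption made for illustration: the variables are standardized, made-up quantities, the coefficients 2.0 and 1.5 are arbitrary, and the helper `residuals` is a hypothetical name of my own. The simulated world has arrows from diet to weight and from diet to cholesterol, and deliberately has no arrow from weight to cholesterol.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Assumed (invented) structural model matching the DAG:
#   diet -> weight, diet -> cholesterol, and NO weight -> cholesterol arrow.
diet = rng.normal(size=n)
weight = 2.0 * diet + rng.normal(size=n)
cholesterol = 1.5 * diet + rng.normal(size=n)

# Weight and cholesterol are correlated even though neither causes the other.
raw_corr = np.corrcoef(weight, cholesterol)[0, 1]


def residuals(y, x):
    """Return what is left of y after removing its linear fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


# Adjust for the confounder: correlate the parts of weight and cholesterol
# that diet does not explain.
adjusted_corr = np.corrcoef(
    residuals(weight, diet), residuals(cholesterol, diet)
)[0, 1]

print(f"correlation ignoring diet:      {raw_corr:.2f}")
print(f"correlation adjusting for diet: {adjusted_corr:.2f}")
```

In this made-up world, the first correlation should come out strongly positive and the second close to zero, which is the pattern the DAG predicts when diet is the only link between the two variables. Real data would of course not come with a known causal structure, so this is only a classroom illustration, not a method for establishing causation.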
Unfortunately, I have not yet found a simple explanation of causal inference that I feel is appropriate for introductory students. This may be a future project of mine. In the meantime, I personally enjoyed reading about the history of causal inference in a paper by Freedman (1999), the first few sections of which may be accessible and informative.
References and further reading:
Freedman, D. (1999). From association to causation: some remarks on the history of statistics. Journal de la société française de statistique, 140(3), 5-32.
Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4), 465-472. Trans. Dorota M. Dabrowska and Terence P. Speed.
Pearl, J. (2009). Causal inference in statistics: An overview. Statistics surveys, 3, 96-146.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688-701.