An 8-week course

The 2020 outbreak of COVID-19 has affected many things, including the development of HSCI 3117 – due to funding limitations, no major overhaul of the course will occur before it is offered in Spring 2021. However, COVID-19 has re-invigorated my commitment to the importance of a statistically literate citizenry. Therefore, I hope to push forward with a bare-bones update toward a new course focused on statistical literacy.

One other bombshell was dropped – the course will now be an 8-week course. I understand why the university may prefer the 8-week format, and why many students do as well, but I’m not entirely convinced the format is in the best interests of students’ learning: asking students to spend 14 hours a week never seems to pay off twice as much as asking them to spend 7 hours a week – there are diminishing returns.

After reviewing the original 15-week format, I can’t bring myself to cut any material, even though I know I am going to be asking a lot from my students – this is a literacy course, and as such, it is reading intensive. The new weeks 2 and 3 require reading approximately 100 pages, which may startle statistics students but is typical for social science students – I don’t know how health science students will react. My goal is for students to acquire a general-level understanding of principles and terms, and I will help them navigate the nuances of the key concepts via discussion activities. To help them develop an understanding of why these statistical concepts are important, I plan to include readings from Stephen Stigler’s Seven Pillars of Statistical Wisdom in addition to Harvey Motulsky’s Intuitive Biostatistics as the main textbook.

The plan is to rely heavily on the readings for each week, readiness quizzes to ensure basic comprehension, and discussion activities during which I, as an active participant, will be able to refine my students’ understanding.

Statistical Literacy and COVID-19

COVID-19 has changed many things about the world. These experiences have convinced me of the importance of a basic statistical literacy for all citizens of the world. While there doesn’t appear to be a consensus definition of statistical literacy (Ziegler & Garfield, 2018), the definition I mean here is something close to Wallman’s (1993): an ability to understand and critically evaluate statistical information. I would argue that “statistical” isn’t an attribute of any information, but rather an attribute of the way in which one processes and views information. This is akin to Goodwin’s (1994) framework of professional vision and practices of seeing.

So, how does a statistician view information related to COVID-19? I’ll present two examples of things that my colleagues and I noticed in news stories (and knew the others would notice too) that perhaps are not immediately obvious to everyone.

In April, Madagascar launched a herbal remedy for COVID-19. According to an article by the BBC, the Malagasy president’s communications director said, “The majority of people who used the product and don’t have a chronic illness recovered completely … as long as it’s working, we don’t need clinical trials.” What I immediately think of is the counterfactual – would those same people who used the product and recovered have recovered if they hadn’t used the product? The quote has two clues – “the majority of people” implies that some who used the product did not recover, and “don’t have a chronic illness” implies that this is describing individuals without comorbidities. According to a study by Ioannidis et al. (2020), which has not been peer-reviewed, “deaths for people <65 years without underlying predisposing conditions are remarkably uncommon.” Therefore, if the patients in Madagascar without a chronic illness would have recovered even without drinking the herbal remedy, did the herbal remedy really have any effect? Even if the remedy wasn’t working, those patients would likely have gotten better. Therefore, this evidence (that most people without chronic illness who used the product recovered) does not support the claim (that the product works).

Around that same time, one of the first studies of Remdesivir was published. Wang et al. (2020) found, in a study where 158 patients were given Remdesivir and 79 a placebo, that Remdesivir did not appear to improve time to recovery. Shortly thereafter, Beigel et al. (2020) found, in a study with approximately 500 patients receiving Remdesivir and another 500 receiving a placebo, that Remdesivir did appear to slightly improve time to recovery. Which study is correct? Well, they both are. Each individual is unique, the way the disease progresses in each individual is unique, and the way Remdesivir affects each individual is unique. If we did ten studies with 100 patients each, but 100 different patients each time, we wouldn’t get exactly the same results each time, because the individuals in each study are different – this is called sampling variability. So if both these studies are correct, how do we combine their information? We can use a technique called meta-analysis. Simply put, it seems that even if Remdesivir helps, it only helps a little. That’s still better than nothing, but as Beigel et al. (2020) note, “it is clear that treatment with [Remdesivir] alone is not likely to be sufficient.”
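Sampling variability is easy to demonstrate with a quick simulation. This is a hypothetical sketch, not the actual trial data – the recovery-time distributions and the true effect of one day are entirely made up for illustration:

```python
import random

random.seed(1)

def simulate_study(n_per_arm, true_improvement=1.0):
    """Simulate one trial: mean recovery-time difference (placebo - treated).

    Recovery times are drawn from made-up distributions in which the
    treatment truly shortens recovery by `true_improvement` days on average.
    """
    placebo = [random.gauss(15, 5) for _ in range(n_per_arm)]
    treated = [random.gauss(15 - true_improvement, 5) for _ in range(n_per_arm)]
    return sum(placebo) / n_per_arm - sum(treated) / n_per_arm

# Ten studies of the very same treatment give ten different estimates,
# simply because each study enrolls different (simulated) patients.
estimates = [simulate_study(100) for _ in range(10)]
print([round(e, 2) for e in estimates])  # scattered around the true effect of 1.0
```

Even with an identical true effect in every run, the ten estimates differ – some studies would even suggest the treatment does nothing, which is exactly the tension between the two Remdesivir trials.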

In both cases, what we statisticians saw were two clearly viable alternate explanations for the observed phenomena. Note that we didn’t react by saying the treatment doesn’t work, or that one of the studies is wrong – we acknowledged that competing theories were plausible, and therefore, to say a treatment works, we must first rule out those competing theories: that there only appears to be a difference because of sampling variability, or that we would have seen this effect even if the treatment didn’t work. These two alternate explanations are the core of our introductory statistics classes.

I hope all my students, and all individuals worldwide, process information by the following maxims:

  • Accept uncertainty – it is ubiquitous, and statistics helps us to quantify our uncertainty.
  • Acknowledge variability – particularly in the form of sampling variability, which casts uncertainty upon our estimates.
  • Ask the counterfactual – in order to say that one thing caused something to happen, you should also demonstrate that what happened would not have happened without that thing.

References:

Beigel, J. H., Tomashek, K. M., Dodd, L. E., Mehta, A. K., Zingman, B. S., Kalil, A. C., … & Lopez de Castilla, D. (2020). Remdesivir for the treatment of Covid-19—preliminary report. New England Journal of Medicine.

Goodwin, C. (1994). Professional Vision. American Anthropologist, 96(3), 606-633.

Ioannidis, J. P., Axfors, C., & Contopoulos-Ioannidis, D. G. (2020). Population-level COVID-19 mortality risk for non-elderly individuals overall and for non-elderly individuals without underlying diseases in pandemic epicenters. medRxiv.

Wallman, K. K. (1993). Enhancing statistical literacy: Enriching our society. Journal of the American Statistical Association, 88(421), 1–8.

Wang, Y., Zhang, D., Du, G., Du, R., Zhao, J., Jin, Y., … & Hu, Y. (2020). Remdesivir in adults with severe COVID-19: a randomised, double-blind, placebo-controlled, multicentre trial. The Lancet.

Ziegler, L., & Garfield, J. (2018). Developing a statistical literacy assessment for the modern introductory statistics course. Statistics Education Research Journal, 17(2), 161–178.

Textbook Choice

When I last contemplated the curriculum I wished to adopt, there was one textbook I didn’t consider, because it was being used for a Master’s level course in the department – Harvey Motulsky’s Intuitive Biostatistics (IB). However, that course has since switched textbooks, leaving me free to consider IB for 3117.

I requested an exam copy of the book to assess its suitability for 3117 – the chapters are short, introduce important vocabulary with examples, avoid mathematical formulas, and, most importantly, cover nearly all of the content I was hoping to cover in 3117.

Of the 46 chapters in the book, 40 appeared to fit my original conception for 3117. The only topics I intend to supplement are measurement error, randomization in experiments, simulation-based tests, and perhaps Poisson regression.

One added advantage is that the IB curriculum has been adopted for biostatistics literacy courses at three universities by expert statistics educators. Therefore, choosing IB for 3117 carries the advantage of a small yet excellent community of instructors to seek advice from.

Weekly Learning Objectives

Having settled on content and course learning objectives, I began to plan out weekly topics and learning objectives. I searched several textbooks for ideas on how to sequence topics, with an ulterior motive of perhaps selecting one for students.

However, none of the textbooks aligned with my combination of context and approach. I then recalled research in statistics education stating that the order of topics probably doesn’t matter all that much.

Although I wanted to make evidence-based decisions in course design, I decided to simply rely on my own experience and intuition to sequence topics. In hindsight, I might have reached out to other biostatistics instructors I know for their courses’ weekly learning objectives.

Once I finished sequencing topics, I used Bloom’s Taxonomy to help me create weekly learning objectives. Because of the literacy focus, I used the verbs Identify, Explain, Describe, and Evaluate repeatedly.

As a final check of alignment, I mapped each of my weekly objectives to the four course objectives. There was relative balance, with each course objective mapped to between 14 and 17 of the 47 total weekly learning objectives.

Towards a framework for context selection in statistics instruction

How many datasets should a statistics instructor use over the course of a semester? Fundamentally, there are two key forces at play between which we must strike an equilibrium.

The benefit of using more than one dataset, or context, and of continuing to introduce new datasets, is fostering students’ abstraction and generalization of statistical concepts. The cost of each additional dataset with which students interact in a meaningful way is increased cognitive load, and students have limited cognitive capacity. Essentially, we want students to develop a deep understanding, but we don’t want to overburden them.

For a single topic or piece of statistical content, research suggests that we should use somewhere between three and five different datasets and contexts. We should also space each dataset throughout the course, using it multiple times if possible. However, we should do this in such a way that students face no more than 1–2 new datasets to work with in consecutive weeks.

Based on these recommendations, here’s how I might use various datasets, represented by letters, in an 8 week course:
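One such hypothetical assignment, with a quick check of the constraints, might look like the sketch below – the letters, weeks, and pairings are illustrative only, not my actual course plan:

```python
# A hypothetical 8-week schedule (letters = datasets), chosen so that each
# dataset recurs, at most two new datasets appear in any week, and the total
# dataset count lands between 1.0x and 1.2x the number of weeks.
schedule = {
    1: ["A", "B"],
    2: ["A", "C"],
    3: ["B", "C", "D"],
    4: ["D", "E"],
    5: ["C", "E", "F"],
    6: ["A", "F", "G"],
    7: ["B", "G", "H"],
    8: ["D", "H", "A"],
}

seen = set()
for week, datasets in schedule.items():
    new = [d for d in datasets if d not in seen]
    seen.update(new)
    # No more than 1-2 brand-new datasets in any single week.
    assert len(new) <= 2, f"week {week} introduces too many new datasets"

n_weeks = len(schedule)
# Total distinct datasets: between 1.0x and 1.2x the number of weeks.
assert n_weeks * 1.0 <= len(seen) <= n_weeks * 1.2
print(f"{len(seen)} datasets across {n_weeks} weeks")  # 8 datasets across 8 weeks
```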

A similar strategy can be used in 15-week courses, or even in two-semester course sequences, which can be considered 30-week courses. These strategies suggest that the number of datasets statistics instructors should use in a course is between 1.0 and 1.2 times the total number of weeks.

This assumes that instructors follow several other pedagogies and utilize the datasets in a very particular way. Thus, this recommendation may not be globally applicable. However, this is the strategy I now use in the courses I teach, and have found it to achieve the goals I’ve set.

Probability versus Likelihood

I sort of understand that you can’t state a probability for whether or not the population mean is in the interval because it either is or it isn’t (or at least I read that, but I don’t think I really understand it). If there is a 90% chance that the interval contains the population mean, I’m not really clear as to why this is not also the probability because doesn’t that mean there is a 10% chance that it does not contain the population mean, giving you a 0.9 probability that it does contain it? 

One of my students

The answer, somewhat unfortunately/unintuitively/confusingly, lies in semantics. The population mean is some value – we just don’t know what it is. Therefore, there’s no random chance about it. It straight up is equal to some value. And here’s where more confusion comes into play.

Probability and Likelihood are two different concepts. They’re basically opposite directions on the same two-way street. When we know population parameters and we’re examining outcomes from the population, that is probability. When we know a sample and we’re examining population parameters, that is likelihood. So, you can say something like ‘given that the mean of a distribution is 15, the probability that a random sample of size 12 has a sample mean greater than 18 is 0.342 (made up number)’. And, you can say something like, ‘given that we observed a sample mean of 42 from 20 observations, the likelihood that the true population mean is greater than 40 is 0.712 (made up number)’. The difference is super subtle, and I’ll admit, probably doesn’t really matter at the end of the day, and the authors of the textbook pull a tricky one by using the word sure and contrasting it with probability, without making the difference explicit (and it absolutely is not intuitive). But, this is why, if we’re going by the book, we can’t say something like ‘there’s a 90% chance the true mean is in our interval’, because it’s not even in the realm of probability to discuss the behaviour of population parameters given observed samples. We have to say things like ‘it’s 90% likely that the true mean is in our interval’.

Again, I admit, does this distinction really make a difference, a practically significant difference? No, probably not (no pun intended). But if you understand that using population parameters to describe the behaviour of samples and using observed samples to talk about population parameters are two different things, kind of like opposite directions on the same two-way street, then you’re in good shape. It’s just a matter of knowing which side of the street you’re driving on, and what the specific verbiage/jargon is to use on that side of the street.
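A simulation can make the “90%” concrete: it describes the procedure, not any single interval. This is a minimal sketch assuming a normal population with known standard deviation (so the two-sided 90% interval uses z = 1.645); the true mean, sigma, and sample size are made up:

```python
import random

random.seed(0)

TRUE_MEAN, SIGMA, N = 50.0, 10.0, 25
Z90 = 1.645  # two-sided 90% critical value for a normal with known sigma

def one_interval():
    """Draw one sample and return its 90% confidence interval for the mean."""
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    half_width = Z90 * SIGMA / N ** 0.5
    return xbar - half_width, xbar + half_width

# The 90% refers to the procedure: across many repeated samples, about 90%
# of the intervals produced will contain the fixed (non-random) true mean.
hits = sum(lo <= TRUE_MEAN <= hi for lo, hi in (one_interval() for _ in range(10_000)))
print(hits / 10_000)  # close to 0.90
```

Any single interval either contains 50.0 or it doesn’t – the randomness lives in which sample we happened to draw, which is exactly the semantic point above.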

Choosing a curriculum – Part 2

We have a collection of introductory statistics textbooks in my office, mostly written within the last 10 years, many of which were written by researchers in statistics education. I decided to go through them one by one, to help me shape my thoughts and inform the selection of a new textbook.

Chance Encounters – a first course in data analysis and inference by C.J. Wild and G.A.F. Seber: I liked the way the authors introduced datasets, where they come from, and what they represent. However, I didn’t want to discuss probability at all, nor provide a mathematical treatment of the central limit theorem. I wanted to take a more intuitive approach.

Introductory Statistics – exploring the world through data by R. Gould and C. Ryan: I liked the guided activities included in the textbook. However, I still struggle with how to translate such activities to the distance-based environment. Furthermore, the book still focused on z-values and t-values, something I did not want to teach.

Activity-based Statistics by R.L. Scheaffer, A. Watkins, J. Witmer, and M. Gnanadesikan: I really liked this textbook, and the activity-based pedagogy. However, I did not know how to translate this to a completely distance-based environment, which I think is still an open research question in statistics education.

Statistics – the art and science of learning from data by A. Agresti and C. Franklin: I really liked the section introducing data, which had a similar feel to Chance Encounters. Again, however, I did not want to have my students calculate t scores or z scores during inference.

Statistics – unlocking the power of data by R.H. Lock, P.F. Lock, K.L. Morgan, E.F. Lock, and D.F. Lock: I really liked the simulation based approach to teaching inference, and the use of the free software StatKey. Furthermore, the order of the chapters almost perfectly aligned with my vision of the course.

After examining these five textbooks, the Lock5 textbook stood out to me as the most interesting. I briefly investigated other simulation-based textbooks, but decided to stay with the Lock5 book, and our new curriculum was in place!
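The simulation-based approach to inference can be sketched with a simple randomization test – the group scores below are made up for illustration and are not from the Lock5 book or StatKey:

```python
import random

random.seed(2)

# Hypothetical scores for two small groups (invented for illustration).
group_a = [12, 15, 14, 18, 20, 16]
group_b = [11, 13, 10, 14, 12, 15]
observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

# Randomization test: if the group labels don't matter, reshuffling them
# should produce differences as large as the observed one fairly often.
pooled = group_a + group_b
reps = 10_000
count = 0
for _ in range(reps):
    random.shuffle(pooled)
    diff = sum(pooled[:6]) / 6 - sum(pooled[6:]) / 6
    if diff >= observed:
        count += 1

p_value = count / reps  # proportion of shuffles at least as extreme
print(round(p_value, 3))
```

No z-scores or t-scores are needed – the null distribution is built directly by shuffling, which is what drew me to this pedagogy.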

The p-value controversy – Part 2

I recently attended a conference where one of the plenary sessions included a discussion of the p-value controversy, and provided suggestions on what we should be teaching and using instead.

When I first spoke about the p-value controversy on this blog (Rao, 2019), I suggested that an understanding of the history of hypothesis testing may help clear students’ confusion. To Fisher, p-values were a measure of the likelihood that a current hypothesis or theory could explain an observed phenomenon. Only when a hypothesis was so unlikely to be a sufficient explanation of a phenomenon could it be ruled out, or rejected. It was with this lens that Fisher described p-values less than 0.05 as significant, choosing 0.05 as a convenient threshold (Fisher, 1925).

Wasserstein, Schirm, and Lazar (2019) promote many alternate statistics, one of which is the effect size. They advise that careful consideration be made to determine what a meaningful effect size would be for each individual study.

However, I am concerned that we are setting ourselves up for an effect-size controversy in the future, similar to the p-value controversy. Cohen’s d (Cohen, 1988) is one of the most famous measures of effect size, and comes with convenient thresholds for small, medium, and large effects. Sullivan & Feinn (2012) summarize many different measures of effect size, and for all, include thresholds for determining the size of the effect.

If 0.2 is a small effect, and 0.5 is a medium effect, what is 0.35? Is 0.1 still a small effect, or is it no effect? What about 0.05?
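To make the in-between problem concrete, here is a quick Cohen’s d computation on made-up data – the scores are hypothetical, chosen so the effect lands between the conventional thresholds:

```python
from statistics import mean, stdev

def cohens_d(x, y):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_sd = (((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2)
                 / (nx + ny - 2)) ** 0.5
    return (mean(x) - mean(y)) / pooled_sd

# Hypothetical scores for two groups of eight.
treatment = [24, 27, 25, 29, 31, 26, 28, 30]
control = [24, 26, 25, 28, 29, 25, 27, 30]
d = cohens_d(treatment, control)
print(round(d, 2))  # 0.33 - between Cohen's "small" (0.2) and "medium" (0.5)
```

Is 0.33 a small effect or a medium one? The thresholds alone cannot say, which is precisely the concern about trading one set of arbitrary cutoffs for another.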

In my mind, these alternate procedures do not solve the problem at the root of the p-value controversy; they just redirect it. I still don’t know of any perfect solution, but I believe Wasserstein, Schirm, and Lazar’s (2019) recommendation to be thoughtful is what we must seek to impart to our students.

My favourite summary of recommendations thus far is one by Dr Andrew Zieffler (citation listed below). I plan on requiring my students to review these slides after reading the Biau, Jolles, & Porcher (2010) article, in an attempt to help set the stage for thoughtful interaction with statistical tools.

References and further reading:

Biau, D. J., Jolles, B. M., & Porcher, R. (2010). P value and the theory of hypothesis testing: an explanation for new researchers. Clinical Orthopaedics and Related Research, 468(3), 885–892.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.

Fisher, R.A. (1925). Statistical methods for research workers. London: Oliver and Boyd.

Rao, V.N.V. (2019, March 23). The p-value controversy [Blog post]. Retrieved from https://statisticaljourneys.home.blog/2019/03/23/the-p-value-controversy/

Sullivan, G. M., & Feinn, R. (2012). Using effect size—or why the P value is not enough. Journal of Graduate Medical Education, 4(3), 279–282.

Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”. The American Statistician, 73(S1), 1–19.

Zieffler, A. (2019). Deprecating statistical significance: Toward better science [Lecture slides]. Retrieved from http://www.datadreaming.org/post/2019-04-26-slhs-prosem/

Correlation may be Causation

A common adage recited by many statistics instructors is that correlation is not causation. I am concerned that many students hear the adage, misinterpret it, and leave thinking that correlation, and by extension all of statistics, is useless in the real world.

A more precise wording is that correlation does not imply causation, which is important to state because causation does imply correlation. My guess is that we don’t say this in classrooms because we believe students don’t yet understand what “imply” means.

I believe a better way to translate this would be to say “correlation may be causation, but it also may not be.” So, what can be done to ensure correlation is a sign of causation?

One answer lies in introducing students to causal inference (Neyman, 1923; Rubin, 1974). Causal inference fundamentally differs from traditional statistical inference. While statistical inference is limited to inferring relationships that exist under the specific conditions through which data were collected, causal inference explicitly attempts to infer relationships amidst changing conditions (Pearl, 2009).

Causal inference is still gaining traction in the statistics education community. Often, teachers are unfamiliar with it, and thus hesitate to teach it. I believe counterfactual reasoning through causal inference is an important skill to teach as part of statistical literacy (Gal, 2002).

One key visual tool in causal inference is the exploration of relationships between variables using a directed acyclic graph (DAG).

For example, we may think that a person’s weight affects their blood cholesterol level.

But perhaps we realize that diet may also affect both characteristics.

Here, diet is explicitly acknowledged as a confounding variable. It must be considered in any research design attempting to speak to the relationship between weight and cholesterol level. Furthermore, we have a clear visual to help students understand why weight and cholesterol may be correlated, but may not have a causal relationship.
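The confounding story in this DAG can be made concrete with a small simulation, in which diet drives both weight and cholesterol and there is no causal arrow between the latter two – the coefficients and units are made up for illustration:

```python
import random

random.seed(3)

# Simulate the DAG: diet -> weight, diet -> cholesterol, and NO arrow
# from weight to cholesterol (hypothetical units and coefficients).
n = 5_000
diet = [random.gauss(0, 1) for _ in range(n)]                # diet quality score
weight = [70 - 8 * d + random.gauss(0, 3) for d in diet]     # kg
chol = [190 - 25 * d + random.gauss(0, 10) for d in diet]    # mg/dL

def corr(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Weight and cholesterol come out strongly correlated even though neither
# causes the other - diet, the confounder, drives both.
print(round(corr(weight, chol), 2))
```

The simulated correlation is strong and positive, yet by construction intervening on weight would do nothing to cholesterol – exactly the distinction the DAG makes visible.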

Unfortunately, I have not yet found a simple explanation I feel is appropriate for introductory students. This may be a future project of mine. In the meantime, I personally enjoyed reading about the history of causal inference in a paper by Freedman (1999), the first few sections of which may be accessible and informative.

References and further reading:

Freedman, D. (1999). From association to causation: Some remarks on the history of statistics. Journal de la société française de statistique, 140(3), 5–32.

Gal, I. (2002). Adults’ statistical literacy: Meanings, components, responsibilities. International Statistical Review, 70(1), 1–25.

Neyman, J. (1923 [1990]). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4), 465–472. Trans. Dorota M. Dabrowska and Terence P. Speed.

Pearl, J. (2009). Causal inference in statistics: An overview. Statistics surveys, 3, 96-146.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.

Anyone can do statistics

My favourite Pixar movie is Ratatouille. Chef Gusteau’s motto is “tout le monde peut cuisiner”, or, “anyone can cook”. Later in the movie, the critic Anton Ego, voiced by Peter O’Toole, writes, “Not everyone can become a great artist, but a great artist can come from anywhere.”

Similarly, I believe that anyone can do statistics, or, that not everyone can become a statistician, but a statistician can come from anywhere.

I recently attended a talk where a prominent statistician and data analyst told the story of how she came into the field. It was entirely due to her instructors explicitly encouraging her and telling her that she had potential in the field.

While not all of our students will become statisticians, or major in statistics, or even master the material we attempt to teach them, we must encourage them, keep an open mind, and instill in them a positive identity as statisticians or data scientists/analysts (Cobb & Hodge, 2002). Our students will never believe in themselves if we do not believe in them first.

I came home from the talk and sent an email to my students from last semester to encourage them to consider furthering their skills, to explicitly state that I believe they have potential as data analysts, and to attempt to instill in them an identity as statisticians.

My message to them: Not only can you do statistics, you have done statistics. You are a statistician.


References and further reading:

Cobb, P., & Hodge, L. (2002). Learning, identity, and statistical data analysis. In Sixth International Conference on Teaching Statistics (ICOTS6), Cape Town, South Africa.

Lewis, B. (Producer), & Bird, B. (Director). (2007). Ratatouille [Motion Picture]. United States: Pixar.