Probability versus Likelihood

I sort of understand that you can’t state a probability for whether or not the population mean is in the interval because it either is or it isn’t (or at least I read that, but I don’t think I really understand it). If there is a 90% chance that the interval contains the population mean, I’m not really clear as to why this is not also the probability because doesn’t that mean there is a 10% chance that it does not contain the population mean, giving you a 0.9 probability that it does contain it? 

One of my students

The answer, somewhat unfortunately/unintuitively/confusingly, lies in semantics. The population mean is some value – we just don’t know what it is. Therefore, there’s no random chance about it. It straight up is equal to some value. And here’s where more confusion comes into play.

Probability and Likelihood are two different concepts. They’re basically opposite directions on the same two-way street. When we know population parameters and we’re examining outcomes from the population, that is probability. When we know a sample and we’re examining population parameters, that is likelihood. So, you can say something like ‘given that the mean of a distribution is 15, the probability that a random sample of size 12 has a sample mean greater than 18 is 0.342 (made up number)’. And, you can say something like, ‘given that we observed a sample mean of 42 from 20 observations, the likelihood that the true population mean is greater than 40 is 0.712 (made up number)’. The difference is super subtle and, I’ll admit, probably doesn’t really matter at the end of the day. The authors of the textbook pull a tricky one by using the word ‘sure’ and contrasting it with probability without making the difference explicit (and it absolutely is not intuitive). But this is why, if we’re going by the book, we can’t say something like ‘there’s a 90% chance the true mean is in our interval’: it’s not even in the realm of probability to discuss the behaviour of population parameters given observed samples. We have to say things like ‘it’s 90% likely that the true mean is in our interval’.

Again, I admit, does this distinction really make a difference, a practically significant difference? No, probably not (no pun intended). But if you understand that using population parameters to describe the behaviour of samples and using observed samples to talk about population parameters are two different things, kind of like opposite directions on the same two-way street, then you’re in good shape. It’s just a matter of knowing which side of the street you’re driving on, and what the specific verbiage/jargon is to use on that side of the street.
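The two directions of the street can be sketched in a few lines of Python. This is only an illustration under assumptions that are mine, not the textbook’s: the population is treated as normal with a known standard deviation of 6 (a made-up value), and the function names are hypothetical. The first function starts from a known parameter and makes a statement about samples; the second holds the observed sample mean fixed and scores candidate values of the population mean.

```python
from statistics import NormalDist

# Assumption for illustration only: a known population standard deviation.
ASSUMED_POP_SD = 6.0


def prob_sample_mean_exceeds(threshold, pop_mean, n, pop_sd=ASSUMED_POP_SD):
    """Probability direction: the parameter is known, the sample is random."""
    standard_error = pop_sd / n ** 0.5
    sampling_dist = NormalDist(pop_mean, standard_error)
    return 1.0 - sampling_dist.cdf(threshold)


def likelihood_of_mean(candidate_mean, observed_mean, n, pop_sd=ASSUMED_POP_SD):
    """Likelihood direction: the sample is known, the parameter is the unknown."""
    standard_error = pop_sd / n ** 0.5
    return NormalDist(candidate_mean, standard_error).pdf(observed_mean)


# Given a population mean of 15, how probable is a sample mean above 18 (n = 12)?
p = prob_sample_mean_exceeds(18, pop_mean=15, n=12)
print(f"P(sample mean > 18 | mean = 15) = {p:.3f}")

# Given an observed sample mean of 42 (n = 20), compare candidate population means.
for candidate in (38, 40, 42, 44):
    value = likelihood_of_mean(candidate, observed_mean=42, n=20)
    print(f"likelihood of mean = {candidate}: {value:.4f}")
```

Under these assumptions, the candidate closest to the observed sample mean scores highest. The likelihood values in this sketch are meant for comparing candidates with one another.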

Choosing a curriculum – Part 2

We have a collection of introductory statistics textbooks in my office, mostly written within the last 10 years, many of which were written by researchers in statistics education. I decided to go through them one by one, to help me shape my thoughts and inform the selection of a new textbook.

Chance Encounters – a first course in data analysis and inference by C.J. Wild and G.A.F. Seber: I liked the way the authors introduced datasets, where they come from, and what they represent. However, I didn’t want to discuss probability at all, nor provide a mathematical treatise of the central limit theorem. I wanted to take a more intuitive approach.

Introductory Statistics – exploring the world through data by R. Gould and C. Ryan: I liked the guided activities included in the textbook. However, I still struggle with how to translate such activities to the distance-based environment. Furthermore, the book still focused on z-values and t-values, something I did not want to teach.

Activity-based Statistics by R.L. Scheaffer, A. Watkins, J. Witmer, and M. Gnanadesikan: I really liked this textbook, and the activity-based pedagogy. However, I did not know how to translate this to a completely distance-based environment, which I think is still an open research question in statistics education.

Statistics – the art and science of learning from data by A. Agresti and C. Franklin: I really liked the section introducing data, which had a similar feel to Chance Encounters. Again, however, I did not want to have my students calculate t scores or z scores during inference.

Statistics – unlocking the power of data by R.H. Lock, P.F. Lock, K.L. Morgan, E.F. Lock, and D.F. Lock: I really liked the simulation-based approach to teaching inference, and the use of the free software StatKey. Furthermore, the order of the chapters almost perfectly aligned with my vision of the course.

After examining these five textbooks, the Lock5 textbook stood out to me as the most interesting. I briefly investigated other simulation-based textbooks, but decided to stay with the Lock5 book, and our new curriculum was in place!

Teaching sampling variability

To my mind, the most important foundational concept to statistical inference is an appreciation of sampling variability. Chance, del Mas, & Garfield (2004) lay out a vision of what students need to understand and what they should be able to do with that understanding. However, I wouldn’t reach as far as they did. I believe that the core understanding rests on only a small subset of their list.

The main understanding students should have, in my opinion, is that given a population parameter, some values of a sample statistic are more likely than others to be the result of a sample from that population. This should manifest in a student’s ability to make statements about how far a sample statistic is likely to vary from a population parameter, and vice versa.

Developing such an understanding in students is no trivial matter. There seems to be consensus in the statistics education research community that the use of simulations can help develop students’ understanding of sampling variability (Garfield, et al., 2008).

I particularly like an activity designed by Scheaffer, et al. (1996) called What is a confidence interval anyway?. The instructor resources present a scatterplot relating population proportions to their likely sample proportions (Figure 8, page 274).

Printed below is an adaptation of this scatterplot demonstrating how a student might use it to determine that the likely values of a population proportion are between approximately 65% and 75% after determining that their sample proportion is 0.70 from a sample of size 100.

I particularly like this tool as I believe it helps to frame the idea of inference quite nicely. We never know what the true population parameters are. However, the theory of sampling distributions tells us something about how sample statistics behave in relation to those parameters.

Each of the vertical bars represents the likely sample proportions we might get when we sample from a population with the given population proportion. When we take only one sample, we can never know for sure the exact value of the population parameter, but certain options begin to look increasingly unlikely. Use of this scatterplot may guide students into a more multiplicative conception of a sample (Saldanha & Thompson, 2002).
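The logic of reading the scatterplot can be sketched in code. This is my own rough reconstruction, not the procedure from the instructor resources: it assumes each bar covers the central 90% of sample proportions under a binomial model, steps through population proportions in increments of 0.01, and uses hypothetical function names.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n draws with success chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def likely_counts(p, n, coverage=0.90):
    """Central range of success counts holding roughly `coverage` of all samples."""
    tail = (1 - coverage) / 2
    cumulative = 0.0
    low = None
    for k in range(n + 1):
        cumulative += binomial_pmf(k, n, p)
        if low is None and cumulative >= tail:
            low = k  # lower end of the bar
        if cumulative >= 1 - tail:
            return low, k  # upper end of the bar
    return low, n  # guard against floating-point shortfall


def plausible_proportions(observed_count, n, coverage=0.90):
    """Population proportions whose bar of likely samples covers the observed count."""
    plausible = []
    for percent in range(101):
        low, high = likely_counts(percent / 100, n, coverage)
        if low <= observed_count <= high:
            plausible.append(percent / 100)
    return plausible


# A sample proportion of 0.70 from a sample of size 100, as in the example above.
values = plausible_proportions(observed_count=70, n=100)
print(f"plausible population proportions: {min(values):.2f} to {max(values):.2f}")
```

The bars in the printed figure may have been built with a different coverage level, so the endpoints from this sketch need not match a reading of the scatterplot exactly.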

I believe such an activity can help improve students’ ability to make statements about how far a sample statistic is likely to vary from a population parameter, and vice versa. However, by focusing on only this one learning objective, as opposed to the full list of recommendations by Chance, et al. (2004), would I be doing a disservice to our students in their future work and studies in statistics, or will this indeed provide a sufficient foundation for them to become statistically literate?

References and further reading:

Chance, B., del Mas, R., & Garfield, J. (2004). Reasoning about sampling distributions. In The challenge of developing statistical literacy, reasoning and thinking (pp. 295-323). Springer, Dordrecht.

Garfield, J. B., Ben-Zvi, D., Chance, B., Medina, E., Roseth, C., & Zieffler, A. (2008). Learning to Reason About Statistical Inference. In Developing Students’ Statistical Reasoning (pp. 261-288). Springer, Dordrecht.

Saldanha, L., & Thompson, P. (2002). Conceptions of sample and their relationship to statistical inference. Educational studies in mathematics, 51(3), 257-270.

Scheaffer, R.L., Watkins, A., Gnanadesikan, M., Witmer, J.A. (1996). What Is a Confidence Interval Anyway? In Activity-Based Statistics: Instructor Resources (pp. 274-278). Springer, New York, NY.

Teaching data collection

I recently attended a presentation on the Island and its use in a statistics class. The Island was developed by Dr Michael Bulmer, and is a simulated environment where students can practice data collection and study design.

We want our students to be able to use the statistical tools we teach in order to make data-informed decisions, which means they will need to collect data first. Unfortunately, we typically hand students ready-made datasets in our classrooms. An appreciation of the data creation process can help to improve students’ reasoning when applying statistical analyses to the data (McClain & Cobb, 2001).

In order to develop this appreciation for data collection, I have begun brainstorming activities for students using the Island, such as the following activity:

  • Select a sample of 10 different people. Select one person randomly from 10 different cities. You can select from any 10 cities you wish. To select your sample, choose an island, then choose a city, then randomly choose a house, and randomly choose a person in that house.
  • Record which island they live on, which city they live in, their house number, their name, their age, their gender, their systolic blood pressure, and their cholesterol level.
  • Summarize the distribution of the following traits of people in your sample: age, gender, systolic blood pressure, cholesterol level
  • Share your dataset and your summaries with your classmates.

In having students share their results, we also create a window of opportunity to discuss sampling variability, one of the cornerstone topics of any course in statistics. I am looking forward to experimenting with this resource and evaluating its efficacy in developing students’ appreciation of a dataset.
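The sampling steps in the activity, and the variability that shows up when classmates compare results, can be mimicked with a toy population. Everything here is hypothetical: the nested structure, the names, and the generated ages and blood pressures are invented for illustration and have nothing to do with the Island’s actual data or interface. As a simplification, the sketch also picks the 10 cities at random, whereas the activity lets students choose any 10 cities they wish.

```python
import random
from statistics import mean


def build_toy_islands(rng, n_islands=3, cities_per_island=6, houses_per_city=25):
    """Create a small made-up population: island -> city -> house -> residents."""
    islands = {}
    for i in range(n_islands):
        cities = {}
        for c in range(cities_per_island):
            houses = []
            for _ in range(houses_per_city):
                residents = [
                    {"age": rng.randint(1, 90), "systolic": round(rng.gauss(120, 15))}
                    for _ in range(rng.randint(1, 5))
                ]
                houses.append(residents)
            cities[f"City {i}-{c}"] = houses
        islands[f"Island {i}"] = cities
    return islands


def draw_sample(islands, rng, n_cities=10):
    """One person from each of n_cities distinct cities: city, then house, then person."""
    all_cities = [(isl, city) for isl, cities in islands.items() for city in cities]
    records = []
    for island, city in rng.sample(all_cities, n_cities):
        houses = islands[island][city]
        house_number = rng.randrange(len(houses))  # randomly choose a house
        person = rng.choice(houses[house_number])  # randomly choose a resident
        records.append({"island": island, "city": city, "house": house_number, **person})
    return records


population = build_toy_islands(random.Random(2019))

# Three "students" sample the same population and compare their summaries.
for student_seed in (1, 2, 3):
    sample = draw_sample(population, random.Random(student_seed))
    mean_age = mean(r["age"] for r in sample)
    mean_systolic = mean(r["systolic"] for r in sample)
    print(f"student {student_seed}: mean age {mean_age:.1f}, mean systolic {mean_systolic:.1f}")
```

Each student draws from the same population yet ends up with a different dataset and, typically, different summaries, which is the opening for a discussion of sampling variability.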

References:

McClain, K., & Cobb, P. (2001). Supporting students’ ability to reason about data. Educational Studies in Mathematics, 45(1-3), 103-129.

Choosing a curriculum – Part 1

With the green light to make as dramatic a proposal as I desired, I set to considering a new syllabus accompanied by general learning objectives in February 2019. My guiding principles were based on what I had recently learnt about statistics education research, my experiences teaching HSCI 2117, and my experience working as a statistician in the health sciences. I knew I wanted to make decisions supported by academic research, but before I went hunting for recommendations in the research literature, I sat down to plot out where my current visions lay.

New statistical reasoning course objectives

I wanted my students to walk out of my course with knowledge and skills that would allow them to use data to inform decisions. This meant I needed them to be comfortable asking a question, collecting relevant data, summarizing and describing data, and using data to make inferences. Specific modules could focus on:

  • measurement – focusing on validity and data collection
  • types of variables (categorical, ordinal, discrete quantities, continuous ratio scales) – the backbone to future instruction, as each of the tools I would introduce can be mapped to a subset of these four types in univariate and bivariate cases
  • summary statistics and data visualizations
  • modelling – focusing on ‘naming’ identified patterns from summaries and descriptions
  • inference – focusing on sampling variability, likelihood as a measure, and interval estimation
  • hypothesis testing – introducing both null hypothesis testing and also comparing two hypotheses or models in significance testing

While this helps me to decide on content, the specific reasoning skills that I could stress would be reasoning about data, reasoning about variability, and reasoning about inference. While we do touch on modelling, I feel that models and modelling require too much attention to include in addition to setting a solid foundation for inference in general.

Out with the old and in with the new

Furthermore, the manner in which I wanted to teach these materials was vastly different from the old standard curriculum. It is 2019 – I do not need my students to know how to use a normal probability table. No one does that any more. As such, I decided on the following changes in content:

  • Exclude by-hand calculations – Although one could argue that working out equations is the only way to truly understand them, I wanted this course to be a statistical literacy course. What real value is there in a student being able to calculate the standard deviation of a dataset by hand? I believe the answer is ‘none’.
  • Exclude probability theory and probability distributions – I knew this was a recommendation of GAISE 2016, and it was something that students often struggled with. In the consensus curriculum, I think it’s introduced in order to facilitate calculations of test statistics by hand, a need now obviated.
  • Exclude the critical value approach and test statistics – although understanding sampling distributions is arguably an essential aspect of statistical literacy, I do not believe it is worth the trouble in an undergraduate introductory course. It is more important to develop students’ understanding of sampling variability, which can be done without specifically addressing sampling distributions at all. Since practicing statisticians rely on p-values and confidence intervals, I decided to go with a combination of the p-value and the confidence interval approach, further diminishing the value of introducing critical values and test statistics.
  • Exclude the one-sample Z-test for a proportion and the two-sample Z-test for a difference in proportions – these tests are approximations of exact tests and were prevalent in an age before computers. If we are adopting a p-value and confidence interval approach, then the choice of the test we instruct is irrelevant. Why not just teach the exact tests?
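To make the last point concrete, here is a minimal sketch comparing the one-sample Z-test for a proportion with an exact alternative. I am assuming the exact binomial test as that alternative, with a two-sided p-value defined as the total probability of outcomes no more likely than the one observed; the function names and the counts (7 successes in 20 trials against a null proportion of 0.5) are made up for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def exact_binomial_p_value(successes, n, null_p):
    """Two-sided exact p-value: sum over outcomes no more likely than the observed one."""
    observed = binomial_pmf(successes, n, null_p)
    cutoff = observed * (1 + 1e-7)  # small tolerance for floating-point ties
    total = 0.0
    for k in range(n + 1):
        prob = binomial_pmf(k, n, null_p)
        if prob <= cutoff:
            total += prob
    return min(1.0, total)


def z_test_p_value(successes, n, null_p):
    """Two-sided p-value from the normal approximation (no continuity correction)."""
    standard_error = sqrt(null_p * (1 - null_p) / n)
    z = (successes / n - null_p) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: 7 successes in 20 trials, null proportion 0.5.
print(f"exact test p-value: {exact_binomial_p_value(7, 20, 0.5):.4f}")
print(f"Z-test p-value:     {z_test_p_value(7, 20, 0.5):.4f}")
```

With these made-up counts the two p-values differ noticeably, which is the kind of gap the approximation can leave at small sample sizes. From the student’s point of view, both are simply read as p-values.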

Preparing for the hunt for a new textbook

Satisfied with my proposed changes, I thought about how I would operationalize instruction for each of my major learning objectives, and began to search for a new textbook that would align with these values.

References and further reading:

GAISE (2016). Guidelines for assessment and instruction in statistics education. College report. Alexandria, VA: American Statistical Association

Garfield, J., & Ben-Zvi, D. (2008). Developing students’ statistical reasoning: Connecting research and teaching practice. Springer Science & Business Media.

Deciding course content

Although I view 2117 as a statistics class, its title, Introduction to Statistics for the Health Sciences, reminds me that it is statistics situated within a specific context. 3117’s title, Principles of Biostatistics, makes this even clearer.

As such, I want the specific statistical skills students walk out of the course with to be well-aligned with the tasks they might be asked to do in their careers. How should I decide on content for the course and how do I anticipate what students will need?

I have professional experience as a statistician in the health sciences, and largely drew upon that experience in the past to decide on relevant content for this course. However, I wanted a stronger evidence base for my decisions. Luckily, a recent study by Hayat, et al. (2017) sampled published papers in public health journals. I could use this to decide the typical basket of tools my students might need to be familiar with.

Hayat, et al. (2017) found the following epidemiological terms common:

  • Prevalence
  • Relative Risk
  • Odds Ratio
  • Incidence
  • Mortality
  • Hazard Ratio

Hayat, et al. (2017) found the following statistical tests common (including p-values and confidence intervals):

  • T-test
  • Chi-squared Test / Exact Test (presumably for contingency tables)
  • Correlation tests
  • Non-parametric tests

Hayat, et al. (2017) found the following statistical models common:

  • ANOVA
  • Linear Regression
  • Logistic Regression
  • Poisson Regression
  • Cox Proportional Hazards Regression
  • Generalized Linear Mixed Models

These results gave me a frame from which I could choose content for 2117 and 3117. I will introduce prevalence, relative risk, odds ratios, and incidence in 2117, and introduce mortality and hazard ratios in 3117. I will introduce p-values, confidence intervals, the T-test, the Chi-squared test, and correlation tests in 2117, and introduce non-parametric tests in 3117. I will introduce ANOVA and linear regression in 2117, and generalized linear models in 3117.

While I certainly don’t want to automatically perpetuate the status quo, if I wish my students to be statistically literate, especially in the field of the health sciences, then they must be familiar with these methods. However, I struggle to balance this traditional base against exposing my students to more modern methods. It seems unlikely that many will take more statistics courses in their lives. Is it my responsibility to seize this opportunity to expose them now, at the risk of leaving them without common skills prevalent in their field?

Nevertheless, with these content learning objectives in place, I could now move on to the specific statistical reasoning learning objectives and content sequencing.

References and further reading:

Hayat, M. J., Powell, A., Johnson, T., & Cadwell, B. L. (2017). Statistical methods used in the public health literature and implications for training of public health professionals. PloS one, 12(6), e0179032.

The initial call for change

In 2018, the research core curriculum director (my boss at GWU) and I decided we wanted to attempt a course re-design. She had for some time wanted to develop a course that would scaffold students through material to better guide and assess student learning and development through each module.

We conceived splitting each module (one per week) into three phases:

  • 1) introduction – reinforcing previously learnt material, and ensuring students were comfortable with the base ideas that would be developed in the module.
  • 2) knowledge development – guided learning through a series of lectures and activities.
  • 3) reinforcement of the main ideas and review

Each of the phases would be made up of a different balance of lessons, collaborative activities, assigned readings, and assessments (mini-quizzes). This initial conception kept the status quo curriculum in place, and simply was a change to the learning environment.

However, neither of us was perfectly happy with the current curriculum. I had been making small changes to the course with increasing frequency – rewriting assignments, rerecording lessons, and rewriting test questions – as I gained experience and became more familiar with statistics education research literature. This resulted in the fall 2018 version of the course being almost unrecognizable compared to the fall 2016 version.

Fueled by the creative opportunity provided by our re-design initiative, as a thought exercise we decided to imagine what we wanted to teach with as close to a carte blanche as feasible.