Textbook Choice

When I last contemplated the curriculum I wished to adopt, there was one textbook I didn’t consider, because it was being used for a Master’s-level course in the department – Harvey Motulsky’s Intuitive Biostatistics (IB). However, that course has since switched textbooks, leaving me free to consider IB for 3117.

I requested an exam copy of the book to assess its suitability for 3117: the chapters are short, introduce important vocabulary with examples, avoid mathematical formulas, and, most importantly, cover nearly all of the content I was hoping to cover in 3117.

Of the 46 chapters in the book, 40 appeared to fit my original conception for 3117. The only content I intend to supplement is resources focusing on measurement error, randomization in experiments, simulation-based tests, and perhaps Poisson regression.

One added advantage is that the IB curriculum has been adopted for biostatistics literacy courses at three universities by expert statistics educators. Choosing IB for 3117 therefore comes with a small but excellent community of instructors to seek advice from.

Weekly Learning Objectives

Having settled on content and course learning objectives, I began to plan out weekly topics and learning objectives. I searched several textbooks for ideas on how to sequence topics, with an ulterior motive of perhaps selecting one for students.

However, none of the textbooks aligned with my combination of context and approach. I then recalled research in statistics education stating that the order of topics probably doesn’t matter all that much.

Although I wanted to make evidence-based decisions in course design, I decided to simply rely on my own experience and intuition to sequence topics. In hindsight, I might have reached out to other biostatistics instructors I know for their courses’ weekly learning objectives.

Once I finished sequencing topics, I used Bloom’s Taxonomy to help me create weekly learning objectives. Because of the literacy focus, I used the verbs Identify, Explain, Describe, and Evaluate repeatedly.

As a final check of alignment, I mapped each of my weekly objectives to the four course objectives. There was relative balance, with each course objective mapped to between 14 and 17 of the 47 total weekly learning objectives.

Towards a framework for context selection in statistics instruction

How many datasets should a statistics instructor use over the course of a semester? Fundamentally, there are two opposing forces between which we must strike a balance.

The benefit of using more than one dataset, or context, and of continuing to introduce new datasets, is fostering students’ abstraction and generalization of statistical concepts. The cost of each additional dataset with which students interact in a meaningful way is increased cognitive load, and students have limited cognitive capacity. Essentially, we want students to develop a deep understanding, but we don’t want to overburden them.

For a single topic or piece of statistical content, research suggests that we should use between three and five different datasets and contexts. We should also space each dataset throughout the course, using it multiple times if possible. However, we should do this in such a way that students face no more than one or two new datasets across any pair of consecutive weeks.

Based on these recommendations, here’s how I might use various datasets, represented by letters, in an eight-week course:
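Below is one hypothetical arrangement, together with a few lines of code that check it against the constraints above. The schedule itself is illustrative, not prescriptive – any arrangement satisfying the constraints would do just as well.

```python
# One hypothetical eight-week schedule: each inner list holds the
# datasets (letters) a class works with that week. Datasets recur in
# later weeks, and at most two NEW datasets appear in any pair of
# consecutive weeks.
schedule = [
    ["A"],            # week 1
    ["A", "B"],       # week 2
    ["B", "C"],       # week 3
    ["A", "C", "D"],  # week 4
    ["D", "E"],       # week 5
    ["B", "E", "F"],  # week 6
    ["C", "F", "G"],  # week 7
    ["D", "G", "H"],  # week 8
]

def check_schedule(schedule, max_new_per_pair=2):
    """Count first appearances per week, then verify that no two
    consecutive weeks introduce more than max_new_per_pair new datasets."""
    seen = set()
    new_per_week = []
    for week in schedule:
        new_per_week.append(sum(1 for d in week if d not in seen))
        seen.update(week)
    pacing_ok = all(a + b <= max_new_per_pair
                    for a, b in zip(new_per_week, new_per_week[1:]))
    return pacing_ok, len(seen)

pacing_ok, n_datasets = check_schedule(schedule)
print(pacing_ok, n_datasets)  # True, 8 datasets in 8 weeks (1.0x)
```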

A similar strategy can be used in 15-week courses, or even in two-semester course sequences, which can be treated as a single 30-week course. These strategies suggest that the number of datasets a statistics instructor should use in a course is between 1.0 and 1.2 times the total number of weeks.

This assumes that instructors follow several other pedagogical practices and utilize the datasets in a very particular way, so the recommendation may not be globally applicable. However, this is the strategy I now use in the courses I teach, and I have found that it achieves the goals I’ve set.

Probability versus Likelihood

I sort of understand that you can’t state a probability for whether or not the population mean is in the interval because it either is or it isn’t (or at least I read that, but I don’t think I really understand it). If there is a 90% chance that the interval contains the population mean, I’m not really clear as to why this is not also the probability because doesn’t that mean there is a 10% chance that it does not contain the population mean, giving you a 0.9 probability that it does contain it? 

One of my students

The answer, somewhat unfortunately (and unintuitively, and confusingly), lies in semantics. The population mean is some fixed value – we just don’t know what it is. Therefore, there’s no random chance about it. It straight up is equal to some value. And here’s where more confusion comes into play.

Probability and likelihood are two different concepts. They’re basically opposite directions on the same two-way street. When we know the population parameters and we’re examining outcomes from the population, that is probability. When we know a sample and we’re examining population parameters, that is likelihood. So you can say something like ‘given that the mean of a distribution is 15, the probability that a random sample of size 12 has a sample mean greater than 18 is 0.342’ (a made-up number). And you can say something like ‘given that we observed a sample mean of 42 from 20 observations, the likelihood that the true population mean is greater than 40 is 0.712’ (also a made-up number). The difference is super subtle, and I’ll admit it probably doesn’t really matter at the end of the day. The authors of the textbook also pull a tricky one by using the word ‘sure’ and contrasting it with probability, without making the difference explicit (and it is absolutely not intuitive). But this is why, if we’re going by the book, we can’t say something like ‘there’s a 90% chance the true mean is in our interval’ – it’s not even in the realm of probability to discuss the behaviour of population parameters given observed samples. We have to say things like ‘it’s 90% likely that the true mean is in our interval’.
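To make the probability direction concrete, here is a minimal sketch of the first statement above. The population standard deviation (5 here) is an assumption of mine, just like the made-up numbers in the quote:

```python
from math import sqrt
from scipy.stats import norm

# Probability direction: known population, question about a sample.
# Hypothetical population: mean 15, standard deviation 5 (assumed).
mu, sigma, n = 15, 5, 12

# The sampling distribution of the mean of n observations has
# standard error sigma / sqrt(n).
se = sigma / sqrt(n)

# P(sample mean > 18) under this assumed population
p = norm.sf(18, loc=mu, scale=se)
print(f"P(xbar > 18 | mu = 15) = {p:.3f}")  # about 0.019 under these assumptions
```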

Again, I admit, does this distinction really make a difference, a practically significant difference? No, probably not (no pun intended). But if you understand that using population parameters to describe the behaviour of samples and using observed samples to talk about population parameters are two different things, kind of like opposite directions on the same two-way street, then you’re in good shape. It’s just a matter of knowing which side of the street you’re driving on, and what the specific verbiage/jargon is to use on that side of the street.

Choosing a curriculum – Part 2

We have a collection of introductory statistics textbooks in my office, mostly written within the last 10 years, many of which were written by researchers in statistics education. I decided to go through them one by one, to help me shape my thoughts and inform the selection of a new textbook.

Chance Encounters – a first course in data analysis and inference by C.J. Wild and G.A.F. Seber: I liked the way the authors introduced datasets, where they come from, and what they represent. However, I didn’t want to discuss probability at all, nor provide a mathematical treatise of the central limit theorem. I wanted to take a more intuitive approach.

Introductory Statistics – exploring the world through data by R. Gould and C. Ryan: I liked the guided activities included in the textbook. However, I still struggle with how to translate such activities to the distance-based environment. Furthermore, the book still focused on z-values and t-values, something I did not want to teach.

Activity-based Statistics by R.L. Scheaffer, A. Watkins, J. Witmer, and M. Gnanadesikan: I really liked this textbook, and the activity-based pedagogy. However, I did not know how to translate this to a completely distance-based environment, which I think is still an open research question in statistics education.

Statistics – the art and science of learning from data by A. Agresti and C. Franklin: I really liked the section introducing data, which had a similar feel to Chance Encounters. Again, however, I did not want to have my students calculate t scores or z scores during inference.

Statistics – unlocking the power of data by R.H. Lock, P.F. Lock, K.L. Morgan, E.F. Lock, and D.F. Lock: I really liked the simulation based approach to teaching inference, and the use of the free software StatKey. Furthermore, the order of the chapters almost perfectly aligned with my vision of the course.

After examining these five textbooks, the Lock5 textbook stood out to me as the most interesting. I briefly investigated other simulation-based textbooks, but decided to stay with the Lock5 book, and our new curriculum was in place!

The p-value controversy – Part 2

I recently attended a conference where one of the plenary sessions included a discussion of the p-value controversy, and provided suggestions on what we should be teaching and using instead.

When I first spoke about the p-value controversy on this blog (Rao, 2019), I suggested that an understanding of the history of hypothesis testing may help clear students’ confusion. To Fisher, p-values were a measure of the likelihood that a current hypothesis or theory could explain an observed phenomenon. Only when a hypothesis was too unlikely to be a sufficient explanation of a phenomenon could it be ruled out or rejected. It was with this lens that Fisher described p-values less than 0.05 as significant, choosing 0.05 as a convenient threshold (Fisher, 1925).

Wasserstein, Schirm, and Lazar (2019) promote many alternate statistics, one of which is the effect size. They advise that careful consideration be made to determine what a meaningful effect size would be for each individual study.

However, I am concerned that we are setting ourselves up for an effect-size controversy in the future, similar to the p-value controversy. Cohen’s d (Cohen, 1988) is one of the most famous measures of effect size, and comes with convenient thresholds for small, medium, and large effects. Sullivan & Feinn (2012) summarize many different measures of effect size, and for all, include thresholds for determining the size of the effect.
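For a concrete reference point, here is a minimal sketch of Cohen’s d for two groups, with the conventional thresholds noted in the comments (the data are made up):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical outcome measurements for two groups (made-up data)
treatment = [5.1, 6.3, 5.8, 7.0, 6.1, 5.5, 6.8, 6.0]
control = [4.8, 5.2, 5.9, 5.0, 5.6, 4.9, 5.3, 5.7]

# Cohen's d: the difference in means divided by the pooled standard deviation
n1, n2 = len(treatment), len(control)
s1, s2 = stdev(treatment), stdev(control)
pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (mean(treatment) - mean(control)) / pooled_sd

# Cohen's (1988) conventions: 0.2 small, 0.5 medium, 0.8 large
print(f"Cohen's d = {d:.2f}")  # about 1.47 here, a 'large' effect
```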

If 0.2 is a small effect, and 0.5 is a medium effect, what is 0.35? Is 0.1 still a small effect, or is it no effect? What about 0.05?

In my mind, these alternate procedures do not solve the problem at the root of the p-value controversy; they just redirect it. I still don’t know of any perfect solution, but I believe Wasserstein, Schirm, and Lazar’s (2019) recommendation to be thoughtful is what we must seek to impart to our students.

My favourite summary of recommendations thus far is one by Dr Andrew Zieffler (citation listed below). I plan on requiring my students to review these slides after reading the Biau, Jolles, & Porcher (2010) article, in an attempt to help set the stage for thoughtful interaction with statistical tools.

References and further reading:

Biau, D. J., Jolles, B. M., & Porcher, R. (2010). P value and the theory of hypothesis testing: An explanation for new researchers. Clinical Orthopaedics and Related Research, 468(3), 885-892.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.

Fisher, R.A. (1925). Statistical methods for research workers. London: Oliver and Boyd.

Rao, V.N.V. (2019, March 23). The p-value controversy [Blog post]. Retrieved from https://statisticaljourneys.home.blog/2019/03/23/the-p-value-controversy/

Sullivan, G. M., & Feinn, R. (2012). Using effect size – or why the P value is not enough. Journal of Graduate Medical Education, 4(3), 279-282.

Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”. The American Statistician, 73(S1), 1-19.

Zieffler, A. (2019). Deprecating statistical significance: Toward better science [Lecture slides]. Retrieved from http://www.datadreaming.org/post/2019-04-26-slhs-prosem/

Correlation may be Causation

A common adage recited by many statistics instructors is that correlation is not causation. I am concerned that many students hear the adage, misinterpret it, and leave thinking that correlation, and by extension all of statistics, is useless in the real world.

A more precise wording is that correlation does not imply causation, and this is important to state because causation does imply correlation. My guess is that we don’t say this in classrooms because we believe students don’t yet understand what ‘imply’ means.

I believe a better way to translate this would be to say ‘correlation may be causation, but it also may not be.’ So, what can be done to ensure correlation is a sign of causation?

One answer lies in introducing students to causal inference (Neyman, 1923; Rubin, 1974). Causal inference fundamentally differs from traditional statistical inference. While statistical inference is limited to inferring relationships that exist under the specific conditions through which data were collected, causal inference explicitly attempts to infer relationships amidst changing conditions (Pearl, 2009).

Causal inference is still gaining traction in the statistics education community. Oftentimes, teachers are unfamiliar with it, and thus hesitate to teach it. I believe that counterfactual reasoning through causal inference is an important skill to teach as part of statistical literacy (Gal, 2002).

One key visual tool in causal inference is the exploration of relationships between variables using a directed acyclic graph (DAG).

For example, we may think that a person’s weight affects their blood cholesterol level.

But perhaps we realize that diet may also affect both characteristics.

Here, diet is explicitly acknowledged as a confounding variable. It must be considered in any research design attempting to speak to the relationship between weight and cholesterol level. Furthermore, we have a clear visual to help students understand why weight and cholesterol may be correlated, but may not have a causal relationship.
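A small simulation can make the picture concrete. In this hypothetical sketch, diet drives both weight and cholesterol, weight has no direct effect on cholesterol, and yet the two are strongly correlated; all coefficients are made up:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

# Hypothetical DAG: diet -> weight and diet -> cholesterol,
# with NO direct arrow from weight to cholesterol.
diet = rng.normal(0, 1, n)                            # diet quality (confounder)
weight = 70 + 5 * diet + rng.normal(0, 2, n)          # kg
cholesterol = 190 + 20 * diet + rng.normal(0, 10, n)  # mg/dL

# Weight and cholesterol are correlated purely through the confounder...
print(np.corrcoef(weight, cholesterol)[0, 1])  # roughly 0.83

# ...but adjusting for diet (correlating the residuals after regressing
# each variable on diet) removes the association.
w_resid = weight - np.polyval(np.polyfit(diet, weight, 1), diet)
c_resid = cholesterol - np.polyval(np.polyfit(diet, cholesterol, 1), diet)
print(np.corrcoef(w_resid, c_resid)[0, 1])  # roughly 0
```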

Unfortunately, I have not yet found a simple explanation I feel is appropriate for introductory students. This may be a future project of mine. In the meantime, I personally enjoyed reading about the history of causal inference in a paper by Freedman (1999), the first few sections of which may be accessible and informative.

References and further reading:

Freedman, D. (1999). From association to causation: Some remarks on the history of statistics. Journal de la Société Française de Statistique, 140(3), 5-32.

Gal, I. (2002). Adults’ statistical literacy: Meanings, components, responsibilities. International Statistical Review, 70(1), 1-25.

Neyman, J. (1923/1990). On the application of probability theory to agricultural experiments: Essay on principles, Section 9 (D. M. Dabrowska & T. P. Speed, Trans.). Statistical Science, 5(4), 465-472.

Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys, 3, 96-146.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688-701.

Anyone can do statistics

My favourite Pixar movie is Ratatouille. Chef Gusteau’s motto is “tout le monde peut cuisiner”, or, “anyone can cook”. Later in the movie, the critic Anton Ego, voiced by Peter O’Toole, writes, “Not everyone can become a great artist, but a great artist can come from anywhere.”

Similarly, I believe that anyone can do statistics, or, that not everyone can become a statistician, but a statistician can come from anywhere.

I recently attended a talk where a prominent statistician and data analyst told the story of how she came into the field. It was entirely due to her instructors explicitly encouraging her and telling her that she had potential in the field.

While not all of our students will become statisticians, or major in statistics, or even master the material we attempt to teach them, we must encourage them, keep an open mind, and instill in them a positive identity as statisticians or data scientists/analysts (Cobb & Hodge, 2002). Our students will never believe in themselves if we do not believe in them first.

I came home from the talk and sent an email to my students from last semester to encourage them to consider furthering their skills, to explicitly state that I believe they have potential as data analysts, and to attempt to instill in them an identity as statisticians.

My message to them: Not only can you do statistics, you have done statistics. You are a statistician.


References and further reading:

Cobb, P., & Hodge, L. (2002). Learning, identity, and statistical data analysis. In Sixth International Conference on Teaching Statistics (ICOTS6), Cape Town, South Africa.

Lewis, B. (Producer), & Bird, B. (Director). (2007). Ratatouille [Motion Picture]. United States: Pixar.

My favourite number

On this blog’s homepage I state “I’ve been in love with numbers for as long as I can remember.” Even when I was a toddler I never wanted to practice reciting the alphabet – I preferred reciting numbers.

Yet, one number has always stood above the rest. It is my favourite number – 7.

Origins

I think I decided that 7 ought to be my favourite number when I was relatively young. I was 7 years old when my sister was born (technically, 6 years, 10 months, 23 days, 17 hours, and 39 minutes old – and yes, I did do that calculation on the day she was born). 7 was the first jersey number I had for soccer. 7 is the sum of the digits of my birth date.

However, I remember recognizing quite early on, no later than the age of 8, that the reciprocal of 7 was the most interesting reciprocal of all the numbers from 2 to 12.

  • 1/2 = 0.5
  • 1/3 = 0.333333…
  • 1/4 = 0.25
  • 1/5 = 0.2
  • 1/6 = 0.166666…
  • 1/7 = 0.142857142857….
  • 1/8 = 0.125
  • 1/9 = 0.1111111…
  • 1/10 = 0.1
  • 1/11 = 0.090909…
  • 1/12 = 0.083333…

The reciprocals of 2, 4, 5, 8, and 10 all have finite decimal expansions. The reciprocals of 3, 6, 9, and 12 all end with a single digit repeating ad infinitum while the reciprocal of 11 ends with a repeating two-digit sequence. Yet 1/7 was in a class of its own. I didn’t know why 7 should have such a unique decimal expansion at the time, but I was captivated by it.

A growing fancy

A few years later I realized that the pattern went deeper. Comparing 1/3 (0.333…) to 2/3 (0.666…), the decimal expansions have the same form, in that a single digit repeats, but the digit that repeats is different. This holds for the fractions of 6, 9, and 12 as well, and analogously for 11, whose fractions each repeat a different two-digit block. However, the fractions of 7 do something entirely different:

  • 1/7 = 0.142857142857…
  • 2/7 = 0.  2857142857…
  • 3/7 = 0. 42857142857…
  • 4/7 = 0.    57142857…
  • 5/7 = 0.     7142857…
  • 6/7 = 0.   857142857…

I’ve aligned the decimal expansions to help identify the pattern. Each of the fractions has the same six digits in the same cyclic order, just with a different starting point! 7 was simply outlapping the other numbers in terms of mystique. Why were the same digits repeating for each fraction, and why were they in the same order?

I began my search for other numbers with this pattern, but armed only with pen and paper, or calculators that displayed just 8 to 10 digits, my search was limited.
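Today, a few lines of code do what pen, paper, and an 8-digit calculator couldn’t: long division to arbitrary precision. A minimal sketch (the function is my own):

```python
def decimal_digits(numerator, denominator, n_digits=12):
    """Digits of numerator/denominator after the decimal point, found by
    long division with exact integers (no 8-digit calculator limit)."""
    digits = []
    remainder = numerator % denominator
    for _ in range(n_digits):
        remainder *= 10
        digits.append(remainder // denominator)
        remainder %= denominator
    return "".join(map(str, digits))

for k in range(1, 7):
    print(f"{k}/7 = 0.{decimal_digits(k, 7)}...")
```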

Getting serious with number theory

I held on to this intrigue with the number 7 into college, and approached the professor of my number theory class with the question ‘why does 1/7 have such unique patterns?’. He then showed me a whole new dimension to the number 7. It was as if I were peering through Lewis Carroll’s looking glass into a hitherto unknown world of exotic beauty.

He explained that the pattern occurred because we operate in base 10, and 10 is a primitive root modulo 7. That means a string of six 9’s in a row is divisible by 7 (important since 6 = 7 − 1), i.e. 999999/7 is an integer, while no shorter string of 9’s is (9/7 is not an integer, 99/7 is not an integer, etc.). Any prime with this property is called a full repetend prime, and all full repetend primes exhibit the same properties I described above for the number 7. The first five full repetend primes are 7, 17 (meaning that a string of sixteen 9’s is the smallest string of 9’s divisible by 17), 19, 23, and 29.
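In code, 10 being a primitive root modulo p amounts to the smallest k with 10^k leaving remainder 1 when divided by p being k = p − 1. Here is a sketch of the search I could only dream of doing by hand:

```python
def is_full_repetend_prime(p):
    """True if p is prime and 10 is a primitive root modulo p, i.e. the
    smallest k with 10**k % p == 1 -- equivalently, the length of the
    shortest string of 9's divisible by p -- is k = p - 1."""
    if p in (2, 5) or any(p % d == 0 for d in range(2, int(p**0.5) + 1)):
        return False
    k, power = 1, 10 % p
    while power != 1:
        power = (power * 10) % p
        k += 1
    return k == p - 1

print([p for p in range(3, 100) if is_full_repetend_prime(p)])
# [7, 17, 19, 23, 29, 47, 59, 61, 97]
```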

He then showed me the property of 9’s, also called Midy’s Theorem. If we recall the repetend of 1/7, i.e. the part that repeats, or 142857:

  • 1+4+2+8+5+7 will be divisible by 9 (it equals 3*9)
  • 14+28+57 will be divisible by 99 (it equals 99)
  • 142+857 will equal 999

Similarly, for the repetend of 1/17, 0588235294117647:

  • 0+5+8+8+2+3+5+2+9+4+1+1+7+6+4+7 will be divisible by 9 (it equals 8*9)
  • 05+88+23+52+94+11+76+47 will be divisible by 99 (it equals 4*99)
  • 0588+2352+9411+7647 will be divisible by 9999 (it equals 2*9999)
  • 05882352+94117647 will equal 99999999

All full repetend primes have this property. We went on to discuss many other things, including discrete logarithms and other properties of cyclic numbers and prime reciprocals, and I went on to discover and play with subclasses of the full repetend primes. But I never lost my love for, or interest in, my favourite number, 7.
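For the curious, here is a short sketch that checks the 9’s property for any full repetend prime; the helper names are mine, and the repetend is computed directly from integer division:

```python
def repetend(p):
    """The repeating block of 1/p for a full repetend prime p: the first
    p-1 decimal digits, kept as a string to preserve leading zeros."""
    return str(10**(p - 1) // p).zfill(p - 1)

def has_nines_property(p):
    """Split the repetend into equal-size blocks in every possible way and
    check that each blockwise sum is divisible by 99...9 (block-size 9's)."""
    r = repetend(p)
    for size in (s for s in range(1, len(r)) if len(r) % s == 0):
        blocks = [int(r[i:i + size]) for i in range(0, len(r), size)]
        if sum(blocks) % (10**size - 1) != 0:
            return False
    return True

print(repetend(7), has_nines_property(7))    # 142857 True
print(repetend(17), has_nines_property(17))  # 0588235294117647 True
```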

Learning objectives: Literacy or computing?

I recently met two colleagues to solicit their advice on course creation, and the first thing they asked me was ‘are your students expected to run the models themselves?’

My first gut answer was ‘no’ – I want to focus on developing students’ statistical reasoning and literacy. What did the original course creators have in mind? The course description is:

Biostatistics for health science professionals. Concepts and methods, including confidence intervals, ANOVA, multiple and logistic regression, and non-parametric analyses. Scientific literature is used to provide a comprehensive context in which analytical evidence is employed to support practices in the health sciences.

The last sentence seems to imply a focus on literacy, but does ‘methods’ imply computation? The course objectives are that students in the course will learn to:

  • Apply biostatistical concepts, including: probability, distributions, confidence intervals, inference, hypothesis testing, and P-values.
  • Apply numerical, tabular, and graphical descriptive techniques to health sciences data. 
  • Conduct appropriate statistical procedures to test null hypotheses.
  • Appraise statistical results in health science research articles and reports. 

Again, ‘appraise’ strikes me as a desire for literacy, but ‘conduct’ seems a clear indicator that computing is also an expectation. However, the weekly learning objectives from the units focusing on ANOVA, multiple linear regression, and logistic regression tell a different story:

  • Interpret estimates from a one-way and two-way analysis of variance. 
  • Appraise procedures used to address the problem of multiple comparisons. 
  • Explain prediction models that are grounded in the population regression line.
  • Interpret regression coefficients from a multiple regression model. 
  • Calculate and interpret chi-square test statistics.
  • Interpret results from unadjusted logistic regression models.
  • Identify the reasons for conducting a non-parametric test.
  • Interpret estimates from non-parametric tests including the Wilcoxon Signed-Rank test, the Wilcoxon Rank Sum test and Kruskal-Wallis test. 

‘Interpret’, ‘appraise’, ‘identify’, and ‘explain’ all seem like literacy skills to me, and the only computations being asked of students are a chi-squared test statistic and, in other weeks, odds ratios, risk ratios (i.e., relative risks), and other descriptive statistics.

Perhaps this is the right place to draw the line between computation and literacy – I will ask my students to ‘compute’ descriptive statistics, but focus on literacy with regards to inference, modelling, and hypothesis testing.
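To give a sense of the scale of computation I have in mind, it is no heavier than this sketch (the 2×2 table is made up):

```python
# Hypothetical 2x2 table: exposure (rows) by disease outcome (columns)
#                 disease  no disease
a, b = 30, 70   # exposed
c, d = 15, 85   # unexposed
n = a + b + c + d

# Descriptive measures of association
risk_ratio = (a / (a + b)) / (c / (c + d))  # relative risk
odds_ratio = (a / b) / (c / d)

# Chi-squared test statistic for a 2x2 table (shortcut formula)
chi_squared = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(f"risk ratio = {risk_ratio:.2f}, odds ratio = {odds_ratio:.2f}, "
      f"chi-squared = {chi_squared:.2f}")  # 2.00, 2.43, 6.45
```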

I am curious, though, whether it is possible to focus on both literacy and computation in the same course. I imagine students becoming stressed and annoyed by debugging in whichever software we might use. Whatever software I do choose ought to be able to do all the calculations I expect students to do in 2117 and 3117, and should be used in both courses.