How many datasets should a statistics instructor use over the course of a semester? Two competing forces are at play, and we need to strike a balance between them.

The benefit of using more than one dataset, or context, and continuing to introduce new ones, is that it fosters students’ abstraction and generalization of statistical concepts. The cost is that each additional dataset students interact with in a meaningful way adds cognitive load, and students have limited cognitive capacity. Essentially, we want students to develop a deep understanding, but we don’t want to overburden them.

For a single topic or statistical concept, research suggests that we should use between 3 and 5 different datasets and contexts. We should also space each dataset throughout the course, using it multiple times if possible. However, we should do this in such a way that students have no more than 1-2 new datasets to work with in consecutive weeks.

Based on these recommendations, here’s how I might use various datasets, represented by letters, in an 8-week course:

A similar strategy can be used in 15-week courses, or even in two-semester course sequences, which can be treated as a single 30-week course. These strategies suggest that the number of datasets a statistics instructor should use in a course is between 1.0 and 1.2 times the total number of weeks.
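To make that rule of thumb concrete, here is a minimal Python sketch of the arithmetic. The function name `dataset_count_range` is hypothetical, and rounding the upper bound down to a whole dataset is my own assumption; the rule itself only gives the 1.0 to 1.2 multiplier.

```python
def dataset_count_range(weeks: int) -> tuple[int, int]:
    """Return (low, high) whole-number dataset counts for a course.

    Applies the 1.0x to 1.2x rule of thumb to the number of weeks.
    Integer arithmetic is used so that 1.2 * weeks is computed exactly.
    """
    if weeks <= 0:
        raise ValueError("weeks must be a positive integer")
    low = weeks                 # 1.0 x weeks
    high = (12 * weeks) // 10   # 1.2 x weeks, rounded down (assumption)
    return low, high


# The three course lengths mentioned above.
for weeks in (8, 15, 30):
    low, high = dataset_count_range(weeks)
    print(f"{weeks}-week course: {low} to {high} datasets")
```

Under these assumptions, an 8-week course lands at 8 to 9 datasets, a 15-week course at 15 to 18, and a 30-week sequence at 30 to 36.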

This assumes that instructors also follow several other pedagogical practices and use the datasets in a very particular way, so the recommendation may not apply everywhere. However, this is the strategy I now use in the courses I teach, and I have found that it achieves the goals I’ve set.