The Pedagogically Optimal Way to Learn Math

by Justin Skycak

The underlying principle that it all boils down to is deliberate practice.

Cross-posted from here.

A lot of people fall into the trap of thinking that to train up their math skills, they should focus on the hardest types of problems: the kind where they think really hard, struggle for a while, and eventually solve the problem or look up the answer. These problems can be fun (for a certain type of person), but they’re not an efficient way to learn.

Approaching challenging problems without having the foundational skills down pat is like jumping into a game of basketball without having developed dribbling and shooting skills. It might feel fun but you’re just going to be whiffing every shot and getting the ball stolen from you. You might make one layup the entire game & feel good about it, but that’s barely any training volume.

It’s like going to the gym to lift weights but only eking out a single rep over the entire course of your workout. You need to be banging out more reps if you want to get stronger, and the only way you can bang out those reps is by working with a level of weight that’s appropriate for you.

Math is the same way. In an hour-long session, you’re going to make a lot more progress by solving 30 problems that each take 2 minutes given your current level of knowledge, than by attempting a single competition problem that you struggle with for an hour. (This assumes those 30 problems are grouped into minimal effective doses, well-scaffolded & increasing in difficulty, across a variety of topics at the edge of your knowledge.)

Now, I’m not saying that challenge problems are bad. I’m just saying that they’re not a good use of time until you’ve developed the foundational skills that are necessary to grapple with these problems in a productive and timely fashion.

Some References

I should point out that while I’ve made heavy use of analogies here, this is all grounded in decades of research into the cognitive science of learning.

One key empirical result is the expertise reversal effect, a well-replicated phenomenon in which the instructional techniques that promote the most learning in experts promote the least learning in beginners, and vice versa. It’s true that many highly skilled professionals spend a lot of time solving open-ended problems, and in the process, discovering new knowledge as opposed to obtaining it through direct instruction. But that doesn’t mean beginners should do the same. The expertise reversal effect suggests the opposite – that beginners (i.e., students) learn most effectively through direct instruction.

Additionally (and relatedly), there’s a mountain of empirical evidence that you can increase the number of examples & problem-solving experiences in a student’s knowledge base – but a lack of evidence that you can increase the student’s ability to generalize from those examples (by doing things other than equipping them with progressively more advanced examples & problem-solving experiences).

In other words, research indicates the best way to improve your problem-solving ability in any domain is simply by acquiring more foundational skills in that domain. The way you increase your ability to make mental leaps is not by learning to jump further, but by building bridges.

So, there does not seem to be any tangible, empirically-supported reason to struggle with a challenge problem for a long period of time, when you consider that you could be making more educational progress using that time to learn more content. For instance, in an hour-long session, you’re going to make a lot more progress by solving 30 problems that each take 2 minutes given your current level of knowledge, than by attempting a single competition problem that you struggle with for an hour. (This assumes those 30 problems are grouped into minimal effective doses, well-scaffolded & increasing in difficulty, across a variety of topics at the edge of your knowledge.)

(Note: When I say “2 minute problems” I don’t mean that as a hard cap. I just mean that you come into the problem with enough background knowledge that you can work 100% productively to complete it in a minimal amount of time. For low-level high school math courses, the minimal amount of time might be lower for many problems, like 1 minute or less. For more theoretical courses in late undergrad, the minimal amount of time might rise to, say, 5-10 minutes for some problems. But I can’t imagine a scenario in which a student spends a full hour on a non-research problem while working productively the whole time, unless the solution itself is commensurately long – e.g., the solution requires writing a chunk of code to implement an algorithm, or there’s a proof that requires many cases/sub-proofs, or there are otherwise many different parts to the problem that require solving.)

As Sweller, Clark, and Kirschner sum it up in their 2010 article Teaching General Problem-Solving Skills Is Not a Substitute for, or a Viable Addition to, Teaching Mathematics:

  • "Although some mathematicians, in the absence of adequate instruction, may have learned to solve mathematics problems by discovering solutions without explicit guidance, this approach was never the most effective or efficient way to learn mathematics. ... In short, the research suggests that we can teach aspiring mathematicians to be effective problem solvers only by providing them with a large store of domain-specific schemas. Mathematical problem-solving skill is acquired through a large number of specific mathematical problem-solving strategies relevant to particular problems. There are no separate, general problem-solving strategies that can be learned."

Another good reference is Putting Students on the Path to Learning: The Case for Fully Guided Instruction by the same authors in 2012. (It’s like an expanded version of the 2010 article.)

Deliberate Practice

In the field of talent development, there is absolutely no debate about which form of training is superior. It’s deliberate practice: mindful repetition on performance tasks just beyond the edge of one’s capabilities.

Deliberate practice is about making performance-improving adjustments on every single repetition. Any individual adjustment is small and yields a small improvement in performance – but when you compound these small changes over a massive number of action-feedback-adjustment cycles, you end up with massive changes and massive gains in performance.

Deliberate practice is superior to all other forms of training. That is a “solved problem” in the academic field of talent development. It might as well be a law of physics. There is a mountain of research supporting the conclusion that the volume of accumulated deliberate practice is the single biggest factor responsible for individual differences in performance among elite performers across a wide variety of talent domains. (The next biggest factor is genetics, and the relative contributions of deliberate practice vs genetics can vary significantly across talent domains.)

So, how can you engage in deliberate practice?

What you need to do is avoid the two failure modes – that is, cutting corners on either of the two key attributes of deliberate practice (“mindful” and “repetition”).

Deliberate practice is not mindless repetition. If you’re doing the same thing over and over again, then you’re doing deliberate practice wrong. Deliberate practice is about making performance-improving adjustments on every single repetition. Any individual adjustment is small and yields a small improvement in performance, but when you compound these small changes over a massive number of cycles, you end up with massive changes and massive gains in performance. None of this happens if you’re mindlessly doing the same thing over and over again without making adjustments.

Likewise, even if you’re mindful during practice, you can’t skimp on repetition and still call it “deliberate practice.” Deliberate practice necessitates a high volume of action-feedback-adjustment cycles in every single training session. Otherwise, the compounding doesn’t happen. Any activity that throttles the number of these cycles cannot be described as deliberate practice.
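To make the compounding argument concrete, here is a toy model in Python. It is only an illustrative sketch: the function name `skill_after`, the 0.1% gain per cycle, and the session counts are all assumptions picked for illustration, not figures from the research.

```python
def skill_after(sessions: int, cycles_per_session: int,
                gain_per_cycle: float = 0.001) -> float:
    """Relative skill level after training, starting from 1.0.

    Toy assumption: every action-feedback-adjustment cycle multiplies
    skill by (1 + gain_per_cycle). Real learning is not this tidy.
    """
    total_cycles = sessions * cycles_per_session
    return (1.0 + gain_per_cycle) ** total_cycles


# Same 20 sessions, different volume of cycles per session.
high_volume = skill_after(sessions=20, cycles_per_session=30)  # many short problems
low_volume = skill_after(sessions=20, cycles_per_session=1)    # one long struggle

print(f"30 cycles per session: {high_volume:.2f}x")
print(f" 1 cycle per session:  {low_volume:.2f}x")
```

Under these made-up numbers, the high-volume schedule ends up far ahead even though each individual adjustment is tiny. The point is the shape of the growth, not the specific values.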

Many heated debates in math education stem from these misinterpretations of deliberate practice. Mindless repetition, doing the same thing over and over again without making performance-improving adjustments, is not deliberate practice. Likewise, any activity that throttles the volume of action-feedback-adjustment cycles (e.g., excessively challenging problems, or think-pair-share type of stuff) is not deliberate practice.

Worked Examples are Needed

Math gets hard for different students at different levels – it can be as early as high school algebra or as late as graduate-level algebraic topology – but everyone eventually reaches a level where things no longer feel obvious and they can’t figure things out as quickly on the fly. If you’re not yet at the edge of human knowledge, that’s where worked examples and instructional scaffolding come in to keep you making fast progress.

Without worked examples, learners eventually reach a point where unguided problem-solving overwhelms their working memory and puts them in a state of cognitive overload. They feel frustrated and confused, and they are unable to solve problems. And even before that point, even if a student is able to solve problems successfully without worked examples, it typically takes a lot longer, which throttles the volume of practice.

If you don’t have worked examples and instructional scaffolding to help carry you through once math becomes hard for you, then every problem basically blows up into a research project for you. That’s okay if you’re a research mathematician (or maybe late-stage graduate student) at the edge of your field, but if you’re a student who still has a ways to go before reaching the edge of human mathematical knowledge, then it’s just less efficient, even if you have fun with it.

If you want to maximize how far you get in a talent domain, then you need to grab all the examples & problem-solving experiences in the direction that you’re going, as quickly as possible. Only once you reach the end of the road with known examples and problem-solving experiences do you switch over to creative production. Creative production is a far less efficient way of moving forward, so you want to save it for the end, when it’s the only way to continue moving forward.

(To be clear: this is not just my opinion, this is based on findings from Benjamin Bloom’s research on talent development. This is the same Bloom who created Bloom’s taxonomy, which many educators use as justification for having students spend a lot of time working on projects – but that is actually misinterpreting Bloom’s taxonomy in a way that is not aligned with Bloom’s seminal work that came later in his career. I’ve written more about that here.)

But how can you develop deep understanding if you use worked examples?

First of all: I’m not advocating for mindless mimicry here. There should be plenty of practice on theoretical foundations, including derivations, reasoning about the conditions under which a theorem might apply, etc. This is not inherently at odds with the idea of deliberate practice using worked examples & practice problems. It is trickier to construct worked examples & practice problems that meaningfully test learners on theoretical foundations, but it can be done.

Second of all: when you continually layer advanced skills on top of existing skills, it forces you to deepen your understanding, really internalizing those existing skills and the ideas behind them. This is basically the idea of “structural integrity” in the context of knowledge.

  • When advanced features are built on top of a system, they sometimes fail in ways that reveal previously-unknown foundational weaknesses in the underlying structure. This forces engineers to fortify the underlying structure so that the system can accommodate new elements without compromising its integrity.
  • Fortifying the underlying structure often requires improving its organization and elegance, which, in the context of student knowledge, produces deep understanding and insight.
  • When the structural integrity of a system is increased, it also becomes easier to add more advanced features in general. In the same way, when the structural integrity of a student’s knowledge is increased, it becomes easier to assimilate new knowledge in general.

When you layer on advanced skills, it forces understanding. If there’s some level of understanding you’re lacking in some component knowledge, you eventually get to a point where the lack of understanding prevents you from successfully learning more advanced skills.

For instance, there’s no way that you can get through a legitimate calculus course without having a real understanding of algebra. In fact, many math students will tell you that calculus is what really deepened their understanding of algebra.

For anyone who suggests that this answer does not apply to advanced mathematics (analysis, topology, abstract algebra, etc.)...

… I recognize that there’s barely any empirical research into education at this level of math. But are you aware of any research where deliberate practice was not found to be the most efficient practice technique, in any talent domain? As far as I’m aware, no counterexamples have been found (though I’d be really interested to read about any if there’s something I’ve missed).

Basically, it seems like there’s a mountain of circumstantial empirical evidence (in both talent development & cognitive science research) suggesting that the superiority of deliberate practice would carry over to the setting of advanced mathematics. So in the absence of any contradictory research, why would we expect anything otherwise?

If you’re a math professor, then I recognize that you have plenty of experience studying/teaching advanced mathematics, and if you claim that the superiority of deliberate practice doesn’t carry over into advanced math in your personal experience, then that’s a claim worth considering. But at the same time, I do have some experience myself in studying/teaching advanced mathematics (not as much as you, but some), and in my experience, the superiority of deliberate practice is fully supported.

Just to name one example: last year, I tutored a student who was taking analysis at an elite university, and each problem set consisted of those “think really hard for a long period of time” problems. Things were going the way of a train wreck: despite her best efforts, she was spinning her wheels on these problems and making very little progress. Not only was she unable to solve the problems, but also, she was not noticeably improving any supporting knowledge by trying and failing to solve them.

What I ended up doing was engaging her in deliberate practice on all of the component knowledge that was being pulled together in each problem. Something like this (the following is non-exhaustive):

  • Deliberate Practice on Definitions: I give you a mathematical object and you tell me whether it meets the definition (and why or why not). Repeat over and over again increasing in difficulty. Okay, now suppose we remove some criterion from the definition. What's an object that didn't meet the original definition but does now after dropping that criterion? Repeat over and over dropping different criteria.
  • Deliberate Practice on Theorems: I give you a scenario and you tell me whether it meets the assumptions of the theorem. If so, you tell me specifically what else you know is true about the scenario, according to the theorem. Repeat over and over again increasing in difficulty. Okay, if this is a one-way implication, tell me some scenarios where the converse does not hold.
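Here is a rough sketch of what one round of the definitions drill could look like, written as Python so the checks are explicit. Everything in it is an assumption chosen for illustration rather than what was used in the actual tutoring sessions: the choice of the metric definition, the names `check_criteria` and `meets_definition`, and the small integer sample. Note that checking a finite sample can only refute a criterion; passing the sample is evidence, not a proof.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Sequence

Distance = Callable[[int, int], int]


def check_criteria(d: Distance, points: Sequence[int]) -> Dict[str, bool]:
    """Check each criterion of the metric definition on a finite sample."""
    pairs = list(product(points, repeat=2))
    triples = list(product(points, repeat=3))
    return {
        "non_negativity": all(d(x, y) >= 0 for x, y in pairs),
        "identity": all((d(x, y) == 0) == (x == y) for x, y in pairs),
        "symmetry": all(d(x, y) == d(y, x) for x, y in pairs),
        "triangle": all(d(x, z) <= d(x, y) + d(y, z) for x, y, z in triples),
    }


def meets_definition(d: Distance, points: Sequence[int],
                     dropped: Iterable[str] = ()) -> bool:
    """True if every criterion that has not been dropped holds on the sample."""
    ignored = set(dropped)
    results = check_criteria(d, points)
    return all(ok for name, ok in results.items() if name not in ignored)


def abs_diff(x: int, y: int) -> int:
    return abs(x - y)


def squared_diff(x: int, y: int) -> int:
    return (x - y) ** 2


sample = [0, 1, 2, 3]

print(meets_definition(abs_diff, sample))       # True
print(meets_definition(squared_diff, sample))   # False: d(0, 2) = 4 > d(0, 1) + d(1, 2) = 2
# Drop the triangle inequality and the squared difference now qualifies.
print(meets_definition(squared_diff, sample, dropped={"triangle"}))  # True
```

The squared difference plays the role of the object that "didn't meet the original definition but does now" once a criterion is dropped. Swapping in other candidate functions, or dropping other criteria, gives further repetitions of the same drill.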

As soon as we took a step back from the homework problems and started doing that, she started making actual progress on her supporting knowledge. After enough cycles of deliberate practice, she’d re-attempt the homework problems, often solving them completely or at least getting a lot further before starting to spin her wheels again.

The result: her exam performance skyrocketed and she ended up finishing the course with an A.

What could have happened: without this deliberate practice intervention, she would have gotten a low grade in the course and possibly even dropped out of the major entirely.

(Note: Even after learning metacognitive techniques for thinking critically about definitions and theorems, she still required me to create deliberate practice problems for her by targeting weaknesses and selecting specific objects / scenarios to highlight key features & scaffold from easy to hard difficulty.

In general, a learner typically cannot construct their own deliberate practice experience without expert guidance since they are typically unable to construct information-rich & scaffolded practice problems, much less correctly identify their areas of weakness for targeted practice to begin with.

So, it would not be accurate to suggest that the student could carry out future deliberate practice on her own after learning metacognitive skills for parsing definitions/theorems, or that the kinds of problems originally given to her would have become effective for independent practice once those metacognitive skills were learned.)

Further Reading on Talent Development and Expertise

I’d start with K. Anders Ericsson, the academic father of deliberate practice. His work is extensive.


Oh, you probably also want to read about Bloom’s 3-stage process of talent development that describes the striking commonalities Bloom discovered when studying the backgrounds of extremely successful individuals across a wide variety of fields.

The 3-stage process is the main thesis of Benjamin Bloom’s 1985 book Developing Talent in Young People. The stages are summarized well in the 2002 paper Role of the Elite Coach in the Development of Talent by Gordon Bloom.
