Chalk and Talk Podcast #42: Math Academy: Optimizing student learning

by Justin Skycak (@justinskycak) on


Link to Podcast



The best podcast about Math Academy to date. If you want to understand what we're doing but don't have time to skim our 400+ page book, this episode sums it up in just an hour.
[~5:00] What is Bloom's two-sigma problem, how did Bloom attempt to solve it, why does it remain unsolved, and what is Math Academy's approach to solving it?
[~10:00] What is mastery learning? Why is full individualization important? What is our knowledge graph and how do we use it to implement mastery learning? How do we use data to improve our curriculum?
[~21:00] Why is it so important to be proficient on prerequisite skills? How does this relate to cognitive load? You see this same phenomenon everywhere outside of math education. Jason has a "learning staircase" analogy that elegantly encapsulates the core idea.
[~26:30] Why are worked examples so important? How do we leverage them?
[~29:30] Our perspective on memorization. Yes, students need to memorize times tables (among other things). No, they should not be expected to do this before they know what multiplication means (and how to calculate it using repeated addition).
[~33:30] Our perspective on the concrete-pictorial-abstract approach -- what it's useful for, and how it often gets misapplied.
[~41:00] What is spaced repetition? How does that work in a hierarchical body of knowledge like math? What are "encompassings" and why are they so important? How do we choose tasks that maximize learning efficiency? How do we calibrate the spaced repetition system to student performance and intrinsic difficulty in topics?
[~48:00] What is the testing effect (retrieval practice effect) and how do we leverage it? How do we gradually wean students off of reference material? How do quizzes play into this?
[~52:00] What does a student need to do to be successful on Math Academy? What does an adult need to do to facilitate their kid's success, and what are our plans to build more of this directly into the system?
[~55:30] We have a streamlined learning path specifically designed for adults, to get them up from foundational middle-school material up to university-level math in the most efficient way possible. What the learning experience often feels like for adults: it can be an emotional experience when you successfully learn math that you used to be intimidated by, and realize that the reason you struggled in the past wasn't because you're dumb but rather because you were missing prerequisites.
[~1:02:00] How did Math Academy get 8th graders getting 5's on the AP Calculus BC exam? What's our origin story? Can any student be successful on Math Academy? The students in our original Pasadena program -- what was their background, what did they learn in our program, and what are they doing now?
[~1:10:00] What's next for Math Academy? We want to become the ultimate math learning platform and empower the next generation of students with the ability to learn as much as they can.

Want to get notified about new posts? Join the mailing list and follow on X/Twitter.





This is, without a doubt, the best podcast about Math Academy to date – which, in hindsight, is no surprise. Anna has a knack for bringing out the best in her guests, and Alex had numerous insights to share as well.

If you want to understand what we’re doing but don’t have time to skim our 400+ page book, this episode sums it up in just an hour. Huge thanks to Anna for having us on – it was a pleasure!

∗     ∗     ∗

The transcript below is provided with the following caveats:

  1. There may be occasional typos and light rephrasings. Typos can be introduced by the process of converting audio to a raw word-for-word transcript, and light rephrasings can be introduced by the process of smoothing out natural speech patterns to be more readable as text.
  2. The transcript has been filtered to include my responses only. I do not wish to infringe on another speaker's content or quote them with the possibility of occasional typos and light rephrasings.
∗     ∗     ∗

Justin: Thanks. It’s great to be here.

Justin: Sure. Bloom’s two-sigma problem was posed in the 1980s by educational psychologist Benjamin Bloom.

There was a particular study of his that really kicked it off, comparing the effectiveness of one-on-one tutoring versus traditional classroom teaching. He found that the average tutored student performed better than 98% of students in a traditional class.

They measure this effect size in standard deviations, or sigmas. That’s why it’s called a two-sigma effect. The idea is that this is a huge effect. Is it 1.5 sigmas? Is it 2.5 sigmas? Depending on how you measure it, you will get different results based on who the tutors are and the overall context.

But the key idea is that a lot of learning is being left on the table—a lot of human potential is going unrealized. How do we capture it? That sums up the two-sigma problem. You can massively elevate student learning outcomes with properly individualized pedagogy, but society can’t afford to equip every student with a human tutor.

So, what can we do about it? That’s the two-sigma problem.

Justin: Let me start by explaining how Bloom tried to solve it. He didn’t just frame the problem—he had an entire research program aimed at solving it.

Our approach is similar to his in some ways but different in critical ways.

He hoped the benefit of a human tutor could be captured by combining various evidence-based learning strategies. That sounds like a great idea, right? You take something effective, deliver it to students, then add another evidence-based strategy—whether it’s pedagogy, the study environment, or something else. You keep layering these scientifically supported techniques, hoping that as you build them up, you reach the two-sigma effect—or at least get as close as possible. You try to improve learning outcomes as much as possible.

The thing about Bloom’s approach is that he restricted his search space to strategies that could be implemented manually, not necessarily to the fullest extent. For instance, one of the strategies he examined was mastery learning, in which you ensure students know their prerequisites before moving them on to more advanced material.

In Bloom’s solution to the two-sigma problem, mastery learning was not designed for an individual student but for the class as a whole. You can only approximate this for an entire class. That’s not bad—it does improve learning outcomes if a teacher attempts to apply mastery learning at a class level—but it’s not as effective as tailoring it to each individual student, who will have a unique knowledge profile.

This was the fatal flaw in his approach. I don’t mean flaw in the sense that it didn’t work at all, but rather that it didn’t fully achieve the effect of human tutoring. His search was ultimately unsuccessful because it was constrained by the limits of human teaching labor.

At the time, that was a reasonable constraint because computer technology was far less mature. Computers were very expensive. You were lucky if a school even had one. Things are totally different today.

For nearly a decade, Math Academy’s challenge—and in fact, our purpose, the whole reason we exist—has been to carry this torch forward and reattempt a solution. We overcome the limitation of human teaching labor by leveraging technology to implement individualized learning techniques to a much fuller extent.

We started out in a public school district, teaching manually and applying individualized learning techniques as much as possible. It was similar to Bloom’s approach—doing it manually to the greatest extent feasible.

The key difference is that we gradually built an online system to automate pieces of the work and apply them more effectively than we could manually. That freed us up. We would teach as well as we could in person, get a handle on the problem, and once we understood how to structure the solution, offload it to a computer program—whether for mastery learning on a knowledge graph, spaced repetition, or other techniques. Then we would return to manual instruction and ask, “What’s next? What else needs to be offloaded?”

By the end of this process, we created a teaching machine that is shockingly effective. For instance, students have passed the AP Calculus BC exam as early as eighth grade.

Justin: Exactly right. One of the core problems of classroom teaching is the heterogeneity of student learning profiles.

You can have students enter a class with A’s from their prior class. Optimistically, let’s say they mastered 90 percent of the content. That means each student is missing some 10 percent chunk of knowledge. This is a very optimistic case—realistically, they likely mastered less—but those missing chunks can be in different places.

If you have 30 students, each missing 10 percent of the knowledge in different areas, the class as a whole might collectively know as little as 50 percent of the prior course, or even less. These students all have different background knowledge.
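To make that arithmetic concrete, here is a toy simulation (illustrative only, not Math Academy data): 30 students each master a random 90 percent of a 100-topic prior course, and we measure what fraction of topics the whole class has in common.

```python
import random

random.seed(0)  # reproducible toy example
N_TOPICS, N_STUDENTS = 100, 30

# Each student masters a random 90% of the prior course's topics.
students = [set(random.sample(range(N_TOPICS), 90)) for _ in range(N_STUDENTS)]

# Topics that *every* student has mastered -- the shared foundation a
# teacher can safely assume when addressing the whole class at once.
common = set.intersection(*students)
print(len(common) / N_TOPICS)  # typically around 0.9**30, i.e. roughly 4%
```

Under this independence assumption, the shared foundation is far below 50 percent, which only underscores the point: teaching to the whole class at once means teaching over many students’ individual gaps.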

To achieve perfect mastery learning, you have to individualize instruction for every student.

Justin: The other component is that we continually refine our curriculum over time. It wasn’t always this way. When we started building these lessons, they were not all at a 95 percent pass rate.

We built curriculum analytics tools that allow us to track which topics have the lowest pass rates. We then dig into those and determine where students are actually struggling.

Are they struggling at the very first knowledge point, or do they typically get through the first two knowledge points and then struggle with the third? We can even pinpoint specific questions students struggle with.

We have this whole curriculum on lockdown, allowing us to zoom in and identify hot spots that need attention and refinement. Within those topics, we can target specific segments and introduce more scaffolding.

One thing I should clarify is that we never lower the bar for success. It’s not about making lessons easier. It’s about introducing more scaffolding, better preparation, and smoothing out the learning experience.

That has been a key part of how we have managed to get these pass rates so high—continually refining lessons as we gather more data about where students struggle.

Justin: Just to give an even more concrete example for people who aren’t familiar with the structure of math, imagine you go to the gym and decide to learn gymnastics. You want to learn to backflip. You go to the coach and say, “Hey, can you teach me how to backflip?”

The coach says, “Yeah, sure, I can teach you.”

Then you ask, “Great, what do I do?”

And the coach says, “Go try a backflip.”

You try, but you don’t even do a rotation. You just fall over. You have no idea what you’re doing.

Then the coach says, “Just keep trying. Use the whole hour to try a backflip. Come back tomorrow and try again.”

It’s clear you’re not going to get there. You’re just going to keep falling on your face and develop a distaste for the whole process.

In gymnastics or any advanced sport, you scaffold the process. You make sure you can do the subskills first. Then you focus on combining some of them, then more of them. Eventually, there’s a more advanced variation that unlocks the next step. That leads you toward the thing you originally wanted to do. It’s like that.

Justin: There’s an analogy from Jason Roberts (he and his wife Sandy Roberts are the founders of Math Academy) called the learning staircase that I think really puts this all in perspective.

Students learning math are climbing up this knowledge graph, and you can simplify the situation: they are climbing a staircase. The reason a lot of students don’t make it up the staircase is that the steps are too big. Their stair-climbing ability is like their working memory, how much information they can fit in their brain at once and make sense of, and the height of a step is like the cognitive load of the task they are trying to do.

When the height of the step is too high, when we are trying to do too much stuff at one time, and it’s higher than what the student’s working memory can step over, the student is going to experience cognitive overload, and they are not going to be able to climb.

Basically, all we are doing is taking this learning staircase, which originally had very large steps, and students would get stranded under these steps they couldn’t climb. We are just breaking those steps down into smaller steps, and the staircase is still going just as high, and it’s just as rigorous. These incremental steps are small enough for students to climb all the way up.

One of the ways to do that is worked examples.

Justin: One thing we should probably clarify about our perspective on rote memorization is that we are not saying a student should memorize times tables or trig values without knowing what multiplication means or what trig functions represent. It’s like, get the meaning first.

But at some point, you have to really hammer these times tables or these trig values, these trig identities, these derivative rules, you have to hammer them into your memory. It’s not enough to understand what they are; it’s also about executing them. It’s like, it’s not enough to understand how to do a backflip, you actually have to do the backflip. To do that, you have to practice repetitively over and over, really hammering this into place.

We are not suggesting that students should memorize symbols whose meaning they don’t even know, like memorizing a times table without understanding that multiplication is repeated addition.

No, have the students do multiplication problems using repeated addition first, but then move on to really hammering this information into memory. That way it’s quick, instant recall, building the automaticity that’s needed.

Justin: It’s good to practice this level of conceptual understanding before you move on to memorizing times tables. But if you stay too long in that area, this thing that’s supposed to be scaffolding you up to memorizing your times tables actually becomes a crutch. You just want to rely on it the whole time.

Justin: Spaced practice, spaced review, is one of the cognitive strategies we have actually leaned into the most. I spent a long time building a spaced repetition system into Math Academy’s learning system that takes our knowledge graph into account.

I should back up and say that spaced repetition has been known for over a century. Ebbinghaus was the scientist who first noticed that when you learn something, your memory decays. But if you review it after a while, it decays slower.

You can wait longer until you review it again, and your memory will extend more. The way you increase your retention of something is by retrieving a fuzzy memory. When you retrieve a fuzzy memory, it gets stronger, and it decays slower the next time. You can wait longer until it gets fuzzy, that same level of fuzziness again.

Anyway, this is the idea of spaced repetition. The intervals depend a lot on the context of what’s being learned and how well you’re doing on your recalls. Roughly speaking, it’s something like maybe you wait about a day the first time you review it, then a couple of days, then a week, a couple of weeks, and a month. It gets to a point where the intervals are expanding sort of exponentially, roughly doubling, which is a good rule of thumb.

Each time you successfully recall something, maybe you wait about twice as long before attempting to recall it again. There are platforms for doing spaced repetition on flashcards that are unrelated, each with their own spaced repetition schedule based on when you introduce the card, when you started reviewing it, and whether you got the repetition correct or not.
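As a rough sketch of that doubling rule of thumb (a toy flashcard-style schedule, not Math Academy’s actual algorithm), a scheduler might look like:

```python
def next_interval(prev_interval_days: float, success: bool) -> float:
    """Toy spaced-repetition rule: roughly double the wait after each
    successful recall, and shrink it after a failed one."""
    return prev_interval_days * 2 if success else max(1.0, prev_interval_days / 2)

# A run of successful recalls: review after 1 day, then 2, 4, 8, 16, ...
interval = 1.0
schedule = []
for _ in range(5):
    schedule.append(interval)
    interval = next_interval(interval, success=True)
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Real systems condition the multiplier on far more context, but the expanding-interval shape is the core of the idea.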

But in math, things get really complicated because we’ve got this big knowledge graph of concepts, and we are often implicitly practicing subskills. For instance, imagine you stick one-step linear equations and two-step linear equations into a spaced repetition system. You are doing spaced repetitions on these two topics, and let’s say a repetition is just solving some problems from each topic.

If you do this naively, doing each repetition schedule in parallel without interacting them at all, you are being very inefficient. Every time you solve a two-step linear equation, you are implicitly solving a one-step linear equation. The whole point of solving a two-step linear equation is that you do something to the equation, and that turns it into a one-step linear equation that you know how to solve.

Every time you solve a two-step equation, you are implicitly solving a one-step equation. It’s like, okay, learn one-step first, but as soon as you learn two-steps, now you don’t need to worry about reviews on one-step anymore. You should just throw that spaced repetition card away and focus on the two-step linear equations card. Every time you do that, it’s essentially a repetition on the one-step.

This gets really complicated when you have a whole web of 3,000-ish math topics connected by around 10,000 relationships. We had to devote a lot of time to figuring out how this repetition system was going to work.

Jason was the one who first pointed it out when a kid was getting silly reviews on some topics that you would reasonably infer they already know how to do and are already practicing in their lessons. He called it an encompassing.

Topic A encompasses topic B if topic B is used as a subskill in topic A. We ended up building encompassings into our spaced repetition system so that every time a student does a repetition on a higher-level topic, the repetition flows down the knowledge graph through the encompassings to affect the lower-level subskill topics being exercised.

We also generalized this to the idea of fractional encompassings. Sometimes you are not using the subskill in full, but using a part of it, or maybe you are using it in full, but only in a part of the problems in this higher-level topic. I spent a lot of time building this algorithm and system to track these spaced repetitions through the knowledge graph. We call it fractional implicit repetition, or FIRE.
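Here is a minimal sketch of the idea (topic names, weights, and the credit-propagation rule are all illustrative assumptions, not Math Academy’s actual FIRE implementation): repetition credit earned on a topic flows down the knowledge graph through fractionally weighted encompassing edges.

```python
# Hypothetical encompassing graph: topic -> {subskill: fraction of the
# subskill implicitly exercised by a repetition on the topic}.
encompassings = {
    "two_step_equations": {"one_step_equations": 1.0},
    "one_step_equations": {"arithmetic": 0.5},
}

def propagate_credit(topic, credit, totals):
    """Accumulate repetition credit on `topic` and flow a fraction of it
    down the knowledge graph to each encompassed subskill."""
    totals[topic] = totals.get(topic, 0.0) + credit
    for subskill, fraction in encompassings.get(topic, {}).items():
        propagate_credit(subskill, credit * fraction, totals)

totals = {}
propagate_credit("two_step_equations", 1.0, totals)
print(totals)
# {'two_step_equations': 1.0, 'one_step_equations': 1.0, 'arithmetic': 0.5}
```

One repetition on two-step equations also counts as a full repetition on one-step equations and a partial repetition on arithmetic, so those lower-level reviews never need to be scheduled separately.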

Justin: Efficiency is the name of the game with Math Academy. Every single task that we choose for a student to do is the result of an optimization problem going on behind the scenes. The optimization problem is: how do you get the student making the most progress in the knowledge graph while knocking out as much review implicitly as possible? We are trying to minimize the amount of work the student has to do to get through the course. There will still be plenty of review, but the idea is that we are choosing the reviews to knock out as many lower-level concept, skill, and topic reviews as possible.

I should also mention that the spaced repetition process, in addition to being calibrated to this knowledge graph, is also calibrated to how well the student does on their learning tasks. If you blow a task out of the water, getting everything correct, every question perfect, that’s going to count for more repetition credit than if you only do decently on it, like maybe getting most of the questions right but still missing some. If you get a lesson halted, that’s not going to count as a repetition. You need to actually do it again, and the same goes for a review.

This is tailored not just to the levels of student performance, but also to the levels of intrinsic difficulty in topics. Some topics are just more difficult than others. That’s just how math goes. We factor that into our spaced repetition algorithms, too. When a topic has a higher level of intrinsic difficulty, the repetitions are more frequent, so students will get more practice on it.

Every topic has its own spaced repetition schedule for the average student, and every student has their own spaced repetition schedule based on their performance on a topic. It also takes into account performance on prerequisite topics and overall accuracy. Every single student, for every single topic, has a unique spaced repetition schedule that encodes all the context of the situation and tries to optimize the right amount of review. This way, they don’t forget it, but at the same time, it frees up time for them to learn new material instead of spending all their time reviewing.

Justin: Let me start by explaining what that is for listeners. The best way to extend the memory duration of any information you’ve consumed is to test yourself on it, meaning quizzing yourself, trying to pull it out of your brain without looking at a reference.

This stands in stark opposition to other strategies like re-reading or just transcribing notes again. A lot of students get into the mindset of rewriting their notes, copying out what’s in the textbook, or just re-reading the same paragraph over and over. They think it’s somehow getting into their brain, but what happens is they feel a comfortable sense of fluency from having that information in their working memory. What they don’t realize is that it’s not enough for the information to be in your working memory. That information dissipates shortly after you stop rehearsing it.

You actually have to get it to encode into long-term memory, and the way you encode it into long-term memory is by retrieving it from long-term memory.

It’s kind of like weightlifting, where your long-term memory is like a muscle that you have to exercise, and the act of lifting the weight is the act of retrieving the information. The testing effect, also known as retrieval practice, is a centerpiece of our pedagogy. As soon as a student goes through a worked example, what’s up next is practice problems, where they have to pull what they have learned out of their brain and apply it to a new practice problem in a slightly different context.

As a student sees a topic multiple times throughout the course on Math Academy, we gradually back away even further from reference material and rely more on assisted retrieval. It starts out where, when they are solving problems, they have the worked example to refer to if they need to.

It’s kind of like the spotter at a gym. The spotter can help you lift the weight if you are really getting crushed by it, but you shouldn’t be relying on the spotter. After a student goes through the lesson and moves on to a review, the worked example is not easy for them to reference.

They can still click back to the topic and scroll through to find that this problem is kind of similar to this worked example, but it is intentionally a little bit more annoying to go look up the reference material because we are trying to wean students off of it. We are trying to get them to pull the information out of their brain when solving these review problems.

The third stage is that after a student has learned some material and done some review on it, they get quizzed on it. Quizzes on Math Academy happen about every 150 minutes of work. We measure work in XP (experience points), where one XP is roughly equivalent to one minute of work for an average student at that level, so quizzes come about every two and a half hours of work. This is not like a typical classroom, where you might get a quiz every two weeks at most, or maybe every two months. For a student using Math Academy along a class schedule, spending 50 minutes a day, that works out to a couple of quizzes a week.

These quizzes are 15-minute quizzes, but they are packed with quite a few questions. In lower grades, you might get 20 questions on a 15-minute quiz. By the time a student enters the quiz, they should be strong enough on the material to pull it out of their brain in a reasonably quick timeframe and start building the level of automaticity they need. In higher-level courses, like calculus, it might be more like six or seven questions for a 15-minute quiz. We calibrate the number of questions in a quiz to the amount of time we expect the questions to take. The idea is that we are always focused on retrieval practice, gradually stripping away reference material, and having students retrieve under time constraints to build automaticity.
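The quiz-cadence arithmetic above can be sketched as follows (assuming the stated figures of roughly 1 XP per minute of work and about 150 XP between quizzes):

```python
# Rough quiz-cadence arithmetic: 1 XP is about one minute of work for an
# average student, and a quiz lands about every 150 XP.
XP_PER_QUIZ = 150
daily_minutes = 50  # a student doing 50 minutes of Math Academy per day

days_between_quizzes = XP_PER_QUIZ / daily_minutes
quizzes_per_week = 7 / days_between_quizzes
print(days_between_quizzes, round(quizzes_per_week, 1))  # 3.0 2.3
```

That works out to a quiz every three calendar days, i.e. a couple of quizzes a week, consistent with the cadence described above.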

Justin: One thing that I want to say is a common reaction, especially from adults using our system, is that they get to a point where they have this real emotional experience of seeing some math on a system that they had tried to learn previously. They were really intimidated by it because previously it did not go well at all, and it just felt out of reach. They were thinking, “I guess I am just a dummy because I don’t understand any of this math.” But when they see it on our system, it’s a totally different feeling. They say, “Wait, that’s all it is? It just comes down to these prerequisites I know. That’s all I was missing. I thought I was dumb, but it turns out I was just missing the prerequisites.”

Then they look at all the time they spent in the past just banging their head on these problems when they could have been filling in their prerequisite knowledge. It gets so much easier to learn. That’s a very common reaction with our adults.

Justin: I should say that when a student is advanced and sitting in a class just rehashing things they’ve already learned, that’s not a good use of time. Every student needs instruction that is appropriate for them. A student doesn’t need to be advanced or even at grade level to benefit from individualized instruction.

I don’t want to turn off people who don’t think of themselves as advanced learners because the benefits of filling in your prerequisite knowledge and having instruction adapted to your pace of learning really benefit all students. Even if a student is below grade level, if they are below grade level just because they have missing knowledge, then we can fill that knowledge up quickly for them. The only type of student we don’t work for is one who is dead set on not learning—no matter what you do, they refuse to do problems. That’s more of a behavioral thing than a knowledge thing.

So, back to the story of eighth graders taking AP Calc BC. We originally started as a nonprofit school program founded by Jason and Sandy Roberts. One of their kids, Colby, was on the fourth-grade math field day team, and his parents were coaching that team. Their kid and his friends were all really excited about learning math, so they did the standard fourth-grade field day stuff. But the kids were so excited that they didn’t want to just stop at fourth grade. Something they would often ask Jason and Sandy was, “What’s the highest level of math?”

Jason and Sandy would have to say, “Well, it goes really, really high, but for your purposes, let’s just say it’s calculus, because that’s what seniors in high school take if they are on the honors track.” And the next question was, of course, “When do we get to learn it? Can we learn it now? Can we learn calculus tomorrow?” They were just so excited about it.

Jason and Sandy were teaching a bunch of these kids advanced math, even through fifth grade. They got up through a bunch of high school math, to the point where they could start learning calculus. One thing led to another, and this turned into an official school program that was not just a pullout class but a daily Math Academy class. There were other cohorts that came in the following years.

What this turned into was that we would get students in sixth grade who were solid on their arithmetic. They might know what a variable is, but they didn’t really know how to solve equations or anything. They were kind of at an early pre-algebra level. We would scaffold them up, teach them all of high school math within the next two years—sixth and seventh grade. Pre-algebra, algebra one, geometry, algebra two, and pre-calculus. In eighth grade, they’d be ready to take calculus.

Then, they would take the AP Calculus BC exam. We got to the point where most of the students who took the AP Calc BC exam in eighth grade passed, and most who passed got a perfect five out of five on the exam. A couple of things I should say is that these are not national talent search students.

How the kids were selected was that they scored at or above the 90th percentile on a middle school math placement exam, which is typically taken by all fifth graders in the district around February or March. They were then invited to join the program. It’s a seventh-grade math skills test, so it filters for a somewhat high skill level, but it’s not designed to identify math aptitude.

This is also in the Pasadena Unified School District, where about two-thirds of the student population qualifies for the federal free and reduced lunch program, and about 44 percent of all K-12 students are educated in private schools, compared to the California average of 11%. This is not a particularly talented group of students. It’s not a biased group of top students. Just think of a standard school and kids in the standard honors class. They can be accelerated way, way, way higher than they currently are.

When Jason and Sandy were teaching, they were doing this all manually and achieving very good results. But these results got even better once students started working on the Math Academy system. Jason got tired of the kids saying, “I forgot to do my homework,” or “Oh, I forgot a pencil,” or all these excuses for not doing work. So, he just built a system where he could pick problems for them to do, and then all they had to do was log in at home and do the problems online.

It would automatically grade the problems and keep track of all the kids’ stats, the class accuracy, and performance on various topics. Over time, this evolved into a system that did more and more of the teaching work. In the summer of 2019, Jason pulled me in to make this system a fully automated platform that would actually select learning tasks for students. So, we built this automated task selection algorithm and continued refining it. By the time the pandemic hit in 2020, the big question was how to maintain this level of efficiency from manual instruction.

The answer was, “Well, we have this halfway baked task selection algorithm. Let’s just get it all in place over the summer and put the whole school program on it.” And that’s what we did. That’s how our AP Calc BC scores skyrocketed, from putting them on the system.

Justin: In 9th through 12th grade, what they do is learn a bunch of undergraduate math. We have PhD-level math instructors who teach the 9th through 12th graders, and they learn linear algebra, multivariable calculus, probability and statistics, real analysis, and abstract algebra.

They go through all this content, and they are also often working on independent math projects. In terms of full outcomes for the students, it’s still pretty early: the first cohort is still in their junior year of college, so they haven’t really hit their careers yet.

We’ve been hearing a lot of really cool things from them. One kid is doing an accelerated master’s degree in school. Some other kids got into MIT and Caltech. Another kid is currently a senior in high school, and he did an internship at Caltech the summer of his sophomore year, then worked there on a research project for his junior year. He actually let me know a couple of weeks ago that he got a paper published as a high schooler, a solo-authored paper in a legit journal. It’s interesting to see his author affiliations: Pasadena High School and California Institute of Technology.

Justin: Well, I’ll give a particular answer and a general one. A particular thing that I am really excited about is our upcoming machine learning course sequence, and we are actually going to have a programming course too.

In general, what’s next? We are coming for everything. Machine learning, proof-based math. We have a methods of proof course. We have a linear algebra course that’s more concrete than proof-based, but we are going to have an abstract linear algebra course in the future, on par with Axler’s linear algebra. We don’t have a real analysis course in the system yet, but we are going to. We are going to have competition math. We are going to have in-task coaching to guide students through the learning process. We want to become the ultimate math learning platform.

And if there’s anything we don’t currently have under our umbrella, that doesn’t mean it’s not on our roadmap. We are coming to engulf math education, do it the way that it ought to be done, and empower the next generation of students with the ability to learn as much as they can and make use of that mathematical knowledge throughout the rest of their lives and careers.

Justin: Thank you so much for having us. We are absolutely honored to be here with you. It was a great conversation for us as well.

Prompt

The following prompt was used to generate this transcript.

You are a grammar cleaner. All you do is clean grammar, remove single filler words such as “yeah” and “like” and “so”, remove any phrases that are repeated consecutively verbatim, and make short paragraphs separated by empty lines. Do not change any word choice, or leave any information out. Do not summarize or change phrasing. Please clean the attached text. It should be almost exactly verbatim. Keep all the original phrasing. Do not censor.

I manually ran this on each segment of a couple thousand characters of text from the original transcript.

