Self-Transcript for Golden Nuggets Podcast #35: Optimizing learning efficiency at Math Academy

by Justin Skycak (@justinskycak) on

Why are people quitting their jobs to study math? How to study math like an Olympic athlete. Spaced repetition is like "wait"-lifting. Desirable difficulties. Why achieving automaticity in low-level skills is necessary for creativity. Why it's still necessary to learn math in a world with AI. Abstraction ceilings as a result of cognitive differences between individuals and practical constraints in life. How much faster and more efficiently we can learn math (as evidenced by Math Academy's original school program in Pasadena). Math Academy's vision and roadmap.

Cross-posted from here.

Want to get notified about new posts? Join the mailing list and follow on X/Twitter.

Link to Podcast

The transcript below is provided with the following caveats:

  1. There may be occasional typos and light rephrasings. Typos can be introduced by the process of converting audio to a raw word-for-word transcript, and light rephrasings can be introduced by the process of smoothing out natural speech patterns to be more readable via text.
  2. The transcript has been filtered to include my responses only. I do not wish to infringe on another speaker's content or quote them with the possibility of occasional typos and light rephrasings.
∗     ∗     ∗

Justin: Thanks, James, Zander, it’s great to meet you both.

Justin: Well, I guess I was also surprised to see some people quitting their jobs. I’ll say that was not something I expected. I can only speculate on it, but my guess is that some people have been wanting to make career transitions into machine learning, tech in general, or some quantitative field. Maybe this has pushed them over the edge, making them feel like it’s really possible.

If they just lean into skilling up, what once seemed like an aspirational dream that was so far away is now within reach. All they need to do is level up their math skills, and they’ll stick the landing.

I was actually talking to one of these people yesterday. Last night, we had a three-hour conversation. It was crazy because I saw that he said he was quitting his job as a teacher. I was like, dang, that’s a big career move. This is right at the beginning of the year. I knew he was going to be making a transition to software engineering, which he had on deck.

In addition to the whole Math Academy thing, he was just getting a little fed up with the situation this year. He said he had a class of 53 students, with an average class size of 40. That’s just ridiculous.

I’m sure various people have various reasons. In addition to the aspirational “learn math, get into a career situation you like,” there’s also the fact that teaching nowadays is kind of a pressure cooker for a lot of public school teachers. It’s just stress on top of stress on top of stress. Eventually, there’s this straw that breaks the camel’s back. It’s a carrot-and-stick situation going on.

Justin: The two that pop out to me are, first, this teacher was a math teacher. A lot of math teachers go into the profession thinking they’ll have fun learning and exploring math with kids. It ends up being much more of an administrative burden, where you have to become a taskmaster and ensure kids are on track.

In a class of 53 students, maybe five are really interested and trying hard, 20 don’t hate it but aren’t particularly motivated, and the rest are just against you. You have to essentially be a drill sergeant, holding them accountable. If you enter the teaching profession hoping for a “Dead Poets Society” inspirational experience, it often doesn’t turn out that way. It can be frustrating to deal with this dynamic. I imagine many math teachers feel similarly.

On the other hand, there are quite a few software engineers. I think one of them left their job, but in software engineering, it’s often easier to keep things as they are professionally and take on more machine learning projects or math-related work. If you’re already in the arena, you don’t have to leave to join somewhere else. You can just ease your way into it.

Justin: James, tell me about that.

Justin: That’s great.

Justin: For you, it’s about focusing 100% on taking this step and landing the transition.

Justin: I would agree that being successful at Math Academy requires motivation. Whether it’s intrinsic motivation to learn the material or extrinsic motivation—maybe you don’t really like math for math’s sake, but you love machine learning and know you need to “eat your veggies” to get there.

If someone comes in thinking, “Math Academy is super efficient, so I won’t have to work very hard,” that attitude won’t work. That’s not what we mean by efficient. The training is taxing. Efficiency comes from packing the maximum amount of learning into the time spent.

I usually describe it like this: Imagine you’re signing up for a gym. Lots of people go to the gym and do random exercises on random machines. Between sets, they might do a couple of reps on the bench press, sit down for five minutes, play on their phone, text a friend, and call it a workout after an hour. That’s very low efficiency.

Now imagine going to a gym where there’s an Olympic sprinter waiting for you. They say, “You want to work today?” and you reply, “Yes, I’ve got an hour. I’ll do whatever you tell me. Use whatever training methods will move the needle most on my strength and speed.” The sprinter makes you work hard. You’re sweating, completely wiped out afterward, but they won’t ask you to do anything you’re not capable of. Everything is within your zone of capability, but you’re working hard the whole time.

That’s where the effort comes in. You’re packing maximum learning into the time. It’s like circuit training supersets. There’s no downtime. You’re constantly moving and making the most of your time.

Justin: I love it. That’s the kind of person we want using Math Academy. People have asked me, “Can I just do five minutes of math a day, or maybe flip through my phone for a couple of minutes here and there? Math Academy is super efficient, so it’ll help me do that, right?” I tell them, no, don’t. You’re not the kind of person who’s going to benefit. That’s a completely different optimization problem.

A lot of people want to learn a superficial level of math. They learn at a very surface level, not comprehensively. They can’t code anything up with the math they’ve learned, but they think, “Hey, it’s cool. I know integration can be interpreted as the area under the curve.” That’s fine, but it’s not what we’re about. We cover the full standard. You’re going to learn how to do everything. We’re not optimizing for a small amount of time.

You’re coming in like you would with a workout—at least a couple of hours per week total—and packing the most into that time. We’re not afraid to turn people away. If they’re not the right fit, they’re not the ideal customer.

Justin: That’s a great question. For lower bounds, I typically say the cutoff is a couple of hours a week, or maybe half an hour a day. Twenty minutes a day might work, but once you go lower than that, it’s like, what are you even doing? Are you really here for your workout, or what?

The one exception is younger kids. If you have an eight-year-old or similar age, they have a limited attention span.

Justin: That’s about right. It depends on the topic and level of math. At higher levels, things get more complex, so lessons can sometimes take 20 minutes. For lower grades, though, 10 minutes is typically the upper bound.

Justin: In fourth and fifth grade, the courses are relatively small compared to something like AP Calculus or Linear Algebra. In terms of what students are expected to learn in the school system at those grade levels, even 10–15 minutes a day can help a kid progress quickly. Once you get into more advanced levels of math, you need to put in more effort to see significant progress.

Justin: That’s a really good question. Let me first address the other part of Zander’s question about the upper limit of effort. If someone puts in a couple of hours a week, that’s great. There’s no point where the learning process breaks down with higher levels of effort.

This actually ties into your question too. It’s very similar to someone training seriously for athletics. People who are seriously training can go for multiple hours a day. However, deliberate practice—being focused and working at the edge of your ability—is very taxing. There’s a limit to how long you can do that productively, and it varies for each person. For some, an hour a day is enough to leave them totally spent. It’s like their mental stamina is exhausted, much like physical exhaustion after intense physical activity. At that point, it’s time to rest and come back the next day.

Others can go longer, and stamina can be built over time. Just like in athletics, consistent training increases stamina and allows for longer sessions. You can imagine serious athletes during the off-season getting out of shape. When they return for regular practice, the first day can be brutal, and they might need to ease back in before regaining their training capacity.

Nothing breaks down when you’re doing extensive training each day, as long as you’re consistent. The only scenario where things would break down is if you cram, like doing ten hours of work in one day, and then don’t train for several weeks. In that case, you’d forget much of what you learned. But if you’re consistent, doing sessions every other day or several times a week, and putting in serious effort, nothing breaks down. You just move faster.

Justin: Exactly. I would say I’m a little hesitant to claim there’s no such thing as overtraining. When you overtrain athletically, you can feel it. You know you’re not operating as well as you were initially. You encounter progress problems, the needle isn’t moving, and you feel fatigued and tired.

In learning, the analogy would be burnout. If you feel burned out or lose motivation for what you’re learning, it’s time to tone things down, take it lighter, and give yourself some recovery time. But if you’re feeling great, making progress, passing tasks, and learning effectively, there’s no reason to stop if you want to do more.

At the same time, you don’t want to be the person who joins a gym, works out for four hours on the first day, and gets so sore they can’t continue. You have to pace yourself. It’s about staying in touch with how you’re feeling and monitoring your progress.

To jump to James’s question—James, your question was about the principles of effective training. Can you summarize it again?

Justin: One of the biggest insights is just the importance of having a strong knowledge of your foundational skills. You need to be automatic—the key word is automaticity—on lower-level skills.

The idea is, if you’re a basketball player, you need to be automatic on your dribbling. You’re just not going to be able to function or do any advanced maneuvers if you have to consciously think about dribbling the ball. Your head’s going to be down, you’re going to be looking at the ball, you’re going to be tripping over your feet, you won’t know where your teammates are, and you won’t even know where the hoop is. You might just be running the wrong way. You have to be able to look up, not think about dribbling, and do a bunch of other things in parallel.

The same thing happens with learning, especially in hierarchical knowledge domains like math. Imagine dribbling is like times tables. If you’re trying to do algebra and you don’t know your times tables, it’s like trying to play basketball without knowing how to dribble. Not only is it going to take forever to get through a single problem, but you’re going to be continually interrupted because you have to go back and think about these low-level skills.

The particular example I like to give is a student learning what an exponent is and needing to compute 4³. That’s the day’s lesson: cubing numbers. First problem, 4³, which is 4 × 4 × 4. If somebody is automatic on their times tables, like a basketball player who’s automatic on dribbling, they can just think, “Oh, 4 × 4 = 16, not even thinking about it. Then 16 × 4 = 64. Easy. What’s the big deal?”

But if you don’t know your times tables, and you have to consciously recompute every time, you’ll get lost. You might think, “Okay, 4 × 4. What is that? I don’t know. Let me count it up. 4 + 4 = 8, plus 4 makes 12, plus 4 makes 16.” Now do 16 × 4. “Okay, 16 + 16 = 32, plus 16 makes 48, plus 16 makes 64,” and maybe you make a mistake somewhere and get 62. Then you think, “I got it: 62,” and someone tells you, “Not quite.” Now you have to go through the whole process again.

You’re not getting additional reps of practice because you’re still struggling with the same problem. You’re missing out on noticing key things, like someone who can compute cubes quickly might notice, “Oh, every time I cube an even number, the result is even. Every time I cube an odd number, the result is odd.” If it takes you five minutes to work through a problem, you’re not going to notice those trends.

This goes even further. Some students aren’t automatic on times tables or even on addition. They have to finger count. Imagine computing 4³ without knowing addition facts. You’d break it down as 4 × 4 × 4. To figure out 4 × 4, they might go back to 4 + 4 = 8, plus 4 makes 12, plus 4 makes 16. I made a section in my book, The Math Academy Way, where I detail all the individual computations going on here. It takes a full page to compute 4³ if you don’t know your addition facts by heart and need to finger count. If you’re automatic on your skills, it’s two sentences of thinking. That’s it.
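As a rough illustration of the step-count blowup described above (a hypothetical sketch, not taken from the book), you can mechanically count how many addition operations it takes to compute 4³ when multiplication itself has to be unpacked into repeated addition:

```python
def cube_by_repeated_addition(n):
    """Compute n**3 using only addition, counting every addition performed."""
    additions = 0

    def times(a, b):
        # Multiplication simulated as repeated addition, the way a student
        # without automatic times tables would have to do it.
        nonlocal additions
        total = 0
        for _ in range(b):
            total += a          # one addition step (finger-countable)
            additions += 1
        return total

    square = times(n, n)        # n * n
    cube = times(square, n)     # (n * n) * n
    return cube, additions

result, steps = cube_by_repeated_addition(4)
# A student automatic on multiplication facts needs just two recalls:
# 4 x 4 = 16, then 16 x 4 = 64.
```

And this still undercounts the effort for a finger-counting student, since each addition step itself decomposes into unit-by-unit counting.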

Justin: That’s a great point about reading. There are so many other principles besides automaticity. One is the idea of interleaving—mixing up your practice. If you’re a basketball player training, that might mean not just practicing 50 jump shots from the same location on the court, taking the same stance. Instead, you move to other areas of the court, throw in a layup, or try a jump shot while coming at it from a different angle or after receiving a pass from a different direction.

The same thing happens with math problems or learning in general. If you give a student a task, the first time they complete it, they have to load information into their mental RAM, their working memory, to complete it. But if you immediately give them the same task again, that information is still sitting there in working memory. They don’t have to pull it from long-term memory, which makes the task artificially easy.

At the beginning, when a student is first learning a skill, making the task artificially easy is actually good because it’s like using a weight that’s calibrated to their current ability. But as they progress, you want to strip away that easiness and require them to pull information from scratch every time. That’s how you really write something to memory.

If the context is already loaded in their head and they just have to do the final part of the problem, and it’s exactly like the previous problem, it’s like going to the gym and having your spotter help you lift the weight every time. You’re not going through the full process of taking information from long-term memory and lifting it into working memory. If you let it stay in working memory and just keep using it from there, it’s like you think you’re lifting the weight, but you’re not actually doing the full workout.

Justin: There’s more to it. A lot of these learning strategies have many different benefits. For interleaving, one benefit is having to load information from scratch every time. Another is that it helps train your ability to match solutions to problems.

For instance, suppose you give someone a linear equation to solve, and they solve a bunch of linear equations in a row. They get into the flow of solving those. Then you give them a quadratic equation, like $x^2 - 4 = 0.$ They shift to solving quadratic equations, using the square root method, and get into the flow of that.

The next day, you give them a new equation, and they can’t remember which solution procedure goes with which type of equation because they haven’t practiced distinguishing the skills. You just gave them a bunch of problems of type A, and they executed solution A. Then you gave them problems of type B, and they executed solution B. They just knew what they were supposed to do each time without thinking about it.

But they didn’t practice deciding, “Should I use solution A or solution B here?” Interleaving forces you to make that decision. It makes part of the problem identifying the correct approach.
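The difference between blocked and interleaved practice can be sketched in a few lines. This is a hypothetical illustration (the problem "types" and functions are made up for the example, not Math Academy's implementation): blocked practice hands the learner a run of same-type problems, while interleaving shuffles the same problems so each one forces a choice of method.

```python
import random

def blocked(types, reps):
    """All problems of one type, then all of the next: A A A B B B ..."""
    return [t for t in types for _ in range(reps)]

def interleaved(types, reps, seed=0):
    """Same problem set, shuffled, so each problem requires first
    identifying which solution technique applies."""
    seq = blocked(types, reps)
    random.Random(seed).shuffle(seq)
    return seq

print(blocked(["linear", "quadratic"], 3))
print(interleaved(["linear", "quadratic"], 3))
```

The shuffled sequence contains exactly the same problems; the only thing added is the decision "solution A or solution B?" on every item.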

Justin: Right, the associative interference happens when you have something like 3 × 8 = 24 and 4 × 6 = 24. You get used to the idea that there’s an “8 fact” that maps to 24 and a “4 fact” that maps to 24. When you’re asked something like 4 × 8, you might experience interference from those other facts. You think, “Okay, there’s a 4 fact and an 8 fact, and both involve 24.” This might push you to incorrectly say 4 × 8 = 24.

This kind of error shows up a lot in multiplication mistakes. It’s actually one of the top missed multiplication facts. You wouldn’t expect 4 × 8 to be one of the harder ones, but it is. Initially, when teaching someone, you want to keep these things separate. Have them execute the skill in isolation without any additional difficulties to make sure they can get over that initial hump. Once they’ve mastered the skill in isolation, it’s time to combine them and untangle these responses in their head.

Justin: There are a couple of ways to interpret layering. The way I usually talk about it is layering on actual content knowledge, like learning more difficult techniques or higher levels of math. But I think what you’re getting at with layering here is adding difficulty to the learning process, like distinguishing between solution techniques.

The official term for something like that in the literature is “desirable difficulty.” The idea is that you can add certain features to a learning task to make the recall process harder, but in a productive way for the learner. Interleaving is one example because it forces the learner to pull information from scratch and match the appropriate information to the problem they’re solving.

Another type of desirable difficulty is spacing out practice. When you space out practice, you let your memory get a little fuzzy between practice sessions. Overcoming that difficulty during recall strengthens the memory trace. It’s like your brain adapts and says, “This memory isn’t as strong as we need it to be,” so it increases retention.

Justin: That’s exactly right. You’re upping the level of difficulty each time. I always think of this as putting weight on the bar in the gym. Maybe you start out with 100 pounds, and it’s tricky to lift. But as you keep going to the gym and doing this repeatedly, you need to make it more difficult for yourself. You don’t get stronger by just lifting 100 pounds indefinitely—you have to keep adding weight.

In the context of spaced repetition, the “weight” you’re putting on the bar is how long you wait until the next recall. Spaced repetition is like weightlifting, W-A-I-T. This is all subject to the fact that you’re overcoming the difficulty. That’s the key to desirable difficulties. Some people think, “Oh, all difficulties are desirable, so let’s throw everything at the learner at once.” They might make the information really fuzzy, jump straight into interleaving, and use the hardest kind of interleaving.

That’s the equivalent of taking someone who just signed up for the gym, is learning to squat, and putting 500 pounds on the bar. They just get crushed. It doesn’t make them stronger—they’re not able to lift the weight, they’re not building muscle, and they’re getting demotivated because they feel terrible.

The key to desirable difficulties is making it as challenging as possible while ensuring the learner can overcome those challenges. At first, that means keeping things relatively easy. Even if it’s artificially easy, they need to get to a point where they can handle the difficulty. Once they’re there, you start ramping it up.

Justin: Totally. There’s a lot to this. A lot of people think drill-style questions are the opposite of creativity. They might think automaticity is the opposite of creativity because it’s about doing things without thinking—getting so comfortable that you can be a robot doing those low-level skills. They think, “Robots aren’t creative; humans are creative. You don’t want to be a robot, right? Why are we doing this?”

The idea is that you want to free up your mental process for creativity. Everyone has a limited capacity in their working memory for how much mental effort they can devote to solving a problem at once. If you’re spending all that effort on low-level skills, like recomputing multiplication facts or, as a basketball player, thinking about dribbling, you’ll overwhelm your working memory. You won’t have any room to think about higher-level problem-solving features, look for your teammates, or come up with creative insights.

You want to be robotic on those low-level skills so they don’t exhaust your working memory. That way, your human creativity is free to focus on the higher-level thinking needed to solve problems.

Justin: That’s a great point. I like that a lot. People often think of creativity in the arts, and you’re right—it applies there. The same is true in music. If you don’t know your scales and haven’t mastered the basics of your instrument, you can’t bring your creative ideas into reality. It’s hard to think creatively when you’re struggling just to figure out the basics.

The same goes for writing. If you don’t have a large vocabulary or aren’t good with grammar, it’s impossible to let your words flow. You’re constantly stuck thinking about the low-level grammatical and semantic details.

I think I accidentally interrupted you earlier if you were trying to finish a point. Sorry about that.

Justin: Right, exactly. I think we’ve covered it well. Let’s move on.

Justin: The question is about people worried whether learning math has utility anymore and if AI is just going to take over everything. I have to say, that perspective doesn’t even make intuitive sense to me. I’m sure it’s the same for you, but anyone who’s had to solve hard software problems—not just putting together an HTML page or solving a LeetCode problem—knows that a lot of problems in software are about defining the problem itself.

Once you reach a high enough level, part of the challenge is figuring out what the software is even supposed to do. Your software has to make life easier for someone, but the hard part is defining what that actually means. Once you know what it’s supposed to do, it’s a lot easier to write it. At least for now, today’s AI tools are far from being able to handle problems like that.

It’d be great if I could spin up one of these chatbots, feed it the Math Academy codebase, and say, “Calibrate all the spaced repetition mechanics in the model.” But so much of this code doesn’t exist anywhere else online. It involves new algorithms and perspectives, things AI has never been trained on before. Even understanding what “calibrating the spaced repetition model” means depends entirely on the specific data available.

Another example is improving the onboarding experience for users. That requires understanding the specific problems students face when using the system. You can’t know that unless you’re talking to users and getting a sense of their real issues. It’s not something you can pull from a dataset.

It reminds me of recent conversations I’ve had. A friend might suggest doing X, Y, and Z to improve the product. Sure, those are good ideas, but they’re not as impactful as focusing on A, B, and C instead. In software, there are always a million things you could do to improve a product. Most of them would be positive changes, but the hard part isn’t knowing what will improve the product—it’s knowing what will move the needle the most.

There’s an exponential distribution of impact. Some improvements might have 100x or 1000x the effect of others that take the same amount of work. Threading the needle on that requires deep domain knowledge about how the product works, who’s using it, and what problems they’re facing. These things aren’t going to show up in a training dataset.

I think of AI tools as like Google on steroids. If you could find something on Google, then an AI tool can probably help you with it. But if it’s not on Google, there’s a limit to what AI can do.

Justin: That’s a great point. I hadn’t thought about it like that before, but it’s spot on. It reminds me of a conversation I had recently about backfilling—taking a top-down approach to learn whatever information is necessary to do a task. It’s much easier if you’ve already learned that information in the past and are just rusty on it now. You have a general mental framework to work with. But if you’ve never learned that material before, you’re just lost.

It’s like handing someone a hardcore machine learning paper and asking them to implement it when they don’t even know calculus or algebra—they’re stuck at arithmetic. If you ask them, “What math topics do you need to know to implement this paper?” they’ll have no idea.

Justin: Exactly. If you’ve learned the material before, even if you’re not 100% up to speed, you’ll look at it and think, “I remember these kinds of manipulations or statistical measures.” You have a frame of reference, so you know where to start working.

Justin: I’d actually agree that, in theory, a lot of people could go on indefinitely if they had an infinite time horizon and unlimited effort. Of course, there’s the caveat that some people might have working memory disabilities or other impediments to the learning process. For example, some people have difficulty imagining things in their minds. I think the technical term for that is aphantasia.

Justin: Yes, that sounds right. Assuming there are no cognitive difficulties like that impeding the learning process, I’d agree that, in theory, you could keep progressing indefinitely.

However, in reality, an abstraction ceiling arises due to practical constraints of life and individual differences in the speed at which people acquire and forget knowledge. For instance, people have different working memory capacities. Those with higher working memory capacity often find it easier to see the forest for the trees, given an equivalent level of prior knowledge.

Justin: Exactly. You have a leg up on seeing patterns and noticing things. There have been some studies on this. I remember one study about working memory capacity being linked to people’s ability to generalize rules from datasets.

In the study, participants were given input-output data points, like “input 5, output 10” or “input 2, output 3.” These numbers followed a parabolic curve. A successful generalization meant that after reviewing a list of these numbers, participants could predict outputs for new inputs, recognizing that the data followed a curve. An unsuccessful generalization meant the participant failed to see the curve, perhaps predicting something more like a constant line or upward slope. Working memory capacity was found to influence this ability to generalize successfully.

Back to the main point—there are cognitive differences in people that make acquiring skills easier or harder. Some people acquire skills faster, while others acquire them slower. This applies to athletics, music, and other areas. These same differences also affect forgetting rates. Studies have shown that people who acquire information faster are also slower to forget it. It’s like a double advantage.

Justin: You’re asking if pedagogical scaffolding can factor out these cognitive differences?

Justin: That’s a good question. I’m not sure about specific studies on that. Honestly, I don’t know. I’d need to look at the literature to answer definitively.

Justin: I would agree with that. I’m a little hesitant to say flat out that even if people have the same exact level and depth of background knowledge, their forgetting rates would differ. But my intuition points to the same conclusion. In practice, though, it’s important to distinguish between a platonic ideal educational setting—where you manage to get everyone to the same level—and the reality that some students require more practice than others. I’m not sure everyone ever really reaches the exact same level.

Thinking back to when I was teaching and using Math Academy in classrooms, I had a calculus class. One thing I noticed was when we covered the limit comparison test for convergence of series, some students naturally picked up on the trick involved. The trick is comparing one series to another mathematically similar series with the same convergence behavior. This isn’t a topic requiring explicit background knowledge, but something more intuitive. If a student doesn’t see it, you can teach why it works, but some students just see it immediately while others don’t.

Even when students had gone through the same curriculum and achieved baseline mastery in all the preceding topics, there were differences. It felt like some students with high generalization ability had an embedding connectivity in their background knowledge that extended beyond the expected scope of the curriculum. With these students, it often felt like you were reminding them of things they already knew, even though they hadn’t explicitly learned them. Their learning experiences seemed to seep through in ways you wouldn’t predict just by looking at the curriculum.

If you had two students with exactly the same background knowledge—not just having completed the same curriculum to the same level of mastery, but literally having identical connectivity in their brains—I’m not sure if they would have the same or different forgetting rates. However, I think getting to that point is unrealistic in real life, because there are always differences in how knowledge is managed and internalized.

Justin: That’s a great estimation question—how many orders of magnitude can we improve? I’m not even sure what the scale of measurement would be for something like this.

Justin: Right, it’s tricky. What does one unit of intelligence mean? What does improvement mean in this context? It’s abstract, but that’s part of what makes the question great—it raises so many good secondary questions. I do have thoughts on this.

To start with the obvious, the point of instructional scaffolding is to help people overcome cognitive difficulties that would otherwise prevent them from learning new information. This applies to everyone, not just slower or faster learners. The goal of instruction is to accelerate the acquisition of information by presenting it in a way that’s easy to digest and providing a practice environment that is highly efficient. You’re focusing on knocking out repetitions incrementally at the edge of the learner’s ability, building things up progressively.

This approach clearly works. For example, a few centuries ago, in the time of Isaac Newton, calculus was cutting-edge mathematics, and even many serious mathematicians didn’t get to that level. Now, high schoolers are learning calculus. Instructional improvements have definitely sped things up.

How much further can it go? If I were to estimate, based on Math Academy’s original in-school program, we saw mathematically bright kids—those in the top 5% of honors classes—go through a radically accelerated curriculum. They were learning college-level calculus, including AP Calculus BC, in eighth grade. By the time they graduated high school, they had learned the equivalent of an undergraduate math degree.

And to be clear, these weren’t students handpicked from a national talent search for the top ten math prodigies in the country. These were local kids, mathematically bright but more like the top 5%, not the absolute top tier.

Justin: This is all happening in Pasadena. These students are basically just the top 5–10% of math students in a normal high school or middle school. If you put them through the Math Academy system, held them accountable for doing the work, and avoided distractions like goofing off on YouTube, this level of acceleration is absolutely possible.

I’d love to see this approach become more normalized for kids who are into math and serious about learning it. You don’t have to be a math prodigy. If you’re a good math student and motivated, using the Math Academy system puts learning all of high school math by middle school and undergrad math by high school well within reach. That’s not an extreme level of talent—it’s achievable.

For students who are very serious, they could go even further. For instance, the students I’ve mentioned who were learning AP Calculus BC in middle school and completing an undergrad math degree equivalent in high school weren’t working four hours a day. It was just a normal course load. They had about 40 minutes of fully focused work per day—sometimes finishing all their homework in class. At most, they’d spend 15–30 minutes on homework outside of class. The training is intense, but it’s not an all-day event.

Now imagine pushing that further. If this level of progress is possible by simply replacing the classroom experience with a more productive learning system, think about adding more deliberate practice. We talked earlier about how much taxing learning activity someone can handle before hitting overtraining. You could definitely increase that from 40 minutes to an hour a day, and probably even two hours a day.

Justin: Exactly. Including weekends and summer would be huge. If you did two hours a day, every day, and didn’t skip summer, the speed at which you’d learn would be incredible. I don’t have the exact computations worked out, but it would be much faster.

To answer James’s question about the speedup, I think we could see an order-of-magnitude improvement in how quickly people reach a certain level. For many, that speedup would also raise their effective ceiling. Some people, after learning a lot of math, will just want to keep going because they love it. For those people, their ceiling will definitely rise as they keep learning.

However, some people only want to learn enough math for their next step, like engineering or med school, and stop there. In those cases, they’ll just reach their artificial ceiling faster and decide it’s enough.

For those who continue learning as much math as possible throughout their lives, better scaffolding and practice environments will effectively raise the ceiling. But there’s a limit. Once you reach undergrad math and start moving into graduate-level material, you kind of run out of road. There aren’t many good instructional resources beyond a certain point. The speedup you get from an optimal practice environment diminishes because you’re entering an area with less scaffolding and more friction.

This is happening now in machine learning. Back in 2014, when I was a freshman undergrad and got into neural networks, convolutional neural networks were just becoming popular. Resources were scarce. Nowadays, learning about convolutional neural nets is almost like learning baby stuff—it’s so well-documented and scaffolded. But once you move to cutting-edge topics like transformers, you hit a zone where there’s less structured material. Progress becomes harder, and your effective ceiling isn’t raised as much because you’re trekking through uncharted territory.

Justin: That’s a great point. The people at the edge of the field are typically those who had an easier time getting there, which skews towards people with advantageous individual differences. It’s a compounding effect because those people are better equipped to slog through the poor pedagogy. If you have those advantages, you’re going to have an easier time.

Justin: Yes, I would totally agree. I think the abstraction ceiling is essentially when someone accumulates enough friction in the learning process that they decide, “Forget it, I’m doing something else.” For some people, that friction happens with algebra, and they jump off the math train after that. For others, it’s calculus, undergrad math, or beyond.

It’s about when the amount of educational friction you experience offsets your ability to acquire new knowledge. If you have a faster rate of skill acquisition, you’ll get further, but friction will eventually catch up. At the edge of the field, even the most prominent mathematicians face friction, but there it comes from producing new knowledge. It’s still friction, though.

Many people jump off the train well before their learning completely grinds to a halt. It becomes a trade-off: people want to optimize their time and get the most reward out of life. When the friction makes progress unreasonably hard and others with a competitive edge find it easier, they often switch to something else they’re better at.

For example, someone might be great at math, speeding through algebra and calculus. But in undergrad math, the friction sets in. That’s a common story.

Justin: Exactly. The level of competition changes. Instead of doubling down on math, you start looking in other directions. You might realize, “Hey, I’m pretty good at coding, and many of these people aren’t as good at coding.” Then you start leaning into coding more and take a meandering path.

Justin: I would mostly agree with that. But I think interest in the material can also be influenced. There are ways to try to raise people’s ceilings in terms of motivation and interest. Right now, Math Academy has been heavily focused on solving the training techniques problem—optimizing the nuts and bolts of learning. In the future, we want to lean more into optimizing the motivational process and finding ways to keep people on track.

Justin: I don’t have any pressing commitments, so I’m good to keep going as long as you’d like. I could talk about this stuff for hours.

Justin: I just want to say I completely agree with the idea that nothing succeeds like success. Being able to make progress is the biggest motivator. When you feel like you’re growing, that drives you to keep going. In that sense, Math Academy does have motivation baked in, even if it’s not explicitly designed as a motivational tool.

But there’s still more we could do, like adding streaks or notifications. For example, if you set a practice schedule and it’s 8 p.m. and you haven’t done any math, we could send a notification saying, “Do one lesson tonight to keep the streak going.” Duolingo does things like that well.

That said, one challenge we face compared to other platforms is that we don’t pull punches. We cover a lot of material rigorously, and we assess you thoroughly. If you don’t do well, we tell you and make you try again later. Succeeding in this kind of environment takes more motivation than working through a watered-down curriculum that lets you skip problems, move on without addressing mistakes, and fail without consequences.

Justin: The modest phrasing is intentional. On one hand, everybody appreciates someone who’s humble, someone who does great work but just says, “Yeah, I think we’re doing a pretty good job.” It makes it more approachable. On the other hand, there really are some intense college courses out there that go into deep coverage. We don’t want to declare that every other resource is terrible and that we’re the absolute best because some classes do a good job.

That said, the default mode in most schools is severely lacking in comprehensiveness. Compared to the typical school class, we blow them out of the water in terms of coverage. There might be some resources that are more on par, but even then, I think it’d be hard to find something more comprehensive than what we’ve built. We do textbook comparisons to ensure we create a superset of all the key information from major textbooks.

Justin: Thank you for the kind words. We’ve put so much work into this. Jason and Sandy have been working on this for over a decade since the initial conception. Alex has been on it for seven years, and I’ve been working on it for about five and a half years. It’s been constant work.

It’s like you took a bunch of nerds, put them in a room for 80 hours a week, and just let them code and problem-solve. Five or ten years later, you check in and see what they’ve built. It’s been a lot of work, and we’re proud of it.

One thing I want to address, tying into the comprehensiveness point, is when people say, “You don’t have a proof-based linear algebra course. How can you compare to something like Axler’s Linear Algebra Done Right?”

The thing is, we’re building this curriculum from the ground up. Our current linear algebra course is more of a prerequisite to something like Axler’s book. It’s designed to ensure students fully understand the procedures before jumping into abstract, proof-based linear algebra.

Sometimes universities throw students into Axler’s book without prior knowledge of linear algebra, and it’s a major struggle. Many students don’t know how to compute eigenvectors, eigenvalues, or diagonalize a matrix, and now they’re being asked to do proofs that require intuition about these processes.

Justin: You can start to think, “Oh, I just can’t handle the material,” when in fact, there’s this huge chunk of prerequisite knowledge missing. If you had that, you could totally get through Axler’s book. What I’m getting at is that it’s 100% on our roadmap to have a second course in linear algebra, which is really what Axler’s book should be called: Linear Algebra Done a Second Time. That would be a more accurate title.

We fully intend to offer that course. To all the pure math enthusiasts who get a bit high and mighty from their ivory towers, saying, “We do proofs, and you don’t,” just know that Math Academy is coming for you. We’ve just released our initial Methods of Proof course, which covers all the basic proof techniques. Unfortunately, many students don’t learn these before tackling advanced math and end up struggling.

This is just the beginning. We’ll roll out more proof courses and advanced material. Anyone claiming that Math Academy won’t reach the full comprehensiveness of a true undergrad math degree might be correct at this exact moment, but that’s going to change. We’re going to cover everything better than anyone else.

Justin: I can talk about that. The idea is we have so many things to do, and we were working through a lot of those tasks. Then, in August, a couple of schools contacted us about using the system. I can’t name the schools, but one of them is one of the top five leading institutions in the U.S. and globally.

When that kind of opportunity comes up—when those kinds of institutions are reaching out—you put everything else on hold and focus on what they need. We had to prioritize building functionality to better integrate the platform with schools.

A lot of what we had planned was focused on individual learners, who remain our primary market. But when opportunities like this arise, you have to let some fires burn while you capitalize on them.

Justin: It’s actually kind of difficult. They’re not at odds with each other, but they’re somewhat orthogonal. There is some overlap, though. For example, if you’re a teacher using Math Academy in your classroom, you need to ensure that all students are staying on task and aligned with the process. That requires tools like a dashboard with aggregate stats for each student, showing information such as how long it’s been since they last solved a problem or were active.

This allows the teacher to identify when a student might need help. For instance, if Johnny hasn’t solved a problem in over 10 minutes, he might be stuck—or worse, he might be on YouTube instead of working. Teachers need a way to monitor and manage the entire class, but individual learners don’t require that.

However, some aspects of communication are universally useful. For example, many of our individual learners are kids whose parents signed them up. In those cases, the parent essentially acts as the teacher in a one-student class. While they don’t need to teach the subject matter, they act as the coach, ensuring the child stays aligned with the learning process. If things go off the rails, parents need to know what’s happening and how to fix it.

Even for individual learners, it’s crucial to communicate progress effectively. It’s easy to get lost in the grind of solving math problems, where you know deep down you’re improving, but nothing ever feels easier because you’re constantly pushing at the edge of your ability. This can be demoralizing.

Having concrete visualizations and metrics to show improvement over time helps. For example, celebrating milestones like reaching 1,000 XP and revisiting the types of problems that were challenging at the start can make a big difference. Seeing how those problems now feel easy can be incredibly motivating.

In the Math Academy school program, I taught an intense computer science course where students came in not even knowing how to code. We powered through basic coding, algorithms, building linear regression models, logistic regression models, decision trees, neural nets, backpropagation, and even reimplementing historical AI papers, like the Blondie24 program that trained a neural net to play tic-tac-toe and checkers.

The students were constantly tackling hard problems, and they’d sometimes feel overwhelmed, asking, “Why is everything so hard? Can’t anything be easy?” To re-motivate them, I’d remind them that things only feel easy in hindsight. I’d show them problems they struggled with at the start of the year and ask how long it would take them now. When they realized they could solve those problems in five minutes, they could see their progress clearly.

The common link between schools, parents, and individual learners is the communication of progress, motivation, and keeping things on track.

Justin: Great question. Right now, in terms of undergraduate math, let me enumerate what we have. We’ve got Calculus I, Calculus II, Multivariable Calculus, Linear Algebra, Methods of Proof, and a Math for Machine Learning course. That last one pulls in all the necessary linear algebra and multivariable calculus and includes quite a bit of probability and statistics that we’ve built out specifically for that course.

We don’t yet have a full probability and statistics course, but that’s one of the courses we’re actively working on.

Justin: That was a bit optimistic. It’s definitely the next course up, but you’d have to ask Alex, our curriculum director. He’s Ninja_of_Math on Twitter. He’s very responsive and happy to give updates. We initially had estimated completion dates when we weren’t as prominent, so there wasn’t much pushback if we were a little late. Now that Math Academy has gained more attention, we’re getting a lot more pressure when timelines slip.

Justin: Absolutely. The more users we have asking for these courses, the more pressure there is to deliver. That’s part of why Alex recently posted about hiring content writers. Right now, he’s a bottleneck since all the content has to go through him for quality assurance. We don’t want to over-delegate and risk releasing content that’s hard to learn from. Quality is our priority, but we’re working on speeding up the development process.

In terms of other courses, we’re also working on a discrete math course, which I think will be fascinating when it’s finished. It’s going to leverage a lot of proofs. Methods of Proof will actually be a prerequisite for discrete math, which will go very deep into proofs while also covering graph algorithms and introducing computer science concepts. That’s going to be a fun one.

We’re also planning an abstract algebra course. We already have some abstract algebra content from the school program, but it’s not a complete course yet. The existing content is decent but could be improved. When we officially launch the abstract algebra course, we’ll enhance what’s already there.

Differential equations is in a similar state. We have some content, but it’s not part of a fully released course yet. When we officially release a differential equations course, we’ll need to revise and expand the material.

Right now, we have university-level math like differential equations and abstract algebra floating in the system. These topics aren’t part of an officially released course but were used in the school program. If a user completes all the courses in their sequence—say they finish the foundation sequence and high-level courses like Math for Machine Learning—they might reach the end of the road. In that case, they’ll start receiving a hodgepodge of topics that are live in the system but not yet part of officially released courses.

Justin: These are topics that are in a good enough state that high schoolers in our school program were able to learn from them pretty well. But we definitely want to revamp them before they become part of a publicly released course. That’s because we try our best to prevent people from running out of things to do. If there’s content in a workable state, we’ll provide it once they complete all the topics in the system.

Just the other day, an edge case I’d always wondered about finally broke my model: someone actually completed all the topics in the system. They went through everything, including the hodgepodge topics like differential equations. Now they’re essentially “reviewing” until we build more courses and levels into the system.

Later down the road, in addition to courses like discrete math, probability and statistics, abstract algebra, and differential equations, we’ll need a real analysis course, a complex analysis course, and probably courses on number theory or graph theory. We aim to cover an entire undergraduate math degree. That’s one of our key goals.

We still have quite a few courses to develop. I think once we have real analysis and abstract algebra in place, it will feel like a true, full undergraduate math degree. Those two courses are often considered the hardest in a math degree, alongside topology.

Justin: Absolutely. When we released our Methods of Proof course, I hadn’t seen much of it because I was focused on my model work, but when I finally looked at it, I was floored. I wished I had this course before I went to college.

It came out right after a conversation I had with someone who attended the University of Chicago. He started as a math major but quickly realized that getting a 5 on AP Calculus BC wasn’t enough preparation. His classmates were leagues ahead, already familiar with multivariable calculus, linear algebra, and proof-writing. Proof-writing is one of those things many math programs, especially prestigious ones, expect students to pick up on the fly. Unless you’re a mathematical genius, that typically doesn’t work well. You might get through, but it’ll be a rough experience.

If every high schooler aspiring to major in math took our Methods of Proof course after AP Calculus, they’d be so much better prepared. The course covers direct proofs, proofs by contradiction, contrapositive proofs, proofs in various scenarios, proofs of divisibility, modular congruence, and logical quantifiers like “for all” and “there exists.” It even includes early topics in real analysis and abstract algebra, such as epsilon-delta proofs and properties of additive and multiplicative groups of integers modulo n.

With that background knowledge, students would be so well-prepared for any college math program, even if they faced poor teaching. For example, I tutored someone through a real analysis course with no notes, no textbook, and a professor who would just write epsilon-delta proofs on the board without explaining the backward work needed to derive the expressions. He’d just say, “If we choose this epsilon, it works,” and leave students bewildered. Coming in with a strong foundation guards you against such situations.

I believe if every aspiring math major took this course, it would cut the math major dropout rate by more than half. Most people don’t lack interest in the material—they face poor teaching and struggle because of missing prerequisite knowledge. It sours their experience and makes them think they can’t succeed, when that’s not true.

Justin: This product was born out of that same sentiment. Jason, Alex, and I have all wanted to learn advanced math but had to teach ourselves because the school systems we were in weren’t supportive. When you try to do something like that, you realize just how inefficient everything is. You want to elevate your skill set, but you’re constantly facing friction dragging you down. It’s a very emotional experience.

This product is built on what we wish we had growing up. It would’ve changed our lives. I’m really interested to know more about your background. You said you didn’t go to school at all. What was your math background? How did you learn? What resources were you using, if any?

Justin: That’s a familiar situation, and I’m sure many people can relate. One key difference between Math Academy and other resources like Beast Academy is that other systems often focus on creative problem solving. That approach can feel like you’re spending forever on a single problem and not gaining much from it.

It reminds me of the issues with discovery learning for novices. It’s inefficient to be in a state where you’re forced to produce your own solution types without proper guidance. We’ve talked about how things get muddy when you’re at the edge of a field, wading through friction because you don’t have the tools to navigate it efficiently. That happens with resources that emphasize creative insights without training you to apply them.

It’s far more efficient to guide the learner explicitly on how to solve problems, have them practice repeatedly, and build skills incrementally with guidance. The problem with taking forever to solve one problem is that it reduces your repetitions. For deliberate practice to work—action, feedback, adjustment—you need a massive volume of reps. Spending an hour solving one problem is like going to the gym for an hour and doing one push-up. That’s not a workout.

Justin: You love the graph. Everybody loves the graph. It’s cool. I remember when we didn’t actually have any sort of graph visualization. This was before anyone was using the product publicly. Back when I was just starting to work with Jason on the automated task selection algorithm, it popped into my head that it’d be really cool to visualize all these topics in some way.

I pulled up one of those online JavaScript code emulators, dumped in some connectivity for our calculus course, imported a graph visualization library, and put it on the screen. It was incredible how beautiful it looked. I showed Jason, and he said, “Whoa, we need this in the system right now.” He spent the rest of the day just looking at it, mesmerized.

We’ve wanted to lean into that more. Ideally, we’d like the graph to be front and center in all the visualizations. For example, when you complete a task, you could see the effect it has on your graph. You’d see the new topics unlocked and how the spaced repetition trickles down into related topics.

It’d be great to have weekly reports showing a time-lapse of your graph filling up—what the tasks you completed this week have done for your knowledge. There’s so much we could do with it.
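As a rough illustration, the topic graph described here can be thought of as a directed acyclic graph in which a topic unlocks once all of its prerequisites are complete. The sketch below is hypothetical: the topic names, edges, and unlock rule are illustrative only, not Math Academy’s actual data model.

```python
# Hypothetical sketch: a topic unlocks when all its prerequisites are done.
# Topic names and prerequisite edges are illustrative only.
prerequisites = {
    "limits": [],
    "derivatives": ["limits"],
    "chain_rule": ["derivatives"],
    "integrals": ["derivatives"],
}

def unlocked_topics(completed):
    """Return topics not yet completed whose prerequisites are all done."""
    done = set(completed)
    return [
        topic
        for topic, prereqs in prerequisites.items()
        if topic not in done and all(p in done for p in prereqs)
    ]

print(unlocked_topics({"limits", "derivatives"}))  # ['chain_rule', 'integrals']
```

Completing a task then amounts to adding it to the completed set and recomputing the frontier, which is what a “new topics unlocked” visualization would display.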

Justin: 100%. You took the words right out of my mouth. That’s exactly what I was going to say. It’s also on our roadmap. We want to make it so that anytime you’re doing a lesson from a lower-level course because it’s a missing foundation for your current course, we tell you why you’re doing it. For example, “You need this lesson to unlock X lessons in your current course.”

It’s about closing the loop—making it clear how this is relevant to what you’re trying to achieve. It’s kind of embarrassing we don’t have all this in there already. Jason says all the time that the system is useful and valuable in its current state, but it’s still nowhere near what we’re imagining for it. It’s really good to hear your feedback.

Justin: Exactly. We’ve focused on building core functionality. Even then, there’s so much core functionality that’s not in there yet. We have so many big plans for this system. It should, indefinitely into the future, keep getting better and better. But I’m so glad that it’s appreciated in its current state.

Justin: I’m so happy you guys appreciate it, even if it’s a little rough around the edges. I’m glad it’s solving key problems.

Justin: No rush, no rush.

Justin: That’s a really good question. My most important takeaway… there are a couple of things that come to mind.

The first is something that reminds me of Paul Graham’s essay, Do Things That Don’t Scale. One reason Math Academy has been able to solve the problems we set out to address and provide such efficient instruction is that we didn’t start with a software solution. Our first step was teaching classes in person and getting a gut feel for effective learning.

We asked, “What does effective learning mean? How do these techniques work in real life?” We tried them in the classroom to see the results. Even though it’s impossible for a human teacher to fully leverage all these techniques—it’s just too much work—you start to understand how they function and why they matter. When you solve a problem manually first, you position yourself to automate it in a way that actually works.

It’s too easy to build software thinking it will solve a problem, invest all that effort, and then realize the solution is defective—missing critical pieces. Starting manually, at the lowest level of scale, and then ramping up afterward is the way to go. That’s probably the biggest takeaway.

The second takeaway, related to doing things that don’t scale, is that when you start small, you work from first principles. A common mistake is trying to tackle something big and, because it’s so big, relying on pattern matching—copying what others or bigger companies have done. But working at the lowest level allows you to see the real levers being pulled. You discover that many things people think are common knowledge are actually false.

In education, this is particularly true. There’s so much misinformation, like myths about learning styles or “neuromyths.” People focus on things they think move the needle, but they aren’t working at the first-principles level to manually improve student outcomes and hold themselves accountable. Instead, they copy what someone famous said or what a big company does. To truly figure out what works, you have to start small, from first principles, and identify the key components of the problem.

Justin: Honestly, I think Math Academy is sufficient. Just get to the level of our Methods of Proof course.

If you want to use Math Academy as your primary resource to prepare for an undergraduate math degree, here’s what you’d need to do: 1) complete everything up through calculus, 2) take the Methods of Proof course.

Some people manage to transition from AP Calculus BC in high school to a math program and keep up, but there’s often a lot of friction. I’d estimate fewer than half succeed without additional preparation. To avoid that, taking Methods of Proof is critical.

For even better preparation, I’d recommend also taking Linear Algebra and Multivariable Calculus. But at a minimum, mastering calculus and completing Methods of Proof should make you ready.

Justin: I can give a ballpark. That guy didn’t start at fourth-grade math; he placed higher because he already had some background math knowledge. He didn’t explicitly complete every topic since he placed out of quite a bit. But if someone started at fourth-grade math and completed everything in the system, I recently posted baseline XP measurements for all the courses. It came to around 30,000 or 40,000 XP total for a baseline serious student—someone who’s a good student but not perfect.

The system adapts. If you’re struggling with something, you get more frequent reviews. Every time you miss a quiz question, there’s follow-up review. So the total XP depends on how well someone is doing, but for a baseline serious student, it’s around 30,000 to 40,000.

Justin: Just to clarify, that includes all the content currently available. If you’re just preparing for undergraduate-level math and skipping courses like linear algebra or multivariable calculus, you’d focus on Foundations One, Foundations Two, Foundations Three, and Methods of Proof.

In traditional school sequences, they include a lot of material that doesn’t get used in undergraduate math, like circle theorems in geometry or inscribed angles.

Justin: Exactly. When building the foundation sequence as the most streamlined and efficient way to prepare for university-level math, we removed content that isn’t essential.

If someone started at Mathematical Foundations One—which begins with adding fractions—and completed Foundations One, Two, and Three, that’s about 15,000 XP. Adding the Methods of Proof course, which is relatively small at about 3,000 XP, brings the total to roughly 18,000 XP.

If you divide that by 52 weeks in a year, it’s about 350 XP per week, which works out to 50 XP a day. Just 50 XP daily for one year can take someone from adding fractions to being ready for an undergraduate math degree. That’s wild.
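A quick back-of-the-envelope check of that arithmetic, using the approximate XP figures quoted above (not official numbers):

```python
# Approximate figures from the conversation: ~15,000 XP for
# Foundations I-III plus ~3,000 XP for Methods of Proof.
foundations_xp = 15_000
methods_of_proof_xp = 3_000
total_xp = foundations_xp + methods_of_proof_xp   # 18,000 XP

weeks_per_year = 52
xp_per_week = total_xp / weeks_per_year           # ~346 XP/week
xp_per_day = xp_per_week / 7                      # ~49 XP/day

print(round(xp_per_week), round(xp_per_day))      # 346 49
```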

Justin: Exactly. The amount of learning per unit of time—even per unit of XP—is incredible.

Justin: Exactly. It’s just this missing chunk of knowledge. To sum up, if you complete Foundations One, Two, and Three plus Methods of Proof—around 18,000 XP—you’d be fully prepared for an undergraduate math degree.

Justin: That’s the truth.

Justin: It’s great to meet you both, Zander and James. If you ever have more questions in the future, I’d be happy to talk more.

Prompt

The following prompt was used to generate this transcript.

You are a grammar cleaner. All you do is clean grammar, remove single filler words such as “yeah” and “like” and “so”, remove any phrases that are repeated consecutively verbatim, and make short paragraphs separated by empty lines. Do not change any word choice, or leave any information out. Do not summarize or change phrasing. Please clean the attached text. It should be almost exactly verbatim. Keep all the original phrasing.

I manually ran this on each segment of a couple thousand characters of text from the original transcript.


Want to get notified about new posts? Join the mailing list and follow on X/Twitter.