Self-Transcript for Scraping Bits Podcast #137 (Round 4): Learning Math is Hard, Proof Writing, Which Order to Learn Math
- [0:00] How to get stuff to stick in your head. The importance of retrieval practice: comfortable fluency in consuming information is not the same as learning. Making connections to existing knowledge and/or emotions, exploring edge cases in your own understanding. How to get stuff to actually enter your head in the first place: the importance of prerequisite knowledge.
- [~19:00] Math Academy's upcoming Machine Learning and programming courses. Closing the loop on the pipeline from learning math to producing seriously cool ML/CS projects. How to get learners to persist through that pipeline at scale by breaking it up into incrementally simple steps.
- [~40:00] Why it's worth learning proof-writing if you want to do any kind of mathy things in the future (including any sort of applied math). When to make the jump into proof-writing. What learners typically find challenging about proof-writing.
- [~53:00] The advantages and challenges of modeling the world with differential equations. The importance of physics-y intuition about how the world works, which features actually matter enough to be incorporated into your model, and how much approximation you can get away with.
- [~1:14:00] The experience of diving down the deep trench of mathematics (and also coming back to concrete everyday life).
- [~1:22:00] The advantages and challenges of modeling the world with probability and game theory. The importance of understanding human nature and deviations from probabilistic / game-theoretic rationality.
- [~1:33:00] The importance of getting through the grindy stage of things, especially at the beginning when you have no data points to look back at to see the transformation underway. You often need to stick with it for several months, not just several days or weeks, before you really see the transformation get underway.
- [~1:54:00] Even after reaching a baseline level of initial mastery, it takes repeated exposures over time for knowledge to become fully ingrained. The importance of spaced review and continually layering new knowledge on top of old knowledge. Gaining procedural fluency opens up brainspace to think more deeply about components of the procedure.
- [~2:25:00] People who hate on vs. support others who are on an upskilling journey. Supporters tend to be more skilled themselves.
- [~2:37:00] Progress update on the upcoming ML course. The mountain of positive sentiment online surrounding Math Academy. Our learners being incredibly supportive of each other. How calculus, linear algebra, and probability work together as prerequisites for machine learning.
Cross-posted from here.
Want to get notified about new posts? Join the mailing list and follow on X/Twitter.
The transcript below is provided with the following caveats:
- There may be occasional typos and light rephrasings. Typos can be introduced by the process of converting audio to a raw word-for-word transcript, and light rephrasings can be introduced by the process of smoothing out natural speech patterns to be more readable as text.
- The transcript has been filtered to include my responses only. I do not wish to infringe on another speaker's content or quote them with the possibility of occasional typos and light rephrasings.
Justin: Well, it’s fun to be back. It’s always fun.
Justin: Sure. I am the Chief Quant / Director of Analytics at mathacademy.com. It’s an online math learning platform that’s hyper-efficient, individualized, adaptive, and fully automated. Instead of optimizing the return of the stock market, I optimize learning efficiency in students’ brains. I do all our AI, science, and algorithmic-heavy infrastructure. We’re the most efficient way to learn math.
Justin: Just gotta do the work. Just gotta lift the heavy weights. Can’t go to the gym and mess around on your phone. Gotta pick up.
Justin: Totally. A comfortable fluency in consuming information is not the same as being able to pull it out of your head and solve problems with it. Good on that professor for trying to emphasize that. It’s weird that a lot of people, even teachers and professors, fall into that trap where they write on the board, ask the class, “Any questions?” Nobody raises their hand, and they assume, “Okay, good, you all understand.”
Justin: It’s really weird because you don’t even notice that it’s happening until you attempt to retrieve it. You can think you know something, remembering the time when you were comfortably thinking about it and consuming that information, without realizing that the information is no longer available for retrieval until you actually try to retrieve it.
Justin: Whenever you successfully retrieve a fuzzy memory or any aspect of it, even just thinking about it, that extends its duration. The silver lining of realizing that you’re fuzzy on a memory is that when you fight through and successfully retrieve it, it stays longer.
If you retrieve something you’re not fuzzy on, it’s not actually increasing your retention so much as just refreshing it. But when it’s fuzzy and you’re retrieving it, it becomes more deeply ingrained.
Justin: That’s good. That’s funny. I agree. Whenever you can come up with some kind of scenario that just hits home in some way—you have a gut or emotional reaction—and you try to entangle that with what you’re learning, it helps.
When what you’re learning resonates not only intellectually but also emotionally, it ingrains deeper. Especially, as you said, with “injective” and injecting yourself. When you make mnemonics like that, it helps.
Justin: All these new lenses on mathematical topics emerge as you get into higher levels of math. You gain different perspectives on new topics.
At some point, though, a lens that was useful at the beginning stops being helpful. Trying to push an analogy further can actually make it less intuitive. Once you move beyond three dimensions into, say, six dimensions, the analogy breaks down.
Justin: They’re kind of edge cases within your own mental schema. They’re less explored areas. Even if they’re not mathematical edge cases, they’re like corners of a room in mathematics that you haven’t checked out yet.
Justin: The easiest way to understand the countability stuff is that it’s ultimately just whether you can put it in one-to-one correspondence—a bijection—with the natural numbers.
In theory, if you had infinitely many fingers, could you count it on your fingers as discrete items?
Justin: The explanation I gave depends on a lot of prerequisites. If you map out the prerequisite knowledge, you can see the gaps that need to be filled.
In addition to surjections and bijections, you need to understand the cardinality of sets, especially infinite sets. You need to know what it means for two sets to have the same cardinality.
For finite sets, it’s just the number of elements, but for infinite sets, it comes down to whether you can create a bijection between them. Once you have that knowledge in place, the one-to-one correspondence with the natural numbers becomes incrementally simple.
It makes sense that if you’re not familiar with ways of measuring set cardinality, the explanation sounds like nonsense.
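To make that concrete for anyone reading along, here is a minimal sketch (illustrative only, not from the episode) of the classic bijection showing that the integers are countable, i.e., that they can be put in one-to-one correspondence with the natural numbers:

```python
# Illustrative sketch: the integers are countable because there is a
# bijection with the natural numbers 0, 1, 2, 3, ...
# This pairing enumerates the integers as 0, -1, 1, -2, 2, -3, 3, ...

def nat_to_int(n):
    """Map natural number n to an integer: 0,1,2,3,4 -> 0,-1,1,-2,2."""
    return n // 2 if n % 2 == 0 else -(n + 1) // 2

def int_to_nat(z):
    """Inverse map, witnessing that the correspondence is one-to-one and onto."""
    return 2 * z if z >= 0 else -2 * z - 1

# Every natural number gets a distinct integer, and the maps invert each other.
first_five = [nat_to_int(n) for n in range(5)]                      # 0, -1, 1, -2, 2
round_trip = all(int_to_nat(nat_to_int(n)) == n for n in range(1000))
```

The same idea, with a cleverer pairing, is how you show the rationals are countable; the impossibility of any such pairing is what makes the reals uncountable.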
Justin: They get to a point where they ask, what’s next? What course do I take next?
I have a lot of thoughts on this.
The first thing that came to mind with the guy who regretted locking in is that it totally makes sense. The split between what he regretted and didn’t regret follows a pattern—he regrets the really passive activities, like just watching lectures, consuming information without producing anything. The things that were worthwhile were solving problems.
At some point, when you finish a degree or even go to grad school, you run out of courses. It comes down to producing things.
In the context of Math Academy, it’s still new on the scene, so there aren’t a ton of people who have fully finished all of our course offerings. But right now, you can get up through multivariable calculus, probability, statistics, and math for machine learning. Then you ask, what’s next?
It’s not enough math to just go apply to grad school. It’s not enough to immediately become a machine learning engineer. There’s a gap between our current offerings and what it takes to actually produce mathematical work professionally.
One of our goals this year is to focus on filling that gap.
We’re currently working on a proper machine learning course—not just the math for machine learning, but actually applying that math to understand various machine learning algorithms. It will cover everything from linear and logistic regression up to neural nets, convolutional nets, decision trees, and random forests—all the standard classical machine learning concepts—plus coding projects.
After that, we’ll have a second machine learning course covering transformers, diffusion models, game AI, and modern machine learning techniques.
The goal is that after students complete these two courses, they should be in a position to apply for a machine learning job and be a contender.
We’re working on fully closing the pipeline from learning math to producing with it.
Justin: That’s right. I’ve actually been pulled even deeper into that.
In addition to the machine learning work, I was working on two projects for the machine learning course today. One of them involves re-implementing an AI research paper from the ’90s that evolved neural networks to play tic-tac-toe. It’s fairly simple compared to modern AI research, but at the time, it was a big deal—even before neural nets were really popular.
Neural nets didn’t really become cool in the public eye until the 2000s or 2010s when convolutional networks started doing crazy image processing.
That was one project. Another was a reinforcement learning project on Q-learning. It’s going to be awesome.
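For readers curious what Q-learning looks like in its simplest form, here is a generic tabular sketch on a toy corridor environment. This is not the actual course project; the environment and all names are illustrative.

```python
import random

# Tabular Q-learning sketch (illustrative, not the course project):
# a 5-state corridor. The agent starts at state 0, action 0 steps left,
# action 1 steps right, and reaching state 4 ends the episode with reward 1.

N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = [[0.0, 0.0] for _ in range(N_STATES)]
rng = random.Random(0)

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def greedy(state):
    # Break ties randomly so the untrained agent doesn't get stuck.
    if Q[state][0] == Q[state][1]:
        return rng.randrange(2)
    return 0 if Q[state][0] > Q[state][1] else 1

for _ in range(500):                      # 500 training episodes
    s, done = 0, False
    while not done:
        a = rng.randrange(2) if rng.random() < EPSILON else greedy(s)
        s2, r, done = step(s, a)
        # Core Q-learning update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a').
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# After training, "step right" should score higher than "step left" everywhere.
learned_right = all(Q[s][1] > Q[s][0] for s in range(GOAL))
```

The whole algorithm is that one update line; everything else is bookkeeping around the environment.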
I’ve been pulled into these machine learning courses, and we’re also addressing the natural next question—what if someone doesn’t know how to code? How are they going to complete the coding problems in the machine learning course? How will they do the projects?
The answer is that coding is just prerequisite knowledge we need to cover in our system.
In addition to the machine learning courses, we’re working on introductory programming courses. This week, we really got the ball rolling on the first one.
The Introduction to Programming 1 course starts from zero, covering everything from variable assignment and print statements to loops, if statements, functions, and problem-solving patterns that commonly appear when tackling coding problems.
We describe Intro Programming 1 as the course that takes you from zero to solving LeetCode easy problems.
Programming 2 will focus more on object-oriented programming and software development.
Then we’ll have a Data Structures and Algorithms course and an Intro to Computer Science course, which will cover the academic discipline of computer science rather than just programming.
We’re going to have a massive machine learning and computer science arc this year.
Justin: I get what you’re saying.
It reminds me of a common failure mode among mathematically smart people who apply for software engineering jobs. They expect someone to write down a short problem on a piece of paper for them to crank through, just like a standard math problem.
But in reality, it’s different. There’s a complex legacy codebase, and you have to navigate it to accomplish something meaningful. You need to figure things out.
To produce at a professional level, having strong math skills is just table stakes. Other skills need to be developed as well.
From my perspective, you can still address this within a skill development framework. These skills are just so high-level that they come later—after something like Machine Learning 2.
Get students to understand coding, math, and machine learning models, and then have them take on larger, tougher projects. I’m not entirely sure what that would look like, but I could see it being part of a Machine Learning 3 course.
At the same time, I understand that in the real world, you don’t have a course for everything. Courses accelerate skill acquisition, but at some point, you reach a stage where no structured learning exists.
If you pursue this fully—learning everything available—you eventually get spit out into a job doing math, machine learning, software, or something similar. And at that point, you have to learn by getting your hands dirty in a non-optimal learning environment. It’s about persisting in an unfavorable practice environment, extracting learning from it, and still getting things done.
Justin: I’m not sure if I totally agree with that.
I don’t disagree, but I think it’s just one component. It’s necessary, but if you don’t have your core skills in place, even if you’re highly creative in thinking up solutions, it doesn’t matter.
If you can’t execute solutions or don’t have the skills developed to the point where you can think creatively in non-trivial ways, you’re kind of useless too.
To me, it seems like a combination of two necessary factors—hard technical skills and persistence and creativity. What do you think?
Justin: I’m not saying just crunch numbers or only focus on technical skills. I’m saying you really need both.
Justin: That’s true. You get more out of it if you have to scrape the dataset yourself.
The thing is, for a lot of learners, it comes down to breaking things into the smallest manageable chunks. In our case, with an automated system, nobody is sitting next to the learner. Initially, we would want to provide a very simple problem framing—just a small dataset. Take it, use it. Now let’s fit a model you’ve already practiced by hand. Maybe later, we introduce how to scrape data from the web.
What you’re describing is taking on a large-scale project and persisting through all the trapdoors—where it’s more than just fitting the model. You also have to collect and store the data, structure it, and figure out where to get it from. This creates a challenge, and if you overcome it, you gain an enormous amount of learning.
There are definitely people who can take a big leap like this, but most learners—outside of exceptional cases—will give up. If the instructional leap or task is too big, they quit. If it were broken into smaller steps, they would complete it and acquire the skills.
I’m not saying it’s never a good idea to take on big projects, but for learning at scale, most learners will give up. Instead of making it through, they’ll just struggle unproductively. It would be more efficient to break it up.
Of course, there are people with advantages—more background knowledge, more context, more perseverance—who can get through big projects quickly. It all depends on the learner.
Justin: I think you showed that to me. I kind of remember that one.
Justin: I thought, wow, this is sick. It’s almost like producing code.
Justin: Real analysis is on the to-do list.
Real analysis and abstract algebra are the two courses where math majors typically hit the point where things get really hard and abstract. Those are the courses people refer to when they talk about advanced mathematics.
Abstract algebra is on the roadmap for the first half of this year. Real analysis will come after that, but not too far down the line. These topics are definitely planned—we’ve just been building up to them.
One thing we want to do first is differential equations. That’s typically an earlier math course taken before real analysis or abstract algebra, and it’s one of the last missing pieces of core engineering math.
As for why we’re prioritizing machine learning and programming over real analysis and abstract algebra, the answer is market demand. People lose their minds over machine learning. It’s all over the news.
I’d say 75% of the people I’ve met who are interested in learning math are doing it because of machine learning. It has captured their interest. Another 15% are drawn in by other programming applications.
We’ve realized there’s a major need we can fill in machine learning and programming. There’s generally more interest in those areas than in real analysis or abstract algebra.
But we are definitely still going to cover real analysis and abstract algebra. Our goal is to fill out the entirety of an undergraduate math degree. It will all be there eventually—but machine learning comes first.
Justin: That’s the missing piece of the puzzle for a lot of students.
They enter a math degree without proof-writing ability, not realizing it’s essentially a prerequisite. Universities often have an Intro to Proofs course, but it’s not always rigorous. Some students get through it without really solidifying their proof skills. Then they get completely punched in the face by real analysis and abstract algebra—and drop the major.
Justin: You don’t want to lose money.
People don’t accept hand-wavy assurances like, “Don’t worry, the money’s safe.”
Justin: Proof writing is definitely foundational. It shows up all the time.
I guess you can technically get through life without knowing how to read or write, but it makes everything so much harder and more limiting. In the same way, you could theoretically grind your way through advanced math or land a math-related job without knowing proofs, but the likelihood of success drops significantly.
Proofs are essential for communicating mathematical ideas.
Justin: I would agree with that.
It’s funny to call math fun because most people wouldn’t agree, but for those who do find it fun, proof writing usually isn’t what they’re referring to. Of course, some people enjoy it, but most who like math are drawn to solving puzzles, learning how to solve equations, or working with something more concrete.
Proof writing ventures into an abstract, almost philosophical space that makes some people’s eyes glaze over.
The further you get in math, the more committed you become and the more ready you are for proof writing. But another reason proofs are difficult for early math learners—aside from being drier than other areas—is that they are extremely abstract.
If you understand the abstractions, proofs can be incredibly satisfying. But to really grasp abstraction, you need a zoo of concrete examples.
It’s like trying to understand the difference between mammals and reptiles without ever seeing them. If you’ve never seen a lizard, a cow, or a Komodo dragon, it’s hard to comprehend those abstract categories.
It’s the same with proofs. Some people try to learn proof writing before they’ve become solid in arithmetic and algebra, before they’ve seen enough examples of the things being proved.
Without that zoo of examples, proofs can feel like you’re just pushing symbols around. That makes learning harder and the experience even more boring.
I would agree that proof writing is foundational. Once you have your zoo of examples and a real commitment to math, it levels up your understanding.
Justin: It’s so weird.
It’s like pushing your brain to the limit. You know when you go to the gym and do a one-rep max on a lift? I feel like increasing the number of dimensions you’re reasoning about has the same effect mentally.
Justin: That makes sense.
I think Terry Tao has a similar characterization. He describes the same three-stage process, calling it pre-rigorous, rigorous, and post-rigorous.
Justin: Are you talking about force equals mass times acceleration?
Justin: If you discover these fundamental laws, it’s almost like unlocking secrets.
It’s funny you mention this because one of the things on my mind when I first got really obsessed with math was the idea of modeling the entire world with differential equations. I wanted to push that concept as far as possible. It was so cool that you could model all of physics with differential equations—even some social phenomena, like market behavior with Black-Scholes. It was mind-blowing. I just wanted to see how far it could go.
Did I reach the edge?
I was actually disappointed with how early I had to give up. Things became complex so quickly. I’m sure you could push it further than I did, but I ran into a major roadblock—the lack of available data when modeling sufficiently complex systems.
I remember in my senior year of high school, I was obsessed with the idea of having a dataset of all the neurons in the brain—their connectivity, the weights between them. I knew there was more to it, like dendrites and neurotransmitters, but I imagined having all that information in a giant spreadsheet. Could you build a functioning brain from that dataset using the biophysical differential equations governing electrical signaling in neurons?
It was a fun thought experiment. I played around with toy problems, generating fake data, building models, and even exploring the inverse problem—if you had neural activity data, could you figure out the underlying connectivity?
But in the end, I realized that bringing this idea into reality required more than just theory. You would need a way to actually measure all of this in a real brain. It’s been exciting to watch neuroimaging evolve over the years, but a fully high-resolution dataset is still far away.
I realized that if I wanted to pursue this seriously, I’d have to devote my life to lab biology. That was not something I wanted to do.
Justin: That’s really interesting because it intersects with creativity in modeling and also staying in tune with reality.
Physicists talk about this a lot—having a sense for empirical physics. I’ve heard many physicists say that mathematicians they work with are incredibly smart but often lack intuition about what’s actually happening. They don’t always know how to set up the model correctly.
If a physicist frames the problem for them, they can solve it and uncover fascinating implications. But they struggle with making the model properly representative of reality while still being mathematically tractable.
Justin: That makes perfect sense.
You should definitely go through some mechanics textbooks or wait until we add physics to Math Academy’s system. Either way, it’s totally worth digging into.
I think I’ll probably upset some physicists by saying this, but personally, my take on physics is that it’s a great field to get into and pick up a lot of techniques from. But you also need to know when to get out.
If you don’t, you could spend your entire life in that area while missing out on other opportunities where you could apply those skills.
Justin: It’s totally true. If you can figure out a way to use mathematical tools to define or wrangle information, and nobody else does, you get an edge.
It makes sense why people go into finance. It’s pretty easy to exploit that.
Justin: That’s the physics intuition—just knowing what actually matters and when it doesn’t.
If you incorporate too many factors into your model, it technically becomes more representative of reality, but at the same time, it becomes so clunky that you can’t actually use it. It gets so complex that it’s almost as complex as reality itself, which defeats the purpose of building the model in the first place. The goal is to simplify reality so you can actually think about it.
Justin: It’s kind of like computing. It’s kind of wild physics.
Justin: Oh, you’re saying that everything is in a fraction, so it’s a ratio of two things, which gives it meaning.
Justin: That is pretty interesting. When you put a differential equation together, you’re subject to some measurement error in the parameters. What’s the range here?
Especially when those errors compound—if you’re simulating the differential equation into the future, small initial measurement errors can grow into pretty big ones.
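A minimal sketch of that compounding effect (my illustration, not from the episode): simulate the growth equation x' = x forward with Euler steps, starting from a measured initial value that is off by 1%.

```python
# How a small measurement error in an initial condition compounds when you
# simulate a differential equation forward. Model: x' = x (exponential growth),
# integrated with forward Euler.

def euler(x0, rate=1.0, dt=0.01, steps=500):
    x = x0
    for _ in range(steps):
        x += rate * x * dt   # Euler step for x' = rate * x
    return x

exact = euler(1.00)
perturbed = euler(1.01)            # a 1% error in the measured initial value

abs_error_start = 0.01
abs_error_end = perturbed - exact  # grows by the same factor as x itself
```

For this linear system the relative error stays at 1%, but the absolute gap between the two trajectories grows exponentially; in chaotic systems even the relative error blows up, which is exactly the "small initial errors become big ones" problem.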
Justin: It reminds me of the physics tricks where, if you want to solve an equation but it’s intractable, you might use a Taylor series approximation and just keep the first two terms.
If the other terms are insignificant, you get an error bound that says your result is accurate to something like 0.000001. If it’s good enough for NASA, it’s good enough for you.
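As a concrete instance of the trick (illustrative, not from the episode): keep the first two nonzero Taylor terms of sin(x), and the remainder term guarantees the error is at most |x|^5 / 120.

```python
import math

# "Keep the first two terms" in action: sin(x) ~ x - x**3/6 for small x.
# The Taylor remainder bounds the error by |x|**5 / 120, so at x = 0.1
# the approximation is guaranteed accurate to better than 1e-7.

x = 0.1
approx = x - x**3 / 6
error = abs(math.sin(x) - approx)
bound = abs(x)**5 / 120
```

The point is that you know the error bound before you ever compare against the true value, which is what makes truncated series safe to use in engineering calculations.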
Justin: Oh, that’s interesting. So this passion for math—when you were first learning computer science and coding, you didn’t feel the same level of intensity?
What was it like when you first got into coding? Was it just a lukewarm interest, or did you think it was cool at the time?
Justin: I’m thinking of this meme from a Reddit post or something. Do you know the deep trench of math?
For anyone listening, if you just look up “deep trench of math” on Google Images, I’m sure it’ll come up.
It starts at sea level and descends deeper and deeper underwater. At the top, you have basic arithmetic and algebra, then it drops down into calculus and real analysis. At some point, you’re in deep-sea animal territory doing things like real numbers and cohomology.
Then it keeps going even further into concepts that barely make sense, like the prediction of random sequences. That’s a good one.
Justin: I remember the incompleteness theorem. It was deeply troubling to a lot of people.
But there’s always the question—how big is this space of things that cannot be proven? I don’t know. It just makes you think.
Justin: I remember when I first read about the incompleteness theorem. That was around the time I was going deep into differential equations, trying to model everything.
The way it hit me was this: math is incredibly powerful, and you should push it as far as you can. But at the same time, you have to recognize that you’re playing an unwinnable game. If you try to see the math in everything, at some point, you have to step back. If you want to enjoy the rest of your life, you have to return to reality and figure out how to integrate math into your life.
It’s about balancing the pursuit of these deep mathematical ideas while not letting them consume you entirely. For me, math is a tool in service of agency in reality.
Justin: It’s like taking a rocket ship out into another galaxy. You just keep going, and it’s amazing, but at some point, you’re running out of food and supplies. You’ve stranded yourself out there.
Justin: That’s true. It leads to an interesting transition into the psychology of people playing games, especially things like loss aversion.
There are decisions that are mathematically suboptimal in the sense that they don’t maximize expected return, but people hedge based on other factors.
I have an anecdote about probability theory as well.
Justin: Normally, that’s something you might have to learn from experience the hard way. But in this situation, you were able to think of it as a game, which made the inference easier.
Justin: After my deep dive into differential equations, I got into game theory as well.
Funny enough, that was my resolution to the complexity problem—there are too many things to measure. With differential equations, things just get too complicated at some point. What do you do? You abstract things away, figure out the rules everyone is playing by, and work from there.
Of course, the moment I really got hit with the realization that people are not rational actors, that was disappointing. I thought, Wait, I actually need to keep talking to people, understanding human nature? If you want to predict the game, you need to know how people are playing it.
Justin: It comes down to definitions—what exactly are you trying to do? What are you trying to accomplish with your modeling?
Justin: That’s true.
We have the whole explore-exploit ratio in reinforcement learning, game theory, and other disciplines. You need to explore the space early on; otherwise, you risk getting sucked into locally optimal strategies while missing out on better options. If you don’t take time to sample early, you might lock into what seems optimal in the short term without seeing the full picture.
That reminds me of Bloom’s talent development process. In the early years, people sample across different activities—kids playing multiple sports, joining various clubs, trying different interests without serious commitment. Then the second stage is when they go through the rigorous years of specialization, committing fully to one thing.
You look at your sample space and decide—what do I really feel attached to? What am I most willing to commit to? Then you commit.
This explore-exploit ratio seems to come up constantly across so many domains, even in daily life.
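The failure mode of never exploring can be shown with a toy two-armed bandit (my sketch, not from the episode): a purely greedy learner can lock onto the worse arm and never discover the better one, while a learner that explores a little finds it.

```python
import random

# Explore vs. exploit on a two-armed bandit. Arm 1 pays off more often,
# but the learner only sees its own reward estimates.

MEANS = [0.3, 0.7]   # true success probabilities, unknown to the learner

def run(epsilon, pulls=2000, seed=1):
    rng = random.Random(seed)
    counts, values = [0, 0], [0.0, 0.0]
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                     # explore: sample anything
        else:
            arm = 0 if values[0] >= values[1] else 1   # exploit current estimate
        reward = 1 if rng.random() < MEANS[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
    return counts

greedy_counts = run(epsilon=0.0)  # never explores, so it never even tries arm 1
mixed_counts = run(epsilon=0.1)   # explores 10% of the time and finds arm 1
```

With epsilon = 0, the tie-breaking rule sends every pull to arm 0 and its estimate never falls below arm 1's untouched estimate of zero, so the better arm is never sampled; that's the "locally optimal strategy" trap in miniature.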
Justin: Totally true. You can look at someone else having a great time while you’re having a less great time. Then if you go join them, thinking you’ll have a better time, you might actually end up having a worse time because you don’t even enjoy what they’re doing.
Justin: Exactly.
Everyone has to go through the grindy stage, but you don’t need to spend your entire day grinding through it.
We talked about this before in the context of exercise. Just wake up and do 30 minutes a day. Even 20 minutes. Just do it seriously and intensely, and you’ll go a long way.
When you apply that same principle to something like math, you come out the other side with real skills that you can use. Then it becomes fun, and suddenly, you want to do it all day because you enjoy it.
Justin: Exactly.
Justin: Now you’re reaping the rewards.
That’s awesome—you’ve reached a point where you can look back and see the complete transformation. You stuck with it long enough that you now have a “before and after” version of yourself, intellectually.
And the best part is, there’s still so much further to go. You’re going to have this same moment again, six months from now, looking back at today. Or even four months. You can repeat the cycle over and over.
Justin: You have to have a lot of faith at the beginning to push yourself through it.
I think that’s one of the hardest things for people who want to start a transformation—whether it’s learning math, getting jacked at the gym, running a marathon, or learning an instrument. Sometimes, they just don’t have faith that the process will work for them, so they give up too early before they can see results.
They’ll spend one or two weeks on it, but at that point, they haven’t put in enough time to see the difference in themselves. You really have to stick with it for several months, fully embrace the struggle, and just grind through it.
After several months, you finally start to see the difference.
One of the Math Academy students posted a video on Twitter about his experience. He used to have major anxiety about math, and it took a full 60 days for it to completely go away.
If I remember correctly, for those 60 days, he would wake up not wanting to do it, but he forced himself to push through. Eventually, his brain adjusted. He described it as realizing, Okay, we thought this was a temporary thing, and we hated it. But now we see this isn’t stopping, so we’re going to rewire ourselves to not hate it as much.
It takes a couple of months, but most people don’t stay on the train long enough to see the progress.
Justin: It’s really a threshold thing.
You can’t just do two weeks on, two weeks off, one month on, then stop. You have to stick with it consistently to get over the hump and complete the transition.
Justin: That’s a good point.
Giving yourself that relaxed time is important. If you’re grinding through something you hate, it’s a good idea to reward yourself afterward. That way, you feel like you’ve earned it.
And if you want to experience that same reward the next day, that’s fine—but you have to earn it again. That cycle reinforces itself.
Justin: You were going crazy.
Justin: It doesn’t sound like you’ve gotten off track at all. It just sounds like you’re doing some kind of cross-training program.
Justin: That’s a good point.
If the goal is too far away to reach in several months or a year, but you can reach it in two years, then just start. The time is going to pass either way. At the end of it, you can either have achieved your goal or not, and you’ll still have plenty of life left.
Justin: What’s the goal? Build the ultimate online learning platform for these math-heavy subjects.
Jason and I talk about this a lot. We know we have so much work to do—getting these machine learning courses shipped, integrating programming, and completing the rest of the math degree. But we’re going to get there.
In the not-too-distant future, we’ll have a full undergrad math degree, machine learning, programming—everything. It’ll be really interesting to see what kind of opportunities that opens up.
One thought Jason and I were discussing the other day that I found really cool—Jason first had the idea not too long ago—is that beyond schools, companies also need a way to assess skills.
Right now, when someone applies to a company, the company needs some way to evaluate their abilities. In software, that usually means whiteboard interviews and solving LeetCode problems. It’s a metric, but it’s nowhere near a full picture of what someone actually knows.
And if someone makes it halfway through the interview but falls just short of the required bar, what do they do? Just grind more LeetCode problems?
It would be amazing to have a more comprehensive skills assessment. If someone doesn’t meet the bar for an employer, they should have a structured way to work up to it, rather than just blindly grinding.
That’s further down the line, but it would be another pillar worth building.
Justin: When you first acquire a skill—whatever the skill is—you don’t see the full picture. You can’t execute it elegantly, and you don’t have full context around it. It just takes time for it to settle in.
Eventually, you hit a point where it feels fully ingrained.
I’ve heard something similar from people who have used the system for a long time. Some concepts or skills that initially felt shaky when they first learned them started making more sense after another month or so. They stuck with it, and over time, they started seeing things they hadn’t noticed before—especially when concepts came up again in reviews.
I think that’s what’s happening here. There are two factors at play: stepping away and then revisiting, and gaining a fresh perspective from other sources before coming back.
The general idea is that repeated exposures, especially after consuming different information, reveal connections you didn’t see before.
I actually remember reading some spaced repetition studies on this. In addition to strengthening retention, spaced reviews—especially with expanding intervals—help with generalization. People who review over time actually end up understanding concepts better than those who feel like they’ve mastered everything upfront.
The power of spacing and fresh perspectives.
Justin: I would agree.
One area where I’d add another perspective is that intuition often gets built through repetition. Not in the sense of mindlessly solving problems without thinking, but through doing something enough times that it becomes muscle memory. That frees up your brain to focus on deeper details.
If you’re actively thinking about what you’re doing, you start to notice patterns and connections that you didn’t have the brain space to consider before. At least for me, repetition leads to that same feel-it-in-your-bones intuition.
I completely agree with going back to axioms and first principles, but if you try to do that too early—before you’ve built any muscle memory with problem-solving—you’re missing a lot of context for where the first principles are actually leading.
Justin: When you say understanding why after solving, I’m curious—why wouldn’t that occur during the problem-solving process? Assuming you’re actually recalling from memory and not just copying steps from a worked example.
Justin: So if you’re solving a problem that involves a formula, and you’re not copying it from a worked example—you’re recalling it from memory—but you’re also not thinking deeply about what the formula actually means…
Justin: I think this is something we try to incorporate.
After a student gets comfortable using a formula, we typically build on it with further topics that go deeper. Eventually, this culminates in proofs related to that concept.
We don’t introduce proofs at the very beginning because most learners find that overwhelming. But it makes sense that once you’re comfortable using a formula, the next step is to explore its deeper structure—like proving that the formula is actually true.
It reminds me of when we talked about calculus and real analysis. Do you remember when you were learning calculus, going through all those derivative rules, and thinking, Where did these even come from?
I was explaining why the math curriculum is structured the way it is—why calculus comes before real analysis in essentially every math curriculum. It’s the same idea: you need procedural fluency first to free up brain space to think about the deeper components of the procedure.
It sounds like what you’re doing is working through Math Academy topics, getting introduced to new ideas, and then becoming so interested in certain ones that you feel the need to dive deeper. Your brain just wants to go on an intellectual excursion—why is this formula the way it is?
Have you taken the Methods of Proof course at all?
Justin: The full thing or just part of it?
Justin: I bet you’ll run into a number of topics that layer onto things you’ve already seen.
For instance, the concepts of countability and cardinality of sets—we have an entire section on that in the Methods of Proof course, probably around 10 to 15 topics dedicated to it. You’ll learn all about countability, the differences between countably finite, countably infinite, and uncountable sets, and the bijections with real numbers we were talking about earlier.
You’ll also see Cantor’s diagonal argument for why the real numbers are uncountable.
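The diagonal argument mentioned above can be made concrete in a few lines of code. This is my own illustrative sketch (the function names `diagonal_escape` and `toy_enumeration` are mine, not from the course or the episode): given any purported enumeration of infinite digit sequences, we build a sequence that differs from the n-th entry at the n-th digit, so it cannot appear anywhere in the list.

```python
def diagonal_escape(enumeration, n):
    """Return n digits of a sequence that differs from each of the
    first n enumerated sequences at the diagonal position."""
    new_digits = []
    for i in range(n):
        d = enumeration(i)[i]                  # i-th digit of the i-th sequence
        new_digits.append(5 if d != 5 else 6)  # pick any digit != d
    return new_digits

# A toy "enumeration" for demonstration: the i-th sequence is the
# digits of i, repeated (truncated to 100 digits for practicality).
def toy_enumeration(i):
    digits = [int(c) for c in str(i)]
    return (digits * 100)[:100]

escaped = diagonal_escape(toy_enumeration, 10)

# By construction, escaped differs from toy_enumeration(i) at index i,
# so it is not any of the first 10 sequences in the enumeration.
for i in range(10):
    assert escaped[i] != toy_enumeration(i)[i]
```

The same construction works against *any* enumeration, which is the heart of why the reals are uncountable: no list can contain every infinite digit sequence.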
When we initially introduce these ideas—maybe in probability and statistics—we briefly mention that discrete random variables need to be countable. But we don’t go deep into the full rabbit hole of countability at that point.
This material is either already covered in some courses or part of post-requisite courses we haven’t created yet, like real analysis or abstract linear algebra. We build up to these concepts incrementally.
But if you’re going through your lessons, come across a formula, get familiar with it, and suddenly start asking, Why is this the way it is? then you might jump off the system and go down the rabbit hole on your own.
That makes sense—it would have the same effect of deeply ingraining your memory and building intuition.
I think it’s just a matter of when that happens.
You’re definitely on the end of the spectrum where you have more mental capacity free to think about these things. Your typical math learner, on the other hand, would be completely overwhelmed by going down those rabbit holes too early. They need a lot of practice and skill-building first, to reach a level of automaticity.
It makes sense why your approach works for you.
Justin: I think I remember that one.
Justin: This makes sense.
It sounds like you go depth-first into everything until you hit too much friction, and then you’re like, Okay, fine, I guess we have to stop here. Then you go back and apply the same depth-first approach to the next topic until you reach a similar point.
Justin: When you first looked at this formula, the graph didn’t immediately pop into your head?
Justin: Right. Lambda is basically a stretch factor in the graph, but in addition to stretching horizontally, it also compresses vertically because the total probability has to sum to one.
The easiest way to think of it is just graph transformations—pushing lambda in one direction makes the graph go down.
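The stretch-and-compress trade-off is easy to check numerically. Here is a minimal sketch (my own, not from the episode) for the exponential density f(x) = λe^(−λx): the height at zero is exactly λ, while the total area stays 1 regardless of λ, because horizontal stretching and vertical compression cancel out.

```python
import math

def exp_pdf(x, lam):
    """Exponential density f(x) = lam * e^(-lam * x), for x >= 0."""
    return lam * math.exp(-lam * x)

def integrate(f, a, b, steps=200_000):
    """Simple midpoint Riemann sum, good enough to see area ≈ 1."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

for lam in (0.5, 1.0, 3.0):
    # Integrate far enough out that the tail is negligible.
    area = integrate(lambda x: exp_pdf(x, lam), 0.0, 50.0 / lam)
    print(f"lambda={lam}: f(0)={exp_pdf(0.0, lam):.2f}, area={area:.4f}")
```

Running this shows f(0) tracking λ while every area comes out at 1: pushing λ down stretches the graph horizontally and pushes its peak down, exactly the transformation described above.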
Justin: Sounds like some elements of the abstraction were just a little unfamiliar.
Where would you have seen e^λ before? Probably nowhere, unless you came across it on your own.
If I asked you to compare the graphs of e^(3x) versus e^(7x), you could probably visualize that easily. But e^(λx) is a bit trickier—it adds more cognitive load.
Justin: It probably depends on a lot of contextual factors.
Anytime a student encounters a lesson, there are so many variables at play—how strong they are on prerequisites, how recently they’ve seen them, how deeply they understand them, and how much transferability they’ve developed.
Some students breeze through certain lessons but trip up on others. Even the smartest students experience this all the time.
I remember in Math Academy’s school classes, we had eighth graders taking BC Calculus. Some kids were insanely fast at mental arithmetic but couldn’t graph to save their lives. Others were great at graphing but struggled to add fractions with different denominators.
Everyone has a different knowledge profile with gaps in random places. It’s just expected.
Everyone has those well, duh moments, and they see those same moments happen to others too.
Justin: You went from hating statistics to getting deep into probability. And also debating the proof question.
Justin: Take advantage of your inspiration.
Things move quickly when you’re inspired to do them.
Justin: They don’t even acknowledge it.
Justin: I bet the fuck yeah people are a lot more mathematically capable than the haters.
Justin: It’s really weird, but it makes sense.
The people who have put in the work to reach a high level of excellence want to see others succeed too.
It’s usually the ones who wish they were capable but don’t have the willpower to train that get bitter. They hate seeing someone else become the person they wish they could be.
They also tend to see everything as a competition, like it’s a zero-sum game. But the people at the top know that if more people got good at math, we could all do cool shit together.
Justin: Yeah, I bet another part of it is just general dissatisfaction with life.
You can totally imagine someone who wishes they were somebody else but isn’t making progress toward it. They’re just unhappy, and instead of fixing it, they try to spread that unhappiness.
Unhappy people try to make others unhappy. Happy people try to make others happy.
Justin: If only you could model this with a differential equation. That’d be nice, but unfortunately, I don’t think there’s any way to get this information other than just interacting with people.
Justin: That makes perfect sense. No plan survives contact with the enemy, right?
The plan itself doesn’t even matter as much as what it gets you to do in the next month. As long as it drives action and keeps your trajectory moving forward, then it’s a good plan.
Justin: You just keep a portfolio of productive things you can do, and then switch to whatever you’re most inspired to do at the moment.
When you get bored or fatigued, you switch to something else. This can play out hour by hour during the day or week by week, but the general idea is the same. Just keep pushing the boulder in whatever direction it needs to go.
Justin: I should eat some dinner before I sleep. It’s 9:15.
This was great—a really fun chat. I always feel like we come away with something valuable. It’s funny how whatever we end on always somehow connects back to where we started.
Justin: I’ll be happy. Always happy to chat with you.
Justin: We’re aiming to release it this spring.
I’m working on it every day.
Justin: The internal deadline—don’t hold my feet to the fire on this, or Alex’s, or anyone’s—but we’re shooting for an end-of-March release. That’s the goal.
Definitely well before summer. It’s imminent.
I’m working on this every single day. Alex and his team are working on it every single day. We’ve got multiple PhD mathematicians cranking out content day in and day out.
I was just reviewing topics on support vector machines, the kernel trick, dropout regularization, and neural networks. The ball is rolling.
It’s really exciting to watch this come together because it’s something I wish I had when I was learning.
Justin: And that’s the goal—get people to go from I don’t know how to add fractions to I know how to add fractions, and yesterday I built a transformer or a diffusion model.
It’s going to be great.
It’s literally going to create these gigantic math arcs. Of course, there’s a lot of work involved in getting from one end to the other, but at least we’ll have the rails for people to move efficiently.
Justin: The funny thing is, it already seems fake to some people.
I saw a Reddit post where someone was saying, There’s so much positive feedback online about Math Academy, I don’t know what to believe, and they were asking for honest opinions.
Of course, all the comments were positive, so I think they got converted.
Then there was this other person on Twitter—this was months ago, back in the fall—who pointed out how many people were talking about math on Twitter. They were convinced it was part of some massive psychological operation, funded by Big Math Academy.
Then, ironically, a bunch of people heard about that tweet and started accusing them of being part of the psyop.
I get accused of being in it too.
Anyway, I love all the positivity around it. But the best part is seeing our learners become fuck yeah people for each other—cranking on learning math, being supportive online, pushing each other, and celebrating wins.
Justin: Yeah, that sounds good.
Justin: I think everything in machine learning ultimately serves probability.
Of course, linear algebra and calculus are super important, but mainly because they show up in probability.
Linear algebra is necessary for handling matrices and vectors, especially when differentiating functions that take them as inputs and outputs. Calculus is essential for optimization, but both are ultimately in service of fitting probabilistic models.
So, my answer aligns with your intuition—probability is the big one.
That said, calculus and linear algebra are also important, but more as supporting pillars.
Justin: That’s a good way to put it.
Probability is the top-level framework for understanding what models are doing.
Linear algebra provides the structure of the models, and calculus determines the values you should put into that structure to make the model behave probabilistically the way you want.
Justin: Oh, yeah, I’ve seen that one. It’s great.
Justin: My pleasure. Always happy to chat.
Prompt
The following prompt was used to generate this transcript.
You are a grammar cleaner. All you do is clean grammar, remove single filler words such as “yeah” and “like” and “so”, remove any phrases that are repeated consecutively verbatim, and make short paragraphs separated by empty lines. Do not change any word choice, or leave any information out. Do not summarize or change phrasing. Please clean the attached text. It should be almost exactly verbatim. Keep all the original phrasing. Do not censor.
I manually ran this on each segment of a couple thousand characters of text from the original transcript.