Self-Transcript for Scraping Bits Podcast #116 (Round 3): Essential Math for Machine Learning, Math Intuition/Creativity, Proof Vs Computation
Why go through lots of concrete computational examples first before jumping into abstract proofs. The importance of having a zoo of concrete examples. The evolution of Math Academy's content. How to identify the right "chunks" of information and the right prerequisites for the knowledge graph. How to continue learning math as efficiently as possible after you finish all the courses on Math Academy. Frustrations with the lack of existing ML learning resources. How to know whether you're ready for ML projects or you need to learn more math. The blessing and curse of intellectual body dysmorphia. Harnessing reality distortion as a helpful tool. Journaling and documenting one's life.
Cross-posted from here.
The transcript below is provided with the following caveats:
- There may be occasional typos and light rephrasings. Typos can be introduced by the process of converting audio to a raw word-for-word transcript, and light rephrasings can be introduced by the process of smoothing out natural speech patterns to be more readable via text.
- The transcript has been filtered to include my responses only. I do not wish to infringe on another speaker's content or quote them with the possibility of occasional typos and light rephrasings.
Justin: Thank you. Happy to be here. We always have such interesting conversations.
Justin: It makes a lot of sense. Math is one of those subjects where you can go super deep on it. It just branches out so much. There’s the math itself, the training of your ability to do math, the emotional experience of it, and so many dimensions.
Justin: You get this combinatorial explosion. We can talk about emotions while scaling up for this combination of things XYZ, just exposed.
Justin: You accumulate lots of content, and the content is evergreen.
Justin: That’s great.
Justin: That’s a great feeling—being able to teach anyone.
It’s kind of interesting how you mentioned going back to learn the derivations, the proofs, and the foundations of these things. When you think about building math from the ground up, these are technically the first principles underlying all these topics. But typically, as you go through the math education system, what you’re doing with calculus derivations and proofs is an introduction to real analysis, a course that comes after Math Academy or typically after calculus. After doing computational approaches, you go on to do the proof style.
Justin: I get what you’re saying. If you can read a proof and understand it fully without going through concrete examples, then why not just do the proof? You work through it, reproduce it, and move on. But the thing is, proofs are typically one of the hardest things for math learners. If you take a calculus student and say, “Instead of practicing the power rule, we’re going to go through a proof of it,” most students would struggle. Maybe 0.01% of students can follow something like that, but they already have a background knowledge of derivatives. They’ve made these mental leaps or reached a knowledge state that most students would need to grind through concrete examples to reach. For example, knowing that when you differentiate a curve, it gets less curvy. Most students need to go through many examples to reach this kind of intuition. Only a very few students can make these big generalization leaps, so they can stomach proofs earlier.
Justin: When I was learning math, I was in a similar category. A lot of people would need to grind through examples, but it took me fewer examples to get to the next step. I still had to do examples, but I processed them mentally by thinking critically about them. It took fewer examples for me to be ready for the next step. For a long time, I thought that was the ideal way to teach math. But when I started tutoring and teaching a broader group of students, it made a lot of sense why traditionally you go through compute-heavy classes before proof classes. For most students, if you take a larger leap into a more abstract or technically complex setting, you lose them. Their eyes glaze over, and then you start trying to recover, but they’re not really paying attention and become demotivated. It works better for most students to grind through compute-intensive examples first.
Justin: I think your compiler analogy is actually really good. You learn a programming language, and the first language you learn is probably not assembly. You’re not going to build your computer science knowledge from the literal ground up, like starting with assembly or soldering your own circuits. There’s value in that, but at the same time, there’s a level of sophistication required before you can really capture that value.
For most people, especially computer science learners, it’s probably something like Python. But if you have more of a computer science motivation and a greater aptitude in that area, you might start with C or something a bit lower. It requires more motivation and a genuine interest in the material the lower level you start. Once you learn one of these mid-level or high-level languages, it’s a good idea to go down a level of abstraction. In math, this would be like learning proofs. You see how the tools you’re using are actually made, and then you can manipulate those fundamental building blocks to create new things.
Justin: You learn the standard tools, then break down the standard to figure out how these tools are built. Once you understand that, you build your own tools. Hopefully, you’re working in an interesting discipline where there are unsolved problems. You create tools to solve these problems, and now you’ve created a lot of value and are having a lot of fun. You’re creating new math.
Justin: That’s totally true. For people who don’t want to become mathematicians but want to use math, they’re in a similar situation. They start learning high-level concepts in their domain, and for them, the underlying building blocks are the math. It’s not the proof-based math but the computational math. They learn these computational math concepts and use them to create high-level tools in their domain.
It’s the same journey: starting high-level, going down to lower levels of abstraction, and coming back up. For some people, it’s a journey from computational math to proofs and back to computational math. For others, it’s a journey from software tools in a discipline, down to how computational math works behind those tools, and then back up to new tools in that discipline. There’s no harm in knowing all the layers of abstraction. It’s super hard to know a discipline really well, understand the computational math behind all the tools, and also be solid on theoretical foundations. More math is never a bad thing.
Justin: I would say that the point of doing computational exercises is to help build up this more general understanding. For most students, you can’t just jump straight into this generalized understanding. You can’t walk into a high school calculus class and say, “Alright, today the textbook has a section on the power rule, but we’re not going to do any examples. We’re just going to go through the proof, and then you guys will be able to do stuff with the power rule.” You’d get a bunch of stares. Maybe if you ran this experiment 100 times at 100 different schools with 30 kids per class, you might get a top-0.1% student who has some background knowledge, has seen examples before, and has a high generalization ability. They’ll be able to follow.
But the way to look at the concrete examples is that the goal is still to build the underlying knowledge structure. Most students can’t handle jumping straight into a more abstract treatment, so the computational problems scaffold them. They get enough of the underlying knowledge in place that, by the time we go to shore up the foundations in the abstract sense of the proofs, they actually have some structure built up. It’s like having a spot in the gym.
The computational examples serve as scaffolding. There are a lot of students who think they don’t need computational practice. They end up going through thinking that until they’re held to account on actually solving problems, and then they realize they don’t know it as well. It’s always a good idea not to burn time grinding through tons of the same type of problem; at some point you say, “Okay, I get it, let’s move on.” But it’s also easy to overestimate how ready you are to understand the underlying proofs. Many students struggle with knowing whether they’ve actually understood a proof.
It sounds like you have a solid sense of that, that you can go through a proof, and your mental compiler is accurate. But many students I’ve worked with have trouble with proofs. They’ll construct an incorrect proof and be convinced that it’s correct. Or they’ll be handed a proof and asked if it’s correct, and they’ll say yes, not realizing there’s an error. You can also give them a statement and ask them whether there’s a counterexample. Even if the statement is actually true, they’ll still come up with a “counterexample” and claim the statement is false. A lot of students have trouble knowing whether they’ve accurately understood a proof. This is where the concrete problems serve as scaffolding, because with a concrete problem it’s very clear whether you’ve got it correct or not, whether you actually understood and executed the thing.
Justin: That makes perfect sense.
Justin: You got to a level where you can look back at the proofs and appreciate the elegance that went into constructing these mathematical tools. My prediction is that it sounds like you’re experiencing a lot of “aha” moments right now, and things are clicking into place. That’s exactly what you’d expect if you go through a lot of concrete problems, get familiar with the tools, have some unresolved questions, and then go down a level of abstraction. Once those questions are answered, everything just clicks, and it makes perfect sense.
Here’s the thing that I think wouldn’t work as well: If you went to a subject you haven’t worked out with concrete examples, for example, linear algebra. Let’s say you hadn’t done computations with matrices, hadn’t computed determinants, or hadn’t computed eigenvalues or eigenvectors. If you started with the proofs of everything, it would not work out well. It would be confusing to you, especially things like spans and bases, because you wouldn’t have the concrete examples to make sense of them.
Here’s a concrete example of what I’m trying to illustrate: Let’s take the span of vectors and determine whether a set of vectors is dependent or independent. The span of vectors is the space covered by all linear combinations of those vectors. You add the vectors head to tail, and you can do that as many times as you want, with any multiples of the vectors. You get a feel for this if you see examples with actual two-component vectors. For example, with the vectors (1, 2) and (5, 8), you can add multiples of them head to tail or negative multiples to go backward. You can reach any point in the coordinate plane if these two vectors do not overlap with each other, meaning they don’t lie along the same line. If they do overlap, you’re limited and can only move along the one line they both point along.
If you haven’t played with this computational example, if you haven’t solved problems to determine whether a concrete point in the plane can be represented as a linear combination of vectors, then if you go straight to proofs about linear dependence, independence, and the span of vector spaces, you won’t have a concrete understanding of what the symbols mean.
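As a minimal sketch of that kind of concrete problem (the function name and the target point (3, 4) are illustrative assumptions, not from the conversation), the Python below checks whether a point in the plane is a linear combination of two vectors and, if so, finds the multiples. It uses the (1, 2) and (5, 8) pair from above and solves the 2x2 system with Cramer’s rule.

```python
from fractions import Fraction

def combination_coefficients(v1, v2, target):
    """Return (a, b) with a*v1 + b*v2 == target.

    Returns None when v1 and v2 are collinear: they then span only a
    line, so there is no unique pair (a, b) for a given target.
    """
    # Determinant of the matrix whose columns are v1 and v2.
    det = v1[0] * v2[1] - v2[0] * v1[1]
    if det == 0:
        return None
    # Cramer's rule for the system a*v1 + b*v2 = target.
    a = Fraction(target[0] * v2[1] - v2[0] * target[1], det)
    b = Fraction(v1[0] * target[1] - target[0] * v1[1], det)
    return a, b

# (1, 2) and (5, 8) do not lie along the same line, so any point works.
print(combination_coefficients((1, 2), (5, 8), (3, 4)))

# (1, 2) and (2, 4) lie along the same line, so the span is just that line.
print(combination_coefficients((1, 2), (2, 4), (3, 4)))
```

Here the first call gives a = -2 and b = 1, since -2·(1, 2) + 1·(5, 8) = (3, 4), and the second call returns None because the two vectors overlap.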
I’ve tutored quite a few people who started out with Axler’s Linear Algebra Done Right, which is very proof-heavy. The question they had was, “What is a vector? What is a span?” They didn’t know what these things meant. They knew it was something like “a times v1 plus b times v2,” but they didn’t understand what the symbols represented. It felt like pushing symbols around. It’s like trying to describe a zoo and the characteristics of animals to someone who’s never been to the zoo or seen animals.
From their perspective, these words were just made up. You might say, “A lion has a mane, it roars loudly, and it tries to eat gazelles, who run away.” But if they’ve never seen this play out, they won’t really understand it.
Justin: Right, the point of proofs, derivations, and abstract math is to compress all these concrete examples, distill them into the densest form, and compact the most information into the least amount of symbols and writing. If you have those uncompressed examples in your head, the act of compressing them into the distilled idea feels euphoric. You get a huge dopamine rush. It’s the “aha” moment, when things click into place and you think, “Oh, that’s awesome.” But if you don’t have those examples, you’re just scratching your head, wondering if there’s something you’re supposed to be getting or taking away. It just seems unclear. It’s like the chicken and egg problem.
Justin: Totally. I think that’s the way to do it. There’s this whole debate in math education, where some people ask whether you should teach the procedures or the concepts first. Some say you don’t need procedures if you know the concepts. Others say concepts are overrated, and if you just do the procedures, the concepts will naturally fall into place. People try to break concepts, procedures, computational math, and proofs into two separate bits, claiming you only need one or the other or you should do one entirely before the other. But in reality, they’re mutually reinforcing. I read that terminology in a paper, though I can’t remember the name. It struck me as the right way to describe it. They’re mutually reinforcing. You kind of interleave them, alternating between the two. It’s a dance between both of them.
Justin: I know what you mean. As you go through a computational course, there are times when you’re ready for a proof on some underlying material.
Justin: Yeah, you need to know what the symbols actually mean that you’re working with.
Justin: Kind of like a foreign language. I agree. Before working on concrete examples, you need to understand what the symbols are that you’re working with. You need to loosely understand what you’re working with. Sometimes, the explanation can go deeper with a derivation—like a quick derivation. Other times, the derivation is so long-winded that a good explanation will skip over it and just give you the highlights. These are the players, this is what they do, you do some examples, and revisit this later. If you don’t know the meaning behind the symbols you’re working with, it makes it hard to even remember what you’re doing. Imagine trying to teach a kid multiplication tables without them knowing that multiplication is repeated addition. They wouldn’t even understand why 3 times 3 is smaller than 7 times 6. Exactly.
Justin: Totally agree. If we don’t have that already in our lessons, then that’s definitely something we need to add: the meaning of the bar, “such that” or “given.” I’d be surprised if it’s not there, but if it’s not, I’ll check after.
Justin: Right, you see it in the intro topic, then later you may forget what the bar means, and it’s not on your mind. Then you feel like you’re grinding through something you don’t really understand. It’s just fragments.
Justin: That would be such a big help. I think it would increase the rate of learning exponentially. It’s definitely on the roadmap. That happens to me a lot—I go through things quickly because I’m doing a lot at once, and I skip over some stuff. There’s always a portion that isn’t retained. For example, one lesson mentioned something, but then another lesson references it later, and I don’t remember it.
Justin: We actually took a look at that lesson afterward, and there was something we needed to mention more or prepare the student for in the tutorial slide before the lesson. It was about the system of equations that has to be constructed. My guess is that’s what kind of threw you off. In the previous examples, with the previous knowledge points within that lesson, you can always just plug in a value of x that eliminates all the coefficients except for one of them, and it’s just a linear equation with one variable that you can solve. But in the last example, there is a system of equations. You get to a point where no matter what value of x you plug in, you have two unknown coefficients to solve for. So, you have to plug in two values of x to get a system of two equations. This is explained, but we didn’t really prepare the student for it.
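To illustrate the two-values-of-x step, here is a hedged Python sketch. The fraction (2x^2 - 5x + 6)/(x - 1)^3 is an assumed example chosen for illustration, not the problem from the actual lesson. Plugging in x = 1 isolates one coefficient, but every other x leaves two unknowns, so two plug-ins are needed to form a system.

```python
from fractions import Fraction

def numerator(x):
    # Assumed example: decompose (2x^2 - 5x + 6) / (x - 1)^3.
    return 2 * x**2 - 5 * x + 6

# Target form: A/(x-1) + B/(x-1)^2 + C/(x-1)^3, which means
# numerator(x) == A*(x-1)**2 + B*(x-1) + C for every x.

# Step 1: x = 1 wipes out the A and B terms, leaving C on its own.
C = Fraction(numerator(1))

# Step 2: any other x leaves both A and B, so plug in two values of x.
def equation_at(x):
    """Return (coefficient of A, coefficient of B, right-hand side) at x."""
    return Fraction((x - 1) ** 2), Fraction(x - 1), numerator(x) - C

a1, b1, r1 = equation_at(0)  # A - B = 3
a2, b2, r2 = equation_at(2)  # A + B = 1

# Solve the resulting 2x2 system with Cramer's rule.
det = a1 * b2 - a2 * b1
A = (r1 * b2 - r2 * b1) / det
B = (a1 * r2 - a2 * r1) / det

print(A, B, C)
```

This gives A = 2, B = -1, C = 3, which can be checked by expanding 2(x - 1)^2 - (x - 1) + 3 back into 2x^2 - 5x + 6.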
Jason asked me to look at that last night, and I wrote a note to Alex about all the things we need to adjust, and he totally agreed. We run tons of analytics on the lessons and generally look at aggregate pass rates. If anything has too low a pass rate over a large number of students, we drill down on where the issues are. But it’s true, there are still some rough edges here and there that need to be buffed out, like this one. We need to add more explanation to the tutorial.
We’re also going to handle the explanation of why, in partial fractions, when there’s a term in the denominator with an exponent, you have to duplicate all the powers of it in the partial fractions expansion. The proof is very involved, and it’s not the best way to understand it, but if you think about it really hard, you can distill the core idea down. It’s a hand-wavy but useful intuition that sticks better. That lesson is getting a makeover now. You’re not stupid; it was a good thing for us to improve on.
Justin: It’s kind of like it’s been hammered into good shape over the years of just having tons of kids use it—middle schoolers, high schoolers—that kind of forces a level of explanation ability. But it’s one of those things where it hurts, or it’s like fixing bugs in a product. Yeah, it’s the same thing as software bugs; it’s just old content bugs. Some bugs are more severe, some are less, and some are just little rough edges that would be nice to have fixed. You can fix 99.9% of the bugs in the product, have it working correctly almost all the time, but there are still some rough edges that pop up here and there. You still have to buff them out.
I think, funny enough, when I first joined Math Academy about five or six years ago, we didn’t even have lessons in the sense of broken-up slides with practice problems interleaved between them. It was just one big tutorial slide, this wall of text with little example headers. That’s about it. I remember Jason asking me if the students were actually reading those. I would go to some TA sessions for Math Academy students in our school program, and the teacher would go through the material. These were summer students, and they were supposed to read the tutorial and do practice problems at the bottom. Jason would create an assignment full of problems on these topics for them to do, and they were supposed to do the problems.
Jason kept asking if the students were reading the tutorials or just glossing over them and not doing the practice questions. I monitored a bunch of these students, and yeah, that’s exactly what they were doing. We had this whole event where we realized that we couldn’t depend on static content where the students were supposed to read the whole thing and then do the problems. We had to present minimal information, check their ability to solve problems corresponding to that minimal information before even allowing them to proceed.
Alex could talk about this much more than me, but it has evolved so much over the past five years, and I’m sure the next five years will continue to bring improvements.
Justin: That’s pretty much it. I remember there have been many content discussions with Jason, where he’d say, “It needs to be simpler, it needs to be simpler. I don’t even want to read this. If I don’t want to read this, the kid doesn’t want to read this. Make it simpler.” He’d fiercely hack away at anything that’s not essential. Otherwise, you end up with this huge, thousand-page tome. I don’t know if you’ve seen these gigantic textbooks on all the math you need to know for machine learning.
Justin: It’s a very mid-wit approach to intimidate your reader by showing off the size of your vocabulary. Okay, great, you have a good vocabulary, but now you’re losing the person who’s supposed to be reading your work.
Justin: That’s true. I am sketching out a lot of this machine learning course because Alex is tasked with turning out a lot of content in a very short time, and he needs my help. We’re not going to lag on this machine learning course. We’re going to push this thing through, whatever it takes. What that takes is me pushing the boulder on this course, and that’s what I’m doing.
Funny enough, this is less about me joining the content team and more about me returning. When I started with Math Academy, I actually began by making instructional videos based on those page-long tutorials. This was our initial solution to why kids weren’t reading the tutorials and working on the practice problems. Our first solution was, “Let’s make videos so they can watch the videos.” I made hundreds of these five to twelve-minute videos, explaining things, but it turned out the videos weren’t that helpful. They were better than a single page tutorial with no way of checking if students were paying attention, but students would zone out during the video. They wouldn’t work along with the problems; the video was just playing.
You think you’re following along, but it’s really the equivalent of staring at a page of text. You start out paying attention for a few minutes, but then you gloss over, and you get lazy pretty quickly.
Once we had the realization that lessons had to be broken up into slides with minimal doses of content, followed by actual practice problems you couldn’t move on from unless you got them right, I was the one who went through and converted hundreds of these huge tutorial pages into the lesson format. After converting a bunch of them and writing a lot of lessons, it wasn’t until a year and a half after I started working with Math Academy that we even talked about the idea of an automated task selection algorithm. That’s when I got involved with the real quant side of things.
So this is a return to content for me, for two or three months.
Justin: I actually got asked a very similar question recently, and I gave this huge, long-winded answer. I realized at the end of it that it all comes down to one thing in particular, and that one thing is something we talked about earlier on this podcast: concrete examples. Here’s what I mean: concrete examples are not just the key to scaffolding yourself up to this lower-level abstract understanding of the composition of these building blocks. It’s not just a learning thing; it’s also a content development thing. Without a concrete example, it’s easy to start waxing philosophical, and students start glazing over. Nobody knows what the hell you’re talking about.
When you’re waxing philosophical, you can get into this situation where you go on and on and on. But if you distill this down into a concrete example, it forces your hand. You have to write out a solution using the building blocks the student already knows. If you’re going on for three pages of computational work to get the student to solve the problem, that means you haven’t chunked the information well enough in the student’s mind through prerequisite material.
Ideally, no matter how big or advanced the problem or topic is that you’re trying to get the student to solve, the concrete example solution needs to be small enough that it fits on one page. It should be a reasonable problem for the student to solve, and you’ll only get to that point if you’ve built up the right foundations, the right prerequisite content for the student.
How do you identify the right prerequisite? A lot of times, this is clear to someone who has learned the material and is familiar with how this is typically done in education. In an algebra or calculus course, there are thousands of textbooks out there. If you grab ten calculus textbooks off the shelf, you’ll find things like the chain rule, the power rule, the definite integral, the limit of a quotient—these concepts are chunked up and universally agreed upon. This is the way you chunk it because it’s what’s worked well.
You piggyback off of that. If you put out a calculus course that doesn’t have that kind of chunking, it won’t even be good from a marketing standpoint. There’s no reason to do things in a weird way. It’ll produce subpar outcomes pedagogically, and you’re not going to sell anything.
Justin: Right. So you’re wondering how to take it if you age out of Math Academy, having completed all the concepts there, and now you’re kind of left stranded. You had your luxury learning experience where everything was nicely served up to you, but now you’re thrown into the jungle and left to fend for yourself. How do you construct a little hut for yourself, or a little micro-system, and build up all your knowledge from your own mind?
There are resources out there. As you get to the cutting edge of a field, the resources gradually start to vanish, but there are still some. There are textbooks on really advanced math, grad-level and beyond, and blog posts. Eventually, at the cutting edge, there are papers. Even in the papers, your prerequisites are basically the references.
I think Alex made a post about this on Twitter where he mentioned one of the mistakes he made when starting his PhD. He picked a paper off the shelf that he was supposed to improve upon and just tried to dive in without building up the prerequisite knowledge. It wasn’t as efficient as he had hoped. What he ended up having to do was go back to the references in the paper and make sure he understood all those first. If there was a reference he couldn’t fully understand, he would go back to it, filling in the gaps of his knowledge.
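The idea of treating a paper’s references as its prerequisites can be sketched as a small graph walk. This is only an illustration: the function name, the paper titles, and the reference map below are hypothetical. Starting from the target paper, it visits references depth-first and lists anything not already known, prerequisites first.

```python
def reading_order(paper, references, known):
    """List what to read, prerequisites first, ending with `paper` itself.

    `references` maps an item to the items it cites (hypothetical data);
    `known` holds material the reader already understands.
    """
    order = []
    seen = set(known)

    def visit(item):
        if item in seen:
            return
        # Mark before recursing so circular citations cannot loop forever.
        seen.add(item)
        for ref in references.get(item, []):
            visit(ref)
        # Every reference has been handled, so this item is now readable.
        order.append(item)

    visit(paper)
    return order

# Hypothetical reference map for illustration only.
references = {
    "target paper": ["paper A", "paper B"],
    "paper A": ["textbook chapter"],
    "paper B": ["paper A"],
}

print(reading_order("target paper", references, known={"textbook chapter"}))
```

With the textbook chapter already known, the order comes out as paper A, then paper B, then the target paper.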
I don’t think there’s ever a situation where you have to pull everything from your own mind. There’s always external material you can leverage. The only exception would be if you’re solving a problem that nobody’s solved before, and it’s not like you’re iterating on someone else’s work. If you’re the first one there, then the way to approach it would be to solve the smallest, simplest version of the problem. You start out with this big lofty idea of how it will generalize, but you just simplify. Take out the things that make it difficult until you get to this nugget of a simple case. Then you check it out and solve those simple cases, and eventually, you build up to where you want to go.
Justin: Yeah, exactly. It’s about trusting what you know.
Justin: I think that’s totally what we’re going for. We want to finish that pipeline from learning math to actually doing machine learning out in the real world. That was one of the main reasons this machine learning course got started. I remember Jason asking me one day, “Hey, it seems like there are a lot of people interested in machine learning. How much do you think the interest is around there on Twitter?” I said, “Probably about 70%. I’d say 70% of our Twitter community is interested in Math Academy for machine learning.” He was like, “That’s a lot. What do you think would happen when we drop it?”
I guess that even proves the point more. What would you say it is?
Justin: Exactly. The thing is, as Jason and I were talking about on that one call when we discussed the machine learning course, what he originally asked me was to gauge the sentiment of the Twitter community. The thing that stuck with us most is that we need a pipeline with an end goal. Right now, it’s just learn math for machine learning. Great, but what do you do after you learn the math for machine learning? It’s just, “Yeah, find another course.”
I’ve done this exhaustive search for machine learning resources online while designing this machine learning course. It is rough out there. If you want to learn machine learning, it’s not easy. It’s patchy—this weird, ugly patchwork quilt with holes everywhere. We need to complete this pipeline. You need to actually do machine learning, code some stuff up, and get to the point where you can implement and re-implement real machine learning papers. Ultimately, we want to help you land a machine learning gig.
We want to do the actual zero to hero, not just a cross-section, but the actual zero to hero.
Justin: I guess people just gloss over it. Another thing that complicates the matter is that these courses often assume you come in with the necessary math or they water down all their content so you don’t need a lot of math to do it. But let’s say you find a course that actually gives a decent treatment of the machine learning material and assumes you know all the necessary math. The next question for the learner is, “What is the math that I need to know?”
You go and google, “What math do I need to know for machine learning?” and the results vary so much. Some people say, “Oh, you don’t need to know any math at all.” Other people throw a 1,300-page textbook at you with tons of proofs of obscure abstract algebra things, and you’re left thinking, “Why do I need this? What does this even explain?”
Exactly. You don’t even know what you’re supposed to learn. You don’t know what the core math is that you need. If you’re lucky, you might stumble upon a Reddit thread or find someone knowledgeable who says, “Don’t listen to all the crazies out there. What you need is calculus, probability and statistics, and linear algebra. Those are your three main friends.” From there, you can kind of backfill a little easier.
But even then, you might wonder: Do you need all of linear algebra? Do you need all of calculus, including multi-variable calculus? No, you don’t need to know things like Stokes’ Theorem. You don’t need to know all those intense theorems at the end of a multi-variable calculus course. It’s fine. In most applied subjects, you don’t need those theorems unless you’re doing something like theoretical physics.
That’s a whole other problem: what math you even need to know. And that’s when learners get discouraged. There’s confusion about what you even need to do, and then you rationalize, and you don’t end up making a choice. You don’t get on a program to acquire all that knowledge.
Justin: Totally. If you’re in a subscription-based model, like Math Academy, as opposed to just paying for a single course, it’s even better. Our retention becomes proportional to the depth and length of our content. If you only have small cross-sections, people are going to leave right after. But if you have courses upon courses, people will stick around. There’s no reason to leave; might as well keep them. I guess this even applies to people who sell courses one by one. If they have many courses in sequence, it’s the same idea.
You can extend your customers’ lifespan with you if you take them through the whole pipeline.
Justin: It’s definitely on our roadmap to factor purchasing power parity into the pricing. The reason we don’t have it right now is because we’re so new to the scene and we’re dealing with scaling issues. We have so much to do.
The overall goal of factoring purchasing power parity into your pricing is that you’re being kind to a lot of people, making the product more accessible, but you’re also extending your reach. You should come out ahead with people who wouldn’t have purchased it otherwise, but now are, and that should offset the lower price. The thing about that is, it also increases the size of your customer base by a lot. If you haven’t solved all of the scaling problems yet, or at least enough of them, then having your customer base grow is an issue if you’re still in the stage where, like us, you’re getting crushed by support.
Sandy is our only support person.
Justin: Yeah, every support email is answered by her, and she also handles all of our operations. She’s just drowning. We’re in a situation where it’d be great to have a ton of extra customers, but at the same time, we’re not trying to spike our growth. The benefit of having more customers has to offset the cost in support, which is a bigger issue right now. We want to get ourselves to a point where that’s not a problem.
Justin: Listen to Justin lament over the fact that we have too many students. I know, I know, it sounds terrible.
But it is a good problem to have. It’s still a problem, though, and something we need to get out of the way and resolve. We need to buff out the product more to prevent early beta discount issues. This is the early beta discount. It’s going to be some product. I don’t know.
But yeah, we’ve always considered the current pricing to be beta pricing because we still refer to ourselves as being in beta. There’s still so much to build.
Justin: It’s been many years of hard work, most of my life. I’ve been with it for six years, Alex for eight, and Jason and Sandy even longer—at least ten years. It’s taken a while to get to the point where we feel that this is something our lives can be centered around in the future. For a long time, it was just an unknown.
Justin: Oh, completely. He made some money from it, but it’s not like this is money he can throw away. It impacts his life. It’s a significant portion of his family’s wealth. I remember when I was living with them during the pandemic, quarantining. Jason would have moments where he couldn’t sleep at night just thinking about how much money was disappearing into this and the uncertainty. The risk that Jason and Sandy took on for this was enormous.
Justin: It got to the point where people would ask me, “Are you still working on that Math Academy startup?” And I’d say, “Yeah, it’s still going on.” They’d say, “Wait, hasn’t it been going on for three or four years? Are you guys even real?” That was a lot of it. At that point, it was nearly a decade for Jason and Sandy. Especially in the face of so many people saying you need to see profit within a few years to validate your startup. This was the total opposite: just heads down building for so long.
The one thing that gave us some evidence that we were going in the right direction was that we were actually supporting our school program with this product in the school district. We had kids using the product. They were seeing success, and we were resolving issues to make the product work appropriately. Some kids moved away for various reasons, like their parents getting a job, and they continued using the product independently, without a teacher. It worked for them.
On one hand, it was extremely risky, and we didn’t have full market validation. But on the other hand, we did have users. It was almost like a large-scale usability lab, and you could see that things were going in the right direction.
Justin: Oh, yeah, graduate courses, like the ones that PhD students in math would take in their first two years of graduate school.
Oh, god, yeah. What did you do? You should do stochastic calculus. Well, okay, here’s the thing: it’d be awesome to do every single field of math, physics, quantitative finance, machine learning—everything. But, as we’ve talked about, math is such a combinatorial explosion that you have to decide what to prioritize. The things we’re going to end up prioritizing are the undergrad math degree and whatever courses people are most interested in. If there’s a lot of interest in stochastic calculus—which I actually think there will be—that could be a course on our radar.
Justin: Totally, totally. Yeah, I think quantitative finance could actually be a solid area for us to expand into in the future. It seems like there’s a lot of excitement about it. It’s very math-heavy, sort of like machine learning, but I think it’s a little less… I think the finance version of machine learning. Right now, given current events with AI, machine learning is more of a craze. But if all this AI stuff weren’t happening nowadays, I bet quantitative finance would be pretty high up there in terms of priority.
But on the other hand, something like algebraic topology would be super cool to have. But it would just have very few people in it. It wouldn’t really offset the cost, at least not at this stage.
Justin: Yeah, that’s a good question. Really, it’s how you go on for your whole life, you know.
You can go on forever in all these different fields. I think you’ve got to come at it with a goal in mind. There are several components to this answer, but that’s the first one. If you don’t come at it with a goal in mind—what you want to achieve, what you want to accomplish, what you want to use math for, or what kind of math you’re interested in, and what kind of field you want to be a mathematician in—you can run this kind of learning routine indefinitely.
You reach the edge of one field, and then you think, “Wouldn’t it be really cool to spin up on this other field, just so I can be more prepared in case I want to do something that involves all the fields?” That’s not always a bad thing to do, but when you run that routine over and over, you get to a point where you’ve spent so much of your life being a student, you haven’t actually produced anything. That becomes a problem.
So, you have to come in with a goal in mind and settle on what math you’re going to learn and what other math is just a shiny, fascinating distraction that’s going to be really fun to learn but isn’t essential.
Justin: I think the solution to that is to simultaneously try to do the things you ultimately want to accomplish. For example, if you want to be a machine learning researcher, one thing you need to do is be able to re-implement papers. When you try to re-implement a paper, you get smacked down by your lack of math knowledge. You realize, “Oh, I need to learn linear algebra, probably statistics, right around that area.”
You take some introductory courses in these areas, and then you realize there’s basic linear algebra, more advanced linear algebra, super advanced linear algebra, and even “god mode” linear algebra. There’s even more beyond that—where do you stop? I think the solution is to, after spending some time on linear algebra, making serious progress, and maybe completing an intro course, go back to trying to do the thing you originally set out to do.
Maybe it’s implementing a machine learning paper. You might get to a point where you understand that these are matrices, and you understand what the notation means. But then, some combinations of symbols might seem weird. Why is there some norm of something, and minimizing subject to a constraint, and this other thing squared? That might be an indication that you don’t know classical machine learning, and you need to learn about cost functions, like mean squared error, and minimizing things subject to constraints.
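To make the kind of notation mentioned above concrete, here is a minimal sketch in plain Python of a mean squared error cost being minimized subject to a norm constraint. It is not taken from any particular paper: the function names (`mse`, `l2_norm`, `project_onto_ball`, `fit`), the toy data, and the choice of projected gradient descent as the solver are all assumptions made purely for illustration.

```python
import math

def mse(w, xs, ys):
    """Mean squared error of the linear model y ~ w[0] + w[1] * x."""
    return sum((w[0] + w[1] * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def l2_norm(w):
    """Euclidean norm ||w||, the 'norm of something' in the notation."""
    return math.sqrt(sum(c * c for c in w))

def project_onto_ball(w, radius):
    """Enforce the constraint ||w|| <= radius by rescaling w if it is too long."""
    n = l2_norm(w)
    if n <= radius:
        return list(w)
    return [c * radius / n for c in w]

def fit(xs, ys, radius, lr=0.05, steps=2000):
    """Minimize MSE subject to ||w|| <= radius using projected gradient descent."""
    w = [0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        residuals = [w[0] + w[1] * x - y for x, y in zip(xs, ys)]
        # Gradient of the MSE with respect to the intercept and the slope.
        grad = [
            2 * sum(residuals) / n,
            2 * sum(r * x for r, x in zip(residuals, xs)) / n,
        ]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        # Step back inside the constraint set after every update.
        w = project_onto_ball(w, radius)
    return w

if __name__ == "__main__":
    # Toy data generated from y = 1 + 2x (hypothetical, for illustration only).
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    for radius in (10.0, 1.0):
        w = fit(xs, ys, radius)
        print(radius, w, l2_norm(w), mse(w, xs, ys))
```

With a loose constraint the fit recovers the underlying line, while a tight constraint forces a smaller weight vector and a larger error. That trade-off is what the "minimize this squared thing subject to that norm" pattern usually expresses.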
You identify these missing pieces of knowledge, and then you can go fill them in. There is a failure mode where you try to go too fine-grained. I once talked to somebody who said, “I don’t see any quadratic equations in this paper. Why do I have to learn quadratic equations?” It was just like, “Dude, just learn algebra. You need to know this stuff. Maybe it didn’t come up in this paper, but it’s probably baked implicitly into some concept. Just take the course on it.”
After you cover a course, go back to the thing you’re trying to do and see what’s missing. I guess that’d be my recommendation.
Justin: I think that sounds alright. Once you get deep enough, you kind of just have a better lay of the land. You know what’s what. But I guess this is one of the reasons why it’s so useful to have a full pipeline from beginner to as close as possible to the thing you want to do. Otherwise, if you don’t have that full pipeline, you’re constantly trying to do something, failing, diagnosing what you’re missing, and then finding some decent resource online, or college, or whatever. If domain experts have mapped out the whole process for you, then you don’t have to take care of a lot of uncertainty.
Justin: Yeah, I think the assumptions are typically where things go wrong for people with strong mathematical reasoning abilities.
Justin: I think first order, you just kind of do it. You operate on whatever information you have and end up getting smacked in the face a lot because you’re making assumption mistakes. You learn from them, but another component is identifying common trends in these mistakes. Everybody has particular blind spots that they’re more susceptible to, given different life experiences and mental models. Some people lean on certain mental models more than others, and depending on how you go about things, you probably have bigger blind spots in some areas.
Once you get hit by incorrect assumptions and notice they happen often around particular blind spots, you learn to be more wary around them. But it also helps to identify these missing assumptions. First, you have to go out and do things. You can’t be doing things that are so high-risk that one failure takes you out of the game and you can’t afford to continue. It’s like the saying in trading: don’t bet more money than you can afford to lose. It applies to anything with risk.
If you don’t have a good sense of where your blind spots are and you’re figuring it out, just be prepared to get smacked in the face by things you didn’t see coming. Don’t make such a big bet that you can’t recover from it. It’s important to be mindful of that.
Justin: That’s a good point. If you ask yourself, “How is this going to go wrong? If this goes wrong, how is it most likely to go wrong?” and force yourself to think more critically about the assumptions, that can help. Do you do that often? Has it worked for you?
Justin: You have the most context. Everyone else who gives a recommendation is missing some context, which may or may not be critical.
Justin: Here’s the thing. I don’t think it ultimately really matters. Whatever you’re more motivated to do, just go and do that. It sounds like the proofs course is what you’re more interested in, so go into that. If there’s anything you’re missing in prerequisites for the proofs course, we’ll identify it. If there’s stuff you need to know to succeed in the proofs course, we’ll make sure you get that material, maybe in the foundation study. You’ll still receive it before doing the corresponding lessons in the proofs course. Worst case scenario, you may have some knowledge gaps in foundation study that you need to fill before you can access some lessons in the proofs course. But, like, I find linear algebra very easy relative to calculus, just because I don’t know calculus super in-depth, which is what I’m doing with proofs. Linear algebra is intuitive to me, but probability theory was not intuitive to me at all. Calculus is kind of like the middle ground, which is interesting because you get imprinted from a very young age with the idea of a slope and x + b, and it’s just an expansion of that.
Justin: Actually, a lot of that is covered in Foundations Three.
Justin: I think the answer to the question is this: whatever you decide on, what are you most excited about, what’s your goal, and just enroll in that course. Then, Math Academy gets you there as efficiently as possible. Whether you complete linear algebra in full before doing math proofs, or vice versa, or complete some topics in Foundations Three that aren’t required for math proofs, it doesn’t matter. If you go through all the math stuff, you’ll acquire it anyway. Back to what you were saying about being really excited about learning proofs behind some topic, and having that stick in your brain: a large portion of this is because you were excited about it. Excitement does the same thing for retention. If you care about what you’re learning, your retention spikes.
Justin: Exactly, exactly right. It’s like if you’re going to the gym, just do the exercises you’re most excited about. Whatever gets you in the gym and gets you more fit, just do that. It doesn’t really matter the order of the exercises, whatever gets you there and gets you doing more work. It’s all about the habit of doing anything.
Justin: The thing is, the reason why this strategy works in Math Academy is because we’ll backfill your prerequisite knowledge. You can pick whatever course you want to do off the shelf, and if you’re not fully prepared for it, that’s okay. We’ll backfill for you. If you’re missing stuff in calculus one or two for methods of proof, it doesn’t matter. You’ll get lessons on that before you receive those corresponding methods of proof topics.
Or whatever subsets of those courses are needed. Every single topic is hooked up to prerequisites. You can think of them as prerequisite topics. All these are drawn at the topic level. Imagine this body of topics that is methods of proof, and this is just part of this huge knowledge graph of topics. Imagine methods of proof casting a shadow through all the prerequisites and their prerequisites. That is really the entire course on methods of proof. When you place into methods of proof for us, you’re placing into methods of proof and its prerequisite topics.
There are some conventional sequences, like doing calculus two before methods of proof. But there are plenty of topics in calculus two that are not prerequisites, like volumes of revolution, for instance. Here’s a better example: in pre-calculus courses, students often learn what a matrix is, and maybe they learn how to solve a system of equations with a matrix. They might even learn the determinant of a matrix. Typically, you learn this in pre-calculus and then take a calculus course afterward, but these matrix topics are not prerequisites for the calculus course. That’s how Math Academy works. You don’t use matrices in single-variable calculus.
In Math Academy, if you place into single-variable calculus but you’re missing foundational knowledge, we’re going to fill it in for you. A lot of it will be pre-calculus and algebra, but we’re not going to make you do matrices. The courses are overlapping subsets of the knowledge graph. It’s really the topics we’re talking about, so you’re free to enroll in whatever course you want without having to actually complete all of the foundations first.
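The idea of a course "casting a shadow" through its prerequisites can be sketched as a traversal of a topic-level graph. The snippet below is only an illustration under stated assumptions: the topic names, the `PREREQS` dictionary, and the functions `shadow` and `learning_path` are hypothetical and are not Math Academy's actual data or implementation. It also assumes the graph has no cycles.

```python
# Hypothetical topic-level graph: each topic maps to its direct prerequisites.
PREREQS = {
    "fractions": [],
    "linear_equations": ["fractions"],
    "functions": ["linear_equations"],
    "matrices": ["linear_equations"],  # taught in pre-calculus, unused below
    "limits": ["functions"],
    "derivatives": ["limits"],
}

# Hypothetical course: a set of topics within the larger graph.
CALCULUS = ["limits", "derivatives"]

def shadow(course_topics, prereqs):
    """Return the course topics plus all transitive prerequisites,
    ordered so that every prerequisite appears before what depends on it."""
    ordered, seen = [], set()

    def visit(topic):
        if topic in seen:
            return
        seen.add(topic)
        for p in prereqs[topic]:
            visit(p)
        ordered.append(topic)

    for t in course_topics:
        visit(t)
    return ordered

def learning_path(course_topics, prereqs, known):
    """Drop topics the student already knows; what remains is the
    backfill plus the course topics themselves."""
    return [t for t in shadow(course_topics, prereqs) if t not in known]

if __name__ == "__main__":
    print(shadow(CALCULUS, PREREQS))
    print(learning_path(CALCULUS, PREREQS, known={"fractions", "linear_equations"}))
```

In this toy graph, `matrices` never shows up in the calculus shadow even though it sits alongside pre-calculus material, which mirrors the point that only topics actually needed by the course get backfilled.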
Justin: There are two components to it. One component is that the foundation sequence is meant to prepare you for any intermediate-level university course you might want to take. After completing math for machine learning, you might get really excited to take something else, like discrete math or another course. If you’ve filled in all your foundations, you can just jump right into that. If not, you can still enroll in the course, but you’ll have more foundational work to do.
The second component is motivational. If you’re missing a ton of foundations, say you enroll in math for machine learning and you don’t know any of Foundations Three or Foundations Two, you’re basically starting from scratch. Technically, you’re in the math for machine learning course, but you’re spending most of your time backfilling prerequisites from Foundations Two.
You won’t see your progress metric go up on math for machine learning because you’re not doing any topics in that course. You just feel like you’re in this void, thinking, “What did I do today?” You’re working on prerequisites, not math for machine learning, so your progress on that course isn’t moving forward. You’re backfilling in Foundations, but it’s not the same as seeing your progress tick up on a course. When you see it moving, you get excited, thinking, “Yeah, I got 5% done. A couple of days later, I’m at 70%, 80%, 90%, and I completed the course.” That’s really motivating.
If you start with Foundations Two, you experience those small wins, which keeps you on the train longer. This is how it goes for most students, but if you’re super motivated and know you’re not going to fall off, you just want the most direct path to math for machine learning, then yeah, you could go all in. But most people who say that are probably wrong about their level of commitment and motivation.
Justin: I think my recommendation is just whatever you’re most motivated to do. Do the thing you’re most interested in. The fun thing about the math for machine learning course is that it covers a lot of different subjects. You’ve got probability, statistics, linear algebra, multivariable calculus—it’s a combination of various topics. It’s almost like a crossfit workout; instead of doing leg day every day for a year, you’re working out and getting a nice variety. You’re covering a lot of undergrad math in layers. It’s almost like interleaving, right? It’s a little like Foundations Four in that sense. The interleaving between all the content is good for your attention and motivation. If you want to get into machine learning as soon as it comes out, you might as well fill in your foundations so you can jump right into it.
Justin: Yeah, if you’re more inclined to do methods of proof, then do that before math for machine learning. The only real enemy is just stopping doing math. Keep going.
Justin: If you’re aiming for 250 XP a day, you’re going to be fatigued. It’s like somebody going to the gym for that many hours. Your first hour of working out is going to be the best, the second hour will be a bit worse, and the third hour will be even more challenging. It just goes downhill.
Justin: Four hours can be a bit drawn out. I guess it depends on what you’re able to withstand. There are a lot of factors at play here. You have a very high motivation to learn the material, and you latch onto something and go all in. That’s what it seems like. You have these particular traits that put you in a position to do three hours of work. Many other students start to feel that fatigue after 45 minutes, especially if they care less about the material or aren’t as all-in.
Justin: I know exactly what you mean. It’s like imagining the progress bar is over halfway full. You think, “There’s nothing left. It’s mostly done.” These are both really interesting. I totally relate to both ideas—the fear being a huge motivator and tricking yourself into thinking the outcome is closer than it is. I’ve talked about this with some friends. It’s a form of reality distortion, but it works for you.
Everyone knows that Steve Jobs quote, where he talks about computer load times being too long. He went to the engineers and said, “Do you know how many lives are being wasted because of this?” He broke it down like this: this is how long it takes for the computer to load, and this is how many computers we sell. Then, he calculated how many seconds, minutes, hours, days of a human lifespan are wasted waiting for the computer to load, and how many years are lost each year. It’s a distortion of reality, but it’s productive. You can get yourself to feel like it’s true, and it helps you get more work done.
Justin: Right. If you get demotivated and leave before that, you never know. It’s literally a dream.
Justin: Exactly. It makes all the difference to know that your progress is actually increasing. In hardcore settings, like in Math Academy or the gym, it’s easier to see progress, but in real life, it’s harder to get a real progress measure. One trick that always works for me is to think back to something equivalent in the same area that I was doing one, two, or three years ago. Then, just comparing what I was able to do or my behavior back then to now shows a lot of growth.
Justin: That’s totally false. It’s good that you document your progress, though. You can look back on it. It’s funny how it feels that way, but without that level of intellectual body dysmorphia, would you be where you are right now?
Justin: So we’ve kind of established that it’s both a blessing and a curse. The real trick is—level one in leveraging this reality distortion or dysmorphia—is to have it working for you, motivating you to achieve more than you would otherwise. But I think level two is being able to turn it off when it’s not helping you anymore. And that’s very challenging to do. I can’t do that. I think documenting things that you’re proud of helps with this. There are things I’ve done that I can look back on and think, “I’m proud that I did this,” and that’s a real achievement that nobody can take away from me. But at the same time, you need to be able to switch from that self-indulgent mode to a mindset of, “Okay, these things are crap and I need to improve.”
Justin: Wait, how long ago was this? One or two months ago?
Justin: Three months ago? You went from not knowing fraction arithmetic to doing calculus? You’re at a point where you’ve done enough calculus that you can read through a proof of the chain rule and even derive part of it yourself?
Justin: That is incredible. You need to remind yourself of that every time you feel like you don’t know anything, especially when looking at a topology textbook. This is incredible progress.
Justin: Very similar to that, my wife can do the reality distortion stuff but has a hard time turning it off. I think one thing that definitely helps her is having someone around. I’m kind of an external viewer and can point things out to her that she would deny when thinking about it herself. When I say, “Wow, you did this or that, and that was a good job,” it helps to hear it from someone else. I don’t know if you have anyone in your life who can provide that external perspective.
Justin: That’s a good idea. I was just going to say that it’s kind of interesting how the common wisdom is to not compare yourself to other people. But in this situation, that’s kind of helping you, right? It’s very productive to do so, especially when you think about the level of math the median person in this world knows. If you think about where you are relative to that, or compare it to your friends and how much math they know…
Justin: I think that’s a healthy outlook because everyone has their own advantages and disadvantages. That kind of working memory is something that basically nobody has past a certain point; it’s such a rare trait. But I’m sure he was also lacking some sort of talent in other areas. Finding the areas where you excel is part of the trick. Sometimes it’s not even talent; I think talent is often necessary, but a lot of competitive advantage comes from just caring about something. We’ve talked about caring about math and learning, and it’s the level of care you put into it. The interesting part is that you can’t always choose what you care about, and that’s what leads you down a really interesting path. I wish more people did that, documented their journeys down that path.
Justin: Do you think that makes it easier for you to feel comfortable putting out details about your trajectory?
Justin: That’s true. If you don’t actually know the person, you might as well be honest.
Justin: And then the balance is just… yeah, there are going to be so many more people who reach out with good feedback than bad feedback.
Justin: Yeah, yeah.
Justin: There are a lot of benefits to it. Do you think there are any drawbacks whatsoever, or is it just all benefits across the board?
Justin: Someday, once I do something.
Justin: I do keep a little long bio on my website, which is definitely more than most people put out. But it’s not like a journal. I guess one of my challenges right now is that I have so much time pressure to do everything. It sounds funny as we are here just sitting on this podcast, but the size of my to-do list and the opportunity cost of everything is so big that there are just so many things I care about more than journaling. Maybe I underweight the benefits of journaling, or just life status updates. I’ve been tweeting more, but it’s still filtered because it’s based on math and education. I’m trying to keep it on brand, but I don’t know. Maybe someday when things calm down a bit.
Justin: Yeah.
Justin: Yes, that’s interesting. I’ll say I haven’t had to really grapple with the depth of losing someone I was very close to, fortunately. The sorts of deaths I’ve had in the family were mostly with people I got to know a little, but there wasn’t this bond or connection. I imagine when the day comes, when I lose someone I’m very close to, I’ll be remembering this conversation. I’ll be sad that their life wasn’t documented more for me to reminisce on. I guess it’s in their hands to document it, but that’s why people keep photo memory albums.
Justin: Yeah, exactly.
Justin: That makes a lot of sense. I think this conversation might inspire me to add more personal anecdotes to my writing. I’m trying to think about sitting down and writing these things. How often do you write up these sorts of things? Is it career-driven, event-driven, or time-driven?
Justin: Yeah, I think the way I’ll get there is by treating my Twitter more and more as a live stream. That’s what I do. I’ve started going in that direction in the past couple of weeks, and I’ve noticed that initially, it was kind of hard to come up with things to say. I would just sit down and try to force myself to write a couple more tweets than usual each day. But eventually, it got to a point where the words just started flowing easier and easier. I think the same thing holds true for writing more personal anecdotes. It’s literally like learning—just do everything, do more of it, and do it better. It becomes automatic.
Justin: I guess in my case, the thing is I want to try to keep my Twitter on brand. It’s a problem. I think what will end up happening is I’ll try to add more personal anecdotes to my tweeting and writing in a way that supports staying on brand. Maybe at some point, I’ll reach a stage where I feel like I have a proper place for less on-brand stuff, maybe as a separate tab on a website or blog posts.
Justin: You know, it sounds like back at the beginning of this conversation, or maybe before the podcast, we were talking about writing. The goal of writing is not to show off your vocabulary size; it’s to connect with the reader. It’s great if you did some cool stuff, but people don’t care. People want to know more about how they can relate to it. You can overwhelm people with interesting stuff, but ultimately, it’s about making it relatable to them. That’s what gets them excited.
Justin: Writing is something I’ve been getting more and more into, realizing just how powerful being a good writer is. Describing the world through math is great, but your stories—stories in particular—are what people latch onto. That’s the key to connecting with people. The key to good writing is connecting with people, and the key to connecting with people is through stories.
Justin: Yeah, it’s cool stuff.
Justin: Oh, no worries. The fact that I’m so happy to do round three with you should speak volumes about how fun it is to do podcasts with you, even in the face of these opportunity trade-offs. I always feel like we end up on a lot of really interesting insights about things we didn’t plan at all. It’s always great food for thought.
Justin: Talk about more things, yeah. I’d be happy to. I can’t wait to. I just want to repeat again: three months, going from not knowing how to solve a quadratic and struggling with fraction arithmetic, to now throwing back proofs of the chain rule in calculus. Dude, that is amazing. You’ve got to stick with this. I know you’re totally planning to, but I’m really enjoying watching what’s happening. There are so many other people on Twitter doing similar things—speedrunning all this math, starting with almost no math foundation, and now getting to the point where they can do university-level material. I’ve had the experience of teaching students from eighth grade to twelfth grade, watching them grow up over time. It’s like watching adults grow up mathematically over months. It’s kind of a brain bender, but amazing to watch. I can’t wait to see how you progress.
Justin: Hopefully, we can engineer a solution to that by the time it happens.
Justin: Again, always a pleasure. I’ve always enjoyed our chats.
Prompt
The following prompt was used to generate this transcript.
You are a grammar cleaner. All you do is clean grammar, remove single filler words such as “yeah” and “like” and “so”, remove any phrases that are repeated consecutively verbatim, and make short paragraphs separated by empty lines. Do not change any word choice, or leave any information out. Do not summarize or change phrasing. Please clean the attached text. It should be almost exactly verbatim. Keep all the original phrasing. Do not censor.
I manually ran this on each segment of a couple thousand characters of text from the original transcript.