Golden Nuggets Podcast #39 (Round 3): MA’s upcoming machine learning course

by Justin Skycak (@justinskycak) on

Rationale, vision, and progress on Math Academy's upcoming Machine Learning I course (and after that, Machine Learning II, and possibly a Machine Learning III). Design principles behind good math explanations (it all comes down to concrete numerical examples). Unproductive learning behaviors (and all the different categories: kids vs adults, good-faith vs bad-faith). How to get the most out of your learning tasks. Why I recommend NOT to take notes on Math Academy. What to try first before making a flashcard (which should be a last resort), and how we're planning to incorporate flashcard-style practice on math facts (not just times tables but also trig identities, derivative rules, etc). Using X/Twitter like a Twitch stream.

Want to get notified about new posts? Join the mailing list and follow on X/Twitter.


Link to Podcast



The transcript below is provided with the following caveats:

  1. There may be occasional typos and light rephrasings. Typos can be introduced by the process of converting audio to a raw word-for-word transcript, and light rephrasings can be introduced by the process of smoothing out natural speech patterns to be more readable as text.
  2. The transcript has been filtered to include my responses only. I do not wish to infringe on another speaker's content or quote them with the possibility of occasional typos and light rephrasings.
∗     ∗     ∗

Justin: Machine learning one. It’s going to be a legit machine learning course. We already have math for machine learning. It’s for anyone who signs up on our system wanting to learn machine learning. They go through this Foundations Sequence: Foundations One, Foundations Two, Foundations Three, which covers your standard high school and a bit of undergrad math. Then you jump into math for machine learning, which covers all the math you need to know to take a proper machine learning course. Up until now, we haven’t had a legit machine learning course. It’s all been about the math supporting it, like multivariable chain rule, gradient, etc. We’ve been working with probability distributions and more. But now, we’re building out a course that covers real machine learning algorithms.

The reason we got started on this is from a conversation with Jason Roberts and his wife Sandy, the founders of Math Academy. We were talking about how our learners are doing and what they’re excited about. I told them, “Jason, I think about 70% of our adults on Twitter are excited about machine learning.” Most of our users are from Twitter, and many want to learn machine learning. It makes sense since that’s a big interest.

It seems kind of silly that we support you with all the math up to machine learning and then say, “Just go find another course online.” The courses online haven’t been great. I’ve been pretty disappointed with machine learning resources. Some are halfway decent, but most books or resources just tell you in broad strokes how an algorithm works and give you conceptual intuition so you can check a box, like “okay, I kind of understand this now.” Then you run some code off a tutorial, copy and paste, and say, “Wow, I did machine learning!” But you’re not actually doing the math yourself. You really need to get into the nuts and bolts of this stuff.

There are some resources that point you in the right direction, but you hit a wall quickly on the difficulty, and you just fall off. It feels like the right time for us to build out more courses. So many people want it, and there’s a lack of good scaffolded resources. It’s also a missing piece for getting people where they want to go career-wise. Many people sign up for Math Academy to scale up in math because they believe it will transform their life in some way or improve their future prospects. While knowing math is important, there’s a jump that many are trying to make—getting themselves into a new position. Right now, the knowledge of actual machine learning and the ability to apply it is a missing piece in that bridge to where they want to go.

That’s been the inspiration for it. We’ll still do other math courses like real analysis and proof-based linear algebra. We’ll do everything, but the priority has shifted. We were thinking, “Let’s finish all undergrad math and then expand into machine learning and computer science.” But when the universe puts an opportunity in front of you, it’s dumb not to take it.

This machine learning course will actually be several courses. Initially, we were planning just one course, but as I scoped out the topics, I realized there are a lot of topics in machine learning. Once you start breaking them down into individual topics, you realize it’s more than a semester-long course. Machine learning isn’t just one standardized course. It’s like high school math—it’s not just one course. You have arithmetic, algebra, geometry, calculus, and other subjects. It’s the same with machine learning.

Our plan for machine learning one is to cover classical machine learning: linear regression, logistic regression, clustering techniques, decision trees, and neural nets. It’ll stop around convolutional neural nets, as most introductory machine learning courses do. Convolutional nets are the start of interesting architectures we can build with neural nets. This opens up a whole range of different architectures like transformers, LSTMs, recurrent neural nets, and others.

There are so many topics to cover. You can’t fit all classical machine learning and the different neural net architectures in the same course. That stuff will go in machine learning two. There are also different ways to train these models. For example, in support vector machines and logistic regression, you can train models through gradient descent. Sometimes, linear or quadratic programming is used to do this more efficiently. Some hardcore machine learning courses may cover that, but we didn’t have room to fit it in.

Machine learning two will be a second pass at topics from machine learning one, focusing on more advanced setups, architectures, and variations of the models. After that, there will likely be a machine learning three that covers cutting-edge topics. Think transformers: even if you know how a transformer works, you’re not at the cutting edge yet. There are more sophisticated techniques out there.

Ultimately, we want to get people to the point where they can pick up a machine learning paper and understand the background information needed to implement it.

Justin: Yes. The goal is to have these little project-like tasks, where we decompose a small coding project into a bunch of steps. It’s kind of like one of the topics that you learned, and you have to implement it in code.

One of the reasons this approach feels right comes down to these coding exercises: it’s hard to turn them into a repeated, spaced-repetition type of thing. For example, you code up a model from scratch. What do we do? Ask you to do that again? You’d just copy and paste your code from last time. How do you create a new variation of the problem? It’s really just a different project.

Justin: Yeah, exactly. Right. That’s the challenge with jumping straight into project-based learning. You need to target the core underlying skills as granularly as you need to. But yeah, I’m excited for it.

We’re seriously prioritizing this course more than any course we’ve prioritized in the past. To give you an example, starting next week, I’ll be working full-time on content development for this course. Usually, I do my quant stuff, my algorithms course, or work on analytics and task behavior analysis and various things with the model. But right now, once I close the loop on a couple of things, I’ll be fully focused on this machine learning course.

Justin: I think the secret ingredient is just having experience doing this stuff in the past. On one hand, yeah, I usually focus on the technical aspects of the system rather than the content. But when I started with Math Academy, I was actually working as a content developer because that was what was needed at the time, and I had a bunch of experience writing content. I used to do a ton of stuff—hundreds of lessons in the system. I also spent three years teaching a highly accelerated quantitative coding course sequence within Math Academy’s original school program. A lot of these topics, particularly in classical machine learning but also in coding in general, I spent three years teaching to high schoolers who already knew calculus, multivariable calculus, probability, statistics, etc.

Something we always joke about at Math Academy is having to teach all this stuff to kids. They had all the prerequisite information because we taught it to them, but it can be challenging to get something to work for a kid with a lower attention span, especially when they are in a school program. It’s not always driven by pure intrinsic interest. If you can make that work for kids, then you can scale it for adults.

It just comes down to having years of experience going through the grunt work of doing this manually. It’s hard to replicate without that experience. I’ve tried, and Alex and I have both tried, using ChatGPT in the content development flow. It can be good for idea generation sometimes, but at the end of the day, it doesn’t solve the problem for you. It generates ideas that can help you scaffold things, but you need a really good mental model of how a student is thinking about these things, what confuses them, and what it takes to scaffold things up.

It also helps having done this for math subjects like algebra, calculus, and differential equations, where it’s more straightforward to scaffold things. In math courses, you can flip open an algebra or calculus textbook. People generally try to scaffold things. It’s not always at the ideal granularity, but it’s at least directionally correct. But in machine learning, a lot of resources I’ve seen online are not even directionally correct. It’s hard to know how to do this unless you come from the easier case of scaffolding math. It’s like, “We’ve done this for math, now let’s try a more challenging thing.” Having done this manually in math has scaffolded us into the situation where we know how to do it for machine learning.

Justin: I think there are a couple of components to it. There are two groups of principles. One group is inherent to the explanation itself. The other group is, in order for this to really happen, you need to be up to speed on your prerequisite knowledge. Not only having learned it, but having it relatively fresh in your head. Not like, “Oh, I learned this five years ago and forgot it,” and then you “learn” it again but not really. You need to be spun up on your prerequisite knowledge. Once that condition is satisfied, a good explanation can help make things click for you.

The first thing that pops into my head is just having a concrete example. It’s always helpful to draw analogies and illustrate things conceptually, but having a concrete example that brings it down to actual numbers is key. In a simple case, you can wrestle with it in a hands-on way. In machine learning, there are concepts like overfitting, underfitting, bias, variance, and others. The core of all this is numerical measurements on various functions and data.

If you can bring something down to a concrete numerical example, it feels like a ledge. You can hold on to that, and if you feel yourself slipping or going in a different direction, you can hold on harder to that concrete example. “Oh, this is what it means concretely.” One of the things that makes math explanations hard to follow is when it gets too hand-wavy and nebulous. You might reach a point where you think you’re following, but later you realize you don’t really understand it.

When you boil things down to a concrete example, you become immune to this confusion. You’re not left thinking, “I thought I knew how it worked, but I don’t.” Concrete examples—that’s what I would say is the main thing.

Justin: I’d say some part. There are a couple of ways. There’s some internal knowledge, like, “Oh, I know from experience that students always mix up this with that.” We need to be very careful not to intermingle those two things or just to clarify the difference between them in the explanation. We also run analytics on all the lessons we put out. We can tell what percentage of students pass on the first try, how many pass within two tries, and drill down to see where people are getting stuck. We look at which knowledge point or which questions within that knowledge are causing issues.

If the pass rate is low, we can identify the specific issue and take corrective measures, such as scaffolding up more in that area. Sometimes we split topics into two if we realize we’re trying to bite off too much in one topic. We’ve done this for years, so we have a pretty good idea of what issues we’ve had in the past and how to avoid them in the future. We don’t run into as many of these issues going forward with new material.

The main thing, though, I’d still say, is concrete examples. Concrete examples. Concrete examples. Writing a concrete example keeps you from falling victim to the curse of knowledge. When you write a concrete example and provide a solution, you see the steps involved. You might realize, “Oh crap, we haven’t covered that step yet.” You need another topic for that step, or you might notice that you’re stringing a bunch of steps together and that’s an underlying technique.

Another thing is, when you have all the prerequisite material scaffolded into topics that the student needs to know before the current topic, your explanations should be decently short. They shouldn’t be pages upon pages of math. If you write a concrete example and need to explain pages of stuff, it means you’re trying to do too much. It forces you to think, “We need to compress this explanation. It’s way too long.” You need to offload some of this cognitive load into lower-level topics that should come as prerequisites. The student does that, and you can condense your explanation of the top-level concept.

If you don’t have concrete examples, it’s hard to gauge how big these explanations might need to be.

Justin: You mean how do we try to make each lesson the same bite size?

Justin: Okay, I’m just trying to understand the question. Are you asking how we keep the lessons calibrated so that there’s the equivalent of one XP per minute?

Justin: The XP is done in hindsight. It’s not like we say, “Okay, this lesson is going to be 20 XP, let’s write a 20 XP lesson.” We just write the lesson, try to keep it bite size, and try to keep it manageable. If you do that, the XP should fall somewhere between 7 and 25 XP—just spitballing, 7 to 25 minutes. Some topics have intrinsically more load and take longer, while others are easier and involve less computation.

The way we do it is we have a lesson, and then we compute the XP for that lesson. The XP computation starts with how many knowledge points are in the lesson, how many tutorial slides there are, and how many questions per knowledge point on average. We multiply that by the expected time per question.

We have time estimates for questions. Subject matter experts and content writers estimate how long it takes to go through a question—not to speed through it, but how long it takes if you have an idea of what you’re doing and go at a normal pace. We record that information, have a couple of content writers make that estimate, and average it across them. This gives us a hive mind average.

Then we take that as the initial estimate and calibrate it further based on student data. We see how long students actually take to do these questions. Sometimes, when you’re doing a lesson and learning something for the first time, it takes longer than for someone who knows what they’re doing and doesn’t have to go back. Other times, it clicks for most students quickly and they speed through it, and we actually underestimate. That’s a little rarer, but it does happen.

So, yeah, it’s a highly manual process that gets further calibrated with student data.

Justin: For diagnostics, it does. For lessons, reviews, and other tasks, it doesn’t. For quizzes, it sort of does. Let me explain. For diagnostics, we measure time. If you’re taking a long time to solve a problem, we won’t give you much credit for solving it. We’ll give you a little bit, but we don’t want to place you too far ahead of your ability. If someone takes five minutes to solve a quadratic equation, they’re not ready to go far beyond that. They need more practice with that.

Justin: Right. The goal is to get you to the point where you can do these foundational skills quickly, without occupying a lot of mental effort, so you can focus on the next thing you’re learning. For that reason, sometimes people get surprised by the diagnostic and place lower than they think they should because they say, “I learned topics XYZ in school.” The question is, “Can you solve problems correctly, consistently, and quickly on those topics?” If the answer is no, then you need more practice. It doesn’t matter if you did it in school. What we measure is whether you can do it at the level needed to continue building on that skill.

Now, for lessons, let me preface this by saying that we eventually want to incorporate time measurements and all these micro-behaviors, like going back to examples or reference material, into the spaced repetition mechanics. But right now, for lessons, all that matters is your accuracy on the questions. That’s the only thing that feeds into the spaced repetition mechanics. Basically, if you’re solving questions correctly, you get positive credit. If you miss questions, you get less positive credit. If you miss a lot of questions or fail tasks, you get negative spaced repetition credit.

The reason for this is there’s a lot more variance in lesson performance. Different students can come out of a lesson at different speeds. If a student can solve questions in a lesson and get them right consistently, even if they’re kind of slow, we don’t want to hold them back and have them keep practicing the same thing over and over again. They can get implicit practice on these skills by building on their knowledge with more advanced topics.

We don’t want to take too drastic an approach when factoring time into spaced repetition for lessons. But at the same time, if a student is slow to solve problems, even if they’re solving them correctly, we want to trigger reviews earlier. Let’s help them get a little faster.

Also, as we incorporate task behavior analysis, if a student is going back to the example for every problem they solve, that’s not a good sign. They should try to solve it without the example. We want to measure that behavior in lessons and bring it into the spaced repetition mechanics. We also want to incentivize proper learning behaviors. If you go back to the example by default for every question, without even trying to solve it first, that’s not ideal. You’re using the example as a crutch.

We don’t want to give you as much spaced repetition credit, nor as much XP. You’re not aligning yourself with the optimal process for building up knowledge. On the other hand, if you try to solve questions without relying on the example, I’ll give you more XP. If you’re working through this with the right habits and building more as a result, that’s worth the bigger reward.

Justin: That’s a good question. Mysterious, unexplained behavior… This reminds me of a time back in the early days. I saw something in the database that initially seemed like my model was doing something stupid. Students would get an answer correct or do well on a quiz, and then get a bunch of follow-up reviews on questions they had already answered correctly.

The question was, why is the model assigning them follow-up reviews? That sounds like something’s broken. I dug into it, and it looked like the database was just changing itself after it was processed. I thought, “Wow, this is weird behavior. Whatever the student is doing is very unusual.” Of course, I talked to Jason about it. I said, “Jason, I don’t know how else to say this, but I think the database is doing something weird.” He responded, “This is students cheating.”

We looked into it, and sure enough, the students were cheating. They were opening the quiz in a second tab, submitting one copy to see which questions they got wrong, then correcting their answers in the other tab before submitting it. It was just an exploit we had at the time. I didn’t know about it then, but now I see this sort of weird, unexplained behavior often indicates that people are trying to exploit the system in some way.

Now, as I think more about your question, it’s probably more about learner behavior—if there are weird things students do that they think are productive for learning but aren’t. There are definitely weird behaviors, but I wouldn’t call them unexpected. I guess I’m used to it now. Here are some things that might be weird and unexpected to a reasonable adult using the system who hasn’t dealt with these issues for years, but that are totally expected to me.

First, going through a diagnostic and guessing on all the questions or submitting “I don’t know” to everything because they don’t want to do the diagnostic. Another one is going through the diagnostic and looking up material: they get a question, think, “I bet I can figure this out if I look it up online,” and then take half an hour on that question before moving on. They may still get it wrong, go on to the next question, and then complain that the diagnostic took them five hours.

Another behavior is clicking on a lesson, skipping the tutorial slide, skipping the example slide, trying to solve a problem, getting it wrong, and then spending 10 minutes confused about how to solve it. They don’t go back to the example or tutorial.

When I was teaching classes at Math Academy, I saw this all the time. I’d get a class of 10 sixth graders on their first day using Math Academy, and I’d just think, “Okay, which one of you is going to be the one who just rushes through everything, doesn’t read anything, and gets stuck on questions?” Then I’d have to sit with them and show them not how to do the problem, but how to approach it. They had to read the tutorial slide, the example slide, and write things out on paper.

Some kids were speed demons who wouldn’t read anything. Others would take a while on a question, struggle because they weren’t reading the material, and then take their best guess. You’d see seven failed tasks in a row in just 10 minutes. It was baffling. They had zero confidence, but were just going through it all without really engaging.

Justin: Oh, also, getting questions wrong and then not reading the solution to figure out what you got wrong. Just going straight to the next question and thinking that somehow magically you’ll be able to do it, then making the same mistake again. People, especially kids, will just fall off the rails unless you intentionally try to corral them and coach them.

Initially, when I started teaching and inspecting user behavior, all this seemed like, “Wow, people are really not approaching this the right way.” But, having seen it over and over again, sometimes you get an email from a parent saying, “My kid is struggling with Math Academy. They say this explanation isn’t good.” Then you look at their data and see they spent two seconds on the question and didn’t read the example. It’s just one of these adversarial behaviors.

Sometimes this is in good faith, like the kid is actually trying to solve the problem but forgets that the tutorial or example slide is there. It sounds silly, but when you’re a kid, you forget a lot of things. I remember one kid who forgot to write their name on the AP Calculus exam, which is a big, standardized exam. Another kid just circled answers in the test book instead of filling in all the bubbles.

Some of this is just not being careful, but other times, they’re trying to game the system or don’t want to do the work. They try to create a confusing scenario where they can trick their parent into thinking they’re doing a lot of hard work. I remember an email from a parent saying, “My kid has been struggling with the system. They’ve been doing four hours of work every day for the past month and can’t get any XP.” We looked at their data from the past two weeks: they spent eight minutes total and answered one question. It’s like, what are you talking about?

There’s a lot of that kind of thing. Sometimes adults do weird things, sometimes kids do, and sometimes it’s unintentional and in good faith. Other times, it’s in poor faith. There are so many different dimensions to it.

Justin: The adults are typically less adversarial. One kind of failure mode that adults sometimes get into is that, especially on their diagnostic, they’ll try to grind through questions that are well beyond their capabilities. The reasoning is usually, “I just want to get as far as I can and face the most challenging problems. You’re supposed to try as hard as you can, right?” They take that as “struggle with it for as long as you’re willing to put up with.”

In reality, the purpose of the diagnostic is just to diagnose whether you’re able to do it comfortably, quickly, and correctly, or if you need more practice. What they should do is say, “I don’t really know how to do this without reference material” or “I covered this once five years ago, but it would probably take me half an hour to figure out how to do it again.” There’s no need to struggle with it. You’ll get more practice if you need it. Don’t try to fake the system into thinking you know how to do this problem, because then you’ll end up further ahead than you should be.

That’s a good faith mistake. You’re trying to work as hard as you can and put your best foot forward, but sometimes that’s not actually what you should do. We need to handle this better in our diagnostic, like having a trigger. We put a message on the screen before the diagnostic, but who reads those? We need to have a pop-up if someone is spending too much time on a question, saying, “Hey, seems like you’re taking a while. If you’re not sure how to do this, just click ‘I don’t know.’ Don’t grind through it.” If enough time goes by after that, we should just move them to the next question.

Justin: I would say that’s definitely a good one. Quizzes are meant to be closed book because they’re trying to gauge whether you need more practice on something or not. If you have to look it up, then you need more practice. You don’t have the level of recall we would like. In general, I would say the number two suggestion is that. The number one suggestion I would give to any adult wanting to get the most out of the system is just to try not to rely on the examples or solutions of previous problems if you can help it.

Initially, use the reference to figure out how to solve the problem. But if you go through a worked example and think, “Okay, I got this,” then go to a problem and realize you forgot how to start, don’t just go back to the worked example. Don’t keep the worked example open in a separate tab and try to transpose the solution technique. The point is to try to recall as much as possible and use the worked example almost like a spotter at the gym. You’re the one lifting the weight. The spotter isn’t lifting the weight. If you’re really struggling under the weight, the spotter helps, but if you rely on the spotter for everything, you’re not lifting the weight.

That’s the issue with quizzes. If you can’t remember how to do something, sometimes that just happens and means you need more practice. But if it happens all the time, it typically means you’re not engaging in recall and retrieval practice during lessons and reviews. I’ve seen this in some adults who, in good faith, may be trying to go back to the work example carefully. They have the example in a separate tab every time. They don’t realize the problem is that they’re not recalling. They think they’re being conscientious and disciplined, taking great notes, and using them to refresh on how to do the problem. But they’re shooting themselves in the foot because they’re not practicing retrieval. They’re just using the spotter.

Justin: The goal is to try to recall it every time. Trying alone doesn’t trigger the testing effect because it’s about successful retrieval. The idea is that if you’re actually trying, putting effort into weaning yourself off the resource, and only looking at it when needed, you’ll get yourself into a position where you’re successfully retrieving more and more of the solution technique each time you try.

In the scenario you suggested, where you’re continually trying but continually having to rely on the worked example, as long as you’re relying on it less and less, that’s good. But if you repeatedly can’t figure out how to start the problem and always have to go back to the worked example, it seems like you’re not really trying to remember how to do the problem. If you apply information from the worked example to the problem, go to the next problem, and then forget that same information right after, that’s an issue.

I’ve seen that happen with kids before. It always turned out that they weren’t really trying to remember and put in more effort. It’s like someone going to the gym saying they’re working out the right way, but somehow the amount of weight or reps they’re doing isn’t increasing. Biologically, the quantity should increase—the weight or the reps. Your capacity should improve. If not, it indicates something is wrong with the way you’re lifting the weight.

Excluding edge cases like overtraining, if you’re a beginner at the gym and doing a basic workout, you should see improvement. If not, something is likely wrong with your technique, possibly because your spotter is doing too much for you.

Justin: That’s a good question. I think the answer is that there’s a trade-off. It’s a little hard to optimize, and the safe thing is typically to just work through or look at the example before doing the problem. But if you go about it the exact right way, kind of threading the needle, you might get a better outcome.

The trade-off is like this: let me explain the failure mode with that. Most of the time, when people start using that strategy—trying to solve the next round of problems without looking at the example beforehand—there are a number of things that can happen. One of the most important things is that they can spend too long on the problem, losing track of time. They might look up and realize 20 minutes has passed and then go back to the example. It’s not necessarily unproductive for learning, but when you consider the opportunity cost, there are better ways to use that 20 minutes. You could have finished the rest of the lesson and moved on to other things.

Another issue is that sometimes adults will skip the example, manage to solve the problem, but end up solving it with a solution technique that’s overfit to the specific features of that one problem. Then they realize, on the next problem, that the solution technique they used worked for the previous problem but failed on this one. They get frustrated. If they understand how learning works, they might think, “I overfitted the solution technique,” and decide to go back and look at the example. They’ll then compare the more general solution technique from Math Academy with what they came up with and try to identify the shortcoming in their approach.

There is some positive learning from this, but there are ways you can fall off track. For instance, you might drag this process out over such a long period of time that the additional learning is not worth the opportunity cost. It can also feel demotivating, and you might stop and think, “This is too hard.”

I wouldn’t say it’s necessarily a bad thing to look at the next set of problems and see if you can solve it without the worked example. But you need to be really careful not to spend too much time on it or get demotivated. Even if you come up with a solution that seems obvious, like, “Oh, you just do this and solve the problem,” it’s still a good idea to go back to the worked example or look at the solution to see if your solution matches. Then ask yourself, “Is there a real difference between the approaches?” Sometimes there is, and sometimes it’s subtle, but it can impact you in the future.

You definitely don’t want to practice a solution technique that doesn’t generalize properly, because then you’ll start building automaticity on the wrong thing. It’s like building a bad reflex. Imagine playing sports and developing the wrong reflex. If a goalie is used to going the wrong way when the shooter exhibits certain characteristics, it’s going to be hard to unlearn that behavior.

Justin: That’s a really good concrete analogy. I think that’s exactly it. Unless you’re 100% confident you know exactly what you’re doing, just read the worked example and apply that technique to the next problem.

Justin: Yeah, you’re exactly right. Try to solve the problem, maybe struggle with it a little bit. Don’t spend half an hour just staring at the screen or trying things that don’t work. Give it a solid few minutes. Don’t just give up after 10 seconds. Try, “I don’t know how to do this,” then look back at the reference. Give it a real attempt for several minutes. If you’re making progress, keep going a little longer. But if you reach a point where you’re banging your head on the wall, not knowing how to do this, and after a couple of minutes, you’re still stuck, go back to the reference. But only peek at it. Don’t look at the full solution. Just look at the part where you’re stuck, then go back to the problem and try to carry out the rest on your own.

It’s like the spotter at the gym. Don’t let the spotter lift all the way for you if you can help it. Just have the spotter get you over the edge when you’re having trouble. Do as much as you can on your own.

I would agree, don’t use the reference material at the very outset. It should be your last resort. I also recommend that guessing should almost never be the answer to a problem. If you don’t know how to do the problem, the reference material is there. Everything is mapped out, including prerequisites, core concepts, and the worked examples, which should be enough to help you figure out how to solve the problem.

Sometimes, there might be a specific aspect of the problem that just doesn’t click, and after wrestling with it for five or ten minutes, you’re not getting anywhere despite using the reference material. In that case, it’s okay to take your best guess, but it should be rare. You can’t spend all day on one problem.

Justin: Totally agree. That’s one of the main features of spaced repetition. You let your memory decay to a point where it’s difficult to overcome the decay and retrieve the information. But if you’re able to overcome that difficulty, it really increases your retention. Desirable difficulties in general.

Justin: What you’re describing we typically refer to as layering. You layer additional knowledge onto what you’ve learned, building on it. This gives you implicit practice with your lower-level skills, and the more deeply ingrained those skills become, the more structural integrity your knowledge base has. It’s like when you’re coding a project and have to add new features or capabilities. Often, you start building a feature and then realize you need to refactor some lower-level code to make it fit. Eventually, your lower-level aspects become really strong. It’s the same way with knowledge.

Justin: I explicitly recommend against taking notes. There’s a difference between taking notes that you’re going to refer to in the future and use as a crutch versus thinking on paper and diagramming something out. The reason I say not to take notes is that it’s too tempting to go back to them in the future and use them as a crutch. You want to make it annoying to look up material. If your notebook is right beside you, and the distance between you and looking up how to solve the problem is just flipping the page and looking at your perfect notes, you’re going to over-rely on that. It’s just too tempting.

Especially if you put so much effort into your notes, you’ll think, “Am I never going to use these?” That temptation will make you rely on them, which will hurt you because of what we talked about earlier regarding the retrieval process. You have to try to retrieve as much as possible without reference material. Reference material should be a last resort. If you really need it, you can go to the system to brush up on a topic. You’ll get there quickly, but it might be a little annoying, and that’s a good thing. You want to be incentivized not to rely on it.

If you’re reading an explanation, working things out on paper, or diagramming how concepts relate to each other, that’s fine. When your notes are just augmenting your thinking process and you’re basically just listening and thinking on paper, with the intention of throwing that paper away later, there’s no problem with that. The goal is to avoid using your notes as a crutch and to ensure they don’t slow down your learning process. Those are the main pitfalls to avoid.

Justin: It’s intentionally built in where, if you really need it, you can go back to the reference material quickly. But we don’t want to make it so easy that you’re always tempted to look at it. We try to strike a nice balance, and you really don’t need to be taking notes on paper and referring back to them.

About the flashcard stuff, I’ve also heard from a couple of people who ask if they need to make flashcards for topics, like derivative rules. The answer is a little bit of yes and no. You don’t want to make flashcards for everything you read because we have a spaced repetition system that will take care of most, ideally all, of your review.

You also shouldn’t freak out if you did a lesson, and the review comes several days later and you’ve kind of forgotten how to do the problem. This is expected at the beginning of spaced repetition because your memory decay curve is dropping so fast. If we’re a little off with when you should review a topic, or even if we decide the right time, but you took a day off or had a quiz, there can be some noise in the process. A little bit of noise at the beginning of spaced repetition can mean you forget how to do it and need to glance back at the reference to refresh.

Give it some time to get further into the spaced repetition process. Maybe several weeks, even a month. If you find you’re still having to look up stuff all the time, and it’s been a month since you learned it with several exposures, then it might make sense to make a flashcard. But before you make one, the first thing is to ask yourself: are you actually engaging in retrieval practice?

Make sure you’re not just defaulting to looking it up without trying very hard. Or are you referring back to notes you made and using them as a crutch? If you’re using them in a cycle of forgetting, it means you’re not trying to remember. If you’ve covered all those bases, then sure, it may make sense to make a flashcard, but that shouldn’t happen often.

In some cases, depending on how quickly you’re forgetting things, you might want to make a flashcard for trig identities or derivative rules. This could be for stuff that’s not sinking in as well as you’d like after many exposures. Jason and I have actually talked about integrating this into the system.

For example, if someone is having trouble remembering derivative rules or trig identities, these are ultimately just math facts. It’s similar to multiplication tables, which are just math facts. There are ways to help kids automatically retrieve multiplication facts, and it involves timed, flashcard-style practice. We want to build that into the system. I started working on that last summer, but I had to focus on other things. We have plans to include math facts practice for arithmetic in the system. The same approach will extend to trig identities and derivative rules—just these facts you need to know—and provide timed practice on them as well.

Justin: Yeah, or the recognition-type problems.

Justin: That’s a really interesting question.

Justin: It’d be so nice if we could know exactly what the neural connectivity is of all these math topics and students’ brains. We have this knowledge graph of math topics and how far along you are in the spaced repetition process, which is kind of like how solidified this is in your brain. But these are just estimates based on answering questions correctly or incorrectly. It’s based on behavioral data.

It’d be amazing to actually know the biological health level of this. Though, that seems pretty far off. When I was in college, I got really interested in computational neuroscience. I came in with a math background and thought, wouldn’t it be amazing if I could have a dataset of all the neural connectivity and properties of the neurons in the brain? And I thought, just having that information would be incredible.

Right now, the level of granularity at which we read out brain signals, even in brain-computer interfaces with brain activity and brainwaves, is still very aggregate. You can do some machine learning on them to match the properties of your aggregate brain metrics to different actions, but understanding the underlying conceptual mapping of information in someone’s brain seems very far away.

It would be so cool to actually have a literal physical MRI of a student’s brain. Our knowledge graph is kind of like an MRI of the student’s brain—an approximation of how your math connectivity is. But it’d still be interesting. If or when that sort of thing exists technologically, there will likely be a number of ethical issues to consider. So, yeah, I don’t know. It’s an interesting question.

Justin: That’s interesting. In the ideal learning system, it responds to emotional behaviors or different cognitive states of a student. That’s kind of what a good coach does with an athlete—they can tell when the athlete is in a bad headspace, and if they’re struggling with a particular exercise, maybe it’s time to switch it up.

As far as us incorporating that sort of stuff, I think we want to start off with as much information as we can get from a student’s clicks within the system—their answer behavior, how often they return to worked examples, their navigation within the system. I think you can infer a lot of the actionable emotions a student is experiencing based on their clicks and navigation.

It would be really interesting to have more affective information about a student’s state, but there’s always the trade-off, the costs. It introduces a lot of complexity, like privacy issues, especially when it comes to kids. It opens a whole can of worms, like what kind of information you’re storing. We typically try to avoid sensitive information or data that makes people uneasy. We don’t really want to get involved with that.

Maybe someday, way down the road, it turns out that measuring that stuff could seriously improve the system, and it’d be worth the headache of managing that data. But for now, I’m not sure. Maybe way down the road.

Justin: Yeah, it’s a good point. You’d have to get some kind of insight into what’s going on in somebody’s head that results in you taking different actions than what you can infer based on behavioral data. It’d be kind of silly to measure all these metrics and come to the same decision that a coach could make just by watching a video of the player on the field.

Justin: They only ran like three steps in practice. Yeah, exactly. You don’t need a Neuralink to tell that, but it’d be interesting to see how that develops.

Justin: Pretty much. There is technically a limit to how far it looks back, at least for now. It’s on my to-do list to refactor the diagnostic algorithms to look back all the way; I’m a bit behind on that. But right now, I think Math for Machine Learning either looks back all the way to the beginning of Foundations 2 or maybe all the way to Foundations 1. I can’t remember off the top of my head, but for these university courses, it typically looks back to early high school math.

If you could reasonably take the course and just have a bunch of foundations that need to be filled in, you’re fine. But if you don’t know how to add fractions and you sign up for multivariable calculus, it’s probably not going to look back that far. It looks back pretty far, though. If you even remotely think that the topics in Math for Machine Learning might be appropriate for you, we can probably capture all your missing foundational knowledge in that course’s diagnostic and have you fill it in along the way.

Justin: Right. Whenever you take a diagnostic for the course, we’re trying to assess your knowledge of all the topics in that course and all the prerequisites of those topics.

The easiest way to think about this is probably like someone who signs up for calculus. What kind of foundations are captured in there? Well, it’s going to look back through pre-calculus, algebra, and geometry—probably stopping around Algebra 1 or something. We’re going to assess you on all this algebra knowledge, geometry knowledge, and trig—definitely everything that comes up in calculus. But there will also be some stuff we don’t assess you on that might be in pre-calculus, like matrices.

It’s pretty common for pre-calculus courses to include early linear algebra, such as matrices and linear transformations. That kind of stuff doesn’t pop up in a standard single-variable calculus course. So, if you sign up for calculus, do all the coursework 100%, you’ll still have some content from pre-calculus and probably geometry that you haven’t been assessed on and haven’t learned in the system. You can always go back and fill in those courses. But when you sign up for a course, we’re just trying to get you learning all the topics in that course as quickly as possible.

Justin: Describe this again. So they sign up for…

Justin: If you just take a diagnostic for Foundations 1, no, it wouldn’t assess you on any material in Foundations 2 unless there’s some overlap between Foundations 1 and Foundations 2, which is possible. I can’t remember if there’s overlap, but sometimes we have courses with topics in common. It’s possible there’s overlap between those courses.

Sometimes what happens is somebody might take a diagnostic for Foundations 2 or Foundations 3, get totally hammered by it, and then switch down to Foundations 1. They’ll take the Foundations 1 diagnostic and start climbing back up. But they may have gotten some credit for higher-level topics based on that higher-level diagnostic. That actually happens pretty commonly—people drop down.

Justin: Right. You’ll continue to do your reviews. If you finish all the courses in your sequence, you actually unlock an Easter egg in the system where it starts feeding you topics from various other courses, including unfinished ones.

If you learn all the stuff—Math for Machine Learning, Methods of Proof, Algebra, and all the courses we’ve got—you’ll still continue receiving topics on things like differential equations, abstract algebra, probability, and statistics that you haven’t seen yet. These are unfinished courses, but there are decently finished topics within them. This content comes from the material we made for our original school program, where high schoolers were learning university courses. There’s a lot of that content floating around that you’ll still get.

You’ll also continue reviewing what you learned. How much review do you need to maintain your knowledge? That’s a good question. Suppose someone finishes all the topics, or maybe just finishes Math for Machine Learning, and doesn’t care about the other courses. They just want to continue reviewing the math so they’re fresh when Machine Learning 1 comes out. You put yourself into test prep mode, which keeps you in that course and feeds you reviews.

How much of that would you have to do? I’d say, just spitballing, I haven’t gone through this simulation or calculation, but probably about 15 minutes a day or an hour a week once you’ve finished all the topics and are just in review mode. That would be my guess—about 15 minutes a day or an hour a week in maintenance mode.

Justin: It’s a little different. If something is a low-level topic that comes up in a lot of high-level topics, you may never see an explicit review on that low-level topic. We might just give you reviews on those high-level topics. Every task we give you is designed to get the most bang for your buck. We actually compute how much it will elevate your entire knowledge profile. We choose the review topic that encapsulates the most implicit review, subject to the constraint that you’re getting reviews on everything that is due at that time.

I have this, I forget what I call it, I think I call it a review optimizer, but it’s a hardcore algorithm. It chooses a topic, or you tell it some topics that definitely need to be covered. You tell it the knowledge states of other topics and give it the encompassing information between them. You say, “Okay, give me the minimal set of topics that covers all these reviews 100% and maximizes the amount of additional review you get,” meaning pushing off other reviews that would otherwise be coming up. Every task is a very well-thought-out process.

Sometimes, people get confused because they’ll say, “Wait, I did a lesson on this topic, but I never saw a review on it.” The reason is that reviews are happening implicitly. We have ideas to make this a little more visible in the system, to show exactly what implicit reviews you’re getting. You can imagine doing a task and seeing a little knowledge graph animation afterward. It would show the reviews trickling down, like: “You got implicit credit, 50% on this original topic. Your review was scheduled for 10 days from now, but now it’s 15 days from now.” That’s what’s happening under the hood.
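To give a flavor of what that selection problem looks like, here is a minimal greedy set-cover sketch in Python. The data structures, topic names, and the greedy strategy are illustrative assumptions; the actual review optimizer is more sophisticated than this.

```python
def pick_review_tasks(due_topics, candidates):
    """Greedy sketch of a review optimizer (illustrative, not the real algorithm).

    due_topics: set of topic ids whose reviews are due now.
    candidates: dict mapping a candidate review topic to the set of lower-level
        topics it gives implicit review credit for (including itself).
    Returns a small set of tasks that covers every due topic, preferring tasks
    that also push off as many other upcoming reviews as possible.
    """
    uncovered = set(due_topics)
    chosen = []
    while uncovered:
        # Prefer the candidate covering the most still-uncovered due topics,
        # breaking ties by total implicit coverage (extra reviews pushed off).
        best = max(
            candidates,
            key=lambda t: (len(candidates[t] & uncovered), len(candidates[t])),
        )
        if not candidates[best] & uncovered:
            break  # nothing left can cover the remaining due topics
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen

# Example: reviewing "chain_rule" implicitly covers both due topics at once.
due = {"product_rule", "power_rule"}
cands = {
    "chain_rule": {"chain_rule", "product_rule", "power_rule"},
    "power_rule": {"power_rule"},
}
print(pick_review_tasks(due, cands))  # ['chain_rule']
```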

Justin: It’s funny what you said about the game. It’s interesting what you said about the Easter Egg as well. Once you finish your course, you get shown previews of what’s currently being worked on. I wonder if that would be an incentive for people to finish so they can see the new machine learning topics being worked on.

Justin: Yeah, that’d be interesting. Let me think. The one caveat with the Easter egg is that we have to have the course connected to the course graph. Right now, all the stuff from the original school program that high schoolers were doing is connected up to the course graph. So you get those topics, even the ones we haven’t officially released yet. But our machine learning course is still in development. Even if we have live topics in there, it’s not necessarily connected to the course graph. I don’t think anyone will be getting machine learning topics as part of the Easter egg.

It’ll be more about the university courses that we haven’t covered yet, but we do have content from it, like in our school program.

Justin: The challenge is that whenever we hook something up to the course graph, we have to be very careful about structuring the course. The model depends on the graph making logical sense. If you’re making heavy edits to the course, the connectivity and stuff, it gets a little more clunky because everything has to be validated really carefully. It takes some time for the edits to go through.

I guess it’d be worth chatting with Alex or the content director about that to see if he thinks it will work for him. But, again, we’re trying to get this course out so fast that hopefully it won’t matter. Hopefully, it’ll just be here before anyone even knows it. That’s what I’m thinking about.

Justin: That’s fine. Don’t hold me to this, and don’t hold Alex to this, but the goal we’re shooting for is the end of February. That’s the goal. We’re really going all hands on deck with that course. I’m going to be working on that course full time until it’s out as my main project. Alex is working on that. I’ve got Yuri, who’s Alex’s right-hand man, working on it too. A couple of other content writers are working on it. It’s making good progress so far, and I think it should be achievable. But that’s the goal—the end of February. If everything goes well, that’s the plan.

Justin: For the most part, I’ve got a pretty good understanding of classical machine learning. I taught a lot of these topics to students in Math Academy’s school program for several years. I even wrote a textbook on that. So it’s pretty well spun up in my head.

There are times when it’s just the degree of scaffolding required and being very precise. Coming up with well-scaffolded examples does take a lot of thought sometimes, and I do have to revisit the very core of how everything works sometimes.

It feels like I’m in a position where it’s just going back to stuff that I either had a really great understanding of before or have a pretty good understanding of and refining it further.

Justin: To a level of just scoping it out to the granularity that we need, which is honestly pretty fun for me. It’s not too difficult when you have that kind of background knowledge. But the background knowledge has faded a bit in some places, and it’s not as solid as you’d like it to be. You just get to fill in those little gaps.

For the most part, it’s less about relearning how something works and more about thinking about how to scaffold this into examples that can be done by hand. They shouldn’t take 10-15 minutes to solve, just a couple of minutes. We also need variations, not just running through the backpropagation algorithm. We need to have questions on that computation that we can vary, so we can give you the same type of problem several times to drill in the skill.

It’s difficult because it doesn’t seem like that’s done much in machine learning resources compared to algebra and calculus. That’s where most of the difficulty lies—figuring out how to scaffold this. But it’s coming along well.

Justin: For all the topics in the course, for all the lessons, it’s going to be drilled down to just the math by hand, kind of like the linear algebra course. Initially, when Jason and I were talking about this course, we thought, “Doesn’t there have to be code for students to write?” But really, the core skills, the building blocks of the skills, aren’t about the code as much as they are about knowing the math that’s going on under it.

We do have plans to make some mini projects that pull these mathematical building blocks into writing some code. It seems like there are ways we could keep this within the platform, such as having a little Python editor where you write some Python code, and it spits out numbers for given inputs. It’s kind of like the free response parser. We have free response questions where you type in a symbolic expression, and the parser figures out what mathematical expression it represents and evaluates it for a bunch of random inputs. If it evaluates correctly, it means you’ve got the right expression. We can do something similar with code.
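Here is a minimal sketch of that idea applied to code: run the student's function and a hidden reference implementation on a bunch of random inputs and check that the outputs agree. The example task (a ReLU function), the tolerance, and the function names are illustrative assumptions, not a description of the actual platform.

```python
import random

def reference_relu(x):
    # Hidden reference implementation for an assumed example task.
    return x if x > 0 else 0.0

def check_submission(student_fn, reference_fn, trials=100, tol=1e-9):
    """Sketch of grading code the way the free-response parser grades expressions:
    evaluate both functions on random inputs and compare the outputs."""
    for _ in range(trials):
        x = random.uniform(-10, 10)
        if abs(student_fn(x) - reference_fn(x)) > tol:
            return False  # outputs disagree on this input
    return True

# Example student submission.
student_relu = lambda x: max(0.0, x)
print(check_submission(student_relu, reference_relu))  # True
```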

Justin: Not as much as you think. The thing is, with a lot of these machine learning tutorials and classes, they tend to be code-heavy. Often, they’re teaching a framework like TensorFlow or something similar. That’s definitely good to know if you’re trying to get into a machine learning position, but it’s different from understanding what’s actually going on.

You’re learning how to use a framework that does things efficiently, which is good, but actually doing things by hand and understanding what’s being done under the hood feels like it boils down to typical math problems. It’s not always so. For instance, gradient descent. We’re not going to ask a student to work out 100 iterations of gradient descent by hand. It’s more about scoping it down to one or two iterations. Given a setup, we might say, “It’s gone on for a thousand iterations, and here’s our function. This is the point we’re at right now. Does this meet the stopping criteria?” You could define stopping criteria as the number of iterations, the slope of the loss function, or the absolute difference between your previous two estimates.

This allows you to give practice on various parts of the algorithm without executing the whole thing. Of course, a coding project would be more like, “Go implement gradient descent for this particular function.”
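As a rough sketch of what such an exercise works with under the hood, here is one gradient descent update plus the kind of stopping check described above, on an assumed example function f(x) = x². The learning rate, thresholds, and function are illustrative choices, not problems from the course.

```python
def gradient_descent_step(x, grad, learning_rate=0.1):
    """One iteration of gradient descent: step against the gradient."""
    return x - learning_rate * grad(x)

def should_stop(iteration, prev_x, x, max_iterations=1000, min_change=1e-6):
    """Stopping criteria of the kind described above: an iteration budget,
    or the absolute difference between the previous two estimates."""
    return iteration >= max_iterations or abs(x - prev_x) < min_change

# Illustrative example: minimize f(x) = x^2, whose gradient is 2x.
grad = lambda x: 2 * x

x = 5.0
x_next = gradient_descent_step(x, grad)  # 5.0 - 0.1 * 10 = 4.0
print(x_next)
print(should_stop(iteration=1000, prev_x=x, x=x_next))  # True: hit iteration budget
print(should_stop(iteration=3, prev_x=x, x=x_next))     # False: keep going
```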

Justin: Yeah. I’d be happy to. It’s always a lot of fun talking to you guys. You ask such interesting questions.

Justin: You know, I’m kind of used to automatic responses on Twitter, like I’ve heard the same question 10 times before. With you guys, there’s definitely a lot of new stuff where I have to pause and think about it. It’s really fun talking to you guys. I’ve also heard a lot of good concrete examples for things we were talking about that I never heard before. Like that uppercut and head lean back. That’s such an interesting thing. Yeah. Anyway, I’m more than happy to do another chat.

Justin: Let me just say something about the posting. I used to just fire off one post a day. I’d take something I’d written before, maybe a thought I had at the moment, and wordsmith it into a perfect Twitter post, maybe add an image for it. I did that for a while. Then I was talking to Jason, and he was like, “Dude, you should treat this like your Twitch stream. Anything you’re doing, just post about it. People will find it awesome.” I was like, “Really?” He said, “Yeah, you should totally do it.” So I started doing it.

At first, I was worried that posting too much would tank my visibility, but it doesn’t seem like that’s the case. More shots on goal and more opportunities to gain traction. People aren’t really concerned about whether your language is perfect. If you have a decent idea, even if you misspell a bunch of words or don’t use uppercase, people will pick it up sometimes.

That’s been my new strategy. Stream of consciousness. Anything that’s halfway decent, I post it. There’s a limit to that though. I know it’s a slippery slope, where you might start posting just random stuff. I don’t want to get to that level though. I’m still far from that.

Justin: Keep it on the middle ground.

Prompt

The following prompt was used to generate this transcript.

You are a grammar cleaner. All you do is clean grammar, remove single filler words such as “yeah” and “like” and “so”, remove any phrases that are repeated consecutively verbatim, and make short paragraphs separated by empty lines. Do not change any word choice, or leave any information out. Do not summarize or change phrasing. Please clean the attached text. It should be almost exactly verbatim. Keep all the original phrasing.

I manually ran this on each segment of a couple thousand characters of text from the original transcript.


Want to get notified about new posts? Join the mailing list and follow on X/Twitter.