Q&A #2: WMC, chunking subskills in LTM, writing down work, using/applying vs deriving/proving
Understanding working memory capacity. Scaffolding new skills by chunking subskills into long-term memory. Why it's beneficial to write down your work. Why solving problems is necessary. Using/applying mathematical tools vs deriving/proving them. What's good vs inefficient in the standard math curriculum.
Cross-posted from here.
Want to get notified about new posts? Join the mailing list and follow on X/Twitter.
Intro
So I said I would do another Q&A session this weekend. Here I am again.
I put it to a vote which category I should do, because I’ve got this massive backlog of questions. I want to do my best to answer them, but it’s gotten to the point where it’s overwhelming, and I need to focus on one coherent category at a time. Otherwise, my brain is going to turn to mush and pour out of my ears from context switching between all these different question types.
It looks like it’s actually surprisingly even. Learning came out ahead, followed by coding, math, and productivity. I was surprised to see how even the votes were; I thought there would be one category most people wanted to hear about. My prediction was that learning would come out ahead, but by something like 70% learning, then 10% Math Academy, 10% coding, 10% productivity. Instead, learning came out ahead but there’s real interest in the others too. So I’m going to focus on learning for this Q&A session, and I’ll do other sessions focused on the other categories: probably coding next, then Math Academy, then productivity, and then circle back to learning.
This morning, I was going through all the questions on learning, and there were just too many for a single session. When I answer a question, I want to give it a good answer that I can think properly about, not just spit out something like a 30-second or one-minute surface-level answer. That’s not fun for me, and it’s not even really informative. I want to get to the point where we’re actually thinking deeply about these questions, which means I have to filter the questions I’m answering and start with the ones I think I can get deep into. I pulled those questions aside, and I’ve got six questions that I’d like to cover today. I’ll probably talk for five to ten minutes on each one just to give it a proper answer.
Let’s jump into it.
The questions I’ll answer today cover a variety of topics. We’ve got some about working memory, chunking, recommended ways to study something like data structures and algorithms, but I think it generalizes to other topics as well. We’ve got questions about the physical aspect of learning, like writing things down or using your body. There are questions about learning versus understanding versus doing problems, drawing distinctions between those things. We’ll also discuss learning versus just applying a formula without understanding or how to derive it, and questions about the curriculum for schools and universities. If I had to redesign it, what would I change, or are the groupings good?
Each of these topics brings a lot to mind, so I’ll just do a brain dump on each one. Starting with this one:
Most people can only hold about seven digits or, more generally, four chunks of coherently grouped items in working memory. Fabian says this is really fascinating. I tried it with eight digits, and my brain just grouped them into three groups instead: 26, 356, 235. Is there any research on this kind of compaction as a way to increase working memory capacity?
That’s a good question. I think there’s one distinction to make here, though. When you chunk things like this and leverage those chunks to store additional information in working memory, you’re not actually increasing your working memory capacity. You’re managing to activate more information in your brain and hold that information, but you’re not actually increasing your working memory capacity. I know that sounds a little confusing, but the way to think about this is to think back to what working memory capacity means at a physical level in your brain.
What working memory capacity really means is the amount of effort you can devote to maintaining intentional, persistent neural activity. When you’re thinking about things, you’re activating neurons in your brain, and that activity spreads to connections with other neurons. When you think of something, you are intentionally rehearsing activity patterns to keep that information in your head. For example, if you have 26, 356, 235, you’re rehearsing those numbers, trying not to let that activity dissipate, just continually trying to fuel it. That takes effort.
Working memory capacity is the capacity for effortful rehearsal of information. You can make the information easier to rehearse in a way that doesn’t actually increase your working memory capacity or the amount of effort you can put forth. You can do this by leveraging long-term memory. Long-term memory is the connectivity that’s baked into your brain from longer-term learning. It’s what’s already wired up in your brain.
When you learn information and retain it for the long term, it’s stored as connectivity patterns in your brain. These patterns make it easier for neural activation to spread throughout your brain. For example, if you want to activate a bunch of neurons, you may have to put forth intentional effort to activate each one. But if those neurons are connected in long-term memory, you can activate just one of them, and the activation spreads to the others naturally. This is because the long-term memory representation makes it easier to activate and maintain the pattern without putting in as much effort.
When you chunk information, you’re relying more on long-term memory to make the information easier to rehearse in working memory. Another way to think about this is with chess players. You can show a chess master a chessboard, and they can recall it from memory, generating the board with high accuracy. A novice, however, will struggle because they don’t have the same long-term memory representations that the expert does. The expert can chunk the board into smaller, manageable parts, allowing them to recall the information more easily. The novice, however, must remember each piece individually, requiring more effort.
The key idea is that long-term memory allows you to chunk information so that you don’t have to remember every individual detail. Instead, you can rely on chunks stored in long-term memory, which makes working memory more efficient.
You can think of it like this: Imagine three neurons in a line, not connected at all. You have to activate each one intentionally. But if these neurons are connected in long-term memory, you can activate just one, and the activation spreads to the others through the wiring in your brain. You can activate the entire pattern with less effort because you’re leveraging long-term memory.
This ability to chunk information is why experts in chess, or any domain, are able to recall large amounts of information without putting in as much effort. Long-term memory representations allow them to make connections between different pieces of information quickly and without as much mental effort.
If you show an expert a random chessboard that doesn’t make sense in a game, they won’t be able to recall it easily because they don’t have chunks for that specific situation. In that case, their working memory is forced to hold all the information, and they perform worse.
This shows that the buildup of long-term memory allows you to effectively increase your working memory capacity: you pull more information into working memory without requiring more mental effort to rehearse it. The key distinction is that working memory capacity is not about how much information you can hold, but about how much mental effort you can apply to maintain information in your mind.
When you store more information in long-term memory, you can bring that information into working memory without increasing the effort. This is how you can handle more information in working memory without taxing your mental resources.
Now, I guess the natural question is, how do you increase the amount of actual effort you can put forth in working memory?
The answer is unclear. I did a literature search on this about a year ago, and from what I gathered, it’s unclear whether it’s even possible to increase your working memory capacity in that sense: the number of digits or chunks you can actively hold in working memory.
There have been studies on this, and it’s something that would be really useful if it were possible to increase working memory capacity reliably. However, in most studies, when people practice something like a digit span test, they can improve on that task. But when you then evaluate them on a different working memory loading task, they don’t tend to improve on it. What tends to happen is they find a way to hack the task, leveraging long-term memory, rather than actually increasing their effort in working memory. They find a way to exploit the situation.
For example, you might have heard of memory palaces, where you visualize information in a specific location, like a house with different rooms. This is an example of a hack that doesn’t actually increase your working memory capacity. It’s leveraging long-term memory structures to help with the task at hand. If you practice using a memory palace technique to improve your working memory, you aren’t actually increasing your working memory capacity; you’re just mapping the task onto a structure you already have in long-term memory.
You can improve your performance on working memory tasks, but it doesn’t seem to transfer reliably to other tasks. I recall seeing some papers where people were able to improve on one working memory task, and that improvement carried over partially to another task of a similar nature. But as the tasks become more different, the effect diminishes.
The state of the research is unclear. There isn’t much support for the idea that you can train your working memory capacity to carry over into real-life situations, like academics. I remember reading work from math interventionists who worked with students struggling in math, often because of low working memory. Their approach wasn’t to train the students to increase their working memory capacity but to help them store more math in long-term memory. By doing that, the students could handle more math without relying as much on their working memory.
It seems the best bet, if you want to increase your working memory capacity, is to write more relevant information into long-term memory. The more automatic your recall is from long-term memory, the easier it becomes to pull that information into working memory without putting in as much effort.
I should mention that I’m not saying it’s impossible to increase your working memory capacity. I’m just saying that, as of the current research, there’s no serious support for that idea yet. However, it’s definitely an interesting topic.
At the current state of things, the best way to increase your working memory capacity seems to be by writing more information into long-term memory. You can develop automaticity with that information so you can recall it easily and use it for chunking purposes. If you try to train yourself on a working memory task, like digit span, and you improve your ability, you shouldn’t expect that to carry over into other areas of your life.
It reminds me of a paper where someone trained their digit span. They chunked the numbers into groups, associating them with running times, such as a fast mile time or a 5K time. They could remember large numbers by thinking about these times, but when asked to remember a sequence of colors, their performance didn’t improve.
Next question: For somebody studying data structures and algorithms (but I think this generalizes to many other situations), would you recommend memorizing a lot of problems early on, in a regurgitation fashion, just so the data is cached in your brain in some way? For instance, have one phase be typing the algorithm over and over, then go into deep understanding? Or do both simultaneously, similar to how they learned piano by playing scales?
I think there is value in going through the motions, even if you don’t fully understand what you’re doing. Would I say you need to solve a hundred problems of the same type, just going through the motions without understanding? No, that sounds excessive and inefficient. Would I say you shouldn’t try to fully understand the thing first? No. Here’s how I would put it: If you’re learning a new thing, say breadth-first search, then start by trying to wrap your head around it. If it makes sense to you, and you can develop intuition, then go ahead and implement it. That will make things easier.
But if you’re struggling to keep the algorithm in your head, if it’s not making sense, and you’re trying to eat the whole elephant in one bite, then it helps to just practice the motions, to cache the information in long-term memory. That way, you can rely on those chunks. If breadth-first search isn’t making sense to you, and you’re reading the algorithm but it’s not clicking, try doing it with a simpler case. If that doesn’t work, just write it out, get used to where everything goes in the algorithm. That way, you take some load off your working memory, and you have more mental bandwidth to think about why things are happening the way they are.
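To make that concrete, here’s what “just write it out” might look like for breadth-first search. This is a minimal, standard sketch in Python, assuming the graph is given as an adjacency list; the point isn’t the code itself, but that physically typing out the queue-and-visited-set structure a few times caches the shape of the algorithm in long-term memory.

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first search over an adjacency-list graph.

    Returns the list of nodes in the order they are visited.
    """
    visited = {start}          # mark the start node as seen
    order = [start]            # visit order we will return
    queue = deque([start])     # FIFO queue drives level-by-level expansion
    while queue:
        node = queue.popleft()           # take the oldest frontier node
        for neighbor in graph[node]:     # expand its neighbors
            if neighbor not in visited:  # skip nodes we've already seen
                visited.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order

# A tiny example graph to trace by hand:
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
```

Once the mechanics are automatic, your working memory is free to focus on the why: why a FIFO queue produces level-by-level order, why nodes get marked visited at the moment they’re enqueued.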
I think the failure mode people often fall into is resisting the motions. They look at a hard algorithm that’s just beyond their depth and try to fit it all in their working memory, continuing to stare at it and try to make sense of it, even though it’s not working. They get frustrated and don’t move forward, but they don’t start writing it out or looking at a special case. Instead, they keep trying to eat the whole thing in one bite.
I’m not saying you shouldn’t understand algorithms—that’s not the point. What I’m saying is, if you’re struggling to understand something complex, regurgitation practice can help scaffold your understanding. There’s nothing weak about that. Sometimes you don’t have the right mental representations to fit something into working memory, so you need to build up chunks.
Think about it like this: Imagine you’re a gymnast learning to backflip. At first, you can’t do it, so you might try and fail. You could just keep trying the backflip and get nowhere, or you could practice simpler skills that build up to it. By learning those basic skills, you’re setting yourself up to eventually succeed at the backflip.
Memorization and regurgitation are tools in the toolbox. There’s a right time to use them and a wrong time. You can’t just regurgitate an algorithm and claim you understand it. You need to apply it in different contexts, and if you don’t understand the algorithm, you won’t know how to tweak it for other situations.
But should you never memorize or use regurgitation? No, you can use that tool and build chunks in your long-term memory that get you closer to fully understanding the concept. The problem arises when students only regurgitate information without being forced to apply it in different contexts. This happens when the curriculum doesn’t push students to build a flexible mental representation of the information.
If there’s a coding class where all the student does is regurgitate the code for a breadth-first search algorithm, that’s not enough. They should be made to apply the algorithm in different situations, tweak it, and answer elaboration questions. For example, suppose the breadth-first search algorithm fails in a specific case. The student should identify the bug causing the issue.
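As a hypothetical example of that kind of exercise (mine, not from the original talk), here’s a breadth-first search with a planted bug: nodes are marked visited when they’re dequeued rather than when they’re enqueued, so a node reachable through two different parents gets queued and recorded twice.

```python
from collections import deque

def bfs_buggy(graph, start):
    """Breadth-first search with a planted bug: because nodes are only
    marked visited when dequeued, a node reachable via two parents can
    sit in the queue twice and appear twice in the output."""
    visited = set()
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        visited.add(node)
        order.append(node)                # bug: no check that node was already visited
        for neighbor in graph[node]:
            if neighbor not in visited:   # too late: neighbor may already be in the queue
                queue.append(neighbor)
    return order

# "D" is reachable from both "B" and "C", so it gets recorded twice:
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
```

The fix is to add each node to `visited` at the moment it’s enqueued. Spotting that requires the student to understand what the visited set is for, not just where it usually goes in the code.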
Next question: What do you think about the physical aspect of math and coding? For instance, writing and typing things down or visualizing spaces using your body?
Again, it comes down to whether it helps you develop long-term memory representations that can scaffold the process and make it easier for you to chunk information into working memory. If it helps the brain activity flow to the right areas, then it can be helpful.
One thing I think helps is writing down your work in math. In addition to developing a memory of the steps involved in solving a problem, you can actually develop muscle memory for the process. For example, when solving an algebra equation, you might write “minus four” on both sides of the equation, draw a line under it, and write the new equation. Over time, this creates a kind of muscle memory. Once you start writing “minus four” on one side, you know you need to do the same on the other side, then draw the line and write the new sides of the equation.
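On paper, that ritual looks something like this (a toy example, rendered here in LaTeX):

```latex
\begin{array}{rcr}
x + 4 &=& 10 \\
   -4 &  & -4 \\ \hline
x     &=& 6
\end{array}
```

The hand learns the pattern: subtract on both sides, draw the line, write the new equation.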
Is this muscle memory fully understanding the rationale behind what you’re doing? Not entirely. But is it getting you closer to the point where you can fully understand? Yes, because it allows you to focus on what you’re doing rather than the mechanics of the process. Writing things down helps take the load off working memory so you can focus on deeper understanding.
Does this mean you have to learn all of math by remembering the hand motions for each equation? No. Not everyone has to write “minus four” on both sides of the equation or draw the line. But if you’re struggling to grasp a concept, going through the physical motions can help you understand what’s happening.
I think it’s a good habit for students to write down their work, even if they don’t need to at that point in their learning. For example, if someone learning algebra feels comfortable solving problems in their head, they may not need to write down the steps. But writing it down can still be helpful in reinforcing the concept.
There comes a point where math becomes too complex to fit in your head, and you have to use scratch paper as an extension of your working memory. It takes practice to get to that point. I’ve seen students struggle because they have this idea that doing math in your head is really cool. The more complicated the equation they can solve in their head, the cooler they are.
But eventually, it gets to a point where the math becomes too complicated to handle entirely in your head. They don’t want to write anything down, and that’s when they start making mistakes. They’re too resistant to writing it down and end up confused, not wrapping their heads around the problem.
I would always call this the “day of reckoning” with math learners.
A student who refuses to use scratch paper will eventually reach a point where they can’t make any more progress because they’re not using the scratch paper effectively. They’ll try to blame the instruction, but it’s really their problem—they’re not using the tools they need.
It’s similar to being on a sports team with novice athletes. There might be one athlete who is naturally fast and can outrun everyone else. In basketball, for example, they can outrun everyone on the court just by being faster. But eventually, that athlete will have to develop other skills like passing, shooting, and jumping. They can’t rely solely on speed forever.
This is similar to a student who can rely on their ability to do math in their head for a while. But eventually, the problems become too complex, and they’ll need to develop skills like writing things down and using scratch paper. If they don’t develop these skills, they’ll hit a wall.
You can’t be a one-trick pony. The game will catch up to you as the complexity increases. If you don’t develop your skills across the board, you’ll eventually run into problems.
You want to make sure that you’re developing skills across the board. Even if you technically don’t need them right now, you should make sure that when the time comes to use them, you’re ready. If you wait too long to develop these skills, you’ll be in trouble.
This is especially important in math. If you don’t develop the habit of writing things down when the problems are simpler, you’ll struggle when the problems become more complex. Writing things down is essential for keeping track of your work and preventing mistakes.
In the same way, you need to develop your math skills incrementally, building up foundational knowledge so you’re ready for more advanced concepts when they come. If you don’t build those foundational skills first, you’ll eventually hit a wall: the problems will get too complex to handle in your head, and by then you won’t know how to use scratch paper effectively because you haven’t practiced it. So start early, even while the concepts are simple. Get used to writing things down and organizing your work from the beginning. Building these habits early lets you move smoothly through more advanced topics later; it’s about developing the right habits and using the right tools at the right time.
All right, next one is, what do you think about learning equals understanding something? For example, whenever I read a math theorem, I truly sit with it and understand the reasoning behind it. Have I grasped the theorem, or is there a hole in my understanding that can only be filled by applying it?
There are two things that pop into my head immediately: an in-theory answer and an in-practice answer. In theory, can you read a theorem, try to understand the reasoning behind it, and do that well enough that you don’t have any holes in your understanding? Sure, it doesn’t seem necessarily impossible. But in practice, it just doesn’t work out like that, because nobody is that smart.
Maybe you’re that smart at the beginning of your math learning journey. Maybe a theorem about algebra comes really, really easy to you. Say you’re handed the quadratic formula, which is essentially a theorem about algebra: it states that the formula provides the solutions to a quadratic equation. Maybe you fully understand that you plug the coefficients of the equation into the formula, and things immediately jump out at you, like: oh, the thing inside the square root might be positive, negative, or zero, and that governs the number of solutions.
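Concretely, for $ax^2 + bx + c = 0$ with $a \neq 0$, the formula and the discriminant cases being described are:

```latex
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a},
\qquad
b^2 - 4ac \;
\begin{cases}
> 0 & \text{two real solutions} \\
= 0 & \text{one real solution} \\
< 0 & \text{no real solutions (two complex ones)}
\end{cases}
```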
But you’re going to get to a point where it doesn’t work like that anymore. Nobody has infinite intelligence. Even the world’s smartest mathematicians reach material complex enough that there’s no way to get through it without holes in their understanding, unless they apply it in special cases to get a feel for it.
I think that when students to whom math comes naturally read a theorem and look like they just get it, what they’re actually doing is thinking about edge cases of the theorem and working through examples in their head. Most students don’t do that naturally and need to practice it explicitly. They just read the theorem and think: okay, the words make sense, I got it.
And there are varying degrees of this. What the problems are meant to do is expose you to situations where you read the theorem, say, okay, this makes sense, I understand the reasoning behind it, and then a problem smacks you in the face: wait, I thought I understood the reasoning, but I don’t know exactly how to solve this. It forces you to go back and understand the things you missed the first time.
I guess what I’m trying to say is that nobody is so smart that, at all levels of math, they can look at a theorem, say “oh, I understand this fully,” and be completely sure they’re not missing some portion of it. Think about it: if there were a portion of the theorem you thought you didn’t really understand, then as a serious student you’d keep thinking about it before moving on, until you reached the point where you believe you understand.
So a serious student will always reach a point where they think they understand the theorem fully. But if you extrapolate that into the future, not every serious student can learn math just by reading and thinking about theorems. Everybody should know by this point that you have to actually solve problems.
In practice, you have to solve problems to surface and fill in anything you may have missed. And solving problems isn’t just about grasping the theorem in the moment, about reading something and having it fluently make sense in your brain. You also have to practice retrieving it from memory.
You know how when you read a good story, it makes total sense in your brain and you fully understand it, but then you finish the book, don’t come back to it for a while, never answer questions about it, and now you only remember 10% of what happened? The same thing happens with math.
The problems not only force you to grapple with understanding you missed when reading the theorem; they also hammer it into your brain through a form of retrieval practice and get it into your reflexes. It’s not just that it makes sense in the moment; it becomes brain muscle memory, an automatic reaction you can pull from your head in any sort of context.
It’s kind of like, here’s another sports analogy. Suppose that you take a kid who wants to learn basketball, and you have them come up to the free throw line, and they shoot a free throw and it goes in the net, swish. And that’s the first time shooting a ball from the free throw line. Does that mean they’re done shooting free throws? They’ve mastered the skill? No, they have to continually practice shooting the ball. Even when you’re at a pro level, you have to practice your free throws because you have to keep it fresh in your memory. You have to keep that reflex alive.
It’s not just about being able to do it once, either. Just because you can shoot a free throw, does that mean you don’t have to practice shooting in different contexts? Like when there’s a defender in front of you, or you’re doing a fadeaway, or you’re shooting from an angle instead of straight on with the hoop. There are so many different contexts. When you’re reading the theorem initially, you’re in a very serene state where there aren’t a whole lot of opponents in your way; it’s just you and the problem. It’s a lot easier to develop an ability to wrestle with it in that setting.
But the point of the problems is to introduce you to various kinds of opponents that you may not have anticipated coming up and force you to just better ingrain that theorem or the skill in your head where you can just pull it out automatically on the fly, reflexively, in different situations and fully master it.
So the problems are essential to understanding the theorems. If you just read through a math book and think you understand all the theorems, what’s probably happening is that you don’t actually understand them; you just think you do. When you do problems, you’ll struggle, or parts will come up that you didn’t think about initially.
Additionally, even in the miraculous, unlikely scenario that you manage to fully understand the theorems as you read through the math book (you know all the special cases, which results are counterintuitive, and how to apply the theorem in different contexts), if that just comes naturally to you, well, close the math book and wait a while, and it’s going to fall out of your memory.
If somebody asks you about it days or weeks later, it won’t have stuck. You need to actually practice retrieving this information from your head. That’s what the problems force you to do.
So, yeah, problems are totally necessary.
What say you to people who say it's not really learning to apply a formula without understanding that formula and how to derive it? That seems like an intuitively good take, but again, Math Academy doesn't follow this. Why does this feel so intuitively true and why may it not be true?
Honestly, this is not even intuitively true to me. How would it not be learning if you go from not being able to apply a formula to being able to apply it? It’s not learning the full skill yet, sure. The full skill, in its fullest context of mastery, is being able to apply the formula and understanding how to derive it. But if you go from not being able to do something to being able to do it, you can’t say that’s not learning. That is learning. That is the definition of learning: being able to do something consistently that you weren’t able to do before.
Is that the fullest definition of learning? No, but we’re just talking about learning in general. And typically the way you scaffold it is you have students practice applying the skill, and then once they’re comfortable with that, then you get to go deeper into like, okay, let’s talk about why this rule is the way it is.
Typically, for calculus, the place where you rigorously derive all the formulas is actually a later course: real analysis. So I’d also completely disagree with the idea that Math Academy doesn’t follow this. It’s not that we’re forgoing teaching how to derive formulas; it’s that we’re building math from the ground up, scaffolding it as well as we can, and we haven’t gotten to the point of proving all these calculus formulas yet. We don’t have a real analysis course yet, but when we do, guess what it’s going to cover? All these proofs of calculus results.
It’s the same way in every other subject too. Like in linear algebra, for example, sometimes people say, hey, well, your linear algebra course isn’t as rigorous as Axler’s Linear Algebra Done Right, which goes through a bunch of proofs of all these linear algebra things.
But here’s the thing: Linear Algebra Done Right, that book, is not an introductory course in linear algebra. That’s a second course in linear algebra. And we often joke in math that it should be called Linear Algebra Done a Second Time. It’s not that we disagree that students should learn that stuff, it’s just that we are building a curriculum from the bottom up. So we have to do the simpler skills first to get these chunks in students’ heads so that they can use those chunks to get the more advanced skills in their heads.
So that when it comes time to do the more complicated thing of deriving a formula, they’re not wondering, wait, what even is this formula? I’m not familiar with it. No, they should have that in their head, have an idea of how the formula works, and we should be at the point where all they need to focus their working memory on is the proof of it.
Additionally, they should also have proof experience beforehand, like in our Methods of Proof course, that would be a prerequisite, so we don’t have to be simultaneously trying to teach them what a proof is or what a derivation is. These sub-skills need to be in place. If you want to create a highly scaffolded learning environment that maximally supports the success of all students, even those with more or less advantageous working memory capacities, you have to build things up from the ground up with sub-skills.
We want our students to get to the point where they are able to apply these formulas, derive them, and reason about them, but you have to do some things before other things. We’re going to get there, but we don’t have a full undergrad degree in math built out yet. We’re building things outwards from the core underlying skills.
Before we ask a student to derive a formula or prove a formula, we need the execution of the formula in place, and we need general proof or derivation skills in place too. Only once we have those skills in place can we layer the next skill on top.
Yeah, we’re going to get there. We’re coming for all of this. We’re not stopping at just the application of the formula. But where we are right now is roughly the first two years of undergrad math. That’s the top of our knowledge graph, and it tends to be focused more on the application of these formulas: getting comfortable with what they mean and how to apply them in various scenarios.
When we get to the next couple layers of math, like upper undergrad math, abstract algebra, real analysis, and Axler’s Linear Algebra, which is really abstract linear algebra, we will be doing all of these proofs and derivations.
Another thing I want to point out here is that you can’t just jump to proofs and derivations. Even if you successfully pull off a proof or derivation of the formula, that does not imply that you have acquired all the knowledge that you would get from executing the formula and seeing it play out in various scenarios and applying it.
Here’s a specific counterexample: you can do proofs with eigenvectors and eigenvalues, just Av = λv. You can throw that around and prove things with it. You can manipulate symbols to get to a desired result and still have no idea how to actually compute an eigenvalue or an eigenvector and apply it to situations, like diagonalizing a matrix.
You may know that a matrix can be diagonalized and takes a certain form, and you may know a proof supporting that, but that does not automatically imply that you know how to actually diagonalize a given matrix. If I give you a 2x2 matrix and say, “Diagonalize it for me,” and you just say, “Well, I proved that this matrix can be diagonalized in my linear algebra class,” I’m going to say, “Well, you don’t know how to diagonalize it.” I don’t care if you know how to prove it. That’s great, but you have not learned the full skill.
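To make the distinction concrete, here’s a minimal sketch of actually diagonalizing a 2x2 matrix in plain Python. The specific matrix and the eigenvector shortcut are my own illustrative assumptions (real, distinct eigenvalues), not part of the discussion above:

```python
import math

# Diagonalize A = P D P^(-1) for a 2x2 matrix with real, distinct eigenvalues.
A = [[4.0, 1.0],
     [2.0, 3.0]]

# Step 1: eigenvalues are the roots of the characteristic polynomial
#   lambda^2 - trace(A)*lambda + det(A) = 0
trace = A[0][0] + A[1][1]                     # 7.0
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # 10.0
disc = math.sqrt(trace ** 2 - 4 * det)
lam1 = (trace + disc) / 2                     # 5.0
lam2 = (trace - disc) / 2                     # 2.0

# Step 2: eigenvectors solve (A - lam*I)v = 0. For the top row
# (a - lam, b), the vector v = (b, lam - a) works whenever that row is nonzero.
def eigenvector(lam):
    a, b = A[0][0], A[0][1]
    if abs(b) > 1e-12 or abs(a - lam) > 1e-12:
        return [b, lam - a]
    return [1.0, 0.0]   # top row is all zero (e.g. a diagonal A)

v1, v2 = eigenvector(lam1), eigenvector(lam2)

# Step 3: check A v = lam v for each eigenpair. The columns of P are
# v1 and v2, and D = diag(lam1, lam2).
for lam, v in [(lam1, v1), (lam2, v2)]:
    Av = [A[0][0] * v[0] + A[0][1] * v[1],
          A[1][0] * v[0] + A[1][1] * v[1]]
    assert abs(Av[0] - lam * v[0]) < 1e-9
    assert abs(Av[1] - lam * v[1]) < 1e-9

print(lam1, v1)   # 5.0 [1.0, 1.0]
print(lam2, v2)   # 2.0 [1.0, -2.0]
```

That computational routine (characteristic polynomial, eigenvalues, eigenvectors, assemble P and D) is exactly the skill a proof of diagonalizability does not automatically give you.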
In general, it’s a mistake to think of proving and deriving as the all-encompassing component of knowledge. Yes, proofs and derivations are part of mastery, but so are applying the formula, getting comfortable with how things work out, and being able to use it. These are all separate components of mastery. You can’t do just one of them and not the other and claim that you’ve got full mastery of the topic. You need to do both. That’s what we’re going to shoot for. We’re going to do both.
There’s a relationship between these two components: if you want to scaffold things in a way that is easiest for students to follow, if you want to build up those chunks in students’ long-term memory before you throw them problems that would otherwise explode their working memory, you need to get the skills of using, being familiar with, and applying the formulas in place before you go down to the derivations and the proofs.
Even the derivation or proof itself is an abstract idea meant to encapsulate a lot of concrete examples. There’s an analogy I made a while ago: if you know how to do a bunch of math proofs but you haven’t had any experience actually executing these formulas, working with them in applied contexts, or solving problems, and you only know the most abstracted form of the idea, it’s kind of like a kid who has no life experience but picked up a book of life quotes and thinks that just because they’ve read all the life quotes, they know all there is to know about life.
Really, the quotes are meant to encapsulate a lot of life experiences. You don’t really appreciate the full value of a quote until you have that zoo of concrete experiences that compress into the quote, into the abstracted idea. That’s where the power of the abstracted idea comes from: compressing all your concrete examples, your experiences.
The same thing holds in math. If you don’t have a zoo of concrete computational examples and you try to make the jump to deriving a formula or proving something, pretty quickly you’re going to get to the point where you feel like you’re just pushing symbols around. You don’t actually know what’s going on. You can say things that sound smart, but do you actually feel in your bones what it means? Is the tool an extension of you in the same way that an instrument is an extension of a professional musician? No.
You can’t just learn music theory and not actually practice composing songs, playing an instrument, the hands-on part of it, and call yourself a musician. It’s not enough to know where the tools come from. You also have to be able to use the tools. They’re separate skills. You gotta do it all.
Anyway, we’re going to be doing that at Math Academy. It takes time. It takes time to build a math curriculum spanning grade school through university level and beyond. I guess we’re going to continue to get this critique into the future until we actually have those courses in the system, but yeah, we’re going to have those courses in the system, and then it’ll be clear.
Last question for today. Math Academy follows the standard curriculum for schools and universities, but if you had to redesign the curriculum, what would you change? Are the groupings even good? How are they even initially decided?
Honestly, I think history has done a pretty good job of identifying core atomic topics in math, things that segment out pretty well. There are linear equations and quadratic equations. These are two separate things; they’re not intermingled into the same topic. You can pick up a textbook and it has them divided out well, and that makes a lot of sense. The lines between topics have been drawn pretty well. Sometimes, we split up those lines even further. Linear equations, for example, we might split into just x + a = b, then another form that’s a * x = b, then another form that’s a * x + b = c, and maybe another form where x is on both sides of the equation. It might not be split out this finely in a textbook, though oftentimes it is split out to some degree.
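That sub-skill layering can be sketched as a toy example in Python (the function names and the specific equations are mine, purely for illustration): each equation form reduces to the simpler form one level down, the same way the chunks build on one another.

```python
# Each solver reduces its equation to the simpler form one level down,
# mirroring how the sub-skills chunk and build on one another.

def solve_shift(a, b):
    """x + a = b  ->  x = b - a"""
    return b - a

def solve_scale(a, b):
    """a*x = b  ->  x = b / a"""
    return b / a

def solve_scale_shift(a, b, c):
    """a*x + b = c: undo the +b, leaving a*x = c - b."""
    return solve_scale(a, c - b)

def solve_both_sides(a, b, c, d):
    """a*x + b = c*x + d: collect the x terms, leaving (a-c)*x + b = d."""
    return solve_scale_shift(a - c, b, d)

print(solve_shift(3, 10))             # x + 3 = 10       ->  7
print(solve_scale_shift(2, 3, 11))    # 2x + 3 = 11      ->  4.0
print(solve_both_sides(3, 2, 1, 10))  # 3x + 2 = x + 10  ->  4.0
```

Each harder form is one new move plus a call to an already-mastered sub-skill, which is the whole point of the split.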
I don’t think the splits are done incorrectly or anything. I think that’s about right; it’s just that, oftentimes, more splits are needed. The issue is usually more about the sequencing of material. In a standard curriculum, you typically cover groups of related material for a week or longer at a time. Whereas what we do at Math Academy is interleave through all the material: instead of covering a single unit at a time, we cover a single topic from each of a number of different units and do that over and over again.
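The contrast between the two orderings can be sketched in a few lines of Python (the unit and topic names are made up for illustration):

```python
from itertools import chain, zip_longest

# Hypothetical course units and their topics (names are illustrative only).
units = {
    "limits":      ["L1", "L2", "L3"],
    "derivatives": ["D1", "D2", "D3"],
    "sequences":   ["S1", "S2", "S3"],
}

# Blocked practice: finish every topic in one unit before the next unit.
blocked = list(chain.from_iterable(units.values()))

# Interleaved practice: one topic from each unit, round-robin,
# repeated until every unit is exhausted.
interleaved = [topic
               for round_ in zip_longest(*units.values())
               for topic in round_ if topic is not None]

print(blocked)      # ['L1', 'L2', 'L3', 'D1', 'D2', 'D3', 'S1', 'S2', 'S3']
print(interleaved)  # ['L1', 'D1', 'S1', 'L2', 'D2', 'S2', 'L3', 'D3', 'S3']
```

Same topics, same total work; only the ordering differs, and that ordering is what forces the repeated context switches described next.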
I’ve written a lot about interleaving and mixed practice and how that is a more optimal way to go about things. In short, when you practice a group of related material within one cohesive timespan, you’re never forced to switch context and pull the previous context up afresh. You’re able to keep all the stuff at the front of your mind. It’s like getting into the rhythm of shooting three-pointers and sinking them one after another. You’re never made to go in for a layup and then come back out and shoot a three-pointer, which is harder because it breaks your rhythm.
The point is to break the rhythm because you want to learn the skills to the point where you’re not dependent on the rhythm being there in your head. You’re not dependent on having it at the front of your memory in order to recall it again. We want to get you to the point where you’re transitioning from doing something kind of unrelated, and now you’re forced to pull this information from your head again.
Say you’re in calculus. Normally, what happens in a calculus class is they’ll group the first several weeks of the course just doing limits, and that’s it. Maybe the first month or two of the course, just limits. When we do it, we have you doing limits, derivatives, sequence convergence. We have you doing a bunch of different things from the beginning.
Now, you might say, “Well, I was doing great at limits. I had it, and then you made me switch to doing derivatives. I lost my rhythm, and I forgot how to do the limits problems. Now, I have to refresh on that a little bit. They feel harder than before. You screwed up my learning.” But that’s not really screwing up your learning. That’s just exposing that you didn’t learn it as well as you thought in the first place, and we’re forcing you to practice in a way that makes you learn it better.
If we don’t force you to recall this information more from scratch, without it already sitting at the front of your head, then you’re not going to get good at recalling it from scratch.
That’s the main difference. It’s not about the granularity of the chunks; schools and universities chunk things up at about the right granularity, it seems, but the sequencing is suboptimal at best. I understand that when you’re a textbook writer or an instructor, it makes life a lot easier, especially if you’re building out a course, teaching off the top of your head, or working with limited teaching resources, to go in batch mode and do all the limits first, then all the derivatives, then all the integrals, to have these cohesive units. But in terms of learning the material for the long term, that’s not the optimal way to go about it.
Now, another perspective on this question is: are there any topics that we would not include at all in our curriculum if we could make that choice? I would say there are quite a few topics in the traditional math sequence that feel like they’re just there to satisfy some standards. University math doesn’t really build upon them very much. Personally, I would say they’re not really that necessary, but we kind of need them in there to check the box on traditional courses.
One example would be inscribed angles in circles, along with a bunch of geometry like intercepted arcs and those circle chord theorems. You never really build upon them unless you’re doing more geometry like that; they just don’t get built upon very much, or at all, in university math. Alex went through and identified a ton of these topics that are just there to check the box on standards, so we made a separate track. We have our traditional and integrated math sequences that cover all the courses comprehensively to the standard, but we also have a math foundations sequence.
It’s meant for adult learners who want to shore up their foundational, pre-university math to the point where they can learn any university courses they want. We don’t include things that are just meant to check the box on standards, like those circle theorems from geometry. Alex went through and figured out all the topics we were able to remove; it turned out to be about a quarter to a third of the traditional curriculum.
Personally, if it were up to me, I wouldn’t waste time on those topics when thinking about what to change in the standard curriculum. I would streamline it a little more and just get to the core foundational topics. After you’ve knocked out the most foundational topics, move on to the next level of math. Don’t spend an extra 50% of the time in a geometry course once you’ve already knocked out all the core material. Just move on to the next course. Get to university-level math. Get to the math that really ends up getting applied a ton in the real world. That’s probably what I would change.
One additional thing that comes to mind is that as we're building out our machine learning course, we're kind of realizing that there is a shortage of good curriculum for machine learning online, even at universities, even in textbooks.
The problem is, it seems like you’ve got extreme theoretical approaches to machine learning and you’ve got extreme applied approaches. There’s very little in between. You’ve got tutorials online that are like, “Hey, you want to learn machine learning? Just import TensorFlow and run this code. Now you’ve got a neural net. You’ve trained a neural net. Congrats. You’ve learned neural nets.”
No, no, no. You’ve learned how to implement a neural net by piggybacking off an existing library, but that’s not full mastery of neural nets. At the same time, you’ve got textbooks that might present the backpropagation algorithm for training a neural net and have you implement it in code or prove something about it, but that’s missing the component of actually feeling the algorithm in your bones: going through and doing some by-hand examples.
Not overly tedious. I’m not saying you have to manually work out a hundred iterations of backpropagation in a five-layer neural net. What I’m getting at is that you need reps on these algorithms. You need to go through them, work them out by hand, and really feel how all the nuts and bolts move together. Develop that high level of automaticity, that in-your-bones, by-heart intuition for the algorithm, almost like knowing how to multiply or add numbers: you can add numbers by stacking one on top of the other and doing the addition with carrying.
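As a sketch of what such a by-hand rep might look like (the one-neuron “network”, the numbers, and the squared-error loss are all illustrative assumptions of mine), here is a single backpropagation step small enough to verify with pencil and paper:

```python
import math

# One backpropagation step for a single sigmoid neuron.
# Forward pass: z = w*x + b, a = sigmoid(z), loss = (a - y)^2
x, y = 1.0, 0.0   # one training example (input, target)
w, b = 0.5, 0.0   # initial parameters

z = w * x + b                 # 0.5
a = 1 / (1 + math.exp(-z))    # sigmoid(0.5), roughly 0.6225

# Backward pass, one chain-rule factor at a time:
dloss_da = 2 * (a - y)        # d/da of (a - y)^2
da_dz = a * (1 - a)           # sigmoid'(z), written in terms of a
dloss_dw = dloss_da * da_dz * x   # dz/dw = x
dloss_db = dloss_da * da_dz       # dz/db = 1

# Gradient descent update with learning rate 0.1.
lr = 0.1
w -= lr * dloss_dw
b -= lr * dloss_db

print(a, dloss_dw, w)
```

Tracing each of those numbers by hand a few times is what builds the intuition; scaling the same chain rule up to layers of matrices is all a full backprop implementation adds.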
If you really want to know machine learning and want to know it well, you need to do similar practice with the machine learning algorithms. That’s a component that is missing from other curricula, and it’s one thing we’re going to bring into our machine learning course. It’s going to be heavily centered on these by-hand problems. We’re going to have coding problems too, where you apply what you’ve learned to build a neural net, code it up, and apply it to model something, maybe do some image classification. We’re going to have tons of projects like that.
But we’re going to make sure that by the time you get to the projects, everything is scaffolded up and you’ve developed a high degree of mastery of all the component skills. You feel the neural net backpropagation algorithm in your bones; it’s just flowing through you. You’ve got it. When it comes time to code it up, you don’t have to think, “Wait, how does this work again? I forget how to even go about the algorithm. I forget how numbers flow through it.” No, you’ve gotten to the point where you know exactly how the numbers flow through it, and you can focus on the implementation.
We’re going to make you do all that from scratch in Python, and you’re going to build your own neural net. You’re going to bridge from theory to application that way.
Now, one thing that we’re not going to have in our machine learning course just yet is proving theorems about machine learning algorithms. Maybe we’ll have some really key results, like the universal approximation theorem, but in general, you can go super deep into theoretical machine learning, and it’s still an area of active research. Most people who sign up for a machine learning course don’t want to become pure theorists. They want to learn what these algorithms do, how they do what they do, how to code them up, and how to apply them to do cool stuff.
That’s what we’re going to be focusing on. Of course, that leaves the door open to maybe additional courses in machine learning where we take a more theoretical treatment and have more of a focus on proofs. But it’d be a similar situation to what I talked about with calculus versus real analysis. Calculus is more about the application of calculus ideas, working through problems in concrete examples. Real analysis is more about the theoretical underpinnings of calculus, proving things.
It’s the same with linear algebra versus abstract linear algebra, or Axler’s Linear Algebra, which is really abstract linear algebra, a second course in linear algebra. It would be similar with machine learning: the courses we’re focused on right now cover the application of these things. As for real analysis and abstract linear algebra, we’re definitely going to have those courses in the system at some point. We want to fill out the undergrad math curriculum, and the undergrad math curriculum includes those courses. Go look at a top university’s math program and they’re going to have those kinds of courses. We want to match our offerings up with that.
But something like theoretical machine learning, that’s not a standard part of the undergrad curriculum. I don’t know if we’d get to that. It would all depend on market demand, whether people really want that.
Outro
Anyway, I think that wraps up this Q&A. Next time, I’ll rotate the categories. I think next time we’ll probably go over to coding. Was that the next highest? Yeah, that was the next highest. Next time, we’ll be coding.
All right, till next time.
Prompt
The following prompt was used to generate this transcript. It was repeated twice, for the first and second halves of the file (each about 35k characters).
You are a grammar cleaner. All you do is clean grammar, remove single filler words such as “yeah” and “like” and “so”, remove any phrases that are repeated consecutively verbatim, and make short paragraphs separated by empty lines. Do not change any word choice, or leave any information out. Do not summarize. Please clean the attached document and deliver it to me one section at a time. Again, do not summarize. It should be almost exactly verbatim.
After each section:
Next. Remember: You are a grammar cleaner. All you do is clean grammar, remove single filler words such as “yeah” and “like” and “so”, remove any phrases that are repeated consecutively verbatim, and make short paragraphs separated by empty lines. Do not change any word choice, or leave any information out. Do not summarize. It should be almost exactly verbatim.