Q&A #1: WM taxation, ML ETA, catching errors, coding tutorials, math vs calisthenics, foundations
Cross-posted from here.
Want to get notified about new posts? Join the mailing list and follow on X/Twitter.
Intro
All right, I said I would do some Q&A, so here I am for some Q&A.
First question, how do you recommend recharging your working memory? In Math Academy, it's been two days now and I can't do half an hour of math before my working memory gets blown out.
Personally, whenever I’m working on something extremely cognitively taxing, I sometimes have to take breaks. It might be only half an hour before my brain feels like mush falling out of my ears, and I have to switch to something a little less taxing and then come back to it maybe half an hour later. Even just a 10-minute break sometimes helps a lot.
Sometimes you just can’t keep up. It’s like you’re sprinting mentally, right? You’re going through these sprints and you can only sprint for so long before your legs just give out, or in this case, your brain. But then you just do a little rest and then you get back to it. I think, at least if this is happening to you and it’s your first time experiencing this, you can build up some more stamina by continually doing these kinds of intense sessions. Go as far as you can, take a little break, come back after and do a little more, and then just try to keep that pace up each day.
I’m not saying to push yourself to the point that you’re miserable, and I’m not saying go for six half-hour sessions in a single day. That’s a little excessive. But if you want to do an hour of math a day, then maybe just split that up into a morning session, an evening session, or like a morning session, have some coffee, and then work on something else less taxing. Then come back for another half hour or even a 15-minute session after that. Breaking it up and working on less taxing things in between helps.
I do that all the time with what I’m working on too. I’ll work on something particularly mentally taxing, notice myself getting kind of tired, then I’ll switch to something else. Luckily, I’m in a position where I have a variety of tasks with different amounts of cognitive load in each. I can just switch back and forth between all these different important things I need to do.
My final answer would be just take a break once you get to the point where you’re not being productive anymore because your working memory is blown. Take a break, come back later, and maybe you can build up a little more stamina. But everyone hits a point where you can’t spend six hours a day doing something as taxing as math at the edge of your ability. Think of it like a sprinter sprinting for six hours a day. It doesn’t really work like that. It’s a smaller dose you can handle, then you just take breaks and come back.
Do you plan to introduce dark mode on the Academy? Many would love it.
I’m not really the one who does the UI stuff or the application level stuff. I do the quant, AI, and data processing, but Jason does the UI stuff. My understanding is that we will get to it eventually, but we have so many other big fish to fry at the moment that dark mode is kind of a nice-to-have.
There are even bigger rough edges we need to focus on rounding out in the system. For instance, there’s so much going on in the onboarding process. When you sign up and have a diagnostic in front of you, you’re supposed to take the diagnostic. Sometimes people get confused and think, “Where are my lessons? What is this? This is a diagnostic? Do I have to take the diagnostic?” Then they go to the table of contents of the course, looking around, skimming through lessons, wondering why they’re not getting XP, because they’re not even going to an actual task. They’re just looking at lessons in a reference mode.
That’s just an example of something related to the UI that we need to focus on first. Again, dark mode, it’s not that it wouldn’t be valuable. Many people would love it. But in terms of where people are falling off the rails in the system, it’s just not as big of a priority right now. But I’m pretty sure there are web extensions where you can just invert the colors.
I’ve seen some Math Academy students, back when I was teaching, using a dark mode filter on their browser, essentially creating Math Academy dark mode. It would look a little janky, but it wouldn’t be too bright. I think someone mentioned a Firefox dark mode extension. There you go.
Any ETA on when the machine learning course at Math Academy will be available?
There is a date we’re shooting for. Now, don’t take this as a promise. This is just a target we’ve set internally. Don’t get mad at me or Alex or anyone if it ends up taking a little longer. What we’re shooting for with the machine learning courses is the end of February.
This came from when we first talked about the course. Jason was saying, “We have to get this course out really fast. We have to put serious manpower behind it. We have to prioritize this course because people are so excited about it, and we need to really pick up the pace on content development.” At first, end of February seemed like an impossible task. But Alex has some additional people on the content development team now, and I also jumped on the machine learning course—not just sketching out the course table of contents, but actually working on mapping out individual lessons and multi-step tasks. We’re making pretty good progress.
Based on the progress we’ve made so far, we’re still shooting for the end of February. It seems like it may be achievable. Either way, this isn’t going to take a year to develop. It will take several months.
Curious how long you can stay in a single demanding task?
That kind of depends on how demanding the task is. I think it’s a sliding scale. When I am doing something extremely taxing, something that almost feels like working memory training, maybe that’s only half an hour or so.
There was one time—this doesn’t happen that often when coding—but sometimes there’s a complicated bug or a problem you’re trying to solve, and it gets a lot more complicated than you thought it was going to be. Maybe you haven’t built up the right abstractions to deal with the problem, and you’re just trying to see if you can solve it quickly and move on to other things you need to do. You end up creating a situation for yourself where it’s more demanding than it needs to be.
I’ve been in that situation before, and yeah, I don’t know, maybe half an hour for the extremely demanding situations. But I generally notice that it’s too demanding when I haven’t built up the right constructs to think about what I’m trying to do. It usually requires restructuring some form of data or functions or cleaning up the code so that it’s less demanding.
I don’t think I’m ever really faced with a situation where I can only do half an hour of a task and that’s it. I might try solving a task initially and then notice it’s too demanding. Then I’ll build up better constructs to solve it. If it’s a coding task, it becomes less demanding on the brain, though it takes a little longer to build those constructs. That’s kind of how it goes in coding.
I can’t really come up with an example where I would be cornered into a demanding task I can’t escape. Oh, okay, I got one. Setting encompassing weights for topics in the system. Anytime a new topic comes out, I have to set weights representing how much implicit credit a student should get for other prerequisite topics.
Sometimes it’s obvious. For example, a two-step linear equation completely encompasses a one-step linear equation. That’s just a weight of 100%. But sometimes it gets fuzzier. You’re practicing some of the prerequisite skills but not all cases, or the prerequisite isn’t fully rendered, or the student doesn’t have to review it at all if they do the more advanced topic. They get some amount of implicit credit, but not 100%.
I have to make a decision on all these percentages. Sometimes it’s 100%, sometimes 0%, sometimes 50%, 25%, or 10%. That takes a lot of cognitive effort, especially for proof topics when I have to figure out what tools are being used. In algebra, it’s easier to tell. For example, solving a quadratic equation by factoring gives you 100% credit on factoring.
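To make the idea of these weights a little more concrete, here is a minimal sketch of how they could be stored and applied. This is an assumption for illustration, not Math Academy's actual implementation: the table name `ENCOMPASSING_WEIGHTS`, the function `implicit_credit`, and the 25% partial weight are all made up. Only the two 100% examples come from the discussion above.

```python
# Hypothetical table: advanced topic -> {prerequisite topic: weight between 0 and 1}.
ENCOMPASSING_WEIGHTS = {
    "two-step linear equations": {
        "one-step linear equations": 1.0,  # fully encompassed (from the example above)
    },
    "solving quadratic equations by factoring": {
        "factoring": 1.0,                   # full credit (from the example above)
        "two-step linear equations": 0.25,  # made-up partial weight for illustration
    },
}


def implicit_credit(topic, repetitions=1.0):
    """Return the implicit credit each prerequisite earns when `topic` is practiced."""
    weights = ENCOMPASSING_WEIGHTS.get(topic, {})
    return {prereq: repetitions * weight for prereq, weight in weights.items()}


credit = implicit_credit("solving quadratic equations by factoring")
print(credit)
```

The hard part described above is choosing each number in the table, which no amount of code does for you.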
How long could I stay on that? I’d say I could do maybe one hour at a time, take a little break, work on something else, then come back for another hour. It’d be hard to do more than four to six hours total in a day. That seems to be the range for serious deliberate practice. If you think about an athlete or a musician, the capacity for practice is around four to six hours a day.
I've been using Math Academy for linear algebra as a supplement for my class. It's nice being able to go back and practice specific topics. Although I wonder if there is any approach to reducing computational errors.
Some things off the top of my head: you can often double check a math problem. If you solve it and get a solution, you can do some sanity checks on the solution to make sure it makes sense. For instance, if you’re solving an equation and get a result, just plug it back into the equation and see if it checks out. You can extend that to higher levels of math as well, beyond just simple algebra.
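As a rough sketch of the plug-it-back-in check, assuming a made-up equation 3x + 5 = 2x - 7 (the helper name `check_solution` is hypothetical):

```python
def check_solution(lhs, rhs, x, tol=1e-9):
    """Return True if x makes both sides of the equation agree (within tol)."""
    return abs(lhs(x) - rhs(x)) <= tol


# Made-up example equation: 3x + 5 = 2x - 7.
def lhs(x):
    return 3 * x + 5


def rhs(x):
    return 2 * x - 7


correct = check_solution(lhs, rhs, -12)   # both sides come out to -31
sign_slip = check_solution(lhs, rhs, 12)  # left side is 41, right side is 17
print(correct, sign_slip)
```

A dropped negative sign, one of the most common computational errors, gets caught immediately because the two sides no longer agree.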
In calculus, for example, if you evaluate an integral that is supposed to give a positive area as a result but you get a negative result, you know you didn’t do that right. You can go back and see what went wrong.
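One way to run that kind of sign check without redoing the whole problem is a crude numerical estimate. The sketch below assumes a made-up integral (x squared from 0 to 3, which is 9) and a made-up wrong hand answer of -9. The function name `midpoint_integral` is just for illustration.

```python
def midpoint_integral(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


# Made-up example: the area under x^2 from 0 to 3 must be positive.
estimate = midpoint_integral(lambda x: x ** 2, 0.0, 3.0)
hand_answer = -9.0  # hypothetical result of a sign error somewhere in the work

same_sign = (estimate > 0) == (hand_answer > 0)
print(estimate, same_sign)
```

If the signs disagree, or the magnitudes are far apart, that is the cue to go back and find the step that went wrong.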
In linear algebra, there are similar things. If you’re solving an equation, you can plug it back in and check. If you’re factoring a matrix, do a couple of row-column multiplications of your factored form to make sure it comes back out to the original matrix. If you’re solving for an eigenvalue-eigenvector pair, you can easily double check that. Just multiply the matrix by the eigenvector, then make sure it comes out to the eigenvalue times the eigenvector.
You don’t have to check all the components. Maybe just check the first component of the vector and see if it matches up as desired: the top row of the matrix times the vector should equal your eigenvalue times the first component of the vector. Just little sanity checks like that. You don’t always have to go through the full check, but sometimes you can just do a minimal amount of work to run a sanity check.
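Here is a small sketch of both spot checks in plain Python. The matrices are made-up examples, and the helper names `row_times_column` and `first_component_matches` are hypothetical.

```python
def row_times_column(A, B, i, j):
    """Entry (i, j) of the product A*B, computed from one row and one column."""
    return sum(A[i][k] * B[k][j] for k in range(len(B)))


def first_component_matches(M, eigenvalue, v, tol=1e-9):
    """Check only the first component of M*v against eigenvalue * v."""
    top_row_product = sum(m * x for m, x in zip(M[0], v))
    return abs(top_row_product - eigenvalue * v[0]) <= tol


# Made-up factorization A = L*U; spot-check a single entry instead of the whole product.
A = [[4, 3], [6, 3]]
L = [[1, 0], [1.5, 1]]
U = [[4, 3], [0, -1.5]]
entry_ok = abs(row_times_column(L, U, 1, 1) - A[1][1]) < 1e-9

# Made-up eigenpair check: 3 is an eigenvalue of M with eigenvector [1, 1].
M = [[2, 1], [1, 2]]
good_pair = first_component_matches(M, 3, [1, 1])
bad_pair = first_component_matches(M, 1, [1, 1])  # 1 belongs with [1, -1], not [1, 1]
print(entry_ok, good_pair, bad_pair)
```

Passing a one-component check does not prove the answer is right, but failing it proves something is wrong, and it costs almost nothing.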
Another thing is, if you’re tired during the day, that will definitely inflate computational errors. I’m a little tired right now and calling eigenvalues eigenvectors. I’m kind of spaced out. The same thing happens with math. It’s easy to mix up some symbols when you’re tired. You might skip a step in the problem.
Does your brain physically hurt like mine does after a day of doing hard things?
I don’t really get headaches very much. What I do get are symptoms almost like I have a cold, but not a real cold. I get watery eyes and a runny nose. It’s not like I’m actually sick. I just feel kind of run down, with watery eyes and a runny nose—almost like severe allergy symptoms. It’s kind of weird.
Would I prefer that or headaches? I don’t know. When I get cold-like symptoms after pushing myself too far or not getting enough sleep, I can still power through it. I don’t know if you can do that with a headache. In my experience, the few times I have gotten headaches, they’ve been more of an impediment.
Short answer, no, my brain doesn’t hurt like that. I do get run down physically.
How do you think we should improve coding tutorials? What should be done instead?
The broader context around this was a post where I said it’s hard to find hands-on examples stepping through actual computations in machine learning resources.
Coding tutorials typically say import this function, run it. Math tutorials typically say this is the form of the model, you can fit it using the usual techniques, and then leave it to the reader to figure out the rest. I don’t mean to say that coding tutorials necessarily have a problem with what they’re trying to accomplish. A lot of coding tutorials are just trying to show the syntax for how to run something in your code.
What I’m trying to say is that if you want to learn machine learning and step through some actual computations to see how some algorithm or update works under the hood, it’s hard to find those resources online. Do those resources have to come from coding tutorials? Probably not. Those have different goals: one is to teach you a library, the other is to teach you how an algorithm or model works.
Ultimately, there’s just a lack of resources for people who want to learn the underlying knowledge of how various models and algorithms work behind the code. There are two ends of the spectrum: there’s the math purist, who says let’s just write down the optimization problem and prove abstract results about it, and there’s the code implementation perspective, which is how to use a library. But it’s hard to find that third element: how do we go through a concrete example to gain a concrete understanding of what’s happening in the tutorial?
The reason simple computational examples are so rare is that they take a lot of work to put together. The amount of time you’d have to spend to write a tutorial or math book that goes through numerous concrete examples for particular algorithms or models would be immense.
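As one illustration of what stepping through an actual computation might look like, here is a sketch that runs three gradient descent updates by hand-checkable arithmetic. Everything in it is an assumption chosen to keep the numbers small: the model y = w*x, the two data points, the mean squared error loss, and the learning rate of 0.1.

```python
def gradient(w, xs, ys):
    """Derivative of the mean squared error of the model y = w*x with respect to w."""
    n = len(xs)
    return (2 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))


# Made-up data lying exactly on the line y = 2x.
xs, ys = [1.0, 2.0], [2.0, 4.0]
w, learning_rate = 0.0, 0.1

history = [w]
for step in range(3):
    g = gradient(w, xs, ys)
    w = w - learning_rate * g
    history.append(w)
    print(f"step {step + 1}: gradient = {g}, new w = {w}")

# By hand: the gradients are -10, -5, -2.5, so w goes 0 -> 1.0 -> 1.5 -> 1.75,
# halving its distance to the true slope of 2 on every update.
```

Each update can be verified with pencil and paper, which is the point: the reader sees the numbers move instead of trusting an imported function.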
Next question, hosting a podcast.
I guess we’ll see what this evolves into. I feel like I’m better on a podcast when there’s another person, an actual person who’s conversing in real time. But I guess we’ll see how the solo stuff goes, how solo Q&A evolves.
I originally tried to do this on Studio, the broadcast thing, but for whatever reason, I was having trouble actually recording a broadcast. So I’m just going to post this as a video separately and maybe try a broadcast again in the future once things are ironed out a little bit.
Next question, any tips for how you would apply hierarchical skill acquisition for other topics outside of math? How does it differ from your approach to calisthenics? How would you go about completing a project when your base level knowledge is low? What do you aim for, focusing on the foundation or trying to build the V1? Learn appropriate details, fill in the leaves later?
Let me just go through these one at a time. Any tips for how you would apply hierarchical skill acquisition outside of math? I think everything maps over pretty well in terms of getting your prerequisites in place and interleaving through.
In sports, for example, you’re not going to try to do a double backflip until you can do a single backflip. That’s a prerequisite. Do your single backflip, then you can do a double backflip. By the way, I’m just thinking about gymnastics. I can’t do a double backflip. I can’t even do a single backflip.
But I would say it maps over well. If you follow mastery learning, get your prerequisites in place, and interleave, it makes sense. For example, in basketball, it’s not productive to shoot a three-pointer from the same spot over and over again. You need to vary where you’re shooting from. If you only practice shooting the same shot over and over, you might settle into a rhythm where you think you’re much better than you are. When you play in a game, the context is different, and you might miss a shot you thought you’d make.
If you practice more varied, mixed practice, you’re creating a more robust representation of the skill in your mind. You’re able to execute under less favorable or consistent conditions. But when you’re first learning a skill, you want to keep the conditions consistent. That way, you can overcome the task and improve. Then you start raising the bar by interleaving.
Spaced repetition is also important. If you don’t practice your shots or backflip or instrument, you’ll get rusty.
One thing I will say about my approach to calisthenics is that I’m not really training in the most productive way if I wanted to get to a high level of skill recognized by the calisthenics community. I kind of just focus on the exercises I enjoy most, and I do best on them. It’s just a gamified form of exercise to me, so I’m not taking it seriously enough to optimize my progression.
I’m not trying to maximize my improvement or do the kind of work we do at Math Academy, where we seriously optimize every step of the learning process. I’m not doing that with calisthenics. I would be a lot better at it if I actually put in the work to optimize it, but it’s beyond the scope of what I want calisthenics to be.
How would you go about completing a project when your base level knowledge is low? Get enough knowledge to make a dent in it.
I would start trying to do the project and figure out what’s missing. If I can’t do things XYZ, I need to learn the background knowledge to handle them and then fill in that background knowledge.
However, there are other projects I wouldn’t do because I have very low foundational background knowledge. For instance, front-end development. I don’t know much about it. If I were tasked with creating a UI for a system, I’d probably just go through a more scaffolded course or project to learn the basics of front-end development.
The backfilling of prerequisites depends on having enough foundational knowledge so you’re not backfilling too far. If you lack a lot of foundational knowledge, things get fuzzier. You may end up missing prerequisites that would make your life easier without even realizing it. If you’re missing a ton of foundational knowledge, you can go through a comprehensive course or scaffolded project to fill in those gaps before returning to your original project.
What do you aim for, focusing on the foundation or trying to build the V1? I would aim to focus on the foundation. You need enough foundational knowledge to get you closer to a position where you can fill in the leaves later.
In machine learning, for example, if you don’t know algebra, that’s a problem. You won’t be able to successfully take a machine learning paper off the shelf and understand it. The paper may reference linear algebra, calculus, probability, statistics, and ultimately algebra. Everything in machine learning depends on algebra. Even if the paper doesn’t use quadratic equations, algebra is baked into the prerequisites somewhere you don’t see.
You can get a surface level understanding of these high-level things, but when you need to have a concrete mental picture of them, all those low-level skills come into play. It’s often the difference between someone who has the foundational knowledge and someone who doesn’t. Someone with the foundation will look at the paper and say, “Oh, obviously you do blah, blah, blah,” because they understand the underlying math. Meanwhile, someone without that knowledge might think the other person is a genius, but it’s often that they have the foundational knowledge.
I would start with the foundation and then fill in the leaves later. That said, it’s possible to spend too much time building up the foundation. It’s also easy to overestimate how much foundation you need before jumping into a project. I think that’s another failure mode.
I’m not saying you should go get a PhD in math before doing machine learning. That would be an overcorrection in the opposite direction. For machine learning, you mainly need linear algebra, probability, statistics, and calculus. Once you have that, you can take a classical machine learning course and then dive into the latest methods.
Some people pick up a 1,000-plus page PDF with abstract algebra and think they need to learn all of it before diving into machine learning. I don’t think that’s necessary. There’s a middle ground. You need a solid foundation, but it’s easy to overestimate how much foundation you need.
One way to figure out how much foundation is enough is to look online. If you search “How much math do I need to know for machine learning?” and check several articles or Reddit posts, you’ll get a good sense of the basics—linear algebra, calculus, probability, and statistics. Once you know that, you’re ready to take a machine learning course.
That’s a general strategy. Whatever you want to learn, whatever your goal is, just go online and check a number of different articles from various sources. Average the results to get a lay of the land.
It can also help to identify the goal you’re working toward. If it’s learning a specific machine learning paper, for example, learn some foundational math and see how well suited you are to taking another stab at the paper. If you can read it, or at least keep up with it without feeling overwhelmed, then you’re probably at a stage where you have enough foundation.
If the paper still feels overwhelming, identify which math is being used that you don’t understand, and go search for a more comprehensive course that covers that material. Go through it, then come back to the paper. Rinse and repeat—see what else you’re missing.
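The backfilling loop described above can be pictured as a walk over a prerequisite graph. The sketch below is a simplified assumption: the graph `PREREQUISITES` is a coarse, made-up version of the dependencies mentioned earlier, and `learning_order` is a hypothetical helper that lists what is still missing, prerequisites first.

```python
# Simplified, made-up prerequisite graph: topic -> topics it depends on.
PREREQUISITES = {
    "machine learning": ["linear algebra", "calculus", "probability and statistics"],
    "linear algebra": ["algebra"],
    "calculus": ["algebra"],
    "probability and statistics": ["algebra"],
    "algebra": [],
}


def learning_order(goal, known):
    """List the topics still missing for `goal`, with prerequisites before dependents."""
    order = []
    seen = set(known)

    def visit(topic):
        if topic in seen:
            return
        seen.add(topic)
        for prereq in PREREQUISITES.get(topic, []):
            visit(prereq)
        order.append(topic)

    visit(goal)
    return order


plan = learning_order("machine learning", known={"algebra", "calculus"})
print(plan)
```

With algebra and calculus already in place, the plan is short. With nothing in place, the same call returns the whole chain starting from algebra, which is the situation where a comprehensive course makes more sense than backfilling piece by piece.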
Outro
I think that’s about it. Those are all the questions for this week. When I post this video, if there are other questions or follow-up questions, or if I didn’t address some part of a question fully, you can put them as comments on the video post. I’ll do another Q&A in a week or two.
Prompt
The following prompt was used to generate this transcript.
You are a grammar cleaner. All you do is clean grammar, remove single filler words such as “yeah” and “like” and “so”, remove any phrases that are repeated consecutively verbatim, and make short paragraphs separated by empty lines. Do not change any word choice, or leave any information out. Do not summarize. Please clean the attached document and deliver it to me one section at a time. Again, do not summarize. It should be almost exactly verbatim.
After each section:
Next. Remember: You are a grammar cleaner. All you do is clean grammar, remove single filler words such as “yeah” and “like” and “so”, remove any phrases that are repeated consecutively verbatim, and make short paragraphs separated by empty lines. Do not change any word choice, or leave any information out. Do not summarize. It should be almost exactly verbatim.