It’s Memorization All The Way Down
At the end of the day all learning is memorization. Read more...
Appreciation of mathematical beauty gets held up on too high a pedestal as the “correct” source of motivation in math learning. Read more...
And the problem with many existing times tables practice systems. Read more...
Myth 1: Understanding amounts to something other than memory. Myth 2: Students can perform high-level skills without mastering low-level component skills. Read more...
Coding tutorials typically just say “import this function then run it,” and the math tutorials typically just say “this is the form of the model, you can fit it using the usual techniques” and leave it to the reader to figure out the rest. Read more...
If you can scaffold the content so well that it creates a smooth, efficient learning experience for knucklehead kids, it’s going to feel even smoother for more conscientious adults. Read more...
Specific areas of friction that cause students to struggle with math. What needs to be done to remove friction from the learning process. Why friction remains so prevalent. Read more...
At the end of the day, whether or not they know math comes down to whether or not they can apply techniques within that well-defined body of knowledge to solve problems within that well-defined body of knowledge. Read more...
Enter grades early on, and (if pre-college) email parents early on. Read more...
If you go directly to the most abstract ideas then you’re basically like a kid who reads a book of famous quotes about life and thinks they understand everything about life by way of those quotes. Read more...
… an infinitely tall ladder where the rungs get spaced further and further apart the higher you climb. Read more...
… is reducing friction in the learning process. Read more...
Depending on your goals, either A) methods of proof, or B) linear algebra followed by probability & statistics. Read more...
It’s helpful to loosely understand what something means before memorizing it, but this does not have to be a rigorous derivation. Read more...
It’s really just “loading” the info into temporary storage – like picking up a weight off the rack, whereas learning is increasing your ability to lift said weight. Read more...
1) Confusing “conceptually simple” with “notationally compact”, and 2) jumping to the most general method right away. Read more...
The 3 types of problems that I would have students work out back when I was teaching ML. Read more...
When an algorithm or process feels magical, that’s typically an indication you don’t really understand what’s happening under the hood. Read more...
There are many studies demonstrating a benefit of some component of deliberate practice, but these studies often get mislabeled or misinterpreted as demonstrating the full benefit of true deliberate practice. The field of education is particularly susceptible to this issue because it is impossible for a teacher with a classroom of students to provide a true deliberate practice experience without assistive technology that perfectly emulates the one-on-one pedagogical decisions that an expert tutor would make for each individual student. Read more...
I was coming in with the mindset of “we need to cover the superset of all the content covered in the major textbooks,” which we’re able to do quite well for traditional math. For ML, the rule will have to be amended to “we need to cover the superset of all the content covered in standard university course syllabi.” Read more...
A little rhyme to understand the big picture of top-down vs bottom-up learning, particularly in the context of machine learning (ML). Read more...
Pictures can help build mathematical intuition, but sometimes learners think they should fully visualize every single problem they solve, which actually handicaps their thinking. Math involves generalizing patterns in logically consistent ways, and the generalizations eventually go beyond what you can fully picture in your head. Read more...
When students do the mathematical equivalent of playing kickball during class, and then are expected to do the mathematical equivalent of a backflip at the end of the year, it’s easy to see how struggle and general negative feelings can arise. Read more...
1) Don’t use projects as a way to acquire fundamental skills. 2) Make sure the projects are guided. 3) Don’t let the projects cut too much into your foundational skill-building. Read more...
The habit is a psychological force field that protects you from all sorts of negative feelings that try to dissuade you from training. Read more...
You get to provide value that nobody else can, and you get recognized for it. Read more...
If you try to keep information close by taking great notes that you can reference all the time… that just PREVENTS you from truly retaining it. Read more...
every individual student is actively engaged on every piece of material to be learned. Read more...
If any student, anywhere, is looking for advice on how to prepare for a standardized math test, then this is everything I’d tell them. Read more...
It’s the tragedy of the commons. Read more...
A comment to page 165 of Jo Boaler’s new book Math-ish Read more...
1) Foundational math. 2) Classical machine learning. 3) Deep learning. 4) Cutting-edge machine learning. Read more...
You gotta develop automaticity on low-level skills in order to free up mental resources for higher-level thinking! Read more...
… is to not overwhelm them. In my experience, students naturally enjoy math when it doesn’t feel overwhelmingly difficult to learn. Read more...
It can be helpful to take a top-down approach in planning out your overarching learning goals, but the learning itself has to occur bottom-up. Read more...
Effective explicit instruction is all about clarity, and breaking down information, and minimizing the load on working memory. Read more...
1) The information must have already been written to memory. 2) The information must be retrieved from memory, unassisted. Read more...
I think optimal motivation requires a balance of both intrinsic and extrinsic factors. Read more...
Nobody who knows the science of learning is actually debating this. Read more...
The amount of practice should be determined on the basis of each student’s individual performance on each individual topic. Some students may end up having to do more work, but this ultimately empowers them to learn and continue learning into the future. Read more...
There is an asymmetric tradeoff between 1) blowing your working memory capacity and leaving yourself unable to make progress, versus 2) wasting a couple extra seconds writing down a bit more work than you need to. When in doubt, write it out. Read more...
I can think of 4 possible sources. Read more...
With the science of learning, it’s less about “keeping up” with what’s happening, and more about “catching up” with what’s already happened. Read more...
Most people can tell when their practice is too easy, but what about when the tasks are too hard? That’s often less obvious. Read more...
Accumulating mathematical knowledge gaps can lead students to reach a tipping point where further learning becomes overwhelming, ultimately causing them to abandon math entirely. Read more...
The only way to argue against the existence of learning loss and grade inflation is to argue against the very idea of measuring learning objectively (i.e., radical constructivism). Read more...
You haven’t learned unless you’re able to consistently reproduce the information you consumed and use it to solve problems. Read more...
The hard truth is that if you want to build a serious educational product, you can’t be afraid to charge money for it. You can’t back yourself into a corner where you depend on a massive userbase. Why? Because most people are not serious about learning, and if you depend on a massive base of unserious learners, then you have to employ ineffective learning strategies that do not repel unserious students. Which makes your product suck. Read more...
The underlying principle that it all boils down to is deliberate practice. Read more...
When students are not given the opportunity to learn math seriously, and are instead presented with watered-down courses and told that they’re doing a great job, they’re being set up for failure later in life when it matters most. Read more...
Math gets hard for different students at different levels. If you don’t have worked examples to help carry you through once math becomes hard for you, then every problem basically blows up into a “research project” for you. Sometimes people advocate for unguided struggle as a way to improve general problem-solving ability, but this idea lacks empirical support. Worked examples won’t prevent you from developing deep understanding (actually, it’s the opposite: worked examples can help you quickly layer on more skills, which forces a structural integrity in the lower levels of your knowledge). Even if you decide against using worked examples for now, continually re-evaluate to make sure you’re getting enough productive training volume. Read more...
Research mathematicians are like professional athletes. Read more...
First, you need extensive and solid content knowledge. Then, you need to work through tons of practice exams for the specific exam you’re taking. This might sound simple, but every year, countless people manage to screw it up. Read more...
“…[D]eliberate practice requires effort and is not inherently enjoyable. Individuals are motivated to practice because practice improves performance.” Read more...
Long-term learning is represented by the creation of strategic electrical wiring between neurons. Read more...
Learning math with little computation is like learning basketball with little practice on dribbling & ball handling techniques. Read more...
Research indicates the best way to improve your problem-solving ability in any domain is simply by acquiring more foundational skills in that domain. The way you increase your ability to make mental leaps is not actually by jumping farther, but rather, by building bridges that reduce the distance you need to jump. Yet, higher math textbooks & courses seem to focus on trying to train jumping distance instead of bridge-building. Read more...
I learned from those kinds of resources myself, and while I came a long way, for the amount of effort I put into learning, I could have gone a lot further if my time were used more efficiently. That’s the problem that Math Academy solves. Read more...
Challenge problems are not a good use of time until you’ve developed the foundational skills that are necessary to grapple with these problems in a productive and timely fashion. Read more...
If you start to flail (or, more subtly, doubt yourself and lose interest) after jumping into ML without a baseline level of foundational knowledge, then you need to put your ego aside and re-allocate your time into shoring up your foundations. Read more...
If you understand the interplay between working memory and long-term memory, then you can actually derive – from first principles – the methods of effective teaching. Read more...
Hard-coding explanations feels tedious, takes a lot of work, and isn’t “sexy” like an AI that generates responses from scratch – but at least it’s not a pipe dream. It’s a practical solution that lets you move on to other components of the AI that are just as important. Read more...
An idea for a paper that I don’t currently have the bandwidth to write. Read more...
If all the knowledge you show up with is high school math and AP Calculus, and you’re not a genius, then you’re going to get your ass handed to you. Read more...
When you’re developing skills at peak efficiency, you are maximizing the difficulty of your training tasks subject to the constraint that you end up successfully overcoming those difficulties in a timely manner. Read more...
It’s the act of successfully retrieving fuzzy memory, not clear memory, that extends the memory duration. Read more...
To transfer information into long-term memory, you need to practice retrieving it without assistance. Read more...
There’s a cognitive principle behind this: associative interference, the phenomenon that conceptually related pieces of knowledge can interfere with each other’s recall. Read more...
Learning is the incremental gain in your ability to perform a tangible, reproducible skill. Read more...
Sure, accelerating via self-study is not as optimal as accelerating within teacher-managed courses, but it’s way better than not accelerating at all. Read more...
It’s actually the opposite – to get students actively retrieving information from memory, while minimizing their cognitive load. Read more...
There are numerous cognitive learning strategies that 1) can be used to massively improve learning, 2) have been reproduced so many times they might as well be laws of physics, and 3) connect all the way down to the mechanics of what’s going on in the brain. Read more...
Solving equations feels smooth when basic arithmetic is automatic – it’s like moving puzzle pieces around, and you just need to identify how they fit together. But without automaticity on basic arithmetic, each puzzle piece is a heavy weight. You struggle to move them at all, much less figure out where they’re supposed to go. Read more...
It highlights the aversion that people have to doing hard things. People will do unbelievable mental gymnastics to convince themselves that doing an easy, enjoyable thing that is unrelated to their supposed goal somehow moves the needle more than doing a hard, unpleasant thing that is directly related to said goal. Read more...
In general, when you feel yourself running up against a ceiling in life, the solution is typically to pivot into a direction where the ceiling is higher. Read more...
But in talent development, the optimization problem is clear: an individual’s performance is to be maximized, so the methods used during practice are those that most efficiently convert effort into performance improvements. Read more...
No matter what skill is being trained, improving performance is always an effortful process. Read more...
By periodically revisiting content, a spiral curriculum periodically restores forgotten knowledge and leverages the spacing effect to slow the decay of that knowledge. Spaced repetition takes this line of thought to its fullest extent by fully optimizing the review process. Read more...
The strongest people lift weights heavy enough to make them feel weak. Read more...
While there is plenty of room for teachers to make better use of cognitive learning strategies in the classroom, teachers are victims of circumstance in a profession lacking effective accountability and incentive structures, and the end result is that students continue to receive mediocre educational experiences. Given a sufficient degree of accountability and incentives, there is no law of physics preventing a teacher from putting forth the work needed to deliver an optimal learning experience to a single student. However, in the absence of technology, it is impossible for a single human teacher to deliver an optimal learning experience to a classroom of many students with heterogeneous knowledge profiles, each of whom needs to work on different types of problems and receive immediate feedback on each of their attempts. This is why technology is necessary. Read more...
Gamification, integrating game-like elements into learning environments, proves effective in increasing student learning, engagement, and enjoyment. Read more...
The testing effect (or the retrieval practice effect) emphasizes that recalling information from memory, rather than repeated reading, enhances learning. It can be combined with spaced repetition to produce an even more potent learning technique known as spaced retrieval practice. Read more...
Interleaving (or mixed practice) involves spreading minimal effective doses of practice across various skills, in contrast to blocked practice, which involves extensive consecutive repetition of a single skill. Blocked practice can give a false sense of mastery and fluency because it allows students to settle into a robotic rhythm of mindlessly applying one type of solution to one type of problem. Interleaving, on the other hand, creates a “desirable difficulty” that promotes vastly superior retention and generalization, making it a more effective review strategy. But despite its proven efficacy, interleaving faces resistance in classrooms due to a preference for practice that feels easier and appears to produce immediate performance gains, even if those performance gains quickly vanish afterwards and do not carry over to test performance. Read more...
When reviews are spaced out or distributed over multiple sessions (as opposed to being crammed or massed into a single session), memory is not only restored, but also further consolidated into long-term storage, which slows its decay. This is known as the spacing effect. A profound consequence of the spacing effect is that the more reviews are completed (with appropriate spacing), the longer the memory will be retained, and the longer one can wait until the next review is needed. This observation gives rise to a systematic method for reviewing previously-learned material called spaced repetition (or distributed practice). A repetition is a successful review at the appropriate time. Read more...
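To make the "longer wait after each successful review" idea concrete, here is a minimal sketch of a review scheduler. The doubling growth factor, the one-day starting interval, and the reset-to-one-day rule after a failed review are all illustrative assumptions, not values taken from the post or from any particular spaced repetition system.

```python
from datetime import date, timedelta


def next_interval(current_days: int, passed: bool, growth: float = 2.0) -> int:
    """Days to wait before the next review.

    Assumption: a successful review multiplies the interval by `growth`,
    and a failed review resets it to one day.
    """
    if not passed:
        return 1
    return max(1, round(current_days * growth))


def schedule(start: date, outcomes: list[bool]) -> list[date]:
    """Return the date of each review, given whether each review succeeded."""
    interval = 1
    due = start + timedelta(days=interval)
    review_dates = []
    for passed in outcomes:
        review_dates.append(due)
        interval = next_interval(interval, passed)
        due = due + timedelta(days=interval)
    return review_dates


if __name__ == "__main__":
    for d in schedule(date(2024, 1, 1), [True, True, True, False, True]):
        print(d.isoformat())
```

With three successful reviews in a row the gaps grow from 1 to 2 to 4 to 8 days; the failed fourth review drops the gap back to a single day.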
Layering is the act of continually building on top of existing knowledge – that is, continually acquiring new knowledge that exercises prerequisite or component knowledge. This causes existing knowledge to become more ingrained, organized, and deeply understood, thereby increasing the structural integrity of a student’s knowledge base and making it easier to assimilate new knowledge. Read more...
Associative interference occurs when related knowledge interferes with recall. It is more likely to occur when highly related pieces of knowledge are learned simultaneously or in close succession. However, the effects of interference can be mitigated by teaching dissimilar concepts simultaneously and spacing out related pieces of knowledge over time. Read more...
Automaticity is the ability to perform low-level skills without conscious effort. Analogous to a basketball player effortlessly dribbling while strategizing, automaticity allows individuals to avoid spending limited cognitive resources on low-level tasks and instead devote those cognitive resources to higher-order reasoning. In this way, automaticity is the gateway to expertise, creativity, and general academic success. However, insufficient automaticity, particularly in basic skills, inflates the cognitive load of tasks, making it exceedingly difficult for students to learn and perform. Read more...
Different students have different working memory capacities. When the cognitive load of a learning task exceeds a student’s working memory capacity, the student experiences cognitive overload and is not able to complete the task. Read more...
Mastery learning is a strategy in which students demonstrate proficiency on prerequisites before advancing. While even loose approximations of mastery learning have been shown to produce massive gains in student learning, mastery learning faces limited adoption due to clashing with traditional teaching methods and placing increased demands on educators. True mastery learning at a fully granular level requires fully individualized instruction and is only attainable through one-on-one tutoring. Read more...
Deliberate practice is the most effective form of active learning. It consists of individualized training activities specially chosen to improve specific aspects of a student’s performance through repetition and successive refinement. It is the opposite of mindless repetition. The amount of deliberate practice has been shown to be one of the most prominent underlying factors responsible for individual differences in performance across numerous fields, even among highly talented elite performers. Deliberate practice demands effort and intensity, and may be uncomfortable, but a long-term commitment to it compounds incremental improvements, leading to expertise. Read more...
Active learning leads to more neural activation than passive learning. Automaticity involves developing strategic neural connections that reduce the amount of effort that the brain has to expend to activate patterns of neurons. Read more...
During practice, the elite skaters were over 6 times more active than passive, while non-competitive skaters were nearly as passive as they were active. Read more...
A startup spent months building a sophisticated lecture tool and raising over half a million dollars in investments – but after observing students in the lecture hall, they completely abandoned the product and called up their investors to return the money. Read more...
True active learning requires every individual student to be actively engaged on every piece of the material to be learned. Read more...
Six weeks of pure review and six official practice exams. Read more...
It’s easier to run into roadblocks, but also easier to maintain what you’ve learned. Read more...
Passive consumption. Lack of depth. Lack of rigorous assessments. Failing upwards. Lack of skill development. Read more...
It’s like going to the gym without a solid workout plan in place. Read more...
If you know your single-variable calculus, then it’s about 70 hours on Math Academy. Read more...
Not everybody can learn every level of math, but most people can learn the basics. In practice, however, few people actually reach their full mathematical potential because they get knocked off course early on by factors such as missing foundations, ineffective practice habits, inability or unwillingness to engage in additional practice, or lack of motivation. Read more...
Learning math early guards you against numerous academic risks and opens all kinds of doors to career opportunities. Read more...
Effortful processes like testing, repetition, and computation are essential parts of effective learning, and competition is often helpful. Read more...
The most effective learning techniques require substantial cognitive effort from students and typically do not emulate what experts do in the professional workplace. Direct instruction is necessary to maximize student learning, whereas unguided instruction and group projects are typically very inefficient. Read more...
Different people generally have different working memory capacities and learn at different rates, but people do not actually learn better in their preferred “learning style.” Instead, different people need the same form of practice but in different amounts. Read more...
Students and teachers are often not aligned with the goal of maximizing learning, which means that in the absence of accountability and incentives, classrooms are pulled towards a state of mediocrity. Accountability and incentives are typically absent in education, which leads to a “tragedy of the commons” situation where students pass courses (often with high grades) despite severely lacking knowledge of the content. Read more...
In terms of improving educational outcomes, science is not where the bottleneck is. The bottleneck is in practice. The science of learning has advanced significantly over the past century, yet the practice of education has barely changed. Read more...
Cognition involves the flow of information through sensory, working, and long-term memory banks in the brain. Sensory memory temporarily holds raw data, working memory manipulates and organizes information, and long-term memory stores it indefinitely by creating strategic electrical wiring between neurons. Learning amounts to increasing the quantity, depth, retrievability, and generalizability of concepts and skills in a student’s long-term memory. Limited working memory capacity creates a bottleneck in the transfer of information into long-term memory, but cognitive learning strategies can be used to mitigate the effects of this bottleneck. Read more...
Talent development is not only different from schooling, but in many cases completely orthogonal to schooling. Read more...
The average tutored student performed better than 98% of students in the traditional class. Read more...
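The 98% figure is what you get from a two-standard-deviation improvement, which is the effect size commonly cited from Bloom's tutoring study. The excerpt itself does not state the effect size, so treat the value of 2 below, and the assumption of normally distributed scores, as assumptions. A quick check:

```python
from statistics import NormalDist


def fraction_outperformed(effect_size_sd: float) -> float:
    """Fraction of the comparison class scoring below a student who sits
    `effect_size_sd` standard deviations above the class mean,
    assuming scores are normally distributed."""
    return NormalDist().cdf(effect_size_sd)


if __name__ == "__main__":
    print(f"{fraction_outperformed(2.0):.1%}")
```

Under those assumptions the result rounds to 98%.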
Why it’s common for students to pass courses despite severely lacking knowledge of the content. Read more...
Spaced repetition is complicated in hierarchical bodies of knowledge, like mathematics, because repetitions on advanced topics should “trickle down” to update the repetition schedules of simpler topics that are implicitly practiced (while being discounted appropriately since these repetitions are often too early to count for full credit towards the next repetition). However, I developed a model of Fractional Implicit Repetition (FIRe) that not only accounts for implicit “trickle-down” repetitions but also minimizes the number of reviews by choosing reviews whose implicit repetitions “knock out” other due reviews (like dominos), and calibrates the speed of the spaced repetition process to each individual student on each individual topic (student ability and topic difficulty are competing factors). Read more...
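The "trickle-down" idea can be illustrated with a toy sketch. This is not the FIRe model itself: the topic names, the `encompassed` graph, the 0.5 discount fractions, and the rule of keeping the largest credit when several paths reach the same topic are all hypothetical choices made for illustration.

```python
def implicit_credit(topic, encompassed, credit=1.0, out=None):
    """Spread repetition credit from `topic` down to the simpler topics it exercises.

    `encompassed` maps a topic to a list of (simpler_topic, fraction) pairs,
    where `fraction` (hypothetical) discounts the credit passed downward.
    If a topic is reachable by several paths, the largest credit is kept.
    """
    if out is None:
        out = {}
    if credit <= out.get(topic, 0.0):
        return out  # an equal or larger credit was already recorded
    out[topic] = credit
    for simpler, fraction in encompassed.get(topic, []):
        implicit_credit(simpler, encompassed, credit * fraction, out)
    return out


if __name__ == "__main__":
    # Hypothetical three-topic hierarchy.
    graph = {
        "calculus": [("algebra", 0.5)],
        "algebra": [("arithmetic", 0.5)],
    }
    print(implicit_credit("calculus", graph))
```

In this toy hierarchy, one full repetition on the advanced topic counts as half a repetition on its prerequisite and a quarter on the topic below that.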
Each decomposition produces a system of linear equations where the number of unknowns equals the number of equations. Read more...
Write code that makes complicated decisions, often involving some kind of inference. Read more...
Around 50-60 XP/day, that is, 50-60 minutes of serious practice per day. Just like the high-end amount of daily exercise you’d expect from people who keep a consistent exercise routine at the gym. Read more...
A silly bug turned genius hack. Read more...
834 XP = 834 minutes = 14 hours of work in a single day. You’re probably wondering, what kind of person does that much math in a day? Time for a little story. Read more...
There are many, many studies that measure variation in WMC vs variation in other metrics. Read more...
Our AI system is one of those things that sounds intuitive enough at a high level, but if you start trying to implement it yourself, you quickly run into a mountain of complexity, numerous edge cases, and lots of counterintuitive low-level phenomena that take a while to fully wrap your head around. Read more...
Perform the desired transformation on the identity matrix to get a left-multiplier, and maybe transpose the output. Read more...
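A small sketch of the row-operation case, using plain nested lists so it runs without any libraries. The helper names are made up for illustration, and the sketch only covers row operations; it does not try to interpret the "maybe transpose" remark.

```python
def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Matrix product a @ b for nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def swap_rows(m, i, j):
    """Return a copy of m with rows i and j swapped."""
    out = [row[:] for row in m]
    out[i], out[j] = out[j], out[i]
    return out


def add_multiple_of_row(m, src, dst, factor):
    """Return a copy of m with factor * row[src] added to row[dst]."""
    out = [row[:] for row in m]
    out[dst] = [d + factor * s for d, s in zip(out[dst], out[src])]
    return out


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

    # Apply the operation to the identity to get the left-multiplier E.
    E_swap = swap_rows(identity(3), 0, 2)
    E_add = add_multiple_of_row(identity(3), 0, 1, 2)

    # Left-multiplying by E performs the same operation on A.
    print(matmul(E_swap, A) == swap_rows(A, 0, 2))
    print(matmul(E_add, A) == add_multiple_of_row(A, 0, 1, 2))
```

Both comparisons come out true: whatever row operation you apply to the identity, multiplying on the left by the result applies that same operation to any matrix of matching size.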
First, fun and exciting playtime. Then, intense and strenuous skill development. Finally, developing one’s individual style while pushing the boundaries of the field. Read more...
Loosely inspired by the German tank problem: several witnesses reported seeing a UFO during the given time intervals, and you want to quantify your certainty regarding when the UFO arrived and when it left. Read more...
There’s only so much fun you can have trying to follow another person’s footsteps to arrive at a known solution. There’s only so much confidence you can build from fighting against a problem that someone else has intentionally set up to be well-posed and elegantly solvable if you think about it the right way. Read more...
The network becomes book-smart in a particular area but not street-smart in general. The training procedure is like a series of exams on material within a tiny subject area (your data subspace). The network refines its knowledge in the subject area to maximize its performance on those exams, but it doesn’t refine its knowledge outside that subject area. And that leaves it gullible to adversarial examples using inputs outside the subject area. Read more...
Imitating without analyzing produces a robot / ape who can’t think critically; analyzing without imitating produces a critic who can’t act on their own advice. Read more...
Initial parameter range, data sampling range, severity of regularization. Read more...
… is to present a problem where known simpler techniques fail. Read more...
My training has been scattered and fuzzy until recently. Here’s the whole story. Read more...
An oval () fits inside a rectangle [ ] with the same width and height. Read more...
Many students who pattern-match will tend to prefer solutions requiring fewer and simpler operations, especially if those solutions yield ballpark-reasonable results. Read more...
Is there a standard “order of operations” for parallel vs nested absolute value expressions, in the absence of clarifying notation? Read more...
Q: Draw a 10 x 10 square grid. How many squares are there in total? Not just 1 x 1 squares, but also 2 x 2 squares, 3 x 3 squares, and so on. A: The total number of square shapes is the total sum of square numbers 1 + 4 + 9 + 16 + … + 100. Read more...
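The sum in the answer above can be checked with a short sketch. The function name `count_squares` is made up for illustration; the only idea used is that a k x k square can sit in (n - k + 1)^2 positions inside an n x n grid.

```python
def count_squares(n: int) -> int:
    """Count every axis-aligned k x k square inside an n x n grid (hypothetical helper)."""
    # A k x k square has (n - k + 1) possible positions along each axis.
    return sum((n - k + 1) ** 2 for k in range(1, n + 1))


print(count_squares(10))  # 1 + 4 + 9 + ... + 100 = 385
```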
First, you want to form a habit. Second, you want to operate at peak productivity during your session. Third, you want to minimize the amount you forget between sessions. Read more...
Answer: It’s not very useful (not in practice, not in theory). Read more...
For many (but not all) students, the answer is yes. And for many of those students, automation can unlock life-changing educational outcomes. Read more...
As you climb the levels of math, sources of educational friction conspire against you and eventually throw you off the train. And one of the first warning signs is when you stop understanding things at the core, and instead try to memorize special cases cookbook-style. Read more...
In general, you can manipulate total derivatives like fractions, but you can’t do the same with partial derivatives. Read more...
Drawing -> LaTeX commands -> ChatGPT summary -> Google for more info. Read more...
Type I pairs with the variable that runs vertically in the usual representation of the coordinate system. The remaining types are paired with the rest of the variables in ascending order. Read more...
Minor changes to increase workout intensity and caloric surplus. Read more...
Daily 20-30 minute bedroom workout with gymnastic rings hanging from pull-up bar – just as much challenge as weights, but inexpensive and easily portable. Read more...
Two subtypes of coders that I watched students grow into. Read more...
A way to visualize some cognitive learning strategies. Read more...
… are summarized in the following table. Read more...
An aha moment with object-oriented programming. Read more...
In 9 months, these students went from initially not knowing how to write helper functions to building a machine learning library from scratch. Read more...
How to avoid some of the most common pitfalls leading to ugly LaTeX. Read more...
The behavior of a multivariable function can be highly specific to the path taken. Read more...
Every triangle inscribed in a circle with one side as a diameter is a right triangle. Read more...
A simple mnemonic trick for quickly differentiating complicated functions. Read more...
A prototype web app to automatically assist students in self-correcting small errors and minor misconceptions. Read more...
A walkthrough of solving Tower of Hanoi using the approach of one of the earliest AI systems. Read more...
In a simplified problem framing, we investigate the (game-theoretical) usefulness of limiting the number of social connections per person. Read more...
Category theory provides a language for explicitly describing indirect relationships in graphs. Read more...
Framing complex systems in the language of category theory. Read more...
The main ideas behind computers can be understood by anyone. Read more...
The brain is a neuronal network integrating specialized subsystems that use local competition and thresholding to sparsify input, spike-timing dependent plasticity to learn inference, and layering to implement hierarchical predictive learning. Read more...
We solve a special case of how to periodically stimulate a biological neural network to obtain a desired connectivity (in theory). Read more...
Montaigne’s education, strictly dictated by his parents and university studies, resulted in an isolative work with scholarly impact but limited public reach. Conversely, Benjamin Franklin’s goal-oriented self-teaching led to influential creations and roles benefiting his community and nation. Read more...
Implementation notes for STDP learning in a network of Hodgkin-Huxley simulated neurons. Read more...
Many existing proofs are not accessible to young mathematicians or those without experience in the realm of dynamic systems. Read more...
A workbook I created to explain the math and physics behind an Iron Man suit to a student who was interested in the comics / movies. Read more...
A workbook I created to explain the math and physics behind an egg drop experiment to a student who was interested in Lord of the Rings and Star Wars. Read more...
And a proof via double induction. Read more...
A brief overview of sound waves and how they interact with things. Read more...
A brief overview of the experimental search for dark matter (XENON, CDMS, PICASSO, COUPP). Read more...
Mass discrepancies in galaxies and clusters, cosmic background radiation, the structure of the universe, and big bang nucleosynthesis’s impact on baryon density. Read more...
Start out with a volume of work that’s small enough that you don’t dread doing it again the next day. Read more...
1) Learn SQL and how to use a debugger. 2) Never come up empty-handed, even if you don’t fix the bug. Read more...
You can be the most committed and capable workhorse on the planet, but if you’re on the wrong team, the only thing you’ll change is your team’s allocation of work. Read more...
One main focus, one semi-focus, and everything else a hobby with whatever time you have left over. Read more...
It can help to zoom out and look at your progress on a longer timescale. Read more...
1) Difficulty grappling with complexity when it grows so big that you can’t fit everything in your head. 2) Lack of understanding or willingness to accept practical constraints of the problem and incorporate them into the solution. 3) Getting distracted by low-ROI features/details. 4) Being unwilling to do “tedious” work. Read more...
It’s helpful to loosely understand what something means before memorizing it, but this does not have to be a rigorous derivation. Read more...
It’s really just “loading” the info into temporary storage – like picking up a weight off the rack, whereas learning is increasing your ability to lift said weight. Read more...
1) Confusing “conceptually simple” with “notationally compact”, and 2) jumping to the most general method right away. Read more...
If you don’t love it, you’ll never be able to keep up with the same volume of effective practice as someone who does have that love. You’ll never outwork them. Read more...
An easy trick to improve your retention while working through a bank of review or challenge problems like LeetCode, HackerRank, etc. Read more...
A little rhyme to understand the big picture of top-down vs bottom-up learning, particularly in the context of machine learning (ML). Read more...
At the end of the day you can either waste time debating your coach on the training regimen, or you can use that time to just put your head down and do some f*cking work. Read more...
Pictures can help build mathematical intuition, but sometimes learners think they should fully visualize every single problem they solve, which actually handicaps their thinking. Math involves generalizing patterns in logically consistent ways, and the generalizations eventually go beyond what you can fully picture in your head. Read more...
Making progress is all about putting pressure on a problem: applying the force of your skills to a specific problem area (pressure = force / area). Read more...
And why we refer to ourselves as still being “in beta.” Read more...
The need for automaticity on low-level skills is obvious to anyone with experience learning a sport or instrument. So why is there sometimes resistance in education? It makes sense if you think about what people usually find persuasive. Read more...
The whole idea is that you want the other person to raise the bar on competition and pass you up, so that you’re motivated to come right back and do the same to them. Read more...
Every time you put out a post, get feedback, make improvements, and carry those improvements forward into future posts, that’s essentially a “rep” of deliberate practice. Read more...
1) Don’t use projects as a way to acquire fundamental skills. 2) Make sure the projects are guided. 3) Don’t let the projects cut too much into your foundational skill-building. Read more...
Fun is a supplement, not a substitute, for deliberate practice. Read more...
The article presents two claims of deliberate practice that it argues against – but the first claim is a misattribution, and the second claim is not actually argued against. Read more...
Doesn’t “beyond the edge of one’s capabilities” mean that you can’t do it? How can you practice it if you can’t do it? Also, “performance-improving adjustments on every single repetition” is hard to understand in some realms of performance. For instance, does each step a runner takes involve feedback and improvement? Read more...
Even if students are working on exactly the right things, they need to be working exactly the right way to capture the most learning from their time spent working. Read more...
… every individual student is actively engaged on every piece of material to be learned. Read more...
And if you want to get the most out of your review, you need to engage in spaced, interleaved retrieval practice. Read more...
It’s the tragedy of the commons. Read more...
It can be helpful to take a top-down approach in planning out your overarching learning goals, but the learning itself has to occur bottom-up. Read more...
Bloom studied the training backgrounds of 120 world-class talented individuals across 6 talent domains: piano, sculpting, swimming, tennis, math, & neurology, and what he discovered was that talent development occurs through a similar general process, no matter what talent domain. In other words, there is a “formula” for developing talent – though executing it is a lot harder than simply understanding it. Read more...
Curiosity/interest motivates people to engage in deliberate practice, which is what builds ability. Read more...
Effective explicit instruction is all about clarity, and breaking down information, and minimizing the load on working memory. Read more...
1) The information must have already been written to memory. 2) The information must be retrieved from memory, unassisted. Read more...
I think optimal motivation requires a balance of both intrinsic and extrinsic factors. Read more...
The amount of practice should be determined on the basis of each student’s individual performance on each individual topic. Some students may end up having to do more work, but this ultimately empowers them to learn and continue learning into the future. Read more...
Here’s a trick to feel amazingly capable and confident: periodically look back at stuff you originally found challenging months ago. Read more...
When you’re knowledgeable/skilled enough to grapple with problems in a more directly applicable field, math gives you the superpower of being able to compress those problem representations into an abstract space where they’re easier to solve. Read more...
You haven’t learned unless you’re able to consistently reproduce the information you consumed and use it to solve problems. Read more...
When students are not given the opportunity to learn math seriously, and are instead presented with watered-down courses and told that they’re doing a great job, they’re being set up for failure later in life when it matters most. Read more...
Learning math with little computation is like learning basketball with little practice on dribbling & ball handling techniques. Read more...
… and they should be treated as such. Read more...
There’s a cognitive principle behind this: associative interference, the phenomenon that conceptually related pieces of knowledge can interfere with each other’s recall. Read more...
No matter what skill is being trained, improving performance is always an effortful process. Read more...
By periodically revisiting content, a spiral curriculum periodically restores forgotten knowledge and leverages the spacing effect to slow the decay of that knowledge. Spaced repetition takes this line of thought to its fullest extent by fully optimizing the review process. Read more...
The strongest people lift weights heavy enough to make them feel weak. Read more...
While there is plenty of room for teachers to make better use of cognitive learning strategies in the classroom, teachers are victims of circumstance in a profession lacking effective accountability and incentive structures, and the end result is that students continue to receive mediocre educational experiences. Given a sufficient degree of accountability and incentives, there is no law of physics preventing a teacher from putting forth the work needed to deliver an optimal learning experience to a single student. However, in the absence of technology, it is impossible for a single human teacher to deliver an optimal learning experience to a classroom of many students with heterogeneous knowledge profiles, each of whom needs to work on different types of problems and receive immediate feedback on each of their attempts. This is why technology is necessary. Read more...
Gamification, integrating game-like elements into learning environments, proves effective in increasing student learning, engagement, and enjoyment. Read more...
The testing effect (or the retrieval practice effect) emphasizes that recalling information from memory, rather than repeated reading, enhances learning. It can be combined with spaced repetition to produce an even more potent learning technique known as spaced retrieval practice. Read more...
Interleaving (or mixed practice) involves spreading minimal effective doses of practice across various skills, in contrast to blocked practice, which involves extensive consecutive repetition of a single skill. Blocked practice can give a false sense of mastery and fluency because it allows students to settle into a robotic rhythm of mindlessly applying one type of solution to one type of problem. Interleaving, on the other hand, creates a “desirable difficulty” that promotes vastly superior retention and generalization, making it a more effective review strategy. But despite its proven efficacy, interleaving faces resistance in classrooms due to a preference for practice that feels easier and appears to produce immediate performance gains, even if those performance gains quickly vanish afterwards and do not carry over to test performance. Read more...
When reviews are spaced out or distributed over multiple sessions (as opposed to being crammed or massed into a single session), memory is not only restored, but also further consolidated into long-term storage, which slows its decay. This is known as the spacing effect. A profound consequence of the spacing effect is that the more reviews are completed (with appropriate spacing), the longer the memory will be retained, and the longer one can wait until the next review is needed. This observation gives rise to a systematic method for reviewing previously-learned material called spaced repetition (or distributed practice). A repetition is a successful review at the appropriate time. Read more...
Layering is the act of continually building on top of existing knowledge – that is, continually acquiring new knowledge that exercises prerequisite or component knowledge. This causes existing knowledge to become more ingrained, organized, and deeply understood, thereby increasing the structural integrity of a student’s knowledge base and making it easier to assimilate new knowledge. Read more...
Associative interference occurs when related knowledge interferes with recall. It is more likely to occur when highly related pieces of knowledge are learned simultaneously or in close succession. However, the effects of interference can be mitigated by teaching dissimilar concepts simultaneously and spacing out related pieces of knowledge over time. Read more...
Automaticity is the ability to perform low-level skills without conscious effort. Analogous to a basketball player effortlessly dribbling while strategizing, automaticity allows individuals to avoid spending limited cognitive resources on low-level tasks and instead devote those cognitive resources to higher-order reasoning. In this way, automaticity is the gateway to expertise, creativity, and general academic success. However, insufficient automaticity, particularly in basic skills, inflates the cognitive load of tasks, making it exceedingly difficult for students to learn and perform. Read more...
Different students have different working memory capacities. When the cognitive load of a learning task exceeds a student’s working memory capacity, the student experiences cognitive overload and is not able to complete the task. Read more...
Mastery learning is a strategy in which students demonstrate proficiency on prerequisites before advancing. While even loose approximations of mastery learning have been shown to produce massive gains in student learning, mastery learning faces limited adoption due to clashing with traditional teaching methods and placing increased demands on educators. True mastery learning at a fully granular level requires fully individualized instruction and is only attainable through one-on-one tutoring. Read more...
Active learning leads to more neural activation than passive learning. Automaticity involves developing strategic neural connections that reduce the amount of effort that the brain has to expend to activate patterns of neurons. Read more...
During practice, the elite skaters were over 6 times more active than passive, while non-competitive skaters were nearly as passive as they were active. Read more...
True active learning requires every individual student to be actively engaged on every piece of the material to be learned. Read more...
It’s easier to run into roadblocks, but also easier to maintain what you’ve learned. Read more...
Passive consumption. Lack of depth. Lack of rigorous assessments. Failing upwards. Lack of skill development. Read more...
I’d start off with some introductory course that covers the very basics of coding in some language that is used by many professional programmers but where the syntax reads almost like plain English and lower-level details like memory management are abstracted away. Then, I’d jump right into building board games and strategic game-playing agents (so a human can play against the computer), starting with simple games (e.g. tic-tac-toe) and working upwards from there (maybe connect 4 next, then checkers, and so on). Read more...
Solving problems, building on top of what you’ve learned, reviewing what you’ve learned, and quality, quantity, and spacing of practice. Read more...
Effortful processes like testing, repetition, and computation are essential parts of effective learning, and competition is often helpful. Read more...
The most effective learning techniques require substantial cognitive effort from students and typically do not emulate what experts do in the professional workplace. Direct instruction is necessary to maximize student learning, whereas unguided instruction and group projects are typically very inefficient. Read more...
In terms of improving educational outcomes, science is not where the bottleneck is. The bottleneck is in practice. The science of learning has advanced significantly over the past century, yet the practice of education has barely changed. Read more...
Why it’s common for students to pass courses despite severely lacking knowledge of the content. Read more...
Won first place in a state-level competition by finding and exploiting a loophole in the points scoring logic. Read more...
While some may view Feynman-style pedagogy as supporting inclusive learning for all students across varying levels of ability, Feynman himself acknowledged that his methods only worked for the top 10% of his students. Read more...
The most important things I learned from competing in science fairs had nothing to do with physics or even academics. My main takeaways were actually related to business – in particular, sales and marketing. Read more...
In order to justify using a more complex model, the increase in performance has to be worth the cost of integrating and maintaining the complexity. Read more...
Good problem = intersection between your own interests/talents, the realm of what’s feasible, and the desires of the external world. Read more...
Stuff you don’t find in math textbooks. Read more...
Effective learning strategies sometimes go against our human instincts about conversation. Read more...
1) Foundational math. 2) Classical machine learning. 3) Deep learning. 4) Cutting-edge machine learning. Read more...
Write code that makes complicated decisions, often involving some kind of inference. Read more...
During its operation from 2020 to 2023, Eurisko was the most advanced high school math/CS sequence in the USA. It culminated in high school students doing masters/PhD-level coursework (reproducing academic research papers in artificial intelligence, building everything from scratch in Python). Read more...
Using convolutional layers to create an even better checkers player. Read more...
Extending Fogel’s tic-tac-toe player to the game of checkers. Read more...
Reimplementing the paper that laid the groundwork for Blondie24. Read more...
A method for training neural networks that works even when training feedback is sparse. Read more...
Combining game-specific human intelligence (heuristics) and generalizable artificial intelligence (minimax on a game tree). Read more...
Repeatedly choosing the action with the best worst-case scenario. Read more...
Building data structures that represent all the possible outcomes of a game. Read more...
A convenient technique for computing gradients in neural networks. Read more...
The deeper or more “hierarchical” a computational graph is, the more complex the model that it represents. Read more...
We can algorithmically build classifiers that use a sequence of nested “if-then” decision rules. Read more...
Computing spatial relationships between nodes when edges no longer represent unit distances. Read more...
Using traversals to understand spatial relationships between nodes in graphs. Read more...
Graphs show up all the time in computer science, so it’s important to know how to work with them. Read more...
A simple classification algorithm grounded in Bayesian probability. Read more...
One of the simplest classifiers. Read more...
In many real-life situations, there is more than one input variable that controls the output variable. Read more...
Gradient descent can help us avoid pitfalls that occur when fitting nonlinear models using the pseudoinverse. Read more...
Just because a model appears to match closely with points in the data set does not necessarily mean it is a good model. Read more...
Transforming nonlinear functions so that we can fit them using the pseudoinverse. Read more...
Exploring the most general class of functions that can be fit using the pseudoinverse. Read more...
Using matrix algebra to fit simple functions to data sets. Read more...
A technique for maximizing linear expressions subject to linear constraints. Read more...
Under the hood, dictionaries are hash tables. Read more...
Implementing a differential equations model that won the Nobel prize. Read more...
A simple differential equations model that we can plot using multivariable Euler estimation. Read more...
Arrays can be used to implement more than just matrices. We can also implement other mathematical procedures like Euler estimation. Read more...
One of the best ways to get practice with object-oriented programming is implementing games. Read more...
Guess some initial clusters in the data, and then repeatedly update the guesses to make the clusters more cohesive. Read more...
You can use the RREF algorithm to compute determinants much faster than with the recursive cofactor expansion method. Read more...
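A minimal sketch of the row-reduction idea follows. It is a simplification: rather than carrying the matrix all the way to RREF, it stops at upper-triangular form, flips the sign on each row swap, and multiplies the pivots. The function name `determinant` and the use of `Fraction` for exact arithmetic are choices made for this example.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant of a square matrix via row reduction (illustrative sketch)."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot means the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        # Clear the entries below the pivot; this does not change the determinant.
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


print(determinant([[1, 2], [3, 4]]))  # -2
```

Row reduction takes on the order of n^3 arithmetic operations, whereas recursive cofactor expansion grows factorially with n.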
We can use arrays to implement matrices and their associated mathematical operations. Read more...
Merge sort and quicksort are generally faster than selection, bubble, and insertion sort. And unlike counting sort, they are not susceptible to blowup in the amount of memory required. Read more...
Some of the simplest methods for sorting items in arrays. Read more...
Just like single-variable gradient descent, except that we replace the derivative with the gradient vector. Read more...
We take an initial guess as to what the minimum is, and then repeatedly use the gradient to nudge that guess further and further “downhill” into an actual minimum. Read more...
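The loop described above can be sketched in a few lines. The function name `gradient_descent`, the learning rate, the step count, and the example function f(x) = (x - 3)^2 are all assumptions chosen for illustration.

```python
def gradient_descent(df, x0, learning_rate=0.1, steps=100):
    """Repeatedly nudge the guess x against the derivative df (illustrative sketch)."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * df(x)  # step "downhill"
    return x


# Example: f(x) = (x - 3)^2 has derivative 2(x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)  # very close to 3
```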
Bisection search involves repeatedly moving one bound halfway to the other. The Newton-Raphson method involves repeatedly moving our guess to the root of the tangent line. Read more...
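Both root-finding ideas fit in a short sketch. The function names, the tolerance, and the fixed step count are assumptions for this example; the bisection version also assumes the function changes sign between the two starting bounds.

```python
def bisection(f, lo, hi, tol=1e-10):
    """Shrink [lo, hi] around a root by moving one bound to the midpoint each step."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid  # the sign change is in the left half
        else:
            lo = mid  # the sign change is in the right half
    return (lo + hi) / 2


def newton_raphson(f, df, x, steps=20):
    """Move the guess to the root of the tangent line at x, repeatedly."""
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x


def f(x):
    return x * x - 2  # positive root is sqrt(2)


def df(x):
    return 2 * x


print(bisection(f, 0.0, 2.0))
print(newton_raphson(f, df, 1.0))
```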
Backtracking can drastically cut down the number of possibilities that must be checked during brute force. Read more...
Brute force search involves trying every single possibility. Read more...
Implementing the Cartesian product provides good practice working with arrays. Read more...
How to sample from a discrete probability distribution. Read more...
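One common approach (not necessarily the one used in the post) is to draw a uniform random number and walk through the cumulative probabilities until it is exceeded. The name `sample_index` and the example distribution are made up for this sketch.

```python
import random


def sample_index(probabilities, rng=random):
    """Return index i with probability probabilities[i] (cumulative-sum sketch)."""
    r = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return index
    # Guard against floating-point rounding when the probabilities sum to just under 1.
    return len(probabilities) - 1


rng = random.Random(0)  # seeded so the run is repeatable
probabilities = [0.2, 0.5, 0.3]
trials = 100_000
counts = [0] * len(probabilities)
for _ in range(trials):
    counts[sample_index(probabilities, rng)] += 1
print([count / trials for count in counts])  # frequencies should be near 0.2, 0.5, 0.3
```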
Estimating probabilities by simulating a large number of random experiments. Read more...
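A minimal sketch of the idea, using the chance that two dice sum to 7 as the experiment. The function names and the choice of experiment are assumptions for illustration; the exact answer here is 1/6, which gives something to compare the estimate against.

```python
import random


def estimate_probability(event, trials, seed=0):
    """Estimate P(event) as the fraction of simulated trials in which it occurs."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = sum(1 for _ in range(trials) if event(rng))
    return hits / trials


def two_dice_sum_to_seven(rng):
    """One random experiment: roll two dice and check whether they sum to 7."""
    return rng.randint(1, 6) + rng.randint(1, 6) == 7


estimate = estimate_probability(two_dice_sum_to_seven, trials=100_000)
print(estimate, 1 / 6)  # the estimate should land near the exact value
```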
Sequences where each term is a function of the previous terms. Read more...
There are other number systems that use more or fewer than ten characters. Read more...
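As a small illustration, repeated division by the base yields the digits of a number in that base, from least to most significant. The helper name `to_base` and the digit alphabet are assumptions for this sketch.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base(n: int, base: int) -> str:
    """Write a non-negative integer n using the given base (2 through 36)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)  # peel off the lowest digit
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


print(to_base(10, 2))    # 1010
print(to_base(255, 16))  # ff
```

Python's built-in `int("1010", 2)` goes the other way, from a digit string back to an integer.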
It’s assumed that you’ve had some basic exposure to programming. Read more...
Media outlets often make the mistake of anthropomorphizing or attributing human-like characteristics to computer programs. Read more...
As computation power increased, neural networks began to take center stage in AI. Read more...
Expert systems stored “if-then” rules derived from the knowledge of experts. Read more...
Framing reasoning as searching through a maze of actions for a sequence that achieves the desired end goal. Read more...
Turing test, games, hype, narrow vs general AI. Read more...
Rather than duplicating such code each time we want to use it, it is more efficient to store the code in a function. Read more...
We often wish to tell the computer instructions involving the words “if,” “while,” and “for.” Read more...
We can store many related pieces of data within a single variable called a data structure. Read more...
We can store and manipulate data in the form of variables. Read more...
Each decomposition produces a system of linear equations where the number of unknowns equals the number of equations. Read more...
Hidden inside of every quadratic, there is a perfect square. Read more...
Equations involving compositions of trigonometric functions can create wild patterns in the plane. Read more...
Lissajous curves use sine functions to create interesting patterns in the plane. Read more...
Absolute value graphs can be rotated to draw stars. Read more...
Non-euclidean ellipses can be used to draw starry-eye sunglasses. Read more...
Euclidean ellipses can be combined with sine wave shading to form three-dimensional shells. Read more...
High-frequency sine waves can be used to draw shaded regions. Read more...
Roots can be used to draw deer. Read more...
Sine waves can be used to draw scales on a fish. Read more...
Parabolas can be used to draw a fish. Read more...
Absolute value can be used to draw a person. Read more...
Slanted lines can be used to draw a spider web. Read more...
Horizontal and vertical lines can be used to draw a castle. Read more...
Compositions of functions consist of multiple functions linked together, where the output of one function becomes the input of another function. Read more...
Inverting a function entails reversing the outputs and inputs of the function. Read more...
When a function is reflected, it flips across one of the axes to become its mirror image. Read more...
When a function is rescaled, it is stretched or compressed along one of the axes, like a slinky. Read more...
When a function is shifted, all of its points move vertically and/or horizontally by the same amount. Read more...
A piecewise function is pieced together from multiple different functions. Read more...
Trigonometric functions represent the relationship between sides and angles in right triangles. Read more...
Absolute value represents the magnitude of a number, i.e. its distance from zero. Read more...
Exponential functions have variables as exponents. Logarithms cancel out exponentiation. Read more...
Radical functions involve roots: square roots, cube roots, or any kind of fractional exponent in general. Read more...
A slant asymptote is a slanted line that arises from a linear term in the proper form of a rational function. Read more...
If we choose one input on each side of an asymptote, we can tell which section of the plane the function will occupy. Read more...
Vertical asymptotes are vertical lines that a function approaches but never quite reaches. Read more...
Rational functions can have a form of end behavior in which they become flat, approaching (but never quite reaching) a horizontal line known as a horizontal asymptote. Read more...
Polynomial long division works the same way as the long division algorithm that’s familiar from simple arithmetic. Read more...
We can sketch the graph of a polynomial using its end behavior and zeros. Read more...
The rational roots theorem can help us find zeros of polynomials without blindly guessing. Read more...
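As a minimal sketch of that idea (the function names here are illustrative, not taken from the post): for a polynomial with integer coefficients, any rational zero p/q in lowest terms must have p dividing the constant term and q dividing the leading coefficient, so there are only finitely many candidates to test.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of a nonzero integer."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_root_candidates(coeffs):
    """Candidates +/- p/q for integer coefficients listed highest degree first.

    Assumes the leading coefficient and the constant term are both nonzero.
    """
    leading, constant = coeffs[0], coeffs[-1]
    candidates = set()
    for p in divisors(constant):
        for q in divisors(leading):
            candidates.add(Fraction(p, q))
            candidates.add(Fraction(-p, q))
    return sorted(candidates)


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's method."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


def rational_roots(coeffs):
    """Keep only the candidates that actually evaluate to zero."""
    return [x for x in rational_root_candidates(coeffs) if evaluate(coeffs, x) == 0]


# 2x^3 - 3x^2 - 11x + 6 factors as (x - 3)(x + 2)(2x - 1)
print(rational_roots([2, -3, -11, 6]))
```

Using exact fractions rather than floats avoids rounding error when checking whether a candidate is a zero.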
The zeros of a polynomial are the inputs that cause it to evaluate to zero. Read more...
The end behavior of a polynomial refers to the type of output that is produced when we input extremely large positive or negative values. Read more...
To solve a system of inequalities, we need to solve each individual inequality and find where all their solutions overlap. Read more...
Quadratic inequalities are best visualized in the plane. Read more...
When a linear inequality has two variables, the solution covers a section of the coordinate plane. Read more...
An inequality is similar to an equation, but instead of saying two quantities are equal, it says that one quantity is greater than or less than another. Read more...
Systems of quadratic equations can be solved via substitution. Read more...
To easily graph a quadratic equation, we can convert it to vertex form. Read more...
Completing the square helps us gain a better intuition for quadratic equations and understand where the quadratic formula comes from. Read more...
To solve hard-to-factor quadratic equations, it’s easiest to use the quadratic formula. Read more...
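For illustration, here is a small sketch of the quadratic formula, x = (-b ± sqrt(b^2 - 4ac)) / (2a), restricted to real solutions. The function name and the choice to return an empty list when the discriminant is negative are assumptions made for this example.

```python
import math


def solve_quadratic(a, b, c):
    """Real solutions of a*x^2 + b*x + c = 0, in increasing order."""
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []  # no real solutions
    root = math.sqrt(discriminant)
    # A set collapses the repeated solution when the discriminant is zero.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


print(solve_quadratic(1, -5, 6))  # factors easily as (x - 2)(x - 3)
print(solve_quadratic(1, 1, -1))  # hard to factor: solutions involve sqrt(5)
```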
Factoring is a method for solving quadratic equations. Read more...
Quadratic equations are similar to linear equations, except that they contain squares of a single variable. Read more...
A linear system consists of multiple linear equations, and the solution of a linear system consists of the pairs that satisfy all of the equations. Read more...
Standard form makes it easy to see the intercepts of a line. Read more...
An easy way to write the equation of a line if we know the slope and a point on the line. Read more...
Introducing linear equations in two variables. Read more...
Loosely speaking, a linear equation is an equality statement containing only addition, subtraction, multiplication, and division. Read more...
A series is the sum of a sequence. Read more...
A sequence is a list of numbers that has some pattern. Read more...
A function is a scribble that crosses each vertical line only once. Read more...
Appreciation of mathematical beauty gets held up on too high a pedestal as the “correct” source of motivation in math learning. Read more...
It should look less like them helping you and more like you helping them. Read more...
Start out with a volume of work that’s small enough that you don’t dread doing it again the next day. Read more...
You can be the most committed and capable workhorse on the planet, but if you’re on the wrong team, the only thing you’ll change is your team’s allocation of work. Read more...
At the end of the day, whether or not they know math comes down to whether or not they can apply techniques within that well-defined body of knowledge to solve problems within that well-defined body of knowledge. Read more...
If you go directly to the most abstract ideas then you’re basically like a kid who reads a book of famous quotes about life and thinks they understand everything about life by way of those quotes. Read more...
… an infinitely tall ladder where the rungs get spaced further and further apart the higher you climb. Read more...
… is reducing friction in the learning process. Read more...
One main focus, one semi-focus, and everything else a hobby with whatever time you have left over. Read more...
Tear down the unproductive habit and build up a counter-habit whose gravity eventually becomes strong enough to completely overtake the original habit. Read more...
It can help to zoom out and look at your progress on a longer timescale. Read more...
What you want is a continual cycle of strain and adaptation. Read more...
If you don’t love it, you’ll never be able to keep up with the same volume of effective practice as someone who does have that love. You’ll never outwork them. Read more...
There are many studies demonstrating a benefit of some component of deliberate practice, but these studies often get mislabeled or misinterpreted as demonstrating the full benefit of true deliberate practice. The field of education is particularly susceptible to this issue because it is impossible for a teacher with a classroom of students to provide a true deliberate practice experience without assistive technology that perfectly emulates the one-on-one pedagogical decisions that an expert tutor would make for each individual student. Read more...
A little rhyme to understand the big picture of top-down vs bottom-up learning, particularly in the context of machine learning (ML). Read more...
At the end of the day you can either waste time debating your coach on the training regimen, or you can use that time to just put your head down and do some f*cking work. Read more...
Making progress is all about putting pressure on a problem: applying the force of your skills to a specific problem area (pressure = force / area). Read more...
Once you get past steps 1-3, it’s hard to find scaffolding. You can’t just enroll in a course or pick up a textbook. The scaffolding comes from finding a mentor on a mission that you identify with and are well-suited to contribute to. And it can take a lot of searching to find that person and problem area that’s the right fit. Read more...
Every time you put out a post, get feedback, make improvements, and carry those improvements forward into future posts, that’s essentially a “rep” of deliberate practice. Read more...
1) Don’t use projects as a way to acquire fundamental skills. 2) Make sure the projects are guided. 3) Don’t let the projects cut too much into your foundational skill-building. Read more...
The habit is a psychological force field that protects you from all sorts of negative feelings that try to dissuade you from training. Read more...
You get to provide value that nobody else can, and you get recognized for it. Read more...
Fun is a supplement, not a substitute, for deliberate practice. Read more...
The article presents two claims of deliberate practice that it argues against – but the first claim is a misattribution, and the second claim is not actually argued against. Read more...
Doesn’t “beyond the edge of one’s capabilities” mean that you can’t do it? How can you practice it if you can’t do it? Also, “performance-improving adjustments on every single repetition” is hard to understand in some realms of performance. For instance, does each step a runner takes involve feedback and improvement? Read more...
1) Foundational math. 2) Classical machine learning. 3) Deep learning. 4) Cutting-edge machine learning. Read more...
Bloom studied the training backgrounds of 120 world-class talented individuals across 6 talent domains: piano, sculpting, swimming, tennis, math, & neurology, and what he discovered was that talent development occurs through a similar general process, no matter what talent domain. In other words, there is a “formula” for developing talent – though executing it is a lot harder than simply understanding it. Read more...
Curiosity/interest motivates people to engage in deliberate practice, which is what builds ability. Read more...
I think optimal motivation requires a balance of both intrinsic and extrinsic factors. Read more...
Nobody who knows the science of learning is actually debating this. Read more...
Here’s a trick to feel amazingly capable and confident: periodically look back at stuff you originally found challenging months ago. Read more...
The underlying principle that it all boils down to is deliberate practice. Read more...
Math gets hard for different students at different levels. If you don’t have worked examples to help carry you through once math becomes hard for you, then every problem basically blows up into a “research project” for you. Sometimes people advocate for unguided struggle as a way to improve general problem-solving ability, but this idea lacks empirical support. Worked examples won’t prevent you from developing deep understanding (actually, it’s the opposite: worked examples can help you quickly layer on more skills, which forces a structural integrity in the lower levels of your knowledge). Even if you decide against using worked examples for now, continually re-evaluate to make sure you’re getting enough productive training volume. Read more...
Research mathematicians are like professional athletes. Read more...
“…[D]eliberate practice requires effort and is not inherently enjoyable. Individuals are motivated to practice because practice improves performance.” Read more...
Many educators think that the makeup of every year in a student’s education should be balanced the same way across Bloom’s taxonomy, whereas Bloom’s 3-stage talent development process suggests that the time allocation should change drastically as a student progresses through their education. Read more...
… and they should be treated as such. Read more...
When you’re developing skills at peak efficiency, you are maximizing the difficulty of your training tasks subject to the constraint that you end up successfully overcoming those difficulties in a timely manner. Read more...
First, fun and exciting playtime. Then, intense and strenuous skill development. Finally, developing one’s individual style while pushing the boundaries of the field. Read more...
Talent development is not only different from schooling, but in many cases completely orthogonal to schooling. Read more...
The average tutored student performed better than 98% of students in the traditional class. Read more...
At the end of the day all learning is memorization. Read more...
Myth 1: Understanding amounts to something other than memory. Myth 2: Students can perform high-level skills without mastering low-level component skills. Read more...
Specific areas of friction that cause students to struggle with math. What needs to be done to remove friction from the learning process. Why friction remains so prevalent. Read more...
It’s helpful to loosely understand what something means before memorizing it, but this does not have to be a rigorous derivation. Read more...
It’s really just “loading” the info into temporary storage – like picking up a weight off the rack, whereas learning is increasing your ability to lift said weight. Read more...
There are many studies demonstrating a benefit of some component of deliberate practice, but these studies often get mislabeled or misinterpreted as demonstrating the full benefit of true deliberate practice. The field of education is particularly susceptible to this issue because it is impossible for a teacher with a classroom of students to provide a true deliberate practice experience without assistive technology that perfectly emulates the one-on-one pedagogical decisions that an expert tutor would make for each individual student. Read more...
An easy trick to improve your retention while working through a bank of review or challenge problems like LeetCode, HackerRank, etc. Read more...
The fuzzier that memory, the harder it is to lift. The wait creates the weight. Read more...
… is asking students to perform activities that leverage a non-existent knowledge base. Read more...
The need for automaticity on low-level skills is obvious to anyone with experience learning a sport or instrument. So why is there sometimes resistance in education? It makes sense if you think about what people usually find persuasive. Read more...
If you try to keep information close by taking great notes that you can reference all the time… that just PREVENTS you from truly retaining it. Read more...
It’s a hard truth that some people have more advantageous cognitive differences than others – e.g., higher working memory capacity, higher generalization ability, slower forgetting rate. However, there are two sources of hope: 1) automaticity can effectively turn your long-term memory into an extension of your working memory, and 2) many sources of friction in the learning process can be not only remedied but also exploited to increase learning speed beyond the status quo. Read more...
And if you want to get the most out of your review, you need to engage in spaced, interleaved retrieval practice. Read more...
You gotta develop automaticity on low-level skills in order to free up mental resources for higher-level thinking! Read more...
… is to not overwhelm them. In my experience, students naturally enjoy math when it doesn’t feel overwhelmingly difficult to learn. Read more...
1) The information must have already been written to memory. 2) The information must be retrieved from memory, unassisted. Read more...
There is an asymmetric tradeoff between 1) blowing your working memory capacity and leaving yourself unable to make progress, versus 2) wasting a couple extra seconds writing down a bit more work than you need to. When in doubt, write it out. Read more...
You haven’t learned unless you’re able to consistently reproduce the information you consumed and use it to solve problems. Read more...
The underlying principle that it all boils down to is deliberate practice. Read more...
Long-term learning is represented by the creation of strategic electrical wiring between neurons. Read more...
Research indicates the best way to improve your problem-solving ability in any domain is simply by acquiring more foundational skills in that domain. The way you increase your ability to make mental leaps is not actually by jumping farther, but rather, by building bridges that reduce the distance you need to jump. Yet, higher math textbooks & courses seem to focus on trying to train jumping distance instead of bridge-building. Read more...
There are many, many studies that measure variation in WMC vs variation in other metrics. Read more...
Challenge problems are not a good use of time until you’ve developed the foundational skills that are necessary to grapple with these problems in a productive and timely fashion. Read more...
If you understand the interplay between working memory and long-term memory, then you can actually derive – from first principles – the methods of effective teaching. Read more...
An idea for a paper that I don’t currently have the bandwidth to write. Read more...
It’s the act of successfully retrieving fuzzy memory, not clear memory, that extends the memory duration. Read more...
To transfer information into long-term memory, you need to practice retrieving it without assistance. Read more...
There’s a cognitive principle behind this: associative interference, the phenomenon that conceptually related pieces of knowledge can interfere with each other’s recall. Read more...
It’s actually the opposite – to get students actively retrieving information from memory, while minimizing their cognitive load. Read more...
There are numerous cognitive learning strategies that 1) can be used to massively improve learning, 2) have been reproduced so many times they might as well be laws of physics, and 3) connect all the way down to the mechanics of what’s going on in the brain. Read more...
By periodically revisiting content, a spiral curriculum periodically restores forgotten knowledge and leverages the spacing effect to slow the decay of that knowledge. Spaced repetition takes this line of thought to its fullest extent by fully optimizing the review process. Read more...
While there is plenty of room for teachers to make better use of cognitive learning strategies in the classroom, teachers are victims of circumstance in a profession lacking effective accountability and incentive structures, and the end result is that students continue to receive mediocre educational experiences. Given a sufficient degree of accountability and incentives, there is no law of physics preventing a teacher from putting forth the work needed to deliver an optimal learning experience to a single student. However, in the absence of technology, it is impossible for a single human teacher to deliver an optimal learning experience to a classroom of many students with heterogeneous knowledge profiles, each of whom needs to work on different types of problems and receive immediate feedback on each of their attempts. This is why technology is necessary. Read more...
The testing effect (or the retrieval practice effect) emphasizes that recalling information from memory, rather than repeated reading, enhances learning. It can be combined with spaced repetition to produce an even more potent learning technique known as spaced retrieval practice. Read more...
Interleaving (or mixed practice) involves spreading minimal effective doses of practice across various skills, in contrast to blocked practice, which involves extensive consecutive repetition of a single skill. Blocked practice can give a false sense of mastery and fluency because it allows students to settle into a robotic rhythm of mindlessly applying one type of solution to one type of problem. Interleaving, on the other hand, creates a “desirable difficulty” that promotes vastly superior retention and generalization, making it a more effective review strategy. But despite its proven efficacy, interleaving faces resistance in classrooms due to a preference for practice that feels easier and appears to produce immediate performance gains, even if those performance gains quickly vanish afterwards and do not carry over to test performance. Read more...
When reviews are spaced out or distributed over multiple sessions (as opposed to being crammed or massed into a single session), memory is not only restored, but also further consolidated into long-term storage, which slows its decay. This is known as the spacing effect. A profound consequence of the spacing effect is that the more reviews are completed (with appropriate spacing), the longer the memory will be retained, and the longer one can wait until the next review is needed. This observation gives rise to a systematic method for reviewing previously-learned material called spaced repetition (or distributed practice). A repetition is a successful review at the appropriate time. Read more...
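To make the "longer wait after each repetition" idea concrete, here is a deliberately simplified, hypothetical scheduler. The doubling growth factor, the one-day starting interval, and the reset-on-failure rule are all assumptions chosen for illustration; this is not the schedule used by any particular system.

```python
def next_interval(current_days, success, growth=2.0, first_interval=1.0):
    """Hypothetical rule: grow the wait after a successful review, reset otherwise."""
    if not success:
        return first_interval
    return current_days * growth


def review_days(outcomes, growth=2.0, first_interval=1.0):
    """Days (counted from initial learning) on which each review takes place."""
    interval = first_interval
    day = 0.0
    days = []
    for success in outcomes:
        day += interval
        days.append(day)
        interval = next_interval(interval, success, growth, first_interval)
    return days


# Three successful reviews, one failed review, then another success.
print(review_days([True, True, True, False, True]))
```

Under these assumptions the gaps between reviews grow as 1, 2, 4, 8 days while reviews succeed, then shrink back to 1 day after the failed review.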
Layering is the act of continually building on top of existing knowledge – that is, continually acquiring new knowledge that exercises prerequisite or component knowledge. This causes existing knowledge to become more ingrained, organized, and deeply understood, thereby increasing the structural integrity of a student’s knowledge base and making it easier to assimilate new knowledge. Read more...
Associative interference occurs when related knowledge interferes with recall. It is more likely to occur when highly related pieces of knowledge are learned simultaneously or in close succession. However, the effects of interference can be mitigated by teaching dissimilar concepts simultaneously and spacing out related pieces of knowledge over time. Read more...
Automaticity is the ability to perform low-level skills without conscious effort. Analogous to a basketball player effortlessly dribbling while strategizing, automaticity allows individuals to avoid spending limited cognitive resources on low-level tasks and instead devote those cognitive resources to higher-order reasoning. In this way, automaticity is the gateway to expertise, creativity, and general academic success. However, insufficient automaticity, particularly in basic skills, inflates the cognitive load of tasks, making it exceedingly difficult for students to learn and perform. Read more...
Different students have different working memory capacities. When the cognitive load of a learning task exceeds a student’s working memory capacity, the student experiences cognitive overload and is not able to complete the task. Read more...
Active learning leads to more neural activation than passive learning. Automaticity involves developing strategic neural connections that reduce the amount of effort that the brain has to expend to activate patterns of neurons. Read more...
Effortful processes like testing, repetition, and computation are essential parts of effective learning, and competition is often helpful. Read more...
The most effective learning techniques require substantial cognitive effort from students and typically do not emulate what experts do in the professional workplace. Direct instruction is necessary to maximize student learning, whereas unguided instruction and group projects are typically very inefficient. Read more...
Different people generally have different working memory capacities and learn at different rates, but people do not actually learn better in their preferred “learning style.” Instead, different people need the same form of practice but in different amounts. Read more...
In terms of improving educational outcomes, science is not where the bottleneck is. The bottleneck is in practice. The science of learning has advanced significantly over the past century, yet the practice of education has barely changed. Read more...
Cognition involves the flow of information through sensory, working, and long-term memory banks in the brain. Sensory memory temporarily holds raw data, working memory manipulates and organizes information, and long-term memory stores it indefinitely by creating strategic electrical wiring between neurons. Learning amounts to increasing the quantity, depth, retrievability, and generalizability of concepts and skills in a student’s long-term memory. Limited working memory capacity creates a bottleneck in the transfer of information into long-term memory, but cognitive learning strategies can be used to mitigate the effects of this bottleneck. Read more...
Spaced repetition is complicated in hierarchical bodies of knowledge, like mathematics, because repetitions on advanced topics should “trickle down” to update the repetition schedules of simpler topics that are implicitly practiced (while being discounted appropriately since these repetitions are often too early to count for full credit towards the next repetition). However, I developed a model of Fractional Implicit Repetition (FIRe) that not only accounts for implicit “trickle-down” repetitions but also minimizes the number of reviews by choosing reviews whose implicit repetitions “knock out” other due reviews (like dominos), and calibrates the speed of the spaced repetition process to each individual student on each individual topic (student ability and topic difficulty are competing factors). Read more...
Effective learning strategies sometimes go against our human instincts about conversation. Read more...
A way to visualize some cognitive learning strategies. Read more...
In general, you can manipulate total derivatives like fractions, but you can’t do the same with partial derivatives. Read more...
An intuitive derivation. Read more...
A simple mnemonic trick for quickly differentiating complicated functions. Read more...
Many differential equations don’t have solutions that can be expressed in terms of finite combinations of familiar functions. However, we can often solve for the Taylor series of the solution. Read more...
To find the Taylor series of complicated functions, it’s often easiest to manipulate the Taylor series of simpler functions. Read more...
Many non-polynomial functions can be represented by infinite polynomials. Read more...
Various tricks for determining whether a series converges or diverges. Read more...
A geometric series is a sum where each term is some constant times the previous term. Read more...
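For reference, the standard closed forms can be written as follows, where the first term a and the common ratio r are notation chosen here rather than taken from the post:

```latex
\sum_{k=0}^{n-1} a r^k = a \, \frac{1 - r^n}{1 - r} \quad (r \neq 1),
\qquad
\sum_{k=0}^{\infty} a r^k = \frac{a}{1 - r} \quad (|r| < 1).
```

For example, with a = 1 and r = 1/2, the infinite sum 1 + 1/2 + 1/4 + ... equals 1 / (1 - 1/2) = 2.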
When we know the solutions of a linear differential equation with constant coefficients and right hand side equal to zero, we can use variation of parameters to find a solution when the right hand side is not equal to zero. Read more...
Integrating factors can be used to solve first-order differential equations with non-constant coefficients. Read more...
Undetermined coefficients can help us find a solution to a linear differential equation with constant coefficients when the right hand side is not equal to zero. Read more...
Given a linear differential equation with constant coefficients and a right hand side of zero, the roots of the characteristic polynomial correspond to solutions of the equation. Read more...
Non-separable differential equations can sometimes be converted into separable differential equations by way of substitution. Read more...
When faced with a differential equation that we don’t know how to solve, we can sometimes still approximate the solution. Read more...
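One common way to do this is Euler's method, sketched below under the assumption that the equation has the form dy/dx = f(x, y) with a known starting point. The function and parameter names are illustrative. The method repeatedly follows the slope at the current point for a small step.

```python
import math


def euler_approximate(f, x0, y0, x_end, num_steps):
    """Approximate y(x_end) for dy/dx = f(x, y) with y(x0) = y0."""
    step = (x_end - x0) / num_steps
    x, y = x0, y0
    for _ in range(num_steps):
        y += step * f(x, y)  # move along the tangent line for one small step
        x += step
    return y


# Test case with a known answer: dy/dx = y, y(0) = 1 has solution y = e^x.
approx = euler_approximate(lambda x, y: y, 0.0, 1.0, 1.0, 1000)
print(approx, "vs exact", math.e)
```

Increasing `num_steps` generally shrinks the error at the cost of more computation.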
The simplest differential equations can be solved by separation of variables, in which we move the derivative to one side of the equation and take the antiderivative. Read more...
Improper integrals have bounds or function values that extend to positive or negative infinity. Read more...
We can apply integration by parts whenever an integral would be made simpler by differentiating some expression within the integral, at the cost of anti-differentiating another expression within the integral. Read more...
Substitution involves condensing an expression into a single new variable, and then expressing the integral in terms of that new variable. Read more...
To evaluate a definite integral, we find the antiderivative, evaluate it at the indicated bounds, and then take the difference. Read more...
The antiderivative of a function is a second function whose derivative is the first function. Read more...
When a limit takes the indeterminate form of zero divided by zero or infinity divided by infinity, we can differentiate the numerator and denominator separately without changing the actual value of the limit. Read more...
We can interpret the derivative as an approximation for how a function’s output changes, when the function input is changed by a small amount. Read more...
Derivatives can be used to find a function’s local extreme values, its peaks and valleys. Read more...
There are convenient rules for the derivatives of exponential, logarithmic, trigonometric, and inverse trigonometric functions. Read more...
Given a sum, we can differentiate each term individually. But why are we able to do this? Does multiplication work the same way? What about division? Read more...
When taking derivatives of compositions of functions, we can ignore the inside of a function as long as we multiply by the derivative of the inside afterwards. Read more...
There are some patterns that allow us to compute derivatives without having to compute the limit of the difference quotient. Read more...
The derivative of a function is the function’s slope at a particular point, and can be computed as the limit of the difference quotient. Read more...
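As a rough numerical illustration (not a proof), the difference quotient (f(x + h) - f(x)) / h can be evaluated for smaller and smaller h to watch it approach the slope. The example function and the step sizes below are arbitrary choices.

```python
def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


# For f(x) = x^2 the exact derivative at x = 3 is 2 * 3 = 6.
for h in (1.0, 0.1, 0.001, 1e-6):
    print(h, difference_quotient(square, 3.0, h))
```

For this function the quotient works out to 6 + h algebraically, so the printed values close in on 6 as h shrinks, up to floating-point rounding.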
Various tricks for evaluating tricky limits. Read more...
The limit of a function, as the input approaches some value, is the output we would expect if we saw only the surrounding portion of the graph. Read more...
It comes out to roughly a fortieth of that of a truck. Read more...
String art works because the strings are tangent lines to a curve. Read more...
Calculus can show us how our intuition can fail us, a common theme in philosophy. Read more...
Nobody came out of the dispute well. Read more...
When Joseph Fourier first introduced Fourier series, they gave mathematicians nightmares. Read more...
Deriving the “Pert” formula. Read more...
If we know the revenue and costs associated with producing any number of units, then we can use calculus to figure out the number of units to produce for maximum profit. Read more...
Calculus can be used to find the parameters that minimize a function. Read more...
Physics engines use calculus to periodically update the locations of objects. Read more...
Introducing Kajiya’s rendering equation. Read more...
Deriving the ideal rocket equation. Read more...
Deriving the Gompertz function. Read more...
Understanding why even slight narrowing of arteries can pose such a big problem to blood flow. Read more...
Measuring the volume of blood the heart pumps out into the aorta per unit time. Read more...
A series is the sum of a sequence. Read more...
A sequence is a list of numbers that has some pattern. Read more...
Integrals give the area under a portion of a function. Read more...
The derivative tells the steepness of a function at a given point, kind of like a carpenter’s level. Read more...
The limit of a function is the height where it looks like the scribble is going to hit a particular vertical line. Read more...
We do it all manually, entirely by hand. Read more...
It should look less like them helping you and more like you helping them. Read more...
At the end of the day, whether or not they know math comes down to whether or not they can apply techniques within that well-defined body of knowledge to solve problems within that well-defined body of knowledge. Read more...
Enter grades early on, and (if pre-college) email parents early on. Read more...
If you go directly to the most abstract ideas then you’re basically like a kid who reads a book of famous quotes about life and thinks they understand everything about life by way of those quotes. Read more...
Tear down the unproductive habit and build up a counter-habit whose gravity eventually becomes strong enough to completely overtake the original habit. Read more...
What you want is a continual cycle of strain and adaptation. Read more...
There are many studies demonstrating a benefit of some component of deliberate practice, but these studies often get mislabeled or misinterpreted as demonstrating the full benefit of true deliberate practice. The field of education is particularly susceptible to this issue because it is impossible for a teacher with a classroom of students to provide a true deliberate practice experience without assistive technology that perfectly emulates the one-on-one pedagogical decisions that an expert tutor would make for each individual student. Read more...
Hardcore skill development is necessary to do big things, it’s one of the greatest social mobility hacks, and it gives you the ability/confidence to take risks knowing that you’ll be okay. Read more...
One of the best career hacks – especially for a junior dev – is to knock out your work so quickly and so well that you put pressure on your boss to come up with more work for you. Your boss starts giving you work that they themself need to do soon, which is really the exact kind of work that’s going to move your career forward. Read more...
To quote a Math Academy student: “The fastest and most rigorous progress will be made by individuals in front of their computers.” Read more...
Get yourself into an area that requires deep domain expertise, working on things that haven’t been done or even thoroughly imagined yet. Read more...
Once you get past steps 1-3, it’s hard to find scaffolding. You can’t just enroll in a course or pick up a textbook. The scaffolding comes from finding a mentor on a mission that you identify with and are well-suited to contribute to. And it can take a lot of searching to find that person and problem area that’s the right fit. Read more...
When students do the mathematical equivalent of playing kickball during class, and then are expected to do the mathematical equivalent of a backflip at the end of the year, it’s easy to see how struggle and general negative feelings can arise. Read more...
Regret minimization cuts both ways. Read more...
… is asking students to perform activities that leverage a non-existent knowledge base. Read more...
You get to provide value that nobody else can, and you get recognized for it. Read more...
It’s a hard truth that some people have more advantageous cognitive differences than others – e.g., higher working memory capacity, higher generalization ability, slower forgetting rate. However, there are two sources of hope: 1) automaticity can effectively turn your long-term memory into an extension of your working memory, and 2) many sources of friction in the learning process can be not only remedied but also exploited to increase learning speed beyond the status quo. Read more...
A limit problem conjured up from the depths of hell. Read more...
You gotta develop automaticity on low-level skills in order to free up mental resources for higher-level thinking! Read more...
Nobody who knows the science of learning is actually debating this. Read more...
There is an asymmetric tradeoff between 1) blowing your working memory capacity and leaving yourself unable to make progress, versus 2) wasting a couple extra seconds writing down a bit more work than you need to. When in doubt, write it out. Read more...
The underlying principle that it all boils down to is deliberate practice. Read more...
Math gets hard for different students at different levels. If you don’t have worked examples to help carry you through once math becomes hard for you, then every problem basically blows up into a “research project” for you. Sometimes people advocate for unguided struggle as a way to improve general problem-solving ability, but this idea lacks empirical support. Worked examples won’t prevent you from developing deep understanding (actually, it’s the opposite: worked examples can help you quickly layer on more skills, which forces a structural integrity in the lower levels of your knowledge). Even if you decide against using worked examples for now, continually re-evaluate to make sure you’re getting enough productive training volume. Read more...
First, you need extensive and solid content knowledge. Then, you need to work through tons of practice exams for the specific exam you’re taking. This might sound simple, but every year, countless people manage to screw it up. Read more...
Many educators think that the makeup of every year in a student’s education should be balanced the same way across Bloom’s taxonomy, whereas Bloom’s 3-stage talent development process suggests that the time allocation should change drastically as a student progresses through their education. Read more...
Research indicates the best way to improve your problem-solving ability in any domain is simply by acquiring more foundational skills in that domain. The way you increase your ability to make mental leaps is not actually by jumping farther, but rather, by building bridges that reduce the distance you need to jump. Yet, higher math textbooks & courses seem to focus on trying to train jumping distance instead of bridge-building. Read more...
I learned from those kinds of resources myself, and while I came a long way, for the amount of effort I put into learning, I could have gone a lot further if my time were used more efficiently. That’s the problem that Math Academy solves. Read more...
Challenge problems are not a good use of time until you’ve developed the foundational skills that are necessary to grapple with these problems in a productive and timely fashion. Read more...
When you’re developing skills at peak efficiency, you are maximizing the difficulty of your training tasks subject to the constraint that you end up successfully overcoming those difficulties in a timely manner. Read more...
Students eat meals of information at similar bite rates when each spoonful fed to them is sized appropriately relative to the size of their mouth. (Note that equal bite rates do not imply equal rates of food volume intake.) Read more...
It’s the act of successfully retrieving fuzzy memory, not clear memory, that extends the memory duration. Read more...
To transfer information into long-term memory, you need to practice retrieving it without assistance. Read more...
What are LLMs good for in STEM education? Where do LLMs fall short, and why? What does an educational AI need to do for its students to succeed? Read more...
Deliberate practice is the most effective form of active learning. It consists of individualized training activities specially chosen to improve specific aspects of a student’s performance through repetition and successive refinement. It is the opposite of mindless repetition. The amount of deliberate practice has been shown to be one of the most prominent underlying factors responsible for individual differences in performance across numerous fields, even among highly talented elite performers. Deliberate practice demands effort and intensity, and may be discomforting, but its long-term commitment compounds incremental improvements, leading to expertise. Read more...
Not everybody can learn every level of math, but most people can learn the basics. In practice, however, few people actually reach their full mathematical potential because they get knocked off course early on by factors such as missing foundations, ineffective practice habits, inability or unwillingness to engage in additional practice, or lack of motivation. Read more...
Different people generally have different working memory capacities and learn at different rates, but people do not actually learn better in their preferred “learning style.” Instead, different people need the same form of practice but in different amounts. Read more...
Students and teachers are often not aligned with the goal of maximizing learning, which means that in the absence of accountability and incentives, classrooms are pulled towards a state of mediocrity. Accountability and incentives are typically absent in education, which leads to a “tragedy of the commons” situation where students pass courses (often with high grades) despite severely lacking knowledge of the content. Read more...
Cognition involves the flow of information through sensory, working, and long-term memory banks in the brain. Sensory memory temporarily holds raw data, working memory manipulates and organizes information, and long-term memory stores it indefinitely by creating strategic electrical wiring between neurons. Learning amounts to increasing the quantity, depth, retrievability, and generalizability of concepts and skills in a student’s long-term memory. Limited working memory capacity creates a bottleneck in the transfer of information into long-term memory, but cognitive learning strategies can be used to mitigate the effects of this bottleneck. Read more...
Talent development is not only different from schooling, but in many cases completely orthogonal to schooling. Read more...
It rests on a critical assumption that the amount of learning that occurs during initial instruction is zero or otherwise negligible, which is not always true. Read more...
If you look at the kinds of math that most quantitative professionals use on a daily basis, competition math tricks don’t show up anywhere. But what does show up everywhere is university-level math subjects. Read more...
During its operation from 2020 to 2023, Eurisko was the most advanced high school math/CS sequence in the USA. It culminated in high school students doing masters/PhD-level coursework (reproducing academic research papers in artificial intelligence, building everything from scratch in Python). Read more...
Speaking as someone who had to suffer through a teacher credentialing program… it’s actually an anti-signal when someone references their teaching credential as a qualification to speak about how learning happens. It’s centered around political ideology rather than the science of learning. Read more...
An intuitive derivation. Read more...
Coding tutorials typically just say “import this function then run it,” and the math tutorials typically just say “this is the form of the model, you can fit it using the usual techniques” and leave it to the reader to figure out the rest. Read more...
The 3 types of problems that I would have students work out back when I was teaching ML. Read more...
I was coming in with the mindset of “we need to cover the superset of all the content covered in the major textbooks,” which we’re able to do quite well for traditional math. For ML, the rule will have to be amended to “we need to cover the superset of all the content covered in standard university course syllabi.” Read more...
A little rhyme to understand the big picture of top-down vs bottom-up learning, particularly in the context of machine learning (ML). Read more...
1) Foundational math. 2) Classical machine learning. 3) Deep learning. 4) Cutting-edge machine learning. Read more...
It can be helpful to take a top-down approach in planning out your overarching learning goals, but the learning itself has to occur bottom-up. Read more...
If you start to flail (or, more subtly, doubt yourself and lose interest) after jumping into ML without a baseline level of foundational knowledge, then you need to put your ego aside and re-allocate your time into shoring up your foundations. Read more...
The network becomes book-smart in a particular area but not street-smart in general. The training procedure is like a series of exams on material within a tiny subject area (your data subspace). The network refines its knowledge in the subject area to maximize its performance on those exams, but it doesn’t refine its knowledge outside that subject area. And that leaves it gullible to adversarial examples using inputs outside the subject area. Read more...
Initial parameter range, data sampling range, severity of regularization. Read more...
If you know your single-variable calculus, then it’s about 70 hours on Math Academy. Read more...
Using convolutional layers to create an even better checkers player. Read more...
Extending Fogel’s tic-tac-toe player to the game of checkers. Read more...
Reimplementing the paper that laid the groundwork for Blondie24. Read more...
A method for training neural networks that works even when training feedback is sparse. Read more...
A convenient technique for computing gradients in neural networks. Read more...
The deeper or more “hierarchical” a computational graph is, the more complex the model that it represents. Read more...
We can algorithmically build classifiers that use a sequence of nested “if-then” decision rules. Read more...
A simple classification algorithm grounded in Bayesian probability. Read more...
One of the simplest classifiers. Read more...
In many real-life situations, there is more than one input variable that controls the output variable. Read more...
Gradient descent can help us avoid pitfalls that occur when fitting nonlinear models using the pseudoinverse. Read more...
Just because a model appears to closely match the points in a data set does not necessarily mean it is a good model. Read more...
Transforming nonlinear functions so that we can fit them using the pseudoinverse. Read more...
Exploring the most general class of functions that can be fit using the pseudoinverse. Read more...
Using matrix algebra to fit simple functions to data sets. Read more...
Guess some initial clusters in the data, and then repeatedly update the guesses to make the clusters more cohesive. Read more...
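A minimal sketch of that guess-and-update loop, assuming one-dimensional data and hand-picked starting centers (the function name `k_means_1d` and the sample points are hypothetical, chosen only for illustration):

```python
def k_means_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of the points assigned to it."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster's mean
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]   # made-up data with two obvious groups
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])
print(centers)   # [1.5, 10.0]
print(clusters)  # [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```

A fixed iteration count keeps the sketch short; a fuller version would stop once the assignments no longer change.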
A walkthrough of solving Tower of Hanoi using the approach of one of the earliest AI systems. Read more...
Media outlets often make the mistake of anthropomorphizing or attributing human-like characteristics to computer programs. Read more...
As computation power increased, neural networks began to take center stage in AI. Read more...
Expert systems stored “if-then” rules derived from the knowledge of experts. Read more...
Framing reasoning as searching through a maze of actions for a sequence that achieves the desired end goal. Read more...
Turing test, games, hype, narrow vs general AI. Read more...
The type of ensemble model that wins most data science competitions is the stacked model, which consists of an ensemble of entirely different species of models together with some combiner algorithm. Read more...
Decision trees are able to model nonlinear data while remaining interpretable. Read more...
NNs are similar to SVMs in that they project the data to a higher-dimensional space and fit a hyperplane to the data in the projected space. However, whereas SVMs use a predetermined kernel to project the data, NNs automatically construct their own projection. Read more...
A Support Vector Machine (SVM) computes the “best” separation between classes as the maximum-margin hyperplane. Read more...
In linear regression, we model the target as a random variable whose expected value depends on a linear combination of the predictors (including a bias term). Read more...
To visualize the relationship between the MAP and MLE estimations, one can imagine starting at the MLE estimation, and then obtaining the MAP estimation by drifting a bit towards higher density in the prior distribution. Read more...
Naive Bayes classification naively assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Read more...
Sure, accelerating via self-study is not as optimal as accelerating within teacher-managed courses, but it’s way better than not accelerating at all. Read more...
There’s only so much fun you can have trying to follow another person’s footsteps to arrive at a known solution. There’s only so much confidence you can build from fighting against a problem that someone else has intentionally set up to be well-posed and elegantly solvable if you think about it the right way. Read more...
My training has been scattered and fuzzy until recently. Here’s the whole story. Read more...
I’d start off with some introductory course that covers the very basics of coding in some language that is used by many professional programmers but where the syntax reads almost like plain English and lower-level details like memory management are abstracted away. Then, I’d jump right into building board games and strategic game-playing agents (so a human can play against the computer), starting with simple games (e.g. tic-tac-toe) and working upwards from there (maybe connect 4 next, then checkers, and so on). Read more...
Solving problems, building on top of what you’ve learned, reviewing what you’ve learned, and quality, quantity, and spacing of practice. Read more...
An oval () fits inside a rectangle [ ] with the same width and height. Read more...
In general, you can manipulate total derivatives like fractions, but you can’t do the same with partial derivatives. Read more...
Enter grades early on, and (if pre-college) email parents early on. Read more...
Imitating without analyzing produces a robot / ape who can’t think critically; analyzing without imitating produces a critic who can’t act on their own advice. Read more...
Initial parameter range, data sampling range, severity of regularization. Read more...
… is to present a problem where known simpler techniques fail. Read more...
I’d start off with some introductory course that covers the very basics of coding in some language that is used by many professional programmers but where the syntax reads almost like plain English and lower-level details like memory management are abstracted away. Then, I’d jump right into building board games and strategic game-playing agents (so a human can play against the computer), starting with simple games (e.g. tic-tac-toe) and working upwards from there (maybe connect 4 next, then checkers, and so on). Read more...
For many (but not all) students, the answer is yes. And for many of those students, automation can unlock life-changing educational outcomes. Read more...
As you climb the levels of math, sources of educational friction conspire against you and eventually throw you off the train. And one of the first warning signs is when you stop understanding things at the core, and instead try to memorize special cases cookbook-style. Read more...
Why it’s common for students to pass courses despite severely lacking knowledge of the content. Read more...
If you look at the kinds of math that most quantitative professionals use on a daily basis, competition math tricks don’t show up anywhere. But what does show up everywhere is university-level math subjects. Read more...
While some may view Feynman-style pedagogy as supporting inclusive learning for all students across varying levels of ability, Feynman himself acknowledged that his methods only worked for the top 10% of his students. Read more...
Speaking as someone who had to suffer through a teacher credentialing program… it’s actually an anti-signal when someone references their teaching credential as a qualification to speak about how learning happens. It’s centered around political ideology rather than the science of learning. Read more...
Two subtypes of coders that I watched students grow into. Read more...
Perform the desired transformation on the identity matrix to get a left-multiplier, and maybe transpose the output. Read more...
The matrix exponential can be defined as a power series and used to solve systems of linear differential equations. Read more...
Jordan form provides a guaranteed backup plan for exponentiating matrices that are non-diagonalizable. Read more...
Matrix diagonalization can be applied to construct closed-form expressions for recursive sequences. Read more...
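As a small illustration, assuming the Fibonacci sequence as the example (the excerpt does not name a particular sequence): the recursion F(n+1) = F(n) + F(n-1) is driven by the matrix [[1, 1], [1, 0]], whose eigenvalues are the roots of λ² − λ − 1 = 0. Diagonalizing that matrix leads to the closed form checked below. The function names are hypothetical.

```python
import math


def fib_by_recursion(n):
    """Apply the recursive rule F(n+1) = F(n) + F(n-1) step by step."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_closed_form(n):
    """Closed form obtained by diagonalizing [[1, 1], [1, 0]].

    phi and psi are the two eigenvalues of that matrix. Floating-point
    rounding limits this sketch to moderately sized n.
    """
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    psi = (1 - sqrt5) / 2
    return round((phi ** n - psi ** n) / sqrt5)


# The two approaches agree on the first 40 terms.
print(all(fib_by_recursion(n) == fib_closed_form(n) for n in range(40)))  # True
```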
The eigenvectors of a matrix are those vectors that the matrix simply rescales, and the factor by which an eigenvector is rescaled is called its eigenvalue. These concepts can be used to quickly calculate large powers of matrices. Read more...
The inverse of a matrix is a second matrix which undoes the transformation of the first matrix. Read more...
Every square matrix can be decomposed into a product of rescalings and shears. Read more...
How to multiply a matrix by another matrix. Read more...
Matrices are vectors whose components are themselves vectors. Read more...
Solving linear systems can sometimes be a necessary component of solving nonlinear systems. Read more...
Shearing can be used to express the solution of a linear system using ratios of volumes, and also to compute volumes themselves. Read more...
Rich intuition about why the number of solutions to a square linear system is governed by the volume of the parallelepiped formed by the coefficient vectors. Read more...
N-dimensional volume generalizes the idea of the space occupied by an object. We can think about N-dimensional volume as being enclosed by N-dimensional vectors. Read more...
If we interpret linear systems as sets of vectors, then elimination corresponds to vector reduction. Read more...
The span of a set of vectors consists of all vectors that can be made by adding multiples of vectors in the set. We can often reduce a set of vectors to a simpler set with the same span. Read more...
A line starts at an initial point and proceeds straight in a constant direction. A plane is a flat sheet that makes a right angle with some particular vector. Read more...
What does it mean to multiply a vector by another vector? Read more...
N-dimensional space consists of points that have N components. Read more...
We do it all manually, entirely by hand. Read more...
And the problem with many existing times tables practice systems. Read more...
I was coming in with the mindset of “we need to cover the superset of all the content covered in the major textbooks,” which we’re able to do quite well for traditional math. For ML, the rule will have to be amended to “we need to cover the superset of all the content covered in standard university course syllabi.” Read more...
To quote a Math Academy student: “The fastest and most rigorous progress will be made by individuals in front of their computers.” Read more...
Once you get past steps 1-3, it’s hard to find scaffolding. You can’t just enroll in a course or pick up a textbook. The scaffolding comes from finding a mentor on a mission that you identify with and are well-suited to contribute to. And it can take a lot of searching to find that person and problem area that’s the right fit. Read more...
And why we refer to ourselves as still being “in beta.” Read more...
Even if students are working on exactly the right things, they need to be working exactly the right way to capture the most learning from their time spent working. Read more...
Around 50-60 XP/day, that is, 50-60 minutes of serious practice per day. Just like the high-end amount of daily exercise you’d expect from people who keep a consistent exercise routine at the gym. Read more...
834 XP = 834 minutes = 14 hours of work in a single day. You’re probably wondering, what kind of person does that much math in a day? Time for a little story. Read more...
Learning math with little computation is like learning basketball with little practice on dribbling & ball handling techniques. Read more...
I learned from those kinds of resources myself, and while I came a long way, for the amount of effort I put into learning, I could have gone a lot further if my time were used more efficiently. That’s the problem that Math Academy solves. Read more...
Our AI system is one of those things that sounds intuitive enough at a high level, but if you start trying to implement it yourself, you quickly run into a mountain of complexity, numerous edge cases, and lots of counterintuitive low-level phenomena that take a while to fully wrap your head around. Read more...
During its operation from 2020 to 2023, Eurisko was the most advanced high school math/CS sequence in the USA. It culminated in high school students doing masters/PhD-level coursework (reproducing academic research papers in artificial intelligence, building everything from scratch in Python). Read more...
Two subtypes of coders that I watched students grow into. Read more...
In 9 months, these students went from initially not knowing how to write helper functions to building a machine learning library from scratch. Read more...
Using convolutional layers to create an even better checkers player. Read more...
Extending Fogel’s tic-tac-toe player to the game of checkers. Read more...
Reimplementing the paper that laid the groundwork for Blondie24. Read more...
A method for training neural networks that works even when training feedback is sparse. Read more...
Combining game-specific human intelligence (heuristics) and generalizable artificial intelligence (minimax on a game tree). Read more...
Repeatedly choosing the action with the best worst-case scenario. Read more...
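A minimal sketch of that idea on a made-up game tree, assuming nested lists stand for decision points and integers stand for final scores from the first player's point of view (the tree and the names `minimax` and `best_action` are hypothetical):

```python
def minimax(node, maximizing):
    """Return the score the player to move can guarantee from this node."""
    if isinstance(node, int):  # leaf: the game is over, so report its score
        return node
    child_scores = [minimax(child, not maximizing) for child in node]
    # The maximizer picks the highest guaranteed score; the opponent the lowest.
    return max(child_scores) if maximizing else min(child_scores)


def best_action(tree):
    """Index of the root action whose worst-case outcome is best."""
    worst_cases = [minimax(child, maximizing=False) for child in tree]
    return max(range(len(tree)), key=lambda i: worst_cases[i])


# Three possible first moves, each followed by two possible opponent replies.
toy_tree = [[3, 12], [2, 8], [14, 1]]
print(minimax(toy_tree, maximizing=True))  # 3
print(best_action(toy_tree))               # 0
```

The third move offers the highest possible score (14), but its worst case is 1, so the first move, with a worst case of 3, is chosen instead.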
Building data structures that represent all the possible outcomes of a game. Read more...
A convenient technique for computing gradients in neural networks. Read more...
The deeper or more “hierarchical” a computational graph is, the more complex the model that it represents. Read more...
We can algorithmically build classifiers that use a sequence of nested “if-then” decision rules. Read more...
Computing spatial relationships between nodes when edges no longer represent unit distances. Read more...
Using traversals to understand spatial relationships between nodes in graphs. Read more...
Graphs show up all the time in computer science, so it’s important to know how to work with them. Read more...
It should look less like them helping you and more like you helping them. Read more...
1) Learn SQL and how to use a debugger. 2) Never come up empty-handed, even if you don’t fix the bug. Read more...
You can be the most committed and capable workhorse on the planet, but if you’re on the wrong team, the only thing you’ll change is your team’s allocation of work. Read more...
One main focus, one semi-focus, and everything else a hobby with whatever time you have left over. Read more...
1) Difficulty grappling with complexity when it grows so big that you can’t fit everything in your head. 2) Lack of understanding or willingness to accept practical constraints of the problem and incorporate them into the solution. 3) Getting distracted by low-ROI features/details. 4) Being unwilling to do “tedious” work. Read more...
Depending on your goals, either A) methods of proof, or B) linear algebra followed by probability & statistics. Read more...
Hardcore skill development is necessary to do big things, it’s one of the greatest social mobility hacks, and it gives you the ability/confidence to take risks knowing that you’ll be okay. Read more...
Get yourself into an area that requires deep domain expertise, working on things that haven’t been done or even thoroughly imagined yet. Read more...
Write code that makes complicated decisions, often involving some kind of inference. Read more...
It comes out to roughly a fortieth of that of a truck. Read more...
String art works because the strings are tangent lines to a curve. Read more...
Calculus can show us how our intuition can fail us, a common theme in philosophy. Read more...
Deriving the “Pert” formula. Read more...
If we know the revenue and costs associated with producing any number of units, then we can use calculus to figure out the number of units to produce for maximum profit. Read more...
Calculus can be used to find the parameters that minimize a function. Read more...
Physics engines use calculus to periodically update the locations of objects. Read more...
Introducing Kajiya’s rendering equation. Read more...
Deriving the ideal rocket equation. Read more...
Deriving the Gompertz function. Read more...
Understanding why even slight narrowing of arteries can pose such a big problem to blood flow. Read more...
Measuring the volume of blood that the heart pumps out into the aorta per unit time. Read more...
Equations involving compositions of trigonometric functions can create wild patterns in the plane. Read more...
Lissajous curves use sine functions to create interesting patterns in the plane. Read more...
Absolute value graphs can be rotated to draw stars. Read more...
Non-euclidean ellipses can be used to draw starry-eye sunglasses. Read more...
Euclidean ellipses can be combined with sine wave shading to form three-dimensional shells. Read more...
High-frequency sine waves can be used to draw shaded regions. Read more...
Roots can be used to draw deer. Read more...
Sine waves can be used to draw scales on a fish. Read more...
Parabolas can be used to draw a fish. Read more...
Absolute value can be used to draw a person. Read more...
Slanted lines can be used to draw a spider web. Read more...
Horizontal and vertical lines can be used to draw a castle. Read more...
… is asking students to perform activities that leverage a non-existent knowledge base. Read more...
The need for automaticity on low-level skills is obvious to anyone with experience learning a sport or instrument. So why is there sometimes resistance in education? It makes sense if you think about what people usually find persuasive. Read more...
Students eat meals of information at similar bite rates when each spoonful fed to them is sized appropriately relative to the size of their mouth. (Note that equal bite rates do not imply equal rates of food volume intake.) Read more...
What are LLMs good for in STEM education? Where do LLMs fall short, and why? What does an educational AI need to do for its students to succeed? Read more...
Many students who pattern-match will tend to prefer solutions requiring fewer and simpler operations, especially if those solutions yield ballpark-reasonable results. Read more...
It rests on a critical assumption that the amount of learning that occurs during initial instruction is zero or otherwise negligible, which is not always true. Read more...
The fuzzier that memory, the harder it is to lift. The wait creates the weight. Read more...
The habit is a psychological force field that protects you from all sorts of negative feelings that try to dissuade you from training. Read more...
If you try to keep information close by taking great notes that you can reference all the time… that just PREVENTS you from truly retaining it. Read more...
With the science of learning, it’s less about “keeping up” with what’s happening, and more about “catching up” with what’s already happened. Read more...
Accumulating mathematical knowledge gaps can lead students to reach a tipping point where further learning becomes overwhelming, ultimately causing them to abandon math entirely. Read more...
“…[D]eliberate practice requires effort and is not inherently enjoyable. Individuals are motivated to practice because practice improves performance.” Read more...
If you understand the interplay between working memory and long-term memory, then you can actually derive – from first principles – the methods of effective teaching. Read more...
Hard-coding explanations feels tedious, takes a lot of work, and isn’t “sexy” like an AI that generates responses from scratch – but at least it’s not a pipe dream. It’s a practical solution that lets you move on to other components of the AI that are just as important. Read more...
If all the knowledge you show up with is high school math and AP Calculus, and you’re not a genius, then you’re going to get your ass handed to you. Read more...
There are numerous cognitive learning strategies that 1) can be used to massively improve learning, 2) have been reproduced so many times they might as well be laws of physics, and 3) connect all the way down to the mechanics of what’s going on in the brain. Read more...
Solving equations feels smooth when basic arithmetic is automatic – it’s like moving puzzle pieces around, and you just need to identify how they fit together. But without automaticity on basic arithmetic, each puzzle piece is a heavy weight. You struggle to move them at all, much less figure out where they’re supposed to go. Read more...
But in talent development, the optimization problem is clear: an individual’s performance is to be maximized, so the methods used during practice are those that most efficiently convert effort into performance improvements. Read more...
A silly bug turned genius hack. Read more...
The type of ensemble model that wins most data science competitions is the stacked model, which consists of an ensemble of entirely different species of models together with some combiner algorithm. Read more...
Decision trees are able to model nonlinear data while remaining interpretable. Read more...
NNs are similar to SVMs in that they project the data to a higher-dimensional space and fit a hyperplane to the data in the projected space. However, whereas SVMs use a predetermined kernel to project the data, NNs automatically construct their own projection. Read more...
A Support Vector Machine (SVM) computes the “best” separation between classes as the maximum-margin hyperplane. Read more...
In linear regression, we model the target as a random variable whose expected value depends on a linear combination of the predictors (including a bias term). Read more...
To visualize the relationship between the MAP and MLE estimations, one can imagine starting at the MLE estimation, and then obtaining the MAP estimation by drifting a bit towards higher density in the prior distribution. Read more...
Naive Bayes classification naively assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Read more...
An idea for a paper that I don’t currently have the bandwidth to write. Read more...
Students eat meals of information at similar bite rates when each spoonful fed to them is sized appropriately relative to the size of their mouth. (Note that equal bite rates do not imply equal rates of food volume intake.) Read more...
It rests on a critical assumption that the amount of learning that occurs during initial instruction is zero or otherwise negligible, which is not always true. Read more...
Spaced repetition is complicated in hierarchical bodies of knowledge, like mathematics, because repetitions on advanced topics should “trickle down” to update the repetition schedules of simpler topics that are implicitly practiced (while being discounted appropriately since these repetitions are often too early to count for full credit towards the next repetition). However, I developed a model of Fractional Implicit Repetition (FIRe) that not only accounts for implicit “trickle-down” repetitions but also minimizes the number of reviews by choosing reviews whose implicit repetitions “knock out” other due reviews (like dominos), and calibrates the speed of the spaced repetition process to each individual student on each individual topic (student ability and topic difficulty are competing factors). Read more...
In a simplified problem framing, we investigate the (game-theoretical) usefulness of limiting the number of social connections per person. Read more...
We solve a special case of how to periodically stimulate a biological neural network to obtain a desired connectivity (in theory). Read more...
Implementation notes for STDP learning in a network of Hodgkin-Huxley simulated neurons. Read more...
Many existing proofs are not accessible to young mathematicians or those without experience in the realm of dynamic systems. Read more...
And a proof via double induction. Read more...
When a limit takes the indeterminate form of zero divided by zero or infinity divided by infinity, we can differentiate the numerator and denominator separately without changing the actual value of the limit. Read more...
We can interpret the derivative as an approximation for how a function’s output changes, when the function input is changed by a small amount. Read more...
Derivatives can be used to find a function’s local extreme values, its peaks and valleys. Read more...
There are convenient rules for the derivatives of exponential, logarithmic, trigonometric, and inverse trigonometric functions. Read more...
Given a sum, we can differentiate each term individually. But why are we able to do this? Does multiplication work the same way? What about division? Read more...
When taking derivatives of compositions of functions, we can ignore the inside of a function as long as we multiply by the derivative of the inside afterwards. Read more...
There are some patterns that allow us to compute derivatives without having to compute the limit of the difference quotient. Read more...
The derivative of a function is the function’s slope at a particular point, and can be computed as the limit of the difference quotient. Read more...
Various tricks for evaluating tricky limits. Read more...
The limit of a function, as the input approaches some value, is the output we would expect if we saw only the surrounding portion of the graph. Read more...
A silly bug turned genius hack. Read more...
834 XP = 834 minutes = 14 hours of work in a single day. You’re probably wondering, what kind of person does that much math in a day? Time for a little story. Read more...
Won first place in a state-level competition by finding and exploiting a loophole in the points scoring logic. Read more...
The most important things I learned from competing in science fairs had nothing to do with physics or even academics. My main takeaways were actually related to business – in particular, sales and marketing. Read more...
During its operation from 2020 to 2023, Eurisko was the most advanced high school math/CS sequence in the USA. It culminated in high school students doing masters/PhD-level coursework (reproducing academic research papers in artificial intelligence, building everything from scratch in Python). Read more...
Minor changes to increase workout intensity and caloric surplus. Read more...
Speaking as someone who had to suffer through a teacher credentialing program… it’s actually an anti-signal when someone references their teaching credential as a qualification to speak about how learning happens. It’s centered around political ideology rather than the science of learning. Read more...
Daily 20-30 minute bedroom workout with gymnastic rings hanging from pull-up bar – just as much challenge as weights, but inexpensive and easily portable. Read more...
Two subtypes of coders that I watched students grow into. Read more...
An aha moment with object-oriented programming. Read more...
A technique for maximizing linear expressions subject to linear constraints. Read more...
Under the hood, dictionaries are hash tables. Read more...
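As a rough illustration of the idea, here is a minimal sketch of a toy hash table. It assumes separate chaining (a list of key-value pairs per bucket) for simplicity, which is not how Python's built-in `dict` is implemented, and the class and method names are hypothetical.

```python
class HashTable:
    """Toy hash table using separate chaining (an assumption made for
    illustration; Python's built-in dict uses a different scheme)."""

    def __init__(self, num_buckets=8):
        # Each bucket holds a list of (key, value) pairs.
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash value picks which bucket the key lives in.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def find(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = HashTable()
table.insert("a", 1)
table.insert("b", 2)
table.insert("a", 3)  # replaces the earlier value stored under "a"
```

A lookup only scans one bucket rather than every stored pair, which is where the speed of a hash table comes from.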
Implementing a differential equations model that won the Nobel prize. Read more...
A simple differential equations model that we can plot using multivariable Euler estimation. Read more...
Arrays can be used to implement more than just matrices. We can also implement other mathematical procedures like Euler estimation. Read more...
One of the best ways to get practice with object-oriented programming is implementing games. Read more...
Guess some initial clusters in the data, and then repeatedly update the guesses to make the clusters more cohesive. Read more...
You can use the RREF algorithm to compute determinants much faster than with the recursive cofactor expansion method. Read more...
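A minimal sketch of the row-reduction approach, under some assumptions: it stops at row echelon form rather than full RREF (enough for the determinant), it works in floating point, and the function name `determinant` is hypothetical. Each row swap flips the sign of the determinant, and the determinant is the signed product of the pivots.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination (row echelon form)."""
    rows = [[float(x) for x in row] for row in matrix]
    n = len(rows)
    det = 1.0
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if abs(rows[r][col]) > 1e-12), None)
        if pivot is None:
            return 0.0  # no pivot in this column, so the matrix is singular
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            det = -det  # swapping two rows negates the determinant
        det *= rows[col][col]
        # Clear the entries below the pivot; this does not change the determinant.
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n):
                rows[r][c] -= factor * rows[col][c]
    return det
```

Elimination takes on the order of n^3 arithmetic operations, whereas recursive cofactor expansion grows like n factorial.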
We can use arrays to implement matrices and their associated mathematical operations. Read more...
1) Foundational math. 2) Classical machine learning. 3) Deep learning. 4) Cutting-edge machine learning. Read more...
What are LLMs good for in STEM education? Where do LLMs fall short, and why? What does an educational AI need to do for its students to succeed? Read more...
A walkthrough of solving Tower of Hanoi using the approach of one of the earliest AI systems. Read more...
Media outlets often make the mistake of anthropomorphizing or attributing human-like characteristics to computer programs. Read more...
As computation power increased, neural networks began to take center stage in AI. Read more...
Expert systems stored “if-then” rules derived from the knowledge of experts. Read more...
Framing reasoning as searching through a maze of actions for a sequence that achieves the desired end goal. Read more...
Turing test, games, hype, narrow vs general AI. Read more...
A convenient technique for computing gradients in neural networks. Read more...
The deeper or more “hierarchical” a computational graph is, the more complex the model that it represents. Read more...
In many real-life situations, there is more than one input variable that controls the output variable. Read more...
Gradient descent can help us avoid pitfalls that occur when fitting nonlinear models using the pseudoinverse. Read more...
Just because a model appears to match closely with points in the data set does not necessarily mean it is a good model. Read more...
Transforming nonlinear functions so that we can fit them using the pseudoinverse. Read more...
Exploring the most general class of functions that can be fit using the pseudoinverse. Read more...
Using matrix algebra to fit simple functions to data sets. Read more...
Bridging the communication gap between academia and industry in the field of TDA. Read more...
Demonstrating an open-source implementation of persistent homology techniques in the TDA package for R. Read more...
Persistent homology provides a way to quantify the topological features that persist over a data set’s full range of scale. Read more...
At Aunalytics, Mapper outperformed hierarchical clustering in providing granular insights. Read more...
Ayasdi developed commercial Mapper software and sells a subscription service to clients who wish to create topological network visualizations of their data. Read more...
Demonstrating an open-source implementation of Mapper in the TDAmapper package for R. Read more...
Representing a data space’s topology by converting it into a network. Read more...
Media outlets often make the mistake of anthropomorphizing or attributing human-like characteristics to computer programs. Read more...
As computation power increased, neural networks began to take center stage in AI. Read more...
Expert systems stored “if-then” rules derived from the knowledge of experts. Read more...
Framing reasoning as searching through a maze of actions for a sequence that achieves the desired end goal. Read more...
Turing test, games, hype, narrow vs general AI. Read more...
Nobody came out of the dispute well. Read more...
When Joseph Fourier first introduced Fourier series, they gave mathematicians nightmares. Read more...
When we know the solutions of a linear differential equation with constant coefficients and right hand side equal to zero, we can use variation of parameters to find a solution when the right hand side is not equal to zero. Read more...
Integrating factors can be used to solve first-order differential equations with non-constant coefficients. Read more...
Undetermined coefficients can help us find a solution to a linear differential equation with constant coefficients when the right hand side is not equal to zero. Read more...
Given a linear differential equation with constant coefficients and a right hand side of zero, the roots of the characteristic polynomial correspond to solutions of the equation. Read more...
Non-separable differential equations can sometimes be converted into separable differential equations by way of substitution. Read more...
When faced with a differential equation that we don’t know how to solve, we can sometimes still approximate the solution. Read more...
The simplest differential equations can be solved by separation of variables, in which we move the derivative to one side of the equation and take the antiderivative. Read more...
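For instance, a short worked example (the equation $dy/dx = xy$ is chosen here purely for illustration) might look like this:

```latex
\begin{align*}
\frac{dy}{dx} &= xy \\
\frac{1}{y}\,dy &= x\,dx && \text{(separate the variables, assuming } y \neq 0\text{)} \\
\int \frac{1}{y}\,dy &= \int x\,dx \\
\ln|y| &= \frac{x^2}{2} + C \\
y &= A e^{x^2/2} && \text{(where } A = \pm e^{C}\text{)}
\end{align*}
```

The constant solution $y = 0$ was excluded when dividing by $y$, and it is recovered by allowing $A = 0$.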
The network becomes book-smart in a particular area but not street-smart in general. The training procedure is like a series of exams on material within a tiny subject area (your data subspace). The network refines its knowledge in the subject area to maximize its performance on those exams, but it doesn’t refine its knowledge outside that subject area. And that leaves it gullible to adversarial examples using inputs outside the subject area. Read more...
Using convolutional layers to create an even better checkers player. Read more...
Extending Fogel’s tic-tac-toe player to the game of checkers. Read more...
Reimplementing the paper that laid the groundwork for Blondie24. Read more...
A method for training neural networks that works even when training feedback is sparse. Read more...
A convenient technique for computing gradients in neural networks. Read more...
The deeper or more “hierarchical” a computational graph is, the more complex the model that it represents. Read more...
We solve a special case of how to periodically stimulate a biological neural network to obtain a desired connectivity (in theory). Read more...
A workbook I created to explain the math and physics behind an Iron Man suit to a student who was interested in the comics / movies. Read more...
A workbook I created to explain the math and physics behind an egg drop experiment to a student who was interested in Lord of the Rings and Star Wars. Read more...
A brief overview of sound waves and how they interact with things. Read more...
A brief overview of the experimental search for dark matter (XENON, CDMS, PICASSO, COUPP). Read more...
Mass discrepancies in galaxies and clusters, cosmic background radiation, the structure of the universe, and big bang nucleosynthesis’s impact on baryon density. Read more...
Improper integrals have bounds or function values that extend to positive or negative infinity. Read more...
We can apply integration by parts whenever an integral would be made simpler by differentiating some expression within the integral, at the cost of anti-differentiating another expression within the integral. Read more...
Substitution involves condensing an expression into a single new variable, and then expressing the integral in terms of that new variable. Read more...
To evaluate a definite integral, we find the antiderivative, evaluate it at the indicated bounds, and then take the difference. Read more...
The antiderivative of a function is a second function whose derivative is the first function. Read more...
Integrals give the area under a portion of a function. Read more...
Systems of quadratic equations can be solved via substitution. Read more...
To easily graph a quadratic equation, we can convert it to vertex form. Read more...
Completing the square helps us gain a better intuition for quadratic equations and understand where the quadratic formula comes from. Read more...
To solve hard-to-factor quadratic equations, it’s easiest to use the quadratic formula. Read more...
Factoring is a method for solving quadratic equations. Read more...
Quadratic equations are similar to linear equations, except that they contain squares of a single variable. Read more...
Many differential equations don’t have solutions that can be expressed in terms of finite combinations of familiar functions. However, we can often solve for the Taylor series of the solution. Read more...
To find the Taylor series of complicated functions, it’s often easiest to manipulate the Taylor series of simpler functions. Read more...
Many non-polynomial functions can be represented by infinite polynomials. Read more...
Various tricks for determining whether a series converges or diverges. Read more...
A geometric series is a sum where each term is some constant times the previous term. Read more...
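As a quick sketch of why such a sum has a closed form, write the first term as $a$ and the constant ratio as $r \neq 1$ (symbols introduced here for illustration):

```latex
\begin{align*}
S_N &= a + ar + ar^2 + \cdots + ar^N \\
r S_N &= ar + ar^2 + \cdots + ar^N + ar^{N+1} \\
S_N - r S_N &= a - ar^{N+1} \\
S_N &= \frac{a\left(1 - r^{N+1}\right)}{1 - r}
\end{align*}
```

When $|r| < 1$, the term $r^{N+1}$ shrinks to zero as $N \to \infty$, so the infinite series converges to $\frac{a}{1-r}$. For example, $1 + \frac{1}{2} + \frac{1}{4} + \cdots = \frac{1}{1 - 1/2} = 2$.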
Using convolutional layers to create an even better checkers player. Read more...
Extending Fogel’s tic-tac-toe player to the game of checkers. Read more...
Reimplementing the paper that laid the groundwork for Blondie24. Read more...
A method for training neural networks that works even when training feedback is sparse. Read more...
Combining game-specific human intelligence (heuristics) and generalizable artificial intelligence (minimax on a game tree). Read more...
One of the best ways to get practice with object-oriented programming is implementing games. Read more...
An oval () fits inside a rectangle [ ] with the same width and height. Read more...
Is there a standard “order of operations” for parallel vs nested absolute value expressions, in the absence of clarifying notation? Read more...
Drawing –> LaTeX commands –> ChatGPT summary –> Google more info. Read more...
There’s a cognitive principle behind this: associative interference, the phenomenon that conceptually related pieces of knowledge can interfere with each other’s recall. Read more...
Start out with a volume of work that’s small enough that you don’t dread doing it again the next day. Read more...
Regret minimization cuts both ways. Read more...
The whole idea is that you want the other person to raise the bar on competition and pass you up, so that you’re motivated to come right back and do the same to them. Read more...
Bridging the communication gap between academia and industry in the field of TDA. Read more...
At Aunalytics, Mapper outperformed hierarchical clustering in providing granular insights. Read more...
Ayasdi developed commercial Mapper software and sells a subscription service to clients who wish to create topological network visualizations of their data. Read more...
Demonstrating an open-source implementation of Mapper in the TDAmapper package for R. Read more...
Representing a data space’s topology by converting it into a network. Read more...
A linear system consists of multiple linear equations, and the solution of a linear system consists of the pairs that satisfy all of the equations. Read more...
Standard form makes it easy to see the intercepts of a line. Read more...
An easy way to write the equation of a line if we know the slope and a point on a line. Read more...
Introducing linear equations in two variables. Read more...
Loosely speaking, a linear equation is an equality statement containing only addition, subtraction, multiplication, and division. Read more...
A slant asymptote is a slanted line that arises from a linear term in the proper form of a rational function. Read more...
If we choose one input on each side of an asymptote, we can tell which section of the plane the function will occupy. Read more...
Vertical asymptotes are vertical lines that a function approaches but never quite reaches. Read more...
Rational functions can have a form of end behavior in which they become flat, approaching (but never quite reaching) a horizontal line known as a horizontal asymptote. Read more...
Polynomial long division works the same way as the long division algorithm that’s familiar from simple arithmetic. Read more...
A piecewise function is pieced together from multiple different functions. Read more...
Trigonometric functions represent the relationship between sides and angles in right triangles. Read more...
Absolute value represents the magnitude of a number, i.e. its distance from zero. Read more...
Exponential functions have variables as exponents. Logarithms cancel out exponentiation. Read more...
Radical functions involve roots: square roots, cube roots, or any kind of fractional exponent in general. Read more...
Compositions of functions consist of multiple functions linked together, where the output of one function becomes the input of another function. Read more...
Inverting a function entails reversing the outputs and inputs of the function. Read more...
When a function is reflected, it flips across one of the axes to become its mirror image. Read more...
When a function is rescaled, it is stretched or compressed along one of the axes, like a slinky. Read more...
When a function is shifted, all of its points move vertically and/or horizontally by the same amount. Read more...
If we interpret linear systems as sets of vectors, then elimination corresponds to vector reduction. Read more...
The span of a set of vectors consists of all vectors that can be made by adding multiples of vectors in the set. We can often reduce a set of vectors to a simpler set with the same span. Read more...
A line starts at an initial point and proceeds straight in a constant direction. A plane is a flat sheet that makes a right angle with some particular vector. Read more...
What does it mean to multiply a vector by another vector? Read more...
N-dimensional space consists of points that have N components. Read more...
The inverse of a matrix is a second matrix which undoes the transformation of the first matrix. Read more...
Every square matrix can be decomposed into a product of rescalings and shears. Read more...
How to multiply a matrix by another matrix. Read more...
Matrices are vectors whose components are themselves vectors. Read more...
Implementing a differential equations model that won the Nobel prize. Read more...
A simple differential equations model that we can plot using multivariable Euler estimation. Read more...
Arrays can be used to implement more than just matrices. We can also implement other mathematical procedures like Euler estimation. Read more...
How to sample from a discrete probability distribution. Read more...
Estimating probabilities by simulating a large number of random experiments. Read more...
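A minimal sketch of the idea, assuming a made-up example event (two dice summing to 7, whose exact probability is 1/6). The function names and the fixed seed are illustrative choices, not taken from the post.

```python
import random


def estimate_probability(event, trials=100_000, seed=0):
    """Estimate P(event) as the fraction of simulated trials in which it occurs."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(event(rng) for _ in range(trials))
    return hits / trials


def two_dice_sum_to_seven(rng):
    """One random experiment: roll two dice and check whether they sum to 7."""
    return rng.randint(1, 6) + rng.randint(1, 6) == 7


estimate = estimate_probability(two_dice_sum_to_seven)
```

The estimate gets closer to the true probability as the number of trials grows, with the typical error shrinking roughly like one over the square root of the number of trials.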
Just like single-variable gradient descent, except that we replace the derivative with the gradient vector. Read more...
We take an initial guess as to what the minimum is, and then repeatedly use the gradient to nudge that guess further and further “downhill” into an actual minimum. Read more...
Bisection search involves repeatedly moving one bound halfway to the other. The Newton-Raphson method involves repeatedly moving our guess to the root of the tangent line. Read more...
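Both methods can be sketched in a few lines. This is a minimal illustration that assumes the function changes sign on the starting interval (for bisection) and has a nonzero derivative near the root (for Newton-Raphson). The function names, tolerances, and the example $f(x) = x^2 - 2$ are all illustrative.

```python
def bisection(f, lo, hi, tol=1e-10):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Move one bound to the midpoint, keeping the half that contains the sign change.
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def newton_raphson(f, df, x, steps=20):
    """Repeatedly move the guess x to the root of the tangent line at x."""
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x


# Example: the positive root of x^2 - 2 is the square root of 2.
f = lambda x: x**2 - 2
df = lambda x: 2 * x
root_bisection = bisection(f, 0, 2)
root_newton = newton_raphson(f, df, 1.0)
```

Bisection halves the search interval on every step and cannot escape its bracket, while Newton-Raphson typically converges in far fewer steps but can fail if the starting guess is poor.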
Backtracking can drastically cut down the number of possibilities that must be checked during brute force. Read more...
Brute force search involves trying every single possibility. Read more...
Students eat meals of information at similar bite rates when each spoonful fed to them is sized appropriately relative to the size of their mouth. (Note that equal bite rates do not imply equal rates of food volume intake.) Read more...
Solving problems, building on top of what you’ve learned, reviewing what you’ve learned, and quality, quantity, and spacing of practice. Read more...
It rests on a critical assumption that the amount of learning that occurs during initial instruction is zero or otherwise negligible, which is not always true. Read more...
First, you want to form a habit. Second, you want to operate at peak productivity during your session. Third, you want to minimize the amount you forget between sessions. Read more...
Deliberate practice is the most effective form of active learning. It consists of individualized training activities specially chosen to improve specific aspects of a student’s performance through repetition and successive refinement. It is the opposite of mindless repetition. The amount of deliberate practice has been shown to be one of the most prominent underlying factors responsible for individual differences in performance across numerous fields, even among highly talented elite performers. Deliberate practice demands effort and intensity, and may be uncomfortable, but when sustained over the long term it compounds incremental improvements into expertise. Read more...
Active learning leads to more neural activation than passive learning. Automaticity involves developing strategic neural connections that reduce the amount of effort that the brain has to expend to activate patterns of neurons. Read more...
During practice, the elite skaters were over 6 times more active than passive, while non-competitive skaters were nearly as passive as they were active. Read more...
A startup spent months building a sophisticated lecture tool and raising over half a million dollars in investments – but after observing students in the lecture hall, they completely abandoned the product and called up their investors to return the money. Read more...
True active learning requires every individual student to be actively engaged on every piece of the material to be learned. Read more...
Active learning leads to more neural activation than passive learning. Automaticity involves developing strategic neural connections that reduce the amount of effort that the brain has to expend to activate patterns of neurons. Read more...
Cognition involves the flow of information through sensory, working, and long-term memory banks in the brain. Sensory memory temporarily holds raw data, working memory manipulates and organizes information, and long-term memory stores it indefinitely by creating strategic electrical wiring between neurons. Learning amounts to increasing the quantity, depth, retrievability, and generalizability of concepts and skills in a student’s long-term memory. Limited working memory capacity creates a bottleneck in the transfer of information into long-term memory, but cognitive learning strategies can be used to mitigate the effects of this bottleneck. Read more...
The brain is a neuronal network integrating specialized subsystems that use local competition and thresholding to sparsify input, spike-timing dependent plasticity to learn inference, and layering to implement hierarchical predictive learning. Read more...
We solve a special case of how to periodically stimulate a biological neural network to obtain a desired connectivity (in theory). Read more...
The limit of a function is the height where it looks like the scribble is going to hit a particular vertical line. Read more...
To solve a system of inequalities, we need to solve each individual inequality and find where all their solutions overlap. Read more...
Quadratic inequalities are best visualized in the plane. Read more...
When a linear inequality has two variables, the solution covers a section of the coordinate plane. Read more...
An inequality is similar to an equation, but instead of saying two quantities are equal, it says that one quantity is greater than or less than another. Read more...
We can sketch the graph of a polynomial using its end behavior and zeros. Read more...
The rational roots theorem can help us find zeros of polynomials without blindly guessing. Read more...
The zeros of a polynomial are the inputs that cause it to evaluate to zero. Read more...
The end behavior of a polynomial refers to the type of output that is produced when we input extremely large positive or negative values. Read more...
Rather than duplicating such code each time we want to use it, it is more efficient to store the code in a function. Read more...
We often wish to tell the computer instructions involving the words “if,” “while,” and “for.” Read more...
We can store many related pieces of data within a single variable called a data structure. Read more...
We can store and manipulate data in the form of variables. Read more...
Solving linear systems can sometimes be a necessary component of solving nonlinear systems. Read more...
Shearing can be used to express the solution of a linear system using ratios of volumes, and also to compute volumes themselves. Read more...
Rich intuition about why the number of solutions to a square linear system is governed by the volume of the parallelepiped formed by the coefficient vectors. Read more...
N-dimensional volume generalizes the idea of the space occupied by an object. We can think about N-dimensional volume as being enclosed by N-dimensional vectors. Read more...
The matrix exponential can be defined as a power series and used to solve systems of linear differential equations. Read more...
Jordan form provides a guaranteed backup plan for exponentiating matrices that are non-diagonalizable. Read more...
Matrix diagonalization can be applied to construct closed-form expressions for recursive sequences. Read more...
The eigenvectors of a matrix are those vectors that the matrix simply rescales, and the factor by which an eigenvector is rescaled is called its eigenvalue. These concepts can be used to quickly calculate large powers of matrices. Read more...
Implementing the Cartesian product provides good practice working with arrays. Read more...
Sequences where each term is a function of the previous terms. Read more...
There are other number systems that use more or fewer than ten characters. Read more...
It’s assumed that you’ve had some basic exposure to programming. Read more...
During its operation from 2020 to 2023, Eurisko was the most advanced high school math/CS sequence in the USA. It culminated in high school students doing masters/PhD-level coursework (reproducing academic research papers in artificial intelligence, building everything from scratch in Python). Read more...
Two subtypes of coders that I watched students grow into. Read more...
An aha moment with object-oriented programming. Read more...
In 9 months, these students went from initially not knowing how to write helper functions to building a machine learning library from scratch. Read more...
Using convolutional layers to create an even better checkers player. Read more...
Extending Fogel’s tic-tac-toe player to the game of checkers. Read more...
Reimplementing the paper that laid the groundwork for Blondie24. Read more...
A method for training neural networks that works even when training feedback is sparse. Read more...
In order to justify using a more complex model, the increase in performance has to be worth the cost of integrating and maintaining the complexity. Read more...
Two subtypes of coders that I watched students grow into. Read more...
Stuff you don’t find in math textbooks. Read more...
… are summarized in the following table. Read more...
An easy trick to improve your retention while working through a bank of review or challenge problems like LeetCode, HackerRank, etc. Read more...
There’s only so much fun you can have trying to follow another person’s footsteps to arrive at a known solution. There’s only so much confidence you can build from fighting against a problem that someone else has intentionally set up to be well-posed and elegantly solvable if you think about it the right way. Read more...
Good problem = intersection between your own interests/talents, the realm of what’s feasible, and the desires of the external world. Read more...
Stuff you don’t find in math textbooks. Read more...
Minor changes to increase workout intensity and caloric surplus. Read more...
Daily 20-30 minute bedroom workout with gymnastic rings hanging from pull-up bar – just as much challenge as weights, but inexpensive and easily portable. Read more...
1) Foundational math. 2) Classical machine learning. 3) Deep learning. 4) Cutting-edge machine learning. Read more...
The hard truth is that if you want to build a serious educational product, you can’t be afraid to charge money for it. You can’t back yourself into a corner where you depend on a massive userbase. Why? Because most people are not serious about learning, and if you depend on a massive base of unserious learners, then you have to employ ineffective learning strategies that do not repel unserious students. Which makes your product suck. Read more...
Learning math early guards you against numerous academic risks and opens all kinds of doors to career opportunities. Read more...
Spaced repetition is complicated in hierarchical bodies of knowledge, like mathematics, because repetitions on advanced topics should “trickle down” to update the repetition schedules of simpler topics that are implicitly practiced (while being discounted appropriately since these repetitions are often too early to count for full credit towards the next repetition). However, I developed a model of Fractional Implicit Repetition (FIRe) that not only accounts for implicit “trickle-down” repetitions but also minimizes the number of reviews by choosing reviews whose implicit repetitions “knock out” other due reviews (like dominos), and calibrates the speed of the spaced repetition process to each individual student on each individual topic (student ability and topic difficulty are competing factors). Read more...
Bridging the communication gap between academia and industry in the field of TDA. Read more...
Demonstrating an open-source implementation of persistent homology techniques in the TDA package for R. Read more...
Persistent homology provides a way to quantify the topological features that persist over a data set’s full range of scale. Read more...
An intuitive derivation. Read more...
A simple mnemonic trick for quickly differentiating complicated functions. Read more...
Hidden inside of every quadratic, there is a perfect square. Read more...
Every inscribed triangle that has a diameter as one of its sides is a right triangle. Read more...
A limit problem conjured up from the depths of hell. Read more...
Type I pairs with the variable that runs vertically in the usual representation of the coordinate system. The remaining types are paired with the rest of the variables in ascending order. Read more...
The behavior of a multivariable function can be highly specific to the path taken. Read more...
During its operation from 2020 to 2023, Eurisko was the most advanced high school math/CS sequence in the USA. It culminated in high school students doing masters/PhD-level coursework (reproducing academic research papers in artificial intelligence, building everything from scratch in Python). Read more...
Two subtypes of coders that I watched students grow into. Read more...
In 9 months, these students went from initially not knowing how to write helper functions to building a machine learning library from scratch. Read more...
We can algorithmically build classifiers that use a sequence of nested “if-then” decision rules. Read more...
A simple classification algorithm grounded in Bayesian probability. Read more...
One of the simplest classifiers. Read more...
My training has been scattered and fuzzy until recently. Here’s the whole story. Read more...
Implementation notes for STDP learning in a network of Hodgkin-Huxley simulated neurons. Read more...
Many existing proofs are not accessible to young mathematicians or those without experience in the realm of dynamical systems. Read more...
Category theory provides a language for explicitly describing indirect relationships in graphs. Read more...
Framing complex systems in the language of category theory. Read more...
A function is a scribble that crosses each vertical line only once. Read more...
A series is the sum of a sequence. Read more...
A sequence is a list of numbers that has some pattern. Read more...
Merge sort and quicksort are generally faster than selection, bubble, and insertion sort. And unlike counting sort, they are not susceptible to blowup in the amount of memory required. Read more...
Some of the simplest methods for sorting items in arrays. Read more...
Repeatedly choosing the action with the best worst-case scenario. Read more...
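As a rough illustration of "best worst-case" selection, here is a minimal minimax sketch. The nested-list game tree, the leaf payoffs, and the function names `minimax` and `best_action` are assumptions made up for this example; they are not taken from the post.

```python
def minimax(node, maximizing: bool):
    """Return the value of a game-tree node under optimal play by both sides.

    A node is either a number (a leaf payoff for the maximizing player)
    or a list of child nodes (hypothetical representation for this sketch).
    """
    if isinstance(node, (int, float)):
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)


def best_action(tree):
    """Return the index of the root action whose worst-case outcome is best."""
    # After we move, the opponent moves next and tries to minimize our payoff.
    worst_cases = [minimax(child, maximizing=False) for child in tree]
    return max(range(len(tree)), key=lambda i: worst_cases[i])


# Three available actions, each followed by two possible opponent replies.
tree = [[3, 12], [2, 8], [14, 1]]

print(best_action(tree))                # index of the chosen action
print(minimax(tree, maximizing=True))   # guaranteed payoff under optimal play
```

Here the worst cases of the three actions are 3, 2, and 1, so the first action is chosen even though the third one contains the largest single payoff.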
Building data structures that represent all the possible outcomes of a game. Read more...
Minor changes to increase workout intensity and caloric surplus. Read more...
Daily 20-30 minute bedroom workout with gymnastic rings hanging from pull-up bar – just as much challenge as weights, but inexpensive and easily portable. Read more...
Won first place in a state-level competition by finding and exploiting a loophole in the points scoring logic. Read more...
The most important things I learned from competing in science fairs had nothing to do with physics or even academics. My main takeaways were actually related to business – in particular, sales and marketing. Read more...
As you climb the levels of math, sources of educational friction conspire against you and eventually throw you off the train. And one of the first warning signs is when you stop understanding things at the core, and instead try to memorize special cases cookbook-style. Read more...
Is there a standard “order of operations” for parallel vs nested absolute value expressions, in the absence of clarifying notation? Read more...
Hard-coding explanations feels tedious, takes a lot of work, and isn’t “sexy” like an AI that generates responses from scratch – but at least it’s not a pipe dream. It’s a practical solution that lets you move on to other components of the AI that are just as important. Read more...
For many (but not all) students, the answer is yes. And for many of those students, automation can unlock life-changing educational outcomes. Read more...
One of the best career hacks – especially for a junior dev – is to knock out your work so quickly and so well that you put pressure on your boss to come up with more work for you. Your boss starts giving you work that they themselves need to do soon, which is really the exact kind of work that’s going to move your career forward. Read more...
The network becomes book-smart in a particular area but not street-smart in general. The training procedure is like a series of exams on material within a tiny subject area (your data subspace). The network refines its knowledge in the subject area to maximize its performance on those exams, but it doesn’t refine its knowledge outside that subject area. And that leaves it gullible to adversarial examples using inputs outside the subject area. Read more...
Initial parameter range, data sampling range, severity of regularization. Read more...
If any student, anywhere, is looking for advice on how to prepare for a standardized math test, then this is everything I’d tell them. Read more...
First, you need extensive and solid content knowledge. Then, you need to work through tons of practice exams for the specific exam you’re taking. This might sound simple, but every year, countless people manage to screw it up. Read more...
Montaigne’s education, strictly dictated by his parents and university studies, resulted in an isolative work with scholarly impact but limited public reach. Conversely, Benjamin Franklin’s goal-oriented self-teaching led to influential creations and roles benefiting his community and nation. Read more...
The main ideas behind computers can be understood by anyone. Read more...
Framing complex systems in the language of category theory. Read more...
In a simplified problem framing, we investigate the (game-theoretical) usefulness of limiting the number of social connections per person. Read more...
Persistent homology provides a way to quantify the topological features that persist over a data set’s full range of scale. Read more...
The derivative tells the steepness of a function at a given point, kind of like a carpenter’s level. Read more...
How to avoid some of the most common pitfalls leading to ugly LaTeX. Read more...
A technique for maximizing linear expressions subject to linear constraints. Read more...
… are summarized in the following table. Read more...
In general, you can manipulate total derivatives like fractions, but you can’t do the same with partial derivatives. Read more...
Q: Draw a 10 x 10 square grid. How many squares are there in total? Not just 1 x 1 squares, but also 2 x 2 squares, 3 x 3 squares, and so on. A: The total number of square shapes is the total sum of square numbers 1 + 4 + 9 + 16 + … + 100. Read more...
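The answer above can be checked with a short sketch. It relies on the fact that a k x k square fits in (n - k + 1) positions along each axis of an n x n grid, so the counts per size are exactly the square numbers. The function names `count_squares` and `sum_of_squares_formula` are made up for this example.

```python
def count_squares(n: int) -> int:
    """Count all axis-aligned k x k squares in an n x n grid of unit cells."""
    total = 0
    for k in range(1, n + 1):
        # A k x k square can start at (n - k + 1) positions along each axis.
        positions_per_axis = n - k + 1
        total += positions_per_axis ** 2
    return total


def sum_of_squares_formula(n: int) -> int:
    """Closed form for 1^2 + 2^2 + ... + n^2."""
    return n * (n + 1) * (2 * n + 1) // 6


print(count_squares(10))           # 385
print(sum_of_squares_formula(10))  # 385
```

Both approaches give 385 for the 10 x 10 grid, which matches 1 + 4 + 9 + 16 + … + 100.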
Active learning leads to more neural activation than passive learning. Automaticity involves developing strategic neural connections that reduce the amount of effort that the brain has to expend to activate patterns of neurons. Read more...
Deliberate practice is the most effective form of active learning. It consists of individualized training activities specially chosen to improve specific aspects of a student’s performance through repetition and successive refinement. It is the opposite of mindless repetition. The amount of deliberate practice has been shown to be one of the most prominent underlying factors responsible for individual differences in performance across numerous fields, even among highly talented elite performers. Deliberate practice demands effort and intensity, and may be discomforting, but its long-term commitment compounds incremental improvements, leading to expertise. Read more...
Mastery learning is a strategy in which students demonstrate proficiency on prerequisites before advancing. While even loose approximations of mastery learning have been shown to produce massive gains in student learning, mastery learning faces limited adoption due to clashing with traditional teaching methods and placing increased demands on educators. True mastery learning at a fully granular level requires fully individualized instruction and is only attainable through one-on-one tutoring. Read more...
Loosely inspired by the German tank problem: several witnesses reported seeing a UFO during the given time intervals, and you want to quantify your certainty regarding when the UFO arrived and when it left. Read more...
Sure, accelerating via self-study is not as optimal as accelerating within teacher-managed courses, but it’s way better than not accelerating at all. Read more...
When you’re developing skills at peak efficiency, you are maximizing the difficulty of your training tasks subject to the constraint that you end up successfully overcoming those difficulties in a timely manner. Read more...
There are many, many studies that measure variation in WMC vs variation in other metrics. Read more...
Bloom studied the training backgrounds of 120 world-class talented individuals across 6 talent domains (piano, sculpting, swimming, tennis, math, and neurology) and discovered that talent development occurs through a similar general process, no matter the domain. In other words, there is a “formula” for developing talent – though executing it is a lot harder than simply understanding it. Read more...
Regret minimization cuts both ways. Read more...