Bottom-Up Versus Top-Down Arguments in Machine Learning Education
"Bottom-up vs. top-down" arguments in ML education never end well because there is little agreement on where the bar is for what it means to learn ML.
On one extreme, some people drop the bar really low, sometimes all the way down to "you know ML if you can use an off-the-shelf library to complete a run-of-the-mill project; you don't need to know what's going on under the hood, you don't need to be able to read/implement research papers, and you don't need to be able to customize for novel use-cases that pop up when you're trying to innovate."
(Nowadays some people will even claim they do ML/AI if all they're doing is chatting with an LLM, and they'll tell you that not only do you not need to know much math, but you don't need to know much coding either: anyone can learn ML/AI in a day; you just throw a coherent prompt into an API call.)
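To be concrete, that low-bar version really is just a few lines. Here's a minimal sketch using the OpenAI Python client, where the model name and the prompt are placeholder choices of mine, not anything from a particular course or project:

```python
# The "ML in a day" workflow being described: one coherent prompt, one API call.
# Minimal sketch using the OpenAI Python client; the model name and prompt
# below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Classify this review as positive or negative: 'Loved it.'"}],
)
print(response.choices[0].message.content)
```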
On the other hand, some people raise the bar so high that you only meet it if you're publishing mathematical theorems about ML theory. Ask them how much math you need for ML and they'll point you at a textbook like [Abstract] Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Machine Learning, which weighs in at nearly 2,200 pages peppered with stuff like Zorn's Lemma, group theory, quaternions, ... none of which are needed to read/implement most research papers or customize for novel use-cases that pop up when you're trying to innovate.
Here's what I consider to be the most pragmatic, actionable, and efficient approach to learning serious ML:
Plan your broad-strokes journey top-down, but carry out the granular steps bottom-up.
The top-down approach can be useful for planning a broad-strokes learning journey towards a goal. For instance, if you want to learn ML, then you can think top-down to figure out what fields of math you need to learn in order for machine learning to become accessible to you. You'll find that you absolutely need to learn calculus, linear algebra, and prob/stats, and you can skip stuff like abstract algebra, number theory, etc.
However, the granular steps of the journey, the actual learning, need to be carried out bottom-up.
For instance, are you really going to master computing neural net weight gradients via backpropagation by asking "what does that squiggly 'd' mean," "why do you have to chain-multiply the derivatives like that," "how do you calculate the derivative of any activation function," etc., all the way down to the depths of whatever is the last piece of math you've mastered?
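For concreteness, here's roughly the computation those questions are circling, written out by hand; a minimal sketch where the two-layer architecture, ReLU activation, and squared-error loss are illustrative choices of mine:

```python
# A minimal sketch: backprop through a tiny two-layer ReLU net, written
# out by hand so the chain-multiplication of derivatives is visible at
# every step. The setup is illustrative, not a canonical example.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # input vector
y = 1.0                       # scalar target
W1 = rng.normal(size=(4, 3))  # first-layer weights
W2 = rng.normal(size=(1, 4))  # second-layer weights

# Forward pass
z1 = W1 @ x                    # pre-activations
a1 = np.maximum(z1, 0.0)       # ReLU activation
y_hat = (W2 @ a1)[0]           # scalar prediction
loss = 0.5 * (y_hat - y) ** 2  # squared-error loss

# Backward pass: each gradient is the upstream gradient multiplied by
# one more local derivative -- the chain rule, applied layer by layer.
dL_dyhat = y_hat - y             # d(loss)/d(y_hat)
dL_dW2 = dL_dyhat * a1[None, :]  # chain through y_hat = W2 @ a1
dL_da1 = dL_dyhat * W2[0]        # same step, but w.r.t. a1
dL_dz1 = dL_da1 * (z1 > 0.0)     # ReLU's derivative: 1 where z1 > 0, else 0
dL_dW1 = np.outer(dL_dz1, x)     # chain through z1 = W1 @ x

# Finite-difference spot check of one weight gradient
eps = 1e-6
W1_bumped = W1.copy()
W1_bumped[0, 0] += eps
loss_bumped = 0.5 * ((W2 @ np.maximum(W1_bumped @ x, 0.0))[0] - y) ** 2
print(dL_dW1[0, 0], (loss_bumped - loss) / eps)  # should agree closely
```

Every line of that backward pass leans on a piece of calculus, and the finite-difference check at the end is one way to confirm the chain-multiplied derivatives came out right.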
No, all you're going to do with those questions is create a roadmap of what you need to learn. Which is essentially a calculus course. Except your roadmap will be terrible: it will have all sorts of gaps that you don't even realize are there, which is to be expected given that you don't actually know the subject yourself.
You'll try to climb back up the skill tree implied by your incomplete roadmap, and you'll repeatedly get stuck because the next branch is out of reach due to prerequisites you don't realize you're missing.
Most people in this situation will eventually just give up due to all the friction. Only those who have extremely outsized perseverance and generalization ability have any chance of fighting through and making it to the other side. And even then, it will take longer (and they'll likely end up with more holes in their knowledge) than if they just sucked it up and worked through a well-sequenced calculus course.
Anyway, I'll probably circle back and write more about this once our "serious ML" course sequence is done (not just the first course, but the full course sequence all the way up through transformers, diffusion models, DDPGs, etc.), which should make it clear where the bar is (i.e., what constitutes "serious ML"), how much math you need to clear the bar, why you need that math (i.e., what exact ML stuff it is a prerequisite of), and how vast a chasm there is between the level of math that most aspiring "serious ML" learners know and the level they actually need to reach to achieve their "serious ML" aspirations.
By the way, our approach comes from teaching this stuff manually to high schoolers for several years with great success:
E.g., one of them won 1st place ($250,000) in last year's Regeneron Science Talent Search for developing a model that "revealed 1.5 million previously unknown objects in space, broadened the potential of a NASA mission" (a direct quote from Caltech's website) and published his results as a solo author in The Astronomical Journal.
Lastly, I want to be clear that I have nothing against ML projects. Hands-on, guided projects that go beyond textbook exercises are important; we will be including plenty of projects in our own ML courses, and I even recommended some project-heavy learning resources as part of an ML roadmap that I put together last year. There is a clear benefit to doing projects, and I'm not trying to steer anyone away.
What I am saying, though, is that in my experience, as well as my read of the science of learning literature: projects are great for pulling a bunch of knowledge together and pushing it further, but when a student is learning something for the very first time, it's more efficient to do so in a lower-complexity, higher-scaffolding setting. Otherwise, without a high level of background knowledge, it's easy to get overwhelmed by a project, and when a student is overwhelmed, spinning their wheels in confusion without making much progress, that's very inefficient for learning.