ML Courses can Vary Massively in their Coverage

by Justin Skycak (@justinskycak)

A really interesting thing I’m finding while scoping out Math Academy’s upcoming classical ML course is that ML resources can vary massively in their coverage.

This seems to occur to a much lesser extent in traditional math courses – for instance, calculus is segmented out pretty well:

  • Calculus I ~ single-variable differential calculus
  • Calculus II ~ single-variable integral calculus
  • Calculus III ~ multivariable calculus
  • Real Analysis ~ proof-based calculus (loosely speaking)

But it seems like in ML, everything is just kind of thrown together into a big stew and different resources cover different cross-sections.

The two “canonical” ML books that come to mind are Bishop (Pattern Recognition and Machine Learning) and ESL (The Elements of Statistical Learning), both weighing in at 700+ pages. There are calculus books that big, but that’s because they cover Calculus I, II, and III all together!

I’d already anticipated that we’d need to break ML into two semester-long courses, classical ML and then deep learning:

  • Classical ML would cover linear regression through decision trees and MLPs (plus other standard stuff like naive Bayes, clustering, PCA, etc.)... maaaaybe a brief intro to CNNs.
  • Deep learning would start at CNNs and cover the zoo of other architectures, also GANs, diffusion models, etc.

But it’s looking like we might need to segment further. Just looking at the linear regression learning pathway, Bayesian stats is one place where I’d draw the “beyond the scope of a semester-long classical ML course” line:

  • Fitting a linear regression to data via pseudoinverse & gradient descent? Yeah, I'd obviously include that (see the sketch after this list).
  • Constructing confidence intervals for linear regression coefficients? Starting to get into the gray area between ML and stats.
  • Bayesian linear regression, where you factor in a prior distribution? Seems out of scope for an ML course. I would put that in a "Bayesian stats" course instead, an offshoot that would come after a typical prob/stats course.
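
To make the first bullet concrete, here's a minimal sketch (not Math Academy course material, just my own illustration with made-up toy data and hand-picked hyperparameters) of fitting the same linear regression two ways, via the pseudoinverse and via gradient descent, plus the gray-area confidence intervals from the second bullet:

```python
import numpy as np
from scipy import stats

# Toy data (illustrative only): y = 3x + 2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=100)

# Prepend a column of ones so the intercept is learned as a weight.
Xb = np.hstack([np.ones((len(X), 1)), X])

# 1. Closed-form fit via the Moore-Penrose pseudoinverse: w = pinv(X) @ y.
w_pinv = np.linalg.pinv(Xb) @ y

# 2. The same fit via batch gradient descent on mean squared error.
w_gd = np.zeros(Xb.shape[1])
lr = 0.1  # learning rate chosen by hand for this toy problem
for _ in range(5000):
    grad = (2 / len(y)) * Xb.T @ (Xb @ w_gd - y)  # gradient of MSE w.r.t. w
    w_gd -= lr * grad

print("pseudoinverse fit:   ", w_pinv)  # roughly [2, 3]
print("gradient descent fit:", w_gd)    # should agree closely

# Gray-area bullet: classical 95% confidence intervals for the coefficients,
# using the usual OLS standard errors (assumes i.i.d. Gaussian noise).
resid = y - Xb @ w_pinv
dof = len(y) - Xb.shape[1]
sigma2 = resid @ resid / dof  # unbiased estimate of the noise variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xb.T @ Xb)))
t_crit = stats.t.ppf(0.975, dof)
print("95% CIs:", list(zip(w_pinv - t_crit * se, w_pinv + t_crit * se)))
```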

But I’d still want to include naive Bayes classifiers, of course. Naive Bayes is solidly in classical ML.

Probably, the general rule should be this: instead of trying to match up with ML textbook coverage, I should focus on syllabi for standard ML courses at serious universities.

I guess this is all sort of obvious in hindsight, but the thing that threw me off is I was coming in with the mindset of “we need to cover the superset of all the content covered in the major textbooks,” which we’re able to do quite well for traditional math.

For ML, the rule will have to be amended to “we need to cover the superset of all the content covered in standard university course syllabi.”


Want to get notified about new posts? Join the mailing list and follow on X/Twitter.