A Recent Lighbulb Moment in my Math Academy Work

by Justin Skycak (@justinskycak) on July 19, 2025

Want to get notified about new posts? Join the mailing list and follow on X/Twitter.

This is a bit unexpected, not something I’ve been intentionally working on until this evening, but I just sped up the diagnostic pool selection algorithm by about 20x – it still produces the exact same results but now it does so 20x faster.

For reference, the diagnostic pool selection algo is the thing that selects a "pool" of topics that cover the desired course and prerequisites to a desired resolution. This is the first of steps in reducing the number of diagnostic questions: 1) compress the graph into the smallest "pool" of topics with sufficient coverage beforehand, that can be used by every diagnostic on that course, and 2) intelligently select topics from that pool during the actual diagnostic based on a student's individual answers.

This is significant because it resolves a bottleneck for allowing users to create custom courses in the future:

Previously, diagnostic pools could take up to about 20-30 minutes to compute for courses like M4ML, LA, etc., where there are many hundreds of prerequisites that we want to assess the student's knowledge on. (And that was already after introducing some clever graph traversals to speed it up -- it would have taken multiple hours otherwise.)
A compute time of 20-30 minutes was enough to generate and calibrate diagnostics for our purposes, but it's always been in the back of my mind that we eventually want to enable students to create their own "custom courses" on demand, and ideally we'd want to let students jump straight into the diagnostic within minutes creating their custom course, which meant I would have to figure out some way to speed up that diagnostic pool selection algo, especially for those courses with many hundreds of prerequisites.
Anyway, after this algo update, those 20-30 minute pool computations are now clocking in at a little over 1 minute. So, that's a nice thing not to have to worry about in the future!

Funny enough, the way I stumbled upon this improvement was by taking a break in the evening and doing a bit of math for fun:

I was reading up on existing approaches to an open research problem, designing algorithms to compute "distance-k coverings" in undirected graphs. It's kind of similar to the idea of computing a diagnostic pool, except our "diagnostic pool" problem takes place in a more complicated setting, a directed graph with a multi-condition definition of "covering."
There's a clever trick that pops out in the simpler setting where you can leverage previous computations to filter down the set of possibilities for the next node to select, and it turns out that trick can be leveraged in our more complicated setting.
(The extension to our more complicated setting is not obvious -- it's one of those cases where you really need that gut feeling of "seems like it might be possible, probably worth looking into," and then the more you look into it the more possible it seems, and then finally you really nail it down after a couple hours.)

Want to get notified about new posts? Join the mailing list and follow on X/Twitter.